The Library of Google


In “The Library of Babel,” Jorge Luis Borges describes a library “…composed of an indefinite, perhaps an infinite, number of hexagonal galleries…” lined with shelves of books. Unfortunately, the books are not organized in any predictable manner, causing librarians to travel “…in search of a book, perhaps of the catalogue of catalogues…” The searches, though, are in vain, given the improbability of finding what you seek in an infinite collection.

This whimsical account is not as far-fetched as it seems: as Geoff Nunberg points out in his critique of the Google Books approach to representing book metadata, Google is proceeding with reckless abandon driven by technological expedience. His essay chronicles (with his typical wit) the systemic errors in the metadata available through Google Books, and Google’s apparent lack of concern for the reliability of the process or the accuracy or utility of the results. Errors include blatantly incorrect dates, author attributions, and categorical classifications, among others. The lack of concern about this aspect of preservation, so vital to library and academic disciplines, is reminiscent of Borges’ contrast between “Man, the imperfect librarian” and the perfect universe of the (Google) Library.

But Nunberg’s is not just a pedantic rant of a self-proclaimed wordinista; the proposed Google Books settlement will affect how people access all books in the future, because Google is in the process of negotiating itself into possession of the rights to distribute them.

Exploiting an opportunity made possible by lawsuits brought by a small number of plaintiffs on one narrow issue, Google has negotiated a settlement agreement designed to give it a compulsory license to all books in copyright throughout the world forever. [Pamela Samuelson, Professor of Law, UC Berkeley]

Pamela Samuelson goes on to argue that this settlement is problematic because the plaintiffs, while claiming to be acting on behalf of authors and publishers, do not represent the entire classes but only small fractions thereof. The settlement would create a mechanism (the Book Rights Registry) through which Google would distribute 63% of revenues it collects to authors and publishers.

But even if the Registry works as intended in compensating the copyright holders, the settlement will still have an adverse effect on classes of people not represented in it, namely the libraries and the reading public. As currently worded, the settlement provides for extremely limited public access through libraries: a “single free terminal per public library building,” with some vague wording about increasing the number if necessary.

Another major issue that needs to be addressed by the settlement is privacy. There is a tradition in the US of the library protecting the privacy of its patrons. But if Google becomes the de facto library, these expectations may well not be met. Not only will there be an intrinsic, overwhelming pressure on Google to mine people’s reading patterns for opportunities to make a buck, but by virtue of being collected and centralized, that data will also be vulnerable to leaks, bugs, and subpoenas.

The preceding has been a list of sins of commission, but the settlement also implies some sins of omission. In the JCDL Panel on Google as a Library, Clifford Lynch raised an interesting and often forgotten point: our cultural heritage consists of more than books, and these other artifacts (games, images, musical scores, etc.) can be harder to make accessible if the settlement, with its focus on books, ignores them. Another issue that was raised concerns the impact on secondary markets such as used book stores: used book sellers are not represented in the settlement, yet stand to lose most of their business to the ubiquity of Google and its trivially low cost of duplicating what once was a scarce resource.

The recent conference organized by the UC Berkeley School of Information to discuss this settlement is one of a growing number of discussions of the implications of the settlement, but it was an invitation-only event. I am certain that there are many other aspects of the settlement that need to be subjected to public scrutiny and debate well beyond the upcoming September 4th deadline. Given that our access to books is about to be transformed, given that Google has demonstrated little regard for the process of preserving the books in ways that reflect how they are used (not just as bags of words), and given that Google’s fundamental interest is to increase the value of its stock, we have a responsibility to examine this settlement with due vigilance. If this deal proceeds as it is currently framed, it will make Amazon’s recent yanking of Orwell’s books from customers’ Kindles look like child’s play.

