I thought I would post some numbers here that my colleague Brian Lavoie prepared for another purpose. The question was: how many of the books in US libraries are in English?
First of all, what is a book? Deciding what counts as a book involves some choices (are theses in or out, for example?). This analysis uses the definition of 'print books' given in the Google 5 analysis published in D-Lib Magazine a while back.
a. All of WorldCat (April 2009):
135.3 million records
Cataloged as "eng": 46 percent (so 54 percent non-English)
b. Print books only (April 2009):
Cataloged as "eng": 40 percent (so 60 percent non-English)
c. Print books in US libraries (January 2009):
Cataloged as "eng": 57 percent (so 43 percent non-English)
d. Print books representing the combined collections of three academic research libraries participating in Google Book Search (GBS) (April 2009):
Cataloged as "eng": 54 percent (so 46 percent non-English)
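Each of the figures above is a share of records. As a minimal sketch, assuming each record's language is read from the MARC21 008 field, character positions 35-37 (where "eng" is the code for English), such a share could be computed as follows. The function names are hypothetical, and this is not necessarily how the WorldCat figures were produced.

```python
from collections import Counter

def language_code(f008: str) -> str:
    """Return the language code held in MARC21 008/35-37 (0-based slice 35:38)."""
    return f008[35:38]

def percent_cataloged_as(records, code="eng"):
    """Percentage of 008 strings whose language code equals `code`.

    Each record counts once, so this is a record-level (manifestation) count.
    """
    counts = Counter(language_code(f008) for f008 in records)
    total = sum(counts.values())
    return 100.0 * counts[code] / total if total else 0.0

# Usage with two made-up 40-character 008 strings (blanks except the language).
sample = [" " * 35 + "eng  ", " " * 35 + "fre  "]
print(percent_cataloged_as(sample))  # 50.0
```

The non-English figure in each pair is then simply 100 minus this value.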
Note - c is calculated on a slightly earlier version of the database, because we had already extracted the US library holdings from it. The data in d is being examined for another purpose, hence the slightly arbitrary selection of three libraries.
Note - these numbers are counts of records in the database, which represent 'manifestations' in FRBR terms. Counting holdings or actual copies would give different numbers: the proportion of 'eng' would go up, as English titles tend to be more widely held and held in greater numbers of copies.
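The difference between counting records and counting holdings can be shown with a small worked example. The data below is invented purely for illustration (it is not WorldCat data), and `english_share` is a hypothetical helper: each record is a (language code, number of holding libraries) pair.

```python
def english_share(records, weight_by_holdings=False):
    """Percent of 'eng' among (language, holdings) pairs.

    With weight_by_holdings=False each record counts once (record count);
    with True each record counts once per holding library.
    """
    total = eng = 0
    for lang, holdings in records:
        weight = holdings if weight_by_holdings else 1
        total += weight
        if lang == "eng":
            eng += weight
    return 100.0 * eng / total if total else 0.0

# Invented toy data: two English records that are widely held,
# two non-English records that are held by only one or two libraries.
toy = [("eng", 10), ("eng", 5), ("fre", 2), ("ger", 1)]
print(english_share(toy))                           # 50.0 (2 of 4 records)
print(english_share(toy, weight_by_holdings=True))  # about 83.3 (15 of 18 holdings)
```

When English records carry more holdings on average, weighting by holdings raises the English share, which is the effect the note describes.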
Here is how the definition of a 'print book' was decided upon and operationalised for the Google 5 analysis:
"Although there is no unambiguous bibliographic definition of a book, libraries have often used monographic language materials as a proxy for books, and this practice is adopted for this study. More specifically, in the context of a MARC21 record, a book is defined as a language-based monograph, identified by the codes "a" and "m" in bytes 6 and 7 of the leader, respectively. For the purposes of this study, theses/dissertations and government documents are excluded from the analysis, since these materials are usually acquired and managed as separate segments of the library collection. Records describing books in print format were identified by eliminating all non-print formats, such as digital, microform, Braille, and so on."
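The definition in the quotation can be expressed as a record filter. This is a minimal sketch, assuming each record is available as a MARC21 leader string and a books 008 string. The leader test (positions 6 and 7) follows the quoted definition directly; using 008/23 (form of item), 008/24-27 (nature of contents) and 008/28 (government publication) for the format, thesis and government-document exclusions is an assumption here, not the study's documented procedure, and `is_print_book` is a hypothetical name.

```python
# ASSUMPTION: exclusions are read from the books 008 field; the study may
# have used other fields. Positions are 0-based, as in MARC21 documentation.

# 008/23 form-of-item codes treated as non-print:
# microfilm, microfiche, microopaque, Braille, online, direct electronic, electronic.
NON_PRINT_FORMS = {"a", "b", "c", "f", "o", "q", "s"}

# 008/28 codes indicating some kind of government publication.
GOVERNMENT_CODES = {"a", "c", "f", "i", "l", "m", "o", "s", "z"}

def is_print_book(leader: str, f008: str) -> bool:
    """Hypothetical test of the 'print book' definition for one record."""
    # Leader/06 = 'a' (language material), Leader/07 = 'm' (monograph).
    if leader[6] != "a" or leader[7] != "m":
        return False
    # 008/24-27 (nature of contents): 'm' marks theses.
    if "m" in f008[24:28]:
        return False
    # 008/28: exclude records coded as government publications.
    if f008[28] in GOVERNMENT_CODES:
        return False
    # 008/23: exclude microform, Braille and electronic forms.
    if f008[23] in NON_PRINT_FORMS:
        return False
    return True

# Usage: a monograph leader with an otherwise blank 008 passes the filter.
print(is_print_book("00000cam a2200000 a 4500", " " * 40))  # True
```

Large print (008/23 = "d") is kept as print in this sketch; whether the study treated it that way is not stated in the quoted definition.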