In English?

I thought I would post some numbers here which were prepared by my colleague Brian Lavoie for another purpose. The question was: how many of the books in US libraries are in English?

First of all, what is a book? Deciding what a book is involves some choices (are theses in or out, for example?). This analysis uses the definition of 'print books' given in the Google 5 analysis published in D-Lib Magazine a while back [1].

a. All of WorldCat (April 2009):
135.3 million records
Cataloged as "eng": 46 percent (so 54 percent non-English)

b. Print books only (April 2009):
91.2 million records
Cataloged as "eng": 40 percent (so 60 percent non-English)

c. Print books in US libraries (January 2009):
42.5 million records
Cataloged as "eng": 57 percent (so 43 percent non-English)

d. Print books representing combined collections of three academic research libraries participating in GBS (April 2009):
7.2 million records
Cataloged as "eng": 54 percent (so 46 percent non-English)

Note - c was calculated on a slightly earlier version of the database, as we had already pulled out US library holdings. The data in d is being looked at for another purpose, hence the slightly arbitrary selection of three libraries.

Note - these numbers are for records in the database, which represent 'manifestations' in FRBR terms. If one were to count holdings or actual copies, the numbers would be different. The proportion of 'eng' would go up, as English titles will be more widely held and in greater numbers of copies.
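To make the difference between counting records and counting holdings concrete, here is a minimal sketch. The five records and their holdings counts are invented purely for illustration and are not drawn from WorldCat; the function names are likewise hypothetical.

```python
# Hypothetical sample: (language code, number of holding libraries).
# These values are made up for illustration, not taken from WorldCat.
records = [("eng", 120), ("eng", 45), ("fre", 3), ("ger", 2), ("spa", 5)]


def english_share_by_records(rows):
    """Share of records cataloged as 'eng', each record counted once."""
    english = sum(1 for lang, _ in rows if lang == "eng")
    return english / len(rows)


def english_share_by_holdings(rows):
    """Share of holdings attached to 'eng' records, weighted by holdings."""
    total = sum(holdings for _, holdings in rows)
    english = sum(holdings for lang, holdings in rows if lang == "eng")
    return english / total


share_records = english_share_by_records(records)
share_holdings = english_share_by_holdings(records)
print(f"By records:  {share_records:.0%}")
print(f"By holdings: {share_holdings:.0%}")
```

In this invented sample the English share is higher when weighted by holdings, because the English records are the widely held ones. Real data would need to be checked to see how large the effect is.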

[1] Here is how the definition of a 'print book' was decided upon and operationalised for the Google 5 analysis. "Although there is no unambiguous bibliographic definition of a book, libraries have often used monographic language materials as a proxy for books, and this practice is adopted for this study. More specifically, in the context of a MARC21 record, a book is defined as a language-based monograph, identified by the codes "a" and "m" in bytes 6 and 7 of the leader, respectively. For the purposes of this study, theses/dissertations and government documents are excluded from the analysis, since these materials are usually acquired and managed as separate segments of the library collection. Records describing books in print format were identified by eliminating all non-print formats, such as digital, microform, Braille, and so on."
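The leader test in that definition can be sketched in a few lines. This is only an illustration of the "a" and "m" check on bytes 6 and 7: the function name and the sample leaders are hypothetical, and the further exclusions described in the quote (theses, government documents, non-print formats) depend on other parts of the record and are not attempted here.

```python
def is_language_monograph(leader: str) -> bool:
    """Return True when leader bytes 6 and 7 are 'a' and 'm'.

    Byte 6 is the type of record ('a' = language material) and
    byte 7 is the bibliographic level ('m' = monograph).
    """
    if len(leader) < 8:
        return False
    return leader[6] == "a" and leader[7] == "m"


# Hypothetical 24-character leaders, for illustration only.
book_leader = "00000nam a2200000 a 4500"
serial_leader = "00000nas a2200000 a 4500"

print(is_language_monograph(book_leader))
print(is_language_monograph(serial_leader))
```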

Comments: 4

Jul 03, 2009
Jonathan Rochkind

That's percent of bibs, not percent of holdings, right? I'd guess that the English bibs are much more widely held than the non-English ones. To really get at what the questioner probably wanted to know, it would be interesting to take account of holdings attached to the bibs, and say what percentage of holdings are of English books.

Jul 04, 2009

I don't suppose we have similar numbers for non-English speaking countries (say France, Spain, Germany, Russia, Japan) that show how many holdings they have not in the native language of their country? I'd guess they would have a slightly higher percentage, but that would just be a guess....

Jul 04, 2009
Lorcan Dempsey

@Jonathan .. yep. That is what I was saying in one of the notes. You would get a different answer with holdings, and a different one again with individual copies.

Jul 04, 2009
Lorcan Dempsey

@Scott. WorldCat coverage might be less good in some of these countries, although it is growing. I imagine you are right. It is something we might check sometime.