The proposed Google Books settlement has created strong interest in quantifying publications and authors, to get a better sense of the scale of its impact. We have been looking at WorldCat and hope to publish an analysis later this year.
Here is an issue that came up this week: how many print books have been published in the US since 1923, and how many authors are associated with those books? Here are some numbers, with the caveat that they are good indications based on the data we have and what we can do with it, not definitive answers.
- Print books published in the US in 1923 or later: 12,582,962
- Unique personal authors: 3,685,778
- Unique corporate authors: 977,679
Now, 'book' is a pretty vague term. This analysis uses the definition from the Anatomy of Aggregate Collections paper we published a few years ago, which analysed the collections of the original Google 5 libraries. That definition was as follows:
Although there is no unambiguous bibliographic definition of a book, libraries have often used monographic language materials as a proxy for books, and this practice is adopted for this study. More specifically, in the context of a MARC21 record, a book is defined as a language-based monograph, identified by the codes "a" and "m" in bytes 6 and 7 of the leader, respectively. For the purposes of this study, theses/dissertations and government documents are excluded from the analysis, since these materials are usually acquired and managed as separate segments of the library collection. Records describing books in print format were identified by eliminating all non-print formats, such as digital, microform, Braille, and so on.
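The leader test in that definition can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function name `is_book_leader` and the sample leaders are made up for the example, and the sketch covers only the leader check. The other exclusions in the definition (theses/dissertations, government documents, non-print formats) would need further checks on other parts of the record.

```python
def is_book_leader(leader: str) -> bool:
    """Return True if a MARC21 leader marks a language-based monograph.

    Hypothetical helper for illustration. Leader positions are counted
    from zero, so "bytes 6 and 7" map directly onto Python indexes 6 and 7.
    """
    # A MARC21 leader is a fixed-length field of 24 characters.
    if len(leader) != 24:
        return False
    # Position 06 = type of record ("a" is language material);
    # position 07 = bibliographic level ("m" is monograph).
    return leader[6] == "a" and leader[7] == "m"


# Made-up sample leaders, for illustration only.
samples = {
    "00714cam a2200205 a 4500": "language material, monograph",
    "00000nas a2200000 a 4500": "language material, serial",
    "01142cjm a2200301 a 4500": "musical sound recording, monograph",
}

for leader, description in samples.items():
    print(f"{description}: {is_book_leader(leader)}")
```

Only the first sample passes, since it is the only one with both "a" in position 6 and "m" in position 7.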
What we are counting are 'manifestations' (in FRBR terms), which roughly correspond to 'titles' in common usage. There would be more individual copies. We pull together variant forms of author names as best we can.
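To make the counting concrete, here is a rough sketch of how manifestations per author might be tallied once name forms are pulled together. It is an assumption-laden illustration rather than the method actually used: the normalization rule (lowercase, punctuation dropped), the record layout with an `authors` key, and the names `normalize_heading` and `rank_authors` are all invented for the example.

```python
import re
from collections import Counter


def normalize_heading(heading: str) -> str:
    """Collapse an author heading to a crude matching key (assumed rule)."""
    lowered = heading.lower()
    # Replace punctuation with spaces, then collapse runs of whitespace.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(no_punct.split())


def rank_authors(records, top_n=10):
    """Count manifestations per normalized author key, highest first.

    Each record stands for one manifestation and is assumed to be a dict
    with an "authors" list of heading strings (hypothetical layout).
    """
    counts = Counter()
    for record in records:
        # Use a set so an author is counted once per manifestation.
        keys = {normalize_heading(h) for h in record["authors"]}
        keys.discard("")
        counts.update(keys)
    return counts.most_common(top_n)


# Made-up records, for illustration only.
records = [
    {"authors": ["Twain, Mark, 1835-1910"]},
    {"authors": ["Twain, Mark 1835 1910."]},
    {"authors": ["Bloom, Harold"]},
]

print(rank_authors(records))
# [('twain mark 1835 1910', 2), ('bloom harold', 1)]
```

A key this crude will merge some variant forms and miss others, which is one reason the numbers are indications rather than definitive answers.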
Here is the ranked list of personal authors by number of manifestations published in the US in 1923 or later.
- Shakespeare, William 1564-1616
- Marsh, Carole
- Twain, Mark 1835-1910
- Rudman, Jack
- Dickens, Charles 1812-1870
- Jackson, Ronald Vern
- Bloom, Harold
- Christie, Agatha 1890-1976
- Stevenson, Robert Louis 1850-1894
- Cowley, Joy
An interesting list; I have remarked on the Bloom phenomenon before.
Here is the ranked list of corporate authors:
- society of automotive engineers
- american national standards institute
- national business institute
- national learning corporation
- foreign technology div wright patterson afb ohio
- national bureau of economic research
- sothebys firm
- sotheby parke bernet inc
- electric power research institute
- naval postgraduate school monterey ca
It will be seen from the list of corporate authors that our working definition pulls in standards and art catalogs. Remember that we are not counting theses and government documents. This is a reminder that although we may have a common-sense notion of a 'book' based on an academic or trade publication, it actually requires some discretionary interpretation to bound the population of books in an operational way for this type of analysis.
And a final reminder: these lists are based on print books published in the US since 1923, not on an analysis of the whole of WorldCat.
The actual analysis was done by my colleagues Jenny Toves and Brian Lavoie.