The proposed Google Books settlement has created strong interest in quantifying publications and authors, to get a better sense of the scale of its impact. We have been looking at WorldCat and hope to publish an analysis later this year.

Here is an issue that came up this week: how many print books have been published in the US since 1923, and how many authors are associated with those books? Here are some numbers. They are good indications based on the data we have and what we can do with it, not definitive answers.

  • Print books published in the US in 1923 or later: 12,582,962
  • Unique personal authors: 3,685,778
  • Unique corporate authors: 977,679

Now, 'book' is a pretty vague term. This analysis uses the definition from the Anatomy of Aggregate Collections paper we published a few years ago, which analyses the collections of the original Google 5 libraries. That definition was as follows:

Although there is no unambiguous bibliographic definition of a book, libraries have often used monographic language materials as a proxy for books, and this practice is adopted for this study. More specifically, in the context of a MARC21 record, a book is defined as a language-based monograph, identified by the codes "a" and "m" in bytes 6 and 7 of the leader, respectively. For the purposes of this study, theses/dissertations and government documents are excluded from the analysis, since these materials are usually acquired and managed as separate segments of the library collection. Records describing books in print format were identified by eliminating all non-print formats, such as digital, microform, Braille, and so on.
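For readers who want to see what the leader test in that definition looks like in practice, here is a minimal sketch in Python. It covers only the "a" and "m" check on bytes 6 and 7; the exclusion of theses, government documents and non-print formats depends on other parts of the record and is not shown. The function name and the sample leaders are made up for illustration, and this is not the code used in the analysis.

```python
def is_book_record(leader: str) -> bool:
    """Return True if a MARC21 leader describes a language-based monograph."""
    # Leader positions are counted from zero: byte 6 holds the type of
    # record and byte 7 holds the bibliographic level.
    if len(leader) < 8:
        return False
    return leader[6] == "a" and leader[7] == "m"


# Hypothetical leaders, invented for this example.
monograph_leader = "00714cam a2200205 a 4500"  # "a" + "m": language material, monograph
serial_leader = "00714cas a2200205 a 4500"     # "a" + "s": language material, serial

print(is_book_record(monograph_leader))
print(is_book_record(serial_leader))
```

Only the first of the two sample leaders passes the test, since the second is coded as a serial rather than a monograph.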

What we are counting are 'manifestations' (in FRBR terms), which correspond roughly to 'titles' in common usage. The number of individual copies would be higher. We pull together authors as best we can.
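To make the counting concrete, here is a rough sketch in Python of how manifestations per author might be tallied once variant name strings are pulled together. The normalization rule (lowercase, strip punctuation, collapse spaces), the function names and the sample records are all assumptions for illustration; the post does not describe the actual matching method, which is likely more involved.

```python
from collections import Counter


def normalize_name(name: str) -> str:
    """Crude normalization (an assumption): lowercase, drop punctuation, collapse spaces."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def rank_authors(records):
    """Rank authors by number of distinct manifestations.

    records is an iterable of (manifestation_id, author_name) pairs.
    """
    seen = set()
    counts = Counter()
    for manifestation_id, author in records:
        key = (manifestation_id, normalize_name(author))
        if key in seen:
            continue  # count each author at most once per manifestation
        seen.add(key)
        counts[key[1]] += 1
    return counts.most_common()


# Hypothetical sample data, not drawn from WorldCat.
sample = [
    ("m1", "Shakespeare, William, 1564-1616"),
    ("m2", "Shakespeare, William 1564 1616"),
    ("m3", "Twain, Mark, 1835-1910"),
    ("m3", "Twain, Mark, 1835-1910"),
]
ranking = rank_authors(sample)
print(ranking)       # ranked (name, manifestation count) pairs
print(len(ranking))  # number of unique authors after normalization
```

In this toy sample the two Shakespeare strings collapse into one author with two manifestations, and the repeated Twain pair is counted once.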

Here is the ranked list of personal authors by number of manifestations published in the US in 1923 or later.

  1. Shakespeare, William 1564 1616
  2. Marsh, Carole
  3. Twain, Mark 1835 1910
  4. Rudman, Jack
  5. Dickens, Charles 1812 1870
  6. Jackson, Ronald vern
  7. Bloom, Harold
  8. Christie, Agatha 1890 1976
  9. Stevenson, Robert Louis 1850 1894
  10. Cowley, Joy

An interesting list; I have remarked on the Bloom phenomenon before.

Here is the ranked list of corporate authors:

  1. society of automotive engineers
  2. american national standards institute
  3. national business institute
  4. national learning corporation
  5. foreign technology div wright patterson afb ohio
  6. national bureau of economic research
  7. sothebys firm
  8. sotheby parke bernet inc
  9. electric power research institute
  10. naval postgraduate school monterey ca

It will be seen from the list of corporate authors that our working definition pulls in standards and art catalogs. Remember that we are not counting theses and government documents. This is a reminder that although we may have a common-sense notion of a 'book' based on an academic or trade publication, it actually requires some discretionary interpretation to bound the population of books in an operational way for this type of analysis.

And a final reminder: these lists are based on print books published in the US since 1923, not on an analysis of the whole of WorldCat.

The actual analysis was done by my colleagues Jenny Toves and Brian Lavoie.

Comments: 3

Aug 17, 2009
Laszlo Nagypal

I am interested in the number of books concerning Geoffrey Chaucer published during the separate decades of the last century. Can anybody give me a piece of advice (free of charge)?

Aug 17, 2009
Bryan Campbell

I am curious to know if these numbers would change much if you were to factor in the loss of names, mainly personal, associated with books because of the longstanding cataloging "Rule of three." I have heard the rule will be optional in RDA. I suppose you could count how many books in your sample are affected by the rule, multiply by 3, because 3 is the minimum number of names excluded, and then add that number to the total.

Aug 17, 2009
Paul Biba

Interesting! Took the liberty of re-posting most of this on TeleRead - with credit and a link of course.

Paul