OCLC Research has been doing quite a bit of work on collection analysis in the last couple of years: mining WorldCat for management intelligence about the characteristics of library collections.

One strand of this work was reported at the recent CNI meeting under the title 'A System-Wide View of Library Collections'. This is work carried out by my colleague Brian Lavoie and Roger Schonfeld of Ithaka. [abstract] [ppt]

The purpose here is to analyse a subset of WorldCat (broadly 'books': the detail is in the ppt) to see what can be said about the aggregate characteristics of library collections over time. The analysis ended up focusing on 32M records.

Some standout preliminary findings for me:

  • Half of all books held were published after 1977
  • 82% of books were published in 1923 or afterwards. 18% were published before 1923. In other words, the larger part of library collections consists of in-copyright materials.
  • 'Rareness' is more common than was expected. Only 33% of records have more than 5 holdings. I like to think about this by saying that library collections are less 'vanilla' than was expected. This requires further analysis as it has implications for systemwide preservation moving forward.
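
To make the three measures above concrete, here is a minimal sketch of how they could be computed over a set of records. It is not the method used in the study: the record layout (a "year" and a "holdings" count per record), the sample values, and the function names are all assumptions made for illustration. Only the 1923 cutoff and the "more than 5 holdings" threshold come from the findings listed above.

```python
from statistics import median

# Hypothetical sample records, invented for illustration only.
# "year" is the publication year; "holdings" is the number of holding libraries.
records = [
    {"year": 1890, "holdings": 2},
    {"year": 1921, "holdings": 1},
    {"year": 1955, "holdings": 12},
    {"year": 1978, "holdings": 3},
    {"year": 1984, "holdings": 40},
    {"year": 1999, "holdings": 7},
]


def share(records, predicate):
    """Return the fraction of records for which predicate(record) is true."""
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r)) / len(records)


def summarize(records, cutoff_year=1923, holdings_threshold=5):
    """Compute the three aggregate measures discussed in the post."""
    return {
        # Half of the records were published before this year, half after.
        "median_year": median(r["year"] for r in records),
        # Share published in the cutoff year or later.
        "share_from_cutoff": share(records, lambda r: r["year"] >= cutoff_year),
        # Share held by more than the threshold number of libraries.
        "share_widely_held": share(
            records, lambda r: r["holdings"] > holdings_threshold
        ),
    }


summary = summarize(records)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Keeping the cutoff year and the holdings threshold as parameters makes it easy to see how sensitive the 'rareness' figure is to where the line is drawn.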

We are looking at other strands of work also. One study is looking at the characteristics of items held uniquely by a library, for example. The motivating factor for us is the growing interest in this intelligence as research libraries and others begin to think about how they will manage a growing bookstock in a changing environment.

Incidentally, OCLC has also just released its new collection analysis service.

All part of making data work harder!

Comments: 2

Apr 11, 2005
kgs

Does the report factor in the "unique" records that are really the same bibliographic item, i.e. editions with slightly different pagination, etc.?

Apr 11, 2005
Brian Lavoie

Re: the previous comment:

The report analyzes data for both print book manifestations and works (defined according to FRBR). So the factor you mention would be accounted for when we rolled up the individual manifestations into distinct works.

By the way, for those who are interested, we mapped manifestations to works using the OCLC Research FRBRization algorithm; see:
http://www.oclc.org/research/software/frbr/

Brian