Data flows in the book world

One of the recommendations of the Library of Congress Working Group on the Future of Bibliographic Control was that ways should be found of harnessing publisher data upstream of the cataloging process. The rationale was that this would make data about materials available earlier and reduce overall creation effort.

OCLC recently organized an invitational symposium which had this issue as a central topic. The report is an interesting set of notes from the different perspectives of the multiple players involved. It discusses current practices and incentives to do things differently.

In a follow-on activity to the LC report, R2 Consulting are mapping the flow of MARC records in North America. The symposium notes say: "This list of distributors is much larger than originally anticipated and consists of a very diverse group of entities."

And, as I discussed the other day, the Research Information Network has published a report about UK practices, Creating catalogues: bibliographic records in a networked world [Splash page; pdf], which also recommends greater re-use of records across the publishing and library worlds.

So, there certainly seems to be a convergence of interest here. Indeed, the potential benefits of such sharing have been a topic of discussion for many years. For example, at the OCLC Symposium, Brian Green, Executive Director of the International ISBN Agency, and I reminisced about UK initiatives we had been party to almost, gulp, twenty years ago, which tried to create the conditions for an 'all-through' system of bibliographic record exchange between the various players in the book world.

Now, clearly quite a lot has happened and, as R2 reported above, data flows through many parties. And publisher data does flow into CIP, and into various organizations which support libraries. Amazon has done much to underline the importance to publishers of having book metadata to support a variety of operations. That said, the renewed emphasis on publisher-library data flow, certainly from the library side, suggests that much more might be done.

Why has more not happened to promote the flow of metadata through the system, from publishers to libraries? Three things occur ...

First, there is the mechanical issue of data exchange. ONIX has now emerged as a shared approach to disseminating publisher data. However, it is interesting reading the remarks about ONIX in the report of the OCLC Symposium. NetLibrary reports that 10% of publishers supply data in ONIX, representing 50% of the supplied content. NLM also reported that 10% of publishers supply ONIX, but that these account for 80% of materials catalogued at NLM. There were also many comments about the variable consistency of ONIX data. However, one would expect improved technical apparatus to support data flow, not to create the need for it.
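For readers unfamiliar with what the mechanics of this exchange involve, here is a minimal sketch in Python. It parses a small ONIX 2.1-style fragment and maps a few elements onto rough MARC 21 counterparts (ISBN to 020 $a, main entry name to 100 $a, title to 245 $a). The fragment, the ISBN value, and the field choices are illustrative only, not a complete or authoritative crosswalk.

```python
# Minimal sketch: map a few ONIX 2.1-style elements onto rough MARC 21 fields.
# The record fragment and field choices are illustrative, not a full crosswalk.
import xml.etree.ElementTree as ET

onix_fragment = """
<Product>
  <RecordReference>com.example.0001</RecordReference>
  <ProductIdentifier>
    <ProductIDType>15</ProductIDType>  <!-- code 15 = ISBN-13 -->
    <IDValue>9780000000002</IDValue>   <!-- illustrative ISBN -->
  </ProductIdentifier>
  <Title>
    <TitleType>01</TitleType>  <!-- code 01 = distinctive title -->
    <TitleText>Bibliographic Records in a Networked World</TitleText>
  </Title>
  <Contributor>
    <ContributorRole>A01</ContributorRole>  <!-- code A01 = author -->
    <PersonName>Jane Example</PersonName>
  </Contributor>
</Product>
"""

def onix_to_marc(xml_text):
    """Return a dict keyed by MARC field/subfield for a single ONIX Product."""
    product = ET.fromstring(xml_text)
    marc = {}
    isbn = product.findtext("ProductIdentifier/IDValue")
    if isbn:
        marc["020 $a"] = isbn
    name = product.findtext("Contributor/PersonName")
    if name:
        # Note: ONIX PersonName is in direct order; a real crosswalk would
        # need the inverted form ("Example, Jane") expected in MARC 100.
        marc["100 $a"] = name
    title = product.findtext("Title/TitleText")
    if title:
        marc["245 $a"] = title
    return marc

print(onix_to_marc(onix_fragment))
```

Even this toy example hints at where the effort really lies: carrying the element across is easy, but deciding whether its content (name order, capitalization, title conventions) is appropriate for the receiving environment is the harder problem.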

This prompts the second question: what incentives exist, and are they aligned across the system? Historically, metadata may have been created for different purposes. Publishers had an interest in the supply chain, and libraries an interest in inventory control. There may be a shared interest in discovery, but it has been approached differently in each area. In fact, one library interest is a recognition that more descriptive material (tables of contents, summaries, etc.) is very useful for users of their catalogs and other systems, even though libraries have not historically made it a part of their catalog data. There may also be an interest in getting basic descriptive data earlier, to allow more time to be spent on other parts of record creation. What incentives exist for publishers to make data available to libraries? Amazon, and other agents in the supply chain, provide an incentive to make appropriate metadata available to support discovery and sales. Data is supplied for CIP purposes. Are there additional incentives? One may be to have enriched metadata flow back to publishers. Are there incentives here which are strong enough for a framework to emerge within which there is greater flow?

And third, related to this, and probably most important, is that the incentives on either side have not been strong enough to encourage organizations to develop services in this area which would make the flow a reality.

Of course, the reason that OCLC hosted the Symposium mentioned above is that it is now looking at whether it is sensible to begin providing such services. It is doing this in its 'next generation cataloging' program.

OCLC has launched a pilot project to explore upstream metadata capture and enhancement using publisher and vendor ONIX metadata. Pilot partners from the publishing, vendor and library communities are assisting us in this effort. We hope the pilot will result in ongoing processes for the early addition of new title metadata to WorldCat and enhanced quality and consistency in upstream title metadata used by multiple channels. [Next generation cataloging]

Update: In response to query, see here for more information about how OCLC can work with publishers and here for how OCLC works with book vendors to deliver cataloging data.


Postscript: The conversation with Brian Green prompted me to look up various pieces I wrote at the time which reflected some of the discussion we remembered. (I note that while I have difficulty opening Word files from that time, the RTF file is still readable.)

  • Publishers and libraries: an all-through system for bibliographic data?
    International Cataloguing and Bibliographic Control, 20 (3), July/September 1991, 37-41.
    RTF: http://www.ukoln.ac.uk/services/papers/ukoln/dempsey-1991-01/ubcim.rtf
  • Users' requirements of bibliographic records: publishers, booksellers, librarians.
    ASLIB Proceedings, 42 (2), February 1990, 61-69. [Worldcat.org]
  • Bibliographic records: use of data elements in the book world. Bath: Bath University Library, 1989. ISBN 0861970853 [Worldcat.org]

Comments: 3

Jun 15, 2009
Hugh Taylor

Another report worth keeping sight of is Celia Burton's "Onix for Libraries" (2001)
http://www.bic.org.uk/files/pdfs/onixlibrep.pdf

But one of the problems in any discussion about the value of ONIX in the continuum of data flow is that, like MARC, it's principally a transmission standard. Like MARC it includes a number of coded lists that ensure consistency of content, but generally it's a data carrier - no more, no less. Mapping ONIX elements to MARC is one thing; ensuring that the content that's being mapped is appropriate for use in a different environment is quite another.

Jun 15, 2009
Jonathan Rochkind

An important topic. So, as you note, one of the current biggest incentives for publishers to supply relatively good metadata is Amazon. And I don't mean Amazon as synecdoche for things like it -- I mean Amazon itself, singularly, due to its giant market position.

So publishers supply metadata to Amazon, in some format, but don't necessarily supply it in a format or through a mechanism available to anyone else.

So, it's worth thinking about -- can the library chain of communication get this metadata _from Amazon_ in any useful way? There may not be sufficient metadata in Amazon to support library needs -- but is some better than none? One could theoretically get metadata from Amazon for free through Amazon's API, although Amazon's terms of service for the API make this use somewhat questionable.

I'm not very familiar with library acquisitions processes, but I know many libraries are now actually purchasing books through Amazon, I think through an intentional library program Amazon has? I wonder if Amazon supplies any metadata with these library purchased books, in what formats? I wonder if it would be worthwhile for OCLC to explore making some kind of deal with Amazon where OCLC is explicitly allowed to harvest metadata from Amazon to serve as the seed of richer library metadata.

Jun 15, 2009
Ed Jones

For much of the content it carries, ONIX is more structured than MARC 21, and its content is more formalized. In some ways the recent additions to MARC 21 made to accommodate the needs of the Germanophone cataloging community--whose content is similarly more structured and formalized--may serendipitously facilitate the accommodation of relevant ONIX data without severe degradation.