Tim O'Reilly has just made available a suggestive piece about Web 2.0, which provides many points for the library community to ponder. One of these comes back to a concern I raise regularly in these pages: making data work harder. Discussing the use of data, he notes how NavTeq supplies the same map data to many services.
The now hotly contested web mapping arena demonstrates how a failure to understand the importance of owning an application's core data will eventually undercut its competitive position. MapQuest pioneered the web mapping category in 1995, yet when Yahoo!, and then Microsoft, and most recently Google, decided to enter the market, they were easily able to offer a competing application simply by licensing the same data.
Contrast, however, the position of Amazon.com. Like competitors such as Barnesandnoble.com, its original database came from ISBN registry provider R.R. Bowker. But unlike MapQuest, Amazon relentlessly enhanced the data, adding publisher-supplied data such as cover images, table of contents, index, and sample material. Even more importantly, they harnessed their users to annotate the data, such that after ten years, Amazon, not Bowker, is the primary source for bibliographic data on books, a reference source for scholars and librarians as well as consumers. Amazon also introduced their own proprietary identifier, the ASIN, which corresponds to the ISBN where one is present, and creates an equivalent namespace for products without one. Effectively, Amazon "embraced and extended" their data suppliers. [O'Reilly: What Is Web 2.0]

Libraries have invested a great deal in bibliographic data - yet it has remained somewhat inert in our catalogs, failing to release the value of that investment. We have been working to address that more recently by trying to make the data work more effectively on the open web through Open WorldCat, by developing work-based approaches with FRBR, and by creating interfaces which better exploit the available data. We will also begin to capture user-contributed data this weekend as we add wiki capabilities to Open WorldCat. (Examples and screenshots can be found in a Members' Council presentation on making data work harder [ppt].)
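O'Reilly's point about the ASIN is worth dwelling on: Amazon reused the ISBN where one existed, and minted its own identifier in the same namespace where one did not, so every product became uniformly addressable. A minimal sketch of that pattern follows; the function name and the 'B'-prefixed format are illustrative assumptions, not Amazon's actual scheme.

```python
# Sketch of an ASIN-style identifier scheme: reuse the ISBN when a
# product has one, otherwise mint a synthetic identifier in the same
# namespace. The 'B' prefix and field names are hypothetical.
import itertools

_counter = itertools.count(1)

def assign_asin(product):
    """Return an ASIN-like identifier for a product dict.

    Products with an ISBN keep it (normalized); others get a
    synthetic 'B'-prefixed identifier.
    """
    isbn = product.get("isbn")
    if isbn:
        return isbn.replace("-", "")
    return "B{:09d}".format(next(_counter))

book = {"title": "What Is Web 2.0", "isbn": "0-596-00000-0"}
gadget = {"title": "MP3 player"}  # no ISBN

print(assign_asin(book))    # ISBN-derived identifier
print(assign_asin(gadget))  # synthetic identifier in the same namespace
```

The design point is the single namespace: downstream systems never need to know whether an item had an ISBN, which is what lets the richer catalog subsume, rather than merely mirror, the supplier's data.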
Electronic catalogs, wherever you go in the academic world, have become a horrible crazy-quilt assemblage of incompatible interfaces and vendor-constrained listings. Working through Tripod's article and specialized subject indices, in a relatively small collection, you still have to navigate at least five completely different interfaces for searching. Historical epochs of data collection and cataloguing lie indigestibly atop one another. The Library of Congress subject headings, which long ago slid into uselessness, now actively misrepresent existing nodes and clusters of knowledge in many academic fields. [Burn the Catalog]

In thinking about this, I keep coming back to three things:
- Available data needs to work harder: it is not good enough to talk about the value of structured data if that structure is not exercised to provide value in a user interface. Some examples were given above.
- Put data where the user is. Sometimes the user will come to the catalog; at other times the catalog needs to go to the user - by exposing metadata in search engines, in RSS feeds, and through other forms of data export. In fact, the catalog is often hidden: a user may have to click through several screens to reach it. A growing number of libraries have a catalog search box on their home page, which is useful.
- Consolidation. The catalog is part of a fragmented library resource, which exerts low gravitational pull. In the longer run, I wonder whether a larger part of the traffic to the catalog will be generated by linking from larger consolidated resources - whether union catalog resources such as RedLightGreen or Open WorldCat, Google Scholar, or maybe even Amazon. In time, some of these will allow local views of the data.