Circulating intentional data
November 27, 2005 • Categories: Featured, Libraries - organization and services, Marketing, Metadata, OCLC

I have posted a couple of times recently about intentional data, data that records choices and behaviors. I mentioned holdings data, ILL records, circulation records, and database usage records. One could extend this list to any data which records an interaction or choice. We are used to looking at transaction logs of various sorts, and new forms of data are emerging, for example, in the form of questions asked in virtual reference. What types of intelligence could be mined from a comparison of the subject profiles of virtual reference questions to the subject profile of collections? Would it expose gaps in the collection, for example?

In that context I was interested to read a post on the Gordian Knot pointing to some work by David Pattern at the University of Huddersfield which shows a 'people who borrowed this also borrowed ...' feature. And it does look like a good enhancement. (It does not seem to be available on the 'publicly visible' catalogue.)

Circulation is interesting in this context. We run into a long tail sort of a thing. Amazon is the primary exemplar of this type of 'recommender' service. Amazon aggregates supply (it has a very big database of potential hits in the context of any query, increasing the chances that a person will find something of interest), and it aggregates demand (it is a major gravitational hub on the network, so it assembles lots of eyeballs, increasing the chances that any one book will be found by an interested person). The result of this - the aggregation of supply and the aggregation of demand - is that use is driven down the long tail. More materials are aggregated, and more of them find an audience.

Now, we know that, typically, the smaller part of a library collection circulates (maybe less than 20% in a research library).
We also know that, typically, interlibrary lending traffic is very, very much smaller than circulation. What does this suggest? Well, the former suggests that we have an excess of supply over demand in any library, and we have indeed built 'just in case' collections. However, aggregating demand should make those collections more used, and this appears to be the case in services like OhioLINK, for example, which have aggregated demand for institutional collections at the state-wide level, increasing the chances that an item will be found by an interested reader.

The latter suggests that we have not aggregated supply across libraries in a systemwide way very efficiently, as library users do not very often go beyond their local collection. There are various reasons for this, including library policy in what is made available, but in general one might say that the transaction costs of discovering, locating, requesting and having delivered resources are high enough to inhibit use. Again, this suggests that we have not aggregated supply as effectively as we might in systemwide situations (this was the focus of another post).

Coming back to recommendations based on circulation, two things occur to me:
It is clear that we will see services emerge in the library space which are based on the standardization, consolidation, and syndication of 'intentional' data. We may also see greater systems support for the collection and mining of particular forms of local data. These will supply 'intelligence' to support richer user experiences and better management decisions. Compare how services can already access Amazon's data in this way (see for example the liveplasma service built on top of Amazon data). As we extend the ways in which users can discover materials, it puts additional emphasis on the need to improve our systemwide apparatus for delivering those materials. Making data work harder is an integral part of the Web 2.0 discussions, and we certainly have a lot of data to do things with!
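To make the 'people who borrowed this also borrowed ...' idea concrete, here is a minimal sketch of how such suggestions might be derived from circulation records by counting how often two items were borrowed by the same person. This is an illustration only, not a description of how the Huddersfield catalogue does it: the `(borrower_id, item_id)` record shape, the function names, and the sample data are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_borrow_counts(loans):
    """Count, for each pair of items, how many borrowers took out both.

    `loans` is an iterable of (borrower_id, item_id) pairs. This record
    shape is an assumption for the sketch, not a real ILS export format.
    """
    items_by_borrower = defaultdict(set)
    for borrower, item in loans:
        # A set ignores repeat loans of the same item by the same person.
        items_by_borrower[borrower].add(item)

    co_counts = defaultdict(Counter)
    for items in items_by_borrower.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_borrowed(co_counts, item, top_n=3):
    """Return up to `top_n` items most often co-borrowed with `item`."""
    ranked = sorted(
        co_counts.get(item, {}).items(),
        key=lambda pair: (-pair[1], pair[0]),  # most shared borrowers first
    )
    return [other for other, _ in ranked[:top_n]]


# Hypothetical, anonymised sample data.
SAMPLE_LOANS = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "A"), ("u3", "B"), ("u3", "D"),
]

if __name__ == "__main__":
    counts = build_co_borrow_counts(SAMPLE_LOANS)
    print(also_borrowed(counts, "A"))
```

The long tail point applies here too: an item with few loans has few co-borrowers, so pooling circulation data across many libraries would give such items a better chance of appearing in suggestions. A production service would also need to consider borrower privacy before retaining loan histories.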
2 comments so far
I can't keep myself from pointing at the work that the Digital Library Research and Prototyping Team of the Los Alamos National Laboratory has been doing over the past years in mining Digital Library usage data. Quite intriguing results of that work were recently published in:
Bollen, Johan, Herbert Van de Sompel, Joan Smith, Rick Luce. Towards alternative metrics of journal impact: a comparison of download and citation data. 2005. Information Processing & Management. Preprint at arXiv:cs.DL/0503007. Information Processing & Management paper at doi:10.1016/j.ipm.2005.03.024.
Over the past year or so, this work has extended into the realm of federating usage logs across environments. We will report on that work at the upcoming CNI Task Force meeting. The abstract of our presentation is available from the CNI Web site.
Hi Lorcan
Just to let you know that the suggestions are live on our test OPAC - e.g.
http://wwwlibrarycat.hud.ac.uk/bib.html?285279
regards
Dave Pattern