I have written on and off about 'intentional' data: data recording user preferences or choices. Such data has a variety of uses in our domain: we are all familiar with Amazon's 'people who bought this also bought this' feature. One of the major lessons of Google is how important such data is to improving the retrieval experience. The PageRank algorithm uses 'intentional' data (the choices made by people in linking to other sites) to inform the order in which results are returned. One of the reasons I like FictionFinder is that it uses holdings data to rank results to similar effect. In this case, purchasing choices made by libraries influence the ranking, and it works well. And we are familiar with the use of citation data in broader scholarly discussion and assessment.

In general, consumer sites on the web make major use of such data, and it is especially valuable when they can connect it to individual identities. They use it to build up user profiles, to rate and compare across sites, to recommend, and so on. Of course this is increasingly important in an environment of abundant choice and scarce attention: they are investing more effort in 'consumption management'. We are all familiar with the benefits, and the irritations, of organizations that want to build a deeper understanding of what we do and make us offers based on that.

Libraries have a lot of data about users and usage, and there are now some initiatives which are looking at sharing it. However, in general, libraries do not have a data-driven understanding of individual users' behaviors, or of systemwide performance of particular information resources. This is likely to change in coming years given the value of such data. We are already seeing growing interest in sharing database usage data, and technical agreements and business incentives for third party providers will support this development. At the same time, of course, libraries want to preserve the privacy of learning and research choices.

We are also seeing more research into the usefulness of usage data, and I am thinking in particular here of the MESUR project:

The project's major objective is enriching the toolkit used for the assessment of the impact of scholarly communication items, and hence of scholars, with metrics that derive from usage data. The project will start with the creation of a semantic model of scholarly communication, and an associated large-scale semantic store that relates a range of scholarly bibliographic, citation and usage data obtained from a variety of sources. Next, an investigation into the definition and validation of usage-based metrics will be conducted on the basis of this comprehensive collection. Finally, the defined metrics will be cross-validated, resulting in the formulation of guidelines and recommendations for future applications of metrics derived from scholarly usage data. [MEtrics from Scholarly Usage of Resources - Los Alamos National Laboratory]

In the context of this discussion, I was interested recently to come across a paper on 'Emergent Knowledge' by Chunka Mui [available for fee on Amazon]. As more of what we do moves into a network environment, the amount of data that we shed grows: data about behaviors and choices, and other data. Mui talks about how this data can be gathered and mined to create 'emergent knowledge'. He presents this taxonomy of emergent knowledge:
  1. Identity. People and objects increasingly reveal their identity to systems and services, enabling better tracking and profiling. We are familiar with the use of transaction data where we can connect identities and track behaviors.
  2. Location. Connecting identities to locations is generating value in many service areas. Geo-positioning and geo-locator services are growing.
  3. Health and diagnostics. Remote monitoring and diagnostics.
  4. Preferences. The ability to connect identities (of people and objects) through transactions, and potentially at particular locations, provides many opportunities to mine data as discussed above.
  5. Quality of service. Mui gives the example of how the Hartford Insurance Company actually analyses the recordings it has of telephone transactions, connecting that with outcome and process information to create a cycle of learning and improvement. (Think virtual reference ...).
Much of what I am talking about above relates to Identity and Preferences in this taxonomy. And incidentally this type of application is one more reason why it would be good to be able to identify unambiguously the range of resources of interest to libraries.

I am prompted to caricature those portentous lines of Eliot from The Rock often raised in library conversation (where is the knowledge we have lost in information, etc). We might well ask ourselves where is the data we have lost in information management, and the knowledge we have forsaken thereby.

Comments: 1

Jan 07, 2007
Mark

This post was a Ringmaster's Pick for the 62nd Carnival of the Infosciences, 7 January 2007. This installment of the Carnival may be found at: http://marklindner.info/blog/2007/01/07/carnival-of-the-infosciences-62/