All that is solid melts into flows ....

Like most people ;-), I tend to think about metadata as 'schematized statements about resources': schematized because machine understandable; statements because they involve a claim about the resource by a particular agent; resource because any identifiable object may have metadata associated with it.
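
To make 'schematized statements about resources' a little more concrete, here is a minimal sketch in Python. The class name, field names, identifiers and sample values are all illustrative assumptions, not taken from any particular standard; the point is only that each statement names a resource, the agent making the claim, and the schema that makes the claim machine understandable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One claim about one resource, made by one agent (all names hypothetical)."""
    resource: str  # identifier of the thing being described
    agent: str     # who is making the claim
    schema: str    # vocabulary that gives the property its meaning
    prop: str      # property name within that vocabulary
    value: str     # the claimed value


def claims_about(resource, statements):
    """Return every statement made about the given resource."""
    return [s for s in statements if s.resource == resource]


def values_by_agent(resource, prop, statements):
    """Map each agent to the value it claims for one property of a resource."""
    return {
        s.agent: s.value
        for s in claims_about(resource, statements)
        if s.prop == prop
    }


# Invented sample data: two agents describe the same resource differently.
STATEMENTS = [
    Statement("urn:example:book-1", "library-a", "dc", "title", "Ulysses"),
    Statement("urn:example:book-1", "publisher-b", "dc", "title", "Ulysses: a novel"),
    Statement("urn:example:map-7", "library-a", "dc", "title", "Map of Dublin"),
]
```

Keeping the agent on each statement means that two parties can make different claims about the same resource without one overwriting the other.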

Metadata is useful because it relieves a potential user (person or program) of having to have full advance knowledge of the characteristics or existence of a resource. In other words, metadata provides 'intelligence' which supports more efficient operations on resources. Examples of operations are discovery, preservation, purchase, reformatting, embedding, analysis, extraction of components, and so on.

Now, I say this by way of introduction because much of our metadata discussion still focuses on refining descriptive metadata for information objects. However, it is clear that as we move into more complex digital environments, this is only one part of the metadata picture. Libraries have developed practices which focus on the inventory needs of relatively 'solid' information resources (books, journals, ...). But all that is solid is melting into flows ... We need more types of metadata than just descriptive, and we need to represent more entities in our world than 'solid' information objects.

In the network world, at least four things have changed.

  1. Information objects have become fluid. They can flow between different environments of use more readily and they can be mixed recombinantly in new forms. To take a conservative example, think about the potential impact of the Google Print/Libraries initiative on our current bibliographic apparatus. We will have print originals, Google digital copies, library digital copies. Technical and rights metadata will come into play with the digital copies. But the germane issue here is how we will relate these various instances within our current bibliographic apparatus. How will this articulate with emerging FRBR practices? And looking beyond this, how do we manage the decomposition of these objects and their recombination in multiple content packages (e-portfolio, exhibition, courseware, and so on)?
  2. As more business processes are moved into applications, we need to manage data about many more business entities. Think about emerging e-resource management systems for example, where we need to manage information objects, but also licenses, policies and a range of other data. See the ERMI initiative, for example, as one place where these issues are being addressed.
  3. We need to manage interactions between these entities. This raises issues of rights and tracking, among others. So, for example, the Counter initiative is looking at how we manage and share usage data.
  4. We need to be able to programmatically derive more metadata, whether this is resource metadata promoted from digital resources themselves, or usage and tracking data collected from interactions within a digital environment, or data captured from users. Think of how Amazon reflexively adapts to your use of it, based on the data about use and usage it collects.

In turn, here are some issues that these directions suggest, presented in no particular order:

  • Multiple business entities. Here are some of the entities that we need to model within our systems: users, rights, licenses, policies, services, 'complex' information objects, 'simple' information objects, organizations. Where it makes sense we need to borrow from the broader community. With limited resources, we should develop our own approaches only where no suitable ones already exist.
  • Abstraction and models. The liquidity of resources and the multiple entities involved in our activities suggest the benefits of some abstraction and modelling if we are to be able to build viable digital information environments. See, for example, the entity-relationship model advanced as part of the ERMI work above, whose purpose is to help clarify what needs to be modelled within e-resource management systems. See the entity-relationship model presented by Michael Heaney in his discussion of collection level description, whose purpose is to help clarify what needs to be modelled in a collection description schema. See the model in PREMIS. See FRBR. See INDECS. See, no doubt, multiple other initiatives. What is the appropriate level of engagement between these activities and where does it happen?
  • Rights, policies, licences. These all become more important in a liquid world. We tend to think of rights as a way of locking commercial resources down. But increasingly we want to be able to say something (make 'statements') about appropriate uses of any resource. This is especially so as resources flow recombinantly between many parties and packages. As more interactions are automated, we also need to encapsulate, in machine-readable form, the 'intelligence' which guides decisions. Policy and license data potentially become more important. This data is becoming available in digital form, but for human inspection only. It needs to be 'schematized' for machines to make use of it without human intervention.
  • All of this raises the importance of modeling and representing events, which I speak about elsewhere.
  • We have made some progress with automatic promotion of metadata from resources. We need to do more, especially as our existing manual processes do not scale very well. Much existing metadata creation for digital resources does not look sustainable unless more cost is taken out of the process.
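
As one illustration of what 'schematized' license data might look like, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the field names, the user groups, the use categories and the resource identifier are invented and do not come from ERMI or any rights expression language. It shows only the general idea that once terms are recorded as data, a program can answer a simple permission question without a person reading the license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical machine-readable summary of one license."""
    resource: str                # identifier of the licensed resource
    licensor: str                # agent granting the license
    permitted_uses: frozenset    # e.g. "view", "print", "course-pack"
    permitted_groups: frozenset  # user groups covered by the license


def is_permitted(terms, user_group, use):
    """True only if both the user group and the use are covered by the terms."""
    return user_group in terms.permitted_groups and use in terms.permitted_uses


# Invented sample terms for an invented e-journal.
TERMS = LicenseTerms(
    resource="urn:example:ejournal-42",
    licensor="example-publisher",
    permitted_uses=frozenset({"view", "print", "course-pack"}),
    permitted_groups=frozenset({"staff", "student"}),
)
```

A real license carries conditions, exceptions and obligations that a pair of sets cannot express; the sketch deliberately covers only the simplest yes/no case.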

This note is prompted by discussion about the protocols entry I did below, where I suggest that we would benefit from focusing on a small number of simple protocols and building services from those. However, metadata presents us with more challenges, which make it less easy to suggest where the 'simple enough' balance is.

We should not be adding cost and complexity, which is what tends to happen when development is through multiple consensus-making channels which respond to the imperatives of only a part of the service environment. This is especially so as libraries work hard to demonstrate value in changing times. The Blue Ribbon Panel, set up to review NISO strategy, suggested that NISO needs to develop a framework within which to establish gaps and direction. Perhaps this issue is something which might form part of their deliberations.

Comments: 2

Jun 01, 2005
Yan Han

One of the metadata issues is that there are multiple standards and best practices (MARC, DC, EAD, IEEE/LOM, SCORM, etc.), which in turn creates complexity in managing digital collections. No single digital library system can handle all of these standards. I think one major area of research (and perhaps some people are already working on it) should be to facilitate greater interoperability for libraries, special collections and museums.

May 21, 2007
K.G. Schneider

"Like most people ;-)"

Best. emoticon. use. evah.