Metadata sources

A while ago, I suggested that it was interesting to think about four sources of metadata in our systems and services:

  1. Professional. Produced by staff in support of particular business aims. Think of cataloging, or data produced within the book industry, or A&I data.
  2. Crowdsourced. Produced by users of systems. Think of tags, reviews and ratings on consumer sites.
  3. Programmatically promoted. Think of automatic extraction of metadata from digital files, automatic classification, entity identification, and so on.
  4. Intentional. Data about choices and transactions which support analytics or business intelligence services. Think about ranking, relating, recommending in consumer sites (e.g. people who like this also like this) based on collected transaction data.

We were discussing these types of data in a meeting at work the other day, and it occurred to me that what I call here crowdsourced, programmatically promoted and intentional data are all ways of managing abundance. Our model to date has been a 'professional' one, where metadata is manually created by trained staff. This model may not scale very well with large volumes of digital material. Nor does it necessarily anticipate the variety of ways in which resources might be related. The other sources will become increasingly important ....

Comments: 4

Sep 21, 2009
James Pakala

These are helpful indeed, and the professional cataloger also grows in importance, at least for many of us. There's only "Perseverance (Theology)" to get works like ARE THE SHEEP SECURE? that, for a variety of reasons including publisher, completely elude keyword but mix all over the place with law enforcement and wool production. And then there's student naivete and scholar frustration when name and series authorities are lacking. Whew.

Sep 21, 2009
Matthew Beacom

There are a few ways to manage the abundance that currently overwhelms the professional source model for metadata. The most important and powerful is to create the metadata when the object it references is created. Potentially, producers can create metadata for the material they create at the same scale of abundance. What's needed are tools that make it possible to operationalize the potential. Some are coming or are here now: id.loc.gov is one example. The relationships (between upstream producers and downstream users like booksellers, libraries, etc.) will grow alongside the tools, but creating and maintaining trusted relationships needs a conscious application of effort.

Sep 21, 2009
bowerbird

that just occurred to you "the other day"?

really? just the other day?

hasn't anybody suggested this to you before?

haven't you ever read that, from any source?

i'm surprised, to the point of being astounded.

-bowerbird

Sep 22, 2009
Peter Findlay

When we developed one of our services we had a very clear message from users (during lab-based tests) that they wanted to see a visible distinction between the professional and crowd-generated metadata. They wanted both to be present, but still felt it important to know the 'true' provenance of the object. This has proven interesting when the crowd knows the provenance (e.g. this is a recording of my father made in 1956 in such and such a place) and we don't.