Metadata ...

Günter has a nice entry on metadata that explores correspondences across the GLAM sectors: libraries, archives and museums. He notes a specific content type in each domain: bibliographic, archival, and material culture, respectively. He then compares the metadata stack for each type of material, using a useful typology: data structure (e.g. MARC), data content (e.g. AACR2), data format (e.g. ISO 2709) and data exchange (e.g. OAI). Check it out for a fuller enumeration of acronyms. Of course, one can add other acronyms along various dimensions ...

Reading the entry prompted several thoughts, largely from a library perspective:

  • Conceptual models. The library community has FRBR; the museum community has the CIDOC Conceptual Reference Model. Each attempts to identify and define the concepts important to a domain and, importantly, the relationships between them: they aim to provide a model of the world of interest, which in turn provides a basis for the design of metadata approaches. Of course, although I say 'world', there are things in the world which are not included: FRBR, for example, identifies some of the concepts and relationships of interest, and not others. (A small sketch of the FRBR entities appears after this list.) Other models have been developed in more specific areas. A couple which are influenced by FRBR are Michael Heaney's work on collections and, more recently, Andy Powell and Julie Allinson's work on the model underlying the E-prints application profile.
    This work uses a combination of FRBR and the DCMI Abstract Model to create a description set for an eprint that is much richer than the traditional flat descriptions normally associated with Dublin Core. The intention is to capture some of the relationships between works, expressions, manifestations, copies and agents. [eFoundations: DC-2006 Special session - ePrints Application Profile]
    INDECS and the work built on it occupy a similar space in the rights world.
  • Abstract model. The Dublin Core Abstract Model is a data model, whose purpose "is to provide a reference model against which particular DC encoding guidelines can be compared, in order to facilitate better mappings and translations between different syntaxes". More broadly, its supporters see it as having application beyond DC, potentially providing a consistent framework for how one groups properties about resources. In a way, it shifts emphasis from particular fixed 'data structures' in the typology above towards constructs like application profiles. (A sketch of its layering of description sets, descriptions and statements also follows this list.)
  • The data structures mentioned by Günter, and other data structures, will typically designate some elements whose values are taken from controlled lists or vocabularies. We are used to thinking about controlled vocabularies for people (e.g. authority files), places (e.g. gazetteers) and things (e.g. subject schemes like LCSH, MeSH, and so on). This is clearly an area of strong shared interest for libraries, archives and museums, even if approaches have diverged. There are other controlled lists. For example, Thom talks about MARC relator terms and codes, where the redundancy he discusses would seem to limit the usefulness of the controlled approach. This is a pity, as relationships between entities are probably among the most useful things that we can record about them, especially as we try to improve navigation, clustering and retrieval in large bibliographic systems. We have lists for languages, countries and so on. ONIX has codelists; indeed, its approach is to 'control' a large part of the data. An advantage of control is predictability, which simplifies design and processing (see the relator-code sketch below). A more permissive or discretionary approach may appear attractive to some, but ultimately may make data less useful and applications harder to build.
  • In the library community, the ISO 2709/MARC/AACR stack is in widespread use but is not universal.
  • Although they are intricately connected, the data structure (MARC), the data content standard (AACR/RDA), and the conceptual model (FRBR) are managed by different governance structures and on different schedules. One might argue that while they are conceptually distinct, in practice they are closely linked and mutually interdependent.
  • At the data structure level, a library may have some interest in MARC, various flavors of Dublin Core, MODS, EAD, and potentially IEEE LOM and ONIX. Given the variety of levels at which this data can diverge, issues of transformation are complex (see the crosswalk sketch below).
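
As promised above, here is a minimal sketch, in Python, of the FRBR Group 1 entities and the relationships between them. The class and attribute names are my own illustrative choices, not part of the FRBR specification:

```python
from dataclasses import dataclass, field
from typing import List

# FRBR Group 1 entities: a Work is realized through Expressions, an
# Expression is embodied in Manifestations, and a Manifestation is
# exemplified by Items. Attribute names here are illustrative only.

@dataclass
class Item:
    identifier: str                 # e.g. a barcode on a physical copy

@dataclass
class Manifestation:
    identifier: str                 # e.g. an ISBN
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    language: str
    form: str                       # e.g. 'text', 'spoken word'
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)

# One work realized in two expressions (original and translation),
# each embodied in a manifestation exemplified by a physical item.
hamlet = Work(title="Hamlet", expressions=[
    Expression("en", "text", [Manifestation("isbn-1", [Item("barcode-1")])]),
    Expression("de", "text", [Manifestation("isbn-2", [Item("barcode-2")])]),
])
```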
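A similarly rough sketch of the DCMI Abstract Model's layering: a description set groups descriptions, each of which groups property/value statements about one resource. The data structures are my own simplification of the DCAM; the example URIs are real Dublin Core and FOAF terms:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Statement:
    property_uri: str                # e.g. a Dublin Core term
    value: str
    value_uri: Optional[str] = None  # a value may be a literal, a URI, or both

@dataclass
class Description:
    resource_uri: str                # the resource being described
    statements: List[Statement] = field(default_factory=list)

@dataclass
class DescriptionSet:
    # A set can describe several related resources at once, which is what
    # lets an eprint description be richer than a flat record.
    descriptions: List[Description] = field(default_factory=list)

eprint = DescriptionSet(descriptions=[
    Description("http://example.org/eprint/1", [
        Statement("http://purl.org/dc/terms/title", "A Paper"),
        Statement("http://purl.org/dc/terms/creator", "Doe, J.",
                  value_uri="http://example.org/person/doe"),
    ]),
    Description("http://example.org/person/doe", [
        Statement("http://xmlns.com/foaf/0.1/name", "Jane Doe"),
    ]),
])
```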
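And a small sketch of why control aids predictability: validating against a controlled list is trivial, while free-text relator values would have to be guessed at. The three codes are from the MARC code list for relators; the helper function is just an illustration:

```python
# A few entries from the MARC code list for relators.
RELATORS = {
    "aut": "Author",
    "edt": "Editor",
    "trl": "Translator",
}

def relator_term(code: str) -> str:
    """Resolve a relator code, failing loudly on uncontrolled values."""
    try:
        return RELATORS[code]
    except KeyError:
        raise ValueError(f"'{code}' is not in the controlled list")

print(relator_term("trl"))  # -> Translator
```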
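Finally, a sketch of why transformation between structures is lossy: a crosswalk maps several distinct MARC fields onto one coarse Dublin Core element, so the reverse mapping is not well defined. The tag-to-element pairs follow the familiar MARC-to-Dublin Core crosswalk; the code around them is illustrative:

```python
# Several MARC tags collapse into a single Dublin Core element,
# which is what makes round-tripping between the two formats lossy.
MARC_TO_DC = {
    "100": "creator",  # main entry, personal name
    "110": "creator",  # main entry, corporate name
    "245": "title",
    "650": "subject",
}

def crosswalk(record: dict) -> dict:
    """Map a {tag: value} MARC-ish record to {dc_element: [values]}."""
    dc: dict = {}
    for tag, value in record.items():
        element = MARC_TO_DC.get(tag)
        if element:                  # tags without a mapping are dropped
            dc.setdefault(element, []).append(value)
    return dc

print(crosswalk({"100": "Shakespeare, William", "245": "Hamlet"}))
```
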
One could go on. Does this all seem a little too complex in our fast-moving world?

I hope that the Working Group on the Future of Bibliographic Control, established by the Library of Congress, considers some of these issues. (Disclosure: I am an at-large representative on the group.)

Note: I have benefited from some discussion with colleagues on these matters and am certainly interested in more general views about the 'future of bibliographic control'.

Comments: 5

Nov 08, 2006
Jonathan Rochkind

I am confused about the difference between a 'conceptual model' like FRBR and an 'abstract model' like the DCMI Abstract Model.

What is the difference between a conceptual model and an abstract model? Does anyone have any useful thoughts there?

I agree strongly that getting our thoughts straight about what models ARE, how to use them, and which ones we need (followed by adopting some consensus on models) is the crucial puzzle to be solved in dealing with contemporary metadata issues, in libraries and out. I've been thinking about this a lot lately. It's a very, very confusing topic, so much abstraction. But that's what makes it tough, of course.

Nov 08, 2006
Jerome McDonough

Two of your points particularly strike home for me. The first is the importance of recording the relationships between entities, and of developing languages to account for them. This extends beyond the realm of traditional descriptive metadata: the various information packaging standards (METS, MPEG-21, XFDU) could also benefit from standards describing relationships between metadata records, and between records and content. Ontologies applicable across all these standards would be a good thing.

The second point is that while formats/rules of description/data models may be conceptually distinct, they are in practice highly interdependent. If you want to see more radical change in bibliographic practice, you can't change just one.

Nov 08, 2006
Alex

Jerome: Exactly! We need to look at the full stack if we're ever going to get out of here alive.

I've done a bit of work creating a joint model of MARC (and the culture of MARC) and FRBR within the Topic Maps Reference Model (a generic frames-based contractual model which can support any abstract or concrete model you like), and also seeing where models like XOBIS fit into this. It certainly creates interesting cross-sections. I'm starting to think we need an abstract layer that can encapsulate the various bits better, done through an ontological layer and a generic system.

Personally I think the whole GLAM shebang (catchy title of some report? :) has for too long relied on putting its IP into formats instead of models, which is why smaller, sane models (such as FRBR) are good in themselves but not good enough for what we want to do.

I also doubt very much we'll have a single vendor helping us out in this respect.

Nov 09, 2006
lorcan dempsey

Jonathan, Pete Johnston, one of the authors of the DCMI Abstract Model, has a post which addresses the difference.

This is at:
http://efoundations.typepad.com/efoundations/2006/11/models_models_e.html

Among other things, he says: "I'm probably glossing over some more complex issues here, but I think the key difference is that these two classes of model are models of different things: the former specifies what types of things are being described, and the latter specifies the nature of the descriptions themselves." (The former is the conceptual or application model, the latter is the abstract model here.)

I am putting this link here as Pete had difficulty leaving a comment.

Nov 14, 2006
Simon Spero

One thing that would be helpful, especially for FRBR-like systems, would be more rigorous definitions of what inference rules can be applied to the semantic networks they define.

For example, it seems that the most reasonable interpretation of FRBR is as a non-monotonic inheritance network with exceptions.

Also, if you're trying to deal with complicated compound works (e.g. the versions of a number of source files used to run a simulation referred to in a published paper), it would be really nice to have much more precise definitions of equivalence for expressions and manifestations. For example, in a digital setting, if two manifestations can automatically be converted into each other, it would be nice to be able to infer that they embody the same expression, and conversely, that if they cannot automatically be converted, they embody different expressions. Unfortunately FRBR is moving in the opposite direction :-(
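
A minimal sketch of the inference this would license, treating mutual automatic convertibility as an equivalence relation. The convertible() predicate is hypothetical, and the toy stand-in below is mine:

```python
from itertools import combinations

def expression_classes(manifestations, convertible):
    """Group manifestations into expression-level equivalence classes,
    reading mutual automatic convertibility as 'embodies the same
    expression'. convertible(a, b) is a hypothetical predicate."""
    parent = {m: m for m in manifestations}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for a, b in combinations(manifestations, 2):
        if convertible(a, b) and convertible(b, a):
            parent[find(a)] = find(b)      # union the two classes

    classes = {}
    for m in manifestations:
        classes.setdefault(find(m), []).append(m)
    return list(classes.values())

def same_family(a: str, b: str) -> bool:
    # Toy stand-in: files of the same format family interconvert.
    return a.split(".")[-1] == b.split(".")[-1]

print(expression_classes(["sim.f90", "sim2.f90", "paper.pdf"], same_family))
# -> [['sim.f90', 'sim2.f90'], ['paper.pdf']]
```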

Ah well, there's always the next one. Viva Mess O' Data?

Simon // F is to FRBR as S is to SOAP