Repository frameworks

Herbert Van de Sompel and colleagues at LANL have been writing about the aDORe architecture for a while (see [pdf] for example). They have now released software to implement an aDORe archive.

The aDORe Archive is a write-once/read-many storage approach for Digital Objects and their constituent datastreams. The approach combines two interconnected file-based storage mechanisms that are made accessible in a protocol-based manner. First, XML-based representations of multiple Digital Objects are concatenated into a single, valid XML file named an XMLtape. The creation of indexes for both the identifier and the creation datetime of the XML-based representation of the Digital Objects, facilitates OAI-PMH-based access. Second, ARC files, as introduced by the Internet Archive, are used to contain the constituent datastreams of the Digital Objects in a concatenated manner. An index for the identifier of the datastream facilitates OpenURL-based access. The interconnection between an XMLtape and its associated ARC file(s) is provided by conveying the identifiers of these ARC files as administrative information in the XMLtape, and by including OpenURL references to constituent datastreams of a Digital Object in the XML-based representation of that Digital Object stored in the XMLtape. [aDORe Archive – Overview]
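
To make the XMLtape idea in the overview above a little more concrete, here is a minimal sketch in Python. It is not the aDORe software or its file format: the `<tape>` wrapper element, the function names, and the in-memory indexes are all assumptions made for illustration. It only shows the general technique of concatenating XML records into one well-formed file while keeping an identifier index (for direct lookup) and a datetime index (for OAI-PMH-style incremental harvesting).

```python
from bisect import bisect_left


def build_tape(records):
    """Concatenate XML fragments into one document and index them.

    records: iterable of (identifier, datestamp, xml_fragment) tuples.
    The <tape> wrapper is a hypothetical container element, not the
    real XMLtape schema.
    """
    header = b'<?xml version="1.0" encoding="UTF-8"?>\n<tape>\n'
    chunks = [header]
    offset = len(header)
    id_index = {}      # identifier -> (byte offset, byte length)
    date_index = []    # (datestamp, identifier), sorted after the loop
    for identifier, datestamp, fragment in records:
        data = fragment.encode("utf-8") + b"\n"
        id_index[identifier] = (offset, len(data))
        date_index.append((datestamp, identifier))
        chunks.append(data)
        offset += len(data)
    chunks.append(b"</tape>\n")
    date_index.sort()  # ISO 8601 UTC strings sort chronologically
    return b"".join(chunks), id_index, date_index


def get_record(tape, id_index, identifier):
    """Fetch one record by identifier using its byte offset."""
    offset, length = id_index[identifier]
    return tape[offset:offset + length].decode("utf-8").rstrip("\n")


def list_since(date_index, from_date):
    """Return identifiers whose datestamp is at or after from_date."""
    start = bisect_left(date_index, (from_date, ""))
    return [identifier for _, identifier in date_index[start:]]
```

Because the tape is only ever appended to and never rewritten, the byte offsets in the identifier index stay valid, which is what makes a write-once/read-many layout of this kind cheap to serve.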

I am pleased to see some of our Open Source Software as part of the included third-party libraries (in particular for OpenURL and OAI support – see here, here, and here).
We still have no agreed protocol-based ‘interface layer’ for repositories. This has two parts: which core services should be supported, and which approach is best for each service. Each repository has its own way of interacting with users and user applications. The issue is complicated because we increasingly want such interface layers to work across domains. Think for example of a campus environment which has an institutional repository and a learning object repository, or various departmental repositories (of drawings, of slides, of data-sets, whatever). We want consistent ways of interacting with (at least some of) these. Think of a simple scenario. Many people create documents within Microsoft applications. It would be nice to be able to build a simple application on top of, say, the Research Pane (or whatever succeeds it in upcoming versions of Windows) which communicates with one or more repositories, so that deposit, retrieval, and so on become part of the user’s routine desktop workflow.
We have candidate protocols for common interfaces:
get: OpenURL
put: SRU/W update
harvest: OAI-PMH
search: SRU/W
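
As a rough illustration of what three of these interfaces look like on the wire, here is a small Python sketch that builds request URLs for get, harvest, and search. The base URLs and function names are hypothetical. The query parameters are the standard ones as I understand them: `url_ver` and `rft_id` from OpenURL 1.0 (Z39.88-2004) in key/encoded-value form, `verb`, `metadataPrefix`, and `from` from OAI-PMH, and `operation`, `version`, `query`, and `maximumRecords` from SRU 1.1. The put/update case is left out because I am less sure of its request shape.

```python
from urllib.parse import urlencode


def openurl_get(base_url, identifier):
    """get: OpenURL 1.0 request that passes the referent by identifier."""
    params = {"url_ver": "Z39.88-2004", "rft_id": identifier}
    return base_url + "?" + urlencode(params)


def oai_harvest(base_url, metadata_prefix="oai_dc", from_date=None):
    """harvest: OAI-PMH ListRecords, optionally limited by datestamp."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def sru_search(base_url, cql_query, maximum_records=10):
    """search: SRU searchRetrieve with a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    print(openurl_get("http://repo.example.org/resolver", "info:doi/10.1000/xyz"))
    print(oai_harvest("http://repo.example.org/oai", from_date="2006-01-01"))
    print(sru_search("http://repo.example.org/sru", 'dc.title = "repository"'))
```

All three reduce to an HTTP GET with a handful of parameters, which is part of the appeal: a desktop application of the kind described above would need very little repository-specific code to use them.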
An interesting question then becomes how persuasive these approaches are within our community, and maybe more importantly, outside it, to others who are building repositories and applications which interact with them.

3 thoughts on “Repository frameworks”

  1. Thanks for blogging our work, Lorcan.

    The get and harvest protocol interfaces you list are exactly the ones I proposed in several presentations I did over the past months. They are combined with an XML-based complex object representation of digital objects contained in repositories (MPEG-21 DIDL, METS, …). In those presentations, I did not introduce a search interface, because I personally regard search as a service overlay to one or more repositories, not necessarily a core service interface provided by each repository.

  2. Imho they’re very persuasive within our community, but not at all outside it. The kind of common interface you are describing has to be much simpler to gain any sort of real broad adoption. This is why some of us have been working on “unAPI”, which basically tries to pull out some 80% use cases into a five-minutes-to-implement spec. -Dan

  3. I think it is time to call for standardization of layered services across digital repositories and learning object repositories, much as the OAIS model defines a reference model for preservation.
    Almost all institutions have at least one or two digital repositories. Digital assets need well-defined (preferably standardized) ways to be acquired, processed, maintained, and preserved. In the long run, standardization will reduce libraries’ cost and stress in selecting software (e.g. DSpace vs. Fedora), building crosswalks, and interacting with other companies (Yahoo, Google).
