May 02, 2008
Categories: Libraries - distributed environments; OCLC; Research, learning and scholarly communication
Some items of possible interest which were in a little email pile waiting for attention...
Arrow
An Australian colleague alerted me to the redesign of the Arrow Discovery Service. Arrow aggregates access to Australian research repositories.
Welcome to the ARROW Discovery Service - where you can search 143,582 Australian research outputs, including theses; preprints; postprints; journal articles; book chapters; music recordings and pictures.
The ARROW Discovery Service searches simultaneously across the contents of Australian university research repositories. The list of currently participating universities, and the number of outputs currently in each repository, is listed at the left. [Arrow]
The search box is complemented by tag cloud access. Results can be filtered by facets, including institutional facets. Alerts can be set (although there are no RSS feeds, as I notice Roddy MacLeod pointed out somewhere).
Catalog Widget
The Information Resource Centre (IRC) at Jacobs University, Bremen, has produced a catalog widget, jOPAC, as part of its broader initiative to produce a range of 'Web 2 tools'.
The IRC has started developing Web 2.0 tools. Because we want to be able to deliver digital (library and multimedia) services at the point of need, where our patrons are. And because we want to enhance our services by mashing them up with other available services out there on the web. [Web 2.0 Tools - Teamwork at Jacobs University]
They are using the Universal Widget API from Netvibes:
Using the UAW API allows easy implementation within various platforms, such as iGoogle, Macintosh, Vista, Yahoo Dashboard, and various others. This way, any developed tool can easily integrate within any supported platform - some of which you might already use! [Web 2.0 Tools - Teamwork at Jacobs University]
See a jOPAC demo here.
I was interested to see the University's Confluence-based wiki infrastructure, of which the pages above are part. Also interesting is the IRC's dedicated focus on such tools.
Linking from Wageningen
As linking between systems becomes more important, so does our interest in identifiers, and in mappings between identifiers. Here is an example from Wouter Gerritsma:
Previously I announced that we made use of the Google Books API to link to the full text whenever possible. We only experienced two problems with this service. First, the quite frequent Google spam warnings, which have been partially resolved but still keep coming back. Second, we did not have the required OCLC or LCCN numbers for the pre-ISBN books in our catalog. [Linking from Catalog of Wageningen UR Library to Google Books at WoW! Wouter on the Web]
He goes on to describe a service from our OCLC Dutch colleagues that returns an OCLC number when fed a Pica Production Number, which they have in their catalog. And the results:
A few examples are: Even when the full text is not available on Google Books, the service can be usefull. In the following example of Hogg, R. (1884) The fruit manual, the electronic version of the 1860 edition is available on Google Books rather than the 1884 edition we have in our collection. [Linking from Catalog of Wageningen UR Library to Google Books at WoW! Wouter on the Web]
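To make the fallback chain concrete, here is a minimal sketch of how a catalog might choose identifiers for a Google Books lookup: ISBN where there is one, otherwise an OCLC number obtained from a Pica Production Number, otherwise an LCCN. The PPN-to-OCLC table is a hypothetical stand-in for the Dutch OCLC service, whose actual interface is not described here. The request shape (`bibkeys`, `jscmd=viewapi`, `callback`) follows my understanding of the Google Books dynamic links interface of the time and should be treated as an assumption.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the Dutch OCLC service that maps a Pica Production
# Number (PPN) to an OCLC number; a real catalog would call that service instead.
PPN_TO_OCLC = {"123456789": "4567890"}

def bibkeys_for(record):
    """Collect Google Books bibkeys for a catalog record, in order of preference."""
    keys = []
    if record.get("isbn"):
        keys.append("ISBN:" + record["isbn"])
    # Pre-ISBN books: fall back to an OCLC number looked up from the PPN.
    oclc = record.get("oclc") or PPN_TO_OCLC.get(record.get("ppn", ""))
    if oclc:
        keys.append("OCLC:" + oclc)
    if record.get("lccn"):
        keys.append("LCCN:" + record["lccn"])
    return keys

def google_books_url(record, callback="handleBooks"):
    """Build an (assumed) dynamic-links request URL, or None if there is nothing to look up."""
    keys = bibkeys_for(record)
    if not keys:
        return None
    params = {"bibkeys": ",".join(keys), "jscmd": "viewapi", "callback": callback}
    return "http://books.google.com/books?" + urlencode(params)
```

A pre-ISBN record carrying only a PPN still gets a lookup key this way, which is the gap the Wageningen service fills.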
April 18, 2008
Categories: Books, movies and reading ...; Libraries - distributed environments; OCLC
I recently came across Xignite, a financial web services company. Here is their blurb...
Financial events around the world impact not only finance professionals but every business. This is why successful businesses integrate key financial information into the processes and applications their employees or clients use every day. Until now, this integration has been a challenge. Xignite answers that challenge by letting you access the latest financial and industry data on-demand, and easily integrate it straight into your company's mission-critical applications using through web services. With Xignite, you can make your business financial-aware in minutes. [Global Financial Data, News & Information Web Services - Xignite]
I liked the expression 'making your business financial-aware' through web services.
Worldcat.org is a bibliographic 'destination' and is used heavily in that way. For example, it is an important scholarly tool given its topical reach and historic depth. I recently came across an interesting niche use, when I was told by a used book seller that he uses it to discover how widely distributed an item is, or to identify libraries that might be interested in buying an item.
However, very importantly, it is also a switch into the library network. It connects discovery to actual locations, and depending on your institutional affiliation it offers you various services against those locations. In this way, Worldcat.org discloses library collections and services on the web.
In recent years, we have seen many bibliographic destinations emerge. They are variously positioned in terms of value creation. Amazon, AbeBooks, Goodreads, Google Book Search, LibraryThing, Live Search Books, OpenLibrary, the Library of Congress catalog, many national and regional library union catalogs (for example Libraries Australia, COPAC, OhioLink, Bibsys, ....), and so on. This variety seems healthy to me, and Worldcat sits alongside this range as one more destination with its own characteristics and uses. An increasingly valuable destination, we trust!
However, we also hope that Worldcat is used by other sites - and it is - as a switch. It is a way for other sites to add value by providing access to library resources. And it creates value for libraries by making them present in other environments where people look for, work with, and share information about books and other resources. It allows libraries to disclose collections and services in other environments.
As we move forward with the Worldcat API, Worldcat functionality can be made available to other applications. So, adapting the Xignite phrasing above, the Worldcat API will make applications 'bibliographic-aware'; however, thinking of the switch functionality, it will also help make applications 'library-network aware'. It will allow applications to incorporate access to a network of library assets, and to focus in on particular ones of interest. We can do this because of the collective investment by the library community and OCLC in Worldcat and registry data.
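As an illustration of what 'library-network aware' might mean for a developer, here is a sketch that builds a request asking which libraries hold a given item near a given location. The base URL and the parameter names (`wskey`, `location`, `maximumLibraries`, `format`) are assumptions modelled on the WorldCat Search API's library-locations service, not a definitive description of it.

```python
from urllib.parse import urlencode

# Assumed base URL for a library-locations lookup; check OCLC's current documentation.
WORLDCAT_LIBRARIES = "http://www.worldcat.org/webservices/catalog/content/libraries/"

def library_locations_url(oclc_number, wskey, location=None, max_libraries=5):
    """Build a request asking which libraries hold an item, optionally near a location.

    oclc_number: a bare numeric OCLC number; wskey: the developer key (assumed parameter).
    """
    if not oclc_number.isdigit():
        raise ValueError("OCLC number should be digits only")
    params = {"wskey": wskey, "maximumLibraries": str(max_libraries), "format": "json"}
    if location:
        # A postal code or place name narrows results to nearby holdings.
        params["location"] = location
    return WORLDCAT_LIBRARIES + oclc_number + "?" + urlencode(params)
```

The point of the sketch is the shape of the question, "who near here has this?", which an application can only answer because the holdings and registry data already exist at network level.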
I was prompted to do this post by another interesting post from Mark Dahl where he talks about (my words) how OCLC can make the library network available at the network level, to other applications as well as to user interfaces.
OCLC has this valuable data, and great potential to develop things with it, as well as a general current towards network-level computing moving in its favor. When an libraries compare OCLC's products with that of a traditional ILS vendor, they need to see that the OCLC product is more than technology. Rather, it is an extension of a community, a network. [synthesize-specialize-mobilize: OCLC's competitive advantage]
March 09, 2008
Categories: Libraries - distributed environments; Libraries - organization and services; Research, learning and scholarly communication; Social networking; Standards; User experience
OCUL (Ontario Council of University Libraries) has released a nice white paper which discusses issues in providing an end-user access environment for its shared resources and, more interestingly, how that environment engages with the behaviors and expectations of its academic users.
This document (pdf) was created to highlight opportunities and drive discussion for the OCUL consortium in both the short term through the launch of a new Scholars Portal server in 2008, and in the long term by incorporating more 'social' means of sharing and organizing information within OCUL's Scholars Portal and the larger academic community that it serves. [Scholr 2.0]
There is a pdf version and a commentpress version with the benefit of reader comments.
As one might expect from a discussion white paper, there is a focus on questions and potential directions. Recommendations are given in several areas:
Enhance and improve the user interface
- Enrich Scholars Portal content by bringing in metadata from sources outside the journal repository
- Explore the implementation of controlled vocabulary, thesauri and authority control
- Add user tagging functionality [Scholr 2.0]
Connect the citation network to user workflow
- Provide table of contents (TOC) RSS feeds with links that facilitate authentication. If it is possible, allow users to generate their own RSS feeds.
- Provide users of scholarly resources with social bookmarking services
- Consider services that support the whole of the user’s research process and the development of online space for OCUL research communities.
- Seek means for Scholars Portal to be integrated into Learning Management Systems used by OCUL [Scholr 2.0]
Embrace standards and technologies that will allow present and future network discovery systems to make use of what we offer
- Provide both permalinks as well as COinS OpenURLs in the Scholars Portal server and to encourage OCUL libraries to adopt their own versions of LibX or promote other COinS readers
- Investigate how to take advantage of the attribute-based information that Shibboleth can provide
- Consider what semantic metadata could be provided through Scholars Portal [Scholr 2.0]
It usefully brings together a range of material. Worth a read.
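For readers unfamiliar with COinS, the recommendation to provide COinS OpenURLs amounts to embedding an OpenURL 1.0 ContextObject in a page as an empty `span` of class `Z3988`, which COinS readers such as LibX can then pick up. A minimal sketch, assuming a journal article described by a plain dictionary; the function name and the selection of fields are illustrative, not a complete mapping.

```python
from html import escape
from urllib.parse import urlencode

def coins_span(article):
    """Return a COinS <span> carrying an OpenURL 1.0 (Z39.88-2004) KEV
    ContextObject for a journal article described by a dict."""
    kev = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Copy over whichever citation elements are present.
    for key in ("atitle", "jtitle", "issn", "volume", "spage", "date", "aulast"):
        if article.get(key):
            kev["rft." + key] = article[key]
    # The KEV string goes in the title attribute, HTML-escaped so '&' becomes '&amp;'.
    return '<span class="Z3988" title="%s"></span>' % escape(urlencode(kev))
```

Because the span is empty it is invisible to readers who have no COinS-aware tool, which is what makes it cheap to add to every record display.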
March 02, 2008
Categories: Featured; General - distributed environments; Libraries - distributed environments; Libraries - organization and services; Social networking
I find Web 2.0 increasingly confusing as a label; no surprise there. This is not just because of its essential vagueness, but because I think it tends to be used in a couple of very different ways. Where this happens there is bound to be some confusion. Schematically, I will use the labels 'diffusion' and 'concentration' for these two ways.
'Diffusion' is probably the more dominant of the two. Here it covers a range of tools and techniques which create richer connectivity between people, applications and data; which support writers as well as readers; which provide richer presentation environments. What tends to get discussed here are blogs and wikis; RSS; social networking; crowdsourcing of content; websites made programmable through web services and simple APIs; simple service composition environments; Ajax, Flex, Silverlight; and so on.
'Concentration' is a major characteristic of our network experience, which often involves major gravitational hubs (Google, Amazon, Flickr, Facebook, propertyfinder.com). These concentrate data, users (as providers and consumers), and communications and computational capacity. They build value by collaboratively sourcing the creation of powerful data assets with their users. The value grows with the reinforcing property of network effects: the more people who participate, the more valuable the hub becomes. And opening up these platforms through web services creates more network effects. These sites also mobilize usage data to reflexively adapt their services, to better target particular users or to identify design directions. Of course, these platforms are very closely controlled, and there is an interesting balance of interests between openness and control at various levels in how they manage resources (see for example my discussion of the Amazon and Google APIs).
Interestingly, if you trace Tim O'Reilly's writings on Web 2.0 since the publication of his major defining article you see an emphasis on what I have called 'concentration' come through. (See my note on an interview with Tim O'Reilly by David Weinberger, on which I draw above, and also see O'Reilly blog posts here and here.)
Now, of course, 'concentration' and 'diffusion' are often complementary approaches. The major Internet hubs 'diffuse' their benefits through service and data syndication, APIs, participation, etc., but their value often derives from successfully driving network effects through wide participation and consolidation of data. In fact, many of the 'diffusion' techniques work best when associated with concentrating applications. Think of tagging for example. People have incentives to tag their resources in Flickr or LibraryThing in ways that may not obtain in the library catalog. Scale matters in the context of the social value created in these services (of course, in these examples, folks are also tagging their own resources). You cannot simply add social networking to a site and expect it to work well. Think of all those empty forums.
Much of the library discussion of Web 2.0 is about 'diffusion', about a set of techniques for richer interaction. It is appropriate that libraries should offer an experience that is continuous with how people experience the web.
However, there is a very important way in which the library experience is not continuous with the web. It remains fragmented: it does not have the characteristics of the concentrating, gravitational hubs which characterize so much web use, and are so much a part of O'Reilly's Web 2.0. Fragmented by database boundary, by service boundary (e.g. connecting a discovery experience gracefully to a fulfillment experience through resolution), by library boundary. We are now familiar with the comparison between this fragmented experience and discovery on the web. And we are also familiar with discussion of how the library presence is weakly represented in the major network presences.
However, think also of the library management environment. Think for example of places where data needs to be concentrated to create value: aggregating user data across sites (e.g. COUNTER data), or aggregating user created data (tags, reviews), or aggregating transactions (e.g. circulations, resolver clickthroughs). Motivations here are to drive business intelligence which allows services to be refined (e.g. how does my database usage compare to that of my peer group), to develop targeted services (people who like this, also liked that), to improve local services (e.g. add tags or reviews). These are examples where scale matters, where data may need to be concentrated above the individual library level.
And we are seeing fee-based services emerge which address this need. LibraryThing, for example, syndicates its user-generated tagging to libraries. I am not sure whether ScholarlyStats provides a service which compares usage across libraries; it would be interesting to know if there were demand for such a thing.
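The 'people who like this, also liked that' case shows why scale matters. A minimal sketch, assuming anonymized loan records pooled from several libraries as (patron, item) pairs, with library-prefixed patron IDs so that patrons at different libraries do not collide; the data layout and function names are hypothetical. Counting which items are borrowed by the same patrons only yields a useful signal once there are enough overlapping borrowers, which a single library may not have.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_borrowing_counts(loans):
    """loans: iterable of (patron_id, item_id) pairs pooled across libraries.
    Returns, for each item, a Counter of other items borrowed by the same patrons."""
    items_by_patron = defaultdict(set)
    for patron, item in loans:
        items_by_patron[patron].add(item)
    co = defaultdict(Counter)
    for items in items_by_patron.values():
        # Every pair of items borrowed by one patron counts once in each direction.
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def also_borrowed(co, item, n=3):
    """The n items most often borrowed by patrons who also borrowed `item`."""
    return [other for other, _ in co[item].most_common(n)]
```

For example, with loans from two libraries prefixed `A:` and `B:`, an item borrowed at both picks up co-borrowing counts from each, so the pooled counts are richer than either library's alone.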
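Such a comparison need not be complicated once the data is pooled. A sketch of comparing one library's use of a database with the median of its peers; the figures in any usage are hypothetical, and the normalization by FTE is my assumption, not something COUNTER or ScholarlyStats prescribes.

```python
from statistics import median

def compare_to_peers(usage, fte, library, database):
    """usage: {library: {database: searches}}; fte: {library: full-time-equivalent users}.
    Returns (this library's searches per FTE, median searches per FTE among its peers)."""
    rates = {lib: counts.get(database, 0) / fte[lib] for lib, counts in usage.items()}
    peers = [rate for lib, rate in rates.items() if lib != library]
    return rates[library], (median(peers) if peers else None)
```

Normalizing by population is the minimal step needed before raw counts from institutions of different sizes can be compared at all.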
This then touches on larger questions about sourcing decisions (in what combination of local, collaborative, and third party do libraries acquire their service capacities) and about concentration of library presence (in what combination of library or library and third party are services offered).
For example, I discussed Georgia Pines and OhioLink the other day as examples of groups of libraries collaboratively sourcing a concentrated library presence which increases their gravitational pull.
And libraries are beginning to think more seriously about sourcing services with central web presences. Think for example of the decisions made by the National Library of Australia and the Library of Congress when they chose to use Flickr for significant image projects. NLA is seeking to expand the coverage of PictureAustralia; LC is seeking to collect tags from viewers. In each case, the library wants to benefit from the concentration of users and data that Flickr has created on the web. And to suggest another example, Andy Powell has been raising some intriguing questions about how repository services should be sourced in ways that, again, map onto people's experience of the web: would a consolidated network level service be more motivating than a series of institutional presences? (see here and here). Social networking or other services, he suggests, might flourish at this network level in ways that are not feasible at the institutional level.
When we discuss Web 2.0, there is a temptation to think about blogs and wikis, RSS and a Facebook application, and to stop there. There is also some useful thinking about how to expose web services or data in ways that they can be remixed into other applications. However, Web 2.0 is also about concentration, concentration of data, of users and of communications. We need also to think about how libraries reconfigure services in an environment of network level gravitational hubs, driven by network effects. This will involve greater concentration of library resources in various ways, and also - probably? - greater reliance on other web presences to deliver their services.
January 30, 2008
Categories: Libraries - distributed environments; Metadata; OCLC
I have written before about how registries provide 'intelligence' in the network. Scalable loose coupling between library services will benefit from good ways to discover those services.
The Worldcat Registry includes data for library services (resolver, catalog, virtual reference) which drives Worldcat Local and Worldcat.org. Worldcat.org's 'understanding' of the library network is captured in the Registry.
A while ago the OpenURL Resolver Registry and Gateway were incorporated into the Worldcat Registry. The Registry is openly available and the Gateway is systematically used by several other parties including Zotero:
Zotero 1.0.2 also includes several new site translators, including translators for social media sites Flickr and YouTube. Zotero’s default Open URL resolver has also been changed to the OCLC OpenURL Resolver Gateway, which will allow many Zotero users to automatically find items from their collections in their campus library through the Locate button without editing their preferences. [Zotero: The Next-Generation Research Tool » Blog Archive » Our Most Stylish Release Yet: Zotero 1.0.2]
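Roughly, a gateway of this kind lets a client send every OpenURL to one place, which then works out the user's institution and forwards the request to that institution's own resolver. A minimal sketch, assuming registry entries that pair IP ranges with resolver base URLs; the entries, addresses (from the 192.0.2.0/24 documentation range) and fallback behaviour here are hypothetical, not the Registry's actual data or logic.

```python
import ipaddress
from urllib.parse import urlencode

# Hypothetical registry entries: institutional IP ranges and resolver base URLs.
REGISTRY = [
    {"institution": "Example University",
     "networks": [ipaddress.ip_network("192.0.2.0/24")],
     "resolver": "https://resolver.example.edu/openurl"},
]

def resolver_for(client_ip):
    """Return the resolver base URL registered for the client's IP address, if any."""
    addr = ipaddress.ip_address(client_ip)
    for entry in REGISTRY:
        if any(addr in net for net in entry["networks"]):
            return entry["resolver"]
    return None

def gateway_redirect(client_ip, openurl_params, fallback=None):
    """Return the URL a gateway would redirect to: the client's own resolver with
    the OpenURL query appended, or a caller-supplied fallback if none is registered."""
    base = resolver_for(client_ip)
    if base is None:
        return fallback
    return base + "?" + urlencode(openurl_params)
```

This is why a tool like Zotero can ship a single default and still send many users to their own library without any local configuration.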
My colleague Joanna White tells me that use of the gateway is climbing. In November 2007, over 250,000 requests were processed. There are currently about 1,600 resolvers registered.
Here are some details about use and update of the OpenURL Resolver registry and gateway.
[OCLC OpenURL Resolver Registry [OCLC]]
January 10, 2008
Categories: General - systems and technologies; Libraries - systems and technologies; Libraries - distributed environments; Search
One of the major questions for library systems is the role of metasearch or federation. I have written about this here (Metasearch: a boundary case) and here (Metasearch, Google and the rest).
The issue is that libraries have to manage a range of database resources whose legacy technical and business boundaries do not map very well onto user preferences or behaviors. The approach has been to try to move away from presenting a fragmentary straggle of databases to bundling them in various ways in a metasearch application, sometimes in one big search, sometimes in smaller course or subject bundles. The issues here are well-known, not least of which is that libraries typically have limited control over the performance of the target databases.
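The broadcast-and-merge pattern behind metasearch, and its exposure to slow targets, can be sketched briefly. This is a simplified model, assuming each target database is a Python callable returning a list of hits; real metasearch engines speak Z39.50 or SRU or scrape web forms, which is out of scope here, and the timeout value is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def metasearch(query, targets, timeout=2.0):
    """Broadcast one query to several target databases and merge what returns in time.

    targets maps a target name to a callable taking the query and returning a list of hits.
    Returns (hits, failed): hits as (target, hit) pairs, failed as sorted target names
    that timed out or raised an error.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(targets)))
    futures = {pool.submit(search, query): name for name, search in targets.items()}
    done, not_done = wait(futures, timeout=timeout)
    failed = [futures[f] for f in not_done]
    hits = []
    for f in done:
        try:
            hits.extend((futures[f], hit) for hit in f.result())
        except Exception:  # a failing target should not sink the whole search
            failed.append(futures[f])
    pool.shutdown(wait=False)  # return now; slow targets finish in the background
    return hits, sorted(failed)
```

The design choice is the usual one: rather than wait for the slowest target, the search returns what it has and reports the rest as failed, which is exactly where limited control over target performance shows up for the user.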
As an alternative, a few libraries have explored consolidating locally loaded data. This can work very well, as it becomes easier to build additional services over a consolidated resource. However, this is a rather too adventurous undertaking for most libraries. Another approach is for a third party to consolidate, and this is what we have seen with Google Scholar, Scopus, Worldcat, and others.
More recently, recognizing the advantages of local consolidation, we have seen the emergence of a new class of library system which pulls together metadata from locally managed stores (e.g. digital repository, ILS, institutional repository, ...) and offers an integrated search. This may still have to work closely with a metasearch engine to integrate access to external databases. ILS vendors are moving in this direction, and through Worldcat Local, OCLC is also addressing this type of integration.
This is a discussion worth returning to, but that is not my purpose here. Rather I wanted to point to an interesting treatment of similar issues from a different domain. Mike Stonebraker, database guru and contributor to the group blog The Database Column, has a post where he contrasts two models of data integration: ETL (extract, transform and load) and federation. The focus is on enterprise systems. The ETL model will typically involve a centralized data warehouse and "for each operational system, they will employ some sort of ETL process to transform data instances into the global schema and then load them into the centralized warehouse".
'Extract, transform and load' is a good characterization of what is involved in consolidation of library data, whether this is attempted locally or through third parties. One of the interesting questions is the sophistication of the 'transform'. Think of author names, for example, or subjects, or other controlled data, and what would be involved to effectively merge data created within different regimes. What is the impact, for search or for faceted display, of limited or no transformation of these elements?
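To make the 'transform' question concrete, here is a minimal ETL sketch that maps records from two hypothetical local stores into a common schema and normalizes author names into a crude 'surname, initials' key before loading them into one index. The schemas, field names and normalization rule are my assumptions; real authority work is much harder, and a key like this will both merge different people and miss variant forms.

```python
import re
import unicodedata

def normalize_name(name):
    """Reduce an author name to a crude 'surname, initials' comparison key."""
    # Strip accents so 'Gödel' and 'Godel' compare equal.
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    # Drop punctuation other than the comma separating surname from forenames.
    name = re.sub(r"[^\w\s,]", " ", name).strip()
    if not name:
        return ""
    if "," in name:   # inverted form: 'Surname, Forenames'
        surname, _, forenames = name.partition(",")
    else:             # direct form: 'Forenames Surname'
        parts = name.split()
        surname, forenames = parts[-1], " ".join(parts[:-1])
    initials = "".join(part[0] for part in forenames.split())
    return f"{surname.strip().lower()}, {initials.lower()}"

def etl(sources):
    """sources: list of (source_name, records, to_common) where to_common maps a
    raw record to the common schema {'author': ..., 'title': ...}."""
    index = {}
    for source_name, records, to_common in sources:
        for raw in records:                                       # extract
            rec = to_common(raw)                                  # transform: global schema
            rec["author_key"] = normalize_name(rec["author"])     # transform: controlled data
            rec["source"] = source_name
            index.setdefault(rec["author_key"], []).append(rec)   # load
    return index
```

Without the normalization step, 'Hogg, R.' from a catalog and 'Robert Hogg' from a repository would sit under separate facet values, which is the practical cost of limited transformation.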
Here are the headings Stonebraker uses for his discussion.
- Data element "heat": Hot data favors ETL
- Indexing: Federation is harder to optimize
- Resource management: Faster BI query responses for ETL shops
- Complexity of the schema change: ETL approach performs less joins
- Contention (concurrency control): Federation contention challenges
- Timeliness: ETL approaches must deal with out-of-date data issues
- Mapping: Federations can't handle some transformations
BI is short for 'business intelligence'. 'Hot' data is data that is accessed often.
Now, while our environment is clearly similar in many ways to the one discussed here, it would be interesting to do a similar analysis with our domain in mind to see where the differences lie. Of course, one issue is that most of the data under discussion here seems to be within institutional control.
Here is his conclusion:
In summary, virtually all enterprises use the ETL approach for data integration. The data federation market is, in contrast, quite small. The place where I see federations as most viable is when there are many, many data sources (e.g., more than 5,000 sources) and BI users utilize only a small number of them at any given time. In this extreme case, the average data element is accessed zero times before it is updated or deleted. In this instance, one is better off leaving the data where it originates. On the other -- more common -- hand, when most data elements get used several times, the ETL approach will continue to be preferred. [To ETL or federate ... that is the question - The Database Column]
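One way to try the analysis on library data is to reduce the rule of thumb to a test. This toy version is my reading of the quote, not Stonebraker's formulation: 'heat' is taken as reads per update, the 5,000-source figure comes from the quote, and the threshold of fewer than one read per update is an assumption.

```python
def heat(reads, updates):
    """Average number of reads a data element receives between updates."""
    return reads / updates if updates else float("inf")

def integration_strategy(reads, updates, n_sources, many_sources=5000):
    """Toy reading of the rule of thumb quoted above: federate only when there are
    very many sources and data is typically read less than once before it changes."""
    if n_sources > many_sources and heat(reads, updates) < 1:
        return "federate"
    return "etl"
```

A library consortium with a few dozen heavily searched databases lands firmly on the 'etl' side of this test, which is one way of reading the consolidation trend described above.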
November 09, 2007
Categories: General - distributed environments; Libraries - distributed environments; Search; Standards
Under the auspices of OASIS, a discussion document about the 'search web service' has appeared.
The Search web service is a means of opening a database to external enquiry in a standardized manner that facilitates discovery of query and response possibilities and makes it possible for heterogeneous databases to be queried simultaneously with the same or similar queries. Client software can be easily configured using a standardized XML explain document that is accessible from the base URL or via the explain operation. In contrast with protocols such as SQL and XQuery, detailed knowledge of a database’s structure is not necessary as the explain document contains parsable information on server defaults, searchable indexes and record schemas that are returned in the response. [OASIS Specification Template]
There is a cryptic note about its relationship to SRU:
This specification is based on the SRU (Search Retrieve via URL) specification which can be found at http://www.loc.gov/standards/sru/. It is expected that this standard, when published, will deviate from SRU. How much it will deviate cannot be predicted at this time. The fact that the SRU spec is used as a starting point for development should not be cause for concern that this might be an effort to fast track SRU. The committee hopes to preserve the useful features of SRU, but not to preserve those that are not considered useful. [OASIS Specification Template]
There is a wiki for the OASIS group working on this.
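Since the starting point is SRU, a sketch of the two basic SRU 1.1 requests may help show what 'explain' and a standardized query look like in practice. The base URL is hypothetical, and, as the note says, the OASIS specification may deviate from these SRU parameter names.

```python
from urllib.parse import urlencode

def sru_explain_url(base_url):
    """Ask a server to describe itself: indexes, defaults and record schemas."""
    return base_url + "?" + urlencode({"operation": "explain", "version": "1.1"})

def sru_search_url(base_url, cql_query, start=1, maximum=10, schema="dc"):
    """Build an SRU 1.1 searchRetrieve request for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # CQL, e.g. 'dc.title = "fruit manual"'
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": schema,      # the schema name must be one the explain record lists
    }
    return base_url + "?" + urlencode(params)
```

Querying several heterogeneous databases simultaneously with the same query then amounts to building this URL against each base URL in turn, using the explain record to check which indexes each one supports.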
October 31, 2007
Categories: Libraries - distributed environments; Metadata; OCLC
Our Openly colleagues have added a new service, xISSN, alongside xISBN.
The xISSN Web service supplies ISSNs and other information associated with serial publications represented in WorldCat. Submit an ISSN to this service, and it returns a list of related ISSNs and selected metadata. The service is based on WorldCat, the world's largest network of library content and services. The current xISSN database covers 575,573 ISSNs. [WorldCat Web service: xISSN [OCLC - WorldCat Affiliate tools]: Home]
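A client would sensibly validate an ISSN before submitting it. In the sketch below, the check-digit rule (weights 8 down to 2, modulo 11, with a value of 10 written as X) is the standard ISSN one, while the xISSN base URL and the `method` and `format` parameter values are assumptions to be checked against OCLC's documentation.

```python
import re
from urllib.parse import urlencode

def issn_is_valid(issn):
    """Check an ISSN's format and its mod-11 check digit."""
    m = re.fullmatch(r"(\d{4})-?(\d{3})([\dXx])", issn.strip())
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weights 8, 7, ..., 2 applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return m.group(3).upper() == expected

# Assumed endpoint for the xISSN service; confirm against OCLC's documentation.
XISSN_BASE = "http://xissn.worldcat.org/webservices/xid/issn/"

def xissn_url(issn, method="getHistory", fmt="xml"):
    """Build a request for ISSNs related to `issn` (method name is an assumption)."""
    if not issn_is_valid(issn):
        raise ValueError("not a valid ISSN: %r" % issn)
    return XISSN_BASE + issn.strip() + "?" + urlencode({"method": method, "format": fmt})
```

Catching a mistyped ISSN locally saves a round trip and avoids confusing an empty result with "no related titles".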
October 25, 2007
Categories: Libraries - distributed environments; OCLC
My Openly colleagues have added an IE version of the OpenURL Referrer extension. The Firefox version is already available.
OpenURL Referrer is a browser extension that can take certain kinds of citations on the web and convert them to direct links to one of your local library's databases. This can be accomplished thanks to OpenURL, a powerful technology that packages bibliographic information into a format that many internet services can understand. ...... ..... You will need to have access to a library with a link resolver that supports OpenURL version 0.1 or 1.0. OpenURL Referrer can automatically detect your link resolver settings using the OCLC OpenURL Resolver Registry. If your institution is not yet represented in the registry, ask your library administrator to register a link resolver as part of your library profile in the WorldCat Registry (http://worldcat.org/registry/institutions). Registration is open to all libraries, both OCLC and non-OCLC. The The Firefox version of the extension can also be configured manually. [OpenURL Referrer [OCLC - Openly Informatics]]
OpenURL Referrer can produce links for three different kinds of citations: Google Scholar results, the Google News Archive, and COinS.
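The COinS case works roughly as follows: a COinS-aware tool finds `span` elements of class `Z3988`, reads the ContextObject from their `title` attribute, and appends it to the reader's resolver base URL. A minimal sketch, assuming the resolver base URL has already been obtained (for example from the registry) and using Python's standard HTML parser; the class and function names are hypothetical, and this is not the extension's actual code.

```python
from html.parser import HTMLParser

class CoinsFinder(HTMLParser):
    """Collect the ContextObject strings carried by COinS spans in a page."""

    def __init__(self):
        super().__init__()
        self.context_objects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # HTMLParser has already unescaped '&amp;' to '&' in attribute values.
        if tag == "span" and "Z3988" in classes and attrs.get("title"):
            self.context_objects.append(attrs["title"])

def coins_to_links(page_html, resolver_base):
    """Turn each COinS ContextObject into a link to the reader's own resolver."""
    finder = CoinsFinder()
    finder.feed(page_html)
    finder.close()
    return [resolver_base + "?" + kev for kev in finder.context_objects]
```

The page author only has to describe the item; choosing the destination is left entirely to the reader's side, which is what lets one page serve users of many different libraries.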
It is nice to see the registry being used in this way for lookup - removing the need in many cases to do local configuration.
I have no sense of how heavily the various library-oriented extensions, LibX for example, are used. Nor, for that matter, have I seen any analysis of how well COinS are working in helping to connect citations in web pages back to library fulfillment options. It would be interesting to see some numbers.
September 16, 2007
Categories: General - distributed environments; Libraries - distributed environments; Marketing; Search; User experience
I have been using the phrase 'discovery happens elsewhere' in recent presentations. I think it captures quite nicely an increasingly important part of how we think about our services.
No single website is the sole focus of a user's attention. Increasingly people discover websites, or encounter content from them, in a variety of places. These may be network level services (Google, ...), or personal services (my RSS aggregator or 'webtop'), or services which allow me to traverse from personal to network (Delicious, LibraryThing, ...).
This means thinking about services in different ways. About how we disclose stuff to other discovery environments; about where our metadata is; about URL structures, RSS feeds, and so on.
I have suggested before that it would be an interesting experiment to think about our services as if they had no user interface. Here maybe it would be interesting to think about services as if they could only be reached from some other place. It makes you think about the variety of other places that discovery happens.
Credits. 'Discovery happens elsewhere' is influenced by Steve Rubel's use of the phrase 'traffic happens elsewhere' in his discussion of what he calls the 'cut and paste' web.
September 13, 2007
Categories: Books, movies and reading ...; Libraries - distributed environments; OCLC
I bought the following book in the congenial City Newstand in Cheyenne this afternoon. The back cover tells me that it was the Wyoming Historical Society Book of the year ...

I like the 'get it', 'save it', 'add to it', and 'share it' features.
Because I am sitting at a machine in the public library in Cheyenne it provides a link through to Wyldcat, the Wyoming-wide system. And also to FirstSearch, to which the library has access. Worldcat.org relates the IP address to the Laramie County Library System and presents the services that we know about (see the orange arrow).
Incidentally, the novel is published by Penguin, and I was drawn by what I took to be a deliberately retro cover (see previous remarks on Penguin covers).
Note: I will put in a better picture when I am reunited with my own work environment. Update: Mmmm... Of course, now that I am somewhere else, a new screenshot would not show the Cheyenne location because I am not coming in from that particular IP address. I guess I will just leave it as is. Another example of how much of what we do is situational - dependent on some combination of locational attributes.
August 01, 2007
Categories: General - distributed environments; Libraries - distributed environments; Libraries - organization and services; Metadata; Social networking; User experience
I was interested to see the Page Tools in the University of Alberta catalogue (look in the left hand bar below). A reader can send a correction or suggestion to the library: it would be interesting to know how many folks use this option and what types of suggestion are made. Also interesting to see the ability to save the page to various bookmarking sites.
Folks will notice that I am looking at this through the library's new Facebook application.
We are pleased to announce the University of Alberta Libraries Facebook application. This new application allows access to our library catalogue, Ask Us Services, RefWorks and Get It Citation linker from within the Facebook platform. [Library News » Library Services Available Through Facebook]
As libraries place themselves more and more in other environments it would also be good to begin to see some numbers about what the impact on use of services is.

July 06, 2007
Categories: Featured; Learning and research - distributed environments; Libraries - systems and technologies; Libraries - distributed environments; Libraries - organization and services; User experience
One of the main issues facing libraries as they work to create richer user services is the complexity of their systems environment. Consider these pictures which I have been using in presentations for a while now.

Reductively, we can think of three classes of systems - (1) the classic ILS focused on 'bought' materials, (2) the emerging systems framework around licensed collections, and (3) potentially several repository systems for 'digital' resources. Of course, there are other pieces but I will focus on these.
In each case what we see is a backend apparatus for managing collections, each with its own workflow, systems and organizational support. And each with its own - different - front-end presentation and discovery mechanisms. What this means is that the front-end presentation mirrors the organizational development over time of the library backend systems, rather than the expectations or behaviors of the users.
You have the catalog here, maybe several options for licensed resources (a-to-z, metasearch, web pages of databases, and so on) over there, and potentially several repository interfaces (local digitized materials, institutional repository) somewhere else.
This is one reason that people have difficulties with the library website. Effectively, it is a layer stretched over a set of systems and services which were not designed as a unit. Indeed, in some cases, they were not originally designed to work on the web at all. So what do we have?
ILS: a management system for inventory control of the 'bought' collection (books, DVDs, etc). The catalog is bolted onto this and gives a view onto this part of the collection. In effect, by virtue of its integration with inventory management, the catalog provides discovery (what is in the collection), location (where those things are) and request (get me those things) in a tightly integrated way. The ILS and catalog may be part of a wider apparatus of provision, and may have mechanisms for interfacing to resource sharing systems of one sort or another. The management side may have interfaces to a variety of other systems for sharing and communicating data: procurement, finance, student records. And there will be a flow of data into the system, from jobbers, as part of a shared cataloging environment, and so on.
Licensed: This has been an area of rapid recent development as the journal literature moved to electronic form. On the backend we now see a variety of approaches, and the frontend can be very confusing, with lists of databases and journals presented in various ways, often in uncertain relation to the catalog (where do I look for something?). We are now seeing the emergence here of an agreed set of systems around knowledge base, ERM, resolution and metasearch, and there is rapidly developing vendor support. This is the range of approaches for which Serials Solutions has proposed the ERAMS name. These systems require the management of new kinds of data, and mechanisms are being put in place, certainly not yet optimal, for the creation, propagation and sharing of this data. With journals data, discovery, location and request are not so tightly coupled as they were with the catalog. Discovery happens in one set of tools (A&I databases), but then the appropriate title may have to be located in another tool (the catalog, for example) and, if not available locally, requested through yet another system. The importance of the resolver, and the enabling OpenURL, has been to tie some of these things together and remove some of the human labor of making connections between these systems. And metasearch has been seen as a way of reducing human labor by providing a unified discovery experience over disparate databases. However, this whole apparatus is still not as well-seamed as it needs to be, and users and managers still do more work than they should to make it all work.
Repository: Libraries are increasingly managing digital materials locally and supporting repository frameworks for those. This includes digitized special collections, research and learning materials in institutional repositories, web archives, and so on. There are a variety of repository solutions available, some open source. Typically, the contents of the repository backend may be available to repository front-ends on a per-repository basis. Here, discovery (what is there), location (where is it) and request and delivery are typically tightly integrated. Repositories may also have interfaces for harvesting or remote query. On the management side, metadata creation and material preparation may still be labor-intensive.
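The paragraph above notes that repositories may offer interfaces for harvesting, without naming a protocol. A widely used one is OAI-PMH; assuming a repository exposes it, the Python sketch below walks `ListRecords` responses and follows `resumptionToken`s page by page. The base URL is hypothetical, and the fetch function is passed in so the loop can be tried without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def fetch_url(url):
    """Fetch a URL and return the raw response body (network access required)."""
    with urlopen(url) as response:
        return response.read()

def harvest_identifiers(base_url, fetch=fetch_url, prefix="oai_dc"):
    """Yield record identifiers from OAI-PMH ListRecords responses, following
    resumptionTokens until the repository reports no further pages."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            yield header.findtext("oai:identifier", namespaces=OAI_NS)
        token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
        if not token:  # an absent or empty token means the list is complete
            break
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.strip()}
```

Injecting `fetch` keeps the paging logic separate from transport, so the same loop could sit behind a caching or rate-limited client.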
OK, so here are some general observations about this environment:
- There is still a major focus - in terms of attention, organizational structures, and resource allocation - on the systems and processes around the ILS and the bought collection. In academic libraries, we will surely see some of this move towards the systems and processes around the licensed collections, given the rising relative importance of this part of the collection. The repository strand of activity, associated with emerging digital library activities, may in some cases be supported from grants or other special funding. It will need to become more routine.
- The fragmentation of this systems activity, the multiple vendor sources, the different workflows and data management processes, and the absence of agreed simple links between things mean that the overall cost of management is high.
There is also another cost: diminished impact and lost opportunity. The awkward disjointedness described above also means that it is difficult to mobilize the consolidated library resource into other environments, course management or social networking systems for example. It is difficult to flexibly put what is wanted where it is wanted.
- There has been much discussion of library interoperability, but it has tended to be about how to tie together these individual pieces, or about tying pieces to other environments (how do I get my repository harvested, for example). There has been less focus on how you might abstract the full library experience for consumption by other applications - a campus portal, for example.
This in turn means several things:
- We will see more hosted and shared solutions emerge, which offer to reduce the local cost of ownership. And, of course, we are seeing vendors consider more integration between products. In particular, it is interesting to see a strong concentration on support for licensed e-resources emerge, as well as discussion about integrated discovery environments.
- Over time, we can expect to see more reconfiguration in a network environment. Shared cataloging and externalizing the journal literature have been two significant reconfigurations in the past. The pace of current developments suggests that we may be ready for other ways of collaboratively sourcing shared operations. For example, does it make sense for there to be library-by-library solutions for preservation, social networking, disclosure to search and social networking engines, and so on?
The next picture tries to capture an important direction that has emerged in the last year or so.

For many of the reasons identified above, we are seeing a growing interest in separating the discovery and presentation front end from the management backend across this range of systems. Why? Well, because, as I suggested at the opening, it is becoming clearer that legacy system boundaries do not map effectively to user preferences. And because fragmentation adds to effort and accordingly diminishes impact.
What about the discovery side? We saw metasearch, a partial response to the fragmentation of A&I databases. We are now seeing a new generation of products from the 'ILS vendors' which look at unifying access to the library collection: Encore, Primo, Enterprise Portal Solution. However, discovery has also moved to the network level. So, folks discover resources in Amazon, Google, Google Scholar. And OCLC is working to create discovery experiences which connect local and network through WorldCat Local, WorldCat.org and Open WorldCat.
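Metasearch, as described here, sends one query to several databases and presents a single result list. A minimal Python sketch of that fan-out-and-merge pattern follows, assuming each target is wrapped as a simple callable; the target names and the title-based de-duplication rule are hypothetical choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def metasearch(query, targets, timeout=10.0):
    """Send one query to several search targets in parallel and merge results.

    `targets` maps a target name to a callable returning a list of dicts with
    at least a "title" key. Duplicates are collapsed on a normalized title,
    a deliberately naive matching rule used only for illustration.
    """
    merged, seen = [], set()
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(search, query) for name, search in targets.items()}
        for name, future in futures.items():
            try:
                results = future.result(timeout=timeout)
            except Exception:
                # Skip targets that fail or exceed the timeout
                # (the pool still waits for running calls when it closes).
                continue
            for record in results:
                key = " ".join(record["title"].lower().split())
                if key not in seen:
                    seen.add(key)
                    merged.append({**record, "source": name})
    return merged
```

Results keep the order of the targets as given; real metasearch products also rank and cluster the merged list, which this sketch does not attempt.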
And on the management side? Here the variety of workflows and systems adds cost, as resources are managed on a per-format basis. We can expect to see simplification and rationalization in coming years as libraries cannot sustain expensive diversity of management systems. The National Library of Australia's discussion of a 'single business' systems environment, or Ex Libris's discussion of Uniform Resource Management are relevant here. It is likely that there will be a growing investment in collaboratively sourced solutions, as libraries seek to share the costs of development and deployment.
As discovery peels off, then the issue of connecting discovery environments back to resources themselves becomes very important. It is interesting to look at Google Scholar in this regard, as different approaches are required for the three categories identified above. It has worked with OCLC and other union catalogs to connect users through to catalogs and the ILS; it has worked with resolver data to connect users through to licensed materials; and it has crawled repositories and links directly to digital content.
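The resolver links mentioned here, and earlier in the discussion of licensed resources, are typically OpenURLs. A minimal Python sketch that builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article follows. The resolver base URL and the citation values are hypothetical; the `url_ver`, `ctx_ver`, `rft_val_fmt` and `rft.*` keys follow the standard's journal format.

```python
from urllib.parse import urlencode

# Hypothetical resolver base URL; a real library would use its own resolver address.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def build_openurl(base, citation):
    """Return an OpenURL 1.0 KEV link for a journal article citation (a sketch)."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        # Referent metadata format for journals in the KEV serialization
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Copy only the citation fields that are present, as referent (rft.) keys.
    for key in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date"):
        if citation.get(key):
            params["rft." + key] = citation[key]
    return base + "?" + urlencode(params)

link = build_openurl(RESOLVER_BASE, {
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "issn": "1234-5678",
    "volume": "12",
    "spage": "34",
    "date": "2007",
})
print(link)
```

Because the citation travels in the link rather than in the user's head, a discovery environment that has never seen the library's catalog can still hand the user off to the right resolver.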
Given this great divide, several issues become very important:
- Routing, resolution and registries become critical, as one wants to enable users to move easily from a variety of discovery environments to resources they are authorized to use. We need a richer apparatus to support this. (I have discussed the role of registries elsewhere.)
- Libraries have thought about discovery. There is now a switch of emphasis to disclosure: libraries need to think about how their resources are best represented in discovery environments which they don't manage. (I have also discussed disclosure in more detail elsewhere in these pages.)
- And again, how we present library services for consumption by other environments becomes an issue. For example, we are lacking an ILS Service Layer, an agreed way of presenting the functionality of the ILS so that it can be placed, say, in another discovery environment (shelf status, place a hold, etc).
- Better discovery puts more pressure on delivery, whether from a local collection, throughout a consortium, or in broader resource sharing or purchase options. Streamlining the logistics of delivery and providing transparency on status at any stage for the user (as they can do with UPS or Amazon) become more important.
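To make the idea of an ILS Service Layer more concrete, here is a Python sketch of what such an interface might look like, with shelf status and placing a hold as the two operations. The class and method names, the `Availability` fields and the in-memory backend are all hypothetical; this is not a published specification.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Availability:
    item_id: str
    location: str
    status: str  # e.g. "on shelf", "checked out"

class ILSServiceLayer(ABC):
    """Hypothetical service interface a discovery layer could call without
    knowing which ILS sits behind it."""

    @abstractmethod
    def get_availability(self, bib_id: str) -> list[Availability]: ...

    @abstractmethod
    def place_hold(self, bib_id: str, patron_id: str) -> str: ...

class InMemoryILS(ILSServiceLayer):
    """Toy backend standing in for an adapter to a real ILS."""

    def __init__(self, holdings):
        self.holdings = holdings  # bib_id -> list[Availability]
        self.holds = []

    def get_availability(self, bib_id):
        return self.holdings.get(bib_id, [])

    def place_hold(self, bib_id, patron_id):
        if bib_id not in self.holdings:
            raise KeyError(f"unknown record {bib_id}")
        self.holds.append((bib_id, patron_id))
        return f"hold-{len(self.holds)}"

def render_status(ils: ILSServiceLayer, bib_id: str) -> str:
    """What an external discovery page might show, using only the interface."""
    items = ils.get_availability(bib_id)
    if not items:
        return "not held"
    return "; ".join(f"{a.location}: {a.status}" for a in items)
```

The point of the abstraction is that `render_status` would work unchanged whichever ILS adapter is plugged in behind the interface.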

And finally ....
We are used to thinking about better integration of library services. But that is a means, not an end. The end is the enhancement of research, learning and personal development. I discussed above how we want resources to be represented in various discovery environments. Increasingly, we want to represent resources in a variety of other workflows. These might be the personal digital environments that we are creating around RSS aggregators, toolbars and so on. Or the prefabricated institutional environments such as the course management system or the campus portal. Or emerging service composition environments like Facebook or iGoogle. And in network-level discovery environments like Google or Amazon, which are so much a part of people's behaviors.
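One simple way to place library resources in the RSS aggregators mentioned above is to publish a feed, of new titles for example. The Python sketch below builds a minimal RSS 2.0 document with the standard library; the feed title, URLs and records are hypothetical illustration data.

```python
import xml.etree.ElementTree as ET

def build_rss(channel_title, channel_link, description, records):
    """Build a minimal RSS 2.0 document (as a string) from (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = description
    for title, link in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")

feed = build_rss(
    "New titles",
    "https://library.example.edu/new",  # hypothetical library URL
    "Recently added items (illustrative data)",
    [("An example book", "https://library.example.edu/record/1")],
)
```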
Libraries need to focus more attention on reconfiguring library services for network environments. This is the main reason for streamlining the backend management systems environment. It does not make sense to spend so much time on effort that creates no value.
June 12, 2007
•
Categories:
Libraries - distributed environments
, Libraries - organization and services
, Metadata
, Search
, User experience
Judith Pearce from the National Library of Australia left an interesting comment about the integration, or not, of full-text book indexes and library catalogs. Here is an excerpt: Here at the National Library of Australia, just as we are starting to address the challenge of getting nice fully FRBRised, relevance-ranked and clustered search results from a centralised data corpus, we need to start thinking about searching the whole book. We already have full-text indexes to our own locally hosted content so it makes sense to extend this to externally hosted content. Our Library Labs prototype at http://ll01.nla.gov.au/ does search Google Books at the moment but the results are not at all well-integrated into the rest of the page. And we would need to target multiple external sources to get full coverage. [Judith Pearce comment on Lorcan Dempsey's weblog: On demand book search again ...]
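As a very rough illustration of the clustering idea only, the Python sketch below groups bibliographic records under a hypothetical work key built from a normalized author surname and title. The key, the record fields (`title`, `author`, `format`) and the normalization rules are assumptions made for illustration; production FRBR work clustering is considerably more elaborate.

```python
import re
import unicodedata
from collections import defaultdict

def normalize(text):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def work_key(record):
    """Hypothetical work key: normalized author surname plus normalized title
    with a leading English article dropped."""
    title = re.sub(r"^(the|a|an) ", "", normalize(record["title"]))
    author = normalize(record.get("author", "")).split(" ")[0]
    return (author, title)

def cluster_by_work(records):
    """Group records (editions, formats) that appear to describe the same work."""
    clusters = defaultdict(list)
    for record in records:
        clusters[work_key(record)].append(record)
    return dict(clusters)
```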
The Library Labs prototype she points to is worth a look, acknowledging that it is a place for trying out things.
I was interested to follow the link from that page to a presentation by her colleagues Alison Dellit and Tony Boston which discusses this work in the context of the further development of Libraries Australia: Our challenge – as a library community – is to make these resources as easy to find and get as the best “long tail” businesses resources are. Finding and getting a library item should be no more complicated than searching and ordering on Amazon, or Ebay. To do this, we need to make searching Libraries Australia as easy and intuitive as possible – including providing new ways for users to browse material; and we need to make getting resources as easy as possible. This paper reports on efforts to improve the searchability of Libraries Australia. Discussions on improving the getting of Libraries Australia material are outside the scope of this paper, however, we would like to note the recent establishment of the Rethinking Resource Sharing Reference Group, which is looking at this problem. [Relevance ranking of results from MARC-based catalogues: from guidelines to implementation exploiting structured metadata]
The paper discusses potential approaches to a range of issues around ranking, tagging, clustering, recommending, and also considers the benefits of consolidation. Worth a read.
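One idea in this kind of ranking work is exploiting structured metadata, so that a match in a title counts for more than a match in a note. The Python sketch below shows simple field-weighted scoring with naive whitespace tokenization; the field names and weights are hypothetical and are not the ones used by Libraries Australia.

```python
# Hypothetical field weights: a title match counts more than a subject or note match.
FIELD_WEIGHTS = {"title": 5.0, "author": 3.0, "subject": 2.0, "notes": 0.5}

def score(record, terms):
    """Sum, over fields, the field weight times the number of query terms found in it."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = set(record.get(field, "").lower().split())  # naive tokenization
        total += weight * sum(1 for term in terms if term in words)
    return total

def rank(records, query):
    """Return records that match at least one term, best score first."""
    terms = query.lower().split()
    scored = [(score(r, terms), r) for r in records]
    # sorted() is stable, so equal scores keep their original order
    return [r for s, r in sorted(scored, key=lambda pair: -pair[0]) if s > 0]
```

In practice such weights would be tuned against real queries, and combined with other signals such as holdings counts or circulation.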
Aside: reading it, I was reminded of my suggestion that we want to 'rank, relate and recommend' better in our systems. I have changed the order from the original 'rank, recommend and relate'.
Aside: I need to update my list of entries about the catalog: Talking about the catalog.
May 29, 2007
•
Categories:
Libraries - distributed environments
, Metadata
, Standards
, ebooks and other e-resources
Link resolvers and the serials supply chain [pdf] is the title of an interesting report commissioned by the UK Serials Group and written by James Culling. From the summary:
The current knowledge base data supply chain is characterized by a complex series of roles, relationships and inter-dependencies between publishers, other content hosts, subscription agents, link resolver suppliers, libraries and others. [Link resolvers and the serials supply chain pdf]
The report argues that a lack of understanding between these stakeholders results in many inefficiencies.
It could be said that whilst the community's attention has been mostly focused on what it means to be OpenURL compliant, a code of practice and information standards to ensure knowledge base compliance and the efficient transfer of data through the supply chain have been sorely absent and overlooked. [Link resolvers and the serials supply chain pdf]
The report discusses issues from different stakeholder perspectives. It recommends the establishment of an organization which brings stakeholders together around a "code of practice for effective participation in the knowledge base supply chain". COUNTER is suggested as a model.
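As a simplified illustration of why knowledge base data matters, the Python sketch below shows a resolver-style check of whether the knowledge base records coverage for a cited journal and year. The `Holding` fields and sample entries are hypothetical; real knowledge base records carry much more (embargoes, issue-level ranges, target URLs).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Holding:
    issn: str
    provider: str
    start_year: int
    end_year: Optional[int]  # None means coverage continues to the present

def providers_for(kb, issn, year):
    """Return providers whose recorded coverage includes the cited year."""
    return [
        h.provider
        for h in kb
        if h.issn == issn
        and h.start_year <= year
        and (h.end_year is None or year <= h.end_year)
    ]
```

If a provider's coverage dates are stale, a check like this either offers a link that fails or misses one that would work, which is one concrete form the supply chain inefficiencies can take.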
My colleague Phil Norman alerted me to the publication of the report.
March 26, 2007
•
Categories:
General - distributed environments
, Libraries - systems and technologies
, Libraries - distributed environments
, Libraries - organization and services
The National Library of Australia has made an interesting report available, National Library of Australia IT Architecture Project Report, March 2007. [pdf] Here is the declared purpose:
The aim of this report is to define the IT architecture that will be needed to support the management, discovery and delivery of the National Library of Australia's collections over the next three years. The current architecture has allowed the library to develop a significant digital library capability over the last decade. Now the burden of maintaining and supporting existing systems and services is increasingly hindering us from bringing new services online, improving the user experience, exploring new ideas or responding to technological change. In the meantime, enormous changes are occurring in the broader environment.
The report identifies three major responses within the context of a new framework for digital library services (I talk about them in a different order than the one in which they are presented). One, it recommends a move to a service-oriented architecture. The grounds for this are clear, and clearly made in the report. They include the ability to share common services across applications, to be able to respond to change effectively, and to reduce over time the redundancy, cost and complexity of development.
Two, it argues for using open source solutions where they are 'functional and robust'. It notes an amendment to a prior policy which favored buying over building. The Library will now consider open source solutions based on comparisons of function and cost. The assessment of cost will include not only the direct costs of additional development but also the benefits of contributing code to the community and, interestingly, the opportunity costs of using commercial software whose development path is not aligned with library direction and need. The report notes the possibility of collaboratively sourcing some functionality with partners.
And three, the report talks about a 'single business' approach. This was the most interesting aspect of the document to me, because it underscores a major issue for libraries and the systems they deploy. This is that applications have developed in a piecemeal fashion over recent years, so that library operations are now supported by many applications, in different stages of maturity, and with different levels of process standardization. However, this ensemble of applications does not support efficient working across the range of library requirements, and inhibits flexible service development. Indeed, boundaries between these applications seem increasingly arbitrary, and to owe more to historic circumstance, and to the structure of the industry that has developed over time, than to current needs. Simply managing this diversity is a major task in itself. The ERAMS (electronic resource access and management services) discussion I mentioned a while ago is one symptom of a growing sense that the library systems landscape needs to be redefined.
The 'single business' approach is a recommendation that the library think in terms of a single 'business' and a single data corpus as part of its planning process, rather than in terms of separate planning for each service line or resource type (e.g. images, books, music). And that technical solutions be designed in ways that minimize the number of separate business applications that need to be developed. Of course, the service-oriented approach would facilitate the latter goal. In practice this would mean trying to streamline workflow across management environments for different resource types; using common delivery, rights management and other solutions; and developing a single integrated discovery environment across collections and resource types, which can be accessed through different views.
The report is well structured, and is worth reading as much for its discussion of general issues as for the particular National Library of Australia situation.
February 20, 2007
•
Categories:
Libraries - distributed environments
Ed Summers has a nice note on the WorldCat Registry of institutional profiles, talking about machine-level access through web services.
The details are available here (free worldcat.org registration required). Two web services are now available. There are full details on the pages; here is what the headlines say:
WorldCat Registry Search is a Web service that retrieves a set of "thin" records for Registry-profiled institutions or consortia that match specified criteria. The criteria could be any combination of institution name, alias, type, city, state, country, postal code, OCLC Symbol or RLG ID. The service returns a set of records in XML format that meet those criteria. The records returned by this Web service are the same information normally displayed to an unauthenticated user who conducts a search on the WorldCat Registry Web site.
WorldCat Registry Detail: This Web service retrieves details about a single institution or consortium profiled in the WorldCat Registry. The service uses the Representational State Transfer (REST, or "RESTful") software architecture and retrieves a record from the Registry based on a specified numeric WorldCat Institution Identifier (ID). Results are returned in XML.
The former uses SRU. Here is an example of the latter.
Click on the following for the XML response:
http://worldcat.org/webservices/registry/content/Institutions/14229
and here is the same entry in the (humanly) web-accessible registry:
http://worldcat.org/registry/Institutions/14229
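The detail URL above follows a simple pattern, so it can be built from an institution identifier. The Python sketch below does that, and also shows a generic SRU 1.1 `searchRetrieve` URL builder for the search service. Only the detail URL pattern comes from this post; the SRU base URL and any CQL index names are placeholders the caller must supply, and these 2007 endpoints may no longer respond.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# URL pattern taken from the example above; the service may since have moved.
DETAIL_BASE = "http://worldcat.org/webservices/registry/content/Institutions/"

def detail_url(institution_id):
    """Build the RESTful detail URL for a numeric WorldCat Institution Identifier."""
    return DETAIL_BASE + str(int(institution_id))

def sru_search_url(base_url, cql_query, max_records=10):
    """Build a generic SRU 1.1 searchRetrieve URL. The registry's search endpoint
    and its CQL index names are not given here, so both are caller-supplied."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)

def fetch_xml(url):
    """Fetch a URL and parse the XML response, returning the root element."""
    with urlopen(url) as response:
        return ET.fromstring(response.read())
```

The sketch deliberately returns the parsed root rather than assuming any particular record schema for the registry's responses.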
Update: I meant to mention that the lead architect for the registry is my research colleague Jeff Young, working closely with product and development folks in Dublin and in Mountain View.
February 19, 2007
•
Categories:
Libraries - distributed environments
, Marketing
I wrote about registries a while ago. But this is exactly the situation we are in with higher level network services where we have no such directory services. Increasingly, library applications need to know about a variety of entities. We are used to thinking about information objects (books, journals, maps, etc). What about institutions (suppliers, libraries, etc), policies (e.g. ILL policies), licenses, collections (databases, special collections, summary level descriptions of archival collections, and so on), and services (addresses and interface details for machine users, and descriptions for human users)? The absence of appropriate directory services for each of these reduces the efficiency of the network. We have an extensive infrastructure to allow us to discover and use information objects, and we are currently figuring out how that needs to be re-engineered for more effective use in a network world. However, we are very poorly equipped in the other areas. This means that there is a lot of local configuration and redundant effort in making certain applications work. [Lorcan Dempsey's weblog: Registries: the intelligence in the network]
The OpenURL Resolver Registry is an example of a service registry, the service in this case being an OpenURL resolution service.
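To show why a service registry helps with routing, here is a Python sketch in which a discovery service that knows a user's institution looks up that institution's resolver and sends an OpenURL there, instead of each application being configured locally. The in-memory dictionary stands in for a real registry lookup, and the institution identifiers and base URLs are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical registry entries: institution identifier -> resolver base URL.
RESOLVER_REGISTRY = {
    "inst-001": "https://resolver.example.edu/openurl",
    "inst-002": "https://links.example.ac.uk/resolve",
}

def route_to_resolver(institution_id, context, registry=RESOLVER_REGISTRY, fallback=None):
    """Return an OpenURL aimed at the user's own resolver, or `fallback`
    (for example a generic search page) when the institution is not registered."""
    base = registry.get(institution_id)
    if base is None:
        return fallback
    return base + "?" + urlencode(context)
```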
I am pleased to point to the WorldCat Registry, a registry of institutions. It covers primarily libraries for now, but it will grow. The introduction of this new registry has been led by my New Services colleagues with input from folks around OCLC.
Check out the informational pages: The WorldCat Registry is a Web-based directory for libraries and library consortia. It is an authoritative single source for information that defines institutional identity, services, relationships, contacts and other key data often shared with third parties. [WorldCat Registry]