I said a few entries ago that I was working on an entry on the catalog. I need to hurry up! Some big catalogish things came along this week.

First the University of California released a significant report on its bibliographic infrastructure, on how catalogs should be built, presented and managed.

Rethinking how we provide bibliographic services for the University of California
  • Full report [pdf]

  • Executive summary [pdf]

And then North Carolina State University released its new catalog to some acclaim. This is special in that they have built it themselves outside the integrated library system using Endeca software. Interestingly, the NCSU catalog incorporates many of the things discussed in the UC report, and more of them are on its development path. That suggests that there may be some convergence in thinking about features, although many other questions remain open.

Here are some of the things I took away from the UC report (my words):

  • Services. A desire to provide direct access to described items. Bibliographic systems should not run into dead-ends and disappoint the user. A recommender service is desirable. Results should be ranked in meaningful ways.
  • Bibliographic structure. A variety of metadata schema should be supported and used. The ISO 2709/MARC/AACR2 stack should not be seen as the default; schema should be appropriate to materials under consideration. There is strong support for FRBR and faceted browse to structure large result sets and provide sensible navigation options. Browse needs to be supported by controlled data for name, place, time period, and uniform title. Interestingly, the report questions whether controlled subject data will be necessary in light of table of contents and other associated data. (Subject browse is a strong feature of the NCSU catalog.)
  • At the point of need. There is a recognition that services need to be where the user is, so bibliographic services and data need to be surfaced in course management systems, institutional portals, and search engines. (The library in the user environment.)
  • Discovery: consolidation and gravitational pull. There is some discussion of unifying the UC 'bibliographic universe', recognizing that a larger integrated data resource exerts a stronger gravitational pull than multiple resources, especially where users do not appreciate the differences between resources.
  • Technical processing. There is also discussion of consolidating cataloging and other processing activity, and reducing the cost of bibliographic management. It is suggested that more data needs to be captured upstream from vendors, and that there be more selective programmatic upgrade of data downstream, which would mean less time spent on manual cataloging.
  • Platform and organization. One of the recurrent themes is how to source particular capacities. For example, what platform would a unified catalog be built on: a local integrated library system product or an external party like RLG or OCLC?
  • Value. There is an arresting statement on page 9 of the report: "we all agree that the cost of our bibliographic services enterprise is unsupportable as we move into an increasingly digital world, and a solution is nowhere in sight". Throughout the report, there is an awareness that inefficiencies represent opportunity costs and that these are increasingly intolerable in a changing world with many additional demands.

The NCSU catalog is big news. Why, it even makes it as a news item on the home page of NCSU itself!

NC State Vice Provost and Director of Libraries Susan K. Nutter says, "With this groundbreaking approach, the NCSU Libraries is responding to Web searchers who expect to retrieve results in order of relevance. The new system - the first of its kind in a library - empowers users to quickly locate the items they're looking for or to explore the multifaceted research collection in depth, exploiting both the software's cutting-edge capabilities and the library's many decades of investment in detailed cataloging and classification." [News release]

I like this reference to investment. One of the ironies of the library world is that while the creation of bibliographic data was historically central to the library mission and continues to be a major investment, we have not released its full value in systems and services. We have not made that data work as hard as we might, either in the context of the user experience or in terms of the management intelligence that can be mined from it (to support recommender systems, collection analysis, and so on).

And the NCSU catalog does indeed make much more use of the data. There are lots of nice things in it. At first sight, the Endeca faceted browse structure works well with the bibliographic data, pivoting on topic, genre, era, format, region, library, language, and author. There is also a general subject browse. It would be interesting to know how users find the approach, although it may be familiar to some from sites like shopping.com. I liked the 'send search to' option, where you could repeat the search on other catalogs and search systems. The catalog would benefit from FRBRization, and this is on the agenda for development. I am sure that there will be a lot of discussion about this over the next while, as people put it through its paces - Andrew Pace briefly reviews some features in a web4lib mail (and gives well deserved kudos to his colleague Emily Lynema who worked on the project).
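For readers who have not met faceted browse before, here is a minimal sketch of the general idea in Python. It is not how Endeca or the NCSU catalog is implemented. The records, the field names, and the `facet_counts` and `refine` helpers are all made up for illustration. The point is only that each facet is a count of values across the current result set, and choosing a value narrows that set.

```python
from collections import Counter

# Hypothetical records: titles and field names are invented for this sketch
# and do not reflect NCSU's data or Endeca's data model.
RECORDS = [
    {"title": "Civil War Letters", "format": "Book", "language": "English",
     "topic": ["History", "Correspondence"]},
    {"title": "Textile Chemistry", "format": "Book", "language": "English",
     "topic": ["Chemistry"]},
    {"title": "Histoire de France", "format": "Microform", "language": "French",
     "topic": ["History"]},
]


def _values(record, field):
    """Return a record's values for a field as a list (fields may repeat)."""
    value = record.get(field, [])
    return value if isinstance(value, list) else [value]


def facet_counts(records, fields):
    """Count how many records carry each value, per facet field."""
    counts = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            counts[field].update(_values(record, field))
    return counts


def refine(records, field, value):
    """Narrow a result set to the records that carry the chosen facet value."""
    return [r for r in records if value in _values(r, field)]


if __name__ == "__main__":
    print(facet_counts(RECORDS, ["format", "language", "topic"]))
    print([r["title"] for r in refine(RECORDS, "topic", "History")])
```

After each refinement the counts would be recomputed over the narrowed set, which is what lets a user pivot from, say, topic to format to language without running into a dead end.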

It is interesting that they use circulation data for sorting, as a measure of 'popularity'. (OCLC uses holdings counts - another type of 'intentional' data.)
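To make the idea concrete, here is a small Python sketch of sorting results by a popularity signal. It is an illustration under stated assumptions, not a description of NCSU's actual ranking: the record ids, the `circulation` table of checkout counts, and the `rank_by_popularity` helper are hypothetical. A holdings count could be swapped in for the checkout count in the same way.

```python
def rank_by_popularity(results, circulation):
    """Order results by checkout count, highest first.

    `circulation` maps a record id to a checkout count (hypothetical data).
    Records with no circulation history are treated as zero. Python's sort
    is stable, so records with equal counts keep their incoming order.
    """
    return sorted(
        results,
        key=lambda record: circulation.get(record["id"], 0),
        reverse=True,
    )


if __name__ == "__main__":
    # Invented example data for the sketch.
    results = [
        {"id": "b1", "title": "Introduction to Polymers"},
        {"id": "b2", "title": "Polymer Handbook"},
        {"id": "b3", "title": "Polymers in Textiles"},
    ]
    circulation = {"b1": 12, "b2": 40}
    for record in rank_by_popularity(results, circulation):
        print(record["title"])
```

In practice a signal like this would probably be blended with text relevance rather than used alone, since heavily borrowed items can otherwise crowd out better matches.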

It is also interesting that NCSU has chosen this route. Coming fresh from reading the UC report, I was thinking about organizational structures and the role of the catalog in the context of the wider bibliographic database spectrum when I saw the NCSU catalog. NCSU has independently created an impressive resource which they will maintain and further develop. At some stage, I hope that they discuss the considerations that prompted them to go down this path. It would be interesting to know whether this is seen as an interim approach until the market catches up, or whether we are going to see more of this kind of approach in libraries. It would also be interesting to know whether others might benefit from the work done, either through some Endeca offering or in another way. Of course, the catalog covers only a part of the library collection. Would this approach be hospitable to other data (A&I resources for example, with their differently structured data)? Over the next few years, a major issue for libraries will be thinking about how to erase some of the boundaries between databases and allow their users to prospect the full literature in easier ways. This will raise important issues about bibliographic structures and practices as the historical investment in cataloging and classification has focused on one part of the library collection. That said, it is really great seeing the data exercised in this way.

So, a significant report and a significant new catalog in one week. It is good to see this rethinking of the catalog, and the wider bibliographic apparatus, alongside the type of innovation in bringing the service to the user that Dave Pattern and others are exploring.

Full disclosure: I had been thinking about catalogs because I have been interviewed for two reports on catalogs and cataloging in the last while. One of them was the UC report discussed above. And of course, OCLC is discussed at various points in the UC report.

Comments: 5

Jan 13, 2006
Bill Landis

Haven't had the chance to wade through the UC BSTF report yet, but wanted to comment on your point about its questioning the utility of controlled subject data. First, I think calling these new systems "bibliographic" is semantically problematic. The old systems were bibliographic in that their primary intent was to manage metadata about books on shelves in libraries. The new systems need to, as the report points out, support a variety of metadata schemas. But also, they need to support a variety of different, continuously evolving, non-bibliographic (e.g., archival) approaches to managing information, where we can't safely make assumptions like the likely existence of tables of contents. Controlled subject headings will continue to have utility for managing--at broad, collection-type levels--lots of future resources that our systems will be required to manage. And when it comes to supporting faceted browsing interfaces "to structure large result sets and provide sensible navigation options", using automated clustering approaches will work soooooo much better if there are some controlled subject fields to work from (in terms of both data for clustering and information to help in weighting contributions of various data elements to clustering results). Kudos to the folks at NCSU for not abandoning some of the power of current bibliographic systems. I work for UC and will definitely be pushing from this end that we test out some of these potentially problematic assumptions before we actually start building next-generation information management and access systems!

Jan 13, 2006
K.G. Schneider

Bill, I skimmed the whole report because it's relevant to what we're doing in LII. What I take from UC's approach is that the metadata doesn't necessarily need to be the most expensive available.

NCSU is introducing major improvements to its catalog, and I applaud them with professional pride and personal envy. But it would be a mistake to attribute improvements in findability to the use of LCSH. What other metadata did they have to work with?

Within LII's system, we can manually create LII-based metadata (maybe call it a bibliofolksonomy...) for 1/4 the effort required for LCSH. It's also more user-friendly. It's faceted information, and in fact it's much richer faceting. So it's cheaper and it's better. Why LCSH? We only continue to apply and manage LCSH as legacy metadata, and also because I have a futuristic hope that the LCSH-bibliofolksonomy crosswalk might someday prove helpful to systems, well, such as UC's. But it's expensive, and I'm fully aware that the money spent on LCSH could be used for other things.

Lorcan, one area the report played footsy with and backed away from was the issue of full-text searching. I thought that was interesting, maybe because I'm so focused on addressing that problem in directory databases at present.

Great report... though are they scared to put their names on it? ;-)

Jan 13, 2006
Andrew Pace

First, I want to thank Lorcan for his thoughtful post. We're all pleased to see the accolades, but are mindful that our implementation of a new kind of catalog leaves us much to do. Lorcan has laid much of this out, as does the UC report. Several have commented that the catalog's inclusion of only one kind of metadata (usually pointing to only one kind of content) is still a weak point. Like most libraries, we are trying to reshape a new jigsaw puzzle with the homegrown, licensed, and co-developed technology. The point is that the traditional catalog does not fit in this puzzle. We needed a new tool like that provided by Endeca to integrate into a dismantled (and re-built) system. The Endeca catalog is just one piece of a puzzle that includes the new catalog, Electronic Resource Management, digital repositories, institutional repositories, web search, metasearch, and the rest. And while we're still building the puzzle, we get a really cool catalog in the meantime, with all the features that we wished it had for the past 10 years and should have had for the last 5. Is NCSU going to stop implementing a new metasearch solution? No. Stop ERM? No. Stop development of new tools? Certainly not. Libraries have spent a lot of time down-playing the catalog, but I think that's been a rationalization for its ineffectiveness as a search tool. We are confident that we've changed that.

Jan 13, 2006
K.G. Schneider

Btw, don't get me wrong, the NCSU catalog is awesome. It would be interesting seeing the same catalog using different metadata.

Jan 16, 2006
Yan Han

Well, the NCSU catalog looks great and the report is terrific. When thinking about the library catalog in business-model terms, what is the competitive edge of a catalog over google/yahoo/msn? It is the local environment. Technical stuff (metadata, easy search/find) is important, but the priority is to make the catalog tightly integrated with all the campus learning/research systems (e.g. single sign-on, different resources for different groups).