Business intelligence

The collection and analysis of data about collections, systems and services is becoming increasingly important for libraries, for both management and accountability purposes. Here are some examples of where such data might be useful: refining systems based on usage data, developing data-driven knowledge of user behaviors, supporting collection development decisions, and so on.

In this context I was interested to read the proposal to develop a "Scholars' Portal Use Data Service", included as an appendix to the 2006/07 Scholars Portal plan [pdf] of the Ontario Council of University Libraries (OCUL). The stated aim is to collect and consolidate resource use data to support collection development and other management decisions. The Scholars Portal pulls together metasearch, access to electronic journals, personal citation management, and resource sharing services (the latter based on OCLC PICA's VDX). It states its goal as follows:

... to provide access to scholarly electronic resources through a set of tools which allow the networked scholar to search, save and integrate these resources with their teaching and learning to foster greater learning opportunities.

They note the need to collect data from a range of suppliers, to integrate it with local resolution and resource sharing data, and to provide relevant management reports on that data. Since the full service envisaged here is not currently available from any vendor, they will need to develop it themselves.

This is a general need, and it is an area where we will see much more activity over the next while. A couple of significant environmental trends support this. One is the development of the COUNTER and SUSHI agreements, which enable usage data to be shared across services in a standard way. Another is the emergence of products and services in the business intelligence category, such as ScholarlyStats or, with a different emphasis, our WorldCat Collection Analysis service. ScholarlyStats is provided by MPS Technologies, which provides outsourcing services for publishers.
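To make the consolidation step concrete, here is a minimal sketch of the kind of merging such a use data service performs. It assumes simplified per-vendor CSV exports with columns for title, ISSN, month and full-text requests; real COUNTER JR1 reports have a more elaborate layout, so treat the file names and columns here as placeholders:

```python
import csv
from collections import defaultdict

def consolidate_usage(report_paths):
    """Merge per-vendor usage exports into title-level totals.

    Assumes each CSV has columns: title, issn, month, fulltext_requests.
    This simplified layout stands in for real COUNTER JR1 reports.
    """
    totals = defaultdict(int)  # (title, issn) -> total full-text requests
    for path in report_paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                totals[(row["title"], row["issn"])] += int(row["fulltext_requests"])
    return totals

if __name__ == "__main__":
    # Hypothetical exports, one file per supplier.
    merged = consolidate_usage(["vendor_a.csv", "vendor_b.csv"])
    for (title, issn), n in sorted(merged.items(), key=lambda kv: -kv[1]):
        print(f"{title} ({issn}): {n} full-text requests")
```

Set alongside cost data from the acquisitions system, totals like these feed directly into cost-per-use and similar collection development measures.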

Comments: 3

Mar 05, 2007
Amos Lakos

What is curious to me is that it is taking so long for libraries to apply business intelligence tools to collections decision making.

At the University of Waterloo and in the Tri-University environment this has been ongoing for the last eight years, with some degree of success, although adoption was slow.

The Ontario Scholars Portal is a positive example of developing networked analytics and then distributing the analysis and reports to the local libraries.

Because of the dearth of local leadership and skills, I think the trend will increasingly be toward developing consortium-based analytics frameworks for collections, and not only analytics but also centralized collection management of both print and digital materials.

Applying business intelligence solutions at the consortium level may foster other types of "collections" collaborations and opportunity-cost applications.

Mar 08, 2007
Joe Zucca

Libraries are indeed working very slowly to apply what Amos calls business intelligence tools to management. What seems curious, in addition to the pace, is the relative absence of any discussion of the architectural requirements and features of such tools. Using a standardized XML model, SUSHI takes an infrastructural approach to harvesting COUNTER data, and that makes SUSHI a guide of sorts. Is there a similar way to capture, structure, and integrate data for a wider range of library services and make them available for analysis? Imagine, if you will, an XML metrics document composed of a medley of service attributes that together comprise a library event schema.

Every user interaction with the library, no matter how distinctive (a book circulation, an e-journal connection, a courseware access, an instructional session, at some libraries even a photocopier use), can be resolved into a set of properties that lend themselves to representation in the branching layers of an XML document. As an example, here are a few possible event attributes; a sketch of one serialized event follows the list:

o Environment (date, time, IP address, handle, building, service point...)
o Population (user status, affiliation, tenure, age cohort, gender...)
o Content (title, publisher, URL, DOI, ISSN, ISBN, course number...)
o Service Genre (consultative or instructional, cataloging, circulation, login, download, image display...)
o Library and University Program (course level, school org-code, library org-code, staff_id...)
o Budget (fund code, budget category, dollar amount...)

This is just a rough illustration. Suffice it to say, the metrics architecture that might exploit such a schema could be comprehensive, demographically informed (a critical ingredient of business intelligence that neither COUNTER nor any vendor can provide), and integrative for enterprise or even multi-enterprise level analysis.

For several reasons, it is realistic to believe such an architecture is feasible. The very systems we use to deliver service today are rich with the information needed to create comprehensive warehouses of library data. The parsing and indexing technologies, and the repository solutions necessary to begin development, are all part of the standard tool kit used in most large libraries and library consortia. Finally, and more generally, rather than an expensive new add-on, the metrics architecture is a repurposing of investments many institutions will have to make in their digital libraries just to keep pace with the evolution of technology and services. Strategically (and this, I think, is the larger lesson to take from OCUL's Scholars' Portal that Lorcan cites), we should see the service and business intelligence environments as two sides of the same coin. The challenge is to work through the various standardization and plumbing issues to place intelligence systems on a par with other kinds of digital development.

It's a honking big project, but one that librarians and IT experts ought to be discussing. At the University of Pennsylvania Library, we're making a stab at it as part of our Digital Library infrastructure planning, which encompasses the Library's Data Farm program. An early version of the event schema (metric.doc) outlined above is currently under development using EZproxy logs, with a broader implementation scheduled to start this summer. We would welcome further comment.
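As a postscript, here is a rough sketch of that first step against the proxy logs: mapping one log line onto the event clusters above. It assumes the NCSA-style log format EZproxy can be configured to write; the regular expression and field names are ours for illustration, not part of any product:

```python
import re

# NCSA-style log line of the kind EZproxy can be configured to emit.
# The pattern and target field names are illustrative, not a standard.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d+) \S+'
)

def log_line_to_event(line):
    """Resolve one proxy log line into event-schema attribute clusters."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "environment": {"ip": m.group("ip"), "timestamp": m.group("timestamp")},
        "population": {"user": m.group("user")},  # joined to demographics later
        "content": {"url": m.group("url")},
        "service_genre": "download" if m.group("status") == "200" else "attempt",
    }

sample = ('128.91.0.1 - jdoe [08/Mar/2007:14:22:05 -0500] '
          '"GET http://www.example-journal.org/article/123 HTTP/1.1" 200 5120')
print(log_line_to_event(sample))
```

In practice the user field would be joined, in de-identified form, against directory data to supply the population attributes.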

Mar 08, 2007
Amos Lakos

Joe - I like your vision. I agree that in a web-based, XML-based environment, and especially if libraries increasingly use networked, transactional, real-time environments built on agreed-upon standards, the possibility of creative analytics is a given.

My concern is that the transition to such environments is too slow, and that the profession and the academic environment (at least) are not making this actionable or a priority. What I wonder is this - can we change to a framework of decision making that takes data, analytics and evidence as the basis for decisions?

On top of that, even if your solution actually works, I still do not believe it will be successfully implemented in most local libraries. I think that really successful use of such a schema will come from adapting it to consortial, state or national organizations, which can then enable local entities to mine the data or to acquire customized reports.

Also, we could take a page from the Google development model - start as a beta, and possibly keep it beta.