Metasearch: a boundary case

A couple of metasearch reports have recently been released. One, carried out as part of an NSDL project at the California Digital Library, proposes 'approaches, principles and practices' which might be applied by anybody evaluating integrated search options [pdf]. The second, the RLG Metasearch Survey Report, discusses member experiences of and expectations for metasearch. Roy Tennant, one of the authors of the former, comments on the latter at hangingtogether.org.

The reports raise many issues, especially when laid alongside a more general discussion about how library services are presented to users. To this I will return; in the interim, a few remarks on metasearch:

Advances?
Metasearch has come onstage in a big way in the last couple of years: there is now a variety of products available and many libraries are implementing them. However, the concepts, technologies and approaches that they adopt have been in currency for many years. Index Data and Fretwell Downing, amongst others, or indeed OCLC with SiteSearch, have many years of experience deploying metasearch approaches. There is also quite a record of discussion of some potential features: creating an individualized 'landscape' based on some match between a representation of user interests and a representation of collections and services available, alerting, metadata schema and terminology merging, deduplication, forward knowledge based on collection description or an index, and so on. What has changed most over the years is the emergence of the Amazoogle search experience and the recognition that fragmentation reduces the gravitational pull of library resources. The renewed emphasis on metasearch is one response to this - and the NISO Metasearch Initiative responds to a recognition that despite several years of deployment it needs to work better. How do you avoid some of the current inefficiencies of interaction which make life difficult for the data provider and the metasearch application supplier?
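To make one of those long-discussed features concrete: deduplication is the step where a metasearch application merges result sets from several databases and collapses records that describe the same item. A minimal sketch follows; the record fields ('title', 'year') and the normalisation rule are illustrative assumptions, not the behaviour of any particular product.

```python
import re

def normalise(title):
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", title.lower())).strip()

def deduplicate(records):
    """Keep the first record seen for each (normalised title, year) key."""
    seen = set()
    merged = []
    for rec in records:
        key = (normalise(rec["title"]), rec.get("year"))
        if key not in seen:
            seen.add(key)
            merged.append(rec)
    return merged

# Two of these three records are duplicates under the key above.
results = deduplicate(
    [{"title": "Metadata Matters!", "year": 2004},
     {"title": "metadata  matters", "year": 2004},
     {"title": "Metadata Matters", "year": 1999}]
)
```

Real products use richer keys (author, ISBN, pagination) and fuzzier matching, which is exactly where the interaction inefficiencies mentioned above start to bite.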

Incentives?
How to explain this lack of progress over the years? There seem to be social or business factors delaying forward progress: what incentives are there for parties to change or improve the situation? One major incentive for the library is clear, and was mentioned above: to reduce fragmentation and increase gravitational pull for the user. At the same time, Ben Toth points to a counter-trend in a comment on another post.

I know it's a bit of a generalisation, but professionally we've had little incentive to simplify search experience for users and quite a lot of incentive to emphasise the complexity and mystery surrounding search.
Various library activities are indeed bound up with that complexity. And he goes on to touch on data provider incentives:
It's not just the fault of librarians - the industry is locked into a business model - creating and maintaining large sets of metadata - that is increasingly irrelevant to connecting users with the content they need. [Lorcan Dempsey's weblog: Simpler search]
A major issue that metasearch is trying to address is that boundaries may fall in different places on the demand and supply sides. On the demand side, one wants to present data in terms of user interest, for which purpose database or technical boundaries may be unhelpful. On the supply side, databases are provided by many providers, some of whom may be concerned about their distinctiveness disappearing behind somebody else's interface. They may want the user to be very aware of the boundary between their data and other people's. (I refer to this as the 'brandscape' factor elsewhere, where the interests of individual providers may override the interests of the overall user experience.) It is also interesting to wonder about the distinctiveness of current metasearch providers and what impact more streamlined metasearch would have on their position in the value chain. How does that play into incentives for change? So, while there may be general assent to the benefits of more streamlined metasearch capacity, incentives for librarians, data providers, and metasearch application providers may not all be clearly aligned around this direction.

One brick in the wall
Metasearch is not an end in itself, although we sometimes talk about it as if it were. The aim is to provide search services at the level of database combination that makes sense for the user, to provide guidance on those combinations, and to present the services in ways which make sense in user environments. This last point is important; one may want to present a metasearch service as a web page, as a box in a reading list or course page, as a machine interface which other applications talk to, and so on. Metasearch, like all other library services, will be part of an eco-system of services. One can talk of its place in the discover-locate-request-deliver chain, and we have seen much work of late providing integration with resolution and fulfilment services of various kinds, so that the user can move from discovery to fulfilment in a more streamlined way. Increasingly, we may want data to flow more easily (to work with reference/citation managers), or to mix metasearch capacity into particular environments (a course apparatus is an example). In some cases, a search may bring back updated results against a particular stored query. Some users might like the ability to set up searches whose results can be viewed in their RSS aggregator. And so on. Search - and metasearch - is only a part of what a library user wants to do - it needs to be integrated into a variety of workflows.
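The 'machine interface which other applications talk to' is the least hand-wavy of these: protocols such as SRU make a search service addressable as a plain HTTP GET with well-known parameters, so a course page, citation manager, or RSS generator can query it directly. A sketch of building such a request, using the SRU 1.1 parameter names; the endpoint URL is a placeholder, not a real service.

```python
from urllib.parse import urlencode

def sru_url(base, query, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,            # CQL, e.g. dc.title = "metasearch"
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base + "?" + urlencode(params)

# Any application that can fetch a URL can now use the search service.
url = sru_url("http://example.org/sru", 'dc.title = "metasearch"')
```

A stored query is then just a saved URL; re-fetching it and serialising the response as a feed is what the RSS-aggregator scenario above amounts to.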

Alternatives?
Now, in the last section, I may have been a touch heavy on the qualification. This is because of the difficulties involved in providing some of these services effectively, and the lack of progress I noted over recent years. It is for this reason that I wondered a while ago if it might make more sense to attack the boundary issue differently, by working on business and technical approaches which would result in fewer, larger resources to search. This would reduce the complexity of boundary spanning by pushing data integration and other issues upstream, at the cost of putting more burden on the search system to make discriminations that have been lost. It does also raise the question of how much difference is useful. This would require significant change in how we currently manage the data supply side, but we are living in a time of significant change.

Comments: 4

Aug 29, 2005
Judith Pearce

We are having a debate at the National Library of Australia about the value of fielded searching and the need for distributed search protocols that support complex query languages. The case being put is different from the "Z39.50/SRW/SRU/CQL is too hard to implement" argument. The question being asked is: "Could it be that, instead of trying to address the semantic interoperability issues always inherent in implementing a complex query language, we could spend our efforts better on making 'search' become 'find' by exploiting the semantic relationships inherent in, or deducible from, the content being searched?" This is closely related to the idea of building fewer, larger targets with a fairly clearly defined scope that can be subject to such analysis at the server end, leaving what is presented to server choice via a simple search and retrieval protocol. This raises questions about the role of the 'collection aggregator' at the portal end versus the 'metadata aggregator' at the target end, in terms of interpreting user input and presenting results. Leaving all the 'smarts' to the targets and just displaying the results like a9.com would be the simplest approach, but ideally, the collection aggregator would first analyse the intent of the search (and possibly what it knows about the searcher) to select (or emphasise) specific targets for each search and to cluster results.

Aug 30, 2005
Thom Hickey

Can't resist pointing to my blog entry about this.

--Th

Aug 30, 2005
Ralph LeVan

I think there's an 80/20 rule here. 80% of the searches can probably be solved with unfielded Find-type searching. The question is, how much work are we willing to do to satisfy the missing 20%?
