On demand book search again ...

In December 2005 I wrote:

Now, one potential advantage of the book mass digitisation initiatives currently underway is that they are potentially creating a 'book content index' in the way that the search engines currently have a 'web content index'. Amazon is opening up a business which makes that 'web content index' available to other applications through its APIs. Which leads to an interesting question: Will Amazon open up its 'search inside the book' indexes in this way also (or can it)? Or will another player - Google for example - develop such a service? Or ... Does anybody yet have a critical mass, or will they soon?
Such a service would be very useful, and if offered in an appropriate way could be integrated into library catalogs or other library services. Indeed, libraries could build vertical applications on top of such a service.
It seems that within a few years we will have a book content index. One of the questions for the library community will be how to use it. Another will be how to make sure that parts of the scholarly or cultural record that are not attractive to current mass digitization initiatives are not rendered less accessible over time because they are not being indexed in this way. [Lorcan Dempsey's weblog: On demand book search]

I was thinking of this entry when writing about the latest Google announcement last week. Although we now have much more indexed book text online, we don't (yet?) have the types of interfaces that would allow access to the book text from other services. It will be a pity if books continue to be less accessible than the web.
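
By way of illustration, here is a hypothetical sketch of what such an interface might look like to a library application. Everything in it is invented: the endpoint, the parameters and the response fields are stand-ins for a book content index service that does not (yet) exist.

    # Hypothetical sketch of a 'book content index' service as seen from a
    # library application. The endpoint, parameters and JSON fields are all
    # invented for illustration; no such public service existed at the time.
    import json
    import urllib.parse
    import urllib.request

    def search_book_text(query, limit=10):
        """Query an imagined full-text book index; return hits carrying
        enough metadata (identifier, title, snippet) to link into a catalogue."""
        url = ("https://bookindex.example.org/search?"
               + urllib.parse.urlencode({"q": query, "limit": limit}))
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["hits"]

    for hit in search_book_text("on demand book search"):
        print(hit["isbn"], hit["title"], "-", hit["snippet"])

A vertical application would do little more than this: pass a query through, then use the identifiers in the hits to link back into its own catalogue records.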

Aside: I was prompted to do this entry by the fact that the original post popped up in my logs as one of the top entries looked at so far in June. The topic must be in the air ....

Comments: 1

Jun 12, 2007
Judith Pearce

Incorporating full-text indexes into library catalogue searches is another of those problem areas where you wonder if the real answer is just to throw away library catalogues and use search engines, particularly now that Google is setting precedents for integrated search and we are seeding search engines with our records.

Somehow, though, there does still seem to be a business case for libraries to host their own discovery services based on local collections.

Here at the National Library of Australia, just as we are starting to address the challenge of getting nice, fully FRBRised, relevance-ranked and clustered search results from a centralised data corpus, we need to start thinking about searching the whole book. We already have full-text indexes to our own locally hosted content, so it makes sense to extend this to externally hosted content. Our Library Labs prototype at http://ll01.nla.gov.au/ does search Google Books at the moment, but the results are not at all well integrated into the rest of the page. And we would need to target multiple external sources to get full coverage.
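
For illustration, a minimal sketch of pulling Google Books hits into a page in a more integrated way. It uses Google's public Books API (the books/v1 volumes endpoint), which postdates this comment; the field names follow that API's documented JSON, and the reshaping into catalogue-style hits is an assumption.

    # Minimal sketch of querying Google Books and reshaping the hits so they
    # could sit alongside local catalogue results. Uses Google's public
    # Books API (books/v1 volumes endpoint), which postdates this comment;
    # field names follow that API's documented JSON.
    import json
    import urllib.parse
    import urllib.request

    def google_books_hits(query, max_results=5):
        url = ("https://www.googleapis.com/books/v1/volumes?"
               + urllib.parse.urlencode({"q": query, "maxResults": max_results}))
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        for item in data.get("items", []):
            info = item.get("volumeInfo", {})
            yield {
                "title": info.get("title"),
                "authors": info.get("authors", []),
                "preview": info.get("previewLink"),
            }

    for hit in google_books_hits("FRBR"):
        print(hit["title"], "-", hit["preview"])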

I've been wondering whether we need to treat full-text indexes as metadata that can be harvested and aggregated. (Wasn't there a project in the 1990s that proposed something like this?)
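
One way to picture that is to harvest index entries over OAI-PMH, the protocol libraries already use for metadata harvesting. In the sketch below, ListRecords and the oai_dc prefix are real parts of the protocol, but the endpoint, and the whole idea of a repository exposing full-text index entries as records, are hypothetical.

    # Sketch of treating a full-text index as harvestable metadata, via
    # OAI-PMH. ListRecords and the oai_dc prefix are real protocol parts;
    # the endpoint, and the idea of exposing index entries as records, are
    # hypothetical.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"

    def harvest(base_url, metadata_prefix="oai_dc"):
        """Yield <record> elements from a single ListRecords response.
        (A real harvester would also follow resumptionTokens.)"""
        url = base_url + "?" + urllib.parse.urlencode(
            {"verb": "ListRecords", "metadataPrefix": metadata_prefix})
        with urllib.request.urlopen(url) as resp:
            tree = ET.parse(resp)
        for record in tree.iter(OAI + "record"):
            yield record

    # Hypothetical endpoint exposing full-text index entries as records.
    for rec in harvest("https://indexes.example.org/oai"):
        ident = rec.find(OAI + "header/" + OAI + "identifier")
        print(ident.text if ident is not None else "(no identifier)")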

Of course, this is not something that every library can do, so it raises the other question of whether online library services are best delivered as logical views of federated collections of metadata and full-text indexes. Or does this bring us back to Google and the types of interfaces that might let us build our own services over Google or other search engines?
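
To make the federated option concrete, a deliberately naive sketch: one query fanned out over several sources, local and external, with the results simply concatenated. A real service would need ranking, de-duplication and clustering on top; every name here is a stand-in.

    # Deliberately naive sketch of a 'logical view' over federated sources:
    # fan one query out to several search callables and concatenate the hits.
    # Both sources below are stand-ins for a local index and an external one.
    from itertools import chain, islice

    def federated_search(query, sources, limit=10):
        """sources: callables, each taking a query and returning hits."""
        hits = chain.from_iterable(source(query) for source in sources)
        return list(islice(hits, limit))

    local_index = lambda q: [f"local hit for {q!r}"]
    external_index = lambda q: [f"external hit for {q!r}"]
    print(federated_search("book search", [local_index, external_index]))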