Some recent work here has made my colleagues interested in hardware again, and it also shows how far we have come in being able to manipulate and move large amounts of data.

WorldCat is our union catalogue of about 56 million bibliographic records, which represent approximately a billion holdings. It is about 50 gigabytes in MARC Communications format (100+ gigabytes in XML) and about 23 gigabytes compressed.

OCLC Research recently acquired a 24-node (48-CPU) Beowulf cluster with 96 gigabytes of memory. According to my colleague Thom Hickey, whose team has been working on the machine, the cluster speeds up most bibliographic processing by about a factor of 30. This means that what might have taken a minute now takes two seconds, what might have taken an hour takes two minutes, and what might have taken a month takes a day. For jobs that fit entirely in memory (e.g. a 'grep' of WorldCat), avoiding disk I/O gives another factor of about 20, reducing 1-hour jobs to 6 seconds. We can 'frbrize' WorldCat on the cluster in about an hour.
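
To make the arithmetic behind those figures concrete, here is a minimal Python sketch. The function and constant names are made up for illustration, a "month" is assumed to be 30 days, and the two factors (30 and 20) are the approximate ones quoted above, so treat the results as rough estimates rather than measurements.

```python
# Approximate speed-up factors quoted in the post (not measured here).
CLUSTER_SPEEDUP = 30      # cluster vs. previous processing
IN_MEMORY_SPEEDUP = 20    # extra gain when the job avoids disk I/O

MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR
MONTH = 30 * DAY          # assumption: a 30-day month


def cluster_time(seconds, in_memory=False):
    """Estimate the run time on the cluster for a job that used to take `seconds`."""
    factor = CLUSTER_SPEEDUP
    if in_memory:
        # The two gains multiply: 30 * 20 = 600.
        factor *= IN_MEMORY_SPEEDUP
    return seconds / factor


if __name__ == "__main__":
    print(cluster_time(MINUTE))                 # seconds for a former 1-minute job
    print(cluster_time(HOUR) / MINUTE)          # minutes for a former 1-hour job
    print(cluster_time(MONTH) / DAY)            # days for a former 1-month job
    print(cluster_time(HOUR, in_memory=True))   # seconds for an in-memory 1-hour job
```

The in-memory case is simply the two factors multiplied together, which is where the 6-second figure for a former 1-hour job comes from.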

WorldCat is also now more mobile. Thom has a 40 gig iPod which can accommodate WorldCat on its disk with room left for 5,000 song tracks. Now, you can't do much with the data on the iPod, but you can certainly carry it around. Again, it takes about an hour to get it on and off the iPod.
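
As a rough sanity check on those iPod numbers, the sketch below works out how much space is left per song track and what transfer rate an hour implies. It assumes the copy on the iPod is the 23-gigabyte compressed form (the 50-gigabyte MARC form would not fit), that a gigabyte is 1,000 megabytes, and that the full 40 gigabytes are usable. The names are illustrative only.

```python
# Figures taken from the post; units are decimal (1 GB = 1000 MB), an assumption.
IPOD_CAPACITY_GB = 40
WORLDCAT_COMPRESSED_GB = 23   # assumption: the compressed form is what gets copied
SONG_TRACKS = 5000
TRANSFER_SECONDS = 3600       # "about an hour"
MB_PER_GB = 1000


def space_per_track_mb(capacity_gb=IPOD_CAPACITY_GB,
                       data_gb=WORLDCAT_COMPRESSED_GB,
                       tracks=SONG_TRACKS):
    """Megabytes available per song once the data is on the disk."""
    return (capacity_gb - data_gb) * MB_PER_GB / tracks


def transfer_rate_mb_per_s(data_gb=WORLDCAT_COMPRESSED_GB,
                           seconds=TRANSFER_SECONDS):
    """Average transfer rate implied by moving the data in the given time."""
    return data_gb * MB_PER_GB / seconds


if __name__ == "__main__":
    print(f"{space_per_track_mb():.1f} MB per track")
    print(f"{transfer_rate_mb_per_s():.1f} MB/s average transfer rate")
```

Under these assumptions the leftover space comes to a few megabytes per track, which is in line with typical compressed audio files of the time.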

Thom adds: "Much as 'quantity has a quality all its own', such a speed-up changes both the approaches that can be taken to solving problems and the type of problems that can be tackled at all. Google typically works with clusters of 3,600 CPUs, raising performance to yet another level."

Comments

Jan 24, 2005
Roy Tennant

I am unbelievably happy to see OCLC Research thinking this way. The key is that this technique "changes both the approaches that can be taken to solving problems and the type of problems that can be tackled at all." No kidding. I mean, a grep of WorldCat, who would have thought? You must be able to find out all kinds of interesting (and troubling) things. I can't believe how much you guys "get it." Now we need the rest of the profession to catch up.

Jan 24, 2005
Randy

Lorcan: Amazing news, and I agree with Roy. My question: where are the instructions for downloading WorldCat to one's iPod, or is this strictly internal to OCLC at the moment? - Randy

Jan 25, 2005
Lorcan

Randy, thanks for this kind note. At the moment this activity is within OCLC Research. We do not currently deliver WorldCat in this way. Certainly makes you think about what types of service model might be possible in the future though!