My colleague Thom Hickey has a short article in the current OCLC Newsletter expanding on our experiments with the Beowulf Cluster to speed up processing.
We obtained the machine to investigate parallel text searching. At OCLC we have always searched our databases in parallel, but in as few pieces as we could. In this project we took the opposite approach: break our database into as many pieces as we could, search each at the same time, and then deal with the coordination needed to return a single result to a searcher. We are finding this works very well for searching, but, more generally, we have found it to be useful for virtually any work with large numbers of bibliographic records. WorldCat now contains well over 55 million records, even after accounting for records that have been deleted or merged over the years. Since our cluster has 24 separate nodes with a total of 48 processors, we typically get 30-fold speedups in processing, and occasionally much more than that because the entire database can be cached in main memory.
I mentioned this a while ago in an entry about how developments in hardware were interesting again.
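The split, search, and merge pattern described above can be illustrated with a toy sketch. This is not OCLC's code: the function names, the (id, title) record layout, and the word-count scoring are all assumptions made for illustration, and a thread pool on one machine stands in for the cluster's nodes, so it shows the coordination step rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def split_into_shards(records, n_shards):
    """Deal records round-robin into n_shards lists (hypothetical helper)."""
    return [records[i::n_shards] for i in range(n_shards)]


def search_shard(shard, term):
    """Search one shard; return (score, record_id) pairs for matching records.

    The score is simply how many times the term occurs as a word in the
    title, an assumption chosen to keep the example small.
    """
    term = term.lower()
    hits = []
    for record_id, title in shard:
        score = title.lower().split().count(term)
        if score:
            hits.append((score, record_id))
    return hits


def parallel_search(records, term, n_shards=4, limit=10):
    """Search every shard at the same time, then merge into one ranked list."""
    shards = split_into_shards(records, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partial_results = list(pool.map(lambda s: search_shard(s, term), shards))

    # Coordination step: combine per-shard hits into a single result,
    # highest score first, ties broken by ascending record id.
    all_hits = [hit for hits in partial_results for hit in hits]
    best = heapq.nsmallest(limit, all_hits, key=lambda h: (-h[0], h[1]))
    return [record_id for _, record_id in best]


if __name__ == "__main__":
    sample = [
        (1, "Beowulf"),
        (2, "Beowulf and Beowulf"),
        (3, "Grendel"),
        (4, "The Beowulf poet"),
    ]
    print(parallel_search(sample, "beowulf", n_shards=3))
```

The merged ranking does not depend on how many shards are used, which is what lets the number of pieces be pushed as high as the hardware allows. On a real cluster each shard would live on its own node, ideally cached in that node's memory.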