Thom has a nice piece on text retrieval on ResourceShelf (arranged by Shirley).
To misquote William Gibson, "the future of text retrieval is already here, it's just not evenly distributed." High-performance searching of large files is benefiting from the continuing exponential drop in computing and storage costs, with clusters of cheap Linux boxes becoming the standard platform for searching. The drop in hardware costs, combined with recent advances in caching and computing document rankings, is making the computational cost of searches rapidly approach zero, even for very large collections. [Text retrieval, 2004. ResourceShelf]