How Do Search Engines Store Indexed Pages?
I was puzzling over how Google does this after reading a couple of posts on this board.
For a good book about indexing large bodies of text, check out Managing Gigabytes by Witten, Moffat, and Bell.
For something like a single website, there's a generalized text indexing/search engine from Apache called Lucene. Web search software has been built on top of it. There's some good documentation on the website ( http://jakarta.apache.org/lucene/ ).
There's an interesting (and famous) paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Lawrence Page, that describes the architecture and implementation of the prototype that became Google: http://www-db.stanford.edu/~backrub/google.html
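To give a rough idea of what these references are about: the core structure they all cover is the inverted index, which maps each term to the list of documents that contain it. Here's a minimal sketch in Python, not how Lucene or Google actually implement it. The names (tokenize, build_index, search) and the toy documents are made up for illustration, and real engines add compression, term positions, ranking, and on-disk storage on top of this.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to a sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents that contain every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical toy corpus, for illustration only.
docs = {
    1: "Lucene is a text indexing and search library",
    2: "Google started as a research prototype",
    3: "Managing Gigabytes covers indexing and compression of text",
}
index = build_index(docs)
print(search(index, "text indexing"))
```

The point is that a query never scans the pages themselves. It looks up each term's posting list and intersects the lists, which is why lookups stay fast as the collection grows.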