Fog Creek Software
Discussion Board

How Do Search Engines Store Indexed Pages?

I was puzzling over Google after reading a couple of posts on this board.

I was wondering how Google indexes the words in a page, assigns a priority (or weighting) to those words, and looks them up later.

Suppose a search came in for "Fantastico".

Would the first step after the query has been submitted be to look up "Fantastico" in an index, with the index pointing to the part of the database where "Fantastico" resides?

And would that entry then lead to the associated links and page summaries for the query?
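To make the question concrete, here is a minimal sketch of that lookup using an inverted index (a map from each word to the pages containing it). It is only an illustration under stated assumptions, not how Google or any particular engine actually works: the sample pages, the names `PAGES`, `build_index` and `search`, the use of a raw occurrence count as the "weighting", and the truncated page text as the "summary" are all made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical sample pages: URL -> page text (invented for illustration).
PAGES = {
    "http://example.com/a": "Fantastico is a fantastico word for something fantastic.",
    "http://example.com/b": "Search engines index every word on a page.",
    "http://example.com/c": "One page mentions Fantastico only once.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> {url: number of occurrences}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, pages, query, summary_len=40):
    """Look up one word and return (url, weight, summary) tuples, best first.

    Assumption: the weight is just the occurrence count, and the summary
    is just the first few characters of the page.
    """
    postings = index.get(query.lower(), {})
    # Highest count first; ties broken by URL so the order is stable.
    ranked = sorted(postings.items(), key=lambda item: (-item[1], item[0]))
    return [(url, count, pages[url][:summary_len]) for url, count in ranked]


index = build_index(PAGES)
for url, weight, summary in search(index, PAGES, "Fantastico"):
    print(weight, url, summary)
```

The expensive part (reading every page) happens once, when the index is built. Answering a query is then a single dictionary lookup plus a sort of the matching pages, which is why the per-query cost can stay small even when the collection is large.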

All of this could be very processing-intensive and time-consuming.

Would this also be how the "Search" feature works on a site like "Fogcreek"?

Tuesday, September 2, 2003

For a good book about indexing large bodies of text, check out Managing Gigabytes by Witten, Moffat, and Bell.

Rob Mayoff
Tuesday, September 2, 2003

For something like a single website, there's a generalized text indexing/search engine from Apache called Lucene. Web search software has been built on top of it. There's some good documentation on the website.

Kannan Goundan
Tuesday, September 2, 2003

There's an interesting (and infamous) paper called "The Anatomy of a Large-Scale Hypertextual Web Search Engine" that describes the architecture and implementation of the prototype that became Google.

Tuesday, September 2, 2003
