Fog Creek Software
Discussion Board




How Do Search Engines Store Indexed Pages?

I was puzzling over Google after reading a couple of posts on this board.

I was wondering how Google would index the words in a page, add priority to the words (or weighting) and refer to them.

Assume a search came in for "Fantastico".

Would the first step after the query has been submitted be to look up "Fantastico" in an index, with the index pointing to the part of the database where "Fantastico" resides?

And this in turn relates to the associated links and page summary for the query.
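To make the lookup step concrete, here is a minimal sketch of an inverted index, a map from each word to the pages that contain it. It is only an illustration: the page URLs, the function names, and the use of a raw word count as the "weighting" are all assumptions made for the example, not how Google or the Fog Creek search actually works.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build term -> {url: occurrence count} from a dict of url -> page text."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """Return URLs containing the single-word query, most occurrences first."""
    postings = index.get(query.lower(), {})
    # Raw count stands in for weighting here (an assumption for the sketch);
    # ties are broken by URL so the order is deterministic.
    return sorted(postings, key=lambda url: (-postings[url], url))


# Hypothetical pages, invented for the example.
pages = {
    "a.example/1": "Fantastico offers, fantastico prices",
    "a.example/2": "Nothing to see here",
    "a.example/3": "One fantastico mention",
}
index = build_index(pages)
print(search(index, "Fantastico"))  # ['a.example/1', 'a.example/3']
```

The expensive part (reading every page) happens once, when the index is built. Answering a query is then a single dictionary lookup plus a sort of the matching pages, which is why the lookup itself need not be slow.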

All of this could be very processing-intensive and time-consuming.

Would this also be how the "Search" feature works on a site like "Fogcreek"?

Mir
Tuesday, September 02, 2003

For a good book about indexing large bodies of text, check out Managing Gigabytes by Witten, Moffat, and Bell.

http://tinyurl.com/lz1r

Rob Mayoff
Tuesday, September 02, 2003

For something like a single website, there's a generalized text indexing/search engine from Apache called Lucene.  Web search software has been built on top of it.  There's some good documentation on the website ( http://jakarta.apache.org/lucene/ ).

Kannan Goundan
Tuesday, September 02, 2003

There's an interesting (and infamous) paper called "The Anatomy of a Large-Scale Hypertextual Web Search Engine" that describes the architecture and implementation of the prototype that became Google, at http://www-db.stanford.edu/~backrub/google.html

r1ch
Tuesday, September 02, 2003
