Fog Creek Software
Discussion Board




Not all of Google's servers give the same results

I have no idea how many servers Google has exposed for user queries - but lately, or at least I have only noticed it lately, the same search gives different results at different times.

Okay - now I am sure most of you will, fairly, say that the results have to change with time.

But I am not talking about results changing forward with time - but backward.

Howszadt?

Let's say I looked at Google's cached version of www.a.com this morning - it has been indexed recently by Google and has all the changes I uploaded last week.

Now this evening when I check Google's cached copy again - it shows the version from two months ago.

Other points I noticed:

Occasionally when I search for a name, say "Joe Blogs", the results include his latest papers, published last month.

When I go back later to find the papers and enter "Joe Blogs" in Google - I get an older set of results for "Joe Blogs" with none of the latest papers.

Perhaps the search index databases are not replicating and merging the results across all servers in a timely fashion.

This is just an observation by me - I am not a heavy user of search engines. Perhaps more frequent users can share their experiences.

Also I am sure Google gives a heavier weighting to certain websites and search queries - perhaps those are replicated more frequently across all databases and servers.

This flaw may show up with more mundane queries - ones whose pages happen to be updated more frequently.

For example, say I have a site that discusses my hobby - "teeth chattering" music. As I use CityDesk for content management, I update my site three times a week with the latest and greatest of the sounds of "teeth chattering".

Now "teeth chattering" exists in many dimensions - but it is not a hot topic and changes to the subject are infrequent. Suddenly I come along and make changes often to a web site.

This is where the flaw in Google's indexing may exist - its crawler indexes my site but places a low priority on merging and replicating the results across all servers/databases.

Or I could be horribly wrong and the above is total baloney.

Ram Dass
Tuesday, August 26, 2003

Maybe it's something to do with how (and when) Google updates its index, see: http://dance.efactory.de/

Anoymous coward
Tuesday, August 26, 2003

Yes, it happens when Google reindexes its database.

It's called the Google dance.

Robert Cappa
Tuesday, August 26, 2003

The Google architecture is interesting. They have multiple indices (the monthly, the daily, and the news). Having things move from monthly to daily or back can cause some oddities.

Then, in each index, the content is distributed and replicated. So imagine a grid of PCs. Across the top is your distribution; going from left to right, you find different slices of the index. Across the sides is your replication; going from top to bottom, you find each of those slices is replicated to multiple PCs.

Of course, the cache is also distributed and replicated.

At any given time, your search request may hit a completely different set of PCs than the last time you searched. If that index is in motion -- I imagine the monthly index in particular takes a long time to replicate -- then each time you search, you may find different results.
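
To make the grid picture concrete, here is a rough toy sketch in Python. It is only an illustration of the idea described above, not how Google actually does it: the function names, the grid size, the "July"/"August" version labels, and the random choice of one replica per slice are all assumptions made up for the example.

```python
import random


def build_grid(num_slices, num_replicas, old_version, new_version,
               updated_fraction, rng):
    """Return grid[slice][replica] = index version held by that PC.

    Hypothetical model: each PC has independently received the new index
    with probability `updated_fraction`; otherwise it still holds the old one.
    """
    grid = []
    for _ in range(num_slices):
        replicas = [
            new_version if rng.random() < updated_fraction else old_version
            for _ in range(num_replicas)
        ]
        grid.append(replicas)
    return grid


def search(grid, rng):
    """Simulate one query: pick one replica per slice at random and
    return the index version that each chosen PC answered from."""
    return [rng.choice(replicas) for replicas in grid]


def is_consistent(versions):
    """True if every slice answered from the same index version."""
    return len(set(versions)) == 1


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed scenario: the new index is halfway through replicating.
    grid = build_grid(4, 3, "July", "August", 0.5, rng)
    for attempt in range(3):
        versions = search(grid, rng)
        print(attempt, versions, "consistent" if is_consistent(versions) else "mixed")
```

While replication is only partly done, repeated calls to search() can come back with different mixes of old and new versions, which is roughly the "results going backward in time" effect from the original post. Once every PC holds the same version, every query agrees again.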

Nelson Minar from Google (hope I'm spelling his name right) gave a really fascinating talk about all this at Gnomedex last month.

Brad Wilson (dotnetguy.techieswithcats.com)
Tuesday, August 26, 2003

An article on the Google architecture that I enjoyed:

http://www.computer.org/micro/mi2003/m2022.pdf

RH
Wednesday, August 27, 2003
