Fog Creek Software
Discussion Board




Odd Google Observation

<observation>

Google claims it "ignores common words and characters such as 'where' and 'how'".  But I noticed that including such words in a search will yield different results from when they are omitted.  For example...

http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=what+is+salad+cream

http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=salad+cream

This struck me as odd, since I have gotten in the habit of filtering the words myself when I enter a query.

</observation>


So, is Google deliberately misinforming the user?  If so, why?

On a (sorta) related note, have you ever deliberately told your users something untrue about your software's functionality?  If so, why?

Google User
Wednesday, May 05, 2004

> On a (sorta) related note, have you ever deliberately told
> your users something untrue about your software's
> functionality?  If so, why?

Of course! I think everyone does. It's like when Dell says the PC has a 40GB hard drive but it really turns out to be 38.2GB, or when a potential customer asks whether your product can do X and you say yes even though it doesn't do it yet.

The reason? Sales guys lie quite a bit to make their product look good.

I don't know if that's the case with Google...

grunt
Wednesday, May 05, 2004

I've seen differences in results even with the same query. I put this down to load-balancing; presumably different Google servers may have slightly different views of the index at a given time.

So maybe they do filter those words, as they claim.

MugsGame
Wednesday, May 05, 2004

They generally ignore noise words in the query but the noise words ARE indexed in case you use quotes for an exact match.

(e.g. "United States of America" in quotes returns that exact string, which means "of" must be indexed.)

"What is" is not a noise word anyway; it's a sign that you're looking for a definition, which Google does not ignore.

Joel Spolsky
Fog Creek Software
Wednesday, May 05, 2004

The reason is that Google also does a proximity search, and common words are not excluded from it...

This snippet from www.googleguide.com might help:

Google favors results that have your search terms near each other.

Google considers the proximity of your search terms within a page. So the query [ snake grass ] finds pages about a plant of that name, while [ snake in the grass ] tends to emphasize pages about sneaky people. Although Google ignores the words "in" and "the," (these are stop words), Google gives higher priority to pages in which "snake" and "grass" are separated by two words.
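
To make the proximity idea concrete, here is a minimal sketch in Python. It is not Google's algorithm: the stop-word list, the function names, and the scoring rule are all assumptions made up for illustration. Stop words are dropped as search terms, but the positions they occupied in the query still determine how far apart the remaining words are expected to be on a page.

```python
# Hypothetical stop-word list; Google's real list is not public.
STOP_WORDS = {"a", "in", "of", "the", "to"}

def content_positions(text):
    """Return (word, position) pairs for non-stop words, keeping original positions."""
    words = text.lower().split()
    return [(w, i) for i, w in enumerate(words) if w not in STOP_WORDS]

def query_gaps(query):
    """Return (first, second, distance) for consecutive content words in the query."""
    pos = content_positions(query)
    return [(a, b, j - i) for (a, i), (b, j) in zip(pos, pos[1:])]

def proximity_score(query, page):
    """Count query word pairs that occur in the page at the same distance as in the query."""
    words = page.lower().split()
    score = 0
    for first, second, gap in query_gaps(query):
        for i, w in enumerate(words):
            if w == first and i + gap < len(words) and words[i + gap] == second:
                score += 1
                break
    return score

plant_page = "snake grass is a hardy plant"
idiom_page = "he is a snake in the grass"
print(proximity_score("snake grass", plant_page), proximity_score("snake grass", idiom_page))
print(proximity_score("snake in the grass", plant_page), proximity_score("snake in the grass", idiom_page))
```

In this toy model the two queries share the same content words, yet each one favors a different page, because the ignored words still set the expected distance between "snake" and "grass".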

Code Monkey
Wednesday, May 05, 2004

Joel-
It is true that it kicks out a definition response when you use "what is", but there are other differences.  Look lower in the list of results.

They must be using "what" and "is", even though they go to the trouble of stating "The following words are very common and were not included in your search: what is. [details]"


grunt-
I agree that marketing will say anything to move a product.  But in this case, the misinformation does not improve Google's market position.

Maybe they are trying to condition the user to self-filter "noise words".

Google User
Wednesday, May 05, 2004

Code Monkey-

That would explain it.

Google User
Wednesday, May 05, 2004

MugsGame is right -- even the same query will return slightly different results depending on exactly which servers it's run against. I've seen this acknowledged in the past (or maybe somebody at Google told me firsthand), but I can't find a reference for it now.

Anyway, looking in the Google Help section reveals a more complete explanation of the handling of stop words:

Google ignores common words and characters such as "where" and "how", as well as certain single digits and single letters, because they tend to slow down your search without improving the results. Google will indicate if a common word has been excluded by displaying details on the results page below the search box.

If a common word is essential to getting the results you want, you can include it by putting a "+" sign in front of it. (Be sure to include a space before the "+" sign.)
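
The behavior described in that help text can be sketched roughly as follows. This is only an illustration under assumptions: the stop-word list and the `parse_query` function are hypothetical, and the single-digit and single-letter rule is left out.

```python
# Hypothetical stop-word list; the help text names only "where" and "how".
STOP_WORDS = {"where", "how", "what", "is", "the"}

def parse_query(query):
    """Split a query into terms to search for and common words that were excluded.

    A leading "+" forces a common word to be kept.
    """
    kept, excluded = [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            kept.append(token[1:].lower())
        elif token.lower() in STOP_WORDS:
            excluded.append(token.lower())
        else:
            kept.append(token.lower())
    return kept, excluded

print(parse_query("what is salad cream"))
print(parse_query("+what +is salad cream"))
```

The second return value plays the role of the "were not included in your search" notice shown on the results page.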

John C.
Wednesday, May 05, 2004

The common words are excluded from the search that chooses the sample but not from the ranking within that sample.

That is to say, if you search for Frankie goes to Hollywood without quotes, Google will include all web pages with 'Frankie', 'Hollywood' and 'goes', but not all web pages with 'to'. However, the ranking of your top results will be marginally different depending on whether you include 'to' in the query.
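
A rough sketch of that two-stage idea, in Python. Everything here is assumed for illustration (the stop-word list, the function names, and the very crude scoring); it only shows how common words could be skipped when choosing the sample yet still affect the order within it.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "in", "of", "the", "to"}

def select_candidates(query, pages):
    """Stage 1: keep pages that contain every non-stop query word."""
    required = [w for w in query.lower().split() if w not in STOP_WORDS]
    return [p for p in pages if all(w in p.lower().split() for w in required)]

def rank(query, candidates):
    """Stage 2: order candidates by how many query words, stop words included, they contain."""
    terms = query.lower().split()

    def score(page):
        words = set(page.lower().split())
        return sum(1 for t in terms if t in words)

    # sorted() is stable, so ties keep their original order.
    return sorted(candidates, key=score, reverse=True)

def search(query, pages):
    return rank(query, select_candidates(query, pages))

pages = [
    "frankie goes hollywood tour",
    "frankie goes to hollywood",
    "going to hollywood",
]
print(search("frankie goes to hollywood", pages))
print(search("frankie goes hollywood", pages))
```

Both queries select the same two pages, but including "to" moves the page that contains it to the top.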

Stephen Jones
Wednesday, May 05, 2004

Simple experiment: http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=the
yields 5,770,000,000 pages found.
Google used to ignore "noise" words in queries, but they stopped doing it a long time ago.

Just me (Sir to you)
Thursday, May 06, 2004

Now here is an odd observation:
Google on the homepage claims "Searching 4,285,199,774 web pages", and yet the previous query yields "Results 1 - 10 of about 5,770,000,000 for the".
I feel mindraped! Now that they are going to be in the money, where can I sign up for the first class-action lawsuit?

Just me (Sir to you)
Thursday, May 06, 2004

The total they give appears to vary by search. You could try a lawsuit, but I think you would need to prove you had a connection fast enough to access the missing ones.

Stephen Jones
Thursday, May 06, 2004
