Fog Creek Software
Discussion Board

Any ideas on how Google indexes images?

It does not seem to be the file name - that would be too inefficient and open to abuse.

It cannot be image mining - that technology is too immature at the moment.

I am stumped! Any ideas?

Ram Dass
Thursday, August 21, 2003

AFAICT they run their search engine over the page and weight terms based on proximity to the image, esp in table cells adjacent to the image.
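
To make the proximity idea concrete, here is a rough sketch. It is a guess at the general shape, not Google's actual method. It assumes the page has already been flattened to text with a made-up placeholder token (`__IMG__`) standing in for the image, and it uses a simple 1/distance weighting that I picked purely for illustration.

```python
import re
from collections import defaultdict

# Hypothetical placeholder marking where the image sits in the flattened page text.
IMAGE_MARKER = "__IMG__"

def proximity_weights(page_text, marker=IMAGE_MARKER):
    """Score each term by how close it appears to the nearest image marker."""
    tokens = re.findall(r"\w+", page_text)
    positions = [i for i, tok in enumerate(tokens) if tok == marker]
    weights = defaultdict(float)
    if not positions:
        return weights  # no image on the page, nothing to score
    for i, tok in enumerate(tokens):
        if tok == marker:
            continue
        # Distance in words to the nearest image; always >= 1 here.
        distance = min(abs(i - p) for p in positions)
        # Assumed weighting: closer words count more (1/distance).
        weights[tok.lower()] += 1.0 / distance
    return weights

page = ("Holiday photos from Paris __IMG__ Eiffel Tower at night "
        "and some unrelated footer text")
weights = proximity_weights(page)
```

With this scheme the words right next to the image ("paris", "eiffel") end up with far more weight than the footer text. A real system would presumably also look at table cells, alt text, and link text, none of which this sketch handles.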


Thursday, August 21, 2003

I agree with Philo.  If you do an image search, just look at the page that the image was found on and you can see that sure enough the word(s) you searched for are somewhere in the page, or in a page that linked to it.

BTW, has anyone tried the new Google Toolbar (version 2.0)?  The form filler and popup-blocker features are great.

Thursday, August 21, 2003

Their image algorithm usually works well, but Google News will occasionally have a funny mixup with the wrong news image and article summary. :-)

Thursday, August 21, 2003

They might not take advantage of this, but they could use the fact that many images appear on different sites with different scaling, cropping, etc.

Just find a way to compare two images to see if they are basically the same (modulo the format, scaling, etc).  Each web page with terms related to the query and an identical image would be a "vote".  They could use this rank as a factor in the PageRank for the image.
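
A minimal sketch of the voting idea, under some loud assumptions: images are given as 2D lists of grayscale values, "basically the same" is approximated with a simple average hash (my choice, not anything Google is known to use), and the function names and the `max_distance` threshold are made up for illustration. This kind of hash shrugs off rescaling but will not catch cropping.

```python
def average_hash(pixels, size=8):
    """pixels: 2D list of grayscale values (0-255). Returns a size*size-bit int."""
    h, w = len(pixels), len(pixels[0])
    # Nearest-neighbour downsample to size x size so scale differences wash out.
    small = [pixels[(y * h) // size][(x * w) // size]
             for y in range(size) for x in range(size)]
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        # One bit per sample: brighter than average or not.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def count_votes(pages, query_terms, max_distance=5):
    """pages: list of (set_of_page_terms, pixels).

    Each page that mentions a query term casts one vote for its image;
    images whose hashes are within max_distance bits share a vote tally.
    Returns a list of [representative_hash, votes].
    """
    groups = []
    for terms, pixels in pages:
        if not (query_terms & terms):
            continue  # page is not about the query, so no vote
        h = average_hash(pixels)
        for group in groups:
            if hamming(group[0], h) <= max_distance:
                group[1] += 1
                break
        else:
            groups.append([h, 1])
    return groups
```

The vote count per group could then be fed in as one ranking factor among others. Comparing every hash against every group is quadratic, so anything at web scale would need a smarter index than this loop.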

But they display a significant number of dupes now so I guess they don't do that.  They do have that "show duplicates" link at the bottom (which seems to work well for web pages), but the images usually have a bunch of dupes anyway.

Friday, August 22, 2003
