Fog Creek Software
Discussion Board

"Searching is not a hard problem" ...

... says Joel's most recent article.

So can someone please explain why:

+ Google is the only viable web search engine;
+ it takes my Windows computer several minutes to search for a given filename;
+ Longhorn was going to provide network-wide search with almost instant results by using a database-based filesystem, but now it isn't;
+ searching for images of the Empire State Building on Google using the search criteria "Empire State Building" will only give me what I'm looking for if such images have filenames like "empire-state-building.jpeg" or appear alongside text that matches the search criteria in some way?

Search *IS* hard. Well, searching is easy, but finding what the user wants, in a way that the user finds usable, is very, very hard.

Wednesday, July 21, 2004

Check out how much better the latest Mac OSX search is compared to the standard Windows XP search:

That's what we need on Windows. I had okay results with DocSearcher but it crashes so much and the author seems to have abandoned it judging from the bug reports:

Matthew Lock
Wednesday, July 21, 2004

Search is not hard on Unix-based systems. It seems to be incredibly difficult on Windows. Maybe someone could explain.

.net, the equivalent of MS Bob.
Wednesday, July 21, 2004

It is a matter of priority.

You HAVE to have a very damn good search engine within UNIX to find stuff without having to do directory-digging.  The need was there, so it was made.

For Windows, it's all icons.  Click it, run it.  Need docs?  My Documents.  The need was not there as much.

Wednesday, July 21, 2004

Isn't searching for images of the Empire State Building on Google as you described like walking up to a total stranger and saying "Empire State Building"?

It would be MHO that you would have to at least state what you were looking for.

We represent the lollipop guild
Wednesday, July 21, 2004

Searching on Windows seems to be very broken. I assume that this is because either there is no indexing or the indexing is switched off by default. I use "Cathy" to track my file names, which gives instantaneous results across all my numerous hard drives, CDs and DVDs, but does require me to update the index on a regular basis. (On a related point, I think "find" on Unix is also pretty slow.)

For searching for text within files, grep is massively quicker than explorer.

So - generalized searching can be hard, but specific kinds of searching can be extremely easy (or at least, solutions are well known, and should be easy to implement).

Chris Welsh
Wednesday, July 21, 2004

On a related note, has anyone ever encountered a serious discussion of search outside of AI? I don't remember seeing one, apart from a couple of simple tree searches.

(Too bad the "intelligence" part of AI was overhyped. AI seems just to be about pushing machines to their conceptual limits, so a lot of interesting work comes from there.)

Tayssir John Gabbour
Wednesday, July 21, 2004

These guys specialise in searching collections ...

Wednesday, July 21, 2004

Although find is slow, the Linux locate command is much faster because it uses an index. Unfortunately the index has to be updated daily because Linux does not have an efficient way to track file creation and deletion.
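
To make the find-versus-locate trade-off concrete, here is a rough sketch of the idea: walk the tree once, save the paths, and answer later queries from the saved list. This is not how locate is actually implemented; build_index and query_index are made-up names for illustration, and the index is just a plain text file.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; this is not locate's real format.

# Walk the tree once (the slow part) and save every file path to an index file.
build_index() {
    # $1 = directory to scan, $2 = index file to write
    find "$1" -type f > "$2"
}

# Answer a query from the saved index without touching the directory tree.
query_index() {
    # $1 = index file, $2 = fixed string to look for anywhere in the path
    grep -F -- "$2" "$1"
}
```

You would run something like build_index "$HOME" /tmp/files.idx once (or nightly from cron), then query_index /tmp/files.idx report as often as you like. Files created after the last build_index run will not show up until the index is rebuilt, which is the staleness problem described above.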

Wednesday, July 21, 2004

This was a cute way of implying just how far searching has come since the Dawn Of Computing:

Sam Livingston-Gray
Wednesday, July 21, 2004

The power of Unix's find command is that you can run a program of arbitrary complexity against each file that it finds.  Just getting a list of files is trivial (although noobs whine about the syntax), but the real power shows when you can do:

find . -name '*.cc' -exec grep '#include' {} \; | sort | uniq -c

This finds every #include line in all of your source and counts the usage of each.

This might not be the perfect example, but you get the idea of the power.  I have a directory tree of almost 50,000 text files at home that I manage using tricks like this.

I tried to use Win2K's search facility but my biggest complaint was that it lowercases all the file names after a search.  Then when I want to use the Explorer tool to rename a few I have to figure out which of the letters were really lowercase, and which were upper, because I have to change them back, besides whatever renaming I wanted to do.  That was so bogus I gave up on the tool as more trouble than help.

And another thing - why can't I use Win2K's search to find multi-word strings?  I want to search for "Mayor McCheese" and have it find exactly that phrase instead of all the files with either Mayor or McCheese.  That's why I install Cygwin and use the Unix tools to manage my files.
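
For what it's worth, an exact-phrase search with those Unix tools might look something like the sketch below. find_phrase is a made-up name; the only real commands used are find and grep with the standard -l (list matching files) and -F (treat the pattern as a fixed string) options.

```shell
#!/bin/sh
# Hypothetical helper for illustration: list files containing an exact phrase.
find_phrase() {
    # $1 = directory to search, $2 = phrase to match literally
    # -F matches the phrase as one fixed string, so both words must appear
    # together and in order; -l prints only the names of matching files.
    find "$1" -type f -exec grep -lF -- "$2" {} +
}
```

Calling find_phrase . 'Mayor McCheese' lists only the files that contain that exact phrase on a single line; a file that mentions Mayor and McCheese separately is not reported. A phrase split across a line break would be missed, since grep matches line by line.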

Thursday, July 22, 2004

"I want to search for "Mayor McCheese" and have it find exactly that phrase instead of all the files with either Mayor or McCheese. "

Tell me about it!  I have heard that Agent Ransack is a good tool for Windows searching that supports regexes, etc--but I haven't tried it since I installed it.

Thursday, July 22, 2004
