Fog Creek Software
Discussion Board




how good a software engineer

How good a software engineer would you be if you could replicate the quality of the clustering done by one of the commercial search engines, and get similar results in a few months of work?
Do people think it's difficult to do?

Robert
Saturday, November 29, 2003

It's easy if you have enough money for all the hardware you'd need to scan the entire net.

The real problem would be negotiating license fees with Google, since their methods are patented.

Pretty Good
Saturday, November 29, 2003

Google's PageRank is really "I should have thought of that" brilliant. Not to mention that in order to pull results from a database the size of Alaska in less than a second you'd have to tweak everything, down to the OS, maybe even the hardware. Oh yeah, and then serve up several billion search results a day.

www.MarkTAW.com
Saturday, November 29, 2003

I have implemented algorithms for clustering on a search engine data set, and results are comparable to some other commercial engines.
Do people think that the heuristic algorithms you would use are difficult?

Robert
Saturday, November 29, 2003

Actually, the real problem isn't developing the software or buying the hardware. The real problem with re-creating Google would be generating the mindshare to get people to switch. Unless your version is so vastly superior to Google as to generate word of mouth, you'd have to spend a bazillion dollars on advertising to get people to know about your search engine.

Anon. Coward
Saturday, November 29, 2003

It isn't competing with Google. Once you have the 10,000 results Google generates, mine groups and clusters them, and it is comparable to other clustering engines in this category: Vivisimo and a couple of others.

Robert
Saturday, November 29, 2003
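
For readers wondering what "groups and clusters the results" can look like in code, here is a minimal sketch. It is not Robert's algorithm or Vivisimo's, neither of which is described in this thread. It assumes a greedy single-pass grouping on Jaccard word overlap; the function names, the 0.2 threshold, and the sample snippets are all hypothetical.

```python
import re

def tokens(text):
    """Lowercased set of words in a result snippet."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Overlap between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_results(snippets, threshold=0.2):
    """Greedy single-pass clustering (an assumed heuristic).

    Each snippet joins the first cluster whose accumulated vocabulary
    is similar enough; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"vocab": set, "members": [snippets]}
    for snippet in snippets:
        toks = tokens(snippet)
        for cluster in clusters:
            if jaccard(toks, cluster["vocab"]) >= threshold:
                cluster["members"].append(snippet)
                cluster["vocab"] |= toks
                break
        else:
            # No existing cluster was close enough.
            clusters.append({"vocab": set(toks), "members": [snippet]})
    return [c["members"] for c in clusters]

# Hypothetical snippets for an ambiguous query.
results = [
    "jaguar car dealer prices",
    "jaguar big cat habitat",
    "used jaguar car prices",
    "big cat jaguar rainforest habitat",
]
groups = cluster_results(results)
for group in groups:
    print(group)
```

The hard parts of a real clustering engine (labelling the groups, handling thousands of results quickly, choosing a threshold that works across queries) are exactly what this sketch leaves out.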

The thing about Google is that it proves that if something is a lot better, then word of mouth is all that is needed for it to take off. Remember the old days when Yahoo search ruled the roost?

Anyway, the OP's question is a little like asking how good a physicist you would be if you thought up the law of gravity again. Once the original guy has done it, redoing the actual calculations doesn't earn many genius points.

Stephen Jones
Sunday, November 30, 2003

Hey, "any application could be re-written in 45 days" (joke for those of you who read that thread).

Google really did have humble roots. If you can write something that "just works" the way Google does, there's no reason it can't take off, or at least get a few corporate clients who need that kind of functionality.

As far as taking on Google in 3 months... No way. You'd have to spider the whole internet and tackle all the problems they've tackled. Some of those things probably took months or years to work out and get to the point where they are.

www.MarkTAW.com
Sunday, November 30, 2003

I don't recall Yahoo search ever ruling the roost. Indeed, for the longest time Yahoo's "search" was nothing more than searching its own user-submitted web "index".

Before Google I used Excite, which was most definitely the winner at the time, and before that the search engine of choice was AltaVista.

Dennis Forbes
Sunday, November 30, 2003

The fact that Yahoo's list was done manually meant that it was good. Other search engines often failed because people shamelessly exploited the meta tags.

I would say Google took over some time in 2000 to 2001.

With Yahoo I would often have to drill down six or seven levels. When I found one search in Google would do the trick I was amazed.

Searching used to take up to half an hour. With Google it became more or less one click.

Stephen Jones
Sunday, November 30, 2003

Remember, coming up with software that can do as well as Google in one area doesn't mean you can compete, not by a long shot. You have to crawl a lot before you become useful in any way, and you need a real dataset to work with, which means you need a lot of storage. Then you have to think about the bandwidth you have available, then failover, then scalability, then indexing, etc.

Mind you, best of luck all the same. I often do mini projects, e.g. coming up with a system like Google. They're just for the sake of learning, really...

fw
Sunday, November 30, 2003

Robert, if you have everything else Google has, you can take them on with what they don't have: a better query.

Their queries are just too simple. When you are doing research you need more power, such as nested boolean expressions, and it's not there.

A powerful query language, even a complicated and cryptic one, would definitely win my heart.

Alex
Sunday, November 30, 2003
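
To make "nested boolean" concrete, here is a minimal sketch of a query evaluator. It assumes a hypothetical syntax with uppercase `AND`, `OR`, `NOT` and parentheses, and checks a query against the set of words in one document. It is not the syntax of any real search engine; the grammar and the function names are made up for illustration.

```python
import re

def parse(query):
    """Parse e.g. 'a AND (b OR NOT c)' into a nested tuple tree."""
    toks = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            take()
            node = ("AND", node, parse_not())
        return node

    def parse_not():  # NOT, parenthesised group, or a plain term
        if peek() == "NOT":
            take()
            return ("NOT", parse_not())
        if peek() == "(":
            take()
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return node
        tok = peek()
        if tok is None or tok in (")", "AND", "OR"):
            raise ValueError("expected a search term")
        return ("TERM", take().lower())

    tree = parse_or()
    if peek() is not None:
        raise ValueError("unexpected token: " + peek())
    return tree

def matches(tree, words):
    """Evaluate a parsed query against a set of lowercase words."""
    op = tree[0]
    if op == "TERM":
        return tree[1] in words
    if op == "NOT":
        return not matches(tree[1], words)
    if op == "AND":
        return matches(tree[1], words) and matches(tree[2], words)
    return matches(tree[1], words) or matches(tree[2], words)

tree = parse("clustering AND (vivisimo OR google) AND NOT yahoo")
doc = set("google results grouped by a clustering engine".split())
print(matches(tree, doc))
```

Evaluating the tree per document is only for illustration; a real engine would combine posting lists from an index instead.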

And punctuation.

.net

c++

etc.

www.MarkTAW.com
Sunday, November 30, 2003

Such queries with && and || might be good for you and me, who are programmers.  But I've heard that people have examined queries (on AltaVista and such), and pretty much no one uses those features.  I think Google was smart not to have them.  The design decision was that you should be able to get the best results possible without resorting to them.  And even though it does seem counterintuitive, I think they were right.  There are occasions when I have to do two or three queries where one could have done, but it doesn't seem to make a big difference.

Roose
Sunday, November 30, 2003

In any case, the point was that you're not going to get any market share by adding boolean logic to queries.

Roose
Sunday, November 30, 2003

Of course not. If I were Google, I'd definitely not have nested boolean statements.

But sometimes I miss them, that's all. Along with EXACT, even case-sensitive, matching for terms like ".net".

Alex
Sunday, November 30, 2003
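
The punctuation and exact-match complaints above come down to how queries are tokenized. The sketch below contrasts a lossy tokenizer (an assumption about how a typical engine might normalize text, not a description of Google's behaviour) with a literal, case-sensitive match. Both function names are hypothetical.

```python
import re

def normalized_terms(text):
    """Lossy tokenization: lowercase, keep only letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_match(query, text):
    """Case-sensitive match of the literal query, punctuation included.

    The lookarounds stop the query from matching inside a longer word.
    """
    pattern = r"(?<![A-Za-z0-9])" + re.escape(query) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text) is not None

# With lossy tokenization, the punctuation simply disappears.
print(normalized_terms(".net"))  # same terms as a search for "net"
print(normalized_terms("c++"))   # same terms as a search for "c"

# Exact matching keeps punctuation and case.
print(exact_match(".net", "the .net framework"))
print(exact_match(".net", "the .NET framework"))
print(exact_match("c++", "c++ templates"))
```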
