how good a software engineer
How good a software engineer would you be if you could replicate the quality of the clustering done by one of the commercial search engines, and get similar results in a few months of work?
It's easy if you have enough money for all the hardware you'd need to scan the entire net.
Google's PageRank is really "I should have thought of that" brilliant. Not to mention that in order to pull results from a database the size of Alaska in less than a second, you'd have to tweak everything, down to the OS and maybe even the hardware. Oh yeah, and then serve up several billion search results a day.
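The "I should have thought of that" part is that the core of PageRank fits in a few lines: each page splits its rank evenly among the pages it links to, plus a small "random jump" share for every page, repeated until the numbers settle. Below is a minimal power-iteration sketch over a hypothetical toy link graph. The function name `pagerank`, the damping factor of 0.85 (a commonly cited default) and the fixed iteration count are illustrative assumptions, and this says nothing about how Google actually computes it at web scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a toy graph.

    links maps each page to the list of pages it links to (hypothetical data).
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first gets the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_links).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

With the toy graph above, page `c` (linked from two pages) ends up with the highest rank and page `b` (linked from nobody) with the lowest.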
I have implemented algorithms for clustering on a search engine data set, and the results are comparable to those of some other commercial engines.
Actually, the real problem isn't developing the software or buying the hardware. The real problem with re-creating Google would be generating the mindshare to get people to switch. Unless your version is so vastly superior to Google that it generates word of mouth, you'd have to spend a bazillion dollars on advertising to get people to know about your search engine.
It isn't competing with Google. Once you have the 10,000 results Google generates, mine groups and clusters them, and it is comparable to other clustering engines in this category, such as Vivisimo and a couple of others.
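The post doesn't say which clustering algorithm is used, so the following is only a toy illustration of the general idea of clustering results after retrieval: take the snippets a search engine has already returned and group them under terms they share. The names `cluster_results`, `tokenize` and `STOPWORDS`, the `exclude` parameter for dropping the query words, and the greedy "largest group first" rule are all assumptions made for this sketch; it is not the poster's method or Vivisimo's.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "for", "to", "in", "on", "with", "is"}


def tokenize(text):
    """Lowercase alphanumeric terms of a snippet, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def cluster_results(snippets, exclude=frozenset(), min_size=2, max_clusters=5):
    """Group result snippets under shared terms.

    Returns a list of (label, [snippet indices]); leftovers go under "other".
    `exclude` should hold the query words, which every result tends to share.
    """
    term_to_docs = defaultdict(set)
    for i, snippet in enumerate(snippets):
        for term in tokenize(snippet) - set(exclude):
            term_to_docs[term].add(i)

    # Candidate clusters: terms shared by enough results, largest first.
    candidates = sorted(
        ((term, docs) for term, docs in term_to_docs.items() if len(docs) >= min_size),
        key=lambda item: (-len(item[1]), item[0]),
    )

    clusters, assigned = [], set()
    for term, docs in candidates:
        fresh = docs - assigned
        if len(fresh) < min_size:
            continue  # mostly covered by an earlier, larger cluster
        clusters.append((term, sorted(fresh)))
        assigned |= fresh
        if len(clusters) == max_clusters:
            break

    other = sorted(set(range(len(snippets))) - assigned)
    if other:
        clusters.append(("other", other))
    return clusters
```

For a query like "jaguar", passing `exclude={"jaguar"}` keeps the query word itself from swallowing every result, so snippets about the car and snippets about the cat land in separate groups. Real result-clustering engines use much richer features (phrases, titles, URLs) than single shared words.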
The thing about Google is that it proves that if something is a lot better, then word of mouth is all that is needed for it to take off. Remember the old days when Yahoo search ruled the roost?
Hey, "any application could be re-written in 45 days" (joke for those of you who read that thread).
I don't recall Yahoo search ever ruling the roost. Indeed, for the longest time Yahoo's "search" was nothing more than a search of its own user-submitted web "index".
The fact that Yahoo's list was done manually meant that it was good. Other search engines often failed because people shamelessly exploited the meta tags.
Remember, coming up with software that can do as well as Google in one area doesn't mean you can compete, not by a long shot. You have to crawl a lot before you become in any way useful. You need a real dataset to work with, which means you need a lot of storage; then you have to think about the bandwidth you have available, then failover, then scalability, then indexing, etc.
Robert, if you have everything else Google has, you can take them on with what they don't have: a better query.
Such queries with && and || might be good for you and me, who are programmers. But I've heard that people have examined queries (on AltaVista and such), and pretty much no one uses those features. I think Google was smart not to have them. The design decision was that you should be able to get the best results possible without resorting to them. And even though it does seem counterintuitive, I think they were right. There are occasions when I have to do two or three queries where one could have done, but it doesn't seem to make a big difference.
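Mechanically, && and || boil down to set intersection and union over an inverted index, that is, a map from each term to the documents containing it. A minimal sketch, assuming a toy in-memory index built from whitespace-split text and queries written as nested tuples instead of parsed strings; the names `build_index` and `evaluate` and the tuple query format are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def evaluate(query, index):
    """Evaluate a term string or a nested tuple like ("&&", q1, q2, ...).

    "&&" intersects the operands' result sets, "||" unions them.
    """
    if isinstance(query, str):
        # .get avoids inserting unknown terms into the defaultdict.
        return set(index.get(query.lower(), set()))
    op, *operands = query
    results = [evaluate(q, index) for q in operands]
    if op == "&&":
        return set.intersection(*results)
    if op == "||":
        return set.union(*results)
    raise ValueError(f"unknown operator: {op!r}")
```

For example, `evaluate(("&&", "jaguar", ("||", "car", "cat")), index)` returns the documents that mention jaguar together with either car or cat. Production engines keep sorted posting lists on disk and merge them rather than holding Python sets in memory.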
In any case, the point was that you're not going to get any market share by adding boolean logic to queries.
Of course not. If I were Google, I'd definitely not have nested boolean statements.