Fog Creek Software
Discussion Board




Google uses Linux Component computers

I have been puzzling over Google's hardware setup, and a friend of mine, who is a regular poster here, suggested that I post my question here.

Google uses over 15 000 component Linux computers to service query analysis.

Component computers means that these are simple PCs and not expensive servers on racks.

Now Google obviously must store all the indexed pages in databases, and run a search on the index when a query is made.

What I would like to know is: does each of the 15 000 Linux boxes have a database running on it?

Is the index spread out across multiple boxes, i.e. is the entire index stored in, say, separate databases on 3000 PC boxes?

If the index is stored in a spread-out fashion, how can a query be answered quickly? Does it have to go to 3000 PCs before having an answer?

Jean M
Saturday, August 30, 2003

Might be worth starting from here:
http://www.google.com/press/query.html

blargle
Saturday, August 30, 2003

One of the founders of Google gave a very interesting lecture that answers those questions here:  http://technetcast.ddj.com/tnc_play_stream.html?stream_id=420

and here

http://technetcast.ddj.com/tnc_play_stream.html?stream_id=421

Matthew Lock
Saturday, August 30, 2003

"Component computers means that these are simple PCs and not expensive servers on racks."

I'm curious - how do you think "simple PCs" are differentiated from "expensive servers on racks" insofar as clustering is concerned?

Philo
Saturday, August 30, 2003

A few things to note.

1. While Google is "running Linux", that's sort of like equating the "gas powered engine" of a Geo Metro w/ the one in a professional race car. The OS is heavily modified.

2. As far as I know, Google doesn't use a traditional database. It seems as though every bit of software on their boxes is custom to do the exact and specific task at hand.

3. Google doesn't use traditional (Beowulf-style) clustering for its index machines. The index is distributed and replicated, and the requests are dispatched based on an approximation of where it expects to find the results. Again, custom code, not the stock stuff.
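
To make point 3 a bit more concrete, here is a minimal Python sketch of a distributed index with scatter-gather lookup. It is not Google's code, and every name in it (Shard, build_shards, query) is made up for illustration. It assumes documents are assigned to shards by doc id modulo the shard count, and it sends every query to every shard in parallel, which is simpler than the approximate dispatch described above. Replication is left out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """One machine's slice of the index: word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, words):
        # Documents on this shard that contain every query word.
        sets = [self.postings.get(w, set()) for w in words]
        return set.intersection(*sets) if sets else set()


def build_shards(docs, num_shards):
    # Assumption: partition by doc id modulo the number of shards.
    shards = [Shard() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % num_shards].add(doc_id, text)
    return shards


def query(shards, text):
    words = text.lower().split()
    # Scatter: ask every shard at once. Gather: merge the partial answers.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: s.search(words), shards))
    return sorted(set().union(*partials))
```

The point of the sketch is that each box only searches its own small slice, so the total time is roughly that of the slowest shard plus the merge, not the sum over all boxes.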

Brad Wilson (dotnetguy.techieswithcats.com)
Saturday, August 30, 2003

"While Google is "running Linux", that's sort of like equating the "gas powered engine" of a Geo Metro w/ the one in a professional race car. The OS is heavily modified."

Actually, that's exactly why it's worth pointing out that Google is running Linux - you can't do that with Windows.

Suffice to say it's likely that Linux made Google possible. (okay, so they could've written their own OS from scratch...)

Philo
Saturday, August 30, 2003

"Actually, that's exactly why it's worth pointing out that Google is running Linux - you can't do that with Windows."

Agreed.

"Suffice to say it's likely that Linux made Google possible. (okay, so they could've written their own OS from scratch...)"

Disagreed. There have been, are, and will be other source-included operating systems. Linux is not the first, and won't be the last. Hell, given the choice of Linux vs. BSD and a heavily modified OS, I doubt I would start with Linux, personally.

Brad Wilson (dotnetguy.techieswithcats.com)
Saturday, August 30, 2003

"given the choice of Linux vs. BSD and a heavily modified OS, I doubt I would start with Linux, personally."

In the speech linked above, the speaker says they evaluated the BSDs and found them much slower.

It is well worth a listen, even if it is a long speech.

micronerd
Saturday, August 30, 2003

---" I'm curious - how do you think "simple PCs" are differentiated from "expensive servers on racks" insofar as clustering is concerned?"-----

Err cost? :)

Stephen Jones
Saturday, August 30, 2003

What bearing would that have on Jean's question, Stephen?

Philo
Saturday, August 30, 2003

Let me clarify - she specifically pointed out that Google is using consumer PCs instead of servers. In my experience, "expensive servers on racks" is a matter of hot-swappable components, RAID, perhaps multiple CPUs. But there's nothing about them from a digital/logical standpoint that's not present in a consumer PC. I was just wondering what she thought the differentiator was with respect to Google's processing capabilities.

Philo
Saturday, August 30, 2003

----"What bearing would that have on Jean's question, Stephen?

Philo "-------

As much bearing as your question I was answering - that is none whatsoever.

Something about the pot and the kettle comes to mind here.

Stephen Jones
Saturday, August 30, 2003

Philo: the difference is that in the case of a server, people go out of their way to make the server as reliable as possible. That typically includes redundant hard disks and redundant power supplies.

Google uses redundancy on another level: it is not the parts inside each computer that are redundant; the computers themselves are. When one of their computers fails, they just let it sit in the rack; after some time, they collect and replace all failed computers. They don't even bother to try to repair the ones that are kaput.

From a logical standpoint this means that the software should be able to use any computers available, and ignore failed computers. But I guess load balancing systems do that anyway.
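
A rough Python sketch of that idea, assuming each piece of the index has several identical copies and that a dead machine shows up as a ConnectionError. Replica and lookup_with_failover are hypothetical names for illustration, not anything Google has published.

```python
class Replica:
    """Stand-in for one cheap PC holding a copy of part of the index."""

    def __init__(self, name, data, alive=True):
        self.name = name
        self.data = data
        self.alive = alive

    def lookup(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data.get(key, [])


def lookup_with_failover(replicas, key):
    """Try replicas in order, skipping failed ones; return (name, result)."""
    for replica in replicas:
        try:
            return replica.name, replica.lookup(key)
        except ConnectionError:
            continue  # leave the dead box in the rack and move on
    raise RuntimeError("all replicas are down")
```

With three copies where the first is dead, the call simply returns the answer from the second one; the caller never needs to know which box failed.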

Roel Schroeven
Saturday, August 30, 2003

I thought this was an interesting interview when I read it last year.

Craig Silverstein answers your Google questions
[Google director of technology]

http://interviews.slashdot.org/article.pl?sid=02/07/03/1352239&mode=thread&tid=95

The Voice of Rationality
Saturday, August 30, 2003
