Fog Creek Software
Discussion Board




gmail technical challenges

Joel,
don't you think the 1 GB offer from Google deserves your comments on the programming behind it? I don't think they are just paying for the hardware.

Are they compressing messages with a new algorithm?
Are they detecting duplicated messages? Or perhaps duplicated attachments?
Are they buying Maxtor or Seagate?

What do you think about it?

PS: BTW, I miss an "Ask Joel" button at the top of the page. It takes too much scrolling to get to it.

Wuonm
Tuesday, April 06, 2004

A gig of hard drive space is about one dollar using cheap drives. In the quantities Google's going to buy them, it might be as low as fifty cents. The average user won't have nearly that much email, they'll probably have more like 100 KB, so we're talking five cents. They can make that up with the revenue from one clickthrough on one ad.

Joel Spolsky
Fog Creek Software
Wednesday, April 07, 2004

...and becoming the one-stop-shop on the web -- Priceless.

Edward
Wednesday, April 07, 2004

What about the cost of the arrays, the logical and physical setup...? I don't think this is just about the cost of the hard disks.
Uptime would be essential, data safety doubly so. You can't just "pop another in" and tell the customer you've lost their data.

Adriano Varoli Piazza
Wednesday, April 07, 2004

What about the electricity bill for running the hard disks?

(I mean keeping the tiny motors spinning, and the disk controllers need some electricity too. Then scale it up by two gazillion.)

Michael Moser
Wednesday, April 07, 2004

"What about the cost of the arrays, the logical and physical setup...? I don't think this is just about the cost of the hard disks.
Uptime would be essential, data safety doubly so. You can't just "pop another in" and tell the customer you've lost their data."

Given the business Google's currently in, I imagine they've mastered all of these things already.  Yes, they probably will be able to just "pop another in" and have the backed up data "magically" reappear without the intervention of any Google employees.

Jim Rankin
Wednesday, April 07, 2004

>"I imagine they've mastered all of these things already."

You are right, Google will probably have this down pat. It still isn't going to be zero cost, though.

Adriano Varoli Piazza
Wednesday, April 07, 2004

They have their own file system, GFS, that acts sort of like RAID for computers instead of disks. If one server goes down, they have two others that have enough info to reconstruct the data.

Using GFS, which is used in their search cluster, for email will probably make it easy to get scalability and redundancy on their email servers, and make 1 GB mailboxes possible by keeping the minimum number of copies of mail sent to multiple users. One spam mail sent to a million users can be stored at one network location (which may actually be one server or multiple servers, depending on load or other factors).

Just my guess.

doubtful
Wednesday, April 07, 2004
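
The "one copy for a million recipients" idea above can be sketched as a content-addressed store: each message body is saved once under its hash, and every mailbox holds only a reference. This is a minimal illustration under stated assumptions, not a description of GFS or of anything Google has published about Gmail. The class and method names are hypothetical, and it only deduplicates byte-identical messages.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical store that keeps one copy of each distinct message body."""

    def __init__(self):
        self.blobs = {}      # SHA-256 hex digest -> message bytes
        self.mailboxes = {}  # user name -> list of digests

    def deliver(self, recipients, body: bytes) -> str:
        """Store the body once and add a reference to every recipient's mailbox."""
        digest = hashlib.sha256(body).hexdigest()
        self.blobs.setdefault(digest, body)  # no-op if this body is already stored
        for user in recipients:
            self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def read(self, user):
        """Return the message bodies referenced by one user's mailbox."""
        return [self.blobs[d] for d in self.mailboxes.get(user, [])]

    def stored_bytes(self) -> int:
        """Total bytes of message bodies actually kept."""
        return sum(len(b) for b in self.blobs.values())


store = SingleInstanceStore()
spam = b"Cheap watches!" * 100
store.deliver([f"user{i}" for i in range(1000)], spam)

# One thousand mailboxes reference the message, but only one copy is stored.
print(len(store.mailboxes), store.stored_bytes(), len(spam))
```

A real system would still replicate each stored copy for redundancy, so the saving is relative to storing one replicated copy per recipient.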

Funny, I just ran across an interesting post on that very topic:  http://blog.topix.net/archives/000016.html

Sam Livingston-Gray
Wednesday, April 07, 2004

Actually, this sounds like a good way to eliminate spam. Give me a little configuration option that says "Suppress all messages where same message went to more than <X> recipients except for <list of whitelisted newsletters/mailinglists>".

Ron Porter
Wednesday, April 07, 2004
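
The rule proposed above is simple enough to write down directly. This is a rough sketch that assumes the mail service can already tell how many recipients got the same message; the function name, parameters, and the example address are all hypothetical.

```python
def should_suppress(sender: str, recipient_count: int,
                    max_recipients: int, whitelist: set) -> bool:
    """Return True if a message should be hidden under the bulk-mail rule.

    sender          -- address the message came from
    recipient_count -- how many users received the same message (assumed known)
    max_recipients  -- the user's <X> threshold
    whitelist       -- lowercase sender addresses that are always allowed
    """
    if sender.lower() in whitelist:
        return False  # whitelisted newsletters and mailing lists always pass
    return recipient_count > max_recipients


# Hypothetical whitelist entry for a newsletter the user wants to keep.
WHITELIST = {"newsletter@example.com"}

print(should_suppress("spammer@example.net", 1_000_000, 50, WHITELIST))
print(should_suppress("Newsletter@example.com", 1_000_000, 50, WHITELIST))
print(should_suppress("friend@example.org", 3, 50, WHITELIST))
```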

"Actually, this sounds like a good way to eliminate spam"

But would be a problem for the times that you did actually want to receive a bulk email, like the ones Joel sends out every now and then with article updates.

Ben R
Thursday, April 08, 2004

That's what the whitelist is for.

Ron Porter
Thursday, April 08, 2004

Won't work, since spam contains random phrases and junk unique to each email. So the spam you get is a little different to spam everyone else gets.

Matthew Lock
Thursday, April 08, 2004

"The average user won't have nearly that much email, they'll probably have more like 100 KB, so we're talking five cents."

I think Joel meant 100 MB, not KB.

Karl Max
Friday, April 09, 2004
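
The KB-versus-MB point can be checked with quick arithmetic. The sketch below uses the fifty-cents-per-gigabyte bulk price guessed earlier in the thread and assumes decimal units (1 GB = 1,000,000 KB). It covers raw disk cost only, leaving out replication, power, and operations.

```python
def storage_cost_usd(mailbox_kb: float, usd_per_gb: float) -> float:
    """Raw disk cost in dollars for one mailbox of the given size."""
    kb_per_gb = 1_000_000  # assumption: decimal units
    return mailbox_kb / kb_per_gb * usd_per_gb


BULK_PRICE = 0.50  # dollars per GB, the thread's bulk-purchase guess

cost_100_kb = storage_cost_usd(100, BULK_PRICE)      # 100 KB mailbox
cost_100_mb = storage_cost_usd(100_000, BULK_PRICE)  # 100 MB mailbox
cost_1_gb = storage_cost_usd(1_000_000, BULK_PRICE)  # full 1 GB mailbox

print(f"100 KB mailbox: ${cost_100_kb:.5f}")
print(f"100 MB mailbox: ${cost_100_mb:.2f}")
print(f"1 GB mailbox:   ${cost_1_gb:.2f}")
```

At that price, five cents corresponds to a 100 MB mailbox; a 100 KB mailbox would cost a small fraction of a cent.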

>>Won't work, since spam contains random phrases and junk unique to each email. So the spam you get is a little different to spam everyone else gets.


SpamNet does what he is talking about - collaborative filtering. Even with the "unique" emails due to garbage added on, somehow Cloudmark figured out a way to get around that.

Since I installed the program, it has blocked 36,313 spam emails and I have had to manually block 2,858. Not bad.

AEB
Friday, April 09, 2004
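
How Cloudmark actually gets around the random junk is not stated in this thread. One generic way to match messages that are almost, but not exactly, identical is to compare overlapping word sequences ("shingles") and measure how many the two messages share (Jaccard similarity). The sketch below shows that swapped-in technique only; the function names, sample messages, and shingle size are illustrative assumptions.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given length."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str) -> float:
    """Share of shingles two messages have in common (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


BASE = ("Buy cheap watches today and save big on every luxury brand "
        "with free shipping to any address worldwide order now")

# Same spam body with different random junk appended to each copy.
spam_a = BASE + " xqzt plorf wibble"
spam_b = BASE + " marnk yevo drasp"
ham = "Meeting moved to Thursday at noon please confirm"

print(f"spam vs spam: {jaccard(spam_a, spam_b):.2f}")
print(f"spam vs ham:  {jaccard(spam_a, ham):.2f}")
```

The two spam copies still share most of their shingles even though an exact-match comparison would treat them as different messages, so a similarity threshold can group them while leaving unrelated mail alone.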

"I think Joel meant 100 MB, not KB. "

On the surface it may seem so, but I've been using yahoo mail for 5 years now, and I easily live within the 6 MB limit. My use is reasonably heavy, I do the occasional bit of housekeeping, but I keep the usage count down by downloading all attachments locally and deleting them from the yahoo account.

This brings up some other thoughts about gmail and its established competitors yahoo and hotmail:

Would you be comfortable entrusting Google with 1 GB of data that increasingly represents more and more details of our private and professional lives? Are you happy for Google to store your emails involving banking details, resumes, love letters etc?

Personally, when it comes to personal email, I would actually prefer a paid email hosting service over a free one - as a general rule of thumb, there are certain rules-of-engagement and legal responsibilities binding such transactions that do not apply to a free service.

just my 2c for today.

Ash
Tuesday, April 13, 2004

I'm a big google fan.
Mainly because the text ads don't bother me (although I never click them anyway).

If what it takes to get gmail, a 1 GB mailbox, and up to 10 MB attachments per email is having some software go through my emails to place relevant text ads, then so be it. That's worth the price.

Now if it were human beings reading the emails, that's a different thing altogether.

Also if they were getting paid to place certain companies' results ahead of their competitors, that might bother me if the results became less relevant to the search.

What if they correlate your search behaviour with your email contents, obtain your name and address from those emails, and then sell that information? A big lawsuit, that's what 8)

Hani Obaid
Wednesday, April 21, 2004
