Fog Creek Software
Discussion Board




mbox vs. RDBMS


I am trying to come up with a way to save e-mails from an e-mail client program I am writing. I am currently comparing 2 solutions:

1) using mbox or a similar file format 
2) Using an RDBMS

You are probably asking yourself "Why reinvent the wheel?". Well, because it is fun to do that sometimes.  :)

Anyways... I was planning to use SQLite for the RDBMS solution. I have read all the pros and cons of SQLite. It is not a traditional RDBMS system, and its limitations may not work out too well. The biggest problem is that a given row is limited to 1MB in size, which, the author claims, could be extended to 16MB by changing a define in the source code and recompiling.

16MB is quite a good size for text-only e-mails, but when attachments come into play, then it's a problem if I chose to save the whole thing in the database. If I don't save the attachments into the database, but save them locally to a file, than it's another issue to try to keep track of which e-mails had which attachments; what if two e-mails have files with the same name attached, but the files are actually different?! Aarrgghh!!

Why use RDBMS?  I basically thought I could use the power of SQL instead of writing my own search algorithms. If I use mbox file format (or any other), I have to write the code to maintain those files too. Using SQLite would save me lots of time and testing.

Are there any e-mail clients using a database to save e-mails as opposed to flat files? Any other pitfalls I am unable to see other than the ones I mentioned?

Your input would be appreciated.

entell
Thursday, April 15, 2004

JWZ already went down that path:
http://www.jwz.org/doc/content-length.html

Check is homepage. He wrote something else about not using an RDBMS and using mbox's instead. It might that same text, I can't remember.

RP
Thursday, April 15, 2004

Outlook uses it's own file format to store emails.  There's nothing wrong with using a traditional SQL DBMS (MS SQL Server, Sybase ASE, etc.) for this.  It actually saves you a lot of headaches since you don't have to roll your own file format, parsers, and libraries.

Plus, it's kind of neat to be able to run SQL queries against them (makes it easy to do trend reporting, trivial to delete emails based on a certain criteria, etc.)

MR
Thursday, April 15, 2004

You could also use something like a dBase-format (*.dbf) file.

Christopher Wells
Thursday, April 15, 2004

If you are doing a mail application, I request two things:

1) Keep the attachment IN the e-mail until the user explicitly pulls it out (Outlook does this). This has it's downsides, but from the point of view of organizing e-mails, it's a big win for the user (in my opinion anyway). Eudora gets this VERY wrong in my opinion.

2) Give a simple way to get an mbox format file OUT of your system. One of my (many) rants about Lotus Notes is that it's text output format does NOT include the attachments. I was trying to save all e-mails related to a project into our source control system, so I exported them as text. This would have worked perfectly, except that many of the e-mails had attachments of one form or another, which got lost.

Also, think about including easy import of mbox files. Combined with easy export, folks can safely try your app without fear of losing their stuff.

Oh, one more thing: You might think about storing the messages in mbox files, but maintaining an index in an RDMS. Don't even bother to re-write the mbox file when a message is deleted - just kill the index entry. Only re-write the mbox file (recovering lost space) when the user says to do so. The only thing this doesn't get you is the ability to full-text search via the DB. Everything else is pretty good though.

Michael Kohne
Thursday, April 15, 2004

Instead of using mbox, you should look into using the maildir format instead, which is widely used in the Unix world.  maildir differs from mbox in that each message is stored in its own individual file, and there's no locking issues that plague mbox.

See http://cr.yp.to/proto/maildir.html for more information on maildir.

I also agree with the idea of using an RDBMS (or other database system, like Berkeley or Jet (or whatever you MS guys use for embedded databases these days)) for indices.

joev
Thursday, April 15, 2004

If you're going to use mbox, store all the indexes on a separate file.

RP
Thursday, April 15, 2004

It all depends on the context. If you ever will need to access the data from another application, go with SQL. If performance is critical, go with your own specialised storage. Plus what kind of indexing you need, full text only or specialised parsing of all the headers.

Jonas B.
Thursday, April 15, 2004

You may find this interesting, it's how Arsdigita made a webmail system that imported mbox to an RDBMS

http://rhea.redhat.com/asj/webmail/

Matthew Lock
Thursday, April 15, 2004

Thank you all for your comments and suggestions! You gave me lots of leads to follow.

entell
Friday, April 16, 2004

*  Recent Topics

*  Fog Creek Home