Fog Creek Software
g
Discussion Board




Files in the DB

I havent yet dealt with tucking binary data into a DB. I have always favoured uploading the files to a plain folder and storing the paths in the DB.
Now I got a CMS type job where the client kind of assumed that the images (and assorted files like PDFs) would go into the DB.
Should I talk them out of it and go with the approach I know, or put the files right into the database like they seem to want?


(DB engine will probably be MySQL since they already have servers up and running (albeit not very well kept.. see my last post, same client). This is a stand alone thing though so I guess I could talk them into using another db for this.)

Thanks

Eric DeBois
Thursday, January 22, 2004

There is really only one way to do this. Test, test, test! Take the files they want to manage with a DB. Make sure you have full representation of file sizes expected (and some big ones). Load up the DB with a ridiculous number of these files. Run some load tests on your setup and provide the results to management.

For smaller files, some DB's do well, but there seems to be a breaking point where life goes sour.

m
Thursday, January 22, 2004

Yeah, I was planning on doing some testing, but I wasnt sure it was even worth bothering with.

Tell me.. when things go sour, how sour do they go? Occational ful stops are kind of acceptable, but messed up data is a total no no.

Eric DeBois
Thursday, January 22, 2004

We have found slowness in retrieving the larger files and bringing that session to its knees. This isn't entirely unreasonable, but the file system was a better fit for our particular needs. I would say data reliability will not be a problem from the DB.

m
Thursday, January 22, 2004

I haven't used MySQL for this, but I've used other database products and haven't had a problem storing a lot of binary files in a database.

I've heard that if you're just storing files, and not removing them, then it isn't really a problem.  If you do a lot of removing/changing files, though, it's better to just store their paths.

Matt
Thursday, January 22, 2004

I deal with a 16GB SQL server database of binary data on a daily basis.  It's nearly unmanageable.

For the simple CRUD-style  apps,  everything seems to work fairly well, but for anything more complicated - backups / restores, replication, etc... it's a nightmare.

Save yourself the trouble and use the filesystem for the one thing it was made to do.  Use the DB as a metadata store only.

Sassy
Thursday, January 22, 2004

I've done the path thing and the blob thing. The one thing I can tell you is that the blob thing can save you a couple of weeks of testing and debugging because you have no path or permissions issues.

Of course the downside is that you cannot browse the files.

Philo

Philo
Thursday, January 22, 2004

The only problem with treating the file system as some kind adjunct to the database, with the location stored in the table is that its very difficult to validate.  The files can move, be deleted or suffer, as Philo says, from some kind of permission change.

Then as paranoia increases you can begin to wonder if file x really is the same file x as referenced in the database.

The problems of replication are rather made worse not improved by separating it out in this way, since the data has to get replicated anyhow.

For performance sake you might split the actual file data into a separate database (in some databases a separate table would be sufficient), but keyed in the original record.

Simon Lucy
Thursday, January 22, 2004

In that case I'd store the files in the database. The primary reason is for the reasons that Philo mentioned -- you have no consistency if half of your data (the, err, "metadata") is stored in the database, and half is stored in a non-ACID participating filesystem with a completely separate administrative process, backup scheme, etc. The primary downside, of course, is that you can't use the many filesystem tools on database stored files.

Regarding some of the pitfalls, I think they might be overblown a bit -- Yeah you'll have to back more database up, but this is in lieu of backing up more filesystem. Yeah replication will take longer, but presumably if you're replicating the data then you need it at multiple sites, then you'd have to replicate the filesystem as well, etc. i.e. One is instead of the other, rather than being in addition to the other.

Dennis Forbes
Thursday, January 22, 2004

>DB engine will probably be MySQL

A DBA/Unix friend of mine told me that MySQL has a limit for the maximum data size it can handle for a single INSERT statement is 16MB. So that would mean that if you have files larger than 16MB to be stored in LOBs they get corrupted.

This is worth checking out before you store files in MySQL, since it would be a bitch to run into this problem when the system has gone into production.

I might be wrong on the above, but check beforehand.

Patrik
Thursday, January 22, 2004

Ok. thanks all of you. Good info.

I looks like the benefits are greater than the risks for this app.

Eric DeBois
Thursday, January 22, 2004

Actually the default configuration of MySQL is 1MB table rows.  This means it will not stuff up a file greater than 1MB with an error.

I have quite a bit of experience in this regard; I currently have 5000 images files stuffed into a MySQL table.  For regular files, however, I store them in the file system (due to the size limitation) w/ a table referencing them. 

Pluses:
*Way easier to insert/retrieve a file from a BLOB field than from the file system.  No need of file permissions or anything.
* Backup is much easier; just backup the DB and you have all the data.

Negatives:
* Size limitation of MySQL.
* Large transfers from the DB server might not be as efficient as opening a local file.

Almost Anonymous
Thursday, January 22, 2004

The document management systems that I'm familiar with typically have an area on the filesystem devoted to them. They store the metadata in a database and copy the file to their own protected area on the filesystem. They also typically rename the files to some unique name. By putting the files in a protected place, the user can't screw with the files and break the links yet you don't have to burden the database with the actual contents of the files.

pdq
Thursday, January 22, 2004

pdq - that sounds like the worst of both worlds - you have no filesystem access to the files, but you *do* have to worry about users screwing with the files, relative pathing problems, and sysadmins messing up your permissions.

Philo

Philo
Thursday, January 22, 2004

I'm dealing with an app that holds files and metadata about the files. I considered using BLOBs or the filesystem.

I ultimately decided against using BLOBs. Most apps that traditionally do this sort of thing don't use BLOBs. It also makes the database a bastard to manage as someone mentioned...

However the biggest drawback about the filesystem is that I haven't found a simple way to make the update of both part of a single atomic transaction.

Surely it must be such a common requirement that I'm really surprised that there is no standard set of classes available.

Gwyn
Thursday, January 22, 2004

Hang on, does the client mean the same thing as you when he says "the database"?  To you it means a specific piece of software that performs a function, to him it might be a generalized way of saying the "website backend".

I'd keep the files stored in the filesystem if possible.

maybe misunderstanding
Thursday, January 22, 2004

Good point on understanding the client requirements.

"I'd keep the files stored in the filesystem if possible. "

...because...?

Philo

Philo
Thursday, January 22, 2004

The principle of "keeping it simple, stupid".

maybe misunderstanding
Thursday, January 22, 2004

The 16MB largeblob limit in mysql is accurate (see packetlength in my.cnf).  Consider breaking the file up into 1MB or less chunks and reassembling as you need them.

dir at badblue dot com
Thursday, January 22, 2004

The KISS principle really leads one to put files in the database, not in the file system.

1. The file system isn't transactional
2. The file system doesn't have relational integrity
3. The file system is easily tampered with

There are reasons for both, but on the whole, I prefer to put the files in the database if it's not otherwise inconvenient.

Brad Wilson (dotnetguy.techieswithcats.com)
Thursday, January 22, 2004

I don't see how putting files in a DB is *less* complicated than using the file system.  A quick glance at this thread would indicate otherwise. Besides, what is a file system but a database specifically designed for storing files?  I'd trust the filesystem over mysql anyday.

File references are still maintained in the database so your points regarding transactions and relational integrity don't really apply. I disagree the file system is more liable to tampering than the database, personally I think the opposite is true. Regardless, the developer is a much greater factor in reliability or security than the approach.

maybe misunderstanding
Friday, January 23, 2004

Oh yeah, if somebody does manage to tamper with your file system, you're pretty much rooted anyway IMO.

maybe misunderstanding
Friday, January 23, 2004

As for the drawback of not being able to browse the files, it should be a matter of a few moments to knock up a dodgy client that can preview the data. 

A form with a container consisting of embedded viewers of the required kinds.

Simon Lucy
Friday, January 23, 2004

I set up exactly this situation (in MySQL) for a customer about 18 months ago. Based on testing & opinions I went with files in the database, partly for ease of backup as the whole thing can just be exported.

The table, now at about 500Mb has corrupted three times since the launch. That's not a catastrophic issue; just a simple MySQL table corruption which can be solved with the standard 'REPAIR' commands. These have sovled it each time and there have been no lasting problems. The problem is the downtime in the meantime. (only minutes / at most a couple of hours) but it's a heavily-used corporate intranet so any downtime is bad within working hours.

I'm very slightly nervous of it but for now they have to stay in the database. In the next round of changes to the system I'll look at moving them out; but am undecided for now.

James

James Ussher-Smith
Saturday, January 24, 2004

I find it fascinating that most of the comments about why it's a bad idea to put files in the database specifically related to limitations and fundamental lack of stability in MySQL. I'd recommend looking into SQL Server, Oracle, DB2, or at least Postgresql for crying out loud.

Dennis Forbes
Saturday, January 24, 2004

*  Recent Topics

*  Fog Creek Home