Fog Creek Software
g
Discussion Board




Storing files in databases

We need to attach files to records in our database system. I would like some opinions on what is the best way to do this, here are the ways I have thought of:

1. The way we do it now, simply storing the path to the file - this is bad because the user attaches a file in their my documents folder and then complains it is a bug in the program because their boss can't read it - or worse still, they attach the file and then delete it off their disk thinking it is "Attached".

2. The way Joel does it in FogBugz - Only issue with this is that our application has both a web front end and a desktop front end. can this still work in this situation? I was thinking it might become a pain because we would have to give users direct access to the folder structure whereas Fogbugz can keep that away from the users.

3. The way I have done it in another application - storing the file as a string in a text field in SQL Server. This seems like a good way of doing it, and I just Base64 encode the strings before storing them just because I was told this was necesary to prevent corruption. Is that encoding really necessary? and is this an accepted way of doing this? And am I correct in thinking it would work in all database because memo fields all work the same?

4. Using the Databases proper binary format: Blobs, General, etc. - My only worry with this is that we will have to write seperate code to handle each of the database backends we must support (VFP, MSSQL, Oracle, Others?) - Is there a way to make use of the proper binay fields with a single set of code?

ChrisO
Monday, September 29, 2003

I suppose I should add my question in case it isn't clear, Are any of the ways I have thought of the "accepted" way to do attachments in a database system? Or are there more industry standard ways of doing this in a reliable fashion?

ChrisO
Monday, September 29, 2003

I like storing attachments as blobs in a database. The advantages of blobs are:

- can't lose the attachments, it they are in the database along with the records referencing it
- backing up the database backs up all the attachments too in one go
- less of a security risk on public web sites as the file doesn't actually exist on the file system and so malicious files uploaded to the server can't be coaxed to run

Matthew Lock
Monday, September 29, 2003

By the way base 64-ing binary files usually doubles their size.

Matthew Lock
Monday, September 29, 2003

Matthew, would that suggestion map to either my 3 or 4? And if 4, is that compatible with all databases?

Also I was going to ask if anybody has had to do this in an "enterprise" situation and if so have they had any experience at sending the files off to a virus scanner before attaching them etc?

ChrisO
Monday, September 29, 2003

And yes I have been experimenting with not Base64ing the files this afternoon and it seems to not cause any dramas.

ChrisO
Monday, September 29, 2003

What would be the typical size of the files? This will have a large impact on the solution.

Rhys Keepence
Monday, September 29, 2003

Anything, usually word documents and photos - so usually less than 1-2MBs and most definately not more than 10MB for the most part

ChrisO
Monday, September 29, 2003

If you're accessing the databases using oledb, you should be able to use the same code (or with very minor modifications) on Oracle and SQL Server. In theory this should extend to all database types that support blobs, although I have no idea as to whether this is true in practice.

Sum Dum Gai
Monday, September 29, 2003

Zip the files before storing them.  Files like Word docs drop to about 10-20% of their original size when zipped.

T. Norman
Monday, September 29, 2003

And if you're storing them in a text column, use Base64 or another text-based encoding scheme unless it is documented that binary dumps can be reliably stored and fetched from such columns. If that capability is not documented, the fact that you see it work a few times (or even many times) doesn't mean it is guaranteed to work every time.  Chances are it isn't documented or supported, which is why they provided Blobs.

T. Norman
Monday, September 29, 2003

Why would a web-based front-end require giving the users direct access to any folder?  Browsers can upload files via HTTP, and you can provide links that will enable users to download files via HTTP.

NoName
Monday, September 29, 2003

I would be inclined to store the files on the filesystem.

Storing files in the database can create problems. In particular, if the files are large, you will have to consider sizing of table spaces, etc. If the files are to be updated regularly, you will experience bad fragmentation of tablespaces, which will negatively affect performance. If you are performing queries often, you will run into the performance issues of having large chunks of unsearchable binary data mixed in with the real data.

My approach would be this: create a directory on the server filesystem with permissions so that the web server user may only access it. This is where the uploaded files are placed. This shouldn't under the web server's root directory either. Write a server side script (once again, not sure on what language/environment you are using) that writes the file requested to response stream of the request, making sure to set the right mime type. This allows you to implement security, so that bob can't download fred's files by typing in random url's, and to implement whatever other feature later on (caching, etc).

Rhys Keepence
Monday, September 29, 2003

Problem with the files in the file system approach is what to do with clustered servers? What happens if you need to distribute your app across multiple servers, in which file system are the files stored now? If the files are stored in the DB then the DB's facilities take care of the problem. Overall easier to handle this way IMHO.

Alex
Monday, September 29, 2003

I'm building an application that manages file uploads/downloads between two entities, and i'm just storing the content of the file in an Image field on the database.  basically stores it in binary, and doesn't matter if it's text or whatever.  i just stream the file directly into the database, and vice versa for downloading.  i don't have to worry about file-system permissions or anyone seeing someone else's files.

nathan
Monday, September 29, 2003

IMO, using BLOBs is the simple strategy.  BLOBs are easy to implement and work reliably on most databases.  Differences between databases will mostly depend on the API you are using, and probably will be small. 

Storing files on disk has the problem of referential integrity - you may have to build a mechanism to verify that files are cleaned up correctly when they are no longer needed, and ensure that the incorrect files never get deleted.  I would choose files over BLOBs only if there was specific performance problems with the database that couldn't otherwise be resolved.

Colin Evans
Monday, September 29, 2003

The files can be on the file-system, for a cluster all you need is a reliable file server/NAS system. This is simple and proven.

Storing the file in the database is practical however for infrequent uses, I don't  think database are a good substitute for general purpose file systems.

Compression is a good idea no matter what you choose. The simple and easy way to do that is with zlib. The library is opensource, widely deployed and the license is very similar to the BSD license (ie: free without liability for the author.)

Since you seem to imply a client on the users' machine, you can have the client do the compression and save on the network bandwidth, too.

Saruman
Monday, September 29, 2003

A lot depends on how many files you have and how robust your database system is. My preference is for relative links. That way you transfer the whole folder the database is in, and all the links still work. And basically all people need to do is upload the file and type in its name, or you set that programmatically.

Should work for both web and desktop access as well.

Stephen Jones
Tuesday, September 30, 2003

*  Recent Topics

*  Fog Creek Home