Fog Creek Software
Discussion Board




Database versus flat file flamewar


I was discussing an application of mine with a friend of mine who is also a comp sci guy. The application is a strategy game that handles about 10 players per game.

He was quite adamant that storing my data in a database was a poor choice compared to a flat text file. Having worked on enterprise systems for a while I just threw everything into a database without much cost/benefit analysis.

To me the only real advantage of a text file is ease of quick editing. I think this advantage is outweighed by the referential integrity you get from a set of relational database tables.

I don't see a database as being harder to work with that text file data. Right now I use SQL to manipulate the data and JDBC to pull data into my application. If I was using a flatfile I'd have to write all that glue myself.

Eric Raymond seems pretty psyched up on text files in his new book, but he's really talking about binary formats vs text formats and doesn't seem to cover databases.

Does anyone else see advantages to the flat file?

NathanJ
Sunday, January 04, 2004

Well, in your case it looks like database files are at least two acronyms more complicated that text files.  :)

sgf
Sunday, January 04, 2004

Standardization? XML is XML. On the other hand, I've yet to see two DBMSs that had more than a tiny least common denominator common across both systems. Also, with a text file you don't need a server, and most (though many don't) DBs need a server. More admin work, etc.

And a game may really have different enough requirements and data to make a text file worthwhile. I imagine some data would be just silly to store in a database. Maps, for instance, don't feel row-oriented to me. Or data that must be accessed quickly without the overhead of a connection, such as any number of key/value pairs that games need a lot of.

Everything has its place. I usually use the DB for everything when I have one, but i.e. a DB is excessive for most configuration stuff.

Mike Swieton
Sunday, January 04, 2004

Did your comp sci friend give any technologies for accessing this flat file?  What are you planning to put into this flat file/database?  Is it shared among the multi users or is it user specific?  Is speed/latency/bandwidth an issue?  How often will the data change?  Will the user have access to a file system or is it sandboxed away?  Should I go on?

Make the best decision that you can.  It sounds like you optimized for development time -- I know how to do it this way and it will be less code for me.  If you get further along and people complain how hard or performance is unacceptable rewrite that section of code.  You did properly abstract it right?

Billy Boy
Sunday, January 04, 2004

stick with the database for now at the very least.

Everytime I make the decision to go with a flatfile solution for anything thats not trivial I _always_ end up wishing I hadn't, and often end up moving to a database eventually.

A database is perfectly designed to store non-trivial data, and Im betting thats what you want to do.

(they also scale an awful lot better than a flatfile does, which for a networked game I can see being important)

FullNameRequired
Sunday, January 04, 2004

In this day and age, we shouldn't be writing parsers. If your data is simple, put it in an XML file and use an XML parser, if you care. My advice would be like most: stick with what you already have, on the theory that it's not broken.

Brad Wilson (dotnetguy.techieswithcats.com)
Sunday, January 04, 2004

I don't put a lot of weight in ESR's writings...  As for flat textfiles, they're useful when it's cheap & easy to code, like simple comma-delimited lines.  For anything more complex, especially when it's probably going to change later, some scaffolding's probably appropriate.  A database connection can work well as scaffolding now, but you'll eventually have to figure out what you need from the dataset, and how you need it.

A good mix may be berkeley db (sleepycat.com), or metakit (www.equi4.com/metakit/).

Lally Singh
Monday, January 05, 2004

It takes time and expertise to install and configure a database. Using a database may or may not be worthwhile, based on the complexity of the data accessing, your database skills, where database software needs to be installed, and other factors.

Julian
Monday, January 05, 2004

Sqlite (www.sqlite.org) gives you all the advantages of a flat-file while having all the features of an RDBMS.  Might be worthwhile choice for something like this.

Almost Anonymous
Monday, January 05, 2004

> If I was using a flatfile I'd have to write all that glue myself.

Libraries that CSV for you, MySQL, Berkeley DB, etc etc...

Li-fan Chen
Monday, January 05, 2004

Many many real games, e.g. the Command & Conquer ones, use ini files.

i like i
Monday, January 05, 2004

A strong flat file proponent myself. Yet keep in mind that scale as well as complexity is proportional to the preference for a DBMS (Relational or not).

I'll be a bit careful of XML. Should you want something flat for something specific, as your case seems to me, I'd suggest coding your own. Mainly because if you can manage to get rid of any variable length strings in your database, searching/replacing/inserting records is far faster with a custom built library than in an XML file. At least that's what I faced. I may not have exploited XML's capacity to the fullest.

Indian Developer in India
Monday, January 05, 2004

So how big is this database anyway? A 10-player game shouldn't have huge amounts of data like an MMORPG. Are you storing large graphics and sound files in this database? Otherwise I'd suggest simply loading all the required data into memory at the start of the game. The storage format could be anything, XML or CSV or a simple binary format or whatever.

Chris Nahr
Monday, January 05, 2004

> Does anyone else see advantages to the flat file?

The basic advantage is far less complexity.

I would also investigate systems like Prevayler, which are much faster and simpler than DBs (because they are in-memory and do a lot less).

Portabella
Monday, January 05, 2004

Although you don't say what you are storing in your file, it is quite common for games especially to use flat files for storage.

1. You don't have to distribute data access components with the game.  When was the last time you saw a game say "Installing MDAC".

2. Data access components are probably not available on a console.

3. A flat file easily maps to an array in memory.

4. Most commercial games are written in C/C++ and performing file i/o in those languages is easy compared to learning ODBC or DAO or what have you.

5. Most games define their own file formats simply because there's a gazillion ways to do it and the author does it the way he sees fit.  Not everything maps to a relational database.  Storing/Reading graphics in a flat file is a lot faster than pulling them out of a database.  Load time is everything.

6. Straight file access is >a lot< faster than executing a query in SQL every time you need some data or loading up an XML DOM.

7. Straight file access uses less memory than a database and it's drivers.

There are more reasons than this but if you ever make or have made a game you will understand.

Inside Sales
Monday, January 05, 2004

Most of the arguments against a database above apply to traditional client/server databases, but something like SQLite would work wonderfully in a game. 

In fact, I am a game developer, and I recently wrote the code to control character customization for a console extreme sports title, amoung other things.  (Amped 2 for Xbox, if you are interested.)  In the game, there are more than seven-hundred different items of clothing your character can wear that involve both model and texture variations.  All the information about the categories of clothing variations, what sex the clothing was appropriate for, and where to find the models and textures on disc were stored in a flat file.  In retrospect, it was a bigger pain to maintain this database and the code to read and write it than it would have been to plug SQLite in to the codebase and use it for this task. 

If I implement something similar in the future, I will strongly consider using SQLite.  It looks like it would be fairly easy to get running on a modern console system.  The license is friendly to closed-source applications.  My only concern is how much memory it would use at run-time, but perhaps that could be worked out by limiting the cache.

Matt Kimball
Monday, January 05, 2004

Huge Flat Text Files are not something you maintain.  They are not meant to be maintained by you.

In your situation I would create a relational database of the data you describe and then write some code to take the data from the database and create the flat file.

Inside Sales
Monday, January 05, 2004

For the "text files are easier to maintain than XML" - people, people, PEOPLE!

We have tools - USE THEM!

If you're going to do any work with xml, you should write a schema for it. Then, using an XML editor like XMLSpy (there are others; that's the one I use) make slinging XML easy - you get auto-closing of tags, intellisense on fields, automatic data validation...

It's a very sweet way to work.

Philo

Philo
Monday, January 05, 2004

A text file can be easier because you can easily
splat a complex data structure to the file. If
you do primarily primary key access then
a flat file can work very well for a long time.

Where i have regreted my decision to use
flat files is when i have to start making
queries across lots of data. Then flat files
are just too slow.

I also try to make the layers such that
i can go to a database later.

son of parnas
Monday, January 05, 2004


Thank you for the replies.

Some of the advice seems geared more to a traditional game installed on the client. This is a web-based turn game. Most of the game data is stored in the server database. Images and a few configuration bits are stored as regular files. The data is loaded when needed and stored in a cache. Almost all game logic is handled on the server.

--
The basic advantage is far less complexity.
--
I agree that a flat file is less complex than a database, but less complexity is not always advantage. An advantage should equal less work. But I still need an API to parse/read/write the data (whether I code my own parser or use something like CSV/XML is another story). I'll still need to worry about data integrity (XML may help in this regard, but then I'm back to using a bunch of API code, if I use CSV all that checking would have to be written from scratch).

The only extra work created by the database is the need to create and run the database server. For me that isn't really complex because I already have one running.

--
6. Straight file access is >a lot< faster than executing a query in SQL every time you need some data or loading up an XML DOM.

7. Straight file access uses less memory than a database and it's drivers.
--
I agree with points 1-5, but as I just mentioned this is a little different from a "normal" console game.

Performance hasn't been a huge problem for me yet (since this isn't a real-time game). Most players have less than a few dozen interactions with the server each day, so database hits aren't a problem. I can imagine performance problems if I start handling hundreds of users, but in that case it seems like data access from a file would also take longer. With a database I'll start to see benefits from the built-in indexing.

NathanJ
Monday, January 05, 2004

I'd be grateful if anyone had an interesting text/link comparing flat files vs. (what-people-think-when-they-say-) databases.

FYI, I use something similar to XML for my data needs.  I'm reading Weikum/Vossen's book on transactions (probably my favorite compsci book because of insane clarity), and I can see myself giving it indices and a transactional layer if I start doing weird things.  Maybe something like Prevayler, as Portabella mentioned.  (I assume Prevayler works because it exploits people structuring their apps as oop, so serialization comes for free.  Do monsters go away when you shout "OOP!"?)

Tayssir John Gabbour
Monday, January 05, 2004

Prevayler works if and only if your data all fits in memory.

The "prevalent hypothesis" is actually pretty logical.  Nearly half of computer science is devoted to algorithms for doing things quickly with slow storage by appropriate caching, indexing, etc.  At some point, though, RAM is going to start getting cheaper faster than hard drives.  And when that happens, prevalence-style applications will start to become more practical in general, instead of being a relative niche as they are now.

Phillip J. Eby
Monday, January 05, 2004

> I assume Prevayler works because it exploits people structuring their apps as oop, so serialization comes for free.

Not sure what you mean about oop.

The Prevayler docs explain in more detail, but the basic idea is that you serialize all your business objects, and then keep a write-ahead log of all the commands which change the business objects.  The write-ahead log consists of the serializations of all the commands. So you can always get back to a known state by getting the last snapshot and rerunning all the commands.

I believe that there's one implementation in Python and another in Java. I would think that C# shouldn't be hard to add. Rather than oop, I'd say that Prevayler exploits the serialization features of modern languages.

Portabella
Monday, January 05, 2004

> The only extra work created by the database is the need to create and run the database server. For me that isn't really complex because I already have one running.

Fair enough, but for many people who *aren't* running a DB, it's a compelling argument.

I'm not dissing DBs either; to paraphrase Bjarne Stroustrup, relational databases are the best relational databases around.

Portabella
Monday, January 05, 2004

"Prevayler works if and only if your data all fits in memory."

Your data usually does fit in memory. A relational database is now just a data structure.  I'm not sure why prevalayer is interesting, because I've been stuffing my whole RDBMS dataset in memory since 1998.

bill joy
Monday, January 05, 2004

As I understand from reading the docs, Prevayler is interesting for having a better interface than SQL, and robust plus fun to hack due to small implementation sizes.  (And apparently speed.)  Of course, what popular thing stays simple and small?

It's weird though, because when people want to start making statements about objects, we're back to things like relations...  So.. I don't see what's going on here.  Is this one of Date's critiques, that SQL has really bad object types, and other technologies might start eating RDBMS's lunch because of it?

Tayssir John Gabbour
Monday, January 05, 2004

Prevayler didn't seem particularly simple to me.
I had a hard time understanding how to make it
work from the documentation. In addition, write
transactions were slower than mysql because
it does synchronous writes.

son of parnas
Monday, January 05, 2004

> write transactions were slower than mysql because
> it does synchronous writes.

Ironicallly, you could probably speed it up a lot by applying standard database techniques. For instance, if you have a Command which updates a Customer and it looks like this:

  new UpdateAccountBalanceCommand(Customer c, long newBalance).update()

then the serialization mechanism needs to serialize the customer as well as the new balance. If the customer includes some embedded objects, then this might be fairly slow.

Whereas if you did:

  new UpdateAccountBalanceCommand(long customer_id, long new balance).update()

Then the serialization would have a lot less work to do, and thus be much faster.

Obviously, the mechanism is going to work best if there are lots of reads relative to the number of writes. I have seen plenty of applications which fit this profile.

Portabella
Monday, January 05, 2004

Some obvious thoughts:

If anyone ever is going to do any of the following, use a database:

1)Write custom reports
2) Chart/Graph the data
3) Write their own data entry forms
4) Link to another application's SQL database
5) Merge two data sets
6) Fix problem data (by an end user)
7) Implement more advanced DB features (transactions, roll back, etc)

If not, then a flat file sounds ok to me.

pdq
Monday, January 05, 2004

Adding a few more criteria to PDQ's list:

* Share the data between more than one process
* Share the data between more than one user
* Make business decisions using the data

HeWhoMustBeConfused
Tuesday, January 06, 2004

In Nathan's situation a database is a fine choice.  This isn't an application that gets distributed to multiple clients, so that installation and configuration issues have gone away.  What the database does do is prevent Nathan from having to spend a lot of time implementing data storage algorithms.  It leaves him free to spend his time on application logic and generating customers, both of which provide much more value than writing data storage routines.

Clay Dowling
Tuesday, January 06, 2004

> In Nathan's situation a database is a fine choice

I agree with the decision (basically because he said he's already running a DB anyway), but I think your reasoning is totally wrong.

There are plenty of web applications that are more complicated than they need to be precisely because they're using a database: it's mapping code you don't need, system administration headaches you don't need (is the database up?), etc etc.

If you're distributing the database to clients, it's even worse.

Again, if you *need* a DB, by all means use one; but if you can make life much simpler by NOT using one, don't hesitate.

Portabella
Tuesday, January 06, 2004

you are so not answering my question!

Amelia Thermopilis
Tuesday, February 24, 2004

I have SAS programs reading data from huge flat files. Does anybody recommend Database solution for this? Any Pros and Cons for both solutions?

msv
Tuesday, April 06, 2004

*  Recent Topics

*  Fog Creek Home