Fog Creek Software
g
Discussion Board




Embedded Databases (ala MS Jet or Berkeley DB)

I've been looking at embedded databases for use in a project that requires multiple computers to share a database without a server.

Basically, I don't have a problem with the key/value aspect of Berkeley DB; but I really need the database files to reside on a shared filesystem. Berkeley DB specifically states that databases can not reside on NFS or other shared filesystems.

I can't use Microsoft's Jet or MSDE embedded databases because this system is for the Gov't and must be able to run on Unix as well as Windows.

Can anyone recommend an embedded database solution that supports POSIX clients and allows multiple machines to share database files that reside on shared filesystems?

Jeff
Tuesday, December 16, 2003

Don't know if this will help but SQLite is becoming popular as an embedded DB http://www.hwaci.com/sw/sqlite/

Matthew Lock
Tuesday, December 16, 2003

SQLite is extremely good. It won't allow you to have multiple clients accessing the database at once (as far as I can recall this means two can have it open, but only one transaction active at a time per file). NFS shouldn't be a problem, but you should check the mailing list archives to be sure.

Mike Swieton
Tuesday, December 16, 2003

MySQL can run 'in process'.  However I'm not sure about the NFS support.  it's not usually a good idea to do that sorta thing.  Why can't you have a database server?

Michael Koziarski
Tuesday, December 16, 2003

I won't plug my product, but embedded database engines
is what I do for a living :)  But our product (and probably
nobody else) doesn't do exactly this, if what you want is to
have a central datastore that is concurrently accessed by
multiple accessors without a central server.  It can be
done, but you'll need something with a  very fast database
open and close - since your apps will have to open and
close the database every time they exchange control - and
you'll need some sort of application-level messaging
scheme for making sure only one accessor is in the database at a time.  If this doesn't match your
requirements, you will have to get a client-server database
or pitch the concurrency requirement, otherwise, it ain't
gonna happen.

If you want more info about our product - which does
satisfy your other needs regarding OS, etc - send me email.

foobarista
Tuesday, December 16, 2003

plenty of options out there, about half of these are embeddable, and all are open source.

http://www.mysql.com/
http://www.postgresql.com/
http://firebird.sourceforge.net/
http://www.hwaci.com/sw/sqlite/
http://www.sapdb.org/
http://www.sleepycat.com/
http://hsqldb.sourceforge.net/
http://axion.tigris.org/
http://mckoi.com/database/
http://www.drjava.de/smyle/
http://www.jepstone.net/tinySQL/

divide by zero
Tuesday, December 16, 2003

I would recommend sqlite also. It is extremely easy to use with numerous languages and on various platforms. A plus, you can create an entire database in :memory: .

Tom Vu
Tuesday, December 16, 2003

OK, off topic, but here's what I've always wondered - how the hell does a shared file database system work?  You've got one DBMS file set sitting on a single network box, and all the "clients" simply open the same file and do their thing.  There is no server app running on the host box at all.  How could this possibly work - there is no single point of arbitration that determines who can read and write what record. 

Somebody esplain this to me ...

Mitch & Murray (from downtown)
Tuesday, December 16, 2003

M&M:  I don't know how any of the other DBMSs manage it, but Access creates a separate file and uses it to store page locking info.  The last client to close the database deletes the lock file as well.

Sam Livingston-Gray
Tuesday, December 16, 2003

The only way is if you simulate single-process access, and
you do some sort of application-level queueing or
whatever.  I've actually seen embedded systems
use this style of access.  The keys to making this work is
that the "database open" and "database close" operations
have to be really fast.  If this sort of queued access is
unacceptable, another way to do this is with cooperative
locking through shared memory if the apps all run on the
same machine.  If they aren't on the same machine, you
have to go client-server.

foobarista
Tuesday, December 16, 2003

Well JET does NOT get installed on the server side, so, in theory, you can use jet with any server, as long as the clients are windows based computers. Thus, you can use JET with a Linux samba share for example. However, some attention must be paid to the op-locks settings. There is some risk in doing this, but assuming the file system is compatible with windows, then one should be able to use JET. I can’t recommend this, as never did have production code running this way.

Since these embed systems run totally on the client, no software actually get installed on the server side.

As to how these file systems work? Hum, that is actually a good question. While most OS systems support some type of conflict when two users try to open, or grab the same file at the same time, then the server can resolve this.

Since most servers support a file being opened at the same time by more then one user, then things get a LOT more tricky when it comes to locking records. When trying to lock a record, the OS can’t really manage this problem. Further, all software is running on EACH client.

With Microsoft JET, not only does it manage record locking, but it also supports referential integrity. That means if two users try and write a record to disk at the same time with the same key id (or, even just a regular field with a normal index set to no duplicates), the whole thing works! In other words, the software running on each pc somehow does NOT allow two users to write the same key id to a record. A good many file based systems from FoxPro, FileMaker and JET have existed over the years...so, they do work.

Thus, the standard windows file system does support locking in shared files. I don’t know exactly how, but a some locking mechanism does occur at the file level. I mean, any multi-threaded application running on the server has to be able to manage this conflict also. Also, multiple applications running on your desktop also have to resolve file sharing issues. Thus, I do assume that the locking is done via the files system op-locks. For example, if the op-locks setting is not set right in windows 2000 server, then it can cause corruption in JET shared files. (big problem..and in fact the service updates to win2000 actually changes the op-locks setting to fix a JET problem).

So, there is some OS support for locking of files. Otherwise, it can’t work!

So, you still might be able to use JET (depends on your client computer). There is also some very high performance file libraries available, and the following good database one uses the dbase format:

www.codebase.com

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Tuesday, December 16, 2003

Faircom's c-tree Plus looks like it may fit the bill. It's available for a number of platforms and the documentation specifically mentions NFS support.

http://www.faircom.com/index.shtml

me
Wednesday, December 17, 2003

How does this work? Well, most OS's provide both file locking and region locking. So when you need to write a file, you first lock the section you're going to write into. Then any other processes will get a failed read or write if they try to access that part of the file while you've got the lock.

Chris Tavares
Wednesday, December 17, 2003

BerkeleyDb shouldn't be shared over NFS because of characteristics of NFS which they describe in their documentation.  I think there is a way to have multiple clients share the same files though.  I recall looking into this myself and I think you might be able to do it with a SAN.  Maybe not.

Why these constraints?  It doesn't sound like a good idea.  Why not use BerkeleyDb's client server capabilities?  They have RPC set up, I beleive.

Name withheld out of cowardice
Wednesday, December 17, 2003

Two clarifications:

1) I don't plan to share the db via NFS but instead on a Windows or Apple shared drive. My concern stems from the fact that NFS file-locking is broken (on some implementations) and I'm concerned therefore that there might be problems with SMB or AppleShare volumes.

2) We don't have the luxury of installing a server, because the client doesn't really want the burden of administering it. In most cases, I expect only one person to be accessing the db at a time; however, I need to allow for multiple accesses.

My guess is that shared drives probably work fine under Windows and Mac OS X. But I don't want to find out there are problems after spending weeks implementing the data store.

I would prefer a library like Berkeley DB or MySQL embedded, which allow me to purchase a license after doing my development (and probably after getting paid by the client).

While I like SQLite, it really doesn't support concurrent file access (or doesn't seem to from the documentation). Also, allow me to reiterate, MS JET or MSDE is not an option because the client code must run on Windows and POSIX.

Thanks for all the suggestions...

Jeff
Wednesday, December 17, 2003

There isn't any burden to administering a simple file or db server. You turn it on and put it in a cupboard.

When you move office you take it out of the cupboard and take it with you.

Servers are as complicated as you want them to be, but basically they are just computers like any other.

Stephen Jones
Wednesday, December 17, 2003

Jeff,

Before you commit yourself to Berkeley DB, especially the Sleepycat version, check out the licensing requirements.  If you're distributing this app, even to a single customer, there's a very steep licensing fee.  If the budget is can support the licensing then I would recommend it. I've been very happy with my own use of their package.

Clay Dowling
Wednesday, December 17, 2003

Does the POSIX interface even provide a >workable< record locking interface to support this type of application?

I suspect the constraints of the client might turn out to be, lets say, impracticle (meaning they are not going to like to foot the bill for this). It is very cheap to write down "no server" and "POSIX client interface" and "concurrent users" on a requirements document, but it could mean that instead of looking at a 1 MM Access app, you could suddenly be facing 30 MM of custom development, not to forget a huge operational budget to keep all that custom code running.

Just me (Sir to you)
Wednesday, December 17, 2003

All modern UNIX systems have good file locking support.  It's just NFS that bites.

Clay Dowling
Wednesday, December 17, 2003

The recommendations for Codebase (simple and elegant), or C-Tree (complex but powerful) are probably the best solution to your problem.

I suggest Codebase. As Albert mentioned, the files are xbase compatible, and will work with freely available ODBC drivers etc. should you wish to integrate with spreadsheets or other tools at a later date.

Both products have source code available. Both offer the ability to share a database file, or have a server option for use in a true client-server model.

Why don't you want a server model? Given that you have a computer running on which the DATA must be stored, why not run the server on that computer as well? Servers such as Interbase/Firebird, Sybase ASA and others will allow a user on an older computer to work effectively on the same machine as the DBMS. Both are low footprint, and virtually zero administration.

HeWhoMustBeConfused
Wednesday, December 17, 2003

The problem with codebase/xbase is that each table is stored in a separate file. If you only have a few tables, that might be ok, but when you get lots of them, it becomes difficult to manage.

pdq
Wednesday, December 17, 2003

LOL!

Incoherent Abbreviator
Wednesday, December 17, 2003

I got a response back from the folks at Sleepycat. Berkeley DB will work just fine with multiple machines hitting a db on a shared drive using SMB (Windows) and probably using Apple Share (Mac OS X).

The reason I don't want to require a server is this application is targeted at a low-tech crowd who don't necessarily have access to a server machine. They have access to a file share; but not much more.

Most of the data will be kept in XML files (one file per "entry"); but in order to allow speedy queries (e.g. find all entries between these dates) I'm going to build b-tree indexes onto a record table containing the file names and some meta data.

These indexes can be rebuilt at any time; but I anticipate multiple users hitting the repository and hence updating and making queries against the indices.

The big draw back to Berkeley DB is the licensing cost. A full license for the version I'd like will set me back $150K. On the positive side, that means I get "free" updates for ever and never have to pay a distribution fee. Plus, I'm willing to bet I don't have to pay out anything until I ship the first version of the application.

On the other hand, c-tree from Faircom (which I've used before) costs ~$900 up front before I can even lay my hands on the code. I have to pay to get upgrades. And I have to pay up to $15K per year for distribution rights (granted that's a lot of copies of the application).

Of course, if I were willing to GPL my application, I could use Berkeley DB for free.

Jeff
Wednesday, December 17, 2003

"Of course, if I were willing to GPL my application, I could use Berkeley DB for free."

s/GPL/open source/

http://www.sleepycat.com/download/licensinginfo.shtml

me
Thursday, December 18, 2003

Does Berkley DB offer shared file access over a POSIX compliant interface? Every modern OS does offer POSIX interfaces (including Windows), but that does not mean applications restrict themselves to use these. If I look at POSIX record locking, it looks rather basic, certainly compared to the OS native file IO interfaces, and not >obviously< satisfactory for this type of application.
This is trailing quite far from my practical knowledge though.

Just me (Sir to you)
Thursday, December 18, 2003

Jeff,

I also choked on the licensing costs.  I wound up with the open source model, and what I sell is instead my service in setting up and administering the software.  This model really isn't too restrictive for my audience, and it sounds like for your either.  It does have a couple of advantages.

First, making the source code available is a nice perk for the customer.  If they're the sort of person who can build from the code, they're highly likely to choose a solution that offers source code anyway. A closed source solution won't appeal. If they can't build from source, or are disinclined, they can still buy the pre-compiled edition.

Second, you can do a good trade in selling service and support for your software.  Although I don't sell huge amounts of pre-written software, what I do sell is almost always the service of installing it.

Clay Dowling
Thursday, December 18, 2003

Berkeley DB uses native file locking for the target system. The bit about POSIX support is targeted at Unix-based systems. If a shared file system supports POSIX file locks, it will work fine. Of course, if you're building on Windows, the library will take advantage of Windows filesystem locking. Presumably the same is true for Mac OS X.

I need to think a bit about the licensing model before making any decisions. $150K is a big nut to swallow; even if I don't have to pay out until the product is complete. On the other hand, it represents only about 500 units at about $300 per unit. Granted, that means I make no profit on the first 500 units shipped...

I'll be following up with the Sleepycat folks to see what wiggle room they allow...

Jeff
Thursday, December 18, 2003

Jeff,

You can also try Sybase SQL Anywhere:
http://www.ianywhere.com/

MR
Thursday, December 18, 2003

I think SleepyCat has (or at least had) a licensing model based on a percentage of the price of your app.  THey are a small company and probably can work something out with you. 

Name withheld out of cowardice
Thursday, December 18, 2003

As a final follow-up: I've been corresponding with the very helpful people at Sleepycat; and it seems that multiple machines will NOT be able to modify a database stored on a file share.

That's a bummer; however, it does actually relieve my concern about licensing costs. I can develop a very light-weight server application (spawned by the first client that opens the database) to which all clients connect (discovered using ZeroConfig aka Rendezvous). This light-weight server can be open sourced, because it really does you no good without my client.

That means I don't have to pay for a license. However, it means that I have to put a little more thought into co-ordinating access to the database via the server process.

Thanks again everyone for your input. I've learned a lot about the embedded db market space. For example, I had no idea there were so many Java-based embedded SQL databases.

Jeff
Friday, December 19, 2003

Oh, one last thing, in case anyone thinks I'm hung up on Berkeley DB...

If I'm going to have to spawn a light-weight server, I might as well make it in Java and use one of the embedded Java SQL databases.

The rest of the application will use platform native APIs: C#/.Net on Windows and Cocoa/Objective-C on Mac OS X.

Jeff
Friday, December 19, 2003

*  Recent Topics

*  Fog Creek Home