Fog Creek Software
Discussion Board




The Database Database

The recent thread "Don't Define the Interface in the Database" made me think of a database design pattern I observed a coworker using in many of his projects.  He defined the database in the database.

The basic concept is to use just a few tables that are designed to represent as many arbitrary tables as necessary.  In its simplest manifestation, you have a "table" table, a "table column" table, and a "value" table, with a "row id" field in the "value" table.  The "row id" can be broken out into a separate table for referential integrity purposes if desired.

This simplistic approach results in all data from any "table" being stored in a single "value" field, which means there isn't any type checking, and all data elements should be serializable to the datatype of this field, usually a string.  A fuller-featured version of this model would store the datatype in the "table column" table, and perhaps have different "value" tables for each datatype.

This approach came in handy when we had a client who insisted on minimal physical schema changes on a project with constantly changing requirements.  Once we set up this "database within a database", we were able to add new "tables" and "fields" without having to make any more physical schema changes.  Of course we paid for this dearly in performance when we reconstituted "tables" full of data for reporting purposes.

As you can probably imagine, this approach also became a nightmare to maintain, since some crucial information such as datatype had been hardwired into code.  But when done properly, it allowed great flexibility, such as for a survey engine that allowed the definition of multiple surveys (similar to tables) with different types of questions (similar to columns with different datatypes) with different answers (the "value" table).

What has been your experience with this database design pattern?  What scenarios did you feel justified its use, and where did it fall apart?  Has anyone gone so far as to implement their own referential integrity?

ODN
Thursday, February 06, 2003

This is very similar to the "Graphical" database construct.  (It's called graphical because of its use in things like CAD.)


Instead of real columns, you use rows in another table to represent the columns and the rows.  In your example, this was extended to tables as well.

In its minimalist form you get a table like this:
RowID   
ColumnID   
Value

This sort of layout is very good for unstructured data.  (Think user-defined columns)

I had the choice of using this method in one of my projects but instead decided to just serialize any unstructured data into a blob field.  Better performance, but you can't query on that data.
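The serialize-to-a-blob alternative looks something like this sketch (JSON in a text column, my own naming; the original could have used any serialization format):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, extra TEXT)")

# Pack all the user-defined fields into one serialized column
extra = {"colour": "red", "weight_kg": 1.5}
conn.execute("INSERT INTO item VALUES (1, 'widget', ?)", (json.dumps(extra),))

# Reading it back is a single fetch plus a deserialize...
blob = conn.execute("SELECT extra FROM item WHERE id = 1").fetchone()[0]
fields = json.loads(blob)
print(fields["colour"])  # red

# ...but there is no way to ask SQL "WHERE colour = 'red'" without
# scanning and deserializing every row -- hence "can't query on that data".
```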

Wayne Venables
Thursday, February 06, 2003

Being the author of the 'Don't Define the Interface in the Database' thread, you can probably guess my viewpoint.  The whole idea is wrong.  How can you justify something like this?  You say the client requested it and that it allowed flexibility.  Hmm...  The brilliant programmer who schemed up the database I'm working on started to implement something like this for use with the reports.  So not only would he have had the interface in the database, but he would also have had this rather convoluted database definition within a database, simply to add some tables that defined reports.

Why don't you simply use something like DAO along with a simple utility that allows you to add a table (with fields) to the database (or, for that matter, create the whole database)?  That's all the functionality you have implemented anyway, except you have done so with such a convoluted scheme that it seems to me it would be almost unmanageable.

Dave B.
Thursday, February 06, 2003

Adjusting actual database columns dynamically in software is a hell too.  Possibly less hell than what is described above.

It's all about the right tool for the right job.  It doesn't sound like the developers on that project really saved themselves anything.  They still had to add columns and tables, but they had to go through a level of indirection to do it.  Why they did it, I don't know.

Wayne Venables
Thursday, February 06, 2003

I tend to agree with you, Dave.  For the most part, the way my coworker used "the database database" felt very wrongheaded to me.  Especially because he didn't realize that he was actually building a database within a database until I told him that's what he was doing.  Once we had a name for it, he delighted in suggesting it every chance he got, whether or not it was appropriate.  You give a man a hammer....

But it got me to thinking, since I felt there was still a kernel of usefulness there, if only I could find a scenario that justified it.  I believe that using the technique entirely as a substitute for making schema changes is a foolish approach to take.  You are always better off implementing the table directly if possible.

But there are scenarios such as the generic survey engine I mentioned where there is a need for this kind of fill-in-the-user-defined-template flexibility.  This is more of a special-purpose use of the database database pattern that is obviously tailored to fit the needs of a survey engine.  I think it is rare that the database database pattern is useful and justified in its pure form, but I'm guessing there are many uses for it when modified slightly to fit a specific need.

ODN
Thursday, February 06, 2003

I did write up a little utility that converts the structure and content of actual tables into the database database structure, and can reconstitute them back into real tables as needed.  I was working toward creating a data access layer that allows you to work with tables that are in the database database just like you work with regular tables, with the ability to change whether the table was virtual or real without impacting your code.  This way, if you were making frequent schema changes in the database database, you could periodically transition them out into real tables, which would minimize the frequency of physical schema changes, plus allow you to eventually regain the performance that was lost.  But unless someone can think of a justifiable scenario for this, I don't think it's worth my while to pursue it.
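A sketch of what such a reconstitution utility might look like, assuming "table column" and "value" tables named meta_column and meta_value (my names, not the original utility's; types are collapsed to TEXT for brevity):

```python
import sqlite3

def reconstitute(conn, table_id, new_name):
    """Pivot a virtual table out of the database database into a real one."""
    # Fetch the virtual column definitions in a stable order
    cols = conn.execute(
        "SELECT column_id, name FROM meta_column "
        "WHERE table_id = ? ORDER BY column_id", (table_id,)).fetchall()

    # Build and create the real table (everything TEXT in this sketch)
    col_defs = ", ".join(f"{name} TEXT" for _, name in cols)
    conn.execute(f"CREATE TABLE {new_name} (row_id INTEGER, {col_defs})")

    # Gather the scattered value rows into one dict per virtual row
    rows = {}
    for row_id, column_id, value in conn.execute(
            "SELECT row_id, column_id, value FROM meta_value "
            "WHERE table_id = ?", (table_id,)):
        rows.setdefault(row_id, {})[column_id] = value

    # Emit one real row per virtual row, NULL where a value is missing
    placeholders = ", ".join("?" for _ in cols)
    for row_id, vals in rows.items():
        conn.execute(
            f"INSERT INTO {new_name} VALUES (?, {placeholders})",
            [row_id] + [vals.get(cid) for cid, _ in cols])
```

Going the other direction (real table into the database database) is the same pivot in reverse, plus recording the column names and types in the metadata tables.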

ODN
Thursday, February 06, 2003

Is there any real reason to continue using this database database structure?

If I were you I'd just use your utility to convert the structure to normal tables and be done with it.

Frequent schema changes are not that difficult to do without this structure.

Wayne Venables
Friday, February 07, 2003

This is coming out of my ass, but I think the Arsdigita Community System (now a RedHat open source CRM system) at one time contemplated putting all table fields into one single field in one single table for simplicity reasons--I don't know if they ever followed through. For some reason this was something Oracle had no problem with, and if you had "great programmers" it was somehow a good idea.

If you use SQL Server 2000 you would start to worry about performance, because a scan of what used to be an ordinary table now has to cover 10 times more rows to get the job done. You can mitigate this by using partitioned tables that are joined back together with a view. This would improve performance for backups/restores/adds/deletes/searches.

But on the other hand, if 99% of your fields are varchars anyway and your domain of business is playing with text and more text--such a table makes some sense I guess.

Li-fan Chen
Friday, February 07, 2003

It is called the mother of all data models(TM).

David Hay has it in his data model patterns.

I have used it with good effect under appropriate circumstances. Mine has a few tables more than Hay's 7. I also distinguish between descriptive, qualitative, and quantitative observations (the values in your model); then units come into it as well. And inheritance.

An IEEE Software article last year featured this, by one of the OMT (Rumbaugh) guys - they reported about a 30% performance hit.

Karel
Friday, February 07, 2003

metadata ~ data about data

The furthest I pushed this was a schema design that defined fields, variables, and constraints for a UK Personal Tax Database, because in any one tax year there would be variation in the data stored, the constraints, and the types.

It also had the capacity to attach code to the Tax rules. 

So you ended up with a table of definitions for all tax years and for a particular tax year for a particular client you generated a real table with the correct fields and constraints.

Simon Lucy
Friday, February 07, 2003

My current development takes the approach being discussed here to add user fields to items.  The actual data for these fields gets stored twice:

(1) all serialized in a column of the item that has the extra fields
(2) in an index table where each field/value combination will be in a row.

Storing twice? This is a bit of an overhead isn't it?

Yes, but in this system 99% of operations will be reads and queries and only 1% writes, so this approach is taken to ensure that reads and queries are performed quickly; the hit on database size and write time is acceptable.

It is not possible to dynamically add columns to tables (for other reasons too complicated to explain here).

Works a treat... but creates a big database!
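The dual-storage scheme described above can be sketched like this (my own table and column names, with JSON standing in for whatever serialization the real system uses):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, fields TEXT);
CREATE TABLE item_index (item_id INTEGER, field TEXT, value TEXT);
CREATE INDEX ix_field_value ON item_index (field, value);
""")

def save(conn, item_id, fields):
    # The write path pays twice: (1) the serialized blob...
    conn.execute("INSERT OR REPLACE INTO item VALUES (?, ?)",
                 (item_id, json.dumps(fields)))
    # ...and (2) one indexed row per field/value combination
    conn.execute("DELETE FROM item_index WHERE item_id = ?", (item_id,))
    conn.executemany("INSERT INTO item_index VALUES (?, ?, ?)",
                     [(item_id, k, str(v)) for k, v in fields.items()])

save(conn, 1, {"colour": "red", "size": "XL"})
save(conn, 2, {"colour": "blue"})

# Queries hit the index table; loading a whole item hits the single blob
hits = [r[0] for r in conn.execute(
    "SELECT item_id FROM item_index WHERE field = 'colour' AND value = 'red'")]
print(hits)  # [1]
```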

Gwyn
Friday, February 07, 2003

I consistently use this model in my database. It works great!

Let me rephrase that a little. The "real data model" for the application uses traditional (real) tables. In most applications, users want to define custom fields and other "defined on the fly by the end user" entities that cannot, by definition, be hard-coded into the schema by the developer during development. The DatabaseWithinADatabase pattern/model works quite well for allowing the end user to define this stuff on the fly.

I would *not*, however, base the core data model on this pattern.

Anonymous Coward
Friday, February 07, 2003

Goodness sakes, this is a disaster.

How in the world can you do a "select sum(sales) from dailysales" if all of your numeric values are stored internally as strings?
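Concretely, every aggregate ends up needing an explicit cast over the string-typed value column, something like this sketch (SQLite, invented column ids; note that most engines will silently coerce unparseable strings rather than raise an error):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE val (row_id INTEGER, column_id INTEGER, value TEXT)")

# Three "sales" figures for virtual column 7, all stored as strings
conn.executemany("INSERT INTO val VALUES (?, 7, ?)",
                 [(1, '10.5'), (2, '20.0'), (3, '4.5')])

# "select sum(sales) from dailysales" becomes this cast-laden query
total = conn.execute(
    "SELECT SUM(CAST(value AS REAL)) FROM val WHERE column_id = 7"
).fetchone()[0]
print(total)  # 35.0
```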

Benji Smith
Friday, February 07, 2003

It's worth bearing in mind that most databases use the "database database" approach anyway.

Eg, in a SQL Server database you have:
  * sysobjects: defining tables amongst other things, 
  * syscolumns: er, your columns...,
  * sysreferences: relationships/constraints
etc

So in fact what you are doing is creating a "database-database-database" :-)
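SQLite exposes the same idea, which makes the point easy to demonstrate (sqlite_master and PRAGMA table_info being its rough analogs of sysobjects and syscolumns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

# The catalog is itself queryable as ordinary tables:
# CREATE TABLE populated these metadata rows automatically
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
cols = [r[1] for r in conn.execute("PRAGMA table_info(customer)")]
print(tables, cols)  # ['customer'] ['id', 'name']
```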

Duncan Smart
Friday, February 07, 2003

>> It's worth bearing in mind that most databases use the "database database" approach anyway.
>> Eg, in a SQL Server database you have:
(snip)
>> So in fact what you are doing is creating a "database-database-database" :-)

Yeah, but if you use the vendor's approach to metadata you at least have access to a SQL engine, DB drivers, and programming logic that can work with that metadata!

This scheme seems like a distinct step backwards, the kind of thing that a junior developer feels like a genius for pulling off due to its sheer complexity, and which becomes an albatross around some other poor schmuck's neck when it comes time for maintenance.

Sometimes I read such enthusiasm for hand-rolling one's own engine to substitute for the vendor's existing system, or for creating some other weird little trinket, that I wonder what some people have to demonstrate to prove that they did their work...

Anonymous Coward
Friday, February 07, 2003

This sounds somewhat like the Sentences database: http://www.lazysoft.com/technology.htm

Instead of using a separate, unique table for every different type of data, Lazy Software's Associative Model of Data uses a single, generic structure to contain all types of data. Information about the logical structure of the data and the rules that govern it are stored alongside the data in the database.

However, see also: http://www.pgro.uk7.net/lazyness_1.htm !

Robert Cowham
Saturday, February 08, 2003

>'It's worth beraing in mind that most databases use the "database database" approach anyway. '

But the difference is that the data in their "database database" is mostly automatically maintained, instead of having to be manipulated by programming gymnastics.  When you do a CREATE TABLE, the column definitions automatically get populated into the relevant metadata tables - neither you nor the DB maker has to worry about it any more.

The "database database" may make sense if the complexities of that approach are built once, hidden behind a simplified interface, and require little or no future maintenance.  However, that definitely does not seem to be the case with the original poster.  To modify, debug, or extend that application, you have to know the insane mechanisms of how to read and write the details of the "database database".

T. Norman
Saturday, February 08, 2003

"So in fact what you are doing is creating a "database-database-database" :-)"... to clarify - I *don't* think that it's a good idea - just pointing out that databases themselves use this technique - so doing it *again* on top of them seems slightly perverse.

Anyway, I have seen in a couple of products that when they need the database to be flexible, they simply go ahead and create the required tables directly in the db. For example, MS Commerce Server 2000 and later (I'm *not* a fan of this product though...) does this when you define your own products in its product catalog. I'm sure I remember MS Content Management Server does something similar.

So why not have your system do this?  You don't need to give your app's login DBO privileges; just allow it to create tables in its own schema (GRANT CREATE TABLE TO fred).  This will allow your app to do:
  CREATE TABLE fred.FredsTable(ID int IDENTITY PRIMARY KEY, Stuff varchar(50))

and:
  ALTER TABLE fred.FredsTable ADD MoreStuff varchar(50)

and even:
  DROP TABLE fred.FredsTable

without being able to affect dbo, or any other tables in that database.  I.e., you can effectively "sandbox" what your app's login can do, above and beyond simply allowing it to SELECT/INSERT...

Just an idea.

Duncan Smart
Monday, February 10, 2003

Robert -

You reference an item called "On (Inelectual) Lazyness" (sic).

I make that two errors in three words in the title alone.

A quick scan of the text reveals that it is a typical DD piece. It spends more time attacking other people's credibility and demanding that they define their terms than it does in genuine criticism.

Thats the way the money goes
Monday, February 10, 2003

I have seen it done when they wanted "DBMS independence". I still think it cost more than it paid for.
Other applications that actually benefitted from the flexibility (heaps of schemas, all overlapping - think "interfaces" in OO - with few rows per table) would have been far simpler skipping the relational model altogether.

Just me (Sir to you)
Tuesday, February 11, 2003
