Fog Creek Software
Discussion Board




Why RDBMS?

It seems that whenever a new project starts up, the automatic reaction to data storage/access needs is Oracle, or SQL Server, or MySQL.  There are alternatives -- ODBMS, ORDBMS, XMLDB, but these solutions seem to be rarely considered despite the fact that they may be a better fit, reduce development time, be faster or require fewer computing resources.

What keeps everyone going to the RDBMS?  Is it just habit, or are there other factors that keep pushing everyone to Oracle and SQL Server?

RDBMS Skeptic
Thursday, October 23, 2003

The burden of proof is always on a new technology.

So far, except for a niche markets, no advantage has been generally observed in the industry to XML databases, OO databases and various other new ideas.

On the other hand, RDBMSs are very mature and very well known, and even if alternatives were generally equivalent or slightly better (which, to date, 20 years later for OO, they still aren't), it would be hard to dethrone the RDBMS.

It would have to be something clearly significant.

And, for everyone who starts posting synthetic examples and argue about program-database model mismatch - the arguments are well known, and you can convince yourself, and possibly others, that in theory other things should be better. The industry's collective experience is, so far, that RDBMSs are superior as a general purpose solution -- and it is not for lack of attempts from OODBMS and XMLDB proponents to convince the industry otherwise.

Ori Berger
Thursday, October 23, 2003

Not to mention the huge number of supporting applications, components and databases for RDBMS solutions.

chris
Thursday, October 23, 2003

Look around at your staff and ask who has experience with Object oriented databases, or who has worked on a production grade XML DB. 

Sometimes its a matter of what you have experience with.  That may limit you in unforseen ways, but implementing a foreign technology is always a big risk.  Better to fight the enemy you know.

Lou
Thursday, October 23, 2003

Reporting, reporting, reporting - nothing beats a RDBMS for summing, grouping, filtering etc. for reporting purposes. 

If your app does not have any reporting needs then other storage methods would probably be OK.

DJ
Thursday, October 23, 2003

Don't forget too that your data will probably outlast your application. Having a RDBMS hold your data in a well thought out relational model makes it accessible not only by your application, but other applications (i.e. reporting) and creates a high degree of potential use for the future.

As others have said, DB skills are in wide use. DB systems are very mature/stable - something people like when committing their data to storage.

DB's have common interfaces that can be used by most languages and allow the potential for swapping out DB's with only some pain. I think the perception is that using an XML or OO db locks you into that DB and perhaps that language. Obviously there would be more universal interfaces if everyone adopted them.

m
Thursday, October 23, 2003

Visit DBDebunk.org to see all the arguments against OO/XML DBMS.  Remember though that no SQL DBMS is a fully Relational Database Management System – which means that you’re not a RDBMS skeptic but a SQL DBMS one.  Of course, you’re right to be skeptical of SQL-based products because they’re a flawed and incomplete implementation of a RDBMS.  However, the data model that they are based on (the Relational Model by Codd) has been formally proven to be a *general* data model which could, if the SQL vendors tried hard enough, store XML, Objects, etc. and would give you the logical data independence, data correctness, and other requirements of a RDBMS.

The problem is that since end-users don’t know the relational model they assume (incorrectly) that the current products’ limitations must be a limitation of the model and they end up using even worse products like XML and OO DBMS which have no formal data model that makes sense and worse, a model that can’t be shown to be correct! 

In short: the SQL DBMS we have now aren't perfect but they're the best we've got (until someone invents a Truly Relational DBMS and makes a bazillion dollars).

MR
Thursday, October 23, 2003

"What keeps everyone going to the RDBMS?  Is it just habit, or are there other factors that keep pushing everyone to Oracle and SQL Server?"

Well, for starters, there's also MySQL and PostgreSQL. Both are free (as in beer and speech), bundled with many Linux distributions, with extensive documentation, widespread use and have large communities of users. Especially the former.

Joe
http://www.joegrossberg.com

Joe Grossberg
Thursday, October 23, 2003

RDBMS is the devil you know.

PostgreSQL and MySQL vs Oracle, SQL Server et al is a different question. 

There's no doubt in my mind that database vendors charge way too money, so some competition from below will be good for customers.

Portabella
Thursday, October 23, 2003

MR -
"Of course, you’re right to be skeptical of SQL-based products because they’re a flawed and incomplete implementation of a RDBMS"

Care to amplify on this? Give specific examples?

Philo

Philo
Thursday, October 23, 2003

> Of course, you’re right to be skeptical of SQL-based products because they’re a flawed and incomplete implementation of a RDBMS

So where are all the really relational RDBMSes?

This has all the flavor of something like Sather: it's much better than C++, except that no one ever used it to do real work.

Portabella
Thursday, October 23, 2003

>>Care to amplify on this? Give specific examples?

>>So where are all the really relational RDBMSes?


Read the works of Codd and Date. This is the theoretical underpinnings of relation theory (Codd is the inventor of the relational database model).

Current relational databases don't impelment everything that is necessary to have a 'full' relational model.

Does this mean that they aren't useful? No.
Would having a 'fully relational' database be helpful? probably not.

There is a big difference between what's useful in the market and what's complete in theory.

RocketJeff
Thursday, October 23, 2003

> Read the works of Codd and Date

Yes, yes. I know very well who they are, and I've taken relational DB courses in school.

The point is that "SQL is not fully relational" is utterly useless to me as a developer. It may well be true, but so what?

Portabella
Thursday, October 23, 2003

my point exactly.

RocketJeff
Thursday, October 23, 2003

I'm with Phillo, give us examples. Or otherwise I'll state: "Why everyone is so over enthusiastic about lager? Sure where are other kinds of bear! When we about to bbq everyone goes for lager not even considering stout, draught, ale or shandy".

Plumber
Thursday, October 23, 2003

directory based data (LDAP/File) is pretty popular too, not in all cases but sometimes

the artist formerly known as prince
Thursday, October 23, 2003

The other big win RDBMS has is standards.

With a good knowledge of SQL, I am able to move from SQL Server through MySQL to Oracle with very little effort. 

Ged Byrne
Thursday, October 23, 2003

"Why does everyone choose lager?"

Well, I've brewed both lagers and ales, and I can tell you that while lagers are a pain in the butt, there's something hard to resist about that clean, crisp flavor. Even the very dark and heavy lagers like Dunnkel or Bock have a certain magic about their taste that sets them apart.

As far as I'm concerned, it's a pain in the tail, because I can turn out a new batch of ale every two weeks, as opposed to every two months with even the lightest lagers.  But even with the financial rewards of producing ales, it's hard to argue with the taste or the market.

Clay Dowling
Thursday, October 23, 2003

I'd say that the foundation of RDBMS is a strongly mathematical model continues to be a critical component of its success. Other database systems are based on single or multiple 'cool' features, but don't have a foundation in theoretical mathematics and thus fail to achieve either the inherent simplicity or the scalability complexity of Codd's relational calculus.

Dustin Alexander
Thursday, October 23, 2003

That last line should read 'scalable complexity'. Sorry. No coffee yet this morning.

Dustin Alexander
Thursday, October 23, 2003

For most application, a properly configured RDBMS would work better than LDAP.

You can easily map an LDAP database onto an RDBMS table, and while you can usually a table back to LDAP, the LDAP query language is inferior to in what it can provide - it's ability to join various trees/tables is insufficient for most nontrivial tasks.

There are other features which come with most LDAP servers, such as replication, delegation, and others, that make them attractive for specific uses (e.g., directories and addressbooks). This is not inherent to LDAP vs. RDBMS, but rather to specific implementations. It's possible to get the same features from almost any RDBMS these days.

Ori Berger
Thursday, October 23, 2003

Dustin: That sounds plausible, but is probably not the reason for RDBMS' prevailing success; The same can be said about, e.g., functional programming vs. OOP.

Yet, OOP is immensely more popular than functional programming.

Ori Berger
Thursday, October 23, 2003

The discrepancy that Codd/Date point out between a "true RDBMS" and a "SQL DMBS" can be best summed up in the denormalized field issue. I am sure we are all aware of the whole debate of having a fully-normalized structure vs. the "real-world" argument, in which we need de-normalized fields to speed up some complex queries and such.
Well, Date/Codd argue that the fact that you need to denormalize because of speed issues is NOT due to a weakness in the relational model -- they argue that it is due to a weakness in the *implementation* of the model in the existing RDBMS applications (Oracle, MySQL, etc.).
Another example they cite is the use if indexes. In a "truly" relational database program (a hypothetical beast of course), you wouldn't even need indexes -- because this is another "add-on" to the relational model that is required to get a sufficient amount of speed in the current implementations.
I don't quite understand *how* they think this can be accomplished, but there is a nauseatingly large amount of articles on their DBDebunk.org site that talk about this.

Jordan Lev
Thursday, October 23, 2003

I think the two models are hard to compare. One [RDBMS] is for the storing and modeling of data and statistics, mathematically sound, measurable, discrete variables. The other [OOP] is for modeling active systems of continuous variables and procedures.

At this level of comparison, procedural programming is no more mathematically sound than object oriented programming, as they are neither of them based on strong mathematical principle. instead programming is based on a more functional set of principles. Procedural programming is drawn from the electrical principles of a system and step by step logic inherent is such principles. OOP is based on a more common sense, ease of programming principle. I think neither are especially strong in a foundational sense.

I agree with you, however, that the foundations of RDBMS in Codd's work is not the only, nor the largest factor in its longevity. But I would argue that without it, the other systems would be faring much better and the lifespan of SQL would have been significantly reduced. Mathematics is timeless, because it provides a way of looking at the base components of a system, and mathematical foundations contribute this timelessness to SQL. We'll be using differential calculus for much longer than we use object oriented programming (which is a perception based process, not a system of analysis).

Dustin Alexander
Thursday, October 23, 2003

Their point is blindingly simple.

The Relational Database Model is a way of arranging data that is perfectly in sync with the real nature of that data - that is why you can only decide what is the best way of setting up your tables when you know exactly what your data is and how you use it.

A Query language, or a database engine, are ways of using that model. They will bring their own imperfections - indeed for much of its early life the RDBM was unusable because the hardware couldn't take it, which is why you had flat or hierarchial databases.

However this aspect of the implementaiton, the hardware, improved, and so by the end of the seventies relational databases took off. The fact that for much of their early life they were unusable was an implementation question, not a design question.

So you don't compromise the model because the implementation hasn't caught up yet.

Stephen Jones
Thursday, October 23, 2003

Knocking SQL based products is trivial if you know the Relational model... Teaching everyone the relational model here is significantly outside of the message board scope.

>Would having a 'fully relational' database be helpful? probably not.

Of course it would. 

Fully relational DBMSs can easily and simply handle hierarchical data which SQL-based products have a significant amount of trouble dealing with.  Current methods ‘CONNECT BY .. PRIOR’ etc. are unintuitive, non-standard, and not as flexible as what the RDBMS is capable of performing.

SQL currently has more than one way of achieving the same result.  This is, in of itself, not a bad thing.  But most query optimizers have ‘gotchas’ that result in one logically equivalent query performing significantly worse than another.  The fault doesn’t entirely lie in the hands of SQL, but a less ambiguous query language might be more ‘optimizer friendly’.

I make my living programming complex Oracle 9i SQL all day, so certainly I’m not suggesting that SQL products are horrible and we should throw them all out (this is clear in my post, yet knee-jerk reactions are suggesting that).  But, once you realize the true power of the RDBMS you understand what dog food we’re being peddled because we don’t know any better. 

MR
Thursday, October 23, 2003

>better fit, reduce development time, be faster or require fewer computing resources

Skeptic, call Microsoft to have them break microsoft word's 5 trillion little features into unix commands.

And then use those tools to write an article.

Each of these commands would use much fewer cycles or memory, each would run faster.

But they won't reduce your time reading man pages.

Having to learn, use, and manage all kinds of separate databases is no fun--probably why people stick to RDBMS and common access methods like ADO or JDO.

Li-fan Chen
Thursday, October 23, 2003

> denormalize because of speed issues is NOT due to a weakness in the relational model

Well, it’s not.  If you’re seeing a performance hit it’s due to the fact that SQL-based products have a one-to-one relationship between logical (table) rows and physical (on disk) rows.  More or less a given row for a particular table is stored in a contiguous stripe on disk.  When you have a fully normalized database you (generally) end up with many, many logical tables and joining them together often results in a lot of disk I/O. 

The relational model requires no specific ordering or grouping of tables on disk – that’s an implementation detail.  An example of how DBMS vendors are band-aiding this is by adding ‘materialized’ views or whatnot.  Basically they store the results of the joined tables on disk.  Other DBMS vendors have things like ‘clustered indexes’ which order rows on disk to avoid sorts.  The Sybase IQ product, since it is a data warehouse only sort of beast, shows just how important breaking this ‘one row’ mentality is by storing rows per column: it achieves a significant performance (and disk space!!) advantage for aggregate queries over row-based products on identical data.  The TRDBMS should be free of any such row-based methods.  You can imagine how complex on-disk structures could be but you can also imagine how much of a performance benefit you can achieve, too.

There are more practical benefits to table normalization, of course.  Normalized tables eliminate update, insert, delete anomalies (and the associated maintenance scripts) which we’ve all run into with de-normalized systems.

>Another example they cite is the use if indexes.

I’ve read a lot of that site and I don’t ever recall seeing something where they are advocating that no DBMS should have indexes.  Could you cite that URL please?  I suspect, though, is that they object to the end-user (e.g. programmer) being exposed to them through things like optimizer hints and the like.  The DBMS engine should be smart enough to determine what index you need – the current implementations are sometimes ‘dumb’ and you have to push them in the right direction.  Optimizer hints are one of those things that will eventually fade with time as optimizers get better and better (those of us who worked on the early DBMS can attest to the horribly ignorant optimizers we had to work with!!)

MR
Thursday, October 23, 2003

What kind of beers are draught and shandy?

son of parnas
Thursday, October 23, 2003

All I know is when I was visiting friends in Dublin I was exposed to the shandy.  From what I gathered, it's a mix of 3/4 beer and 1/4 soft-drink syrup or lemonade.

It tasted like cherry cough-medicine.

Draught is a UK version of the USA “draft” aka from a keg.

MR
Thursday, October 23, 2003

The shandy is a crime against all beer kind!

son of parnas
Thursday, October 23, 2003

I guess you started programming before you knew about a RDMBS and you found yourself writing all this code to store your data and then gosh and golly you saw this thing labelled a RDBMS and suddenly you could rely on it and it having been around for quite some time and tried and tested and how easy it is to import/export data from a RDBMS ..... I assume  you were wondering why someone would use a RDMBS over some home-grown flat file system as opposed to perhaps some new cutting edge tool for which then the answer is obvious.

Me
Thursday, October 23, 2003

"At this level of comparison, procedural programming is no more mathematically sound than object oriented programming, as they are neither of them based on strong mathematical principle..."

Good objects have a strong mathematical theory behind them: the theory of abstract data types.  These are in turn built on category theory and the theory of partial functions.  Read material from Hoare and Liskov.

abu
Thursday, October 23, 2003

There are mathematical theories behind Heirarchical, Object, and Network databases as well.  In different scenariios different methods will give better results.

Simon Lucy
Thursday, October 23, 2003

MR:
http://www.inconcept.com/JCM/May2002/pascal.html
is a link to a good article on the site. Upon re-examination, the comment about indexes was a by-product of some new kind of DBMS they are talking about -- not an integral feature in a "true RDBMS".

This article actually is a good summary of the whole Relational-Theory-versus-Physical-Implementation debate.

Jordan Lev
Thursday, October 23, 2003

Simon, can you give me a scenario where a hierarchical database is better, excluding performance limitations?

Companies spent a fortune in the late 70's getting rid of them because they wanted to be able to query their data in minutes instead of weeks.

Stephen Jones
Thursday, October 23, 2003

Shandy is just for children.

Matthew Lock
Thursday, October 23, 2003

>Simon, can you give me a scenario where a hierarchical
>database is better, excluding performance limitations?

geographical information, for instance
(hirarchy of country,city,neighborhoud,street) - if you have found a city object, then you have all lower level objects and saved a SELECT or two,

Michael Moser
Friday, October 24, 2003

Largely the benefits of a heirarchical database are where you want to get data back out in the same way as its put in and likely in only that one way.

So one way trees are reasonable, such as straightforward address databases, or dictionaries (but not etymologies).  XML databases could be considered a form of heirarchical database.

Network databases (where a record can be a member of more than one set), were far more useful and give processing like:

for (ffm("orders");!eos("orders");fnm("orders")
    {
    scm_owner("order","orderlines");
    for (ffm("orderlines");!eos("orderlines");fnm("orderlines")
        {
          // process order lines
        }
    // process end of order
    }

ffm, find first member of set
fnm, find next member of set
eos, return true if end of set
scm_owner, set the current member of set1 to be the owner of set2

Which is much closer to the procedural nature of the language.  Its true that you can get much the same kind of processing with a relational database so long as you can use records and you treat relations as sets.

Simon Lucy
Friday, October 24, 2003

Generally, Shandy is 50/50 Lager or Beer, and Lemonade.

A "Top" is 85/15 Lager or Beer, and Lemonade.

75/25 would be a strong shandy, or a large top....!

Tom
Friday, October 24, 2003

It's a bit misleading to say that the advantage of the relational database is that it has a "mathematical model". An RDBMS is (or should be) a system for stating axioms in the forms of constraints, etc., and storing data: the beauty of it is that each piece of data within the relation is a proposition which must be true within the system. In other words, a true relational database simply does not allow you to enter data that violates the business rules used to set up the database. [And now I'm wondering if there's some way to drag Goedel's Theorem into this.]

Even if you're turned off by Fabian Pascal's ranting at dbdebunk.com, I recommend Date & Darwen's "Third Manifesto"; it's a very interesting outline of a true relational database system, including the conceptual integration of objects and subtyping into the relational system. Essentially, they argue that current "object-relational" databases map objects to the wrong place; object classes should correspond to domains, which are by and large vestigial in current RDBMS's.

Chris Hoess
Friday, October 24, 2003

Fabian Pascal is _still_ ranting?

Some things just never change.

Simon Lucy
Saturday, October 25, 2003

And the fact that some things never change is precisely what Pascal is ranting about.

Stephen Jones
Saturday, October 25, 2003

It uses relationship as bridge between tables, and tables have column based on relationship model of set theory.

piyush
Wednesday, June 23, 2004

Is this a relevant question to ask?


Which of the RDBMS is most likely to provide the user with the most information?

Didi Asamoa
Tuesday, July 06, 2004

*  Recent Topics

*  Fog Creek Home