Fog Creek Software
Discussion Board




Why would one need an XML database?

I have been researching databases and database design for a college course - basically to list the pros and cons of ODBMS and RDBMS.

In my research, I came across XML databases such as http://www.bluestream.com

Why would anyone want an XML database?

Puzzle Ment
Sunday, September 07, 2003

Why would someone want a database of user logins?  If your app needs user login data, have a database of user login data.  If your app needs to deal with xml data, have an xml database. 

Joe Pants
Sunday, September 07, 2003

"If your app needs to deal with xml data, have an xml database"

once again proving the value of the advice you get often has a strong correlation with the price you paid for it.


Sunday, September 07, 2003

Hmm, XML data - what exactly is that? I know that user login data has certain attributes and that it follows some form, but xml data does not. It's mostly useful as an interchange format, so I see little need to maintain a database in xml.  Anybody care to name a use for such an animal?

Chris Cooney
Sunday, September 07, 2003

XML is really useful for data interchange and for storing settings in little config files, but I don't see any reason as to why you would want to store data as XML itself.

John Rosenberg
Sunday, September 07, 2003

<quote>
Why would anyone want an XML database?
</quote>

1. To look good on a resume.

Mmm, that's about it really.

Seeya

Matthew
Sunday, September 07, 2003

There is no good reason for an XML database.


Sunday, September 07, 2003

At the very least to keep these guys in business.

http://www.x-hive.com/

fool for python
Monday, September 08, 2003

...or these guys?

http://www.intel.com

Sam Livingston-Gray
Monday, September 08, 2003

You can generate XML from a reasonable database (or middleware component). XML does not make a good native format for a database: it would have to be parsed into a tree of some kind, and if you do that you might as well store it properly.

Simon Lucy
Monday, September 08, 2003

Where have they mentioned the part about XML having a cure for cancer? :-)

Prakash S
Monday, September 08, 2003

"Why would anyone want an XML database?"

The main reason you might want one is because a vendor convinces you it is another part of the XML panacea.

However, it is difficult to represent hierarchical data (e.g. LDAP) cleanly using relations (though it has been done; IBM have some papers on how they did it in DB2), whereas XML models hierarchical data very naturally.

This assumes by "XML database" you mean something with a native hierarchical query language (probably not standardised). How the data is stored internally isn't very relevant.

A lot of the disadvantages of an "XML database" are the same as for OODBs - no standard query language, no strict mathematical basis (lack of closure makes query optimisation, composition of operators etc. harder).
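
To make the point about hierarchical data concrete, here is a minimal Python sketch that asks the same question ("what sits below this node?") of a relational adjacency-list table and of an XML tree. The directory entries, table name and column names are all invented for illustration, and a recursive query is only one of several ways to handle trees in SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational version: a hypothetical adjacency-list table where each row
# points at its parent row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),
        name TEXT
    );
    INSERT INTO node VALUES (1, NULL, 'example');
    INSERT INTO node VALUES (2, 1, 'people');
    INSERT INTO node VALUES (3, 1, 'groups');
    INSERT INTO node VALUES (4, 2, 'alice');
    INSERT INTO node VALUES (5, 2, 'bob');
""")

def descendants_sql(conn, root_name):
    """Walk the tree with a recursive common table expression."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE name = ?
            UNION ALL
            SELECT n.id, n.name, s.depth + 1
            FROM node n JOIN subtree s ON n.parent_id = s.id
        )
        SELECT name FROM subtree WHERE depth > 0 ORDER BY name
        """,
        (root_name,),
    ).fetchall()
    return [r[0] for r in rows]

# XML version: the nesting of the elements is the hierarchy.
TREE = ET.fromstring("<example><people><alice/><bob/></people><groups/></example>")

def descendants_xml(root, name):
    """Find the named element and list every element nested inside it."""
    start = root if root.tag == name else root.find(f".//{name}")
    return sorted(e.tag for e in start.iter() if e is not start)

print(descendants_sql(conn, "people"))
print(descendants_xml(TREE, "people"))
```

Both functions return the same names; the difference is that the relational version has to reconstruct the tree from parent pointers at query time, while the XML version just walks the nesting.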

MugsGame
Monday, September 08, 2003

There is a native query language for XML databases: it's called XQuery, and it is a W3C standard.

The main reason for xml databases is that pseudo-technical managers can justify their existence by spending $80K on them (heck, they even buy three for clustering). If you're a small company and you stand close when there's that kind of stupidity throwing around that much money, then you just might get hit by some.

However, I have to admit that the xml database that we are being forced to use doesn't actually seem that bad. It's not as fast as MS SQL Server, but it's ok. It doesn't seem to scale well and eats up memory like there's no tomorrow, but that's nothing that can't be solved by throwing hardware at it. Just ask a Java coder :).

We often sit around and try to figure out how an xml database works and we always come to the conclusion that it's really just an xml layer on top of a relational database. Nothing else makes sense. If anyone knows different, please feel free to explain.

RB
Monday, September 08, 2003

There is a very good discussion of XML and databases on these pages:

http://www.rpbourret.com/xml/XMLAndDatabases.htm
http://www.rpbourret.com/xml/XMLDatabaseProds.htm

I think this is an excellent site ... the author has also created an XML-to-DBMS mapping package (Java and Perl) that looks interesting.

Mike S.
Monday, September 08, 2003

I get amused when people think XML makes a good database for a hierarchical data domain.  Before relational toy databases came out we had such things as hierarchical databases and network databases.

Using set theory you could navigate hierarchical relations to your heart's content and manage the data in a schema which rather more closely mirrored the document reality than the relational table.

Simon Lucy
Monday, September 08, 2003

XML is a mark-up language - so in some ways it is no different from HTML.

HTML focuses on describing presentation; XML focuses on describing data.

For XML to be of value, standards must be accepted. Just as HTML, as standardised by the W3C, is understood by all browsers, an XML format must be agreed between sender and receiver.

Q)Why am I mentioning all this?

A)Just a trip back to the basics.

So now how does one store an XML document that may look like the below?

<name>Hideo Nomo</name>
<occupation>Baseball Supremo</occupation>

One would need to store the above in a highly normalized database.

Imagine a fact table and a couple of other joined tables containing the relevant info.

To store the above XML document, a simple script can be run to strip the XML tags and place the relevant info in the correct tables.

Performing queries will then be no different from running standard SQL queries.
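
A rough sketch of such a script in Python, using only the standard library. The <person> wrapper element, the table names and the column names are assumptions made for illustration (the two elements above need a single root to be a well-formed document), and a real schema would depend on the data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The two elements from the post, wrapped in a hypothetical <person> root
# so that the fragment parses as a well-formed document.
DOC = """<person>
  <name>Hideo Nomo</name>
  <occupation>Baseball Supremo</occupation>
</person>"""

def make_db():
    """Create an in-memory database with two joined tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE occupation (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
        CREATE TABLE person (
            id INTEGER PRIMARY KEY,
            name TEXT,
            occupation_id INTEGER REFERENCES occupation(id)
        );
    """)
    return conn

def shred(conn, xml_text):
    """Strip the tags and place the values in the matching tables."""
    root = ET.fromstring(xml_text)
    name = root.findtext("name")
    title = root.findtext("occupation")
    cur = conn.cursor()
    # Each occupation is stored once and referenced by id from person.
    cur.execute("INSERT OR IGNORE INTO occupation (title) VALUES (?)", (title,))
    cur.execute("SELECT id FROM occupation WHERE title = ?", (title,))
    occupation_id = cur.fetchone()[0]
    cur.execute(
        "INSERT INTO person (name, occupation_id) VALUES (?, ?)",
        (name, occupation_id),
    )
    conn.commit()

def people_with_occupation(conn, title):
    """An ordinary SQL join; nothing XML-specific is left at query time."""
    rows = conn.execute(
        "SELECT p.name FROM person p "
        "JOIN occupation o ON o.id = p.occupation_id "
        "WHERE o.title = ? ORDER BY p.name",
        (title,),
    ).fetchall()
    return [r[0] for r in rows]

conn = make_db()
shred(conn, DOC)
print(people_with_occupation(conn, "Baseball Supremo"))
```

Note that this only works because the script already knows which elements to expect; the document's nesting is discarded once the rows are written.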

XML databases per se would be something like the above - if it works well.

My guess is that a lot of the XML database vendors were chasing venture funding back in the boom days, hence the buzzword-heavy descriptions.

The same thing comes with Content Management and XML. One can use XML to describe certain text - hence it becomes somewhat of an object and can be inserted in multiple places. But you can't use it as a method - pun intended.

Because that is it - it just allows multiple placements and obviously makes editing easy. And in essence it becomes like RSS - another site can grab the data between the tags knowing what it is.

When it comes to buzz words - I often think of the "Emperor with no clothes" type scenario.

At the end of the day, without a bull market, all these buzzwords will amount to hot air if they are not explained for what they are. Same thing with web services.

Ram Dass
Monday, September 08, 2003

What about those systems where the data you are dealing with is by its very nature document-centric?  Sure, the vast majority of data warehousing doesn't fit a document-centric setup, but what about those that DO?  Saying that slow, useless XML is only useful on a resume sounds a lot like the people who claimed the same thing about slow, useless Java not too long ago.  And now I'm jealous of all those Java programmers who have jobs.

J.F.
Monday, September 08, 2003

...and there is a query language, XPath.  Those who use it frequently appreciate it.  And I don't have to worry about one XPath library having a significantly different syntax than another, but I do have to worry about variations on SQL.
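
For anyone who hasn't seen it, a small sketch of what such a query looks like, using Python's standard xml.etree.ElementTree module (which implements only a limited subset of XPath). The <people> document is made up for illustration, reusing the name from the example earlier in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the second person is invented filler data.
DOC = """<people>
  <person id="1"><name>Hideo Nomo</name><occupation>Baseball Supremo</occupation></person>
  <person id="2"><name>A. N. Other</name><occupation>Accountant</occupation></person>
</people>"""

root = ET.fromstring(DOC)

# Path with a predicate: names of every person whose <occupation> child
# has exactly this text.
names = [
    e.text
    for e in root.findall("./person[occupation='Baseball Supremo']/name")
]

# Attribute predicate: every <person> at any depth that carries an id.
ids = [e.get("id") for e in root.findall(".//person[@id]")]

print(names)
print(ids)
```

The path expressions themselves are the portable part; a fuller XPath implementation should accept the same strings.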

J.F.
Monday, September 08, 2003

J.F.

How would a company store documents in a data warehouse?

It depends - this is a really big topic. See www.dwreview.com - by the way, a link on dwreview is how I came to know about CityDesk etc.

Back to the subject: when you have reams of data - say MS Word files - there will generally be a standard for capturing data about them, i.e. metadata.

Storing the data can be done as my earlier post describes.

One can use XML to tag the data or not - but an XML database is not needed. What is more important is a robust database engine :)

Ram Dass
Monday, September 08, 2003

The trouble with hierarchical databases is that they become a nightmare to maintain or change.

Relational databases came into existence because they accurately reflected real life relationships (in fact as Date says, you can't start to draw up a relational database until you know exactly what the reality it is reflecting is). They may seem more artificial than a tree structure, but that is because hierarchies, although nice to look at, don't actually reflect reality very well.

One of the main disadvantages of XML is that people think it is an alternative to the relational model, when it is simply a useful format for interchanging data found in web pages.

Unless you have a clear grasp of the relationships in both the document you import from, and the one you are importing to, you are going to find XML merely adds to the confusion.

Date argues that CSV works just as well. Of course he's never tried to import to a French or Spanish database : )

The principle however is correct. If you have the correct data structures in both places, then the actual exchange format is trivial. Time-saving, convenient yes - but not fundamentally affecting the structure.

Stephen Jones
Monday, September 08, 2003

I accept the traditional rigid schema problem of hierarchical databases; that isn't so true now, and I might argue about the aptness of hierarchies and networks for mapping some domains, but anyway...

There is one fundamental difference between a flat transport file such as CSV and a structured one such as XML.  In a flat transport the structure has to be 'known' outside of the data.  To a large degree XML carries its schema along with it,  although this really only changes the translation problem from one of location (3rd field is the Order number), to knowing which node corresponds to your idea of the same entity.
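
To make the difference concrete, here is a minimal Python sketch. The order record, the field order and the element names are all invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same hypothetical order in both formats.
CSV_TEXT = "Hideo Nomo,3,1047\n"
XML_TEXT = (
    "<order>"
    "<customer>Hideo Nomo</customer>"
    "<quantity>3</quantity>"
    "<number>1047</number>"
    "</order>"
)

# CSV: the receiver must know, from outside the file, that the 3rd field
# is the order number.
ORDER_NUMBER_FIELD = 2  # zero-based index of the 3rd field

def order_number_from_csv(text):
    row = next(csv.reader(io.StringIO(text)))
    return row[ORDER_NUMBER_FIELD]

# XML: the names travel with the data, but the receiver still has to decide
# which of the sender's nodes matches its own idea of each entity.
SENDER_TO_LOCAL = {"number": "order_no", "customer": "customer_name"}

def record_from_xml(text):
    root = ET.fromstring(text)
    return {
        local: root.findtext(sender)
        for sender, local in SENDER_TO_LOCAL.items()
    }

print(order_number_from_csv(CSV_TEXT))
print(record_from_xml(XML_TEXT)["order_no"])
```

In the CSV case the knowledge lives in ORDER_NUMBER_FIELD; in the XML case it lives in SENDER_TO_LOCAL. Either way something outside the file has to supply it.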

Simon Lucy
Monday, September 08, 2003

I'll admit it...I have no idea what Simon Lucy just said.

Pedantic
Tuesday, September 09, 2003

Yeah, I agree that XML is better than CSV. You can transfer an XML file and understand the underlying layout. However, as you said, you still have to know how to map it to the data schema you are importing to.

The danger with XML is that, because it carries structural information with it, the ill-informed think that they can ignore the last twenty-five years of study of database design.

Stephen Jones
Tuesday, September 09, 2003

Indeed. Actually I was just reminded that the UN/EDIFACT EDI standard had/has a repository that allows you to send your published kind of node (a Purchase Order line, say), and have it spit out someone else's also published, but different, node for the same kind of entity.

DTDs could be used to achieve the same kind of thing if anyone actually used them properly and there was a central repository of them.

Simon Lucy
Tuesday, September 09, 2003

Could be a buzzword company trying to raise VC funding ;)

Ram Dass
Thursday, September 11, 2003
