Fog Creek Software
Discussion Board




Why not just admit there is an impedance mismatch?

All the work to solve this problem so far have been doing the same thing. "Yes, there is an impedance mismatch, now, let's match'em".  The attitude is the same as "Yes, perpetual machine doesn't exist, now, let's make one". No wonder all of them have failed. I am sorry to see Joel fell into this too.

The mismatch between OO and relational is deep in the philosophy. They are different way to view the world. You can not "map" one to the other. Just like you can not "map" a plane to a truck. Even you chop off all the wings and toll the plane onto the highway, it still can not behave like a truck.

An OO layer on top of relational model is the leakiest abstraction I've ever seen. No, it's not leaky, it a sieve! Wonder why Microsoft doesn't have entity bean, not even stateful session bean? Because MS understands this while those Java folks don't.

Why can't we just accept the fact that there is an impedance mismatch? Stop trying to put one on top of the other. Let them stand side by side.

Siyan
Wednesday, March 31, 2004

OK, I do admit there's an impedance mismatch, so admission is not the problem. You're asking me to accept the impedance mismatch, live with it forever, never quite resolving the fact that every day I spend too much time writing code to convert a row from a recordset (query result) into an object and vice versa. Mindless code like this:

If IsNull(rs!field) Then
      bFieldIsNull = True
Else
      bFieldIsNull = False
      theField = rs("field').Value
End If

and on and on and on. Just to take that one tiny example, give me a programming language where any scalar can optionally be nullable, and I'm instantly more productive. Give me a language where the very act of typing a SELECT statement or a CREATE TABLE statement immediately creates corresponding entity classes, with data types that exactly correspond -- even for Date/Time types -- so I don't have to write and wire entity classes any more. Give me a language where a method defined on a class can be used in a select statement, so I can say "select ... from ... where foo()", where foo is a method defined in the class. Give me a language where there's no difference between an event and a trigger. I could go on for a while here.

Joel Spolsky
Fog Creek Software
Wednesday, March 31, 2004

Joel, start an open source project to do all that.  ;)

michael sica (michaelsica.com)
Wednesday, March 31, 2004

Well, admit it doesn't necessarily mean you have to live with it. I just say this problem can not be fixed in syntax, nor by put another layer on top of it.

There are still a lot of things we can do. First of all, I trust most applications do not really need a relational database. What they need is something between a file and an RDBMS. It dosn't have to be relational. It doesn't need SQL. It doesn't have to support ACID etc. Therefor, it can be built into a language and be very convient to use.

For those do need an RDBMS, the bigest problem, in my experience, is not looping through a result set. The problems is most developers can not think in set, but think a table as a spreadsheet, like those who can not understand pointer. And most people do not have time to digest a complecated schema. I haven't met a developer who isn't happy after I gave him a stored procedure. So, my believe is if we can just invent something a bit better than good old stored procedure, life would be just fine on this side, too.

Siyan
Wednesday, March 31, 2004

I still say Apple's EOF gives you most of what you're asking for, or something close to it.

Allowing scalars to be null sounds like a boxing/unboxing issue, and certainly a reasonble request.  This should be supported at the language level.

As for:

"Give me a language where the very act of typing a SELECT statement or a CREATE TABLE statement immediately creates corresponding entity classes, with data types that exactly correspond -- even for Date/Time types -- so I don't have to write and wire entity classes any more."

Where does this class definition then go?  How do you add custom methods to it, or subclass it?  Where does your business logic go?

Having a model file of some kind that specifies what gets mapped to what gives you much more flexibiltiy and much better encapsulation.  A good tool (like EOModeler) can even create both the SQL for defining the database schema and simultaneously generate the code for your model objects from the same file.  You can then add customer business logic to your generated classes to your heart's content.

I guess if you never have any business logic in your objects, your approach could work.

Agreed that conversion from database to programming language Date/Time types should be transparent and seamless.

"select ... from ... where foo()"

EOF has something like this for in memory filtering.  See:

http://developer.apple.com/documentation/WebObjects/Reference/API/com/webobjects/eocontrol/EOQualifier.html

(scroll to "Using Selectors in Qualifier Expressions")

If you want to send this to a database, you're running head long into the impedance wall.  How does Oracle know about your foo() method, how to evaluate it, etc.?  At that point, don't you have an OO database, not really a relational one anymore?

Jim Rankin
Wednesday, March 31, 2004

Offtopic: Relevant to the Oracle folks only...isn't PL/SQL designed to accomodate some of this type mismatch? I mean you can't get all the libraries of the .Net world, but it does a bit api wise right?

Li-fan Chen
Wednesday, March 31, 2004

We (well, I...) solve this problem by generating code.  You define your entities, attributes, relationships and I take your XML  and generate DDL, stored procedures, entity classes and an object model that sits on top. (I partly based my design on Joel's document on entity classes, except you only need to do by hand what the tool doesn't already support, like the funky join between Products, Orders, ProductOrders and the foreign Invoices table)

It's part of a bigger product that also generates a presentation layer, but that's as much as I can tell you right now...

Oli
Wednesday, March 31, 2004

In the Ada world, there is SAMeDL (SQL Ada Module Description Language) which is supposed to do most of what Joel want. I doubt that you can get away from preprosessing the source any time soon.

I believe SAMeDL was developed by SEI at Carnegie-Mellon under a contract with the DoD.

greetings,

Tarjei T. Jensen
Thursday, April 01, 2004

Joel: "I could go on for a while here.".

Please do, I think it'd be an interesting article.

Of course, I have a material interest in this, as my company has a long term project to address some of these issues, so anything you'd care to share would be useful.

Steve Jones (UK)
Thursday, April 01, 2004

Progress 4GL (http://www.progress.com) has everything Joel is asking for: every scalar type is nullable, "select" and "create" produce objects, native transactions control support, including local variables roll-backs etc.

The language has been around two decades now. The rest of it is much more like Visual Basic.

But:
-it can be executed only by the Progress runtime library (and you have to pay for every license), in addition of runtime 20MB installation
-it has the ugliest GUI support
-it's business systems oriented not general programming language

But it's great for BL. I wish somebody took the db access concept from there and merged it with C\C++\C# type language.

Vlad Gudim
Thursday, April 01, 2004

Currently available OO languages are not build for multiuser work with large data amounts. So these languages do not support:

- transactions
- declarative conditions
- way to customize mapping fields to storage device (Tables can be treated conceptually as low level storage devices)

so imperative OO languages have to low level level to map them to RDMBS

But there are some usefull nice-tries:

http://www.cetus-links.org/oo_db_systems_3.html

hibernate ,for example, are widely used by java developers, and even have ported to C#

Max Belugin (http://belugin.newmail.ru)
Thursday, April 01, 2004

Joel, wouldn't Cold Fusion or CFML count as that?

It creates structures based on one Tag, and it treats Nulls as empty strings.

I'm not trying to pimp for the language, but it seems to do the job you'd like, although it's only aimed at Web Applications

John R. Troy
Thursday, April 01, 2004

>Progress 4GL (http://www.progress.com) has everything >Joel is asking for

Except: "select ... from ... where foo()"
The function gets evaluated once, the 1st result gets sent to the server.

It  does have a good database/language match..
But it's only possibele because of:
1-row level locking/not set based
2-a really small set of datatypes.. (no DATETIME ,chars max 32k etc..)

still programming in 4gl
Thursday, April 01, 2004

"Let them stand side by side"?  What the hell does that mean?

Microsoft doesn't have the equivalent of an Entity Bean because, to paraphrase, they have admitted it is impossible to use object-oriented languages with relational databases?

If the underlying theme of this post is that people tend to use relational databases for everything, even when it is innappropriate, I have to agree.  Reality though, is that people will continue to use relational databases in almost all situations so there is value in finding ways to make it easier to program to.with them.

name withheld out of cowardice
Thursday, April 01, 2004

I have to agree with Siyan that at the core there are two different ways of thinking.

SQL is a set based functional programming language.  The fundamental unit is a "result SET".  These are great at dealing with Data.  Application programming languages are item based declarative languages.  At the heart, they deal with individual objects, windows, datum etc.  These are great at dealing with UI.

So, each language/approach is great it it's respective area, which gives you alot of productivity in those areas, but you pay for it at the meshing point.

That being said, I totally agree with Joel that the libraries are hideous.  I stopped using MFC recordsets because the raw API was just easier.  Using ADO results in a ton of code being generated (in VB, much less C++)  to get anything.

And on a final point, while there are definately cases relational DBs are overkill, I have run into quite a few projects that rolled their own filestructure only to end up duplicating much of the flexibility that DB's offer.  Given the choice between an "impedance mismatch" with a DB and re-implementing even 10% of what that DB offers me, I will deal with the mismatch every time.

Tito
Thursday, April 01, 2004

A few things need to make clear:

1. You can not map a class to a table

Because you can not put your hands on a table. In a relational database, only RDBMS have access to tables . Applications can only access result sets. Even you do a "select * from table", you still do not "get" the table. Because at the time you see your result set, the table might have already changed. Unless, of cause, you put a permanant lock on the table. I heard Bill Gates locked a Tiffany to serve his wife only on their wedding day. But that's not the way database should work.

2. You can not map an object to a row

A relational table is not like a spreadsheet consists of rows and columns. The truth is a table consists of rows and a row consists of columns. Rows in a table has no order, there is no such thing as next row or previous row. Order only exists in your result set.

Because of these reasons, all thoses O-R mapping tools/mathods/technologies are building cathedral on sand. The cathedral collapse simply because data is updated by other processes -- which is relational database designed for.

Also, the most important part in a relational database is relation. Relation is why you pay Larry Elison for the product. OO system has no concept of relation. When you do an O-R mapping, relations are lost.

When I say side by side, I mean you can only access a relational database. You can not map to it, you can not abstract-away it and you should not use a relational database for persistance.

For object persistance, I would seriously suggest you to use a file. It will be under your 100% control, it will save you from most of those troubles Joel mentioned and it could be a few hundred times faster.

The only reason that Sun promotes Entity Bean and Stateful Session Bean is, I think, they don't have an RDBMS product.

Siyan
Thursday, April 01, 2004

>1. You can not map a class to a table

Class is not mapped to a table - class is mapped to the table definition. Collection is mapped to the table.

You can get copy of collection to client.

Your argumends do not deal with with object-relational mismatch - they just deal with local-distributed mismatch. Imagine object oriented distributed technology...

----- quote --------------
>2. You can not map an object to a row

>A relational table is not like a spreadsheet consists of rows and columns. The truth is a table consists of rows and a row consists of columns. >Rows in a table has no order, there is no such thing as next row or previous row. Order only exists in your result set.
}----- quote end --------------

Look at Java collections design  there are collections with order or not. This is can be clearly mapped to relational databases.

{{{
Because of these reasons, all thoses O-R mapping tools/mathods/technologies are building cathedral on sand. The cathedral collapse simply because data is updated by other processes -- which is relational database designed for.
}}}

Lets think about databases like about version conrol systtems - we get copy of object, do work with it and trying to merge it back to the main branch.

{{{
Also, the most important part in a relational database is relation. Relation is why you pay Larry Elison for the product. OO system has no concept of relation. When you do an O-R mapping, relations are lost.
}}}

OO provide higher level of abstraction so it can be clearly mapped if we can do declarative OO.

{{{
When I say side by side, I mean you can only access a relational database. You can not map to it, you can not abstract-away it and you should not use a relational database for persistance.
}}}

some peole realy do it and like it.

{{{
For object persistance, I would seriously suggest you to use a file. It will be under your 100% control, it will save you from most of those troubles Joel mentioned and it could be a few hundred times faster.
}}}

flat file does not work well with concurrent access and large data fetches persistence is not the issue. The issue is concurrent access to large amount of data.

{{{
The only reason that Sun promotes Entity Bean and Stateful Session Bean is, I think, they don't have an RDBMS product.
}}}

:) why Oracle bought TOPLonk?

Max Belugin (http://belugin.newmail.ru)
Thursday, April 01, 2004

Siyan, give the Enterprise Objects Framework (EOF) a try.  It addresses much of this so-called impedance mismatch very elegantly.

Chris Hanson
Thursday, April 01, 2004

> Look at Java collections design  there are collections with order or not. This is can be clearly mapped to relational databases.

In some cases, yes.  But it does not always make sense to map an object to a table.  The desired data at the application level may be the result of joining several tables (e.g. a person with an address).  A single piece of data might also be the result of multiple queries (e.g. a person and a list of the movies they've rented out). 

From what I can see, Joel's complaints primarily come from the fact that VB just isn't that great (wow, I managed to say that politely!) for handling tasks like this.  Joel probably wants to write code like this:

"Given a SQL query, return a class with properties and getters/setters appropriately typed"

Now, that's really not so bad, and in a significant number of languages it, or an equivalent, would be fairly simple.  But Joel is using VB, which simply isn't up to the task, and thus for each sql query he must type out an entire class with properties and getters/setters appropriately typed. 

Bottom line: this impedance mismatch doesn't exist when you have a language powerful enough to express the conversion you're currently performing by hand.

Matt
Thursday, April 01, 2004

"But it does not always make sense to map an object to a table.  The desired data at the application level may be the result of joining several tables (e.g. a person with an address)."

EOF uses faults for this.  It knows that a Person instance has a reference to an Address instance, and that this corresponds to the relationship between a foreign key on the PERSON table and the primary key of the ADDRESS table.  When you ask the Person for it's address, EOF checks to see if it has already fetched the Address from the database, and goes ahead and fetches it if not.

The "stand in" for the Address object before it's fetched is called a fault in EOF.

Jim Rankin
Thursday, April 01, 2004

Thanks Chris and others. This EOF thing is really interesting. I definately need to take a closer look at it.

Siyan
Thursday, April 01, 2004

{{{
In some cases, yes.  But it does not always make sense to map an object to a table.  The desired data at the application level may be the result of joining several tables (e.g. a person with an address). 
}}}

"person with an address"  is a person object with address property. Can be implemented for example by person object proxy with lazy loading.

{{{
A single piece of data might also be the result of multiple queries (e.g. a person and a list of the movies they've rented out). 
}}}

Conceptually it is a person object with Movies peroperty can be implemented by the collection
of Movies with lazy loading.


{{{
From what I can see, Joel's complaints primarily come from the fact that VB just isn't that great (wow, I managed to say that politely!) for handling tasks like this.  Joel probably wants to write code like this:

"Given a SQL query, return a class with properties and getters/setters appropriately typed"
}}}

This is VB impedance but not OO. Or even VB implementation impedance - you can write these proxies using C++ or Delphi and IDispatch dynamic facilities and use it by VB.

{{{
Bottom line: this impedance mismatch doesn't exist when you have a language powerful enough to express the conversion you're currently performing by hand.
}}}

Yes but there is no such languages now (as i know) existing solutions use compromisses like minilanguages in string literals (like JDO query filter strings)  or dynamic construction of meta-expression consisting of object (like TOPLink for Smalltalk does). In each case declarative-imperative intergation has seams

Max Belugin (http://belugin.newmail.ru)
Friday, April 02, 2004

>> EOF uses faults for this.  It knows that a Person instance has a reference to an Address instance, and that this corresponds to the relationship between a foreign key on the PERSON table and the primary key of the ADDRESS table.  When you ask the Person for it's address, EOF checks to see if it has already fetched the Address from the database, and goes ahead and fetches it if not.

That doesn't sound like the kind of decision I'd want a library automatically making for me.  What if I loop through 1000 people?  Are you saying it's going to make 1000 queries as I request each of their addresses?  What if I don't want their entire address, but just the zip code?  Will this library waste CPU and memory grabbing data I don't need by always assuming I want every column/property in a table/object?

Matt
Friday, April 02, 2004

Well, Xen (see the link in the blog entry) seems like a step in the direction everyone here seems to be converging on. Select statements, etc... although the automatic entity classes aren't quite there, I think that much ought to be ok, since it seems like it will be one step closer. Is there something I'm missing as to why this is insufficient?

Eric Astor
Friday, April 02, 2004

{{{
That doesn't sound like the kind of decision I'd want a library automatically making for me.  What if I loop through 1000 people?  Are you saying it's going to make 1000 queries as I request each of their addresses?  What if I don't want their entire address, but just the zip code?  Will this library waste CPU and memory grabbing data I don't need by always assuming I want every column/property in a table/object?
}}}

Matt, your questions are newbie questions for this area. Please, look at existing implementations freely available documentation (AFAIR you can obtain TOPLink disk from Oracle for free).

The answers are "lazy loading" and "proxy pattern".

Sure you can not achieve the same efficiency as with SQL made by hands (as you can not achieve assembler efficiency with compiler generated code), but there are a lots of solutions available at present

Max Belugin (http://belugin.newmail.ru)
Saturday, April 03, 2004

I think this is what Joel want:

http://www.sleepycat.com

Siyan
Monday, April 05, 2004

My favorite OO Database was a little application called PSE Pro that I used with C++.  It was great for creating pure OO applications and having a place to store things.  Even came with a little schema generator.

The big brother to that was a databse called Objectstore.  Not sure if its still going, but it was powerfull in its day.

The great thing about it is that when you accessed an object that wasnt transient, it was retrieved from the Db in the time it took to process a memory fault - or slightly longer depending on the location of the Db.

There is none of this having to write queries, or translating data from some useless normalised form that RDBMs use.

It was a great system and its a shame that the average joe developer is still incapable of getting their heads around what OO is all about.  The reason I feel that OO DBs kind of died out.

I really hate RDBMS systems - I find them glorified garbage collection systems at best.  Put a terabyte of data in and you suddenly have to pay bucket loads of money to employ some technical dudes to understand the resultant mess just so you can get the data back out again.

Humans have evolved because they are really good at copying things.  Unfortunately, they copy the bad as well as the good.  Once something is rapidly being copied 'group thinking' mentality will take over.  Doesnt matter how sucky the thing is, the masses will believe its the best thing since sliced bread.  I believe that RDBMS systems is one of those sucky things and something that should have been shot at birth :)

Andy Watson
Thursday, April 08, 2004

There is another thread on this at Lambda the Ultimate.
http://lambda.weblogs.com/discuss/msgReader$11790

I am not in academia but spent the last 10 years doing software development (created AppleScript, Allegis, Net-It Software, Writer's Solution). I am currently working on the  impedance mismatch problem. I would be curious if anyone has comments on this paper:

Safe Query Objects: Statically-Typed Objects as Remotely-Executable Queries, with Siddhartha Rai. Submitted to OOPSLA 2004.
http://www.cs.utexas.edu/users/wcook/Drafts/CookRaiOOPSLA04.pdf

Here are some comments I posted to the group:

Impedance mismatch is interesting because there is no shortage of solutions, yet very little understanding of what the problem actually is. In the paper mentioned above we present a new anaylsis of impedance mismatch.

I tend to think of the relational model as "assembly language of data". One key similarity is that data does not have an inherent interpretation, but is only interpreted by they operations applied to it. For example, a word of memory is interpretated as an integer or a float by an assembly instruction, not by its inherent type; similarly, values in a table are interpreted as keys/foreign keys during a join operation, but the semantics of the relationship is not stored in the database -- it is only in the query. As a result, I prefer to use the Entity-Relationship (ER) Model, since it assigns more meaning to the data in the database.

In either case, I don't think that typing problems are the central issue in impedance mismatch. As many have pointed out, most languages can represent lists of tuples, and NULL data values are not really a fundamental problem for programming languages. In addition, the ER model maps very naturally to object-oriented languages.

As for the idea that "memory is so cheap that we can just load everything and use normal iteration" -- I don't buy it:

Iterating through millions of records to find the few that are needed is not efficient; indexes can still help a lot.
Straightforward queries often involve nested iteration (think a million squared); query optimization with indexes can reorder these operations to be very efficient.
Transaction models require isolation: as a result every object a transaction touches must be added into the transactions working set -- a linear scan of all objects touches everything.
So, what is the fundamental problem? The desire for a uniform (object-oriented) semantic model while still supporting the efficiency of query optimization. This is why none of the object-relational/persistent object systems have been fully successful: they are conceptual clean but don't provide a way to leverage the power of database engines for query optimization. The alternative, ODBC and its descendants, let you use the power of the database but introduce a complete semantic disconnect between the programming language and the query language. Even the advanced "query objects" in systems like Hibernate and JDO are really no better than SQL strings, since they are still constructed and verified at runtime.

Our paper is a first step toward a new approach to solving this problem. If anyone has comments, I'd love to hear them.

William Cook
Monday, April 12, 2004

I may be late to this discussion, but what about Visual FoxPro? This is an OO language with a native data engine and SLQ command support.

I used it up until a couple of years ago (Version 6) and I understand that it is still developed, mainly for those one or two large comanies that still use it.

Roy
Tuesday, April 13, 2004

For code like this:

If IsNull(rs!field) Then
      bFieldIsNull = True
Else
      bFieldIsNull = False
      theField = rs("field').Value
End If

Why don't you use VB's IIf() function:

theField = IIf(IsNull(rs!Field), "", rs("Field").Value)

...unless you really want to know if the field is null.

Tom
Tuesday, May 18, 2004

*  Recent Topics

*  Fog Creek Home