Fog Creek Software
Discussion Board




impedance mismatch: The glue you're missing.

You'll want to refer to ObjectSpaces:
http://longhorn.msdn.microsoft.com/lhsdk/ndp/daconoverviewofobjectspacesarchitecture.aspx

In more human terms:
http://www.15seconds.com/Issue/040112.htm

Matt
Thursday, March 25, 2004

I don't want glue. There's a lot of glue. I want languages that are designed so you don't need glue.

Joel Spolsky
Fog Creek Software
Thursday, March 25, 2004

If the database is powerful enough you can push
program code into the query language, and then
use a much smaller external layer to do the interface.

This can mean less glue for GUI apps and very little
glue at all for server apps. I'm thinking of something
like Kdb or Oracle JDeveloper. What are your thoughts
on this?

PS
I think you can map records to C structs using stuff like
Pro*C. What are your thoughts on that?

Ali
Thursday, March 25, 2004

Joel, type out an imaginary example of what you would like so we can see it.

Matthew Lock
Thursday, March 25, 2004

Joel says something to the effect of "you could probably do it in Lisp", which is true:
http://schematics.sourceforge.net/schemeunit-schemeql.ps

You can also do it in Smalltalk.  See http://www.smalltalksolutions.com/schedule2004.htm#Bryant1 and http://www.cincomsmalltalk.com/userblogs/avi/blogView?searchCategory=databases for more info.

Avi Bryant
Thursday, March 25, 2004

Joel, your redirect code for links in comments doesn't preserve the #anchor part.  People trying to visit the Smalltalk Solutions link above will have to copy and paste.

Avi Bryant
Thursday, March 25, 2004

Actually, if you really want a language that has no impedance mismatch with relational databases, you need a logic language like Prolog, or something ORMish.  No glue, just "facts".

Of course, such languages usually aren't great for doing anything *else*...

Phillip J. Eby
Thursday, March 25, 2004

Don't know if it's still current, but there used to be a lot of work at http://www.abdn.ac.uk/csd on the use of Prolog as an interface to RDBMS. From what I recall, there was no need to formulate 'queries' as such; persistent data was treated in exactly the same way as local data. I seem to remember that they built a large scale protein structure / pharmaceutical (?) database using the technology.

David Roper
Friday, March 26, 2004

Joel,
Objectspaces really is what you want . . . if you'd like, drop me an e-mail and I can share some code that transparently persists your business objects.

Furthermore, ObjectSpaces allows generation of a database against your existing object model (via one of their samples) . . . you'll truly want to invest in this technology.

Matt
Friday, March 26, 2004

"Joel,

Objectspaces really is what you want . . . if you'd like, drop me an e-mail and I can share some code that transparently persists your business objects."

Business objects... am I the only one who finds these terms highly amusing?

Leonardo Herrera
Friday, March 26, 2004

Oh, and this:

reader = os.GetObjectReader(New  _
    ObjectQuery(GetType(Customer), _
    "Company = 'Litware, Inc.'", Nothing))

looks just like glue to me. Of course, I can be wrong.

Leonardo Herrera
Friday, March 26, 2004

*Cough* Foxpro, xbase++, dbase *cough*

Fuel Guzzler
Friday, March 26, 2004

Joel's comments about impedance mismatches between data access and languages miss a couple of things. Firstly, the integration between SQL and PL/1 is perfect: the syntax for the SQL SELECT clause fitted perfectly into PL/1, since they shared the same mindset/designers. The problem with NULLs was handled by having a PL/1 bit variable to hold the true/false value along with the native variable that held the actual value. Personally I think the NULL was an invention of the devil and makes things more complex than is usually needed. Also, with PL/1 the SQL was actually embedded into the main PL/1 code, ripped out by a pre-processor, and compiled separately into a Plan. I could go on if anyone is really interested.

When I moved into the PC world I was surprised to see SQL being dynamically created and sent to the database to execute. In the above PL/1 case the SQL was compiled to provide faster access, this was done at the same time as the PL/1 was compiled.
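A minimal Python sketch of that contrast, using the standard sqlite3 module as a stand-in. This is not PL/1-style precompilation; it only shows the nearest library-level analogue, which is fixed statement text with bound parameters versus SQL assembled as a string at run time. The customer table and both function names are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, company TEXT)")
conn.executemany(
    "INSERT INTO customer (company) VALUES (?)",
    [("Litware, Inc.",), ("O'Brien Ltd",)],
)


def find_dynamic(company):
    # SQL text is assembled at run time, so the database sees a new
    # statement for every distinct value, and quoting is the caller's problem.
    sql = "SELECT id FROM customer WHERE company = '" + company + "'"
    return conn.execute(sql).fetchall()


def find_parameterized(company):
    # The statement text is fixed; only the bound value changes.
    return conn.execute(
        "SELECT id FROM customer WHERE company = ?", (company,)
    ).fetchall()
```

With these definitions, find_parameterized("O'Brien Ltd") finds the row, while find_dynamic("O'Brien Ltd") fails with a syntax error because of the unescaped quote.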

I think that most of the ways that modern languages access databases are flawed, and I would love to see the PL/1 method reinstated. I think the reason we don't see this (although I believe ORACLE supports it) is the principle of keeping a language small and offloading all the work to a library, with no hardwired I/O in the language itself. This was the mentality behind C and, it seems, most languages that have followed it.

WhatTimeIsItEccles
Friday, March 26, 2004

"
reader = os.GetObjectReader(New  _
    ObjectQuery(GetType(Customer), _
    "Company = 'Litware, Inc.'", Nothing))

looks just like glue to me. Of course, I can be wrong.
"

In my implementation using ObjectSpaces, I have the above line written once. Object usage is literally: BizObjCollection.Create() or BizObjCollection.Retrieve() or BizObject.Update() or BizObjCollection.Delete(BizObj) . . . all filter parameters are based upon the object properties. You don't see a single connection, SQL query or .Open() anywhere within my object library, although they're persisted as intrinsically as my mind can conceive.
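To make the "written once" idea concrete, here is a rough Python sketch using sqlite3. It is not ObjectSpaces and does not reproduce its API; the BizObjCollection class below, its method names, and the customer table are all hypothetical. It only shows the shape: SQL lives in one class, and callers filter by property name.

```python
import sqlite3


class BizObjCollection:
    """Hypothetical collection; all SQL for one table lives here and nowhere else."""

    def __init__(self, conn, table, columns):
        self.conn = conn
        self.table = table
        self.columns = tuple(columns)

    def _check(self, props):
        # Only known property names may end up in generated SQL text.
        unknown = sorted(set(props) - set(self.columns))
        if unknown:
            raise ValueError(f"unknown properties: {unknown}")

    def _where(self, filters):
        self._check(filters)
        clause = " AND ".join(f"{name} = ?" for name in filters)
        return (" WHERE " + clause if clause else ""), list(filters.values())

    def create(self, **props):
        self._check(props)
        names = ", ".join(props)
        marks = ", ".join("?" for _ in props)
        cur = self.conn.execute(
            f"INSERT INTO {self.table} ({names}) VALUES ({marks})",
            list(props.values()),
        )
        return cur.lastrowid

    def retrieve(self, **filters):
        where, params = self._where(filters)
        cols = ", ".join(self.columns)
        cur = self.conn.execute(f"SELECT {cols} FROM {self.table}{where}", params)
        return [dict(zip(self.columns, row)) for row in cur]

    def update(self, key, **props):
        # Assumes the table has an "id" primary key column.
        self._check(props)
        sets = ", ".join(f"{name} = ?" for name in props)
        self.conn.execute(
            f"UPDATE {self.table} SET {sets} WHERE id = ?",
            [*props.values(), key],
        )

    def delete(self, **filters):
        where, params = self._where(filters)
        self.conn.execute(f"DELETE FROM {self.table}{where}", params)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, company TEXT, city TEXT)"
)
customers = BizObjCollection(conn, "customer", ["id", "company", "city"])
customers.create(company="Litware, Inc.", city="Seattle")
customers.create(company="Contoso", city="Redmond")
litware = customers.retrieve(company="Litware, Inc.")
```

The calling code at the bottom contains no SQL, which is the point being made; whether the class itself still counts as glue is exactly what the thread is arguing about.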

I lub it. When MBF arrives in this decade (which is heavily based upon ObjectSpaces), you'll lub it too.

Matt
Friday, March 26, 2004

Well, since we're talking about object impedance (ence?) to databases and XML structures, here's my attempt at crossing the XML/object boundary for Cocoa/Objective C:

http://homepage.mac.com/jimbokun/Excelsior.html

No, it's (obviously) not part of the language (Objective C), but reading the linked paper on XML#/Xen/whatever, I'm a little nervous about that whole idea.  Dealing with things that are kind of objects and simultaneously kind of XML structures, adding complexity to the language, changing language semantics in ways that may not be immediately obvious or transparent...(* shudder *).  Next thing you know, you end up with C++

Even taking Joel's example of string classes, String is still a class in a library (java.lang) in Java.  The only thing I know that's really "special" about it is the overloading exception for the "+" operator.

Jim Rankin
Friday, March 26, 2004

OK, couple more things, then I'll stop for a while:

1. Doesn't this have leaky abstraction written all over it?  The object model is an abstraction and the XML model is a different abstraction.  Where does one end and the other begin?  Isn't it better to have clear boundaries between your leaky abstractions so you know where to look when something breaks?

2. It seems to have been a generally poor idea to push object orientation onto databases.  The relational model is still going strong, and the market hasn't strongly favored the OO db vendors out there.  So why is it such a good idea to push relational abstractions into your object-oriented programming language?

Back to XML, who knows what subtle bugs can creep in when compilers need to switch from being an OO language compiler to an XML compiler mid-file, then back again...oops, I promised I would stop.

Jim Rankin
Friday, March 26, 2004

Really, it's just a matter of bridging the gap between two preferences:

- We prefer to store data relationally, because it makes sense.
- We prefer to build systems object oriented, because it makes sense.

This can also be expressed as:

- Persistence has to do with schema (ie, how the data is saved).
- OOP has to do with behavior (ie, who/what is responsible for conducting business).

There's a gap here . . . and I don't see this gap closing anytime soon. So the value of ObjectSpaces is tremendous . . . because it closes that gap (or, at least, greatly reduces its width).

Well, at least to me . . .

Matt
Friday, March 26, 2004

since someone mentioned object spaces already.

you could look at NEO

http://neo.codehaus.org/

which is here and working today.

not a first class part of the language, but i think what most of us would accept is a standard way to treat "objects as objects" and have the objects themselves figure out how to update/manipulate the datastore.

mikester
Friday, March 26, 2004

Old timers will remember MUMPS, a language based on a database model:

http://en.wikipedia.org/wiki/MUMPS_programming_language

And the Pick operating system, which looked to the user like a database:

http://en.wikipedia.org/wiki/Pick_operating_system

What was distinctive about both systems is that quite ordinary users "got" the concept quickly and used them to build large, sophisticated systems, some of which are still in use.

lkb
Friday, March 26, 2004

BTW, Joel says "functional languages like lisp", which seems to  miss the point.  Lisp can be used as a functional language, but it doesn't have to be.  It's not "pure-functional" like some languages -- evaluation can have side effects.  You can write programs with line numbers and GOTOs if you want to.

(tagbody - 10 (princ "basic rules!") 20 (go 10)) ; put that in your pipe and smoke it, mccarthy

But he is right, though for the wrong reason.  It would be really simple to do in Lisp.  If you wanted to find a language feature that made it so easy, it'd probably be a combination of the simple syntax, and real macros (not those wimpy C macros which don't do much).

If it was me, I'd probably use a very thin glue layer at first, using reader macros.  So you could write, say, (setq x [[SELECT * FROM my_table WHERE my_field == y]]) -- putting your own variables in an SQL statement, and putting an SQL statement in your code.  Then, if that was still too low-level, I'd write a macro which used my new inline SQL feature to wrap variable access -- so you could simply treat an SQL table as a list, like (setq x (remove-if-not y my_table)) -- which is what Joel seems to want.
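For readers who don't speak Lisp, here is a loose Python analogue of the second step, treating a table as a list. It uses sqlite3; the TableAsList class is hypothetical, and the table and column names are borrowed from the inline example above. Unlike a real macro-based version, this sketch filters in Python after fetching every row; it does not translate the predicate into a WHERE clause.

```python
import sqlite3


class TableAsList:
    """Hypothetical wrapper that lets a table be iterated like a list of dicts."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def __iter__(self):
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        names = [d[0] for d in cur.description]
        for row in cur:
            yield dict(zip(names, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, my_field INTEGER)")
conn.executemany("INSERT INTO my_table (my_field) VALUES (?)", [(1,), (2,), (1,)])

my_table = TableAsList(conn, "my_table")
y = 1
# Roughly the remove-if-not idea: an ordinary predicate over an ordinary iterable.
x = [row for row in my_table if row["my_field"] == y]
```

The filter here is the same expression you would write in an "if", which is the property Joel was asking for; the cost is that the database no longer sees the condition.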

Dang, I'm too young to be an old Lisp fart...

K.H.
Friday, March 26, 2004

On the topic of integration with the database, I often wonder why more people don't suggest PL/SQL (Oracle).

It's essentially SQL with procedural extensions, and whatever its warts (it has more than a few), I personally find it a very clean, ordered language to program in. Most of the datatypes that are used in the language map directly onto allowed database column types, for example, which seems to be something Joel is asking for.

deja vu
Friday, March 26, 2004

Dataphor ( http://www.dataphor.com ) is a potential solution to this impedance problem. They provide an implementation of a language called D4, which provides a relational algebra that can be used to define and manipulate truly relational data models. D4 also provides the flow control constructs etc common to other languages.

D4 programs run on top of a data access engine which implements a truly relational DBMS over a pluggable persistence layer that can sit on top of a legacy SQL DBMS or other simpler engines (like flat files). The whole thing runs on .Net and you can use C# etc. to create custom D4 operator extensions.

D4 is based on Tutorial D which was an example language that Chris Date and Hugh Darwen provided in their book The Third Manifesto ( http://www.thethirdmanifesto.com/ ). This attempts to provide a foundation for integrating object and relational technologies. You can also find a lot of the same discussion/theory in the latest edition of Chris Date's excellent book "An Introduction to Database Systems".

A great thing about a language based around the true relational model of data (rather than the flawed SQL data model) is that it frees you from the asymmetry that you have to choose when you model with objects or with XML (which have hierarchical and network models respectively).

As long as you fully normalise your base relations (tables in SQL speak), you can then create any number of asymmetric derived relations (views in SQL) which are fully updatable.

It makes your programs more maintainable because these views give you a compatibility layer between the logical model your program works with and the underlying base logical model. So if you want to write a new program to work with the same data in a different way, you're not forced to work with a logical model that's designed for how the original application wanted to process the data.

Of course you can partially get this benefit by using a SQL DBMS but because the SQL model is flawed it's not nearly as good ... and if you want to get this kind of flexibility using object hierarchies, you effectively end up writing your own DBMS anyway.
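The "partial" version with a SQL DBMS can be sketched in a few lines of Python and sqlite3. This is SQLite, not D4 or Dataphor, and the employee/department schema and the staff_directory view are made up for illustration. It shows a view acting as a compatibility layer for reads, and also where SQLite stops short of the fully updatable views described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalised base tables (hypothetical schema).
    CREATE TABLE department (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Engineering');
    INSERT INTO employee VALUES (1, 'Ada', 1);

    -- Derived relation shaped for one particular application.
    CREATE VIEW staff_directory AS
        SELECT e.name AS name, d.title AS department
        FROM employee e JOIN department d ON d.id = e.dept_id;
    """
)

# The application reads the shape it wants and never sees the base tables.
rows = conn.execute("SELECT name, department FROM staff_directory").fetchall()

# Writing through the view is refused by SQLite, so only reads get the
# compatibility layer for free.
try:
    conn.execute("UPDATE staff_directory SET name = 'Grace'")
    view_is_updatable = True
except sqlite3.OperationalError:
    view_is_updatable = False
```

Other SQL products allow updates through some views under various restrictions, so treat the failure here as a property of SQLite rather than of SQL in general.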

Another good thing is that a truly relational DBMS/language provides full support for declaring integrity constraints for the logical model. Once you fully constrain your model by declaring these constraints, a lot of the imperative checks you would normally need to do in your code become redundant since the DBMS does the checking for you. So your data management code becomes simpler.
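A small Python and sqlite3 sketch of that effect, under the assumption of a made-up employee table. SQL's constraint support is far weaker than what a truly relational language would offer, but even a CHECK and a NOT NULL are enough to show the imperative validation disappearing from the application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary INTEGER NOT NULL CHECK (salary > 0)
    )
    """
)


def hire(name, salary):
    # No "if salary <= 0" or "if name is None" here: the declared
    # constraints do the checking and the insert is rejected.
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee (name, salary) VALUES (?, ?)",
                (name, salary),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Every program that writes to the table gets the same checks, which is the maintainability argument: the rule is stated once, next to the data.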

Given that the proper relational model is fundamentally based on predicate logic and set theory, we're not too far away from the whole Lisp/logic language thing mentioned above. The other thing to mention is that of course your relational language doesn't need to be strictly imperative like D4 is. It just needs to provide a complete relational algebra, along with the other language constructs needed to make it computationally complete.

That being said, I haven't actually used Dataphor properly so I can't really comment on how good it is (although I've read good things about it). I downloaded the version 2 beta to discover that they hadn't updated their documentation from version 1, and that lots of stuff had changed which made it hard to even work through their tutorials .. so I uninstalled it.

But we're talking theory here rather than implementation so don't flame me :)

Simon Collins
Friday, March 26, 2004

One of the key benefits of .NET, and one that's way undersold as far as I can tell is the strongly typed datasets.  99 out of 100 of the examples use weakly typed datasets, so it's easy to overlook the strongly typed datasets.

I know they're glue.  But they're the best glue yet (Maybe ObjectSpaces is better, but it's not here now).  I can drag a table onto a designer and the glue is squeezed out of the tube for me.  The nice thing is that the code I actually write is all compiled, no "" necessary, so no code in strings which doesn't get checked by the compiler.
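The difference between "code in strings" and typed access can be loosely illustrated in Python with sqlite3 and a dataclass. The analogy is imperfect: Python has no compiler to catch a misspelled attribute, so a static type checker or editor would have to play that role. The CustomerRow class and customer table are hypothetical, and nothing here is generated by a designer.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CustomerRow:
    # Hypothetical typed row; in .NET the designer would generate the equivalent.
    id: int
    company: str


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, company TEXT)")
conn.execute("INSERT INTO customer (company) VALUES ('Litware, Inc.')")

# Weakly typed access: the column name is a string nobody checks until run time.
conn.row_factory = sqlite3.Row
weak = conn.execute("SELECT id, company FROM customer").fetchone()
weak_company = weak["company"]

# Typed access: the column is a declared attribute that tools can see.
typed = CustomerRow(*weak)
typed_company = typed.company
```

A typo such as weak["compnay"] only surfaces when that line runs, whereas typed.compnay is something a type checker can flag before the program starts.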

Jim
Saturday, March 27, 2004

Of course there's an impedance mismatch between Object-Oriented Languages and Relational Databases. I'm actually glad there is. Not because I want cheap "job security," but because they are both powerful ideas and tools in their own rights, but weakened when you try to put them together. I want a good boat when I'm on the water, and a good car when I'm on land. Most amphibious vehicles are either way too clunky or outrageously expensive.

Here's a perspective from basic English grammar:

Data are encoded representations of facts. A fact is a type of sentence, typically a subject and a predicate. When you organize those facts by the subject, you get a subject database. This is the basis of Entity-Relationship (ER) modeling, the most common way of modeling data.

Object-oriented is focused on sending messages to objects, where the receiver of the message determines how exactly it will be interpreted. So we are dealing with sentences with a transitive verb, and the emphasis is on the "direct object" of the sentence. Hence, object-oriented. This gets us into the whole concept of "responsibility driven design."

This gives us two very different structures for thinking about things. Facts (data) are something you know about a subject; messages are something you want done by an object. Mixing them is like constructing sentences that are simultaneously in the active and passive voices.

Check out Scott Ambler's Agile Data web site www.agiledata.org and book "Agile Database Techniques: Effective Strategies for the Agile Software Developer" ISBN#: 0-471-20283-5. He gives good reasons why relational database schemas and class models should not be identical. Also, some techniques for dealing with them.

This is similar to the difference between the language of business, versus the language of programming. We need to map between them (architecture/design) but keep them separate, using each where it makes sense.

This is the difference between conceptual, logical and physical data models; business concepts, data theory, and practical implementation. Unfortunately, most tools and training that are available treat these as if there were a very simple transformation between these layers of models. For example, a logical entity with attributes simply becomes a physical table with matching columns; everything is a one-to-one correspondence and only the terms change. Unfortunately, this usually results in either a database that users understand but has terrible performance problems, or an efficient database that users don't understand because it doesn't match their view of the world.

I'm all for having tools that will help with handling the mismatch. However, I prefer to keep this to a mapping between models. Each model should have a language appropriate for that model. There also needs to be a language for the mapping. We should always be able to tell which language we're speaking.

David Lathrop
Saturday, March 27, 2004

>> Check out Scott Ambler's Agile Data web site
>> www.agiledata.org and book "Agile Database Techniques:
>> Effective Strategies for the Agile Software Developer"
>> ISBN#: 0-471-20283-5. He gives good reasons why
>> relational database schemas and class models should not
>> be identical. Also, some techniques for dealing with them.

David, having read through Ambler's essays on his website, I can't say I find them either clear or persuasive. He frequently confuses the difference between conceptual, logical and physical models and seems to regard relational databases as simply a persistence layer on top of which you need to stick a logical layer modelled with objects.

A very good article by Chris Date that dissects one of Ambler's essays illustrates this well. See "Models, models everywhere, nor any time to think" ( http://www.pgro.uk7.net/cjd3a.htm )

IMHO data modelling with objects is a bad idea. Modelling your system relationally (and as much as possible declaratively), and then using a language that supports relational operations natively to add the dynamic aspects of your application, is a better way to go.

That being said, I currently program mainly in Java so I'm as guilty as the next guy. But I'd change in a second if I could find a system that allowed me to program relationally. I guess I really should try out Dataphor again :(

Simon Collins
Saturday, March 27, 2004

>I think that most of the ways that modern languages access databases is flawed, and would love
>to see the PL/1 method reinstated. I think the reason we don't see this (although I believe ORACLE supports it),
>is the principle of keeping a language small and offloading all the work to a library with no hardwired I/O
>in the language itself. This was the mentality behind C and most languages that have followed it seems.

<speculation_speculation>
The reason might be that language designers
don't want the language to become obsolete, just because database ideology changed.

Once they had hierarchical DBs, then came SQL. In another ten years it may still be SQL,
it may be something else.

<speculation_speculation>
What could sell, is a generalized mechanism for extending languages, something like templates in C++
but with more power.

Something that
- can invoke external programs (like you need to get the database schema from somewhere).
- access information in symbol table of the compiler
- maybe extend the syntax of a computer language (if needed).

In this case database vendors can extend the language without resorting to preprocessors (those do not
quite integrate into modern IDE's). Of course, you would have to give up the notion, that a language
is once and for all defined by the language standardization process, that would be a huge change in mindset ...

</speculation_speculation>

Michael Moser
Sunday, March 28, 2004

Try: http://msdn.microsoft.com/vfoxpro/

Jamie Osborn
Sunday, March 28, 2004

Joel says:
In other words, language designers never bother to put database integration features into their languages. As a tiny example of this, the syntax for "where" clauses is never identical to the syntax for "if" statements.

That sort of lingual integration is the main weakness of Perl.  By baking so many features into the language, it makes it hard to keep the whole language in your mind at the same time.  Try maintaining a large application written in Perl and you'll be reduced to prayer as a means of finding your bugs.  You'll quickly come to realize why it is much better to rely on libraries than to try to create super languages with every feature nested inside. Have other languages that allow the use of regular expressions chosen to add them as language features, or do they add them as libraries?  I know that C, Python and Java have added libraries, and it is easier to understand as the code size of the application gets larger.  I am not alone in this point of view; it is one of ESR's reasons for moving to Python as his main development language. See http://www.linuxjournal.com/article.php?sid=3882

Michael Thomas
Sunday, March 28, 2004

"Joel, type out an imaginary example of what you would like so we can see it."

Well, I'm no Joel (who is?! - apart from Joel of course), but ....


schema.sql

create table Employee
{
        ..
}


Employee.java

import "schema.sql";

public relational entity Employee
    table Employee
{
      //no need to declare types, as they are
      //auto-implied from the schema

      public static Employee lookup(long id)
      {
            return new Employee(
                      SELECT * FROM Employee
                      WHERE ID = id);
      }

      private Employee( SQL fetch_statement )
      {
            super(fetch_statement);
      }

      public Employee(String first_name,
                              String last_name)
      {
            this.first_name = first_name;
            this.last_name = last_name;
            insert;
      }

      public void giveRaise()
      {
            this.salary *= 1.1;
            update;     
      }

      public void fire()
      {
            delete;
      }


      // special override for custom delete
      private SQL delete()
      {
            return DELETE from EMPLOYEE WHERE id = this.id;
      }
}

Ash
Friday, April 02, 2004

Kind of a late comment, but I was looking at the suggestions for Arc (Paul Graham's new language) at http://www.archub.org/arcsug.txt and, reading between the lines, it looks like it will contain database support.

Of course, for all you anti-Lisp weenies, it may not be of interest...

adam connor
Friday, May 07, 2004
