Fog Creek Software
Discussion Board




Database and OO: design mismatch?

I'm not sure how to describe what I'm talking about here, so bear with me...

Just for background, I've been doing OO programming in C++, and database programming in Oracle, for almost ten years.  (Yes, I'm sure there are better languages and databases.)  So, I "get" OO.  I mean, I'm not Stroustrup, but I've read, understood, and used the GoF book, just as an example.  (Yes, I'm sure there are better design pattern books.)  Likewise, I'm fairly comfortable with database design issues (3rd normal form, normalization, de-normalization, etc.).

But I'm beginning to see that I don't understand how to effectively link the two together.  Currently, I'm working on a large program with hundreds of C++ classes, and dozens of database tables.  But let me give you a trivial example, and you can scale it up in your mind.

Image you have a customer table and an invoice table, joined on a 'custnum' column.  Each customer can have many invoices.  Your task is to mark each invoice for customer "123ABC" with a 'status' of "PAID".  (Yes, yes, this is all quite horrible, but that's not the point).

My OO understanding tells me, make the Invoice objects children of the Customer object, in a collection of some sort.  Then the Customer can iterate its collection of children, and tell each child, "Mark yourself as PAID".  Each Invoice will know how to do that; the Customer has no idea how the Invoice does it, of course.  Each Invoice object (knowing its unique invoice number) would say, for example:

UPDATE invoice SET status = 'PAID'
WHERE invoicenum = '2003061199999001'

On the other hand, my database understanding tells me that UPDATEing rows in the invoice table iteratively like this is ungodly inefficient.  (Image that customer '123ABC' has 10,000 invoices.)  So instead, I want to issue one statement that updates all the invoice rows at once:

UPDATE invoice SET status = 'PAID'
WHERE custnum = '123ABC'

Again, this is a trivial example, but you get the idea.  I can't find the right words, but it's like there's a mismatch between OO design and DB design.  OO tells me to hide the "MarkAsPaid" implementation from Customer in the Invoice object; DB tells me it's extremely more efficient to let Customer do it, even though OO says it belongs in Invoice.

Why is this is a problem?  Because if I do it the OO way, the database stuff is just too slow; if I do it the DB way, I end up with source code for the invoice table spread out everywhere, instead of being encapsulated in a single class.  The former is intolerable, the latter is a maintenance nightmare (especially in a 300,000 line program).

Can anyone point me to some resources - even some keywords to google would be great - that address this issue?  (And no, it's not homework.  I *wish* I was young enough for homework!)

Thanks.

Spaghetti Rustler
Wednesday, June 11, 2003

Just a quick thought.  I would not inherit Invoice from Customer.  An invoice is NOT a customer.  I would utilize Composition rather than inheritance.

Hope that helps

shiggins
Wednesday, June 11, 2003

That was pretty short.  I believe Thinking in Java (which is free at http://www.mindview.net/Books/TIJ/) has a pretty good explanation of composition.

shiggins
Wednesday, June 11, 2003

The difference is simply in the SQL Statement and the naming of the function.

In the Invoice Object:

oInv->MarkAsPaid(long InvNum);

In the Customer Object:

oCust->MarkAllOutstandingInvoicesPaid();

Each function has different SQL statements.

I would say the code goes in the Customer Object because it is the Customer who pays the invoice.  The invoice itself has nothing to do with paying itself, it only knows if it is paid or not.

If I could ask the Customer, "What Invoices have you paid?" they would respond, "I have paid many different invoices? 1,2,3,4 etc etc".

If I could ask the Invoice, "Are you paid?", It would say, "I don't know ask the customer?"

If I understand you problem correctly, I think your code goes in the Customer Class and I don't see why you would have code scattered everywhere.

Dave B.
Wednesday, June 11, 2003

I am learning OO after 12 years of pure DB stuff...that's EXACTLY what I am wondering about...I might be that stupid after all...

shiggins,

You've got it wrong. He never mentioned Invoices inherited from Customer!

DBvsOO
Wednesday, June 11, 2003

Ok, It might sound as if I'm contradicting myself.  I should say the invoice only knows if it is paid, if the customer tells it that it is paid.

Dave B.
Wednesday, June 11, 2003

Oh, good God.  I said "children", didn't I.

No, no, no, I meant composition, not inheritance.

Can we start over?

Spaghetti Rustler
Wednesday, June 11, 2003

And yeah, "Thinking in Java" is excellent.  I've read it and "Thinking in C++" cover-to-cover a few times.  Has anyone read Eckels' "Thinking in Patterns"?  I haven't and I'm curious...

Spaghetti Rustler
Wednesday, June 11, 2003

DB

"make the Invoice objects children of the Customer object"

shiggins
Wednesday, June 11, 2003

OK.  Now that we are all clear...I agree with Dave.  It follow the Expert pattern stating "whoever has the information necessary to fulfill the request should do it."  However, I also think that Invoice should be responsible for marking invoices paid/due.  So what about this.  Creating an instance of the customer ID in Invoice in order to fulfill the request?  This is more along the lines of the Creator pattern which states "class B closely uses class A objects". Something like:

Customer objCust
int custid
custid = objCust.getID()

public void setPaid()
{
    UPDATE ....
}

I may be WAY off on this.  I was just brainstorming.

shiggins
Wednesday, June 11, 2003

I knew what you meant re composition/inheritance, SpaghettiRustler.

Here's my thought. If you want to do a collective action on a group of invoices then you should create a custom collection class that has this action on it. So instead of telling each invoice to mark itself paid, tell the collection to do the "MarklAsPaid" method/operation. The collection needs to know who owns it, though, so it can formulate the query.

I think the dichotomy you are describing has to do with the fact that OO is instance based while databases are structure based - like distinct types (tables) that can be referenced collectively by type (table) or by instance (row). At least in Java, there isn't naturally a way to get the list of all objects of type Customer, unless you code it in or use a framework that provides it.

Lauren B.
Wednesday, June 11, 2003

Also, I am reading Thinking in Patterns.  However, it feels unfinished.  It's a little unorganized and there are places where he seems not to know why things are done a certain way.  Maybe I got a weird download.

shiggins
Wednesday, June 11, 2003

As a layman it appears to me that the basic mismatch is that OO is hierarchical and hierarchy is anathema to relational modelling.

I may be quite wrong of course

Stephen Jones
Wednesday, June 11, 2003

Relations form graphs/trees all over the place, so that's not
the mismatch. The question is, where is the behaviour?

You are saying in the database that bejaviour is
a simple attribute update. What if it isn't? What if you
have to do 10 other things when an invoice is paid?
As long as you are just doing attribute updates you are
ok. But what if you must email someone? Put an entry
into a callander? Trigger a report? Update accounts
X, Y, C, each of which have their own behaviours? etc.

That isn't being capture in your database code. I know
the world can look like attribute assignments, but it usually
isn't.

You can have trigger in a database, but then we are
combining application behaviours and the database,
which can get very troubled.

If you have to have a larger transaction context then
you are really in trouble. What if you had to contact
a credit card agency? That needs to be in the transaction
too. It won't be in the database version only transaction.

valraven
Wednesday, June 11, 2003

There is a big disconnect.  That's the whole reason for existence of software that's called an "object persistence framework", "object persistence layer", or something similar.  It's all about making the relational data in a database work with a programming model that's object oriented.

There is some controversy regarding whether OPF's are the best way of doing things or not, when doing object oriented programming against relational databases.  Martin Fowler's book "Patterns of an Enterprise Architecture" is largely devoted to showing different ways of working in an OO way with relational data, whether it's with a full-blown OPF or just using datasets that correlate directly to tables and fields in the database.  He's pretty good at bringing out the advantages of disadvantages of each method, it seems to me.

Herbert Sitz
Wednesday, June 11, 2003

Lauren B. - When you said, "OO is instance based while databases are structure based", I immediately thought, "But C++ classes can be structure-based, by using static members."  So, I can write a static member like this:

/* static */
void Invoice::MarkAsPaid(const CString& sCustNum)
{
    CString sSql("UPDATE invoice SET "
      "status = 'PAID' "
      "WHERE custnum = " + sCustNum);

    ExecuteSql(sSql);
}

This encapsulates the implementation in the Invoice class, where it belongs, but doesn't sacrifice execution speed.  It's so obvious in retrospect, but I just totally couldn't see it.  Just took someone with the right insight.  Thanks! 


Herbert Sitz - Is "object persistence framework" the same thing as a "metadata layer"?  At any rate, I'll look into it, thanks.  And thanks for the pointer to Fowler's "Patterns of Enterprise Application Architecture"; I love his book, "UML Distilled".

Spaghetti Rustler
Wednesday, June 11, 2003

Spaghetti Rustler -- you hit the nail on the head. I haven't seen yet any reasonable solution to the problem you described. They are either not efficient or they don't scale, in the sense that for every new action you need to come up with the new class/collection.

Another example (that we tried to solve with EJB, and failed): a Customer is a huge object, representing the data that is stored in many joined tables. It has one integer field -- counter. Every time customer hits certain web page, we'd like to increment the counter. We have customer ID, and we don't need any information about the customer at this moment. Our previous version (non-OO Perl) had a small clean funciton with one SQL statement updating one table.

How would you do it with OO / persistence layer efficiently, without loading the unneded huge object first (we need just one field)? The consultants from IBM Websphere told us that the only reasonable way to do it is to forget about all the entity beans and write session beans / direct JDBC calls for every such an action. So, the  end result is not different from procedural, non-OO code. The irony is that entity EJBs work well for simple "standard" web applications; but for majority of such applications EJBs are a huge overkill, anyway :)

raindog
Wednesday, June 11, 2003

More fundamental question: how do you design complex database-driven enterprise application? Do you start from object model or from database model? The end result is different. And it seems like the database model always comes first, from my experience with big enterprise apps (millions of transactions per day).

Just one example: denormalization. There's no need to denormalize data at all, if you look from the object perspective. However, there's an absolute necessity to have it, if you look from the practical database point of view. And only when you start from the database model, you can properly denormalize your tables.

raindog
Wednesday, June 11, 2003

Spaghetti Rustler, I must be missing something, in your example of a static method, wouldn't this update EVERY invoice for that customer as paid? Even ones that might not be in the collection?

I still vote for Lauren B. idea, put a method in the collection. After the database update you could do any other necessary processing on the collection objects.

Bill
Wednesday, June 11, 2003

As mentioned previously, the correct approach is to add a 'PayAll' method to your invoice collection class, with appropriately formed sql.

I've had my fair share of programming/design problems, due to my own incompetence and/or inexperience, but no problems I could pin on using a relational database within an OO design.

There seems to be a lot of industry angst re relational DBs and OO design, perhaps there are others issues of which I should be enlightened?

no brainer
Wednesday, June 11, 2003

Nothing to add here, but in a previous life I was the technical admin for an Oracle Financials installation, and I left that job for various reasons, including that whenever I closed my eyes I had very vivid visions of bludgeoning Larry Ellison with an axe handle.

Thud. Thud. *Thwok*

Anyone else have similar experiences?

Jamie Anstice
Wednesday, June 11, 2003

Problems with tying OO and the RDMBS together can stem from the approach that "objects ARE data", instead of "objects USE data" or "objects ENCAPSULATE data".

Proponents of the "objects ARE data" philosophy see the database as just a way to store the object permanently.  They probably would not even bother with the database if the server had petabytes of main memory,a 100% unstoppable supply of electricity, and indestructible circuits.

They expend lots of time building and maintaining a "transparent persistence" layer, where objects can be instantiated automagically from the database. So the "Customer" object gets written out to a bunch of fields in a bunch of tables primarily because it can't be held in memory forever, and on some other day the data is read in again and bundled up back into a Customer object.

The trouble is that the "automagic" functionality isn't so automagic at all, as you still have to maintain the data mappings as your object model and data model changes, and they often force you to make unwanted compromises in your design, such as reams of public getter and setter methods or loads of one-dimensional dumb data objects.  Bulk operations can also become nightmares in performance and/or complexity.

However, in database-heavy organizations such financial institutions and large retailers, the database is at the core of their IT operations and is actively used for its ability to conveniently aggregate, manipulate, and correlate huge volumes of data. For them the database is far more than a parking spot for object state.  A piece of data can be added to the database by data entry operators, then displayed on the internet, read by a touch-tone phone system, replicated to a data warehouse, aggregated into weekly and monthly reports, and output to text files for FTP to business partners and regulators. The object oriented application is just one of many uses for the same data.

This example may be over-simplifying it somewhat, so try to overlook the specifics to see the idea:  In the "objects ARE data" school, account.getTotalDepositsForMonth() would cause the instantiation of all the relevant transaction objects from their database sources, after which they would be looped through and the sum of the deposit amounts taken.  In the "objects USE data" school, it would simply invoke a method that performs an SQL query to get the total.

Still, it is important to separate the SQL and database access code (JDBC, ODBC, ADO, etc) from the application objects so they don't become tightly coupled to the data model and/or brand of database.

T. Norman
Wednesday, June 11, 2003

And some half-remembered (and probably wrong) things about invoices in Oracle Financials (I also had four feet or Oracle manuals on my desk). It wasn't the invoices that were marked as paid that counted, but the entry of balancing payments or credit notes in the ledger. So (thinking out loud) each customer might have a Ledger object, which holds various Invoices, Payments, etc and hopefully it all balanced up.  I remember it being quite a performance importing this sort of stuff from various other systems into the Oracle system.

Jamie Anstice
Wednesday, June 11, 2003

there really is not a problem there.

Having every invoice object do a separate Update call is only slow if you are doing a few hundred or more at a time.
In my experience this is fairly unusual...in any manual system payments are being entered 1 at a time, and since each individual UPDATE is actually pretty damn quick, they can be done as the data is entered without slowing down the UI.

Where the payments _are_ being done in batch lots for whatever reason you just need to do a batch UPDATE <shrug> not perfect OO, but then real life very rarely is.

you can do them both through the same mechanism, making either the batch call or the individual call for each depending on the # of invoices to be processed.


bottom line?  ....dont stress the small stuff...

FullNameRequired
Wednesday, June 11, 2003

http://www.agiledata.org/essays/mappingObjects.html

William Williams
Wednesday, June 11, 2003

"Having every invoice object do a separate Update call is only slow if you are doing a few hundred or more at a time."

You've confused a contrived example with an actual use case.  The assumption was that there are 10,000 invoices.

It would be damned inefficient.

LookAtMyFingerPointingAtTheMoon
Thursday, June 12, 2003

"Has anyone read Eckels' "Thinking in Patterns"?"

His cakes are nice. And I did so enjoy him in The Goon Show.

I tried (like I tried with "in c++"), but I just didn't find him very readable.


Thursday, June 12, 2003

SpaghettiRustler:
Yes, yes of course, static members! Java has that capability too. Good solution!

Lauren B.
Thursday, June 12, 2003

Depending on your programming language you may be able to fix this.
In VB/C# I would have a customers.invoices class which implements ICollection (Or whatever your language requires) and also has a MarkAsPaid method.
It's worth pointing out that MS do this on the Worksheets collection in excel.
So the following code in VBA is valid:

Public Sub PITest()
    Dim WS As Worksheet
    For Each WS In Worksheets
        WS.Visible = False
    Next
    Worksheets.Visible = True
End Sub

Does this mean Joel "solved" your problem all those years ago?....

Peter Ibbotson
Thursday, June 12, 2003

T. Norman said: "In the 'objects ARE data' school, account.getTotalDepositsForMonth() would cause the instantiation of all the relevant transaction objects from their database sources, after which they would be looped through and the sum of the deposit amounts taken.  In the 'objects USE data' school, it would simply invoke a method that performs an SQL query to get the total."

Right, and that's just what my static member function does: it executes a single UPDATE statement.

The contrast between "objects ARE data" and "objects USE data" was lurking murkily in my mind, but you stated it elegently.  I knew (from bitter experience) that the former would lead to the problems you describe, but couldn't come up with the latter.  Thanks!

(It occurs to me, too, that the Composite pattern might be useful here.  Maybe the Invoice class implements IMarkAsPaid, and contains a collection of individual IMarkAsPaid-implementing objects... then Customer could tell either the whole Invoice collection, or just individual invoices, to "MarkAsPaid", without having to care which it was talking to ... just thinking out loud...)

Now I'm off to buy that Martin Fowler book...

Spaghetti Rustler
Thursday, June 12, 2003

I'm not actually sure about the idea of using a collection method "MarkAsPaid".  Maybe it would work better with your actual use case, but my problem with the customer/invoice example is this:  If the invoice collection object does "MarkAsPaid" by using the customerID update (as you have suggested), that means that the collection is making assumptions about how it is used.  Namely, it is assuming that  the customer objects keeps all of its invoices in that one collection, and that marking the invoices in the collection as paid means that all invoices for the customer should be marked.  This may be a valid assumption, but it still seems to be a problem.  I think it might make more sense for the customer to have a "markAllInvoicesPaid" method that does the update, since the customer should know how its invoices are laid out.

Mike McNertney
Thursday, June 12, 2003

"You've confused a contrived example with an actual use case.  The assumption was that there are 10,000 invoices."

?????  so move onto point two of my post...if you have to do a batch UPDATE then do so and dont worry about it.

FullNameRequired
Thursday, June 12, 2003

There is a real life case of this.

The Naked Objects framework automatically generates the user interface and persistance layers.

For the single transaction stuff it works like a dream, but there are problems with batch updates.

http://www.nakedobjects.org/discuss/article.php?id=382&group=nakedobjects

I think that if objects are to be pushed to the next level, like Naked Objects it trying to do, then this is one of the problems that needs to be resolved.

Ged Byrne
Thursday, June 12, 2003

I see that Developers Express has just released a beta version of a .NET object-relational mapper.

http://www.devexpress.com/products/net/xpo/index.asp

For those who don't know, Dev Express is probably the premier 3rd party component vendor in the Delphi world.  From what I understand they're building up a library of .NET components that are supposed to be pretty good, too.

Herbert Sitz
Tuesday, June 17, 2003

*  Recent Topics

*  Fog Creek Home