Fog Creek Software
Discussion Board




databases

Databases satisfy a general program: that of working with lists.

There is another way of working with lists that is remarkably easy and fast to develop with: the Python way.

In Python, lists are very powerful and dynamic.


So for solving the "working with data" problem, we have 2 powerful solutions:

1. databases - better at persisting data

2. powerful lists (containers) - better at manipulating data, at programming with data


So, I think that in order to solve the famous "impedance mismatch" we have to somehow merge the two models.

Some people will propose to invent a new language to do that.

Other people want to just add a class library.

I think that, in order to get a powerful unified model, both the language and the database need to change.


There is a lot of work to be done in order to achieve this great unified model.

I don't think this will happen soon.

Businesses are looking for proven solutions, so they don't want to change the database.


If a big player like Oracle and Microsoft takes a strong initiative in that direction, it may happen.

However, there doesn't seem to be such an initiative or the interest to take such an initiative, in spite of the fact that such a unified model would make programming significantly more efficient.

All they are interested in is creating "glue" solutions.

For example I consider that ADO .NET is yet another wrapper for the database, which is a lot harder to program with than with something like Python's lists.

So.. the impedance mismatch, the gap between programming and databases, is never really solved once and for all.

:-(

MX
Friday, April 09, 2004

"Databases satisfy a general program: that of working with lists"

You mean you are using cursors for everything?

Just me (Sir to you)
Friday, April 09, 2004

I wouldn't say databases were lists. They are more like arrays of hashes, were the record is the array, and the fields are the hashes.

Matthew Lock
Friday, April 09, 2004

Not only does Python have very powerful list processing, but also native support for tuples which (at least to my reading) seem very close conceptually to tuples in Codd's relational model.

Because of this I thought at one time about using Python as a  data query language, but hit a mental wall on scalability. There are persistence frameworks for Python which save tuples and lists transparently, but the whole shebang gets retrieved to memory for processing meaning that you really don't want to be working with multi-megabyte, let alone gigabyte datasets. A strong reason for using an RDBMS, however, is exactly to avoid this problem by reducing the amount of data retrieved into the application by pre-selecting only the records of interest.

What seemed to be needed was either an RDBMS that included a Python VM so that you could dispatch Python (?bytecode) to the RDBMS instead of SQL, or perhaps a Python to SQL translator embedded into the Python interpretor (which was well beyond my poor skills!).

David Roper
Friday, April 09, 2004

I consider that, if you use programming language terms, and not database terms, a table is simply a list of structs (or Pascal records).

It's very plain and simple:

table = list of structs

A database contains several tables.

My wish is to:

- use a table or a query result as easily as I use a list in Python

- use SQL on my lists as easily as I use it on an SQL Server database

- use the same data types when doing general programming and when accessing the database

MX
Friday, April 09, 2004

"Databases satisfy a general problem: that of working with lists"

> You mean you are using cursors for everything?

No, I don't mean that. Please see my message immediately above.

MX
Friday, April 09, 2004

--
I wouldn't say databases were lists. They are more like arrays of hashes, were the record is the array, and the fields are the hashes.
--

Thats exactly how I represent them in my perl code.  I have functions I use that when called with a sql statement and some bind variables return a list of hashes, and I just use that in my code.

Works like a charm.  The only downside is saving back to the database, because I need to roll the inserts/updates by hand.  I haven't looked into ways to make the hashses self-aware enough to be able to just pass a modified hash to some other wrapper functions.

Andrew Hurst
Friday, April 09, 2004

Isn't relational database theory based on sets (unordered), rather than lists (ordered)? Not that I think it's that important to the topic at hand, I'm just feeling a little pedantic this morning.

Andrew, you might want to check out Class::DBI if you haven't already. It provides a very simple yet quite powerful mechanism for instantiating Perl classes out of database tables. That module and the fact that Perl has intrinsic support for NULLs (because any scalar can be undef) goes a long way toward solving the impedance-mismatch issue for me. I feel like I get to concentrate on my logic instead of on annoying housekeeping.

John C.
Friday, April 09, 2004

John, Thanks for the pointer.

Unfortunately most of what I do nowadays is cgi, and without much OO support at that.  Mostly just lots of code 'require'd in and run procedurally.  No need for anything complex.

Though this project has been dragging a little lately, maybe I could spruce it up with some OO...

In related news, I think one of the quickest ways to unmaintainable code is "boring" problems.  People keep thinking of ways to solve the problem better and keep it interesting.  Thus too many layers of abstractions.

Andrew Hurst
Friday, April 09, 2004

>>table = list of structs

How about,

table = b-tree of structs

?

Nick
Friday, April 09, 2004

A table would be a b-tree of structs depending on the indexing.  I would think that that structure wouldn't be too useful in a program, because you wouldn't want the whole talbe in your program at once.  More likely the database could do any sorting or bulk manipulation of the rows.

In your program (generally) you're more likely to only want a subset of the data at a time.  Web pages, local programs, etc with more than a few hundred rows are probably information overload.

Andrew Hurst
Friday, April 09, 2004

A relational table is, by definition, unordered.

Many uses do require order, which is why the 'order by' construct was introduced into SQL. However, it's a kludge at best; Try to, e.g., sort a list of entries by a 'time_of_day' field, and query the standard deviation of time between subsequent entries.

APL (and K) have a concept that is sometime called 'arrables' (array + table), that embraces order and makes it part of the table definition; It makes many things considerably simpler.

Ori Berger
Friday, April 09, 2004

While MX makes a solid philosophical argument, I believe they missed it with:
'there doesn't seem to be such an initiative or the interest to take such an initiative, in spite of the fact that such a unified model would make programming significantly more efficient'

There isn't an initiative because the benefit of more efficient programming can be captured in a multitude of manners (off-shoring for one).  The thought that it would result in _significant_ improvement is nearly impossible to prove without one and the cost of creating one on the possibility is a mismanagement of the cost/benefit model most DB companies are willing to accept.

Of course, with OSS, it is possible one could build it into something like MySQL and prove it.  The question is are you _that_ certain?

MSHack
Friday, April 09, 2004

"Of course, with OSS, it is possible one could build it into something like MySQL and prove it"

http://gadfly.sourceforge.net/gadfly.html

Tom H
Saturday, April 10, 2004

DBMS != persistent storage.
DBMS = Database *Management* System.  That is, all the things you need to manage the database store have already been built in and optimized.  People who treat Databases that are under the management of a DBMS as simple persistent storage are re-implementing code that has already been done for them and wasting more time than by simply learning how to use a database management system correctly.

It does not matter what DBMS system you are using, you must know how to use it effectively.  If you wish to use an SQL DBMS (note: not a true Relational Database Management System - SQL allows duplicates! to speak nothing of allowing nulls <cringe/>), then please learn to use all of it's functions.  Most of the complaints I have seen relating to usage of SQL DBMS can be solved by using Stored Procedures.  Anyone who is using adhoc SQL queries in their program code is asking for trouble.  SQL belongs _inside_ your SQL DBMS managed database!

Stored procedures will handle updates, selects, transactions, multi-table updates, etc. and can be treated similar to function calls in your program code (not to mention being extremely more efficient).  If you can't write stored procedures, either learn or find/hire someone who can.  Programmers should stay out of the database at all times, and only use provided interfaces (stored procedures) to access or update data.

If you aren't going to use the features of a DBMS, then why the hell did you aquire one?

NOTE: above rant does not apply to people who are not using a DBMS, but then the question "Why aren't you using a DBMS?" comes to mind...

Chris Kennedy
Sunday, April 11, 2004

One reason to avoid stored procedures etc. is to maintain database portability. This reason should be evaluated on a case-by-case basis, but generally, if you wish to support multiple database backends (Oracle, MS-SQL, Postgres, etc.) you'd be doing significantly less work by putting the SQL in your application.

And apart from that, relational databases are horrible for anything that has a time-series aspect. SQL tries to improve the situation by including the non-relational "order by" construct - and fails miserably.

Ori Berger
Monday, April 12, 2004

So you wish to sacrifice integrity, correctness, and efficiency in favor of portability?  But it is a false portability:  If you are supporting multiple DBMS, then you are already having to deal with multiple versions of SQL because each DBMS has its own quirks (This is a problem with the SQL standard, not the individual DBMS).  It is much, much easier to write different sets of create database scripts, complete with standardized stored procedures (that return the same things, and have the same parameters, despite having different internal SQL) and have your program code be exactly identical no matter what DBMS you are running.  Otherwise you may find youself in a situation where you are putting in lots of if statements to handle odd errors from the DBMS.  Then everytime you port to a new DBMS you will have to revalidate all of your program code, instead of simply writing and validating a new create database script and set of stored procedures.  Use the appropriate tools for the job.

Keep the database rules in the DBMS and your program will be much happier and easier to debug.  Most of the excuses not to use the power of the DBMS has to do with programmers not wanting to learn how to use SQL or do database analysis.  I see it everyday.  The world probably needs more database professionals, but too many people are convinced that DBMS are too hard.

The entire reason for DBMS is to move your data manipulation statements OUT of your program logic and into the DBMS itself.  I say again, if you write your stored procedures correctly, they essentially become function calls to your program logic.  Therefore I see no "impedance mismatch" between DBMS and programming languages.  All I see is massive incorrect use of DBMS (and lots of people thinking the toy MySQL is a real DBMS.  Folks, it ain't).

Chris Kennedy
Monday, April 12, 2004

Chris, I generally agree with you; but I've seen too many exceptions to be a "believer" in the one true way of offloading everything to the database.

Applications I've worked on have often required joining against transient data not in the DBMS for various reasons (and it was prohibitively expensive to put them into the database just for the sake of joining).
Maintenance of multiple DBMS backends was impractical (everyone on the team knew SQL and could support it, but not even one person knew all the programmatic SQL variations), and though there were variations in the SQL code used, they were rather minimal and self-describing.

Work I've been doing in the last two years or so _requires_ time series manipulation. There are various DBMS based solutions, and none fits the relational model.  Shasha & Lerner have an excellent introduction of this at [ http://www.cs.nyu.edu/cs/faculty/shasha/papers/aqueryseminar.ppt ]. See the examples about detecting "packet flow" sessions, and of locating the best "buy then sell" strategy. These, and their kind map to horrible SQL which executes hideously inefficiently on "toy" DBMS like Oracle and DB2; Some real life queries require 9-way joins to express in SQL (or worse - transitive closure joins, which often cannot be expressed by e.g. Oracle's 'START WITH / CONNECT BY' SQL extension).

I'd rather not sacrifice correctness and integrity if possible. But-- for the applications I'm dealing with -- the DBMSs ability to enforce those is nearly nonexistant, and efficiency is severly damaged by trying.

If you can put all your logic in your DBMS -- good for you; The rest of your system must be comprised almost entirely of UI. Systems I've worked on have generally had logic which the DBMS cannot generally help.

Ori Berger
Monday, April 12, 2004

Yes, SQL has horrible support for time series.  I believe Date and Darwen address this somewhat in The Third Manifesto, but frankly I have not had the time to read it yet.

If SQL DBMS do such a bad job with handling the data you are working with, might I ask why use SQL DBMS at all?  It sounds like things might go easier for you if you just used regular files to store your data, similar to an ISAM approach.  But that's just a thought.

Chris Kennedy
Tuesday, April 13, 2004

I'm using flat files for the time series things. The only relevant database seems to be KDB [ http://kx.com ], but it's priced outside the reach of this project.

Ploni Almoni
Wednesday, April 14, 2004

Yep, that was me in the previous post, using a friend's computer. Sorry about that.

Ori Berger
Wednesday, April 14, 2004

*  Recent Topics

*  Fog Creek Home