Fog Creek Software
Discussion Board




Database Design: Ordered Collections

RDBMS are based on set theory and therefore do not implement the concept of ordered relationships and collections.

In practice you have to do this on your own.

One possibility is to add an additional integer field to the table that represents the "items" inside the collection. This field contains the ordering criteria. The ordering semantics is available to relational operations through the "ORDERED BY" option in SELECT statements.

However, when the items inside the collection have to be rearranged, this strategy can lead to expensive reorganizations of many rows inside the table, affecting an unpredictable set of items, even if the position of only one item has to be adjusted. Among other things this can lead to locking problems.

How do you handle this problem?

Not actually a database programmer
Sunday, February 22, 2004

ORDER BY always sorts result sets only. Why bother with the "real" order in the table? You select whatever you need and sort the result set. There is no need to sort the individual data in the table.

Or what am I missing here?

Patrik
Sunday, February 22, 2004

Patrik,

the challenge is to store an (artificial) ordering criteria that cannot be derived from existing database fields.

Not actually a database programmer
Sunday, February 22, 2004

How many items are in your collection? How often will they be rearranged? Will you act on the database of one item order change at a time, or prefer to make the entire change at once?

If you have pretty huge lists with some major reorders, you could try to store the order in a seperate table. The table would contain the shared key for that record set and a long string field. You could then store the order in a string with the indiviudal record key seperated by a delimiter. When you query for that record set, you would also query for that order list. In your programming language, you would combine the two for ordering purposes. This is more work on your end, but to rewrite the list order, you are updating a single field in the DB. Obviously this is not relational and would have Codd juming up and down in agrevation. The number of list items would be limited to the length of your string field, but you can have very large text fields. Sometimes you have to use shortcuts to gain speed though.

m
Sunday, February 22, 2004

An ordering that cannot be derived from table fields?

What would that accomplish other than poor SQL performance?

I mean if you invent some virtual homebrew ordering magic that is not based on table fields, then that would be invisible to the SQL parser and you royally fool the SQL query plan optimizer.

You would need to sort the result set of the complete table in your program anyways. This would lead to full table scans in the database. This is a bad idea.

What is the root problem you want to solve by having this "invisible" sort order? ... Why cant the sort order be visible in your database schema?

And, in my experience, any half decent RDBMS do a much better job at sorting using ORDER BY than your average sort algorithm in your (or my) programs.

Patrik
Sunday, February 22, 2004

"challenge is to store an (artificial) ordering criteria that cannot be derived from existing database fields"

How can you store the artificial ordering criteria if it can't be derived from the existing data? There must be some way to map the order into the records, or you could never find the records in which to store it.

So look at the algorithm you are using to sequence or resequence the fields, and apply it to the result set (instead of storing an artificial ordering column). 

If I've still missed something here, you might need to supply a more concrete example.

Tom H
Sunday, February 22, 2004

I think he's talking about an order that's derived from data outside the database. For example, a student roll that lists kids in class rank order, but the grades aren't kept in this system.
So every grading period the entire roll has to be re-ordered.

And someone already gave the answer - just use a second table with two fields: StudentID and Order. Then you just have to re-write that table.

As for performance, this example is exactly why you keep the data in a relational database - only an RDBMS would give you the ability to re-order the entire recordset while only locking the order data and not the full record data.

You just have to learn to think relationally. ;-)

Philo

Philo
Sunday, February 22, 2004

The second table is essentially the same as his original design. What's the difference if you store it in the same table or a child table?

But your guess as to the source of the problem is probably correct; he needs to join data in one database with data from another source. So this is an external interface problem, not a database design problem (think web services, XML-RPC, etc).

Tom H
Sunday, February 22, 2004

If you store it in the same table, then you have to lock every row when you're updating the ordinal data. If you store it in a child table you only have to lock the child table.

Since it's ordinal data, I'd think just about any change would generally be a full re-ordering (if #2 drops out, you have to touch rows 3->the end), so depending on how often you change the order, it could be a real issue.

Philo

Philo
Sunday, February 22, 2004

Tempering my last post, the child table might be useful if changing the order does cause the locking/many row update he mentions.

But I'd still prefer to see the original data (the grade) stored in the db, rather than a computed value (the rank).

Tom H
Sunday, February 22, 2004

Agreed. There are several potential problems that immediately come to mind:
1) Privacy. The system that's printing class rank may not be authorized to hold grade data.
2) Politics. The academic staff keeps grade data, the administrative staff keeps ranking data (or needs ranking data for some other purpose)
3) The grades are in a legacy system that doesn't do ranking.

I've run into each of those problems in various systems I've worked on.

Philo

Philo
Sunday, February 22, 2004

Another motivation to store the ordering data in a table separate from the items being ordered is that it is probably a more logical model of what you are trying to do.  The master table represents a set of individual items or objects.  The ordering table represents ordered collections of those items/objects.  Two distinct notions.

FPR
Sunday, February 22, 2004

>>But I'd still prefer to see the original data (the grade) stored in the db, rather than a computed value (the rank).
<<

You can run into a situation where you're trying to order not by data that's out of your hands for whatever reason (the grades in the above example), but completely arbitrary schema.

I've been thinking about this just this morning - trying to determine how to order questions for insurance application forms being read in from the database.  The questions will be ordered by the (changeable) whims of particular insurance agents and underwriters, but I certainly don't want to hard-code the questions onto each and every form!

My "solution" is an integer field to control the ordering, defaulting back to ordering by the date the question field was entered at need (the ordering field exists in the table associating the question with a given insurance product).

It feels very kludgy, but I can sorta get away with it because of the relatively small size of the problem.

The only way that sort of thing could really work, I suspect, would be through an appropriate interface which manages the ordering whenever a change is made (for instance, an interface that allowed the insurance agents to change the order of questions without bothering me).

I can't think of an elegant way to handle the problem, though...

Mediocre ASP Monkey
Sunday, February 22, 2004

> based on set theory and therefore do not implement the concept of ordered relationships

Try telling that to the stl::set class. (Or at least the MS VC++ 6 implementation thereof.)

Yes, the contents are ordered :-(
Monday, February 23, 2004

I think what was intended was a CODASYL or network database structure where records are members of sets.

You can achieve the same thing with RDBMS though without the cool navigational syntax. 

Simon Lucy
Monday, February 23, 2004

You can use

a) an integer field that jumps by 100 or 1000 in the original assignment of the sortkey.  If you have A with sortkey 100 and B with sortkey 200 and you need to insert a new key, C, in between them, you insert C with sortkey 150.  This pushes the need to update the entire table off until later.

b) a string field using longer and longer strings as you do more "insert betweens".  You have A with sortkey 'aa' and B with sortkey 'ab' and you want to insert C between them.  Therefore, you give C a sortkey of 'aaa'.

In most systems, a is a better option because integer sorting is much faster than string sorting.

Richard P
Monday, February 23, 2004

*  Recent Topics

*  Fog Creek Home