Fog Creek Software
Discussion Board




Database Model vs. UI design

As the majority of the relational database examples I have seen over the years (not unlike those provided with VB, Delphi, and third-party components) are for the most part very basic and simplistic in design, there are often times when the relational database model appears to fall short of addressing all the ways we use and gather data in the real world, especially when it comes to normalizing a database. Also, while I do find it beneficial to look for ways to minimize the number of tables required to accomplish the task at hand, this sometimes seems to limit my available options with regard to UI design. In other words, I have found that in certain cases I can resolve a UI problem (whereby I have more and better UI design options at my disposal) by adding or creating additional tables (while still being able to normalize the database). Hence, the question arises as to where's the tradeoff or what's the real determining factor in deciding how many tables to use?

Marty Potokar
Wednesday, June 11, 2003

Forty-two.

Philo

So long, and thanks for all the fish
Wednesday, June 11, 2003

Use as many as you need.

One of my recent projects required several hundred tables, but they were all named in a (hopefully) logical way, so it was easy to understand.

Often, adding tables does make the implementation of the rest of the system easier, so it's a good thing. Having to join a few extra tables is no problem for most modern databases, unless you have very large tables or a lot of concurrent access.

Given that hardware is so cheap and labour is so expensive, it doesn't really make sense to try and cut down on tables, within reason.

The same is true of disk space usage; limiting it is totally pointless (in 99.99% of cases), as Joel pointed out previously.

Steve Jones (UK)
Wednesday, June 11, 2003

Go Philo (if that really is you) - Douglas Adams and Daffy Duck references within a 48 hour timespan.

www.MarkTAW.com
Wednesday, June 11, 2003

Back on topic - I agree with Stephen. Do what works for you here; odds are there is no holy-grail standard (though it never hurts to ask) that you need to hold yourself to. And if there ever were one, there ought to be a darned clear reason why. So clear that once you heard it, you would *want* to make your databases that way.

www.MarkTAW.com
Wednesday, June 11, 2003

I think the best approach is to go with a purely normalised table design.  You can normalise a table almost mechanically, which means you'll have the full database schema in no time.

Then make sure you have the ability to measure the performance of the tables.  Collect the data and identify the bottlenecks.
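
For instance, most databases will show you the query plan for a slow query. In Oracle it looks something like this (the table names are just an example, and it assumes the standard PLAN_TABLE has been created):

EXPLAIN PLAN FOR
SELECT C.NAME, O.ORDER_DATE
FROM CUSTOMERS C, ORDERS O
WHERE C.ID = O.CUSTOMER_ID;

SELECT OPERATION, OPTIONS, OBJECT_NAME FROM PLAN_TABLE;

Any join showing up as a full table scan is usually the first candidate for an index.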

You can save up these optimisations for periods when the user doesn't get to see much visible change.  You can get big performance increases with very little work, so the user will think you're very clever and busy.

Ged Byrne
Wednesday, June 11, 2003

I have little understanding of this fear of adding many tables to a database. If it helps, add as many as you need. Most complex database systems have thousands of tables anyway. Most client-server stuff I have done has had anywhere from 20 to 50 or so. My approach to mapping UI design to database design is to create database views.

This way, if you have say 4 tables to store your data, you can easily join them to create a view like this:

CREATE OR REPLACE VIEW THE_STUFF AS
SELECT T1.FIELD1, T2.FIELD2, T4.FIELDn
FROM TABLE1 T1, TABLE2 T2, TABLE3 T3, TABLE4 T4
WHERE T1.ID = T2.ID
AND T2.ID = T3.ID
AND T3.ID = T4.ID

Then your GUI database functionality can use a simple

SELECT * FROM THE_STUFF

to show all the data, and if the underlying tables change, recreate the view to suit the needs of your GUI code.

Most "real" databases I have worked with supports either updateable views or some kind of rewrite rules so you can have the GUI code post updates or inserts to the view layer. How you implement updateable database views depends on the database.

This helps simplify the GUI code, and as long as you keep the database views intact, the GUI code is happy even if the underlying tables change.
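
In Oracle, for example, you can do it with an INSTEAD OF trigger on the view. This is only a sketch; it assumes FIELD1 lives in TABLE1, FIELD2 in TABLE2 and so on, and that there is a sequence for the shared key:

CREATE OR REPLACE TRIGGER THE_STUFF_INS
INSTEAD OF INSERT ON THE_STUFF
FOR EACH ROW
DECLARE
  NEW_ID NUMBER;
BEGIN
  SELECT SEQ_STUFF.NEXTVAL INTO NEW_ID FROM DUAL;
  INSERT INTO TABLE1 (ID, FIELD1) VALUES (NEW_ID, :NEW.FIELD1);
  INSERT INTO TABLE2 (ID, FIELD2) VALUES (NEW_ID, :NEW.FIELD2);
  -- ...and the matching inserts into TABLE3 and TABLE4...
END;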

Patrik
Wednesday, June 11, 2003

Marty said:

"by adding or creating additional tables (while still being able to normalize the database). "

This seems strange: creating additional tables and normalisation usually go hand in hand.  Normalisation normally leads to additional tables and joins, and it is denormalisation that reduces the number of tables and joins.

What exactly do you mean by normalising the database, Marty? 

Ged Byrne
Wednesday, June 11, 2003

Isn't normalization when you make sure all the content works with each other... That the username field is always 256 characters long and there are 3 address fields and so on?

One step beyond that, you want to make sure that "John Doe" is always "John Doe" and not "john Doe" somewhere.

The more tables you have, the more chance of that kind of thing happening.

www.MarkTAW.com
Wednesday, June 11, 2003

>Isn't normalization when you make sure all the content works with each other...

No. Normalization is about avoiding storing the same data twice across multiple tables. In your John Doe vs. jOhN dOE example, the normalized way of storing it would be, for example:

TABLE USERS
ID_USER    NUMBER,
FIRST_NAME VARCHAR2(128),
LAST_NAME  VARCHAR2(128),
.
.
.

and then, in all other tables requiring user information, just store the ID_USER. That means we have the name and other user information stored only in one place. This eliminates the JoHn DoE problems.
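
So a table that needs to know which user it belongs to (ORDERS here is just a made-up example) would look like:

TABLE ORDERS
ID_ORDER   NUMBER,
ID_USER    NUMBER,   -- points back to USERS, no name stored here
ORDER_DATE DATE

and when you need the name you join:

SELECT O.ID_ORDER, U.FIRST_NAME, U.LAST_NAME
FROM ORDERS O, USERS U
WHERE O.ID_USER = U.ID_USER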

Patrik
Wednesday, June 11, 2003

Oh. I thought that was just the way it was done. That's the way I design my databases and I thought it was the way everyone on the planet designed theirs.

www.MarkTAW.com
Wednesday, June 11, 2003

MarkTAW,

>I thought it was the way everyone on the planet designed theirs.

Quite the opposite, I'm afraid. I've seen some horrific examples of case-sensitive clear-text storage duplicated between tables. Needless to say, it's been homebrew niche applications that I've converted to our system.

I've also seen it done by countless people in the "I know Access, it's simple to do databases" crowd.

Proper database design is uncommon in the real world :)

Patrik
Wednesday, June 11, 2003

More specifically, Normalisation is a formal method for ensuring that there is no duplication within a database.  It is based on the Relational Model, hence the name Relational Database.

Wikipedia has a good explanation:
http://www.wikipedia.org/wiki/Database_normalisation

Ged Byrne
Wednesday, June 11, 2003

The number of tables in your database should have no more effect on the GUI than it does on your wife's hairdo or the position of Mt. Fuji.

The whole point of the relational model is that it is independent of the implementation.

The answer to your queries has been given by another poster. The GUI should be either a Windows form or a web form, and that form should be based on a View (or a Query as it is called in MS Access). You should never have a form based on a table (much less use the table to enter the data directly). Even if you have a form that just enters one piece of data into a one column table you should still make a one column query in case you need to add something to the form later that won't be on the table.

As has been pointed out normalization increases the number of tables.

The relational model is there because it is the best model for complex data in the real world. How you set it up depends on the relations between the data you are dealing with, and the use you intend to put it to. If you have problems getting the relational model to agree with your real world data it is because you have developed a bad model, either because you misunderstand  relational modelling or because you misunderstand your real world data. It's like saying cameras don't work for taking pictures because either you suffer from delirium tremens and can't hold it straight, or because the light is so bad you can't take a good picture.

The reason that sample databases for the relational model are simple is didactic. The relational model was chosen, despite the much greater demands it made on hardware at the time, as the most effective system for large enterprise-level databases.

Stephen Jones
Wednesday, June 11, 2003

You need to consider long-term consequences when making this choice. Adding tables for fun may make coding easier now, but inevitably it'll make the product harder to develop for in the future.

Although it can cause more short-term pain, in my experience it's better to normalize as much as possible and maintain the integrity of the data.

Yanwoo
Wednesday, June 11, 2003

. . . which as someone else pointed out may in fact increase the number of tables.

Agree with the point on views from the previous poster; UI design should be completely independent of database design.

Yanwoo
Wednesday, June 11, 2003

While it would be nice if the number of tables had no effect, that is only so in relational theory.  In the real world, data models are based on the understanding of the data as known at the time, and on how it will be used, which is often unknown at the time.  This is then complicated by the question of how long the data takes to retrieve.

Given a model with 100 tables of 1000 rows each, a fifth normal form (5NF) model is fine.  Given 1000 tables with some spanning 4 million rows, maintaining the relational model in 5NF is most likely going to cause you terrible response times without big iron.

Many database vendors/consultants recommend keeping to 3NF if you expect or find performance issues.  So, if you have an online application, performance is probably your number 1 requirement.  [Ask the sponsor]  Then you can decide how much you can afford to be a purist.

If you are looking for the "best," as some pointed out, it is up to you.  More importantly, it is up to the person paying for it.  And yes, you can expect the requirement to read, "I want to bring back any record, given any search criteria, in less than 1 second."  Then it is time to talk about reality.

Real examples:
A bank with 4.5 million accounts, 8 million addresses, and 51 million transactions.
A health insurance company processing 200,000 claims a day, verified against 9 million subscribers, 300,000 providers, addresses for both, groups, group benefits, and procedures.

Alan Willman (DBA)
Wednesday, June 11, 2003

Stephen and Patrik hit the nail on the head.  Normalization ensures no (or little) redundancy.  An easy way to do this is to look at the data a table will store: if there is a lot of repeating information, split it out into a different table.

That being said, there are times when you don't want a normalized db.  For example, if you have a large number of records (1 million+) that you are querying a lot, it is less resource-intensive for the db to binary search one indexed table than it is to join and search multiple tables.

This happened to me last year.  I agreed to write a system for a client that had millions of records.  The spec's main requirement was NO DUPLICATES (for general user info: firstname, lastname, etc.).  I split the table into two: one containing general user info and the other containing the user's historical touch points.  Unfortunately, this db was for querying for mailings.  So as you can imagine, queries took HOURS!!!  My mistake for listening to the client about what was most important in the system (although that's a discussion for another thread).
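
In hindsight, flattening things back out into one indexed table built just for the mailing queries would have been the way to go. Something like this (the names are made up, Oracle-style syntax):

CREATE TABLE MAIL_LIST AS
SELECT U.FIRSTNAME, U.LASTNAME, U.ADDRESS, T.LAST_TOUCH_DATE
FROM USERS U, TOUCH_POINTS T
WHERE U.ID_USER = T.ID_USER;

CREATE INDEX IX_MAIL_LIST_NAME ON MAIL_LIST (LASTNAME, FIRSTNAME);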

As far as gui goes, you should NEVER change your data designs based on what your gui needs.  That is what your application/business layer is for. 

shiggins
Wednesday, June 11, 2003

OK, I didn't mean binary search I meant b-tree search.  Sorry.

shiggins
Wednesday, June 11, 2003

Ask any developer what they prefer:

Bad data design, and good code?
Good data design, and bad code?

Of course, any developer will prefer a good data design over good code. The above question is a bit loaded, since I can't imagine an application with a first-rate data design and bad code! (But the reverse is actually common.)

As developers walking into an unknown application, we need to understand the data design before any code can really be modified with confidence.

For sure, a more normalized design is better, but it also can cost more up front. In many cases, more normalizing will just increase the cost of the application with little benefit.

For example, how many people split out the City field of a typical customer entry form (table) to another table? I have never written an application where I broke out the city field to another table. A normalized design says we should break out the city field (a repeating value) to another table. (This is getting toward 3NF.)

The problem here is that this normalizing brings few benefits, and causes a lot of code to be written.

Some of the results of this issue are:

** During data entry of this simple customer record, you must build a combo box, or some means to select the City from the city table. This increases the cost of building what should be a simple customer form.

** If the city is NOT in the current city table, you have to add the ability in your GUI for the user to add a new city during data entry.  Since users can freely add new cities, the data design does nothing to prevent bad or wrong city names being entered into the system. Worse, correcting those wrong city spellings in "batch" mode now requires the search/replace code to deal with two tables.

** To print things like a simple mailing label, or even do a merge to a Word doc, you are using a relational join to grab the city name.

So, you can see that any code that now deals with the customer will incur the cost of dealing with an extra table.
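
Just to make it concrete (the table and field names here are only for illustration), the mailing label that used to be a plain SELECT against the customer table now becomes:

SELECT C.FIRST_NAME, C.LAST_NAME, C.ADDRESS, CT.CITY_NAME
FROM CUSTOMERS C, CITIES CT
WHERE C.CITY_ID = CT.CITY_ID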

The list of benefits that this extra table gives to a typical application is rather small, and the code cost is great.

On the other hand, if this is a direct mailing system and we don't want to allow the operators to enter new cities, then we do need the ability to PREVENT operators from typing in a city by hand.

Having said all of the above, there tends to be a real, beautiful balance between a good data design and a good GUI.

If the design is normalized, then you have lots of ready-made pick lists for your report screens. In fact, in the following report screens the user NEVER has to actually type any values in. This is a result of normalized data. I don't even have to write any code to check for invalid input, since the user can't make a mistake.

Those sample screen shots:

http://www.attcanada.net/~kallal.msn/ridesrpt/ridesrpt.html

Due to lots of table normalizing, those report screens are a breeze to build. I have tons of pick lists ready-made from all those extra tables.

Due to normalizing, the SQL query often needs an "id" in place of the actual text description, so once again those forms allow the user to select the text value, but pass the "id" value to the SQL used.
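
For example, one of those report queries ends up looking something like this (the names here are just for illustration, not the actual tables in the app):

SELECT *
FROM RIDES
WHERE CLUB_ID = 17   -- 17 is the hidden id behind the club name the user picked from the combo box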

While one can argue that GUI and tables are separate issues, they really feed each other in an application.

There is a real dance of harmony between a good GUI and the tables.

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Wednesday, June 11, 2003

"I can resolve a UI problem (whereby I have more and better UI design options at my disposal) by adding or creating additional tables (while still being able to normalize the database). Hence, the question arises as to where's the tradeoff or what's the real determining factor in deciding how many tables to use?"

The UI and database are two separate, distinct entities and should be treated as such.

You write an SQL query that retrieves the data from the database, either directly or through a view, and stores it in program variables.  The variable is normally a recordset, or if you are using ODBC you can bind individual variables to each field.  These variables are then used by the interface elements to display their value.

Be careful when binding controls directly to database fields.  Don't get me wrong, this is very useful in certain circumstances, but it should not influence the way you design your UI or your database.  This type of binding is mainly useful for static data.  For example, a drop-down combo box that displays a list of state abbreviations.  (This is an easy trap to fall into when using Access forms in an Access database, or even when using VB/C++ controls.)

Once you understand that, then you may be able to normalize your database design.  Of course, as mentioned in earlier posts, there are tradeoffs between the different normal forms, mostly speed and, to a lesser extent, maintainability (if the normal form is broken correctly).

Dave B.
Wednesday, June 11, 2003

You start denormalizing when performance is an issue, but Marty was talking about an app for his personal use as a housing inspector. Not exactly Oracle stuff. And how many people even know what the 4th and 5th normal forms are?

Data warehousing is a different ball game, and I was under the impression that what was being done was creating another denormalized table, maybe to be stored separately, for the purpose of doing spontaneous searches.

Correct me if I'm wrong though; this is way out of my league.

Stephen Jones
Wednesday, June 11, 2003

4th = prohibits multi-valued dependencies from existing between columns

5th = if a table has 3 or more foreign keys, it should be split out.

:)

shiggins
Wednesday, June 11, 2003

To answer Ged: agreed that adding tables and normalization will ordinarily go hand in hand, as adding tables is usually done in the process of normalizing the database (I was either unclear or you misunderstood my post).

To answer Stephen Jones, regarding "If you have problems getting the relational model to agree with your real world data it is because you have developed a bad model": this is not always true. An example would be a relational model that works great until I have to change a client's billing address, at which point one might fail to realize that the old billing address reflected on previous invoices gets changed in the process, so that those invoices now reflect the new one. If one hasn't thought about this (in the real world), it could have ramifications.

As for database normalization, Ged asks what I mean. Simply put, normalization is the elimination of redundant data from each table in a database. There are five forms or stages, namely first normal form through fifth normal form. As the last two are seldom used, let me define the first three. For a table to be in first normal form, each of its fields must be completely atomic (only one element of data) and must contain no repeating groups. For a table to be in second normal form, each of its non-key fields must be fully dependent upon the table's primary key, including each field in the primary key when the primary key consists of multiple fields. Third normal form implies that each of the table's non-key fields must be fully dependent upon the table's primary key and independent of each other; so, aside from meeting the qualifications of second normal form, each non-key field in a table must be independent of the other non-key fields. Hope this is clear and clarifies what normalization means to me. Thanks to all for your replies.
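
As a quick made-up illustration of the third normal form rule:

-- breaks 3NF: DEPT_NAME depends on DEPT_ID, a non-key field, not on EMP_ID
TABLE EMPLOYEES
EMP_ID    NUMBER,
DEPT_ID   NUMBER,
DEPT_NAME VARCHAR2(64)

-- in 3NF: the name moves out into its own table
TABLE EMPLOYEES
EMP_ID  NUMBER,
DEPT_ID NUMBER

TABLE DEPARTMENTS
DEPT_ID   NUMBER,
DEPT_NAME VARCHAR2(64)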

Marty Potokar
Thursday, June 12, 2003

Marty,
The question you raise about changing the client's billing address is one that is dealt with in all the elementary tutorials for database design. It's so obvious that you should be thinking about it at the very initial stage. It's precisely an example of what I am saying - the problem comes from having a relational model that doesn't reflect the real world; clients change billing addresses.

Work out the problem: if the billing address is not normalized, then you have to go to each outstanding invoice and check it manually. If it is normalized, and you haven't made any provision for the old and new billing addresses to co-exist, then what happens is that the paid invoices will now reflect the current and not the historical billing address. That does not seem to be too much of a problem, particularly as you will normally be keeping snapshots of the invoices actually sent out to the clients. You can solve the problem by having a drop-down list of billing addresses, and then letting the correct one be chosen for each record. Then when the new billing address comes round you can change the one that shows on the invoice according to the date on the invoice.
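
In other words, something along these lines (the names are purely illustrative):

TABLE BILLING_ADDRESSES
ADDRESS_ID NUMBER,
CLIENT_ID  NUMBER,
ADDRESS    VARCHAR2(256),
VALID_FROM DATE

TABLE INVOICES
INVOICE_ID   NUMBER,
CLIENT_ID    NUMBER,
ADDRESS_ID   NUMBER,   -- the billing address actually used on this invoice
INVOICE_DATE DATE

Paid invoices keep pointing at the address they were sent to; new invoices simply pick up the latest ADDRESS_ID.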

How you normalize your data depends on the real world; plenty of times you are going to get seemingly "redundant" data that is not necessarily redundant.  A person's sex is unlikely to change over his lifetime, but his weight range is likely to, so you wouldn't want to do cascading updates for the weight range. You might even decide that it would be simpler to have it typed in. Equally with first names and last names; very few people ever keep a normalized list of first names and last names (I will probably be doing so for a database, but I have special reasons), yet you see the same first name in many records, which is a technical breach of normalization.

A database will only have part of the real world in it. When there are shifts that haven't been planned for, then you have a problem either way. The relational model ensures that this happens less often than otherwise.

Stephen Jones
Thursday, June 12, 2003

Marty,

I've come across the problem you refer to: the changed billing address.  It comes from people applying the stages of normalisation a little too mindlessly.

My understanding now is that you are talking about the amount of data being stored in real terms, rather than the count of tables in the schema.

Normalising leads to a reduction in the amount of data being stored by eliminating duplication. 

Have I got it right this time?

Ged Byrne
Thursday, June 12, 2003

Thinking about the address problem, I think the best design is to have an address table which can contain all of the addresses used by a user.

I like to indicate the type of address in a link table, so that there is no need to duplicate anything if, say, the invoicing and delivery addresses are identical.
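
Something like this (the names are just off the top of my head):

TABLE ADDRESSES
ADDRESS_ID NUMBER,
ADDRESS    VARCHAR2(256)

TABLE USER_ADDRESSES
USER_ID      NUMBER,
ADDRESS_ID   NUMBER,
ADDRESS_TYPE VARCHAR2(16)   -- 'BILLING', 'DELIVERY', etc.

If two address types are the same, the two link rows simply point at the same ADDRESS_ID.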

Experience using Amazon seems to indicate that Amazon use this approach.  They remember every address I have ever used and enable me to select from them when ordering.

I wonder if Amazon actually use a relational database, or do they have some monster hierarchical thing used for performance?

Ged Byrne
Thursday, June 12, 2003

Why should Amazon be worried about performance?

You denormalize when you are doing lots of spontaneous queries that go through all the data. All that is happening with Amazon is that lots of people are making simple concurrent requests on indexed fields.

Stephen Jones
Thursday, June 12, 2003

Albert, why split out city, you ask?

There are many reasons one might do this, the obvious one being the ability to report, say, sales per city.

You might also use the city field to influence business logic in another part of your application, e.g. use a bike to deliver to everyone in London and a courier everywhere else.

A lot of countries have post codes that link up to GIS systems, but a lot don't, and the city becomes the obvious field.

If anyone can type Gloster instead of Gloucester (pronounced like the former), then your database breaks down.
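
If the city lives in its own table, as in Albert's example, a plain foreign key stops that at the door (made-up names for the customer and city tables):

ALTER TABLE CUSTOMERS
ADD CONSTRAINT FK_CUSTOMER_CITY
FOREIGN KEY (CITY_ID) REFERENCES CITIES (CITY_ID);

The customer record can then only hold a city that already exists in the city table.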

The problem though, as covered in numerous threads and in Joel's articles, is the risk of over-engineering...

It is a fine line between a data model that works today and is extensible, one that will only work today, and one that does not quite work today but covers an infinite number of possibilities.

tapiwa
Thursday, June 12, 2003
