Fog Creek Software
Discussion Board




basic db question

what's the general feeling on a table for states? something like?

CREATE TABLE if not exists States (
state_id  int(11) not null primary key,
state_name varchar(25) not null,
state_abbrv char(2) not null
);

or is the id redundant, since name and abbrv are already unique. i guess i just always like to have a row id.

is a table even necessary? couldn't we just define a constant since the state data seems pretty constant in the US?

not a db guy
Thursday, January 22, 2004

Look up ISO 3166. It is primarily a international standard for country code, but there are substandards covering states.

Dennis Forbes
Thursday, January 22, 2004

"couldn't we just define a constant since the state data seems pretty constant in the US?"

But what if Puerto Rico becomes the 51st state? Or Iraq?

:-P

Full name not required
Thursday, January 22, 2004

Or Sedgefield?

Steve Jones (UK)
Thursday, January 22, 2004

The postal abbreviations for states are unique and short (2bytes). Why add an artificial key (that is 11bytes)?

njkayaker
Thursday, January 22, 2004

Are substandards worse than regular ones?

C Rose
Thursday, January 22, 2004

Check out item #3 in this article:

http://www.intelligententerprise.com/001205/celko1_1.shtml

m
Thursday, January 22, 2004

well, here's the current implementation i was handed...

CREATE TABLE if not exists States (
Field1  CHAR(50),
INDEX Field1 ( Field1 )
);
INSERT INTO States VALUES ('AL');
.....

anything has to be better than this, right?

also, do US territories have 2 character postal codes? the cost  of an "id" seems small (11-2=9;9*50=450 bytes?).

not a db guy
Thursday, January 22, 2004

If you have an intended purpose for the ID then you obviously need to include it.

Consider your overall design and determine if an ID is necessary.

If you are using Visual Basic for the front-end, for example, and you are displaying the state abbreviations in a combo box, having the ID in the table makes sense because you can now use the ItemData field, a field of type long, to associate each state with its ID.

A down side to including the ID is that is may make queries asking for information about the entities associated with the State more complex.

Only you can determine if having the ID in the table is beneficial for your situation.  It requires a little analysis of the situation and the context in which it will be used.

Dave B.
Thursday, January 22, 2004

****DOUBLE-PLUS UNGOOD****
CREATE TABLE if not exists States (
state_id  int(11) not null primary key,
state_name varchar(25) not null,
state_abbrv char(2) not null
);

****GOOD****
CREATE TABLE if not exists States (
state_abbrv char(2) not null primary key,
state_name varchar(25) not null
);

MR
Thursday, January 22, 2004

as for the celko article, he says
"The next stupidest thing is using a prefix that repeats the table name in front of the data elements."

well, in code, i'd rather look at

resultSet['group_name'] ... resultSet2['user_name']

rather than

resultSet['name'] ... resultSet2['name']

not a db guy
Thursday, January 22, 2004

Use the state abbreviation as the Primary Key - it fits the definition perfectly. It also makes the address table more readable (and may eliminate a join in some cases).

When there is an abbreviation or code that is required to be unique, think about using it as the PK instead of inventing one.

Philo

Philo
Thursday, January 22, 2004

"well, in code, i'd rather look at
resultSet['group_name'] ... resultSet2['user_name']
rather than
resultSet['name'] ... resultSet2['name']"

I immediately question the naming of "resultSet1" and "resultSet2"...  what's it a set of?  You'll see that proper naming of your resultsets make that issue go away pretty quick.

Almost Anonymous
Thursday, January 22, 2004

"When there is an abbreviation or code that is required to be unique, think about using it as the PK instead of inventing one"

That is fine but if there is even a remote chance that the codes will change then you should create a numeric PK which has no relation to the code.

DJ
Thursday, January 22, 2004

not a db guy,

I write SQL code all day and, although I prefer to design as he suggests (e.g. without table prefixes) I really can appreciate it under certain circumstances.

For example, when you join tables together and the join condition is the same name, you end up with:
SELECT *
  FROM foo
INNER JOIN bar on bar.ID = foo.ID

Sure, you can alias, but then you end up with nonsensical aliases, like:
SELECT *
  FROM foo a
INNER JOIN bar b ON a.ID = b.ID
INNER JOIN baz c ON a.ID = c.ID
etc. etc.

Also, if you have two tables like this (this is a contrived example; I know you would have date_of_birth or somesuch and calculate age on the fly):
Person( SSNo, Name, Age )
Car( VIN, Make, Model, Age )
Person_Car( SSNo, VIN )

And you want to get a person's age and their car's age then you end up with:
SELECT Car.Age AS CarAge, Person.Age AS PersonAge etc.

Due to the third-party application I’m forced to work with on a daily basis I really love table prefixes.  Why?  This application uses Oracle forms and when you want to figure out what column of which table is supplying your data you can ask the forms manager “What information do you know about this field?” Unfortunately, it will only give you the column name.  So, if you have the table name prefixed you immediately know what table and column the data came from.  Yes, I hate this, and no, there’s nothing I can do about it, although I wish I could.

MR
Thursday, January 22, 2004

Celko is a really smart guy, especially in SQL, but he is really over the top -- I mean proclaiming that entirely subjective decisions such as table or field prefixes are "stupid" is arrogant and, well, stupid.

Dennis Forbes
Thursday, January 22, 2004

"if there is even a remote chance that the codes will change then you should create a numeric PK which has no relation to the code."

Um, is there some underhanded conspiratorial plan in the works to change the abbreviations for the US states? Maybe switch to a letter and a number, and all the states that begin with the same letter get to have a little war to see who gets number 1?

IOW, use the abbreviations as the PK.

(Caveat: don't then try to repurpose the field to also store Canadian provinces or UK counties or whatever.)

Martha
Thursday, January 22, 2004

There is no question Joe knows his stuff. He is the ANSI SQL master and knows every SQL trick in the book. Just as implementations of RDBMS are not fully ANSI compliant, so take the advice offered.

m
Thursday, January 22, 2004

"Um, is there some underhanded conspiratorial plan in the works to change the abbreviations for the US states?"

I know it's not gonna happen in this case, but I have been burned too many times by "don't worry those codes will never change"

DJ
Thursday, January 22, 2004

Well you just have to use a little business sense.

Building Contractor's homegrown materials codes? Will change annually
ANSI Materials Codes? Pretty safe

Joe's Invoice units of measure? What day is it?
X12 Units of Measure? Fixed in stone.

etc, etc.

Philo

Philo
Thursday, January 22, 2004


Eh, I've been burnt too many times. I prefer to err on the side of caution and always have keys that have no meaning outside the database because some joker will always come along who wants to make changes to the external meaningful keys.

Now, I admit, that doing this for US state abbreviations is probably overkill. The odds of those changing are very close to zero. But I think that there is something to be said for consistency. If 90% of my tables have application meaning only keys, there is some advantage to treating the other 10% of the tables the same way even if you don't have to.

Bill Tomlinson
Thursday, January 22, 2004

So I'm the only guy tired of digging around inside the database and having to say "ok, what's state 04?" or "Which type allowance is 21, and why isn't it working right?"

My whiteboard is littered with notes to myself about FK's. Here's an example I wish I hadn't done:
DocType
1 - PO
2 - Inv

I wish more than anything that I had just made the document codes (810=invoice, 850=purchase order) the primary keys. It's an x12 standard and has been for decades - it's not going anywhere. If I'd used the codes instead of arbitrary autonumbers, I would have saved a ginormous amount of joins and lookups.

I was a viciously zealous defender of "no meaningful primary keys" until I did a lot of x12 work. That made me rethink it.

Philo

Philo
Thursday, January 22, 2004

The debate on whether to prefix the column with the table name misses the point. A column should be named after the Entity it represents (StateID, PersonID, etc), it doesn't matter what table the column is in.

It should be obvious when you are relating columns or performing other operations on them that the entities go together.

Tom H
Thursday, January 22, 2004

Slightly offtopic and not at all implying that "not a db guy" is doing this, but:

As someone outside the US, it really, really annoys me when a site insists I pick a state as part of an address, and the only options I get are a fixed list of American state codes.

If I'm filling out an address simply for site registration details, you're gonna get bogus data.

If I need to fill out the address for a purchase, I'll buy elsewhere, thanks very much.

Andrew Lighten
Thursday, January 22, 2004

the post about wanting meaningless "id" keys in the "other 10%" is one reason i like them.

another reason is for HTML select elements. granted, with states, the postal code would work just as well, but in many cases, it's stretching things to have:
<option value="Space Rocket 4">Space Rocket

tablename prefixes are also beneficial for code generators.

not a db guy
Thursday, January 22, 2004

Philo,

It could be worse, we had a guy who wanted to use GUIDs for the independent keys (his thinking being that it's much easier to just generate a new GUID than having to write a incrementing key library (because we wanted database independence)).

So it would have been: "So what state is
{936DA01F-9ABD-4d9d-80C7-02AF85C822A8}?".

Of course, he changed his mind after a suitable beating.

Bill Tomlinson
Friday, January 23, 2004

GUIDs are nice if the key is artificial (i.e. where there isn't something unique and convenient like a "state acronym") and you may have the need to transport clusters of data between databases -- for instance I have a reporting application where I extensively used GUIDs because the reports and language elements (i.e. resource strings) are copied wholesale between databases, and they are referenced by some outside-of-database items (like XML documents). The same scene played out in a team where there was 10 developers all developing on their own database, adding descriptors, lookups, etc, and then when the merge came they'd find that someone has the id that they are linking to, so they start some cascade of changes. i.e. GUIDs are nice for merges. Mind you another developer rightfully pointed out that we should just all skip having local autonumbers, and use a central code release system which is also very useable.

Having said that, I worked with one wanker once who insisted upon GUIDs for the sole reason that they were more "Enterprise" ready -- it's a bigger number, therefore it's a better choice. That's a really poor reason for using GUIDs.

Dennis Forbes
Friday, January 23, 2004

*  Recent Topics

*  Fog Creek Home