Fog Creek Software
Discussion Board




Data Types Chosen On Roulette Wheel

Have you ever inherited a database in which the column data types seemed to have been chosen at random?

LOL, luckily I'm already insane...

GuyIncognito
Thursday, July 01, 2004

No, but had one where everything as a VARCHAR.

Egor
Thursday, July 01, 2004

I'd sure like to see some examples....

Yoey
Thursday, July 01, 2004

I would not! I would like to stay sane.

Gary van der Merwe
Thursday, July 01, 2004

OK...

A column named 'YES_NO_SOMETHING_SOMETHING' thats an integer data type.  I may be a bit hasty, but I guess if something is gonna be either yes/no you only need 1 bit to indicate so.

Columns named 'SOMETHING_SOMETHING_COST', 'SOMETHING_SOMETHING_AMOUNT', and 'NUMBER_OF_SOMETHING_SOMETHING' set to varchar(12) for no apparent reason.  I've checked, and they all contain numeric values and according to the business rules should be numeric.

Columns named 'CHOICE_SOMETHING_SOMETHING' and 'CHOICE_SOME_OTHER_THING' with varchar(1) and bit data types respectively.

Another clue that the original author was a slob is the inconsistent naming conventions for columns in a single table:

MY_FAVORITE_COLUMN
another_column_is_here
CHECK_out_This_COLUMN

Like I said, I'm already insane so this doesn't do much other than make me chuckle.

Seriously, if you saw something slapped together like this would you put much faith in the application as a whole?  That's a rhetorical question... :)     

GuyIncognito
Thursday, July 01, 2004

I have seen a database where some integers were incorrectly typed as being floating point. They were things like number of parking spaces etc. You can't have half a parking space.

They also used strings to hold numbers. One shining example was where they used a VARCHAR to hold  either a Boolean value (as a string) and a number (as a string)!

The VisualBasic application that went with this Access database used Variants for virtually all the data types.

And yes the whole application was stupid at every possible level from the coding to the joke quality "design".

I got the feeling the clowns that wrote the system decided to reinvent the wheel, never actually worked out how to do the task, wrote 5% of the necessary code, did pointless timings on 5% of the code and congragulated themselves how clever they were and pissed about when they should have been doing the conventional programming parts of the application.

Savage
Thursday, July 01, 2004

I've seen it - and I even built one that way (when I was learning about databases in Access in highschool).  I can only imagine the amount of manipulation that must be occurring in the code to handle this data!  If you're able to sort everything out you could really reduce the processing time and program size (and make it more reliable, maintainable, etc).

Out of curiosity, how are you planning to attack this database - and are you also in charge of the program(s) that use the database?  Do you have any influence over them, can you force them to change if you change the database?

Lou
Thursday, July 01, 2004

Ouija Boards. Actually everything I do seems to have originate from a Ouija Board. Budgets, estimates, requirements, datamodels, everything orginally stems from the Ouija Board. 

me
Thursday, July 01, 2004

I've seen sql_variant.  Needless to say the programmers were doing the modeling.

brad
Thursday, July 01, 2004

We obviously work at the same company.

Brian
Thursday, July 01, 2004

Lou,

*LOL*  I'm hoping my new book 'Agile Database Techniques' can help me out... :)

Seriously most of what's here should probably be scrapped...

Thank goodness in addition to being insane, I'm young, single and naive.

GuyIncognito
Thursday, July 01, 2004

In defense of this:

One place I worked had ID fields for various things which were numbers, sequence numbers.

Then one day we had to change to include characters in the ID "numbers". It was a nightmare, changing schemas and code and forms interfaces.

frustrated
Thursday, July 01, 2004

There are so many places that not only don't care about producing quality, but that actually somehow believe it shouldn't be produced.  I have had people tell me that all varchar columns are better.  I have seen varchar ID's a billion times, where the ID is a number.  No autonumbering, no separate table numbering, just some number socked in there.  I've seen relational databases where there was one giant flat table that had everything, and then a bunch of tables that may or may not contain the right data -- all in all, having 40% complete data tops.  Ah, the joys of ignorant bliss.

Al A. Kazaam
Thursday, July 01, 2004

"A column named 'YES_NO_SOMETHING_SOMETHING' thats an integer data type.  I may be a bit hasty, but I guess if something is gonna be either yes/no you only need 1 bit to indicate so."

I stand by my assertion that anything that represents real world data should *never* be boolean. There is no real world situation that is or isn't - there's generally always a third option that will surface the day after you release the software.

Philo

Philo
Thursday, July 01, 2004


Good call Philo.

I am currently working on an application where a particular form was either "complete" or "not complete", so they told me to use a boolean.

Me, being the smart guy I am, called it a "status" field and had it reference another table for lookups.  Sure enough, two weeks later, they have a "pending further info" status.

They panic and start blaming each other for the bad information.  I add the value to the database and everything works with no other changes.

Viola.

KC
Thursday, July 01, 2004

rdbms and you got your third value: null, 1,0 or null

we had a nomailflag. an x meant yes don't send mail and a blank meant no I don't mind getting mail and there was no case for "we don't know yet".

me
Thursday, July 01, 2004

KC, you shoulda just let those ignorant management types slug it out. :-)

I used to be like you, but now I just do what I'm told, even though it usually goes against my better judgement. I'll simply chuckle and think thoughts questioning the person's judgment, intelligence, and sanity, without mentioning any edge cases. My future at the company is guaranteed because I'm not being confrontational, and life is good.

Nearly Nameless
Thursday, July 01, 2004

The YES_NO value being stored in an int field may be a legacy of SQL 7.  It couldn't use a bitfield in an index, so I've seen int and tinyint used instead.

So far as the idea of using a binary value in an index (they have rather poor selectivity), I won't go there...

chiph
Thursday, July 01, 2004

"They also used strings to hold numbers"


I do this all the time. My rule is simple

1) if it's not a serial/autonumber/identity/whatever-name-the-rdbms-has-for-an-auto-increment-type...

2) ...and if it isn't possible to make calculations with it...

...then, it's a varchar. Even if it will never have an alpha char in it (e.g., a phone number, or zip code)

If possible, I'll take the loss of efficiency in indexing, and the extra storage (especially, because I like to have some extra padding - 5-10 extra chars above the supposed max).

I've seen much too often the "yeah, we're changing these codes - they'll have alpha chars, from now on", or the "you know those codes that we said would never have more than 8 digits? Guess what?"

Paulo Caetano
Thursday, July 01, 2004

A bank's customer list where there was a YY in front of some of the surnames.

It turns out there used to be a bool field for "is customer a staff member", but this had got merged into the name field in an upgrade.

There was naturally a problem with non-staff customers whose surname began with Y, so the solution was to put YY in front of all the staff names so it wouldn't be confusing.

Martin Beckett
Thursday, July 01, 2004

I inherited some databases in which the cumbersome @@IDENTITY was deprecated in favour of having a 'most_recently_updated' column in every table. Every update would null all records in the column, and then insert a '1' to indicated it was the last to be updated.

I've little doubt that when I check tomorrow, this field will turn out to be a VARCHAR.

Thom Lawrence
Thursday, July 01, 2004

"I've seen much too often the "yeah, we're changing these codes - they'll have alpha chars, from now on", or the "you know those codes that we said would never have more than 8 digits? Guess what?" "

I am with you on this (of course you didnt mention binary, text, and the other special cases, but I know what you mean).

In general, I have been noticing recently that there is a curve of diminishing returns on increasing the optimization or "tightness" of a design past a certain point.

I don't know how to describe how to find this point. It's like you're sitting there, trying to figure out how to normalize some really intricate model and thinking of all the possible ways it could change and break everything. Then you realize you could fix it with one varchar() column or RelatedId or something.

And you just realize it will be better for everyone to just make it flexible and not *perfectly* sexy.

I'm sure these's a time and place for squeezing every last bit out of a design, but I'm learning to appreciate simplicity more and more.

Big, clunky, obvious interfaces kick ass.

Aaron Boodman
Thursday, July 01, 2004

That's why I use SQLite whenever I can.  Everything really is a varchar, even the auto incrementing id columns (and they really do auto-increment).  You can give it whatever data type you want in the create statement, it will still be a varchar in the database.

To some extent is doesn't matter too much for most applications, since the data that comes back to you is a character string anyway.

Clay Dowling
Thursday, July 01, 2004

Booleans being stored as ints might be due to the back end language. Not all  languages  are blessed with true Boolean datatypes and might use the “if it zero false, and all other numbers are true“ approach.

ITECS
Thursday, July 01, 2004

> Even if it will never have an alpha char in it (e.g., a phone number, or zip code)

Two interesting examples, because they really are *sequences of digits*, not *numbers*. (Of course, non-US postal codes do often contain alpha characters as well as digits.)

My initial reaction to the "will I ever do any calculations on it" criterion is that it makes sense -- you're unlikely to want to know the average zip code or to add two phone numbers. Now I'm going to see if I can think of any exceptions to that over lunch :-)

John C.
Thursday, July 01, 2004

In addition to phone numbers and zip codes, I also use varchar's for "numbers" which require leading zero's (fixed length employee ID's, last four digits of an SSN, etc.)  But overall, we have distinct data types for a reason.

Joe
Thursday, July 01, 2004

Before the Currency data type (which is just a fixed point decimal after all), it was very common to store financial values as integers and even as characters to avoid rounding errors.

But the 'every field is varchar' just strikes me as laziness.

I did have a problem with a DDL script recently where it created all the fields but through some incompetence of mine own some numerics were left at the default datatype of char(10).

But it was a matter of a few moments to put it right.

Simon Lucy
Thursday, July 01, 2004

Well let's see... a phone number could be anything like this:

(123) 456-7890
123-456-7890
1234567890
+44 1 012 345678

etc...

You don't need to do *any* validating on the front end - just let the user enter their friggin' phone number. What are you going to do with it after it's stored in the DB anyway??? Some CSR is gonna look it up 6 months later and dial it! So what?

Similar with address field - *one* multiline field is all you need for the *user* to enter their address - wherever in the world they live.

Similar with name - *one* field to store first, last, middle initial, nickname - whatever the *user* wants to enter.

Just deal with it! Let the report writers figure out how to do the stats.

Are you getting the picture yet?

FG
Thursday, July 01, 2004

Interesting point, FG, though I'm not sure I'd agree 100%. I bet if you asked a bunch of random users to enter "their phone number", more than a few would conveniently omit the area code. That can be a pretty serious oversight if they're not all in the same area code that you are.

John C.
Thursday, July 01, 2004

True, although you could likely look it up from their address.

Another thing, why do some sites want you to enter an email address twice? Why not any other fields? I just copy & paste the email address anyway.

FG
Friday, July 02, 2004

And another thing, what if the user has an extension. They should be able to type in

(123) 456-7890 ext 4321

in *one* field.

Doesn't any one speak for the users anymore? We are developing software for *users*, right???

OK </rant>

FG
Friday, July 02, 2004

"Another thing, why do some sites want you to enter an email address twice? Why not any other fields?"

Because if you get anything else wrong they can email you and ask what it was supposed to be?

Jack V.
Friday, July 02, 2004

The front-end does not need to match the back end (that's the point of the UI, right)?

Let them enter the phone number any way they want, but then break it up into different parts in the DB. Same with address -- how would you find who lives in a certain state/county without breaking the address up into parts?

Captain McFly
Friday, July 02, 2004

*  Recent Topics

*  Fog Creek Home