Fog Creek Software
Discussion Board




Table normalization

I think that is the right word... I have an enormous Excel table (like 2000 x 3000), with tons of redundant data.  If I separated it out into separate tables, it would probably be about 1/8 the size.

Is there any algorithm or tool which can help me split this into logical tables?  Maybe I can import it into Access and run some nice tool that will split it up or identify the most common redundancies?

Roose
Monday, February 02, 2004

I remember Access used to do that, I suppose recent versions still do. It should be in tools/analysis or something like that.

gunga
Monday, February 02, 2004

Slight tangent, what version of excel was that?

the reason I ask is that my Excel 2000 seems to be limited to 256 columns, and it is a limit I've been hitting against for a while and getting fustrated.

i like i
Monday, February 02, 2004


In MS Access 2000 from the menu try Tools | Analyze | Tables

Is that what you are looking for?

John Ridout
Monday, February 02, 2004

The database analyzer in ms-access is good for splitting out and normalizing repeating data. This means:

Name        FavorateColor            FavoratePop
Joe            Blue                            Pepsi
John          Red                              Coke


In the above, ms-access will split out the above data tables to 3 tables, create the relations, and joins.

So, for repeating data, it is  great tool. If you have a database full of repeating names, and invoice numbers, the wizard will split that data out to a main client file, and a child table with each invoice. Quite a handy tool.  Typically a lot of data from a spreadsheet is entered this way.

Customer Name              InvoiceNumer            InvoiceAmount
Joe                                  123                            200
Joe                                  124                            155

So, the wizard does get you close to 2nd normal form.

However, often stuff coming from a speardsheet un-normalized with repeating fields! So, you typically get:

Name    FavoreColor1          FavorateColor2          FavorateColor3
Joe        Red                            Blue                          Green

In the above, we are not even 1st normal form! However, of course again we need two tables. One for the name, and then  a child table for the list of favorite colors. To fix the above, you need to cobble together a bunch of append queries (the table wizards do NOT fix the above for you).

However,  once you get a few quires written and going, the process of splitting out that data to another table is not bad at all. (it starts to go quite quickly).

You simply make a append query to move FavorateColor1 to the tblFavorateColors table (you move both the color, ad the key id to that child table). You then change the query to append color2, then color3. You can then delete the FavoreateColor1, 2, 3 fields.

Interesting, but then you can now run the table analyzer on that new colors table to split out those repeating values if you wish (that gets you to 3rd normal form!).

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
http://www.attcanada.net/~kallal.msn

Albert D. Kallal
Monday, February 02, 2004

Great!  Thanks a lot guys, that is exactly what I was looking for.

Roose
Monday, February 02, 2004

Oh and I might have been exaggerating, I think it is actually maybe 100 columns by 2000 rows or something... I don't think we hit the 256 column limit.

Roose
Monday, February 02, 2004

*  Recent Topics

*  Fog Creek Home