Fog Creek Software
Discussion Board




Reverse Engineering Database Design

I'm currently working as a tester on a project that uses a SQLServer database composed of around 500 tables.  Unfortunately, the database was "designed" using a proprietary in-house developed tool that generates the DDL from a Rational Rose object model.

I've been charged with devising a way to populate a test database with enough "real" data to do a load test, for performance reasons.  My problem is, without a database schematic, I don't know where to start (I should mention I don't have access to the Rose model since I'm "only a tester" and therefore don't have a need to see the object model).  I've tried using the reverse engineering tools in Visio and the SQL Enterprise Manager, but a DB diagram with 500 tables and their relationships is unprintable, and unweildy to work with on a screen.

Does anyone have a process or methodology for reverse engineering databases, and determining if and how the tables can be logically grouped by relationships (I'm hoping there are smaller subsets of inter-related tables that are easier to work with).  I tried asking our developers for help, but (1) they're pressed for time with a new release and (2) they don't know the db tables or relationships since they don't write the SQL (it's also generated by the proprietary tool for them).

Any ideas would be greatly appreciated.  Thanks :-)

Frustrated Tester
Wednesday, December 04, 2002

Never mind the reverse engineering. Tell them that if you can't access the object model, you can't test the database. Period.

The "only a tester" attitude always gets my hackles up.

Martha
Wednesday, December 04, 2002

Have a look at
http://www.embarcadero.com/products/products.asp
I believe all of their stuff has a time limited full featured eval downloadable.

Just me (Sir to you)
Wednesday, December 04, 2002

I would try to get access to the queries being used for reporting and business logic. If you can get a hold of those, you can make the valuable connections between the tables by analyzing the way the tables are married together. Moreover, you'll have examples of what the logic looks like and thus more of the application's viewpoint.

If you can't get a hold of the SQL, try analyzing the table names. When you see names that you think may be related, dig down and analyze data similarities. Columns may not necessarily be named the same, but they WILL contain or join on the same data.

These suggestions will by no means give you a view of the database as a whole unless you pour countless hours into it, but they're probably your best bet if you can't purchase a tool. Once you have some understanding of the model, the huge printout from your analyzer will become much more familiar and manageable to you.

Sorry I couldn't be of more help.

BTW, what tool are they using for the SQL?

Dustin Alexander
Wednesday, December 04, 2002

Dustin,

Thanks for the reply.  The DBMS is MS SQLServer 2000.  I have no idea how the application's persistance layer communicates with the database, but I think it may be ODBC.  The persistance layer is about 90% dynamic SQL calls, where the SQL is generated by a proprietary "query builder" component.  The applications provide parameters to the "query builder" which creates the dynamic SQL.

Frustrated Tester
Wednesday, December 04, 2002

Check out Win Design. It automatically groups tables (with a fair bit of success).

There is a demo version to play with before you commit.

http://www.win-design.com/En-principal.htm

tapiwa
Wednesday, December 04, 2002

This won't group the tables but you can get a list of tables with something like this

select name from sysobjects where type = 'U'

tables and columns

select o.name,c.name from syscolumns  c, sysoobjects o
where o.id = c.id
  and o.type = 'U'
order by o.name, colid

data types can also be added.

save the output to a file and load in to excell

Cavet this works on Sybase and earlier versions of Sql Server but I have not worked with the latest versions of Sql Server.

John McQuilling
Wednesday, December 04, 2002

I assume that you've told your manager that certain pieces of information would vastly improve your productivity. Or, you could chat with the development engineers and ask for a whiteboard session going over the DB design.

Unless your company is disfunctional, someone in the management chain should make sure you get the information you need to do your job.

Julian
Wednesday, December 04, 2002

You can also try Sybase powerdesigner 45 day evaluation - test data generation is built-in.

You can organize your model into however many diagrams you want.

Your real nightmare is when the model lack basic ref integrity like foreign keys.

Karel
Thursday, December 05, 2002

Microsoft Visio Modeler 3.1 will reverse engineer the database to both a logical model and if you know how to use it, a conceptual ORM model.

Its a free download from Microsoft.

Visio Enterprise will do it as well, but its a lot more expensive.

Simon Lucy
Thursday, December 05, 2002

Julian: "I assume that you've told your manager that certain pieces of information would vastly improve your productivity. ....

Unless your company is disfunctional, someone in the management chain should make sure you get the information you need to do your job."

and to Martha's comment as well (thank you for it)...

meandering a bit off-point here, so it'll be short, but this sort of stuff happens a lot to QA/QC folks. Expected to test mysterious systems nobody has documented; or that only the developers know the designs and requirements for, and because they're all-holy "developers", can't be 'bothered' to provide us low-lifes with basic info we need to do our jobs. And "dysfunctional companies"... you'd be surprised how many there are out there (well, maybe you wouldn't be), that will spend thousands at the drop of a hat on development tools, but nary a dime on tools for testing or QA/QC. I've had people look me straight in the face and ask me 'what do you need to see the database for?' (to tell whether your 'effing code works or not, you freakin' idiot! that's why!!). Nobody balks at making sure developers get to conferences and training, but the QA/QC folks are lucky to be able to get home at night, much less sent to any professional development training. Damn but it gets frustrating sometimes!

I'll tell the audience of developers on this group one thing -- if you all as a team of developers are having a meeting where you're discussing the externally observable attributes and behaviors of the system, especially if as a result you may change them, the second thought you need to have is "wait a sec, let's get Joe[Sue] from QA in on this so they'll know what the hell's going on." Depending on the situation, people and your shop, include them in discussions of lower level details as well.

ok. gonna take the rest of my medication now and calm back down...<g>

anonQAguy
Thursday, December 05, 2002

Sounds like quite a fun job you have ahead of you.

I am a PM on a medium-sized (approx. 12 man year) project and personally, I think it boils down to:

1.  You can't test the behavior (from the end-user's perspective) without the requirements.
2.  You can't test the low-level behavior ("under the covers") without the detailed design documents.

But I too, digress. This doesn't solve your problem of loading up the database with sample data. For that I second the "grab someone and get to the whiteboard" idea.

You need to find the "holier than thou" person who actually knows what's going on behind-the-scenes, stick them in a room with you, and have them explain where the data comes from for each piece of functionality you're supposed to be load-testing.

Dave
Thursday, December 05, 2002

Can we get an AMEN!

I'm a developer, and I happily admit that my productivity and the quality of my systems greatly improves when I'm working with a good QA team. Give the QA people everything they need - they'll take care of you!

Chris Tavares
Thursday, December 05, 2002

Frustrated --

though this doesn't help much for the reengineering portion of your problem, and these days companies have less money for equipment, but if you guys can get hold of an e-size plotter, it sure makes outputting the db when you do have a model you can access a lot better.

we have two big HP plotters at the office, we use ErWin to print from, and are very grateful to have them.

something to put on the company wish-list, I'd recommend, anyway.

best of luck,

cheers,

anonQAguy
Thursday, December 05, 2002

oops:  s/reengineering/reverse engineering/
-damn.

anonQAguy
Thursday, December 05, 2002

"...this sort of stuff happens a lot to QA/QC folks. Expected to test mysterious systems nobody has documented; or that only the developers know the designs and requirements for..." Yup, don't I know it. When I first started testing at my company, I described my job as "testing unfinished software of an unknown version and undefined specifications on nonexistent test computers -- oh, and can I have that by yesterday?"

I've since passed on the testing job to someone else and gone back to database land, but I still get annoyed when engineers expect testers to be mind readers.

Martha
Thursday, December 05, 2002

tapiwa:  Thanks for the link to Win'Design.  It did a nice job and reverse engineering the database and partitioning it into more managable chunks.  Now, to convice management to pony up the money for a license.

anonQAguy:  We also have an HP plotter, which I started to use, but it didn't really help (diagram looked like a bunch of flies caught in a spider web).

All:  Thanks for all your input and suggestions, I really appreciate it.  Hopefully, I'll be able to complete this assignment and make my management happy.  If not, anyone instested in hiring a performance engineer with 8 yrs experience coding and 4 yrs experience performance testing?

Frustrated Tester
Friday, December 06, 2002

I am always amazed when I walk into a company that has a fairly complex system, and are using SQL server.

The tables and ER diagrams should have been already laid out. If you have to only put a enough tables on one page, and then space things down, so the next set is on the next page…so be it. In most cases, the grouping of tables can be placed on the same page.

Here is a example of mine, and note how a “bunch” of tables are moved to the 2nd page (well, when it prints it uses two pages). The following is from ms-access. You can see in the following diagram that the 2nd page is near full, and thus I would start a 3rd page.

That link:
http://www.attcanada.net/~kallal.msn/Articles/PickSql/Appendex2.html

The above was designed using Visio, and that diagram can be found at:
  http://www.attcanada.net/~kallal.msn/Articles/PickSql/Appendex1.html

The idea that the basic designs and layout of the tables is not clearly, and readily available for you is a TOTAL shame.

When you ask any developer what they need to work on a data centric project, the first thing the developer will ask:

    Show me the data!

Thus, does an developer prefer:

    1) rotten ugly hard to read code, and good data structures?
or
    2) real nice code, and horrible data structures with no docs?

99 out of a hundred times, a developer will pick the rotten code as it is MUCH BETTER (hence, most will prefer option #1).  With a good data design and docs, the code practicality writes it self. You cannot write code, or modify ANY system without a working knowledge of the data structures. It is always a question of “show me the data”.

If those folks place data on SQL server, then the diagramming tools and ER stuff should be used that SQL server offers. Are not the options to enforce RI being used here? Any time, and any day ANY ONE who needs a table layout diagram should get one. These things need to be given out like free pencils. It is top dog as far as docs go.

The data structures and good quality data layouts is near the top when I evaluate a company, and how well the IT department is being run.

I recently on one of my many overseas trips had the chance to evaluate a company’s system. I was shocked, as they had been developing software with VB and SQL server for several years now (they tended to only have 2 developers on staff). I asked the developers for some of the table diagrams from SQL server. The developers response was that we do not use those tools! In fact, they could not provide me with any diagrams!!!

Needless to say. I was shocked. On the other hand, that developer does not work there anymore.

Tons of business rules and table enforcement stuff was actually being done in code. What a complete mess!. The code actually looked very nice, but trying to figure out how to add, or modify the system was impossible! You could not even tell if it was safe to delete a record, and have orphaned records all over the place. Even worse was you could NOT figure out how to even add a single record! ( since other tables * might * be involved). Tons of table RI stuff was being done in code, when the server should be doing this stuff.

To not use some good quality ER and diagramming tools when it comes to building a database is simply nuts. You need quality documents as to the table layouts.  Most quality databases allow both the diagramming and table enforcement rules to be made at the same time. This is ideal, as it solves many problems, and moving rules out of code is certainly one of them. At the same time, you get a quality document (diagram) for all to use. Years ago, diagramming tools for database stuff was big bucks, and only a select few companies had this ability. Now it is standard fair (even desk stop databases like ms-access have this ability.

* Now lets all be honest here *:

    During the development of a product all kinds of documents RAPIDLY get out of date. Much of the functional document/specs for software is OFTEN not updated to reflect the software system (even modifications to screens generally does not make it back to the functional spec, or the original screen layout doc). This is a fact of life for most companies. In other words,  when the application is live, then little effort is made to update the specs, be they functional specs (which is used to write user manuals etc), or technical. The updating of technical specs rarely happens when everything is running.

Of course the above process occurs since the software its self does in fact become the design document. If you need to learn something about the product, you will go to the software source code. Further, many companies are not near lucky enough to even have technical specs! So, I am NOT going to shoot down a company just because they don’t have great documentation. This is just a fact of life, and everything is not always ideal dispuite our best wishes.

The reality here is that the software it self in fact becomes the design document. After all, that is what software is: a design document for the computer to convert it into executable code.

Ok, so I think most would agree in that in the real word, much of the design documents don’t get maintained when the system is in use. This is especially the case for in-house developed software systems.

However, the BUCK STOPS at table designs! There is where I draw the line!! It is also where companies must draw the battle lines!

    Table designs and good diagrams must be maintained somewhere!

If you force the design team to use a existing set of tools (such as the enterprise tools for sql-server), then you ALWAYS should/must have an up to date data diagram (in fact, by the mere fact of using these tools, you will ALWAYS have a good ER diagram).. This is critical, since most developers (and users) will NEVER go back to the original design documents (they will collect dust).

However, on a REGULAR bases, the table documents WILL BE USED!! Diagrams will be used when you need to find field names for a query. They will be used when you simply need to know what field is to be used for a join. They will be used when you have to write a small piece of code to update a file, and you are not sure what other tables might be involved (if you don’t know this, then you can’t write the code).

The table diagrams will also be needed every time you bring in a new developer. If the developer is experienced, learning the code will occur quite rapidly, but  the table layout is going to be #1, and worth it’s weight in gold.

Simply put, you can’t write software if you don’t know the table layout.

In addition, the lack of nice clear table layouts will cost the company a bundle as programmers and DBA’s try to use the data. They will stumble in the dark, and waste huge amounts of time as they try and figure out how the data is laid out.

This thread, and the need to understand the table layout is a perfect example of this fact.

Anyway....don’t let me evaluate your software system, and not find any table layouts!!. If no table designs are available, you will get very poor marks indeed. Developers will in fact get fired over this one!!!

Albert D. Kallal
Edmonton, Alberta Canada
Kallal@msn.com

Albert D. Kallal
Saturday, December 07, 2002

I would also second the notion of checking out Embarcadero's ERStudio.  This tool actually throws out a half decent ERD, which can be tweaked in a few minutes.  I would avoid ERWin, for many reasons, but mainly because it just pukes all the tables out in a pile for your ERD.  You then have to have someone manually spread them out and arrange connecting lines.  Yuck!

Also, TOAD generates pretty good scripts for populating and/or modifying data in the tables. 

We used these two tools, along with Rational Rose, and had fair success in the reverse engineering process. The only caveat I will add is that we used Oracle, as opposed to your  MS SQLServer.

Good Luck!

Drew Lee
Thursday, August 19, 2004

*  Recent Topics

*  Fog Creek Home