Fog Creek Software
Discussion Board




Database Version Control++

(Note this general topic has been discussed before:
http://discuss.fogcreek.com/joelonsoftware/default.asp?cmd=show&ixPost=18896&ixReplies=5
http://discuss.fogcreek.com/joelonsoftware/default.asp?cmd=show&ixPost=26754&ixReplies=4
but I want to bring a new look to it)

How do you version your database?  A project I was on a while back (mod_perl && mysql) used to dump the whole thing to one big .sql file (using mysqldump) with every change and check that into the source repository.  A process that I thought was error prone and cumbersome.

Lately the project I'm working on (Oracle, Stored Procs, Oracle Forms, Perl, Excel VB) has everything versioned except the database schema.  I version the procedures, triggers, and functions by saving all changes to flat files, and checking those into the CVS repository.  I keep regular backups of the database, though it has no real versioning.

What I would like, is the ability to use CVS commands on my database schema.  Tell me the changes in this table between 10/10/2002 and 10/10/2003.  And do it in ALTER TABLE syntax as well, no CREATE TABLE diff's for me.  Show me the differences in my version of a script with the CVS version.  What roles have been added to the schema in the past week?  Tag the current schema as version-3_4. etc. etc.

Does anyone know of any tools that do this already?  I'm working on my Masters (in Computer Science) and I was thinking of writing a few papers on this subject, and maybe implementing a working prototype using bunches of stored procedures in Oracle, or maybe building it into PgSQL.  Who knows how far this could go.  I've already got a few ideas for how I would implement this.  I'd probably tie it into CVS, somehow.

I saw in one of those other threads that Visual Studio.NET does this a little already.  Is it useful?  Does it go far enough?

Andrew Hurst
Friday, May 30, 2003

In some projects, the entire database is re-created at each build from SQL that's kept in CVS like ordinary code.

This guarantees that you always have code that will create a working database when the software is installed on the production system.

Anonymous
Friday, May 30, 2003

Andrew,
VS.NET and the database tools that are built into SQL Server Enterprise Manager (basically the same thing) will allow you to save a change script when you make a schema change. In VS.NET they only work with SQL Server and Oracle.

A tool our DBA loves is the SQL Compare bundle from Red Gate: http://www.red-gate.com/sql/ -- allows you to do a compare on db's schema and data if you wish.

Duncan Smart
Friday, May 30, 2003

... oh, but they're SQL Server only too.

Duncan Smart
Friday, May 30, 2003

This depends on whether you consider the "master source" of the schema to be the database or the script that creates the database. The approach that we have taken is to have a database script that we keep in SourceSafe that we modify when changes are needed and use the script to re-create (or update) the database - the actual database is the result, not the source of changes. All changes to the script are versioned and checked in to SourceSafe.

I took this approach a little further and developed a database script that can create a new database or update an existing database (preserving the data) from a single script. This is done by deleting all the keys, indexes, etc; creating a new temporary table for each table; transferring the data from the old table if one exists and then deleting the old table; renaming the temporary table; at the end creating all the indexes, foreign keys, etc. Most of this work is done with a collection of stored procedures. This allows for changes to the schema, keys, triggers, SPs, etc, and ensures that all databases are identical if the latest version of the script has been run; there is no dependency on incremental conversion scripts.

We also have a version table in the database schema and the script includes an INSERT statement into this table that is built using SourceSafe keywords so that the revision number and revision date of the script are inserted into the version table.

Philip Dickerson
Friday, May 30, 2003

Second vote for the Redgate product, although it basically generates script against the Source and Target DBs and does a 'diff' on them.

The problem with this is that you actually need to know if the table structure is different, not if the script is different.
Eg. If I notice a new column has not been added to my local working DB and add it, unless it is in the exact same place as the master DB, the SQLCompare will flag the tables as being different.

Nit picking, I know but its a pain if you have a hundred tables and 99 of them are "different". On the other hand, it does generate 'Alter' scripts and data migration for you.

I can think of a lot worse ways to spend the money.

Slightly off topic: Have a look at Martin Fowler's Evolutionary Database work at http://martinfowler.com/articles/evodb.html

Justin
Friday, May 30, 2003

PS We have a similar system to Phillip's.

Justin
Friday, May 30, 2003

I'm pretty sure the Perl module Alzabo [1] can do this for you. It's a big project and does more than data modelling, but it's fairly mature and well-supported. (I think Dave recently mentioned he's been working on it for 3.5 years) It's also on CPAN.

[1] http://www.alzabo.org/

Chris Winters
Friday, May 30, 2003

What is error prone and cumbersome about doing a mysqldump and checking that into CVS?  Especially if it's completely automated (I can think of no reason why it wouldn't be)?

It seems like a simple, reasonable solution to me.

Brent P. Newhall
Friday, May 30, 2003

The reason why I call mysqldump error prone and cumbersome, was because of my work on the Scoop project (http://scoop.kuro5hin.org/).  We had a starter database that was available in CVS.  It had the standard schema, but it also had a few default values in it already, and a story there as well so you would at least have a basic working system when you installed it.  The problem was our development databases generally had much more data in them then the version we liked to keep in CVS.  So we'd need to mysqldump them, to get the latest schema changes, but then also find all of the new insert statements for the default values of the fields we added and add those to a patch file (people needed to be able to upgrade easily between the different versions) and then apply that patch file to the main database sql file, and then check in that file.  That process was error prone and cumbersome, not necessarily yours :)

It could have benefitted from some automation, sure.

But pretty much any database versioning system that stores the whole database creation scripts as one big file won't do what I need.  How do you reasonably get a diff between just one table and its CVS version?

The approach I would like to have seems to most closely mirror Philip's approach (if it was all in separate files) (I haven't checked out the products mentioned above yet).  The final result of database versioning like I would like to have, would be an easy way to get all the changes from the last major revision into a path file, to apply to the production databases.

Lastly the inspiration for this thread this time around is that I've been programming my stored procedures in TOAD lately.  I'd like to be able to see the changes I've made from the CVS version easily.  Same goes for my views, tables, roles, grants, synonyms, etc etc.  With CVS integration into the database.  It could even just dump the database schema to separate files and check them into CVS.  I just want an integrated way to see their changes.

Rember these are all just programs, they should be able to keep us from doing any of the busy/repetetive work :)

Andrew Hurst
Friday, May 30, 2003

I'm in the midst of building the database layer for a new application. The way I'm planning to version control my schema is by building an XML file that describes the schema then version control that XML file.

Two of the attributes in my XML schema description are "birth version" and "death version".

A series of trivial little utilities will use this XML to dynmically created the SQL used to (a) create the database schema; and (b) migrate a database from schema version x to schema version y.

One nice side effect of this approach is that support for multiple SQL server flavors is trivial: a command line switch to the utility that builds the SQL DDL from the XML and you can generate schema scripts for Oracle, postgresql, etc.

Andrew Lighten
Friday, May 30, 2003

I know there are many solutions out there but this is the solution I devised for our enterprise application. We have a huge number of tables and millions of records and regenerating them every day makes no sense. Also we have many beta testers and custom solution groups who need to apply the updates rather often.

We have a master database where we apply the changes after they have been thoroughly tested by developers. We built a conversion application which generates an XML file after reading the master DB. The XML file is checked in with the code. The XML contains the proper DB schema, static records (i.e. various applications settings that must be stored in the DB) and C# scripts compiled on the fly for various dynamic changes (i.e. checking the machine at run time and doing maintenance work).

Developers run the conversion application and the XML file every morning to get up to date. The conversion application makes a diff between the XML and the target database, generates and runs the proper SQL statements for all backends (Oracle, MSSql, etc) our product supports.

There are many advantages with this approach:
- It generates only the required SQL statements, for instance it won't drop a FK unless an included column has been changed. This saves hours of conversion time on enterprise DBs with millions of records

- It is re-entrant. If for some reason the conversion fails, a new diff is generated and the conversion starts over where it left. This can save days of restoring back-ups on a huge DB.

- We use the same application to update our clients and our internal databases. Each installation of our product comes with this XML and the conversion application.

- The SQL generation layer is backend specific. It knows Oracle supports a 'disable RI' statement and uses it when appropriate, instead to drop the RI.  Or that Oracle is case sensitive unlike MSSQL.

The April or May MSDN magazine had an article describing a similar, albeit less powerful technology. MSSQL2000 has the ability to generate the DB schema as XML.

coresi

coresi
Saturday, May 31, 2003

*  Recent Topics

*  Fog Creek Home