Fog Creek Software
Discussion Board




Unit Testing Database Applications

Hi,

The more work that goes into the current project I am working on, the more convinced I am that we need to write unit tests to make sure we're on track.

I've looked over some of the unit testing literature and it certainly makes sense to me. However, the application we are developing is largely data-driven and I'm a little unsure how to implement unit tests; e.g. if the method "list all tasks" extracts a list of tasks from a database, then there is a very high chance that the result of this method will differ between calls, even when the same arguments are used.

Given this, how can I construct meaningful tests and what should I be focusing on?

Your thoughts most definitely welcome.

Walter Rumsby
Sunday, July 07, 2002

Are you able to hook your unit tests to a database with dummy data?  If not, writing unit tests will force you to refactor your code so there's a separation from your live database.  You can then test multiple DBs with corner cases.
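For example (a sketch in Python against an in-memory SQLite database; the table and function names are invented):

```python
import sqlite3

def list_all_tasks(conn):
    # Hypothetical "list all tasks" method from the original question,
    # written against whatever connection it is handed.
    return [row[0] for row in
            conn.execute("SELECT name FROM tasks ORDER BY name")]

def make_test_db():
    # An in-memory database seeded with known dummy data, so the test
    # result never depends on the live database's contents.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (name TEXT)")
    conn.executemany("INSERT INTO tasks VALUES (?)",
                     [("write spec",), ("fix bug 42",)])
    return conn

def test_list_all_tasks():
    conn = make_test_db()
    assert list_all_tasks(conn) == ["fix bug 42", "write spec"]

test_list_all_tasks()
```

Because the test seeds its own data, "list all tasks" returns the same result on every run.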

anon
Sunday, July 07, 2002

We have a special database for unit tests. Each test is expected not to corrupt the data in it. But even if it does, we have an original copy to restore from.
As many sources have pointed out, if you have problems with unit testing, that's an indicator of poor code structure. Ideally you should have a generic database abstraction layer, and all your application's classes should be tested assuming this DB layer works. I mean you would create stubs for the DB classes that return the data you need.
But (as in our current case), usually you already have a system on hand to which you have to add unit tests. Since you can't really afford a total redesign to make testing simpler, you have to live with not-so-convenient testing (like restoring the database from backup before your tests).
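A stub DB layer along those lines might look like this (Python sketch; the class and method names are made up):

```python
class TaskStore:
    # Hypothetical generic database abstraction layer the app talks to.
    def tasks_for(self, user):
        raise NotImplementedError

class StubTaskStore(TaskStore):
    # Stub DB class that returns canned data, so the application's
    # classes can be tested assuming the real DB layer works.
    def __init__(self, canned):
        self.canned = canned

    def tasks_for(self, user):
        return self.canned.get(user, [])

def overdue_report(store, user):
    # Application-level code under test; it never touches a real database.
    return [t for t in store.tasks_for(user) if t["overdue"]]

store = StubTaskStore({"walter": [{"name": "spec", "overdue": True},
                                  {"name": "demo", "overdue": False}]})
assert overdue_report(store, "walter") == [{"name": "spec", "overdue": True}]
```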

Roman Eremin
Monday, July 08, 2002

The easiest way is to start from a blank database and make sure all the tests give useful errors, then add known data and run more tests.
Assuming you're running on a commercial database, loss of data and large datasets shouldn't need to be covered by your tests (although it might be a good plan to do some performance testing with large datasets anyway).
Harder things to test for are multiple users updating data at the same time, and making sure your transactions can abort partway through. It may also be worth adding some "robotic" users to check that everything works right.
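The blank-then-known-data pattern might be sketched like this in Python with SQLite (the table is invented):

```python
import sqlite3

def blank_db():
    # Every test starts from a blank schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE widgets (name TEXT, stock INTEGER)")
    return conn

def test_blank_db_gives_useful_answer():
    conn = blank_db()
    # An empty database should produce an empty result, not an error.
    assert conn.execute("SELECT * FROM widgets").fetchall() == []

def test_known_data():
    conn = blank_db()
    conn.execute("INSERT INTO widgets VALUES ('sprocket', 3)")
    (stock,) = conn.execute(
        "SELECT stock FROM widgets WHERE name = 'sprocket'").fetchone()
    assert stock == 3

test_blank_db_gives_useful_answer()
test_known_data()
```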

Peter Ibbotson
Monday, July 08, 2002

If you are doing "real time" testing such as displaying the current users, it is difficult to repeat a test.  I would suggest creating a test module that can be used to replace the real-time code. Use this module for regression testing, and verify the real code manually to make sure that it returns the correct data.

Another approach is that some testing tools will simulate user connections to the database, so these could be used to create a repeatable test.
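One way to sketch that replacement module in Python (the user names and functions are hypothetical):

```python
import random

def current_users_live():
    # Stand-in for the hard-to-repeat real-time query; in the real
    # application this would hit the live database.
    return random.sample(["ann", "bob", "cy", "dee"], k=random.randint(0, 4))

def current_users_fake():
    # The replacement test module: always returns the same answer,
    # so the regression test is repeatable.
    return ["ann", "bob"]

def format_user_list(get_users):
    # Code under test takes its data source as a parameter, so the
    # fake can be swapped in without touching the display logic.
    return ", ".join(sorted(get_users()))

# Regression test against the fake; the live version is verified manually.
assert format_user_list(current_users_fake) == "ann, bob"
```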

John McQuilling
Monday, July 08, 2002

Unit testing and Revision control of Database code is such a neglected topic.  My observations:

1)  Be able to easily reinitialize your database.
2)  Establish sample data for your database tests.
3)  Start off blank, and build up the entries in the DB.
4)  You may have a little bleed-over from test to test, as reinitializing the DB between each unit test may be too expensive.
5)  Separate your schemas so that you can test reference data (the list of US states, enumerated string values that don't change) separately from the data you are capturing.
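Points 1, 3 and 5 might be sketched like this in Python with SQLite (the reference and captured tables are invented):

```python
import sqlite3

REFERENCE_SQL = """
CREATE TABLE states (code TEXT PRIMARY KEY, name TEXT);
INSERT INTO states VALUES ('NY', 'New York'), ('CA', 'California');
"""

CAPTURED_SQL = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, state_code TEXT);
"""

def reinitialize(conn):
    # Rebuild only the captured-data tables between tests; the
    # unchanging reference data is loaded once and left alone.
    conn.executescript("DROP TABLE IF EXISTS orders;" + CAPTURED_SQL)

conn = sqlite3.connect(":memory:")
conn.executescript(REFERENCE_SQL + CAPTURED_SQL)
conn.execute("INSERT INTO orders (state_code) VALUES ('NY')")

reinitialize(conn)
assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 0
assert conn.execute("SELECT COUNT(*) FROM states").fetchone()[0] == 2
```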

adam
Tuesday, July 09, 2002

Why not start with a full-size database and use an outer transaction that is never committed? This way you will have realistic execution times, the same data every time, and no need to copy an original data set.
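In Python with sqlite3, a rolled-back transaction looks roughly like this (table and values invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('walter', 100)")
conn.commit()

# The "test" runs inside a transaction that is never committed, so the
# full-size data set looks the same to every test run.
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'walter'")
(balance,) = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'walter'").fetchone()
assert balance == 70      # the update is visible inside the transaction

conn.rollback()
(balance,) = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'walter'").fetchone()
assert balance == 100     # and gone again after the rollback
```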

Nordic Light
Tuesday, July 09, 2002

Nordic... and no way to validate the correctness of your inserts, updates and deletes.

Joe AA.
Wednesday, July 10, 2002

Anyone got any good ideas for unit testing databases with respect to multiple users trying to do the same thing at the same time? i.e. two workstations both trying to sell the last widget in stock. *
The only way I can think of doing this is to add special-case test code that "waits" at various points in a transaction,
but this always feels like a good way to add obscure bugs to your code (i.e. due to a programmer error the code ships with debug mode enabled, and some bit of test code is left in the main code).

* if you're using optimistic rather than pessimistic locking

Peter Ibbotson
Wednesday, July 10, 2002

> Anyone got any good ideas for unit testing databases with respect to multiple users trying to do the same thing at the same time? i.e. two workstations both trying to sell the last widget in stock.

I think that reliability of real-time software must happen at design time, not at testing time: for example by having a database abstraction layer you can control your app's use of the database by controlling your abstraction layer.

> The only way I can think of doing this is to add special-case test code that "waits" at various points in a transaction, but this always feels like a good way to add obscure bugs to your code. (i.e. due to a programmer error the code ships with debug mode enabled, and some bit of test code is left in the main code)

Perhaps you're confusing unit tests with system tests; and shipping the wrong version wouldn't just be a "programmer error" ...

I always like ASSERTs, especially in release builds that I'm maintaining (as well as during development): more money on design, less on testing. For example, if two tables must always be accessed (locked) in the same sequence to avoid a deadlock, then enforce that at design time (in the abstraction layer) or look for a way to ASSERT it at run-time.
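A run-time lock-order ASSERT might be sketched like this in Python (the table names and their agreed sequence are, of course, invented):

```python
import threading

# Hypothetical fixed sequence in which tables must be locked.
LOCK_ORDER = {"customers": 0, "orders": 1}
_held = threading.local()

def acquire(table):
    # Run-time ASSERT of the design rule: tables must always be locked
    # in the agreed order, which rules out lock-order deadlocks.
    held = getattr(_held, "tables", [])
    assert all(LOCK_ORDER[t] < LOCK_ORDER[table] for t in held), \
        "lock-order violation: %s after %s" % (table, held)
    _held.tables = held + [table]

def release_all():
    _held.tables = []

acquire("customers")
acquire("orders")        # fine: follows the agreed order
release_all()

try:
    acquire("orders")
    acquire("customers")  # violates the order: the ASSERT fires
    violated = False
except AssertionError:
    violated = True
assert violated
```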

Christopher Wells
Wednesday, July 10, 2002

[quote]
I always like ASSERTs, especially in release builds that I'm maintaining (as well as during development): more money on design, less on testing. For example, if two tables must always be accessed (locked) in the same sequence to avoid a deadlock, then enforce that at design time (in the abstraction layer) or look for a way to ASSERT it at run-time.
[/quote]
Err... I'm talking about testing it in the abstraction layer (I think; my terminology here could get weak). In particular, what I'm interested in is testing "business layer" objects and making sure they don't hang or screw up over locking-type issues. Since most locking bugs require two processes to show themselves, I'm interested in what other folks do to stop this happening.
My background in testing comes from chip design, where asynchronous events are a large worry and much formal effort goes into sorting these kinds of problems out.
Regression testing for obscure bugs is a very important area, and in the case of locking a somewhat difficult one.
I suppose the kind of scenario I'm worrying about is one where a multi-line order is shipped and the customer's account balance is then updated. If two orders for the same product are posted at the same time, with only enough stock available to fill one of them, making sure that:

a) Only one order gets the product.
b) The total order value is correct for both orders before being added on to the customer's account balance.

This should be trivial, but for testing purposes you'd want to make sure it is done correctly: that both bits of code are contending for the same resource(s) at the same time, and that both recover and retry correctly.
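A crude two-thread version of the last-widget test might look like this in Python (SQLite stands in for the real database, and a process-wide lock stands in for the real locking strategy):

```python
import sqlite3
import threading

db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE stock (widget TEXT PRIMARY KEY, qty INTEGER)")
db.execute("INSERT INTO stock VALUES ('widget', 1)")
db.commit()

lock = threading.Lock()   # stand-in for the application's locking
results = []

def sell_one():
    # Each "workstation" tries to sell the last widget; the guarded
    # read-check-decrement is the critical section under test.
    with lock:
        (qty,) = db.execute(
            "SELECT qty FROM stock WHERE widget = 'widget'").fetchone()
        if qty > 0:
            db.execute(
                "UPDATE stock SET qty = qty - 1 WHERE widget = 'widget'")
            db.commit()
            results.append("sold")
        else:
            results.append("out of stock")

workstations = [threading.Thread(target=sell_one) for _ in range(2)]
for t in workstations:
    t.start()
for t in workstations:
    t.join()

# a) only one order gets the product
assert sorted(results) == ["out of stock", "sold"]
```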

Peter Ibbotson
Thursday, July 11, 2002

>> [quote] I always like ASSERTs ...
> ... testing it in the abstraction layer ... most locking bugs require two processes ...

For example I wrote an abstraction layer for critical sections in a multi-threaded process (similar to the ACE library), which is analogous. It includes ASSERTs (warnings) to detect threads' acquiring critical sections in dangerous sequences, and it detects deadlocks if they occur: can you do something analogous for a database layer?

Also, I have one process which accesses the database; all clients reach the database via network connections through that one multi-threaded process ... so I have only one instance/process of the DB abstraction layer, which makes it easier to instrument and control that layer (it's possible for me to use critical sections to check in real time whether threads are interfering with each other) ... and possible to implement my own caching of data in the abstraction layer, by the way. This is an instance of a more general strategy: gain control (of the system) by restricting the functionality (available to clients); another example might be to persuade clients to access the database only through a limited set of defined stored procedures.

Perhaps you can "unit-test" using a debugger, i.e. single-step through the two processes to test the example you gave (I avoid writing test-only code).

Christopher Wells
Friday, July 12, 2002
