Fog Creek Software
Discussion Board




Data generation

On a previous thread I asked for recommendations for data generators: software that enables the quick generation of large amounts of various types of data.

I looked at the suggested ones, and others, but none of the sub-$400 apps could really do what I wanted. For example, I have a sales-tracking table consisting of three columns. The first is just a key field, the second a date-of-sale field, and the third is the sales amount. I'd like to generate data (quickly and easily) where the dates progress sequentially between a start and end date, but are concentrated more towards the end date. As well, I'd like the sales figures to increase slightly exponentially, indicating increasing sales volumes. I can also sum different inputs, so that an integer field could consist of an exponential data stream plus a small, low-amplitude, randomly generated stream for noise.
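To make the idea concrete, here is a minimal sketch of that kind of generator. It is in Python rather than VB/ADO, it builds rows in memory instead of writing to a database, and every name in it (`generate_sales`, `skew`, `growth`, `noise`, `base`) is made up for illustration. Raising a uniform random number to the power `1 / skew` is just one simple way to crowd dates towards the end of the range; other distributions would work too.

```python
import math
import random
from datetime import date, timedelta


def generate_sales(start, end, rows, skew=2.0, growth=1.0,
                   noise=5.0, base=50.0, seed=0):
    """Return a list of (key, sale_date, amount) tuples ordered by date.

    All parameter names are illustrative assumptions, not an existing API.
    """
    rng = random.Random(seed)  # fixed seed, so the same call gives the same rows
    span_days = (end - start).days

    # u ** (1 / skew) with skew > 1 pushes uniform samples towards 1.0,
    # so more of the dates land near the end of the range.
    fractions = sorted(rng.random() ** (1.0 / skew) for _ in range(rows))

    result = []
    for key, frac in enumerate(fractions, start=1):
        sale_date = start + timedelta(days=round(frac * span_days))
        # Sum of two streams: an exponential trend plus low-amplitude noise.
        trend = base * math.exp(growth * frac)
        amount = round(max(0.0, trend + rng.uniform(-noise, noise)), 2)
        result.append((key, sale_date, amount))
    return result


if __name__ == "__main__":
    for row in generate_sales(date(2002, 4, 1), date(2004, 3, 26), rows=10):
        print(row)
```

With `skew=2.0`, roughly three quarters of the dates should fall in the second half of the range, and `growth=1.0` makes the trend at the end date about 2.7 times the trend at the start date.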

I've written a VB app using ADO that can do these sorts of tricks, plus generate random "greeked" text for character-based fields, and more typical features such as names, addresses, emails, phone numbers, etc.

It's pretty rough around the edges, and has some limitations such as having my database username & password hardwired in, but it does what I need. I'm contemplating enhancements such as cleaning up the underlying script syntax so that users could track changes with their existing CVS/RCS/etc. infrastructure. Nothing worse than changing branches and not having your data move along with you ;-)

This has just been to scratch my own itch, and I'm not a DBA, but I've been sounding out ex-colleagues to gauge interest in this. I've used Matlab in the past to do some of this work, but that's quite a learning curve for something so straightforward.

Does anyone have experience with this? I'd be interested in hearing all comments, including suggestions for features, or negative experiences from the past. I also check my email frequently if you prefer.

Again, this wasn't started with the goal of selling software, but I'm open to ideas.

Nigel
Friday, March 26, 2004

I think a convenient random data generator is always useful, especially as unit testing gets more and more popular (since "test first" implies no real data available yet).

Joe Hendricks
Friday, March 26, 2004

That's been my line of thinking, Joe. Talking to some of the bigger ERP/MRP shops, they seem to have gigabytes of this data lying around, and generating more is the least of their worries ;-)

My background is in smaller shrink-wrap companies, and that's what I'm focusing on here. In my own experience, test data gets corrupted during ad hoc "well, how about this" testing, and unit testing becomes unreliable at best.

I'd like to have things in a position where all the data is generated from a script, so that testing can be done across branches.
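A rough sketch of what "generated from a script" might look like: a small text spec that can be checked in next to the code, plus a seeded generator, so that every checkout of a branch rebuilds exactly the same rows. This is Python, not the VB app's actual script syntax; the JSON layout, the column types, and the `generate` function are all hypothetical.

```python
import json
import random

# Hypothetical spec. In practice this text would live in a file under
# version control so that it moves along with the branch.
SPEC_TEXT = """
{
  "seed": 42,
  "rows": 5,
  "columns": {
    "id": {"type": "sequence", "start": 1},
    "amount": {"type": "uniform", "low": 10.0, "high": 100.0}
  }
}
"""


def generate(spec_text):
    """Build a list of row dicts from a JSON spec (format is an assumption)."""
    spec = json.loads(spec_text)
    rng = random.Random(spec["seed"])  # seeded: same spec gives the same data
    rows = []
    for i in range(spec["rows"]):
        row = {}
        for name, col in spec["columns"].items():
            if col["type"] == "sequence":
                row[name] = col["start"] + i
            elif col["type"] == "uniform":
                row[name] = round(rng.uniform(col["low"], col["high"]), 2)
            else:
                raise ValueError(f"unknown column type: {col['type']}")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in generate(SPEC_TEXT):
        print(row)
```

Because the seed is part of the spec, test data that gets mangled during ad hoc testing can simply be thrown away and regenerated.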

One other potential use is in contracted work involving reports. I have a small contract coming up where I'd much rather present sample reports showing a solid two years' worth of increasing sales data, rather than one daily sale of $50.

Nigel
Friday, March 26, 2004

Nigel, would you post a link or two to generator software you found the most capable?

Egor Shipovalov
Friday, March 26, 2004

Egor,

There's www.testdata.com - $10 minimum per data set, does standard addresses, cities, etc., with simple numerical series as well.

www.upscene.com has a shrink-wrapped product that does something similar (Advanced Data Generator). 150 euros per, if memory serves.

Dbi-Script is another one, from sendorosoftware.com. $20 now. Haven't tried it much.

Let me know how those suit you, if you can.

Thanks,

Nigel
Friday, March 26, 2004
