Fog Creek Software
Discussion Board




Random numbers in a fixed distribution?

I need to generate a bunch of random data for a SQL Server database, but the data needs to be in a certain distribution (in one column I need mostly 0's, but some 1's and a rare 2; in another I need 0-23 in a bell curvish distribution)

To make it really tricky, I need to generate the rows in a fixed manner (daily data points for each piece of equipment), but it could be done in two steps (generate the rows, then generate the data).

Any ideas where to start?

Philo

Philo
Sunday, June 13, 2004

Well, you could use a random number generator and check a range.  For instance for your first example, you could generate numbers between 1-100.  Any time you get something back between 1-95, that's a 0, 96-99 is a 1, and 100 gives you a two.

For the second one, something similar but your ranges should follow a curve.  For example, let's use a smaller example and say you need a bell curve for 1-3.  Well then 1-25 would be a 1, 26-75 would be a 2, 76-100 would be a 3.
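
A minimal Python sketch of the range-check idea, in case it helps. The function name pick_by_range and the two tables are made up for illustration; the percentages are the ones from the examples above.

```python
import random

def pick_by_range(ranges, rng=random):
    """Roll 1-100 and return the value whose range contains the roll.

    `ranges` is a list of (upper_bound, value) pairs with ascending
    upper bounds; the last upper bound must be 100.
    """
    roll = rng.randint(1, 100)  # inclusive on both ends
    for upper, value in ranges:
        if roll <= upper:
            return value
    raise ValueError("ranges do not cover 1-100")

# 1-95 -> 0, 96-99 -> 1, 100 -> 2
MOSTLY_ZERO = [(95, 0), (99, 1), (100, 2)]

# Small bell-ish curve over 1-3: 25% / 50% / 25%
SMALL_BELL = [(25, 1), (75, 2), (100, 3)]

if __name__ == "__main__":
    print([pick_by_range(MOSTLY_ZERO) for _ in range(20)])
    print([pick_by_range(SMALL_BELL) for _ in range(20)])
```

For the 0-23 column the table would simply have 24 entries, with wider ranges near the middle.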

Oren Miller
Sunday, June 13, 2004

If you don't mind doing it in Python the RandomArray module should do it for you.

http://doc.astro-wise.org/RandomArray.html
http://www.onlamp.com/pub/a/python/2000/08/09/numerically.html?page=2

TomH
Sunday, June 13, 2004

That's perfect, Oren - thanks!

Philo

Philo
Sunday, June 13, 2004

This question pops up every now and then. Here's what I use.

Imagine that for every possible generated value there's a segment whose length is proportional to that value's probability. For example, if you need to generate 1's and 0's, with 0's twice as often, you can have 1 represented by a segment of length 1 and 0 by a segment of length 2. Now imagine you join these segments into one (of length 3) and drop a uniformly distributed value (using good old rand()) on it. The probability of it landing in 0's segment is twice that of landing in 1's segment (because the former is twice as long). So the distributed value is just the value to which that segment (or interval, less visually speaking) corresponds.

This is easy to implement with distribution defined as array:

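distribution = [0, 0, 1] // 0 takes two slots, 1 takes one, so 0 comes up twice as often
total_len = 3 // number of slots in the array
distributed_rand_value = distribution[rand(total_len)] // rand(n) is assumed to return a uniform integer in 0..n-1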
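
The same segment idea also works with non-integer weights if you keep running totals instead of an expanded array. A rough Python sketch follows; weighted_pick is a made-up name, and the weights are the 2:1 example from above.

```python
import bisect
import itertools
import random

def weighted_pick(values, weights, rng=random):
    """Pick one of `values` with probability proportional to its weight."""
    # Right-hand end of each value's segment, e.g. [2, 1] -> [2, 3]
    cumulative = list(itertools.accumulate(weights))
    # Drop a uniform point on the combined segment [0, total)
    point = rng.random() * cumulative[-1]
    # Find the segment the point landed in
    index = bisect.bisect_right(cumulative, point)
    # Guard against floating-point rounding pushing the point to the very end
    index = min(index, len(values) - 1)
    return values[index]

if __name__ == "__main__":
    # 0 should come up about twice as often as 1
    draws = [weighted_pick([0, 1], [2, 1]) for _ in range(30)]
    print(draws)
```

In Python 3.6 and later, random.choices(values, weights=weights) in the standard library does the same job.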

Egor
Sunday, June 13, 2004

It is useful to note that as you sum n independent, identically distributed random variables, the (suitably scaled) sum converges to a normal (bell curve) distribution as n -> infinity. Sums of uniform variables converge quickly, so you could get a very decent bell curve from summing a few (say 23) uniform [0,1] random variables.
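
A quick Python sketch of that, assuming the goal is the 0-23 column from the original question; the function names are made up. Note that a sum of 23 uniforms has mean 11.5 and standard deviation sqrt(23/12), about 1.4, so the curve is fairly narrow and the extreme values practically never come up.

```python
import random

def sum_of_uniforms(n_terms, rng=random):
    """Sum of n_terms independent uniform [0, 1) values."""
    return sum(rng.random() for _ in range(n_terms))

def bellish_hour(rng=random):
    """Integer in 0..23 with a bell-shaped distribution centred near 11.5."""
    # The sum lies in [0, 23), so rounding gives an integer from 0 to 23
    return round(sum_of_uniforms(23, rng))

if __name__ == "__main__":
    counts = [0] * 24
    for _ in range(10000):
        counts[bellish_hour()] += 1
    # Crude text histogram, one row per value
    for value, count in enumerate(counts):
        print(f"{value:2d} {'#' * (count // 50)}")
```

If the curve needs to be wider, one option is to sum fewer terms and scale the result up to the 0-23 range.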

Devil's Advocate
Monday, June 14, 2004

Chapter 7 of the wonderful book Numerical Recipes covers random numbers: the theory, the practice, and the code. You can read the whole thing online for free at
http://www.library.cornell.edu/nr/cbookcpdf.html

Harvey Motulsky
Monday, June 14, 2004

Here's a little trick for simulating a normal distribution:

If X and Y are two independent random variables uniformly distributed on (0, 1), then

Z := cos(2*Pi*X)*sqrt(-2*ln(Y))

is distributed as Normal(0,1), i.e. normal with mean 0 and variance 1.

So effectively you just take pairs of pseudorandom numbers between 0 and 1, plug them into that formula, and the results will come out in what looks like a normal distribution. If you want it to have a different mean and variance, just apply a simple linear transformation,  aZ + b, which'll give it mean b and variance a^2.
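
In Python, the formula above (usually called the Box-Muller transform) might look like the sketch below. The names box_muller and normal are made up, and the second uniform is taken as 1 - random() so that it can never be exactly 0, which ln() would choke on.

```python
import math
import random

def box_muller(rng=random):
    """One Normal(0, 1) sample from two independent uniform samples."""
    x = rng.random()          # uniform on [0, 1)
    y = 1.0 - rng.random()    # uniform on (0, 1], so log(y) is always defined
    return math.cos(2.0 * math.pi * x) * math.sqrt(-2.0 * math.log(y))

def normal(mean, std_dev, rng=random):
    """Linear transformation a*Z + b with a = std_dev and b = mean."""
    return std_dev * box_muller(rng) + mean

if __name__ == "__main__":
    samples = [normal(11.5, 4.0) for _ in range(10)]
    print([round(s, 2) for s in samples])
```

For the 0-23 column you would still need to round the result and throw away (or clamp) anything outside the range. Python's standard library also ships random.gauss(mu, sigma) if you don't want to roll your own.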

Matt
Monday, June 14, 2004

Be wary of Numerical Recipes.  It's OK for a first stab in the dark at something, but not for industrial-strength work, I've found.  (Frustration from using its optimization routines led directly to my doctoral work.)

Aaron F Stanton
Monday, June 14, 2004
