Random numbers in a fixed distribution?
I need to generate a bunch of random data for a SQL Server database, but the data needs to follow a certain distribution (in one column I need mostly 0's, but some 1's and a rare 2; in another I need 0-23 in a bell curvish distribution).
To make it really tricky, I need to generate the rows in a fixed manner (daily data points for each piece of equipment), but it could be done in two steps (generate the rows, then generate the data).
Any ideas where to start?
Philo
Sunday, June 13, 2004
Well, you could use a random number generator and check a range. For instance for your first example, you could generate numbers between 1-100. Any time you get something back between 1-95, that's a 0, 96-99 is a 1, and 100 gives you a two.
For the second one, something similar, but your ranges should follow a curve. For example, let's use a smaller case and say you need a bell curve for 1-3. Then 1-25 would be a 1, 26-75 would be a 2, and 76-100 would be a 3.
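The range check for the first column can be sketched in Python like this. It is only an illustration: the function names `value_for_roll` and `rare_value` are made up here, and the 95/4/1 split is just the one from the example above.

```python
import random

def value_for_roll(roll):
    """Map a roll in 1..100 to 0, 1 or 2 using the ranges from the post."""
    if roll <= 95:
        return 0   # 1-95  -> 0 (95% of rolls)
    if roll <= 99:
        return 1   # 96-99 -> 1 (4% of rolls)
    return 2       # 100   -> 2 (1% of rolls)

def rare_value():
    """Draw one value; random.randint is inclusive on both ends."""
    return value_for_roll(random.randint(1, 100))

# Generate a batch and count how often each value came up.
sample = [rare_value() for _ in range(10_000)]
counts = {v: sample.count(v) for v in (0, 1, 2)}
```

The same idea works for the bell-shaped column: only the cut-off points change.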
Oren Miller
Sunday, June 13, 2004
If you don't mind doing it in Python, the RandomArray module should do it for you.
http://doc.astro-wise.org/RandomArray.html
http://www.onlamp.com/pub/a/python/2000/08/09/numerically.html?page=2
TomH
Sunday, June 13, 2004
That's perfect, Oren - thanks!
Philo
Sunday, June 13, 2004
This question pops up every now and then. Here's what I use.
Imagine that for every possible generated value there's a segment whose length is proportional to that value's probability. For example, if you need to generate 1's and 0's, with 0 coming up twice as often, you can represent 1 by a segment of length 1 and 0 by a segment of length 2. Now imagine you join these segments into one (of length 3) and drop a uniformly distributed point on it (using good old rand()). The point is twice as likely to land in 0's segment as in 1's segment, because the former is twice as long. The generated value is simply the value whose segment (or interval, less visually speaking) the point landed in.
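This is easy to implement with the distribution defined as an array:
distribution = [0, 0, 1] // 0 is listed twice, so it comes up twice as often as 1
total_len = 3 // number of entries in the array
distributed_rand_value = distribution[rand(total_len)] // assumes rand(n) returns a uniform integer in 0..n-1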
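When the weights are not small whole numbers, the array gets unwieldy, and the segment picture can be coded directly with running totals instead. Below is a rough Python sketch of that variant; `make_sampler` and `draw` are names invented for this example, and the optional `point` argument exists only so the lookup can be checked by hand.

```python
import bisect
import itertools
import random

def make_sampler(values, weights):
    """Return a draw() that picks from values with probability proportional to weights."""
    # Right-hand edge of each value's segment on the combined line.
    edges = list(itertools.accumulate(weights))
    total = edges[-1]

    def draw(point=None):
        # Drop a uniform point on [0, total) unless one is supplied.
        if point is None:
            point = random.random() * total
        # Find the segment containing the point; clamp to guard against rounding.
        index = min(bisect.bisect_right(edges, point), len(values) - 1)
        return values[index]

    return draw

# 0 gets a segment of length 2, 1 gets a segment of length 1.
draw = make_sampler([0, 1], [2, 1])
```

In Python 3.6 and later, `random.choices(values, weights=weights)` from the standard library does the same job in one call.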
Egor
Sunday, June 13, 2004
It is useful to note that as you sum n independent, identically distributed random variables with finite variance, the distribution of the (suitably scaled) sum converges to a normal (bell curve) distribution as n -> infinity. The uniform distribution converges quickly, so you could get a very decent bell curve from summing a few (say 23) uniform [0,1] random variables.
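As a sketch of how this could produce the 0-23 column from the original question: average a handful of uniform draws and stretch the result over the integer range. The function name `bell_int`, the default of 4 draws, and the scaling step are all choices made for this example, not something from the posts above; more draws give a narrower hump.

```python
import random

def bell_int(low=0, high=23, n=4, rng=random.random):
    """Return an integer in low..high, bunched around the middle of the range."""
    # The average of n uniform [0, 1) draws is roughly bell-shaped around 0.5.
    average = sum(rng() for _ in range(n)) / n
    span = high - low + 1
    # Stretch onto the integer range; min() guards the top edge against rounding.
    return low + min(int(average * span), span - 1)

hours = [bell_int() for _ in range(10_000)]
```

Because the result is bounded, there are no tails to clip, which a true normal distribution would need.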
Devil's Advocate
Monday, June 14, 2004
Chapter 7 of the wonderful book Numerical Recipes discusses random numbers: the theory, the practice, and the code. You can read the whole thing online for free at
http://www.library.cornell.edu/nr/cbookcpdf.html
Harvey Motulsky
Monday, June 14, 2004
Here's a little trick for simulating a normal distribution:
If X and Y are two independent random variables uniformly distributed on (0, 1), then
Z := cos(2*Pi*X)*sqrt(-2*ln(Y))
is distributed as Normal(0,1), i.e. normal with mean 0 and variance 1.
So effectively you just take pairs of pseudorandom numbers between 0 and 1, plug them into that formula, and the results will come out in what looks like a normal distribution. If you want it to have a different mean and variance, just apply a simple linear transformation, aZ + b, which'll give it mean b and variance a^2.
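A minimal Python sketch of that formula and the aZ + b step follows. The names `box_muller` and `normal` are made up for illustration, and the `1.0 - random.random()` shift is an added precaution so the logarithm never receives 0.

```python
import math
import random

def box_muller(x, y):
    """Z = cos(2*Pi*X) * sqrt(-2*ln(Y)) for x and y strictly between 0 and 1."""
    return math.cos(2 * math.pi * x) * math.sqrt(-2 * math.log(y))

def normal(mean=0.0, std_dev=1.0):
    """Draw one value with the given mean and standard deviation (aZ + b)."""
    x = random.random()
    y = 1.0 - random.random()   # random() is in [0, 1); shift to (0, 1] to avoid log(0)
    return mean + std_dev * box_muller(x, y)

samples = [normal(10.0, 2.0) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)
```

Python's standard library also has `random.gauss(mu, sigma)`, which may be simpler if you don't need to port the formula elsewhere.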
Matt
Monday, June 14, 2004
Be wary of Numerical Recipes. It's OK for a first stab in the dark at something, but not for industrial-strength work, I've found. (Frustration from using its optimization routines led directly to my doctoral work.)
Aaron F Stanton
Monday, June 14, 2004