Random numbers in a fixed distribution?

I need to generate a bunch of random data for a SQL Server database, but the data needs to follow a certain distribution (in one column I need mostly 0's, but some 1's and a rare 2; in another I need 0-23 in a bell-curvish distribution).

To make it really tricky, I need to generate the rows in a fixed manner (daily data points for each piece of equipment), but it could be done in two steps (generate the rows, then generate the data).

Any ideas where to start?

Philo
Sunday, June 13, 2004

Well, you could use a random number generator and check a range. For instance, for your first example, you could generate numbers between 1 and 100. Any time you get something back between 1-95, that's a 0; 96-99 is a 1; and 100 gives you a 2.

For the second one, do something similar, but your ranges should follow a curve. For example, let's use a smaller case and say you need a bell curve for 1-3. Well then, 1-25 would be a 1, 26-75 would be a 2, and 76-100 would be a 3.

Oren Miller
Sunday, June 13, 2004

If you don't mind doing it in Python, the RandomArray module should do it for you.

http://doc.astro-wise.org/RandomArray.html
http://www.onlamp.com/pub/a/python/2000/08/09/numerically.html?page=2

TomH
Sunday, June 13, 2004

That's perfect, Oren - thanks!

Philo
Sunday, June 13, 2004

This question pops up every now and then. Here's what I use.

Imagine that for every possible generated value there's a segment whose length is proportional to that value's probability. For example, if you need to generate 1's and 0's, with 0's twice as often, you can have 1 represented by a segment of length 1 and 0 by a segment of length 2. Now imagine you combine these segments into one (of length 3) and cast a uniformly distributed value (using good old rand()) onto it. The probability of it landing in 0's segment will be twice that of it landing in 1's segment (because the former is longer).
So the distributed value is just the value to which the segment (or interval, less visually speaking) corresponds. This is easy to implement with the distribution defined as an array:

    distribution = [0, 0, 1]   // 0 fills two slots, 1 fills one, so 0 comes up twice as often
    total_len = 3
    distributed_rand_value = distribution[rand(total_len)]   // rand(n): uniform integer in 0..n-1

Egor
Sunday, June 13, 2004

It is useful to note that as you sum n independent, identically distributed random variables (with finite variance), the distribution of the sum approaches a normal (bell curve) distribution as n -> infinity. The uniform distribution converges quickly, so you could get a very decent bell curve from summing a few (say 23) [0,1] random variables.

Devil's Advocate
Monday, June 14, 2004

Chapter 7 of the wonderful book Numerical Recipes discusses random numbers: the theory, the practice, and the code. You can read the whole thing online for free at http://www.library.cornell.edu/nr/cbookcpdf.html

Harvey Motulsky
Monday, June 14, 2004

Here's a little trick for simulating a normal distribution:

If X and Y are two independent random variables uniformly distributed on (0, 1), then

    Z := cos(2*Pi*X)*sqrt(-2*ln(Y))

is distributed as Normal(0,1), i.e. normal with mean 0 and variance 1.

So effectively you just take pairs of pseudorandom numbers between 0 and 1, plug them into that formula, and the results will come out normally distributed. If you want a different mean and variance, just apply a simple linear transformation, aZ + b, which gives mean b and variance a^2.

Matt
Monday, June 14, 2004

Be wary of Numerical Recipes. It's OK for a first stab in the dark at something, but not for industrial-strength work, I've found. (Frustration from using its optimization routines led directly to my doctoral work.)

Aaron F Stanton
Monday, June 14, 2004
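For anyone landing on this thread later, here is a minimal Python sketch of the three ideas above: the range/segment method (Oren, Egor), the sum of uniforms (Devil's Advocate), and the cos/sqrt formula (Matt). The function names and the 95/4/1 weights are illustrative assumptions, not anything the posters specified, and the generated values would still have to be loaded into SQL Server separately.

```python
import math
import random

def weighted_pick(values, weights, rng=random):
    """Range/segment method: pick a value with probability proportional to its weight."""
    total = sum(weights)
    r = rng.random() * total            # uniform point on the combined segment [0, total)
    upto = 0.0
    for value, weight in zip(values, weights):
        upto += weight                  # right edge of this value's segment
        if r < upto:
            return value
    return values[-1]                   # guard against floating-point round-off

def bell_0_to_23(rng=random):
    """Sum 23 uniform [0, 1) draws and round: a roughly bell-shaped integer in 0..23."""
    return round(sum(rng.random() for _ in range(23)))

def normal_sample(mean=0.0, sd=1.0, rng=random):
    """Matt's formula: Z = cos(2*pi*X) * sqrt(-2*ln(Y)), then sd*Z + mean."""
    x = rng.random()
    y = 1.0 - rng.random()              # shifted to (0, 1] so that log(y) is defined
    z = math.cos(2.0 * math.pi * x) * math.sqrt(-2.0 * math.log(y))
    return mean + sd * z

if __name__ == "__main__":
    # Hypothetical weights for "mostly 0's, some 1's, a rare 2".
    print([weighted_pick([0, 1, 2], [95, 4, 1]) for _ in range(20)])
    print([bell_0_to_23() for _ in range(20)])
    print([round(normal_sample(11.5, 4.0), 2) for _ in range(5)])
```

One thing to watch: the sum of 23 uniforms has mean 11.5 but a standard deviation of only about 1.4 (sqrt(23/12)), so nearly every value lands between 7 and 16. If a wider bell over 0-23 is wanted, one option is to draw from normal_sample with a larger sd, round, and redraw anything that falls outside 0-23.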