Fog Creek Software
Discussion Board




On average?

How many times on average would you have to roll a regular, six sided die before every number came up at least once?

Pete
Saturday, November 15, 2003

The above problem is too hard to think about.  To get started, this post will attempt to solve a simpler problem, which is modified from the problem above.  The problem is as follows:
How many times on average would one have to roll a fair six-sided die to get at least a one and at least a two?

The following formula solves the problem:
Avg roll = summation(n * p(n));  n = 1 to infinity
where n is the index of summation and p(n) is the probability of rolling a die n times to get at least a one and at least a two.

The main task is finding a general formula for p(n).  For small n, finding the number p(n) is fairly easy.  For example, when n = 1, p(1) = 0.  This is because it is impossible to roll a die just once to get at least a one AND at least a two.

P(2) is the probability of rolling a die two times and getting at least a one and a two.  To find p(2), use the formula p = event space / sample space.  The sample space in this case = 6 * 6 = 36, because from the first roll there are six possible outcomes and from the second roll there are also six possible outcomes.  Of the 36 possible outcomes in the sample space, only 2 outcomes satisfy the restriction placed on the event space.  These 2 outcomes are (1, 2) and (2, 1), which have at least a one and at least a two.  Therefore, event space = 2, and p(2) = 2 / 36 = 1 / 18.

Following the same reasoning, p(3) is the probability of rolling a die three times and getting at least a one and at least a two.  The event space for p(3) is:
{ (1, 1, 2), (1, 3, 2), (1, 4, 2), (1, 5, 2), (1, 6, 2),
  (          ), (3, 1, 2), (4, 1, 2), (5, 1, 2), (6, 1, 2),
  (2, 2, 1), (2, 3, 1), (2, 4, 1), (2, 5, 1), (2, 6, 1),
  (          ), (3, 2, 1), (4, 2, 1), (5, 2, 1), (6, 2, 1)  }
Event space for p(3) = 18.  Sample space for p(3) = 6 * 6 * 6 = 6 ^ 3 = 216.  Therefore, p(3) = 18 / 216 = 1 / 12.

You might be wondering why (1, 2, 3) is not listed in the event space.  After all, it contains at least a one and at least a two.  This is because p(3) is the probability of rolling a die THREE times and getting at least a one and at least a two.  If we rolled (1, 2, 3), after the second roll we would have already got at least a one and at least a two, and would not need to roll the third time.  Rolling (1, 2) was already considered in p(2).

The event space for p(4) is too numerous to list here, so it is time to find a general formula for p(n).  The paragraph above provides an important fact; the last roll in the events in the event space must be either a one or a two, any other number would not make sense.  This means that the n-th roll in p(n) must be either a one or a two.  If the last roll was a one, then the n - 1 previous rolls must not have a one and the n - 1 previous rolls must have at least a two.  Likewise, if the last roll was a two, then the previous n-1 rolls must not have a two and the n - 1 previous rolls must have at least a one.  Let us assume the last roll in n rolls was a two, how many permutations are there of n - 1 rolls which contain no twos and at least a one?  Take all possible permutations of n - 1 rolls which contain no twos (5^(n-1)), and subtract from it all possible permutations of n - 1 rolls which contain no twos and no ones (4^(n-1)), the result is all permutations of n - 1 rolls which contain no twos and at least a one.  Since the last roll can end with a one or a two, the event space is just 2 times 5^(n-1)-4^(n-1).

Event space for p(n) = 2 * [5 ^ (n-1) - 4 ^ (n-1)]
Sample space for p(n) = 6 ^ n
p(n) = 2 * [5 ^ (n-1) - 4 ^ (n-1)] / 6^n

Plug this result into the formula for average roll,
Avg roll = summation(2 * n * [5 ^ (n-1) - 4 ^ (n-1)] / 6^n)
          with n = 1 to infinity

This is what avg roll looks like
OO
----                n-1        n-1
\    2 * n * (5        -  4      )
>  ------------------------------
/                      n
----                6
n=1


    OO                                    OO
    ----                      n-1        ----                    n-1
    \      2 * n * (5/6)              \      2 * n * (2/3)
=  >    --------------------    -      >  -------------------
    /                  6                    /                6
    ----                                    ----
    n=1                                    n=1


          OO                                OO
          ----                                ----           
    1    \                  n-1    1    \                  n-1
=  -  *  >  n * (5/6)      -  -  *  >  n * (2/3)
    3    /                            3    /       
          ----                                ----
          n=1                              n=1


    1              1
=  -  * 36  -  -  * 9
    3              3


=  9

Finally, the answer is: on average one would have to roll a fair six-sided die 9 times to get at least a one and at least a two.  Questions, comments, concerns, and critiques welcome.


The following is a simple qbasic program to simulate the dice roll and test the result.  The output does support the answer.

DECLARE FUNCTION randomnum! (lower!, upper!)
CLS
RANDOMIZE TIMER

DIM SHARED array(1 TO 6)

sum = 0
numtrial = 30000
FOR trial = 1 TO numtrial
    numroll = 0
    FOR count = 1 TO 6
        array(count) = 0
    NEXT count
    DO
        die = randomnum(1, 6)
        array(die) = 1
        numroll = numroll + 1
        total = array(1) + array(2)
    LOOP UNTIL total >= 2
    sum = sum + numroll
NEXT trial
PRINT sum / numtrial

FUNCTION randomnum (lower, upper)
    randomnum = INT((upper - lower + 1) * RND) + lower
END FUNCTION



This shows how to calculate the summation.

summation(n * (5/6) ^ (n-1)) with n = 1 to k

= 1*(5/6)^0 +
  2*(5/6)^1 +
  3*(5/6)^2 +
  4*(5/6)^3 +
  .
  .
  .
  k*(5/6)^(k-1)

= (5/6)^(0)*[(5/6)^(k-0)-1]/(5/6-1) +
  (5/6)^(1)*[(5/6)^(k-1)-1]/(5/6-1) +
  (5/6)^(2)*[(5/6)^(k-2)-1]/(5/6-1) +
  .
  .
  .
  (5/6)^(k-1)*[(5/6)^(k-k+1)-1]/(5/6-1)

= [(5/6)^k-(5/6)^(0)]/(5/6-1) +
  [(5/6)^k-(5/6)^(1)]/(5/6-1) +
  [(5/6)^k-(5/6)^(2)]/(5/6-1) +
  .
  .
  .
  [(5/6)^k-(5/6)^(k-1)]/(5/6-1)

= k*(5/6)^k/(5/6-1)-[(5/6)^k-1]/(5/6-1)^2

= [k*(5/6)^k*(5/6-1)-(5/6)^k+1]/(5/6-1)^2

= {(5/6)^k*[k*(5/6-1)-1]+1}/(5/6-1)^2

= -(5/6)^k*(k/6+1)/(5/6-1)^2+1/(5/6-1)^2

= -(5/6)^k*(k/6+1)*36+36

= 36*[1-(5/6)^k*(k/6+1)]



summation(n * (5/6) ^ (n-1)) with n = 1 to infinity

= limit{36*[1-(5/6)^k*(k/6+1)]}  k->infinity

= 36 - 36 * limit[(5/6)^k*(k/6+1)]  k->infinity

= 36 - 36 * limit[(5/6)^k*k/6+(5/6)^k]  k->infinity

= 36 - 36/6 * limit[k*(5/6)^k] + 36 * limit[(5/6)^k]
  k->infinity

= 36 - 6 * limit[k/(6/5)^k] + 36 * limit[1/(6/5)^k]  k->infinity

= 36 - 6 * limit[k/(1.2)^k] + 36 * limit[1/(1.2)^k]  k->infinity

L'Hospital's Rule
= 36 - 6 * limit{1/[lk(1.2)*(1.2)^k]} + 36 * 0  k->infinity   

= 36

JHY
Tuesday, November 25, 2003

The general rule is that if the probability of an event is P, then the average number of trials to get one of those events is 1/P.

In the six-sided die problem, we can reduce it to six events which have to occur in order:

1. We get any number
2. We get any number besides the number in (1).
3. We get any of the four remaining numbers
4. Any of the three remaining numbers
5. Any of the two remainers
6. The final number.

The probabilities for each event are 1, 5/6, 4/6, 3/6, 2/6, and 1/6 respectively.  Therefore, the average number of rolls to see one of each number is 1+6/5+6/4+6/3+6/2+6=14.7

We can also solve the "see a one or a two" problem the same way:

1. See a one or a two (2/6)
2. See the other number (1/6)

Which gets the same answer 6/2+6/1=9 reported above.

Foolish Jordan
Tuesday, November 25, 2003

Ok, I see the solution is very simple.  Thanks for pointing it out.  I never saw such elegant solutions before for problems like this.  I must have missed school the day when they taught this.  I still do not understand why adding the reciprocals of probabilities would get the correct answer, although I know the answer is correct from program simulations.  Can someone shed some light on this for me please?

Anyways, here is my thought process before seeing the solution above.

Another simplified problem that might lead us closer to the solution of Pete's problem is:
How many times on average would one have to roll a fair six-sided die to get two different numbers?

How does this get us closer to the solution of Pete's problem?  Well, once we have solved that problem we can try to solve the problem:
How many times on average would one have to roll a fair six-sided die to get three different numbers?  Then solve the problem for four different numbers.  The solution for six different numbers is the solution for Pete's problem.

To solve the problem for two different numbers, I used the following formula:
Avg roll = summation(n * p(n));  n = 1 to infinity
where n is the index of summation and p(n) is the probability of rolling a die n times to get two different numbers.

To find a general formula for p(n), we can look at p(n) for small n to see if there is an observable pattern.

p(1) = 0 because the probability of rolling a die once and getting two different numbers is zero.

p(2) = 5/6 because on the first roll you can have any of the six numbers, but you must not roll the same number the second time in order to get two different numbers in two rolls.  The probability of not rolling the same number the second time is 5/6 since you have 1/6 chance of rolling the same number again.

p(3) = 5/36 because on the first roll you can have any of the six numbers, in the second roll you must roll the same number as the first roll, otherwise you would not need three rolls to get two different numbers.  On the third roll you must roll a different number.  The probability for the first roll is 6/6 = 1 since you can roll any number.  The probability for the second roll is 1/6 since it must match the first roll.  The probability for the third roll is 5/6 since it must be different from the first roll.  p(3) = 1 * 1/6 * 5/6 = 5/6^2

Following the same reasoning, if you rolled four times to get two different numbers, the first three rolls must have the same number, the fourth roll must have a different number from the first three rolls.  The probability of that is p(4) = 1 * 1/6 * 1/6 * 5/6 = 5/6^3.

In general, if you rolled n times to get two different numbers, the first n-1 rolls must have the same number, the last roll must have a different number from the first n-1 rolls.  Therefore, p(n) = 1 * (1/6)^(n-2) * 5/6 = 5/6^(n-1) with n >= 2.

Avg roll
= summation(n * p(n));  n = 2 to infinity
= summation[5n/6^(n-1)];  n = 2 to infinity
= 5*summation[n/6^(n-1)];  n = 2 to infinity
= 5*summation[n*(1/6)^(n-1)];  n = 2 to infinity

To solve this summation we use the formula (deduced from my first post):
summation[n*c^(n-1)];  n = 0 to infinity
=1/(c-1)^2  where 0 < c < 1

summation[n*(1/6)^(n-1)];  n = 2 to infinity
= [1 / (1/6-1)^2] - 1  note: we need to subtract 1 from the sum because n starts from 2
= 11/25

Avg roll = 5*11/25 = 11/5 = 2.2
This means on average one would have to roll a fair six-sided die 2.2 times to get two different numbers.

This is the same as Jordan's solution, although all he had to do was 1 + 6/5 = 2.2

We can use similar reasoning to solve the problem:
How many times on average would one have to roll a fair six-sided die to get three different numbers?
This is how far I have gotten, so might as well put the solution down.

I found p(n) for getting three different numbers is:
p(n) = 6*10*[2^(n-1)-2]/6^n

6 is the number of ways to pick a number for the result of the last roll.
10 is the number of ways to pick 2 numbers from the remaining 5 numbers to be in the previous n-1 rolls, which is 5 choose 2, or 5!/(3!2!) = 10.
2^(n-1) is all the possible permutations of the 2 numbers in the first n-1 rolls.
Since we need both numbers to appear at least once in the first n-1 rolls, we need to subtract from 2^(n-1) the number of permutations where only 1 number appears, which is 2.
6^n is all the possible permutations of rolling a die n times.


Avg roll
= summation(n * p(n));  n = 3 to infinity
= summation(n * 60*[2^(n-1)-2]/6^n);  n = 3 to infinity
= 60 * summation(n * [2^(n-1)-2]/6^n);  n = 3 to infinity

We could try to solve the summation by hand or write a program to approximate the sum for large n, but why bother.  The solution is obviously 1+6/5+6/4 = 3.7

I used the same qbasic simulation program to confirm the solution for six numbers, which is 14.7
Here is the program I used:

DECLARE FUNCTION randomnum! (lower!, upper!)
CLS
RANDOMIZE TIMER

DIM SHARED array(1 TO 6)

FOR try = 1 TO 20
    sum = 0
    numtrial = 30000
    FOR trial = 1 TO numtrial
        numroll = 0
        FOR count = 1 TO 6
            array(count) = 0
        NEXT count
        DO
            die = randomnum(1, 6)
            array(die) = 1
            numroll = numroll + 1
        LOOP UNTIL array(1) + array(2) + array(3) + array(4) + array(5) + array(6) = 6
        sum = sum + numroll
    NEXT trial
    PRINT numtrial, sum / numtrial
NEXT try

FUNCTION randomnum (lower, upper)
    randomnum = INT((upper - lower + 1) * RND) + lower
END FUNCTION

JHY
Tuesday, November 25, 2003

The rule for reciprocating probabillities in order to get the average: If a event has a chance "x" of occuring each time you make a pick, it has " 1-x " chance of not happening. In order to know the average number of picks necessary on average before the event occurs, calculate the following sum: for convenience, let (1-x) = y; x*Sum(1*1+y*2+y^2*3+....y^n*(n+1)+......)= x*Sum(y^n*(n+1),n=0..infinity)=x*(y/(1-y)^2 +1/(1-y))=x*(1/(1-y)^2); but since (1-y)=x...sum=x*1/x^2=1/x--the reciprical. I hope that this is somewhat understandable, it is hard to write something like this out on a word processor.

John
Friday, November 28, 2003

Thanks, John.  Now I got it.

JHY
Monday, December 01, 2003

*  Recent Topics

*  Fog Creek Home