Fog Creek Software
Discussion Board




Probability question...

We have a certain system evolving at work, and I've noticed some patterns occurring. I'm wondering if anyone can tell me the probabilities associated here.

Let's say you're flipping a coin 10 times. What is the probability of getting heads exactly 4 times in a row? More than 4 times? That question I can manage (with some effort).

Now, what if I tell you that out of those 10 trials it comes up heads exactly 5 times, and tails the other 5 times?

What now? If you're going to get 5 heads coming up, what are the odds of getting 4 in a row?

With these small trials, I could just count the total number of solutions, but I'm more interested in the calculations here.

Tim
Wednesday, July 28, 2004

Uhhh... were you awake during 7th grade math?

Bob
Thursday, July 29, 2004

You've "noticed some patterns" in a "system evolving at work"?

Sorry, Tim, but this really smells like a homework question.  I can't think of many instances where the scenario you described would provide any useful information, let alone in such a way that you'd prefer to have the calculations involved rather than just the answer.

The tail of the "g" in "fog creek software"
Thursday, July 29, 2004

"We have a certain system evolving at work, and I've noticed some patterns occurring."

The pattern is you keep losing in the office "who can flip 5 heads in a row" contest?

www.MarkTAW.com
Thursday, July 29, 2004

>Now, what if I tell you that out of those 10 trials it comes up heads >exactly 5 times, and tails the other 5 times?

>What now? If you're going to get 5 heads coming up, what are the >odds of getting 4 in a row?

So, you know for sure that out of 10 tosses, you'll definately get 5 heads? First throw, all is the same, but on consecutive throws, it depends what the previous results were. Say you only have 4 throws left and have only had heads once, the odds of getting four in a row are 1.
  -tim

a2800276
Thursday, July 29, 2004

> I could just count the total number of solutions

You do have to "count" the number of solutions to get the odds...

Normally you know the chances of getting heads are 0.5, so the odds of getting heads 4 times in a row are 0.5^4.

Now: after the first toss of the coin, depending on whether you got heads or tails, the odds of getting heads increases or diminishes.

1st throw: 5/10 -> 0.5 
2nd (first was heads): 4/9 -> 0.444.. (you'll definately get 4 'heads' in the next 9 throws)
2nd (first was tails): 5/9 -> 0.555..

But, since the odds are always changing, you can't just say the chance of getting 4 in a row are (odds^4).

Continuing the example above: if you get head the second throw, the chances for the 3rd are 3/8 and then 2/7. So the odds of getting four in a row starting from the first toss are:

1/2 * 4/9 * 3/8 * 2/7 = 24/1008 = 0.0238...

instead of 1/2 ^ 4 = 0.0625 for a normal coin toss. The current and subsequent odds in your case depend on the number of heads that have already come up, and how many throws you have left.

All of this, provided a proper distribution within your 10 tries (which is unlikely in your model where you'll definately get 5 heads in 10 throws).

  -tim

a2800276
Thursday, July 29, 2004

> what if I tell you that out of those 10
> trials it comes up heads exactly 5 times,
> and tails the other 5 times?
>
> What now?

You just created a bunch of new Universes, why not go play with one of them?


Thursday, July 29, 2004

Obviously my work doesn't involve flipping coins, but it seemed an easier way of explaining the situation than going into the details of the wireless network handshaking protocols.

Sorry I asked.

Tim
Thursday, July 29, 2004

For the first question, it's just a matter of counting, although there's tricks you can use to make that easier. I can't immediately think of a nice formula (although I expect if you look in a combinatorics textbook you may find some suitable generalisation), so you'll have to do it by cases and think hard to make sure you're never counting the same combination twice.

The second part what you're asking for is a conditional probability. Google on that phrase, there's an easy way of calculating these.

Matt
Thursday, July 29, 2004

Ah yes, a2800276's approach is good, conditioning on outcome the first toss. This could give you a recurrence relation to solve for some sort of formula (although if you just want the answer to this particular case may aswell just count)

Actually, given this is a programmer's forum not a mathematician's one - just write a 10-line bloody python script to calculate this by brute force! It's not gonna take long, there are only 1024 cases.

Matt
Thursday, July 29, 2004

"Actually, given this is a programmer's forum not a mathematician's one"

But it is a pretty good example of why a Comp. Sci. major is expected to take a number of distribution courses, including calculus and statistics. You never know when a problem like this will pop up

Admittedly this is not a college level math problem, but it makes sense that the OP wants to know if what he's observing is random or not.

Tom H
Thursday, July 29, 2004

>> "Actually, given this is a programmer's forum not a mathematician's one"

What a dumbass.  This is why I think no one should be permitted to  get any where near production code until s/he's earned a BS in Math.

anon
Thursday, July 29, 2004

Actually I just saw the bit where he was asking for enlightenment about the calculations. Fair enough.

I'm actually a mathematician of sorts... so should (in theory) be able to do this standing on my head. I'm working on it ;-)

(A neat formula that is, for any values in place of '4' and '10' in this case)

Matt
Thursday, July 29, 2004

I have a fricking BS in Maths! and will have a masters in another year. I was just saying that if you want an immediate answer to that particular problem, and given the tools/skills available to you, brute force would be the way to go. Looks like I didn't read to OP carefully enough though

Matt
Thursday, July 29, 2004

4 heads in a row out of 10 flips

50 % chance on each flip

4 * 50 / 10 = 20%

5 heads in a row out of 10 flips

50 % chance on each flip

5 * 50 / 10  = 25%

...............

This is why I am not a mathematician
Thursday, July 29, 2004

Aha! as I suspected after playing around with it for a bit, finding a closed form for this isn't easy.

See this: http://mathworld.wolfram.com/Run.html

If you're interested in the maths behind it - there's a rather horrific-looking but cool generating function whose coefficients will give you the answers, but no simple formula valid for all r and n (in their terminology). Appears to be a neat formula for runs of length two though, involving Fibonacci coefficients.

If you're interested in the asymptotic behaviour of the function (ie what happens to the probabilities as the numbers get large) this should be easy to approximate, and if you need to calculate the function accurately I'd suggest using one of the recurrence relations given on the above page and coming up with a recursive algorithm.

Matt
Thursday, July 29, 2004

Thanks a bunch Matt.

I'll have a look at that.

Tim
Thursday, July 29, 2004

http://www.amazon.com/exec/obidos/tg/detail/-/0471257087/qid=1086006247/sr=8-1/ref=pd_ka_1/102-8744697-3422511?v=glance&s=books&n=507846

Introduction to Probability Theory and Its Applications, Volume 1, by William Feller.

I've gotten a surprising amount of mileage out of this book in the 8 or 9 months since I bought it.

But admittedly Wolfram is often a good choice, and at a much better price point.

Danil
Thursday, July 29, 2004

Sidebar - is anybody else amused by the notion that "staying awake in 7th grade math" would enlighten one about processes involving Fibonacci k-step numbers and the Euler-Mascheroni constant?

Danil
Thursday, July 29, 2004

Yeah wolfram is great for reference - trying to actually learn new material from it would be pretty damn hard though. It's just a good summary of mathematical results and definitions

Matt
Thursday, July 29, 2004

Danil, what part of "what are the odds of getting 4 in a row?" involves "Fibonacci k-step numbers" [sic]?

Bob
Thursday, July 29, 2004

The OP's question comes very close to the gambler's fallacy.  If you are flipping a coin, that has a 50% chance of heads or tails, then each trial (each flip) is completely independent of the others.

If it has come up heads 4 times in a row, the next flip (IF it is a fair coin, of course) has a 50% chance of coming up heads.  Each trial tells you NOTHING about the following trials.

The gambler's flaw says that if it has come up heads 4 times in a row, then it should be more likely that the next flip will come up heads.  This is incorrect, unless there is something influencing the fall of the coin.

It is very human to want to apply some pattern to an already occurred set of random events, and then expect that pattern to extend into the future.  This is why programs to predict lottery numbers are popular.  But it is not true.

In communication networks, on the other hand, the volume and speed, and number of nodes wanting to talk, can heavily influence the throughput of the network.  Just make sure you don't use the gambler's fallacy to predict performance.

AllanL5
Thursday, July 29, 2004

Agreed Allan, but I don't think that applies here.

Each new flip of the coin depends heavily on previous flips.

A better example is probably a bag containing 10 marbles: 5 black, and 5 white.

Pulling out one marble at a time, what are the odds of pulling out 4 in a row of the same colour?

Tim
Thursday, July 29, 2004

The first marble has 5/10 chance of being white.  The next 4/10, then 3/10, then 2/10.  Multiply these together, you get .012.  So you have a 1.2% chance of 4 marbles in a row for the condition of it being the first 4 marbles.

Now do the condition of pulling one black, then the 4 white.
Then the condition of two black, then 4 white.  Hmm, this problem does get complex, doesn't it?

I agree with an earlier poster.  Enumerate it in a program.

AllanL5
Thursday, July 29, 2004

Probability to get heads is 1/2. Probability to get heads 4 times in a row = 1/2^4. Probability to get 4 in a row in a sequence of 10 is 7 x 1/2^4 = 7/16 (if you didn't hit 4 in a row to the 7th try then you missed - ie you have 7 trys to get 4 in a row)

Again 7/16

Dino
Thursday, July 29, 2004

So the probability of getting 6 in a row is 7 * 1/2^6 = 7/64?

This problem is more difficult than that. Out of ten flips, exactly 5 are heads, and 5 are tails. It's not a trivial problem.

Edward
Thursday, July 29, 2004

What we're saying is - the general case is a fairly hard problem in combinatorics / number theory. Solving a particular case will just involve doing some fairly ugly/boring calculations, and for what it's worth you may aswell just brute-force it, but the general case is certainly non-trivial

Matt
Thursday, July 29, 2004

The probability to get 6 in a row out of 10 is

5 x 1/2^6 = 5/64

since you have only 5 tries for the sequence.

Dino
Thursday, July 29, 2004

Oh, and out of 10 flips you'll get any combination of heads and tails which is physically possible. It is random after all.

Dino
Thursday, July 29, 2004

May I switch to meta issues? This is an interesting thread. The problem is certainly nontrivial and to anyone who has dealt with serial transmission of binary data, is instantly recognizable.

So we see people who, to put it kindly, clearly don't have much experience with or knowledge of serious computer science, or probability issues, smacking down the posters with remarks about the odds of getting heads are1/2 and how it is 7th grade math, how the OP knows nothing of computer science, how a degree in math should be required for CS. Yet these people making such comments clearly have very little understanding of the subjects they are proclaiming themselves experts in.

At this point it is a question of social dynamics and psychological issues. The detractors do not understand a problem even at its most basic level. They are theratened by this, So in their minds they make themselves experts and start shouting down others with their trite and incorrect answers. It's really fascinating.

It's like politics - people who know nothing at all about the situation in iraq are experts and are second guessing every decision! First we should not be in Fallujah at all and our army has no business in iraq, then when we pull out, we have done so too early and are betraying the iraqis to whom we have a responsibility to support with our army!  Both these claims from the same journalists!

Likewise, very thickheaded and ignorart people who have no idea about chemistry or physics are claiming that in the future all electricity will be generated by hydrogen fuel cell or solar panels!


Anyway, how I solve these sorts of problems is I work out the first few terms using brute force. Then, I go to the encyclopedia of integer sequences, which will list a few sequences that start with those numbers. That gives me tips as to the theory I need to look in to to continue.

Dennis Atkins
Thursday, July 29, 2004

Hasty, hasty, hast ....

You were right - I made a mistake

P(A U B) = P(A) + P(B) only if A and B are mutually exclusive. Otherwise

P(A U B) = P(A) + P(B) - P(A & B). For a coin flip

P(A U A) = 1/2 + 1/2 - 1/2 * 1/2 = 3/4


Then for our case:

P = 7 * 1/2^4 - 21 * 1/2^8 - 35 * 1/2 ^12 - 35 * 1/2^16 - 21 * 1/2^20 - 7 * 1/2^24 - 1/2^28 = 0.34636932238936424...

Indeed getting the general expression is a more complicatied thing. I wonder if this is a random real or not ... hmm

Dino
Thursday, July 29, 2004

Dennis, it's only interesting if you're not interested in getting an answer ;-)

If I may put myself in your shoes, I find it fascinating that these same people usually post anonymously (as if we're all not doing that). Why would I care what someone calling themselves 'Bob' thinks of my question? Does he/she care what other 'Bobs' post about them? I dunno.

Anyways, it was just something that I was curious about. I see that the math is a little more than I can digest in a short while, so it'll be put aside for the moment.

Tim
Thursday, July 29, 2004

n in a row out of m tries:

p = 1/2^n
                  m-n-1            m-n-2  i<j
P = n * p - sum (i) p^2 - sum (  sum (i)) p^3 - ...
                  i=1                j=1      i = 1

  m-n-3    j<k    i < j 
- sum    (  sum  (sum(i))p^4 ....
  k = 1        j = 1    i = 1

OK. Lunch is over ... I'm back to work :-)

Dino
Thursday, July 29, 2004

But I think you are ignoring the bag-like nature of the thing. Once you've pulled an item of a particular type out, the probability of getting another of that type is reduced, because there are fewer left.

For the original question, I think the probability of 4 in a row is 36/252.

The general case is not obvious to me. But it is like you have an N-bit value, and how many of those have exactly M bits set, and how many of those have K contiguous bits set?

I can't work this out. My maths is too weak! But I think once you know this, you can work out the probability:

The bottom number is how many N-bit values have exactly M bits set, for obvious reasons.

The top number is a product of two other values. The first is N-(K-1), as there are N-(K-1) places where K contiguous bits could start. The second is how many N-(K-1)-bit values have exactly (M-K) bits set. (Because, I think, you have (M-K) bits "left over" that you know will appear, and N-(K-1) bit positions that these have to go in... somewhere).

Tom
Thursday, July 29, 2004

(As in, I can't work out how many N-bit values have exactly M bits set, contiguous or otherwise.)

Tom
Thursday, July 29, 2004

Actually, I think I am wrong. Certain positions of the "left over" bits will put a contiguous set somewhere other than where you first thought of (as it were). So it isn't quite that simple.

But 36/252 was worked out by brute force, as a first step to see if I could work this out, so I think that is right :)

Tom
Thursday, July 29, 2004

Continuing the meta thread

"The detractors do not understand a problem even at its most basic level. They are theratened by this"

No, I don't think this fits.  There's a step missing, I think.  The detractors aren't reacting to this problem, but reacting to some other problem that is superficially similar.

That is, had the original question been superficially similar to some other problem of reasonable interest and on topic, I think we would have seen a relatively civil set of completely useless answers.

In other words, I think what we see here is different from what we see in the case of the Monty Hall problem (where people understand the question just fine, but don't understand the answer before they have become invested in their position).

Danil
Thursday, July 29, 2004

Tom, the key phrase in looking up solutions for bags of marbles is "without replacement".

But you've got the right idea.  If there are N white marbles and M black marbles, there are (N+M)!/(N!M!) different ways to pull them out of the bag.  That's the denominator

The numerator can be a real pain, especially when K < M/2, so you have to start making sure that the remaining marbles don't make a longer streak than the one you are looking at.

(For example, consider the cases where K=1).

Danil
Thursday, July 29, 2004

I don't know which feeling is stronger.

The relief that I'm not a complete idiot after all, or the frustration in that there's no easy answer.

;-)

Tim
Thursday, July 29, 2004

Quickly ... I hope I don't make too many mistakes  .... :-)
here is how you get the probabiliy for:

Pk(A) = P(A U A U A .... U A)  k times

k1  k2  k3  k4  k5  k6  k7  k8  ....
1
2  -1
3  -3  -1
4  -6  -4  -1
5  -10 -10 -5  -1
6  -15 -20 -15 -6  -1
7  -21 -35 -35 -21  -7  -1
8  -28 -56 -70 -56  -28 -8  -1
9    ...

Pk(A) = Sum(ki * p^i) for p = P(A) and i = 1 to k

For a 4 coin flip p = 1/2^4

Dino
Thursday, July 29, 2004

The real world is seldom really random. Are you sure there isn't some bias in your results or is it really just coin flipping?

MilesArcher
Thursday, July 29, 2004

OK, "Let's say you're flipping a coin 10 times. What is the probability of getting heads exactly 4 times in a row?"

Exactly, so not 5 or 6. Also, there's not enough total to have two sets of4, since the exactly means we are looking for the sequence 011110 in a 10 bit sequence. That leaves 4 bits which can be any value. Therefore there are 16 ways to get this among 2^10=1024 possibilities. Probability of the sequence: 16/1024.

Now if we constrain to 5 of 10 must be 1 and 5 0, and we have 011110, there are 4 bits remaining but only one may be 1. Therefore there are only 4 possibilities, for odds of 4/1024.

That is the answer to your question. Extracting a formula for cases with longer sequences or different total #s etc is trivial to extend from my example.

This sort of consulting I don't give away for free to firms. Therefore I ask that you donate $500 to either the American Cancer Society, or Ronald McDonald House. Thanks.

Dennis Atkins
Thursday, July 29, 2004

My math is screwy - I didn't consider that the sequence can be in different positions as well, or that if it is on the edge of the sequence, there is an implied 0 on one side and we are dealing with 5 variable bits.

That's why brute force is the answer - spew them out and count them.

Dennis Atkins
Thursday, July 29, 2004

Danil, I agree with your comments on the meta thread.

Dennis Atkins
Thursday, July 29, 2004

But the thing on the superficially similar problem is that coin toss probability is what most people are familiar with from high school. Probability is really really difficult - harder than calculus. The reason is precisely this - it is very easy to follow the wrong line of reasoning and have it look correct. It is extremely hard to get the right odds unless you do brute force or empirical measurements to know in advance what the right answer is.

However, I do know that the guys talking about 1/2 chance of heads haven't taken or didn't do well at combinatorics, which you would as a junior in any CS program. If you've taken combinatorics (or had to deal with serial data, encoding, error correction...) then you'd know enough to skip the whole 50% line of reasoning and start trying to count all the ways the thing you're looking for can happen.

Dennis Atkins
Thursday, July 29, 2004

Damn, what about:

1111001111
1111011110
0111101111

You CAN have two sets of four in a row. Are these allowed? Not specified in problem...

Dennis Atkins
Thursday, July 29, 2004

Dennis, for this problem I'm limited to an equal number of 1's and 0's (or black and white marbles).

Thus, no double sets of sequential 4's (with ten flips)

However, in the general problem I'm looking at, a more accurate simplification would have 20 coin-flips, and we're still looking for 4 in-a-row. Then indeed we could have two sets of 4's.

Tim
Thursday, July 29, 2004

OK, well here are your cases:

A. 11110nnnnn
B. 011110nnnn
C. n011110nnn
D. nn011110nn
E. same as C, reversed
F. same as B, reversed
G. same as A, reversed

The n's are comprised of one 1, the rest 0s.

So 5 possibilities for A, 4 for B, C and D.
5+4+4+4+4+4+5 = 30 possibilties total.

Odds then are 30/1024.

Dennis Atkins
Thursday, July 29, 2004

For 20 bits with 10 1 and 10 0, so you can have 2 chains of 4 and they can straddle the middle, it's not then a case of extrapolating from these results.

I advise you to count from 0 to 2^20 and check which ones have a chain of exactly 4 1s in a row and just pritn the total or print out the numbers. That's the fastest way since when you do it by hand it seems you always miss some.

If had to to 70 bits in a row or something where its not reasonable to calculate with brute force, then  you'd do more thinking.

So, have you written this script Tim? Go ahead and do so and then post your results.

Dennis Atkins
Thursday, July 29, 2004

Yep, I'm just in the middle of it now. I'm stuck on some time-optimization stuff, but hoping to get it done soon.

If it'll start soon, I'll send it spinning overnight, and hopefully have something for tomorrow.

Tim
Thursday, July 29, 2004

Some might find this interesting...it's been dumbed down a little, but kind of neat

http://www.shodor.org/interactivate/activities/marbles/index.html

Tim
Thursday, July 29, 2004

Let me know if you get 48025.

Dennis Atkins
Friday, July 30, 2004

Which would be 4.58% of the cases.

Dennis Atkins
Friday, July 30, 2004

BTW, use C - it runs here in just a few milliseconds.

Dennis Atkins
Friday, July 30, 2004

Heh, I left out a few, it's actually 52,580 cases or 5.01442% of all possibilities. Do you get the same?

Dennis Atkins
Friday, July 30, 2004

Work exploded here. I'll post results ASAP.

Tim
Friday, July 30, 2004

This looks like a case where using resampling would be the way to figure out your answer.  It eliminates the problem of picking the correct statistical formula, along with that formula's hidden assumptions.  I know Julian Simon wrote a couple of books on the subject.

J. Lee
Friday, July 30, 2004

The OP was talking about Bernoulli's Binomail Probabiliites

n= # of trials
r = # of desired successes
p = probability of success
q = probability of failure

P(EXACTLY r times) = (nCr)(p^r)(q^(n-r))

P(AT LEAST r times) = P(EXACTLY r) + P(EXACTLY r+1) + ... + P(EXACTLY n)

Bella
Friday, July 30, 2004

Okay, I decided to try writing a little algorithm to calculate these numbers of runs recursively. Make of it what you will:

# Calculates the number of possible binary sequences of length n
# which contain a 'run' of /at least/ k 1's.
# Works recursively using a simple recurrence relation - algorithmic
# complexity could doubtless be improved a lot

def runs(k, n):
        if k > n or n == 0:
                return 0
        return sum([runs(k, i) for i in range(n-k, n)]) + (1 << (n-k))

# Caclulates the number of possible binary sequences of length n
# which contain a run of length k (but no longer runs)

def runsofexactly(k, n):
        return runs(k, n) - runs(k+1, n)

print 'Runs of at least 4 in 10: %d\nRuns of exactly 4 in 10: %d' \
        % (runs(4,10), runsofexactly(4,10))

This outputs:

Runs of at least 4 in 10: 251
Runs of exactly 4 in 10: 139

I think it's correct - it comes out with the right answers for small numbers - but I may be missing an obvious flaw.

Matt
Friday, July 30, 2004

If you're interested in my mathematical justification for that algorithm by the way, just ask :-)

I'm working on a similar algorithm for the case when you know the exact amounts of 0s and 1s in the sequence now...

Matt
Friday, July 30, 2004

The probability to NOT get k heads sequence in a n toss is:
        k
P = F    / n^2
        n+2


where F is the fibonacci number of step k.

For k=4 n=10  P = 773/1024

Then 1- P = 251/1024 is the probability of getting AT LEAST 4 heads sequence in a 10 toss.

For more: http://mathworld.wolfram.com/CoinTossing.html

Dino
Friday, July 30, 2004

I guess Tim was a student scamming for homework answers. And one who can't even program. I was such a fool! When will I learn?

Dennis "Dumbass Patsy Fool" Atkins
Saturday, July 31, 2004

*  Recent Topics

*  Fog Creek Home