Fog Creek Software
Discussion Board

Testing floating point numbers for equality

In general I know it's dangerous to test floating point numbers for being equal to a known value because of the inherent inaccuracy of the representation for some values.

However, if we only ever assign a variable the values -1.0, 0.0 or +1.0, and only ever test it for equality with -1.0 or +1.0, is it still dangerous, given that there's no ambiguity in representing these values in FP?

note: We could just use an integer type but practically every other variable involved needs to be a double and some of the group are concerned about the needless casting overhead in tight loops. This is in a class being passed from .NET managed C++ to unmanaged C++ in case it's relevant.

Monday, July 12, 2004

It's the same problem, when you assign a floating point number you don't know what it is stored as.

Except by coincidence the value stored as 1.0 in floating point is no more likely to be exactly 1.0 than storing 1.1234567890 is exactly that.

To compare them, multiply by a power of ten matching the number of digits of precision you want and convert to ints.

Martin Beckett
Monday, July 12, 2004

> needless casting overhead in tight loops

I would write a small program to test this assumption. I would be surprised if other things in your loop didn't overshadow the casting time.

This is called premature optimisation, you can read about it here:

Matthew Lock
Monday, July 12, 2004

>Except by coincidence the value stored as 1.0 in floating point is no more likely to be exactly 1.0 than storing 1.1234567890 is exactly that.

You could also use constants (YES NO MAYBE, GT EQ LT) to ensure they're exactly the same value.


Monday, July 12, 2004

If you're only testing for those three values you can just use: x!=0, x>0, x<0 and you'll be fine.
Checking for equality (or greater/smaller than) against zero is inherently faster anyway, so why not?


John Q Tester
Monday, July 12, 2004

>You could also use constants (YES NO MAYBE, GT EQ LT) to ensure they're exactly the same value.

The important point about floating point is that when you ask the computer to store 1.0, it stores an approximation to it. You don't know what the stored value is.
Setting a #define to 1.0 and using this to set the floating value and compare with doesn't help.

ps. If you meant INSTEAD of using floating point, my apologies, I misunderstood you.

Martin Beckett
Monday, July 12, 2004

I believe that 1.0, 0.0 and -1.0 (and indeed any integers, up to a certain limit) will always be stored as exact values in floating point. So if you really are only assigning these values once, then you shouldn't have any problems - but can you guarantee that will be true in the long term?

Of course you're not guaranteed that things like 0.1 * 10 == 1.0 though.
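
To make the distinction concrete, here is a minimal C sketch, assuming IEEE 754 doubles. The function names are made up for illustration. It uses repeated addition rather than 0.1 * 10, because a single multiplication can happen to round to exactly 1.0 while the running sum typically does not.

```c
#include <stdbool.h>

/* Small integers assigned from literals are stored exactly. */
bool literal_one_is_exact(void)
{
    volatile double x = 1.0;   /* volatile: force a real store and load */
    return x == 1.0 && -x == -1.0;
}

/* Add 0.1 to itself ten times; each addition rounds to the nearest double,
   so the result is the outcome of a calculation, not an assigned literal. */
double sum_of_ten_tenths(void)
{
    volatile double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += 0.1;
    return sum;
}
```

On a typical IEEE 754 platform the first function returns true, while comparing the second function's result to 1.0 with == fails.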

Matthew Wilson
Monday, July 12, 2004

The solution I like best is the one that suggested comparing to zero.  Assign the values -1.0, 0, 1.0.  Then, compare them to zero:

if (MyFloat < 0.0 ) ....

if (MyFloat == 0.0) ...

if (MyFloat > 0.0) ...

The '0.0' literal is if you really want to be that specific, and ensure a floating-point compare is done.  (I'm not sure it matters.  I also agree with the poster who said you should write a short test program, and see which variant runs faster).
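
The three comparisons can be wrapped in one helper. This is only a sketch; the name sign_of is hypothetical and not from the original code.

```c
/* Returns -1, 0 or +1 according to the sign of x.
   Only comparisons against zero are used, so the exact magnitude
   stored in x never matters. */
int sign_of(double x)
{
    if (x < 0.0)
        return -1;
    if (x > 0.0)
        return 1;
    return 0;   /* +0.0 and -0.0 both end up here (and so would NaN) */
}
```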

Monday, July 12, 2004

Or, you could use interval arithmetic and not have to worry about precision problems anymore.

Monday, July 12, 2004

I don't know of any floating-point system ever used in which the numbers 0 and +-1 weren't represented exactly. If your platform is using IEEE floating-point (which it is) then those values are guaranteed to be exact.  So is any number of the form integer / 2^k where k and the length of the integer aren't too large.

Testing against 0 is surely the right way to go, anyway.

It's already been pointed out that using a floating-point number rather than an integer may be premature optimization. That's true. But if you *are* going to engage in premature optimization, I suspect that there may be better ways to do it :-). Is this -1/0/+1 variable going to be used as a multiplication factor? Is the factor going to remain the same through each iteration of whatever inner loop does the multiplying? If so, then it might be a win to turn

    for each element:
        a[i] = b[i] + factor*c[i]

into

    if factor==0:
        copy b into a
    else if factor<0:
        for each element: a[i] = b[i]-c[i]
    else:
        for each element: a[i] = b[i]+c[i]

But, for goodness' sake, don't start doing this sort of thing without some evidence that you need to :-).
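
In C, the transformation described above might look like the sketch below. The function name combine and its signature are assumptions for illustration, and it presumes factor really is exactly -1.0, 0.0 or +1.0.

```c
#include <stddef.h>
#include <string.h>

/* Computes a[i] = b[i] + factor*c[i] for n elements, with the test on
   factor hoisted out of the loop. Assumes factor is -1.0, 0.0 or +1.0. */
void combine(double *a, const double *b, const double *c, size_t n,
             double factor)
{
    if (factor == 0.0) {
        memmove(a, b, n * sizeof *a);   /* memmove tolerates a == b */
    } else if (factor < 0.0) {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] - c[i];
    } else {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }
}
```

Whether this beats the plain multiply is something only a measurement on the real loop can tell.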

If performance is critical, then you shouldn't consider interval arithmetic. I suspect "sir_flexalot" had tongue firmly in cheek.

Gareth McCaughan
Monday, July 12, 2004

Theoretically, it would be possible to implement floating point data that did not store an integer as an exact value, but I have never heard of any implementation that did such a thing.  Integers are stored as exact values in floating point up to the number of digits that fit in the mantissa.  A double can store more digits than a four byte integer and is useful for storing large integers.

For most cases you shouldn't have any trouble comparing an integer value stored in a double.  If these are just stored values you could even use fractions.  A number like 0.1 isn't stored as an exact value, but it is always stored as the same value.

You run into trouble when doing computations (e.g., 10 * 0.1) and sometimes when doing conversions.  I was working on a project that kept large integers in double data types and stored them in a database.  Our equality tests were failing.  We tracked it down to a problem with a data format conversion.  The number coming out was not the same as the number that went in.  But that was an error in the implementation, not inherent to floating point.

Monday, July 12, 2004

See here for some details on the IEEE format for double-precision floating point numbers: .

Note that it has more bits of precision (52) than there are in a long integer on 32-bit systems (32). This means a double can actually hold anything that a long int can, with no loss of precision. Proviso (1): I'm assuming the double is IEEE. Proviso (2): this won't work on 64-bit compilers, where the "long int" may have 64 bits.

But +1.0, -1.0 are certainly OK. If you're using IEEE floats, anyway.

Matthew Morris
Monday, July 12, 2004

Use an aggregated class (System.Decimal) ... Compare against 1.0M?

Monday, July 12, 2004

Ok, forgive my stupidity here, but how is it that a value of 1.0 can be stored as "not quite" 1.0 in a FP variable?  I've never read anything like that.

muppet from
Monday, July 12, 2004

Also keep an eye on how the values are getting set. If they are assigned from constants you are taking a pretty small risk. But if those values come from any kind of a calculation...

Anyway, I'd use integers because most programmers will trip over the code when they see a comparison to an exact number.

Tom H
Monday, July 12, 2004

Do NOT use == with floating point!

In floating point there is both +0.0 and -0.0. This is just the beginning of your woes.

Use an epsilon and do all floating point comparisons using a set of functions.

Dennis Atkins
Monday, July 12, 2004


They are right that 1 and 2 and other small powers of 2 are always going to be stored exactly. But 1/3 is not stored exactly, for example.

Dennis Atkins
Monday, July 12, 2004

Then again that's only if they were just put there. Pass them around a little and that 1.0 becomes 0.999999999999987 before you know it.

Dennis Atkins
Monday, July 12, 2004

ok, but how?  How can the value of 1.0 change if it is only being assigned once, and then compared against other values or passed to other functions?

Is this a C/C# thing?

muppet from
Monday, July 12, 2004

It's generally very good advice not to compare floating-point numbers with ==, but *in this particular situation* there's nothing wrong with it. The fact that IEEE floating point has signed zeroes doesn't cause any problems here; -0 and +0 compare equal. And the second part of Dennis's advice ("use an epsilon and do all floating-point comparisons through a set of functions") needs to be applied with caution; what it *doesn't* mean (or, at least, shouldn't) is to define some universal value epsilon and replace x==y with fabs(x-y)<=epsilon everywhere. If you do that, then you will lose when x and y are either very large or very small. So you could use a relative value for epsilon instead, and replace x==y with fabs(x-y) <= epsilon*(fabs(x)+fabs(y)), but that's not right for all situations either and it's quite expensive. The only reliable advice, I'm afraid is: Understand what you're doing. Then it's easy. Well, easier. :-)
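
For reference, the two tests mentioned above look like this in C. This is a sketch only: the function names are made up, and picking epsilon is still up to whoever understands the data.

```c
#include <math.h>
#include <stdbool.h>

/* Absolute test: only meaningful when the magnitude of x and y is known. */
bool nearly_equal_abs(double x, double y, double epsilon)
{
    return fabs(x - y) <= epsilon;
}

/* Relative test: the tolerance scales with the size of the operands. */
bool nearly_equal_rel(double x, double y, double epsilon)
{
    return fabs(x - y) <= epsilon * (fabs(x) + fabs(y));
}
```

With epsilon = 1e-9, the absolute test calls 1e-20 and 2e-20 equal even though one is twice the other, and rejects 1e20 versus 1e20 + 1e10 even though they agree to ten digits. The relative test gets both of those right, but it can never accept a small nonzero value as equal to exactly zero.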

Muppet: The value of 1.0 won't change. If you put 1.0 in a variable and refrain from assigning to that variable, its value won't change. What some people have been expressing concern about is: If those 1.0 and -1.0 values don't actually come from literal constants, but are the results of calculations that *in theory* (with infinite precision, etc.) yield +1 and -1, then those calculations may actually produce slightly incorrect results. I don't think that's a real concern in "mutabled"'s application.

I hope Dennis's last remark is either a joke or an exaggeration for effect. The values will not change just by being passed around.

<pedant>Passing floating-point values around *can* change them, if it results in conversion between different FP formats: double to single, extended double to double. But that won't be an issue for small integer values like 1, which are representable exactly in every FP format in the known universe. And if you're using FORTRAN -- at least some versions thereof -- the value of 1.0 *can* change, kinda-sorta, because FORTRAN does all argument passing by pointers, so when you pass 1.0 to a function it actually gets a pointer to the value 1.0, and ... well, you can guess the tragic remainder of the story.</pedant>

Gareth McCaughan
Monday, July 12, 2004

In retrospect, I think I'm very glad that I never expended the effort to learn C/C++ completely.  I stopped when I hit Windows GUI programming.  :P

Higher level languages are just fine by me.

muppet from
Monday, July 12, 2004

Subtract the two numbers and see if the result is less than some reasonable error.

Monday, July 12, 2004

Not a joke. Consider the case when, as it gets passed down a calling chain with no apparent operations performed, it gets downcast to be a float and then upcast to be a double, etc. Before you know it it's turned into a different number when it gets compared to the original and you didn't do anything to it. This is not going to happen with powers of 2 but I have seen this problem occur and it's why I will grep on 'float' and tear that out of everywhere and replace it with double. Even so, you can still get stuck if you are relying on library functions to use doubles and not floats internally.

On the comparison function, yes epsilon is going to be your parameter.  You can calculate it from the range you're thinking of using with a macro in order to make the calculation a bit easier.
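
The float/double round trip is easy to demonstrate. A minimal sketch, assuming IEEE single and double precision; the function name is hypothetical.

```c
#include <stdbool.h>

/* True if d is unchanged after being narrowed to float and widened back. */
bool survives_float_round_trip(double d)
{
    volatile float narrowed = (float)d;   /* rounds to 24 significant bits */
    return (double)narrowed == d;
}
```

Values such as 1.0, -1.0 and 0.5 fit in a float exactly and come back unchanged. A value like 0.1, which is already an approximation as a double, loses its low bits and comes back different.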

Dennis Atkins
Monday, July 12, 2004

Oh sorry, you did mention that in your pedant section. I should have read the whole post. My apologies.

Dennis Atkins
Monday, July 12, 2004


This isn't a C/C++ issue, it's an issue with floating point numbers on any environment/platform.

Basically, it's a mistake to always reach for floating point when you have a decimal number. There are many different ways to encode decimal numbers and they all involve trade-offs. Unfortunately, floating point (usually the IEEE standard) is the only built-in decimal data type in most languages.

Now, don't get me wrong, floating point is generally good enough for most purposes. But at some point in one's programming career, one needs to learn floating point's limitations and how to deal with them. And learn the other decimal representation options available and when to use them.

Bill Tomlinson
Monday, July 12, 2004

Small intro:

Old Motorola 68000 had 10 byte doubles and when I was
on 68k Macs I used == with doubles without any problem.

Then came PowerPC and doubles became "more compatible"
with the rest of the world. But instead of just giving up, I
decided to keep == wherever I could. I started comparing
doubles like this:

if (dblround(value1, 2) == dblround(value2, 2))

First version of dblround() was just a sprintf() call, but it
was too slow. Somewhere I found how to do it without sprintf
and it seems to work. Whatever I have in one value, after
rounding it becomes equal to the second value.

double x = 7.;
if (x+5. == 12.)

It was never true, at least on PPC. But when I round them,
they both become 12.0000000...000789... so they're equal.

Source for dblround:

#include <math.h>    /* floor */
#include <stdio.h>   /* sprintf */
#include <stdlib.h>  /* atof */

double dblround (double value, short decPlc)
{
  double  power;
  double  addValue;
  double  tmpValue;

  switch (decPlc)  {
      case 1:  power = 10.;  addValue = 0.05; break;
      case 2:  power = 100.;  addValue = 0.005; break;
      case 3:  power = 1000.;  addValue = 0.0005;  break;
      default:  {
        /* any other number of places: fall back to the slow sprintf route */
        char    vStr[512];
        sprintf (vStr, "%.*f", decPlc, value + .0000000001);  /* %f */
        return (atof(vStr));
      }
  }

  tmpValue = floor ((value + addValue + .0000000001) * power);
  if (tmpValue < -100e100 || tmpValue > 100e100)
      tmpValue = 0.;

  return (tmpValue / power);
}

This 0.0000000001 hack is there to round 0.499999...
as 0.5 and therefore as 1.0.

The thing does not work quite well if param "decPlc" is greater
than 10, but at least I don't need any values greater
than 5. With normal values for money and similar values
it's good enough.

(s)printf in CodeWarrior sometimes would expand %f into
some 300 characters even when I tell it not to (%.*f) due
to some bug or whatever reason. It happens "almost
never" and only for few special values. At first I had buffer
for only 256 characters and my app would sometimes crash.
It took me a lot of time in debugger to find out what's
going on. Now I have a vStr's size set to 512.

Monday, July 12, 2004

That's got to be the most perverse code I've ever seen. :-)

Chris Tavares
Monday, July 12, 2004

>>With normal values for money and similar values
it's good enough.

Why store currency as a float? Is there a reason not to store and do all calculations using integers (i.e., cents, not dollars -- or whatever the equivalent in local currency may be) and just convert for display?
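
A rough sketch of what that could look like in C, assuming amounts are held as whole cents in a 64-bit integer. The function names are made up for illustration.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Sum prices held as whole cents; integer addition never rounds. */
int64_t total_cents(const int64_t *prices, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += prices[i];
    return total;
}

/* Write a cent amount as "dollars.cc" into buf; returns what snprintf returns. */
int format_cents(char *buf, size_t size, int64_t cents)
{
    const char *sign = cents < 0 ? "-" : "";
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t magnitude = cents < 0 ? 0 - (uint64_t)cents : (uint64_t)cents;
    return snprintf(buf, size, "%s%" PRIu64 ".%02" PRIu64,
                    sign, magnitude / 100, magnitude % 100);
}
```

Ten items at 10 cents each total exactly 100 cents, so == on the totals is safe. The conversion to a decimal string happens only at display time.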

John C.
Monday, July 12, 2004

(by "float" above, I also mean "double" or any other fp type)

John C.
Monday, July 12, 2004

I'd like to know the mechanism (bitwise) by which (double)1.0 is not equal to exactly (float)1.0.  I have never heard of that, doesn't seem reasonable really.

As a side note, man this board has gone to hell.  The first several responses are all telling the guy that 1.0 can't be represented exactly as a floating point number!  Why does everyone have to add their own (wrong) 2 cents?  Has happened on so many threads recently...

(note: not saying you should rely on it, or if it's good style to, but there's a difference between that and saying it's not possible)

Monday, July 12, 2004

I can't remember the exact bug I ran into once,  but it was some problem caused by 6400 being represented exactly, but 1/6400 could not.  Something to the effect of taking a number and multiplying it by 6400, truncating it,  and then multiplying it by 1/6400, and subtracting it from the original number and ended up with a remainder greater than 1/6400th.  I'm not sure that is exactly how it worked, but because 1/6400th could not be represented exactly, multiplying and in effect dividing by the same number did not bring me back to the same number, and the truncate in there brought it down a whole int value.

Keith Wright
Tuesday, July 13, 2004


I can't believe Keith posted that right after Roose's comments. Or was it a supremely subtle exercise in irony?

Tuesday, July 13, 2004

unsigned 32bit integers cannot handle more than ~$42M bucks which makes them unsuitable for real world lottery or banking accountancy where you track every cent.

No drama if you see it coming - simply write the app with Binary Coded Decimal (BCD) arithmetic. Legacy systems have to adapt or die as inflation beggars us all.

Some countries had the gumption to replace worn-out currencies with chunkier units e.g. 100 Old Francs became one New Franc.  Lowers your chance of becoming a millionaire and cripples microbusiness, though.

Is 64-bit integer arithmetic arriving too late? 
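
The arithmetic behind the ~$42M figure, and the headroom a 64-bit count of cents would give instead, can be worked out from the limits in <stdint.h>. The function names below are hypothetical.

```c
#include <stdint.h>

/* Largest whole-dollar amount when cents are counted in an unsigned 32-bit int:
   4,294,967,295 cents, i.e. 42,949,672 dollars and 95 cents. */
uint64_t max_dollars_u32(void)
{
    return UINT32_MAX / 100;
}

/* Largest whole-dollar amount when cents are counted in a signed 64-bit int. */
int64_t max_dollars_i64(void)
{
    return INT64_MAX / 100;
}
```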

Wednesday, July 14, 2004

yeah, it's getting late

john pollock
Thursday, August 5, 2004
