# Accuracy of floating point computation in MCL

• To: Michael Lewis <melewis@jeeves.waterloo.edu>
• Subject: Accuracy of floating point computation in MCL
• From: Robert W. Kerns <rwk@crl.dec.com>
• Date: Fri, 29 Nov 1991 15:22-0500
• Cc: info-mcl@cambridge.apple.com

```    Date: Thu, 28 Nov 1991 16:36 EST
From: melewis@jeeves.waterloo.edu (Michael Lewis)

There seems to be a pervasive and important error in floating point
calculations in MCL.

There is a pervasive and important error in floating point
calculations.  Period.

This has nothing to do with MCL, nor with the Macintosh,
and has everything to do with the nature of floating point.

The phenomonon is called "roundoff error".  Floating point
is by definition APPROXIMATE arithmetic.  The biggest source
of your confusion is that there is no way to exactly represent
the number .1 in binary; you have to resort to "repeating binaries"
(in analogy to "repeating decimals").

Consider how you represent 1/3 in base 10, as 0.33333333333.....
1/5 and 1/10 have the same problem in base 2.  No matter how
many bits you allocate to the mantissa of the floating point
number, you will have some remaining error.  If you add up
this error enough times, it will get larger.

And my favourite is

? (> 3/10 .3)
T

This is, in fact, true.  0.3 may be the representation we
normally use for 3/10's, but in fact, there is no way to
represent that as a floating point number.  It's really
0.29999999999999999.... truncated somewhere.

However, the rational number 3/10 is precise.  Rational
numbers do not suffer from this difficulty.

Imagine the consequences!

Yes, it is important for you to imagine the consequences
when you write floating-point code.

Note however that
(- 1/2 .5) yields 0.0

.5 (1/2)  can be exactly represented in binary.

as does
(- 1/8 .125)

As can any power of 1/2.

As our application here depends *critically* on *accurate*
numerical calculations we were wondering exactly how we could get MCL
to "do the right thing"!

Well, you have a few different choices.

1)  You could avoid floating point calculations entirely, and use
rationals.

2)  You could use floating point, but avoid exact comparisions,
which are meaningless with floating point.  This means looking
at your algorithm to see how much roundoff error could be
accumulating, and checking to see if numbers are within that
range, rather than exactly equal.  MCL's floating point numbers
are pretty precise; very seldom do you need *THAT* much precision.

Option # 1 is slower, but exact.  If you're doing number theory,
for example, it will do what you want.

Option # 2 is faster, but inexact.  If you're doing long numerical
calculations, it is more appropriate.

There are many textbooks which cover this topic in more detail.
It certainly should be covered in almost any basic programming
textbook in basic form, although often it is not.  If your algorithm
is complex, you might need considerable help in figuring out how
large your error bounds are, or how to make them smaller.  Check
out a textbook on numerical programming for more assistance on
these topics.

```