floating point questions
- To: chapman%lisp.DEC@decwrl.dec.com
- Subject: floating point questions
- From: Jon L White <edsel!jonl@labrea.Stanford.EDU>
- Date: Fri, 8 Apr 88 22:36:11 PDT
- Cc: edsel!jonl@labrea.Stanford.EDU, cl-cleanup@sail.stanford.edu
- In-reply-to: chapman%lisp.DEC@decwrl.dec.com's message of Fri, 8 Apr 88 07:23:21 PDT <8804081423.AA17359@decwrl.dec.com>
re: Dick reviewed my snapshot draft of the standard and suggested I talk to you
about floating point number representation. . . . Do you have other
thoughts on how floating point data types should be specified/implemented?
I don't see any clean-up proposals that have to do with this topic, just
the LEAST-POSITIVE-<mumble>-FLOAT communication.
There are three problems of concern that I know about:
First, for purposes of CLOS, it is very inconvenient to have sub-range
types for any numerical type. In one sense, the short-float, single-float,
double-float, and long-float types are sub-ranges of float; and the thorny
issue is that there are three possible variations on how they are
merged. I don't know how to solve this one, except by ignoring the
existence of differing float types [and there is probably at least one or
two manufacturers who will fight that to the hilt, since they have optimized
one extreme end or the other and perhaps see this distinction as a
"competitive edge"]. I *could* make a case for having only FLOAT as a
Common Lisp type, leaving to the vendors the issue of foisting distinctions
off on to the user [since in many cases, the distinctions will be irrelevant].
Very briefly, the three main points of this case are
(1) As a standard, CLtL p16-17 guarantees virtually nothing about what
single-float, double-float etc. mean; in one implementation, single
could mean a 23-bit mantissa, and in another it could mean a 96-bit
mantissa. Hence there is no guarantee of portability, so why bother?
(2) A recent survey of some numerical analysts, in a company dedicated
to selling Fortran engines, discovered the all-too-obvious fact that
many many algorithms are numerically unstable when run under the IEEE
32-bit format, but are quite well-behaved under the 64-bit format;
but interestingly, it turned up *no* cases of ill behaviour in the
64-bit mode that were correctable by going to a 128-bit format.
[Now, this is not the same as an "ill conditioned" problem]. In short,
there is a "good enough" size -- larger isn't needed, and smaller
could be justified only occasionally by savings in space and/or time.
(3) On most machines, there seems to be a "preferred" format. In fact,
I'm aware of some hardware where single-float operations are a tad
slower than double-float ones; the driving motivation is that the
numerical analysts wanted the fastest possible floating point of
"good enough" size, and the other sizes were supported only for
"compatibility". Also, compact representations inside arrays
provide the only interesting space savings; this is quite analogous
to packed arrays of integers [e.g., an array element-type of
(signed-byte 8)].
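[A minimal sketch for points (1) and (3), not part of any proposal: FLOAT-DIGITS
and TYPE-OF show how a given implementation has actually merged the four
formats, and ARRAY-ELEMENT-TYPE shows whether a float array gets a specialized
(packed) representation. The name FLOAT-FORMAT-REPORT is made up for
illustration, and every result is implementation-dependent.]

```lisp
;; Hypothetical helper: list each float format's precision (in radix digits,
;; normally bits) and the type the implementation actually uses for it.
(defun float-format-report ()
  (loop for (name sample) in '((short-float  1.0s0)
                               (single-float 1.0f0)
                               (double-float 1.0d0)
                               (long-float   1.0l0))
        collect (list name (float-digits sample) (type-of sample))))

;; Formats that the implementation has merged report identical digits/types.
(print (float-format-report))

;; A specialized array may store floats unboxed; if the implementation does
;; not specialize, ARRAY-ELEMENT-TYPE returns T instead.
(print (array-element-type
        (make-array 4 :element-type 'double-float :initial-element 0.0d0)))
```

Two formats that print the same digit count and type are one format under two
names, which is the sub-range merging described above.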
[Since a larger group is being cc'd, I'd like to appeal to that group *not*
to flood the mailing list with lots of trivial counterexamples to each of
the above generalizations. I'm sure they've all been thought of before;
and since the status quo will remain until an actual proposal is made,
there is no need to argue against a non-proposal. If anyone would like
to contact me privately about pursuing such a proposal, I will respond;
but so far, I haven't seen much interest].
Second, some implementations permit "extremals" to be representable numbers,
and others don't; e.g., the IEEE standard allows for "denormalized" and
"infinity" numbers, while VAX and IBM/370 don't. So the question arises
as to just what "least-positive-<mumble>-float" means; is it the smallest
possible representation, or is it the smallest "normal" representation?
Paul Hilfinger (at Berkeley) feels rather strongly that it should be
the smallest possible representation; but now that raises the issue that
on some implementations, "least-positive-<mumble>-float" is a perfectly
normal number causing no exceptions whatsoever, while on others it will
cause an "underflow" type trap whenever it is produced (unless you turn
off trapping, and yes, "gradual underflow" really is "underflow"). About
the best consensus I could get was to follow Symbolics' lead and add the
names "least-positive-normalized-<mumble>-float", so that there would be a
portable way of dealing with the smallest reasonable number. Also:
(eql least-positive-<mumble>-float least-positive-normalized-<mumble>-float)
could be used as a test to determine whether or not the implementation
supports denormalized numbers.
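[A minimal sketch of that test, assuming an implementation that already
provides the LEAST-POSITIVE-NORMALIZED-... names under exactly these
spellings for the single and double formats. The helper name
DENORMALS-SUPPORTED-P is made up for illustration.]

```lisp
;; Hypothetical helper: true when the smallest positive float of a format
;; differs from (and so lies below) the smallest normalized one, i.e. when
;; denormalized numbers are representable in that format.
(defun denormals-supported-p (least-positive least-positive-normalized)
  (not (eql least-positive least-positive-normalized)))

;; Assumes both pairs of constants are defined by the implementation.
(print (list :single (denormals-supported-p
                      least-positive-single-float
                      least-positive-normalized-single-float)
             :double (denormals-supported-p
                      least-positive-double-float
                      least-positive-normalized-double-float)))
```

On a VAX-style format without denormals both entries would be expected to
print NIL; on an IEEE implementation with gradual underflow, T.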
A possible third trouble is with "most-positive-<mumble>-float" -- should
this be the largest reasonable number, or the largest possible representation?
If the latter, then in the IEEE format, the positive infinity should be
"most-positive-<mumble>-float" since it certainly is larger than any other
float. By analogy with the difference between "least-positive-..." and
"least-positive-normalized-...", I would have liked to see "most-positive-..."
and "most-positive-normalized-..."; that way, the test
(= most-positive-<mumble>-float most-positive-normalized-<mumble>-float)
could be used to determine whether or not the implementation supports
infinities. But alas, I couldn't convince QUUX (Guy Steele) about this one,
so I'm not sure it's worth wasting any more time over.
-- JonL --