
floating point questions



re: Dick reviewed my snapshot draft of the standard and suggested I talk to you
    about floating point number representation. . . .  Do you have other 
    thoughts on how floating point data types should be specified/implemented?
    I don't see any clean-up proposals that have to do with this topic, just 
    the LEAST-POSITIVE-<mumble>-FLOAT communication.

There are three problems of concern that I know about:


First, for purposes of CLOS, it is very inconvenient to have sub-range 
types for any numerical type.  In one sense, the short-float, single-float,
double-float, and long-float types are sub-ranges of float; and the thorny 
issue is that there are three possible variations on how they are 
merged.  I don't know how to solve this one, except by ignoring the 
existence of differing float types [and there are probably at least one or 
two manufacturers who will fight that to the hilt, since they have optimized 
one extreme end or the other and perhaps see this distinction as a 
"competitive edge"].  I *could* make a case for having only FLOAT as a
Common Lisp type, leaving to the vendors the issue of foisting distinctions
off onto the user [since in many cases, the distinctions will be irrelevant].
Very briefly, the three main points of this case are 
    (1) As a standard, CLtL p16-17 guarantees virtually nothing about what 
	single-float, double-float etc. mean; in one implementation, single
	could mean a 23-bit mantissa, and in another it could mean a 96-bit
	mantissa.  Hence there is no guarantee of portability, so why bother?
    (2) A recent survey of some numerical analysts, in a company dedicated
	to selling Fortran engines, discovered the all-too-obvious fact that
	many many algorithms are numerically unstable when run under the IEEE 
	32-bit format, but are quite well-behaved under the 64-bit format; 
	but interestingly, it turned up *no* cases of ill behaviour in the
	64-bit mode that were correctable by going to a 128-bit format.
	[Now, this is not the same as an "ill-conditioned" problem].  In short,
	there is a "good enough" size -- larger isn't needed, and smaller
	could be justified only occasionally by savings in space and/or time.
    (3) On most machines, there seems to be a "preferred" format.  In fact,
	I'm aware of some hardware where single-float operations are a tad
	slower than double-float ones; the driving motivation is that the
	numerical analysts wanted the fastest possible floating point of
	"good enough" size, and the other sizes were supported only for
	"compatibility".  Also, compact representations inside arrays
	provide the only intesting space savings; this is quite analogous
	to packed arrays of integers [e.g., an array element-type of
	(signed-byte 8)]
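
[By way of illustration -- and this is only a sketch, since the exact
results are implementation-dependent, which is really the point of
(1) -- one could probe an implementation like so:

    ;; How many mantissa digits does each "type" actually carry?
    ;; CLtL only requires these to be nondecreasing, so two
    ;; conforming implementations may disagree wildly.
    (list (float-digits 1.0s0)         ; short-float
          (float-digits 1.0f0)         ; single-float
          (float-digits 1.0d0)         ; double-float
          (float-digits 1.0l0))        ; long-float

    ;; The instability of (2) in miniature: repeated accumulation
    ;; drifts visibly in the 32-bit format, far less in the 64-bit.
    (let ((s 0.0f0) (d 0.0d0))
      (dotimes (i 1000000)
        (incf s 0.1f0)
        (incf d 0.1d0))
      (list s d))

    ;; The packed-array analogy of (3): a specialized array may use
    ;; a compact representation, just as (signed-byte 8) arrays do.
    (make-array 1024 :element-type 'single-float
                     :initial-element 0.0f0)]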
[Since a larger group is being cc'd, I'd like to appeal to that group *not*
to flood the mailing list with lots of trivial counterexamples to each of
the above generalizations.  I'm sure they've all been thought of before;
and since the status quo will remain until an actual proposal is made,
there is no need to argue against a non-proposal.  If anyone would like
to contact me privately about pursuing such a proposal, I will respond;
but so far, I haven't seen much interest].



Second, some implementations permit "extremals" to be representable numbers, 
and others don't;  e.g., the IEEE standard allows for "denormalized" and 
"infinity" numbers, while VAX and IBM/370 don't.  So the question arises 
as to just what "least-positive-<mumble>-float" means;  is it the smallest 
possible representation, or is it the smallest "normal" representation?  
Paul Hilfinger (at Berkeley) feels rather strongly that it should be
the smallest possible representation; but that raises the issue that
on some implementations, "least-positive-<mumble>-float" is a perfectly
normal number causing no exceptions whatsoever, while on others it will
cause an "underflow" type trap whenever it is produced (unless you turn
off trapping, and yes, "gradual underflow" really is "underflow").  About 
the best consensus I could get was to follow Symbolics' lead and add the 
names "least-positive-normalized-<mumble>-float", so that there would be a 
portable way of dealing with the smallest reasonable number.   Also:
  (eql least-positive-<mumble>-float least-positive-normalized-<mumble>-float)
could be used as a test to determine whether or not the implementation 
supports denormalized numbers.
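
Written out, such a test might look as follows; this is only a
sketch, shown for single-floats and assuming the Symbolics-style
name is adopted:

    ;; True exactly when there is a denormalized ("gradual
    ;; underflow") range below the smallest normalized single-float.
    (defun single-float-denorms-p ()
      (not (eql least-positive-single-float
                least-positive-normalized-single-float)))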


A possible third problem is with "most-positive-<mumble>-float" -- should
this be the largest reasonable number, or the largest possible representation?
If the latter, then in the IEEE format, the positive infinity should be
"most-positive-<mumble>-float" since it certainly is larger than any other
float.  By analogy with the difference between "least-positive-..." and
"least-positive-normalized-...", I would have liked to see "most-positive-..."
and "most-positive-normalized-..."; that way, the test
  (= most-positive-<mumble>-float most-positive-normalized-<mumble>-float)
could be used to determine whether or not the implementation supports
infinities.  But alas, I couldn't convince QUUX (Guy Steele) about this one,
so I'm not sure it's worth wasting any more time over.
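
[For the record, the rejected dual test would have looked like the
following; MOST-POSITIVE-NORMALIZED-SINGLE-FLOAT is purely
hypothetical here, since that name was never agreed to:

    ;; Would be true exactly when the largest representable
    ;; single-float is an infinity rather than an ordinary number.
    (defun single-float-infinity-p ()
      (not (= most-positive-single-float
              most-positive-normalized-single-float)))]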


-- JonL --