Funcall Speed-Up
I have a few words of commentary about how Lucid's implementation treats
the problem you brought up in this note, namely optimizations for FUNCALL
when the multiple-value context is simple.
But, as usual, I am unable to reproduce your results in any satisfying
way. Part of the problem is of course that you are running on a Sun3/280
and I am running on a Sun3/480. But I strongly suspect that you were using
the so-called SFC compiler rather than the PQC -- the former is a "fast"
compiler producing not-particularly-optimized code (desirable for use while
doing rapid development), whereas the latter is a bells-and-whistles compiler
for use when you want the fastest possible speed. Although I would expect
you to use the latter compiler for benchmark comparison purposes, your note
didn't show awareness of the availability of the two options, so I must
presume that you used the "default" state, which is the one producing the
slower code. [You mention setting the SPEED and SAFETY optimize qualities,
but don't mention COMPILATION-SPEED; the SFC/PQC selection is based only on
the COMPILATION-SPEED quality, and the SFC effectively ignores SPEED and
SAFETY declarations.]
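Incidentally, for anyone who wants to try this, here is a minimal sketch of
how one might request the production compiler -- assuming, as described
above, that the SFC/PQC selection hinges only on the COMPILATION-SPEED
quality, and assuming that a low value favors the PQC (the particular
values and the file name are just an illustration, not documented defaults):

    (proclaim '(optimize (compilation-speed 0)   ; prefer the slower, optimizing PQC
                         (speed 3)               ; then ask it for the fastest code
                         (safety 1)))

    (compile-file "bench.lisp")                  ; recompile the benchmark under
                                                 ; these proclaimed qualities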
Accordingly, I have included a table below which may help us resolve the
differences.
First, Lucid's implementation is such that there is practically no
performance difference between doing:
(setq foo #'mumble)
(funcall foo ...)
and:
(mumble ...)
The only difference is the runtime certification that 'foo' holds a
compiled-function (in the FUNCALL case). At the very micro level this
might amount to anywhere between 8% and 24%, depending on alignment of
the instruction cache (one must always be cognizant of the pitfalls of
"micro" benchmarking!). This cost could be elided if the user could insert
a declaration like:
(funcall (the procedure foo) ...)
but that capability isn't in the current product.
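To make the comparison concrete, here is a hypothetical micro-benchmark in
the spirit of the forms above (MUMBLE, the helper functions, and the loop
count are stand-ins, not the code from your note):

    (defun mumble (x) x)                       ; trivial stand-in callee

    (defun direct-calls (n)
      (dotimes (i n) (mumble i)))              ; ordinary named call

    (defun indirect-calls (n)
      (let ((foo #'mumble))                    ; as in (setq foo #'mumble)
        (dotimes (i n) (funcall foo i))))      ; call through the variable; this is
                                               ; where the compiled-function check
                                               ; mentioned above is paid

    (time (direct-calls 1000000))
    (time (indirect-calls 1000000))

As noted, one should expect only a small difference between the two TIME
reports, and a difference that is itself sensitive to instruction-cache
alignment.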
Lucid's "Production" compiler has long taken cognizance of the calling
context of both forms -- (mumble ...) as well as (funcall foo ...) --
and elides any extra work that multiple-value returns would cause when
the call form is in a "1-return-value-wanted" or "dont-care" situation.
Furthermore, there is usually only a minimal cost when multiple return
values are being "spread out", as in a MULTIPLE-VALUE-BIND and the like.
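As a hypothetical illustration of those three contexts (the function names
here are made up for the example):

    (defun two-values (x)
      (values x (* x x)))               ; a callee returning two values

    (defun one-value-wanted (x)
      (+ 1 (two-values x)))             ; "1-return-value-wanted": extra value dropped

    (defun for-effect (x)
      (two-values x)                    ; "dont-care": the result is discarded
      nil)

    (defun spread-out (x)
      (multiple-value-bind (a b)        ; values "spread out" into bindings;
          (two-values x)                ; per the above, only a minimal extra cost
        (+ a b)))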
Here is a table of TIME outputs for (report 1000000). I was using:
    ;;; Sun Common Lisp, Development Environment 3.0.1, 2 August 1988
Note that all times are in seconds, and there were no page faults
and no consing in any of the runs.
                          Sun3/480    Sun3/480    Sun3/280     Sun3/280
                          SFC times   PQC times   Bob's times  .95*Bob's times

do-cost
  Elapsed Real Time =       1.98        0.50        1.04
  Total Run Time    =       1.98        0.50        0.72
  User Run Time     =       1.98        0.50        0.70
  System Run Time   =       0.00        0.00        0.02

regular-call-cost
  Elapsed Real Time =       5.28        3.46        6.36
  Total Run Time    =       5.28        3.46        5.44
  User Run Time     =       5.28        3.46        5.40         5.13
  System Run Time   =       0.00        0.00        0.04

funcall-cost
  Elapsed Real Time =       5.70        4.14        6.28
  Total Run Time    =       5.70        4.14        6.26
  User Run Time     =       5.70        4.14        6.24         5.93
  System Run Time   =       0.00        0.00        0.02
Now, I have some reason to believe that certain Lisp programs would
run only 5% or 10% faster when moved from a Sun3/280 to a Sun3/480.
Although I have no explanation for the anomalous reading on your DO-COST
measurement, you can see, from the comparative values of the "User
Run Time" lines for the "regular-call-cost" and "funcall-cost" runs,
why I suspect you were using the SFC.
If this explanation is correct, it would make AKCL's minimal
function-to-function overhead about 20% lower than Lucid's (at the
"micro" level); that seems quite reasonable. If it isn't correct, then
your numbers would imply that AKCL's "micro" time is twice as fast
as Lucid's, which doesn't seem reasonable.
-- JonL --