Funcall Speed-Up
I have a few words of commentary about how Lucid's implementation treats
the problem you brought up in this note, namely optimizations for FUNCALL
when the multiple-value context is simple.
But, as usual, I am unable to reproduce your results in any satisfying
way. Part of the problem is of course that you are running on a Sun3/280
and I am running on a Sun3/480. But I strongly suspect that you were using
the so-called SFC compiler rather than the PQC -- the former is a "fast"
compiler producing not-particularly-optimized code (desirable for use while
doing rapid development), whereas the latter is a bells-and-whistles compiler
for use when you want the fastest possible speed. Although I would expect
you to use the latter compiler for benchmark comparison purposes, your note
didn't show awareness of the availability of the two options, so I must
presume that you used the "default" state, which is the one producing the
slower code. [You mention setting the SPEED and SAFETY optimize qualities,
but don't mention COMPILATION-SPEED; the SFC/PQC selection is based only on
the COMPILATION-SPEED quality, and the SFC effectively ignores SPEED and
SAFETY declarations.]
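Incidentally, for anyone who wants to try this, here is a minimal sketch of
how one might request the production compiler -- assuming, as described
above, that the SFC/PQC selection hinges only on the COMPILATION-SPEED
quality, and assuming that a low value favors the PQC (the particular
values and the file name are just an illustration, not documented defaults):

    (proclaim '(optimize (compilation-speed 0)   ; prefer the slower, optimizing PQC
                         (speed 3)               ; then ask it for the fastest code
                         (safety 1)))

    (compile-file "bench.lisp")                  ; recompile the benchmark under
                                                 ; these proclaimed qualities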
Accordingly, I have included a table below which may help us resolve the
differences.
First, Lucid's implementation is such that there is practically no
performance difference between doing:
(setq foo #'mumble)
(funcall foo ...)
and:
(mumble ...)
The only difference is the runtime certification that 'foo' holds a
compiled-function (in the FUNCALL case). At the very micro level this
might amount to anywhere between 8% and 24%, depending on alignment of
the instruction cache (one must always be cognizant of the pitfalls of
"micro" benchmarking!). This cost could be elided if the user could insert
a declaration like:
(funcall (the procedure foo) ...)
but that capability isn't in the current product.
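To make the comparison concrete, here is a hypothetical micro-benchmark in
the spirit of the forms above (MUMBLE, the helper functions, and the loop
count are stand-ins, not the code from your note):

    (defun mumble (x) x)                       ; trivial stand-in callee

    (defun direct-calls (n)
      (dotimes (i n) (mumble i)))              ; ordinary named call

    (defun indirect-calls (n)
      (let ((foo #'mumble))                    ; as in (setq foo #'mumble)
        (dotimes (i n) (funcall foo i))))      ; call through the variable; this is
                                               ; where the compiled-function check
                                               ; mentioned above is paid

    (time (direct-calls 1000000))
    (time (indirect-calls 1000000))

As noted, one should expect only a small difference between the two TIME
reports, and a difference that is itself sensitive to instruction-cache
alignment.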
Lucid's "Production" compiler has long taken cognizance of the calling
context of both forms -- (mumble ...) as well as (funcall foo ...) --
and elides any extra work that multiple-value returns would cause when
the call form is in a "1-return-value-wanted" or "dont-care" situation.
Furthermore, there is usually only a minimal cost when multiple return
values are being "spread out", as in a MULTIPLE-VALUE-BIND and the like.
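As a hypothetical illustration of those three contexts (the function names
here are made up for the example):

    (defun two-values (x)
      (values x (* x x)))               ; a callee returning two values

    (defun one-value-wanted (x)
      (+ 1 (two-values x)))             ; "1-return-value-wanted": extra value dropped

    (defun for-effect (x)
      (two-values x)                    ; "dont-care": the result is discarded
      nil)

    (defun spread-out (x)
      (multiple-value-bind (a b)        ; values "spread out" into bindings;
          (two-values x)                ; per the above, only a minimal extra cost
        (+ a b)))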
Here is a table of TIME outputs for (report 1000000). I was using:
    ;;; Sun Common Lisp, Development Environment 3.0.1, 2 August 1988
Note that all times are in seconds, and there were no page faults
and no consing in any of the runs.
                          Sun3/480    Sun3/480    Sun3/280     Sun3/280
                          SFC times   PQC times   Bob's times  .95*Bob's times

do-cost
  Elapsed Real Time =       1.98        0.50        1.04
  Total Run Time    =       1.98        0.50        0.72
  User Run Time     =       1.98        0.50        0.70
  System Run Time   =       0.00        0.00        0.02

regular-call-cost
  Elapsed Real Time =       5.28        3.46        6.36
  Total Run Time    =       5.28        3.46        5.44
  User Run Time     =       5.28        3.46        5.40         5.13
  System Run Time   =       0.00        0.00        0.04

funcall-cost
  Elapsed Real Time =       5.70        4.14        6.28
  Total Run Time    =       5.70        4.14        6.26
  User Run Time     =       5.70        4.14        6.24         5.93
  System Run Time   =       0.00        0.00        0.02
Now, I have some reason to believe that certain Lisp programs would
run only 5% or 10% faster when moved from a Sun3/280 to a Sun3/480.
Although I have no explanation for the anomalous reading on your DO-COST
measurement, you can see, from the comparative values of the "User
Run Time" lines for the "regular-call-cost" and "funcall-cost" runs,
why I suspect you were using the SFC.
If this explanation is correct, it would make AKCL's minimal
function-to-function overhead about 20% lower than Lucid's (at the
"micro" level); that seems quite reasonable. If it isn't correct, then
your numbers would imply that AKCL's "micro" time is twice as fast
as Lucid's, which doesn't seem reasonable.
-- JonL --