Re: PCL benchmark



    Date: Mon, 10 Oct 88 15:07:58 PDT
    From: Chris Burdorf <burdorf@rand-unix.ARPA>
          
    I hope people are getting upset about this, because performance
    is a BIG issue, especially for simulation.  

It does make me upset when a user is unhappy about a performance problem
they are having.  PCL performance is not what it could be, but as you
heard at the workshop, there are plans to improve it dramatically by
focusing on the implementation specific parts of it.   Other
implementation efforts, such as TI CLOS, provide encouragement that the
result of these efforts will provide quite high performance.

    Date: Thu, 13 Oct 88 14:59:46 PDT
    From: Chris Burdorf <burdorf@rand.org>

    Ernie 7 seconds
    Allegro PCL 20 seconds
    AKCL PCL 30 seconds.

Hmm.  It runs for me in .4 seconds in Lucid Lisp (SUN 4/110).  I wonder
what we are doing differently.  There are a number of issues involved in
assessing the performance of any implementation of CLOS.  Many of these
have been raised in some of the earlier replies to your message.  In
this message, I will try to address some other issues, and try to
summarize what needs to be done next.


*** Comparing Apples and Oranges ***

This is a serious problem when measuring CLOS performance.  Danny and I
have been saying that it is important to measure "comparable programs".
Among other things, we mean that when re-writing a program from some
other language to CLOS, it is important to be sure that the re-written
program is as close as possible in functionality to the original.  If
CLOS doesn't let you get the re-write close, that reflects a problem
with CLOS.  But it's important to try to use CLOS in the way that gets
the most correct re-write.  Of course this can take a couple of passes
since we are all just learning how best to use the language.

Taking advantage of the high performance aspects of a language can be
deceptive.  Stan Lanning has an interesting example of this which I hope
he will send out on Monday.

    Date: Mon, 10 Oct 88 15:07:58 PDT
    From: Chris Burdorf <burdorf@rand-unix.ARPA>

    I just ran a simulation benchmark in PCL.  It is a queueing simulation.
    To get it to run under PCL, I took the simulator from our own object
    system called ERNIE and converted it into PCL.  I then converted the
    queueing simulation into PCL.

    The timings I got were as follows:

             Ernie 4 seconds
             PCL   30 seconds

    ERNIE runs under PSL.  The version of PCL I have runs under AKCL.

There are many possible problems here.  Warren Harris mentioned the most
prominent one.  What is Ernie?  In one of your later messages, you said
that Ernie does earlier binding than CLOS does.  Do you mean that Ernie
is a statically typed language?  Or does Ernie do "block compiling" by
default?  Either of these would make it a significantly different
language than standard CLOS.  Of course it is quite easy to customize
CLOS to have these properties; perhaps that should be investigated.

Also, what did the code look like in Ernie?  Are you sure that when you
did the translation you didn't add more "object orientedness" than was
there in the first place?  One specific question is whether, in Ernie,
the accessors are generic.  If the KCL port supported structure classes,
you could experiment with rewriting this program using defstruct instead
of defclass.

A very serious issue is the quality of the KCL port of PCL.  It is
lousy; PCL performance in KCL is worse than in any other Common Lisp I
know of.  This is my fault; it's just that not as much work has been done
on the KCL port as on other ports.  This shouldn't be too hard to fix.
The people at Ibuki have promised to help with this, so there should be
some significant improvement on this front soon.

Also of importance is the difference between KCL and PSL.  For a
specific program, two Lisps which are otherwise relatively comparable
can differ a lot.  If the program happens to stress a place where the
two Lisps differ, the results can be dramatic.

The PCL--KCL, Ernie--PSL relationships are also important.  PCL is a
portable program, and as I said, the KCL port of PCL is bad.  You say
that you do not use the PSL system-lisp optimization, but perhaps the
implementor of Ernie does use it?  When you compiled PCL in KCL did you
set safety to 0 and speed to 3?

*** Caching Problems ***

    Date: Tue, 11 Oct 88 09:47:27 PDT
    From: larus%paris.Berkeley.EDU@ginger.Berkeley.EDU (James Larus)

    The first bug is that the caches for the discriminator functions have
    32 entries.  While a fixed-size cache works for some generic
    functions, it fails miserably for generic functions with more than 32
    methods ...

Three comments:

- Chris' code never gets cache misses, so this is not a problem
  in his system.

- The issue is not whether the generic function has more than 32
  methods.  Rather it is how many different classes of arguments 
  the generic function is called with.  

- As Rob Pettengill pointed out, we have a much better strategy
  for dealing with this now, and we expect to have dynamically
  expanding caches this month.  I hope this will make a significant
  difference in the performance of your program.
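
To make the second comment concrete, here is a minimal sketch in Python
(not PCL code) of a generic function whose dispatch cache is keyed by
the class of the argument.  The GenericFunction class, the two helper
functions, and the evict-the-oldest replacement policy are all
assumptions made for illustration; this is not a description of how
PCL's cache is actually organized.  Only the 32-entry size comes from
the message quoted above.

```python
class GenericFunction:
    """Toy single-dispatch generic function with a bounded class cache."""

    def __init__(self, cache_size=32):
        self.methods = {}            # class -> method body
        self.cache = {}              # argument class -> chosen method
        self.cache_size = cache_size
        self.hits = 0
        self.misses = 0

    def add_method(self, cls, fn):
        self.methods[cls] = fn
        self.cache.clear()           # cached choices may now be stale

    def _slow_lookup(self, cls):
        # Walk the class precedence list for the most specific method.
        for ancestor in cls.__mro__:
            if ancestor in self.methods:
                return self.methods[ancestor]
        raise TypeError(f"no applicable method for {cls.__name__}")

    def __call__(self, arg):
        cls = type(arg)
        fn = self.cache.get(cls)
        if fn is not None:
            self.hits += 1
        else:
            self.misses += 1
            fn = self._slow_lookup(cls)
            if len(self.cache) >= self.cache_size:
                # Assumed policy: evict the oldest entry (dicts keep
                # insertion order).
                del self.cache[next(iter(self.cache))]
            self.cache[cls] = fn
        return fn(arg)


class Base:
    pass


def one_method_many_classes(n_classes=40, passes=3):
    """One method, but arguments of many different classes."""
    gf = GenericFunction()
    gf.add_method(Base, lambda obj: "base")
    instances = [type(f"Sub{i}", (Base,), {})() for i in range(n_classes)]
    for _ in range(passes):
        for obj in instances:
            gf(obj)
    return gf.hits, gf.misses


def many_methods_few_classes(n_methods=100, calls=120):
    """Many methods, but arguments of only two classes."""
    gf = GenericFunction()
    classes = [type(f"K{i}", (), {}) for i in range(n_methods)]
    for cls in classes:
        gf.add_method(cls, lambda obj: "specific")
    instances = [classes[0](), classes[1]()]
    for i in range(calls):
        gf(instances[i % 2])
    return gf.hits, gf.misses
```

In this sketch a generic function with a single method still misses on
every call when it is cycled through 40 argument classes, while one
with 100 methods misses only twice when it sees just two argument
classes.  It is the number of classes seen at the call, not the number
of methods, that decides whether a fixed-size cache is big enough.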

    The second bug is that MAKE-INSTANCE is incredibly expensive.  The
    metaobject hair and parsing property list make for nice, general
    systems that are too expensive to use in real programs.  In fact, I'd
    argue that the CLOS standard should be rejected on this grounds!
    Gregor has a workaround that involves using another constructor that
    can be precompiled.  This extension is, however, not part of CLOS and
    so will not be portable.

The metaobject mechanism is not really what causes performance problems
in the initialization protocol.  Users have wanted extensible
initialization protocols in languages which didn't have a metaobject
protocol.  The initialization protocol is extensible, and that can cause
performance problems.  But these are relatively easy to deal with.
Patrick Dussud's implementation and mine both have techniques for
solving them.  The specific version of the constructor code that will be
in the next release is not portable, but it is just an interim version
of a mechanism that will be portable.  In tests of the new mechanism at
PARC, we have seen increases in instance creation performance of up to
100 times.

An important point is that Chris' program doesn't call make-instance
when it is "running", so that isn't causing him problems.


*** Watch Out for Bugs ***

I notice in the code you sent me that it prints a lot of messages when
the simulator is running.  I assume you commented out all these calls to
format before you took these measurements, but the times you sent were
so much greater than the .4 seconds I measured that I wonder.  Certainly
something like this could throw the timing off completely.
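
One way to check for this kind of problem is to time the same run with
and without the trace output and confirm that both runs do the same
work.  The following is a minimal sketch in Python, not the simulator
under discussion: run_simulation, timed, and the toy queue logic are
hypothetical names and behavior invented for illustration.

```python
import io
import time


def run_simulation(n_events, trace=None):
    """Toy single-server queue loop; `trace` is an optional text stream."""
    queue_length = 0
    served = 0
    for event in range(n_events):
        if event % 3 == 2 and queue_length > 0:
            queue_length -= 1
            served += 1
            kind = "departure"
        else:
            queue_length += 1
            kind = "arrival"
        if trace is not None:
            # This is the per-event output that should be disabled
            # before taking a measurement.
            trace.write(f"event {event}: {kind}, queue {queue_length}\n")
    return served


def timed(n_events, trace=None):
    """Return (customers served, elapsed seconds) for one run."""
    start = time.perf_counter()
    served = run_simulation(n_events, trace)
    return served, time.perf_counter() - start


if __name__ == "__main__":
    quiet_served, quiet_seconds = timed(100_000)
    traced_served, traced_seconds = timed(100_000, io.StringIO())
    print("same result:", quiet_served == traced_served)
    print("quiet :", quiet_seconds, "seconds")
    print("traced:", traced_seconds, "seconds")
```

The in-memory stream here only stands in for a terminal or file; how
much the output costs in a real run depends on where it is written, so
the two timings should be compared on the system actually being
measured.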


*** Summary ***

The following are needed:

 - more information about Ernie
 - what your program looked like when it was written in Ernie
 - make sure PCL was compiled with speed 3 and safety 0
 - try a different port of PCL
 - improve the port of PCL you have
 - check on the measurement technique used; specifically, make
   sure the calls to format were commented out

Each of these will tell us a little about what is going on here.  I take
your complaint about PCL performance quite seriously, but before things
can be improved, we need to know what to improve.
-------