
Lucid 3.0.2 Optimization botch



re: "5/5/89  Cinco de Mayo PCL" has a problem on Lucid 3.0 with the optimizing
    compiler. On a Sun 3/60, SunOS 4.0.1, Sun Common Lisp 3.0.2,
    pcl runs test.lisp correctly when compiled with the default (development)
    compiler mode. However, when (proclaim '(optimize (compilation-speed 0)))
    is evaluated prior to (load "defsys") in order to use the production
    mode of the compiler, the following happens:

This isn't a bug in Lucid's compiler -- it's a lurking bug in PCL that 
will "bite" most implementations where different settings of the compiler
optimization switches will produce morphologically different (but of 
course functionally equivalent) function objects.

The difficulty is in how discriminator codes service cache misses.
They "call out" to (potentially) arbitrary functions that will in some
cases "smash" the function object that was actually running as the
discriminator code.  This is all right provided you don't return to
that function's frame, but alas ...
 
I know this is a more extensive problem because the code in the
port-independent function 'notice-methods-change' goes out of
its way to do a tail-recursive call to the function that is going
to smash the possibly-executing discriminator code.  Here is the
commentary from that code (sic):

	;; In order to prevent this we take a simple measure:  we just
	;; make sure that it doesn't try to reference our its own closure
	;; variables after it makes the dcode change.  This is done by
	;; having notice-methods-change-2 do the work of making the change
	;; AND calling the actual generic function (a closure variable)
	;; over.  This means that at the time the dcode change is made,
	;; there is a pointer to the generic function on the stack where
	;; it won't be affected by the change to the closure variables.


A similar thing should be done in the construction of standard-accessor,
checking, and caching dcodes.  In an experimental version here at Lucid,
I rewrote dcode.lisp to do that, and there is no problem with it.
Although that code is somewhat Lucid-specific, it could be of help to
someone who wanted to rewrite the generic dcode.lisp (no pun intended).
Contact me privately if you are interested.
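
To make the shape of that tail-recursive exit concrete, here is a
minimal sketch in portable Common Lisp.  It is not the PCL or Lucid
code: the GF structure, CALL-GF, ARG-KEY, MAKE-CACHING-DCODE and
DCODE-MISS are all hypothetical names, and where real PCL overwrites
the function object in place, the sketch only replaces a slot.  Note
also that Common Lisp does not guarantee that a call in tail position
is compiled as a true tail call; that depends on the implementation
and the optimization settings.

```lisp
;;; Hypothetical model: a "generic function" is a struct whose DCODE slot
;;; holds the closure currently acting as discriminator code.
(defstruct gf
  name
  (dcode nil)
  (methods (make-hash-table :test #'eq)))   ; type key -> method function

(defun call-gf (gf &rest args)
  (apply (gf-dcode gf) args))

(defun arg-key (arg)
  ;; Crude stand-in for real class lookup.
  (typecase arg
    (integer 'integer)
    (string 'string)
    (t 'other)))

(defun make-caching-dcode (gf cached-key cached-method)
  ;; GF, CACHED-KEY and CACHED-METHOD are the closure variables at risk.
  #'(lambda (arg)
      (if (eq (arg-key arg) cached-key)
          (funcall cached-method arg)        ; cache hit
          ;; Cache miss: pass GF and ARG along as arguments, with the
          ;; call in tail position.  This closure does no further work
          ;; and reads none of its own variables after the call.
          (dcode-miss gf arg))))

(defun dcode-miss (gf arg)
  (let* ((key (arg-key arg))
         (method (or (gethash key (gf-methods gf))
                     (error "No applicable method for ~S" arg))))
    ;; Change the dcode (real PCL would smash the function object) ...
    (setf (gf-dcode gf) (make-caching-dcode gf key method))
    ;; ... and finish the call from here, using the pointer to GF that
    ;; arrived as an argument, instead of returning to the old dcode.
    (call-gf gf arg)))

;;; Small demonstration with two hypothetical methods.
(defparameter *demo-gf* (make-gf :name 'describe-it))
(setf (gethash 'integer (gf-methods *demo-gf*))
      #'(lambda (x) (list :integer x)))
(setf (gethash 'string (gf-methods *demo-gf*))
      #'(lambda (x) (list :string x)))
;; The initial dcode has an empty cache, so every call is a miss.
(setf (gf-dcode *demo-gf*)
      #'(lambda (arg) (dcode-miss *demo-gf* arg)))
```

The point of the arrangement is the same one the commentary above
makes: by the time the dcode is changed, everything still needed to
complete the call (the generic function and its arguments) is held by
DCODE-MISS, not by the closure being replaced.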

Doing a tail-recursive call out of dcodes when there is a cache miss
is a good thing, regardless of other problems.  I think one might as
well do it.  However, I should point out that in the presence of 
multiprocessing, there is another more serious problem that cannot be
solved so simply.  Think about what happens when one process decides
to update a dcode while another process is still using it; no such
stack-maintenance discipline will fix this case.  A tail-recursive
exit from the dcode will *immensely* reduce the likelihood that
another process can sneak in during the interval in which the dcode
requires consistency in its function; but it can't reduce that
likelihood to zero.

The more desirable thing to do is to put the whole "dcode" down one 
more level of indirection through the symbol-function cell of the 
generic function.  This is effectively what PCL's 'make-trampoline' 
function does, but unfortunately that is not a very efficient approach 
when you consider how most compilers will compile it.  Something akin 
to the "mattress-pads" in John Foderaro's code (in the fin.lisp file) 
could probably be done for many other implementations as well.  I don't 
know who wrote the Lucid specific parts of fin.lisp, but I can supply 
the Lucid-specific code sequences of the corresponding "mattress-pads" for 
several machines to anyone who wants to give it a try.  
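
As an illustration of the extra level of indirection, here is a small
portable sketch.  It is not PCL's 'make-trampoline' nor the fin.lisp
code; MAKE-SYMBOL-TRAMPOLINE, UPDATE-DCODE and the symbol %DEMO-DCODE
are hypothetical names chosen for the example.  The idea is that
callers hold a small, stable function that fetches the current dcode
from a symbol-function cell on every call, so updating the dcode means
storing a new function in the cell rather than overwriting one that
may still be running.

```lisp
(defun make-symbol-trampoline (name)
  ;; The returned closure never changes; only the cell it reads does.
  #'(lambda (&rest args)
      (apply (symbol-function name) args)))

(defun update-dcode (name new-dcode)
  ;; Replace the contents of the cell.  The previous function object is
  ;; left untouched, so any frame still executing it stays consistent.
  (setf (symbol-function name) new-dcode))

;;; Demonstration.
(update-dcode '%demo-dcode #'(lambda (x) (list :old x)))

(defparameter *demo-trampoline* (make-symbol-trampoline '%demo-dcode))
(defparameter *old-dcode* (symbol-function '%demo-dcode))
(defparameter *before* (funcall *demo-trampoline* 1))

(update-dcode '%demo-dcode #'(lambda (x) (list :new x)))

(defparameter *after* (funcall *demo-trampoline* 1))
;; The old function object still behaves as it did before the update.
(defparameter *old-still-intact* (funcall *old-dcode* 1))
```

The &REST list and the APPLY on every call give some idea of why a
naive trampoline like this is costly; an implementation-specific code
sequence can avoid that overhead while keeping the same indirection.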

Just doing the tail-recursive exit will probably make the matter 
essentially moot for all practical purposes.



-- JonL --