
A question about tweaking the 3600 Lisp Compiler



    Date: Mon, 7 Mar 88 11:22 EST
    From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>

        Date: Fri, 4 Mar 88 11:00 PST
        From: Siskind.pa@Xerox.COM

        I am doing some experiments where I compile SAT problems
        (Boolean satisfiability) and CSP problems (Constraint Satisfaction
        problems) directly into very tight Lisp code. The code is very simple.
        It is not very Lisp-like and resembles assembler code. It is all one
        large PROG with lots of IF-EQ-GOTOs and some SETQs to local variables.
        I can generate such code from the SAT/CSP problems quite quickly. The good
        thing about such code is that once it is compiled it runs exceedingly
        quickly (about 10,000 times faster than other more sophisticated SAT/CSP
        solvers which I have tried). The only problem is that the 3600 Lisp compiler
        takes forever to compile the large procedures which my SAT/CSP-Lisp translator
        generates, because of all of the optimization which is being done.

    Are you sure it's the optimization, or is that just an assumption?

I'm pretty sure. In the example shown below, 58% of the time is spent in COMPILER:OPTIMIZE-FORM.

                                                                          All of
        that optimization is useless for this application, as the code which I
        generate has a trivial one-to-one correspondence to the optimal
        3600 machine instruction sequences which the compiler ends up producing.
        I tried setting both COMPILER:*PHASE-1-OPTIMIZE-P* and COMPILER:*RUN-PHASE-3*
        to NIL. That helped a bit but not enough.

Just for your info, it was Eric Weaver who suggested that I turn off those two switches.
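Concretely, I bind the two switches around the call to COMPILE, roughly like this (a sketch; these are the internal, undocumented switches named above, and COMPILE-WITHOUT-OPTIMIZATION is just my name for the wrapper):

```lisp
;; Sketch: compiling one generated LAMBDA-expression with the two
;; internal optimization switches bound off.  These switches are
;; undocumented compiler internals, so this is not supported usage.
(defun compile-without-optimization (generated-lambda)
  (let ((compiler:*phase-1-optimize-p* nil)
        (compiler:*run-phase-3* nil))
    (compile nil generated-lambda)))
```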

        Question 1: How do I turn off ALL optimizations?

    I would first use metering tools, even crude ones like pressing c-m-Suspend
    a statistically significant number of times while compiling, to find out
    where the time is being spent.  Only then should you worry about turning
    things off.

Here are the results of using my profiling tools. (Do you remember the help you gave
me several months ago? Well, this tool which I wrote is the result of that help. I find
it really easy to use, and it provides simple, uncluttered, and easy-to-interpret information.
I'll send you a copy if you want. I would submit it to SLUG but I don't know how.)

Anyway, first I ran the compiler with COMPILER:*PHASE-1-OPTIMIZE-P* and COMPILER:*RUN-PHASE-3*
both set to T. It took 1715 samples (at 10 samples per second) to compile my one function.
Here are the percentages of time spent inside each function. (The times are cumulative:
if A calls B, then B's time is also charged against A.)

A total of 1715 samples were taken.

The following system functions were profiled:
 92%  COMPILER:COMPILE-TO-CORE
 92%  (:INTERNAL COMPILER:COMPILE-TO-CORE 1 (:DUMP-LAMBDA-EXPRESSION))
 92%  COMPILE
 92%  COMPILER:COMPILE-DEFINITION-1
 92%  COMPILER:COMPILE-LAMBDA-EXP
 69%  COMPILER:PHASE-1-BLOCK
 69%  COMPILER:RUN-PHASE-1-FUNCTION
 69%  (:PROPERTY PROG COMPILER:PHASE-1)
 69%  COMPILER:PHASE-1-EXTERNAL-FUNCTION
 69%  COMPILER:PHASE-1-FUNCTION-DEFINITION
 69%  COMPILER:PHASE-1-PROG
 69%  COMPILER:PHASE-1
 61%  COMPILER:TRANSFORM
 58%  COMPILER:OPTIMIZE-FORM
 55%  (:PROPERTY IF COMPILER:PHASE-1)
 37%  COMPILER:TRANSFORM-PREDICATE
 21%  COMPILER:ASSEMBLE-FUNCTION
 21%  COMPILER:RUN-PHASE-4
 17%  COMPILER:FIXUP-BRANCHES
 15%  COMPILER:PASS-OVER-BRANCHES
 11%  SYS:%ALLOCATE-BLOCK
 10%  SI:%GC-SCAVENGE
  6%  SI:%ALLOCATE-LIST-BLOCK-ESCAPE
  6%  COMPILER:PHASE-1-PROGN

I then ran the compiler again with both switches set to NIL. Here are the results.
(For your info, the function being compiled is a CSP (Constraint Satisfaction Problem)
encoding of the 8-Queens problem.)

A total of 974 samples were taken.

The following system functions were profiled:
 74%  COMPILER:COMPILE-TO-CORE
 74%  (:INTERNAL COMPILER:COMPILE-TO-CORE 1 (:DUMP-LAMBDA-EXPRESSION))
 74%  COMPILE
 72%  COMPILER:COMPILE-DEFINITION-1
 72%  COMPILER:COMPILE-LAMBDA-EXP
 43%  SYS:%ALLOCATE-BLOCK
 41%  SI:%GC-SCAVENGE
 38%  COMPILER:ASSEMBLE-FUNCTION
 38%  COMPILER:RUN-PHASE-4
 34%  COMPILER:PHASE-1-BLOCK
 34%  COMPILER:RUN-PHASE-1-FUNCTION
 34%  (:PROPERTY PROG COMPILER:PHASE-1)
 34%  COMPILER:PHASE-1-EXTERNAL-FUNCTION
 34%  COMPILER:PHASE-1-FUNCTION-DEFINITION
 34%  COMPILER:PHASE-1-PROG
 34%  COMPILER:PHASE-1
 26%  (:PROPERTY IF COMPILER:PHASE-1)
 23%  SI:%ALLOCATE-STRUCTURE-BLOCK-ESCAPE
 23%  SI:SIMPLE-MAKE-ARRAY-NSS-AREA
 23%  COMPILER:FIXUP-BRANCHES
 20%  SI:%ALLOCATE-LIST-BLOCK-ESCAPE
 19%  COMPILER:PASS-OVER-BRANCHES
 13%  COMPILER:ASSEMBLE-INTERVAL
 11%  APPEND
 11%  REDUCE
 11%  SI:%MAKE-LIST
  9%  COMPILER:PHASE-1-FUNCTION-CALL
  9%  (:PROPERTY COMPILER:JUMP COMPILER:ASSEMBLE)
  8%  COMPILER:TRANSFORM
  8%  COMPILER:NOTE-BRANCH
  6%  COMPILED-CSP-SOLVER-INTERNAL

    You can start by eschewing macros like IF and PUSH, as well as optimizable
    functions like LIST, if you're really aware of a one-to-one correspondence
    between your code and machine instructions.

Are you trying to tell me that COND is more primitive than IF? Anyway, macro
expansion is trivial for my examples and doesn't take much time. I will, however,
follow your suggestion and not use macros. BTW, what are the non-optimizable
or pre-optimized versions of LIST?

        Question 2: I know the machine code sequences which I want to generate.
        There are only two general templates of length 6 and 2 instructions
        respectively. Can I use an assembler directly, and how do I do that?

    It's a little known fact, but there is a crude assembler buried somewhere
    in the system.  I'd rather not tell you how to use it, for three reasons.
    One is that it won't actually work for you; without some modifications, it
    won't assemble code that you can call as an ordinary function.  Second is
    that if you adopt that approach, your code won't port to future machine
    offerings.  Third, because of the way that assembler is implemented, it
    may well be slower than the compiler.  We can keep this idea in reserve
    in case better ideas don't pan out, but currently I doubt that the assembler
    will be the answer to your problems.

I don't need to call the assembler generated function as an ordinary function.
The whole purpose of the experiment is to figure out how large the constant factor
is between CSP solvers which compile into machine code and CSP solvers which
function more like interpreters. The claim which I am trying to verify is that
variable-reordering schemes, which lend themselves better to interpreted methods,
do not win back the efficiency lost to interpretation instead of compilation.
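To make the comparison concrete, here is a hypothetical sketch (plain Lisp; the names and the clause representation are invented) of the interpreter-style check that the compiled encoding eliminates:

```lisp
;; Interpreter-style solvers walk a clause data structure at run time:
(defun clause-satisfied-p (clause assignment)
  ;; CLAUSE is a list of (variable-index . desired-value) literals;
  ;; ASSIGNMENT is a simple vector of T/NIL values.
  (dolist (literal clause nil)
    (when (eq (svref assignment (car literal)) (cdr literal))
      (return t))))

;; The compiled encoding tests the same clause as straight-line code,
;; with the loop and the clause data structure gone entirely:
;;   (IF P0001 (GO S0009))
;;   (IF (NOT P0002) (GO S0009))
;;   (GO F0008)
```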

        Question 3: Is there an indirect GOTO instruction? If there is, how do I
        access it directly from Lisp?

    In 7.2 the CASE construct compiles into an indexed branch when the keys
    are integers in a small range.  I don't believe this instruction exists
    in 7.1.  In all releases, the undocumented SYS:%JUMP subprimitive
    transfers control to a PC.  I do not recommend using it as it's very
    easy to get into trouble with it.  I'm not aware of any special form for
    converting a GO-tag into a PC value.

The SYS:%JUMP subprimitive is exactly what I need. I realize that I can get into
trouble with this. I am not producing production code, however; I am doing an
experiment which is not very Lisp-like in nature, but since I do all of my work
on a 3600 (for obvious reasons) I need to be able to do this one assembler-level
experiment to verify a conjecture.

I will need to be able to convert a GO-tag into a PC value for this to work
at all.
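For what it's worth, my understanding of the CASE alternative is something like the following sketch (plain Lisp; the tags and values are invented), which in 7.2 should get the indexed branch without any subprimitives:

```lisp
;; Hypothetical sketch: dispatching on a small integer inside a PROG.
;; With integer keys in a small range, 7.2 is said to compile the CASE
;; into an indexed branch, i.e. an indirect GOTO without SYS:%JUMP.
(defun dispatch-demo (i)
  (prog ()
        (case i
          (0 (go tag-0))
          (1 (go tag-1))
          (t (return :out-of-range)))
   tag-0 (return :took-branch-0)
   tag-1 (return :took-branch-1)))
```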

                                      I also need to be able to compare two
        PCs as integers and determine which is larger before I branch indirectly
        to one of them. How can I do that?

    I don't understand why you would want to do this nor how you would get
    your hands on PC values.  However, given that you have two PC values,
    the subprimitive SYS:%POINTER-LESSP will compare them, as documented.

If you want to know the real nitty-gritty reasons, here they are. One of the
CSP techniques which I am trying to code up is called Selective Backtracking
(no, it is not full Dependency-Directed Backtracking, only one aspect of it).
This technique computes nogoods which are sets of mutually inconsistent
assumptions. It turns out that assumptions can be most efficiently represented
as PC values for the code which made the assumption. One thing that Selective
Backtracking must do is take a nogood and find the most recent assumption
in it. If the order of assumptions is fixed at compile time (no variable reordering)
then I can find the latest assumption by finding the largest PC-value.
All of this can be compiled into very tight machine code on almost any architecture.
My conjecture is that the efficiency of such tight code far outweighs the benefits
of variable reordering which makes such tight code optimization tricks impossible.
In the general case, I will need to create arbitrary data structures representing
hierarchical nogoods over assumptions, which in this case will be PC values, and
then do a form of resolution on these hierarchical nogoods. The resolution operation
will need to be able to compare pointers for = and <.
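A sketch of what I mean, using small integers as stand-ins for PC values (the function names are invented, MAX stands in for SYS:%POINTER-LESSP, and this is only the simplest notion of nogood resolution):

```lisp
;; A nogood is a set (here, a list) of assumptions.  With the
;; assumption order fixed at compile time, the most recent assumption
;; in a nogood is simply the one with the largest PC value.
(defun latest-assumption (nogood)
  (reduce #'max nogood))

;; Resolving two nogoods on a shared assumption: take the union and
;; drop the assumption resolved upon (a deliberately simplified form).
(defun resolve-nogoods (nogood-1 nogood-2 assumption)
  (remove assumption (union nogood-1 nogood-2)))
```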

    I hope you find these suggestions helpful.  If you have more questions,
    feel free to ask me.  I may take a few days to respond.  Be sure to copy
    Customer-Reports and Discuss-Compiler on all mail.

I really do appreciate these suggestions.

        Below is a sample piece of code which my SAT translator generates:

        (LAMBDA (ALL?)
          (PROG (RESULTS P0001 P0002 P0003)
                (SETQ RESULTS 'NIL)
                (SETQ P0001 NIL)
                (GO N0004)
          F0005 (IF P0001 (RETURN RESULTS))
                (SETQ P0001 T)
          N0004 (IF (NOT P0001) (GO S0006))
                (GO F0005)
          S0006 (SETQ P0002 NIL)
                (GO N0007)
          F0008 (IF P0002 (GO F0005))
                (SETQ P0002 T)
          N0007 (IF P0001 (GO S0009))
                (IF (NOT P0002) (GO S0009))
                (GO F0008)
          S0009 (SETQ P0003 NIL)
                (GO N0010)
          F0011 (IF P0003 (GO F0008))
                (SETQ P0003 T)
          N0010 (IF P0002 (GO S0012))
                (IF (NOT P0003) (GO S0012))
                (GO F0011)
          S0012 (PUSH (LIST (CONS 'A (IF P0001 :TRUE :FALSE))
                            (CONS 'B (IF P0002 :TRUE :FALSE))
                            (CONS 'C (IF P0003 :TRUE :FALSE)))
                      RESULTS)
                (IF ALL? (GO F0011) (RETURN RESULTS))))