A question about tweaking the 3600 Lisp Compiler
Date: Mon, 7 Mar 88 11:22 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Date: Fri, 4 Mar 88 11:00 PST
From: Siskind.pa@Xerox.COM
I am doing some experiments where I compile SAT problems
(Boolean satisfiability) and CSP problems (Constraint Satisfaction
problems) directly into very tight Lisp code. The code is very simple.
It is not very Lisp-like and resembles assembler code. It is all one
large PROG with lots of IF-EQ-GOTOs and some SETQs to local variables.
I can generate such code from the SAT/CSP problems quite quickly. The good
thing about such code is that once it is compiled it runs exceedingly
quickly (about 10,000 times faster than other more sophisticated SAT/CSP
solvers which I have tried). The only problem is that the 3600 Lisp compiler
takes forever to compile the large procedures which my SAT/CSP-Lisp translator
generates because of all of the optimization which is being done.
Are you sure it's the optimization, or is that just an assumption?
I'm pretty sure. In the example shown below, 58% is spent in COMPILER:OPTIMIZE-FORM.
All of
that optimization is useless for this application as the code which I
generate has a trivial one-to-one correspondence to the optimal
3600 machine instruction sequences which the compiler ends up producing.
I tried setting both COMPILER:*PHASE-1-OPTIMIZE-P* and COMPILER:*RUN-PHASE-3*
to NIL. That helped a bit but not enough.
Just for your info, it was Eric Weaver who suggested that I turn off those two switches.
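(Concretely, what I mean by turning them off is roughly the following, assuming the two
switches are ordinary special variables that can be rebound around the call to COMPILE;
MY-CSP-SOLVER is just a placeholder name, not my real function:)

    (LET ((COMPILER:*PHASE-1-OPTIMIZE-P* NIL)
          (COMPILER:*RUN-PHASE-3* NIL))
      (COMPILE 'MY-CSP-SOLVER))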
Question 1: How do I turn off ALL optimizations?
I would first use metering tools, even crude ones like pressing c-m-Suspend
a statistically significant number of times while compiling, to find out
where the time is being spent. Only then should you worry about turning
things off.
Here are the results of using my profiling tools. (Do you remember the help you gave
me several months ago? This tool, which I wrote, is the result of that help. I find
it really easy to use, and it provides simple, uncluttered, easy-to-interpret information.
I'll send you a copy if you want. I would submit it to SLUG but I don't know how.)
Anyway, first I ran the compiler with COMPILER:*PHASE-1-OPTIMIZE-P* and COMPILER:*RUN-PHASE-3*
both set to T. It took 1715 samples (at 10 samples per second) to compile my one function.
Here are the percentages of time spent inside each function. (The times are cumulative:
if A calls B then B's time is also charged against A.)
A total of 1715 samples were taken.
The following system functions were profiled:
92% COMPILER:COMPILE-TO-CORE
92% (:INTERNAL COMPILER:COMPILE-TO-CORE 1 (:DUMP-LAMBDA-EXPRESSION))
92% COMPILE
92% COMPILER:COMPILE-DEFINITION-1
92% COMPILER:COMPILE-LAMBDA-EXP
69% COMPILER:PHASE-1-BLOCK
69% COMPILER:RUN-PHASE-1-FUNCTION
69% (:PROPERTY PROG COMPILER:PHASE-1)
69% COMPILER:PHASE-1-EXTERNAL-FUNCTION
69% COMPILER:PHASE-1-FUNCTION-DEFINITION
69% COMPILER:PHASE-1-PROG
69% COMPILER:PHASE-1
61% COMPILER:TRANSFORM
58% COMPILER:OPTIMIZE-FORM
55% (:PROPERTY IF COMPILER:PHASE-1)
37% COMPILER:TRANSFORM-PREDICATE
21% COMPILER:ASSEMBLE-FUNCTION
21% COMPILER:RUN-PHASE-4
17% COMPILER:FIXUP-BRANCHES
15% COMPILER:PASS-OVER-BRANCHES
11% SYS:%ALLOCATE-BLOCK
10% SI:%GC-SCAVENGE
6% SI:%ALLOCATE-LIST-BLOCK-ESCAPE
6% COMPILER:PHASE-1-PROGN
I then ran the compiler again with both switches set to NIL. Here are the results.
(For your info, the function being compiled is a CSP (Constraint Satisfaction problem)
encoding of the 8-Queens problem.)
A total of 974 samples were taken.
The following system functions were profiled:
74% COMPILER:COMPILE-TO-CORE
74% (:INTERNAL COMPILER:COMPILE-TO-CORE 1 (:DUMP-LAMBDA-EXPRESSION))
74% COMPILE
72% COMPILER:COMPILE-DEFINITION-1
72% COMPILER:COMPILE-LAMBDA-EXP
43% SYS:%ALLOCATE-BLOCK
41% SI:%GC-SCAVENGE
38% COMPILER:ASSEMBLE-FUNCTION
38% COMPILER:RUN-PHASE-4
34% COMPILER:PHASE-1-BLOCK
34% COMPILER:RUN-PHASE-1-FUNCTION
34% (:PROPERTY PROG COMPILER:PHASE-1)
34% COMPILER:PHASE-1-EXTERNAL-FUNCTION
34% COMPILER:PHASE-1-FUNCTION-DEFINITION
34% COMPILER:PHASE-1-PROG
34% COMPILER:PHASE-1
26% (:PROPERTY IF COMPILER:PHASE-1)
23% SI:%ALLOCATE-STRUCTURE-BLOCK-ESCAPE
23% SI:SIMPLE-MAKE-ARRAY-NSS-AREA
23% COMPILER:FIXUP-BRANCHES
20% SI:%ALLOCATE-LIST-BLOCK-ESCAPE
19% COMPILER:PASS-OVER-BRANCHES
13% COMPILER:ASSEMBLE-INTERVAL
11% APPEND
11% REDUCE
11% SI:%MAKE-LIST
9% COMPILER:PHASE-1-FUNCTION-CALL
9% (:PROPERTY COMPILER:JUMP COMPILER:ASSEMBLE)
8% COMPILER:TRANSFORM
8% COMPILER:NOTE-BRANCH
6% COMPILED-CSP-SOLVER-INTERNAL
You can start by eschewing macros like IF and PUSH, as well as optimizable
functions like LIST, if you're really aware of a one-to-one correspondence
between your code and machine instructions.
Are you trying to tell me that COND is more primitive than IF? Anyway, macro
expansion is trivial for my examples and doesn't take much time. I will however
follow your suggestion and not use macros. BTW, what are the non-optimizable
or pre-optimized versions of LIST?
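For concreteness, I take the macro suggestion to mean rewrites of this sort, using the
standard expansions of IF and PUSH:

    (IF P0001 (RETURN RESULTS))   becomes   (COND (P0001 (RETURN RESULTS)))
    (PUSH X RESULTS)              becomes   (SETQ RESULTS (CONS X RESULTS))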
Question 2: I know the machine code sequences which I want to generate.
There are only two general templates of length 6 and 2 instructions
respectively. Can I use an assembler directly, and how do I do that?
It's a little known fact, but there is a crude assembler buried somewhere
in the system. I'd rather not tell you how to use it, for three reasons.
One is that it won't actually work for you; without some modifications, it
won't assemble code that you can call as an ordinary function. Second is
that if you adopt that approach, your code won't port to future machine
offerings. Third, because of the way that assembler is implemented, it
may well be slower than the compiler. We can keep this idea in reserve
in case better ideas don't pan out, but currently I doubt that the assembler
will be the answer to your problems.
I don't need to call the assembler generated function as an ordinary function.
The whole purpose of the experiment is to figure out how large the constant factor
is between CSP solvers which compile into machine code and CSP solvers which
function more like interpreters. The claim which I am trying to verify is that
variable reordering schemes, which lend themselves better to interpreted methods,
do not make up for the efficiency lost by interpreting rather than compiling.
Question 3: Is there an indirect GOTO instruction? If there is, how do I
access it directly from Lisp?
In 7.2 the CASE construct compiles into an indexed branch when the keys
are integers in a small range. I don't believe this instruction exists
in 7.1. In all releases, the undocumented SYS:%JUMP subprimitive
transfers control to a PC. I do not recommend using it as it's very
easy to get into trouble with it. I'm not aware of any special form for
converting a GO-tag into a PC value.
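As a sketch of the 7.2 behavior, a dispatch of roughly this shape is the kind of thing
that gets the indexed branch; here INDEX is assumed to be a local variable holding a
small integer, and the TAG-n names are placeholders:

    (CASE INDEX
      (0 (GO TAG-0))
      (1 (GO TAG-1))
      (2 (GO TAG-2))
      (OTHERWISE (GO TAG-DEFAULT)))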
The SYS:%JUMP subprimitive is exactly what I need. I realize that I will get into
trouble with this. I am not producing production code, however; I am doing an
experiment which is not very Lisp-like in nature, but since I do all of my work
on a 3600 (for obvious reasons) I need to be able to do this one assembler-level
experiment to verify a conjecture.
I will need to be able to convert a GO-tag into a PC value for this to work
at all.
I also need to be able to compare two
PCs as integers and determine which is larger before I branch indirectly
to one of them. How can I do that?
I don't understand why you would want to do this nor how you would get
your hands on PC values. However, given that you have two PC values,
the subprimitive SYS:%POINTER-LESSP will compare them, as documented.
If you want to know the real nitty-gritty reasons, here they are. One of the
CSP techniques which I am trying to code up is called Selective Backtracking
(no, it is not full Dependency-Directed Backtracking, only one aspect of it).
This technique computes nogoods, which are sets of mutually inconsistent
assumptions. It turns out that assumptions can be most efficiently represented
as PC values for the code which made the assumption. One thing that Selective
Backtracking must do is take a nogood and find the most recent assumption
in it. If the order of assumptions is fixed at compile time (no variable reordering)
then I can find the latest assumption by finding the largest PC-value.
All of this can be compiled into very tight machine code on almost any architecture.
My conjecture is that the efficiency of such tight code far outweighs the benefits
of variable reordering which makes such tight code optimization tricks impossible.
In the general case, I will need to create arbitrary data structures representing
hierarchical nogoods over assumptions, which in this case will be PC values, and
then do a form of resolution on these hierarchical nogoods. The resolution operation
will need to be able to compare pointers for = and <.
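In case a concrete sketch helps, here is roughly the latest-assumption search I have in
mind, assuming a nogood is simply a list of PC values (LATEST-ASSUMPTION is my own name
for it, not anything in the system):

    (DEFUN LATEST-ASSUMPTION (NOGOOD)
      ;; Return the largest PC value in NOGOOD, i.e. the most recently
      ;; made assumption when assumption order is fixed at compile time.
      (LET ((LATEST (FIRST NOGOOD)))
        (DOLIST (PC (REST NOGOOD))
          (WHEN (SYS:%POINTER-LESSP LATEST PC)
            (SETQ LATEST PC)))
        LATEST))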
I hope you find these suggestions helpful. If you have more questions,
feel free to ask me. I may take a few days to respond. Be sure to copy
Customer-Reports and Discuss-Compiler on all mail.
I really do appreciate these suggestions.
Below is a sample piece of code which my SAT translator generates:
(LAMBDA (ALL?)
  (PROG (RESULTS P0001 P0002 P0003)
        (SETQ RESULTS 'NIL)
        (SETQ P0001 NIL)
        (GO N0004)
   F0005 (IF P0001 (RETURN RESULTS))
        (SETQ P0001 T)
   N0004 (IF (NOT P0001) (GO S0006))
        (GO F0005)
   S0006 (SETQ P0002 NIL)
        (GO N0007)
   F0008 (IF P0002 (GO F0005))
        (SETQ P0002 T)
   N0007 (IF P0001 (GO S0009))
        (IF (NOT P0002) (GO S0009))
        (GO F0008)
   S0009 (SETQ P0003 NIL)
        (GO N0010)
   F0011 (IF P0003 (GO F0008))
        (SETQ P0003 T)
   N0010 (IF P0002 (GO S0012))
        (IF (NOT P0003) (GO S0012))
        (GO F0011)
   S0012 (PUSH (LIST (CONS 'A (IF P0001 :TRUE :FALSE))
                     (CONS 'B (IF P0002 :TRUE :FALSE))
                     (CONS 'C (IF P0003 :TRUE :FALSE)))
               RESULTS)
        (IF ALL? (GO F0011) (RETURN RESULTS))))