Unable to deliver letter
Unable to deliver letter to the following recipient:
SLUG@R20.UTexas.EDU:
After trying without success for 3 days to contact the relevant
domain servers needed to resolve the domain name "R20.UTexas.EDU",
the Mailer must presume that there probably is no such domain.
----- Text of letter follows -----
Received: from EUPHRATES.SCRC.Symbolics.COM by STONY-BROOK.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 361457; Fri 11-Mar-88 18:22:24 EST
Date: Fri, 11 Mar 88 18:22 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Subject: A question about tweaking the 3600 Lisp Compiler
To: Jeffrey Mark Siskind <Qobi@ZERMATT.LCS.MIT.EDU>
cc: Customer-Reports@STONY-BROOK.SCRC.Symbolics.COM, SLUG@R20.UTexas.EDU,
Discuss-Compiler@STONY-BROOK.SCRC.Symbolics.COM
In-Reply-To: <880308012605.1.QOBI@WHIRLWIND.LCS.MIT.EDU>
Message-ID: <19880311232215.1.MOON@EUPHRATES.SCRC.Symbolics.COM>
Date: Tue, 8 Mar 88 01:26 EST
From: Jeffrey Mark Siskind <Qobi@ZERMATT.LCS.MIT.EDU>
Date: Mon, 7 Mar 88 11:22 EST
From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
Date: Fri, 4 Mar 88 11:00 PST
From: Siskind.pa@Xerox.COM
I am doing some experiments where I compile SAT problems
(Boolean satisfiability) and CSP problems (Constraint Satisfaction
problems) directly into very tight Lisp code. The code is very simple.
It is not very Lisp-like and resembles assembler code. It is all one
large PROG with lots of IF-EQ-GOTOs and some SETQs to local variables.
I can generate such code from the SAT/CSP problems quite quickly. The good
thing about such code is that once it is compiled it runs exceedingly
quickly (about 10,000 times faster than other more sophisticated SAT/CSP
solvers which I have tried). The only problem is that the 3600 Lisp compiler
takes forever to compile the large procedures which my SAT/CSP-Lisp translator
generates, because of all of the optimization which is being done.
Are you sure it's the optimization, or is that just an assumption?
I'm pretty sure. In the example shown below, 58% is spent in COMPILER:OPTIMIZE-FORM.
The fact that 58% of the time was spent in a function with that name, and in functions
that it calls, does not prove that optimization is where the time is being spent.
For example, macro expansion takes place inside of that function. So do a bunch
of other things that aren't optimization.
All of
that optimization is useless for this application, as the code which I
generate has a trivial one-to-one correspondence to the optimal
3600 machine instruction sequences which the compiler ends up producing.
I tried setting both COMPILER:*PHASE-1-OPTIMIZE-P* and COMPILER:*RUN-PHASE-3*
to NIL. That helped a bit but not enough.
Just for your info, it was Eric Weaver who suggested that I turn off those two switches.
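(Concretely, what I mean by turning those two switches off is just a sketch
like the following, assuming both are ordinary special variables that can be
rebound around a call to COMPILE; MY-CSP-SOLVER is a made-up stand-in for the
real function name.)
(LET ((COMPILER:*PHASE-1-OPTIMIZE-P* NIL)   ; presumably disables phase-1 optimization
      (COMPILER:*RUN-PHASE-3* NIL))         ; presumably disables phase 3
  (COMPILE 'MY-CSP-SOLVER))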
Question 1: How do I turn off ALL optimizations?
I would first use metering tools, even crude ones like pressing c-m-Suspend
a statistically significant number of times while compiling, to find out
where the time is being spent. Only then should you worry about turning
things off.
Here are the results of using my profiling tools. (Do you remember the help you gave
me several months ago? Well, this tool which I wrote is the result of that help. I find
it really easy to use, and it provides simple, uncluttered, and easy-to-interpret information.
I'll send you a copy if you want. I would submit it to SLUG but I don't know how.)
Anyway, first I ran the compiler with COMPILER:*PHASE-1-OPTIMIZE-P* and COMPILER:*RUN-PHASE-3*
both set to T. It took 1715 samples (at 10 samples per second) to compile my one function.
Here are the percentages of time spent inside each function. (The times are cumulative:
if A calls B then B's time is also charged against A.)
The use of cumulative percentages on a program as highly recursive as the compiler
means that there is essentially no information here on where the time is actually
being spent.
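(To illustrate with a pair of made-up functions: STEP-A and STEP-B below call
each other, so a cumulative profile charges essentially the whole run to both of
them, and the percentages can no longer tell you which one does the real work.)
(DEFUN STEP-A (N)
  (WHEN (PLUSP N) (STEP-B (1- N))))
(DEFUN STEP-B (N)
  (DOTIMES (I 1000000))                     ; the actual work (a busy loop) happens here
  (WHEN (PLUSP N) (STEP-A (1- N))))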
A total of 1715 samples were taken.
The following system functions were profiled:
92% COMPILER:COMPILE-TO-CORE
92% (:INTERNAL COMPILER:COMPILE-TO-CORE 1 (:DUMP-LAMBDA-EXPRESSION))
92% COMPILE
92% COMPILER:COMPILE-DEFINITION-1
92% COMPILER:COMPILE-LAMBDA-EXP
69% COMPILER:PHASE-1-BLOCK
69% COMPILER:RUN-PHASE-1-FUNCTION
69% (:PROPERTY PROG COMPILER:PHASE-1)
69% COMPILER:PHASE-1-EXTERNAL-FUNCTION
69% COMPILER:PHASE-1-FUNCTION-DEFINITION
69% COMPILER:PHASE-1-PROG
69% COMPILER:PHASE-1
61% COMPILER:TRANSFORM
58% COMPILER:OPTIMIZE-FORM
55% (:PROPERTY IF COMPILER:PHASE-1)
37% COMPILER:TRANSFORM-PREDICATE
21% COMPILER:ASSEMBLE-FUNCTION
21% COMPILER:RUN-PHASE-4
17% COMPILER:FIXUP-BRANCHES
15% COMPILER:PASS-OVER-BRANCHES
11% SYS:%ALLOCATE-BLOCK
10% SI:%GC-SCAVENGE
6% SI:%ALLOCATE-LIST-BLOCK-ESCAPE
6% COMPILER:PHASE-1-PROGN
I then ran the compiler again with both switches set to NIL. Here are the results.
(For your info, the function being compiled is a CSP (Constraint Satisfaction Problem)
encoding of the 8-Queens problem.)
The only information I got from the second profile is that failing to exclude
the GC is making your results unreproducible. Thus I can't compare the two
profiles to see what the effect of setting those switches was.
If you have access at MIT to Genera 7.2, you might try the metering tools
there. I'm told they're really good, and not as prone to the artifacts that
are showing up in your metering here. That's all I know; I have not actually
used the 7.2 metering tools myself, as I have not done any metering since
they became available to me.
A total of 974 samples were taken.
The following system functions were profiled:
74% COMPILER:COMPILE-TO-CORE
74% (:INTERNAL COMPILER:COMPILE-TO-CORE 1 (:DUMP-LAMBDA-EXPRESSION))
74% COMPILE
72% COMPILER:COMPILE-DEFINITION-1
72% COMPILER:COMPILE-LAMBDA-EXP
43% SYS:%ALLOCATE-BLOCK
41% SI:%GC-SCAVENGE
38% COMPILER:ASSEMBLE-FUNCTION
38% COMPILER:RUN-PHASE-4
34% COMPILER:PHASE-1-BLOCK
34% COMPILER:RUN-PHASE-1-FUNCTION
34% (:PROPERTY PROG COMPILER:PHASE-1)
34% COMPILER:PHASE-1-EXTERNAL-FUNCTION
34% COMPILER:PHASE-1-FUNCTION-DEFINITION
34% COMPILER:PHASE-1-PROG
34% COMPILER:PHASE-1
26% (:PROPERTY IF COMPILER:PHASE-1)
23% SI:%ALLOCATE-STRUCTURE-BLOCK-ESCAPE
23% SI:SIMPLE-MAKE-ARRAY-NSS-AREA
23% COMPILER:FIXUP-BRANCHES
20% SI:%ALLOCATE-LIST-BLOCK-ESCAPE
19% COMPILER:PASS-OVER-BRANCHES
13% COMPILER:ASSEMBLE-INTERVAL
11% APPEND
11% REDUCE
11% SI:%MAKE-LIST
9% COMPILER:PHASE-1-FUNCTION-CALL
9% (:PROPERTY COMPILER:JUMP COMPILER:ASSEMBLE)
8% COMPILER:TRANSFORM
8% COMPILER:NOTE-BRANCH
6% COMPILED-CSP-SOLVER-INTERNAL
You can start by eschewing macros like IF and PUSH, as well as optimizable
functions like LIST, if you're really aware of a one-to-one correspondence
between your code and machine instructions.
Are you trying to tell me that COND is more primitive than IF?
Yes, but as it turns out, some time during the last three years while I wasn't
looking this was changed. IF is now the primitive, and what I said about it
being a macro was wrong. My comments still apply to PUSH, though.
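(To be concrete about PUSH: what I mean is writing out its standard expansion by
hand in the generated code; NEW-ENTRY below is just a placeholder for whatever
you are accumulating.)
(SETQ RESULTS (CONS NEW-ENTRY RESULTS))     ; instead of (PUSH NEW-ENTRY RESULTS)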
Anyway, macro
expansion is trivial for my examples and doesn't take much time. I will, however,
follow your suggestion and not use macros. BTW, what are the non-optimizable
or pre-optimized versions of LIST?
Look at the COMPILER:OPTIMIZED-INTO property of LIST.
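(A minimal way to do that, assuming the property is an ordinary entry on the
symbol's property list that GET can retrieve:)
(GET 'LIST 'COMPILER:OPTIMIZED-INTO)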
Question 2: I know the machine code sequences which I want to generate.
There are only two general templates, of length 6 and 2 instructions
respectively. Can I use an assembler directly, and how do I do that?
It's a little-known fact, but there is a crude assembler buried somewhere
in the system. I'd rather not tell you how to use it, for three reasons.
One is that it won't actually work for you; without some modifications, it
won't assemble code that you can call as an ordinary function. Second is
that if you adopt that approach, your code won't port to future machine
offerings. Third, because of the way that assembler is implemented, it
may well be slower than the compiler. We can keep this idea in reserve
in case better ideas don't pan out, but currently I doubt that the assembler
will be the answer to your problems.
I don't need to call the assembler generated function as an ordinary function.
How else do you propose to call it?
The whole purpose of the experiment is to figure out how large the constant factor
is between CSP solvers which compile into machine code and CSP solvers which
function more like interpreters. The claim which I am trying to verify is that
variable reordering schemes, which lend themselves better to interpreted methods,
do not buy back the inefficiency lost from interpretation instead of compilation.
This is beyond me, sorry. I'm not familiar with these issues.
Question 3: Is there an indirect GOTO instruction? If there is, how do I
access it directly from lisp?
In 7.2 the CASE construct compiles into an indexed branch when the keys
are integers in a small range. I don't believe this instruction exists
in 7.1. In all releases, the undocumented SYS:%JUMP subprimitive
transfers control to a PC. I do not recommend using it as it's very
easy to get into trouble with it. I'm not aware of any special form for
converting a GO-tag into a PC value.
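(For concreteness, the kind of CASE I mean is one whose keys are small integers,
as in the sketch below; TAG-INDEX is a made-up variable, and the tags are taken
from the sample code at the end of your message.)
(CASE TAG-INDEX                             ; placed inside the PROG so the GO tags are visible
  (0 (GO F0005))
  (1 (GO F0008))
  (2 (GO F0011))
  (OTHERWISE (RETURN RESULTS)))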
The SYS:%JUMP subprimitive is exactly what I need. I realize that I will get into
trouble with this. I am not producing production code, however; I am doing an
experiment which is not very Lisp-like in nature, but since I do all of my work
on a 3600 (for obvious reasons) I need to be able to do this one assembler-level
experiment to verify a conjecture.
I will need to be able to convert a GO-tag into a PC value for this to work
at all.
I don't know how to do this.
I also need to be able to compare two
PCs as integers and determine which is larger before I branch indirectly
to one of them. How can I do that?
I don't understand why you would want to do this nor how you would get
your hands on PC values. However, given that you have two PC values,
the subprimitive SYS:%POINTER-LESSP will compare them, as documented.
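(That is, assuming PC-A and PC-B stand for your two PC values, something like:)
(IF (SYS:%POINTER-LESSP PC-A PC-B)
    PC-B                                    ; PC-B is the later (larger) PC
    PC-A)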
If you want to know the real nitty-gritty reasons, here they are. One of the
CSP techniques which I am trying to code up is called Selective Backtracking
(no, it is not full Dependency-Directed Backtracking, only one aspect of it).
This technique computes nogoods, which are sets of mutually inconsistent
assumptions. It turns out that assumptions can be most efficiently represented
as PC values for the code which made the assumption. One thing that Selective
Backtracking must do is take a nogood and find the most recent assumption
in it. If the order of assumptions is fixed at compile time (no variable reordering)
then I can find the latest assumption by finding the largest PC-value.
All of this can be compiled into very tight machine code on almost any architecture.
My conjecture is that the efficiency of such tight code far outweighs the benefits
of variable reordering, which makes such tight-code optimization tricks impossible.
In the general case, I will need to create arbitrary data structures representing
hierarchical nogoods over assumptions, which in this case will be PC values, and
then do a form of resolution on these hierarchical nogoods. The resolution operation
will need to be able to compare pointers for equality (=) and order (<).
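(As a sketch of the one step I described above, assuming a nogood is simply a
list of assumptions represented as PC values, and using the SYS:%POINTER-LESSP
comparison you mentioned; LATEST-ASSUMPTION is a made-up name.)
(DEFUN LATEST-ASSUMPTION (NOGOOD)
  (LET ((LATEST (FIRST NOGOOD)))
    (DOLIST (PC (REST NOGOOD) LATEST)       ; keep the largest PC seen so far
      (WHEN (SYS:%POINTER-LESSP LATEST PC)
        (SETQ LATEST PC)))))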
I hope you find these suggestions helpful. If you have more questions,
feel free to ask me. I may take a few days to respond. Be sure to copy
Customer-Reports and Discuss-Compiler on all mail.
I really do appreciate these suggestions.
Below is a sample piece of code which my SAT translator generates:
(LAMBDA (ALL?)
  (PROG (RESULTS P0001 P0002 P0003)
        (SETQ RESULTS 'NIL)
        (SETQ P0001 NIL)
        (GO N0004)
  F0005 (IF P0001 (RETURN RESULTS))
        (SETQ P0001 T)
  N0004 (IF (NOT P0001) (GO S0006))
        (GO F0005)
  S0006 (SETQ P0002 NIL)
        (GO N0007)
  F0008 (IF P0002 (GO F0005))
        (SETQ P0002 T)
  N0007 (IF P0001 (GO S0009))
        (IF (NOT P0002) (GO S0009))
        (GO F0008)
  S0009 (SETQ P0003 NIL)
        (GO N0010)
  F0011 (IF P0003 (GO F0008))
        (SETQ P0003 T)
  N0010 (IF P0002 (GO S0012))
        (IF (NOT P0003) (GO S0012))
        (GO F0011)
  S0012 (PUSH (LIST (CONS 'A (IF P0001 :TRUE :FALSE))
                    (CONS 'B (IF P0002 :TRUE :FALSE))
                    (CONS 'C (IF P0003 :TRUE :FALSE)))
              RESULTS)
        (IF ALL? (GO F0011) (RETURN RESULTS))))