
Y, again (long)



The recent flurry of messages about the Y operator has prompted me to
spend yet a little more time on the MIT Scheme compiler.  The most
recent version (4.33) does a good job on some versions of the Y
operator, but not on all.  This message should clarify what it can
currently do and what it cannot.

I should point out that there is no explicit "Y-recognizer" in the
compiler; rather, the results are a consequence of the interaction of
several of its parts:
 - value analysis (which values can reach which variables).  In
   particular, a variable's value is "known" if only one value can
   reach it.
 - closure analysis (which lambda expressions need to be represented
   by actual objects at run time, instead of having an implicit
   representation in the code stream).
 - a limited side-effect analysis, which allows the compiler to "punt"
   some calls when the callee has no side effects and the value
   returned is unused or known and available at the call point.  This
   is the piece of code that I've written recently to make the
   examples below "work".
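
To illustrate the kind of call that last analysis lets the compiler
drop, here is a hypothetical example of my own (the names example and
pure-square are mine; this is not code from the compiler):

```scheme
;; pure-square has no side effects.
(define (pure-square x) (* x x))

(define (example y)
  (pure-square y)   ; callee is pure and the value is unused,
  (+ y 1))          ; so the call can be "punted" entirely

;; (example 4) returns 5 whether or not the call is punted.
```

The self-applications (g g) in the Y-style code below are calls of
exactly this flavor: once their values are known, the calls themselves
need not survive into the generated code.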

Some of my thoughts about the points raised in recent messages:

As for using some version of Y to implement LETREC: we could probably
start doing it and users would hardly ever notice the difference, but
aside from conceptual elegance nothing would be gained, so it's not
really worth it in a "real" compiler.
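
For concreteness, a hand expansion of such a LETREC might look like
the following (my own illustrative sketch, using an applicative-order
Y; it is not what any compiler actually emits):

```scheme
;; Applicative-order Y: the eta-expansion (lambda (x) ((g g) x))
;; delays the self-application so evaluation terminates.
(define Y
  (lambda (f)
    ((lambda (g) (f (lambda (x) ((g g) x))))
     (lambda (g) (f (lambda (x) ((g g) x)))))))

;; (letrec ((fact ...)) fact) expanded through Y:
(define fact
  (Y (lambda (fact)
       (lambda (n)
         (if (= n 0)
             1
             (* n (fact (- n 1))))))))

;; (fact 5) => 120
```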

Why have I spent any energy getting the Y operator to "work"?  From a
compiler writer's point of view, I'm not interested in the Y operator
per se, except as a "cute" hack.  It just happens to be a very good
test case for value and closure analysis, and if the compiler can do a
good job on code that uses it, it will hopefully also do a good job on
the "simpler" cases that occur in practice.

About eq?/eqv?-ness of procedures, I should say that the only time we
may need to distinguish multiple procedures whose "code" is the same
lambda expression is when they can coexist in time, and free variables
can cause the different instances to behave differently (return
different values and/or perform different effects).  Clearly, no
correct compiler can "merge" two such instances.  The rest of the
cases are for philosophers to argue about.  To make them happy,
compiler writers may have to provide a "pedantic" switch in the
compiler to prevent merging of otherwise identical procedures.  I'm
not saying that determining when two procedures "act the same way" is
feasible in general (we all know about the halting problem), but if we
can determine in some instance that the only thing that could
discriminate between two procedures is eq?/eqv?, there is really no
reason not to merge them.
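
A concrete illustration of the distinction (my example; make-adder and
the other names are mine):

```scheme
;; Two instances of the same lambda expression whose free variable n
;; differs: they coexist in time and behave differently, so no correct
;; compiler may merge them.
(define (make-adder n)
  (lambda (x) (+ x n)))

(define add1 (make-adder 1))
(define add2 (make-adder 2))
;; (add1 10) => 11, (add2 10) => 12

;; Two instances of a closed lambda expression: nothing but eq?/eqv?
;; could ever tell them apart, so a compiler is free to share one
;; object for both (unless a "pedantic" switch forbids it).
(define id-a (lambda (x) x))
(define id-b (lambda (x) x))
;; (eq? id-a id-b) may then be #t or #f.
```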

Here are some examples of what the MIT Scheme compiler can currently do:

The following versions of factorial turn into identical (bitwise) code:

;; version 1, using letrec.  reference version

(define fact
  (letrec ((fact (lambda (n)
		   (if (= n 0)
		       1
		       (* n (fact (- n 1)))))))
    fact))

;; version 2, "subroutinized" "standard" applicative order

(define fact
  ((lambda (f)
     (let ((g (lambda (g) (f (lambda () (g g))))))
       (g g)))
   (lambda (fg)
     (lambda (n)
       (if (= n 0)
	   1
	   (* n ((fg) (- n 1))))))))

;; version 3, "standard" normal order

(define fact
  ((lambda (f)
     ((lambda (g) (f (g g)))
      (lambda (g) (f (g g)))))
   (lambda (fact)
     (lambda (n)
       (if (= n 0)
	   1
	   (* n (fact (- n 1))))))))

Note that version 3 "has no right to work" in Scheme: under
applicative order the self-application (g g) is evaluated before f is
applied, so a naive evaluation would loop forever.  But I've never
found anything wrong with compilers that "fix" users' code.

The following two versions turn into code essentially identical to the
code generated for the versions above, with the differences described
below.

;; version 4, another "subroutinized" "standard" applicative order

(define fact
  ((lambda (f)
     (let ((g (lambda (g) (f (lambda (x) ((g g) x))))))
       (g g)))
   (lambda (fact)
     (lambda (n)
       (if (= n 0)
	   1
	   (* n (fact (- n 1))))))))

;; version 5, "inline coded" Y

(define fact
  (let ((kernel
	 (lambda (p n)
	   (if (= n 0)
	       1
	       (* n (p p (- n 1)))))))
    (lambda (n)
      (kernel kernel n))))

The code generated for version 4 differs from the code in versions 1-3
only in the order in which the basic blocks appear in the linearized
output.

The code for version 5 differs because the compiler is currently not
smart enough to eta convert (lambda (n) ...) into (lambda (p n) ...)
after it determines that p in (lambda (p n) ...) is used only for
"self reference", is unneeded, and can be dropped.  Thus there is an
initial call (a branch instruction) from the code generated for
(lambda (n) ...) to the code generated for (lambda (p n) ...), but the
actual "recursive" procedure is coded identically to the other cases
above.
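
For concreteness, my reading of what the missed eta step would yield:
once p is dropped, the wrapper (lambda (n) (kernel kernel n)) and the
kernel coincide, and version 5 collapses into the plain recursion of
version 1:

```scheme
;; The residual code version 5 would compile to if the eta conversion
;; were performed: no wrapper, no p, just the direct loop.
(define fact
  (lambda (n)
    (if (= n 0)
        1
        (* n (fact (- n 1))))))

;; (fact 5) => 120
```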

As an example of a hairier case where the compiler "wins", it
generates identical code for

(define (delq! item items)
  (letrec ((trim-initial-segment
	    (lambda (items)
	      (if (pair? items)
		  (if (eq? item (car items))
		      (trim-initial-segment (cdr items))
		      (begin (locate-initial-segment items (cdr items))
			     items))
		  items)))
	   (locate-initial-segment
	    (lambda (last this)
	      (if (pair? this)
		  (if (eq? item (car this))
		      (set-cdr! last (trim-initial-segment (cdr this)))
		      (locate-initial-segment this (cdr this)))
		  this))))
    (trim-initial-segment items)))

and

(define delq!
  (lambda (item items)
    (((lambda (b f g)
	((lambda (k1 k2)
	   (b (lambda () (k1 k1 k2))
	      (lambda () (k2 k1 k2))))
	 (lambda (k1 k2)
	   (f (lambda () (k1 k1 k2))
	      (lambda () (k2 k1 k2))))
	 (lambda (k1 k2)
	   (g (lambda () (k1 k1 k2))
	      (lambda () (k2 k1 k2))))))
      (lambda (p1g p2g)
	(lambda ()
	  ((p1g) items)))
      (lambda (p11g p12g)
	(lambda (items)
	  (if (pair? items)
	      (if (eq? item (car items))
		  ((p11g) (cdr items))
		  (begin ((p12g) items (cdr items))
			 items))
	      items)))
      (lambda (p21g p22g)
	(lambda (last this)
	  (if (pair? this)
	      (if (eq? item (car this))
		  (set-cdr! last ((p21g) (cdr this)))
		  ((p22g) this (cdr this)))
	      this)))))))

Examples where the compiler "loses":

;; version 6 "non-subroutinized" "standard" applicative order

(define fact
  ((lambda (f)
     ((lambda (g) (f (lambda () (g g))))
      (lambda (g) (f (lambda () (g g))))))
   (lambda (fg)
     (lambda (n)
       (if (= n 0)
	   1
	   (* n ((fg) (- n 1))))))))

;; version 7 another "non-subroutinized" "standard" applicative order

(define fact
  ((lambda (f)
     ((lambda (g) (f (lambda (x) ((g g) x))))
      (lambda (g) (f (lambda (x) ((g g) x))))))
   (lambda (fact)
     (lambda (n)
       (if (= n 0)
	   1
	   (* n (fact (- n 1))))))))

Version 6 does not win because the compiler is not smart enough to
realize that both copies of (lambda () (g g)) are indistinguishable.
This causes (lambda (n) ...) to be represented as a closure (with one
free variable, namely fg), and even though the compiler realizes that
the call (fg) always returns a version of (lambda (n) ...) and
therefore codes the call as a direct branch, there is still overhead
associated with the closure and computing its "next" version (which
will be unused).

Version 7 does not win for a similar reason.  Again, the compiler does
not realize that both copies of (lambda (x) ...) are identical and
therefore causes (lambda (n) ...) to be closed over fact.  Both calls
((g g) x) are known to be calls to (lambda (n) ...), and are
translated as direct branches after computing the "next" (useless)
closure for (lambda (n) ...).

There are many other cases where it will lose due to the
"non-uniqueness" of the values which may reach individual variables.

An altogether different reason why it may lose is that once a closure
is detected, a "pessimistic" code-generation strategy is used which
forces some of the "useless" procedures and overhead to reappear.