[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Compiling DOTIMES



    Date: Thu, 19 Jul 90 15:56 EDT
    From: bud%cleves@gte.com (Bud Frawley)

    While using code to generate possibilities, Tack Fu of GTE noted that
    compiling nested DOTIMESs took considerably more time that compiling
    "equivalent" LOOPs. We've tried a few depths of nesting; it quickly
    gets out of hand. 

    Consider the simple lambda form
[reproduced below, so elided here...]

    It takes 40 seconds to compile this on a 3640.

As you are no doubt sick of hearing by now, this form contains a macro
which must be macroexpanded within its life.  Here is why: 

1.  When you expand a DOTIMES, the first thing it does is FIND-BODY-DECLARATIONS.
    Since Common Lisp the Language (CLtL) was unclear about this, it is assumed
    that a form which can be expanded into a declaration is allowed to actually
    do so, so the first subform in the in the body is also macroexpanded.

2.  Next, the body of DOTIMES is expanded to see whether the limit value is
    modified.  If it's a constant or a variable which is not modified, then no
    LET is generated to bind it; if it's modified or expensive to calculate, it's
    bound in a GENSYM'ed lexical variable.

So, if you have two nested DOTIMES's, one directly inside the other with no other
forms intervening, you expand the inner one three times, once for its own
expansion, and twice for the outer one's.

Here is the result of expanding your original version:

(time
  (progn
    (macroexpand
      '(dotimes (a 3)
	 (ignore a) 
	 (dotimes (b 3)
	   (ignore b)
	   (dotimes (c 3)
	     (ignore c)
	     (dotimes (x 3)
	       (ignore x)
	       (dotimes (y 3)
		 (ignore y)
		 (dotimes (z 3)
		   (ignore z)
		   (incf counter))))))))
    nil))
Evaluation of (PROGN (MACROEXPAND #) NIL) took 12.610828 seconds of elapsed time including:
  0.331 seconds processing sequence breaks,
  0.366 seconds in the storage system (including 0.000 seconds waiting for pages):
	0.147 seconds processing 747 page faults including 0 fetches,
	0.218 seconds in creating and destroying pages, and
	0.000 seconds in miscellaneous storage system tasks.
49,973 list, 8,037 structure words consed in WORKING-STORAGE-AREA.

Now, try it without the (IGNORE ...)'s.  Note that although the form is somewhat
smaller, the amount of macroexpansion time and consing triples!

    (let ((si:gc-flip-inhibit-time-until-warning 36000))  ;; No notifications, please
      (si:inhibit-gc-flips				      ;; No GC flips, please
	(time
	  (progn
	    (macroexpand
	      '(dotimes (a 3)
		 (dotimes (b 3)
		   (dotimes (c 3)
		     (dotimes (x 3)
		       (dotimes (y 3)
			 (dotimes (z 3)
			   (incf counter))))))))
	    nil))))
Evaluation of (PROGN (MACROEXPAND #) NIL) took 36.937469 seconds of elapsed time including:
  1.026 seconds processing sequence breaks,
  1.103 seconds in the storage system (including 0.000 seconds waiting for pages):
	0.488 seconds processing 2455 page faults including 0 fetches,
	0.614 seconds in creating and destroying pages, and
	0.000 seconds in miscellaneous storage system tasks.
159,007 list, 26,444 structure words consed in WORKING-STORAGE-AREA.

Again, this is because the first form inside the body needs to be macroexpanded
to see if it might be expanding into (DECLARE ...).

				      To further confound matters, the
    function compiled from DOTIMES is about 20% faster than the function
    based on LOOP. 

Of course.  It's a better loop for short loops.  Forward branch instructions (and
also long branches in either direction) often clear instruction pipelines if
taken, and the expansion of LOOP has an unconditional branch at the beginning of
the loop.  Try changing the number 3 to 30 (of course, nest fewer loops so you
don't wait until next year for results) and you'll see that the LOOP loop is not
much slower than the DOTIMES loop.

[On the Ivory, the comment in DOTIMES says the compiler generates a special
looping instruction for DOTIMES; I don't know if it can also do so for the LOOP
case, so that may also be causing your speedup.]

(dotimes (a 3)
  (do-whatever))

   => (PROG ((A 0))
	    (IF (NOT (< A 3)) (PROGN (GO #:G0004)))  ;; Conditional branch
	    #:G0003 (DO-WHATEVER)
	    (SETQ A (1+ A))
	    (IF (< A 3) (PROGN (GO #:G0003)))
	    #:G0004 (RETURN NIL))


(loop for a below 3
      do (do-whatever))

   => (LET ((A 0))
	(PROG NIL
	      (GO SI:NEXT-LOOP)			     ;; Unconditional branch
	   SI:LOOP-BODY (DO-WHATEVER)
	      (SETQ A (GLOBAL:PLUS A 1))
	   SI:NEXT-LOOP (OR (NOT (GLOBAL:LESSP A 3))
			    (GO SI:LOOP-BODY))
	   SI:END-LOOP ))