[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Detailed performance analysis

I've changed the Subject line to more accurately reflect the tangent off
on which we've gone.

    Date: Mon, 23 Oct 89 11:22 EDT
    From: Moon@STONY-BROOK.SCRC.Symbolics.COM (David A. Moon)

	Date: Sun, 22 Oct 89 20:10:17 EDT
	From: barmar@Think.COM

	I was doing loops of 100,000 and 1,000,000 repetitions, ran them a couple
	of times and saw consistent results.  My functions were pretty simple, but
	it's possible that I wasn't factoring everything out.  This is similar to
	what I used to determine speed of accessing various types of variable in a
	local function:

	(defun test (x n)
	  (flet ((test1 (x) (declare (ignore x))) ; access no variable
		 (test2 (x) x) ; access local variable
		 (test3 () x)) ; access lexical variable
	    (without-interrupts (time (loop repeat n do (test1 1))))
	    (without-interrupts (time (loop repeat n do (test2 1))))
	    (without-interrupts (time (loop repeat n do (test3))))))

	I guess comparing TEST1 to either of the others includes the difference
	between RETURN-NIL and RETURN-STACK.  But I think it should give an
	accurate measurement when comparing TEST2 and TEST3.

    More importantly, you are comparing the speed of calling a closure with
    one argument and accessing a parent lexical variable, to the speed of
    calling a non-closure function with no arguments and accessing a local
    variable, and then calling that the difference between accessing a
    parent lexical variable and accessing a local variable.  The difference
    you are reporting is the sum of three differences: two types of called
    function, two numbers of arguments, and two types of variable access.

Yes, I was assuming that the compiler would transform (test3) into
(funcall #'(:internal test 2 test3) <env>).  I'm not sure where I got
the idea that calls to local closures were optimized this way, but now I
see that I was wrong.

I finally found versions of TEST2 and TEST3 that I think are good:

	 (test2 (y) (false x y y) nil)		; access local variable
	 (test3 (y) (false x y x) nil))		; access lexical variable

These both reference each type of variable once (to force both to be
called as closures), and then reference a different type of variable for
comparison purposes.

It's difficult to come up with functions that measure individual
instructions like these, because it is hard to write Lisp code that will
compile into binary with just the intended difference.  Unexpected
optimizations and non-optimizations can creep in there.  For instance,
just before the above versions, I had a version that used + instead of
FALSE; however, when the second argument to + is a local variable it
compiles into a one-argument version of the +-INTERNAL builtin instead
of the normal zero-argument version (which gets both parameters from the
stack).  I haven't been able to figure out how to determine how long a
PUSH-LOCAL takes by itself, unless I can assume that CALL-<n>-xxx and
CALL-<n+1>-xxx take the same amount of time.