Re: CMU CL 15c performance



These runs are with (declare (optimize (speed 3) (space 0) (safety 0))).
The (space 0) does have some effect on inline expansion.  In particular, it
enables "promiscuous" inline expansion of various system functions (such as
READ-CHAR).

I profiled your program, and found that a lot of time was being spent in
the I/O routines:
      Seconds  |  Consed   |  Calls  |  Sec/Call  |  Name:
    ------------------------------------------------------
        19.700 | 2,464,112 |       1 |   19.69990 | LOCSUM
        13.570 |   308,536 |       1 |   13.56990 | READ-BYTE-IMAGE-RAW
         7.800 |       760 |       1 |    7.79990 | WRITE-BYTE-IMAGE-RAW
         4.870 |   307,752 |       1 |    4.86990 | CONVERT-FLOAT-BYTE-IMAGE
         4.390 | 1,232,056 |       1 |    4.38990 | CONVERT-BYTE-FLOAT-IMAGE
    ------------------------------------------------------
        50.329 | 4,313,216 |       5 |            | Total

Unfortunately, CL lacks any standard way to do block I/O.  Changing to use
CMU-specific operations resulted in:
      Seconds  |  Consed   |  Calls  |  Sec/Call  |  Name:
    ------------------------------------------------------
        19.650 | 2,464,112 |       1 |   19.64990 | LOCSUM
         4.860 |   307,752 |       1 |    4.85990 | CONVERT-FLOAT-BYTE-IMAGE
         4.660 | 1,232,056 |       1 |    4.65990 | CONVERT-BYTE-FLOAT-IMAGE
         0.150 |   307,936 |       1 |    0.14990 | READ-BYTE-IMAGE-RAW
         0.010 |       760 |       1 |    0.00990 | WRITE-BYTE-IMAGE-RAW
    ------------------------------------------------------
        29.329 | 4,312,616 |       5 |            | Total

[In the process, I discovered that PROFILE is broken in 15c.  A fixed
system will be out soon...]

I have appended those changes.  Basically all of the consing seems to be
accounted for by allocation of 2 byte-images and 3 float-images.
FOLD-FLOAT-IMAGE seems not to be called.  I would have to see the calling
context to tell why it isn't being inline expanded properly.

Most of the remaining overhead seems to be in the use of dynamically sized
multi-dimensional arrays.  I experimented with declaring the arrays to have
fixed size, but it turns out that since the dimensions are not a power of
2, this doesn't help much.  Our alpha system has general multiplier
recoding, but it is currently a pessimization due to the shifts not being
open-coded.
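
To illustrate where the multiply comes from, here is a minimal sketch of the
row-major index arithmetic behind a two-dimensional AREF.  ROW-MAJOR-OFFSET is
a hypothetical helper, not part of suml.lisp or of CMU CL; the standard
function ARRAY-ROW-MAJOR-INDEX is used only to check it.

```lisp
;; Illustrative sketch only; ROW-MAJOR-OFFSET is a hypothetical helper.
(defun row-major-offset (i j cols)
  "Offset of element (I, J) in a row-major 2D array with COLS columns."
  (declare (fixnum i j cols))
  (+ (* i cols) j))

(let ((a (make-array '(3 5) :element-type 'single-float
                            :initial-element 0.0f0)))
  ;; ARRAY-ROW-MAJOR-INDEX performs the same arithmetic on a real array.
  (assert (= (array-row-major-index a 2 3)
             (row-major-offset 2 3 (array-dimension a 1)))))

;; When COLS is a power of 2 the multiply can be replaced by a shift:
;; (* i 8) is the same as (ash i 3).
(assert (= (row-major-offset 2 3 8) (+ (ash 2 3) 3)))
```

With a column count such as 5, no single shift is available, so the multiply
(or a sequence of shifts and adds) stays in the inner loop.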

The same sort of vector-of-vectors approach you describe for your C version
would probably help this quite a bit, and would still allow dynamic sizing.
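
A vector-of-vectors image might look roughly like the sketch below.  The names
MAKE-ROW-IMAGE, ROW-IMAGE-REF and SUM-ROW-IMAGE are hypothetical and do not
come from suml.lisp; the point is only that each element access becomes two
one-dimensional references with no index multiply.

```lisp
;; Hypothetical names; a sketch of the representation, not the suml code.
(defun make-row-image (rows cols)
  "Return a simple-vector of ROWS rows, each a SINGLE-FLOAT vector of COLS."
  (declare (fixnum rows cols))
  (let ((image (make-array rows)))
    (dotimes (i rows image)
      (setf (svref image i)
            (make-array cols :element-type 'single-float
                             :initial-element 0.0f0)))))

(declaim (inline row-image-ref))
(defun row-image-ref (image i j)
  "Element (I, J): fetch row I, then index it with J."
  (declare (simple-vector image) (fixnum i j))
  (aref (the (simple-array single-float (*)) (svref image i)) j))

(defun sum-row-image (image)
  "Sum all elements, fetching each row vector once per outer iteration."
  (declare (simple-vector image))
  (let ((sum 0.0f0))
    (declare (single-float sum))
    (dotimes (i (length image) sum)
      (let ((row (svref image i)))
        (declare (type (simple-array single-float (*)) row))
        (dotimes (j (length row))
          (incf sum (aref row j)))))))
```

Both dimensions are still chosen at run time, and hoisting the row lookup out
of the inner loop, as SUM-ROW-IMAGE does, leaves a plain vector reference per
element.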

Below are dynamic VOP cost figures for suml.  As you can see, a great deal
of time is being spent in * and + for index calculation.  The calls to
%ARRAY-DIMENSION and %ARRAY-DATA-VECTOR would also be eliminated if vectors
were used.  KNOWN-CALL-LOCAL and KNOWN-RETURN show up because local
functions (such as AT in LOCSUM) aren't inline expanded unless explicitly
declared inline.

                           Vop          Count              Cost Percent
                      FIXNUM-*:     4,608,000        23,040,000 17.3%
            FAST-IF-</UNSIGNED:     4,296,966        12,890,898  9.7%
           MOVE-TO-WORD/FIXNUM:    11,051,534        11,668,496  8.7%
              KNOWN-CALL-LOCAL:     1,228,802         9,830,410  7.4%
         FAST-+/FIXNUM=>FIXNUM:     5,836,802         8,601,602  6.4%
              FAST-IF-</SIGNED:     2,457,600         7,372,800  5.5%
                  KNOWN-RETURN:     1,228,802         6,144,010  4.6%
DATA-VECTOR-REF/SIMPLE-ARRAY-SINGLE-FLOAT:
                                    2,764,802         5,529,604  4.1%
              %ARRAY-DIMENSION:     4,611,049         5,226,090  3.9%
            %ARRAY-DATA-VECTOR:     4,608,002         5,222,402  3.9%
                 MOVE-ARGUMENT:     7,372,884         4,915,272  3.7%
                ALLOCATE-FRAME:     1,228,802         4,915,204  3.7%
                        BRANCH:     1,858,159         3,716,318  2.8%
         FAST--/FIXNUM=>FIXNUM:     1,228,800         3,686,400  2.8%

[No, this profiling tool is not documented....]
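
On the KNOWN-CALL-LOCAL and KNOWN-RETURN entries above: a local function can
be declared inline with a declaration at the head of the FLET or LABELS body.
The sketch below is an assumption-laden stand-in; this AT has an invented body
and is not the AT from LOCSUM, and SUM-WITH-INLINE-AT is a hypothetical name.

```lisp
;; Sketch: AT here is a stand-in with an assumed body, not the AT in LOCSUM.
(defun sum-with-inline-at (v)
  (declare (type (simple-array single-float (*)) v))
  (flet ((at (i)
           (declare (fixnum i))
           (aref v i)))
    ;; Without this declaration the compiler may emit a local call
    ;; and return for each use of AT.
    (declare (inline at))
    (let ((sum 0.0f0))
      (declare (single-float sum))
      (dotimes (i (length v) sum)
        (incf sum (at i))))))
```

Whether the call actually disappears depends on the compiler and the optimize
settings in effect, so it is worth re-checking the VOP counts afterwards.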

Please keep me updated on any other performance problems you run into.

  Rob

p.s.  Here are the I/O changes:

*** suml.lisp	Thu Jan 30 18:18:59 1992
--- hacked.lisp	Thu Jan 30 18:03:16 1992
***************
*** 35,40 ****
--- 35,47 ----
  (defun read-byte-image-raw (file w h)
    (declare (string file) (fixnum w h) (values byte-image))
    (let ((result (make-array (list w h) :element-type '(unsigned-byte 8))))
+     (declare (type byte-image result))
+     #+cmu
+     (with-open-file (stream file :direction :input
+ 			    :element-type '(unsigned-byte 8))
+       (lisp::with-array-data ((data result) (start) (end))
+ 	(system:read-n-bytes stream data start end)))
+     #-cmu
      (with-open-file (stream file :direction :input)
        (dotimes (i w)
  	(declare (fixnum i))		;INFERENCE1
***************
*** 44,49 ****
--- 51,63 ----
  
  (defun write-byte-image-raw (file image)
    (declare (string file) (type byte-image image))
+   #+cmu
+   (with-open-file (stream file :direction :output
+ 			  :if-exists :overwrite
+ 			  :element-type '(unsigned-byte 8))
+     (lisp::with-array-data ((data image) (start) (end))
+       (system:output-raw-bytes stream data start end)))
+   #-cmu
    (with-open-file (stream file :direction :output :if-exists :overwrite)
      (dotimes (i (array-dimension image 0))
        (declare (fixnum i))		;INFERENCE1
***************