[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

speedup for array ops - summary



Oops.  I think the last message got grunged.  It's long, so I'm attaching
it as a document.
It summarizes the effects that suggestions and code had on the function
that returns the limits
of arrays.
Thanks for your patience and help.
-- Dave
David S. Bright                        bright@enh.nist.gov  
Microanalysis Research Group                                   
National Institute of Standards & Technology (NIST, formerly NBS)
Gaithersburg, MD 20899                                           
USA
301-975-3911
Thank you to all those who sent replys to my inquiry:
RE: speedup array op.
This concerned a function that returned the maximum and minimum values for
an array.

Replies that I have so far are from:

Patrick Logan         patl@goldfish.mitron.tek.com
Rich Lynch  lynch@ils.nwu.edu
Alex Repenning  <ralex@tigger.cs.colorado.edu>
Bill St. Clair  bill@cambridge.apple.com
Michael Travers <mt@media.mit.edu>
Espen Vestre <espen@coli.uni-sb.de>

There was enough interest and there were enough suprises (for me anyway)
that
I thought it deserves another go-around.  What follows are the results that
I got in the 
form of notes, and the (hopefully minimally corrupted) versions of the
suggestions or code.  

I apologize for the length of this message - but in the interest of
clarity, I have
reproduced all of the definitions, even though many differ only slightly. 

I am pleased with the contributions and can use them as is, and would
appreciate explanations
of the various timings.  

How closely do the fastest LISP functions here approach C/FORTRAN/PASCAL
code?
(especially for various FIXNUM element types),  or assembly code?  Anyone
have a feel for 
this?  Would assembly code gain 10%, 100%, 1000% in speed?

Thanks again
-- Dave

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;   notes   
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Note 1:  This sets up a simple array, complex, real values, 256x256
elements.

Note 2:  Vanilla version without suggested improvements to avoid wasteful
code.  27 secs, 14 sec GC
I kept the wasteful version around for several iterations so that the
timings were
for similar code.  See Note 10 for effect of simple improvement.

Note 3:  Add a few declarations. They didn't help much - not enough of
them.  24 secs, 14 sec GC

Note 4:  Just making sure that the 1-D array used inside Note 2 example was
NOT simple (as
expected).  Bill St. Clair explains why this sort of array is NOT simple:

"A simple vector is one for which it makes the most sense to code
AREF in-line (given suitable declarations). AREF of a displaced
array needs to indirect to the displaced array and add the displacement
offset."

This is interesting because the byte version of this example (below) is one
of the fastest, despite
of a non-simple vector being used.

Note 5.  Using the simple array and running two indicies.  27 secs, 14 GC.

Note 6.  Same as above, but add declarations.  29 secs.  Oops.

Note 7.  Make a simple vector, fill it with stuff from the original array. 
Find the limits
directly on this vector.  (No declarations).  25 secs.  15sec GC.

Note 8. Add declarations.  24 secs.  16 GC.  NUTS!  :-(

Note 9. Use a list rather than an array.  Stuff list - 11.3 secs, 10 of
which is GC.
Find limits using list:  14.7 secs, 10.4 of which is GC. !
Why is the list operation so fast?

Note 10.  Back to square 1, (Note 2), avoid wasteful calls to abs.  14.6
sec, 9.5 GC.

Note 11.  Take the let out of the loop.  No change at all.  That's great --
compiler must be 
smart.  

Note 12.  Insert calls to floating point compiler.  13 secs.  7 GC. 
(Consing is still
being done, but where?)

Note 13.  Bill's faster version, without (!) floating point compiler: 10.6
secs, 6.6 GC.
A blow by blow explanation of why this works so well would be great.

---  Now, for the byte version ---

Note 14.  Set up the array again.  Everything went faster, so I quadrupled
the array size.

Note 15.  Vanilla version, no declarations, but no wasteful code.  5 secs. 
no GC.

Note 16.  Vanilla version with declarations.  3.7 secs.  no GC. 
Declarations were a respectable
help this time.  This was the 2nd fastest version!

Note 17.  Use a simple array, run two indices.  22 secs.

Note 18.  Same as above, with declarations.  20 secs.  Guess that's why
everyone uses displaced
arrays, whether or not they are simple.

Note 19.  Tried to generate simple vectors various ways, all of which
should have worked, I 
believe, according to the Optimization doc. (Khosla & St. Clair).  Note
that making the type
'unsigned-byte generates a fixnum simple vector, but making the type
'fixnum does not generate
a simple vector.  ???

Note 20.  Anyway, use a simple vector (fixnum, so 4x the storage)    10.8
secs.  

Note 21.  add the declaratios.  7.8 secs.

Note 22.  Bill's faster version --> 3.1 secs.


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;   code   
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;  ============  complex - real cases below 
=================================
;;;-------------------------------------------------
*** Note 1 ***
Test array setup
(defparameter *cara* (make-array '(256 256)))

;;; (ccl::simple-array-p *cara*) --> t

(time
 (dotimes (i 256)
   (dotimes (j 256)
     (setf (aref *cara* i j) (complex (random 3000.22) (random 333.33))))))
--> 43 secs.  27 GC.  3 CME.
;;;----------------------------------------------------------------------4DE
C92 8:56

*** Note 2 ***

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara)))
         (amax (abs (aref av 0)))
         (amin (abs (aref av 1))))
    (dotimes (i asiz)
      (setq amax (max amax (abs(aref av i))))  
      (setq amin (min amin (abs(aref av i)))))    
    `((min ,amin)(max ,amax))))

(time (climits *cara*))  --> 27 secs.  14 GC,  1 MTE. --> ((MIN
2.207752189907997) (MAX 3017.639254217318))

;;; declaration (for comparison later)

*** Note 3 ***
(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (declare (optimize (speed 3) (safety 0)))
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara)))
         (amax (abs (aref av 0)))
         (amin (abs (aref av 1))))
    (declare (fixnum amax amin))
    (dotimes (i asiz)
      (declare (fixnum i))
      (setq amax (max amax (abs (aref av i))))  
      (setq amin (min amin (abs (aref av i)))))    
    `((min ,amin)(max ,amax))))

(time (climits *cara*)) --> 22 secs.  13 GC,  .5 MTE.
=========================================================================
***Note 4 ***
Make it a simple array (need to use two indices)., fixed size.

(adjustable-array-p *cara*) --> nil
(ccl::simple-array-p *cara*) --> T
(array-element-type *cara*) --> T
(class-of (aref *cara* 0 0)) --> #<BUILT-IN-CLASS COMPLEX>
(class-of (realpart (aref *cara* 0 0))) --> #<BUILT-IN-CLASS DOUBLE-FLOAT>

(defmethod test_climits ((ara array))
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara))))
    (ccl::simple-array-p av)))

(test_climits *cara*) --> nil.  av is displaced and therefore NOT simple.
 
Now, for the simple array method:

***Note 5 ***

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((amax (abs (aref ara 0 0)))
         (amin (abs (aref ara 0 0))))
    (dotimes (i (first (array-dimensions ara)))
      (dotimes (j (second (array-dimensions ara)))
        (setq amax (max amax (abs(aref ara i j))))  
        (setq amin (min amin (abs(aref ara i j))))))    
    `((min ,amin)(max ,amax))))

(time (climits *cara*)) --> 27 secs.  14 GC, 2 MTE

***Note 6 ***

and, with the declaration:

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (declare (optimize (speed 3) (safety 0)))  
  (let* ((amax (abs (aref ara 0 0)))
         (amin (abs (aref ara 0 0))))
    (dotimes (i (first (array-dimensions ara)))
      (dotimes (j (second (array-dimensions ara)))
        (setq amax (max amax (abs(aref ara i j))))  
        (setq amin (min amin (abs(aref ara i j))))))    
    `((min ,amin)(max ,amax))))

(time (climits *cara*)) --> 29 secs.  14 GC, 2 MTE.

============================================================================
====
Vestre suggested a cleanup of the code inside - calling min or max half of
the
time.  Will do that later, as it destroyes the basis of the timing
comparisons.


 From Travers:

(defmacro svref& (vector index)
  `(locally
     (declare (optimize (speed 3) (safety 0)))
     (svref ,vector (the fixnum ,index))))

This requires the data to be simple vectors.  That means declaring the
vector first, and
displacing the array to it, I think.

*** Note 7 ***

For the moment:
.. make a simple vector..
(defparameter *sv* (make-array (* 256 256)))
(simple-vector-p *sv*) --> T

... move the data into it
(let* ((asiz (array-total-size *cara*))
       (av (make-array asiz
                       :displaced-to *cara*
                       :element-type (array-element-type *cara*))))
  (dotimes (i asiz)
    (setf (svref *sv* i) (aref av i))))

(defmethod climits ((sv simple-vector))
  (let* ((asiz (array-total-size sv))
         (amax (abs (svref sv 0)))
         (amin (abs (svref sv 1))))
    (dotimes (i asiz)
      (setq amax (max amax (abs(svref sv i))))  ;  ***
      (setq amin (min amin (abs(svref sv i)))))    ; ***
    `((min ,amin)(max ,amax))))

(time (climits *sv*)) -->25 secs, 15 GC, 2 MTE

... now, with the declarations:

(defmethod climits ((sv simple-vector))
  (declare (optimize (speed 3) (safety 0)))  
  (let* ((asiz (array-total-size sv))
         (amax (abs (svref sv 0)))
         (amin (abs (svref sv 1))))
    (dotimes (i asiz)
      (setq amax (max amax (abs(svref sv i))))  
      (setq amin (min amin (abs(svref sv i)))))   
    `((min ,amin)(max ,amax))))

(time (climits *sv*)) --> 24 secs.  16 GC, 1 MTE.


==========================================================================
*** Note 9 ***
from Espen:

(defun a-to-list (a)
  "makes a list out of any array"
  (coerce 
   (make-array (array-total-size a)
               :displaced-to a
               :element-type (array-element-type a))
   'list))

(defun listlim (numlist)
  "Returns the maximum and minimum values of a list of complex numbers."
  (let* ((amax (abs (first numlist)))
         (amin amax))
    (mapc #'(lambda (z)
              (setq z (abs z))
              (cond ((> z amax) (setq amax z))
                    ((< z amin) (setq amin z)))) 
          numlist)
    `((min ,amin)(max ,amax))))

(time (progn (setq *al* (a-to-list *cara*)) nil)) -->  11.3 secs  10.7 GC

(time (listlim *al*)) --> 14.7 secs.  10.4 GC.  .6 MTE.   !!!

Now, that is strange.
==========================================================================

*** note 10 ***

May be timing the wrong things.  Need to work on the inside of the loop.

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara)))
         (amax (abs (aref av 0)))
         (amin (abs (aref av 1))))
    (dotimes (i asiz)
      (let ((absv (abs (aref av i))))
        (cond ((> absv amax) (setq amax absv))
              ((< absv amin) (setq amin absv)))))    
    `((min ,amin)(max ,amax))))


(time (climits *cara*)) --> 14.6 secs.  9.5 GC, .7 MTE

...  how about not putting the LET in the loop:

*** Note 11 ***

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara)))
         (amax (abs (aref av 0)))
         (amin (abs (aref av 1)))
         (absv nil))
    (dotimes (i asiz)
      (setq absv (abs (aref av i)))
      (cond ((> absv amax) (setq amax absv))
            ((< absv amin) (setq amin absv))))    
    `((min ,amin)(max ,amax))))

(time (climits *cara*)) --> 14.6 secs.  9.5 GC, .7 MTE.  No change at all!

=======================================================================
... Now, how about the floating point compiler.
*** Note 12 ***

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara)))
         (amax (cabs (aref av 0)))
         (amin (cabs (aref av 0)))
         (absv nil))
    (dotimes (i asiz)
      (setq absv (cabs (aref av i)))
      (cond ((> absv amax) (setq amax absv))
            ((< absv amin) (setq amin absv))))    
    `((min ,amin)(max ,amax))))

(defun cabs (cv)
  (let ((r (realpart cv))
        (i (imagpart cv)))
    (FPC::fpc-inline (+ (* i i) (* r r)))))

(defparameter rr 3.3)
(defparameter ii 0.9)
(FPC::fpc-inline (+ (* ii ii) (* rr rr)))

(time (climits *cara*)) --> 13 secs.  6.9 GC, .2 MTE

... Oops - forgot to keep the original array values.

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara)))
         (amax (cabs (aref av 0)))
         (amin (cabs (aref av 0)))
         (vmax nil)(vmin nil)
         (vv nil)
         (absv nil))
    (dotimes (i asiz)
      (Setq vv (aref av i))
      (setq absv (cabs vv))
      (cond ((> absv amax) (setq vmax vv)(setq amax absv))
            ((< absv amin) (setq vmin vv)(setq amin absv))))    
    `((min ,vmin)(max ,vmax))))

(time (climits *cara*))  --> 13 secs  7 GC  .3 MTE.

(defmacro svref& (vector index)
  `(locally
     (declare (optimize (speed 3) (safety 0)))
     (svref ,vector (the fixnum ,index))))

... how about making all indices fixnums:

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (declare (optimize (speed 3) (safety 0)))
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara)))
         (amax (cabs (aref av 0)))
         (amin (cabs (aref av 0)))
         (vmax nil)(vmin nil)
         (vv nil)
         (absv nil))
    (dotimes (i asiz)
      (declare (fixnum i))
      (Setq vv (aref av (the fixnum i)))
      (setq absv (cabs vv))
      (cond ((> absv amax) (setq vmax vv)(setq amax absv))
            ((< absv amin) (setq vmin vv)(setq amin absv))))    
    `((min ,vmin)(max ,vmax))))

(time (climits *cara*))  --> 13 secs  7 GC  .3 MTE

-----

example floating point compiler helper declarations. - see optimization
paper.

***Note 13 ***
(defmethod faster-climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (multiple-value-bind (av offset) (ccl::array-data-and-offset ara)
    (let* ((asiz (array-total-size ara))          
           (amax (abs (aref av 0)))       ;; the array starts with 0 not 1!
           (amin amax)
           (index offset))
      (declare (fixnum asiz index)
               (optimize (speed 3) (safety 0)))
      (macrolet ((doit ()
                   `(dotimes (i asiz)
                      (let ((value (abs (aref av index))))
                        (incf index)
                        (if (> value amax)
                          (setq amax value) ; no need to update amin
                          (when (< value amin)
                            (setq amin value)))))))
        (if (simple-vector-p av)
          (locally (declare (simple-vector av))
            (doit))
          (doit))
        `((min ,amin)(max ,amax))))))

(time (faster-climits *cara*)) --> 10.6 secs.  6.6 GC

;;; ======================  fixnum or byte case below
=======================
;;;----------------------------------------------------------------------4DE
C92 9:00
Test array setup

*** Note 14 ***

(defparameter *cara* (make-array '(512 512) :element-type '(unsigned-byte
8)
                                 :initial-element 0))

;;; (ccl::simple-array-p *cara*) --> t

(time
 (dotimes (i 512)
   (dotimes (j 512)
     (setf (aref *cara* i j) (the '(unsigned-byte 8) (random 255))))))

--> 29 secs  0 GC, .8 MTE
;;;----------------------------------------------------------------------4DE
C92 8:56

original using displaced array

*** Note 15 ***

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara)))
         (amax (abs (aref av 0)))
         (amin (abs (aref av 1))))
    (dotimes (i asiz)
      (let ((absv (abs (aref av i))))
        (cond ((> absv amax) (setq amax absv))
              ((< absv amin) (setq amin absv)))))    
    `((min ,amin)(max ,amax))))

(time (climits *cara*)) --> 5.4 secs  .1 MTE  0 GC.

*** Note 16 ***

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((asiz (array-total-size ara))
         (av (make-array asiz
	                 :displaced-to ara
	                 :element-type (array-element-type ara)))
         (amax (abs (aref av 0)))
         (amin (abs (aref av 1))))
    (declare (fixnum amax amin))
    (dotimes (i asiz)
      (declare (optimize (speed 3)(safety 0))
               (fixnum i))
      (let ((absv (abs (aref av i))))
        (cond ((> absv amax) (setq amax absv))
              ((< absv amin) (setq amin absv)))))    
    `((min ,amin)(max ,amax))))

(time (climits *cara*)) --> 3.7 secs  .08 MTE  0 GC.

... try making the arefs to a simple array.  

*** Note 17 ***

(defmethod climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((amax (abs (aref ara 0 0)))
         (amin (abs (aref ara 0 0))))
    (dotimes (i (first (array-dimensions ara)))
      (dotimes (j (second (array-dimensions ara)))
        (setq amax (max amax (abs(aref ara i j))))  
        (setq amin (min amin (abs(aref ara i j))))))    
    `((min ,amin)(max ,amax))))

(time (climits *cara*)) --> 22 secs.  .5 MTE.

*** Note 18 ***

(defmethod climits ((ara array))
  (declare (type (simple-array '(unsigned-byte 8) (* *)) ara)
           (optimize (speed 3)(safety 0)))
  "Returns the maximum and minimum values of elements of an array."
  (let* ((amax (abs (aref ara 0 0)))
         (amin (abs (aref ara 0 0))))
    (declare (fixnum amax amin))
    (dotimes (i (first (array-dimensions ara)))
      (declare (fixnum i))
      (dotimes (j (second (array-dimensions ara)))
        (declare (fixnum j))
        (setq amax (max amax (abs (aref (the (simple-array (unsigned-byte
8) (* *)) ara) i j))))  
        (setq amin (min amin (abs (aref (the (simple-array (unsigned-byte
8) (* *)) ara) i j))))))    
    `((min ,amin)(max ,amax))))

(time (climits *cara*)) --> 20 secs.  

... need 1 dimensional array, allright.  Simple or not. ...

*** Note 19. ***

(defparameter *bb* (make-array (* 512 512)  :initial-element 0))
(ccl::simple-array-p *bb*) --> t
(simple-vector-p *bb*) --> t.
(defparameter *bb* (make-array (* 512 512)  :element-type t
:initial-element 0))
(simple-vector-p *bb*) --> t
(defparameter *bb* (make-array (* 512 512)  :element-type 'unsigned-byte
:initial-element 0))
(simple-vector-p *bb*) --> t
(class-of (svref *bb* 0)) --> fixnum
(defparameter *bb* (make-array (* 512 512)  :element-type '(signed-byte 16)
:initial-element 0))
(simple-vector-p *bb*) -->  nil
(defparameter *bb* (make-array (* 512 512)  :element-type '(unsigned-byte
8) :initial-element 0))
(simple-vector-p *bb*) --> nil
(defparameter *bb* (make-array (* 512 512)  :element-type 'fixnum
:initial-element 0))
(simple-vector-p *bb*) --> nil !


(defparameter *bb* (make-array (* 512 512)  :element-type 'unsigned-byte
:initial-element 0))
(dotimes (i (* 256 256) (setf (svref *bb* i) (random 255))))

*** Note 20 ***

(defmethod climits ((sv simple-vector))
  (let* ((asiz (array-total-size sv))
         (amax (abs (svref sv 0)))
         (amin (abs (svref sv 1))))
    (dotimes (i asiz)
      (setq amax (max amax (abs(svref sv i)))) 
      (setq amin (min amin (abs(svref sv i)))))   
    `((min ,amin)(max ,amax))))

(time (climits *bb*)) --> 10.8 secs.  .3 MTE

*** NOte 21 ***

(defmethod climits ((sv simple-vector))
  (let* ((asiz (array-total-size sv))
         (amax (abs (svref sv 0)))
         (amin (abs (svref sv 1))))
    (declare (fixnum amax amin))
    (dotimes (i asiz)
      (declare (fixnum i)(optimize (speed 3)(safety 0)))
      (setq amax (max amax (abs(svref (the simple-vector sv) i)))) 
      (setq amin (min amin (abs(svref (the simple-vector sv) i)))))   
    `((min ,amin)(max ,amax))))

(time (climits *bb*)) --> 7.8 secs.  .2 MTE.


;;;----------------------------------------------------------------------4DE
C92 14:15
;;;----------------------------------------------------------------------4DE
C92 14:15

*** Note 22 ***
Bill's recent version:

; Here's a faster version that uses the internal array-data-and-offset
; function to get a likely simple-vector. It also declares the dotimes
; limit variable to be a fixnum so that the increment and compare of
; i with the limit will be inline fixnum operations.

(defmethod faster-climits ((ara array))
  "Returns the maximum and minimum values of elements of an array."
  (multiple-value-bind (av offset) (ccl::array-data-and-offset ara)
    (let* ((asiz (array-total-size ara))          
           (amax (abs (aref av 0)))       ;; the array starts with 0 not 1!
           (amin amax)
           (index offset))
      (declare (fixnum asiz index)
               (optimize (speed 3) (safety 0)))
      (macrolet ((doit ()
                   `(dotimes (i asiz)
                      (let ((value (abs (aref av index))))
                        (incf index)
                        (if (> value amax)
                          (setq amax value) ; no need to update amin
                          (when (< value amin)
                            (setq amin value)))))))
        (if (simple-vector-p av)
          (locally (declare (simple-vector av))
            (doit))
          (doit))
        `((min ,amin)(max ,amax))))))

(time (faster-climits *cara*)) --> 3.1 secs.