[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

CMU CL 15c performance



I did some simple benchmarking, and I thought the results might
interest you.  The programs and some comments are appended in a shar
file. In these benchmarks, safe CMU CL seems to run at about 1/3 the
speed of C++, and 1/8 the speed of highly optimized C (unsafe is about
30% faster); this is using (optimize (speed 3) (compilation-speed 0)).
Is there some way to speed up compiled CMU CL further?

There are some cases in which the type inference algorithm doesn't seem
to make "obvious" inferences. I have marked them in the source file.
This didn't affect the benchmarks, though, since I simply put in the
declarations by hand. It is quite likely that these are due to my
incomplete understanding of the precise meaning of the CommonLisp
type system, though.

Using a higher-order function like FOLD-FLOAT-IMAGE seemed to result in
significant consing even though it was declared inline and all types
were known. Also, I couldn't figure out how to express a polymorphic
type constraint (I think the facility is simply missing from
CommonLisp), but to be able to do so would be useful for functions like
this.

Anyway, thanks for making CMU CL available.

					Thomas.


# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by volterra!tmb on Thu Jan 30 02:45:32 EST 1992
# Contents:  cmulisp.README cmulisp.transcript sumc.c suml.lisp sump.cc
 
echo x - cmulisp.README
sed 's/^@//' > "cmulisp.README" <<'@//E*O*F cmulisp.README//'
Timings on SparcStation IPC:

		source		 real		user	max heap(est)

cc -O4 		sumc.c		  5.9		  4.8
g++ -O 		sump.cc		 19.1		 17.2
g++ -O -DUNSAFE	sump.cc		 16.5		 15.1
cmulisp/safety0	suml.lisp	 46.9		 37.8	  13 M
cmulisp/safety1	suml.lisp	 50.8		 44.2	  13 M

See the file "cmulisp.transcript" for details.

Notes:

* The programs all compute a convolution of a 640x480 image with
  a rectangular indicator function using a sumtable. There are no
  floating point multiplies. Except for the lack of floating point
  multiply, the instruction mix (including I/O and array accessing) 
  is probably pretty typical of image processing programs and other
  number crunching applications in general.

* The C version uses an array-of-pointers to represent 2D arrays.
  This saves an integer multiply on each access; I don't know how 
  much of a difference this makes. The compiler is Sun cc (whatever
  comes with SunOS 4.1.1).

* The C++ version performs array bounds checking on every access.  It
  also pays a few seconds for several redundant copies involving the
  copy constructor (g++ 2.0 will fix this; AT&T C++ performs worse and
  manages to make two redundant copies each time). Also, the Sparc code 
  generator isn't all that great in g++ 1.39.1.

* The CommonLisp version is fully declared and compiled in a single
  compilation unit. The compiler is CMU CommonLisp 15c for the Sun4.

@//E*O*F cmulisp.README//
chmod u=rw,g=rw,o=r cmulisp.README
 
echo x - cmulisp.transcript
sed 's/^@//' > "cmulisp.transcript" <<'@//E*O*F cmulisp.transcript//'
volterra$ cc -O4 -o sumc sumc.c
volterra$ time sumc < /vmunix > /dev/null
reading converting summing converting writing done
        5.9 real         4.8 user         1.0 sys  
volterra$ g++ -v -O -o sump sump.cc
g++ version 1.39.1 (based on GCC 1.39)
 /usr/local/lib/gcc-cpp -+ -v -undef -D__GNUC__ -D__GNUG__ -D__cplusplus -Dsparc -Dsun -Dunix -D__sparc__ -D__sun__ -D__unix__ -D__OPTIMIZE__ sump.cc /tmp/cca13489.cpp
GNU CPP version 1.40
 /usr/local/lib/gcc-cc1plus /tmp/cca13489.cpp -quiet -dumpbase sump.cc -O -version -o /tmp/cca13489.s
GNU C++ version 1.39.1 (based on GCC 1.39) (sparc) compiled by GNU C version 1.40.
default target switches: -mfpu -mepilogue
 /usr/local/lib/gcc-as -o sump.o /tmp/cca13489.s
 /usr/local/lib/gcc-ld++ -o sump -e start -dc -dp /lib/crt0.o sump.o -lg++ /usr/local/lib/gcc-gnulib -lc
volterra$ time sump
reading [ByteImage copy]converting [FloatImage copy]summing [FloatImage copy][FloatImage copy]converting [ByteImage copy]writing done
       19.1 real        17.2 user         1.4 sys  
volterra$ cmulisp
CMU Common Lisp 15c, running on volterra
Hemlock 3.5 (15c), Python 1.0(15c), target SPARCstation/Sun 4
Send bug reports and questions to cmucl-bugs@cs.cmu.edu.
* (with-compilation-unit (:optimize '(optimize (speed 3) (safety 0) (compilation-speed 0))) (compile-file "suml.lisp"))

Python version 1.0(15c), VM version SPARCstation/Sun 4 on 30 JAN 92 01:13:19 am.
Compiling: /fs/vt/tmb/src/bench-sum/suml.lisp 30 JAN 92 01:12:41 am

Converted MESSAGE.
Converted READ-UCHAR.
Converted WRITE-UCHAR.
Converted READ-BYTE-IMAGE-RAW.
Converted WRITE-BYTE-IMAGE-RAW.
Converted CONVERT-BYTE-FLOAT-IMAGE.
Converted FOLD-FLOAT-IMAGE.
Converted FLOAT-IMAGE-RANGE.
Converted CONVERT-FLOAT-BYTE-IMAGE.
Converted SUMTABLE.
Converted LOCSUM.
Converted DOIT.
Compiling DEFTYPE BYTE-IMAGE: 
Compiling DEFTYPE FLOAT-IMAGE: 
Compiling DEFUN CONVERT-BYTE-FLOAT-IMAGE: 

File: /fs/vt/tmb/src/bench-sum/suml.lisp

In: DEFUN CONVERT-FLOAT-BYTE-IMAGE
  (MAKE-ARRAY (ARRAY-DIMENSIONS IMAGE)
              :ELEMENT-TYPE
              '(UNSIGNED-BYTE 8))
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN LOCSUM
  (MAKE-ARRAY (LIST W H) :ELEMENT-TYPE 'SINGLE-FLOAT)
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN SUMTABLE
  (MAKE-ARRAY (ARRAY-DIMENSIONS IMAGE) :ELEMENT-TYPE 'SINGLE-FLOAT)
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN CONVERT-BYTE-FLOAT-IMAGE
  (MAKE-ARRAY (ARRAY-DIMENSIONS IMAGE) :ELEMENT-TYPE 'SINGLE-FLOAT)
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN READ-BYTE-IMAGE-RAW
  (MAKE-ARRAY (LIST W H) :ELEMENT-TYPE '(UNSIGNED-BYTE 8))
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN DOIT
  (LET* ((X1 #) (INPUT #) (X2 #) (FINPUT #) (X3 #) ...) T)
Warning: Variable X6 defined but never used.
Warning: Variable X5 defined but never used.
Warning: Variable X4 defined but never used.
Warning: Variable X3 defined but never used.
Warning: Variable X2 defined but never used.
Warning: Variable X1 defined but never used.


In: DEFUN FLOAT-IMAGE-RANGE
  (DEFUN FLOAT-IMAGE-RANGE (IMAGE)
    (DECLARE (TYPE FLOAT-IMAGE IMAGE)
             (VALUES SINGLE-FLOAT SINGLE-FLOAT))
    (LET (# #) (DECLARE #) (DOTIMES # # #) (VALUES MIN MAX)))
Note: Doing float to pointer coercion (cost 13) from MIN to "<return value>".
Note: Doing float to pointer coercion (cost 13) from MAX to "<return value>".
[GC threshold exceeded with 2,007,096 bytes in use.  Commencing GC.]
[GC completed with 1,076,464 bytes retained and 930,632 bytes freed.]
[GC will next occur when at least 3,076,464 bytes are in use.]

Compiling DEFUN FOLD-FLOAT-IMAGE: 

File: /fs/vt/tmb/src/bench-sum/suml.lisp

In: DEFUN FOLD-FLOAT-IMAGE
  (DEFUN FOLD-FLOAT-IMAGE (F BASE IMAGE)
    (DECLARE (TYPE # F)
             (SINGLE-FLOAT BASE)
             (TYPE FLOAT-IMAGE IMAGE)
             (VALUES SINGLE-FLOAT))
    (DOTIMES (I #) (DECLARE #) (DOTIMES # #))
    ...)
==>
  (C::%FUNCALL #<LAMBDA #xB050F95
                        NAME= FOLD-FLOAT-IMAGE
                        TYPE= #<FUNCTION-TYPE (FUNCTION
                                               #
                                               SINGLE-FLOAT)>
                        WHERE-FROM= :DEFINED
                        VARS= (F BASE IMAGE)>
               #:G113
               #:G114
               #:G115)
Note: Doing float to pointer coercion (cost 13) to BASE.

  (FUNCALL F BASE (AREF IMAGE I J))
==>
  (C::%FUNCALL F BASE (AREF IMAGE I J))
Note: Doing float to pointer coercion (cost 13).

Compiling Top-Level Form: 

/fs/vt/tmb/src/bench-sum/suml.sparcf written.
Compilation finished in 0:00:40.

Compilation unit finished.
  6 warnings
  9 notes


#p"/fs/vt/tmb/src/bench-sum/suml.sparcf"
T
T
* (load "suml")

T
* (time (doit "/vmunix" "/tmp/zzz"))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 
reading converting 
[GC threshold exceeded with 3,196,576 bytes in use.  Commencing GC.]
[GC completed with 1,868,424 bytes retained and 1,328,152 bytes freed.]
[GC will next occur when at least 3,868,424 bytes are in use.]
processing [GC threshold exceeded with 4,327,128 bytes in use.  Commencing GC.]
[GC completed with 4,326,192 bytes retained and 936 bytes freed.]
[GC will next occur when at least 6,326,192 bytes are in use.]
converting minmax scaling writing 
Evaluation took:
  46.91 seconds of real time
  37.82 seconds of user run time
  3.4 seconds of system run time
  115 page faults and
  4309080 bytes consed.
T
* (with-compilation-unit (:optimize '(optimize (speed 3) (safety 1) (compilation-speed 0))) (compile-file "suml.lisp"))

Python version 1.0(15c), VM version SPARCstation/Sun 4 on 30 JAN 92 01:16:51 am.
Compiling: /fs/vt/tmb/src/bench-sum/suml.lisp 30 JAN 92 01:12:41 am

Converted MESSAGE.
Converted READ-UCHAR.
Converted WRITE-UCHAR.
Converted READ-BYTE-IMAGE-RAW.
Converted WRITE-BYTE-IMAGE-RAW.
Converted CONVERT-BYTE-FLOAT-IMAGE.
Converted FOLD-FLOAT-IMAGE.
Converted FLOAT-IMAGE-RANGE.
Converted CONVERT-FLOAT-BYTE-IMAGE.
Converted SUMTABLE.
Converted LOCSUM.
Converted DOIT.
Compiling DEFTYPE BYTE-IMAGE: 
Compiling DEFTYPE FLOAT-IMAGE: 
Compiling DEFUN CONVERT-BYTE-FLOAT-IMAGE: 

File: /fs/vt/tmb/src/bench-sum/suml.lisp

In: DEFUN READ-BYTE-IMAGE-RAW
  (MAKE-ARRAY (LIST W H) :ELEMENT-TYPE '(UNSIGNED-BYTE 8))
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN SUMTABLE
  (MAKE-ARRAY (ARRAY-DIMENSIONS IMAGE) :ELEMENT-TYPE 'SINGLE-FLOAT)
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN CONVERT-BYTE-FLOAT-IMAGE
  (MAKE-ARRAY (ARRAY-DIMENSIONS IMAGE) :ELEMENT-TYPE 'SINGLE-FLOAT)
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN LOCSUM
  (MAKE-ARRAY (LIST W H) :ELEMENT-TYPE 'SINGLE-FLOAT)
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN CONVERT-FLOAT-BYTE-IMAGE
  (MAKE-ARRAY (ARRAY-DIMENSIONS IMAGE)
              :ELEMENT-TYPE
              '(UNSIGNED-BYTE 8))
Note: Unable to optimize because:
      Dimension list not constant; cannot open code array creation


In: DEFUN DOIT
  (LET* ((X1 #) (INPUT #) (X2 #) (FINPUT #) (X3 #) ...) T)
Warning: Variable X6 defined but never used.
Warning: Variable X5 defined but never used.
Warning: Variable X4 defined but never used.
Warning: Variable X3 defined but never used.
Warning: Variable X2 defined but never used.
Warning: Variable X1 defined but never used.


In: DEFUN CONVERT-FLOAT-BYTE-IMAGE
  (TRUNCATE (* SCALE (- # MIN)))
--> TRUNCATE LET 
==>
  (KERNEL:%UNARY-TRUNCATE C::X)
Note: Forced to do full call.
      Unable to do inline float truncate (cost 5) because:
      Can't trust output type assertion under safe policy.
[GC threshold exceeded with 6,332,448 bytes in use.  Commencing GC.]
[GC completed with 1,176,304 bytes retained and 5,156,144 bytes freed.]
[GC will next occur when at least 3,176,304 bytes are in use.]
Note: Doing float to pointer coercion (cost 13).


In: DEFUN FLOAT-IMAGE-RANGE
  (DEFUN FLOAT-IMAGE-RANGE (IMAGE)
    (DECLARE (TYPE FLOAT-IMAGE IMAGE)
             (VALUES SINGLE-FLOAT SINGLE-FLOAT))
    (LET (# #) (DECLARE #) (DOTIMES # # #) (VALUES MIN MAX)))
Note: Doing float to pointer coercion (cost 13) from MIN to "<return value>".
Note: Doing float to pointer coercion (cost 13) from MAX to "<return value>".

Compiling DEFUN FOLD-FLOAT-IMAGE: 

File: /fs/vt/tmb/src/bench-sum/suml.lisp

In: DEFUN FOLD-FLOAT-IMAGE
  (FUNCALL F BASE (AREF IMAGE I J))
==>
  (C::%FUNCALL F BASE (AREF IMAGE I J))
Note: Doing float to pointer coercion (cost 13).

Compiling Top-Level Form: 

/fs/vt/tmb/src/bench-sum/suml.sparcf written.
Compilation finished in 0:00:20.

Compilation unit finished.
  6 warnings
  10 notes


#p"/fs/vt/tmb/src/bench-sum/suml.sparcf"
T
T
* (load "suml")

T
* (time (doit "/vmunix" "/tmp/zzz"))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 
reading converting [GC threshold exceeded with 3,734,664 bytes in use.  Commencing GC.]
[GC completed with 1,917,272 bytes retained and 1,817,392 bytes freed.]
[GC will next occur when at least 3,917,272 bytes are in use.]
processing [GC threshold exceeded with 4,375,976 bytes in use.  Commencing GC.]
[GC completed with 4,367,232 bytes retained and 8,744 bytes freed.]
[GC will next occur when at least 6,367,232 bytes are in use.]
converting minmax scaling [GC threshold exceeded with 6,373,384 bytes in use.  Commencing GC.]
[GC completed with 2,216,496 bytes retained and 4,156,888 bytes freed.]
[GC will next occur when at least 4,216,496 bytes are in use.]
writing 
Evaluation took:
  50.83 seconds of real time
  44.26 seconds of user run time
  4.77 seconds of system run time
  82 page faults and
  6769072 bytes consed.
T
* (quit)
volterra$ 
@//E*O*F cmulisp.transcript//
chmod u=rw,g=rw,o=r cmulisp.transcript
 
echo x - sumc.c
sed 's/^@//' > "sumc.c" <<'@//E*O*F sumc.c//'
/* Convolution using sum tables.
   Thomas M. Breuel, 29 Jan 1992 */

#include <math.h>
#include <stdio.h>

void *malloc();

float **fialloc(w,h) {
	float *base = malloc(w*h*sizeof (float));
	float **result = malloc(w*sizeof (float*));
	int i;
	for(i=0;i<w;i++) result[i]=base+i*h;
	return result;
}

fifree(p) float **p; {
	free(p[0]);
	free(p);
}

unsigned char **ucialloc(w,h) {
	unsigned char *base = malloc(w*h*sizeof (unsigned char));
	unsigned char **result = malloc(w*sizeof (unsigned char*));
	int i;
	for(i=0;i<w;i++) result[i]=base+i*h;
	return result;
}

ucifree(p) unsigned char **p; {
	free(p[0]);
	free(p);
}

unsigned char **ucifread_raw(stream,w,h) FILE *stream; {
	int i,j;
	unsigned char **result=ucialloc(w,h);
	for(i=0;i<w;i++) for(j=0;j<h;j++) result[i][j]=getc(stream);
	return result;
}

ucifwrite_raw(stream,p,w,h) FILE *stream; unsigned char **p; {
	int i,j;
	for(i=0;i<w;i++) for(j=0;j<h;j++) putc(p[i][j],stream);
}

float **uci_to_fi(uci,w,h) unsigned char **uci; {
	int i,j;
	float **result=fialloc(w,h);
	for(i=0;i<w;i++) for(j=0;j<h;j++) result[i][j]=uci[i][j];
	return result;
}

unsigned char **fi_to_uci(fi,w,h) float **fi; {
	int i,j;
	unsigned char **result=ucialloc(w,h);
	float min=1e38,max=-1e38;
	float scale;
	for(i=0;i<w;i++) for(j=0;j<h;j++) if(fi[i][j]<min) min=fi[i][j];
	for(i=0;i<w;i++) for(j=0;j<h;j++) if(fi[i][j]>max) max=fi[i][j];
	if(max<=min) scale=1.0; else scale=255.0/(max-min);
	for(i=0;i<w;i++) for(j=0;j<h;j++) result[i][j]=(int)(scale*(fi[i][j]-min));
	return result;
}

float **sumtable(fi,w,h) float **fi; {
	int i,j; float sum;
	float **result=fialloc(w,h);
	for(i=0;i<w;i++) for(j=0,sum=0.0;j<h;j++) {
		float t=fi[i][j];
		result[i][j]=sum;
		sum+=t;
	}
	for(j=0;j<h;j++) for(i=0,sum=0.0;i<w;i++) {
		float t=result[i][j];
		result[i][j]=sum;
		sum+=t;
	}
	return result;
}

float sub(fi,w,h,x,y) float **fi; {
	if(x<0) x=0; else if(x>=w) x=w-1;
	if(y<0) y=0; else if(y>=h) y=h-1;
	return fi[x][y];
}

float **locsum(fi,w,h,dx,dy) float **fi; {
	int i,j;
	float **sum=sumtable(fi,w,h);
	float **result=fialloc(w,h);
#define S(x,y) sub(sum,w,h,x,y)
	for(i=0;i<w;i++) for(j=0;j<h;j++)
		result[i][j]=S(i-dx,j-dy)+S(i+dx,j+dy)-S(i-dx,j+dy)-S(i+dx,j-dy);
	fifree(sum);
	return result;
}

main() {
	unsigned char **input,**output;
	float **finput,**foutput;
	fprintf(stderr,"reading ");
	input = ucifread_raw(stdin,640,480);
	fprintf(stderr,"converting ");
	finput = uci_to_fi(input,640,480);
	fprintf(stderr,"summing ");
	foutput = locsum(finput,640,480,5,5);
	fprintf(stderr,"converting ");
	output = fi_to_uci(foutput,640,480);
	fprintf(stderr,"writing ");
	printf("VISF=\n");
	printf("ETYPE=uchar\n");
	printf("DIMS=640 480\n");
	printf("\n");
	ucifwrite_raw(stdout,output,640,480);
	fprintf(stderr,"done\n");
}
@//E*O*F sumc.c//
chmod u=rw,g=rw,o=r sumc.c
 
echo x - suml.lisp
sed 's/^@//' > "suml.lisp" <<'@//E*O*F suml.lisp//'
;;;
;;; convolution using sum tables
;;;
;;; declarations are specific for CMU CommonLisp
;;;
;;; Thomas M. Breuel, 29 Jan 1992
;;;


;;; INFERENCE1:
;;;
;;; The variable assumes the range 0 ... (- (ARRAY-DIMENSION # #) 1) and
;;; is not assigned to; the variable is also used as an array subscript

;;; INFERENCE2:
;;;
;;; The variable is initialized as a single-float, and all assignments to
;;; are also obviously of type single-float (e.g., assignments of the form
;;; (setq var (min var single-float-value)))

;;; INFERENCE3:
;;;
;;; The variable is used as an array subscript.

(declaim (start-block))

(defun message (x) (princ x) (force-output))

(deftype byte-image () '(simple-array (unsigned-byte 8) (* *)))
(deftype float-image () '(simple-array single-float (* *)))

(defun read-uchar (stream) (char-code (read-char stream)))
(defun write-uchar (stream char) (write-char (code-char char) stream))

(defun read-byte-image-raw (file w h)
  (declare (string file) (fixnum w h) (values byte-image))
  (let ((result (make-array (list w h) :element-type '(unsigned-byte 8))))
    (with-open-file (stream file :direction :input)
      (dotimes (i w)
	(declare (fixnum i))		;INFERENCE1
	(dotimes (j h)
	  (setf (aref result i j) (read-uchar stream)))))
    result))

(defun write-byte-image-raw (file image)
  (declare (string file) (type byte-image image))
  (with-open-file (stream file :direction :output :if-exists :overwrite)
    (dotimes (i (array-dimension image 0))
      (declare (fixnum i))		;INFERENCE1
      (dotimes (j (array-dimension image 1))
	(write-uchar stream (aref image i j))))))

(defun convert-byte-float-image (image)
  (declare (type byte-image image) (values float-image))
  (let ((result (make-array (array-dimensions image) :element-type 'single-float)))
    (dotimes (i (array-dimension image 0))
      (declare (fixnum i))		;INFERENCE1
      (dotimes (j (array-dimension image 1))
	(setf (aref result i j) (float (aref image i j)))))
    result))

; despite the inline declaration, (fold-float-image #'min initial image) still
; conses a lot

(defun fold-float-image (f base image)
  (declare (type (function (single-float single-float) single-float) f)
	   (single-float base) (type float-image image) (values single-float))
  ;can I express the following type constraints in CommonLisp?
  ;(declare (type (function (A single-float) A) f) (A base) (type float-image image) (values A))
  (dotimes (i (array-dimension image 0))
    (declare (fixnum i))		;INFERENCE1
    (dotimes (j (array-dimension image 1))
      (setq base (funcall f base (aref image i j)))))
  base)
(declaim (inline fold-float-image))

(defun float-image-range (image)
  (declare (type float-image image) (values single-float single-float))
  (let ((min (aref image 0 0))
	(max (aref image 0 0)))
    (declare (single-float min max))	;INFERENCE2
    (dotimes (i (array-dimension image 0))
      (declare (fixnum i))		;INFERENCE1
      (dotimes (j (array-dimension image 1))
	(setq min (min min (aref image i j)))
	(setq max (max max (aref image i j)))))
    (values min max)))

(defun convert-float-byte-image (image)
  (declare (type float-image image) (values byte-image))
  (message "minmax ")
  (multiple-value-bind (min max) (float-image-range image)
    (declare (single-float min max))	;INFERENCE2
    (message "scaling ")
    (let ((scale (/ 255.0 (- max min)))
	  (result (make-array (array-dimensions image) :element-type '(unsigned-byte 8))))
      (declare (single-float scale))	;INFERENCE2
      (dotimes (i (array-dimension image 0))
	(declare (fixnum i))		;INFERENCE1
	(dotimes (j (array-dimension image 1))
	  (setf (aref result i j) (truncate (* scale (- (aref image i j) min))))))
      result)))

(defun sumtable (image)
  (declare (type float-image image) (values float-image))
  (let ((result (make-array (array-dimensions image) :element-type 'single-float))
	(sum 0.0s0))
    (declare (single-float sum))	;INFERENCE2
    (dotimes (i (array-dimension image 0))
      (declare (fixnum i))		;INFERENCE1
      (setq sum 0.0s0)
      (dotimes (j (array-dimension image 1))
	(incf sum (prog1 (aref image i j) (setf (aref result i j) sum)))))
    (dotimes (j (array-dimension image 1))
      (declare (fixnum j))		;INFERENCE1
      (setq sum 0.0s0)
      (dotimes (i (array-dimension image 0))
	(incf sum (prog1 (aref result i j) (setf (aref result i j) sum)))))
    result))

(defun locsum (image dx dy)
  (declare (type float-image image) (fixnum dx dy))
  (let* ((table (sumtable image))
	 (w (array-dimension image 0))
	 (h (array-dimension image 1))
	 (result (make-array (list w h) :element-type 'single-float)))
    (flet ((at (i j)
	       (declare (fixnum i j))	;INFERENCE3
	       (cond ((< i 0) (setq i 0)) ((>= i w) (setq i (- w 1))))
	       (cond ((< j 0) (setq j 0)) ((>= j h) (setq j (- h 1))))
	       (aref table i j)))
      (dotimes (i w)
	(declare (fixnum i))		;INFERENCE1
	(dotimes (j h)
	  (declare (fixnum j))		;INFERENCE1
	  (setf (aref result i j)
		(+ (at (+ i dx) (+ j dy))
		   (at (- i dx) (- j dy))
		   (- (at (+ i dx) (- j dy)))
		   (- (at (- i dx) (+ j dy))))))))
    result))

(defun doit (&optional (in "/vmunix") (out "/dev/null"))
  (let* ((x1 (message "reading "))
	 (input (read-byte-image-raw in 640 480))
	 (x2 (message "converting "))
	 (finput (convert-byte-float-image input))
	 (x3 (message "processing "))
	 (foutput (locsum finput 5 5))
	 (x4 (message "converting "))
	 (output (convert-float-byte-image foutput))
	 (x5 (message "writing "))
	 (x6 (write-byte-image-raw out output)))
    t))

(declaim (end-block))
@//E*O*F suml.lisp//
chmod u=rw,g=rw,o=r suml.lisp
 
echo x - sump.cc
sed 's/^@//' > "sump.cc" <<'@//E*O*F sump.cc//'
//
// Convolution using sum tables
//
// Thomas M. Breuel, 29 Jan 1992
//

extern "C" {
#include <math.h>
#include <stdio.h>
int abort();
}

class ByteImage;
class FloatImage;

struct ByteImage {
	unsigned char *data;
	int dims[2];
	ByteImage(int w,int h) {
		dims[0]=w; dims[1]=h;
		data = new unsigned char[dims[0]*dims[1]];
	}
	ByteImage(int *dims) {
		this->dims[0]=dims[0];
		this->dims[1]=dims[1];
		data = new unsigned char[dims[0]*dims[1]];
	}
	~ByteImage() {
		delete data;
	}
	unsigned char &operator()(int i,int j) {
#ifndef UNSAFE
		if(unsigned(i)>=dims[0]||unsigned(j)>=dims[1]) abort();
#endif
		return data[i+j*dims[0]];
	}
	int dim(int n) {return dims[n];}
	void operator=(ByteImage &other) {
		if(data) delete data;
		dims[0]=other.dims[0];
		dims[1]=other.dims[1];
		int n=dims[0]*dims[1];
		data=new unsigned char[n];
		for(int i=0;i<n;i++) data[i]=other.data[i];
	}
	ByteImage(ByteImage &other) {
		fprintf(stderr,"[ByteImage copy]");
		data=0;
		*this=other;
	}
	operator FloatImage();
};

ByteImage read_raw(char *file,int w,int h) {
	FILE *stream=fopen(file,"r");
	ByteImage result(w,h);
	for(int i=0;i<w;i++) for(int j=0;j<h;j++) {
		char c = getc(stream);
		result(i,j) = *(unsigned char *)&c;
	}
	fclose(stream);
	return result;
}

void write_vis(char *file,ByteImage &image) {
	FILE *stream=fopen(file,"w");
	fprintf(stream,"VISF=\n");
	fprintf(stream,"DIMS=%d %d\n",image.dim(0),image.dim(1));
	fprintf(stream,"ETYPE=uchar\n");
	fprintf(stream,"\n");
	for(int i=0;i<image.dim(0);i++) for(int j=0;j<image.dim(1);j++) {
		unsigned char c=image(i,j);
		putc(c,stream);
	}
	fclose(stream);
}

struct FloatImage {
	float *data;
	int dims[2];
	FloatImage(int w,int h) {
		dims[0]=w; dims[1]=h;
		data = new float[dims[0]*dims[1]];
	}
	FloatImage(int *dims) {
		this->dims[0]=dims[0];
		this->dims[1]=dims[1];
		data = new float[dims[0]*dims[1]];
	}
	~FloatImage() {
		delete data;
	}
	float &operator()(int i,int j) {
#ifndef UNSAFE
		if(unsigned(i)>=dims[0]||unsigned(j)>=dims[1]) abort();
#endif
		return data[i+j*dims[0]];
	}
	int dim(int n) {return dims[n];}
	void operator=(FloatImage &other) {
		if(data) delete data;
		dims[0]=other.dims[0];
		dims[1]=other.dims[1];
		int n=dims[0]*dims[1];
		data=new float[n];
		for(int i=0;i<n;i++) data[i]=other.data[i];
	}
	FloatImage(FloatImage &other) {
		fprintf(stderr,"[FloatImage copy]");
		data=0;
		*this=other;
	}
	operator ByteImage();
};

ByteImage::operator FloatImage() {
	int i,j;
	FloatImage result(dims);
	for(i=0;i<dim(0);i++) for(j=0;j<dim(1);j++) result(i,j) = float(operator()(i,j));
	return result;
}

FloatImage::operator ByteImage() {
	int i,j;
	ByteImage result(dims);
	float min=operator()(0,0);
	for(i=0;i<dim(0);i++) for(j=0;j<dim(1);j++) if(operator()(i,j)<min) min=operator()(i,j);
	float max=operator()(0,0);
	for(i=0;i<dim(0);i++) for(j=0;j<dim(1);j++) if(operator()(i,j)>max) max=operator()(i,j);
	float scale=255.0/(max-min);
	for(i=0;i<dim(0);i++) for(j=0;j<dim(1);j++) result(i,j) = int(scale*(operator()(i,j)-min));
	return result;
}

struct Sumtable:FloatImage {
	Sumtable(int w,int h):FloatImage(w,h) {}
	Sumtable(int *dims):FloatImage(dims) {}
	float &operator()(int i,int j) {
		if(i<0) i=0; else if(i>=dims[0]) i=dims[0]-1;
		if(j<0) j=0; else if(j>=dims[1]) j=dims[1]-1;
		return data[i+j*dim(0)];
	}
	void operator=(Sumtable &other) {
		this->FloatImage::operator=(other);
	}
	Sumtable(Sumtable &other):FloatImage(other) {}
};

Sumtable sumtable(FloatImage &image) {
	Sumtable result(image.dims);
	int w=image.dim(0),h=image.dim(1);
	for(int i=0;i<w;i++) {
		float sum=0.0;
		for(int j=0;j<h;j++) {
			float t=image(i,j);
			result(i,j)=sum;
			sum+=t;
		}
	}
	for(int j=0;j<h;j++) {
		float sum=0.0;
		for(int i=0;i<w;i++) {
			float t=result(i,j);
			result(i,j)=sum;
			sum+=t;
		}
	}
	return result;
}

FloatImage locsum(FloatImage &image,int dx,int dy) {
	Sumtable S = sumtable(image);
	FloatImage result(image.dims);
	int i,j;
	int w=image.dim(0),h=image.dim(1);
	for(i=0;i<w;i++) for(j=0;j<h;j++)
		result(i,j)=S(i-dx,j-dy)+S(i+dx,j+dy)-S(i-dx,j+dy)-S(i+dx,j-dy);
	return result;
}

main() {
	fprintf(stderr,"reading ");
	ByteImage input = read_raw("/vmunix",640,480);
	fprintf(stderr,"converting ");
	FloatImage finput = input;
	fprintf(stderr,"summing ");
	FloatImage foutput = locsum(finput,5,5);
	fprintf(stderr,"converting ");
	ByteImage output = foutput;
	fprintf(stderr,"writing ");
	write_vis("/tmp/1",output);
	fprintf(stderr,"done\n");
}
@//E*O*F sump.cc//
chmod u=rw,g=rw,o=r sump.cc
 
exit 0