
[no subject]



Dear MCLers,

Two days from now I am due to speak on the prospects of parallel processing
(for astronomical applications such as HST image restoration). The
target architecture will be a Connection Machine 5 (CM5), which
supports *Lisp. I want people to think hard about Lisp as a viable
high-level language for scientific computing. So much for the background.

While preparing the talk I made a comparison between Fortran 77,
Fortran 90 and Common Lisp, and, no surprise, CL beats the other two
languages on almost all counts, though Fortran 90 came out
surprisingly well.

My audience will mainly consist of scientists, whose experience is in
numerical computing. So I expect the question: "What about the
performance?" In order to prepare myself I wrote a miniature benchmark
computing the logarithm of n-factorial in both Fortran-77 and MCL.
The code simply sums the logs of the individual terms and is appended below.

The timing tests were extremely disappointing. Even though I ended up
using all sorts of declarations, the CL code was consistently slower
by a factor of 5 to 6 across the SE, SE/30 and SUN Sparc 2 (+ Allegro
CL) platforms used for benchmarking.

I know that neither MCL nor Allegro CL is running on the CM5, but
the consistently slower performance (two different Fortran compilers,
two different CL compilers) of Lisp with respect to Fortran worries me.

Any clue how to improve the performance? If the situation persists, I
will find it difficult to recommend CL for supercomputing
applications.

Thanks.

Hans-Martin Adorf
Space Telescope - European Coordinating Facility
European Southern Observatory
Karl-Schwarzschild-Str. 2
D-8046 Garching b. Muenchen


F77 code -------------------
        program benchm

        integer*4 n
        real*4 result, logfac

C       Read-eval-print loop
        dowhile (.true.)
          write (*, *) "> "
          read (*, *) n
          result = logfac(n)
          write (*, *) result
        enddo
        
        stop
        end
        
        
        real*4 function logfac(n)
C       Compute log of n-factorial
        
        integer*4 n, i
        
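        logfac = 0.0
C       Note: alog10 is the base-10 log, whereas LOG in the CL code
C       below is the natural log; the results differ by a factor ln(10).
        do i = 1, n
          logfac = logfac + alog10(float(i))
        enddo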
        
        return
        end
        
CL code ---------------------

;;;; -*- Mode: Common-Lisp; Syntax: Common-Lisp; Package: CL-USER -*-
;;;;
;;;; Floating point benchmark tests

(declaim (optimize (safety 0) (speed 3)))

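(defmacro sum ((i i-min i-max &optional (i-inc 1)) s-exp)
  "Sums expression with inclusive upper bound."
  ;; RESULT is a gensym so that s-exp cannot capture the accumulator.
  (let ((result (gensym "RESULT")))
    `(let ((,result 0))
       ;; The step form only computes the next value; DO does the assignment.
       (do ((,i ,i-min (+ ,i ,i-inc)))
           ((> ,i ,i-max) ,result)
         (incf ,result ,s-exp)))))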
;; (sum (i 1 10) i)
;; (sum (i 1 2) (* i i))

(defun factorial-a (n)
  "Calculates n-factorial recursively and exactly."
  (if (= n 0)
    1
    (* n (factorial-a (1- n)))))
;;(time (factorial-a 200))
#|
Apple Mac SE:
(FACTORIAL-A 200) took 307 milliseconds (0.307 seconds) to run.
Of that, 10 milliseconds (0.010 seconds) were spent in The Cooperative Multitasking Experience.
 16152 bytes of memory allocated.
Apple Mac SE/30:
(FACTORIAL-A 200) took 48 milliseconds (0.048 seconds) to run.
 16152 bytes of memory allocated.
|#

(defun factorial-b (n)
  "Calculates n-factorial recursively and exactly with declarations."
  (declare (type integer n)
           (optimize (speed 3) (safety 0)))
  (if (= n 0)
    1
    (* n (the integer (factorial-b (1- n))))))
;;(time (factorial-b 200))
#|
Apple Mac SE:
(FACTORIAL-B 200) took 302 milliseconds (0.302 seconds) to run.
Of that, 9 milliseconds (0.009 seconds) were spent in The Cooperative Multitasking Experience.
 16152 bytes of memory allocated.
Apple Mac SE/30:
(FACTORIAL-B 200) took 47 milliseconds (0.047 seconds) to run.
 16152 bytes of memory allocated.
|#
;; Conclusion: the declarations do not help in MCL 2.0f
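
;; A possible explanation, offered as a sketch rather than a measurement:
;; the type INTEGER also covers bignums, and 200! lies far outside the
;; fixnum range, so the multiplications remain generic bignum arithmetic
;; whatever is declared. The helper FACTORIAL-FITS-FIXNUM-P is a new,
;; purely illustrative name.
(defun factorial-fits-fixnum-p (n)
  "True if n-factorial is still a fixnum."
  (let ((product 1))
    (dotimes (i n)
      (setf product (* product (1+ i))))
    (typep product 'fixnum)))
;; (factorial-fits-fixnum-p 5)
;; (factorial-fits-fixnum-p 200)   ; false: 200! is a bignum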

(defun log-factorial-a (n)
  "Calculates log of n-factorial recursively."
  (if (= n 0)
    0
    (+ (log n) (log-factorial-a (1- n)))))
;; (time (log-factorial-a 200))
#|
Apple Mac SE:
(LOG-FACTORIAL-A 200) took 6166 milliseconds (6.166 seconds) to run.
Of that, 316 milliseconds (0.316 seconds) were spent in The Cooperative Multitasking Experience.
 3200 bytes of memory allocated.
Apple Mac SE/30:
(LOG-FACTORIAL-A 200) took 79 milliseconds (0.079 seconds) to run.
 3256 bytes of memory allocated.
863.2319871924054
|#

(defun log-factorial-c (n)
  "Calculates log of n-factorial non-recursively."
  (let ((result 0))
    (do ((i 1 (1+ i)))
        ((> i n) result)
      (incf result (log i)))))
;; (time (log-factorial-c 200))
#|
Apple Mac SE:
(LOG-FACTORIAL-C 200) took 6226 milliseconds (6.226 seconds) to run.
Of that, 326 milliseconds (0.326 seconds) were spent in The Cooperative Multitasking Experience.
 3200 bytes of memory allocated.
Apple Mac SE/30:
(LOG-FACTORIAL-C 200) took 56 milliseconds (0.056 seconds) to run.
 3200 bytes of memory allocated.
863.2319871924054
|#

(defun log-factorial-d (n)
  "Calculates log of n-factorial non-recursively with body-less loop."
  (do ((i 1 (1+ i))
       (result 0 (+ result (log i))))
      ((> i n) result)))
;; (time (log-factorial-d 200))
#|
Apple Mac SE:
(LOG-FACTORIAL-D 200) took 6158 milliseconds (6.158 seconds) to run.
Of that, 271 milliseconds (0.271 seconds) were spent in The Cooperative Multitasking Experience.
 3200 bytes of memory allocated.
Apple Mac SE/30:
(LOG-FACTORIAL-D 200) took 56 milliseconds (0.056 seconds) to run.
 3200 bytes of memory allocated.
863.2319871924054
|#
       
(defun log-factorial-e (n &aux (result 0.0))
  "Calculates log of n-factorial non-recursively with declarations."
  (declare (type integer n)
           (type float result))
  (do ((i 1 (1+ i)))
      ((> i n) result)
    ;; I is bound by DO, so its declaration belongs here.
    (declare (type integer i))
    (incf result (the float (log i)))))
;; (time (log-factorial-e 200))
#|
(LOG-FACTORIAL-C 200) took 6136 milliseconds (6.136 seconds) to run.
Of that, 250 milliseconds (0.250 seconds) were spent in The Cooperative Multitasking Experience.
 3200 bytes of memory allocated.
863.2319871924054
|#

(defun log-factorial-f (n)
  "Calculates log of n-factorial non-recursively using the sum macro."
  (sum (i 1 n) (log i)))
;; (time (log-factorial-f 200))
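
;; Sketch of a further variant, not timed here: the accumulator and the
;; argument of LOG are declared DOUBLE-FLOAT and the loop counter FIXNUM,
;; so that a compiler able to open-code float arithmetic has the chance to
;; avoid generic arithmetic and float boxing. Whether MCL or Allegro CL
;; actually do so is an assumption; the name LOG-FACTORIAL-G is new.
(defun log-factorial-g (n)
  "Calculates log of n-factorial non-recursively with double-float declarations."
  (declare (type fixnum n)
           (optimize (speed 3) (safety 0)))
  (let ((result 0d0))
    (declare (type double-float result))
    (do ((i 1 (1+ i)))
        ((> i n) result)
      (declare (type fixnum i))
      ;; Convert the counter to a double first, so LOG sees a float argument.
      (incf result (the double-float (log (float i 1d0)))))))
;; (time (log-factorial-g 200))
;; (disassemble 'log-factorial-g)   ; check whether the arithmetic was open-coded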

#|
(compile-file "~hmadorf/ai/lisp/benchmarks/benchmark.lisp")
(load "~hmadorf/ai/lisp/benchmarks/benchmark.fasl")

(defvar n)
(setf n 100000)
;; (time (log-factorial-a n))
(time (log-factorial-c n))
(time (log-factorial-d n))
(time (log-factorial-e n))
(time (log-factorial-f n))

#|
-------------------------- Mac SE/30:
n = 1000
(LOG-FACTORIAL-A N) took 316 milliseconds (0.316 seconds) to run.
Of that, 24 milliseconds (0.024 seconds) were spent in The Cooperative Multitasking Experience.
 16000 bytes of memory allocated.
(LOG-FACTORIAL-C N) took 365 milliseconds (0.365 seconds) to run.
Of that, 65 milliseconds (0.065 seconds) were spent in The Cooperative Multitasking Experience.
 16000 bytes of memory allocated.
(LOG-FACTORIAL-D N) took 325 milliseconds (0.325 seconds) to run.
Of that, 24 milliseconds (0.024 seconds) were spent in The Cooperative Multitasking Experience.
 16000 bytes of memory allocated.
(LOG-FACTORIAL-E N) took 317 milliseconds (0.317 seconds) to run.
Of that, 5 milliseconds (0.005 seconds) were spent in The Cooperative Multitasking Experience.
 16000 bytes of memory allocated.
(LOG-FACTORIAL-F N) took 339 milliseconds (0.339 seconds) to run.
Of that, 24 milliseconds (0.024 seconds) were spent in The Cooperative Multitasking Experience.
 16000 bytes of memory allocated.
5912.128178488171

------------------------ SUN Sparc-2:
USER(10): (defvar n)
(setf n 10000)
(time (log-factorial-a n))
(time (log-factorial-c n))
(time (log-factorial-d n))
(time (log-factorial-e n))
(time (log-factorial-f n))
USER(11): 10000
USER(12): gc: E=39% N=915200 O+=0 pfu=0+1057 pfg=0+495
cpu time (non-gc) 1067 msec user, 116 msec system
cpu time (gc)     2083 msec user, 117 msec system
cpu time (total)  3150 msec user, 233 msec system
real time  3421 msec
space allocation:
 9 cons cells, 0 symbols, 1280112 other bytes,
82108.95
USER(13): cpu time (non-gc) 700 msec user, 100 msec system
cpu time (gc)     0 msec user, 0 msec system
cpu time (total)  700 msec user, 100 msec system
real time  813 msec
space allocation:
 1 cons cell, 0 symbols, 1280048 other bytes,
82108.95
USER(14): gc: E=85% N=872024 O+=152 pfu=0+512 pfg=0+2
cpu time (non-gc) 717 msec user, 67 msec system
cpu time (gc)     200 msec user, 0 msec system
cpu time (total)  917 msec user, 67 msec system
real time  1071 msec
space allocation:
 9 cons cells, 0 symbols, 1280112 other bytes,
82108.95
USER(15): gc: E=86% N=871800 O+=312 pfu=0+142
cpu time (non-gc) 750 msec user, 0 msec system
cpu time (gc)     167 msec user, 0 msec system
cpu time (total)  917 msec user, 0 msec system
real time  1030 msec
space allocation:
 1 cons cell, 0 symbols, 1280032 other bytes,
82108.95
USER(16): cpu time (non-gc) 733 msec user, 17 msec system
cpu time (gc)     0 msec user, 0 msec system
cpu time (total)  733 msec user, 17 msec system
real time  859 msec
space allocation:
 9 cons cells, 0 symbols, 1280112 other bytes,
82108.95

USER(3): 
(defvar n)
(setf n 100000)
;; (time (log-factorial-a n))
(time (log-factorial-c n))
(time (log-factorial-d n))
(time (log-factorial-e n))
(time (log-factorial-f n))

USER(3): N
USER(4): 100000
USER(5): gc: E=86% N=44192 O+=256 pfu=594+819 pfg=9+34
gc: E=75% N=44176 O+=0
gc: E=74% N=44080 O+=112 pfg=0+1
gc: E=75% N=44104 O+=0
gc: E=76% N=13432 O+=30632 pfg=1+10
gc: E=91% N=13448 O+=0
gc: E=91% N=13488 O+=0
gc: E=91% N=13448 O+=0
gc: E=91% N=13456 O+=0
gc: E=91% N=13448 O+=0
gc: E=91% N=13480 O+=0
gc: E=91% N=13480 O+=0
gc: E=91% N=13496 O+=0
cpu time (non-gc) 7267 msec user, 316 msec system
cpu time (gc)     1350 msec user, 100 msec system
cpu time (total)  8617 msec user, 416 msec system
real time  10512 msec
space allocation:
 1 cons cell, 0 symbols, 12800048 other bytes,
1051299.6
USER(6): gc: E=90% N=19712 O+=824 pfu=2+1
gc: E=88% N=19720 O+=0
gc: E=91% N=19696 O+=0
gc: E=89% N=19696 O+=0
gc: E=91% N=13424 O+=6272
gc: E=91% N=13440 O+=0
gc: E=92% N=13424 O+=0
gc: E=91% N=13472 O+=0
gc: E=91% N=13440 O+=0
gc: E=91% N=13424 O+=0
gc: E=94% N=13472 O+=0
gc: E=91% N=13456 O+=0
gc: E=91% N=13440 O+=0
cpu time (non-gc) 7267 msec user, 101 msec system
cpu time (gc)     683 msec user, 33 msec system
cpu time (total)  7950 msec user, 134 msec system
real time  11095 msec
space allocation:
 1 cons cell, 0 symbols, 12800048 other bytes,
1051299.6
USER(7): gc: E=89% N=19696 O+=816
gc: E=91% N=19696 O+=0
gc: E=89% N=19696 O+=0
gc: E=89% N=19696 O+=0
gc: E=89% N=13424 O+=6272
gc: E=92% N=13440 O+=0
gc: E=91% N=13424 O+=0
gc: E=91% N=13472 O+=0
gc: E=91% N=13456 O+=0
gc: E=92% N=13440 O+=0
gc: E=91% N=13440 O+=0
gc: E=94% N=13472 O+=0
gc: E=91% N=13456 O+=0
cpu time (non-gc) 7334 msec user, 33 msec system
cpu time (gc)     700 msec user, 50 msec system
cpu time (total)  8034 msec user, 83 msec system
real time  9565 msec
space allocation:
 1 cons cell, 0 symbols, 12800032 other bytes,
1051299.6
USER(8): gc: E=87% N=19720 O+=824
gc: E=91% N=19768 O+=0
gc: E=91% N=19712 O+=0
gc: E=89% N=19712 O+=0
gc: E=89% N=13464 O+=6272
gc: E=91% N=13440 O+=0
gc: E=91% N=13440 O+=0
gc: E=94% N=13432 O+=0
gc: E=91% N=13440 O+=0
gc: E=94% N=13448 O+=0
gc: E=89% N=13464 O+=0
gc: E=94% N=13440 O+=0
gc: E=91% N=13440 O+=0
cpu time (non-gc) 7283 msec user, 16 msec system
cpu time (gc)     683 msec user, 17 msec system
cpu time (total)  7966 msec user, 33 msec system
real time  8256 msec
space allocation:
 1 cons cell, 0 symbols, 12800048 other bytes,
1051299.6
USER(9): 
|#
|#