
benchmarks



    Date: Thu, 22 Feb 90 14:29 EST
    From: barmar@think.com (Barry Margolin)

	Date: Thu, 22 Feb 90 10:45 EST
	From: Qobi@zermatt.lcs.mit.edu (Jeffrey Mark Siskind)

	Personally, I feel that the only good measure of performance is
	``stop-watch time''---the actual elapsed time from when I hit
	the return key until I see the results. This includes all of
	the I/O and OS overhead because my computer has to do that when
	solving my problem just as much as it has to run my code in
	user mode. For all experiments, I ran on a machine that was
	not running other user tasks. But they were presumably running
	other system processes which is the normal everyday working
	configuration of the machine. I think that what I called ``Elapsed
	time'' did actually measure ``stop-watch time.'' Accordingly,
	I think that these are the most reliable numbers to use when
	comparing benchmark results.

    If you can get exclusive use of a system, this is a reasonable way to do
    benchmarks.  However, the TIME macro doesn't know that you've arranged
    this, so it reports the two numbers.

    Traditionally, "CPU time" refers to the amount of time the processor
    spends executing within a given process; in most systems this is
    implemented by the scheduler noting the time when it schedules a process
    to run, noting the time when it next deschedules that process, and
    adding the difference to a total for the process.  On timesharing
    systems this is useful since it factors out the load of the system
    somewhat; a program should take approximately the same amount of CPU
    time no matter how many other users are using the system.  Systems
    differ in whether they include time spent in the kernel in this amount;
    some systems distinguish "system time" and "user time".  Sometimes it's
    useful to disregard system time, because it is more affected by
    activities of other users (for example, if you're timing file access,
    the system time will depend on whether another user had recently
    accessed the same or a nearby file).
An excellent summary; I'd only like to add that even this definition
of CPU time can produce widely varying results because of paging.  For that
reason the KL-10 computed CPU time by counting memory references (it had
hardware for that) which resulted in the same 'CPU time' for every execution
of the same program whether it never encountered a page fault or if it took
one on every single memory reference (impossible, but you get the idea).
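
To make the three numbers concrete, here is a minimal sketch in Python (not
the Lisp TIME macro discussed above) that reports elapsed, user, and system
time for one call. The names measure and busy_loop are made up for this
illustration, and the split between user and system time is whatever the
host OS reports through os.times(), often at a coarse clock-tick resolution.

```python
import os
import time


def measure(fn):
    """Run fn once and return (elapsed, user, system) in seconds.

    elapsed is wall-clock ("stop-watch") time; user and system are the
    per-process CPU times the operating system has charged to us.
    """
    cpu_before = os.times()
    wall_before = time.perf_counter()
    fn()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (
        wall_after - wall_before,
        cpu_after.user - cpu_before.user,
        cpu_after.system - cpu_before.system,
    )


def busy_loop(n=200_000):
    """Hypothetical CPU-bound workload: pure arithmetic, no I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total


elapsed, user, system = measure(busy_loop)
print(f"elapsed {elapsed:.4f}s  user {user:.4f}s  system {system:.4f}s")
```

On an otherwise idle machine the elapsed figure should sit close to user
plus system for a workload like this one. On a loaded machine the elapsed
figure grows while the CPU figures should stay roughly where they were.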

    On single-user workstations such as the Lisp Machine it is less
    necessary to factor out the effects of other processes, because the user
    is believed to have more control over them than on a timesharing system.
    In Genera you can use WITHOUT-INTERRUPTS or PROCESS:WITHOUT-PREEMPTS to
    hog the processor when you want to factor out other processes, but
    ordinary users on timesharing systems generally can't get exclusive use.
    CPU time is supposed to be close to the time you'd see if yours were the
    only process on the system.  If you just want to know whether 3650 or
    Sparcstation-1 processors are faster at addition this is usually the
    right way to measure it.
The time spent servicing network interrupts can be significant, especially
on heavily-used networks, so in addition to WITHOUT-INTERRUPTS-style control,
I unplug my network.

    There are, of course, times when CPU time is completely worthless.  For
    instance, if you're timing operations that use network facilities or
    back-end processors, you don't want to factor out the time spent waiting
    for the server.  Or if you want to know how well a program performs
    under various system loads you want to measure its real time.
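
A small sketch of why CPU time misleads here, again in Python rather than
Lisp. The function wait_for_server is a hypothetical stand-in: it only
sleeps for 0.2 seconds to imitate blocking on a remote server, and the
helper name timed is likewise invented for the example.

```python
import time


def timed(fn):
    """Run fn once and return (real_seconds, cpu_seconds)."""
    real_before = time.perf_counter()
    cpu_before = time.process_time()
    fn()
    real = time.perf_counter() - real_before
    cpu = time.process_time() - cpu_before
    return real, cpu


def wait_for_server():
    """Hypothetical stand-in for a network request: the process just blocks."""
    time.sleep(0.2)


real, cpu = timed(wait_for_server)
print(f"real {real:.3f}s  cpu {cpu:.3f}s")
```

The real time includes the whole wait, while the CPU time stays near zero
because the process is not running while it is blocked. For a benchmark
whose point is the server round trip, only the real time says anything.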

    Benchmarking is very tough, in general.  Gabriel's book ("Performance
    and Evaluation of Lisp Systems") discusses many of these issues.

						    barmar