
File System Performance Loss in Genera 8.0 (16X slower!)



    Date: Fri, 6 Jul 90 11:04:06 EDT
    From: cogen@XN.LL.MIT.EDU

    What has happened to the network file system in going from Genera 7.2 to 8.0?
    By one measure, it is 16 TIMES SLOWER THAN BEFORE! I would like to know if
    anyone else has experienced this, or is it a problem with our local
    installation?

My experience is just the opposite -- 8.0 NFS is noticeably faster.
Reading binary files from a Sun 4/60 via NFS is faster than reading them
from a 3650 via NFILE.

    The details: The Symbolics, a 3670, is running 8.0 and uses IP-TCP 422.2 to
    communicate with our Sun 3 running 4.0.

    The test program: copy 7 binary files from a directory on the Sun to a local
    Symbolics directory. The files are in the neighborhood of 10KBytes each and the
    directory contains about 140 files.

	    Time under 7.2: about 13 seconds.

	    Time under 8.0: about 215 seconds.

The relative speeds appear to be very dependent upon the particular
operations being used on the NFS stream.  I compared the following three
methods:

(with-open-file (f "kulla:~/{name}" :direction :input :element-type '(unsigned-byte 8))
  (loop repeat 10240 do (read-byte f)))

(with-open-file (f "kulla:~/{name}" :direction :input :element-type '(unsigned-byte 8))
  (stream-copy-until-eof f #'sys:null-stream))

(copy-file "kulla:~/{name}" "local:>barmar>{name}" :element-type '(unsigned-byte 8))

The first averaged about 8 seconds per file in 7.2 on a 3650, 12 seconds
per file in 8.0 on a 3640; 7.2 is 50% faster.  The second loop averaged
about 1/2 second per file in 7.2, but 1/3 second per file in 8.0; 8.0 is
50% faster.  And the third averaged about 3 seconds per file in 7.2, 1.3
seconds per file in 8.0; 8.0 is 130% faster.
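
For anyone checking the arithmetic, the percentages come from the ratio of
the per-file times. Here is a minimal sketch in portable Common Lisp;
PERCENT-FASTER is a name made up for this illustration, not a Genera
function.

```lisp
;; Hypothetical helper: how much faster the quicker run is, expressed as a
;; percentage of the quicker run's time per file.
(defun percent-faster (slow-seconds fast-seconds)
  (* 100 (- (/ slow-seconds fast-seconds) 1)))

(percent-faster 12 8)      ; READ-BYTE loop, 7.2 over 8.0: exactly 50
(percent-faster 1/2 1/3)   ; STREAM-COPY-UNTIL-EOF, 8.0 over 7.2: exactly 50
(percent-faster 3 1.3)     ; COPY-FILE, 8.0 over 7.2: roughly 130
```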

My interpretation of this is that NFS has gotten faster in 8.0, but
READ-BYTE has gotten slower.

    This is astounding, and completely unusable. Any ideas as to what is wrong? The
    bottom right corner of the screen, which indicates file activity, shows that
    for each file copied, the directory is opened several times. Thus the time to
    copy a file is related to both the size of the file and the size of the
    directory. In this example, the size of the directory appears to dominate. But
    please don't tell me to make my directory smaller! There has to be another
    answer.

Is the Sun directory in a ".sct" hierarchy, where all the files have
explicit version numbers?  If so, then finding the newest version
requires listing the directory to find all the file names of the form
filename.~version~ so that the highest version number can be determined.
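
As a rough illustration of why this lookup needs the whole directory
listing, here is a sketch in portable Common Lisp. NEWEST-VERSION and the
sample file names are invented for the example; this is not the code that
Genera's NFS support actually runs.

```lisp
;; Hypothetical helper: given the names returned by a directory listing,
;; return the highest N among entries named BASE.~N~, or NIL if none match.
(defun newest-version (base names)
  (let* ((prefix (concatenate 'string base ".~"))
         (plen (length prefix))
         (newest nil))
    (dolist (name names newest)
      (let ((end (1- (length name))))   ; index of the final character
        ;; Require the prefix, at least one digit, and a closing tilde.
        (when (and (> end plen)
                   (string= prefix name :end2 plen)
                   (char= (char name end) #\~)
                   (every #'digit-char-p (subseq name plen end)))
          (let ((version (parse-integer name :start plen :end end)))
            (when (or (null newest) (> version newest))
              (setq newest version))))))))

(newest-version "sysdcl.lisp"
                '("sysdcl.lisp.~3~" "sysdcl.lisp.~12~"
                  "other.lisp.~40~" "sysdcl.lisp"))
;; => 12
```

Every name in the listing has to be examined, so the cost of one lookup
grows with the size of the directory rather than with the size of the file.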

However, if all the files are coming from the same directory it should
only need to do this once.  The Lispm maintains a directory contents
cache, and if the directory hasn't been modified then it will use this
cache (note, however, that the progress note still shows this as opening
the directory -- the only way to tell that it is actually going over the
net is by looking at the whostate, to see whether you spend much time in
"NFS ReadDir").

Another change that can affect things adversely is the support in 8.0
IP-TCP for reassembling large datagrams, which is used by 8.0 NFS.  The
maximum packet size on an ethernet is about 1500 bytes, but NFS will
request 8Kbytes of the file at a time.  This datagram will be split into
a series of ethernet packets, but if just one of them isn't received the
entire series will have to be sent again (retransmission occurs at the
datagram level, not the packet level).  I was seeing this when I was
trying my above "benchmarks" on another machine (go into Peek Network
mode and check whether there are a bunch of Reassembly Nodes -- this
indicates you're getting lots of partial datagrams).
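
To get a feel for how much this can hurt, here is a back-of-the-envelope
sketch in portable Common Lisp. The function names are made up, the 1%
packet loss rate is purely illustrative (not a measurement), header
overhead is ignored, and packet losses are assumed to be independent.

```lisp
;; Hypothetical helper: number of Ethernet packets needed for one datagram.
(defun fragments-needed (datagram-bytes packet-bytes)
  (values (ceiling datagram-bytes packet-bytes)))

;; Hypothetical helper: chance that at least one fragment is lost, which
;; forces the whole datagram to be sent again.
(defun datagram-loss-probability (fragments packet-loss)
  (- 1 (expt (- 1 packet-loss) fragments)))

(fragments-needed 8192 1500)         ; => 6
(datagram-loss-probability 6 0.01)   ; a bit under 0.06
```

Under these assumptions a loss rate of 1 packet in 100 turns into nearly
6 retransmitted 8K reads in 100, and each retransmission resends all six
packets.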

                                                barmar