
File System Performance Loss in Genera 8.0 (16X slower!)



I think Barmar has done a good job isolating which tasks are faster in
8.0 and which are slower.  I am working with snicoud and
cogen off-line to see if there is any problem with their particular
configuration or with the directory code that would make things any
slower than they need to be.  I have discovered that the NFS streams did
not benefit from some of the stream speed-ups that were installed in
8.0, so there is something to be done there.

With regard to the large number of reassembly nodes that you see in 8.0:
Since we are now receiving 8K messages that consist of smaller ethernet
packets, we need reassembly nodes.  It is true that if you miss one
packet in the reassembly, you lose the whole reassembly and it has to
time out.  The current reassembly timeout is 60 times the max
time-to-live of the datagram.  I might try to change that to be a
variable.
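
To make the reassembly behavior concrete, here is a minimal sketch in
Python (not the Genera code).  The class and function names, the
1480-byte fragment payload, and the timeout value are all assumptions
for illustration; only the "one missing packet holds the node until it
times out" behavior comes from the description above.

```python
import math

# Assumed data bytes carried per ethernet packet (hypothetical value).
FRAGMENT_PAYLOAD = 1480


def fragments_needed(message_bytes, payload=FRAGMENT_PAYLOAD):
    """Number of ethernet-sized fragments needed for one message."""
    return math.ceil(message_bytes / payload)


class ReassemblyNode:
    """Holds the fragments of one message until all arrive or it times out."""

    def __init__(self, total, created_at, timeout):
        self.total = total
        self.received = set()
        # The timeout is a parameter here; its real value is not assumed.
        self.expires_at = created_at + timeout

    def receive(self, index):
        """Record fragment `index`; return True once the message is complete."""
        self.received.add(index)
        return len(self.received) == self.total

    def expired(self, now):
        """True once the node has waited out its full timeout."""
        return now >= self.expires_at


def deliver(node, arriving):
    """Feed fragment indices to a node; return True if it completed."""
    complete = False
    for index in arriving:
        complete = node.receive(index)
    return complete


if __name__ == "__main__":
    n = fragments_needed(8192)
    node = ReassemblyNode(n, created_at=0.0, timeout=10.0)
    # Fragment 3 is dropped, so the node never completes and just sits
    # there holding the other fragments until the timeout passes.
    print(n, deliver(node, [0, 1, 2, 4, 5]), node.expired(5.0))
```

Under these assumptions an 8K message needs six fragments, and a
retransmission that loses a fragment again leaves another incomplete
node behind, which is how the count of reassembly nodes can grow.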

There are usually two reasons to have a huge number of reassembly nodes.
The first is having a gateway in between the lispm and the NFS server.
The old Proteon gateways we have at Symbolics can only handle so many
packets at once so they can drop packets.  NFS tries to compensate for
this by lowering the max message size through gateways to 4K, but you can
still drop packets.  The second reason stems from the older hardware in
the 36XX series.  The ethernet controller on the OBS machines (3600,
3640, 3645, 3670, 3675) was designed before there were ethernet
controller chips on the market.  It was not designed for the throughputs
of today's servers and drops packets.  There isn't much we can do about
this.  While the NBS machines (3610, 3620, 3630, 3650, etc.) have an
ethernet controller chip (the LANC chip), they only have 2 (I think)
hardware buffers dedicated to the chip and the microcode has to copy the
data out of the buffers before they can receive more data.  So, NBS
machines can drop packets when they are sent too fast.  No surprise,
most modern Suns can send packets this fast.  I first noticed this
problem when measuring network performance of an XL1200 blasting to a
3650.  The XL1200 can cause this problem for an NBS machine too.  The
real danger here is that NFS sends the exact same message at the same
speed every time, and if you have just enough data to get retransmitted
to the machine every time, without any other data to disrupt the stream,
you can lose.  I believe this is what they are seeing at MCC.  Again,
there isn't much we can do about this; these machines were not designed
to have "network firehoses" opened up on them.  Nobody was pushing data
around that fast at the time.  Note that none of these problems affect
Ivory-based machines, as far as I know.
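
The two-buffer problem on the NBS machines can be illustrated with a
small simulation, again in Python and again only a sketch.  The
function name, the time units, and the gap and copy times are
hypothetical; the model simply assumes a fixed number of hardware
buffers that are copied out one at a time, and that a packet arriving
while all buffers are full is dropped.

```python
def simulate_burst(n_packets, gap, copy_time, n_buffers=2):
    """Return the indices of packets dropped from an evenly spaced burst.

    gap       -- time between packet arrivals (arbitrary units)
    copy_time -- time to copy one buffer out (arbitrary units)
    """
    dropped = []
    busy_until = []      # finish times of buffers still holding data
    last_finish = 0.0    # when the copy loop becomes free again
    for i in range(n_packets):
        now = i * gap
        # Release buffers whose copy has completed by now.
        busy_until = [t for t in busy_until if t > now]
        if len(busy_until) >= n_buffers:
            dropped.append(i)  # no free buffer: the packet is lost
            continue
        # Buffers are copied out one at a time, in arrival order.
        start = max(now, last_finish)
        last_finish = start + copy_time
        busy_until.append(last_finish)
    return dropped


if __name__ == "__main__":
    # A fast sender: packets arrive three times faster than they drain.
    print(simulate_burst(6, gap=1.0, copy_time=3.0))
    # A sender no faster than the copy loop loses nothing.
    print(simulate_burst(6, gap=3.0, copy_time=3.0))
```

Because nothing in the model varies between runs, a retransmission
sent at the same speed drops exactly the same packets as the first
attempt, which matches the danger described above when no other
traffic disrupts the stream.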