
Living with the Chaos protocol



   Date: Fri, 3 Apr 92 12:50:30 EST
   From: delaney@xn.ll.mit.edu
   Subject: Living with the Chaos protocol
   Reply-To: delaney@xn.ll.mit.edu
   Status: RO

      We're using them on the same subnet without any problems that I
      know of.  What types of problems have people seen with TCP and
      CHAOS on the same subnet?

   We were having occasional "broadcast storms" which would slow the
   response to users of all our 36XX machines (but not, if I remember
   correctly, the TI Explorers) to a crawl. The CPUs appeared to be
   tied up fielding ethernet packets. When we moved the LISP machines
   to a separate subnet with no forwarding of CHAOS packets out of
   that subnet, we stopped having the storms.

   We were never able to explain what was happening or why. We
   suspected that some non-Symbolics machine was broadcasting bogus
   CHAOS-like packets in response to legitimate CHAOS packets under
   obscure circumstances. The circumstances had to be obscure because
   the storms did not occur at all for a number of years and then only
   occurred once a week or so. But they could last all day.

The Symbolics network code is hopelessly bit-rotted, but at least it
started out pretty robust and attempted to conform to protocol
-specifications-.  I'm not aware of Symbolics machines being
responsible for this sort of lossage in the years they've been at MIT.

On the other hand, the network code is not very robust in the face of
malicious or stupid or misconfigured hosts and tends to wilt in the
face of a broadcast storm (rather than fight back!)

   We suspected either a new Solbourne or a MassComp with
   newly-updated software of starting the storms but, curiously,
   disconnecting these machines from the affected subnet would not
   stop a storm in progress.

   I am sure others out there can add their 2 cents about broadcast
   storms.

I've seen misconfigured un*x boxes (and what other kind are
there?) insistently broadcast "YP" UDP packets.  For some reason this
makes the lisp machine want to cons up *three* UDP-RPC servers, and
due to gross timing bugs in the RPC::ALLOCATE-XDR-BLOCK code can lead
the whole machine to wedge because all network connections are waiting
for free "Big Packet Buffers."  Of course, when the next YP broadcast
comes in, it conses up another three UDP-RPC servers, which spin-wait
for packet buffers.

But this just makes your machine catatonic.  It doesn't do anything to
the network. 
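
To make the wedge described above concrete, here is a toy model in
Python. It is only a sketch, not the Symbolics code: the pool size,
the names BufferPool and simulate, and the rule that a server never
gives its buffer back (standing in for the timing bugs) are all
assumptions made for illustration.

```python
class BufferPool:
    """Fixed pool of "big packet buffers". The size is a made-up number."""

    def __init__(self, size):
        self.free = size

    def try_allocate(self):
        # Hand out a buffer if one is free; otherwise report failure.
        if self.free > 0:
            self.free -= 1
            return True
        return False


def simulate(broadcasts, pool_size, servers_per_broadcast=3):
    """Return (free_buffers, waiting_servers) after each broadcast.

    Assumption: a server that gets a buffer never releases it, and a
    server that fails to get one waits forever. This stands in for the
    timing bugs mentioned in the text; the real behavior is not modeled.
    """
    pool = BufferPool(pool_size)
    waiting = 0
    history = []
    for _ in range(broadcasts):
        # Each incoming broadcast spawns three servers, as described.
        for _ in range(servers_per_broadcast):
            if not pool.try_allocate():
                waiting += 1
        history.append((pool.free, waiting))
    return history


if __name__ == "__main__":
    for step, (free, waiting) in enumerate(simulate(4, 6), start=1):
        print(f"broadcast {step}: free={free} waiting={waiting}")
```

Under these assumptions the pool is empty after two broadcasts, and
every later broadcast just adds three more waiters. Nothing in the
model sends a packet, which matches the point that the machine goes
catatonic without disturbing the network.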

There are quite a number of similar gross inefficiencies, together
with a general lack of "defensive" network coding in the Lispm
software.  I don't think this is necessarily the worst thing, but it
is certainly unpleasant when some other bozo on one's network prevents
one from getting any work done.
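
As one illustration of what "defensive" coding could mean here, the
sketch below caps how many broadcast packets from a single source get
serviced per time window. It is a hypothetical example in Python, not
anything the Lispm software does; the class name BroadcastLimiter, its
parameters, and the fixed-window policy are all assumptions.

```python
class BroadcastLimiter:
    """Per-source fixed-window limiter for broadcast packets (hypothetical)."""

    def __init__(self, max_per_window, window_seconds):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.window_start = {}  # source -> time the current window began
        self.counts = {}        # source -> packets accepted in that window

    def allow(self, source, now):
        """Return True if a broadcast from `source` at time `now` should
        be serviced, False if it should be dropped."""
        start = self.window_start.get(source)
        if start is None or now - start >= self.window_seconds:
            # First packet from this source, or the old window expired.
            self.window_start[source] = now
            self.counts[source] = 0
        if self.counts[source] >= self.max_per_window:
            return False
        self.counts[source] += 1
        return True


if __name__ == "__main__":
    limiter = BroadcastLimiter(max_per_window=2, window_seconds=1.0)
    for t in (0.0, 0.1, 0.2, 1.0):
        print(t, limiter.allow("noisy-host", t))
```

Passing the clock in as an argument keeps the sketch deterministic.
A chatty host then costs a bounded amount of work per window instead
of a fresh set of servers per packet.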

My motto is "When in doubt, unix is screwing you."  (The problem is
that it is the only thing which self-styled system administrators can
deal with, and the only thing they -think- they understand.)