
Strange network congestion



    Date: Tue, 11 Dec 90  09:10:00 EST
    From: delaney@xn.ll.mit.edu (John R. Delaney)

    For the 4th time in a month, the 3rd time in a week, and the 2nd time in
    as many days, we are having a problem with ALL of our L-bus-based
    Symbolics LISP machines which makes them virtually unusable unless
    disconnected from our group's network.

    Context: We have 10 L-bus-based 36xx LISP machines, 1 G-bus-based 36xx

By G-bus-based I assume you mean G-machine-based (L-machines and
G-machines both use the L-bus).

    LISP machine, 1 MacIvory, 4 TI Explorer LISP machines, ~20 SUN 3 and 4
    workstations, 2 SUN SPARCstations, 2 DG AViiONs (different models), a VAX
    11/782, a Cisco router, and God only knows what else, all on 1 ethernet.
    All the Symbolics machines run Genera 8.0.1.

    Symptoms: 90+% of the cycles on each L-bus-based 36xx are spent in the
    Ethernet receiver process. For all practical purposes, the machines are
    useless. No other computers seem to be directly affected. The problem
    appears to start spontaneously.  

    The first time this occurred, we shut down all of the affected Symbolics
    machines and brought them back up one at a time, expecting the problem
    to recur when one machine was brought up. But the problem did not then
    recur.

    The second time, we unplugged ethernet connectors one at a time. The
    last machine left connected still had the problem. We disconnected it
    and reconnected another; the problem recurred on the newly reconnected
    machine. Then we started disconnecting other computers. After, but not
    immediately after, disconnecting the Cisco router from the outside
    world, the problem went away.

    The third time, we did nothing and the problem went away spontaneously.

    The fourth time (still going on), we disconnected the router from the
    outside world. But that had no impact on the problem.

    Our fancy HP network monitor shows no signs of anything strange; but it
    apparently is set up to monitor TCP/IP activity only. The PEEK network
    screens on the TIs tell us that apparently large numbers of broadcast
    packets are being received by them, but not where they originate. They also tell
    us that a large-ish number of CHAOS RUT packets are being received, even
    though we do not have a CHAOS bridge in our system (as far as we know).

Note that a cisco router can act as a Chaosnet bridge.  But I don't
think it will do so automatically; it has to be specified in the
configuration file.

If you can reconfigure your network monitor to show all Ethernet
traffic, that would be the way to diagnose this problem.  In general,
this sounds like something (or several somethings) is rapidly and
repeatedly broadcasting to a protocol that the Lispm doesn't support;
the Ethernet receiver process implements everything until a packet is
dispatched to a particular application protocol (although this doesn't
explain why the G- and I-machines weren't affected).  The Ethernet
receiver process also implements ARP, so another possibility is that
something is frequently broadcasting ARP requests.
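
As a rough illustration of what to look for once the monitor shows all
Ethernet traffic, here is a minimal Python sketch that tallies broadcast
frames by source address and EtherType.  It assumes the frames have
already been captured as raw bytes by some other tool; the names
tally_broadcasts and format_mac are hypothetical, and only the three
EtherTypes relevant here (IP, ARP, Chaosnet) are given names.

```python
import struct
from collections import Counter

BROADCAST = b"\xff" * 6

# Registered EtherType values; anything else is reported in hex.
ETHERTYPES = {0x0800: "IPv4", 0x0804: "Chaosnet", 0x0806: "ARP"}


def format_mac(raw):
    """Render a 6-byte hardware address as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in raw)


def tally_broadcasts(frames):
    """Count broadcast frames per (source address, protocol).

    `frames` is assumed to be an iterable of raw Ethernet frames
    (bytes), starting at the destination address.
    """
    counts = Counter()
    for frame in frames:
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header
        dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
        if dst != BROADCAST:
            continue  # only broadcasts reach every host's receiver
        if ethertype <= 1500:
            name = "802.3 length field"  # not an EtherType at all
        else:
            name = ETHERTYPES.get(ethertype, f"0x{ethertype:04x}")
        counts[(format_mac(src), name)] += 1
    return counts
```

Sorting the result with counts.most_common() would put the noisiest
source and protocol at the top, which is the origin information the
PEEK screens do not provide.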

                                                barmar