More on disk overruns.

Received: from CLAUDE.LAAC-AI.Dialnet.Symbolics.COM by ALAN.LAAC-AI.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 40283; Tue 19-Sep-89 13:38:57 PDT
Date: Tue, 19 Sep 89 13:39 PDT
From: Robert D. Pfeiffer <RDP@ALAN.LAAC-AI.Dialnet.Symbolics.COM>
Subject: More on disk overruns. 
To: SLUG@ALAN.LAAC-AI.Dialnet.Symbolics.COM
In-Reply-To: Your message of 18 Sep 89 14:03 PDT
Message-ID: <19890919203933.2.RDP@CLAUDE.LAAC-AI.Dialnet.Symbolics.COM>

[I'm replying again on the SLUG list with the idea that this topic may
be of benefit to others.  I hope this is the right thing to do and that
I'm not wasting too many people's time.]

    Date: Mon, 18 Sep 89 12:44 PDT
    From: TYSON@Warbucks.AI.SRI.COM (Mabry Tyson)

	Date: Mon, 18 Sep 89 10:22 PDT
	From: Robert D. Pfeiffer <RDP@ALAN.LAAC-AI.Dialnet.Symbolics.COM>
	[And now, the postscript:  The problem has resurfaced!  We ran for a
	week with no noticeable problems.  I decided it was time to do some disk
	hygiene and so was using the Check Records command from Level 3 of
	FSMaint.  It was running along fine for quite a while when all of a
	sudden -- yes, you guessed it -- lots and lots of disk overruns!  Oh
	well, back to the drawing board.  I guess this will finally give us the
	impetus to migrate to a 3650 as our primary file server.]

    Aha!  Guess it's time to change the disk's electronics again...

Well, actually, yes. <:-)  The failing system has two Eagles in it so,
just to be sure, we're going to swap the second disk controller board
set.  For the record, I'm skeptical.

    The point about the network statistics is to see that all your machines are
    getting about the same results.  If so, then it is something on the net
    (possibly starting at your server, possibly not).  If it is only on your
    server, then your problems are there.  I show typically  (CRC Errors +
    Alignment Errors) < 0.5% (Net ignored + Receive count) with CRC Errors and
    Alignment Errors being roughly equal (ie, within a factor of 2).

OK, I understand.  I'll have to look into this.  Thanks for the insight.
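
For anyone who wants to apply the rule of thumb quoted above to several
machines at once, here is a minimal sketch in Python.  The function name and
the sample counter values are made up for illustration; only the 0.5%
threshold and the factor-of-2 comparison come from the quoted message.  The
counters themselves would have to be read by hand from each machine's network
statistics display.

```python
def check_host(crc_errors, alignment_errors, net_ignored, receive_count):
    """Apply the quoted rule of thumb to one machine's Ethernet counters.

    Hypothetical helper: the thresholds come from the quoted message,
    everything else here is an illustrative assumption.
    """
    errors = crc_errors + alignment_errors
    traffic = net_ignored + receive_count
    # Error fraction relative to everything the interface saw.
    ratio = errors / traffic if traffic else 0.0
    # "Roughly equal" is taken to mean within a factor of 2 of each other.
    low, high = sorted((crc_errors, alignment_errors))
    balanced = high <= 2 * low
    return {
        "error_ratio": ratio,
        "ratio_ok": ratio < 0.005,   # below 0.5%
        "balanced": balanced,
    }


if __name__ == "__main__":
    # Invented sample counters for two machines, for illustration only.
    samples = {
        "server": (300, 20, 0, 10000),
        "client": (10, 12, 100, 9900),
    }
    for name, counters in samples.items():
        print(name, check_host(*counters))
```

If every machine gives similar results, that points at the net; if only the
server stands out, that points at the server.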

    One thing you didn't mention (I think) was swapping transceivers.  You should
    be aware there are at least 3 protocols: Ethernet 1, Ethernet 2, and 802.3.
    I know the Symbolics machines run with Ethernet 1.  If one is hooked up to a
    transceiver that is only for 802.3, maybe it won't work right??  But then, if that
    were the cause, why would your problems be variable/intermittent?

    Have you ruled out machine/disk temperature as a cause?  How's your power look
    (level, wave-form)?  I presume your CE made sure that the disk wasn't at fault
    in the first place.  Maybe its belt is getting a bit long or slippery so it
    isn't spinning at the right rate...  (Why would it be overruns though?)

Good questions.  I'll make sure these things get checked if they haven't
been already.

    Oh yes, one more thing...  Did you yank the transceiver cable out when you
    started getting disk overruns?  If you can still get them, try it and see
    if the overruns go away.  If so, the net is directly or indirectly causing

Yes, we did do this earlier and the problems persisted with the machine
disconnected from the Ethernet.  That's when we started focusing more on
the disk drives and their electronics.

    By the way, I'm confused about your mail address.  

Me too.

							As you see, when Warbucks
    gets through with it, it is


    Are the two ALAN machines the same?  


    and ALAN.KAHUNA... point to UUNET.UU.NET.  There is no MX record for
    ALAN.LAAC-AI.DIALNET.SYMBOLICS.COM.  Either you ought to fix it so your return
    address is just RDP@ALAN.KAHUNA.DECNET.LOCKEED.COM or you ought to get a
    MX record for your DIALNET address.  (Ask CUSTOMER-REPORTS@SYMBOLICS.COM to
    have an MX record for *.LAAC-AI.DIALNET.SYMBOLICS.COM installed to direct mail to

Well, it's a real long and complicated story.  We're not actually using
Dialnet although we began to set that up at one time long ago.  We go
through a fairly tortuous Lockheed-internal path which finally gets to
the Arpanet.  The intervening machines are completely outside my control
and change without my knowledge.  If I wanted to, I'm sure that I could
spend my full time just trying to figure out and keep up with our
network.  But, since I have no desire to do this, I'm usually just happy
to get my mail through reasonably often.

Thanks again for all your help on this topic.