
Disk ECC errors

    Date: Fri, 29 Jul 88 12:53 PDT
    From: TYSON@Warbucks.AI.SRI.COM (Mabry Tyson)
    I'm having a discussion with CUSTOMER-REPORTS regarding disk ECC errors.
    Recently we have seen a rash of problems that involved trashing worlds
    (i.e., disk sectors that weren't being written at the time) and that caused
    disk search errors (i.e., it presumably dribbled on the header block).
    I'm trying to determine whether these are coincidences or not.

The design bug in the OBS (3600, 364x, 367x) disk controller, which causes
most hard ECC errors by dropping one word when writing, has never been
found, but it definitely does not cause those symptoms.

It sounds like something else, maybe a bad disk or controller or cable,
is hitting you and causing disk writes at random times.  I wouldn't
expect that something else to be spread across multiple machines, unless
somebody is moving hardware around, or unless your people are being
careless about noting the precise symptoms and are attributing multiple
dissimilar problems to a single cause.  Of course, the software support
people have a lot more experience with the various kinds of disk
problems than I.  The main reason for sending this message is to point
out that "ECC error" is not a single symptom with a single cause, but
a common symptom of a myriad of diseases.  It's like a fever rather than
like a broken arm.