Re: What does "irrecoverable disk overrun" mean?

It probably means that you have a problem on your network.  In the 3600, 3670,
and 3640 machines, the network has priority over the disk and the machine
listens to every packet.  Thus if you have a packet storm (some of us think
Suns tend to cause these), the Symbolics is so busy listening to the traffic
on the net that it forgets about the disk for a while.  When it looks back, the
disk has already passed the start of the sector that it wanted (overran it).
I believe the code tries again a few times and eventually gives up.
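
To make the retry-then-give-up behavior concrete, here is a toy Python model.
It is not the actual disk code.  The retry limit of 3, the names read_sector
and DiskOverrunError, and the idea of treating network load as a per-attempt
probability of missing the sector are all assumptions made up for this sketch.

```python
import random

MAX_RETRIES = 3  # hypothetical; the real retry count is not stated above


class DiskOverrunError(Exception):
    """Raised when every attempt to catch the sector has been missed."""


def read_sector(net_busy_probability, rng, max_retries=MAX_RETRIES):
    """Return the number of attempts needed to read the sector.

    net_busy_probability is the (assumed) chance that network servicing
    keeps the processor away long enough for the wanted sector to pass
    under the head before the disk is looked at again.
    """
    for attempt in range(1, max_retries + 1):
        overran = rng.random() < net_busy_probability
        if not overran:
            return attempt  # caught the sector in time
        # Missed the start of the sector; wait for it to come around again.
    raise DiskOverrunError("irrecoverable disk overrun")
```

With net_busy_probability near 0 the read succeeds on the first attempt.  As
it approaches 1 (a packet storm), every attempt overruns and the error is
raised.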

When this happens, you may also see ECC errors in your files on that
machine for the same reason.

As I recall, David Moon gave a good explanation of what happens some time
back (1 1/2 years ago?).  I could hunt it up if needed.  (I'm sorry my
explanation isn't as good as his.  I believe that I am close enough to the
truth.)

If you have a number of Lisp machines and this is the only one it is
happening to, then maybe the problem is local to that machine.  Try
swapping transceivers and cables (by swapping them at the connectors at the
back of the CPUs if the machines are close together).  Try swapping the
I/O paddle and I/O cards between machines.  Who knows, it might actually
be the disk!

(It seems to me that there were some 3645s that had intermittent problems
with their disks but I can't recall the details.  I can probably recover
that info if you need it.)