Fatal Disk Error urgency [1]



    Date: Wed, 6 Jun 90 13:33 EDT 
    From: Dodds@YUKON.SCRC.Symbolics.COM (Douglas Dodds)

	Date: Wed, 6 Jun 90 00:25 EDT
	From: RWK@FUJI.ILA.Dialnet.Symbolics.COM (Robert W. Kerns)
Hi Bob.
	[ . . . . ]

	I don't know if SI:FIX-FEP-BLOCK performs the proper actions to
	minimize damage to the file in which the block appears, but hopefully
	the documentation will enlighten you.  In general, world-load files which
	get a bad block should be replaced, LMFS files should have a block of zeros
	substituted, 

As the other Doug says below, never use these functions on an LMFS
partition unless you intend to throw the entire LMFS away.

		     and paging files can just have the block removed (while the 
	file is not in use!)

Ditto world files: Never use these functions on files that are in use.

The 8.0/7.4/7.2 documentation for SI:FIX-FEP-BLOCK and SI:FIX-FEP-FILE is
sketchy at best.  I distributed a more comprehensive guide to the field
service people around Jan 89.

These functions will do a read-only scan of a block or fep file for ECC
errors.  If an error is found, the user will be queried about performing a
write/read test.  The write/read test rewrites the block using
repaired (but probably incorrect) data from the block and rereads the
block to see if the error still exists.  If there is still an error in the
block, then the user will be queried with several choices:

1. DELETE:  This option is supposed to remove the bad block from the file,
   add it to the bad blocks list, and then delete the file.  It should be
   used on world load files when you know the data in the bad blocks has
   been trashed.  The file should be expunged from the disk file system
   and reloaded from tape (or recreated).

2. SPLICE:  This option is supposed to remove the bad block from the file
   and add it to the bad blocks list.  The file's data map is spliced back
   together, leaving an intact working file.  This option can be used to
   repair trashed paging files.

3. ZERO:  This option is supposed to remove the bad block from the file,
   allocate a new block from the free map, splice that block into the
   file, and write 0's into it.  This option can also be used to repair
   trashed paging files.

4. COPY:  This option is supposed to remove the bad block from the file
   while retaining the original data in the block (which is probably
   damaged in some way), allocate a new block from the free map, splice
   that block into the file, and write the original data back into it.
   This option could be used if you feel the original data has not been
   trashed.  Good Luck.
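
To make the difference between the four options concrete, here is a toy
model in Python.  It is only a sketch of the bookkeeping described above,
not the FEP code: the class ToyDisk, its method names, and the use of plain
integers for blocks are all invented for illustration, and its delete step
folds "delete" and "expunge" into one operation for brevity.

```python
# Toy model only. Nothing here is a real Symbolics interface; all names are
# hypothetical. A "file" is an ordered list of block numbers (its data map).

class ToyDisk:
    def __init__(self, free_blocks):
        self.free = list(free_blocks)  # free map
        self.bad = []                  # bad blocks list
        self.files = {}                # file name -> ordered block numbers
        self.data = {}                 # block number -> contents

    def _retire(self, name, bad_block):
        """Remove bad_block from the file's data map and record it as bad."""
        index = self.files[name].index(bad_block)
        self.files[name].pop(index)
        self.bad.append(bad_block)
        return index

    def delete(self, name, bad_block):
        """DELETE: retire the block, then drop the whole file.
        Simplification: the remaining blocks are freed immediately."""
        self._retire(name, bad_block)
        self.free.extend(self.files.pop(name))

    def splice(self, name, bad_block):
        """SPLICE: retire the block; the file ends up one block shorter."""
        self._retire(name, bad_block)

    def zero(self, name, bad_block):
        """ZERO: retire the block and splice in a new zero-filled block."""
        index = self._retire(name, bad_block)
        new_block = self.free.pop(0)
        self.files[name].insert(index, new_block)
        self.data[new_block] = 0

    def copy(self, name, bad_block):
        """COPY: like ZERO, but the new block gets the old (suspect) data."""
        index = self._retire(name, bad_block)
        new_block = self.free.pop(0)
        self.files[name].insert(index, new_block)
        self.data[new_block] = self.data.get(bad_block)

    def doubly_allocated(self):
        """Blocks that are both in some file and on the bad blocks list."""
        in_files = {b for blocks in self.files.values() for b in blocks}
        return in_files & set(self.bad)


disk = ToyDisk(free_blocks=[100, 101])
disk.files["paging"] = [1, 2, 3]
disk.splice("paging", 2)
print(disk.files["paging"], disk.bad)  # [1, 3] [2]
```

In this model SPLICE shortens the file, while ZERO and COPY keep its length
by taking a block from the free map.  The doubly_allocated check shows the
condition to watch for: a block that is on the bad blocks list and still
belongs to a file.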

Due to minor bugginess in past releases, I usually recommend that if a
hard ECC error is found, one should use the SPLICE option, delete and
expunge the file, and recreate it.  This ensures that the bad block is
indeed removed from use and is not allocated to both the original file and
the bad blocks file.

In all cases, run the function SI:VERIFY-FEP-FILESYSTEM to make sure
everything is clean.

    The documentation states, and I agree, that for safety, you should never
    use SI:FIX-FEP-BLOCK or SI:FIX-FEP-FILE on LMFS partitions.  Instead,
    use LMFS:FIX-FILE, which gives you the right pathname-based handles on
    the file, and limits the options to those that are safe for the
    integrity of LMFS partitions.

To gain more information on errors and their locations, these functions
can be safely run on any disk file (including LMFS partitions) AS LONG AS
NO POSITIVE ACTION IS TAKEN.  Say NO to all option queries.