RD51/52/53 Survival Tips

From Computer History Wiki

An interesting memo written by a field service staff member in 1986:

                      RD51/52/53 Survival Tips
    ( or, why is such a small disk causing me so much trouble )
                     James B. Frazier MSD-CSSE
 First, a few terms you need to know. There are some differences in
 this implementation of DSA/MSCP.
 DSA==> Digital Storage Architecture
 MSCP==> Mass Storage Control Protocol
 DSDF==> Digital Standard Disk Format
 BBR ==> Bad Block Replacement
 The process used to make the disk appear error free.
 FACT==> FActory Control Table
 Where the disk manufacturer writes bad spot info. This info was found
 with very sensitive testing during manufacture. Written in a non DSDF
 format.  Accessible only to the RQDXn controller.
 FCT==> Format Control Table
 Where the RQDXn controller stores bad block info. This info is a
 combination of the FACT and bad blocks found during format. Written
 in DSDF format. Accessible only to RQDXn controller.
 RCT==> Replacement and Caching Table
 Where the BBR and Revector process stores/finds which RBN replaces a
 given LBN. Accessible to Host and controller.
 LBN==> Logical Block Number
 Where programs and data are stored. Accessible to host and RQDXn.
 RBN==> Replacement Block Number
 A block used to replace a LBN that is bad.
 UIT==> Unit Information Table
 Where the RQDX3 stores/reads the disk geometry table.
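The relationships between these tables can be pictured with a toy model. This is only a sketch: the memo does not describe the on-disk layout of the FACT, FCT or RCT, so the sets, the dictionary and both function names below are hypothetical stand-ins.

```python
# Toy model only. The real on-disk formats of the FACT, FCT and RCT are
# not described in the memo; every name here is a hypothetical stand-in.

def build_fct(fact_bad_blocks, found_during_format):
    """FCT = bad spots from the manufacturer (FACT) plus bad blocks
    found by the controller during format."""
    return set(fact_bad_blocks) | set(found_during_format)


def resolve(lbn, rct):
    """Return which block actually holds the data for an LBN.

    rct maps a replaced LBN to the RBN that stands in for it.
    """
    if lbn in rct:
        return ("RBN", rct[lbn])
    return ("LBN", lbn)


fct = build_fct({120, 4711}, {9000})   # made-up block numbers
rct = {4711: 3}                        # LBN 4711 was replaced by RBN 3
```

Here `resolve(4711, rct)` yields the replacement block, while any LBN that is not in the table is used directly.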
 When should you format a disk:
 When should a disk be formatted? Well... a new disk probably has the
 wrong format on it and obviously needs to be formatted.
 An existing disk that has been corrupted because of a bad controller,
 bad cable or bad environment (AC line noise etc.) will need
 formatting.
 An existing disk that the user says has a bad block and his program
 won’t work, or one or more files can’t be accessed, may not need
 formatting. The first thing to do is check the system error log. All
 of the operating systems can log the MSCP Status/Event code in the
 system error log. If that code includes an octal 10 or hex 8, you
 have found a forced error. The remedy is to replace the file(s) from
 a backup set, or repair/recreate said file(s). What has happened is
 that the RQDXn BBR process replaced an LBN with a RBN but couldn’t
 read the original data safely. The best guess data was written in the
 RBN with an error flag that forces a Status/Event code of octal 10,
 hex 8, a "Forced Error". The block in question is no longer bad. The
 data is bad. As soon as the file is replaced, and/or this block is
 written over, the forced error disappears.
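The error log check can be sketched as below. It assumes the logged value is the bare Status/Event code and that a forced error appears as exactly octal 10 (hex 8). How each operating system displays the code is not covered here, and the function name is hypothetical.

```python
# Sketch only: assumes the value passed in is the bare MSCP Status/Event
# code taken from the system error log. The helper name is hypothetical.

FORCED_ERROR = 0o10  # octal 10, the same value as hex 8


def is_forced_error(status_event_code):
    """True if the logged Status/Event code is the forced error value."""
    return status_event_code == FORCED_ERROR
```

If the check succeeds, the remedy in the text applies: restore or recreate the affected file(s) rather than reformatting the disk.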
 Now if your customer is reporting lots of problems with bad blocks,
 the first item should be to check the integrity of the hardware and
 cables. There is a known problem with the 20 pin and 34 pin
 connectors that plug into the RD5x: some plugs have exposed conductors
 extending out from the side, which can short against the skid plate and
 cause lots and lots of problems. Until the stockrooms can be purged,
 your only recourse is to inspect all cables and tape them as necessary.
 Next check the rev levels of all the hardware. RQDX1 controllers
 should have V9.4 roms. RQDX2 controllers should have V10.0E roms.
 The RQDX3 has only one version of ucode, so far.
 [No longer true, see ...]
 If you upgrade a uPDP23+ RQDX1 from V8.0 to V9.4, or install an RQDX2,
 check the KDF11-B boot roms; they MUST be at least V.9, part no.
 23-183E4 & 23-184E4. V.9 roms are no longer available, so if you need
 new roms, order V1.0 (part no. 23-380E4 & 23-381E4) by using
 part no. KDF11-B3.
 Be very aware that an RD5x system reacts differently than an RL02 system
 to poor environments. Electrical disturbances, AC line noise, etc.,
 cause retries on an RL02; on an RD5x the RQDXn starts doing lots of BBR
 and will eventually fill up the RCT. At this point the RD5x is
 unusable. If you suspect AC line noise, arrange to loan a DEC CVC to
 your customer. If that helps, sell him one.
 ESD can also hurt you. There are many ways to alleviate this problem;
 check with district support. During maintenance, always use an
 anti-static kit.
 It may be that the RD5x in question was improperly formatted, and the
 FACT was not used, or the data from the manufacturer’s sticker was not
 How to properly format RD5x disks:
 Once the decision is made to format a disk, back up the data if at all
 possible.
 Something you need to know and remember, THE RD51 DOES NOT HAVE A
 FACT! This is important to know when answering diagnostic setup
 questions to format the RD51 disk.
 At this writing we have shipped three different controllers,
 RQDX1/2/3, with six separate levels of ucode. When you run a
 formatter diagnostic, it calls to a format program inside the
 controller’s ucode. This explains why one diagnostic prints slightly
 different text on the system console when run at several sites. Every
 rev level of ucode has some subtle changes, including the text used
 during the format process.
 For uPDP systems with RQDX1/2 controllers, use ZRQBC1. When the
 formatter starts you should answer NO to the change HW question unless
 the RQDX1/2 is not at the standard address.
 The change SW is only useful for APT. Don’t use it in the field.
 In the case of an RD51, the only safe course is to answer NO to the
 question "Use existing bad block information <N>:". And YES to the
 question "Use downline load <N>:", and supply the head, cylinder, and
 byte offset info from the sticker on the drive. Also answer YES to
 the question "Continue if bad block information is not accessible <N>:".
 If there is no sticker on the drive, then say no to the downline load
 question.
 In the case of the RD52/53 you should answer YES to the question "Use
 existing bad block information <N>:", and NO to the "Continue if bad
 block info is not accessible <N>:" question. If this fails, because
 it claims it can’t read bad block info, retry and answer NO to the
 question "Use existing bad block information <N>:". And YES to the
 question "Use downline load <N>:", and enter the info from the sticker
 on the drive. Say YES to "Continue.." this time.
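The answers above for the ZRQBC1 formatter on an RQDX1/2 can be summarized as a small lookup. This is a sketch of the memo's advice, not of the diagnostic itself: the function name and dictionary keys are hypothetical short labels for the console questions, and the sticker flag is only applied to the RD51 because that is the only no-sticker case the text mentions.

```python
# Sketch of the memo's advice for ZRQBC1 on RQDX1/2 controllers.
# Function name and dictionary keys are hypothetical labels.

def formatter_answers(drive, bad_block_info_readable=True, sticker_present=True):
    """Return suggested answers to the formatter's bad block questions."""
    if drive == "RD51":
        # The RD51 has no FACT, so existing bad block info is never used.
        return {
            "use_existing_bad_block_info": "NO",
            "use_downline_load": "YES" if sticker_present else "NO",
            "continue_if_info_not_accessible": "YES",
        }
    if drive in ("RD52", "RD53"):
        if bad_block_info_readable:
            # First attempt: rely on the bad block info already on the disk.
            return {
                "use_existing_bad_block_info": "YES",
                "continue_if_info_not_accessible": "NO",
            }
        # Retry after the formatter says it cannot read the bad block
        # info: enter the head, cylinder and byte offset from the sticker.
        return {
            "use_existing_bad_block_info": "NO",
            "use_downline_load": "YES",
            "continue_if_info_not_accessible": "YES",
        }
    raise ValueError("drive type not covered by the memo: " + drive)
```

For example, `formatter_answers("RD53", bad_block_info_readable=False)` gives the retry answers for a drive whose bad block info could not be read on the first pass.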
 For uPDP systems with RQDX3 controllers and RD51/52/53 drives use
                 ALWAYS SAY YES TO CHANGE HW
 For uVAXI systems with RQDX1/2 controllers and RD51/52/53 drives, use
 EHXRQ and the directions for uPDP. The questions are identical.
 RQDX3s are not supported. [But they do work!]
 AUTOFORMAT MODE". If you don’t have MDM V1.08 or better, get it.
 Someone in the branch/district is on automatic distribution and should
 be sharing with all.
 Drive vs Controller compatibility:
 The RQDX2 is an RQDX1 with some ECOs and larger RAMs etc. So all of
 the RQDX1/2 ucode versions can be viewed as a linear progression.
 23-188E5-00 & 23-189E5-00 ==>  V10.0E  RQDX2  Will handle RX50,RD51/52/53
 23-178E5-00 & 23-179E5-00 ==>  V10.0D  RQDX2  Will handle RX50,RD51/52/53
 23-172E5-00 & 23-173E5-00 ==>  V09.4E  RQDX1  Will handle RX50,RD51/52
 23-042E5-00 & 23-043E5-00 ==>  V09.0   RQDX1  Will handle RX50,RD51/52
 23-264E4-00 & 23-265E4-00 ==>  V08.0   RQDX1  Will handle RX50,RD51
 23-238E4-00 & 23-239E4-00 ==>  V07.0   RQDX1  Will handle RX50,RD51
 Now, if by changing the controller, or the proms, you change the ucode
 revision level up, the new ucode will restructure the RCT/FCT tables
 on the connected disks to its satisfaction. If the ucode revision
 level is changed to a lower level, the older ucode won’t understand the disk
 structure, and the disk(s) will have to be formatted. The ucode is
 only upwards compatible, not downwards.
 If you replace an RQDX1/2 with an RQDX3, you MUST reformat the disk.
 RQDX1/2 controllers have the disk geometry stored in roms. RQDX3
 controllers write the disk geometry on the disk during format (in
 the UIT) and then read it off the disk after every power up sequence
 or controller init sequence.
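The compatibility rules above can be condensed into a short sketch. The version labels are written without the leading zero, "RQDX3" stands in for that controller's single ucode version, and the function names are hypothetical. The memo says nothing about going from an RQDX3 back to an RQDX1/2, so the sketch refuses to answer that case.

```python
# Sketch of the memo's compatibility rules. Labels are normalized
# (no leading zero) and all names here are hypothetical.

# RQDX1/2 ucode, oldest to newest, as listed in the table above.
UCODE_LADDER = ["V7.0", "V8.0", "V9.0", "V9.4E", "V10.0D", "V10.0E"]

SUPPORTED_DRIVES = {
    "V7.0": {"RX50", "RD51"},
    "V8.0": {"RX50", "RD51"},
    "V9.0": {"RX50", "RD51", "RD52"},
    "V9.4E": {"RX50", "RD51", "RD52"},
    "V10.0D": {"RX50", "RD51", "RD52", "RD53"},
    "V10.0E": {"RX50", "RD51", "RD52", "RD53"},
}


def supports(ucode, drive):
    """True if the table lists this drive for this RQDX1/2 ucode level."""
    return drive in SUPPORTED_DRIVES[ucode]


def must_reformat(old, new):
    """True if the memo says existing disks must be reformatted."""
    if new == "RQDX3":
        # RQDX3 reads geometry from the UIT, which RQDX1/2 never wrote.
        return old != "RQDX3"
    if old == "RQDX3":
        raise ValueError("RQDX3 to RQDX1/2 is not covered by the memo")
    # Upward moves are restructured by the new ucode; downward moves
    # leave a disk structure the older ucode does not understand.
    return UCODE_LADDER.index(new) < UCODE_LADDER.index(old)
```

For instance, `must_reformat("V10.0E", "V9.4E")` is true because the ucode level went down, while `must_reformat("V8.0", "V9.4E")` is false.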
 If an RQDX3 tries to bring an RD5x drive online and can’t read the UIT,
 it assumes that the drive is an RD51. The solution is to reformat.
 If an RQDX2 is configured with 3 RD5x drives and no RX50, the 3rd RD5x
 (drive 2) has the removable bit set. When a sniffer boot sees this it
 will try to boot drive 2 first instead of drive 0.
 How you can help:
 MSD-CSSE is working on identifying problem scenarios associated with
 this subsystem. You can help us by reporting specific 
 syndromes/symptoms that you see being repeated at multiple customer
 sites. Please send as much detail as possible, including exact drive
 and controller type, firmware rev level, CPU type and rev level, boot
 rom rev level, box type, operating system, etc. Send reports to me at:
 James B. Frazier MS: ZKO2-1/N42