RD51/52/53 Survival Tips
From Computer History Wiki
Revision as of 09:05, 16 July 2023
An interesting memo written by a field service staff member in 1986:
RD51/52/53 Survival Tips
( or, why is such a small disk causing me so much trouble )

James B. Frazier
MSD-CSSE

First, a few terms you need to know. There are some differences in this implementation of DSA/MSCP.

DSA ==> Digital Storage Architecture

MSCP ==> Mass Storage Control Protocol

DSDF ==> Digital Standard Disk Format

BBR ==> Bad Block Replacement
    The process used to make the disk appear error free.

FACT ==> FActory Control Table
    Where the disk manufacturer writes bad spot info. This info was found with very sensitive testing during manufacture. Written in a non DSDF format. Accessible only to the RQDXn controller.

FCT ==> Format Control Table
    Where the RQDXn controller stores bad block info. This info is a combination of the FACT and bad blocks found during format. Written in DSDF format. Accessible only to the RQDXn controller.

RCT ==> Replacement and Caching Table
    Where the BBR and Revector process stores/finds which RBN replaces a given LBN. Accessible to host and controller.

LBN ==> Logical Block Number
    Where programs and data are stored. Accessible to host and RQDXn.

RBN ==> Replacement Block Number
    A block used to replace an LBN that is bad.

UIT ==> Unit Information Table
    Where the RQDX3 stores/reads the disk geometry table.

When should you format a disk:

When should a disk be formatted? Well... a new disk probably has the wrong format on it and obviously needs to be formatted. An existing disk that has been corrupted because of a bad controller, bad cable or bad environment (AC line noise etc.) will need formatting. An existing disk that the user says has a bad block and his program won’t work, or one or more files can’t be accessed, may not need formatting.

The first thing to do is check the system error log. All of the operating systems can log the MSCP Status/Event code in the system error log. If that code includes an octal 10 or hex 8, you have found a forced error. The remedy is to replace the file(s) from a backup set, or repair/recreate said file(s).
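The error-log check above comes down to a test on the logged Status/Event code. The following Python sketch is illustrative only. It assumes the usual MSCP layout, in which the low five bits of the code hold the major status and the remaining bits hold a subcode; the memo itself says only that the code "includes" octal 10. The function and constant names are hypothetical.

```python
# Hypothetical helper; the bit layout is an assumption, not taken from the memo.
STATUS_MASK = 0o37   # assumed: low five bits carry the major status code
FORCED_ERROR = 0o10  # octal 10 == hex 8, the value the memo says to look for


def looks_like_forced_error(code: int) -> bool:
    """Return True if the major status field equals octal 10 (hex 8)."""
    return (code & STATUS_MASK) == FORCED_ERROR


def describe(code: int) -> str:
    """Show a logged code in octal and hex, with the forced-error verdict."""
    verdict = "forced error" if looks_like_forced_error(code) else "no forced error"
    return f"octal {code:o} / hex {code:X}: {verdict}"
```

For example, `describe(0o10)` returns `octal 10 / hex 8: forced error`, in which case the remedy is to restore or recreate the affected file(s) rather than to format the disk.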
What has happened is that the RQDXn BBR process replaced an LBN with an RBN but couldn’t read the original data safely. The best guess data was written in the RBN with an error flag that forces a Status/Event code of octal 10, hex 8, a "Forced Error". The block in question is no longer bad. The data is bad. As soon as the file is replaced, and/or this block is written over, the forced error disappears.

Now if your customer is reporting lots of problems with bad blocks, the first item should be to check the integrity of the hardware and cables. There is a known problem with the 20 pin and 34 pin connectors that plug into the RD5x: some plugs have exposed conductors extending out from the side, which can short against the skid plate and cause lots and lots of problems. Until the stockrooms can be purged, your only recourse is to inspect all cables and tape them as necessary.

Next, check the rev levels of all the hardware. RQDX1 controllers should have V9.4 roms. RQDX2 controllers should have V10.0E roms. The RQDX3 has only one version of ucode, so far. [No longer true, see ...]

If you upgrade a uPDP23+ RQDX1 from V8.0 to V9.4, or install an RQDX2, check the KDF11-B boot roms; they MUST be at least V.9, part no. 23-183E4 & 23-184E4. V.9 roms are no longer available, so if you need new roms, order V1.0 (part no. 23-380E4 & 23-381E4) using part no. KDF11-B3.

Be very aware that an RD5x system reacts differently than an RL02 system to poor environments. Electrical disturbances, AC line noise, etc., cause retries on an RL02; on an RD5x the RQDXn starts doing lots of BBR and will eventually fill up the RCT. At this point the RD5x is unusable. If you suspect AC line noise, arrange to loan a DEC CVC to your customer. If that helps, sell him one.

ESD can also hurt you; there are many ways to alleviate this problem. Check with district support. During maintenance, always use an anti-static kit.
It may be that the RD5x in question was improperly formatted, and the FACT was not used, or the data from the manufacturer’s sticker was not entered.

How to properly format RD5x disks:

Once the decision is made to format a disk, back up the data if at all possible!

Something you need to know and remember: THE RD51 DOES NOT HAVE A FACT! This is important to know when answering diagnostic setup questions to format the RD51 disk.

At this writing we have shipped three different controllers, RQDX1/2/3, with six separate levels of ucode. When you run a formatter diagnostic, it calls a format program inside the controller’s ucode. This explains why one diagnostic prints slightly different text on the system console when run at several sites. Every rev level of ucode has some subtle changes, including the text used during the format process.

For uPDP systems with RQDX1/2 controllers, use ZRQBC1. When the formatter starts you should answer NO to the change HW question unless the RQDX1/2 is not at the standard address.

ALWAYS ANSWER NO TO THE CHANGE SW QUESTION !!!!!!!!!!!!!!
ALWAYS ANSWER NO TO THE CHANGE SW QUESTION !!!!!!!!!!!!!!

The change SW is only useful for APT. Don’t use it in the field.

In the case of an RD51, the only safe course is to answer NO to the question "Use existing bad block information <N>:" and YES to the question "Use downline load <N>:", and supply the head, cylinder, and byte offset info from the sticker on the drive. Also answer YES to the question "Continue if bad block information is not accessible <N>:". If there is no sticker on the drive, then say NO to the downline load question.

In the case of the RD52/53 you should answer YES to the question "Use existing bad block information <N>:", and NO to the "Continue if bad block info is not accessible <N>:" question. If this fails, because it claims it can’t read bad block info, retry and answer NO to the question "Use existing bad block information <N>:".
And YES to the question "Use down line load <N>:", and enter the info from the sticker on the drive. Say YES to "Continue.." this time.

For uPDP systems with RQDX3 controllers and RD51/52/53 drives use ZRQCB2.

ALWAYS SAY YES TO CHANGE HW
ALWAYS SAY YES TO USE AUTOFORMAT.

For uVAXI systems with RQDX1/2 controllers and RD51/52/53 drives, use EHXRQ and the directions for uPDP. The questions are identical. RQDX3s are not supported. [But they do work!]

In the case of uVAXII, ALWAYS ANSWER YES TO THE QUESTION "USE AUTOFORMAT MODE". If you don’t have MDM V1.08 or better, get it. Someone in the branch/district is on automatic distribution and should be sharing with all.

Drive vs Controller compatibility:

The RQDX2 is an RQDX1 with some ECOs and larger RAMs etc. So all of the RQDX1/2 ucode versions can be viewed as a linear progression.

23-188E5-00 & 23-189E5-00 ==> V10.0E  RQDX2  Will handle RX50,RD51/52/53
23-178E5-00 & 23-179E5-00 ==> V10.0D  RQDX2  Will handle RX50,RD51/52/53
23-172E5-00 & 23-173E5-00 ==> V09.4E  RQDX1  Will handle RX50,RD51/52
23-042E5-00 & 23-043E5-00 ==> V09.0   RQDX1  Will handle RX50,RD51/52
23-264E4-00 & 23-265E4-00 ==> V08.0   RQDX1  Will handle RX50,RD51
23-238E4-00 & 23-239E4-00 ==> V07.0   RQDX1  Will handle RX50,RD51

Now, if by changing the controller, or the proms, you change the ucode revision level up, the new ucode will restructure the RCT/FCT tables on the connected disks to its satisfaction. If the ucode revision level is changed to a lower level, the older ucode won’t understand the disk structure, and the disk(s) will have to be formatted. The ucode is only upwards compatible, not downwards.

If you replace an RQDX1/2 with an RQDX3, you MUST reformat the disk. RQDX1/2 controllers have the disk geometry stored in roms. RQDX3 controllers write the disk geometry on the disk during format (in the UIT), and then read it off the disk after every power up sequence or controller init sequence.
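The compatibility rules above reduce to a small decision. The following Python sketch is illustrative only: the function name, the list name, and the controller labels are hypothetical. It encodes just the three rules stated in the memo: an upward ucode move is restructured in place, a downward move needs a format, and any move from an RQDX1/2 to an RQDX3 needs a format. Cases the memo does not cover return None.

```python
from typing import Optional

# RQDX1/2 ucode levels from the memo's table, oldest first (leading "V" dropped).
RQDX12_UCODE = ["7.0", "8.0", "9.0", "9.4E", "10.0D", "10.0E"]


def needs_reformat(old_ctrl: str, new_ctrl: str,
                   old_ucode: Optional[str] = None,
                   new_ucode: Optional[str] = None) -> Optional[bool]:
    """Return True/False per the memo's rules, or None if the memo is silent."""
    legacy = {"RQDX1", "RQDX2"}
    if old_ctrl in legacy and new_ctrl == "RQDX3":
        # Geometry moves from the controller roms to the on-disk UIT.
        return True
    if old_ctrl in legacy and new_ctrl in legacy:
        # index() raises ValueError for a ucode level not in the table.
        old_rank = RQDX12_UCODE.index(old_ucode)
        new_rank = RQDX12_UCODE.index(new_ucode)
        # Same or higher level: the new ucode restructures RCT/FCT itself.
        return new_rank < old_rank
    return None  # e.g. RQDX3 back to RQDX1/2: not covered by the memo
```

For instance, `needs_reformat("RQDX1", "RQDX2", "9.4E", "10.0E")` returns False, while `needs_reformat("RQDX2", "RQDX1", "10.0E", "8.0")` returns True.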
If an RQDX3 tries to bring an RD5x drive online and can’t read the UIT, it assumes that the drive is an RD51. The solution is to reformat.

If an RQDX2 is configured with 3 RD5x drives and no RX50, the 3rd RD5x (drive 2) has the removable bit set. When a sniffer boot sees this, it will try to boot drive 2 first instead of drive 0.

How you can help:

MSD-CSSE is working on identifying problem scenarios associated with this subsystem. You can help us by reporting specific syndromes/symptoms that you see being repeated at multiple customer sites. Please send as much detail as possible; include the exact drive and controller type, firmware rev level, CPU type and rev level, boot rom rev level, box type, operating system etc.

Send reports to me at:

James B. Frazier
MS: ZKO2-1/N42

or on the ENET: PULSAR::FRAZIER