Difference between revisions of "RD51/52/53 Survival Tips"

From Computer History Wiki

Revision as of 09:05, 16 July 2023

An interesting memo written by a field service staff member in 1986:

                     RD51/52/53 Survival Tips
   ( or, why is such a small disk causing me so much trouble )

                    James B. Frazier MSD-CSSE

First, a few terms you need to know. There are some differences in
this implementation of DSA/MSCP.

DSA==> Digital Storage Architecture

MSCP==> Mass Storage Control Protocol

DSDF==> Digital Standard Disk Format

BBR ==> Bad Block Replacement

The process used to make the disk appear error free.

FACT==> FActory Control Table

Where the disk manufacturer writes bad spot info. This info was found
with very sensitive testing during manufacture. Written in a non DSDF
format.  Accessible only to the RQDXn controller.

FCT==> Format Control Table

Where the RQDXn controller stores bad block info. This info is a
combination of the FACT and bad blocks found during format. Written
in DSDF format. Accessible only to RQDXn controller.

RCT==> Replacement and Caching Table

Where the BBR and Revector process stores/finds which RBN replaces a
given LBN. Accessible to Host and controller.

LBN==> Logical Block Number

Where programs and data are stored. Accessible to host and RQDXn.

RBN==> Replacement Block Number

A block used to replace a LBN that is bad.

UIT==> Unit Information Table

Where the RQDX3 stores/reads the disk geometry table.

When should you format a disk:

When should a disk be formatted? Well... a new disk probably has the
wrong format on it and obviously needs to be formatted.

An existing disk that has been corrupted because of a bad controller,
bad cable or bad environment (AC line noise etc.) will need
formatting.

An existing disk that the user says has a bad block and his program
won’t work, or one or more files can’t be accessed, may not need
formatting. The first thing to do is check the system error log. All
of the operating systems can log the MSCP Status/Event code in the
system error log. If that code includes an octal 10 or hex 8, you
have found a forced error. The remedy is to replace the file(s) from
a backup set, or repair/recreate said file(s). What has happened is
that the RQDXn BBR process replaced an LBN with a RBN but couldn’t
read the original data safely. The best guess data was written in the
RBN with an error flag that forces a Status/Event code of octal 10,
hex 8, a "Forced Error". The block in question is no longer bad. The
data is bad. As soon as the file is replaced, and/or this block is
written over, the forced error disappears.
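The error log check described above can be sketched in a few lines of
Python. This is only an illustration, not anything from the operating
systems' own tools. The function name and the mask are hypothetical:
the memo only says the code "includes" octal 10 (hex 8), and the sketch
assumes the status/event code occupies the low five bits of the logged
word.

```python
# Assumption: the status/event code sits in the low five bits of the
# logged status word. The memo itself only says the code "includes"
# octal 10 / hex 8.
STATUS_CODE_MASK = 0o37      # assumed width of the code field
FORCED_ERROR_CODE = 0o10     # octal 10 == hex 8, per the memo


def looks_like_forced_error(status_word: int) -> bool:
    """Return True when the logged status word carries code octal 10."""
    return (status_word & STATUS_CODE_MASK) == FORCED_ERROR_CODE


if __name__ == "__main__":
    # Print the verdict for a few sample status words.
    for word in (0o10, 0x08, 0o13, 0o0):
        print(oct(word), looks_like_forced_error(word))
```

If the check comes back true, the remedy is the one given above:
restore or recreate the affected file(s) rather than reformat.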

Now if your customer is reporting lots of problems with bad blocks,
the first step should be to check the integrity of the hardware and
cables. There is a known problem with the 20 pin and 34 pin
connectors that plug into the RD5x: some plugs have exposed conductors
extending out from the side, which can short against the skid plate
and cause lots and lots of problems. Until the stockrooms can be
purged, your only recourse is to inspect all cables and tape them as
necessary.

Next check the rev levels of all the hardware. RQDX1 controllers
should have V9.4 roms. RQDX2 controllers should have V10.0E roms.
The RQDX3 has only one version of ucode, so far.
[No longer true, see ...]

If you upgrade a uPDP23+ RQDX1 from V8.0 to V9.4, or install an RQDX2,
check the KDF11-B boot roms. They MUST be at least V.9, part no.
23-183E4 & 23-184E4. V.9 roms are no longer available, so if you need
new roms, order V1.0, part no. 23-380E4 & 23-381E4, using part no.
KDF11-B3.

Be very aware that an RD5x system reacts differently than an RL02
system to poor environments. Electrical disturbances, AC line noise,
etc., cause retries on an RL02. On an RD5x, the RQDXn starts doing
lots of BBR and will eventually fill up the RCT. At this point the
RD5x is unusable. If you suspect AC line noise, arrange to loan a DEC
CVC to your customer. If that helps, sell him one.

ESD can also hurt you. There are many ways to alleviate this problem;
check with district support. During maintenance, always use an
anti-static kit.

It may be that the RD5x in question was improperly formatted, and the
FACT was not used, or the data from the manufacturer's sticker was not
entered.

How to properly format RD5x disks:

Once the decision is made to format a disk, back up the data if at all
possible!

Something you need to know and remember, THE RD51 DOES NOT HAVE A
FACT! This is important to know when answering diagnostic setup
questions to format the RD51 disk.

At this writing we have shipped three different controllers,
RQDX1/2/3, with six separate levels of ucode. When you run a
formatter diagnostic, it calls to a format program inside the
controller’s ucode. This explains why one diagnostic prints slightly
different text on the system console when run at several sites. Every
rev level of ucode has some subtle changes, including the text used
during the format process.

For uPDP systems with RQDX1/2 controllers, use ZRQBC1. When the
formatter starts you should answer NO to the change HW question unless
the RQDX1/2 is not at the standard address.

      ALWAYS ANSWER NO TO THE CHANGE SW QUESTION !!!!!!!!!!!!!!
      ALWAYS ANSWER NO TO THE CHANGE SW QUESTION !!!!!!!!!!!!!!

The change SW is only useful for APT. Don’t use it in the field.

In the case of an RD51, the only safe course is to answer NO to the
question "Use existing bad block information <N>:". And YES to the
question "Use downline load <N>:", and supply the head, cylinder, and
byte offset info from the sticker on the drive. Also answer YES to
the question "Continue if bad block information is not accessible
<N>:".

If there is no sticker on the drive, then say no to the downline load
question.

In the case of the RD52/53 you should answer YES to the question "Use
existing bad block information <N>:", and NO to the "Continue if bad
block info is not accessible <N>:" question. If this fails, because
it claims it can’t read bad block info, retry and answer NO to the
question "Use existing bad block information <N>:". And YES to the
question "Use downline load <N>:", and enter the info from the sticker
on the drive. Say YES to "Continue..." this time.
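The RQDX1/2 formatter answers above can be restated as a small lookup.
The following Python sketch is only a memory aid. The function name,
parameter names, and dictionary keys are made up for illustration and
are not the formatter's exact prompts. The memo does not say what to do
on an RD52/53 retry when there is no sticker, so the sketch assumes the
RD51 rule (answer NO to downline load) applies there too.

```python
def formatter_answers(drive: str, has_sticker: bool = True,
                      retry: bool = False,
                      standard_address: bool = True) -> dict:
    """Answers for the RQDX1/2 formatter questions, as the memo gives them.

    drive:            "RD51", "RD52" or "RD53"
    has_sticker:      the bad-spot sticker is present on the drive
    retry:            second RD52/53 attempt, after the formatter said it
                      could not read the bad block info
    standard_address: the controller is at the standard address
    """
    if drive not in ("RD51", "RD52", "RD53"):
        raise ValueError(f"unknown drive type: {drive}")

    answers = {
        "change HW": "NO" if standard_address else "YES",
        "change SW": "NO",  # always NO in the field
    }

    if drive == "RD51" or retry:
        # The RD51 has no FACT; an RD52/53 retry ignores the unreadable one.
        answers["use existing bad block information"] = "NO"
        # Assumption: with no sticker on a retry, follow the RD51 rule.
        answers["use downline load"] = "YES" if has_sticker else "NO"
        answers["continue if bad block information is not accessible"] = "YES"
    else:
        # First RD52/53 attempt: rely on the FACT and stop if it is unreadable.
        answers["use existing bad block information"] = "YES"
        answers["continue if bad block information is not accessible"] = "NO"

    return answers


if __name__ == "__main__":
    for question, answer in formatter_answers("RD52", retry=True).items():
        print(f"{question}: {answer}")
```

The first RD52/53 pass deliberately leaves out the downline load
question, since the memo gives no answer for it in that case.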

For uPDP systems with RQDX3 controllers and RD51/52/53 drives use
ZRQCB2.

                ALWAYS SAY YES TO CHANGE HW
             ALWAYS SAY YES TO USE AUTOFORMAT.

For uVAXI systems with RQDX1/2 controllers and RD51/52/53 drives, use
EHXRQ and the directions for uPDP. The questions are identical.
RQDX3s are not supported. [But they do work!]

In the case of uVAXII, ALWAYS ANSWER YES TO THE QUESTION "USE 
AUTOFORMAT MODE". If you don’t have MDM V1.08 or better, get it.
Someone in the branch/district is on automatic distribution and should
be sharing with all.

Drive vs Controller compatibility:

The RQDX2 is an RQDX1 with some ECOs and larger RAMs etc. So all of
the RQDX1/2 ucode versions can be viewed as a linear progression.

23-188E5-00 & 23-189E5-00 ==>  V10.0E  RQDX2  Will handle RX50,RD51/52/53
23-178E5-00 & 23-179E5-00 ==>  V10.0D  RQDX2  Will handle RX50,RD51/52/53

23-172E5-00 & 23-173E5-00 ==>  V09.4E  RQDX1  Will handle RX50,RD51/52
23-042E5-00 & 23-043E5-00 ==>  V09.0   RQDX1  Will handle RX50,RD51/52
23-264E4-00 & 23-265E4-00 ==>  V08.0   RQDX1  Will handle RX50,RD51
23-238E4-00 & 23-239E4-00 ==>  V07.0   RQDX1  Will handle RX50,RD51
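
For quick checks, the list above can be held as a lookup keyed on ucode
version. This Python sketch is an illustration only. The table name,
the function name, and the key format (version without the leading "V"
or zero) are choices made here, not DEC conventions.

```python
# ucode version -> (controller, devices handled), copied from the list above.
UCODE_TABLE = {
    "10.0E": ("RQDX2", {"RX50", "RD51", "RD52", "RD53"}),
    "10.0D": ("RQDX2", {"RX50", "RD51", "RD52", "RD53"}),
    "9.4E": ("RQDX1", {"RX50", "RD51", "RD52"}),
    "9.0": ("RQDX1", {"RX50", "RD51", "RD52"}),
    "8.0": ("RQDX1", {"RX50", "RD51"}),
    "7.0": ("RQDX1", {"RX50", "RD51"}),
}


def can_handle(version: str, device: str) -> bool:
    """Return True if the given ucode version handles the given device."""
    if version not in UCODE_TABLE:
        raise KeyError(f"ucode version not in the memo's list: {version}")
    _controller, devices = UCODE_TABLE[version]
    return device in devices


if __name__ == "__main__":
    # An RD53 needs RQDX2 ucode; an RQDX1 at 9.4E stops at the RD52.
    print(can_handle("10.0E", "RD53"))
    print(can_handle("9.4E", "RD53"))
```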

Now, if by changing the controller, or the proms, you change the ucode
revision level up, the new ucode will restructure the RCT/FCT tables
on the connected disks to its satisfaction. If the ucode revision
level is changed to a lower level, the older ucode won’t understand
the disk structure, and the disk(s) will have to be formatted. The
ucode is only upwards compatible, not downwards.

If you replace an RQDX1/2 with an RQDX3, you MUST reformat the disk.
RQDX1/2 controllers have the disk geometry stored in roms. RQDX3
controllers write the disk geometry on the disk during format, (in
the UIT) and then read it off the disk after every power up sequence
or controller init sequence.
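
The two reformat rules above (ucode is upward compatible only, and any
move from an RQDX1/2 to an RQDX3 needs a reformat) can be written as
one decision function. This is a sketch in Python. The function names
and the version parsing are assumptions made for illustration, and a
swap from an RQDX3 back to an RQDX1/2 is not covered by the memo, so
the sketch refuses to guess in that case.

```python
import re


def parse_ucode(version: str) -> tuple:
    """Turn '10.0E', 'V9.4' or 'V09.4E' into a comparable tuple."""
    match = re.fullmatch(r"V?O?(\d+)\.(\d+)([A-Z]?)", version.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized ucode version: {version!r}")
    major, minor, suffix = match.groups()
    return (int(major), int(minor), suffix)


def must_reformat(old_ctrl: str, old_ver: str,
                  new_ctrl: str, new_ver: str) -> bool:
    """Decide whether the disks need formatting after a controller/prom swap.

    Versions are ignored when the controller involved is an RQDX3.
    """
    old_is_12 = old_ctrl in ("RQDX1", "RQDX2")
    new_is_12 = new_ctrl in ("RQDX1", "RQDX2")

    if old_is_12 and new_ctrl == "RQDX3":
        return True   # geometry moves from roms to the on-disk UIT
    if old_is_12 and new_is_12:
        # Upward moves are restructured by the new ucode; downward are not.
        return parse_ucode(new_ver) < parse_ucode(old_ver)
    if old_ctrl == "RQDX3" and new_ctrl == "RQDX3":
        return False  # same ucode, same on-disk structure
    raise ValueError("combination not covered by the memo")


if __name__ == "__main__":
    print(must_reformat("RQDX1", "8.0", "RQDX1", "9.4E"))
    print(must_reformat("RQDX2", "10.0E", "RQDX1", "9.4E"))
    print(must_reformat("RQDX2", "10.0E", "RQDX3", ""))
```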

If an RQDX3 tries to bring an RD5x drive online and can’t read the UIT,
it assumes that the drive is an RD51. The solution is to reformat.

If an RQDX2 is configured with 3 RD5x drives and no RX50, the 3rd RD5x
(drive 2) has the removable bit set. When a sniffer boot sees this it
will try to boot drive 2 first instead of drive 0.

How you can help:

MSD-CSSE is working on identifying problem scenarios associated with
this subsystem. You can help us by reporting specific 
syndromes/symptoms that you see being repeated at multiple customer
sites. Please send as much detail as possible, include exact drive
and controller type, firmware rev level, CPU type and rev level, boot
rom rev level, box type, operating system etc. Send reports to me at:

James B. Frazier MS: ZKO2-1/N42
or on the ENET: PULSAR::FRAZIER