Common File System
From Computer History Wiki
The following excerpt from the
TOPS-20 System Manager's Guide, June 1990, TOPS-20 (KL Model B) Version 7.0,
explains in detail how the Common File System (CFS) works on the DECSYSTEM-20, a DEC PDP-10 family computer running the TOPS-20 operating system. CFS relies on the CI20 port adapter and the Computer Interconnect (CI) bus for communication between systems, and uses the HSC50 Hierarchical Storage Controller for CI-based disks.
THE COMMON FILE SYSTEM
CONTENTS
12.1 OVERVIEW
12.1.1 CFS HARDWARE
12.1.2 CFS SOFTWARE
12.1.3 CFS USERS
12.1.4 CFS and DECnet
12.1.5 CFS and TIGHTLY-COUPLED SYSTEMS
12.1.6 Limitations
12.1.7 "Cluster Data Gathering"
12.1.8 Cluster GALAXY
12.2 PLACEMENT OF FILES
12.2.1 Update Files
12.2.2 Files on Served Disks
12.2.3 Mail Files
12.2.4 Sharing System Files
12.3 LOAD BALANCING
12.3.1 Dedicating Systems
12.3.2 Assigning Users to Systems
12.4 STRUCTURE NAMES
12.5 SYSTEM LOGICAL NAMES
12.6 SHARING STRUCTURES AMONG SYSTEMS
12.6.1 Sharing System Structures
12.6.2 Sharing the Login Structure
12.6.2.1 Creating the Login Structure
12.6.2.2 Enabling "Login Structure"
12.6.2.3 Disabling "Login Structure"
12.6.2.4 PS: and BS: Directories
12.7 RESTRICTING STRUCTURES TO ONE SYSTEM
12.8 DISMOUNTING STRUCTURES
12.9 MAKING THE CI UNAVAILABLE TO A SYSTEM
12.10 USING DUMPER
12.11 ERRORS
12.11.1 Communication Problems
12.11.2 Massbus Problems with Dual-Ported Disk Drives
12.12 SHUTTING DOWN A CFS SYSTEM
12.1 OVERVIEW
The Common File System (CFS) is a feature of TOPS-20 that allows users
from more than one system to simultaneously access files. Any
structure in the CFS configuration can be made available to any user
for reading or writing.
Each TOPS-20 system in the CFS configuration has its own operating
system, main memory, system structure, console, unit-record devices,
and processes to be scheduled and run. But the systems are linked
through a shared file system. This unified file system can be
composed of all the disk structures on all systems. These structures
appear to users as local to their own systems.
The main features of CFS are:
o It increases file accessibility. For example, if a system is
down for maintenance, users can log onto another system and
still access all files that do not depend on the down system
for access.
o It lets you adjust loads on systems by reassigning users as
loads require. (Or, users themselves may be allowed to
switch systems as they see fit.) These changes need not
result in file-access limitations.
o It lets you reduce the time that would be involved in
maintaining duplicate sets of files.
o It lets you save disk space by minimizing duplication of
files on different systems.
CFS (with Cluster GALAXY software) also lets users send jobs to
printers connected to any system in the configuration.
12.1.1 CFS HARDWARE
The following are typical CFS configurations:
+--------------+ +--------------+
| | | |
SYSTEM ----| DECSYSTEM-20 |-------DISK-------| DECSYSTEM-20 |-- SYSTEM
STRUCTURE | |-------DISK-------| | STRUCTURE
| +------| +------+ |
| | CI20 | | CI20 | |
+--------------+ +--------------+
\\ //
\\ //
\\ //
\\ //
\\ //
\\ //
\\ //
+----------------+
| |
| STAR |
| |
| COUPLER |
| |
+----------------+
|| ||
|| ||
+----------------+---DISK
| |---DISK
| HSC 50 |---DISK
| |---DISK
+----------------+
Figure 12-1: Two Systems with Massbus Disks and HSC50-based Disks
+--------------+ +--------------+
| | | |
SYSTEM ----| DECSYSTEM-20 |-------DISK-------| DECSYSTEM-20 |-- SYSTEM
STRUCTURE | |-------DISK-------| | STRUCTURE
| +------|-------DISK-------+------+ |
| | CI20 |-------DISK-------| CI20 | |
+--------------+ +--------------+
\\ //
\\ //
\\ //
\\ //
\\ //
\\ //
\\ //
+----------------+
| |
| STAR |
| |
| COUPLER |
| |
+----------------+
Figure 12-2: Two Systems with Massbus Disks
Star Coupler
The star coupler provides the physical interconnection for the CI
cables among DECSYSTEM-20s and HSC50s. The maximum distance between a
system and the star coupler is 45 meters.
A DECSYSTEM-20 can be connected to just one star coupler. That is, it
can be part of only one CFS cluster.
CI
The Computer Interconnect (CI) bus is the communications link used by
CFS. It also connects systems to HSC50-based disks (RA60s and RA81s).
In addition, it provides access to massbus disks for systems without a
direct connection to those disks, for example, to another system's
system structure.
Each system has four communications links to the star coupler. Two of
them are for transmitting data and the other two are for receiving
data. The redundant CI connections are used for increased
availability and performance. When one of the connections has failed
or is in use, the CI microcode chooses the other one for data
transmission. At start-up, TOPS-20 verifies that at least one set of
transmit and receive connections is working.
CI20
The CI20 port adapter provides the interface between the DECSYSTEM-20
and the CI bus. Only one CI20 is allowed per system.
Massbus Disks
Multisystem access may be granted to all massbus disks.
It is recommended that massbus disks intended to be shared be
dual-ported between two DECSYSTEM-20s (drive port switches placed in
the A/B position). With a two-system CFS cluster, this avoids the
overhead involved in file-server activity, as described later in this
section. However, the systems must be able to communicate with each
other over the CI; they must be connected to the same star coupler.
Otherwise, neither system will be allowed access to the disk. Thus,
the following configurations are not supported:
+--------------+ +--------------+
| | | |
| G | | H |
| |-------DISK-------| |
| | (A/B) | |
| | | |
+--------------+ +--------------+
+--------------+ +--------------+ +--------------+
| | | | | |
| G | | H | | I |
| |-------DISK-------| | | |
| | (A/B) | | | |
| | | | | |
+--------------+ +--------------+ +--------------+
| |
| |
| |
| |
| |
| |
--------------------------------- CI
+--------+ +--------+ +--------+ +--------+
| | | | | | | |
| G | | H | | I | | J |
| | | |-------DISK-------| | | |
| | | | (A/B) | | | |
| | | | | | | |
+--------+ +--------+ +--------+ +--------+
| | | |
| | | |
| | | |
| | | |
------------------------- CI ------------------------ CI
In the first two figures, systems G and H are not joined in a CFS
configuration. The same applies to systems H and I in the third
figure. TOPS-20 maintains the integrity of data on shared disks by
ensuring that the systems can, over the CI, coordinate accesses to
those disks.
Massbus disks not directly connected to a system are called "served
disks" because TOPS-20's MSCP (Mass Storage Control Protocol)
file-server facility makes this "outside" access possible. To enable
an outside path to a massbus disk, that is, to make it a served disk,
enter an ALLOW command in the n-CONFIG.CMD file, on a system to which
the disk drive is connected, in the form:
ALLOW <drive type> serial number
The drive type is one of the following: RP06, RP07, or RP20. You can
obtain the serial number with the command:
OPR>SHOW CONFIGURATION DISK-DRIVE<RET>
Note that TOPS-20 creates an RP20 serial number by adding 8000 to the
disk drive unit number. Therefore, RP20 unit numbers should be unique
among CFS systems.
To disallow served access to a disk that was previously allowed, enter the
following command in the n-CONFIG.CMD file:
RESTRICT <drive type> serial number
Disks are RESTRICTED by default if you do not specify ALLOW commands.
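For example, the following n-CONFIG.CMD entries (the serial numbers shown
are hypothetical) allow served access to an RP06 and to the RP20 on unit 3,
whose serial number is therefore 8003, and explicitly restrict a second RP06:
ALLOW RP06 5073
ALLOW RP20 8003
RESTRICT RP06 5074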
NOTE
Disks that make up the system structure must not be
dual ported to another TOPS-20 system.
12.1.2 CFS SOFTWARE
Intersystem communication is an integral part of CFS. When TOPS-20
starts up, it makes a CFS connection with each TOPS-20 system that is
already running. This establishes the contact necessary for
intersystem file-system management.
Although file sharing appears simultaneous to users, only one system at
a time writes to a given 256K section of a file. When a system needs
write access to a file section, it
broadcasts a request for that resource to all systems it has
established contact with. If another system already owns the desired
write access, that system will respond negatively. Clearance will be
granted to the requesting system only after the other system has
completed the write operation by writing the data back to disk from
its memory. Thus, systems negotiate for write access to files and
keep each other informed of the state of the disks that they share.
This ensures the integrity of data on those disks.
Because intersystem communication is vital to CFS operations, the
systems stay alert to CI problems and to other indications that they
may have lost contact with each other. Section 12.11.1, Communication
Problems, discusses the actions that systems take when there is a
breakdown in communications.
The INFORMATION CLUSTER command displays the names of HSC50s and CFS
systems that are currently accessible.
DATE and TIME
When a CFS system starts up, it takes the date and time from the
systems that are already running. The operator is not prompted for
this information. Instead, the system types a message similar to the
following on the operator's terminal:
The date and time is: Wednesday, 11-MAY-1988 9:38AM
This typeout serves as a check on the date and time. If no other
system is running, the operator is prompted for the information.
When the date and time are changed on any CFS system, such as with the
^ESET command, all other systems are notified so that they can
re-synchronize. This synchronization ensures that the creation date
and time of files written from one system are consistent with the
other CFS systems. Otherwise, many programs that use this information
could malfunction.
12.1.3 CFS USERS
CFS is transparent to users:
o Users are normally unaware that someone from another system
may be accessing a file at the same time that they are,
except in such cases as the following. A file being read on
system "A" will prevent its being renamed on system "B."
o Users are not required to know about the CFS configuration.
Specifically, they do not need to know how massbus disks are
ported. To access files, all they need to know are structure
names, as on non-CFS systems.
The INFORMATION CLUSTER command lets users know what HSC50s and
TOPS-20 systems are currently accessible to their systems.
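For example, a user can type the following at command level (the exact
output depends on the cluster configuration and is not shown here):
@INFORMATION CLUSTER<RET>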
12.1.4 CFS and DECnet
A CFS configuration differs from a DECnet network. Although a CFS
configuration comprises multiple independent systems, the systems
share a unified file system and cooperate in its operation. They
function more as a single system than as systems merely communicating.
If the optional DECnet-20 software is installed, each CFS system
running DECnet is a DECnet network node with its own node name.
The files in CFS disk structures may be accessible to remote systems
by way of such DECnet facilities as NFT. However, a node name is
needed to access files in this way. CFS users, on the other hand, do
not need to specify node names.
All systems in a CFS configuration must be TOPS-20 systems. In a
DECnet network, however, other systems that support DECnet can be
included.
DECnet on a system allows access to other CFS clusters as well as
DECnet communication between systems in a cluster (for example, with
the SET HOST command).
Table 12-1: Comparison of CFS and DECnet
__________________________________________________________________
Characteristic                          CFS      DECnet
__________________________________________________________________
Multiple systems                         X          X
TOPS-20 systems only                     X
One file system                          X
Node name in file spec                              X
DECnet software                                     X
CI                                       X          X
NI                                                  X
__________________________________________________________________
12.1.5 CFS and TIGHTLY-COUPLED SYSTEMS
A CFS cluster also differs from tightly-coupled multiprocessing
environments. Each CFS system has its own main memory, which is not
shared with another system. It also has its own system structure for
booting and swapping and may have its own public structure for logging
in. Also, CFS systems do not perform automatic load balancing. That
is, the CPUs do not relieve each other of processing during high job
loads. All jobs, including batch jobs, run only on the computer that
the user logs onto.
12.1.6 Limitations
CFS does not coordinate use of the following facilities across
systems: IPCF and OPENF OF%DUD. As an example, a DBMS application
cannot span multiple systems, because DBMS uses the OPENF OF%DUD
facility. Therefore, such applications should be restricted to a
single system. Attempts to cross systems using these facilities will
generate error messages.
CFS allows for shared disk files and line printers. However, it does
not provide for shared magnetic tapes.
12.1.7 "Cluster Data Gathering"
The "cluster data gathering" system application (CLUDGR) is enabled by
default in the n-CONFIG.CMD file. This "SYSAP" collects
cluster-related data so that, for example:
o Users can obtain information on remote systems in the cluster
by way of the SYSTAT command.
o Users can send messages throughout the cluster with the SEND
command.
o Operators can obtain scheduling information on remote systems
(SHOW SCHEDULER), receive structure status information from
system responses during remote structure dismounts, and send
messages to users throughout the cluster (^ESEND and SEND).
o System programmers can use the INFO% monitor call to obtain
information on remote cluster systems. (As described in
Section 11.1, you can control access to the INFO% monitor
call through the access control program.)
You can disable and enable user, operator, and programmer CLUDGR SYSAP
functions in the n-CONFIG.CMD file with the following commands:
DISABLE CLUSTER-INFORMATION
DISABLE CLUSTER-SENDALLS
ENABLE CLUSTER-INFORMATION
ENABLE CLUSTER-SENDALLS
NOTE
The CLUDGR SYSAP functions cannot be disabled for the
GALAXY components.
During timesharing, the operator can disable and enable these same
functions with the following privileged commands:
^ESET [NO] CLUSTER-INFORMATION
^ESET [NO] CLUSTER-SENDALLS
12.1.8 Cluster GALAXY
GALAXY is the TOPS-20 batch and spooling subsystem. In a cluster, it
lets operators:
o Dismount a structure from a single terminal in a cluster even
if the structure is mounted on more than one system in the
cluster. The dismount with removal process is automated.
o Mount structures on a remote system in the cluster.
o Set a structure exclusive from a single terminal in a cluster
even if the structure has been mounted on more than one
system in the cluster. This process is automated.
o Send messages to all users on remote systems in the cluster.
o Control cluster printers.
o Obtain remote information through most of the SHOW commands.
o Obtain the status of inter-system GALAXY DECnet connections.
Cluster GALAXY lets users:
1. Send jobs to cluster printers
2. Receive information on remote print requests
3. Cancel remote print requests
4. Receive notification of remote print job queuing and
completion
"Cluster GALAXY" requires DECnet, TOPS-20 version 7, and GALAXY
version 6.
You can disable cluster GALAXY on one or more systems by way of the
GALGEN dialog. This dialog can be run during or after system
installation, after which GALAXY must be rebuilt. However, with the
feature disabled, none of the remote functions listed above are
available; the operating environment is as it was in GALAXY version 5,
with TOPS-20 version 6.1. For example, to dismount a shared
structure, the operator had to give commands from a terminal on each
system on which the structure was mounted, and there was not a great
deal of remote information to help the operator with this activity.
This chapter assumes that the feature is enabled.
12.2 PLACEMENT OF FILES
This section offers guidelines for arranging files on CFS systems for
maximum performance and efficiency.
12.2.1 Update Files
Simultaneous shared writing to a file from multiple systems incurs the
most overhead of any CFS file access operation. This is because
systems involved in shared writing spend time seeking and granting
write permission and coordinating their moves in other ways.
Therefore, you might want to place the involved users on the same
system.
12.2.2 Files on Served Disks
For optimum performance, you should not place on served disks files
that require frequent access from multiple systems. This applies to
both reads and writes. MSCP file-server operations incur considerable
overhead, because the system with the direct connection acts as a disk
controller for the accessing system. Therefore, such files should
reside on HSC50 disks or, in a two-system CFS configuration, on
massbus disks dual ported between systems.
12.2.3 Mail Files
By default, users' mail files are created and updated in their
logged-in directories on the public structure. To access this mail,
users log in and issue appropriate mail commands. They may have to go
through this login procedure for every system that contains mail for
them. You can change this default arrangement and simplify matters
for the CFS user who has accounts on multiple systems. By redefining
the systemwide logical name POBOX:, as described in Section 3.3.9, you
can establish a central location on a sharable structure for all mail
files in the CFS configuration. Then, no matter where users log in,
the mail facility sees an accumulation of mail that could have been
addressed to them at any system in the configuration. Mail is no
longer isolated on individual public structures.
An added advantage to redefining POBOX: is that public structures do
not fill up with mail files.
You must create a directory on the structure defined by POBOX: for
every user in the CFS configuration who is to receive mail.
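For example, assuming a sharable structure named MAIL: (a hypothetical
name) is reserved for this purpose, POBOX: could be redefined on every
system with a DEFINE command of the following form, as described in
Section 3.3.9:
DEFINE POBOX: (AS) MAIL: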
12.2.4 Sharing System Files
Most of the files that normally reside on system structures can be
moved to a shared structure. Rather than duplicate files in such
areas as SYS: and HLP: across systems, you can keep one set of these
files on a shared structure. This saves disk space and eases the task
of maintaining the files. Also, time and tape are saved during DUMPER
backup and restore operations. Because system files are primarily
read and not often updated, system performance does not suffer because
of this file sharing, provided the structure is not on a served disk.
If you consolidate system files, remember to include in the
definitions for the systemwide logical names the structures that
contain the files. For example, if the SYS: files reside on the
structure COMBO:, the definition for SYS: would be:
DEFINE SYS: (AS) COMBO:<NEW-SUBSYS>, COMBO:<SUBSYS>,-<RET>
MAIN:<NEW-SUBSYS>, MAIN:<SUBSYS><RET>
where:
MAIN: is the name of a system structure
You should define structures in this way on all the systems, giving
the appropriate system structure name. Make sure that the shared
structure or structures are mounted UNREGULATED so that users will be
able to access the files without having to give a MOUNT command.
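For example, on a second system whose own system structure is named
OTHER: (a hypothetical name), the corresponding definition would be:
DEFINE SYS: (AS) COMBO:<NEW-SUBSYS>, COMBO:<SUBSYS>,-<RET>
OTHER:<NEW-SUBSYS>, OTHER:<SUBSYS><RET>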
The drawback to sharing system files is that if there is trouble with
the shared structure, users on all systems suffer.
Most of the SYSTEM: files must remain on the system structures, so
sharing these files is not recommended.
START-UP FILES
Certain files must remain on each system structure. These files are
involved in system start-up and are required before a non-system
structure is made available to a system. The following files should
remain in each <SYSTEM> area:
7-CONFIG.CMD
7-PTYCON.ATO
7-SETSPD.EXE
7-SYSJOB.EXE
7-SYSJOB.RUN
ACCOUNTS-TABLE.BIN
CHECKD.EXE
DEVICE-STATUS.BIN
DUMP.EXE
ERRMES.BIN
EXEC.EXE
HOSTS.TXT
IPADMP.EXE
IPALOD.EXE
KNILDR.EXE
MONITR.EXE
MONNAM.TXT
TGHA.EXE
In addition, all the GALAXY files should remain in each <SUBSYS> area.
These files come from the GALAXY saveset on the TOPS-20 Distribution
Tape. (Refer to the TOPS-20 KL Model B Installation Guide.)
Command files that are used at your installation during start-up also
should be kept on each system structure. These files include
SYSTEM.CMD and NETWORK.CMD.
12.3 LOAD BALANCING
This section discusses the distribution of jobs across CFS systems.
12.3.1 Dedicating Systems
One way to balance loads is to establish the types of jobs that will
run on particular systems. For example, you might relegate batch jobs
to one system, freeing other systems to run interactive jobs
unimpeded. To encourage users to adopt this arrangement, you could
give batch jobs the lowest priority on all but the batch-designated
system. Users will have to wait a relatively long time for completion
of batch jobs on non-batch systems. Refer to Section 10.2, SCHEDULING
LOW PRIORITY TO BATCH JOBS, for further information.
Conversely, on the batch system, you could accord batch jobs the
highest priority. Refer to Section 10.1.4, Procedures to Turn On the
Class Scheduler, for details. Dedicating a system in this manner is
especially useful when there are many long-running batch jobs at an
installation.
Another suggestion is to put software development jobs on one system
and production jobs on another. Also, you may want to keep one system
lightly loaded for critical jobs.
DBMS applications and programming applications requiring IPCF
facilities must be confined to one system. These are other items to
consider if you choose to establish certain uses for particular
systems.
Keep in mind that users must log onto the systems that are to run
their particular jobs. This applies to batch jobs also (without
DECnet). Batch jobs must be submitted by a user logged in on the
system where they are to run. The control and log files may reside on
shared disks.
12.3.2 Assigning Users to Systems
In the CFS environment, much of the load balancing is expected to be
performed by users. The systems, for example, do not detect that one
CPU is overburdened and that another one is underutilized and,
accordingly, reassign users' jobs. Instead, users themselves could
determine whether or not they should log off a system and log onto
another one when system response is slow. Such user tools as the
SYSTAT and INFORMATION SYSTEM commands and the CTRL/T function can
help users in this area. These tools report on the current state of a
system. Among the items reported are the number of jobs running on a
system, load averages, the current setting of the bias control "knob,"
and whether batch jobs are assigned to a special class. This
information can be obtained for all systems in the configuration, not
just for the user's logged-in system.
If you choose this load balancing scheme, you should create
directories for all users on all the system structures in the CFS
configuration. Also, directory usernames should be unique throughout
the configuration, as described below. Then, users can log onto any
system with no problem.
USERNAMES
Directory usernames should be unique throughout the CFS configuration.
For example, there should be only one user with the username <BROWN>
at an installation. This lets users access system resources without
encountering password-related obstacles or causing security breaches.
If two users on different systems have the same usernames but
different passwords, their passwords will be invalid when they switch
systems. If these same users should by chance have the same
passwords, they will have complete access to each other's files when
they switch systems. Also, if a structure is mounted on both systems
as domestic, neither user will have to give a password when accessing
the directory on that structure that has their username. (Refer to
Section 4.5.7, Mounting Structures from Another Installation, for a
discussion of foreign and domestic structures.)
DIRECTORY AND USER GROUPS
To facilitate user access to CFS files, you could make directory and
user group numbers consistent on all structures. That way, users
could change structures or systems and their access attempts would
have predictable outcomes.
12.4 STRUCTURE NAMES
Because the structures on all systems are part of a unified file
system, structure names must be unique throughout the CFS
configuration.
If it is necessary to mount structures with duplicate names, the
operator should mount one of the structures using an alias. (Refer to
Section 4.5.2, Mounting Structures Having the Same Name.) The system
recognizes a structure by its alias, which is the same as the
permanent structure identification name, unless otherwise specified.
Note that everyone throughout the CFS configuration must refer to a
structure by the same alias.
12.5 SYSTEM LOGICAL NAMES
Logical names are implemented differently from structure names and
their aliases. Logical names are local definitions that need be
neither unique nor consistent throughout the CFS configuration. Thus, the
same logical name on two different systems can refer to two completely
different disk areas. However, because users are likely to be mobile,
systemwide logical names should be consistent across systems. This
will avoid confusion for users who switch systems.
Refer to Section 3.3, SYSTEM-LOGICAL NAMES, for further information.
12.6 SHARING STRUCTURES AMONG SYSTEMS
By default, all structures in the CFS configuration are accessible to
all systems, provided outside paths have been established for massbus
disks where necessary, using the ALLOW command (refer to Section
12.1.1, CFS HARDWARE). It is necessary to "mount" a structure on any
system that is to access files on it, however. That is, the operator
or a user on that system must issue a MOUNT command for the structure.
(There can be up to 64 structures online on one system.) After a
structure is mounted on a system, users can access it as on non-CFS
systems. Users have automatic access to their public structure files,
as on non-CFS systems.
If a structure has been restricted to a system through previous use of
the operator command, SET STRUCTURE str: EXCLUSIVE, it can be made
sharable again with the SET STRUCTURE str: SHARED command. The
operator issues this command from a terminal running OPR on the system
that has exclusive use of the structure. Then, MOUNT commands can be
issued for the structure that has been made sharable. The default
setting for structures is sharable.
The operator command, SHOW STATUS STRUCTURE, indicates the shared or
exclusive status for all structures known to a system.
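For example, assuming a structure named ADMIN: (a hypothetical name) had
previously been set exclusive, the operator on the owning system could
make it sharable again and then confirm its status:
OPR>SET STRUCTURE ADMIN: SHARED<RET>
OPR>SHOW STATUS STRUCTURE<RET>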
STRUCTURE ATTRIBUTES
The operator specifies attributes for a structure with the SET
STRUCTURE command, as described in the TOPS-20 Operator's Command
Language Reference Manual. They are permanent settings that do not
revert to default values after system crashes and reloads.
Note that all systems need not have the same attributes in effect for
a structure. For example, one system can have a structure mounted as
foreign and regulated, and another system can have the same structure
mounted as domestic and unregulated. Except for SHARED and EXCLUSIVE,
attributes are on a single-system basis only.
12.6.1 Sharing System Structures
Bear in mind that when system structures are shared, privileged users
can create privileged accounts on any system structure, with the
^ECREATE command. This may or may not be desirable.
12.6.2 Sharing the Login Structure
In a CFS-20 cluster, it may be advantageous to set up a homogeneous
environment where all user accounts reside on a shared "login
structure". Then, you do not need to maintain an account on every
system to which a user has access.
12.6.2.1 Creating the Login Structure - To create this shared
structure, give the following command to the CHECKD program for every
system in the cluster:
CHECKD>ENABLE LOGIN-STRUCTURE (FOR STRUCTURE) str: (FOR CPU) cpu<RET>
where: str is the name of the login structure and cpu is the system's
CPU serial number. The default serial number is the current system's.
This command adds the CPUs' serial numbers to the login structure's
home blocks. The following command displays all the serial numbers
that were entered into the blocks:
CHECKD>SHOW (INFORMATION FOR) LOGIN-SERIAL-NUMBERS (FOR STRUCTURE)
str:<RET>
where: str is the name of the login structure
You should create a directory on this structure for each user in the
cluster. (You can also put any other kind of directory on the login
structure.) Users' LOGIN.CMD files and their .INIT files for various
system programs should reside in these directories.
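For example, to record two systems whose CPU serial numbers are 2102 and
2103 (hypothetical values) in the home blocks of a login structure named
LOGIN:, you could give CHECKD the following commands and then display the
result:
CHECKD>ENABLE LOGIN-STRUCTURE (FOR STRUCTURE) LOGIN: (FOR CPU) 2102<RET>
CHECKD>ENABLE LOGIN-STRUCTURE (FOR STRUCTURE) LOGIN: (FOR CPU) 2103<RET>
CHECKD>SHOW (INFORMATION FOR) LOGIN-SERIAL-NUMBERS (FOR STRUCTURE) LOGIN:<RET>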
12.6.2.2 Enabling "Login Structure" - The system looks for user
accounts on the login structure rather than on the boot structure when
you enter the following command in the n-CONFIG.CMD file:
ENABLE LOGIN-STRUCTURE
12.6.2.3 Disabling "Login Structure" - You can disable the "login
structure" feature with the following commands:
n-CONFIG.CMD file:
DISABLE LOGIN-STRUCTURE
CHECKD program:
CHECKD>DISABLE LOGIN-STRUCTURE (FOR STRUCTURE) str: (FOR CPU) cpu<RET>
where: str is the name of the shared structure and cpu is the
system's CPU serial number. The default serial number is the current
system's.
These commands cause the system to look for user accounts on the boot
structure, which is the default condition.
12.6.2.4 PS: and BS: Directories - Before you enable the "login
structure" feature, the public structure (PS:) is the boot structure
(BS:), and is also known as the system structure.
After you enable the feature, the system considers PS: to be the
login structure, the structure that contains all the user login
directories.
The special system directories, described in Section 3.2, must remain
on BS:, although you may choose to move many of their files to other
directories. However, the files listed in Section 12.2.4 must remain
on the boot structure in <SYSTEM>.
Also, the GALAXY components write files to SPOOL:, which the system
defines at startup to be BS:<SPOOL>.
NOTE
Except where noted, this manual assumes that you have
not enabled the "login structure" feature.
12.7 RESTRICTING STRUCTURES TO ONE SYSTEM
There may be times when you want to restrict use of a structure to a
particular system. Such a structure might be used for DBMS
applications (refer to Section 12.1.6, Limitations), or security
measures may call for restricted use. For whatever reason, the
operator restricts a structure with the following command:
OPR> SET STRUCTURE str: EXCLUSIVE<RET>
When the operator gives this command, the system first checks to see
that the structure is not in use on other systems. If it is, the
operator is given a list of those systems and asked whether or not
this system should proceed with an automatic remote dismount of the
structure from those systems (with the NO-REMOVAL option). This
information and the automatic dismount require cluster GALAXY to be
enabled. Ideally, the operator should beforehand follow the normal
dismount procedure of making the structure unavailable to new users
and notifying existing users of the pending dismount. The structure
should be kept unavailable for all systems except the exclusive one so
that the structure will not be inadvertently shared when the owning
system crashes.
After a structure has been dismounted from other systems, the SET
STRUCTURE EXCLUSIVE command can take effect. It remains in effect on
the system, as do all SET STRUCTURE specifications, throughout crashes
and reloads. If users give the MOUNT command for a structure that is
exclusive to another system, an error message will be issued,
indicating that the structure is unavailable.
Note that any system can have exclusive use of any sharable structure
except another system's system structure.
Refer to the TOPS-20 Operator's Guide for details on setting
structures exclusive.
12.8 DISMOUNTING STRUCTURES
When issuing a DISMOUNT command for a structure, operators have the
option of specifying that the structure be physically removed from a
disk drive. In the CFS environment, however, the system first ensures
that the structure is not in use on other systems. If a structure is
mounted on another system, the operator is notified and must go
through the normal procedure of dismounting the structure (with the
NO-REMOVAL option) from that system.
Note that with cluster GALAXY enabled, the operator can dismount
structures remotely by performing all activities from an OPR terminal
on the local system. The operator does not need to log onto any other
system.
Throughout the dismount process, the operator receives various
informational messages as well as error messages if, for example, the
system cannot get an exclusive lock on the structure by way of the
ENQ% monitor call or communicate with nodes on which the structure is
mounted. (The system cannot communicate with nodes that have the
cluster GALAXY feature disabled.)
Refer to the TOPS-20 Operator's Guide for details on dismounting
structures.
The default setting on CFS systems is for a structure to be dismounted
with the no-removal option.
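For example, a dismount that leaves the structure ADMIN: (a hypothetical
name) on its drive, that is, a dismount with the default no-removal
option, could be requested with:
OPR>DISMOUNT STRUCTURE ADMIN:<RET>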
Sometimes the system instructs the operator to dismount structures.
This can occur for the following reasons:
o The operator attempts to shut down a system.
o The operator attempts to make the CI unavailable to a system.
o A system has been granted exclusive use of a structure.
o A structure has been physically dismounted from another
system.
Refer to Sections 12.7, 12.9, and 12.12 for details.
These dismount instructions appear if you have included the ENABLE
JOB0-CTY-OUTPUT command in the n-CONFIG.CMD file.
12.9 MAKING THE CI UNAVAILABLE TO A SYSTEM
Ordinarily, you need do nothing at all to operate the CI. However,
you may need to disengage a system from the CI so Field Service
personnel can diagnose and/or correct problems with the CI20 or the
HSC50. Or, you may wish to remove a system from the CFS
configuration. At those times, you should instruct the operator to
make the CI unavailable by means of the SET PORT CI UNAVAILABLE
command. (Refer to the TOPS-20 Operator's Guide for details.)
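The command parallels the SET PORT CI AVAILABLE command shown later in
this section:
OPR>SET PORT CI UNAVAILABLE<RET>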
When the CI is unavailable to a system, users cannot access
multi-access disks (dual-ported disks, HSC50-based disks, or served
disks on other systems). These disks rely on the CI to coordinate
accesses and/or to transmit data. Served disks on the system
disengaging from the CI will be unavailable to other systems.
Dual-ported massbus disks in the A/B position will have to be powered
down and switched to one system.
When the operator gives the SET PORT CI UNAVAILABLE command, the
system indicates the structures that need to be dismounted and the
disk drives that need to be made unavailable. The operator is advised
to follow the normal procedures of forewarning users before
dismounting structures and making disk drives unavailable. The
command option to forcibly disengage a system from the CI should be
reserved for emergencies. If the operator determines that disengaging
from the CI will be too disruptive to users, the operator has the
option of aborting the procedure.
To put the CI back in operation, the operator gives the command:
OPR>SET PORT CI AVAILABLE<RET>
The operator is then asked if any other TOPS-20 system is running on
the CI. If yes, the system rejoining the CFS configuration must be
rebooted. If no, the CI20 will be reloaded and started. If the
operator answers "no" and another TOPS-20 system is found after the
CI20 has started, a CFRECN BUGHLT is issued on processors with lower
serial numbers than the system joining the cluster and on this
processor also, if there's a system in the cluster with a higher
serial number. (See Section 12.11.1 for details.) After the system
rejoins the configuration, structures that were affected when the CI
was made unavailable will need to be remounted.
12.10 USING DUMPER
CFS offers operators and users flexibility in saving and restoring
disk files. The only restriction is that DUMPER must be running on a
system to which tape drives are attached. Tape drives are not served
through CFS.
12.11 ERRORS
This section discusses the actions you or the operator take when
errors occur in the CFS environment. It also describes how CFS
systems react to various errors. Note that there is no single
hardware or software point that can disable the whole configuration.
For example, systems can start up or crash with little impact on other
systems.
12.11.1 Communication Problems
CFS systems are sensitive to breaks in communication, whether they are
caused by CI20 errors or system crashes. Because the data integrity
of shared structures depends on unbroken intersystem contact, the
systems take quick action to prevent data corruption. Therefore, you
may observe any of the following when systems lose contact with each
other. These should be rare occurrences.
o For a period of time calculated as 5 seconds per node (hosts
and HSCs), no system in the configuration can access any
multi-access disks (dual-ported disks, HSC50-based disks,
served disks on other systems).
This interval allows each system to check that its own CI20
and segment of the CI bus are working. Most likely, some
system's CI20 microcode has stopped and is automatically
reloaded during the interval, or a system has crashed.
(There may be other, unpredictable reasons for CI
disruption.) Jobs that were accessing multi-access disks are
suspended until data integrity is assured.
If the CI20 and CI bus are working before the end of the
interval, the system can resume accessing all multi-access
disks except served disks on a crashed system.
o A system crashes with a KLPNRL BUGHLT. This happens if the
CI20 microcode takes longer to reload than 10 seconds. This
BUGHLT is expected to occur rarely, because the microcode
should be reloaded within a couple of seconds.
o If communication resumes after the interval mentioned at the
beginning of this section, without the faulty system having
crashed and restarted, the system with the lower serial
number crashes with a CFRECN BUGHLT message as the faulty
system tries to establish contact with each running system.
That is, a system joining the cluster illegally will crash
any system already in the cluster with a lower CPU serial
number. The node itself will crash if there is already a
node in the cluster with a higher CPU serial number. For
example, this occurs when the SET PORT CI AVAILABLE command
has caused communication to resume incorrectly due to
operator error, as described in Section 12.9, MAKING THE CI
UNAVAILABLE TO A SYSTEM.
With such a delayed reconnection, a system is likely to
contain old, invalid information about the status of
multi-access disks. This is because other systems are
allowed to access the disks after the interval, believing a
faulty system is no longer running. Therefore, systems are
selected to crash so that a fresh database can be established
for the disks when the systems restart.
EXAMPLE
There are four systems in a cluster with serial numbers one
through four:
System          Serial Number
A                     1
B                     2
C                     3
D                     4
System B leaves the cluster and then tries to rejoin after
the delay allowance has expired. System A crashes because
its serial number is lower than system B's. System B crashes
when it tries to establish contact with either systems C or
D, whose serial numbers are higher than B's. Systems C and D
remain running.
AT STARTUP
Sometimes, communication problems begin at system startup. A system
that has just started up tries to communicate with each TOPS-20 system
and HSC that is already running. After the SETSPD program sets the
systemwide defaults, a system joining the CFS cluster checks to make
sure that:
o Its own CI20 is working. If there is any malfunction, the
"CFS joining" process is aborted. The system makes no
further attempts to communicate with other CFS nodes and
remains outside the CFS cluster.
o Its segment of the CI bus is working.
If the areas above are satisfactory, the system starting up then
checks to see if, for each cluster node:
o The CI20 at the remote node is in maintenance mode (TOPS-20
nodes only, not HSCs). If so, the system knows that it
cannot communicate with that node and tries to establish
contact with the next node.
o Its own CI20 driver has created a system block for the remote
node. The driver creates this block when the remote system
responds to its request for recognition. The system block
allows for a virtual circuit to be established between the
two systems, over which inter-node data and messages are sent
on the CI. If the block does not yet exist, the system sends
a message to the CTY so that the operator can take
appropriate action on the remote node. This situation
usually indicates a hardware problem with the remote system's
CI20.
If a system block has been created for a remote TOPS-20 node, the
system tries to establish a CFS connection with that node by way of
the virtual circuit. It is through CFS connections that systems
communicate in order to coordinate access to shared disks. If the
attempt fails, a message is sent to the CTY for operator action. This
situation usually indicates a software problem, most likely that
TOPS-20 is not running at the remote node.
If the attempt at communication is successful, a confirming BUGINF is
sent to the starting system's CTY.
A system starting up makes these communication checks for every other
node in the cluster.
Refer to the TOPS-20 Operator's Guide for details on the operator
information and error messages.
12.11.2 Massbus Problems with Dual-Ported Disk Drives
Dual-ported disk drives are accessed by each system through the
massbus hardware connections. However, if for some reason a massbus
path becomes unavailable to a system, the other system, with working
massbus connections, can provide access to the drives affected, with
the MSCP file server. The disks become "served."
The operator enables this facility by powering down the disks and
flipping the drive port switches from the A/B position to the position
that corresponds to the servicing system. Then the operator must
reboot the system with the faulty massbus link. These procedures are
required because a running system will never invoke the MSCP server
after identifying a massbus path for a disk. It is assumed that an
ALLOW command has been entered in the n-CONFIG.CMD file for the disk
drives, as described in Section 12.1.1, CFS Hardware.
The operator returns the switches to the A/B position when the massbus
problem is corrected. The PHYTPD BUGINF is then issued to confirm
that the massbus will now be used for data transmission.
12.12 SHUTTING DOWN A CFS SYSTEM
When an operator issues the ^ECEASE command to shut down a system,
the following procedure keeps outside jobs that may be accessing the
system's served disks from hanging. If any served disks have been
mounted from other CFS systems, the operator is warned to check those
systems for possible structure dismounting instructions.
Meanwhile, at the other systems, if any served disks belonging to the
system shutting down are mounted, the operator is warned of the pending
shutdown and is advised to dismount the structures listed.
In a CFS-20 cluster, a shutdown on one system causes "system going
down" messages to be transmitted to all systems in the cluster at the
sixty-minute, five-minute, and one-minute marks. For example, if SYSA
is shutting down, the following messages appear clusterwide:
[System SYSA going down in 60 minutes at 1-Dec-87 16:29:22]
[System SYSA going down in 5 minutes at 1-Dec-87 16:29:22]
[System SYSA going down in one minute!!]