Common File System
From Computer History Wiki
The following excerpt from:

    TOPS-20 System Manager's Guide, June 1990, TOPS-20 (KL Model B) Version 7.0

explains in detail how the Common File System (CFS) works on the DEC PDP-10 (DECSYSTEM-20) computer running the TOPS-20 operating system. CFS uses the HSC50 Hierarchical Storage Controller and the CI20 Computer Interconnect port adapter for communication between PDP-10s.
THE COMMON FILE SYSTEM

CONTENTS

12.1      OVERVIEW
12.1.1    CFS HARDWARE
12.1.2    CFS SOFTWARE
12.1.3    CFS USERS
12.1.4    CFS and DECnet
12.1.5    CFS and TIGHTLY-COUPLED SYSTEMS
12.1.6    Limitations
12.1.7    "Cluster Data Gathering"
12.1.8    Cluster GALAXY
12.2      PLACEMENT OF FILES
12.2.1    Update Files
12.2.2    Files on Served Disks
12.2.3    Mail Files
12.2.4    Sharing System Files
12.3      LOAD BALANCING
12.3.1    Dedicating Systems
12.3.2    Assigning Users to Systems
12.4      STRUCTURE NAMES
12.5      SYSTEM LOGICAL NAMES
12.6      SHARING STRUCTURES AMONG SYSTEMS
12.6.1    Sharing System Structures
12.6.2    Sharing the Login Structure
12.6.2.1  Creating the Login Structure
12.6.2.2  Enabling "Login Structure"
12.6.2.3  Disabling "Login Structure"
12.6.2.4  PS: and BS: Directories
12.7      RESTRICTING STRUCTURES TO ONE SYSTEM
12.8      DISMOUNTING STRUCTURES
12.9      MAKING THE CI UNAVAILABLE TO A SYSTEM
12.10     USING DUMPER
12.11     ERRORS
12.11.1   Communication Problems
12.11.2   Massbus Problems with Dual-Ported Disk Drives
12.12     SHUTTING DOWN A CFS SYSTEM

12.1 OVERVIEW

The Common File System (CFS) is a feature of TOPS-20 that allows users from more than one system to access files simultaneously. Any structure in the CFS configuration can be made available to any user for reading or writing.

Each TOPS-20 system in the CFS configuration has its own operating system, main memory, system structure, console, unit-record devices, and processes to be scheduled and run. But the systems are linked through a shared file system. This unified file system can be composed of all the disk structures on all systems. These structures appear to users as local to their own systems.

The main features of CFS are:

o  It increases file accessibility. For example, if a system is down for maintenance, users can log onto another system and still access all files that do not depend on the down system for access.

o  It lets you adjust loads on systems by reassigning users as loads require. (Or, users themselves may be allowed to switch systems as they see fit.) These changes need not result in file-access limitations.

o  It lets you reduce the time that would be involved in maintaining duplicate sets of files.

o  It lets you save disk space by minimizing duplication of files on different systems.

CFS (with Cluster GALAXY software) also lets users send jobs to printers connected to any system in the configuration.

12.1.1 CFS HARDWARE

The following are typical CFS configurations:

[Figure 12-1: Two Systems with Massbus Disks and HSC50-based Disks. Two DECSYSTEM-20s, each with its own system structure and a CI20 port adapter, share dual-ported massbus disks directly and connect through the star coupler to an HSC50 that serves additional disks.]

[Figure 12-2: Two Systems with Massbus Disks. Two DECSYSTEM-20s, each with its own system structure and a CI20 port adapter, share dual-ported massbus disks and are connected to each other through the star coupler.]

Star Coupler

The star coupler provides the physical interconnection for the CI cables among DECSYSTEM-20s and HSC50s. The maximum distance between a system and the star coupler is 45 meters.
A DECSYSTEM-20 can be connected to just one star coupler. That is, it can be part of only one CFS cluster.

CI

The Computer Interconnect (CI) bus is the communications link used by CFS. It also connects systems to HSC50-based disks (RA60s and RA81s). In addition, it provides access to massbus disks for systems without a direct connection to those disks, for example, to another system's system structure.

Each system has four communications links to the star coupler. Two of them are for transmitting data and the other two are for receiving data. The redundant CI connections are used for increased availability and performance. When one of the connections has failed or is in use, the CI microcode chooses the other one for data transmission. At start-up, TOPS-20 verifies that at least one set of transmit and receive connections is working.

CI20

The CI20 port adapter provides the interface between the DECSYSTEM-20 and the CI bus. Only one CI20 is allowed per system.

Massbus Disks

Multisystem access may be granted to all massbus disks. It is recommended that massbus disks intended to be shared be dual-ported between two DECSYSTEM-20s (drive port switches placed in the A/B position). With a two-system CFS cluster, this avoids the overhead involved in file-server activity, as described later in this section. However, the systems must be able to communicate with each other over the CI; they must be connected to the same star coupler. Otherwise, neither system will be allowed access to the disk. Thus, the following configurations are not supported:

[Figure: systems G and H dual-port a disk (A/B) but have no CI connection between them.]

[Figure: systems G, H, and I, where G and H dual-port a disk (A/B) but are not both joined to the CI that connects the cluster.]

[Figure: systems G, H, I, and J on two separate CI segments (G-H and I-J), with a disk dual-ported (A/B) between H and I, which are on different segments.]

In the first two figures, systems G and H are not joined in a CFS configuration. The same applies to systems H and I in the third figure. TOPS-20 maintains the integrity of data on shared disks by ensuring that the systems can, over the CI, coordinate accesses to those disks.

Massbus disks not directly connected to a system are called "served disks" because TOPS-20's MSCP (Mass Storage Control Protocol) file-server facility makes this "outside" access possible. To enable an outside path to a massbus disk, that is, to make it a served disk, enter an ALLOW command in the n-CONFIG.CMD file, on a system to which the disk drive is connected, in the form:

ALLOW <drive type> serial number

The drive type is one of the following: RP06, RP07, or RP20. You can obtain the serial number with the command:

OPR>SHOW CONFIGURATION DISK-DRIVE<RET>

Note that TOPS-20 creates an RP20 serial number by adding 8000 to the disk drive unit number. Therefore, RP20 unit numbers should be unique among CFS systems.

To disallow access to a served disk that was previously allowed, enter the following command in the n-CONFIG.CMD file:

RESTRICT <drive type> serial number

Disks are RESTRICTED by default if you do not specify ALLOW commands.
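For illustration, the corresponding n-CONFIG.CMD entries on the system that owns the drives might look like the following. The drive types and serial numbers here are hypothetical; use the values reported by SHOW CONFIGURATION DISK-DRIVE, and remember that an RP20's "serial number" is its unit number plus 8000.

ALLOW RP06 3754
ALLOW RP20 8003
RESTRICT RP07 1022

With these entries, the two ALLOWed drives can be reached from other cluster systems through the MSCP file server, while the RESTRICTed drive can be accessed only by systems with a direct massbus connection to it.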
NOTE

Disks that make up the system structure must not be dual ported to another TOPS-20 system.

12.1.2 CFS SOFTWARE

Intersystem communication is an integral part of CFS. When TOPS-20 starts up, it makes a CFS connection with each TOPS-20 system that is already running. This establishes the contact necessary for intersystem file-system management.

Only one system at a time actually writes to a 256K section of a file. When a system needs write access to a file section, it broadcasts a request for that resource to all systems it has established contact with. If another system already owns the desired write access, that system responds negatively. Clearance is granted to the requesting system only after the other system has completed the write operation by writing the data back to disk from its memory. Thus, systems negotiate for write access to files and keep each other informed of the state of the disks that they share. This ensures the integrity of data on those disks.

Because intersystem communication is vital to CFS operations, the systems stay alert to CI problems and to other indications that they may have lost contact with each other. Section 12.11.1, Communication Problems, discusses the actions that systems take when there is a breakdown in communications.

The INFORMATION CLUSTER command displays the names of HSC50s and CFS systems that are currently accessible.

DATE and TIME

When a CFS system starts up, it takes the date and time from the systems that are already running. The operator is not prompted for this information. Instead, the system types a message similar to the following on the operator's terminal:

The date and time is: Wednesday, 11-MAY-1988 9:38AM

This typeout serves as a check on the date and time. If no other system is running, the operator is prompted for the information.

When the date and time are changed on any CFS system, such as with the ^ESET command, all other systems are notified so that they can re-synchronize. This synchronization ensures that the creation dates and times of files written from one system are consistent with the other CFS systems. Otherwise, many programs that use this information could malfunction.

12.1.3 CFS USERS

CFS is transparent to users:

o  Users are normally unaware that someone from another system may be accessing a file at the same time that they are, except in such cases as the following: a file being read on system "A" cannot be renamed on system "B."

o  Users are not required to know about the CFS configuration. Specifically, they do not need to know how massbus disks are ported. To access files, all they need to know are structure names, as on non-CFS systems.

The INFORMATION CLUSTER command lets users know what HSC50s and TOPS-20 systems are currently accessible to their systems.

12.1.4 CFS and DECnet

A CFS configuration differs from a DECnet network. Although a CFS configuration comprises multiple independent systems, the systems share a unified file system and cooperate in its operation. They function more as a single system than as systems merely communicating.

If the optional DECnet-20 software is installed, each CFS system running DECnet is a DECnet network node with its own node name. The files in CFS disk structures may be accessible to remote systems by way of such DECnet facilities as NFT. However, a node name is needed to access files in this way. CFS users, on the other hand, do not need to specify node names.

All systems in a CFS configuration must be TOPS-20 systems.
In a DECnet network, however, other systems that support DECnet can be included. DECnet on a system allows access to other CFS clusters as well as DECnet communication between systems in a cluster (for example, with the SET HOST command).

Table 12-1: Comparison of CFS and DECnet

    Characteristic             CFS     DECnet
    -------------------------------------------
    Multiple systems            X        X
    TOPS-20 systems only        X
    One file system             X
    Node name in file spec               X
    DECnet software                      X
    CI                          X        X
    NI                                   X

12.1.5 CFS and TIGHTLY-COUPLED SYSTEMS

A CFS cluster also differs from tightly-coupled multiprocessing environments. Each CFS system has its own main memory, which is not shared with another system. It also has its own system structure for booting and swapping and may have its own public structure for logging in.

Also, CFS systems do not perform automatic load balancing. That is, the CPUs do not relieve each other of processing during high job loads. All jobs, including batch jobs, run only on the computer that the user logs onto.

12.1.6 Limitations

CFS does not coordinate use of the following facilities across systems: IPCF and OPENF OF%DUD. As an example, a DBMS application cannot span multiple systems, because DBMS uses the OPENF OF%DUD facility. Therefore, such applications should be restricted to a single system. Attempts to cross systems using these facilities will generate error messages.

CFS allows for shared disk files and line printers. However, it does not provide for shared magnetic tapes.

12.1.7 "Cluster Data Gathering"

The "cluster data gathering" system application (CLUDGR) is enabled by default in the n-CONFIG.CMD file. This "SYSAP" collects cluster-related data so that, for example:

o  Users can obtain information on remote systems in the cluster by way of the SYSTAT command.

o  Users can send messages throughout the cluster with the SEND command.

o  Operators can obtain scheduling information on remote systems (SHOW SCHEDULER), receive structure status information from system responses during remote structure dismounts, and send messages to users throughout the cluster (^ESEND and SEND).

o  System programmers can use the INFO% monitor call to obtain information on remote cluster systems. (As described in Section 11.1, you can control access to the INFO% monitor call through the access control program.)

You can disable and enable user, operator, and programmer CLUDGR SYSAP functions in the n-CONFIG.CMD file with the following commands:

DISABLE CLUSTER-INFORMATION
DISABLE CLUSTER-SENDALLS
ENABLE CLUSTER-INFORMATION
ENABLE CLUSTER-SENDALLS

NOTE

The CLUDGR SYSAP functions cannot be disabled for the GALAXY components.

During timesharing, the operator can disable and enable these same functions with the following privileged commands:

^ESET [NO] CLUSTER-INFORMATION
^ESET [NO] CLUSTER-SENDALLS

12.1.8 Cluster GALAXY

GALAXY is the TOPS-20 batch and spooling subsystem. In a cluster, it lets operators:

o  Dismount a structure from a single terminal in a cluster even if the structure is mounted on more than one system in the cluster. The dismount-with-removal process is automated.

o  Mount structures on a remote system in the cluster.

o  Set a structure exclusive from a single terminal in a cluster even if the structure has been mounted on more than one system in the cluster. This process is automated.

o  Send messages to all users on remote systems in the cluster.
o  Control cluster printers.

o  Obtain remote information through most of the SHOW commands.

o  Obtain the status of inter-system GALAXY DECnet connections.

Cluster GALAXY lets users:

1. Send jobs to cluster printers

2. Receive information on remote print requests

3. Cancel remote print requests

4. Receive notification of remote print job queuing and completion

"Cluster GALAXY" requires DECnet, TOPS-20 version 7, and GALAXY version 6. You can disable cluster GALAXY on one or more systems by way of the GALGEN dialog. This dialog can be run during or after system installation, and then GALAXY must be rebuilt. However, with the feature disabled, none of the remote functions listed above are available; the operating environment is as it was in GALAXY version 5, with TOPS-20 version 6.1. For example, to dismount a shared structure, the operator had to give commands from a terminal on each system on which the structure was mounted, and there was not a great deal of remote information to help the operator with this activity. This chapter assumes that the feature is enabled.

12.2 PLACEMENT OF FILES

This section offers guidelines for arranging files on CFS systems for maximum performance and efficiency.

12.2.1 Update Files

Simultaneous shared writing to a file from multiple systems incurs the most overhead of any CFS file access operation. This is because systems involved in shared writing spend time seeking and granting write permission and coordinating their moves in other ways. Therefore, you might want to place the users involved on the same system.

12.2.2 Files on Served Disks

For optimum performance, you should not place files that require frequent access from multiple systems on served disks. This applies to both reads and writes. MSCP file-server operations incur considerable overhead, because the system with the direct connection acts as a disk controller for the accessing system. Therefore, such files should reside on HSC50 disks or, in a two-system CFS configuration, on massbus disks dual ported between systems.

12.2.3 Mail Files

By default, users' mail files are created and updated in their logged-in directories on the public structure. To access this mail, users log in and issue appropriate mail commands. They may have to go through this login procedure for every system that contains mail for them.

You can change this default arrangement and simplify matters for the CFS user who has accounts on multiple systems. By redefining the systemwide logical name POBOX:, as described in Section 3.3.9, you can establish a central location on a sharable structure for all mail files in the CFS configuration. Then, no matter where users log in, the mail facility sees an accumulation of mail that could have been addressed to them at any system in the configuration. Mail is no longer isolated on individual public structures. An added advantage to redefining POBOX: is that public structures do not fill up with mail files.

You must create a directory on the structure defined by POBOX: for every user in the CFS configuration who is to receive mail.

12.2.4 Sharing System Files

Most of the files that normally reside on system structures can be moved to a shared structure. Rather than duplicate files in such areas as SYS: and HLP: across systems, you can keep one set of these files on a shared structure. This saves disk space and eases the task of maintaining the files. Also, time and tape are saved during DUMPER backup and restore operations.
Because system files are primarily read and not often updated, system performance does not suffer because of this file sharing, provided the structure is not on a served disk.

If you consolidate system files, remember to include in the definitions for the systemwide logical names the structures that contain the files. For example, if the SYS: files reside on the structure COMBO:, the definition for SYS: would be:

DEFINE SYS: (AS) COMBO:<NEW-SUBSYS>, COMBO:<SUBSYS>,-<RET>
MAIN:<NEW-SUBSYS>, MAIN:<SUBSYS><RET>

where: MAIN: is the name of a system structure

You should define structures in this way on all the systems, giving the appropriate system structure name. Make sure that the shared structure or structures are mounted UNREGULATED so that users will be able to access the files without having to give a MOUNT command.

The drawback to sharing system files is that if there is trouble with the shared structure, users on all systems suffer.

Most of the SYSTEM: files must remain on the system structures, so sharing these files is not recommended.

START-UP FILES

Certain files must remain on each system structure. These files are involved in system start-up and are required before a non-system structure is made available to a system. The following files should remain in each <SYSTEM> area:

7-CONFIG.CMD          CHECKD.EXE            IPADMP.EXE
7-PTYCON.ATO          DEVICE-STATUS.BIN     IPALOD.EXE
7-SETSPD.EXE          DUMP.EXE              KNILDR.EXE
7-SYSJOB.EXE          ERRMES.BIN            MONITR.EXE
7-SYSJOB.RUN          EXEC.EXE              MONNAM.TXT
ACCOUNTS-TABLE.BIN    HOSTS.TXT             TGHA.EXE

In addition, all the GALAXY files should remain in each <SUBSYS> area. These files come from the GALAXY saveset on the TOPS-20 Distribution Tape. (Refer to the TOPS-20 KL Model B Installation Guide.)

Command files that are used at your installation during start-up also should be kept on separate system structures. These files include SYSTEM.CMD and NETWORK.CMD.

12.3 LOAD BALANCING

This section discusses the distribution of jobs across CFS systems.

12.3.1 Dedicating Systems

One way to balance loads is to establish the types of jobs that will run on particular systems. For example, you might relegate batch jobs to one system, freeing other systems to run interactive jobs unimpeded. To encourage users to adopt this arrangement, you could give batch jobs the lowest priority on all but the batch-designated system. Users will then have to wait a relatively long time for completion of batch jobs on non-batch systems. Refer to Section 10.2, SCHEDULING LOW PRIORITY TO BATCH JOBS, for further information. Conversely, on the batch system, you could accord batch jobs the highest priority. Refer to Section 10.1.4, Procedures to Turn On the Class Scheduler, for details. Dedicating a system in this manner is especially useful when there are many long-running batch jobs at an installation.

Another suggestion is to put software development jobs on one system and production jobs on another. Also, you may want to keep one system lightly loaded for critical jobs. DBMS applications and programming applications requiring IPCF facilities must be confined to one system. These are other items to consider if you choose to dedicate particular systems to certain uses.

Keep in mind that users must log onto the systems that are to run their particular jobs. This applies to batch jobs as well (in the absence of DECnet): batch jobs must be submitted by a user logged in on the system where they are to run. The control and log files, however, may reside on shared disks.
12.3.2 Assigning Users to Systems

In the CFS environment, much of the load balancing is expected to be performed by users. The systems, for example, do not detect that one CPU is overburdened and another underutilized and, accordingly, reassign users' jobs. Instead, users themselves can determine whether they should log off one system and log onto another when system response is slow.

Such user tools as the SYSTAT and INFORMATION SYSTEM commands and the CTRL/T function can help users in this area. These tools report on the current state of a system. Among the items reported are the number of jobs running on a system, load averages, the current setting of the bias control "knob," and whether batch jobs are assigned to a special class. This information can be obtained for all systems in the configuration, not just for the user's logged-in system.

If you choose this load-balancing scheme, you should create directories for all users on all the system structures in the CFS configuration. Also, directory usernames should be unique throughout the configuration, as described below. Then, users can log onto any system with no problem.

USERNAMES

Directory usernames should be unique throughout the CFS configuration. For example, there should be only one user with the username <BROWN> at an installation. This lets users access system resources without encountering password-related obstacles or causing security breaches.

If two users on different systems have the same usernames but different passwords, their passwords will be invalid when they switch systems. If these same users should by chance have the same passwords, they will have complete access to each other's files when they switch systems. Also, if a structure is mounted on both systems as domestic, neither user will have to give a password when accessing the directory on that structure that has their username. (Refer to Section 4.5.7, Mounting Structures from Another Installation, for a discussion of foreign and domestic structures.)

DIRECTORY AND USER GROUPS

To facilitate user access to CFS files, you could make directory and user group numbers consistent on all structures. That way, users could change structures or systems and their access attempts would have predictable outcomes.

12.4 STRUCTURE NAMES

Because the structures on all systems are part of a unified file system, structure names must be unique throughout the CFS configuration. If it is necessary to mount structures with duplicate names, the operator should mount one of the structures using an alias. (Refer to Section 4.5.2, Mounting Structures Having the Same Name.) The system recognizes a structure by its alias, which is the same as the permanent structure identification name, unless otherwise specified. Note that everyone throughout the CFS configuration must refer to a structure by the same alias.

12.5 SYSTEM LOGICAL NAMES

Logical names are implemented differently from structure names and their aliases. Logical names are local definitions that need not be unique or consistent throughout the CFS configuration. Thus, the same logical name on two different systems can refer to two completely different disk areas. However, because users are likely to be mobile, systemwide logical names should be consistent across systems. This will avoid confusion for users who switch systems. Refer to Section 3.3, SYSTEM-LOGICAL NAMES, for further information.
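As a minimal sketch of such consistency, the same definition could appear in every system's n-CONFIG.CMD file. The structure name MISC: below is hypothetical; POBOX: is the mail-file location discussed in Section 12.2.3, and the definition follows the same DEFINE form shown for SYS: in Section 12.2.4:

DEFINE POBOX: (AS) MISC:

With an identical definition on each system, users who switch systems still find their mail in the same directories on the shared structure.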
12.6 SHARING STRUCTURES AMONG SYSTEMS

By default, all structures in the CFS configuration are accessible to all systems, provided outside paths have been established for massbus disks where necessary, using the ALLOW command (refer to Section 12.1.1, CFS HARDWARE). It is necessary, however, to "mount" a structure on any system that is to access files on it. That is, the operator or a user on that system must issue a MOUNT command for the structure. (There can be up to 64 structures online on one system.) After a structure is mounted on a system, users can access it as on non-CFS systems. Users have automatic access to their public structure files, as on non-CFS systems.

If a structure has been restricted to a system through previous use of the operator command SET STRUCTURE str: EXCLUSIVE, it can be made sharable again with the SET STRUCTURE str: SHARED command. The operator issues this command from a terminal running OPR on the system that has exclusive use of the structure. Then, MOUNT commands can be issued for the structure that has been made sharable. The default setting for structures is sharable.

The operator command SHOW STATUS STRUCTURE indicates the shared or exclusive status of all structures known to a system.

STRUCTURE ATTRIBUTES

The operator specifies attributes for a structure with the SET STRUCTURE command, as described in the TOPS-20 Operator's Command Language Reference Manual. They are permanent settings that do not revert to default values after system crashes and reloads. Note that all systems need not have the same attributes in effect for a structure. For example, one system can have a structure mounted as foreign and regulated, and another system can have the same structure mounted as domestic and unregulated. Except for SHARED and EXCLUSIVE, attributes are on a single-system basis only.

12.6.1 Sharing System Structures

Bear in mind that when system structures are shared, privileged users can create privileged accounts on any system structure with the ^ECREATE command. This may or may not be desirable.

12.6.2 Sharing the Login Structure

In a CFS-20 cluster, it may be advantageous to set up a homogeneous environment where all user accounts reside on a shared "login structure." Then, you do not need to maintain an account on every system to which a user has access.

12.6.2.1 Creating the Login Structure - To create this shared structure, give the following command to the CHECKD program for every system in the cluster:

CHECKD>ENABLE LOGIN-STRUCTURE (FOR STRUCTURE) str: (FOR CPU) cpu<RET>

where: str is the name of the login structure and cpu is the system's CPU serial number. The default serial number is the current system's.

This command adds the CPUs' serial numbers to the login structure's home blocks. The following command displays all the serial numbers that were entered into the blocks:

CHECKD>SHOW (INFORMATION FOR) LOGIN-SERIAL-NUMBERS (FOR STRUCTURE) str:<RET>

where: str is the name of the login structure

You should create a directory on this structure for each user in the cluster. (You can also put any other kind of directory on the login structure.) Users' LOGIN.CMD files and their .INIT files for various system programs should reside in these directories.
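For illustration, suppose the login structure is named LOGIN: and the cluster contains two systems whose CPU serial numbers are 2102 and 2103 (all three values are hypothetical). The ENABLE command would be given once for each CPU and the result then verified:

CHECKD>ENABLE LOGIN-STRUCTURE (FOR STRUCTURE) LOGIN: (FOR CPU) 2102<RET>
CHECKD>ENABLE LOGIN-STRUCTURE (FOR STRUCTURE) LOGIN: (FOR CPU) 2103<RET>
CHECKD>SHOW (INFORMATION FOR) LOGIN-SERIAL-NUMBERS (FOR STRUCTURE) LOGIN:<RET>

Both serial numbers should appear in the display, confirming that they have been written to the structure's home blocks.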
12.6.2.2 Enabling "Login Structure" - The system looks for user accounts on the login structure rather than on the boot structure when you enter the following command in the n-CONFIG.CMD file:

ENABLE LOGIN-STRUCTURE

12.6.2.3 Disabling "Login Structure" - You can disable the "login structure" feature with the following commands:

n-CONFIG.CMD file:  DISABLE LOGIN-STRUCTURE

CHECKD program:     CHECKD>DISABLE LOGIN-STRUCTURE (FOR STRUCTURE) str: (FOR CPU) cpu<RET>

where: str is the name of the shared structure and cpu is the system's CPU serial number. The default serial number is the current system's.

These commands cause the system to look for user accounts on the boot structure, which is the default condition.

12.6.2.4 PS: and BS: Directories - Before you enable the "login structure" feature, the public structure (PS:) is the boot structure (BS:), and is also known as the system structure. After you enable the feature, the system considers PS: to be the login structure, the structure that contains all the user login directories.

The special system directories, described in Section 3.2, must remain on BS:, although you may choose to move many of their files to other directories. However, the files listed in Section 12.2.4 must remain on the boot structure in <SYSTEM>. Also, the GALAXY components write files to SPOOL:, which the system defines at startup to be BS:<SPOOL>.

NOTE

Except where noted, this manual assumes that you have not enabled the "login structure" feature.

12.7 RESTRICTING STRUCTURES TO ONE SYSTEM

There may be times when you want to restrict use of a structure to a particular system. Such a structure might be used for DBMS applications (refer to Section 12.1.6, Limitations), or security measures may call for restricted use. For whatever reason, the operator restricts a structure with the following command:

OPR>SET STRUCTURE str: EXCLUSIVE<RET>

When the operator gives this command, the system first checks to see that the structure is not in use on other systems. If it is, the operator is given a list of those systems and asked whether or not this system should proceed with an automatic remote dismount of the structure from those systems (with the NO-REMOVAL option). This information and the automatic dismount require cluster GALAXY to be enabled. Ideally, the operator should beforehand follow the normal dismount procedure of making the structure unavailable to new users and notifying existing users of the pending dismount. The structure should be kept unavailable to all systems except the exclusive one so that the structure will not be inadvertently shared when the owning system crashes.

After a structure has been dismounted from other systems, the SET STRUCTURE EXCLUSIVE command can take effect. It remains in effect on the system, as do all SET STRUCTURE specifications, throughout crashes and reloads. If users give the MOUNT command for a structure that is exclusive to another system, an error message will be issued, indicating that the structure is unavailable.

Note that any system can have exclusive use of any sharable structure except another system's system structure. Refer to the TOPS-20 Operator's Guide for details on setting structures exclusive.

12.8 DISMOUNTING STRUCTURES

When issuing a DISMOUNT command for a structure, operators have the option of specifying that the structure be physically removed from a disk drive. In the CFS environment, however, the system first ensures that the structure is not in use on other systems.
If a structure is mounted on another system, the operator is notified and must go through the normal procedure of dismounting the structure (with the NO-REMOVAL option) from that system. Note that with cluster GALAXY enabled, the operator can dismount structures remotely by performing all activities from an OPR terminal on the local system. The operator does not need to log onto any other system.

Throughout the dismount process, the operator receives various informational messages as well as error messages if, for example, the system cannot get an exclusive lock on the structure by way of the ENQ% monitor call or cannot communicate with nodes on which the structure is mounted. (The system cannot communicate with nodes that have the cluster GALAXY feature disabled.) Refer to the TOPS-20 Operator's Guide for details on dismounting structures.

The default setting on CFS systems is for a structure to be dismounted with the no-removal option.

Sometimes the system instructs the operator to dismount structures. This can occur for the following reasons:

o  The operator attempts to shut down a system.

o  The operator attempts to make the CI unavailable to a system.

o  A system has been granted exclusive use of a structure.

o  A structure has been physically dismounted from another system.

Refer to Sections 12.7, 12.9, and 12.12 for details. These dismount instructions appear if you have included the ENABLE JOB0-CTY-OUTPUT command in the n-CONFIG.CMD file.

12.9 MAKING THE CI UNAVAILABLE TO A SYSTEM

Ordinarily, you need do nothing at all to operate the CI. However, you may need to disengage a system from the CI so Field Service personnel can diagnose and/or correct problems with the CI20 or the HSC50. Or, you may wish to remove a system from the CFS configuration. At those times, you should instruct the operator to make the CI unavailable by means of the SET PORT CI UNAVAILABLE command. (Refer to the TOPS-20 Operator's Guide for details.)

When the CI is unavailable to a system, users cannot access multi-access disks (dual-ported disks, HSC50-based disks, or served disks on other systems). These disks rely on the CI to coordinate accesses and/or to transmit data. Served disks on the system disengaging from the CI will be unavailable to other systems. Dual-ported massbus disks in the A/B position will have to be powered down and switched to one system.

When the operator gives the SET PORT CI UNAVAILABLE command, the system indicates the structures that need to be dismounted and the disk drives that need to be made unavailable. The operator is advised to follow the normal procedures of forewarning users before dismounting structures and making disk drives unavailable. The command option to forcibly disengage a system from the CI should be reserved for emergencies. If the operator determines that disengaging from the CI will be too disruptive to users, the operator has the option of aborting the procedure.

To put the CI back in operation, the operator gives the command:

OPR>SET PORT CI AVAILABLE<RET>

The operator is then asked if any other TOPS-20 system is running on the CI. If yes, the system rejoining the CFS configuration must be rebooted. If no, the CI20 will be reloaded and started. If the operator answers "no" and another TOPS-20 system is found after the CI20 has started, a CFRECN BUGHLT is issued on any processor in the cluster whose serial number is lower than that of the joining system, and also on the joining processor itself if the cluster contains a system with a higher serial number. (See Section 12.11.1 for details.)
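To recap the cycle just described, only the two forms of the SET PORT command are involved; the warnings, structure lists, and operator questions mentioned above appear between them, so the following is only an outline of the operator dialog:

OPR>SET PORT CI UNAVAILABLE<RET>
     (the system lists structures to dismount and drives to make unavailable)
OPR>SET PORT CI AVAILABLE<RET>
     (the operator is asked whether any other TOPS-20 system is running on the CI)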
After the system rejoins the configuration, structures that were affected when the CI was made unavailable will need to be remounted.

12.10 USING DUMPER

CFS offers operators and users flexibility in saving and restoring disk files. The only restriction is that DUMPER must be running on a system to which tape drives are attached. Tape drives are not served through CFS.

12.11 ERRORS

This section discusses the actions you or the operator take when errors occur in the CFS environment. It also describes how CFS systems react to various errors. Note that there is no single hardware or software point that can disable the whole configuration. For example, systems can start up or crash with little impact on other systems.

12.11.1 Communication Problems

CFS systems are sensitive to breaks in communication, whether they are caused by CI20 errors or system crashes. Because the data integrity of shared structures depends on unbroken intersystem contact, the systems take quick action to prevent data corruption. Therefore, you may observe any of the following when systems lose contact with each other. These should be rare occurrences.

o  For a period of time calculated as 5 seconds per node (hosts and HSCs), no system in the configuration can access any multi-access disks (dual-ported disks, HSC50-based disks, served disks on other systems). This interval allows each system to check that its own CI20 and segment of the CI bus are working. Most likely, some system's CI20 microcode has stopped and is automatically reloaded during the interval, or a system has crashed. (There may be other, unpredictable reasons for CI disruption.) Jobs that were accessing multi-access disks are suspended until data integrity is assured. If the CI20 and CI bus are working before the end of the interval, the system can resume accessing all multi-access disks except served disks on a crashed system.

o  A system crashes with a KLPNRL BUGHLT. This happens if the CI20 microcode takes longer than 10 seconds to reload. This BUGHLT is expected to occur rarely, because the microcode should be reloaded within a couple of seconds.

o  If communication resumes after the interval mentioned at the beginning of this section, without the faulty system having crashed and restarted, the system with the lower serial number crashes with a CFRECN BUGHLT message as the faulty system tries to establish contact with each running system. That is, a system joining the cluster illegally will crash any system already in the cluster with a lower CPU serial number. The joining node itself will crash if there is already a node in the cluster with a higher CPU serial number. For example, this occurs when the SET PORT CI AVAILABLE command has caused communication to resume incorrectly due to operator error, as described in Section 12.9, MAKING THE CI UNAVAILABLE TO A SYSTEM.

   With such a delayed reconnection, a system is likely to contain old, invalid information about the status of multi-access disks. This is because other systems are allowed to access the disks after the interval, believing the faulty system is no longer running. Therefore, systems are selected to crash so that a fresh database can be established for the disks when the systems restart.

EXAMPLE

There are four systems in a cluster with serial numbers one through four:

    System    Serial Number
    A         1
    B         2
    C         3
    D         4

System B leaves the cluster and then tries to rejoin after the delay allowance has expired. System A crashes because its serial number is lower than system B's.
System B crashes when it tries to establish contact with either system C or D, whose serial numbers are higher than B's. Systems C and D remain running.

AT STARTUP

Sometimes, communication problems begin at system startup. A system that has just started up tries to communicate with each TOPS-20 system and HSC that is already running. After the SETSPD program sets the systemwide defaults, a system joining the CFS cluster checks to make sure that:

o  Its own CI20 is working. If there is any malfunction, the "CFS joining" process is aborted. The system makes no further attempts to communicate with other CFS nodes and remains outside the CFS cluster.

o  Its segment of the CI bus is working.

If the areas above are satisfactory, the system starting up then checks to see, for each cluster node, whether:

o  The CI20 at the remote node is in maintenance mode (TOPS-20 nodes only, not HSCs). If so, the system knows that it cannot communicate with that node and tries to establish contact with the next node.

o  Its own CI20 driver has created a system block for the remote node. The driver creates this block when the remote system responds to its request for recognition. The system block allows a virtual circuit to be established between the two systems, over which inter-node data and messages are sent on the CI. If the block does not yet exist, the system sends a message to the CTY so that the operator can take appropriate action on the remote node. This situation usually indicates a hardware problem with the remote system's CI20.

If a system block has been created for a remote TOPS-20 node, the system tries to establish a CFS connection with that node by way of the virtual circuit. It is through CFS connections that systems communicate in order to coordinate access to shared disks. If the attempt fails, a message is sent to the CTY for operator action. This situation usually indicates a software problem, most likely that TOPS-20 is not running at the remote node. If the attempt at communication is successful, a confirming BUGINF is sent to the starting system's CTY.

A system starting up makes these communication checks for every other node in the cluster. Refer to the TOPS-20 Operator's Guide for details on the operator information and error messages.

12.11.2 Massbus Problems with Dual-Ported Disk Drives

Dual-ported disk drives are accessed by each system through the massbus hardware connections. However, if for some reason a massbus path becomes unavailable to a system, the other system, whose massbus connections are working, can provide access to the affected drives through the MSCP file server. The disks become "served."

The operator enables this facility by powering down the disks and flipping the drive port switches from the A/B position to the position that corresponds to the servicing system. Then the operator must reboot the system with the faulty massbus link. These procedures are required because a running system will never invoke the MSCP server after identifying a massbus path for a disk. It is assumed that an ALLOW command has been entered in the n-CONFIG.CMD file for the disk drives, as described in Section 12.1.1, CFS HARDWARE.

The operator returns the switches to the A/B position when the massbus problem is corrected. The PHYTPD BUGINF is then issued to confirm that the massbus will again be used for data transmission.
12.12 SHUTTING DOWN A CFS SYSTEM

When an operator issues the ^ECEASE command to shut down a system, the following procedure keeps outside jobs that may be accessing the system's served disks from hanging. If any served disks have been mounted from other CFS systems, the operator is warned to check those systems for possible structure dismounting instructions. Meanwhile, at the other systems, if any structures on the shutting-down system's served disks are mounted, the operator is warned of the pending shutdown and is advised to dismount the structures listed.

In a CFS-20 cluster, a shutdown on one system causes "system going down" messages to be transmitted to all systems in the cluster at the sixty-minute, five-minute, and one-minute marks. For example, if SYSA is shutting down, the following messages appear clusterwide:

[System SYSA going down in 60 minutes at 1-Dec-87 16:29:22]

[System SYSA going down in 5 minutes at 1-Dec-87 16:29:22]

[System SYSA going down in one minute!!]