Unix dump/restore tape format

From Computer History Wiki

Latest revision as of 06:49, 9 September 2022

The Unix dump command makes a full or incremental backup of a single disk partition to magnetic tapes or files. The corresponding restore command recovers data from such backups. The tape format is only vaguely documented in various man pages, and not in enough detail to understand it fully. This article primarily describes the formats used by Unix V7, 4BSD, and SunOS around the 1980s.

A dump is logically a series of 1024-byte blocks, grouped ten at a time into tape records. There are no tape marks between records, just the usual double tape mark at the end of a tape. A block is either a header or data. Header blocks hold metadata and inode information.
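
As a minimal sketch, assuming the dump is read as raw bytes from a file or tape device, each 10240-byte record can be split into its ten 1024-byte blocks as below. The names `split_record` and `iter_blocks` are hypothetical, and accepting a shorter record (as long as it holds whole blocks) is an assumption, not something stated above.

```python
BLOCK_SIZE = 1024        # bytes per dump block
BLOCKS_PER_RECORD = 10   # blocks grouped into one tape record
RECORD_SIZE = BLOCK_SIZE * BLOCKS_PER_RECORD


def split_record(record: bytes) -> list:
    """Split one tape record into its 1024-byte dump blocks.

    Assumption: a short record is accepted if it holds whole blocks.
    """
    if len(record) % BLOCK_SIZE != 0:
        raise ValueError(f"record length {len(record)} is not a multiple of {BLOCK_SIZE}")
    return [record[i:i + BLOCK_SIZE] for i in range(0, len(record), BLOCK_SIZE)]


def iter_blocks(f):
    """Yield dump blocks from a binary file object, one record at a time."""
    while True:
        record = f.read(RECORD_SIZE)
        if not record:
            break
        yield from split_record(record)
```

Reading a whole record at a time matters on a real tape drive, where each read returns at most one record; for a dump stored in a file the same loop simply walks the file in 10240-byte steps.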

A dump will normally span several tapes, each of which is called a volume. Every volume begins with a TS_TAPE block, and the last volume ends with one or more TS_END blocks. On the first volume, TS_TAPE is followed by TS_BITS and TS_CLRI blocks. The remaining headers are of type TS_INODE, each holding a raw inode as stored on disk and followed by that file's data blocks. When a TS_INODE header cannot hold all the information needed, it is extended with TS_ADDR blocks. Normally the first inodes written to tape are the directories, so the directory structure of the whole file system comes first and file inodes make up the rest of the volumes.
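
The volume layout just described can be checked mechanically. This is a minimal sketch, assuming the header types of one volume have already been extracted in tape order (data blocks left out); `check_volume_layout` is a hypothetical name. Since the text names TS_BITS and TS_CLRI without fixing their order, the check accepts either order.

```python
TS_TAPE, TS_INODE, TS_BITS, TS_ADDR, TS_END, TS_CLRI = 1, 2, 3, 4, 5, 6


def check_volume_layout(types, first_volume, last_volume):
    """Return a list of problems found in one volume's header-type sequence."""
    problems = []
    if not types or types[0] != TS_TAPE:
        problems.append("volume does not start with TS_TAPE")
        return problems
    rest = list(types[1:])
    if first_volume:
        # Accept TS_BITS and TS_CLRI in either order.
        if sorted(rest[:2]) != sorted([TS_BITS, TS_CLRI]):
            problems.append("first volume lacks TS_BITS and TS_CLRI after TS_TAPE")
        rest = rest[2:]
    if last_volume:
        if not rest or rest[-1] != TS_END:
            problems.append("last volume does not end with TS_END")
        while rest and rest[-1] == TS_END:
            rest.pop()
    # Everything in between should describe files.
    for t in rest:
        if t not in (TS_INODE, TS_ADDR):
            problems.append(f"unexpected header type {t}")
    return problems
```

For a single-volume dump, both `first_volume` and `last_volume` are true, so a sequence such as 1, 6, 3, 2, 2, 4, 5, 5 passes.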

The tape format comes in two versions: the first for the older UNIX file system, and the second for Berkeley's Fast File System (FFS). In addition, ints can be 16 or 32 bits wide, and data can be big- or little-endian.
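
Because the Magic field sits at a fixed position once the int width is known, one way to tell the variants apart is to try each width and byte order until Magic reads 60011 or 60012. This is a sketch, assuming the header fields are packed in the order of the tape header table with no padding, and that 32-bit fields use the same byte order as ints (PDP-11 systems stored 32-bit values with the high-order word first, which this ignores). `detect_format` is a hypothetical name, and a match is only a heuristic; checking the header checksum as well makes it more trustworthy.

```python
import struct

MAGIC_OLD, MAGIC_NEW = 60011, 60012


def detect_format(header: bytes):
    """Return (int_width, byte_order, format) or None if no layout matches.

    Magic is preceded by Type, Volume and Inumber (ints) plus Date, Ddate
    and Tapea (32 bits each), so it sits at offset 3 * width + 12.
    """
    for width, code in ((4, "I"), (2, "H")):
        offset = 3 * width + 12
        for order in ("<", ">"):  # little-endian, then big-endian
            (magic,) = struct.unpack_from(order + code, header, offset)
            if magic == MAGIC_OLD:
                return width, order, "old"
            if magic == MAGIC_NEW:
                return width, order, "new"
    return None
```

The format result then fixes the inode size: 64 bytes for the old format and 128 bytes for the new one.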

Notes:

  • Since dump reads low-level file system information like inode numbers, it's limited to a single partition.
  • If the first volume has been lost, it's likely that directories and filenames can't be recovered.
  • File data blocks may spill over to the next volume; when they do, that volume's TS_TAPE block comes before the remaining data blocks.

TAPE HEADER

These fields are common to all headers, but some of them are only meaningful for certain header types and are ignored otherwise.

An int is 16 or 32 bits depending on the host.

The checksum is a plain two's complement sum of all ints in the 1024-byte header block.
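
A minimal sketch of verifying and filling in the checksum, assuming the int width and byte order are already known and that the Checksum field follows Magic directly (offset 4 × width + 12 in a layout packed in table order). Summing unsigned values and reducing modulo 2^(8 × width) gives the same result as a wrapping two's complement sum. The function names are hypothetical.

```python
import struct

CHECKSUM = 84446  # target sum, reduced modulo the int width


def header_sum(block: bytes, width: int, order: str) -> int:
    """Sum all width-byte ints in the header, wrapping like a machine int."""
    code = {2: "H", 4: "I"}[width]
    n = len(block) // width
    values = struct.unpack(f"{order}{n}{code}", block[:n * width])
    return sum(values) % (1 << (8 * width))


def checksum_ok(block: bytes, width: int, order: str) -> bool:
    """True if the header sums to 84446 modulo the int width."""
    return header_sum(block, width, order) == CHECKSUM % (1 << (8 * width))


def fill_checksum(block: bytearray, width: int, order: str) -> None:
    """Store a Checksum value that makes the whole header sum to CHECKSUM."""
    code = {2: "H", 4: "I"}[width]
    offset = 4 * width + 12          # assumed position of the Checksum field
    modulus = 1 << (8 * width)
    struct.pack_into(order + code, block, offset, 0)
    needed = (CHECKSUM - header_sum(bytes(block), width, order)) % modulus
    struct.pack_into(order + code, block, offset, needed)
```

With 16-bit ints the target 84446 does not fit, so the sum must come out as 84446 − 65536 = 18910.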

Name      Description                          Size             Value
Type      Header type                          int              TS_TAPE, TS_BITS, TS_CLRI, TS_INODE, TS_ADDR, TS_END
Date      Date of dump                         32 bits          Timestamp
Ddate     Dump from                            32 bits          Timestamp from previous incremental backup, or 0 for full
Volume    Tape number                          int              Tape in dump, from 1
Tapea     Block number                         32 bits          Block in dump across all tapes, from 1
Inumber   Inode number                         int              File system inode number
Magic     Format identifier                    int              60011 (old format), 60012 (new format)
Checksum  Checksum                             int              Filled in to make the sum 84446 (modulo int)
Inode     Inode data                           64 or 128 bytes  As stored on disk
Count     Number of Addr bytes or data blocks  int
Addr      Data present                         Array of bytes   0 or 1
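
A sketch of decoding one header block with Python's `struct` module, assuming the fields are packed in table order with no padding, that Magic decides the inode size (64 bytes for 60011, 128 for 60012), and that Count gives the number of Addr bytes for TS_INODE and TS_ADDR headers. The `Header` class and `parse_header` function are hypothetical.

```python
import struct
from dataclasses import dataclass

MAGIC_OLD, MAGIC_NEW = 60011, 60012
TS_INODE, TS_ADDR = 2, 4


@dataclass
class Header:
    type: int
    date: int
    ddate: int
    volume: int
    tapea: int
    inumber: int
    magic: int
    checksum: int
    inode: bytes
    count: int
    addr: bytes


def parse_header(block: bytes, width: int = 4, order: str = "<") -> Header:
    """Decode a 1024-byte header block (fields packed in table order)."""
    i = {2: "H", 4: "I"}[width]  # an "int" is 16 or 32 bits
    # Type, Date, Ddate, Volume, Tapea, Inumber, Magic, Checksum
    fixed = f"{order}{i}II{i}I{i}{i}{i}"
    fields = struct.unpack_from(fixed, block, 0)
    magic = fields[6]
    if magic not in (MAGIC_OLD, MAGIC_NEW):
        raise ValueError(f"bad magic {magic}")
    inode_size = 64 if magic == MAGIC_OLD else 128
    pos = struct.calcsize(fixed)
    inode = bytes(block[pos:pos + inode_size])
    pos += inode_size
    (count,) = struct.unpack_from(order + i, block, pos)
    pos += width
    # Count is the Addr length only for TS_INODE and TS_ADDR headers.
    addr = bytes(block[pos:pos + count]) if fields[0] in (TS_INODE, TS_ADDR) else b""
    return Header(*fields, inode, count, addr)
```

With 32-bit ints the fixed part is 32 bytes, so in the new format Count sits at offset 160 and Addr starts at 164.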

The inode data looks like this:

Name    Old size     New size
Mode    16 bits      16 bits
Nlink   16 bits      16 bits
Uid     16 bits      16 bits
Gid     16 bits      16 bits
Size    32 bits      64 bits
Addr    40 bytes     Not present
Atime   32 bits      32 bits
?       Not present  32 bits
Mtime   32 bits      32 bits
?       Not present  32 bits
Ctime   32 bits      32 bits
?       Not present  92 bytes
Total   64 bytes     128 bytes
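
A sketch of decoding the Inode field, assuming the layout in the table with every multi-byte value in a single byte order (again ignoring PDP-11 word order) and the 64-bit Size read as one integer. The unnamed "?" fields are skipped. `Inode` and `parse_inode` are hypothetical names.

```python
import struct
from dataclasses import dataclass


@dataclass
class Inode:
    mode: int
    nlink: int
    uid: int
    gid: int
    size: int
    atime: int
    mtime: int
    ctime: int
    addr: bytes  # 40-byte Addr area; empty for the new format


def parse_inode(raw: bytes, order: str = "<") -> Inode:
    """Decode a 64-byte (old) or 128-byte (new) Inode field."""
    if len(raw) == 128:
        # Mode, Nlink, Uid, Gid, Size (64 bits), Atime, ?, Mtime, ?, Ctime; last 92 bytes skipped
        mode, nlink, uid, gid, size, atime, _, mtime, _, ctime = struct.unpack_from(
            order + "HHHHQIIIII", raw)
        return Inode(mode, nlink, uid, gid, size, atime, mtime, ctime, b"")
    if len(raw) == 64:
        # Mode, Nlink, Uid, Gid, Size (32 bits), Addr (40 bytes), Atime, Mtime, Ctime
        mode, nlink, uid, gid, size, addr, atime, mtime, ctime = struct.unpack_from(
            order + "HHHHI40sIII", raw)
        return Inode(mode, nlink, uid, gid, size, atime, mtime, ctime, addr)
    raise ValueError(f"unexpected inode size {len(raw)}")
```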

TS_TAPE

Header type 1. This header must start all volumes. The Date field must be the same on all volumes, and the Volume field identifies the individual tape in the dump. There are no data blocks.
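
Because every TS_TAPE header repeats the dump's Date and numbers its tape, a reader can catch mixed-up or misordered volumes before restoring. A minimal sketch, assuming the (Date, Volume) pairs have already been taken from each volume's TS_TAPE header in the order the tapes were read; `check_volume_set` is a hypothetical name.

```python
def check_volume_set(tape_headers):
    """tape_headers: list of (date, volume) pairs from each volume's TS_TAPE header."""
    problems = []
    dates = {date for date, _ in tape_headers}
    if len(dates) > 1:
        problems.append("volumes come from different dumps (Date differs)")
    # Volumes are numbered from 1 and should be read in sequence.
    volumes = [vol for _, vol in tape_headers]
    if volumes != list(range(1, len(volumes) + 1)):
        problems.append(f"unexpected volume order {volumes}")
    return problems
```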

TS_BITS and TS_CLRI

Header types 3 and 6. These headers store bit arrays from the file system. The Count field specifies the number of data blocks following the header block.
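
A sketch of collecting a bit array and looking up one inode in it. Gathering exactly Count data blocks follows the text above. The bit numbering (inode N is bit N − 1, least significant bit first within each byte) is an assumption here, not something this article states. Both function names are hypothetical.

```python
from itertools import islice


def read_bitmap(count: int, data_blocks) -> bytes:
    """Collect the Count data blocks that follow a TS_BITS or TS_CLRI header."""
    blocks = list(islice(data_blocks, count))  # consumes exactly `count` blocks
    if len(blocks) != count:
        raise ValueError(f"expected {count} bitmap blocks, got {len(blocks)}")
    return b"".join(blocks)


def inode_bit(bitmap: bytes, inumber: int) -> bool:
    """Assumed layout: inode N is bit N - 1, least significant bit first."""
    index = inumber - 1
    return bool(bitmap[index // 8] & (1 << (index % 8)))
```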

TS_INODE and TS_ADDR

Header types 2 and 4. TS_INODE stores the inode data from disk. The header block is followed by data blocks, as described by the Addr array: an entry of one means a data block is present on tape, and a zero indicates a 1024-byte hole in the file. If a volume runs out of space during a run of data blocks, the next volume begins with a TS_TAPE block, followed by the rest of the data blocks.

The Addr field normally holds 256 bytes at most. If that cannot accommodate a large file, one or more TS_ADDR headers follow. A TS_ADDR header stores the same inode data, and the data blocks following it are appended to the file.
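
A sketch of putting a file back together from these rules, assuming the Addr arrays (each already cut to Count bytes) of the TS_INODE header and its TS_ADDR followers are collected in order, that any TS_TAPE header at a volume change has already been dropped from the data-block stream, and that the inode's Size trims the final block. `rebuild_file` is a hypothetical name.

```python
BLOCK_SIZE = 1024


def rebuild_file(addr_arrays, data_blocks, size):
    """Rebuild file contents from the Addr arrays of a TS_INODE header and its
    TS_ADDR followers.

    addr_arrays: Addr bytes of each header in order, already cut to Count.
    data_blocks: the file's data blocks in tape order, with any TS_TAPE
                 header from a volume change already removed (assumption).
    size: the Size field from the inode, used to trim the last block.
    """
    blocks = iter(data_blocks)
    out = bytearray()
    for addr in addr_arrays:
        for present in addr:
            if present:
                block = next(blocks, None)
                if block is None:
                    raise ValueError("ran out of data blocks")
                out += block
            else:
                out += bytes(BLOCK_SIZE)  # 1024-byte hole
    return bytes(out[:size])
```

For example, Addr arrays 1, 0 and 1 with a Size of 2500 yield one stored block, one zero-filled hole, and the first 452 bytes of the second stored block.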

The file type is held in the top bits of the Mode field (mask 170000 octal): ordinary files have type 100000 (octal) and directories have type 40000. Note that large directories may need additional TS_ADDR blocks.
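
A sketch of classifying the Mode field using the conventional Unix type values (170000, 100000 and 40000 octal). Masking with 170000 before comparing keeps, for example, a V7 block special file (type 60000, which also has the 40000 bit set) from being taken for a directory. `file_kind` is a hypothetical name.

```python
S_IFMT = 0o170000   # file-type bits of the Mode field
S_IFDIR = 0o040000  # directory
S_IFREG = 0o100000  # ordinary file


def file_kind(mode: int) -> str:
    """Classify an inode by the type bits of its Mode field."""
    fmt = mode & S_IFMT
    if fmt == S_IFREG:
        return "file"
    if fmt == S_IFDIR:
        return "directory"
    return "other"
```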

The directory for the old format has fixed-length, 16-byte entries.

Size      Description
16 bits   Inode number
14 bytes  ASCII file name
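
A sketch of reading old-format directory data as 16-byte entries (2 bytes of inode number, 14 bytes of name). It assumes names shorter than 14 characters are padded with NUL bytes, and it skips entries with inode number 0, which V7 uses for unused slots. `parse_old_directory` is a hypothetical name.

```python
import struct


def parse_old_directory(data: bytes, order: str = "<"):
    """Return (inode, name) pairs from old-format directory data."""
    entries = []
    for off in range(0, len(data) // 16 * 16, 16):
        ino, raw_name = struct.unpack_from(order + "H14s", data, off)
        if ino == 0:  # unused slot
            continue
        # A 14-character name has no terminating NUL, so split() keeps it whole.
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append((ino, name))
    return entries
```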

The directory for the new format has variable-length entries. An entry length of 0 signals the end of the directory.

Size      Description
32 bits   Inode number
16 bits   Entry length
16 bits   Name length
Variable  ASCII file name
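
A sketch of walking new-format directory data, assuming the three fixed fields are packed into 8 bytes, that the entry length runs from the start of one entry to the start of the next (padding included), and that entries with inode number 0 are unused slots. `parse_new_directory` is a hypothetical name.

```python
import struct


def parse_new_directory(data: bytes, order: str = "<"):
    """Return (inode, name) pairs from new-format directory data."""
    entries = []
    off = 0
    while off + 8 <= len(data):
        ino, reclen, namlen = struct.unpack_from(order + "IHH", data, off)
        if reclen == 0:  # an entry length of 0 ends the directory
            break
        name = data[off + 8:off + 8 + namlen].decode("ascii", errors="replace")
        if ino != 0:  # assumed: inode 0 marks an unused entry
            entries.append((ino, name))
        off += reclen  # assumed: entry length spans the whole entry
    return entries
```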

TS_END

Header type 5. This header signals the end of the dump on the last volume. Several TS_END blocks may be written. There are no data blocks.
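
Taken together, the per-type rules give the number of data blocks to read or skip after each header: none after TS_TAPE and TS_END, Count after TS_BITS and TS_CLRI, and one for each nonzero Addr entry after TS_INODE and TS_ADDR. A sketch, assuming Count limits how much of the Addr array is in use; `data_blocks_after` is a hypothetical name.

```python
TS_TAPE, TS_INODE, TS_BITS, TS_ADDR, TS_END, TS_CLRI = 1, 2, 3, 4, 5, 6


def data_blocks_after(htype: int, count: int, addr: bytes) -> int:
    """Number of data blocks that follow a header of the given type."""
    if htype in (TS_TAPE, TS_END):
        return 0
    if htype in (TS_BITS, TS_CLRI):
        return count
    if htype in (TS_INODE, TS_ADDR):
        # Zero entries are holes and have no block on tape.
        return sum(1 for present in addr[:count] if present)
    raise ValueError(f"unknown header type {htype}")
```

A reader can step from header to header with this count, keeping in mind that a TS_TAPE header may interrupt a run of data blocks at a volume change.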