Difference between revisions of "UNIX V6 memory layout"

From Computer History Wiki
Jump to: navigation, search
(Cover mem + addr space generally)
m (Split I-D layout: minor clarifications)
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''UNIX V6 memory layout''', and how [[UNIX Sixth Edition]] uses the [[main memory]] and [[memory management]] of the system, is fairly simple. The [[operating system]]'s [[kernel]] (both [[instruction]]s and data) permanently occupies low physical memory; [[process]]es reside above the kernel, and can be [[swapping|swapped]] in and out, from [[secondary storage]], as needed. All physical memory above the OS is available for use by processes.
+
'''UNIX V6 memory layout''', and how [[UNIX Sixth Edition]] uses the [[main memory]] and [[memory management]] of the host [[PDP-11]] system, is fairly simple. The [[operating system]]'s [[kernel]] (both [[instruction]]s and data) permanently occupies low physical memory; [[process]]es reside above the kernel, and can be [[swapping|swapped]] in and out, from [[secondary storage]], as needed. All physical memory above the OS is available for use by processes.
  
 
[[PDP-11 Memory Management]] provides a number of [[segment]]s of [[address space]], both in kernel and [[user]] mode. Each segment can be independently mapped to main memory, but UNIX makes very limited use of the overall potential flexibility; rather, it is mostly used to provide a few large blocks.
 
[[PDP-11 Memory Management]] provides a number of [[segment]]s of [[address space]], both in kernel and [[user]] mode. Each segment can be independently mapped to main memory, but UNIX makes very limited use of the overall potential flexibility; rather, it is mostly used to provide a few large blocks.
  
Two segments of the kernel's address space are used for specialized purposes. One is for gaining access to the swappable per-process data of each process (including its kernel [[stack]]); this is the 'user' block (the term 'user' is re-used here). Thus, that data is thus always at the same [[virtual address]] in the kernel's address space, no matter which process is running. The other segment is set up to allow access to the [[UNIBUS]]'s so-called 'I/O Page', which holds [[peripheral]] and [[Central Processing Unit|CPU]] [[register]]s. The other kernel segments are used for on or two (see below) large block(s) containing the kernel's code and data; details of the kernel's address space and main memory layout are below.
+
Two segments of the kernel's [[virtual address]] space are used for specialized purposes. One segment is used for gaining access to the swappable per-process data of each process (including its kernel [[stack]]); this is the 'user' block (the term 'user' is re-used here). Thus, that data for the current process is thus always at the same virtual address in the kernel's address space, no matter which process is running. The other special segment is set up to allow access to the [[UNIBUS]]'s so-called 'I/O Page', which holds [[peripheral]] and [[Central Processing Unit|CPU]] [[register]]s. The other kernel segments are used for one or two (see below) large block(s) containing the kernel's code and data; details of the kernel's address space and main memory layout are given below.
  
User processes contain data in two separate blocks of virtual address space (one for the stack), and potentially also another block containing [[pure code]]. (If a user program has pure code, and two processes are running that program, they will share a single copy of it in main memory.) If the process does not have pure code, the instructions are in the same block as the data. The two data blocks are ''always'' stored contiguously in main memory, along with the 'user' block, and all three are always swapped as a unit; this simplifies physical memory allocation.
+
User processes contain data in two separate blocks of virtual address space (one for the stack), and potentially also another block containing [[pure code]]. (If a user program has pure code, and two processes are running that program, they will share a single copy in main memory of its pure code.) If the process does not have pure code, the instructions are in the same block as the data. The two data blocks are ''always'' stored contiguously in main memory, along with the 'user' block, and all three are always swapped as a unit; this simplifies physical memory allocation and swapping code.
  
 
==Two variants in kernel layout==
 
==Two variants in kernel layout==
  
The details of how the kernel of UNIX V6 uses the main memory of the system is fundamentally different between the two types of PDP-11 Memory Management provided by the various models of [[PDP-11]] on which UNIX V6 runs.
+
The details of how the kernel of UNIX V6 uses the main memory of the system is fundamentally different between the two types of PDP-11 Memory Management provided by the various models of PDP-11 on which UNIX V6 runs.
  
 
===Single-space layout===
 
===Single-space layout===
Line 17: Line 17:
 
The organization of the kernel's address space is very similar to that of a Unix process: the [[object code]] is in the low part of the kernel's address space, with initialized data above it, and the so-called [[BSS]] (un-initialized data) above that.
 
The organization of the kernel's address space is very similar to that of a Unix process: the [[object code]] is in the low part of the kernel's address space, with initialized data above it, and the so-called [[BSS]] (un-initialized data) above that.
  
This exactly matches the layout of all this in actual physical main memory; the operating system is in the lowest physical memory, with the code at the bottom, and kernel data above that. There are 8 segments available when memory management is enabled; the first 6 are used to hold the kernel's instructions and data, as described. As mentioned above, the 7th segment is used to gain access to the swappable per-process data of each process, and the 8th is set up to allow access to the UNIBUS 'I/O Page'.
+
This exactly matches the layout of all this in actual physical main memory; the OS is in the lowest physical memory, with the code at the bottom, and kernel data above that. There are 8 segments available when memory management is enabled; the first 6 are used to hold the kernel's instructions and data, as described. As mentioned above, the 7th segment is used to gain access to the swappable per-process data of each process, and the 8th is set up to allow access to the UNIBUS 'I/O Page'.
  
 
Note that the allocation of 6 segments to the kernel's code and data provides a hard limit on the size of the kernel, including the number of [[disk]] [[buffer]]s, etc. (Later versions of UNIX supported [[overlay]]s in the kernel, to remove this limitation.) The stock V6 ''does not'' check to make sure that a system build respects this limit; it is perfectly possible to build images that take more than 6 segments (48KB) to hold them, which ''will'' fail in bizarre and un-predictable ways.
 
Note that the allocation of 6 segments to the kernel's code and data provides a hard limit on the size of the kernel, including the number of [[disk]] [[buffer]]s, etc. (Later versions of UNIX supported [[overlay]]s in the kernel, to remove this limitation.) The stock V6 ''does not'' check to make sure that a system build respects this limit; it is perfectly possible to build images that take more than 6 segments (48KB) to hold them, which ''will'' fail in bizarre and un-predictable ways.
Line 23: Line 23:
 
===Split I-D layout===
 
===Split I-D layout===
  
The more powerful models of the PDP-11 supported by V6 (the [[PDP-11/45]] and [[PDP-11/70]]) provide two independent 16-bit address spaces for code and data, for both the kernel, and for user processes. (Two of the data segments have fixed, special usage, as described above.) The way the code and data blocks are allocated to actual physical main memory for the OS are non-obvious, though.
+
The more powerful models of the PDP-11 supported by V6 (the [[PDP-11/45]] and [[PDP-11/70]]) provide two independent 16-bit address spaces for code and data, for both the kernel, and for user processes. (Two of the data segments have fixed, special usage, as described above.) The way the code and data blocks are allocated to actual physical main memory for the kernel are non-obvious, though.
  
 
All the kernel data (both initialized and un-initialized), along with a small amount of code, is placed in low ''physical'' memory. The advantage of doing this is that the physical address of any kernel data is the same as its virtual address, so when setting up a peripheral to do [[Direct Memory Access|DMA]] (which ''always'' uses physical addresses), no address translation is necessary.
 
All the kernel data (both initialized and un-initialized), along with a small amount of code, is placed in low ''physical'' memory. The advantage of doing this is that the physical address of any kernel data is the same as its virtual address, so when setting up a peripheral to do [[Direct Memory Access|DMA]] (which ''always'' uses physical addresses), no address translation is necessary.
Line 33: Line 33:
 
First, the Unix [[linker]] is not prepared to produce output files in which the data is below the code (in file terms), which is what would be needed to simply load the file produced by the linker into main memory, and have things wind up at the correct physical locations.
 
First, the Unix [[linker]] is not prepared to produce output files in which the data is below the code (in file terms), which is what would be needed to simply load the file produced by the linker into main memory, and have things wind up at the correct physical locations.
  
Rather than complicate the [[bootstrap]] (which, in V6, is a single-stage which has to fit into a single [[disk block]]) to perform this transformation, a separate program, 'sysfix', post-processes the linker's output to produce a file where the initialized data is below the code; it also moves the code's [[absolute address]] up, so that it starts at the start of the second page of the instruction space.
+
Rather than complicate the [[bootstrap]] (which, in V6, is a single stage which has to fit into a single [[disk block]]) to perform this transformation, a separate program, 'sysfix', post-processes the linker's output [[object code]] to produce a file where the initialized data is placed below the code; it also moves the code's [[absolute address]] up, so that it starts at the start of the second page of the instruction space.
  
 
This is because of that small amount of code in low memory; instruction address space segment 0 is used to contain this (once memory management is enabled), so that it will appear in the address space at the same place after memory management is enabled as before (so that this code, which is what is running when memory management is turned on, will appear in the same place during that); this means that the system's actual code must start in segment 1.
 
This is because of that small amount of code in low memory; instruction address space segment 0 is used to contain this (once memory management is enabled), so that it will appear in the address space at the same place after memory management is enabled as before (so that this code, which is what is running when memory management is turned on, will appear in the same place during that); this means that the system's actual code must start in segment 1.
Line 50: Line 50:
 
** [[Running UNIX v6 in SIMH]]
 
** [[Running UNIX v6 in SIMH]]
 
* [[Installing UNIX Sixth Edition on Ersatz-11]]
 
* [[Installing UNIX Sixth Edition on Ersatz-11]]
 +
 +
{{Nav Unix}}
  
 
[[Category: UNIX]]
 
[[Category: UNIX]]

Revision as of 14:33, 23 June 2022

UNIX V6 memory layout, and how UNIX Sixth Edition uses the main memory and memory management of the host PDP-11 system, is fairly simple. The operating system's kernel (both instructions and data) permanently occupies low physical memory; processes reside above the kernel, and can be swapped in and out, from secondary storage, as needed. All physical memory above the OS is available for use by processes.

PDP-11 Memory Management provides a number of segments of address space, both in kernel and user mode. Each segment can be independently mapped to main memory, but UNIX makes very limited use of the overall potential flexibility; rather, it is mostly used to provide a few large blocks.

Two segments of the kernel's virtual address space are used for specialized purposes. One segment is used for gaining access to the swappable per-process data of each process (including its kernel stack); this is the 'user' block (the term 'user' is re-used here). Thus, that data for the current process is thus always at the same virtual address in the kernel's address space, no matter which process is running. The other special segment is set up to allow access to the UNIBUS's so-called 'I/O Page', which holds peripheral and CPU registers. The other kernel segments are used for one or two (see below) large block(s) containing the kernel's code and data; details of the kernel's address space and main memory layout are given below.

User processes contain data in two separate blocks of virtual address space (one for the stack), and potentially also another block containing pure code. (If a user program has pure code, and two processes are running that program, they will share a single copy in main memory of its pure code.) If the process does not have pure code, the instructions are in the same block as the data. The two data blocks are always stored contiguously in main memory, along with the 'user' block, and all three are always swapped as a unit; this simplifies physical memory allocation and swapping code.

Two variants in kernel layout

The details of how the kernel of UNIX V6 uses the main memory of the system is fundamentally different between the two types of PDP-11 Memory Management provided by the various models of PDP-11 on which UNIX V6 runs.

Single-space layout

For the PDP-11/40-type 'limited subset' memory management (the /40 is the only model supported by the stock V6, but it is relatively easy to get it running on the PDP-11/34 and the PDP-11/23, which also only support the same limited subset), the memory management hardware provides only a single 16-bit address space.

The organization of the kernel's address space is very similar to that of a Unix process: the object code is in the low part of the kernel's address space, with initialized data above it, and the so-called BSS (un-initialized data) above that.

This exactly matches the layout of all this in actual physical main memory; the OS is in the lowest physical memory, with the code at the bottom, and kernel data above that. There are 8 segments available when memory management is enabled; the first 6 are used to hold the kernel's instructions and data, as described. As mentioned above, the 7th segment is used to gain access to the swappable per-process data of each process, and the 8th is set up to allow access to the UNIBUS 'I/O Page'.

Note that the allocation of 6 segments to the kernel's code and data provides a hard limit on the size of the kernel, including the number of disk buffers, etc. (Later versions of UNIX supported overlays in the kernel, to remove this limitation.) The stock V6 does not check to make sure that a system build respects this limit; it is perfectly possible to build images that take more than 6 segments (48KB) to hold them, which will fail in bizarre and un-predictable ways.

Split I-D layout

The more powerful models of the PDP-11 supported by V6 (the PDP-11/45 and PDP-11/70) provide two independent 16-bit address spaces for code and data, for both the kernel, and for user processes. (Two of the data segments have fixed, special usage, as described above.) The way the code and data blocks are allocated to actual physical main memory for the kernel are non-obvious, though.

All the kernel data (both initialized and un-initialized), along with a small amount of code, is placed in low physical memory. The advantage of doing this is that the physical address of any kernel data is the same as its virtual address, so when setting up a peripheral to do DMA (which always uses physical addresses), no address translation is necessary.

(The small amount of code in low memory includes code which runs while booting, before memory management is enabled, and also interrupt vectors, which the PDP-11 hardware requires to be in low kernel data memory.)

This has implications for the layout of the code and data in the system image (in the file which contains it), in the kernel address space, and for the loading and initialization of the system.

First, the Unix linker is not prepared to produce output files in which the data is below the code (in file terms), which is what would be needed to simply load the file produced by the linker into main memory, and have things wind up at the correct physical locations.

Rather than complicate the bootstrap (which, in V6, is a single stage which has to fit into a single disk block) to perform this transformation, a separate program, 'sysfix', post-processes the linker's output object code to produce a file where the initialized data is placed below the code; it also moves the code's absolute address up, so that it starts at the start of the second page of the instruction space.

This is because of that small amount of code in low memory; instruction address space segment 0 is used to contain this (once memory management is enabled), so that it will appear in the address space at the same place after memory management is enabled as before (so that this code, which is what is running when memory management is turned on, will appear in the same place during that); this means that the system's actual code must start in segment 1.

The file does not contain the BSS, though, so after the system image is loaded into main memory by the bootstrap, as part of the OS's initialization the code must be moved up in real memory, to make room for the BSS.

See also