UNIX V6 internals

From Computer History Wiki
Jump to: navigation, search

The internals of UNIX V6 are relatively easy to understand, particularly with the aid of the (justly) famed Lions book, John Lions' Commentary on UNIX 6th Edition. Reading UNIX V6 memory layout, which explains V6's use of physical main memory, and PDP-11 Memory Management, is recommended as a first step, before tackling Lions.

Here are a few notes on various specific topics which are covered in more detail than in Lions, and which may be helpful.


UNIX kernels consists of a number of smallish relocatable binary modules which are statically linked into a large object code image; during booting, that image is loaded into memory and started. After the usual initialization, the '0th' process (which in V6 has the task of swapping processes in and out) is hand-crafted; if then 'forks' into two, using the standard UNIX fork() system call (which creates a 'clone' of the existing process).

The second process then runs a tiny program (the binary for which is built into the kernel) which does an 'exec()' system call, to read in and run "/etc/init". That process/command then does another fork(), and the child process from that exec()'s the shell (command interpreter, in "/bin/sh").

savu(), retu() and aretu()

These are the primitives which switch, or otherwise adjust, stacks in the kernel. aretu() is significantly different from retu() - the latter switches to a different process/stack, whereas aretu() does a 'non-local goto' in the current process. (Actually, technically, it switches to a different stack frame on the current stack; when the subroutine which had called aretu() returns, it does not return to its caller, but instead returns to the procedure which had called savu().) In addition to switching to a different stack, retu() also switches to a saved stack frame on that stack, in the same manner as aretu().

Note that in PDP-11 C all stack frames look identical, but this is not true of other machines/compilers. So if subroutine A calls savu(), and subroutine B calls aretu(), when the call to aretu() returns, procedure B is still running, but on procedure A's stack frame. So on machines where A's stack frame looks different from B's, hilarity ensues; this famously caused a serious problem when Unix Seventh Edition was moved to the Interdata 8/32.

rsav, qsav and ssav

There are actually three sets of saved stack information stored in the 'user' structure (the swappable per-process data block):

	int	u_rsav[2];	/* save r5,r6 when exchanging stacks */
	int	u_qsav[2];	/* label variable for quits and interrupts */
	int	u_ssav[2];	/* label variable for swapping */

Calls to retu(), the primitive to switch stacks/processes, always use rsav. The others are for 'non-local gotos' inside a process.

One can think of the qsav as a poor man's exception handler for software interrupts to a process while it's asleep in the kernel, waiting for something. When a process is interrupted out of such a wait, rather than the sleep() call returning, the process wakes up returning from the procedure that did the savu(qsav). (That happens because sleep() - which is the procedure that's running when the call to aretu(qsav) returns - does a return immediately after restoring the stack to the frame saved in qsav.) The intervening call chain (from the savu() to the sleep()) is discarded - which is why UNIX has that annoying "Interrupted system call" system call error return.

The ssav is used in association with swapping; when a process is swapped out, since that can happen in a number of ways/places, the call stack can contain calls to various things like expand(), etc. When it's swapped back in, apparently rather than include code in all those places to unwind the call stack, the simplest thing to do is to just throw that all away, and have it go back to where it was just before it was decided to swap it out.

exec() and pure-text images

For simple programs/commands (stored in the file system in a file, and read into main memory to execute them), UNIX divides the address space of a process into a mixed 'text/data' segment and a 'stack' segment, using hardware support from PDP-11 Memory Management. The text/data segment contains both object code, and data.

UNIX also has the ability to provide separate 'text' (referred to as a 'pure text') and 'data' segments; the text segment will be read-only, and shared between all processes executing that program/command.

(Note that for simplicity, the data and stack segments are stored contiguously in physical main memory, although they are in separate segments in the address space of the process.)

The operation of the exec() system call (which replaces the contents of the address space of a process with new contents, read in from a program/command stored in a file) is fairly simple for mixed programs/commands: a block of main memory large enough to hold the new program/command is allocated (the process may pause to be swapped out before this happens), the text and data is read in, and off it goes.

Both newproc() (the routine that implements the fork() operation) and expand() (the routine used to re-size the data/stack segments of a process) have code which uses a simple memory-memory copy if there is enough free memory to do what they have been requested to do.

Things are considerably more complex when the new program/command has a pure text.

In such a case, exec() calls xalloc(), which, if the text was not already available (either in main memory, or swapped out), reads in a copy of the pure text from the file, and then always moves that (contiguous) copy out to the swap device. In both this case, and if the text was in use, but not already in main memory, it also swaps out the rest of the process, because, as the code explains:

  if the calling process
  is misplaced in core the text image might not fit.
  Quite possibly the code after "out:" could check to
  see if the text does fit and simply swap it in.

The first part of the comment is applicable even in the case where the text was just read into main memory: note that the process doesn't yet have a data or stack segment allocated at that point, just the 'user' area. (The two will be stored contiguously with the 'user' area, when they are eventually added.)

The data and stack segments will be set up later in the exec() code, after the call to xalloc() returns, after the process is swapped back in, but this may be impossible if the text segment is 'poorly' placed in main memory - which is why the in-memory copy is discarded. (This is more likely to be an issue on systems with a limited amount of main memory, of course.)

(In fact, the process will never have anything other than a 'user' area when in xalloc(); the only call to xalloc() in the system is immediately preceeded by an "expand(USIZE)" which throws away everything except the 'user' area.)

(It would be possible to do what the comment suggests - adding code to check to see if there's enough memory available to hold the pure text as well as the rest of the process, and if so, avoiding the swap-out/in of the 'user' segment of the process. It appears that something like this was done in PWB/UNIX, which is otherwise mostly very similar to V6; xalloc() calls xexpand(), which only swaps out the process if there is no room for the pure text. In PWB1, though, the data and stack segments have already been set up when this happens.)

Anyway, unlike the mixed text/data case, a considerable number of swap operations inevitably result, (the exact number depending on exactly what the situation is with main memory); for the first instance of a pure-text program/command, at least:

  • swapping out the text
  • swapping out the 'user' area
  • swapping the text back in
  • swapping the 'user' area back in

The last two are performed by the so-called 'swapping' process (process 0), which runs in a permanent loop in the kernel, in sched(); its job is to find processes that need to be swapped in, and do so (perhaps after swapping others out, to make room).

Note that at that point, after the two swaps in, the process still does not have its data and stack segments; therefore, the call to expand(), after the return from xalloc(), to allocate the memory for them, may also result in a swap out/in cycle.

See also

External links