PDP-11 C stack operation
PDP-11 C stack operation is explained in detail in an old help file, 'C Calling' ('Internal Workings of PDP-11 C Programs'), which I wrote in about 1978.
To give a tiny bit of background, C on the PDP-11 makes heavy use of the stack; on subroutine calls, arguments are passed on the stack (as part of the usage of the stack for the call stack), and automatic data is also kept on the stack (all addressed using the frame pointer, R5).
To give an example, this test routine:
/* Show C stack usage.
 */
foo(a, b)
int a, b;
{   int x, y;
    x = 0;
    bar(1, a);
    return(y);
}
produces this output:
.globl  _foo
.text
_foo:
~a=4
~b=6
~x=177770
~y=177766
        jsr     r5,csv
        sub     $4,sp
        clr     -10(r5)
        mov     4(r5),(sp)
        mov     $1,-(sp)
        jsr     pc,*$_bar
        tst     (sp)+
        mov     -12(r5),r0
L1:     jmp     cret
The only things a PDP-11 C subroutine needs in its environment are i) a stack; ii) the arguments, and return point, on the top of the stack. Two special elements of run-time support, csv and cret, set up and tear down the stack frame (on entry and exit, respectively); csv will set up the frame pointer (the old contents of which are saved via the "jsr r5" which starts csv), making no assumptions about the old contents of R5.
C routines respect all registers except R0 and R1 (which are also used to hold return values; R1 only when a long is returned), and expect the same of routines they call.
Note that the top location on the stack is a scratch word (set up by csv). To call another subroutine, the arguments are pushed, the routine is called (which pushes the return PC), and on return (which pops the return PC), the arguments are discarded by the caller.
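The symbol values in the listing above (~a=4, ~b=6, ~x=177770, ~y=177766) follow from the frame layout: the old FP sits at 0(R5), the return PC at 2(R5), and csv saves R4, R3 and R2 at -2, -4 and -6. The following is a minimal sketch of that arithmetic in modern C, assuming every argument and automatic is a 2-byte int and that automatics are laid out in declaration order (as the listing suggests). The function names arg_offset, auto_offset and as_word are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* FP-relative byte offset of the i-th int argument (0-based).
 * 0(FP) holds the old FP and 2(FP) the return PC, so arguments start at 4. */
int arg_offset(int i)
{
    assert(i >= 0);
    return 4 + 2 * i;
}

/* FP-relative byte offset of the k-th int automatic (0-based).
 * The saved R4, R3 and R2 occupy -2, -4 and -6, so automatics
 * start at -8 decimal, which is -10 octal. */
int auto_offset(int k)
{
    assert(k >= 0);
    return -(8 + 2 * k);
}

/* The 16-bit two's-complement word that the listing prints for an offset. */
uint16_t as_word(int offset)
{
    return (uint16_t)offset;
}
```

Under these assumptions, as_word(auto_offset(0)) is 0177770 and as_word(auto_offset(1)) is 0177766, matching ~x and ~y, while arg_offset(0) and arg_offset(1) give 4 and 6, matching ~a and ~b.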
Details
THE INTERNAL WORKINGS OF PDP-11 C PROGRAMS
Noel Chiappa - MIT/LCS/CSR
This is a description of the internal workings of any given
compiled C program output by the UNIX C compiler. C is a stack frame
language, using R5 as the stack frame pointer. For simplicity, R5 will
hereafter be called the FP (frame pointer). Note that arguments are
generally passed on the stack and answers returned in the registers.
Recall also that C global names generally start with an "_" tacked on
in front of the declared name. Generally only routines and EXTERNALS
(both implicit and declared) are given the honour of global names.
On entry (in the UNIX environment - for a discussion of
stand alone C, see the end), the SP points to the lowest location
on a stack that looks as follows:
        Address                 Word
<Or>    Address                 Hibyte              Lobyte

        177776                  0                   ARGV[ARGC-1][n]
        .
        .
        &ARGV[ARGC-1][0]-1      ARGV[ARGC-1][0]     0
        .                       ARGV[ARGC-2][n]     ARGV[ARGC-2][n-1]
        .
        .
        &ARGV[0][0]             ARGV[0][1]          ARGV[0][0]
        &&ARGV[ARGC]            177777
        &&ARGV[ARGC-1]          &ARGV[ARGC-1][0]
        .
        .
        &&ARGV[0]               &ARGV[0][0]
SP-->   0[SP]                   ARGC
On entry, a routine called CRT0 is executed. It comes in
several flavors, depending on the surrounding environment:
        CRT0    Ordinary vanilla
        FCRT0   For programs with the floating point hardware simulator
        MCRT0   If the PROFIL option is in use.
The basic effect of CRT0 is to set the SP one word lower, move ARGC
into that, put &&ARGV[0] into the location above that, and leave
the SP pointing to the bottom of the stack. It then does a
JSR PC, _MAIN. The basic effect is to leave the bottom of the stack
looking like this:
        .
        .
        &&ARGV[0]       &ARGV[0][0]
        2(SP)           &&ARGV[0]
SP-->   0(SP)           ARGC
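The adjustment CRT0 makes can be modelled on a small array of 16-bit words. This is only a sketch of the effect described above, written in modern C, not the CRT0 source itself. The names MEM_WORDS, word_at and crt0_adjust are hypothetical, and memory is indexed by byte address divided by 2.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64    /* hypothetical size of the modelled memory */

/* Returns a pointer to the word at an even byte address. */
static uint16_t *word_at(uint16_t *mem, uint16_t addr)
{
    assert(addr % 2 == 0 && addr / 2 < MEM_WORDS);
    return &mem[addr / 2];
}

/* Applies the stack adjustment described for CRT0 and returns the new SP.
 * On entry, sp is the address of the word holding ARGC, and the ARGV
 * pointers follow it at increasing addresses. */
uint16_t crt0_adjust(uint16_t *mem, uint16_t sp)
{
    uint16_t argc = *word_at(mem, sp);
    uint16_t argv = (uint16_t)(sp + 2);             /* address of the ARGV[0] pointer */

    sp = (uint16_t)(sp - 2);                        /* set the SP one word lower */
    *word_at(mem, sp) = argc;                       /* move ARGC into the new top word */
    *word_at(mem, (uint16_t)(sp + 2)) = argv;       /* old ARGC slot now holds &&ARGV[0] */
    return sp;
}
```

After the call, 0(SP) holds ARGC and 2(SP) holds the address of the ARGV pointer array, which is exactly what MAIN expects as its two arguments.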
This concludes the special handling. MAIN acts just like all
other C routines, so the following discussion applies to it too. C
routines expect their arguments on the stack and return values in the
low register(s). (Now you know why you can only return one value!)
All arguments are passed by value, so in general you only pass simple
variables, with no arrays or structures or suchlike. They are in
reverse order, with the first arguments at the top of the stack, and
the last lowermost. (For longs, reals and doubles, the format is
standard PDP-11 format; the highest order word is in the highest
number word/register.) The topmost position (@SP) is of course the
return PC. Routines do not remove their arguments from the stack.
The first move of all (compiled) C routines is to do a
JSR FP, CSV. This is a general routine that does stack frame set up and
saves the old register set. It first sets FP to the current SP.
(Remember that the JSR will have saved the old FP on the stack.)
It then pushes registers 2 through 4. (Remember about being only able
to use 3 REGISTER variables?) It then does (of all idiotic things) a
JSR PC, @R0. (R0 is where it saved the return point which had been held
in FP. It's idiotic because they throw away the PC that the JSR
stores, so a JMP @R0 would have done just as well. I suspect that they
use a JSR for the side effect, possibly having to do with a C protocol
about the top of the stack being a scratch location. Oh well.) At this
point the stack looks like:
        .
        .
        <4+2N>(FP)      ArgN
        .
        .
        4(FP)           Arg0
        2(FP)           Old PC (From calling routine)
FP-->   0(FP)           Old FP
        .               Old R4
        .               Old R3
        .               Old R2
SP-->   0(SP)           Old PC (From CSV - unused)
Note that this top word is unused - any C routine that uses the
stack will write over it. The next thing done is to subtract an
appropriate amount from SP to allocate space for automatic storage.
(Static storage will be discussed in a moment.) All references to
arguments are thus positive relative to the FP, and references to auto
storage are negative relative to the FP. Temporaries are on top of
that, but are generally accessed via the SP.
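The entry sequence can be traced with a toy model of the registers and stack. This is a sketch in modern C of the steps the text describes, not a transcription of CSV. The struct pdp11, the helpers push, fp_word, call_with_args and enter_frame, and the placeholder CSV_PC (standing for the unused PC that CSV's own JSR leaves on top) are all hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64
#define CSV_PC    01000     /* placeholder for the PC pushed inside CSV */

enum { FP = 5, SP = 6, PC = 7 };

struct pdp11 {
    uint16_t r[8];              /* r[5] = FP, r[6] = SP, r[7] = PC */
    uint16_t mem[MEM_WORDS];    /* indexed by byte address / 2 */
};

static void push(struct pdp11 *m, uint16_t value)
{
    m->r[SP] = (uint16_t)(m->r[SP] - 2);
    assert(m->r[SP] / 2 < MEM_WORDS);
    m->mem[m->r[SP] / 2] = value;
}

/* Reads the word at a signed byte offset from FP. */
uint16_t fp_word(const struct pdp11 *m, int offset)
{
    int addr = m->r[FP] + offset;
    assert(addr >= 0 && addr / 2 < MEM_WORDS);
    return m->mem[addr / 2];
}

/* Caller side: push the arguments last-first, then the return PC. */
void call_with_args(struct pdp11 *m, const uint16_t *args, int n, uint16_t return_pc)
{
    for (int i = n - 1; i >= 0; i--)
        push(m, args[i]);
    push(m, return_pc);         /* JSR PC,_routine */
}

/* Callee side: models JSR FP,CSV. body_pc stands for the address in the
 * C routine just after that JSR, where execution resumes. */
void enter_frame(struct pdp11 *m, uint16_t body_pc)
{
    push(m, m->r[FP]);          /* the JSR saves the old FP */
    m->r[0] = body_pc;          /* CSV keeps the return point in R0 */
    m->r[FP] = m->r[SP];        /* FP = current SP */
    push(m, m->r[4]);
    push(m, m->r[3]);
    push(m, m->r[2]);
    push(m, CSV_PC);            /* JSR PC,@R0 leaves the scratch word */
    m->r[PC] = body_pc;
}
```

After call_with_args followed by enter_frame, fp_word(m, 4) is the first argument, fp_word(m, 2) is the caller's return PC, fp_word(m, 0) is the old FP, and SP sits 8 bytes below FP on the scratch word.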
Static data comes in two flavors - global and local. Global
can be initialized, and if initialized lives in what is called
the DATA segment. Local static cannot and lives in the BSS segment.
(BSS stands for Block Started by Symbol; it was originally IBM 7094
terminology for "a block of reserved storage". It is where
uninitializable static lives, as opposed to initialized, which is
in the DATA segment.) Uninitialized global static also lives in the BSS
segment. In programs that only use I space, the order is TEXT, DATA
and BSS, with TEXT starting at 0, and the DATA and BSS segments
contiguous after it. (Note that in shared pure files DATA will start
on a 4K boundary.) In programs that use both I and D spaces, DATA will
also start at 0.
The rest of the internal workings of any given compiled
C routine should be obvious to anyone with sufficient PDP-11
Assembly Language experience[1][2]. Generous use of the compiler -S
option for a while will soon make it possible for you to start
grubbing directly via ADB. [3] is also highly recommended to all who
want to know how this garbage comes to be.
At the end of each routine, the routine stores its return value
(if any) in the appropriate register(s) and does a JMP CRET. CRET,
the companion routine to CSV, does the inverse of the former. CRET
is a cleanup routine that goes through and restores registers 2 through
4, restores SP (it is set to the current FP, which is, as you will
remember, the old top of stack), restores the FP from the stack,
and does an RTS PC, thereby popping the old PC and leaving the stack as
it was at the time of the call.
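The exit sequence can be sketched the same way. The model below, in modern C, follows the steps listed in the text and is not the CRET source. The struct pdp11 and the helpers pop and leave_frame are hypothetical names, and the frame is assumed to have the layout shown earlier (old FP at 0(FP), saved R4, R3, R2 at -2, -4, -6).

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64

enum { FP = 5, SP = 6, PC = 7 };

struct pdp11 {
    uint16_t r[8];              /* r[5] = FP, r[6] = SP, r[7] = PC */
    uint16_t mem[MEM_WORDS];    /* indexed by byte address / 2 */
};

static uint16_t pop(struct pdp11 *m)
{
    assert(m->r[SP] / 2 < MEM_WORDS);
    uint16_t value = m->mem[m->r[SP] / 2];
    m->r[SP] = (uint16_t)(m->r[SP] + 2);
    return value;
}

/* Models JMP CRET. R0 (and R1) are left alone, since they carry the
 * return value back to the caller. */
void leave_frame(struct pdp11 *m)
{
    uint16_t fp = m->r[FP];
    assert(fp >= 6 && fp / 2 < MEM_WORDS);

    m->r[4] = m->mem[(fp - 2) / 2];     /* restore registers 2 through 4 */
    m->r[3] = m->mem[(fp - 4) / 2];
    m->r[2] = m->mem[(fp - 6) / 2];
    m->r[SP] = fp;                      /* discard autos and temporaries */
    m->r[FP] = pop(m);                  /* restore the caller's FP */
    m->r[PC] = pop(m);                  /* RTS PC */
}
```

Note that after leave_frame the SP points at the first argument, not above the arguments: removing them is left to the caller, which is what the tst (sp)+ after a call does in compiled code.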
If EXIT or _EXIT is called explicitly, they simply put
their argument in R0 (for use by the EXIT call - note that if
not explicitly specified this may well be garbage) and do an
EXIT call. The difference is that EXIT makes a call to
_CLEANUP before dying. As appropriate, it stores the old FP
and gets a new one from the value of SP just before the JSR.
Failing that, when MAIN exits, CRT0 calls _EXIT, with
the returned value as an argument.
The reason that CRT (C Run Time support) is what starts
up is that the UNIX C compiler automatically links in a CRT file of
some sort unless specifically told not to via the -c option.
The stack pointer and arguments will have been set up
by the UNIX system during the EXEC system call. If you want to
use C in a stand alone program, you will have to provide
your own substitute for the initial startup, and you may want
to provide your own version of such things as CSV, etc.
[1] Digital Equipment Corporation, "PDP-11 Processor Handbook,"
D.E.C.
[2] Ritchie, D.M., "The UNIX Assembler,"
Bell Labs Memo, available as part of the UNIX documentation.
[3] Ritchie, D.M., "A Tour Through the UNIX C Compiler,"
Bell Labs Memo, available online in UNIX.
External links
- csv.s - original V6 csv and cret source code
- csv.s - long-return-safe source
- crt0.s