Programming in C - A Tutorial
This appears to have been written for inclusion with Sixth Edition Unix.
Chapter 1
Programming in C - A Tutorial
Brian W. Kernighan
Bell Laboratories, Murray Hill, N. J.
1. Introduction
C is a computer language available on the GCOS and UNIX operating
systems at Murray Hill and (in preliminary form) on OS/360 at
Holmdel. C lets you write your programs clearly and simply: it
has decent control flow facilities so your code can be read
straight down the page, without labels or GOTO's; it lets you
write code that is compact without being too cryptic; it
encourages modularity and good program organization; and it
provides good data-structuring facilities.
This memorandum is a tutorial to make learning C as painless as
possible. The first part concentrates on the central features of
C; the second part discusses those parts of the language which
are useful (usually for getting more efficient and smaller code)
but which are not necessary for the new user. This is not a
reference manual. Details and special cases will be skipped
ruthlessly, and no attempt will be made to cover every language
feature. The order of presentation is hopefully pedagogical
instead of logical. Users who would like the full story should
consult the C Reference Manual by D. M. Ritchie [1], which should
be read for details anyway. Runtime support is described in [2]
and [3]; you will have to read one of these to learn how to
compile and run a C program.
We will assume that you are familiar with the mysteries
of creating files, text editing, and the like in the operat-
ing system you run on, and that you have programmed in some
language before.
2. A Simple C Program
    main( ) {
        printf("hello, world");
    }
A C program consists of one or more functions, which are similar
to the functions and subroutines of a Fortran program or the
procedures of PL/I, and perhaps some external data definitions.
main is such a function, and in fact all C programs must have a
main. Execution of the program begins at the first statement of
main. main will usually invoke other functions to perform its
job, some coming from the same program, and others from
libraries.
One method of communicating data between functions is by
arguments. The parentheses following the function name surround
the argument list; here main is a function of no arguments,
indicated by ( ). The {} enclose the statements of the function.
Individual statements end with a semicolon but are otherwise
free-format.
October 10, 1975
printf is a library function which will format and print output
on the terminal (unless some other destination is specified). In
this case it prints
    hello, world
A function is invoked by naming it, followed by a list of
arguments in parentheses. There is no CALL statement as in
Fortran or PL/I.
3. A Working C Program; Variables; Types and Type Declarations
Here's a bigger program that adds three integers and prints their
sum.
    main( ) {
        int a, b, c, sum;
        a = 1; b = 2; c = 3;
        sum = a + b + c;
        printf("sum is %d", sum);
    }
Arithmetic and the assignment statements are much the same as in
Fortran (except for the semicolons) or PL/I. The format of C
programs is quite free. We can put several statements on a line
if we want, or we can split a statement among several lines if it
seems desirable. The split may be between any of the operators or
variables, but not in the middle of a name or operator. As a
matter of style, spaces, tabs, and newlines should be used freely
to enhance readability.
C has four fundamental types of variables:
    int     integer (PDP-11: 16 bits; H6070: 36 bits; IBM360: 32 bits)
    char    one byte character (PDP-11, IBM360: 8 bits; H6070: 9 bits)
    float   single-precision floating point
    double  double-precision floating point
There are also arrays and structures of these basic types,
pointers to them and functions that return them, all of which we
will meet shortly.
All variables in a C program must be declared, although this can
sometimes be done implicitly by context. Declarations must
precede executable statements. The declaration
    int a, b, c, sum;
declares a, b, c, and sum to be integers.
Variable names have one to eight characters, chosen from A-Z,
a-z, 0-9, and _, and start with a non-digit. Stylistically, it's
much better to use only a single case and give functions and
external variables names that are unique in the first six
characters. (Function and external variable names are used by
various assemblers, some of which are limited in the size and
case of identifiers they can handle.) Furthermore, keywords and
library functions may only be recognized in one case.
4. Constants
We have already seen decimal integer constants in the previous
example: 1, 2, and 3. Since C is often used for system
programming and bit-manipulation, octal numbers are an important
part of the language. In C, any number that begins with 0 (zero!)
is an octal integer (and hence can't have any 8's or 9's in it).
Thus 0777 is an octal constant, with decimal value 511.
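The leading-zero rule is easy to check for yourself. The sketch below is written for a present-day compiler rather than the 1975 one, and both function names are made up for the illustration.

```c
/* Sketch only; octal_equals_decimal and low_six_bits are hypothetical names. */

/* A leading 0 makes the constant octal: 0777 = 7*64 + 7*8 + 7 = 511. */
int octal_equals_decimal(void)
{
    return 0777 == 511;
}

/* Each octal digit covers three bits, so 077 is a mask for the low six. */
int low_six_bits(int x)
{
    return x & 077;
}
```

Here low_six_bits(0777) gives 63, which is 077 again: the mask keeps only the two rightmost octal digits.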
A ``character'' is one byte (an inherently machine-dependent
concept). Most often this is expressed as a character constant,
which is one character enclosed in single quotes. However, it may
be any quantity that fits in a byte, as in flags below:
    char quest, newline, flags;
    quest = '?';
    newline = '\n';
    flags = 077;
The sequence `\n' is C notation for ``newline charac-
ter'', which, when printed, skips the terminal to the begin-
ning of the next line. Notice that `\n' represents only a
single character. There are several other ``escapes'' like
`\n' for representing hard-to-get or invisible characters,
such as `\t' for tab, `\b' for backspace, `\0' for end of
file, and `\\' for the backslash itself.
float and double constants are discussed in section 26.
5. Simple I/O: getchar, putchar, printf
    main( ) {
        char c;
        c = getchar( );
        putchar(c);
    }
getchar and putchar are the basic I/O library functions in C.
getchar fetches one character from the standard input (usually
the terminal) each time it is called, and returns that character
as the value of the function. When it reaches the end of whatever
file it is reading, thereafter it returns the character
represented by `\0' (ascii NUL, which has value zero). We will
see how to use this very shortly.
putchar puts one character out on the standard output (usually
the terminal) each time it is called. So the program above
reads one character and writes it back out. By itself, this
isn't very interesting, but observe that if we put a loop
around this, and add a test for end of file, we have a com-
plete program for copying one file to another.
printf is a more complicated function for producing formatted
output. We will talk about only the simplest use of it.
Basically, printf uses its first argument as formatting
information, and any successive arguments as variables to be
output. Thus
    printf ("hello, world\n");
is the simplest use: the string ``hello, world\n'' is printed
out. No formatting information, no variables, so the string is
dumped out verbatim. The newline is necessary to put this out on
a line by itself. (The construction "hello, world\n" is really an
array of chars. More about this shortly.)
More complicated, if sum is 6,
    printf ("sum is %d\n", sum);
prints
    sum is 6
Within the first argument of printf, the characters ``%d''
signify that the next argument in the argument list is to be
printed as a base 10 number.
Other useful formatting commands are ``%c'' to print out a single
character, ``%s'' to print out an entire string, and ``%o'' to
print a number as octal instead of decimal (no leading zero). For
example,
    n = 511;
    printf ("What is the value of %d in octal?", n);
    printf (" %s! %d decimal is %o octal\n", "Right", n, n);
prints
    What is the value of 511 in octal? Right! 511 decimal is 777 octal
Notice that there is no newline at the end of the first output
line. Successive calls to printf (and/or putchar, for that
matter) simply put out characters. No newlines are printed unless
you ask for them. Similarly, on input, characters are read one at
a time as you ask for them. Each line is generally terminated by
a newline (\n), but there is otherwise no concept of record.
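To look at what the format commands produce without a terminal in the way, one option on a current system is to format into a buffer. The sketch assumes the standard snprintf (which this tutorial does not cover) and a made-up helper name, describe.

```c
#include <stdio.h>

/* Hypothetical helper: builds the second message from the example above
   into buf and returns the number of characters written. */
int describe(char *buf, size_t size, int n)
{
    /* %s takes the string, %d prints n in decimal, %o prints n in octal. */
    return snprintf(buf, size, " %s! %d decimal is %o octal\n", "Right", n, n);
}
```

Calling describe with n equal to 511 leaves the same text in the buffer that printf would have put on the terminal, including the one newline we asked for.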
Chapter 2
1. If; relational operators; compound statements
The basic conditional-testing statement in C is the if statement:
    c = getchar( );
    if( c == '?' )
        printf("why did you type a question mark?\n");
The simplest form of if is
    if (expression) statement
The condition to be tested is any expression enclosed
in parentheses. It is followed by a statement. The expression
is evaluated, and if its value is non-zero, the statement is
executed. There's an optional else clause, to be described soon.
The character sequence `==' is one of the relational operators
in C; here is the complete set:
    ==    equal to (.EQ. to Fortraners)
    !=    not equal to
    >     greater than
    <     less than
    >=    greater than or equal to
    <=    less than or equal to
The value of ``expression relation expression'' is 1 if the
relation is true, and 0 if false. Don't forget that the equality
test is `=='; a single `=' causes an assignment, not a test, and
invariably leads to disaster.
Tests can be combined with the operators `&&' (AND), `||' (OR),
and `!' (NOT). For example, we can test whether a character is
blank or tab or newline with
    if( c==' ' || c=='\t' || c=='\n' ) ...
C guarantees that `&&' and `||' are evaluated left to right; we
shall soon see cases where this matters.
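Here is a small sketch of both points, the 0-or-1 value of a test and the left-to-right guarantee, written for a present-day compiler. The names is_blank and first_is are invented for the example.

```c
/* Sketch only; is_blank and first_is are hypothetical names. */

/* The value of the whole test is 1 if any comparison holds, 0 otherwise. */
int is_blank(int c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Left-to-right evaluation: s[0] is examined only when len > 0 was true,
   so an empty buffer is never touched. */
int first_is(const char *s, int len, char want)
{
    return len > 0 && s[0] == want;
}
```

If the order of evaluation were not guaranteed, first_is could look at s[0] before knowing there was anything there.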
One of the nice things about C is that the statement part of an
if can be made arbitrarily complicated by enclosing a set of
statements in {}. As a simple example, suppose we want to ensure
that a is bigger than b, as part of a sort routine. The
interchange of a and b takes three statements in C, grouped
together by {}:
    if (a < b) {
        t = a;
        a = b;
        b = t;
    }
As a general rule in C, anywhere you can use a simple
statement, you can use any compound statement, which is just
a number of simple or compound ones enclosed in {}. There
is no semicolon after the } of a compound statement, but there
is a semicolon after the last non-compound statement inside the
{}.
The ability to replace single statements by complex
ones at will is one feature that makes C much more pleasant
to use than Fortran. Logic (like the exchange in the previ-
ous example) which would require several GOTO's and labels
in Fortran can and should be done in C without any, using
compound statements.
2. While Statement; Assignment within an Expression; Null Statement
The basic looping mechanism in C is the while statement. Here's
a program that copies its input to its output a character at a
time. Remember that `\0' marks the end of file.
    main( ) {
        char c;
        while( (c=getchar( )) != '\0' )
            putchar(c);
    }
The while statement is a loop, whose general form is
    while (expression) statement
Its meaning is
    (a) evaluate the expression
    (b) if its value is true (ie, not zero)
            do the statement, and go back to (a)
Because the expression is tested before the statement is
executed, the statement part can be executed zero times, which is
often desirable. As in the if statement, the expression and the
statement can both be arbitrarily complicated, although we
haven't seen that yet. Our example gets the character, assigns
it to c, and then tests if it's a `\0'. If it is not a `\0', the
statement part of the while is executed, printing the character.
The while then repeats. When the input character is finally a
`\0', the while terminates, and so does main.
Notice that we used an assignment statement
    c = getchar( )
within an expression. This is a handy notational shortcut which
often produces clearer code. (In fact it is often the only way to
write the code cleanly. As an exercise, re-write the file-copy
without using an assignment inside an expression.) It works
because an assignment statement has a value, just as any other
expression does. Its value is the value of the right hand side.
This also implies that we can use multiple assignments like
    x = y = z = 0;
Evaluation goes from right to left.
By the way, the extra parentheses in the assignment statement
within the conditional were really necessary: if we had said
    c = getchar( ) != '\0'
c would be set to 0 or 1 depending on whether the character
fetched was an end of file or not. This is because in the absence
of parentheses the assignment operator `=' is evaluated after the
relational operator `!='. When in doubt, or even if not,
parenthesize.
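The difference the parentheses make can be seen in isolation. In this sketch, written for a present-day compiler, the argument fetched stands in for the value getchar would return, and both function names are invented.

```c
/* Sketch only; without_parens and with_parens are hypothetical names. */

/* No parentheses: `!=' is done first, so c receives the 0-or-1 result. */
int without_parens(int fetched)
{
    int c;
    c = fetched != '\0';
    return c;
}

/* Parenthesized: c receives the character, and the comparison result
   (stored through more) is what a while or if would test. */
int with_parens(int fetched, int *more)
{
    int c;
    *more = ((c = fetched) != '\0');
    return c;
}
```

With the character 'x', without_parens returns 1 while with_parens returns 'x' itself and reports that there is more input to come.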
Since putchar(c) returns c as its function value, we could also
copy the input to the output by nesting the calls to getchar and
putchar:
    main( ) {
        while( putchar(getchar( )) != '\0' ) ;
    }
What statement is being repeated? None, or technically, the null
statement, because all the work is really done within the test
part of the while. This version is slightly different from the
previous one, because the final `\0' is copied to the output
before we decide to stop.
3. Arithmetic
The arithmetic operators are the usual `+', `-', `*', and `/'
(truncating integer division if the operands are both int), and
the remainder or mod operator `%':
    x = a%b;
sets x to the remainder after a is divided by b (i.e., a mod b).
The results are machine dependent unless a and b are both
positive.
In arithmetic, char variables can usually be treated like int
variables. Arithmetic on characters is quite legal, and often
makes sense:
    c = c + 'A' - 'a';
converts a single lower case ascii character stored in c to
upper case, making use of the fact that corresponding ascii
letters are a fixed distance apart. The rule governing this
arithmetic is that all chars are converted to int before the
arithmetic is done. Beware that conversion may involve
sign-extension: if the leftmost bit of a character is 1, the
resulting integer might be negative. (This doesn't happen with
genuine characters on any current machine.)
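A sketch of the two ideas in this paragraph, for a present-day compiler and assuming an ascii machine. The names to_upper_ascii and widen are invented; going through unsigned char is one common way to sidestep sign-extension, not something the tutorial prescribes.

```c
/* Sketch only; to_upper_ascii and widen are hypothetical names. */

/* Shift a lower case letter by the fixed ascii distance between cases;
   anything else is returned unchanged. */
int to_upper_ascii(int c)
{
    if ('a' <= c && c <= 'z')
        return c + 'A' - 'a';
    return c;
}

/* Convert a char to int without sign-extension: the result is never
   negative, even when the leftmost bit of the byte is 1. */
int widen(char c)
{
    return (unsigned char)c;
}
```

The guard in to_upper_ascii matters: the bare expression c + 'A' - 'a' would also shift digits and punctuation.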
So to convert a file into lower case:
    main( ) {
        char c;
        while( (c=getchar( )) != '\0' )
            if( 'A'<=c && c<='Z' )
                putchar(c+'a'-'A');
            else
                putchar(c);
    }
Characters have different sizes on different machines. Further,
this code won't work on an IBM machine, because the letters in
the ebcdic alphabet are not contiguous.
4. Else Clause; Conditional Expressions
We just used an else after an if. The most general form of if is
    if (expression) statement1 else statement2
The else part is optional, but often useful. The canonical
example sets x to the minimum of a and b:
    if (a < b)
        x = a;
    else
        x = b;
Observe that there's a semicolon after x=a.
C provides an alternate form of conditional which is often more
concise. It is called the ``conditional expression'' because it
is a conditional which actually has a value and can be used
anywhere an expression can. The value of
    a<b ? a : b;
is a if a is less than b; it is b otherwise. In general, the form
    expr1 ? expr2 : expr3
means ``evaluate expr1. If it is not zero, the value of the whole
thing is expr2; otherwise the value is expr3.''
To set x to the minimum of a and b, then:
    x = (a<b ? a : b);
The parentheses aren't necessary because `?:' is evaluated before
`=', but safety first.
Going a step further, we could write the loop in the lower-case
program as
    while( (c=getchar( )) != '\0' )
        putchar( ('A'<=c && c<='Z') ? c-'A'+'a' : c );
If's and else's can be used to construct logic that branches one
of several ways and then rejoins, a common programming structure,
in this way:
    if(...)
        {...}
    else if(...)
        {...}
    else if(...)
        {...}
    else
        {...}
The conditions are tested in order, and exactly one block is
executed: either the first one whose if is satisfied, or the one
for the last else. When this block is finished, the next
statement executed is the one after the last else. If no action
is to be taken for the ``default'' case, omit the last else.
For example, to count letters, digits and others in a file, we
could write
    main( ) {
        int let, dig, other, c;
        let = dig = other = 0;
        while( (c=getchar( )) != '\0' )
            if( ('A'<=c && c<='Z') || ('a'<=c && c<='z') ) ++let;
            else if( '0'<=c && c<='9' ) ++dig;
            else ++other;
        printf("%d letters, %d digits, %d others\n", let, dig, other);
    }
The `++' operator means ``increment by 1''; we will get to it in
the next section.
Chapter 3
1. Increment and Decrement Operators
In addition to the usual `-', C also has two other interesting
unary operators, `++' (increment) and `--' (decrement). Suppose
we want to count the lines in a file.
    main( ) {
        int c,n;
        n = 0;
        while( (c=getchar( )) != '\0' )
            if( c == '\n' )
                ++n;
        printf("%d lines\n", n);
    }
++n is equivalent to n=n+1 but clearer, particularly when n is a
complicated expression. `++' and `--' can be applied only to
int's and char's (and pointers, which we haven't got to yet).
The unusual feature of `++' and `--' is that they can be used
either before or after a variable. The value of ++k is the value
of k after it has been incremented. The value of k++ is k before
it is incremented. Suppose k is 5. Then
    x = ++k;
increments k to 6 and then sets x to the resulting value, i.e.,
to 6. But
    x = k++;
first sets x to 5, and then increments k to 6. The incrementing
effect of ++k and k++ is the same, but their values are
respectively 6 and 5. We shall soon see examples where both of
these uses are important.
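The two forms can be compared side by side. This is a sketch for a present-day compiler; the three function names are invented for the purpose.

```c
/* Sketch only; all three function names are hypothetical. */

/* x = ++k: k is incremented first, so x gets the new value. */
int value_of_prefix(int start)
{
    int k = start;
    int x = ++k;
    return x;
}

/* x = k++: x gets the old value, and k is incremented afterwards. */
int value_of_postfix(int start)
{
    int k = start;
    int x = k++;
    return x;
}

/* Either way, k itself ends up one larger. */
int k_after_postfix(int start)
{
    int k = start;
    k++;
    return k;
}
```

Starting from 5, the prefix form yields 6 and the postfix form yields 5, while k is 6 afterwards in both cases.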
2. Arrays
In C, as in Fortran or PL/I, it is possible to make arrays whose
elements are basic types. Thus we can make an array of 10
integers with the declaration
    int x[10];
The square brackets mean subscripting; parentheses are used only
for function references. Array indexes begin at zero, so the
elements of x are
    x[0], x[1], x[2], ..., x[9]
If an array has n elements, the largest subscript is n-1.
Multiple-dimension arrays are provided, though not much used
above two dimensions. The declaration and use look like
    int name[10] [20];
    n = name[i+j] [1] + name[k] [2];
Subscripts can be arbitrary integer expressions. Multi-dimension
arrays are stored by row (opposite to Fortran), so the rightmost
subscript varies fastest; name has 10 rows and 20 columns.
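``Stored by row'' can be made concrete by computing where an element lands. The sketch below is for a present-day compiler (it leans on sizeof and size_t, which the tutorial has not introduced), and both function names are invented.

```c
#include <stddef.h>

/* Hypothetical helper: position of element [row][col] counted from the
   start of an array with ncols columns, when storage is by row. */
size_t row_major_index(size_t row, size_t col, size_t ncols)
{
    return row * ncols + col;
}

/* Compare that formula with where the compiler really put name[3][7]. */
int layout_matches(void)
{
    static int name[10][20];
    size_t bytes = (size_t)((char *)&name[3][7] - (char *)name);
    return bytes == row_major_index(3, 7, 20) * sizeof(int);
}
```

Element [3][7] sits 3*20 + 7 = 67 integers from the start: three full rows of 20, then 7 more.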
Here is a program which reads a line, stores it in a buffer, and
prints its length (excluding the newline at the end).
    main( ) {
        int n, c;
        char line[100];
        n = 0;
        while( (c=getchar( )) != '\n' ) {
            if( n < 100 )
                line[n] = c;
            n++;
        }
        printf("length = %d\n", n);
    }
As a more complicated problem, suppose we want to print the count
for each line in the input, still storing the first 100
characters of each line. Try it as an exercise before looking at
the solution:
    main( ) {
        int n, c; char line[100];
        n = 0;
        while( (c=getchar( )) != '\0' )
            if( c == '\n' ) {
                printf("%d\n", n);
                n = 0;
            }
            else {
                if( n < 100 ) line[n] = c;
                n++;
            }
    }
3. Character Arrays; Strings
Text is usually kept as an array of characters, as we did with
line[ ] in the example above. By convention in C, the last
character in a character array should be a `\0' because most
programs that manipulate character arrays expect it. For example,
printf uses the `\0' to detect the end of a character array when
printing it out with a `%s'.
We can copy a character array s into another t like this:
    i = 0;
    while( (t[i]=s[i]) != '\0' )
        i++;
Most of the time we have to put in our own `\0' at the end of a
string; if we want to print the line with printf, it's necessary.
This code prints the character count before the line:
    main( ) {
        int n;
        char line[100];
        n = 0;
        while( (line[n++]=getchar( )) != '\n' );
        line[n] = '\0';
        printf("%d:\t%s", n, line);
    }
Here we increment n in the subscript itself, but only after the
previous value has been used. The character is read, placed in
line[n], and only then is n incremented.
There is one place and one place only where C puts in the `\0'
at the end of a character array for you, and that is in the
construction
    "stuff between double quotes"
The compiler puts a `\0' at the end automatically. Text enclosed
in double quotes is called a string; its properties are precisely
those of an (initialized) array of characters.
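The extra `\0' shows up if we ask how much room a string takes. This sketch is for a present-day compiler and uses sizeof, which the tutorial has not covered; the function names are invented.

```c
#include <stddef.h>

/* Five letters plus the `\0' the compiler adds: six bytes in all. */
size_t literal_size(void)
{
    return sizeof "hello";
}

/* An array initialized from a string gets the terminator as well. */
int copy_has_terminator(void)
{
    char s[] = "hi";
    return sizeof s == 3 && s[2] == '\0';
}
```

So a buffer meant to hold an n-character string needs n+1 elements.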
4. For Statement
The for statement is a somewhat generalized while that lets us
put the initialization and increment parts of a loop into a
single statement along with the test. The general form of the for
is
    for( initialization; expression; increment )
        statement
The meaning is exactly
    initialization;
    while( expression ) {
        statement
        increment;
    }
Thus, the following code does the same array copy as the example
in the previous section:
    for( i=0; (t[i]=s[i]) != '\0'; i++ );
This slightly more ornate example adds up the elements of an
array:
    sum = 0;
    for( i=0; i<n; i++)
        sum = sum + array[i];
In the for statement, the initialization can be left out if you
want, but the semicolon has to be there. The increment is also
optional. It is not followed by a semicolon. The second clause,
the test, works the same way as in the while: if the expression
is true (not zero) do another loop, otherwise get on with the
next statement. As with the while, the for loop may be done zero
times. If the expression is left out, it is taken to be always
true, so
    for( ; ; ) ...
and
    while( 1 ) ...
are both infinite loops.
You might ask why we use a for since it's so much like a while.
(You might also ask why we use a while because...) The for is
usually preferable because it keeps the code where it's used and
sometimes eliminates the need for compound statements, as in this
code that zeros a two-dimensional array:
    for( i=0; i<n; i++ )
        for( j=0; j<m; j++ )
            array[i][j] = 0;
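The claim that a for means exactly the corresponding while can be checked by writing the same sum both ways. A sketch for a present-day compiler; the function names are invented.

```c
/* Sketch only; sum_for and sum_while are hypothetical names. */

/* Initialization, test and increment all sit in the for itself. */
int sum_for(const int *array, int n)
{
    int i, sum;
    sum = 0;
    for (i = 0; i < n; i++)
        sum = sum + array[i];
    return sum;
}

/* The same loop unrolled into the while form given above. */
int sum_while(const int *array, int n)
{
    int i, sum;
    sum = 0;
    i = 0;
    while (i < n) {
        sum = sum + array[i];
        i++;
    }
    return sum;
}
```

Both return 0 when n is 0, since the test is made before the first trip through the loop.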
5. Functions; Comments
Suppose we want, as part of a larger program, to count the
occurrences of the ascii characters in some input text. Let us
also map illegal characters (those with value>127 or <0) into one
pile. Since this is presumably an isolated part of the program,
good practice dictates making it a separate function. Here is one
way:
    main( ) {
        int hist[129];      /* 128 legal chars + 1 illegal group */
        count(hist, 128);   /* count the letters into hist */
        printf( ... );      /* comments look like this; use them */
                            /* anywhere blanks, tabs or newlines could appear */
    }
    count(buf, size)
    int size, buf[ ]; {
        int i, c;
        for( i=0; i<=size; i++ )
            buf[i] = 0;                     /* set buf to zero */
        while( (c=getchar( )) != '\0' ) {   /* read til eof */
            if( c > size || c < 0 )
                c = size;                   /* fix illegal input */
            buf[c]++;
        }
        return;
    }
We have already seen many examples of calling a function, so let
us concentrate on how to define one. Since count has two
arguments, we need to declare them, as shown, giving their types,
and in the case of buf, the fact that it is an array. The
declarations of arguments go between the argument list and the
opening `{'. There is no need to specify the size of the array
buf, for it is defined outside of count.
The return statement simply says to go back to the calling
routine. In fact, we could have omitted it, since a return
is implied at the end of a function.
What if we wanted to return a value, say the number of characters
read? The return statement allows for this too:
        int i, c, nchar;
        nchar = 0;
        while( (c=getchar( )) != '\0' ) {
            if( c > size || c < 0 )
                c = size;
            buf[c]++;
            nchar++;
        }
        return(nchar);
Any expression can appear within the parentheses. Here is a
function to compute the minimum of two integers:
    min(a, b)
    int a, b; {
        return( a < b ? a : b );
    }
To copy a character array, we could write the function
    strcopy(s1, s2)     /* copies s1 to s2 */
    char s1[ ], s2[ ]; {
        int i;
        for( i = 0; (s2[i] = s1[i]) != '\0'; i++ );
    }
As is often the case, all the work is done by the assignment
statement embedded in the test part of the for. Again, the
declarations of the arguments s1 and s2 omit the sizes, because
they don't matter to strcopy. (In the section on pointers, we
will see a more efficient way to do a string copy.)
There is a subtlety in function usage which can trap
the unsuspecting Fortran programmer. Simple variables (not
arrays) are passed in C by ``call by value'', which means
that the called function is given a copy of its arguments,
and doesn't know their addresses. This makes it impossible
to change the value of one of the actual input arguments.
There are two ways out of this dilemma. One is to make
special arrangements to pass to the function the address of
a variable instead of its value. The other is to make the
variable a global or external variable, which is known to
each function by its name. We will discuss both possibili-
ties in the next few sections.
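The trap is easy to demonstrate. In this sketch, for a present-day compiler, every function name is invented; one function tries to change a simple variable and another changes an array element.

```c
/* Sketch only; all four function names are hypothetical. */

/* x is a copy of the caller's value; changing it changes only the copy. */
void try_to_zero(int x)
{
    x = 0;
}

/* An array argument is really an address, so the caller's element changes. */
void zero_first(int a[])
{
    a[0] = 0;
}

/* Returns what the caller's variable holds after try_to_zero. */
int scalar_survives(void)
{
    int v = 7;
    try_to_zero(v);
    return v;
}

/* Returns what the caller's array holds after zero_first. */
int array_changes(void)
{
    int a[1];
    a[0] = 7;
    zero_first(a);
    return a[0];
}
```

scalar_survives still sees 7, while array_changes sees 0.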
6. Local and External Variables
If we say
    f( ) {
        int x;
    }
    g( ) {
        int x;
    }
each x is local to its own routine: the x in f is unrelated to
the x in g. (Local variables are also called ``automatic''.)
Furthermore each local variable in a routine appears only when
the function is called, and disappears when the function is
exited. Local variables have no memory from one call to the next
and must be explicitly initialized upon each entry. (There is a
static storage class for making local variables with memory; we
won't discuss it.)
As opposed to local variables, external variables are defined
external to all functions, and are (potentially) available to all
functions. External storage always remains in existence. To make
variables external we have to define them external to all
functions, and, wherever we want to use them, make a declaration.
    main( ) {
        extern int nchar, hist[ ];
        count( );
    }
    count( ) {
        extern int nchar, hist[ ];
        int i, c;
    }
    int hist[129];      /* space for histogram */
    int nchar;          /* character count */
Roughly speaking, any function that wishes to access an external
variable must contain an extern declaration for it. The
declaration is the same as others, except for the added keyword
extern. Furthermore, there must somewhere be a definition of the
external variables external to all functions.
External variables can be initialized; they are set to zero if
not explicitly initialized. In its simplest form, initialization
is done by putting the value (which must be a constant) after the
definition:
    int nchar 0;
    char flag 'f';
etc. This is discussed further in a later section.
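Here is the definition-plus-declaration pattern as a sketch for a present-day compiler. Note one difference from the form shown above: current compilers want an `=' in the initializer. The function names are invented.

```c
/* Sketch only; count_char and chars_so_far are hypothetical names. */

int nchar = 0;          /* the definition, external to all functions */

void count_char(void)
{
    extern int nchar;   /* declaration: refers to the definition above */
    nchar++;
}

int chars_so_far(void)
{
    extern int nchar;
    return nchar;
}
```

Both functions see the same nchar, and its value persists between calls because external storage always remains in existence.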
This ends our discussion of what might be called the
central core of C. You now have enough to write quite sub-
stantial C programs, and it would probably be a good idea if
you paused long enough to do so. The rest of this tutorial
will describe some more ornate constructions, useful but not
essential.
Chapter 4
1. Pointers
A pointer in C is the address of something. It is a rare case
indeed when we care what the specific address itself is, but
pointers are a quite common way to get at the contents of
something. The unary operator `&' is used to produce the address
of an object, if it has one. Thus
    int a, b;
    b = &a;
puts the address of a into b. We can't do much with it except
print it or pass it to some other routine, because we haven't
given b the right kind of declaration. But if we declare that b
is indeed a pointer to an integer, we're in good shape:
    int a, *b, c;
    b = &a;
    c = *b;
b contains the address of a and `c = *b' means to use the value
in b as an address, i.e., as a pointer. The effect is that we get
back the contents of a, albeit rather indirectly. (It's always
the case that `*&x' is the same as x if x has an address.)
The most frequent use of pointers in C is for walking
efficiently along arrays. In fact, in the implementation of an
array, the array name represents the address of the zeroth
element of the array, so you can't use it on the left side of an
expression. (You can't change the address of something by
assigning to it.) If we say
    char *y;
    char x[100];
y is of type pointer to character (although it doesn't yet point
anywhere). We can make y point to an element of x by either of
    y = &x[0];
    y = x;
Since x is the address of x[0] this is legal and consistent.
Now `*y' gives x[0]. More importantly,
    *(y+1) gives x[1]
    *(y+i) gives x[i]
and the sequence
    y = &x[0];
    y++;
leaves y pointing at x[1].
Let's use pointers in a function that computes how long a
character array is. Remember that by convention all character
arrays are terminated with a `\0'. (And if they aren't, this
program will blow up inevitably.) The old way:
    length(s)
    char s[ ]; {
        int n;
        for( n=0; s[n] != '\0'; )
            n++;
        return(n);
    }
Rewriting with pointers gives
    length(s)
    char *s; {
        int n;
        for( n=0; *s != '\0'; s++ )
            n++;
        return(n);
    }
You can now see why we have to say what kind of thing s points
to: if we're to increment it with s++ we have to increment it by
the right amount.
The pointer version is more efficient (this is almost always
true) but even more compact is
    for( n=0; *s++ != '\0'; n++ );
The `*s' returns a character; the `++' increments the pointer so
we'll get the next character next time around. As you can see, as
we make things more efficient, we also make them less clear. But
`*s++' is an idiom so common that you have to know it.
Going a step further, here's our function strcopy that copies a
character array s to another t.
    strcopy(s,t)
    char *s, *t; {
        while(*t++ = *s++);
    }
We have omitted the test against `\0', because `\0' is
identically zero; you will often see the code this way. (You must
have a space after the `=': see section 25.)
For arguments to a function, and there only, the declarations
    char s[ ];
    char *s;
are equivalent: a pointer to a type, or an array of unspecified
size of that type, are the same thing.
If this all seems mysterious, copy these forms until
they become second nature. You don't often need anything
more complicated.
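For readers with a current compiler, here are the two pointer idioms in present-day dress. This is a sketch: the prototypes, const and size_t are modern additions, and the explicit test against `\0' is kept only to quiet compiler warnings.

```c
#include <stddef.h>

/* Count characters up to, but not including, the terminating `\0'. */
size_t length(const char *s)
{
    size_t n;
    for (n = 0; *s++ != '\0'; n++)
        ;
    return n;
}

/* Copy s to t, terminator included; t must have room for it. */
void strcopy(const char *s, char *t)
{
    while ((*t++ = *s++) != '\0')
        ;
}
```

The argument order follows the tutorial (source first, destination second), which is the reverse of what the modern library routine strcpy expects.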
2. Function Arguments
Look back at the function strcopy in the previous section. We
passed it two string names as arguments, then proceeded to
clobber both of them by incrementation. So how come we don't lose
the original strings in the function that called strcopy?
As we said before, C is a ``call by value'' language: when you
make a function call like f(x), the value of x is passed, not its
address. So there's no way to alter x from inside f. If x is an
array (char x[10]) this isn't a problem, because x is an address
anyway, and you're not trying to change it, just what it
addresses. This is why strcopy works as it does. And it's
convenient not to have to worry about making temporary copies of
the input arguments.
But what if x is a scalar and you do want to change it? In that
case, you have to pass the address of x to f, and then use it as
a pointer. Thus for example, to interchange two integers, we must
write
    flip(x, y)
    int *x, *y; {
        int temp;
        temp = *x;
        *x = *y;
        *y = temp;
    }
and to call flip, we have to pass the addresses of the variables:
    flip (&a, &b);
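The same interchange as a sketch for a present-day compiler; only the way the arguments are declared differs from the version above.

```c
/* Interchange the two integers that x and y point to. */
void flip(int *x, int *y)
{
    int temp;
    temp = *x;      /* save what x points to */
    *x = *y;        /* overwrite it with what y points to */
    *y = temp;      /* and store the saved value through y */
}
```

A call such as flip(&a, &b) hands over the two addresses, so the caller's own a and b are exchanged.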
3. Multiple Levels of Pointers; Program Arguments
When a C program is called, the arguments on the command line are
made available to the main program as an argument count argc and
an array of character strings argv containing the arguments.
Manipulating these arguments is one of the most common uses of
multiple levels of pointers (``pointer to pointer to ...''). By
convention, argc is greater than zero; the first argument (in
argv[0]) is the command name itself.
Here is a program that simply echoes its arguments.
    main(argc, argv)
    int argc;
    char **argv; {
        int i;
        for( i=1; i < argc; i++ )
            printf("%s ", argv[i]);
        putchar('\n');
    }
Step by step: main is called with two arguments, the argument
count and the array of arguments. argv is a pointer to an array,
whose individual elements are pointers to arrays of characters.
The zeroth argument is the name of the command itself, so we
start to print with the first argument, until we've printed them
all. Each argv[i] is a character array, so we use a `%s' in the
printf.
You will sometimes see the declaration of argv written as
    char *argv[ ];
which is equivalent. But we can't use char argv[ ][ ], because
both dimensions are variable and there would be no way to figure
out how big the array is.
Here's a bigger example using argc and argv. A common convention
in C programs is that if the first argument is `-', it indicates
a flag of some sort. For example, suppose we want a program to be
callable as
    prog -abc arg1 arg2 ...
where the `-' argument is optional; if it is present, it may be
followed by any combination of a, b, and c.
    main(argc, argv)
    int argc;
    char **argv; {
        ...
        aflag = bflag = cflag = 0;
        if( argc > 1 && argv[1][0] == '-' ) {
            for( i=1; (c=argv[1][i]) != '\0'; i++ )
                if( c=='a' )
                    aflag++;
                else if( c=='b' )
                    bflag++;
                else if( c=='c' )
                    cflag++;
                else
                    printf("%c?\n", c);
            --argc;
            ++argv;
        }
        ...
There are several things worth noticing about this
code. First, there is a real need for the left-to-right
evaluation that && provides; we don't want to look at argv[1]
unless we know it's there. Second, the statements
    --argc;
    ++argv;
let us march along the argument list by one position, so we
can skip over the flag argument as if it had never existed;
the rest of the program is independent of whether or not
there was a flag argument. This only works because argv is a
pointer which can be incremented.
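The flag-skipping trick can be packaged as a function if the
caller passes the addresses of its own argc and argv. What
follows is a sketch under that assumption, in current C
syntax; parse_flags and struct flags are illustrative names,
not anything the tutorial defines.

```c
struct flags { int a, b, c; };

/*
 * If the first argument starts with '-', count its a/b/c letters in *f
 * and step *argcp and *argvp past it. Returns the number of
 * unrecognized letters. parse_flags is a hypothetical helper.
 */
int parse_flags(int *argcp, char ***argvp, struct flags *f)
{
    int unknown = 0;
    char **argv = *argvp;

    f->a = f->b = f->c = 0;
    if (*argcp > 1 && argv[1][0] == '-') {   /* && guards argv[1] */
        int i;

        for (i = 1; argv[1][i] != '\0'; i++) {
            char c = argv[1][i];

            if (c == 'a')
                f->a++;
            else if (c == 'b')
                f->b++;
            else if (c == 'c')
                f->c++;
            else
                unknown++;
        }
        --*argcp;
        ++*argvp;   /* caller's argv[1] is now the first non-flag argument */
    }
    return unknown;
}
```

Because the caller's pointer itself is advanced, code after
the call can index argv[1] without caring whether a flag
argument was present.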
4. The Switch Statement; Break; Continue
The switch statement can be used to replace the multi-way
test we used in the last example. When the tests are like
this:
    if( c == 'a' ) ...
    else if( c == 'b' ) ...
    else if( c == 'c' ) ...
    else ...
testing a value against a series of constants, the switch
statement is often clearer and usually gives better code. Use
it like this:
    switch( c ) {
    case 'a':
        aflag++;
        break;
    case 'b':
        bflag++;
        break;
    case 'c':
        cflag++;
        break;
    default:
        printf("%c?\n", c);
        break;
    }
The case statements label the various actions we want;
default gets done if none of the other cases are satisfied.
(A default is optional; if it isn't there, and none of the
cases match, you just fall out the bottom.)
The break statement in this example is new. It is there
because the cases are just labels, and after you do one of
them, you fall through to the next unless you take some
explicit action to escape. This is a mixed blessing. On the
positive side, you can have multiple cases on a single
statement; we might want to allow both upper and lower case
letters in our flag field, so we could say
    case 'a': case 'A': ...
    case 'b': case 'B': ...
    etc.
But what if we just want to get out after doing case 'a'? We
could get out of a case of the switch with a label and a
goto, but this is really ugly. The break statement lets us
exit without either goto or label.
    switch( c ) {
    case 'a':
        aflag++;
        break;
    case 'b':
        bflag++;
        break;
    ...
    }
    /* the break statements get us here directly */
The break statement also works in for and while statements;
it causes an immediate exit from the loop.
The continue statement works only inside for's and while's;
it causes the next iteration of the loop to be started. This
means it goes to the increment part of the for and the test
part of the while. We could have used a continue in our
example to get on with the next iteration of the for, but it
seems clearer to use break instead.
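Here is a small sketch that puts shared case labels, break
and continue into one loop. It is written in current C
syntax, and count_flags is a hypothetical helper of our own.
Note that a break inside a switch leaves only the switch, so
a flag variable is used to end the surrounding for.

```c
/*
 * Count flag letters in s, accepting either case.
 * Blanks are skipped with continue; the first unknown character
 * ends the scan. counts[0..2] receive the totals for a, b and c.
 * Returns the number of characters examined.
 * count_flags is a hypothetical helper.
 */
int count_flags(const char *s, int counts[3])
{
    int i;
    int stop = 0;

    counts[0] = counts[1] = counts[2] = 0;
    for (i = 0; s[i] != '\0' && !stop; i++) {
        if (s[i] == ' ')
            continue;          /* go straight to i++ and the test */
        switch (s[i]) {
        case 'a': case 'A':    /* two labels share one action */
            counts[0]++;
            break;             /* leaves the switch only */
        case 'b': case 'B':
            counts[1]++;
            break;
        case 'c': case 'C':
            counts[2]++;
            break;
        default:
            stop = 1;          /* ask the for loop to finish */
            break;
        }
    }
    return i;
}
```

Dropping any of the break statements would let control fall
through into the next case and count the wrong letter as
well.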
Chapter 5
1. Structures
The main use of structures is to lump together collec-
tions of disparate variable types, so they can conveniently
be treated as a unit. For example, if we were writing a
compiler or assembler, we might need for each identifier in-
formation like its name (a character array), its source line
number (an integer), some type information (a character,
perhaps), and probably a usage count (another integer).
char id[10];
int line;
char type;
int usage;
We can make a structure out of this quite easily. We
first tell C what the structure will look like, that is,
what kinds of things it contains; after that we can actually
reserve storage for it, either in the same statement or
separately. The simplest thing is to define it and allocate
storage all at once:
    struct {
        char id[10];
        int line;
        char type;
        int usage;
    } sym;
This defines sym to be a structure with the specified
shape; id, line, type and usage are members of the structure.
The way we refer to any particular member of the structure is
    structure-name . member
as in
    sym.type = 077;
    if( sym.usage == 0 ) ...
    while( sym.id[j++] ) ...
    etc.
Although the names of structure members never stand alone,
they still have to be unique; there can't be another id or
usage in some other structure.
So far we haven't gained much. The advantages of
structures start to come when we have arrays of structures,
or when we want to pass complicated data layouts between
functions. Suppose we wanted to make a symbol table for up
to 100 identifiers. We could extend our definitions like
    char id[100][10];
    int line[100];
    char type[100];
    int usage[100];
but a structure lets us rearrange this spread-out information
so all the data about a single identifier is collected into
one lump:
    struct {
        char id[10];
        int line;
        char type;
        int usage;
    } sym[100];
This makes sym an array of structures; each array element has
the specified shape. Now we can refer to members as
    sym[i].usage++;    /* increment usage of i-th identifier */
    for( j=0; sym[i].id[j++] != '\0'; ) ...
    etc.
Thus to print a list of all identifiers that haven't been
used, together with their line number,
    for( i=0; i<nsym; i++ )
        if( sym[i].usage == 0 )
            printf("%d\t%s\n", sym[i].line, sym[i].id);
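The unused-identifier listing can be wrapped up as a function
that takes the table as an argument. This is a sketch in
current C syntax, which (unlike the C described here) lets us
name the structure type and pass a pointer to the array;
struct symbol and print_unused are names chosen for the
illustration.

```c
#include <stdio.h>

struct symbol {
    char id[10];
    int line;
    char type;
    int usage;
};

/*
 * Print "line<TAB>name" for every entry whose usage count is zero
 * and return how many such entries there were.
 * print_unused is a hypothetical helper.
 */
int print_unused(const struct symbol *tab, int nsym)
{
    int i, unused = 0;

    for (i = 0; i < nsym; i++) {
        if (tab[i].usage == 0) {
            printf("%d\t%s\n", tab[i].line, tab[i].id);
            unused++;
        }
    }
    return unused;
}
```

Everything known about identifier i travels together as
tab[i], which is the point of collecting the four parallel
arrays into one array of structures.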
Suppose we now want to write a function lookup(name) which
will tell us if name already exists in sym, by giving its
index, or that it doesn't, by returning a -1. We can't pass a
structure to a function directly; we have to either define
it externally, or pass a pointer to it. Let's try the first
way first.
    int nsym 0;    /* current length of symbol table */
    struct {
        char id[10];
        int line;
        char type;
        int usage;
    } sym[100];    /* symbol table */
    main( ) {
        ...
        if( (index = lookup(newname)) >= 0 )
            sym[index].usage++;    /* already there ... */
        else
            install(newname, newline, newtype);
        ...
    }
    lookup(s)
    char *s; {
        int i;
        extern struct {
            char id[10];
            int line;
            char type;
            int usage;
        } sym[ ];
        for( i=0; i<nsym; i++ )
            if( compar(s, sym[i].id) > 0 )
                return(i);
        return(-1);
    }
    compar(s1,s2)    /* return 1 if s1==s2, 0 otherwise */
    char *s1, *s2; {
        while( *s1++ == *s2 )
            if( *s2++ == '\0' )
                return(1);
        return(0);
    }
The declaration of the structure in lookup isn't needed if
the external definition precedes its use in the same source
file, as we shall see in a moment.
Now what if we want to use pointers?
    struct symtag {
        char id[10];
        int line;
        char type;
        int usage;
    } sym[100], *psym;
    psym = &sym[0];    /* or psym = sym; */
This makes psym a pointer to our kind of structure (the
symbol table), then initializes it to point to the first
element of sym.
Notice that we added something after the word struct: a
``tag'' called symtag. This puts a name on our structure
definition so we can refer to it later without repeating the
definition. It's not necessary but useful. In fact we could
have said
    struct symtag {
        ... structure definition
    };
which wouldn't have assigned any storage at all, and then
said
    struct symtag sym[100];
    struct symtag *psym;
which would define the array and the pointer. This could be
condensed further, to
    struct symtag sym[100], *psym;
The way we actually refer to a member of a structure by a
pointer is like this:
    ptr -> structure-member
The symbol `->' means we're pointing at a member of a
structure; `->' is only used in that context. ptr is a
pointer to the (base of) a structure that contains the
structure member. The expression ptr->structure-member refers
to the indicated member of the pointed-to structure. Thus we
have constructions like:
    psym->type = 1;
    psym->id[0] = 'a';
and so on.
For more complicated pointer expressions, it's wise to
use parentheses to make it clear who goes with what. For
example,
    struct { int x, *y; } *p;
    p->x++       increments x
    ++p->x       so does this!
    (++p)->x     increments p before getting x
    *p->y++      uses y as a pointer, then increments it
    *(p->y)++    so does this
    *(p++)->y    uses y as a pointer, then increments p
The way to remember these is that ->, . (dot), ( ) and [ ]
bind very tightly. An expression involving one of these is
treated as a unit. p->x, a[i], y.x and f(b) are names exactly
as abc is.
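These binding rules are easy to check by experiment. The
sketch below, in current C syntax, applies four of the
expressions to a two-element array and reports whether each
one changed what we expect; struct node and check_binding are
names invented for the purpose.

```c
struct node { int x; int *y; };

/*
 * Apply p->x++, ++p->x, *p->y++ and (++p)->x to a small array and
 * verify what each one changed. Returns 1 if every observation
 * matches, 0 otherwise. check_binding is a hypothetical helper.
 */
int check_binding(void)
{
    int vals[2] = { 10, 20 };
    struct node arr[2] = { { 1, vals }, { 5, vals } };
    struct node *p = arr;
    int ok = 1;
    int v;

    p->x++;                         /* increments x: arr[0].x becomes 2 */
    ok = ok && arr[0].x == 2 && p == arr;

    ++p->x;                         /* also increments x, not p */
    ok = ok && arr[0].x == 3 && p == arr;

    v = *p->y++;                    /* fetch through y, then advance y */
    ok = ok && v == 10 && arr[0].y == vals + 1;

    v = (++p)->x;                   /* advance p first, then read x */
    ok = ok && v == 5 && p == arr + 1;

    return ok;
}
```

In the first three expressions p itself never moves; only the
parenthesized (++p) changes which structure we are looking
at.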
If p is a pointer to a structure, any arithmetic on p takes
into account the actual size of the structure. For instance,
p++ increments p by the correct amount to get the next
element of the array of structures. But don't assume that
the size of a structure is the sum of the sizes of its
members; because of alignments of different sized objects,
there may be ``holes'' in a structure.
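Both claims can be measured with sizeof, which current C
provides for exactly this kind of question. A minimal sketch,
with stride_bytes and hole_bytes as hypothetical names; the
numbers it produces depend on the compiler and machine.

```c
#include <stddef.h>

struct symtag {
    char id[10];
    int line;
    char type;
    int usage;
};

/* Distance in bytes between p and p+1 for a struct symtag pointer. */
size_t stride_bytes(void)
{
    struct symtag two[2];
    struct symtag *p = two;

    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Bytes in the structure not accounted for by its members ("holes"). */
size_t hole_bytes(void)
{
    size_t members = sizeof(char[10]) + sizeof(int)
                   + sizeof(char) + sizeof(int);

    return sizeof(struct symtag) - members;
}
```

stride_bytes always equals sizeof(struct symtag), holes
included, which is why p++ lands on the next element rather
than somewhere inside the current one.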
Enough theory. Here is the lookup example, this time
with pointers.
    struct symtag {
        char id[10];
        int line;
        char type;
        int usage;
    } sym[100];
    main( ) {
        struct symtag *lookup( );
        struct symtag *psym;
        ...
        if( (psym = lookup(newname)) )    /* non-zero pointer */
            psym -> usage++;              /* means already there */
        else
            install(newname, newline, newtype);
        ...
    }
    struct symtag *lookup(s)
    char *s; {
        struct symtag *p;
        for( p=sym; p < &sym[nsym]; p++ )
            if( compar(s, p->id) > 0)
                return(p);
        return(0);
    }
The function compar doesn't change: `p->id' refers to a
string.
In main we test the pointer returned by lookup against zero,
relying on the fact that a pointer is by definition never
zero when it really points at something. The other pointer
manipulations are trivial.
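For comparison, here is the pointer version of the search as
it might be written in current C, with the table passed in
rather than reached as an external. This is a sketch, not the
tutorial's code: find_symbol is a hypothetical name, and the
standard strcmp stands in for compar.

```c
#include <stddef.h>
#include <string.h>

struct symtag {
    char id[10];
    int line;
    char type;
    int usage;
};

/*
 * Search tab[0..nsym-1] for the name s.
 * Returns a pointer to the matching entry, or NULL if there is none.
 * find_symbol is a hypothetical helper.
 */
struct symtag *find_symbol(struct symtag *tab, int nsym, const char *s)
{
    struct symtag *p;

    for (p = tab; p < &tab[nsym]; p++)
        if (strcmp(s, p->id) == 0)
            return p;
    return NULL;
}
```

The caller tests the result just as main does in the text: a
non-zero pointer means the name is already there, and
p->usage++ can be applied to it directly.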
The only complexity is the set of lines like
    struct symtag *lookup( );
This brings us to an area that we will treat only hurriedly:
the question of function types. So far, all of our functions
have returned integers (or characters, which are much the
same). What do we do when the function returns something
else, like a pointer to a structure? The rule is that any
function that doesn't return an int has to say explicitly
what it does return. The type information goes before the
function name (which can make the name hard to see).
Examples:
    char f(a)
    int a; {
        ...
    }
    int *g( ) { ... }
    struct symtag *lookup(s) char *s; { ... }
The function f returns a character, g returns a pointer to an
integer, and lookup returns a pointer to a structure that
looks like symtag. And if we're going to use one of these
functions, we have to make a declaration where we use it, as
we did in main above.
Notice the parallelism between the declarations
    struct symtag *lookup( );
    struct symtag *psym;
In effect, this says that lookup( ) and psym are both used
the same way, as a pointer to a structure, even though one is
a variable and the other is a function.
Chapter 6
1. Initialization of Variables
An external variable may be initialized at compile time
by following its name with an initializing value when it is
defined. The initializing value has to be something whose
value is known at compile time, like a constant.
    int x 0;    /* "0" could be any constant */
    int a 'a';
    char flag 0177;
    int *p &y[1];    /* p now points to y[1] */
An external array can be initialized by following its name
with a list of initializations enclosed in braces:
    int x[4] {0,1,2,3};    /* makes x[i] = i */
    int y[ ] {0,1,2,3};    /* makes y big enough for 4 values */
    char *msg "syntax error\n";    /* braces unnecessary here */
    char *keyword[ ]{
        "if",
        "else",
        "for",
        "while",
        "break",
        "continue",
        0
    };
This last one is very useful: it makes keyword an array of
pointers to character strings, with a zero at the end so we
can identify the last element easily. A simple lookup routine
could scan this until it either finds a match or encounters a
zero keyword pointer:
    lookup(str)    /* search for str in keyword[ ] */
    char *str; {
        int i,j,r;
        for( i=0; keyword[i] != 0; i++) {
            for( j=0; (r=keyword[i][j]) == str[j] && r != '\0'; j++ );
            if( r == str[j] )
                return(i);
        }
        return(-1);
    }
Sorry, neither local variables nor structures can be
initialized.
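Current C compilers spell initialization with an `=' between
the name and the value, and they also accept initializers for
local variables and structures. The sketch below shows the
keyword table in that later syntax; keyword_index is a
hypothetical name, and the library routine strcmp replaces
the hand-written inner loop.

```c
#include <string.h>

/* Table of keywords, terminated by a null pointer. */
static const char *keyword[] = {
    "if", "else", "for", "while", "break", "continue", 0
};

/*
 * Return the index of str in keyword[], or -1 if it is absent.
 * keyword_index is a hypothetical helper.
 */
int keyword_index(const char *str)
{
    int i;

    for (i = 0; keyword[i] != 0; i++)
        if (strcmp(keyword[i], str) == 0)
            return i;
    return -1;
}
```

The zero at the end of the table does the work of a length:
new keywords can be added to the list without touching the
loop.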
Chapter 7
1. Scope Rules: Who Knows About What
A complete C program need not be compiled all at once;
the source text of the program may be kept in several files,
and previously compiled routines may be loaded from li-
braries. How do we arrange that data gets passed from one
routine to another? We have already seen how to use
function arguments and values, so let us talk about external
data. Warning: the words declaration and definition are used
precisely in this section; don't treat them as the same
thing.
A major shortcut exists for making extern declarations. If
the definition of a variable appears before its use in some
function, no extern declaration is needed within the
function. Thus, if a file contains
    f1( ) { ... }
    int foo;
    f2( ) { ... foo = 1; ... }
    f3( ) { ... if ( foo ) ... }
no declaration of foo is needed in either f2 or f3, because
the external definition of foo appears before them. But if f1
wants to use foo, it has to contain the declaration
    f1( ) {
        extern int foo;
        ...
    }
This is true also of any function that exists on another
file; if it wants foo it has to use an extern declaration for
it. (If somewhere there is an extern declaration for
something, there must also eventually be an external
definition of it, or you'll get an ``undefined symbol''
message.)
There are some hidden pitfalls in external declarations
and definitions if you use multiple source files. To avoid
them, first, define and initialize each external variable
only once in the entire set of files:
    int foo 0;
You can get away with multiple external definitions on UNIX,
but not on GCOS, so don't ask for trouble. Multiple
initializations are illegal everywhere. Second, at the
beginning of any file that contains functions needing a
variable whose definition is in some other file, put in an
extern declaration, outside of any function:
    extern int foo;
    f1( ) { ... }
    etc.
The #include compiler control line, to be discussed shortly,
lets you make a single copy of the external declarations for
a program and then stick them into each of the source files
making up the program.
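The difference between a declaration and a definition can be
seen even inside one file. In this sketch (current C syntax,
with bump_foo and read_foo as made-up names) the first
function is compiled before the definition of foo has been
seen, so it relies on the extern declaration; the second one
comes after the definition and needs nothing.

```c
extern int foo;        /* declaration: foo exists somewhere */

/* Uses foo before its definition has been seen in this file. */
int bump_foo(void)
{
    return ++foo;
}

int foo = 0;           /* definition: storage and initial value */

/* No extern needed here, since the definition precedes this use. */
int read_foo(void)
{
    return foo;
}
```

In a program spread over several files, the extern line would
sit at the top of every file that uses foo, and the
definition would appear in exactly one of them.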
2. #define, #include
C provides a very limited macro facility. You can say
    #define name something
and thereafter anywhere ``name'' appears as a token,
``something'' will be substituted. This is particularly
useful in parameterizing the sizes of arrays:
    #define ARRAYSIZE 100
    int arr[ARRAYSIZE];
    ...
    while( i++ < ARRAYSIZE )...
(now we can alter the entire program by changing only the
define) or in setting up mysterious constants:
    #define SET 01
    #define INTERRUPT 02    /* interrupt bit */
    #define ENABLED 04
    ...
    if( x & (SET | INTERRUPT | ENABLED) ) ...
Now we have meaningful words instead of mysterious constants.
(The mysterious operators `&' (AND) and `|' (OR) will be
covered in the next section.) It's an excellent practice to
write programs without any literal constants except in
#define statements.
There are several warnings about #define. First, there's no
semicolon at the end of a #define; all the text from the name
to the end of the line (except for comments) is taken to be
the ``something''. When it's put into the text, blanks are
placed around it. Good style typically puts the name in the
#define in upper case; this makes parameters more visible.
Definitions affect things only after they occur, and only
within the file in which they occur. Defines can't be nested.
Last, if there is a #define in a file, then the first
character of the file must be a `#', to signal the
preprocessor that definitions exist.
The other control word known to C is #include. To include one
file in your source at compilation time, say
    #include "filename"
This is useful for putting a lot of heavily used data
definitions and #define statements at the beginning of a file
to be compiled. As with #define, the first line of a file
containing a #include has to begin with a `#'. And #include
can't be nested; an included file can't contain another
#include.
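As a small worked example of both uses of #define, the sketch
below sizes an array and names three status bits. It is
written in current C syntax; fill_and_sum and any_status are
hypothetical names of our own.

```c
#define ARRAYSIZE 100

#define SET       01
#define INTERRUPT 02   /* interrupt bit */
#define ENABLED   04

/* Fill arr[i] = i and return the sum, using ARRAYSIZE throughout. */
long fill_and_sum(int arr[ARRAYSIZE])
{
    long sum = 0;
    int i = 0;

    while (i < ARRAYSIZE) {
        arr[i] = i;
        sum += arr[i];
        i++;
    }
    return sum;
}

/* Non-zero if any of the three named status bits is on in x. */
int any_status(unsigned x)
{
    return (x & (SET | INTERRUPT | ENABLED)) != 0;
}
```

Changing the one line that defines ARRAYSIZE resizes the
array and the loop together; nothing else mentions the number
100.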
Chapter 8
1. Bit Operators
C has several operators for logical bit-operations.
For example,
    x = x & 0177;
forms the bit-wise AND of x and 0177, effectively retaining
only the last seven bits of x. Other operators are
    |     inclusive OR
    ^     (circumflex) exclusive OR
    ~     (tilde) 1's complement
    !     logical NOT
    <<    left shift (as in x<<2)
    >>    right shift (arithmetic on PDP-11; logical on
          H6070, IBM360)
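A few one-line functions make the table concrete. This is a
sketch in current C syntax with made-up names; it uses
unsigned operands, for which a right shift always fills with
zeros, so the machine dependence noted in the table does not
arise.

```c
/* Keep only the low seven bits of x, as x & 0177 does. */
unsigned low7(unsigned x)                    { return x & 0177; }

/* Turn on every bit of m in x (inclusive OR). */
unsigned set_bits(unsigned x, unsigned m)    { return x | m; }

/* Flip every bit of m in x (exclusive OR). */
unsigned toggle_bits(unsigned x, unsigned m) { return x ^ m; }

/* Turn off every bit of m in x (AND with the 1's complement). */
unsigned clear_bits(unsigned x, unsigned m)  { return x & ~m; }

/* Shift left two places, which multiplies by 4. */
unsigned times4(unsigned x)                  { return x << 2; }

/* Shift right one place, which halves an unsigned value. */
unsigned half(unsigned x)                    { return x >> 1; }
```

Note that ! is not a bit operator at all: !x is 1 when x is
zero and 0 otherwise, whatever the individual bits of x are.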
2. Assignment Operators
An unusual feature of C is that the normal binary
operators like `+', `-', etc. can be combined with the
assignment operator `=' to form new assignment operators.
For example,
    x =- 10;
uses the assignment operator `=-' to decrement x by 10, and
    x =& 0177
forms the AND of x and 0177. This convention is a useful
notational shortcut, particularly if x is a complicated
expression. The classic example is summing an array:
    for( sum=i=0; i<n; i++ )
        sum =+ array[i];
But the spaces around the operator are critical! For
instance,
    x = -10;
sets x to -10, while
    x =- 10;
subtracts 10 from x. When no space is present,
    x=-10;
also decreases x by 10. This is quite contrary to the
experience of most programmers. In particular, watch out for
things like
    c=*s++;
    y=&x[0];
both of which are almost certainly not what you wanted. Newer
versions of various compilers are courteous enough to warn
you about the ambiguity.
Because all other operators in an expression are evaluated
before the assignment operator, the order of evaluation
should be watched carefully:
    x = x<<y | z;
means ``shift x left y places, then OR with z, and store in
x.'' But
    x =<< y | z;
means ``shift x left by y|z places'', which is rather
different.
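Later versions of C turned these operators around (+=, -=,
<<= and so on), which removes the ambiguity about spaces.
The sketch below uses that later spelling to show the array
sum and the two shift expressions side by side; the function
names are our own.

```c
/* Sum the first n elements of array with the += operator. */
int sum_array(const int *array, int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += array[i];
    return sum;
}

/* x = x<<y | z : shift first, then OR with z. */
unsigned shift_then_or(unsigned x, unsigned y, unsigned z)
{
    x = x << y | z;
    return x;
}

/* x <<= y | z : the whole right side y|z is the shift count. */
unsigned shift_by_or(unsigned x, unsigned y, unsigned z)
{
    x <<= y | z;
    return x;
}
```

With x = 1, y = 1 and z = 2, the first form gives (1<<1)|2,
which is 2, while the second gives 1<<3, which is 8.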
3. Floating Point
We've skipped over floating point so far, and the
treatment here will be hasty. C has single and double pre-
cision numbers (where the precision depends on the machine
at hand). For example,
    double sum;
    float avg, y[10];
    sum = 0.0;
    for( i=0; i<n; i++ )
        sum =+ y[i];
    avg = sum/n;
forms the sum and average of the array y.
All floating arithmetic is done in double precision.
Mixed mode arithmetic is legal; if an arithmetic operator in
an expression has both operands int or char, the arithmetic
done is integer, but if one operand is int or char and the
other is float or double, both operands are converted to
double. Thus if i and j are int and x is float,
    (x+i)/j    converts i and j to float
    x + i/j    does i/j integer, then converts
Type conversion may be made by assignment; for instance,
    int m, n;
    float x, y;
    m = x;
    y = n;
converts x to integer (truncating toward zero), and n to
floating point.
Floating constants are just like those in Fortran or PL/I,
except that the exponent letter is `e' instead of `E'.
Thus:
    pi = 3.14159;
    large = 1.23456789e10;
printf will format floating point numbers: ``%w.df'' in the
format string will print the corresponding variable in a
field w digits wide, with d decimal places. An e instead of
an f will produce exponential notation.
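The effect of mixed mode arithmetic is easiest to see with
numbers. Here is a sketch in current C syntax; the function
names are hypothetical, and snprintf (a later library routine
that formats into a buffer the way printf formats to the
output) is used so the result can be inspected.

```c
#include <stdio.h>

/* Average of y[0..n-1], accumulated in double; n must be positive. */
double average(const float *y, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += y[i];
    return sum / n;      /* n is converted to double before dividing */
}

/* (x+i)/j : i and j are converted, so the division is floating. */
double float_divide(float x, int i, int j)
{
    return (x + i) / j;
}

/* x + i/j : i/j is an integer division, converted afterwards. */
double int_divide_first(float x, int i, int j)
{
    return x + i / j;
}

/* Format v in a field 8 wide with 3 decimal places into buf. */
int format_fixed(char *buf, size_t size, double v)
{
    return snprintf(buf, size, "%8.3f", v);
}
```

With x = 0.5, i = 1 and j = 2, float_divide gives 0.75 but
int_divide_first gives 0.5, because the integer division 1/2
has already produced 0 before any conversion takes place.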
4. Horrors! goto's and labels
C has a goto statement and labels, so you can branch about
the way you used to. But most of the time goto's aren't
needed. (How many have we used up to this point?) The code
can almost always be more clearly expressed by for/while,
if/else, and compound statements.
One use of goto's with some legitimacy is in a program which
contains a long loop, where a while(1) would be too extended.
Then you might write
    mainloop:
        ...
        goto mainloop;
Another use is to implement a break out of more than one
level of for or while. goto's can only branch to labels
within the same function.
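The second use, leaving two loops at once, might look like
the following sketch. It is in current C syntax, and
find_negative, ROWS and COLS are names assumed for the
example.

```c
#define ROWS 3
#define COLS 4

/*
 * Find the first negative element of m, scanning row by row.
 * On success store its position in *row and *col and return 1;
 * return 0 if no element is negative.
 * find_negative is a hypothetical helper.
 */
int find_negative(int m[ROWS][COLS], int *row, int *col)
{
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            if (m[i][j] < 0)
                goto found;     /* leaves both loops at once */
    return 0;

found:
    *row = i;
    *col = j;
    return 1;
}
```

A plain break at the same spot would leave only the inner
for, and the outer loop would carry on with the next row.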
5. Acknowledgements
I am indebted to a veritable host of readers who made
valuable criticisms on several drafts of this tutorial.
They ranged in experience from complete beginners through
several implementors of C compilers to the C language
designer himself. Needless to say, this is a wide enough
spectrum of opinion that no one is satisfied (including me);
comments and suggestions are still welcome, so that some fu-
ture version might be improved.
Chapter 9
References
C is an extension of B, which was designed by D. M.
Ritchie and K. L. Thompson [4]. The C language design and
UNIX implementation are the work of D. M. Ritchie. The GCOS
version was begun by A. Snyder and B. A. Barres, and completed
by S. C. Johnson and M. E. Lesk. The IBM version is primarily
due to T. G. Peterson, with the assistance of M. E. Lesk.
[1] D. M. Ritchie, C Reference Manual. Bell Labs, Jan. 1974.
[2] M. E. Lesk & B. A. Barres, The GCOS C Library. Bell Labs,
    Jan. 1974.
[3] D. M. Ritchie & K. Thompson, UNIX Programmer's Manual.
    5th Edition, Bell Labs, 1974.
[4] S. C. Johnson & B. W. Kernighan, The Programming Language
    B. Computer Science Technical Report 8, Bell Labs, 1972.
October 10, 1975
External links
- Brian W. Kernighan, Programming in C - A Tutorial