3 - Control Flow
The control flow statements of a language specify the order in which computations are done. We have already met the most common control flow constructions of C in earlier examples; here we will complete the set, and be more precise about the ones discussed before.
3.1 - Statements and Blocks
An expression such as x = 0 or i++ or printf( ... ) becomes a
statement when it is followed by a semicolon, as in
x = 0;
i++;
printf(...);
In C, the semicolon is a statement terminator, rather than a separator as it is in Algol-like languages.
The braces { and } are used to group declarations and statements
together into a compound statement or block so that they are syntactically
equivalent to a single statement. The braces that surround the statements of
a function are one obvious example; braces around multiple statements after
an if, else, while or for are another. (Variables can actually be
declared inside any block; we will talk about this in Chapter 4.) There is
never a semicolon after the right brace that ends a block.
Ah C, how do I love thee? Let me count the ways. - Dr. Chuck with homage to Elizabeth Barrett Browning
The humble semicolon is why spacing and line-ends do not matter to C and C-like languages. It means we as programmers can focus all of our white space and lines on communicating our intent to humans. This freedom is not an excuse to write obtuse or dense code (see the Obfuscated Perl Contest) but instead freedom to describe what we mean or use spacing to help us understand our code.
We can take a quick look at how a few other C-like languages treat the semicolon. Java is just like C in that the semicolon terminates statements. Python treats the semicolon as a separator, allowing more than one statement on a single line. But since Python also treats the end of a line as a statement separator, you almost never need a semicolon in Python. For people like me who automatically add a semicolon when typing code too fast, at least Python ignores the few semicolons I add to my code out of habit. JavaScript also uses the semicolon to terminate statements, but its automatic semicolon insertion rules supply most missing semicolons at line ends, so they are often technically optional. When I write JavaScript, I meticulously include semicolons at the end of all statements because "any good C programmer can write C in any language".
3.2 - If-Else
The if-else statement is used to make decisions. Formally, the syntax is
if (expression)
    statement-1
else
    statement-2
where the else part is optional. The expression is evaluated; if it is "true"
(that is, if expression has a non-zero value), statement-1 is done. If it is
"false" (expression is zero) and if there is an else part, statement-2 is executed instead.
Since an if simply tests the numeric value of an expression, certain
coding shortcuts are possible. The most obvious is writing
if (expression)
instead of
if (expression != 0)
Sometimes this is natural and clear; at other times it is cryptic.
Because the else part of an if-else is optional, there is an ambiguity
when an else is omitted from a nested if sequence. This is resolved in
the usual way - the else is associated with the closest previous else-less
if. For example, in
if (n > 0)
    if (a > b)
        z = a;
    else
        z = b;
the else goes with the inner if, as we have shown by indentation. If that
isn't what you want, braces must be used to force the proper association:
if (n > 0) {
    if (a > b)
        z = a;
}
else
    z = b;
The ambiguity is especially pernicious in situations like:
if (n > 0)
    for (i = 0; i < n; i++)
        if (s[i] > 0) {
            printf("...");
            return(i);
        }
else /* WRONG */
    printf("error - n is zero\n");
The indentation shows unequivocally what you want, but the compiler
doesn't get the message, and associates the else with the inner if. This
kind of bug can be very hard to find.
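A minimal sketch of one way to repair the fragment above, assuming the intent was to report an error only when n is not positive. The function name first_positive and the -1 return value are illustrative choices that do not appear in the original text. Braces around the body of the outer if force the else to pair with it.

```c
#include <stdio.h>

/* Hypothetical helper: return the index of the first positive element
   of s[0..n-1], or -1 if there is none or n is not positive. */
int first_positive(const int s[], int n)
{
    int i;

    if (n > 0) {                /* the braces end the outer if here... */
        for (i = 0; i < n; i++)
            if (s[i] > 0)
                return i;
    } else {                    /* ...so this else pairs with the outer if */
        fprintf(stderr, "error - n is not positive\n");
    }
    return -1;
}
```

With the braces in place, the indentation and the compiler agree about which if owns the else.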
By the way, notice that there is a semicolon after z = a in
if (a > b)
    z = a;
else
    z = b;
This is because grammatically, a statement follows the if, and an expression
statement like z = a is always terminated by a semicolon.
3.3 - Else-If
The construction
if (expression)
    statement
else if (expression)
    statement
else if (expression)
    statement
else
    statement
occurs so often that it is worth a brief separate discussion. This sequence of
if's is the most general way of writing a multi-way decision. The
expression's are evaluated in order; if any expression is true, the statement
associated with it is executed, and this terminates the whole chain. The
code for each statement is either a single statement, or a group in braces.
The last else part handles the "none of the above" or default case
where none of the other conditions was satisfied. Sometimes there is no
explicit action for the default; in that case the trailing
else
    statement
can be omitted, or it may be used for error checking to catch an "impossible" condition.
To illustrate a three-way decision, here is a binary search function that
decides if a particular value x occurs in the sorted array v. The elements of
v must be in increasing order. The function returns the position (a number
between 0 and n-1) if x occurs in v, and -1 if not.
binary(x, v, n) /* find x in v[0] ... v[n-1] */
int x, v[], n;
{
    int low, high, mid;
    low = 0;
    high = n - 1;
    while (low <= high)
    {
        mid = (low+high) / 2;
        if (x < v[mid])
            high = mid - 1;
        else if (x > v[mid])
            low = mid + 1;
        else /* found match */
            return (mid);
    }
    return(-1);
}
The fundamental decision is whether x is less than, greater than, or
equal to the middle element v[mid] at each step; this is a natural for
else-if.
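The listing above uses the original pre-ANSI function syntax. C99 dropped the implicit int return type and C23 dropped old-style parameter declarations, so here is a sketch of the same algorithm with a prototype-style definition. The name binsearch and the const qualifier are choices made for this sketch, and the midpoint is computed as low + (high - low) / 2, which gives the same value here but cannot overflow on very large arrays.

```c
/* binsearch: find x in v[0] ... v[n-1], which must be in increasing order.
   Returns the index of x, or -1 if x does not occur. */
int binsearch(int x, const int v[], int n)
{
    int low = 0;
    int high = n - 1;

    while (low <= high) {
        int mid = low + (high - low) / 2;   /* middle of the current range */

        if (x < v[mid])
            high = mid - 1;                 /* continue in the lower half */
        else if (x > v[mid])
            low = mid + 1;                  /* continue in the upper half */
        else                                /* found match */
            return mid;
    }
    return -1;                              /* range is empty: not found */
}
```

For example, with int v[] = {1, 3, 5, 7, 9}, the call binsearch(7, v, 5) yields 3 and binsearch(4, v, 5) yields -1.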
Note that in the above examples, else and if are two separate language constructs that are used
idiomatically to build an else if pattern, with indentation that captures the idiom.
If we were pedantic about indenting the above sequence, we would separate the
else from the if and indent each succeeding block further, as follows
(braces added for clarity):
if (expression) {
    statement
}
else
{
    if (expression) {
        statement
    }
    else
    {
        if (expression) {
            statement
        }
        else
        {
            statement
        }
    }
}
Java and JavaScript keep the else and if as separate language elements and document their
idiomatic usage and indentation just like C.
But the Python elif is a new language construct that achieves the same end as shown below:
if (expression) :
    block
elif (expression) :
    block
elif (expression) :
    block
else :
    block
The C/Java/JavaScript and Python idioms thankfully look the same when the idiomatic indentation
is used. Even FORTRAN 77 supports the ELSE IF construct.
3.4 - Switch
The switch statement is a special multi-way decision maker that tests
whether an expression matches one of a number of constant values, and
branches accordingly. In Chapter 1 we wrote a program to count the
occurrences of each digit, white space, and all other characters, using a
sequence of if ... else if ... else. Here is the same program with a
switch.
#include <stdio.h>
main() /* count digits, white space, others */
{
    int c, i, nwhite, nother, ndigit[10];
    nwhite = nother = 0;
    for (i = 0; i < 10; i++)
        ndigit[i] = 0;
    while ((c = getchar()) != EOF)
        switch (c) {
        case '0':
        case '1':
        case '2':
        case '3':
        case '4':
        case '5':
        case '6':
        case '7':
        case '8':
        case '9':
            ndigit[c-'0']++;
            break;
        case ' ':
        case '\n':
        case '\t':
            nwhite++;
            break;
        default:
            nother++;
            break;
        }
    printf("digits =");
    for (i = 0; i < 10; i++)
        printf(" %d", ndigit[i]);
    printf("\nwhite space = %d, other = %d\n", nwhite, nother);
}
The switch evaluates the integer expression in parentheses (in this
program the character c) and compares its value to all the cases. Each case
must be labeled by an integer or character constant or constant expression.
If a case matches the expression value, execution starts at that case. The
case labeled default is executed if none of the other cases is satisfied. A
default is optional; if it isn't there and if none of the cases matches, no
action at all takes place. Cases and default can occur in any order. Cases
must all be different.
The break statement causes an immediate exit from the switch.
Because cases serve just as labels, after the code for one case is done,
execution falls through to the next unless you take explicit action to escape.
break and return are the most common ways to leave a switch. A
break statement can also be used to force an immediate exit from while,
for and do loops as well, as will be discussed later in this chapter.
Falling through cases is a mixed blessing. On the positive side, it allows multiple cases for a single action, as with the blank, tab or newline in this example. But it also implies that normally each case must end with a break to prevent falling through to the next. Falling through from one case to another is not robust, being prone to disintegration when the program is modified. With the exception of multiple labels for a single computation, fall-throughs should be used sparingly.
As a matter of good form, put a break after the last case (the
default here) even though it's logically unnecessary. Some day when
another case gets added at the end, this bit of defensive programming will
save you.
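To see the rules about multiple labels and break in a form that is easy to call and test, here is a small sketch. The function classify and its 'd', 'w' and 'o' result codes are hypothetical and chosen only for illustration.

```c
/* Hypothetical classifier: returns 'd' for a digit, 'w' for white space,
   and 'o' for any other character. */
char classify(int c)
{
    char kind;

    switch (c) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        kind = 'd';     /* ten labels share this one action */
        break;          /* without this, execution would fall into the next case */
    case ' ':
    case '\n':
    case '\t':
        kind = 'w';
        break;
    default:
        kind = 'o';
        break;          /* logically unnecessary, but defensive */
    }
    return kind;
}
```

Deleting the first break would make every digit come back as 'w', which is exactly the kind of silent failure the text warns about.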
Ah the switch statement. What is there to say? I think that the switch statement
was added to C to compete with the earlier FORTRAN "computed GOTO" statement and to keep low-level
programmers from switching to assembly language to implement a
branch table.
The authors spend most of the previous section apologizing for the switch statement, so perhaps
you should take that as a hint and avoid it in your code.
There are very few situations where a branch table outperforms a series of if-else checks
and those are likely deep in library or operating system code. Programmers should only use
switch if they understand what a branch table is, and why it is more efficient
for this particular bit of their program. Otherwise just use else if to do the readers
of your code a favor.
One can only wonder if Guido van Rossum or someone else at
Centrum Wiskunde & Informatica (CWI) in the Netherlands looked
at the above code example and thought,
"Interesting how case works in a C switch statement - a colon at the end of the
line and indenting the following lines is an elegant way to visually
denote blocks of code."
Exercise 3-1. Write a function expand(s, t) which converts characters
like newline and tab into visible escape sequences like \n and \t as it
copies the string s to t. Use a switch.
3.5 - Loops - While and For
We have already encountered the while and for loops. In
while (expression)
    statement
the expression is evaluated. If it is non-zero, statement is executed and expression is re-evaluated. This cycle continues until expression becomes zero, at which point execution resumes after statement.
The for statement
for (expr1; expr2; expr3)
    statement
is equivalent to
expr1;
while (expr2) {
    statement
    expr3;
}
Grammatically, the three components of a for are expressions. Most commonly,
expr1 and expr3 are assignments or function calls and expr2 is a
relational expression. Any of the three parts can be omitted, although the
semicolons must remain. If expr1 or expr3 is left out, it is simply dropped
from the expansion. If the test, expr2, is not present, it is taken as permanently true, so
for (;;) {
    ...
}
is an "infinite" loop, presumably to be broken by other means (such as a
break or return).
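The equivalence between for and while can be checked on a concrete loop. The two functions below are an illustrative sketch (the names sum_for and sum_while are invented here): the second is the first with the for expanded by hand according to the rule above.

```c
/* Sum the integers 1..n with a for loop. */
int sum_for(int n)
{
    int i, total = 0;

    for (i = 1; i <= n; i++)
        total += i;
    return total;
}

/* The same computation with the for expanded into a while. */
int sum_while(int n)
{
    int i, total = 0;

    i = 1;                  /* expr1: initialization */
    while (i <= n) {        /* expr2: test */
        total += i;         /* statement: loop body */
        i++;                /* expr3: re-initialization */
    }
    return total;
}
```

Both return the same result for every n. One caveat: if the body contains a continue, the for still runs expr3, while the hand-expanded while would skip it, so the equivalence holds only for bodies without continue.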
Whether to use while or for is largely a matter of taste. For example, in
while ((c = getchar()) == ' ' || c == '\n' || c == '\t')
    ; /* skip white space characters */
there is no initialization or re-initialization, so the while seems most
natural.
The for is clearly superior when there is a simple initialization and
re-initialization, since it keeps the loop control statements close together and
visible at the top of the loop. This is most obvious in
for (i = 0; i < N; i++)
which is the C idiom for processing the first N elements of an array, the
analog of the Fortran or PL/I DO loop. The analogy is not perfect, however,
since the limits of a for loop can be altered from within the loop, and the
controlling variable i retains its value when the loop terminates for any
reason. Because the components of the for are arbitrary expressions, for
loops are not restricted to arithmetic progressions. Nonetheless, it is bad
style to force unrelated computations into a for; it is better reserved for
loop control operations.
As a larger example, here is another version of atoi for converting a
string to its numeric equivalent. This one is more general; it copes with
optional leading white space and an optional + or - sign. (Chapter 4 shows
atof, which does the same conversion for floating point numbers.)
The basic structure of the program reflects the form of the input:
skip white space, if any
get sign, if any
get integer part, convert it
Each step does its part, and leaves things in a clean state for the next. The whole process terminates on the first character that could not be part of a number.
atoi(s) /* convert s to integer */
char s[];
{
    int i, n, sign;
    for (i=0; s[i]==' ' || s[i]=='\n' || s[i]=='\t'; i++)
        ; /* skip white space */
    sign = 1;
    if (s[i] == '+' || s[i] == '-') /* sign */
        sign = (s[i++]=='+') ? 1 : -1;
    for (n = 0; s[i] >= '0' && s[i] <= '9'; i++)
        n = 10 * n + s[i] - '0';
    return(sign * n);
}
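A sketch of the same three steps with a prototype-style definition follows. The name str_to_int is hypothetical, picked to avoid clashing with the standard library's atoi. This version swaps the explicit character comparisons for isspace and isdigit from <ctype.h>, so it also skips a few white space characters (such as carriage return) that the original does not. Like the original, it makes no attempt to detect overflow.

```c
#include <ctype.h>

/* str_to_int: convert the leading part of s to an int (hypothetical name). */
int str_to_int(const char s[])
{
    int i, n, sign;

    for (i = 0; isspace((unsigned char)s[i]); i++)
        ;                               /* skip white space, if any */
    sign = 1;
    if (s[i] == '+' || s[i] == '-')     /* get sign, if any */
        sign = (s[i++] == '+') ? 1 : -1;
    for (n = 0; isdigit((unsigned char)s[i]); i++)
        n = 10 * n + (s[i] - '0');      /* get integer part, convert it */
    return sign * n;
}
```

Conversion stops at the first character that cannot be part of a number, so str_to_int("  -42abc") gives -42 and a string with no digits gives 0.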
The advantages of keeping loop control centralized are even more obvious when there are several nested loops. The following function is a Shell sort for sorting an array of integers. The basic idea of the Shell sort is that in early stages, far-apart elements are compared, rather than adjacent ones, as in simple interchange sorts. This tends to eliminate large amounts of disorder quickly, so later stages have less work to do. The interval between compared elements is gradually decreased to one, at which point the sort effectively becomes an adjacent interchange method.
shell(v, n) /* sort v[0]...v[n-1] into increasing order */
int v[], n;
{
    int gap, i, j, temp;
    for (gap = n/2; gap > 0; gap /= 2)
        for (i = gap; i < n; i++)
            for (j=i-gap; j>=0 && v[j]>v[j+gap]; j -= gap){
                temp = v[j];
                v[j] = v[j+gap];
                v[j+gap] = temp;
            }
}
There are three nested loops. The outermost loop controls the gap between
compared elements, shrinking it from n/2 by a factor of two each pass until
it becomes zero. The middle loop compares each pair of elements that is
separated by gap; the innermost loop reverses any that are out of order.
Since gap is eventually reduced to one, all elements are eventually ordered
correctly. Notice that the generality of the for makes the outer loop fit the
same form as the others, even though it is not an arithmetic progression.
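For readers who want to run the sort with a current compiler, this is a sketch of the same three nested loops with a prototype-style definition. The name shell_sort and the void return type are choices made here; the algorithm is unchanged.

```c
/* shell_sort: sort v[0] ... v[n-1] into increasing order. */
void shell_sort(int v[], int n)
{
    int gap, i, j, temp;

    for (gap = n / 2; gap > 0; gap /= 2)        /* shrink the gap each pass */
        for (i = gap; i < n; i++)               /* visit each element past the gap */
            for (j = i - gap; j >= 0 && v[j] > v[j + gap]; j -= gap) {
                temp = v[j];                    /* swap the out-of-order pair */
                v[j] = v[j + gap];
                v[j + gap] = temp;
            }
}
```

Calling shell_sort(v, 7) on int v[] = {5, 2, 9, 1, 5, 6, 0} leaves v holding 0 1 2 5 5 6 9.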
One final C operator is the comma ",", which most often finds use in
the for statement. A pair of expressions separated by a comma is
evaluated left to right, and the type and value of the result are the type and
value of the right operand. Thus in a for statement, it is possible to place
multiple expressions in the various parts, for example to process two indices
in parallel. This is illustrated in the function reverse(s), which reverses
the string s in place.
#include <string.h>
reverse (s) /* reverse string s in place */
char s[];
{
    int c, i, j;
    for (i = 0, j = strlen(s)-1; i < j; i++, j--) {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}
The commas that separate function arguments, variables in declarations, etc., are not comma operators, and do not guarantee left to right evaluation.
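Here is a sketch of the two-index loop in prototype style. The name reverse_in_place is hypothetical. It uses size_t for the indices to match the return type of strlen, which is why the empty string needs an explicit check: with an unsigned type, strlen(s) - 1 would wrap around instead of becoming -1.

```c
#include <string.h>

/* reverse_in_place: reverse the string s in place (hypothetical name). */
void reverse_in_place(char s[])
{
    size_t i, j;
    char c;

    if (s[0] == '\0')
        return;             /* nothing to do; also avoids unsigned wrap-around */

    /* The comma operator lets i and j be set up and stepped together. */
    for (i = 0, j = strlen(s) - 1; i < j; i++, j--) {
        c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
}
```

Both i = 0, j = strlen(s) - 1 and i++, j-- are single expressions built with the comma operator, so each still fits in one slot of the for.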
Exercise 3-2. Write a function expand(s1 , s2) which expands shorthand notations
like a-z in the string s1 into the equivalent complete list
abc...xyz in s2. Allow for letters of either case and digits, and be
prepared to handle cases like a-b-c and a-z0-9 and -a-z. (A useful
convention is that a leading or trailing - is taken literally.)
3.6 - Loops - Do-while
The while and for loops share the desirable attribute of testing the
termination condition at the top, rather than at the bottom, as we discussed
in Chapter 1. The third loop in C, the do-while, tests at the bottom after
making each pass through the loop body; the body is always executed at
least once. The syntax is
do
    statement
while (expression);
The statement is executed, then expression is evaluated. If it is true, statement is evaluated again, and so on. If the expression becomes false, the loop terminates.
As might be expected, do-while is much less used than while and
for, accounting for perhaps five percent of all loops. Nonetheless, it is
from time to time valuable, as in the following function itoa, which converts a number to
a character string (the inverse of atoi). The job is
slightly more complicated than might be thought at first, because the easy
methods of generating the digits generate them in the wrong order. We
have chosen to generate the string backwards, then reverse it.
itoa(n, s) /* convert n to characters in s */
char s[];
int n;
{
    int i, sign;
    if ((sign = n) < 0) /* record sign */
        n = -n; /* make n positive */
    i = 0;
    do { /* generate digits in reverse order */
        s[i++] = n % 10 + '0'; /* get next digit */
    } while ((n /= 10) > 0); /* delete it */
    if (sign < 0)
        s[i++] = '-';
    s[i] = '\0';
    reverse(s);
}
The do-while is necessary, or at least convenient, since at least one character must
be installed in the array s, regardless of the value of n. We also
used braces around the single statement that makes up the body of the
do-while, even though they are unnecessary, so the hasty reader will not
mistake the while part for the beginning of a while loop.
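The "at least once" property is easiest to see with the value zero. The function below is a small illustrative sketch (count_digits is a made-up name, and it takes an unsigned argument to sidestep the sign handling discussed above).

```c
/* count_digits: number of decimal digits needed to print n (hypothetical helper). */
int count_digits(unsigned int n)
{
    int count = 0;

    do {
        count++;            /* the body runs once even when n is 0 */
        n /= 10;            /* drop the lowest digit */
    } while (n > 0);
    return count;
}
```

A top-tested while (n > 0) loop would report zero digits for n equal to 0; the do-while reports one, which matches the single character '0' that itoa must produce.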
It is important for any language to provide top-tested loops and bottom-tested loops. But don't feel bad if you write code for a year and never feel like a bottom-tested loop is the right way to solve a problem you are facing. It is usually rare to write a loop that you insist will run once regardless of its input data.
Exercise 3-3. In a 2's complement number representation, our version of
itoa does not handle the largest negative number, that is, the value of n
equal to -(2^(wordsize-1)). Explain why not. Modify it to print that value
correctly, regardless of the machine it runs on.
Exercise 3-4. Write the analogous function itob(n, s) which converts
the unsigned integer n into a binary character representation in s. Write
itoh, which converts an integer to hexadecimal representation.
Exercise 3-5. Write a version of itoa which accepts three arguments
instead of two. The third argument is a minimum field width; the converted
number must be padded with blanks on the left if necessary to make it wide
enough.
3.7 - Break
It is sometimes convenient to be able to control loop exits other than by
testing at the top or bottom. The break statement provides an early exit
from for, while, and do, just as from switch. A break statement
causes the innermost enclosing loop (or switch) to be exited immediately.
The following program removes trailing blanks and tabs from the end of
each line of input, using a break to exit from a loop when the rightmost
non-blank, non-tab is found.
#include <stdio.h>
#define MAXLINE 1000
main() /* remove trailing blanks and tabs */
{
    int n;
    char line[MAXLINE];
    while ((n = getline(line, MAXLINE)) > 0) {
        while (--n >= 0)
            if (line[n] != ' ' && line[n] != '\t'
              && line[n] != '\n')
                break;
        line[n+1] = '\0';
        printf("%s\n", line);
    }
}
getline returns the length of the line. The inner while loop starts at
the last character of line (recall that --n decrements n before using the
value), and scans backwards looking for the first character that is not a
blank, tab or newline. The loop is broken when one is found, or when n
becomes negative (that is, when the entire line has been scanned). You
should verify that this is correct behavior even when the line contains only
white space characters.
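The inner loop can be pulled out into a function that works on any string, which makes the all-white-space case easy to check. This is a sketch; the name rtrim and the choice to return the new length are assumptions made for this example.

```c
#include <string.h>

/* rtrim: remove trailing blanks, tabs and newlines from s.
   Returns the length of the trimmed string (hypothetical helper). */
int rtrim(char s[])
{
    int n = (int)strlen(s);

    while (--n >= 0)
        if (s[n] != ' ' && s[n] != '\t' && s[n] != '\n')
            break;          /* rightmost non-blank character found */
    s[n + 1] = '\0';        /* n is -1 here if the whole string was white space */
    return n + 1;
}
```

When the string holds only white space, the loop ends with n equal to -1, so the terminator lands at s[0] and the result is the empty string.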
An alternative to break is to put the testing in the loop itself:
while ((n = getline(line, MAXLINE)) > 0) {
    while (--n >= 0
      && (line[n]== ' ' || line[n]=='\t' || line[n]=='\n'))
        ;
    ...
}
This is inferior to the previous version, because the test is harder to understand.
Tests which require a mixture of &&, ||, !, or parentheses should
generally be avoided.
3.8 - Continue
The continue statement is related to break, but less often used; it
causes the next iteration of the enclosing loop (for, while, do) to begin.
In the while and do, this means that the test part is executed immediately;
in the for, control passes to the re-initialization step. (continue applies
only to loops, not to switch. A continue inside a switch inside a loop
causes the next loop iteration.)
As an example, this fragment processes only non-negative elements in the array a; negative values are skipped.
for (i = 0; i < N; i++) {
    if (a[i] < 0) /* skip negative elements */
        continue;
    ... /* do non-negative elements */
}
The continue statement is often used when the part of the loop that follows is
complicated, so that reversing a test and indenting another level
would nest the program too deeply.
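A runnable variant of the fragment, with the elided processing replaced by a simple sum, might look like the sketch below. The function name sum_non_negative and the long accumulator are illustrative choices.

```c
/* sum_non_negative: add up the elements of a[0..n-1] that are >= 0. */
long sum_non_negative(const int a[], int n)
{
    long total = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (a[i] < 0)       /* skip negative elements */
            continue;       /* control passes to i++, then to the test */
        total += a[i];      /* the "complicated" part would go here */
    }
    return total;
}
```

The alternative without continue is to wrap the rest of the body in if (a[i] >= 0) { ... }, which costs one more level of indentation for everything inside it.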
Now that we have seen the break and continue language structures in C, and learned about "middle-tested" loops, it is time to revisit
the Structured Programming debate and the need for priming operations
when a program must process all data until it finishes and still handle the "there is no data at all" situation.
In the previous chapter the authors skirted the issue by using a top-tested while loop whose test contains an assignment; the value
left behind by that assignment is compared to EOF to decide when to exit the loop:
int c;
while ((c = getchar()) != EOF) {
    /* process your data */
}
Just for fun, now that we know about the for loop, let's rewrite this loop as a for loop just
to make sure we really understand how it works:
int c;
for (c = getchar(); c != EOF; c = getchar()) {
    ...
}
Now you will almost never see a "read all the characters until EOF" loop written this way, because
K&R told us to use a while loop for this. But the for formulation is probably clearer than the while formulation
to a reader who is not familiar with the assignment side-effect idiom. In particular, the for formulation does not require the
reader to understand that an assignment expression has a value, namely the value that was assigned.
The first part of the for is the "priming read". The second part is the top-tested exit criterion,
which works both when there is no data at all and after all data has been read and processed. The third part
is done "at the bottom of the loop" to advance to the next character or encounter EOF before going back to the
top of the loop and doing the test. The call to getchar() appears twice in the for formulation of the "read all available data"
loop, and while we don't like to repeat ourselves in code, if it is a small and obvious bit of code, perhaps the code is clearer
with a bit of repetition.
So with all this as background, you can take this page and sit down with a friend at a coffee shop and debate as long as you like about which is the better formulation for the "read all available data" loop.
But if you ask Dr. Chuck's opinion, neither of these is ideal because in the real world we build data oriented loops that usually do a lot more than get one character from standard input. My formulation of a data loop will upset structured programming purists - but I write code in the real world so here is my version:
int c;
while (1) {
    c = getchar();
    if ( c == EOF ) break;
    /* process your data */
}
And if I wanted to skip blanks and new lines I could use both break and continue further angering the
structured programming purists.
int c;
while (1) {
    c = getchar();
    if ( c == EOF ) break;
    if ( c == ' ' || c == '\n' ) continue;
    /* process your data */
}
I use this middle tested approach because usually the data I am processing is coming from a more complex source
than the keyboard and I don't want a function with 2-3 parameters stuck in a side effect assignment statement in
a while test. Also sometimes you want to exit a loop, not just based on the return value from the function,
but instead based on the data structure that came back from the function itself.
As these "data processing loops" get more complex, the middle tested loop is a tried and true pattern. Even Kernighan and Ritchie point out its benefits above.
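To make the middle-tested pattern testable without a keyboard, the sketch below swaps getchar() for a hypothetical next_char() that reads from a string and returns EOF at the end. Both next_char and count_visible are invented for this illustration; the loop shape is the one shown above.

```c
#include <stdio.h>   /* for EOF */

/* Hypothetical data source: hands out the characters of a string one at
   a time and returns EOF at the end, standing in for getchar(). */
static int next_char(const char **cursor)
{
    if (**cursor == '\0')
        return EOF;
    return (unsigned char)*(*cursor)++;
}

/* count_visible: count the characters that are not blanks or newlines. */
int count_visible(const char *text)
{
    int c, count = 0;

    while (1) {
        c = next_char(&text);
        if (c == EOF) break;                    /* middle-tested exit */
        if (c == ' ' || c == '\n') continue;    /* skip and read again */
        count++;                                /* process your data */
    }
    return count;
}
```

The read happens in exactly one place, and the empty-input case needs no special handling: the first call returns EOF and the loop exits before any processing.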
And with that, I have now triggered endless coffee shop conversations about the best way to write a data handling loop.
Exercise 3-6. Write a program which copies its input to its output, except that it prints only one instance from each group of adjacent identical lines. (This is a simple version of the UNIX utility uniq.)
3.9 - Goto's and Labels
C provides the infinitely-abusable goto statement, and labels to branch
to. Formally, the goto is never necessary, and in practice it is almost
always easy to write code without it. We have not used goto in this book.
Nonetheless, we will suggest a few situations where goto's may find a
place. The most common use is to abandon processing in some deeply
nested structure, such as breaking out of two loops at once. The break
statement cannot be used directly since it leaves only the innermost loop.
Thus:
for ( ... )
    for ( ... ) {
        ...
        if (disaster)
            goto error;
    }
...
error:
    clean up the mess
This organization is handy if the error-handling code is non-trivial, and if
errors can occur in several places. A label has the same form as a variable
name, and is followed by a colon. It can be attached to any statement in the
same function as the goto.
As another example, consider the problem of finding the first negative element in a two-dimensional array. (Multi-dimensional arrays are discussed in Chapter 5.) One possibility is
for (i = 0; i < N; i++)
    for (j = 0; j < M; j++)
        if (v[i][j] < 0)
            goto found;
/* didn't find */
found:
    /* found one at position i, j */
    ...
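The fragment can be turned into a complete function as sketched below. The array dimensions ROWS and COLS, the function name find_negative_goto, and the convention of reporting the position through two pointers are all assumptions made for this example. Note the return before the label, which keeps the "didn't find" path from running into the "found" code.

```c
#define ROWS 3
#define COLS 4

/* find_negative_goto: if v holds a negative element, store the position of
   the first one in *row and *col and return 1; otherwise return 0. */
int find_negative_goto(int v[ROWS][COLS], int *row, int *col)
{
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            if (v[i][j] < 0)
                goto found;     /* leaves both loops at once */
    return 0;                   /* didn't find */

found:
    *row = i;                   /* i and j still hold the position */
    *col = j;
    return 1;
}
```

Because the goto jumps out before either loop's increment runs, i and j name the matching element exactly.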
Code involving a goto can always be written without one, though
perhaps at the price of some repeated tests or an extra variable. For example, the array search becomes
found = 0;
for (i = 0; i < N && !found; i++)
    for (j = 0; j < M && !found; j++)
        found = v[i][j] < 0;
if (found)
    /* it was at i-1, j-1 */
    ...
else
    /* not found */
    ...
Although we are not dogmatic about the matter, it does seem that goto
statements should be used sparingly, if at all.
Before we leave control flow, I need to say that I agree with structured programming experts as well as Kernighan and Ritchie
that using goto is universally a bad idea. There are a lot of little details that make gotos a real problem - things like how the stack
works in function calls and code blocks and patching the stack up correctly when a goto happens in a deeply-nested mess.
You might be tempted to use a goto when you want to exit multiple nested loops in a single statement (break and continue only
affect the innermost loop). The authors use this as an example above but are quite lukewarm when describing the use of goto.
Usually if your problem is that complex, putting things in a function and using return, or adding a few if statements, is a better
choice. The Dr. Chuck middle-tested data processing loop sidesteps this because the loop being exited is always the innermost loop.
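As a sketch of the "put it in a function and use return" advice, here is the two-dimensional search with no goto and no flag variable. The dimensions ROWS and COLS, the name find_negative, and the pointer out-parameters are assumptions made for this illustration.

```c
#define ROWS 3
#define COLS 4

/* find_negative: if v holds a negative element, store the position of the
   first one in *row and *col and return 1; otherwise return 0. */
int find_negative(int v[ROWS][COLS], int *row, int *col)
{
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            if (v[i][j] < 0) {
                *row = i;
                *col = j;
                return 1;       /* return exits both loops at once */
            }
    return 0;                   /* not found */
}
```

The return does the job of the goto: it leaves any number of nested loops in one step, and the caller decides what to do with the answer.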
Also, as new languages were built, the concept of "exceptions" became part of language design and was a far more elegant solution to
finding a path out of some deeply nested code that just needs to "get out". So most of the time when you think goto is a good idea, you should
lean towards a throw / catch pattern to make your intention clear. It is one of the reasons why we prefer languages like Java
or Python over C when writing general purpose code.