Skip to main content

3 - Control Flow

The control flow statements of a language specify the order in which computations are done. We have already met the most common control flow constructions of C in earlier examples; here we will complete the set, and be more precise about the ones discussed before.

3.1 - Statements and Blocks

An expression such as x = 0 or i++ or printf( ... ) becomes a statement when it is followed by a semicolon, as in

x = 0;
i++;
printf(...);

In C, the semicolon is a statement terminator, rather than a separator as it is in Algol-like languages.

The braces { and } are used to group declarations and statements together into a compound statement or block so that they are syntactically equivalent to a single statement. The braces that surround the statements of a function are one obvious example; braces around multiple statements after an if, else, while or for are another. (Variables can actually be declared inside any block; we will talk about this in Chapter 4.) There is never a semicolon after the right brace that ends a block.

Chuckism

Ah C, how do I love thee? Let me count the ways. - Dr. Chuck with homage to Elizabeth Barrett Browning

The humble semicolon is why spacing and line-ends do not matter to C and C-like languages. It means we as programmers can focus all of our white space and lines on communicating our intent to humans. This freedom is not an excuse to write obtuse or dense code (see the Obfuscated Perl Contest) but instead freedom to describe what we mean or use spacing to help us understand our code.

We can take a quick look at how a few other C-like languages treat the semicolon. Java is just like C in that the semicolon terminates statements. Python treats the semicolon as a separator - allowing more than one statement on a single line. But since Python treats the end of line as a statement separator - you generally never use semicolon in Python. But for people like me who automatically add a semicolon when typing code too fast, at least Python ignores the few semicolons I add to my code out of habit. JavaScript treats semicolon as a separator but since JavaScript ignores the end of a line (it is whitespace), semicolons are required when a block of code consists of more than one line. When I write JavaScript, I meticulously include semicolons at the end of all statements because "any good C programmer can write C in any language".

3.2 - If-Else

The if-else statement is used to make decisions. Formally, the syntax is

if (expression)
statement-1
else
statement-2

where the else part is optional. The expression is evaluated; if it is "true" (that is, if expression has a non-zero value), statement-1 is done. If it is "false" (expression is zero) and if there is an else part, statement-2 is executed instead.

Since an if simply tests the numeric value of an expression, certain coding shortcuts are possible. The most obvious is writing

if (expression)

instead of

if (expression != 0)

Sometimes this is natural and clear; at other times it is cryptic.

Because the else part of an if-else is optional, there is an ambiguity when an else is omitted from a nested if sequence. This is resolved in the usual way - the else is associated with the closest previous else-less if. For example, in

if (n > 0)
if (a > b)
z = a;
else
z = b;

the else goes with the inner if, as we have shown by indentation. If that isn't what you want, braces must be used to force the proper association:

if (n > 0) {
if (a > b)
z = a;
}
else
z = b;

The ambiguity is especially pernicious in situations like:

if (n > 0)
for (i = 0; i < n; i++)
if (s[i] > 0) {
printf("...");
return(i);
}
else /* WRONG */
printf("error - n is zero\n");

The indentation shows unequivocally what you want, but the compiler doesn't get the message, and associates the else with the inner if. This kind of bug can be very hard to find.

By the way, notice that there is a semicolon after z = a in

if (a > b)
z = a;
else
z = b;

This is because grammatically, a statement follows the if, and an expression statement like z = a is always terminated by a semicolon.

3.3 - Else-If

The construction

if (expression)
statement
else if (expression)
statement
else if (expression)
statement
else
statement

occurs so often that it is worth a brief separate discussion. This sequence of if's is the most general way of writing a multi-way decision. The expression's are evaluated in order; if any expression is true, the statement associated with it is executed, and this terminates the whole chain. The code for each statement is either a single statement, or a group in braces.

The last else part handles the "none of the above" or default case where none of the other conditions was satisfied. Sometimes there is no explicit action for the default; in that case the trailing

else
statement

can be omitted, or it may be used for error checking to catch an "impossible" condition.

To illustrate a three-way decision, here is a binary search function that decides if a particular value x occurs in the sorted array v. The elements of v must be in increasing order. The function returns the position (a number between 0 and n-1) if x occurs in v, and -1 if not.

binary(x, v, n) /* find x in v[0] ... v[n-1] */
int x, v[], n;
{
int low, high, mid;

low = 0;
high = n - 1;
while (low <= high)
{
mid = (low+high) / 2;
if (x < v[mid])
high = mid - 1;
else if (x > v[mid])
low = mid + 1;
else /* found match */
return (mid);
}
return(-1);
}

The fundamental decision is whether x is less than, greater than, or equal to the middle element v[mid] at each step; this is a natural for else-if.

Chuckism

Note that in the above examples, the else and the if are two language constructs that are just being used idiomatically to construct an else if pattern with indentation that captures the idiom.

If we are pedantic about indentation of the above sequence we would be separating the else and if and then indenting each succeeding block further as follows (brackets added for clarity):

if (expression) {
statement
}
else
{
if (expression) {
statement
}
else
{
if (expression) {
statement
}
else
{
statement
}
}
}

Java and JavaScript keep the else and if as separate language elements and document their idiomatic usage and indentation just like C.

But the Python elif is a new language construct that achieves the same end as shown below:

if (expression) :
block
elif (expression) :
block
elif (expression) :
block
else :
block

The C/Java/JavaScript and Python idioms thankfully look the same when the idiomatic indentation is used. Even FORTRAN 77 supports the ELSE IF construct.

3.4 - Switch

The switch statement is a special multi-way decision maker that tests whether an expression matches one of a number of constant values, and branches accordingly. In Chapter 1 we wrote a program to count the occurrences of each digit, white space, and all other characters, using a sequence of if ... else if ... else. Here is the same program with a switch.

#include <stdio.h>

main() /* count digits, white space, others */
{
int c, i, nwhite, nother, ndigit[10];

nwhite = nother = 0;
for (i = 0; i < 10; i++)
ndigit[i] = 0;

while ((c = getchar()) != EOF)
switch (c) {
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
ndigit[c-'0']++;
break;
case ' ':
case '\n':
case '\t':
nwhite++;
break;
default:
nother++;
break;
}

printf("digits =");
for (i = 0; i < 10; i++)
printf(" %d", ndigit[i]);
printf("\nwhite space = %d, other = %d\n", nwhite, nother);
}

The switch evaluates the integer expression in parentheses (in this program the character c) and compares its value to all the cases. Each case must be labeled by an integer or character constant or constant expression. If a case matches the expression value, execution starts at that case. The case labeled default is executed if none of the other cases is satisfied. A default is optional; if it isn't there and if none of the cases matches, no action at all takes place. Cases and default can occur in any order. Cases must all be different.

The break statement causes an immediate exit from the switch. Because cases serve just as labels, after the code for one case is done, execution falls through to the next unless you take explicit action to escape. break and return are the most common ways to leave a switch. A break statement can also be used to force an immediate exit from while, for and do loops as well, as will be discussed later in this chapter.

Falling through cases is a mixed blessing. On the positive side, it allows multiple cases for a single action, as with the blank, tab or newline in this example. But it also implies that normally each case must end with a break to prevent falling through to the next. Falling through from one case to another is not robust, being prone to disintegration when the program is modified. With the exception of multiple labels for a single computation, fall-throughs should be used sparingly.

As a matter of good form, put a break after the last case (the default here) even though it's logically unnecessary. Some day when another case gets added at the end, this bit of defensive programming will save you.

Chuckism

Ah the switch statement. What is there to say? I think that the switch statement was added to C to compete with the earlier FORTRAN "computed GOTO" statement and to keep low-level programmers from switching to assembly language to implement a branch table.

The authors spend most of the previous section apologizing for the switch statement, so perhaps you should take that as a hint and avoid it in your code.

There are very few situations where a branch table outperforms a series of if-else checks and those are likely deep in library or operating system code. Programmers should only use switch if they understand what a branch table is, and why it is more efficient for this particular bit of their program. Otherwise just use else if to do the readers of your code a favor.

One can only wonder if Guido van Rossum or someone else at Centrum Wiskunde & Informatica (CWI) in the Netherlands looked at the above code example and thought, "Interesing how case works in a C switch statement - a colon at the end of the line and indenting the following lines is an elegant way to visually denote blocks of code."

Exercise 3-1. Write a function expand(s, t) which converts characters like newline and tab into visible escape sequences like \n and \t as it copies the string s to t. Use a switch.

3.5 - Loops - While and For

We have already encountered the while and for loops. In

while (expression)
statement

the expression is evaluated. If it is non-zero, statement is executed and expression is re-evaluated. This cycle continues until expression becomes zero, at which point execution resumes after statement.

The for statement

for ( expr1 ; expr2 ; expr3)
statement

is equivalent to

expr1 ;
while (expr2) {
statement
expr3;
}

Grammatically, the three components of a for are expressions. Most commonly, expr1 and expr3 are assignments or function calls and expr2 is a relational expression. Any of the three parts can be omitted, although the semicolons must remain. If expr1 or expr3 is left out, i is simply dropped from the expansion. If the test, expr2, is not present, it is taken as permanently true, so

for (;;) {
...
}

is an "infinite" loop, presumably to be broken by other means (such as a break or return).

Whether to use while or for is largely a matter of taste. For example, in

while ( (c = getchar () ) == ' ' || c == '\n' || c == '\t')
; /* skip white space characters */

there is no initialization or re-initialization, so the while seems most natural.

The for is clearly superior when there is a simple initialization and re-initialization, since it keeps the loop control statements close together and visible at the top of the loop. This is most obvious in

for (i = 0; i < N; i++)

which is the C idiom for processing the first N elements of an array, the analog of the Fortran or PL/I DO loop. The analogy is not perfect, however, since the limits of a for loop can be altered from within the loop, and the controlling variable i retains its value when the loop terminates for any reason. Because the components of the for are arbitrary expressions, for loops are not restricted to arithmetic progressions. Nonetheless, it is bad style to force unrelated computations into a for; it is better reserved for loop control operations.

As a larger example, here is another version of atoi for converting a string to its numeric equivalent. This one is more general; it copes with optional leading white space and an optional + or - sign. (Chapter 4 shows atof, which does the same conversion for floating point numbers.)

The basic structure of the program reflects the form of the input:

skip white space, if any
get sign, if any
get integer part, convert it

Each step does its part, and leaves things in a clean state for the next. The whole process terminates on the first character that could not be part of a number.

atoi(s) /* convert s to integer */
char s[];
{
int i, n, sign;

for (i=0; s[i]==' ' || s[i]=='\n' || s[i]=='\t'; i++)
; /* skip white space */
sign = 1;
if (s[i] == '+' || s[i] == '-') /* sign */
sign = (s[i++]=='+') ? 1 : -1;
for (n = 0; s[i] >= '0' && s[i] <= '9'; i++)
n = 10 * n + s[i] - '0';
return(sign * n);
}

The advantages of keeping loop control centralized are even more obvious when there are several nested loops. The following function is a Shell sort for sorting an array of integers. The basic idea of the Shell sort is that in early stages, far-apart elements are compared, rather than adjacent ones, as in simple interchange sorts. This tends to eliminate large amounts of disorder quickly, so later stages have less work to do. The interval between compared elements is gradually decreased to one, at which point the sort effectively becomes an adjacent interchange method.

shell(v, n) /* sort v[0]...v[n-1] into increasing order */
int v[], n;
{
int gap, i, j, temp;

for (gap = n/2; gap > 0; gap /= 2)
for (i = gap; i < n; i++)
for (j=i-gap; j>=0 && v[j]>v[j+gap]; j -= gap){
temp = v[j];
v[j] = v[j+gap];
v[j+gap] = temp;
}
}

There are three nested loops. The outermost loop controls the gap between compared elements, shrinking it from n/2 by a factor of two each pass until it becomes zero. The middle loop compares each pair of elements that is separated by gap; the innermost loop reverses any that are out of order. Since gap is eventually reduced to one, all elements are eventually ordered correctly. Notice that the generality of the for makes the outer loop fit the same form as the others, even though it is not an arithmetic progression.

One final C operator is the comma ",", which most often finds use in the for statement. A pair of expressions separated by a comma is evaluated left to right, and the type and value of the result are the type and value of the right operand. Thus in a for statement, it is possible to place multiple expressions in the various parts, for example to process two indices in parallel. This is illustrated in the function reverse(s), which reverses the string s in place.

#include <string.h>

reverse (s) /* reverse string s in place */
char s[];
{
int c, i, j;

for (i = 0, j = strlen(s)-1; i < j; i++, j--) {
c = s[i];
s[i] = s[j];
s[j] = c;
}
}

The commas that separate function arguments, variables in declarations, etc., are not comma operators, and do not guarantee left to right evaluation.

Exercise 3-2. Write a function expand(s1 , s2) which expands shorthand notations like a-z in the string s1 into the equivalent complete list abc...xyz in s2. Allow for letters of either case and digits, and be prepared to handle cases like a-b-c and a-z0-9 and -a-z. (A useful convention is that a leading or trailing - is taken literally.)

3.6 - Loops - Do-while

The while and for loops share the desirable attribute of testing the termination condition at the top, rather than at the bottom, as we discussed in Chapter 1. The third loop in C, the do-while, tests at the bottom after making each pass through the loop body; the body is always executed at least once. The syntax is

do
statement
while (expression) ;

The statement is executed, then expression is evaluated. If it is true, statement is evaluated again, and so on. If the expression becomes false, the loop terminates.

As might be expected, do-while is much less used than while and for, accounting for perhaps five percent of all loops. Nonetheless, it is from time to time valuable, as in the following function itoa, which converts a number to a character string (the inverse of atoi). The job is slightly more complicated than might be thought at first, because the easy methods of generating the digits generate them in the wrong order. We have chosen to generate the string backwards, then reverse it.

itoa(n, s)    /* convert n to characters in s */
char s[];
int n;
{
int i, sign;

if ((sign = n) < 0) /* record sign */
n = -n; /* make n positive */
i = 0;
do { /* generate digits in reverse order */
s[i++] = n % 10 + '0'; /* get next digit */
} while ((n /= 10) > 0); /* delete it */
if (sign < 0)
s[i++] = '-';
s[i] = '\0';
reverse(s);
}

The do-while is necessary, or at least convenient, since at least one character must be installed in the array s, regardless of the value of n. We also used braces around the single statement that makes up the body of the do-while, even though they are unnecessary, so the hasty reader will not mistake the while part for the beginning of a while loop.

Chuckism

It is important for any language to provide top-tested loops and bottom-tested loops. But don't feel bad if you write code for a year and never feel like a bottom-tested loop is the right way to solve a problem you are facing. It is usually rare to write a loop that you insist will run once regardless of its input data.

Exercise 3-3. In a 2's complement number representation, our version of itoa does not handle the largest negative number, that is, the value of n equal to -(2wordsize-1). Explain why not. Modify it to print that value correctly, regardless of the machine it runs on.

Exercise 3-4. Write the analogous function itob(n, s) which converts the unsigned integer n into a binary character representation in s. Write itoh, which converts an integer to hexadecimal representation.

Exercise 3-5. Write a version of itoa which accepts three arguments instead of two. The third argument is a minimum field width; the converted number must be padded with blanks on the left if necessary to make it wide enough.

3.7 - Break

It is sometimes convenient to be able to control loop exits other than by testing at the top or bottom. The break statement provides an early exit from for, while, and do, just as from switch. A break statement causes the innermost enclosing loop (or switch) to be exited immediately.

The following program removes trailing blanks and tabs from the end of each line of input, using a break to exit from a loop when the rightmost non-blank, non-tab is found.

#include <stdio.h>
#define MAXLINE 1000

main() /* remove trailing blanks and tabs */
{
int n;
char line[MAXLINE];

while ((n = getline(line, MAXLINE)) > 0) {
while (--n >= 0)
if (line[n] != ' ' && line[n] != '\t'
&& line[n] != '\n')
break;
line[n+1] = '\0';
printf("%s\n", line);
}
}

getline returns the length of the line. The inner while loop starts at the last character of line (recall that --n decrements n before using the value), and scans backwards looking for the first character that is not a blank, tab or newline. The loop is broken when one is found, or when n becomes negative (that is, when the entire line has been scanned). You should verify that this is correct behavior even when the line contains only white space characters.

An alternative to break is to put the testing in the loop itself:

while ((n = getline(line, MAXLINE)) > 0) {
while (--n >= 0
&& (line[n]== ' ' || line[n]=='\t' || line[n]=='\n'))
;
...
}

This is inferior to the previous version, because the test is harder to understand. Tests which require a mixture of &&, ||, !, or parentheses should generally be avoided.

3.8 - Continue

The continue statement is related to break, but less often used; it causes the next iteration of the enclosing loop (for, while, do) to begin. In the while and do, this means that the test part is executed immediately; in the for, control passes to the re-initialization step. (continue applies only to loops, not to switch. A continue inside a switch inside a loop causes the next loop iteration.)

As an example, this fragment processes only positive elements in the array a; negative values are skipped.

for (i = 0; i < N; i++) {
if (a[i] < 0) /* skip negative elements */
continue;
... /* do positive elements */
}

The continue statement is often used when the part of the loop that follows is complicated, so that reversing a test and indenting another level would nest the program too deeply.

Chuckism

Now that we have seen the break and continue language structures in C, and learned about "middle-tested" loops, it is time to revisit the Structured Programming debate and the need for priming operations when a program must process all data until it finishes and still handle the "there is no data at all" situation.

In the previous chapter the authors skirted the issue by using a top-tested while loop and a side-effect assignment statement residual value that was compared to EOF to decide when to exit the loop:

int c;
while ((c = getchar()) != EOF) {
/* process your data */
}

Just for fun, now that we know about the for loop, lets rewrite this loop as a for loop just to make sure who really understand how it works:

int c;
for (c = getchar(); c != EOF; c = getchar()) {
...
}

Now you will almost never see a "read all the characters until EOF" written this way because it is not "K&R told us to use a while loop for this". But the for formulation is probably clearer than the while formulation to a reader who is not familiar with the assignment side-effect idiom. In particular the for formulation does not require the reader to understand that an assignment statement has a residual value of the value that was assigned.

The first part of the for is the "priming read", the second part of the for is the top tested exit criteria that works both for no data at all and after all data has been read and processed, and the third part of the for is done "at the bottom of the loop" to advance to the next character or encounter EOF before going back to the top of the loop and doing the test. The call to getchar() is done twice in the for formulation of the "read all available data" loop and while we don't like to repeat outselves in code - if it is a small and obvious bit of code - perhaps the code is more clear with a bit of repetition.

So with all this as background, you can take this page and sit down with a friend at a coffee shop and debate as long as you like about which is the better formulation for the "read all available data" loop.

But if you ask Dr. Chuck's opinion, neither of these is ideal because in the real world we build data oriented loops that usually do a lot more than get one character from standard input. My formulation of a data loop will upset structured programming purists - but I write code in the real world so here is my version:

int c;
while (1) {
c = getchar();
if ( c == EOF ) break;
/* process your data */
}

And if I wanted to skip blanks and new lines I could use both break and continue further angering the structured programming purists.

int c;
while (1) {
c = getchar();
if ( c == EOF ) break;
if ( c == ' ' || c == '\n' ) continue;
/* process your data */
}

I use this middle tested approach because usually the data I am processing is coming from a more complex source than the keyboard and I don't want a function with 2-3 parameters stuck in a side effect assignment statement in a while test. Also sometimes you want to exit a loop, not just based on the return value from the function, but instead based on the data structure that came back from the function itself.

As these "data processing loops" get more complex, the middle tested loop is a tried and true pattern. Even Kernighan and Ritchie point out its benefits above.

And with that, I have now triggered endless coffee shop conversations about the best way to write a data handling loop.

Exercise 3-6. Write a program which copies its input to its output, except that it prints only one instance from each group of adjacent identical lines. (This is a simple version of the UNIX utility uniq.)

3.9 - Goto's and Labels

C provides the infinitely-abusable goto statement, and labels to branch to. Formally, the goto is never necessary, and in practice it is almost always easy to write code without it. We have not used goto in this book.

Nonetheless, we will suggest a few situations where goto's may find a place. The most common use is to abandon processing in some deeply nested structure, such as breaking out of two loops at once. The break statement cannot be used directly since it leaves only the innermost loop. Thus:

  for ( ... )
for ( ... ) {
...
if (disaster)
goto error;
}
...

error:
clean up the mess

This organization is handy if the error-handling code is non-trivial, and if errors can occur in several places. A label has the same form as a variable name, and is followed by a colon. It can be attached to any statement in the same function as the goto.

As another example, consider the problem of finding the first negative element in a two-dimensional array. (Multi-dimensional arrays are discussed in Chapter 5.) One possibility is

  for (i = 0; i < N; i++)
for (j = 0; j < M; j++)
if (v[i][j] < 0)
goto found;
/* didn't find */
found:
/* found one at position i, j */
...

Code involving a goto can always be written without one, though perhaps at the price of some repeated tests or an extra variable. For example, the array search becomes

found = 0;
for (i = 0; i < N && !found; i++)
for (j = 0; j < M && !found; j++)
found = v[i][j] < 0;
if (found)
/* it was at i-1, j-1 */
...
else
/* not found */
...

Although we are not dogmatic about the matter, it does seem that goto statements should be used sparingly, if at all.

Chuckism

Before we leave control flow, I need to say that I agree with structured programming experts as well as Kernighan and Ritchie in that using goto is universally a bad idea. There is a lot of little details that make them a real problem - things like how the stack works in function calls and code blocks and patching the stack up correctly when a goto happens in a deeply-nested mess.

You might be tempted to use a goto when you want to exit multiple nested loops in a single statement (break and continue only exit the innermost loop). The authors use this as an example above but are quite lukewarm when describing the use of goto.

Usually if your problem is that complex putting things in a function and using return, or adding a few if statements is a better choice. The Dr. Chuck middle tested loop data processing solves this because the loop is always the innermost loop.

Also as new languages were built the concept of "exceptions" became part of language design and was a far more elegant solution to a path of of some deeply nested code that just needs to "get out". So most of the time you think goto is a good idea - you should lean towards a throw / catch pattern to make your intention clear. It is one of the reasons why we prefer languages like Java or Python over C when writing general purpose code.