C Programming For Everybody-38-55
AND EXPRESSIONS
Variables and constants are the basic data objects manipulated in a program. Declarations list the
variables to be used, and state what type they have and perhaps what their initial values are.
Operators specify what is to be done to them. Expressions combine variables and constants to
produce new values. These are the topics of this chapter.
Only the first eight characters of an internal name are significant, although more may be used. For
external names such as function names and external variables, the number may be less than eight,
because external names are used by various assemblers and loaders. Appendix A lists details.
Furthermore, keywords like if, else, int, float, etc., are reserved: you can't use them as variable
names. (They must be in lower case.)
In modern C languages, the limitation of the first eight characters of a variable name being unique
has been extended. In most C variants you can expect that at least 30 characters of a variable
name are treated as unique. The eight character limitation was to reflect a typical limitation of
identifier length in assembly language programming and run-time linkers of the time.
Naturally it's wise to choose variable names that mean something, that are related to the purpose of
the variable, and that are unlikely to get mixed up typographically.
In addition, there are a number of qualifiers which can be applied to int: short, long, and unsigned.
short and long refer to different sizes of integers. Unsigned numbers obey the laws of arithmetic
modulo 2^n, where n is the number of bits in an int; unsigned numbers are always positive. The
declarations for the qualifiers look like
short int x;
long int y;
unsigned int z;
The word int can be omitted in such situations, and typically is.
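To make "arithmetic modulo 2^n" concrete, here is a minimal sketch. The helper name wrap_add is hypothetical (it is not from the book), and UINT_MAX is taken from the standard header limits.h.

```c
#include <limits.h>

/* Hypothetical helper: unsigned addition wraps modulo 2^n, where n is
   the number of bits in an unsigned int on this machine. */
unsigned wrap_add(unsigned a, unsigned b)
{
    return a + b;   /* no overflow error; the sum is reduced modulo 2^n */
}
```

Under these assumptions, wrap_add(UINT_MAX, 1u) gives 0, because UINT_MAX is 2^n - 1.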
https://www.cc4e.com/book/chap02.md 1/18
21/03/2024, 13:51 cc4e.com/book/chap02.md
The precision of these objects depends on the machine at hand; the table below shows some
representative values.
The intent is that short and long should provide different lengths of integers where practical; int will
normally reflect the most "natural" size for a particular machine. As you can see, each compiler is free
to interpret short and long as appropriate for its own hardware. About all you should count on is that
short is no longer than long.
In the mid 1970's, C was designed to support a range of computer generations. The PDP-11 was
a common "previous-generation" computer that had less memory, so variable sizes were kept
small. The more modern computers in this chart had a bit more memory and so could afford to
have slightly larger sizes.
The idea of "natural" size is the size that could be loaded, computed, and stored in a single
machine language instruction. You knew as a programmer that when you used int, the machine
code generated from your program would not need to include extra instructions for a simple line
of code like x = x + 1;.
On most modern systems an int is 32 bits long. A long is 64 bits on 64-bit Linux and macOS but
still 32 bits on 64-bit Windows. Even though modern computers can do 64-bit computations in a
single instruction, using the shorter int type when appropriate can save on memory storage and
memory bandwidth.
Interestingly the length of a 32-bit int leads to a UNIX/C problem with dates that is called the
"Year 2038 Problem". A common way to represent time in UNIX/C programs was as a 32-bit
integer in the number of seconds since 1-Jan-1970. It was quick and easy to compare, add or
subtract these second counter dates in code and databases. But the number of seconds since
1-Jan-1970 will overflow a signed 32-bit number on 19-Jan-2038. To avoid problems, most
systems have by now converted to storing these "number of seconds" values in 64-bit integers,
which gives us almost 300 billion years before we need to worry about overflowing timer counters
again.
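The 2038 date can be checked with a little arithmetic. The sketch below is only an illustration: the function name is hypothetical, and INT32_MAX comes from the C99 header stdint.h.

```c
#include <stdint.h>

/* Hypothetical helper: the number of whole days that fit in a signed
   32-bit count of seconds. There are 86400 seconds in a day. */
long days_before_32bit_overflow(void)
{
    return (long)(INT32_MAX / 86400);
}
```

The result is 24855 days. There are 24837 days from 1-Jan-1970 to 1-Jan-2038 (68 years, 17 of them leap years), so the counter runs out 18 days later, on 19-Jan-2038.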
Back when C was developed we had two different character sets and two different char variable
lengths. The world generally standardized on the ASCII character set for the core western
characters and Unicode / UTF-8 to represent all characters in all languages worldwide. But that is
a story for another time. For now, just think of the char type as a byte type that is 8 bits in length
and can store ASCII. Modern languages like Python or Java have excellent support for wide
character sets. In our historical look at C - we will not cover wide or multi-byte characters.
Also if you look at the float and double types, you see different bit-sizes. Even worse, each of
these computers did floating point computation using slightly different hardware implementations
and the same code run on different computers would give slightly different results and have
unpredictable behavior on overflow, underflow and other extraordinary floating point operations.
This was solved by the introduction of the IEEE 754 (1985) floating point format standard, which
not only standardized the lengths of float and double but also ensured that the same set of
floating point calculations would produce the same result on different processors.
2.3 Constants
int and float constants have already been disposed of, except to note that the usual
123.456e-7
or
0.12E3
"scientific" notation for float's is also legal. Every floating point constant is taken to be double, so the
"e" notation serves for both float and double.
Long constants are written in the style 123L. An ordinary integer constant that is too long to fit in an int
is also taken to be a long.
There is a notation for octal and hexadecimal constants: a leading 0 (zero) on an int constant implies
octal; a leading 0x or 0X indicates hexadecimal. For example, decimal 31 can be written as 037 in octal
and 0x1f or 0X1F in hex. Hexadecimal and octal constants may also be followed by L to make them
long.
A character constant is a single character written within single quotes, as in 'x'. The value of a
character constant is the numeric value of the character in the machine's character set. For example,
in the ASCII character set the character zero, or '0', is 48, and in EBCDIC '0' is 240, both quite
different from the numeric value 0. Writing '0' instead of a numeric value like 48 or 240 makes the
program independent of the particular value. Character constants participate in numeric operations
just as any other numbers, although they are most often used in comparisons with other characters. A
later section treats conversion rules.
Certain non-graphic characters can be represented in character constants by escape sequences like
\n (newline), \t (tab), \0 (null), \\ (backslash), \' (single quote), etc., which look like two characters,
but are actually only one. In addition, an arbitrary byte-sized bit pattern can be generated by writing
'\ddd'
where ddd is one to three octal digits.
In the 1970's we sent much of our output to printers. A "form feed" was the character we would
send to the printer to advance to the top of a new page.
The character constant '\0' represents the character with value zero.
'\0' is often written instead of 0 to emphasize the character nature of some expression.
A constant expression is an expression that involves only constants. Such expressions are evaluated
at compile time, rather than run time, and accordingly may be used in any place that a constant may
be, as in
or
seconds = 60 * 60 * hours;
"I am a string"
or
The quotes are not part of the string, but serve only to delimit it. The same escape sequences used for
character constants apply in strings; \" represents the double quote character.
Technically, a string is an array whose elements are single characters. The compiler automatically
places the null character \0 at the end of each such string, so programs can conveniently find the end.
This representation means that there is no real limit to how long a string can be, but programs have to
scan one completely to determine its length. The physical storage required is one more location than
the number of characters written between the quotes. The following function strlen(s) returns the
length of a character string s, excluding the terminal \0.
strlen(s) /* return length of s */
char s[];
{
    int i;
    i = 0;
    while (s[i] != '\0')
        ++i;
    return(i);
}
Be careful to distinguish between a character constant and a string that contains a single character:
'x' is not the same as "x". The former is a single character, used to produce the numeric value of the
letter x in the machine's character set. The latter is a character string that contains one character (the
letter x) and a \0.
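The difference between 'x' and "x" can be seen by asking the compiler about each. This is a minimal sketch with hypothetical helper names, written in the newer ANSI function style.

```c
#include <stddef.h>

/* Hypothetical helper: storage occupied by the string constant "x",
   which is the letter x plus the terminating '\0'. */
size_t string_x_storage(void)
{
    return sizeof("x");
}

/* Hypothetical helper: the character constant 'x' is just a small
   integer, the code for x in the machine's character set. */
int char_x_value(void)
{
    return 'x';
}
```

string_x_storage() reports two bytes, one more than the number of characters written between the quotes.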
2.4 Declarations
All variables must be declared before use, although certain declarations can be made implicitly by
context. A declaration specifies a type, and is followed by a list of one or more variables of that type,
as in
int lower, upper, step;
char c, line[1000];
Variables can be distributed among declarations in any fashion; the lists above could equally well be
written as
int lower;
int upper;
int step;
char c;
char line [1000];
This latter form takes more room, but is convenient for adding a comment to each declaration or for
subsequent modifications.
Variables may also be initialized in their declaration, although there are some restrictions. If the name
is followed by an equals sign and a constant, that serves as an initializer, as in
char backslash = '\\';
int i = 0;
float eps = 1.0e-5;
If the variable in question is external or static, the initialization is done once only, conceptually before
the program starts executing. Explicitly initialized automatic variables are initialized each time the
function they are in is called. Automatic variables for which there is no explicit initializer have
undefined (i.e., garbage) values. External and static variables are initialized to zero by default, but it is
good style to state the initialization anyway.
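The difference between "initialized once" and "initialized on each call" is easy to observe. The two functions below are hypothetical examples, not code from the book.

```c
/* A static variable is initialized once, conceptually before the
   program starts, and keeps its value between calls. */
int next_ticket(void)
{
    static int ticket = 0;
    return ++ticket;
}

/* An automatic variable with an explicit initializer is set again
   every time the function is called. */
int always_one(void)
{
    int n = 0;
    return ++n;
}
```

Successive calls to next_ticket() return 1, 2, 3 and so on, while always_one() returns 1 every time.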
x % y
produces the remainder when x is divided by y, and thus is zero when y divides x exactly. For
example, a year is a leap year if it is divisible by 4 but not by 100, except that years divisible by 400
are leap years. Therefore
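The rule can be written as a single test using %. This is a minimal sketch: the function name is_leap_year is an assumption for illustration, and it uses the newer ANSI function style rather than the style used elsewhere in this chapter.

```c
/* Returns 1 if year is a leap year under the rule above, 0 otherwise. */
int is_leap_year(int year)
{
    /* The parentheses are not required, since && binds more tightly
       than ||, but they make the grouping obvious to a human reader. */
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}
```

For example, 2000 and 2024 are leap years under this test, while 1900 and 2023 are not.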
The + and - operators have the same precedence, which is lower than the (identical) precedence of *,
/, and %, which are in turn lower than unary minus. Arithmetic operators group left to right. (A table at
the end of this chapter summarizes precedence and associativity for all operators.)
The order of evaluation is not specified for associative and commutative operators like * and +; the
compiler may rearrange a parenthesized computation involving one of these. Thus a+(b+c) can be
evaluated as (a+b)+c. This rarely makes any difference, but if a particular order is required, explicit
temporary variables must be used.
The above paragraph allowing the compiler to reorder computations even in the presence of
parentheses is known as the "K&R C Rearrangement License". As the authors state, it almost
never makes a difference unless an expression contains a value computed in a function call or
there is a pointer lookup to find a value for the computation that might fail. ISO C withdrew this
license: a compiler must honor the grouping expressed by parentheses unless it can show that
the result is unchanged. What ISO C still leaves unspecified is the order in which the operands of
most operators (function calls, for example) are evaluated.
The good news is that as long as you keep your expressions simple, you don't have to worry
about this rule. Sometimes the real value of parentheses is to communicate your intentions to
human readers of your code. If you are writing code that depends on the order of overflow,
function calls, and pointer dereferences in a single mathematical expression - perhaps you
should break your expression into multiple statements.
They all have the same precedence. Just below them in precedence are the equality operators:
== !=
which have the same precedence. Relationals have lower precedence than arithmetic operators, so
expressions like i < lim-1 are taken as i < (lim - 1), as would be expected.
More interesting are the logical connectives && and ||. Expressions connected by && or || are
evaluated left to right, and evaluation stops as soon as the truth or falsehood of the result is known.
These properties are critical to writing programs that work. For example, here is a loop from the input
function getline which we wrote in Chapter 1.
for (i=0; i<lim-1 && (c=getchar()) != '\n' && c != EOF; ++i)
    s[i] = c;
Clearly, before reading a new character it is necessary to check that there is room to store it in the
array s, so the test i<lim-1 must be made first. Not only that, but if this test fails, we must not go on
and read another character.
Similarly, it would be unfortunate if c were tested against EOF before getchar was called: the call must
occur before the character in c is tested.
The precedence of && is greater than that of ||, and both are lower than relational and equality
operators, so expressions like
need no extra parentheses. But since the precedence of != is higher than assignment, parentheses
are needed in
(c = getchar()) != '\n'
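The same left-to-right, stop-early behavior can be tried without any input device. The sketch below reads from a string instead of calling getchar; the name copy_line and its parameters are hypothetical, and it assumes lim is at least 1.

```c
/* Copy characters from src into dst until a newline, the end of src,
   or the point where dst (of size lim) is full. Returns the count. */
int copy_line(char dst[], int lim, const char *src)
{
    int i, c;

    /* The room check comes first: once i < lim-1 is false, && stops
       and src[i] is never read. The parentheses around the assignment
       are needed because != binds more tightly than =. */
    for (i = 0; i < lim - 1 && (c = src[i]) != '\n' && c != '\0'; ++i)
        dst[i] = c;
    dst[i] = '\0';
    return i;
}
```

With a four-character buffer and the source "hello\n", only "hel" is stored, leaving room for the terminating '\0'.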
One of the great debates of the 1970's was how to use Structured Programming to avoid any use
of "goto" statements that lead to completely unreadable "spaghetti code". Structured code was
easier to read, debug, and validate. Structured Programming advocated for if-then-else, else if,
while-do loops and do-while loops where the loop exit test was at the top or bottom of loops
respectively.
There was a move from Flow Charts with lines, boxes and arrows to structured diagramming
techniques like Nassi–Shneiderman diagrams that used nested boxes to emphasize the
structured nature of the code.
The proponents of each approach tended to approach the problem based on the language they
used. ALGOL and PASCAL programmers were strong advocates of structured programming and
those languages had syntax that encouraged the approach. FORTRAN programmers had
decades of flow-chart style thinking and tended to eschew full adoption of structured
programming.
Kernighan and Ritchie chose a "middle path" and made it so C could support both approaches to
avoid angering either side of the structured programming debate.
One area where the "structured programming" movement kept hitting a snag was implementing a
loop that reads a file and processes data until it reaches the end of file. The loop must be able to
handle an empty file or no data at all. There are four ways to construct a "read and process until
EOF" loop and none of the approaches is ideal.
The loop constructions are "top tested loop with priming read before the loop", "bottom tested
loop with a read as the first statement in the loop and an if-then-else as the rest of the body of the
loop", "top tested infinite loop with a priming read and middle test/exit", and "top tested loop with a
side effect read in the test of the loop" (which is the way that Kernighan and Ritchie chose to
document in this chapter).
This construct is a top-tested loop (which most programmers prefer) and it folds the "priming
read" into the loop test, putting its value into the variable c. But since the getchar call might also
return EOF, we need to check whether we actually received no data at all, in which case we must
avoid executing the body of the loop or exit the loop.
If EOF were defined as zero instead of -1, the loop could have been written:
while ( c = getchar() ) {
/* body of the loop */
}
Now the getchar function returns a character or zero and the test itself is looking at the side-effect
/ residual value of the assignment statement to decide to start and/or continue the loop body. The
problem with using zero as EOF is that if you are reading binary data (e.g. a JPG image), a zero
character might make perfect sense and we would not want to incorrectly end the loop because
of a zero character in the input data that is not end of file.
So we get the double parenthesis, side effect call to getchar and test of the return value inside
the while test.
I am quite confident that this is far more detail than you wanted here in Chapter 2, but it is as good
a time as any to understand how much thought goes into how a programming language is
designed and documented. By the time we finish Chapter 3 and look at the break and continue
statements which are in languages like Python and Java, you will see that this 50-year-old
structured programming debate is still unresolved in the minds of many software developers.
The unary negation operator ! converts a non-zero or true operand into 0, and a zero or false operand
into 1. A common use of ! is in constructions like
if (!inword)
rather than
if (inword == 0)
It's hard to generalize about which form is better. Constructions like !inword read quite nicely ("if not in
word"), but more complicated ones can be hard to understand.
Exercise 2-1. Write a loop equivalent to the for loop above without using &&.
First, char's and int's may be freely intermixed in arithmetic expressions: every char in an expression
is automatically converted to an int. This permits considerable flexibility in certain kinds of character
transformations. One is exemplified by the function atoi, which converts a string of digits into its
numeric equivalent.
atoi(s) /* convert s to integer */
char s[];
{
    int i, n;
    n = 0;
    for (i = 0; s[i] >= '0' && s[i] <= '9'; ++i)
        n = 10 * n + s[i] - '0';
    return(n);
}
s[i] - '0'
gives the numeric value of the character stored in s[i] because the values of '0', '1', etc., form a
contiguous increasing positive sequence.
Another example of char to int conversion is the function lower which maps a single character to
lower case for the ASCII character set only. If the character is not an upper case letter, lower returns it
unchanged.
lower(c) /* convert c to lower case; ASCII only */
int c;
{
    if (c >= 'A' && c <= 'Z')
        return(c + 'a' - 'A');
    else
        return(c);
}
This works for ASCII because corresponding upper case and lower case letters are a fixed distance
apart as numeric values and each alphabet is contiguous - there is nothing but letters between A and
Z. This latter observation is not true of the EBCDIC character set (IBM 360/370), so this code fails on
such systems - it converts more than letters.
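Where the character set is not known to be ASCII, one option is to let the standard library decide what counts as an upper case letter. The sketch below is an alternative to the function above, not the book's method; the name lower_portable is hypothetical, while tolower is the standard function declared in ctype.h.

```c
#include <ctype.h>

/* Hypothetical portable variant: the library knows which characters
   are upper case letters in the machine's character set. */
int lower_portable(int c)
{
    /* tolower expects a value representable as unsigned char (or EOF),
       so the argument is converted first. */
    return tolower((unsigned char) c);
}
```

Characters that are not upper case letters, such as digits, come back unchanged.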
There is one subtle point about the conversion of characters to integers. The language does not
specify whether variables of type char are signed or unsigned quantities. When a char is converted to
an int, can it ever produce a negative integer? Unfortunately, this varies from machine to machine,
reflecting differences in architecture. On some machines (PDP-11, for instance), a char whose
leftmost bit is 1 will be converted to a negative integer ("sign extension"). On others, a char is
promoted to an int by adding zeros at the left end, and thus is always positive.
The definition of C guarantees that any character in the machine's standard character set will never be
negative, so these characters may be used freely in expressions as positive quantities. But arbitrary
bit patterns stored in character variables may appear to be negative on some machines, yet positive
on others.
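Both behaviors can be seen on a single machine by stating the signedness explicitly. This is a sketch with hypothetical function names; the signed char and unsigned char types come from later versions of C than the one described in the text.

```c
/* Widening a signed char copies the sign bit ("sign extension"). */
int widen_signed(signed char c)
{
    return c;
}

/* Widening an unsigned char adds zeros at the left end. */
int widen_unsigned(unsigned char c)
{
    return c;
}
```

A byte with all bits set becomes -1 through widen_signed and 255 through widen_unsigned, while an ordinary letter such as 'A' gives the same positive value either way.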
The most common occurrence of this situation is when the value -1 is used for EOF. Consider the code
char c;
c = getchar();
if (c == EOF)
...
On a machine which does not do sign extension, c is always positive because it is a char, yet EOF is
negative. As a result, the test always fails. To avoid this, we have been careful to use int instead of
char for any variable which holds a value returned by getchar.
The real reason for using int instead of char is not related to any questions of possible sign extension.
It is simply that getchar must return all possible characters (so that it can be used to read arbitrary
input) and, in addition, a distinct EOF value. Thus its value cannot be represented as a char, but must
instead be stored as an int.
Since the book was written before the getchar function was standardized, the text is somewhat
vague in this section. Shortly after the book was published, getchar was declared in stdio.h as
returning an integer so as to accommodate all possible characters and the integer -1
value to indicate end of file.
int c;
c = getchar();
if (c == EOF) ...
While the conversion from char to int may or may not have sign extension (and yes it still
depends on the implementation 50 years later), the conversion from int to char is predictable
with the top bits simply being discarded.
If you are using the library function gets (or its safer modern replacement fgets) to read a file
line-by-line, you don't need to worry about this conversion. Since gets returns a pointer to a
character array (i.e. a string), it indicates it has reached end-of-file by returning a "null" pointer
(i.e. there is no more data to give).
Another useful form of automatic type conversion is that relational expressions like i > j and logical
expressions connected by && and || are defined to have value 1 if true, and 0 if false. Thus the
assignment
isdigit = c >= '0' && c <= '9';
sets isdigit to 1 if c is a digit, and to 0 if not. (In the test part of if, while, for, etc., "true" just means
"non-zero.")
Implicit arithmetic conversions work much as expected. In general, if an operator like + or * which
takes two operands (a "binary operator") has operands of different types, the "lower" type is promoted
to the "higher" type before the operation proceeds. The result is of the higher type. More precisely, for
each arithmetic operator, the following sequence of conversion rules is applied.
char and short are converted to int, and float is converted to double.
Then if either operand is double, the other is converted to double, and the result is double.
Otherwise if either operand is long, the other is converted to long, and the result is long.
Otherwise if either operand is unsigned, the other is converted to unsigned, and the result is
unsigned.
Notice that all float values in an expression are converted to double; all floating point arithmetic in C
is done in double precision.
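The effect of these rules shows up most clearly with division. The two functions below are a hypothetical illustration; they differ only in whether one operand is a double.

```c
/* Both operands are int, so the division is done in int and the
   fractional part is lost before the result is converted to double. */
double integer_half(int n)
{
    return n / 2;
}

/* 2.0 is a double, so n is converted to double before dividing. */
double real_half(int n)
{
    return n / 2.0;
}
```

With n equal to 7, integer_half gives 3.0 and real_half gives 3.5.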
Conversions take place across assignments; the value of the right side is converted to the type of the
left, which is the type of the result. A character is converted to an integer, either by sign extension or
not, as described above. The reverse operation, int to char, is well-behaved - excess high-order bits
are simply discarded. Thus in
int i;
char c;
i = c;
c = i;
the value of c is unchanged. This is true whether or not sign extension is involved.
and
i = x
both cause conversions; float to int causes truncation of any fractional part. double is converted to
float by rounding. Longer int's are converted to shorter ones or to char's by dropping the excess
high-order bits.
Since a function argument is an expression, type conversions also take place when arguments are
passed to functions: in particular, char and short become int, and float becomes double. This is why
we have declared function arguments to be int and double even when the function is called with char
and float.
Finally, explicit type conversions can be forced ("coerced") in any expression with a construct called a
cast. In the construction
(type-name) expression
the expression is converted to the named type by the conversion rules above. The precise meaning of
a cast is in fact as if expression were assigned to a variable of the specified type, which is then used in
place of the whole construction. For example, the library routine sqrt expects a double argument, and
will produce nonsense if inadvertently handed something else. So if n is an integer,
sqrt ((double) n)
converts n to double before passing it to sqrt. (Note that the cast produces the value of n in the proper
type; the actual content of n is not altered.) The cast operator has the same precedence as other
unary operators, as summarized in the table at the end of this chapter.
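A cast is also the usual way to get a fractional result from two integers. This is a minimal sketch with hypothetical function names, using the newer ANSI function style.

```c
/* The cast applies to sum alone because a cast binds more tightly
   than /, so the division is carried out in double. */
double average(int sum, int count)
{
    return (double) sum / count;
}

/* Converting to int discards the fractional part. */
int whole_part(double x)
{
    return (int) x;
}
```

average(7, 2) gives 3.5 rather than 3, and whole_part(3.99) gives 3.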
Exercise 2-2. Write the function htoi(s), which converts a string of hexadecimal digits into its
equivalent integer value. The allowable digits are 0 through 9, a through f, and A through F.
if (c == '\n')
++nl;
The unusual aspect is that ++ and -- may be used either as prefix operators (before the variable, as in
++n), or postfix (after the variable: n++). In both cases, the effect is to increment n. But the expression
++n increments n before using its value, while n++ increments n after its value has been used. This
means that in a context where the value is being used, not just the effect, ++n and n++ are different. If n
is 5, then
x = n++;
sets x to 5, but
x = ++n;
sets x to 6. In both cases, n becomes 6. The increment and decrement operators can only be applied
to variables; an expression like x=(i+j)++ is illegal.
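The two assignments above can be wrapped in functions to check the values. The function names here are hypothetical and exist only for illustration.

```c
/* x receives the old value of n; n is incremented afterwards. */
int value_of_postfix(int n)
{
    int x;

    x = n++;
    return x;
}

/* n is incremented first; x receives the new value. */
int value_of_prefix(int n)
{
    int x;

    x = ++n;
    return x;
}
```

Called with 5, the first returns 5 and the second returns 6.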
choose prefix or postfix according to taste. But there are situations where one or the other is
specifically called for. For instance, consider the function squeeze(s, c) which removes all
occurrences of the character c from the string s.
squeeze(s, c) /* delete all c from s */
char s[];
int c;
{
    int i, j;
    for (i = j = 0; s[i] != '\0'; i++)
        if (s[i] != c)
            s[j++] = s[i];
    s[j] = '\0';
}
Each time a non-c occurs, it is copied into the current j position, and only then is j incremented to be
ready for the next character. This is exactly equivalent to
if (s[i] != c) {
s[j] = s[i];
j++;
}
Another example of a similar construction comes from the getline function which we wrote in Chapter
1, where we can replace
if (c == '\n') {
s[i] = c;
++i;
}
by the more compact
if (c == '\n')
    s[i++] = c;
As a third example, the function strcat(s, t) concatenates the string t to the end of the string s.
strcat assumes that there is enough space in s to hold the combination.
strcat(s, t) /* concatenate t to end of s */
char s[], t[]; /* s must be big enough */
{
    int i, j;
    i = j = 0;
    while (s[i] != '\0') /* find end of s */
        i++;
    while ((s[i++] = t[j++]) != '\0') /* copy t */
        ;
}
As each character is copied from t to s, the postfix ++ is applied to both i and j to make sure that they
are in position for the next pass through the loop.
Exercise 2-3. Write an alternate version of squeeze(s1, s2) which deletes each character in s1 which
matches any character in the string s2.
Exercise 2-4. Write the function any(s1, s2) which returns the first location in the string s1 where any
character from the string s2 occurs, or -1 if s1 contains no characters from s2.
The bitwise AND operator & is often used to mask off some set of bits; for example,
c = n & 0177;
sets to zero all but the low-order 7 bits of n. The bitwise OR operator | is used to turn bits on:
x = x | MASK;
You should carefully distinguish the bitwise operators & and | from the logical connectives && and ||,
which imply left-to-right evaluation of a truth value. For example, if x is 1 and y is 2, then x & y is zero
while x && y is one. (Why?)
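The difference is easy to check. In binary, 1 is 01 and 2 is 10, so the two values have no bits in common, yet both are non-zero and therefore "true". The function names below are hypothetical.

```c
/* Bit-by-bit AND of the two operands. */
int bitwise_and(int x, int y)
{
    return x & y;
}

/* Truth-value AND: 1 if both operands are non-zero, otherwise 0. */
int logical_and(int x, int y)
{
    return x && y;
}
```

bitwise_and(1, 2) is 0, while logical_and(1, 2) is 1.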
The shift operators << and >> perform left and right shifts of their left operand by the number of bit
positions given by the right operand. Thus x << 2 shifts x left by two positions, filling vacated bits with
0; this is equivalent to multiplication by 4. Right shifting an unsigned quantity fills vacated bits with 0.
Right shifting a signed quantity will fill with sign bits ("arithmetic shift") on some machines such as the
PDP-11, and with 0-bits ("logical shift") on others.
The unary operator ~ yields the one's complement of an integer; that is, it converts each 1-bit into a 0-
bit and vice versa. This operator typically finds use in expressions like
x & ~077
which masks the last six bits of x to zero. Note that x & ~077 is independent of word length, and is
thus preferable to, for example, x & 0177700, which assumes that x is a 16-bit quantity. The portable
form involves no extra cost, since ~077 is a constant expression and thus evaluated at compile time.
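As a small check of the portable mask, here is a sketch with a hypothetical function name. The u suffix, which makes the constant unsigned, comes from later versions of C than the one described in the text.

```c
/* Clear the low-order six bits of x, whatever the word length. */
unsigned clear_low_six(unsigned x)
{
    return x & ~077u;   /* ~077u has zeros only in the last six bits */
}
```

For example, octal 0177 (seven 1-bits) becomes octal 0100, with only the seventh bit left on.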
To illustrate the use of some of the bit operators, consider the function getbits(x, p, n) which returns
(right adjusted) the n-bit field of x that begins at position p. We assume that bit position 0 is at the right
end and that n and p are sensible positive values. For example, getbits(x, 4, 3) returns the three
bits in bit positions 4, 3 and 2, right adjusted.
getbits(x, p, n) /* get n bits from position p */
unsigned x, p, n;
{
    return((x >> (p+1-n)) & ~(~0 << n));
}
x >> (p+1-n) moves the desired field to the right end of the word. Declaring the argument x to be
unsigned ensures that when it is right-shifted, vacated bits will be filled with zeros, not sign bits,
regardless of the machine the program is run on. ~0 is all 1-bits; shifting it left n bit positions with ~0 <<
n creates a mask with zeros in the rightmost n bits and ones everywhere else; complementing that
with ~ makes a mask with ones in the rightmost n bits.
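A worked example may help. The sketch below restates the function in the newer ANSI style under the hypothetical name get_bits, and it assumes n is smaller than the number of bits in an unsigned int.

```c
/* Return the n-bit field of x that begins at position p, right adjusted. */
unsigned get_bits(unsigned x, unsigned p, unsigned n)
{
    /* Shift the field to the right end, then keep only the low n bits. */
    return (x >> (p + 1 - n)) & ~(~0u << n);
}
```

Take x = 22, which is 10110 in binary. Bits 4, 3 and 2 are 1, 0 and 1. Shifting right by 4+1-3 = 2 gives 101, and the mask ~(~0u << 3) is 111, so get_bits(22, 4, 3) is 101 in binary, or 5.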
Bitwise operators may seem unnecessary for modern computers, but if you look at the internal
structure of TCP/IP packets, values are packed very tightly into the headers in order to save
space. C made it possible to write portable TCP/IP implementations on a wide range of hardware
architectures.
Bitwise operators also play an important role in encryption, decryption, and checksum
calculations. Modern languages like Java and Python support bitwise operators following the
same patterns that we established in C so things like TCP/IP and encryption algorithms can also
be implemented in these languages.
Defining these operators in the language kept software developers from needing to write non-portable
assembly language code to implement these low-level features in operating systems and
libraries.
Exercise 2-6. Write a function wordlength() which computes the word length of the host machine, that
is, the number of bits in an int. The function should be portable, in the sense that the same source
code works on all machines.
Exercise 2-7. Write the function rightrot(n, b) which rotates the integer n to the right by b bit
positions.
Exercise 2-8. Write the function invert(x, p, n) which inverts (i.e., changes 1 into 0 and vice versa)
the n bits of x that begin at position p, leaving the others unchanged.
i = i + 2
in which the left hand side is repeated on the right can be written in the compressed form
i += 2
Most binary operators (operators like + which have a left and right operand) have a corresponding
assignment operator op=, where op is one of
+ - * / % << >> & ^ |
If e1 and e2 are expressions, then
e1 op= e2
is equivalent to
e1 = (e1) op (e2)
except that e1 is computed only once. Notice the parentheses around e2:
x *= y + 1
is actually
x = x * (y + 1)
rather than
x = x * y + 1
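The grouping is easy to confirm with small numbers. The function name below is hypothetical.

```c
/* The whole right-hand side, y + 1, is the multiplier. */
int multiply_assign(int x, int y)
{
    x *= y + 1;
    return x;
}
```

With x = 3 and y = 4 the result is 3 * (4 + 1) = 15, not 3 * 4 + 1 = 13.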
As an example, the function bitcount counts the number of 1-bits in its integer argument.
bitcount(n) /* count 1 bits in n */
unsigned n;
{
    int b;
    for (b = 0; n != 0; n >>= 1)
        if (n & 01)
            b++;
    return(b);
}
Quite apart from conciseness, assignment operators have the advantage that they correspond better
to the way people think. We say "add 2 to i" or "increment i by 2," not "take i, add 2, then put the
result back in i". Thus i += 2. In addition, for a complicated expression like
yyval[yypv[p3+p4] + yypv[p1+p2]] += 2
the assignment operator makes the code easier to understand, since the reader doesn't have to check
painstakingly that two long expressions are indeed the same, or to wonder why they're not. And an
assignment operator may even help the compiler to produce more efficient code.
We have already used the fact that the assignment statement has a value and can occur in
expressions; the most common example is
while ((c = getchar()) != EOF)
...
Assignments using the other assignment operators (+=, -=, etc.) can also occur in expressions,
although it is a less frequent occurrence.
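As an illustration of an assignment operator used inside a larger expression, here is a small sketch. The function steps_to_reach and its parameters are hypothetical, and the code assumes that both limit and step are positive.

```c
/* steps_to_reach: how many times step must be added, starting from 0,
   before the running total is at least limit (assumes limit > 0, step > 0) */
int steps_to_reach(int limit, int step)
{
    int total = 0;
    int count = 0;

    while ((total += step) < limit)   /* value of total += step is the new total */
        count++;
    return count + 1;                 /* include the addition that ended the loop */
}
```

The parentheses around total += step are required, because the assignment operators have lower precedence than <.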
Exercise 2-9. In a 2's complement number system, x & ( x-1 ) deletes the rightmost 1-bit in x.
(Why?) Use this observation to write a faster version of bitcount.
The statements
if (a > b)
    z = a;
else
    z = b;
of course compute in z the maximum of a and b. The conditional expression, written with the ternary
operator "?:", provides an alternate way to write this and similar constructions. In the expression
e1 ? e2 : e3
the expression e1 is evaluated first. If it is non-zero (true), then the expression e2 is evaluated, and
that is the value of the conditional expression. Otherwise e3 is evaluated, and that is the value. Only
one of e2 and e3 is evaluated. Thus to set z to the maximum of a and b,
z = (a > b) ? a : b; /* z = max(a, b) */
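Because only one of e2 and e3 is evaluated, a conditional expression can also guard an operation that would otherwise be unsafe. The sketch below shows both uses; max_int and safe_div are hypothetical names, and returning 0 for a zero divisor is an arbitrary choice made for the example.

```c
/* max_int: the larger of a and b, using a conditional expression */
int max_int(int a, int b)
{
    return (a > b) ? a : b;
}

/* safe_div: n / d when d is non-zero, otherwise 0;
   the division is never evaluated when d is 0 */
int safe_div(int n, int d)
{
    return (d != 0) ? n / d : 0;
}
```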
It should be noted that the conditional expression is indeed an expression, and it can be used just as
any other expression. If e2 and e3 are of different types, the type of the result is determined by the
conversion rules discussed earlier in this chapter. For example, if f is a float, and n is an int, then the
expression
(n > 0) ? f : n
is of type double regardless of whether n is positive or not.
Parentheses are not necessary around the first expression of a conditional expression, since the
precedence of ?: is very low, just above assignment. They are advisable anyway, however, since they
make the condition part of the expression easier to see.
The conditional expression often leads to succinct code. For example, this loop prints N elements of an
array, 10 per line, with each column separated by one blank, and with each line (including the last)
terminated by exactly one newline.
for (i = 0; i < N; i++)
printf("%6d%c", a[i], (i%10==9 || i==N-1) ? '\n' : ' ');
A newline is printed after every tenth element, and after the N-th. All other elements are followed by
one blank. Although this might look tricky, it's instructive to try to write it without the conditional
expression.
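For comparison, one possible version without the conditional expression is sketched below. It is not the only way to do it. The name format_array is hypothetical, and the function writes into a caller-supplied buffer instead of calling printf so that the result can be inspected; the caller is assumed to provide enough room.

```c
#include <stdio.h>

/* format_array: write n elements of a into out, 10 per line, using if-else
   to choose the separator; returns the number of characters written.
   Assumes out is large enough to hold the whole result plus a final '\0'. */
int format_array(char *out, const int a[], int n)
{
    int i;
    int len = 0;

    for (i = 0; i < n; i++) {
        len += sprintf(out + len, "%6d", a[i]);
        if (i % 10 == 9 || i == n - 1)
            out[len++] = '\n';   /* end of a row, or the last element */
        else
            out[len++] = ' ';    /* separator between columns */
    }
    out[len] = '\0';
    return len;
}
```

The if-else takes five lines to do what the conditional expression does inside a single argument.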
Exercise 2-10. Rewrite the function lower, which converts upper case letters to lower case, with a
conditional expression instead of if-else.
Operator                           Associativity
() [] -> .                         left to right
! ~ ++ -- - (type) * & sizeof      right to left
* / %                              left to right
+ -                                left to right
<< >>                              left to right
< <= > >=                          left to right
== !=                              left to right
&                                  left to right
^                                  left to right
|                                  left to right
&&                                 left to right
||                                 left to right
?:                                 right to left
= += -= etc.                       right to left
, (Chapter 3)                      left to right
The operators -> and . are used to access members of structures; they will be covered in Chapter 6,
along with sizeof (size of an object). Chapter 5 discusses * (indirection) and & (address of).
Note that the precedence of the bitwise logical operators &, ^ and | falls below == and !=. This implies
that bit-testing expressions like
if ((x & MASK) == 0) ...
must be fully parenthesized to give proper results.
As mentioned before, expressions involving one of the associative and commutative operators (*, +, &,
^, |) can be rearranged even when parenthesized. In most cases this makes no difference
whatsoever; in situations where it might, explicit temporary variables can be used to force a particular
order of evaluation.
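Returning to the precedence of the bitwise operators noted above, the effect of omitting parentheses in a bit test can be made concrete. In this sketch the mask value and both function names are hypothetical; the second function spells out the grouping the compiler would apply to x & MASK == 0.

```c
#define MASK 0x04   /* hypothetical mask selecting one bit */

/* bit_clear_correct: 1 if the masked bit of x is zero, else 0 */
int bit_clear_correct(int x)
{
    return (x & MASK) == 0;
}

/* bit_clear_wrong: how x & MASK == 0 actually groups, since == binds
   more tightly than &; MASK == 0 is 0, so the result is always 0 */
int bit_clear_wrong(int x)
{
    return x & (MASK == 0);
}
```

For x equal to 0 the correct test yields 1, while the unparenthesized form yields 0.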
C, like most languages, does not specify in what order the operands of an operator are evaluated. For
example, in a statement like
x = f() + g();
f may be evaluated before g or vice versa; thus if either f or g alters an external variable that the other
depends on, x can depend on the order of evaluation. Again, intermediate results can be stored in
temporary variables to ensure a particular sequence.
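A minimal sketch of the temporary-variable technique follows. The variable counter and the bodies of f and g are invented for the example: f alters an external variable that g reads, so the sum would otherwise depend on which is called first.

```c
int counter = 0;   /* hypothetical external variable shared by f and g */

int f(void) { return ++counter; }       /* alters counter */
int g(void) { return counter * 10; }    /* depends on counter */

/* sum_f_then_g: forces f to be evaluated before g */
int sum_f_then_g(void)
{
    int t1, t2;

    counter = 0;
    t1 = f();          /* f runs first: counter becomes 1, t1 is 1 */
    t2 = g();          /* g sees the updated counter: t2 is 10 */
    return t1 + t2;
}
```

Because each call is a separate statement, the order is fixed and the function returns 11 on every machine.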
Similarly, the order in which function arguments are evaluated is not specified, so the statement
printf("%d %d\n", ++n, power(2, n)); /* WRONG */
can (and does) produce different results on different machines, depending on whether or not n is
incremented before power is called. The solution, of course, is to write
++n;
printf("%d %d\n", n, power(2, n));
Function calls, nested assignment statements, and increment and decrement operators cause "side
effects" - some variable is changed as a byproduct of the evaluation of an expression. In any
expression involving side effects, there can be subtle dependencies on the order in which variables
taking part in the expression are stored. One unhappy situation is typified by the statement
a[i] = i++;
The question is whether the subscript is the old value of i or the new. The compiler can do this in
different ways, and generate different answers depending on its interpretation. When side effects
(assignments to actual variables) take place is left to the discretion of the compiler, since the best
order strongly depends on machine architecture.
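Either intended meaning can be stated without ambiguity by separating the store from the increment. The sketch below uses hypothetical function names and assumes the array is large enough for the index used. Later C standards go further and classify a[i] = i++ as undefined behavior.

```c
/* store_then_advance: a[i] receives the old value of i, then i is
   incremented; returns the new i */
int store_then_advance(int a[], int i)
{
    a[i] = i;
    i++;
    return i;
}

/* advance_then_store: i is incremented first, then the element at the
   new index receives the old value of i; returns the new i */
int advance_then_store(int a[], int i)
{
    int old = i;

    i++;
    a[i] = old;
    return i;
}
```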
The moral of this discussion is that writing code which depends on order of evaluation is a bad
programming practice in any language. Naturally, it is necessary to know what things to avoid, but if
you don't know how they are done on various machines, that innocence may help to protect you. (The
C verifier lint will detect most dependencies on order of evaluation.)
The real moral of the story is to use side effect operators very carefully. They are generally only
used in idiomatic situations and then written using simple code. The authors are happy to tell you
everything that you can do in C in great detail, but they are also suggesting that just because you
can do something does not mean you should do it. Remember that a key aspect
of writing programs is to communicate with future human readers of your code (including you
reading your own code in the future).
With modern-day compilers and optimizers, you gain little performance by writing dense or obtuse
code. Write code that describes what you want done and let the compiler find the best way to
do it. One of the reasons that a common senior project in many Computer Science degrees was
to write a compiler was to make sure all Computer Science students understood that they can trust
the compiler to generate great code.
This is not the ideal book to learn C programming because the 1978 edition does
not reflect the modern C language. Using an obsolete book gives us an
opportunity to take students back in time and understand how the C language was evolving as it laid the
groundwork for a future with portable applications.
If you want to learn modern C programming from a book that reflects the modern C language, I suggest you
use The C Programming Language, 2nd Edition, also written by Brian W. Kernighan and Dennis M. Ritchie,
initially published in 1988 and based on the ANSI C (C89) version of the language.
The source code to this book is at https://github.com/csev/cc4e. You are welcome to help in the creation and
editing of the book.