Directory listing of http: uap.unnes.ac.id ebook electronic book 1 C++ Programming Language 3rd E

________________________________________

________________________________________________________________________________________________________________________________________________________________

6
________________________________________

________________________________________________________________________________________________________________________________________________________________

Expressions and Statements
Premature optimization
is the root of all evil.
– D. Knuth
On the other hand,
we cannot ignore efficiency.
– Jon Bentley

Desk calculator example — input — command line arguments — expression summary
— logical and relational operators — increment and decrement — free store — explicit
type conversion — statement summary — declarations — selection statements — declarations in conditions — iteration statements — the infamous ggoottoo — comments and
indentation — advice — exercises.


6.1 A Desk Calculator [expr.calculator]
Statements and expressions are introduced by presenting a desk calculator program that provides
the four standard arithmetic operations as infix operators on floating-point numbers. The user can
also define variables. For example, given the input
r = 22.55
aarreeaa = ppii * r * r

(pi is predefined) the calculator program will write
22.55
1199.663355

where 22.55 is the result of the first line of input and 1199.663355 is the result of the second.

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.
Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

108

Expressions and Statements


Chapter 6

The calculator consists of four main parts: a parser, an input function, a symbol table, and a
driver. Actually, it is a miniature compiler in which the parser does the syntactic analysis, the input
function handles input and lexical analysis, the symbol table holds permanent information, and the
driver handles initialization, output, and errors. We could add many features to this calculator to
make it more useful (§6.6[20]), but the code is long enough as it is, and most features would just
add code without providing additional insight into the use of C++.
6.1.1 The Parser [expr.parser]
Here is a grammar for the language accepted by the calculator:
pprrooggrraam
m:
E
EN
ND
D
eexxpprr__lliisstt E
EN
ND

D

// END is end-of-input

eexxpprr__lliisstt:
eexxpprreessssiioonn P
PR
RIIN
NT
T
eexxpprreessssiioonn P
PR
RIIN
NT
T eexxpprr__lliisstt

// PRINT is semicolon

eexxpprreessssiioonn:
eexxpprreessssiioonn + tteerrm

m
eexxpprreessssiioonn - tteerrm
m
tteerrm
m
tteerrm
m:
tteerrm
m / pprriim
maarryy
tteerrm
m * pprriim
maarryy
pprriim
maarryy
pprriim
maarryy:
N
NU
UM

MB
BE
ER
R
N
NA
AM
ME
E
N
NA
AM
ME
E = eexxpprreessssiioonn
- pprriim
maarryy
( eexxpprreessssiioonn )

In other words, a program is a sequence of expressions separated by semicolons. The basic units of
an expression are numbers, names, and the operators *, /, +, - (both unary and binary), and =.

Names need not be declared before use.
The style of syntax analysis used is usually called recursive descent; it is a popular and straightforward top-down technique. In a language such as C++, in which function calls are relatively
cheap, it is also efficient. For each production in the grammar, there is a function that calls other
functions. Terminal symbols (for example, E
EN
ND
D, N
NU
UM
MB
BE
ER
R, +, and -) are recognized by the lexical analyzer, ggeett__ttookkeenn(); and nonterminal symbols are recognized by the syntax analyzer functions, eexxpprr(), tteerrm
m(), and pprriim
m(). As soon as both operands of a (sub)expression are known, the
expression is evaluated; in a real compiler, code could be generated at this point.
The parser uses a function ggeett__ttookkeenn() to get input. The value of the most recent call of
ggeett__ttookkeenn() can be found in the global variable ccuurrrr__ttookk. The type of ccuurrrr__ttookk is the enumeration T
Tookkeenn__vvaalluuee:


The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.
Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

Section 6.1.1

The Parser

eennuum
m T
Tookkeenn__vvaalluuee {
N
NA
AM
ME
E,
N
NU
UM
MB
BE

ER
R,
E
EN
ND
D,
P
PL
LU
USS=´+´,
M
MIIN
NU
USS=´-´, M
MU
UL
L=´*´,
P
PR
RIIN

NT
T=´;´, A
ASSSSIIG
GN
N=´=´, L
LP
P=´(´,
};

109

D
DIIV
V=´/´,
R
RP
P=´)´

T
Tookkeenn__vvaalluuee ccuurrrr__ttookk = P

PR
RIIN
NT
T;

Representing each token by the integer value of its character is convenient and efficient and can be
a help to people using debuggers. This works as long as no character used as input has a value used
as an enumerator – and no character set I know of has a printing character with a single-digit integer value. I chose P
PR
RIIN
NT
T as the initial value for ccuurrrr__ttookk because that is the value it will have
after the calculator has evaluated an expression and displayed its value. Thus, I ‘‘start the system’’
in a normal state to minimize the chance of errors and the need for special startup code.
Each parser function takes a bbooooll (§4.2) argument indicating whether the function needs to call
ggeett__ttookkeenn() to get the next token. Each parser function evaluates ‘‘its’’ expression and returns the
value. The function eexxpprr() handles addition and subtraction. It consists of a single loop that looks
for terms to add or subtract:
ddoouubbllee eexxpprr(bbooooll ggeett)
{

ddoouubbllee lleefftt = tteerrm
m(ggeett);

// add and subtract

ffoorr (;;)
// ‘‘forever’’
ssw
wiittcchh (ccuurrrr__ttookk) {
ccaassee P
PL
LU
USS:
lleefftt += tteerrm
m(ttrruuee);
bbrreeaakk;
ccaassee M
MIIN
NU
USS:
lleefftt -= tteerrm
m(ttrruuee);
bbrreeaakk;
ddeeffaauulltt:
rreettuurrnn lleefftt;
}
}

This function really does not do much itself. In a manner typical of higher-level functions in a
large program, it calls other functions to do the work.
The switch-statement tests the value of its condition, which is supplied in parentheses after the
ssw
wiittcchh keyword, against a set of constants. The break-statements are used to exit the switchstatement. The constants following the ccaassee labels must be distinct. If the value tested does not
match any ccaassee label, the ddeeffaauulltt is chosen. The programmer need not provide a ddeeffaauulltt.
Note that an expression such as 22-33+44 is evaluated as (22-33)+44, as specified in the grammar.
The curious notation ffoorr(;;) is the standard way to specify an infinite loop; you could pronounce it ‘‘forever.’’ It is a degenerate form of a for-statement (§6.3.3); w
whhiillee(ttrruuee) is an alternative. The switch-statement is executed repeatedly until something different from + and - is found,
and then the return-statement in the default case is executed.
The operators += and -= are used to handle the addition and subtraction; lleefftt=lleefftt+tteerrm
m() and

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.
Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

110

Expressions and Statements

Chapter 6

lleefftt=lleefftt-tteerrm
m() could have been used without changing the meaning of the program. However,
lleefftt+=tteerrm
m() and lleefftt-=tteerrm
m() not only are shorter but also express the intended operation
directly. Each assignment operator is a separate lexical token, so a + = 11; is a syntax error because
of the space between the + and the =.
Assignment operators are provided for the binary operators
+

-

*

/

%

&

|

^

>

so that the following assignment operators are possible
=

+=

-=

*=

/=

%=

&=

|=

^=

=

The % is the modulo, or remainder, operator; &, |, and ^ are the bitwise logical operators AND,
OR, and exclusive OR; > are the left shift and right shift operators; §6.2 summarizes the
operators and their meanings. For a binary operator @ applied to operands of built-in types, an
expression xx@
@=
=yy means xx=
=xx@
@yy, except that x is evaluated once only.
Chapter 8 and Chapter 9 discuss how to organize a program as a set of modules. With one
exception, the declarations for this calculator example can be ordered so that everything is declared
exactly once and before it is used. The exception is eexxpprr(), which calls tteerrm
m(), which calls
pprriim
m(), which in turn calls eexxpprr(). This loop must be broken somehow. A declaration
ddoouubbllee eexxpprr(bbooooll);

before the definition of pprriim
m() will do nicely.
Function tteerrm
m() handles multiplication and division in the same way eexxpprr() handles addition
and subtraction:
ddoouubbllee tteerrm
m(bbooooll ggeett)
{
ddoouubbllee lleefftt = pprriim
m(ggeett);

// multiply and divide

ffoorr (;;)
ssw
wiittcchh (ccuurrrr__ttookk) {
ccaassee M
MU
UL
L:
lleefftt *= pprriim
m(ttrruuee);
bbrreeaakk;
ccaassee D
DIIV
V:
iiff (ddoouubbllee d = pprriim
m(ttrruuee)) {
lleefftt /= dd;
bbrreeaakk;
}
rreettuurrnn eerrrroorr("ddiivviiddee bbyy 00");
ddeeffaauulltt:
rreettuurrnn lleefftt;
}
}

The result of dividing by zero is undefined and usually disastrous. We therefore test for 0 before
dividing and call eerrrroorr() if we detect a zero divisor. The function eerrrroorr() is described in §6.1.4.
The variable d is introduced into the program exactly where it is needed and initialized immediately. The scope of a name introduced in a condition is the statement controlled by that condition,

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.
Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

Section 6.1.1

The Parser

111

and the resulting value is the value of the condition (§6.3.2.1). Consequently, the division and
assignment lleefftt/=dd is done if and only if d is nonzero.
The function pprriim
m() handling a primary is much like eexxpprr() and tteerrm
m(), except that because
we are getting lower in the call hierarchy a bit of real work is being done and no loop is necessary:
ddoouubbllee nnuum
mbbeerr__vvaalluuee;
ssttrriinngg ssttrriinngg__vvaalluuee;
ddoouubbllee pprriim
m(bbooooll ggeett)
{
iiff (ggeett) ggeett__ttookkeenn();

// handle primaries

ssw
wiittcchh (ccuurrrr__ttookk) {
ccaassee N
NU
UM
MB
BE
ER
R:
// floating-point constant
{
ddoouubbllee v = nnuum
mbbeerr__vvaalluuee;
ggeett__ttookkeenn();
rreettuurrnn vv;
}
ccaassee N
NA
AM
ME
E:
{
ddoouubbllee& v = ttaabbllee[ssttrriinngg__vvaalluuee];
iiff (ggeett__ttookkeenn() == A
ASSSSIIG
GN
N) v = eexxpprr(ttrruuee);
rreettuurrnn vv;
}
ccaassee M
MIIN
NU
USS:
// unary minus
rreettuurrnn -pprriim
m(ttrruuee);
ccaassee L
LP
P:
{
ddoouubbllee e = eexxpprr(ttrruuee);
iiff (ccuurrrr__ttookk != R
RP
P) rreettuurrnn eerrrroorr(") eexxppeecctteedd");
ggeett__ttookkeenn();
// eat ’)’
rreettuurrnn ee;
}
ddeeffaauulltt:
rreettuurrnn eerrrroorr("pprriim
maarryy eexxppeecctteedd");
}
}

When a N
NU
UM
MB
BE
ER
R (that is, an integer or floating-point literal) is seen, its value is returned. The
input routine ggeett__ttookkeenn() places the value in the global variable nnuum
mbbeerr__vvaalluuee. Use of a global
variable in a program often indicates that the structure is not quite clean – that some sort of optimization has been applied. So it is here. Ideally, a lexical token consists of two parts: a value specifying the kind of token (a T
Tookkeenn__vvaalluuee in this program) and (when needed) the value of the token.
Here, there is only a single, simple variable, ccuurrrr__ttookk, so the global variable nnuum
mbbeerr__vvaalluuee is
needed to hold the value of the last N
NU
UM
MB
BE
ER
R read. Eliminating this spurious global variable is left
as an exercise (§6.6[21]). Saving the value of nnuum
mbbeerr__vvaalluuee in the local variable v before calling
ggeett__ttookkeenn() is not really necessary. For every legal input, the calculator always uses one number
in the computation before reading another from input. However, saving the value and displaying it
correctly after an error helps the user.
In the same way that the value of the last N
NU
UM
MB
BE
ER
R is kept in nnuum
mbbeerr__vvaalluuee, the character
string representation of the last N
NA
AM
ME
E seen is kept in ssttrriinngg__vvaalluuee. Before doing anything to a

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.
Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

112

Expressions and Statements

Chapter 6

name, the calculator must first look ahead to see if it is being assigned to or simply read. In both
cases, the symbol table is consulted. The symbol table is a m
maapp (§3.7.4, §17.4.1):
m
maapp ttaabbllee;

That is, when ttaabbllee is indexed by a ssttrriinngg, the resulting value is the ddoouubbllee corresponding to the
ssttrriinngg. For example, if the user enters
rraaddiiuuss = 66337788.338888;

the calculator will execute
ddoouubbllee& v = ttaabbllee["rraaddiiuuss"];
// ... expr() calculates the value to be assigned ...
v = 66337788.338888;

The reference v is used to hold on to the ddoouubbllee associated with rraaddiiuuss while eexxpprr() calculates the
value 66337788.338888 from the input characters.
6.1.2 The Input Function [expr.input]
Reading input is often the messiest part of a program. This is because a program must communicate with a person, it must cope with that person’s whims, conventions, and seemingly random
errors. Trying to force the person to behave in a manner more suitable for the machine is often
(rightly) considered offensive. The task of a low-level input routine is to read characters and compose higher-level tokens from them. These tokens are then the units of input for higher-level routines. Here, low-level input is done by ggeett__ttookkeenn(). Writing a low-level input routine need not be
an everyday task. Many systems provide standard functions for this.
I build ggeett__ttookkeenn() in two stages. First, I provide a deceptively simple version that imposes a
burden on the user. Next, I modify it into a slightly less elegant, but much easier to use, version.
The idea is to read a character, use that character to decide what kind of token needs to be composed, and then return the T
Tookkeenn__vvaalluuee representing the token read.
The initial statements read the first non-whitespace character into cchh and check that the read
operation succeeded:
T
Tookkeenn__vvaalluuee ggeett__ttookkeenn()
{
cchhaarr cchh = 00;
cciinn>>cchh;
ssw
wiittcchh (cchh) {
ccaassee 00:
rreettuurrnn ccuurrrr__ttookk=E
EN
ND
D;

// assign and return

By default, operator >> skips whitespace (that is, spaces, tabs, newlines, etc.) and leaves the value
of cchh unchanged if the input operation failed. Consequently, cchh==00 indicates end of input.
Assignment is an operator, and the result of the assignment is the value of the variable assigned
to. This allows me to assign the value E
EN
ND
D to ccuurrrr__ttookk and return it in the same statement. Having a single statement rather than two is useful in maintenance. If the assignment and the return
became separated in the code, a programmer might update the one and forget to update to the other.

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.
Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

Section 6.1.2

The Input Function

113

Let us look at some of the cases separately before considering the complete function. The
expression terminator ´;´, the parentheses, and the operators are handled simply by returning their
values:
ccaassee ´;´:
ccaassee ´*´:
ccaassee ´/´:
ccaassee ´+´:
ccaassee ´-´:
ccaassee ´(´:
ccaassee ´)´:
ccaassee ´=´:
rreettuurrnn ccuurrrr__ttookk=T
Tookkeenn__vvaalluuee(cchh);

Numbers are handled like this:
ccaassee ´00´: ccaassee ´11´: ccaassee ´22´: ccaassee ´33´: ccaassee ´44´:
ccaassee ´55´: ccaassee ´66´: ccaassee ´77´: ccaassee ´88´: ccaassee ´99´:
ccaassee ´.´:
cciinn.ppuuttbbaacckk(cchh);
cciinn >> nnuum
mbbeerr__vvaalluuee;
rreettuurrnn ccuurrrr__ttookk=N
NU
UM
MB
BE
ER
R;

Stacking ccaassee labels horizontally rather than vertically is generally not a good idea because this
arrangement is harder to read. However, having one line for each digit is tedious. Because operator >> is already defined for reading floating-point constants into a ddoouubbllee, the code is trivial. First
the initial character (a digit or a dot) is put back into cciinn. Then the constant can be read into
nnuum
mbbeerr__vvaalluuee.
A name is handled similarly:
ddeeffaauulltt:
// NAME, NAME =, or error
iiff (iissaallpphhaa(cchh)) {
cciinn.ppuuttbbaacckk(cchh);
cciinn>>ssttrriinngg__vvaalluuee;
rreettuurrnn ccuurrrr__ttookk=N
NA
AM
ME
E;
}
eerrrroorr("bbaadd ttookkeenn");
rreettuurrnn ccuurrrr__ttookk=P
PR
RIIN
NT
T;

The standard library function iissaallpphhaa() (§20.4.2) is used to avoid listing every character as a separate ccaassee label. Operator >> applied to a string (in this case, ssttrriinngg__vvaalluuee) reads until it hits whitespace. Consequently, a user must terminate a name by a space before an operator using the name as
an operand. This is less than ideal, so we will return to this problem in §6.1.3.
Here, finally, is the complete input function:
T
Tookkeenn__vvaalluuee ggeett__ttookkeenn()
{
cchhaarr cchh = 00;
cciinn>>cchh;

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.
Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

114

Expressions and Statements

Chapter 6

ssw
wiittcchh (cchh) {
ccaassee 00:
rreettuurrnn ccuurrrr__ttookk=E
EN
ND
D;
ccaassee ´;´:
ccaassee ´*´:
ccaassee ´/´:
ccaassee ´+´:
ccaassee ´-´:
ccaassee ´(´:
ccaassee ´)´:
ccaassee ´=´:
rreettuurrnn ccuurrrr__ttookk=T
Tookkeenn__vvaalluuee(cchh);
ccaassee ´00´: ccaassee ´11´: ccaassee ´22´: ccaassee ´33´: ccaassee ´44´:
ccaassee ´55´: ccaassee ´66´: ccaassee ´77´: ccaassee ´88´: ccaassee ´99´:
ccaassee ´.´:
cciinn.ppuuttbbaacckk(cchh);
cciinn >> nnuum
mbbeerr__vvaalluuee;
rreettuurrnn ccuurrrr__ttookk=N
NU
UM
MB
BE
ER
R;
ddeeffaauulltt:
// NAME, NAME =, or error
iiff (iissaallpphhaa(cchh)) {
cciinn.ppuuttbbaacckk(cchh);
cciinn>>ssttrriinngg__vvaalluuee;
rreettuurrnn ccuurrrr__ttookk=N
NA
AM
ME
E;
}
eerrrroorr("bbaadd ttookkeenn");
rreettuurrnn ccuurrrr__ttookk=P
PR
RIIN
NT
T;
}
}

The conversion of an operator to its token value is trivial because the T
Tookkeenn__vvaalluuee of an operator
was defined as the integer value of the operator (§4.8).
6.1.3 Low-level Input [expr.low]
Using the calculator as defined so far reveals a few inconveniences. It is tedious to remember to
add a semicolon after an expression in order to get its value printed, and having a name terminated
by whitespace only is a real nuisance. For example, xx=77 is an identifier – rather than the identifier
x followed by the operator = and the number 77. Both problems are solved by replacing the typeoriented default input operations in ggeett__ttookkeenn() with code that reads individual characters.
First, we’ll make a newline equivalent to the semicolon used to mark the end of expression:
T
Tookkeenn__vvaalluuee ggeett__ttookkeenn()
{
cchhaarr cchh;
ddoo { // skip whitespace except ’\n’
iiff(!cciinn.ggeett(cchh)) rreettuurrnn ccuurrrr__ttookk = E
EN
ND
D;
}w
whhiillee (cchh!=´\\nn´ && iissssppaaccee(cchh));

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.
Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

Section 6.1.3

Low-level Input

115

ssw
wiittcchh (cchh) {
ccaassee ´;´:
ccaassee ´\\nn´:
rreettuurrnn ccuurrrr__ttookk=P
PR
RIIN
NT
T;

A do-statement is used; it is equivalent to a while-statement except that the controlled statement is
always executed at least once. The call cciinn.ggeett(cchh) reads a single character from the standard
input stream into cchh. By default, ggeett() does not skip whitespace the way ooppeerraattoorr >> does. The
test iiff (!cciinn.ggeett(cchh)) fails if no character can be read from cciinn; in this case, E
EN
ND
D is returned to
terminate the calculator session. The operator ! (NOT) is used because ggeett() returns ttrruuee in case
of success.
The standard library function iissssppaaccee() provides the standard test for whitespace (§20.4.2);
iissssppaaccee(cc) returns a nonzero value if c is a whitespace character and zero otherwise. The test is
implemented as a table lookup, so using iissssppaaccee() is much faster than testing for the individual
whitespace characters. Similar functions test if a character is a digit – iissddiiggiitt() – a letter – iissaall-pphhaa() – or a digit or letter – iissaallnnuum
m().
After whitespace has been skipped, the next character is used to determine what kind of lexical
token is coming.
The problem caused by >> reading into a string until whitespace is encountered is solved by
reading one character at a time until a character that is not a letter or a digit is found:
ddeeffaauulltt:
// NAME, NAME=, or error
iiff (iissaallpphhaa(cchh)) {
ssttrriinngg__vvaalluuee = cchh;
w
whhiillee (cciinn.ggeett(cchh) && iissaallnnuum
m(cchh)) ssttrriinngg__vvaalluuee.ppuusshh__bbaacckk(cchh);
cciinn.ppuuttbbaacckk(cchh);
rreettuurrnn ccuurrrr__ttookk=N
NA
AM
ME
E;
}
eerrrroorr("bbaadd ttookkeenn");
rreettuurrnn ccuurrrr__ttookk=P
PR
RIIN
NT
T;

Fortunately, these two improvements could both be implemented by modifying a single local section of code. Constructing programs so that improvements can be implemented through local modifications only is an important design aim.
6.1.4 Error Handling [expr.error]
Because the program is so simple, error handling is not a major concern. The error function simply
counts the errors, writes out an error message, and returns:
iinntt nnoo__ooff__eerrrroorrss;
ddoouubbllee eerrrroorr(ccoonnsstt ssttrriinngg& ss)
{
nnoo__ooff__eerrrroorrss++;
cceerrrr