10.2.5 Implementation of lookahead
Function lookaheadp takes a grammar and a production, and returns the lookahead set of the production. It is defined in terms of four functions. The first three are each defined in a separate subsection below; the fourth is defined in this subsection.
LL Parsing
• isEmpty :: (Ord s, Symbol s) => CFG s -> s -> Bool
  Function isEmpty takes a grammar and a nonterminal and determines whether or not the empty string can be derived from the nonterminal in the grammar. (This function was called empty in Definition 3.)
• firsts :: (Ord s, Symbol s) => CFG s -> [(s,[s])]
  Function firsts takes a grammar and computes the first set of each symbol (the first set of a terminal is the terminal itself).
• follow :: (Ord s, Symbol s) => CFG s -> [(s,[s])]
  Function follow takes a grammar and computes the follow set of each nonterminal (so it associates a list of symbols with each nonterminal).
• lookSet :: Ord s =>
             (s -> Bool) ->   -- isEmpty
             (s -> [s]) ->    -- firsts?
             (s -> [s]) ->    -- follow?
             (s, [s]) ->      -- production
             [s]              -- lookahead set
  Note that we use the operator ?, see Section 6.4.2, on the firsts and follow association lists. Function lookSet takes a predicate, two functions that, given a nonterminal, return its first set and its follow set, respectively, and a production, and returns the lookahead set of the production. Function lookSet is introduced after the definition of function lookaheadp.
Now we define:
lookaheadp :: (Symbol s, Ord s) => CFG s -> (s,[s]) -> [s]
lookaheadp grammar =
  lookSet (isEmpty grammar) ((firsts grammar)?) ((follow grammar)?)
We will exemplify the definition of function lookSet with the grammar exGrammar, with the following productions:
S → AaS | B | CB
A → SC | ε
B → A | b
C → D
D → d
Consider the production S → AaS. The lookahead set of the production contains the set of symbols which can appear as the first terminal symbol of a sequence of symbols derived from A. But, since the nonterminal symbol A can derive the empty string, the lookahead set also contains the symbol a.
Consider the production A → SC. The lookahead set of the production contains the set of symbols which can appear as the first terminal symbol of a sequence of symbols derived from S. But, since the nonterminal symbol S can derive the empty string, the lookahead set also contains the set of symbols which can appear as the first terminal symbol of a sequence of symbols derived from C.
10.2 LL Parsing: Implementation
Finally, consider the production B → A. The lookahead set of the production contains the set of symbols which can appear as the first terminal symbol of a sequence of symbols derived from A. But, since the nonterminal symbol A can derive the empty string, the lookahead set also contains the set of terminal symbols which can follow the nonterminal symbol B in some derivation.
The examples show that it is useful to have functions firsts and follow in which, for every nonterminal symbol n, we can look up, respectively, the terminal symbols that can appear as the first terminal symbol of a sequence of symbols derived from n, and the terminal symbols that can follow n in a sequence of symbols occurring in some derivation. It turns out that the definition of function follow also makes use of a function lasts, which is similar to the function firsts but deals with last nonterminal symbols rather than first terminal ones.
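As a rough illustration of such look-up functions, the sketch below writes the first and follow sets of exGrammar as association lists and looks them up with an operator ?. The definition of ? given here is an assumption (the real one is in Section 6.4.2 and may differ, for instance in how it treats a missing key), the names firstsEx and followEx are hypothetical, and the tables are worked out by hand rather than computed by firsts and follow.

```haskell
import Data.Maybe (fromMaybe)

-- Assumed stand-in for the operator ? of Section 6.4.2: look up a key in an
-- association list, returning the empty list when the key is absent.
(?) :: Eq s => [(s, [s])] -> s -> [s]
table ? key = fromMaybe [] (lookup key table)

-- First sets for exGrammar, written out by hand (hypothetical name).
-- The first set of a terminal is the terminal itself.
firstsEx :: [(Char, String)]
firstsEx =
  [ ('S', "dba"), ('A', "dba"), ('B', "dba"), ('C', "d"), ('D', "d")
  , ('a', "a"), ('b', "b"), ('d', "d") ]

-- Follow sets for exGrammar, written out by hand (hypothetical name).
-- Only nonterminals have an entry.
followEx :: [(Char, String)]
followEx =
  [ ('S', "d"), ('A', "ad"), ('B', "d"), ('C', "dba"), ('D', "dba") ]
```

Loaded into GHCi, firstsEx ? 'A' gives "dba" and followEx ? 'B' gives "d". The operator section (firstsEx ?) has type Char -> String, which is the shape lookSet expects for its function arguments.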
The examples also illustrate a control structure which will be used very often in the following algorithms: we will fold over right-hand sides. While doing so we compute sets of symbols for all the symbols of the right-hand side which we encounter and collect them into a final set of symbols. Whenever such a list for a symbol is computed, there are always two possibilities:
• either we continue folding and return the result of taking the union of the set obtained from the current element and the set obtained by recursively folding over the rest of the right-hand side
• or we stop folding and immediately return the set obtained from the current element.
We continue if the current symbol is a nonterminal which can derive the empty sequence, and we stop if the current symbol is either a terminal symbol or a nonterminal symbol which cannot derive the empty sequence. The following function makes this statement more precise.
foldrRhs :: Ord s =>
            (s -> Bool) ->
            (s -> [s]) ->
            [s] ->
            [s] ->
            [s]
foldrRhs p f start = foldr op start
  where op x xs = f x `union` if p x then xs else []
The function foldrRhs is, of course, most naturally defined in terms of the function foldr. This function is somewhere in between a general-purpose and an application-specific function (we could easily have made it more general though). In the exercises we give an alternative characterisation of foldrRhs. We will also need a function scanrRhs which is like foldrRhs but accumulates intermediate results in a list. The function scanrRhs is most naturally defined in terms of the function scanr.
scanrRhs :: Ord s =>
            (s -> Bool) ->
            (s -> [s]) ->
            [s] ->
            [s] ->
            [[s]]
scanrRhs p f start = scanr op start
  where op x xs = f x `union` if p x then xs else []
Finally, we will also need a function scanlRhs which does the same job as scanrRhs but in the opposite direction. The easiest way to define scanlRhs is in terms of scanrRhs and reverse.
scanlRhs p f start = reverse . scanrRhs p f start . reverse
We now return to the function lookSet.
lookSet :: Ord s =>
           (s -> Bool) ->
           (s -> [s]) ->
           (s -> [s]) ->
           (s,[s]) ->
           [s]
lookSet p f g (nt,rhs) = foldrRhs p f (g nt) rhs
The function lookSet makes use of foldrRhs to fold over a right-hand side. As stated above, the function foldrRhs continues processing a right-hand side only if it encounters a nonterminal symbol for which p (so isEmpty in the lookSet instance lookaheadp) holds. Thus, the set g nt (follow?nt in the lookSet instance lookaheadp) is only important for those right-hand sides for nt that consist of nonterminals that can all derive the empty sequence. We can now (assuming that the definitions of the auxiliary functions are given) use the function lookaheadp, an instance of lookSet, to compute the lookahead sets of all productions.
look nt rhs = lookaheadp exGrammar (nt,rhs)
? look 'S' "AaS"
dba
? look 'S' "B"
dba
? look 'S' "CB"
d
? look 'A' "SC"
dba
? look 'A' ""
ad
? look 'B' "A"
dba
? look 'B' "b"
b
? look 'C' "D"
d
? look 'D' "d"
d
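One way to read this table is to check, for each nonterminal, whether the lookahead sets of its productions are pairwise disjoint, which is the usual formulation of the LL(1) condition. The sketch below does this for the table as printed. The names lookTable, conflicts and isLL1 are hypothetical and introduced only for this illustration; they are not part of the library described in this chapter.

```haskell
import Data.List (intersect, tails)

-- The lookahead sets printed in the session above, one entry per production.
lookTable :: [((Char, String), String)]
lookTable =
  [ (('S', "AaS"), "dba"), (('S', "B"), "dba"), (('S', "CB"), "d")
  , (('A', "SC"),  "dba"), (('A', ""),  "ad")
  , (('B', "A"),   "dba"), (('B', "b"), "b")
  , (('C', "D"),   "d"),   (('D', "d"), "d") ]

-- All pairs of distinct productions of the same nonterminal whose
-- lookahead sets overlap, together with the symbols they share.
conflicts :: [((Char, String), (Char, String), String)]
conflicts =
  [ (p, q, common)
  | ((p, lp) : rest) <- tails lookTable   -- each production with its successors
  , (q, lq) <- rest
  , fst p == fst q                        -- same left-hand side
  , let common = lp `intersect` lq
  , not (null common) ]

-- The table passes the check only if no pair of productions conflicts.
isLL1 :: Bool
isLL1 = null conflicts
```

In GHCi, isLL1 evaluates to False, and conflicts lists five overlapping pairs, the first being the productions S → AaS and S → B, which share all of d, b and a.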
It is clear from this result that exGrammar is not an LL(1)-grammar: for example, the productions S → AaS and S → B have the same lookahead set. Let us have a closer look at how these lookahead sets are obtained. We will have to use the functions firsts and follow and the predicate isEmpty for computing intermediate results. The corresponding subsections explain how to compute these intermediate results.
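Until those subsections are reached, the intermediate results can be stood in for by hand. The following is a minimal sketch, not the library's implementation: isEmptyEx, firstsEx and followEx are hypothetical hand-written replacements for isEmpty exGrammar, (firsts exGrammar)? and (follow exGrammar)?, with values worked out by hand for exGrammar, and look is redefined on top of them. The functions foldrRhs, scanrRhs and lookSet follow the definitions in the text, with parentheses added around the conditional.

```haskell
import Data.List (union)
import Data.Maybe (fromMaybe)

-- Fold over a right-hand side: continue past x only when p x holds.
foldrRhs :: Ord s => (s -> Bool) -> (s -> [s]) -> [s] -> [s] -> [s]
foldrRhs p f start = foldr op start
  where op x xs = f x `union` (if p x then xs else [])

-- Same fold, but keeping every intermediate set.
scanrRhs :: Ord s => (s -> Bool) -> (s -> [s]) -> [s] -> [s] -> [[s]]
scanrRhs p f start = scanr op start
  where op x xs = f x `union` (if p x then xs else [])

lookSet :: Ord s => (s -> Bool) -> (s -> [s]) -> (s -> [s]) -> (s, [s]) -> [s]
lookSet p f g (nt, rhs) = foldrRhs p f (g nt) rhs

-- Hand-written stand-in: S, A and B can derive the empty string.
isEmptyEx :: Char -> Bool
isEmptyEx = (`elem` "SAB")

-- Hand-written stand-in: first sets; a terminal maps to itself.
firstsEx :: Char -> String
firstsEx x = fromMaybe [x] (lookup x table)
  where table = [('S', "dba"), ('A', "dba"), ('B', "dba"), ('C', "d"), ('D', "d")]

-- Hand-written stand-in: follow sets of the nonterminals.
followEx :: Char -> String
followEx x = fromMaybe [] (lookup x table)
  where table = [('S', "d"), ('A', "ad"), ('B', "d"), ('C', "dba"), ('D', "dba")]

-- Lookahead set of the production nt -> rhs of exGrammar.
look :: Char -> String -> String
look nt rhs = lookSet isEmptyEx firstsEx followEx (nt, rhs)
```

In GHCi, look 'S' "AaS" gives "dba" and look 'A' "" gives "ad", in agreement with the session above. The expression scanrRhs isEmptyEx firstsEx (followEx 'S') "AaS" gives ["dba","a","dba","d"], the intermediate sets from right to left: the fold stops at 'a', so the sets to its right do not reach the final result.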
For the lookahead set of the production S → AaS we fold over the right-hand side AaS. Folding stops at 'a' and we obtain
firsts? 'A' `union` firsts? 'a' ==
"dba" `union` "a" ==
"dba"
For the lookahead set of the production A → SC we fold over the right-hand side SC. Folding stops at C since it cannot derive the empty sequence, and we obtain
firsts? 'S' `union` firsts? 'C' ==
"dba" `union` "d" ==
"dba"
Finally, for the lookahead set of the production B → A we fold over the right-hand side A. In this case we fold over the complete (one element) list and we obtain
firsts? 'A' `union` follow? 'B' ==
"dba" `union` "d" ==
"dba"
The other lookahead sets are computed in a similar way.