10.1.3 LL(1) grammars
The examples in the previous subsection show that the derivations of the example sentences are deterministic, provided we can look ahead one or two symbols in the input. An obvious question now is: for which grammars are all derivations deterministic? Of course, as the second example shows, the answer to this question depends on the number of symbols we are allowed to look ahead. In the rest of this chapter we assume that we may look 1 symbol ahead. A grammar for which all derivations are deterministic with 1 symbol lookahead is called LL(1): Leftmost with a Lookahead of 1. Since all derivations of sentences of LL(1) grammars are deterministic, LL(1) is a desirable property of grammars.
To formalise this definition, we define lookAhead sets.
Definition 1: lookAhead set
The lookahead set of a production N → α is the set of terminal symbols that can appear as the first symbol of a string that can be derived from Nδ (where Nδ appears as a tail substring in a derivation from the start-symbol) starting with the production N → α. So
lookAhead (N → α) = {x | S ⇒* γNδ ⇒ γαδ ⇒* γxβ}
For example, for the productions of gramm1 we have
lookAhead (S → cA) = {c}
lookAhead (S → b) = {b}
lookAhead (A → cBC) = {c}
lookAhead (A → bSA) = {b}
lookAhead (A → a) = {a}
lookAhead (B → cc) = {c}
lookAhead (B → Cb) = {a, b}
lookAhead (C → aS) = {a}
lookAhead (C → ba) = {b}
We use lookAhead sets in the definition of LL(1) grammar.
Definition 2: LL(1) grammar
A grammar G is LL(1) if all pairs of distinct productions of the same nonterminal have disjoint lookahead sets, that is: for all productions N → α, N → β of G with α ≠ β:
lookAhead (N → α) ∩ lookAhead (N → β) = Ø
Since all lookAhead sets for productions of the same nonterminal of gramm1 are disjoint, gramm1 is an LL(1) grammar. For gramm2 we have:
lookAhead (S → abA) = {a}
lookAhead (S → aa) = {a}
lookAhead (A → bb) = {b}
lookAhead (A → bS) = {b}
Here, the lookAhead sets for both nonterminals S and A are not disjoint, and it follows that gramm2 is not LL(1). gramm2 is an LL(2) grammar, where an LL(k) grammar for k ≥ 2 is defined similarly to an LL(1) grammar: instead of one symbol lookahead we have k symbols lookahead.
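The disjointness test of Definition 2 can be sketched in a few lines of Python. This is only an illustration, not one of the programs of the following section: the function name is_ll1 and the dictionary representation (a pair of nonterminal and right-hand side mapped to its lookAhead set) are assumptions made for this example. The sets themselves are copied from the text above.

```python
from itertools import combinations


def is_ll1(look_ahead):
    """Check Definition 2 on a table of lookAhead sets.

    look_ahead maps (nonterminal, right-hand side) to a set of terminals.
    This representation is an assumption made for this sketch.
    """
    # Group the lookAhead sets by the nonterminal on the left-hand side.
    by_nonterminal = {}
    for (nonterminal, _rhs), terminals in look_ahead.items():
        by_nonterminal.setdefault(nonterminal, []).append(terminals)
    # Every pair of distinct productions of one nonterminal must be disjoint.
    return all(
        first.isdisjoint(second)
        for sets in by_nonterminal.values()
        for first, second in combinations(sets, 2)
    )


# lookAhead sets of gramm1 and gramm2, copied from the text.
gramm1 = {
    ("S", "cA"): {"c"}, ("S", "b"): {"b"},
    ("A", "cBC"): {"c"}, ("A", "bSA"): {"b"}, ("A", "a"): {"a"},
    ("B", "cc"): {"c"}, ("B", "Cb"): {"a", "b"},
    ("C", "aS"): {"a"}, ("C", "ba"): {"b"},
}
gramm2 = {
    ("S", "abA"): {"a"}, ("S", "aa"): {"a"},
    ("A", "bb"): {"b"}, ("A", "bS"): {"b"},
}

print(is_ll1(gramm1))  # True
print(is_ll1(gramm2))  # False
```

The check only compares sets that are already known. Computing the lookAhead sets from a grammar is the harder part, which the rest of this section prepares.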
How do we determine whether or not a grammar is LL(1)? Clearly, to answer this question we need to know the lookahead sets of the productions of the grammar. The lookAhead set of a production N → α, where α starts with a terminal symbol x, is simply {x}. But what if α starts with a nonterminal P, that is α = Pβ, for some β? Then we have to determine the set of terminal symbols with which strings derived from P can start. But if P can derive the empty string, we also have to determine the set of terminal symbols with which a string derived from β can start. As you see, in order to determine the lookAhead sets of productions, we are interested in
• whether or not a nonterminal can derive the empty string (empty);
• which terminal symbols can appear as the first symbol in a string derived from a nonterminal (firsts);
• and which terminal symbols can follow upon a nonterminal in a derivation (follow).
In each of the following definitions we assume that a grammar G is given.
Definition 3: Empty
Function empty takes a nonterminal N, and determines whether or not the empty string can be derived from the nonterminal:
empty N = N ⇒* ε
For example, for gramm3 we have:
empty S = False
empty A = True
empty B = False
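One way to compute empty is to start with no nonterminals marked as empty and to keep adding nonterminals until nothing changes. The Python sketch below follows that idea. It is not the program of the following section, and the names empty_set and demo, the dictionary representation, and the small grammar S → Ab, A → aA | ε are all hypothetical choices made for illustration. Symbols are single characters, and a symbol counts as a nonterminal exactly when it is a key of the dictionary.

```python
def empty_set(grammar):
    """Return the set of nonterminals that can derive the empty string.

    grammar maps each nonterminal to a list of right-hand sides, and each
    right-hand side is a string of single-character symbols (an assumption
    of this sketch). The empty string "" stands for an ε-production.
    """
    result = set()
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            if nonterminal in result:
                continue
            # A right-hand side derives the empty string when every symbol
            # in it is a nonterminal already known to be empty.
            if any(all(sym in result for sym in rhs) for rhs in alternatives):
                result.add(nonterminal)
                changed = True
    return result


def empty(grammar, nonterminal):
    """The predicate of Definition 3, for one nonterminal."""
    return nonterminal in empty_set(grammar)


# Hypothetical grammar: S -> Ab, A -> aA | ε
demo = {"S": ["Ab"], "A": ["aA", ""]}

print(empty(demo, "S"))  # False
print(empty(demo, "A"))  # True
```

The loop terminates because the result set only grows and is bounded by the number of nonterminals.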
Definition 4: First
The first set of a nonterminal N is the set of terminal symbols that can appear as the first symbol of a string that can be derived from N:
firsts N = {x | N ⇒* xβ}
For example, for gramm3 we have:
firsts S = {a, b, c}
firsts A = {c}
firsts B = {b}
We could have given more restricted definitions of empty and firsts, by only looking at derivations from the start-symbol, for example,
empty N = S ⇒* αNβ ⇒* αβ
but the simpler definition above suffices for our purposes.
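The firsts sets can be computed by the same kind of iteration: scan each right-hand side from the left, collect what its symbols can start with, and stop at the first symbol that cannot derive the empty string. The sketch below is again only an illustration under assumptions. The function names, the dictionary representation with single-character symbols, and the grammar S → AB, A → aA | ε, B → b are hypothetical and do not come from the text.

```python
def empty_set(grammar):
    """Nonterminals that can derive the empty string (fixed-point iteration)."""
    result = set()
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            if nonterminal not in result and any(
                all(sym in result for sym in rhs) for rhs in alternatives
            ):
                result.add(nonterminal)
                changed = True
    return result


def firsts(grammar):
    """Map every nonterminal to the terminals that can start its strings."""
    nullable = empty_set(grammar)
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            for rhs in alternatives:
                for sym in rhs:
                    # A terminal starts with itself, a nonterminal with
                    # whatever is known about it so far.
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[nonterminal]:
                        first[nonterminal] |= new
                        changed = True
                    # Only look past sym if it can derive the empty string.
                    if sym not in nullable:
                        break
    return first


# Hypothetical grammar: S -> AB, A -> aA | ε, B -> b
demo = {"S": ["AB"], "A": ["aA", ""], "B": ["b"]}

result = firsts(demo)
print(sorted(result["S"]))  # ['a', 'b']
print(sorted(result["A"]))  # ['a']
print(sorted(result["B"]))  # ['b']
```

Here b is in firsts S only because A can derive the empty string, which is exactly the case singled out in the discussion before Definition 3.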
Definition 5: Follow
The follow set of a nonterminal N is the set of terminal symbols that can follow on N in a derivation starting with the start-symbol S from the grammar G:
follow N = {x | S ⇒* αNxβ}
For example, for gramm3 we have:
follow S = {a}
follow A = {a}
follow B = {a}
In the following section we will give programs with which lookAhead, empty, firsts, and follow are computed.
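As a rough preview, and not as a substitute for those programs, the three notions can be combined into a lookAhead computation as follows. The Python sketch rests on several assumptions: grammars are dictionaries from nonterminals to lists of right-hand sides, symbols are single characters, and every nonterminal is reachable from the start-symbol, so that follow can be computed without tracking the start-symbol. All function names and the grammar demo (S → AB, A → aA | ε, B → b) are hypothetical. The lookAhead set of N → α is taken to be the terminals that α can start with, extended with follow N when α can derive the empty string.

```python
def starts_of(symbols, grammar, nullable, first):
    """Terminals that can start a string derived from symbols, and whether
    the whole sequence can derive the empty string."""
    result = set()
    for sym in symbols:
        result |= first[sym] if sym in grammar else {sym}
        if sym not in nullable:
            return result, False
    return result, True


def analyse(grammar):
    """Compute empty, firsts and follow together by fixed-point iteration."""
    nullable = set()
    first = {n: set() for n in grammar}
    follow = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, alternatives in grammar.items():
            for rhs in alternatives:
                starts, can_vanish = starts_of(rhs, grammar, nullable, first)
                if can_vanish and n not in nullable:
                    nullable.add(n)
                    changed = True
                if not starts <= first[n]:
                    first[n] |= starts
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    # What can follow sym: the start of the rest of the
                    # right-hand side, plus follow n if the rest can vanish.
                    after, rest_vanishes = starts_of(
                        rhs[i + 1:], grammar, nullable, first
                    )
                    if rest_vanishes:
                        after |= follow[n]
                    if not after <= follow[sym]:
                        follow[sym] |= after
                        changed = True
    return nullable, first, follow


def look_ahead(grammar):
    """Map (nonterminal, right-hand side) to its lookAhead set."""
    nullable, first, follow = analyse(grammar)
    table = {}
    for n, alternatives in grammar.items():
        for rhs in alternatives:
            starts, can_vanish = starts_of(rhs, grammar, nullable, first)
            table[(n, rhs)] = (starts | follow[n]) if can_vanish else starts
    return table


# Hypothetical grammar: S -> AB, A -> aA | ε, B -> b
demo = {"S": ["AB"], "A": ["aA", ""], "B": ["b"]}
table = look_ahead(demo)
print(sorted(table[("A", "aA")]))  # ['a']
print(sorted(table[("A", "")]))    # ['b']

# gramm1, with productions read off the lookAhead table in the text.
gramm1 = {"S": ["cA", "b"], "A": ["cBC", "bSA", "a"],
          "B": ["cc", "Cb"], "C": ["aS", "ba"]}
print(sorted(look_ahead(gramm1)[("B", "Cb")]))  # ['a', 'b']
```

For gramm1 the computed set for B → Cb agrees with the value {a, b} given earlier. For demo, the ε-production of A gets its lookAhead set entirely from follow A.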
Exercise 10.1. Give the results of the function empty for the grammars gramm1 and gramm2.
Exercise 10.2. Give the results of the function firsts for the grammars gramm1 and gramm2.
Exercise 10.3. Give the results of the function follow for the grammars gramm1 and gramm2.
Exercise 10.4. Give the results of the function lookAhead for grammar gramm3. Is gramm3 an LL(1) grammar?
Exercise 10.5. Grammar gramm2 is not LL(1), but it can be transformed into an LL(1) grammar by left factoring. Give this equivalent grammar gramm2’ and give the results of the functions empty, firsts, follow and lookAhead on this grammar. Is gramm2’ an LL(1) grammar?
Exercise 10.6. A non-left-recursive grammar for Bit-Lists is given by the following grammar (see your answer to exercise 2.18):
L → BR
R → ε | ,BR
B → 0 | 1
Give the results of functions empty, firsts, follow and lookAhead on this grammar. Is this grammar LL(1)?