Proofs of pumping lemmas
9.4 Proofs of pumping lemmas
This section gives the proof of the pumping lemmas. Proof: of the Regular Pumping Lemma, Theorem 1.
Since L is a regular language, there exists a deterministic finite-state automaton D such that L = Ldfa D. Take for n the number of states of D. Let s be an element of L with sublist y such that |y| ≥ n, say s = xyz . Consider the sequence of states D passes through while processing y. Since |y| ≥ n, this sequence has more than n entries, hence at least one state, say state A, occurs twice. Take u, v, w as follows
• u is the initial part of y processed until the first occurrence of A, • v is the (nonempty) part of y processed from the first to the second occurrence
of A, • w is the remaining part of y
Note that D could have skipped processing v, and hence would have accepted xuwz . Furthermore, D can repeat the processing in between the first occurrence of A and the second occurrence of A as often as desired, and hence it accepts xuv i wz for all
i ∈ IIN. Formally, a simple proof by induction shows that (∀i : i ≥ 0 : xuv i wz ∈ L).
2 Proof: of the Context-free Pumping Lemma, Theorem 2. Let G = (T, N, R, S) be a context-free grammar such that L = L(G). Let m be the
length of the longest right-hand side of any production, and let k be the number of nonterminals of G. Take c = m k . In Lemma 3 below we prove that if z is a list with |z| > c, then in all derivation trees for z there exists a path of length at least k+1.
Let z ∈ L such that |z| > c. Since grammar G has k nonterminals, there is at least one nonterminal that occurs more than once in a path of length k+1 (which contains k+2 symbols, of which at most one is a terminal, and all others are nonterminals) of a derivation tree for z. Consider the nonterminal A that satisfies the following requirements.
• A occurs at least twice in the path of length k+1 of a derivation tree for z.
9.4 Proofs of pumping lemmas
Call the list corresponding to the derivation tree rooted at the lower A w, and call the list corresponding to the derivation tree rooted at the upper A (which contains the list w ) vwx .
• A is chosen such that at most one of v and x equals . • Finally, we suppose that below the upper occurrence of A no other nonterminal
that satisfies the same requirements occurs, that is, the two A’s are the lowest pair of nonterminals satisfying the above requirements.
First we show that a nonterminal A satisfying the above requirements exists. We prove this statement by contradiction. Suppose that for all nonterminals A that occur twice on a path of length at least k+1 both v and x, the border lists of the list vwx corresponding to the tree rooted at the upper occurrence of A, are both equal to
. Then we can replace the tree rooted at the upper A by the tree rooted at the lower
A without changing the list corresponding to the tree. Thus we can replace all paths of length at least k+1 by a path of length at most k. But this contradicts Lemma 3 below, and we have a contradiction. It follows that a nonterminal A satisfying the above requirements exists.
There exists a derivation tree for z in which the path from the upper A to the leaf has length at most k+1, since either below A no nonterminal occurs twice, or there is
one or more nonterminal B that occurs twice, but the border lists v 0 and x 0 from the
list v 0 w 0 x 0 corresponding to the tree rooted at the upper occurrence of nonterminal
B are empty. Since the lists v 0 and x 0 are empty, we can replace the tree rooted at
the upper occurrence of B by the tree rooted at the lower occurrence of B without changing the list corresponding to the derivation tree. Since we can do this for all nonterminals B that occur twice below the upper occurrence of A, there exists a derivation tree for z in which the path from the upper A to the leaf has length at most k+1. It follows from Lemma 3 below that the length of vwx is at most m k+1 , so we define d = m k+1 .
Suppose z = uvwxy, that is, the list corresponding to the subtree to the left (right) of the upper occurrence of A is u (y). This situation is depicted as follows.
We prove by induction that (∀i : i ≥ 0 : uv i wx i y ∈ L). In this proof we apply the tree substitution process described in the example before the lemma. For i = 0 we have to show that the list uwy is a sentence in L. The list uwy is obtained if the tree rooted at the upper A is replaced by the tree rooted at the lower A. Suppose that for all i ≤ n we have uv i wx i y ∈ L. The list uv i+1 wx i+1 y is obtained if the tree rooted at the lower A in the derivation tree for uv i wx i y ∈ L is replaced by the tree
Pumping Lemmas: the expressive power of languages
rooted above it A. This proves the induction step.
2 The proof of the Pumping Lemma above frequently refers to the following lemma.
Theorem 3:
Let G be a context-free grammar, and suppose that the longest right-hand side of any production has length m. Let t be a derivation tree for a list z ∈ L G. If height t ≤ j, then |z| ≤ m j .
Proof: We prove a slightly stronger result: if t is a derivation tree for a list z, but the root of t is not necessarily the start-symbol, and height t ≤ j, then |z| ≤ m j . We prove this statement by induction on j.
For the base case, suppose j = 1. Then tree t corresponds to a single production in
G, and since the longest right-hand side of any production has length m, we have that |z| ≤ m = m j .
For the induction step, assume that for all derivation trees t of height at most j we have that |z| ≤ m j , where z is the list corresponding to t. Suppose we have
a tree t of height j+1. Let A be the root of t, and suppose the top of the tree corresponds to the production A → v in G. For all trees s rooted at the symbols of v we have height s ≤ j, so the induction hypothesis applies to these trees, and the lists corresponding to the trees rooted at the symbols of v all have length at most m j . Since A → v is a production of G, and since the longest right-hand side of any production has length m, the list corresponding to the tree rooted at A has length at most m × m j =m j+1 , which proves the induction step.
summary
This section introduces pumping lemmas. Pumping lemmas are used to prove that languages are not regular or not context-free.
9.5 exercises
Exercise 9.8 . Show that the following language is not regular.
{x | x ∈ {0, 1} ∗ ∧ nr 1 x = nr 0 x} where function nr takes an element a and a list x, and returns the number of
occurrences of a in x.
Exercise 9.9 . Consider the following language:
{a i j b | 0 ≤ i ≤ j}
1. Is this language context-free? If it is, give a context-free grammar and prove that this grammar generates the language. If it is not, why not?
2. Is this language regular? If it is, give a regular grammar and prove that this grammar generates the language. If it is not, why not?
Exercise 9.10 . Consider the following language:
{wcw | w ∈ {a, b} ∗ }
9.5 exercises
1. Is this language context-free? If it is, give a context-free grammar and prove that this grammar generates the language. If it is not, why not?
2. Is this language regular? If it is, give a regular grammar and prove that this grammar generates the language. If it is not, why not?
Exercise 9.11 . Consider the grammar G with the following productions.
1. Is this grammar
• Context-free? • Regular?
Why?
2. Give the language of G without referring to G itself. Prove that your descrip- tion is correct.
3. Is the language of G
• Context-free? • Regular?
Why?
Exercise 9.12 . Consider the grammar G with the following productions.
1. Is this grammar
• Context-free? • Regular?
Why?
2. Give the language of G without referring to G itself. Prove that your descrip- tion is correct.
3. Is the language of G
• Context-free? • Regular?
Why?
Pumping Lemmas: the expressive power of languages
Chapter 10
LL Parsing
introduction
This chapter introduces LL(1) parsing. LL(1) parsing is an efficient (linear in the length of the input string) method for parsing that can be used for all LL(1) gram- mars. A grammar is LL(1) if at each step in a derivation the next symbol in the input uniquely determines the production that should be applied. In order to de- termine whether or not a grammar is LL(1), we introduce several kinds of grammar analyses, such as determining whether or not a nonterminal can derive the empty string, and determining the set of symbols that can appear as the first symbol in a derivation from a nonterminal.
goals
After studying this chapter you will
• know the definition of LL(1) grammars; • know how to parse a sentence from an LL(1) grammar; • be able to apply different kinds of grammar analyses in order to determine
whether or not a grammar is LL(1). This chapter is organised as follows. Section 10.1 describes the background of LL(1)
parsing, and Section 10.2 describes an implementation in Haskell of LL(1) parsing and the different kinds of grammar analyses needed for checking whether or not a grammar is LL(1).