The pumping lemma for regular languages
9.2 The pumping lemma for regular languages
In this section we give the pumping lemma for regular languages. The lemma gives a property that is satisfied by all regular languages. The property is a statement of the form: in sentences longer than a certain length, a substring can be identified that can be duplicated while retaining a sentence. The idea behind this property is simple: regular languages are accepted by finite automata. Given a DFA for a regular language, a sentence of the language describes a path from the start state to some final state. When the length of such a sentence exceeds the number of states, at least one state is visited twice; consequently the path contains a cycle that can be repeated as often as desired. The proof of the following lemma is given in Section 9.4.
Theorem 1: Regular Pumping Lemma
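Let $L$ be a regular language. Then
\[
\begin{array}{l}
\text{there exists } n \in \mathbb{N} : \\
\quad \text{for all } x, y, z : xyz \in L \wedge |y| \ge n : \\
\qquad \text{there exist } u, v, w : y = uvw \wedge |v| > 0 : \\
\qquad\quad \text{for all } i \in \mathbb{N} : x u v^i w z \in L
\end{array}
\]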
Note that $|y|$ denotes the length of the string $y$. Also remember that 'for all $x \in X : \cdots$' is true if $X = \emptyset$, and 'there exists $x \in X : \cdots$' is false if $X = \emptyset$.
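The convention for quantifiers over an empty set can be checked with a small sketch. Python's built-in `all` and `any` follow the same convention; the predicate `x > 0` is an arbitrary choice made only for illustration.

```python
# An empty list plays the role of X = ∅.
empty = []

# "for all x in X : x > 0" is vacuously true when X is empty.
forall_result = all(x > 0 for x in empty)

# "there exists x in X : x > 0" is false when X is empty.
exists_result = any(x > 0 for x in empty)

print(forall_result, exists_result)  # True False
```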
For example, consider the following automaton.
[Figure: state diagram of the automaton, with states including A, B, and the accepting state D.]
This automaton accepts $abcabcd$, $abcabcabcd$, and, in general, $a(bca)^{*}bcd$. The statement of the pumping lemma amounts to the following. Take for $n$ the number of states of the automaton, which is 5. Let $x, y, z$ be such that $xyz \in L$ and $|y| \ge n$. Then we know that in order to accept $y$, the above automaton has to pass at least twice through state A. The part that is accepted in between the two moments the automaton passes through state A can be pumped up to create sentences that contain an arbitrary number of copies of the string $v = bca$.
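The cycle argument can be made concrete with a small simulation. The following is a minimal sketch, not the automaton from the figure itself: the transition table is an assumption reconstructed from the language $a(bca)^{*}bcd$, and the state names `S` and `C` as well as the helper names `run`, `accepts` and `split_at_cycle` are hypothetical.

```python
# Hypothetical transition table for a five-state DFA accepting a(bca)*bcd.
# The state names S and C and the exact edges are assumptions.
DELTA = {
    ("S", "a"): "A",
    ("A", "b"): "B",
    ("B", "c"): "C",
    ("C", "a"): "A",
    ("C", "d"): "D",
}
START, ACCEPTING = "S", {"D"}


def run(word, state=START):
    """Return the list of states visited while reading word, or None if stuck."""
    path = [state]
    for symbol in word:
        state = DELTA.get((state, symbol))
        if state is None:
            return None
        path.append(state)
    return path


def accepts(word):
    """True if the automaton ends in an accepting state after reading word."""
    path = run(word)
    return path is not None and path[-1] in ACCEPTING


def split_at_cycle(x, y):
    """Split y into (u, v, w), where v labels the first cycle met while reading y after x."""
    prefix_path = run(x)
    if prefix_path is None:
        return None
    path = run(y, prefix_path[-1])
    if path is None:
        return None
    seen = {}
    for pos, state in enumerate(path):
        if state in seen:
            start = seen[state]
            return y[:start], y[start:pos], y[pos:]
        seen[state] = pos
    return None  # no state repeats while reading y


x, y, z = "a", "bcabc", "d"
u, v, w = split_at_cycle(x, y)
pumped = [x + u + v * i + w + z for i in range(4)]
print((u, v, w))                        # ('', 'bca', 'bc')
print(all(accepts(s) for s in pumped))  # True
```

With $|y| = 5$ the run over $y$ visits six states, so some state must repeat; here the repeated state is A and the pumped part is $v = bca$.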
This pumping lemma is useful in showing that a language does not belong to the family of regular languages. Its application is typical of pumping lemmas in general; they are used negatively to show that a given language does not belong to some family.
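Theorem 1 enables us to prove that a language $L$ is not regular by showing that
\[
\begin{array}{l}
\text{for all } n \in \mathbb{N} : \\
\quad \text{there exist } x, y, z : xyz \in L \wedge |y| \ge n : \\
\qquad \text{for all } u, v, w : y = uvw \wedge |v| > 0 : \\
\qquad\quad \text{there exists } i \in \mathbb{N} : x u v^i w z \notin L
\end{array}
\]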
In all applications of the pumping lemma in this chapter, this is the formulation we will use.
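As a sketch of where this formulation comes from: it is the negation of the property in Theorem 1, obtained by swapping each quantifier while keeping its range. The abbreviations $R_1$ and $R_2$ are introduced here only for readability, with $R_1 \equiv xyz \in L \wedge |y| \ge n$ and $R_2 \equiv y = uvw \wedge |v| > 0$.

\[
\begin{array}{ll}
 & \neg\,(\exists n \in \mathbb{N} : \forall x, y, z : R_1 : \exists u, v, w : R_2 : \forall i \in \mathbb{N} : x u v^i w z \in L) \\
\equiv & \forall n \in \mathbb{N} : \exists x, y, z : R_1 : \forall u, v, w : R_2 : \exists i \in \mathbb{N} : x u v^i w z \notin L
\end{array}
\]

Since every regular language satisfies the property of Theorem 1, a language that satisfies its negation cannot be regular.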
Note that if $n = 0$, we can choose $y = \epsilon$, and since there is no $v$ with $|v| > 0$ such that $y = uvw$, the statement above holds for all such $v$ (namely none!).
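As an example, we will prove that the language $L = \{a^m b^m \mid m \ge 0\}$ is not regular.
Let $n \in \mathbb{N}$.
Take $s = a^n b^n$ with $x = \epsilon$, $y = a^n$, and $z = b^n$.
Let $u, v, w$ be such that $y = uvw$ with $v \ne \epsilon$, that is, $u = a^p$, $v = a^q$ and $w = a^r$ with $p + q + r = n$ and $q > 0$. Take $i = 2$, then
\[
\begin{array}{ll}
 & x u v^2 w z \notin L \\
\Leftarrow & \{\text{definition of } x, u, v, w, z;\ \text{calculus}\} \\
 & a^{p+2q+r} b^n \notin L \\
\Leftarrow & \{p + q + r = n\} \\
 & n + q \ne n \\
\Leftarrow & \{\text{arithmetic}\} \\
 & q > 0 \\
\Leftarrow & \{q > 0\} \\
 & \text{true}
\end{array}
\]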
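The derivation can be sanity-checked by brute force for small $n$. The following is a minimal sketch, not a proof: it only confirms, for a few values of $n$, that every split $y = uvw$ with $v \ne \epsilon$ is defeated by $i = 2$. The helper names `in_language`, `splits` and `pumping_defeats_every_split` are hypothetical.

```python
def in_language(s):
    """Membership test for L = {a^m b^m | m >= 0}."""
    m = len(s) // 2
    return s == "a" * m + "b" * m


def splits(y):
    """Yield all (u, v, w) with y == u + v + w and v nonempty."""
    for start in range(len(y)):
        for end in range(start + 1, len(y) + 1):
            yield y[:start], y[start:end], y[end:]


def pumping_defeats_every_split(n, i=2):
    """Check that x u v^i w z leaves L for every split of y = a^n."""
    x, y, z = "", "a" * n, "b" * n
    assert in_language(x + y + z)  # the chosen sentence must be in L
    return all(not in_language(x + u + v * i + w + z) for u, v, w in splits(y))


results = [pumping_defeats_every_split(n) for n in range(8)]
print(all(results))  # True
```

For $n = 0$ there are no splits with a nonempty $v$, so the check holds vacuously.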
Note that the language $L = \{a^m b^m \mid m \ge 0\}$ is context-free, and together with the fact that each regular grammar is also a context-free grammar it follows immediately that the set of regular languages is strictly smaller than the set of context-free languages.
Note that here we use the pumping lemma (and not the proof of the pumping lemma) to prove that a language is not regular. This kind of proof can be viewed as a kind of game: ‘for all’ is about an arbitrary element which can be chosen by the opponent; ‘there exists’ is about a particular element which you may choose. Choosing the right elements helps you ‘win’ the game, where winning means proving that a language is not regular.
Exercise 9.1. Prove that the following language is not regular.
$\{a^{k^2} \mid k \ge 0\}$
Exercise 9.2. Show that the following language is not regular.
$\{x \mid x \in \{a, b\}^{*} \wedge \mathit{nr}\ a\ x < \mathit{nr}\ b\ x\}$, where $\mathit{nr}\ a\ x$ is the number of occurrences of $a$ in $x$.
Exercise 9.3. Prove that the following language is not regular.
$\{a^k b^m \mid k \le m \le 2k\}$
Exercise 9.4. Show that the following language is not regular.
$\{a^k b^l a^m \mid k > 5 \wedge l > 3 \wedge m \le l\}$
9.3 The pumping lemma for context-free languages