Regular grammars
5.2 Regular grammars
This section defines regular grammars, a special kind of context-free grammars. Subsection 5.2.1 gives the correspondence between nondeterministic finite-state au- tomata and regular grammars.
Definition 8: Regular Grammar
A regular grammar G is a context free grammar (T, N, R, S) in which all production
regular
rules in R are of one of the following two forms:
gram- mar
A → xB
A →x with x ∈ T ∗ and A, B ∈ N . So in every rule there is at most one nonterminal, and
if there is a nonterminal present, it occurs at the end. 2 The regular grammars as defined here are sometimes called right-regular grammars.
There is a symmetric definition for left-regular grammars.
Definition 9: Regular Language
A regular language is a language that is generated by a regular grammar. 2 regular
From the definition it is clear that each regular language is context-free. The question
lan-
is now: Is each context-free language regular? The answer is: No. There are context-
guage free languages that are not regular; an example of such a language is {a n b n | n ∈ IIN}.
To understand this, you have to know how to prove that a language is not regular. Because of its subtlety, we postpone this kind of proofs until Chapter 9. Here it suffices to know that regular languages form a proper subset of context-free languages and that we will profit from their speciality in the recognition process.
A first similarity between regular languages and context-free languages is that both are closed under union, concatenation and Kleene-star.
Theorem 10:
Let L and M be regular languages, then
L∪M is regular
LM is regular
L ∗ is regular
Proof: Let G L = (T, N L ,R L ,S L ) and G M = (T, N M ,R M ,S M ) be regular grammars for L and M respectively, then
• for regular grammars, the well-known union construction for context-free gram-
mars is a regular grammar again; • we obtain a regular grammar for LM if we replace, in G L , each production of
the form T → x and T → by T → xS M and T → S M , respectively; • since L ∗ = { } ∪ LL ∗ , it follows from the above that there exists a regular
grammar for L ∗ .
Regular Languages
In addition to these closure properties, regular languages are closed under intersec- tion and complement too. See the exercises. This is remarkable because context-free
languages are not closed under these operations. Recall the language L = L 1 ∩L 2
where L 1 = {a n n b c m | n, m ∈ IIN} and L
m
2 = {a n b c m | n, m ∈ IIN}.
As for context-free languages, there may exist more than one regular grammar for
a given regular language and these regular grammars may be transformed into each other. We conclude this section with a grammar transformation:
Theorem 11:
For each regular grammar G there exists a regular grammar G 0 with start-symbol
S 0 such that
L(G) = L(G 0 )
and such that G 0 has no productions of the form U → V and W → , with W 6= S 0 .
In other words: every regular grammar can be transformed to a form where every production has a nonempty terminal string in its righthandside (with a possible exception for S → ).
The proof of this transformation is omitted, we only briefly describe the construction of such a regular grammar, and illustrate the construction with an example.
Given a regular grammar G, a regular grammar with the same language but without productions of the form U → V and W → for all U , V , and all W 6= S is obtained
as follows. First, consider all pairs Y , Z of nonterminals of G such that Y ∗ ⇒ Z. Add productions Y → z to the grammar, with Z → z a production of the original
grammar, and z not a single nonterminal. Remove all productions U → V from
G. Finally, remove all productions of the form W → for W 6= S, and for each production U → xW add the production U → x. The following example illustrates this construction.
Consider the following regular grammar G.
The grammar G 0 of the desired form is constructed in 3 steps.
Step 1
Let G 0 equal G. Step 2
Consider all pairs of nonterminals Y and Z. If Y ∗ ⇒ Z, add the productions Y → z to
5.2 Regular grammars
G 0 , with Z → z a production of the original grammar, and z not a single nonterminal.
Furthermore, remove all productions of the form U → V from G 0 . In the example
we remove the productions S → A, S → C, A → S, and we add the productions
S → bB and S → since S ∗ ⇒ A, and the production S → c since S ⇒ C, and the productions A → aA and A → bB since A ∗ ⇒ S, and the production A → c since
∗
A ∗ ⇒ C. We obtain the grammar with the following productions.
C →c This grammar generates the same language as G, and has no productions of the
form U → V . It remains to remove productions of the form W → for W 6= S. Step 3
Remove all productions of the form W → for W 6= S, and for each production U → xW add the production U → x. Applying this transformation to the above grammar gives the following grammar.
C →c Each production in this grammar is of one of the desired forms: U → x or U → xV ,
and the language of the grammar G 0 we thus obtain is equal to the language of
grammar G.
Regular Languages
5.2.1 Equivalence of Regular grammars and Finite automata
In the previous section, we introduced finite-state automata. Here we show that regular grammars and nondeterministic finite-state automata are two sides of one coin.
We will prove the equivalence using theorem 7. The equivalence consists of two parts, formulated in the theorems 12 and 13 below. The basis for both theorems is the direct correspondence between a production A → xB and a transition
A x ?>=< 89:;
B
Theorem 12: Regular grammar for NFA
For each NFA M there exists a regular grammar G such that
Lnfa(M ) = L(G)
Proof: We will just sketch the construction, the formal proof can be found in the literature. Let (X, Q, d, S, F ) be a DFA for NFA M . Construct the grammar
G = (X, Q, R, S) where • the terminals of the grammar are the input alphabet of the automaton;
• the nonterminals of the grammar are the states of the automaton; • the start state of the grammar is the start state of the automaton; • the productions of the grammar correspond to the automaton transitions:
a rule A → xB for each transition
x
A ?>=< 89:; B
a rule A → for each terminal state A. In formulae:
R = {A → xB | A, B ∈ Q, x ∈ X, d A x = B}
∪ {A →
|A∈F}
Theorem 13: NFA for regular grammar
For each regular grammar G there exists a nondeterministic finite-state automaton M such that
L(G) = Lnfa(M )
Proof: Again, we will just sketch the construction, the formal proof can be found in the literature. The construction consists of two steps: first we give a direct trans- lation of a regular grammar to an automaton and then we transform the automaton into a suitable shape.
From a grammar to an automaton. Let G = (T, N, R, S) be a regular grammar without productions of the form U → V and W → for W 6= S. Construct NFA M = (X, Q, d, {S}, F ) where
5.3 Regular expressions
• The input alphabet of the automaton are the nonempty terminal strings (!)
that occur in the rules of the grammar:
X = {x ∈ T + | A, B ∈ N, A → xB ∈ R} ∪ {x ∈ T + | A ∈ N, A → x ∈ R}
• The states of the automaton are the nonterminals of the grammar extended
with a new state N f . Q = N ∪ {N f } • The transitions of the automaton correspond to the grammar productions:
for each rule A → xB we get a transition
A x ?>=< 89:;
B
for each rule A → x with nonempty x, we get a transition
A x GFED ABC N
f
In formulae: For all A ∈ N and x ∈ X:
dAx = {B | B ∈ N, A → xB ∈ R}
∪ {N f | A → x ∈ R}
• The final states of the automaton are N f and possibly S, if S → is a grammar
production.
F = {N f } ∪ {S | S → ∈ R}
Lnfa(M ) = L(G), because of the direct correspondence between derivation steps in
G and transitions in M . Transforming the automaton to a suitable shape. There is a minor flaw in the au-
tomaton given above: the grammar and the automaton have different alphabets. This shortcoming will be remedied by an automaton transformation which yields an equivalent automaton with transitions labelled by elements of T (instead of T ∗ ). The transformation is relatively easy and is depicted in the diagram below. In order
to eliminate transition d q x = q 0 where x = x 1 x 2 ...x k+1 with k > 0 and x i ∈ T for
all i, add new (nonfinal) states p 1 ,...,p k to the existing ones and new transitions
dqx 1 =p 1 ,dp 1 x 2 =p 2 ,...,dp k x k+1 =q 0 .
b ...
- b b b b pppppppppppppppppp b - b
It is intuitively clear that the resulting automaton is equivalent to the original one.
Carry out this transformation for each M -transition d q x = q 0 with |x| > 1 in order
to get an automaton for G with the same input alphabet T .