E.1 Chomsky’s String Grammars
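Let $\Sigma$ be a set of symbols (an alphabet). A string that does not include any symbol is called the empty word and is denoted by $\lambda$. The set of all nonempty strings over $\Sigma$ is denoted by $\Sigma^{+}$, and the set of all strings over $\Sigma$, including the empty word, is denoted by $\Sigma^{*}$. It can be defined as $\Sigma^{*} = \Sigma^{+} \cup \{\lambda\}$. Let $S_1, S_2$ be sets of strings. $S_1 S_2$ denotes the set of strings $S_1 S_2 = \{\alpha\beta : \alpha \in S_1, \beta \in S_2\}$, i.e. the set consisting of strings that are catenations of strings belonging to $S_1$ with strings belonging to $S_2$.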
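The catenation of two sets of strings can be illustrated with a minimal sketch. It assumes that strings are modelled as Python `str` values and the empty word λ as the empty string; the names `catenate`, `LAMBDA`, `S1` and `S2` are hypothetical and chosen only for this illustration.

```python
LAMBDA = ""  # assumption: the empty word λ is modelled as the empty string


def catenate(s1, s2):
    """Return S1S2 = {αβ : α ∈ S1, β ∈ S2} for two sets of strings."""
    return {alpha + beta for alpha in s1 for beta in s2}


S1 = {"a", "ab"}
S2 = {"b", LAMBDA}

# Every string of S1 is catenated with every string of S2.
result = catenate(S1, S2)
print(sorted(result))
```

Because the result is a set, a string that arises from two different pairs (here "ab" from "a"+"b" and from "ab"+λ) is stored only once.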
Now, we introduce four classes of grammars of the Noam Chomsky model [141, 250].
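Definition E.1
A phrase-structure grammar (unrestricted grammar, type-0 grammar) is a quadruple
$G = (\Sigma_N, \Sigma_T, P, S)$, where
$\Sigma_N$ is a set of nonterminal symbols,
$\Sigma_T$ is a set of terminal symbols,
$P$ is a finite set of productions of the form $\alpha \to \gamma$, in which $\alpha \in \Sigma^{*}\Sigma_N\Sigma^{*}$ is called the left-hand side of the production and $\gamma \in \Sigma^{*}$ is called the right-hand side of the production,
$S \in \Sigma_N$ is the start symbol (axiom).
Here $\Sigma = \Sigma_N \cup \Sigma_T$, and we assume that $\Sigma_N \cap \Sigma_T = \emptyset$.
© Springer International Publishing Switzerland 2016. M. Flasiński, Introduction to Artificial Intelligence, DOI 10.1007/978-3-319-40022-8
Appendix E: Formal Models for Artificial Intelligence Methods …
Definition E.2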
Let $\beta, \delta \in \Sigma^{*}$. We denote $\beta \Rightarrow_G \delta$ (or $\beta \Rightarrow \delta$, if $G$ is assumed) iff $\beta = \eta_1 \alpha \eta_2$, $\delta = \eta_1 \gamma \eta_2$ and $\alpha \to \gamma \in P$, where $P$ is the set of productions of the grammar $G$. We say that $\beta$ directly derives $\delta$ in the grammar $G$, and we call such direct deriving a derivational step in the grammar $G$. The reflexive and transitive closure of the relation $\Rightarrow$, denoted by $\Rightarrow^{*}$, is called a derivation in the grammar $G$.
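A derivational step can be sketched in code as follows. The sketch assumes that every symbol is a single character, that a sentential form is a Python string, and that a production α → γ is stored as a pair `(alpha, gamma)`; the function name `derive_step` is hypothetical.

```python
def derive_step(beta, productions):
    """Return every δ with β ⇒ δ, i.e. β = η1 α η2, δ = η1 γ η2, α → γ ∈ P."""
    results = set()
    for alpha, gamma in productions:
        start = beta.find(alpha)
        while start != -1:
            # Split β around this occurrence of the left-hand side α.
            eta1 = beta[:start]
            eta2 = beta[start + len(alpha):]
            results.add(eta1 + gamma + eta2)
            # Look for a further (possibly overlapping) occurrence of α.
            start = beta.find(alpha, start + 1)
    return results


# Assumed example grammar: S → aSb | ab.
P = [("S", "aSb"), ("S", "ab")]
one_step = derive_step("aSb", P)
print(sorted(one_step))
```

Each occurrence of a left-hand side yields one directly derived string, so a single sentential form may have several successors. A string without any applicable production, such as "ab" here, has none.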
Definition E.3 The language generated by the grammar $G$ is the set
$$L(G) = \{\phi \in \Sigma_T^{*} : S \Rightarrow^{*} \phi\}.$$
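For a small grammar, the words of L(G) up to a given length can be enumerated by a breadth-first search over sentential forms. The following is a sketch under two assumptions that are not part of the definition: symbols are single characters, and no production shortens a string, so that forms longer than the bound can be discarded safely. The name `generate` and its parameters are hypothetical.

```python
from collections import deque


def generate(productions, start, terminals, max_len):
    """Collect the words φ ∈ Σ_T* with S ⇒* φ and |φ| ≤ max_len.

    Assumes that no production shortens a string; otherwise pruning
    long sentential forms could lose words of the language.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(symbol in terminals for symbol in form):
            # A purely terminal form is a word and cannot be rewritten.
            words.add(form)
            continue
        for alpha, gamma in productions:
            pos = form.find(alpha)
            while pos != -1:
                new_form = form[:pos] + gamma + form[pos + len(alpha):]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                pos = form.find(alpha, pos + 1)
    return words


# Assumed example grammar: S → aSb | ab.
P = [("S", "aSb"), ("S", "ab")]
words = generate(P, "S", {"a", "b"}, 6)
print(sorted(words, key=len))
```

The length bound is what makes the search terminate; L(G) itself is infinite for this grammar.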
Definition E.4
A context-sensitive grammar (type-1 grammar) is a quadruple
$G = (\Sigma_N, \Sigma_T, P, S)$, where
$\Sigma_N$, $\Sigma_T$, $S$ are defined as in Definition E.1,
$P$ is a set of productions of the form $\eta_1 A \eta_2 \to \eta_1 \gamma \eta_2$, in which $\eta_1, \eta_2 \in \Sigma^{*}$, $A \in \Sigma_N$, $\gamma \in \Sigma^{+}$. Additionally, we assume that a production of the form $A \to \lambda$ is allowable if $A$ does not occur on the right-hand side of any production of $P$.
Definition E.5
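A context-free grammar (type-2 grammar) is a quadruple
$G = (\Sigma_N, \Sigma_T, P, S)$, where
$\Sigma_N$, $\Sigma_T$, $S$ are defined as in Definition E.1,
$P$ is a set of productions of the form $A \to \gamma$, in which $A \in \Sigma_N$, $\gamma \in \Sigma^{*}$.
Definition E.6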
A regular (or right-regular) grammar (type-3 grammar) is a quadruple
$G = (\Sigma_N, \Sigma_T, P, S)$, where
$\Sigma_N$, $\Sigma_T$, $S$ are defined as in Definition E.1,
$P$ is a set of productions of the form $A \to \gamma$, in which $A \in \Sigma_N$, $\gamma \in \Sigma_T \cup \Sigma_T\Sigma_N \cup \{\lambda\}$.
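The four production forms can be compared side by side with a small classifier. This is only a sketch: it assumes single-character symbols, productions stored as pairs `(alpha, gamma)` with λ written as the empty string, and the usual forms A → a, A → aB, A → λ for regular productions. It also does not verify that each left-hand side contains a nonterminal. All function names are hypothetical.

```python
def is_regular(alpha, gamma, N, T):
    """A → a, A → aB or A → λ (type-3 form)."""
    if not (len(alpha) == 1 and alpha in N):
        return False
    if gamma == "":
        return True
    if len(gamma) == 1:
        return gamma in T
    return len(gamma) == 2 and gamma[0] in T and gamma[1] in N


def is_context_free(alpha, N):
    """A → γ with a single nonterminal on the left (type-2 form)."""
    return len(alpha) == 1 and alpha in N


def is_context_sensitive(alpha, gamma, N):
    """η1 A η2 → η1 γ' η2 with nonempty γ' (type-1 form)."""
    for i, symbol in enumerate(alpha):
        if symbol not in N:
            continue
        eta1, eta2 = alpha[:i], alpha[i + 1:]
        middle = len(gamma) - len(eta1) - len(eta2)
        if middle >= 1 and gamma.startswith(eta1) and gamma.endswith(eta2):
            return True
    return False


def chomsky_type(productions, N, T):
    """Return the most restrictive type (3, 2, 1 or 0) met by all productions."""
    rhs_symbols = {s for _, gamma in productions for s in gamma}

    def type1_ok(alpha, gamma):
        # A → λ is allowed only if A occurs on no right-hand side.
        if gamma == "" and is_context_free(alpha, N):
            return alpha not in rhs_symbols
        return is_context_sensitive(alpha, gamma, N)

    if all(is_regular(a, g, N, T) for a, g in productions):
        return 3
    if all(is_context_free(a, N) for a, _ in productions):
        return 2
    if all(type1_ok(a, g) for a, g in productions):
        return 1
    return 0


print(chomsky_type([("S", "aS"), ("S", "b")], {"S"}, {"a", "b"}))
print(chomsky_type([("S", "aSb"), ("S", "ab")], {"S"}, {"a", "b"}))
```

The classifier inspects the form of the productions only. A language generated by a grammar that is classified as type 0 here may still have an equivalent grammar of a more restrictive type.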
As we discussed in Chap. 8, a context-free grammar has sufficient descriptive power for most applications of syntactic pattern recognition systems. Unfortunately, a pushdown automaton that analyzes context-free languages is inefficient in the sense of computational complexity. Therefore, certain subclasses of context-free grammars have been defined such that the corresponding automata are efficient. LL(k) grammars, introduced in an intuitive way in Chap. 8, are one of the most popular such subclasses. Let us characterize them in a formal way [180].
Definition E.7
Let $G = (\Sigma_N, \Sigma_T, P, S)$ be a context-free grammar defined as in Definition E.5, $\eta \in \Sigma^{*}$, and let $|x|$ denote the length (the number of symbols) of a string $x \in \Sigma^{*}$. $\mathit{FIRST}_k(\eta)$ denotes the set of all the terminal prefixes of length $k$ (or of length less than $k$, if a terminal string shorter than $k$ is derived from $\eta$) of strings that can be derived from $\eta$ in the grammar $G$, i.e.
$$\mathit{FIRST}_k(\eta) = \{x \in \Sigma_T^{*} : (\eta \Rightarrow^{*} x\beta \wedge |x| = k) \vee (\eta \Rightarrow^{*} x \wedge |x| < k),\ \beta \in \Sigma^{*}\}.$$
Let $G = (\Sigma_N, \Sigma_T, P, S)$ be a context-free grammar. $\Rightarrow^{*}_{L}$ denotes a leftmost derivation in the grammar $G$, i.e. a derivation such that a production is always applied to the leftmost nonterminal.
Definition E.8
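Let $G = (\Sigma_N, \Sigma_T, P, S)$ be a context-free grammar defined as in Definition E.5. A grammar $G$ is called an LL(k) grammar iff for every two leftmost derivations
$$S \Rightarrow^{*}_{L} \alpha A \delta \Rightarrow_{L} \alpha\beta\delta \Rightarrow^{*}_{L} \alpha x,$$
$$S \Rightarrow^{*}_{L} \alpha A \delta \Rightarrow_{L} \alpha\gamma\delta \Rightarrow^{*}_{L} \alpha y,$$
in which $\alpha, x, y \in \Sigma_T^{*}$, $\beta, \gamma, \delta \in \Sigma^{*}$, $A \in \Sigma_N$, the following condition holds:
If $\mathit{FIRST}_k(x) = \mathit{FIRST}_k(y)$, then $\beta = \gamma$.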
The condition formulated above for a grammar G means that for any derivational step of a derivation of a string w that is derivable in G, we can choose a production in an unambiguous way on the basis of an analysis of some part of w that is of length k. We say that the grammar G has the property of an unambiguous choice of a production with respect to the k-length prefix in a leftmost derivation.
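The FIRST sets and the unambiguous choice of a production can be illustrated with a short sketch. It computes the sets by a fixed-point iteration, which is a standard technique and not necessarily the one used in the cited literature. The sketch assumes single-character symbols, a grammar stored as a dictionary from nonterminals to lists of right-hand sides, and that every nonterminal derives some terminal string. The names `concat_k`, `first_of_string` and `first_k_table` are hypothetical.

```python
def concat_k(left, right, k):
    """Catenate two sets of terminal strings and keep prefixes of length ≤ k."""
    return {(x + y)[:k] for x in left for y in right}


def first_of_string(eta, table, k):
    """FIRST_k of a string η, given the FIRST_k sets of the nonterminals."""
    result = {""}
    for symbol in eta:
        # A terminal contributes itself; a nonterminal contributes its set.
        result = concat_k(result, table.get(symbol, {symbol}), k)
    return result


def first_k_table(grammar, k):
    """Compute FIRST_k(A) for every nonterminal A by fixed-point iteration."""
    table = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, table, k)
                if not new <= table[nonterminal]:
                    table[nonterminal] |= new
                    changed = True
    return table


# Assumed example grammar: S → aSb | c.
grammar = {"S": ["aSb", "c"]}
table = first_k_table(grammar, 1)
first_sets = [first_of_string(rhs, table, 1) for rhs in grammar["S"]]
print(first_sets)
```

For this grammar the two right-hand sides of S have disjoint FIRST sets for k = 1, so one symbol of lookahead is enough to select the production. Disjointness of these sets is a sufficient test only in simple cases such as this one (k = 1, no production with an empty right-hand side); the general condition is the one stated in Definition E.8.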