Generation of Structural Patterns

8.1 Generation of Structural Patterns

We begin our definition of a generative grammar in the Chomsky model [47] by introducing a set of terminal symbols. Terminal symbols are expressions which are used for constructing sentences belonging to a language generated by a grammar.² Let us assume that we construct a grammar for a subset of the English language consisting of sentences built from one of four subjects (male names): Hector, Victor, Hyacinthus, Pacificus, one of two predicates: accepts, rejects, and one of four objects: globalism, conservatism, anarchism, pacifism. For example, the sentences Hector accepts globalism and Hyacinthus rejects conservatism belong to the language.³

Let us denote this language by $L_1$. Then, a set of terminal symbols $T_1$ is defined as follows: $T_1 = \{$Hector, Victor, Hyacinthus, Pacificus, accepts, rejects, globalism, conservatism, anarchism, pacifism$\}$. As we have mentioned above, sentences are generated with rewriting rules called productions.⁴ Let us consider an application of a production with the following example. Let there be given a phrase:

$$\text{Hector accepts } B, \tag{8.1}$$

where $B$ is an auxiliary symbol, called a nonterminal symbol,⁵ which denotes an object in a sentence. Let there be given a production of the form:

$$B \rightarrow \text{globalism}. \tag{8.2}$$

An expression placed on the left-hand side of an arrow is called the left-hand side of a production and an expression placed on the right-hand side of an arrow is called the right-hand side of a production. An application of the production to the phrase, denoted by $\Longrightarrow$, consists of replacing an expression of the phrase which is equivalent to the left-hand side of the production (in our case it is the symbol $B$) by the right-hand side of the production. Thus, an application of the production is denoted in the following way:

$$\text{Hector accepts } B \Longrightarrow \text{Hector accepts globalism}. \tag{8.3}$$

² Let us remember that the notions of word and sentence are treated symbolically in formal language theory. For example, if we define a grammar which generates single words of the English language, then letters are terminal symbols. Then, English words which consist of these letters are called words (sentences) of a formal language. However, if we define a grammar which generates sentences of the English language, then English words can be treated as terminal symbols. Then sentences of the English language are called words (sentences) of a formal language. As we see later on, words (sentences) of a formal language generated by a grammar can represent any string structures, e.g., stock charts, charts in medicine (ECG, EEG), etc. Since in this section we use examples from a natural language, strings consisting of symbols are called sentences.

³ We omit the terminal symbol of a full stop in all examples in order to simplify our considerations. Of course, if we use generative grammars for Natural Language Processing (NLP), we should use a full stop symbol.

⁴ We call them productions, because they are used for generating (“producing”) sentences of a language.

⁵ Nonterminal symbols are usually denoted by capital letters.

Now, we define a set of all productions which generate our language. Let us denote this set by $P_1$:

(1) S → Hector A
(2) S → Victor A
(3) S → Hyacinthus A
(4) S → Pacificus A
(5) A → accepts B
(6) A → rejects B
(7) B → globalism
(8) B → conservatism
(9) B → anarchism
(10) B → pacifism.

For example, the sentence Pacificus rejects globalism is generated as follows:

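$$S \overset{4}{\Longrightarrow} \text{Pacificus } A \overset{6}{\Longrightarrow} \text{Pacificus rejects } B \overset{7}{\Longrightarrow} \text{Pacificus rejects globalism}. \tag{8.4}$$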

Indices of the productions applied are placed above double arrows. A sequence of production applications used for generating a sentence is called a derivation of this sentence. If we are not interested in presenting the sequence of derivational steps, then we can simply write:

$$S \overset{*}{\Longrightarrow} \text{Pacificus rejects globalism}, \tag{8.5}$$

which means that we can generate (derive) a sentence Pacificus rejects globalism with the help of grammar productions starting with a symbol $S$.

As we can see, our set of nonterminal symbols, which we denote by $N_1$, consists of three auxiliary symbols $S$, $A$, $B$, which are responsible for generating a subject, a predicate, and an object, i.e.,

$$N_1 = \{S, A, B\}.$$

In a generative grammar a nonterminal symbol which is used for starting any derivation is called the start symbol (axiom) and it is denoted by $S$. Thus, our grammar $G_1$ can be defined as a quadruple $G_1 = (T_1, N_1, P_1, S)$. A language generated by a grammar $G_1$ is denoted $L(G_1)$. The language $L(G_1)$ is the set of all the sentences which can be derived with the help of productions of the grammar $G_1$. It can be proved that for our language $L_1$, which has been introduced in an informal way at the beginning of this section, the following holds: $L(G_1) = L_1$.
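The definition of $G_1$ can be tried out mechanically. The following is a minimal Python sketch, not part of the original text: the dictionary layout and the helper names `apply_production`, `derive`, and `language` are illustrative assumptions. It applies productions by their indices and enumerates every sentence derivable from $S$.

```python
# Productions P_1 of grammar G_1, keyed by the indices used in the text.
# Each entry is (left-hand side, right-hand side as a list of symbols).
P1 = {
    1: ("S", ["Hector", "A"]),
    2: ("S", ["Victor", "A"]),
    3: ("S", ["Hyacinthus", "A"]),
    4: ("S", ["Pacificus", "A"]),
    5: ("A", ["accepts", "B"]),
    6: ("A", ["rejects", "B"]),
    7: ("B", ["globalism"]),
    8: ("B", ["conservatism"]),
    9: ("B", ["anarchism"]),
    10: ("B", ["pacifism"]),
}
NONTERMINALS = {"S", "A", "B"}  # the set N_1


def apply_production(phrase, index):
    """Replace the leftmost occurrence of the production's left-hand side."""
    lhs, rhs = P1[index]
    pos = phrase.index(lhs)  # raises ValueError if the production does not apply
    return phrase[:pos] + rhs + phrase[pos + 1:]


def derive(indices, start="S"):
    """Apply the productions with the given indices, starting from the start symbol."""
    phrase = [start]
    for index in indices:
        phrase = apply_production(phrase, index)
    return phrase


def language():
    """Enumerate all sentences derivable from S (finite, since P_1 has no recursion)."""
    sentences, pending = set(), [["S"]]
    while pending:
        phrase = pending.pop()
        if not any(symbol in NONTERMINALS for symbol in phrase):
            sentences.add(" ".join(phrase))  # only terminal symbols are left
            continue
        for index, (lhs, _) in P1.items():
            if lhs in phrase:
                pending.append(apply_production(phrase, index))
    return sentences


print(" ".join(derive([4, 6, 7])))  # Pacificus rejects globalism
print(len(language()))              # 32 = 4 subjects * 2 predicates * 4 objects
```

The enumeration terminates only because no production of $P_1$ refers back to a symbol already used; for a grammar with recursive productions the same loop would not stop.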

A generative grammar has an interesting property: it is a finite “object” (it consists of finite sets of terminal and nonterminal symbols and a finite set of productions), yet it can generate an infinite language, i.e., a language which consists of an infinite number of sentences. Before we consider this property, let us introduce the following notation. Let $a$ be a symbol. By $a^n$ we denote an expression which consists of an $n$-element sequence of symbols $a$, i.e.,

$$a^n = \underbrace{a a \ldots a}_{n \text{ times}},$$

where $n \geq 0$. If $n = 0$, then it is a 0-element sequence of symbols $a$, called the empty word and denoted by $\lambda$. Thus: $a^0 = \lambda$, $a^1 = a$, $a^2 = aa$, $a^3 = aaa$, etc. For example, let us define a language $L_2$ in the following way:

$$L_2 = \{a^n,\ n \geq 1\}. \tag{8.7}$$

The language $L_2$ is infinite and consists of $n$-element sequences of symbols $a$, where additionally $a$ has to occur at least once. (The empty word does not belong to $L_2$.) It can be generated by the following simple grammar $G_2$:

$$G_2 = (T_2, N_2, P_2, S),$$

where $T_2 = \{a\}$, $N_2 = \{S\}$, and the set of productions $P_2$ contains the following productions:

(1) S → aS
(2) S → a.

A derivation of a sentence of a given length $r \geq 2$ (i.e., $n = r$) in $G_2$ is performed according to the following scheme:

$$S \overset{1}{\Longrightarrow} aS \overset{1}{\Longrightarrow} aaS \overset{1}{\Longrightarrow} \ldots \overset{1}{\Longrightarrow} a^{r-1}S \overset{2}{\Longrightarrow} a^r, \tag{8.8}$$

i.e., firstly the first production is applied $(r - 1)$ times, then the second production is applied once at the end. If we want to generate a sentence of length $n = 1$, i.e., the sentence $a$, then we apply the second production once.

Let us notice that defining a language $L_2$ as an infinite set is possible due to the first production. This production is of an interesting form: S → aS, which means that it refers to itself (a symbol $S$ occurs on the left- and right-hand sides of the production). Such a form is called recursive, from Latin recurrere, “running back”. This “running back” of the symbol $S$ during a derivation, each time after applying the first production, makes the language $L_2$ infinite.
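The effect of the recursive production can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `derive_a_power` is hypothetical, and phrases are modelled as plain strings in which the character `S` stands for the nonterminal symbol.

```python
def derive_a_power(r):
    """Return the successive phrases of a derivation of a^r in G_2 (r >= 1)."""
    if r < 1:
        raise ValueError("the empty word does not belong to L_2")
    phrase = "S"
    forms = [phrase]
    for _ in range(r - 1):
        phrase = phrase.replace("S", "aS")  # production (1): S -> aS
        forms.append(phrase)
    phrase = phrase.replace("S", "a")       # production (2): S -> a
    forms.append(phrase)
    return forms


print(" => ".join(derive_a_power(4)))  # S => aS => aaS => aaaS => aaaa
```

Because `r` can be any positive integer, the same two productions yield a sentence of every length, which is the sense in which the finite grammar generates an infinite language.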

Now, we discuss a very important issue concerning generative grammars. There are a lot of classes (types) of generative grammars. These classes can be arranged in a hierarchy according to the criterion of their generative power. We introduce this criterion with the help of the following example. Let us define the next formal language as follows:

$$L_3 = \{a^n b^m,\ n \geq 1,\ m \geq 2\}.$$

The language $L_3$ consists of a subsequence of symbols $a$ followed by a subsequence of symbols $b$. Additionally, the symbol $a$ has to occur at least once, and the symbol $b$ has to occur at least twice. For example, the sentences abb, aabb, aabbb, aabbbb, ..., aaabb, etc. belong to this language. It can be generated by the following grammar:

$$G_3 = (T_3, N_3, P_3, S),$$

where $T_3 = \{a, b\}$, $N_3 = \{S, A, B\}$, and the set of productions $P_3$ contains the following productions:

(1) S → aA
(2) A → aA
(3) A → bB
(4) B → bB
(5) B → b.

The first production is used for generating the first symbol $a$. The second production generates successive symbols $a$ in a recursive way. We generate the first symbol $b$ with the help of the third production. The fourth production generates successive symbols $b$ in a recursive way (analogously to the second production). The fifth production is used for generating the last symbol $b$ of the sentence. For example, a derivation of the sentence $a^3 b^4$ is performed as follows:

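$$S \overset{1}{\Longrightarrow} aA \overset{2}{\Longrightarrow} aaA \overset{2}{\Longrightarrow} aaaA \overset{3}{\Longrightarrow} aaabB \overset{4}{\Longrightarrow} aaabbB \overset{4}{\Longrightarrow} aaabbbB \overset{5}{\Longrightarrow} aaabbbb. \tag{8.10}$$

Now, let us define a language $L_4$, which is a modification of the language $L_3$, in the following way: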

$$L_4 = \{a^m b^m,\ m \geq 1\}. \tag{8.11}$$

The language $L_4$ differs from the language $L_3$ in demanding an equal number of symbols $a$ and $b$. The language $L_4$ cannot be generated with the help of grammars having productions of the form of grammar $G_3$, since such productions do not ensure the condition of an equal number of symbols. It results from the fact that in such a grammar we firstly generate a certain number of symbols $a$ and then we start to generate symbols $b$, but the grammar “does not remember” how many symbols $a$ have been generated. We say that a grammar having productions in the form of the productions of $G_3$ has too weak generative power to generate the language $L_4$. Now, we introduce a grammar $G_4$ which is able to generate the language $L_4$:

$$G_4 = (T_4, N_4, P_4, S),$$

where $T_4 = \{a, b\}$, $N_4 = \{S\}$, and the set of productions $P_4$ contains the following productions:

(1) S → aSb
(2) S → ab.
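The two productions of $G_4$ can be exercised with a small Python sketch. The names `derive_g4` and `in_l4` are hypothetical helpers introduced only for this illustration, and phrases are modelled as plain strings in which `S` stands for the nonterminal symbol. The sketch also checks that every intermediate phrase holds the same number of symbols $a$ and $b$.

```python
def derive_g4(m):
    """Return the successive phrases of a derivation of a^m b^m in G_4 (m >= 1)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    phrase = "S"
    forms = [phrase]
    for _ in range(m - 1):
        phrase = phrase.replace("S", "aSb")  # production (1): S -> aSb
        forms.append(phrase)
    phrase = phrase.replace("S", "ab")       # production (2): S -> ab
    forms.append(phrase)
    return forms


def in_l4(word):
    """Membership test for L_4, written directly from its definition."""
    m = len(word) // 2
    return m >= 1 and word == "a" * m + "b" * m


forms = derive_g4(3)
print(" => ".join(forms))  # S => aSb => aaSbb => aaabbb
print(all(f.count("a") == f.count("b") for f in forms))  # True
print(in_l4(forms[-1]), in_l4("aabbb"))  # True False
```

The balance holds at every step because a single application of either production adds exactly one $a$ and one $b$.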

For example, a derivation of the sentence $a^4 b^4$ is performed in the following way:

$$S \overset{1}{\Longrightarrow} aSb \overset{1}{\Longrightarrow} aaSbb \overset{1}{\Longrightarrow} aaaSbbb \overset{2}{\Longrightarrow} aaaabbbb. \tag{8.12}$$

As we can see, a solution to the problem of an equal number of symbols $a$ and $b$ is obtained by generating the same number of both symbols in each derivational step. Thus, the grammar $G_4$ has sufficient generative power to generate the language $L_4$.

The generative power of classes of formal grammars results from the form of their productions. Let us notice that the productions of grammars $G_1$, $G_2$, $G_3$ are of the following two forms:

<nonterminal symbol> → <terminal symbol><nonterminal symbol>
or
<nonterminal symbol> → <terminal symbol>.

Using productions of such forms, we can only “stick” a symbol onto the end of the phrase which has been derived till now. Grammars having only productions of such a form are called regular grammars.⁶ In the Chomsky hierarchy such grammars have the weakest generative power. For grammars such as the grammar $G_4$, we do not demand any specific form of the right-hand side of productions. We require only a single nonterminal symbol at the left-hand side of a production. Such grammars are called context-free grammars.⁷ They have greater generative power than regular grammars. However, we have to pay a certain price for increasing the generative power of grammars. We discuss this issue in the next section.
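The distinction between the two classes depends only on the shape of the productions, so it can be checked mechanically. The Python sketch below is an illustration under stated assumptions: productions are represented as pairs of a left-hand symbol and a list of right-hand symbols, and the function names are hypothetical. It tests the two regular forms given above and otherwise falls back to the context-free requirement of a single nonterminal symbol on the left-hand side.

```python
def is_regular_production(lhs, rhs, nonterminals):
    """True for the forms N -> t N and N -> t, where t is a terminal symbol."""
    if lhs not in nonterminals:
        return False
    if len(rhs) == 1:
        return rhs[0] not in nonterminals
    if len(rhs) == 2:
        return rhs[0] not in nonterminals and rhs[1] in nonterminals
    return False


def classify(productions, nonterminals):
    """Name the most restrictive of the two classes that the productions fit."""
    if all(is_regular_production(l, r, nonterminals) for l, r in productions):
        return "regular"
    if all(l in nonterminals for l, _ in productions):
        return "context-free"  # a single nonterminal on every left-hand side
    return "neither"


# Productions of G_3 and G_4 as listed in the text.
P3 = [("S", ["a", "A"]), ("A", ["a", "A"]), ("A", ["b", "B"]),
      ("B", ["b", "B"]), ("B", ["b"])]
P4 = [("S", ["a", "S", "b"]), ("S", ["a", "b"])]

print(classify(P3, {"S", "A", "B"}))  # regular
print(classify(P4, {"S"}))            # context-free
```

Every production of a regular form also has a single nonterminal symbol on its left-hand side, so a grammar reported here as regular meets the context-free requirement as well; the function simply reports the narrower class.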
