Regular expressions

5.3 Regular expressions

  Regular expressions are a classical and convenient way to describe, for example, the structure of terminal words. This section defines regular expressions, defines the language of a regular expression, and shows that regular expressions and regular grammars are equally expressive formalisms. We do not discuss implementations of (datatypes and functions for matching) regular expressions; implementations can be found in the literature, see [9, 6].

  Definition 14: RE T, regular expressions over alphabet T

  The set RE T of regular expressions over alphabet T is inductively defined as follows: for regular expressions R, S

    Ø ∈ RE T
    ε ∈ RE T
    a ∈ RE T
    R + S ∈ RE T
    R S ∈ RE T
    R∗ ∈ RE T
    (R) ∈ RE T

  where a ∈ T. The operator + is associative, commutative, and idempotent; the concatenation operator, written as juxtaposition (so x concatenated with y is denoted by xy), is associative, and ε is the unit of it. In formulae this reads, for all regular expressions R, S, and V,

    R + (S + V) = (R + S) + V
    R + S = S + R
    R + R = R
    R (S V) = (R S) V
    R ε = R = ε R

  □

  Furthermore, the star operator, ∗, binds more strongly than concatenation, and concatenation binds more strongly than +. Examples of regular expressions are:

    (bc)∗ + Ø
    ε + b(ε∗)

  The language (i.e. the “semantics”) of a regular expression over T is a set of T-sequences, defined compositionally on the structure of regular expressions as follows.

  Definition 15: Language of a regular expression

  Function Lre :: RE T → {T∗} returns the language of a regular expression. It is defined inductively by:

    Lre(Ø) = Ø
    Lre(ε) = {ε}
    Lre(b) = {b}
    Lre(x + y) = Lre(x) ∪ Lre(y)
    Lre(xy) = Lre(x) Lre(y)
    Lre(x∗) = (Lre(x))∗

  □

  Since ∪ is associative, commutative, and idempotent, and set concatenation is associative with {ε} as its unit, function Lre is well defined. Note that the language Lre(b∗) is the set consisting of zero, one or more concatenations of b, i.e., Lre(b∗) = ({b})∗. As an example of a language of a regular expression, we compute the language of the regular expression (ε + bc)d.

    Lre((ε + bc)d)
    = (Lre(ε + bc)) (Lre(d))
    = (Lre(ε) ∪ Lre(bc)) {d}
    = ({ε} ∪ (Lre(b))(Lre(c))) {d}
    = {ε, bc} {d}
    = {d, bcd}

  Regular expressions are used to describe the tokens of a language. For example, the list

    if p then e1 else e2

  contains six tokens, three of which are identifiers. An identifier is an element in the language of the regular expression

    letter (letter + digit)∗

  where

    letter = a + b + ... + z + A + B + ... + Z
    digit = 0 + 1 + ... + 9

  see subsection 2.3.1.
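  Definition 15 can be read almost directly as a program. The following is a minimal sketch in Python, not an implementation taken from these notes: the class names (Empty, Eps, Sym, Alt, Cat, Star), the helper any_of, and the function lre are all assumptions made for illustration. Because Lre(x∗) is usually infinite, lre only returns the words of length at most n.

```python
from dataclasses import dataclass
from functools import reduce
import string

# One class per case of the inductive definition (names are illustrative).
@dataclass(frozen=True)
class Empty:          # Ø
    pass

@dataclass(frozen=True)
class Eps:            # ε
    pass

@dataclass(frozen=True)
class Sym:            # a single terminal
    c: str

@dataclass(frozen=True)
class Alt:            # R + S
    l: object
    r: object

@dataclass(frozen=True)
class Cat:            # R S
    l: object
    r: object

@dataclass(frozen=True)
class Star:           # R∗
    r: object


def cat_sets(xs, ys, n):
    """Concatenate two languages, keeping only words of length <= n."""
    return {x + y for x in xs for y in ys if len(x) + len(y) <= n}


def lre(r, n):
    """Words of Lre(r) of length at most n."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.c} if n >= 1 else set()
    if isinstance(r, Alt):
        return lre(r.l, n) | lre(r.r, n)
    if isinstance(r, Cat):
        return cat_sets(lre(r.l, n), lre(r.r, n), n)
    if isinstance(r, Star):
        base = lre(r.r, n)
        result = {""}                      # zero concatenations
        while True:                        # add one more factor until nothing new appears
            bigger = result | cat_sets(result, base, n)
            if bigger == result:
                return result
            result = bigger
    raise TypeError(f"not a regular expression: {r!r}")


def any_of(chars):
    """The sum c1 + c2 + ... of the given terminals."""
    return reduce(Alt, [Sym(c) for c in chars])


# (ε + bc)d, the worked example
example = Cat(Alt(Eps(), Cat(Sym("b"), Sym("c"))), Sym("d"))
print(sorted(lre(example, 3)))             # ['bcd', 'd']

# letter (letter + digit)∗
letter = any_of(string.ascii_letters)
digit = any_of(string.digits)
identifier = Cat(letter, Star(Alt(letter, digit)))
ids = lre(identifier, 2)
print("p" in ids, "e1" in ids, "1e" in ids)   # True True False
```

  Note that a keyword such as if is also a word in the language of letter (letter + digit)∗, so a scanner has to separate keywords from identifiers by some other means.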

  In the beginning of this section we claimed that regular expressions and regular grammars are equivalent formalisms. We will prove this claim later, but first we illustrate the construction of a regular grammar from a regular expression with an example. Consider the following regular expression.

    R = a∗ + ε + (a + b)∗

  We aim at a regular grammar G such that Lre(R) = L(G), and again we take a top-down approach.

  Suppose that nonterminal A generates the language Lre(a∗), nonterminal B generates the language Lre(ε), and nonterminal C generates the language Lre((a + b)∗). Suppose furthermore that the productions for A, B, and C satisfy the conditions imposed upon regular grammars. Then we obtain a regular grammar G with L(G) = Lre(R) by defining

    S → A
    S → B
    S → C

  where S is the start-symbol of G. It remains to construct productions for nonterminals A, B, and C.

  • The nonterminal A with productions

      A → aA
      A → ε

    generates the language Lre(a∗).

  • Since Lre(ε) = {ε}, the nonterminal B with production

      B → ε

    generates the language {ε}.

  • Nonterminal C with productions

      C → aC
      C → bC
      C → ε

    generates the language Lre((a + b)∗).

  For a specific example it is not difficult to construct a regular grammar for a regular expression. We now give the general result.
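  The example grammar can be checked mechanically for short words. The sketch below is only an illustration under stated assumptions: the dictionary PRODUCTIONS, the convention that upper-case letters are nonterminals, and the function generated are not part of the text. It enumerates all words of length at most 3 derivable from S and compares them with Lre(R), which equals the set of all words over {a, b} because the languages of a∗ and ε are contained in that of (a + b)∗.

```python
from itertools import product

# Productions of the example grammar; "" stands for an empty right-hand side (ε).
# Assumed convention: upper-case letters are nonterminals, lower-case are terminals.
PRODUCTIONS = {
    "S": ["A", "B", "C"],
    "A": ["aA", ""],
    "B": [""],
    "C": ["aC", "bC", ""],
}


def generated(productions, start, max_len):
    """All terminal words of length <= max_len derivable from start."""
    words, seen, todo = set(), {start}, [start]
    while todo:
        form = todo.pop()
        # A sentential form of a regular grammar has at most one nonterminal, at the end.
        if not form or not form[-1].isupper():
            words.add(form)
            continue
        prefix, nonterminal = form[:-1], form[-1]
        for rhs in productions[nonterminal]:
            new = prefix + rhs
            terminals = sum(1 for ch in new if not ch.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return words


# Every word over {a, b} of length 0..3
expected = {"".join(w) for k in range(4) for w in product("ab", repeat=k)}
print(generated(PRODUCTIONS, "S", 3) == expected)   # True
```

  Such a bounded comparison is of course no proof that Lre(R) = L(G); it only guards against mistakes in the productions.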

  Theorem 16: Regular Grammar for Regular Expression

  For each regular expression R there exists a regular grammar G such that

  Lre(R) = L(G)

  The proof of this theorem is given in Section 5.4.

  To obtain a regular expression that generates the same language as a given regular grammar we go via an automaton. Given a regular grammar G, we can use the theorems from the previous sections to obtain a DFA D such that

    L(G) = Ldfa(D)


  So if we can obtain a regular expression for a DFA D, we have found a regular expression for a regular grammar. To obtain a regular expression for a DFA D, we interpret each state of D as a regular expression defined as the sum of the concatenations of outgoing terminal symbols with the resulting state. For our example DFA we obtain:

    C = ε

  It is easy to merge these four regular expressions into a single regular expression, partly because this is a simple example. Merging the regular expressions obtained from a DFA that may loop is more complicated, as we will briefly explain in the proof of the following theorem. In general, we have:

  Theorem 17: Regular Expression for Regular Grammar

  For each regular grammar G there exists a regular expression R such that

  L(G) = Lre(R)

  The proof of this theorem is given in Section 5.4.
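  The reading of DFA states as regular expressions, described before Theorem 17, can be sketched as follows. The DFA used here is hypothetical and is not the example DFA of the text; the names TRANSITIONS, ACCEPTING and state_equations are likewise assumptions. The sketch also assumes that an accepting state contributes the term ε and that a non-accepting state without outgoing transitions is read as Ø.

```python
# A hypothetical DFA: state -> {terminal: next state}, plus the accepting states.
TRANSITIONS = {
    "S": {"a": "A", "b": "S"},
    "A": {"c": "C"},
    "C": {},
}
ACCEPTING = {"C"}


def state_equations(transitions, accepting):
    """One equation per state: the sum of 'terminal next-state' terms.

    Assumption: an accepting state additionally gets the term ε,
    and a state with no terms at all is read as Ø.
    """
    equations = {}
    for state, edges in transitions.items():
        terms = [symbol + target for symbol, target in sorted(edges.items())]
        if state in accepting:
            terms.append("ε")
        equations[state] = " + ".join(terms) if terms else "Ø"
    return equations


for state, rhs in state_equations(TRANSITIONS, ACCEPTING).items():
    print(f"{state} = {rhs}")
# S = aA + bS
# A = cC
# C = ε
```

  Substituting C and then A gives S = ac + bS. The remaining loop on S cannot be removed by substitution alone; one standard way to resolve it is the identity that X = RX + T has the solution X = R∗T when ε is not in the language of R, which here yields S = b∗ac.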
