Elementary parsers

3.2 Elementary parsers

  The goals of this section are:

  • introduce some very simple parsers for parsing sentences of grammars with rules of the form:

      A → ε
      A → a
      A → x, where x is a sequence of terminals;

  • show how one can construct useful functions from simple, trivially correct functions by means of generalisation and partial parametrisation.

  This section defines parsers that can only be used to parse fixed sequences of terminal symbols. For a grammar with a production that contains nonterminals in its right-hand side we need techniques that will be introduced in the following section.

  We will start with a very simple parse function that just recognises the terminal symbol a. The type of the input string symbols is Char in this case, and as a parse ‘tree’ we also simply use a Char:

  symbola :: Parser Char Char
  symbola []                 = []
  symbola (x:xs) | x == 'a'  = [('a',xs)]
                 | otherwise = []

  The list of successes method immediately pays off, because now we can return an empty list if no parsing is possible (because the input is empty, or does not start with an a).
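
  The behaviour described above can be tried out with a minimal sketch. The definition of the Parser type is not repeated in this section, so the type synonym below is an assumption (a function from an input list to a list of result and rest-input pairs), and the names okResult, wrongStart and emptyInput are hypothetical, chosen only for illustration.

```haskell
-- Assumed parser type: input list in, list of (result, rest) pairs out.
type Parser s r = [s] -> [(r, [s])]

symbola :: Parser Char Char
symbola []                 = []
symbola (x:xs) | x == 'a'  = [('a', xs)]
               | otherwise = []

-- Hypothetical sample applications.
okResult, wrongStart, emptyInput :: [(Char, String)]
okResult   = symbola "abc"  -- one success: [('a',"bc")]
wrongStart = symbola "bcd"  -- input does not start with an a: []
emptyInput = symbola ""     -- input is empty: []
```

  Both kinds of failure yield the same empty list, so a caller does not need to distinguish them.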

  In the same fashion, we can write parsers that recognise other symbols. As always, rather than defining a lot of closely related functions, it is better to abstract from the symbol to be recognised by making it an extra argument of the function. Furthermore, the function can operate on lists of characters, but also on lists of symbols of other types, so that it can be used in applications other than character-oriented ones. The only prerequisite is that the symbols to be parsed can be tested for equality. In Hugs, this is indicated by the Eq predicate in the type of the function.

  Using these generalisations, we obtain the function symbol that is given in listing 2. The function symbol is a function that, given a symbol, returns a parser for that symbol. A parser is in turn a function too. This is why two arguments appear in the definition of symbol.
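
  The two-argument reading of symbol can be illustrated with a small sketch of partial parametrisation. The Parser type synonym is again an assumption, and symbolb, seven and the result names are hypothetical names introduced only for this example.

```haskell
-- Assumed parser type: input list in, list of (result, rest) pairs out.
type Parser s r = [s] -> [(r, [s])]

symbol :: Eq s => s -> Parser s s
symbol _ []                 = []
symbol a (x:xs) | x == a    = [(x, xs)]
                | otherwise = []

-- Supplying only the first argument yields a parser.
symbolb :: Parser Char Char
symbolb = symbol 'b'

-- The same function works on symbols that are not characters.
seven :: Parser Int Int
seven = symbol 7

charResult :: [(Char, String)]
charResult = symbolb "bar"   -- [('b',"ar")]

intResult :: [(Int, [Int])]
intResult = seven [7, 8, 9]  -- [(7,[8,9])]

noResult :: [(Int, [Int])]
noResult = seven [8, 9]      -- []
```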

  Parser combinators

  -- Elementary parsers
  symbol :: Eq s => s -> Parser s s
  symbol a []                 = []
  symbol a (x:xs) | x == a    = [(x,xs)]
                  | otherwise = []

  satisfy :: (s -> Bool) -> Parser s s
  satisfy p []                 = []
  satisfy p (x:xs) | p x       = [(x,xs)]
                   | otherwise = []

  token :: Eq s => [s] -> Parser s [s]
  token k xs | k == take n xs = [(k,drop n xs)]
             | otherwise      = []
    where n = length k

  failp :: Parser s a
  failp xs = []

  succeed :: a -> Parser s a
  succeed r xs = [(r,xs)]

  -- Applications of elementary parsers
  digit :: Parser Char Char
  digit = satisfy isDigit

  Listing 2: ParserType.hs

  We will now define some elementary parsers that can do the work traditionally taken care of by lexical analysers (see Section 2.8). For example, a useful parser is one that recognises a fixed string of symbols, such as while or switch. We will call this function token; it is defined in listing 2. As in the case of the symbol function we have parametrised this function with the string to be recognised, effectively making it into a family of functions. Of course, this function is not confined to strings of characters. However, we do need an equality test on the type of values in the input string; the type of token is:

  token :: Eq s => [s] -> Parser s [s]

  The function token is a generalisation of the symbol function, in that it recognises a list of symbols instead of a single symbol. Note that we cannot define symbol in terms of token: the two functions have incompatible types.
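
  A short sketch may make the type difference concrete. It assumes the same Parser type synonym as before; whileKw and the result names are hypothetical. Note that a one-element token returns a list as its result, whereas symbol would return the bare symbol.

```haskell
-- Assumed parser type: input list in, list of (result, rest) pairs out.
type Parser s r = [s] -> [(r, [s])]

token :: Eq s => [s] -> Parser s [s]
token k xs | k == take n xs = [(k, drop n xs)]
           | otherwise      = []
  where n = length k

-- A parser for the keyword "while".
whileKw :: Parser Char String
whileKw = token "while"

matched :: [(String, String)]
matched = whileKw "while x"    -- [("while"," x")]

tooShort :: [(String, String)]
tooShort = whileKw "whi"       -- []: the input is shorter than the token

singleton :: [(String, String)]
singleton = token "a" "abc"    -- [("a","bc")]: the result is a String, not a Char
```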

  Another generalisation of symbol is a function which may, depending on the input, return different parse results. Instead of specifying a specific symbol, we can parametrise the function with a condition that the symbol should fulfill. Thus the function satisfy has a function s -> Bool as argument. Where symbol tests for equality to a specific value, the function satisfy tests for compliance with this predicate. It is defined in listing 2. This generalised function is for example useful when we want to parse digits (characters between '0' and '9'):

  digit :: Parser Char Char
  digit = satisfy isDigit

  where the function isDigit is the standard predicate that tests whether or not a character is a digit:

  isDigit :: Char -> Bool
  isDigit x = '0' <= x && x <= '9'

  In books on grammar theory an empty string is often called ‘epsilon’. In this tradition, we will define a function epsilon that ‘parses’ the empty string. It does not consume any input, and hence always returns an empty parse tree and unmodified input. A zero-tuple can be used as a result value: () is the only value of the type ().

  epsilon :: Parser s ()
  epsilon xs = [((),xs)]

  A more useful variant is the function succeed, which also doesn’t consume input, but always returns a given, fixed value (or ‘parse tree’, if you can call the result of processing zero symbols a parse tree). It is defined in listing 2.

  Dual to the function succeed is the function failp, which fails to recognise any symbol on the input string. As the result list of a parser is a ‘list of successes’, and in the case of failure there are no successes, the result list should be empty. Therefore the function failp always returns the empty list of successes. It is defined in listing 2. Note the difference with epsilon, which does have one element in its list of successes (albeit an empty one).

  Do not confuse failp with epsilon: there is an important difference between returning one solution (which contains the unchanged input as ‘rest’ string) and not returning a solution at all!
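
  The difference can be observed directly by applying the three parsers to the same input. This is a minimal sketch under the same assumed Parser type synonym; the result names are hypothetical.

```haskell
-- Assumed parser type: input list in, list of (result, rest) pairs out.
type Parser s r = [s] -> [(r, [s])]

epsilon :: Parser s ()
epsilon xs = [((), xs)]

succeed :: a -> Parser s a
succeed r xs = [(r, xs)]

failp :: Parser s a
failp _ = []

epsilonResult :: [((), String)]
epsilonResult = epsilon "abc"    -- one success: [((),"abc")]

succeedResult :: [(Int, String)]
succeedResult = succeed 0 "abc"  -- one success: [(0,"abc")]

failResult :: [(Int, String)]
failResult = failp "abc"         -- no successes: []
```

  In all three cases the input is left untouched; only the number of successes differs.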

  Exercise 3.1. Define a function capital :: Parser Char Char that parses capital letters.

  Exercise 3.2. Since satisfy is a generalisation of symbol, the function symbol can be defined as an instance of satisfy. How can this be done?

  Exercise 3.3. Define the function epsilon using succeed.
