10.4 Construction of simple scanners
In a sense, a scanner or lexical analyser may be thought of as just another syntax analyser. It handles a grammar with productions relating non-terminals such as identifier, number and Relop to terminals supplied, in effect, as single characters of the source text. When used in conjunction with a higher level parser a subtle shift in emphasis comes about: there is, in effect, no special goal symbol. Each invocation of the scanner is very much bottom-up rather than top-down; its task ends when it has reduced a string of characters to a token, without preconceived ideas of what that should be. These tokens or non-terminals are then regarded as terminals by the higher level recursive descent parser that analyses the phrase structure of Block, Statement, Expression and so on.

There are at least five reasons for wishing to decouple the scanner from the main parser:

- The productions involved are usually very simple. Very often they amount to regular expressions, and then a scanner may be programmed without recourse to methods like recursive descent.
- A symbol like an identifier is lexically equivalent to a reserved word; the distinction may sensibly be made as soon as the basic token has been synthesized.
- The character set may vary from machine to machine, a variation easily isolated in this phase.
- The semantic analysis of a numeric literal constant (deriving the internal representation of its value from the characters) is easily performed in parallel with lexical analysis.
- The scanner can be made responsible for screening out superfluous separators, like blanks and comments, which are rarely of interest in the formulation of the higher level grammar.
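The identifier/reserved word distinction mentioned in the second point is usually made by a simple lookup once the token's spelling is complete. A minimal sketch in C follows; the token codes and keyword table are illustrative assumptions, not drawn from any particular case study:

```c
#include <string.h>

/* A sketch of reserved word screening: once the scanner has assembled
   the spelling of an identifier, a table lookup decides whether it is
   really a keyword.  The SYMBOL codes and the keyword table are
   hypothetical. */

typedef enum { IDENTSYM, BEGINSYM, ENDSYM, IFSYM, WHILESYM } SYMBOL;

static const struct { const char *spelling; SYMBOL sym; } keywords[] = {
  { "BEGIN", BEGINSYM }, { "END", ENDSYM },
  { "IF", IFSYM }, { "WHILE", WHILESYM }
};

/* Linear search suffices for a handful of keywords; binary search or
   hashing pays off for larger languages. */
SYMBOL screen(const char *lexeme) {
  size_t i;
  for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
    if (strcmp(lexeme, keywords[i].spelling) == 0)
      return keywords[i].sym;
  return IDENTSYM;            /* not reserved - an ordinary identifier */
}
```

Performing the check at this point keeps the higher level parser entirely free of spelling concerns: it sees only the token code.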
In common with the parsing strategy suggested earlier, development of the routine or function responsible for token recognition

- may assume that it is always called after some (globally accessible) variable CH has been found to contain the next character to be handled in the source
- will then read a complete sequence of characters that form a recognizable token
- will relinquish scanning after leaving CH with the first character that does not form part of this token (so as to satisfy the precondition for the next invocation of the scanner).

A scanner is necessarily a top-down parser, and for ease of implementation it is desirable that the productions defining the token grammar also obey the LL1 rules. However, checking these is much simpler, as token grammars are almost invariably regular, do not display self-embedding, and can thus almost always be easily transformed into LL1 grammars.

There are two main strategies that are employed in scanner construction:

- Rather than being decomposed into a set of recursive routines, simple scanners are often written in an ad hoc manner, controlled by a large CASE or switch statement, since the essential task is one of choosing between a number of tokens which are sometimes distinguishable on the basis of their initial characters.
- Alternatively, since they usually have to read a number of characters, scanners are often written in the form of a finite state automaton (FSA) controlled by a loop, on each iteration of which a single character is absorbed, the machine moving between a number of states determined by the character just read. This approach has the advantage that the construction can be formalized in terms of an extensively developed automata theory, leading to algorithms from which scanner generators can be constructed automatically.
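The first, ad hoc strategy, together with the CH convention just described, might be sketched as follows in C. The token codes, the string-based source handler and the global names are assumptions made purely for illustration:

```c
#include <ctype.h>

/* A sketch of a switch-driven scanner obeying the convention above:
   on entry CH already holds the first character of the token; on exit
   CH holds the first character beyond it.  A nul character plays the
   role of end-of-source. */

typedef enum { EOFSYM, IDENTSYM, NUMSYM, PLUSSYM, LPARENSYM, UNKNOWNSYM } SYMBOL;

static const char *src = "count + 42";   /* hypothetical source text */
static int  CH = ' ';                    /* the global "next character" */
static char lexeme[64];                  /* spelling of the last token */
static int  numvalue;                    /* value of the last number */

static void getch(void) { CH = *src ? *src++ : 0; }

SYMBOL getsym(void) {
  int i = 0;
  while (CH == ' ' || CH == '\t' || CH == '\n') getch();  /* skip separators */
  if (CH == 0) return EOFSYM;
  if (isalpha(CH)) {                     /* identifier */
    while (isalnum(CH)) { lexeme[i++] = (char)CH; getch(); }
    lexeme[i] = '\0';
    return IDENTSYM;
  }
  if (isdigit(CH)) {                     /* number - value derived in parallel */
    numvalue = 0;
    while (isdigit(CH)) { numvalue = 10 * numvalue + (CH - '0'); getch(); }
    return NUMSYM;
  }
  switch (CH) {                          /* one-character tokens */
    case '+': getch(); return PLUSSYM;
    case '(': getch(); return LPARENSYM;
    default:  getch(); return UNKNOWNSYM;
  }
}
```

Notice how each arm leaves CH holding the first character beyond the token, re-establishing the precondition for the next call.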
A proper discussion of automata theory is beyond the scope of this text, but in the next section we shall demonstrate both approaches to scanner construction by means of some case studies.
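As a foretaste of the FSA strategy, the loop-and-transition-table flavour can be suggested by a tiny automaton that classifies complete strings as identifiers or unsigned numbers. The states and character classes here are illustrative assumptions:

```c
#include <ctype.h>

/* A sketch of a table-driven finite state automaton: one loop absorbs
   a character per iteration, the machine moving between states as
   directed by the transition table delta. */

enum State { START, INIDENT, INNUMBER, DONE, ERROR };

/* Classify a character into one of three classes. */
static int class_of(int ch) {
  if (isalpha(ch)) return 0;    /* letter */
  if (isdigit(ch)) return 1;    /* digit  */
  return 2;                     /* other  */
}

/* delta[state][class] - the heart of a table-driven scanner. */
static const enum State delta[3][3] = {
  /* letter    digit     other */
  { INIDENT,  INNUMBER, ERROR },   /* START    */
  { INIDENT,  INIDENT,  DONE  },   /* INIDENT  */
  { ERROR,    INNUMBER, DONE  },   /* INNUMBER */
};

/* Run the automaton over s; the final state reports the verdict:
   INIDENT for an identifier, INNUMBER for a number, ERROR otherwise. */
enum State run(const char *s) {
  enum State state = START;
  for (; *s && state != DONE && state != ERROR; s++)
    state = delta[state][class_of((unsigned char)*s)];
  return state;
}
```

A scanner generator does essentially this: it derives the table delta mechanically from the regular expressions defining the tokens, which is what makes automatic construction possible.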