Lecture 10: Compiler I
www.nand2tetris.org
(2)
Course map
[Figure: course map. Abstraction layers, top to bottom: Human Thought; Abstract design; H.L. Language & Operating Sys.; Compiler (Chapters 10 - 11); Virtual Machine; VM Translator (Chapters 7 - 8); Assembly Language; Assembler (Chapter 6); Machine Language; Computer Architecture (Chapters 4 - 5); Hardware Platform; Gate Logic (Chapters 1 - 3); Chips & Logic Gates; Electrical Engineering; Physics. Each layer exposes an abstract interface to the layer above it. The upper layers form the software hierarchy, the lower layers the hardware hierarchy.]
(3)
Motivation: Why study compilers?
(4)
The big picture
[Figure: two-tier compilation. Any program written in a high-level language (Some language, Some Other language, Jack language, ...) is translated by its compiler (Some compiler, Some Other compiler, Jack compiler, ...) into intermediate VM code (Compiler lectures, Projects 10, 11). The intermediate code runs on a VM emulator or on a VM implementation over CISC platforms, RISC platforms, the Hack platform, or other digital platforms (VM lectures, Projects 7-8). Each platform is equipped with its own machine language: CISC machine language, RISC machine language, Hack machine language (HW lectures, Projects 1-6).]
(5)
Compiler architecture (front end)
[Figure: Jack Compiler. Jack Program (source) -> Syntax Analyzer (Tokenizer -> Parser; Chapter 10, emits XML code) -> Code Generation (Chapter 11) -> VM code (target).]
(6)
Tokenizing / Lexical analysis
(7)
Jack Tokenizer
if (x < 153) {let city = "Paris";}
<tokens>
<keyword> if </keyword>
<symbol> ( </symbol>
<identifier> x </identifier>
<symbol> &lt; </symbol>
<integerConstant> 153 </integerConstant>
<symbol> ) </symbol>
<symbol> { </symbol>
<keyword> let </keyword>
<identifier> city </identifier>
<symbol> = </symbol>
<stringConstant> Paris </stringConstant>
<symbol> ; </symbol>
<symbol> } </symbol>
</tokens>
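The tokenizing step above can be sketched in a few lines of Python. This is a minimal illustration, not the project's JackTokenizer: the keyword set is a small assumed subset of the Jack lexicon, comments are not handled, and the function names `tokenize` and `tokens_to_xml` are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Assumed subset of the Jack keywords, for illustration only.
KEYWORDS = {"if", "else", "let", "while", "do", "var", "return"}

TOKEN_RE = re.compile(
    r"""
    (?P<integerConstant>\d+)
  | (?P<stringConstant>"[^"\n]*")
  | (?P<word>[A-Za-z_]\w*)
  | (?P<symbol>[{}()\[\].,;+\-*/&|<>=~])
  | (?P<space>\s+)
    """,
    re.VERBOSE,
)

def tokenize(source):
    """Return a list of (token_type, value) pairs for a piece of Jack code."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue  # white space separates tokens but is not a token
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        elif kind == "stringConstant":
            text = text[1:-1]  # drop the enclosing double quotes
        tokens.append((kind, text))
    return tokens

def tokens_to_xml(tokens):
    """Render the token list in the <tokens> format shown on the slide."""
    lines = ["<tokens>"]
    for kind, text in tokens:
        # escape() turns <, > and & into &lt;, &gt; and &amp;
        lines.append(f"<{kind}> {escape(text)} </{kind}>")
    lines.append("</tokens>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(tokens_to_xml(tokenize('if (x < 153) {let city = "Paris";}')))
```

A single regular expression with named groups keeps the classification rules in one place; a hand-written character scanner would work equally well.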
(8)
Parsing
(9)
Parsing examples
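Jack: (5+3)*2 - sqrt(9*4)
[Figure: parse tree with "-" at the root. Left subtree: "*" over "+" (operands 5 and 3) and 2. Right subtree: sqrt over "*" (operands 9 and 4).]
English: she discussed sex with her doctor
[Figure: two different parse trees for this sentence (parse 1 and parse 2), which differ in where "with her doctor" attaches.]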
(10)
More examples of challenging parsing
(11)
A typical grammar of a typical C-like language
while (expression) {
   if (expression)
      statement;
   while (expression) {
      statement;
      if (expression)
         statement;
   }
   while (expression) {
      statement;
      statement;
   }
}
if (expression) {
   statement;
   while (expression)
      statement;
   statement;
}
if (expression)
   if (expression)
      statement;
}
program: statement;
statement: whileStatement
         | ifStatement
         | // other statement possibilities ...
         | '{' statementSequence '}'
whileStatement: 'while' '(' expression ')' statement
ifStatement: simpleIf
           | ifElse
simpleIf: 'if' '(' expression ')' statement
ifElse: 'if' '(' expression ')' statement
        'else' statement
statementSequence: '' // null, i.e. the empty sequence
                 | statement ';' statementSequence
expression: // definition of an expression comes here
// more definitions follow
(12)
Parse tree
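[Figure: parse tree of the input below. Root: statement, expanding to whileStatement. The whileStatement contains an expression and a statement; that statement expands to a statementSequence, which expands to a statement followed by another statementSequence.]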
Input Text:
while (count<=100) {
   /** demonstration */
   count++;
   // ...
Tokenized:
while
(
count
<=
100
)
{
count
++
;
...
program: statement;
statement: whileStatement
         | ifStatement
         | // other statement possibilities ...
         | '{' statementSequence '}'
whileStatement: 'while' '(' expression ')' statement
...
(13)
Recursive descent parsing
A recursive descent parser has one parsing routine per grammar rule:
parseStatement()
parseWhileStatement()
parseIfStatement()
parseStatementSequence()
parseExpression()
while (expression) {
   statement;
   statement;
   while (expression) {
      while (expression)
         statement;
      statement;
   }
}
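The routine-per-rule idea can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the course's parser: the words `expression` and `statement` are treated as placeholder tokens (as in the sample above), a block is parsed as `'{' statement* '}'`, a simple statement is the token `statement` followed by `;`, and the `Parser` class and its helper names are hypothetical.

```python
import re

def tokenize(text):
    # Words and single punctuation marks; white space is skipped.
    return re.findall(r"[A-Za-z]+|[(){};]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_statement(self):
        # One look-ahead token decides which rule applies.
        token = self.peek()
        if token == "while":
            return self.parse_while_statement()
        if token == "if":
            return self.parse_if_statement()
        if token == "{":
            return self.parse_statement_sequence()
        self.eat("statement")  # placeholder for any simple statement
        self.eat(";")
        return "statement"

    def parse_while_statement(self):
        self.eat("while")
        self.eat("(")
        self.parse_expression()
        self.eat(")")
        return ("while", self.parse_statement())  # recursion on the body

    def parse_if_statement(self):
        self.eat("if")
        self.eat("(")
        self.parse_expression()
        self.eat(")")
        then_branch = self.parse_statement()
        if self.peek() == "else":
            self.eat("else")
            return ("ifElse", then_branch, self.parse_statement())
        return ("if", then_branch)

    def parse_statement_sequence(self):
        self.eat("{")
        body = []
        while self.peek() != "}":
            body.append(self.parse_statement())
        self.eat("}")
        return ("block", body)

    def parse_expression(self):
        self.eat("expression")  # placeholder for a real expression parser

SAMPLE = """
while (expression) {
   statement;
   statement;
   while (expression) {
      while (expression)
         statement;
      statement;
   }
}
"""

if __name__ == "__main__":
    print(Parser(tokenize(SAMPLE)).parse_statement())
```

The nesting of the returned tuples mirrors the nesting of the input: each routine consumes exactly the tokens of its own construct and calls the other routines for the constructs nested inside it.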
(14)
A linguist view on parsing
(15)
The Jack grammar
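Grammar notation:
'x' : x appears verbatim
x : x is a language construct
x? : x appears 0 or 1 times
x* : x appears 0 or more times
x|y : either x or y appears
(x,y) : x appears, then y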
(16)
The Jack grammar (cont.)
(17)
Jack syntax analyzer in action
class Bar {
   method Fraction foo(int y) {
      var int temp; // a variable
      let temp = (xxx+12)*-63;
      ...
...
<varDec>
  <keyword> var </keyword>
  <keyword> int </keyword>
  <identifier> temp </identifier>
  <symbol> ; </symbol>
</varDec>
<statements>
  <letStatement>
    <keyword> let </keyword>
    <identifier> temp </identifier>
    <symbol> = </symbol>
    <expression>
      <term>
        <symbol> ( </symbol>
        <expression>
          <term>
            <identifier> xxx </identifier>
          </term>
          <symbol> + </symbol>
          <term>
            <integerConstant> 12 </integerConstant>
          </term>
        </expression>
        ...
Syntax analyzer
If xxx is a non-terminal, output:
<xxx>
   Recursive code for the body of xxx
</xxx>
If xxx is a terminal (keyword, symbol, constant, or identifier), output:
<xxx> xxx value </xxx>
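The output convention on this slide (a terminal becomes a one-line element holding its value, a non-terminal wraps the recursively generated output of its body) can be sketched in Python. The tree representation is an assumption made for the example: a terminal is a `(type, value)` pair whose second item is a string, a non-terminal is a `(name, children)` pair whose second item is a list. The function name `emit` is hypothetical.

```python
from xml.sax.saxutils import escape

def emit(node, indent=0):
    """Return the XML lines for a parse-tree node (assumed tuple encoding)."""
    name, body = node
    pad = "  " * indent
    if isinstance(body, str):
        # Terminal: <xxx> value </xxx>, with <, > and & escaped.
        return [f"{pad}<{name}> {escape(body)} </{name}>"]
    # Non-terminal: opening tag, recursive output of the body, closing tag.
    lines = [f"{pad}<{name}>"]
    for child in body:
        lines.extend(emit(child, indent + 1))
    lines.append(f"{pad}</{name}>")
    return lines

if __name__ == "__main__":
    var_dec = ("varDec", [
        ("keyword", "var"),
        ("keyword", "int"),
        ("identifier", "temp"),
        ("symbol", ";"),
    ])
    print("\n".join(emit(var_dec)))
```

Run on the `var int temp;` declaration, the sketch prints a `<varDec>` element with four one-line children, matching the shape of the slide's output.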
(18)
(19)
(20)
CompilationEngine: a recursive top-down parser for Jack
The CompilationEngine effects the actual compilation output. It gets its input from a JackTokenizer and emits its parsed structure into an output file/stream.
The output is generated by a series of compilexxx() routines, one for every syntactic element xxx of the Jack grammar.
The contract between these routines is that each compilexxx() routine should read the syntactic construct xxx from the input, advance() the tokenizer exactly beyond xxx, and output the parsing of xxx. Thus, compilexxx() may only be called if indeed xxx is the next syntactic element of the input.
In the first version of the compiler, which we now build, this module emits a structured printout of the code, wrapped in XML tags (defined in the specs of project 10). In the final version of the compiler, this module generates executable VM code (defined in the specs of project 11).
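The compilexxx() contract can be illustrated with a small Python sketch covering only a let statement and a simplified expression (a term followed by any number of operator/term pairs, with no operator priority). Everything except the name advance() is an assumption: `ListTokenizer` is a hypothetical stand-in for JackTokenizer that serves ready-made (type, value) pairs, and the method names are illustrative rather than the project's API.

```python
from xml.sax.saxutils import escape

class ListTokenizer:
    """Hypothetical stand-in for JackTokenizer: serves (type, value) pairs."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def current(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def advance(self):
        self.pos += 1

class CompilationEngine:
    OPS = set("+-*/&|<>=")

    def __init__(self, tokenizer):
        self.tk = tokenizer
        self.out = []
        self.depth = 0

    def _line(self, text):
        self.out.append("  " * self.depth + text)

    def _open(self, name):
        self._line(f"<{name}>")
        self.depth += 1

    def _close(self, name):
        self.depth -= 1
        self._line(f"</{name}>")

    def _terminal(self, expected=None):
        # Output the current token, then advance exactly one token.
        kind, value = self.tk.current
        if kind is None or (expected is not None and value != expected):
            raise SyntaxError(f"expected {expected!r}, got {value!r}")
        self._line(f"<{kind}> {escape(value)} </{kind}>")
        self.tk.advance()

    def compile_let(self):
        # Caller guarantees that the next construct is a let statement.
        self._open("letStatement")
        self._terminal("let")
        self._terminal()  # variable name (not type-checked in this sketch)
        self._terminal("=")
        self.compile_expression()
        self._terminal(";")
        self._close("letStatement")

    def compile_expression(self):
        self._open("expression")
        self.compile_term()
        while self.tk.current[0] == "symbol" and self.tk.current[1] in self.OPS:
            self._terminal()  # binary operator
            self.compile_term()
        self._close("expression")

    def compile_term(self):
        self._open("term")
        kind, value = self.tk.current
        if kind == "symbol" and value == "(":
            self._terminal("(")
            self.compile_expression()
            self._terminal(")")
        elif kind == "symbol" and value in ("-", "~"):
            self._terminal()  # unary operator
            self.compile_term()
        elif kind in ("integerConstant", "identifier"):
            self._terminal()
        else:
            raise SyntaxError(f"unexpected token {value!r} in term")
        self._close("term")

if __name__ == "__main__":
    # Tokens of: let temp = (xxx+12)*-63;
    tokens = [("keyword", "let"), ("identifier", "temp"), ("symbol", "="),
              ("symbol", "("), ("identifier", "xxx"), ("symbol", "+"),
              ("integerConstant", "12"), ("symbol", ")"), ("symbol", "*"),
              ("symbol", "-"), ("integerConstant", "63"), ("symbol", ";")]
    engine = CompilationEngine(ListTokenizer(tokens))
    engine.compile_let()
    print("\n".join(engine.out))
```

When compile_let() returns, the tokenizer position sits just past the closing `;`, which is the "advance exactly beyond xxx" part of the contract; each routine can therefore be called by its parent without any further bookkeeping.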
(21)
(22)
(23)
(24)
Summary and next step
[Figure: Jack Compiler. Jack Program -> Syntax Analyzer (Tokenizer -> Parser; Chapter 10, emits XML code) -> Code Generation (Chapter 11) -> VM code.]
(25)
Perspective
[Slide text lost in extraction. Surviving terms: Lex, Yacc, and the Jack statement keywords let and do.]
(1)
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 20