Lecture 10: Compiler I

(1)

www.nand2tetris.org


(2)

Course map

Figure: course map. From top to bottom, the levels are Human Thought, H.L. Language & Operating Sys. (reached by abstract design), Virtual Machine, Assembly Language, Machine Language, Hardware Platform, Chips & Logic Gates, Electrical Engineering, and Physics, with an abstract interface between adjacent levels. The Compiler (Chapters 10 - 11), VM Translator (Chapters 7 - 8), and Assembler (Chapter 6) connect the levels of the software hierarchy; Computer Architecture (Chapters 4 - 5) and Gate Logic (Chapters 1 - 3) connect the levels of the hardware hierarchy.


(3)

Motivation: Why study compilers?



(4)

The big picture

Figure: programs written in a high-level language (Jack, "Some language", "Some Other language", ...) are translated by their compilers (Compiler lectures, Projects 10, 11) into intermediate VM code. VM implementations over CISC, RISC, and Hack platforms, as well as a VM emulator (VM lectures, Projects 7-8), translate the VM code into CISC, RISC, or Hack machine language, which runs on the corresponding hardware (HW lectures, Projects 1-6).


(5)

Compiler architecture (front end)

Figure: the Jack compiler translates a Jack Program (source) into VM code (target). Its front end, the Syntax Analyzer (Chapter 10), consists of a Tokenizer and a Parser and can output XML code; the Code Generation stage (Chapter 11) produces the VM code.


(6)

Tokenizing / Lexical analysis



(7)

Jack Tokenizer

if (x < 153) {let city = "Paris";}

<tokens>

<keyword> if </keyword>

<symbol> ( </symbol>

<identifier> x </identifier>

<symbol> &lt; </symbol>

<integerConstant> 153 </integerConstant>

<symbol> ) </symbol>

<symbol> { </symbol>

<keyword> let </keyword>

<identifier> city </identifier>

<symbol> = </symbol>

<stringConstant> Paris </stringConstant>

<symbol> ; </symbol>

<symbol> } </symbol>

</tokens>

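The tokenizer's job can be sketched in Python with one regular-expression alternative per token type. This is a minimal illustration, not the project 10 implementation: it skips whitespace and both comment styles, classifies words against the Jack keyword list, drops the quotes around string constants, and escapes the XML special characters <, >, & and the double quote. It does not check that integer constants lie in Jack's 0..32767 range; the function names tokenize and tokens_to_xml are hypothetical.

```python
import re

# Keywords of the Jack language; any other word is an identifier.
KEYWORDS = {
    "class", "constructor", "function", "method", "field", "static",
    "var", "int", "char", "boolean", "void", "true", "false", "null",
    "this", "let", "do", "if", "else", "while", "return",
}

# Alternatives are tried in order at each position, so "//" and "/*"
# are recognized as comments before "/" is taken as a symbol.
TOKEN_RE = re.compile(r"""
    (?P<skip>\s+ | //[^\n]* | /\*.*?\*/)   # whitespace and comments
  | (?P<int>\d+)                          # integerConstant
  | "(?P<str>[^"\n]*)"                    # stringConstant (quotes dropped)
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)      # keyword or identifier
  | (?P<sym>[{}()\[\].,;+\-*/&|<>=~])     # symbol
""", re.VERBOSE | re.DOTALL)

XML_ESCAPES = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}


def tokenize(source):
    """Yield (tokenType, value) pairs for a string of Jack code."""
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "skip":
            continue
        if kind == "int":
            yield "integerConstant", m.group("int")
        elif kind == "str":
            yield "stringConstant", m.group("str")
        elif kind == "word":
            word = m.group("word")
            yield ("keyword" if word in KEYWORDS else "identifier"), word
        else:
            yield "symbol", m.group("sym")


def tokens_to_xml(source):
    """Render the token stream in the <tokens> format shown above."""
    lines = ["<tokens>"]
    for kind, value in tokenize(source):
        text = "".join(XML_ESCAPES.get(ch, ch) for ch in value)
        lines.append(f"<{kind}> {text} </{kind}>")
    lines.append("</tokens>")
    return "\n".join(lines)


print(tokens_to_xml('if (x < 153) {let city = "Paris";}'))  # same <tokens> listing as above
```

Ordering the alternatives does the work that a hand-written character-by-character scanner would otherwise do with lookahead: comments are tried before the single "/" symbol, and integers before words.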


(8)

Parsing



(9)

Parsing examples

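(5+3)*2

sqrt(9*4)

she discussed sex with her doctor

Figure: parse trees. Jack: (5+3)*2 parses as * applied to the subtree 5+3 and to 2; sqrt(9*4) parses as sqrt applied to the subtree 9*4. English: "she discussed sex with her doctor" has two parses, parse 1 attaching "with her doctor" to "discussed" and parse 2 attaching it to "sex".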
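The Jack trees above can be reproduced with a small recursive parser. The sketch below is a hypothetical Python illustration: it follows the Jack rule expression: term (op term)*, and since Jack has no operator precedence, operators group strictly left to right unless parentheses say otherwise. The term rule here is a simplified subset (integers, parenthesized expressions, name(expression) calls such as sqrt, and unary minus), and "neg" is a made-up label for unary minus. Trees are nested tuples (operator, left, right).

```python
import re

OPS = set("+-*/")  # a subset of Jack's binary operators, enough for these examples


def parse(text):
    """Parse an arithmetic expression into a nested-tuple parse tree."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", text)
    tree, pos = parse_expression(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


def parse_expression(tokens, pos):
    # expression: term (op term)*, grouped strictly left to right as in Jack
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in OPS:
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = (op, tree, right)
    return tree, pos


def parse_term(tokens, pos):
    # term (simplified): integer | '(' expression ')' | name '(' expression ')' | '-' term
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return int(tok), pos + 1
    if tok == "(":
        tree, pos = parse_expression(tokens, pos + 1)
        return tree, expect(tokens, pos, ")")
    if tok == "-":
        operand, pos = parse_term(tokens, pos + 1)
        return ("neg", operand), pos
    if tok.isidentifier() and pos + 1 < len(tokens) and tokens[pos + 1] == "(":
        arg, pos = parse_expression(tokens, pos + 2)
        return (tok, arg), expect(tokens, pos, ")")
    raise SyntaxError(f"unexpected {tok!r}")


def expect(tokens, pos, symbol):
    """Check that tokens[pos] is the given symbol and return the next position."""
    if pos >= len(tokens) or tokens[pos] != symbol:
        raise SyntaxError(f"expected {symbol!r}")
    return pos + 1


print(parse("(5+3)*2"))    # ('*', ('+', 5, 3), 2)
print(parse("sqrt(9*4)"))  # ('sqrt', ('*', 9, 4))
print(parse("5+3*2"))      # ('*', ('+', 5, 3), 2), i.e. grouped left to right
```

The last call shows the consequence of Jack's left-to-right rule: without parentheses, 5+3*2 means (5+3)*2, not 5+(3*2).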


(10)

More examples of challenging parsing



(11)


A typical grammar of a typical C-like language

while (expression) {
    if (expression)
        statement;
    while (expression) {
        statement;
        if (expression)
            statement;
    }
    while (expression) {
        statement;
        statement;
    }
}

if (expression) {
    statement;
    while (expression)
        statement;
    statement;
}

if (expression)
    if (expression)
        statement;


program: statement;
statement: whileStatement
         | ifStatement
         | // other statement possibilities ...
         | '{' statementSequence '}'
whileStatement: 'while' '(' expression ')' statement
ifStatement: simpleIf
           | ifElse
simpleIf: 'if' '(' expression ')' statement
ifElse: 'if' '(' expression ')' statement
        'else' statement
statementSequence: ''   // null, i.e. the empty sequence
                 | statement ';' statementSequence
expression: // definition of an expression comes here
// more definitions follow



(12)

Parse tree

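Figure: parse tree for the input below. The root statement derives a whileStatement, whose children include an expression and a statement; that statement is a '{' statementSequence '}' block whose statementSequence expands to statement ';' statementSequence.

Input text:

while (count<=100) {
    /** demonstration */
    count++;
    // ...

Tokenized:

while ( count <= 100 ) { count ++ ; ...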

program: statement;
statement: whileStatement
         | ifStatement
         | // other statement possibilities ...
         | '{' statementSequence '}'
whileStatement: 'while'
                '(' expression ')'
                statement
...



(13)

Recursive descent parsing

parseStatement()
parseWhileStatement()
parseIfStatement()
parseStatementSequence()
parseExpression()

while (expression) {
    statement;
    statement;
    while (expression) {
        while (expression)
            statement;
        statement;
    }
}

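The parsing methods listed above can be sketched as a recursive-descent parser for the toy C-like grammar, one method per rule. This is a hypothetical Python illustration: method names are snake_case versions of parseStatement() and friends, and the words expression and statement are treated as placeholder tokens for the undefined expression rule and for the "other statement possibilities", as in the code samples. Under this grammar each statement inside braces is followed by ';', and no ';' appears before 'else'.

```python
import re


class Parser:
    """Recursive-descent parser for the toy grammar, one method per rule.

    Hypothetical simplification: 'expression' and 'statement' are placeholder
    tokens. A tree node is a (ruleName, children) pair; leaves are tokens.
    """

    def __init__(self, text):
        self.tokens = re.findall(r"[A-Za-z_]\w*|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected):
        """Consume one token, checking that it is the expected terminal."""
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1
        return expected

    def parse_program(self):
        tree = self.parse_statement()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} after the program")
        return tree

    def parse_statement(self):
        tok = self.peek()  # one token of lookahead picks the alternative
        if tok == "while":
            children = [self.parse_while_statement()]
        elif tok == "if":
            children = [self.parse_if_statement()]
        elif tok == "{":
            children = [self.eat("{"), self.parse_statement_sequence(), self.eat("}")]
        else:
            children = [self.eat("statement")]  # placeholder for other statements
        return ("statement", children)

    def parse_while_statement(self):
        return ("whileStatement", [self.eat("while"), self.eat("("),
                                   self.parse_expression(), self.eat(")"),
                                   self.parse_statement()])

    def parse_if_statement(self):
        # simpleIf and ifElse share 'if' '(' expression ')' statement, so the
        # prefix is parsed once and a following 'else' selects ifElse.
        children = [self.eat("if"), self.eat("("), self.parse_expression(),
                    self.eat(")"), self.parse_statement()]
        if self.peek() == "else":
            children += [self.eat("else"), self.parse_statement()]
            return ("ifStatement", [("ifElse", children)])
        return ("ifStatement", [("simpleIf", children)])

    def parse_statement_sequence(self):
        # statementSequence: '' | statement ';' statementSequence
        # The empty alternative applies when the enclosing '}' comes next.
        if self.peek() == "}":
            return ("statementSequence", [])
        return ("statementSequence", [self.parse_statement(), self.eat(";"),
                                      self.parse_statement_sequence()])

    def parse_expression(self):
        return ("expression", [self.eat("expression")])  # placeholder rule


def show(node, depth=0):
    """Print a parse tree, one node per line, indented by depth."""
    if isinstance(node, str):
        print("  " * depth + node)
        return
    name, children = node
    print("  " * depth + name)
    for child in children:
        show(child, depth + 1)


show(Parser("while (expression) { statement; if (expression) statement else statement; }").parse_program())
```

Each method consumes exactly the tokens of its own construct and calls the methods of the sub-rules it contains, which is where the recursion comes from. With nested ifs, this parser attaches an else to the nearest if.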


(14)

A linguist view on parsing



(15)

The Jack grammar

'x': x appears verbatim
x: x is a language construct
x?: x appears 0 or 1 times
x*: x appears 0 or more times
x|y: either x or y appears
(x,y): x appears, then y


(16)

The Jack grammar (cont.)

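The grammar notation maps directly onto parsing code: x? becomes an if, x* becomes a while, and x|y becomes a test of the alternatives, each decided by looking at the next token. The sketch below illustrates this in Python on Jack's parameterList rule, parameterList: ((type varName) (',' type varName)*)?, where type is 'int' | 'char' | 'boolean' | className. TokenStream, expect, and the other helper names are hypothetical, and the input is assumed to be already tokenized.

```python
import re

# Jack keywords; a className or varName must be an identifier that is not one of these.
KEYWORDS = {
    "class", "constructor", "function", "method", "field", "static",
    "var", "int", "char", "boolean", "void", "true", "false", "null",
    "this", "let", "do", "if", "else", "while", "return",
}
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class TokenStream:
    """A minimal cursor over a pre-tokenized list of strings (hypothetical helper)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok


def is_identifier(tok):
    return tok is not None and tok not in KEYWORDS and IDENTIFIER.fullmatch(tok) is not None


def is_type(tok):
    # type: 'int' | 'char' | 'boolean' | className  (x|y: test each alternative)
    return tok in ("int", "char", "boolean") or is_identifier(tok)


def expect(ts, predicate, what):
    """Consume one token and check it with predicate."""
    tok = ts.advance()
    if not predicate(tok):
        raise SyntaxError(f"expected {what}, got {tok!r}")
    return tok


def parse_parameter_list(ts):
    """parameterList: ((type varName) (',' type varName)*)?"""
    params = []
    if is_type(ts.peek()):        # (...)? becomes an if on one token of lookahead
        params.append((expect(ts, is_type, "a type"), expect(ts, is_identifier, "a varName")))
        while ts.peek() == ",":   # (...)* becomes a while on one token of lookahead
            ts.advance()          # the ',' separator
            params.append((expect(ts, is_type, "a type"), expect(ts, is_identifier, "a varName")))
    return params


ts = TokenStream(["int", "y", ",", "Fraction", "f", ")"])
print(parse_parameter_list(ts))  # [('int', 'y'), ('Fraction', 'f')]
print(ts.peek())                 # )
```

The function stops on the ')' without consuming it, leaving the closing symbol to the rule that contains parameterList; an empty parameter list returns [].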


(17)

Jack syntax analyzer in action

class Bar {
    method Fraction foo(int y) {
        var int temp; // a variable
        let temp = (xxx+12)*-63;
        ...
    ...


<varDec>
  <keyword> var </keyword>
  <keyword> int </keyword>
  <identifier> temp </identifier>
  <symbol> ; </symbol>
</varDec>
<statements>
  <letStatement>
    <keyword> let </keyword>
    <identifier> temp </identifier>
    <symbol> = </symbol>
    <expression>
      <term>
        <symbol> ( </symbol>
        <expression>
          <term>
            <identifier> xxx </identifier>
          </term>
          <symbol> + </symbol>
          <term>
            <integerConstant> 12 </integerConstant>
          </term>
        </expression>
        ...

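Syntax analyzer output rules:

If xxx is a non-terminal, the output is
<xxx>
  Recursive code for the body of xxx
</xxx>

If xxx is a terminal (keyword, symbol, constant, or identifier), the output is its value wrapped in its tag:
<xxx> value </xxx>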


(18)

(19)

(20)

CompilationEngine: a recursive top-down parser for Jack

The CompilationEngine effects the actual compilation output. It gets its input from a JackTokenizer and emits its parsed structure into an output file/stream. The output is generated by a series of compilexxx() routines, one for every syntactic element xxx of the Jack grammar.

The contract between these routines is that each compilexxx() routine should read the syntactic construct xxx from the input, advance() the tokenizer exactly beyond xxx, and output the parsing of xxx. Thus, compilexxx() may only be called if indeed xxx is the next syntactic element of the input.

In the first version of the compiler, which we now build, this module emits a structured printout of the code, wrapped in XML tags (defined in the specs of project 10). In the final version of the compiler, this module generates executable VM code (defined in the specs of project 11).
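The contract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the project 10 reference implementation: method names are snake_case versions of compilexxx(), the tokenizer is simplified (no string constants or /* */ comments), and only varDec, statements (let only), letStatement without array indexing, expression, and a subset of term are handled. Each compile_xxx method is called only when xxx is next, and eat() emits one terminal and advances exactly one token, so every method stops right after its construct.

```python
import re

KEYWORDS = {"var", "let", "int", "char", "boolean"}  # only what this sketch needs
OPS = set("+-*/&|<>=")
UNARY_OPS = {"-", "~"}
XML_ESCAPES = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}


def tokenize(src):
    """Simplified tokenizer: drops // comments; no strings or /* */ comments."""
    src = re.sub(r"//[^\n]*", "", src)
    for tok in re.findall(r"\d+|[A-Za-z_]\w*|\S", src):
        if tok.isdigit():
            yield "integerConstant", tok
        elif tok in KEYWORDS:
            yield "keyword", tok
        elif tok[0].isalpha() or tok[0] == "_":
            yield "identifier", tok
        else:
            yield "symbol", tok


class CompilationEngine:
    def __init__(self, src):
        self.tokens = list(tokenize(src))
        self.pos = 0
        self.out = []
        self.depth = 0

    def peek(self, field=1):
        """Value (field=1) or type (field=0) of the current token, or None."""
        return self.tokens[self.pos][field] if self.pos < len(self.tokens) else None

    def open_tag(self, tag):          # non-terminal: <xxx>, then its body
        self.out.append("  " * self.depth + f"<{tag}>")
        self.depth += 1

    def close_tag(self, tag):         # ... then </xxx>
        self.depth -= 1
        self.out.append("  " * self.depth + f"</{tag}>")

    def eat(self, expected=None):
        """Emit the current terminal as <type> value </type>; advance one token."""
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        kind, value = self.tokens[self.pos]
        if expected is not None and value != expected:
            raise SyntaxError(f"expected {expected!r}, got {value!r}")
        text = XML_ESCAPES.get(value, value)
        self.out.append("  " * self.depth + f"<{kind}> {text} </{kind}>")
        self.pos += 1

    def compile_var_dec(self):        # 'var' type varName (',' varName)* ';'
        self.open_tag("varDec")
        self.eat("var")
        self.eat()                    # type (not validated here)
        self.eat()                    # varName
        while self.peek() == ",":
            self.eat(",")
            self.eat()
        self.eat(";")
        self.close_tag("varDec")

    def compile_statements(self):     # statement*, only letStatement handled here
        self.open_tag("statements")
        while self.peek() == "let":
            self.compile_let()
        self.close_tag("statements")

    def compile_let(self):            # 'let' varName '=' expression ';'
        self.open_tag("letStatement")
        self.eat("let")
        self.eat()                    # varName
        self.eat("=")
        self.compile_expression()
        self.eat(";")
        self.close_tag("letStatement")

    def compile_expression(self):     # term (op term)*
        self.open_tag("expression")
        self.compile_term()
        while self.peek() in OPS:
            self.eat()
            self.compile_term()
        self.close_tag("expression")

    def compile_term(self):           # integerConstant | varName | '(' expression ')' | unaryOp term
        self.open_tag("term")
        if self.peek() == "(":
            self.eat("(")
            self.compile_expression()
            self.eat(")")
        elif self.peek() in UNARY_OPS:
            self.eat()
            self.compile_term()
        elif self.peek(0) in ("integerConstant", "identifier"):
            self.eat()
        else:
            raise SyntaxError(f"unexpected {self.peek()!r} in a term")
        self.close_tag("term")


engine = CompilationEngine("var int temp; // a variable\nlet temp = (xxx+12)*-63;")
engine.compile_var_dec()
engine.compile_statements()
print("\n".join(engine.out))
```

Run on the varDec and let statement from the syntax analyzer example, it prints the same nesting as that XML listing and continues past its "..." with the closing ')' term, the '*' and the unary '-' term. Note that '-' is read as unary at the start of a term and as a binary operator after one.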


(21)

CompilationEngine (cont.)


(22)

CompilationEngine (cont.)


(23)

CompilationEngine (cont.)

(24)

Summary and next step

Figure: the Jack compiler pipeline. A Jack Program passes through the Syntax Analyzer (Tokenizer and Parser, Chapter 10), which can emit XML code, and then through Code Generation (Chapter 11), which emits VM code.


(25)

Perspective

Lex

Yacc

let

do



Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 20
