Professor Mike Schulte Computer Architecture ECE 201

Lecture 12: Pipeline Datapath Design

Designing a Pipelined Processor

° Go back and examine your datapath and control diagram

° associate resources with states

° ensure that flows do not conflict, or figure out how to resolve the conflicts

° assert control in appropriate stage

Pipelined Processor (simplified for slides)

° What happens if we start a new instruction every cycle?

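[Figure: simplified pipelined datapath. Blocks: Next PC, Inst. Mem, Reg File, Exec, Mem Access, Data Mem, and the Equal comparator. The instruction registers IR, IRex, IRmem and IRwb feed the Dcd, Ex, Mem and WB control blocks; a Valid bit travels with the instruction.]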

Control and Datapath

Ifetch    IR <- Mem[PC]; PC <- PC+4
Reg/Dec   A <- R[rs]; B <- R[rt]

          R-type        ORi            Load           Store          Beq
Exec      S <- A + B    S <- A or ZX   S <- A + SX    S <- A + SX    If Cond PC <- PC+SX
Mem                                    M <- Mem[S]    Mem[S] <- B
Wr        R[rd] <- S    R[rt] <- S     R[rt] <- M

[Figure: datapath with Next PC, Inst. Mem, Reg File, Exec, Mem Access and Data Mem; registers PC, IR, A, B, S and M; Equal comparator.]

Pipelining the Load Instruction

Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw   Ifetch   Reg/Dec  Exec     Mem      Wr
2nd lw            Ifetch   Reg/Dec  Exec     Mem      Wr
3rd lw                     Ifetch   Reg/Dec  Exec     Mem      Wr

° The five independent functional units in the pipeline datapath are:
  • Instruction Memory for the Ifetch stage
  • Register File’s Read ports (bus A and bus B) for the Reg/Dec stage
  • ALU for the Exec stage
  • Data Memory for the Mem stage
  • Register File’s Write port (bus W) for the Wr stage

The Four Stages of R-type

         Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type   Ifetch   Reg/Dec  Exec     Wr

° Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
  • Update PC

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec:
  • ALU operates on the two register operands

° Wr: Write the ALU output back to the register file
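The stage sequences above (five stages for a load, four for an R-type) can be laid out cycle by cycle with a few lines of Python. This is only a sketch: the function names `timeline` and `render` are made up for illustration, and it assumes one instruction is started per cycle with no stalls.

```python
# Stage sequences as named on the slides.
LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]
RTYPE = ["Ifetch", "Reg/Dec", "Exec", "Wr"]


def timeline(program):
    """Map each instruction to {cycle: stage}, issuing one instruction per cycle.

    `program` is a list of (label, stages) pairs; instruction i (0-based)
    is assumed to start in cycle i + 1.
    """
    rows = []
    for index, (label, stages) in enumerate(program):
        start = index + 1
        rows.append((label, {start + k: stage for k, stage in enumerate(stages)}))
    return rows


def render(rows):
    """Return the timeline as aligned text, one row per instruction."""
    last = max(max(cycles) for _, cycles in rows)
    header = "".join(f"{'Cycle ' + str(c):<9}" for c in range(1, last + 1))
    lines = [f"{'':<8}{header}".rstrip()]
    for label, cycles in rows:
        cells = "".join(f"{cycles.get(c, ''):<9}" for c in range(1, last + 1))
        lines.append(f"{label:<8}{cells}".rstrip())
    return "\n".join(lines)


loads = timeline([("1st lw", LOAD), ("2nd lw", LOAD), ("3rd lw", LOAD)])
print(render(loads))
```

Three back-to-back loads finish in seven cycles, and in every cycle each functional unit is used by at most one of them.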

Pipelining the R-type and Load Instruction

Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type   Ifetch   Reg/Dec  Exec     Wr
R-type            Ifetch   Reg/Dec  Exec     Wr
Load                       Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                              Ifetch   Reg/Dec  Exec     Wr
R-type                                       Ifetch   Reg/Dec  Exec     Wr

Oops! We have a problem!

° We have a pipeline conflict, or structural hazard:
  • Two instructions try to write to the register file at the same time!
  • Only one write port

Important Observation

° Each functional unit can only be used once per instruction

° Each functional unit must be used at the same stage for all instructions:
  • Load uses Register File’s Write Port during its 5th stage
  • R-type uses Register File’s Write Port during its 4th stage

         1        2        3        4        5
Load     Ifetch   Reg/Dec  Exec     Mem      Wr
R-type   Ifetch   Reg/Dec  Exec     Wr

° 2 ways to solve this pipeline hazard.
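The write-port conflict can be found mechanically by computing the cycle in which each instruction reaches Wr. The sketch below does that; the helper name `write_port_conflicts` is hypothetical, and it assumes one instruction starts per cycle with no stalls.

```python
from collections import defaultdict

LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]
RTYPE = ["Ifetch", "Reg/Dec", "Exec", "Wr"]


def write_port_conflicts(program):
    """Return {cycle: [instruction indices]} for cycles with more than one Wr.

    Instruction i (0-based) is assumed to start in cycle i + 1, with no stalls.
    """
    writers = defaultdict(list)
    for index, stages in enumerate(program):
        wr_cycle = (index + 1) + stages.index("Wr")
        writers[wr_cycle].append(index)
    return {cycle: who for cycle, who in writers.items() if len(who) > 1}


mix = [RTYPE, RTYPE, LOAD, RTYPE, RTYPE]
conflicts = write_port_conflicts(mix)
print(conflicts)  # {7: [2, 3]}
```

For this mix, the load (index 2) and the R-type right after it (index 3) both reach Wr in cycle 7. A sequence made only of loads, or only of R-types, produces no conflict, which is the point of the "same stage for all instructions" rule.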

Solution 1: Insert “Bubble” into the Pipeline

[Timing diagram, Cycles 1 to 9: a mix of R-type and Load instructions with one Pipeline Bubble, so that no two instructions reach Wr in the same cycle.]

° Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle
  • The control logic can be complex.
  • Lose instruction fetch and issue opportunity.

° No instruction is started in Cycle 6!

Solution 2: Delay R-type’s Write by One Cycle

° Delay R-type’s register write by one cycle:
  • Now R-type instructions also use Reg File’s write port at Stage 5
  • Mem stage is a NOOP stage: nothing is being done.

         1        2        3        4        5
R-type   Ifetch   Reg/Dec  Exec     Mem      Wr

Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type   Ifetch   Reg/Dec  Exec     Mem      Wr
R-type            Ifetch   Reg/Dec  Exec     Mem      Wr
Load                       Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                              Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                                       Ifetch   Reg/Dec  Exec     Mem      Wr
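The two solutions can be compared with a short sketch. The greedy stall policy in `issue_with_bubbles` is an assumption made for illustration, not the control logic from the lecture: where the bubble lands depends on the instruction order and on which stage is stalled, so the cycle number need not match the slide. `issue_back_to_back` models Solution 2, where R-type instructions are padded with a NOOP Mem stage.

```python
LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]
RTYPE = ["Ifetch", "Reg/Dec", "Exec", "Wr"]
RTYPE_PADDED = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]  # Mem is a NOOP


def issue_with_bubbles(program):
    """Solution 1 (greedy sketch): delay an instruction's start until its Wr
    cycle is free. Returns the start cycle chosen for each instruction."""
    starts, busy_wr, cycle = [], set(), 1
    for stages in program:
        offset = stages.index("Wr")
        while cycle + offset in busy_wr:
            cycle += 1  # bubble: nothing is issued in this cycle
        busy_wr.add(cycle + offset)
        starts.append(cycle)
        cycle += 1
    return starts


def issue_back_to_back(program):
    """Solution 2: every instruction has Wr in stage 5, so none ever stalls."""
    wr_cycles = [i + 1 + stages.index("Wr") for i, stages in enumerate(program)]
    assert len(set(wr_cycles)) == len(wr_cycles), "write port used twice in a cycle"
    return list(range(1, len(program) + 1))


mix = [RTYPE, RTYPE, LOAD, RTYPE, RTYPE]
padded = [RTYPE_PADDED if s is RTYPE else s for s in mix]
print(issue_with_bubbles(mix))     # [1, 2, 3, 5, 6]
print(issue_back_to_back(padded))  # [1, 2, 3, 4, 5]
```

With bubbles, one issue slot is lost for this mix. With the padded R-type, an instruction starts every cycle, at the cost of one extra cycle of latency per R-type instruction.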

Modified Control & Datapath

Ifetch    IR <- Mem[PC]; PC <- PC+4
Reg/Dec   A <- R[rs]; B <- R[rt]

          R-type        ORi            Load           Store          Beq
Exec      S <- A + B    S <- A or ZX   S <- A + SX    S <- A + SX    if Cond PC <- PC+SX
Mem       M <- S        M <- S         M <- Mem[S]    Mem[S] <- B
Wr        R[rd] <- M    R[rt] <- M     R[rt] <- M

[Figure: datapath with Next PC, Inst. Mem, Reg File, Exec, Mem Access and Data Mem; registers PC, IR, A, B and M; Equal comparator.]

The Four Stages of Store

         Cycle 1  Cycle 2  Cycle 3  Cycle 4
Store    Ifetch   Reg/Dec  Exec     Mem      (Wr is not used)

° Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
  • Update PC

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: Calculate the memory address

° Mem: Write the data into the Data Memory

The Three Stages of Beq

         Cycle 1  Cycle 2  Cycle 3
Beq      Ifetch   Reg/Dec  Exec     (Mem and Wr are not used)

° Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory

° Reg/Dec:
  • Registers Fetch and Instruction Decode

° Exec:
  • compares the two register operands
  • selects the correct branch target address
  • latches it into PC

Control Diagram

Ifetch    IR <- Mem[PC]; PC <- PC+4
Reg/Dec   A <- R[rs]; B <- R[rt]

          R-type        ORi            Load           Store          Beq
Exec      S <- A + B    S <- A or ZX   S <- A + SX    S <- A + SX    If Cond PC <- PC+SX
Mem       M <- S        M <- S         M <- Mem[S]    Mem[S] <- B
Wr        R[rd] <- M    R[rt] <- M     R[rt] <- M

[Figure: datapath with Next PC, Inst. Mem, Reg File, Exec, Mem Access and Data Mem; registers PC, IR, A and B; Equal comparator.]
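The register transfers in the control diagram can be read as a small interpreter. The sketch below applies them to one instruction at a time, without any pipelining, using the temporaries A, B, S and M from the slides. The dict-based instruction encoding and the function names are hypothetical, and the branch follows the slide's RTL literally (PC <- PC+SX after the increment, with no shift of the offset).

```python
def sign_extend(imm16):
    """SX: interpret a 16-bit field as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_instruction(instr, regs, mem, pc):
    """Apply the register transfers for one instruction; return the new PC.

    `instr` is a dict with keys op, rs, rt and, where needed, rd or imm
    (an encoding invented for this example).
    """
    # Ifetch: IR <- Mem[PC]; PC <- PC + 4 (IR is passed in as `instr`)
    pc += 4
    # Reg/Dec: A <- R[rs]; B <- R[rt]
    a, b = regs[instr["rs"]], regs[instr["rt"]]
    op = instr["op"]
    if op == "add":                            # R-type
        s = a + b                              # Exec: S <- A + B
        m = s                                  # Mem:  M <- S (NOOP stage)
        regs[instr["rd"]] = m                  # Wr:   R[rd] <- M
    elif op == "ori":
        s = a | (instr["imm"] & 0xFFFF)        # Exec: S <- A or ZX
        m = s                                  # Mem:  M <- S
        regs[instr["rt"]] = m                  # Wr:   R[rt] <- M
    elif op == "lw":
        s = a + sign_extend(instr["imm"])      # Exec: S <- A + SX
        m = mem[s]                             # Mem:  M <- Mem[S]
        regs[instr["rt"]] = m                  # Wr:   R[rt] <- M
    elif op == "sw":
        s = a + sign_extend(instr["imm"])      # Exec: S <- A + SX
        mem[s] = b                             # Mem:  Mem[S] <- B
    elif op == "beq":
        if a == b:                             # Exec: If Cond PC <- PC + SX
            pc += sign_extend(instr["imm"])
    else:
        raise ValueError(f"unsupported op: {op}")
    return pc


regs = {n: 0 for n in range(32)}
regs[2] = 100
mem = {135: 42}
pc = run_instruction({"op": "lw", "rs": 2, "rt": 1, "imm": 35}, regs, mem, 0)
pc = run_instruction({"op": "add", "rs": 1, "rt": 2, "rd": 3}, regs, mem, pc)
print(regs[1], regs[3], pc)  # 42 142 8
```

Keeping the Exec, Mem and Wr steps as separate statements mirrors the diagram: each statement is work done in one stage, and the temporaries are what a pipeline register would have to carry to the next stage.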

Datapath + Data Stationary Control

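[Figure: pipelined datapath with data-stationary control. Blocks: Next PC, Inst. Mem, Decode, Reg File, Exec, Mem Access, Data Mem; registers PC, A, B, S and M. The decoded instruction fields are op, rs, rt, rd, shmt, fun and im. The control bits travel down the pipeline with the instruction: Exec Ctrl holds ex, me, wb, rw, v; Mem Ctrl holds me, wb, rw, v; WB Ctrl holds wb, rw, v.]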
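Data-stationary control means the control word is produced once, at decode, and then moves down the pipeline with its instruction, each stage keeping only the fields that later stages still need. The sketch below illustrates the idea with the field names from the figure. The meanings given to them here (ex, me, wb as per-stage control, rw as the destination register, v as a valid bit) and the contents of the `CONTROL` table are assumptions made for the example.

```python
# Hypothetical control words: what each opcode asks of the later stages.
CONTROL = {
    "lw":  {"ex": "add", "me": "read",  "wb": True},
    "sw":  {"ex": "add", "me": "write", "wb": False},
    "add": {"ex": "add", "me": None,    "wb": True},
    "beq": {"ex": "cmp", "me": None,    "wb": False},
}


def decode(op, rw):
    """Decode once: build the full control word that will follow the instruction."""
    return {**CONTROL[op], "rw": rw, "v": True}


def advance(pipe, decoded):
    """Clock edge: each stage hands its control word to the next stage,
    dropping the field that the current stage has just consumed.

    `pipe` is the tuple (Exec ctrl, Mem ctrl, WB ctrl); None marks an empty stage.
    """
    ex_ctrl, mem_ctrl, _ = pipe
    to_mem = {k: ex_ctrl[k] for k in ("me", "wb", "rw", "v")} if ex_ctrl else None
    to_wb = {k: mem_ctrl[k] for k in ("wb", "rw", "v")} if mem_ctrl else None
    return (decoded, to_mem, to_wb)


pipe = (None, None, None)
for op, rw in [("lw", 1), ("add", 3), ("sw", None)]:
    pipe = advance(pipe, decode(op, rw))

print(pipe[2])  # {'wb': True, 'rw': 1, 'v': True}
```

After three clock edges the load's control word has reached the WB stage with only wb, rw and v left, while the add is in Mem and the store in Exec. No stage needs to know which instruction it is handling; it only looks at the control bits that arrived with it.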

Let’s Try it Out

 10  lw   r1, r2(35)
 14  addI r2, r2, 3
 20  sub  r3, r4, r5
 24  beq  r6, r7, 100
 30  ori  r8, r9, 17
 34  add  r10, r11, r12
100  and  r13, r14, r15

° these addresses are octal

° Assume branch is taken, but predicted not taken

Start: Fetch 10

Fetch 14, Decode 10

Fetch 20, Decode 14, Exec 10

Fetch 24, Decode 20, Exec 14, Mem 10
  • lw is in the Mem stage with the address r2+35 in S

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
  • lw writes back: r1 = M[r2+35]

Fetch 34, Dcd 30, Ex 24, Mem 20, WB 14
  • addI writes back: r2 = r2+3
  • beq compares r6 and r7 in Exec

Fetch 100, Dcd 34, Ex 30, Mem 24, WB 20
  • sub writes back: r3 = r4-r5
  • ooops, we should have only one delayed instruction (add r10, r11, r12 is already in Decode)

Fetch 104, Dcd 100, Ex 34, Mem 30, WB 24
  • Branch does nothing in WB stage

Fetch 110, Dcd 104, Ex 100, Mem 34, WB 30
  • Or writes result if using a delayed branch: r8 = r9 | 17

Fetch 114, Dcd 110, Ex 104, Mem 100, WB 34
  • Squash the extra add instruction in the branch shadow!
  • NO WB, NO Ovflow
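The sequence of fetch addresses in the walkthrough above can be reproduced with a short simulation. It tracks only instruction addresses, not register contents, and assumes what the slides state: the branch at octal 24 is taken, was predicted not taken, and is resolved in Exec, so the redirect shows up in the next fetch. The function names are made up for the example.

```python
def trace(start_pc, branch_pc, target_pc, cycles):
    """Return one (Fetch, Dcd, Ex, Mem, WB) tuple of addresses per cycle.

    The branch at `branch_pc` is assumed taken but predicted not taken, and
    is resolved in Exec, so the redirect takes effect on the following fetch.
    """
    rows, stages, pc = [], [None] * 5, start_pc
    for _ in range(cycles):
        stages = [pc] + stages[:4]  # every instruction moves one stage down
        rows.append(tuple(stages))
        pc = target_pc if stages[2] == branch_pc else pc + 4
    return rows


def octal(row):
    """Format a row of addresses in octal, as on the slides."""
    return tuple(None if a is None else format(a, "o") for a in row)


rows = trace(0o10, 0o24, 0o100, 10)
for row in rows:
    print(octal(row))
```

Row 7 comes out as Fetch 100, Dcd 34, Ex 30, Mem 24, WB 20: by the time the branch redirects the fetch, the instructions at 30 and 34 are already in the pipeline. That is the branch shadow, and it is why the add at 34 has to be squashed.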

Summary: Pipelining

° What makes it easy
  • all instructions are the same length
  • just a few instruction formats
  • memory operands appear only in loads and stores

° What makes it hard?
  • structural hazards: suppose we had only one memory
  • control hazards: need to worry about branch instructions
  • data hazards: an instruction depends on a previous instruction

° We’ll talk about modern processors and what really makes it hard:
  • exception handling
  • trying to improve performance with out-of-order execution, etc.
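As a small illustration of the data hazard case, the sketch below scans a program for an instruction that reads a register written by one of the few instructions just before it. The function name and the `window` parameter are hypothetical; the right window size depends on the pipeline depth, on forwarding and on register file timing, none of which is fixed here, so the default of 2 is only an assumption.

```python
def raw_hazards(program, window=2):
    """Return (producer, consumer, register) triples where the consumer reads a
    register written by one of the `window` instructions just before it.

    `program` is a list of (dest, sources) pairs; dest is None when the
    instruction writes no register.
    """
    hazards = []
    for j, (_, sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][0]
            if dest is not None and dest in sources:
                hazards.append((i, j, dest))
    return hazards


# The example program from this lecture: (destination, source registers).
lecture_program = [
    ("r1", ["r2"]),           # lw   r1, r2(35)
    ("r2", ["r2"]),           # addI r2, r2, 3
    ("r3", ["r4", "r5"]),     # sub  r3, r4, r5
    (None, ["r6", "r7"]),     # beq  r6, r7, 100
    ("r8", ["r9"]),           # ori  r8, r9, 17
    ("r10", ["r11", "r12"]),  # add  r10, r11, r12
    ("r13", ["r14", "r15"]),  # and  r13, r14, r15
]
print(raw_hazards(lecture_program))  # []

# A made-up dependent sequence for contrast.
dependent = [("r1", ["r2"]), ("r4", ["r1", "r5"]), (None, ["r4", "r6"])]
print(raw_hazards(dependent))  # [(0, 1, 'r1'), (1, 2, 'r4')]
```

The lecture's example program has no such dependence between neighbouring instructions, which is why the walkthrough could ignore data hazards entirely.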

Summary

° Pipelining is a fundamental concept
  • multiple steps using distinct resources

° Utilize capabilities of the datapath by pipelined instruction processing
