Maximum Speedup ≤≤ Number of stages Speedup ≤ Time for unpipelined operation Time for longest stage

Lecture 13: Pipeline Control and Exceptions Professor Mike Schulte Computer Architecture ECE 201 Recap: Ideal Pipelining

IF DCD EX MEM WB

Example: 40ns data path, 5 stages, Longest stage is 10 ns, Speedup

≤

4 Assume instructions

are completely independent!

Maximum Speedup ≤≤ Number of stages Speedup ≤ ≤ Time for unpipelined operation Time for longest stage

Recap: Control Diagram

IR <- Mem[PC]; PC < PC+4; A <- R[rs]; B<– R[rt] S <– A + B; S <– A or ZX; S <– A + SX; S <– A + SX; If Cond PC < PC+SX; M <– S M <– Mem[S] Mem[S] <- B M <– S R[rd] <– S; R[rt] <– S; R[rd] <– M;

Equal .

A M S Reg File

Reg File

IR Exec PC

. Mem B

Next PC Inst

D Mem

Data Mem Access

But recall use of “Data Stationary Control”

° The Main Control generates the control signals during

Reg/Dec

Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
Control signals for Mem (MemWr Branch) are used 2 cycles later
Control signals for Wr (MemtoReg MemWr) are used 3 cycles later

Reg/Dec Exec Mem Wr ExtOp ExtOp ALUSrc ALUSrc

Ex/ Mem

IF/ID Register

ID/Ex Register ALUOp ALUOp

Mem Main /Wr

RegDst RegDst Control Register Register

MemW MemW MemW r r r Branch Branch Branch MemtoReg MemtoReg MemtoReg MemtoReg

RegWr RegWr RegWr RegWr

Datapath + Data Stationary Control

Let’s Try it Out 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 these addresses are octal

wb rw

me wb rw

ex me wb rw

Exec Reg .

File Mem

WB Ctrl

Mem Ctrl

D Decode

IR Inst . Mem

PC Next PC

A B S Reg File

Access Data Mem

rs rt op rs rt fun

Start: Fetch 10

Exec Reg .

File Mem

Access Data Mem

A B S Reg File

IR Inst . Mem

D Decode

Mem Ctrl

WB Ctrl

M rs rt im

10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 n n n n

PC Next PC

Fetch 14, Decode 10

File Mem

10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 n n n lw r1, r2(35)

Exec Reg .

PC Next PC

A B S Reg File

2 rt

WB Ctrl

Mem Ctrl

D Decode

IR Inst . Mem

Access Data Mem

Fetch 20, Decode 14, Exec 10

2 rt

PC Next PC

IF EX

35 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 n n lw r1 addI r2, r2, 3

Exec Reg .

File Mem

WB Ctrl

Mem Ctrl

D Decode

IR Inst . Mem

B S Reg File

Access Data Mem

Fetch 24, Decode 20, Exec 14, Mem 10

File Mem

Exec Reg .

PC Next PC

IF EX M

3 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 n lw r1 sub r3, r4, r5 addI r2, r2, 3

WB Ctrl

Mem Ctrl

D Decode

IR Inst . Mem

Reg File

r2+35

Access Data Mem

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10

Exec Reg .

Note Delayed Branch: always execute ori after beq

PC Next PC

IF EX M WB

7 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15

beq r6, r7 100 addI r2 sub r3

M[r2+35]

WB Ctrl

Mem Ctrl

D Decode

IR Inst . Mem

Reg File

r4 r5 r2+3

Access Data Mem

File Mem

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14

Access Data Mem

100 ori r8, r9 17

File Mem

100

PC Next PC

IF EX M WB

10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 beq addI

sub r3 r4-r5

r6 r7

r2+3

r1=M[r2+35] 9 xx

WB Ctrl

Mem Ctrl

D Decode

IR Inst . Mem

Reg File

Exec Reg .

Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20

Exec Reg .

File Mem

Access Data Mem

Reg File

IR Inst . Mem

D Decode

Mem Ctrl

WB Ctrl

10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15

ID EX M WB

PC Next PC

___

= Fill it in yourself!

Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24

Access Data Mem

Reg File

IR Inst . Mem

D Decode

Mem Ctrl

WB Ctrl

10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15

EX M WB

PC Next PC

___

= Fill it in yourself!

? ? ? ?

Exec Reg .

File Mem

Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30

Exec Reg .

? ? ? ?

IF DCD EX Mem WB

IF DCD OF Ex RS WAR Data Hazard

(write after read)

(write after write)

IF DCD OF Ex Mem

RAW (read after write) Data Hazard

WAW Data Hazard

IF DCD EX Mem WB

IFetch DCD ° ° °

Control Hazard

IFetch DCD ° ° ° Structural Hazard

? ?

= Fill it in yourself!

File Mem

___

PC Next PC

M WB

10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15

WB Ctrl

Mem Ctrl

D Decode

IR Inst . Mem

Reg File

Access Data Mem

Pipeline Hazards Again I-Fet ch DCD MemOpFetch OpFetch Exec Store

Data Hazards

° Avoid some “by design”

eliminate WAR by always fetching operands early (DCD) in pipe
eleminate WAW by doing all WBs in order (last stage, static)
stall or forward (if possible)

° Detect and resolve remaining ones

IF DCD EX Mem WB

IF DCD OF Ex Mem RAW Data Hazard WAW Data Hazard

IF DCD OF Ex RS RAW Data Hazard

IF DCD EX Mem WB

IF DCD EX Mem WB Hazard Detection

° Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline. ° A RAW hazard exists on register

Wregs( j )

ρ ρ if

ρ ∈ ρ ∈

Rregs( i )

∩ ∩

Keep a record of pending writes (for inst's in the pipe) and compare with operand regs of current instruction.
When instruction issues, reserve its result register.
When on operation completes, remove its write reservation.

° A WAW hazard exists on register ρ ρ if ρ ∈ ρ ∈ Wregs( i ) ∩ ∩ Wregs( j ) ° A WAR hazard exists on register ρ ρ if ρ ∈ ρ ∈ Wregs( i ) ∩ ∩ Rregs( j )

Record of Pending Writes

IAU

npc ° Current operand

registers

I mem

° Pending writes

Regs op rw rs rt PC

° hazard <=

B A im n op rw

((rs == rw & regW ) OR

ex) ex

((rs == rw & regW ) OR

mem) me

alu ((rs == rw & regW ) OR

wb) wb S n op rw ((rt == rw & regW ) OR ex) ex

((rt == rw & regW ) OR

mem) me

D mem ((rt == rw ) & regW )

wb wb m n op rw

Regs

Resolve RAW by forwarding

IAU

° Detect nearest valid write op npc operand register and forward into

I mem

op latches, bypass ing

Regs

remainder of the

op rw rs rt PC

Forward pipe mux

Increase muxes to

B A

im n op rw

add paths from pipeline registers

alu

Data Forwarding =

n op rw

Data Bypassing

D mem

n op rw Regs

What about memory operations? ° If instructions are initiated in order and operations always occur in the same

op Rd Ra Rb

stage, there can be no hazards between memory operations! ° What does delaying WB on arithmetic operations cost? – cycles ?

op Rd Ra Rb

– hardware ? A B ° What about data dependence on loads? R1 <- R4 + R5 R2 <- Mem[ R2 + I ] R3 <- R2 + R1 Rd R => " Delayed Loads " T Rd to reg file Compiler Avoiding Load Stalls: scheduled unscheduled 54% gcc 31% 42% spice 14% 65% tex 25% 0% 20% 40% 60% 80% % loads stalling pipeline

What about Interrupts, Traps, Faults?

° External

Interrupts:

Allow pipeline to drain,
Load PC with interupt address

° Faults (within instruction, restartable)

Force trap instruction into IF
disable writes till trap hits WB
must save multiple PCs or PC + state

Refer to MIPS solution

Exception Handling

IAU

npc

I mem detect bad instruction address

lw $2,20($5)

Regs PC

detect bad instruction

B A im n op rw detect overflow

alu

S detect bad data address

D mem

Regs Allow exception to take effect

Exception Problem

° Exceptions/Interrupts : 5 instructions executing in 5 stage pipeline

How to stop the pipeline?
Restart?
Who caused the interrupt?

Stage Problem interrupts occurring

IF Page fault on instruction fetch; misaligned memory access; memory-protection violation

ID Undefined or illegal opcode EX Arithmetic exception MEM Page fault on data fetch; misaligned memory access; memory-protection violation; memory error

° Load with data page fault, Add with instruction page fault? ° Solution 1: interrupt vector/instruction

Resolution: Freeze above & Bubble Below npc

alu

D mem

IAU PC Regs

im op rw n op rw n op rw n op rw rs rt

bubble freeze

I mem Regs

MIPS R3000 Instruction Pipeline Decode Inst Fetch ALU / E.A Memory Write Reg Reg. Read TLB I-Cache RF Operation WB E.A. TLB D-Cache Resource Usage TLB TLB I-cache RF WB ALUALU D-Cache

Write in phase 1, read in phase 2 => eliminates bypass from WB

Recall: Data Hazard on r1

Time (clock cycles)

ID/RF EX MEM WB ALU Reg Reg Im Dm add r1 ,r2,r3

I ALU n Im Reg Dm Reg s sub r4, r1 ,r3 t

ALU

Im Reg Dm Reg

and r6, r1 ,r7

O ALU r Im Reg Dm Reg or r8, r1 ,r9 d

ALU e

Im Reg Dm Reg r xor r10, r1 ,r11

With MIPS R3000 pipeline, no need to forward from WB stage

MIPS R3000 Multicycle Operations op Rd Ra Rb Ex: Multiply, Divide, Cache Miss Stall all stages above multicycle operation in the pipeline mul Rd Ra Rb A B Drain (bubble) stages below it Use control word of local stage state to step through multicycle operation Rd R Rd T to reg file Issues in Pipelined design

_{IF D Ex M W} Limitation ° Pipelining _{IF D Ex M W} _{IF D Ex M W} _{IF D Ex M W} Issue rate, FU stalls, FU depth ° Super-pipeline

- Issue one instruction per (fast) cycle

_{IF D Ex M W<- ALU takes multiple cycles
_{IF D Ex M W} _{IF D Ex M W} _{IF D Ex M W} Clock skew, FU stalls, FU depth ° Super-scalar _{IF D Ex M W} _{IF D Ex M W} Hazard resolutio- Issue multiple scalar
_{IF D Ex M W} instructions per cycle _{IF D Ex M W} ° VLIW (“EPIC”- Each instruction specifies
_{IF D Ex M W} _{Ex M W} Packing
multiple scalar operations _{Ex M W<- Compiler determines parallelism
_{Ex M W} ° Vector operations _{IF D Ex M W} _{Ex M W} Applicabilit- Each instruction specifies
_{Ex M W} series of identical operations _{Ex M W}
Partitioned Instruction Issue (simple Superscalar)
independent int and FP issue to separate pipelines
I-Cache Int Reg Inst Issue FP Reg and Bypass Operand / Result Busses Int Unit Load / FP Add FP Mul Store Unit D-Cache Single Issue Total Time = Int Time + FP Time Max Speedup: Total Time MAX(Int Time, FP Time) Multiple Pipes/ Harder Superscalar
IR0
IR1 Issues:

Reg. File ports
Register File Detecting Data Dependences A B B A Bypassing

RAW Hazard
WAR Hazard
R R D$ D$ Multiple load/store ops? T T Branches
Branch penalties in superscalar Example: resolved in op-fetch stage, single exposed delay (ala MIPS, Sparc)
I-fetch Branch delay Squash 2
I-fetch
Branch Squash 1 delay Summary
° Pipelines pass control information down the pipe just as data moves down pipe
° Forwarding/Stalls handled by local control ° Exceptions stop the pipeline ° MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
° More performance from deeper pipelines, parallelism}}

Maximum Speedup ≤≤ Number of stages Speedup ≤ Time for unpipelined operation Time for longest stage

Lecture 13: Pipeline Control and Exceptions Professor Mike Schulte Computer Architecture ECE 201 Recap: Ideal Pipelining

Maximum Speedup ≤≤ Number of stages Speedup ≤ ≤ Time for unpipelined operation Time for longest stage

Recap: Control Diagram

But recall use of “Data Stationary Control”

Reg/Dec

Datapath + Data Stationary Control

Start: Fetch 10

Fetch 14, Decode 10

Fetch 20, Decode 14, Exec 10

Fetch 24, Decode 20, Exec 14, Mem 10

Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10

Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14

Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20

Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24

Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30

Pipeline Hazards Again I-Fet ch DCD MemOpFetch OpFetch Exec Store

Data Hazards

IF DCD EX Mem WB Hazard Detection

Wregs( j )

Record of Pending Writes

Resolve RAW by forwarding

Forward pipe mux

Data Bypassing

What about memory operations? ° If instructions are initiated in order and operations always occur in the same

What about Interrupts, Traps, Faults?

Interrupts:

Exception Handling

Exception Problem

Resolution: Freeze above & Bubble Below npc

MIPS R3000 Instruction Pipeline Decode Inst Fetch ALU / E.A Memory Write Reg Reg. Read TLB I-Cache RF Operation WB E.A. TLB D-Cache Resource Usage TLB TLB I-cache RF WB ALUALU D-Cache

Recall: Data Hazard on r1

ID/RF EX MEM WB ALU Reg Reg Im Dm add r1 ,r2,r3

Partitioned Instruction Issue (simple Superscalar)

I-Cache Int Reg Inst Issue FP Reg and Bypass Operand / Result Busses Int Unit Load / FP Add FP Mul Store Unit D-Cache Single Issue Total Time = Int Time + FP Time Max Speedup: Total Time MAX(Int Time, FP Time) Multiple Pipes/ Harder Superscalar

IR1 Issues:

Register File Detecting Data Dependences A B B A Bypassing

Branch penalties in superscalar Example: resolved in op-fetch stage, single exposed delay (ala MIPS, Sparc)

Branch Squash 1 delay Summary

Dokumen yang terkait

Revisiting the Dimensions of Knowledge Management Orientation Behavior in Indonesia Creative Industry

The Role of Customer Engagement in Enhancing Passenger Loyalty in IndonesianAirline Industry: Relationship Marketing Approach

The impact of supply chain management practices on performance of SMEs

Research Journal of Hospitality Tourism

Market-sensing Capability and Business Performance of Retail Entrepreneurs

The Organizational of International Business

Political Economy of International Trade

Kopy Luwak: a conservation strategy for global market Aluisius Hery Pratono Irzameingindra Putri Radjamin

Paper Produksi Secara Industrial Materi Ajar Berbasis Teknologi Informasi (Industrial Production of IT-based Learning Resources)

1-bit ALU for MSB (correction)

Dokumen yang Anda mencari sudah siap untuk unduhkan