Professor Mike Schulte Computer Architecture ECE 201
Lecture 10: Designing a Multicycle Processor Professor Mike Schulte Computer Architecture ECE 201 Recap: A Single Cycle Datapath
° We designed a processor that requires one cycle per instruction.
Instruction<31:0> <21:25> <16:20> <11:15> nPC_sel
<0:15> Instruction Fetch Unit
Rd Rt Clk RegDst
1 Mux Rt Rs Rd Imm16 Rs Rt
RegWr ALUctr
5
5
5 MemtoReg busA Zero MemWr Rw Ra Rb busW ALU
32 32 32-bit
32 Mux Clk
32
32 Extender
1 WrEn Adr
32 Data imm16
32
16 Memory Clk ALUSrc ExtOp
What’s wrong with our CPI=1 processor?
Arithmetic & Logical
mux mux
PC Inst Memory Reg File ALU setup Load
mux mux
PC Inst Memory Reg File ALU Data Mem setup
Critical Path
Store
mux
PC Inst Memory Reg File ALU Data Mem Branch
mux
PC Inst Memory Reg File cmp
° Long cycle time ° All instructions take as much time as the slowest ° Real memory is not so nice as our idealized memory
- cannot always get the job done in one (short) cycle
Reducing Cycle Time
° Cut combinational dependency graph and insert register / latch ° Do same work in two fast cycles, rather than one slow one
storage element storage element Acyclic
Acyclic Combinational
Combinational Logic (A)
Logic
storage element =>
Acyclic Combinational Logic (B) storage element storage element
Basic Limits on Cycle Time
° Next address logic ° Data Memory Access
- PC <= branch ? PC + offset : PC + 4 • M <= Mem[ALUout]
° Instruction Fetch ° Result Store
- InstructionReg <= Mem[PC] • R[rd] <=ALUout
° Operand Fetch
- A <= R[rs], B <= R[rt]
° ALU execution
Control
- ALUout <= A + B
sel _ RegWr MemRd MemWr RegDst nPC ExtOp ALUSrc
ALUctr .
Exec
Reg File PC Fetch Fetch
Mem Access Operand Next PC Instruction
Result Store
Partitioning the CPI=1 Datapath
° Add registers between smallest steps sel _
RegWr MemRd MemWr RegDst MemWr ExtOp nPC
ALUSrc ALUctr .
Exec
Reg File PC Fetch
Mem
Fetch Operand Access Next PC Instruction
Result Store Allow the instruction to take multiple cycles.
Example Multicycle Datapath
sel _ RegWr
MemToReg RegDst MemRd MemWr nPC
ALUctr ALUSrc ExtOp .
A Reg
S
Reg File Ext ALU
IR
File
PC
B
Next PC
M
Mem
Access Fetch InstructionResult Store Fetch Operand ° Additional registers are added to store values between stages.
R-type instructions (add, sub, . . .)
inst Logical Register Transfers
ADDU R[rd] <– R[rs] + R[rt]; PC <– PC + 4inst Physical Register Transfers
IR <– MEM[pc]
ADDU A<– R[rs]; B <– R[rt] S <– A + B R[rd] <– S; PC <– PC + 4 .A S
Reg File Reg File Exec
IR PC . Mem
B
Next PC Inst M Mem Access
Logical immediate instructions inst Logical Register Transfers ADDU R[rt] <– R[rs] OR zx(Im16); PC <– PC + 4 inst Physical Register Transfers
IR <– MEM[pc] ADDU A<– R[rs]; B <– R[rt]
S <– A or ZeroExt(Im16)
R[rt] <– S; PC <– PC + 4 .A S
Reg File Reg File Exec
IR PC . Mem
B
Next PC Inst M Mem
Access
Load instruction inst Logical Register Transfers
LW R[rt] <– MEM(R[rs] + sx(Im16);
PC <– PC + 4 inst Physical Register TransfersIR <– MEM[pc] LW A<– R[rs]; B <– R[rt]
S <– A + SignEx(Im16)
M <– MEM[S] R[rd] <– M; PC <– PC + 4 .A S
Reg File Reg File Exec
IR PC . Mem
B
Next PC Inst M Mem
Access
Store instruction inst Logical Register Transfers
SW MEM(R[rs] + sx(Im16) <– R[rt];
PC <– PC + 4 inst Physical Register TransfersIR <– MEM[pc] SW A<– R[rs]; B <– R[rt]
S <– A + SignEx(Im16);
MEM[S] <– B PC <– PC + 4 .A S
Reg File Reg File Exec
IR PC . Mem
B
Next PC Inst M Mem
Access
Branch instruction inst Logical Register Transfers BEQ if R[rs] == R[rt] then PC <= PC + sx(Im16) || 00 else PC <= PC + 4
inst Physical Register Transfers
IR <– MEM[pc]
A<– R[rs]; B <– R[rt]Eq =(A - B == 0)
BEQ&Eq PC <– PC + sx(Im16) || 00
.
A S
Reg File Reg File Exec
IR PC . Mem
B
Next PC Inst M Mem
Access
Our Control Model
° State specifies control signals for Register Transfer ° Transfer occurs upon exiting state
inputs (conditions) Next State Logic
Control State Output Logic outputs (control sugbaks)
Control Specification for multicycle proc
“instruction fetch”
IR <= MEM[PC]
A <= R[rs] “decode / operand fetch” B <= R[rt] LW BEQ & Equal R-type ORi BEQ & ~Equal SW
PC <= PC + 4 PC <= PC + S <= A fun B S <= A or ZX S <= A + SX S <= A + SX SX || 00
Execute M <= MEM[S] MEM[S] <= B PC <= PC + 4
Memory Access R[rd] <= S R[rt] <= S R[rt] <= M
PC <= PC + 4 PC <= PC + 4 PC <= PC + 4 Write-back
datapath + state diagram => control ° Translate RTs into control points ° Assign states ° Then go build the controller
Mapping RTs to Control Points
IR <= MEM[PC]
“instruction fetch”
imem_rd, IRen A <= R[rs]
“decode”
B <= R[rt] Aen, Ben
LW BEQ & Equal R-type ORi
SW BEQ & ~Equal
S <= A fun B PC <= PC + 4 PC <= PC + S <= A or ZX S <= A + SX S <= A + SX
ALUfun , Sen
SX || 00 Execute
M <= MEM[S] MEM[S] <= B PC <= PC + 4 Memory
R[rd] <= S PC <= PC + 4 RegDst,
R[rt] <= S R[rt] <= M RegWr,
PC <= PC + 4 PC <= PC + 4 PCen
Write-back
Assigning States
IR <= MEM[PC] R-type
0000 0001 0100 0101
R: ORi: LW: SW:
State Op field Eq Next IR PC Ops Exec Mem Write-Back en sel A B Ex Sr ALU S R W M M-R Wr Dst
0000 xxxxxx x 0001 1 0 x 0 0 x x x 0 x 0 0 x 0 x 0001 BEQ 0011 0 0 x 1 1 x x x 0 x 0 0 x 0 x 0001 BEQ 1 0010 0 0 x 1 1 x x x 0 x 0 0 x 0 x 0001 R-type x 0100 0 0 x 1 1 x x x 0 x 0 0 x 0 x 0001 orI x 0110 0 0 x 1 1 x x x 0 x 0 0 x 0 x 0001 LW x 1000 0 0 x 1 1 x x x 0 x 0 0 x 0 x 0001 SW x 1011 0 0 x 1 1 x x x 0 x 0 0 x 0 x 0010 xxxxxx x 0000 0 1 1 0 0 x x x 0 x 0 0 x 0 x 0011 xxxxxx x 0000 0 1 0 0 0 x x x 0 x 0 0 x 0 x 0100 xxxxxx x 0101 0 0 x 0 0 x 1 fun 0 x 0 0 x 0 x 0101 xxxxxx x 0000 0 1 0 0 0 x x x 0 x 0 0 0 1 1 0110 xxxxxx x 0111 0 0 x 0 0 0 0 or 1 x 0 0 0 1 1 0111 xxxxxx x 0000 0 1 0 0 0 x x x 0 x 0 0 0 1 0 1000 xxxxxx x 1001 0 0 x 0 0 1 0 add 1 x 0 0 0 1 0 1001 xxxxxx x 1010 0 0 x 0 0 x x x 0 1 0 1 x 0 x 1010 xxxxxx x 0000 0 1 0 0 0 x x x 0 x 0 0 1 1 0 1011 xxxxxx x 1100 0 0 x 0 0 1 0 add 1 x 0 0 x 0 x 1100 xxxxxx x 0000 0 1 0 0 0 x x x 0 0 1 0 x 0 x
1100
1010 0011 0010 1011
0110 0111 1000 1001
Execute Memory Write-back
A <= R[rs] B <= R[rt] S <= A fun B R[rd] <= S
SW “instruction fetch” “decode”
PC <= PC + 4 PC <= PC + SX || 00
BEQ & Equal BEQ & ~Equal
S <= A + SX MEM[S] <= B PC <= PC + 4
LW
S <= A + SX R[rt] <= M PC <= PC + 4 M <= MEM[S]
ORi
PC <= PC + 4 S <= A or ZX R[rt] <= S PC <= PC + 4
Detailed Control Specification
Performance Evaluation
- state diagram gives CPI for each instruction type
- workload gives frequency of each type Type CPI
1.5 Store 4 10%
sequencer control datapath control micro-PC sequencer microinstruction
° Use this structure to construct a simple “microsequencer” ° Control reduces to programming this very simple device
° The state digrams that arise define the controller for an instruction set processor are highly structured
0.6 Average CPI:4.1
0.4 branch 3 20%
1.6 Load 5 30%
Arith/Logic 4 40%
i
x freqI
i
for type Frequency CPI
i
° What is the average CPI?
Controller Design
- microprogramming
Example: micro PC
op-code Map ROM Counter zero inc load
0000 i i+1 i
Using a micro PC
PC <= PC + 4 S <= A or ZX R[rt] <= S PC <= PC + 4
ORi
S <= A + SX R[rt] <= M PC <= PC + 4 M <= MEM[S]
LW
S <= A + SX MEM[S] <= B PC <= PC + 4
A <= R[rs] B <= R[rt] S <= A fun B R[rd] <= S
PC <= PC + 4 PC <= PC + SX || 00
SW “instruction fetch” “decode”
Execute Memory Write-back
0000 0001 0100 0101
0110 0111 1000 1001
1010 0011 0010 1011
1100 inc load inc zero zero zero zero zero zero inc inc inc inc inc
IR <= MEM[PC] R-type
BEQ & Equal BEQ & ~Equal
Our Micro Sequencer
op-code Map ROM Micro-PC
Z I L datapath control taken
Microprogram Control Specification
0000 ? inc 1 0 x 0 0 x x x 0 x 0 0 x 0 x 0001 load 0 0 x 1 1 x x x 0 x 0 0 x 0 x 0001 1 inc 0 0 x 1 1 x x x 0 x 0 0 x 0 x 0010 x zero 0 1 1 0 0 x x x 0 x 0 0 x 0 x 0011 x zero 0 1 0 0 0 x x x 0 x 0 0 x 0 x 0100 x inc 0 0 x 0 0 x 1 fun 0 x 0 0 x 0 x 0101 x zero 0 1 0 0 0 x x x 0 x 0 0 0 1 1 0110 x inc 0 0 x 0 0 0 0 or 1 x 0 0 0 1 1 0111 x zero 0 1 0 0 0 x x x 0 x 0 0 0 1 0 1000 x inc 0 0 x 0 0 1 0 add 1 x 0 0 0 1 0 1001 x inc 0 0 x 0 0 x x x 0 1 0 1 x 0 x 1010 x zero 0 1 0 0 0 x x x 0 x 0 0 1 1 0 1011 x inc 0 0 x 0 0 1 0 add 1 x 0 0 x 0 x 1100 x zero 0 1 0 0 0 x x x 0 0 1 0 x 0 x
µPC Taken Next IR PC Ops Exec Mem Write-Back en sel A B Ex Sr ALU S R W M M-R Wr Dst
R: ORi: LW: SW: BEQ
Mapping ROM
R-type 000000 0100 BEQ 000100 0011 ori 001101 0110 LW 100011 1000 SW 101011 1011 opcode
µPC
Example: Controlling Memory
PC Instruction Memory Inst. Reg
addr data
IR_en InstMem_rd
IM_wait
Controller handles non-ideal memory
“instruction fetch”
IR <= MEM[PC]
wait ~wait A <= R[rs] “decode / operand fetch” B <= R[rt]
LW BEQ & Equal R-type ORi SW BEQ & ~Equal PC <= PC + 4 PC <= PC + S <= A fun B S <= A or ZX S <= A + SX S <= A + SX
SX || 00 Execute
M <= MEM[S] MEM[S] <= B Memory
~wait wait wait ~wait
R[rd] <= S R[rt] <= S R[rt] <= M PC <= PC + 4 PC <= PC + 4 PC <= PC + 4 PC <= PC + 4
Write-back
Really Simple Time-State Control
IR <= MEM[PC] wait ~wait instruction fetch
A <= R[rs] B <= R[rt] decode
LW BEQ & Equal R-type ORi SW BEQ & ~Equal S <= A fun B S <= A or ZX S <= A + SX S <= A + SX
Execute M <= MEM[S] MEM[S] <= B
Memory wait wait
R[rd] <= S R[rt] <= S R[rt] <= M PC <= PC + 4PC <= PC + PC <= PC + 4 PC <= PC + 4 PC <= PC + 4
PC <= PC + 4 SX || 00 write-back
Time-state Control Path
° Local decode and control at each stage Exec
Reg .
File
Mem
AccessA B S M
Reg File Equal PC Next PC
IR Inst . Mem
Valid
IRex Dcd Ctrl
IRmem Ex Ctrl
IRwb Mem Ctrl WB Ctrl
Overview of Control
° Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique. Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation PLA ROM Technique “hardwired control” “microprogrammed control”
Summary
° Disadvantages of the Single Cycle Processor
- Long cycle time
- Cycle time is too long for all instructions except the Load ° Multiple Cycle Processor:
- Divide the instructions into smaller steps
- Execute each step (instead of the entire instruction) in one
cycle ° Partition datapath into equal size chunks to minimize cycle time
- ~10 levels of logic between latches
° Follow same 5-step method for designing “real” processor
Summary (cont’d)
° Control is specified by finite state digram ° Specialize state-diagrams easily captured by microsequencer
- simple increment & “branch” fields
- datapath control fields
° Control design reduces to Microprogramming ° Control is more complicated with
- complex instruction sets
- restricted datapaths (see the book)
° Simple Instruction set and powerful datapath => simple control