Professor Mike Schulte Computer Architecture ECE 201

Lecture 10: Designing a Multicycle Processor

Recap: A Single Cycle Datapath

  ° We designed a processor that requires one cycle per instruction.

  [Figure: single-cycle datapath. Instruction<31:0> supplies the fields Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Blocks: Instruction Fetch Unit, 32 32-bit registers (Rw, Ra, Rb, busA, busW), Extender, ALU, Data Memory. Control signals: nPC_sel, RegDst, RegWr, ALUctr, ExtOp, ALUSrc, MemWr, MemtoReg.]

What’s wrong with our CPI=1 processor?

  Path through the datapath for each instruction type:

  Arithmetic & Logical:  PC, Inst Memory, Reg File, mux, ALU, mux, setup
  Load (Critical Path):  PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
  Store:                 PC, Inst Memory, Reg File, mux, ALU, Data Mem
  Branch:                PC, Inst Memory, Reg File, cmp, mux

  ° Long cycle time
  ° All instructions take as much time as the slowest
  ° Real memory is not so nice as our idealized memory
  • cannot always get the job done in one (short) cycle
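The "slowest instruction sets the clock" point can be sketched numerically. The component delays below are hypothetical placeholders chosen only for illustration (the lecture gives no numbers); the paths follow the slide.

```python
# Hypothetical component delays in nanoseconds; illustrative only, not from the lecture.
DELAY_NS = {
    "PC": 1, "Inst Memory": 10, "Reg File": 5, "mux": 1,
    "ALU": 8, "Data Mem": 10, "cmp": 3, "setup": 1,
}

# Components each instruction type passes through, following the slide.
PATHS = {
    "Arithmetic & Logical": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "mux", "setup"],
    "Load": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem", "mux", "setup"],
    "Store": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem"],
    "Branch": ["PC", "Inst Memory", "Reg File", "cmp", "mux"],
}


def path_delay_ns(path):
    """Sum the delays of every component along one path."""
    return sum(DELAY_NS[component] for component in path)


delays = {name: path_delay_ns(path) for name, path in PATHS.items()}
critical = max(delays, key=delays.get)   # slowest instruction type
cycle_time_ns = delays[critical]         # a single-cycle clock must cover it

for name, delay in delays.items():
    idle = cycle_time_ns - delay
    print(f"{name:22s} {delay:3d} ns, idle {idle:2d} ns of every cycle")
```

With these assumed delays the load path is the longest, so every other instruction type leaves part of each cycle unused.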

Reducing Cycle Time

  ° Cut combinational dependency graph and insert register / latch
  ° Do same work in two fast cycles, rather than one slow one

  [Figure: one block of acyclic combinational logic between two storage elements => acyclic combinational logic (A), a new storage element, and acyclic combinational logic (B), again between storage elements.]
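A rough sketch of the trade-off, assuming a simple timing model in which the clock period is the slowest block between storage elements plus a fixed register overhead. The delay values and the `register_overhead_ns` parameter are hypothetical.

```python
def cycle_time_ns(block_delays_ns, register_overhead_ns):
    """Clock period: slowest combinational block plus storage-element overhead."""
    return max(block_delays_ns) + register_overhead_ns


# Hypothetical numbers: one 20 ns block versus the same logic cut into A and B.
one_slow_cycle = cycle_time_ns([20.0], register_overhead_ns=1.0)
two_fast_cycles = cycle_time_ns([10.0, 10.0], register_overhead_ns=1.0)

print("before the cut:", one_slow_cycle, "ns per cycle")
print("after the cut: ", two_fast_cycles, "ns per cycle,",
      2 * two_fast_cycles, "ns for both steps")
```

Under this model the clock gets faster, while the total time for both steps grows slightly because the register overhead is paid twice.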

Basic Limits on Cycle Time

  ° Next address logic
  • PC <= branch ? PC + offset : PC + 4
  ° Instruction Fetch
  • InstructionReg <= Mem[PC]
  ° Operand Fetch
  • A <= R[rs], B <= R[rt]
  ° ALU execution
  • ALUout <= A + B
  ° Data Memory Access
  • M <= Mem[ALUout]
  ° Result Store
  • R[rd] <= ALUout

  [Figure: Control drives nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr over a datapath of Next PC, PC, Instruction Fetch, Operand Fetch (Reg File), Exec, Mem Access, Result Store.]

Partitioning the CPI=1 Datapath

  ° Add registers between smallest steps

  [Figure: the CPI=1 datapath cut into steps (Next PC, PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, Result Store), with control signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr.]

  Allow the instruction to take multiple cycles.

Example Multicycle Datapath

  [Figure: multicycle datapath. Next PC, PC, Instruction Fetch into IR, Operand Fetch from the Reg File into A and B, Ext and ALU into S, Mem Access into M, Result Store to the Reg File. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg.]

  ° Additional registers are added to store values between stages.

R-type instructions (add, sub, . . .)

  

  inst    Logical Register Transfers
  ADDU    R[rd] <– R[rs] + R[rt]; PC <– PC + 4

  inst    Physical Register Transfers
          IR <– MEM[pc]
  ADDU    A <– R[rs]; B <– R[rt]
          S <– A + B
          R[rd] <– S; PC <– PC + 4

  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access, M, Reg File).]

Logical immediate instructions

  inst    Logical Register Transfers
  ORI     R[rt] <– R[rs] OR zx(Im16); PC <– PC + 4

  inst    Physical Register Transfers
          IR <– MEM[pc]
  ORI     A <– R[rs]; B <– R[rt]
          S <– A or ZeroExt(Im16)
          R[rt] <– S; PC <– PC + 4

  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access, M, Reg File).]

Load instruction

  inst    Logical Register Transfers
  LW      R[rt] <– MEM(R[rs] + sx(Im16)); PC <– PC + 4

  inst    Physical Register Transfers
          IR <– MEM[pc]
  LW      A <– R[rs]; B <– R[rt]
          S <– A + SignEx(Im16)
          M <– MEM[S]
          R[rt] <– M; PC <– PC + 4

  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access, M, Reg File).]

Store instruction

  inst    Logical Register Transfers
  SW      MEM(R[rs] + sx(Im16)) <– R[rt]; PC <– PC + 4

  inst    Physical Register Transfers
          IR <– MEM[pc]
  SW      A <– R[rs]; B <– R[rt]
          S <– A + SignEx(Im16)
          MEM[S] <– B; PC <– PC + 4

  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access, M, Reg File).]

Branch instruction

  inst    Logical Register Transfers
  BEQ     if R[rs] == R[rt]
          then PC <= PC + sx(Im16) || 00
          else PC <= PC + 4

  inst    Physical Register Transfers
          IR <– MEM[pc]
  BEQ     A <– R[rs]; B <– R[rt]
          Eq = (A - B == 0)
  BEQ&Eq  PC <– PC + sx(Im16) || 00

  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access, M, Reg File).]
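The physical register transfers for the five instruction types can be traced with a small simulation. This is a sketch under stated assumptions, not the lecture's implementation: instructions are modeled as dictionaries rather than 32-bit words, memories are plain dictionaries, and the names `execute_one`, `imem`, and `dmem` are made up for the example. Each commented line is one cycle, and the branch target follows the slide's `PC + sx(Im16) || 00`.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_one(imem, dmem, regs, pc):
    """Run one instruction as a series of register transfers.

    Returns (next_pc, cycles). IR, A, B, S, M mirror the datapath registers.
    """
    ir = imem[pc]                                  # IR <- MEM[pc]
    a, b = regs[ir["rs"]], regs[ir["rt"]]          # A <- R[rs]; B <- R[rt]
    cycles = 2
    op = ir["op"]
    if op == "ADDU":
        s = (a + b) & MASK32                       # S <- A + B
        regs[ir["rd"]] = s                         # R[rd] <- S; PC <- PC + 4
        return pc + 4, cycles + 2
    if op == "ORI":
        s = a | (ir["imm16"] & 0xFFFF)             # S <- A or ZeroExt(Im16)
        regs[ir["rt"]] = s                         # R[rt] <- S; PC <- PC + 4
        return pc + 4, cycles + 2
    if op == "LW":
        s = (a + sign_extend16(ir["imm16"])) & MASK32   # S <- A + SignEx(Im16)
        m = dmem[s]                                # M <- MEM[S]
        regs[ir["rt"]] = m                         # R[rt] <- M; PC <- PC + 4
        return pc + 4, cycles + 3
    if op == "SW":
        s = (a + sign_extend16(ir["imm16"])) & MASK32   # S <- A + SignEx(Im16)
        dmem[s] = b                                # MEM[S] <- B; PC <- PC + 4
        return pc + 4, cycles + 2
    if op == "BEQ":
        eq = (a - b) == 0                          # Eq = (A - B == 0)
        offset = sign_extend16(ir["imm16"]) << 2   # sx(Im16) || 00
        return (pc + offset if eq else pc + 4), cycles + 1
    raise ValueError(f"unsupported instruction: {op}")


# Small made-up program; register contents are arbitrary.
regs = [0] * 32
regs[1], regs[2] = 5, 7
dmem = {}
imem = {
    0:  {"op": "ADDU", "rs": 1, "rt": 2, "rd": 3},
    4:  {"op": "ORI", "rs": 0, "rt": 4, "imm16": 0x00F0},
    8:  {"op": "SW", "rs": 0, "rt": 3, "imm16": 16},
    12: {"op": "LW", "rs": 0, "rt": 5, "imm16": 16},
    16: {"op": "BEQ", "rs": 3, "rt": 5, "imm16": -4},
}

pc, total_cycles = 0, 0
for _ in range(len(imem)):
    pc, used = execute_one(imem, dmem, regs, pc)
    total_cycles += used

print("R3 =", regs[3], "R4 =", hex(regs[4]), "R5 =", regs[5])
print("final PC =", pc, "total cycles =", total_cycles)
```

The per-instruction cycle counts that fall out of this sketch (4 for ADDU, ORI, and SW, 5 for LW, 3 for BEQ) are simply the number of transfer steps listed on the slides.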

Our Control Model

  ° State specifies control signals for Register Transfer
  ° Transfer occurs upon exiting state

  [Figure: inputs (conditions) feed the Next State Logic, which updates the Control State; the Output Logic turns the state into outputs (control signals).]

Control Specification for multicycle proc

  “instruction fetch”:        IR <= MEM[PC]
  “decode / operand fetch”:   A <= R[rs]; B <= R[rt]

  Type           Execute               Memory                       Write-back
  R-type         S <= A fun B                                       R[rd] <= S; PC <= PC + 4
  ORi            S <= A or ZX                                       R[rt] <= S; PC <= PC + 4
  LW             S <= A + SX           M <= MEM[S]                  R[rt] <= M; PC <= PC + 4
  SW             S <= A + SX           MEM[S] <= B; PC <= PC + 4
  BEQ & ~Equal   PC <= PC + 4
  BEQ & Equal    PC <= PC + SX || 00

  datapath + state diagram => control
  ° Translate RTs into control points
  ° Assign states
  ° Then go build the controller

Mapping RTs to Control Points

  State                  Register transfers           Control points
  “instruction fetch”    IR <= MEM[PC]                imem_rd, IRen
  “decode”               A <= R[rs]; B <= R[rt]       Aen, Ben
  Execute (R-type)       S <= A fun B                 ALUfun, Sen
  Write-back (R-type)    R[rd] <= S; PC <= PC + 4     RegDst, RegWr, PCen

  Type           Execute               Memory                       Write-back
  ORi            S <= A or ZX                                       R[rt] <= S; PC <= PC + 4
  LW             S <= A + SX           M <= MEM[S]                  R[rt] <= M; PC <= PC + 4
  SW             S <= A + SX           MEM[S] <= B; PC <= PC + 4
  BEQ & ~Equal   PC <= PC + 4
  BEQ & Equal    PC <= PC + SX || 00

Assigning States

  State  Step                         Register transfers
  0000   “instruction fetch”          IR <= MEM[PC]
  0001   “decode”                     A <= R[rs]; B <= R[rt]
  0010   BEQ & Equal                  PC <= PC + SX || 00
  0011   BEQ & ~Equal                 PC <= PC + 4
  0100   R-type, Execute              S <= A fun B
  0101   R-type, Write-back           R[rd] <= S; PC <= PC + 4
  0110   ORi, Execute                 S <= A or ZX
  0111   ORi, Write-back              R[rt] <= S; PC <= PC + 4
  1000   LW, Execute                  S <= A + SX
  1001   LW, Memory                   M <= MEM[S]
  1010   LW, Write-back               R[rt] <= M; PC <= PC + 4
  1011   SW, Execute                  S <= A + SX
  1100   SW, Memory                   MEM[S] <= B; PC <= PC + 4

Detailed Control Specification

         State  Op field  Eq  Next   IR   PC       Ops   Exec           Mem     Write-Back
                                     en   en sel   A B   Ex Sr ALU S    R W M   M-R Wr Dst
         0000   xxxxxx    x   0001   1    0  x     0 0   x  x  x   0    x 0 0   x   0  x
         0001   BEQ       0   0011   0    0  x     1 1   x  x  x   0    x 0 0   x   0  x
         0001   BEQ       1   0010   0    0  x     1 1   x  x  x   0    x 0 0   x   0  x
         0001   R-type    x   0100   0    0  x     1 1   x  x  x   0    x 0 0   x   0  x
         0001   orI       x   0110   0    0  x     1 1   x  x  x   0    x 0 0   x   0  x
         0001   LW        x   1000   0    0  x     1 1   x  x  x   0    x 0 0   x   0  x
         0001   SW        x   1011   0    0  x     1 1   x  x  x   0    x 0 0   x   0  x
         0010   xxxxxx    x   0000   0    1  1     0 0   x  x  x   0    x 0 0   x   0  x
         0011   xxxxxx    x   0000   0    1  0     0 0   x  x  x   0    x 0 0   x   0  x
  R:     0100   xxxxxx    x   0101   0    0  x     0 0   x  1  fun 0    x 0 0   x   0  x
         0101   xxxxxx    x   0000   0    1  0     0 0   x  x  x   0    x 0 0   0   1  1
  ORi:   0110   xxxxxx    x   0111   0    0  x     0 0   0  0  or  1    x 0 0   0   1  1
         0111   xxxxxx    x   0000   0    1  0     0 0   x  x  x   0    x 0 0   0   1  0
  LW:    1000   xxxxxx    x   1001   0    0  x     0 0   1  0  add 1    x 0 0   0   1  0
         1001   xxxxxx    x   1010   0    0  x     0 0   x  x  x   0    1 0 1   x   0  x
         1010   xxxxxx    x   0000   0    1  0     0 0   x  x  x   0    x 0 0   1   1  0
  SW:    1011   xxxxxx    x   1100   0    0  x     0 0   1  0  add 1    x 0 0   x   0  x
         1100   xxxxxx    x   0000   0    1  0     0 0   x  x  x   0    0 1 0   x   0  x
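The State, Op field, Eq, and Next columns of the detailed control specification amount to a next-state function. A minimal sketch follows, using the state codes from the table; the dictionary and function names are made up for illustration, and the datapath control outputs are left out.

```python
# Dispatch out of the decode state 0001, keyed by instruction type.
DISPATCH = {"R-type": "0100", "ORi": "0110", "LW": "1000", "SW": "1011"}

# States whose successor does not depend on the op field or Eq.
FIXED_NEXT = {
    "0000": "0001",
    "0010": "0000", "0011": "0000",
    "0100": "0101", "0101": "0000",
    "0110": "0111", "0111": "0000",
    "1000": "1001", "1001": "1010", "1010": "0000",
    "1011": "1100", "1100": "0000",
}


def next_state(state, op, eq):
    """Explicit next-state function: (state, op field, Eq) -> next state."""
    if state == "0001":
        if op == "BEQ":
            return "0010" if eq else "0011"
        return DISPATCH[op]
    return FIXED_NEXT[state]


def cycles_per_instruction(op, eq=0):
    """Count the states visited from fetch until control returns to 0000."""
    state, count = "0000", 0
    while True:
        state = next_state(state, op, eq)
        count += 1
        if state == "0000":
            return count


for op in ("R-type", "ORi", "LW", "SW", "BEQ"):
    print(op, cycles_per_instruction(op))
```

Counting states this way is how the state diagram yields a CPI for each instruction type.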

Performance Evaluation

  ° What is the average CPI?
  • state diagram gives CPI for each instruction type
  • workload gives frequency of each type

  Type          CPI_i for type   Frequency   CPI_i x freq_i
  Arith/Logic   4                40%         1.6
  Load          5                30%         1.5
  Store         4                10%         0.4
  branch        3                20%         0.6
                                 Average CPI: 4.1

Controller Design

  ° The state diagrams that define the controller for an instruction set processor are highly structured
  ° Use this structure to construct a simple “microsequencer”
  ° Control reduces to programming this very simple device
  • microprogramming

  [Figure: micro-PC sequencer and microinstruction, with sequencer control and datapath control fields.]
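The average CPI on the Performance Evaluation slide is a frequency-weighted sum, which can be checked in a few lines. The CPI and frequency values are the ones in the table; the variable names are made up for the example.

```python
# (CPI for the type, fraction of the workload), as given on the slide.
WORKLOAD = {
    "Arith/Logic": (4, 0.40),
    "Load": (5, 0.30),
    "Store": (4, 0.10),
    "branch": (3, 0.20),
}

# Average CPI = sum over types of CPI_i * freq_i.
average_cpi = sum(cpi * freq for cpi, freq in WORKLOAD.values())

for name, (cpi, freq) in WORKLOAD.items():
    print(f"{name:12s} {cpi} x {freq:.2f} = {cpi * freq:.1f}")
print(f"Average CPI: {average_cpi:.1f}")
```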

Example: micro PC

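  [Figure: micro PC. The op-code indexes a Map ROM; a counter holding the current value i is controlled by three signals: zero (go to 0000), inc (go to i+1), and load (take the Map ROM output).]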

Using a micro PC

  State  Step                   Register transfers            Sequencer
  0000   “instruction fetch”    IR <= MEM[PC]                 inc
  0001   “decode”               A <= R[rs]; B <= R[rt]        load (inc for BEQ & Equal)
  0010   BEQ & Equal            PC <= PC + SX || 00           zero
  0011   BEQ & ~Equal           PC <= PC + 4                  zero
  0100   R-type, Execute        S <= A fun B                  inc
  0101   R-type, Write-back     R[rd] <= S; PC <= PC + 4      zero
  0110   ORi, Execute           S <= A or ZX                  inc
  0111   ORi, Write-back        R[rt] <= S; PC <= PC + 4      zero
  1000   LW, Execute            S <= A + SX                   inc
  1001   LW, Memory             M <= MEM[S]                   inc
  1010   LW, Write-back         R[rt] <= M; PC <= PC + 4      zero
  1011   SW, Execute            S <= A + SX                   inc
  1100   SW, Memory             MEM[S] <= B; PC <= PC + 4     zero

Our Micro Sequencer

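  [Figure: micro sequencer. The op-code indexes the Map ROM, which feeds the Micro-PC; the Z, I, L (zero, inc, load) sequencer signals and the taken input select the next Micro-PC value; the remaining outputs are the datapath control.]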

Microprogram Control Specification

         µPC   Taken  Next   IR   PC       Ops   Exec           Mem     Write-Back
                             en   en sel   A B   Ex Sr ALU S    R W M   M-R Wr Dst
         0000  ?      inc    1    0  x     0 0   x  x  x   0    x 0 0   x   0  x
         0001  0      load   0    0  x     1 1   x  x  x   0    x 0 0   x   0  x
         0001  1      inc    0    0  x     1 1   x  x  x   0    x 0 0   x   0  x
  BEQ:   0010  x      zero   0    1  1     0 0   x  x  x   0    x 0 0   x   0  x
         0011  x      zero   0    1  0     0 0   x  x  x   0    x 0 0   x   0  x
  R:     0100  x      inc    0    0  x     0 0   x  1  fun 0    x 0 0   x   0  x
         0101  x      zero   0    1  0     0 0   x  x  x   0    x 0 0   0   1  1
  ORi:   0110  x      inc    0    0  x     0 0   0  0  or  1    x 0 0   0   1  1
         0111  x      zero   0    1  0     0 0   x  x  x   0    x 0 0   0   1  0
  LW:    1000  x      inc    0    0  x     0 0   1  0  add 1    x 0 0   0   1  0
         1001  x      inc    0    0  x     0 0   x  x  x   0    1 0 1   x   0  x
         1010  x      zero   0    1  0     0 0   x  x  x   0    x 0 0   1   1  0
  SW:    1011  x      inc    0    0  x     0 0   1  0  add 1    x 0 0   x   0  x
         1100  x      zero   0    1  0     0 0   x  x  x   0    0 1 0   x   0  x

Mapping ROM

  Instruction   opcode   µPC
  R-type        000000   0100
  BEQ           000100   0011
  ori           001101   0110
  LW            100011   1000
  SW            101011   1011
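Putting the Mapping ROM together with the zero / inc / load actions from the microprogram gives a very small sequencer model. The sketch below uses the ROM contents and per-state actions from the slides; the function names are hypothetical, and `taken` is read here as "BEQ and Equal", which is an interpretation of the Taken column.

```python
# Mapping ROM from the slide: opcode bits -> first µPC after decode.
MAP_ROM = {
    "000000": 0b0100,  # R-type
    "000100": 0b0011,  # BEQ
    "001101": 0b0110,  # ori
    "100011": 0b1000,  # LW
    "101011": 0b1011,  # SW
}

# Sequencer action for every µPC except decode (0001), which depends on taken.
SEQUENCER_ACTION = {
    0b0000: "inc",
    0b0010: "zero", 0b0011: "zero",
    0b0100: "inc", 0b0101: "zero",
    0b0110: "inc", 0b0111: "zero",
    0b1000: "inc", 0b1001: "inc", 0b1010: "zero",
    0b1011: "inc", 0b1100: "zero",
}


def next_upc(upc, opcode, taken):
    """Apply zero, inc, or load to the micro-PC."""
    if upc == 0b0001:
        action = "inc" if taken else "load"
    else:
        action = SEQUENCER_ACTION[upc]
    if action == "zero":
        return 0b0000
    if action == "inc":
        return upc + 1
    return MAP_ROM[opcode]  # load


def trace(opcode, taken=False):
    """List the µPC values visited while executing one instruction."""
    visited = [0b0000]
    while True:
        upc = next_upc(visited[-1], opcode, taken)
        if upc == 0b0000:
            return visited
        visited.append(upc)


for name, opcode, taken in [("LW", "100011", False), ("BEQ taken", "000100", True)]:
    print(name, [format(upc, "04b") for upc in trace(opcode, taken)])
```

The state codes were assigned so that each instruction's steps are consecutive, which is why a plain increment covers most transitions and the ROM is needed only at decode.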

Example: Controlling Memory

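  [Figure: the PC supplies addr to the Instruction Memory, whose data output feeds the Inst. Reg. The controller drives InstMem_rd and IR_en; the memory returns IM_wait.]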

Controller handles non-ideal memory

  “instruction fetch”:        IR <= MEM[PC]  (repeat while wait; advance on ~wait)
  “decode / operand fetch”:   A <= R[rs]; B <= R[rt]

  Type           Execute               Memory (repeat while wait)   Write-back
  R-type         S <= A fun B                                       R[rd] <= S; PC <= PC + 4
  ORi            S <= A or ZX                                       R[rt] <= S; PC <= PC + 4
  LW             S <= A + SX           M <= MEM[S]                  R[rt] <= M; PC <= PC + 4
  SW             S <= A + SX           MEM[S] <= B                  PC <= PC + 4
  BEQ & ~Equal   PC <= PC + 4
  BEQ & Equal    PC <= PC + SX || 00
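The effect of the wait loops on cycle count can be sketched as follows. The model is an assumption made for illustration: a state that touches memory is repeated once for every cycle the memory asserts wait, then once more when wait is deasserted. The state lists cover only the R-type and LW paths, and the wait counts are hypothetical.

```python
# States that talk to memory and therefore loop while wait is asserted.
WAITING_STATES = {"fetch", "memory"}

R_TYPE_STATES = ["fetch", "decode", "execute", "write-back"]
LW_STATES = ["fetch", "decode", "execute", "memory", "write-back"]


def count_cycles(states, wait_cycles):
    """Count cycles for one instruction.

    wait_cycles maps a state name to the number of cycles the memory
    holds wait asserted before the access completes (hypothetical values).
    """
    cycles = 0
    for state in states:
        if state in WAITING_STATES:
            cycles += wait_cycles.get(state, 0)  # wait: stay in the same state
        cycles += 1                              # ~wait: transfer occurs, move on
    return cycles


print("LW, ideal memory:   ", count_cycles(LW_STATES, {}))
print("LW, slow memory:    ", count_cycles(LW_STATES, {"fetch": 2, "memory": 3}))
print("R-type, slow fetch: ", count_cycles(R_TYPE_STATES, {"fetch": 2}))
```

With ideal memory the counts fall back to the cycle counts of the original state diagram.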

Really Simple Time-State Control

  instruction fetch:   IR <= MEM[PC]  (repeat while wait; advance on ~wait)
  decode:              A <= R[rs]; B <= R[rt]

  Type           Execute           Memory (repeat while wait)   write-back
  R-type         S <= A fun B                                   R[rd] <= S; PC <= PC + 4
  ORi            S <= A or ZX                                   R[rt] <= S; PC <= PC + 4
  LW             S <= A + SX       M <= MEM[S]                  R[rt] <= M; PC <= PC + 4
  SW             S <= A + SX       MEM[S] <= B                  PC <= PC + 4
  BEQ & ~Equal                                                  PC <= PC + 4
  BEQ & Equal                                                   PC <= PC + SX || 00

Time-state Control Path

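  ° Local decode and control at each stage

  [Figure: time-state control path. Datapath: Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access, M, Reg File, with Equal and Valid signals. The instruction is carried along in IRex, IRmem, and IRwb, with a local controller per stage: Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl.]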

Overview of Control

  ° Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.

  Initial Representation     Finite State Diagram            Microprogram
  Sequencing Control         Explicit Next State Function    Microprogram counter + Dispatch ROMs
  Logic Representation       Logic Equations                 Truth Tables
  Implementation Technique   PLA                             ROM
                             “hardwired control”             “microprogrammed control”

Summary

  ° Disadvantages of the Single Cycle Processor
  • Long cycle time
  • Cycle time is too long for all instructions except the Load
  ° Multiple Cycle Processor:
  • Divide the instructions into smaller steps
  • Execute each step (instead of the entire instruction) in one cycle
  ° Partition datapath into equal size chunks to minimize cycle time
  • ~10 levels of logic between latches
  ° Follow same 5-step method for designing “real” processor

Summary (cont’d)

  ° Control is specified by a finite state diagram
  ° Specialized state diagrams are easily captured by a microsequencer
  • simple increment & “branch” fields
  • datapath control fields
  ° Control design reduces to microprogramming
  ° Control is more complicated with
  • complex instruction sets
  • restricted datapaths (see the book)
  ° Simple instruction set and powerful datapath => simple control