Professor Mike Schulte Computer Architecture ECE 201

  Lecture 8: Designing a Single Cycle Datapath Professor Mike Schulte Computer Architecture ECE 201 The Big Picture: Where are We Now? °

  The Five Classic Components of a Computer °

  Today’s Topic: Design a Single Cycle Processor Control Datapath Memory Processor Input Output inst. set design technology machine design

  The Big Picture: The Performance Perspective °

  Performance of a machine is determined by:

  • Instruction count
  • Clock cycle time
  • Clock cycles per instruction
  • Clock cycle time
  • Clock cycles per instruction
  • Advantages: Simple design, low CPI
  • Disadvantages: Long cycle time, which is limited by the slowest instruction.
  • the meaning of each instruction is given by register
  • datapath must include storage element for ISA registers
  • datapath must support each register transfer
  • R-type
  • I-type
  • J-type

  ° Processor design (datapath and control) will determine:

  ° Single cycle processor - one clock cycle per instruction

  CPI Inst. Count Cycle Time How to Design a Processor: step-by-step °

  Analyze instruction set => datapath requirements

  transfers R[rd] <– R[rs] + R[rt];

  ° Select set of datapath components and establish clocking methodology

  ° Design datapath to meet the requirements

  ° Analyze implementation of each instruction to determine setting of control points that effects the register transfer.

  ° Design the control logic

  • op: operation of the instruction
  • rs, rt, rd: the source and destination register specifiers
  • shamt: shift amount
  • funct: selects the variant of the operation in the “op” field
  • address / immediate: address offset or immediate value
  • target address: target address of the jump instruction

  op rs rt rd shamt funct

  6

  11

  16

  21

  26

  31 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits

  • addu rd, rs, rt
  • subu rd, rs, rt

  26

  21

  16

  31 6 bits 16 bits 5 bits 5 bits

  31 6 bits 16 bits 5 bits 5 bits

  26

  21

  16

  op rs rt immediate

  • ori rt, rs, imm16
  • lw rt, rs, imm16
  • sw rt, rs, imm16
  • beq rs, rt, imm16

  op rs rt immediate

  16

  21

  26

  31 6 bits 16 bits 5 bits 5 bits

  op rs rt immediate

  ° LOAD and STORE

  ° BRANCH:

  16

  Review: The MIPS Instruction Formats °

  All MIPS instructions are 32 bits long. The three instruction formats are:

  ° The different fields are:

  op target address

  26

  31 6 bits 26 bits

  op rs rt rd shamt funct

  6

  11

  21

  ° OR Immediate:

  26

  31 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits

  op rs rt immediate

  16

  21

  26

  31 6 bits 16 bits 5 bits 5 bits

  Step 1a: The MIPS Subset for Today °

  ADD and SUB

  For the class project, we will be designing a processor that supports a larger subset of the MIPS instruction set.

  Register Transfer Logic (RTL) °

  RTL gives the meaning of the instructions °

  All instructions start by fetching the instruction op | rs | rt | rd | shamt | funct = MEM[ PC ] op | rs | rt | Imm16 = MEM[ PC ] inst Register Transfers addu R[rd] <– R[rs] + R[rt]; PC <– PC + 4 subu R[rd] <– R[rs] – R[rt]; PC <– PC + 4 ori R[rt] <– R[rs] + zero_ext(imm16); PC <– PC + 4 load R[rt] <– MEM[ R[rs] + sign_ext(imm16)]; PC <– PC + 4 store MEM[ R[rs] + sign_ext(imm16) ] <– R[rt]; PC <– PC + 4 beq if ( R[rs] == R[rt] ) then PC <– PC + 4 + sign_ext(imm16)] || 00 else PC <– PC + 4 Step 1: Requirements of the Instruction Set °

  Memory

  • instruction & data °

  Registers (32 x 32)

  • read rs
  • read rt
  • write rt or rd

  ° PC

  ° Extender (sign extend or zero extend)

  ° Add and sub register or extended immediate

  ° Add 4 or shifted extended immediate to PC

  Step 2: Components of the Datapath CarryIn °

  Adder A Adder

  32 Sum

  32 B Carry

  32 Select ° MUX

  A MUX

  32 Combinational Logic: Y

  32 Does not use a clock B

  32 °

ALU OP

  3 A

  32 ALU Result

  32 B

  32 Storage Element: Register (Basic Building Blocks) Write Enable °

  Register Similar to the D Flip Flop except

  • Data In Data Out
    • N-bit input and output

  N N Write enable input -

  Clk Write Enable:

  • negated (0): Data Out will not change
    • asserted (1): Data Out will become Data In
    • on the falling edge of the clock

  5 RWRARB 32 32-bit Registers

   busA and busB

  32 busB 5 5

  32 busA

  

32

  time.” Clk

busW

Write Enable

  block:

  via busW (data) when Write Enable is 1 ° Clock input (CLK)

  Storage Element: Register File ° Register File consists of 32 registers:

  Clocking Methodology - Negative Edge Triggered °

  Setup Hold

  . . . . . . . . . . .

  Don’t Care Setup Hold .

  Clk

  Skew ° (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time

  All storage elements are clocked by the same clock edge ° Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock

  • Two 32-bit output busses:
  • One 32-bit input bus: busW ° Register is selected by:
  • RA (number) selects the register to put on busA (data)
  • RB (number) selects the register to put on busB (data)
  • RW (number) selects the register to be written
  • The CLK input is a factor ONLY during write operation
  • During read operation, behaves as a combinational logic
    • RA or RB valid => busA or busB valid after “access

  Storage Element: Idealized Memory Write Enable Address ° Memory (idealized)

  32 One input bus: Data InData In DataOut One output bus: Data Out

  32

  32 Clk ° Memory word is selected by:

  • Address selects the word to put on Data OutWrite Enable = 1: address selects the memory

  word to be written via the Data In bus ° Clock input (CLK)

  The CLK input is a factor ONLY during write operation

  • During read operation, memory behaves as a
  • combinational logic block:
    • Address valid => Data Out valid after “access time.”

  Step 3 °

  Register Transfer Requirements –> Datapath Design Instruction FetchDecode instructions and Read Operands

  • Execute Operation
  • Write back the result
  • Fetch the Instruction: mem[PC]
  • Update the program counter:

  • Sequential Code: PC <- PC + 4
  • Branch and Jump: PC <- “something else”
    • Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
    • ALUctr and RegWr: control logic after decoding the

  5

  31 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits

  26

  21

  16

  11

  6

  

ALU

op rs rt rd shamt funct

  32 32-bit Registers Rs Rt Rd

  5 Rw Ra Rb

  5

  3a: Overview of the Instruction Fetch Unit ° The common RTL operations

  32 busA 32 busB

  32

  RegWr

  Clk busW

  

ALUctr

  32 Result

  instruction

  Logic 3b: Add & Subtract ° R[rd] <- R[rs] op R[rt] Example: addu rd, rs, rt

  Instruction Memory PC Clk Next Address

  32 Instruction Word Address

  3

  3c: Logical Operations with Immediate ° R[ rt ] <- R[rs] op ZeroExt[imm16] Example : ori rt, rs, imm16

  11

  31

  26

  21

  16

  op rs rt immediate

  6 bits 5 bits 5 bits 16 bits

  rd?

  31 16 15

  immediate 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

  16 bits 16 bits Rd Rt

  RegDst

  Mux

  Rs ALUctr

  RegWr

  5

  5

  5

  3 busA Rw Ra Rb busW

  

ALU

  32 Result

  32 32-bit

32 Registers

  32 busB Clk

  Mux

  32 ZeroExt imm16

  32

  16 ALUSrc

  3d: Load Operations ° R[ rt ] <- Mem[R[rs] + SignExt[imm16] Example: lw rt, rs, imm16

  11

  31

  26

  21

  16

  op rs rt immediate

  6 bits 5 bits 5 bits 16 bits

  rd

  Rd Rt RegDst

  Mux

  Rs ALUctr

  RegWr

  5

  5

  5

  3 busA W_Src Rw Ra Rb busW ALU

  32

  32 32-bit

  32 Registers

  32 Clk busB MemWr

  Mux Mux

  32 Extender WrEn Adr

  Data In

  32 Data

  32 imm16

  32

16 Memory

  Clk ALUSrc

  ExtOp

  3e: Store Operations ° Mem[ R[rs] + SignExt[imm16] <- R[rt] ] Example: sw rt, rs, imm16

  31

  26

  21

  16

  op rs rt immediate

  6 bits 5 bits 5 bits 16 bits Rd Rt

  ALUctr MemWr W_Src RegDst

  Mux

  Rs Rt

  3 RegWr

  5

  5

  5 busA Rw Ra Rb busW ALU

  32

  32 32-bit

  32 Registers

  32 busB Clk

  

Mux Mux

  32 Extender WrEn Adr

  Data In

  32

  32 Data imm16

  32

16 Memory

  Clk ALUSrc

  ExtOp

  3f: The Branch Instruction

  31

  26

  21

  16

  op rs rt immediate

  6 bits 5 bits 5 bits 16 bits

  ° beq rs, rt, imm16 mem[PC] Fetch the instruction from memory

  • Equal <- R[rs] == R[rt] Calculate the branch condition
  • if (COND eq 0) Calculate the next instruction’s address
    • PC <- PC + 4 + ( SignExt(imm16) x 4 )

  else

  • PC <- PC + 4

  Datapath for Branch Operations ° beq rs, rt, imm16 Datapath generates condition (equal) op rs rt immediate

00 Adder Mux Adder

  1

  16 imm16

  ALUSrc ExtOp Mux

  MemtoReg

  Clk Data In

  WrEn

  32 Adr

  Data Memory MemWr

  ALU Equal

  Instruction<31:0>

  1

  1

  RegDst Extender Mux

  <21:25> <16:20> <11:15> <0:15>

  Imm16 Rd Rt Rs

  = Adder Adder PC

  Clk

  4 nPC_sel

  PC Ext

  Adr

  Inst Memory

  3

  32

  Rt Rd

  16

  5

  21

  26

  31 6 bits 16 bits 5 bits 5 bits 32 imm16

  PC

  Clk

  4

  nPC_sel Clk busW

  RegWr

  32 busA 32 busB

  5

  5 Rw Ra Rb

  Rs Rt

  32 32-bit Registers

  Rs Rt

  Equal? Cond PC Ext Inst Address Putting it All Together: A Single Cycle Datapath imm16

  32 ALUctr Clk busW

  RegWr

  32

  32 busA 32 busB

  5

  5

  5 Rw Ra Rb

00 Mux

  32 32-bit Registers

  Step 4: Given Datapath: RTL -> Control ALUctr RegDst

  Fun

  PC Ext imm16

  4 nPC_sel

  Clk

  Adder Adder PC

  Inst Memory Meaning of the Control Signals ° Rs, Rt, Rd and Imed16 hardwired into datapath ° nPC_sel: 0 => PC <– PC + 4; 1 => PC <– PC + 4 + SignExt(Im16) || 00

  Adr

  RegWr

  <21:25>

  ALUSrc ExtOp MemtoReg MemWr

  Op

  Inst Memory DATA PATH Control

  Adr

  nPC_sel

  Imm16 Rd Rs Rt

  <21:25> <16:20> <11:15> <0:15>

  Instruction<31:0>

  Equal

00 Mux

  Meaning of the Control Signals °

  MemWr: write memory ° ExtOp: “zero”, “sign”

  ° MemtoReg: 0 => ALU; 1 => Mem

  ° ALUsrc: 0 => regB; 1 => immed °

  RegDst: 0 => “rt”; 1 => “rd” °

  RegWr: write dest register ° ALUctr: “add”, “sub”, “or”

  RegDst ALUctr MemWr MemtoReg Equal

  Rt Rd

  3

  1 Rs Rt

  RegWr

  5

  5

  5 busA

  =

  Rw Ra Rb busW

  ALU

  32

  32 32-bit

  32 Registers busB

  32 Mux

  Mux

  32 Clk

  Extender

  32 WrEn Adr

  1

  1 Data In

  Data

  imm16

  32

16 Memory

  Clk

  ExtOp ALUSrc Control Signals inst Register Transfer ADD R[rd] <– R[rs] + R[rt]; PC <– PC + 4 ALUsrc = RegB, ALUctr = “add”, RegDst = rd, RegWr, nPC_sel = “+4” SUB R[rd] <– R[rs] – R[rt]; PC <– PC + 4 ALUsrc = RegB, ALUctr = “sub”, RegDst = rd, RegWr, nPC_sel = “+4” ORi R[rt] <– R[rs] + zero_ext(Imm16); PC <– PC + 4 ALUsrc = Im, Extop = “Z”, ALUctr = “or”, RegDst = rt, RegWr, nPC_sel = “+4” LOAD R[rt] <– MEM[ R[rs] + sign_ext(Imm16)]; PC <– PC + 4 ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemtoReg, RegDst = rt, RegWr, nPC_sel = “+4” STORE MEM[ R[rs] + sign_ext(Imm16)] <– R[rs]; PC <– PC + 4 ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemWr, nPC_sel = “+4” BEQ if ( R[rs] == R[rt] ) then PC <– PC + sign_ext(Imm16)] || 00 else PC <– PC + 4 nPC_sel = EQUAL, ALUctr = “sub”

  Step 5: Logic for each control signal ° nPC_sel <= if (OP == BEQ) then EQUAL else 0 ° ALUsrc <= if (OP == “000000”) then “regB” else “immed” ° ALUctr <= if (OP == “000000”) then funct elseif (OP == ORi) then “OR” elseif (OP == BEQ) then “sub” else “add”

  Rs Rt

  1

  ° ExtOp <= if (OP == ORi) then “zero” else “sign” ° MemWr <= (OP == Store)

  MemtoReg

  <21:25> <16:20> <11:15> <0:15>

  Imm16 Rd Rt Rs

  = imm16

  RegDst Extender Mux

  Rt Rd

  Adder Adder PC

  ALU Equal

  5 Rw Ra Rb

  5

  5

  4 nPC_sel

  Adr

  RegWr

  Inst Memory sign ext add rt +4

  Example: Load Instruction

  ° MemtoReg <= (OP == Load) ° RegWr: <= if ((OP == Store) || (OP == BEQ)) then 0 else 1 ° RegDst: <= if ((OP == Load) || (OP == ORi)) then 0 else 1

  Instruction<31:0>

00 Mux

  Data Memory MemWr

  32 Adr

  1

  1

  WrEn

  ALUSrc ExtOp Mux

  16 imm16

  32

  32 32-bit Registers

  Clk

  32 busA 32 busB

  PC Ext

  32

  32 ALUctr Clk busW

  Clk Data In

  ° Next time: implementing control

  Clk PC

  ° Single cycle datapath => CPI=1, CCT => long

  ° MIPS makes it easier

  Next Address Control Datapath Control Signals Conditions Summary ° 5 steps to design a processor

  32 A B

  32

  32

  32

  5 Rt

  5 Rs

  Address Ideal Instruction Memory

  An Abstract View of the Implementation °

  Ideal Data Memory Instruction Instruction

  Data Address

  Clk Data In

  ALU

  32 32-bit Registers Rd

  5 Rw Ra Rb

  Clk

  Data Out

  Logical vs. Physical Structure

  • 1. Analyze instruction set => datapath requirements
  • 2. Select set of datapath components & establish clock methodology
  • 3. Design datapath meeting the requirements
  • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
  • 5. Design the control logic
  • Instructions same size
  • Source registers always in same place
  • Immediates same size, location
  • Operations always on registers/immediates