Professor Mike Schulte Computer Architecture ECE 201
Lecture 8: Designing a Single Cycle Datapath Professor Mike Schulte Computer Architecture ECE 201 The Big Picture: Where are We Now? °
The Five Classic Components of a Computer °
Today’s Topic: Design a Single Cycle Processor Control Datapath Memory Processor Input Output inst. set design technology machine design
The Big Picture: The Performance Perspective °
Performance of a machine is determined by:
- Instruction count
- Clock cycle time
- Clock cycles per instruction
- Clock cycle time
- Clock cycles per instruction
- Advantages: Simple design, low CPI
- Disadvantages: Long cycle time, which is limited by the slowest instruction.
- the meaning of each instruction is given by register
- datapath must include storage element for ISA registers
- datapath must support each register transfer
- R-type
- I-type
- J-type
° Processor design (datapath and control) will determine:
° Single cycle processor - one clock cycle per instruction
CPI Inst. Count Cycle Time How to Design a Processor: step-by-step °
Analyze instruction set => datapath requirements
transfers R[rd] <– R[rs] + R[rt];
° Select set of datapath components and establish clocking methodology
° Design datapath to meet the requirements
° Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
° Design the control logic
- op: operation of the instruction
- rs, rt, rd: the source and destination register specifiers
- shamt: shift amount
- funct: selects the variant of the operation in the “op” field
- address / immediate: address offset or immediate value
- target address: target address of the jump instruction
op rs rt rd shamt funct
6
11
16
21
26
31 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
- addu rd, rs, rt
- subu rd, rs, rt
26
21
16
31 6 bits 16 bits 5 bits 5 bits
31 6 bits 16 bits 5 bits 5 bits
26
21
16
op rs rt immediate
- ori rt, rs, imm16
- lw rt, rs, imm16
- sw rt, rs, imm16
- beq rs, rt, imm16
op rs rt immediate
16
21
26
31 6 bits 16 bits 5 bits 5 bits
op rs rt immediate
° LOAD and STORE
° BRANCH:
16
Review: The MIPS Instruction Formats °
All MIPS instructions are 32 bits long. The three instruction formats are:
° The different fields are:
op target address
26
31 6 bits 26 bits
op rs rt rd shamt funct
6
11
21
° OR Immediate:
26
31 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
op rs rt immediate
16
21
26
31 6 bits 16 bits 5 bits 5 bits
Step 1a: The MIPS Subset for Today °
ADD and SUB
For the class project, we will be designing a processor that supports a larger subset of the MIPS instruction set.
Register Transfer Logic (RTL) °
RTL gives the meaning of the instructions °
All instructions start by fetching the instruction op | rs | rt | rd | shamt | funct = MEM[ PC ] op | rs | rt | Imm16 = MEM[ PC ] inst Register Transfers addu R[rd] <– R[rs] + R[rt]; PC <– PC + 4 subu R[rd] <– R[rs] – R[rt]; PC <– PC + 4 ori R[rt] <– R[rs] + zero_ext(imm16); PC <– PC + 4 load R[rt] <– MEM[ R[rs] + sign_ext(imm16)]; PC <– PC + 4 store MEM[ R[rs] + sign_ext(imm16) ] <– R[rt]; PC <– PC + 4 beq if ( R[rs] == R[rt] ) then PC <– PC + 4 + sign_ext(imm16)] || 00 else PC <– PC + 4 Step 1: Requirements of the Instruction Set °
Memory
- instruction & data °
Registers (32 x 32)
- read rs
- read rt
- write rt or rd
° PC
° Extender (sign extend or zero extend)
° Add and sub register or extended immediate
° Add 4 or shifted extended immediate to PC
Step 2: Components of the Datapath CarryIn °
Adder A Adder
32 Sum
32 B Carry
32 Select ° MUX
A MUX
32 Combinational Logic: Y
32 Does not use a clock B
32 °
ALU OP
3 A
32 ALU Result
32 B
32 Storage Element: Register (Basic Building Blocks) Write Enable °
Register Similar to the D Flip Flop except
- Data In Data Out
- N-bit input and output
N N Write enable input -
Clk Write Enable:
- negated (0): Data Out will not change
- asserted (1): Data Out will become Data In
- on the falling edge of the clock
5 RWRARB 32 32-bit Registers
busA and busB
32 busB 5 5
32 busA
32
time.” Clk
busW
Write Enable
block:
via busW (data) when Write Enable is 1 ° Clock input (CLK)
Storage Element: Register File ° Register File consists of 32 registers:
Clocking Methodology - Negative Edge Triggered °
Setup Hold
. . . . . . . . . . .
Don’t Care Setup Hold .
Clk
Skew ° (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
All storage elements are clocked by the same clock edge ° Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock
- Two 32-bit output busses:
- One 32-bit input bus: busW ° Register is selected by:
- RA (number) selects the register to put on busA (data)
- RB (number) selects the register to put on busB (data)
- RW (number) selects the register to be written
- The CLK input is a factor ONLY during write operation
- During read operation, behaves as a combinational logic
- RA or RB valid => busA or busB valid after “access
Storage Element: Idealized Memory Write Enable Address ° Memory (idealized)
32 One input bus: Data In • Data In DataOut One output bus: Data Out •
32
32 Clk ° Memory word is selected by:
- Address selects the word to put on Data Out • Write Enable = 1: address selects the memory
word to be written via the Data In bus ° Clock input (CLK)
The CLK input is a factor ONLY during write operation
- During read operation, memory behaves as a
- combinational logic block:
- Address valid => Data Out valid after “access time.”
Step 3 °
Register Transfer Requirements –> Datapath Design Instruction Fetch • Decode instructions and Read Operands
- Execute Operation
- Write back the result
- Fetch the Instruction: mem[PC]
- Update the program counter:
- Sequential Code: PC <- PC + 4
- Branch and Jump: PC <- “something else”
- Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
- ALUctr and RegWr: control logic after decoding the
5
31 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
26
21
16
11
6
ALU
op rs rt rd shamt funct32 32-bit Registers Rs Rt Rd
5 Rw Ra Rb
5
3a: Overview of the Instruction Fetch Unit ° The common RTL operations
32 busA 32 busB
32
RegWr
Clk busW
ALUctr
32 Result
instruction
Logic 3b: Add & Subtract ° R[rd] <- R[rs] op R[rt] Example: addu rd, rs, rt
Instruction Memory PC Clk Next Address
32 Instruction Word Address
3
3c: Logical Operations with Immediate ° R[ rt ] <- R[rs] op ZeroExt[imm16] Example : ori rt, rs, imm16
11
31
26
21
16
op rs rt immediate
6 bits 5 bits 5 bits 16 bits
rd?
31 16 15
immediate 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 bits 16 bits Rd Rt
RegDst
Mux
Rs ALUctr
RegWr
5
5
5
3 busA Rw Ra Rb busW
ALU
32 Result
32 32-bit
32 Registers
32 busB Clk
Mux
32 ZeroExt imm16
32
16 ALUSrc
3d: Load Operations ° R[ rt ] <- Mem[R[rs] + SignExt[imm16] Example: lw rt, rs, imm16
11
31
26
21
16
op rs rt immediate
6 bits 5 bits 5 bits 16 bits
rd
Rd Rt RegDst
Mux
Rs ALUctr
RegWr
5
5
5
3 busA W_Src Rw Ra Rb busW ALU
32
32 32-bit
32 Registers
32 Clk busB MemWr
Mux Mux
32 Extender WrEn Adr
Data In
32 Data
32 imm16
32
16 Memory
Clk ALUSrc
ExtOp
3e: Store Operations ° Mem[ R[rs] + SignExt[imm16] <- R[rt] ] Example: sw rt, rs, imm16
31
26
21
16
op rs rt immediate
6 bits 5 bits 5 bits 16 bits Rd Rt
ALUctr MemWr W_Src RegDst
Mux
Rs Rt
3 RegWr
5
5
5 busA Rw Ra Rb busW ALU
32
32 32-bit
32 Registers
32 busB Clk
Mux Mux
32 Extender WrEn Adr
Data In
32
32 Data imm16
32
16 Memory
Clk ALUSrc
ExtOp
3f: The Branch Instruction
31
26
21
16
op rs rt immediate
6 bits 5 bits 5 bits 16 bits
° beq rs, rt, imm16 mem[PC] Fetch the instruction from memory
- Equal <- R[rs] == R[rt] Calculate the branch condition
- if (COND eq 0) Calculate the next instruction’s address
- PC <- PC + 4 + ( SignExt(imm16) x 4 )
else
- PC <- PC + 4
Datapath for Branch Operations ° beq rs, rt, imm16 Datapath generates condition (equal) op rs rt immediate
00 Adder Mux Adder
1
16 imm16
ALUSrc ExtOp Mux
MemtoReg
Clk Data In
WrEn
32 Adr
Data Memory MemWr
ALU Equal
Instruction<31:0>
1
1
RegDst Extender Mux
<21:25> <16:20> <11:15> <0:15>
Imm16 Rd Rt Rs
= Adder Adder PC
Clk
4 nPC_sel
PC Ext
Adr
Inst Memory
3
32
Rt Rd
16
5
21
26
31 6 bits 16 bits 5 bits 5 bits 32 imm16
PC
Clk
4
nPC_sel Clk busW
RegWr
32 busA 32 busB
5
5 Rw Ra Rb
Rs Rt
32 32-bit Registers
Rs Rt
Equal? Cond PC Ext Inst Address Putting it All Together: A Single Cycle Datapath imm16
32 ALUctr Clk busW
RegWr
32
32 busA 32 busB
5
5
5 Rw Ra Rb
00 Mux
32 32-bit Registers
Step 4: Given Datapath: RTL -> Control ALUctr RegDst
Fun
PC Ext imm16
4 nPC_sel
Clk
Adder Adder PC
Inst Memory Meaning of the Control Signals ° Rs, Rt, Rd and Imed16 hardwired into datapath ° nPC_sel: 0 => PC <– PC + 4; 1 => PC <– PC + 4 + SignExt(Im16) || 00
Adr
RegWr
<21:25>
ALUSrc ExtOp MemtoReg MemWr
Op
Inst Memory DATA PATH Control
Adr
nPC_sel
Imm16 Rd Rs Rt
<21:25> <16:20> <11:15> <0:15>
Instruction<31:0>
Equal
00 Mux
Meaning of the Control Signals °
MemWr: write memory ° ExtOp: “zero”, “sign”
° MemtoReg: 0 => ALU; 1 => Mem
° ALUsrc: 0 => regB; 1 => immed °
RegDst: 0 => “rt”; 1 => “rd” °
RegWr: write dest register ° ALUctr: “add”, “sub”, “or”
RegDst ALUctr MemWr MemtoReg Equal
Rt Rd
3
1 Rs Rt
RegWr
5
5
5 busA
=
Rw Ra Rb busW
ALU
32
32 32-bit
32 Registers busB
32 Mux
Mux
32 Clk
Extender
32 WrEn Adr
1
1 Data In
Data
imm16
32
16 Memory
Clk
ExtOp ALUSrc Control Signals inst Register Transfer ADD R[rd] <– R[rs] + R[rt]; PC <– PC + 4 ALUsrc = RegB, ALUctr = “add”, RegDst = rd, RegWr, nPC_sel = “+4” SUB R[rd] <– R[rs] – R[rt]; PC <– PC + 4 ALUsrc = RegB, ALUctr = “sub”, RegDst = rd, RegWr, nPC_sel = “+4” ORi R[rt] <– R[rs] + zero_ext(Imm16); PC <– PC + 4 ALUsrc = Im, Extop = “Z”, ALUctr = “or”, RegDst = rt, RegWr, nPC_sel = “+4” LOAD R[rt] <– MEM[ R[rs] + sign_ext(Imm16)]; PC <– PC + 4 ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemtoReg, RegDst = rt, RegWr, nPC_sel = “+4” STORE MEM[ R[rs] + sign_ext(Imm16)] <– R[rs]; PC <– PC + 4 ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemWr, nPC_sel = “+4” BEQ if ( R[rs] == R[rt] ) then PC <– PC + sign_ext(Imm16)] || 00 else PC <– PC + 4 nPC_sel = EQUAL, ALUctr = “sub”
Step 5: Logic for each control signal ° nPC_sel <= if (OP == BEQ) then EQUAL else 0 ° ALUsrc <= if (OP == “000000”) then “regB” else “immed” ° ALUctr <= if (OP == “000000”) then funct elseif (OP == ORi) then “OR” elseif (OP == BEQ) then “sub” else “add”
Rs Rt
1
° ExtOp <= if (OP == ORi) then “zero” else “sign” ° MemWr <= (OP == Store)
MemtoReg
<21:25> <16:20> <11:15> <0:15>
Imm16 Rd Rt Rs
= imm16
RegDst Extender Mux
Rt Rd
Adder Adder PC
ALU Equal
5 Rw Ra Rb
5
5
4 nPC_sel
Adr
RegWr
Inst Memory sign ext add rt +4
Example: Load Instruction
° MemtoReg <= (OP == Load) ° RegWr: <= if ((OP == Store) || (OP == BEQ)) then 0 else 1 ° RegDst: <= if ((OP == Load) || (OP == ORi)) then 0 else 1
Instruction<31:0>
00 Mux
Data Memory MemWr
32 Adr
1
1
WrEn
ALUSrc ExtOp Mux
16 imm16
32
32 32-bit Registers
Clk
32 busA 32 busB
PC Ext
32
32 ALUctr Clk busW
Clk Data In
° Next time: implementing control
Clk PC
° Single cycle datapath => CPI=1, CCT => long
° MIPS makes it easier
Next Address Control Datapath Control Signals Conditions Summary ° 5 steps to design a processor
32 A B
32
32
32
5 Rt
5 Rs
Address Ideal Instruction Memory
An Abstract View of the Implementation °
Ideal Data Memory Instruction Instruction
Data Address
Clk Data In
ALU
32 32-bit Registers Rd
5 Rw Ra Rb
Clk
Data Out
Logical vs. Physical Structure
- 1. Analyze instruction set => datapath requirements
- 2. Select set of datapath components & establish clock methodology
- 3. Design datapath meeting the requirements
- 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
- 5. Design the control logic
- Instructions same size
- Source registers always in same place
- Immediates same size, location
- Operations always on registers/immediates