Professor Mike Schulte Computer Architecture ECE 201

Lecture 11: Exceptions and Pipelining Basics

Exceptions

  

  [Figure: a user program runs until an exception occurs; control transfers to the system's exception handler, which then returns from the exception to the user program.]

normal control flow: sequential, jumps, branches, calls, returns

  ° Exception = unprogrammed control transfer

  • system takes action to handle the exception
    • must record the address of the offending instruction
    • record any other information necessary to return afterwards
  • returns control to user
    • must save & restore user state

Two Types of Exceptions

  ° Interrupts

  • caused by external events
  • asynchronous to program execution
  • may be handled between instructions
  • simply suspend and resume user program

  ° Traps (or Exceptions)

  • caused by internal events
    • exceptional conditions (overflow)
    • invalid instruction
    • faults (non-resident page in memory)
    • internal hardware error

  • synchronous to program execution
  • condition must be remedied by the handler
  • instruction may be retried or simulated and program continued, or program may be aborted

MIPS convention

  ° Exception means any unexpected change in control flow, without distinguishing internal or external
  ° Use the term interrupt only when the event is externally caused

  Type of event                    From where?   MIPS terminology
  I/O device request               External      Interrupt
  Invoke OS from user program      Internal      Exception
  Arithmetic overflow              Internal      Exception
  Using an undefined instruction   Internal      Exception
  Hardware malfunctions            Either        Exception or Interrupt

Additions to MIPS ISA to support Exceptions?

  ° EPC: a 32-bit register used to hold the address of the affected instruction.
  ° Cause: a register used to record the cause of the exception. To simplify the discussion, assume
    • undefined instruction = 0
    • arithmetic overflow = 1
  ° Status: a register holding interrupt mask and enable bits; it determines what exceptions can occur.
  ° Control signals to write EPC, Cause, and Status
  ° Be able to write the exception address into PC: widen the PC mux so it can set PC to the exception address (C000 0000 hex).
  ° May have to undo PC = PC + 4, since we want EPC to point to the offending instruction (not its successor): PC = PC - 4

Big Picture: user / system modes

  ° By providing two modes of execution (user/system) it is possible for the computer to manage itself

  • operating system is a special program that runs in the privileged mode and has access to all of the resources of the computer
  • presents “virtual resources” to each user that are more convenient than the physical resources
    • files vs. disk sectors
    • virtual memory vs. physical memory
  • protects each user program from others

  ° Exceptions allow the system to take action in response to events that occur while the user program is executing

  • O/S begins at the handler

How Control Detects Exceptions in our FSD

  ° Undefined instruction: detected when no next state is defined from state 1 for the op value.
    • We handle this exception by defining the next state value for all op values other than lw, sw, 0 (R-type), jmp, beq, and ori as a new state
  ° Arithmetic overflow: the ALU includes logic to detect overflow, and a signal called Overflow is provided as an output from the ALU. This signal is used in the modified finite state machine to specify an additional possible next state
  ° Note: the challenge in designing control of a real machine is to handle the different interactions between instructions and other exception-causing events such that the control logic remains small and fast.
    • Complex interactions make the control unit the most challenging aspect of hardware design

Modification to the Control Specification

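  [State diagram: the multicycle control specification extended with two exception states]

  Fetch:    IR <= MEM[PC]; PC <= PC + 4
  Decode:   A <= R[rs]; B <= R[rt]

  Execution, selected by opcode:
    BEQ:     S <= A - B; if Equal: PC <= PC + SX || 00; if ~Equal: PC unchanged
    R-type:  S <= A fun B; R[rd] <= S
    ORi:     S <= A op ZX; R[rt] <= S
    LW:      S <= A + SX; M <= MEM[S]; R[rt] <= M
    SW:      S <= A + SX; MEM[S] <= B

  New exception states:
    undefined instruction (any other opcode):
             EPC <= PC - 4; PC <= exp_addr; cause <= 0 (UI)
    overflow (additional condition from the datapath to indicate overflow):
             EPC <= PC - 4; PC <= exp_addr; cause <= 1 (Ovf)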
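The two exception states can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's control logic: the dictionary-based machine state, the names step and raise_exception, and the choice to model every R-type instruction as a signed add are assumptions made for the example. The cause codes and the handler address C000 0000 hex come from the slides.

```python
EXC_ADDR = 0xC0000000      # exception address used in the slides
CAUSE_UNDEFINED = 0        # undefined instruction
CAUSE_OVERFLOW = 1         # arithmetic overflow

# Opcodes the simplified control unit knows about (from the slides).
KNOWN_OPS = {"lw", "sw", "rtype", "jmp", "beq", "ori"}

INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def raise_exception(state, cause):
    """Enter an exception state: record where and why, then jump to the handler."""
    state["epc"] = state["pc"] - 4   # undo PC + 4 so EPC names the offending instruction
    state["cause"] = cause
    state["pc"] = EXC_ADDR


def step(state, op, a=0, b=0):
    """Fetch one instruction and check the two exception conditions.

    Assumption for this sketch: an R-type instruction is treated as a signed
    32-bit add of a and b; other instructions only advance the PC.
    """
    state["pc"] += 4                 # fetch state: PC <= PC + 4
    if op not in KNOWN_OPS:
        raise_exception(state, CAUSE_UNDEFINED)
    elif op == "rtype" and not INT32_MIN <= a + b <= INT32_MAX:
        raise_exception(state, CAUSE_OVERFLOW)
    return state


overflowed = step({"pc": 0x00400000, "epc": 0, "cause": None}, "rtype", INT32_MAX, 1)
undefined = step({"pc": 8, "epc": 0, "cause": None}, "mult")
normal = step({"pc": 8, "epc": 0, "cause": None}, "rtype", 1, 2)
```

In both exception cases EPC ends up holding the address of the instruction that caused the problem, which is what the handler needs in order to retry or skip it.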

Pipelining is Natural!

  ° Pipelining provides a method for executing multiple instructions at the same time.

  ° Laundry Example
  ° Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
  ° Washer takes 30 minutes
  ° Dryer takes 40 minutes
  ° “Folder” takes 20 minutes

Sequential Laundry

  [Figure: timeline from 6 PM to midnight with loads A, B, C, D in task order; each load takes 30 + 40 + 20 minutes and starts only after the previous load finishes.]

  ° Sequential laundry takes 6 hours for 4 loads
  ° If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP

  [Figure: timeline starting at 6 PM with loads A, B, C, D in task order; the overlapped stages take 30, 40, 40, 40, 40, 20 minutes.]

  ° Pipelined laundry takes 3.5 hours for 4 loads
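The 6 hour and 3.5 hour figures can be checked with a short calculation. The sketch below is an illustration only: the function names are made up for this example, and it assumes a load may wait between stages (for instance, washed clothes sit in a basket until the dryer is free).

```python
STAGE_MINUTES = [30, 40, 20]   # washer, dryer, "folder" (from the slides)


def sequential_minutes(stages, loads):
    """Each load runs start to finish before the next one begins."""
    return loads * sum(stages)


def pipelined_minutes(stages, loads):
    """Each load enters a stage as soon as both the load and the stage are free."""
    stage_free_at = [0] * len(stages)      # when each stage finishes its current load
    for _ in range(loads):
        ready = 0                          # when this load is ready for its next stage
        for s, duration in enumerate(stages):
            start = max(ready, stage_free_at[s])
            ready = start + duration
            stage_free_at[s] = ready
    return stage_free_at[-1]


sequential_hours = sequential_minutes(STAGE_MINUTES, 4) / 60
pipelined_hours = pipelined_minutes(STAGE_MINUTES, 4) / 60
```

With four loads the pipelined schedule is 30 + 4 x 40 + 20 = 210 minutes: the 40 minute dryer, the slowest stage, sets the rate once the pipeline is full.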

Pipelining Lessons

  ° Pipelining doesn’t help latency of single task, it helps throughput of entire workload
  ° Pipeline rate limited by slowest pipeline stage
  ° Multiple tasks operating simultaneously using different resources
  ° Potential speedup = Number pipe stages
  ° Unbalanced lengths of pipe stages reduces speedup
  ° Time to “fill” pipeline and time to “drain” it reduces speedup
  ° Stall for Dependences

  [Figure: the pipelined laundry timeline starting at 6 PM, loads A, B, C, D in task order, with stage times 30, 40, 40, 40, 40, 20 minutes.]

The Five Stages of the Load Instruction

  Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
  Ifetch    Reg/Dec   Exec      Mem       Wr        (Load)

  ° Ifetch: Instruction Fetch
    • Fetch the instruction from the Instruction Memory
  ° Reg/Dec: Registers Fetch and Instruction Decode
  ° Exec: Calculate the memory address
  ° Mem: Read the data from the Data Memory
  ° Wr: Write the data back to the register file

Pipelined Execution

  ° On a pipelined processor, multiple instructions are in various stages at the same time.
  ° Assume each instruction takes five cycles

  Time ->
  IFetch Dcd    Exec   Mem    WB
         IFetch Dcd    Exec   Mem    WB
                IFetch Dcd    Exec   Mem    WB
                       IFetch Dcd    Exec   Mem    WB
                              IFetch Dcd    Exec   Mem    WB
                                     IFetch Dcd    Exec   Mem    WB
  (program flow runs downward)

Single Cycle, Multiple Cycle, vs. Pipeline

  [Figure: clock timing of three implementations running Load, Store, R-type.
   Single Cycle Implementation: Load fills Cycle 1; Store runs in Cycle 2, part of which is wasted.
   Multiple Cycle Implementation: Load uses Cycles 1 to 5 (Ifetch Reg Exec Mem Wr), Store uses Cycles 6 to 9 (Ifetch Reg Exec Mem), R-type starts with Ifetch in Cycle 10.
   Pipeline Implementation: Load, Store, and R-type each pass through Ifetch Reg Exec Mem Wr, overlapped in time.]

Graphically Representing Pipelines

  ° Can help with answering questions like:

  • How many cycles does it take to execute this code?
  • What is the ALU doing during cycle 4?
  • Are two instructions trying to use the same resource at the same time?

  [Figure: Inst 0 and Inst 1 in instruction order against time (clock cycles), each passing through Im, Reg, ALU, Dm, Reg.]
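The questions above can also be answered numerically. The following is a minimal sketch that assumes an ideal five-stage pipeline with one instruction entering per cycle and no stalls; the function names and the 0-based instruction numbering are choices made for this example.

```python
STAGES = ["Im", "Reg", "ALU", "Dm", "Reg"]   # resources in the order the figure uses them
ALU = 2                                      # index of the ALU stage in STAGES


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to run the code: fill the pipeline once, then finish one instruction per cycle."""
    return num_stages + num_instructions - 1


def stage_in_cycle(inst_index, cycle):
    """Stage index used by instruction inst_index (0-based) in a 1-based cycle, or None."""
    stage = cycle - 1 - inst_index
    return stage if 0 <= stage < len(STAGES) else None


def users_of(stage_index, cycle, num_instructions):
    """Instructions occupying one stage in one cycle; more than one would be a conflict."""
    return [i for i in range(num_instructions)
            if stage_in_cycle(i, cycle) == stage_index]


cycles_for_two = total_cycles(2)        # Inst 0 and Inst 1
alu_in_cycle_4 = users_of(ALU, 4, 5)    # which of Inst 0..4 is in the ALU in cycle 4
```

Under these assumptions the ALU is working on Inst 1 during cycle 4, and no stage ever holds more than one instruction. Both uses of Reg are kept as separate stage indices because one is a read and the other a write.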

Why Pipeline? Because the resources are there!

  [Figure: Inst 0 through Inst 4 in instruction order against time (clock cycles), each passing through Im, Reg, ALU, Dm, Reg and starting one cycle after the previous instruction.]

Why Pipeline?

  ° Suppose
    • 100 instructions are executed
    • The single cycle machine has a cycle time of 45 ns
    • The multicycle and pipeline machines have cycle times of 10 ns
    • The multicycle machine has a CPI of 4.6

  ° Single Cycle Machine
    • 45 ns/cycle x 1 CPI x 100 inst = 4500 ns

  ° Multicycle Machine
    • 10 ns/cycle x 4.6 CPI x 100 inst = 4600 ns

  ° Ideal pipelined machine
    • 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns

  ° Ideal pipelined vs. single cycle speedup
    • 4500 ns / 1040 ns = 4.33

  ° What has not yet been considered?
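The arithmetic on this slide can be reproduced directly. This sketch uses only the numbers given above; the function names and the five-stage default are assumptions for the example.

```python
def single_cycle_ns(insts, cycle_ns=45):
    """One long cycle per instruction (CPI = 1)."""
    return cycle_ns * 1 * insts


def multicycle_ns(insts, cycle_ns=10, cpi=4.6):
    """Short cycles, but several of them per instruction."""
    return cycle_ns * cpi * insts


def ideal_pipeline_ns(insts, cycle_ns=10, stages=5):
    """One instruction completes per cycle, plus (stages - 1) cycles to drain."""
    return cycle_ns * (1 * insts + (stages - 1))


t_single = single_cycle_ns(100)
t_multi = multicycle_ns(100)
t_pipe = ideal_pipeline_ns(100)
speedup = t_single / t_pipe
```

The speedup stays below the stage count of 5 for two reasons visible in the formulas: the drain cycles, and a single-cycle time of 45 ns rather than 5 x 10 ns. Hazards, which this ideal model ignores, reduce it further.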

Can pipelining get us into trouble?

  ° Yes: Pipeline Hazards

  • structural hazards: attempt to use the same resource two different ways at the same time
    • E.g., two instructions try to read the same memory at the same time
  • data hazards: attempt to use item before it is ready
    • instruction depends on result of prior instruction still in the pipeline
        add r1, r2, r3
        sub r4, r2, r1
  • control hazards: attempt to make a decision before condition is evaluated
    • branch instructions
        beq r1, loop
        add r1, r2, r3

  ° Can always resolve hazards by waiting
    • pipeline control must detect the hazard
    • take action (or delay action) to resolve hazards
Single Memory is a Structural Hazard

  [Figure: Load and Instr 1 through Instr 4 in instruction order against time (clock cycles), each passing through Mem, Reg, ALU, Mem, Reg. With a single memory, the Load's data access and the fetch of Instr 3 fall in the same cycle.]

  Detection is easy in this case! (right half highlight means read, left half write)

Structural Hazards limit performance

  ° Example: if 1.3 memory accesses per instruction and only one memory access per cycle, then average CPI is at least 1.3

  • otherwise resource is more than 100% utilized

  ° Solution 1: Use separate instruction and data memories

  ° Solution 2: Allow memory to read and write more than one word per cycle

  ° Solution 3: Stall
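The CPI bound in the example follows from dividing demand by supply. A small sketch, where min_cpi and utilization are hypothetical helper names and a base CPI of 1 is assumed for an otherwise ideal pipeline:

```python
def min_cpi(accesses_per_inst, accesses_per_cycle=1.0, base_cpi=1.0):
    """Lowest average CPI the memory can sustain, never below the pipeline's base CPI."""
    return max(base_cpi, accesses_per_inst / accesses_per_cycle)


def utilization(accesses_per_inst, cpi, accesses_per_cycle=1.0):
    """Fraction of available memory slots in use; above 1.0 is impossible to sustain."""
    return accesses_per_inst / (cpi * accesses_per_cycle)


one_port = min_cpi(1.3)                        # single memory, one access per cycle
two_ports = min_cpi(1.3, accesses_per_cycle=2) # e.g. two accesses per cycle
overcommitted = utilization(1.3, cpi=1.0)      # what CPI = 1 would demand of one memory
```

With two accesses per cycle the memory stops being the limit and the bound falls back to the base CPI of 1, which is the effect Solutions 1 and 2 aim for.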

Control Hazard Solutions

  ° Stall: wait until decision is clear
    • It’s possible to move up decision to 2nd stage by adding hardware to check registers as being read

  [Figure: Add, Beq, Load in instruction order against time (clock cycles), each passing through Mem, Reg, ALU, Mem, Reg; the Load after the Beq starts late while the branch decision is made.]

  ° Impact: 2 clock cycles per branch instruction => slow

Control Hazard Solutions

  ° Predict: guess one direction then back up if wrong

  • Predict not taken

  ° Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right about 50% of time)

  ° More dynamic scheme: history of 1 branch (right about 90% of time)
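The average cost of a branch under prediction is a weighted sum of the two cases. A minimal sketch using the slide's costs of 1 cycle when the guess is right and 2 when it is wrong; the function name is made up for this example.

```python
def cycles_per_branch(p_correct, right_cost=1, wrong_cost=2):
    """Expected clock cycles per branch for a given prediction accuracy."""
    return p_correct * right_cost + (1 - p_correct) * wrong_cost


always_stall = cycles_per_branch(0.0)        # same cost as stalling on every branch
predict_not_taken = cycles_per_branch(0.5)   # right about 50% of the time
one_bit_history = cycles_per_branch(0.9)     # right about 90% of the time
```

Going from 50% to 90% accuracy moves the average from 1.5 to about 1.1 cycles per branch, compared with 2 cycles when every branch stalls.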

  [Figure: Add, Beq, Load in instruction order against time (clock cycles), each passing through Mem, Reg, ALU, Mem, Reg.]

Control Hazard Solutions

  ° Redefine branch behavior (takes place after next instruction): “delayed branch”

  ° Impact: 1 clock cycle per branch instruction if can find instruction to put in “slot” (about 50% of time)

  ° Launch more instructions per clock cycle => less useful

  [Figure: Add, Beq, Misc, Load in instruction order against time (clock cycles), each passing through Mem, Reg, ALU, Mem, Reg; Misc occupies the slot after the Beq.]

Data Hazard on r1

  add r1, r2, r3
  sub r4, r1, r3
  and r6, r1, r7
  or  r8, r1, r9
  xor r10, r1, r11

  Problem: r1 cannot be read by other instructions before it is written by the add

Data Hazard on r1:

  • Dependencies backwards in time are hazards

  [Figure: add r1,r2,r3 followed by sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11 in instruction order against time (clock cycles), each passing through IF, ID/RF, EX, MEM, WB (Im, Reg, ALU, Dm, Reg).]

Data Hazard Solution:

  • “Forward” result from one stage to another
  • “or” OK if define read/write properly

  [Figure: add r1,r2,r3 followed by sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11 in instruction order against time (clock cycles), each passing through IF, ID/RF, EX, MEM, WB (Im, Reg, ALU, Dm, Reg), with the add's result forwarded to the later instructions.]

Forwarding (or Bypassing): What about Loads?

  • Dependencies backwards in time are hazards
  • Can’t solve with forwarding:
  • Must delay/stall instruction dependent on loads

  [Figure: lw r1,0(r2) followed by sub r4,r1,r3 against time (clock cycles), each passing through IF, ID/RF, EX, MEM, WB (Im, Reg, ALU, Dm, Reg).]
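The data hazard cases from the last few slides can be detected mechanically. The sketch below is an illustration under stated assumptions, not the pipeline control from the lecture: the Inst tuple and both function names are invented here, a five-stage pipeline is assumed, the register file is assumed to write in the first half of a cycle and read in the second (so a consumer three instructions later is safe, matching the “or” case), and any instruction that reads a loaded register immediately after the load is assumed to need one stall.

```python
from collections import namedtuple

# op: mnemonic, dest: register written (or None), srcs: registers read
Inst = namedtuple("Inst", "op dest srcs")


def raw_hazards(program, window=2):
    """(producer, consumer) index pairs that are hazards without forwarding.

    window=2 assumes the split-cycle register file; use window=3 if a
    register cannot be written and read in the same cycle.
    """
    hazards = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer = program[j]
            if producer.dest in consumer.srcs:
                hazards.append((i, j))
            if consumer.dest == producer.dest:
                break              # a later write supersedes this producer
    return hazards


def stalls_with_forwarding(program):
    """Stall cycles left once results are forwarded: one per load-use pair."""
    return sum(1 for prev, cur in zip(program, program[1:])
               if prev.op == "lw" and prev.dest in cur.srcs)


ADD_SEQUENCE = [
    Inst("add", "r1", ("r2", "r3")),
    Inst("sub", "r4", ("r1", "r3")),
    Inst("and", "r6", ("r1", "r7")),
    Inst("or", "r8", ("r1", "r9")),
    Inst("xor", "r10", ("r1", "r11")),
]
LOAD_USE = [
    Inst("lw", "r1", ("r2",)),
    Inst("sub", "r4", ("r1", "r3")),
]
```

For the add sequence, forwarding removes every stall, while the load followed by sub still costs one cycle because the loaded value does not exist until the end of the MEM stage.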