12 Processor Structure and Function

  William Stallings
  Computer Organization and Architecture
  8th Edition

  Chapter 12
  Processor Structure and Function

  

  CPU Structure

  • CPU must:
    - Fetch instructions
    - Interpret instructions
    - Fetch data
    - Process data
    - Write data

  [Figure: CPU With Systems Bus]

  [Figure: CPU Internal Structure]

  Registers

  • CPU must have some working space (temporary storage)
  • Called registers
  • Number and function vary between processor designs
  • One of the major design decisions
  • Top level of memory hierarchy

  User Visible Registers

  • General Purpose
  • Data
  • Address
  • Condition Codes

  General Purpose Registers (1)

  • May be true general purpose
  • May be restricted
  • May be used for data or addressing
  • Data
    - Accumulator
  • Addressing
    - Segment

  General Purpose Registers (2)

  • Make them general purpose
    - Increase flexibility and programmer options
    - Increase instruction size & complexity
  • Make them specialized
    - Smaller (faster) instructions
    - Less flexibility

  How Many GP Registers?

  • Between 8 and 32
  • Fewer = more memory references
  • More does not reduce memory references and takes up processor real estate
  • See also RISC

  How big?

  • Large enough to hold full address
  • Large enough to hold full word
  • Often possible to combine two data registers
    - C programming
    - double a;
    - long int a;

  Condition Code Registers

  • Sets of individual bits
    - e.g. result of last operation was zero
  • Can be read (implicitly) by programs
    - e.g. Jump if zero
  • Can not (usually) be set by programs
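
A minimal sketch of how condition codes might be set as a side effect of an ALU operation and then read implicitly by a conditional jump. The 32-bit width, the flag names Z, N, C, V and both function names are assumptions chosen for illustration, not something the slides specify.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit register width


def add_with_flags(a, b):
    """Add two 32-bit values and return (result, flags), as an ALU might."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32

    def sign(x):
        return (x >> 31) & 1

    flags = {
        "Z": int(result == 0),       # result of last operation was zero
        "N": sign(result),           # sign of last result
        "C": int(total > MASK32),    # carry out of bit 31
        # signed overflow: both operands share a sign the result lacks
        "V": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }
    return result, flags


def jump_if_zero(flags, pc, target):
    """The program never names the flag register; the jump reads Z implicitly."""
    return target if flags["Z"] else pc + 1
```

For example, `add_with_flags(0xFFFFFFFF, 1)` wraps to zero and sets Z and C, so a following `jump_if_zero` would take the jump.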

  Control & Status Registers

  • Program Counter
  • Instruction Decoding Register
  • Memory Address Register
  • Memory Buffer Register
  • Revision: what do these all do?

  Program Status Word

  • A set of bits
  • Includes Condition Codes
  • Sign of last result
  • Zero
  • Carry
  • Equal
  • Overflow
  • Interrupt enable/disable
  • Supervisor

  Supervisor Mode

  • Intel ring zero
  • Kernel mode
  • Allows privileged instructions to execute
  • Used by operating system
  • Not available to user programs

  Other Registers

  • May have registers pointing to:
    - Process control blocks (see O/S)
    - Interrupt Vectors (see O/S)
  • N.B. CPU design and operating system design are closely linked

  [Figure: Example Register Organizations]

  Instruction Cycle

  • Revision
  • Stallings Chapter 3

  Indirect Cycle

  • May require memory access to fetch operands
  • Indirect addressing requires more memory accesses
  • Can be thought of as additional instruction subcycle

  [Figure: Instruction Cycle with Indirect]

  [Figure: Instruction Cycle State Diagram]

  Data Flow (Instruction Fetch)

  • Depends on CPU design
  • In general:
  • Fetch
    - PC contains address of next instruction
    - Address moved to MAR
    - Address placed on address bus
    - Control unit requests memory read
    - Result placed on data bus, copied to MBR, then to IR
    - Meanwhile PC incremented by 1

  Data Flow (Data Fetch)

  • IR is examined
  • If indirect addressing, indirect cycle is performed
    - Rightmost N bits of MBR transferred to MAR
    - Control unit requests memory read
    - Result (address of operand) moved to MBR

  [Figure: Data Flow (Fetch Diagram)]

  [Figure: Data Flow (Indirect Diagram)]

  Data Flow (Execute)

  • May take many forms
  • Depends on instruction being executed
  • May include
    - Memory read/write
    - Input/Output
    - Register transfers
    - ALU operations
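
The register transfers listed for the instruction fetch and the indirect cycle can be sketched as a small simulation. This is an illustrative model only: the `CPU` class, the list used as memory and the `address_bits` parameter are assumptions, and buses are not modelled separately.

```python
class CPU:
    """Toy model of the PC, MAR, MBR and IR transfers (hypothetical names)."""

    def __init__(self, memory):
        self.memory = memory  # a list of words stands in for main memory
        self.pc = 0
        self.mar = 0
        self.mbr = 0
        self.ir = 0

    def _memory_read(self):
        # Control unit requests a read; the addressed word lands in MBR.
        self.mbr = self.memory[self.mar]

    def fetch(self):
        self.mar = self.pc      # address of next instruction moved to MAR
        self._memory_read()     # result copied to MBR
        self.pc += 1            # meanwhile PC incremented by 1
        self.ir = self.mbr      # MBR copied to IR

    def indirect(self, address_bits):
        # Rightmost N bits of MBR hold the address reference.
        self.mar = self.mbr & ((1 << address_bits) - 1)
        self._memory_read()     # address of operand now in MBR
        return self.mbr
```

With `memory = [0x1003, 0, 0, 7]` and a 12-bit address field, `fetch()` leaves `0x1003` in IR and `indirect(12)` then returns the word stored at location 3.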

  Data Flow (Interrupt)

  • Simple
  • Predictable
  • Current PC saved to allow resumption after interrupt
  • Contents of PC copied to MBR
  • Special memory location (e.g. stack pointer) loaded to MAR
  • MBR written to memory
  • PC loaded with address of interrupt handling routine
  • Next instruction (first of interrupt handler) can be fetched

  [Figure: Data Flow (Interrupt Diagram)]
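
The interrupt data flow above can be written out as one transfer per step. The function name, the dictionary of registers and the `save_address` and `handler_address` parameters are hypothetical and used only to make the sequence concrete.

```python
def enter_interrupt(regs, memory, save_address, handler_address):
    """Apply the interrupt-cycle transfers to a dict of registers."""
    regs["mbr"] = regs["pc"]            # contents of PC copied to MBR
    regs["mar"] = save_address          # special location (e.g. stack pointer) to MAR
    memory[regs["mar"]] = regs["mbr"]   # MBR written to memory
    regs["pc"] = handler_address        # PC loaded with handler address
    return regs
```

After the call, the old PC sits in memory at `save_address`, so the interrupted program can be resumed later, and the next fetch reads the first instruction of the handler.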

  Prefetch

  • Fetch accesses main memory
  • Execution usually does not access main memory
  • Can fetch next instruction during execution of current instruction
  • Called instruction prefetch

  Improved Performance

  • But not doubled:
    - Fetch usually shorter than execution
      - Prefetch more than one instruction?
    - Any jump or branch means that prefetched instructions are not the required instructions
  • Add more stages to improve performance

  Pipelining

  • Fetch instruction
  • Decode instruction
  • Calculate operands (i.e. EAs)
  • Fetch operands
  • Execute instructions
  • Write result
  • Overlap these operations

  [Figure: Two Stage Instruction Pipeline]

  [Figure: Timing Diagram for Instruction Pipeline Operation]

  [Figure: The Effect of a Conditional Branch on Instruction Pipeline Operation]

  [Figure: Six Stage Instruction Pipeline]

  [Figure: Alternative Pipeline Depiction]

  [Figure: Speedup Factors with Instruction Pipelining]
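
A rough sketch of the overlap of the six stages listed under Pipelining. It assumes every stage takes one time unit and that there are no branches or hazards. The stage abbreviations and function names are labels chosen here for illustration.

```python
# Abbreviations for: fetch instruction, decode instruction, calculate operands,
# fetch operands, execute instruction, write result (labels are assumptions).
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions, stages=STAGES):
    """Row i gives the stage instruction i occupies in each time unit (None = not in pipeline)."""
    total = len(stages) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [None] * total
        for s, name in enumerate(stages):
            row[i + s] = name   # instruction i enters the pipeline at time unit i
        rows.append(row)
    return rows


def total_cycles(k, n):
    """Time units for n instructions on an ideal k-stage pipeline."""
    return k + (n - 1)


def speedup(k, n):
    """Ratio of unpipelined time (n * k) to pipelined time."""
    return (n * k) / total_cycles(k, n)
```

Under these assumptions nine instructions finish in 14 time units instead of 54, and the speedup approaches the number of stages only as the instruction count grows.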

  Pipeline Hazards

  • Pipeline, or some portion of pipeline, must stall
  • Also called pipeline bubble
  • Types of hazards
    - Resource
    - Data
    - Control

  Resource Hazards

  • Two (or more) instructions in pipeline need same resource
  • Executed in serial rather than parallel for part of pipeline
  • Also called structural hazard
  • E.g. Assume simplified five-stage pipeline
    - Each stage takes one clock cycle
  • Ideal case is new instruction enters pipeline each clock cycle
  • Assume main memory has single port
  • Assume instruction fetches and data reads and writes performed one at a time
  • Ignore the cache
  • Operand read or write cannot be performed in parallel with instruction fetch
  • Fetch instruction stage must idle for one cycle before fetching I3
  • E.g. multiple instructions ready to enter execute instruction phase
  • Single ALU
  • One solution: increase available resources
    - Multiple main memory ports

  Data Hazards

  • Conflict in access of an operand location
  • Two instructions to be executed in sequence
  • Both access a particular memory or register operand
  • If in strict sequence, no problem occurs
  • If in a pipeline, operand value could be updated so as to produce different result from strict sequential execution
  • E.g. x86 machine instruction sequence:

      ADD EAX, EBX /* EAX = EAX + EBX */
      SUB ECX, EAX /* ECX = ECX - EAX */

  • ADD instruction does not update EAX until end of stage 5, at clock cycle 5
  • SUB instruction needs value at beginning of its stage 2, at clock cycle 4
  • Pipeline must stall for two clock cycles
  • Without special hardware and specific avoidance algorithms, results in inefficient pipeline usage

  [Figure: Data Hazard Diagram]
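
The two-cycle stall in the ADD/SUB example follows from simple arithmetic on the cycle numbers given above. The helper below is a hypothetical illustration that assumes a value written at the end of one cycle can first be read at the start of the next, with no forwarding hardware.

```python
def raw_stall_cycles(write_done_cycle, read_cycle):
    """Cycles a dependent instruction must wait before reading its operand.

    write_done_cycle: cycle at whose end the earlier instruction updates the operand.
    read_cycle: cycle at whose start the later instruction wants to read it.
    """
    earliest_read = write_done_cycle + 1   # first cycle that sees the new value
    return max(0, earliest_read - read_cycle)
```

For the example, `raw_stall_cycles(5, 4)` gives 2, matching the two-cycle stall stated in the slide.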

  Types of Data Hazard

  • Read after write (RAW), or true dependency
    - An instruction modifies a register or memory location
    - Succeeding instruction reads data in that location
    - Hazard if read takes place before write complete
  • Write after read (WAR), or antidependency
    - An instruction reads a register or memory location
    - Succeeding instruction writes to location
    - Hazard if write completes before read takes place
  • Write after write (WAW), or output dependency
    - Two instructions both write to same location
    - Hazard if writes take place in reverse order of intended sequence
  • Previous example is RAW hazard
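
The three dependency types reduce to set intersections between what each instruction reads and writes. The sketch below assumes each instruction is described by hand as a pair of sets. It only reports potential dependencies, ignores the flags register, and says nothing about whether a given pipeline would actually stall.

```python
def classify_hazards(first, second):
    """Return the dependency types between two instructions in program order.

    Each instruction is a dict with 'reads' and 'writes' sets of location names.
    """
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")   # second reads what first writes
    if first["reads"] & second["writes"]:
        hazards.add("WAR")   # second writes what first reads
    if first["writes"] & second["writes"]:
        hazards.add("WAW")   # both write the same location
    return hazards


# The ADD/SUB pair from the data hazard example, described by hand.
ADD = {"reads": {"EAX", "EBX"}, "writes": {"EAX"}}
SUB = {"reads": {"ECX", "EAX"}, "writes": {"ECX"}}
```

Here `classify_hazards(ADD, SUB)` reports only RAW, since SUB reads the EAX value that ADD writes.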

  [Figure: Hazard Diagram]

  Control Hazard

  • Also known as branch hazard
  • Pipeline makes wrong decision on branch prediction
  • Brings instructions into pipeline that must subsequently be discarded
  • Dealing with Branches
    - Multiple Streams
    - Prefetch Branch Target
    - Loop buffer
    - Branch prediction
    - Delayed branching

  Multiple Streams

  • Have two pipelines
  • Prefetch each branch into a separate pipeline
  • Use appropriate pipeline
  • Leads to bus & register contention
  • Multiple branches lead to further pipelines being needed

  Prefetch Branch Target

  • Target of branch is prefetched in addition to instructions following branch
  • Keep target until branch is executed
  • Used by IBM 360/91

  Loop Buffer

  • Very fast memory
  • Maintained by fetch stage of pipeline
  • Check buffer before fetching from memory
  • Very good for small loops or jumps
  • cf. cache
  • Used by CRAY-1

  [Figure: Loop Buffer Diagram]

  Branch Prediction (1)

  • Predict never taken
    - Assume that jump will not happen
    - Always fetch next instruction
    - 68020 & VAX 11/780
    - VAX will not prefetch after branch if a page fault would result (O/S v CPU design)
  • Predict always taken
    - Assume that jump will happen
    - Always fetch target instruction

  Branch Prediction (2)

  • Predict by Opcode
    - Some instructions are more likely to result in a jump than others
    - Can get up to 75% success
  • Taken/Not taken switch
    - Based on previous history
    - Good for loops
    - Refined by two-level or correlation-based branch history
  • Correlation-based
    - In loop-closing branches, history is good predictor
    - In more complex structures, branch direction correlates with that of related branches

  Branch Prediction (3)

  • Delayed Branch
    - Do not take jump until you have to
    - Rearrange instructions

  [Figure: Branch Prediction Flowchart]

  [Figure: Branch Prediction State Diagram]
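
One common way to realise a history-based taken/not taken switch is a two-bit saturating counter. The sketch below shows that variant as an illustration. It is not necessarily the exact state machine drawn in the slide's state diagram, and the class and function names are made up for this example.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def correct_predictions(outcomes):
    """Count how many branch outcomes (True = taken) a fresh predictor gets right."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct
```

For a loop-closing branch taken nine times and then not taken, the predictor is wrong twice while warming up and once at loop exit. On a second pass through the same loop, only the exit is mispredicted, which is why history works well for loops.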

  Intel 80486 Pipelining

  • Fetch
    - From cache or external memory
    - Put in one of two 16-byte prefetch buffers
    - Fill buffer with new data as soon as old data consumed
    - Average 5 instructions fetched per load
    - Independent of other stages to keep buffers full
  • Decode stage 1
    - Opcode & address-mode info
    - At most first 3 bytes of instruction
    - Can direct D2 stage to get rest of instruction
  • Decode stage 2
    - Expand opcode into control signals
    - Computation of complex address modes
  • Execute
    - ALU operations, cache access, register update
  • Writeback

  [Figure: Instruction Pipeline Examples]

  [Figures: register diagrams]

  MMX Register Mapping

  • MMX uses several 64 bit data types
  • Use 3 bit register address fields
    - 8 registers
  • No MMX specific registers
    - Aliasing to lower 64 bits of existing floating point registers

  [Figure: Mapping of MMX Registers to Floating-Point Registers]

  Pentium Interrupt Processing

  • Interrupts
    - Maskable
    - Nonmaskable
  • Exceptions
    - Processor detected
    - Programmed
  • Interrupt vector table
    - Each interrupt type assigned a number
    - Index to vector table
    - 256 * 32 bit interrupt vectors
  • 5 priority classes
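
Using the interrupt number as an index into a table of 256 32-bit vectors amounts to a multiply and an add. The sketch assumes the simple layout the slide describes, with 4-byte entries stored contiguously from a base address. The function names and the default base of 0 are assumptions, and protected-mode operation uses a different, larger descriptor format that this does not model.

```python
VECTOR_COUNT = 256   # number of interrupt types, from the slide
VECTOR_BYTES = 4     # each vector is 32 bits


def vector_entry_address(interrupt_number, table_base=0):
    """Address of the table entry for a given interrupt number (assumed layout)."""
    if not 0 <= interrupt_number < VECTOR_COUNT:
        raise ValueError("interrupt number must be in 0..255")
    return table_base + interrupt_number * VECTOR_BYTES


def table_size_bytes():
    return VECTOR_COUNT * VECTOR_BYTES
```

Under this layout the whole table occupies 1024 bytes and interrupt type 3 is looked up at offset 12.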

  ARM Attributes

  • RISC
  • Moderate array of uniform registers
    - More than most CISC, less than many RISC
  • Load/store model
    - Operations perform on operands in registers only
  • Uniform fixed-length instruction
    - 32 bits standard set, 16 bits Thumb
  • Shift or rotation can preprocess source registers
    - Separate ALU and shifter units
  • Small number of addressing modes
    - All load/store addresses from registers and instruction fields
    - No indirect or indexed addressing involving values in memory
  • Auto-increment and auto-decrement addressing
    - Improve loops

  [Figure: ARM Organization]

  ARM Processor Organization

  • Many variations depending on ARM version
  • Data exchanged between processor and memory through data bus
  • Data item (load/store) or instruction (fetch)
  • Instructions go through decoder before execution
  • Pipeline and control signal generation in control unit
  • Data goes to register file
    - Set of 32 bit registers
    - Byte & halfword twos complement data sign extended
  • Typically two source and one result register
  • Rotation or shift before ALU

  ARM Processor Modes

  • User
  • Privileged
    - 6 modes
      - OS can tailor systems software use
      - Some registers dedicated to each privileged mode
      - Swifter context changes
  • Exception
    - 5 of privileged modes
    - Entered on given exceptions
    - Substitute some registers for user registers
      - Avoid corruption

  Privileged Modes

  • System Mode
    - Not exception
    - Uses same registers as User mode
    - Can be interrupted by…
  • Supervisor mode
    - OS
    - Software interrupt used to invoke operating system services
  • Abort mode
    - memory faults
  • Undefined mode
    - Attempt instruction that is not supported by integer core or coprocessors
  • Fast interrupt mode
    - Interrupt signal from designated fast interrupt source
    - Fast interrupt cannot be interrupted
    - May interrupt normal interrupt
  • Interrupt mode

  ARM Register Organization Table

  Modes (all except User are privileged modes; Supervisor through Fast Interrupt are exception modes)

  User      System    Supervisor  Abort     Undefined  Interrupt  Fast Interrupt
  R0        R0        R0          R0        R0         R0         R0
  R1        R1        R1          R1        R1         R1         R1
  R2        R2        R2          R2        R2         R2         R2
  R3        R3        R3          R3        R3         R3         R3
  R4        R4        R4          R4        R4         R4         R4
  R5        R5        R5          R5        R5         R5         R5
  R6        R6        R6          R6        R6         R6         R6
  R7        R7        R7          R7        R7         R7         R7
  R8        R8        R8          R8        R8         R8         R8_fiq
  R9        R9        R9          R9        R9         R9         R9_fiq
  R10       R10       R10         R10       R10        R10        R10_fiq
  R11       R11       R11         R11       R11        R11        R11_fiq
  R12       R12       R12         R12       R12        R12        R12_fiq
  R13 (SP)  R13 (SP)  R13_svc     R13_abt   R13_und    R13_irq    R13_fiq
  R14 (LR)  R14 (LR)  R14_svc     R14_abt   R14_und    R14_irq    R14_fiq
  R15 (PC)  R15 (PC)  R15 (PC)    R15 (PC)  R15 (PC)   R15 (PC)   R15 (PC)

  ARM Register Organization

  • 37 x 32-bit registers
  • 31 general-purpose registers
    - Some have special purposes
    - E.g. program counters
  • Six program status registers
  • Registers in partially overlapping banks
    - Processor mode determines bank
  • 16 numbered registers and one or two program status registers visible
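
The banking scheme can be checked by enumerating which physical register each mode sees for R0 to R15. The dictionaries below are transcribed from the register organization table. The function names are hypothetical, and the count of status registers assumes one CPSR plus one SPSR per exception mode.

```python
# Registers that each mode replaces with its own banked copy (from the table).
BANKED = {
    "user": set(),
    "system": set(),
    "supervisor": {13, 14},
    "abort": {13, 14},
    "undefined": {13, 14},
    "irq": {13, 14},
    "fiq": {8, 9, 10, 11, 12, 13, 14},
}
SUFFIX = {"supervisor": "svc", "abort": "abt", "undefined": "und",
          "irq": "irq", "fiq": "fiq"}


def physical_register(mode, n):
    """Name of the physical register that Rn refers to in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be in 0..15")
    if n in BANKED[mode]:
        return f"R{n}_{SUFFIX[mode]}"
    return f"R{n}"


def count_general_registers():
    return len({physical_register(m, n) for m in BANKED for n in range(16)})


def count_status_registers():
    return 1 + len(SUFFIX)   # CPSR plus one SPSR per exception mode
```

Enumerating all modes gives 31 distinct general-purpose registers and 6 status registers, which accounts for the 37 registers quoted above.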

  General Register Usage

  • R13 normally stack pointer (SP)
    - Each exception mode has its own R13
  • R14 link register (LR)
    - Subroutine and exception mode return address
  • R15 program counter

  CPSR

  • CPSR current program status register
    - Exception modes have dedicated SPSR
  • 16 msb are user flags
    - Condition codes (N, Z, C, V)
    - Q - overflow or saturation in some SIMD instructions
    - J - Jazelle (8 bit) instructions
    - GE[3:0] - SIMD use bits [19:16] as greater than or equal flags
  • 16 lsb system flags for privilege modes
    - E - endian
    - Interrupt disable
    - T - Normal or Thumb instruction
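
A status register like the CPSR can be unpacked with shifts and masks. The slide gives only the flag names and the GE field at bits [19:16]. The remaining bit positions and the mode encodings below follow the ARM layout as commonly documented and should be treated as assumptions to check against the ARM Architecture Reference Manual. The function name is hypothetical.

```python
# Mode field encodings (bits 4..0); assumed from ARM documentation.
MODES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undefined", 0b11111: "System",
}


def decode_cpsr(value):
    """Split a 32-bit CPSR value into named fields."""
    def bit(n):
        return (value >> n) & 1

    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),  # condition codes
        "Q": bit(27),                  # sticky overflow / saturation
        "J": bit(24),                  # Jazelle state
        "GE": (value >> 16) & 0xF,     # SIMD greater-than-or-equal flags
        "E": bit(9),                   # endianness
        "I": bit(7), "F": bit(6),      # IRQ and FIQ disable
        "T": bit(5),                   # Thumb state
        "mode": MODES.get(value & 0x1F, "unknown"),
    }
```

For instance, `decode_cpsr(0x600000D3)` reports Z and C set, both interrupt types disabled, normal (non-Thumb) instructions and Supervisor mode.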

  [Figure: CPSR and SPSR]

  ARM Interrupt (Exception) Processing

  • More than one exception allowed
  • Seven types
  • Execution forced from exception vectors
  • Multiple exceptions handled in priority order
  • Processor halts execution after current instruction
  • Processor state preserved in SPSR for exception
    - Address of instruction about to execute put in link register

  Foreground Reading

  • Processor examples
  • Stallings Chapter 12
  • Manufacturer web sites & specs