panel-plw-2007.ppt 340KB Jun 23 2011 12:05:46 PM

(1)

Lizy Kurian John, LCA, UT Austin

The University of Texas at Austin

What Programming Language/Compiler Researchers Should Know about Computer Architecture

Lizy Kurian John

Department of Electrical and Computer Engineering


(2)


Somebody once said

“Computers are dumb actors and compilers/programmers are the master playwrights.”


(3)


Computer Architecture Basics

ISAs

RISC vs CISC

Assembly language coding

Datapath (ALU) and controller

Pipelining

Caches

Out of order execution


(4)


Basics

ILP

DLP

TLP

Massive parallelism

SIMD/MIMD

VLIW

Performance and Power metrics

Hennessy and Patterson architecture books

ASPLOS, ISCA, MICRO, HPCA


(5)


The Bottom Line

Programming language choice affects performance and power, e.g., Java

Compilers affect performance and power


(6)


A Java Hardware Interpreter

Radhakrishnan, Ph.D. 2000 (ISCA 2000, ICS 2001)

Technique used by Nazomi Communications, Parthus (Chicory Systems)

[Figure: block diagram with labels Java class file, native executable, bytecodes, and pipeline stages Fetch, hardware bytecode translator, Decode, Execute]


(7)


HardInt Performance

[Figure: 4-way performance. Bar chart of execution cycles (millions) for db, javac, jess, mpeg, and mtrt under JDK 1.1.6 Interpreter, JDK 1.1.6 JIT, JDK 1.2 Interpreter, JDK 1.2 JIT, and Hard-Int]

• Hard-Int performs consistently better than the interpreter
• In JIT mode, significant performance boost in 4 of 5


(8)


Compiler and Power

[Figure: data dependence graph (DDG) over instructions A to F, with two schedules of the same code across Cycles 1 to 4]

Schedule 1: Peak Power = 3, Energy = 6

Schedule 2: Peak Power = 2, Energy = 6
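The point of this slide (same energy, different peak power, depending only on how the compiler schedules the instructions) can be sketched in a few lines. This is a minimal illustration, assuming each instruction costs one unit of energy and draws one unit of power in the cycle it issues. The two schedules are hypothetical: the slide's actual dependence graph is not recoverable from the text, so they are only chosen to reproduce its totals.

```python
def energy(schedule):
    """Total energy, assuming one unit per issued instruction."""
    return sum(len(cycle) for cycle in schedule)


def peak_power(schedule):
    """Peak power, assuming one unit per instruction issued in a cycle."""
    return max(len(cycle) for cycle in schedule)


# Hypothetical schedules of six instructions A..F over four cycles.
# Both are assumed to respect the (unrecovered) data dependences.
bursty = [["A"], ["B", "C", "D"], ["E"], ["F"]]
smooth = [["A"], ["B", "C"], ["D", "E"], ["F"]]

for name, sched in (("bursty", bursty), ("smooth", smooth)):
    print(name, "energy =", energy(sched), "peak power =", peak_power(sched))
```

Both schedules do the same work in the same number of cycles, so energy is unchanged; only the distribution of work across cycles, and hence the peak, differs.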


(9)


Valluri et al., 2001 HPCA workshop

Quantitative Study

Influence of state-of-the-art optimizations on energy and power of the processor examined

Optimizations studied

• Standard -O1 to -O4 of DEC Alpha’s cc compiler
• Four individual optimizations – simple basic-block instruction scheduling, loop unrolling, function inlining, and aggressive global


(10)


Standard Optimizations on Power

Benchmark  Opt level  Energy  Exec Time  Insts  Avg Power  IPC

           O0         100     100        100    100        100
           O1         74.48   81.55      81.52  91.33      99.96
           O2         75.13   81.44      82.04  92.25      100.73
           O3         75.13   81.44      82.04  92.25      100.73
           O4         79.01   82.77      86.11  95.45      104.03

           O0         100     100        100    100        100
           O1         66.2    64.13      68.94  103.23     107.5
           O2         62.62   61.31      63.01  102.14     102.78
           O3         62.62   61.31      63.01  102.14     102.78
           O4         63.67   62.19      63.75  102.38     102.51

           O0         100     100        100    100        100
           O1         81.32   83.66      83.18  97.2       99.42
           O2         79.6    75.97      82.97  104.78     109.21
           O3         79.6    75.97      82.97  104.78     109.21
           O4         85.71   77.89      90.96  110.05     116.78

compress

go
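The columns of this table appear to be normalized to O0 = 100 and tied together by two relations: energy = average power × execution time, and IPC = instructions / execution time. That reading is an inference from the numbers, not something stated on the slide. A small sketch that checks it against three rows copied from the first block of the table:

```python
# Rows copied from the first block of the table:
# (opt level, energy, exec time, insts, avg power, IPC), each as a
# percentage of the O0 value.
ROWS = [
    ("O1", 74.48, 81.55, 81.52, 91.33, 99.96),
    ("O2", 75.13, 81.44, 82.04, 92.25, 100.73),
    ("O4", 79.01, 82.77, 86.11, 95.45, 104.03),
]


def derived_energy(avg_power, exec_time):
    """Normalized energy implied by energy = power x time."""
    return avg_power * exec_time / 100.0


def derived_ipc(insts, exec_time):
    """Normalized IPC implied by IPC = instructions / time."""
    return insts / exec_time * 100.0


def consistent(rows, tol=0.05):
    """True if every row satisfies both relations within `tol` points."""
    return all(
        abs(derived_energy(power, time) - energy) < tol
        and abs(derived_ipc(insts, time) - ipc) < tol
        for _, energy, time, insts, power, ipc in rows
    )


print(consistent(ROWS))
```

Read this way, an optimization that shortens execution time more than it cuts energy necessarily raises average power, which is the pattern in the rows where Avg Power exceeds 100.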


(11)


Somebody once said

“Computers are dumb actors and compilers/programmers are the master playwrights.”


(12)


A large part of modern out-of-order processors is hardware that could have been eliminated if a good compiler existed.


(13)


Let me get more arrogant

A large part of modern out-of-order processors was designed because computer architects thought compiler writers could not do a good job.


(14)


Value Prediction

Is a slap on your face

Shen and Lipasti


(15)


Value Locality

Likelihood that an instruction’s computed result or a similar predictable result will occur soon

Observation – a limited set of unique values constitutes the majority of values produced and consumed during execution
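One way to make this definition concrete is to measure, over an execution trace, how often an instruction produces a value it produced recently. The sketch below is a simplified illustration, not the measurement method of any particular paper. The trace format (pairs of instruction address and result) and the `depth` parameter (how many recent values per instruction are remembered) are assumptions made for the example.

```python
from collections import defaultdict, deque


def value_locality(trace, depth=1):
    """Fraction of results matching one of the last `depth` values
    produced by the same static instruction (identified by its pc)."""
    history = defaultdict(lambda: deque(maxlen=depth))
    hits = 0
    for pc, value in trace:
        if value in history[pc]:
            hits += 1
        history[pc].append(value)
    return hits / len(trace) if trace else 0.0


# Hypothetical trace: the instruction at 0x10 keeps producing 0,
# the instruction at 0x14 produces a new value each time.
trace = [(0x10, 0), (0x10, 0), (0x10, 0), (0x14, 7), (0x14, 8), (0x10, 0)]
print(value_locality(trace))
```

A high fraction at small `depth` means a simple last-value predictor would often guess the result before the instruction executes.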



(17)


Causes of value locality

• Data redundancy – many 0s, sparse matrices, white space in files, empty cells in spreadsheets
• Program constants –
• Computed branches – base address for jump tables is a run-time constant
• Virtual function calls – involve code to

(18)


Causes of value locality

• Memory alias resolution – compiler conservatively generates code – may contain stores that alias with loads
• Register spill code – stores and subsequent loads
• Convergent algorithms – convergence in parts of algorithms before global convergence


(19)


2 Extremist Views

Anything that can be done in hardware should be done in hardware.

Anything that can be done in software should be done in software.


(20)


What do we need?

The dumb actor, or the defiant actor who pays very little attention to the script?


(21)


Challenging all compiler writers

The last 15 years were the defiant actor’s era

What about the next 15? TLP, multithreading, parallelizing compilers –

It’s time for a lot more dumb acting from the architect’s side.

And it’s time for some good scriptwriting from the compiler writer’s side.


(22)


The University of Texas at Austin


(23)


Compiler Optimizations

cc – Native C compiler on DEC Alpha 21064 running the OSF1 operating system

gcc – Used to study the effect of


(24)


Std Optimization Levels on cc

-O0 – No optimizations performed
-O1 – Local optimizations such as CSE, copy propagation, IVE, etc.
-O2 – Inline expansion of static procedures and global optimizations such as loop unrolling, instruction scheduling
-O3 – Inline expansion of global procedures
-O4 – s/w pipelining, loop vectorization, etc.
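The -O1 entry names local CSE (common subexpression elimination). As a rough illustration of what such a pass does inside one basic block, and not a description of how DEC's cc implements it, the sketch below replaces a recomputed expression with a copy of the variable that already holds it. The tuple-based three-address format and the function name `local_cse` are assumptions made for the example; commutativity and side effects are ignored.

```python
def local_cse(block):
    """Local common subexpression elimination on one basic block.

    block: list of (dest, op, arg1, arg2) three-address tuples.
    Returns a new list in which a repeated expression is replaced by
    (dest, "copy", holder, None), where holder already has the value.
    """
    available = {}  # (op, arg1, arg2) -> variable holding that value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        holder = available.get(key)
        if holder is not None:
            out.append((dest, "copy", holder, None))
        else:
            out.append((dest, op, a, b))
        # Writing dest invalidates expressions that read dest or live in it.
        available = {
            k: v for k, v in available.items()
            if v != dest and dest not in k[1:]
        }
        # The expression is available in dest unless dest was an operand.
        if dest not in (a, b):
            available.setdefault(key, dest)
    return out


# Hypothetical block: a + b is computed twice, then a is overwritten.
block = [
    ("t1", "+", "a", "b"),
    ("t2", "+", "a", "b"),   # redundant: becomes a copy of t1
    ("a", "*", "t1", "t2"),  # kills a + b
    ("t3", "+", "a", "b"),   # must be recomputed
]
for instr in local_cse(block):
    print(instr)
```

Fewer executed instructions is the mechanism behind the drop in the Insts and Energy columns between O0 and O1 in the tables.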


(25)


Std Optimization Levels on gcc

-O0 – No optimizations performed
-O1 – Local optimizations such as CSE, copy propagation, dead-code elimination, etc.
-O2 – Aggressive instruction scheduling
-O3 – Inlining of procedures

Almost the same optimizations in each level of cc and gcc

In cc and gcc, optimizations that increase ILP are in levels -O2, -O3, and -O4

cc used wherever possible, gcc used where specific hooks are required


(26)


Individual Optimizations

Four gcc optimizations, all applied on top of -O1

-fschedule-insns – local register allocation followed by basic-block list scheduling
-fschedule-insns2 – Postpass scheduling done
-finline-functions – Integrated all simple functions into their callers
-funroll-loops – Perform the optimization


(27)


Some observations

Energy consumption falls when the number of instructions is reduced, i.e., when the total work done is less, energy is less

Power dissipation is directly


(28)


Observations (contd.)

Function inlining was found to be good for both power and energy

Unrolling was found to be good for energy consumption but bad for power dissipation


(29)


MMX/SIMD

Automatic usage of SIMD ISA still difficult 10+ years after introduction of MMX.


(30)


Standard Optimizations on Power (Contd)

Benchmark  Opt level  Energy  Exec Time  Insts  Avg Power  IPC

su2cor     O0         100     100        100    100        100
           O1         97.38   100.24     92.49  97.15      92.27
           O2         97.69   99.38      92.49  98.3       93.07
           O3         97.69   99.38      92.49  98.3       93.07
           O4         98.31   99.27      92.84  99.02      93.51

swim       O0         100     100        100    100        100
           O1         42.09   51.04      33.21  82.46      65.06
           O2         40.99   47.52      33.1   86.28      69.67
           O3         40.99   46.37      33.1   87.65      71.38

saxpy      O0         100     100        100    100        100
           O1         30.1    36.64      20.01  82.15      54.63
           O2         28.93   34.01      19.05  85.06      56.01
           O3         28.93   34.01      19.05  85.06      56.01


