VLIW Machines Case Study: The Intel IA-64 Merced Architecture

CHAPTER 10 TRENDS IN COMPUTER ARCHITECTURE

10.7 VLIW Machines

One architecture that competes, in a sense, with superscalar architectures is the VLIW (Very Long Instruction Word) architecture. In VLIW machines, multiple operations are packed into a single instruction word that may be 128 or more bits wide. The VLIW machine has multiple execution units, similar to the superscalar machine. A typical VLIW CPU might have two IUs, two FPUs, two load/store units, and a BPU. It is the responsibility of the compiler to organize multiple operations into the instruction word. This relieves the CPU of the need to examine instructions for dependencies, or to order or reorder instructions. A disadvantage is that the compiler must of necessity be pessimistic in its estimates of dependencies. If it cannot find enough instructions to fill the instruction word, it must fill the blank spots with NOP instructions. Furthermore, VLIW architectural improvements require software to be recompiled to take advantage of them.

There have been a number of attempts to market VLIW machines, but for the most part VLIW machines have fallen out of favor in recent years. Performance is the primary culprit, for the reasons above, among others.
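
The compiler's packing task, and the NOP padding it leads to, can be illustrated with a minimal sketch. The slot layout below follows the "typical" VLIW CPU described above; the `Op` record, the `pack` function, and the in-order, no-reordering policy are assumptions made for illustration only, not the method of any particular VLIW compiler.

```python
from dataclasses import dataclass

# Hypothetical slot layout for one instruction word, following the "typical"
# VLIW CPU in the text: two IUs, two FPUs, two load/store units, one BPU.
SLOTS = ("IU", "IU", "FPU", "FPU", "LSU", "LSU", "BPU")


@dataclass(frozen=True)
class Op:
    name: str          # printable form of the operation
    unit: str          # kind of execution unit required (must appear in SLOTS)
    dest: str = ""     # register written, or "" if none
    srcs: tuple = ()   # registers read


def free_slot(word, unit):
    """Return the index of an unused slot of the given kind, or None."""
    for i, kind in enumerate(SLOTS):
        if kind == unit and word[i] == "NOP":
            return i
    return None


def pack(ops):
    """Pack ops, in program order, into instruction words padded with NOPs."""
    words, word, written = [], None, set()
    for op in ops:
        conflict = (
            word is None
            or any(src in written for src in op.srcs)  # needs an earlier result
            or op.dest in written                      # rewrites the same register
            or free_slot(word, op.unit) is None        # no unit of this kind left
        )
        if conflict:
            # Start a new word; every slot is a NOP until an op claims it.
            word, written = ["NOP"] * len(SLOTS), set()
            words.append(word)
        word[free_slot(word, op.unit)] = op.name
        if op.dest:
            written.add(op.dest)
    return words


program = [
    Op("load r1", "LSU", "r1", ("r9",)),
    Op("load r2", "LSU", "r2", ("r9",)),
    Op("add r3", "IU", "r3", ("r1", "r2")),   # depends on both loads
    Op("fmul f1", "FPU", "f1", ("f2", "f3")),
]
for w in pack(program):
    print(w)
```

In this example the four operations occupy two seven-slot words, so ten of the fourteen slots hold NOPs. A real compiler would reorder operations to fill more slots, but it could do so only where it can prove that no dependency is violated.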

10.8 Case Study: The Intel IA-64 Merced Architecture

This section discusses a microprocessor family under development by an alliance between Intel and Hewlett-Packard which, it is hoped, will take the consortium into the 21st century. We first look into the background that led to the decision to develop a new architecture, and then we look at what is currently known about the architecture. The information in this section is taken from various publications and Web sites, and has not been confirmed by Intel or Hewlett-Packard.

10.8.1 BACKGROUND: THE 80X86 CISC ARCHITECTURE

The current Intel 80x86 architecture, which runs on some 80% of desktop computers in the late 1990s, had its roots in the 8086 microprocessor, designed in the late 1970s. The architectural roots of the family go back to the original Intel 8080, designed in the early 1970s. Being a persistent advocate of upward compatibility, Intel has been in a sense hobbled by a CISC architecture that is over 20 years old. Other vendors such as Motorola abandoned hardware compatibility for modernization, relying upon emulators to ease the transition to a new ISA.

In any case, Intel and Hewlett-Packard decided several years ago that the x86 architecture would soon reach the end of its useful life, and they began joint research on a new architecture. Intel and Hewlett-Packard have been quoted as saying that RISC architectures have “run out of gas,” so to speak, so their search led in other directions. Their research led to the IA-64, which stands for “Intel Architecture-64.” The first of the IA-64 family is known by the code name Merced, after the Merced River, near San Jose, California.

10.8.2 THE MERCED: AN EPIC ARCHITECTURE

Although Intel has not released significant details of the Merced ISA, it refers to its architecture as Explicitly Parallel Instruction Computing, or EPIC.
Intel takes pains to point out that it is not a VLIW or even an LIW machine, perhaps out of sensitivity to the bad reputation that VLIW machines have received. However, some industry analysts refer to it as “the VLIW-like EPIC architecture.”

Features

While exact details are not publicly known as of this writing, published sources report that the Merced is expected to have the following characteristics:

• 128 64-bit GPRs and perhaps 128 80-bit FPRs;
• 64 1-bit predicate registers (explained later);
• Instruction words contain three instructions packed into one 128-bit parcel;
• Execution units, roughly equivalent to IU, FPU, and BPU, appear in multiples of three, and the IA-64 will be able to schedule instructions into these multiples;
• It will be the burden of the compiler to schedule the instructions to take advantage of the multiple execution units;
• Most of the instructions seem to be RISC-like, although it is rumored that the processor will still execute 80x86 binary codes, in a dedicated execution unit, known as the DXU;
• Speculative loads. The processor will be able to load values from memory well in advance of when they are needed. Exceptions caused by the loads are postponed until execution has proceeded to the place where the loads would normally have occurred;
• Predication (not prediction), where both sides of a conditional branch instruction are executed and the results from the side not taken are discarded.

These latter two features are discussed in more detail later.

The Instruction Word

The 128-bit instruction word, shown in Figure 10-13, has three 40-bit instructions and an 8-bit template.
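
The arithmetic of this layout can be checked with a short sketch. It assumes the field widths reported in Figure 10-13 (an 8-bit template and three 40-bit instructions, each with a 13-bit op code, three 7-bit GPR fields, and a 6-bit predicate). The ordering of the fields within the word and all function names are assumptions made for illustration; they are not a confirmed IA-64 encoding.

```python
# Field widths as reported in the text and in Figure 10-13.
TEMPLATE_BITS = 8
SLOT_BITS = 40
OPCODE_BITS, GPR_BITS, PRED_BITS = 13, 7, 6   # 13 + 3*7 + 6 == 40


def _check(value, bits, what):
    """Reject values that do not fit in the given field width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{what} does not fit in {bits} bits")


def encode_instruction(opcode, gprs, predicate):
    """Pack one 40-bit instruction (assumed order: op code, 3 GPRs, predicate)."""
    _check(opcode, OPCODE_BITS, "op code")
    _check(predicate, PRED_BITS, "predicate")
    if len(gprs) != 3:
        raise ValueError("exactly three GPR fields are expected")
    word = opcode
    for reg in gprs:
        _check(reg, GPR_BITS, "GPR number")
        word = (word << GPR_BITS) | reg
    return (word << PRED_BITS) | predicate


def encode_word(template, instructions):
    """Pack a template and three instructions into one 128-bit integer."""
    _check(template, TEMPLATE_BITS, "template")
    if len(instructions) != 3:
        raise ValueError("exactly three instructions are expected")
    word = template
    for insn in instructions:
        _check(insn, SLOT_BITS, "instruction")
        word = (word << SLOT_BITS) | insn
    return word


def decode_word(word):
    """Split a 128-bit word back into (template, [three instructions])."""
    mask = (1 << SLOT_BITS) - 1
    instructions = [(word >> (SLOT_BITS * i)) & mask for i in (2, 1, 0)]
    return word >> (3 * SLOT_BITS), instructions


insn = encode_instruction(0x0A5, (1, 2, 127), 63)
word = encode_word(0xFF, [insn, 0, insn])
print(f"{word:032x}")   # 32 hex digits == 128 bits
```

The widths are at least self-consistent with the feature list: a 7-bit field can name any of 128 general-purpose registers, and a 6-bit field can name any of 64 predicate registers.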
The template is placed by the compiler to tell the CPU which instructions in and near that instruction word can execute in parallel, thus the term “Explicit.” The CPU need not analyze the code at runtime to expose instructions that can be executed in parallel because the compiler determines that ahead of time. Compilers for most VLIW machines must place NOP instructions in slots where instructions cannot be executed in parallel. In the IA-64 scheme, the presence of the template identifies those instructions in the word that can and cannot be executed in parallel, so the compiler is free to schedule instructions into all three slots, regardless of whether they can be executed in parallel. The 6-bit predicate field in each instruction represents a tag placed there by the compiler to identify which leg of a conditional branch the instruction is part of, and is used in branch predication.

Figure 10-13 The 128-bit IA-64 instruction word. The 128-bit word comprises an 8-bit template and three 40-bit instructions; each 40-bit instruction comprises a 13-bit op code, three 7-bit GPR fields, and a 6-bit predicate.

Branch Predication

Rather than using branch prediction, the IA-64 architecture uses branch predication to remove penalties due to mis-predicted branches. When the compiler encounters a conditional branch instruction that is a candidate for predication, it selects two unique labels and labels the instructions in each leg of the branch instruction with one of the two labels, identifying which leg they belong to. Both legs can then be executed in parallel. There are 64 one-bit predicate registers, one corresponding to each of the 64 possible predicate identifiers. When the actual branch outcome is known, the corresponding one-bit predicate register is set if the branch outcome is TRUE, and the one-bit predicate register corresponding to the FALSE label is cleared.
Then the results from instructions having the correct predicate label are kept, and results from instructions having the incorrect (mis-predicted) label are discarded.

Speculative Loads

The architecture also employs speculative loads, that is, examining the instruction stream for upcoming load instructions and loading the value ahead of time, speculating that the value will actually be needed and will not have been altered by intervening operations. If successful, this eliminates the normal latency inherent in memory accesses. The compiler examines the instruction stream for candidate load operations that it can “hoist” to a location earlier in the instruction sequence. It inserts a check instruction at the point where the load instruction was originally located. The data value is thus available in the CPU when the check instruction is encountered.

The problem that is normally faced by speculative loads is that the load operation may generate an exception, for example because the address is invalid. However, the exception may not be genuine, because the load may be beyond a branch instruction that is not taken, and thus would never actually be executed. The IA-64 architecture postpones processing the exception until the check instruction is encountered. If the branch is not taken then the check instruction will not be executed, and thus the exception will not be processed.

All of this complexity places a heavy burden on the compiler, which must be clever about how it schedules operations into the instruction words.

80x86 Compatibility

Intel was recently granted a patent for a method, presumably to be used with IA-64, for supporting two instruction sets, one of which is the x86 instruction set. It describes instructions to allow switching between the two execution modes, and for data sharing between them.
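
Returning to the speculative loads described earlier, the deferred-exception idea can be sketched in software. This is a minimal model under stated assumptions, not the IA-64 mechanism itself: the `DeferredFault` token, the toy `MEMORY` dictionary, and the function names are all hypothetical, and the "hoisting" is done by hand rather than by a compiler.

```python
class DeferredFault:
    """Token recorded in place of a value when a speculative load faults."""

    def __init__(self, address):
        self.address = address


# Hypothetical memory: only this one address is valid.
MEMORY = {0x1000: 42}


def speculative_load(address):
    """Load early; on an invalid address, postpone the fault instead of raising."""
    if address in MEMORY:
        return MEMORY[address]
    return DeferredFault(address)


def check(value):
    """Stand-in for the check instruction: raise the postponed exception, if any."""
    if isinstance(value, DeferredFault):
        raise MemoryError(f"invalid address {value.address:#x}")
    return value


def run(pointer, load_path_executed):
    # The load has been hoisted above the branch, so it always executes.
    hoisted = speculative_load(pointer)
    if load_path_executed:
        # Original location of the load: the check sits here.
        return check(hoisted) + 1
    # The check is never reached, so a faulting load goes unnoticed.
    return 0


print(run(0x1000, True))    # valid address, value is used
print(run(0x0BAD, False))   # invalid address, but the value is never needed
```

Calling `run(0x0BAD, True)` raises `MemoryError` at the check, which is the point where the original, unhoisted load would have faulted.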
Estimated Performance

It has been estimated that the first Merced implementation will appear sometime in the year 2000, and will have an 800 MHz clock speed. Goals are for it to have performance several times that of current-generation processors when running in EPIC mode, and that of a 500 MHz Pentium II in x86 mode.

Intel has stated that initially the IA-64 microprocessor will be reserved for use in high-performance workstations and servers, and at an estimated initial price of $5000 each this will undoubtedly be the case.

On the other hand, skeptics, who seem to abound when new technology is announced, say that the technology is unlikely to meet expectations, and that the IA-64 may never see the light of day. Time will tell.
