Simple machine architecture compilers and compiler generators 1996

Compilers and Compiler Generators © P.D. Terry, 2000 4 MACHINE EMULATION In Chapter 2 we discussed the use of emulation or interpretation as a tool for programming language translation. In this chapter we aim to discuss hypothetical machine languages and the emulation of hypothetical machines for these languages in more detail. Modern computers are among the most complex machines ever designed by the human mind. However, this is a text on programming language translation and not on electronic engineering, and our restricted discussion will focus only on rather primitive object languages suited to the simple translators to be discussed in later chapters.

4.1 Simple machine architecture

Many CPU central processor unit chips used in modern computers have one or more internal registers or accumulators, which may be regarded as highly local memory where simple arithmetic and logical operations may be performed, and between which local data transfers may take place. These registers may be restricted to the capacity of a single byte 8 bits, or, as is typical of most modern processors, they may come in a variety of small multiples of bytes or machine words. One fundamental internal register is the instruction register IR, through which moves the bitstrings bytes representing the fundamental machine-level instructions that the processor can obey. These instructions tend to be extremely simple - operations such as clear a register or move a byte from one register to another being the typical order of complexity. Some of these instructions may be completely defined by a single byte value. Others may need two or more bytes for a complete definition. Of these multi-byte instructions, the first usually denotes an operation, and the rest relate either to a value to be operated upon, or to the address of a location in memory at which can be found the value to be operated upon. The simplest processors have only a few data registers, and are very limited in what they can actually do with their contents, and so processors invariably make provision for interfacing to the memory of the computer, and allow transfers to take place along so-called bus lines between the internal registers and the far greater number of external memory locations. When information is to be transferred to or from memory, the CPU places the appropriate address information on the address bus, and then transmits or receives the data itself on the data bus. This is illustrated in Figure 4.1. The memory may simplistically be viewed as a one-dimensional array of byte values, analogous to what might be described in high-level language terms by declarations like the following TYPE ADDRESS = CARDINAL [0 .. MemSize - 1]; BYTES = CARDINAL [0 .. 255]; VAR Mem : ARRAY ADDRESS OF BYTES; in Modula-2, or, in C ++ which does not provide for the subrange types so useful in this regard typedef unsigned char BYTES; BYTES Mem[MemSize]; Since the memory is used to store not only data but also instructions, another important internal register in a processor, the so-called program counter or instruction pointer denoted by PC or IP, is used to keep track of the address in memory of the next instruction to be fed to the processor’s instruction register IR. Perhaps it will be helpful to think of the processor itself in high-level terms: TYPE PROCESSOR = struct processor { RECORD BYTES IR; IR, BYTES R1, R2, R3; R1, R2, R3 : BYTES; unsigned PC; PC : ADDRESS; }; END; VAR processor cpu; CPU : PROCESSOR; The operation of the machine is repeatedly to fetch a byte at a time from memory along the data bus, place it in the IR, and then execute the operation which this byte represents. Multi-byte instructions may require the fetching of further bytes before the instruction itself can be decoded fully by the CPU, of course. After the instruction denoted by the contents of IR has been executed, the value of PC will have been changed to point to the next instruction to be fetched. This fetch-execute cycle may be described by the following algorithm: BEGIN CPU.PC := initialValue; address of first code instruction LOOP CPU.IR := Mem[CPU.PC]; fetch IncrementCPU.PC; bump PC in anticipation ExecuteCPU.IR; affecting other registers, memory, PC handle machine interrupts if necessary END END. Normally the value of PC alters by small steps since instructions are usually stored in memory in sequence; execution of branch instructions may, however, have a rather more dramatic effect. So might the occurrence of hardware interrupts, although we shall not discuss interrupt handling further. A program for such a machine consists, in the last resort, of a long string of byte values. Were these to be written on paper as binary, decimal, or hexadecimal values, they would appear pretty meaningless to the human reader. We might, for example, find a section of program reading 25 45 21 34 34 30 45 Although it may not be obvious, this might be equivalent to a high-level statement like Price := 2 Price + MarkUp; Machine-level programming is usually performed by associating mnemonics with the recognizable operations, like HLT for halt or ADD for add to register. The above code is far more comprehensible when written with commentary as LDA 45 ; load accumulator with value stored in memory location 45 SHL ; shift accumulator one bit left multiply by 2 ADI 34 ; add 34 to the accumulator STA 45 ; store the value in the accumulator at memory location 45 Programs written in an assembly language - which have first to be assembled before they can be executed - usually make use of other named entities, for example MarkUp EQU 34 ; CONST MarkUp = 34; LDA Price ; CPU.A := Price; SHL ; CPU.A := 2 CPU.A; ADI MarkUp ; CPU.A := CPU.A + 34; STA Price ; Price := CPU.A; When we use code fragments such as these for illustration we shall make frequent use of commentary showing an equivalent fragment written in a high-level language. Commentary follows the semicolon on each line, a common convention in assembler languages.

4.2 Addressing modes