
are of fixed size, addressing is quite easy. Suppose we declare a matrix

     VAR Matrix[MinX : MaxX , MinY : MaxY];

Then we shall have to reserve (MaxX - MinX + 1) * (MaxY - MinY + 1) consecutive locations for the whole array. If we store the elements by rows (as in most languages other than Fortran), then the offset of the (I, J)th element in the matrix will be found as

     (I - MinX) * (MaxY - MinY + 1) + (J - MinY) + offset of first element

You will need a new version of the STK_ind opcode, incorporating bounds checking (a small sketch of this addressing calculation is given after these exercises).

15.22 Extend your system to allow whole arrays (of the same length) to be assigned one to another.

15.23 Some users like to live dangerously. How could you arrange for the compiler to have an option whereby generation of subscript range checks could be suppressed?

15.24 Complete level 1 of your extended Clang or Topsy compiler by generating code for all the extra statement forms that you have introduced while assiduously exploring the exercises suggested earlier in this chapter. How many of these can only be completed if the instruction set of the machine is extended?
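The row-major addressing just described, together with the range check that an extended STK_ind opcode would have to perform, can be sketched as follows in C++. This is a minimal illustration only; the function name, its parameters and the use of an exception to signal a bad subscript are assumptions, not part of the Clang or Topsy systems.

     #include <stdexcept>

     // Offset of Matrix[I, J] for VAR Matrix[MinX : MaxX , MinY : MaxY],
     // stored by rows; base is the offset of the first element
     int elementOffset(int I, int J,
                       int MinX, int MaxX, int MinY, int MaxY, int base)
     {
         if (I < MinX || I > MaxX || J < MinY || J > MaxY)
             throw std::out_of_range("subscript out of range");   // what a checking STK_ind would trap
         int rowLength = MaxY - MinY + 1;                          // elements in each row
         return (I - MinX) * rowLength + (J - MinY) + base;
     }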

15.3 Other aspects of code generation

As the reader may have realized, the approach taken to code generation up until now has been rather idealistic. A hypothetical stack machine is, in many ways, ideal for our language - as witness the simplicity of the code generator - but it may differ rather markedly from a real machine. In this section we wish to look at other aspects of this large and intricate subject.

15.3.1 Machine tyranny

It is rather awkward to describe code generation for a real machine in a general text. It inevitably becomes machine specific, and the principles may become obscured behind a great deal of detail for an architecture with which the reader may be completely unfamiliar. To illustrate a few of these difficulties, we shall consider some features of code generation for a relatively simple processor.

The Zilog Z80 processor that we shall use as a model is typical of several 8-bit microprocessors that were very popular in the early 1980s, and which helped to spur on the microcomputer revolution. It had a single 8-bit accumulator denoted by A, several internal 8-bit registers denoted by B, C, D, E, H and L, a 16-bit program counter PC, two 16-bit index registers IX and IY, a 16-bit stack pointer SP, an 8-bit data bus, and a 16-bit address bus that allowed access to 64 KB of memory.

With the exception of the BRN opcode (and, perhaps, the HLT opcode), our hypothetical machine instructions do not map one-for-one onto Z80 opcodes. Indeed, at first sight the Z80 would appear to be ill-suited to supporting a high-level language at all, since operations on a single 8-bit accumulator only provide for handling numbers between -128 and +127, scarcely of much use in arithmetic calculations. For many years, however - even after the introduction of processors like the Intel 80x86 and Motorola 680x0 - 16-bit arithmetic was deemed adequate for quite a number of operations, as it allows for numbers in the range -32768 to +32767. In the Z80 a limited number of operations were allowed on 16-bit register pairs. These were denoted BC, DE and HL, and were formed by simply concatenating the 8-bit registers mentioned earlier. For example, 16-bit constants could be loaded immediately into a register pair, and such pairs could be pushed onto and popped from the stack, and transferred directly to and from memory. In addition, the HL pair could be used as a 16-bit accumulator into which the other pairs could be added and subtracted, and it could also be used to perform register-indirect addressing of bytes. On the Z80 the 16-bit operations stopped short of multiplication, division, logical operations, and even comparison against zero, all of which are found on more modern 16 and 32-bit processors.

We do not propose to describe the instruction set in any detail; hopefully the reader will be able to understand the code fragments below from the commentary given alongside. As an example, let us consider Z80 code for the simple assignment statement

     I := 4 + J - K

where I, J and K are integers, each stored in two bytes. A fairly optimal translation of this, making use of the HL register pair as a 16-bit accumulator, but not using a stack in any way, might be as follows:

     LD   HL,4        ; HL := 4
     LD   DE,(J)      ; DE := Mem[J]
     ADD  HL,DE       ; HL := HL + DE          (4 + J)
     LD   DE,(K)      ; DE := Mem[K]
     OR   A           ; just to clear Carry
     SBC  HL,DE       ; HL := HL - DE - Carry  (4 + J - K)
     LD   (I),HL      ; Mem[I] := HL

On the Z80 this amounted to some 18 bytes of code. The only point worth noting is that, unlike addition, there was no simple 16-bit subtraction operation, only one that involved a carry bit, which consequently had to be cleared before the SBC could be executed.
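To make the role of the carry flag concrete, the following small C++ fragment mimics the effect of that instruction sequence. It is a sketch only: the 16-bit variables standing in for the register pairs, the Boolean standing in for the carry flag, and the sample values of J and K are all purely illustrative.

     #include <cstdint>
     #include <iostream>

     int main()
     {
         uint16_t J = 10, K = 3, I;
         bool carry = true;                  // suppose the flag happens to be set on entry

         uint16_t HL = 4;                    // LD  HL,4
         uint16_t DE = J;                    // LD  DE,(J)
         HL = HL + DE;                       // ADD HL,DE      (4 + J)
         DE = K;                             // LD  DE,(K)
         carry = false;                      // OR  A          clears the carry flag
         HL = HL - DE - (carry ? 1 : 0);     // SBC HL,DE      (4 + J - K, less the old carry)
         I = HL;                             // LD  (I),HL

         std::cout << I << '\n';             // prints 11; omitting the OR A could give 10
     }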
By contrast, the same statement coded for our hypothetical machine would have produced 13 words of code

     ADR  I         ; push address of I
     LIT  4         ; push constant 4
     ADR  J         ; push address of J
     VAL            ; replace with value of J
     ADD            ; 4 + J
     ADR  K         ; push address of K
     VAL            ; replace with value of K
     SUB            ; 4 + J - K
     STO            ; store on I

and for a simple single-accumulator machine like that discussed in Chapter 4 we should probably think of coding this statement on the lines of

     LDI  4         ; A := 4
     ADD  J         ; A := 4 + J
     SUB  K         ; A := 4 + J - K
     STA  I         ; I := 4 + J - K

How do we begin to map stack machine code to these other forms? One approach might be to consider the effect of the opcodes, as defined in the interpreter in Chapter 4, and to arrange that code generating routines like stackaddress, stackconstant and assign generate code equivalent to that which would be obeyed by the interpreter. For convenience we quote the relevant equivalences again. We use T to denote the virtual machine top of stack pointer, to avoid confusion with the SP register of the Z80 real machine.

     ADR address : T := T - 1; Mem[T] := address              (push an address)
     LIT value   : T := T - 1; Mem[T] := value                (push a constant)
     VAL         : Mem[T] := Mem[Mem[T]]                      (dereference)
     ADD         : T := T + 1; Mem[T] := Mem[T] + Mem[T-1]    (addition)
     SUB         : T := T + 1; Mem[T] := Mem[T] - Mem[T-1]    (subtraction)
     STO         : Mem[Mem[T+1]] := Mem[T]; T := T + 2        (store top-of-stack)

It does not take much imagination to see that this would produce a great deal more code than we should like. For example, the equivalent Z80 code for an LIT opcode, obtained from a translation of the sequence above and generated by stackconstant(Num), might be

     T := T - 1 :      LD   HL,(T)    ; HL := T
                       DEC  HL        ; HL := HL - 1
                       DEC  HL        ; HL := HL - 1
                       LD   (T),HL    ; T := HL
     Mem[T] := Num :   LD   DE,Num    ; DE := Num
                       LD   (HL),E    ; store low order byte
                       INC  HL        ; HL := HL + 1
                       LD   (HL),D    ; store high order byte

which amounts to some 14 bytes. We should comment that HL must be decremented twice to allow for the fact that memory is addressed in bytes, not words, and that we have to store the two halves of the register pair DE in two operations, bumping the HL pair (used for register-indirect addressing) between them.
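The equivalences quoted above translate almost directly into an interpreter. The following C++ sketch shows them in action for the statement I := 4 + J - K; the array size, the enumeration of the opcodes, the downward-growing stack and the addresses chosen for the variables are all assumptions made purely for illustration.

     #include <iostream>

     enum Opcode { ADR, LIT, VAL, ADD, SUB, STO };

     const int MemSize = 512;
     int Mem[MemSize];                        // word-addressed memory
     int T = MemSize;                         // top-of-stack pointer, growing downwards

     void interpret(Opcode op, int operand = 0)
     {
         switch (op)
         {
             case ADR: T--; Mem[T] = operand;            break;   // push an address
             case LIT: T--; Mem[T] = operand;            break;   // push a constant
             case VAL: Mem[T] = Mem[Mem[T]];             break;   // dereference
             case ADD: T++; Mem[T] = Mem[T] + Mem[T-1];  break;   // addition
             case SUB: T++; Mem[T] = Mem[T] - Mem[T-1];  break;   // subtraction
             case STO: Mem[Mem[T+1]] = Mem[T]; T += 2;   break;   // store top-of-stack
         }
     }

     int main()
     {
         const int I = 0, J = 1, K = 2;       // assumed addresses of the three variables
         Mem[J] = 10; Mem[K] = 3;
         interpret(ADR, I); interpret(LIT, 4);
         interpret(ADR, J); interpret(VAL); interpret(ADD);
         interpret(ADR, K); interpret(VAL); interpret(SUB);
         interpret(STO);
         std::cout << Mem[I] << '\n';         // prints 11
     }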
If the machine for which we are generating code does not have some sort of hardware stack we might be forced or tempted into taking this approach, but fortunately most modern processors do incorporate a stack. Although the Z80 did not support operations like ADD and SUB on elements of its stack, the pushing which is implicit in LIT and ADR is easily handled, and the popping and pushing implied by ADD and SUB are nearly as simple. Consequently, it would be quite simple to write code generating routines which, for the same assignment statement as before, would have the effects shown below.

     ADR I :   LD   HL,I      ; HL := address of I
               PUSH HL        ; push address of I
     LIT 4 :   LD   DE,4      ; DE := 4
               PUSH DE        ; push value of 4
     ADR J :   LD   HL,J      ; HL := address of J
               PUSH HL        ; push address of J              *
     VAL   :   POP  HL        ; HL := address of variable      *
               LD   E,(HL)    ; E := Mem[HL]  (low order byte)
               INC  HL        ; HL := HL + 1
               LD   D,(HL)    ; D := Mem[HL]  (high order byte)
               PUSH DE        ; replace with value of J        *
     ADD   :   POP  DE        ; DE := second operand           *
               POP  HL        ; HL := first operand
               ADD  HL,DE     ; HL := HL + DE
               PUSH HL        ; 4 + J
     ADR K :   LD   HL,K      ; HL := address of K
               PUSH HL        ; push address of K              *
     VAL   :   POP  HL        ; HL := address of variable      *
               LD   E,(HL)    ; E := low order byte
               INC  HL        ; HL := HL + 1
               LD   D,(HL)    ; D := high order byte
               PUSH DE        ; replace with value of K        *
     SUB   :   POP  DE        ; DE := second operand           *
               POP  HL        ; HL := first operand
               OR   A         ; unset carry
               SBC  HL,DE     ; HL := HL - DE - carry
               PUSH HL        ; 4 + J - K                      **
     STO   :   POP  DE        ; DE := value to be stored       **
               POP  HL        ; HL := address to be stored at
               LD   (HL),E    ; Mem[HL] := E  (store low order byte)
               INC  HL        ; HL := HL + 1
               LD   (HL),D    ; Mem[HL] := D  (store high order byte)
                              ; store on I

We need not present code generator routines based on these ideas in any detail. Their intent should be fairly clear - the code generated by each follows distinct patterns, with obvious differences being handled by the parameters which have already been introduced.

For the example under discussion we have generated 41 bytes, which is still quite a long way from the optimal 18 given before. However, little effort would be required to reduce this to 32 bytes. It is easy to see that 8 bytes could simply be removed (the ones marked with a single asterisk), since the operations of pushing a register pair at the end of one code generating sequence and of popping the same pair at the start of the next are clearly redundant. Another byte could be removed by replacing the two marked with a double asterisk by a one-byte opcode for exchanging the DE and HL pairs (the Z80 code EX DE,HL does this). These are examples of so-called peephole optimization, and are quite easily included in the code generating routines we are contemplating. For example, the algorithm for assign could be

     PROCEDURE Assign;
     (* Generate code to store top-of-stack on the address stored next-to-top *)
       BEGIN
         IF last code generated was PUSH HL
           THEN replace this PUSH HL with EX DE,HL
         ELSIF last code generated was PUSH DE
           THEN delete PUSH DE
         ELSE generate code for POP DE
         END;
         generate code for POP HL;
         generate code for LD (HL),E;
         generate code for INC HL;
         generate code for LD (HL),D
       END;

The reader might like to reflect on the kinds of assignment statements which would give rise to the three possible paths through this routine; a concrete sketch of one such routine follows at the end of this subsection.

By now, hopefully, it will have dawned on the reader that generation of native code is probably strongly influenced by the desire to make this compact and efficient, and that achieving this objective will require the compiler writer to be highly conversant with details of the target machine, and with the possibilities afforded by its instruction set. We could pursue the ideas we have just introduced, but will refrain from doing so, concentrating instead on how one might come up with a better structure from which to generate code.
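In a compiler that emits its code into a buffer, the peephole optimization just described amounts to inspecting the instruction most recently emitted before generating any more. The following C++ sketch illustrates the idea; the code buffer and the helper routines emit, lastEmitted, replaceLast and deleteLast are hypothetical, and are not part of the Clang or Topsy code generators.

     #include <string>
     #include <vector>

     std::vector<std::string> code;                 // instructions emitted so far

     void emit(const std::string &s)        { code.push_back(s); }
     std::string lastEmitted()              { return code.empty() ? "" : code.back(); }
     void replaceLast(const std::string &s) { code.back() = s; }
     void deleteLast()                      { code.pop_back(); }

     // Generate code to store top-of-stack at the address stored next-to-top,
     // applying the peephole optimization described in the text
     void assign()
     {
         if (lastEmitted() == "PUSH HL")
             replaceLast("EX DE,HL");               // value is still in HL - exchange it into DE
         else if (lastEmitted() == "PUSH DE")
             deleteLast();                          // value is still in DE - the push is redundant
         else
             emit("POP DE");                        // otherwise retrieve the value from the stack
         emit("POP HL");                            // HL := destination address
         emit("LD (HL),E");                         // store low order byte
         emit("INC HL");
         emit("LD (HL),D");                         // store high order byte
     }

Routines corresponding to stackaddress, stackconstant and the dereferencing operation would simply emit their PUSH instructions through the same buffer, so that assign can recognize and eliminate them later.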

15.3.2 Abstract syntax trees as intermediate representations