are of fixed size, addressing is quite easy. Suppose we declare a matrix
VAR Matrix[MinX : MaxX , MinY : MaxY];
Then we shall have to reserve
$$(\mathit{MaxX} - \mathit{MinX} + 1) \times (\mathit{MaxY} - \mathit{MinY} + 1)$$
consecutive locations for the whole array. If we store the elements by rows, as in most languages other than Fortran, then the offset of the (I, J)th element in the matrix will be found as
$$(I - \mathit{MinX}) \times (\mathit{MaxY} - \mathit{MinY} + 1) + (J - \mathit{MinY}) + \text{offset of first element}$$
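The address calculation above can be sketched in Python as follows. This is a minimal illustration only. The function names `matrix_size` and `element_offset` are hypothetical, and offsets are counted in elements rather than bytes or words. The range check stands in for the bounds checking that a real indexing opcode would perform at run time.

```python
def matrix_size(min_x, max_x, min_y, max_y):
    """Number of consecutive locations reserved for the whole matrix."""
    return (max_x - min_x + 1) * (max_y - min_y + 1)


def element_offset(i, j, min_x, max_x, min_y, max_y, base=0):
    """Offset of Matrix[i, j] when elements are stored by rows.

    base is the offset of the first element, Matrix[min_x, min_y].
    Raises IndexError when a subscript is out of range, standing in for
    the run-time bounds check (hypothetical helper, not the text's code).
    """
    if not (min_x <= i <= max_x and min_y <= j <= max_y):
        raise IndexError(f"subscript ({i}, {j}) out of range")
    row_length = max_y - min_y + 1  # number of elements in each row
    return (i - min_x) * row_length + (j - min_y) + base


# VAR Matrix[1 : 3, 0 : 4];  i.e. 3 rows of 5 elements
print(matrix_size(1, 3, 0, 4))           # 15
print(element_offset(2, 3, 1, 3, 0, 4))  # 8
```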
You will need a new version of the STK_ind opcode incorporating bounds checking.
15.22 Extend your system to allow whole arrays of the same length to be assigned one to another.
15.23 Some users like to live dangerously. How could you arrange for the compiler to have an option whereby generation of subscript range checks could be suppressed?
15.24 Complete level 1 of your extended Clang or Topsy compiler by generating code for all the extra statement forms that you have introduced while assiduously exploring the exercises suggested
earlier in this chapter. How many of these can only be completed if the instruction set of the machine is extended?
15.3 Other aspects of code generation
As the reader may have realized, the approach taken to code generation up until now has been rather idealistic. A hypothetical stack machine is, in many ways, ideal for our language - as witness
the simplicity of the code generator - but it may differ rather markedly from a real machine. In this section we wish to look at other aspects of this large and intricate subject.
15.3.1 Machine tyranny
It is rather awkward to describe code generation for a real machine in a general text. It inevitably becomes machine specific, and the principles may become obscured behind a lot of detail for an
architecture with which the reader may be completely unfamiliar. To illustrate a few of these difficulties, we shall consider some features of code generation for a relatively simple processor.
The Zilog Z80 processor that we shall use as a model is typical of several 8-bit microprocessors that were very popular in the early 1980s, and which helped to spur on the microcomputer revolution. It had a single 8-bit accumulator denoted by A, several internal 8-bit registers denoted by B, C, D, E, H and L, a 16-bit program counter PC, two 16-bit index registers IX and IY, a 16-bit stack pointer SP, an 8-bit data bus, and a 16-bit address bus to allow access to 64KB of memory. With the exception of the BRN opcode and, perhaps, the HLT opcode, our hypothetical machine instructions do not map one-for-one onto Z80 opcodes. Indeed, at first sight the Z80 would appear
to be ill suited to supporting a high-level language at all, since operations on a single 8-bit accumulator only provide for handling numbers between -128 and +127, scarcely of much use in
arithmetic calculations. For many years, however - even after the introduction of processors like the Intel 80x86 and Motorola 680x0 - 16-bit arithmetic was deemed adequate for quite a number of
operations, as it allows for numbers in the range -32768 to +32767. In the Z80 a limited number of operations were allowed on 16-bit register pairs. These were denoted BC, DE and HL, and were formed by simply concatenating the 8-bit registers mentioned earlier. For example, 16-bit constants could be loaded immediately into a register pair, and such pairs could be pushed onto and popped from the stack, and transferred directly to and from memory. In addition, the HL pair could be used as a 16-bit accumulator, to which the other pairs could be added and from which they could be subtracted, and it could also be used to perform register-indirect addressing of bytes. On the Z80 the 16-bit operations stopped short of multiplication, division, logical operations and even comparison against zero, all of which are found on more modern 16- and 32-bit processors. We do not propose to describe the instruction set in any detail; hopefully the reader will be able to understand the code fragments below from the commentary given alongside.
As an example, let us consider Z80 code for the simple assignment statement
     I := 4 + J - K
where I, J and K are integers, each stored in two bytes. A fairly optimal translation of this, making use of the HL register pair as a 16-bit accumulator, but not using a stack in any way, might be as follows:
     LD   HL,4       ; HL := 4
     LD   DE,(J)     ; DE := Mem[J]
     ADD  HL,DE      ; HL := HL + DE                4 + J
     LD   DE,(K)     ; DE := Mem[K]
     OR   A          ; just to clear Carry
     SBC  HL,DE      ; HL := HL - DE - Carry        4 + J - K
     LD   (I),HL     ; Mem[I] := HL
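The size of this sequence can be checked by tallying the encoded length of each instruction. The sketch below assumes the standard Z80 encodings. For example, `LD DE,(nn)` and `SBC HL,DE` both need the `ED` prefix byte. In standard Z80 assembler syntax a memory operand is written in parentheses, so `LD DE,(nn)` loads from an address. Operands are abstracted to the placeholder `nn`, and the table name `Z80_SIZES` is purely illustrative.

```python
# Encoded lengths (in bytes) of the Z80 instructions used in the
# hand-optimized translation of I := 4 + J - K.
Z80_SIZES = {
    "LD HL,nn": 3,    # opcode 21 + 16-bit immediate
    "LD DE,(nn)": 4,  # prefix ED + 5B + 16-bit address
    "ADD HL,DE": 1,   # opcode 19
    "OR A": 1,        # opcode B7 (clears the carry flag)
    "SBC HL,DE": 2,   # prefix ED + 52
    "LD (nn),HL": 3,  # opcode 22 + 16-bit address
}

optimal = [
    "LD HL,nn",    # LD HL,4
    "LD DE,(nn)",  # LD DE,(J)
    "ADD HL,DE",
    "LD DE,(nn)",  # LD DE,(K)
    "OR A",
    "SBC HL,DE",
    "LD (nn),HL",  # LD (I),HL
]

total_bytes = sum(Z80_SIZES[op] for op in optimal)
print(total_bytes)  # 18
```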
On the Z80 this amounted to some 18 bytes of code. The only point worth noting is that, unlike addition, there was no simple 16-bit subtraction operation, only one which involved the carry bit, which consequently had to be cleared before SBC could be executed. By contrast, the same statement coded for our hypothetical machine would have produced 13 words of code:
     ADR  I          ; push address of I
     LIT  4          ; push constant 4
     ADR  J          ; push address of J
     VAL             ; replace with value of J
     ADD             ; 4 + J
     ADR  K          ; push address of K
     VAL             ; replace with value of K
     SUB             ; 4 + J - K
     STO             ; store on I
and for a simple single-accumulator machine like that discussed in Chapter 4 we should probably think of coding this statement on the lines of
     LDI  4          ; A := 4
     ADD  J          ; A := 4 + J
     SUB  K          ; A := 4 + J - K
     STA  I          ; I := 4 + J - K
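The 13-word figure quoted for the hypothetical stack machine can be checked in the same way. The sketch assumes, as the count implies, that every opcode occupies one word and that the address or constant following ADR or LIT occupies one more. The dictionary name `WORDS` is hypothetical and is used only for this check.

```python
# Words occupied by each opcode of the hypothetical stack machine:
# one word for the opcode itself, plus one for an address or constant.
WORDS = {"ADR": 2, "LIT": 2, "VAL": 1, "ADD": 1, "SUB": 1, "STO": 1}

stack_code = ["ADR I", "LIT 4", "ADR J", "VAL", "ADD",
              "ADR K", "VAL", "SUB", "STO"]

# The mnemonic is the first field of each instruction.
total_words = sum(WORDS[instr.split()[0]] for instr in stack_code)
print(total_words)  # 13
```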
How do we begin to map stack machine code to these other forms? One approach might be to consider the effect of the opcodes, as defined in the interpreter in Chapter 4, and to arrange that code generating routines like stackaddress, stackconstant and assign generate code equivalent to that which would be obeyed by the interpreter. For convenience we quote the relevant equivalences again. We use T to denote the virtual machine's top-of-stack pointer, to avoid confusion with the SP register of the real Z80 machine.
     ADR address : T := T - 1; Mem[T] := address               push an address
     LIT value   : T := T - 1; Mem[T] := value                 push a constant
     VAL         : Mem[T] := Mem[Mem[T]]                       dereference
     ADD         : T := T + 1; Mem[T] := Mem[T] + Mem[T-1]     addition
     SUB         : T := T + 1; Mem[T] := Mem[T] - Mem[T-1]     subtraction
     STO         : Mem[Mem[T+1]] := Mem[T]; T := T + 2         store top-of-stack
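These equivalences can be transcribed directly into a small interpreter, which confirms that they implement the assignment. The Python sketch below is illustrative only. Memory is a list of words, and T grows downwards exactly as in the equivalences. The word addresses chosen for I, J and K (0, 1 and 2) and the initial values of J and K are assumptions made for the example.

```python
def run(code, mem, t):
    """Interpret stack-machine code using the quoted equivalences.

    mem is a list of words; t is the top-of-stack pointer T, which grows
    downwards (a push decrements it). Returns the final value of T.
    """
    for op, *args in code:
        if op in ("ADR", "LIT"):  # push an address or a constant
            t -= 1
            mem[t] = args[0]
        elif op == "VAL":         # replace address on top with its contents
            mem[t] = mem[mem[t]]
        elif op == "ADD":
            t += 1
            mem[t] = mem[t] + mem[t - 1]
        elif op == "SUB":
            t += 1
            mem[t] = mem[t] - mem[t - 1]
        elif op == "STO":         # store top-of-stack at address below it
            mem[mem[t + 1]] = mem[t]
            t += 2
        else:
            raise ValueError(f"unknown opcode {op}")
    return t


I, J, K = 0, 1, 2                # assumed word addresses of the variables
mem = [0] * 16
mem[J], mem[K] = 10, 3           # assumed initial values
program = [("ADR", I), ("LIT", 4), ("ADR", J), ("VAL",), ("ADD",),
           ("ADR", K), ("VAL",), ("SUB",), ("STO",)]
t = run(program, mem, len(mem))  # stack starts empty at the top of memory
print(mem[I], t)                 # 11 16
```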
It does not take much imagination to see that this would produce a great deal more code than we should like. For example, the equivalent Z80 code for an LIT opcode, obtained from a translation of the sequence above and generated by stackconstant(Num), might be
     T := T - 1     : LD   HL,(T)     ; HL := T
                      DEC  HL         ; HL := HL - 1
                      DEC  HL         ; HL := HL - 1
                      LD   (T),HL     ; T := HL
     Mem[T] := Num  : LD   DE,Num     ; DE := Num
                      LD   (HL),E     ; store low order byte
                      INC  HL         ; HL := HL + 1
                      LD   (HL),D     ; store high order byte
which amounts to some 14 bytes. We should comment that HL must be decremented twice to allow for the fact that memory is addressed in bytes, not words, and that we have to store the two halves of the register pair DE in two operations, bumping the HL pair used for register-indirect addressing between these.
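The expansion of LIT can be sketched as a code generating routine in the same spirit. The Python functions `stack_constant` and `simulate_push` below are hypothetical. The first returns the eight Z80 instructions shown in the text, written in standard Z80 syntax where memory operands appear in parentheses, together with their encoded lengths. The second models their net effect on byte-addressed memory. It assumes the Z80 convention of storing the low-order byte of a word at the lower address.

```python
def stack_constant(num, t_addr="T"):
    """Hypothetical generator for LIT num when T is kept in memory.

    Returns (instruction, size_in_bytes) pairs for standard Z80 encodings.
    """
    return [
        (f"LD HL,({t_addr})", 3),  # HL := T
        ("DEC HL", 1),             # HL := HL - 1
        ("DEC HL", 1),             # again: a word occupies two bytes
        (f"LD ({t_addr}),HL", 3),  # T := HL
        (f"LD DE,{num}", 3),       # DE := num
        ("LD (HL),E", 1),          # store low-order byte
        ("INC HL", 1),             # HL := HL + 1
        ("LD (HL),D", 1),          # store high-order byte
    ]


def simulate_push(mem, t, num):
    """Net effect of that code on byte-addressed memory; returns the new T."""
    t -= 2                          # the two DEC HL instructions
    mem[t] = num & 0xFF             # low-order byte at the lower address
    mem[t + 1] = (num >> 8) & 0xFF  # high-order byte just above it
    return t


code = stack_constant(1000)
print(sum(size for _, size in code))    # 14
mem = bytearray(8)
t = simulate_push(mem, len(mem), 1000)  # 1000 = 0x03E8
print(t, hex(mem[6]), hex(mem[7]))      # 6 0xe8 0x3
```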
If the machine for which we are generating code does not have some sort of hardware stack, we might be forced or tempted into taking this approach, but fortunately most modern processors do incorporate a stack. Although the Z80 did not support operations like ADD and SUB on elements of its stack, the pushing which is implicit in LIT and ADR is easily handled, and the popping and pushing implied by ADD and SUB are nearly as simple. Consequently, it would be quite simple to write code generating routines which, for the same assignment statement as before, would have the effects shown below.
        ADR I : LD   HL,I       ; HL := address of I
                PUSH HL         ; push address of I
        LIT 4 : LD   DE,4       ; DE := 4
                PUSH DE         ; push value of 4
        ADR J : LD   HL,J       ; HL := address of J
   *            PUSH HL         ; push address of J
   *    VAL   : POP  HL         ; HL := address of variable
                LD   E,(HL)     ; E := Mem[HL] low order byte
                INC  HL         ; HL := HL + 1
                LD   D,(HL)     ; D := Mem[HL] high order byte
   *            PUSH DE         ; replace with value of J
   *    ADD   : POP  DE         ; DE := second operand
                POP  HL         ; HL := first operand
                ADD  HL,DE      ; HL := HL + DE
                PUSH HL         ; 4 + J
        ADR K : LD   HL,K       ; HL := address of K
   *            PUSH HL         ; push address of K
   *    VAL   : POP  HL         ; HL := address of variable
                LD   E,(HL)     ; E := low order byte
                INC  HL         ; HL := HL + 1
                LD   D,(HL)     ; D := high order byte
   *            PUSH DE         ; replace with value of K
   *    SUB   : POP  DE         ; DE := second operand
                POP  HL         ; HL := first operand
                OR   A          ; unset carry
                SBC  HL,DE      ; HL := HL - DE - carry
  **            PUSH HL         ; 4 + J - K
  **    STO   : POP  DE         ; DE := value to be stored
                POP  HL         ; HL := address to be stored at
                LD   (HL),E     ; Mem[HL] := E  store low order byte
                INC  HL         ; HL := HL + 1
                LD   (HL),D     ; Mem[HL] := D  store high order byte
                                ; store on I
We need not present code generator routines based on these ideas in any detail. Their intent should be fairly clear - the code generated by each follows distinct patterns, with obvious differences being
handled by the parameters which have already been introduced.
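The routines need not be presented in detail, but a minimal Python sketch shows how little machinery these patterns require. The class and method names (`NaiveZ80Gen`, `stack_address`, `dereference`, and so on) are hypothetical stand-ins for the routines discussed in the text. Each method emits exactly the instructions of the corresponding group in the listing above, recording instruction patterns so that their encoded lengths can be totalled. The lengths assume standard Z80 encodings.

```python
# Encoded lengths in bytes of the instruction patterns used below
# (standard Z80 encodings; "nn" stands for a 16-bit immediate operand).
SIZES = {
    "LD HL,nn": 3, "LD DE,nn": 3,
    "PUSH HL": 1, "PUSH DE": 1, "POP HL": 1, "POP DE": 1,
    "LD E,(HL)": 1, "LD D,(HL)": 1, "INC HL": 1,
    "LD (HL),E": 1, "LD (HL),D": 1,
    "ADD HL,DE": 1, "OR A": 1, "SBC HL,DE": 2,
}


class NaiveZ80Gen:
    """Hypothetical code generator: one fixed Z80 pattern per stack opcode."""

    def __init__(self):
        self.code = []  # list of (pattern, operand) pairs

    def emit(self, pattern, operand=None):
        self.code.append((pattern, operand))

    def emit_all(self, *patterns):
        for pattern in patterns:
            self.emit(pattern)

    def stack_address(self, name):  # ADR name
        self.emit("LD HL,nn", name)
        self.emit("PUSH HL")

    def stack_constant(self, num):  # LIT num
        self.emit("LD DE,nn", num)
        self.emit("PUSH DE")

    def dereference(self):          # VAL
        self.emit_all("POP HL", "LD E,(HL)", "INC HL", "LD D,(HL)", "PUSH DE")

    def add(self):                  # ADD
        self.emit_all("POP DE", "POP HL", "ADD HL,DE", "PUSH HL")

    def subtract(self):             # SUB
        self.emit_all("POP DE", "POP HL", "OR A", "SBC HL,DE", "PUSH HL")

    def assign(self):               # STO
        self.emit_all("POP DE", "POP HL", "LD (HL),E", "INC HL", "LD (HL),D")

    def size(self):
        return sum(SIZES[pattern] for pattern, _ in self.code)

    def listing(self):
        # Substitute the operand for the "nn" placeholder where there is one.
        return [p if op is None else p.replace("nn", str(op))
                for p, op in self.code]


# I := 4 + J - K, driven by the same sequence as the stack code.
g = NaiveZ80Gen()
g.stack_address("I")
g.stack_constant(4)
g.stack_address("J")
g.dereference()
g.add()
g.stack_address("K")
g.dereference()
g.subtract()
g.assign()
print(g.size())  # 41
```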
For the example under discussion we have generated 41 bytes, which is still quite a long way from the optimal 18 given before. However, little effort would be required to reduce this to 32 bytes. It is easy to see that 8 bytes could simply be removed (the ones marked with a single asterisk), since the operations of pushing a register pair at the end of one code generating sequence and of popping the same pair at the start of the next are clearly redundant. Another byte could be removed by replacing the two marked with a double asterisk by a one-byte opcode for exchanging the DE and HL pairs (the Z80 code EX DE,HL does this). These are examples of so-called peephole optimization, and are quite easily incorporated into the code generating routines we are contemplating. For example, the algorithm for assign could be
     PROCEDURE Assign;
     (* Generate code to store top-of-stack on address stored next-to-top *)
       BEGIN
         IF last code generated was PUSH HL THEN
           replace this PUSH HL with EX DE,HL
         ELSIF last code generated was PUSH DE THEN
           delete PUSH DE
         ELSE
           generate code for POP DE
         END;
         generate code for POP HL;
         generate code for LD (HL),E;
         generate code for INC HL;
         generate code for LD (HL),D
       END;
The reader might like to reflect on the kinds of assignment statements which would give rise to the three possible paths through this routine.
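The Assign routine inspects only the instruction generated immediately before it. The Python sketch below assumes the same check is applied whenever any POP is emitted. That is a generalization of the text's routine, not the text's own code, but it recovers the 32-byte figure for the running example. Replacing PUSH HL; POP DE by EX DE,HL leaves the old DE value in HL rather than the pushed value. This is only safe here because, in these hypothetical routines, every POP DE is immediately followed by a POP HL that reloads HL. A peephole optimizer for a real compiler would also have to avoid merging instructions across a label that could be the target of a jump.

```python
# Encoded lengths in bytes (standard Z80 encodings; "nn" is a 16-bit operand).
SIZES = {
    "LD HL,nn": 3, "LD DE,nn": 3,
    "PUSH HL": 1, "PUSH DE": 1, "POP HL": 1, "POP DE": 1,
    "LD E,(HL)": 1, "LD D,(HL)": 1, "INC HL": 1,
    "LD (HL),E": 1, "LD (HL),D": 1,
    "ADD HL,DE": 1, "OR A": 1, "SBC HL,DE": 2, "EX DE,HL": 1,
}


class PeepholeZ80Gen:
    """Hypothetical generator that optimizes each POP against the last emit."""

    def __init__(self):
        self.code = []  # list of (pattern, operand) pairs

    def emit(self, pattern, operand=None):
        last = self.code[-1][0] if self.code else None
        if last and pattern.startswith("POP "):
            if last == "PUSH " + pattern[4:]:
                # PUSH r immediately followed by POP r: drop both.
                self.code.pop()
                return
            if pattern == "POP DE" and last == "PUSH HL":
                # PUSH HL; POP DE becomes the one-byte EX DE,HL.
                self.code[-1] = ("EX DE,HL", None)
                return
        self.code.append((pattern, operand))

    def emit_all(self, *patterns):
        for pattern in patterns:
            self.emit(pattern)

    def stack_address(self, name):  # ADR name
        self.emit("LD HL,nn", name)
        self.emit("PUSH HL")

    def stack_constant(self, num):  # LIT num
        self.emit("LD DE,nn", num)
        self.emit("PUSH DE")

    def dereference(self):          # VAL
        self.emit_all("POP HL", "LD E,(HL)", "INC HL", "LD D,(HL)", "PUSH DE")

    def add(self):                  # ADD
        self.emit_all("POP DE", "POP HL", "ADD HL,DE", "PUSH HL")

    def subtract(self):             # SUB
        self.emit_all("POP DE", "POP HL", "OR A", "SBC HL,DE", "PUSH HL")

    def assign(self):               # STO, with the three paths of Assign
        self.emit_all("POP DE", "POP HL", "LD (HL),E", "INC HL", "LD (HL),D")

    def size(self):
        return sum(SIZES[pattern] for pattern, _ in self.code)


g = PeepholeZ80Gen()
g.stack_address("I")
g.stack_constant(4)
g.stack_address("J")
g.dereference()
g.add()
g.stack_address("K")
g.dereference()
g.subtract()
g.assign()
print(g.size())  # 32
```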
By now, hopefully, it will have dawned on the reader that generation of native code is probably strongly influenced by the desire to make it compact and efficient, and that achieving this objective will require the compiler writer to be highly conversant with details of the target machine, and with the possibilities afforded by its instruction set. We could pursue the ideas we have just introduced, but will refrain from doing so, concentrating instead on how one might come up with a better structure from which to generate code.
15.3.2 Abstract syntax trees as intermediate representations