Basic Code Generation

We will create a simple language with assignments and loops. It will have a stack-based instruction set, and we will target machines with both an unbounded and a fixed number of registers. We will define an AST as:

```haskell
data Stat = Assign Name Exp
          | Seq Stat Stat
          | ForLoop Name Exp Exp Stat

data Exp = Binop Op Exp Exp
         | Unop Op Exp
         | Ident Name
         | Const Int

data Op = Plus | Minus | Times | Divide

type Name = String
```
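As a concrete example, a program such as `x := 0; for i = 1 to 10 do x := x + i` would be represented with these constructors (using `Ident` for variable references, as in the translators below):

```haskell
type Name = String

data Stat = Assign Name Exp
          | Seq Stat Stat
          | ForLoop Name Exp Exp Stat

data Exp = Binop Op Exp Exp
         | Unop Op Exp
         | Ident Name
         | Const Int

data Op = Plus | Minus | Times | Divide

-- x := 0; for i = 1 to 10 do x := x + i
example :: Stat
example = Seq (Assign "x" (Const 0))
              (ForLoop "i" (Const 1) (Const 10)
                       (Assign "x" (Binop Plus (Ident "x") (Ident "i"))))
```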

Our instruction set is defined as:

```haskell
data Instruction = Add | Sub | Mul | Div | Neg
                 | PushImm Int
                 | PushAbs Name
                 | Pop Name
                 | CompEq
                 | CompGt
                 | Jmp Label
                 | JTrue Label
                 | JFalse Label
                 | Define Label

type Label = String
```

1. Naive Code Generator

Now, we present a syntax-directed code generator:

```haskell
transStat :: Stat -> [Instruction]

-- transExp will leave the value at the top of the stack.
-- Pop id will then pop that value and store it in the variable.
transStat (Assign id e) = transExp e ++ [Pop id]

transStat (Seq s1 s2) = transStat s1 ++ transStat s2

-- label1 and label2 are assumed to be fresh labels for this loop.
transStat (ForLoop id e1 e2 body) =
  -- Initialize the loop variable
  transExp e1 ++ [Pop id] ++
  -- Define the loop start
  [Define label1] ++
  -- Check the loop condition
  transExp e2 ++ [PushAbs id, CompGt, JTrue label2] ++
  -- Execute the loop body
  transStat body ++
  -- Increment the loop variable
  [PushAbs id, PushImm 1, Add, Pop id] ++
  -- Jump back to the loop start
  [Jmp label1, Define label2]
```

We can now complete our simple code generator with the remaining translators:

```haskell
transExp :: Exp -> [Instruction]
transExp (Binop op e1 e2) = transExp e1 ++ transExp e2 ++ transOp op
transExp (Unop op e)      = transExp e ++ transUnop op
transExp (Ident id)       = [PushAbs id]
transExp (Const n)        = [PushImm n]

transOp :: Op -> [Instruction]
transOp Plus   = [Add]
transOp Minus  = [Sub]
transOp Times  = [Mul]
transOp Divide = [Div]

transUnop :: Op -> [Instruction]
transUnop Minus = [Neg]
```
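As a sanity check, here is a self-contained sketch of the stack translator (the types are redeclared locally and `Show`/`Eq` are derived so the output can be inspected), translating `x := 1 + 2 * 3`:

```haskell
type Name = String

data Exp = Binop Op Exp Exp | Ident Name | Const Int
data Op  = Plus | Minus | Times | Divide

data Instruction = Add | Sub | Mul | Div
                 | PushImm Int
                 | PushAbs Name
                 | Pop Name
                 deriving (Show, Eq)

transExp :: Exp -> [Instruction]
transExp (Binop op e1 e2) = transExp e1 ++ transExp e2 ++ transOp op
transExp (Ident id)       = [PushAbs id]
transExp (Const n)        = [PushImm n]

transOp :: Op -> [Instruction]
transOp Plus   = [Add]
transOp Minus  = [Sub]
transOp Times  = [Mul]
transOp Divide = [Div]

-- x := 1 + 2 * 3 compiles to postfix-style stack code.
main :: IO ()
main = print (transExp (Binop Plus (Const 1)
                                   (Binop Times (Const 2) (Const 3)))
              ++ [Pop "x"])
```

The generated sequence is exactly the postfix form of the expression followed by the store.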

2. Using Registers

Let us now expand the instruction set to include registers:

```haskell
data Instruction = Add Reg Reg
                 | Sub Reg Reg
                 | Mul Reg Reg
                 | Div Reg Reg
                 | Load Reg Name
                 | LoadImm Reg Int
                 | Store Reg Name
                 | Push Reg
                 | Pop Reg
                 | CompEq Reg Reg
                 | JTrue Reg Label
                 | JFalse Reg Label
                 | Define Label

type Reg = Int
```

Now, we can define a new translator for expressions. When an expression's result is destined for register r, its subexpressions may only use registers r and above; once r holds a live intermediate value, the remaining subexpressions may only use registers r + 1 and above, preserving the registers already in use:

```haskell
transExp :: Exp -> Reg -> [Instruction]
transExp (Const n)  r = [LoadImm r n]
transExp (Ident id) r = [Load r id]
transExp (Binop op e1 e2) r
  = transExp e1 r ++ transExp e2 (r + 1) ++ transBinop op r (r + 1)

transBinop :: Op -> Reg -> Reg -> [Instruction]
transBinop Plus   r1 r2 = [Add r1 r2]
transBinop Minus  r1 r2 = [Sub r1 r2]
transBinop Times  r1 r2 = [Mul r1 r2]
transBinop Divide r1 r2 = [Div r1 r2]
```
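To see the register discipline in action, here is a runnable sketch (treating `Reg` as a plain `Int`): translating `(a + b) + (c + d)` left to right uses registers 0, 1 and 2, with register 1 reused after the left operand is folded into register 0.

```haskell
type Name = String
type Reg  = Int

data Exp = Binop Op Exp Exp | Ident Name | Const Int
data Op  = Plus | Minus | Times | Divide

data Instruction = Add Reg Reg | Sub Reg Reg | Mul Reg Reg | Div Reg Reg
                 | Load Reg Name | LoadImm Reg Int
                 deriving (Show, Eq)

transExp :: Exp -> Reg -> [Instruction]
transExp (Const n)  r = [LoadImm r n]
transExp (Ident id) r = [Load r id]
transExp (Binop op e1 e2) r
  = transExp e1 r ++ transExp e2 (r + 1) ++ transBinop op r (r + 1)

transBinop :: Op -> Reg -> Reg -> [Instruction]
transBinop Plus   r1 r2 = [Add r1 r2]
transBinop Minus  r1 r2 = [Sub r1 r2]
transBinop Times  r1 r2 = [Mul r1 r2]
transBinop Divide r1 r2 = [Div r1 r2]

-- (a + b) + (c + d) needs three registers under this left-to-right scheme.
main :: IO ()
main = mapM_ print
  (transExp (Binop Plus (Binop Plus (Ident "a") (Ident "b"))
                        (Binop Plus (Ident "c") (Ident "d"))) 0)
```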

2.1 Immediate Operands

Instead of loading the constant into a register, we can use an instruction such as addl $5, %eax to add an immediate value to a register. We can modify our translator to use this instruction:

```haskell
data Instruction = ...
                 | AddImm Reg Int
                 | SubImm Reg Int
                 | MulImm Reg Int

transExp :: Exp -> Reg -> [Instruction]
transExp (Binop op e1 (Const n)) r
  = transExp e1 r ++ transBinopImm op r n
transExp (Binop op (Const n) e2) r
  | commutative op = transExp e2 r ++ transBinopImm op r n

transBinopImm :: Op -> Reg -> Int -> [Instruction]
transBinopImm Plus  r n = [AddImm r n]
transBinopImm Minus r n = [SubImm r n]
transBinopImm Times r n = [MulImm r n]
```
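A small demonstration of the special case (assuming a simple `commutative` predicate, which the guard above relies on but whose definition is our own): `5 + x` needs only one register, because the constant folds into an `AddImm` thanks to commutativity.

```haskell
type Name = String
type Reg  = Int

data Exp = Binop Op Exp Exp | Ident Name | Const Int
data Op  = Plus | Minus | Times | Divide

data Instruction = Add Reg Reg | Load Reg Name | LoadImm Reg Int
                 | AddImm Reg Int | SubImm Reg Int | MulImm Reg Int
                 deriving (Show, Eq)

commutative :: Op -> Bool
commutative Plus  = True
commutative Times = True
commutative _     = False

-- The general Binop clauses from earlier are omitted in this sketch.
transExp :: Exp -> Reg -> [Instruction]
transExp (Binop op e1 (Const n)) r = transExp e1 r ++ transBinopImm op r n
transExp (Binop op (Const n) e2) r
  | commutative op = transExp e2 r ++ transBinopImm op r n
transExp (Const n)  r = [LoadImm r n]
transExp (Ident id) r = [Load r id]

transBinopImm :: Op -> Reg -> Int -> [Instruction]
transBinopImm Plus  r n = [AddImm r n]
transBinopImm Minus r n = [SubImm r n]
transBinopImm Times r n = [MulImm r n]

-- 5 + x: the constant folds into AddImm because + is commutative.
main :: IO ()
main = print (transExp (Binop Plus (Const 5) (Ident "x")) 0)
```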

2.2 Limited Registers

What if we had an accumulator machine, with just one register (the accumulator) and a stack for intermediate values? Each arithmetic instruction would pop a value off the stack and combine it with the accumulator.

```haskell
data Instruction = Add | Sub | Mul | Div
                 | AddImm Int
                 | Push        -- Push the current acc onto the stack
                 | Pop         -- Pop the stack into the acc
                 | Load Name   -- Set the acc to the value of the variable
                 | LoadImm Int
                 | Store Name  -- Store the acc in the variable
                 | CompEq
                 | Jump Label
                 | JTrue Label
                 | JFalse Label
                 | Define Label

transExp :: Exp -> [Instruction]
transExp (Const n)  = [LoadImm n]
transExp (Ident id) = [Load id]
transExp (Binop op e1 e2)
  = transExp e2 ++ [Push] ++ transExp e1 ++ transBinop op

transBinop :: Op -> [Instruction]
transBinop Plus   = [Add]
transBinop Minus  = [Sub]
transBinop Times  = [Mul]
transBinop Divide = [Div]
```
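Note that e2 is evaluated and pushed first, so that e1 ends up in the accumulator; this gives non-commutative operations their operands in the right order. A runnable sketch for `a - b`:

```haskell
type Name = String

data Exp = Binop Op Exp Exp | Ident Name | Const Int
data Op  = Plus | Minus | Times | Divide

data Instruction = Add | Sub | Mul | Div
                 | Push | Pop
                 | Load Name | LoadImm Int | Store Name
                 deriving (Show, Eq)

transExp :: Exp -> [Instruction]
transExp (Const n)  = [LoadImm n]
transExp (Ident id) = [Load id]
transExp (Binop op e1 e2)
  = transExp e2 ++ [Push] ++ transExp e1 ++ transBinop op

transBinop :: Op -> [Instruction]
transBinop Plus   = [Add]
transBinop Minus  = [Sub]
transBinop Times  = [Mul]
transBinop Divide = [Div]

-- a - b: b is pushed first, so Sub finds a in the acc and pops b.
main :: IO ()
main = print (transExp (Binop Minus (Ident "a") (Ident "b")))
```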

We can use this strategy as a fallback: when only one register remains, we fall back to using it as an accumulator.

```haskell
transExp :: Exp -> Reg -> [Instruction]
transExp (Const n)  r = [LoadImm r n]
transExp (Ident id) r = [Load r id]
transExp (Binop op e1 e2) r
  | r == maxReg
  = transExp e2 r ++ [Push r] ++ transExp e1 r ++ transBinopStack op r
  | otherwise
  = transExp e1 r ++ transExp e2 (r + 1) ++ transBinop op r (r + 1)

-- AddStack r / SubStack r pop the stack and combine the value with register r.
transBinopStack :: Op -> Reg -> [Instruction]
transBinopStack Plus  r = [AddStack r]
transBinopStack Minus r = [SubStack r]

transBinop :: Op -> Reg -> Reg -> [Instruction]
transBinop Plus  r1 r2 = [Add r1 r2]
transBinop Minus r1 r2 = [Sub r1 r2]
```

2.3 Sethi-Ullman Algorithm

We now consider an algorithm which minimises the number of registers needed, and avoids spilling intermediate values into the main store whenever possible. The key idea is that the order of evaluation matters. This is the subexpression ordering principle:

Given an expression e1 op e2, we should first evaluate whichever subexpression requires more registers. This is because, during the evaluation of the second subexpression, one register is tied up holding the result of the first.

For example, consider an application e1 op e2 where e1 requires n1 registers and e2 requires n2 registers. If e1 is evaluated first, the maximum number of registers in use at once is max(n1, n2 + 1). If e2 is evaluated first, it is max(n1 + 1, n2). We choose the order that minimises that value.

What if we precomputed, for each node of the AST, the number of registers required to evaluate that node? We can then use this information to choose the order of evaluation.

```haskell
weight :: Exp -> Int
weight (Const n)  = 1
weight (Ident id) = 1
weight (Binop op e1 e2) = min c1 c2
  where c1 = max (weight e1) (weight e2 + 1)
        c2 = max (weight e1 + 1) (weight e2)
```
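Running the weight function on two shapes of the same four-leaf sum illustrates the point: a left-leaning chain needs only two registers, while a balanced tree needs three.

```haskell
type Name = String

data Exp = Binop Op Exp Exp | Ident Name | Const Int
data Op  = Plus | Minus | Times | Divide

weight :: Exp -> Int
weight (Const _)  = 1
weight (Ident _)  = 1
weight (Binop _ e1 e2) = min c1 c2
  where c1 = max (weight e1) (weight e2 + 1)
        c2 = max (weight e1 + 1) (weight e2)

main :: IO ()
main = do
  -- ((a + b) + c) + d: a left-leaning chain needs only 2 registers...
  print (weight (Binop Plus (Binop Plus (Binop Plus (Ident "a") (Ident "b"))
                                        (Ident "c"))
                            (Ident "d")))
  -- ...while the balanced (a + b) + (c + d) needs 3.
  print (weight (Binop Plus (Binop Plus (Ident "a") (Ident "b"))
                            (Binop Plus (Ident "c") (Ident "d"))))
```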

Now, we can modify our code generator accordingly:

```haskell
transExp (Binop op e1 e2) r =
  if weight e1 > weight e2
  then transExp e1 r ++ transExp e2 (r + 1) ++ transBinop op r (r + 1)
  else transExp e2 r ++ transExp e1 (r + 1) ++ transBinop op r (r + 1)
```

What about non-commutative operations? Let's modify transExp to tell it which registers it is allowed to use. It should leave the result in the first register in the list:

```haskell
transExp :: Exp -> [Reg] -> [Instruction]
transExp (Const n)  (r:_) = [LoadImm r n]
transExp (Ident id) (r:_) = [Load r id]
transExp (Binop op e1 e2) (r:r':rs) =
  if weight e1 > weight e2
  then transExp e1 (r:r':rs) ++ transExp e2 (r':rs) ++ transBinop op r r'
  else transExp e2 (r':r:rs) ++ transExp e1 (r:rs)  ++ transBinop op r r'
```
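A runnable sketch shows this working for `a - (b + c)`: the heavier right operand is evaluated first (into the second register of the list), yet the subtraction still receives its operands in the right order.

```haskell
type Name = String
type Reg  = Int

data Exp = Binop Op Exp Exp | Ident Name | Const Int
data Op  = Plus | Minus | Times | Divide

data Instruction = Add Reg Reg | Sub Reg Reg | Mul Reg Reg | Div Reg Reg
                 | Load Reg Name | LoadImm Reg Int
                 deriving (Show, Eq)

weight :: Exp -> Int
weight (Const _) = 1
weight (Ident _) = 1
weight (Binop _ e1 e2) = min (max (weight e1) (weight e2 + 1))
                             (max (weight e1 + 1) (weight e2))

transExp :: Exp -> [Reg] -> [Instruction]
transExp (Const n)  (r:_) = [LoadImm r n]
transExp (Ident id) (r:_) = [Load r id]
transExp (Binop op e1 e2) (r:r':rs) =
  if weight e1 > weight e2
  then transExp e1 (r:r':rs) ++ transExp e2 (r':rs) ++ transBinop op r r'
  else transExp e2 (r':r:rs) ++ transExp e1 (r:rs)  ++ transBinop op r r'

transBinop :: Op -> Reg -> Reg -> [Instruction]
transBinop Plus   r1 r2 = [Add r1 r2]
transBinop Minus  r1 r2 = [Sub r1 r2]
transBinop Times  r1 r2 = [Mul r1 r2]
transBinop Divide r1 r2 = [Div r1 r2]

-- a - (b + c): b + c is computed into register 1, a into register 0,
-- and the result of Sub 0 1 lands in register 0, as required.
main :: IO ()
main = mapM_ print
  (transExp (Binop Minus (Ident "a")
                         (Binop Plus (Ident "b") (Ident "c"))) [0, 1, 2])
```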

Can we modify our weight function to take immediate operands into account? We need to balance using the minimum number of registers against selecting the minimum number of instructions.

2.4 Graph Colouring

The Sethi-Ullman algorithm has no context about the surrounding variables and expressions, so it cannot, for instance, keep the same variable in the same register across statements. We can build a more optimal algorithm.

Register allocation can have a very big impact on the performance of a program. For example:

```c
int x = z + a * b;
int y = z + a * b;
```

Can be rewritten as:

```c
int t = a * b;
int x = z + t;
int y = z + t;
```

We can build a smarter register allocator with graph colouring.

  1. Use a tree-walking translator to generate intermediate code in which temporary values are always saved in named locations.
  2. Construct an interference graph: nodes are temporary locations, linked by an arc if the values must be stored simultaneously (i.e. their live ranges overlap).
  3. Try to colour the nodes, with one colour per register, so that no connected nodes share a colour.

Although optimal colouring is slow in general, a fast heuristic is easy to obtain.

If the attempt to colour the graph using the available registers fails, we must spill a register (i.e. choose an arc in the graph and break it, by splitting a variable's live range in two). We do this by storing the variable into memory. To choose values to spill efficiently, good candidates are temporaries with many interferences, few uses, and no uses inside inner loops.
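The colouring step above can be sketched with a simple greedy heuristic. This is a minimal illustration, not a production allocator: the graph representation, the `colour` function, and the node names are all our own choices, and spilling is only signalled, not performed.

```haskell
import Data.Maybe (mapMaybe)

type Node   = String
type Colour = Int

-- Interference graph as an adjacency list.
type Graph = [(Node, [Node])]

-- Greedily assign each node the lowest colour not used by its
-- already-coloured neighbours; Nothing if we run out of registers.
colour :: Int -> Graph -> Maybe [(Node, Colour)]
colour k = foldl step (Just [])
  where
    step Nothing _ = Nothing
    step (Just done) (n, neighbours) =
      let taken = mapMaybe (`lookup` done) neighbours
          free  = [c | c <- [0 .. k - 1], c `notElem` taken]
      in case free of
           (c:_) -> Just ((n, c) : done)
           []    -> Nothing  -- no free colour: we would need to spill

main :: IO ()
main = do
  -- A chain t1 - t2 - t3 is 2-colourable: t1 and t3 can share a register.
  print (colour 2 [("t1", ["t2"]), ("t2", ["t1", "t3"]), ("t3", ["t2"])])
  -- With only one register, the same graph forces a spill.
  print (colour 1 [("t1", ["t2"]), ("t2", ["t1", "t3"]), ("t3", ["t2"])])
```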

3. Function Calls

A function may be called at a point where registers are in use. This means that changing the order of evaluation may change the result, due to side effects. Register targeting can be used to make sure arguments are computed in the right registers.

A function may be called from different places, but it must always return to the right place: the address of the next instruction must be saved and restored. We must also ensure the function doesn't clobber any registers which contain intermediate values.

In IA32, some registers are caller saved, and some are callee saved. This ensures that our program will work with other people's libraries.
