Optimizations
- recall the 5 phases of a compiler: lexer, parser, semantic analysis (type checking, operational semantics), optimization, translation to target machine code (ASM)
- in modern compilers => most of the work happens in the optimization phase
Intermediate Representation (IR)
- Provides an intermediate level of abstraction
- IR has more details than the source code
- Optimizations happen on the IR
- has fewer details than the target (machine or assembly code) ...
Three-Address IR
- every instruction has one of the forms
    x := y op z   # binary, e.g., x := y + z
    x := op y     # unary,  e.g., y := -y
- y, z are registers or constants ...
- a compound expression such as x + y * z is translated to:
    t1 := y * z
    t2 := x + t1
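The translation above can be sketched with a small recursive function. This is a minimal sketch, assuming a tuple-based expression tree and a `new_temp` helper; neither comes from the notes.

```python
temp_count = 0

def new_temp():
    """Return a fresh temporary register name (t1, t2, ...)."""
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def translate(expr, code):
    """Flatten an expression into three-address instructions.

    expr is either a string (a register or constant) or a tuple
    (op, left, right). Appends instructions to `code` and returns
    the name that holds the result.
    """
    if isinstance(expr, str):
        return expr                      # already a register or constant
    op, left, right = expr
    l = translate(left, code)
    r = translate(right, code)
    t = new_temp()
    code.append(f"{t} := {l} {op} {r}")  # one operator per instruction
    return t

code = []
translate(("+", "x", ("*", "y", "z")), code)   # x + y * z
print(code)   # ['t1 := y * z', 't2 := x + t1']
```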
Optimization Overview
- Largest (most complicated) phase of the compiler
Where to perform optimization
- On the AST:
    - Pro: machine independent
    - Con: too high level (cannot do much optimization)
- On ASM:
    - Pro: low level, exposes many details and optimization opportunities
    - Con: machine dependent; must reimplement the optimizations when switching to a different target
- On IR:
    - Pro: machine independent
    - Pro: low level enough to expose optimization opportunities
3-address code
P -> S P | S
S -> id := id op id          # op is +, -, *, ...
   | id := op id
   | id := id
   | push id
   | id := pop
   | if id relop id goto L   # relop is <, =, >, ...
   | L:
   | jump L
- A basic block is a maximal sequence of instructions with
    - no labels (except in the first instruction)
    - no jumps (except in the last instruction)
        L: ....
           ....
           ....
           jump M
    - cannot jump into the middle of a block (only to the beginning)
    - cannot jump out of the middle of a block (only from the end)
- Consider this basic block
    (1) L:
    (2) t := 2 * x
    (3) w := t + x
    (4) if w > 0 goto L'
- Because (3) executes AFTER (2), we can
    - change (3) to
        w := 3 * x
    - remove (2) (assuming t is not used anywhere else)
- A Control-Flow Graph (CFG) is a directed graph
    - basic blocks are the nodes
    - there is an edge from a block A to a block B if execution can pass from the last instruction in A to the first instruction in B
        - e.g., if the last instruction of A is jump Lb, there is an edge from A to the block labeled Lb
- We can represent the body of a method (or function, or procedure) as a CFG
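Splitting a stream of three-address instructions into basic blocks can be sketched as follows; the string-based instruction format (labels end with `:`, branches start with `jump` or `if`) is an assumption, not from the notes.

```python
def split_blocks(instrs):
    """A block starts at a label and ends at a jump/branch (or before a label)."""
    blocks, current = [], []
    for ins in instrs:
        if ins.endswith(":") and current:    # a label starts a new block
            blocks.append(current)
            current = []
        current.append(ins)
        if ins.startswith(("jump", "if")):   # a jump/branch ends the block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

prog = ["L:", "t := 2 * x", "w := t + x", "if w > 0 goto L", "y := 0"]
print(split_blocks(prog))
# [['L:', 't := 2 * x', 'w := t + x', 'if w > 0 goto L'], ['y := 0']]
```

Connecting these blocks with edges (e.g., from each `jump L` to the block labeled `L`) then yields the CFG.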
Goals of optimization
- Minimize execution time (most common)
- Minimize code size (e.g., embedded systems)
- Minimize disk operations (e.g., databases)
- Minimize power consumption (e.g., sensors, smart phones, watches)
- Important: need to preserve the semantics of the program
    - whatever results we get from the original program, we must get the same results from the optimized version
3 granularity levels of optimizations
- Local optimizations: apply an optimization to a basic block in isolation
- Global optimizations: apply an optimization to a whole CFG (one function body) in isolation
- Inter-procedural optimizations: apply an optimization to the entire program (consisting of multiple methods/functions)
- Most compilers do (1: local), many do (2: global), very few do (3)
- In practice, people DO NOT use the fanciest/most optimized algorithms
    - they have low pay-offs
    - they are too hard/complex to implement (which can jeopardize correctness preservation)
    - their analyses are too expensive at compilation time
    - goal: maximum benefit for minimum cost
Local Optimization (optimizations applied within a basic block)
Algebraic Simplifications
- can delete some statements
    x := x + 0   (or 0 + x)
    x := x * 1   (or 1 * x)
- can simplify some statements
    x := x * 0   =>  x := 0
    x := y ** 2  =>  x := y * y     # ** usually calls a library routine (expensive; typically a loop to do exponentiation)
    x := x * 8   =>  x := x << 3    # on some machines << is faster than * (but not all, especially modern ones)
    x := x * 15  =>  t := x << 4; x := t - x
Constant Folding
- operations on constants can be computed at compile time
    - if there is a statement x := y op z, and y and z are constants, then y op z can be computed at compile time
- Example:
    x := 2 + 3      =>  x := 5
    if 2 > 0: code  =>  if true: code   =>  code
    if 2 < 0: code  =>  if false: code  =>  code never executes (can be removed)
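A minimal constant-folding sketch for the `x := y op z` case: the `(op, dest, y, z)` instruction shape is an assumption, and it is restricted to integer literals (which sidesteps the floating-point portability problem noted below).

```python
OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def is_const(v):
    """True for integer literals like '3' or '-7'."""
    return v.lstrip("-").isdigit()

def fold(op, dest, y, z):
    """Return a folded assignment when both operands are constants."""
    if op in OPS and is_const(y) and is_const(z):
        return f"{dest} := {OPS[op](int(y), int(z))}"
    return f"{dest} := {y} {op} {z}"     # not foldable: leave as-is

print(fold("+", "x", "2", "3"))   # x := 5
print(fold("*", "x", "y", "3"))   # x := y * 3
```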
- Constant folding can be dangerous (gives different results)
    - compile the program on machine X
    - run the compiled program on machine Y
    - X and Y might have different architectures
        a := 1.5 + 3.7  =>  a := 5.2     on X
        a := 1.5 + 3.7  =>  a := 5.1999  on Y
Unreachable code: examples
- debug macros
    #define DEBUG 0
    if (DEBUG) then
        ....
        ....
- libraries (not everything in a library is used)
Single Assignment form
- each register (id) occurs only ONCE on the left-hand side of an assignment
    x := z + y          b := z + y
    a := x        =>    a := b
    x := 2 * x          x := 2 * b
- converting to SA form can be tricky in many code regions (e.g., within loops)
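For a straight-line block, the renaming can be sketched as below. The `(dest, rhs_tokens)` instruction format and the numbered-version naming scheme (`x1`, `x2`, ...) are assumptions; the notes rename to fresh letters instead.

```python
def to_single_assignment(block):
    """Rename each redefinition of a variable to a fresh name."""
    current = {}          # variable -> its current (latest) name
    fresh = {}            # variable -> how many extra versions created
    out = []
    for dest, rhs in block:
        # uses refer to the latest version of each operand
        rhs = [current.get(v, v) for v in rhs]
        if dest in current:              # redefinition: invent a new name
            fresh[dest] = fresh.get(dest, 0) + 1
            new_name = f"{dest}{fresh[dest]}"
        else:
            new_name = dest
        current[dest] = new_name
        out.append((new_name, rhs))
    return out

block = [("x", ["z", "+", "y"]),
         ("a", ["x"]),
         ("x", ["2", "*", "x"])]
print(to_single_assignment(block))
# [('x', ['z', '+', 'y']), ('a', ['x']), ('x1', ['2', '*', 'x'])]
```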
Optimizations on SA blocks
Common Subexpression Elimination
- if a basic block is in SA form
- then a definition x := ... is the first use of x in the block
- so when 2 assignments have the same rhs, they compute the same value
    x := y + z          x := y + z
    ...           =>    ...
    w := y + z          w := x
Copy Propagation
b := z + y => b := z + y
a := b => a := b
x := 2 * a => x := 2 * b
- only useful for enabling other optimizations
- eliminate dead code
- constant folding ...
- Example
a := 5              a := 5
x := 2 * a   ===>   x := 10
y := x + 6          y := 16
t := x * y          t := 160
Dead code elimination
- if w := rhs appears in a basic block and w does not appear anywhere else, then w := rhs is dead and can be removed
Summary for local optimization
- each local optimization does little by itself
- but they interact (performing one optimization enables another)
- compiler: repeat optimizations until no further improvement is possible
    - but usually compilers have heuristics to determine when to stop
- Example
# initial
a := x ** 2
b := 3
c := x
d := c * c
e := b * 2
f := a + d
g := e * f
# final version
a := x * x
g := 12 * a
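The repeated propagate-and-fold process in the examples above can be sketched as a single pass over a block; the `(dest, rhs_tokens)` representation and the use of `eval` for folding are sketch shortcuts, not from the notes.

```python
def optimize(block):
    """One pass of constant propagation + constant folding."""
    consts = {}           # variable -> known constant value (as a string)
    out = []
    for dest, rhs in block:
        # constant propagation: substitute known constant values
        rhs = [consts.get(t, t) for t in rhs]
        # constant folding: evaluate the rhs if it is now all constants
        if all(t.lstrip("-").isdigit() or t in "+-*" for t in rhs):
            rhs = [str(eval(" ".join(rhs)))]     # e.g. 2 * 5 -> 10
        if len(rhs) == 1 and rhs[0].lstrip("-").isdigit():
            consts[dest] = rhs[0]                # remember dest == const
        else:
            consts.pop(dest, None)               # dest no longer constant
        out.append((dest, rhs))
    return out

block = [("a", ["5"]), ("x", ["2", "*", "a"]),
         ("y", ["x", "+", "6"]), ("t", ["x", "*", "y"])]
print(optimize(block))
# [('a', ['5']), ('x', ['10']), ('y', ['16']), ('t', ['160'])]
```

A real compiler would then run dead code elimination to drop any assignments whose results are no longer used.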
Peephole Optimization
- Peephole: is a short sequence of (usually contiguous) instructions
- The compiler replaces that peephole (sequence) with another one that is equivalent (but faster)
i1,...,in -> j1,...,jm
- Peephole is often performed on assembly code
- Examples
    # 1
    a := a + 1    =>   a := a + 3
    a := a + 2
    # 2
    a := b        =>   a := b
    b := a
- Just like local optimizations, peephole optimizations must be applied repeatedly for maximum effect
- "Optimization" is misnamed
- Compiler does not produce an "optimal" version
- it only attempts to improve the code by repeatedly applying various optimization techniques
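A peephole pass over two-instruction windows can be sketched as below; the textual `a := b` instruction format and the single rewrite rule (dropping the redundant copy-back from example #2 above) are assumptions.

```python
def peephole(instrs):
    """One pass over a window of two adjacent instructions."""
    out, i = [], 0
    while i < len(instrs):
        a = instrs[i]
        b = instrs[i + 1] if i + 1 < len(instrs) else None
        # rule: `a := b ; b := a`  ->  keep only `a := b`
        if b and ":=" in a and ":=" in b:
            d1, s1 = [t.strip() for t in a.split(":=")]
            d2, s2 = [t.strip() for t in b.split(":=")]
            if d2 == s1 and s2 == d1:        # the second copy is redundant
                out.append(a)
                i += 2
                continue
        out.append(a)
        i += 1
    return out

print(peephole(["a := b", "b := a", "c := d"]))
# ['a := b', 'c := d']
```

As the notes say, such passes are re-run until no rule matches anymore.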
Global Optimization
    x := 3
    B > 0
    /    \
   /      \
Y := Z + W   Y := 0
   \      /
    \    /
A := 2 * x   # can replace x with 3
- To replace a use of a variable x by a constant k, we need to ensure that
    - on every path to the use of x, the last assignment to x has the form x := k
- dataflow analysis (global)
    - an analysis of the entire control-flow graph
        x := 3
        B > 0
        /    \
    Y := Z + W   Y := 0
    x := 4      /
        \      /
    A := 2 * x   # cannot replace x with anything
- Global optimization tasks (e.g., dataflow analysis) have shared traits
    - to make some optimization at a location X, we need to know the properties that hold at X (the invariant properties at X)
        - this requires knowledge of the entire program
    - it's OK to be conservative: if the compiler doesn't know what is true, it will say it doesn't know
        - it is always safe to say "don't know"
Constant Propagation
- To replace a use of a variable x by a constant k, we need to ensure that
    - on every path to the use of x, the last assignment to x has the form x := k
- The property we are interested in: does x == k hold at some location L?
- 3 values the analysis can give at a location L about the property x == k:
    - BOTTOM -> this location is NOT reachable
    - k -> x == constant k
    - TOP -> no idea what x could be here
- Example
    [x == TOP]
    x := 3
    [x == 3]
    B > 0
    [x == 3]
        /      \
    Y := Z + W    Y := 0
    [x == 3]      [x == 3]
    x := 4
    [x == 4]     /
        \       /
    [x == TOP]
    A := 2 * x
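The merge in the example above (where `x == 4` meets `x == 3` and yields TOP) can be sketched as a join over the three-valued lattice; the sentinel-string encoding of BOTTOM/TOP is an assumption.

```python
BOTTOM, TOP = "BOTTOM", "TOP"   # unreachable / unknown

def join(a, b):
    """Combine the facts about x arriving along two incoming CFG edges."""
    if a == BOTTOM:
        return b            # an unreachable edge contributes nothing
    if b == BOTTOM:
        return a
    if a == b:
        return a            # same constant on both paths
    return TOP              # conflicting constants: we know nothing

print(join(3, 3))        # 3     (both paths say x == 3)
print(join(3, 4))        # TOP   (the x := 4 path conflicts)
print(join(BOTTOM, 3))   # 3     (the dead path is ignored)
```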
Memory Management
- new: allocates space
Garbage Collection
- C and C++ programs have many memory-related bugs
    - double free, use after free, dangling pointers
    - accidentally overwriting part of a data structure ...
    - OpenSSL Heartbleed
    - Apache Optionsbleed
- Memory bugs are REALLY hard to find
    - x = 3
    - y = 3
    - bugs happen in the FUTURE
- Automatic Memory Management
    - dates back to the 1950s ...
    - became mainstream with the popularity of Java (1990s, James Gosling)
Managing Memory
- When an object is created, the runtime environment allocates unused space for the object (new X)
- After a while there will be no more unused space
- Automatic MM attempts to determine which space is UNUSED (garbage) and automatically deletes (frees) it
- How do we tell when an object, or the space that an object points to, will never be used again?
Reachability Algorithms
-
A object X is reachable iff
- something (a variable) points to it
- another reachable object Y contains a pointer to X
-
We can find all reachable objects by starting with all variables and follow their pointers
-
An unreachable object can never be used, i.e., garbage
Mark and Sweep
- 2 phases:
    - 1: mark phase: traces all reachable objects
    - 2: sweep phase: collects garbage objects
- Every object has an extra bit: the mark bit
- Mark phase
    - initially all mark bits are 0
    - start from the root objects (variables), traverse everything each variable can reach (points to)
    - mark those objects as 1
- Sweep phase
    - look at objects with mark bit 0 (garbage)
    - add them to a free list
    - objects with mark bit 1 are reset to 0
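The two phases above can be sketched as follows; the `Obj` class with a `mark` bit and a `fields` pointer list is an illustrative encoding, not from the notes.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.mark = 0        # the extra mark bit, initially 0
        self.fields = []     # pointers to other objects

def mark(roots):
    """Mark phase: set the mark bit on every reachable object."""
    stack = list(roots)
    while stack:
        o = stack.pop()
        if o.mark == 0:
            o.mark = 1                 # reachable
            stack.extend(o.fields)     # follow its pointers

def sweep(heap):
    """Sweep phase: collect unmarked objects, reset the rest."""
    free_list = []
    for o in heap:
        if o.mark == 0:
            free_list.append(o)        # garbage: reclaim
        else:
            o.mark = 0                 # reset for the next collection
    return free_list

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.fields = [b]                             # a -> b; c is unreachable
mark([a])
print([o.name for o in sweep([a, b, c])])  # ['c']
```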