Optimizations

  • recall the 5 phases of a compiler: lexer, parser, semantic analysis (type checking, operational semantics), optimization, translation to target machine code (ASM)
    • in modern compilers, most of the work happens in the optimization phase

Intermediate Representation (IR)

  • Provides an intermediate level of abstraction
  • The IR has more detail than the source code
    • Optimizations happen on the IR
  • but has less detail than the target (machine or assembly) code ...

Three-Address IR

  • every instruction has the form
    x := y op z    # binary , e.g., x := y + z
    x := op y      # unary  , e.g., x := -y
    
    • y,z are registers or constants ...
  • a compound expression like x + y * z is translated to:
    t1 := y * z
    t2 := x + t1
    
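  • A minimal Python sketch (mine, not from the notes) of this flattening: recursively translate the operands, then emit one three-address instruction per operator, inventing temporaries t1, t2, ...
    
    # Hypothetical helper: expressions are ("op", left, right) tuples or atoms.
    counter = 0
    
    def new_temp():
        global counter
        counter += 1
        return f"t{counter}"
    
    def translate(expr, code):
        """Return the register/constant holding expr; append instructions to code."""
        if not isinstance(expr, tuple):          # register or constant: nothing to emit
            return expr
        op, left, right = expr
        l, r = translate(left, code), translate(right, code)
        t = new_temp()
        code.append(f"{t} := {l} {op} {r}")      # x := y op z form
        return t
    
    code = []
    translate(("+", "x", ("*", "y", "z")), code) # x + y * z
    print("\n".join(code))                       # t1 := y * z, then t2 := x + t1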

Optimization Overview

  • Largest (most complicated) phase of compiler
  • Where to perform optimization

    • On AST:
      • Pro: Machine independent
      • Con: Too high level (cannot do much optimization)
    • On ASM:
      • Pro: low level, exposes many details and optimization opportunities
      • Con: machine dependent; must reimplement the optimizations if we switch to a different target
    • On IR:
      • Pro: machine independent
      • Pro: low level enough to expose optimization opportunities
  • 3-address code

    P -> S P | S
    S -> id := id op id    #op are things like +, -, * ...
     |   id := op id
     |   id := id
     |   push id
     |   id := pop
     |   if id relop id goto L   # relop < = > ...
     |   L:
     |   jump L
    

  • A basic block is a maximal sequence of instructions with

    • no label (except in the first instruction)
    • no jump (except in the last instruction)
      L:
      ....
      ....
      ....
      jump M
      
    • cannot jump into the middle of a block (only at the beginning)
    • cannot jump out of the middle of a block (only at the end)
    • Consider this basic block
    (1) L:
    (2) t := 2 * x
    (3) w := t + x
    (4) if w > 0 goto L'
    
    • Because (3) executes AFTER (2), we can
      • change (3) to w := 3 * x
      • remove (2) (assuming t is not used anywhere else)
  • A Control-Flow graph (CFG) is a directed graph

    • basic blocks are nodes
    • there is an edge from a block A to a block B if execution can pass from the last instruction in A to the first instruction in B
    • e.g., the last instruction in A is jump LB, where LB labels the first instruction of B
    • We can represent the body of a method (or function or procedure) as a CFG (see the sketch below)
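  • A minimal Python sketch (assumptions mine: instructions are strings, labels end with ":"): split a list of three-address instructions into basic blocks; CFG edges then go from a block ending in jump L (or falling through) to the block whose label is L
    
    def basic_blocks(instrs):
        blocks, cur = [], []
        for ins in instrs:
            if ins.endswith(":") and cur:       # a label starts a new block
                blocks.append(cur)
                cur = []
            cur.append(ins)
            if ins.startswith(("jump", "if")):  # a jump ends the current block
                blocks.append(cur)
                cur = []
        if cur:
            blocks.append(cur)
        return blocks
    
    prog = ["L:", "t := 2 * x", "w := t + x", "if w > 0 goto L", "y := 0"]
    print(basic_blocks(prog))   # two blocks: [L: ... goto L] and [y := 0]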
  • Goals of optimization

    • Minimize Execution time (most often)
    • Minimize Code size (e.g., embedded system)
    • Minimize Operations to Disks (e.g., Database)
    • Minimize Power Consumption (e.g., sensor, smart phones, watches)
    • Important: Need to preserve the semantics of the program
      • whatever results the original program produces, the optimized version must produce the same results
  • 3 granularity levels of optimizations

    • Local optimizations: Apply optimizations to a basic block in isolation
    • Global optimizations: Apply optimizations to a whole CFG (one function) in isolation
    • Inter-procedural optimizations: Apply optimizations to the entire program (consisting of multiple methods / functions)

    • Most compilers do (1: local), many do (2: global), very few do (3)

  • In practice, people DO NOT use the fanciest/most optimized algorithms

    • They have low pay-offs
    • They are too hard/complex to implement (which might affect correctness preservation)
    • Their analyses are too expensive during compilation time
    • Goal: maximum benefit for minimum cost

Local Optimization (optimizations applied to a basic block)

Algebraic Simplifications

  • can delete some statements
    x := x + 0   (0 + x)
    x := x * 1   (1 * x)
    
  • can simplify some statements
    x := x * 0   =>   x := 0
    x := y ** 2  =>   x := y * y    # avoids a library call (exponentiation is
                                    # expensive, usually implemented with a loop)
    x := x * 8   =>   x := x << 3   # on some machines << is faster than *
    x := x * 15  =>   t := x << 4; x := t - x   # but not on all machines
                                                # (especially modern ones)
    
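  • A minimal Python sketch (mine, not the notes') of these rewrites on one instruction dst := a op b:
    
    def simplify(dst, a, op, b):
        if op == "+" and b == "0": return f"{dst} := {a}"       # x + 0 -> x
        if op == "*" and b == "1": return f"{dst} := {a}"       # x * 1 -> x
        if op == "*" and b == "0": return f"{dst} := 0"         # x * 0 -> 0
        if op == "*" and b == "8": return f"{dst} := {a} << 3"  # strength reduction
        return f"{dst} := {a} {op} {b}"                         # no rule applies
    
    print(simplify("x", "x", "*", "8"))   # x := x << 3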

Constant Folding

  • operations on constants can be computed at compile time

    • if there is a statement : x := y op z
    • and y and z are constants
    • y op z can be computed at compile time
    • Example:
      x := 2 + 3   => x := 5
      
      if 2 > 0: code  => if true: code  => code
      if 2 < 0: code  => if false: code => the code never gets executed
      
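  • A minimal Python sketch (mine) of folding one instruction dst := a op b when both operands are integer constants:
    
    def fold(dst, a, op, b):
        ops = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}
        try:
            return f"{dst} := {ops[op](int(a), int(b))}"   # compute at compile time
        except (ValueError, KeyError):                     # non-constant / unknown op
            return f"{dst} := {a} {op} {b}"
    
    print(fold("x", "2", "+", "3"))   # x := 5
    print(fold("x", "y", "+", "3"))   # x := y + 3  (y is not a constant)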
  • Constant folding can be dangerous (can give different results)

    • compile the program on machine X
    • run the compiled program on machine Y
    • X and Y might have different architectures (e.g., different floating-point behavior)
    • a := 1.5 + 3.7  =>  a := 5.2 on X
    • a := 1.5 + 3.7  =>  a := 5.1999... on Y
    • one fix: keep the expression unevaluated (a = "1.5 + 3.7") and let the target machine compute it
  • Unreachable code examples

    • debug macro
      #define DEBUG 0
      if (DEBUG) then ....
        ....
      
    • libraries (not everything in a library is used)

Single Assignment form

  • each register (id) occurs only ONCE on the left-hand side of an assignment
    x := z + y      =>    b := z + y
    a := x          =>    a := b
    x := 2 * x      =>    x := 2 * b
    
  • converting to SA can be tricky in many code regions (e.g., within loops)
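  • A minimal Python sketch (mine) for a straight-line block: give each reassigned name a fresh version and rewrite later uses accordingly
    
    def to_single_assignment(block):           # block: list of (dst, rhs-token-list)
        version, out = {}, []
        for dst, rhs in block:
            rhs = [f"{t}_{version[t]}" if t in version else t for t in rhs]
            version[dst] = version.get(dst, 0) + 1        # fresh name for dst
            out.append((f"{dst}_{version[dst]}", rhs))
        return out
    
    block = [("x", ["z", "+", "y"]), ("a", ["x"]), ("x", ["2", "*", "x"])]
    for dst, rhs in to_single_assignment(block):
        print(dst, ":=", " ".join(rhs))
    # x_1 := z + y ; a_1 := x_1 ; x_2 := 2 * x_1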

Optimizations on SA blocks

Common Subexpression Elimination

  • if a basic block is in SA form
  • and a definition x := ... is the first use of x in the block
  • then whenever two assignments have the same rhs, they compute the same value
    x := y + z             =>  x:= y + z
    ...                    =>  ...
    w := y + z             =>  w:= x
    
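  • A minimal Python sketch (mine), relying on the block being in SA form so an rhs value never changes after it is computed:
    
    def cse(block):                            # block: list of (dst, rhs-string)
        seen, out = {}, []
        for dst, rhs in block:
            if rhs in seen:
                out.append((dst, seen[rhs]))   # reuse the earlier result: w := x
            else:
                seen[rhs] = dst
                out.append((dst, rhs))
        return out
    
    print(cse([("x", "y + z"), ("w", "y + z")]))   # [('x', 'y + z'), ('w', 'x')]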

Copy Propagation

    b := z + y  =>    b := z + y
    a := b      =>    a := b
    x := 2 * a  =>    x := 2 * b
  • only useful for enabling other optimizations
    • eliminate dead code
    • constant folding ...
  • Example
    a := 5                a := 5
    x := 2 * a     ===>   x := 10
    y := x + 6            y := 16
    t := x * y            t := 160
    
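  • A minimal Python sketch (mine, assumes an SA block so copies stay valid): after a copy a := b, rewrite later uses of a as b
    
    def copy_prop(block):                      # block: list of (dst, rhs-token-list)
        copies, out = {}, []
        for dst, rhs in block:
            rhs = [copies.get(t, t) for t in rhs]    # rewrite uses
            if len(rhs) == 1 and rhs[0].isalpha():
                copies[dst] = rhs[0]                 # remember copy dst := src
            out.append((dst, rhs))
        return out
    
    block = [("b", ["z", "+", "y"]), ("a", ["b"]), ("x", ["2", "*", "a"])]
    for dst, rhs in copy_prop(block):
        print(dst, ":=", " ".join(rhs))        # last line becomes x := 2 * b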

Dead code elimination

  • if w := rhs appears in a basic block and w does not appear anywhere else, then w := rhs is dead and can be removed
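  • A minimal Python sketch (mine): scan the block backwards, keeping only assignments whose target is still needed (live_out is an assumption about what is live after the block)
    
    def dce(block, live_out):                  # block: list of (dst, rhs-string)
        needed, out = set(live_out), []
        for dst, rhs in reversed(block):
            if dst in needed:                  # dst is used later: keep it
                needed.discard(dst)
                needed.update(t for t in rhs.split() if t.isalpha())
                out.append((dst, rhs))
        return list(reversed(out))
    
    print(dce([("t", "a + b"), ("g", "a * 2")], live_out={"g"}))
    # t is dead (never read): only ('g', 'a * 2') survives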

Summary for local optimization

  • each local optimization does little by itself
  • but they interact: performing one optimization can enable another
  • compilers repeat the optimizations until no further improvement is possible

    • but usually compilers have heuristics to determine when to stop (see the fixpoint sketch after the example below)
  • Example

# initial
a := x ** 2
b := 3
c := x
d := c * c
e := b * 2
f := a + d
g := e * f
# final version
a := x * x
g := 12 * a
....
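  • A minimal Python sketch (mine) of that repeat-until-stable loop over optimization passes, where each pass maps a block to an improved block:
    
    def optimize(block, passes):
        while True:
            new = block
            for p in passes:          # e.g., folding, copy propagation, CSE, DCE
                new = p(new)
            if new == block:          # a full round changed nothing: fixpoint
                return new
            block = new
    
    drop_self_copies = lambda b: [(d, r) for d, r in b if r != d]   # toy pass
    print(optimize([("x", "x"), ("y", "1")], [drop_self_copies]))   # [('y', '1')]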

Peephole Optimization

  • A peephole is a short sequence of (usually contiguous) instructions
  • The compiler replaces that sequence with an equivalent but faster one
    • i1,...,in -> j1,...,jm
  • Peephole is often performed on assembly code
  • Examples

    # 1
    a := b           =>  a := b
    b := a
    
    # 2
    a := a + 1       =>  a := a + 3
    a := a + 2
    

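  • A minimal Python sketch (mine) of pattern #1, assuming every instruction is a copy of the form "x := y": drop a copy that just reverses the previous one
    
    def peephole(code):
        out = []
        for ins in code:
            x, y = ins.split(" := ")
            if out and out[-1] == f"{y} := {x}":   # b := a right after a := b
                continue                           # redundant: skip it
            out.append(ins)
        return out
    
    print(peephole(["a := b", "b := a", "c := d"]))   # ['a := b', 'c := d']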
  • Just like local optimization, peephole opt must be applied repeatedly for maximum effect

  • "Optimization" is misnamed
    • the compiler does not produce an "optimal" version of the program
    • it only attempts to improve the code by repeatedly applying various optimization techniques

Global Optimization

                    x:=3
                    B > 0
                   /     \
                  /       \
            Y:= Z + W     Y:=0
                  \      /
                   \    /
                   A := 2 * x   #  can replace x with 3 
  • To replace a use of a variable x by a constant k, we need to ensure that

    • on every path to the use of x, the last assignment to x has the form
      x := k
      
  • dataflow analysis (global)

    • an analysis of the entire control flow graph
                      x:=3
                      B > 0
                     /     \
                    /       \
                Y:= Z + W     Y:=0
                  x:=4      /
                     \     /
                    A := 2 * x   # cannot replace x with anything
    
  • Global optimization tasks (e.g., dataflow analysis) have shared traits

    • to perform an optimization at a location X, we need to know the properties that hold at X (the invariants at X)
    • this requires knowledge of the entire program
    • it's OK to be conservative: if the compiler doesn't know whether a property holds, it says it doesn't know
      • it is always safe to say it doesn't know

Constant Propagation

  • To replace a use of a variable x by a constant k, we need to ensure that

    • on every path to the use of x, the last assignment to x has the form
    x := k
    
  • The property we are interested in: does x hold the constant k at some location L?

  • 3 values that the analysis can give at location L about the property x == k

    • BOTTOM -> this location is NOT reachable
    • k -> x == constant k
    • TOP -> I have no idea what x could be here
  • Example

                    [x == TOP]
                    x:=3        
                    [x == 3]
                    B > 0       
                    [x == 3 ]
                   /     \          
                  /       \
          Y:= Z + W       Y:=0           
           [x==3]          [x ==3] 
            x:=4  
           [x==4]         /
                  \      /
                  [x==TOP]
                  A := 2 * x  
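  • A minimal Python sketch (mine) of the join used at a merge point: combine the facts about x arriving along the incoming edges
    
    TOP, BOTTOM = "TOP", "BOTTOM"
    
    def join(a, b):
        if a == BOTTOM: return b     # an unreachable edge contributes nothing
        if b == BOTTOM: return a
        if a == b:      return a     # the same constant on both paths
        return TOP                   # conflicting constants: value unknown
    
    print(join(3, 3))        # 3   -> safe to replace x with 3
    print(join(3, 4))        # TOP -> cannot replace x (the example above)
    print(join(BOTTOM, 4))   # 4   -> the dead path is ignored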

Memory Management

  • new: allocate space
  • Garbage Collection

  • C and C++ programs have many memory-related bugs

  • double free, use after free, dangling pointers
  • accidentally overwriting part of a data structure ...
  • OpenSSL Heartbleed
  • Apache Optionsbleed

  • Memory bugs are REALLY hard to find

  • e.g., x = 3 might accidentally also set y = 3 (x and y overlap in memory)
  • the symptoms of such bugs show up in the FUTURE, far from the root cause

  • Automatic Memory Management

  • studied since the 1950s (LISP)
  • became mainstream with the popularity of Java (1990s, James Gosling)

  • Managing Memory

  • When an object is created, the runtime environment allocates unused space for it (new X)
  • after a while there will be no more unused space
  • automatic MM attempts to determine which space is UNUSED (garbage) and automatically deletes (frees) it
  • how do we tell when an object (or the space it points to) will never be used again?

  • Reachability Algorithms

  • An object X is reachable iff

    • something (a variable) points to it
    • another reachable object Y contains a pointer to X
  • We can find all reachable objects by starting with all variables and following their pointers

  • An unreachable object can never be used, i.e., it is garbage

  • Mark and Sweep

  • 2 phases:
  • 1: mark phase: traces all reachable objects
  • 2: sweep phase: collects the garbage objects

  • Every object will have an extra bit: the mark bit

  • Mark phase
    • initially all mark bits are 0
    • start from the root objects (variables) and traverse everything they can reach (point to)
    • set the mark bit of each reached object to 1
  • Sweep phase

    • look at objects with mark bit 0 (garbage)
    • add them to a free list
    • reset objects with mark bit 1 back to 0 (ready for the next collection)
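  • A minimal Python sketch (mine) of both phases over a toy heap, where each object is a dict with a mark bit and a list of pointers:
    
    def mark(obj):
        if obj["mark"] == 0:
            obj["mark"] = 1              # mark phase: flag reachable objects
            for child in obj["ptrs"]:
                mark(child)
    
    def sweep(heap, free_list):
        for obj in heap:
            if obj["mark"] == 0:
                free_list.append(obj)    # unmarked => unreachable => garbage
            else:
                obj["mark"] = 0          # reset for the next collection
    
    a = {"mark": 0, "ptrs": []}
    b = {"mark": 0, "ptrs": [a]}         # b points to a
    junk = {"mark": 0, "ptrs": []}       # nothing points to junk
    heap, free_list = [a, b, junk], []
    for root in [b]:                     # roots: objects that variables point to
        mark(root)
    sweep(heap, free_list)
    print(len(free_list))                # 1: only junk is collected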