Intro
Intro to Compilers
- Two main approaches to implement a programming lang: Compilers and Interpreters
- Examples: Python or Ruby interpreters,
- Input: a program (e.g.,
add_2_nums.py
) and data (e.g., integer inputs5
and7
) - Output: results (e.g., shown on the screen,
python add_2_nums.py 5 7
shows12
) - "On-line"
- Examples: Java javac , C compiler gcc, clang, Ocaml compiler
- Input: a program (e.g.,
add_2_nums.c
) - Output: another program (e.g., in executable format
gcc add_2_nums.c -o add_2_nums.exe
), which can then be executed with data to obtain output (e.g.,./add_2_numbers.exe 2 3
shows 5 - "Off-line"
- Compiler has the Code Generation phase that an Interpreter does not
History
-
In the 50's, computer hardware (e.g., IBM's) are expensive (and big), but software is even more expensive and difficult to use.
-
John Backus:
- Speedcoding: early form of an interpreter
- Advantages: much faster to do development, increase productivity
- Disadvantage:
- Slow : 10-20x slower than program written in low level
- Space: ..
- FORTRAN 1
- Formula Translation
- 1954-1957 : time to write compiler for Fortran
- 1958: 50% written Fortran
- Require lots of good engineering skills
Structure of Compiler: 5 main phases
- Lexical Analysis (Syntax)
- Parsing (Syntax)
- Semantic Analysis (Semantics)
- Optimization (makes it runs faster, use less memory, etc)
- Code Generation (not needed for interpreters)
Phase 1: Lexer
- Goal: classify provide text into words or tokens
- Technique: use Regular Expressions and Finite Automata to recognize tokens (e.g., tokens
x
andvar5
are variable names while tokens1.2
and-0.5
are constants (numbers))
Example
if x==y then z=1; else z=2;
: multiple tokens such as if, space, x, ==, y, space, then ..
- Reserve kws: if, then, else, ..
- Constant, digit: 1,2
- Variable names: x, y, z
- Operators: ==, =
- Others: ;, blank
Phase 2: Parser
- Goal: check if obtained tokens form a valid program (i.e., conform to a grammar/structure specific to the language)
- Technique: use a Context Free Grammar and parsing algorithm to build a parse tree
Example
- Matching parenthesis, brackets:
(x*(x)))
would have a parsing error - Statements do not end with semi-colons
- A boolean expression follows
if
,while
, e.g.,if (exp)
Phase 3: Semantic Analyzer
- Goal: Understand meaning or semantics of the program
- Technique: Compilers include various techniques that perform static analysis to catch errors such as type checking, determining scopes, ambiguities, etc
Example
- Type checking:
int x = 5; x = "a string"
would be a type mismatch - PL's have rules to avoid certain kind of ambiguities
{ int Jack = 3; { int Jack = 4; cout << Jack; // print 4 } }
Phase 4: Optimization
- Goal: Automatically modify the program to another one that has the same meaning as the original one, but is "better" in some way, e.g.,:
- Run faster
- Use less memory
- Use less power
- Techniques: various ways (compile time evaluation, constant propagation, dead code elimination, etc)
Could be tricky: really has to preserve the original semantics
X = Y * 0 is the same as X = 0
?- No! only true for integers, not true for floating points,
NaN * 0 = NaN
- No! only true for integers, not true for floating points,
Phase 5: Code Generation
- Goal: produce assembly code (usually)
- Essentially a translation from a language to another
- Not needed in an interpreter, i.e., only in a compiler
Classical vs Modern Compilers
- Every compiler has these five phases, but their proportions have
changed over the years
In the past: [ l ][ p ][s][ o ][ cg ] Now : [l][p][ s ][ o ][ cg ] # more effort in optimization