Semantic Analysis
- Lexical analysis
- Detects inputs with illegal tokens
- Parsing
- Detects inputs with ill-formed parse trees
- Parsing cannot catch some errors
- Some language constructs not context-free
- Semantic analysis
- Last "front end" phase
- Catches all remaining errors
- Semantic Analyzer
- Checks of many kinds
- E.g.,
coolc
checks:- All identifiers are declared
- Types
- Inheritance relationships
- Classes defined only once
- Methods in a class defined only once
- Reserved identifiers are not misused
- And others . . .
- The requirements depend on the language
Scope
- Matching identifier declarations with uses
- Important static analysis step in most languages
-
Including
Cool!
-
Examples
# Example 1 # the y in y + 3 is declared by the let let y: String <- "abc" in y + 3 # Example 2 # the x in x + 3 is not declared here (hopefully somewhere above) let y: Int in x + 3
-
The scope of an identifier is the portion of a program in which that identifier is accessible
-
The same identifier may refer to different things in different parts of the program
- Different scopes for same name don't overlap
-
An identifier may have restricted scope
-
Most languages have static scope
- Scope depends only on the program text, not run-time behavior
- Cool has static scope
-
A few languages are dynamically scoped
- Lisp, but now has changed to mostly static scoping
-
Scope depends on execution of the program
let x: Int <- 0 in { x; # associated with x = 0 let x: Int <- 1 in x; # associated with x=1 x; # associated with x = 0 }
-
A dynamically-scoped variable refers to the closest enclosing binding in the execution of the program
- Example
g(y) = let a <- 4 in f(3); f(x) = a; # a is assocated with 4
Cool Scope
-
Cool identifier bindings are introduced by
- Class declarations (introduce class names)
- Method definitions (introduce method names) - Let expressions (introduce object id's)
- Formal parameters (introduce object id's)
- Attribute definitions (introduce object id's)
- Case expressions (introduce object id's)
-
Not all identifiers follow the most-closely nested rule
- E.g.,
class
definitions in Cool:- Cannot be nested
- Are globally visible throughout the program
- A class name can be used before it is defined
Class Foo { let y: Bar in .. # Bar is defined below }; Class Bar { };
-
Attribute names are global within the class in which they are defined
Class Foo { f(): Int { a }; # a is defined below a: Int <- 0; }
-
Method names have complex rules
- A method need not be defined in the class in which it is used, but in some parent class
- Methods may also be redefined (overridden)
Choose whether or not each variable use binds to the name on the given line
1 Class Foo {
2 f(x: Int): Int {
3 {
4 let x: Int <- 4 in {
6 x;
7 let x: Int <- 7 in
8 x;
9 x;
10 };
11 x;
12 };
13 };
14 x: Int <- 14;
15 }
- Line 6 binds to line 2 : No
- Line 9 binds to line 7 : No
- Line 11 binds to line 2 : Yes
- Line 11 binds to line 14: No
Symbol Tables
-
Much of semantic analysis can be expressed as a recursive descent of an AST
- Before: Process an AST node n
- Recurse: Process the children of n
- After: Finish processing the AST node n
-
When performing semantic analysis on a portion of the the AST, we need to know which identifiers are defined
-
A symbol table is a data structure that tracks the current bindings of identifiers
-
Example: the scope of
let
bindings is one subtree of the AST:let x: Int <- 0 in e
- x is defined in subtree e|(symbol table) let x /(symbol table) \(symbol table + x) init e
- Before processing
e
, add definition ofx
to current definitions in the symbol table, overriding any other definition ofx
- Recurse
- After processing
e
, remove definition ofx
and restore old definition ofx
- Before processing
-
Implementation of Symbol Tables
- For a simple symbol table we can use a stack
- Operations
add_symbol(x)
pushx
and associated info, such asx
's type, on the stackfind_symbol(x)
search stack, starting from top, forx
. Return firstx
found or NULL if none foundremove_symbol()
pop the stack
- The simple symbol table works for
let
- Symbols added one at a time
- Declarations are perfectly nested
- But doesn't work for function's formals, e.g.,
f(x:Int, x:Int){..}
- Need more complex data structures: each scope is a stack
enter_scope()
: start a new nested scopefind_symbol(x)
: finds current x (or null)add_symbol(x)
: add a symbol x to the tablecheck_scope(x)
: true if x defined in current scopeexit_scope()
: exit current scope
- Problem: Class names can be used before being defined
- We can't check class names using a symbol table or even in one pass
- Solution:
- Pass 1: Gather all class names
- Pass 2: Do the checking
- In general, (each) semantic analysis requires multiple passes (probably more than two)
Type Checking
- What is a type?
- The notion varies from language to language
- Consensus
- A set of values
- A set of operations on those values
- Ex1: \(\mathbb{Z}\) and \(\{+,-, \le, <\}\),
- Ex2: string and
concat
,isnull?
-
Classes are one instantiation of the modern notion of type
-
Some languages have no type
- Consider the assembly language fragment
add $r1, $r2, $r3
- What are the types of $r1, $r2, $r3?
- At the ASM level, they really have no types (string of 0's and 1's)
- Certain operations are legal for values of each type
- It doesn't make sense to add a function pointer and an integer in C
- It does make sense to add two integers
- But both have the same assembly language implementation!
Type System
- A language's type system specifies which operations are valid for which types
- The goal of type checking is to ensure that operations are used
only with the correct types
- Enforces intended interpretation of values, because nothing else will!
Three kinds of typings
- Statically typed: All or almost all checking of types is done as part of compilation (C, Java, Cool)
- Dynamically typed: Almost all checking of types is done as part of program execution (Scheme)
- Untyped: No type checking (machine code)
Competing views on static vs. dynamic typing
- Static typing proponents say:
- Static checking catches many programming errors at compile time
- Avoids overhead of runtime type checks
-
Dynamic typing proponents say:
- Static type systems are restrictive
- Rapid prototyping difficult within a static type system
-
A lot of code is written in statically typed languages with an "escape" mechanism
- Unsafe casts in C, Java
- People retrofit static typing to dynamically typed languages
- For optimization, debugging
- It's debatable whether either compromise represents the best or worst of both worlds
Cool Types
-
The types in Cool are:
- Class Names
- SELF_TYPE (will talk later)
-
The user declares types for identifiers
- The compiler infers types for expressions
- Infers a type for every expression
Checking vs Inferencing
- Type Checking is the process of verifying fully typed programs
- Type Inference is the process of filling in missing type information
- The two are different, but the terms are often used interchangeably
Rules of Inference
- We have seen two examples of formal notation specifying parts of a
compiler
- Regular expressions
- Context-free grammars
- The appropriate formalism for type checking is logical rules of inference
Inference Rules
-
Inference rules have the form
If Hypothesis is true, then Conclusion is true
-
Type checking computes via reasoning
If E1 and E2 have certain types, then E3 has a certain type
-
Rules of inference are a compact notation for If-Then statements
-
The notation is easy to read with practice
-
Start with a simplified system and gradually add features
-
Building blocks
- Symbol \(\wedge\) is and
- Symbol \(\Rightarrow\) is if-then
- \(x:T\) is x has type T
-
Example
- If \(e1\) has type Int and \(e2\) has type Int, then \(e1 + e2\) has type Int
- (\(e1\) has type Int \(\wedge\) \(e2\) has type Int) \(\Rightarrow\) \(e1 + e2\) has type Int
-
(\(e1\):Int \(\wedge\) \(e2\):Int) \(\Rightarrow\) \(e1 + e2\):Int
-
The statement \((e1: \text{Int} \wedge e2: \text{Int}) \Rightarrow e1 + e2: \text{Int}\)
- is a special case of \(\text{Hypothesis}_1 \wedge \dots \wedge \text{Hypothesis}_n \Rightarrow \text {Conclusion}\)
- This is an inference rule
-
By tradition inference rules are written
|- Hypothesis ... |- Hypothesis ------------------------------- |- Conclusion
- Cool type rules have hypotheses and conclusions
|- e:T
means "it is provable that .."- These rules give templates describing how to type integers and expressions (e.g., e1, e2) - By filling in the templates, we can produce complete typings for expressionsi is an integer literal ----------------------- [Int] |- i : Int |- e1: Int |- e2: Int ---------------------- [Add] |- e1 + e2 : Int
Soundness
A type system is sound if
- Whenever
|- e: T
, thene
evaluates to a value of typeT
-
We only want sound rules
- But some sound rules are better than others!
- E.g., below rule is sound, but weak because
i
should beInt
intstead ofObj
in conclusioni is Int -------------- |- i : Object # either Object or Int is correct, but Int is more precise
-
Question:
-
Which of the below type rules are sound?
*Not sound* |- e1: Int |- e2: Int ---------------------- [Divide] |- e1 / e2 : Bool *Sound* |- e1: T1 |- e2: T2 . . . |- en: Tn --------- [Sequence] |- {e1; e2; ... en; } : Tn *Sound* |- e1: T1 -------------- [Isvoid] |- isvoid e1 : Bool *Not sound* |- e1: Int |- e2: Int ---------------------- [Compare] |- e1 < e2 :Int
Type Checking
- Type checking proves facts
e: T
- Proof is on the structure of the AST
- Proof has the shape of the AST
- One type rule is used for each AST node
- In the type rule used for a node
e
:- Hypotheses are the proofs of types of
e
's subexpressions - Conclusion is the type of e
- Hypotheses are the proofs of types of
- Types are computed in a bottom-up pass over the AST
-
Type Environments
------------------ [False]
|- false : Bool
s is a string literal
---------------------- [String]
|- s: String
# new T produces an object of type T
# Ignore SELF_TYPE for now . . .
----------------- [New]
|- new T : T
|- e: Bool
--------------- [Not]
|- !e: Bool
# Important: type of while is Object, not T
|- e1: Bool
|- e2: T
-------------------------------- [Loop]
|- while e1 loop e2 pool: Object
-
Problem: What is the type of a variable reference?
x is a variable --------------- [Var] |- x: ?
- The local, structural rule does not carry enough information to give
x
a type. - Solution: Put more information in the rules!
- The local, structural rule does not carry enough information to give
-
A type environment gives types for free variables
- A type environment is a function from ObjectIdentifiers to Types
-
A variable is free in an expression if it is not defined within the expression
- Let O be a function from ObjectIdentifiers to Types - The sentencex # x is free x + y # both x,y are free let y <- ... in x + y # y is bound, x is free
O |- e: T
is read: - Under the assumption that variables have the types given by O, it is provable that the expression e has the type T - Type environment is added to the earlier rules:
i is an integer literal ----------------------- [Int] O |- i : Int O |- e1: Int O |- e2: Int --------------------------- [Add] O |- e1 + e2 : Int
Now we can define new rules
O(x) = T
--------- [Var]
O | x : T
O[T0/x] |- e1: T1
---------------------- [Let-No-Init]
O |- let x:T0 in e1: T1
# Note O[T/x](x) = T and O[T/x](y) = O(y) where x#y
-
Question:
Fill in the correct type environments in the following type rule
O1 |- e1: T1 O2 |- e2: T2 -------------------------------- [Let-Init] O |- let x: T1 <- e1 in e2 : T2 - O1 = O[T1/x]; O2 = O[T1/x] ? - O1 = O[T1/x]; O2 = O[T2/x] ? - O1 = O; O2 = O[T1/x] ? - O1 = O; O2 = O[T2/x] ?
Summary
- The type environment gives types to the free identifiers in the
current scope
- The type environment is passed down the AST from the root
towards the leaves
- Types are computed up the AST from the leaves towards the root
Subtyping
-
Consider let with initialization:
O|- e0:T0 O[T0/x] |- e1: T1 --------------------------- [Let-Init] O |- let x:T0 <- e0 in e1 : T1
-
Define a relation \(\le\) on classes
-
\(X\le X\)
- \(X\le Y\) if \(X\) inherits from \(Y\)
-
\(X \le Z\) if \(X \le Y\) and \(Y \le Z\)
O |- e0: T0 O[T/x] |- e1: T1 T0 <= T ---------------------------- [Let-Init] O |- let x:T <- e0 in e1 : T1 O(x) = T0 O|- e1:T1 ---------------- T1 <= T0 ---------------- [Assign] O|- x<- e1:T1
-
Let \(O_C(x) = T\) for all attributes \(x:T\) in class \(C\)
OC(x) = T0 ---------- OC |- e1: T1 ---------- T1 <= T0 -------------------- [Attr-Init] OC |- x:T0 <- e1;
-
Consider:
if e0 then e1 else e2 fi
-
The result can be either
e1
ore2
- The type is either e1's type of e2's type
- The best we can do is the smallest supertype larger than the type of e1 or e2
lub(X,Y)
: the least upper bound of X and Y, is Z if- \(X\le Z \wedge Y\le Z\) # Z is an upper bound
- \(X \le Z' \wedge Y \le Z' \Rightarrow Z \le Z'\) # Z is least among upper bounds
-
In COOL, the least upper bound of two types is their least common ancestor in the inheritance tree
OBJECT / \ LUB(x,x) / \ / \ / \ / \ / \ / \ x x
-
Question
Class Object Class Bool inherits Object Class Point inherits Object Class Line inherits Object Class Shape inherits Object Class Quad inherits Shape Class Circle inherits Shape Class Rect inherits Quad Class Square inherits Rect
Given the above class definitions, which of the following lub statements are true?
- lub(Point, Quad) = Object
- lub(Square, Rect) = Quad
- lub(Square, Rect) = Rect
- lub(Square, Circle) = Object
Continuing with if-then-else
O |- e0: Bool
O|- e1:T1
O|- e2:T2
---------------------------------------- [If-Then-Else]
O |- if e0 then e1 else e2 fi: lub(T1,T2)
The rule for case expressions takes a lub over all branches
O |- e0: T0
------------
O[T1/x1] |- e1: T1'
...
O[Tn/xn] |- en: Tn'
-------------------------------------------------------------------- [Case]
O |- case e0 of x1:T1 -> e1; ...; xn:Tn -> en; esac : lub(T1',...,Tn')
Method Environment
Problem: What type should we use for the ?
below?
O |- e0:T0
O |- e1:T1
...
O |- en: Tn
-------------------------- [Dispatch]
O |- e0.f(e1, ... ,en): ?
- In Cool, method and object identifiers live in different name spaces
- A method foo and an object foo can coexist in the same scope
-
In the type rules, this is reflected by a separate method environment M for method signatures
M(C,f) = (T1,. . .Tn,Tn+1)
means in class C there is a method f
- Now we can use the method environment M to define dispatch rulesf(x1:T1,...,xn:Tn): Tn+1
O,M |- e0:T0
O,M |- e1:T1
...
O,M |- en: Tn
M(T0,f) = (T1’,. . .Tn’,Tn+1)
Ti <=Ti’ for 1 <=i <=n
-------------------------------- [Dispatch]
O,M |- e0.f(e1, ... ,en): Tn+1
O, M |- e0: T0
O, M |- e1: T1
...
O, M |- en: Tn
T0 <=T
M(T,f) = (T1',..., Tn', Tn+1)
Ti <=Ti' for 1 <=i <=n
------------------------------------ [Static Dispatch]
O, M |- e0@T.f(e1, ... ,en): Tn+1
Question: Given the class definitions and method declaration below, which of the following are valid types for the variables in the statement below?
z <- x.setCenter(y)
Class Object
Class Bool inherits Object
Class Point inherits Object
Class Line inherits Object
Class Shape inherits Object {
setCenter(p: Point): Bool { ...
};
...
};
Class Quad inherits Shape
Class Circle inherits Shape
Class Rect inherits Quad
Class Square inherits Rect
Answer Choice
x: Rect, y: Object, z: Bool Y/N
x: Circle, y: Point, z: Bool Y/N
x: Object, y: Object, z: Object Y/N
x: Shape, y: Point, z: Bool Y/N
- The method environment must be added to all rules
- In most cases, M is passed down but not actually used
- Only the dispatch rules use M
O,M |- e1:Int O,M |- e2:Int
----------------------------- [Add]
O,M |-e1+e2:Int
-
For some cases involving SELF_TYPE, we need to know the class in which an expression appears
-
The full type environment for COOL:
- A mapping O giving types to object id's
- A mapping M giving types to methods
- The current class C
-
The form of a sentence in the logic is
O,M,C |- e: T
Example:
O,M,C |- e1: Int O,M,C |- e2: Int
--------------------------------- [Add]
O,M,C |- e1 + e2 : Int
- The rules in this lecture are COOL-specific
- Some other languages have very different rules
- General themes
- Type rules are defined on the structure of expressions
- Types of variables are modeled by an environment
- Warning: Type rules are very compact!
Implementing Type Checking
- COOL type checking can be implemented in a single traversal over the AST
- Type environment is passed down the tree
- From parent to child
-
Types are passed up the tree
- From child to parent
-
Examples
O,M,C |- e1: Int O,M,C |- e2: Int -------------------------------------- [Add] O,M,C |- e1 + e2 : Int TypeCheck(Environment, e1 + e2){ T1 = TypeCheck(Environment, e1); T2 = TypeCheck(Environment, e2); Check T1 == T2 == Int; return Int; }
O |- e0:T0 O[T/x] |- e1: T1 T0 <= T ------------------------ O |- let x:T <- e0 in e1 : T1 TypeCheck(Environment, let x:T <- e0 in e1) { T0 = TypeCheck(Environment, e0); T1 = TypeCheck(Environment.add(x:T), e1); Check subtype(T0,T1); return T1; }
Static vs Dynamic Checking
-
Static type systems detect common errors at compile time
-
But some correct programs are disallowed
- Some argue for dynamic type checking instead
- Others want more expressive static type checking
-
But more expressive type systems are more complex
-
The dynamic type of an object is the class C that is used in the "new C" expression that created it
- A run-time notion
- Even languages that are not statically typed have the notion of dynamic type
-
The static type of an expression captures all dynamic types the expression could have
- A compile-time notion
Self Type
class Count {
i : int <- 0;
inc () : Count {
{
i <- i + 1;
self;
}
};
};
- Class
Count
incorporatesa counter - The
inc
method works for any subclass - Now, consider a subclass
Stock
ofCount
class Stock inherits Count {
name : String; -- name of item
};
class Main {
Stock a <- (new Stock).inc ();
... a.name ...
};
- Problem: Ok for dynamic, not ok for Static type
(new Stock).inc()
has dynamic type Stock- So it is legitimate to write
Stock a <- (new Stock).inc ()
- But this is not well-typed
(new Stock).inc()
has static type Count- Thus cannot assigned to a Stock because Count is not a subtype of Stock
- The reason for this problem is because the type checker "loses" type information
- This makes inheriting
inc
useless - So, we must redefine
inc
for each of the subclasses, with a specialized return type, i.e., motivation forself type
, discussed next
- This makes inheriting
Self Type
- We will extend the type system
- Insight:
inc
returnsself
- Therefore the return value has same type as
self
- Which could be
Count
or any subtype ofCount
!
- Introduce the keyword
SELF_TYPE
to use for the return value of such functions- We will also need to modify the typing rules to handle SELF_TYPE
- SELF_TYPE allows the return type of inc to change when inc is inherited
-
Modify the declaration of inc to read
inc() : SELF_TYPE { ... }
-
The type checker can now prove:
O,M,C |- (new Count).inc() : Count O,M,C |- (new Stock).inc() : Stock
-
The program from before (that was not well-typed) is now well-typed by using SELF_TYPE
-
SELF_TYPE is not a dynamic type
- It is a static type
- It helps the type checker to keep better track of types
- It enables the type checker to accept more correct programs
-
In short, having SELF_TYPE increases the expressive power of the type system
Self-Type Operations
Recall
class Count {
i : int <- 0;
inc () : Count {
{
i <- i + 1;
self;
}
};
};
- What can be the dynamic type of the object returned by
inc
?- Answer: whatever could be the dynamic type of "self"
- E.g.,
incr
could be invoked through Count or any subclass of Count
-
In general, if SELF_TYPE appears textually in the class C as the declared type of E then
dynamic_type(E) <= C
-
Note: The meaning of SELF_TYPE depends on where it appears
SELF_TYPE_C
to refers to an occurrence of SELF_TYPE in the body of C
-
This suggests a typing rule:
SELF_TYPE_C <= C (*)
-
Rule (*) has an important consequence:
- In type checking it is always safe to replace SELF_TYPE_C by C
- This suggests one way to handle SELF_TYPE :
- Replace all occurrences of SELF_TYPE_C by C
- Recall the operations on types
- T1 <= T2 T1 is a subtype of T2
lub(T1,T2)
the least-upper bound of T1 and T2
-
Extend these operations to handle SELF_TYPE
-
Subtypes involving Self Type
Let T and T' be any types but SELF_TYPE
SELF_TYPE_C <= SELF_TYPE_C
- In Cool we never compare SELF_TYPEs coming from different classes
SELF_TYPE_C <= T if C <= T
- SELF_TYPE_C can be any subtype of C
- This includes C itself
- Thus this is the most flexible rule we can allow
T <= SELF_TYPE_C
alwaysfalse
. Note: SELF_TYPE_C can denote any subtype of C. Why ? Consider this inheritence
C / T / A So SELF_TYPE_C could be A, and we would have T <= A, but this is not possible because C <= T
-
Lub's involving Self Type
Let T and T' be any types but SELF_TYPE
lub(SELF_TYPE_C, SELF_TYPE_C) = SELF_TYPE_C
lub(SELF_TYPE_C, T) = lub(C, T)
- This is the best we can do because
SELF_TYPE_C <= C
- This is the best we can do because
lub(T, SELF_TYPE_C) = lub(C, T)
Self Type Usage
- The parser checks SELF_TYPE appears only where a type is expected
-
But SELF_TYPE is not allowed everywhere a type can appear:
-
class T inherits T' {...}
- T, T' cannot be SELF_TYPE
x:T
- Attribute x
- T can be SELF_TYPE
let x : T in E
- T can be SELF_TYPE
new T
- T can be SELF_TYPE
- Creates an object of the same dynamic type as self
m@T(E1,...,En)
- T cannot be SELF_TYPE
-
m(x : T) : T' { ... }
-
Only T' can be SELF_TYPE !
-
What could go wrong if T were SELF_TYPE?
class A { comp(x : SELF_TYPE) : Bool {...}; }; class B inherits A { b : int; comp(x : SELF_TYPE) : Bool { ... x.b ...}; }; ... let x : A <- new B in ... x.comp(new A); ... ...
-
-
Self Type Checking Rules
-
SELF_TYPE's meaning depends on the enclosing class
O,M,C |- e : T
An expression e occurring in the body of C has static type T given a variable type environment O and method signatures M
-
The next step is to design type rules using SELF_TYPE
-
Most of the rules remain the same
-
But use the new <= (subtype) and lub
O(Id) = T0 O,M,C |- e1 : T0 T1 <= T0 -- new subtype rule, i.e., needs to check if T1 or T0 are SUB_TYPE O,M,C |- Id <- e1 : T1
-
Recall the old rule for dispatch
O,M,C |- e0 : T0 ... O,M,C |- en : Tn M(T0, f) = (T1’,...,Tn’,Tn+1’) Tn+1 ’ # SELF_TYPE Ti <= Ti' 1 <= i <= n ---------------------------------- O,M,C |- e0.f(e1,...,en) : Tn+1'
-
New rule for dispatch
- If the return type of the method is SELF_TYPE then the type of the dispatch is the type of the dispatch expression:
O,M,C |- e0 : T0 ... O,M,C |- en : Tn M(T0, f) = (T1’,...,Tn’,SELF_TYPE) Ti <= Ti' 1 <= i <= n ---------------------------------- O,M,C |- e0.f(e1,...,en) : T0 -- Q: think why it is T0 ?
-
Formal parameters cannot be SELF_TYPE
-
Actual arguments can be SELF_TYPE -- The extended <= relation handles this case
-
The type T0 of the dispatch expression could be SELF_TYPE -- Which class is used to find the declaration of f? -- Answer: it is safe to use the class where the dispatch appears
-
Recall the old rule for static dispatch
O,M,C |- e0 : T0 M O,M,C |- en : Tn T0 <= T M(T, f) = (T1’,...,Tn’,Tn+1’) Tn+1’ # SELF_TYPE Ti <= Ti’ 1 <= i <= n ------------------------------------ O,M,C |- e@T.f(e1,...,en) : Tn+1'
-
New rule for static dispatch
- If the return type of the method is SELF_TYPE we have:
O,M,C |- e0 : T0 M O,M,C |- en : Tn T0 <= T M(T, f) = (T1’,...,Tn’,SELF_TYPE) Ti <= Ti’ 1 <= i <= n ------------------------------------ O,M,C |- e@T.f(e1,...,en) : T0
-
Why is this rule correct?
- If we dispatch a method returning SELF_TYPE in class T, don't we get back a T?
- No. SELF_TYPE is the type of the self parameter, which may be a subtype of the class in which the method appears
-
There are two new rules using SELF_TYPE
O,M,C |- self : SELF_TYPE_C O,M,C |- new SELF_TYPE : SELF_TYPE_C
-
The extended <= and lub operations can do a lot of the work.
- SELF_TYPE can be used only in a few places. Be sure it isn't used anywhere else.
- A use of SELF_TYPE_C always refers to any subtype of the current class C -- The exception is the type checking of dispatch. The method return type of SELF_TYPE might have nothing to do with the current class
-
-
Summary
- SELF_TYPE is a research idea -- It adds more expressiveness to the type system
- SELF_TYPE is itself not so important -- except for the project
- Rather, SELF_TYPE is meant to illustrate that type checking can be quite subtle
- In practice, there should be a balance between the complexity of the type system and its expressiveness
Semantics
Three types of Semantics
- Operational Semantics: details (e.g., mem loc, evaluation order matters)
- Denotational Semantics: high level, mathematic way of understanding progarms
- Axiomatic Semantics: mainly used for verification, check if the program is correct with respect to some specification (e.g., output is half of input)
Operational Semantics
- Gives the meaning to an (Cool) expression
- Evaluate an expr to some value
For typing, the rule looks like
Env |- e: C #evaluate e to some class (type) C under the environment Env
For evaluation (operational semantic), the rule looks very similar
Env |- e: V #evaluate e to some value under the environment Env
- Example:
Env |- e1: 5 Env |- e2: 7 --------------------------- Env |- e1 + e2 : 12
Formal Notations
y <- x + 1
E
is an environment: tells us where in memory a variable is
- a mapping: varid -> memory loc, e.g., E(x) = l1
- E = [x: l1, y: l2, ...]
- S
is a store: tells us what is in the memory
- a mapping: memory loc -> value, e.g., S(l1) = 33
- S = [l1 -> 33, l2 -> 7]
- S\' = S[100/l2] # S\' is S but with S(l2)= 100 (instead of 7)
- Env = E, S
- Cool values are objects
- objs are instances of some class
- X(a1 = l1, ..., a_n = l_n) is a Cool object where
- X is a class of the object
- a_i\'s are the attributes of class X (include inherited ones)
- l_i\'s are the mem location where the value of a_i\'s are stored
- Special classes (primitive, no attribues)
- Int(5) : an integer value 5
- Bool(true): ...
- String(4, \"Cool\"): a string \"Cool\" of length/size 4
- void
- no operations can be performed on it
- Except for the test
isvoid
- similar idea to NULL
Evaluation judgement
so, E, S |- e: v, S'
so
is the current value of self ,self
E
: variable mapping -> mem locS
: store mem loc -> val-
v
-
Examples
- note that the value of the assignment expr------------------------------ so, E, S | true : Bool(true), S i is an integer literal ----------------------- so, E, S | i : Int(i), S s is a string literal n is the length of s ----------------------- so, E, S | s : String(n, s), S E(id) = l_id S(l_id) = v ------------------- so, E, S | id : v, S ----------------------- so, E, S | self: so, S so, E, S |- e: v, S1 E(id) = l_id S2 = S1[v/l_id] ----------------------- so, E, S | id <- e: v, S2
id <- e
is the value ofe
- order matters due to side effects (i.e., new store)so, E, S |- e1: v1, S1 so, E, S1|- e2: v2, S2 --------------------------- so, E, S |- e1 + e2 : v1+v2 , S2
- note that the value of the block exprso, E, S |- e1: v1, S1 so, E, S1|- e2: v2, S2 ... so, E, Sn-1|- en: vn, Sn --------------------------- so, E, S |- e1, ..., en : vn, Sn
e1,..,en
is the value ofen
-
Example
-
Applying judgement rules to the Cool expression
{x <- 7+5; 4;}
so, [x:l], [l->0] |- 7 : Int(7), [l->0]
so, [x:l], [l->0] |- 5 : Int(5), [l->0]
so, [x:l], [l->0] |- 7+5 : Int(12), [l->0]
[l->0](12/l) = [l->12]
----------------------------------------
so,[x:l],[l->0] |- x <- 7 + 5: 12,[l->12]
so,[x:l],[l->12] | 4 : Int(4), [l->12]
----------------------------------------------
so, [x:l], [l->0]|- {x <- 7+5; 4;} , Int(4), [l->12]
so, E, S | e1: Bool(true), S1
so, E, S1 | e2: v2, S2
-------------------------------------------
so, E, S | if e1 then e2 else e3 fi: v2, S2
so, E, S | e1: Bool(false), S1
so, E, S1 | e3: v3, S3
-------------------------------------------
so, E, S | if e1 then e2 else e3 fi: v3, S3
so, E, S | e1: Bool(false), S1
----------------------------------------
so, E, S | while e1 loop e2 pool: void, S1
so, E, S | e1: Bool(true), S1
so, E, S1| e2: v, S2
so, E, S2| while e1 loop e2 pool: void, S3
----------------------------------------
so, E, S | while e1 loop e2 pool: void, S3
so, E, S | e1: v1, S1
l_new = newloc(S1)
E' = E[l_new/id]
so, E', S1[v1/l_new] |- e2 : v2, S2
------------------------------------------
so, E, S |- let id: T <- e1 in e2: v2, S2
- informal semantics of
new T
- allocate locations to hold all the attributes of an object of a class T (allocate new object, which involve all of its attributes)
- set these attributes to their default values
- D_int = Int(0)
- D_bool = Bool(false)
- D_string = String(0, "")
- D_C = void (for any other class C)
- evaluate the initializers of the attributes
-
return the newly allocated object
-
for a class C, we write
class(C)=(a1:T1 <- e1, ... , an:Tn <- en)
- ai's are attributes (also include inherited ones)
- Ti's the types of attributes ai's
- e_i are the initializer exprs
- order: greatest ancestor first
A{a1, a2} B inherits A {b1, b2} C inherits B {c1, c2} class(C) = (a1:.., a2:.., b1 .., b2, c1, c2)
T0 = if(T == SELF_TYPE and the so=X(...)) X else T
class(T0) = (a1:T1 <- e1, ... , an:Tn <- en)
li = newloc(S) for i=1,...,n
v = T0(a1=l1, ..., an=ln)
S1 = S[D_T1/l1, ... , D_TN/ln]
E' = [a1:l1, ... , an:ln]
v,E',S1 |- {a1<-e1;...;an<-en}: vn, S2
--------------------------
so, E, S |- new T: v, S2
- informal semantics of
e0.f(e1,..,en)
- evaluate the arguments in order
e1,...,en
- evaluate e0 to get the target type X
- from the type X, get the signature/definition of
f
- Create n new location and env that maps f's arguments to those locations
- initial the loc's with the actual arguments
- evaluate the body of
f
impl(C,f) = (x1,..,xn,ebody)
xi
's are the formal argmentsebody`` is the body of the method
f`
so, E, S | e1:v1, S1
so, E, S1 | e2:v2, S2
..
so, E, Sn-1 | en:vn, Sn
so, E, Sn | e0: v0, Sn+1
vo = X(a1=l1,...am=lm)
impl(X, f) = (x1, ..,xn, ebody)
lxi = newloc(Sn+1) for i=1,..,n
E' = [a1:l1, ... am:lm][x1/lx1, ..., xn/lxn]
Sn+2 = Sn+1[v1/lx1,...,vn/lxn]
v0, E', Sn+2 |- ebody: v, Sn+3
-----------------------------------
so, E, S | e0.f(e1,..,en): v, Sn+3
class A{
a
f(a){
a
}
- Many errors DO NOT exist in the operational semantic analysis
-
they are already caught in previous analyses (e.g., type checking already check that f is a function in some class X)
-
There many runtime errors that type checkers and others do not prevent
- Division by 0
- Null dereferencing
- Array index out of bound
- Heap overflow
-
.........
-
Many expensive analyses will not be included by the compiler
- things done by the compiler must be done QUICKLY
- checking program correctness : will never be done inside a compiler, too expensive