Skip to content

Semantic Analysis

  • Lexical analysis
    • Detects inputs with illegal tokens
  • Parsing
    • Detects inputs with ill-formed parse trees
    • Parsing cannot catch some errors
    • Some language constructs not context-free
  • Semantic analysis
    • Last "front end" phase
    • Catches all remaining errors
  • Semantic Analyzer
    • Checks of many kinds
    • E.g., coolc checks:
      1. All identifiers are declared
      2. Types
      3. Inheritance relationships
      4. Classes defined only once
      5. Methods in a class defined only once
      6. Reserved identifiers are not misused
      7. And others . . .
    • The requirements depend on the language

Scope

  • Matching identifier declarations with uses
  • Important static analysis step in most languages
  • Including Cool!

  • Examples

    # Example 1
    # the y in y + 3 is declared by the let
    let y: String <- "abc" in y + 3 
    
    # Example 2
    # the x in x + 3 is not declared here (hopefully somewhere above)
    let y: Int in x + 3 
    

  • The scope of an identifier is the portion of a program in which that identifier is accessible

  • The same identifier may refer to different things in different parts of the program

    • Different scopes for same name don't overlap
  • An identifier may have restricted scope

  • Most languages have static scope

    • Scope depends only on the program text, not run-time behavior
    • Cool has static scope
  • A few languages are dynamically scoped

    • Lisp, but now has changed to mostly static scoping
    • Scope depends on execution of the program

      let x: Int <- 0 in {
          x; # associated with x = 0
          let x: Int <- 1 in
              x;  # associated with x=1
          x; # associated with x = 0
      }
      

    • A dynamically-scoped variable refers to the closest enclosing binding in the execution of the program

    • Example
      g(y) = let a <- 4 in f(3); 
      f(x) = a;  # a is assocated with 4
      

Cool Scope

  • Cool identifier bindings are introduced by

    • Class declarations (introduce class names)
    • Method definitions (introduce method names) - Let expressions (introduce object id's)
    • Formal parameters (introduce object id's)
    • Attribute definitions (introduce object id's)
    • Case expressions (introduce object id's)
  • Not all identifiers follow the most-closely nested rule

  • E.g., class definitions in Cool:
    • Cannot be nested
    • Are globally visible throughout the program
    • A class name can be used before it is defined
      Class Foo {
        let y: Bar in ..  # Bar is defined below
      };
      
      Class Bar {
      };
      
  • Attribute names are global within the class in which they are defined

    Class Foo { 
      f(): Int { a };  # a is defined below
      a: Int <- 0;
    }
    

  • Method names have complex rules

    • A method need not be defined in the class in which it is used, but in some parent class
    • Methods may also be redefined (overridden)

Choose whether or not each variable use binds to the name on the given line

1 Class Foo {
2     f(x: Int): Int { 
3         {
4             let x: Int <- 4 in { 
6                 x;
7                 let x: Int <- 7 in 
8                     x;
9                 x;
10             };
11             x;
12        };
13    };
14    x: Int <- 14;
15 }
  • Line 6 binds to line 2 : No
  • Line 9 binds to line 7 : No
  • Line 11 binds to line 2 : Yes
  • Line 11 binds to line 14: No

Symbol Tables

  • Much of semantic analysis can be expressed as a recursive descent of an AST

    • Before: Process an AST node n
    • Recurse: Process the children of n
    • After: Finish processing the AST node n
  • When performing semantic analysis on a portion of the the AST, we need to know which identifiers are defined

  • A symbol table is a data structure that tracks the current bindings of identifiers

  • Example: the scope of let bindings is one subtree of the AST:

    let x: Int <- 0 in e - x is defined in subtree e

                   |(symbol table)
                   let x
      /(symbol table)   \(symbol table + x)
     init                 e
    

    • Before processing e, add definition of x to current definitions in the symbol table, overriding any other definition of x
    • Recurse
    • After processing e, remove definition of x and restore old definition of x
  • Implementation of Symbol Tables

    • For a simple symbol table we can use a stack
    • Operations
      • add_symbol(x) push x and associated info, such as x's type, on the stack
      • find_symbol(x) search stack, starting from top, for x. Return first x found or NULL if none found
      • remove_symbol() pop the stack
    • The simple symbol table works for let
      • Symbols added one at a time
      • Declarations are perfectly nested
    • But doesn't work for function's formals, e.g., f(x:Int, x:Int){..}
    • Need more complex data structures: each scope is a stack
      • enter_scope(): start a new nested scope
      • find_symbol(x): finds current x (or null)
      • add_symbol(x): add a symbol x to the table
      • check_scope(x): true if x defined in current scope
      • exit_scope(): exit current scope
    • Problem: Class names can be used before being defined
      • We can't check class names using a symbol table or even in one pass
    • Solution:
      • Pass 1: Gather all class names
      • Pass 2: Do the checking
    • In general, (each) semantic analysis requires multiple passes (probably more than two)

Type Checking

  • What is a type?
    • The notion varies from language to language
  • Consensus
    • A set of values
    • A set of operations on those values
    • Ex1: \(\mathbb{Z}\) and \(\{+,-, \le, <\}\),
    • Ex2: string and concat, isnull?
  • Classes are one instantiation of the modern notion of type

  • Some languages have no type

    • Consider the assembly language fragment
    • add $r1, $r2, $r3
    • What are the types of $r1, $r2, $r3?
      • At the ASM level, they really have no types (string of 0's and 1's)
  • Certain operations are legal for values of each type
    • It doesn't make sense to add a function pointer and an integer in C
    • It does make sense to add two integers
    • But both have the same assembly language implementation!

Type System

  • A language's type system specifies which operations are valid for which types
  • The goal of type checking is to ensure that operations are used only with the correct types
    • Enforces intended interpretation of values, because nothing else will!

Three kinds of typings

  1. Statically typed: All or almost all checking of types is done as part of compilation (C, Java, Cool)
  2. Dynamically typed: Almost all checking of types is done as part of program execution (Scheme)
  3. Untyped: No type checking (machine code)

Competing views on static vs. dynamic typing

  • Static typing proponents say:
    • Static checking catches many programming errors at compile time
    • Avoids overhead of runtime type checks
  • Dynamic typing proponents say:

    • Static type systems are restrictive
    • Rapid prototyping difficult within a static type system
  • A lot of code is written in statically typed languages with an "escape" mechanism

  • Unsafe casts in C, Java
  • People retrofit static typing to dynamically typed languages
    • For optimization, debugging
  • It's debatable whether either compromise represents the best or worst of both worlds

Cool Types

  • The types in Cool are:

    • Class Names
    • SELF_TYPE (will talk later)
  • The user declares types for identifiers

  • The compiler infers types for expressions
    • Infers a type for every expression

Checking vs Inferencing

  • Type Checking is the process of verifying fully typed programs
  • Type Inference is the process of filling in missing type information
  • The two are different, but the terms are often used interchangeably

Rules of Inference

  • We have seen two examples of formal notation specifying parts of a compiler
    • Regular expressions
    • Context-free grammars
  • The appropriate formalism for type checking is logical rules of inference

Inference Rules

  • Inference rules have the form

    If Hypothesis is true, then Conclusion is true
    
  • Type checking computes via reasoning

    If E1 and E2 have certain types, then E3 has a certain type
    
  • Rules of inference are a compact notation for If-Then statements

  • The notation is easy to read with practice

  • Start with a simplified system and gradually add features

  • Building blocks

    • Symbol \(\wedge\) is and
    • Symbol \(\Rightarrow\) is if-then
    • \(x:T\) is x has type T
  • Example

    • If \(e1\) has type Int and \(e2\) has type Int, then \(e1 + e2\) has type Int
    • (\(e1\) has type Int \(\wedge\) \(e2\) has type Int) \(\Rightarrow\) \(e1 + e2\) has type Int
    • (\(e1\):Int \(\wedge\) \(e2\):Int) \(\Rightarrow\) \(e1 + e2\):Int

    • The statement \((e1: \text{Int} \wedge e2: \text{Int}) \Rightarrow e1 + e2: \text{Int}\)

      • is a special case of \(\text{Hypothesis}_1 \wedge \dots \wedge \text{Hypothesis}_n \Rightarrow \text {Conclusion}\)
      • This is an inference rule
    • By tradition inference rules are written

    |- Hypothesis ... |- Hypothesis 
    -------------------------------
            |- Conclusion
    
    • Cool type rules have hypotheses and conclusions

    |- e:T means "it is provable that .."

    i is an integer literal 
    -----------------------   [Int]
       |- i : Int
    
    |- e1: Int  |- e2: Int 
    ----------------------  [Add]
      |- e1 + e2 : Int
    
    - These rules give templates describing how to type integers and expressions (e.g., e1, e2) - By filling in the templates, we can produce complete typings for expressions

Soundness

A type system is sound if

  • Whenever |- e: T, then e evaluates to a value of type T
  • We only want sound rules

    • But some sound rules are better than others!
    • E.g., below rule is sound, but weak because i should be Int intstead of Obj in conclusion
        i is Int 
      --------------
      |- i : Object  # either Object or Int is correct, but Int is more precise
      
  • Question:

    • Which of the below type rules are sound?

      *Not sound*
      |- e1: Int  |- e2: Int
      ----------------------   [Divide]
      |- e1 / e2 : Bool
      
      
      *Sound*
      |- e1: T1
      |- e2: T2
       .
       .
       .
      |- en: Tn
      ---------   [Sequence]
      |- {e1; e2; ... en; } : Tn
      
      
      *Sound*
         |- e1: T1
      --------------   [Isvoid]
      |- isvoid e1 : Bool
      
      
      *Not sound*
      |- e1: Int  |- e2: Int
      ---------------------- [Compare]
      |- e1 < e2 :Int
      

    Type Checking

    • Type checking proves facts e: T
      • Proof is on the structure of the AST
      • Proof has the shape of the AST
      • One type rule is used for each AST node
    • In the type rule used for a node e:
      • Hypotheses are the proofs of types of e's subexpressions
      • Conclusion is the type of e
    • Types are computed in a bottom-up pass over the AST

Type Environments

------------------ [False]
  |- false : Bool

s is a string literal 
---------------------- [String]
  |- s: String


# new T produces an object of type T 
# Ignore SELF_TYPE for now . . .

----------------- [New]
  |- new T : T



  |-  e: Bool 
--------------- [Not]
  |- !e: Bool


# Important: type of while is Object, not T
   |- e1: Bool  
   |- e2: T
--------------------------------  [Loop]
|- while e1 loop e2 pool: Object
  • Problem: What is the type of a variable reference?

    x is a variable 
    --------------- [Var]
    |- x: ?
    
    • The local, structural rule does not carry enough information to give x a type.
    • Solution: Put more information in the rules!
  • A type environment gives types for free variables

    • A type environment is a function from ObjectIdentifiers to Types
    • A variable is free in an expression if it is not defined within the expression

      x   # x is free
      x + y   # both x,y are free
      let y <- ...  in x + y    # y is bound, x is free
      
      - Let O be a function from ObjectIdentifiers to Types - The sentence O |- e: T is read: - Under the assumption that variables have the types given by O, it is provable that the expression e has the type T - Type environment is added to the earlier rules:

    i is an integer literal 
    -----------------------   [Int]
    O |- i : Int
    
    O |- e1: Int  O |- e2: Int 
    ---------------------------  [Add]
    O |- e1 + e2 : Int
    

Now we can define new rules

  O(x) = T
  ---------   [Var]
  O | x : T

 O[T0/x] |- e1: T1
----------------------  [Let-No-Init]
O |- let x:T0 in e1: T1

# Note O[T/x](x) = T and O[T/x](y) = O(y) where x#y
  • Question:

    Fill in the correct type environments in the following type rule

        O1 |- e1: T1 
        O2 |- e2: T2
    --------------------------------  [Let-Init]
    O |- let x: T1 <- e1 in e2 : T2
    
    
    - O1 = O[T1/x]; O2 = O[T1/x]   ?
    - O1 = O[T1/x]; O2 = O[T2/x]   ? 
    - O1 = O; O2 = O[T1/x]         ?
    - O1 = O; O2 = O[T2/x]         ?
    

Summary

-   The type environment gives types to the free identifiers in the
    current scope
-   The type environment is passed down the AST from the root
    towards the leaves
-   Types are computed up the AST from the leaves towards the root

Subtyping

  • Consider let with initialization:

    O|- e0:T0
    O[T0/x] |- e1: T1
    --------------------------- [Let-Init]
    O |- let x:T0 <- e0 in e1 : T1
    
  • Define a relation \(\le\) on classes

  • \(X\le X\)

  • \(X\le Y\) if \(X\) inherits from \(Y\)
  • \(X \le Z\) if \(X \le Y\) and \(Y \le Z\)

        O |- e0: T0
    O[T/x] |- e1: T1
        T0 <= T
    ---------------------------- [Let-Init]
    O |- let x:T <- e0 in e1 : T1
    
    
        O(x) = T0 
        O|- e1:T1
    ----------------
        T1 <= T0
    ---------------- [Assign]
    O|- x<- e1:T1
    
  • Let \(O_C(x) = T\) for all attributes \(x:T\) in class \(C\)

        OC(x) = T0
        ----------
        OC |- e1: T1
        ----------
        T1 <= T0
    -------------------- [Attr-Init]
    OC |- x:T0 <-  e1;
    
  • Consider:

    if e0 then e1 else e2 fi
    
  • The result can be either e1 or e2

  • The type is either e1's type of e2's type
  • The best we can do is the smallest supertype larger than the type of e1 or e2
  • lub(X,Y): the least upper bound of X and Y, is Z if
    • \(X\le Z \wedge Y\le Z\) # Z is an upper bound
    • \(X \le Z' \wedge Y \le Z' \Rightarrow Z \le Z'\) # Z is least among upper bounds
  • In COOL, the least upper bound of two types is their least common ancestor in the inheritance tree

            OBJECT
        /            \
    LUB(x,x)
    /     \       /     \
    /   \  /  \   /  \   /  \
    x          x  
    
  • Question

    Class Object
    Class Bool inherits Object 
    Class Point inherits Object 
    Class Line inherits Object 
    Class Shape inherits Object 
    Class Quad inherits Shape 
    Class Circle inherits Shape 
    Class Rect inherits Quad 
    Class Square inherits Rect
    

Given the above class definitions, which of the following lub statements are true?

- lub(Point, Quad) = Object 
- lub(Square, Rect) = Quad 
- lub(Square, Rect) = Rect 
- lub(Square, Circle) = Object

Continuing with if-then-else

         O |- e0: Bool
         O|- e1:T1
         O|- e2:T2
----------------------------------------    [If-Then-Else]
O |- if e0 then e1 else e2 fi: lub(T1,T2)

The rule for case expressions takes a lub over all branches

           O |- e0: T0
           ------------
         O[T1/x1] |- e1: T1'
               ...
         O[Tn/xn] |- en: Tn'
-------------------------------------------------------------------- [Case]
O |- case e0 of x1:T1 -> e1; ...; xn:Tn -> en; esac : lub(T1',...,Tn')

Method Environment

Problem: What type should we use for the ? below?

  O |- e0:T0
  O |- e1:T1
    ...
  O |- en: Tn
-------------------------- [Dispatch]
 O |- e0.f(e1, ... ,en): ?
  • In Cool, method and object identifiers live in different name spaces
    • A method foo and an object foo can coexist in the same scope
  • In the type rules, this is reflected by a separate method environment M for method signatures

    M(C,f) = (T1,. . .Tn,Tn+1)
    

    means in class C there is a method f

    f(x1:T1,...,xn:Tn): Tn+1
    
    - Now we can use the method environment M to define dispatch rules

  O,M |- e0:T0
  O,M |- e1:T1
    ...
  O,M |- en: Tn
  M(T0,f) = (T1’,. . .Tn’,Tn+1) 
  Ti <=Ti’ for 1 <=i <=n
-------------------------------- [Dispatch]
 O,M |- e0.f(e1, ... ,en): Tn+1


 O, M |- e0: T0
 O, M |- e1: T1
    ...
 O, M |- en: Tn
 T0 <=T
 M(T,f) = (T1',..., Tn', Tn+1)
 Ti <=Ti' for 1 <=i <=n
------------------------------------ [Static Dispatch]
 O, M |- e0@T.f(e1, ... ,en): Tn+1

Question: Given the class definitions and method declaration below, which of the following are valid types for the variables in the statement below?

z <- x.setCenter(y)

Class Object
Class Bool inherits Object 
Class Point inherits Object 
Class Line inherits Object 
Class Shape inherits Object {
  setCenter(p: Point): Bool { ...
  };
  ...
};
Class Quad inherits Shape 
Class Circle inherits Shape 
Class Rect inherits Quad 
Class Square inherits Rect

Answer Choice

x: Rect, y: Object, z: Bool  Y/N
x: Circle, y: Point, z: Bool  Y/N
x: Object, y: Object, z: Object Y/N
x: Shape, y: Point, z: Bool Y/N
  • The method environment must be added to all rules
  • In most cases, M is passed down but not actually used
    • Only the dispatch rules use M
O,M |- e1:Int   O,M |- e2:Int 
----------------------------- [Add]
     O,M |-e1+e2:Int
  • For some cases involving SELF_TYPE, we need to know the class in which an expression appears

  • The full type environment for COOL:

    • A mapping O giving types to object id's
    • A mapping M giving types to methods
    • The current class C
  • The form of a sentence in the logic is

    O,M,C |- e: T
    

Example:

O,M,C |- e1: Int O,M,C |- e2: Int 
--------------------------------- [Add]
O,M,C |- e1 + e2 : Int
  • The rules in this lecture are COOL-specific
    • Some other languages have very different rules
  • General themes
    • Type rules are defined on the structure of expressions
    • Types of variables are modeled by an environment
  • Warning: Type rules are very compact!

Implementing Type Checking

  • COOL type checking can be implemented in a single traversal over the AST
  • Type environment is passed down the tree
    • From parent to child
  • Types are passed up the tree

    • From child to parent
  • Examples

     O,M,C |- e1: Int    O,M,C |- e2: Int 
    -------------------------------------- [Add] 
     O,M,C |- e1 + e2 : Int
    
    TypeCheck(Environment, e1 + e2){ 
        T1 = TypeCheck(Environment, e1); 
        T2 = TypeCheck(Environment, e2); 
        Check T1 == T2 == Int;
        return Int; 
    }
    

     O |- e0:T0
     O[T/x] |- e1: T1
     T0 <= T
    ------------------------
    O |- let x:T <- e0 in e1 : T1
    
    TypeCheck(Environment, let x:T <- e0 in e1) { 
        T0 = TypeCheck(Environment, e0);
        T1 = TypeCheck(Environment.add(x:T), e1); 
        Check subtype(T0,T1);
        return T1;
    }
    

Static vs Dynamic Checking

  • Static type systems detect common errors at compile time

  • But some correct programs are disallowed

    • Some argue for dynamic type checking instead
    • Others want more expressive static type checking
  • But more expressive type systems are more complex

  • The dynamic type of an object is the class C that is used in the "new C" expression that created it

    • A run-time notion
    • Even languages that are not statically typed have the notion of dynamic type
  • The static type of an expression captures all dynamic types the expression could have

    • A compile-time notion

Self Type

class Count {
  i : int <- 0;
  inc () : Count {
    {
      i <- i + 1;
      self;
    } 
  };
};
  • Class Count incorporatesa counter
  • The inc method works for any subclass
  • Now, consider a subclass Stock of Count
class Stock inherits Count { 
  name : String; -- name of item
};

class Main {
  Stock a <- (new Stock).inc ();
  ... a.name ...
};
  • Problem: Ok for dynamic, not ok for Static type
    • (new Stock).inc() has dynamic type Stock
    • So it is legitimate to write Stock a <- (new Stock).inc ()
  • But this is not well-typed
    • (new Stock).inc() has static type Count
    • Thus cannot assigned to a Stock because Count is not a subtype of Stock
  • The reason for this problem is because the type checker "loses" type information
    • This makes inheriting inc useless
    • So, we must redefine inc for each of the subclasses, with a specialized return type, i.e., motivation for self type, discussed next

Self Type

  • We will extend the type system
  • Insight:
    • inc returns self
    • Therefore the return value has same type as self
    • Which could be Count or any subtype of Count!
  • Introduce the keyword SELF_TYPE to use for the return value of such functions
    • We will also need to modify the typing rules to handle SELF_TYPE
  • SELF_TYPE allows the return type of inc to change when inc is inherited
  • Modify the declaration of inc to read

    inc() : SELF_TYPE { ... }
    
  • The type checker can now prove:

    O,M,C |- (new Count).inc() : Count 
    O,M,C |- (new Stock).inc() : Stock
    
  • The program from before (that was not well-typed) is now well-typed by using SELF_TYPE

  • SELF_TYPE is not a dynamic type

    • It is a static type
    • It helps the type checker to keep better track of types
    • It enables the type checker to accept more correct programs
  • In short, having SELF_TYPE increases the expressive power of the type system

Self-Type Operations

Recall

class Count {
  i : int <- 0;
  inc () : Count {
    {
      i <- i + 1;
      self;
    } 
  };
};
  • What can be the dynamic type of the object returned by inc?
    • Answer: whatever could be the dynamic type of "self"
    • E.g., incr could be invoked through Count or any subclass of Count
  • In general, if SELF_TYPE appears textually in the class C as the declared type of E then

    dynamic_type(E) <= C
    
  • Note: The meaning of SELF_TYPE depends on where it appears

    • SELF_TYPE_C to refers to an occurrence of SELF_TYPE in the body of C
  • This suggests a typing rule:

    SELF_TYPE_C <= C (*)
    
  • Rule (*) has an important consequence:

    • In type checking it is always safe to replace SELF_TYPE_C by C
  • This suggests one way to handle SELF_TYPE :
    • Replace all occurrences of SELF_TYPE_C by C
  • Recall the operations on types
    • T1 <= T2 T1 is a subtype of T2
    • lub(T1,T2) the least-upper bound of T1 and T2
  • Extend these operations to handle SELF_TYPE

  • Subtypes involving Self Type

    Let T and T' be any types but SELF_TYPE

    1. SELF_TYPE_C <= SELF_TYPE_C
      • In Cool we never compare SELF_TYPEs coming from different classes
    2. SELF_TYPE_C <= T if C <= T
      • SELF_TYPE_C can be any subtype of C
      • This includes C itself
        • Thus this is the most flexible rule we can allow
    3. T <= SELF_TYPE_C always false. Note: SELF_TYPE_C can denote any subtype of C. Why ? Consider this inheritence
                C
               /
              T
             /
            A
    
    So SELF_TYPE_C could be A,  and we would have T <= A,  but this is not possible because C <= T
    
  • Lub's involving Self Type

    Let T and T' be any types but SELF_TYPE

    1. lub(SELF_TYPE_C, SELF_TYPE_C) = SELF_TYPE_C
    2. lub(SELF_TYPE_C, T) = lub(C, T)
      • This is the best we can do because SELF_TYPE_C <= C
    3. lub(T, SELF_TYPE_C) = lub(C, T)

Self Type Usage

  • The parser checks SELF_TYPE appears only where a type is expected
  • But SELF_TYPE is not allowed everywhere a type can appear:

  • class T inherits T' {...}

    • T, T' cannot be SELF_TYPE
  • x:T
    • Attribute x
    • T can be SELF_TYPE
  • let x : T in E
    • T can be SELF_TYPE
  • new T
    • T can be SELF_TYPE
    • Creates an object of the same dynamic type as self
  • m@T(E1,...,En)
    • T cannot be SELF_TYPE
  • m(x : T) : T' { ... }

    • Only T' can be SELF_TYPE !

    • What could go wrong if T were SELF_TYPE?

      class A { comp(x : SELF_TYPE) : Bool {...}; }; 
      class B inherits A {
        b : int;
        comp(x : SELF_TYPE) : Bool { ... x.b ...}; };
      ...
      let x : A <- new B in ... x.comp(new A); ... 
      ...
      
  • Self Type Checking Rules

    • SELF_TYPE's meaning depends on the enclosing class

      O,M,C |- e : T
      

      An expression e occurring in the body of C has static type T given a variable type environment O and method signatures M

    • The next step is to design type rules using SELF_TYPE

    • Most of the rules remain the same

    • But use the new <= (subtype) and lub

      O(Id) = T0
      O,M,C |- e1 : T0
      T1 <= T0   -- new subtype rule, i.e., needs to check if T1 or T0 are SUB_TYPE
      O,M,C |- Id <- e1 : T1
      

    • Recall the old rule for dispatch

      O,M,C |- e0 : T0 
            ...
      O,M,C |- en : Tn
      M(T0, f) = (T1’,...,Tn’,Tn+1’)
      Tn+1 ’ # SELF_TYPE 
      Ti <= Ti'  1 <= i <= n 
      ----------------------------------
      O,M,C |- e0.f(e1,...,en) : Tn+1'
      
    • New rule for dispatch

      • If the return type of the method is SELF_TYPE then the type of the dispatch is the type of the dispatch expression:
      O,M,C |- e0 : T0 
            ...
      O,M,C |- en : Tn
      M(T0, f) = (T1’,...,Tn’,SELF_TYPE)
      Ti <= Ti'  1 <= i <= n 
      ----------------------------------
      O,M,C |- e0.f(e1,...,en) : T0   -- Q: think why it is T0 ? 
      
    • Formal parameters cannot be SELF_TYPE

    • Actual arguments can be SELF_TYPE -- The extended <= relation handles this case

    • The type T0 of the dispatch expression could be SELF_TYPE -- Which class is used to find the declaration of f? -- Answer: it is safe to use the class where the dispatch appears

    • Recall the old rule for static dispatch

      O,M,C |- e0 : T0 M
      O,M,C |- en : Tn
      T0 <= T
      M(T, f) = (T1’,...,Tn’,Tn+1’)
      Tn+1’ # SELF_TYPE
      Ti <= Ti’ 1 <= i <= n
      ------------------------------------
      O,M,C |- e@T.f(e1,...,en) : Tn+1'
      
    • New rule for static dispatch

      • If the return type of the method is SELF_TYPE we have:
      O,M,C |- e0 : T0 M
      O,M,C |- en : Tn
      T0 <= T
      M(T, f) = (T1’,...,Tn’,SELF_TYPE)
      Ti <= Ti’ 1 <= i <= n
      ------------------------------------
      O,M,C |- e@T.f(e1,...,en) : T0
      
    • Why is this rule correct?

      • If we dispatch a method returning SELF_TYPE in class T, don't we get back a T?
      • No. SELF_TYPE is the type of the self parameter, which may be a subtype of the class in which the method appears
    • There are two new rules using SELF_TYPE

      O,M,C |- self : SELF_TYPE_C
      
      O,M,C |- new SELF_TYPE : SELF_TYPE_C
      
    • The extended <= and lub operations can do a lot of the work.

    • SELF_TYPE can be used only in a few places. Be sure it isn't used anywhere else.
    • A use of SELF_TYPE_C always refers to any subtype of the current class C -- The exception is the type checking of dispatch. The method return type of SELF_TYPE might have nothing to do with the current class
  • Summary

    • SELF_TYPE is a research idea -- It adds more expressiveness to the type system
    • SELF_TYPE is itself not so important -- except for the project
    • Rather, SELF_TYPE is meant to illustrate that type checking can be quite subtle
    • In practice, there should be a balance between the complexity of the type system and its expressiveness

Semantics

Three types of Semantics

  • Operational Semantics: details (e.g., mem loc, evaluation order matters)
  • Denotational Semantics: high level, mathematic way of understanding progarms
  • Axiomatic Semantics: mainly used for verification, check if the program is correct with respect to some specification (e.g., output is half of input)

Operational Semantics

  • Gives the meaning to an (Cool) expression
  • Evaluate an expr to some value

For typing, the rule looks like

Env |- e: C    #evaluate e to some class (type) C under the environment Env

For evaluation (operational semantic), the rule looks very similar

Env |- e: V  #evaluate e to some value under the environment Env
- Sometimes Env is referred to as CONTEXT

  • Example:
           Env |-  e1: 5
           Env |-  e2: 7
        ---------------------------
           Env |-   e1 + e2 : 12
    

Formal Notations

y <- x + 1  
- E is an environment: tells us where in memory a variable is - a mapping: varid -> memory loc, e.g., E(x) = l1 - E = [x: l1, y: l2, ...] - S is a store: tells us what is in the memory - a mapping: memory loc -> value, e.g., S(l1) = 33 - S = [l1 -> 33, l2 -> 7] - S\' = S[100/l2] # S\' is S but with S(l2)= 100 (instead of 7) - Env = E, S

  • Cool values are objects
    • objs are instances of some class
    • X(a1 = l1, ..., a_n = l_n) is a Cool object where
      • X is a class of the object
      • a_i\'s are the attributes of class X (include inherited ones)
      • l_i\'s are the mem location where the value of a_i\'s are stored
    • Special classes (primitive, no attribues)
      • Int(5) : an integer value 5
      • Bool(true): ...
      • String(4, \"Cool\"): a string \"Cool\" of length/size 4
      • void
        • no operations can be performed on it
        • Except for the test isvoid
        • similar idea to NULL

Evaluation judgement

so, E, S |- e: v, S'
  • so is the current value of self , self
  • E : variable mapping -> mem loc
  • S : store mem loc -> val
  • v

  • Examples

        ------------------------------    
        so, E, S | true :  Bool(true), S   
    
    
        i is an integer literal
        -----------------------
        so, E, S | i : Int(i), S
    
    
    
        s is a string literal
        n is the length of s
        -----------------------
        so, E, S | s : String(n, s), S
    
    
         E(id) = l_id
         S(l_id) = v
        -------------------
        so, E, S | id : v, S
    
    
        -----------------------
        so, E, S | self: so, S
    
        so, E, S |- e:  v, S1
        E(id) = l_id
        S2 = S1[v/l_id]
        -----------------------
        so, E, S | id <- e: v, S2   
    
    - note that the value of the assignment expr id <- e is the value of e
        so, E, S |- e1: v1, S1
        so, E, S1|- e2: v2, S2
        ---------------------------
        so, E, S |- e1 + e2 : v1+v2 , S2
    
    - order matters due to side effects (i.e., new store)

        so, E, S |- e1: v1, S1
        so, E, S1|- e2: v2, S2
              ...
        so, E, Sn-1|- en: vn, Sn
        ---------------------------
        so, E, S |- e1, ..., en :  vn, Sn
    
    - note that the value of the block expr e1,..,en is the value of en

  • Example

  • Applying judgement rules to the Cool expression {x <- 7+5; 4;}

so, [x:l], [l->0] |- 7 : Int(7), [l->0]
so, [x:l], [l->0] |- 5 : Int(5), [l->0]
so, [x:l], [l->0] |- 7+5 : Int(12), [l->0]

[l->0](12/l) = [l->12]
----------------------------------------
so,[x:l],[l->0] |- x <- 7 + 5: 12,[l->12]


so,[x:l],[l->12] | 4 : Int(4), [l->12]
----------------------------------------------
so, [x:l], [l->0]|- {x <- 7+5; 4;} , Int(4), [l->12] 
so, E, S | e1: Bool(true), S1
so, E, S1 | e2: v2, S2

-------------------------------------------
so, E, S | if e1 then e2 else e3 fi: v2, S2



so, E, S | e1: Bool(false), S1
so, E, S1 | e3: v3, S3

-------------------------------------------
so, E, S | if e1 then e2 else e3 fi: v3, S3



so, E, S | e1: Bool(false), S1
----------------------------------------
so, E, S | while e1 loop e2 pool: void, S1


so, E, S | e1: Bool(true), S1
so, E, S1| e2: v, S2
so, E, S2| while e1 loop e2 pool: void, S3
----------------------------------------
so, E, S | while e1 loop e2 pool: void, S3

so, E, S | e1: v1, S1
l_new = newloc(S1)
E' = E[l_new/id]
so, E', S1[v1/l_new] |- e2 : v2, S2
------------------------------------------
so, E, S |- let id: T <- e1 in e2:  v2, S2
- newloc(S) -> l where l does not exist in S (malloc, new, ..)

  • informal semantics of new T
  • allocate locations to hold all the attributes of an object of a class T (allocate new object, which involve all of its attributes)
  • set these attributes to their default values
    • D_int = Int(0)
    • D_bool = Bool(false)
    • D_string = String(0, "")
    • D_C = void (for any other class C)
  • evaluate the initializers of the attributes
  • return the newly allocated object

  • for a class C, we write

    class(C)=(a1:T1 <- e1, ... , an:Tn <- en)
    

    • ai's are attributes (also include inherited ones)
    • Ti's the types of attributes ai's
    • e_i are the initializer exprs
    • order: greatest ancestor first
    A{a1, a2}
    B inherits A {b1, b2}
    C inherits B {c1, c2}
    class(C) = (a1:.., a2:.., b1 .., b2, c1, c2)
    
T0 = if(T == SELF_TYPE and the so=X(...)) X else T
class(T0) = (a1:T1 <- e1, ... , an:Tn <- en)
li = newloc(S) for i=1,...,n
v = T0(a1=l1, ..., an=ln)
S1 = S[D_T1/l1, ... , D_TN/ln]
E' = [a1:l1, ... , an:ln]
v,E',S1 |- {a1<-e1;...;an<-en}: vn, S2
--------------------------
so, E, S |- new T: v, S2
  • informal semantics of e0.f(e1,..,en)
  • evaluate the arguments in order e1,...,en
  • evaluate e0 to get the target type X
  • from the type X, get the signature/definition of f
  • Create n new location and env that maps f's arguments to those locations
  • initial the loc's with the actual arguments
  • evaluate the body of f
  • impl(C,f) = (x1,..,xn,ebody)
    • xi's are the formal argments
    • ebody`` is the body of the methodf`
 so, E, S | e1:v1, S1
 so, E, S1 | e2:v2, S2
 ..
 so, E, Sn-1 | en:vn, Sn

 so, E, Sn | e0: v0, Sn+1

 vo = X(a1=l1,...am=lm)
 impl(X, f) = (x1, ..,xn, ebody)
 lxi = newloc(Sn+1) for i=1,..,n
 E' = [a1:l1, ... am:lm][x1/lx1, ..., xn/lxn]
 Sn+2 = Sn+1[v1/lx1,...,vn/lxn]
 v0, E', Sn+2 |- ebody: v, Sn+3
-----------------------------------
 so, E, S | e0.f(e1,..,en): v, Sn+3
class A{
    a
    f(a){
        a
    }
  • Many errors DO NOT exist in the operational semantic analysis
  • they are already caught in previous analyses (e.g., type checking already check that f is a function in some class X)

  • There many runtime errors that type checkers and others do not prevent

  • Division by 0
  • Null dereferencing
  • Array index out of bound
  • Heap overflow
  • .........

  • Many expensive analyses will not be included by the compiler

  • things done by the compiler must be done QUICKLY
  • checking program correctness : will never be done inside a compiler, too expensive