PA4: Semantic Analyzer
Goal
For this assignment you will write a semantic analyzer which type checks Cool programs. Among other things, this involves traversing the abstract syntax tree and the class hierarchy. You will reject all Cool programs that do not comply with the Cool type system.
You will also write additional code to deserialize the AST produced by the parser stage and to serialize the class map, implementation map, parent map, and annotated AST produced by your semantic analysis.
Specification
You will turn in an Ocaml program that takes a single command-line argument (e.g.,file.cl-ast
).
That argument will be an ASCII text Cool abstract syntax tree file (as described in PA3).
Your program must either indicate that there is an error in the input (e.g., a
type error) or emit file.cl-type
, a serialized Cool abstract
syntax tree, class map, implementation map, and parent map.
If your program is called checker
, invoking checker file.cl-ast
should yield the same output as cool --type file.cl
.
Error Reporting
To report an error, write the string:
ERROR: line_number: Type-Check: message
to standard output and terminate the program. You may write whatever you want in the message, but it should be fairly indicative. Example erroneous input:
1 2 3 4 5 |
|
ERROR: 3: Type-Check: arithmetic on String Int instead of Ints
Line Number Error Reporting
The typing rules do not directly specify the line numbers on which errors are to be reported. The Cool reference compiler uses these guidelines (possibly surprising ones are italicized):
-
Errors related to parameter-less method main in
class Main
: always line 0 -
Inheritance cycle: always line 0
-
Other inheritance type problem: inherited type identifier location
-
self
orSELF_TYPE
used in wrong place: self (resp. SELF_TYPE) identifier (resp. type) location -
Redefining a feature: (second) feature location
-
Redefining a formal or class: (second) identifier location
-
Other attribute problems: attribute location
-
Redefining a method and changing types: (second) type location
-
Other problems with redefining a method: method location
-
Method body type does not conform: method name identifier location
-
Attribute initializer does not conform: attribute name identifier location
-
Errors with types of arguments to relational/arithmetic operations: location of relational/arithmetic operation expression
-
Errors with types of
while / if
subexpression(s): location of (enclosing)while
orif
expression (not the location of the conditional expression) -
Errors with
case
expression (e.g., lub): location ofcase
expression -
Errors with conformance in
let
: location oflet
expression (not location of initializer) -
Errors in blocks: location of (beginning of) block expression
-
Errors in actual arguments: location of method invocation expression (not the location of any particular actual argument)
-
Assignment does not conform: assignment expression location (not right-hand-side location)
-
Unknown identifier: location of identifier
-
Unknown method: location of method name identifier
-
Unknown type: location of type
Remember that you do not have to match the English prose of the reference compiler's error messages at all. You just have to get the line number right.
Semantic checks are unordered -- if a program contains two or more errors, you may indicate whichever you like. You can infer from this that all of our test cases will contain at most one error.
The .cl-type
File Format
If there are no errors in file.cl-ast
your program should create
file.cl-type
and serialize the class map, implementation map, parent
map, and annotated AST to it.
The class and implementation maps are described in the Cool Reference Manual.
A .cl-type
file consists of four sections:
- The class map.
- The implementation map.
- The parent map.
- The annotated AST.
Simply output the four sections in order, one after the other.
We will now describe exactly what to output for the class and implementation maps. The general idea and notation (one string per line, recursive descent) are the same as in PA3.
Output
-
The Class Map
- Output class_map
\n
. - Output the number of classes and then
\n
. - Output each class in turn (in ascending alphabetical order):
- Output the name of the class and then
\n
. - Output the number of attributes and then
\n
. - Output each attribute in turn (in order of appearance, with
inherited attributes from a superclass coming first):
- Output no_initializer
\n
and then the attribute name\n
and then the type name\n
. - or Output initializer
\n
and then the attribute name\n
and then the type name\n
and then the initializer expression.
- Output no_initializer
- Output the name of the class and then
- Output class_map
-
The Implementation Map
- Output implementation_map
\n
. - Output the number of classes and then
\n
. - Output each class in turn (in ascending alphabetical order):
- Output the name of the class and then
\n
. - Output the number of methods for that class and then
\n
. - Output each method in turn (in order of appearance, with
inherited or overridden methods from a superclass coming
first; internal methods are defined to appear in ascending
alphabetical order):
- Output the method name and then
\n
. - Output the number of formals and then
\n
. - Output each formal's name only:
- Output the name and then
\n
- Output the name and then
- If this method is inherited from a parent class and not
overriden, output the name of the ultimate parent class
that defined the method body expression and then
\n
. Otherwise, output the name of the current class and then\n
. - Output the method body expression.
- Output the method name and then
- Output the name of the class and then
- Output implementation_map
-
The Parent Map
- Output parent_map
\n
. - Output the number of parent-child inheritance relations and then
\n
. This number is equal to the number of classes minus one (sinceObject
has no parent). - Output each child class in turn (in ascending alphabetical
order):
- Output the name of the child class and then
\n
. - Output the name of the child class's parent and then
\n
.
- Output the name of the child class and then
- Output parent_map
-
The Annotated AST
- With two exceptions, the annotated AST format is identical to the normal AST from PA3.
-
The first change involves expressions. To output an
Expression
:- Output the line number of the expression and then a newline (as in PA3).
- Output the name of type associated with the expression and
then a newline. For example, the expression
3+x
is associated with the typeInt
. This is new to PA4. It should be done for PA4, but not for PA4c. - Output the name of the expression and then a newline and then any subparts.
-
The second change is a new kind of expression, internal, used to represent the bodies of predefined methods. Internal expressions are those that are handled by the run-time system -- you might think of them as part of the standard library. You output Internal Expressions (including the type annotation, as above) as follows:
0 \n type \n internal \n Class.method \n
The valid kinds of internal expressions (i.e., the values for Class.method) are:
IO.in_int IO.in_string IO.out_int IO.out_string Object.abort Object.copy Object.type_name String.concat String.length String.substr
They are formally defined in the Cool Reference Manual.
Note that you must output information about all classes and methods defined in the program as well as all base classes (and their methods). Do not just print out "classes actually used" or "methods actually called" or something like that. Output all classes and methods -- no optimizations or shortcuts!
Detailed .cl-type Example
Now that we've formally defined the output specification, we can present a worked example. Here's the example input we will consider:
Example input
1 2 3 4 5 6 |
|
Resulting .cl-type
class map output with comments
.cl-type class map | comment |
---|---|
class_map | |
6 | number of classes |
Bool | note: includes predefined base classes |
0 | |
IO | |
0 | |
Int | |
0 | |
Main | |
1 | our Main has 1 attribute |
initializer | |
my_attribute | named "my_attribute" |
Int | with type Int |
2 | initializer expression line number |
Int | initializer expression type (see above: this is an expression annotated with a type) |
-- do not emit these expression type annotations for the PA4c Checkpoint! | |
integer | initializer expression kind |
5 | which integer constant is it? |
Object | |
0 | |
String | |
0 |
Resulting .cl-type
implementation map output with comments
.cl-type implementation map | comment |
---|---|
implementation_map | |
6 | six classes |
Bool | first is Bool |
3 | - it has three methods |
abort | - first is abort() |
0 | -- abort has 0 formal arguments |
Object | -- name of parent class from which Bool inherits abort() |
0 | -- abort's body expression starts on line 0 |
Object | -- abort's body expression has type Object |
internal | -- abort's body is an internal kind of expression (i.e., a system call; see above) |
Object.abort | -- extra detail on abort's body expression |
copy | - second of Bool's three methods is copy() |
0 | -- copy has 0 formal arguments |
Object | -- name of parent class from which Bool inherits copy() |
0 | -- copy's body expression starts on line 0 |
SELF_TYPE | -- copy's body expression has type SELF_TYPE |
internal | -- copy's body is an internal kind of expression (i.e., a system call; see above) |
Object.copy | -- extra detail on copy's body expression |
... many lines skipped ... | |
Main | another class is Main |
8 | - it has 8 methods |
... many lines skipped ... | |
main | - one of Main's methods is main() |
0 | -- main has 0 formal arguments |
Main | -- the name of the class where Main.main() is defined |
4 | -- the body expression of Main.main starts on line 4 |
SELF_TYPE | -- the body expression of Main.main has type SELF_TYPE |
self_dispatch | -- the body of Main.main() is a self_dispatch kind of expression |
... many lines skipped ... |
Finally, the resulting .cl-type parent map output with comments
.cl-type parent map | comment |
---|---|
parent_map | |
5 | there are five classes with parents (Object is the sixth class) |
Bool | Bool's parent ... |
Object | ... is Object. |
IO | IO's parent ... |
Object | ... is Object. |
Int | Int's parent ... |
Object | ... is Object. |
Main | Main's parent ... |
IO | ... is IO. |
String | String's parent ... |
Object | ... is Object. |
Writing the code to output a .cl-type
text file given an AST may
take a bit of time but it should not be difficult; our reference
implementation does it in 35 lines and cleaves closely to the structure
given above. Reading in the AST is similarly straightforward; our
reference implementation does it in 171 lines.
Commentary
Tests
- You can use the following tests to check your implementation.
- You can do basic testing as follows:
$ cool --parse file.cl
$ cool --out reference --type file.cl
$ my-checker file.cl-ast
$ diff -b -B -E -w file.cl-type reference.cl-type
- You should implement all of the typing rules in the Cool Reference
Manual. There are also a number of other rules and corner cases you have
to check (e.g., no class can inherit from
Int
, you cannot redefine a class, you cannot have an attribute named self, etc.). They are sprinkled throughout the manual. Check everything you possibly can.
Video Guides
Turn-In and Grading
What to Turn In
PA4c
PA4c is a checkpoint for PA4 to encourage you to start early.
You must turn a zip file contains a single main.ml
file.
PA4c is essentially an early version of PA4
that does the following:
-
Reads in the .cl-ast file given as a command-line argument.
- You do not need to use a parser generator to read in the
.cl-ast
file -- its format was specifically chosen to make it easy to read with just some mutually-recursive procedures. It should take you (much) less than 150 lines to read in the.cl-ast
file.
- You do not need to use a parser generator to read in the
-
Does every bit of typechecking and semantic analysis possible without typechecking expressions.
- Thus you should not annotate types in initializer expressions in the class map.
-
Prints out error messages as normal.
-
Outputs only the class map to
.cl-type
if there are no errors.- You can use the
--class-map
command-line argument to get the reference compiler to spit out the class map after typechecking (for comparison).
- You can use the
Thus you should build the class hierarchy and check everything related to that. For example:
- Check to see if a class inherits from
Int
(etc.). - Check to see if a class inherits from an undeclared class.
- Check for cycles in the class hierarchy.
- Check for duplicate method or attribute definitions in the same class.
- Check for a child class that redefines a parent method but changes the parameters.
- Check for a missing method main in class Main.
- Check for
self
andSELF_TYPE
mistakes in classes and methods. - This list is not exhaustive -- read the Cool Reference Manual carefully and find everything you might check for without typechecking expressions.
- Basically, you'll look at classes, methods and attibutes (but not method bodies).
Q: What's the exact list of errors I have to check for in PA4c?
- A: No such list is provided! Part of the assignment is thinking up all possible checks that do not involve expressions.
PA4
You must turn in a zip file containing these files:
readme.txt
-- a plain ASCII text file describing your design decisions. In addition, answer the following questions in yourreadme.txt
:- What are some challenging parts for this assignment? What did you do to solve them?
- Suggestions for both the instructor and future students for this assignment?
main.ml
-- your implementation
Grading Rubric
PA4-C/PA4 Grading (out of 80 points)
- 70 points -- for autograder tests
- Each missed test removes points, to a minimum of 0/70, even if there are more tests than total points.
- 1 points -- for turning in something that compiles/runs for PA4-C
- 1 -- if you turn in something that compiles/runs for PA4-C
- 0 -- if you did not turn in anything or turn in something that does not compile/run
- 5 points -- for a clear description in your
README
- 5 -- thorough discussion of design decisions (e.g., handling of the class hierarchy, case and new and dispatch) and choice of test cases; a few paragraphs of coherent English sentences should be fine
- 2 -- vague or hard to understand; omits important details
- 0 -- little to no effort, or submitted an RTF/DOC/PDF file instead of plain TXT
- 4 point -- for code cleanliness
- 4-- code is mostly clean and well-commented
- 2 -- code is sloppy and/or poorly commented in places
- 0 -- little to no effort to organize and document code