PA3: Parser
Goal
For this assignment you will write a parser using a parser generator. You will describe the Cool grammar in an appropriate input format and the parser generator will generate actual code (e.g., in OCaml). You will also write additional code to unserialize the tokens produced by the lexer stage and to serialize the abstract syntax tree produced by your parser.
Specification
You will turn in an OCaml program that takes a single command-line argument (e.g., file.cl-lex). That argument will be an ASCII text Cool tokens file (as described in PA2). The cl-lex file will always be well-formed (i.e., there will be no syntax errors in the cl-lex file itself). However, the cl-lex file may describe a sequence of Cool tokens that does not form a valid Cool program.
Your program must either indicate that there is an error in the Cool program described by the cl-lex file (e.g., a parse error in the Cool file) or emit file.cl-ast, a serialized Cool abstract syntax tree. Your program's main parser component must be constructed by a parser generator. The "glue code" for processing command-line arguments, unserializing tokens and serializing the resulting abstract syntax tree should be written by hand. If your program is called parser, invoking parser file.cl-lex should yield the same output as cool --parse file.cl. Your program will consist of a number of OCaml files, a number of Python files, or a number of Ruby files.
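As a rough sketch of the hand-written glue code, the output file name can be derived from the command-line argument and the tokens file can be read line by line. Both helper names below (`ast_path_of_lex_path`, `read_lines`) are hypothetical and not prescribed by the assignment; the sketch assumes the argument ends in `.cl-lex`.

```ocaml
(* Hypothetical helper: map "file.cl-lex" to "file.cl-ast".
   If the argument lacks the ".cl-lex" suffix, ".cl-ast" is simply appended. *)
let ast_path_of_lex_path (lex_path : string) : string =
  if Filename.check_suffix lex_path ".cl-lex"
  then Filename.chop_suffix lex_path ".cl-lex" ^ ".cl-ast"
  else lex_path ^ ".cl-ast"

(* Hypothetical helper: read every line of a text file into a list,
   in the order the lines appear in the file. *)
let read_lines (path : string) : string list =
  let ic = open_in path in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> close_in ic; List.rev acc
  in
  loop []
```

A main function would typically call `read_lines Sys.argv.(1)`, feed the resulting tokens to the generated parser, and write the serialized tree to `ast_path_of_lex_path Sys.argv.(1)`.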
You will not write the parser but instead use ocamlyacc to generate one. You will just provide the context-free grammar rules and ocamlyacc will generate the appropriate parser.
Line Numbers
The line number for an expression is the line number of the first token that is part of that expression. Example:
[Example Cool source spanning lines 5 through 8; listing not preserved.]
The while expression is on line 5, the x <= 99 expression is on line 5, the 99 expression is on line 6, and the x <- x + 1 and x + 1 expressions are on line 7. The line numbers for tokens are present in the serialized token .cl-lex file.
Your parser is responsible for keeping track of the line numbers (both for the output syntax tree and for error reporting).
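One way to follow the first-token rule is to store a line number in every expression node and have compound nodes inherit it from their leftmost subexpression. The type and constructor names below are hypothetical; this is a minimal sketch, not the reference implementation's representation.

```ocaml
(* Hypothetical AST fragment: every expression node carries a line number. *)
type exp = { line : int; kind : exp_kind }
and exp_kind =
  | Integer of int
  | Identifier of string
  | Le of exp * exp
  | Plus of exp * exp

(* For a leaf, the line number comes straight from the token. *)
let mk_int (line : int) (n : int) : exp = { line; kind = Integer n }
let mk_id (line : int) (name : string) : exp = { line; kind = Identifier name }

(* A binary expression begins at its left operand's first token,
   so it inherits the left operand's line number. *)
let mk_le (x : exp) (y : exp) : exp = { line = x.line; kind = Le (x, y) }
let mk_plus (x : exp) (y : exp) : exp = { line = x.line; kind = Plus (x, y) }
```

With these helpers, `mk_le (mk_id 5 "x") (mk_int 6 99)` is recorded on line 5 even though its right operand sits on line 6, matching the x <= 99 case described above.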
Error Reporting
To report an error, write the string:
ERROR: line_number: Parser: message
to standard output and terminate the program. You may write whatever you want in the message, but it should be fairly indicative. Example erroneous input:
[Example Cool source with a syntax error on line 70; listing not preserved.]
Example error report output:
ERROR: 70: Parser: syntax error near +
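A minimal sketch of the reporting step is shown here. The function names are hypothetical, and the exit code is an assumption: the text above only requires writing the string to standard output and terminating.

```ocaml
(* Build the report in the required "ERROR: line_number: Parser: message" shape. *)
let format_error (line : int) (message : string) : string =
  Printf.sprintf "ERROR: %d: Parser: %s" line message

(* Print the report to standard output and stop.
   The exit code 1 is an assumption, not something the assignment specifies. *)
let report_error (line : int) (message : string) : 'a =
  print_endline (format_error line message);
  exit 1
```

Keeping the formatting separate from the printing makes the exact string easy to check in isolation.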
The .cl-ast File Format
If there are no errors in file.cl-lex your program should create file.cl-ast and serialize the abstract syntax tree to it.
The general format of a .cl-ast file follows the Cool Reference Manual Syntax chart. Basically, we do a pre-order traversal of the abstract syntax tree, writing down every node as we come to it.
We will now describe exactly what to output for each kind of node. You can view this as specifying a set of mutually-recursive tree-walking functions.
The notation superclass:identifier means "output the superclass using the rule (below) for outputting an identifier". The notation "\n" means "output a newline".
Output
- To Output An AST. A Cool AST is a list of classes. Output the list of classes.
- To Output A List (of classes, or features, or whatever). Output the number of elements, then a newline, then output each list element in turn.
- To Output A Class. Output the class name as an identifier. Then output either:
  - no_inherits \n
  - inherits \n superclass:identifier
  Then output the list of features.
- To Output An Identifier. Output the source-file line number, then a newline, then the identifier string, then a newline.
- To Output A Feature. Output the name of the feature and then a newline and then any subparts, as given below:
  - attribute_no_init \n name:identifier type:identifier
  - attribute_init \n name:identifier type:identifier init:exp
  - method \n name:identifier formals-list \n type:identifier body:exp
- To Output A Formal. Output the name as an identifier on a line and then the type as an identifier on a line.
- To Output An Expression. Output the line number of the expression and then a newline. Output the name of the expression and then a newline and then any subparts, as given below:
  - assign \n var:identifier rhs:exp
  - dynamic_dispatch \n e:exp method:identifier args:exp-list
  - static_dispatch \n e:exp type:identifier method:identifier args:exp-list
  - self_dispatch \n method:identifier args:exp-list
  - if \n predicate:exp then:exp else:exp
  - while \n predicate:exp body:exp
  - block \n body:exp-list
  - new \n class:identifier
  - isvoid \n e:exp
  - plus \n x:exp y:exp
  - minus \n x:exp y:exp
  - times \n x:exp y:exp
  - divide \n x:exp y:exp
  - lt \n x:exp y:exp
  - le \n x:exp y:exp
  - eq \n x:exp y:exp
  - not \n x:exp
  - negate \n x:exp
  - integer \n the_integer_constant \n
  - string \n the_string_constant \n
  - identifier \n variable:identifier (Note that this is not the same as the integer and string cases above)
  - true \n
  - false \n
- To Output A let Expression. (Output the line number, as usual.) Output let \n. Then output the binding list. To output a binding, do either:
  - let_binding_no_init \n variable:identifier type:identifier
  - let_binding_init \n variable:identifier type:identifier value:exp
  Finally, output the expression that is the body of the let.
- To Output A case Expression. (Output the line number, as usual.) Output case \n. Then output the case expression. Then output the case-elements list.
- To output a case-element, output the variable as an identifier, then the type as an identifier, then the case-element-body as an exp.
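The rules above map naturally onto a handful of mutually-recursive functions. The following is a sketch for a small subset of expression kinds only; the type names, constructor names, and the choice to write into a `Buffer.t` are all assumptions made for illustration, not the reference implementation.

```ocaml
(* Hypothetical AST fragment; names are illustrative only. *)
type identifier = int * string            (* line number, name *)
type exp = { line : int; kind : exp_kind }
and exp_kind =
  | Dynamic_dispatch of exp * identifier * exp list
  | New of identifier
  | Plus of exp * exp
  | Integer of int
  | Identifier of identifier

(* To Output A List: element count, newline, then each element in turn. *)
let output_list (buf : Buffer.t) (output_elem : Buffer.t -> 'a -> unit)
    (elems : 'a list) : unit =
  Buffer.add_string buf (Printf.sprintf "%d\n" (List.length elems));
  List.iter (output_elem buf) elems

(* To Output An Identifier: line number, newline, name, newline. *)
let output_identifier (buf : Buffer.t) ((line, name) : identifier) : unit =
  Buffer.add_string buf (Printf.sprintf "%d\n%s\n" line name)

(* To Output An Expression: line number, kind name, then subparts in order. *)
let rec output_exp (buf : Buffer.t) (e : exp) : unit =
  Buffer.add_string buf (Printf.sprintf "%d\n" e.line);
  match e.kind with
  | Dynamic_dispatch (receiver, meth, args) ->
      Buffer.add_string buf "dynamic_dispatch\n";
      output_exp buf receiver;
      output_identifier buf meth;
      output_list buf output_exp args
  | New cls ->
      Buffer.add_string buf "new\n";
      output_identifier buf cls
  | Plus (x, y) ->
      Buffer.add_string buf "plus\n";
      output_exp buf x;
      output_exp buf y
  | Integer n ->
      Buffer.add_string buf (Printf.sprintf "integer\n%d\n" n)
  | Identifier id ->
      Buffer.add_string buf "identifier\n";
      output_identifier buf id

(* Convenience wrapper: serialize one expression to a string. *)
let exp_to_string (e : exp) : string =
  let buf = Buffer.create 64 in
  output_exp buf e;
  Buffer.contents buf
```

The remaining node kinds (classes, features, formals, let, case, and the other expressions) follow the same pattern: write the tag, then call the appropriate function on each subpart in the order listed.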
An Example
Example input
[Example Cool source, lines 1 through 8; listing not preserved.]
Corresponding .cl-ast output with comments
1 -- number of classes
3 -- line number of class name identifier
List -- class name identifier
no_inherits -- does this class inherit?
1 -- number of features
method -- what kind of feature?
6 -- line number of method name identifier
cons -- method name identifier
1 -- number of formal parameters
6 -- line number of formal parameter identifier
i -- formal parameter identifier
6 -- line number of formal parameter type identifier
Int -- formal parameter type identifier
6 -- line number of return type identifier
List -- return type identifier
7 -- line number of body expression
dynamic_dispatch -- kind of body expression
7 -- line number of dispatch receiver expression
new -- kind of dispatch receiver expression
7 -- line number of new-class identifier
Cons -- new-class identifier
7 -- line number of dispatch method identifier
init -- dispatch method identifier
2 -- number of arguments in dispatch
7 -- line number of first argument expression
identifier -- kind of first argument expression
7 -- line number of the identifier
i -- what is the identifier?
7 -- line number of second argument expression
identifier -- kind of second argument expression
7 -- line number of the identifier
self -- what is the identifier?
The .cl-ast format is quite verbose, but it is particularly easy for later stages (e.g., the type checker) to read in again without having to go through all of the trouble of "actually parsing". It will also make it particularly easy for you to notice where things are going awry if your parser is not producing the correct output.
Writing the code to output a .cl-ast text file given an AST may take a bit of time but it should not be difficult; our reference implementation (in OCaml) does it in 116 lines and cleaves closely to the structure given above.
Parser Generators
For this assignment, you must use a parser generator.
OCaml has an ocamlyacc parser generator.
Commentary
Tests
- You can use the following tests to check your implementation.
- You can do basic testing with something like the following:
$ cool --lex file.cl
$ cool --out reference --parse file.cl
$ my-parser file.cl-lex
$ diff -b -B -E -w file.cl-ast reference.cl-ast
You may find the reference Cool compiler's --unparse option useful for debugging your .cl-ast files.
Hint
If you are failing every negative test case, it is likely that you are not handling cross-platform compatibility correctly on all of your inputs and outputs.
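The hint does not say which cross-platform issue is meant. One common candidate is line endings: a tokens file produced on another platform may end its lines with "\r\n" rather than "\n". Assuming that is the problem, a small hypothetical helper such as `strip_cr` can normalize each input line before it is interpreted.

```ocaml
(* Drop one trailing carriage return, if present, so that lines ending in
   "\r\n" and lines ending in "\n" are treated identically after input_line. *)
let strip_cr (line : string) : string =
  let n = String.length line in
  if n > 0 && line.[n - 1] = '\r' then String.sub line 0 (n - 1) else line
```

Applying the helper to every line read from the .cl-lex file keeps token names and lexemes free of stray "\r" characters that would otherwise leak into error messages and the .cl-ast output.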
Turn-In and Grading
What To Turn In
PA3
You must turn in a zip file containing these files:
- readme.txt: a plain ASCII text file describing your design decisions. In addition, answer the following questions in your readme.txt:
  - What are some challenging parts for this assignment? What did you do to solve them?
  - Suggestions for both the instructor and future students for this assignment?
- source files:
  - parser.ml
  - parser.mly
- See an example of a readme.txt file
- If you work in a team, then list the name of your other team member in the readme.txt.
Grading Rubric
PA3 Grading (out of 50 points)
- 41 points : for autograder tests (-1 point per incorrect test, minimum score of 0)
- Each missed test removes points, to a minimum of 0, even if there are more tests than total points.
- 5 points : for a clear description in your README
- 5 : thorough discussion of design decisions (e.g., the handling of let) and answering given questions; a few paragraphs of coherent English sentences should be fine
- 2 : vague or hard to understand; omits important details
- 0 : little to no effort
- 4 points : for code cleanliness
- 4 : code is mostly clean and well-commented
- 2 : code is sloppy and/or poorly commented in places
- 0 : little to no effort to organize and document code
- -5 points : if you neglected to include the grammar definition (e.g., .mly file)
- -5 : only submitted machine-generated parser; failed to submit grammar from which parser was generated