CPS710 Assignment 3

CPS710	Assignment 3
	Parser - Syntax Tree Generation	Due: November 11

Objectives

In this course you will be developing an interpreter for a high-level language, called HL, which manipulates ordered lists.

In this assignment you will be using JavaCC and JJTree to build the component of HL's parser which generates ASTs for HL programs.

This should familiarize you with the use of compiler generators to produce parse trees and abstract syntax trees.

The learning objectives addressed in this assignment are 4Cd, 4Bc, 4Bd, 4Ab. 4De is also inferred from 4Cd.
Some of these learning objectives will be identified in the description of this assignment, and some in the tests.

Preparation

The first three steps of the preparation for this assignment are the same as for the previous assignments, but there is some additional work because you will be using JJTree to generate ASTs in addition to JavaCC.

Create a directory on the moons where you will do your work, with a subdirectory called Tests, for example:
```
mkdir 710A3
cd 710A3
mkdir Tests
```
Copy all the files in the Handouts directory into your working directory. These files are located on the moons at ~cps710/public_html/doc/F2024/A3/Handouts:
```
cp ~cps710/public_html/doc/F2024/A3/Handouts/* .
cp ~cps710/public_html/doc/F2024/A3/Handouts/src/* .
```
Make the four testing scripts t, runtestsa, runtestsb, runtestsc executable by giving them user x perms:
```
chmod u+x t runtests*
```
JJTree Set up:
In this assignment you will be using JJTree. JJTree is a JavaCC preprocessor which generates code to construct ASTs. It is bundled with JavaCC.
JJTree is going to generate many AST classes, one for each type of AST node. These classes will be stored in a subdirectory called AST. You will want to delete all of these classes each time you make modifications to your AST (this has been implemented in the makefile). Therefore it is useful to have them all in the same directory, particularly since you won't be modifying them for this assignment. JJTree does have options to generate packages, but there are open tickets on these features and in packaging in general in the JavaCC forum. Try at your own risk. Instead, you can simply set things up to put all the AST modules in one directory.
All that is needed to set up the workspace for JJTree is to create the subdirectory AST, and move into that directory the SimpleNode.java file shown in class which has a new toString() method to better print tokens:
```
mkdir AST
mv SimpleNode.java AST
```
HL.jjt setup: (explanation only - no steps required)
For this assignment, you will not be able to use your own solution to the second assignment because the grammar has been changed significantly. Instead you will be using the file HL.jjt which is handed out. This section is an explanation of the relationship between HL.jj and HL.jjt.
In terms of content, the grammar that was handed out in assignment 2 has been cleaned up and it is now LL1 except for a few lookaheads, including some syntactic lookaheads that will not slow down the interpretation noticeably, practically speaking given how the language is likely to be used. You can compare this grammar with the one in assignment 2 to see the choices made to resolve the problems you encountered in Assignment 2.
The grammar has also been extended to better incorporate Boolean values where they might be desirable for HL programmers.
In terms of structure, HL.jjt is more or less the same file as HL.jj except that it has a .jjt extension instead of a .jj extension, and a few other changes to support JJTree:
- The following JavaCC/JJTree directives (options) have been added at the beginning of the file:
```
MULTI=true;	// This will generate one AST class for each non-suppressed non-terminal
JJTREE_OUTPUT_DIRECTORY="AST";  // This will put all your AST classes in the AST directory
VISITOR=true;	// This won't be used until the next assignment, but will be needed to make your assignment compile properly
```
  Comment: according to the JJTree documentation, you can also set these as command-line options. However, it is simpler to put everything inside HL.jjt.
- In HL.jjt, the start() production has been broken down into two productions start() and S().
  start() is generic code to return the AST constructed during parsing, and S() is the starting production for HL (which was called start in assignment 2). The code for start() is:
```
SimpleNode start()  #void :
{}
{  S()  { return (SimpleNode) (jjtree.popNode()); }
| < EOF > {throw new ParseException("End of File.");}
}
```
  JJtree keeps a stack of the ASTs it constructs and when parsing is complete, the top of the stack contains the AST that has been created. start() returns this AST to the function that called it in TestHL.java. It also handles end of file.
Run make:
Explanation of the makefile:
- Because you have set the JJTree switch MULTI to true, JJTree will construct ASTs in "multi" mode. i.e. each production generates its own AST subclass of the SimpleNode class for that production. These subclasses will be stored in the AST subdirectory. As you proceed through the assignment, you will end up with different ASTs and therefore some of the early AST classes will be discontinued. For this reason, the makefile cleans up the AST subdirectory each time HL.jjt is recompiled and copies SimpleNode.java into it again.
- The jjtree command is executed before the javacc command because jjtree is a preprocessor for javacc. The sequence of transformations of your HL.jjt file will be:
```
HL.jjt ->[jjtree]-> HL.jj ->[javacc]-> HL.java ->[javac]-> HL.class
```
- ./AST has been added to your java classpath to compile and run the parser.
Make sure that everything compiles properly before proceeding further.

Assignment

By default, JJTree constructs parse trees. In this assignment, you will be slowly converting these parse trees into desired ASTs. Therefore, the outputs will change depending on how far you get in the assignment. This is explained further in the testing section.

For the rest of the assignment, you should modify HL.jjt to construct all your ASTs. In this assignment you will not modify any other files.

Here now are the requirements for the ASTs and their associated learning objectives. They are listed in increasing order of difficulty. Please check the Tests directory for examples of these explanations. The tests in that directory are in the same order as the requirements.

Shorter names

4Cd: Because JavaCC productions are Java function calls, some of the names used for them, such as "return_stat" were lengthened to avoid using Java keywords. Other productions were renamed to be consistent with them. Without changing the production names in the grammar, make sure that the names of all the AST classes generated for statements do not contain the _stat component, but start instead with an upper case letter. For example, when the return_stat() production is parsed, an ASTReturn structure will be created.

Leaves of the ASTs

In this section we will define all the leaves of the ASTs: some are nodified tokens and some are null.

Comparators

4Cd: You should have one ASTcomparison class (where the nodes have 3 subtrees) to deal with all comparisons. The middle node is a nodified comparator token specifying the type of comparison, The names are the same as the token names. For example ASTLESSEQ for the <LESSEQ> token described by the string "<=".

Boolean constants

4Cd:

Absolute value

4Cd: By default the absolute value of an expression is parsed exactly like an expression surrounded by parentheses. Instead, it should be parsed as an ASTabsolute_value with one child.

Nulls

Some productions have optional components. In particular:

the domain in set former expressions
the elif and else clauses of an if statement

4Cd: Rework the grammar of these types of statements to make sure that their ASTs have exactly the same structure with the same number of children whether or not the optional components are there. More specifically:

An ASTset_former always has 3 children: the index, the domain, and the condition, but if the domain is not specified, then the second child needs to be ASTNULL. (Note that at the C level, this will be the first of two children because the index will not be kept until 4Bd is implemented).
An ASTIf always has 3 children: condition, then clause, and else clause, but the else clause is the node ASTNULL when there are no elifs or else. This means that you must restructure your grammar so that elifs are simply represented as ASTIf nodes, without changing the syntax of the language. I.e. programmers will still be able to use elif constructs exactly the same way, but they will be represented internally as ASTIf constructs.

Meaningful tokens

4Bd:

Building AST's with JJTree Handout

Note: This B-level objective is not very hard, and you should therefore implement it as soon as you decide to attempt the B-level objectives, because it will facilitate your testing greatly: once you've programmed this requirement, you will be able to see the name of the identifiers, and the values of the numbers and strings that you input into your tests.

Cleanups

Later sections will describe how boolean and arithmetic operations will be rationalized. This section deals with simpler changes to other structures.

Simple cleanups

4Bc:

4Bc: In a similar vein, the only production in not_clause() that needs to be in an ASTnot is the one that is negated with a "!".

4Bc: You should also simplify the ASTs that represent sets. There should only be two ASTs for sets: ASTset_former for the sets described with set formers and ASTset for sets described simply as a list of elements.

List ASTs

4Bc:

4Bc: By default, ASTvar_decl and ASTset trees have a single child which is a list. These two ASTs should be turned into n-ary nodes whose children are the ones who were previously their grandchildren.

Operations

The remaining requirements are all for learning objective 4Ab. They can all be implemented by manipulating the grammar or using more advanced features of jjtree.

4Ab: You should now rationalize the names and structures of the AST nodes that represent operations as described here:

Identifiers

Names of AST for operations:

AST minimizations:

Operator Precedence:

precedence rules

Highest Precedence

( ), | |

* / %

+ -

<, <=, >, >=, ==, !=, =in, !in

Lowest Precedence

Associativity:

left associative

In the description of the ASTs that follows you will notice some inconsistencies between similar operators: Some of the binary operators &, |, +, -, *, /, % will be represented as binary operations with ASTs with only two subtrees, and others will be represented as n-ary operations with ASTs that can have more than two subtrees.

Real interpreters may be more consistent, but these discrepancies have been introduced for educational purposes, to learn to manipulate grammars and JJTree directives differently to arrive at different AST structures. These ASTs will also end up having to be evaluated differently in the next assignment, providing a further opportunity to work with different types of data structures.

Boolean Operations:

ASTnot is a unary operator. Its node has a single child.
& and | are left-associative n-ary operators, where n>0. Their nodes (ASTand, ASTor respectively) have as many children as are parsed in the boolean expression.
For example a | b | c |d would be represented as [ASTor a b c d]. (here a, b, c, d are placeholders for more complicated expressions involving comparisons.)

Arithmetic Operations:

*, /, and % are left associative binary operators with the same precedence: ASTmul, ASTdiv, and ASTmod have two children, and when there are multiple operations, the subtrees should be on the left.
For example, a * b / c would be represented as [ASTdiv [ASTmul a b] c].
+ and - are unary operators as well as left associative binary operators. However, because + and - have the same precedence, a string consisting of these operations will be considered to be a sum, where the terms are either positive or negative. These sums will be represented with a single n-ary node called ASTsum, where n>0.
- The second to last children of ASTsums will be unary nodes ASTpos and ASTneg representing unary + and -.
- The first child will either be an ASTneg if the first term in the sum was negative, or the term itself otherwise.
For example:
a-b+c would be represented as [ASTsum a [ASTneg b] [ASTpos c]],
-a-b+c would be represented as [ASTsum [ASTneg a] [ASTneg b] [ASTpos c]],
-a would be represented as [ASTsum [ASTneg a]],

Debugging your assignment

JavaCC has a built-in tracing functionality that you may find useful for debugging your work. It is invoked by setting:

  DEBUG_PARSER=true;

In the options section of HL.jj program right at the top where you have also specified the case.

Testing your assignment

Your entire assignment will be tested on the scs moons as described here:

We will first retrieve your submission and, using the makefile in that directory, we will compile it and link it with all the java source files we gave you in the Handouts/src directory: TestHL.java, SimpleNode.java, and all the *Token.java files needed for scanning.
We will then run the runtest script for the version of the assignment that you submitted (runtestsa for A3A.zip, runtestsb for A3B.zip, runtestsc for A3C.zip). You can find these three test files and the associated single test files called t in the Handouts directory

As for previous assignments these scripts runs a battery of tests that build on each other. The test files and answers can be found in the Tests directory.

However, there is an added level of complexity to the testing of this assignment: The ASTs that you build will be very different depending on how far you get in the learning objectives. Therefore, we will run different test scripts based your indication of how far you got in your development. The input files are the same, but there are three types of result files, one for each level as you can see in the structure of the Tests directory. The expected outputs at each level assume that all the learning outcomes have been met at that level. If that is not the case, we will look at your outputs to see how far you have proceeded.

Submitting the assignment

As for the previous assignments, we are using scripts to grade you. Therefore please use exact names listed below, including the name of the zip file.

As explained in the previous section on testing, the output of you program will be very different depending on how far you have gone in the learning objectives. Therefore you will need to indicate in your submission which learning objectives you are attempting to meet to make sure that we can test your submission properly:

Zip up HL.jjt into one of the following:
- A3C.zip if you are only attempting to meet the 4Cd learning objective,
- A3B.zip if you are attempting to meet the 4Cd, 4Bc and 4Bd learning objectives, but not 4Ab.
- A3A.zip if you are attempting to meet all the learning objectives of this assignment.
As explained above, we will be testing your assignment with the extra java files provided in the Handouts directory
Make sure that your solution passes the tests described in the section above on the moons.
Submit electronically A3C.zip, A3B.zip or A3A.zip

References

Chapter 4 of the JavaCC textbook describes JJTree
Building AST's with JJTree
JJTree examples shown during JJTree parsing lecture
JavaCC [tm]: JJTree Reference Documentation
JavaCC [tm]: API Routines contains a comprehensive list of all JJTree classes, methods, and variables available in addition to the JavaCC API routines.

This page is maintained by Sophie Quigley (cps710@cs.torontomu.ca)
Last modified Tuesday, 05-Nov-2024 02:07:10 EST

CPS710

Assignment 3

Parser - Syntax Tree Generation

Due: November 11