CPS710

Assignment 3

Toronto Metropolitan University University

Parser - Syntax Tree Generation

Due: November 11


Objectives

In this course you will be developing an interpreter for a high-level language, called HL, which manipulates ordered lists.

In this assignment you will be using JavaCC and JJTree to build the component of HL's parser which generates ASTs for HL programs.

This should familiarize you with the use of compiler generators to produce parse trees and abstract syntax trees.

The learning objectives addressed in this assignment are 4Cd, 4Bc, 4Bd, 4Ab. 4De is also inferred from 4Cd.
Some of these learning objectives will be identified in the description of this assignment, and some in the tests.

Preparation

The first three steps of the preparation for this assignment are the same as for the previous assignments, but there is some additional work because you will be using JJTree to generate ASTs in addition to JavaCC.
  1. Create a directory on the moons where you will do your work, with a subdirectory called Tests, for example:
    mkdir 710A3
    cd 710A3
    mkdir Tests
    
  2. Copy all the files in the Handouts directory into your working directory. These files are located on the moons at ~cps710/public_html/doc/F2024/A3/Handouts:
    cp ~cps710/public_html/doc/F2024/A3/Handouts/* .
    cp ~cps710/public_html/doc/F2024/A3/Handouts/src/* .
    
  3. Make the four testing scripts t, runtestsa, runtestsb, runtestsc executable by giving them user x perms:
    chmod u+x t runtests*
    

  4. JJTree Set up:

    In this assignment you will be using JJTree. JJTree is a JavaCC preprocessor which generates code to construct ASTs. It is bundled with JavaCC.

    JJTree is going to generate many AST classes, one for each type of AST node. These classes will be stored in a subdirectory called AST. You will want to delete all of these classes each time you make modifications to your AST (this has been implemented in the makefile). Therefore it is useful to have them all in the same directory, particularly since you won't be modifying them for this assignment. JJTree does have options to generate packages, but there are open tickets on these features and in packaging in general in the JavaCC forum. Try at your own risk. Instead, you can simply set things up to put all the AST modules in one directory.

    All that is needed to set up the workspace for JJTree is to create the subdirectory AST, and move into that directory the SimpleNode.java file shown in class which has a new toString() method to better print tokens:

    mkdir AST
    mv SimpleNode.java AST
    

  5. HL.jjt setup: (explanation only - no steps required)

    For this assignment, you will not be able to use your own solution to the second assignment because the grammar has been changed significantly. Instead you will be using the file HL.jjt which is handed out. This section is an explanation of the relationship between HL.jj and HL.jjt.

    In terms of content, the grammar that was handed out in assignment 2 has been cleaned up and it is now LL1 except for a few lookaheads, including some syntactic lookaheads that will not slow down the interpretation noticeably, practically speaking given how the language is likely to be used. You can compare this grammar with the one in assignment 2 to see the choices made to resolve the problems you encountered in Assignment 2.

    The grammar has also been extended to better incorporate Boolean values where they might be desirable for HL programmers.

    In terms of structure, HL.jjt is more or less the same file as HL.jj except that it has a .jjt extension instead of a .jj extension, and a few other changes to support JJTree:

  6. Run make:

    Explanation of the makefile:

    Make sure that everything compiles properly before proceeding further.


Assignment

By default, JJTree constructs parse trees. In this assignment, you will be slowly converting these parse trees into desired ASTs. Therefore, the outputs will change depending on how far you get in the assignment. This is explained further in the testing section.

For the rest of the assignment, you should modify HL.jjt to construct all your ASTs. In this assignment you will not modify any other files.

Here now are the requirements for the ASTs and their associated learning objectives. They are listed in increasing order of difficulty. Please check the Tests directory for examples of these explanations. The tests in that directory are in the same order as the requirements.

Shorter names

4Cd: Because JavaCC productions are Java function calls, some of the names used for them, such as "return_stat" were lengthened to avoid using Java keywords. Other productions were renamed to be consistent with them. Without changing the production names in the grammar, make sure that the names of all the AST classes generated for statements do not contain the _stat component, but start instead with an upper case letter. For example, when the return_stat() production is parsed, an ASTReturn structure will be created.

Leaves of the ASTs

In this section we will define all the leaves of the ASTs: some are nodified tokens and some are null.
  1. Comparators
  2. 4Cd: You should have one ASTcomparison class (where the nodes have 3 subtrees) to deal with all comparisons. The middle node is a nodified comparator token specifying the type of comparison, The names are the same as the token names. For example ASTLESSEQ for the <LESSEQ> token described by the string "<=".

  3. Boolean constants
  4. 4Cd: The two Boolean constants True and False should also be nodified tokens with no children, defined exactly like nodified comparators. For example ASTTRUE for the <TRUE> token described by the string "#1".
  5. Absolute value
  6. 4Cd: By default the absolute value of an expression is parsed exactly like an expression surrounded by parentheses. Instead, it should be parsed as an ASTabsolute_value with one child.

  7. Nulls
  8. Some productions have optional components. In particular:

    In these ASTs, missing components will be expressed with an ASTNULL.

    4Cd: Rework the grammar of these types of statements to make sure that their ASTs have exactly the same structure with the same number of children whether or not the optional components are there. More specifically:

  9. Meaningful tokens
  10. 4Bd: The remaining leaves of the ASTs, are tokens that contain values: IDNUM, IDSET, IDBOOL, NUMBER, STRING. They should be nodified as explained in the Building AST's with JJTree Handout. For these, you will need to construct an AST node class for each of these types of tokens. The name of these node classes should be idnum, idset, idbool, number, and string. Since the tokens will be discarded after the equivalent nodes are constructed, you will need to transfer to these nodes the value of the token that was constructed during scanning. Take a look at the source code for the two classes Node and SimpleNode to see what is in place to support this transfer.

    Note: This B-level objective is not very hard, and you should therefore implement it as soon as you decide to attempt the B-level objectives, because it will facilitate your testing greatly: once you've programmed this requirement, you will be able to see the name of the identifiers, and the values of the numbers and strings that you input into your tests.

Cleanups

Later sections will describe how boolean and arithmetic operations will be rationalized. This section deals with simpler changes to other structures.
  1. Simple cleanups
  2. 4Bc: Since you are building an AST, you should remove useless nodes which only represent syntactic constructs that have no inherent meaning. To start with, nodes for productions that simply group other productions that are fully described on their own should be discarded: S, statement_LL1, statement, identifier, value, term, and simple_term.

    4Bc: In a similar vein, the only production in not_clause() that needs to be in an ASTnot is the one that is negated with a "!".

    4Bc: You should also simplify the ASTs that represent sets. There should only be two ASTs for sets: ASTset_former for the sets described with set formers and ASTset for sets described simply as a list of elements.

  3. List ASTs
  4. 4Bc: Nodes which represent lists (ASTbody, ASTclause, ASTident_list, ASTvalue_list, ASTprint_list, ASTexp_list) should have as many children as are parsed. In other words, they should be n-ary nodes containing the elements of the list. When these lists are empty (when allowed by the grammar), then they have 0 children. Note that ASTprint_list, and ASTexp_list cannot be empty when they are used children of ASTPrint, and ASTFor trees respectively. This should not change.

    4Bc: By default, ASTvar_decl and ASTset trees have a single child which is a list. These two ASTs should be turned into n-ary nodes whose children are the ones who were previously their grandchildren.

Operations

The remaining requirements are all for learning objective 4Ab. They can all be implemented by manipulating the grammar or using more advanced features of jjtree.

4Ab: You should now rationalize the names and structures of the AST nodes that represent operations as described here:

In the description of the ASTs that follows you will notice some inconsistencies between similar operators: Some of the binary operators &, |, +, -, *, /, % will be represented as binary operations with ASTs with only two subtrees, and others will be represented as n-ary operations with ASTs that can have more than two subtrees.

Real interpreters may be more consistent, but these discrepancies have been introduced for educational purposes, to learn to manipulate grammars and JJTree directives differently to arrive at different AST structures. These ASTs will also end up having to be evaluated differently in the next assignment, providing a further opportunity to work with different types of data structures.


Debugging your assignment

JavaCC has a built-in tracing functionality that you may find useful for debugging your work. It is invoked by setting:
  DEBUG_PARSER=true;
In the options section of HL.jj program right at the top where you have also specified the case.

Testing your assignment

Your entire assignment will be tested on the scs moons as described here:
  1. We will first retrieve your submission and, using the makefile in that directory, we will compile it and link it with all the java source files we gave you in the Handouts/src directory: TestHL.java, SimpleNode.java, and all the *Token.java files needed for scanning.

  2. We will then run the runtest script for the version of the assignment that you submitted (runtestsa for A3A.zip, runtestsb for A3B.zip, runtestsc for A3C.zip). You can find these three test files and the associated single test files called t in the Handouts directory
As for previous assignments these scripts runs a battery of tests that build on each other. The test files and answers can be found in the Tests directory.

However, there is an added level of complexity to the testing of this assignment: The ASTs that you build will be very different depending on how far you get in the learning objectives. Therefore, we will run different test scripts based your indication of how far you got in your development. The input files are the same, but there are three types of result files, one for each level as you can see in the structure of the Tests directory. The expected outputs at each level assume that all the learning outcomes have been met at that level. If that is not the case, we will look at your outputs to see how far you have proceeded.

Submitting the assignment

As for the previous assignments, we are using scripts to grade you. Therefore please use exact names listed below, including the name of the zip file.

As explained in the previous section on testing, the output of you program will be very different depending on how far you have gone in the learning objectives. Therefore you will need to indicate in your submission which learning objectives you are attempting to meet to make sure that we can test your submission properly:

References


This page is maintained by Sophie Quigley (cps710@cs.torontomu.ca)
Last modified Tuesday, 05-Nov-2024 02:07:10 EST