|
Assignment 2
|
|
|
Parser - Grammar Recognizer
|
Due: October 20
|
SUBMISSION INSTRUCTIONS UNDER CONSTRUCTION - REMAINDER IS READY
Objectives
In this course you will be developing an interpreter for a high-level
language, called HL, which manipulates lists and strings.
In this assignment you will be using JavaCC
to build the component of HL's parser that recognises syntactically correct HL programs.
This should familiarize you with the use of compiler generators
and the recursive descent parsing process, as well as the manipulation of grammars into an LL(k) form.
The
learning objectives addressed in this assignment are
4Cb, 3Bd (this will include 3De), 3Be, 4Bb, 3Ad, 4Aa.
- The 3Bd, 3Be, and 3Ad learning objectives are grammar related and you can meet them by
modifying the HL grammar to make it LL(k) and documenting the changes you have made.
- The 4Cb, 4Bb, and 4Aa learning objectives are both grammar and parsing related.
They all involve using a tool (in this case JavaCC)
to build a parser that recognizes increasingly complex syntactic components of the HL language.
The B and A parser learning objectives are related to the grammar learning objectives because they involve
using JavaCC to overcome some of the problems with the grammar of HL.
Some of these learning objectives will be identified in the description of this assignment,
and some in the tests.
Assignment Stages
In this assignment you will manipulate the
Grammar for the HL language,
shown here in a variation of
Backus-Naur form (BNF),
to try to make it LL(k) while still accepting the same language.
This means that it must be unambiguous, not left recursive,
and with at most k lookaheads. K should be as small as possible.
Since you will ultimately build an interpreter for HL rather than a compiler,
the starting non-terminal for this grammar, called start, is either a statement, an expression, or a condition
rather than an entire program.
You can see this in the productions for start.
It is recommended that you do this assignment in two stages:
- First manipulate the BNF grammar directly to try to make it as close to LL(k) as possible.
- Then convert this grammar to a recursive descent parser HL.jj in the format used by JavaCC
and continue manipulating it with the support of JavaCC.
You will need to document all the changes you have made to the grammar (in both stages).
These explanations should be recorded as comments in HL.jj as explained underneath.
As you work on this assignment, you will need to decide when to stop working on the BNF grammar
and switch to working on the JavaCC parser instead.
Pros and cons of BNF grammar manipulation vs. JavaCC parser manipulation
It is up to you to decide when you want to switch from BNF manipulation to JavaCC manipulation.
Here are the issues related to this decision:
- The grammar file is a short text file, and that makes it easier to see the entire grammar
you are working on.
This is useful because many of the problems with the grammar
involve productions interacting with each other in a way that is not necessarily obvious,
particularly if you have to look through information spread out on many pages (paper or electronic).
Also, editing the grammar in this form is a lot faster than editing JavaCC code where each non-terminal
is implemented with its own function.
- On the other hand, JavaCC will provide some assistance identifying problems in the grammar
by issuing warnings about problematic locations in the grammar.
Interpreting these warnings is not always easy, and resolving them properly is not always straightforward,
but the JavaCC assistance is definitely useful, at the very least by showing where there are some problems.
Assignment Preparation
Just like you did for Assignment 1, you should
- Create a directory on the moons where you will do your work, with a subdirectory called Tests, for example:
mkdir 710A2
cd 710A2
mkdir Tests
- Copy all the files in the
Handouts directory
into your working directory.
These files are located on the moons at ~cps710/public_html/doc/F2024/A2/Handouts:
cp ~cps710/public_html/doc/F2024/A2/Handouts/* .
cp ~cps710/public_html/doc/F2024/A2/Handouts/src/* .
- Make the two testing scripts t and runtests executable by giving them user x perms:
chmod u+x t runtests
Notice that solutions to A1 are provided in the Handouts/src directory:
HL.jj, Token.java, and the other Token java files.
These solutions are only for the B and C learning objectives of the scanner.
These solutions are provided to you for the sole purpose of integrating them in
your cps710 assignments this semester, and are copyrighted accordingly.
If you prefer, you can use your own solution to A1 as a starting point for A2.
Whatever you decide, be aware that the testing scripts for the rest of the assignments
this semester will be using the java files to handle tokens handed out here.
Assignment
Grammar (Module 3) Learning Objectives
To meet these learning objectives, you will need to manipulate your grammar
to remove ambiguities and handle components of the grammar that are not left factored.
You will also need to explain what you have done.
Here is the documentation that you need to include in your HL.jj file:
- 3Bd: Right before each set of productions that were not initially left-factored,
add a comment that starts with "//3Bd:"
and explains what prefix some of the productions had in common.
If you decide not to left factor them, also explain why.
- 3Be: Right before each local lookahead, add a comment that starts with "//3Be:"
and explains which productions will be distinguished by this lookahead
- 3Be: On the first line of your HL.jj file,
add a comment that starts with "//3Be:"
and states what k is for this LL(k) grammar
if you ignore syntactic lookaheads.
- 3Ad: Right before each set of productions that you have reworked to remove an ambiguity,
add a comment that starts with "//3Ad:"
and explains what ambiguity problem you are solving, and how.
Parsing (Module 4) Learning Objectives
To meet these learning objectives, you will need to use JavaCC to parse HL programs.
Modify the HL.jj program you developed in your first assignment
to make it accept the HL grammar that you have been modifying.
- You will first need to convert the grammar in BNF form to the grammar format used by JavaCC.
A
sed file called
bnftojj.sed is provided in the
Handouts directory
to support the initial mechanical component of this conversion process. To use this file, type
sed -f bnftojj.sed < HLgrammar.txt > grammar
and then append the grammar file just created to the HL.jj handout (or your own from your A1 directory):
cat ~cps710/public_html/doc/Now/A2/Handouts/src/HL.jj grammar > HL.jj
(You may find it interesting to look at
bnftojj.sed
to see another application of the power of regular expressions.
Sed applies all the commands in bnftojj.sed in order to each line of the input file.
The Wikipedia page on sed
provides a summary of the syntax.
The ampersand character & used in the second component of the substitute command represents
the whole string matched by the regular expression.)
The sed script is very helpful but slightly limited (particularly when dealing with ors inside parentheses)
because it works with regular expressions, so you will need to clean up some of its work manually in a few places.
To do so, just compare the translation agains the original bnf grammar.
- The structure of your start production, which should be on the first line,
will be a little different from all the other productions
because it will need to handle EOF which generates an exception to stop the program.
Make sure that it ends up looking like this:
void start () throws ParseException :
{}
{
// whatever expression was on the RHS of the start production
| < EOF > {throw new ParseException("End of File.");}
}
- Compile and link your grammar with the provided java files, in particular
TestHL.java.
The makefile in the handouts supports this.
Note that it is extremely unlikely that your HL.jj will compile properly the first time you try.
You will meet the module 4 learning objectives when your program passes the tests
for these learning objectives.
- If you are getting frustrated trying to get your program to compile so that you can meet the
tests for the B and C learning objectives, you could temporarily remove the
fn_call and comparison productions (and maybe set_former) from the entire grammar.
After you remove these productions, the grammar will still include challenges
that will need to be dealt with in order to meet the C learning objectives,
but they will be much more limited.
Once you are passing the C tests, you can gradually reintegrate these productions into the grammar,
adding them to your program in the same order as they are exercised by the B and A tests,
solving the problems that they create as you go.
- If you are only trying to meet the C learning objectives,
then just take a look at the C level tests and remove the components of the grammar that are not
necessary to pass these tests.
However, if you are planning to meet the B learning objectives as well as the C objectives,
this is not good idea because there will be too much material to reintegrate.
Here are additional requirements for the parsing learning objectives:
- 4Bb: Only if you are foregoing the 4Aa learning objectives:
declare your entire grammar to be LL(k) for the k you have identified in 3Be.
- 4Aa: Please use local lookaheads for
the few non-LL(1) productions rather than declaring your entire grammar to be LL(k)
where k>1.
- 4Aa: Minimize your use of syntactic lookaheads!
We will assess whether you are using too many or whether the strings they grab are too long.
- 4Aa: Right before each syntactic lookahead, add a comment which starts with "//4Aa:"
and explains which productions will be distinguished by this lookahead
Debugging your assignment
JavaCC has a built-in tracing functionality that you may find useful for debugging your work.
It is invoked by setting:
DEBUG_LOOKAHEAD =true;
DEBUG_PARSER=true;
In the options section of HL.jj program right at the top where you have also specified the case.
Testing your assignment
Your assignment will be assessed both manually and by running tests:
We will look at your code to see how you have manipulated the grammar,
and we will read your comments to get an understanding of the decisions you made.
Your assignment will also be tested on the scs moons as described here:
- We will first retrieve your submission, unzip it, and,
using the makefile in the
Handouts directory
compile it along with
the five java source files provided in that directory:
TestHL.java, Token.java, IntegerToken.java, IdentifierToken.java, and StringToken.java.
- We will then run the runtest script in the
Handouts directory.
This script is structured similarly to the runtest script in assignment1.
The tests and their answers are in the
Tests directory.
As for assignment 1, the name of each test indicates which learning objectives it is associated with.
- We may run some additional tests which are not posted.
However, the test files provided here provide a very thorough start.
You are strongly encouraged to perform your own testing following
the above procedure before submitting your assignment.
Submitting the assignment - UNDER CONSTRUCTION
As for assignment 1, we are using a script to grade you.
Therefore please use exact names listed below, including the name of the zip file
- Zip up HL.jj into
either A2A.zip if you are attempting to meet the A-level learning objectives,
or A2B.zip if you are not.
If you have modified any .java files, also include them in your zip file.
However, this should not be the case, except possibly for StringToken.java
if you've implemented the A learning objectives of Assignment 1
(because HL.jj and StringToken.java in the handouts implement only the B level objectives of assignment 1).
As explained above, we will be testing your assignment with
the java files provided in the
Handouts directory
-
Make sure that your solution passes the tests described in the section above on the moons.
- Submit electronically
A2A.zip or A2B.zip
References
This page is maintained by
Sophie Quigley
(cps710@cs.torontomu.ca)
Last modified
Monday, 07-Oct-2024 11:28:23 EDT