Start on Oct 18, 2021
Due on Nov 1, 2021
With grace days due on Nov 4, 2021
Your task for this homework is to use LLVM in order to write the code generation step for expressions and methods in the Decaf programming language.
We will be using a code generation and compiler toolkit called LLVM for this homework. We will be using LLVM version 12.0.1.
Before you start programming for this homework it is very important you work through the LLVM practice problems first.
We will be using various LLVM tools such as llvm-as
and llc
.
The easiest way to access the right version of these binaries is
to use llvm-config --bindir
and add this directory as a
prefix to the LLVM binaries.
We can also use llvm-config
to generate the necessary command
line options for g++
or clang++
. clang++
is recommended for
this homework.
llvm-config --cxxflags --cppflags --cflags --ldflags --system-libs --libs all
The text generated by the above command can be used to tell the C++ compiler which LLVM libraries to link with your program and where they can be found.
You must have git and python (3.x) on your system to run the assignments. Once you’ve confirmed this, run this command:
git clone https://github.com/anoopsarkar/compilers-class-hw.git
In the decafexpr
directory you will find various python programs
which you will use to test your solution to this homework.
If you have already cloned the repository earlier you can get the new homework files by going to the directory where you cloned the repository earlier and then doing:
# go to the directory where you did a git clone earlier
git pull origin master
To get started with your homework do the following steps.
Assuming you have set up your repository using the instruction in HW0, clone your repository and enter that directory and copy over the decafexpr files:
git clone git@csil-git1.cs.surrey.sfu.ca:YOUR_USERNAME/CMPT379-1217-YOUR_USERNAME.git
cd CMPT379-1217-YOUR_USERNAME
mkdir -p decafexpr
cd decafexpr
cp -r /your-path-to/compilers-class-hw/decafexpr/* .
git add *
git commit -m 'initial commit'
git push
If you update my repository using git pull
then you might have to copy over the
new files into your repository. Be careful you do not clobber your own files
in the answer
directory.
Your solution must be compiled in the answer
directory and must be called decafexpr
.
There is an incomplete solution to this homework in the answer
directory. You can create the default
binary on CSIL machines as follows:
cd your-repo-name/answer
make llvmconfigllvm-config-12 default
If you have your own installation of LLVM on your home machine then just use:
make default
The goal of this homework is to write a code generator for variables, simple expressions and methods for the Decaf programming language. The output will be in LLVM assembly which is compiled to x86 assembly and then to a binary file. The first step will be to write a symbol table implementation for your compiler. The structure of Decaf and code generation hints are given in the Decaf specification:
Read the specification carefully before you attempt to write any code to solve this homework.
Using the Decaf language specification as your guide, you have two steps:
Implement a symbol table that can keep track of variables and methods in Decaf.
A symbol table is a mapping from identifiers to any information that the compiler needs for code generation. A symbol table is easily implemented using hash tables or maps, e.g. here is a declaration of a symbol table using STL in C++:
typedef map<string, descriptor* > symbol_table;
where a descriptor is a structure or class which contains useful information about the variable including a type, a register destination, a memory address location, and a variable spilled indicating if the value is to be found in a register or in memory (note that we will not use memory locations for variables until later).
In Decaf we are allowed to shadow a variable declaration. This means that a new definition for an identifier in a block will cause a new descriptor to be associated with the identifier, but once the block terminates the previous descriptor for the identifier has to be restored. A simple way to implement this notion of local scoping is to specify that each block can create a new symbol table in a list:
typedef list<symbol_table > symbol_table_list;
symbol_table_list symtbl;
If a variable has a local definition that shadows another definition of the same variable name, we pick up the most recently defined descriptor for that variable by simply scanning the list of symbol tables starting from the most recent one:
descriptor* access_symtbl(string ident) {
for (auto i : symtbl) {
auto find_ident = i.find(ident);
if (find_ident != i.end()) {
return find_ident->second;
}
}
return NULL;
}
The full example of how to use this data structure is available as a github gist. This is just one way to implement a symbol table. You can implement it any way you like, as long as it can handle shadowing of variables.
For this first step, you can assume that the identifiers are only variables, but for solving this homework (including Part 2) the symbol table will also store information about function names, global variables, etc., and additional information will have to be added to the descriptor.
To test your symbol table implementation, use it to remember the line number of each variable definition and then introduce a comment line into the input Decaf program that specifies the line number of the variable definition for that identifier.
When entering a new block you should make sure to insert a new symbol table so that local re-definitions of a variable will shadow higher level definitions.
You should modify your AST generation code from HW2. Implement the scoping by adding and removing new scopes in your symbol table by doing a top-down pass over the AST after the entire AST is constructed. This way of solving it is closer to the top-down code generation in Step 2 of this homework.
Here is an example of what Step 1 should do. For input Decaf program:
extern func print_int(int) void;
package C {
func main() int {
var x int;
x = 1;
print_int(x);
}
}
Your program should print out the following on standard error:
defined variable: x, with type: int, on line number: 4
And on standard output:
extern func print_int(int) void;
package C {
func main() int {
var x int;
x = 1; // using decl on line: 4
print_int(x) // using decl on line: 4;
}
}
There are some additional testcases available in the decafsym
directory in
the compilers-class-hw
git repository.
Provide code generator for the following fragment of Decaf that includes:
decaf-stdlib.c
).Do not perform code generation for field declarations aka Global variables.
The following AST nodes should have a fully working code generation implemented (except for short-circuiting).
Most of these are trivial to implement using the LLVM API. Only the first one, arithmetic and boolean expressions is non-trivial. The Decaf language specification has a section on Semantics which lays out the rules of type coercion and other gray areas in the implementation of Decaf.
Boolean constants are of type i1
in LLVM assembly. Char constants can be either i8
or i32
.
You will need to refer to the LLVM Documentation
and the LLVM Tutorial.
To get started on this step of the homework first do the LLVM
Practice and then read the README.md
file
in the compilers-class-hw/decafexpr
directory.
As shown in LLVM practice you should extend your abstract
syntax tree (AST) classes to add LLVM API calls for code generation. Keep all
the AST classes from your parser in HW2. For the AST nodes not mentioned above
just return nullptr
or NULL
as the value for the code generation function call.
For the rest you should make sure the code generation function returns the appropriate
LLVM code block of type LLVM::Value *
.
More details about the task is provided by examining the testcases for this homework.
The output should be in LLVM assembly which can be compiled to x86 assembly
using the LLVM tools and run as a binary. We will use the binary llvm-run
in
the answer directory to create and run the binary from the Decaf programs in
the testcases directory.
The LLVM assembly and toolchain output is dumped into the llvm
directory.
You should examine your output to debug your compiler.
Make sure you obey the following requirements:
exit(EXIT_SUCCESS)
. And if your program finds an error in the input Decaf program you should exit using exit(EXIT_FAILURE)
. The definitions of EXIT_SUCCESS
and EXIT_FAILURE
are in cstdlib
(for C++) and in stdlib.h
(for C).TheModule->print(errs(), nullptr);
where TheModule
is of type llvm::Module*
.Remember to push your solution source code to your repository:
cd answer
git add decafexpr.y decafexpr.lex # and any other files you need for the solution
git commit -m 'initial solution'
git push
Then each time you finish a component of your solution you can push it to the remote repository:
git add [source-file]
git commit -m 'commit message' [source-file]
git push
You have been given three helper programs to help you develop your solution to this homework.
Run your solution program on the testcases using the Python program zipout.py
.
Your solution must be compiled in the answer
directory and must be called decafexpr
.
Run against all testcases as follows:
# on CSIL machines you should set the following environment variable
export LLVMCONFIG=llvm-config-12
# go to the directory with the file zipout.py
python3 zipout.py
This creates a directory called output
and a file output.zip
which can be checked against the reference output files
(see section on Check your solution below).
If you run zipout.py
multiple times it will overwrite your output directory and zip file which should be fine most of the time (but be careful).
Check your solution accuracy using the Python program check.py
. You must create an output.zip
file using the above step in Run your solution on testcases.
Note that the references are only available for the dev
testcases. When you are graded you will be evaluated on both the dev
and test
testcases.
output.zip
contains your output for both sets of testcases.
You can use the default program provided to get an initial solution to this homework. Run python3 zipout.py -r default
to get a source.zip
file you can score using check.py
.
python3 check.py
Correct(dev): 2 / 100
Score(dev): 2.00
Total Score: 2.00
You must also upload your source code to Coursys. You should prepare your source for upload using the Python program zipsrc.py
.
# go to the directory with the file zipsrc.py
python3 zipsrc.py
This will create a zip file called source.zip
. You should upload this file as your submission to hw1 on Coursys.
Be careful: zipsrc.py
will only package files in the answer
directory. Make sure you have put all your supporting files in that directory. In particular, put relevant documentation into answer/README.md
.
If you add any testcases of your own please put them in the directories answer/testcases/[your-username]/
and answer/references/[your-username]/
using the same convention used by zipout.py
and check.py
.
answer
directory as a zip file source.zip
produced by running python3 zipsrc.py
must be uploaded to the hw3
submission page on Coursys.output.zip
produced by running python3 zipout.py
must be uploaded to the hw3
submission page on Coursys. When we run check.py
on the public testcases it should have a value higher than the output from the default program to get any marks.source.zip
must be on your gitlab repository. Please commit and push often in order to get feedback on your code.make decafexpr
in your answer directory to create the decafexpr
binary.answer/README.md
file.If you have any questions or you’re confused about anything, just ask.