Skip to main content

Code Generation for Decaf Expressions

Start on Oct 18, 2021

Due on Nov 1, 2021

With grace days due on Nov 4, 2021

Your task for this homework is to use LLVM in order to write the code generation step for expressions and methods in the Decaf programming language.

LLVM

We will be using a code generation and compiler toolkit called LLVM for this homework. We will be using LLVM version 12.0.1.

Before you start programming for this homework it is very important you work through the LLVM practice problems first.

We will be using various LLVM tools such as llvm-as and llc. The easiest way to access the right version of these binaries is to use llvm-config --bindir and add this directory as a prefix to the LLVM binaries.

We can also use llvm-config to generate the necessary command line options for g++ or clang++. clang++ is recommended for this homework.

llvm-config --cxxflags --cppflags --cflags --ldflags --system-libs --libs all 

The text generated by the above command can be used to tell the C++ compiler which LLVM libraries to link with your program and where they can be found.

Getting Started

You must have git and python (3.x) on your system to run the assignments. Once you’ve confirmed this, run this command:

git clone https://github.com/anoopsarkar/compilers-class-hw.git

In the decafexpr directory you will find various python programs which you will use to test your solution to this homework.

If you have already cloned the repository earlier you can get the new homework files by going to the directory where you cloned the repository earlier and then doing:

# go to the directory where you did a git clone earlier
git pull origin master

To get started with your homework do the following steps.

Copy over files to your repository

Assuming you have set up your repository using the instruction in HW0, clone your repository and enter that directory and copy over the decafexpr files:

git clone git@csil-git1.cs.surrey.sfu.ca:YOUR_USERNAME/CMPT379-1217-YOUR_USERNAME.git
cd CMPT379-1217-YOUR_USERNAME
mkdir -p decafexpr
cd decafexpr
cp -r /your-path-to/compilers-class-hw/decafexpr/* .
git add *
git commit -m 'initial commit'
git push

If you update my repository using git pull then you might have to copy over the new files into your repository. Be careful you do not clobber your own files in the answer directory.

Default solution

Your solution must be compiled in the answer directory and must be called decafexpr. There is an incomplete solution to this homework in the answer directory. You can create the default binary on CSIL machines as follows:

cd your-repo-name/answer
make llvmconfigllvm-config-12 default

If you have your own installation of LLVM on your home machine then just use:

make default

The Challenge

The goal of this homework is to write a code generator for variables, simple expressions and methods for the Decaf programming language. The output will be in LLVM assembly which is compiled to x86 assembly and then to a binary file. The first step will be to write a symbol table implementation for your compiler. The structure of Decaf and code generation hints are given in the Decaf specification:

Decaf specification

Read the specification carefully before you attempt to write any code to solve this homework.

Your Task

Using the Decaf language specification as your guide, you have two steps:

Step 1: Symbol Table

Implement a symbol table that can keep track of variables and methods in Decaf.

A symbol table is a mapping from identifiers to any information that the compiler needs for code generation. A symbol table is easily implemented using hash tables or maps, e.g. here is a declaration of a symbol table using STL in C++:

typedef map<string, descriptor* > symbol_table;

where a descriptor is a structure or class which contains useful information about the variable including a type, a register destination, a memory address location, and a variable spilled indicating if the value is to be found in a register or in memory (note that we will not use memory locations for variables until later).

In Decaf we are allowed to shadow a variable declaration. This means that a new definition for an identifier in a block will cause a new descriptor to be associated with the identifier, but once the block terminates the previous descriptor for the identifier has to be restored. A simple way to implement this notion of local scoping is to specify that each block can create a new symbol table in a list:

typedef list<symbol_table > symbol_table_list;
symbol_table_list symtbl;

If a variable has a local definition that shadows another definition of the same variable name, we pick up the most recently defined descriptor for that variable by simply scanning the list of symbol tables starting from the most recent one:

descriptor* access_symtbl(string ident) {
    for (auto i : symtbl) {
        auto find_ident = i.find(ident);
        if (find_ident != i.end()) {
            return find_ident->second;
        }
    }
    return NULL;
}

The full example of how to use this data structure is available as a github gist. This is just one way to implement a symbol table. You can implement it any way you like, as long as it can handle shadowing of variables.

For this first step, you can assume that the identifiers are only variables, but for solving this homework (including Part 2) the symbol table will also store information about function names, global variables, etc., and additional information will have to be added to the descriptor.

To test your symbol table implementation, use it to remember the line number of each variable definition and then introduce a comment line into the input Decaf program that specifies the line number of the variable definition for that identifier.

When entering a new block you should make sure to insert a new symbol table so that local re-definitions of a variable will shadow higher level definitions.

You should modify your AST generation code from HW2. Implement the scoping by adding and removing new scopes in your symbol table by doing a top-down pass over the AST after the entire AST is constructed. This way of solving it is closer to the top-down code generation in Step 2 of this homework.

Here is an example of what Step 1 should do. For input Decaf program:

extern func print_int(int) void;
package C {
    func main() int {
        var x int; 
        x = 1;
        print_int(x);
    }
}

Your program should print out the following on standard error:

defined variable: x, with type: int, on line number: 4

And on standard output:

extern func print_int(int) void;
package C {
    func main() int {
        var x int; 
        x = 1; // using decl on line: 4
        print_int(x) // using decl on line: 4;
    }
}

There are some additional testcases available in the decafsym directory in the compilers-class-hw git repository.

Step 2: Code Generation for Expressions

Provide code generator for the following fragment of Decaf that includes:

  • Arithmetic and Boolean expressions.
  • Function calls.
  • Function definitions (including recursive functions).
  • Declaration of extern functions (all extern functions are defined in decaf-stdlib.c).

Do not perform code generation for field declarations aka Global variables.

The following AST nodes should have a fully working code generation implemented (except for short-circuiting).

Most of these are trivial to implement using the LLVM API. Only the first one, arithmetic and boolean expressions is non-trivial. The Decaf language specification has a section on Semantics which lays out the rules of type coercion and other gray areas in the implementation of Decaf.

Boolean constants are of type i1 in LLVM assembly. Char constants can be either i8 or i32. You will need to refer to the LLVM Documentation and the LLVM Tutorial.

Getting Started on Step 2

To get started on this step of the homework first do the LLVM Practice and then read the README.md file in the compilers-class-hw/decafexpr directory.

As shown in LLVM practice you should extend your abstract syntax tree (AST) classes to add LLVM API calls for code generation. Keep all the AST classes from your parser in HW2. For the AST nodes not mentioned above just return nullptr or NULL as the value for the code generation function call. For the rest you should make sure the code generation function returns the appropriate LLVM code block of type LLVM::Value *.

Requirements

More details about the task is provided by examining the testcases for this homework.

The output should be in LLVM assembly which can be compiled to x86 assembly using the LLVM tools and run as a binary. We will use the binary llvm-run in the answer directory to create and run the binary from the Decaf programs in the testcases directory.

The LLVM assembly and toolchain output is dumped into the llvm directory. You should examine your output to debug your compiler. Make sure you obey the following requirements:

  1. If your program succeeds in parsing the input you should exit from your program using exit(EXIT_SUCCESS). And if your program finds an error in the input Decaf program you should exit using exit(EXIT_FAILURE). The definitions of EXIT_SUCCESS and EXIT_FAILURE are in cstdlib (for C++) and in stdlib.h (for C).
  2. You must dump the LLVM assembly by calling TheModule->print(errs(), nullptr); where TheModule is of type llvm::Module*.

Development and upload procedure

Remember to push your solution source code to your repository:

cd answer
git add decafexpr.y decafexpr.lex # and any other files you need for the solution
git commit -m 'initial solution'
git push

Then each time you finish a component of your solution you can push it to the remote repository:

git add [source-file]
git commit -m 'commit message' [source-file]
git push

You have been given three helper programs to help you develop your solution to this homework.

Run your solution on testcases

Run your solution program on the testcases using the Python program zipout.py. Your solution must be compiled in the answer directory and must be called decafexpr. Run against all testcases as follows:

# on CSIL machines you should set the following environment variable
export LLVMCONFIG=llvm-config-12
# go to the directory with the file zipout.py
python3 zipout.py

This creates a directory called output and a file output.zip which can be checked against the reference output files (see section on Check your solution below).

If you run zipout.py multiple times it will overwrite your output directory and zip file which should be fine most of the time (but be careful).

Check your solution

Check your solution accuracy using the Python program check.py. You must create an output.zip file using the above step in Run your solution on testcases. Note that the references are only available for the dev testcases. When you are graded you will be evaluated on both the dev and test testcases. output.zip contains your output for both sets of testcases.

You can use the default program provided to get an initial solution to this homework. Run python3 zipout.py -r default to get a source.zip file you can score using check.py.

python3 check.py 
Correct(dev): 2 / 100
Score(dev): 2.00
Total Score: 2.00

Package your source for Coursys

You must also upload your source code to Coursys. You should prepare your source for upload using the Python program zipsrc.py.

# go to the directory with the file zipsrc.py
python3 zipsrc.py

This will create a zip file called source.zip. You should upload this file as your submission to hw1 on Coursys.

Be careful: zipsrc.py will only package files in the answer directory. Make sure you have put all your supporting files in that directory. In particular, put relevant documentation into answer/README.md.

If you add any testcases of your own please put them in the directories answer/testcases/[your-username]/ and answer/references/[your-username]/ using the same convention used by zipout.py and check.py.

Ground Rules

  • You must turn in two things:
    • Your source code from the answer directory as a zip file source.zip produced by running python3 zipsrc.py must be uploaded to the hw3 submission page on Coursys.
    • Your output on the testcases which is the file output.zip produced by running python3 zipout.py must be uploaded to the hw3 submission page on Coursys. When we run check.py on the public testcases it should have a value higher than the output from the default program to get any marks.
  • Your source code from source.zip must be on your gitlab repository. Please commit and push often in order to get feedback on your code.
  • Make sure that we can run make decafexpr in your answer directory to create the decafexpr binary.
  • You cannot use data or code resources outside of what is provided to you. If you use external code snippets provide citations in the answer/README.md file.
  • For the written description of your submission and supporting documentation, you can use plain ASCII but for math equations it is better to use kramdown. Do not use any proprietary or binary file formats such as Microsoft Word.

Grading

  • Score for testcases both dev and test.
  • Code review by TAs. Please check for comments on your code on gitlab.

If you have any questions or you’re confused about anything, just ask.