The project is structured in three directories.
c_compiler
contains the main compiler, together with the final parser and lexer.c_parser
contains the first parser that was written and covered a very simple c-like syntax.c_lexer
contains the first lexer for the C syntax.
The necessary dependencies are: make bison flex g++-mips-linux-gnu qemu tidy
.
All the necessary dependencies and correct environment can be installed using Vagrant and the provided vagrant file. This can be done using the following commands.
# Installing the virtual machine
vagrant up
# ssh into the machine
vagrant ssh
# cd into the correct directory
cd /vagrant
The tests and builds can then be run from there.
To use the compiler and test it out, together with the lexer and parser, the makefile
can
be used. The main compiler can be built using the following commands.
# build all the targets
make
# Building the main compiler
make bin/c_compiler
# Building the parser
make bin/c_parser
# Building the lexer
make bin/c_lexer
These can then be tested using the following shell scripts.
# Running tests for the compiler
./run_test_deliverable.sh 1
# Running tests against the gcc reference implementation of the compiler
./run_test_deliverable.sh 2
# Running smaller testbench on compiler
./test_compiler.sh
# Running tests for the parser
./test_parser.sh
# Running tests for the lexer
./test_lexer.sh
Implemented compiler functionality following the C89 Spec.
- Local variables
- Integer arithmetic
- While
- IfElse
- For
- Function calls
- Arrays
- Multi Dimensional Arrays
- Pointers
- Strings
- Structures
- Floating-point
Overview of the ast that is built by the compiler, this only contains some of the important classes in the Compiler.
I used a pure abstract Node
class as an entry point to the AST.
I then had a TranslationUnit
class which contained
the external declarations and function
definitions in a vector. All other classes used linked lists instead of vectors to store
multiple elements, for example, the Statement
class has a
pointer to a next statement instead
of having a separate StatementList
class. This meant that
I did not have to write an extra List
class and made it easier to inherit from the right classes. I used the grammar to separated the
classes properly, as an Expression
is completely different
to a Type
, a Statement
or a Function
.
This meant that I could separate the member functions and not have to declare all of them in the
Node
class, as that would lead to a lot of empty definitions. These classes were mostly abstract too
but contained a few function definitions that would throw exceptions so that I did not have to
define these functions when I did not need them. I also had a few specific member functions
(eg. for UnaryExpression
) for which I had to use dynamic
and static pointer casting to access them.
The class hierarchy and structure is very logical as it closely matches the grammar. This also that it was easy to know what member variables a class needed and the inheritance was very logical too. All the member functions are well separated and only appear where they are actually used.
All the general base classes, that are mostly abstract as well, are in the bison union, which means that I can use and assign all of those classes directly, and be more specific in the member variables of those classes so that they only contain the types I need.
The Type
class is not very useful as it does not capture arrays correctly and store
all their information when using a multidimensional array for example. It also does not
enable me to extract this information correctly as it will only give me the primitive type of
the array.
As I did not want all the classes to contain functions that they do not need, classes like UnaryOperator have member functions. To access these I have to use dynamic casts and with my linked lists, I always have to check for nullptr before doing anything with it.
I did not use many registers because they are a limited resource, and instead I decided only to use registers $2 and $3 for performing operations and rarely use registers $t0 and $t1 when loading the address of a value, for example for pointers. I also used registers $4, $5, $6, $7 for passing values to functions. I used a frame pointer for the current function to access the stack. This frame is split into multiple parts, first, enough space to store and pass values when making function calls, then space for all declared variables in the function, and lastly enough space to store temporary results from expressions. It also stores the previous frame pointer and the return address at the end of the frame. Every time I perform any operations, I store the result of that operation in that temporary space in the frame. I keep track of the stack position and temporary expression stack position using the Bindings class. This class also includes a map of variables bindings, which binds the identifier to its type and stack position relative to the frame pointer. The Bindings class is passed by value to account for scopes and variable shadowing.
The Bindings class stores the type of the identifier so that I can look
it up and perform the right operation in the Expression
class. Storing the type, however, also
means that I do not have to store the type of an Expression
, but can just deduce it.
By only using two registers for operations, I do not have to worry about having no more registers available, and which registers will not be overwritten after a function call. The temporary expression result will always be in the current frame of the function even after a function call.
As I am only using two registers to perform operations, I have to include loads and stores to access the temporary results of the operations. I also store results of an operation when I do not need the result anymore. This means that the code will run much slower.
When counting the variables that are being assigned in the current function, I assume that they are all integers. I also always store the return value $31 even when there is no function call. This means that the frame is always much larger than it needs to be.
The Type
class was not structured well for handling arrays, however, it does work well
with the other types and makes it easy to store and load variables using the
right operations. It also made it easy to identify pointers for pointer arithmatic and store
values in arrays without padding.
Function calls work well as they follow the MIPS ABI, and leave enough space for the called function to store the parameters in the previous frame. The return address is also always stored, which make recursive calls possible. Function declarations also work because they do not have to emit assembly.
I would like to work on multidimensional arrays as they do not fully work. To do that I would have to add information such as the initial array in the type so that I can assign arrays inside the array to a pointer and perform the right pointer arithmetic.
I would also like to reduce the number of dynamic and static casts by adding more intermediate classes. This would also make the whole AST design more extensible and it would be easier to possibly add structures in the future. This would also make classes more specialized and maintainable.