- Set up GreenThumb (see the top-level README).
- Try to get the ARM cooperative superoptimizer running (see the top-level README).
- Read Greenthumb: Superoptimizer Construction Framework (CC'16).
The framework utilizes inheritance to provide retargetability. The top level directory contains the superclasses to be extended. File xxx.rkt
contains the implementation of superclass xxx%
. For the purpose of a demonstration, assume we want to construct a superoptimizer for simplified ARM (called armdemo
). To extend the framework , we have to implement class armdemo-xxx%
extending class xxx%
(in armdemo/armdemo-xxx.rkt
file).
First, run the setup script with the name of the ISA:
./setup.py armdemo
The script will generate armdemo
directory that contains
armdemo-xxx.rkt
programs. A programarmdemo-xxx.rkt
is a skeleton code defining the classarmdemo-xxx%
extending the superclassxxx%
. The generated code also includes the declarations of the required methods the users have to implement along with comments explaining the methods and examples.test-simulator.rkt
program for testing the ISA simulatortest-search.rkt
program for testing the individual search techniquesmain.rkt
andoptimize.rkt
programs for running the complete cooperative search launching multiple search instances
The setup script generates example implementations of the required functions that accompany this documentation. The generated code includes ?
hooks that indicate locations that developers need to modify. If you use DrRacket as an editor, it will complain about ?
, and you can easily jump to ?
. For the purpose of the demo, if we remove those ?
s and simply use the example code, this demo should work out of the box.
In this documentation, we will walk you through building a superoptimizer in 7 steps under multiple sections.
Section A. ISA Basic Description
- Step 1: Extend
machine%
.
Section B. Program Intermediate Representations
- Step 2: Extend
parser%
andprinter%
. Test the parser and printer usingtest-simulator.rkt
.
- Step 3: Extend
simulator-rosette%
andsimulator-racket%
. Test the simulators usingtest-simulator.rkt
.
- Step 4: Extend
symbolic%
. Test the symbolic search usingtest-searh.rkt
- Step 5: Extend
stochastic%
. Test the stochastic search usingtest-searh.rkt
- Step 6: Extend
forwardbackward%
. Test the enumerative search usingtest-searh.rkt
- Step 7: Enalbe the cooperative search by implementing:
- method
info-from-file
ofarmdemo-parser%
- methods
config-from-string-ir
andoutput-string-constraint
ofarmdemo-printer%
- method
len-limit
ofarmdemo-symbolic%
andarmdemo-forwardbackward%
- method
We must define the description of the ISA, including its opcodes, the ISA bitwidth, and the structure of the program state.
Step 1: machine.
We define such information in the class armdemo-machine%
, which extends the class machine%
.
Step 1.1: bitwidth.
First, we define how many bits are used to represent the smallest unit of value by setting the field bitwidth
. Since we are supporting ARM 32-bit arithmetic and logical instructions, we set bitwidth to 32.
Step 1.2: program state structure.
Next, we must define the structure of a program state (machine state/CPU state) and types of elements a program state contains. Say our armdemo
processor contains 32-bit registers and memory (no flags). Therefore, there are two types of elements in our program state: register and memory. For each type of program state element, we need to name it using Racket symbol. In this demo, we name register type 'reg
. For memory, GreenThumb provides special objects including memory, so the name of memory type can be obtained by calling (get-memory-type)
. See more information about provided special objects here.
Now we are ready to define the structure of a program state by overriding progstate-structure
method. The program state structure must be defined in terms of program state elements' types.
For example, we define an armdemo
program state as follows:
(define (progstate-structure)
(progstate (for/vector ([i config]) 'reg)
(get-memory-type)))
where progstate
is defined at the top of armdemo-machine.rkt
as a macro for creating a vector with two entries. The first entry of our program state contains a vector of register's values, and the second entry contains memory.
Notice that in an actual processor, the number of registers is fixed, but the number of registers in a piece of code we want to optimize is usually less than the number of available registers. Therefore, we use config
to represent number of registers of interest. The smaller the number of registers (smaller program state in general) the smaller the search space. Now, you may ask, how does config
get set? We can manually set config
when creating armdemo-machine%
in test-simulator.rkt
and test-search.rkt
. We can also extract config
by analyzing the code we want to optimize (which will be explained later in Step 7.2)
In general, a program state must be a mutable object, so we use vectors instead of lists (Racket list is immutable). This is crucial for the bidirectional search strategy in the enumerative superoptimizer.
Step 1.3: program state element type. Next, we have to provide more information about each type of a program state element. In particular, we have to tell GreenThumb how to get and set a value of a specific program state element and what the valid values of such element. We do this by calling:
(define-progstate-type <element-type>
#:get <a function to get a value>
#:set <a function to set a value>
#:min <optional minimum valid value (inclusive)>
#:max <optional maximum valid value (inclusive)>)
For memory type, we define:
(define-progstate-type (get-memory-type)
#:get (lambda (state) (progstate-memory state))
#:set (lambda (state val) (set-progstate-memory! state val)))
For register type, we define:
(define-progstate-type 'reg
#:get (lambda (state arg) (vector-ref (progstate-regs state) arg))
#:set (lambda (state arg val) (vector-set! (progstate-regs state) arg val)))
Since there are multiple elements of type register, the get and set functions take in a program state and an additional argument arg
indicating which register it should reference. This additional argument is coming from operands in an instruction.
We use #:min
and #:max
for elements that do not take all values of the defined bitwidth. For example, if our program state includes a flag that only takes value 0 and 1, we can use #:min
and #:max
for such type of element.
These get and set functions are used for many tasks in the stochastic and enumerative search such as interpreting an instruction backward, updating liveness forward and backward, obtaining valid instruction operands, and obtaining value instructions.
Step 1.4: instruction operand type. Next, we must define operand types of instructions. In particular, we have to tell GreenThumb which values the stochastic and enumerative superoptimizer should try when synthesizing an instruction. We do this by calling:
(define-arg-type <operand-type> (lambda (config) <a list of values>))
For armdemo
, there are 3 types of instruction operands: 'reg
(register), 'const
(32-bit constant), and 'bit
(shifting constant, so it's values are less than 32) defined as follows:
(define-arg-type 'reg (lambda (config) (range config)))
(define-arg-type 'const (lambda (config) '(0 1 -1 -2 -8)))
(define-arg-type 'bit (lambda (config) '(0 1)))
Note that lists of values we define here are values the stochastic and enumerative superoptimizers try during synthesizing candidate programs. Although'const
actually ranges from 0 to 2^32-1, we do not want the stochastic and enumerative superoptimizers to try all 2^32 values, so instead we inform them to only try the values we specified here. Additionally, GreenThumb analyzes an input program to be optimized, and automatically adds more values to these lists according to their operand types if those values appear in the input program.
Furthermore, notice that 'reg
is used as both a program state element type and an instruction operand type. When this happens, the value of an instruction operand of type 'reg
will be used as the additional argument for the get and set functions when accessing a program state element of type 'reg
(as described in Step 1.3). If we name this operand type differently, GreenThumb may not be able to pass the value of the operand to the get and set functions. See [Section Connection between Instruction Operand and Program State Element in Advanced Usage], if you need to name the operand type differently.
For other operand types, we may name them freely. However, GreenThumb treats 'const
and 'bit
specially. In particular, the enumerative search works in a reduced-bitwidth domain (4 bits instead of 32 bits). Therefore, the enumerative search converts the input source code into the reduced-bitwidth version. If an operand has type 'const
or 'bit
, GreenThumb will convert the constant appropriately for its kind. For example, value 31 of type 'bit
will be converted to 3; shifting by 31 bits in 32-bit domain is equivalent to shifting by 3 bits in 4-bit domain. Large numbers of type 'const
will be converted into 4-bit numbers by masking the lower 4 bits. If an operand has a type other than 'const
or 'bit
, GreenThumb will preserve its value during conversion.
Step 1.5: instruction classes. Next, we define instructions in the ISA. We must inform GreenThumb how many opcodes there are in an instruction by calling:
(init-machine-description <number of opcodes per an instruction>)
For many ISAs, there is one opcode in an instruction. For the purpose of the demo, armdemo
has one opcode in an instruction. However, in practice, arm
has multiple opcodes in an instruction, including a base opcode, a conditional suffix, and an optional shift. See [Section Instruction with Multiple Opcodes in Advanced Usage], if there are more than one opcode in an instruction.
Let's now define instructions in our ISA using the previously defined operand types and program state element types. Instructions of a same class are defined together. Instructions are in the same class if their operands have the same types; and their inputs from and outputs to a program state also have the same types. An instruction class can be defined as:
(define-instruction-class
<class name> <a list of opcodes>
#:args <a list of operand types> #:ins <a list of inputs> #:outs < a list of outputs>
#:commute <optional pair of commutative operands>)
For example, we define:
(define-instruction-class 'rrr-commute '(add xor)
#:args '(reg reg reg) #:ins '(1 2) #:outs '(0) #:commute '(1 . 2))
(define-instruction-class 'rri '(add# xor#)
#:args '(reg reg const) #:ins '(1) #:outs '(0))
(define-instruction-class 'load '(load)
#:args '(reg reg) #:ins (list 1 (get-memory-type)) #:outs '(0))
The lists given to #:ins
and #:outs
can contain numbers (which refer to operands from #:args
) and program state element types.
For instructions add
and xor
, #:args '(reg reg reg)
informs that these instructions have 3 operands. #:ins '(1 2)
informs that these instructions take two inputs, the values of program state elements specified by operands 1 and 2 (index starting from 0). #:outs '(0)
informs that their output is stored in the program state element specified by operand 0. #:commute '(1 . 2)
informs that operands 1 and 2 are commutative.
For instruction add#
, we define #:ins '(1)
instead of #:ins '(1 2)
. This is because although 'const
(operand 2) is an input to add#
, but 'const
is not a part of program state. GreenThumb only needs to know about inputs to and outputs from an instruction that are parts of program state. However, we can still define #:ins '(1 2)
for add#
, and GreenThumb will remove 2
from the list automatically. Note that in fact, add#
is not an ARM instruction; it corresponds to instruction add rX, rX, immediate
. However, we need to distinguish add rX, rX, immediate
from add rX, rX, rX
. Therefore, we need to create a different opcode name for add rX, rX, immediate
.
For instruction load
, we define #:ins (list 1 (get-memory-type))
because there is an additional implicit input which is memory.
Finally, we must call the following function after defining all instruction classes.
(finalize-machine-description)
GreenThumb provides an instruction representation, defined as (struct inst (op args))
. An instruction representation is a building block for constructing program representations. GreenThumb uses three levels of program representations:
Source is a plain-text source. For example, the program p
below is an ARM program in the source format:
lsr r0, r0, 3 ; r0 = r0 >> 3
lsl r0, r0, 3 ; r0 = r0 << 3
String IR is an IR after parsing a source, which is a vector of inst
. Each inst
includes an opcode in its field op
and a vector of arguments in its field args
. Opcodes and arguments are represented as strings. The program p
in the string-IR format is:
(vector (inst "lsr" (vector "r0" "r0" "3"))
(inst "lsl" (vector "r0" "r0" "3")))
Encoded IR is an IR after encoding a String IR. It is also a vector of inst
, but its op
and args
fields contain integer IDs instead of strings. An opcode ID is an integer indexing into the opcodes
vector in machine%. A register ID 'X' is an integer that maps to register "rX". For constants, we simply convert strings to numbers. The program p
in the encoded-IR format may looks like:
(vector (inst 3 (vector 0 0 3))
(inst 4 (vector 0 0 3)))
Note that "lsr r0, r0, 3" and "lsr r0, r0, r3" should have different opcode IDs. Our convention is to use 'lsr#
for the one with the constant operand and 'lsr
for the one with the register operand when defining instruction classes (Step 1.5).
All components except parser%
and printer%
work with an encoded IR, because it enables representing programs with bitvector logic formulas used in the symbolic search and equivalence validator (which verifies the equivalence of two programs).
Step 2: Parser and Printer Since parser%
and printer%
are responsible for converting sources to encoded IRs and vice versa, we must extend them by implementing:
- class
armdemo-parser%
, which parsesarmdemo
IR source code into string-IR format. - three methods in the class
armdemo-printer%
:print-syntax-inst
prints string-IR program in source format;encode-inst
converts string-IR to encoded-IR format; anddecode-inst
converts encoded-IR to string-IR format.
Apart from being able to parse source code with complete instructions, the parser should be able to parse string "?" as (inst #f #f)
for both String-IR and encoded-IR formats. This is used as a place holder for an instruction to be synthesized during testing.
See example code generated from the setup script on how to implement these methods including handling "?". After implementing these methods, use test-simulator.rkt
to test the parser and printer.
In order for GreenThumb to understand the semantics of the new ISA and evaluate the performance of different code fragments, we have to implement an functional simulator and define its performance model.
Step 3: Simulator We must implement the methods interpret
and performance-cost
of the classes armdemo-simulator-racket%
and armdemo-simulator-rosette%
. The implementations of these two classes are in fact identical except that the former is implemented in #lang racket
, while the latter in #lang s-exp rosette
. The Racket simulator is used to interpret sequences of instructions on concrete inputs in the stochastic and enumerative search. The Rosette simulator is used by the symbolic search and equivalence validator. Although the Rosette simulator can also be used in the stochastic and enumerative search, it is slower than the Racket simulator. See example code generated from the setup script how to implement these methods. After implement these methods, use test-simulator.rkt
to test the simulator. uncomment more code to test this.
- Rosette supports most Racket operations but not all. For example, Rosette does not support any symbolic hash operation or some vector operations (e.g. vector-copy). If you wish to use some of these handy functions, take a look at provided functions in
ops-rosette.rkt
, which can be used instead of the original functions. We also provide the same functions but implemented more efficiently inops-racket.rkt
, which can be used insimulator-racket%
. - Notice the provided function
(finitize-bit x)
.(finitize-bit x)
calls(finitize x bit)
, where thefinitize
function truncates overflowedx
tobit
bits and convertx
to a signed number. Our convention is that every value inprogstate
has to be in the signed format (e.g. -1 instead of 2^32 - 1 for a 32-bit number). This is because we need the racket simulator to be consistent with the constraint solver we use, which works with signed numbers. Therefore, always callfinitize-bit
after performing an arithmetic operation.
It might be easier to only modify armdemo-simulator-rosette.rkt
. Once it works correctly, copy the entire file armdemo-simulator-rosette.rkt
to armdemo-simulator-racket.rkt
, and replace any appearance of rosette with racket.
Step 4: Symbolic Search Once the simulator is working. The symbolic search should work right out of the box! Use test-search.rkt
to test the symbolic search on a small code fragment. Make sure to test the search for every instruction.
Step 5: Stochastic Search
We need to extend the method (correctness-cost state1 state2 live)
of the class stochastic%
. STOKE (ASPLOS'13)
suggests a correctness cost to be the number of non-matching bits between the live outputs of program states state1
and state2
with rewards for correct (or nearly correct) values in wrong locations.
stochastic%
provides the method (correctness-cost-base state1 state2 live diff-bits)
to calculate the suggested correctness cost when
each of state1
and state2
is a vector of values, given a lambda function diff-bits
that counts number of non-matching bits between two values. Thus, we use correctness-cost-base
for the register part of armdemo
program state.
The memory object memory-racket%
also provides the method (correctness-cost other-memory diff-bits bitwidth)
. For each memory location that has beeen modified, it counts the number of non-matching bits of the values from the two memory objects. Thus, we call this method for the memory part of armdemo
program state.
Once we implement this method, uncomment the stochastic search section in test-search.rkt
and run it to test the stochastic search.
Step 6: Enumerative Search
If our ISA has memory accessing instructions, we must extend the method update-progstate-ins-load
and update-progstate-ins-store
of the class machine%
. This function is used for executing memory instruction backward, in particular, for modifying the input parts of a program state. See comments and examples in the generated code. For more complicated load and store instructions, see Customize Inverse Interpreter for Load/Store Instructions in Advanced Usage.
Once we implement these methods, uncomment the enumerative search section in test-search.rkt
and run it to test the enumerative search.
Set code
definition to a program with only a newly added instruction and another dummy instruction (such as add 0) to create an obviously inefficient program. Add or remove ?
in sketch
definition to define how many instructions an output program may contain; one ?
represents one instruction.
Run each of the symbolic, stochastic, and enumerative search at a time.
- When the stochastic and enumerative search find a better program, they should print
FOUND!!!
followed by an output program. - The symbolic and enumerative search should return with an optimized program quickly (for program with one instruction).
- For the stochastic search, we might have to wait a little longer. If we don't see it printing
FOUND!!!
within a minute, something may be wrong.
To test that the backward search of the enumerative search works properly, run the enumerative search again but on a larger program (four instructions) that can be optimized to three instructions. This is because the enumerative search only performs the backward search when it tries to synthesize programs with three instructions or more. Make sure to add more ?
in sketch
definition if an output program contains more instructions.
Step 7: Cooperative Search To get the coopeative search working, we have to do the following.
Step 7.1: Window Decomposition
To help a search instance determine an appropriate size of a window in the context-aware window decomposition, we must implement the method (len-limit)
of armdemo-forwardbackward%
and armdemo-symbolic%
to return L, the size of code fragment (the number of instructions) that can be synthesized within one minute. The cooperative search
varies the window size used for the different search instances; in particular, it uses L, 2L, 3L, and 4L window sizes.
Step 7.2: Liveness Information and Machine Config
To invoke the cooperative superoptimizer, we run optimize.rkt
and pass in the filename of code to be optimized. Along with the file containing code to be optimized, the superoptimizer expects another file with the same name appended by ".info" that contains live-out infomation and additional information if necessary. Therefore, the superoptimizer needs to parse the info file, so we must implement:
- the method
(info-from-file filename)
inarmdemo-parser.rkt
to parse live-out information from a given file. The file may contain more information such as preconditions of the inputs. - the method
(output-constraint-string live)
inarmdemo-printer.rkt
. The argumentlive
that is passed to this method is an output frominfo-from-file
method. This method should convertlive
into a string that is evaluated to a program state that contains #t and #f, where #t indicates that the corresponding element in the program state is live.
The superoptimizer also needs to set the config
field of the machine%
class because we no longer create the machine object manually as in test-search.rkt
. We must implement the method (config-from-string-ir program)
to analyze a given program and returns a machine config. program
is given in the string-IR format.
Step 7.3: Running the Cooperative Search
Run the superoptimizer using armdemo/optimize.rkt
. The usage of armdemo/optimize.rkt
is the same as the usage of arm/optimizer.rkt
as explained in the top-level README.
Note that even if we do not implement all search techniques, for instance, we only implement armdemo-stochastic%
, we can still use optimize.rkt
to run the stochastic search instances in parallel, communicating to each other about the best program.