The compilation process generally includes the following stages
Lexical Analysis
: The process of converting the input stream into aToken
streamSyntax Analysis
: Determine whether theToken
stream satisfies the given grammar definitionSemantic Analysis
: During the syntax analysis process, some side effects will be executed, these side effects are semantic analysisIntermediate Code Generation
: The result of semantic analysis may be intermediate codeMachine Independent Code Optimization
: Intermediate code can undergo some machine-independent code optimizations, such as- Deleting common sub-expressions
- Removing useless code
- Constant merging
- Code movement
- Deleting induction variables
Machine-Dependent Target Code
: Translating intermediate code into machine code for a specific architectureMachine Dependent Code Optimization
The compilation engine provides a complete framework for compilation, abstracting the following steps
Lexical Analysis
: With the help of the compilation engine, we can create a lexical analyzer with simple construction stepsSyntax Analysis
: We only need to define the specific grammar, the compilation engine will translate the grammar definition into a state automatonSemantic Analysis
: With the help of the compilation engine, we can easily expand the semantic analysis
The compilation engine supports various different grammar analysis methods, including
LL1
: Worst analysis capabilitySLR
: Analysis capability slightly stronger thanLL1
LR0
: Analysis capability slightly stronger thanSLR
LR1
: Strongest analysis capabilityLALR
(recommended): The analysis capability is the same asLR1
, but the size of the state machine is smaller thanLR1
<dependency>
<groupId>com.github.liuyehcf</groupId>
<artifactId>compile-engine</artifactId>
<version>1.0.3</version>
</dependency>
We use the compilation engine to complete a simple calculator, with the following functions
- Supports
+
,-
,*
,/
*
,/
have higher priority than+
,-
- Supports
()
- Only supports integers
We can use DefaultLexicalAnalyzer
to build a lexical analyzer. The lexemes of the calculator include
- Operators,
+
,-
,*
,/
- Brackets
()
- Numbers
There are three types of lexemes
normal
: Normal lexeme, such as+
,-
,*
,/
herekeyword
: Keywords, which have higher priority than normal lexemesoperator
: Custom parsing process, such as parsing integers here
static LexicalAnalyzer LEXICAL_ANALYZER = DefaultLexicalAnalyzer.Builder.builder()
.addTokenOperator(Symbol.createIdentifierTerminator(IDENTIFIER_INTEGER_LITERAL), new IntegerIdentifier())
.addNormalMorpheme(Symbol.createTerminator(NORMAL_SMALL_LEFT_PARENTHESES), "(")
.addNormalMorpheme(Symbol.createTerminator(NORMAL_SMALL_RIGHT_PARENTHESES), ")")
.addNormalMorpheme(Symbol.createTerminator(NORMAL_MUL), "*")
.addNormalMorpheme(Symbol.createTerminator(NORMAL_DIV), "/")
.addNormalMorpheme(Symbol.createTerminator(NORMAL_ADD), "+")
.addNormalMorpheme(Symbol.createTerminator(NORMAL_SUB), "-")
.build();
The grammar definition of the calculator is as follows
<program> → <expression>
<expression> → <additive expression>
<additive expression> → <additive expression> + <multiplicative expression>
| <additive expression> - <multiplicative expression>
| <multiplicative expression>
<multiplicative expression> → <multiplicative expression> * <primary>
| <multiplicative expression> * <primary>
| <multiplicative expression> / <primary>
| <primary>
<primary> → #integerLiteral
| ( <expression> )
Using the grammar definition tool provided by the compilation engine, we translate the above grammar:
Symbol
: Grammar symbols, including terminators and non-terminatorsSymbolString
: Grammar symbol stringPrimaryProduction
: Productions, can contain semantic actionsProduction
: A collection of productions with the same left-hand sideGrammar
: Grammar
For example code, refer to com.github.liuyehcf.framework.compile.engine.test.example.calculator.CalculatorGrammar
The semantic actions of the calculator are simple:
- When reducing
<additive expression> → <additive expression> + <multiplicative expression>
, addAdd
opcode - When reducing
<additive expression> → <additive expression> - <multiplicative expression>
, addSub
opcode - When reducing
<multiplicative expression> → <multiplicative expression> * <primary>
, addMul
opcode - When reducing
<multiplicative expression> → <multiplicative expression> / <primary>
, addDiv
opcode - When reducing
<primary> → #integerLiteral
, addLoad
opcode
The generated product
after compilation is a collection of opcodes containing Add
, Sub
, Mul
, Div
, Load
. The result calculation can be done by performing calculations in the order of the opcode arrangement.
For example code, refer to com.github.liuyehcf.framework.compile.engine.test.example.calculator.CalculatorCode