This is a step-by-step overview of how to refactor an architecture.
It can also be used to add a new architecture module, as long as the architecture is supported by LLVM or a fork of it.
Please always contact us in the Auto-Sync tracking issue before working on a module. We can provide support and save you a lot of time.
Don't hesitate to ask questions in our Telegram community channel, especially if you feel stuck or struggle to understand where an issue is coming from. Although already simplified, the update process is still relatively complex.
Note:
- If we talk about C++ files in the steps below, we always refer to the files in the LLVM repo.
- `PrinterCapstone` is the class defined in `llvm-capstone/llvm/utils/TableGen/PrinterCapstone.cpp`.
- Always attempt to make the translated C file behave as closely as possible to the original C++ file! This greatly helps debugging and ensures that Capstone behaves almost exactly like the original LLVM.
- Read `CONTRIBUTING.md`
- Read `docs/ARCHITECTURE.md`
- Read `suite/auto-sync/README.md`
- Read `suite/auto-sync/ARCHITECTURE.md`
- Read `suite/auto-sync/intro.md`
- Delete all files in `arch/<ARCH>/`, except the `ARCHModule.*` and `ARCHMapping.*` files.
- `cd suite/auto-sync/`
- `pip install -e .`
- Clone and build `llvm-tblgen` (see docs).
- Quickly check the options of the updater: `ASUpdater -h`
- Add the arch name in `Target.py`.
- In llvm-capstone, handle the arch in `PrinterCapstone.cpp::decoderEmitterEmitFieldFromInstruction()` (add a decoder function).
- Generate: `ASUpdater -s IncGen -a ARCH`
  - Errors? Check if the error message tells you what to do. If no hint exists, ask us.
  - Check if the `inc` files in `build` look good.
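LLVM's generated decoders rely on a templated `fieldFromInstruction()` helper that extracts a bit field from the instruction word. Because C has no templates, the C output needs a concrete function per instruction width, which is roughly what the decoder function added in `PrinterCapstone.cpp` has to provide. The sketch below assumes a 32-bit instruction word; the name `fieldFromInstruction_4` is hypothetical and not necessarily what the emitter generates.

```c
#include <stdint.h>

/* Hypothetical helper: extract `num_bits` bits of `insn`, starting at bit
 * `start_bit` (bit 0 = least significant bit). Mirrors the semantics of
 * LLVM's fieldFromInstruction() for a 32-bit instruction word. */
static uint32_t fieldFromInstruction_4(uint32_t insn, unsigned start_bit,
				       unsigned num_bits)
{
	/* Shifting a 32-bit value by 32 is undefined, so the full-width
	 * mask is handled separately. */
	uint32_t mask = (num_bits >= 32) ? UINT32_MAX
					 : ((UINT32_C(1) << num_bits) - 1);
	return (insn >> start_bit) & mask;
}
```

For example, `fieldFromInstruction_4(0xABCD1234, 4, 8)` yields `0x23`.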
- Check for template functions in `<ARCH>InstPrinter.cpp` and `<ARCH>Disassembler.cpp`.
- Copy the new config into `arch_config.json` (see LoongArch for a minimal example).
  - Don't forget to add `ARCHInstPrinter.cpp` to the list of the `AddCSDetail` tests!
  - Add at least `<ARCH>InstPrinter.cpp`, `<ARCH>InstPrinter.h` and `<ARCH>Disassembler.cpp` to the translation list.
    - Tip: The variables used there are defined in `path_vars.json`.
- Add architecture-specific includes in `Patches/Includes.py`. To begin with, copy the code from another architecture.
- Prepare the API header (`<arch>.h`) for patching:
  - Check the generated `inc` files. Files named like `<ARCH>GenCS<something>Enum.inc` contain enumerations for the header. Those get patched into the main header file of the architecture.
  - Remove the old values and add `// generated content <...> begin` comments for patching. Check out `loongarch.h` as an example.
- Commit all changes so far.
- The next step will write to `arch/` and to the `include/capstone/<arch>.h` header!
  - Run generation, translation and copy/patch the files: `ASUpdater -a <ARCH> -w --copy-translated -s IncGen Translate PatchArchHeader`
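Template functions are the part that cannot be mapped 1:1, since C has no templates. One common approach, sketched here under the assumption that only a few instantiations are actually used, is a macro that stamps out one C function per template argument. The C++ template in the comment, `DEFINE_isSImm` and `isSImm_<N>` are all hypothetical names; check how LoongArch handles its templates for the convention actually used.

```c
#include <stdint.h>

/* Hypothetical C++ original:
 *   template <unsigned Bits> bool isSImm(int64_t Imm);
 * C has no templates, so the macro defines one function per instantiation.
 * Valid for 1 <= Bits <= 63. */
#define DEFINE_isSImm(Bits) \
	static int isSImm_##Bits(int64_t imm) \
	{ \
		const int64_t min = -((int64_t)1 << ((Bits) - 1)); \
		const int64_t max = ((int64_t)1 << ((Bits) - 1)) - 1; \
		return imm >= min && imm <= max; \
	}

/* Instantiate only the widths referenced by the translated code. */
DEFINE_isSImm(12)
DEFINE_isSImm(16)
```

Generating only the instantiations the translated code references keeps the C files close to what the C++ compiler would have produced.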
- Fix build errors:
  - Arch header:
    - Invalid characters in enum identifiers? Replace the character in `PrinterCapstone::normalizedMnemonic`.
  - In `arch/<ARCH>`:
    - Missing identifiers/symbols? Check if they are somewhere in the generated files. If yes, include them and update `Includes.py`. If not, find the LLVM source file where they are defined and add it to `arch_config.json` to translate it.
      - OR it needs the `SystemOperands.inc` file. It can also be generated by adding the arch to the list in `inc_gen.json`.
  - Note: When you start the next step, you likely don't want to generate, translate and copy the files again, because your hand-made fixes would be overwritten. So make sure you no longer use the `-w` flag for `ASUpdater`, and check thoroughly that all necessary files got translated!
- Commit to save the changes so far.
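`PrinterCapstone::normalizedMnemonic` lives in the C++ TableGen backend. The C sketch below only illustrates the idea: every character that is not valid in a C identifier is replaced. The replacement rules (invalid character becomes `_`, letters are upper-cased) and the function name `normalize_mnemonic` are assumptions; the real function may use different replacements.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: turn a mnemonic such as "add.w" into a string usable
 * inside a C enum identifier ("ADD_W"). Characters that are not valid in
 * identifiers become '_'. The result is assumed to be appended to a prefix
 * like ARCH_INS_, so a leading digit would be harmless.
 * Writes at most `out_size` bytes, including the terminating NUL. */
static void normalize_mnemonic(const char *mnemonic, char *out, size_t out_size)
{
	size_t i = 0;
	if (out_size == 0)
		return;
	for (; mnemonic[i] != '\0' && i + 1 < out_size; ++i) {
		unsigned char c = (unsigned char)mnemonic[i];
		out[i] = (isalnum(c) || c == '_') ? (char)toupper(c) : '_';
	}
	out[i] = '\0';
}
```

With this sketch, `"fcvt.s.d"` becomes `"FCVT_S_D"`.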
- Remove all obviously irrelevant C++ code from the translated files (e.g. class initializers).
  - Double-check non-obvious cases to decide whether they are important. Remember: removing something might lead to bugs later!
  - If in doubt, ask us.
- If you fix the same syntax over and over again, consider adding a patch for the `CppTranslator`.
- Common problems:
  - Missing namespace prefix: `unsigned GR32Regs[]` should be `unsigned ARCH_GR32Regs[]`. See the `namespace begin/end` comments in the code.
  - TODO: Add more.
- If in doubt, check the original C++ file in the LLVM repo.
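The namespace-prefix fix from the common problems above, as a minimal sketch: C has no namespaces, so a table defined inside `namespace ARCH { ... }` in C++ gets the arch prefix in C, and every call site must use the prefixed name as well. The register numbers and the `ARCH_isGR32Reg` helper are hypothetical placeholders.

```c
#include <stdbool.h>
#include <stddef.h>

/* C++ original (inside `namespace ARCH { ... }`):
 *   unsigned GR32Regs[] = { ... };
 * In C the prefix avoids clashes with equally named tables of other
 * architectures. Values are placeholders. */
static const unsigned ARCH_GR32Regs[] = { 10, 11, 12, 13 };

/* Hypothetical user of the table: it must refer to the prefixed name too. */
static bool ARCH_isGR32Reg(unsigned reg)
{
	size_t n = sizeof(ARCH_GR32Regs) / sizeof(ARCH_GR32Regs[0]);
	for (size_t i = 0; i < n; ++i) {
		if (ARCH_GR32Regs[i] == reg)
			return true;
	}
	return false;
}
```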
- Add `ARCHLinkage.h` and the corresponding functions in `<ARCH>InstPrinter.c` and `<ARCH>Disassembler.c`.
- Add the essential code in `ARCHMapping.c`. Essential is everything not related to details.
- If unsure how to do Capstone <-> LLVM code things, always check LoongArch. If LoongArch doesn't handle the case, check Mips or SystemZ.
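One essential job of `ARCHMapping.c` is translating the LLVM opcode of a decoded instruction into the Capstone instruction ID. The self-contained sketch below shows only the lookup idea, using a small hand-written table sorted by LLVM opcode and a binary search. All names and numbers are hypothetical, and returning `0` for "not found" is an assumption; copy the actual mechanism from `LoongArchMapping.c`.

```c
#include <stddef.h>

/* Hypothetical mapping entry: LLVM opcode -> Capstone instruction ID. */
typedef struct {
	unsigned llvm_opcode;
	unsigned cs_id;
} arch_insn_map;

/* Placeholder values; the table must be sorted by llvm_opcode. */
static const arch_insn_map ARCH_insn_map[] = {
	{ 5, 1 },
	{ 9, 2 },
	{ 17, 3 },
};

#define ARCH_INSN_MAP_LEN (sizeof(ARCH_insn_map) / sizeof(ARCH_insn_map[0]))

/* Binary search for `opcode`; returns 0 (assumed invalid ID) if absent. */
static unsigned ARCH_map_opcode(unsigned opcode)
{
	size_t lo = 0, hi = ARCH_INSN_MAP_LEN;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (ARCH_insn_map[mid].llvm_opcode < opcode)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < ARCH_INSN_MAP_LEN && ARCH_insn_map[lo].llvm_opcode == opcode)
		return ARCH_insn_map[lo].cs_id;
	return 0;
}
```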
- Update the regression MC tests: Map the LLVM `mattr` and `mcpu` names to the CS identifiers if necessary, by editing the `mcupdater.json` config file.
- Update the tests: `ASUpdater -s MCUpdate -a Arch -w`
- Run the MC tests: `cstest tests/MC/Arch`
- Effectively copy the behavior from `LoongArchMapping.c` or `SystemZMapping.c`, but change the values.
- Changes to the API (structs in `<arch>.h`) are only allowed if the existing definition was wrong. Otherwise, only extensions are allowed.
- Don't forget to update the Python bindings.
- Run the detail tests to check the results.
- Run the detail tests with coverage. `<ARCH>Mapping.c` should be covered near 100%.
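Why extensions are preferred over changes: if a new field is appended to an existing detail struct, the earlier fields keep their offsets, so code compiled against the old layout and bindings that mirror the struct (the Python bindings do this with `ctypes` structures) keep reading the same bytes. The struct names and fields below are hypothetical; the sketch only demonstrates the layout property.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand struct as released in an earlier version. */
typedef struct {
	uint8_t type;
	int64_t imm;
	uint8_t access;
} cs_archx_op_v1;

/* Extension: the new field is appended and existing fields keep their order,
 * so their offsets stay the same. Inserting it before `imm` instead would
 * shift later fields and break existing users. */
typedef struct {
	uint8_t type;
	int64_t imm;
	uint8_t access;
	uint8_t new_flag; /* hypothetical new field */
} cs_archx_op_v2;
```

Even with appended fields, the Python bindings still need the new member added to mirror the C struct.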