-
-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lift MSP430 machine language to RzIL #4379
Conversation
a68c9ea
to
5438a1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited about this! Feel free to reach out if you need any help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Would be awesome if you finish it!
It is advisable to think about how to group several instructions. E.g. for ARM, PPC etc. we grouped them by arithmetic, shift, branch etc instructions. Mostly because the operands of those groups are almost always the same. So it is easier to initialize the operands at the start of the function and than use them to build the operations.
Check the ISA, if they group instructions into classes or something. Usually the grouping from the ISA makes the most sense.
Thanks a lot! I do intend to finish it, fingers crossed. The architecture is a bit obscure and the only available emulator (https://msp430.online/) requires a compilation toolchain because it expects an ELF or something like that, on the one hand this is good because adding support for obscure architectures is precisely why ESIL and RzIL were introduced in the first place, but on the other hand that means it's very easy to come up with bugs and - even worse - subtle deviations from the behavior of the CPU due to vague English descriptions. Add to that I'm not very familiar with RzIL API (e.g. the DUP bug above never crossed my mind), and that's why I'm taking it very slow and careful here.
It appears that MSP430 instructions are very orthogonal, there is almost no difference at all between instructions of the same arity, the only 2 differences is whether instructions update the flags and whether they're emulated (syntactic sugar for specific uses of other instructions) or not. I'm generally not familiar enough yet with the architecture to split by myself, so I'm just grouping by arity and whether the instruction is a jump instruction, this is the same approach taken by the disassembler, so I'm following it now but keeping my mind open if there are any patterns that hint if there is a better grouping. Unlike the disassembler, which naturally doesn't mention emulated instructions (since those instructions don't exist in the byte stream input to the disassembler), it might make sense in my case to add the emulated instruction because it's more efficient. For example it's inefficient to implement a SETC (set carry flag) using the architecture's emulation, which uses a more complex instruction BIS with 2 operands and more general logic. SETC is much simpler. But I'm implementing the bare basics and worrying about efficiency and auxiliaries later. |
This comment was marked as resolved.
This comment was marked as resolved.
5438a1c
to
0fa7e53
Compare
0fa7e53
to
975f225
Compare
This comment was marked as resolved.
This comment was marked as resolved.
Really sorry for being held up this long, many many interviews and disruptions in my irl life. What remains isn't that much. Just a rebase into a newer dev, and debugging the issue of why All in all, most of the major work is done IMO, the supported instructions pass into the lifter and come out as IL trees. It's just a matter of seeing why the CI doesn't think so from now on. |
525219e
to
4d28ad6
Compare
This comment was marked as resolved.
This comment was marked as resolved.
81c01a4
to
95b6070
Compare
e1f8c5e
to
4df5509
Compare
4df5509
to
7d1a32e
Compare
7d1a32e
to
e63f202
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is quite close to being complete. Only few steps left to do.
@moste00 check also this ASAN (UBSAN) error:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job!
Besides the review comments above (mostly small syntax things), please ensure that the code is leak free. This is important because those will become hot paths when analysis algos are build on top of RzIL.
You can test it with valgrind: valgrind --leak-check=yes rz-test -j 1 test/db/asm/msp430
or with ASAN (by enabling an option, can't find currently which one though).
@@ -0,0 +1,60 @@ | |||
// SPDX-FileCopyrightText: 2024 Mostafa Mahmoud <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason this is an inc
file? Can't it be its own header file with static inline
functions?
inc
files tend to break the syntax highlighting and is really just useful for huge amounts of generated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I usually think of .h
s as files that contain "non-logic" code: declarations, type definitions, constants, and so on. I think of .c
s as files that contain logic and should be seperately compiled, but inc
s are those that contain logic but shouldn't be seperately compilable, as they're considered just fragment of a single "owning" .c
file.
That's my "head cannon", so to speak. But I can change it anyway if it conflicts with any convention or aesthetic preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rot127 Thanks a lot for your kind words and for the comprehensive review.
Done. Valgrind output below indicate no leaks. |
e63f202
to
dd23c0b
Compare
dd23c0b
to
225d05a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but please fix these comments
librz/arch/isa/msp430/msp430_il.c
Outdated
} | ||
/* @} */ | ||
|
||
// ************************************* ---------------------------- ******************************** // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot to remove non-Doxygen comment lines then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave those comment markers ? They're a very obvious way to demarcate the file into the conceptually seperate "Regions". Doxygen groups don't look as visually distinct.
With those comments, I can look at the file overview in the small window in VSCode and immeditately know that the file is seperated into independent chunks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't strongly oppose it but it will be the only file that uses such a way to group, so it will be inconsistent with other code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree partially with @XVilka here. But I like these comment blocks as well. But one line is enough, IMHO.
Maybe without those ones?
// ************************************* ---------------------------- ******************************** //
librz/arch/isa/msp430/msp430_il.c
Outdated
} | ||
/** @} */ | ||
|
||
// ************************************* ------------------- ********************************* // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
librz/arch/isa/msp430/msp430_il.c
Outdated
|
||
#include <msp430/msp430_il.h> | ||
|
||
// ``The 16-bit program counter (PC/R0) points to the next instruction to be executed.`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please convert to Doxygent then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More nitpicks :D Sorry, always happens when looking at stuff a second time. But again, in general really nice.
librz/arch/isa/msp430/msp430_disas.h
Outdated
char instr[7 + 1]; | ||
char instr[7 + 1]; ///< Null-delimited string representation of an assembly operation mnemonic. | ||
|
||
bool is_byte; ///< does it have a .b ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool is_byte; ///< does it have a .b ? | |
bool is_byte; ///< does it have a ".b" postfix? |
} | ||
} | ||
|
||
ut8 word_sizedness(const Msp430Instruction *op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ut8 word_sizedness(const Msp430Instruction *op) { | |
static inline ut8 word_sizedness(const Msp430Instruction *op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ones above should also be static
, since they are in a header file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XVilka Forgot what the usual decisions on theses are (since they are quite long). static
or moving to .c
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not used across multiple files - just static
(and inline
optionally, though often that's reduntant) and move to a C file directly, if it is used across multiple files - keep it in a header file, and mark as static inline
, see e.g. librz/include/rz_vector.h
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case the move to a different file is intentional to conceptually seperate main logic from helpers. The helpers are used by every lifter and will clutter and derail the flow of the main .c
if moved to it, not to mention make it balloon in size to almost double.
So that's why I seperate them into coherent nice pieces with names like msp430_il_getset.h
(memory/registers accessors that implements addressing modes) and msp430_il_flags.h
(flags setters).
librz/arch/isa/msp430/msp430_il.c
Outdated
|
||
#include <msp430/msp430_il.h> | ||
|
||
// ``The 16-bit program counter (PC/R0) points to the next instruction to be executed.`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@moste00 Don't forget this one, please.
librz/arch/isa/msp430/msp430_il.c
Outdated
} | ||
/* @} */ | ||
|
||
// ************************************* ---------------------------- ******************************** // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree partially with @XVilka here. But I like these comment blocks as well. But one line is enough, IMHO.
Maybe without those ones?
// ************************************* ---------------------------- ******************************** //
225d05a
to
c29400f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just replace this one rz_warn_if_reached()
change.
Your checklist for this pull request
Detailed description
This PR is the first step towards lifting of the MSP430 instruction set to the RzIL representation. MSP430 is a quite simple architecture with comprehensive references here (http://elec327.github.io/assets/documents/slau144j_userguide.pdf), here (https://www.ti.com/sc/docs/products/micro/msp430/userguid/ag_b.pdf) and (https://www.ti.com/sc/docs/products/micro/msp430/userguid/ag_04.pdf).
In this PR, only lifting of the following 3 single-operand instruction is implemented. This is deliberate to keep the diff small. The rest of the lifting could be either new commits on this same branch and PR or future PRs.
Instructions lifted:
RRC (Rotate Right through Carry): takes a single operand which is also the destination, shifts it right by 1 position. The MSB becomes the carry flag, and the carry flag becomes the discarded LSB (from before the shift). Other flags are updated as specified in the references. Instruction has word-sized and byte-sized variants.
SWPB (Swap Bytes): takes a single operand which is also the destination, and exchanges the 2 bytes in it. Doesn't update the flags or any other CPU state. Always a word-sized operation.
SXT (Sign eXTension): takes a single operand which is also the destination, and replicates the MSB of the lower byte of the operand to the entire upper byte (e.g. if the lower byte is less than 128, the upper byte is 255, else the upper byte is 0). Always a word-sized operation that reads the lower byte and writes the upper byte of the operand.
Each instruction in the MSP430 has seven addressing modes, only one of them is implemented in this PR, namely the Register Addressing Mode, where the operand(s) are a register index. Other indexing modes including memory access and immediates should be straightforward to add later.
Test plan
...
Closing issues
...