Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lift MSP430 machine language to RzIL #4379

Merged
merged 2 commits into from
Jul 1, 2024
Merged

Conversation

moste00
Copy link
Contributor

@moste00 moste00 commented Mar 22, 2024

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository
  • I made sure to follow the project's coding style
  • I've documented or updated the documentation of every function and struct this PR changes. If not so I've explained why.
  • I've added tests that prove my fix is effective or that my feature works (if possible)
  • I've updated the rizin book with the relevant information (if needed)

Detailed description

This PR is the first step towards lifting of the MSP430 instruction set to the RzIL representation. MSP430 is a quite simple architecture with comprehensive references here (http://elec327.github.io/assets/documents/slau144j_userguide.pdf), here (https://www.ti.com/sc/docs/products/micro/msp430/userguid/ag_b.pdf) and (https://www.ti.com/sc/docs/products/micro/msp430/userguid/ag_04.pdf).

In this PR, only lifting of the following 3 single-operand instruction is implemented. This is deliberate to keep the diff small. The rest of the lifting could be either new commits on this same branch and PR or future PRs.

Instructions lifted:
 

  • RRC (Rotate Right through Carry): takes a single operand which is also the destination, shifts it right by 1 position. The MSB becomes the carry flag, and the carry flag becomes the discarded LSB (from before the shift). Other flags are updated as specified in the references. Instruction has word-sized and byte-sized variants.

  • SWPB (Swap Bytes): takes a single operand which is also the destination, and exchanges the 2 bytes in it. Doesn't update the flags or any other CPU state. Always a word-sized operation.

  • SXT (Sign eXTension): takes a single operand which is also the destination, and replicates the MSB of the lower byte of the operand to the entire upper byte (e.g. if the lower byte is less than 128, the upper byte is 255, else the upper byte is 0). Always a word-sized operation that reads the lower byte and writes the upper byte of the operand.

Each instruction in the MSP430 has seven addressing modes, only one of them is implemented in this PR, namely the Register Addressing Mode, where the operand(s) are a register index. Other indexing modes including memory access and immediates should be straightforward to add later.

Test plan

...

Closing issues

...

Copy link
Member

@DMaroo DMaroo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited about this! Feel free to reach out if you need any help.

librz/arch/isa/msp430/msp430_il.c Outdated Show resolved Hide resolved
librz/arch/isa/msp430/msp430_il.c Outdated Show resolved Hide resolved
XVilka

This comment was marked as resolved.

Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Would be awesome if you finish it!

It is advisable to think about how to group several instructions. E.g. for ARM, PPC etc. we grouped them by arithmetic, shift, branch etc instructions. Mostly because the operands of those groups are almost always the same. So it is easier to initialize the operands at the start of the function and than use them to build the operations.
Check the ISA, if they group instructions into classes or something. Usually the grouping from the ISA makes the most sense.

@Rot127 Rot127 added the MSP430 label Mar 23, 2024
@moste00
Copy link
Contributor Author

moste00 commented Mar 23, 2024

Very nice! Would be awesome if you finish it!

Thanks a lot! I do intend to finish it, fingers crossed. The architecture is a bit obscure and the only available emulator (https://msp430.online/) requires a compilation toolchain because it expects an ELF or something like that, on the one hand this is good because adding support for obscure architectures is precisely why ESIL and RzIL were introduced in the first place, but on the other hand that means it's very easy to come up with bugs and - even worse - subtle deviations from the behavior of the CPU due to vague English descriptions. Add to that I'm not very familiar with RzIL API (e.g. the DUP bug above never crossed my mind), and that's why I'm taking it very slow and careful here.

It is advisable to think about how to group several instructions. E.g. for ARM, PPC etc. we grouped them by arithmetic, shift, branch etc instructions.

It appears that MSP430 instructions are very orthogonal, there is almost no difference at all between instructions of the same arity, the only 2 differences is whether instructions update the flags and whether they're emulated (syntactic sugar for specific uses of other instructions) or not.

I'm generally not familiar enough yet with the architecture to split by myself, so I'm just grouping by arity and whether the instruction is a jump instruction, this is the same approach taken by the disassembler, so I'm following it now but keeping my mind open if there are any patterns that hint if there is a better grouping.

Unlike the disassembler, which naturally doesn't mention emulated instructions (since those instructions don't exist in the byte stream input to the disassembler), it might make sense in my case to add the emulated instruction because it's more efficient. For example it's inefficient to implement a SETC (set carry flag) using the architecture's emulation, which uses a more complex instruction BIS with 2 operands and more general logic. SETC is much simpler. But I'm implementing the bare basics and worrying about efficiency and auxiliaries later.

@XVilka

This comment was marked as resolved.

XVilka

This comment was marked as resolved.

XVilka

This comment was marked as resolved.

librz/arch/isa/msp430/msp430_il.c Outdated Show resolved Hide resolved
librz/arch/isa/msp430/msp430_instruction_parse.inc Outdated Show resolved Hide resolved
librz/arch/isa/msp430/msp430_instruction_parse.inc Outdated Show resolved Hide resolved
@XVilka

This comment was marked as resolved.

@moste00
Copy link
Contributor Author

moste00 commented Apr 30, 2024

@moste00 any updates on this PR?

Really sorry for being held up this long, many many interviews and disruptions in my irl life.

What remains isn't that much. Just a rebase into a newer dev, and debugging the issue of why git-clang-format gives the green on my machine but somehow fails in the CI even though the 2 versions match. There is a complaint by tinyCC about a switch case somewhere, but I think it's a nitpick and I have a fix in mind. I will push later today and ask for review.

All in all, most of the major work is done IMO, the supported instructions pass into the lifter and come out as IL trees. It's just a matter of seeing why the CI doesn't think so from now on.

@XVilka

This comment was marked as resolved.

@moste00 moste00 force-pushed the msp430_il_lifting branch 2 times, most recently from 81c01a4 to 95b6070 Compare June 5, 2024 21:51
@moste00 moste00 force-pushed the msp430_il_lifting branch 2 times, most recently from e1f8c5e to 4df5509 Compare June 16, 2024 16:16
XVilka

This comment was marked as resolved.

librz/arch/isa/msp430/msp430_il.c Outdated Show resolved Hide resolved
test/db/asm/msp430 Outdated Show resolved Hide resolved
Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is quite close to being complete. Only few steps left to do.

test/db/asm/msp430 Outdated Show resolved Hide resolved
@XVilka XVilka requested a review from DMaroo June 24, 2024 12:57
@XVilka
Copy link
Member

XVilka commented Jun 24, 2024

@moste00 check also this ASAN (UBSAN) error:

[XX] db/asm/msp430 <asm> clr 0xfffa(r4)
-- <asm> clr 0xfffa(r4) <--- 8443faff ---> <IL>
-- IL
(storew 0 (+ (bv 16 0xfffa) (var r4)) (bv 16 0x0))
../librz/arch/isa/msp430/msp430_disas.c:359:19: runtime error: left shift of 65530 by 16 places cannot be represented in type 'int'

@XVilka XVilka requested a review from Rot127 June 24, 2024 13:00
Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!

Besides the review comments above (mostly small syntax things), please ensure that the code is leak free. This is important because those will become hot paths when analysis algos are build on top of RzIL.

You can test it with valgrind: valgrind --leak-check=yes rz-test -j 1 test/db/asm/msp430 or with ASAN (by enabling an option, can't find currently which one though).

librz/arch/isa/msp430/msp430_disas.c Outdated Show resolved Hide resolved
librz/arch/isa/msp430/msp430_disas.c Show resolved Hide resolved
librz/arch/isa/msp430/msp430_disas.c Show resolved Hide resolved
librz/arch/isa/msp430/msp430_disas.h Outdated Show resolved Hide resolved
librz/arch/isa/msp430/msp430_disas.h Outdated Show resolved Hide resolved
librz/arch/isa/msp430/msp430_il_flags.inc Outdated Show resolved Hide resolved
librz/arch/isa/msp430/msp430_il_flags.inc Outdated Show resolved Hide resolved
librz/arch/isa/msp430/msp430_il_jmp_utils.inc Outdated Show resolved Hide resolved
@@ -0,0 +1,60 @@
// SPDX-FileCopyrightText: 2024 Mostafa Mahmoud <[email protected]>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason this is an inc file? Can't it be its own header file with static inline functions?
inc files tend to break the syntax highlighting and is really just useful for huge amounts of generated code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I usually think of .hs as files that contain "non-logic" code: declarations, type definitions, constants, and so on. I think of .cs as files that contain logic and should be seperately compiled, but incs are those that contain logic but shouldn't be seperately compilable, as they're considered just fragment of a single "owning" .c file.

That's my "head cannon", so to speak. But I can change it anyway if it conflicts with any convention or aesthetic preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote to move it to a normal .c .h form. In my experience working with .inc files is always a little annoying in one or another aspect. @XVilka @DMaroo

librz/arch/isa/msp430/msp430_register_names.h Outdated Show resolved Hide resolved
@moste00
Copy link
Contributor Author

moste00 commented Jun 28, 2024

@Rot127 Thanks a lot for your kind words and for the comprehensive review.

Besides the review comments above (mostly small syntax things), please ensure that the code is leak free. This is important because those will become hot paths when analysis algos are build on top of RzIL.

You can test it with valgrind: valgrind --leak-check=yes rz-test -j 1 test/db/asm/msp430 or with ASAN (by enabling an option, can't find currently which one though).

Done. Valgrind output below indicate no leaks.

image

XVilka

This comment was marked as resolved.

Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but please fix these comments

}
/* @} */

// ************************************* ---------------------------- ******************************** //
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot to remove non-Doxygen comment lines then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave those comment markers ? They're a very obvious way to demarcate the file into the conceptually seperate "Regions". Doxygen groups don't look as visually distinct.

With those comments, I can look at the file overview in the small window in VSCode and immeditately know that the file is seperated into independent chunks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't strongly oppose it but it will be the only file that uses such a way to group, so it will be inconsistent with other code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree partially with @XVilka here. But I like these comment blocks as well. But one line is enough, IMHO.

Maybe without those ones?

// ************************************* ---------------------------- ******************************** //

}
/** @} */

// ************************************* ------------------- ********************************* //
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same


#include <msp430/msp430_il.h>

// ``The 16-bit program counter (PC/R0) points to the next instruction to be executed.``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please convert to Doxygent then.

@XVilka XVilka requested a review from Rot127 June 29, 2024 00:44
Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More nitpicks :D Sorry, always happens when looking at stuff a second time. But again, in general really nice.

librz/arch/isa/msp430/msp430_disas.c Show resolved Hide resolved
librz/arch/isa/msp430/msp430_disas.h Show resolved Hide resolved
char instr[7 + 1];
char instr[7 + 1]; ///< Null-delimited string representation of an assembly operation mnemonic.

bool is_byte; ///< does it have a .b ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool is_byte; ///< does it have a .b ?
bool is_byte; ///< does it have a ".b" postfix?

}
}

ut8 word_sizedness(const Msp430Instruction *op) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ut8 word_sizedness(const Msp430Instruction *op) {
static inline ut8 word_sizedness(const Msp430Instruction *op) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ones above should also be static, since they are in a header file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XVilka Forgot what the usual decisions on theses are (since they are quite long). static or moving to .c?

Copy link
Member

@XVilka XVilka Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not used across multiple files - just static (and inline optionally, though often that's reduntant) and move to a C file directly, if it is used across multiple files - keep it in a header file, and mark as static inline, see e.g. librz/include/rz_vector.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the move to a different file is intentional to conceptually seperate main logic from helpers. The helpers are used by every lifter and will clutter and derail the flow of the main .c if moved to it, not to mention make it balloon in size to almost double.

So that's why I seperate them into coherent nice pieces with names like msp430_il_getset.h (memory/registers accessors that implements addressing modes) and msp430_il_flags.h (flags setters).


#include <msp430/msp430_il.h>

// ``The 16-bit program counter (PC/R0) points to the next instruction to be executed.``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moste00 Don't forget this one, please.

librz/arch/isa/msp430/msp430_il.c Outdated Show resolved Hide resolved
}
/* @} */

// ************************************* ---------------------------- ******************************** //
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree partially with @XVilka here. But I like these comment blocks as well. But one line is enough, IMHO.

Maybe without those ones?

// ************************************* ---------------------------- ******************************** //

Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just replace this one rz_warn_if_reached() change.

librz/arch/isa/msp430/msp430_disas.c Outdated Show resolved Hide resolved
@XVilka XVilka merged commit 3e258f5 into rizinorg:dev Jul 1, 2024
43 of 44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants