Issue 469: Semantically invalid IR statements due to bad Subregister Substitution #470

vobst · 2024-06-14T14:57:03Z

It was observed that the Subregister Substitution may produce expressions that are semantically malformed. This happens when assignments are made to the lower bytes of a full register and this subregister has no special name.

It was noticed since evaluating a SUBPIECE expression of size zero over the PI abstract domain lead to panics (#469).

Note: The relevant patch is in the first commit. I snuck in some more debugging patches here ... just don't mind them :)

Fix

The underlying issue is that when assigning to the lower bytes of registers the miss-translated pcode does not use special subregister names (or there simple are no such names), but rather used the full register with a smaller size. In those cases the old code would end up thinking that the subregister had the same size as the full register and the SUBPIECE expression over the old register value that is needed to build a sill-size assignment would end up being of size zero.

This is corrected by setting the register size of the full register to the size of the variable being assigned. If the register size is already correct, this should be a no-op. The fix is in the first commit.

Examples

Here are some examples of previously malformed expressions that are corrected by this patch:

`lxvd2x` on ppc64le

  0016e454 98 56     lxvd2x vs0,r28,r10
           1c 7c
                                   $U56800:8 = INT_ADD r28, r10
                                   f0 = LOAD ram($U56800:8)
                                   $U56900:8 = INT_ADD $U56800:8, 8:8
                                   vs0:8 = LOAD ram($U56900:8)

becomes

     DEF [instr_0016e454_0] $U5a300:8(temp) = (r28:8 + r10:8)
     DEF [instr_0016e454_1] loaded_value:8(temp) := Load from $U5a300:8(temp)
     DEF [instr_0016e454_1_cast_to_base] vs0:16 = (loaded_value:8(temp) Piece (vs0:16)[0-7])
     DEF [instr_0016e454_2] $U5a400:8(temp) = ($U5a300:8(temp) + 0x8:8)
     DEF [instr_0016e454_3] loaded_value:8(temp) := Load from $U5a400:8(temp)
-    DEF [instr_0016e454_3_cast_to_base] vs0:16 = ((vs0:16)[16-15] Piece loaded_value:8(temp))
+    DEF [instr_0016e454_3_cast_to_base] vs0:16 = ((vs0:16)[8-15] Piece loaded_value:8(temp))

`cvt.s.L` on mips64el

  00117788 60 01     cvt.s.L       f5,f0
           a0 46
                                   f5:4 = INT2FLOAT f0

becomes

-    DEF [instr_0011778c_0] f5:8 = ((f5:8)[8-7] Piece ((f5:8)[0-3] FloatAdd (f5:8)[0-3]))
+    DEF [instr_0011778c_0] f5:8 = ((f5:8)[4-7] Piece ((f5:8)[0-3] FloatAdd (f5:8)[0-3]))

`sel` on arm

  08000276 a3 fa     sel    r5,r3,r7
           87 f5
                     $Ub8680:1 = INT_EQUAL GE1, 1:1
                     $Ub8700:1 = INT_MULT $Ub8680:1, r3:1
                     $Ub8780:1 = INT_EQUAL GE1, 0:1
                     $Ub8800:1 = INT_MULT $Ub8780:1, r7:1
                     r5:1 = INT_ADD $Ub8700:1, $Ub8800:1

becomes

     DEF [instr_08000276_1] $Ub7900:1(temp) = $Ub7880:1(temp) * (r3:4)[0-0]
     DEF [instr_08000276_2] $Ub7980:1(temp) = (GE1:1 == 0x0:1)
     DEF [instr_08000276_3] $Ub7a00:1(temp) = $Ub7980:1(temp) * (r7:4)[0-0]
-    DEF [instr_08000276_4] r5:4 = ((r5:4)[4-3] Piece ($Ub7900:1(temp) + $Ub7a00:1(temp)))
+    DEF [instr_08000276_4] r5:4 = ((r5:4)[1-3] Piece ($Ub7900:1(temp) + $Ub7a00:1(temp)))

Numbers

Here are the numbers of corrected lines for each program in our minimal test suite. All changes were manually verified to be correct.

| program  | arch        | occurrences |
|----------|-------------|------------|
| netfs.ko | amd64       | 0          |
| netfs.ko | arm64       | 0          |
| netfs.ko | armhf       | 0          |
| netfs.ko | mips32r2el  | 0          |
| netfs.ko | mips64r2el  | 0          |
| netfs.ko | powerpc64le | 0          |
| netfs.ko | x86         | 0          |
| ls       | amd64       | 0          |
| ls       | arm64       | 0          |
| ls       | armel       | 0          |
| ls       | armhf       | 0          |
| ls       | mips64el    | 54         |
| ls       | mipsel      | 0          |
| ls       | ppc64el     | 66         |
| ls       | x86         | 0          |
| readelf  | amd64       | 0          |
| readelf  | arm64       | 0          |
| readelf  | armel       | 0          |
| readelf  | armhf       | 0          |
| readelf  | mips64el    | 0          |
| readelf  | mipsel      | 0          |
| readelf  | ppc64el     | 31         |
| readelf  | x86         | 0          |

Assignments to subregisters that have the same name as the full register may lead to expressions that have SUBPIECE operations that would result in zero-sized values. In general, the subregister substitution produces semantically incorrect IR code in those cases. I observed that this occurs when assignments are made to the lower bytes of a larger register. Those cases should be handled by this patch. If this may also happen for assignments to other subregisters, then this patch is incomplete (see FIXME). Reported-by: https://github.com/ljungnickel Closes: fkie-cad#469 Signed-off-by: Valentin Obst <[email protected]>

…values If we see zero-sized values flying around it is a clear sign that something, somewhere went terribly wrong, so just print a meaningful error and unwind. Signed-off-by: Valentin Obst <[email protected]>

Replace integer arithmetic that might overflow with the appropriate methods to explicitly opt-in to the behavior. Since these methods have the same semantics as the bare operators (in release builds) there should be no semantic changes here. I did not investigate if the original code was aware that these operations might overflow. In particual, it is not clear to me if the right shift is expected to potentially overflow. Signed-off-by: Valentin Obst <[email protected]>

While debugging analysis issues I found myself reinserting the same print statements at the same places for each new investigation. Therefore, introduce an extensible infrastructure to debug different parts of the analysis. The `utils::debug::Settings` object is derived from the command line and an immutable reference to it is the sole thing that is being passed around to convey the current debug environment to different part of the code. It also provides helper methods to perform debug operations based on the runtime environment. The related command-line options and the output format are entirely unstable and only meant for development purposes. Signed-off-by: Valentin Obst <[email protected]>

The output of the Ghidra plugin (and Ghidra in general) is non-deterministic, even on the same Ghidra version. Probably this is due to analysis timeouts. Furthermore, running Ghidra takes a significant portion of the analysis time. Therefore, we can create reproducibility as well as speedups during development by saving the plugin output (using `--debug pcode-raw`) and replaying it later (via `--pcode-raw /path/to/file`). Signed-off-by: Valentin Obst <[email protected]>

Signed-off-by: Valentin Obst <[email protected]>

Valentin Obst added 6 commits June 14, 2024 23:11

lib/abstract_domain/data: add debug assertions to prevent zero-sized …

33229f3

…values If we see zero-sized values flying around it is a clear sign that something, somewhere went terribly wrong, so just print a meaningful error and unwind. Signed-off-by: Valentin Obst <[email protected]>

clippy: Rust 1.79 fixes

5cfd32d

Signed-off-by: Valentin Obst <[email protected]>

vobst force-pushed the issue_469 branch from cf3c238 to 5cfd32d Compare June 14, 2024 21:13

vobst added the bug Something isn't working label Jun 14, 2024

vobst self-assigned this Jun 14, 2024

This was linked to issues Jun 14, 2024

panic in Result::unwrap() due to "InvalidShiftAmount" when analyzing STM32 Firmware #469

Closed

Explicitly use wrapping arithmetic operations when overflow may occur #467

Closed

CHANGES: fix Subregister Substitution

9c6cc26

Signed-off-by: Valentin Obst <[email protected]>

vobst merged commit 65efd5f into fkie-cad:master Jun 18, 2024
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue 469: Semantically invalid IR statements due to bad Subregister Substitution #470

Issue 469: Semantically invalid IR statements due to bad Subregister Substitution #470

vobst commented Jun 14, 2024 •

edited

Loading

Issue 469: Semantically invalid IR statements due to bad Subregister Substitution #470

Issue 469: Semantically invalid IR statements due to bad Subregister Substitution #470

Conversation

vobst commented Jun 14, 2024 • edited Loading

Fix

Examples

lxvd2x on ppc64le

cvt.s.L on mips64el

sel on arm

Numbers

vobst commented Jun 14, 2024 •

edited

Loading

`lxvd2x` on ppc64le

`cvt.s.L` on mips64el

`sel` on arm