Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 469: Semantically invalid IR statements due to bad Subregister Substitution #470

Merged
merged 7 commits into from
Jun 18, 2024

Conversation

vobst
Copy link
Collaborator

@vobst vobst commented Jun 14, 2024

It was observed that the Subregister Substitution may produce expressions that are semantically malformed. This happens when assignments are made to the lower bytes of a full register and this subregister has no special name.

It was noticed since evaluating a SUBPIECE expression of size zero over the PI abstract domain lead to panics (#469).

Note: The relevant patch is in the first commit. I snuck in some more debugging patches here ... just don't mind them :)

Fix

The underlying issue is that when assigning to the lower bytes of registers the miss-translated pcode does not use special subregister names (or there simple are no such names), but rather used the full register with a smaller size. In those cases the old code would end up thinking that the subregister had the same size as the full register and the SUBPIECE expression over the old register value that is needed to build a sill-size assignment would end up being of size zero.

This is corrected by setting the register size of the full register to the size of the variable being assigned. If the register size is already correct, this should be a no-op. The fix is in the first commit.

Examples

Here are some examples of previously malformed expressions that are corrected by this patch:

lxvd2x on ppc64le

  0016e454 98 56     lxvd2x vs0,r28,r10
           1c 7c
                                   $U56800:8 = INT_ADD r28, r10
                                   f0 = LOAD ram($U56800:8)
                                   $U56900:8 = INT_ADD $U56800:8, 8:8
                                   vs0:8 = LOAD ram($U56900:8)

becomes

     DEF [instr_0016e454_0] $U5a300:8(temp) = (r28:8 + r10:8)
     DEF [instr_0016e454_1] loaded_value:8(temp) := Load from $U5a300:8(temp)
     DEF [instr_0016e454_1_cast_to_base] vs0:16 = (loaded_value:8(temp) Piece (vs0:16)[0-7])
     DEF [instr_0016e454_2] $U5a400:8(temp) = ($U5a300:8(temp) + 0x8:8)
     DEF [instr_0016e454_3] loaded_value:8(temp) := Load from $U5a400:8(temp)
-    DEF [instr_0016e454_3_cast_to_base] vs0:16 = ((vs0:16)[16-15] Piece loaded_value:8(temp))
+    DEF [instr_0016e454_3_cast_to_base] vs0:16 = ((vs0:16)[8-15] Piece loaded_value:8(temp))

cvt.s.L on mips64el

  00117788 60 01     cvt.s.L       f5,f0
           a0 46
                                   f5:4 = INT2FLOAT f0

becomes

-    DEF [instr_0011778c_0] f5:8 = ((f5:8)[8-7] Piece ((f5:8)[0-3] FloatAdd (f5:8)[0-3]))
+    DEF [instr_0011778c_0] f5:8 = ((f5:8)[4-7] Piece ((f5:8)[0-3] FloatAdd (f5:8)[0-3]))

sel on arm

  08000276 a3 fa     sel    r5,r3,r7
           87 f5
                     $Ub8680:1 = INT_EQUAL GE1, 1:1
                     $Ub8700:1 = INT_MULT $Ub8680:1, r3:1
                     $Ub8780:1 = INT_EQUAL GE1, 0:1
                     $Ub8800:1 = INT_MULT $Ub8780:1, r7:1
                     r5:1 = INT_ADD $Ub8700:1, $Ub8800:1

becomes

     DEF [instr_08000276_1] $Ub7900:1(temp) = $Ub7880:1(temp) * (r3:4)[0-0]
     DEF [instr_08000276_2] $Ub7980:1(temp) = (GE1:1 == 0x0:1)
     DEF [instr_08000276_3] $Ub7a00:1(temp) = $Ub7980:1(temp) * (r7:4)[0-0]
-    DEF [instr_08000276_4] r5:4 = ((r5:4)[4-3] Piece ($Ub7900:1(temp) + $Ub7a00:1(temp)))
+    DEF [instr_08000276_4] r5:4 = ((r5:4)[1-3] Piece ($Ub7900:1(temp) + $Ub7a00:1(temp)))

Numbers

Here are the numbers of corrected lines for each program in our minimal test suite. All changes were manually verified to be correct.

| program  | arch        | occurrences |
|----------|-------------|------------|
| netfs.ko | amd64       | 0          |
| netfs.ko | arm64       | 0          |
| netfs.ko | armhf       | 0          |
| netfs.ko | mips32r2el  | 0          |
| netfs.ko | mips64r2el  | 0          |
| netfs.ko | powerpc64le | 0          |
| netfs.ko | x86         | 0          |
| ls       | amd64       | 0          |
| ls       | arm64       | 0          |
| ls       | armel       | 0          |
| ls       | armhf       | 0          |
| ls       | mips64el    | 54         |
| ls       | mipsel      | 0          |
| ls       | ppc64el     | 66         |
| ls       | x86         | 0          |
| readelf  | amd64       | 0          |
| readelf  | arm64       | 0          |
| readelf  | armel       | 0          |
| readelf  | armhf       | 0          |
| readelf  | mips64el    | 0          |
| readelf  | mipsel      | 0          |
| readelf  | ppc64el     | 31         |
| readelf  | x86         | 0          |

Valentin Obst added 6 commits June 14, 2024 23:11
Assignments to subregisters that have the same name as the full register
may lead to expressions that have SUBPIECE operations that would result
in zero-sized values. In general, the subregister substitution produces
semantically incorrect IR code in those cases.

I observed that this occurs when assignments are made to the lower
bytes of a larger register. Those cases should be handled by this
patch. If this may also happen for assignments to other subregisters,
then this patch is incomplete (see FIXME).

Reported-by: https://github.com/ljungnickel
Closes: fkie-cad#469
Signed-off-by: Valentin Obst <[email protected]>
…values

If we see zero-sized values flying around it is a clear sign that
something, somewhere went terribly wrong, so just print a meaningful
error and unwind.

Signed-off-by: Valentin Obst <[email protected]>
Replace integer arithmetic that might overflow with the appropriate
methods to explicitly opt-in to the behavior.

Since these methods have the same semantics as the bare operators (in
release builds) there should be no semantic changes here.

I did not investigate if the original code was aware that these
operations might overflow. In particual, it is not clear to me if the
right shift is expected to potentially overflow.

Signed-off-by: Valentin Obst <[email protected]>
While debugging analysis issues I found myself reinserting the
same print statements at the same places for each new investigation.

Therefore, introduce an extensible infrastructure to debug different
parts of the analysis. The `utils::debug::Settings` object is derived
from the command line and an immutable reference to it is the sole thing
that is being passed around to convey the current debug environment to
different part of the code. It also provides helper methods to perform
debug operations based on the runtime environment.

The related command-line options and the output format are entirely
unstable and only meant for development purposes.

Signed-off-by: Valentin Obst <[email protected]>
The output of the Ghidra plugin (and Ghidra in general) is
non-deterministic, even on the same Ghidra version. Probably this is
due to analysis timeouts.

Furthermore, running Ghidra takes a significant portion of the analysis
time.

Therefore, we can create reproducibility as well as speedups during
development by saving the plugin output (using `--debug pcode-raw`) and
replaying it later (via `--pcode-raw /path/to/file`).

Signed-off-by: Valentin Obst <[email protected]>
Signed-off-by: Valentin Obst <[email protected]>
@vobst vobst merged commit 65efd5f into fkie-cad:master Jun 18, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant