5.0.1 has more broken 'NOREGNAME' syntax on ARM32. #2145

Closed
Tracked by #2081
gerph opened this issue Aug 23, 2023 · 9 comments

@gerph

gerph commented Aug 23, 2023

Summary

I made some changes for 5.0.1 to CS_OPT_SYNTAX_NOREGNAME which I thought were working, but things have become more broken in 5.0.1.

It looks like NOREGNAME produces the same output as DEFAULT.

Example code

This example code prints out the default register form and the 'noregname' form.

#!/usr/bin/env python

import sys

from capstone import *
import capstone.arm_const

code = bytearray([0, 0x10, 0x90, 0xe5]) # LDR r1,[r0]

# A decoder for 'no regname'
mdnr = Cs(CS_ARCH_ARM, CS_MODE_ARM)
mdnr.detail = True
mdnr.syntax = capstone.CS_OPT_SYNTAX_NOREGNAME

# A decoder for default format
mddef = Cs(CS_ARCH_ARM, CS_MODE_ARM)
mddef.detail = True
mddef.syntax = capstone.CS_OPT_SYNTAX_DEFAULT

optype_names = dict((getattr(capstone.arm_const, optype), optype) for optype in dir(capstone.arm_const) if optype.startswith('ARM_OP_'))

print("cs_version() = %r" % (cs_version(),))

for regnum in range(0, 16):
    # Tweak the source register
    code[2] = (code[2] & 0xF0) | regnum
    for i in mddef.disasm(bytes(code), 0x1000):
        dis_default = "%-6s%s" % (i.mnemonic, i.op_str)
    for i in mdnr.disasm(bytes(code), 0x1000):
        dis_noregname = "%-6s%s" % (i.mnemonic, i.op_str)
    print("Register %2i: default: %-20s  noregname: %s" % (regnum, dis_default, dis_noregname))

Test results for 4.0.2

cs_version() = (4, 0, 1024)
Register  0: default: ldr   r1, [r0]        noregname: ldr   r1, [r0]
Register  1: default: ldr   r1, [r1]        noregname: ldr   r1, [r1]
Register  2: default: ldr   r1, [r2]        noregname: ldr   r1, [r2]
Register  3: default: ldr   r1, [r3]        noregname: ldr   r1, [r3]
Register  4: default: ldr   r1, [r4]        noregname: ldr   r1, [r4]
Register  5: default: ldr   r1, [r5]        noregname: ldr   r1, [r5]
Register  6: default: ldr   r1, [r6]        noregname: ldr   r1, [r6]
Register  7: default: ldr   r1, [r7]        noregname: ldr   r1, [r7]
Register  8: default: ldr   r1, [r8]        noregname: ldr   r1, [r8]
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [r9]
Register 10: default: ldr   r1, [sl]        noregname: ldr   r1, [r10]
Register 11: default: ldr   r1, [fp]        noregname: ldr   r1, [r11]
Register 12: default: ldr   r1, [ip]        noregname: ldr   r1, [r12]
Register 13: default: ldr   r1, [sp]        noregname: ldr   r1, [sp]
Register 14: default: ldr   r1, [lr]        noregname: ldr   r1, [lr]
Register 15: default: ldr   r1, [pc]        noregname: ldr   r1, [pc]

Test results for 5.0.0

cs_version() = (5, 0, 1280)
Register  0: default: ldr   r1, [r0]        noregname: ldr   r1, [r0]
Register  1: default: ldr   r1, [r1]        noregname: ldr   r1, [r1]
Register  2: default: ldr   r1, [r2]        noregname: ldr   r1, [r2]
Register  3: default: ldr   r1, [r3]        noregname: ldr   r1, [r3]
Register  4: default: ldr   r1, [r4]        noregname: ldr   r1, [r4]
Register  5: default: ldr   r1, [r5]        noregname: ldr   r1, [r5]
Register  6: default: ldr   r1, [r6]        noregname: ldr   r1, [r6]
Register  7: default: ldr   r1, [r7]        noregname: ldr   r1, [r7]
Register  8: default: ldr   r1, [r8]        noregname: ldr   r1, [r8]
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [r9]
Register 10: default: ldr   r1, [sl]        noregname: ldr   r1, [r10]
Register 11: default: ldr   r1, [fp]        noregname: ldr   r1, [r11]
Register 12: default: ldr   r1, [ip]        noregname: ldr   r1, [r12]
Register 13: default: ldr   r1, [sp]        noregname: ldr   r1, [r13]
Register 14: default: ldr   r1, [lr]        noregname: ldr   r1, [r14]
Register 15: default: ldr   r1, [pc]        noregname: ldr   r1, [pc]

Notice that the noregname case now uses register numbers for everything except pc (r13 and r14 rather than sp and lr); this is what I tried to make more consistent with 4.0.x.

Test results for 5.0.1

cs_version() = (5, 0, 1280)
Register  0: default: ldr   r1, [r0]        noregname: ldr   r1, [r0]
Register  1: default: ldr   r1, [r1]        noregname: ldr   r1, [r1]
Register  2: default: ldr   r1, [r2]        noregname: ldr   r1, [r2]
Register  3: default: ldr   r1, [r3]        noregname: ldr   r1, [r3]
Register  4: default: ldr   r1, [r4]        noregname: ldr   r1, [r4]
Register  5: default: ldr   r1, [r5]        noregname: ldr   r1, [r5]
Register  6: default: ldr   r1, [r6]        noregname: ldr   r1, [r6]
Register  7: default: ldr   r1, [r7]        noregname: ldr   r1, [r7]
Register  8: default: ldr   r1, [r8]        noregname: ldr   r1, [r8]
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [sb]
Register 10: default: ldr   r1, [sl]        noregname: ldr   r1, [sl]
Register 11: default: ldr   r1, [fp]        noregname: ldr   r1, [fp]
Register 12: default: ldr   r1, [ip]        noregname: ldr   r1, [ip]
Register 13: default: ldr   r1, [sp]        noregname: ldr   r1, [sp]
Register 14: default: ldr   r1, [lr]        noregname: ldr   r1, [lr]
Register 15: default: ldr   r1, [pc]        noregname: ldr   r1, [pc]

Note that the noregname case is exactly the same as the default.
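To make the comparison between the result tables less error-prone than reading them by eye, the output lines can be checked mechanically. The following is only a sketch: the helper name `differing_registers` and the regular expression are illustrative, and the sample lines are copied from the 5.0.0 and 5.0.1 results above.

```python
import re

# Matches the lines printed by the test script above, e.g.
# "Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [r9]"
LINE_RE = re.compile(r"Register\s+(\d+): default: (.*?)\s+noregname: (.*)$")


def differing_registers(output):
    """Return the register numbers whose default and noregname forms differ."""
    diffs = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(2).strip() != match.group(3).strip():
            diffs.append(int(match.group(1)))
    return diffs


# A few lines copied from the 5.0.0 results above.
SAMPLE_5_0_0 = """\
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [r9]
Register 13: default: ldr   r1, [sp]        noregname: ldr   r1, [r13]
Register 15: default: ldr   r1, [pc]        noregname: ldr   r1, [pc]
"""

# A few lines copied from the 5.0.1 results above.
SAMPLE_5_0_1 = """\
Register  8: default: ldr   r1, [r8]        noregname: ldr   r1, [r8]
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [sb]
"""

if __name__ == "__main__":
    print("5.0.0 differences:", differing_registers(SAMPLE_5_0_0))
    print("5.0.1 differences:", differing_registers(SAMPLE_5_0_1))
```

An empty list for the 5.0.1 sample corresponds to the symptom reported here: NOREGNAME has no visible effect.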

Expected output

I had hoped that 5.0.1 would be closer to the 4.0.x version. Instead it seems to have got worse.

Possible reason

I looked at the constants in the Python capstone/__init__.py for CS_OPT_SYNTAX and I see a possible problem?

On 5.0.0 the constants are:

# Capstone syntax value
CS_OPT_SYNTAX_DEFAULT = 0    # Default assembly syntax of all platforms (CS_OPT_SYNTAX)
CS_OPT_SYNTAX_INTEL = 1    # Intel X86 asm syntax - default syntax on X86 (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_ATT = 2      # ATT asm syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_NOREGNAME = 3   # Asm syntax prints register name with only number - (CS_OPT_SYNTAX, CS_ARCH_PPC, CS_ARCH_ARM)
CS_OPT_SYNTAX_MASM = 4      # MASM syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_MOTOROLA = 5 # MOS65XX use $ as hex prefix

On 5.0.1 the constants are:

# Capstone syntax value
CS_OPT_SYNTAX_DEFAULT = 1 << 1  # Default assembly syntax of all platforms (CS_OPT_SYNTAX)
CS_OPT_SYNTAX_INTEL = 1 << 2  # Intel X86 asm syntax - default syntax on X86 (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_ATT = 1 << 3  # ATT asm syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_NOREGNAME = 1 << 4  # Asm syntax prints register name with only number - (CS_OPT_SYNTAX, CS_ARCH_PPC, CS_ARCH_ARM)
CS_OPT_SYNTAX_MASM = 1 << 5  # MASM syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_MOTOROLA = 1 << 6  # MOS65XX use $ as hex prefix
CS_OPT_SYNTAX_CS_REG_ALIAS = 1 << 7  # Prints common register alias which are not defined in LLVM (ARM: r9 = sb etc.)

This may well be intentional, but the syntax constants changed their values at the same point that the output stopped working, which makes me think the two might be related.
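As a rough illustration of why a value change could matter, the sketch below cross-references the two tables. It assumes, without having checked the C source, that the native library shipped in the 5.0.1 package still interprets syntax values the 5.0.0 way; the dictionaries are copied from the listings above and the helper names are made up for this example.

```python
# Values copied from the 5.0.0 listing above.
OLD_5_0_0 = {
    "CS_OPT_SYNTAX_DEFAULT": 0,
    "CS_OPT_SYNTAX_INTEL": 1,
    "CS_OPT_SYNTAX_ATT": 2,
    "CS_OPT_SYNTAX_NOREGNAME": 3,
    "CS_OPT_SYNTAX_MASM": 4,
    "CS_OPT_SYNTAX_MOTOROLA": 5,
}

# Values copied from the 5.0.1 listing above.
NEW_5_0_1 = {
    "CS_OPT_SYNTAX_DEFAULT": 1 << 1,
    "CS_OPT_SYNTAX_INTEL": 1 << 2,
    "CS_OPT_SYNTAX_ATT": 1 << 3,
    "CS_OPT_SYNTAX_NOREGNAME": 1 << 4,
    "CS_OPT_SYNTAX_MASM": 1 << 5,
    "CS_OPT_SYNTAX_MOTOROLA": 1 << 6,
    "CS_OPT_SYNTAX_CS_REG_ALIAS": 1 << 7,
}


def old_meaning(value):
    """Return the 5.0.0 name for a raw value, or None if nothing matches."""
    for name, old_value in OLD_5_0_0.items():
        if old_value == value:
            return name
    return None


def cross_reference():
    """Map each 5.0.1 constant name to what its value meant in 5.0.0."""
    return {name: old_meaning(value) for name, value in NEW_5_0_1.items()}


if __name__ == "__main__":
    for name, meaning in cross_reference().items():
        print("%-28s = %3d -> 5.0.0 meaning: %s" % (name, NEW_5_0_1[name], meaning))
```

Under that assumption, the value sent for NOREGNAME (16) matches nothing in the 5.0.0 table, which would be consistent with the option having no effect.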

If I look at the setter for syntax in 5.0.1, I see:

    # syntax setter: modify assembly syntax.
    @syntax.setter
    def syntax(self, style):
        status = _cs.cs_option(self.csh, CS_OPT_SYNTAX, style)
        if status != CS_ERR_OK:
            raise CsError(status)
        # save syntax
        self._syntax = style

But for 'skipdata' I see it has this form:

    # setter: modify skipdata status
    @skipdata.setter
    def skipdata(self, opt):
        if opt == False:
            status = _cs.cs_option(self.csh, CS_OPT_SKIPDATA, CS_OPT_OFF)
        else:
            status = _cs.cs_option(self.csh, CS_OPT_SKIPDATA, CS_OPT_ON)
        if status != CS_ERR_OK:
            raise CsError(status)

        # save this option
        self._skipdata = opt

i.e. it's using CS_OPT_ON and CS_OPT_OFF in the call to change options, whilst the syntax setter passes the style value straight through. In the capstone.h file we see the actual definitions as:

/// Runtime option value (associated with option type above)
typedef enum cs_opt_value {
	CS_OPT_OFF = 0,  ///< Turn OFF an option - default for CS_OPT_DETAIL, CS_OPT_SKIPDATA, CS_OPT_UNSIGNED.
	CS_OPT_ON = 1 << 0, ///< Turn ON an option (CS_OPT_DETAIL, CS_OPT_SKIPDATA).
	CS_OPT_SYNTAX_DEFAULT = 1 << 1, ///< Default asm syntax (CS_OPT_SYNTAX).
	CS_OPT_SYNTAX_INTEL = 1 << 2, ///< X86 Intel asm syntax - default on X86 (CS_OPT_SYNTAX).
	CS_OPT_SYNTAX_ATT = 1 << 3,   ///< X86 ATT asm syntax (CS_OPT_SYNTAX).
	CS_OPT_SYNTAX_NOREGNAME = 1 << 4, ///< Prints register name with only number (CS_OPT_SYNTAX)
	CS_OPT_SYNTAX_MASM = 1 << 5, ///< X86 Intel Masm syntax (CS_OPT_SYNTAX).
	CS_OPT_SYNTAX_MOTOROLA = 1 << 6, ///< MOS65XX use $ as hex prefix
	CS_OPT_SYNTAX_CS_REG_ALIAS = 1 << 7, ///< Prints common register alias which are not defined in LLVM (ARM: r9 = sb etc.)
} cs_opt_value;

The values of CS_OPT_ON and CS_OPT_OFF are 1 and 0 respectively, and every other value is a distinct power of two, which makes me think that this was intended to be an OR'd bitfield to control the flags.
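The difference between the two numbering schemes can be shown with plain integers. This is a sketch only: the values come from the capstone.h excerpt and the 5.0.0 listing above, the helper `flags_set` is hypothetical, and whether cs_option actually accepts OR'd syntax values is a guess that this thread does not confirm.

```python
# Bit-flag values as listed in the capstone.h excerpt above.
KNOWN_FLAGS = {
    "DEFAULT": 1 << 1,
    "INTEL": 1 << 2,
    "ATT": 1 << 3,
    "NOREGNAME": 1 << 4,
    "MASM": 1 << 5,
    "MOTOROLA": 1 << 6,
    "CS_REG_ALIAS": 1 << 7,
}


def flags_set(value):
    """Return the sorted names of all flags whose bit is set in value."""
    return sorted(name for name, bit in KNOWN_FLAGS.items() if value & bit)


# With one bit per option, a combined value can be decoded unambiguously.
combined = KNOWN_FLAGS["NOREGNAME"] | KNOWN_FLAGS["CS_REG_ALIAS"]

# With the sequential 5.0.0 values, OR-ing is ambiguous:
# INTEL (1) | ATT (2) gives 3, which is the value of NOREGNAME.
sequential_collision = (1 | 2) == 3

if __name__ == "__main__":
    print(combined, flags_set(combined))
    print("sequential values collide:", sequential_collision)
```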

But I'm guessing here... it seems odd that a patch version update would change the meaning of the constants - that might make it hard in compiled languages that expect to be able to dynamic link with minor versions without an ABI change? Again, I'm guessing that's the case.

@Rot127
Collaborator

Rot127 commented Aug 23, 2023

Are you sure you are testing against 5.0.1? The changes to the constants are in the next branch, not in the 5.0.1 tagged commit.

@gerph
Author

gerph commented Aug 23, 2023

Ah! When I was looking at the header files in github, I was indeed looking at next... but when I was looking at the python code I was seeing it on my machine... let me try to confirm more convincingly...

Yes, I am convinced that I am using the 5.0.1 python files. I have started from a clean docker container, installed capstone 5.0.1 and then run my test code, and then displayed the constants from the capstone/__init__.py.

Here are the operations I performed:

charles@laputa ~/projects/RO/pyromaniac (protect-daheap-free-blocks↑1)> docker run -it --rm -v $PWD/cap:/code python:2.7  bash
root@51cfcfd1c82d:/# pip install capstone
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Collecting capstone
  Downloading capstone-5.0.1.tar.gz (2.9 MB)
     |████████████████████████████████| 2.9 MB 2.2 MB/s 
Building wheels for collected packages: capstone
  Building wheel for capstone (setup.py) ... done
  Created wheel for capstone: filename=capstone-5.0.1-py2-none-manylinux1_x86_64.whl size=2896990 sha256=2f1605438c68d5ad2cbfe53556d34dbf1fc3a9439f0de86038fd4163fb4c2d72
  Stored in directory: /root/.cache/pip/wheels/1f/55/36/dae037be731eca3b5b9d76a4e8a0238ed08f0bc5d538675c5a
Successfully built capstone
Installing collected packages: capstone
Successfully installed capstone-5.0.1
WARNING: You are using pip version 20.0.2; however, version 20.3.4 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
root@51cfcfd1c82d:/# cd /code
root@51cfcfd1c82d:/code# ls
diss-regnames.py
root@51cfcfd1c82d:/code# python diss-regnames.py 
cs_version() = (5, 0, 1280)
Register  0: default: ldr   r1, [r0]        noregname: ldr   r1, [r0]
Register  1: default: ldr   r1, [r1]        noregname: ldr   r1, [r1]
Register  2: default: ldr   r1, [r2]        noregname: ldr   r1, [r2]
Register  3: default: ldr   r1, [r3]        noregname: ldr   r1, [r3]
Register  4: default: ldr   r1, [r4]        noregname: ldr   r1, [r4]
Register  5: default: ldr   r1, [r5]        noregname: ldr   r1, [r5]
Register  6: default: ldr   r1, [r6]        noregname: ldr   r1, [r6]
Register  7: default: ldr   r1, [r7]        noregname: ldr   r1, [r7]
Register  8: default: ldr   r1, [r8]        noregname: ldr   r1, [r8]
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [sb]
Register 10: default: ldr   r1, [sl]        noregname: ldr   r1, [sl]
Register 11: default: ldr   r1, [fp]        noregname: ldr   r1, [fp]
Register 12: default: ldr   r1, [ip]        noregname: ldr   r1, [ip]
Register 13: default: ldr   r1, [sp]        noregname: ldr   r1, [sp]
Register 14: default: ldr   r1, [lr]        noregname: ldr   r1, [lr]
Register 15: default: ldr   r1, [pc]        noregname: ldr   r1, [pc]
root@51cfcfd1c82d:/code# python
Python 2.7.18 (default, Apr 20 2020, 19:27:10) 
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import capstone
>>> capstone
<module 'capstone' from '/usr/local/lib/python2.7/site-packages/capstone/__init__.pyc'>
>>> 
root@51cfcfd1c82d:/code# grep OPT_SYNTAX /usr/local/lib/python2.7/site-packages/capstone/__init__.py
    'CS_OPT_SYNTAX',
    'CS_OPT_SYNTAX_DEFAULT',
    'CS_OPT_SYNTAX_INTEL',
    'CS_OPT_SYNTAX_ATT',
    'CS_OPT_SYNTAX_NOREGNAME',
    'CS_OPT_SYNTAX_MASM',
    'CS_OPT_SYNTAX_MOTOROLA',
    'CS_OPT_SYNTAX_CS_REG_ALIAS',
CS_OPT_SYNTAX = 1    # Intel X86 asm syntax (CS_ARCH_X86 arch)
CS_OPT_SYNTAX_DEFAULT = 1 << 1  # Default assembly syntax of all platforms (CS_OPT_SYNTAX)
CS_OPT_SYNTAX_INTEL = 1 << 2  # Intel X86 asm syntax - default syntax on X86 (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_ATT = 1 << 3  # ATT asm syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_NOREGNAME = 1 << 4  # Asm syntax prints register name with only number - (CS_OPT_SYNTAX, CS_ARCH_PPC, CS_ARCH_ARM)
CS_OPT_SYNTAX_MASM = 1 << 5  # MASM syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_MOTOROLA = 1 << 6  # MOS65XX use $ as hex prefix
CS_OPT_SYNTAX_CS_REG_ALIAS = 1 << 7  # Prints common register alias which are not defined in LLVM (ARM: r9 = sb etc.)
            self._syntax = CS_OPT_SYNTAX_INTEL
        status = _cs.cs_option(self.csh, CS_OPT_SYNTAX, style)
root@51cfcfd1c82d:/code# 

Whether the changed constants are on the branch or not, they're definitely in the 5.0.1 release as downloaded from pip. Unless I did something wrong, but I cannot see where.
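The same check can be done from Python rather than with grep. The sketch below is an assumption-laden helper, not part of capstone: `classify_syntax_constants` is a made-up name, and it only looks at the value of CS_OPT_SYNTAX_NOREGNAME (3 in the 5.0.0 listing, 1 << 4 in the 5.0.1 listing). The two `SimpleNamespace` objects are stand-ins so the example runs without capstone installed.

```python
from types import SimpleNamespace


def classify_syntax_constants(module):
    """Guess which numbering scheme a capstone-like module uses."""
    noregname = getattr(module, "CS_OPT_SYNTAX_NOREGNAME", None)
    if noregname == 3:
        return "sequential"  # as in the 5.0.0 listing
    if noregname == 1 << 4:
        return "bitflag"  # as in the 5.0.1 listing
    return "unknown"


# Stand-ins for the two sets of constants shown in this issue.
FAKE_5_0_0 = SimpleNamespace(CS_OPT_SYNTAX_NOREGNAME=3)
FAKE_5_0_1 = SimpleNamespace(CS_OPT_SYNTAX_NOREGNAME=1 << 4)

if __name__ == "__main__":
    try:
        import capstone
    except ImportError:
        print("capstone is not installed")
    else:
        # Show which file was imported and which scheme it appears to use.
        print(capstone.__file__, classify_syntax_constants(capstone))
```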

@Rot127
Collaborator

Rot127 commented Aug 23, 2023

Seems like d2a39a2 accidentally slipped into the release.
@kabeor Could you please reverse it?

@gerph I know you are aware of it, but as a general note: Because Python2 is EOL we removed the CI code for it as well. So there is no guarantee that it will work in the future.

@kabeor
Member

kabeor commented Aug 24, 2023

I see. d2a39a2 will be reverted and the change will be enabled in the next release.

@Rot127 Maybe #2097 shouldn't have been cherry-picked into v5?

@gerph
Author

gerph commented Aug 24, 2023

In the prior issue, about the operands, I reverted the change you reference, and it seems to have fixed the registers as well. With the tree built with d2a39a2 reverted, the output from my test is:

cs_version() = (5, 0, 1280)
Register  0: default: ldr   r1, [r0]        noregname: ldr   r1, [r0]
Register  1: default: ldr   r1, [r1]        noregname: ldr   r1, [r1]
Register  2: default: ldr   r1, [r2]        noregname: ldr   r1, [r2]
Register  3: default: ldr   r1, [r3]        noregname: ldr   r1, [r3]
Register  4: default: ldr   r1, [r4]        noregname: ldr   r1, [r4]
Register  5: default: ldr   r1, [r5]        noregname: ldr   r1, [r5]
Register  6: default: ldr   r1, [r6]        noregname: ldr   r1, [r6]
Register  7: default: ldr   r1, [r7]        noregname: ldr   r1, [r7]
Register  8: default: ldr   r1, [r8]        noregname: ldr   r1, [r8]
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [r9]
Register 10: default: ldr   r1, [sl]        noregname: ldr   r1, [r10]
Register 11: default: ldr   r1, [fp]        noregname: ldr   r1, [r11]
Register 12: default: ldr   r1, [ip]        noregname: ldr   r1, [r12]
Register 13: default: ldr   r1, [sp]        noregname: ldr   r1, [sp]
Register 14: default: ldr   r1, [lr]        noregname: ldr   r1, [lr]
Register 15: default: ldr   r1, [pc]        noregname: ldr   r1, [pc]

which is the behaviour I expect. So that would address the issue.

It doesn't address the cs_version(), but I'll leave a note on the 5.0.x release suggestions ticket to update it for the release.
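For context on why the version tuple cannot tell these builds apart, the third element appears to be a combined value of the form (major << 8) + minor. That reading is an assumption here, but it is consistent with the outputs in this issue: 4 << 8 is 1024 and 5 << 8 is 1280. The helper name below is illustrative.

```python
def split_combined_version(combined):
    """Split a combined version into (major, minor), assuming (major << 8) + minor."""
    return (combined >> 8, combined & 0xFF)


if __name__ == "__main__":
    # Values taken from the cs_version() outputs shown in this issue.
    for combined in (1024, 1280):
        print(combined, "->", split_combined_version(combined))
```

If that reading is right, no patch level is encoded at all, so 5.0.0, 5.0.1 and the v5 branch would all report the same tuple.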

@peace-maker
Contributor

d2a39a2 should not be backported to v5. That's why I made it a separate commit in #2097, since it's only related to the auto-sync update of ARM. The other commit in that PR is still valid for v5.

Sorry for not making that clearer :(

@nmeum

nmeum commented Jan 18, 2024

I think this was fixed in #2240 which was just merged.

@Rot127
Collaborator

Rot127 commented Jan 18, 2024

@gerph Could you please check again? If it is fixed, please close the issue.

@gerph
Author

gerph commented Jan 21, 2024

@Rot127 Retested on the v5 branch and I now get the same register names as 4.0.0. So this still seems to deal with the issue.

I also see that cs_version still reports (5, 0, 1280), so the current v5 branch is indistinguishable from the 5.0.1 release version, but that's an independent thing.

@gerph gerph closed this as completed Jan 21, 2024
@Rot127 Rot127 added this to the v5.0.2 milestone Mar 19, 2024