RISCV basic support #1318

Antwy · 2024-04-03T15:52:03Z

Hi! Added basic support for RISCV instruction set. This covers most of IMC standard ISA extensions for RV32 & RV64.
It would be great if you could give some review and merge it.

Some details:

The provided architecture(ARCH_RVxx) contains compressed instructions by default
As RV32 is a subset of RV64 (except 1 instruction), both of them are situated in a single namespace riscv alike x86, the difference is the register sizes
RISCV in Capstone needs refactoring so it can fail to process some instructions (issue #2278)
As Capstone doesn't have enumeration of pseudo instructions yet (like alias id for arm), they are detected while building semantics of corresponding instruction

JonathanSalwan · 2024-04-03T17:21:06Z

Awesome. Thanks for a such MR. Let me few weeks to review this. Can you try to fix CIs?

JonathanSalwan · 2024-04-03T17:22:01Z

@cnheitman can you take a look at this too so that we have at least two reviews for a such MR?

cnheitman · 2024-04-03T18:18:03Z

Awesome! Great PR!

@JonathanSalwan Yes, I'll review it, most probably sometime next week.

Antwy · 2024-04-04T12:20:58Z

Well, I guess one way to fix CI is to update Capstone version from 4.0.2 to 5.0+. But if you need the older version, I can try to put riscv code under defines

cnheitman · 2024-04-14T15:07:55Z

The PR looks good, great work @Antwy o/

This PR was based on master (I think) and does not include the commit upgrading Bitwuzla from v0.2.0 to v0.4.0 of dev-v1.0. However, there should be no conflicts rebasing on top of that change.

I did not dive much into the semantics file. On a quick overview they look good. If we want to increase our confidence on the code, we can add some more tests (for instance, we have a binary with different optimization levels which we use to test the ARM semantics). However, basic unittests were included so it seems fine for now.

Ideally, I would add a RV32 and RV64 version of this custom crackme so we can have an example of a full working binary, as we do with AARCH64 and ARM.

Regarding the CI and Capstone. I think we can move on to version 5.0.1. Version 4.0.2 is from 2020 (iirc) and as far as I can tell we don't have any specific reason to keep supporting it.

Antwy · 2024-05-03T17:15:49Z

Now rebased on dev-v1.0 and some semantics issues fixed (but still not sure in taint spreading variants).
Adding binary test seems reasonable as basic unittest suite doesn't cover controlflow transfers.

Antwy · 2024-05-30T13:56:10Z

Added the crackme binary test

mrexodia · 2024-06-04T13:03:33Z

Might be worthwhile to take a look at the official RISC-V ISA tests: https://github.com/riscv-software-src/riscv-tests

I wrote a small python script to process the compiled files and generate a data.h that contains enough information to start emulation: https://github.com/thesecretclub/riscy-business/blob/master/riscvm/generate-isa-tests.py. You just have to modify this chunk of code to set up the triton context and start emulation:

https://github.com/thesecretclub/riscy-business/blob/master/riscvm/tests.cpp#L44-L55

The reason for the generate-isa-tests.py is that the test executables have some common setup, which requires support for supervisor instructions/paging so I extract just the raw RV64I code and run that directly (since the setup can be done on the emulator host side).

The process should be similar for the 32-bit ISA tests, but I didn't compile or try them. It helped me a lot to find bugs in my emulator, so it was definitely worth the effort in my opinion! You also do not need unicorn anymore (because given their track record their semantics are probably incorrect), or at least it can be an additional test.

m4drat · 2024-06-09T19:29:47Z

Hi! I've played around with this PR and seems to have found a bug. According to RISCV ISA manual the SLL instruction in RV64I should perform a logical shift left by the shift amount from the lower 6 bits held in the third operand (this also applies to other similar instructions).

In RV64I, only the low 6 bits of rs2 are considered for the shift amount.

However, in the current implementation only the lower 5 bits are used.

This example demonstrates the problem:

from unicorn.riscv_const import *
from unicorn import *
from triton import *

CODE_START = 0x0


def emu_unicorn():
    # sll s8, s7, a7
    opcode = b"\x33\x9c\x1b\x01"

    mu = Uc(UC_ARCH_RISCV, UC_MODE_RISCV64)
    mu.mem_map(CODE_START, 0x1000)
    mu.mem_write(CODE_START, opcode)

    mu.reg_write(UC_RISCV_REG_A7, 0x69C99AB9B9401024)
    mu.reg_write(UC_RISCV_REG_S7, 0xDB4D6868655C3585)

    mu.reg_write(UC_RISCV_REG_PC, CODE_START)
    mu.emu_start(CODE_START, CODE_START + len(opcode))

    print("(UC) s8 = 0x%x" % mu.reg_read(UC_RISCV_REG_S8))


def emu_triton():
    # sll s8, s7, a7
    opcode = b"\x33\x9c\x1b\x01"

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    inst = Instruction()
    inst.setOpcode(opcode)
    inst.setAddress(CODE_START)

    ctx.setConcreteRegisterValue(ctx.registers.x17, 0x69C99AB9B9401024)
    ctx.setConcreteRegisterValue(ctx.registers.x23, 0xDB4D6868655C3585)

    ctx.processing(inst)

    print("(TT) s8 = 0x%x" % ctx.getConcreteRegisterValue(ctx.registers.x24))


def main():
    emu_unicorn()
    emu_triton()


if __name__ == "__main__":
    main()

Antwy · 2024-06-10T13:42:06Z

Good catch, @m4drat! Thanks!

mrexodia · 2024-06-10T15:01:29Z

Modified my generation script to generate a python file that can be used for the tests:

from elftools.elf.elffile import ELFFile
from elftools.elf.sections import SymbolTableSection
import os
import zlib

def parse_test_elf(file):
    with open(file, "rb") as f:
        elf = ELFFile(f)
        # Enumerate the SymbolTableSection
        for section in elf.iter_sections():
            if isinstance(section, SymbolTableSection):
                for i in range(section.num_symbols()):
                    symbol = section.get_symbol(i)
                    if symbol.name:
                        if symbol.name.startswith("test_"):
                            address = symbol.entry.st_value
                            # Convert address to file offset
                            offset = list(elf.address_offsets(address))[0]
                            return address, offset
    return None, None

def main():
    tests = []
    directory = "isa-tests"
    code = "import zlib\n\n"
    for file in sorted(os.listdir(directory)):
        if file.startswith("rv64") and not file.endswith(".dump"):
            path = os.path.join(directory, file)
            address, offset = parse_test_elf(path)
            if offset is None:
                print(f"Failed to parse {file}")
                continue
            data = f"__{file.replace('-', '_').upper()}_DATA = zlib.decompress(b\""
            with open(path, "rb") as f:
                for byte in zlib.compress(f.read(), 9):
                    data += f"\\x{byte:02x}"
            data += "\")\n"
            code += data
            tests.append((file, address, offset))

    code += "\n"

    code += "TESTS = [\n"
    for name, address, offset in tests:
        variable = f"__{name.replace('-', '_').upper()}_DATA"
        code += f"    (\"{name}\", {variable}, {hex(address)}, {hex(offset)}),\n"
    code += "]\n"

    with open("isa-tests/data.py", "wb") as f:
        f.write(code.encode("utf-8"))


if __name__ == "__main__":
    main()

The generated data.py: data.py.zip (384 kb)

m4drat · 2024-06-10T15:25:52Z

@Antwy

Hm, I might be wrong on this one, but I couldn't see in the code whether you handle RV32/RV64 cases differently. Because the shift amount should depend on the target. 6 bits - RV64, 5 bits - RV32.

To be precise, here are the quotes:

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register
rs1 by the shift amount held in register rs2. In RV64I, only the low 6 bits of rs2 are considered for the
shift amount.

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register
rs1 by the shift amount held in the lower 5 bits of register rs2.

mrexodia · 2024-06-10T15:57:31Z

Spent some time rigging up the official ISA tests on top of this PR:

import sys
from struct import pack

from triton import *
from riscv64_data import TESTS as RV64TESTS

def emulate_test(name: str, binary: bytes, address: int, offset: int, trace: bool):
    # initial state
    STACK = 0x200000
    istate = {
        "stack": bytearray(b"".join([pack('B', 255 - i) for i in range(256)])),
        "heap":  bytearray(b"".join([pack('B', i) for i in range(256)])),
        "x0":    0x0,
        "x1":    0x0,
        "x2":    STACK,
        "x3":    0x0,
        "x4":    0x0,
        "x5":    0x0,
        "x6":    0x0,
        "x7":    0x0,
        "x8":    0x0,
        "x9":    0x0,
        "x10":   0x0,
        "x11":   0x0,
        "x12":   0x0,
        "x13":   0x0,
        "x14":   0x0,
        "x15":   0x0,
        "x16":   0x0,
        "x17":   0x0,
        "x18":   0x0,
        "x19":   0x0,
        "x20":   0x0,
        "x21":   0x0,
        "x22":   0x0,
        "x23":   0x0,
        "x24":   0x0,
        "x25":   0x0,
        "x26":   0x0,
        "x27":   0x0,
        "x28":   0x0,
        "x29":   0x0,
        "x30":   0x0,
        "x31":   0x0,
        "f0":    0x00112233445566778899aabbccddeeff,
        "f1":    0xffeeddccbbaa99887766554433221100,
        "f2":    0xfefedcdc5656787889892692dfeccaa0,
        "f3":    0x1234567890987654321bcdffccddee01,
        "f4":    0x0,
        "f5":    0x0,
        "f6":    0x0,
        "f7":    0x0,
        "f8":    0x0,
        "f9":    0x0,
        "f10":   0x0,
        "f11":   0x0,
        "f12":   0x0,
        "f13":   0x0,
        "f14":   0x0,
        "f15":   0x0,
        "f16":   0x0,
        "f17":   0x0,
        "f18":   0x0,
        "f19":   0x0,
        "f20":   0x0,
        "f21":   0x0,
        "f22":   0x0,
        "f23":   0x0,
        "f24":   0x0,
        "f25":   0x0,
        "f26":   0x0,
        "f27":   0x0,
        "f28":   0x0,
        "f29":   0x0,
        "f30":   0x0,
        "f31":   0x0,
        "pc":    address,
    }

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    ctx.setConcreteMemoryAreaValue(STACK,           bytes(istate['stack']))
    ctx.setConcreteMemoryAreaValue(address, binary[offset:])
    ctx.setConcreteRegisterValue(ctx.registers.x0,  0)
    ctx.setConcreteRegisterValue(ctx.registers.x1,  istate['x1'])
    ctx.setConcreteRegisterValue(ctx.registers.x2,  istate['x2'])
    ctx.setConcreteRegisterValue(ctx.registers.x3,  istate['x3'])
    ctx.setConcreteRegisterValue(ctx.registers.x4,  istate['x4'])
    ctx.setConcreteRegisterValue(ctx.registers.x5,  istate['x5'])
    ctx.setConcreteRegisterValue(ctx.registers.x6,  istate['x6'])
    ctx.setConcreteRegisterValue(ctx.registers.x7,  istate['x7'])
    ctx.setConcreteRegisterValue(ctx.registers.x8,  istate['x8'])
    ctx.setConcreteRegisterValue(ctx.registers.x9,  istate['x9'])
    ctx.setConcreteRegisterValue(ctx.registers.x10, istate['x10'])
    ctx.setConcreteRegisterValue(ctx.registers.x11, istate['x11'])
    ctx.setConcreteRegisterValue(ctx.registers.x12, istate['x12'])
    ctx.setConcreteRegisterValue(ctx.registers.x13, istate['x13'])
    ctx.setConcreteRegisterValue(ctx.registers.x14, istate['x14'])
    ctx.setConcreteRegisterValue(ctx.registers.x15, istate['x15'])
    ctx.setConcreteRegisterValue(ctx.registers.x16, istate['x16'])
    ctx.setConcreteRegisterValue(ctx.registers.x17, istate['x17'])
    ctx.setConcreteRegisterValue(ctx.registers.x18, istate['x18'])
    ctx.setConcreteRegisterValue(ctx.registers.x19, istate['x19'])
    ctx.setConcreteRegisterValue(ctx.registers.x20, istate['x20'])
    ctx.setConcreteRegisterValue(ctx.registers.x21, istate['x21'])
    ctx.setConcreteRegisterValue(ctx.registers.x22, istate['x22'])
    ctx.setConcreteRegisterValue(ctx.registers.x23, istate['x23'])
    ctx.setConcreteRegisterValue(ctx.registers.x24, istate['x24'])
    ctx.setConcreteRegisterValue(ctx.registers.x25, istate['x25'])
    ctx.setConcreteRegisterValue(ctx.registers.x26, istate['x26'])
    ctx.setConcreteRegisterValue(ctx.registers.x27, istate['x27'])
    ctx.setConcreteRegisterValue(ctx.registers.x28, istate['x28'])
    ctx.setConcreteRegisterValue(ctx.registers.x29, istate['x29'])
    ctx.setConcreteRegisterValue(ctx.registers.x30, istate['x30'])
    ctx.setConcreteRegisterValue(ctx.registers.x31, istate['x31'])
    ctx.setConcreteRegisterValue(ctx.registers.f0,  istate['f0'])
    ctx.setConcreteRegisterValue(ctx.registers.f1,  istate['f1'])
    ctx.setConcreteRegisterValue(ctx.registers.f2,  istate['f2'])
    ctx.setConcreteRegisterValue(ctx.registers.f3,  istate['f3'])
    ctx.setConcreteRegisterValue(ctx.registers.f4,  istate['f4'])
    ctx.setConcreteRegisterValue(ctx.registers.f5,  istate['f5'])
    ctx.setConcreteRegisterValue(ctx.registers.f6,  istate['f6'])
    ctx.setConcreteRegisterValue(ctx.registers.f7,  istate['f7'])
    ctx.setConcreteRegisterValue(ctx.registers.f8,  istate['f8'])
    ctx.setConcreteRegisterValue(ctx.registers.f9,  istate['f9'])
    ctx.setConcreteRegisterValue(ctx.registers.f10, istate['f10'])
    ctx.setConcreteRegisterValue(ctx.registers.f11, istate['f11'])
    ctx.setConcreteRegisterValue(ctx.registers.f12, istate['f12'])
    ctx.setConcreteRegisterValue(ctx.registers.f13, istate['f13'])
    ctx.setConcreteRegisterValue(ctx.registers.f14, istate['f14'])
    ctx.setConcreteRegisterValue(ctx.registers.f15, istate['f15'])
    ctx.setConcreteRegisterValue(ctx.registers.f16, istate['f16'])
    ctx.setConcreteRegisterValue(ctx.registers.f17, istate['f17'])
    ctx.setConcreteRegisterValue(ctx.registers.f18, istate['f18'])
    ctx.setConcreteRegisterValue(ctx.registers.f19, istate['f19'])
    ctx.setConcreteRegisterValue(ctx.registers.f20, istate['f20'])
    ctx.setConcreteRegisterValue(ctx.registers.f21, istate['f21'])
    ctx.setConcreteRegisterValue(ctx.registers.f22, istate['f22'])
    ctx.setConcreteRegisterValue(ctx.registers.f23, istate['f23'])
    ctx.setConcreteRegisterValue(ctx.registers.f24, istate['f24'])
    ctx.setConcreteRegisterValue(ctx.registers.f25, istate['f25'])
    ctx.setConcreteRegisterValue(ctx.registers.f26, istate['f26'])
    ctx.setConcreteRegisterValue(ctx.registers.f27, istate['f27'])
    ctx.setConcreteRegisterValue(ctx.registers.f28, istate['f28'])
    ctx.setConcreteRegisterValue(ctx.registers.f29, istate['f29'])
    ctx.setConcreteRegisterValue(ctx.registers.f30, istate['f30'])
    ctx.setConcreteRegisterValue(ctx.registers.f31, istate['f31'])

    pc = istate['pc']
    for i in range(1000):
        ctx.setConcreteRegisterValue(ctx.registers.pc, pc)
        opcode = ctx.getConcreteMemoryValue(MemoryAccess(pc, CPUSIZE.DWORD))
        opcode_bytes = pack('<I', opcode)
        inst = Instruction(opcode_bytes)
        inst.setAddress(pc)
        state = ctx.processing(inst)
        if trace:
            print(inst)
        if state == EXCEPTION.NO_FAULT:
            pc = ctx.getConcreteRegisterValue(ctx.registers.pc)
        else:
            disasm = inst.getDisassembly()
            if "fence" in disasm:
                # HACK: ignore the unsupported fence instruction
                pc += 4
            elif "ecall" in disasm:
                syscall_index = ctx.getConcreteRegisterValue(ctx.registers.x17)
                #assert syscall_index == 139, f"invalid syscall: {syscall_index}"
                return ctx.getConcreteRegisterValue(ctx.registers.x10)
            else:
                raise Exception(f"{inst} -> exception {state}")
    return -1


if __name__ == "__main__":
    success = 0
    for name, binary, address, offset in RV64TESTS:
        exit_code = emulate_test(name, binary, address, offset, trace=False)
        if exit_code == 0:
            print(f"SUCCESS: {name}")
            success += 1
        else:
            print(f"FAILURE: {name}, {exit_code}")
    print(f"\n{success}/{len(RV64TESTS)} passed")

Unfortunately the success rate isn't the best, here is the output:

FAILURE: rv64ui-p-add, 46
FAILURE: rv64ui-p-addi, 83
FAILURE: rv64ui-p-addiw, 83
FAILURE: rv64ui-p-addw, 46
SUCCESS: rv64ui-p-and
FAILURE: rv64ui-p-andi, 15
SUCCESS: rv64ui-p-auipc
SUCCESS: rv64ui-p-beq
SUCCESS: rv64ui-p-bge
SUCCESS: rv64ui-p-bgeu
SUCCESS: rv64ui-p-blt
SUCCESS: rv64ui-p-bltu
SUCCESS: rv64ui-p-bne
SUCCESS: rv64ui-p-fence_i
SUCCESS: rv64ui-p-jal
FAILURE: rv64ui-p-jalr, 15
SUCCESS: rv64ui-p-lb
SUCCESS: rv64ui-p-lbu
SUCCESS: rv64ui-p-ld
SUCCESS: rv64ui-p-lh
SUCCESS: rv64ui-p-lhu
FAILURE: rv64ui-p-lui, 18446744071562067968
SUCCESS: rv64ui-p-lw
SUCCESS: rv64ui-p-lwu
FAILURE: rv64ui-p-ma_data, -1
FAILURE: rv64ui-p-or, 858993459
FAILURE: rv64ui-p-ori, 16713727
SUCCESS: rv64ui-p-sb
SUCCESS: rv64ui-p-sd
SUCCESS: rv64ui-p-sh
SUCCESS: rv64ui-p-simple
FAILURE: rv64ui-p-sll, 1024
FAILURE: rv64ui-p-slli, 34603008
FAILURE: rv64ui-p-slliw, 13
FAILURE: rv64ui-p-sllw, 13
FAILURE: rv64ui-p-slt, 1
SUCCESS: rv64ui-p-slti
FAILURE: rv64ui-p-sltiu, 1
FAILURE: rv64ui-p-sltu, 1
FAILURE: rv64ui-p-sra, 1024
SUCCESS: rv64ui-p-srai
SUCCESS: rv64ui-p-sraiw
FAILURE: rv64ui-p-sraw, 1024
FAILURE: rv64ui-p-srl, 1024
SUCCESS: rv64ui-p-srli
FAILURE: rv64ui-p-srliw, 7
FAILURE: rv64ui-p-srlw, 7
FAILURE: rv64ui-p-sub, 18446744073709551602
FAILURE: rv64ui-p-subw, 18446744073709551602
SUCCESS: rv64ui-p-sw
FAILURE: rv64ui-p-xor, 858993459
FAILURE: rv64ui-p-xori, 16713712
FAILURE: rv64um-p-div, 17
FAILURE: rv64um-p-divu, 17
FAILURE: rv64um-p-divuw, 17
FAILURE: rv64um-p-divw, 17
FAILURE: rv64um-p-mul, 1122
FAILURE: rv64um-p-mulh, 1122
FAILURE: rv64um-p-mulhsu, 1122
FAILURE: rv64um-p-mulhu, 1122
FAILURE: rv64um-p-mulw, 1122
FAILURE: rv64um-p-rem, 17
FAILURE: rv64um-p-remu, 17
FAILURE: rv64um-p-remuw, 17
FAILURE: rv64um-p-remw, 17

26/65 passed

Might be I set something up incorrectly though, but the register names are not matching the disassembly so it's difficult to debug/trace without creating additional mapping etc.

Antwy · 2024-06-10T17:49:03Z

@mrexodia
Well, adding to current testsuite debug printing and lines from rv64ui-lui which has status 'FAILURE' I've got:

Instruction:  0x10011a: lui ra, 0
x0:   0x0
x1:   0x0
----------------
[OK] lui   x1, #0x00000
-------------------------------
Instruction:  0x10011e: lui ra, 0xfffff
x0:   0x0
x1:   0xfffffffffffff000
----------------
[OK] lui   x1, #0xfffff
-------------------------------
Instruction:  0x100122: srai ra, ra, 1
x0:   0x0
x1:   0xfffffffffffff800
----------------
[OK] srai  x1, x1, #1
-------------------------------
Instruction:  0x100126: lui ra, 0x7ffff
x0:   0x0
x1:   0x7ffff000
----------------
[OK] lui   x1, #0x7ffff
-------------------------------
Instruction:  0x10012a: srai ra, ra, 0x14
x0:   0x0
x1:   0x7ff
----------------
[OK] srai  x1, x1, #20
-------------------------------
Instruction:  0x10012e: lui ra, 0x80000
x0:   0x0
x1:   0xffffffff80000000
----------------
[OK] lui   x1, #0x80000
-------------------------------
Instruction:  0x100132: srai ra, ra, 0x14
x0:   0x0
x1:   0xfffffffffffff800
----------------
[OK] srai  x1, x1, #20
-------------------------------
Instruction:  0x100136: lui zero, 0
x0:   0x0
x1:   0xfffffffffffff800
----------------
[OK] lui   x0, #0x80000

These seem like equal to test expected values. Maybe the result above is caused by parsing issues in case of more than one instruction in testcase or instruction with immediate operand written without "i".

The test lines in src/testers/riscv/unicorn_test_riscv64.py:

    (b"\xb7\x00\x00\x00", "lui   x1, #0x00000"),
    (b"\xb7\xf0\xff\xff", "lui   x1, #0xfffff"),
    (b"\x93\xd0\x10\x40", "srai  x1, x1, #1"),
    (b"\xb7\xf0\xff\x7f", "lui   x1, #0x7ffff"),
    (b"\x93\xd0\x40\x41", "srai  x1, x1, #20"),
    (b"\xb7\x00\x00\x80", "lui   x1, #0x80000"),
    (b"\x93\xd0\x40\x41", "srai  x1, x1, #20"),
    (b"\x37\x00\x00\x00", "lui   x0, #0x80000"),

and the debug printing right after ctx.processing(inst):

    print("Instruction: ", inst)
    print("x0:  ", hex(ctx.getSymbolicRegisterValue(ctx.registers.x0)))
    print("x1:  ", hex(ctx.getSymbolicRegisterValue(ctx.registers.x1)))

m4drat · 2024-06-10T20:33:29Z

@Antwy

Stumbled upon another corner-case for the REMW instruction:

from unicorn.riscv_const import *
from unicorn import *
from triton import *

CODE_START = 0x0


def emu_unicorn():
    # remw s0, s5, t0
    opcode = b"\x3b\xe4\x5a\x02"

    mu = Uc(UC_ARCH_RISCV, UC_MODE_RISCV64)
    mu.mem_map(CODE_START, 0x1000)
    mu.mem_write(CODE_START, opcode)

    mu.reg_write(UC_RISCV_REG_S5, 0x917665C427EBEE5D)
    mu.reg_write(UC_RISCV_REG_T0, 0x0000000000000000)

    mu.reg_write(UC_RISCV_REG_PC, CODE_START)
    mu.emu_start(CODE_START, CODE_START + len(opcode))

    print("(UC) s0 = 0x%x" % mu.reg_read(UC_RISCV_REG_S0))


def emu_triton():
    # remw s0, s5, t0
    opcode = b"\x3b\xe4\x5a\x02"

    ctx = TritonContext()
    ctx.setArchitecture(ARCH.RV64)

    inst = Instruction()
    inst.setOpcode(opcode)
    inst.setAddress(CODE_START)

    ctx.setConcreteRegisterValue(ctx.registers.x21, 0x917665C427EBEE5D)
    ctx.setConcreteRegisterValue(ctx.registers.x5, 0x0000000000000000)

    ctx.processing(inst)

    print("(TT) s0 = 0x%x" % ctx.getConcreteRegisterValue(ctx.registers.x8))


def main():
    emu_unicorn()
    emu_triton()


if __name__ == "__main__":
    main()

According to ISA Manual, in case of division by 0, the result of the operation should be equal to the lowest 32-bits of the dividend, not 0.

The semantics for division by zero and division overflow are summarized in Table 11. The quotient of
division by zero has all bits set, and the remainder of division by zero equals the dividend. Signed
division overflow occurs only when the most-negative integer is divided by . The quotient of a signed
division with overflow is equal to the dividend, and the remainder is zero. Unsigned division overflow
cannot occur.

mrexodia · 2024-06-10T20:41:57Z

@mrexodia Well, adding to current testsuite debug printing and lines from rv64ui-lui which has status 'FAILURE' I've got:

Instruction:  0x10011a: lui ra, 0
x0:   0x0
x1:   0x0
----------------
[OK] lui   x1, #0x00000
-------------------------------
Instruction:  0x10011e: lui ra, 0xfffff
x0:   0x0
x1:   0xfffffffffffff000
----------------
[OK] lui   x1, #0xfffff
-------------------------------
Instruction:  0x100122: srai ra, ra, 1
x0:   0x0
x1:   0xfffffffffffff800
----------------
[OK] srai  x1, x1, #1
-------------------------------
Instruction:  0x100126: lui ra, 0x7ffff
x0:   0x0
x1:   0x7ffff000
----------------
[OK] lui   x1, #0x7ffff
-------------------------------
Instruction:  0x10012a: srai ra, ra, 0x14
x0:   0x0
x1:   0x7ff
----------------
[OK] srai  x1, x1, #20
-------------------------------
Instruction:  0x10012e: lui ra, 0x80000
x0:   0x0
x1:   0xffffffff80000000
----------------
[OK] lui   x1, #0x80000
-------------------------------
Instruction:  0x100132: srai ra, ra, 0x14
x0:   0x0
x1:   0xfffffffffffff800
----------------
[OK] srai  x1, x1, #20
-------------------------------
Instruction:  0x100136: lui zero, 0
x0:   0x0
x1:   0xfffffffffffff800
----------------
[OK] lui   x0, #0x80000

These seem like equal to test expected values. Maybe the result above is caused by parsing issues in case of more than one instruction in testcase or instruction with immediate operand written without "i".

The test lines in src/testers/riscv/unicorn_test_riscv64.py:

    (b"\xb7\x00\x00\x00", "lui   x1, #0x00000"),
    (b"\xb7\xf0\xff\xff", "lui   x1, #0xfffff"),
    (b"\x93\xd0\x10\x40", "srai  x1, x1, #1"),
    (b"\xb7\xf0\xff\x7f", "lui   x1, #0x7ffff"),
    (b"\x93\xd0\x40\x41", "srai  x1, x1, #20"),
    (b"\xb7\x00\x00\x80", "lui   x1, #0x80000"),
    (b"\x93\xd0\x40\x41", "srai  x1, x1, #20"),
    (b"\x37\x00\x00\x00", "lui   x0, #0x80000"),

and the debug printing right after ctx.processing(inst):

    print("Instruction: ", inst)
    print("x0:  ", hex(ctx.getSymbolicRegisterValue(ctx.registers.x0)))
    print("x1:  ", hex(ctx.getSymbolicRegisterValue(ctx.registers.x1)))

For my emulator I ran into bugs with the immediate loading. So the operation was correct, but certain encodings related to (shifted) immediates were not (especially the sign extension is very complicated). That might also be the case here…

Here are the traces from my emulator, might be helpful: rv64ui-traces.zip

Antwy · 2024-06-13T13:12:25Z

@m4drat
Thanks! Guess this one is fixed too. Please, let me know if you find anything else!

JonathanSalwan · 2024-06-14T12:37:21Z

Can you fix vcpkg by adding the risc feature and updating Capstone version for Appveyor? Once all CIs are green, I will do a quick review and merge it to dev-v1.0 :)

JonathanSalwan · 2024-06-14T13:12:04Z

https://vcpkg.link/ports/capstone/v/5.0.1/1

I think we also have to update capstone in vcpkg to switch from 5.0.0-rc2 to 5.0.1

Antwy · 2024-07-19T11:33:42Z

@JonathanSalwan
As no new bugs were reported for now and finally CIs are green, I'll be looking forward for your review :)

JonathanSalwan · 2024-07-19T17:42:17Z

awesome @Antwy! I will review the patch next week. Thx a lot!

JonathanSalwan · 2024-07-19T17:46:08Z

CMakeLists.txt

@@ -194,6 +194,12 @@ endif()
 message(STATUS "Compiling with Capstone")
 find_package(CAPSTONE 5 REQUIRED)
 message(STATUS "CAPSTONE version: ${CAPSTONE_VERSION}")
+if(NOT ${CS_VERSION_MAJOR} MATCHES 4)


What if it's capstone 3? Maybe we should use GREATER_EQUAL?

Triton's dependency list says that libcapstone >= 4.0.x is required.
Actually it was once GREATER_EQUAL, but vcpkg builder doesn't like it because CS_VERSION_MAJOR is treated as an empty variable

Good point. Well, ok to lets keep if(NOT ${CS_VERSION_MAJOR} MATCHES 4) if it's better for CIs.

Other solutions:

we could upgrade the cmake used by vcpkg. GREATER_EQUAL is used from cmake 3.7.2.

On cmake 3.0.2 we have the GREATER variable.

Looks like cmake 3.30 is used by vcpkg. So, using GREATER should be ok

GREATER_EQUAL is ok too with manually specified first argument

JonathanSalwan · 2024-07-19T17:48:08Z

src/libtriton/arch/architecture.cpp

+#include <triton/riscv32Cpu.hpp>
+#include <triton/riscv64Cpu.hpp>
+#endif
+#include <triton/riscvSpecifications.hpp>


riscvSpecifications.hpp is not under the ifdef?

JonathanSalwan

Other than these comments, looks good to me. Nice job!

JonathanSalwan · 2024-07-22T12:40:41Z

src/libtriton/bindings/python/namespaces/initOpcodesNamespace.cpp

@@ -10,6 +10,7 @@
 #include <triton/pythonXFunctions.hpp>
 #include <triton/aarch64Specifications.hpp>
 #include <triton/arm32Specifications.hpp>
+#include <triton/riscvSpecifications.hpp>


ifdef COMPILE_RISCV ?

JonathanSalwan · 2024-07-22T12:56:30Z

src/libtriton/bindings/python/namespaces/initRegNamespace.cpp

@@ -131,6 +131,36 @@ namespace triton {

        PyObject* arm32RegistersDictClass = xPyClass_New(nullptr, arm32RegistersDict, xPyString_FromString("ARM32"));
        xPyDict_SetItemString(registersDict, "ARM32", arm32RegistersDictClass);
+


At top of this file, can you add the doxygen documentation like it is with others archs?

To generate the rv64_reg and rv32_reg files, you have to edit this one. You can simply copy/past the aarch64 part. This python script is used to extract register names from the spec.

Then, you have to edit this file. You can simply copy/past like it is with others archs.

looks like fixed

JonathanSalwan · 2024-07-22T12:58:21Z

src/libtriton/includes/triton/archEnums.hpp

@@ -193,6 +209,56 @@ namespace triton {

    /*! @} End of arm namespace */
    };
+
+    //! The riscv namespace
+    namespace riscv {


ifdef COMPILE_RISCV?

JonathanSalwan · 2024-07-25T08:19:32Z

Ok, let's merge this into dev-v1.0. thx a lot @Antwy for this contribution!

JonathanSalwan requested review from JonathanSalwan and cnheitman April 3, 2024 17:21

Antwy force-pushed the riscv-support branch 2 times, most recently from efccaaa to b38a401 Compare April 4, 2024 11:39

Antwy force-pushed the riscv-support branch from b38a401 to e155584 Compare May 3, 2024 17:13

Antwy force-pushed the riscv-support branch from 8903f3a to 86d85cd Compare May 6, 2024 14:58

Antwy force-pushed the riscv-support branch from 86d85cd to 5beeb3a Compare May 30, 2024 13:39

Antwy force-pushed the riscv-support branch from 5beeb3a to 781f99b Compare June 4, 2024 11:55

Antwy force-pushed the riscv-support branch from b230b87 to fbf9f42 Compare June 10, 2024 17:22

Antwy force-pushed the riscv-support branch 2 times, most recently from f92dcca to 2d190da Compare June 14, 2024 13:18

Antwy added 8 commits July 16, 2024 17:04

Add python bindings & binary emulation test for riscv 64/32

ba2aa12

Fix RV64 bitsize of reg-reg shift semantics (sll/sra/srl)

239c5c6

Fix division operations corner cases semantics

f096f25

Fix vcpkg CI

181408e

Fix word extracting shifts

409a094

Fix immutable register assignments

df5eca9

Fix compressed shift immediate bits

aa726cb

Revert AppVeyor to Capstone 4.0.2

3da06a7

Antwy force-pushed the riscv-support branch 4 times, most recently from 6ce7657 to 05a9162 Compare July 18, 2024 17:29

Use riscv only with Capstone version 5+

b727410

Antwy force-pushed the riscv-support branch from 05a9162 to b727410 Compare July 18, 2024 17:42

JonathanSalwan reviewed Jul 19, 2024

View reviewed changes

Fix issues

43d8399

JonathanSalwan reviewed Jul 22, 2024

View reviewed changes

Antwy added 3 commits July 22, 2024 17:18

Fix more issues on riscv

ea3cb55

Fix cmake

7d19b69

Uncomment mrs/msr aarch64 capstone version depending test

ea19db3

Antwy force-pushed the riscv-support branch from 7cadb22 to ea19db3 Compare July 24, 2024 13:44

JonathanSalwan merged commit 6ee4e2c into JonathanSalwan:dev-v1.0 Jul 25, 2024
26 checks passed

JonathanSalwan added this to the v1.0 milestone Jul 25, 2024

JonathanSalwan added Feature ISA Semantics libTriton labels Jul 25, 2024

JonathanSalwan pushed a commit that referenced this pull request Jul 25, 2024

#1318: fix documentation

93e493a

		@@ -131,6 +131,36 @@ namespace triton {

		PyObject* arm32RegistersDictClass = xPyClass_New(nullptr, arm32RegistersDict, xPyString_FromString("ARM32"));
		xPyDict_SetItemString(registersDict, "ARM32", arm32RegistersDictClass);

RISCV basic support #1318

RISCV basic support #1318

Conversation

Antwy commented Apr 3, 2024 • edited Loading

JonathanSalwan commented Apr 3, 2024 • edited Loading

JonathanSalwan commented Apr 3, 2024 • edited Loading

cnheitman commented Apr 3, 2024

Antwy commented Apr 4, 2024

cnheitman commented Apr 14, 2024

Antwy commented May 3, 2024

Antwy commented May 30, 2024

mrexodia commented Jun 4, 2024 • edited Loading

m4drat commented Jun 9, 2024

Antwy commented Jun 10, 2024

mrexodia commented Jun 10, 2024

m4drat commented Jun 10, 2024

mrexodia commented Jun 10, 2024

Antwy commented Jun 10, 2024 • edited Loading

m4drat commented Jun 10, 2024

mrexodia commented Jun 10, 2024 • edited Loading

Antwy commented Jun 13, 2024

JonathanSalwan commented Jun 14, 2024

JonathanSalwan commented Jun 14, 2024 • edited Loading

Antwy commented Jul 19, 2024 • edited Loading

JonathanSalwan commented Jul 19, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JonathanSalwan Jul 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JonathanSalwan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JonathanSalwan commented Jul 25, 2024

Antwy commented Apr 3, 2024 •

edited

Loading

JonathanSalwan commented Apr 3, 2024 •

edited

Loading

JonathanSalwan commented Apr 3, 2024 •

edited

Loading

mrexodia commented Jun 4, 2024 •

edited

Loading

Antwy commented Jun 10, 2024 •

edited

Loading

mrexodia commented Jun 10, 2024 •

edited

Loading

JonathanSalwan commented Jun 14, 2024 •

edited

Loading

Antwy commented Jul 19, 2024 •

edited

Loading

JonathanSalwan Jul 22, 2024 •

edited

Loading