aarch64 - Failed to iterate the unwind table: UnknownCallFrameInstruction(DwCfa(45)) #114
Comments
For cross-compilation to work you also need to specify a linker, otherwise it'll use the one that's default on your system. (You can see in the error message that it prints out It's probably the best to do this in
|
Yeah, unfortunately without detailed logs it's not possible to figure out why this happens. |
Thanks, setting CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER worked for compilation of libbytehound.so |
Well, that's a possibility. Could you figure out which exact flags are used to compile the program you want to profile?
That error is from the So fixing this specific error would require either recompiling your program so that the unsupported DWARF instruction is not emitted (possibly by disabling the security mitigation, but in your case this is most likely not going to be practical as you'd have to recompile everything), or adding support to gimli so that it can process it, and then maybe (or maybe not, I haven't looked into this in a lot of detail) adding something to Possibly relevant LLVM/libunwind pull request for inspiration. |
Hi, I did it on the binary, and it does not contain any debug symbols. Then I thought that maybe some of the libraries used by this binary contain this entry, so I ran llvm-dwarfdump on all the libraries listed by ldd, and everything is stripped. As for my simple binary, I was wrong. I mean, it worked: the graph shown by bytehound is OK, but the 'Failed to iterate the unwind table: UnknownCallFrameInstruction(DwCfa(45))' messages are still printed. The base compilation uses these flags: Now I am confused: why is the UnknownCallFrameInstruction message printed if there are no debug symbols in any of the libraries or in the binary? Anyway, I'll try to modify gimli so that it does not return UnknownCallFrameInstruction. |
These are not necessarily from debug symbols. DWARF is also used for normal unwind tables in non-debug binaries. (Usually the
Well, the way to go here is to make it handle it properly, not just ignore it. (: This error pops up for a reason, and AFAIK this DWARF instruction may be used when fetching the return address register, so if you make it ignore it you might get incorrect results later. The easiest way to fix it is probably to write a test program which is affected by this, then from within that program grab the backtrace using |
You're right, it's eh_frame - it contains DW_CFA_AARCH64_negate_ra_state entries. llvm-dwarfdump --eh-frame libc.so.6 shows for example
I'd like to change the gimli dependency of nwind in the not-perf project. NOT-PERF:
This builds successfully. Since I wanted to change the gimli sources, I downloaded it
As you can see, nwind's Cargo.toml points to version 0.25 of gimli:
So I rebased gimli to 0.25
And changed nwind's Cargo.toml to point to the local gimli using the path argument
And built it:
I can see some errors pointing to nwind:
But all I did was use the same version (0.25) from a local path instead of from GitHub. Cargo.lock in not-perf points to the same version:
Did I do something wrong? |
That's because
As you can see both There are two ways of handling it.
Usually it's a lot more convenient to do (2), especially if you're replacing a dependency which is used by a bunch of stuff in the dependency tree. |
I patched bytehound with gimli, where I treat this in the same way as NOP.
So probably as you said, it is needed to interpret it properly. I checked the code you mentioned
But even though there are not that many changes, I really don't know how to implement that in gimli ;/ As you proposed, I wrote a test. It just prints the backtrace. Note that branch protection is enabled in my libc.so.6. WITHOUT branch-protection and WITHOUT gimli changes
WITH branch protection AND WITHOUT gimli changes
WITH branch protection and WITH gimli changes - treating this opcode as NOP
Now I'm stuck. Implementing the gimli changes properly is quite difficult, and as you can see, if I treat the opcode the same way as a NOP then it works. But bigger apps (C++ in my case) with bytehound LD_PRELOADed sometimes crash, and I don't know how to catch the problematic scenario: the app is killed, so I can't even grab a core dump. |
Yes, it's possible that the app will crash if this is not implemented properly, because the unwinding algorithm in Bytehound depends on the unwinding being correct. Here's what libunwind does, and here's what gdb does when it encounters this instruction. So it looks like it toggles the special/hidden RA state register. Other DWARF instructions could then maybe use this value? I can't really give you many more details than this because I'm not familiar with how exactly this is implemented and I don't have the time to research it. It also looks like it could be possible (but I can be totally wrong here; this is just a result of 5 minutes of me googling) to treat this instruction as a NOP if pointer authentication is disabled with the |
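To make the "toggles the special/hidden RA state register" idea a bit more concrete, here is a minimal sketch in Rust. It is not how gimli, libunwind or gdb implement it; `RowState` and `run_cfi` are hypothetical names made up for illustration, and the sketch only understands two operand-less opcodes. The one firm fact it relies on is that opcode 45 (0x2d) is `DW_CFA_AARCH64_negate_ra_state` on AArch64, which matches the `DwCfa(45)` in the error message.

```rust
/// Call-frame opcode 0x2d (decimal 45). On AArch64 this is
/// DW_CFA_AARCH64_negate_ra_state.
const DW_CFA_AARCH64_NEGATE_RA_STATE: u8 = 0x2d;
/// Call-frame opcode 0x00, DW_CFA_nop.
const DW_CFA_NOP: u8 = 0x00;

/// Hypothetical, heavily reduced unwind state: it only remembers whether
/// the saved return address is currently signed.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct RowState {
    ra_signed: bool,
}

/// Runs a sequence of operand-less opcodes and returns the final state.
/// Any opcode this sketch does not know is returned as an error, which
/// mirrors the spirit of `UnknownCallFrameInstruction`.
fn run_cfi(opcodes: &[u8]) -> Result<RowState, u8> {
    let mut state = RowState::default();
    for &op in opcodes {
        match op {
            DW_CFA_NOP => {}
            // Each occurrence flips the "return address is signed" flag.
            DW_CFA_AARCH64_NEGATE_RA_STATE => state.ra_signed = !state.ra_signed,
            other => return Err(other),
        }
    }
    Ok(state)
}
```

An unwinder built this way would check `ra_signed` after evaluating the row for the current PC and, if it is set, clean up the recovered return address before using it. Treating the opcode as a plain NOP skips exactly that bookkeeping.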
I'm not sure but I guess there is an issue with the object file format during the linking process. Specifically, the linker (cc) encountered relocations in generic ELF, and it complains about the format:
There might be a mismatch between the target architecture specified during compilation and the actual architecture of the linked object files. Maybe check that the dependencies (libmimalloc_sys, libtikv_jemalloc_sys, libnwind) are actually built for aarch64. In my case, after editing and running the following:
I built it:
There might also be a problem with the system libraries: issues can arise if the system's C/C++ libraries are not compatible with the cross-compilation target. |
I'll recompile it once again to be sure, but I'm quite sure that I passed everything correctly in my latest checks, even details like the values of -march, -mcpu and -mtune. The CC and linker were taken from the SDK. Anyway, besides the changes in gimli I also disabled shadow-based stack unwinding and it helped a lot: the crashes happen much later and are probably connected with fork calls. I'll also try to disable branch security on the kernel command line, since I have a kernel newer than 3.12 :) Thanks |
I went back to this issue and took a few more basic steps, but I still can't fix it. I see gimli added support for this opcode: gimli-rs/gimli#667 I have a test, as you suggested, which can be run in an aarch64 Docker container
while when I compile it without
So the output is much better. This is what you suggested doing. The app I am running is:
Now I'm stuck. I hoped that fixing gimli would be enough, but it's not. I tried to understand the references you mentioned, and it seems that I need to add the authentication, but I don't know where... I can see https://reviews.llvm.org/D123692 and also https://llvm.googlesource.com/libunwind/+/96fa50101690f48f0e7a7ffe363a5612d9ecac41%5E%21/ Any help would be appreciated! |
I'm not super familiar with how this works, but this page seems to explain the whole pointer authentication feature pretty well: https://developer.arm.com/documentation/102433/0100/Return-oriented-programming So, basically, when a function is entered the return address in the LR register is signed (that is, some of the bits which normally are guaranteed to be always zero for pointers are used to store the signature), and then pushed on the stack. And then on return that address is popped, and a special return instruction is used which 1) verifies that the signature is still correct, and 2) performs a return ignoring the bits clobbered with the signature. Alternatively there's also a backwards-compatible mode where this special return instruction is split into two instructions: the first one verifies the signature and clears the signature bits from the address, and then a standard return instruction is used to return. (...and on older CPUs which don't support this feature the special instructions are treated as NOPs) So, basically, two things need to be fixed here:
(...or for (2) the replacement pointers could maybe also be signed by Bytehound and put on the stack? That might or might not work; again, I'm not super familiar with how exactly this works so I'm not sure without trying it out) Have you actually tried to disable this feature in the kernel? That should, I think, make it work since the pointers won't be mangled nor authenticated anymore (assuming the |
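As a rough illustration of the "signature lives in the normally-zero upper bits" description above, here is a small Rust sketch. It is only a toy model: real pointer authentication uses a keyed hardware signature, and which bits carry it depends on the CPU and kernel configuration. The function names `strip_pac` and `toy_sign`, the 48-bit address width used in the example, and the assumption that a user-space address has all upper bits zero are mine, not something taken from Bytehound, nwind or gimli.

```rust
/// Clears everything above the low `va_bits` bits of a user-space address.
/// Assumption: a valid user-space pointer has all of those upper bits zero,
/// so whatever is found there must be signature bits.
fn strip_pac(signed_addr: u64, va_bits: u32) -> u64 {
    assert!(va_bits > 0 && va_bits < 64, "va_bits must be in 1..=63");
    let address_mask = (1u64 << va_bits) - 1;
    signed_addr & address_mask
}

/// Toy stand-in for signing: stores a made-up `signature` value in the
/// bits above `va_bits`. The real hardware computes a keyed signature;
/// this only shows where such bits would end up.
fn toy_sign(addr: u64, signature: u64, va_bits: u32) -> u64 {
    strip_pac(addr, va_bits) | (signature << va_bits)
}
```

With a 48-bit address such as `0x0000_ffff_1234_5678`, `toy_sign(addr, 0x2a, 48)` yields a value that no longer compares equal to the original address, and `strip_pac(.., 48)` recovers it. That is the kind of cleanup an unwinder would have to do on every return address it reads from the stack while the RA state says "signed".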
Hmm, it seems that the easiest way is indeed to disable it. "so it's really a lot better to just disable the pointer authentication feature instead (which, again, AFAIK can be disabled adding a single parameter to the kernel command line arguments)." You're right, it can be changed via kernel command line arguments. I even wrote to Marc, the author of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/admin-guide/kernel-parameters.txt?h=v5.12&id=f8da5752fd1b25f1ecf78a79013e2dfd2b860589 Thanks for your help, I'll focus on turning it off instead of trying to fix it; that seems to be something I can handle ;) |
I'm trying to cross-compile bytehound for aarch64
What am I doing wrong with this cross-compilation?
I also tried with the SDK, passing --sysroot as well, and the output is the same
P.S.
I'm trying to recompile it because the bytehound build I'm currently using crashes binaries on that architecture, and I thought that maybe my version is too old and does not match some of the libraries used by the binary I'm trying to profile. It collects some data for ~5 minutes and then the app crashes. The profiled app does not give me any usable backtrace ;(
Disabling shadow-based stack unwinding did not help. Maybe you have some idea what caused that problem?