faults and weird behavior on armv5te #64126

Closed
M1cha opened this issue Sep 3, 2019 · 9 comments
Labels
O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state

Comments

@M1cha

M1cha commented Sep 3, 2019

I'm trying to use Rust binaries on an armv5te gateway using Yocto (thud, meta-atmel).
Depending on the configuration (Rust version, target flags, LTO, optimization level), I get anything from a (luckily) working binary, to wrong logic-level behavior, to segmentation faults.
At runtime these problems are deterministic; they are introduced at compile time.

I've uploaded a minimal example of one of these symptoms so that you can reproduce the problem: https://github.com/M1cha/rust_armfault/tree/arcswap
Config:

  • rustup's stable toolchain (stable-x86_64-unknown-linux-gnu, rustc 1.31.1 (b6c32da9b 2018-12-18))
  • arm-unknown-linux-gnueabi toolchain built by crosstool-NG 1:1.24.0.r6.gafaf7b9a-1
  • the following ~/.cargo/config:
      [target.armv5te-unknown-linux-gnueabi]
      linker = "arm-linux-gnueabi-gcc"
  • build command: cargo build --release --target armv5te-unknown-linux-gnueabi

This specific fault only occurs with LTO enabled, at optimization levels 2 and 3.
To be clear, unless I'm fighting multiple bugs at once, this bug is NOT caused by LTO; it just happens to show up in that configuration. I see other symptoms without LTO and at optimization level 1 (e.g. a segfault during regex::Regex::new), but I can't reproduce them outside of my Yocto environment.

This is the call-stack during the fault:

#0  0xffff0fc0 in ?? ()
#1  0x00457d68 in compiler_builtins::arm_linux::__kuser_cmpxchg ()
    at rustc/compiler_builtins_shim/../../libcompiler_builtins/src/arm_linux.rs:8
#2  compiler_builtins::arm_linux::atomic_cmpxchg ()
    at rustc/compiler_builtins_shim/../../libcompiler_builtins/src/arm_linux.rs:85
#3  __sync_val_compare_and_swap_4 () at rustc/compiler_builtins_shim/../../libcompiler_builtins/src/arm_linux.rs:103
#4  0x00403ab0 in <arc_swap::ArcSwapAny<T, S>>::wait_for_readers ()
#5  0x004169e4 in armfault::main ()
#6  0x00416a94 in std::rt::lang_start::{{closure}} ()
#7  0x00416498 in main ()

registers:

r0             0x0                 0
r1             0x0                 0
r2             0x45acdc            4566236
r3             0x0                 0
r4             0x0                 0
r5             0x0                 0
r6             0x45acdc            4566236
r7             0xffff0fc0          4294905792
r8             0x474100            4669696
r9             0x4740c4            4669636
r10            0xbefffb20          3204447008
r11            0x0                 0
r12            0x473ee4            4669156
sp             0xbefffae0          0xbefffae0
lr             0x457d68            4554088
pc             0xffff0fc0          0xffff0fc0
cpsr           0x60000010          1610612752

user_debug kernel log:

armfault: unhandled page fault (11) at 0x0045acdc, code 0x81f
pgd = 6ca4b731
[0045acdc] *pgd=25d04831, *pte=2406d18f, *ppte=2406daae
CPU: 0 PID: 28225 Comm: armfault Not tainted 4.19.61-yocto-standard #1
Hardware name: Atmel AT91SAM9
PC is at _einittext+0x3f902c00/0xffe43ce8
LR is at 0x457d68
pc : [<ffff0fc0>]    lr : [<00457d68>]    psr: 60000010
sp : befffae0  ip : 00473ee4  fp : 00000000
r10: befffb20  r9 : 004740c4  r8 : 00474100
r7 : ffff0fc0  r6 : 0045acdc  r5 : 00000000  r4 : 00000000
r3 : 00000000  r2 : 0045acdc  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
Control: 0005317f  Table: 25fa4000  DAC: 00000055
CPU: 0 PID: 28225 Comm: armfault Not tainted 4.19.61-yocto-standard #1
Hardware name: Atmel AT91SAM9
[<c000faf0>] (unwind_backtrace) from [<c000d248>] (show_stack+0x10/0x14)
[<c000d248>] (show_stack) from [<c000fe3c>] (__do_user_fault+0x94/0xf0)
[<c000fe3c>] (__do_user_fault) from [<c001015c>] (do_page_fault+0x250/0x284)
[<c001015c>] (do_page_fault) from [<c0010304>] (do_DataAbort+0x48/0xe8)
[<c0010304>] (do_DataAbort) from [<c0009d0c>] (__dabt_usr+0x4c/0x60)
Exception stack(0xc5c79fb0 to 0xc5c79ff8)
9fa0:                                     00000000 00000000 0045acdc 00000000
9fc0: 00000000 00000000 0045acdc ffff0fc0 00474100 004740c4 befffb20 00000000
9fe0: 00473ee4 befffae0 00457d68 ffff0fc0 60000010 ffffffff

Program received signal SIGSEGV, Segmentation fault.

instructions at pc, pc+4, pc+8:

(gdb) x/i $pc
=> 0xffff0fc0:  ldr     r3, [r2]
(gdb) x/i $pc + 4
   0xffff0fc4:  subs    r3, r3, r0
(gdb) x/i $pc + 8
   0xffff0fc8:  streq   r1, [r2]

At first glance this looks like a fault on read, but because of how __kuser_cmpxchg is implemented, it actually faults at the write (0xffff0fc8), then jumps back to 0xffff0fc0, and only then informs gdb via ptrace.
The address it tries to write to is 0x45acdc, which maps to 0x5acdc inside the ELF binary.
That offset lies in the .rodata section:
[15] .rodata PROGBITS 0005ab80 05ab80 00616c 00 A 0 0 64
There's no symbol at that location, though.
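
The address arithmetic above can be checked with a small sketch. The section address and size are copied from the readelf line; the load base of 0x400000 is an assumption inferred from the difference between 0x45acdc and 0x5acdc, and the names `LOAD_BASE` and `in_rodata` are made up for illustration.

```rust
/// .rodata address and size, copied from the section header line above.
const RODATA_ADDR: u32 = 0x0005_ab80;
const RODATA_SIZE: u32 = 0x0000_616c;

/// Assumed load base (runtime address minus ELF address in this report).
const LOAD_BASE: u32 = 0x0040_0000;

/// Returns true if a runtime address falls inside .rodata.
fn in_rodata(runtime_addr: u32) -> bool {
    // Addresses below the load base wrap to a huge value and fail the range check.
    let rel = runtime_addr.wrapping_sub(LOAD_BASE);
    rel >= RODATA_ADDR && rel < RODATA_ADDR + RODATA_SIZE
}

fn main() {
    let fault: u32 = 0x0045_acdc; // r2 / faulting address from the kernel log
    println!(
        "{:#x} -> {:#x}, inside .rodata: {}",
        fault,
        fault - LOAD_BASE,
        in_rodata(fault)
    );
}
```

Since .rodata is mapped read-only, a store to that address is expected to raise exactly this kind of page fault.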

I hope this helps you track down the bug, because it keeps me from using Rust on this architecture.

@sanxiyn sanxiyn added the O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state label Sep 4, 2019
@rettichschnidi

I was able to reproduce the issue with rustc 1.31.1 and 1.37.0.

For 1.37.0 and 1.39.0 the issue seems to be fixed.

@M1cha
Author

M1cha commented Dec 11, 2019

I managed to reproduce this specific issue without arc-swap: https://github.com/M1cha/rust_armfault/tree/atomicptr

Basically, if you load from a global static AtomicPtr that is not defined in the current crate (it has to live in an external crate), LTO links against the wrong memory location.
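
A rough sketch of the access pattern described here, not the contents of the linked reproducer: the module `other_crate` only stands in for a separate crate (a real reproduction would need an actual external crate plus LTO), and `SLOT` and `install` are hypothetical names.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Stand-in for an external crate that exports a global static AtomicPtr.
mod other_crate {
    use std::ptr;
    use std::sync::atomic::AtomicPtr;

    pub static SLOT: AtomicPtr<u32> = AtomicPtr::new(ptr::null_mut());
}

/// Stores `new` into the foreign static only if it is still null.
/// On armv5te a compare-exchange like this is expected to go through
/// __sync_val_compare_and_swap_4, as in the backtrace above.
fn install(new: *mut u32) -> bool {
    other_crate::SLOT
        .compare_exchange(ptr::null_mut(), new, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}

fn main() {
    let p = Box::into_raw(Box::new(7u32));
    let installed = install(p);
    // Load from the static that lives outside the "current crate".
    let loaded = other_crate::SLOT.load(Ordering::SeqCst);
    println!("installed: {}, loaded == p: {}", installed, loaded == p);
    // Free the allocation again; `p` came from Box::into_raw above.
    unsafe { drop(Box::from_raw(p)) };
}
```

On a correctly linked binary the static sits in writable data, so the compare-exchange succeeds instead of faulting.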

@Amanieu
Member

Amanieu commented Mar 21, 2020

I tried to reproduce this with the latest nightly but wasn't able to. Can you check on your end to see if this issue is fixed?

@M1cha
Author

M1cha commented Mar 21, 2020

I can't reproduce it anymore either. My hope is that it actually got fixed and didn't just coincidentally stop causing a crash.

@Amanieu
Member

Amanieu commented Mar 21, 2020

I'm going to close this issue unless someone can reproduce this with the latest compiler.

@Amanieu Amanieu closed this as completed Mar 21, 2020
@Fighter19

Fighter19 commented Oct 31, 2022

I've recently encountered a similar problem.

I use cargo-xbuild to build my project.

The object file for compiler_builtins contains multiple such occurrences.
For example on symbol: _ZN17compiler_builtins9arm_linux29__sync_val_compare_and_swap_417h4daedd6aaff95c3cE

Disassembly of section .text._ZN17compiler_builtins9arm_linux29__sync_val_compare_and_swap_417h4daedd6aaff95c3cE:

00000000 <_ZN17compiler_builtins9arm_linux29__sync_val_compare_and_swap_417h4daedd6aaff95c3cE>:
   0:	e92d48f0 	push	{r4, r5, r6, r7, fp, lr}
   4:	e59f7034 	ldr	r7, [pc, #52]	; 40 <_ZN17compiler_builtins9arm_linux29__sync_val_compare_and_swap_417h4daedd6aaff95c3cE+0x40>
   8:	e1a05002 	mov	r5, r2
   c:	e1a04001 	mov	r4, r1
  10:	e1a06000 	mov	r6, r0
  14:	e5960000 	ldr	r0, [r6]
  18:	e1500004 	cmp	r0, r4
  1c:	1a000006 	bne	3c <_ZN17compiler_builtins9arm_linux29__sync_val_compare_and_swap_417h4daedd6aaff95c3cE+0x3c>
  20:	e1a00004 	mov	r0, r4
  24:	e1a01005 	mov	r1, r5
  28:	e1a02006 	mov	r2, r6
  2c:	e12fff37 	blx	r7
  30:	e3500000 	cmp	r0, #0
  34:	1afffff6 	bne	14 <_ZN17compiler_builtins9arm_linux29__sync_val_compare_and_swap_417h4daedd6aaff95c3cE+0x14>
  38:	e1a00004 	mov	r0, r4
  3c:	e8bd88f0 	pop	{r4, r5, r6, r7, fp, pc}
  40:	ffff0fc0 			; <UNDEFINED> instruction: 0xffff0fc0

Disassembly of section .ARM.exidx.text._ZN17compiler_builtins9arm_linux29__sync_val_compare_and_swap_417h4daedd6aaff95c3cE:

00000000 <.ARM.exidx.text._ZN17compiler_builtins9arm_linux29__sync_val_compare_and_swap_417h4daedd6aaff95c3cE>:
   0:	00000000 	andeq	r0, r0, r0
   4:	00000001 	andeq	r0, r0, r1

There are 72 such cases in the object file,
all of which I believe are related to the sync_fetch_... functions.

EDIT: Note that I retried this with a custom armv6kz-none-eabihf target and did not encounter any issues there, so this appears to be limited to armv5te.

I tested the following two targets, both of which are certainly affected (likely others are too):
armv5te-unknown-linux-gnueabi
arm-linux-androideabi

@Amanieu
Member

Amanieu commented Nov 1, 2022

That's a different issue: rust-lang/compiler-builtins#420

@Fighter19

Fighter19 commented Nov 1, 2022

I don't see how that's a different issue.
The code in my disassembly branches with link to 0xffff0fc0,
which is the same address as in OP's issue.
Both are generated when using atomic functions.
I have no multiple-definition errors.
EDIT:
Never mind, it's probably not an issue at all.
I assume that's just the intended way of emulating these features,
even though I can't find any information about 0xffff0fc0 being some kind of implicit location for emulating atomic functions.

@Amanieu
Member

Amanieu commented Nov 1, 2022

https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt
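
For readers following along, here is a hedged sketch of what the disassembly above appears to do: load the current value, bail out if it differs from the expected one, otherwise call the kernel helper at 0xffff0fc0 and retry if it reports failure. It is not the compiler_builtins source. The names are illustrative, and the non-Arm fallback exists only so the sketch runs on other hosts.

```rust
/// Fixed address of the kernel's cmpxchg user helper on 32-bit Arm Linux
/// (the 0xffff0fc0 seen in the backtrace and in the literal pool).
#[allow(dead_code)]
const KUSER_CMPXCHG: usize = 0xffff_0fc0;

/// Arm Linux: call the helper through its fixed address.
/// The helper returns zero if *ptr was changed, non-zero otherwise.
#[cfg(all(target_arch = "arm", target_os = "linux"))]
unsafe fn kuser_cmpxchg(oldval: u32, newval: u32, ptr: *mut u32) -> bool {
    let helper: extern "C" fn(u32, u32, *mut u32) -> u32 =
        unsafe { core::mem::transmute(KUSER_CMPXCHG as *const ()) };
    helper(oldval, newval, ptr) == 0
}

/// Other hosts: emulate the helper with a std atomic (assumption, demo only).
#[cfg(not(all(target_arch = "arm", target_os = "linux")))]
unsafe fn kuser_cmpxchg(oldval: u32, newval: u32, ptr: *mut u32) -> bool {
    use std::sync::atomic::{AtomicU32, Ordering};
    // AtomicU32 has the same in-memory representation as u32.
    let atom = unsafe { &*(ptr as *const AtomicU32) };
    atom.compare_exchange(oldval, newval, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}

/// Mirrors the loop in the disassembly: returns the value seen at `ptr`.
unsafe fn sync_val_compare_and_swap_4(ptr: *mut u32, oldval: u32, newval: u32) -> u32 {
    loop {
        let current = unsafe { core::ptr::read_volatile(ptr) };
        if current != oldval {
            return current; // no match, nothing written
        }
        if unsafe { kuser_cmpxchg(oldval, newval, ptr) } {
            return oldval; // helper performed the store
        }
        // Helper reported failure: reload and retry.
    }
}

fn main() {
    let mut word: u32 = 5;
    let prev = unsafe { sync_val_compare_and_swap_4(&mut word, 5, 9) };
    println!("previous = {}, now = {}", prev, word);
}
```

So the constant at offset 0x40 is a literal-pool entry holding the helper address, which objdump happens to print as an undefined instruction.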
