ath11k kernel panic on Pi 5 with 8GB RAM, but not on 2GB (DMA/PCI-E kernel panic) #6424

Open
omerk opened this issue Oct 17, 2024 · 2 comments

omerk commented Oct 17, 2024

Describe the bug

The ath11k kernel module works as expected on a Raspberry Pi 5 with 2GB RAM, but the same image (same boot media) fails with a DMA/memory-related kernel panic on an 8GB unit.

Limiting the 8GB unit to 2GB of memory (mem=2G in cmdline.txt) avoids the panic. Detailed logs are below.

Test setup consists of two units:

Unit 1: Raspberry Pi 5, 2GB RAM, Official M.2 Hat, QCN9074 WiFi module (PCI-E)
Unit 2: Raspberry Pi 5, 8GB RAM, Official M.2 Hat, QCN9074 WiFi module (PCI-E)

The WiFi modules are the same brand and model, from the same batch. The boot media is shared across the two units to rule out config-related differences.

I'm not entirely sure this is specifically an issue with the ath11k driver, as it seems to work on other platforms. Perhaps this is a BCM2712 DMA / PCI-E restriction? I'm speculating, of course. Thanks in advance for your assistance.
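To make the 2GB vs 8GB difference concrete, here is a minimal sketch of the address arithmetic behind a 32-bit DMA limit. It assumes RAM is one contiguous block starting at physical address 0 (the panic log reports PHYS_OFFSET: 0x0); the function name `fits_32bit_dma` is made up for illustration. It only shows that an 8GB unit has memory a 32-bit-only device cannot address directly, not that this is the root cause.

```shell
#!/bin/sh
# Sketch: does all of RAM sit below the 4 GiB boundary that a
# 32-bit DMA mask can address?
# Assumption: RAM is contiguous and starts at physical address 0.

fits_32bit_dma() {
    ram_gib=$1
    top=$(( ram_gib * 1024 * 1024 * 1024 - 1 ))   # highest physical address
    limit=$(( 0xffffffff ))                        # largest 32-bit address
    if [ "$top" -le "$limit" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

for size in 2 8; do
    printf '%s GiB RAM: top=0x%x fits_in_32bit=%s\n' "$size" \
        "$(( size * 1024 * 1024 * 1024 - 1 ))" "$(fits_32bit_dma "$size")"
done
```

With 2 GiB every page is below 4 GiB, so no bounce buffering is needed for a 32-bit device. With 8 GiB, buffers above that boundary would have to be bounced, which exercises a different DMA path.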

Steps to reproduce the behaviour

(On a fresh device, with no WiFi configuration)

  • Compile, install and boot up the custom kernel
  • Use nmcli to connect to a WiFi network
  • kernel panic

(With a valid WiFi configuration set up)

  • Boot device with custom kernel
  • kernel panic

Device(s)

Raspberry Pi 5

System

Kernel version:

$ git rev-parse HEAD
84ab77459e61c648299d32464127b89ca65de40a

$ uname -a
Linux raspberrypi 6.6.56-v8-16k-x+ #1 SMP PREEMPT Thu Oct 17 13:34:10 BST 2024 aarch64 GNU/Linux

The .config used to compile the kernel, essentially the standard 2712 config with ath11k enabled, is attached: kernel-config.zip

config.txt used on device:

dtoverlay=disable-wifi
dtoverlay=disable-bt

# For QCN9074
dtparam=pciex1
dtparam=pciex1_gen=3

# Force PCIe config to support 32bit DMA addresses at the expense of having to bounce buffers.
# https://github.com/raspberrypi/firmware/blob/b154632e320b87ea95c6ce8b59f96dbbe523ecf1/boot/overlays/README#L3597
dtoverlay=pcie-32bit-dma

# Compatibility features
# https://github.com/raspberrypi/firmware/blob/b154632e320b87ea95c6ce8b59f96dbbe523ecf1/boot/overlays/README#L3611
# no-mip: Use if a) more than 8 interrupt vectors are required or b) the EP requires DMA and MSI addresses to be 32bit.
dtoverlay=pciex1-compat-pi5,no-mip

# Uncomment some or all of these to enable the optional hardware interfaces
#dtparam=i2c_arm=on
#dtparam=i2s=on
#dtparam=spi=on

# Enable audio (loads snd_bcm2835)
dtparam=audio=on

# Additional overlays and parameters are documented
# /boot/firmware/overlays/README

# Automatically load overlays for detected cameras
camera_auto_detect=1

# Automatically load overlays for detected DSI displays
display_auto_detect=1

# Automatically load initramfs files, if found
auto_initramfs=1

# Enable DRM VC4 V3D driver
dtoverlay=vc4-kms-v3d
max_framebuffers=2

# Don't have the firmware create an initial video= setting in cmdline.txt.
# Use the kernel's default instead.
disable_fw_kms_setup=1

# Run in 64-bit mode
arm_64bit=1

# Disable compensation for displays with overscan
disable_overscan=1

# Run as fast as firmware / board allows
arm_boost=1

[cm4]
# Enable host mode on the 2711 built-in XHCI USB controller.
# This line should be removed if the legacy DWC2 controller is required
# (e.g. for USB device mode) or if USB support is not required.
otg_mode=1

[cm5]
dtoverlay=dwc2,dr_mode=host

Logs

Working kit (Unit with 2GB RAM)

$ cat /proc/cpuinfo | grep "Model"
Model           : Raspberry Pi 5 Model B Rev 1.0

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            2009         254        1638           5         172        1754
Swap:            199           0         199

$ vcgencmd get_mem arm && vcgencmd get_mem gpu
arm=1020M
gpu=4M

ath11k is loaded on boot:

$ dmesg | grep ath11k
[    6.801102] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[    6.801137] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    6.820708] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    6.820724] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.329153] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    7.329165] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

WiFi networks are listed:

$ nmcli dev wifi list
IN-USE  BSSID              SSID            MODE   CHAN  RATE        SIGNAL  BAR>
        5A:09:D4:FA:34:89  BTWi-fi         Infra  40    405 Mbit/s  44      ▂▄_>
        5A:09:D4:FA:34:8A  BTWifi-X        Infra  40    405 Mbit/s  40      ▂▄_>
        4C:09:D4:FA:34:88  BTHub5-CMCS     Infra  40    405 Mbit/s  37      ▂▄_>
        EC:6C:9A:4A:61:54  BT-JWAKQR       Infra  40    540 Mbit/s  27      ▂__>
...

nmcli used to connect to WiFi network:

$ sudo nmcli dev wifi connect <ap> password <password>
Device 'wlan0' successfully activated with '7a1e9176-f639-4ccf-8b19-c656fc9a1150'.

$ ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 2c:cf:67:83:eb:b8 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c4:93:00:3a:34:a2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.194/24 brd 192.168.1.255 scope global dynamic noprefixroute wlan0
       valid_lft 86377sec preferred_lft 86377sec
    inet6 fe80::a7c7:324f:bc91:522a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

$ ping raspberrypi.com
PING raspberrypi.com (172.67.154.53) 56(84) bytes of data.
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=1 ttl=58 time=9.50 ms
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=2 ttl=58 time=12.3 ms
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=3 ttl=58 time=13.5 ms
^C
--- raspberrypi.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 9.502/11.780/13.507/1.680 ms

Works as expected, no issue to report.

Non-working kit (Unit with 8GB RAM)

$ cat /proc/cpuinfo | grep "Model"
Model           : Raspberry Pi 5 Model B Rev 1.0

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            8052         300        7691           5         168        7752
Swap:            199           0         199

$ vcgencmd get_mem arm && vcgencmd get_mem gpu
arm=1020M
gpu=4M

$ dmesg | grep ath11k
[    7.140417] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[    7.140444] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.140717] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.140728] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.590439] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    7.590449] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

$ nmcli dev wifi list
IN-USE  BSSID              SSID                      MODE   CHAN  RATE        S>
        4C:09:D4:FA:34:88  BTHub5-CMCS               Infra  40    405 Mbit/s  3>
        5A:09:D4:FA:34:8A  BTWifi-X                  Infra  40    405 Mbit/s  3>
        5A:09:D4:FA:34:89  BTWi-fi                   Infra  40    405 Mbit/s  3>
        EC:6C:9A:4A:61:54  BT-JWAKQR                 Infra  40    540 Mbit/s  2>
        62:6C:9A:4A:61:56  EE WiFi-X                 Infra  40    540 Mbit/s  2>
...

Trying to connect to a WiFi network results in a kernel panic:

$ sudo nmcli dev wifi connect <ap> password <password>
[  123.832476] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  123.841313] Mem abort info:
[  123.844114]   ESR = 0x0000000096000145
[  123.847909]   EC = 0x25: DABT (current EL), IL = 32 bits
[  123.853243]   SET = 0, FnV = 0
[  123.856304]   EA = 0, S1PTW = 0
[  123.859452]   FSC = 0x05: level 1 translation fault
[  123.864348] Data abort info:
[  123.867234]   ISV = 0, ISS = 0x00000145, ISS2 = 0x00000000
[  123.872742]   CM = 1, WnR = 1, TnD = 0, TagAccess = 0
[  123.877811]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  123.883141] user pgtable: 16k pages, 47-bit VAs, pgdp=0000000101bcc000
[  123.889694] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  123.898432] Internal error: Oops: 0000000096000145 [#1] PREEMPT SMP
[  123.904722] Modules linked in: michael_mic qrtr_mhi binfmt_misc qrtr ath11k_pci mhi ath11k qmi_helpers spidev mac80211 vc4 snd_soc_hdmi_codec drm_display_helper libarc4 cec cfg80211 drm_dma_helper sg drm_kms_helper snd_soc_core rpivid_hevc(C) aes_ce_blk pisp_be v4l2_mem2mem aes_ce_cipher snd_compress ghash_ce videobuf2_dma_contig gf128mul snd_pcm_dmaengine libaes rfkill videobuf2_memops snd_pcm videobuf2_v4l2 sha2_ce sha256_arm64 sha1_ce videodev snd_timer raspberrypi_hwmon videobuf2_common snd mc v3d i2c_brcmstb gpio_keys spi_bcm2835 gpu_sched raspberrypi_gpiomem pwm_fan rp1_adc drm_shmem_helper nvmem_rmem uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6
[  123.967462] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C         6.6.56-v8-16k-x+ #1
[  123.976108] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  123.981960] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  123.988947] pc : dcache_inval_poc+0x28/0x58
[  123.993145] lr : arch_sync_dma_for_cpu+0x34/0x50
[  123.997776] sp : ffffc00080003c40
[  124.001095] x29: ffffc00080003c40 x28: ffff80010162c860 x27: ffffc00080003eb8
[  124.008257] x26: ffffc00080003ce4 x25: 0000000000000000 x24: 0000000000000005
[  124.015419] x23: 00000000000025f0 x22: 0000000000000040 x21: 0000000000000002
[  124.022581] x20: ffff800100fab0c0 x19: ffffffffffffffff x18: 0000000000000000
[  124.029743] x17: ffffb0017a7b8000 x16: ffffd000841375c8 x15: 00005555fa586b70
[  124.036905] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  124.044067] x11: 00000000000000cf x10: 00000000000000c8 x9 : ffffd000841376c0
[  124.051229] x8 : ffffc00080003d38 x7 : 0000000000000000 x6 : 0000000000000000
[  124.058390] x5 : 00000001040c0000 x4 : ffff800104cc6820 x3 : 000000000000003f
[  124.065552] x2 : 0000000000000040 x1 : 0000000000000000 x0 : ffffffffffffffff
[  124.072714] Call trace:
[  124.075161]  dcache_inval_poc+0x28/0x58
[  124.079006]  dma_sync_single_for_cpu+0xf8/0x128
[  124.083549]  ath11k_hal_srng_prefetch_desc+0x6c/0xa0 [ath11k]
[  124.089341]  ath11k_hal_srng_access_begin+0x44/0x58 [ath11k]
[  124.095038]  ath11k_dp_process_rx+0xd0/0x3b8 [ath11k]
[  124.100124]  ath11k_dp_service_srng+0x32c/0x360 [ath11k]
[  124.105471]  ath11k_pcic_ext_grp_napi_poll+0x3c/0xd8 [ath11k]
[  124.111254]  __napi_poll+0x40/0x208
[  124.114751]  net_rx_action+0x2e0/0x338
[  124.118508]  handle_softirqs+0x118/0x360
[  124.122440]  __do_softirq+0x1c/0x28
[  124.125935]  ____do_softirq+0x18/0x30
[  124.129605]  call_on_irq_stack+0x24/0x58
[  124.133536]  do_softirq_own_stack+0x24/0x38
[  124.137730]  irq_exit_rcu+0x8c/0xd0
[  124.141225]  el1_interrupt+0x38/0x68
[  124.144810]  el1h_64_irq_handler+0x18/0x28
[  124.148917]  el1h_64_irq+0x64/0x68
[  124.152325]  default_idle_call+0x5c/0x170
[  124.156344]  do_idle+0x204/0x238
[  124.159579]  cpu_startup_entry+0x40/0x50
[  124.163512]  rest_init+0xec/0xf8
[  124.166745]  arch_call_rest_init+0x18/0x20
[  124.170853]  start_kernel+0x528/0x690
[  124.174523]  __primary_switched+0xbc/0xd0
[  124.178544] Code: d1000443 ea03003f 8a230021 54000040 (d50b7e21)
[  124.184658] ---[ end trace 0000000000000000 ]---
[  124.189287] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  124.196186] SMP: stopping secondary CPUs
[  124.200118] Kernel Offset: 0x100004000000 from 0xffffc00080000000
[  124.206231] PHYS_OFFSET: 0x0
[  124.209114] CPU features: 0x1,00000001,70028143,0000720b
[  124.214442] Memory Limit: none
[  124.217501] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
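
The "Mem abort info" lines in the oops can be cross-checked by decoding the ESR value by hand. The sketch below uses the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0; for data aborts, CM is bit 8, WnR is bit 6 and the fault status code is bits 5:0). The function name `decode_esr` is made up for illustration.

```shell
#!/bin/sh
# Decode the top-level fields of an AArch64 ESR_ELx value for a data abort.
decode_esr() {
    esr=$(( $1 ))
    printf 'EC=0x%02x IL=%d ISS=0x%08x CM=%d WnR=%d FSC=0x%02x\n' \
        "$(( (esr >> 26) & 0x3f ))" \
        "$(( (esr >> 25) & 0x1 ))" \
        "$(( esr & 0x1ffffff ))" \
        "$(( (esr >> 8) & 0x1 ))" \
        "$(( (esr >> 6) & 0x1 ))" \
        "$(( esr & 0x3f ))"
}

# ESR value taken from the log above.
decode_esr 0x0000000096000145
# prints: EC=0x25 IL=1 ISS=0x00000145 CM=1 WnR=1 FSC=0x05
```

The decoded fields match the log: a data abort at the current exception level (EC 0x25) raised by a cache maintenance operation (CM=1) with a level 1 translation fault (FSC 0x05). That is consistent with the faulting pc being in dcache_inval_poc.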

Non-working 8GB unit made to work with mem=2G in cmdline.txt

$ cat /boot/firmware/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=8c5b1cb2-02 rootfstype=ext4 fsck.repair=yes mem=2G rootwait

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            1947         250        1582           5         171        1697
Swap:            199           0         199

$ dmesg | grep ath11k
[    7.862557] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[    7.862603] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.863780] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.863795] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    8.310542] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    8.310551] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

$ nmcli dev wifi list
IN-USE  BSSID              SSID            MODE   CHAN  RATE        SIGNAL  BAR>
        EC:6C:9A:4A:61:54  BT-JWAKQR       Infra  40    540 Mbit/s  29      ▂__>
        62:6C:9A:4A:61:56  EE WiFi-X       Infra  40    540 Mbit/s  25      ▂__>
        62:6C:9A:4A:61:55  EE WiFi         Infra  40    540 Mbit/s  25      ▂__>
<...>

$ sudo nmcli dev wifi connect <ap> password <password>
Device 'wlan0' successfully activated with '6ca93d62-f17e-4580-aaa4-f1dbe64a902b'.

$ ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 2c:cf:67:67:8d:23 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c4:93:00:3a:34:99 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.185/24 brd 192.168.1.255 scope global dynamic noprefixroute wlan0
       valid_lft 86384sec preferred_lft 86384sec
    inet6 fe80::5a4e:f962:55b2:ca18/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

$ ping raspberrypi.com
PING raspberrypi.com (104.21.88.234) 56(84) bytes of data.
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=1 ttl=58 time=7.68 ms
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=2 ttl=58 time=7.94 ms
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=3 ttl=58 time=8.53 ms
^C
--- raspberrypi.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 7.677/8.050/8.531/0.356 ms

By limiting the memory to 2GB, the 8GB unit works as expected.
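
For anyone wanting to apply the same workaround, here is a small sketch that builds the modified command line without touching the file. The helper name `with_mem_limit` is hypothetical. It leaves the string unchanged if a mem= option is already present, and it relies on cmdline.txt being a single line.

```shell
#!/bin/sh
# Print a kernel command line with mem=<limit> appended.
# If a mem= option is already present, print the line unchanged.
with_mem_limit() {
    cmdline=$1
    limit=$2
    case " $cmdline " in
        *" mem="*) printf '%s\n' "$cmdline" ;;                    # already limited
        *)         printf '%s mem=%s\n' "$cmdline" "$limit" ;;    # append limit
    esac
}

# Example with a shortened command line.
with_mem_limit "console=serial0,115200 console=tty1 rootwait" 2G
# prints: console=serial0,115200 console=tty1 rootwait mem=2G
```

On the device this could be called as `with_mem_limit "$(cat /boot/firmware/cmdline.txt)" 2G`, with the result reviewed before being written back by hand.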

Additional context

I have tried various permutations of the following config options in cmdline.txt with no success:

  • iommu=soft
  • iommu.strict=1
  • coherent_pool=1M

P33M (Contributor) commented Nov 13, 2024

Are you saying that using

dtoverlay=pcie-32bit-dma
dtoverlay=pciex1-compat-pi5,no-mip
dtparam=pciex1
dtparam=pciex1_gen=3

in config.txt on an 8GB device results in a kernel panic?
What happens if you remove dtparam=pciex1_gen=3?

P33M (Contributor) commented Nov 13, 2024

It's likely this bug - https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/#25662182

Usage of virt_to_phys() is bad. This is fixed in kernel 6.9 and onwards. Our rpi-6.12.y branch is mostly functional (and will be the next target for rpi-update releases), can you try building that?
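
A quick way to see which side of the 6.9 boundary a running kernel falls on is to compare the release string. This is a minimal sketch: the function name `kernel_at_least` and the second release string in the loop are made up for illustration. The version number is only a hint, since a vendor branch may carry backported fixes.

```shell
#!/bin/sh
# Succeed if a kernel release string (as printed by `uname -r`)
# is at least MAJOR.MINOR.
kernel_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}            # text before the first dot
    rest=${release#*.}              # text after the first dot
    minor=${rest%%[!0-9]*}          # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# First string is from the report; the second is a hypothetical 6.12 build.
for r in 6.6.56-v8-16k-x+ 6.12.0-example; do
    if kernel_at_least "$r" 6 9; then
        echo "$r: 6.9 or newer"
    else
        echo "$r: older than 6.9"
    fi
done
```

On the device itself, `kernel_at_least "$(uname -r)" 6 9` gives the same check for the running kernel.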
