ESP32S3: Zephyr freezes during OTA Update over UDP and BLE #76325

Open
epc-ake opened this issue Jul 26, 2024 Discussed in #76302 · 13 comments
Labels: bug, platform: ESP32, priority: low

@epc-ake
Contributor

epc-ake commented Jul 26, 2024

Discussed in #76302

Originally posted by epc-ake July 25, 2024
Has anyone managed to use mcumgr over UDP on an ESP32-S3?

I'm developing on an esp32s3_devkitm and want to enable OTA firmware updates using mcumgr and MCUboot over a UDP interface. To experiment with this, I modified the prj.conf file of the samples/net/wifi example to enable MCUboot and mcumgr. See the attached prj.conf for details.

After flashing the firmware and connecting to a Wi-Fi network, I can retrieve image information using the go-app and AuTerm. For example, with the go-app:

mcumgr --conntype udp --connstring=[x.x.0.60]:1337 image list
Images:
 image=0 slot=0
    version: 0.0.0
    bootable: true
    flags: active confirmed
    hash: 60e5eb52f59451a3db2ec9e978b13c0c8485577dd6787684e216069341bdf80b
Split status: N/A (0)

However, when I try to upload an image, the firmware becomes unresponsive and freezes:

./mcumgr --conntype udp --connstring=[x.x.0.60]:1337 image upload zephyr.signed.bin
# starts freezing...

Zephyr output:

*** Booting Zephyr OS build 065fa94c79e5 ***
[00:00:00.241,000] <inf> smp_udp: Started (IPv4)
uart:~$ wifi connect -s ***** -p ***** -k 1
Connection requested
Connected
# requesting image info
[00:00:17.353,000] <inf> net_dhcpv4: Received: x.x.0.60
[00:00:24.301,000] <dbg> mcumgr_img_grp: img_mgmt_active_slot: (0) => 0
[00:00:24.302,000] <inf> mcuboot_util: Image index: 0, Swap type: none
[00:00:24.302,000] <dbg> mcumgr_img_grp: img_mgmt_get_next_boot_slot: (0, *) => slot = 0, type = 0
[00:00:24.302,000] <dbg> mcumgr_img_grp: img_mgmt_active_slot: (0) => 0
uart:~$  # doing image update -> freezing...

I ran the debugger in parallel, and when I interrupted GDB (Ctrl+C) after the firmware started freezing, it pointed to _DoubleExceptionVector.

GDB output:

Info : [esp32s3.cpu0] Target halted, PC=0x403743C0, debug_reason=00000000
[esp32s3.cpu0] Target halted, PC=0x403743C0, debug_reason=00000000
Info : Set GDB target to 'esp32s3.cpu0'
Set GDB target to 'esp32s3.cpu0'
Info : [esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000
[esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000

Program received signal SIGINT, Interrupt.
_DoubleExceptionVector () at /zephyrproject/zephyr-epc/arch/xtensa/core/xtensa_asm2_util.S:525

Uploading over serial works without any issues, so it seems to be specifically related to the UDP interface.

Does anyone have any ideas on how to debug this?


nordicjm added the platform: ESP32 label Jul 26, 2024
@epc-ake
Contributor Author

epc-ake commented Jul 26, 2024

If I set CONFIG_IMG_ERASE_PROGRESSIVELY=y, it uploads at least some data before freezing again.
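
For context on why this option changes the behavior: as far as the Kconfig help describes it, CONFIG_IMG_ERASE_PROGRESSIVELY erases flash as needed while the image is being received, instead of erasing the whole image slot at once. The sketch below is a host-side model of that difference, not Zephyr's implementation. The sector size, slot size, and all `sim_*` names are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  4096u  /* assumed erase granularity */
#define SLOT_SECTORS 4u     /* deliberately tiny slot for the model */

/* Simulated image slot: contents, per-sector erase state, erase counter. */
typedef struct {
    uint8_t  data[SECTOR_SIZE * SLOT_SECTORS];
    uint8_t  erased[SLOT_SECTORS]; /* 1 once the sector has been erased */
    unsigned erase_ops;            /* number of sector erases performed */
} sim_slot_t;

static inline void sim_init(sim_slot_t *s)
{
    memset(s, 0, sizeof(*s));
}

static inline void sim_erase_sector(sim_slot_t *s, unsigned idx)
{
    memset(&s->data[idx * SECTOR_SIZE], 0xFF, SECTOR_SIZE);
    s->erased[idx] = 1;
    s->erase_ops++;
}

/* Up-front strategy: every sector is erased before the first chunk lands. */
static inline void sim_erase_all(sim_slot_t *s)
{
    for (unsigned i = 0; i < SLOT_SECTORS; i++) {
        sim_erase_sector(s, i);
    }
}

/* Progressive strategy: erase only the not-yet-erased sectors that this
 * chunk touches, then store the chunk. Returns 0 on success, -1 if the
 * chunk is empty or does not fit in the slot. */
static inline int sim_write_progressive(sim_slot_t *s, uint32_t off,
                                        const uint8_t *buf, uint32_t len)
{
    if (len == 0 || off > sizeof(s->data) || len > sizeof(s->data) - off) {
        return -1;
    }
    unsigned first = off / SECTOR_SIZE;
    unsigned last = (off + len - 1) / SECTOR_SIZE;
    for (unsigned i = first; i <= last; i++) {
        if (!s->erased[i]) {
            sim_erase_sector(s, i);
        }
    }
    memcpy(&s->data[off], buf, len);
    return 0;
}
```

In this model the first chunk triggers a single sector erase rather than a full-slot erase, which would be consistent with some data arriving before the freeze if the fault sits in the flash erase or write path. That reading is a hypothesis, not something the logs above confirm.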

@epc-ake
Contributor Author

epc-ake commented Jul 29, 2024

It also freezes during a BLE update.

epc-ake changed the title from "ESP32S3: Zephyr freezes during OTA Update over UDP" to "ESP32S3: Zephyr freezes during OTA Update over UDP and BLE" Jul 29, 2024
@epc-ake
Contributor Author

epc-ake commented Jul 29, 2024

Same issue over serial as well.
However, it does manage to upload part of the firmware.

jhedberg added the bug and priority: low labels Jul 30, 2024
@epc-ake
Contributor Author

epc-ake commented Aug 7, 2024

@sylvioalves can you give us an update on this? In our last conversation on Discord, you mentioned that you had evaluated a fix for this.

@LeoBRIANDSmile
Contributor

Hi, during which step of the OTA update does the CPU freeze (downloading the firmware, erasing flash, writing flash, ...)? Could you debug this? I have a similar issue with a homemade OTA update tool on the ESP32-S3: during the flash erase step, the CPU raises a FATAL EXCEPTION.

@epc-ake
Contributor Author

epc-ake commented Sep 2, 2024

I haven't worked on this, so no progress here, unfortunately.
I think writing to the flash causes Zephyr to freeze, so it might be a problem with the flash driver.
#77452 mentions a similar/same bug.

@LeoBRIANDSmile
Contributor

(image attached)
It seems to be an issue of flash protection.

@epc-ake
Contributor Author

epc-ake commented Sep 2, 2024

> It seems to be an issue of flash protection

This seems to be related to esptool.py.
Overall I was able to partially update the image...

@rftafas

rftafas commented Sep 24, 2024

@epc-ake can you try #78121?

sylvioalves added a commit to sylvioalves/zephyr that referenced this issue Sep 27, 2024
ESP32-S3 initialization code should apply the errata after cache
initialization. This ensures that the data and instruction caches are properly
handled and lets subsequent calls work as needed.

This also updates hal_espressif to force GCC to treat register bitfield
structs declared as volatile to ensure writes to 32-bit peripheral registers.

Fixes zephyrproject-rtos#71397
Fixes zephyrproject-rtos#76325

Signed-off-by: Sylvio Alves <[email protected]>
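
As background on the second paragraph of the commit message, here is a minimal host-side sketch of a peripheral register described as a bitfield struct. The register layout and the `demo_*` names are hypothetical and not taken from hal_espressif. The concern with such structs is that a compiler may update a bitfield with an access narrower than the full register, which 32-bit-only peripheral registers may not tolerate. GCC's `-fstrict-volatile-bitfields` option asks the compiler to use the declared type's width for volatile bitfield accesses; that this is the exact mechanism used by the commit is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout for illustration; not from hal_espressif.
 * Bit positions assume GCC's little-endian bitfield allocation (LSB first). */
typedef union {
    struct {
        uint32_t enable   : 1;   /* bit 0 */
        uint32_t divider  : 8;   /* bits 1..8 */
        uint32_t reserved : 23;  /* bits 9..31 */
    };
    uint32_t val;                /* whole register as one 32-bit word */
} demo_reg_t;

_Static_assert(sizeof(demo_reg_t) == 4, "register must be exactly 32 bits");

/* Bitfield-style update. The width of the load/store the compiler emits
 * here is what -fstrict-volatile-bitfields is meant to pin to 32 bits. */
static inline void demo_enable(volatile demo_reg_t *reg)
{
    assert(reg != 0);
    reg->enable = 1;
}

/* Explicit alternative that does not rely on the compiler's choice:
 * one 32-bit read, modify a local copy, one 32-bit write. */
static inline void demo_set_divider(volatile demo_reg_t *reg, uint32_t div)
{
    assert(reg != 0);
    uint32_t v = reg->val;
    v &= ~(UINT32_C(0xFF) << 1);       /* clear bits 1..8 */
    v |= (div & UINT32_C(0xFF)) << 1;  /* insert the new divider */
    reg->val = v;
}
```

On a host, `reg` can simply be an ordinary `volatile demo_reg_t` variable standing in for a memory-mapped register. Both functions leave the same value in memory; what differs on real hardware is the width of the bus access.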
sylvioalves added this to the v3.7.1 milestone Sep 27, 2024
@marekmatej

@epc-ake are you, by any chance, running the second image on the APPCPU/cpu1?

Info : [esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000
[esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000

@epc-ake
Contributor Author

epc-ake commented Oct 1, 2024

> @epc-ake can you try #78121?

Thanks for this. I tried it, and it seems to work, at least over UDP.
However, I don't have the time right now to test it in depth...
I will come back to it in 1-2 weeks.

@epc-ake
Contributor Author

epc-ake commented Oct 1, 2024

> @epc-ake are you, by any chance, running the second image on the APPCPU/cpu1?
>
> Info : [esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000
> [esp32s3.cpu1] Target halted, PC=0x40043A40, debug_reason=00000000

Hmm, interesting. I didn't notice that.
Currently I'm only using the PROCPU.
Maybe it halted on cpu1 by accident because I interrupted GDB with Ctrl+C?

sylvioalves added a commit to sylvioalves/zephyr that referenced this issue Oct 1, 2024