We have a system built around an Espressif ESP32-S3 MCU using ESP-IDF/FreeRTOS. Recently, after we updated our Rust toolchain, we started having issues where under certain conditions the watchdog timer would constantly trigger and reset the system. We've tracked it down to the new crossbeam-channel-based Channel spinlooping in try_send and try_recv.
Many parts of our system communicate using Channels. The highest-priority threads are the ones that read measurements from sensors and then send those measurements to various channels for further processing using try_send. These threads also read from command channels using try_recv.
Our expectation with this approach is that sending to and receiving from these channels should never block waiting for other threads to run. If we can't read or send anything right then, the methods should immediately return an Err, which we ignore.
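For readers unfamiliar with the pattern, here is a minimal sketch of the non-blocking usage described above. The helper names (`publish`, `poll_command`, `Step`) and the channel payload types are made up for illustration and are not taken from the reporter's firmware; only `sync_channel`, `try_send` and `try_recv` are the real std API.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TryRecvError, TrySendError};

/// Outcome of one non-blocking send attempt (hypothetical type).
#[derive(Debug, PartialEq)]
enum Step {
    Sent,
    Dropped,
    Disconnected,
}

/// Try to hand a sample to a consumer without ever waiting for it.
fn publish(tx: &SyncSender<u32>, sample: u32) -> Step {
    match tx.try_send(sample) {
        Ok(()) => Step::Sent,
        // Queue is full: drop the sample and carry on.
        Err(TrySendError::Full(_)) => Step::Dropped,
        // Receiver is gone.
        Err(TrySendError::Disconnected(_)) => Step::Disconnected,
    }
}

/// Check for a pending command without waiting for one to arrive.
fn poll_command(rx: &Receiver<u8>) -> Option<u8> {
    match rx.try_recv() {
        Ok(cmd) => Some(cmd),
        // Nothing to do right now; both errors are ignored here.
        Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => None,
    }
}

fn main() {
    // `_rx` keeps the receiving end alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<u32>(1);
    let (cmd_tx, cmd_rx) = sync_channel::<u8>(1);

    println!("{:?}", publish(&tx, 1)); // Sent
    println!("{:?}", publish(&tx, 2)); // Dropped (buffer of 1 is full)
    println!("{:?}", poll_command(&cmd_rx)); // None
    cmd_tx.try_send(7).unwrap();
    println!("{:?}", poll_command(&cmd_rx)); // Some(7)
}
```

On a desktop OS this returns immediately in every case. The report is about the situation where the same calls do not return promptly under strict-priority scheduling.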
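Through some judicious println!-debugging I've found that when we call try_send or try_recv, we sometimes end up in a situation where start_send/start_recv performs the following spin_light calls many thousands of times:
rust/library/std/src/sync/mpmc/array.rs, line 186 in eb26296
rust/library/std/src/sync/mpmc/array.rs, line 277 in eb26296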
This then leads to our idle task never getting to run and thus the watchdog timer times out and resets the system. Disabling the watchdog timer doesn't seem to let it ever get unstuck on its own.
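To illustrate why such a spin can starve the idle task, here is a toy model. It assumes, without confirmation from the thread, that the spin is a wait for another thread to finish updating a per-slot stamp. `Slot`, `Wait` and `wait_for_stamp` are hypothetical names, the quadratic backoff only loosely imitates `spin_light`, and the round limit is added purely so the example terminates.

```rust
use std::hint;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Toy stand-in for one slot of a bounded channel (not the std type).
struct Slot {
    stamp: AtomicUsize,
}

/// Result of a bounded wait on a slot.
#[derive(Debug, PartialEq)]
enum Wait {
    Ready,
    GaveUp { spins: u32 },
}

/// Busy-wait until the slot's stamp equals `expected`, giving up after
/// `max_rounds` checks. The thread never sleeps or blocks while waiting.
fn wait_for_stamp(slot: &Slot, expected: usize, max_rounds: u32) -> Wait {
    let mut spins = 0;
    for round in 0..max_rounds {
        if slot.stamp.load(Ordering::Acquire) == expected {
            return Wait::Ready;
        }
        // Capped quadratic backoff (an assumption, chosen for illustration).
        let n = round.min(6).pow(2);
        for _ in 0..n {
            hint::spin_loop();
        }
        spins += n;
    }
    Wait::GaveUp { spins }
}

fn main() {
    // Another thread has claimed the slot but not yet published its stamp.
    let slot = Slot { stamp: AtomicUsize::new(0) };
    println!("{:?}", wait_for_stamp(&slot, 1, 4));

    // Once the other thread gets to run and publishes, the wait succeeds.
    slot.stamp.store(1, Ordering::Release);
    println!("{:?}", wait_for_stamp(&slot, 1, 4));
}
```

If the thread that must publish the stamp has a lower priority than the spinning thread, and the scheduler only runs the highest-priority ready task, an unbounded version of this loop would never observe the update.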
I've tried switching to crossbeam-channel as well, and while it seems harder to reproduce using that crate, it's still happening.
Backtrace:
0x420304b2 - <core::ops::range::Range<T> as core::iter::range::RangeIteratorImpl>::spec_next
at ??:??
0x3fcd9bd0 - _btdm_bss_end
at ??:??
0x4203038e - std::sync::mpmc::array::Channel<T>::start_send
at ??:??
0x3fcd9c00 - _btdm_bss_end
at ??:??
0x42029852 - std::sync::mpmc::Sender<T>::try_send
at ??:??
0x3fcd9c50 - _btdm_bss_end
at ??:??
0x42038706 - std::sync::mpsc::SyncSender<T>::try_send
at /home/remmy/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/sync/mpsc/mod.rs:739
0x3fcd9c70 - _btdm_bss_end
at ??:??
0x4200b3e3 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
at /home/remmy/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/thread/mod.rs:529
0x3fcd9d50 - _btdm_bss_end
at ??:??
0x420bd223 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /home/remmy/.rustup/toolchains/esp/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1985
0x3fcd9de0 - _btdm_bss_end
at ??:??
0x420c3735 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /home/remmy/.rustup/toolchains/esp/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1985
0x3fcd9e00 - _btdm_bss_end
at ??:??
0x420ef854 - pthread_task_func
at /home/remmy/src/i/elofleet/firmware/elobox/.embuild/espressif/esp-idf/v5.0.3/components/pthread/pthread.c:196
0x3fcd9e20 - _btdm_bss_end
at ??:??
They look related, but the actual spinning case from their backtrace is different.
Ultimately it's not a big problem for us if try_recv blocks for a short while in certain cases, but it's a big problem if it's a spinloop because in RTOS conditions it means that lower-priority threads will never get to run, and so the whole system hangs.
It seems like they're slightly different instances of the general symptom: something is spinning during a priority inversion. With RT scheduling the inversion can last forever. With less strict scheduling it merely takes a while until it resolves.
saethlin added the T-libs label and removed the needs-triage label on Aug 15, 2023.