bug: madsim panicked at must be called from the context of a Madsim runtime #16950

Closed
wangrunji0408 opened this issue May 27, 2024 · 3 comments
@wangrunji0408
Contributor

wangrunji0408 commented May 27, 2024

Describe the bug

When running the deterministic recovery test, this panic occurs intermittently. It is probably a bug in madsim.

Error message/log

thread '<unnamed>' panicked at /risingwave/.cargo/registry/src/index.crates.io-6f17d22bba15001f/madsim-0.2.27/src/sim/runtime/context.rs:27:44:
there is no reactor running, must be called from the context of a Madsim runtime
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic_display
             at ./src/stream/src/executor/backfill/arrangement_backfill.rs:389:17
             at ./src/stream/src/executor/wrapper/epoch_check.rs:73:1
             at ./src/stream/src/executor/wrapper/trace.rs:102:1
             at ./src/stream/src/executor/dispatch.rs:386:21
             at ./src/stream/src/executor/actor.rs:243:5
             at ./src/expr/core/src/expr_context.rs:35:65
             at ./src/stream/src/executor/actor.rs:178:10
 100: core::ptr::drop_in_place<futures_util::future::future::Map<risingwave_stream::executor::actor::Actor<risingwave_stream::executor::dispatch::DispatchExecutor>::run::{{closure}},risingwave_stream::task::stream_manager::<impl risingwave_stream::task::barrier_manager::LocalBarrierWorker>::spawn_actors::{{closure}}>>
 101: core::ptr::drop_in_place<core::option::Option<futures_util::future::future::Map<risingwave_stream::executor::actor::Actor<risingwave_stream::executor::dispatch::DispatchExecutor>::run::{{closure}},risingwave_stream::task::stream_manager::<impl risingwave_stream::task::barrier_manager::LocalBarrierWorker>::spawn_actors::{{closure}}>>>
 102: core::pin::Pin<Ptr>::set
 103: tokio::task::task_local::_::<impl core::ops::drop::Drop for tokio::task::task_local::TaskLocalFuture<T,F>>::drop::__drop_inner::{{closure}}
 104: tokio::task::task_local::LocalKey<T>::scope_inner
 105: tokio::task::task_local::_::<impl core::ops::drop::Drop for tokio::task::task_local::TaskLocalFuture<T,F>>::drop::__drop_inner
 106: tokio::task::task_local::_::<impl core::ops::drop::Drop for tokio::task::task_local::TaskLocalFuture<T,F>>::drop

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

https://buildkite.com/risingwavelabs/pull-request/builds/50377#018fb898-4f49-42d5-a96a-d566e0ab4ea4

wangrunji0408 added the type/bug (Something isn't working) label May 27, 2024
wangrunji0408 self-assigned this May 27, 2024
github-actions bot added this to the release-1.10 milestone May 27, 2024
@BugenZhao
Member

core::ptr::drop_in_place<...>

Did this happen while the actor was being dropped?

@wangrunji0408
Contributor Author

wangrunji0408 commented May 27, 2024

Yes. The root cause is that an actor called tokio::spawn while it was being dropped. The spawn call is in foyer-storage. @MrCroxx said it will be removed after #16869, but that PR is blocked by madsim-rs/madsim#213.

stack backtrace:
   0:     0x5642bd9f82f2 - std::backtrace_rs::backtrace::libunwind::trace::h50c8612b7fb5c600
   1:     0x5642bd9f82f2 - std::backtrace_rs::backtrace::trace_unsynchronized::h7c1b56b43cfb82e4
   2:     0x5642bd9f82f2 - std::sys_common::backtrace::_print_fmt::ha8e8feab608e57c5
                               at /risingwave/.cargo/registry/src/index.crates.io-6f17d22bba15001f/madsim-0.2.27/src/sim/runtime/context.rs:27:44
                               at /risingwave/.cargo/registry/src/index.crates.io-6f17d22bba15001f/madsim-0.2.27/src/sim/runtime/context.rs:27:10
                               at /risingwave/.cargo/registry/src/index.crates.io-6f17d22bba15001f/madsim-0.2.27/src/sim/task/mod.rs:577:20
                               at /risingwave/.cargo/registry/src/index.crates.io-6f17d22bba15001f/madsim-0.2.27/src/sim/task/mod.rs:655:5
                               at /risingwave/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foyer-storage-0.7.6/src/storage.rs:314:9
                               at /risingwave/src/stream/src/executor/backfill/arrangement_backfill.rs:389:17
                               at /risingwave/src/stream/src/executor/wrapper/epoch_check.rs:73:1
                               at /risingwave/src/stream/src/executor/wrapper/trace.rs:102:1
                               at /risingwave/src/stream/src/executor/dispatch.rs:386:21
                               at /risingwave/src/stream/src/executor/actor.rs:243:5
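
To illustrate the failure mode described above, here is a minimal std-only sketch. It is a toy model, not madsim or foyer-storage code: `RUNTIME`, `block_on`, `try_spawn`, `Store`, and `run_demo` are all hypothetical names, and the thread-local slot only stands in for whatever context a real runtime keeps. It shows a destructor that tries to spawn a task, which works while the context is installed and fails once the value is dropped outside of it.

```rust
use std::cell::RefCell;
use std::rc::Rc;

thread_local! {
    // Hypothetical stand-in for the context a runtime keeps for the current thread.
    static RUNTIME: RefCell<Option<Vec<&'static str>>> = RefCell::new(None);
}

/// Toy runtime entry: install a context, run `f`, return the spawned task names.
fn block_on(f: impl FnOnce()) -> Vec<&'static str> {
    RUNTIME.with(|r| *r.borrow_mut() = Some(Vec::new()));
    f();
    RUNTIME.with(|r| r.borrow_mut().take()).unwrap_or_default()
}

/// Toy `spawn`: succeeds only while a runtime context is installed.
/// A real runtime may panic here instead of returning an error.
fn try_spawn(name: &'static str) -> Result<(), &'static str> {
    RUNTIME.with(|r| match r.borrow_mut().as_mut() {
        Some(tasks) => {
            tasks.push(name);
            Ok(())
        }
        None => Err("there is no reactor running"),
    })
}

/// Hypothetical store that schedules background work from its destructor.
struct Store {
    outcome: Rc<RefCell<Vec<Result<(), &'static str>>>>,
}

impl Drop for Store {
    fn drop(&mut self) {
        // Spawning from `drop` depends on where the drop happens to run.
        let result = try_spawn("flush");
        self.outcome.borrow_mut().push(result);
    }
}

/// Drop one store inside the toy runtime and one after it has exited.
fn run_demo() -> Vec<Result<(), &'static str>> {
    let outcome = Rc::new(RefCell::new(Vec::new()));
    let late = Store { outcome: outcome.clone() };
    block_on(|| {
        // Dropped while the context is installed: the spawn succeeds.
        drop(Store { outcome: outcome.clone() });
    });
    // Dropped after the context is gone: the spawn fails.
    drop(late);
    let results = outcome.borrow().clone();
    results
}
```

Calling `run_demo()` returns one success followed by one error. The sketch reports the error as a value so both outcomes can be observed; the panic in the backtrace above corresponds to the second case.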

@BugenZhao
Member

BugenZhao commented May 27, 2024

This reminds me that we spawn a separate task on the blocking thread pool to drop the actor, during which the task-local variables provided to the actor are no longer accessible. This may be unrelated to this issue, but it could cause problems.

/// Drop the stream in a blocking task to avoid interfering with other actors.
///
/// Logically the actor is dropped after we send the barrier with `Drop` mutation to the
/// downstream, thus making the `drop`'s progress asynchronous. However, there might be a
/// considerable amount of data in the executors' in-memory cache, dropping these structures might
/// be a CPU-intensive task. This may lead to the runtime being unable to schedule other actors if
/// the `drop` is called on the current thread.
pub async fn spawn_blocking_drop_stream<T: Send + 'static>(stream: T) {
    let _ = tokio::task::spawn_blocking(move || drop(stream))
        .instrument_await("drop_stream")
        .await;
}

let with_config =
    crate::CONFIG.scope(self.actor_manager.env.config().clone(), instrumented);
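
The concern about scoped values not being visible during an offloaded drop can be sketched with the standard library alone. This is only an analogy under stated assumptions: it uses a thread-local and `std::thread::spawn` in place of a tokio task-local and `spawn_blocking`, and `CONFIG`, `scope`, `Actor`, and `observe_drops` here are hypothetical names that do not come from the RisingWave code.

```rust
use std::cell::RefCell;
use std::sync::mpsc::{channel, Sender};
use std::thread;

thread_local! {
    // Hypothetical stand-in for a scoped config value (thread-local, not task-local).
    static CONFIG: RefCell<Option<String>> = RefCell::new(None);
}

/// Install `value` for the duration of `f`, then restore the previous value.
fn scope<R>(value: &str, f: impl FnOnce() -> R) -> R {
    let prev = CONFIG.with(|c| c.replace(Some(value.to_string())));
    let out = f();
    CONFIG.with(|c| *c.borrow_mut() = prev);
    out
}

/// Hypothetical actor whose destructor reports which config it could see.
struct Actor {
    report: Sender<Option<String>>,
}

impl Drop for Actor {
    fn drop(&mut self) {
        let seen = CONFIG.with(|c| c.borrow().clone());
        // Ignore send errors: the receiver may already be gone.
        let _ = self.report.send(seen);
    }
}

/// Returns (config seen by an in-scope drop, config seen by an offloaded drop).
fn observe_drops() -> (Option<String>, Option<String>) {
    let (tx, rx) = channel();
    scope("actor-config", || {
        // Dropped on the thread that owns the scope: the value is visible.
        drop(Actor { report: tx.clone() });
        // Dropped on another thread, like an offloaded drop: the value is not visible.
        let actor = Actor { report: tx.clone() };
        thread::spawn(move || drop(actor)).join().unwrap();
    });
    let in_scope = rx.recv().unwrap();
    let offloaded = rx.recv().unwrap();
    (in_scope, offloaded)
}
```

In this model the first drop sees the scoped value and the second sees nothing, because the scope is bound to where the code runs rather than to the value being dropped. Any destructor that relies on such a scoped value would therefore need it captured explicitly before the drop is handed off.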
