Skip to content

Commit

Permalink
i#6814: Fix stack overflow on signal delivery to mid-detach thread (#…
Browse files Browse the repository at this point in the history
…6815)

Fixes two stack overflow scenarios that occur when DR delivers an app
signal to the native signal handler for a thread that is mid-detach.

First case: when a thread is handling the suspend signal and is waiting
for the detacher thread to wake it up and tell it to continue detaching.
Currently, DR unblocks signals before starting the wait. If the signal is
delivered at this point, currently execute_native_handler() incorrectly
delivers the signal to the native handler on DR's own signal stack. To
fix this: we now do not unblock signals during this wait as it
complicates native signal delivery, also for the second case described
below.

Additionally, for a detaching thread, we now do not explicitly
restore the app's sigblocked mask; DR already restores the mask on the
signal frame, which would be restored automatically when the thread
returns from the DR detach signal handler. This avoids another case
where the app may be on DR's signal stack when the native signal is
delivered.

Second case: when the thread has been woken up by the detacher thread,
executed sig_detach, and reinstated the app signal stack (if available). If
the signal is delivered at this point, execute_native_handler() adds a new
signal frame on top of DR's own signal frame on the app stack and invokes
the native signal handler. This sometimes ends up taking too much stack
space which causes a stack overflow, as observed on an internal app with
frequent profiling signals that use the stack-intensive libunwind to get a
stack trace for all threads. To fix this: we reuse the same signal frame
for delivering the signal to the native signal handler, when the app
doesn't need a non-RT frame.

The new code is exercised by the existing detach_signal test. Also
modified the test to have some threads that have a very small sigstack,
which helps reproduce the crash originally seen on a real app. (There
was already a note in detach_signal test about using a large sigstack to
avoid this stack overflow.)

Tested on an internal app where failures reduce from ~136/4000 to
~1/4000.

Issue: #6814
  • Loading branch information
abhinav92003 committed May 22, 2024
1 parent d30253d commit 427e33e
Show file tree
Hide file tree
Showing 9 changed files with 230 additions and 56 deletions.
4 changes: 2 additions & 2 deletions core/dynamo.c
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2010-2022 Google, Inc. All rights reserved.
* Copyright (c) 2010-2024 Google, Inc. All rights reserved.
* Copyright (c) 2000-2010 VMware, Inc. All rights reserved.
* **********************************************************/

Expand Down Expand Up @@ -2882,7 +2882,7 @@ dynamo_thread_not_under_dynamo(dcontext_t *dcontext)
LOG(THREAD, LOG_ASYNCH, 2, "thread %d not under DR control\n",
dcontext->owning_thread);
dcontext->currently_stopped = true;
os_thread_not_under_dynamo(dcontext);
os_thread_not_under_dynamo(dcontext, /*restore_sigblocked=*/true);
#ifdef SIDELINE
/* FIXME: if # active threads is 0, then put sideline thread to sleep! */
if (dynamo_options.sideline) {
Expand Down
3 changes: 2 additions & 1 deletion core/globals.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2011-2023 Google, Inc. All rights reserved.
* Copyright (c) 2011-2024 Google, Inc. All rights reserved.
* Copyright (c) 2000-2010 VMware, Inc. All rights reserved.
* **********************************************************/

Expand Down Expand Up @@ -445,6 +445,7 @@ extern bool dynamo_all_threads_synched; /* are all other threads suspended safel
* so this is always set on Windows.
*/
extern bool dynamo_control_via_attach;
extern bool started_detach;
/* Not guarded by DR_APP_EXPORTS because later detach implementations might not
* go through the app interface.
*/
Expand Down
4 changes: 2 additions & 2 deletions core/os_shared.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2010-2022 Google, Inc. All rights reserved.
* Copyright (c) 2010-2024 Google, Inc. All rights reserved.
* Copyright (c) 2003-2010 VMware, Inc. All rights reserved.
* **********************************************************/

Expand Down Expand Up @@ -80,7 +80,7 @@ void
os_thread_under_dynamo(dcontext_t *dcontext);
/* must only be called for the executing thread */
void
os_thread_not_under_dynamo(dcontext_t *dcontext);
os_thread_not_under_dynamo(dcontext_t *dcontext, bool restore_sigblocked);
void
os_process_under_dynamorio_initiate(dcontext_t *dcontext);
void
Expand Down
6 changes: 3 additions & 3 deletions core/synch.c
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2012-2022 Google, Inc. All rights reserved.
* Copyright (c) 2012-2024 Google, Inc. All rights reserved.
* Copyright (c) 2008-2010 VMware, Inc. All rights reserved.
* **********************************************************/

Expand Down Expand Up @@ -46,8 +46,8 @@

extern vm_area_vector_t *fcache_unit_areas; /* from fcache.c */

static bool started_detach = false; /* set before synchall */
bool doing_detach = false; /* set after synchall */
bool started_detach = false; /* set before synchall */
bool doing_detach = false; /* set after synchall */
thread_id_t detacher_tid = INVALID_THREAD_ID;

static void
Expand Down
13 changes: 10 additions & 3 deletions core/unix/os.c
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* *******************************************************************************
* Copyright (c) 2010-2023 Google, Inc. All rights reserved.
* Copyright (c) 2010-2024 Google, Inc. All rights reserved.
* Copyright (c) 2011 Massachusetts Institute of Technology All rights reserved.
* Copyright (c) 2000-2010 VMware, Inc. All rights reserved.
* *******************************************************************************/
Expand Down Expand Up @@ -2952,10 +2952,17 @@ os_thread_under_dynamo(dcontext_t *dcontext)
}

void
os_thread_not_under_dynamo(dcontext_t *dcontext)
os_thread_not_under_dynamo(dcontext_t *dcontext, bool restore_sigblocked)
{
stop_itimer(dcontext);
signal_swap_mask(dcontext, true /*to app*/);
/* The caller may not want to restore the app's sigblocked mask right now.
* E.g., when a thread is in DR's signal handler to handle the detach signal,
* it can restore the mask atomically with going native by setting it on the
* signal frame, which avoids races.
*/
if (restore_sigblocked) {
signal_swap_mask(dcontext, true /*to app*/);
}
os_swap_context(dcontext, true /*to app*/, DR_STATE_GO_NATIVE);
}

Expand Down
5 changes: 4 additions & 1 deletion core/unix/os_private.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2011-2022 Google, Inc. All rights reserved.
* Copyright (c) 2011-2024 Google, Inc. All rights reserved.
* Copyright (c) 2008-2010 VMware, Inc. All rights reserved.
* **********************************************************/

Expand Down Expand Up @@ -71,20 +71,23 @@
# define ASM_R1 "x1"
# define ASM_R2 "x2"
# define ASM_R3 "x3"
# define ASM_R4 "x4"
# define ASM_XSP "sp"
# define ASM_INDJMP "br"
#elif defined(DR_HOST_ARM)
# define ASM_R0 "r0"
# define ASM_R1 "r1"
# define ASM_R2 "r2"
# define ASM_R3 "r3"
# define ASM_R4 "r4"
# define ASM_XSP "sp"
# define ASM_INDJMP "bx"
#elif defined(DR_HOST_RISCV64)
# define ASM_R0 "a0"
# define ASM_R1 "a1"
# define ASM_R2 "a2"
# define ASM_R3 "a3"
# define ASM_R4 "a4"
# define ASM_XSP "sp"
# define ASM_INDJMP "jr"
#endif /* X86/ARM */
Expand Down
Loading

0 comments on commit 427e33e

Please sign in to comment.