deadlock (resulting in big quit) from vcl_BackendEvent() racing vcldir_retire() #4140

Open
nigoroll opened this issue Jul 25, 2024 · 0 comments

nigoroll commented Jul 25, 2024

Expected Behavior

.

Current Behavior

seen while trying to reproduce nigoroll/libvmod-dynamic#117:

This is similar to the problem fixed by #4048, but here the situation is not adding a director, but rather removing one:

A director thread (updating backends) is racing a CLI cold event which pthread_join()s the director thread. The join never completes, because the director thread is returning the last director reference and therefore waits for the vcl_mtx, which the CLI event holds. (A minimal sketch of this deadlock pattern follows the GDB output below.)

Thread 218 (Thread 0x7f84bb7ff6c0 (LWP 462388)):
#0  futex_wait (private=0, expected=2, futex_word=0x7f84cf422b88) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7f84cf422b88, private=0) at ./nptl/lowlevellock.c:49
#2  0x00007f84cfcab2d2 in lll_mutex_lock_optimized (mutex=0x7f84cf422b88) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7f84cf422b88) at ./nptl/pthread_mutex_lock.c:93
#4  0x0000563f69cb8f64 in Lck__Lock (lck=0x563f69e2f908 <vcl_mtx>, p=0x563f69dbf89a "vcldir_retire", l=279) at cache/cache_lck.c:124
#5  0x0000563f69cfa71a in vcldir_retire (vdir=0x7f84bdc53f80) at cache/cache_vrt_vcl.c:279
#6  0x0000563f69cf5df3 in vcldir_deref (vdir=0x7f84bdc53f80) at cache/cache_vrt_vcl.c:305
#7  0x0000563f69cf5fbe in VRT_Assign_Backend (dst=0x7f84bdc14af0, src=0x0) at cache/cache_vrt_vcl.c:343
#8  0x00007f84c2fa81f0 in ref_del (ctx=<optimized out>, r=0x7f84bdc14ad0) at vmod_dynamic.c:428
#9  0x00007f84c2faa176 in dom_update (now=1721895825.5200291, priv=<optimized out>, res=0x7f84c2fbaba0 <res_getdns>, dom=<optimized out>) at vmod_dynamic.c:742
#10 dom_lookup_thread (priv=0x7f84ba616400) at vmod_dynamic.c:820
#11 0x00007f84cfca8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#12 0x00007f84cfd2861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x7f84cfc1b1c0 (LWP 457084)):
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007f84cfca9d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f84cfc5af32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f84cfc45472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x0000563f69cc0eac in pan_ic (func=0x563f69db7986 "cli_quit", file=0x563f69db7545 "cache/cache_main.c", line=376, cond=0x563f69db79b4 "It's time for the big quit", kind=VAS_WRONG) at cache/cache_panic.c:860
#5  0x0000563f69d87135 in VAS_Fail (func=0x563f69db7986 "cli_quit", file=0x563f69db7545 "cache/cache_main.c", line=376, cond=0x563f69db79b4 "It's time for the big quit", kind=VAS_WRONG) at vas.c:67
#6  0x0000563f69cbb3d2 in cli_quit (sig=3) at cache/cache_main.c:376
#7  <signal handler called>
#8  0x00007f84cfca4da4 in __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=462388, futex_word=0x7f84bb7ff990) at ./nptl/futex-internal.c:57
#9  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f84bb7ff990, expected=462388, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#10 0x00007f84cfca4e0b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f84bb7ff990, expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
#11 0x00007f84cfca9b33 in __pthread_clockjoin_ex (threadid=140208058136256, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at ./nptl/pthread_join_common.c:102
#12 0x00007f84cfca99df in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at ./nptl/pthread_join.c:24
#13 0x00007f84c2fa8623 in dom_event (dir=<optimized out>, ev=<optimized out>) at vmod_dynamic.c:1051
#14 0x0000563f69c9683d in VDI_Event (d=0x7f84ba6059f0, ev=VCL_EVENT_COLD) at cache/cache_director.c:277
#15 0x0000563f69cd9fbc in vcl_BackendEvent (vcl=0x7f84b9f16240, e=VCL_EVENT_COLD) at cache/cache_vcl.c:449
#16 0x0000563f69cd9428 in vcl_set_state (vcl=0x7f84b9f16240, state=0x563f69df22ee "0", msg=0x7ffc888a5528) at cache/cache_vcl.c:597
#17 0x0000563f69cd8eb4 in VCL_Poll () at cache/cache_vcl.c:729
#18 0x0000563f69c90ea9 in cli_cb_before (cli=0x7f84cf4d2810) at cache/cache_cli.c:80
#19 0x0000563f69d8b359 in cls_exec (cfd=0x7f84cf4d27e0, av=0x7f84cf42c400, ac=2) at vcli_serve.c:303
#20 0x0000563f69d8a2c6 in cls_feed (cfd=0x7f84cf4d27e0, p=0x7ffc888a5672 "\nold\n727.1721895820.531743/vgc.so 1auto\nLIBC_2.2.5", e=0x7ffc888a5673 "old\n727.1721895820.531743/vgc.so 1auto\nLIBC_2.2.5") at vcli_serve.c:412
#21 0x0000563f69d89d8a in VCLS_Poll (cs=0x7f84cf422b30, cli=0x7f84cf4d2810, timeout=-1) at vcli_serve.c:617
#22 0x0000563f69c90cf2 in CLI_Run () at cache/cache_cli.c:110
#23 0x0000563f69cbb0e5 in child_main (sigmagic=1, altstksz=81920) at cache/cache_main.c:467
#24 0x0000563f69d285fb in mgt_launch_child (cli=0x0) at mgt/mgt_child.c:402
#25 0x0000563f69d28091 in MCH_Start_Child () at mgt/mgt_child.c:692
#26 0x0000563f69d2f502 in main (argc=9, argv=0x7ffc888a7af8) at mgt/mgt_main.c:959


(gdb) thr 218
[Switching to thread 218 (Thread 0x7f84bb7ff6c0 (LWP 462388))]
#0  futex_wait (private=0, expected=2, futex_word=0x7f84cf422b88) at ../sysdeps/nptl/futex-internal.h:146
146	../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) fr 3
#3  ___pthread_mutex_lock (mutex=0x7f84cf422b88) at ./nptl/pthread_mutex_lock.c:93
93	./nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) p *mutex
$1 = {__data = {__lock = 2, __count = 0, __owner = 457084, __nusers = 1, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000|\371\006\000\001\000\000\000\002", '\000' <repeats 22 times>, __align = 2}
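
Note that the mutex owner above (__owner = 457084) is the LWP of thread 1, i.e. the CLI thread doing the pthread_join(). The pattern boils down to a classic lock-then-join deadlock; here is a minimal standalone sketch (an illustration only, not Varnish code, all names made up):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER; /* stands in for vcl_mtx */

static void *
worker(void *arg)
{
    (void)arg;
    sleep(1);                   /* let the main thread take mtx first */
    pthread_mutex_lock(&mtx);   /* like vcldir_retire(): blocks forever */
    pthread_mutex_unlock(&mtx);
    return (NULL);
}

int
main(void)
{
    pthread_t thr;

    pthread_create(&thr, NULL, worker, NULL);
    pthread_mutex_lock(&mtx);   /* like the CLI cold event holding vcl_mtx */
    pthread_join(thr, NULL);    /* like dom_event(): never returns */
    pthread_mutex_unlock(&mtx);
    printf("not reached\n");
    return (0);
}

Built with -pthread, this program hangs in pthread_join() much like thread 1 above hangs in dom_event().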

Possible Solution

I see no obvious solution to this problem at this point.

Steps to Reproduce (for bugs)

No response

Context

.

Varnish Cache version

No response

Operating system

No response

Source of binary packages used (if any)

No response

nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 26, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list
constitutes a classic case of iterating a mutable list. That is in addition to the
already known fact that running potentially expensive iterator functions while
holding the vcl_mtx is a very bad idea.

So this patch is a suggestion on how to get out of this situation. It does not
go all the way (it still leaves unsolved a similar problem of iterating over the
backends of _all vcls_), but shows an attempt at tackling the "iterate over
one VCL's backends" case.

We add a fittingly named 'vdire' facility which manages the director list and,
in particular, director retirement. The main change is that, while iterators are
active on the director list, vcl_mtx is _not_ held and any request to retire a
director only results in a resignation, which manifests as the director being
put on a second list.

When all iterators have completed, the resignation list is worked through and the
actual retirement is carried out.

NOTE: I am particularly _not_ happy about the vdire_(start|end)_iter() interface
being exposed. What I would like to see is a FOREACH macro which does both the
start and the end. Having to use an iterator with a callback is clumsy, and having
to directly call start/end around a VTAILQ_FOREACH feels like crossing
interfaces.
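A rough sketch of the resignation-list idea (an illustration only; all names here are made up and the actual vdire interface in the patch differs):

#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>    /* Varnish's VTAILQ macros wrap these TAILQ macros */

struct sk_director {
    TAILQ_ENTRY(sk_director) list;        /* membership on ->directors */
    TAILQ_ENTRY(sk_director) resign_list; /* membership on ->resigning */
};

struct vdire_sk {                         /* hypothetical, not the real vdire */
    pthread_mutex_t  mtx;
    unsigned         iterating;           /* number of active iterators */
    TAILQ_HEAD(, sk_director) directors;  /* live directors */
    TAILQ_HEAD(, sk_director) resigning;  /* retirement deferred */
};                                        /* lists TAILQ_INIT()ed by the owner */

static void
sk_retire_one(struct sk_director *d)
{
    free(d);    /* placeholder for the actual (potentially expensive) retirement */
}

static void
vdire_sk_add(struct vdire_sk *v, struct sk_director *d)
{
    pthread_mutex_lock(&v->mtx);
    TAILQ_INSERT_TAIL(&v->directors, d, list);
    pthread_mutex_unlock(&v->mtx);
}

/* Iterators bracket their walk of ->directors with start/end; the walk
 * itself does not hold the mutex. */
static void
vdire_sk_start_iter(struct vdire_sk *v)
{
    pthread_mutex_lock(&v->mtx);
    v->iterating++;
    pthread_mutex_unlock(&v->mtx);
}

static void
vdire_sk_end_iter(struct vdire_sk *v)
{
    struct sk_director *d;

    pthread_mutex_lock(&v->mtx);
    if (--v->iterating == 0) {
        /* last iterator done: carry out the deferred retirements */
        while ((d = TAILQ_FIRST(&v->resigning)) != NULL) {
            TAILQ_REMOVE(&v->resigning, d, resign_list);
            TAILQ_REMOVE(&v->directors, d, list);
            pthread_mutex_unlock(&v->mtx);
            sk_retire_one(d);
            pthread_mutex_lock(&v->mtx);
        }
    }
    pthread_mutex_unlock(&v->mtx);
}

/* A retire request retires immediately if no iterator is active,
 * otherwise it only resigns the director onto the second list. */
static void
vdire_sk_retire(struct vdire_sk *v, struct sk_director *d)
{
    pthread_mutex_lock(&v->mtx);
    if (v->iterating > 0) {
        TAILQ_INSERT_TAIL(&v->resigning, d, resign_list);
        pthread_mutex_unlock(&v->mtx);
        return;
    }
    TAILQ_REMOVE(&v->directors, d, list);
    pthread_mutex_unlock(&v->mtx);
    sk_retire_one(d);
}

The point mirrored here is that retirement never runs while an iterator is active, so a director-list walk (and the potentially expensive per-director work it does) never has to happen under the list mutex.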
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 26, 2024
nigoroll added a commit to nigoroll/libvmod-dynamic that referenced this issue Jul 30, 2024
So it turns out that - still - we can deadlock racing between VCL
events and director deletions (see
varnishcache/varnish-cache#4140). This commit
was originally written to fix #108, but it now enables us to take a
different route again: Do not pthread_join() lookup threads, but rather
detach them and make sure _they_ return the reference before terminating.

The remainder of the commit message is an edit in light of the new
insights:

Our lookup threads may continue for a bit after a VCL_COLD event has
been received: dom_event() sets dom->status to DYNAMIC_ST_DONE while
the lookup thread may be blocking in the resolver or updating a
domain.

So, lookup threads might try to create backends even after the VCL has
gone cold, in which case VRT_AddDirector() would panic with "Dynamic
Backends can only be added to warm VCLs" - _unless_ a reference is
present on the VCL to make the VCL state change go through
VCL_TEMP_COOLING, in which case VRT_AddDirector() just returns NULL.

vmod_dynamic has taken a VCL reference since 99912bc back in May 2019, but
before the merge of varnishcache/varnish-cache#4037 this did not work
correctly, because the VCL reference was returned before all threads had
terminated.

The naive idea to solve this would be to take/release a VCL reference
in each lookup thread (or when starting/stopping it), but references
can only be taken from the CLI context (ASSERT_CLI() in
VRT_VCL_Prevent_Discard). Also, to support use of the .resolve() type
method from vcl_init{}, we needed to depart from the original concept
of running resolver threads only for warm vcls.

We get out of this situation by using a reference on top of the VCL
reference, which we can take even before we have the VCL reference.
The VCL reference is returned once all internal "referef"s are returned.

Ref #108

Probably fixes #117 also
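The "reference on top of the VCL reference" can be sketched like this (an illustration only; the names are hypothetical and this is not the vmod_dynamic code):

#include <pthread.h>
#include <assert.h>

struct layered_ref {                    /* hypothetical wrapper */
    pthread_mutex_t mtx;
    unsigned        refcnt;             /* internal references */
};

/* May be called from any thread, even before the real VCL reference
 * has been taken from CLI context. */
static void
layered_ref_get(struct layered_ref *lr)
{
    pthread_mutex_lock(&lr->mtx);
    lr->refcnt++;
    pthread_mutex_unlock(&lr->mtx);
}

/* Called by each (detached) lookup thread as its last action before
 * terminating. */
static void
layered_ref_rel(struct layered_ref *lr)
{
    unsigned last;

    pthread_mutex_lock(&lr->mtx);
    assert(lr->refcnt > 0);
    last = (--lr->refcnt == 0);
    pthread_mutex_unlock(&lr->mtx);
    if (last) {
        /* Last internal reference gone: only now return the underlying
         * VCL reference that the CLI-side code took earlier. */
    }
}

In this shape a detached lookup thread only needs the release call as its last action before terminating, which is what the commit message describes.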
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 31, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 31, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 30, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Oct 28, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Nov 8, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Nov 9, 2024