deadlock (resulting in big quit) from vcl_BackendEvent() racing vcldir_retire() #4140

Open
nigoroll opened this issue Jul 25, 2024 · 0 comments

nigoroll commented Jul 25, 2024

Expected Behavior

.

Current Behavior

seen while trying to reproduce nigoroll/libvmod-dynamic#117:

This is similar to the problem fixed by #4048, but here the situation is not adding a director, but rather removing one:

A director thread (updating backends) is racing a CLI cold event which pthread_join()s the director thread. The join never completes, because the director thread is returning the last director reference and therefore waits for the vcl_mtx, which the CLI event holds. (A minimal sketch of this deadlock pattern follows the GDB output below.)

Thread 218 (Thread 0x7f84bb7ff6c0 (LWP 462388)):
#0  futex_wait (private=0, expected=2, futex_word=0x7f84cf422b88) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7f84cf422b88, private=0) at ./nptl/lowlevellock.c:49
#2  0x00007f84cfcab2d2 in lll_mutex_lock_optimized (mutex=0x7f84cf422b88) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7f84cf422b88) at ./nptl/pthread_mutex_lock.c:93
#4  0x0000563f69cb8f64 in Lck__Lock (lck=0x563f69e2f908 <vcl_mtx>, p=0x563f69dbf89a "vcldir_retire", l=279) at cache/cache_lck.c:124
#5  0x0000563f69cfa71a in vcldir_retire (vdir=0x7f84bdc53f80) at cache/cache_vrt_vcl.c:279
#6  0x0000563f69cf5df3 in vcldir_deref (vdir=0x7f84bdc53f80) at cache/cache_vrt_vcl.c:305
#7  0x0000563f69cf5fbe in VRT_Assign_Backend (dst=0x7f84bdc14af0, src=0x0) at cache/cache_vrt_vcl.c:343
#8  0x00007f84c2fa81f0 in ref_del (ctx=<optimized out>, r=0x7f84bdc14ad0) at vmod_dynamic.c:428
#9  0x00007f84c2faa176 in dom_update (now=1721895825.5200291, priv=<optimized out>, res=0x7f84c2fbaba0 <res_getdns>, dom=<optimized out>) at vmod_dynamic.c:742
#10 dom_lookup_thread (priv=0x7f84ba616400) at vmod_dynamic.c:820
#11 0x00007f84cfca8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#12 0x00007f84cfd2861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x7f84cfc1b1c0 (LWP 457084)):
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007f84cfca9d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f84cfc5af32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f84cfc45472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x0000563f69cc0eac in pan_ic (func=0x563f69db7986 "cli_quit", file=0x563f69db7545 "cache/cache_main.c", line=376, cond=0x563f69db79b4 "It's time for the big quit", kind=VAS_WRONG) at cache/cache_panic.c:860
#5  0x0000563f69d87135 in VAS_Fail (func=0x563f69db7986 "cli_quit", file=0x563f69db7545 "cache/cache_main.c", line=376, cond=0x563f69db79b4 "It's time for the big quit", kind=VAS_WRONG) at vas.c:67
#6  0x0000563f69cbb3d2 in cli_quit (sig=3) at cache/cache_main.c:376
#7  <signal handler called>
#8  0x00007f84cfca4da4 in __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=462388, futex_word=0x7f84bb7ff990) at ./nptl/futex-internal.c:57
#9  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f84bb7ff990, expected=462388, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#10 0x00007f84cfca4e0b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f84bb7ff990, expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
#11 0x00007f84cfca9b33 in __pthread_clockjoin_ex (threadid=140208058136256, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at ./nptl/pthread_join_common.c:102
#12 0x00007f84cfca99df in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at ./nptl/pthread_join.c:24
#13 0x00007f84c2fa8623 in dom_event (dir=<optimized out>, ev=<optimized out>) at vmod_dynamic.c:1051
#14 0x0000563f69c9683d in VDI_Event (d=0x7f84ba6059f0, ev=VCL_EVENT_COLD) at cache/cache_director.c:277
#15 0x0000563f69cd9fbc in vcl_BackendEvent (vcl=0x7f84b9f16240, e=VCL_EVENT_COLD) at cache/cache_vcl.c:449
#16 0x0000563f69cd9428 in vcl_set_state (vcl=0x7f84b9f16240, state=0x563f69df22ee "0", msg=0x7ffc888a5528) at cache/cache_vcl.c:597
#17 0x0000563f69cd8eb4 in VCL_Poll () at cache/cache_vcl.c:729
#18 0x0000563f69c90ea9 in cli_cb_before (cli=0x7f84cf4d2810) at cache/cache_cli.c:80
#19 0x0000563f69d8b359 in cls_exec (cfd=0x7f84cf4d27e0, av=0x7f84cf42c400, ac=2) at vcli_serve.c:303
#20 0x0000563f69d8a2c6 in cls_feed (cfd=0x7f84cf4d27e0, p=0x7ffc888a5672 "\nold\n727.1721895820.531743/vgc.so 1auto\nLIBC_2.2.5", e=0x7ffc888a5673 "old\n727.1721895820.531743/vgc.so 1auto\nLIBC_2.2.5") at vcli_serve.c:412
#21 0x0000563f69d89d8a in VCLS_Poll (cs=0x7f84cf422b30, cli=0x7f84cf4d2810, timeout=-1) at vcli_serve.c:617
#22 0x0000563f69c90cf2 in CLI_Run () at cache/cache_cli.c:110
#23 0x0000563f69cbb0e5 in child_main (sigmagic=1, altstksz=81920) at cache/cache_main.c:467
#24 0x0000563f69d285fb in mgt_launch_child (cli=0x0) at mgt/mgt_child.c:402
#25 0x0000563f69d28091 in MCH_Start_Child () at mgt/mgt_child.c:692
#26 0x0000563f69d2f502 in main (argc=9, argv=0x7ffc888a7af8) at mgt/mgt_main.c:959


(gdb) thr 218
[Switching to thread 218 (Thread 0x7f84bb7ff6c0 (LWP 462388))]
#0  futex_wait (private=0, expected=2, futex_word=0x7f84cf422b88) at ../sysdeps/nptl/futex-internal.h:146
146	../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) fr 3
#3  ___pthread_mutex_lock (mutex=0x7f84cf422b88) at ./nptl/pthread_mutex_lock.c:93
93	./nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) p *mutex
$1 = {__data = {__lock = 2, __count = 0, __owner = 457084, __nusers = 1, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000|\371\006\000\001\000\000\000\002", '\000' <repeats 22 times>, __align = 2}
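
Note that the mutex owner above (__owner = 457084) is the LWP of thread 1, i.e. the CLI thread doing the pthread_join(). The pattern boils down to a classic lock-then-join deadlock; here is a minimal standalone sketch (an illustration only, not Varnish code, all names made up):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER; /* stands in for vcl_mtx */

static void *
worker(void *arg)
{
    (void)arg;
    sleep(1);                   /* let the main thread take mtx first */
    pthread_mutex_lock(&mtx);   /* like vcldir_retire(): blocks forever */
    pthread_mutex_unlock(&mtx);
    return (NULL);
}

int
main(void)
{
    pthread_t thr;

    pthread_create(&thr, NULL, worker, NULL);
    pthread_mutex_lock(&mtx);   /* like the CLI cold event holding vcl_mtx */
    pthread_join(thr, NULL);    /* like dom_event(): never returns */
    pthread_mutex_unlock(&mtx);
    printf("not reached\n");
    return (0);
}

Built with -pthread, this program hangs in pthread_join() much like thread 1 above hangs in dom_event().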

Possible Solution

I see no obvious solution to this problem at this point.

Steps to Reproduce (for bugs)

No response

Context

.

Varnish Cache version

No response

Operating system

No response

Source of binary packages used (if any)

No response

nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 26, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list
constitutes a classic case of iterating a mutable list. That is in addition to the
already known fact that running potentially expensive iterator functions while
holding the vcl_mtx is a very bad idea.

So this patch is a suggestion on how to get out of this situation. It does not
go all the way (it still leaves unsolved a similar problem of iterating over the
backends of _all vcls_), but shows an attempt at tackling the "iterate over
one VCL's backends" case.

We add a fittingly named 'vdire' facility which manages the director list and,
in particular, director retirement. The main change is that, while iterators are
active on the director list, vcl_mtx is _not_ held and any request to retire a
director only results in a resignation, which manifests as the director being
put on a second list.

When all iterators have completed, the resignation list is worked through and the
actual retirement is carried out.

NOTE: I am particularly _not_ happy about the vdire_(start|end)_iter() interface
being exposed. What I would like to see is a FOREACH macro which does both the
start and the end. Having to use an iterator with a callback is clumsy, and having
to directly call start/end around a VTAILQ_FOREACH feels like crossing
interfaces.
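A rough sketch of the resignation-list idea (an illustration only; all names here are made up and the actual vdire interface in the patch differs):

#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>    /* Varnish's VTAILQ macros wrap these TAILQ macros */

struct sk_director {
    TAILQ_ENTRY(sk_director) list;        /* membership on ->directors */
    TAILQ_ENTRY(sk_director) resign_list; /* membership on ->resigning */
};

struct vdire_sk {                         /* hypothetical, not the real vdire */
    pthread_mutex_t  mtx;
    unsigned         iterating;           /* number of active iterators */
    TAILQ_HEAD(, sk_director) directors;  /* live directors */
    TAILQ_HEAD(, sk_director) resigning;  /* retirement deferred */
};                                        /* lists TAILQ_INIT()ed by the owner */

static void
sk_retire_one(struct sk_director *d)
{
    free(d);    /* placeholder for the actual (potentially expensive) retirement */
}

static void
vdire_sk_add(struct vdire_sk *v, struct sk_director *d)
{
    pthread_mutex_lock(&v->mtx);
    TAILQ_INSERT_TAIL(&v->directors, d, list);
    pthread_mutex_unlock(&v->mtx);
}

/* Iterators bracket their walk of ->directors with start/end; the walk
 * itself does not hold the mutex. */
static void
vdire_sk_start_iter(struct vdire_sk *v)
{
    pthread_mutex_lock(&v->mtx);
    v->iterating++;
    pthread_mutex_unlock(&v->mtx);
}

static void
vdire_sk_end_iter(struct vdire_sk *v)
{
    struct sk_director *d;

    pthread_mutex_lock(&v->mtx);
    if (--v->iterating == 0) {
        /* last iterator done: carry out the deferred retirements */
        while ((d = TAILQ_FIRST(&v->resigning)) != NULL) {
            TAILQ_REMOVE(&v->resigning, d, resign_list);
            TAILQ_REMOVE(&v->directors, d, list);
            pthread_mutex_unlock(&v->mtx);
            sk_retire_one(d);
            pthread_mutex_lock(&v->mtx);
        }
    }
    pthread_mutex_unlock(&v->mtx);
}

/* A retire request retires immediately if no iterator is active,
 * otherwise it only resigns the director onto the second list. */
static void
vdire_sk_retire(struct vdire_sk *v, struct sk_director *d)
{
    pthread_mutex_lock(&v->mtx);
    if (v->iterating > 0) {
        TAILQ_INSERT_TAIL(&v->resigning, d, resign_list);
        pthread_mutex_unlock(&v->mtx);
        return;
    }
    TAILQ_REMOVE(&v->directors, d, list);
    pthread_mutex_unlock(&v->mtx);
    sk_retire_one(d);
}

The point mirrored here is that retirement never runs while an iterator is active, so a director-list walk (and the potentially expensive per-director work it does) never has to happen under the list mutex.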
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 26, 2024
nigoroll added a commit to nigoroll/libvmod-dynamic that referenced this issue Jul 30, 2024
So it turns out that - still - we can deadlock racing between VCL
events and director deletions (see
varnishcache/varnish-cache#4140). This commit
was originally written to fix #108, but it now enables us to take a
different route again: Do not pthread_join() lookup threads, but rather
detach them and make sure _they_ return the reference before terminating.

The remainder of the commit message is an edit in light of the new
insights:

Our lookup threads may continue for a bit after a VCL_COLD event has
been received: dom_event() sets dom->status to DYNAMIC_ST_DONE while
the lookup thread may be blocking in the resolver or updating a
domain.

So, lookup threads might try to create backends even after the VCL has
gone cold, in which case VRT_AddDirector() would panic with "Dynamic
Backends can only be added to warm VCLs" - _unless_ a reference is
present on the VCL to make the VCL state change go through
VCL_TEMP_COOLING, in which case VRT_AddDirector() just returns NULL.

vmod_dynamic has taken a VCL reference since 99912bc back in May 2019, but
before the merge of varnishcache/varnish-cache#4037 this did not work
correctly, because the VCL reference was returned before all threads had
terminated.

The naive idea to solve this would be to take/release a VCL reference
in each lookup thread (or when starting/stopping it), but references
can only be taken from the CLI context (ASSERT_CLI() in
VRT_VCL_Prevent_Discard). Also, to support use of the .resolve() type
method from vcl_init{}, we needed to depart from the original concept
of running resolver threads only for warm vcls.

We get out of this situation by using a reference on top of the VCL
reference, which we can take even before we have the VCL reference.
The VCL reference is returned once all internal "referef"s are returned.

Ref #108

Probably fixes #117 also
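The "reference on top of the VCL reference" can be sketched like this (an illustration only; the names are hypothetical and this is not the vmod_dynamic code):

#include <pthread.h>
#include <assert.h>

struct layered_ref {                    /* hypothetical wrapper */
    pthread_mutex_t mtx;
    unsigned        refcnt;             /* internal references */
};

/* May be called from any thread, even before the real VCL reference
 * has been taken from CLI context. */
static void
layered_ref_get(struct layered_ref *lr)
{
    pthread_mutex_lock(&lr->mtx);
    lr->refcnt++;
    pthread_mutex_unlock(&lr->mtx);
}

/* Called by each (detached) lookup thread as its last action before
 * terminating. */
static void
layered_ref_rel(struct layered_ref *lr)
{
    unsigned last;

    pthread_mutex_lock(&lr->mtx);
    assert(lr->refcnt > 0);
    last = (--lr->refcnt == 0);
    pthread_mutex_unlock(&lr->mtx);
    if (last) {
        /* Last internal reference gone: only now return the underlying
         * VCL reference that the CLI-side code took earlier. */
    }
}

In this shape a detached lookup thread only needs the release call as its last action before terminating, which is what the commit message describes.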
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 31, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Jul 31, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Sep 30, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Oct 28, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Nov 8, 2024
nigoroll added a commit to nigoroll/varnish-cache that referenced this issue Nov 9, 2024