-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deadlock (resulting in big quit) from vcl_BackendEvent() racing vcldir_retire() #4140
Comments
nigoroll
added a commit
to nigoroll/varnish-cache
that referenced
this issue
Jul 26, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list constitues a classic case of iterating a mutable list. That besides the already known fact that running potentially expensive iterator functions while holding the vcl_mtx is a very bad idea. So this patch is a suggestion on how to get out of this sitation. It does not go all the way (it still leaves unsolved a similar problem of iterating over all backends of _all vcls_), but shows an attempt on how to tackle the "iterate over one VCL's backends". We add an fittingly named 'vdure' facility which manages the director list and, in particular, director retirement. The main change is that, while iterators are active on the director list, vcl_mtx is _not_ held and any request of a director to retire only ends up in a resignation, which manifests by this director being put on a second list. When all iterators have completed, the resignation list is worked and the actual retirement carried out. NOTE: I am particularly _not_ happy about the vdire_(start|end)_iter() interface being exposed. What I would like to see is a FOREACH macro which does both the start and end. Having to use an iterator with a callback is clumsy, and having to directly call start/end with a VTAILQ_FOREACH feels like crossing interfaces.
nigoroll
added a commit
to nigoroll/varnish-cache
that referenced
this issue
Jul 26, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list constitutes a classic case of iterating a mutable list. That besides the already known fact that running potentially expensive iterator functions while holding the vcl_mtx is a very bad idea. So this patch is a suggestion on how to get out of this situation. It does not go all the way (it still leaves unsolved a similar problem of iterating over all backends of _all vcls_), but shows an attempt on how to tackle the "iterate over one VCL's backends". We add a fittingly named 'vdire' facility which manages the director list and, in particular, director retirement. The main change is that, while iterators are active on the director list, vcl_mtx is _not_ held and any request of a director to retire only ends up in a resignation, which manifests by this director being put on a second list. When all iterators have completed, the resignation list is worked and the actual retirement carried out. NOTE: I am particularly _not_ happy about the vdire_(start|end)_iter() interface being exposed. What I would like to see is a FOREACH macro which does both the start and end. Having to use an iterator with a callback is clumsy, and having to directly call start/end with a VTAILQ_FOREACH feels like crossing interfaces.
nigoroll
added a commit
to nigoroll/libvmod-dynamic
that referenced
this issue
Jul 30, 2024
So it turns out, that - still - we can deadlock raching between VCL events and director deletions (see varnishcache/varnish-cache#4140). This commit was originally written to fix #108, but it now enables us to take a different route again: Do not pthread_join() lookup threads, but rather detach them and make sure _they_ return the reference before terminating. The remainder of the commit message is an edit in light of the new insights: Our lookup threads may continue for a bit after a VCL_COLD event has been received: dom_event() sets dom->status to DYNAMIC_ST_DONE while the lookup thread may be blocking in the resolver or updating a domain. So, lookup threads might try to create backends even after the VCL has gone cold, in which case VRT_AddDirector() would panic with "Dynamic Backends can only be added to warm VCLs" - _unless_ a reference is present on the VCL to make the VCL state change go through VCL_TEMP_COOLING, in which case VRT_AddDirector() just returns NULL. vmod_dynamic took a VCL reference since 99912bc back in May 2019, but before the merge of varnishcache/varnish-cache#4037 it did not work correctly, because the VCL reference was returned before all threads had terminated. The naive idea to solve this would be to take/release a VCL reference in each lookup thread (or when starting/stopping it), but references can only be taken from the CLI context (ASSERT_CLI() in VRT_VCL_Prevent_Discard). Also, to support use of the .resolve() type method from vcl_init{}, we needed to depart from the original concept of running resolver threads only for warm vcls. We get out of this situation by using a reference on top of the VCL reference, which we can take even before we have the VCL reference. The VCL reference is returned once all internal "referef"s are returned. Ref #108 Probably fixes #117 also
nigoroll
added a commit
to nigoroll/varnish-cache
that referenced
this issue
Jul 31, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list constitutes a classic case of iterating a mutable list. That besides the already known fact that running potentially expensive iterator functions while holding the vcl_mtx is a very bad idea. So this patch is a suggestion on how to get out of this situation. It does not go all the way (it still leaves unsolved a similar problem of iterating over all backends of _all vcls_), but shows an attempt on how to tackle the "iterate over one VCL's backends". We add a fittingly named 'vdire' facility which manages the director list and, in particular, director retirement. The main change is that, while iterators are active on the director list, vcl_mtx is _not_ held and any request of a director to retire only ends up in a resignation, which manifests by this director being put on a second list. When all iterators have completed, the resignation list is worked and the actual retirement carried out. NOTE: I am particularly _not_ happy about the vdire_(start|end)_iter() interface being exposed. What I would like to see is a FOREACH macro which does both the start and end. Having to use an iterator with a callback is clumsy, and having to directly call start/end with a VTAILQ_FOREACH feels like crossing interfaces.
nigoroll
added a commit
to nigoroll/varnish-cache
that referenced
this issue
Jul 31, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list constitutes a classic case of iterating a mutable list. That besides the already known fact that running potentially expensive iterator functions while holding the vcl_mtx is a very bad idea. So this patch is a suggestion on how to get out of this situation. It does not go all the way (it still leaves unsolved a similar problem of iterating over all backends of _all vcls_), but shows an attempt on how to tackle the "iterate over one VCL's backends". We add a fittingly named 'vdire' facility which manages the director list and, in particular, director retirement. The main change is that, while iterators are active on the director list, vcl_mtx is _not_ held and any request of a director to retire only ends up in a resignation, which manifests by this director being put on a second list. When all iterators have completed, the resignation list is worked and the actual retirement carried out. NOTE: I am particularly _not_ happy about the vdire_(start|end)_iter() interface being exposed. What I would like to see is a FOREACH macro which does both the start and end. Having to use an iterator with a callback is clumsy, and having to directly call start/end with a VTAILQ_FOREACH feels like crossing interfaces.
nigoroll
added a commit
to nigoroll/varnish-cache
that referenced
this issue
Sep 30, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list constitutes a classic case of iterating a mutable list. That besides the already known fact that running potentially expensive iterator functions while holding the vcl_mtx is a very bad idea. So this patch is a suggestion on how to get out of this situation. It does not go all the way (it still leaves unsolved a similar problem of iterating over all backends of _all vcls_), but shows an attempt on how to tackle the "iterate over one VCL's backends". We add a fittingly named 'vdire' facility which manages the director list and, in particular, director retirement. The main change is that, while iterators are active on the director list, vcl_mtx is _not_ held and any request of a director to retire only ends up in a resignation, which manifests by this director being put on a second list. When all iterators have completed, the resignation list is worked and the actual retirement carried out. NOTE: I am particularly _not_ happy about the vdire_(start|end)_iter() interface being exposed. What I would like to see is a FOREACH macro which does both the start and end. Having to use an iterator with a callback is clumsy, and having to directly call start/end with a VTAILQ_FOREACH feels like crossing interfaces.
nigoroll
added a commit
to nigoroll/varnish-cache
that referenced
this issue
Oct 28, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list constitutes a classic case of iterating a mutable list. That besides the already known fact that running potentially expensive iterator functions while holding the vcl_mtx is a very bad idea. So this patch is a suggestion on how to get out of this situation. It does not go all the way (it still leaves unsolved a similar problem of iterating over all backends of _all vcls_), but shows an attempt on how to tackle the "iterate over one VCL's backends". We add a fittingly named 'vdire' facility which manages the director list and, in particular, director retirement. The main change is that, while iterators are active on the director list, vcl_mtx is _not_ held and any request of a director to retire only ends up in a resignation, which manifests by this director being put on a second list. When all iterators have completed, the resignation list is worked and the actual retirement carried out.
nigoroll
added a commit
to nigoroll/varnish-cache
that referenced
this issue
Nov 8, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list constitutes a classic case of iterating a mutable list. That besides the already known fact that running potentially expensive iterator functions while holding the vcl_mtx is a very bad idea. So this patch is a suggestion on how to get out of this situation. It does not go all the way (it still leaves unsolved a similar problem of iterating over all backends of _all vcls_), but shows an attempt on how to tackle the "iterate over one VCL's backends". We add a fittingly named 'vdire' facility which manages the director list and, in particular, director retirement. The main change is that, while iterators are active on the director list, vcl_mtx is _not_ held and any request of a director to retire only ends up in a resignation, which manifests by this director being put on a second list. When all iterators have completed, the resignation list is worked and the actual retirement carried out.
nigoroll
added a commit
to nigoroll/varnish-cache
that referenced
this issue
Nov 9, 2024
Pondering varnishcache#4140 made me realize that our central per-vcl director list constitutes a classic case of iterating a mutable list. That besides the already known fact that running potentially expensive iterator functions while holding the vcl_mtx is a very bad idea. So this patch is a suggestion on how to get out of this situation. It does not go all the way (it still leaves unsolved a similar problem of iterating over all backends of _all vcls_), but shows an attempt on how to tackle the "iterate over one VCL's backends". We add a fittingly named 'vdire' facility which manages the director list and, in particular, director retirement. The main change is that, while iterators are active on the director list, vcl_mtx is _not_ held and any request of a director to retire only ends up in a resignation, which manifests by this director being put on a second list. When all iterators have completed, the resignation list is worked and the actual retirement carried out.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Expected Behavior
.
Current Behavior
seen while trying to reproduce nigoroll/libvmod-dynamic#117:
This is similar to the problem fixed by #4048, but here the situation is not to add a director, but rather to remove it:
A director thread (updating backends) is racing a CLI cold event which has a
pthread_join()
for the director thread. It never completes because the thread is waiting for thevcl_mtx
, which the CLI event holds because it is returning the last director reference.Possible Solution
I see no obvious solution to this problem at this point.
Steps to Reproduce (for bugs)
No response
Context
.
Varnish Cache version
No response
Operating system
No response
Source of binary packages used (if any)
No response
The text was updated successfully, but these errors were encountered: