
#3227: Wait for the keepalive period to end before stopping the acceptance of new requests #3236

Open
michdan wants to merge 2 commits into master from 3227_wait_for_keepalive_before_restart
Conversation

michdan commented Jul 4, 2024

This change prevents potential disruptions in request handling.
Currently, when a worker is restarted because it hit the max_requests limit and the keepalive period is greater than zero, there is a gap between the moment the worker stops accepting new requests and the moment a new worker starts. This happens because the process must wait for the lesser of the graceful timeout and the keepalive period, even when there are no pending requests: open connections remain active for the duration of the keepalive period.
See also: #3227
cc: @benoitc @pajod
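
For illustration, a minimal gunicorn configuration in which this gap shows up; the values are arbitrary examples, not recommendations:

# gunicorn.conf.py -- illustrative values only
worker_class = "gevent"
workers = 2
max_requests = 1000        # restart each worker after roughly 1000 requests
max_requests_jitter = 50
keepalive = 75             # idle keep-alive connections stay open for up to 75 s
graceful_timeout = 30      # upper bound on how long a stopping worker waits

With these settings, a worker that exceeds its max_requests budget stops accepting new connections, but its replacement may not start for up to min(graceful_timeout, keepalive) = 30 seconds while idle keep-alive sockets age out.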

michdan changed the title from "3227: Wait for the keepalive period to end before stopping the acceptance of new requests" to "#3227: Wait for the keepalive period to end before stopping the acceptance of new requests" on Jul 4, 2024
…ance of new requests. This will prevent potential disruptions in request handling.
michdan force-pushed the 3227_wait_for_keepalive_before_restart branch from 29b7866 to 34ce875 on July 8, 2024 at 12:11
michdan (Author) commented Jul 8, 2024

To test such functionality, consider using an approach inspired by Apache httpd: https://github.com/apache/httpd/blob/trunk/test/modules/core/test_002_restarts.py. This approach involves making assertions based on log output. While it would require implementing some additional tools, this method seems to simplify testing complex scenarios such as multiprocessing and greenlets.
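
To make that concrete, here is a rough sketch of such a log-assertion test. The gunicorn_server fixture and the ERROR_LOG path are hypothetical helpers that would start gunicorn with the relevant --max-requests/--keep-alive settings and capture its error log; the asserted phrases are the messages gunicorn logs when a worker auto-restarts and its replacement boots.

import re
import time
import urllib.request

ERROR_LOG = "/tmp/gunicorn-test-error.log"  # hypothetical location managed by the fixture

def test_restart_after_max_requests(gunicorn_server):
    # Send enough requests to push a worker past its max_requests budget.
    for _ in range(20):
        with urllib.request.urlopen("http://127.0.0.1:8000/") as resp:
            assert resp.status == 200
    time.sleep(2)  # give the worker and arbiter time to log the restart

    with open(ERROR_LOG, encoding="utf-8") as fh:
        log = fh.read()
    # Assert on log output rather than on timing, as the httpd tests do.
    assert re.search(r"Autorestarting worker", log)
    assert re.search(r"Booting worker with pid", log)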

# Retrieve all active greenlets and repeatedly check, until the keepalive period ends,
# if any of those greenlets are still active. If none are active, exit the loop.

greenlets = {id(greenlet) for server in servers for greenlet in server.pool.greenlets}
michdan (Author) commented on the lines above:
When this line is executed, self.alive is already False, which means that all keepalive connections are either closed or are being handled by the existing greenlets. No new keepalive connections will be created until the worker restarts.
Therefore, if the greenlets found in this line have finished, we do not need to wait any longer, as we are assured that there are no more active keepalive connections.

pajod (Contributor) commented Jul 9, 2024

I am not yet confident I understand all the possible implications, but whether this is changed further or not, it definitely needs documentation (it already needed documentation before, but after the change the behaviour can be even more unexpected). Maybe a time table that lists the state transitions from hitting max_requests + jitter, then awaiting keepalive, then awaiting graceful_timeout, then app initialization - and what sort of responsiveness a) existing and b) new clients should expect in each of the worker states? Those expectations could then be translated into (likely more verbose, less readable) test cases.

greenlets = {id(greenlet) for server in servers for greenlet in server.pool.greenlets}
ts = time.time()

while time.time() - ts <= self.cfg.keepalive:
pajod (Contributor) commented on the lines above:
obligatory "only do that with a monotonic clock" comment

michdan (Author) replied Jul 12, 2024:
Thanks for pointing this out. I changed it to use a monotonic clock. By the way, shouldn't it be changed in other places as well? For example, I can see: https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L98
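
For reference, a minimal sketch of the monotonic variant, pulled out into a standalone helper; the helper name and the polling interval are illustrative assumptions, not what the patch itself uses:

import time
import gevent

def wait_for_keepalive_greenlets(servers, keepalive_timeout, poll_interval=0.1):
    # Snapshot the greenlets that were serving connections when the worker stopped accepting.
    snapshot = {id(g) for server in servers for g in server.pool.greenlets}
    deadline = time.monotonic() + keepalive_timeout
    while time.monotonic() < deadline:
        active = {id(g) for server in servers for g in server.pool.greenlets}
        if not (snapshot & active):
            # None of the snapshotted greenlets is left, so no keepalive connection can remain.
            break
        gevent.sleep(poll_interval)  # yield so the remaining connections can make progress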

pajod (Contributor) replied:
@michdan Absolutely correct, there are other places. But until I can improve my tests to actually cover such unimportant corner cases, my focus is on improving it only where I am touching the timeout behaviour anyway (as of now: here, plus in #3157).

benoitc (Owner) left a comment:

I think only the first part of this change (ensuring we don't process keepalive during closing) is OK.

I disagree with the second part. Can you update your patch to only take care of the first one?

@@ -87,6 +87,10 @@ def run(self):
            self.notify()
            gevent.sleep(1.0)

        # Wait for pending keepalive connections to complete before stopping the acceptance of new requests
        if self.cfg.keepalive:
benoitc (Owner) commented on the lines above:
I disagree there; we shouldn't care about waiting for keepalive connections when closing. Close should be done as fast as possible.

@@ -36,7 +36,8 @@ def handle(self, listener, client, addr):
            parser = http.RequestParser(self.cfg, client, addr)
            try:
                listener_name = listener.getsockname()
                if not self.cfg.keepalive:
                    # do not allow keepalive if the worker is about to be restarted
benoitc (Owner) commented on the lines above:
this change is OK
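
For context, a hedged sketch of what that accepted first part might look like inside the async worker's handle(). The excerpt above does not show the new condition, so using self.alive to detect "about to be restarted" is an assumption rather than what the patch necessarily does; the body simply falls through to the existing single-request (non-keepalive) path.

                if not self.cfg.keepalive or not self.alive:
                    # do not allow keepalive if the worker is about to be restarted
                    req = next(parser)
                    self.handle_request(listener_name, req, client, addr)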
