Coordinate backend HTTP keep-alive timeouts with buildpack teams #240

Open
peterellisjones opened this issue Nov 12, 2021 · 2 comments
@peterellisjones
Contributor

peterellisjones commented Nov 12, 2021

Is this a security vulnerability?

No

Issue

Hi folks, when using HTTP keep-alive it is important that clients always close idle connections before servers do. This avoids the race condition where a client sends a new request on an existing connection just as the server closes it. When this happens for a request from the gorouter to an application, the user gets a 502 for non-retriable / non-idempotent requests.

When backend keep-alive connections are enabled via max_idle_connections > 0, the gorouter uses a 90 second duration for keep-alives. Unfortunately this default value is incompatible with the 60 second default value in the Java buildpack, because client keep-alive durations must be shorter than server keep-alive durations to avoid the race condition.

This means that the hard-coded value for keep-alive duration in the gorouter and the default value for keep-alive duration in the Java buildpack are not compatible. So that Cloud Foundry works better "out of the box", it would be great if buildpack teams could coordinate with the gorouter team so that mutually compatible values are chosen. The important thing is that, whatever values are chosen, the gorouter should have a shorter keep-alive duration than the app.

Affected Versions

All gorouters with backend keep alive enabled

Context

Java buildpack issue 881
CF docs PR 199 and PR 201

Traffic Diagram

           +----+---+    +----------+     +-------+
  \o/      |        |    |          |     |       |
   +  +--->+ AWS LB +--->+ Gorouter +---->+  App  |
  / \      |        |    |          |     |       |
 client    +--------+    +----------+     +-------+
                                      ^^^^
                            issue occurs here when app closes connection
                            just as gorouter sends next request

Steps to Reproduce

You can reproduce this by creating a test app with, for example, a 100ms keep-alive timeout and writing a script that sends a non-idempotent request (e.g. a POST request) via the gorouter every 100ms. You will eventually get a 502 error from the gorouter when the race condition triggers due to the gorouter sending a request to the app just as the app closes the connection. I've successfully reproduced this with a gorouter and test app running locally -- I expect you may need to tweak the durations for a gorouter running in Cloud Foundry to account for network latency.

Possible Fix

A simple fix for the java-buildpack case would just be to make the gorouter backend keep-alive duration < 60 seconds. This may still be incompatible with other buildpacks though.

@plowin
Contributor

plowin commented Nov 12, 2021

Also cross-linking a similar blog about a nodejs express server behind an AWS ALB: https://adamcrowder.net/posts/node-express-api-and-aws-alb-502. One customer just recently ran into this issue with their node-server on our CF env.

@ameowlia ameowlia assigned ameowlia and reneighbor and unassigned ameowlia Nov 17, 2021
@plowin
Contributor

plowin commented Aug 25, 2023

@ameowlia, we regularly receive 502 reports caused by this kind of keep-alive misconfiguration. There has not been much progress on the buildpack side. Could you help raise this as a topic, since it would also help other community members? See e.g. cloudfoundry/java-buildpack#881

@mariash mariash assigned ameowlia and unassigned reneighbor Mar 26, 2024