Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will default to the OS settings for the TCP keepalive configuration, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long would the keep alive timeout be? I found that by default it only kicks in after 2 hours. Won't that mean that we'll run into the TCP retransmission timeout before (after ~ 15 minutes) and therefore, the keepalive timeout won't have any effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that is the default - but bear in mind that a tcp connection is fairly likely to be open for many hours before a failure occurs, so we would expect keepalives every 75s for most conns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand things correctly, then the keep alive will start after 2h of idleness. Hence, whenever there is a package arriving it will be reset. Therefore, I believe that the default settings won't help here and if we recommend enabling upstream tcp keepalive with envoy, then we should set concrete values here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly related question: Francesco concluded on restatedev/restate#626 to use HTTP/2 keep alives instead TCP keep alives. He also mentioned that Envoy should be able to do the same. Should we rather recommend using this mechanism instead of TCP keep alive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, sounds good to me, ill revert this and implement http2