Add docs for HTTPRoute timeouts + retries + route metrics #1814
Signed-off-by: Alex Leong <[email protected]>
Overall I think this is really good! I've suggested some edits, but they're pretty minor things, really.
I wholeheartedly agree that not having Viz makes this kind of documentation much tougher, but I think you've done about as well as could be done there. Thanks!!
release. Creating these policy resources will cause the Linkerd proxy to perform
the appropriate retries or timeouts when calling that service. Retries and
timeouts are always performed on the *outbound* (client) side.
Timeouts and retries can be configured using [HTTPRoute], GrpcRoute, or Service
Suggested change:
- Timeouts and retries can be configured using [HTTPRoute], GrpcRoute, or Service
+ Timeouts and retries can be configured using [HTTPRoute], GRPCRoute, or Service
We should be consistent with the name of the resource, I think.
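For readers following along, here is a minimal sketch of what a timeout configured on a Service might look like. It assumes the annotation-based configuration (`timeout.linkerd.io/request`) used by Linkerd 2.16; the Service name, namespace, selector, port, and timeout value are all hypothetical and not taken from the PR.

```yaml
# Sketch only: name, namespace, selector, port, and the 5s value are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: books
  namespace: booksapp
  annotations:
    # Assumed annotation: overall request timeout, applied by the
    # outbound (client-side) proxy when calling this Service.
    timeout.linkerd.io/request: 5s
spec:
  selector:
    app: books
  ports:
    - port: 7002
      targetPort: 7002
```

The same annotation could presumably be placed on an HTTPRoute or GRPCRoute instead, which would scope the timeout to the requests that route matches.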
implemented incorrectly retries can amplify small errors into system wide
outages. For that reason, we made sure they were implemented in a way that would
increase the reliability of the system while limiting the risk.
has for gracefully handling partial or transient application failures.
Maybe include timeouts in here, too? "Timeouts and automatic retries are two of the most powerful and useful mechanisms..." ?
Retries are a client-side behavior, and are therefore performed by the
outbound side of the Linkerd proxy.[^1] If retries are configured on an
HttpRoute or GrpcRoute with multiple backends, each retry of a request can
Suggested change:
- HttpRoute or GrpcRoute with multiple backends, each retry of a request can
+ HTTPRoute or GRPCRoute with multiple backends, each retry of a request can
Should those be links?
To get per-route metrics, you must create [HTTPRoute] resources. If a route has
a `parent_ref` which points to a Service resource, Linkerd will generate
outbound per-route traffic metrics for all HTTP traffic that it sends to that
Service. If a route has a `parent_ref` which points to a Server resource,
Suggested change:
- Service. If a route has a `parent_ref` which points to a Server resource,
+ Service. If a route has a `parent_ref` which points to a **Server** resource,
To get per-route metrics, you must create [HTTPRoute] resources. If a route has
a `parent_ref` which points to a Service resource, Linkerd will generate
outbound per-route traffic metrics for all HTTP traffic that it sends to that
Service. If a route has a `parent_ref` which points to a Server resource,
This confuses me. 😂 Suppose I have meshed workloads `foo` and `bar`, and I have an HTTPRoute with a `parent_ref` of `foo`'s Service. If `bar` sends a request to `foo`... I'm only going to get outbound metrics, unless I also have a `parent_ref` on my HTTPRoute that points to a Server for `foo`?
That's correct
Thanks for confirming! I think I'm gonna have to play with this a bit. 🙂
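To make the scenario in this exchange concrete, here is a rough sketch of an HTTPRoute for `foo` that lists both kinds of parent. It assumes that a single route may reference a Service and a Server together, as the question and answer above imply; two separate routes would be an alternative. The route name, namespace, port, Server name, and `apiVersion` are hypothetical and may differ by cluster and Linkerd version.

```yaml
# Sketch only: names, namespace, port, and apiVersion are hypothetical.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: foo-route
  namespace: demo
spec:
  parentRefs:
    # Service parent: outbound per-route metrics, reported by the
    # proxies of meshed clients such as `bar` when they call `foo`.
    - group: core
      kind: Service
      name: foo
      port: 8080
    # Server parent: inbound per-route metrics, reported by foo's own proxy.
    - group: policy.linkerd.io
      kind: Server
      name: foo-http
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
```

With only the first `parentRefs` entry, a request from `bar` to `foo` would produce outbound metrics alone, which matches the behavior confirmed above.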
out the profile that is generated:
We know that the webapp component is getting 500s from the books component, but
it would be great to narrow this down further and get per route metrics. To do
this, we leverage the Gateway API and define a set of HTTPRoute resources, each
Suggested change:
- this, we leverage the Gateway API and define a set of HTTPRoute resources, each
+ this, we take advantage of the Gateway API and define a set of HTTPRoute resources, each
Pet peeve. 😂
For this demo, the method is appended to the route regex.

To get profiles for `authors` and `books`, you can run:
We can then check that these HTTPRoute have been accepted by their parent
Suggested change:
- We can then check that these HTTPRoute have been accepted by their parent
+ We can then check that these HTTPRoutes have been accepted by their parent
This tells us that Linkerd make a total of 469 retry requests and 247 of those
were successful and the other 222 were not and hit the default retry limit of
`1`. We can improve this further by increasing this limit to allow more than
Suggested change:
- This tells us that Linkerd make a total of 469 retry requests and 247 of those
- were successful and the other 222 were not and hit the default retry limit of
- `1`. We can improve this further by increasing this limit to allow more than
+ This tells us that Linkerd made a total of 469 retry requests, of which 247 were
+ successful. The remaining 222 failed and could not be retried again, since we didn't
+ raise the retry limit from its default of 1.
+ We can improve this further by increasing this limit to allow more than
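As an illustration of raising the limit discussed in this suggestion, here is a sketch of an HTTPRoute carrying retry annotations. It assumes the `retry.linkerd.io/http` and `retry.linkerd.io/limit` annotations from Linkerd 2.16; the route name, namespace, port, path, and the chosen limit of 3 are hypothetical and not taken from the PR.

```yaml
# Sketch only: route name, namespace, port, path, and limit are hypothetical.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: books-create
  namespace: booksapp
  annotations:
    # Assumed annotation: retry requests that fail with a 5xx status.
    retry.linkerd.io/http: 5xx
    # Assumed annotation: allow up to 3 retries per request instead of
    # the default of 1. Annotation values must be strings, hence the quotes.
    retry.linkerd.io/limit: "3"
spec:
  parentRefs:
    - group: core
      kind: Service
      name: books
      port: 7002
  rules:
    - matches:
        - path:
            type: Exact
            value: /books.json
          method: POST
```

With the default limit of 1, each of the 222 failed retries in the numbers above (469 total minus 247 successful) was the last attempt for its request; a higher limit would give those requests further chances.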
Signed-off-by: Alex Leong <[email protected]>
Ship it! 🙂
* update 2.16 retries + timeouts + route metrics docs Signed-off-by: Alex Leong <[email protected]> Co-authored-by: Flynn <[email protected]>