
V7 Add transient error handling #2281

Open
cronchie opened this issue Nov 6, 2024 · 5 comments


cronchie (Contributor) commented Nov 6, 2024

Draft requirement:

Verify that transient errors are handled with a retry mechanism that includes backoff and jitter, ensuring retries are spaced effectively to prevent overwhelming the system or external service, and avoiding retries for errors that are not transient.

I propose this has an availability impact because, without a retry and backoff strategy, clients may repeatedly hammer a server when transient errors (like timeouts or temporary unavailability) occur. This could overwhelm the target service or network, creating a DoS. As for the inverse: there is no point retrying other kinds of errors (AuthN / AuthZ / 400 Bad Request), and doing so can also impact availability.
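To make the intent concrete, here is a minimal sketch of such a retry policy (not proposed requirement text): exponential backoff with full jitter, retries limited to transient failures, and an immediate return for non-transient errors. The call_service callable, the status codes, and the limits are illustrative assumptions.

```python
import random
import time

# Illustrative assumption: HTTP-style status codes treated as transient.
TRANSIENT_STATUSES = {408, 429, 500, 502, 503, 504}

def call_with_retries(call_service, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Retry call_service on transient failures with exponential backoff and full jitter.

    call_service is a hypothetical zero-argument callable returning an object
    with a status_code attribute. Non-transient results (e.g. 400/401/403)
    are returned immediately and never retried.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            response = call_service()
        except TimeoutError:
            pass  # timeouts are treated as transient and fall through to the retry logic
        else:
            if response.status_code not in TRANSIENT_STATUSES:
                return response  # success or non-transient error: do not retry

        if attempt == max_attempts:
            raise RuntimeError(f"service still failing after {max_attempts} attempts")

        # Exponential backoff capped at max_delay, with full jitter so many
        # clients do not retry in lockstep against an already struggling service.
        delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
        time.sleep(random.uniform(0, delay))
```

A real implementation would also need to consider whether the operation is idempotent before retrying it at all.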

tghosth added the "1) Discussion ongoing", "_5.0 - prep", and "V7" labels on Nov 6, 2024
ryarmst (Collaborator) commented Nov 6, 2024

I think this is a good mechanism, but perhaps out of scope. There are many ways an app could be designed such that a sort of self-DoS could occur, yet there is no immediate vector to abuse it.

cronchie (Contributor, Author) commented Nov 6, 2024

There are many ways an app could be designed such that a sort of self-DoS could occur, yet there is no immediate vector to abuse it.

Very true! Though I would suggest that the concern here is DoSing an external system. For instance, if a web app keeps retrying an operation with no backoff time, that could impact that service's availability.

elarlang (Collaborator) commented Nov 7, 2024

potentially related: #1778

tghosth (Collaborator) commented Nov 7, 2024

@cronchie could you take a look at #1778 and see if the content there is similar to what you mention here?

elarlang (Collaborator) commented Nov 9, 2024

I rechecked, and this issue is not a duplicate of #1778, although it is related.

From the issue description:

I propose this has an availability impact because, without a retry and backoff strategy, clients may repeatedly hammer a server when transient errors (like timeouts or temporary unavailability) occur. This could overwhelm the target service or network, creating a DoS. As for the inverse: there is no point retrying other kinds of errors (AuthN / AuthZ / 400 Bad Request), and doing so can also impact availability.

Point of view: "save the 3rd-party service from DoS". Based on this description, I would say that every service must defend itself and cannot rely on the hope that its users are nice.

There are also risks for the application itself when it hammers other services: the application may get blocked from using the service, and that is a clear denial of service for the application itself (which is in scope for ASVS), not (only) for the external service (which is out of scope for ASVS).

As for how to solve the problem: since it can be done in many different ways, the best we can do is highlight the problem in a documentation requirement.

We have a documentation requirement:

1.14.7 Verify that all communication needs for the application are documented. This should include external services which the application relies upon and cases where an end user might be able to provide an external location to which the application will then connect.

We can ask: "Document how the connection to the service should fail and, if retries are allowed, the retry strategy."
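For illustration only (the service and numbers are hypothetical, not proposed requirement text): such a documented entry might state that connections to a payments API fail closed with a user-facing error, that retries are allowed only for timeouts and 5xx responses, and that the retry strategy is at most 3 attempts with exponential backoff starting at 1 second, full jitter, and a 10-second cap.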

If there is a need to create a new documentation requirement for #1778, this may fit better there.

tghosth added the "next meeting" label on Nov 11, 2024