Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retries to error message #159

Merged
merged 2 commits into from
Oct 16, 2024

Conversation

konstin
Copy link
Contributor

@konstin konstin commented Jun 11, 2024

As a user or developer, it is hard to tell whether a request failed randomly and was not retried, or whether it failed even with retries (e.g. astral-sh/uv#3514). This PR adds the number of retries to the error chain if there were any retries. Example error chain:

error: Request failed after 3 retries
  Caused by: error sending request for url (https://pypi.org/simple/tqdm/)
  Caused by: client error (Connect)
  Caused by: dns error: failed to lookup address information: Name or service not known
  Caused by: failed to lookup address information: Name or service not known

I've also changed request_middleware::Error transparent to avoid being to verbose.

@konstin konstin requested a review from a team as a code owner June 11, 2024 10:36
err: reqwest_middleware::Error,
},
#[error(transparent)]
Error(reqwest_middleware::Error),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion:

using an enum here is going to make it harder for a user to match on this error now, because they have to handle an extra branch that is still just a reqwest_middleware::Error

it seems completely unnecessary, why don't you just make this a struct that has a retry count and an error?
the only place where RetryError::Error was constructed, we actually do have a retry count that we could supply.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, suppose I'm a user, and i want to do further error handling and call is_body on the underlying Reqwest::Error. https://docs.rs/reqwest/latest/reqwest/struct.Error.html#method.is_body
With this enum that requires a lot more boilerplate code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the enum so we can insert the error message about the number of retries when we did them, while not showing an irrelevant "Failed after 0 retries" otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't seem like such a big problem. maybe you could customize Display so that it doesn't show when the number is 0? the enum is going to create boilerplate at any call-site that wants to further inspect the error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't skip the display on 0 because it ToString has to return a string even if it's an empty string, the only way i know of how to skip a step in the error trace is error(transparent).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't skip the display on 0 because it ToString has to return a string even if it's an empty string

i think maybe you misunderstood me, you could do something like this

pub struct Error {
   err: reqwest::Error,
   retries: u32,
}

then you could implement Display like

impl Display for Error {
    fn fmt(&self, f: ...) -> ... {
       if self.retries == 0 {
          write!(f, "{}", self.err)
       } else {
          write!(f, "Failed after {} retries: {}", self.retries, self.err)
       }
    }
}

or similar

// Report whether we failed with or without retries.
break if n_past_retries > 0 {
result.map_err(|err| {
Error::Middleware(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks wrong, on this code path, this is not a middleware error, it's a request error.

a good example of a middleware error is when the middleware cannot do something it needs to do to function, like at like 146 above.

the result variable is the result of actually making a request and it failing or succeeding somehow, so these errors are request errors, not middleware errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make this work you need to change ReqwestMiddleware::Error so that the Reqwest variant contains your retry data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't change reqwest::Error since it's part of the reqwest crate and not of request-middleware crate, so we need to use the dynamic error variant here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can change reqwest_middleware::Error , in fact you already did in this PR. that's what i'm suggesting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require changing reqwest_middleware::Error so that it has a variant specific to retry errors, this would break the isolation between the general retry stack and the specific middleware for retrying.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i see now, that's annoying

)
})
} else {
result.map_err(|err| Error::Middleware(RetryError::Error(err).into()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also not a middleware error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the most common pattern in rust to have one error type per crate (or sometimes per top level module, like serde::de::Error and serde::ser::Error). Crates higher up in the dependency tree then have an enum where most variants are wrappers around the errors of crates they depend on. In reqwest-middleware, we want to allow a dynamic middleware stack, so neither do we know the list of error type statically, nor can we declare it at compile time, so we have to use a dyn Error such as anyhow provides. The current request_middleware::Error is insufficient when adding context as we do here, it doesn't properly model that the actual reqwest::Error instance may be lower in the stack. I don't really want to change that type to just a anyhow::Error wrapper with a method that finds you a reqwest::Error in the chain, if any, to avoid too much churn.

Copy link
Contributor

@cbeck88 cbeck88 Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change you are making here, where normal reqwest::Error from the crate, is classified as reqwest_middleware::Error::middleware instead of reqwest_middleware::Error::request, is going to break every single existing user of this crate who attempts to actually inspect and handle these errors. because today, all those users match on reqwest_middleware::Error and 99.9% of the time it's a request error that they can further inspect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it almost completely defeats the point of having request_middleware::Error::reqwest, which after this change will serve to only confuse users

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo, these changelog entries are not sufficient, you should indicate a semver breaking change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current interpretation of the error enum assumes two cases:

  • There was an internal error in the middleware
  • There was a reqwest::Error we forward
    But there exists a third class of errors, those where we do have a reqwest::Error, but also additional context or a middle error with a reqwest::Error behind it. The current enum represent this kind of errors, and there exists no simple rust pattern for matching on them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, i've bumped reqwest-retry to 0.6.0, let's see what the maintainers say about reqwest-middleware.

@konstin konstin force-pushed the konsti/add-retries-to-error branch from 85eafb5 to 21ceec9 Compare July 1, 2024 20:19
konstin added a commit to astral-sh/uv that referenced this pull request Jul 2, 2024
In #3514 and #2755, users had intermittent network errors, but it was not always clear whether we had already retried these requests or not. Building upon TrueLayer/reqwest-middleware#159, this PR adds the number of retries to the error message, so we can see at first glance where we're missing retries and where we might need to change retry settings.

Example error trace:

```
Could not connect, are you offline?
  Caused by: Request failed after 3 retries
  Caused by: error sending request for url (https://pypi.org/simple/uv/)
  Caused by: client error (Connect)
  Caused by: dns error: failed to lookup address information: Name or service not known
  Caused by: failed to lookup address information: Name or service not known
```

This code is ugly since i'm missing a better pattern for attaching context to reqwest middleware errors in TrueLayer/reqwest-middleware#159.
konstin added a commit to astral-sh/uv that referenced this pull request Jul 2, 2024
In #3514 and #2755, users had intermittent network errors, but it was not always clear whether we had already retried these requests or not. Building upon TrueLayer/reqwest-middleware#159, this PR adds the number of retries to the error message, so we can see at first glance where we're missing retries and where we might need to change retry settings.

Example error trace:

```
Could not connect, are you offline?
  Caused by: Request failed after 3 retries
  Caused by: error sending request for url (https://pypi.org/simple/uv/)
  Caused by: client error (Connect)
  Caused by: dns error: failed to lookup address information: Name or service not known
  Caused by: failed to lookup address information: Name or service not known
```

This code is ugly since i'm missing a better pattern for attaching context to reqwest middleware errors in TrueLayer/reqwest-middleware#159.
konstin added a commit to astral-sh/uv that referenced this pull request Jul 2, 2024
In #3514 and #2755, users had intermittent network errors, but it was
not always clear whether we had already retried these requests or not.
Building upon TrueLayer/reqwest-middleware#159,
this PR adds the number of retries to the error message, so we can see
at first glance where we're missing retries and where we might need to
change retry settings.

Example error trace:

```
Could not connect, are you offline?
  Caused by: Request failed after 3 retries
  Caused by: error sending request for url (https://pypi.org/simple/uv/)
  Caused by: client error (Connect)
  Caused by: dns error: failed to lookup address information: Name or service not known
  Caused by: failed to lookup address information: Name or service not known
```

This code is ugly since i'm missing a better pattern for attaching
context to reqwest middleware errors in
TrueLayer/reqwest-middleware#159.
@aochagavia
Copy link

@cbeck88 @konstin is there anything preventing this PR from being merged? I'm using uv as a dependency in a project and had to patch reqwest-middleware in my Cargo.toml, and I know there are other projects that are refraining from update to the latest uv libraries to avoid this issue (e.g. pixi).

Let me know if I can help in any way here ;)

@cbeck88
Copy link
Contributor

cbeck88 commented Aug 5, 2024

@cbeck88 @konstin is there anything preventing this PR from being merged? I'm using uv as a dependency in a project and had to patch reqwest-middleware in my Cargo.toml, and I know there are other projects that are refraining from update to the latest uv libraries to avoid this issue (e.g. pixi).

Let me know if I can help in any way here ;)

At a minimum, the cargo toml has to be fixed, because 0.6.0 was already released

We haven't yet heard from maintainers whether they want to go in this direction. Currently the number of retries is reported by logging and the log level is configurable.

If they do, imo more thought should be given to exactly what the structure of the error is, because existing code that needs to access the underlying reqwest error will have to downcast, and any changes to that structure in the future will be breaking changes. Maybe better to have private fields and pub fn getters.

If they don't want the patch, there's probably still a path forward for uv to get the number of retries without forking this crate, for example by injecting a custom strategy object that wraps the default one and contains a counter, or adding another middleware layer I guess.

@eopb
Copy link
Member

eopb commented Oct 16, 2024

@konstin first of all, I'm very sorry this took so long before getting reviewed by a maintainer. Thank you for putting in the effort to continually rebase this PR. It's awesome to see reqwest-retry being used in uv.

@cbeck88 thank you for providing such detailed reviews. These were very helpful.

I don't see anything major preventing this from being merged so I'm going to go ahead. There's been a fair amount of discussion so feel welcome to open a PR or issue if anyone disagrees. Another breaking change to reqwest-retry wouldn't be a huge deal.

@eopb eopb merged commit 680741f into TrueLayer:main Oct 16, 2024
7 of 9 checks passed
@konstin konstin deleted the konsti/add-retries-to-error branch October 16, 2024 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants