
More robust S3 response code handling #54

Open
port8080 opened this issue Jan 8, 2023 · 1 comment

port8080 commented Jan 8, 2023

First of all: thanks very much for your work. I have been using rudolfs and have managed to upload ~100GB over ~20K files.

While I was doing that, I started noticing errors that only went away when I ran
git config --global lfs.concurrenttransfers 1

This made me realize that you probably need to add more logic to work around Rusoto's error handling. You already mention this in your comments:

                // Rusoto really sucks at correctly reporting errors.
                // Lets work around that here.

What I suspect is happening is that S3 is returning a 503 SlowDown response, which gets lumped into RusotoError::Unknown(_), and I think you are handling it like every other transient error.

In our own code, we have done something like this:

/// Roughly inferred what makes sense from
/// https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList
pub fn s3_status_code_is_permanent(s: StatusCode) -> bool {
    s == StatusCode::BAD_REQUEST || // 400
    s == StatusCode::UNAUTHORIZED || // 401
    s == StatusCode::FORBIDDEN || // 403
    s == StatusCode::NOT_FOUND // 404
}

and then

    .map_err(|e| match e {
        RusotoError::Unknown(ref r) => {
            if r.status == StatusCode::NOT_FOUND {
                Error::NotFound
            } else {
                error!("S3 Failure on key {}, Err {:?}", key, e);
                e.into()
            }
        }
        _ => {
            error!("S3 Failure on key {}, Err {:?}", key, e);
            e.into()
        }
    })
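To make the retry decision explicit, the permanent-code helper above could be paired with a transient-side check. This is only a sketch: it uses bare u16 codes instead of http::StatusCode to stay self-contained, and the exact set of retryable codes is an assumption drawn from the same AWS error-code list.

```rust
/// Sketch of the transient-side complement to s3_status_code_is_permanent.
/// The retryable set (500, 502, 503 SlowDown, 504) is an assumption taken
/// from the AWS S3 error-code list linked above.
fn s3_status_is_transient(status: u16) -> bool {
    matches!(status, 500 | 502 | 503 | 504)
}

fn main() {
    assert!(s3_status_is_transient(503));  // SlowDown: back off and retry
    assert!(!s3_status_is_transient(404)); // NotFound: permanent, surface it
    assert!(!s3_status_is_transient(400)); // BadRequest: permanent, fail fast
    println!("ok");
}
```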
jasonwhite (Owner) commented:
> I started noticing errors

Can you provide the error message? Did it happen during upload or download? Is it an error from the gitlfs client? Or is it from Rudolfs?

Even with these errors, does the operation still complete successfully?

> What I suspect is happening is that S3 is returning a 503 SlowDown response, which gets lumped into RusotoError::Unknown(_), and I think you are handling it like every other transient error.

All upload requests to S3 are retried with exponential backoff, so if S3 returns a 503 error, the request should get retried automatically.
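That retry behavior can be sketched as a small backoff loop. This is an illustration, not Rudolfs' actual implementation: the function names, the delay schedule, and the transient-error predicate are all hypothetical.

```rust
use std::time::Duration;

/// Minimal sketch of retrying a fallible S3 call with exponential backoff.
/// `is_transient` and the delay schedule are assumptions for illustration,
/// not what Rudolfs actually does.
fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    is_transient: impl Fn(&E) -> bool,
    max_attempts: u32,
) -> Result<T, E> {
    let mut delay = Duration::from_millis(100);
    for attempt in 1..=max_attempts {
        match op() {
            Ok(v) => return Ok(v),
            // Permanent errors (e.g. 4xx) and the final attempt fail immediately.
            Err(e) if !is_transient(&e) || attempt == max_attempts => return Err(e),
            Err(_) => {
                std::thread::sleep(delay);
                delay *= 2; // exponential growth: 100ms, 200ms, 400ms, ...
            }
        }
    }
    unreachable!("the loop above always returns")
}

fn main() {
    // Simulate an S3 call that returns 503 twice, then succeeds.
    let mut calls = 0;
    let result = retry_with_backoff(
        || {
            calls += 1;
            if calls < 3 { Err(503u16) } else { Ok("uploaded") }
        },
        |status| matches!(status, 500 | 502 | 503 | 504),
        5,
    );
    assert_eq!(result, Ok("uploaded"));
    assert_eq!(calls, 3);
    println!("ok after {} calls", calls);
}
```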

After looking at this code again, I do see that downloads are not retried, so this could be what you're seeing. I think the GitLFS client should automatically retry the download if Rudolfs fails to download because of a transient S3 error, but I'm not 100% sure.

If Rudolfs is unnecessarily retrying HTTP 400-class errors (client errors), the operation just takes a little longer to fail; correctness is unaffected. Still, that should be fixed: it's always good to fail faster.
