Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry non-duplicate ec2 register_image errors #274

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

asher
Copy link
Contributor

@asher asher commented Oct 31, 2019

No description provided.

@asher
Copy link
Contributor Author

asher commented Oct 31, 2019

Aminator currently only retries register_image exceptions caused by a duplicate image name. We see regular throttling due to 429 rate limiting from amazon, so hoping allowing these to be retried during registration, as well as being more aggressive at retrying tag failures will cut down on full bake retries.

@asher asher requested a review from bmoyles October 31, 2019 00:49
return False
# This could be a retryable error, such as ec2 api throttling
_tries -= 1
if (tries > 0):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be

Suggested change
if (tries > 0):
if (_tries > 0):

instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, thanks!

Copy link
Contributor

@bmoyles bmoyles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still bummed that Amazon doesn't support tag creation at image registration time like it does for other objects (volumes, snapshots, etc) as it would be nice to be able to get rid of the tagging step altogether and just do it in-line with registration.

@asher one thing I might offer up -- we could make failing on tag failure optional... We'll still have some identifying metadata embedded in the AMI name and the AMI description fields which gives the app, build number, the commit, the base AMI name, ID, and version. We would potentially lose one (or all) of:

  • base_ami_version (redundant, in ancestor_version in description)
  • creator (which is not always accurate, anyways, depending on where, when, and how the bake was triggered)
  • creation_time (which, given creationDate exists on the AMI, is redundant -- our creation_time tag reflects the time at which tags were added, creationDate reflects registration time if I'm not mistaken, and looking at some recent bakes, they differ by seconds at most)
  • appversion -- this might be more annoying to miss as I know there are some internal tools that use it. All of the information in appversion is also available from description EXCEPT the build job name

Given we have enough metadata in the name and description, seems like we could make tag failure optionally fatal and worst case, run something out-of-process later on that adds tags back to the image if they're missing...

@@ -76,8 +76,15 @@ def _retry(f, *args, **kwargs):
_tries -= 1
_delay *= backoff
else:
log.critical("Unable to retry register_image due to ClientError: %s", e)
return False
# This could be a retryable error, such as ec2 api throttling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the current crop of retry-able errors identifiable by either a specific exception or message in the exception?

What this effectively does is add an alternative path for ClientError exceptions wherein exception['Error']['Code'] does not match InvalidAMIName.Duplicate that catches all other forms, so if you're just looking to retry all exceptions from ClientError I might consider getting rid of the specific foo around checking for InvalidAMIName.Duplicate in the error code and just always retry ClientError exceptions (which is basically what your else: block does).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bigger deal is if the exceptions thrown during registration are of some other type than ClientError...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants