feat: idsse-912: add optimization args to aws_cp() #75

Merged: 3 commits, Sep 10, 2024

mackenzie-grimes-noaa
Contributor

Linear Issue

IDSSE-912

Changes

  • Add optional concurrency and chunk_size args to aws_cp to control how s5cmd downloads files (with the aim of reducing the time to download a large file)

Explanation

These function args let us control the number of parallel threads and the chunk size that s5cmd uses to download a single GRIB file from AWS. The defaults (what DAS has been using until now) are 5 threads and 50 MB chunks, but DAS can now tune these settings to find what runs fastest in our environment.

Unfortunately s5cmd does not support downloading partial files from AWS S3 today. It's open source, so I'm hoping to contribute to the s5cmd project to get it added.
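
As a rough illustration of the tuning described above, the sketch below times a download for each combination of settings. It is a minimal sketch, not code from this PR: `time_settings` and the `download` callable are hypothetical names, and `download` is assumed to wrap a call such as `aws_cp(path, dest, concurrency=..., chunk_size=...)`.

```python
import itertools
import time
from typing import Callable, Iterable


def time_settings(download: Callable[[int, int], None],
                  concurrencies: Iterable[int],
                  chunk_sizes: Iterable[int]) -> list[tuple[float, int, int]]:
    """Time `download` for every (concurrency, chunk_size) pair.

    `download` is a hypothetical callable that performs one full download
    with the given settings. Returns (seconds, concurrency, chunk_size)
    tuples sorted with the fastest run first.
    """
    results = []
    for concurrency, chunk_size in itertools.product(concurrencies, chunk_sizes):
        start = time.perf_counter()
        download(concurrency, chunk_size)
        elapsed = time.perf_counter() - start
        results.append((elapsed, concurrency, chunk_size))
    return sorted(results)
```

A single run per pair is noisy, so in practice each combination would likely need to be repeated against the same file before drawing conclusions.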

Contributor

Geary-Layne left a comment

What you have looks fine, but see comment for a suggested change.

def aws_cp(self,
           path: str,
           dest: str,
           concurrency: int = 5,
Contributor

I'd rather the default be None, and if concurrency or chunk_size were None it wouldn't be included in the command (thus using the s5cmd default), versus us keeping a copy of the default values, which s5cmd could change.

Contributor Author

Good point. Fixed!
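
The suggestion in this thread could look roughly like the sketch below. It is not the merged implementation: `build_cp_command` is a hypothetical standalone helper, and it assumes the two args map to the `--concurrency` and `--part-size` flags of `s5cmd cp`. A flag is added only when the caller supplies a value, so s5cmd's own defaults apply otherwise.

```python
from typing import Optional


def build_cp_command(path: str,
                     dest: str,
                     concurrency: Optional[int] = None,
                     chunk_size: Optional[int] = None) -> list[str]:
    """Build an s5cmd copy command as an argument list (hypothetical helper).

    Flags are omitted when the matching arg is None, so this code never
    hard-codes a copy of s5cmd's default values.
    """
    command = ['s5cmd', 'cp']
    if concurrency is not None:
        # assumed mapping: concurrency -> s5cmd cp --concurrency
        command += ['--concurrency', str(concurrency)]
    if chunk_size is not None:
        # assumed mapping: chunk_size -> s5cmd cp --part-size
        command += ['--part-size', str(chunk_size)]
    command += [path, dest]
    return command
```

The resulting list can be passed to `subprocess.run` without `shell=True`, which avoids quoting problems in paths.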

mackenzie-grimes-noaa merged commit d750a8e into main Sep 10, 2024
2 checks passed
mackenzie-grimes-noaa deleted the feat/aws-cp-customization branch September 10, 2024 00:26

2 participants