
Would there be a way to programmatically get task metrics from the information that Coiled collects? #290

Open
rsignell opened this issue Jul 24, 2024 · 4 comments

Comments

@rsignell

I often have workflows that consist mostly of thousands of reads from S3 for chunks of the same size. I can see some of the variability on the Dask dashboard, but I was wondering:

Is there a way to get the metrics for these tasks programmatically, as a dataframe or something similar, so I could look at the distribution, the tails of the distribution, etc.?
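Not Coiled-specific, but to show the kind of thing I mean: here's a rough sketch using plain `dask.distributed`, where the task stream records that back the dashboard are captured and turned into a dataframe. It assumes a recent `distributed` where `startstops` entries are dicts, and the `lambda` workload is just a stand-in for the real S3-reading graph.

```python
import pandas as pd
from distributed import Client, get_task_stream

client = Client()  # or the client of a Coiled cluster

# Capture the task stream (the data behind the dashboard's "Task Stream" panel)
# while the workload runs.
with get_task_stream(client) as ts:
    futures = client.map(lambda i: i ** 2, range(100))  # stand-in for the real S3 reads
    client.gather(futures)

records = []
for task in ts.data:  # one record per finished task
    compute_seconds = sum(
        ss["stop"] - ss["start"]
        for ss in task["startstops"]
        if ss["action"] == "compute"
    )
    records.append(
        {"key": task["key"], "worker": task["worker"], "seconds": compute_seconds}
    )

df = pd.DataFrame(records)
# In the real workflow, filter on the key prefix of the chunk-reading tasks, e.g.:
# df = df[df["key"].str.startswith("open_dataset")]  # hypothetical prefix
print(df["seconds"].describe(percentiles=[0.5, 0.9, 0.99]))
```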

@rsignell changed the title from "Would there be a way to get a histogram of s3 access times from the information that Coiled collects?" to "Would there be a way to programmatically get task metrics from the information that Coiled collects?" on Jul 24, 2024
@hendrikmakait
Member

Hi, @rsignell! I'm curious about the intent behind your request. What problem(s) would you like to solve using task metrics? Since there's a plethora of possible metrics, which ones would you be interested in?

@rsignell
Author

I would like to look at the variability of the time it takes to retrieve many identically-sized chunks of data from S3.

@hendrikmakait
Member

hendrikmakait commented Jul 25, 2024

> I would like to look at the variability of the time it takes to retrieve many identically-sized chunks of data from S3.

Does that mean you'd be interested in the distribution of task durations for the tasks reading your chunks, or in something else? Is there a specific problem that understanding the variability would help you solve?

@rsignell
Author

Yes, I'm trying to figure out how many chunks I should request at a time in each task, and for that it would be good to know the distribution of S3 access times within each task.
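In the meantime, the per-read times can be measured by hand inside the task and gathered back, something like the rough sketch below. The bucket/object names, `fetch_chunks`, and the batch size of 8 are made up for illustration; this isn't an existing Coiled feature.

```python
import time

import s3fs
from distributed import Client


def fetch_chunks(paths):
    """Read a batch of same-sized chunks and record how long each S3 read took."""
    fs = s3fs.S3FileSystem()
    data, timings = [], []
    for path in paths:
        t0 = time.perf_counter()
        data.append(fs.cat_file(path))
        timings.append(time.perf_counter() - t0)
    return data, timings


client = Client()
paths = [f"my-bucket/prefix/chunk-{i:05d}.bin" for i in range(10_000)]  # made-up object names
batch = 8  # chunks requested per task -- the knob being tuned
futures = client.map(
    fetch_chunks,
    [paths[i : i + batch] for i in range(0, len(paths), batch)],
)

# Pull back only the timings; the chunk payloads stay on the workers.
timing_futures = client.map(lambda result: result[1], futures)
per_chunk_seconds = [t for batch_timings in client.gather(timing_futures) for t in batch_timings]
```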
