Clean up old jobs data for dashboard generation #55
Comments
@assignUser @boshek what do you think is a good amount of time to keep? In my opinion 120 days should be enough to give us the state for a couple of releases, so we can compare what the job status was on the previous release when we are creating a new one. At the moment the first data points are from mid-May 2022.
Good thought. One idea is to restrict the dates plotted and then display some aggregates to show the long-term mean and, say, the 120-day mean. I also think we could introduce some cheap interactivity with plotly such that we could have some hover capabilities: hover over a point and it gives you the exact date, maybe the percent failed, and even exactly what failed if we want. As far as removing the csvs, I am always a bit resistant to removing any data. Is size the issue? Perhaps we could convert to parquet, or maybe we could even write them to a bucket somewhere and then use arrow to query that bucket.
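Both aggregates are cheap to compute from the daily series. A hedged stdlib sketch with made-up percent-failed numbers:

```python
from datetime import date, timedelta

# Hypothetical daily series: (date, fraction of jobs that failed).
series = [
    (date(2022, 5, 15), 0.30),
    (date(2022, 8, 10), 0.10),
    (date(2022, 11, 20), 0.20),
]

def long_term_mean(series):
    """Mean over the entire history."""
    return sum(v for _, v in series) / len(series)

def window_mean(series, today, days=120):
    """Mean over only the trailing look-back window."""
    cutoff = today - timedelta(days=days)
    recent = [v for d, v in series if d >= cutoff]
    return sum(recent) / len(recent)

overall = long_term_mean(series)
last_120 = window_mean(series, today=date(2022, 11, 30))
```

Either number could be drawn as a horizontal reference line on the trend plot.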
I have tested around with S3 before; with 3 csvs a day it gets quite slow due to the number of objects. But the csvs compress very well, so using a single parquet file and re-writing it on each push wouldn't be a problem. +1 for ✨ interactivity :D
(also a good place for some dogfooding of {arrow})
And maybe partitioning by month and year would be a good idea too. That would give us some efficiency, especially once we trim our look-back window as proposed in the OP. Even that long-term mean could be calculated efficiently with a query.
Ah yeah, that way we would only have to rewrite the latest partition instead of all values. Nice!
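The point about rewriting only the latest partition follows from a hive-style year/month directory layout: new rows always land in the current month's partition, so older partitions never change. A stdlib sketch of the grouping (the layout and row format are assumptions; in practice arrow's dataset writer would produce the directories):

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (date, task, status).
rows = [
    (date(2022, 10, 3), "test-r-linux", "success"),
    (date(2022, 11, 20), "test-r-linux", "failure"),
    (date(2022, 11, 21), "test-r-linux", "success"),
]

def partition_key(d):
    # Hive-style directory name, e.g. "year=2022/month=11".
    return f"year={d.year}/month={d.month:02d}"

def group_by_partition(rows):
    """Bucket rows by the partition directory they would be written to."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_key(row[0])].append(row)
    return dict(parts)

parts = group_by_partition(rows)
# Appending today's results only touches "year=2022/month=11";
# "year=2022/month=10" never needs to be rewritten.
```

A query for the long-term mean would then scan all partitions, while the dashboard's 120-day window only needs to read the most recent few.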
This PR is a draft to address #55. To do this I have ported the report from R Markdown to a Quarto doc and written the viz in JavaScript for more interactivity. Because of the way this is implemented, it needs to be served via HTTPS rather than opened as a local HTML file, so screenshots it is. Here is the default, which sets the x-axis to extend over the last 120 days, but we can slide to look at only the last ten days or at the past 6 months. I have also updated the build table to include passing runs. Because this adds a significant number of rows, I've implemented some interactivity for the build table, which looks like this:
The nightlies job dashboard is great!!!
http://crossbow.voltrondata.com/
But after 7 months of jobs information we should add a way of cleaning old data from it, both to remove some of the csvs generated in the repo: https://github.com/ursacomputing/crossbow/tree/master/csv_reports
and to make the trend graphs clearer; right now it is difficult to read the dates, etc.