Free up space in GitHub Actions Runners for remaining jobs #1601
Signed-off-by: Jordan Carlin <[email protected]>
@cmuellner @TommyMurphyTM1234 Would be good to get this merged so we can finally get a gcc 14 nightly release.
A few questions...
I realise that these commands were already present in other places in the actions but I didn't understand them there either and was always meaning to ask about them.
Thanks for the PR! Are you sure that manually deleting unused distro components is sufficient to address the problem? That is, have you reproduced the issue and verified that this fixes it? If so, then we might consider uninstalling packages.
@TommyMurphyTM1234 In this case I just went with what had already been done in the other jobs for this workflow, but I dealt with a more complicated version of this for another project so can provide some context. GitHub Actions runners are only guaranteed to have 14 GB of free disk space. In practice they tend to have something in the 20-25 GB range free. The actual runners are much larger (close to 75 GB), but much of that is used up by the default container configuration. To answer your questions:
These are two of the largest preinstalled items (collectively using 9 GB) and neither is needed for these workflows. Presumably when these jobs were first created they were selected as easy targets to recover space.
See https://github.com/actions/runner-images for details on what comes preinstalled on the GitHub Actions runners. They try to preload most of the software people might need to reduce CI time and avoid the need to install various components every time.
I believe the runner does not install Android with apt, so it must be manually deleted.
All of the artifacts that the failing job is trying to download take up ~25 GB in total. Removing these two components gives us ~30 GB of free space.
Each job is started in a new container, so there is no way to make things persist between them. There is no "init" time that applies to everything in the workflow.
I created a script that removes almost all of the preinstalled software for another repo (https://github.com/openhwgroup/cvw/blob/main/.github/cli-space-cleanup.sh). With that script the total available free storage increases to 61 GB. If we want to ensure this isn't an issue in the future we could do something like that to remove more software, but it seems like it is probably unnecessary for this.
All GitHub Actions runners are created from the container image linked above and guaranteed to have at least 14 GB of free space. Anything beyond that will fluctuate as the images are updated.
We definitely could, but most of those actions do a lot of other strange things (recreating the filesystem to merge another unused disk) that seem much more likely to break as the image is updated. Just removing software shouldn't fail even if that software were to no longer be included.
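The free-space figures above (14 GB guaranteed, roughly 25 GB of artifacts to download) could be checked at the start of a job before anything is fetched. A minimal sketch, assuming a POSIX `df` and `awk` are available on the runner; the helper names `free_gib` and `require_free_gib` are hypothetical and not part of this workflow:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the free space on a mount point in whole GiB.
free_gib() {
  # With -P -k, df prints one line per filesystem and reports the
  # available space in 1 KiB blocks in column 4.
  df -P -k "${1:-/}" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Hypothetical helper: succeed only if at least $1 GiB are free on $2.
require_free_gib() {
  local needed="$1" path="${2:-/}" avail
  avail="$(free_gib "$path")"
  echo "Free space on ${path}: ${avail} GiB (need ${needed} GiB)"
  [ "$avail" -ge "$needed" ]
}
```

A job step could then run something like `require_free_gib 25 / || exit 1` so that a shortage shows up as a clear message rather than as a failed download partway through.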
The error log does not tell much. I assume we have not reached the disk space limit of a release's build artifacts, but we have reached the limit of the build machine that does the release step (create a release, download all toolchains, upload toolchains to release). If my assumption is right, then the issue is that we now have 24 toolchains, and our approach of downloading them all at once and pushing them to the release is not working. Possible solutions: either we move the upload part to the toolchain builders, or we process one toolchain at a time (download from build, upload to release, delete).
Yes. That is what I see as well. The job that downloads all of them runs out of space. The easiest solution would be to create more space on that runner, but changing the workflow to avoid downloading them all could also work. I'm not sure how to upload to the same release from multiple jobs though.
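The "one toolchain at a time" option could look roughly like the loop below. This is only a sketch under stated assumptions: it uses the GitHub CLI (`gh run download` and `gh release upload`) rather than whatever actions the workflow currently uses, and the function name `publish_one_at_a_time`, the run ID, the release tag, and the artifact names are all hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: for each artifact name, download it from a workflow
# run, attach it to an existing release, then delete the local copy.
# Usage: publish_one_at_a_time RUN_ID RELEASE_TAG ARTIFACT_NAME...
publish_one_at_a_time() {
  local run_id="$1" tag="$2" name workdir
  shift 2
  for name in "$@"; do
    workdir="$(mktemp -d)" || return 1
    # Download a single artifact from the build run into a scratch directory.
    if ! gh run download "$run_id" --name "$name" --dir "$workdir"; then
      rm -rf "$workdir"
      return 1
    fi
    # Attach everything in the scratch directory to the release.
    if ! gh release upload "$tag" "$workdir"/* --clobber; then
      rm -rf "$workdir"
      return 1
    fi
    # Free the space before fetching the next artifact.
    rm -rf "$workdir"
  done
}
```

With this shape, peak disk usage is bounded by the largest single toolchain instead of the sum of all 24.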
For dotnet this could work: For things under E.g.
I did not look into the image installation scripts that install Android.
Thanks @jordancarlin for the explanations. I guess that I don't understand enough about the GitHub actions/runners etc. and need to read up on them a bit. 🙂
Maybe the best solution is to create a small script that uninstalls several of the tools (can be a simplified version of the one I linked above) and call that script at the beginning of each job. That way it is centralized in one place and can be easily updated if needed.
After looking into the CI/CD script again, I don't think we need to discuss or change this PR further. That's enough justification to merge this change.
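A centralized cleanup script along those lines might be as small as the following. It is a sketch, not the script from this PR or the one linked above: the function name `free_runner_space` and the `SUDO` override are hypothetical, and the directories to delete are passed in by the caller rather than hard-coded:

```shell
#!/usr/bin/env bash

# Hypothetical helper: delete each directory given as an argument, then
# report the remaining free space on the root filesystem.
# Set SUDO="" to run without sudo (for example, when testing locally).
free_runner_space() {
  local dir
  for dir in "$@"; do
    echo "Removing ${dir}"
    # rm -rf does not fail when the path is absent, so the step keeps
    # working even if a later runner image no longer ships the software.
    ${SUDO-sudo} rm -rf -- "$dir"
  done
  df -h /
}
```

Each job could source this file and call, for example, `free_runner_space /usr/share/dotnet /usr/local/lib/android`. Those two paths are the commonly reported locations of the .NET and Android SDKs on Ubuntu runner images and are an assumption here; they should be checked against the actions/runner-images repository.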
LGTM
We already do this in the build step.
Thanks again for the PR, @jordancarlin!
Great. Hopefully that'll solve our issues.
Is it still failing in spite of this change?
Looks like it was a transient network issue that time.
Attempt to resolve issue #1591 by creating more free space on the runner for the jobs that are failing.