
Nightly release CI action is broken #1591

Open
TommyMurphyTM1234 opened this issue Oct 23, 2024 · 23 comments
@TommyMurphyTM1234
Collaborator

See here:

Nightly Release
Error when evaluating 'strategy' for job 'upload-assets'. .github/workflows/nightly-release.yaml (Line: 199, Col: 15): Matrix must define at least one vector

Only source bundles generated, no binary toolchains.
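For context, "Matrix must define at least one vector" means the matrix for the upload-assets job evaluated to an empty set. When a matrix is generated dynamically by an earlier job, an empty JSON list propagates into every consumer job. A minimal sketch of a guard that would catch this early; the variable name and the empty-JSON shape are assumptions for illustration, not taken from this repo's workflow:

```shell
# Hypothetical guard for a dynamically generated job matrix.
# matrix_json stands in for the output of an earlier "generate matrix" step.
matrix_json='{"include":[]}'   # assumed shape; an empty list here breaks downstream jobs

if [ -z "$matrix_json" ] || [ "$matrix_json" = '{"include":[]}' ]; then
  # Fail fast in the generating job instead of letting upload-assets
  # die with "Matrix must define at least one vector".
  echo "error: generated matrix is empty" >&2
  status=empty
else
  echo "matrix=$matrix_json"   # in a real workflow this would go to "$GITHUB_OUTPUT"
  status=ok
fi
echo "status=$status"
```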

@TommyMurphyTM1234
Collaborator Author

This seems to be the culprit but I don't really understand it yet...

@TommyMurphyTM1234
Collaborator Author

The last successful nightly release was 3rd September 2024:

so I presume that one of the commits since that date caused the problem?

I hope it's not one of mine! :-)

@cmuellner
Collaborator

I created a PR that should address this issue: #1592

@TommyMurphyTM1234
Collaborator Author

I created a PR that should address this issue: #1592

Thanks @cmuellner. 👍

@TommyMurphyTM1234
Collaborator Author

Any idea why the nightly build still doesn't seem to be working or, at least, hasn't completed and uploaded a complete set of built artifacts yet?

@cmuellner
Collaborator

I was waiting for a review of my PRs that address issues in the CI/CD scripts (#1582 and #1592).
I just merged them without receiving a review.

@TommyMurphyTM1234
Collaborator Author

TommyMurphyTM1234 commented Oct 26, 2024

Still something wrong I guess? Only sources in the latest release again.

Edit: oh, out of disk space? Even though it's supposed to clean up after itself as far as I can see?

Does it maybe need to do more to clean up?
Do older release artifacts need to be deleted?
Are the changes to enable additional Linux musl and uClibc builds exceeding the available resources?

@jordancarlin
Contributor

It looks like it is the "create release" job that is running out of space. It downloads all of the artifacts from the previous jobs, which take up 25 GB, but the runner only has 21 GB available. Each job runs on a separate runner, so the space needs to be cleaned up in this job too.
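For reference, cleanup steps of this kind usually remove the large preinstalled toolsets on the hosted Ubuntu runners before downloading artifacts. A hedged sketch: the path list below is the commonly removed set on such runners, not taken from this repo's workflow, and a dry-run guard keeps it safe to try locally:

```shell
# Hypothetical cleanup step for the create-release job.
# DRYRUN=1 (the default here) only reports what would be removed.
DRYRUN=${DRYRUN:-1}
removed=0
for d in /usr/share/dotnet /usr/local/lib/android /opt/ghc "$AGENT_TOOLSDIRECTORY"; do
  [ -n "$d" ] || continue          # skip unset entries outside CI
  if [ "$DRYRUN" = 1 ]; then
    echo "would remove $d"
  else
    sudo rm -rf "$d"               # reclaim space before fetching the ~25 GB of artifacts
  fi
  removed=$((removed+1))
done
echo "candidates: $removed"
df -h / | tail -1                  # show remaining space on the root filesystem
```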

@cmuellner
Collaborator

The CI seems to break regularly because of git/musl issues (I've observed this multiple times since we added the musl builds to the CI/CD):

error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504
fatal: expected 'packfile'
fatal: clone of 'https://git.musl-libc.org/git/musl' into submodule path '/home/runner/work/riscv-gnu-toolchain/riscv-gnu-toolchain/musl' failed
Failed to clone 'musl' a second time, aborting

I'm not sure what the best way to move forward here is.
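One common mitigation for a flaky upstream git server is to retry the clone a few times and fall back to a mirror. A generic sketch; the helper name and retry count are my own invention, not anything in the existing workflow:

```shell
# Hypothetical retry helper: run a command up to 3 times with a short pause.
retry() {
  n=0
  until "$@"; do
    n=$((n+1))
    if [ "$n" -ge 3 ]; then
      return 1
    fi
    sleep 1
  done
}

# In the workflow this could wrap the submodule clone, e.g.:
#   retry git clone --depth 1 https://git.musl-libc.org/git/musl musl || \
#     retry git clone --depth 1 https://github.com/kraj/musl musl
```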

@TommyMurphyTM1234
Collaborator Author

TommyMurphyTM1234 commented Nov 1, 2024

https://git.musl-libc.org/git/musl

That's the wrong URL as far as I can see:

Edit: ah - sorry - ignore that...


@mickflemm
Contributor

Maybe we are hitting an issue with HTTP (e.g. http.postBuffer is not large enough to hold the pack file); would doing a shallow clone solve this? Yocto at some point used this mirror, and it seems to be up to date: https://github.com/kraj/musl/tree/master

Maybe @richfelker can help.

@richfelker

Can you provide a minimal test case to reproduce the failure to git-clone? I just cloned successfully.

@richfelker

FWIW if you're re-cloning on every CI job, the polite thing to do is make it a shallow clone. The more polite thing to do would be to cache git clones. But I don't think this is related to the problem.
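The effect of a shallow clone is easy to demonstrate locally. A self-contained sketch with a throwaway repository standing in for a submodule (the repo contents and paths are made up for the demo; in the real setup one would set shallow = true per submodule in .gitmodules and fetch with --depth 1):

```shell
set -e
tmp=$(mktemp -d)

# Toy "upstream" repository with two commits, standing in for a real submodule.
git init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "first"
git -C "$tmp/upstream" -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "second"

# A shallow clone fetches only the most recent commit. The file:// URL
# forces the real transport so --depth is honored even for a local path.
git clone -q --depth 1 "file://$tmp/upstream" "$tmp/shallow"
count=$(git -C "$tmp/shallow" rev-list --count HEAD)
echo "commits fetched: $count"
```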

@TommyMurphyTM1234
Collaborator Author

FWIW if you're re-cloning on every CI job, the polite thing to do is make it a shallow clone.

FWIW that's what this recent PR was intended to deal with but it's closed pending further investigations:

The more polite thing to do would be to cache git clones. But I don't think this is related to the problem.

Do you know what this would involve for this repo's actions?

@richfelker

Do you know what this would involve for this repo's actions?

No, I don't. I've really avoided getting into CI workflows myself because I deem them a gigantically irresponsible abuse of resources. So I'm not sure what tooling there is to fix this (avoid downloading the same thing thousands of times), but it's something I very much hope someone is working on.

@cmuellner
Collaborator

I've really avoided getting into CI workflows myself because I deem them a gigantically irresponsible abuse of resources.

I'm in the same camp. However, there is a significant demand for pre-built releases of this repo.
The automatic builds, which trigger new releases if new changes are merged, broke in August. Since then, people have regularly reached out as they want them back.

A possible solution is to have a mirror repo on GitHub that regularly pulls the changes from the official repo. This reduces the load on the upstream git servers.

@TommyMurphyTM1234
Collaborator Author

A possible solution is to have a mirror repo on Github, which regularly pulls the changes from the official repo. This reduces the load on upstream git servers.

Another possibility might be to wget source tarballs for those components that have clearly defined releases?
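If the tarball route were taken, pinning a checksum next to each URL keeps the builds reproducible and catches truncated downloads. A local sketch with a stand-in tarball; the file names are invented, and the wget of a real release URL is deliberately omitted:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for a downloaded release tarball.
mkdir src
echo 'int main(void) { return 0; }' > src/main.c
tar -czf component-1.0.tar.gz src

# Release side: record the checksum once, next to the pinned URL.
sha256sum component-1.0.tar.gz > component-1.0.tar.gz.sha256

# Build side: verify before extracting, so a corrupted or tampered
# download fails loudly instead of producing a broken toolchain later.
sha256sum -c component-1.0.tar.gz.sha256
```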

@TommyMurphyTM1234
Collaborator Author

In case this helps at all (may belong elsewhere?):

Notes:

  1. For each component the master tarball link is listed along with a link to the version currently used by riscv-gnu-toolchain (except where noted otherwise below)
  2. Where different tarball formats are available the smallest/most compressed option was selected
  3. Not sure what specific llvm tarball is relevant
  4. Ditto for dejagnu
  5. The Spike/pk repos don't seem to create regular release/snapshot tarballs, so this may not be an option for these components yet?

@mickflemm
Contributor

On one hand, running CI very often is indeed a waste of resources; on the other hand, it's useful for regression testing (and we can't solve that with a mirror repo, by the way: we can't check pull requests that way, and that's super useful), and it's an even worse waste of resources to have the users of this repo (or their CIs) building it again and again. That being said, there are a few ways to optimize the flow; here are a few suggestions:

  1. Mark all submodules for shallow cloning. This is better than wget IMHO since it'll work for all repos (even those that don't create release tarballs), and it will also be easier to update them. It'll make the build process faster for our users too.

  2. I don't know if we can preserve the build environment across builds; that would help a lot, since at this point we clone everything on every job. One approach would be to use https://github.com/actions/cache for sharing the cloned repos (or maybe have one job that would clone everything and cache its output for the others), but I haven't tested it (I've seen others using artifact upload/download to share data across jobs).

  3. Instead of nightly releases we can do weekly releases; it's a small change to nightly-release.yaml. It doesn't make much sense to have a release every day; even monthly releases would be fine.

  4. To improve the size of the generated toolchains, we could deduplicate files using their hashes and switch the duplicates to hardlinks (I've tested it and it works fine). tar will preserve those, so when a user unpacks the archive there is a benefit there too, not only in the tarball's size. Then we could switch from gz to a more efficient compression: I use xz, and to improve its efficiency further I first create the tarball and then compress it with xz -e -T0 (this is better than tar cvJf since xz has the opportunity to create a better dictionary).
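Item 4 can be sketched end to end: hash every file, replace later duplicates with hardlinks, tar, then compress the finished tarball. This is my own minimal version of the idea (the directory layout and file contents are invented), not the script that was actually tested:

```shell
set -e
work=$(mktemp -d)
mkdir -p "$work/toolchain/bin" "$work/toolchain/libexec" "$work/.seen"

# Two identical files standing in for the duplicated binaries a
# multilib toolchain tends to ship.
printf 'identical payload\n' > "$work/toolchain/bin/tool"
printf 'identical payload\n' > "$work/toolchain/libexec/tool"

# Deduplicate: the first occurrence of each hash is kept; later
# occurrences are replaced with hardlinks to it.
cd "$work/toolchain"
find . -type f | sort | while read -r f; do
  h=$(sha256sum "$f" | cut -d' ' -f1)
  if [ -e "$work/.seen/$h" ]; then
    ln -f "$(cat "$work/.seen/$h")" "$f"
  else
    printf '%s/%s' "$PWD" "$f" > "$work/.seen/$h"
  fi
done

# tar preserves hardlinks; compressing the finished tarball lets xz
# build its dictionary over the whole archive (the thread suggests xz -e -T0;
# -e is omitted here only to keep the demo fast).
tar -cf "$work/toolchain.tar" -C "$work" toolchain
xz -T0 "$work/toolchain.tar"
ls "$work"
```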

@TommyMurphyTM1234
Collaborator Author

TommyMurphyTM1234 commented Nov 1, 2024

Then we could switch form gz to a more efficient compression, I use xz

Tarball repositories that I've seen (e.g. see above) suggest that LZ compression may be even better than XZ, at least in compression ratio; I'm not sure if it's slower?

@cmuellner
Collaborator

cmuellner commented Nov 1, 2024

  1. Mark all submodules for shallow cloning. This is better than wget IMHO since it'll work for all repos (even those that don't create release tarballs), and it will also be easier to update them. It'll make the build process faster for our users too.

PR exists (#1605).

  2. I don't know if we can preserve the build environment across builds; that would help a lot, since at this point we clone everything on every job. One approach would be to use https://github.com/actions/cache for sharing the cloned repos (or maybe have one job that would clone everything and cache its output for the others), but I haven't tested it (I've seen others using artifact upload/download to share data across jobs).

I also thought of this, but I have zero experience with it. It is hard to get up and running if it cannot be tested locally.

  3. Instead of nightly releases we can do weekly releases; it's a small change to nightly-release.yaml. It doesn't make much sense to have a release every day; even monthly releases would be fine.

We trigger the build every night, but it will not download/build anything if there were no changes in the last 24 hours.

  4. To improve the size of the generated toolchains, we could deduplicate files using their hashes and switch the duplicates to hardlinks (I've tested it and it works fine). tar will preserve those, so when a user unpacks the archive there is a benefit there too, not only in the tarball's size. Then we could switch from gz to a more efficient compression: I use xz, and to improve its efficiency further I first create the tarball and then compress it with xz -e -T0 (this is better than tar cvJf since xz has the opportunity to create a better dictionary).

I will look into this. I usually use --threads=0 -6e for toolchain releases, as this gave the best results when I tested it a few years ago.

Thanks!

@TShapinsky

2. I don't know if we can preserve the build environment across builds, that would help a lot since at this point we clone everything on every job. One approach would be to use https://github.com/actions/cache for sharing the cloned repos (or maybe have one job that would clone everything and cache its output for the others) but I haven't tested it (I've seen others using artifact upload/download to share data across jobs).

I'm working on a branch on my fork on this topic. I'm not quite happy with it yet. TShapinsky#2

@TShapinsky

Another way to reduce toolchain size is to install stripped versions of the programs. A good portion of the dependencies already support a variant of make install-strip. In my testing, it reduced the final toolchain output size by more than half.
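The size win is easy to check locally. A stand-in demo that runs strip on a copy of an installed binary (the real mechanism would be each component's make install-strip target, which installs and strips in one step):

```shell
set -e
tmp=$(mktemp -d)

# Copy an installed binary as a stand-in for a freshly built tool.
cp "$(command -v sh)" "$tmp/prog"
before=$(wc -c < "$tmp/prog")

# What install-strip effectively does per file; `|| true` in case the
# binary is already stripped or strip is unavailable locally.
strip "$tmp/prog" 2>/dev/null || true
after=$(wc -c < "$tmp/prog")

echo "size before: $before  after: $after"
```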
