
Rebuild aarch64 only + pytorch 2.4.0 for numpy support #267

Merged
4 commits merged into conda-forge:main on Oct 2, 2024

Conversation

@conda-forge-admin
Contributor

conda-forge-admin commented Oct 1, 2024

Closes #266
Closes #256

Hi! This is the friendly automated conda-forge-webservice.

I've started rerendering the recipe as instructed in #266.

If I find any needed changes to the recipe, I'll push them to this PR shortly. Thank you for waiting!

Here's a checklist to do before merging.

@conda-forge-admin
Contributor Author

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in excellent condition.

@conda-forge-admin marked this pull request as ready for review on October 1, 2024 at 11:36
@hmaarrfk changed the title from "MNT: rerender" to "Rebuild aarch64 only + pytorch 2.4.0 for numpy support" on Oct 1, 2024
@hmaarrfk
Contributor

hmaarrfk commented Oct 1, 2024

One more failure:

site-packages/torch/__init__.py", line 290, in <module>
2024-10-01T20:29:09.7250661Z     from torch._C import *  # noqa: F403
2024-10-01T20:29:09.7279549Z     ^^^^^^^^^^^^^^^^^^^^^^
2024-10-01T20:29:09.7294358Z ImportError: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home/conda/feedstock_root/build_artifacts/libtorch_1727783283199/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.12/site-packages/torch/../../.././libcurand.so.10)

At test time we need glibc 2.27... How do we get around this, @conda-forge/cuda?
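For diagnosing this class of failure, a generic way to list which glibc symbol versions a binary actually requires is sketched below (not taken from the CI log; libcurand.so.10 is the library named in the error above):

    # List every glibc symbol version the offending library references
    objdump -T libcurand.so.10 | grep -o 'GLIBC_[0-9.]*' | sort -Vu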

@bdice

bdice commented Oct 1, 2024

Unfortunately, there isn't a great way to run tests for packages that require a glibc newer than 2.17. We have been skipping tests in conda-forge CI until a compatible system (like the AlmaLinux 8 images) is available. There isn't a good workaround as far as I know, but I think the condaforge/linux-anvil-aarch64-cuda CUDA 11.8 CI images may have a newer glibc 2.28, because they're based on CentOS 8; that could be used as a hack...
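A rough sketch of what pinning such an image could look like in a feedstock's recipe/conda_build_config.yaml follows; this is a hypothetical override (not something this PR does), and the exact selector and image tag would need checking:

    # Hypothetical: use the CentOS 8-based (glibc 2.28) CUDA 11.8 image on aarch64
    docker_image:                                  # [os.environ.get("BUILD_PLATFORM", "").startswith("linux-")]
      - condaforge/linux-anvil-aarch64-cuda:11.8   # [os.environ.get("BUILD_PLATFORM", "") == "linux-aarch64"]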

@hmaarrfk
Contributor

hmaarrfk commented Oct 1, 2024

Thank you for your fast reply. I'll include a comment to this effect and skip the tests.
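A minimal sketch of what such a comment-plus-skip could look like in recipe/meta.yaml; the selector shown is an assumption for illustration, not necessarily the feedstock's actual change:

    test:
      commands:
        # The CI test image ships glibc 2.17, which is too old for the CUDA
        # aarch64 builds (libcurand needs GLIBC_2.27); skip until a newer
        # image is available.
        - python -c "import torch"  # [not (aarch64 and cuda_compiler_version != "None")]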

@leofang
Member

leofang commented Oct 1, 2024

Q: Can we piggyback #264 onto this PR since we're rebuilding aarch64? It seems to me it should be as simple as setting export USE_PRIORITIZED_TEXT_FOR_LD=1 in the build script. Can we give it a try?
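For reference, the suggested one-liner would presumably land in the feedstock's build.sh along these lines (a sketch; restricting it to linux-aarch64 via target_platform is an assumption, not part of the suggestion):

    # Sketch: enable linker script optimization for ARM + CUDA (see #264).
    # target_platform is set by conda-build during the build.
    if [[ "${target_platform}" == "linux-aarch64" ]]; then
        export USE_PRIORITIZED_TEXT_FOR_LD=1
    fi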

@jakirkham
Member

AFAICT this release already has that fix. Please see comment: #264 (comment)

@jakirkham
Member

> Unfortunately, there isn't a great way to run tests for packages that require a glibc newer than 2.17. We have been skipping tests in conda-forge CI until a compatible system (like the AlmaLinux 8 images) is available. There isn't a good workaround as far as I know, but I think the condaforge/linux-anvil-aarch64-cuda CUDA 11.8 CI images may have a newer glibc 2.28, because they're based on CentOS 8; that could be used as a hack...

Agree with Bradley.

@h-vetinari did this in FAISS; perhaps he could share more about what he needed there.

Though note that the packages then also build in an environment with a newer GLIBC, which means the packages could have newer GLIBC symbols. Admittedly, the ARM CUDA packages already need a newer GLIBC, so in practice that may already be a reasonable move.

I have found that maintaining a custom conda_build_config.yaml can be a bit bumpy. It can be done, but I try to avoid it if there is a better option. Such files tend to get hung up when migrators affect the values in the conda_build_config.yaml, and so can require a bit of maintenance to keep working.

Ideally we solve this by working through GLIBC 2.28 support in conda-forge (conda-forge/conda-forge.github.io#1941), which could then be used here too. Though I understand the value of intermediate solutions; hence the comments above.

@leofang
Member

leofang commented Oct 1, 2024

> AFAICT this release already has that fix. Please see comment: #264 (comment)

No, it is not; see #264 (comment). Please also take a look at the CI log:

  -------------------------------------------------------------------------------------------------
  |                                                                                               |
  |            WARNING: we strongly recommend enabling linker script optimization for ARM + CUDA. |
  |            To do so please export USE_PRIORITIZED_TEXT_FOR_LD=1                               |
  |                                                                                               |
  -------------------------------------------------------------------------------------------------

https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/11124603298/job/30910499886?pr=267#step:3:1461

@hmaarrfk
Contributor

hmaarrfk commented Oct 1, 2024

> Q: Can we piggyback #264 onto this PR since we're rebuilding aarch64? It seems to me it should be as simple as setting export USE_PRIORITIZED_TEXT_FOR_LD=1 in the build script. Can we give it a try?

I do not want to try. I am doing everything I can to contain the scope of this PR.

This is a "damage control" PR to unblock many others.

You may branch off this PR and try the fix on a different branch, but build modifications have been difficult to test and validate.

@hmaarrfk
Contributor

hmaarrfk commented Oct 1, 2024

@leofang I am willing to try it in the Python 3.13 PR + 2.4.1 + libprotobuf migration.

@leofang
Member

leofang commented Oct 1, 2024

It is a one-liner change, and the worst-case scenario is that it has no effect (i.e., no performance improvement) at run time; at build time, it builds fine regardless of whether it is set. So I considered it a zero-cost trial and was hoping not to waste CI resources (storage and bandwidth). I can certainly make this one-liner change in a standalone PR if that is preferred. Just a drive-by comment, since I was pinged (as part of the CUDA team).

@hmaarrfk
Contributor

hmaarrfk commented Oct 1, 2024

> It is a one-liner change, and the worst-case scenario is that it has no effect

I'm sorry, but this whole PR was a "one-liner change" that has already taken a month to resolve.

Again, I'll click merge once the PR is green, but for now I really don't want to do it.

@leofang
Member

leofang commented Oct 1, 2024

FYI, one of the issues addressed by this PR (#266) was filed by a Grace Hopper (GH200) user, @jcwomack, whom #264 will certainly help too.

> I'm sorry, but this whole PR was a "one-liner change" that has already taken a month to resolve.

Sorry for the frustration; I wasn't aware of the one-month span, nor was I anticipating any pushback. I only looked at the PR start time, which was earlier today. It is OK to rebuild the package in the next PR. Thank you for considering it.

@hmaarrfk
Contributor

hmaarrfk commented Oct 1, 2024

> Sorry for the frustration; I wasn't aware of the one-month span

That's OK. I am more than happy to help review PRs, and I do look for your help and guidance. If you want to help, do subscribe to notifications on this repo.

> nor was I anticipating any pushback.

Yeah, unfortunately it is difficult to prioritize things. Sorry about that.

> It is OK to rebuild the package in the next PR

If you want, you can help by forking off from here and pulling in a few migrations.

For what it's worth, I am trying the build on my own machine to make sure the tests pass now, before pushing and going to sleep...

@h-vetinari
Member

> Though note that the packages then also build in an environment with a newer GLIBC, which means the packages could have newer GLIBC symbols.

No, this is exactly what the sysroot is for. As long as our compilers use the right sysroot version (through the stdlib jinja), there's no risk in using the newer images. This is also how building for cos6 (~glibc 2.12) in a cos7 image worked.
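For context, the sysroot pinning referred to here works roughly as follows (a generic sketch of the conda-forge stdlib mechanism, not this feedstock's actual recipe):

    # recipe/meta.yaml: the compiler and stdlib jinja pin the glibc baseline
    requirements:
      build:
        - {{ compiler("cxx") }}
        - {{ stdlib("c") }}

    # conda_build_config.yaml: the glibc version the package targets,
    # independent of the (possibly newer) glibc inside the Docker image
    c_stdlib_version:   # [linux]
      - "2.17"          # [linux]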

@h-vetinari
Member

I know @hmaarrfk is working hard on unblocking this, so I'll defer to him for the purposes of this PR, but I'd much prefer bumping the image to skipping the tests.

@h-vetinari
Member

Ah, I just saw the automerge label. Let's pick this up in the next PR, then.

@hmaarrfk merged commit 3a6abd3 into conda-forge:main on Oct 2, 2024
4 checks passed
@hmaarrfk
Contributor

hmaarrfk commented Oct 2, 2024

The reason for this build is that I'm pretty sure every aarch64 package for PyTorch 2.4 was built without numpy support for the last month.

This should bring us back in line, allowing other improvements.

Happy to use the new image for the next build.

Labels
automerge: Merge the PR when CI passes

Successfully merging this pull request may close these issues:

PyTorch 2.4.0 + aarch compiled without numpy support