Building with CUDA 12.2 #270

Closed
jeongseok-meta opened this issue Oct 4, 2024 · 10 comments
Labels
question Further information is requested

Comments

@jeongseok-meta
Contributor

Comment:

I am interested in knowing the current status or plans regarding the support for building PyTorch with CUDA 12.2 or higher versions, specifically for the linux-64 platform. Despite my attempts to resolve this on my own, I haven't had any success yet. Could someone provide insights or updates on this matter?

@jeongseok-meta jeongseok-meta added the question Further information is requested label Oct 4, 2024
@hmaarrfk
Contributor

hmaarrfk commented Oct 4, 2024

I think @jakirkham once mentioned that there were likely some performance improvements to be had by compiling with newer tooling.

Q: Are you looking for performance improvements from CUDA 12.2?

2 challenges for me:

  • It is really hard to understand the compatibility between CUDA versions, Linux driver versions, and Windows driver versions. The information isn't readily available to me (if you have it, please share).
  • The CI matrix would have to increase. Given our limited resources, this would just increase the iteration time (and build time). The latest build is "fingers crossed green" but likely won't have all the builds out for 24-48 hours.
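
On the first point, a minimal sketch of how the Linux side of that compatibility question could be checked. The minimum driver versions in the table are an assumption taken from NVIDIA's CUDA Compatibility guide (minor-version compatibility) and should be verified against the current docs; the function names are hypothetical. Windows minimums differ and are not covered here, and some features (PTX JIT, for example) may still need a newer driver than the minimum.

```python
# Minimum Linux driver versions for CUDA minor-version compatibility.
# ASSUMPTION: values as listed in NVIDIA's CUDA Compatibility guide;
# check the current documentation before relying on them.
MIN_LINUX_DRIVER = {
    11: (450, 80, 2),   # CUDA 11.x
    12: (525, 60, 13),  # CUDA 12.x
}


def parse_version(text):
    """Turn '525.60.13' or '12.2' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(cuda_version, driver_version):
    """Return True if a Linux driver meets the assumed minimum for that CUDA major."""
    major = parse_version(cuda_version)[0]
    if major not in MIN_LINUX_DRIVER:
        raise ValueError(f"no minimum driver recorded for CUDA {major}.x")
    return parse_version(driver_version) >= MIN_LINUX_DRIVER[major]
```

Under these assumptions, `driver_supports("12.6", "535.104.05")` is true and `driver_supports("12.6", "520.61.05")` is false, which is roughly why moving from 12.0 to a newer 12.x build does not by itself raise the driver floor.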

@jeongseok-meta
Contributor Author

Thank you for the quick reply! I am looking not only for performance improvements but also for fixes to a number of C++20-related issues that come up when building a C++ project with libtorch.

Regarding the challenges, I apologize that I don't have any good answers. However, as a potential solution to the second point, would it be possible to raise the minimum supported version (for the 12.x series) to 12.2 instead of keeping 12.0 in parallel?

@h-vetinari
Member

@jakirkham and I had been discussing a new CUDA 12.x migration, with x TBD. At the time he was about to leave on PTO, but perhaps the time is right to take this up again. There's a bunch of other places that need a newer CUDA already.

@hmaarrfk
Contributor

hmaarrfk commented Oct 4, 2024

> related to C++20 when building C++ project with libtorch.

Understood.

> to 12.2 instead of keeping 12.0 in parallel?

The problem with this statement is that it is really easy to say when you start a new project, but as soon as you start to support a few systems that were built over the last X years, you start to be more sympathetic to older hardware. On air-gapped systems, for example, there isn't an easy procedure to update things.

I think that both Ubuntu and Nvidia have improved the driver availability situation, but we can't be too trigger happy with dropping versions.

@hmaarrfk
Contributor

hmaarrfk commented Oct 4, 2024

@jeongseok-meta you have been really active in trying this on the CIs, which is great! I think you can likely add something like:
https://github.com/conda-forge/pytorch-cpu-feedstock/pull/261/files#diff-ff61408cdc05bc9667deeadb55e4aaceb1371972076b6bf6934f9008920f2bd2R11

That will build with CUDA 12.2, and we can chime in as needed.

Do add the "skips" to skip 99% of the builds while we debug!

@jeongseok-meta
Contributor Author

All sounds good. Let me give it a try!

@hmaarrfk
Contributor

hmaarrfk commented Oct 4, 2024

You might want to rebase onto #261, since I think that build is working well for cross compilation and the like.

@jeongseok-meta
Contributor Author

Summary of current status:

  • PyTorch 2.5.1 #277 was created to update PyTorch to the latest version (2.5.0), which resolves many issues when building with newer dependencies. I would like to update this first.
  • Build with CUDA 12.6 #271 was created to update CUDA to the latest version (12.6). Originally, 12.2 was needed, but it turned out that updating to the latest version all at once is a better solution considering the maintenance cost of CI (as opposed to maintaining multiple 12.x versions). I have checked with my team and we agree on this.

While working on #277, I encountered build issues with CUDA 11.8. @hmaarrfk suggested dropping it as it's a bit outdated, and I agreed. Maybe we could maintain 12.0 and 12.6? Not sure if it's worth it though. I wonder what others think about this. Feel free to add your input!

@jeongseok-meta jeongseok-meta mentioned this issue Oct 25, 2024
5 tasks
@hmaarrfk
Contributor

Even 11.8 + 12.0 + 12.6 would be ok for me.

@jeongseok-meta
Contributor Author

Closing this issue now that the build with CUDA 12.6 has been enabled by #271. I think we can continue the discussion on which CUDA versions to support in conda-forge/conda-forge-pinning-feedstock#6630. Thank you for all the guidance!

3 participants