Building with CUDA 12.2 #270

jeongseok-meta · 2024-10-04T15:26:38Z

Comment:

I am interested in knowing the current status or plans regarding the support for building PyTorch with CUDA 12.2 or higher versions, specifically for the linux-64 platform. Despite my attempts to resolve this on my own, I haven't had any success yet. Could someone provide insights or updates on this matter?

hmaarrfk · 2024-10-04T15:32:59Z

I think @jakirkham once mentioned that there were likely some performance improvements to be had by compiling with newer tooling.

Q: Are you looking for performance improvements from CUDA 12.2?

Two challenges for me:

- It is really hard to understand the compatibility between CUDA versions, Linux driver versions, and Windows driver versions. The information isn't readily available to me (if you have it, please share).
- The CI matrix would have to grow. Given our limited resources, this would increase the iteration time (and build time). The latest build is "fingers crossed green", but it likely won't have all the builds out for 24-48 hours.

jeongseok-meta · 2024-10-04T15:44:51Z

Thank you for the quick reply! I am looking not only for performance improvements but also for fixes to a number of C++20-related issues when building C++ projects with libtorch.

Regarding the challenges, I'm sorry I don't have any good answers. However, as a potential solution to the second point, would it be possible to raise the minimum supported version for the 12.x series to 12.2 instead of keeping 12.0 in parallel?

h-vetinari · 2024-10-04T15:51:27Z

@jakirkham and I had been discussing a new CUDA 12.x migration, with x TBD. At the time he was about to leave on PTO, but perhaps the time is right to take this up again. A bunch of other places already need a newer CUDA.

hmaarrfk · 2024-10-04T16:01:30Z

> related to C++20 when building C++ project with libtorch.

Understood.

> to 12.2 instead of keeping 12.0 in parallel?

The problem with this statement is that it is really easy to say when you start a new project. But as soon as you have to support a few systems built over the last X years, you become more sympathetic to older hardware. On air-gapped systems, for example, there isn't an easy procedure for updating things.

I think that both Ubuntu and NVIDIA have improved the driver availability situation, but we can't be too trigger-happy about dropping versions.

hmaarrfk · 2024-10-04T16:02:56Z

@jeongseok-meta you have been really active in trying this on the CI, which is great! I think you can add something like:
https://github.com/conda-forge/pytorch-cpu-feedstock/pull/261/files#diff-ff61408cdc05bc9667deeadb55e4aaceb1371972076b6bf6934f9008920f2bd2R11

That will build with CUDA 12.2, and we can chime in as needed.

Do add the "skips" to skip 99% of the builds while we debug!

jeongseok-meta · 2024-10-04T16:06:29Z

All sounds good. Let me give it a try!

hmaarrfk · 2024-10-04T16:10:19Z

You might want to rebase onto #261, since I think that build is working well for cross-compilation and the like.

jeongseok-meta · 2024-10-25T23:11:18Z

Summary of current status:

- PyTorch 2.5.1 #277 was created to update PyTorch to the latest version (2.5.0), which resolves many issues when building with newer dependencies. I would like to land this first.
- Build with CUDA 12.6 #271 was created to update CUDA to the latest version (12.6). Originally, 12.2 was needed, but it turned out that updating to the latest version all at once is a better solution given the CI maintenance cost (as opposed to maintaining multiple 12.x versions). I have checked with my team and we agree on this.

While working on #277, I encountered build issues with CUDA 11.8. @hmaarrfk suggested dropping it because it's a bit outdated, and I agreed. Maybe we could maintain 12.0 and 12.6? I'm not sure it's worth it, though. I wonder what others think about this. Feel free to add your input!

hmaarrfk · 2024-10-26T00:37:44Z

Even 11.8 + 12.0 + 12.6 would be ok for me.

jeongseok-meta · 2024-11-05T05:19:58Z

Closing this issue now that the build with CUDA 12.6 has been enabled by #271. I think we can continue the discussion on which CUDA versions to support in conda-forge/conda-forge-pinning-feedstock#6630. Thank you for all the guidance!

jeongseok-meta added the "question" label (Further information is requested) · Oct 4, 2024

hmaarrfk mentioned this issue in Help-Wanted Priority List #273 (Open) · Oct 8, 2024

jeongseok-meta mentioned this issue in PyTorch 2.5.1 #277 (Merged, 5 tasks) · Oct 25, 2024

h-vetinari mentioned this issue in Migrate for CUDA 12.x (x to be determined) conda-forge/conda-forge-pinning-feedstock#6630 (Open) · Oct 31, 2024

jeongseok-meta closed this as completed · Nov 5, 2024