
Rebuild for cudnn 9 -- i.e. drop cudnn altogether #286

Merged

Conversation

@regro-cf-autotick-bot (Contributor)

This PR has been triggered in an effort to apply the cudnn9 migration.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR, we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.


If this PR was opened in error or needs to be updated, please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/10628718403 - please use this URL for debugging.

@conda-forge-webservices bot (Contributor) commented Aug 30, 2024

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@jakirkham (Member)

Work would likely be needed upstream. That said, it looks more likely this will be dropped as a dependency than upgraded.

xref: cupy/cupy#8215

@leofang marked this pull request as draft on August 30, 2024 15:21
@hmaarrfk (Contributor)

@conda-forge-admin please restart cis

@hmaarrfk (Contributor) commented Oct 8, 2024

Given that the intention is to drop cuDNN altogether, might it be acceptable to just "not use cudnn" at build time, as suggested in:

cupy/cupy#8633 (comment)
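As a rough illustration of what "not using cudnn at build time" amounts to in recipe terms, the sketch below simply omits cudnn from the host requirements so CuPy's build-time autodetection never finds it; the package names and pins are placeholders, not the feedstock's actual meta.yaml.

```yaml
# Hypothetical fragment of recipe/meta.yaml. Leaving cudnn out of `host`
# means the cuDNN headers/libraries are absent at build time, so the
# corresponding CuPy modules are simply not compiled.
requirements:
  host:
    - python
    - pip
    - cuda-version 12.*      # illustrative pin only
    # - cudnn                # intentionally omitted: no cuDNN support built
  run:
    - python
```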

@hmaarrfk (Contributor) commented Oct 8, 2024

@conda-forge-admin please rerender

@conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the GitHub actions workflow below for more details. You can also ping conda-forge/core for further assistance or you can try rerendering locally.

The following suggestions might help debug any issues:

  • Is the recipe/{meta.yaml,recipe.yaml} file valid?
  • If there is a recipe/conda-build-config.yaml file in the feedstock, make sure that it is compatible with the current global pinnings.
  • Is the fork used for this PR on an organization or user GitHub account? Automated rerendering via the webservices admin bot only works for user GitHub accounts.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11227339497.

@hmaarrfk changed the title from "Rebuild for cudnn 9" to "Rebuild for cudnn 9 -- i.e. cuda cudnn altogether" on Oct 8, 2024

@jakirkham (Member) commented Oct 8, 2024

Originally the thinking was to support cuDNN until it was dropped

This is possible with cuDNN 8, but not cuDNN 9

Edit: It's worth noting cuDNN is already an optional dependency

xref: #266

@hmaarrfk (Contributor) commented Oct 8, 2024

If this build passes, we could still add cuDNN 8 support back with a higher build number.

Perhaps we can also wait a few weeks for a new "version" upstream, to more clearly communicate the "change of feature set" at conda-forge.

@jakirkham (Member)

Yeah, I was inclined to wait for CuPy 14.

Though please let us know if there's more context behind your interest here (like some bug we need to be aware of)

@hmaarrfk (Contributor) commented Oct 8, 2024

The only thing I'm concerned about with "14" or the "next version" is that it often slips, and an "old cupy" can keep your packages on old versions, which makes it difficult to stay on the edge.

The CuPy 14 release tracker (cupy/cupy#8269) seems to have been stuck since April (again, nothing wrong with that, it's just what it looks like).

@jakirkham (Member)

Made a note in that issue: cupy/cupy#8269 (comment)

Despite indications to the contrary, folks are hard at work on that release. The biggest item is NumPy 2 compatibility (meaning making CuPy look and act more like NumPy 2 in its API), which will involve quite a bit of work.

@hmaarrfk (Contributor) commented Oct 8, 2024

Despite indications to the contrary, folks are hard at work on that release

I apologize. I have a hard time writing things on GitHub; sometimes typed words don't convey the full meaning.

Even in my own projects I still pin to "1.26" in releases but test against "2.0" while the remaining bugs are identified.

I guess my main concern is that cupy is now one of the few remaining packages that hasn't been migrated to cuDNN 9:
https://conda-forge.org/status/migration/?name=cudnn9

Soon it will become increasingly hard to install alongside PyTorch.

@hmaarrfk changed the title from "Rebuild for cudnn 9 -- i.e. cuda cudnn altogether" to "Rebuild for cudnn 9 -- i.e. drop cudnn altogether" on Oct 8, 2024

@jakirkham (Member) commented Oct 8, 2024

OK, thanks for clarifying the issue. I can see how this would cause difficulties.

How about we have one build with cuDNN 8 and one build without cuDNN altogether?

That way users still needing the cuDNN bits can get them, while users who don't need cuDNN can pick up the CuPy build without it. I should add that the solver should also be able to pick up the cuDNN-free build when cuDNN 9 is in the environment.
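One hedged sketch of how the "one build with cuDNN 8, one build without" idea could be expressed as a build variant; the key name, sentinel value, and selector logic are assumptions for illustration, not what this feedstock actually does.

```yaml
# Hypothetical recipe/conda_build_config.yaml: each value below would yield a
# separate CuPy build, one depending on cuDNN 8 and one with no cuDNN at all.
cudnn:
  - "8"
  - "None"   # sentinel meaning "build without cudnn"; meta.yaml would skip
             # the cudnn host/run dependency when this value is selected
```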

@leofang (Member) commented Oct 10, 2024

Sorry for the late reply. I am supportive of just dropping cuDNN support altogether to simplify things. I don't think anyone is using it through CuPy.

I feel that shipping two builds (doubling the support matrix) is a bit overwhelming. If backward compatibility is a concern, perhaps we can add cuDNN 9 as a run_constrained entry for the time being (a fuller sketch in context follows the list below)?

```yaml
run_constrained:
  - cudnn >=9
```

This way:

  • For users who need CuPy and don't care about cudnn: conda resolves to the latest build (due to build number bump)
  • For users who need to use cuDNN 8 through CuPy: conda install cupy cudnn=8 resolves to the previous build
  • For users who need CuPy and cuDNN 9 to coexist: conda resolves to the latest build (due to run_constrained)
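A fuller sketch of where that stanza would live in the recipe; the surrounding entries are illustrative placeholders, and only the run_constrained line reflects the actual proposal.

```yaml
# Hypothetical excerpt of recipe/meta.yaml. cudnn is not a run dependency,
# but if cudnn ends up in the same environment, the solver must co-install
# cudnn 9 or newer with this build.
build:
  number: 1                 # bumped so the solver prefers this build by default
requirements:
  run:
    - python
    - numpy >=1.22          # illustrative pin only
  run_constrained:
    - cudnn >=9
```

With something like that in place, conda install cupy cudnn=8 would have to fall back to the previous build, matching the second bullet above.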

@hmaarrfk (Contributor)

If backward compatibility is a concern

I do not believe I use cuDNN, so it is not a concern for me. I feel the original addition/expansion was more for uniformity than anything else:
#266 (comment)

However, neither I nor the bot can rerender...

@conda-forge-admin please rerender

@conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the GitHub actions workflow below for more details. You can also ping conda-forge/core for further assistance or you can try rerendering locally.

The following suggestions might help debug any issues:

  • Is the recipe/{meta.yaml,recipe.yaml} file valid?
  • If there is a recipe/conda-build-config.yaml file in the feedstock, make sure that it is compatible with the current global pinnings.
  • Is the fork used for this PR on an organization or user GitHub account? Automated rerendering via the webservices admin bot only works for user GitHub accounts.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11275434758.

@leofang (Member) commented Oct 10, 2024

I can look into the rerender error later today, occupied atm 😥

@hmaarrfk (Contributor)

@conda-forge-admin please rerender

@conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the GitHub actions workflow below for more details. You can also ping conda-forge/core for further assistance or you can try rerendering locally.

The following suggestions might help debug any issues:

  • Is the recipe/{meta.yaml,recipe.yaml} file valid?
  • If there is a recipe/conda-build-config.yaml file in the feedstock, make sure that it is compatible with the current global pinnings.
  • Is the fork used for this PR on an organization or user GitHub account? Automated rerendering via the webservices admin bot only works for user GitHub accounts.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11278006184.

@hmaarrfk (Contributor)

The problem likely stems from the closing of the CUDA 12.0 migration and the fact that the CUDA migrator is vendored.

regro-cf-autotick-bot@ac4c2d7
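"Vendored" here means the feedstock carries its own copy of the CUDA migrator under .ci_support/migrations/ rather than relying solely on the global pinnings, so once the upstream CUDA 12.0 migration closes, the local copy has to be maintained by hand. Below is a heavily simplified, from-memory sketch of what such a migration file tends to look like; the field names and values are assumptions, and the feedstock's actual file differs.

```yaml
# Hypothetical, simplified .ci_support/migrations/cuda120.yaml
__migrator:
  kind: version
  migration_number: 1
  build_number: 1
cuda_compiler:
  - cuda-nvcc
cuda_compiler_version:
  - "12.0"
migrator_ts: 1700000000.0   # placeholder timestamp
```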

@hmaarrfk (Contributor)

@conda-forge-admin please rerender

@jakirkham (Member)

@conda-forge-admin, please re-render

@conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11286055117.

@jakirkham (Member)

Fixed the re-rendering issues in PR: #289

I have merged that into this PR and resolved the conflicts.

Note that we are using the vendored migrator to add the latest CUDA builds. Some minor tweaks to that migrator were all that was needed to get things working.

@leofang (Member) commented Oct 11, 2024

I can look into the rerender error later today, occupied atm 😥

Fixed the re-rendering issues in PR: #289

Looks like John took care of it for me (thx!)

Note that we are using the vendored migrator to add the latest CUDA builds.

From this and other PRs I understand you two have different opinions when it comes to resolving this kind of awkward rerendering issue (@jakirkham prefers to modify the vendored migrator, @hmaarrfk prefers to craft and maintain an explicit cbc.yaml file). I lean toward the latter, but only slightly, mainly because having to manually modify anything under .ci_support is kinda hard to teach and debug. I am not the one doing the heavy lifting here, so I am happy to take whatever solution you guys decide to land on (and thx again!) 😄
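For reference, the cbc.yaml route means committing an explicit recipe/conda_build_config.yaml that pins the CUDA axis directly instead of patching the vendored migrator; the illustration below is a very rough, hypothetical sketch, and commit e5cb439 remains the authoritative record of what was actually attempted.

```yaml
# Hypothetical recipe/conda_build_config.yaml pinning the CUDA axis by hand.
# In practice the related keys (cuda_compiler, docker_image, zip_keys, ...)
# must also be kept consistent, which is part of why this is hard to maintain.
cuda_compiler_version:
  - "11.8"
  - "12.6"
```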

@jakirkham (Member)

For clarity, the migrator was already handling the addition of CUDA 12.x. All I did was fix what was already in use here.

If we would like to discuss changes to that process, I would suggest raising a new issue to discuss it separately.

AIUI this PR is focused on cuDNN handling. Let's keep it focused on that.

@hmaarrfk marked this pull request as ready for review on October 11, 2024 16:52
@hmaarrfk (Contributor) left a review comment

Looks good to me

@hmaarrfk (Contributor)

So @jakirkham, compiling with 12.6 still keeps it compatible with 12.0?

@leofang (Member) commented Oct 11, 2024

compiling with 12.6 still keeps it compatible with 12.0?

Yes, CuPy takes advantage of CUDA minor version compatibility, so we don't have to build packages per CUDA minor release:
https://docs.nvidia.com/deploy/cuda-compatibility/#minor-version-compatibility
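In conda-forge terms, minor version compatibility shows up as a run constraint that reaches back to the start of the major series even though the package was compiled with a newer toolkit; the constraint below is illustrative, not the feedstock's exact metadata.

```yaml
# Hypothetical run requirement for a build compiled with the CUDA 12.6
# toolkit: installable in any environment providing CUDA 12.0 or newer.
requirements:
  run:
    - cuda-version >=12.0,<13.0a0
```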

@hmaarrfk (Contributor)

Great, thanks. Merge?

@leofang (Member) commented Oct 11, 2024

For clarity, the migrator was already handling the addition of CUDA 12.x. All I did was fix what was already in use here.

If we would like to discuss changes to that process, I would suggest raising a new issue to discuss it separately.

AIUI this PR is focused on cuDNN handling. Let's keep it focused on that.

@jakirkham Sorry if I am mistaken, but I am sensing some defensiveness here, and I feel obliged to explain (not trying to argue) why I raised the concern, which I believe is relevant to general OSS development and to this particular cuDNN-handling effort.

Throughout this PR, what I see is:

  • Mark kept hitting rerendering errors as he attempted to tackle the issue, and none of the feedstock maintainers responded promptly to explain why those errors could happen (my apologies!)
  • Eventually Mark figured out himself that it's due to this feedstock overriding the vendored migrator (link)
  • Mark made a (partly working) cbc.yaml-based solution in commit e5cb439
  • Mark's solution was silently overturned/removed in commit f0fc0d5

I think it is perfectly fine for a project maintainer to have preferences about certain approaches, but I would appreciate it if at least a reason or comment were given as to why the contributor's change is not favored. That's why I thought it was good to point it out, at least for the record.

Since Mark is happy, I am happy too and I approved the PR 🙂 Thanks @hmaarrfk @jakirkham!

@leofang merged commit b52aaf5 into conda-forge:main on Oct 11, 2024
27 checks passed
@regro-cf-autotick-bot deleted the rebuild-cudnn9-0-1_h1dd369 branch on October 11, 2024 17:25
@hmaarrfk (Contributor)

I’m ok with letting this go.

I mostly don't like editing .ci_support since that is often left for bot-generated outputs.

If CuPy is able to gain performance from 12.6, then I think we should try to extend this beyond this feedstock.

Let's spend our time documenting that and implementing it.

PS. I also feel like I am too liberal in using the bot's PR for a feedstock I don't maintain. I should have just created my own.
