[WIP] Consolidate metadata in pyproject.toml #1220

Draft: jakirkham wants to merge 1 commit into branch-23.04 from tmplt_conda

Conversation

@jakirkham (Member) commented Feb 23, 2023

Description

Consolidates metadata in pyproject.toml and templates the Conda recipe to pull that metadata from pyproject.toml. This should consolidate most of the metadata in one place and minimize duplication and the effort of keeping it up to date.
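For concreteness, the mechanism being proposed looks roughly like the sketch below. It assumes conda-build's `load_file_data` Jinja helper is what reads pyproject.toml (the variable name `data` matches the recipe snippet discussed further down), and that the requirement specs are usable as conda specs as well.

```yaml
# Top of the recipe's meta.yaml (sketch): parse pyproject.toml once and expose it as `data`.
{% set data = load_file_data("pyproject.toml") %}

# PEP 621 fields can then be referenced with Jinja, e.g. reusing the [project] dependencies:
requirements:
  run:
{% for r in data.get("project", {}).get("dependencies", []) %}
    - {{ r }}
{% endfor %}
```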

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions bot added the conda and Python (Related to RMM Python API) labels on Feb 23, 2023
@jakirkham added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels on Feb 23, 2023
@jakirkham force-pushed the tmplt_conda branch 2 times, most recently from c5286a7 to 5eb2c29 on Feb 23, 2023 at 05:36
@jakirkham (Member, Author) commented:

@vyasr, would be interested in hearing your thoughts on this. Also, hopefully this clarifies what I was referring to about templating the Conda recipe further with pyproject.toml.

- setuptools >=61.0.0
- tomli # [py<311]
{% for r in data.get("build-system", {}).get("requires", []) %}
{% if r.split()[0] not in ["cmake", "ninja"] %}
@jakirkham (Member, Author) commented on these lines, Feb 23, 2023:

Was going to try to ignore cmake and ninja here, but scikit-build pulls them in anyway.

Raised issue ( conda-forge/scikit-build-feedstock#69 ) about this
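For context, the templated loop in the diff above presumably continues along these lines (a sketch; the exact loop body isn't shown in the excerpt):

```yaml
{% for r in data.get("build-system", {}).get("requires", []) %}
{% if r.split()[0] not in ["cmake", "ninja"] %}
    - {{ r }}
{% endif %}
{% endfor %}
```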

Also use this metadata to fill out the `rmm` Conda recipe.
@vyasr (Contributor) commented Feb 23, 2023

Thanks @jakirkham! Definitely interested to see how much we can leverage this. There is some tension between this approach and #1175, so we should discuss the tradeoffs between the two approaches.

  1. Generating from dependencies.yaml: At a high level, trying to generate meta.yaml dependencies directly from dependencies.yaml is problematic because meta.yaml isn't strictly YAML; it's a text file that is formatted with Jinja to produce YAML. That means that, from the perspective of dfg, we cannot use a normal YAML writer, because we may need to write invalid YAML (e.g. lines with pin_compatible or the compilers; see the sketch after this list). It also means we would need to introduce some syntax for encoding those invalid lines in dependencies.yaml, otherwise we make that file unparseable by standard YAML parsers. The other issue with this approach is that it requires careful partial rewriting of the file. Partial rewriting of TOML, as in #1219 (Generate pyproject.toml dependencies using dfg), is already nontrivial, but pyproject.toml is a true TOML file and tomlkit does a good job of handling style-preserving changes. Since meta.yaml has Jinja in it, we would need to do a lot more manual text parsing (as shown in #1175, Generate meta.yaml dependencies from dependencies.yaml), which is quite fragile.
  2. Pulling pyproject.toml data into meta.yaml: The primary weakness of this approach is that pyproject.toml is unlikely to ever contain all of the dependencies that we need in meta.yaml, particularly when it comes to the build and host sections. There is also no easy way to tell which additional dependencies are required a priori, nor is there any super easy rule of thumb. The end result is that we still end up having to maintain both lists, so it obeys the letter of DRY but not really the spirit since you still have to keep full track of both.
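To make point 1 concrete, these are the kinds of meta.yaml lines that only become usable YAML after Jinja rendering (illustrative dependency names, not RMM's actual requirements):

```yaml
requirements:
  build:
    # compiler selection is conda-specific Jinja, not plain YAML
    - {{ compiler('cxx') }}
  run:
    # runtime pin derived from the version present at build time
    - {{ pin_compatible('numpy', max_pin='x') }}
```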

The end goal for me with dfg has always been to try to single-source our dependencies. It may be that meta.yaml is where we hit the limits of that approach, and if so that's fine, but I'm approaching this PR (and #1175) with the hope of not throwing in the towel quite yet. Would love to get your thoughts there. Also CC @ajschmidt8 @bdice, who I've discussed this with before.

@jakirkham (Member, Author) commented:

Interesting! Was seeing this as complementary to PR #1219, in that we could generate pyproject.toml and reuse a lot of that content in the recipe.

Certainly some things like shared libraries need to be handled separately, though am guessing this is true for wheels too. How are shared library dependencies being handled in that case?

Requirements are certainly one important piece, though another is metadata. The Conda recipe's about section can be pulled entirely from pyproject.toml metadata, which is a win, as this would consolidate that in one place.

Maybe there is something more to think about with CMake and Conda sharing metadata somewhere for C++ libraries, but that may be a separate discussion.
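A minimal sketch of what a consolidated about section could look like, assuming the recipe loads pyproject.toml into `data` as in the diff above (the exact keys depend on how the PEP 621 [project] table is filled in, e.g. whether license uses the text form):

```yaml
about:
  home: {{ data.get("project", {}).get("urls", {}).get("Homepage") }}
  license: {{ data.get("project", {}).get("license", {}).get("text") }}
  summary: {{ data.get("project", {}).get("description") }}
```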

@vyasr (Contributor) commented Feb 24, 2023

> Interesting! Was seeing this as complementary to PR #1219, in that we could generate pyproject.toml and reuse a lot of that content in the recipe.
>
> Requirements are certainly one important piece, though another is metadata. The Conda recipe's about section can be pulled entirely from pyproject.toml metadata, which is a win, as this would consolidate that in one place.
>
> Maybe there is something more to think about with CMake and Conda sharing metadata somewhere for C++ libraries, but that may be a separate discussion.

Sorry, my initial comment wasn't super clear. I am 100% on board with reusing pyproject.toml's metadata in meta.yaml; that is definitely the right approach to reduce duplication here. My previous comment was solely about what to do with dependencies; everything else can be pulled from pyproject.toml. I wonder if we should consider pulling the package section from pyproject.toml as well. The name is probably overkill since it won't ever change, but pulling the version could be a useful way of ensuring that wheels and conda packages have their release versions managed in a consistent fashion going forward. WDYT?
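A hedged sketch of what pulling the package section could look like; this assumes the version ends up written into pyproject.toml rather than being declared dynamic, which is an open question for the release tooling:

```yaml
package:
  name: {{ data.get("project", {}).get("name") }}
  version: {{ data.get("project", {}).get("version") }}
```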

> Certainly some things like shared libraries need to be handled separately, though am guessing this is true for wheels too. How are shared library dependencies being handled in that case?

Yup, this is the main area where I see the two approaches (dependencies.yaml + rapids-dependency-file-generator vs. pyproject.toml) as somewhat divergent, and I was trying to think about ways to reconcile them. There are a few different cases that I can think of (and this list probably isn't exhaustive):

  • Compilers: For wheels, these must be installed on the host system, whereas conda can pull them in.
  • CUDA libraries: Same deal as the compilers.
  • pin_compatible-style dependencies: These must be pinned separately in the build-system requires section and the dependencies section of pyproject.toml (for an example, see the pyarrow pinnings in pyproject.toml and cudf/meta.yaml in the cudf repo). Since we'll have to maintain these separately in dependencies.yaml anyway, I wonder if we should eschew usage of pin_compatible entirely and just pull the manual pins out of the (generated) pyproject.toml dependencies into meta.yaml (a concrete sketch follows this list). Do you see any issues with that approach?
  • Shared libraries that are statically compiled into wheels: Examples of this are libraft or cumlprims_mg in cuml. Here we end up with a fundamentally different requirement set because of the nature of the build. If, in the future, we move towards dynamic linking and using rpaths to introduce links between libraries in different wheels, then I think pyproject.toml and meta.yaml will align much more closely here.
  • Packages that use run_exports in conda: I was just reminded of this by your comment on ucx-py#930 (Consolidate metadata in pyproject.toml). There's no equivalent of run_exports for pip requirements. That means that in a meta.yaml file we'll have the package in the host or build list (depending on whether it's a strong export) and it will automatically propagate to the run dependencies, whereas for pyproject.toml we'll have to manually include it in both the build-system requires and the project dependencies.
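To make the pin_compatible bullet above concrete (illustrative package name and bounds, not RMM's actual pins):

```yaml
requirements:
  run:
    # option A: conda-only Jinja pin, maintained separately from pyproject.toml
    - {{ pin_compatible('numpy', max_pin='x') }}
    # option B: a manual pin that could be copied verbatim from pyproject.toml's
    # [project] dependencies (and kept in sync via dependencies.yaml)
    - numpy>=1.21,<2
```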
