Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vendor Rust crates into source distributions #49

Merged
merged 2 commits into from
Jun 6, 2024

Conversation

tiran
Copy link
Collaborator

@tiran tiran commented Jun 4, 2024

vendor_rust() takes an existing sdist tar ball and creates a new sdist with all Rust crates vendored. This allows offline compilation without downloading any Rust crates at build time.

Switch from tomli to toml. The tomli package does not have write support.

I have successfully wheel builds with vendored crates for

  • cryptography-42.0.7.tar.gz
  • maturin-1.5.1.tar.gz
  • pydantic_core-2.18.4.tar.gz
  • setuptools-rust-1.9.0.tar.gz

with CARGO_NET_OFFLINE=true python3 -m build -w.

Copy link
Member

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool, thank you!

I left some notes about how it might be integrated in an inline comment. Let me know what you think, maybe you had a different idea in mind.

One global comment is I'm trying to log fairly verbosely. Debug messages are usually written to a file. Info messages go to the console as progress messages, indicating that some action (especially anything that might take time) is starting.



def vendor_rust(
sdist: pathlib.Path,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should work out how to integrate this into the flow.

Today we have

  • download-source
  • prepare-source
    • unpack
    • patch
  • prepare-build
  • build-wheel

That leaves the patched source tree with build artifacts in it on the disk, and leaves the original sdist unmodified.

I think we want instead something like

  • download-source
  • prepare-source
    • unpack
    • patch
    • vendor rust
  • re-roll-sdist (to preserve a copy of our patched sources with the vendored rust code, before any build artifacts are written, so we can distribute the sources we use for the build)
  • prepare-build
  • build-wheel

If that's the flow, then I think for input we'll want this function to take the path to the source root instead of the sdist file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good plan. I like the idea to re-roll sdist. Eventually we want to store new sdists with downstream patches into a separate repo, too.

I have changed the function. It now takes a project_dir argument and returns a bool vendor_rust(project_dir: pathlib.Path, *, shrink_vendored: bool = True) -> bool. The project directory is the first directory in the root of a tar ball.

# Remove large, unused archive and lib files. They are only
# used on Windows, macOS, and iOS. This makes the vendor bundle up to
# 60% smaller.
for pattern in VENDOR_FILTER_PATTERNS:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make the pattern configurable, eventually? Are we likely to want to have it be part of the per-package customization interface?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored the code and moved the cleanup to a separate function. The cleanup now removes all **/*.a and **/*.lib files. It didn't work before because I didn't update the crate's checksum file. That's now fixed.

The removal of pre-compiled archive and lib files serves two purposes:

  1. Reduce size of vendored crates
  2. Prevent use of pre-compiled files

If we ever need the files, then we have to figure out how to build the crates from source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, if it's not going to be variable keeping it in the source makes sense. We can always make it configurable later, if we need to.

with open(fileame, "r", encoding="utf-8") as f:
cfg = toml.load(f)

modified = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to keep project-specific knowledge out of the core bits, and provide an override hook or other mechanism for a build system to inject that sort of knowledge based on the packages it wants to build. In this case, for setting the rust level, I could see us setting a global config in the settings file, or I could see a patch being applied to the package.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood! That makes sense.

I have removed the code that patches the MSRV in Cargo.toml files. Let's fix this in a separate patch.

I also filed an issue and PR with pydantic-core to lower their MSRV to 1.75. Hopefully their next upstream version can be built without a patch.

src/fromager/vendor_rust.py Outdated Show resolved Hide resolved
raise FileNotFoundError(project_dir)

manifests = list(project_dir.glob("**/Cargo.toml"))
for manifest in manifests:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An info log message here would provide good progress output.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a bunch of log statement (info and debug) to the code.

Copy link
Collaborator Author

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed your suggestions and updated the code.

I also added a minimal Rust project with pyo3 and setuptools-rust to e2e directory for future testing.

# Remove large, unused archive and lib files. They are only
# used on Windows, macOS, and iOS. This makes the vendor bundle up to
# 60% smaller.
for pattern in VENDOR_FILTER_PATTERNS:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored the code and moved the cleanup to a separate function. The cleanup now removes all **/*.a and **/*.lib files. It didn't work before because I didn't update the crate's checksum file. That's now fixed.

The removal of pre-compiled archive and lib files serves two purposes:

  1. Reduce size of vendored crates
  2. Prevent use of pre-compiled files

If we ever need the files, then we have to figure out how to build the crates from source.

with open(fileame, "r", encoding="utf-8") as f:
cfg = toml.load(f)

modified = False
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood! That makes sense.

I have removed the code that patches the MSRV in Cargo.toml files. Let's fix this in a separate patch.

I also filed an issue and PR with pydantic-core to lower their MSRV to 1.75. Hopefully their next upstream version can be built without a patch.

raise FileNotFoundError(project_dir)

manifests = list(project_dir.glob("**/Cargo.toml"))
for manifest in manifests:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a bunch of log statement (info and debug) to the code.



def vendor_rust(
sdist: pathlib.Path,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good plan. I like the idea to re-roll sdist. Eventually we want to store new sdists with downstream patches into a separate repo, too.

I have changed the function. It now takes a project_dir argument and returns a bool vendor_rust(project_dir: pathlib.Path, *, shrink_vendored: bool = True) -> bool. The project directory is the first directory in the root of a tar ball.

@tiran tiran force-pushed the vendor-rust branch 2 times, most recently from 7f1499b to 6bc1bcd Compare June 5, 2024 07:54
@tiran tiran marked this pull request as ready for review June 5, 2024 11:24
`vendor_rust()` takes an existing sdist tar ball and creates a new sdist
with all Rust crates vendored. This allows offline compilation without
downloading any Rust crates at build time.

Switch from `tomli` to `toml`. The `tomli` package does not have write
support.

I have successfully wheel builds with vendored crates for

- `cryptography-42.0.7.tar.gz`
- `maturin-1.5.1.tar.gz`
- `pydantic_core-2.18.4.tar.gz`
- `setuptools-rust-1.9.0.tar.gz`

with `CARGO_NET_OFFLINE=true python3 -m build -w`.

Signed-off-by: Christian Heimes <[email protected]>
Copy link
Member

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@@ -15,6 +15,9 @@ def build_wheel(ctx, req, sdist_root_dir, build_env):
if not builder:
builder = default_build_wheel
extra_environ = overrides.extra_environ_for_pkg(ctx.envs_dir, req.name, ctx.variant)
# TODO: refactor?
# Build Rust without network access
extra_environ["CARGO_NET_OFFLINE"] = "true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should go in default_build_wheel? Maybe it's better to do it here so any overrides also use the same setting without having to remember to add it themselves.

@dhellmann
Copy link
Member

@Mergifyio refresh

Copy link
Contributor

mergify bot commented Jun 6, 2024

refresh

✅ Pull request refreshed

@dhellmann dhellmann merged commit 5ef505f into python-wheel-build:main Jun 6, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants