bundler: Fix for prefetching of dependencies #673

a-ovchinnikov · 2024-10-03T20:33:57Z

Some dependencies contain architecture-specific components. Their names and locations differ from those of standard Ruby-only dependencies. As a result, prior to this commit such dependencies were skipped during prefetch. This, in turn, broke hermetic builds because a dependency would end up missing.

Resolves: #672

Maintainers will complete the following section

Commit messages are descriptive enough
Code coverage from testing does not decrease and new code is covered
Docs updated (if applicable)
Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

approve GitHub Actions workflows by clicking a button
approve the Red Hat Trusted App Pipeline container build by commenting /ok-to-test
(as is the standard for Pipelines as Code)

a-ovchinnikov · 2024-10-04T13:00:00Z

/retest

brunoapimentel · 2024-10-07T21:09:18Z

/retest

tests/integration/test_data/bundler_everything_present/container/Containerfile

slimreaper35

It would be fine to keep e2e tests as separate commits. You forgot to generate test data.

cachi2/core/package_managers/bundler/main.py

slimreaper35 · 2024-10-08T09:31:18Z

cachi2/core/package_managers/bundler/scripts/lockfile_parser.rb

@@ -11,7 +11,9 @@
 lockfile_parser.specs.each do |spec|
    parsed_spec = {
      name: spec.name,
-      version: spec.version.to_s
+      version: spec.version.to_s,
+      full_name: spec.full_name,


I don't think we need full_name; it can easily be constructed from name and version, as you do in the code.
That said, I wonder if we can move the platform attribute to the GemDependency class and make it default to ruby (for example). I see a lot of duplication in both classes.

When I look at ~~our favorite~~ PURL spec doc I see a platform qualifier. The default is ruby.
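As an illustration of the platform qualifier mentioned above, here is a minimal sketch of building a PURL for a gem. The helper name `gem_purl` is hypothetical and not part of cachi2, and omitting the qualifier for the default `ruby` platform is an assumption made for this example, not something the PR decides.

```python
from urllib.parse import quote


def gem_purl(name: str, version: str, platform: str = "ruby") -> str:
    """Build a package URL for a gem (hypothetical helper, not cachi2 code)."""
    purl = f"pkg:gem/{quote(name)}@{quote(version)}"
    # Assumption: the default platform is left implicit, so only
    # platform-specific gems carry the qualifier.
    if platform != "ruby":
        purl += f"?platform={quote(platform)}"
    return purl


print(gem_purl("foo", "0.0.2"))
print(gem_purl("foo", "0.0.2", "x86_64-linux"))
```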

Full name requires platform to be constructed and has to be dispatched based on whether platform equals 'ruby' or not. That said, the method I used for platform detection is really harebrained and could be simplified to just checking whether platform equals 'ruby'.
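The simplified dispatch described here could look roughly like the sketch below. It assumes the parsed dependency exposes `name`, `version` and `platform` as strings; `gem_full_name` is a hypothetical helper, not a function from the PR.

```python
def gem_full_name(name: str, version: str, platform: str = "ruby") -> str:
    """Return the gem's full name; pure-Ruby gems carry no platform suffix."""
    if platform == "ruby":
        return f"{name}-{version}"
    return f"{name}-{version}-{platform}"


print(gem_full_name("foo", "0.0.2"))
print(gem_full_name("foo", "0.0.2", "aarch64-linux"))
```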

slimreaper35 · 2024-10-08T09:34:31Z

cachi2/core/package_managers/bundler/parser.py

+                    # No need to force a platform if we skip the packages.
+                    log.warning(
+                        "Skipping binary dependency %s because 'allow_binary' is set to False."
+                        " This will likely result in an unbuildable package.",


I would argue that this is not true: it is rare for a gem not to ship a source distribution.
We found only one during development.

The way the nokogiri dependency is currently defined in the test repo, a build will fail if binaries are not available. That's how I learned about this problem. We do not control Gemfiles, so we cannot exclude platforms we don't like and will end up with multiple binaries in the lock file. We cannot force the Ruby platform either: this requires modifications to the lock file. Please correct me if I am wrong on any of these points. Moreover, even if we could force a platform, we probably should not, because any application forced onto it would run much slower.

We cannot force Ruby platform too: this requires modifications to the lock file

We should not touch the lock file, ever, as it is the only source of truth for us. If we change it, it will be hard to argue with the consumer base about incorrect platform targets or bad resolution (see the issue I linked elsewhere).

What I potentially don't agree with is the warning statement, as argued by @slimreaper35, for one particular reason: I don't see any further scanning for dupes in case allow_binary == False. IOW, couldn't there be an identical dependency defined with platform ruby within the lockfile along with the platform-specific ones (for the same reasons as Python's wheels: GPU-bound builds, complex and long builds)?
If I'm not mistaken in that reasoning, then by having at least the ruby one, a subsequent build would still pass, so we ought not to scare the user with such statements. If it so happens that the build fails, we can point them to the docs and say there's a flag for binary deps.

slimreaper35 · 2024-10-08T09:35:11Z

cachi2/core/package_managers/bundler/parser.py

+class GemPlatformSpecificDependency(GemDependency):
+    """
+    Represents a gem dependency built for a specific platform.
+
+    Attributes:
+        platform:     Platform for which the dependency was built.
+    """
+
+    platform: str
+
+    @cached_property
+    def remote_location(self) -> str:
+        """Return remote location to download this gem from."""
+        return f"{self.source}/downloads/{self.name}-{self.version}-{self.platform}.gem"
+
+    def download_to(self, deps_dir: RootedPath) -> None:
+        """Download represented gem to specified file system location."""
+        fs_location = deps_dir.join_within_root(
+            Path(f"{self.name}-{self.version}-{self.platform}.gem")
+        )
+        log.info(
+            "Downloading platform-specific gem %s-%s-%s", self.name, self.version, self.platform
+        )
+        download_binary_file(self.remote_location, fs_location)
+
+


See my previous comment about duplication.

We could rework it, but after I am done with sqlite3 (and potentially other interesting ways of storing versions).

slimreaper35 · 2024-10-08T09:41:09Z

tests/integration/test_bundler.py

+                expected_exit_code=0,
+                expected_output="",
+            ),
+            [],  # No additional commands are run to verify the build


Is this marked as TODO, or is it on purpose?

My reasoning is that as long as a build succeeds, everything necessary to build the project is present and no other tests are needed.

cachi2/core/models/input.py

slimreaper35 · 2024-10-10T16:06:25Z

I am not sure that this is what we want overall. When I tested it locally, it looked like our parser script always adds platform = "ruby" if the source gem exists. So if users set allow_binary to true, we download source gems anyway.

When I had [sorbet](https://rubygems.org/gems/sorbet-static/versions) in Gemfile.lock, the JSON output had 3 gems for 3 different platforms. Then, we would not download all available pre-compiled gems as we do in pip.

Everything kind of depends on the parser.

Some dependencies contain architecture-specific components. Their name and location differ from standard Ruby-only dependencies. As a result prior to this commit such dependencies would be skipped during prefetch. This, in turn, failed hermetic builds because a dependency would end up missing. Resolves: containerbuildsystem#672 Signed-off-by: Alexey Ovchinnikov <[email protected]>

a-ovchinnikov · 2024-10-10T17:38:11Z

I am not sure if overall that's what we want.

My understanding is that we want to download whatever is specified in a Gemfile.lock. Technically we do not even need to ask users whether they want precompiled binaries, but we do this for consistency with pip. I think it is also good to be explicit about binary blobs and make the user acknowledge that this is indeed what they want: there is always a chance that this was never noticed before and made its way into a stricter environment by accident.

This commit adds e2e to bundler. The tests verify that gems pre-fetched with cachi2 could be built in isolation. Signed-off-by: Alexey Ovchinnikov <[email protected]>

eskultety · 2024-10-14T13:04:16Z

cachi2/core/package_managers/bundler/parser.py

+            # A combination of Ruby v.3.0.7 and some Bundler dependencies results in
+            # -gnu suffix being dropped from some platforms. This was observed on
+            # sqlite3-aarch-linux-gnu. We cannot control user's Ruby version,
+            # but we could try and guess correct path. If this fails then we should
+            # assume that package is broken.


What test data did you use? How did you come to the conclusion it has something to do with Ruby v3.0.7? When I tried the following Gemfile with that Ruby version (bundler 2.2.33):

source "https://rubygems.org"
gem "sqlite3", "= 2.0.4"

The lockfile I got for multi-arch platforms was (note the mini_portile dep):

GEM
  remote: https://rubygems.org/
  specs:
    mini_portile2 (2.8.7)
    sqlite3 (2.0.4)
      mini_portile2 (~> 2.8.0)

PLATFORMS
  aarch64-linux
  arm-linux
  arm-linux-musl
  arm64-darwin
  x64-mingw-ucrt
  x86-linux
  x86_64-darwin
  x86_64-linux
  x86_64-linux-musl

DEPENDENCIES
  sqlite3 (= 2.0.4)

I was not able to reproduce this with a newer Ruby, which apparently changed how the compile targets are handled; with 3.1.0 I got:

GEM
  remote: https://rubygems.org/
  specs:
    sqlite3 (2.0.4-aarch64-linux-gnu)
    sqlite3 (2.0.4-arm-linux-gnu)
    sqlite3 (2.0.4-arm-linux-musl)
    sqlite3 (2.0.4-arm64-darwin)
    sqlite3 (2.0.4-x64-mingw-ucrt)
    sqlite3 (2.0.4-x86-linux-gnu)
    sqlite3 (2.0.4-x86_64-darwin)
    sqlite3 (2.0.4-x86_64-linux-gnu)
    sqlite3 (2.0.4-x86_64-linux-musl)

PLATFORMS
  aarch64-linux
  arm-linux
  arm-linux-musl
  arm64-darwin
  x64-mingw-ucrt
  x86-linux
  x86_64-darwin
  x86_64-linux
  x86_64-linux-musl

DEPENDENCIES
  sqlite3 (= 2.0.4)

That said, I found this issue which reported a similar problem, but for the musl target: rubygems/rubygems#7432 . I wonder if these could be related in any way.

eskultety · 2024-10-14T13:09:08Z

cachi2/core/package_managers/bundler/parser.py

+            self.platform = self.platform + "-gnu"
+            fs_location = deps_dir.join_within_root(
+                Path(f"{self.name}-{self.version}-{self.platform}.gem")
+            )
+            download_binary_file(self.remote_location, fs_location)


Based on my limited research above (missing musl target in the lockfile) I am not convinced that we want to assume this problem exists only for -gnu. The reporter of rubygems/rubygems#7432 mentioned that he could work around the problem by explicitly doing bundle lock --add-platform xyz-musl and then it produced the expected results. At best I'd suggest keeping a neutral attitude here, not trying to recover from an error that likely doesn't originate at cachi2, and simply re-raise FetchError (or change the type) providing a solution and in that solution mention that the lockfile should be inspected and the actual URL checked for existence.
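A rough sketch of the "neutral" handling suggested above: no retry with a guessed suffix, only a re-raised error that tells the user what to check. The `FetchError` class below is a local stand-in (the real cachi2 signature may differ), and `download_gem` and its `downloader` parameter are hypothetical names used for illustration.

```python
from typing import Callable, Optional


class FetchError(Exception):
    """Local stand-in for cachi2's FetchError; the real signature may differ."""

    def __init__(self, reason: str, solution: Optional[str] = None) -> None:
        super().__init__(reason)
        self.reason = reason
        self.solution = solution


def download_gem(url: str, downloader: Callable[[str], None]) -> None:
    """Download a gem; on failure, re-raise with guidance instead of guessing a URL."""
    try:
        downloader(url)
    except FetchError as e:
        raise FetchError(
            reason=f"Could not download {url}: {e.reason}",
            solution=(
                "Inspect the platforms recorded in Gemfile.lock and check that "
                "the URL above exists on the gem source. If a platform is "
                "missing, regenerating the lockfile with "
                "`bundle lock --add-platform <platform>` may help."
            ),
        ) from e


def _always_fails(url: str) -> None:
    # Simulates a download that cannot be completed.
    raise FetchError(reason="404 Not Found")


try:
    download_gem("https://rubygems.org/downloads/foo-0.0.2-aarch64-linux.gem", _always_fails)
except FetchError as error:
    caught = error
    print(f"{error.reason}\n{error.solution}")
```

The original error is chained with `from e`, so the underlying download failure stays visible in the traceback.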

eskultety · 2024-10-14T13:12:58Z

cachi2/core/package_managers/bundler/parser.py

+                    # No need to force a platform if we skip the packages.
+                    log.warning(
+                        "Skipping binary dependency %s because 'allow_binary' is set to False."
+                        " This will likely result in an unbuildable package.",


We cannot force Ruby platform too: this requires modifications to the lock file

We should not touch the lock file, ever, as it is the only source of truth for us. If we change it, it will be hard to argue with the consumer base about incorrect platform targets or bad resolution (see the issue I linked elsewhere).

eskultety · 2024-10-14T13:23:15Z

cachi2/core/package_managers/bundler/parser.py

+            if dep["platform"] != "ruby":
+                full_name = "-".join([dep["name"], dep["version"], dep["platform"]])
+                log.warning("Found a binary dependency %s", full_name)
+                if allow_binary:
+                    log.warning(
+                        "Downloading binary dependency %s because 'allow_binary' is set to True",
+                        full_name,
+                    )
+                    result.append(GemPlatformSpecificDependency(**dep))
+                else:
+                    # No need to force a platform if we skip the packages.
+                    log.warning(
+                        "Skipping binary dependency %s because 'allow_binary' is set to False."
+                        " This will likely result in an unbuildable package.",


nitpick: please invert the platform != "ruby" condition; a short-circuit evaluation reads much more easily than a long if with a one-liner else
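One way the inverted condition might read is sketched below. Plain dicts stand in for the `GemDependency` and `GemPlatformSpecificDependency` objects, and `select_dependencies` is a hypothetical name; the log messages mirror the ones in the diff.

```python
import logging

log = logging.getLogger(__name__)


def select_dependencies(deps: list[dict[str, str]], allow_binary: bool) -> list[dict[str, str]]:
    """Keep pure-Ruby gems always and platform-specific gems only when allowed."""
    result = []
    for dep in deps:
        # Common case first: handle it and move on to the next dependency.
        if dep["platform"] == "ruby":
            result.append(dep)
            continue

        full_name = "-".join([dep["name"], dep["version"], dep["platform"]])
        if not allow_binary:
            log.warning(
                "Skipping binary dependency %s because 'allow_binary' is set to False.",
                full_name,
            )
            continue

        log.warning(
            "Downloading binary dependency %s because 'allow_binary' is set to True",
            full_name,
        )
        result.append(dep)
    return result
```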

eskultety · 2024-10-14T13:24:42Z

cachi2/core/package_managers/bundler/parser.py

+                    # No need to force a platform if we skip the packages.
+                    log.warning(
+                        "Skipping binary dependency %s because 'allow_binary' is set to False."
+                        " This will likely result in an unbuildable package.",


What I potentially don't agree with is the warning statement, as argued by @slimreaper35, for one particular reason: I don't see any further scanning for dupes in case allow_binary == False. IOW, couldn't there be an identical dependency defined with platform ruby within the lockfile along with the platform-specific ones (for the same reasons as Python's wheels: GPU-bound builds, complex and long builds)?
If I'm not mistaken in that reasoning, then by having at least the ruby one, a subsequent build would still pass, so we ought not to scare the user with such statements. If it so happens that the build fails, we can point them to the docs and say there's a flag for binary deps.
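The dupe scan mentioned above could be sketched as follows, assuming the parser output can list the same gem version once per platform (whether it actually does is the open question in this thread). The function name `binary_only_gems` is hypothetical; it reports only those gems that have no `ruby` platform entry, which are the ones where skipping binaries could plausibly break a build.

```python
from collections import defaultdict


def binary_only_gems(deps: list[dict[str, str]]) -> list[str]:
    """Return 'name-version' for gems that appear only as platform-specific builds."""
    platforms_by_gem: dict[tuple[str, str], set[str]] = defaultdict(set)
    for dep in deps:
        platforms_by_gem[(dep["name"], dep["version"])].add(dep["platform"])
    return sorted(
        f"{name}-{version}"
        for (name, version), platforms in platforms_by_gem.items()
        if "ruby" not in platforms
    )


parsed = [
    {"name": "foo", "version": "0.0.2", "platform": "ruby"},
    {"name": "foo", "version": "0.0.2", "platform": "x86_64-linux"},
    {"name": "bar", "version": "1.0.0", "platform": "aarch64-linux"},
]
print(binary_only_gems(parsed))
```

With such a check, a message about a possibly unbuildable package would be emitted only for the gems this function returns.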

eskultety · 2024-10-14T13:30:39Z

tests/unit/package_managers/bundler/test_parser.py

+def test_binary_gem_dependencies_could_be_downloaded_for_damaged_platforms(
+    mock_downloader: mock.MagicMock,
+    caplog: pytest.LogCaptureFixture,
+) -> None:
+    base_destination = RootedPath("/tmp/foo")
+    source = "https://rubygems.org"
+    platform = "aarch-linux"
+    dependency = GemPlatformSpecificDependency(
+        name="foo",
+        version="0.0.2",
+        source=source,
+        platform=platform,
+    )
+    expected_source_url = f"{source}/downloads/foo-0.0.2-{platform}.gem"
+    expected_corrected_source_url = f"{source}/downloads/foo-0.0.2-{platform}-gnu.gem"
+    expected_destination = base_destination.join_within_root(Path(f"foo-0.0.2-{platform}.gem"))
+    expected_corrected_destination = base_destination.join_within_root(
+        Path(f"foo-0.0.2-{platform}-gnu.gem")
+    )
+    expected_calls = [
+        mock.call(expected_source_url, expected_destination),
+        mock.call(expected_corrected_source_url, expected_corrected_destination),
+    ]
+
+    mock_downloader.side_effect = [FetchError(reason="testing"), "ok"]
+
+    dependency.download_to(base_destination)
+
+    assert relaxed_in("Downloading platform-specific gem", caplog.messages)
+    mock_downloader.assert_has_calls(expected_calls)
+
+


In the context of this review, this test case should go away.

eskultety · 2024-10-14T13:33:22Z

cachi2/core/package_managers/bundler/parser.py

-            result.append(GemDependency(**dep))
+            if dep["platform"] != "ruby":
+                full_name = "-".join([dep["name"], dep["version"], dep["platform"]])
+                log.warning("Found a binary dependency %s", full_name)


nitpick: This warning feels a bit redundant since there are only 2 branches and each one reports a different WARNING. Also, I think those are INFO-level messages, not warnings: a potentially failed build aside, there isn't any immediate harm that could result from this code. And since there is a flag for including binary deps, a warning simply doesn't feel justified anyway.

eskultety · 2024-10-14T13:34:36Z

cachi2/core/package_managers/bundler/parser.py

+    @property
+    def remote_location(self) -> str:
+        """Return remote location to download this gem from."""
+        return f"{self.source}/downloads/{self.name}-{self.version}-{self.platform}.gem"


Out of curiosity, is it okay for us to hard-code the 'downloads' part of the remote path? Do we want to be easily prepared for custom Ruby "package indices"?
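If the path segment ever needs to vary, one option is to make it a parameter with the current value as the default. This is only a sketch: `remote_location` here is a free function rather than the property from the diff, the `downloads_path` parameter is hypothetical, and whether custom gem sources actually use a different layout is not established in this thread.

```python
def remote_location(
    source: str,
    name: str,
    version: str,
    platform: str,
    downloads_path: str = "downloads",
) -> str:
    """Build the download URL for a platform-specific gem."""
    # Normalize slashes so that "https://host/" and "/mirror/gems/" both work.
    base = source.rstrip("/")
    path = downloads_path.strip("/")
    return f"{base}/{path}/{name}-{version}-{platform}.gem"


print(remote_location("https://rubygems.org", "foo", "0.0.2", "aarch64-linux"))
```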

eskultety · 2024-10-14T13:35:29Z

cachi2/core/models/input.py

@@ -73,6 +73,7 @@ class BundlerPackageInput(_PackageInputBase):
    """Accepted input for a bundler package."""

    type: Literal["bundler"]
+    allow_binary: bool = False


I believe the CLI change should go to a separate commit and be decoupled from the core logic.

eskultety · 2024-10-14T13:36:34Z

cachi2/core/models/input.py

@@ -73,6 +73,7 @@ class BundlerPackageInput(_PackageInputBase):
    """Accepted input for a bundler package."""

    type: Literal["bundler"]
+    allow_binary: bool = False


Commit message:

I don't think this is a "fix" anymore. By handling this with a CLI flag it is a genuine feature ;) and so the commit subject should be adjusted to actually match the vibe.

a-ovchinnikov requested review from slimreaper35 and eskultety October 3, 2024 20:34

a-ovchinnikov force-pushed the issue672 branch from ba631f6 to 5eaf37d Compare October 3, 2024 20:35

a-ovchinnikov force-pushed the issue672 branch from 5eaf37d to ca5d99c Compare October 7, 2024 17:22

a-ovchinnikov requested review from brunoapimentel and ben-alkov October 7, 2024 19:16

a-ovchinnikov force-pushed the issue672 branch from ca5d99c to f2e0880 Compare October 7, 2024 19:38

a-ovchinnikov commented Oct 7, 2024

slimreaper35 reviewed Oct 8, 2024

slimreaper35 mentioned this pull request Oct 9, 2024

Bundler documentation #674

Open

4 tasks

a-ovchinnikov force-pushed the issue672 branch from f2e0880 to 4ab175e Compare October 9, 2024 18:07

eskultety reviewed Oct 10, 2024

a-ovchinnikov force-pushed the issue672 branch from 4ab175e to 8c8c437 Compare October 10, 2024 17:30

bundler: Adding e2e tests

bfad1cd

This commit adds e2e to bundler. The tests verify that gems pre-fetched with cachi2 could be built in isolation. Signed-off-by: Alexey Ovchinnikov <[email protected]>

a-ovchinnikov force-pushed the issue672 branch from 8c8c437 to bfad1cd Compare October 10, 2024 17:38

eskultety reviewed Oct 14, 2024
