-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bundler: Fix for prefetching of dependencies #673
base: main
Are you sure you want to change the base?
Conversation
ba631f6
to
5eaf37d
Compare
/retest |
5eaf37d
to
ca5d99c
Compare
ca5d99c
to
f2e0880
Compare
/retest |
tests/integration/test_data/bundler_everything_present/container/Containerfile
Outdated
Show resolved
Hide resolved
tests/integration/test_data/bundler_everything_present/container/Containerfile
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be fine to keep e2e tests as separate commits. You forgot to generate test data.
@@ -11,7 +11,9 @@ | |||
lockfile_parser.specs.each do |spec| | |||
parsed_spec = { | |||
name: spec.name, | |||
version: spec.version.to_s | |||
version: spec.version.to_s, | |||
full_name: spec.full_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need full_name
, it can be easily constructed from name
and version
as you have in the code.
That said, I wonder if we can move the platform attribute to the GemDependency
class and make it default to ruby (for example). I see a lot of duplication in both classes.
- When I look at
our favoritePURL spec doc I see a platform qualifier. The default isruby
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full name requires platform to be constructed and has to be dispatched basing on whether platform equals to 'ruby' or not. That being said the method I used for platform detection is really hair-brained and could be simplified to just checking if platform equals to 'ruby'.
# No need to force a platform if we skip the packages. | ||
log.warning( | ||
"Skipping binary dependency %s because 'allow_binary' is set to False." | ||
" This will likely result in an unbuildable package.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue, that this is not true, it is a rare scenario when a gem does not ship a source distribution.
We only found one during the development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way nokogiri dependency is currently defined in test repo a build will fail if binaries are not available. That's how I learned about this problem. We do not control Gemfiles and thus cannot exclude platforms we don't like and will end up with multiple binaries in lock file. We cannot force Ruby platform too: this requires modifications to the lock file. Please correct me if I am wrong on any of these points. Moreover, even if we could force a platform we probably should not, because this would mean that any forced application would run much slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot force Ruby platform too: this requires modifications to the lock file
We should not touch the lock file - ever - as that is the only source of truth for us, if we change it, it's going to be hard arguing about incorrect platform targets, bad resolution (see the issue I linked elsewhere) to the consumer base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I potentially don't agree with is the warning statement as argued by @slimreaper35 for one particular reason - I don't see any further scanning on dupes in case allow_binary == False
, IOW there could be an identical dependency defined with platform ruby
withing the lockfile along with the platform specific ones, couldn't it (for the same reasons as Python's wheels - GPU bound builds, complex and long builds) ?
If I'm not mistaken in that reasoning, then by having at least the ruby
one, a subsequent build would still pass and so we ought not scare the user with such statements; and if it so happens that they'd fail the build, then we can point them to the docs and say there's a flag for binary deps.
class GemPlatformSpecificDependency(GemDependency): | ||
""" | ||
Represents a gem dependency built for a specific platform. | ||
|
||
Attributes: | ||
platform: Platform for which the dependency was built. | ||
""" | ||
|
||
platform: str | ||
|
||
@cached_property | ||
def remote_location(self) -> str: | ||
"""Return remote location to download this gem from.""" | ||
return f"{self.source}/downloads/{self.name}-{self.version}-{self.platform}.gem" | ||
|
||
def download_to(self, deps_dir: RootedPath) -> None: | ||
"""Download represented gem to specified file system location.""" | ||
fs_location = deps_dir.join_within_root( | ||
Path(f"{self.name}-{self.version}-{self.platform}.gem") | ||
) | ||
log.info( | ||
"Downloading platform-specific gem %s-%s-%s", self.name, self.version, self.platform | ||
) | ||
download_binary_file(self.remote_location, fs_location) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my previous comment about duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could rework it, but after I am done with sqlite3 (and potentially other interesting ways of storing versions).
expected_exit_code=0, | ||
expected_output="", | ||
), | ||
[], # No additional commands are run to verify the build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this marked as TODO or on purpose ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning is as long as a build succeeds then everything that is necessary to build a project is present and no other tests are needed.
f2e0880
to
4ab175e
Compare
I am not sure if overall that's what we want. When I was testing it locally, it looked like our parser script always adds platform = "ruby" if the source gem exists. So if users set allow binary to true, we download source gems anyway. When I had (sorbet)[https://rubygems.org/gems/sorbet-static/versions] in Gemfile.lock. JSON output had 3 gems for 3 different platforms. Then, we would not download all pre-compiled gems available as we do in Everything kind of depends on the parser. |
Some dependencies contain architecture-specific components. Their name and location differ from standard Ruby-only dependencies. As a result prior to this commit such dependencies would be skipped during prefetch. This, in turn, failed hermetic builds because a dependency would end up missing. Resolves: containerbuildsystem#672 Signed-off-by: Alexey Ovchinnikov <[email protected]>
4ab175e
to
8c8c437
Compare
My understanding is that we want to download whatever is specified in a Gemfile.lock. Technically we do not even need to ask a user if they want precompiled binaries or not, but for consistency with pip we do this. I think it is also good to be explicit about binary blobs and make user acknowledge that this is indeed what they want: there is always a chance that this was never noticed before and made its way into a stricter environment by accident. |
This commit adds e2e to bundler. The tests verify that gems pre-fetched with cachi2 could be built in isolation. Signed-off-by: Alexey Ovchinnikov <[email protected]>
8c8c437
to
bfad1cd
Compare
# A combination of Ruby v.3.0.7 and some Bundler dependencies results in | ||
# -gnu suffix being dropped from some platforms. This was observed on | ||
# sqlite3-aarch-linux-gnu. We cannot control user's Ruby version, | ||
# but we could try and guess correct path. If this fails then we should | ||
# assume that package is broken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What test data did you use? How did you come to the conclusion it has something to do with Ruby v3.0.7? When I tried the following Gemfile
with that Ruby version (bundler 2.2.33):
source "https://rubygems.org"
gem "sqlite3", "= 2.0.4"
The lockfile I got for mult-arch platforms was (note the mini_portile
dep):
GEM
remote: https://rubygems.org/
specs:
mini_portile2 (2.8.7)
sqlite3 (2.0.4)
mini_portile2 (~> 2.8.0)
PLATFORMS
aarch64-linux
arm-linux
arm-linux-musl
arm64-darwin
x64-mingw-ucrt
x86-linux
x86_64-darwin
x86_64-linux
x86_64-linux-musl
DEPENDENCIES
sqlite3 (= 2.0.4)
I was not able to reproduce otherwise with newer Ruby which apparently changed how the compile targets are handled and with 3.1.0 I got:
GEM
remote: https://rubygems.org/
specs:
sqlite3 (2.0.4-aarch64-linux-gnu)
sqlite3 (2.0.4-arm-linux-gnu)
sqlite3 (2.0.4-arm-linux-musl)
sqlite3 (2.0.4-arm64-darwin)
sqlite3 (2.0.4-x64-mingw-ucrt)
sqlite3 (2.0.4-x86-linux-gnu)
sqlite3 (2.0.4-x86_64-darwin)
sqlite3 (2.0.4-x86_64-linux-gnu)
sqlite3 (2.0.4-x86_64-linux-musl)
PLATFORMS
aarch64-linux
arm-linux
arm-linux-musl
arm64-darwin
x64-mingw-ucrt
x86-linux
x86_64-darwin
x86_64-linux
x86_64-linux-musl
DEPENDENCIES
sqlite3 (= 2.0.4)
That said, I found this issue which reported a similar problem, but for the musl
target: rubygems/rubygems#7432 . I wonder if these could be related in any way.
self.platform = self.platform + "-gnu" | ||
fs_location = deps_dir.join_within_root( | ||
Path(f"{self.name}-{self.version}-{self.platform}.gem") | ||
) | ||
download_binary_file(self.remote_location, fs_location) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my limited research above (missing musl
target in the lockfile) I am not convinced that we want to assume this problem exists only for -gnu
. The reporter of rubygems/rubygems#7432 mentioned that he could work around the problem by explicitly doing bundle lock --add-platform xyz-musl
and then it produced the expected results. At best I'd suggest keeping a neutral attitude here, not trying to recover from an error that likely doesn't originate at cachi2, and simply re-raise FetchError
(or change the type) providing a solution and in that solution mention that the lockfile should be inspected and the actual URL checked for existence.
# No need to force a platform if we skip the packages. | ||
log.warning( | ||
"Skipping binary dependency %s because 'allow_binary' is set to False." | ||
" This will likely result in an unbuildable package.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot force Ruby platform too: this requires modifications to the lock file
We should not touch the lock file - ever - as that is the only source of truth for us, if we change it, it's going to be hard arguing about incorrect platform targets, bad resolution (see the issue I linked elsewhere) to the consumer base.
if dep["platform"] != "ruby": | ||
full_name = "-".join([dep["name"], dep["version"], dep["platform"]]) | ||
log.warning("Found a binary dependency %s", full_name) | ||
if allow_binary: | ||
log.warning( | ||
"Downloading binary dependency %s because 'allow_binary' is set to True", | ||
full_name, | ||
) | ||
result.append(GemPlatformSpecificDependency(**dep)) | ||
else: | ||
# No need to force a platform if we skip the packages. | ||
log.warning( | ||
"Skipping binary dependency %s because 'allow_binary' is set to False." | ||
" This will likely result in an unbuildable package.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: please invert the platform != "ruby"
condition - a short-circuit evaluation reads much easier than a loong if
and a one-liner else
# No need to force a platform if we skip the packages. | ||
log.warning( | ||
"Skipping binary dependency %s because 'allow_binary' is set to False." | ||
" This will likely result in an unbuildable package.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I potentially don't agree with is the warning statement as argued by @slimreaper35 for one particular reason - I don't see any further scanning on dupes in case allow_binary == False
, IOW there could be an identical dependency defined with platform ruby
withing the lockfile along with the platform specific ones, couldn't it (for the same reasons as Python's wheels - GPU bound builds, complex and long builds) ?
If I'm not mistaken in that reasoning, then by having at least the ruby
one, a subsequent build would still pass and so we ought not scare the user with such statements; and if it so happens that they'd fail the build, then we can point them to the docs and say there's a flag for binary deps.
def test_binary_gem_dependencies_could_be_downloaded_for_damaged_platforms( | ||
mock_downloader: mock.MagicMock, | ||
caplog: pytest.LogCaptureFixture, | ||
) -> None: | ||
base_destination = RootedPath("/tmp/foo") | ||
source = "https://rubygems.org" | ||
platform = "aarch-linux" | ||
dependency = GemPlatformSpecificDependency( | ||
name="foo", | ||
version="0.0.2", | ||
source=source, | ||
platform=platform, | ||
) | ||
expected_source_url = f"{source}/downloads/foo-0.0.2-{platform}.gem" | ||
expected_corrected_source_url = f"{source}/downloads/foo-0.0.2-{platform}-gnu.gem" | ||
expected_destination = base_destination.join_within_root(Path(f"foo-0.0.2-{platform}.gem")) | ||
expected_corrected_destination = base_destination.join_within_root( | ||
Path(f"foo-0.0.2-{platform}-gnu.gem") | ||
) | ||
expected_calls = [ | ||
mock.call(expected_source_url, expected_destination), | ||
mock.call(expected_corrected_source_url, expected_corrected_destination), | ||
] | ||
|
||
mock_downloader.side_effect = [FetchError(reason="testing"), "ok"] | ||
|
||
dependency.download_to(base_destination) | ||
|
||
assert relaxed_in("Downloading platform-specific gem", caplog.messages) | ||
mock_downloader.assert_has_calls(expected_calls) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In context of this review, this test case should go away.
result.append(GemDependency(**dep)) | ||
if dep["platform"] != "ruby": | ||
full_name = "-".join([dep["name"], dep["version"], dep["platform"]]) | ||
log.warning("Found a binary dependency %s", full_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: This warning feels a bit redundant since there are only 2 branches and each one reports a different WARNING. Also, I think those are INFO level messages, not warning, a potentially failed build aside there isn't any immediate harm that could be the result of this code. And since there is a flag for including binary deps, it simply doesn't feel justified as a warning anyway.
@property | ||
def remote_location(self) -> str: | ||
"""Return remote location to download this gem from.""" | ||
return f"{self.source}/downloads/{self.name}-{self.version}-{self.platform}.gem" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, is it okay for us to hard-code the 'downloads' part of the remote path? Do we want to be easily prepared for custom Ruby "package indices" ?
@@ -73,6 +73,7 @@ class BundlerPackageInput(_PackageInputBase): | |||
"""Accepted input for a bundler package.""" | |||
|
|||
type: Literal["bundler"] | |||
allow_binary: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the CLI change should go to a separate commit and be decoupled from the core logic.
@@ -73,6 +73,7 @@ class BundlerPackageInput(_PackageInputBase): | |||
"""Accepted input for a bundler package.""" | |||
|
|||
type: Literal["bundler"] | |||
allow_binary: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message:
I don't think this is a "fix" anymore. By handling this with a CLI flag it is a genuine feature ;) and so the commit subject should be adjusted to actually match the vibe.
Some dependencies contain architecture-specific components. Their name and location differ from standard Ruby-only dependencies. As a result prior to this commit such dependencies would be skipped during prefetch. This, in turn, failed hermetic builds because a dependency would end up missing.
Resolves: #672
Maintainers will complete the following section
Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test
(as is the standard for Pipelines as Code)