caching: Enable caching & loading of container images in Forklift's cache #245

Open
ethanjli opened this issue Jun 13, 2024 · 0 comments
ethanjli commented Jun 13, 2024

Currently, Forklift treats the Docker daemon's image storage as the only place where container images are cached. This means that Forklift can only download container images when the Docker daemon is running (so not, for example, in an unbooted systemd-nspawn container running with QEMU cross-architecture virtualization, such as in https://github.com/PlanktoScope/PlanktoScope/blob/e3dbfc7ed25da6c0cd5e3159cf6880f3e3abda38/.github/workflows/build-os.yml#L161) and only when we have permission to talk to the Docker daemon (i.e. as root or as a member of the docker group).

If we want to pre-cache container images before booting into a QEMU VM, we currently have to install skopeo (and GNU parallel) and run a shell script to download container images, and then run another shell script to load those images into the Docker daemon after booting into the QEMU VM. Forcing the OS maintainer to include and maintain these scripts exposes a lot of complexity which we could instead hide in Forklift (which would also make that functionality much easier to reuse).

We could modify the [dev] plt cache-img/stage cache-img subcommands so that they download all required container images to a local cache (e.g. /var/cache/forklift/containers/docker-archive or ~/.cache/forklift/containers/docker-archive) in a format which can be loaded into Docker:

  • The images should be downloaded without relying on the Docker daemon, e.g. by wrapping crane pull, which probably does what we need (and is ideal since we already depend on crane). If we can't use crane for this, then we'll have to wrap skopeo or https://github.com/containers/image - though to get static builds of Forklift with either of those two options, we'd need to build with dynamically-linked dependencies disabled (see https://github.com/containers/skopeo/blob/main/install.md#building-a-static-binary for additional details). On the other hand, if we use skopeo instead of crane, then maybe we can store container images with deduplication of shared layers (i.e. in the containers-storage format rather than the docker-archive format) to save disk space.
  • We can have the cache-img subcommands attempt to load the cached images into the Docker daemon.
  • If we don't delete images from the cache after loading them into the Docker daemon, this system could be part of a solution for #228 ("caching: cache rm-img shouldn't delete images which might be needed"). In this case, we'd probably want cache rm-img and cache rm-all to only touch Forklift's cache of downloaded container images, and then we'd add a new host prune-img command to touch the Docker daemon's image storage.
  • We need a way to specify the CPU architecture of container images to pre-download, in case it doesn't match the CPU architecture which forklift was compiled for. This could be an --override-arch flag on the cache-img subcommands.

Then we could add another subcommand (maybe cache load-img) to load cached images into Docker's image storage using https://pkg.go.dev/github.com/docker/docker/client#Client.ImageLoad. Maybe we should also have [dev] plt load-img and stage load-img subcommands to do the same thing but only for cached images required by the pallet or staged pallet bundle?

It would also be useful if we could move all the complexity currently at https://github.com/PlanktoScope/PlanktoScope/blob/e3dbfc7ed25da6c0cd5e3159cf6880f3e3abda38/.github/workflows/build-os.yml#L161 into a GitHub Action for downloading (with caching) all container images required by a particular pallet.

For exporting files from OCI container images, those container images should be downloaded into the container image cache, and files should be loaded from the container image cache for export.

@ethanjli ethanjli added the enhancement New feature or request label Jun 13, 2024
@ethanjli ethanjli self-assigned this Jun 14, 2024
@ethanjli ethanjli changed the title Enable caching & loading of container images in Forklift's cache caching: Enable caching & loading of container images in Forklift's cache Jun 14, 2024
@ethanjli ethanjli added this to the Backlog milestone Jun 14, 2024