Currently, Forklift treats the Docker daemon's image storage as the only place where container images are cached. This means that Forklift can only download container images when the Docker daemon is running (i.e. not in an unbooted systemd-nspawn container running with QEMU cross-architecture virtualization, such as in https://github.com/PlanktoScope/PlanktoScope/blob/e3dbfc7ed25da6c0cd5e3159cf6880f3e3abda38/.github/workflows/build-os.yml#L161), and only when we have permission to talk to the Docker daemon (i.e. with root permissions or membership in the `docker` group).
If we want to pre-cache container images before booting into a QEMU VM, we currently install skopeo (and GNU parallel) and run a shell script to download container images; then, after booting into a QEMU VM, we run another shell script to load the images into the Docker daemon. Forcing the OS maintainer to include and maintain these scripts exposes a lot of complexity which we could instead hide inside Forklift (which would also allow that functionality to be reused much more conveniently).
We could modify the `[dev] plt cache-img`/`stage cache-img` subcommands so that they download all required container images to a local cache (e.g. `/var/cache/forklift/containers/docker-archive` or `~/.cache/forklift/containers/docker-archive`) in a format which can be loaded into Docker:
The images should be downloaded without relying on the Docker daemon, e.g. by wrapping around `crane pull`, which probably does what we need (and is ideal since we already depend on crane); if we can't use crane to do this, then we'll have to wrap around skopeo or https://github.com/containers/image - though to get static builds of Forklift with either of those two options, we'd need to build with dynamically-linked dependencies disabled (see https://github.com/containers/skopeo/blob/main/install.md#building-a-static-binary for additional details). On the other hand, if we can use skopeo instead of crane, then maybe we can store container images with deduplication of shared layers (i.e. in the `containers-storage` format rather than the `docker-archive` format) to save disk space.
We can have the `cache-img` subcommands attempt to load the cached images into the Docker daemon.
If we don't delete images from the cache after loading them into the Docker daemon, this system could be part of a solution for #228 (caching: `cache rm-img` shouldn't delete images which might be needed). In that case, we'd probably want `cache rm-img` and `cache rm-all` to only touch Forklift's cache of downloaded container images, and then we'd add a new `host prune-img` command to touch the Docker daemon's image storage.
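Under that split, `cache rm-img` would compute which cached archives are no longer required and delete only those files, never touching the daemon. A sketch of that selection logic with illustrative image names (the real required-image set would come from Forklift's resolved deployments, not a hard-coded list):

```go
package main

import (
	"fmt"
	"sort"
)

// removableCachedImages returns the cached image archives which a
// daemon-agnostic `cache rm-img` could safely delete: those not required by
// any pallet or staged pallet bundle. This is an illustrative sketch of the
// proposed behavior, not Forklift's actual pruning logic.
func removableCachedImages(cached, required []string) []string {
	need := make(map[string]bool, len(required))
	for _, r := range required {
		need[r] = true
	}
	var removable []string
	for _, c := range cached {
		if !need[c] {
			removable = append(removable, c)
		}
	}
	sort.Strings(removable)
	return removable
}

func main() {
	cached := []string{"nginx:1.25", "redis:7", "postgres:16"}
	required := []string{"nginx:1.25"}
	fmt.Println(removableCachedImages(cached, required)) // prints [postgres:16 redis:7]
}
```

Pruning the daemon's own image storage would remain the job of the separate `host prune-img` command.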
We need a way to specify the CPU architecture of container images to pre-download, in case it doesn't match the CPU architecture which `forklift` was compiled for. This could be an `--override-arch` flag on the `cache-img` subcommands.
Then we could add another subcommand (maybe `cache load-img`) to load cached images into Docker's image storage using https://pkg.go.dev/github.com/docker/docker/client#Client.ImageLoad. Maybe we should also have `[dev] plt load-img` and `stage load-img` subcommands to do the same thing, but only for cached images required by the pallet or staged pallet bundle?
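Under the hood, `Client.ImageLoad` streams a tarball to the Docker Engine API's `POST /images/load` endpoint over the daemon's socket. A stdlib-only sketch of that operation (the socket and archive paths are hypothetical; Forklift would presumably use the Go client rather than raw HTTP):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"os"
)

// loadImageArchive streams a docker-archive tarball to the Docker Engine
// API's POST /images/load endpoint over the daemon's unix socket -- the same
// operation Client.ImageLoad performs in github.com/docker/docker/client.
func loadImageArchive(sock, archivePath string) error {
	f, err := os.Open(archivePath)
	if err != nil {
		return err
	}
	defer f.Close()

	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", sock)
			},
		},
	}
	// The URL host is arbitrary; the unix-socket dialer above ignores it.
	resp, err := client.Post("http://docker/images/load", "application/x-tar", f)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("image load failed: %s", resp.Status)
	}
	return nil
}

func main() {
	const sock = "/var/run/docker.sock"
	if _, err := os.Stat(sock); err != nil {
		fmt.Println("Docker daemon socket not found; skipping load")
		return
	}
	// Archive path is hypothetical, for illustration only.
	if err := loadImageArchive(sock, "/var/cache/forklift/containers/docker-archive/nginx.tar"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```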
For exporting files from OCI container images, those container images should be downloaded into the container image cache, and files should be loaded from the container image cache for export.
ethanjli changed the title from "Enable caching & loading of container images in Forklift's cache" to "caching: Enable caching & loading of container images in Forklift's cache" on Jun 14, 2024.
It would also be useful if we could hide all the complexity currently at https://github.com/PlanktoScope/PlanktoScope/blob/e3dbfc7ed25da6c0cd5e3159cf6880f3e3abda38/.github/workflows/build-os.yml#L161 in a GitHub Action for downloading (with caching) all container images required by a particular pallet.