
dockerTools.streamLayeredImage causing overeager layer invalidation due to mtime #254421

Closed
daniel-sampliner opened this issue Sep 10, 2023 · 4 comments

Comments

@daniel-sampliner

Describe the bug

Changing the `created` attribute invalidates all layers of the built image.
This results in a lot of wasted space and bandwidth storing and downloading the exact same contents (albeit with superficially different mtimes).
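
For clarity on the mechanism: each layer is a tarball whose headers embed per-file mtimes, so archiving byte-identical contents with a different mtime yields a different layer digest. A minimal standalone sketch of that effect (illustrative only, not nixpkgs code):

    # Two tarballs with identical file contents but different mtimes hash
    # differently, because the mtime is written into each tar header.
    import hashlib
    import io
    import tarfile

    def tar_digest(mtime: int) -> str:
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            data = b"hello"
            info = tarfile.TarInfo(name="hello.txt")
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
        return hashlib.sha256(buf.getvalue()).hexdigest()

    print(tar_digest(1693526401))  # 2023-09-01T00:00:01Z
    print(tar_digest(1693612801))  # 2023-09-02T00:00:01Z -> a different digest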

Steps To Reproduce

  1. Start with a simple default.nix:

    { pkgs ? import (fetchTarball "https://github.com/nixos/nixpkgs/archive/a1635b38216d7b04ee86aca6f1e43f7fe2da43bc.tar.gz") { } }:
    let
      mkSimpleImage = tag: created: pkgs.dockerTools.streamLayeredImage {
        inherit tag created;
        name = "test";
        contents = [ pkgs.bash ];
      };
    in
    {
      old = mkSimpleImage "old" "2023-09-01T00:00:01+00:00";
      new = mkSimpleImage "new" "2023-09-02T00:00:01+00:00";
    }
  2. Build and load the old image: `$(nix-build -A old) | podman image load`

  3. Build and load the new image: `$(nix-build -A new) | podman image load`

Now you can see that despite the images containing the exact same store paths, they do not share the associated layers:

:; diff -u \
  <(podman image inspect test:old | jq '.[0] | {RootFS, History}') \
  <(podman image inspect test:new | jq '.[0] | {RootFS, History}')
--- /proc/self/fd/11	2023-09-10 10:01:37.195629216 -0400
+++ /proc/self/fd/14	2023-09-10 10:01:37.195629216 -0400
@@ -2,37 +2,37 @@
   "RootFS": {
     "Type": "layers",
     "Layers": [
-      "sha256:950dfc183525b916ed0d566d816296f84a8ff3de55356e006e7188fd721163b3",
-      "sha256:ef1de47f02630a2a2023f1930fdd288f3c173a9f3127f7d83b2117e88d6a7c72",
-      "sha256:70dcc1307c02a7a11352c46ecbbf55e8d775f3daefbf3d83d7ba48c19e3dd1bc",
-      "sha256:d8a1865c296d9ebf11ee1415c372f633bafcd8fb97cd3f56d1a4d5934b053cb7",
-      "sha256:b2450631e5d1f86825b3a7cb0f086e77b934be4b7587ae6ac2575f1a1efa28e0",
+      "sha256:f574e510180be579e860f7b5555b3b861c3b4977d9cc41b6e3d9d38f76ef236e",
+      "sha256:07b482deeeb629eb58f057032e951d244ae4f5a997b59650b3bda9aa378c901f",
+      "sha256:ebec36ba9533f77ea9b70f4639b3b308424f886a7fe311c9e0ecda50a2a76285",
+      "sha256:2f623b46aa405cc3b3d2f559c247eeff75ef2abf74932ac0e02f1063c7bd466f",
+      "sha256:48a212285903e1ca60e7aa0bcf01c8305303060f55c6fc4044c68996f5e734d5",
       "sha256:d8d90d7bd650641d5c564a85ced16a2970181114390c123b0023bb7f94587c63"
     ]
   },
   "History": [
     {
-      "created": "2023-09-01T00:00:01Z",
+      "created": "2023-09-02T00:00:01Z",
       "comment": "store paths: ['/nix/store/gnzwqa9df994g01yw5x75qnbl1rhp9ds-libunistring-1.1']"
     },
     {
-      "created": "2023-09-01T00:00:01Z",
+      "created": "2023-09-02T00:00:01Z",
       "comment": "store paths: ['/nix/store/h3aw16j1c54jv8s39yvdhpfcx3538jwi-libidn2-2.3.4']"
     },
     {
-      "created": "2023-09-01T00:00:01Z",
+      "created": "2023-09-02T00:00:01Z",
       "comment": "store paths: ['/nix/store/kv0v4h5i911gj39m7n9q10k8r8gbn3sa-xgcc-12.3.0-libgcc']"
     },
     {
-      "created": "2023-09-01T00:00:01Z",
+      "created": "2023-09-02T00:00:01Z",
       "comment": "store paths: ['/nix/store/905gkx2q1pswixwmi1qfhfl6mik3f22l-glibc-2.37-8']"
     },
     {
-      "created": "2023-09-01T00:00:01Z",
+      "created": "2023-09-02T00:00:01Z",
       "comment": "store paths: ['/nix/store/m36d29gn5gm9bk0g7fcln1v8171hvn95-bash-5.2-p15']"
     },
     {
-      "created": "2023-09-01T00:00:01Z",
+      "created": "2023-09-02T00:00:01Z",
       "comment": "store paths: ['/nix/store/010qycr8cqgdd01f9b2bjfc71yab70zk-test-customisation-layer']"
     }
   ]

Expected behavior

Images share the layers with the same store paths.

Additional context

I can patch stream_layered_image.py to apply a fixed mtime to the layers:

diff --git a/pkgs/build-support/docker/stream_layered_image.py b/pkgs/build-support/docker/stream_layered_image.py
index d7c63eb43a78..a45d6266d5a5 100644
--- a/pkgs/build-support/docker/stream_layered_image.py
+++ b/pkgs/build-support/docker/stream_layered_image.py
@@ -336,7 +336,7 @@ def main():
         for num, store_layer in enumerate(conf["store_layers"], start=start):
             print("Creating layer", num, "from paths:", store_layer,
                   file=sys.stderr)
-            info = add_layer_dir(tar, store_layer, store_dir, mtime=mtime)
+            info = add_layer_dir(tar, store_layer, store_dir, mtime=1)
             layers.append(info)
 
         print("Creating layer", len(layers) + 1, "with customisation...",
@@ -345,7 +345,7 @@ def main():
           add_customisation_layer(
             tar,
             conf["customisation_layer"],
-            mtime=mtime
+            mtime=1
           )
         )
 
@@ -362,7 +362,7 @@ def main():
             },
             "history": [
                 {
-                  "created": datetime.isoformat(created),
+                  "created": "1970-01-01T00:00:01Z",
                   "comment": f"store paths: {layer.paths}"
                 }
                 for layer in layers

This has the intended effect of making these images share their layers, while still preserving the image creation time specified in the metadata:

:; diff -u \
  <(podman image inspect patched:old | jq '.[0] | {Created, RootFS, History}') \
  <(podman image inspect patched:new | jq '.[0] | {Created, RootFS, History}')
--- /proc/self/fd/11	2023-09-10 10:24:25.352906416 -0400
+++ /proc/self/fd/14	2023-09-10 10:24:25.352906416 -0400
@@ -1,5 +1,5 @@
 {
-  "Created": "2023-09-01T00:00:01Z",
+  "Created": "2023-09-02T00:00:01Z",
   "RootFS": {
     "Type": "layers",
     "Layers": [

However, I don't know whether this has negative impacts on reproducibility or otherwise.

Notify maintainers

@utdemir

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.43, NixOS, 23.05 (Stoat), 23.05.20230807.011567f`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.13.3`
 - nixpkgs: `/nix/store/24jqzdczdyg04kkgd71wvydc9yxr5ndh-source`
@utdemir
Member

utdemir commented Sep 11, 2023

@daniel-sampliner Thanks for the issue!

So, as you deduced, that created field affects two things:

  • The "Created" field on the image metadata
  • mtime of the files inside the layers.

The latter causes the sha256sums of the layers to change (as expected). I believe your issue can be solved if we split that field into two:

  • created: the "created" field in the image metadata
  • mtime: the modification times of the files inside the layers

That way you could set mtime to a constant value, and changes to created would no longer invalidate the layers. And if we make mtime default to created, this would be a backwards-compatible change.
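
For illustration, usage under that split might look roughly like this (a sketch only; the separate `mtime` attribute and its default are the proposal above, not an existing argument):

    pkgs.dockerTools.streamLayeredImage {
      name = "test";
      tag = "new";
      # only affects the image's "Created" metadata field
      created = "2023-09-02T00:00:01+00:00";
      # fixed file mtime inside the layers, so the layer digests stay stable
      mtime = "1970-01-01T00:00:01+00:00";
      contents = [ pkgs.bash ];
    }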

@daniel-sampliner, do you think this is a fair assessment of the problem? Would it solve your issue?

@daniel-sampliner
Author

Yeah that sounds great!

I do have one "intrusive thought" nagging me: is that much flexibility needed? Should it instead behave like the Nix store, which always fixes every file's mtime at 1970-01-01?
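
(For reference, any existing store path shows that fixed timestamp; e.g. `stat --format='%Y' /nix/store/m36d29gn5gm9bk0g7fcln1v8171hvn95-bash-5.2-p15` should print 1, i.e. one second past the epoch.)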

Granted, hardcoding mtime and only allowing created to be configurable would not be a backwards-compatible change; I may just be a couple years too late to the party to be asking about this 😆.

In any case, I have no qualms with what you've suggested, since it would definitely solve my problem.

@utdemir
Member

utdemir commented Sep 18, 2023

I've got a pretty simple implementation here, but I don't have access to a Linux machine and have to test it first. I might spend some time on it next weekend, but @daniel-sampliner, if you want to, you can take over the branch :).

@tomberek
Contributor

obe: #327579
