Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe #517

Open
someplaceguy opened this issue Aug 2, 2023 · 16 comments

Comments

@someplaceguy
Copy link

someplaceguy commented Aug 2, 2023

Environment

  • OS: NixOS 23.05
  • age version: 1.1.1

What were you trying to do

Trying to run tests as part of the package's build process.

What happened

A test failed.

building
Building subPackage .
Building subPackage ./agessh
Building subPackage ./armor
Building subPackage ./cmd/age
Building subPackage ./cmd/age-keygen
Building subPackage ./internal/bech32
Building subPackage ./internal/format
Building subPackage ./internal/plugin
Building subPackage ./internal/stream
running tests
ok      filippo.io/age  0.736s
ok      filippo.io/age/agessh   0.272s
ok      filippo.io/age/armor    0.016s
--- FAIL: TestScript (0.00s)
    --- FAIL: TestScript/plugin (0.06s)
        testscript.go:481: 
            # encrypt and decrypt a file with a test plugin (0.059s)
            > age -r age1test10qdmzv9q -o test.age input
            [stderr]
            age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe
            age: report unexpected or unhelpful errors at https://filippo.io/age/report
            [exit status 1]
            FAIL: testdata/plugin.txt:2: unexpected command failure
            
FAIL
FAIL    filippo.io/age/cmd/age  1.008s
FAIL

The build machine's CPU may have been overloaded at the time the tests were running (e.g. load average up to ~150 on a 16-core CPU), but it should not have caused an error AFAIK.

@ctheune
Copy link

ctheune commented Aug 4, 2023

This happens repeatably (without any load issue) for me on Darwin. NixOS is on unstable.

Darwin chazzzzz 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:22:52 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T8103 arm64

Here's the full build output.

unpacking sources
unpacking source archive /nix/store/fxs7r50g1c64gbvnz76pc4g0kcclapaj-source
source root is source
patching sources
applying patch /nix/store/16jna1lazm41ff0rdx6dsw1h8fpwgzm2-0001-revert-temporarily-disabled-testscript-tests.patch
patching file cmd/age/age_test.go
patching file go.mod
patching file go.sum
applying patch /nix/store/6zkzhrsr4scj1dxr5igd6c513x8i3a49-0002-depend-on-c2sp_cctv_age__TestVectors.patch
patching file go.mod
patching file go.sum
patching file testkit_test.go
configuring
building
Building subPackage .
Building subPackage ./agessh
Building subPackage ./armor
Building subPackage ./cmd/age
Building subPackage ./cmd/age-keygen
Building subPackage ./internal/bech32
Building subPackage ./internal/format
Building subPackage ./internal/plugin
Building subPackage ./internal/stream
running tests
ok      filippo.io/age  1.100s
ok      filippo.io/age/agessh   0.805s
ok      filippo.io/age/armor    0.490s
--- FAIL: TestScript (0.00s)
    --- FAIL: TestScript/plugin (0.08s)
        testscript.go:481:
            # encrypt and decrypt a file with a test plugin (0.065s)
            > age -r age1test10qdmzv9q -o test.age input
            [stderr]
            age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe
            age: report unexpected or unhelpful errors at https://filippo.io/age/report
            [exit status 1]
            FAIL: testdata/plugin.txt:2: unexpected command failure

FAIL
FAIL    filippo.io/age/cmd/age  2.166s
FAIL

I'm currently trying to figure out when this broke.

@ctheune
Copy link

ctheune commented Aug 4, 2023

@someplaceguy This /could/ also be a NixOS issue. Did you happen to already open a bug there?

@ctheune
Copy link

ctheune commented Aug 4, 2023

I just double-checked with the NixOS repo and our package definition hasn't changed in 8 months and has been on 1.1.1 since then: https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/security/age/default.nix

@ctheune
Copy link

ctheune commented Aug 4, 2023

It does look a bit like /something/ changed on our side:
https://hydra.nixos.org/log/3p7vyypyrsbys1r4wa8zv2h9ma3cjhhd-age-1.1.1.drv

@someplaceguy
Copy link
Author

someplaceguy commented Aug 4, 2023

@ctheune No, I run into these kind of non-deterministic test failures quite frequently (a few per week) due to the aforementioned CPU load issue. Apparently high CPU loads tend to be very good at finding bugs in software or even tests themselves.

All of these issues that I've ran into have been in the upstream software, never in NixOS.

But almost all of them are either test failures due to timeouts or race conditions in build systems (especially Makefiles), which I don't even bother open bugs for due to these being so common (I just disable tests or disable parallel building).

I've chosen to file this bug because it didn't seem to be either of these issues, but from superficially looking at the error, I would be surprised if the problem was in NixOS.

You don't seem to have a high CPU load and you also seem to be using NixOS, so perhaps this issue is different, I don't know.

@ctheune
Copy link

ctheune commented Aug 4, 2023

Yeah, could be. The issue seems to trigger upstream as well and it has been happening for me since yesterday and it happens repeatedly.

Here's the build that broke upstream. https://hydra.nixos.org/build/229581170

The changes that triggered the build are here:
https://hydra.nixos.org/api/scmdiff?branch=&rev2=9462344318b376e157c94fa60c20a25b913b2381&type=git&rev1=05f42aef4e9502a0d0ad79c9debd11d98565cfa7&uri=https%3A%2F%2Fgithub.com%2FNixOS%2Fnixpkgs.git

I checked the changes but none of the commit message seem suspicuous.

I'm going to disable the tests and see whether it actually works.

BTW: the aarch-build didn't break and the pure linux builds and I'm not quite sure whether I'm running my nix environment on MacOS with x86_64 and thus emulation ... o_O

@someplaceguy
Copy link
Author

someplaceguy commented Aug 4, 2023

I forgot to mention that once I rebuilt age, the issue disappeared for me.

But you seem to be running into this issue all the time, which added to this issue having 3 NixOS occurrences and no other occurrences in other OSes is quite puzzling indeed.

Perhaps an upgraded dependency is causing the problem? I really know almost nothing about age or its tests, and I don't really have much time to dive into it, so perhaps one of the developers can give more insight.

@ctheune
Copy link

ctheune commented Aug 4, 2023

So, it seems I ended up with a working version of age when disabling the tests, but the tests keep failing reliably for me.

Maybe pinging @bdd and @zowoq who touched it last could help.

@FiloSottile
Copy link
Owner

FiloSottile commented Aug 4, 2023

Hi! First of all, thank you for your work packaging age, and for reporting things upstream.

What that test does (under a layer of abstraction in testscript) is make a copy of the test binary into a temporary directory that's added to $PATH, execute the binary with different flags that make it behave as a plugin, and communicate with it over stdin/stdout. Does any of that sound like something that would clash with the NixOS env?

@someplaceguy
Copy link
Author

Does any of that sound like something that would clash with the NixOS env?

Theoretically it could, as NixOS runs package builds and tests under a sandbox, but according to the information you provided it seems unlikely to me (although it's hard to tell).

execute the binary with different flags that make it behave as a plugin, and communicate with it over stdin/stdout.

Again, I know almost nothing about age, but looking at this error message:

age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe

... at first sight, if I'm interpreting the error correctly, it kinda appears that the test plugin exited prematurely because it was trying to write to its stdout (hence the write mention) but age must have closed its end of the pipe (hence the broken pipe error), which is why the write didn't succeed but instead aborted the execution of the test plugin.

It'd be interesting to figure out for what reasons age could be closing its end of the pipe, if that's the case.

@someplaceguy
Copy link
Author

someplaceguy commented Aug 4, 2023

Depending on the code, another possibility is that the write and broken pipe errors really are coming from age, which would just mean that the test plugin is exiting prematurely (or at least, closing its end of the pipe) for some reason, even though the age parent process could be working fine.

@bdd
Copy link
Contributor

bdd commented Aug 4, 2023

I cannot reproduce this on {x86_64, aarch64}-linux or aarch64-darwin. Neither from the trunk, nor from nixpkgs-23.05(-darwin) branches.

Looking at Hydra, this seems to fail more frequently on x86_64-darwin, rarely for aarch64-darwin, and seemingly never for Linux (x86_64, aarch64).

Sounds like @ctheune can repro on their aarch64-darwin box.
I changed the derivation a bit (only running the suspect test) and patched age test suite to set AGEDEBUG=plugin envvar for the testscript runnner, so maybe we can catch how it fails. Error in Hydra builds is bubbling up from age/age.go:238 when it's trying to wrap a file key for recipient, talking to the plugin. This happens over a pipe where plugin's input is stdin, and output is stdout. In the test case, this plugin is simply implemented to return the key it received, without wrapping (not that it matters); and this happens over a pipe. Maybe for some reason age-test-plugin process used in the tests is dying. I'm hoping to see a bit more with the patch.

Can you please try:

nix build --print-build-logs --no-link github:bdd/nixpkgs/age-test-plugin#age

...and paste the output of the failure?

If it doesn't fail in the first case, you can try rebuilding by adding --rebuild to the nix build command above.

@someplaceguy Can you try the same? I'm even more interested in seeing how this fails under Linux (as you said you're running NixOS 23.05).

@someplaceguy
Copy link
Author

someplaceguy commented Aug 5, 2023

@bdd I tried both with and without your commit but I can't reproduce this issue.

@ctheune
Copy link

ctheune commented Aug 9, 2023

@bdd Thanks for investigating! See my output below.

@FiloSottile yeah, the mysterious part is that it used to work and age didn't change. I was wondering whether a dependency change in Go-land might be triggering this.

age> unpacking sources
age> unpacking source archive /nix/store/fxs7r50g1c64gbvnz76pc4g0kcclapaj-source
age> source root is source
age> patching sources
age> applying patch /nix/store/16jna1lazm41ff0rdx6dsw1h8fpwgzm2-0001-revert-temporarily-disabled-testscript-tests.patch
age> patching file cmd/age/age_test.go
age> patching file go.mod
age> patching file go.sum
age> applying patch /nix/store/6zkzhrsr4scj1dxr5igd6c513x8i3a49-0002-depend-on-c2sp_cctv_age__TestVectors.patch
age> patching file go.mod
age> patching file go.sum
age> patching file testkit_test.go
age> applying patch /nix/store/27p8sx5v262x6dr93l94219ibkj3cclw-test-agedebug-plugin.patch
age> patching file cmd/age/testdata/plugin.txt
age> configuring
age> building
age> Building subPackage .
age> Building subPackage ./agessh
age> Building subPackage ./armor
age> Building subPackage ./cmd/age
age> Building subPackage ./cmd/age-keygen
age> Building subPackage ./internal/bech32
age> Building subPackage ./internal/format
age> Building subPackage ./internal/plugin
age> Building subPackage ./internal/stream
age> running tests
age> testing: warning: no tests to run
age> PASS
age> ok         filippo.io/age  0.498s [no tests to run]
age> testing: warning: no tests to run
age> PASS
age> ok         filippo.io/age/agessh   0.464s [no tests to run]
age> testing: warning: no tests to run
age> PASS
age> ok         filippo.io/age/armor    0.442s [no tests to run]
age> === RUN   TestScript
age> === RUN   TestScript/plugin
age> === PAUSE TestScript/plugin
age> === CONT  TestScript/plugin
age>     testscript.go:481:
age>         WORK=$WORK
age>         PATH=/private/tmp/nix-build-age-1.1.1.drv-0/testscript-main4004674410/bin:/nix/store/drw07a5dnh7xyxyh9iq1k4qlhnz80qf8-go-1.20.6/share/go/bin:/nix/store/drw07a5dnh7xyxyh9iq1k4qlhnz80qf8-go-1.20.6/bin:/nix/store/v3b9q8mi1ybz7bl75w1p6p3w6aw4vfyr-clang-wrapper-11.1.0/bin:/nix/store/5j3zwxjziy6hcpaf9wlcj7qqq0w78wfi-clang-11.1.0/bin:/nix/store/vjjplvzy60qz3nznrff6s8zvb80m3bif-coreutils-9.3/bin:/nix/store/xfy2lmwhipxbpr0zxpyhdnlkbnr388bg-cctools-binutils-darwin-wrapper-11.1.0-973.0.1/bin:/nix/store/snxjdcdka4p7af454n3v7mdc4yh1lywn-cctools-binutils-darwin-11.1.0-973.0.1/bin:/nix/store/vjjplvzy60qz3nznrff6s8zvb80m3bif-coreutils-9.3/bin:/nix/store/k5abm4s2pgs0xg64lligwfprzafg8cdf-findutils-4.9.0/bin:/nix/store/hmscngdly85w04lb5cn32jcfqyg61kfq-diffutils-3.10/bin:/nix/store/3c0v61mw26d766jfdqg2bqmb32pcl40n-gnused-4.9/bin:/nix/store/hdq4m97ma7rk9srn8f0myfhhlh2inzm5-gnugrep-3.11/bin:/nix/store/8cmmw5frzhs5nzpkvfhdih6d7b3g7zqw-gawk-5.2.2/bin:/nix/store/q2csqxdlnjinz6fsvg751wryhxbnk9sw-gnutar-1.34/bin:/nix/store/397iphc81akvp54z9lsrfqly4fj56d08-gzip-1.12/bin:/nix/store/p742p1s409j1yvb0ywvk258fjipzsqd9-bzip2-1.0.8-bin/bin:/nix/store/ch57psmgq8rg5f0i0cvp07d0zfndjf8x-gnumake-4.4.1/bin:/nix/store/xx5s0rkx3aav3f850ig4afadg6ms7lq9-bash-5.2-p15/bin:/nix/store/psxc626ll2zkjkalyybkyzcxvkgr6nbx-patch-2.7.6/bin:/nix/store/5ym0a90mnlw4gi81wpraa370wi32r075-xz-5.4.3-bin/bin:/nix/store/rxr4dqbh16y2v2c7fchcnlsfqkdr8bw9-file-5.44/bin
age>         GOTRACEBACK=system
age>         HOME=/no-home
age>         TMPDIR=$WORK/.tmp
age>         devnull=/dev/null
age>         /=/
age>         :=:
age>         $=$
age>         TESTSCRIPT_COVER_DIR=
age>         exe=
age>         > env AGEDEBUG=plugin
age>         # encrypt and decrypt a file with a test plugin (0.686s)
age>         > age -r age1test10qdmzv9q -o test.age input
age>         [stderr]
age>         -> add-recipient age1test10qdmzv9q
age>         -> wrap-file-key
age>         HL9K3fpwqlOHvr2sAMWCTg
age>         -> done
age>         -> recipient-stanza 0 test
age>         HL9K3fpwqlOHvr2sAMWCTg
age>         -> done
age>         age: error: failed to wrap key for recipient #0: test plugin: write |1: broken pipe
age>         age: report unexpected or unhelpful errors at https://filippo.io/age/report
age>         [exit status 1]
age>         FAIL: testdata/plugin.txt:4: unexpected command failure
age> --- FAIL: TestScript (0.00s)
age>     --- FAIL: TestScript/plugin (0.69s)
age> FAIL
age> FAIL       filippo.io/age/cmd/age  1.176s
age> FAIL
error: builder for '/nix/store/54krlsbx5b05x8ly3b7q37y5k24jhza1-age-1.1.1.drv' failed with exit code 1

@squirmy
Copy link

squirmy commented Aug 15, 2023

One extra data point here. It works fine for me on aarch64-darwin. It does not work for me inside GitHub actions on x86_64-darwin.

So, it seems I ended up with a working version of age when disabling the tests, but the tests keep failing reliably for me.

@ctheune would you mind sharing your solution for disabling the tests?

I've tried:

ageWithoutTests = pkgs.age.overrideAttrs (_: {
  doCheck = false;
});

But this doesn't seem to work around the issue for me.

@ctheune
Copy link

ctheune commented Sep 12, 2023

Sure. I'm using

    (age.overrideAttrs (finalAttrs: previousAttrs: {
      doCheck = false; # https://github.com/FiloSottile/age/issues/517
    }))

YorikSar added a commit to YorikSar/nixpkgs that referenced this issue Nov 6, 2023
See issue in upstream FiloSottile/age#517.
age sometimes fails this test, but sometimes it doesn't.
Disable this test until the upstream issue is resolved.
nyabinary pushed a commit to nyabinary/nixpkgs that referenced this issue Nov 14, 2023
See issue in upstream FiloSottile/age#517.
age sometimes fails this test, but sometimes it doesn't.
Disable this test until the upstream issue is resolved.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants