Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing to detect SSDs in copyDir should not be a fatal error. #3653

Merged
merged 6 commits into from
Oct 30, 2023

Conversation

cmacknz
Copy link
Member

@cmacknz cmacknz commented Oct 24, 2023

The migration of the agent CI system to buildkite has resulted in a new pipeline with new Mac workers. On those workers there is a new test failure: https://buildkite.com/elastic/elastic-agent/builds/4344#018b6184-da97-4130-8dcf-60f543c6c94a/103-588

2023-10-24 11:54:42 UTC | === FAIL: internal/pkg/agent/application/upgrade Test_CopyFile/Existing_but_open,_ignore_errors (0.13s)
-- | --
  | 2023-10-24 11:54:42 UTC | upgrade_test.go:102:
  | 2023-10-24 11:54:42 UTC | Error Trace:	/Users/admin/builds/bk-agent-prod-orka-1698148180680922232/elastic/elastic-agent/internal/pkg/agent/application/upgrade/upgrade_test.go:102
  | 2023-10-24 11:54:42 UTC | Error:      	Not equal:
  | 2023-10-24 11:54:42 UTC | expected: false
  | 2023-10-24 11:54:42 UTC | actual  : true
  | 2023-10-24 11:54:42 UTC | Test:       	Test_CopyFile/Existing_but_open,_ignore_errors
  | 2023-10-24 11:54:42 UTC | Messages:   	ghw.Block() returned error: ioreg unmarshal resulted in 2 I/O device tree nodes

The ghw.Block function failing is from https://github.com/jaypipes/ghw which claims directly in its description:

ghw is a Go library providing hardware inspection and discovery for Linux and Windows. There currently exists partial support for MacOSX.

My guess is that this is a limitation of that library where it is failing to deal with the hardware configuration of the Mac workers here (probably there is some network attached storage through it off).

Regardless of the root cause, this isn't actually a fatal error since we can proceed with the default concurrency level of 1. Especially since this is part of the upgrade process, we should just fall back to a default and attempt the copy and let that fail instead if that is what is going to happen.

@cmacknz cmacknz added Team:Elastic-Agent Label for the Agent team backport-v8.11.0 Automated backport with mergify labels Oct 24, 2023
@cmacknz cmacknz self-assigned this Oct 24, 2023
@cmacknz cmacknz requested a review from a team as a code owner October 24, 2023 18:00
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@cmacknz cmacknz added skip-changelog and removed backport-v8.11.0 Automated backport with mergify labels Oct 24, 2023
@cmacknz
Copy link
Member Author

cmacknz commented Oct 24, 2023

Not backporting since the original change isn't in 8.11 #3212

@mergify
Copy link
Contributor

mergify bot commented Oct 24, 2023

This pull request does not have a backport label. Could you fix it @cmacknz? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@elasticmachine
Copy link
Contributor

elasticmachine commented Oct 24, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-10-26T17:00:14.076+0000

  • Duration: 29 min 31 sec

Test stats 🧪

Test Results
Failed 0
Passed 6553
Skipped 59
Total 6612

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Contributor

elasticmachine commented Oct 24, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.824% (84/85) 👍
Files 66.885% (204/305) 👍
Classes 65.95% (368/558) 👍
Methods 53.016% (1160/2188) 👎 -0.024
Lines 39.363% (13665/34715) 👍 0.004
Conditionals 100.0% (0/0) 💚

@cmacknz
Copy link
Member Author

cmacknz commented Oct 25, 2023

buildkite test it

@cmacknz
Copy link
Member Author

cmacknz commented Oct 25, 2023

--
  | 2023-10-25 20:54:04 UTC | /usr/local/lib/ruby/gems/3.1.0/gems/rexml-3.2.5/lib/rexml/parsers/treeparser.rb:96:in `rescue in parse': #<RuntimeError: Illegal character "\\u0000" in raw string "\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u000
...
--
  | 2023-10-25 20:54:05 UTC | from /usr/local/lib/ruby/gems/3.1.0/gems/rexml-3.2.5/lib/rexml/text.rb:136:in `each'
  | 2023-10-25 20:54:05 UTC | from /usr/local/lib/ruby/gems/3.1.0/gems/rexml-3.2.5/lib/rexml/text.rb:136:in `check'
  | 2023-10-25 20:54:05 UTC | from /usr/local/lib/ruby/gems/3.1.0/gems/rexml-3.2.5/lib/rexml/text.rb:122:in `initialize'
  | 2023-10-25 20:54:05 UTC | from /usr/local/lib/ruby/gems/3.1.0/gems/rexml-3.2.5/lib/rexml/parsers/treeparser.rb:47:in `new'
  | 2023-10-25 20:54:05 UTC | from /usr/local/lib/ruby/gems/3.1.0/gems/rexml-3.2.5/lib/rexml/parsers/treeparser.rb:47:in `parse'
  | 2023-10-25 20:54:05 UTC | from /usr/local/lib/ruby/gems/3.1.0/gems/rexml-3.2.5/lib/rexml/document.rb:448:in `build'
  | 2023-10-25 20:54:05 UTC | from /usr/local/lib/ruby/gems/3.1.0/gems/rexml-3.2.5/lib/rexml/document.rb:101:in `initialize'
  | 2023-10-25 20:54:05 UTC | from /src/bin/annotate:59:in `new'
  | 2023-10-25 20:54:05 UTC | from /src/bin/annotate:59:in `block in <main>'
  | 2023-10-25 20:54:05 UTC | from /src/bin/annotate:53:in `each'
  | 2023-10-25 20:54:05 UTC | from /src/bin/annotate:53:in `<main>'
  | 2023-10-25 20:54:05 UTC | 💥 Error when processing JUnit tests

That's a new one

@cmacknz
Copy link
Member Author

cmacknz commented Oct 25, 2023

buildkite test it

Copy link
Member

@pchila pchila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pchila
Copy link
Member

pchila commented Oct 26, 2023

/test

@cmacknz
Copy link
Member Author

cmacknz commented Oct 26, 2023

(linux-amd64-ubuntu-2204) Failed for instance linux-amd64-ubuntu-2204 (@ 34.16.59.95): ogc-linux-amd64-ubuntu-2204-00df unable to continue because stack never became ready: failed to check for cloud 8.12.0-SNAPSHOT to be ready: context deadline exceeded
--

@cmacknz
Copy link
Member Author

cmacknz commented Oct 26, 2023

buildkite test it

@mergify
Copy link
Contributor

mergify bot commented Oct 26, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b block-device-detection-nonfatal upstream/block-device-detection-nonfatal
git merge upstream/main
git push upstream block-device-detection-nonfatal

@cmacknz
Copy link
Member Author

cmacknz commented Oct 26, 2023

This also affects the install command which is now fixed.

Document that errors are not fatal.
@cmacknz
Copy link
Member Author

cmacknz commented Oct 26, 2023

I moved all uses of ghw.Block into one place so that this problem is less likely to return.

@elastic-sonarqube
Copy link

SonarQube Quality Gate

Quality Gate failed

Failed condition 20.0% 20.0% Coverage on New Code (is less than 40%)

See analysis details on SonarQube

@cmacknz
Copy link
Member Author

cmacknz commented Oct 27, 2023

I am going to force merge this for the same reason as #3623 (comment), the install code is covered by integration tests.

@cmacknz cmacknz merged commit 3d8bdf4 into elastic:main Oct 30, 2023
20 of 21 checks passed
@cmacknz cmacknz deleted the block-device-detection-nonfatal branch October 30, 2023 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants