Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sosreport: Add fallback error message, fix testInsightsClient failure on rhel4edge #19104

Merged
merged 3 commits into from
Jul 19, 2023

Conversation

martinpitt
Copy link
Member

If sos report fails, we can get an empty message in error. That will lead to not showing the error in the dialog at all (as it's falsy), even if errorDetail was set. Fix that by adding a generic fallback title.

Also log the full error for easier debugging.


I noticed this while investigating the current sos rawhide failure. This looks really unfriendly -- you click "Report", the dialog flickers and nothing happens. Now it shows the error properly:

sos

@martinpitt martinpitt added no-test For doc/workflow changes, or experiments which don't need a full CI run, and removed no-test For doc/workflow changes, or experiments which don't need a full CI run, labels Jul 18, 2023
@martinpitt martinpitt added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Jul 18, 2023
@martinpitt martinpitt marked this pull request as draft July 18, 2023 07:53
@martinpitt
Copy link
Member Author

rhel-8-9 does not seem to pick up the mock /usr/local/bin/sos. That's because sos lives in /usr/sbin, and it has a different $PATH preference. Also, the C bridge has a different error.

Fixed both.

@martinpitt martinpitt removed the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Jul 18, 2023
@marusak
Copy link
Member

marusak commented Jul 18, 2023

ah nice, thanks! My first thought when I checked screenshots in the naughty was, why is there no error in the dialog

@martinpitt
Copy link
Member Author

martinpitt commented Jul 18, 2023

This failure looks unrelated, but repeats ND@6 did not run any SOS related test. Retrying to compare.

I ran the test locally by itself four times, and they all passed. I also ran the exact same list of ND@6 tests locally:

TEST_OS=rhel4edge test/common/run-tests --no-retry-fail --test-dir test/verify/ TestAccounts.testGroupCreate TestConnection.testHeadRequest TestCurrentMetrics.testMemory TestMenu.testDarkThemeSwitcher TestNondestructiveExample.testOne TestPackages.testBasic TestRoles.testBasic TestServices.testBasic TestServices.testRelationshipsUser TestServices.testServicesThemeConsistency TestServices.testTransientUnits TestSystemInfo.testInsightsStatus

These also passed. But on CI it failed again.

I added two commits which may help.

It also happens on other PRs, so this is unrelated. It may have come with cockpit-project/bots#5015 (yesterday's image refresh), but the test succeeded there.

This debug log shows that there are no insights related processes, and fuser didn't find anything either.

This log brings some insights (haha):

8556 /bin/sh -c #! /bin/sh  # Poll until /var/lib/insights/insights-details.json is 5 minutes # older than /etc/insights-client/.lastupload, then exit.  # Calling "insights-details --check-results" only returns results # corresponding to the most recent upload some time after the upload, # but we don't know when exactly.  We assume that it will not take # more than 5 minutes.  However, we also want the results as soon as # they are available so we poll a couple of time before the 5 minutes # are up.  # We poll fast for the first minute, and then slow down.  Also, there # is a absolute limit on how often we poll, in case something is wrong # with the time stamps (such as .lastupload being from 5 years in # future).  set -eu  details_out_of_date () {     if ! [ -f /etc/insights-client/.lastupload ]; then         return 1     fi     if ! [ -f /var/lib/insights/insights-details.json ]; then         return 0     fi     last_upload=$(stat -c "%Y" /etc/insights-client/.lastupload)     details=$(stat -c "%Y" /var/lib/insights/insights-details.json)     [ $details -lt $(expr $last_upload + 300) ] }  tries=0 while [ $tries -lt 20 ] && details_out_of_date; do     insights-client --check-results     if [ $tries -lt 5 ]; then         sleep 10     else         sleep 60     fi     tries=$(expr $tries + 1) done  --

I can't find the source of this anywhere. So one more round..

This reproduces reliably on a CI machine. I don't yet know what the difference is between that and local tasks toolbox.

@martinpitt martinpitt added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Jul 18, 2023
@martinpitt martinpitt removed the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Jul 19, 2023
@martinpitt martinpitt changed the title sosreport: Add fallback error message sosreport: Add fallback error message, fix testInsightsClient failure on rhel4edge Jul 19, 2023
@martinpitt martinpitt marked this pull request as ready for review July 19, 2023 06:06
@martinpitt martinpitt requested a review from marusak July 19, 2023 06:41
marusak
marusak previously approved these changes Jul 19, 2023
Copy link
Member

@marusak marusak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just one comment but not a good reason to block this :)

If `sos report` fails, we can get an empty message in `error`. That will
lead to not showing the error in the dialog at all (as it's falsy), even
if errorDetail was set. Fix that by adding a generic fallback title.

Also log the full error for easier debugging.
stdout gets captured by Machine.execute(). Write the fuser output to
stderr so that we can actually see it.

Also cover the directory itself, not just its contents.
Our directory watches keep /etc/insights-client busy, so that it cannot
be unmounted during cleanup. Explicitly log out to prevent that.
@martinpitt martinpitt requested a review from marusak July 19, 2023 06:52
@martinpitt martinpitt added the flake unstable test label Jul 19, 2023
Copy link
Member

@marusak marusak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Danke!

@martinpitt martinpitt merged commit 7d5ffd7 into cockpit-project:main Jul 19, 2023
27 of 31 checks passed
@martinpitt martinpitt deleted the sos branch July 19, 2023 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flake unstable test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants