Notify of SMART errors globally #1134

Closed
andreasn opened this issue Sep 8, 2014 · 13 comments

Comments

@andreasn
Contributor

andreasn commented Sep 8, 2014

Maybe a duplicate of another issue I've since forgotten :)
Filing because it came up in a conversation.

We currently have no good way in Cockpit to notify of errors globally. If something really severe is happening to the system, it needs to be communicated clearly. This could be things like services crashing, hard drives failing, or network connections suddenly going missing. In short: things that will impact the passengers on the bus and that need to be acted upon ASAP, to go with that metaphor.

Patternfly recommends this pattern for notifications: https://www.patternfly.org/wikis/patterns/pattern-development/draft-patterns/global-notifications/

If this sounds good, I'll go ahead and create a wiki page for it.

@andreasn
Contributor Author

andreasn commented Sep 8, 2014

@martinpitt
Member

All of the links here are 404, but I suppose it should be easy to find their current URLs. However, we don't currently look for any global errors of that sort. Adding SMART alerts for hard drives to the System front page does sound like a good idea. Crashing services or "suddenly" absent network connections are a different beast though, as these are not well-defined and we don't keep state between Cockpit sessions.

@andreasn, WDYT of repurposing this to show SMART alerts? That would make it a concrete and tangible issue.

@andreasn
Contributor Author

Updated links to the current notification pages, now that these patterns are not drafts any more:
https://www.patternfly.org/pattern-library/communication/notification-drawer/
https://www.patternfly.org/pattern-library/communication/toast-notifications/

@martinpitt changed the title from "Notify of errors globally" to "Notify of SMART errors globally" on Jan 15, 2019

@martinpitt
Member

Thanks. Retitling accordingly, so that this becomes actionable.

@NanoSector

Whoops, I made a new feature request after I found this, thinking this was just a way of alerting users and not for showing SMART data in general.

I'll re-drop my two cents here though. While Cockpit shows some kind of status, adding SMART data the way the GNOME Disks utility does would make the disk information page much more useful. (The drive I mentioned in my original report has since failed, and Cockpit still shows it as OK.)

For reference, this is how the GNOME Disks overview looks:
[Image: screenshot of the GNOME Disks SMART data overview]

@marusak
Member

marusak commented Dec 5, 2020

> thinking this was just a way of alerting users and not for showing SMART data in general.

Yeah, this issue is worded as if it were only about showing warnings, but showing warnings without also showing an overview would make little sense, so I consider this issue to cover that as well.

(editing a bit, thought I was commenting on a different issue)

@andreasn
Contributor Author

andreasn commented Dec 6, 2020

Looking at the screenshot above, the really interesting bit is whether anything in the Assessment column is not marked as OK. That needs to be shown without having to dive into every single disk. One place we could show that would be in the Health card on the overview page. But maybe also on the Storage page next to each disk?
The details could go onto the pages of individual disks.

@NanoSector

@andreasn GNOME Disks shows a one-line summary in the "Overall assessment" field. Maybe this can be shown in the Health card whenever it is not simply "OK", as it is in the screenshot?

For instance, if a disk has bad sectors, the overall assessment would show "Disk is OK, X bad sectors", where X is a number. While the disk might be fine for now, bad sectors usually indicate an impending failure, which is especially important for servers, and thus for Cockpit.
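
To make the one-line summary idea concrete, here is a minimal sketch of how such a line could be derived. It is not how GNOME Disks or Cockpit does it. It assumes the input is a dict shaped like the JSON that `smartctl --json` emits (smartmontools 7.0 or later), and that ATA attribute ID 5 holds the reallocated sector count, which not every drive reports. The function names and the summary wording are made up for illustration.

```python
from typing import Any

# Assumption: ATA attribute 5 is the reallocated sector count.
# Vendors differ, and NVMe drives do not have this table at all.
REALLOCATED_SECTOR_COUNT = 5


def bad_sector_count(report: dict[str, Any]) -> int:
    """Return the raw reallocated-sector count, or 0 if it is not reported."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    for attr in table:
        if attr.get("id") == REALLOCATED_SECTOR_COUNT:
            return int(attr.get("raw", {}).get("value", 0))
    return 0


def assessment_summary(report: dict[str, Any]) -> str:
    """Build a one-line summary in the style quoted above (hypothetical wording)."""
    passed = report.get("smart_status", {}).get("passed")
    if passed is None:
        return "SMART status unavailable"
    if not passed:
        return "Disk is likely to fail soon"
    bad = bad_sector_count(report)
    if bad > 0:
        noun = "sector" if bad == 1 else "sectors"
        return f"Disk is OK, {bad} bad {noun}"
    return "Disk is OK"


def needs_attention(report: dict[str, Any]) -> bool:
    """True when the summary is anything other than a plain 'Disk is OK'."""
    return assessment_summary(report) != "Disk is OK"
```

A Health card could then show the summary only for disks where `needs_attention()` returns true, and leave the full details to the individual disk pages.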

@KKoukiou
Contributor

`pkg/lib/notifications.js` already does this: we use `page_status` to present a failed status on the overview page or in the shell.

There is another issue discussing the same feature request, so I'm closing this as a duplicate of #11437.

@markwort

Hey, I'm wondering if this ticket should be kept open to track the state of SMART monitoring in cockpit?

Looking through the issues, I found #15010 and that was closed, stating it was a duplicate of this ticket.
#15010

I think at least a simple overview of SMART data would be great, even if I had to manually trigger that, for example in the storage panel, after selecting a particular disk. I'd even take verbatim output from a smartctl run somewhere behind the scenes...
It is difficult to cover everything, as there is no real standard for what is reported or how it is reported. For example, different vendors (and even different drives from the same vendor) might report the total amount of data written in different ways, so it might not be possible to have sensible "alerts" for that...
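
As a rough sketch of the "verbatim output behind the scenes" option, something like the following could run `smartctl --all` for a selected disk and hand back the unparsed text. One detail worth knowing is that smartctl's exit status is a bitmask, so a non-zero status does not necessarily mean the command failed. The bit meanings below follow the smartctl(8) man page; the function names are hypothetical, and running smartctl typically requires root.

```python
import subprocess

# One message per bit of smartctl's exit status (bit 0 first).
EXIT_BITS = [
    "command line did not parse",
    "device open failed",
    "a SMART or ATA command failed",
    "SMART status: disk failing",
    "prefail attributes at or below threshold",
    "attributes were at or below threshold in the past",
    "device error log contains errors",
    "self-test log contains errors",
]


def decode_exit_status(status: int) -> list[str]:
    """Translate smartctl's exit status bitmask into readable messages."""
    return [msg for bit, msg in enumerate(EXIT_BITS) if status & (1 << bit)]


def raw_smart_report(device: str) -> tuple[str, list[str]]:
    """Run smartctl on one device and return (verbatim output, decoded flags).

    check=False is deliberate: a non-zero exit status can still come with a
    complete, useful report.
    """
    proc = subprocess.run(
        ["smartctl", "--all", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.stdout, decode_exit_status(proc.returncode)
```

The text could be shown as-is on the disk's page, with the decoded flags as the only part that is interpreted.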

@garrett
Member

garrett commented Aug 18, 2021

SMART info should be:

  1. in the storage page as information when viewing a disk
  2. in the storage page as errors/warnings
  3. in the overview as a message

It's definitely part of #11437, but requires special SMART-specific work to be implemented.

Issue #15460 tracks reporting disk errors, including SMART errors, in Cockpit at both locations. That would cover items 2 and 3, but not item 1, which is about simply displaying the current SMART information.

@garrett
Member

garrett commented Aug 18, 2021

@markwort: I agree with you and have re-opened #15010, as displaying the SMART status and information (even just the current high-level status that SMART reports) is distinct from displaying SMART warnings/errors.

As far as errors are concerned, SMART does have an overall status, even if different vendors have different details and thresholds. We can't reliably parse the extended information, but we can show it. The caveat is that, with some vendors especially, the raw values may lead someone to think a disk is failing when it isn't.
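
A minimal sketch of that split, assuming input shaped like `smartctl --json` output (smartmontools 7.0 or later): only the overall status drives an alert, while the vendor-specific attributes are passed through as display strings and never interpreted. `SmartView` and `build_view` are hypothetical names, not existing Cockpit code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SmartView:
    alert: bool      # driven only by the overall SMART status
    headline: str    # short text for a warning or overview message
    details: list[str] = field(default_factory=list)  # shown as-is, never interpreted


def build_view(report: dict[str, Any]) -> SmartView:
    """Separate the overall status (alerting) from vendor attributes (display only)."""
    passed = report.get("smart_status", {}).get("passed")
    if passed is None:
        headline, alert = "SMART status unknown", False
    elif passed:
        headline, alert = "SMART status: passed", False
    else:
        headline, alert = "SMART status: failed", True

    # Attribute rows are formatted for display without applying any thresholds.
    table = report.get("ata_smart_attributes", {}).get("table", [])
    details = [
        f"{attr.get('id')} {attr.get('name')}: {attr.get('raw', {}).get('string', '')}"
        for attr in table
    ]
    return SmartView(alert=alert, headline=headline, details=details)
```

Keeping `alert` independent of `details` means an odd-looking vendor value never raises a warning by itself; whether that is too conservative is a design question for the issues referenced above.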
