Proposal: Extend network bonding metrics #1604
Comments
Most of that makes sense to me. The only thing that I would probably leave out is the One other minor labeling suggestion would be to call it It might also be useful to make
We're using bonding too, it would be great to add |
Hi, we also use bonding heavily in our infrastructure and I would be very interested in exposing some of these metrics as well. What is the status of this work?
I don't think anything was done in that regard. Feel free to submit a PR. But please note that any procfs interactions that need to be added or changed should go into https://github.com/prometheus/procfs
I'm very sorry, but I still haven't gotten around to finishing the work on this. I started moving the existing parsing into procfs. I'm sharing my work-in-progress here, but it's far from complete, needs rebasing onto more recent changes, and the procfs changes are hacked into vendor/ instead of being made in the appropriate project: If anyone wants to pick this up, feel free to (maybe leave a short comment here). I am still interested in this, but cannot promise when I'll be able to finish it.
Came across this issue on our side today, would be nice if the info about aggregator ids could be implemented. The iface with id 3 is out of bond aggregation.
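To illustrate the check described here, a minimal Go sketch follows. It assumes the bond's active aggregator ID and each member's aggregator ID have already been read; the function name `outOfAggregation` and the interface names and IDs in `main` are hypothetical and not taken from any real host.

```go
package main

import (
	"fmt"
	"sort"
)

// outOfAggregation returns the names of all slaves whose aggregator ID
// differs from the bond's active aggregator ID, sorted by name.
// (Hypothetical helper, not part of node_exporter or procfs.)
func outOfAggregation(active int, slaves map[string]int) []string {
	var out []string
	for name, id := range slaves {
		if id != active {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Made-up example values: eth2 reports aggregator ID 3 while the
	// bond's active aggregator is 1, so eth2 is not part of the bundle.
	slaves := map[string]int{"eth0": 1, "eth1": 1, "eth2": 3}
	fmt.Println(outOfAggregation(1, slaves))
}
```

An alert could then fire whenever the returned list is non-empty, or whenever it contains every member of the bond.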
I came across this today looking into monitoring switch-side misconfigurations (LACP bond with no active members on the host side). I think I have some time next week to look at extending the procfs module to collect these statistics and the bonding collector to export them.
@bewing Are you working on this? If not, I can take this up.
I started on this a year ago, and got as far as opening some issues in related projects, identifying the need to flesh out test fixtures in #2347 so as to improve procfs via prometheus/procfs#439 to make the metrics available. The working directories are lost to the ether, and surely the ground under them has moved. It's back to a fresh start at this point, but maybe the actual code changes in the procfs pull request are still useful. I have not had time to work on this, and encourage others to make the attempt if they can.
node_exporter currently exposes details about network bonding, which is great. To be able to monitor more failure cases, we would need additional metrics which we haven't found in node_exporter yet:
Essentially, we would need the following metrics:
It could also make sense to add some more information at the same time. So far, we haven't required these in our alerting, but they may be useful nevertheless:
We currently use a shell script and node_exporter's textfile collector to fill this gap. However, I think it would be useful to support these metrics out of the box, especially since only /sys files need to be read.
I'd volunteer to work on PRs against procfs and node_exporter. I would suggest adding this to the existing bonding_linux.go as it is closely related. Looks like this would also imply converting the existing bonding collector to procfs.
Related: #841
@SuperQ @discordianfish @pgier What do you think? Does it make sense in general? Should we only implement the first two metrics or the more generic approach? Do the suggested names make sense?