Address spurious LB+RB log flood on APC BXnnnnMI devices #2565
That "\n" gets printed as "networkupstools#12" Signed-off-by: Jim Klimov <[email protected]>
…us_get() Signed-off-by: Jim Klimov <[email protected]>
…etworkupstools#2347] Signed-off-by: Jim Klimov <[email protected]>
…ay_sec et al [networkupstools#2347] Signed-off-by: Jim Klimov <[email protected]>
Converting to draft while this is being tested, so NUT CI won't rebuild it in vain against newer target branch as it evolves. |
Gentle bump. So many people complained about the issue, is anyone still interested in testing a prospective fix? :) |
I'm new to this, so please bear with me. To test, do I just need to clone this branch, compile, and run sudo make install in the drivers folder? The way I currently install NUT is by installing via apt first and then overwriting it with a manual compile and sudo make install. |
Hi Jim, thanks a lot for the effort put into making this work for everyone. |
Generally, yes. A finer approach is presented at https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests which refers to the list of dependencies per platform, explains how to configure the new build similarly to what your packages (or older custom builds) delivered, and describes how to test a new driver from the build workspace before installing it over your older build for "production" use (or not, if the test is unsuccessful). Surely it is not the only way to skin a cat, but it is the one best streamlined for exploratory custom builds. |
Code looking good as usual, Jim, one thing we might want to consider is if we should default to |
Well, given |
Signed-off-by: Jim Klimov <[email protected]>
Rolled out a testing package with this PR's code for the affected Unraid users today, hoping to hear back soon-ish. 😉 |
I just heard back with rudimentary logs from one affected user who thankfully tested the PR package for us. I'll see if I can get in touch with them again to raise the debug level for more verbose logfiles and also to wait longer...
(blanked out details are due to Unraid's log anonymization mechanisms) |
I can also confirm that the issue seems to persist with this update on my BX1600MI. Sorry if this is unrelated, but a different issue that was fixed before also seems to have returned.
|
It's not resolved in Unraid on the latest preview build. I'm still getting errors. My battery is also never reaching 100% anymore on this build. It seems to be stuck at 99%. Model: Back-UPS BX2200MI
|
Thanks for tests and logs, hope to figure out what mis-fired, when I get time to tend to this :\ People are welcome to look over the PR code, maybe some dumb typo breaks stuff. Regarding #2216 and OL+DISCHRG messages - that mentioned PR added a way to conceal these messages on devices where the owner knows (suspects) that this state combo means calibration (should set |
I think he meant the following behavior:
But, since the logs are quite far apart, it's possible that his device either doesn't report a battery.charge (in which case logs would by default fire for being longer than 30 seconds apart) or does some hover-charge thing between 99% and 100% and the log by default fires seeing a different battery.charge than before (notification 1 at 99% and notification 2 at 100%, rinse and repeat). Ideally, yes, |
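To make the two triggers described above concrete, here is a minimal sketch of a throttle that lets a message through either when enough time has passed or when the reported charge differs from the last logged one. The struct, the function name and the sentinel values are assumptions made for illustration; they are not taken from the NUT sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state for throttling a repeated log message; the names are
 * illustrative and do not come from the NUT sources. */
struct log_throttle {
    time_t last_logged;   /* 0 means "never logged yet" */
    int    last_charge;   /* battery.charge at the last message, -1 if unknown */
    int    throttle_sec;  /* minimum spacing, e.g. the 30 seconds mentioned above */
};

/* Return true if the message should be emitted now, updating the state.
 * charge < 0 means the device does not report battery.charge at all. */
static bool should_log(struct log_throttle *t, time_t now, int charge)
{
    assert(t != NULL);

    bool never   = (t->last_logged == 0);
    bool expired = !never && difftime(now, t->last_logged) >= t->throttle_sec;
    bool changed = (charge >= 0 && charge != t->last_charge);

    if (never || expired || changed) {
        t->last_logged = now;
        t->last_charge = charge;
        return true;
    }
    return false;
}
```

With this shape, a device hovering between 99% and 100% would trigger a message on every flip regardless of the time spacing, which matches the "rinse and repeat" pattern guessed at above.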
I might be remembering incorrectly but I think I didn't get any "OL+DISCHRG" messages in my logs while on the 2.8.2 stable release. Just now I've added the following 2 lines to my ups.conf file and I'll report back tomorrow if these help fix the issue(s).
The UPS showing 99% instead of 100% is a different issue I'm experiencing. Not sure if the UPS is doing some kind of hover-charge, but the UPS never reports above 99%. There is one way to get it to show 100%, and that is reconnecting the USB cable when the UPS is fully charged. If the UPS ever gets below 100% because of a power outage, it will once again become stuck at 99%. Restarting NUT or even rebooting the server itself doesn't fix this, and it will keep reporting 99%. |
Got two more reports from affected users testing this PR:
|
After setting lbrb_log_delay_without_calibrating = 1 and onlinedischarge_calibration = 1 in ups.conf I'm no longer getting LOWBATT/REPLACEBATT events. Below you can find the NUT details and all NUT related logs from roughly the last 24 hours. If you want me to test anything else please let me know.
|
Now, this sounds very promising, thanks! So I suppose the |
I also think this looks very positive, the auto detection mechanism seems to work too except for the one case with a BXnnnMI instead of a BXnnnnMI (although I can't see a reason why it should fail there code-wise). The options are there regardless of any auto detection woes, so it may at most require some fine tuning from the user to their respective device's behavior. This could also prove helpful for other UPS devices with odd status behavior, so definitely deserves its place being added to the NEWS and respective documentation files. |
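For readers wondering what model-based defaulting could look like, below is a rough sketch that accepts "BX", any run of digits, then "MI", so three-digit and four-digit names are treated alike. It is not the detection code from this PR, and both function names are invented for the example; only the 3-second and 0-second defaults come from the PR description.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative only: returns true if `model` contains "BX", then one or more
 * digits, then "MI" (for example "Back-UPS BX1600MI"). */
static bool looks_like_bx_mi(const char *model)
{
    if (model == NULL)
        return false;

    for (const char *p = strstr(model, "BX"); p != NULL; p = strstr(p + 1, "BX")) {
        const char *q = p + 2;
        size_t digits = 0;

        while (isdigit((unsigned char)*q)) {
            q++;
            digits++;
        }
        if (digits > 0 && strncmp(q, "MI", 2) == 0)
            return true;
    }
    return false;
}

/* Hypothetical default: 3 seconds for the matched line-up, 0 (immediate
 * status propagation) for everything else. */
static int default_lbrb_log_delay_sec(const char *model)
{
    return looks_like_bx_mi(model) ? 3 : 0;
}
```

Whatever the detection does, an explicit setting in ups.conf would still take precedence, so a missed match costs the user one configuration line at most.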
Fun fact: this issue and PR got onto the working table along with #2564 for unrelated line-up of other APC devices :) |
Signed-off-by: Jim Klimov <[email protected]>
…r spurious LOWBATT/REPLACEBATT events on APC BXnnnnMI devices [networkupstools#2347] Signed-off-by: Jim Klimov <[email protected]>
Bumped the documentation and log-message suggestions. The latter is quite a monster of elvis operators now, would benefit from a bit of cosmetic testing :D |
…etworkupstools#2437] Signed-off-by: Jim Klimov <[email protected]>
… tweaks since 2023)" [networkupstools#2347] Signed-off-by: Jim Klimov <[email protected]>
The CI faults are due to a change with an agent after an upgrade (lacked 32-bit libs for some dependencies now). |
Tested the monster message printer, works well but relies on math a bit (that the `testvar` findings are exactly 0 or 1) so will poke that a bit later.
Not sure why CI builds that code path where it failed due to missing libs, by config it should not have. |
…actly 0/1 [networkupstools#2347] When building a complex text expression, we rely on maths in some spots. Signed-off-by: Jim Klimov <[email protected]>
…etection Signed-off-by: Jim Klimov <[email protected]>
Signed-off-by: Jim Klimov <[email protected]>
…ctually build a graphical program Namely, that further third-party libs are available for the chosen architecture, not only the headers. Had a problem with 32/64-bit build agent that only had a binary lib*.so set for 64-bit after an update. Signed-off-by: Jim Klimov <[email protected]>
Thanks for the additions, is there a process for testing drivers and such hardware-specific code without actually having the affected hardware (which I assume you don't either)? Would love to help out with testing such code, but not sure how to go about that with the drivers (dummy-ups wouldn't work for a specific driver, right?). That message printer, as an example. |
In this case, I copy-pasted the block as a C program, replacing `upsdebugx`
with `printf` and fiddling with the `got_*` flags that already were `int`s
cached to avoid many `testvar()` calls :)
So not much of an established process yet, although there are some
precedents in `tests/*.c{,pp}` files about poking into e.g. `driver/main.c`
code for a semblance of unit tests...
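A stand-alone harness along those lines might look like the sketch below. The message wording, the function name and the choice of three flags are made up for illustration (the real printer is more elaborate); the point is that the flags are forced to exactly 0 or 1 before any arithmetic is done on them, and that the text goes to a buffer so it can be checked without a driver or a device.

```c
#include <stdio.h>

/* Build a hint about which settings are still unset. The flags may arrive as
 * any non-zero "true" value, so they are normalized to exactly 0 or 1 before
 * being used in arithmetic. All names and wording here are hypothetical. */
static int format_hint(char *buf, size_t len,
                       int got_delay, int got_nocal, int got_calib)
{
    got_delay = !!got_delay;
    got_nocal = !!got_nocal;
    got_calib = !!got_calib;

    /* This count is only correct because each flag is now 0 or 1. */
    int missing = 3 - (got_delay + got_nocal + got_calib);
    if (missing == 0)
        return snprintf(buf, len, "all settings already set");

    /* Chained ternaries pick each name and the separators between them. */
    return snprintf(buf, len, "consider setting %s%s%s%s%s",
        got_delay ? "" : "lbrb_log_delay_sec",
        (!got_delay && !got_nocal) ? ", " : "",
        got_nocal ? "" : "lbrb_log_delay_without_calibrating",
        ((!got_delay || !got_nocal) && !got_calib) ? ", " : "",
        got_calib ? "" : "onlinedischarge_calibration");
}
```

Calling it for every combination of flag values, including "true" values other than 1, and printing the buffer is enough to eyeball the separators and spot an arithmetic slip.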
…On Mon, Aug 12, 2024, 11:20 Rysz ***@***.***> wrote:
Tested the monster message printer, works well but relies on math a bit
(that the testvar findings are exactly 0 or 1) so will poke that a bit
later.
Not sure why CI builds that code path where it failed due to missing libs,
by config it should not have.
Thanks for the additions, is there a process for testing drivers and such
hardware-specific code without actually having the affected hardware (which
I assume you don't either)? Would love to help out with testing such code,
but not sure how to go about that with the drivers (dummy-ups wouldn't work
for a specific driver, right?).
|
Closes: #2347
Also note for #2533 question
It also adds some visibility around calibration status setting, extends the "dstate" API with a `status_get()` method, and this helps avoid setting duplicate states (roughly like "OB LB OB") seen in some drivers earlier.
I hope this toggle makes it possible to fix the problem in the field by optionally delaying spurious status propagation from the driver by `lbrb_log_delay_sec` at most, and only if the device is otherwise "online" and is calibrating (unless the `lbrb_log_delay_without_calibrating` flag was also set).
The fix goes to some lengths to try detecting the device model during init to default the setting to 3 sec for this line-up, otherwise defaults to 0 (immediate status propagation).
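As a reading aid, the delay rule described above can be sketched roughly as follows. This is a simplified model and not the driver code: the struct, the function name and the exact ordering of the checks are assumptions, and only the two setting names come from this PR.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping: when the LB+RB pair was first observed. */
struct lbrb_state {
    time_t first_seen;  /* 0 if no delay is currently pending */
};

/* Return true while a simultaneous LB+RB report should be held back.
 *   delay_sec           models lbrb_log_delay_sec (0 = publish immediately)
 *   without_calibrating models the lbrb_log_delay_without_calibrating flag */
static bool suppress_lbrb(struct lbrb_state *s, time_t now,
                          bool lb, bool rb, bool online, bool calibrating,
                          int delay_sec, bool without_calibrating)
{
    bool eligible = lb && rb && delay_sec > 0
                 && online && (calibrating || without_calibrating);

    if (!eligible) {
        s->first_seen = 0;      /* nothing to delay: forget any pending timer */
        return false;
    }

    if (s->first_seen == 0)
        s->first_seen = now;    /* start the clock on first observation */

    /* Hold the status back for at most delay_sec seconds, then let it out. */
    return difftime(now, s->first_seen) < delay_sec;
}
```

In this model a state that persists longer than the delay is still reported, so a genuinely low or failing battery would be announced a few seconds late rather than hidden.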
@desertwitch @grifferz @ShiroDN @PilaScat @bitmario @marcgarciamarti @KillianMelsen @gerben838665 @mauro-dasilva @tsopokis @statte @s7uben @Sanderluc5 @ivanjx @gabrieleancora @JoshNansoz1 @rioachim @owenperkins111 : Better late than never: would you be able to try a custom build of NUT following https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests to see if it handles the devices better?
For the git checkout, use this PR's source branch:
If you run the built driver with debug verbosity of 2 or greater, it should log that it saw these calibration-like, LB and RB states, and chose to suppress them for a while according to settings. Checking that the numbers from CLI/`ups.conf` settings are propagated and considered correctly would also be helpful :)
Maybe these messages should be sunk to a less visible debug verbosity, eventually.
Also of interest is if the impacted devices report frequent calibration messages by default (without debug) and if that should be addressed additionally or if `onlinedischarge_calibration` and/or `onlinedischarge_log_throttle_sec` and related existing settings address it and the logs can be made peaceful and quiet already.