APC Back-UPS BX1600MI spurious LOWBATT/REPLACEBATT events #2347
Comments
I have a new APC Back-UPS BX1200MI (basically the same model, just with a smaller battery) connected to TrueNAS that is using NUT 2.8.0, and there is some weird behavior. Sometimes, 1-2 times an hour, it triggers LOWBATT/REPLACEBATT events. In a debug log, I can see "[D2] parse_status: [OL DISCHRG CHRG LB RB]". When it works and the battery is charged to 100%, it seems to be switching between "OL" and "OL CHRG" every few seconds. I am not sure if that is a problem too, or just normal behavior, though. Here is a debug log from the driver at the time of an event:
|
CC @desertwitch : I think some of your earlier investigations were about similar behavior; any ideas here? Notably, in NUT master (after release 2.8.1) we had PR #2216 to address such notifications with new configuration options. From research and educated guesswork there, this may be just part of battery charge management which happens under the hood on all UPS devices that became visibly exposed on some firmwares. |
It seems to be driver related, something that the UPS is sending byte-wise that's being parsed wrongly into
I couldn't find an issue in the upsmon handling or the later processes. I've since had another report from a user with an APC BX1200MI-GR also experiencing intermediate |
Got another report of an affected Back-UPS BX750MI with spurious RB events. |
Same thing for me, I've the same problem with BX1200MI-GR |
Unfortunately my contact didn't come through with the UPS so we're back to square one. |
I am seeing the same behavior on a newly purchased BX1600MI-GR, manufactured October 2023, on Debian 12's nut 2.8.0-7. @desertwitch let me know if I can help by collecting logs or trying any patches |
Hello there! I can confirm this behavior is also happening with that version. The pattern seems to be random. It sometimes lasts a few seconds or a few minutes. |
Any chance you could try if the problem persists putting this flag in your UPS configuration in
So an example configuration would look like this:
Not sure if it will help, or whether it will stop UPS recognition altogether, but it's worth an attempt since we don't have a UPS at hand. @jimklimov : Any ideas from the logs provided above? It's a relatively popular entry-level line of APC, so it would be lovely if we could get this addressed one way or another; I'm sure many people would be appreciative. |
The only thing that comes to mind is to add yet another throttle for such reports, so as not to expose the status if RB appears and dissipates quickly (I'd be wary of tweaking LB like that... maybe an optional throttle tied to a known OL/OB/BYPASS status?) |
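To make the throttle idea concrete, here is a rough Python sketch (not NUT code; the class name, the `hold_sec` parameter and the choice to throttle only RB while the UPS reports OL are all assumptions made for illustration). A suspect flag is hidden until it has been reported continuously for `hold_sec` seconds, and is passed through immediately when the UPS is not online.

```python
from dataclasses import dataclass, field

# Assumption: only RB is throttled; LB is passed through untouched.
SUSPECT_FLAGS = {"RB"}


@dataclass
class FlagThrottle:
    """Hide suspect flags until they have persisted for hold_sec while OL."""
    hold_sec: float = 5.0
    first_seen: dict = field(default_factory=dict)

    def filter(self, status: str, now: float) -> str:
        flags = status.split()
        online = "OL" in flags
        exposed = []
        for flag in flags:
            if flag in SUSPECT_FLAGS and online:
                # Remember when this flag first appeared in the current run.
                started = self.first_seen.setdefault(flag, now)
                if now - started < self.hold_sec:
                    continue  # too fresh, keep it hidden for now
            exposed.append(flag)
        # Forget flags that are no longer reported, so the timer restarts.
        for flag in list(self.first_seen):
            if flag not in flags:
                del self.first_seen[flag]
        return " ".join(exposed)
```

With `hold_sec=5.0`, an RB that shows up in one poll and is gone a second later never reaches the consumer, while an RB that is still present six seconds after it first appeared does.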
@desertwitch I just tried |
Thanks for the attempt, it was a long shot. Is there any logic to the statuses, as in do you notice a rapid succession or cycling through of certain statuses (LB, RB, LB, OL or OL, LB, RB, OL) in the logs? In general, do you have any logs you can provide us from NUT itself with timestamps so we can try to investigate this some more? Particularly interesting would be if the LB (low battery state) is happening at the same time the UPS is also OL or if the UPS goes OB before/after.
I'm curious though if this is something else entirely, which the driver misinterprets, or the UPS is actually sending these bogus statuses. I'm thinking it probably doesn't happen on their own APC software (if there is one for that series) or there would be more complaints on the APC forums, but considering that both APCUPSD and NUT are affected in a similar fashion it does seem to be some change in the firmware when compared to other functioning APC devices or even older models of the BX series which seem to work (firmware bug in newer models?). I agree with a LB throttle on its own being less than ideal, unless tied to another status or succession of states (if we can figure out any logic behind what's happening here). |
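For anyone going through their own logs, a small helper along these lines could flag the combinations being asked about. This is only a sketch: the function name and the wording of the notes are made up here, and "LB while OL" is not necessarily bogus on its own (a battery recharging after a real outage could plausibly report it), so treat the output as hints rather than verdicts.

```python
def describe_status(status: str) -> list[str]:
    """Return notes about suspicious flag combinations in a status string."""
    flags = set(status.split())
    notes = []
    if {"CHRG", "DISCHRG"} <= flags:
        notes.append("CHRG and DISCHRG reported together")
    if "OL" in flags and "LB" in flags:
        notes.append("LB while on line power")
    if "OL" in flags and "OB" in flags:
        notes.append("OL and OB reported together")
    if "RB" in flags:
        notes.append("RB present")
    return notes


# The status string quoted from the debug log earlier in this thread:
for note in describe_status("OL DISCHRG CHRG LB RB"):
    print(note)
```

A plain "OL CHRG" status produces no notes at all.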
The general pattern is that it will usually be in a normal status like
Afterwards it goes back to a normal status. I've been running a patched |
I just got home and installed NUT through the unraid apps. I'd be happy to provide logs but not sure how. |
This is very valuable, thanks, so we can also see here that the UPS is doing some kind of - perhaps - calibration before these The other user's log above has shown a status of |
I don't know whether it is of any use, but I did not change anything (kept running git HEAD of nut) and the above strange behaviour stopped after a few days. It has been behaving normally for weeks now. Thanks, |
Just set NUT up normally through the GUI as you want to have it and watch the SYSLOG (Tools->System Log) for any such strange events being reported while NUT is started, they should pop up in the log by themselves if they occur on your system.
That is extremely curious, makes me wonder even more what the UPS is doing there - thanks though! |
Correct - I only get "is low" and "needs to be replaced". As a sanity check, I just tried pulling the plug for a few seconds and this is what I see in that case:
Thanks for taking the time to look into this :) |
Ah, got it. The system log does show some low battery indications, as seen below. I also pulled the plug and that all seems to be working fine: the UPS switches to battery, I get an unraid notification, and it goes back online when plugged back in.
|
Thanks for the logs everyone - just to double-check back here, has anyone gotten a shutdown from this problem? |
My unraid was running fine until I hooked it up to the UPS yesterday evening. This morning it was off, and it started a parity check after turning it on even though nothing was scheduled. I can't find anything that logged the reason for the shutdown though. That was also using the built-in UPS utility, so I'll keep running NUT and see if I get any other unexpected shutdowns. |
So I just checked the system log again and found a whole bunch of errors. I don't know if it's related. I left all configuration options at their default values, except for setting the shutdown rule to 25% battery left. I'm also running powertop --auto-tune in case that's relevant.
|
I'm still on 2.8.0 (no update, since I'm using TrueNAS SCALE). I did notice in the kernel log that every 3 minutes (exactly 3 min) the UPS's USB connection was lost and restored. I changed the USB cable and the USB port, but this didn't go away until I disconnected the UPS from the server, stopped the UPS service, plugged the UPS in again, and then started the service. I haven't had the message for 6h+ now. Hope this helps someone. |
I was playing with my server and changed some settings in the bios and reconnected everything and the problem was gone. I don't know what the cause was. |
might be a bit unrelated since i use apcupsd directly but i tried compiling apcupsd with |
Hello, I do have the same exact model and I'm having the same problem. Let me know if you need any log/report and I'll provide them! |
I’m getting a bunch of the same logs here with a BX950MI. Is there anything we can do to help fix this? Provide some logs or perform any tests for you? |
Same issue here with a BX1600MI exchanged under warranty. The old one was fine; the new one is unusable with apcupsd on FreshTomato router firmware. The log is flooded by "battery disconnected/UPS battery must be replaced/Battery reattached", followed after some time by "hid-core.c: control queue full". I have tried the patch in the following thread (which ignores 'no battery' / 'replace battery' unless reported twice in a row), but it behaves the same: https://sourceforge.net/p/apcupsd/mailman/apcupsd-users/thread/9c51020c-9d8b-4eda-b4d4-3ff8dc08749c%40okazoo.eu/#msg58741334 I am aware that apcupsd's \drivers\usb\usb.c is totally different from nut's drivers/usbhid-ups.c, but in the end a similar approach to the misbehaving UPS status events would apply. |
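For readers unfamiliar with the "twice in a row" idea, it boils down to something like the sketch below. This is not the apcupsd patch itself, just a minimal Python illustration with made-up names; and as the report above notes, that approach did not change the behaviour on the poster's unit.

```python
class ConsecutiveConfirm:
    """Confirm a condition only after it was reported in N polls in a row."""

    def __init__(self, required: int = 2):
        self.required = required
        self.streak = 0

    def update(self, condition_reported: bool) -> bool:
        # Any poll without the condition resets the streak to zero.
        self.streak = self.streak + 1 if condition_reported else 0
        return self.streak >= self.required


confirm = ConsecutiveConfirm(required=2)
polls = [True, False, True, True, True, False]
print([confirm.update(p) for p in polls])
```

Note that a count-based filter depends on the polling interval: if a bogus status lasts a couple of seconds and the driver polls more often than that, two consecutive polls can both see it.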
Thanks for the different reports and ideas and the recent link to Not really sure which logs to ask for (thanks for offering), but thought of one point that can impact development of a fix: when the status flips about "battery disconnected -> must replace -> was reattached", how long does such cycle take? Is it always a matter of a few seconds with a (relatively) long quiet phase of normal work in between? It may make sense to add an option to suppress such messages if the issue gets resolved within a specified timeframe... (and/or following the apcupsd patch - immediately report repeated offenses that are not resolved at the moment). Would anyone here be brave enough to try coding such a change and making a PR? :) |
Here are the event logs from my unraid server. Jul 18 00:03:39 Garage-Server apcupsd[3492]: Power is back. UPS running on mains. |
Thanks. So it is fairly frequent, but each situation takes a couple of seconds to resolve. There is no particular rhyme or rhythm to the frequency of the situations, though. (I wondered whether their comms chip goes to sleep if not poked every 30-120 sec, as in some other similar cases.) |
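If someone wants to put numbers on "how long does such a cycle take", a sketch like the following could be run over timestamped status samples. The input format (a list of `(seconds, status)` tuples) and the function name are assumptions for illustration; getting real log lines into that shape is left out.

```python
def flag_episodes(samples, flag="RB"):
    """Return (start, duration) for each run of samples carrying `flag`.

    `samples` is an iterable of (timestamp_seconds, status_string) pairs in
    chronological order. An episode still open at the end of the data is
    dropped, because its duration is unknown.
    """
    episodes = []
    start = None
    for ts, status in samples:
        present = flag in status.split()
        if present and start is None:
            start = ts  # flag just appeared
        elif not present and start is not None:
            episodes.append((start, ts - start))  # flag just cleared
            start = None
    return episodes


# Hypothetical samples, not taken from any log in this thread:
samples = [(0, "OL"), (10, "OL LB RB"), (12, "OL"),
           (300, "OL RB"), (301, "OL RB"), (303, "OL CHRG"), (400, "OL RB")]
print(flag_episodes(samples))
```

The differences between consecutive start times then give the length of the quiet phases in between, which is what a suppression timeout would need to be chosen against.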
Sorry I haven't been able to get much further on this issue either, as there's no apparent logic to these conditions. |
As the original reporter of this issue, I did go through this with APC support and they are not prepared to accept any software reports unless they are from their own Powerchute software on Windows. As these spurious events do not happen there, they closed my issue as not reproducible. So in short, APC do not accept there is any kind of bug to fix I'm afraid. Strangely enough as mentioned in earlier comment: the behaviour seems to have stopped for me by itself. It's been months and it hasn't happened. I have no idea why as all I did was change to what was at the time the HEAD of main. |
I am getting the exact same on my Proxmox server with apcupsd and an APC Back-UPS BX950 |
Is anyone else also getting lots of self-test switch messages from apcupsd? I have a BX750MI.
…etworkupstools#2347] Signed-off-by: Jim Klimov <[email protected]>
…ay_sec et al [networkupstools#2347] Signed-off-by: Jim Klimov <[email protected]>
Hello all, thank you for all the reports and logs. I've posted a prospective PR to address the situation by delaying LB/RB status propagation on impacted devices (user-configurable) and so hiding it from |
…o setting the lbrb_log_delay_without_calibrating flag [networkupstools#2347] Signed-off-by: Jim Klimov <[email protected]>
For anyone here on Unraid, I have rolled out a testing package with the latest plugin update today. Please read the instructions shown on the "NUT Settings" page on how to switch to the testing package and if possible do report back if the problems are resolved for you (or not) - thanks a lot for your support. 😉 |
I’ve been on 6.12.11 and nut-2.8.2-x86_64-2master.ssl11 for about two days. Brand new Back-UPS BX1200MI, still getting the replacement battery error every day, but the event count seems to be reduced. |
…r spurious LOWBATT/REPLACEBATT events on APC BXnnnnMI devices [networkupstools#2347] Signed-off-by: Jim Klimov <[email protected]>
As suggested in the PR discussion, try setting
flags in your |
… tweaks since 2023)" [networkupstools#2347] Signed-off-by: Jim Klimov <[email protected]>
…actly 0/1 [networkupstools#2347] When building a complex text expression, we rely on maths in some spots. Signed-off-by: Jim Klimov <[email protected]>
I have an APC Back-UPS BX1600MI connected by its supplied USB cable. With the version of nut in Debian 10 (2.7.4-8) this was unusable as the usbhid-ups driver kept disconnecting every few seconds.
I upgraded to the git HEAD of nut and communication is now stable, but spurious events come in every so often (once or twice an hour at the moment). When I call a notify script that calls upsc at the time of those spurious events, I can see that the ups.status does bear that out, but other values do not. Example:
As you can see, the status does include LB and RB, but the charge is still 100 and the runtime is as expected. The bad status lasts less than 2 seconds before returning to simply OL.
I went to the effort of installing a Windows VM, passing USB through to it and trying APC's own Powerchute Serial Shutdown software in there. This does not report any spurious events. Both Powerchute and nut do report true events that I induce. A self-test of the device passed. I am unable to demonstrate any behaviour that APC consider incorrect.
I returned the UPS to the vendor as faulty and they sent a replacement. The replacement behaves the same.
I installed apcupsd just to see how it behaved. It maintained a connection but its rate of spurious events was even worse: every couple of minutes. Another apcupsd user reports the same symptoms as me: https://sourceforge.net/p/apcupsd/mailman/message/58740970/
Interestingly, they had a BX1600MI working with apcupsd, replaced it with another BX1600MI and now they see what I see, implying that newer models of Back-UPS have something different about them even though they are the same model number.
Is there any way to work around this sort of thing? It's almost like I need a way to not believe such statuses unless they persist for at least 5 seconds, or something.