smokeping fails to start (timeout) if DNS temporarily unavailable #332
Comments
This issue has become stale and will be closed automatically within 7 days. Comment on the issue to keep it alive. |
Comment. |
Could any dev weigh in on this? Frankly this seems to be a major issue, nothing short of an opening for a wilful DoS. It's worth noting that my setup uses a short timeout setting (2 seconds, not the usual 15), so this initial probe is evidently ignoring that setting as well (otherwise my instance would start). |
Have you tried adding a dependency on the dnsmasq service, so that the smokeping service starts after dnsmasq? |
That would work except for two use cases (I actually use smokeping in):
I agree with the initial reporter - there's no good reason to fail the start of the service when the one check of the one target happens to fail during the service restart. Appreciate the suggestion though. |
This is not a Smokeping issue, it is working as expected. If the issue is just during startup, start the smokeping service after the name service:
|
I already run dnsmasq on the router PC on my LAN. Running a second instance on every workstation/server seems unnecessary and pointless. Also, no number of dnsmasq instances will help if the LAN is not (yet) connected to the internet (eg. immediately following a power outage, which is the case I had in mind in my original bugreport). I disagree that startup should fail if internet connectivity is unavailable at the moment of startup. Please just fix your program to be friendly to dynamically changing network configuration. 🙏 |
@Strykar oh interesting, if that addition to the unit file works then I shall include it in the debian package (and thus it's indeed not an issue for upstream). the unit file already has the "After" line but not the "Wants" one. @miiichael can you try the following on the computer where you run smokeping and see if it helps avoid the startup issue?
|
I'll try it, but the issue isn't that smokeping is starting before local networking - it's starting before internet connectivity exists (either due to an ISP outage, or we've just recovered from a power outage and the VDSL modem is still negotiating sync). My understanding is that adding network-online.target just makes it start after ifup, yes?

michael@yakka:~$ find /*/systemd/system/network-online.target* -ls
264068 4 drwxr-xr-x 2 root root 4096 Dec 8 2017 /etc/systemd/system/network-online.target.wants
265016 0 lrwxrwxrwx 1 root root 38 Dec 8 2017 /etc/systemd/system/network-online.target.wants/networking.service -> /lib/systemd/system/networking.service
265715 4 -rw-r--r-- 1 root root 513 Feb 2 2021 /lib/systemd/system/network-online.target

My current workaround is to add some janky cron jobs. 😅

@reboot root sleep 300; systemctl status smokeping.service >/dev/null || (echo "Smokeping is bad. Let's try restarting it."; systemctl restart smokeping.service)
@reboot root sleep 360; systemctl status smokeping.service >/dev/null || (echo "Smokeping is still bad. Let's try restarting it."; systemctl restart smokeping.service)
@reboot root sleep 600; systemctl status smokeping.service >/dev/null || (echo "Smokeping is *still* bad. Let's try restarting it."; systemctl restart smokeping.service) |
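For readers who would rather keep this workaround in one place, the three cron entries can be folded into a single retry loop. This is only a rough sketch of the same idea, not something posted in the thread: the function names are hypothetical, the delays (300, 60, 240) are just the gaps between the 300/360/600 second marks above, and it uses `systemctl is-active --quiet` instead of `systemctl status`.

```python
import subprocess
import time


def is_active(unit, run=subprocess.run):
    """Return True if systemd reports the unit as active."""
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    return run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def restart_until_active(unit, delays=(300, 60, 240),
                         run=subprocess.run, sleep=time.sleep):
    """Wait, check the unit, and restart it if it is not running.

    `run` and `sleep` are injectable so the loop can be exercised
    without touching systemd (hypothetical helper, not Smokeping code).
    """
    for delay in delays:
        sleep(delay)
        if is_active(unit, run):
            return True
        run(["systemctl", "restart", unit])
    # One last check after the final restart attempt.
    return is_active(unit, run)
```

Like the cron jobs, this only papers over the startup failure; it does not change how smokeping itself behaves.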
@lelutin There are multiple issues being conflated here; the other posters should create separate issues of their own. Please test and share your results here. That is the systemd way of handling service dependencies. @miiichael Given that smokeping is a network latency grapher, how do you think it should behave when no network or name service is available for the targets it is configured to probe? |
I think it would be reasonable for smokeping to report all internet hosts as unreachable when the internet is unreachable, exactly as if I'd specified those hosts by IP (while still reporting on hosts within my LAN, which of course remain resolvable). |
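The behaviour described here can be illustrated with a small sketch. It is written in Python purely for illustration (Smokeping itself is Perl), and every name in it (`resolve_or_none`, `probe_round`, the `ping` callback) is hypothetical rather than part of Smokeping. The idea is that a name that fails to resolve is recorded as unreachable for that round instead of aborting everything.

```python
import socket


def resolve_or_none(host, resolver=socket.gethostbyname):
    """Return an address for host, or None if it cannot be resolved right now."""
    try:
        return resolver(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def probe_round(targets, ping, resolver=socket.gethostbyname):
    """Run one probe round over targets.

    `ping` is an assumed callback taking an address and returning a latency.
    Unresolvable targets are recorded as None (unreachable this round),
    so resolvable LAN hosts are still measured.
    """
    results = {}
    for host in targets:
        addr = resolve_or_none(host, resolver)
        results[host] = ping(addr) if addr is not None else None
    return results
```

Resolving again on each round would also pick up DNS entries that change after startup, at the cost of one lookup per target per round.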
I had the same issue and just added /etc/systemd/system/smokeping.service.d/override.conf file with [Service] As I wanted it to keep running if possible. |
Yeah, either use Starting network monitoring software while your internet is down does not seem to be a problem to solve in the monitoring itself. Fix the underlying problem instead, i.e. the internet outage with your ISP or your UPS. :) |
Well yes, but the bug here is that smokeping fails to monitor the hosts it can reach (ie. inside my LAN, which is the overwhelming majority of the hosts I'm monitoring) if it starts while my upstream link is temporarily down. Periodically restarting smokeping at intervals is just a workaround. |
@miiichael FYI I found out that systemd has a target named basically I've changed from:
to:
in the unit file. |
Thanks! This should help ensure LAN hosts are resolvable (in case my server boots somehow before the router), but there still remains the issue of smokeping not starting if any WAN hosts are unreachable/unresolvable (eg. because the router hasn't finished bringing up the WAN interface yet). |
@miiichael I see. At least now in the debian unit file the service should wait until network is fully online locally.
However, I don't think we can fix, with changes in the systemd unit file, the issue of not starting up if actual internet connectivity is not complete because of an external service like the router. For that, changes to smokeping's code would be needed in order to avoid bailing out completely upon startup. |
I am not sure it should be "fixed" in a network monitoring application like Smokeping. Changing Smokeping (used by ISPs and network providers worldwide) to accommodate people running it on unreliable residential connections is going backwards IMO. |
But said $5 VM isn't going to be able to reach my internal hosts. If WAN is down my NAS and reticulation controller still works. I also host my mail locally. That smokeping wants to resolve all IPs on startup suggests that it would not notice if DNS entries change some time after startup... |
I'm not sure a latency/availability collector like Smokeping should be the judge of temporary DNS issues. The problem exists only because someone decided that a temporary DNS failure is a critical fault for it. I can't imagine a whole Nagios server refusing to start because one of the monitored servers/services is down. Edit: also, the fact that Smokeping doesn't crash/exit when DNS problems occur later, once it's started, shows the behaviour is simply inconsistent. And Smokeping crashing because of that would be quite a significant bug, right? You should request that this be added to the code though, so the "unreliable [] connection" owner is adequately treated.
I'm sorry but this is just so awfully rude. Are you a product manager for Smokeping?
I don't see anything about enterprise, low latency, low loss networks. Good tools are transparent and agnostic. This tool still uses RRD, which proves it's not designed and made with huge amount of datapoints and enterprise in mind. You're doing a generally good piece of software a disservice "defending it" like this. |
Umm, @lelutin What version are you using?
Which seems to indicate that the issue shouldn't exist. I.e. please verify on latest version, as this could be related only to whatever version debian has packaged. |
Is it possible that it was systemd itself giving up on smokeping due to it taking a while to start (on account of repeatedly trying to contact not-yet-available DNS)? 🤔 |
@knofte great question.. but I was just responding to a part of the issue, where systemd would make sure to have hostname resolution before starting the service. I was not the one experiencing the issue (i.e. I forwarded it here from debian; miiichael was the original reporter of the issue). @miiichael which version of smokeping were/are you using? Since you reported on debian and I failed to bump the version of smokeping for the package there for a while, I'm wondering if you were using 2.7.3 from debian stable. fwiw there's now 2.8.2 in debian unstable if you'd like to test things out with this version. With enough luck, and judging by what @knofte said, if you're using 2.7.3 it's possible that your issue may go away with the newer version. |
Yes, I was running Debian stable (2.7.3). I've updated to testing (2.8.2), and also disabled the bandaids I put in cron that restarted the service if it wasn't running. After wading through old logs and some systemd manpages, I'm now wondering if my specific situation can be worked around by setting an unreasonably large (12 seconds per DNS name mentioned in config) TimeoutStartSec in the systemd unit... |
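The "12 seconds per DNS name" arithmetic can be made concrete with a short sketch. It is in Python for illustration only and rests on several assumptions that are not from the thread: that targets appear as `host = name` lines in the Smokeping config, that IP literals and `/`-prefixed values need no DNS lookup, and that the result should never drop below systemd's usual 90-second default start timeout. The helper names are hypothetical.

```python
import ipaddress
import re

# Matches uncommented "host = value" lines (assumed config layout).
HOST_LINE = re.compile(r"^\s*host\s*=\s*(\S+)")


def is_ip_literal(value):
    """Return True if value is an IPv4/IPv6 address rather than a DNS name."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def dns_names(config_text):
    """Collect the distinct DNS names found in host lines."""
    names = set()
    for line in config_text.splitlines():
        match = HOST_LINE.match(line)
        if not match:
            continue
        value = match.group(1)
        # Assumption: IP literals and "/"-prefixed values need no lookup.
        if is_ip_literal(value) or value.startswith("/"):
            continue
        names.add(value)
    return names


def start_timeout_seconds(config_text, per_name=12, floor=90):
    """12 s per DNS name (the figure from the comment above), never below floor."""
    return max(floor, per_name * len(dns_names(config_text)))


def dropin(config_text):
    """Render a systemd drop-in body that sets TimeoutStartSec."""
    return "[Service]\nTimeoutStartSec={}\n".format(start_timeout_seconds(config_text))
```

With ten named targets this yields `TimeoutStartSec=120`. The value only stops systemd from giving up early; startup still blocks for as long as the lookups take.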
I'm running into this too. In my case, I don't think the issue is even the network being down or a majority of sites failing to resolve, but rather the volume of destinations... My workaround so far has been to patch this bit of code out: https://github.com/oetiker/SmokePing/blob/master/lib/Smokeping.pm#L2491-L2509. This lets Smokeping start up way faster, and doesn't seem to affect my probes. I do think warning about broken destinations is useful, but it really should happen in the background (instead of blocking startup) as none of these issues should be fatal. |
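To illustrate what "warn in the background instead of blocking startup" could look like, here is a minimal sketch. It is Python rather than Smokeping's Perl, it is not the patch described above, and `check_targets_in_background` and the `warn` callback are hypothetical names. The lookups run on a daemon thread and failures are reported as warnings, never treated as fatal.

```python
import socket
import threading


def check_targets_in_background(hosts, warn, resolver=socket.gethostbyname):
    """Resolve hosts on a daemon thread and report failures through warn().

    Returns the thread immediately so the caller can carry on starting up.
    """
    def worker():
        for host in hosts:
            try:
                resolver(host)
            except OSError as exc:  # includes socket.gaierror
                warn("cannot resolve %s: %s" % (host, exc))

    thread = threading.Thread(target=worker, name="dns-check", daemon=True)
    thread.start()
    return thread
```

A caller would start this check and proceed straight to probing; joining the thread is only needed in tests.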
@oetiker I've tried patching Smokeping to make the startup DNS check optional, but I've gotten a bit stuck. In the above code block, it seems each variable (hide, nomasterpoll, etc) is processed separately. I want to add a new variable "skipdnscheck", but I haven't found an obvious way for the |
Hi there,
This was submitted by Michael Deegan as a bug report to debian, so I'm forwarding it here.
The original bug report is here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996824
This issue occurs when my smokeping host comes up before my dnsmasq host
(which currently waits during boot for my VDSL modem to gain sync...).
The result is that smokeping takes longer to start up than systemd is willing to wait:
I think that in the interests of robustness, it would be better that startup
not involve attempting to resolve every target hostname. Perhaps DNS
activity could instead be deferred until after the daemon forks?