interfaces: properly react to RENEW/REBIND from dhcp6c #6522
Comments
That's the case since 23.1.5? Since |
Reverting eec08e0 might be worth a try...
Better? |
Reverting this fixes the issue. However, it also causes a new one in Windows. Even though the prefix is now correctly listed as deprecated, Windows still tries to use it until I force the interface down and up again (which removes the deprecated prefixes). But there is probably nothing we can do about that. Interestingly, I did not have this issue previously on my self-made lab router with Alpine Linux, dhcpcd and radvd. I think I still have a configuration backup somewhere. I could have a look if it helps. Edit: Found it, my radvd config looked like this (note that I used
|
I can't help with the latter part but I can issue a radvd restart once a full IPv6 renew event is being handled. That should give us the best of both worlds. |
Scratch the Windows part. It turned out to be "just" a Chrome thing 🤦: for all tabs I had open, it kept using the deprecated IP (even for new webpages or for forced refreshes). But restarting the browser made it use the new IP (without touching the interface). |
Ok, sounds good. This is going to be a bit complicated but I'll take a closer look later this week. |
prefix is not deprecated unless we shut down
@agowa338 I'm trying the following instead: 3cb2dd7669
Not sure if you use auto-mode or manual configuration. The auto-mode has a caveat in the form of 165327b which could mess up the hashing of the prefix/tracking configuration changes. |
That works 👍 thanks for the quick fix. |
Actually the only way to do this is set |
Should we consider trying to patch radvd to "fix" that? Because sending a deprecation message without a prefix change sounds like a bug anyway... |
It's being discussed since 2014 over at radvd. I don't have much hope :) |
It would be a whole lot easier if radvd had a kill signal to reload its config... then it could run forever and with GUI changes, only a signal would be necessary... |
@meyergru that's what SIGHUP does :) but it mimics a restart, only that it forgets to deprecate the previous prefix... but the underlying issue is a bit more challenging, because on SIGHUP it would have to check IF the configured prefix changed in order to deprecate it. I don't think this works considering you can stuff multiple prefixes in there at any time. |
But in that case, you could set DeprecatePrefix on, and on GUI/configuration changes only issue a SIGHUP. When you see that the prefix has changed (e.g. when the connection is brought up again and the old prefix is different from the new one), you could restart the daemon. That is, you can check for prefix change outside of the daemon and decide which way to restart it. |
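The decision described in the comment above can be sketched roughly as follows. This is only an illustration of the idea, not OPNsense code: the function names, the returned action strings, and the interface name `igb1` in the usage note are hypothetical, and the regular expression only recognizes literal `prefix <addr>/<len>` lines. In particular it cannot see through the "::/64" auto-prefix case, where the effective prefix is not written in the file.

```python
import ipaddress
import re

# Matches literal "prefix 2001:db8:1::/64" lines in radvd.conf-style text.
PREFIX_RE = re.compile(r"^\s*prefix\s+([0-9A-Fa-f:]+/\d+)", re.MULTILINE)


def advertised_prefixes(conf_text: str) -> set:
    """Return the set of IPv6 prefixes written literally in the config text."""
    return {
        ipaddress.IPv6Network(p, strict=False)
        for p in PREFIX_RE.findall(conf_text)
    }


def choose_action(old_conf: str, new_conf: str, running: bool) -> str:
    """Decide how to apply new_conf (hypothetical action names)."""
    if not running:
        return "start"
    if advertised_prefixes(old_conf) != advertised_prefixes(new_conf):
        # Prefix set changed: do a full stop/start so that the shutdown
        # path (where DeprecatePrefix applies) is exercised.
        return "restart"
    if old_conf != new_conf:
        # Same prefixes, other settings changed: a reload signal suffices.
        return "sighup"
    return "none"
```

For example, with `old = "interface igb1 {\n  prefix 2001:db8:1::/64 {\n  };\n};\n"`, changing only the prefix yields `"restart"`, while appending an unrelated line yields `"sighup"`.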
There is no way to track this through the code without ripping it all open across multiple files. And even then it might be on a configuration change and we miss it by issuing a faulty SIGHUP. I don't want to start tracking deprecated prefixes... I thought we had multiple software solutions for that :/ |
Just to reiterate, sending SIGHUP may be a better option if the config doesn't change (and the service is started). For any other case we restart properly. Just the "::/64" case has me worried, which does "magic" that we can neither track nor verify functionally (doing a simple test setup forcing this aside). Here my point is that this may have had negative effects on people's deployments all along, see commit for details on weird AdvRouterAddr option use. |
I think the workaround we have now is all we can do. And the other thing is something that maybe should be brought up again on the radvd side. Because they can much more easily compare the running config with the configuration file... |
I am aware of where the real problem lies. I am just lucky to get the same prefix again on reconnects... |
The only other thing we could do (without code change to radvd) is to set the lifetimes to their minimum and have clients more or less constantly renew them (also bad)... |
RFC junkies will argue it drains all power from mobile stack ;) |
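As background to the lifetime discussion, here is a minimal sketch of how a SLAAC host updates an address's valid lifetime according to the "two-hour rule" in RFC 4862, section 5.5.3 (e), for unauthenticated RAs. The function name is made up for illustration, and real host stacks may deviate from the RFC. It shows why an RA cannot simply invalidate a stale address quickly, which is why deprecation (preferred lifetime 0) is the available tool.

```python
TWO_HOURS = 2 * 60 * 60  # seconds


def updated_valid_lifetime(remaining: int, advertised: int) -> int:
    """Valid lifetime after processing an unauthenticated RA (RFC 4862 5.5.3 e).

    remaining:  seconds left on the already-configured address
    advertised: Valid Lifetime from the received Prefix Information option
    """
    if advertised > TWO_HOURS or advertised > remaining:
        # Long enough, or an extension: accept the advertised value.
        return advertised
    if remaining <= TWO_HOURS:
        # Already at or below two hours: ignore the shorter value.
        return remaining
    # Otherwise the lifetime can only be cut down to two hours.
    return TWO_HOURS
```

The preferred lifetime is not subject to this rule and is reset to the advertised value, so an RA with a preferred lifetime of 0 deprecates the address immediately.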
31961bf is perhaps what you are looking for here. I fiddled a bit too much on the master branch so applying it with opnsense-patch may be tricky. We still have to solve the primary IP "changing" but I think we can manage that easily with more debug data from your end. |
No, it stays constant, I however don't know if this may be a race condition thing when the GUA expired and gets renewed but the ULA is still valid...
Yes it is changing, and when you look at the logs I provided, I made sure to censor it in a way that preserved which values were equal. I.E. |
I restarted opnsense after the patches, looks unchanged so far. The log still says:
For some reason, the console now lists the link-local (fe80::) IPv6 as "v6/DHCP6" for the wan interface. And I also noticed that the service startup order is suspicious: radvd, chrony, unbound, and then newwanip. chrony and unbound also logged some errors to the screen about unreachable IPs (as they started before newwanip gets called). Also, can you have a quick look at the logged kill command? (This is the one responsible for the deprecation of the IP, right?) I tried to execute it as it was logged, and it doesn't work. Is the quotation of the logged command different from what is actually executed? Or does it log an incorrect error message?
|
I don't think the errors are all that relevant. Ok, if the GUA is deprecated it would pick up the ULA when that one is not yet set to deprecated. Clear about the PD changing then. The master branch state is probably what you are looking for. I've made a backport as
I'd recommend a reboot just to be sure. If this isn't working for some reason best to revert all for now:
Cheers, |
I think from my side this is a wrap. 23.1.8 will ship all the relevant code. Also see https://forum.opnsense.org/index.php?topic=26832.0 for more discussion and test results. |
|
@TheekshanaA so did you try the patch and reboot? |
I did apply this patch and rebooted yep, I was running 23.8.1 kernel as another forum post said that helped them. opnsense-patch 05c6f2e got these same errors |
I am unsure what the error even is. "got these same errors" might be nothing. What's wrong operationally? |
Sorry for the vague message. A LAN outage coincides with the errors; it lasts roughly 30 seconds, maybe a bit less. I reverted back to 23.1.8 and applied patches as mentioned in the above message. I can't seem to reproduce it on demand but it seems to occur "randomly" once an hour or so. I'll get you the logs as soon as the issue reoccurs. I apologise if something isn't clear as this is my first time using opnsense since I used pfSense as a kid almost 10 years ago. EDIT: It might be worth noting I am using a Realtek USB NIC for my LAN, I am currently awaiting delivery of my Intel dual NIC. Kind Regards, |
@fichtner
Edit: it occurred again:
|
Hmm, you are patching the wrong thing. Try a plain 23.1.9; it should be better already. |
I just updated to 23.1.9 and also get the same error as others above:
Briefly after the error above, I also saw this one:
My IPv6 configuration is all green and working, from the OPNsense side, but when checked with external services (https://test-ipv6.com/), I have no IPv6 connectivity. This is since 23.1.8. On 23.1.7, I had all green lights with the same connection settings (default, from the docs). |
@Sieboldianus I don’t see anything related to what you are describing. |
@fichtner: Yes, my IPv6 problems may be unrelated to the logs above (the only thing that pointed me to IPv6 was seeing Perhaps also unrelated is that https://test-ipv6.com/ suddenly shows all green lights. I was just going to update my answer above. It seems the update to 23.1.9 has fixed my IPv6 problem. |
hmm, ok. An interface reconfiguration is required to replace the faulty script and a reboot or kill for dhcp6c to pick up the updated binary. If you still see issues later I’d recommend a reboot just to be sure. |
Interestingly I'm seeing exactly the same behaviour since the last update. dhcpv6 constantly crashing for some reason with the same log entries as shared by @TheekshanaA. |
Which error? Do you also use "ue"? |
I thought I'd give an update. I've switched to an Intel PCIe NIC and my setup has been extremely reliable; however, I am still getting the "cannot forward" IPv6 errors. I'm not sure if this is related to the dhcpv6 issue or not. Definitely will not be using a USB ethernet adapter again; however, I had no choice as I was waiting for my Intel NIC to arrive. Thanks for the update |
The errors as shown in the general log that were shared by TheekshanaA. I'm not using a USB Ethernet driver, I have Intel I350 NICs in a Sophos XG430 appliance. I did run the proposed reverts and patches, and it seems more stable now. But I will update you with my logs once it happens again. |
It happened again. Not that often as before, but the issue persists. |
These are most likely Android phones failing to acquire a GUA via SLAAC. They send out garbage anyway. Not much we can do about it and no relation to WAN side dhcp6c being discussed here. Check your LAN settings and also see https://docs.opnsense.org/manual/ipv6.html#basic-setup-and-troubleshooting Cheers, |
Can confirm, I have multiple Android devices on my network so you are likely correct. Thought I'd also mention there are no more of the "DHCP failing to renew" errors that the others are reporting on my router. It may be worth noting I did spoof my MAC address to be the same as my original router's WAN port; however, I am not sure if this is related or not. Furthermore, my IPv6 tests do fail on my network. |
Closing this since the initial topic is fixed. Likely more issues exist in certain setups and it would be better to work on detailed reports for each one. |
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
Describe the bug
In other tickets (like #6515), it was already reported that OPNsense sometimes fails to update delegated prefixes. But now I discovered another bug that interrupts network communication even if OPNsense detects the prefix change.
dhcp6c changes the prefix within radvd.conf file and restarts radvd. Somehow this causes radvd to fail to send a deprecation message using the OLD prefix (even though "DeprecatePrefix on" is already configured). So the old prefix does not get deprecated and hosts on LAN have preferred IPv6 addresses from both prefixes, but OPNsense itself no longer knows about the old prefix (nor do upstream routers), so network communication fails.
I then tried to stop radvd manually. But that just deprecated the NEW prefix (hence why even a reboot of OPNsense can't fix this issue, but all devices on LAN need to be touched)
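For readers unfamiliar with what a "deprecation message" is on the wire: it is a Router Advertisement whose Prefix Information option (RFC 4861, section 4.6.2) carries a preferred lifetime of 0 for the prefix. The following is a minimal sketch of that option's layout, not radvd's actual code; the function names, the example prefix, and the lifetime values are illustrative assumptions.

```python
import ipaddress
import struct

ND_OPT_PREFIX_INFORMATION = 3  # option type from RFC 4861
FLAG_ON_LINK = 0x80            # L flag
FLAG_AUTONOMOUS = 0x40         # A flag (SLAAC allowed)


def prefix_information_option(prefix: str, valid: int, preferred: int) -> bytes:
    """Build a 32-byte Prefix Information option for the given prefix."""
    net = ipaddress.IPv6Network(prefix)
    return struct.pack(
        "!BBBBIII16s",
        ND_OPT_PREFIX_INFORMATION,
        4,                          # length in units of 8 octets
        net.prefixlen,
        FLAG_ON_LINK | FLAG_AUTONOMOUS,
        valid,                      # Valid Lifetime (seconds)
        preferred,                  # Preferred Lifetime (seconds)
        0,                          # Reserved2
        net.network_address.packed,
    )


def is_deprecating(option: bytes) -> bool:
    """True if the option announces the prefix with preferred lifetime 0."""
    (preferred,) = struct.unpack("!I", option[8:12])
    return preferred == 0
```

In the failure described above, hosts never receive such an option for the OLD prefix, so addresses from both prefixes stay preferred.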
To Reproduce
Steps to reproduce the behavior:
/var/etc/radvd.conf to use the new prefix)
Steps to reproduce in a Lab environment (a bit more deterministic and maybe easier to reproduce):
/var/etc/radvd.conf to use some random prefix (P1).
kill -1 dhcp6c-pid to have dhcp6c rewrite the radvd config file (replacing P1 with another prefix P2) and restart it.
Expected behavior
radvd deprecating old prefix
Describe alternatives you considered
Maybe patching /var/etc/dhcp6c_wan_script.sh to write the old prefix into /var/etc/radvd.conf, stop radvd (so that it sends a deprecation message with the old prefix), write the new prefix into radvd, and restart it.
Screenshots
If applicable, add screenshots to help explain your problem.
Relevant log files
Additional context
/var/etc/radvd.conf (Sensitive information replaced with $-Pseudo-Variables):
/var/etc/dhcp6c.conf:
/var/etc/dhcp6c_wan.conf:
/var/etc/dhcp6c_wan_script.sh:
Environment
Software version used and hardware type if relevant, e.g.:
OPNsense 23.1.6-amd64
FreeBSD 13.1-RELEASE-p7
OpenSSL 1.1.1t 7 Feb 2023
AMD Ryzen 7 3800X 8-Core Processor (4 cores, 8 threads)