pppd: implement net-init, net-pre-up and net-down. #367

jkroonza · 2022-08-24T14:16:10Z

This no longer differentiates between which sub protocols brings an
interface up or down, merely tracks which sub protocols have brought up
and interface, and which have not yet taken it down.

There is one caveat here, I use pointer comparison on these names, so
the absolutely cannot be anything other than a compile-time constant,
and I'm not sure whether or not at least -O1 is required or not, ie, if
the compiler will eliminate multiple "STRINGS" to a single instance in
memory.

I'm unable to test the solaris code, but I don't see any reason why this
should not work.

Signed-off-by: Jaco Kroon [email protected]

jkroonza · 2022-08-24T14:24:58Z

This is aimed to replace both #341 and #342 with the strategy as discussed in #341.

This is the initial step in that it basically provides a consolidate up/down in main.c, which ALL protocols (eg, MPLSCP should it ever get set up) can use to up/down the interface, and which will then execute net-init, net-pre-up and net-down as appropriate.

Pushed now to leverage the build environments to at least compile-test the non-linux stuff so long.

jkroonza · 2022-08-24T15:30:04Z

# /home/jkroon/projects/ppp/pppd/pppd user [email protected] password OkeeLoSh0joo debug maxfail 0 persist connect true plugin /home/jkroon/projects/ppp/pppd/plugins/pppoe/.libs/pppoe.so eth0.2 nodetach
Plugin /home/jkroon/projects/ppp/pppd/plugins/pppoe/.libs/pppoe.so loaded.
PPPoE plugin from pppd 2.4.10-dev
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src 34:48:ed:8d:d4:61
 [service-name] [host-uniq b6 04 00 00]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 49
 dst 34:48:ed:8d:d4:61  src 00:d0:f6:b6:56:d5
 [service-name] [AC-name tcap-ip-bng-1] [host-uniq b6 04 00 00] [AC-cookie 20 60 8f 6c a0 52 0f f2 b9 73 d9 43 85 ea 20 e0]
Send PPPOE Discovery V1T1 PADR session 0x0 length 32
 dst 00:d0:f6:b6:56:d5  src 34:48:ed:8d:d4:61
 [service-name] [host-uniq b6 04 00 00] [AC-cookie 20 60 8f 6c a0 52 0f f2 b9 73 d9 43 85 ea 20 e0]
Recv PPPOE Discovery V1T1 PADS session 0x1 length 12
 dst 34:48:ed:8d:d4:61  src 00:d0:f6:b6:56:d5
 [service-name] [host-uniq b6 04 00 00]
PPP session is 1
Connected to 00:d0:f6:b6:56:d5 via interface eth0.2
using channel 25
Using interface ppp1
Script /etc/ppp/net-init started (pid 1209)
Script /etc/ppp/net-init finished (pid 1209), status = 0x0
Connect: ppp1 <--> eth0.2
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x62b36091>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0x62b36091>]
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x62b36091>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0x62b36091>]
rcvd [LCP ConfReq id=0x47 <mru 1492> <auth pap> <magic 0x136385c4>]
sent [LCP ConfAck id=0x47 <mru 1492> <auth pap> <magic 0x136385c4>]
sent [PAP AuthReq id=0x1 user="<removed>" password=<hidden>]
rcvd [PAP AuthAck id=0x1 "Login ok"]
Remote message: Login ok
PAP authentication succeeded
peer from calling number 00:D0:F6:B6:56:D5 authorized
kernel does not support PPP filtering
sent [IPCP ConfReq id=0x1 <addr 154.73.32.7>]
sent [IPV6CP ConfReq id=0x1 <addr fe80::e4cf:b137:6c20:f011>]
rcvd [IPV6CP ConfReq id=0x8a <addr fe80::02d0:f6ff:feb6:5580>]
sent [IPV6CP ConfAck id=0x8a <addr fe80::02d0:f6ff:feb6:5580>]
rcvd [IPCP ConfReq id=0x75 <addr 165.16.204.97>]
sent [IPCP ConfAck id=0x75 <addr 165.16.204.97>]
rcvd [IPCP ConfNak id=0x1 <addr 165.16.204.108>]
sent [IPCP ConfReq id=0x2 <addr 165.16.204.108>]
rcvd [IPV6CP ConfAck id=0x1 <addr fe80::e4cf:b137:6c20:f011>]
Script /etc/ppp/net-pre-up started (pid 1238)
Script /etc/ppp/net-pre-up finished (pid 1238), status = 0x0
IPV6CP is now UP
local  LL address fe80::e4cf:b137:6c20:f011
remote LL address fe80::02d0:f6ff:feb6:5580
Script /etc/ppp/ipv6-up started (pid 1260)
rcvd [IPCP ConfAck id=0x2 <addr 165.16.204.108>]
Script /etc/ppp/ip-pre-up started (pid 1262)
Script /etc/ppp/ip-pre-up finished (pid 1262), status = 0x0
IPCP is now UP
local  IP address 165.16.204.108
remote IP address 165.16.204.97
Script /etc/ppp/ip-up started (pid 1346)
Script /etc/ppp/ipv6-up finished (pid 1260), status = 0x0
Script /etc/ppp/ip-up finished (pid 1346), status = 0x0


^CTerminating on signal 2
Connect time 8.1 minutes.
Sent 97674 bytes, received 11780 bytes.
IPCP is now DOWN
Script /etc/ppp/ip-down started (pid 17481)
IPV6CP is now DOWN
Script /etc/ppp/net-down started (pid 17483)
Script /etc/ppp/ipv6-down started (pid 17484)
sent [LCP TermReq id=0x2 "User request"]
rcvd [LCP TermAck id=0x2]
Connection terminated.
Connect time 8.1 minutes.
Sent 97674 bytes, received 11780 bytes.
Script /etc/ppp/ip-down finished (pid 17481), status = 0x0
Script /etc/ppp/net-down finished (pid 17483), status = 0x0
Script /etc/ppp/ipv6-down finished (pid 17484), status = 0x0

Ordering:

net-init successfully executed as per above as the first script, just after "Using interface ppp1" message.

After this, authentication was completed. At this point IPCP and IPV6CP was initiated.

IPV6CP completed first (but it could just as well have been IPCP.

This correctly triggered net-pre-up.

After which the normal IPv6 config and ipv6-up scripts was run (not waited for).

IPCP then came up, and executed it's usual ip-pre-up (after configuring the IPs actually, and really was live by the time the script executed).

This correctly ran ip-pre-up, and the set_ifup did NOT trigger net-pre-up again. As expected.

After this the ip-up script was also triggered and not waited for.

Upon ^C both IPCP and IPV6CP was terminated, and upon termination of the latter net-down was triggered.

There is one debate here - should the ip-down and ipv6-down execute PRIOR to net-down? (None of them are waited for, so I don't think such a guarantee makes sense.) As it stands, net-down will trigger PRIOR to the "last proto down" executing it's script, to "fix" this I'll need to move the execution of ip-down and ipv6 down prior to calling set_ifdown, but still after deconfiguring the protocol from the OS.

pppd/main.c

enaess · 2022-08-24T15:37:52Z

pppd/main.c

+ * use set_ipup("IPCP") ...
+ */
+int
+set_ifup(const char* name)


I am not a big fan of the new static variable up_protos[]. The protocols are already a part of the protocols[] (struct protoent). Seems like there should be a possibility of querying that structure to figure out if it is set in state up or not.

Also, I would imagine that NET_PRE_UP would be executed like IP_PRE_UP which means all configured network protocols have marked themselves as done (either UP or FAIL), but interface is not yet in configured with IFF_UP yet).

A script could possibly query the configuration of the interface being brought up to find the values it needs if it isn't passed into the net-pre-up script (as ip-pre-up would pass in the IP information, or the IPv6 information).

array of ints vs array of pointers? I was also wondering how to avoid up_protos, but alas, couldn't think of any. Even considered a bitmask ... do you have a better suggestion?

Yes, NET_PRE_UP gets executed like IP_PRE_UP, but since we cannot determine the order in which protocols configure, we can make no guarantees w.r.t. configured status (some protocol - probably exactly one - may already be configured, but it could be either ipv6cp or ipcp or theoretically some other), only that the interface will not yet be in IFF_UP state, but about to transition into that.

Yes, a script can either query the details using say "ip" or by using the environment variables (as set by script_setenv calls scattered through pppd - but again, no guarantees can be made since probably only one of ipv6cp or ipcp will have completed at this stage).

Eliminated the sys-* netif_set_up and netif_set_down functions in preference of exposing a single netif_set_state() from each.

I think I would look at main.c and find how they processes the protocol structure in get_input(), but if you could add a int field in there that says state, and this new function would
pseudo code:

for (protocol entry : protocols) {
if entry.id == PPP_EAP || entry.id == PPP_LCP || entry.id == .. ) // or, add a field phase in each protocol entry, i.e. mass update the files, then if (PHASE_NETWORK != entry.phase)
continue;
if !entry.up
return -1
}

// all NETWORK protocols are UP here.
setifstate(unit, 1);

Generally, adding the extra information to protocols structure; could help readability of the application. PHASE_NETWORK would be up when all NETWORK protocols has been negotiated.

Similarly, a flag to say the protocol is an AUTH protocol, you could chose to only pass the information to that protocol instead of checking EAP and LCP. See RFC1661.

Trying to run NET_PRE_UP before all protocols has completed negotiation runs you afoul with the problem that not all information can be extracted. Though, trying to pass it to the script may be difficult, but a script can be made to query the state of the interface and extract the IPv6, IPv4 and MPLS information. If you execute the NET_PRE_UP before all network protocols have been negotiated then this information would be lacking at that point.

Also, the current protocol structure doesn't state if a protocol was intended to run during a particular phase, that could possibly be added in a separate change to ensure once you iterate through the structure don't have to e.g. compare it against a set of hard-coded protocol numbers (e.g. see get_input() in main.c).

This would be helpful here too where you iterate through the protocols and would only set the interface in state up when all the protocols has come up. Unfortunately, this probably require it to expose yet another callback to the protocol code to say: yeah you are up and scripts like ip-up and ip6-up gets executed (after interface has been configured in state up (IFF_UP)).

@enaess thinking on this I agree this would be nice.

However, and this is a big issue, both ip-pre-up and ip-up is called from the same function in ipcp.c - this is a sequential step. So this would require a fairly major redesign.

I can certainly add a notify() and/or net_{pre_,}up_hook() mechanism for you, that's quite trivial.

Do we know of many systems using ip-pre-up? My recommendation would be that protocol-specific pre-ups remain, but that they be moved prior to interface configuration. This does of course imply that net-pre-up will likely be called after one protocol-specific pre-up, which I think is not ideal. Unless all protocols adheres to:

Ensure iface is UP.

Invoke protocol-specific pre-up (currently only ip, not ipv6), providing "will be used" details.

Configure interface for protocol.

Invoke up (currently ip and ipv6).

Alternatively (very major rework referred to above) we can somehow permit all protocols to do something like the following:

Upon starting pppd, prior to proceeding indicate that the protocol (if to be negotiated) needs to finish negotiation prior to interface up.

If negotiation fails remove from the "desired" set.

If negotiation succeeds, configure the interface as required.

Invoke pre-up. (This can potentially move to pre-up callback in 5 in order to only invoke it after net-pre-up I reckon).

Call if_setup(), additionally providing two optional call-backs. 1 to invoke when ready to bring interface up, just prior to actually doing so. 1 to invoke just after bringing the interface up.

All functionality in the current _up functions after the call to if_setup would thus move into the latter callback. I don't see a specific need to the first callback, but I'd rather someone look at it rather than for it.

One problem I possibly foresee is ondemand ... but I'd have to take a much closer look. As I see it there would then be four net-* scripts:

net-init - as per current.

net-pre-neg (need a better name, net-ready came to mind but that implies to iface is ready to bring up, which is not the case) - invoked post authentication but prior to initiating IPCP and IPV6CP (and others).

net-pre-up - invoked just prior to the pre-up callbacks (such that it's possible to get ordering net-pre-up followed by proto-pre-up).

net-down (as current).

Would this be more suitable? What if we WANT to bring up the interface when the first protocol is ready instead of when ALL protocols are ready? My opinion is that the current setup kinda works for me in that once the first protocol is good to go the interface can come up, so if IPv6 goes back and forth, failing to negotiate, why should it delay IPv4 or the other way around?

@enaess could you please provide some feedback here?

I think the general outline of the above seems okay. However, I would want to solicit feedback from @paulusmack to make sure he is okay with a change like that and what you are trying to do. The other part of this is how "risky" is a change like that, meaning will it break or become fragile, e.g. is ip-pre-up / net-pre-up deterministic in all cases like ip is up, but ip6 failed negotiation, the test-coverage, etc.

@paulusmack @enaess may I please request some direction here?

enaess · 2022-08-24T15:38:59Z

pppd/main.c

+    int i = 0;
+    const char** t;
+
+    while (up_protos && up_protos[i] && up_protos[i] != name)


Why not use a strcmp() here instead of comparing the memory address?

no real reason other than a compiler shouldn't duplicate the string and as such a pointer comparison is good enough. I've never seen that it gets duplicated but I've only ever worked with gcc in the last 19 years and some MSVC prior to that. Can convert if that's preferred, doesn't make a significant difference, and since others will probably wonder the same thing it's probably worth the change from a sanity perspective and just in case some compiler does actually duplicate the strings?

@enaess are you happy with the explanation or should I switch to strcmp?

I am not found of pointer comparisons as they imply deep understanding of the code and what you are trying to do. It is for the same reason Java does string comparisons using String.equals() vs. ==, they mean two completely different things.

pppd/sys-linux.c

jkroonza · 2022-08-25T05:06:22Z

pppd/main.c

+	run_net_script(PPP_PATH_NET_PREUP, 1);
+	if (if_indextoname(ifindex, iftmpname) && strcmp(iftmpname, ifname)) {
+	    info("Detected interface name change from %s to %s.", ifname, iftmpname);
+	    strcpy(ifname, iftmpname);


Need to call script_setenv here for IFNAME? What do I pass for iskey?

Only case where set_ifunit is called with iskey=0 is under multilink, in which case pppd joining a bundle I suspect (but don't KNOW) that the bundle interface will already be UP and thus set_ifup will never be called.

We've only ever used multilink in a handful of cases, and those use-cases are long-gone now so don't have an environment where I can scope and check this.

Neustradamus · 2022-10-10T00:25:24Z

@EasyNetDev: What do you think?
We wait you...

enaess · 2022-10-22T20:11:09Z

@Neustradamus I don't see this PR as a reason to hold up a release. While some good collaboration happened here, the idea of a new net-pre-up and net-up callback to improve on the current problems related to ip-pre-up and absence of ipv6-pre-up can be implemented as a separate PR later

jkroonza · 2022-10-24T06:51:57Z

@Neustradamus I don't see this PR as a reason to hold up a release. While some good collaboration happened here, the idea of a new net-pre-up and net-up callback to improve on the current problems related to ip-pre-up and absence of ipv6-pre-up can be implemented as a separate PR later

Hi,

I would really appreciate if we can get this merged, what would need to happen and what can I do from my side to assist in this regard?

jkroonza · 2022-12-30T06:58:36Z

Copied from PR #111

I'm a little confused by the intent of this change (#367).

OK.

There are a couple of reasons that the sifup/sifdown paths (which are by design intended ONLY to set flags, not to run scripts) are separated. One is that different platforms have different ways to represent multiple addresses on a given interface. This is a place where I think Solaris differs substantially from Linux. Solaris has "logical interfaces." A logical interface is the entity that holds the address and a set of independent flags for EACH address. That is, unlike Linux, each network layer protocol (actually each address) can be marked up or down independently.

Right. I was not aware of this. Does that actually happen that way? Because the various functions simply marks the interface up/down in the case of pppd, not a specific protocol/address here.

On Solaris, the SIOC[SG]IF* ioctls and the SIOC[SG]LIF* ioctls are different. The [SG]IF ioctls deal with legacy-style BSD-ish interfaces (dating back to SunOS 4.x days), while the [SG]LIF ioctls deal with logical interfaces. In the proposed change, all of the "LIF" handling has been deleted from the code base. I haven't tested this, but I wouldn't be surprised if this caused problems.

I can only test on Linux unfortunately. I looked at the code, and as far as I could determine, the code which was removed was 100% identical. I could have missed something. The primary aim here was to have a consolidated if up/down interface, which then did all the common work, around dispatching to the OS specific functions. Not to change behaviour, but to enable adding OS and protocol-independent up/down work to interfaces without having to do this in multiple places.

(The ioctl split exists in order to maintain binary compatibility with SunOS applications.) For example, on a Solaris 11 machine, I see this: lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1 inet 127.0.0.1 netmask ff000000 net0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2 inet 10.50.31.94 netmask ffff0000 broadcast 10.50.255.255 ether 8c:73:6e:c0:e:8a lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1 inet6 ::1/128 net0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2 inet6 fe80::8e73:6eff:fec0:e8a/10 ether 8c:73:6e:c0:e:8a net0:1: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2 inet6 fd48:3e6a:e278::e30/128 Things to note in this example (sorry about the line wrap): there are two distinct instances of "net0." They are distinguished by address family. And there's a separate "net0:1" logical interface that was created by DHCPv6 and is managed by that protocol. And for IPv6, in general, only the LIF ioctls will work on Solaris, due to the change in interface (wider values in the ioctl data structure). As for the rest of the change, I'm not sure I follow what the ultimate architectural goal of this might be. To me, each NCP is (somewhat unfortunately) different in terms of what it actually negotiates and what the units of currency are. This means that, for instance, IPv4 negotiates distinct endpoint addresses that are used at the application layer, but IPv6 negotiates only the IDs used for link-locals (which no application should touch), and MPLS has no addresses at all (LDP does that).

OK, so the question here is if it's possible to UP and/or DOWN the IPv4 vs IPv6 net0 interfaces (and even net0:1 independently of net0)? Based on what you stated, it sounds like a "yes it can", but can you please confirm? From what I recall the pppd code most certainly did not do that. And in fact, there was quite a bit of confusing stuff around upping and downing interfaces, and it looked like it's possible for IPv6 to tear down, and bring down the entire interface (ie, ifdown) without actually tearing down LCP as a whole, leaving pppd in a situation where it could consider IPv4 and MPLS etc to be "up", but due to the interface being down it's actually down. If you're right (I'm not saying you're wrong, just looking to understand better) about the way solaris handled this then I'm way off base and will probably need to rethink this a bit.

And that's ignoring other more exotic protocols such as AppleTalk that negotiate a separate network number and node address. All of this means that the environment variables and the command line arguments (again, unfortunately) differ for these external scripts, and is another reason these code paths are separate. I don't see what the grand script unification does except maybe cause compatibility issues. Moreover, there's no notion in PPP of having reached a stage of "fully up." Any link can bring up or tear down an NCP at any time, so if the ultimate goal is to get to a point where we can just invoke one "I'm up" script, I don't think that can be done in any coherent way.

I think this touches on some of the discussion and issue points. So what I desperately want is an init script to execute right at the start of pppd (net-init as implemented in that PR), then scripts to execute just before upping the interface (protocol independent, net-preup, after auth, but prior to setting the interface up), and then to execute when tearing down (less critical, but we do have lated ifb interfaces used for shaping ingress traffic from the ppp interfaces which needs to be torn down too).

(I'm not sure if that's where this was going, but just a guess.) Some minor issues:

If we're going to delete 'unit' numbers, let's not do it piecemeal. Let's have a single change that nukes them all, rather than a series of changes that arbitrarily yank them from one old interface or another. I'm fine with removal of that, as it was used only be a few odd non-public branches of the code, and is hard to maintain correctly when "nobody" is using it. I'm just not fine with incremental breakage; it seems unprincipled to me. (As this change is doing with sifup et al. Yes, I know it's not just this change; this is a pet peeve.)

Wasn't my intention to break anything, and I did take as much care as I could to maintain backwards compatibility. I do agree with your pet peeve and I can only state that was not my intention. You reference 'unit' numbers here specifically, I do recall yanking certain parameters which I think could have been unit numbers, they were yanked from functions I was modifying where to the best of my ability to determine they were not being used anyway. You reference "odd non-public branches" that may have used them, of this I was not aware. I also didn't spot specific other places in the (fairly large) codebase where they could be removed, but I could well have missed stuff, obviously, which is why we have reviews of code before merging :).

Definitely agree with the other commenters saying that comparing static strings by address only is not a good idea. It saves almost nothing in execution time, and opens up a neat hole for someone else to fall into by way of a bad compilation flag, a new pppd plugin, or some other contrivance. I don't see why this mechanism is so complex, when we already have per-NCP registration mechanisms for the FSM and this sort of state information could be carried there, but if it must be separate and must be done with a global array, then at least use strcmp.

OK, so add this to the per protocol structure is what I'm getting? This may also allow the one requested additional feature called net-up to be invoked once all protocols have been brought up.

The one other requested change was to only actually up the interface once all desired protocols have been negotiated and brought up, but as stated, that would require changes to most sub protocols which can obviously be done. Specifically, sub protocols would need to signal "configuration done, ready to be upped, please call this callback when interface is up". This can be done, and I'm happy to put in the effort for the in-tree protocols. I would, however, recommend that this be configurable behaviour.

If we're going that far, I'd further recommend that any and all other pre-up scripts be nuked, since already they have no guarantee that the interface is actually still down. Alternatively, we need to follow the above methodology and enforce that.

This is perhaps a bigger change than what should be merged at this point though, and could cause further delays to an overdue ppp release.

I saw some discussion on the issue of running the "down" script before the link goes down. It might seem nice to run the script before a locally-initiated tear-down starts, but that should never be given as a feature to the user, because there's no general way to make it work. The other end can terminate at any time, and in practice the network connection itself may well have ended before any sort of notification is received (which may also come in the form of a simple loss of carrier). So, sure, if you can invoke it early, and believe it to be important, then do so, but don't advertise it too strongly in any documentation or suggest that anyone should rely on it. (For what it's worth, I like interfaces that give consistent semantics rather than "helpful" ones. In other words, I'd prefer to see the "down" script run after the link goes down in all cases, so users can't become dependent on happenstance. But I guess others may differ or have particular limited usage cases in mind.)

Agreed about the consistency, will gladly make this change. They way I understood it was that locally the interface will still reflect UP, but I do agree with you, it will already no longer be operational if remote side tears down, so conceding to rather execute post down. This was a debate in my head and honestly it doesn't particularly matter that much.

jkroonza · 2023-01-02T11:15:37Z

@carlsonj replying here rather so we can split the release discussion from this, I think they should be separate at this point.

In Solaris, a logical interface is a specific protocol and a specific address. Marking an interface by name does that. It's not the same as Linux.

ACK. Will re-look at this code.

Here's what was missed: if (ioctl(ipfd, SIOCGIFFLAGS, &ifr) < 0) { if (ioctl(fd, SIOCGLIFFLAGS, &lifr) < 0) {
Two big differences there:

The IPv4 code uses the common "ipfd" file descriptor, which is open on "/dev/udp" for somewhat legacy reasons. The IPv6 code uses a temporary IPv6 UDP socket. (Not sure why; the code likely predates the addition of the ipfd6 variable. If I ever run into Adi Masputra again, I'll have to ask.) This is where the address family comes in.

The IPv4 code uses SIOCGIFFLAGS. The IPv6 code uses SIOCGLIFFLAGS. The two are not the same; GIF != GLIF. I agree the naming here is an eye test, though.

ACK. I'll re-look. My suggestion is then to proceed with a new ppp release without this patch. Work does need to be done here, and we'll keep using this patch on our systems for the time being.

pppd is initially launched. No need for a special script here, as the launcher can just do whatever is needed.

Can it? Let's say pppd is launched by xl2tpd or pppoe-server, you excpect the system administrator to somehow figure out which unit number is going to be used and wrapping pppd? Guessing that the "connect" option can be abused here, but at this stage (to the best of my knowledge) the unit number still won't be available. The disconnect script also doesn't run reliably due to "script is not run if the modem has already hung up". init script seems to be more aimed at initializing serial lines, either way, from a quick look these scripts only execute when ttyname option is given such that devnam is not the empty string. In fact:

Jan 2 12:05:11 kerberos pppd[9457]: In file /etc/ppp/options.lns: unrecognized option 'init'
Jan 2 12:05:52 cerberus pppd[24775]: In file /etc/ppp/options.lns: unrecognized option 'connect'
Jan 2 12:06:32 cerberus pppd[8513]: In file /etc/ppp/options.lns: unrecognized option 'disconnect'

This is when incoming over xl2tpd, pppd invoked by same.

pppd completes negotiating LCP but has not yet done AUTH (if any). We don't have this now. Not sure if it's important. It could be added.

This is the net-init part of same. Yes it's important. We need the unit number for initialization even prior to authentication. Mostly this ensures that ppp "state files" as we keep them were properly cleaned up during previous runs of pppd but there are other uses for us too.

pppd completes authentication of the peer and has not yet done any of the NCPs. This is auth-up.

There is both an auth-up (auth success) and auth-down (auth failure). Guessing the latter can be used to implement some form of source blacklisting, ie, for pppoe add the source mac into ebtables or similar to prevent further abuse, or to an ignore list for pppoe-server (which is a non-existent feature currently, but can be added, with the caveat that an attacker at this level can trivially bypass a mac block list, and can even utilize that to create a denial-of-service to other legitimate users. More useful for things like pptp and l2tpd wherewe can do a source IP block, but not useful in our primary case where we have a PPPoE to PPPoL2TP bridge as the LAC.

pppd completes negotiating an NCP but has not yet marked the network interface as "up" for traffic so no data is flowing yet. This is ip-pre-up. (Sadly, we have this for IPv4 only. Someone might want to add ipv6-pre-up.)

I created a PR for exactly that originally. It was shot down for various reasons, primarily that ip-pre-up is already unreliable (except perhaps for solaris), and adding ipv6-pre-up adds to the confusion. Two primary suggestions that came from that:

Delay "upping" the interface until all NCPs has completed, that way ipv6-pre-up would make more sense too. And net-init, net-pre-up, net-up and net-down could be further enhanced possibly. The question is, does net-pre-up then execute as the first pre-up script, or the last one in such a case. Would require callbacks into NCPs to notify NCPs that interface has now being brought up.
(my personal preference) Modify behaviour of pre-up scripts to provide details as to what will be configured (with no guarantee as to iface up/down state), passing the information as arguments to the script or environment variables (which I believe is already the case). Thus:
2.1. Execute net-pre-up;
2.2. bring up iface for operating systems that doesn't support per-protocol interface state (based on solaris input);
2.3. Set env variables;
2.4. execute ncp-pre-up;
2.5. configure interface;
2.6. bring up protocol specific interfaces for operating systems that supports that; and
2.7. execute ncp-up.

Note that the current patch series executes one NCP-pre-up prior to net-pre-up prior to upping the interface. This, following this discussion, will break at least solaris. This was my oversight plainly. The above should deal better with both Linux and Solaris type methodologies.

The involved scripts documentation would then be:

net-init - executes the moment the ppp interface has been created, at this point the interface will be down, unconfigured and prior to authentication commencing. You can use this script

auth-up/down - as per current. It should be noted these are executed asynchronously and thus precludes use for performing work between auth and NCPs being brought up.

net-pre-up - executes once the first NCP has completed negotiating. This is guaranteed to execute whilst the interface is still down, will be executed synchronously and should thus perform it's work in as short a time-frame as possible. No protocol specific information will be available at this point. You can safely rename the interface from this script, for example, based on authentication information.

*-pre-up (currently only ip-pre-up): Will be executed once the NCP has completed, prior to configuring the interface and, for operating systems such as Solaris that uses per-protocol interfaces, bringing up the per-protocol interface. For all other operating systems the interface will already be UP, but not yet configured for the specific protocol. The to be configured details will be available

*-up (I believe just ip-up and ipv6-up): Will be executed once the interface has been configured for the given NCP.

There was quite a large discussion around this. Essentially, pre-up cannot (In Linux world at least) be guaranteed to execute whilst the interface is still down. The consolidation was primarily done to simplify the net-* scripts for me, and because I thought it made sense to push this kind of work through a central controller function. I still think it does, but plainly it's not quite as simple as I thought.

pppd has marked the network interface "up" and traffic can now flow. This is ip-up and ipv6-up (and others where supported).

pppd has lost a network interface that was "up." This is ip-down and ipv6-down (and others).

For solaris where there is a per-protocol interface this makes sense, for Linux it most certainly doesn't make that much sense, since the interface has a single global state, eg:

6611: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 34:48:ed:8d:d4:61 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.144/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
       valid_lft 361sec preferred_lft 286sec
    inet6 fe80::cbb1:36d1:d0dc:a6a7/64 scope link 
       valid_lft forever preferred_lft forever

pppd has lost an authenticated peer. This is auth-down. "net-init" sounds a whole lot like point (1) to me. I don't see why anyone needs a special hook for this.

Refer above. auth-down executes on authentication failure, once a peer has been authenticated, the ppp is authenticed, authentication won't happen again.

The really narrow case that could be made (in the current system design) would be to have a script that fires when the interface is activated, just to support the rarely used "on-demand" feature in pppd, and also have that script fire immediately on the usual sort of pppd invocation. I guess I could get behind that, but it's definitely not like this proposed change at all.

I disagree. The patch series does not take ondemand into consideration at all currently. This point is valid, and if there is demand for hook scripts here I could work that into the system. ondemand itself I suspect makes less and less sense overall, except perhaps for things like vpn connections (client side). Eg, have the pppd sit there with a route pointed at the remote network, and bring up the vpn whenever it's needed, this to me makes some sense, however, ncp-pre-up in above should deal with those use-cases. I think. Haven't thought about this much at all.

"net-preup" already exists. That's point (4). Just use ip-pre-up. Tearing down also exists. That's point (6). So I'm not exactly sure what's being fixed here.

No it doesn't. In solaris world this statement may be true, but certainly not in Linux world. As it stands ip-pre-up has zero guarantees on linux with regards to being executed with the interface still being down.

I was specifically referring to the change for sifup() and sifdown() to remove the unit number and replace with the renamed set_ifup() and set_ifdown().

OK. Not sure what I missed here then, but I'll re-look.

Oof. Why? I think that roughly 99 and 44/100ths of this can be done using reasonably well crafted scripts with the existing interfaces. Granted, it looks like if you negotiate both IPV6CP and IPCP on the same link then it might be possible that ip-pre-up can fire "too late" but ON LINUX ONLY, so that's a Linux specific bug.

As I understand, so that we can have a guarantee that the interface is still down when executing ip-pre-up. With the information around Solaris now I had to think wider about this, and I suspect the above strategy should be suitable all around? The change in behaviour from a system administrator point of view would then be restricted to ncp-pre-up being executed prior to actually configuring the interface (which makes sense on Linux, not so much on Solaris - perhaps we need to vary behaviour here based on OS behaviour?)

I guess I wouldn't mind seeing "if-pre-up", "if-up", and "if-down" scripts that are invoked on Linux only when the whole interface is marked up and down by pppd. Not entirely sure if that fixes your problem, but it sounds like something that sys-linux.c could do without hurting others.

I call it net-, you call it if-, guess these are both aimed at the same. I don't care about if-up personally but yea, I guess this can be expanded to (keeping with my net- rather than if-, naming is irrelevant IMHO):

net-init - executed prior to negotiation (even auth).
net-pre-up - execute prior to bringing interface up. Permits renaming (using eg "ip li set dev pppX name pppX-jkroon").
net-up - executed after bringing interface up (on solaris I guess this will make little sense since it doesn't have a single state, but the interface could still be RUNNING at this point, just not protocol-specific UP?) This could be the added from my patch series fairly trivially and makes some sense for me.
(net-all-up - executed after all NCPs has either finished being configured, or failed conrfiguration. I see no point in this and thus won't endeavour to implement this unless someone requests me to do so)
net-down - executed after tearing down the interface, prior to destroying it (ie, ppp* still exist).

It's also possible that what you're really proposing is introducing a new "phase" for pppd, and I think that'd need a little more discussion. This would be a phase after NCP that represents "all NCPs resolved" (as described above), and possibly on Linux only, would be a place where the interface is finally marked "up." That seems like a more radical change than what's proposed here, but at least seems consistent with the existing design. Or, in a simpler model, we could just have a new pppd option that disables the modification of the IFF_UP flag by pppd, leaving that part to external scripts (or programs) to resolve. Then you can do whatever you want in your "if-up" and set the flag at your leisure. You might still need the "all NCPs" script to make that work.

You would. personally I have no need for a NCP specific pre-up in the Linux world, however, in Solaris world (without thinking too hard about this) it would make a lot of sense since each protocol really has it's own interface.

Otherwise, removing the existing scripts would also be a pretty big and incompatible change. I can't see how something like that would ever reach consensus without a fork. Even then, I think it'd be a long term disaster.

Just to be clear, not all suggestions in this thread came from me. My aim was "simply" to get a net-init, net-pre-up and net-down. Consolidation was done in order to simplify that (ie, no need to signal net-pre-up from hordes of different locations), and the consolidation I believe is what broke things overall for solaris as I understand. I'd still like to action that, and put as few things as possible in sys-*.c such that the behaviour of pppd between different operating systems remains as close as viably possible between different operating systems (and to make porting between operating systems easier, not sure if *-bsd is currently supported, for example, or whehter that's even viable). Currently we only have Linux and Solaris.

Agreed. So the ncp-pre-up being pre or post configuration could be a configurable option such that the default is to stick with current (broken on Linux) behaviour, but allow for the alternative behaviour. Or conversely we could have different behaviour hard wired in pppd based on the OS design. But that's still an incompatible change for Linux at least, so I think we should run this as an option and highly recommend the option being used on Linux then but not currently make that default, just warn that you should consider it's use (once off for every pppd run if one of the ncp-pre-up scripts is executed, unless the no- variant is explicitly set such that the system administrator has indicated to opt-in to the potentially broken behaviour), in a further future version we could then start warning that the default behaviour will change at some point, and then change it eventually.

Sorry for these long messages, but I feel it's important to not leave stones unturned, and to make sure that everything is well understood. Would the above strategy be adequate for you to keep existing behaviour as close as possible, whilst still introducing the new behaviour (which enables improvements for Linux sysadmins then).

…
-- James Carlson 42.703N 71.076W FN42lq08 @.***>

carlsonj · 2023-01-03T19:08:03Z

On 1/2/23 06:15, Jaco Kroon wrote: ACK. I'll re-look. My suggestion is then to proceed with a new ppp release without this patch. Work does need to be done here, and we'll keep using this patch on our systems for the time being.

OK.

1. pppd is initially launched. No need for a special script here, as the launcher can just do whatever is needed. Can it? Let's say pppd is launched by xl2tpd or pppoe-server, you excpect the system administrator to somehow figure out which unit number is /going/ to be used and wrapping pppd? Guessing that the "connect"

If you're doing something that requires a known interface name, why wouldn't you use the existing 'ifname' option on pppd? I don't see why guesswork would be needed. (And, no, if those other tools lack needed features, I don't think that's _necessarily_ a compelling argument to stack more on top of pppd, though it could be if it's somehow easier or more widely useful.) But, ok, there could be a plausible usage case here where you want the script to run after pppd has allocated a network interface to use, but before "anything else" has been done, including negotiating IP addresses or configuring them on the interface. I'm not actually *seeing* what that case might be, but, sure, that doesn't concern me as much as all of the rest of this. Suppose, though, we could find a clean way to fix the (apparent) Linux problem with ip-pre-up such that it actually was guaranteed to run and finish that script before pppd ever set the IFF_UP flag and after IPCP negotiation completed. Doing so would have the nice side-effect of making Linux pppd match what the Linux pppd man page says. Would that be sufficiently usable for what you're trying to do? The only difference (and I think it's a pretty important one) is that ip-pre-up runs late enough that it is *NOT* in the direct line of any timing-sensitive PPP negotiation machinery. So, yes, the addresses would be configured if you looked, but no IFF_UP flag (in this scenario) would be set.

option can be abused here, but at this stage (to the best of my knowledge) the unit number still won't be available. The disconnect

The old 'unit N' option is still available. May not be what you really want, though.

script also doesn't run reliably due to "script is not run if the modem has already hung up". init script seems to be more aimed at initializing serial lines, either way, from a quick look these scripts only execute when ttyname option is given such that devnam is not the empty string. In fact:

Correct. 'init' and 'connect' are about initializing the other side (the serial connection) and 'disconnect' is the serial-side tear-down script. Neither is what you want if you're trying to mess around with network interfaces.

Jan 2 12:05:11 kerberos pppd[9457]: In file /etc/ppp/options.lns: unrecognized option 'init' Jan 2 12:05:52 cerberus pppd[24775]: In file /etc/ppp/options.lns: unrecognized option 'connect' Jan 2 12:06:32 cerberus pppd[8513]: In file /etc/ppp/options.lns: unrecognized option 'disconnect'

I'll freely admit to knowing nothing about what xl2tpd is doing. At a guess, that *looks* like the standard pppd code has been modified and/or a plugin is removing those options. But I can't be sure. In any event, I don't think it's relevant to the issue at hand.

This is when incoming over xl2tpd, pppd invoked by same. 2. pppd completes negotiating LCP but has not yet done AUTH (if any). We don't have this now. Not sure if it's important. It could be added. This is the net-init part of same. Yes it's important. We need the unit number for initialization even prior to authentication. Mostly this ensures that ppp "state files" as we keep them were properly cleaned up during previous runs of pppd but there are other uses for us too.

I'm not entirely sure I understand those requirements. What are the "state files?" Is this a reference to the optional lock files used by pppd on ttys? The '.pid' file used to communicate with the external world? Or something else? This matters because it likely has implications for exactly *when* the script could possibly be run. It may be the case that it needs to run before pppd is ever started or that it could interact badly with pppd shutdown. It would be nice to have some detail here, as pppd already does its *own* management of state and I've never had great luck layering one manager atop another.

3. pppd completes authentication of the peer and has not yet done any of the NCPs. This is auth-up. There is both an auth-up (auth success) and auth-down (auth failure).

I might have been a little too terse in this description. These scripts are used when we're authenticating the peer. They are *NOT* general purpose wrappers around authentication in general. In other words, for the typical user with the typical sort of Internet connection scenario, the peer (the ISP) will insist on authentication but the local instance will *NOT* be configured to insist on authentication. And, of course, for that to succeed, the local instance needs to be configured with credentials and/or plug-ins that supply the authentication information demanded by the peer. And, because in that typical configuration we're not demanding authentication data from the ISP (often with the "noauth" option to be sure), this means that neither auth-up nor auth-down scripts will be used at all. These are sometimes referred to as "client" and "server" roles, although that terminology is (at least to me) incorrect. PPP is symmetric. Each side can decide (based on local administrative settings) whether and how to authenticate the other side. They're peers, not a a client/server relationship. But because people often set up a particular usage scenario, they tend to think it's the "only" way things work. This means that those scripts probably don't do what I'm _guessing_ you want.

Guessing the latter can be used to implement some form of source blacklisting, ie, for pppoe add the source mac into ebtables or similar to prevent further abuse, or to an ignore list for pppoe-server (which is a non-existent feature currently, but can be added, with the caveat that an attacker at this level can trivially bypass a mac block list, and can even utilize that to create a denial-of-service to other legitimate users. More useful for things like pptp and l2tpd wherewe can do a source IP block, but not useful in our primary case where we have a PPPoE to PPPoL2TP bridge as the LAC.

I don't understand. It doesn't appear to be related to auth-up or auth-down in any way. In particular, we don't run anything on authentication failure. The auth-up script is invoked if (and only if) we demand authentication from the peer, we get that required data, and we successfully complete authentication (i.e., we've validated the peer's identity). The auth-down script is run when (and only when) a link that previously ran auth-up is shut down or the PPP AUTH layer is otherwise torn down (as with LCP renegotiation). I don't see how those scripts could be used as you're describing. In general, PPP doesn't know about PPPoE/PPTP/L2TP details such as addresses on the underlying encapsulation. Nor, I think, should it. This part sounds very confusing to me, but maybe it has no impact here.

4. pppd completes negotiating an NCP but has not yet marked the network interface as "up" for traffic so no data is flowing yet. This is ip-pre-up. (Sadly, we have this for IPv4 only. Someone might want to add ipv6-pre-up.) I created a PR for exactly that originally. It was shot down for various reasons, primarily that ip-pre-up is already unreliable (except perhaps for solaris), and adding ipv6-pre-up adds to the confusion. Two primary suggestions that came from that: 1. Delay "upping" the interface until all NCPs has completed, that way ipv6-pre-up would make more sense too. And net-init, net-pre-up,

I would definitely agree with that. Delaying the setting of the IFF_UP flag so that ip-pre-up is actually reliable (and so that it conforms with the specification in the man page) on Linux sounds like a valuable thing to do.

net-up and net-down could be further enhanced possibly. The question is, does net-pre-up then execute as the first pre-up script, or the last one in such a case. Would require callbacks into NCPs to notify NCPs that interface has now being brought up.

You've lost me again. Dunno what the "callbacks" do, where they come from, or why they're needed. To me, if pppd is managing the interface flags, then pppd can offer script invocations that happen before or after flags are set in particular ways. Those scripts are free to do whatever they need to do. And, for consistency, pppd can (and does) wait for "pre-up" script to complete before actually setting the IFF_UP flag. But I don't understand "callback" in that context. What would the script be doing that would require information to be sent from the script to the pppd process? Is it anything other than success-vs-failure?

2. (my personal preference) Modify behaviour of pre-up scripts to provide details as to what /will/ be configured (with no guarantee as to iface up/down state), passing the information as arguments to the script or environment variables (which I believe is already the case). Thus: 2.1. Execute net-pre-up; 2.2. bring up iface for operating systems that doesn't support per-protocol interface state (based on solaris input); 2.3. Set env variables; 2.4. execute ncp-pre-up; 2.5. configure interface; 2.6. bring up protocol specific interfaces for operating systems that supports that; and 2.7. execute ncp-up.

I'm still missing: a. What net-pre-up could possibly do that ncp-pre-up (is that a generic term for "ip-pre-up"?) cannot. The only distinction it seems is that "net-pre-up" is running before IPCP option negotiation takes place. Is that it? Why? What can't you do after IPCP negotiation is done? b. What the ordering really means. Where does IPCP negotiation happen in that list? c. How this is compatible? It seems to me that doing 2.2 before 2.4 means that Solaris is broken. The guarantee for "ip-pre-up" is that it runs *before* the IFF_UP flag is set. So you can do critical things such as install packet filters. But in this new scenario, that doesn't work, because the IFF_UP flag was already set at 2.2. d. Why would there be "no guarantee" on interface up/down state? That makes no sense to me. The *whole point* of having ip-pre-up and ip-up is that the interface state is known so you can write sensible scripts. The state is in the pppd man page. Why would we remove that? e. Why there are so many points of control?

net-init - executes the moment the ppp interface has been created, at this point the interface will be down, unconfigured and prior to authentication commencing. You can use this script

... sentence seems cut off here. You can use this script how? Agreed that this one is plausible.

auth-up/down - as per current. It should be noted these are executed asynchronously and thus precludes use for performing work between auth and NCPs being brought up.

Yes; those are async scripts. Normally, they're used for audit logging. It is *possible* to interject a synchronous script between completion of the AUTH layer and the start of the NCP layer(s). I'm not sure if it's a good idea, so I'd really like to understand the intended usage much better. In particular, such a script would need to be carefully written because pppd would effectively hang while the script was off doing whatever it does. This means that the peer (unaware of the situation) would end up starting its NCPs, sending Configure-Request messages as one does following authentication, and timing out. In theory, one could possibly buffer a few messages in order to speed things along if the script is quick, but things can get really dicey there -- both with PPP behavior and known peer implementation bugs.

net-pre-up - executes once the first NCP has completed negotiating. This is guaranteed to execute whilst the interface is still down, will be executed synchronously and should thus perform it's work in as short a time-frame as possible. No protocol specific information will be available at this point. You can safely rename the interface from this script, for example, based on authentication information.

Or we could just fix ip-pre-up on Linux. Right? Not sure what interface renaming is about, but sure, that could be done. Seems easier to use 'ifname' option than hijacking in a script, but ok.

*-pre-up (currently only ip-pre-up): Will be executed once the NCP has completed, prior to /configuring/ the interface and, for operating systems such as Solaris that uses per-protocol interfaces, bringing up the per-protocol interface. For all other operating systems the interface will already be UP, but not yet configured for the specific protocol. The to be configured details will be available

That's not what the existing pppd documentation says, though. It says that ip-pre-up is run before the interface is marked "up." That's the whole reason for being. That's why it exists apart from "ip-up." /etc/ppp/ip-pre-up A program or script which is executed just before the ppp network interface is brought up. It is executed with the same parameters as the ip-up script (below). At this point the interface exists and has IP ad- dresses assigned but is still down. This can be used to add firewall rules before any IP traffic can pass through the interface. Pppd will wait for this script to finish before bringing the interface up, so this script should run quickly.

There was quite a large discussion around this. Essentially, pre-up cannot (In Linux world at least) be guaranteed to execute whilst the interface is still down. The consolidation was primarily done to

Sure it can. All that we have to do is introduce the "all NCPs stable" condition that we talked about before. The required change is then just this, regardless of OS: - "ip-pre-up" runs synchronously after the IP addresses have been negotiated and set on the network interfaces, but IFF_UP has *NOT* yet been set. - pppd waits until all configured NCPs reach either OPENED or STOPPED state. - Once reaching that bar, all of the network interfaces that have been negotiated are marked IFF_UP, and then any ip-up or ipv6-up scripts are run. I think that works everywhere, including both Linux and Solaris, and avoids all this extra complication. The "difference" from what we have today versus what you're proposing is a microscopic change on Solaris in the timing of just "ip-up" -- instead of waiting for IPCP and the IPv4 LIF alone, it ends up waiting for all NCPs to settle and all LIFs to be set. In a properly-functioning system, that's at worst a few milliseconds. In a badly-functioning one, who cares? The implication would be that if the user has enabled IPv6 and if the peer doesn't implement (or allow) IPv6 and it *ALSO* doesn't follow the RFCs so that it just discards the IPV6CP messages (rather than sending LCP Protocol-Reject as it must), then we could have a situation where ip-up is delayed arbitrarily as IPV6CP times out and eventually gives up. With the default parameters, that would take 30 seconds to resolve itself. The user could work around that by either (A) setting "noipv6" as already described in the man page for dealing with bug-ridden peers or (B) yelling at the owner of the peer machine to get a better system. Maybe both.

simplify the net-* scripts for me, and because I thought it made sense to push this kind of work through a central controller function. I still think it does, but plainly it's not quite as simple as I thought. 5. pppd has marked the network interface "up" and traffic can now flow. This is ip-up and ipv6-up (and others where supported). 6. pppd has lost a network interface that was "up." This is ip-down and ipv6-down (and others).

Yes.

For solaris where there is a per-protocol interface this makes sense, for Linux it most certainly doesn't make that much sense, since the interface has a single global state, eg:

But it still makes sense there. The current behavior on Linux is that the IFF_UP flag is removed when the *last* NCP goes away. That's what the static if_is_up and if6_is_up variables in sys-linux.c are about. (The if_is_up one is used confusingly as a counter, but it's really just a boolean.) This means that today: 1. You run the corresponding ip-down or ipv6-down script after deleting the addresses and *possibly* turning off the IFF_UP bit (if you're the last one out). 2. In the script, you can't trust the IFF_UP flag, particularly on Linux. But you *CAN* trust that the interface is no longer passing packets with that protocol, because we've shut it down in the kernel and removed the addresses.

Refer above. auth-down executes on authentication failure, once a peer has been authenticated, the ppp is authenticed, authentication won't happen again.

No, it doesn't really work like that.

"net-preup" already exists. That's point (4). Just use ip-pre-up. Tearing down also exists. That's point (6). So I'm not exactly sure what's being fixed here. No it doesn't. In solaris world this statement may be true, but certainly not in Linux world. As it stands ip-pre-up has zero guarantees on linux with regards to being executed with the interface still being down.

I think it's better to fix the bug rather than add more complexity.

As I understand, so that we can have a guarantee that the interface is still down when executing ip-pre-up. With the information around Solaris now I had to think wider about this, and I suspect the above strategy should be suitable all around? The /change/ in behaviour from a system administrator point of view would then be restricted to ncp-pre-up being executed prior to actually configuring the interface (which makes sense on Linux, not so much on Solaris - perhaps we need to vary behaviour here based on OS behaviour?)

We can certainly have scripts that only fire on a given platform, if that's needed. But to the extent possible, I *strongly* prefer that we design the interfaces in a platform agnostic manner.

I guess I wouldn't mind seeing "if-pre-up", "if-up", and "if-down" scripts that are invoked on Linux only when the whole interface is marked up and down by pppd. Not entirely sure if that fixes your problem, but it sounds like something that sys-linux.c could do without hurting others. I call it net-, you call it if-, guess these are both aimed at the same. I don't care about if-up personally but yea, I guess this can be expanded to (keeping with my net- rather than if-, naming is irrelevant IMHO):

I don't care about the naming. I was just trying to disambiguate it from your "net-" stuff, which *appears* to come from some other subsystem. (And which also has external dependencies and expectations that are a bit opaque to me right now.)

net-init - executed prior to negotiation (even auth). net-pre-up - execute prior to bringing interface up. Permits renaming (using eg "ip li set dev pppX name pppX-jkroon"). net-up - executed after bringing interface up (on solaris I guess this will make little sense since it doesn't have a single state, but the interface could still be RUNNING at this point, just not protocol-specific UP?)

Why wouldn't it make sense? The design is really quite straightforward and is in the man page. "ip-pre-up" is run before the interface is marked IFF_UP. "ip-up" is run after. It gives you the option of doing things that can *ONLY* be done when the interface is down -- such as configuring security-related packet filters -- before the "up" flag is set. Setting filters after going IFF_UP is a Bad Thing regardless of platform: it means the bad packets you're trying to filter away can sneak through on a timing hole. The IFF_RUNNING flag is orthogonal. That flag is set by the system itself when the underlying layers are ready to accept packets and cleared when those layers are down. It's basically an indication of hardware or lower-layer protocol status. The IFF_UP flag is an administrative flag: it says "the administrator would like to use this interface now." (For Ethernet, IFF_RUNNING is usually set using the hardware carrier flag. For PPP, it usually marks when LCP itself is in state OPENED, and LCP itself rides over some serial channel that usually has its own up/down indication.) Those two concepts are distinct. You can have an interface where the administrator has set an address and marked the interface "up," but someone has tripped on the wire and pulled it from the wall. That's IFF_UP but ~IFF_RUNNING. You can also have an interface where the wire is attached, but the administrator hasn't configured the interface for use it or just doesn't *want* it to be used. That's IFF_RUNNING but ~IFF_UP. In this case, pppd is acting as the ersatz "administrator." It's doing the things that the admin would normally do via "ifconfig" (or similar). I suggested disconnecting the administrative part of pppd as another option, but I guess that doesn't match whatever it is you're doing.

It's also possible that what you're really proposing is introducing a new "phase" for pppd, and I think that'd need a little more discussion. This would be a phase after NCP that represents "all NCPs resolved" (as described above), and possibly on Linux only, would be a place where the interface is finally marked "up." That seems like a more radical change than what's proposed here, but at least seems consistent with the existing design. Or, in a simpler model, we could just have a new pppd option that disables the modification of the IFF_UP flag by pppd, leaving that part to external scripts (or programs) to resolve. Then you can do whatever you want in your "if-up" and set the flag at your leisure. You might still need the "all NCPs" script to make that work. You would. /personally/ I have no need for a NCP specific pre-up in the Linux world, however, in Solaris world (without thinking too hard about this) it would make a lot of sense since each protocol really has it's own interface.

Given all of the above, I'm not sure I understand that position. I think that if you want "net-pre-up" then "ip-pre-up" *WITHOUT BUGS* is almost exactly what you want. And has the huge benefit of not introducing more complex functionality that we have to support from now until forever. If it's not exactly what you want, then I'd still like to see a use-case where net-pre-up works but ip-pre-up (again assuming the Linux bug is fixed) does not.

Otherwise, removing the existing scripts would also be a pretty big and incompatible change. I can't see how something like that would ever reach consensus without a fork. Even then, I think it'd be a long term disaster. Just to be clear, not all suggestions in this thread came from me. My aim was "simply" to get a net-init, net-pre-up and net-down. Consolidation was done in order to simplify that (ie, no need to signal net-pre-up from hordes of different locations), and the consolidation I believe is what broke things overall for solaris as I understand. I'd still like to action that, and put as few things as possible in sys-*.c such that the behaviour of pppd between different operating systems remains as close as viably possible between different operating systems (and to make porting between operating systems easier, not sure if *-bsd is currently supported, for example, or whehter that's even viable). Currently we only have Linux and Solaris.

We used to support those other systems, but yanked support as users went away. No idea what the state of that is now.

Agreed. So the ncp-pre-up being pre or post configuration could be a configurable option such that the default is to stick with current (broken on Linux) behaviour, but allow for the alternative behaviour. Or conversely we could have different behaviour hard wired in pppd based on the OS design. But that's still an incompatible change for Linux at least, so I think we should run this as an option and /highly recommend/ the option being used on Linux then but not currently make that default, just warn that you should consider it's use (once off for every pppd run if one of the ncp-pre-up scripts is executed, unless the no- variant is explicitly set such that the system administrator has indicated to opt-in to the potentially broken behaviour), in a further future version we could then start warning that the default behaviour will change at some point, and then change it eventually.

I'm not really with you there. I think having a new option that modifies how the scripts are used would add an extra level of complexity and failure-prone-ness that would be highly undesirable. It would end up with users randomly setting this "make it work right" flag everywhere, likely without understanding any of the implications. Pretty much the way people set "noauth" today or "novj". The "interface configuration" we're talking about is just adding the IP address. But not setting the IFF_UP flag. How does that matter? The system should ignore any IP addresses until the IFF_UP flag is set. If it doesn't do that today, then it's *really* fundamentally broken. Based on my limited testing, Linux isn't *that* bad.

Sorry for these long messages, but I feel it's important to not leave stones unturned, and to make sure that everything is well understood. Would the above strategy be adequate for you to keep existing behaviour as close as possible, whilst still introducing the new behaviour (which enables improvements for Linux sysadmins then).

Agreed; it's important to hash it all out. This is 30 year old code, and it would be good to avoid doing things that make hard for our heirs.

…

-- James Carlson 42.703N 71.076W FN42lq08 ***@***.***>

jkroonza · 2023-01-03T22:29:39Z

Without quoting (because the interface confuses me here and it's getting absurdly hard to sort it out):

If we can fix ip-pre-up, that solves our requirement for net/if-pre-up. Having looked at the code I do not thing this is a trivial endeavour though. Possible but not easy.

We still need/want a net/if-init.

And some form of -down script (doesn't really matter if we use ip-down or ipv6-down, but we generally can only take action on the last one ... how to figure that out, thus for me net/if-down makes sense to execute once all NCPs have been brought down.

ip-* only executes if IPCP is being negotiated, so it's conceivable that in the medium to long term ip-pre-up may no longer be usable. We certainly are trying to get to a point where we can ONLY have IPv6 on interfaces where possible, and tunneling IPv4 EDGES via IPv6 (most likely candidate currently is 464 XLAT or similar). I thus reason that a net/if-pre-up makes sense anyway.

I'm not sure net/if-up makes sense, for that matter even ip-up and ipv6-up makes little sense to me as everything we want to achieve should happen in pre-up. I guess there are possible absurd things like "enable bgp peers" that should only happen once the protocol is actually up though (crazy example, since I can't think of anything better). This is protocol specific though, and I can't envision any need for net/if-up.

So the issue on Linux happens with a sequence like:

LCP comes up.
AUTH phase happens (in neither, one or both directions, doens't matter).
NCPs starts. (let's say IPCP and IPV6CP).
IPV6CP completes, and brings interface UP. Only IPv6 LL at this stage. Other handling happens outside of pppd by way of RAs etc ...

at this stage the interface is UP and starts passing IPv6 packets.

There is no ipv6-pre-up either (adding this makes a lot of sense for Solaris, and my PR which has been closed can be used for this). In Linux this would suffer the same issue as ip-pre-up currently. One of the two would execute with the interface being down, but not both. And if another NCP is also being negotiated, it could be neither.

No IPCP completes, and pppd does this:
Configure the interface's address.
At this point packets can already be passed (Linux, not Solaris).
ip-pre-up executes.
Interface is "brought up" (ie, counter incremented).
ip-up executes.

Not too worried about teardown. Hoping this brings some clarity as to the perceived problem.

I'm thinking that documentation on Linux should be updated to reflect reality, as that is MUCH, MUCH simpler than fixing the overall problem (which I believe would require modification to ALL NCPs - some of which may not even be in-tree).

I'm thinking introducing interface/network level init, pre-up and down makes sense since these are protocol (NCP) agnostic.

I'm thinking having an option to explicitly execute any NCP specific -pre-up PRIOR to actually configuring also makes sense for me.

Currently ip-pre-up permits for interface renaming. Which is where the idea to permit this from net-pre-up came from. And there was quite a nasty other patch floating around to enable ifname to use % codes to include things like usernames etc into the interface name. The primary motivation being to set the interface name based on the authenticated username, eg, if user jkroon dials in over l2tp/ipsec, the interface can be renamed to something like vpn-jkroon rather than pppX. This can only be done after authentication. We don't have a specific need for this to be honest, this forms part of our state files.

The state files I reference are basically short key=value pair text files we keep on a per-interface basis, for example:

direction=in
protocol=l2tp
user=jkroon@realm

Except for in net-init and net-down these are append-only files. In net-init it's blanked, in net-down it's removed. This allows us to generate lists of connected users, and the mechanism they're using to connect. Information about LACs and calling station IDs are also added to this file. Yes, most/all of this information is also available radius side, but it's still convenient from a system administrator perspective to have this information "locally" on the LNS.

In terms of where we come from, we're an ISP, so using options like "unit" and "ifname" in the options files is precluded since it would cause problems, and we can't afford to have a per-user options file, not to mention that you can't reliably from pppoe-server nor xl2tpd determine which options file would be required until after PPP auth has happened.

Regarding having the option to "modify" behaviour - I do concede that this is nasty, but I don't see a better mechanism. The choices with respect to ip-pre-up (and any other NCP specific pre-up) are as follows:

Leave broken. But at least document it.
Try to fix it, probably requiring all NCPs to be modified.
Leave broken by default, document it and give an option for changed behaviour (and warn if it's not set). Eg, preup-prior-to-conf and preup-after-conf (naming is too long, but you get the idea, issue a warning if neither option is set ... in Linux).

I'm not convinced of any of this, and as per yourself, this just doesn't sit quite right.

carlsonj · 2023-01-04T14:54:05Z

On 1/3/23 17:29, Jaco Kroon wrote: Without quoting (because the interface confuses me here and it's getting absurdly hard to sort it out):

Agreed on that. Not sure what was so wrong with plain text email that we have to go through this. I guess any existing wheel can be reinvented in a substantially less than round form.

If we can fix ip-pre-up, that solves our requirement for net/if-pre-up. Having looked at the code I do not thing this is a trivial endeavour though. Possible but not easy.

If I get a spare moment, I'll look into it. It's not "trivial," but I do think it's a lot more durable. As I'm sure you know, one of the big problems with an older code base is the accumulation of little hacks everywhere -- technical debt. That's my main concern; that we avoid bolting little things on the side over and over for each project that comes along. They inevitably require slightly different things and have no built-in knowledge of each other.

We still need/want a net/if-init. And some form of -down script (doesn't really matter if we use ip-down or ipv6-down, but we generally can only take action on the last one ... how to figure that out, thus for me net/if-down makes sense to execute once all NCPs have been brought down.

A solution that detects all NCPs up would naturally be able to detect the down event as well, though I don't really understand this specific concern. Maybe what you're asking for is script-run-on-IFF_UP-cleared as some kind of always-final "net-down". That's possible; just not sure how useful. As a system architectural matter, I think we're kinda doing all of this wrong solely because of legacy issues. In a modern system, anyone who cares about interface up/down can just monitor a routing socket and get all those events delivered straight from the kernel, free of charge. No hokey scripts or other nonsense required. Clearly, that doesn't work for detecting the "I haven't marked it up, but I will shortly" event and synchronously blocking the flag change until some action is complete. That case does require something like ip-pre-up or an internal callback hook for a plug-in. (At least until someone dreams up a way for a routing socket to allow for interposition on requests.) I don't know what your application is doing, but if I were writing it, that's what I would do. After all, an Ethernet interface also can provide network service but has NONE of these scripts. WWED ("what would Ethernet do") is to me a fair way to sort out proposed PPP features.

ip-* only executes if IPCP is being negotiated, so it's conceivable that in the medium to long term ip-pre-up may no longer be usable. We certainly are trying to get to a point where we can ONLY have IPv6 on interfaces where possible, and tunneling IPv4 EDGES via IPv6 (most likely candidate currently is 464 XLAT or similar). I thus reason that a net/if-pre-up makes sense anyway.

I have a lot less sympathy for that one. What on Earth is anyone to do in such a script? "This interface is about to go up, but only for some mystery protocol" is not a coherent event. I can understand "authentication has completed." I can even understand wanting a synchronous call on authentication completion, though I think it's dangerous in general. I'm sort of baffled about "some NCP I'm not naming" as an event. Interface is about to go up for IPv4? Add filters and maybe some proxies or tunnels. Up for IPv6? Add filters and tunnels. What's the "generic" form of that? At worst, the current design forces you to deliver two scripts, ip-pre-up and ipv6-pre-up (assuming we add that, and assuming the bugs are fixed), that have identical contents and idempotent behavior. So what? I guess someone could argue that it "doesn't scale." But that also doesn't matter when we add a new network layer protocol about once per human generation.

I'm not sure net/if-up makes sense, for that matter even ip-up and ipv6-up makes little sense to me as everything we want to achieve should happen in pre-up. I guess there are possible absurd things like "enable bgp peers" that should only happen once the protocol is actually up though (crazy example, since I can't think of anything better). This is protocol specific though, and I can't envision any need for net/if-up.

The original reason for "ip-up" was to trigger local sync processes. A common one was to kick off a fetchmail process or a netnews send/receive. Granted, as I said above, I think it's a historical mistake. There should have instead have been a separate cron-like service that allows you to schedule things to run based on interface availability -- with the daemon reading from a routing socket rather than setting timers. Doing it just for the special case of PPP is suspicious at best.

So the issue on Linux happens with a sequence like: LCP comes up. AUTH phase happens (in neither, one or both directions, doens't matter). NCPs starts. (let's say IPCP and IPV6CP). IPV6CP completes, and brings interface UP. Only IPv6 LL at this stage. Other handling happens outside of pppd by way of RAs etc ... /at this stage the interface is UP and starts passing IPv6 packets./ There is no ipv6-pre-up either (adding this makes a lot of sense for Solaris, and my PR which has been closed can be used for this). In Linux this would suffer the same issue as ip-pre-up currently. One of the two would execute with the interface being down, but not both. And if another NCP is also being negotiated, it could be neither.

Are you referring to the current code or how it *should* operate? Personally, I think the pppd daemon should at least operate according to what is documented in the man page. That it does not on Linux is a bug, not a feature. And certainly not something we need to retread in every single message on this topic.

No IPCP completes, and pppd does this: Configure the interface's address. At this point packets can already be passed (Linux, not Solaris). ip-pre-up executes. Interface is "brought up" (ie, counter incremented). ip-up executes.

Again, this appears to be just a restatement of the existing bug. I'm not sure how this helps or makes anything any clearer.

Not too worried about teardown. Hoping this brings some clarity as to the perceived problem. I'm thinking that documentation on Linux should be updated to reflect reality, as that is MUCH, MUCH simpler than fixing the overall problem (which I believe would require modification to ALL NCPs - some of which may not even be in-tree).

Definitely, absolutely not. This would be terribly wrong. The whole point of ip-pre-up is that it runs before the interface is marked "up." The whole point. As in: just delete all the code instead or make it "ifndef LINUX" if the plan would be to update the documentation. There is absolutely no excuse for a feature that "sometimes, sort of, kinda, maybe" works. It's an attractive nuisance. The question of outside-of-tree NCPs is an interesting one. I don't know if there's (say) a Linux MPLS that has its own "sifup" function that torques the IFF_UP flag on "the interface." Heck, I don't know of Linux MPLS uses the same interface structure as regular network protocols -- if it does, then I think that'd be pretty wild. In any event, plug-ins have *always* been an architectural issue with pppd because there's very little constraint on the API. You can do arbitrary things in a plug-in that drag pppd outside of the existing documented behavior and/or out of alignment with the RFCs. So, having effectively a flag day for them doesn't sound out of bounds to me. More importantly: it matters much more to me that stock pppd works the way it's designed to work and less so that any hypothetical plug-in stays functional. I suspect we can accomplish either a compilation error or a run-time crash for any such unmodified NCP plug-in.

I'm thinking introducing interface/network level init, pre-up and down makes sense since these are protocol (NCP) agnostic.

The init and down one seem somewhat innocuous. I think they may well not be nearly as widely useful as you're expecting (certainly less useful than the existing ip-up; likely only for your project), but at least they're not actively harmful. I'm having a little trouble with a generic pre-up, as I just don't know what it means. Yeah, it's not too hard to implement (once the rest of the infrastructure is added), but I'm struggling with the utility of it. It seems like you're trying to future-proof something for a world where there's neither IPv4 nor IPv6. Or something like that. But without even a clear use-case for what actions might be taken in those scripts.

I'm thinking having an option to explicitly execute any NCP specific -pre-up PRIOR to actually configuring also makes sense for me.

What does "actually configuring" mean in this context? Does it mean "adding the IPCP negotiated IP address to the interface?" If so, then once again I must ask "why?" What can you accomplish before the address is set on the interface that cannot be accomplished after it is set? If you want to be in the middle of IPCP negotiation, approving or changing addresses, then that's a different discussion.

Currently ip-pre-up permits for interface renaming. Which is where the idea to permit this from net-pre-up came from. And there was quite a nasty other patch floating around to enable ifname to use % codes to include things like usernames etc into the interface name. The primary motivation being to set the interface name based on the authenticated username, eg, if user jkroon dials in over l2tp/ipsec, the interface can be renamed to something like vpn-jkroon rather than pppX. This can only be done after authentication. We don't have a specific need for this to be honest, this forms part of our state files.

Ah, ok. That (finally) makes some sense. That's the sort of thing I wanted: information about what you're actually trying to do. For what it's worth, we already have that feature in the built-in authentication. /etc/ppp/chap-secrets: ***@***.*** * "joe's secret" * -- ifname ppp-joeuser I don't know about the RADIUS or Diameter plug-ins. They should have features like that. It'd be a shame if they didn't as it's a long-standing feature in pppd.

The state files I reference are basically short key=value pair text files we keep on a per-interface basis, for example: |direction=in protocol=l2tp ***@***.*** | Except for in net-init and net-down these are append-only files. In net-init it's blanked, in net-down it's removed. This allows us to generate lists of connected users, and the mechanism they're using to connect. Information about LACs and calling station IDs are also added to this file. Yes, most/all of this information is also available radius side, but it's still convenient from a system administrator perspective to have this information "locally" on the LNS.

OK. So *not* pppd files. That was my point of confusion. Since I really haven't spent a great deal of time with L2TP, one area that seems a little confusing to me is that this seems to be relying on PPP authentication data rather than SCCRP Host Name and Challenge. But ok. In this case, since your system is the one that is demanding authentication, the existing auth-up and auth-down scripts do almost exactly what you want. auth-up is invoked when the user has been authenticated by PPP, and auth-down is invoked when the link goes down after having done auth-up. (And pppd has some logic to make sure that we never overlap auth-up and auth-down, in case the scripts run for a "long time.") It sounds like your concern there is that these scripts aren't synchronous -- that we don't block the state machine progress based on script termination. But I don't think that's a serious issue given the usage case. I would just write the authentication data to one file via auth-up, and then write the addressing data to another file with ip-up. The user interface that displays "logged in users" can read both and correlate based on interface name. If it's just a matter of convenience, that seems more than sufficient. If it's something with tighter constraints, then how do those fit in? Perhaps what you really want here is a custom plug-in to do all the special actions you need (and perhaps an API or two to enable them) rather than a slate of new general-purpose mechanisms in pppd.

In terms of where we come from, we're an ISP, so using options like "unit" and "ifname" in the options files is precluded since it would cause problems, and we can't afford to have a per-user options file, not to mention that you can't reliably from pppoe-server nor xl2tpd determine which options file would be required until after PPP auth has happened.

There's already a feature to do that in the built-in pppd authentication mechanism -- it's the "extra_options" mechanism in auth.c. No special options files needed.

Regarding having the option to "modify" behaviour - I do concede that this is nasty, but I don't see a better mechanism. The choices with respect to ip-pre-up (and any other NCP specific pre-up) are as follows: 1. Leave broken. But at least document it. 2. Try to fix it, probably requiring all NCPs to be modified. 3. Leave broken by default, document it and give an option for changed behaviour (and warn if it's not set). Eg, preup-prior-to-conf and preup-after-conf (naming is too long, but you get the idea, issue a warning if neither option is set ... in Linux). I'm not convinced of any of this, and as per yourself, this just doesn't sit quite right.

4. Break these things up to reduce the dependencies: a. Fix the bugs (places where pppd doesn't conform to its existing documentation and/or where the behavior or the documentation are incomplete or incoherent). b. Create a plug-in that provides the L2TP dial-in access server glue, potentially added some new callback hooks to make possible whatever it needs to do.

…

-- James Carlson 42.703N 71.076W FN42lq08 ***@***.***>

jkroonza · 2023-01-04T15:43:34Z

Thank you, that feedback was very clear and concise, much appreciated. Fully agreed around technical debt, I've spotted a fair amount of that in ppp project just trying to implement this. Hoping to SOLVE some of this for the future, not add to it, so your input goes a long way for that. Regarding comparing ethernet to ppp ... I don't think that's a always a fair comparison, ethernet is designed as a multi-access network, not point-to-point, and whilst many of the issues may be similar, this is not always the case, especially when you consider ISP type use. If you look at dhcpcd for example you will note it also has a very comprehensive hook script mechanism that can be used to take a number of actions, and for us these actions are very different when compared to ppp. So even on a client or "access" side, we generally treat this very differently.

I was not aware of extra_options, to the best of what I can determine looking at auth.c this only applies to local secrets files.

163 /* Extra options to apply, from the secrets file entry for the peer. */
164 static struct wordlist *extra_options;

Yes, to a degree I think net-pre-up is more future-proof than ip-pre-up as it does not rely on IPCP from being run. It also makes it more clear that it will execute regardless of NCPs being negotiated. This world is probably further off than I'd like to see, but I figured whilst I'm messing/tampering with it I might just as well.

I personally do believe that net-pre-up vs ncp-pre-up makes some sense. And I do believe both have their uses. For example, it makes more sense for me to do NCP agnostic work (like setting up tc on the ppp interface) from a network-agnostic script, and then to possibly amend or add to it from ncp-pre-up PRIOR to the interface coming up. Frankly, in this specific use-case we have nothing protocol specific going on, however in some other cases we do have some ipv4 vs ipv6 specific things going on. the ipv6 stuff currently due to lack of ipv6-pre-up just happens from ipv6-up, and we use procfs to not auto-configure an interface by default, but then on the ppp interface if we'd like it enable it from ipv6-up, so this isn't a major blockage currently. Based on the fact that this has not been requested previously, I suspect that either IPv6 penetration isn't what we'd like it to be yet, or very few people care enough (or have figured out workarounds like we have). Towards this end, no, I do not believe that it's one or the other, it really should be both.

I do agree I'd prefer to have the interface configured for the NCP prior to invoking the ncp-pre-up, but if we can't guarantee that this can happen, then I'd like the OPTION of having it execute PRIOR to configuring the interface with IP addresses, but with the environment variables already set. As we've established, behaviour on solaris is sane, linux is not. I'd rather have the issue documented at the very least in the man page, and workarounds provided if the issue can't be properly fixed. With an option to pick which quirky behaviour the system administrator prefers, and I see three options (based on your input too):

Delay interface UP until all NCPs have completed (you stated you will investigate options, there are downsides to this w.r.t. buggy remotes resulting in long, sometimes indefinite negotiation, and simply stating the remote side should fix isn't always viable/practical).
Execute pre-up prior to configuring (adding addresses) to interface, with no guarantee that interface is still down.
Execute pre-up after configuring, with no guarantee that interface is still down.

I'll happily concede that 1 should be the default, but I'd like the option of behaviour 2 or 3 too.

I'd like to have the options that interface comes up even if only one NCP completes, in case of remote bugs (being on the ISP side it's hard to guarantee what stupidity we face from the remote side, we've seen some ... interesting things).

We are pushing to a world where we can deploy routers that supports 464 XLAT such that we can move to a world where we only have IPv4 where absolutely required, so these routers will NACK IPCP, and then NAT46 IPv4 into the well-known 64:ff9b::/96, which the core ISP network can then NAT64 again on some dedicated host(s) which is connected to the world with IPv4. This is still a bit off, but yes, I'm at the same time trying to future-proof for this. IPv4 is fast becoming a scarce resource, and at the current IP lease rates this isn't something we're looking forward too, and thus looking to mitigate early.

With regards to l2tp and pppoe other "outer" protocols, we can trust the l2tp host (FNO controlled, and if we can't trust them, then we've got substantially bigger problems than rogue clients), but we can't trust the pppoe nor the ppp side of things. In the case of the FNO bridging PPPoE to L2TP the SCCRP authenticates the LAC, not client, so the LAC (which bridges between PPPoE and L2TP) is authenticated at this stage, but ppp ends up authenticating the final client. VPN use typically only authenticates in ppp anyway, and SCCRP doesn't happen here (PPP/L2TP/IPSec for example).

I've not actually used MPLSCP on ppp yet, so I honestly don't know what's all involved there, or even how it works, but we will have to look at this later in the year for another project.

carlsonj · 2023-01-05T20:11:29Z

On 1/4/23 10:43, Jaco Kroon wrote: Thank you, that feedback was very clear and concise, much appreciated. Fully agreed around technical debt, I've spotted a fair amount of that in ppp project just trying to implement this. Hoping to SOLVE some of this for the future, not add to it, so your input goes a long way for that. Regarding comparing ethernet to ppp ... I don't think that's a always a fair comparison, ethernet is designed as a multi-access network, not point-to-point, and whilst many of the issues may be similar, this is not always the case, especially when you consider ISP type use. If you look at dhcpcd for example you will note it also has a very comprehensive hook script mechanism that can be used to take a number of actions, and for us these actions are very different when compared to ppp. So even on a client or "access" side, we generally treat this very differently.

DHCP is an interesting comparison because DHCP can be used over PPP with both IPv4 and IPv6. The fact that we negotiate DHCP-like things within some (but certainly not all) of the NCPs is more of a bug than a feature, but is definitely water under the bridge. In particular, the DNS weirdness in IPCP was delivered to the WG as a fait accompli by a Particular Large Vendor, and I think predates my time as chair. In any event, it wasn't at all a pleasant surprise. It stinks and is a terrible mistake we have to drag along with us "forever." (For those following along: DHCP isn't just an address leasing mechanism, but can be used with already-addressed interfaces to provide all sorts of application layer data. It -- and the BOOTP subset -- have a long history with serial links prior to PPP, where BOOTP over SLIP was once somewhat common and served exactly the same purpose.)

I was not aware of extra_options, to the best of what I can determine looking at auth.c this only applies to local secrets files. 163 /* Extra options to apply, from the secrets file entry for the peer. */ 164 static struct wordlist *extra_options;

Eh, probably so. Oh well. In any event, that was the intended design here: a given identity can unlock a set of configuration values. It has the downside that it happens after authentication is complete, so you can't set some of the basic link parameters on a per-user basis, but you can definitely tweak all of the network layer things you might need. You'd probably have to talk to someone involved with RADIUS/Diameter to find out how (or "if") such a thing would be usable there. Worst case, one could augment the AAA plug-in so that it also reads a file or database to map from authenticated user ID to a set of pppd options to inject via extra_options. In any event, it wasn't intended as *exclusively* for authentication via built-in file mechanisms.

Yes, to a degree I think net-pre-up is more future-proof than ip-pre-up as it does not rely on IPCP from being run. It also makes it more clear that it will execute regardless of NCPs being negotiated. This world is probably further off than I'd like to see, but I figured whilst I'm messing/tampering with it I might just as well.

If we had the Linux bug fixed and you hooked both ip-pre-up and (an added) ipv6-pre-up, I suspect you'd have a solution that substantially outlasts at least my lifespan. And probably the marketspan of customers initiating PPP links as well. So that doesn't quite do it for me. (But see below.)

I /personally/ do believe that net-pre-up vs ncp-pre-up makes some sense. And I do believe both have their uses. For example, it makes more sense for me to do NCP agnostic work (like setting up tc on the ppp interface) from a network-agnostic script, and then to possibly amend or add to it from ncp-pre-up PRIOR to the interface coming up. Frankly, in this specific use-case we have nothing protocol specific going on, however in some other cases we do have some ipv4 vs ipv6 specific things going on. the ipv6 stuff currently due to lack of ipv6-pre-up just happens from ipv6-up, and we use procfs to not auto-configure an interface by default, but then on the ppp interface if we'd like it enable it from ipv6-up, so this isn't a major blockage currently. Based on the fact that this has not been requested previously, I suspect that either IPv6 penetration isn't what we'd like it to be yet, or very few people care enough (or have figured out workarounds like we have). Towards this end, no, I do not believe that it's one or the other, it really should be both.

tc might be a viable explanation of such an agnostic script. That's what I was trying to push towards: a usage case, so it's possible to document why such a feature is there and how anyone might use it somewhere and (crucially) what it could depend on. Many things seem to turn out poorly if there's no usage case because we end up getting all the boundary conditions wrong. Note that the existing "set_filters()" function is invoked after authentication is complete and before the network layers start negotiating, so it's all pretty well staked out. Though it's also the case that for scaling to very large numbers of users, you may well not want to have a bunch of scripts tying everything together. But I guess that's your issue to navigate.

I do agree I'd prefer to have the interface configured for the NCP prior to invoking the ncp-pre-up, but if we can't guarantee that this can happen, then I'd like the OPTION of having it execute PRIOR to configuring the interface with IP addresses, but with the environment variables already set. As we've established, behaviour on solaris is sane, linux is not. I'd rather have the issue documented at the very least in the man page, and workarounds provided if the issue can't be properly fixed. With an option to pick which quirky behaviour the system administrator prefers, and I see three options (based on your input too): 1. Delay interface UP until all NCPs have completed (you stated you will investigate options, there are downsides to this w.r.t. buggy remotes resulting in long, sometimes indefinite negotiation, and simply stating the remote side should fix isn't always viable/practical).

It's not indefinite. The existing pppd FSM parameters (and the design of PPP itself) prevent that.

2. Execute pre-up prior to configuring (adding addresses) to interface, with no guarantee that interface is still down. 3. Execute pre-up after configuring, with no guarantee that interface is still down.

It sounds like you are still describing bugs, so I think I need to work through some examples here. First of all, although the underlying protocol design allows it, and any good definition should allow it, the current pppd implementation does not actually support any "passive" NCPs. (Only LCP can be marked "passive," allowing the peer to initiate the link first.) Once we get to continue_networks(), all of the NCPs are given the "Open" event at once and they all start negotiating. This is then bounded overall by the restart timer and the maximum Configure-Request count. So, a good but complicated starting example starts with these: 1. LCP negotiates and goes to Opened state. 2. CHAP negotiates and goes to Opened state. 3. CPP+MPPE negotiates and goes to Opened state. At this point, we'll issue the "Open" event to all of the regular NCPs, and they start negotiating in parallel. The interface is affirmatively *DOWN* and nothing is yet configured. Now, we *could* invoke a net-pre-up script here, but I'm wobbly on that unless the excuse is just to invoke 'tc' commands on Linux. In which cause ... eh ... ok. Hope you didn't actually need IP addresses for that. In the current code (WITHOUT a fix), the next step gets ambiguous because of a Linux bug. Specifically, we could end up doing this specific sequence: 4x. IPV6CP finishes and marks interface UP 5x. IPCP finishes, sets addresses, and causes ip-pre-up to run (BAD! BUG! STOP!). I am once again arguing that this is straight-up wrong. It's a problem. It must be fixed. We just can't live like that. It's not something to document and pretend never happened. It's not a reason to introduce a new "fixed" interface and let the existing code rot in place as though it hadn't been thought about. Assuming a fix, the continuation looks like this: 4. NCPs negotiate with some ending in Opened state and some in Starting state (following LCP Protocol-Reject and a lower-down event). The NCPs that happen to negotiate addresses set them on the interface as part of this process, but AGAIN the PPP network interface is still DOWN. 5. After all NCPs settle into one of those two states as above, then invoke ip-pre-up and ipv6-pre-up, as indicated, and wait for them to terminate synchronously. Note: interface is still DOWN. 6. Now set IFF_UP in a platform-dependent manner. 7. Finally, invoke ip-up and ipv6-up. That's the normal (albeit a little complicated) route. The only interesting part comes if someone decides to start up an NCP in the middle of a connection. The underlying protocol allows this, and pppd also supports it, though the sequence itself is made tricky because of the lack of "silent" mode. This case looks like: 4-7. The protocols come up normally and we do as above. 8. The peer sends IPCP Terminate-Request. This causes the IP address to be removed, but if IPv6 is still running, nothing else happens. The interface is still up. The IPCP state machine is left as Stopped. Only IPv4 is effectively "down." 9. Some time passes. 10. The peer sends IPCP Configure-Request. We renegotiate IPCP and add the address back on. At this point, the interface is still up and we (somewhat bogusly for Linux only) invoke ip-pre-up again and then ip-up. Even with the fixes described so far from either of us, this is still broken. That's the one case where I think we would need to do something special for Linux. Is this the case you're worried about? If so, we could possibly do something like check whether ip-pre-up exists and, if so, then temporarily take the interface down, set the address, run the script, and bring it back up. That'll cause a short outage in IPv6, but probably nobody really cares. (If we had "silent" mode for IPCP or IPV6CP, we could start the protocol off in Stopped state and allow it to come up later. The existing FSM implementation supports this, but there's no option to enable it because nobody has ever needed it.)

I'll happily concede that 1 should be the default, but I'd like the option of behaviour 2 or 3 too.

I don't want (3) as stated. That's clearly a bug. Either the interface is down for the protocol or the "pre-up" can't be used. I can see how (2) might be a hack-around for the specific IFF_UP problem on Linux, particularly for the re-negotiation case described as (10) above. I don't particularly like this, though, especially as it has unknown side-effects with existing scripts. It's entirely possible that existing scripts expect the address to be on the interface so that "route add" can be used with a next-hop address. (Yeah, on some systems you don't and shouldn't need a next-hop address for point-to-point. But legacy stuff and/or confused admins ...) I'd very much rather that (regardless of whether the bug is fixed or not) we have good hooks for a plug-in that wants to do something weird during the address-setting process. Hooks like that are nearly free in terms of maintenance cost. And that would enable you and others doing similar things to make progress without having to integrate a change now.

I'd like to have the options that interface comes up even if only one NCP completes, in case of remote bugs (being on the ISP side it's hard to guarantee what stupidity we face from the remote side, we've seen some ... interesting things).

We have those options today. See the "*-max-configure" and "*-restart" options. They do *exactly* that. We give up on a protocol and pretend that the peer has rejected it if the peer has bugs. We even do it *by default*, because the options have default limits on them. I don't see how you can do any better than that. There are few truly timely and robust ways to discover that a given peer is clueless.

I've not actually used MPLSCP on ppp yet, so I honestly don't know what's all involved there, or even how it works, but we will have to look at this later in the year for another project.

Bit of a red herring. I was casting about for what network layer protocols other than IPv4 and IPv6 you could possibly be caring about that could possibly be involved somehow in the setting of the IFF_UP bit, and wasn't sure who your end users might be (yours aren't necessarily the same as mine). All of the other network layers that an ordinary user might care about (IPX, AppleTalk) are long since dead and buried. That leaves MPLS, which is really used as a datagram replacement for ATM or Frame Relay in backbone applications. No ordinary home user really ought to be asking for MPLS, though it might be possible for some kinds of business links. (Basically, the same places where they'd buy a dedicated circuit.) But if we did have it, my point was that it's unclear to me how it'd actually be hooked in. It's more L2-ish than L3-ish, so I'd think it'd be really weird if it showed up as part of "ppp0" or that it would care about IFF_UP. At least that'd be weird to me; I don't know what Linusians think about the issue. It should maybe look like whatever an InfiniBand underlying interface looks like. And if you meant something like IPv9 (since the numbers through at least 8 have already been proposed and shot), well, nah. We're having a tough enough time getting IPv6 deployed after 28+ years since IPng and CLNP got hashed out in Toronto. I mean, I don't want to be the one who says "640k ought to be enough," but I wouldn't hold my breath expecting something else. (Yes, there are admittedly potential wildcards. The alphabet-related one certainly comes to mind, as do a few national governments. But I'll just keep my eyes closed and trust that deployment is very much harder than any of them might think.)

…

-- James Carlson 42.703N 71.076W FN42lq08 ***@***.***>

jkroonza · 2023-01-06T07:38:45Z

640KB :p.

For tc commands, yes, we definitely DO NOT require any form of IP information here. At least, for what we use it for :).

Your "later renegotiate" reasoning looks solid. But I disagree, once the interface is up it should remain up. It's trivial to disable ipv6 temporarily on a per interface basis via sysctl (net.ipv6.conf.${interface}.disable_ipv6), but I don't see similar for ipv4. And once disabled, to re-enable experience dictates that a number of other sysctl's for same iface also needs to be properly reset. I've not dug too deep into this. This is a really, really sticky case, but not one I'm too worried about.

I still do not like there being any form of longer delay bringing the interface up than absolutely required. I really feel strongly that (on Linux at least) this should be configurable behaviour, or at the very least, if no ncp-pre-up scripts are present, should default to not delaying bringing the interface up. Since pre-up's are handled from the NCPs this gets tricky again and would require co-operation from NCPs to indicate which has it.

For downing the interface temporarily that's a bad idea IMHO, and again, if this is the way you want to go, sure, but I'd definitely want a configuration option to avoid this behaviour.

Given the discussion, would it be OK for me to so long amend the net-* scripts (would you prefer if-*?) to provide for network agnostic net-init (executes prior to auth even, the moment LCP has been established), net-pre-up (auth completed, either when first NCP signals ready, or prior to NCP being initiated), net-down (after LCP has shut down, prior to deleting the interface from the OS).

Can this be done currently by way of plugin rather than coding this into the core? Then users looking for this can just load this as a plugin rather, eg, netscripts or ifscripts?

carlsonj · 2023-01-06T22:26:31Z

On 1/6/23 02:38, Jaco Kroon wrote: 640KB :p. For tc commands, yes, we definitely DO NOT require any form of IP information here. At least, for what we use it for :). Your "later renegotiate" reasoning looks solid. But I disagree, once the interface is up it should remain up. It's trivial to disable ipv6 temporarily on a per interface basis via sysctl (net.ipv6.conf.${interface}.disable_ipv6), but I don't see similar for ipv4. And once disabled, to re-enable experience dictates that a number of other sysctl's for same iface also needs to be properly reset. I've not dug too deep into this. This is a really, really sticky case, but not one I'm too worried about.

Agreed; it would be messy at best. Turning the bit off may well have undesirable side-effects. I'm not too sure what to do about that one. I don't think there are any really good answers for it.

I still do not like there being any form of longer delay bringing the interface up than absolutely required. I really feel strongly that (on Linux at least) this should be configurable behaviour, or at the very least, if no ncp-pre-up scripts are present, should default to not delaying bringing the interface up. Since pre-up's are handled from the NCPs this gets tricky again and would require co-operation from NCPs to indicate which has it.

Conditioning the behavior (setting the IFF_UP flag) on the existence of a pre-up script makes sense to me. It's not really observable in any meaningful way without that script in place. I'm not really sure that such a wait does, in fact, result in any concerning delay, though. For properly-functioning implementations, we still have to go through LCP+AUTH+NCP to get to an 'up' interface, no matter what we do. Of those bits, the frequent outlier is the authentication portion, not the NCP, as it often proxies into some external AAA mechanism. (Yes, I know about the bad case of 3GPP, where they acknowledge all authentication data without bothering to check any of it, and then do the expensive AAA stuff during NCP. The less said about that, the better.) Even with a poorly-behaved (but still conformant) implementation, we're talking about _maybe_ a couple of seconds worth of skew. It's only the hypothetical really, really bad implementation that completely ignores the Configure-Request messages for an NCP and forces us into timing out completely that causes a noteworthy delay. I'd rather document that as "here are the risks associated with pre-up if your peer stinks, and your options for a work-around." (With the work-around being either don't use pre-up, tune the existing timeouts and retries downwards, or just disable the faulty protocols and drive on.)

For downing the interface temporarily that's a bad idea IMHO, and again, if this is the way you want to go, sure, but I'd definitely want a configuration option to avoid this behaviour.

Agreed. I was just trying to think out loud about the larger problem to see where any sort of "fix" for it might break down. I think I'm ok with saying that if pre-up exists, then we just don't allow protocols to be renegotiated like that without taking LCP down. I think that if you're using pre-up, you're *presumably* doing it from some Really Good Reason and not just on spec. As such, it should do what it advertises to do and not something arbitrarily different.

Given the discussion, would it be OK for me to so long amend the net-* scripts (would you prefer if-*?) to provide for network agnostic net-init (executes prior to auth even, the moment LCP has been established), net-pre-up (auth completed, either when first NCP signals ready, or prior to NCP being initiated), net-down (after LCP has shut down, prior to deleting the interface from the OS).

Those will need to be nailed down a bit more. I'm still fuzzy on the purpose of net-init as described here. You'd have no available information there about the identity of the peer, so the whole bit about renaming the interface or issuing per-user 'tc' settings seems out the window. It's not clear you could do much in that script, but I _guess_ the purpose is to do "rm -f" on a pile of left-over external state that might exist in this particular implementation ... ? If that's it, then I'd wonder whether there's a better answer somewhere. Is net-init also synchronous like the pre-up script and does it thus block LCP from executing? It seems so. The "net-pre-up" needs to be defined more crisply. If it happens right after auth + CCP/ECP, runs synchronously, and blocks the starting of NCPs, then I think I understand it. If it's anywhere else, then I'm much more wobbly on how to describe it for a user. "After the first NCP" seems like a bad definition to me. It means that it ends up with an unclear relationship with ip-pre-up. It's just confusing that way. In that case, it seems like you're trying to build a union of ipv4-or-ipv6-pre-up, and I think that's either pointless or just completely dependent on Linux issues and a fear of bad peers. As far as I can tell, the existing design serves the purpose, and there's no good reason I can see to force the union. Even if it means that you personally are forced to deliver two identical pre-up scripts and merge the events on the other end, I don't think the added bits in pppd for that part are worthwhile. If that's what you really want, then I think that belongs in a plug-in instead. It sounds like you might intend "net-down" to be synchronous as well. I'm not sure that's a great idea, because "down" scripts have a long history of hanging and causing resource leaks. I think the existing down scripts do as much as we can reasonably do here, though, for symmetry with "net-pre-up," I guess there could be a "net-down" that (presumably) is invoked after the other ip*-down scripts. I just think it should be async and guarantee nothing about the network interface. And, again, if you need to do something peculiar that would be hazardous for general use, I think a plug-in is the right way to go about that.

Can this be done currently by way of plugin rather than coding this into the core? Then users looking for this can just load this as a plugin rather, eg, netscripts or ifscripts?

I was mostly suggesting that for your particular L2TP application, you should probably explore the option of creating a plug-in that does what you want rather than trying to wire everything together with a bunch o' scripts (and thus requiring yet-more baroque scripting options). I don't think such a plug-in would be generically useful, but I guess that's up to you. We can easily create something like: /* Hook for a plugin to know when IP protocol is about to come up */ void (*ip_pre_up_hook)(u_int32_t our_addr, u_int32_t his_addr) = NULL; ... and then you can go to town on this state without affecting existing pppd users or adding any baked-in mechanisms that we'll need to continue supporting "forever" even when this L2TP application no longer exists.

…

-- James Carlson 42.703N 71.076W FN42lq08 ***@***.***>

jkroonza · 2023-01-08T08:28:46Z

I hear you regarding plugin vs external scripts. So from a system administrator perspective, it is simpler to just modify shell scripts, and it's also a lot simpler to debug these, and if something goes wrong with them, the probability of impacting or even crashing pppd is much lower. We prefer doing as little as possible from inside the pppd binary for this reason. It's also lower maintenance overhead to not have to keep non-public plugins in sync with the pppd project than notification scripts (the available scripts has to the best of my personal knowledge not changed in decades). I suspect I'm potentially one of the first pushing for that.

net-init - purpose: to notify that a new ppp negotiation has started. Basically, a new "unit" has been created. Most people will not find this useful, since quite frankly, as you say, there isn't much information available here. It's pre-auth, pre-everything. I don't recall how I implemented in the associated patch, so I'm not sure if it's actually pre or post LCP established. For our use-case this doesn't particularly matter. Our use-case is to reset out-of-pppd data we associate with the "unit" where the unit number is the primary key (if you think in terms of relational data). Other possible use-cases I can imagine relates to detection of attacks and blacklisting. Ie, if we get a number of initiations but auth never happens then potentially we're looking at some form of brute-force or denial of service attack. Anyhow, I agree this is probably fuzzy at best and difficult to define well in terms of alternative envisioned use-cases. IIRC this happens the moment we have a unit number, even before LCP is established but I'd have to re-check. Earlier == better here.

net-pre-up - purpose: to initialize networking prior to bringing up the network interface. As it's implemented right now, it comes up after the first NCP, which as per yourself is very unclear semantics, can ip-pre-up rely on net-pre-up having executed or not? So that's just not happy. Since we use net-pre-up instead of ip-pre-up for us that didn't matter. I'd prefer to keep delaying this as far as possible to the first NCP actually bringing the interface up, but yea, that's tricky because it means all plugins will have to signal that the NCP is now ready to configure the interface, then execute it's own pre-up, configure and then signal for bringing the interface up. Another question is if the interface isn't actually going to come up, why execute net-pre-up? The simpler methodology is to just execute this once the NCP negotiation is about to start, so it's really net-pre-ncp rather than net-pre-up. The stuff we intend to perform here is essentially house-keeping again, setting up tc against the interface, some rudimentary firewalling updates (from here we simply assume that either ipv4 or ipv6 or both may be negotiated and set up for both accordingly, it would be nice if we could separate ipv4 and ipv6 into their own -pre-up, but that doesn't work currently). Renaming the interface can happen here (not in init). I'm OK if this happens when we move into the NCP negotiation phase (sorry if my naming of phases aren't 100%) just after AUTH finished (or in the case of no auth, directly after LCP has established).

net-down - purpose: clean up out-of-pppd data, including undoing tc (the related ifb at least since when pppX gets destroyed any tc data directly associated with that will auto clean-up) and possible firewall updates. This would execute AFTER LCP has been torn down, and by implication AFTER all NCPs have been brought down (including their -down scripts have been executed). I hear you regarding the risks, please refer below.

net-up we don't care about, but if you want to keep with the NCP parallels it may be worthwhile to also consider this, to be executed once the interface is UP, whether that is delayed until all NCPs are "up" or the moment when the interface comes up ... hard to define. For Solaris this would (should) be once all NCPs are up in my opinion where it makes most sense. For Linux ... this is the sticky part around the entire discussion we've been circling the whole time. Frankly, to avoid this complication, let's just NOT implement this until it's asked for. But I think if this gets implemented it should be "after all NCPs have estalished or failed", in other words "when all NCPs have completed their negotiations". Re-negotiations may also mess with this, as such, this is an over-complication we (in my opinion) should not try and solve now, or ever ideally, only if someone asks for it.

You mentioned buffering of packets, I believe the Linux kernel already handles this to a degree (specifically, there is a file-descriptor from which packets are read, which implies the kernel is buffering them - or rather, buffering at least one, but probably more). So again, at worst, this may look like an "overly long latency issue" to the protocol, and except if it's exceptionally long as in the order of seconds it should be able to recover. I think these scripts executing timeously is the responsibility of the system administrator.

If the distinction is made between the interface state and the protocol state, then separate net-* or if-* (please indicate your preference) scripts compared to ncp-* scripts are perfectly sensible. Perhaps more so on Linux than Solaris. ncp-init may also in some cases make sense if a system administrator may want some way to indicate whether or not an NCP may or may not proceed based on authentication credentials or something (or perhaps net-pre-up should have some way of setting additional options based on whatever auth data may or may not be available - this way one could also use a private radius tag to provide data from radius which this script can then translate into pppd options - just spitballing here, since the radattr plugin already writes attributes to a file, eg, based on data from radius allow ipv4 or ipv6 to be negotiated, or even MPLS ... or associate the interface with a specific VRF in an MPLS world - I don't see a standard radius tag for this based on a VERY QUICK google, so some other mechanism may be required). Yes, there is an MPLSCP expernal plugin floating around which Gentoo have been including for a while now - I'm not sure the source.

I'd highly prefer all scripts to be synchronous, especially init and pre-up, down is less of an issue. With regards to these causing problems, suggestion perhaps, we can always adjust the log entries for executing scripts to indicate if they're being executed synchronously or asynchronously, as well as how long they actually took (in milliseconds), and for any synchronous script issue a warning if they take longer than time X, and we could even have a config option to wait max X time for synchronous scripts before sending them a SIGTERM, and then treating them as async, or simply forgetting about them. This has other risks I'm not sure I like, but frankly, sorting the scripts to not block should remain the responsibility of the system administrator of those scripts, and pppd's responsibility should end with warning in the man page about the risks. Something like this:

WARNING: This script is executed synchronously and if it doesn't complete in a timeous manner could cause protocol failures for the associated ppp protocol, and by implication the connection.

As a general rule, then I guess the protocol independent rules then for external scripts are:

-init - executed when protocol negotiation is initiated (ip-init when IPCP is started, ipv6-iit when IPV6CP is started, net-init when the ppp interface is created). sync.

-pre-up - executed prior to bringing the protocol interface up (see below regarding Linux). sync.

-up - executed after bringing the protocol interface up. async.

-down - executed after bringing the protocol down. sync/async/don't care, whatever the current status quo is.

Regarding Linux: -pre-up can be safetly executed prior to configuration in most cases, as long as the scripts don't depend on the configuration being present. Towards this end there is three strategies:

Delay interface UP status until all NCPs have completed negotiation (or at least all that have a pre-up script). This is mostly workable, with a few caveats:
1.1. Re-negotiations, or "later negotiations", eg, initially only IPV6CP is negotiated, then later IPCP is negotiated - do we bring if down just to be able to execute ip-pre-up with the down guarantee?
1.2. NCP negotiation failures may cause unnecessary delays (such as on 3GPP). There are timeouts here, as have been pointed out, so worst case delay should be a few seconds, I believe these timeouts have been introduced after I originally ran into this around 2003-2007, I've not looked into this.
Ignore the problem and execute as per current.
Execute the script prior to performing configuration at the very least.

I honestly don't like any of these options in full. Each have advantages and disadvantages. I personally think this should be configurable to indicate "which broken behaviour you'd prefer if you insist on using a protocol specific pre-up", something like:

pre-up-behaviour (Linux specific) one of the following:

delayup (default) - delay bringing the ppp interface up until all NCPs with pre-up scripts have completed negotiation. It has to be noted that in the (rare) case where a NCP is (re-)negotiated at a later stage the guarantee that during execution of pre-up scripts the interface is not actually guaranteed to be down.

delayconfig - delay configuration of the interface until after the pre-up script has executed. Environment variables containing the relevant information will still be set up prior to execution of the script.

historic - Would appreciate a better name here. The interface is configured prior to pre-up, for the first pre-up there is a reasonable chance that the interface may still be down, but it cannot be guaranteed.

One could elaborate, or link to a page explaining the issue.

At this stage, it may also be a good thing to bring the script names into the NCP specific protocol structure in a way so that consistent cross-protocol behaviour can eb enforced, but this may also be reasoning too far ahead. I'd have to dig a bit deeper, but for the moment I think if we can just kinda try to finalize the desired behaviour.

Neustradamus · 2023-03-18T08:36:04Z

@jkroonza: New commits have been added, maybe you can look :)

jkroonza · 2023-03-19T08:48:00Z

@jkroonza: New commits have been added, maybe you can look :)

On this PR? Or on master? I can see there's merge conflicts, either way, I don't think this is going to be ready for 2.5.0 ...

Neustradamus · 2023-03-19T17:31:13Z

@jkroonza: 2.5.0 is now ready if nothing new ^^
If you have a moment to look...

jkroonza · 2023-03-19T17:38:44Z

@jkroonza: 2.5.0 is now ready if nothing new ^^ If you have a moment to look...

What I've seen and tested I'm happy with.

I'll re-work this code once 2.5.0 is released, hopefully for 2.5.1 but I suspect not, otherwise I'll just use it as a patch of my own until 2.6.0.

For now I've discovered at least one flaw in this patch series that's annoying, to say the least, but for 99% of our use-cases it works. If fixed to work as per the discussion in this thread it'll solve the other 1% too, for us at least. It will, however, require more work and testing than what should be done prior to 2.5.0 at this stage.

Other pending on work on rp-pppoe (server specifically) is also quite a bit more urgent for us and I've got a handle on an xl2tpd bug too that's of the three probably top priority. So any spare time that I manage to carve out currently will go to that. Most of the voice (asterisk) R&D that's ongoing currently should be completed in the next three weeks hopefully, then I can swing back to networking for a stint.

Neustradamus · 2023-06-17T09:58:54Z

Hello @jkroonza, have you looked for your PR?

jkroonza · 2023-06-18T19:47:11Z

Hi,

Still need to loop back to this ... hopefully in the coming week.

jkroonza · 2023-06-19T10:01:40Z

Hey, I re-read this and other relevant discussions. I suspect I'll need to dig a bit deeper into the FSM implementation. There is definite work to be done here, and I'm happy to attempt it, going to need to sleep a bit on this.

Out of hand though, the fsm_callbacks struct - is it OK to remove say the up(fsm*) function in preference of calls like configure(fsm*), pre_up_script() and up_script()? Where the script calls really just provide a path (or potentially ncp_scripts) should be split into it's own structure.

Renaming and splitting here serves two purposes:

It prevents plugins from migrating silently without thought being given to the situation; and
We can run multiple network pre-ups in parallel (start the pre-up for the protocol the moment the NCP has completed, not wait for them until we are ready to up the actual interface, at which point we wait for all the pre-up scripts to complete).

{net/if}-pre-up would then execute first, guaranteed to complete, at which point other pre-ups can execute in parallel. This would mean that the FSM would need to become aware of scripts for various network protocols (And I'd put down into this too so we can wait for all of those too prior to pppd itself terminating). Since this work is happening anyway, we might just as well move this into the FSM so we can enforce behaviour of network scripts to that we can ensure -down doesn't get called prior to -up completing, and re-negotiation can't start (or complete?) until previous -down scripts have finished.

Additional (global) configuration options:

no-delay-iff-up - (Linux onl) By default on Linux bringing the interface into an up state is delayed until all network control protocols that has pre-up scripts that exist have completed (either successfully or failed) and all of these scripts (where applicable) have been executed.

no-down-if-for-preup - (Linux only) On Linux if a protocol gets negotiated after an interface has been brought up, then if said network protocol has a pre-up script, in order to maintain the constraint that the interface will be "still down" when the pre-up script it run the interface will be flagged down prior to configuring the protocol, including executing the pre-up, and then the interface will be re-enabled prior to finally executing the up script.

I'm not aware of any implementations that start negotiation after initial negotiations has completed, so perhaps the second option here should specify behaviour or "late" negotians (I have no idea for a name for such a option) with options such as:

down-iface-for-pre-up - (Default) bring the interface down in order to configure and execute pre-up.
only-if-no-pre-up - (Alternate default?) Permit a protocol if and only if there is no (known to FSM at least) pre-up script.
ignore-pre-up - Just don't execute the pre-up script.
leave-iface-up-for-pre-up - Execute the pre-up script even though the interface is already up.

Of these only only-if-no-pre-up and down-iface-for-pre-up IMHO really is sensible, but it may be that a sysadmin prefers to execute the script with the interface down, but really don't want to bring the interface down at all once it's up.

This overall is quite a major change, which has impact primarily in the Linux world, and if I do this right, almost none for Solaris.

@carlsonj @Neustradamus - opinions, objections, enlightenments, pitfalls or any other comments prior to me descending down this rabbit hole again?

Neustradamus · 2023-06-20T18:45:25Z

@enaess, @paulusmack, @EasyNetDev: What do you think about it and the last @jkroonza comment?

jkroonza · 2023-06-28T07:29:55Z

@enaess, @paulusmack, @EasyNetDev: What do you think about it and the last @jkroonza comment?

Really would appreciate a bit of direction here before commencing work.

paulusmack · 2023-08-03T08:15:20Z

I would prefer not to change the FSM callbacks since they correspond to events and actions defined in the relevant RFCs. The "up" action generally means that the two ends have reached agreement on how the link is to be configured. Then there is the question of how to do that configuring in practice in a race-free manner, which I assume is what we're discussing here. It seems to me that we have a higher-level state machine for which the "up" and "down" actions for the individual protocols become input events. Maybe expressing that state machine more formally might help with solving the races.

jkroonza · 2023-08-03T08:54:53Z

May I then simply implement a net-pre-up that executes (blocking) once auth is done but PRIOR to protocol negotiations starting?

And net-down (non-blocking) which executes just prior to taking LCP down?

Since ip-pre-up is then broken by design (and based on the feedback cannot be fixed), may I recommend that a big fat warning be added to the documentation? Something like:

WARNING: Please note that on systems where a single interface carries multiple protocols (Linux) ip-pre-up is NOT actually guaranteed to execute prior to the interface moving into an up state, although IP information won't be known you should consider using net-pre-up instead, alternatively, disable other NCPs such that IPv4 is the only negotiated protocol - which will also result in a guarantee that ip-pre-up is called prior to the interface going into an UP state.

paulusmack · 2023-08-04T06:20:08Z

May I then simply implement a net-pre-up that executes (blocking) once auth is done but PRIOR to protocol negotiations starting?

And net-down (non-blocking) which executes just prior to taking LCP down?

Since ip-pre-up is then broken by design (and based on the feedback cannot be fixed), may I recommend that a big fat warning be added to the documentation? Something like:

WARNING: Please note that on systems where a single interface carries multiple protocols (Linux) ip-pre-up is NOT actually guaranteed to execute prior to the interface moving into an up state, although IP information won't be known you should consider using net-pre-up instead, alternatively, disable other NCPs such that IPv4 is the only negotiated protocol - which will also result in a guarantee that ip-pre-up is called prior to the interface going into an UP state.

That all sounds fine to me.

jkroonza · 2023-09-26T21:53:22Z

May I then simply implement a net-pre-up that executes (blocking) once auth is done but PRIOR to protocol negotiations starting?
And net-down (non-blocking) which executes just prior to taking LCP down?
Since ip-pre-up is then broken by design (and based on the feedback cannot be fixed), may I recommend that a big fat warning be added to the documentation? Something like:
WARNING: Please note that on systems where a single interface carries multiple protocols (Linux) ip-pre-up is NOT actually guaranteed to execute prior to the interface moving into an up state, although IP information won't be known you should consider using net-pre-up instead, alternatively, disable other NCPs such that IPv4 is the only negotiated protocol - which will also result in a guarantee that ip-pre-up is called prior to the interface going into an UP state.

That all sounds fine to me.

Sorry for the delay. Moving to pppd 2.5.0 is getting some priority now.

Try as much as I could I could not completely axe the net-init mechanism. So please consider the motivation:

We have incoming ppp over ethernet (pppoe-server), l2tp (xl2tpd) as well as pptpd (some things just won't die). None of the daemons allows without some tweaking a mechanism to pre-run stuff prior to exec'ing pppd without basically wrapping pppd, and certainly the unit number won't be available at that stage, so we really just need to make sure that the moment the unit number is allocated, we clean the slate and verify that out external (from perspective of pppd) management state trackers are properly reset back into sane state (in case where pppd is forcefully killed or segfaults and doesn't clean up properly this is the only sane spot to perform the cleanup).

net-pre-up also permits interface renaming (there was an attempt to implement very complex code in pppd to perform interface renaming, this way this code can be handled externally to pppd using whatever script mechanism the op prefers). Any variables exported during the auth phase is also available here.

net-down should execute after all else have torn down, and ip-down and ipv6-down have been initiated (but possibly not completed).

Documentation pending, but review on code in the meantime please.

Neustradamus · 2023-09-29T21:43:45Z

@paulusmack, @carlsonj, @enaess, @EasyNetDev: Can you look this PR?
@jkroonza has worked recently on it :)

paulusmack · 2023-09-30T12:27:38Z

The commit description is not adequate. I want to see there a brief outline of what these new scripts are for and what problem is solved by having them, and also what other problems might still exist that this doesn't solve. I want people in the future to be able to understand the motivation for these changes and the design decisions behind them without needing access to github.com.

net-init executes as a blocking script directly after the unit number becomes available. This can be used to initialise aspects related to the ppp connection that lives outside of the ppp connection. It can also be used to clean up (in the author's extremely unlikely case) where a previous pppd crashed, and net-down didn't execute in order to clean up. net-pre-up executes as a blocking script after auth, prior to NCPs being negotiated. Unlike ip-pre-up this is guaranteed to execute prior to the interface being brought up, and can be used in an NCP agnostic manner to pre-initialise aspects of the interface for which it still needs to be down (amongst others it's recommended that firewall changes happen here). net-down executes in a non-blocking manner just prior to pppd terminating and can be used to clean up actions from previous scripts. You will notice that I mention ip-pre-up doesn't gaurantee that the interface will still be down, this is because in a Linux world all protocols runs on the same interface, compared to solaris where I'm informed each protocol runs on it's own sub-interface, each of which has it's own operational state. The man page for pppd has also been adjusted to indicate as much. Signed-off-by: Jaco Kroon <[email protected]>

jkroonza · 2023-10-03T06:15:06Z

The commit description is not adequate. I want to see there a brief outline of what these new scripts are for and what problem is solved by having them, and also what other problems might still exist that this doesn't solve. I want people in the future to be able to understand the motivation for these changes and the design decisions behind them without needing access to github.com.

That should be better now.

Whilst not my original strategy this does address all our concerns, as such I don't have further concerns around this, but there are from your side I'll be happy to add to the commit message.

Neustradamus · 2023-10-10T03:40:08Z

@paulusmack: The @jkroonza commit description is good now?

Neustradamus · 2023-10-13T21:56:57Z

@EasyNetDev: What do you think about the @jkroonza merged PR?

jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from 31fb4d9 to bf9329e Compare August 24, 2022 14:38

This was referenced Aug 24, 2022

pppd/lcp: lcp-up and lcp-down scripts. #342

Closed

pppd/ipv6cp: Add ipv6-pre-up script. #341

Closed

enaess reviewed Aug 24, 2022

View reviewed changes

pppd/main.c Outdated Show resolved Hide resolved

enaess reviewed Aug 24, 2022

View reviewed changes

pppd/sys-linux.c Outdated Show resolved Hide resolved

jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from b810041 to 6c56f53 Compare August 24, 2022 20:27

jkroonza mentioned this pull request Aug 24, 2022

PPP ifname enhanced #289

Closed

Neustradamus requested a review from paulusmack August 24, 2022 23:41

jkroonza commented Aug 25, 2022

View reviewed changes

jkroonza mentioned this pull request Dec 29, 2022

new ppp release ? #111

Closed

jkroonza mentioned this pull request Jan 4, 2023

pre-up, auth and script issues #343

Closed

jkroonza marked this pull request as draft February 14, 2023 06:16

jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from 0d12ca7 to 7f65867 Compare September 26, 2023 21:45

jkroonza changed the title ~~pppd: consolidate the interface up/down code a bit.~~ pppd: implement net-init, net-pre-up and net-down. Sep 27, 2023

jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from 7f65867 to a0ad71a Compare September 27, 2023 21:56

jkroonza marked this pull request as ready for review September 27, 2023 21:56

jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from a0ad71a to 3906398 Compare October 3, 2023 06:13

paulusmack merged commit 1480d43 into ppp-project:master Oct 11, 2023
1 check passed

jkroonza deleted the consolidated-ifup-down-and-scripts branch December 4, 2023 13:50

paulusmack mentioned this pull request Sep 8, 2024

pppd: Add ipv6-pre-up-script option #511

Closed

pppd: implement net-init, net-pre-up and net-down. #367

pppd: implement net-init, net-pre-up and net-down. #367

Conversation

jkroonza commented Aug 24, 2022

jkroonza commented Aug 24, 2022

jkroonza commented Aug 24, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Neustradamus commented Oct 10, 2022

enaess commented Oct 22, 2022

jkroonza commented Oct 24, 2022

jkroonza commented Dec 30, 2022

jkroonza commented Jan 2, 2023

carlsonj commented Jan 3, 2023 via email

jkroonza commented Jan 3, 2023

carlsonj commented Jan 4, 2023 via email

jkroonza commented Jan 4, 2023

carlsonj commented Jan 5, 2023 via email

jkroonza commented Jan 6, 2023

carlsonj commented Jan 6, 2023 via email

jkroonza commented Jan 8, 2023

Neustradamus commented Mar 18, 2023

jkroonza commented Mar 19, 2023

Neustradamus commented Mar 19, 2023

jkroonza commented Mar 19, 2023

Neustradamus commented Jun 17, 2023

jkroonza commented Jun 18, 2023

jkroonza commented Jun 19, 2023

Neustradamus commented Jun 20, 2023

jkroonza commented Jun 28, 2023

paulusmack commented Aug 3, 2023

jkroonza commented Aug 3, 2023

paulusmack commented Aug 4, 2023

jkroonza commented Sep 26, 2023

Neustradamus commented Sep 29, 2023

paulusmack commented Sep 30, 2023

jkroonza commented Oct 3, 2023

Neustradamus commented Oct 10, 2023

Neustradamus commented Oct 13, 2023