Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix reliability of print_link_stats (with option persist) #505

Merged
merged 1 commit into from
Sep 10, 2024

Conversation

enometh
Copy link
Contributor

@enometh enometh commented Jul 14, 2024

  • pppd/ipcp.c: (ipcp_down): fix comment
  • pppd/main.c: (reset_link_stats): reset print_link_stats to 1, set start_time even if get_ppp_stats fails.

@enometh
Copy link
Contributor Author

enometh commented Jul 14, 2024

This is an attempt to fix the problem noted in the linux-ppp mailing list on mar-26-2024 and may-03-2024
under the subject "ppp-2.5.0 sometimes doesn't print stats on terminating on signal 2"

The sent/recv log messages were being lost, especially with the persist option.
This seems to be an oversight during reorg in commit
ba7f7e0 "Header file reorganization and cleaning up the public API for pppd version 2.5.0 (#379)" around the repurposing of the link_stats_valid variable as link_stats_print.

it also fixes a stray reference to the old variable in a comment.
please review. the reordering in reset_link_stats

@Neustradamus
Copy link
Member

@paulusmack: Have you seen this @enometh PR?

@paulusmack
Copy link
Collaborator

First, this needs a sign-off, see Submitting-patches.md.

Secondly, it needs an explanation that is more than simply saying what the textual change is. Don't make the reader have to reverse-engineer your thought process.

@paulusmack
Copy link
Collaborator

@enaess is does seem like ba7f7e0 introduced a problem with the link_stats_print variable, in that now print_link_stats will only ever produce output at most once. What was the motivation for introducing link_stats_print? It looks to me like I should just revert that part of ba7f7e0.

@enometh
Copy link
Contributor Author

enometh commented Aug 17, 2024

First, this needs a sign-off, see Submitting-patches.md.
Secondly, it needs an explanation that is more than simply saying what the textual change is. Don't make the reader have to reverse-engineer your thought process.

I explained the change to the extent it could be explained both on the mailing list and in the comment on the pull request. Since the patch was a trivial 2 line bugfix, of someone else's code, I skipped looking into what the "Sign off" involved. However I am using my real name and real email address. (I go by S.Madhu, officially)

@paulusmack
Copy link
Collaborator

Not sure why the DCO check is failing...

@enaess
Copy link
Contributor

enaess commented Aug 20, 2024

I'd prefer if we filed a bug on what wasn't working. All it says is it is a reliability issue with signal 2??

The get_ppp_stats() function returns 1 on success, then it also configures a timer to call a function repeatedly. The function it calls depends on what is configured and what OS you are on. Debug logs should reveal what is going on here. The get_ppp_stats() returns the value of the function it dynamically resolved, and if success it should be "1". If you get "0" then I'd be very interested in why??

So, again; please file a bug and we can look at your setup and try to figure out what is going on wrong. I am not sure if simply adding a few lines to say "now" it works, to resolve "what" problem in the first place.

@enometh
Copy link
Contributor Author

enometh commented Aug 20, 2024

I'd prefer if we filed a bug on what wasn't working. All it says is it is a reliability issue with signal 2??

The get_ppp_stats() function returns 1 on success, then it also configures a timer to call a function repeatedly. The function it calls depends on what is configured and what OS you are on. Debug logs should reveal what is going on here. The get_ppp_stats() returns the value of the function it dynamically resolved, and if success it should be "1". If you get "0" then I'd be very interested in why??

So, again; please file a bug and we can look at your setup and try to figure out what is going on wrong. I am not sure if simply adding a few lines to say "now" it works, to resolve "what" problem in the first place.

Please see
https://lore.kernel.org/linux-ppp/[email protected]/
and
http://lore.kernel.org/linux-ppp/[email protected]/

I believe this is a bonafide bug in the code which my patch fixes, and would appreciate if someone
could take the trouble to understand what it fixes. - the problem is as stated, with persist,
the stats only get printed once when the link drops, the patch makes sure stats get printed
when subsequent links are established. and broken.

* pppd/ipcp.c: (ipcp_down): fix comment
* pppd/main.c: (reset_link_stats): reset print_link_stats to 1, set
start_time even if get_ppp_stats fails.

This is an attempt to fix the problem noted in the linux-ppp mailing list on
mar-26-2024 and may-03-2024 under the subject "ppp-2.5.0 sometimes doesn't
print stats on terminating on signal 2"

The sent/recv log messages were being lost, especially with the persist option.
This seems to be an oversight during reorg in commit ba7f7e0 "Header file
reorganization and cleaning up the public API for pppd version 2.5.0 (ppp-project#379)"
around the repurposing of the link_stats_valid variable as link_stats_print.

It also fixes a stray reference to the old variable in a comment.  please
review the reordering in reset_link_stats

Signed-off-by: S Madhu <[email protected]>
@enometh
Copy link
Contributor Author

enometh commented Aug 20, 2024

Not sure why the DCO check is failing...
My author name and email was Madhu [email protected] (not routable)
And I signed off as "S Madhu" [email protected], which I thought was acceptable
I used to see commits authored by one person signed-off by another person, but
this didn't work on github.
I've adjusted the GIT_AUTHOR_EMAIL and GIT_COMMITTER_EMAIL to match the
routable email, in the sign-offline but I would prefer to keep my original signature (name + email) if possible, i.e. could i sign off as Madhu [email protected] ?
would that be possible in theory?
in the case that this patch is accepted, of course.

@paulusmack
Copy link
Collaborator

@enaess I think the basic problem is that link_stats_print is only ever set to 1 on initialization, hence print_link_stats() will only ever produce output once (at most). However, there are cases, e.g. when the persist option is used, where we used to get output from print_link_stats more than once, as the one pppd instance could handle a series of connections. This change was introduced in ba7f7e0. The fix proposed here, to set print_link_stats to 1 in reset_link_stats(), seems reasonable to me.

I agree it wants a bit more explanation in the commit message - the fix may be textually simple but that doesn't mean that the reasoning behind it is intuitively obvious.

@@ -1331,9 +1331,9 @@ print_link_stats(void)
void
reset_link_stats(int u)
{
if (!get_ppp_stats(u, &old_link_stats))
return;
get_ppp_stats(u, &old_link_stats);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we initialize old_link_stats to all zeroes if the get_ppp_stats call fails? Or copy it from link_stats?

@enaess
Copy link
Contributor

enaess commented Aug 21, 2024

I think the use of print_link_stats was intended to avoid double printing the result on exit (though, I am not 100% sure). If there was instances where an update was to be dispatched to a script on a regular interval, then I must have missed that memo.

I'd have to run this code and troubleshoot any side-effects of making that change. I do recall it wasn't a straight forward shot. Also, do we know if the print_link_stats will ever be called in case of a SIGHUP (and if it is, the value of print_link_stats is invalid?).

@enometh
Copy link
Contributor Author

enometh commented Aug 22, 2024 via email

@enometh
Copy link
Contributor Author

enometh commented Aug 22, 2024 via email

@enaess
Copy link
Contributor

enaess commented Aug 23, 2024

I don't recall correctly, but didn't @paulusmack fix-up some of the signal handling in pppd in the last release (or was that chat.c?). You reported seeing the stats being printed in some cases, and in some cases not? Perhaps the signal handling code has shifted things around that it introduced this bug.

I'd love getting a repro on this and test this out, but I don't have the time to fiddle with this issue right now.

@paulusmack
Copy link
Collaborator

I don't recall correctly, but didn't @paulusmack fix-up some of the signal handling in pppd in the last release (or was that chat.c?). You reported seeing the stats being printed in some cases, and in some cases not? Perhaps the signal handling code has shifted things around that it introduced this bug.

I'd love getting a repro on this and test this out, but I don't have the time to fiddle with this issue right now.

The key thing here that maybe you missed is that with the persist option, the one pppd instance can handle more than just one connection lifecycle. That is, with persist, pppd will try to reconnect after the connection terminates, and a given pppd instance could handle an unlimited number of connections (in serial fashion, one after another). So the logic where print_link_stats is set to 0 when the stats are printed and never set to 1 again after that is wrong. It's nothing to do with signals.

@paulusmack paulusmack merged commit 9d82710 into ppp-project:master Sep 10, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants