-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve results when NLM_F_DUMP_INTR is set #1018
Conversation
3b85385
to
fae5970
Compare
@adrianchiris, @aboch ... could you take a look? |
// ErrDumpInterrupted is an instance of errDumpInterrupted, used to report that | ||
// a netlink function has set the NLM_F_DUMP_INTR flag in a response message, | ||
// indicating that the results may be incomplete or inconsistent. | ||
var ErrDumpInterrupted = errDumpInterrupted{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use:
errors.New("results may be incomplete or inconsistent")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not equivalent: errDumpInterrupted
implements interface { Is(error) bool }
. errors.Is(errors.New(""), unix.EINTR) == false
while errors.Is(ErrDumpInterrupted, unix.EINTR) == true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you are right.
@robmry just as an FYI, the check was introduced to align with how libmnl is executing netlink commands. see [1] can you provide more info on the use-case where its ok and needed to rely partial results instead of retry ? (except the ci failure u linked). TBH i favor sticking with how iproute2/libmnl implements execution of these cmds unless there is a good reason to deviate |
Thanks @adrianchiris - the moby (dockerd) maintainers were concerned about unbounded retries on the new So, the preference is to be able to retry a limited number of times and, in the unlikely event the retries fail, fall-back to using the partial data - on the basis that it'll be no worse than it was before the change, and no problems have been noticed/reported. As an example of where partial data might be used in a more considered way - in this package, Many callers are still likely to see the error is non-nil and just take an error path without looking at the data. iproute2 does something similar here https://github.com/iproute2/iproute2/blob/e9096586e0701d5ae031df2f2708d20d34ae7bd4/lib/libnetlink.c#L901 ... it doesn't retry on NLM_F_DUMP_INTR, but it's not a daemon and the user can choose to retry when they get an error. |
Yes, the goal of this library is to be in sync with iproute2 behavior as much as possible. Per go guidelines, but in general a good practice for any language, if the the function returns an non nil error, any other returned variable must not be trusted. The fact that you have each list method to still return the original INTR error is good, but if the parsing functions fail because the partial list is truncated at the element level (I do not know if it is possible) then the parsing function will fail and the returned error will be a parsing error. This will mask the real INTR error, making it harder to root cause. |
The There's nothing moby-specific about the change, or requirement ... it's discussed in my comment because I was responding to the request for a use-case, and moby is one. But, the
I can't find anything like that in https://go.dev/doc/effective_go#errors - I also thought it was unusual, but io.ReadAll is a counter-example. In this case, the error is very explicit that the results aren't to be trusted. The idea is to give the caller the choice (like the kernel and iproute2 do).
I don't think that's possible - individual results from a collection are copied into the netlink response with locks held, so they'll be fine. The flag indicates that a generation number for the collection changed as those results were being accumulated. (A reference to a position in the collection is stored between netlink responses to a request, but that position can get out of date. Even with a single netlink response message, collections can be changed between reads of elements of the collection.) |
Before v1.2.1, NLM_F_DUMP_INTR flag was not checked when parsing messages. If it was possible for the kernel to return a truncated list, someone would have seen and complained about parsing errors in the same circumstances where we are currently complaining about EINTR. This library itself is very strong evidence that the kernel does not pass ill-formed messages to userspace when a dump is interrupted. |
I guess you convinced me. |
Hi all, Adding some more points:
That said, i have no objection for the change as long as we still fail if results are inconsistent.
please note that the kernel refers to this data as "inconsistent" rather then partial. [1] https://github.com/torvalds/linux/blob/8d8d276ba2fb5f9ac4984f5c10ae60858090babc/Documentation/userspace-api/netlink/intro.rst#dump-consistency Line 1871 in a018296
|
fae5970
to
9c357a0
Compare
Done, except the error is
|
Yes, indeed. But, if the data is returned, the caller has the option of choosing to do something else.
Also macro rtnl_dump_filter, which is used to process results for a lot of functions that make
Agreed - but this PR already only changes functions that do an
I think that's what I described in the example? ("in this package, LinkByName can fall back to using linkByNameDump, which searches a dump for the name"), and now in the godoc comment describing the |
@robmry This project follows a "One change/fix/feature per pull request" and the "Squash and Merge" option is not enabled. |
9c357a0
to
1f76c75
Compare
Oh, sure - done ... I got rid of the gofmt changes, as they weren't directly related (only really there to stop my editor messing with code in the other commits). The other three commits were split to try to make reviews a bit easier:
(I've still got those changes ... happy to re-raise as separate PRs if preferred?) |
ipset_linux.go
Outdated
@@ -250,7 +250,7 @@ func (h *Handle) IpsetListAll() ([]IPSetResult, error) { | |||
result[i].unserialize(msg) | |||
} | |||
|
|||
return result, nil | |||
return result, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to check for ErrDumpInterrupted at L244?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thank you - I think I modified this one as it's a ...ListAll()
- then realised it doesn't set NLM_F_DUMP
, and didn't fully undo the change. I'll change the return back to nil
.
@@ -2440,6 +2464,9 @@ func linkSubscribeAt(newNs, curNs netns.NsHandle, ch chan<- LinkUpdate, done <-c | |||
continue | |||
} | |||
for _, m := range msgs { | |||
if m.Header.Flags&unix.NLM_F_DUMP_INTR != 0 && cberr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if for initial list would return dump interrupted there would be additional notifications from kernel (i think).
not sure we want this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly ... what additional notifications are you thinking of, and where would they go?
@@ -157,6 +157,9 @@ func (u *UnixSocket) deserialize(b []byte) error { | |||
} | |||
|
|||
// SocketGet returns the Socket identified by its local and remote addresses. | |||
// | |||
// If the returned error is [ErrDumpInterrupted], the search for a result may | |||
// be incomplete and the caller should retry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM this would return nil as Socket if any error happened at L219 (req.Execute()) is this intended ?
i.e in case of ErrDumpInterrupted we dont get the partial results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one returns specific errors if there's not exactly one result and, even if there is one socket, either of those is a possibility if the dump was interrupted.
So, if there's not exactly one result, ErrDumpInterrupted
should be returned rather than one of those other errors to indicate that a retry is needed.
But, I guess it'd be better to explicitly check for that case ... I'll update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one returns specific errors if there's not exactly one result ...
Hmm - looking at it more closely, that might have been what was intended, but it's not what it does ...
if len(msgs) == 0 {
return nil, errors.New("no message nor error from netlink")
}
if len(msgs) > 2 {
return nil, fmt.Errorf("multiple (%d) matching sockets", len(msgs))
}
So, if there are 2 results, it'll use the first and discard the second. I guess the second test should probably be > 1
or >= 2
.
Assuming that's a bug, fixing it doesn't belong in this PR - so I'll leave it alone here ... just returning nil, rather than making things more confusing.
Add a specific error to report that a netlink response had NLM_F_DUMP_INTR set, indicating that the set of results may be incomplete or inconsistent. unix.EINTR was previously returned (with no results) when the NLM_F_DUMP_INTR flag was set. Now, errors.Is(err, unix.EINTR) will still work. But, this will be a breaking change for any code that's checking for equality with unix.EINTR. Return results with ErrDumpInterrupted. Results may be incomplete or inconsistent, but give the caller the option of using them. Look for NLM_F_DUMP_INTR in more places: - linkSubscribeAt, neighSubscribeAt, routeSubscribeAt - can do an initial dump, which may report inconsistent results -> if there's an error callback, call it with ErrDumpInterrupted - socketDiagXDPExecutor - makes an NLM_F_DUMP request, without using Execute() -> give it the same behaviour as functions that do use Execute() Signed-off-by: Rob Murray <[email protected]>
1f76c75
to
5da8257
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
…93d350 - Preserve results when NLM_F_DUMP_INTR is set vishvananda/netlink#1018 - Fix SetSendTimeout/SetReceiveTimeout vishvananda/netlink#1012 full diff: vishvananda/netlink@v1.3.0...084abd9 Signed-off-by: Rob Murray <[email protected]>
@adrianchiris @aboch any chance on tagging a new release now that our fixes went in? |
According to the kernel docs[^1], the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The `vishvananda/netlink` library returned `EINTR` since v1.2.1, but more recent versions have changed it such that it returns `netlink.ErrDumpInterrupted` instead[^2]. These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all `netlink` functions marked to return `ErrDumpInterrupted` that retries the function up to 30 times until it either succeeds or returns a different error. While may call sites do have their own high-level retry mechanism (see e.g. cilium#32099), the logged error message can still cause CI to fail (e.g. cilium#35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to `NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly. A subsequent commit will add an additional linter that nudges developers to use this new `safenetlink` package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually. [^1]: https://docs.kernel.org/userspace-api/netlink/intro.html [^2]: vishvananda/netlink#1018 Signed-off-by: Sebastian Wicki <[email protected]>
According to the kernel docs[^1], the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The `vishvananda/netlink` library returned `EINTR` since v1.2.1, but more recent versions have changed it such that it returns `netlink.ErrDumpInterrupted` instead[^2]. These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all `netlink` functions marked to return `ErrDumpInterrupted` that retries the function up to 30 times until it either succeeds or returns a different error. While may call sites do have their own high-level retry mechanism (see e.g. cilium#32099), the logged error message can still cause CI to fail (e.g. cilium#35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to `NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly. A subsequent commit will add an additional linter that nudges developers to use this new `safenetlink` package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually. [^1]: https://docs.kernel.org/userspace-api/netlink/intro.html [^2]: vishvananda/netlink#1018 Signed-off-by: Sebastian Wicki <[email protected]>
According to the kernel docs[^1], the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The `vishvananda/netlink` library returned `EINTR` since v1.2.1, but more recent versions have changed it such that it returns `netlink.ErrDumpInterrupted` instead[^2]. These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all `netlink` functions marked to return `ErrDumpInterrupted` that retries the function up to 30 times until it either succeeds or returns a different error. While may call sites do have their own high-level retry mechanism (see e.g. cilium#32099), the logged error message can still cause CI to fail (e.g. cilium#35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to `NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly. A subsequent commit will add an additional linter that nudges developers to use this new `safenetlink` package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually. [^1]: https://docs.kernel.org/userspace-api/netlink/intro.html [^2]: vishvananda/netlink#1018 Signed-off-by: Sebastian Wicki <[email protected]>
According to the kernel docs[^1], the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The `vishvananda/netlink` library returned `EINTR` since v1.2.1, but more recent versions have changed it such that it returns `netlink.ErrDumpInterrupted` instead[^2]. These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all `netlink` functions marked to return `ErrDumpInterrupted` that retries the function up to 30 times until it either succeeds or returns a different error. While may call sites do have their own high-level retry mechanism (see e.g. cilium#32099), the logged error message can still cause CI to fail (e.g. cilium#35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to `NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly. A subsequent commit will add an additional linter that nudges developers to use this new `safenetlink` package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually. [^1]: https://docs.kernel.org/userspace-api/netlink/intro.html [^2]: vishvananda/netlink#1018 Signed-off-by: Sebastian Wicki <[email protected]>
According to the kernel docs[^1], the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The `vishvananda/netlink` library returned `EINTR` since v1.2.1, but more recent versions have changed it such that it returns `netlink.ErrDumpInterrupted` instead[^2]. These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all `netlink` functions marked to return `ErrDumpInterrupted` that retries the function up to 30 times until it either succeeds or returns a different error. While may call sites do have their own high-level retry mechanism (see e.g. cilium#32099), the logged error message can still cause CI to fail (e.g. cilium#35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to `NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly. A subsequent commit will add an additional linter that nudges developers to use this new `safenetlink` package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually. [^1]: https://docs.kernel.org/userspace-api/netlink/intro.html [^2]: vishvananda/netlink#1018 Signed-off-by: Sebastian Wicki <[email protected]>
According to the kernel docs[^1], the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The `vishvananda/netlink` library returned `EINTR` since v1.2.1, but more recent versions have changed it such that it returns `netlink.ErrDumpInterrupted` instead[^2]. These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all `netlink` functions marked to return `ErrDumpInterrupted` that retries the function up to 30 times until it either succeeds or returns a different error. While may call sites do have their own high-level retry mechanism (see e.g. #32099), the logged error message can still cause CI to fail (e.g. #35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to `NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly. A subsequent commit will add an additional linter that nudges developers to use this new `safenetlink` package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually. [^1]: https://docs.kernel.org/userspace-api/netlink/intro.html [^2]: vishvananda/netlink#1018 Signed-off-by: Sebastian Wicki <[email protected]>
[ upstream commit 5d1951b ] According to the kernel docs[^1], the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The `vishvananda/netlink` library returned `EINTR` since v1.2.1, but more recent versions have changed it such that it returns `netlink.ErrDumpInterrupted` instead[^2]. These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all `netlink` functions marked to return `ErrDumpInterrupted` that retries the function up to 30 times until it either succeeds or returns a different error. While may call sites do have their own high-level retry mechanism (see e.g. #32099), the logged error message can still cause CI to fail (e.g. #35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to `NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly. A subsequent commit will add an additional linter that nudges developers to use this new `safenetlink` package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually. [^1]: https://docs.kernel.org/userspace-api/netlink/intro.html [^2]: vishvananda/netlink#1018 Signed-off-by: Sebastian Wicki <[email protected]>
[ upstream commit 5d1951b ] According to the kernel docs[^1], the kernel can return incomplete results for netlink state dumps if the state changes while we are dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The `vishvananda/netlink` library returned `EINTR` since v1.2.1, but more recent versions have changed it such that it returns `netlink.ErrDumpInterrupted` instead[^2]. These interruptions seem common in high-churn environments. If the error occurs, it is in most cases best to just try again. Therefore, this commit adds a wrapper for all `netlink` functions marked to return `ErrDumpInterrupted` that retries the function up to 30 times until it either succeeds or returns a different error. While may call sites do have their own high-level retry mechanism (see e.g. #32099), the logged error message can still cause CI to fail (e.g. #35259). Long high-level retry intervals can also become problematic: For example, if the routing setup fails due to `NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add several seconds of additional delay to an already overloaded system, instead of resolving the issue quickly. A subsequent commit will add an additional linter that nudges developers to use this new `safenetlink` package for function calls that can be interrupted. This ensures that we don't have to add retries in all subsystems individually. [^1]: https://docs.kernel.org/userspace-api/netlink/intro.html [^2]: vishvananda/netlink#1018 Signed-off-by: Sebastian Wicki <[email protected]>
When the kernel detects a change in a set of resources being dumped (walked-over) by a netlink call, it sets flag
NLM_F_DUMP_INTR
to indicate that the results may be incomplete or inconsistent.Before #925 (in v1.2.1), the flag was ignored and results were returned without an error. With that change, response handling is aborted, results are discarded, and
unix.EINTR
is returned.Depending on what the caller is looking for, the incomplete/inconsistent results may be usable - and that may be preferable to retrying the whole request (which could take unbounded time on a system that's seeing a lot of changes in a lot of data).
This PR introduces a new error,
netlink.ErrDumpInterrupted
that's specific to theNLM_F_DUMP_INTR
case. Response handling completes as normal, and data is returned along with the error. So, the caller can use something likeerrors.Is(err, netlink.ErrDumpInterrupted)
, and take appropriate action.If any callers are checking for
errors.Is(err, unix.EINTR)
, that'll continue to work. But, this is a breaking change for any callers checking the error with a simpleerr == unix.EINTR
.