Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve results when NLM_F_DUMP_INTR is set #1018

Merged
merged 1 commit into from
Sep 22, 2024

Conversation

robmry
Copy link
Contributor

@robmry robmry commented Sep 9, 2024

When the kernel detects a change in a set of resources being dumped (walked-over) by a netlink call, it sets flag NLM_F_DUMP_INTR to indicate that the results may be incomplete or inconsistent.

Before #925 (in v1.2.1), the flag was ignored and results were returned without an error. With that change, response handling is aborted, results are discarded, and unix.EINTR is returned.

Depending on what the caller is looking for, the incomplete/inconsistent results may be usable - and that may be preferable to retrying the whole request (which could take unbounded time on a system that's seeing a lot of changes in a lot of data).

This PR introduces a new error, netlink.ErrDumpInterrupted that's specific to the NLM_F_DUMP_INTR case. Response handling completes as normal, and data is returned along with the error. So, the caller can use something like errors.Is(err, netlink.ErrDumpInterrupted), and take appropriate action.

If any callers are checking for errors.Is(err, unix.EINTR), that'll continue to work. But, this is a breaking change for any callers checking the error with a simple err == unix.EINTR.

@robmry robmry force-pushed the nlm_f_dump_intr_handling branch 3 times, most recently from 3b85385 to fae5970 Compare September 9, 2024 17:08
@robmry robmry marked this pull request as ready for review September 9, 2024 17:11
@robmry
Copy link
Contributor Author

robmry commented Sep 9, 2024

@adrianchiris, @aboch ... could you take a look?

// ErrDumpInterrupted is an instance of errDumpInterrupted, used to report that
// a netlink function has set the NLM_F_DUMP_INTR flag in a response message,
// indicating that the results may be incomplete or inconsistent.
var ErrDumpInterrupted = errDumpInterrupted{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use:
errors.New("results may be incomplete or inconsistent")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not equivalent: errDumpInterrupted implements interface { Is(error) bool }. errors.Is(errors.New(""), unix.EINTR) == false while errors.Is(ErrDumpInterrupted, unix.EINTR) == true.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you are right.

@adrianchiris
Copy link
Collaborator

adrianchiris commented Sep 10, 2024

@robmry just as an FYI, the check was introduced to align with how libmnl is executing netlink commands. see [1]

can you provide more info on the use-case where its ok and needed to rely partial results instead of retry ? (except the ci failure u linked).

TBH i favor sticking with how iproute2/libmnl implements execution of these cmds unless there is a good reason to deviate

[1] https://github.com/justmirror/libmnl/blob/f14732339a77a2e6ba9ed4ca99b347ce1dc60801/src/callback.c#L70

@robmry
Copy link
Contributor Author

robmry commented Sep 10, 2024

@robmry just as an FYI, the check was introduced to align with how libmnl is executing netlink commands. see [1]

can you provide more info on the use-case where its ok and needed to rely partial results instead of retry ? (except the ci failure u linked).

TBH i favor sticking with how iproute2/libmnl implements execution of these cmds unless there is a good reason to deviate

[1] https://github.com/justmirror/libmnl/blob/f14732339a77a2e6ba9ed4ca99b347ce1dc60801/src/callback.c#L70

Thanks @adrianchiris - the moby (dockerd) maintainers were concerned about unbounded retries on the new EINTR. It's not quite like a retry on a normal EINTR from a system call as it's (perhaps) more likely to happen again - and the kernel interface gives the option of using the partial data.

So, the preference is to be able to retry a limited number of times and, in the unlikely event the retries fail, fall-back to using the partial data - on the basis that it'll be no worse than it was before the change, and no problems have been noticed/reported.

As an example of where partial data might be used in a more considered way - in this package, LinkByName can fall back to using linkByNameDump, which searches a dump for the name. If there's no result with the name, there's no way to tell if the link exists without a retry. But, if there's at-least one entry with the name, it might be what the caller wants ... this package can't tell. Not-discarding the result and returning a Link with the error means the caller can decide - it might want to retry, use the Link as-is, or it might want to get a more up-to-date version using the returned Index in LinkByIndex (which doesn't do a dump).

Many callers are still likely to see the error is non-nil and just take an error path without looking at the data.

iproute2 does something similar here https://github.com/iproute2/iproute2/blob/e9096586e0701d5ae031df2f2708d20d34ae7bd4/lib/libnetlink.c#L901 ... it doesn't retry on NLM_F_DUMP_INTR, but it's not a daemon and the user can choose to retry when they get an error.

@aboch
Copy link
Collaborator

aboch commented Sep 10, 2024

Yes, the goal of this library is to be in sync with iproute2 behavior as much as possible.
If we start to add exceptions to accommodate this or this other project, we will end up introducing a lot of oddities in the code which will hinder the use of this library in future.

Per go guidelines, but in general a good practice for any language, if the the function returns an non nil error, any other returned variable must not be trusted.

The fact that you have each list method to still return the original INTR error is good, but if the parsing functions fail because the partial list is truncated at the element level (I do not know if it is possible) then the parsing function will fail and the returned error will be a parsing error. This will mask the real INTR error, making it harder to root cause.

@robmry
Copy link
Contributor Author

robmry commented Sep 10, 2024

Yes, the goal of this library is to be in sync with iproute2 behavior as much as possible. If we start to add exceptions to accommodate this or this other project, we will end up introducing a lot of oddities in the code which will hinder the use of this library in future.

The iproute2 link I pasted above shows it handling NLM_F_DUMP_INTR as it's handled by this change.

There's nothing moby-specific about the change, or requirement ... it's discussed in my comment because I was responding to the request for a use-case, and moby is one. But, the LinkByName example is based on how it could be used by a function in this package (or its caller).

Per go guidelines, but in general a good practice for any language, if the the function returns an non nil error, any other returned variable must not be trusted.

I can't find anything like that in https://go.dev/doc/effective_go#errors - I also thought it was unusual, but io.ReadAll is a counter-example.

In this case, the error is very explicit that the results aren't to be trusted. The idea is to give the caller the choice (like the kernel and iproute2 do).

The fact that you have each list method to still return the original INTR error is good, but if the parsing functions fail because the partial list is truncated at the element level (I do not know if it is possible) then the parsing function will fail and the returned error will be a parsing error. This will mask the real INTR error, making it harder to root cause.

I don't think that's possible - individual results from a collection are copied into the netlink response with locks held, so they'll be fine. The flag indicates that a generation number for the collection changed as those results were being accumulated. (A reference to a position in the collection is stored between netlink responses to a request, but that position can get out of date. Even with a single netlink response message, collections can be changed between reads of elements of the collection.)

@corhere
Copy link

corhere commented Sep 10, 2024

if the parsing functions fail because the partial list is truncated at the element level (I do not know if it is possible) then the parsing function will fail and the returned error will be a parsing error. This will mask the real INTR error, making it harder to root cause.

Before v1.2.1, NLM_F_DUMP_INTR flag was not checked when parsing messages. If it was possible for the kernel to return a truncated list, someone would have seen and complained about parsing errors in the same circumstances where we are currently complaining about EINTR. This library itself is very strong evidence that the kernel does not pass ill-formed messages to userspace when a dump is interrupted.

@aboch
Copy link
Collaborator

aboch commented Sep 10, 2024

I guess you convinced me.
@robmry can you please add a comment line on top of each exposed method saying the in case of EINTR returned error the collection of elements is still returned but may not be current?

@adrianchiris
Copy link
Collaborator

adrianchiris commented Sep 11, 2024

Hi all,

Adding some more points:

  1. kernel recommends to retry inconsistent dump commands [1]
  2. i dont think that iproute makes a habbit of using inconsistent results of dump commands. from grepping a bit in iproute code
    i found the example[2] listed in above comments used only in ip address flush[3]

ipaddr_flush() --- uses ---> rtnl_dump_filter_nc() --- uses ---> rtnl_dump_filter_l()

  1. NLM_F_DUMP_INTR is relevant only for dump commands (NLM_F_DUMP flag set ) so for non dump commands no need to do the check IMO.
  2. LinkByName in this library does a fallback to dump command only when kernel does not support getting the specific interface according to comments [4]

That said, i have no objection for the change as long as we still fail if results are inconsistent.

Thanks @adrianchiris - the moby (dockerd) maintainers were concerned about unbounded retries on the new EINTR. It's not quite like a retry on a normal EINTR from a system call as it's (perhaps) more likely to happen again - and the kernel interface gives the option of using the partial data.

please note that the kernel refers to this data as "inconsistent" rather then partial.

[1] https://github.com/torvalds/linux/blob/8d8d276ba2fb5f9ac4984f5c10ae60858090babc/Documentation/userspace-api/netlink/intro.rst#dump-consistency
[2] https://github.com/iproute2/iproute2/blob/e9096586e0701d5ae031df2f2708d20d34ae7bd4/lib/libnetlink.c#L901
[3] https://github.com/iproute2/iproute2/blob/e9096586e0701d5ae031df2f2708d20d34ae7bd4/ip/ipaddress.c#L1996
[4]

// older kernels don't support looking up via IFLA_IFNAME

@robmry
Copy link
Contributor Author

robmry commented Sep 11, 2024

I guess you convinced me. @robmry can you please add a comment line on top of each exposed method saying the in case of EINTR returned error the collection of elements is still returned but may not be current?

Done, except the error is ErrDumpInterrupted. Generally, the comment is ...

// If the returned error is [ErrDumpInterrupted], results may be inconsistent
// or incomplete.

@robmry
Copy link
Contributor Author

robmry commented Sep 11, 2024

  1. kernel recommends to retry inconsistent dump commands [1]

Yes, indeed. But, if the data is returned, the caller has the option of choosing to do something else.

  1. i dont think that iproute makes a habbit of using inconsistent results of dump commands. from grepping a bit in iproute code
    i found the example[2] listed in above comments used only in ip address flush[3]

ipaddr_flush() --- uses ---> rtnl_dump_filter_nc() --- uses ---> rtnl_dump_filter_l()

Also macro rtnl_dump_filter, which is used to process results for a lot of functions that make NLM_F_DUMP requests.

  1. NLM_F_DUMP_INTR is relevant only for dump commands (NLM_F_DUMP flag set ) so for non dump commands no need to do the check IMO.

Agreed - but this PR already only changes functions that do an NLM_F_DUMP.

  1. LinkByName in this library does a fallback to dump command only when kernel does not support getting the specific interface according to comments [4]

I think that's what I described in the example? ("in this package, LinkByName can fall back to using linkByNameDump, which searches a dump for the name"), and now in the godoc comment describing the ErrDumpInterrupted return. (Ditto for LinkByAlias.)

@aboch
Copy link
Collaborator

aboch commented Sep 12, 2024

@robmry This project follows a "One change/fix/feature per pull request" and the "Squash and Merge" option is not enabled.
Please squash your commits into one and push again.

@robmry
Copy link
Contributor Author

robmry commented Sep 15, 2024

@robmry This project follows a "One change/fix/feature per pull request" and the "Squash and Merge" option is not enabled. Please squash your commits into one and push again.

Oh, sure - done ...

I got rid of the gofmt changes, as they weren't directly related (only really there to stop my editor messing with code in the other commits). The other three commits were split to try to make reviews a bit easier:

  • introduce the new error type
  • mechanical changes to use the new type, and return results where EINTR would have been returned before
  • catch some NLM_F_DUMP_INTR cases missed by the original change

(I've still got those changes ... happy to re-raise as separate PRs if preferred?)

ipset_linux.go Outdated
@@ -250,7 +250,7 @@ func (h *Handle) IpsetListAll() ([]IPSetResult, error) {
result[i].unserialize(msg)
}

return result, nil
return result, err
Copy link
Collaborator

@adrianchiris adrianchiris Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to check for ErrDumpInterrupted at L244?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thank you - I think I modified this one as it's a ...ListAll() - then realised it doesn't set NLM_F_DUMP, and didn't fully undo the change. I'll change the return back to nil.

@@ -2440,6 +2464,9 @@ func linkSubscribeAt(newNs, curNs netns.NsHandle, ch chan<- LinkUpdate, done <-c
continue
}
for _, m := range msgs {
if m.Header.Flags&unix.NLM_F_DUMP_INTR != 0 && cberr != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if for initial list would return dump interrupted there would be additional notifications from kernel (i think).
not sure we want this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly ... what additional notifications are you thinking of, and where would they go?

@@ -157,6 +157,9 @@ func (u *UnixSocket) deserialize(b []byte) error {
}

// SocketGet returns the Socket identified by its local and remote addresses.
//
// If the returned error is [ErrDumpInterrupted], the search for a result may
// be incomplete and the caller should retry.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM this would return nil as Socket if any error happened at L219 (req.Execute()) is this intended ?

i.e in case of ErrDumpInterrupted we dont get the partial results

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one returns specific errors if there's not exactly one result and, even if there is one socket, either of those is a possibility if the dump was interrupted.

So, if there's not exactly one result, ErrDumpInterrupted should be returned rather than one of those other errors to indicate that a retry is needed.

But, I guess it'd be better to explicitly check for that case ... I'll update it.

Copy link
Contributor Author

@robmry robmry Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one returns specific errors if there's not exactly one result ...

Hmm - looking at it more closely, that might have been what was intended, but it's not what it does ...

	if len(msgs) == 0 {
		return nil, errors.New("no message nor error from netlink")
	}
	if len(msgs) > 2 {
		return nil, fmt.Errorf("multiple (%d) matching sockets", len(msgs))
	}

So, if there are 2 results, it'll use the first and discard the second. I guess the second test should probably be > 1 or >= 2.

Assuming that's a bug, fixing it doesn't belong in this PR - so I'll leave it alone here ... just returning nil, rather than making things more confusing.

Add a specific error to report that a netlink response had
NLM_F_DUMP_INTR set, indicating that the set of results may be
incomplete or inconsistent.

unix.EINTR was previously returned (with no results) when the
NLM_F_DUMP_INTR flag was set. Now, errors.Is(err, unix.EINTR) will
still work. But, this will be a breaking change for any code that's
checking for equality with unix.EINTR.

Return results with ErrDumpInterrupted. Results may be incomplete
or inconsistent, but give the caller the option of using them.

Look for NLM_F_DUMP_INTR in more places:
- linkSubscribeAt, neighSubscribeAt, routeSubscribeAt
  - can do an initial dump, which may report inconsistent results
  -> if there's an error callback, call it with ErrDumpInterrupted
- socketDiagXDPExecutor
  - makes an NLM_F_DUMP request, without using Execute()
  -> give it the same behaviour as functions that do use Execute()

Signed-off-by: Rob Murray <[email protected]>
Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@aboch aboch merged commit 084abd9 into vishvananda:main Sep 22, 2024
2 checks passed
robmry added a commit to robmry/moby that referenced this pull request Oct 7, 2024
…93d350

- Preserve results when NLM_F_DUMP_INTR is set
  vishvananda/netlink#1018
- Fix SetSendTimeout/SetReceiveTimeout
  vishvananda/netlink#1012

full diff: vishvananda/netlink@v1.3.0...084abd9

Signed-off-by: Rob Murray <[email protected]>
@thaJeztah
Copy link
Contributor

@adrianchiris @aboch any chance on tagging a new release now that our fixes went in?

gandro added a commit to gandro/cilium that referenced this pull request Oct 29, 2024
According to the kernel docs[^1], the kernel can return incomplete
results for netlink state dumps if the state changes while we are
dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The
`vishvananda/netlink` library returned `EINTR` since v1.2.1, but more
recent versions have changed it such that it returns
`netlink.ErrDumpInterrupted` instead[^2].

These interruptions seem common in high-churn environments. If the error
occurs, it is in most cases best to just try again.  Therefore, this
commit adds a wrapper for all `netlink` functions marked to return
`ErrDumpInterrupted` that retries the function up to 30 times until it
either succeeds or returns a different error.

While may call sites do have their own high-level retry mechanism (see
e.g. cilium#32099), the logged error message can still cause CI
to fail (e.g. cilium#35259). Long high-level retry intervals can
also become problematic: For example, if the routing setup fails due to
`NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add
several seconds of additional delay to an already overloaded system,
instead of resolving the issue quickly.

A subsequent commit will add an additional linter that nudges developers
to use this new `safenetlink` package for function calls that can be
interrupted. This ensures that we don't have to add retries in all
subsystems individually.

[^1]: https://docs.kernel.org/userspace-api/netlink/intro.html
[^2]: vishvananda/netlink#1018

Signed-off-by: Sebastian Wicki <[email protected]>
gandro added a commit to gandro/cilium that referenced this pull request Oct 29, 2024
According to the kernel docs[^1], the kernel can return incomplete
results for netlink state dumps if the state changes while we are
dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The
`vishvananda/netlink` library returned `EINTR` since v1.2.1, but more
recent versions have changed it such that it returns
`netlink.ErrDumpInterrupted` instead[^2].

These interruptions seem common in high-churn environments. If the error
occurs, it is in most cases best to just try again.  Therefore, this
commit adds a wrapper for all `netlink` functions marked to return
`ErrDumpInterrupted` that retries the function up to 30 times until it
either succeeds or returns a different error.

While may call sites do have their own high-level retry mechanism (see
e.g. cilium#32099), the logged error message can still cause CI
to fail (e.g. cilium#35259). Long high-level retry intervals can
also become problematic: For example, if the routing setup fails due to
`NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add
several seconds of additional delay to an already overloaded system,
instead of resolving the issue quickly.

A subsequent commit will add an additional linter that nudges developers
to use this new `safenetlink` package for function calls that can be
interrupted. This ensures that we don't have to add retries in all
subsystems individually.

[^1]: https://docs.kernel.org/userspace-api/netlink/intro.html
[^2]: vishvananda/netlink#1018

Signed-off-by: Sebastian Wicki <[email protected]>
gandro added a commit to gandro/cilium that referenced this pull request Oct 30, 2024
According to the kernel docs[^1], the kernel can return incomplete
results for netlink state dumps if the state changes while we are
dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The
`vishvananda/netlink` library returned `EINTR` since v1.2.1, but more
recent versions have changed it such that it returns
`netlink.ErrDumpInterrupted` instead[^2].

These interruptions seem common in high-churn environments. If the error
occurs, it is in most cases best to just try again.  Therefore, this
commit adds a wrapper for all `netlink` functions marked to return
`ErrDumpInterrupted` that retries the function up to 30 times until it
either succeeds or returns a different error.

While may call sites do have their own high-level retry mechanism (see
e.g. cilium#32099), the logged error message can still cause CI
to fail (e.g. cilium#35259). Long high-level retry intervals can
also become problematic: For example, if the routing setup fails due to
`NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add
several seconds of additional delay to an already overloaded system,
instead of resolving the issue quickly.

A subsequent commit will add an additional linter that nudges developers
to use this new `safenetlink` package for function calls that can be
interrupted. This ensures that we don't have to add retries in all
subsystems individually.

[^1]: https://docs.kernel.org/userspace-api/netlink/intro.html
[^2]: vishvananda/netlink#1018

Signed-off-by: Sebastian Wicki <[email protected]>
gandro added a commit to gandro/cilium that referenced this pull request Oct 30, 2024
According to the kernel docs[^1], the kernel can return incomplete
results for netlink state dumps if the state changes while we are
dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The
`vishvananda/netlink` library returned `EINTR` since v1.2.1, but more
recent versions have changed it such that it returns
`netlink.ErrDumpInterrupted` instead[^2].

These interruptions seem common in high-churn environments. If the error
occurs, it is in most cases best to just try again.  Therefore, this
commit adds a wrapper for all `netlink` functions marked to return
`ErrDumpInterrupted` that retries the function up to 30 times until it
either succeeds or returns a different error.

While may call sites do have their own high-level retry mechanism (see
e.g. cilium#32099), the logged error message can still cause CI
to fail (e.g. cilium#35259). Long high-level retry intervals can
also become problematic: For example, if the routing setup fails due to
`NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add
several seconds of additional delay to an already overloaded system,
instead of resolving the issue quickly.

A subsequent commit will add an additional linter that nudges developers
to use this new `safenetlink` package for function calls that can be
interrupted. This ensures that we don't have to add retries in all
subsystems individually.

[^1]: https://docs.kernel.org/userspace-api/netlink/intro.html
[^2]: vishvananda/netlink#1018

Signed-off-by: Sebastian Wicki <[email protected]>
gandro added a commit to gandro/cilium that referenced this pull request Oct 30, 2024
According to the kernel docs[^1], the kernel can return incomplete
results for netlink state dumps if the state changes while we are
dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The
`vishvananda/netlink` library returned `EINTR` since v1.2.1, but more
recent versions have changed it such that it returns
`netlink.ErrDumpInterrupted` instead[^2].

These interruptions seem common in high-churn environments. If the error
occurs, it is in most cases best to just try again.  Therefore, this
commit adds a wrapper for all `netlink` functions marked to return
`ErrDumpInterrupted` that retries the function up to 30 times until it
either succeeds or returns a different error.

While may call sites do have their own high-level retry mechanism (see
e.g. cilium#32099), the logged error message can still cause CI
to fail (e.g. cilium#35259). Long high-level retry intervals can
also become problematic: For example, if the routing setup fails due to
`NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add
several seconds of additional delay to an already overloaded system,
instead of resolving the issue quickly.

A subsequent commit will add an additional linter that nudges developers
to use this new `safenetlink` package for function calls that can be
interrupted. This ensures that we don't have to add retries in all
subsystems individually.

[^1]: https://docs.kernel.org/userspace-api/netlink/intro.html
[^2]: vishvananda/netlink#1018

Signed-off-by: Sebastian Wicki <[email protected]>
github-merge-queue bot pushed a commit to cilium/cilium that referenced this pull request Oct 30, 2024
According to the kernel docs[^1], the kernel can return incomplete
results for netlink state dumps if the state changes while we are
dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The
`vishvananda/netlink` library returned `EINTR` since v1.2.1, but more
recent versions have changed it such that it returns
`netlink.ErrDumpInterrupted` instead[^2].

These interruptions seem common in high-churn environments. If the error
occurs, it is in most cases best to just try again.  Therefore, this
commit adds a wrapper for all `netlink` functions marked to return
`ErrDumpInterrupted` that retries the function up to 30 times until it
either succeeds or returns a different error.

While may call sites do have their own high-level retry mechanism (see
e.g. #32099), the logged error message can still cause CI
to fail (e.g. #35259). Long high-level retry intervals can
also become problematic: For example, if the routing setup fails due to
`NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add
several seconds of additional delay to an already overloaded system,
instead of resolving the issue quickly.

A subsequent commit will add an additional linter that nudges developers
to use this new `safenetlink` package for function calls that can be
interrupted. This ensures that we don't have to add retries in all
subsystems individually.

[^1]: https://docs.kernel.org/userspace-api/netlink/intro.html
[^2]: vishvananda/netlink#1018

Signed-off-by: Sebastian Wicki <[email protected]>
gandro added a commit to cilium/cilium that referenced this pull request Oct 30, 2024
[ upstream commit 5d1951b ]

According to the kernel docs[^1], the kernel can return incomplete
results for netlink state dumps if the state changes while we are
dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The
`vishvananda/netlink` library returned `EINTR` since v1.2.1, but more
recent versions have changed it such that it returns
`netlink.ErrDumpInterrupted` instead[^2].

These interruptions seem common in high-churn environments. If the error
occurs, it is in most cases best to just try again.  Therefore, this
commit adds a wrapper for all `netlink` functions marked to return
`ErrDumpInterrupted` that retries the function up to 30 times until it
either succeeds or returns a different error.

While may call sites do have their own high-level retry mechanism (see
e.g. #32099), the logged error message can still cause CI
to fail (e.g. #35259). Long high-level retry intervals can
also become problematic: For example, if the routing setup fails due to
`NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add
several seconds of additional delay to an already overloaded system,
instead of resolving the issue quickly.

A subsequent commit will add an additional linter that nudges developers
to use this new `safenetlink` package for function calls that can be
interrupted. This ensures that we don't have to add retries in all
subsystems individually.

[^1]: https://docs.kernel.org/userspace-api/netlink/intro.html
[^2]: vishvananda/netlink#1018

Signed-off-by: Sebastian Wicki <[email protected]>
github-merge-queue bot pushed a commit to cilium/cilium that referenced this pull request Oct 31, 2024
[ upstream commit 5d1951b ]

According to the kernel docs[^1], the kernel can return incomplete
results for netlink state dumps if the state changes while we are
dumping it. The result is then marked by `NLM_F_DUMP_INTR`. The
`vishvananda/netlink` library returned `EINTR` since v1.2.1, but more
recent versions have changed it such that it returns
`netlink.ErrDumpInterrupted` instead[^2].

These interruptions seem common in high-churn environments. If the error
occurs, it is in most cases best to just try again.  Therefore, this
commit adds a wrapper for all `netlink` functions marked to return
`ErrDumpInterrupted` that retries the function up to 30 times until it
either succeeds or returns a different error.

While may call sites do have their own high-level retry mechanism (see
e.g. #32099), the logged error message can still cause CI
to fail (e.g. #35259). Long high-level retry intervals can
also become problematic: For example, if the routing setup fails due to
`NLM_F_DUMP_INTR` during an CNI ADD invocation, the retry adds add
several seconds of additional delay to an already overloaded system,
instead of resolving the issue quickly.

A subsequent commit will add an additional linter that nudges developers
to use this new `safenetlink` package for function calls that can be
interrupted. This ensures that we don't have to add retries in all
subsystems individually.

[^1]: https://docs.kernel.org/userspace-api/netlink/intro.html
[^2]: vishvananda/netlink#1018

Signed-off-by: Sebastian Wicki <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants