We're running into problems checkpointing containers that have open connections on a host Unix domain socket mounted into the container. The checkpoint fails with the following error (expected, since SCMConnectedEndpoint objects aren't saveable):
We rely on the -net-disconnect-ok flag when checkpointing containers in production so that any TCP connections open at checkpoint time are closed instead of causing the checkpoint to fail. This is critical for us because we run arbitrary user code, and it's hard to guarantee that no connections are open when we checkpoint.
If possible, we'd like this flag (or a new flag) to also apply to open Unix domain sockets that are backed by host FDs. We mount several host Unix domain sockets into the container for IPC between in-sandbox processes and our agent running on the host, and in practice we can't guarantee that those sockets are closed inside gVisor when we try to checkpoint the container. As a result, we can't checkpoint certain workloads for some customers. The only workaround I can think of is to have gVisor close the sockets itself, and there seems to be precedent for this, since gVisor already does so for TCP connections.
I'm happy to attempt this myself if needed.
Is this feature related to a specific bug?
No response
Do you have a specific solution in mind?
No response
I think this would be reasonable to bundle with --net-disconnect-ok if you want to take a shot at it. The need and use case seem more or less the same.