-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[draft] support of bindmounted unix dgram sockets #1001
base: criu-dev
Are you sure you want to change the base?
[draft] support of bindmounted unix dgram sockets #1001
Conversation
Drop the constants for default cache host/port and page size because they are not used anywhere. Signed-off-by: Radostin Stoyanov <[email protected]> Reviewed-by: Dmitry Safonov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
1) fix sfle memory leak on get_fle_for_scm error 2) fix gfd open descriptor leak on get_fle_for_scm error 3-6) fix buf memory leak on read and pwrite errors Signed-off-by: Pavel Tikhomirov <[email protected]>
Cc: Adrian Reber <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
Show error message when image-{cache,proxy} is called without --port and image-proxy without --address argument. Signed-off-by: Radostin Stoyanov <[email protected]>
criu/sk-packet.c:443:3: error: 'strncpy' output may be truncated copying 14 bytes from a string of length 15 strncpy(addr_spkt.sa_data, req.ifr_name, sa_data_size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ criu/img-remote.c:383:3: error: 'strncpy' specified bound 4096 equals destination size strncpy(snapshot_id, li->snapshot_id, PATHLEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ criu/img-remote.c:384:3: error: 'strncpy' specified bound 4096 equals destination size strncpy(path, li->name, PATHLEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ criu/files.c:288:3: error: 'strncpy' output may be truncated copying 4095 bytes from a string of length 4096 strncpy(buf, link->name, PATH_MAX - 1); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ criu/sk-unix.c:239:36: error: '/' directive output may be truncated writing 1 byte into a region of size between 0 and 4095 snprintf(path, sizeof(path), ".%s/%s", dir, sk->name); ^ criu/sk-unix.c:239:3: note: 'snprintf' output 3 or more bytes (assuming 4098) into a destination of size 4096 snprintf(path, sizeof(path), ".%s/%s", dir, sk->name); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ criu/mount.c:2563:3: error: 'strncpy' specified bound 4096 equals destination size strncpy(path, m->mountpoint, PATH_MAX); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ criu/cr-restore.c:3647:2: error: 'strncpy' specified bound 16 equals destination size strncpy(task_args->comm, core->tc->comm, sizeof(task_args->comm)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Andrei Vagin <[email protected]>
CID 190778 (checkpoint-restore#1 of 1): Read from pointer after free (USE_AFTER_FREE) 7. deref_after_free: Dereferencing freed pointer rop. Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
Both functions check_pending_forwards() and check_pending_reads() have very similar functionality. This patch aims to reduce the duplication of code by merging both functions into check_pending() Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
The macro PATHLEN is redundant. It is defined such that its replacement token sequence is the token PATH_MAX. Signed-off-by: Radostin Stoyanov <[email protected]> Acked-by: Adrian Reber <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
The function `setup_TCP_server_socket` (defined in img-remote.c) and `setup_tcp_server` (defined in util.c) have very similar functionality. Replace setup_TCP_server_socket() with setup_tcp_server() to reduce code duplication and to enable IPv6 support for the image-cache action of CRIU. We set SO_REUSEADDR flag to allow reuse of local addresses. Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
Remove setup_TCP_client_socket() and use setup_tcp_server() instead as both functions have very similar functionality. Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
Check only once if (sockfd < 0) Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
When CRIU calls the ip tool on restore, it passes the fd of remote socket by replacing the STDIN before execvp. The stdin is used by the ip tool to receive input. However, the ip tool calls ftell(stdin) which fails with "Illegal seek" since UNIX sockets do not support file positioning operations. To resolve this issue, read the received content from the UNIX socket and store it into temporary file, then replace STDIN with the fd of this tmp file. # python test/zdtm.py run -t zdtm/static/env00 --remote -f ns === Run 1/1 ================ zdtm/static/env00 ========================= Run zdtm/static/env00 in ns ========================== Start test ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST Adding image cache Adding image proxy Run criu dump Run criu restore =[log]=> dump/zdtm/static/env00/31/1/restore.log ------------------------ grep Error ------------------------ RTNETLINK answers: File exists (00.229895) 1: do_open_remote_image RDONLY path=route-9.img snapshot_id=dump/zdtm/static/env00/31/1 (00.230316) 1: Running ip route restore Failed to restore: ftell: Illegal seek (00.232757) 1: Error (criu/util.c:712): exited, status=255 (00.232777) 1: Error (criu/net.c:1479): IP tool failed on route restore (00.232803) 1: Error (criu/net.c:2153): Can't create net_ns (00.255091) Error (criu/cr-restore.c:1177): 105 killed by signal 9: Killed (00.255307) Error (criu/mount.c:2960): mnt: Can't remove the directory /tmp/.criu.mntns.dTd7ak: No such file or directory (00.255339) Error (criu/cr-restore.c:2119): Restoring FAILED. ------------------------ ERROR OVER ------------------------ ################# Test zdtm/static/env00 FAIL at CRIU restore ################## ##################################### FAIL ##################################### Fixes checkpoint-restore#311 Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
In Py2 `range` returns a list and `xrange` creates a sequence object that evaluates lazily. In Py3 `range` is equivalent to `xrange` in Py2. Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Pavel Tikhomirov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
As discussed on the mailing list, current .py files formatting does not conform to the world standard, so we should better reformat it. For this the yapf tool is used. The command I used was yapf -i $(find -name *.py) Signed-off-by: Pavel Emelyanov <[email protected]>
Most of the typos were found by codespell. Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
This patch does not introduce any functional changes. Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
The function accept_proxy_to_cache() is a wrapper around accept(). Use accept() directly instead. Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
Reduce code duplication by introducing a strflags() macro which maps a flag to corresponding string. Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
This patch makes various private variables and procedures static. These changes conform to the following code style conventions: When declaring pointer data or a function that returns a pointer type, the preferred use of ``*`` is adjacent to the data name or function name and not adjacent to the type name. Statements longer than 80 columns will be broken into sensible chunks, unless exceeding 80 columns significantly increases readability and does not hide information. Descendants are always substantially shorter than the parent and are placed substantially to the right. The same applies to function headers with a long argument list. from https://www.kernel.org/doc/Documentation/process/coding-style.rst The function declarations {send,recv}_image() from img-remote.h are removed because they do not have a corresponding implementation. The following functions are made static because they are not used outside img-remote.c: * {send,recv}_image_async() * {read,write}_remote_header() * socket_set_non_blocking() Signed-off-by: Radostin Stoyanov <[email protected]>
There is no need to print an error message, __xalloc() does that. Signed-off-by: Radostin Stoyanov <[email protected]>
When an interrupt signal (even SIGWINCH when strace is used) is received while epoll_wait() sleeps, it will return a value of -1 and set errno to EINTR, which is not an error and should be ignored. Signed-off-by: Radostin Stoyanov <[email protected]>
There is no need to pass the values for address and port as arguments when creating a TCP server. The external `opts` object, which provides opts.addr and opts.port, is accessible in all components that require these values. With this change, a value specified with the `--address` option will used by image-cache in the same way as with page-server. Example: criu image-cache --address 127.0.0.1 --port 1234 criu page-server --address 127.0.0.1 --port 1234 Signed-off-by: Radostin Stoyanov <[email protected]>
The variable name 'remote_sk' is shorter than 'proxy_to_cache_fd' and it is more similar to 'page_server_sk' (used in criu/page-xfer.c). Signed-off-by: Radostin Stoyanov <[email protected]>
Because we need to gather unix sockets earlier than we start creating mount tree. Thus we will be able to handle bindmounted sockets. Signed-off-by: Cyrill Gorcunov <[email protected]>
So when we will be examinating mount points and sockets we won't waste time on non-bindmounted. Signed-off-by: Cyrill Gorcunov <[email protected]>
We will need to take mutex when bind() bindmounted sockets. Strictly speaking we won't support bindmounted and deleted sockets for now but better prepare this scaffolds early. Signed-off-by: Cyrill Gorcunov <[email protected]>
Some unix sockets might be bindmounted (say /dev/log bound to another place). So to handle it we need to change the logic we open such sockets especially because we create mount tree earlier than we start to restore files. Thus here what we do: - on dump mark such sockets with UNIX_UFLAGS__BINDMOUNT flag so we would distinguish them on restore; - collect unix sockets before creating mount tree; note that at this moment we able to simply gather this sockets into own @unix_mnt_sockets list and nothing more because setting up the peers and such happens later in that named post action procedures; - when we need to create a bindmount point we enter into unix engine and figure out if there a socket to bindmount over; if found we pre-allocate the socketpair, bind it and save inside fdstore engine; using socketpair is important because later we need both peers to restore queued data; - finally when we start restoring files we simply fetch the socket from the fdstore and use it directly. All this scheme is working simply because we support dgram standalone sockets only, adding support for streamed sockets requires a way more engine rework and hopefully we won't need it in near future. Signed-off-by: Cyrill Gorcunov <[email protected]>
To test a case where unix socket is bind mounted to somewhere so restore may fail if socket has not been created. Signed-off-by: Cyrill Gorcunov <[email protected]>
Otherwise the mount call may fail since bindmounted unix socket wont be created. Signed-off-by: Cyrill Gorcunov <[email protected]>
To elimitane compilation warnings with gcc-8. Signed-off-by: Cyrill Gorcunov <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
For better debug Signed-off-by: Cyrill Gorcunov <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
No need to check for pair argument, it is rather confusing. Signed-off-by: Cyrill Gorcunov <[email protected]>
Currently bind_unix_sk is used in two contexts: to bind freshly created socket pairs and to bind name for sockets which are to be queued into fdstore (binmounted sockets). For first case we should notify the waiting side immediately but in turn bindmount sockets are created early and there might be the case where peers are not yet even opened and notification may simply lost or even cause sigsegv since file list is yet empty. In turn we should defer it until we do a real bindmount socket opening right after we fetched it from the fdstore. https://jira.sw.ru/browse/PSBM-88274 Signed-off-by: Cyrill Gorcunov <[email protected]>
The unlink procedure is rather a cleanup before we start creating new sockets, but bindmounted sockets are pre-created early so we should not touch them. Signed-off-by: Cyrill Gorcunov <[email protected]>
Once socket is bounded we should allow to connect to us via relative name. https://jira.sw.ru/browse/PSBM-88274 Signed-off-by: Cyrill Gorcunov <[email protected]>
Also full rework to make sure we can test the situation where client is not yet opened when we're restoring bindmount. Signed-off-by: Cyrill Gorcunov <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
Previously we always created sockets in root mount namespace but in 019ebec we've tried to resolve this problem setting up proper mount namespace. This is not always possible though: the root task from which we trying to switch ns might already have a number of children with CLONE_FS|CLONE_FILES set and kernel doesn't allow to do that. Lets disable this ability until we find a proper solution. https://jira.sw.ru/browse/PSBM-89126 Signed-off-by: Cyrill Gorcunov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is yet a draft, so let's draft-review it =)
Please see in-line comments. upd: Also can you please merge bugfixes to the corresponding base patches?
|
||
pr_info("%d restore sndbuf %d rcv buf %d\n", sk, soe->so_sndbuf, soe->so_rcvbuf); | ||
|
||
/* setsockopt() multiplies the input values by 2 */ | ||
ret |= userns_call(sk_setbufs, UNS_ASYNC, bufs, sizeof(bufs), sk); | ||
return userns_call(sk_setbufs, UNS_ASYNC, bufs, sizeof(bufs), sk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it fine to use async call if we want to set queue size strictly before restoring the queue?
|
||
if (switch_ns(mi->nsid->ns_pid, &mnt_ns_desc, &ns_old) < 0) { | ||
pr_err("Can't switch ns to mnt_id %d", mi->mnt_id); | ||
if (restore_ns(ns_old, &mnt_ns_desc)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we don't need to restore_ns here if switch ns failed, we've just restored initial mntns in the end of previous iteration, and if switch_ns fails it does not actually do the switch.
if (restore_ns(ns_old, &mnt_ns_desc)) { | ||
pr_err("Can't switch mount ns back from %d\n", mi->nsid->ns_pid); | ||
return -1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also "ns_old = -1" after all restore_ns calls would be nice as the fd is closed, not good if we will accidentally reuse it.
return -1; | ||
} | ||
|
||
if (stat(mi->mountpoint, &st)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the mountpoint is overmounted you can stat not what you need (some other file from overmount). So we either need to use open+fdinfo to check that we stat the file from right mount or just fail if mnt_is_overmounted(mi).
} | ||
} else { | ||
ue->mnt_id = d->mnt_id; | ||
ue->has_mnt_id = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we loose this way the information about the original socket file mount id? Maybe we can add another field for it?
mi->mountpoint); | ||
goto err; | ||
} | ||
|
||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hunk is supper strange, please fix it =) You better merge "unix: bindmount -- Move mounting code to be called before first mount" to this commit.
A friendly reminder that this PR had no activity for 30 days. |
f392ea1
to
4d137b8
Compare
This patchset adds support of bindmounted unix DGRAM sockets
Will be ready for review/updated after #992 will be merged