Specify interoperability with fork and subprocess creation #474

nspark · 2021-07-06T19:40:55Z

This PR specifies interoperability with fork and subprocess creation via posix_spawn.

Feedback is very much requested.

Closes #223

nspark · 2021-07-12T20:55:02Z

In my testing, an application that holds to the following clause works as expected for both Cray SHMEM on Aries and OSHMEM/UCX on InfiniBand:

When fork is invoked before the OpenSHMEM library is initialized, only one of either the parent or child processes may initialize the OpenSHMEM library.

wrrobin · 2021-07-14T20:04:06Z

In my testing, an application that holds to the following clause works as expected for both Cray SHMEM on Aries and OSHMEM/UCX on InfiniBand:

When fork is invoked before the OpenSHMEM library is initialized, only one of either the parent or child processes may initialize the OpenSHMEM library.

Yes, makes sense.
For the other clause, if fork is invoked within the OpenSHMEM region, then the behavior is defined and both the parent and child can invoke SHMEM operations until shmem_finalize(). If the child process is killed before the shmem_finalize, will that be allowed for both child and parent to invoke SHMEM calls?

nspark · 2021-07-14T20:16:48Z

For the other clause, if fork is invoked within the OpenSHMEM region, then the behavior is defined and both the parent and child can invoke SHMEM operations until shmem_finalize(). If the child process is killed before the shmem_finalize, will that be allowed for both child and parent to invoke SHMEM calls?

No, the second clause says (emphasis added):

When fork is invoked within the OpenSHMEM portion of the program or after the OpenSHMEM library has been finalized, the newly created child process shall not call any OpenSHMEM routines; otherwise, the behavior is undefined.

So, no; the child process could not invoke SHMEM operations.

wrrobin · 2021-07-14T20:48:00Z

Thanks @nspark. The text mentioned system level interfaces, so I assume, user level threads / processes do not fall under this restriction.

After finalize, a call to any OpenSHMEM API would lead to undefined behavior, from both child and parent processes. I dont see the current spec has any such statement other than saying that finalize must be the last OpenSHMEM call.

jdinan · 2021-07-26T15:36:20Z

@nspark I realize we have deprecated atexit finalization of OpenSHMEM libraries, but I am wondering about the interactions. When a child process is forked, it inherits the atexit handlers of the parent process. The OpenSHMEM library can record the PID of the process that calls shmem_init and only finalize the library when the process with this PID invokes the atexit handler. However, this could break code that assumes the child process can continue to use OpenSHMEM until it exits. One solution is to remove start_pes and implicit finalization. However, I think some OpenSHMEM libraries may still use implicit finalization even though we have added an explicit finalize call.

nspark · 2021-08-13T23:39:40Z

@jdinan I wonder whether pthread_atfork could be relevant here as a way for a library to safely disable its atexit handler in the child process. Of course, it can't dequeue the handler itself, but it could atomically set a flag that gets queried in the handler before invoking the internal library finalization.

content/interoperability.tex

nspark · 2021-09-09T20:46:05Z

@jdinan Re: implicit finalization, what about adding something like the following under "Notes to implementers"?

OpenSHMEM implementations that support implicit library finalization for compatibility with start_pes should ensure that child processes created after library initialization do not implicitly call OpenSHMEM operations as part of exit handlers invoked during normal process termination.

jdinan · 2021-09-21T21:12:26Z

@nspark This is a useful note to implementors. The start_pes function and implicit finalization are still required for implementations to be conformant with the spec. Can we also remove these functions (in addition to adding this note about backward compatibility) so that implementors don't need to deal with this case?

nspark · 2021-10-27T23:58:10Z

@jdinan Just to clarify, by "remove these functions," you mean totally gone from the API and not just deprecated, correct?

Co-authored-by: Bryant C. Lam <[email protected]>

jdinan · 2021-11-08T19:14:40Z

Note that this would require OpenSHMEM libraries using IB Verbs to call ibv_fork_init: https://linux.die.net/man/3/ibv_fork_init

jdinan · 2022-05-13T15:55:40Z

content/interoperability.tex

+\item When \FUNC{fork} is invoked within the \openshmem portion of the
+  program or after the \openshmem library has been finalized, the
+  newly created child process shall not call any \openshmem routines;
+  otherwise, the behavior is undefined.


These two cases above are still problematic. The issue is that fork marks pages shared by the parent and child processes as copy on write. When either process touches the page, it causes a copy of that page to be mapped into that process' address space and this breaks memory registration because it can change the parent's mapping for memory registered with the NIC. The second bullet above is solved by ibv_fork_init, which marks any memory registered to the NIC with MADV_DONTFORK. This prevents the page being remapped to a copy in the parent's address space, but it also changes the behavior of fork since some pages don't propagate to the child process. In new kernels, Linux copies the pages immediately, which solves the memory registration problem without breaking fork, but at the expense of copy space/time overhead (especially if the application does fork+exec). I'm not exactly sure what will happen in the case where fork is called before ibv_fork_init on older Linux kernels. Hopefully the NIC driver will see that the page is marked copy on write during registration and duplicate it in the parent's address space.

To summarize, any support for fork is problematic. I know the bullet below says that we aren't required to support it, but the two bullets above describe two usage models as things that ought to work. Fork bugs are very hard to diagnose; here's an interesting read of one user's journey discovering that fork was broken: https://blog.nelhage.com/post/a-cursed-bug/. I would suggest that if we want to keep the above bullets, we also need to give users some way to ask the SHMEM library whether or not the above usage models are supported.

That post is a great write-up, and it touches on a behavioral aspect of CoW that I didn't know:

Following a fork, [...] the first process to attempt to write to a copy-on-write (“CoW”) page will make a copy, leaving the original page (still visible to one or more other processes) intact.

This (potentially) interacts poorly with RDMA; If a process using RDMA forks, any future attempts to write to pages used for RDMA would result in the parent copying the page, which would mean it would no longer see any remote writes to that page by RDMA peers.

I had always thought the child always gets the copy. It sounds like the behavior in Linux 5.12 is (or can be) much better.

I'm inclined to, for now, limit subprocess creation to posix_spawn. We could then revisit documenting fork compatibility with a supplemental API (e.g., shmem_is_forksafe) later.

I think the documentation rationale for posix_spawn is sound. I didn't know posix_spawn existed until I found that I couldn't fork-exec in various SHMEM implementations. I imagine many Unix-y programmers are more used to fork-exec than posix_spawn.

Still, what's the concern with use of fork before shmem_init[_thread]? I'd generally think that, in general, we're still in "vanilla Linux memory land" during that time.

The concern I had about fork before init is what NIC drivers will do when you try to register a page that's marked CoW. I suppose this is a common case, e.g. if a new allocation is registered before it's initialized (all pages mapped to zero page), so perhaps nothing to worry about.

I have been digging around this trying to better understand the problem and a few things have changed since this thread was initiated. It is not clear to me that posix_spawn should better than fork() or vfork(). I believe both of those functions can be called depending on the setting of a few POSIX system environment variables. I would like to see this interface defined, but I know this is not a trivial task. Have any of your (@jdinan) opinions on the matter changed with updates to the kernel and networking middle layers options.

The main ambition is having a defined interface within a SHMEM program that is able to spawn off a subprocess safely. The process should have limitations on what it can do (e.g. no shmem routines).

The only change that comes to mind is Linux 5.12 copying the parent process pages into the child process at the time of the fork rather than marking them as CoW. Is having a copy of the parent process' data a usage model that's important for your apps? If not, we leave more freedom to implementations by not needing to support such a model.

nspark added the Feedback Requested label Jul 6, 2021

nspark added the PendingReading label Aug 20, 2021

BryantLam reviewed Aug 27, 2021

View reviewed changes

content/interoperability.tex Outdated Show resolved Hide resolved

nspark added HadReading NeedsChangeLogEntry and removed PendingReading labels Aug 31, 2021

nspark and others added 4 commits October 27, 2021 20:08

Initial interop text re: fork and subprocesses

265a972

Update 'fork' wording for consistency

29e7417

Co-authored-by: Bryant C. Lam <[email protected]>

Clarifying note on forking, exit handlers, and implicit finalization

507d765

Changelog for fork/subprocess interoperability

8cd6cad

nspark force-pushed the interop/fork branch from 799d6df to 8cd6cad Compare October 28, 2021 00:14

nspark removed the NeedsChangeLogEntry label Oct 28, 2021

jdinan reviewed May 13, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specify interoperability with fork and subprocess creation #474

Specify interoperability with fork and subprocess creation #474

nspark commented Jul 6, 2021

nspark commented Jul 12, 2021

wrrobin commented Jul 14, 2021

nspark commented Jul 14, 2021

wrrobin commented Jul 14, 2021

jdinan commented Jul 26, 2021

nspark commented Aug 13, 2021

nspark commented Sep 9, 2021

jdinan commented Sep 21, 2021

nspark commented Oct 27, 2021

jdinan commented Nov 8, 2021

jdinan May 13, 2022

nspark May 13, 2022

nspark May 13, 2022

jdinan May 13, 2022

kwaters4 Jan 11, 2024

jdinan Jan 11, 2024

Specify interoperability with fork and subprocess creation #474

Are you sure you want to change the base?

Specify interoperability with fork and subprocess creation #474

Conversation

nspark commented Jul 6, 2021

nspark commented Jul 12, 2021

wrrobin commented Jul 14, 2021

nspark commented Jul 14, 2021

wrrobin commented Jul 14, 2021

jdinan commented Jul 26, 2021

nspark commented Aug 13, 2021

nspark commented Sep 9, 2021

jdinan commented Sep 21, 2021

nspark commented Oct 27, 2021

jdinan commented Nov 8, 2021

jdinan May 13, 2022

Choose a reason for hiding this comment

nspark May 13, 2022

Choose a reason for hiding this comment

nspark May 13, 2022

Choose a reason for hiding this comment

jdinan May 13, 2022

Choose a reason for hiding this comment

kwaters4 Jan 11, 2024

Choose a reason for hiding this comment

jdinan Jan 11, 2024

Choose a reason for hiding this comment