Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

luv fails to handle signals promptly #400

Closed
talex5 opened this issue Jan 5, 2023 · 5 comments
Closed

luv fails to handle signals promptly #400

talex5 opened this issue Jan 5, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@talex5
Copy link
Collaborator

talex5 commented Jan 5, 2023

OCaml signal handlers can't run at any time (unlike C ones). So when an OCaml signal occurs, OCaml just records the fact and runs them at the next safe point. Typically the system call that was running will return with EINTR, the C stub will call caml_leave_blocking_section, and the signal handler will then run.

This works fine with ocaml-uring. However, libuv automatically retries on EINTR, without giving OCaml a chance to do anything.

This is really a problem with luv, but we might want to make some kind of work-around. libuv does provide its own signal handling system (using handlers written in C to wake the libuv loop).

Reacquiring the runtime lock briefly in e.g. libuv's epoll.c fixes it. However, that would prevent using the system version of libuv.

diff --git a/src/unix/epoll.c b/src/unix/epoll.c
index 97348e25..cc272d64 100644
--- a/src/unix/epoll.c
+++ b/src/unix/epoll.c
@@ -24,6 +24,9 @@
 #include <errno.h>
 #include <sys/epoll.h>
 
+void caml_enter_blocking_section (void);
+void caml_leave_blocking_section (void);
+
 int uv__epoll_init(uv_loop_t* loop) {
   int fd;
   fd = epoll_create1(O_CLOEXEC);
@@ -288,6 +291,9 @@ void uv__io_poll(uv_loop_t* loop, int timeout) {
         reset_timeout = 0;
       }
 
+      caml_leave_blocking_section();
+      caml_enter_blocking_section();
+
       if (timeout == -1)
         continue;
 
@talex5 talex5 added the bug Something isn't working label Jan 5, 2023
@haesbaert
Copy link
Contributor

It's a shame that it seems not even Luv.run ~mode:ONCE could do it, it seems the only thing that unwinds the epoll loop is getting a timeout or an event.

@haesbaert
Copy link
Contributor

At any rate, if you choose to go this path, it would make more sense to make the loop unwind so that we end up processing the signal up there, and not run ocaml code while luv is in the middle of an epoll loop.

@talex5
Copy link
Collaborator Author

talex5 commented Jan 5, 2023

After discussion with @haesbaert, we could mask events in luv threads and run a separate domain to handle signals under luv. Here's a proof-of-concept:

open Eio.Std

let i = ref 3

let interrupted = Eio.Condition.create ()

let handle (_:int) =
  Eio.Condition.broadcast interrupted

let _ =
  Domain.spawn (fun () -> while true do Unix.sleep 10 done)

let () =
  Eio_main.run @@ fun _env ->
  Thread.sigmask SIG_SETMASK [Sys.sigint] |> ignore;
  Sys.set_signal Sys.sigint (Signal_handle handle);
  for i = 1 to 3 do
    traceln "run %d" i;
    Eio.Condition.await_no_mutex interrupted;
  done

The output is:

$ EIO_BACKEND=luv dune exec -- ./examples/hello/main.exe
+run 1
^C+run 2
^C+run 3
^C

Would need to remember to clear the mask when forking a subprocess or spawning a non-Eio domain (or systhread?).

@haesbaert
Copy link
Contributor

The systhread version didn't work due to an actual bug in the runtime, the wrong mask was being restored in the child thread, I've pushed a fix here: ocaml/ocaml#11880

haesbaert added a commit to haesbaert/eio that referenced this issue Jan 23, 2023
This makes sure we can process signals in luv the same way we do in uring. As
stated in ocaml-multicore#400 the main issue is that luv's mainloop will restart on EINTR and
we will never unwind back to ocaml land, so even though the process got the
signal, the runtime will not see it until something else goes on.

The trick here is to abuse POSIX thread semantics and shove all signals into one
specific systhread by blocking them in the other threads.

Additionally, I've fixed a bug in Ocaml 5.0 where the systhreads end up starting
with all signals blocked: ocaml/ocaml#11880. This PR
works even with/without the bug.

Danger Zone
~~~~~~~~~~~
This is tricky ! If you're thinking we can do better than pipes, think again !
Unix.sigsuspend doesn't work on multithreaded, we don't have a real Unix.pause
(it's implemented on top of sigsuspend !).
The semantics for kill(2) to "self" are different than "from outside" when
multithreaded, and the runtime doesn't expose pthread_kill(2). That's why on the
test we have to fork and signal back to the parent. I know, it's horrible.
talex5 added a commit to haesbaert/eio that referenced this issue Jan 24, 2023
This makes sure we can process signals in luv the same way we do in uring. As
stated in ocaml-multicore#400 the main issue is that luv's mainloop will restart on EINTR and
we will never unwind back to ocaml land, so even though the process got the
signal, the runtime will not see it until something else goes on.

The trick here is to abuse POSIX thread semantics and shove all signals into one
specific systhread by blocking them in the other threads.

Additionally, I've fixed a bug in OCaml 5.0 where the systhreads end up starting
with all signals blocked: ocaml/ocaml#11880. This PR
works even with/without the bug.

Danger Zone
~~~~~~~~~~~
This is tricky ! If you're thinking we can do better than pipes, think again !
Unix.sigsuspend doesn't work on multithreaded, we don't have a real Unix.pause
(it's implemented on top of sigsuspend !).
The semantics for kill(2) to "self" are different than "from outside" when
multithreaded, and the runtime doesn't expose pthread_kill(2). That's why on the
test we have to fork and signal back to the parent. I know, it's horrible.

Co-authored-by: Thomas Leonard <[email protected]>
talex5 added a commit that referenced this issue Jan 24, 2023
@talex5
Copy link
Collaborator Author

talex5 commented Feb 1, 2023

Fixed in #412.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants