libsc crashes Julia when running the GC and using multiple threads #35

Open
ranocha opened this issue Oct 5, 2023 · 1 comment
Labels
bug Something isn't working


@ranocha
Collaborator

ranocha commented Oct 5, 2023

Here is an MWE:

```julia
# Debug segfault with multithreading when loading t8code

using MPI
using T8code

if !MPI.Initialized()
  MPI.Init()
end

# Signature: sc_init(mpicomm, catch_signals, print_backtrace, log_handler, log_threshold)

# segfaults (catch_signals = 1, print_backtrace = 1)
T8code.Libt8.sc_init(MPI.COMM_WORLD, 1, 1, C_NULL, T8code.Libt8.SC_LP_ERROR)

# segfaults (catch_signals = 1, print_backtrace = 0)
# T8code.Libt8.sc_init(MPI.COMM_WORLD, 1, 0, C_NULL, T8code.Libt8.SC_LP_ERROR)

# does not segfault (catch_signals = 0, print_backtrace = 1)
# T8code.Libt8.sc_init(MPI.COMM_WORLD, 0, 1, C_NULL, T8code.Libt8.SC_LP_ERROR)

# does not segfault (catch_signals = 0, print_backtrace = 0)
# T8code.Libt8.sc_init(MPI.COMM_WORLD, 0, 0, C_NULL, T8code.Libt8.SC_LP_ERROR)

# Allocate heavily on several threads so that the GC runs while threads are active
function allocate_as_crazy()
  n = 1_000
  k = 100
  A = [randn(k, k) for _ in 1:n]
  B = [randn(k, k) for _ in 1:n]

  Threads.@threads for i in 1:n
    A[i] = A[i] * B[i]
  end

  return sum(sum(A))
end

@show allocate_as_crazy()
```

Saving this script as debug_segfaults_t8code.jl, initializing an appropriate project with MPI and T8code, and running it with Julia v1.9.3 yields something like the following:

```console
$ julia --project=run --threads=6 debug_segfaults_t8code.jl
[libsc 0] Caught signal SEGV
[libsc 0] Caught signal SEGV
[libsc 0[libsc [libsc 0] Caught signal SEGV
] Caught signal SEGV
0] Caught signal SEGV
[libsc 0] Abort: Obtained 10 stack frames
[libsc 0] Stack 0: libsc.so.2(+0xd401) [0x7f03ac028401]
[libsc 0] Stack 1: libsc.so.2(sc_abort+0xa) [0x7f03ac0278ea]
[libsc 0] Stack 2: libsc.so.2(+0xd3cd) [0x7f03ac0283cd]
[libsc 0] Stack 3: libc.so.6(+0x3c4b0) [0x7f03c463c4b0]
[libsc 0] Stack 4: libjulia-internal.so.1(_jl_mutex_wait+0x91) [0x7f03c388f2e1]
[libsc 0] Stack 5: libjulia-internal.so.1(_jl_mutex_lock+0x30) [0x7f03c388f3a0]
[libsc 0] Stack 6: libjulia-codegen.so.1(jl_generate_fptr_impl+0x83) [0x7f03c452e393]
[libsc 0] Stack 7: libjulia-internal.so.1(jl_compile_method_internal+0xa0) [0x7f03c3842310]
[libsc 0] Stack 8: libjulia-internal.so.1(ijl_apply_generic+0x43e) [0x7f03c384311e]
[libsc 0] Stack 9: libjulia-internal.so.1(+0x645c0) [0x7f03c38645c0]
[libsc 0] Abort: Obtained 10 stack frames
```

It seems to be fine if libsc is initialized with `T8code.Libt8.sc_init(MPI.COMM_WORLD, 0, ...)`, i.e., with `catch_signals = 0`, regardless of the value of `print_backtrace`.
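A plausible reading of the backtrace, offered as an assumption rather than a confirmed diagnosis: Julia installs its own SIGSEGV handler at startup, and its multithreaded GC stops threads by protecting a safepoint page, so that a thread polling that page (for example inside `_jl_mutex_wait`) takes a SIGSEGV that Julia's handler turns into a wait for the GC. With `catch_signals = 1`, libsc installs its own SIGSEGV handler, so that expected fault would reach libsc instead, which is consistent with the `Caught signal SEGV` and `sc_abort` frames in the log above. The sketch below only checks the first part of this: it queries the SIGSEGV disposition of the running Julia process through `sigaction` (via `ccall`, with `act = NULL`, so nothing is changed). It assumes Linux or macOS on a 64-bit machine, where `SIGSEGV` is 11 and the handler pointer is the first field of `struct sigaction`; the helper name `current_segv_handler` is hypothetical.

```julia
# Minimal sketch. Assumptions: Linux or macOS, 64-bit, and the handler
# pointer is the first field of `struct sigaction` (true for glibc and macOS).
# It only reads the current SIGSEGV disposition; it installs nothing.

const SIGSEGV_NUM = 11          # SIGSEGV is 11 on Linux and macOS
const SIGACTION_BUFSIZE = 256   # larger than sizeof(struct sigaction) on both

"""
    current_segv_handler()

Hypothetical helper: return the address of the SIGSEGV handler currently
installed in this process (0 means SIG_DFL, 1 means SIG_IGN).
"""
function current_segv_handler()
    buf = zeros(UInt8, SIGACTION_BUFSIZE)
    # int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact);
    # act = NULL: only copy the current disposition into `buf`.
    ret = ccall(:sigaction, Cint, (Cint, Ptr{Cvoid}, Ptr{UInt8}),
                SIGSEGV_NUM, C_NULL, buf)
    systemerror(:sigaction, ret != 0)
    # The first machine word is the sa_handler / sa_sigaction union.
    return reinterpret(UInt, buf[1:sizeof(UInt)])[1]
end

handler = current_segv_handler()
println("SIGSEGV handler address: 0x", string(handler, base = 16))
println("A custom SIGSEGV handler is installed: ", handler > 1)
```

In a default `julia` session this should report an address other than 0 or 1, i.e., a handler that something (presumably Julia's runtime) has already installed before any libsc call. What the same query reports after `sc_init` with `catch_signals = 1` has not been checked here.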

CC @NicolasRiel

@jmark
Contributor

jmark commented Oct 6, 2023

Yes, I observed the same behavior.
