Loading Flux 0.13.15 for the first time results in error #2232

Open
mkschleg opened this issue Apr 20, 2023 · 7 comments

Comments

@mkschleg
Contributor

With Flux v0.13.15 installed in a cluster environment, loading the package with using Flux fails the first time with the following error:

julia> using Flux
┌ Error: No CUDA-capable device found
└ @ CUDA ~/.julia/packages/CUDA/s0e3j/src/initialization.jl:99
ERROR: InitError: could not load library "/home/mkschleg/.julia/artifacts/5588d506bba72642650ee246d9f26714ac244dc6/lib/libcudnn_cnn_infer.so"
/home/mkschleg/.julia/artifacts/5588d506bba72642650ee246d9f26714ac244dc6/lib/libcudnn_cnn_infer.so: failed to map segment from shared object
Stacktrace:
  [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
    @ Base.Libc.Libdl ./libdl.jl:117
  [2] dlopen(s::String, flags::UInt32)
    @ Base.Libc.Libdl ./libdl.jl:116
  [3] macro expansion
    @ ~/.julia/packages/JLLWrappers/QpMQW/src/products/library_generators.jl:54 [inlined]
  [4] __init__()
    @ CUDNN_jll ~/.julia/packages/CUDNN_jll/npufe/src/wrappers/x86_64-linux-gnu-cuda+11.0.jl:33
  [5] _include_from_serialized(pkg::Base.PkgId, path::String, depmods::Vector{Any})
    @ Base ./loading.jl:831
  [6] _tryrequire_from_serialized(modkey::Base.PkgId, path::String, sourcepath::String, depmods::Vector{Any})
    @ Base ./loading.jl:938
  [7] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt64)
    @ Base ./loading.jl:1028
  [8] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1315
  [9] _require_prelocked(uuidkey::Base.PkgId)
    @ Base ./loading.jl:1200
 [10] macro expansion
    @ ./loading.jl:1180 [inlined]
 [11] macro expansion
    @ ./lock.jl:223 [inlined]
 [12] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1144
during initialization of module CUDNN_jll

This happens on a fresh installation of Flux in a new project. Subsequent calls to using Flux in the same Julia session load the package correctly, but after restarting Julia the same error appears again.

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65e* (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
  Threads: 1 on 64 virtual cores
Environment:
  JULIA_DEPOT_PATH = ::
  LD_LIBRARY_PATH = :/home/mkschleg/.mujoco/mujoco210/bin:/usr/lib/nvidia:/home/mkschleg/.mujoco/mujoco210/bin:/usr/lib/nvidia
@mkschleg
Contributor Author

mkschleg commented Apr 20, 2023

For reference, the package versions for Flux and CUDA are:

...
  [052768ef] CUDA v4.1.4
...
  [587475ba] Flux v0.13.15
...

@ToucheSir
Member

Can you replicate this with just the cuDNN package? If so, worth an issue on the CUDA.jl side.

@mkschleg
Contributor Author

mkschleg commented May 3, 2023

Sorry for the delay. The error does reproduce when I load only cuDNN, but not when I load only CUDA.

@mkschleg
Contributor Author

mkschleg commented May 3, 2023

However, now that I have CUDA, cuDNN, and Flux installed, I only get the error when loading Flux for the first time, not when loading cuDNN.

@mkschleg
Contributor Author

mkschleg commented May 3, 2023

If I load cuDNN first and then Flux, the issue doesn't occur:

using cuDNN
using Flux

I'm trying to figure out which version introduced this.

@mkschleg
Contributor Author

mkschleg commented May 3, 2023

The first version where this happens is 0.13.14, which is when Flux upgraded from CUDA v3 to CUDA v4. The workaround above (loading cuDNN before Flux) still works for now, but something odd is going on. If I remove cuDNN (which only removes it from the Project.toml), close Julia, open a fresh Julia session, then add cuDNN again and try to load it, I get the InitError. I'm not sure what is causing this, but I'll open an issue on CUDA.jl and link it here.

@CarloLucibello
Member

Is this issue still relevant now that CUDA support is a package extension?
