Forwards-compatible driver breaks CURAND #2496
Comments
Can you try different versions of CUDA using |
Weirdly, no it doesn't fix it. On 12.5:
julia> using CUDA
Precompiling CUDA
2 dependencies successfully precompiled in 59 seconds. 66 already precompiled.
julia> CUDA.versioninfo()
CUDA runtime 12.5, artifact installation
CUDA driver 12.6
NVIDIA driver 550.54.15
CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+550.54.15
Julia packages:
- CUDA: 5.5.0
- CUDA_Driver_jll: 0.10.0+0
- CUDA_Runtime_jll: 0.15.1+0
Toolchain:
- Julia: 1.10.5
- LLVM: 15.0.7
Preferences:
- CUDA_Runtime_jll.version: 12.5
1 device:
0: Tesla V100-SXM2-16GB (sm_70, 15.460 GiB / 16.000 GiB available)
julia> CUDA.seed!(1234)
ERROR: CURANDError: initialization of CUDA failed (code 203, CURAND_STATUS_INITIALIZATION_FAILED)
Stacktrace:
[1] throw_api_error(res::CUDA.CURAND.curandStatus)
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/libcurand.jl:14
[2] check
@ ~/.julia/packages/CUDA/G5GKI/lib/curand/libcurand.jl:28 [inlined]
[3] curandCreateGenerator
@ ~/.julia/packages/CUDA/G5GKI/lib/utils/call.jl:34 [inlined]
[4] curandCreateGenerator(typ::CUDA.CURAND.curandRngType)
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/wrappers.jl:5
[5] CUDA.CURAND.RNG(typ::CUDA.CURAND.curandRngType; stream::CuStream)
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/random.jl:13
[6] RNG (repeats 2 times)
@ ~/.julia/packages/CUDA/G5GKI/lib/curand/random.jl:12 [inlined]
[7] #101
@ ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:29 [inlined]
[8] #context!#990
@ ~/.julia/packages/CUDA/G5GKI/lib/cudadrv/state.jl:168 [inlined]
[9] context!
@ ~/.julia/packages/CUDA/G5GKI/lib/cudadrv/state.jl:163 [inlined]
[10] handle_ctor(ctx::CuContext)
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:28
[11] pop!(cache::CUDA.APIUtils.HandleCache{CuContext, CUDA.CURAND.RNG}, key::CuContext)
@ CUDA.APIUtils ~/.julia/packages/CUDA/G5GKI/lib/utils/cache.jl:44
[12] (::CUDA.CURAND.var"#new_state#109")(cuda::@NamedTuple{device::CuDevice, context::CuContext, stream::CuStream, math_mode::CUDA.MathMode, math_precision::Symbol})
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:51
[13] #107
@ ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:61 [inlined]
[14] get!(default::CUDA.CURAND.var"#107#111"{CUDA.CURAND.var"#new_state#109", @NamedTuple{…}}, h::Dict{CuContext, @NamedTuple{…}}, key::CuContext)
@ Base ./dict.jl:479
[15] default_rng()
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:60
[16] curand_rng
@ ~/.julia/packages/CUDA/G5GKI/src/random.jl:282 [inlined]
[17] seed!(seed::Int64)
@ CUDA ~/.julia/packages/CUDA/G5GKI/src/random.jl:286
[18] top-level scope
@ REPL[7]:1
Some type information was truncated. Use `show(err)` to see complete types.
And on 12.4:
julia> using CUDA
Downloaded artifact: CUDA_Runtime
Precompiling CUDA
2 dependencies successfully precompiled in 60 seconds. 66 already precompiled.
julia> CUDA.versioninfo()
CUDA runtime 12.4, artifact installation
CUDA driver 12.6
NVIDIA driver 550.54.15
CUDA libraries:
- CUBLAS: 12.4.5
- CURAND: 10.3.5
- CUFFT: 11.2.1
- CUSOLVER: 11.6.1
- CUSPARSE: 12.3.1
- CUPTI: 2024.1.1 (API 22.0.0)
- NVML: 12.0.0+550.54.15
Julia packages:
- CUDA: 5.5.0
- CUDA_Driver_jll: 0.10.0+0
- CUDA_Runtime_jll: 0.15.1+0
Toolchain:
- Julia: 1.10.5
- LLVM: 15.0.7
Preferences:
- CUDA_Runtime_jll.version: 12.4
1 device:
0: Tesla V100-SXM2-16GB (sm_70, 15.460 GiB / 16.000 GiB available)
julia> CUDA.seed!(1234)
ERROR: CURANDError: initialization of CUDA failed (code 203, CURAND_STATUS_INITIALIZATION_FAILED)
Stacktrace:
[1] throw_api_error(res::CUDA.CURAND.curandStatus)
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/libcurand.jl:14
[2] check
@ ~/.julia/packages/CUDA/G5GKI/lib/curand/libcurand.jl:28 [inlined]
[3] curandCreateGenerator
@ ~/.julia/packages/CUDA/G5GKI/lib/utils/call.jl:34 [inlined]
[4] curandCreateGenerator(typ::CUDA.CURAND.curandRngType)
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/wrappers.jl:5
[5] CUDA.CURAND.RNG(typ::CUDA.CURAND.curandRngType; stream::CuStream)
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/random.jl:13
[6] RNG (repeats 2 times)
@ ~/.julia/packages/CUDA/G5GKI/lib/curand/random.jl:12 [inlined]
[7] #101
@ ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:29 [inlined]
[8] #context!#990
@ ~/.julia/packages/CUDA/G5GKI/lib/cudadrv/state.jl:168 [inlined]
[9] context!
@ ~/.julia/packages/CUDA/G5GKI/lib/cudadrv/state.jl:163 [inlined]
[10] handle_ctor(ctx::CuContext)
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:28
[11] pop!(cache::CUDA.APIUtils.HandleCache{CuContext, CUDA.CURAND.RNG}, key::CuContext)
@ CUDA.APIUtils ~/.julia/packages/CUDA/G5GKI/lib/utils/cache.jl:44
[12] (::CUDA.CURAND.var"#new_state#109")(cuda::@NamedTuple{device::CuDevice, context::CuContext, stream::CuStream, math_mode::CUDA.MathMode, math_precision::Symbol})
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:51
[13] #107
@ ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:61 [inlined]
[14] get!(default::CUDA.CURAND.var"#107#111"{CUDA.CURAND.var"#new_state#109", @NamedTuple{…}}, h::Dict{CuContext, @NamedTuple{…}}, key::CuContext)
@ Base ./dict.jl:479
[15] default_rng()
@ CUDA.CURAND ~/.julia/packages/CUDA/G5GKI/lib/curand/CURAND.jl:60
[16] curand_rng
@ ~/.julia/packages/CUDA/G5GKI/src/random.jl:282 [inlined]
[17] seed!(seed::Int64)
@ CUDA ~/.julia/packages/CUDA/G5GKI/src/random.jl:286
[18] top-level scope
@ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types. |
Here it is from the version where I'm not on master, but on the release branch:
julia> using CUDA
julia> CUDA.versioninfo()
CUDA runtime 12.5, artifact installation
CUDA driver 12.6
NVIDIA driver 550.54.15, originally for CUDA 12.4
CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+550.54.15
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.2+0
- CUDA_Runtime_jll: 0.14.1+0
Toolchain:
- Julia: 1.10.5
- LLVM: 15.0.7
1 device:
0: Tesla V100-SXM2-16GB (sm_70, 15.460 GiB / 16.000 GiB available)
julia> CUDA.seed!(1234) |
Interesting. Do you have the time to bisect which commit introduced this? |
I went through and bisected it, and it looks like the error was introduced in 763164d.
Test Script
@info "Updating Packages"
using Pkg; Pkg.update()
@info "Loading/Precompiling"
@time using CUDA
@info "Beginning Test"
CUDA.seed!(1234)
Bisection Log
|
Can you run with
And FWIW, as a workaround, you can disable use of the forwards-compatible driver library by setting a preference (see |
Did you mean
Thanks for the suggestion of the workaround. Do you mean a section like what you have in the LocalPreferences.toml file but with
I.e.,
[CUDA_Driver_jll]
# whether to attempt to load a forwards-compatible userspace driver.
# only turn this off if you experience issues, e.g., when using a local
# toolkit that's much older than the available forwards compatible driver.
compat = "true"
Without LD_DEBUG
With LD_DEBUG
|
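For readers landing here, the workaround discussed above might look like the following `LocalPreferences.toml` fragment. This is only a sketch: it assumes the `compat` preference of `CUDA_Driver_jll` accepts the string `"false"` to skip loading the forwards-compatible driver, mirroring the section quoted in this thread. Check the LocalPreferences.toml shipped with your CUDA.jl version before relying on it.

```toml
# LocalPreferences.toml, placed next to the active Project.toml
[CUDA_Driver_jll]
# Assumption: "false" disables the forwards-compatible userspace driver,
# so the driver installed on the system is used instead.
compat = "false"
```

Preference changes generally take effect only after restarting Julia, so start a fresh session before calling `CUDA.seed!(1234)` again.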
Should be fixed once JuliaRegistries/General#115418 is merged (and you upgrade |
Amazing, it works. Thank you so much! I really appreciate how quick and thorough you've been in responding to these issues over the last few days |
Describe the bug
On the master branch, attempting to set the seed causes an error. I suspect this may be hardware-specific, because I only hit this error on a server I'm trying to run some code on (and not on my Ubuntu desktop). On this machine, I do not hit this issue if I use the most recent release, v5.4.3.
To reproduce
The Minimal Working Example (MWE) for this bug:
Manifest.toml
Expected behavior
For the seed to be set.
Version info
Details on Julia:
Details on CUDA:
Additional context