
Illegal Memory Access Error During Gradient Calculation of predefined losses on GPU RTX 4050 #2361

Closed
yolhan83 opened this issue Dec 17, 2023 · 1 comment


yolhan83 commented Dec 17, 2023

I'm experiencing a problem with gradient calculations on a GPU using Flux.jl. Below is a minimal example that demonstrates the issue:

using Flux, CUDA, cuDNN

# random data on the GPU
x = rand(Float32, 1, 1000) |> gpu
y = rand(Float32, 1, 1000) |> gpu

# small MLP on the GPU
model = Flux.Chain(
    Flux.Dense(1, 10, tanh),
    Flux.Dense(10, 10, tanh),
    Flux.Dense(10, 1),
) |> gpu

loss(model, x, y) = Flux.mse(model(x), y)

loss(model, x, y)            # forward pass works
gradient(loss, model, x, y)  # this line throws the error below

When executing the gradient calculation (last line), I encounter the following error:

ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
WARNING: Error while freeing DeviceBuffer(3.906 KiB at 0x000000020502c800): CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))

Interestingly, when I modify the loss function to the following, the error no longer occurs, and the code runs as expected:

loss(model, x, y) = norm(model(x) .- y) ./ size(y, 2)
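For completeness, norm here is LinearAlgebra.norm, so that import needs to be in scope. A minimal sketch of the same workaround (loss_norm is just an illustrative name), reusing the model, x and y from above:

using LinearAlgebra

loss_norm(model, x, y) = norm(model(x) .- y) / size(y, 2)

loss_norm(model, x, y)
gradient(loss_norm, model, x, y)  # no illegal memory access with this formulation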

Given this behavior, I suspect the issue is specific to the predefined Flux.Losses functions (I tested mse and crossentropy; both fail) or their interaction with CUDA.jl during gradient calculation.

I've ensured that all libraries (Flux.jl, CUDA.jl, cuDNN) and drivers are up to date. The error persists despite various attempts to debug and isolate the issue.

Any insights or suggestions on how to address this problem would be greatly appreciated.

Thank you for your assistance.
Julia 1.9.4
[052768ef] CUDA v5.1.1
[587475ba] Flux v0.14.7
[02a925ec] cuDNN v1.2.1
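For anyone trying to reproduce this, the same environment information can be re-checked with something like the following (the package-name list is just the subset relevant here):

using Pkg, CUDA

Pkg.status(["CUDA", "Flux", "cuDNN"])  # package versions in the active environment
CUDA.versioninfo()                     # CUDA toolkit / driver / device details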

@ToucheSir
Member

mse uses mean, so this is a duplicate of FluxML/Zygote.jl#1473. As I mentioned on Slack, please give the troubleshooting instructions in that issue a try to help us figure out what's going on with CUDA + Zygote.
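In the meantime, an MSE written without mean is mathematically equivalent and, if the mean adjoint really is the culprit, might sidestep the crash in the same way as your norm-based workaround (mse_manual is just an illustrative name, assuming the same model, x and y as above):

mse_manual(model, x, y) = sum(abs2, model(x) .- y) / length(y)

gradient(mse_manual, model, x, y)  # avoids Statistics.mean inside the loss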

@ToucheSir closed this as not planned (duplicate) on Dec 17, 2023