
Whisper crashes calling ggml_init() with CUDA enabled #2421

Open
CassinianSoftware opened this issue Sep 18, 2024 · 0 comments
CassinianSoftware commented Sep 18, 2024

I'm running Visual Studio 2022 (latest update) and CUDA 12.6 on a Dell T5600 with a pair of GeForce 1050 Ti GPUs (which I realize are old Pascal chips) and Windows 10 (latest update). I compiled WhisperCpp, ggml, and SDL2 without issue (as static libs) and tested using the command.cpp demo console app.

Fun app. Worked fine, but performance was sluggish. So I defined GGML_USE_CUDA and recompiled with CUDA and cuBLAS in a bid to improve performance. After a bit of trial and error getting everything to compile and link, I was able to test again with command.exe. Unfortunately, whisper.cpp now crashes at "model.ctx = ggml_init(params);" at around line 1620. Execution never reaches "if (!model.ctx)", so the "ggml_init() failed" error is never displayed.

It seems like a memory-allocation issue with the value of "n_tensors * ggml_tensor_overhead()" in "params", but I'm not sure about the value of "n_tensors" because it is built from hard-coded constants (i.e. 10 + 15 + 15 * n_audio_layer + 24 * n_text_layer). What do 10, 15, 15, and 24 represent? Allocating the right amount of memory for ggml_init() seems important, but this appears to be an odd way to calculate it.

Or, am I chasing the wrong problem? Any suggestions would be most appreciated. Thanks!

UPDATE: I've been able to confirm that ggml is crashing WhisperCpp in ggml.c at this line:
"float f = ggml_table_f32_f16[i] = GGML_COMPUTE_FP16_TO_FP32(u.fp16);"

…which is inside ggml_init(), which starts at around line 3469 in ggml.c; the offending line is at around line 3500. Not sure why this line would crash when CUDA is enabled but not when it isn't.

@CassinianSoftware CassinianSoftware changed the title Whisper crashes calling ggml_init() uising CUDA Whisper crashes calling ggml_init() using CUDA Sep 18, 2024
@CassinianSoftware CassinianSoftware changed the title Whisper crashes calling ggml_init() using CUDA Whisper crashes calling ggml_init() with CUDA enabled Sep 18, 2024