
latest ggml version sync #20

Open · wants to merge 2 commits into master
Conversation

@dridri commented Aug 10, 2024

This is an attempt to rebase on the latest commit of ggml master branch. My primary goal behind it is to add Vulkan/OpenCL support as I only have AMD GPUs.

Tested and working well on CPU, but I don't have any Nvidia card around, so I cannot test the CUDA backend.

This needs the following PR on ggml to work: balisujohn/ggml#1

@balisujohn (Owner) commented Aug 10, 2024

It'll be maybe 10 days before I can properly review this, but it looks like really solid work, and I will make sure it gets merged.

One thing I'd like to preemptively request is a README update listing the extent of Vulkan/OpenCL support, along with build instructions if those backends require any special steps.

Vulkan/OpenCL support would be extremely welcome, but it will likely require adding new implementations of some ops to ggml.

@dridri (Author) commented Aug 10, 2024

Agreed, support for other backends should come in separate PRs to avoid polluting this one. It will also take me more time, as this is the very first time I'm dealing with compute shaders and tensors.

Tell me if I'm wrong, but porting pad_reflect_1d, unfold_1d, and conv_transpose_1d should be enough.
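For anyone picking up the port, it may help to pin down the exact semantics the new shaders must reproduce. Here is a minimal Python reference for two of those ops, assuming pad_reflect_1d follows the usual reflection-padding convention (edge element not repeated, as in PyTorch's ReflectionPad1d) and unfold_1d behaves like a sliding window (analogous to torch.Tensor.unfold) — treat these as sketches of the expected behavior, not ggml's authoritative definitions:

```python
def pad_reflect_1d(x, p0, p1):
    """Reflect-pad a 1-D sequence with p0 elements on the left and p1 on the right.
    The edge element is not repeated: [a, b, c, d] with p0=p1=2 -> [c, b, a, b, c, d, c, b]."""
    n = len(x)
    left = [x[p0 - i] for i in range(p0)]       # x[p0], x[p0-1], ..., x[1]
    right = [x[n - 2 - i] for i in range(p1)]   # x[n-2], x[n-3], ...
    return left + list(x) + right

def unfold_1d(x, size, step):
    """Extract every contiguous window of length `size`, advancing by `step`."""
    return [list(x[i:i + size]) for i in range(0, len(x) - size + 1, step)]
```

For example, pad_reflect_1d([1, 2, 3, 4], 2, 2) gives [3, 2, 1, 2, 3, 4, 3, 2] — a handy ground truth to check a Vulkan kernel against.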

@balisujohn (Owner) commented Aug 11, 2024

I expect it will be similar to the ops needed for Metal support, listed in #14, but I haven't confirmed that there aren't additional ops missing Vulkan/OpenCL implementations.

@baileyheading (Contributor)
Vulkan can support macOS quite easily, right?

@balisujohn (Owner) left a comment

@dridri

Thanks for your hard work on this so far.

I looked at all the code and it all looks fine, barring one comment I had on CMakeLists.txt.

So I tested with an Intel CPU and an Nvidia 1070 Ti. It passes all tests on the Nvidia card but fails the autoregressive model test on CPU. Generation quality was fine on both GPU and CPU for both short and long phrases. The GPU and CPU tests compare against the same reference values, so this very likely means ggml has changed such that some GPU op is now inconsistent with the equivalent CPU op.

You can uncomment the tests near the beginning of main to run them in CPU mode. The function print_all_tensors lets you print the first and last three elements of any tensor you tag with ggml_set_name in the autoregressive graph. There are commented-out uses of each function in main.cpp to give an idea of how to use them. So we need to find the point of divergence between the old and new CPU process. Help with this would be awesome, but I will work on it when I can.

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

add_executable(tortoise main.cpp common.cpp)
option(DEBUG "Debug mode" OFF)
option(GGML_CUDA "cuda mode" OFF)
@balisujohn (Owner):

should there be a GGML_METAL option similarly declared here?
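If Metal does get the same treatment, a minimal sketch of what that could look like, mirroring the existing GGML_CUDA pattern (the option name and compile definition here are assumptions based on how ggml's own build names its Metal flag, not a confirmed part of this PR):

```cmake
option(GGML_METAL "metal mode" OFF)

if (GGML_METAL)
    # assumed to match the define ggml's Metal backend checks for
    add_compile_definitions(GGML_USE_METAL)
endif()
```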

@balisujohn (Owner) commented Aug 25, 2024

I have isolated the divergence in behavior on CPU to this op:

cur = ggml_mul_mat(
For some reason, this ggml_mul_mat op gives different outputs on the test input between the version of ggml in your commit and the version the master branch of tortoise.cpp currently points to. This could be related to a change in how ggml implements matrix multiplication. Our options are either to create different test cases for GPU and CPU, changing the CPU test cases to match the current CPU behavior, or to isolate the divergence in ggml_mul_mat behavior to vanilla ggml versions and try to get it fixed upstream. I lean somewhat towards creating separate CPU and GPU tests so we can keep development moving.
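The divergence hunt described above boils down to dumping the same named tensor from both ggml versions and diffing the values. A hypothetical Python helper sketching that comparison (print_all_tensors itself reports the first and last three elements; the function and tensor names here are illustrative, not part of tortoise.cpp):

```python
def compare_dumps(name, old_vals, new_vals, atol=1e-5):
    """Compare two flat dumps of the same tensor and report whether they diverge."""
    if len(old_vals) != len(new_vals):
        print(f"{name}: length mismatch ({len(old_vals)} vs {len(new_vals)})")
        return False
    max_diff = max((abs(a - b) for a, b in zip(old_vals, new_vals)), default=0.0)
    status = "OK" if max_diff <= atol else "DIVERGED"
    # mirror the first/last-3 style of print_all_tensors for easy eyeballing
    print(f"{name}: first={new_vals[:3]} last={new_vals[-3:]} "
          f"max_abs_diff={max_diff:.3g} [{status}]")
    return max_diff <= atol
```

Running this on each tagged tensor in graph order, the first one that reports DIVERGED marks the op where the two ggml versions part ways.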

@dridri (Author) commented Aug 25, 2024

Just to note, I'm also having a hard time porting it to Vulkan: after implementing the missing *_1d() functions, the output is total garbage (white noise plus a buzzing sound). I don't know if this could be related.

@balisujohn (Owner) commented Aug 25, 2024

Definitely the first thing I'd recommend is checking whether any of the tests pass on the Vulkan backend by leaving only one uncommented at a time; that could help isolate the divergence in the Vulkan process.
