Would it be worth defining the same kernels that exist in the CUDA backend with ThunderKittens as well? They have cool examples such as FlashAttention2, and I think it would also be valuable as an educational resource. Thoughts?
Yes! I'm planning to look into using ThunderKittens once I've got more time (probably 2nd week of June). I'm not sure there's much point using it for kernels that don't use the tensor cores, though. But it might allow fusing even more things together (e.g. matmul and the fused classifier, maybe).
My plan was to mostly focus on making a hyper-optimised path for H100 using TMA though... But we'll see what happens :)
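For concreteness, here's a rough plain-CUDA sketch (not ThunderKittens code) of the sort of kernel I mean: an elementwise GELU forward pass, along the lines of what the CUDA backend already has. It's memory-bound with no matmul for the tensor cores to accelerate; the names and the launcher below are illustrative only.

```cuda
// Rough sketch, plain CUDA (not ThunderKittens): an elementwise GELU-forward
// kernel, the kind of memory-bound kernel that has no matmul for the tensor
// cores to accelerate. Names and the launcher are illustrative only.
#include <cuda_runtime.h>
#include <math.h>

__global__ void gelu_forward_kernel(float* out, const float* inp, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        float x = inp[i];
        float cube = 0.044715f * x * x * x;
        // tanh approximation of GELU, sqrt(2/pi) ~= 0.7978845608
        out[i] = 0.5f * x * (1.0f + tanhf(0.7978845608f * (x + cube)));
    }
}

// illustrative launcher: one thread per element
void gelu_forward(float* out, const float* inp, int N, cudaStream_t stream) {
    const int block_size = 256;
    const int grid_size = (N + block_size - 1) / block_size;
    gelu_forward_kernel<<<grid_size, block_size, 0, stream>>>(out, inp, N);
}
```

By contrast, the matmul-heavy paths (attention, the classifier) are where ThunderKittens' tile abstractions and tensor-core MMA support would actually pay off.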