2D and 3D tile divisions so that permutation coordinates can be read from threadIdx and blockIdx #406

ChrisDryden · 2024-05-13T18:12:59Z

Supposedly the permutation kernels, even though they are mostly memory bound, could cut down on integer division through thread coarsening, and by launching a 2D or 3D grid they could avoid doing any division in the kernel itself.
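A rough sketch of the difference, assuming the attention-style permute where a fused QKV buffer laid out as (B, N, 3, NH, d) is rearranged into Q laid out as (B, NH, N, d). So that the index math can be checked without a GPU, the C++ below emulates on the host what each CUDA thread would compute under a 1D grid versus a 3D grid. `Dim3`, the function names and the launch loops are hypothetical stand-ins, not code from llm.c.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host stand-in for CUDA's dim3, used only to emulate a launch.
struct Dim3 { int x, y, z; };

// 1D grid: one flat index per thread, coordinates recovered with / and %.
// Copies the q slice of inp (B, N, 3, NH, d) into out (B, NH, N, d).
void permute_thread_1d(int idx, const std::vector<float>& inp, std::vector<float>& out,
                       int B, int N, int NH, int d) {
    if (idx >= B * NH * N * d) return;  // surplus threads in the last block
    int b = idx / (NH * N * d);
    int rest = idx % (NH * N * d);
    int nh = rest / (N * d);
    rest = rest % (N * d);
    int n = rest / d;
    int dd = rest % d;
    // the "0 *" term selects the q slice of the fused QKV dimension
    int inp_idx = b * N * 3 * NH * d + n * 3 * NH * d + 0 * NH * d + nh * d + dd;
    out[idx] = inp[inp_idx];
}

// 3D grid: blockIdx.z = b, blockIdx.y = nh, blockIdx.x = n, threadIdx.x = dd.
// Every coordinate is read directly, with no division or modulo at all.
void permute_thread_3d(Dim3 blockIdx, Dim3 threadIdx, const std::vector<float>& inp,
                       std::vector<float>& out, int N, int NH, int d) {
    int b = blockIdx.z, nh = blockIdx.y, n = blockIdx.x, dd = threadIdx.x;
    int inp_idx = b * N * 3 * NH * d + n * 3 * NH * d + 0 * NH * d + nh * d + dd;
    int out_idx = ((b * NH + nh) * N + n) * d + dd;
    out[out_idx] = inp[inp_idx];
}

// Emulates both launches on the host and reports whether they produce the same output.
bool permute_layouts_match(int B, int N, int NH, int d) {
    assert(d > 0 && d <= 1024);  // d becomes blockDim.x, capped at 1024 threads per block
    std::vector<float> inp(B * N * 3 * NH * d);
    for (std::size_t i = 0; i < inp.size(); ++i) inp[i] = static_cast<float>(i);
    std::vector<float> out1(B * NH * N * d), out3(B * NH * N * d);

    // 1D launch: grid = ceil(total / block), block = 256
    int block = 256, total = B * NH * N * d;
    int grid = (total + block - 1) / block;
    for (int i = 0; i < grid * block; ++i) permute_thread_1d(i, inp, out1, B, N, NH, d);

    // 3D launch: grid = dim3(N, NH, B), block = dim3(d)
    for (int bz = 0; bz < B; ++bz)
        for (int by = 0; by < NH; ++by)
            for (int bx = 0; bx < N; ++bx)
                for (int tx = 0; tx < d; ++tx)
                    permute_thread_3d({bx, by, bz}, {tx, 0, 0}, inp, out3, N, NH, d);
    return out1 == out3;
}
```

With a launch of grid (N, NH, B) and block (d), every coordinate comes straight from blockIdx/threadIdx. This mapping relies on d fitting in one block (at most 1024 threads) and on NH and B staying within the 65535 limit for gridDim.y and gridDim.z.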

Looking into this on the advice of @ngc92:

integer divisions are really expensive, but I don't think they will matter much in a kernel as memory-bound as this. I guess the first thing to do would be some thread coarsening, so that the divisions are amortized, and possibly a 2D or 3D grid, so that you don't even have to do the divisions at all, and can just read off individual coordinates from threadIdx and blockIdx.
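The thread-coarsening half of the suggestion can be sketched the same way, assuming the same (B, N, 3, NH, d) to (B, NH, N, d) permute and again emulating the threads on the host. Here each thread owns a whole (b, nh, n) row, so the divisions run once per row and are amortized over the d elements it copies. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coarsened thread: one thread per (b, nh, n) row instead of one per element.
// The / and % run once per row and are amortized over the d elements copied.
void permute_thread_coarsened(int row, const std::vector<float>& inp, std::vector<float>& out,
                              int B, int N, int NH, int d) {
    if (row >= B * NH * N) return;  // surplus threads in the last block do nothing
    int b = row / (NH * N);
    int nh = (row / N) % NH;
    int n = row % N;
    const float* src = &inp[b * N * 3 * NH * d + n * 3 * NH * d + nh * d];  // q slice
    float* dst = &out[row * d];  // row == (b * NH + nh) * N + n
    for (int dd = 0; dd < d; ++dd) dst[dd] = src[dd];  // no further divisions
}

// Plain nested-loop reference computed on the host.
std::vector<float> permute_reference(const std::vector<float>& inp, int B, int N, int NH, int d) {
    std::vector<float> out(B * NH * N * d);
    for (int b = 0; b < B; ++b)
        for (int nh = 0; nh < NH; ++nh)
            for (int n = 0; n < N; ++n)
                for (int dd = 0; dd < d; ++dd)
                    out[((b * NH + nh) * N + n) * d + dd] =
                        inp[b * N * 3 * NH * d + n * 3 * NH * d + nh * d + dd];
    return out;
}

// Emulates a 1D launch with one thread per row and compares against the reference.
bool coarsened_matches_reference(int B, int N, int NH, int d) {
    assert(B > 0 && N > 0 && NH > 0 && d > 0);
    std::vector<float> inp(B * N * 3 * NH * d);
    for (std::size_t i = 0; i < inp.size(); ++i) inp[i] = static_cast<float>(i);
    std::vector<float> out(B * NH * N * d);
    int rows = B * NH * N, block = 128;
    int grid = (rows + block - 1) / block;
    for (int t = 0; t < grid * block; ++t) permute_thread_coarsened(t, inp, out, B, N, NH, d);
    return out == permute_reference(inp, B, N, NH, d);
}
```

One caveat with this particular mapping: neighbouring threads read and write addresses d floats apart, so on a real GPU the accesses would not coalesce well. A coarsening scheme that keeps consecutive threads on consecutive addresses (for example letting a warp share each row, with lane i copying elements i, i + 32, ...) would probably suit a memory-bound kernel better.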

Creating this issue to track progress.


ChrisDryden · 2024-05-13T18:17:28Z

This came up in a discussion about passing every constant that can be computed ahead of time directly into the kernel as an argument, such as the values here: https://github.com/karpathy/llm.c/blob/master/train_gpt2.cu#L689

This wouldn't necessarily add more lines of code; it would just reorganize where the calculations are done. In theory it should speed things up, since each constant would be computed once instead of once per kernel thread, reducing that work by a factor of the number of threads launched.
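A minimal sketch of the idea, assuming the same (B, N, 3, NH, d) to (B, NH, N, d) permute as above: the stride products are computed once on the host into a small struct that would be handed to the kernel by value, and each thread only does the remaining index math. `PermuteStrides`, `make_strides` and `permute_src_index` are hypothetical names, and this does not reproduce the specific values at the linked line.

```cpp
#include <cassert>

// Hypothetical bundle of constants computed once on the host and passed to the
// kernel by value, so threads do not each recompute the same products.
struct PermuteStrides {
    int out_b, out_nh;  // strides of b and nh in out (B, NH, N, d): NH*N*d, N*d
    int in_b, in_n;     // strides of b and n in inp (B, N, 3, NH, d): N*3*NH*d, 3*NH*d
    int d;
};

// Host side: compute the strides once per launch.
PermuteStrides make_strides(int N, int NH, int d) {
    assert(N > 0 && NH > 0 && d > 0);
    return {NH * N * d, N * d, N * 3 * NH * d, 3 * NH * d, d};
}

// Per-thread work: map a flat output index to the q-slice input index
// using only the precomputed values.
int permute_src_index(int idx, const PermuteStrides& s) {
    int b = idx / s.out_b;
    int rest = idx % s.out_b;
    int nh = rest / s.out_nh;
    rest %= s.out_nh;
    int n = rest / s.d;
    int dd = rest % s.d;
    return b * s.in_b + n * s.in_n + nh * s.d + dd;
}
```

The saving is a handful of integer multiplies per thread, which is likely small next to the memory traffic of a kernel this memory bound, so it would need to be measured.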

Karliz24 · 2024-05-23T15:38:45Z

👍🏻

ChrisDryden · 2024-05-25T03:46:26Z

I created an example implementation in #459, but it doesn't seem to be working properly.
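For reference, here is a hypothetical sketch of what a 3D-grid encoder forward could look like; it is not the code in #459. It assumes the usual GPT-2 encoder, out[b, t, c] = wte[inp[b, t], c] + wpe[t, c], with blockIdx.z = b, blockIdx.y = t and the channel index split across blockIdx.x and threadIdx.x. As above, the threads are emulated in C++ on the host with a hypothetical `Dim3` so the indexing can be checked without a GPU.

```cpp
#include <cassert>
#include <vector>

struct Dim3 { int x, y, z; };  // hypothetical host stand-in for CUDA's dim3

// One emulated thread of a 3D-grid encoder forward, launched with
// grid (ceil(C / blockDim.x), T, B): b and t are read directly, no division.
void encoder_thread_3d(Dim3 blockIdx, Dim3 threadIdx, Dim3 blockDim,
                       std::vector<float>& out, const std::vector<int>& inp,
                       const std::vector<float>& wte, const std::vector<float>& wpe,
                       int T, int C) {
    int b = blockIdx.z, t = blockIdx.y;
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= C) return;  // the last block along x may be only partially used
    int ix = inp[b * T + t];  // token id at (b, t)
    out[(b * T + t) * C + c] = wte[ix * C + c] + wpe[t * C + c];
}

// Runs the emulated launch over the whole grid and returns out (B, T, C).
std::vector<float> encoder_forward_3d(const std::vector<int>& inp, const std::vector<float>& wte,
                                      const std::vector<float>& wpe, int B, int T, int C,
                                      int block_size) {
    assert(block_size > 0 && block_size <= 1024);
    std::vector<float> out(B * T * C);
    Dim3 blockDim{block_size, 1, 1};
    int grid_x = (C + block_size - 1) / block_size;
    for (int bz = 0; bz < B; ++bz)
        for (int by = 0; by < T; ++by)
            for (int bx = 0; bx < grid_x; ++bx)
                for (int tx = 0; tx < block_size; ++tx)
                    encoder_thread_3d({bx, by, bz}, {tx, 0, 0}, blockDim, out, inp, wte, wpe, T, C);
    return out;
}
```

The bounds check on c matters because C is not necessarily a multiple of the block size, so the last block along x can contain idle threads.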

ChrisDryden mentioned this issue May 25, 2024

Added new cuda kernel for encoder forwards using three dimensional kernels #459 (Open)
