@DanFu09 Thanks for open-sourcing the code!
I see that in your previous fly repo (https://github.com/HazyResearch/fly) you used cast_inputs=torch.float16 for BlockdiagButterflyMultiply, but it was changed to bf16 here. Is there a specific reason for the switch (e.g. fp16 training not converging due to range issues)?
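For context, here's a minimal sketch of the pattern I'm asking about (not the repo's actual code; the shapes, the backward, and the class name are made up for illustration). The `custom_fwd(cast_inputs=...)` decorator is what casts the inputs under autocast before the two bmms run, so switching the argument from `torch.float16` to `torch.bfloat16` changes the precision of both matmuls:

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

# Illustrative sketch only, not the repo's BlockdiagButterflyMultiply.
class TwoBmm(torch.autograd.Function):
    @staticmethod
    @custom_fwd(cast_inputs=torch.bfloat16)  # fly used torch.float16 here
    def forward(ctx, x, w1, w2):
        out1 = torch.bmm(x, w1)      # first batched matmul
        out2 = torch.bmm(out1, w2)   # second batched matmul
        ctx.save_for_backward(x, w1, w2, out1)
        return out2

    @staticmethod
    @custom_bwd
    def backward(ctx, dout):
        x, w1, w2, out1 = ctx.saved_tensors
        dout1 = torch.bmm(dout, w2.transpose(-1, -2))
        dw2 = torch.bmm(out1.transpose(-1, -2), dout)
        dx = torch.bmm(dout1, w1.transpose(-1, -2))
        dw1 = torch.bmm(x.transpose(-1, -2), dout1)
        return dx, dw1, dw2
```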
Also, are there opportunities to fuse the two bmm operations into a single kernel? It seems hard to pin down exactly which kernel torch dispatches to, though.
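On the fusion question, one way to at least see which kernels actually launch is to profile the two-bmm sequence, optionally under torch.compile (just a sketch with made-up shapes, not a claim about this repo). My understanding is that each bmm still dispatches to a cuBLAS/cuBLASLt kernel, so torch.compile mainly fuses the elementwise/reshape ops around the matmuls rather than merging the two matmuls themselves:

```python
import torch

# Sketch: profile the two-bmm sequence to see the launched CUDA kernels.
@torch.compile
def two_bmm(x, w1, w2):
    return torch.bmm(torch.bmm(x, w1), w2)

x = torch.randn(8, 64, 32, device="cuda", dtype=torch.bfloat16)
w1 = torch.randn(8, 32, 48, device="cuda", dtype=torch.bfloat16)
w2 = torch.randn(8, 48, 16, device="cuda", dtype=torch.bfloat16)

two_bmm(x, w1, w2)  # warm-up call so compilation is excluded from the profile

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CUDA]
) as prof:
    two_bmm(x, w1, w2)
print(prof.key_averages().table(sort_by="cuda_time_total"))
```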
Edenzzzz changed the title from "torch.bmm kernel fusion and precision casting issue in BlockdiagButterflyMultiply" to "torch.bmm kernel fusion" on May 13, 2024.