-
I'd recommend starting with our GTC videos "What's New in Transformer Engine and FP8 Training | GTC 2024" and "FP8 Training with Transformer Engine | GTC 2023". You can also run our FP8 examples on any Hopper or Ada GPU. Once you're ready to contribute, please refer to the CONTRIBUTING.rst guide.
-
Transformer Engine is a library for accelerating Transformer models on NVIDIA GPUs. As you mention, "Attention Is All You Need" is a good introductory paper on this model architecture, and it may be helpful to study variations like BERT or GPT.

Our biggest feature is probably support for FP8 training, including tuned CUDA kernels, integrations with cuBLAS and cuDNN, and automatic handling of FP8 scaling factors in multiple DL frameworks. We use the FP8 E4M3 and FP8 E5M2 formats. It may be useful to look at the Hopper white paper and read up on FP16 mixed-precision training. I have also seen some interesting work on FP8 training by the MS-AMP team.

TE also integrates with the large-scale training techniques used in Megatron-LM and NeMo. Important topics include data parallelism, tensor parallelism, ZeRO/FSDP, and pipeline parallelism.

Finally, the best basic overviews are probably the TE docs and @ptrendx's 2023 GTC talk.
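To make the FP8 part concrete, here is a minimal training-step sketch using TE's PyTorch API, modeled on the pattern in the TE docs (the layer sizes here are arbitrary, the `DelayedScaling` arguments may vary between TE releases, and running it in FP8 requires a Hopper or Ada GPU):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID format: E4M3 for forward-pass tensors, E5M2 for gradients in the backward pass.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# te.Linear is a drop-in replacement for torch.nn.Linear with FP8 support.
model = te.Linear(768, 3072, bias=True).cuda()
inp = torch.randn(64, 768, device="cuda")

# Inside the context, GEMMs run in FP8; TE tracks per-tensor amax history
# and updates the FP8 scaling factors automatically.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()  # gradients flow through the FP8 ops as usual
```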
-
Hi, I am looking to contribute to this project. I am just getting started, and I am looking for papers or books that explain the logic and concepts behind this open source project. Could you share anything and everything I should look into before I jump in?
I know that "Attention Is All You Need" is a good start.
Thank you in advance.