-
I'd recommend starting with our GTC videos "What's New in Transformer Engine and FP8 Training | GTC 2024" and "FP8 Training with Transformer Engine | GTC 2023". You can also run our FP8 examples on any Hopper or Ada GPU. Once you're ready to contribute, please refer to the CONTRIBUTING.rst guide.
-
Transformer Engine is a library for accelerating Transformer models on NVIDIA GPUs. As you mention, "Attention Is All You Need" is a good introductory paper on this model architecture, and it may be helpful to study variations like BERT or GPT.

Our biggest feature is probably support for FP8 training, including tuned CUDA kernels, integrations with cuBLAS and cuDNN, and automatic handling of FP8 scaling factors in multiple DL frameworks. We use the FP8 E4M3 and FP8 E5M2 formats. It may be useful to look at the Hopper white paper and read up on FP16 mixed-precision training. I have also seen some interesting work on FP8 training by the MS-AMP team.

TE also integrates with the large-scale training techniques used in Megatron-LM and NeMo. Important topics include data parallelism, tensor parallelism, ZeRO/FSDP, and pipeline parallelism.

Finally, the best basic overviews are probably the TE docs and @ptrendx's 2023 GTC talk.
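To make the FP8 part concrete, here is a minimal training-step sketch using TE's PyTorch API, modeled on the pattern in the TE docs (the layer sizes here are arbitrary, the `DelayedScaling` arguments may vary between TE releases, and running it in FP8 requires a Hopper or Ada GPU):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID format: E4M3 for forward-pass tensors, E5M2 for gradients in the backward pass.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# te.Linear is a drop-in replacement for torch.nn.Linear with FP8 support.
model = te.Linear(768, 3072, bias=True).cuda()
inp = torch.randn(64, 768, device="cuda")

# Inside the context, GEMMs run in FP8; TE tracks per-tensor amax history
# and updates the FP8 scaling factors automatically.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()  # gradients flow through the FP8 ops as usual
```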
-
Hi, I am looking to contribute to this project. I am just getting started, and I am looking for papers or books that explain the logic and concepts behind this open source project. Could you share anything and everything I should look into before I jump in?
I know that "Attention Is All You Need" is a good start.
Thank you in advance.