Collection of articles, papers and techniques considered in this project.
- Knowledge distillation: use GPT-4 as the teacher to generate high-quality Go samples on which to fine-tune the student model (see the sketch below).
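A minimal sketch of that distillation loop, using the OpenAI Python client. The prompt template, seed-task list, and filtering step are illustrative assumptions, not this project's actual pipeline:

```python
from openai import OpenAI

# Minimal sketch: assumes the `openai` package is installed and an
# OPENAI_API_KEY environment variable is set.
client = OpenAI()

PROMPT = (
    "Write an idiomatic, well-commented Go function that {task}. "
    "Return only Go code."
)
tasks = [  # hypothetical seed tasks; a real pipeline would use a curated corpus
    "parses an RFC 3339 timestamp",
    "merges two sorted []int slices",
]

samples = []
for task in tasks:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(task=task)}],
    )
    samples.append(resp.choices[0].message.content)

# Teacher outputs would then be filtered (e.g. compile with `go build`,
# run `go vet` and unit tests) before becoming student fine-tuning data.
```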
### Papers

Name | Description |
---|---|
Evaluating Large Language Models Trained on Code | Foundational Codex paper. Introduces the HumanEval benchmark and large language models for code (its pass@k estimator is sketched after this table) |
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X | 13B-parameter multilingual code generation model based on GPT |
NarrowBERT: Accelerating Masked Language Model Pretraining and Inference | A modified transformer encoder that more than doubles throughput for masked language model pretraining |
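For reference, the unbiased pass@k estimator the Codex paper defines for HumanEval, in the numerically stable form given in the paper:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k from the Codex paper: n samples generated per problem,
    c of them pass all unit tests, k is the sampling budget scored."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    # Stable evaluation of 1 - C(n-c, k) / C(n, k)
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```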
### Articles

Name | Description |
---|---|
How to train your own Large Language Models | General advice on training code LLMs, with particularly useful guidance on the data pipeline |
MosaicBERT: Pretraining BERT from Scratch for $20 | Optimised BERT pretraining recipe |
### Tools

Name | Description |
---|---|
Microsoft DeepSpeed | Library for efficient, fast training of LLMs; optimized Transformer layers and backprop implementation |
NVIDIA FasterTransformer | Optimized inference of Transformer models |
OpenAI Triton | Language and compiler for writing highly efficient custom deep-learning primitives on GPUs (see the kernel sketch after this table) |
Torch Compile | Compiles PyTorch models into optimised kernels (see the sketch after this table) |
PyTorch 2.0 Nightly Release | Includes Triton support when using Torch Compile |
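For the Torch Compile and PyTorch 2.0 rows, a minimal sketch (the toy model is a hypothetical stand-in): `torch.compile` captures the model and lowers it to fused kernels, and on CUDA its default Inductor backend emits Triton kernels, which is where the nightly Triton support comes in.

```python
import torch

# Hypothetical toy model, only to demonstrate the API.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 128),
)

# torch.compile captures the model graph and lowers it to fused kernels;
# with the default "inductor" backend on GPU these are Triton kernels.
compiled = torch.compile(model)

x = torch.randn(8, 128)
y = compiled(x)  # first call compiles; later calls reuse the kernels
```

And for the OpenAI Triton row, the canonical element-wise add kernel from the Triton tutorials; the kernel name and block size are illustrative:

```python
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

# Launch over a 1D grid of ceil(n / BLOCK_SIZE) programs (CUDA tensors):
# add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK_SIZE=1024)
```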