Collection of articles, papers and techniques considered in this project.
- Knowledge distillation: use GPT-4 as the teacher to generate high-quality Go samples on which to fine-tune the student model (see the sketch below).
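A minimal sketch of that distillation loop, using the OpenAI Python client. The prompt template, seed-task list, and filtering step are illustrative assumptions, not this project's actual pipeline:

```python
from openai import OpenAI

# Minimal sketch: assumes the `openai` package is installed and an
# OPENAI_API_KEY environment variable is set.
client = OpenAI()

PROMPT = (
    "Write an idiomatic, well-commented Go function that {task}. "
    "Return only Go code."
)
tasks = [  # hypothetical seed tasks; a real pipeline would use a curated corpus
    "parses an RFC 3339 timestamp",
    "merges two sorted []int slices",
]

samples = []
for task in tasks:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(task=task)}],
    )
    samples.append(resp.choices[0].message.content)

# Teacher outputs would then be filtered (e.g. compile with `go build`,
# run `go vet` and unit tests) before becoming student fine-tuning data.
```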
### Papers

Name | Description |
---|---|
Evaluating Large Language Models Trained on Code | Foundational Codex paper. Introduces the HumanEval benchmark and large language models for code (its pass@k estimator is sketched after this table) |
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X | 13B-parameter multilingual code generation model based on GPT |
NarrowBERT: Accelerating Masked Language Model Pretraining and Inference | A modified transformer encoder that more than doubles throughput for masked language model pretraining |
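For reference, the unbiased pass@k estimator the Codex paper defines for HumanEval, in the numerically stable form given in the paper:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k from the Codex paper: n samples generated per problem,
    c of them pass all unit tests, k is the sampling budget scored."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    # Stable evaluation of 1 - C(n-c, k) / C(n, k)
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```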
### Articles

Name | Description |
---|---|
How to train your own Large Language Models | General advice on training code LLMs, with particularly useful guidance on the data pipeline |
MosaicBERT: Pretraining BERT from Scratch for $20 | Optimised BERT pretraining recipe |
### Tools

Name | Description |
---|---|
Microsoft DeepSpeed | Library for efficient, fast training of LLMs; optimized Transformer layers and backprop implementation |
NVIDIA FasterTransformer | Optimized inference of Transformer models |
OpenAI Triton | Language and compiler for writing highly efficient custom deep-learning primitives on GPUs (see the kernel sketch after this table) |
Torch Compile | Compiles PyTorch models into optimised kernels (see the sketch after this table) |
PyTorch 2.0 Nightly Release | Includes Triton support when using Torch Compile |
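For the Torch Compile and PyTorch 2.0 rows, a minimal sketch (the toy model is a hypothetical stand-in): `torch.compile` captures the model and lowers it to fused kernels, and on CUDA its default Inductor backend emits Triton kernels, which is where the nightly Triton support comes in.

```python
import torch

# Hypothetical toy model, only to demonstrate the API.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 128),
)

# torch.compile captures the model graph and lowers it to fused kernels;
# with the default "inductor" backend on GPU these are Triton kernels.
compiled = torch.compile(model)

x = torch.randn(8, 128)
y = compiled(x)  # first call compiles; later calls reuse the kernels
```

And for the OpenAI Triton row, the canonical element-wise add kernel from the Triton tutorials; the kernel name and block size are illustrative:

```python
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

# Launch over a 1D grid of ceil(n / BLOCK_SIZE) programs (CUDA tensors):
# add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK_SIZE=1024)
```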