Implements basic functionality of ML frameworks using NumPy. The goal of this project is to build an understanding of the basic building blocks of modeling and training, and the math behind them. Nothing practical beyond that.
Supports forward and backward passes. Tested against Jax/Flax.
- layers
  - `Dense`: Fully connected layer (see the sketch after this list).
  - `Conv2D`: 2D convolutional layer.
  - `MultiHeadAttention`: Attention mechanism. Found in transformer encoder/decoder blocks.
  - `TransformerEncoder`: Transformer encoder block. Found in encoder-only architectures (BERT) and encoder-decoder architectures (BART, T5).
  - `TransformerDecoder`: Transformer decoder block. Found in decoder-only architectures (GPT, PaLM, LLaMA) and encoder-decoder architectures (BART, T5).
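As a flavor of what these layers involve, here is a minimal NumPy sketch of a fully connected layer with a cached forward pass and a manual backward pass. The names (`Dense`, `forward`, `backward`) mirror the list above, but the exact signatures in this repo may differ.

```python
import numpy as np

class Dense:
    """Sketch of a fully connected layer: y = x @ W + b."""

    def __init__(self, in_features, out_features, rng=np.random.default_rng(0)):
        # Small random weights and a zero bias.
        self.w = rng.normal(0.0, 0.02, size=(in_features, out_features))
        self.b = np.zeros(out_features)

    def forward(self, x):
        # x: (batch, in_features) -> (batch, out_features)
        self.x = x                      # cache the input for the backward pass
        return x @ self.w + self.b

    def backward(self, grad_out):
        # grad_out: gradient of the loss w.r.t. this layer's output.
        self.dw = self.x.T @ grad_out   # dL/dW
        self.db = grad_out.sum(axis=0)  # dL/db
        return grad_out @ self.w.T      # dL/dx, passed to the previous layer
```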
- activations
  - `ReLU`: ReLU activation. Basic and popular non-linear activation.
  - `Softmax`: Softmax activation. Normalizes the output into a probability distribution.
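A minimal sketch of both activations, assuming the same forward/backward convention as the `Dense` sketch above; softmax is written as a standalone function here for brevity.

```python
import numpy as np

class ReLU:
    def forward(self, x):
        self.mask = x > 0            # remember which units were active
        return x * self.mask

    def backward(self, grad_out):
        return grad_out * self.mask  # gradient flows only through active units

def softmax(logits, axis=-1):
    # Subtract the per-row max before exponentiating for numerical stability.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)
```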
- normalizations
  - `Dropout`: Dropout. Mitigates overfitting by preventing units from co-adapting.
  - `LayerNormalization`: Layer normalization. Normalizes each individual sample in a batch. Common in autoregressive NLP tasks.
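A forward-only sketch of both, written as standalone functions; the classes in this repo presumably also carry the backward pass and learnable `gamma`/`beta` parameters.

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=np.random.default_rng(0)):
    # During training, zero units with probability `rate` and rescale the rest
    # (inverted dropout) so the expected activation stays unchanged.
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each sample over its feature dimension, then scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```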
- `MSELoss`: Mean squared error loss for regression tasks.
- `CrossEntropyLoss`: Cross-entropy loss for classification tasks.
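Both losses reduce to a few lines of NumPy. The cross-entropy sketch below assumes integer class labels and works directly on logits, using the log-sum-exp trick for numerical stability.

```python
import numpy as np

def mse_loss(pred, target):
    # Mean squared error averaged over all elements.
    return np.mean((pred - target) ** 2)

def cross_entropy_loss(logits, labels):
    # logits: (batch, classes); labels: (batch,) integer class indices.
    z = logits - logits.max(axis=-1, keepdims=True)            # stability shift
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```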
- `SGDOptimizer`: Stochastic gradient descent.
- `AdamOptimizer`: Adam optimizer. Dynamically adjusts the learning rate for individual weights based on momentum and velocity.
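A sketch of both update rules. Adam keeps per-parameter first and second moment estimates (the momentum and velocity mentioned above) with bias correction; the hyperparameter defaults below are the conventional ones, not necessarily this repo's.

```python
import numpy as np

def sgd_update(param, grad, lr=0.01):
    # Plain stochastic gradient descent: step against the gradient.
    return param - lr * grad

class Adam:
    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None   # first moment (momentum)
        self.v = None   # second moment (velocity)
        self.t = 0      # step counter for bias correction

    def update(self, param, grad):
        if self.m is None:
            self.m = np.zeros_like(param)
            self.v = np.zeros_like(param)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias-corrected momentum
        v_hat = self.v / (1 - self.beta2 ** self.t)   # bias-corrected velocity
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```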
- `Trainer`: Naive local trainer.
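A naive training loop in this spirit might look like the sketch below, which reuses the hypothetical `Dense` sketch above; `loss_grad_fn` is assumed to return the gradient of the loss with respect to the predictions. The actual `Trainer` API here may differ.

```python
def train(layer, loss_grad_fn, data, lr=0.01, epochs=10):
    # Naive loop: forward, backward, then a plain SGD step on the layer's parameters.
    for _ in range(epochs):
        for x, y in data:
            pred = layer.forward(x)
            grad = loss_grad_fn(pred, y)   # hypothetical: returns dL/dpred
            layer.backward(grad)           # populates layer.dw and layer.db
            layer.w -= lr * layer.dw
            layer.b -= lr * layer.db
```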
- `BinaryClassificationMetrics`: Precision (focuses on predictions) and recall (focuses on ground truth).
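A sketch of the two metrics for binary 0/1 arrays: precision is measured over predicted positives, recall over actual positives.

```python
import numpy as np

def precision_recall(pred, truth):
    # pred, truth: binary arrays of 0s and 1s.
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that are found
    return precision, recall
```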