This repository is a fork of the Composer library for training Mamba models. It adds the following features:
- Custom block-wise activation checkpointing
- Custom FSDP layer wrapping for Mamba
- A warmup-stable-decay (WSD) learning-rate scheduler
- FLOPs computation for Mamba
- Custom and efficient dataloading
- Improved logging
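Of the features above, the WSD (warmup-stable-decay) schedule is simple to illustrate: the learning rate warms up linearly, holds at a constant plateau, then decays. The sketch below is a minimal, framework-free version; the function name and parameters are illustrative assumptions, not this fork's actual API.

```python
def wsd_lr(step: int,
           max_lr: float,
           warmup_steps: int,
           stable_steps: int,
           decay_steps: int,
           min_lr: float = 0.0) -> float:
    """Warmup-stable-decay schedule (hypothetical helper, not the fork's API).

    Linear warmup to max_lr, constant plateau, then linear decay to min_lr.
    """
    if step < warmup_steps:
        # Linear warmup from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak learning rate.
        return max_lr
    # Decay phase: interpolate linearly from max_lr down to min_lr.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr + (min_lr - max_lr) * progress
```

A scheduler like this is typically wrapped so the training loop can query the learning rate at each optimizer step; see the `mamba` directory for the actual integration.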
More details and instructions on how to use this codebase to train Mamba models can be found in the dedicated `mamba` directory.