gumbel_softmax_layer_skipping_2024


This repository contains the implementation of a trainable layer skipping mechanism using the Gumbel Softmax function. The code is tailored for Llama2.
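The core idea is that a small gate decides, per decoder layer, whether to execute the layer or to pass the hidden states through unchanged; sampling that decision with the straight-through Gumbel Softmax keeps it differentiable, so the gates can be trained jointly with the model. The following is a minimal, hypothetical PyTorch sketch of such a gate, not the actual implementation in neuralnets:

# Minimal, illustrative sketch of Gumbel Softmax layer skipping (assumptions
# throughout; see neuralnets/ for the actual Llama2 implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelSkipLayer(nn.Module):
    def __init__(self, layer: nn.Module, hidden_size: int, tau: float = 1.0):
        super().__init__()
        self.layer = layer                     # the wrapped decoder layer
        self.gate = nn.Linear(hidden_size, 2)  # logits for [skip, keep]
        self.tau = tau                         # Gumbel Softmax temperature

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pool the sequence into one vector per example and compute gate logits.
        logits = self.gate(hidden_states.mean(dim=1))
        # Straight-through Gumbel Softmax: hard one-hot decision in the forward
        # pass, soft gradients in the backward pass.
        decision = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        keep = decision[:, 1].view(-1, 1, 1)
        # Run the layer where keep == 1, pass the input through where keep == 0.
        return keep * self.layer(hidden_states) + (1.0 - keep) * hidden_states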

Contact person: Ji-Ung Lee

UKP Lab | TU Darmstadt

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions. Some of the code is based on the original Llama code, so you may find that repository helpful as well.

Project structure

  • configs — contains five different configuration files (an illustrative sketch of such a config follows this list):
    1. datasets.py available datasets, names, and splits
    2. fsdp.py settings for training on multiple GPUs (FSDP) and for quantization
    3. inference.py inference configuration; e.g., max_new_tokens, temperature, etc.
    4. peft.py settings for parameter-efficient training methods (e.g., LoRA)
    5. training.py settings for training, e.g., learning rate, batch size, etc.
  • llama_datasets — Data loading and preparation routines
  • model_checkpointing — Routines for saving and loading model checkpoints
  • neuralnets — Implementation of the Gumbel Softmax layer skipping for Llama2
  • utils — Various utilities for training, saving/loading models, etc.
  • policies — Utilities for FSDP (do not touch)
  • results — Result folder (needs to be created)
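
The config modules are plain Python files. As a rough illustration, a hypothetical inference config might look like the sketch below; apart from max_new_tokens and temperature, which are named above, all field names and default values are assumptions, so check configs/inference.py for the actual definitions.

# Hypothetical sketch of a config module in the style of configs/inference.py.
# Only max_new_tokens and temperature are mentioned in this README; all other
# field names and every default value are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class inference_config:
    model_name: str = "model_output/llama-7b-trained-gumbel"
    max_new_tokens: int = 100     # maximum number of tokens to generate
    temperature: float = 1.0      # sampling temperature
    top_p: float = 0.9            # nucleus sampling threshold (assumed)
    use_gumbel: bool = True       # enable the Gumbel Softmax skipping gates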

Getting Started

To run the experiments, first create a virtual environment (e.g., using conda):

conda create --name=<envname> python=3.9
conda activate <envname>
pip install -r requirements.txt

All experiments were run with Python 3.9.

Usage

Training

To finetune a base model, you can use torchrun with finetune_llama2.py:

# Uses 2 GPUs (--nproc_per_node 2)
torchrun --standalone \
    --nnodes 1 --nproc_per_node 2 finetune_llama2.py \
    --batch_size_training 1 \
    --model_name "<path-to-model>" \
    --pure_bf16 \
    --num_epochs 3 \
    --output_dir "model_output/llama-7b-trained-gumbel" \
    --gumbel 1 \
    --dataset "samsum_dataset"

A detailed description of all parameters is provided in configs/training.py.

Inference

To perform inference, you can use torchrun with evaluate_llama2.py:

torchrun --standalone \
    --nnodes 1 --nproc_per_node 1 evaluate_llama2.py \
    --model_name "model_output/llama-7b-trained-gumbel" \
    --use_gumbel 1 \
    --output_dir "results" 

This will also measure the overall time taken for inference (normalized by the number of generated tokens) and keep track of the layers that were activated by the Gumbel Softmax (written into activations.json). The scores and generated text are written into .json and .tsv files.
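
As a rough example of how the activation log might be inspected afterwards, the snippet below tallies how often each layer was kept. The schema of activations.json (a list of per-example binary keep/skip vectors, one entry per decoder layer) is an assumption, so adjust it to the file actually written by evaluate_llama2.py.

# Hypothetical post-processing of results/activations.json. The assumed schema
# (one binary keep/skip vector per example) may differ from what
# evaluate_llama2.py actually writes.
import json
from collections import Counter

with open("results/activations.json") as f:
    activations = json.load(f)

layer_counts = Counter()
for example in activations:                 # one activation vector per example
    for layer_idx, active in enumerate(example):
        layer_counts[layer_idx] += int(active)

num_examples = len(activations)
for layer_idx in sorted(layer_counts):
    share = layer_counts[layer_idx] / num_examples
    print(f"layer {layer_idx:2d}: active in {share:.1%} of examples")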

Cite

This work has no accompanying paper. However, you may cite the preliminary work on adaptable adapters that served as a basis for this work.

@inproceedings{moosavi-etal-2022-adaptable,
    title = "Adaptable Adapters",
    author = "Moosavi, Nafise  and
      Delfosse, Quentin  and
      Kersting, Kristian  and
      Gurevych, Iryna",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.274",
    pages = "3742--3753",
}

Disclaimer

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
