diff --git a/README.md b/README.md
index 6eec8d3..312706e 100644
--- a/README.md
+++ b/README.md
@@ -16,56 +16,60 @@ Contact person: [Ji-Ung Lee](mailto:bungobang@yahoo.de)
 
 Don't hesitate to send us an e-mail or report an issue, if something is broken (and it shouldn't be) or if you have further questions.
 
+## Project structure
+
+* `configs` — contains five different configuration files:
+    1. `datasets.py` available datasets, names, and splits
+    2. `fsdp.py` settings for multi-GPU (FSDP) training and quantization
+    3. `inference.py` inference configuration, e.g., max_new_tokens, temperature, etc.
+    4. `peft.py` settings for parameter-efficient training methods (e.g., LoRA)
+    5. `training.py` settings for training, e.g., learning rate, batch size, etc.
+* `llama_datasets` — Data loading and preparation routines
+* `model_checkpointing` — Model checkpointing routines (self-explanatory)
+* `neuralnets` — Implementation of the Gumbel softmax for Llama2
+* `utils` — Various utilities for training, saving/loading models, etc.
+* `policies` — Utilities for FSDP (do not touch)
+* `results` — Results folder (needs to be created)
 
 ## Getting Started
 
-If you want to set up this template:
-
-1. Request a repository on UKP Lab's GitHub by following the standard procedure on the wiki. It will install the template directly. Alternatively, set it up in your personal GitHub account by clicking **[Use this template](https://github.com/rochacbruno/python-project-template/generate)**.
-2. Wait until the first run of CI finishes. Github Actions will commit to your new repo with a "✅ Ready to clone and code" message.
-3. Delete optional files:
-    - If you don't need automatic documentation generation, you can delete folder `docs`, file `.github\workflows\docs.yml` and `mkdocs.yml`
-    - If you don't want automatic testing, you can delete folder `tests` and file `.github\workflows\tests.yml`
-4. Prepare a virtual environment:
-```bash
-python -m venv .venv
-source .venv/bin/activate
-pip install .
-pip install -r requirements-dev.txt # Only needed for development
-```
-5. Adapt anything else (for example this file) to your project.
+To run the experiments, first create a virtual environment (using, e.g., conda):
+
+    conda create --name=<env_name> python=3.9
+    conda activate <env_name>
+    pip install -r requirements.txt
 
-6. Read the file [ABOUT_THIS_TEMPLATE.md](ABOUT_THIS_TEMPLATE.md) for more information about development.
+We run all our experiments with Python 3.9.
 
 ## Usage
 
-### Using the classes
+### Training
 
-To import classes/methods of `gumbel_softmax_layer_skipping_2024` from inside the package itself you can use relative imports:
+To finetune a base model, you can use torchrun with `finetune_llama2.py`:
 
-```py
-from .base import BaseClass # Notice how I omit the package name
+    # use two GPUs (--nproc_per_node 2); set --model_name to the base model
+    torchrun --standalone \
+        --nnodes 1 --nproc_per_node 2 finetune_llama2.py \
+        --batch_size_training 1 \
+        --model_name "" \
+        --pure_bf16 \
+        --num_epochs 3 \
+        --output_dir "model_output/llama-7b-trained-gumbel" \
+        --gumbel 1 \
+        --dataset "samsum_dataset"
 
-BaseClass().something()
-```
+Detailed descriptions of all parameters are provided in `configs/training.py`.
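+
+As a rough mental model, each command line flag above simply overrides a field of the training config. A sketch of that override pattern (assumed, not the repo's actual code; the field names mirror the flags above, and the real definitions live in `configs/training.py` and `utils/config_utils.py`):
+
+    from dataclasses import dataclass, fields
+
+    @dataclass
+    class TrainConfig:
+        # hypothetical defaults; the real values live in configs/training.py
+        model_name: str = ""
+        batch_size_training: int = 1
+        num_epochs: int = 3
+        gumbel: int = 0
+        dataset: str = "samsum_dataset"
+        output_dir: str = "model_output"
+
+    def update_config(config, **kwargs):
+        # copy every recognized keyword (e.g., parsed from --key value pairs)
+        for f in fields(config):
+            if f.name in kwargs:
+                setattr(config, f.name, kwargs[f.name])
+        return config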
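+
+Conceptually, `--gumbel 1` trains a small gate per decoder layer that decides whether to execute or skip that layer. A minimal, illustrative sketch of such a gate (hypothetical names; the actual implementation lives in `neuralnets/llama_gumbel.py`):
+
+    import torch
+    import torch.nn.functional as F
+
+    class LayerSkipGate(torch.nn.Module):
+        """Keep/skip decision via the straight-through Gumbel softmax."""
+        def __init__(self, hidden_size: int):
+            super().__init__()
+            self.scorer = torch.nn.Linear(hidden_size, 2)  # logits for [skip, keep]
+
+        def forward(self, hidden_states, layer):
+            # one decision per sequence: mean-pool the tokens, then draw a
+            # one-hot sample that is discrete forward but differentiable backward
+            logits = self.scorer(hidden_states.mean(dim=1))
+            gate = F.gumbel_softmax(logits, tau=1.0, hard=True)
+            keep = gate[:, 1].view(-1, 1, 1)
+            return keep * layer(hidden_states) + (1 - keep) * hidden_states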
 
-To import classes/methods from outside the package (e.g. when you want to use the package in some other project) you can instead refer to the package name:
+### Inference
 
-```py
-from gumbel_softmax_layer_skipping_2024 import BaseClass # Notice how I omit the file name
-from gumbel_softmax_layer_skipping_2024.subpackage import SubPackageClass # Here it's necessary because it's a subpackage
+To perform inference, you can use torchrun with `evaluate_llama2.py`:
 
-BaseClass().something()
-SubPackageClass().something()
-```
+    torchrun --standalone \
+        --nnodes 1 --nproc_per_node 1 evaluate_llama2.py \
+        --model_name "model_output/llama-7b-trained-gumbel" \
+        --use_gumbel 1 \
+        --output_dir "results"
 
-### Using scripts
-
-This is how you can use `gumbel_softmax_layer_skipping_2024` from command line:
-
-```bash
-$ python -m gumbel_softmax_layer_skipping_2024
-```
+This will also measure the overall time taken for inference (normalized by the number of generated tokens) and keep track of the layers that were activated by the Gumbel softmax (written into `activations.json`). The scores and generated text will be written into `.json` and `.tsv` files.
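+
+To aggregate the recorded activations afterwards, a helper along these lines can be used (hypothetical sketch; the exact structure of `activations.json` may differ):
+
+    import json
+    from collections import Counter
+
+    # assumption: activations.json maps each example to the ids of the activated layers
+    with open("results/activations.json") as f:
+        activations = json.load(f)
+
+    counts = Counter(layer for layers in activations.values() for layer in layers)
+    for layer, n in sorted(counts.items()):
+        print(f"layer {layer}: activated {n} times")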
 
 ## Cite
diff --git a/configs/__init__.py b/code/configs/__init__.py
similarity index 100%
rename from configs/__init__.py
rename to code/configs/__init__.py
diff --git a/configs/datasets.py b/code/configs/datasets.py
similarity index 100%
rename from configs/datasets.py
rename to code/configs/datasets.py
diff --git a/configs/fsdp.py b/code/configs/fsdp.py
similarity index 100%
rename from configs/fsdp.py
rename to code/configs/fsdp.py
diff --git a/configs/inference.py b/code/configs/inference.py
similarity index 100%
rename from configs/inference.py
rename to code/configs/inference.py
diff --git a/configs/peft.py b/code/configs/peft.py
similarity index 100%
rename from configs/peft.py
rename to code/configs/peft.py
diff --git a/configs/training.py b/code/configs/training.py
similarity index 100%
rename from configs/training.py
rename to code/configs/training.py
diff --git a/evaluate_llama2.py b/code/evaluate_llama2.py
similarity index 99%
rename from evaluate_llama2.py
rename to code/evaluate_llama2.py
index 0755bce..b39c452 100644
--- a/evaluate_llama2.py
+++ b/code/evaluate_llama2.py
@@ -108,7 +108,7 @@ def hook_fn(layer, input, output):
 
     results_generated = rouge.compute(predictions=predictions_generated, references=references)
     results_generated["time"] = e2e_inference_time
-    print("Fixed results: ",results_generated)
+    print("Results: ",results_generated)
 
     results_file = f"{inference_config.model_name.split('/')[-1]}_samsum"
diff --git a/finetune_llama2.py b/code/finetune_llama2.py
similarity index 100%
rename from finetune_llama2.py
rename to code/finetune_llama2.py
diff --git a/llama_datasets/__init__.py b/code/llama_datasets/__init__.py
similarity index 100%
rename from llama_datasets/__init__.py
rename to code/llama_datasets/__init__.py
diff --git a/llama_datasets/cnndm_dataset.py b/code/llama_datasets/cnndm_dataset.py
similarity index 100%
rename from llama_datasets/cnndm_dataset.py
rename to code/llama_datasets/cnndm_dataset.py
diff --git a/llama_datasets/samsum_dataset.py b/code/llama_datasets/samsum_dataset.py
similarity index 100%
rename from llama_datasets/samsum_dataset.py
rename to code/llama_datasets/samsum_dataset.py
diff --git a/llama_datasets/utils.py b/code/llama_datasets/utils.py
similarity index 100%
rename from llama_datasets/utils.py
rename to code/llama_datasets/utils.py
diff --git a/model_checkpointing/__init__.py b/code/model_checkpointing/__init__.py
similarity index 100%
rename from model_checkpointing/__init__.py
rename to code/model_checkpointing/__init__.py
diff --git a/model_checkpointing/checkpoint_handler.py b/code/model_checkpointing/checkpoint_handler.py
similarity index 100%
rename from model_checkpointing/checkpoint_handler.py
rename to code/model_checkpointing/checkpoint_handler.py
diff --git a/neuralnets/llama_gumbel.py b/code/neuralnets/llama_gumbel.py
similarity index 100%
rename from neuralnets/llama_gumbel.py
rename to code/neuralnets/llama_gumbel.py
diff --git a/policies/__init__.py b/code/policies/__init__.py
similarity index 100%
rename from policies/__init__.py
rename to code/policies/__init__.py
diff --git a/policies/activation_checkpointing_functions.py b/code/policies/activation_checkpointing_functions.py
similarity index 100%
rename from policies/activation_checkpointing_functions.py
rename to code/policies/activation_checkpointing_functions.py
diff --git a/policies/anyprecision_optimizer.py b/code/policies/anyprecision_optimizer.py
similarity index 100%
rename from policies/anyprecision_optimizer.py
rename to code/policies/anyprecision_optimizer.py
diff --git a/policies/mixed_precision.py b/code/policies/mixed_precision.py
similarity index 100%
rename from policies/mixed_precision.py
rename to code/policies/mixed_precision.py
diff --git a/policies/wrapping.py b/code/policies/wrapping.py
similarity index 100%
rename from policies/wrapping.py
rename to code/policies/wrapping.py
diff --git a/utils/__init__.py b/code/utils/__init__.py
similarity index 100%
rename from utils/__init__.py
rename to code/utils/__init__.py
diff --git a/utils/chat_utils.py b/code/utils/chat_utils.py
similarity index 100%
rename from utils/chat_utils.py
rename to code/utils/chat_utils.py
diff --git a/utils/checkpoint_converter_fsdp_hf.py b/code/utils/checkpoint_converter_fsdp_hf.py
similarity index 100%
rename from utils/checkpoint_converter_fsdp_hf.py
rename to code/utils/checkpoint_converter_fsdp_hf.py
diff --git a/utils/config_utils.py b/code/utils/config_utils.py
similarity index 100%
rename from utils/config_utils.py
rename to code/utils/config_utils.py
diff --git a/utils/dataset_utils.py b/code/utils/dataset_utils.py
similarity index 100%
rename from utils/dataset_utils.py
rename to code/utils/dataset_utils.py
diff --git a/utils/fsdp_utils.py b/code/utils/fsdp_utils.py
similarity index 100%
rename from utils/fsdp_utils.py
rename to code/utils/fsdp_utils.py
diff --git a/utils/memory_utils.py b/code/utils/memory_utils.py
similarity index 100%
rename from utils/memory_utils.py
rename to code/utils/memory_utils.py
diff --git a/utils/model_utils.py b/code/utils/model_utils.py
similarity index 100%
rename from utils/model_utils.py
rename to code/utils/model_utils.py
diff --git a/utils/safety_utils.py b/code/utils/safety_utils.py
similarity index 100%
rename from utils/safety_utils.py
rename to code/utils/safety_utils.py
diff --git a/utils/train_utils.py b/code/utils/train_utils.py
similarity index 100%
rename from utils/train_utils.py
rename to code/utils/train_utils.py