update readme

UKPLab · Aug 14, 2024 · 3ea7c48
1 parent cb0345a
commit 3ea7c48
Showing 31 changed files with 41 additions and 37 deletions.
diff --git a/README.md b/README.md
@@ -16,56 +16,60 @@ Contact person: [Ji-Ung Lee](mailto:[email protected])
 
 Don't hesitate to send us an e-mail or report an issue, if something is broken (and it shouldn't be) or if you have further questions.
 
+## Project structure
+
+* `configs` &mdash; contains five different configuration files:
+    1. `datasets.py`: available datasets, names, and splits
+    2. `fsdp.py`: settings for multi-GPU training and quantization
+    3. `inference.py`: inference configuration (e.g., `max_new_tokens`, `temperature`)
+    4. `peft.py`: settings for parameter-efficient training methods (e.g., LoRA)
+    5. `training.py`: settings for training (e.g., learning rate, batch size)
+* `llama_datasets` &mdash; Data loading and preparation routines
+* `model_checkpointing` &mdash; (self explanatory)
+* `neuralnets` &mdash; Implementation of the gumbel softmax for Llama2
+* `utils` &mdash; various utilities for training, saving/loading models, etc.
+* `policies` &mdash; Utilities for FSDP (do not touch)
+* `results` &mdash; Result folder (needs to be created)
 
 ## Getting Started
 
-If you want to set up this template:
-
-1. Request a repository on UKP Lab's GitHub by following the standard procedure on the wiki. It will install the template directly. Alternatively, set it up in your personal GitHub account by clicking **[Use this template](https://github.com/rochacbruno/python-project-template/generate)**.
-2. Wait until the first run of CI finishes. Github Actions will commit to your new repo with a "✅ Ready to clone and code" message.
-3. Delete optional files: 
-    - If you don't need automatic documentation generation, you can delete folder `docs`, file `.github\workflows\docs.yml` and `mkdocs.yml`
-    - If you don't want automatic testing, you can delete folder `tests` and file `.github\workflows\tests.yml`
-4. Prepare a virtual environment:
-```bash
-python -m venv .venv
-source .venv/bin/activate
-pip install .
-pip install -r requirements-dev.txt # Only needed for development
-```
-5. Adapt anything else (for example this file) to your project. 
+To run the experiments, first create a virtual environment (e.g., with conda):
+
+    conda create --name=<envname> python=3.9
+    conda activate <envname>
+    pip install -r requirements.txt
 
-6. Read the file [ABOUT_THIS_TEMPLATE.md](ABOUT_THIS_TEMPLATE.md)  for more information about development.
+We ran all our experiments with Python 3.9.
 
 ## Usage
 
-### Using the classes
+### Training
 
-To import classes/methods of `gumbel_softmax_layer_skipping_2024` from inside the package itself you can use relative imports: 
+To fine-tune a base model, use torchrun with `finetune_llama2.py`; the example below runs on 2 GPUs (`--nproc_per_node 2`):
 
-```py
-from .base import BaseClass # Notice how I omit the package name
+    torchrun --standalone \
+        --nnodes 1 --nproc_per_node 2 finetune_llama2.py \
+        --batch_size_training 1 \
+        --model_name "<path-to-model>" \
+        --pure_bf16 \
+        --num_epochs 3 \
+        --output_dir "model_output/llama-7b-trained-gumbel" \
+        --gumbel 1  \
+        --dataset "samsum_dataset"
 
-BaseClass().something()
-```
+Detailed descriptions of all parameters are provided in `configs/training.py`.
 
-To import classes/methods from outside the package (e.g. when you want to use the package in some other project) you can instead refer to the package name:
+### Inference 
 
-```py
-from gumbel_softmax_layer_skipping_2024 import BaseClass # Notice how I omit the file name
-from gumbel_softmax_layer_skipping_2024.subpackage import SubPackageClass # Here it's necessary because it's a subpackage
+To perform inference, you can use torchrun with `evaluate_llama2.py`.
 
-BaseClass().something()
-SubPackageClass().something()
-```
+    torchrun --standalone \
+        --nnodes 1 --nproc_per_node 1 evaluate_llama2.py \
+        --model_name "model_output/llama-7b-trained-gumbel" \
+        --use_gumbel 1 \
+        --output_dir "results" 
 
-### Using scripts
-
-This is how you can use `gumbel_softmax_layer_skipping_2024` from command line:
-
-```bash
-$ python -m gumbel_softmax_layer_skipping_2024
-```
+This will also measure the overall time taken for inference (normalized by the number of generated tokens) and keep track of the layers that were activated for the Gumbel Softmax (written into `activations.json`). The scores and generated text will be written into `.json` and `.tsv` files.
 
 ## Cite
 

diff --git a/configs/__init__.py → code/configs/__init__.py b/configs/__init__.py → code/configs/__init__.py
diff --git a/configs/datasets.py → code/configs/datasets.py b/configs/datasets.py → code/configs/datasets.py
diff --git a/configs/fsdp.py → code/configs/fsdp.py b/configs/fsdp.py → code/configs/fsdp.py
diff --git a/configs/inference.py → code/configs/inference.py b/configs/inference.py → code/configs/inference.py
diff --git a/configs/peft.py → code/configs/peft.py b/configs/peft.py → code/configs/peft.py
diff --git a/configs/training.py → code/configs/training.py b/configs/training.py → code/configs/training.py
diff --git a/evaluate_llama2.py → code/evaluate_llama2.py b/evaluate_llama2.py → code/evaluate_llama2.py
@@ -108,7 +108,7 @@ def hook_fn(layer, input, output):
     results_generated = rouge.compute(predictions=predictions_generated, references=references)
     results_generated["time"] = e2e_inference_time
 
-    print("Fixed results: ",results_generated)
+    print("Results: ",results_generated)
 
     results_file = f"{inference_config.model_name.split('/')[-1]}_samsum"
 

diff --git a/finetune_llama2.py → code/finetune_llama2.py b/finetune_llama2.py → code/finetune_llama2.py
diff --git a/llama_datasets/__init__.py → code/llama_datasets/__init__.py b/llama_datasets/__init__.py → code/llama_datasets/__init__.py
diff --git a/llama_datasets/cnndm_dataset.py → code/llama_datasets/cnndm_dataset.py b/llama_datasets/cnndm_dataset.py → code/llama_datasets/cnndm_dataset.py
diff --git a/llama_datasets/samsum_dataset.py → code/llama_datasets/samsum_dataset.py b/llama_datasets/samsum_dataset.py → code/llama_datasets/samsum_dataset.py
diff --git a/llama_datasets/utils.py → code/llama_datasets/utils.py b/llama_datasets/utils.py → code/llama_datasets/utils.py
diff --git a/model_checkpointing/__init__.py → code/model_checkpointing/__init__.py b/model_checkpointing/__init__.py → code/model_checkpointing/__init__.py
diff --git a/model_checkpointing/checkpoint_handler.py → ...model_checkpointing/checkpoint_handler.py b/model_checkpointing/checkpoint_handler.py → ...model_checkpointing/checkpoint_handler.py
diff --git a/neuralnets/llama_gumbel.py → code/neuralnets/llama_gumbel.py b/neuralnets/llama_gumbel.py → code/neuralnets/llama_gumbel.py
diff --git a/policies/__init__.py → code/policies/__init__.py b/policies/__init__.py → code/policies/__init__.py
diff --git a/...ies/activation_checkpointing_functions.py → ...ies/activation_checkpointing_functions.py b/...ies/activation_checkpointing_functions.py → ...ies/activation_checkpointing_functions.py
diff --git a/policies/anyprecision_optimizer.py → code/policies/anyprecision_optimizer.py b/policies/anyprecision_optimizer.py → code/policies/anyprecision_optimizer.py
diff --git a/policies/mixed_precision.py → code/policies/mixed_precision.py b/policies/mixed_precision.py → code/policies/mixed_precision.py
diff --git a/policies/wrapping.py → code/policies/wrapping.py b/policies/wrapping.py → code/policies/wrapping.py
diff --git a/utils/__init__.py → code/utils/__init__.py b/utils/__init__.py → code/utils/__init__.py
diff --git a/utils/chat_utils.py → code/utils/chat_utils.py b/utils/chat_utils.py → code/utils/chat_utils.py
diff --git a/utils/checkpoint_converter_fsdp_hf.py → code/utils/checkpoint_converter_fsdp_hf.py b/utils/checkpoint_converter_fsdp_hf.py → code/utils/checkpoint_converter_fsdp_hf.py
diff --git a/utils/config_utils.py → code/utils/config_utils.py b/utils/config_utils.py → code/utils/config_utils.py
diff --git a/utils/dataset_utils.py → code/utils/dataset_utils.py b/utils/dataset_utils.py → code/utils/dataset_utils.py
diff --git a/utils/fsdp_utils.py → code/utils/fsdp_utils.py b/utils/fsdp_utils.py → code/utils/fsdp_utils.py
diff --git a/utils/memory_utils.py → code/utils/memory_utils.py b/utils/memory_utils.py → code/utils/memory_utils.py
diff --git a/utils/model_utils.py → code/utils/model_utils.py b/utils/model_utils.py → code/utils/model_utils.py
diff --git a/utils/safety_utils.py → code/utils/safety_utils.py b/utils/safety_utils.py → code/utils/safety_utils.py
diff --git a/utils/train_utils.py → code/utils/train_utils.py b/utils/train_utils.py → code/utils/train_utils.py
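
Note on the `evaluate_llama2.py` hunk: the results file name is built from the last path component of `model_name`, and the README says the inference time is normalized by the number of generated tokens. The sketch below illustrates both steps in isolation. The helper names `results_file_stem` and `time_per_token` are hypothetical and do not appear in the repository; only the f-string expression is taken from the hunk, and the normalization is an assumption about what "normalized by the number of generated tokens" means.

```python
def results_file_stem(model_name: str, dataset_tag: str = "samsum") -> str:
    """Hypothetical helper mirroring the f-string in the hunk:
    last '/'-separated component of the model path plus a dataset suffix."""
    return f"{model_name.split('/')[-1]}_{dataset_tag}"


def time_per_token(total_seconds: float, num_generated_tokens: int) -> float:
    """Hypothetical helper: end-to-end inference time divided by the
    number of generated tokens (assumed meaning of 'normalized')."""
    if num_generated_tokens <= 0:
        raise ValueError("num_generated_tokens must be positive")
    return total_seconds / num_generated_tokens


if __name__ == "__main__":
    print(results_file_stem("model_output/llama-7b-trained-gumbel"))
    print(time_per_token(10.0, 4))
```

One consequence of the `split('/')[-1]` expression: a `model_name` with a trailing slash yields an empty last component, so the stem collapses to `_samsum`. Passing the path without a trailing slash, as in the README example, avoids this.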