
Commit

Update documentation
ljvmiranda921 committed Jul 13, 2024
1 parent 92c644a commit dff5259
Showing 2 changed files with 59 additions and 6 deletions.
59 changes: 54 additions & 5 deletions README.md
@@ -26,19 +26,68 @@ First, you need to set a [HuggingFace token](https://huggingface.co/settings/tokens):
export HF_TOKEN=<your huggingface token>
```

You can find all runnable experiments in the `scripts` directory.
Each script's filename should tell you its purpose.

### Getting rewards from a Reward Model (RM) on a HuggingFace dataset

Here, we use the `rewardbench` command-line interface and pass a HuggingFace dataset.
This is useful if the reward model was trained as a Custom classifier (🛠️) or Sequence classifier (🔢), or trained via DPO (🎯).
For example, if we want to get the reward score of the UltraRM-13b reward model on a preference dataset, we run:

```sh
rewardbench \
--model openbmb/UltraRM-13b \
--chat_template openbmb \
--dataset $DATASET \
--split $SPLIT \
--output_dir $OUTDIR \
--batch_size 8 \
--trust_remote_code \
--force_truncation \
--save_all
```

The evaluation parameters can be found in the [allenai/reward-bench](https://github.com/allenai/reward-bench/blob/main/scripts/configs/eval_configs.yaml) repository.
This runs the reward model on each (prompt, chosen, rejected) triple and gives us a reward score for each instance.
The results are saved into a JSON file inside the `$OUTDIR` directory.
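If you want a quick sanity check, you can load the saved JSON and count how often the chosen response outscores the rejected one. The snippet below is a minimal sketch and not part of this repository; the record layout and the field names `scores_chosen` and `scores_rejected` are assumptions about the output schema, so adjust them to match your file.

```python
# Minimal sketch (not part of this repo): inspect the rewards saved above.
# NOTE: the record layout and the "scores_chosen" / "scores_rejected" field
# names are assumptions; adapt them to your actual JSON output.
import json
from pathlib import Path

outdir = Path("results")  # hypothetical value of $OUTDIR
for json_file in outdir.glob("*.json"):
    with open(json_file) as f:
        records = json.load(f)
    wins = sum(r["scores_chosen"] > r["scores_rejected"] for r in records)
    print(f"{json_file.name}: chosen preferred in {wins}/{len(records)} instances")
```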
Finally, you can find some experiments in the `scripts/run_rm_evals.sh` script.

### Getting rewards from a Generative RM on a HuggingFace dataset

Here we use `scripts/run_generative.py`, a modified version of the [same script in RewardBench](https://github.com/allenai/reward-bench/blob/main/scripts/run_generative.py) to obtain rewards from a Generative RM (🗨️).
The only difference is that this script accepts any arbitrary HuggingFace preference dataset instead of just the RewardBench dataset (we plan to contribute this upstream later on).

For Generative RMs, we prompt a model in a style akin to LLM-as-a-judge, and then parse the output to obtain the preference.
This can be done for closed-source APIs (e.g., GPT-4, Claude) or open-source LMs (done via vLLM).
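To make the judging step concrete, here is a minimal illustrative sketch of the LLM-as-a-judge pattern. It is not the prompt or parsing code used in `scripts/run_generative.py`; the template wording and the `parse_preference` helper are purely hypothetical.

```python
# Illustrative sketch only: not the actual prompt or parser used in
# scripts/run_generative.py. The judge model sees the prompt plus both
# responses, and we read a single letter ("A" or "B") from its reply.
JUDGE_TEMPLATE = """You are comparing two responses to the same prompt.

Prompt: {prompt}
Response A: {response_a}
Response B: {response_b}

Which response is better? Reply with a single letter: A or B."""


def parse_preference(judge_reply: str) -> str:
    """Map the judge's raw reply to 'A', 'B', or 'unknown' if unparseable."""
    reply = judge_reply.strip().upper()
    if reply.startswith("A"):
        return "A"
    if reply.startswith("B"):
        return "B"
    return "unknown"
```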
If you're planning to use some closed-source APIs, you also need to set the tokens for each:

```sh
export OPENAI_API_KEY=<your openai token>
export ANTHROPIC_API_KEY=<your anthropic token>
export GEMINI_API_KEY=<your gemini token>
```

For example, say we want to obtain the preferences of `gpt-4-turbo-2024-04-09`:

```sh
export OPENAI_API_KEY=<your openai token>
python scripts/run_generative.py \
--dataset_name $DATASET \
--split $SPLIT \
--model gpt-4-turbo-2024-04-09 \
--output_dir $OUTDIR
```

You can also run open-source LMs in a generative fashion.
The inference is then routed through [vLLM](https://github.com/vllm-project/vllm).
Here's an example using `meta-llama/Meta-Llama-3-70B-Instruct`:

```sh
python scripts/run_generative.py \
--dataset_name $DATASET \
--split $SPLIT \
--model "meta-llama/Meta-Llama-3-70B-Instruct" \
--num_gpus 4 \
--output_dir $OUTDIR
```
6 changes: 5 additions & 1 deletion scripts/run_generative.py
@@ -21,6 +21,7 @@
# Examples:
# python scripts/run_generative.py --dataset_name <DATASET_NAME> --model gpt-3.5-turbo
# python scripts/run_generative.py --dataset_name <DATASET_NAME> --model=claude-3-haiku-20240307
# python scripts/run_generative.py --dataset_name <DATASET_NAME> --model=CohereForAI/c4ai-command-r-v01 --num_gpus 2 --force_local

# note: for non-API models, this script uses vllm
# pip install vllm
@@ -31,6 +32,7 @@
import os
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

import numpy as np
from datasets import load_dataset
@@ -57,6 +59,7 @@ def get_args():
parser.add_argument("--dataset_name", type=str, required=True, help="name of dataset to test on")
parser.add_argument("--split", default="test", type=str, required=True, help="dataset split to evaluate")
parser.add_argument("--model", type=str, nargs="+", required=True, help="name of model to use")
parser.add_argument("--output_dir", type=str, required=True, help="Directory to save the results.")
parser.add_argument("--chat_template", type=str, default=None, help="fastchat chat template (optional)")
parser.add_argument("--trust_remote_code", action="store_true", default=False, help="directly load model instead of pipeline")
parser.add_argument("--num_gpus", type=int, default=1, help="number of gpus to use, for multi-node vllm")
@@ -352,7 +355,8 @@ def process_shuffled(win, shuffle):
},
}

output_dir = Path(args.output_dir)
file_path = output_dir / f"{model_name.replace('/', '___')}.json"
with open(file_path, "w") as f:
    json.dump(results_dict, f, indent=4)

