
feat: reconstruct evaluation framework #33

Merged
merged 5 commits into PKU-Alignment:main on Aug 6, 2024

Conversation

@Kass123777 (Collaborator) commented Aug 4, 2024

Description

  • Reconstruct evaluation framework.
  • Support vLLM and DeepSpeed as generation backends.
  • Support 12 text -> text benchmarks and 1 text + image -> text benchmark.

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #1314520 if this solves the issue #15213

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.

@htlou (Contributor) left a comment

LGTM despite some undeleted absolute paths

Contributor

Are these JSON files meant to be merged with the PR, or should they be downloaded separately from Hugging Face?

Contributor

Solved.

@Gaiejj changed the title from "feat: reconstruct evaluation framework." to "feat: reconstruct evaluation framework" on Aug 4, 2024
@@ -0,0 +1,49 @@
#!/usr/bin/env bash
Member

Remove this line from all YAML files.

    default=False,
    help="If True, chain-of-thought will be implemented during generation",
)
parser.add_argument("--batch_size", type=str, default=1)
Member

add 'help' here
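
For illustration, a minimal sketch of the requested change; the help wording and the switch to type=int are assumptions (the original declares type=str):

parser.add_argument(
    "--batch_size",
    type=int,
    default=1,
    help="Number of prompts sent to the backend per generation batch",
)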

@Gaiejj (Member) left a comment

Please check the above comments.

@htlou (Contributor) left a comment

LGTM in this way.

@XuyaoWang (Collaborator) left a comment

LGTM.

exit()

for k, v in unparsed_args.items():
    dict_configs = update_dict(dict_configs, custom_cfgs_to_dict(k, v))
@cby-pku (Contributor) commented Aug 5, 2024

Suggested change
-    dict_configs = update_dict(dict_configs, custom_cfgs_to_dict(k, v))
+    if v == '' or v is None:
+        continue
+    dict_configs = update_dict(dict_configs, custom_cfgs_to_dict(k, v))

@Kass123777
You should add this guard to each vllm_eval.py under ./evaluation/benchmarks.
The problem is as follows:

  1. You define some input variables in eval.sh, such as output-dir. With the current logic, if we pass a value to a hyper-parameter in the shell script, it overrides the corresponding value in the YAML file.
  2. But if we don't pass a value to a hyper-parameter in the shell script, the related value is set to None, and that None overrides the corresponding YAML value, leading to output path errors.
  3. So we should add an if statement on this line: if the value is None, keep the value from the YAML file (see the sketch after this comment).
  4. If you compare your evaluation files with the training scripts, you will find that the training scripts don't define these variables, so this issue does not occur there.

Great work! Looking forward to your reply!
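
For illustration, a self-contained toy version of the guard described above; the helper bodies and example values are stand-ins, the real update_dict and custom_cfgs_to_dict live in the repository:

# Toy stand-ins for the repository helpers, only to make the example runnable.
def custom_cfgs_to_dict(key, value):
    return {key: value}

def update_dict(base, override):
    merged = dict(base)
    merged.update(override)
    return merged

dict_configs = {'output_dir': './output', 'batch_size': 1}   # values from the YAML file
unparsed_args = {'output_dir': None, 'batch_size': '8'}      # CLI overrides; unset flags arrive as None

for k, v in unparsed_args.items():
    if v == '' or v is None:
        continue  # skip empty overrides so the YAML defaults survive
    dict_configs = update_dict(dict_configs, custom_cfgs_to_dict(k, v))

print(dict_configs)  # {'output_dir': './output', 'batch_size': '8'}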

Contributor

Thanks for the correction! With that change, the program is more robust.

@cby-pku (Contributor) left a comment

Please fix the related minor bugs.

def evaluator(raw_output1: List[InferenceOutput], raw_output2: List[InferenceOutput], dataloader: MTBenchDataLoader, task: str, eval_configs=None):
    current_file_path = os.path.abspath(__file__)
    current_dir = os.path.dirname(current_file_path)
    dataset = load_dataset(task, data_files=os.path.join(current_dir, eval_configs.task_dir))[dataloader.split]
Contributor

Suggested change
-    dataset = load_dataset(task, data_files=os.path.join(current_dir, eval_configs.task_dir))[dataloader.split]
+    dataset = load_dataset(current_dir, task)[dataloader.split]
+    dataset = load_dataset(current_dir, split='train', data_files='test.jsonl')

The first line loads the dataset from the current directory and uses the value of dataloader.split to pick the data file: if your test prompt file is named test.jsonl it will succeed, but if it is named test_prompt.jsonl it will fail.

The second line is a more robust way to load the dataset.

Both of the suggested lines are correct; the original line leads to an ERROR.
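
For reference, a self-contained sketch of the more robust loading pattern, assuming the prompt file is named test.jsonl and sits next to the evaluator script; it uses the explicit 'json' builder, which behaves like passing the directory path for a local JSONL file:

import os
from datasets import load_dataset

current_dir = os.path.dirname(os.path.abspath(__file__))

# Name the data file explicitly so loading does not depend on the file name
# matching the split (e.g. test.jsonl vs. test_prompt.jsonl).
dataset = load_dataset(
    'json',
    data_files=os.path.join(current_dir, 'test.jsonl'),
    split='train',
)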

Contributor

It is nice of you to provide such detailed feedback on the code! I really appreciate your insights and suggestions.

merged_dict = {**resp, **eval_}
merged_list.append(merged_dict)

raw_result_file = eval_configs.output_dir+file_name+"_raw_result.jsonl"
Contributor

Suggested change
-    raw_result_file = eval_configs.output_dir+file_name+"_raw_result.jsonl"
+    os.makedirs(eval_configs.output_dir, exist_ok=True)
+    raw_result_file = os.path.join(eval_configs.output_dir, file_name + "_raw_result.jsonl")

The original builds a bad output file path: it concatenates strings directly and never creates the output directory (a small illustration follows below).
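
A tiny example of the failure mode, with made-up values:

import os

output_dir, file_name = './outputs', 'mmlu'   # made-up example values

# Plain concatenation silently yields a wrong path when output_dir has no
# trailing slash, and writing fails if the directory does not exist yet.
print(output_dir + file_name + '_raw_result.jsonl')               # ./outputsmmlu_raw_result.jsonl

os.makedirs(output_dir, exist_ok=True)
print(os.path.join(output_dir, file_name + '_raw_result.jsonl'))  # ./outputs/mmlu_raw_result.jsonl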

Contributor

Thanks for your insights and suggestions!

dataloader = MMLUDataLoader(dict_configs)
test_data = dataloader.load_dataset()
eval_module = MMLUGeneratorVLLM(model_config, infer_configs)
raw_outputs = eval_module.eval(test_data, eval_configs)
Contributor

Please check the code: evaluating MMLU with vLLM is slow. See #35.

Collaborator (Author)

Thanks for your question! I have replied in #35.

self.llm_trust_remote_code = self.vllm_cfgs_llm.trust_remote_code
self.llm_gpu_memory_utilization = self.vllm_cfgs_llm.gpu_memory_utilization
self.llm_max_num_seqs = self.vllm_cfgs_llm.max_num_seqs
self.llm_tensor_parallel_size = cuda_device_count_stateless()
@cby-pku (Contributor) commented Aug 5, 2024

Try to support command-line hyper-parameters; you can also change the config file vllm_basic.json.

I tried to implement it with the following change, but it raises an error:

Suggested change
-    self.llm_tensor_parallel_size = cuda_device_count_stateless()
+    tensor_ps = self.vllm_cfgs_llm.tensor_parallel_size
+    self.llm_tensor_parallel_size = tensor_ps if tensor_ps else cuda_device_count_stateless()
AttributeError: 'RayGPUExecutor' object has no attribute 'forward_dag'
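
For illustration only, a sketch of the intended fallback; resolve_tensor_parallel_size is a hypothetical helper, and torch.cuda.device_count() stands in for the cuda_device_count_stateless() used in the snippet:

import torch

def resolve_tensor_parallel_size(configured_size=None):
    # Hypothetical helper: prefer the value from the YAML/CLI config,
    # otherwise fall back to using every visible GPU.
    available_gpus = max(torch.cuda.device_count(), 1)
    if not configured_size:
        return available_gpus
    # Never request more tensor-parallel ranks than visible GPUs.
    return min(int(configured_size), available_gpus)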

Collaborator (Author)

This problem has the same cause as #35 and will be fixed in the next version.

Contributor

Great! I suggest adding some experiments with Vicuna-33B; if it works, then I think it is OK.

@cby-pku (Contributor) left a comment

Great Work!

@cby-pku (Contributor) left a comment

Add related descriptions of the methods and update the news in the global README.md.

@cby-pku merged commit e830052 into PKU-Alignment:main on Aug 6, 2024