feat: reconstruct evaluation framework #33
Conversation
LGTM, despite some absolute paths that were left undeleted.
Are these JSON files meant to be merged as part of the PR, or should they be downloaded separately from Hugging Face?
solved
```diff
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
```
Remove this line from all YAML files.
```python
    default=False,
    help="If True, chain-of-thought will be implemented during generation",
)
parser.add_argument("--batch_size", type=str, default=1)
```
Add a `help` string here.
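For instance, a minimal sketch of what that could look like (the help wording is illustrative, not the PR's; note that `type=str` for a batch size also looks like it should be `type=int`):

```python
parser.add_argument(
    "--batch_size",
    type=int,  # assumption: a batch size should be an integer, not a string
    default=1,
    help="Number of prompts to generate per batch during evaluation",  # illustrative wording
)
```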
Please check the above comments.
LGTM with these changes.
LGTM.
```python
exit()

for k, v in unparsed_args.items():
    dict_configs = update_dict(dict_configs, custom_cfgs_to_dict(k, v))
```
Suggested change:

```python
    if v == '' or v is None:
        continue
    dict_configs = update_dict(dict_configs, custom_cfgs_to_dict(k, v))
```
@Kass123777
You should add this line in each of the `vllm_eval.py` files under `./evaluation/benchmarks`.
The problem is described as follows:
- You define some input variables in `eval.sh`, such as `output-dir`. The current logic is that if we pass a value to a hyper-parameter in the shell script, it overrides the corresponding value in the YAML files.
- But what if we don't pass a value to a hyper-parameter in the shell script? The corresponding value is set to None, and that None overrides the value in the YAML files, leading to output path errors.
- So we should add an `if` statement on this line: if the value is None, keep the value from the YAML file (see the sketch after this list).
- Compare your evaluation files with the training scripts: the training scripts don't define such variables, so this problem does not occur there.
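A minimal sketch of the failure mode, assuming `update_dict` is a recursive merge in which the override wins (the function name matches the diff above, but this body is an illustrative stand-in, not the repo's implementation):

```python
# Illustrative stand-in for the repo's update_dict: recursive merge, override wins.
def update_dict(base: dict, override: dict) -> dict:
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(base.get(k), dict):
            update_dict(base[k], v)
        else:
            base[k] = v  # an unset shell variable arrives as None and clobbers the YAML value
    return base

cfg = {"eval_cfgs": {"output_dir": "./outputs"}}       # value from the YAML file
update_dict(cfg, {"eval_cfgs": {"output_dir": None}})  # unset --output-dir from eval.sh
print(cfg)  # {'eval_cfgs': {'output_dir': None}} -> later writes to output_dir fail
```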
Great Work! Looking forward to your reply!!
Thanks for the correction! With this change, the program is more robust.
Please fix the related minor bugs.
```python
def evaluator(raw_output1: List[InferenceOutput], raw_output2: List[InferenceOutput], dataloader: MTBenchDataLoader, task: str, eval_configs=None):
    current_file_path = os.path.abspath(__file__)
    current_dir = os.path.dirname(current_file_path)
    dataset = load_dataset(task, data_files=os.path.join(current_dir, eval_configs.task_dir))[dataloader.split]
```
Suggested change (two working alternatives):

```python
dataset = load_dataset(current_dir, task)[dataloader.split]
dataset = load_dataset(current_dir, split='train', data_files='test.jsonl')
```
The first line loads the dataset from the current directory and uses the value of `dataloader.split` to select the data file: if your test prompt file is named `test.jsonl` it will succeed, but if it is named `test_prompt.jsonl` it will fail.
The second line is a more robust way to load the dataset.
Both of the above calls are correct; the original call leads to an error.
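For reference, the second style can also be written with the explicit `json` builder from the `datasets` library, which removes the dependency on split-like file names entirely (the file name here is illustrative):

```python
from datasets import load_dataset

# Name the builder and the data file explicitly, so the prompt file
# does not have to match a split name such as "test".
dataset = load_dataset("json", data_files="test_prompt.jsonl", split="train")
```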
It is nice of you to provide such detailed feedback on the code! I really appreciate your insights and suggestions.
```python
merged_dict = {**resp, **eval_}
merged_list.append(merged_dict)

raw_result_file = eval_configs.output_dir+file_name+"_raw_result.jsonl"
```
Suggested change:

```python
os.makedirs(eval_configs.output_dir, exist_ok=True)
raw_result_file = os.path.join(eval_configs.output_dir, file_name + "_raw_result.jsonl")
```
Bad output file path: plain string concatenation drops the path separator, and the output directory may not exist.
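To illustrate with hypothetical values why the concatenation fails:

```python
import os

output_dir, file_name = "./outputs", "mt_bench"  # hypothetical values

output_dir + file_name + "_raw_result.jsonl"
# -> './outputsmt_bench_raw_result.jsonl'  (missing separator, wrong location)

os.path.join(output_dir, file_name + "_raw_result.jsonl")
# -> './outputs/mt_bench_raw_result.jsonl'
```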
Thanks for your insights and suggestions!
```python
dataloader = MMLUDataLoader(dict_configs)
test_data = dataloader.load_dataset()
eval_module = MMLUGeneratorVLLM(model_config, infer_configs)
raw_outputs = eval_module.eval(test_data, eval_configs)
```
Please check the code. Evaluating MMLU with vLLM is slow; see #35.
Thanks for your question! I have replied to it in #35.
```python
self.llm_trust_remote_code = self.vllm_cfgs_llm.trust_remote_code
self.llm_gpu_memory_utilization = self.vllm_cfgs_llm.gpu_memory_utilization
self.llm_max_num_seqs = self.vllm_cfgs_llm.max_num_seqs
self.llm_tensor_parallel_size = cuda_device_count_stateless()
```
Try to support command-line hyper-parameters; you can also change the config file in `vllm_basic.json`.
I tried to implement it with the following change, but it raises an error:
```python
tensor_ps = self.vllm_cfgs_llm.tensor_parallel_size
self.llm_tensor_parallel_size = tensor_ps if tensor_ps else cuda_device_count_stateless()
```
AttributeError: 'RayGPUExecutor' object has no attribute 'forward_dag'
This problem has the same cause as #35 and will be fixed in the next version.
Great! I suggest adding some experiments with Vicuna-33B; if it works, then I think it is OK.
Great Work!
Add descriptions of the methods and update the news section in the global README.md.
Description

Motivation and Context
Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax `close #1314520` if this solves the issue #15213.

Types of changes
What types of changes does your code introduce? Put an `x` in all the boxes that apply:

Checklist
Go over all the following points, and put an `x` in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!