
Error for AGIEval when using fewshot #2323

Open
BaohaoLiao opened this issue Sep 19, 2024 · 1 comment
Labels
bug (Something isn't working.) · validation (For validation of task implementations.)

Comments

BaohaoLiao commented Sep 19, 2024

Hi, I get the following error when evaluating AGIEval with num_fewshot=3. Everything works fine with 0-shot.

2024-09-19:13:03:47,189 DEBUG    [cache.py:33] requests-agieval_jec_qa_kd-3shot-rank0-world_size1-tokenizer is not cached, generating...
2024-09-19:13:03:47,189 INFO     [task.py:423] Building contexts for agieval_jec_qa_kd on rank 0...
 12%|████████████▍                                                                                            | 119/1000 [00:00<00:01, 480.91it/s]
Traceback (most recent call last):
  File "/home/baliao/.conda/envs/cluster/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/baliao/.conda/envs/cluster/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/__main__.py", line 468, in <module>
    cli_evaluate()
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/__main__.py", line 389, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/evaluator.py", line 301, in simple_evaluate
    results = evaluate(
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/evaluator.py", line 420, in evaluate
    task.build_all_requests(
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/api/task.py", line 446, in build_all_requests
    fewshot_ctx = self.fewshot_context(
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/api/task.py", line 1088, in fewshot_context
    labeled_examples += self.sampler.get_context(doc, num_fewshot)
  File "/data/chatgpt/data/baliao/cluster/04_lm_eval/lm-evaluation-harness/lm_eval/api/samplers.py", line 89, in get_context
    str(doc_target[0])
IndexError: list index out of range
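
For context, the failing call takes the first element of the per-document target list, so any row whose answer field resolves to an empty list hits exactly this error. A minimal illustration of the failure mode (the variable name comes from the traceback above; the empty list stands in for a row with a missing answer):

doc_target = []             # target list for a row whose answer field is missing
label = str(doc_target[0])  # IndexError: list index out of range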

Here is how I run the code:

MODEL=/path/to/Llama-2-7b-hf

accelerate launch --num_processes 1 -m lm_eval \
    --model hf \
    --model_args pretrained=$MODEL,trust_remote_code=True \
    --batch_size 16 \
    --verbosity DEBUG \
    --tasks agieval \
    --num_fewshot 3

Version: lm_eval 0.4.4

baberabb (Contributor) commented Sep 19, 2024

Hi! The dataset we are using is missing a fewshot split, so the test split is used for the fewshot samples, and it looks like one of the rows in agieval_jec_qa_kd is missing the answer field. We have logic to handle that when it's an evaluation question, but not when it's a fewshot example. I'll look into it!
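
Until a proper fix lands, one possible shape for it is a guard in lm_eval/api/samplers.py that skips fewshot documents whose target is empty instead of indexing into it. A rough sketch only (untested; the surrounding names are illustrative, not the exact 0.4.4 code):

# Sketch of a guard inside the sampler's get_context loop (names illustrative).
for fewshot_doc in selected_docs:
    doc_target = self.doc_to_target(fewshot_doc)
    # A row with a missing/empty answer field yields an empty list here;
    # skip it rather than indexing blindly with doc_target[0].
    if isinstance(doc_target, list) and len(doc_target) == 0:
        continue
    target_str = str(doc_target[0]) if isinstance(doc_target, list) else str(doc_target)
    labeled_examples += self.doc_to_text(fewshot_doc) + self.target_delimiter + target_str

One caveat: silently dropping a sample leaves fewer than num_fewshot examples in the context, so the actual fix would probably want to log a warning or resample a replacement document instead.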

baberabb added the bug and validation labels on Sep 23, 2024