PR fixing the issue #1391 (wrong contexts in the mgsm task) #1440

Merged: 6 commits merged on Feb 22, 2024


@leocnj (Contributor) commented Feb 18, 2024

The issue reported in #1391 has been fixed.

For example, for mgsm_direct_en, the new YAML file will produce the following prompt:

!!@@##@@!! -- Example 1
Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Answer:Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Question: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
Answer:There are 3 cars in the beginning, 2 more arrive, so now there should be 3 + 2 = 5 cars. The answer is 5.

Question: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?
Answer:

For mgsm_native_cot_zh, the new YAML file will produce the prompt below (问题 means "Question" and 逐步解答 means "Step-by-step answer"):

!!@@##@@!! -- Example 1
问题:罗杰有 5 个网球。他又买了 2 罐网球。每罐有 3 个网球。他现在有多少个网球?
逐步解答: 杰一开始有 5 个球。2 罐各 3 个网球就是 6 个网球。5 + 6 = 11。答案是 11。

问题:如果停车场里有 3 辆车,又来了 2 辆车,停车场里有多少辆车?
逐步解答: 开始有 3 辆车,又来了 2 辆,所以现在应该有 3 + 2 = 5 辆车。答案是 5。

问题:服务器机房里有九台电脑。从周一到周四,每天又安装了五台电脑。服务器机房里现在有多少台电脑?
逐步解答:
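The prompt layout above can be sketched roughly as follows. This is a simplified illustration, not lm-eval's implementation: `format_example` and `build_prompt` are hypothetical helpers, and only the names `target_delimiter` and `fewshot_delimiter` are borrowed from lm-eval's task configuration. It assumes each solved exemplar is rendered as question prefix + question, a newline, the answer prefix, the target delimiter, then the answer, with exemplars joined by a blank line and the test question left with an empty answer slot.

```python
from typing import List, Optional, Tuple


def format_example(question: str, q_prefix: str, a_prefix: str,
                   answer: Optional[str] = None, target_delimiter: str = " ") -> str:
    """Render one example; the final (test) question is left unanswered."""
    text = f"{q_prefix}{question}\n{a_prefix}"
    if answer is not None:
        text += target_delimiter + answer
    return text


def build_prompt(shots: List[Tuple[str, str]], question: str, q_prefix: str,
                 a_prefix: str, target_delimiter: str = " ",
                 fewshot_delimiter: str = "\n\n") -> str:
    """Join solved exemplars, then append the test question with no answer."""
    parts = [format_example(q, q_prefix, a_prefix, a, target_delimiter)
             for q, a in shots]
    parts.append(format_example(question, q_prefix, a_prefix))
    return fewshot_delimiter.join(parts)


if __name__ == "__main__":
    shots = [(
        "If there are 3 cars in the parking lot and 2 more cars arrive, "
        "how many cars are in the parking lot?",
        "There are 3 cars in the beginning, 2 more arrive, so now there "
        "should be 3 + 2 = 5 cars. The answer is 5.",
    )]
    test_question = (
        "There were nine computers in the server room. Five more computers "
        "were installed each day, from monday to thursday. How many "
        "computers are now in the server room?"
    )
    # An empty target_delimiter yields "Answer:There are ...", matching the
    # mgsm_direct_en output above.
    print(build_prompt(shots, test_question, "Question: ", "Answer:",
                       target_delimiter=""))
```

With the default single-space delimiter and the Chinese prefixes, the same helper yields the "逐步解答: ..." spacing shown for mgsm_native_cot_zh; the only difference between the two layouts is the delimiter value.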

- Renamed the files so that each file name matches its task name.
- Tasks and files follow a consistent naming scheme, mgsm_(mode)_(lang), for the three modes: direct, en_cot, and native_cot.
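A minimal sketch of this naming scheme, assuming the eleven MGSM language codes listed below and a `.yaml` extension for config files; `task_name`, `config_filename`, and `ALL_TASKS` are hypothetical names, not part of the repository.

```python
from typing import List

# Assumed mode and language lists; check the task directory for the real set.
MODES = ("direct", "en_cot", "native_cot")
LANGS = ("bn", "de", "en", "es", "fr", "ja", "ru", "sw", "te", "th", "zh")


def task_name(mode: str, lang: str) -> str:
    """Build a task name following mgsm_(mode)_(lang)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"mgsm_{mode}_{lang}"


def config_filename(mode: str, lang: str) -> str:
    """File name mirrors the task name, so one can be derived from the other."""
    return task_name(mode, lang) + ".yaml"


ALL_TASKS: List[str] = [task_name(m, l) for m in MODES for l in LANGS]

if __name__ == "__main__":
    print(task_name("native_cot", "zh"))
    print(len(ALL_TASKS), "tasks")
```

Keeping the file name identical to the task name means a task can be located on disk from its name alone, without a separate lookup table.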

@CLAassistant commented Feb 18, 2024

CLA assistant check: All committers have signed the CLA.

@haileyschoelkopf (Collaborator) left a comment:

Thanks very much for this PR! Looks good to me, modulo the one nit I had on the target delimiter.

Also flagging that we will want to revisit MGSM and apply to it some of the better answer extraction from GSM in #1356.
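As a rough illustration of what stricter answer extraction could look like for MGSM, here is a hedged sketch. It is not the code from #1356: `extract_answer` and both regular expressions are assumptions. It prefers the "The answer is N" phrasing used in the exemplars (and its Chinese counterpart 答案是 N), and otherwise falls back to the last number in the completion.

```python
import re
from typing import Optional

# Hypothetical patterns (not taken from #1356).
STRICT = re.compile(r"(?:The answer is|答案是)\s*(-?[\d,]*\.?\d+)")
NUMBER = re.compile(r"-?[\d,]*\.?\d+")


def extract_answer(completion: str) -> Optional[str]:
    """Return the predicted number as a string, or None if none is found."""
    match = STRICT.search(completion)
    if match:
        value = match.group(1)
    else:
        # Fallback: take the last number anywhere in the completion.
        numbers = NUMBER.findall(completion)
        if not numbers:
            return None
        value = numbers[-1]
    # Drop thousands separators so "1,200" compares equal to "1200".
    return value.replace(",", "")


if __name__ == "__main__":
    print(extract_answer("5 + 6 = 11. The answer is 11."))
```

The strict pattern rewards completions that follow the exemplar format, while the fallback keeps scoring from collapsing to zero when a model states the number differently.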

Review thread on lm_eval/tasks/mgsm/en_cot/cot_yaml: resolved.
@haileyschoelkopf added the bug label (Something isn't working.) on Feb 19, 2024.

@haileyschoelkopf (Collaborator) commented:

Thank you for this!

@haileyschoelkopf merged commit a72babb into EleutherAI:main on Feb 22, 2024 (7 of 8 checks passed).

@thnkinbtfly (Contributor) replied:

> Also flagging that we will want to re-look at MGSM to apply some of the better answer extraction from GSM in #1356 to it.

Working on it!

wx-zhang pushed a commit to wx-zhang/lm-evaluation-harness that referenced this pull request Mar 13, 2024
…EleutherAI#1440)

* fix the issue EleutherAI#1391, wrong contexts in mgsm tasks

* fix yaml issue for having two target_delimiter lines. For COT tasks, keep the one with a space (default)

* regenerate all task yaml files
- change naming so that file name will match with task name
- task|file follows a consistent naming way, mgsm_(mode)_(lang) for three modes, i.e., direct, en_cot, and native_cot

* English CoTs should have a space as target_delimiter

* Update utils.py

* Apply suggestions from code review

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>
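
The commit above mentions a YAML file that had two `target_delimiter` lines. Common YAML loaders (PyYAML, for instance) do not reject duplicate mapping keys and keep the last value, so a stray second line can silently override the intended delimiter. A minimal lint sketch for flat configs follows; `duplicate_top_level_keys` and `SAMPLE_CONFIG` are hypothetical, and the helper only inspects unindented `key: value` lines rather than parsing full YAML.

```python
from collections import Counter
from typing import List

# Hypothetical config reproducing the duplicated-key problem.
SAMPLE_CONFIG = '''\
task: mgsm_en_cot_en
target_delimiter: ""
generation_kwargs:
  do_sample: false
target_delimiter: " "
'''


def duplicate_top_level_keys(yaml_text: str) -> List[str]:
    """Return top-level keys that occur more than once (flat YAML only)."""
    keys = []
    for line in yaml_text.splitlines():
        # Skip blank lines, comments, list items, and indented (nested) lines.
        if not line.strip() or line.startswith(("#", " ", "\t", "-")):
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.append(key.strip())
    counts = Counter(keys)
    return [k for k, n in counts.items() if n > 1]


if __name__ == "__main__":
    print(duplicate_top_level_keys(SAMPLE_CONFIG))
```

Running such a check over generated task files before committing would catch the kind of duplicate that the second commit in this PR fixed by hand.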
nightingal3 pushed a commit to mycoalchen/lm-evaluation-harness that referenced this pull request May 2, 2024
djstrong pushed a commit to speakleash/lm-evaluation-harness that referenced this pull request Aug 2, 2024
Labels: bug (Something isn't working.)

4 participants