Add Open Arabic LLM Leaderboard Benchmarks (Full and Light Version) #2232

Malikeh97 · 2024-08-20T22:15:48Z

In this PR, we fully implemented and tested the following two task sets and added them to the lm-evaluation-harness repo.

- All the benchmarks under the Open Arabic LLM Leaderboard, following their lighteval implementation: Link
  - The tasks are under the `lm_eval/tasks/arabic_leaderboard_complete` folder.
- The light version of the Open Arabic LLM Leaderboard benchmark, with 10% of the train sets and the full validation sets, for quick evaluation of models.
  - The tasks are under the `lm_eval/tasks/arabic_leaderboard_light` folder.
Note: Further details on the groups, tasks, and subtasks are provided in a README file in each of the two folders.
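
For readers unfamiliar with the "light" idea, here is a minimal sketch of taking a reproducible 10% subset of a training split while keeping the validation split whole. It is an illustration only: the helper name `light_subset`, the seed, and the toy documents are assumptions, and how the actual light subsets in this PR were produced is not stated in this thread.

```python
import random


def light_subset(train_docs, fraction=0.10, seed=1234):
    """Return a reproducible sample covering `fraction` of train_docs.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not train_docs:
        return []
    # Keep at least one document so small sets never become empty.
    sample_size = max(1, round(len(train_docs) * fraction))
    rng = random.Random(seed)  # fixed seed => same subset on every run
    # Sample indices, then sort them so the original order is preserved.
    chosen = sorted(rng.sample(range(len(train_docs)), sample_size))
    return [train_docs[i] for i in chosen]


# Toy stand-ins for a benchmark's splits (not real benchmark data).
train = [f"train-doc-{i}" for i in range(200)]
validation = [f"val-doc-{i}" for i in range(50)]

light_train = light_subset(train)
light_validation = validation  # the validation split is kept in full
print(len(light_train), len(light_validation))
```

Fixing the seed keeps the subset identical across runs, which matters if scores from the light version are to be compared between models.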

Malikeh97 · 2024-08-27T15:11:18Z

> Hi, thanks very much for the substantial contribution!
>
> I've left a review; the main comments I have are that 1) the tasks should be more clearly named, and 2) we should be very clear about which tasks are machine-translated and which are not.
>
> (Perhaps using a prefix like `arabic_leaderboard` for these tasks' names is also desirable?)

Hi @haileyschoelkopf
Thanks for your constructive comments. My colleague @shahrzads addressed all the comments and fully tested both the light and full versions on sample models. It would be great if you could do a final review of the PR and let us know if anything is still missing.

Thanks for creating such a fantastic repo and it was our pleasure to contribute!

shahrzads · 2024-08-27T21:17:47Z

Hi @haileyschoelkopf,
Thank you for your thorough review and valuable feedback. We have carefully addressed all the comments and made the necessary adjustments to the code. We also tested both light and full versions on sample models. Please let us know if there are any further changes required. Otherwise, we believe the updates are ready for merge and look forward to your approval. Your prompt action would be greatly appreciated.

Thanks for the great repo @haileyschoelkopf, @lintangsutawika, and @baberabb. It was a pleasure to contribute to this amazing project!

haileyschoelkopf · 2024-08-28T14:29:40Z

Thank you very much @shahrzads @Malikeh97, I will review again today!

haileyschoelkopf

Hi @shahrzads @Malikeh97, I reviewed again! Just a few more requests re: standardizing task naming (sorry...), then we'd be good to go!

Also, would it be possible to run pre-commit on all changed files (`pip install pre-commit`, `pre-commit install`, then re-commit the changes)?
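
As a rough illustration of the naming request, a small check like the one below could flag task names that lack a shared prefix. This is only a sketch under assumptions: the function `missing_prefix` and the example names are hypothetical, and the `arabic_leaderboard_` prefix is taken from the suggestion earlier in this thread rather than from the final code.

```python
def missing_prefix(task_names, prefix="arabic_leaderboard_"):
    """Return the task names that do not start with `prefix`, sorted.

    Hypothetical helper for illustration; not part of lm-evaluation-harness.
    """
    return sorted(name for name in task_names if not name.startswith(prefix))


# Hypothetical names, for illustration only.
names = [
    "arabic_leaderboard_example_a",
    "example_b",
    "arabic_leaderboard_example_c",
]
print(missing_prefix(names))
```

An empty result would mean every name already follows the convention.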

lm_eval/tasks/README.md

lm_eval/tasks/arabic_leaderboard_complete/alghafa/alghafa.yaml

lm_eval/tasks/arabic_leaderboard_complete/README.md

Co-authored-by: Hailey Schoelkopf <[email protected]>

shahrzads · 2024-09-03T19:15:01Z

Hi @haileyschoelkopf,

Thanks a lot for the review and your great comments! I have addressed all the comments, including standardizing the task naming.

I have also run the pre-commit on all the changed files, as requested.

Additionally, to align the benchmark more closely with OALL (https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard) and lighteval (https://github.com/huggingface/lighteval), we added the Arabic mt_mmlu to the benchmark. The details have also been added to the README file. Please take a look and let me know if any further changes are needed.

Thanks again for your guidance!

shahrzads · 2024-09-09T06:29:44Z

Hi @haileyschoelkopf,

Hope you are doing well!

I would like to follow up on addressing your feedback and ask for your comment on the new changes. We would appreciate it if you can review the new version of the code and let us know your opinion.

haileyschoelkopf · 2024-09-09T12:48:42Z

Hi @shahrzads @Malikeh97, thank you for your hard work on this and sorry for the delayed approval! Ready to merge, conditional on tests passing.

Malikeh97 · 2024-09-10T21:14:52Z

Hi @haileyschoelkopf
Thanks for all the constructive comments! Our PR is ready to merge. However, I see 2 workflows awaiting approval. Is there anything that we should do on our end?

Best,
Malikeh

lintangsutawika · 2024-09-10T21:29:12Z

@Malikeh97 thanks for the PR!

…leutherAI#2232)

* arabic leaferboard yaml file is added
* arabic toxigen is implemented
* Dataset library is imported
* arabic sciq is added
* util file of arabic toxigen is updated
* arabic race is added
* arabic piqa is implemented
* arabic open qa is added
* arabic copa is implemented
* arabic boolq ia added
* arabic arc easy is added
* arabic arc challenge is added
* arabic exams benchmark is implemented
* arabic hellaswag is added
* arabic leaderboard yaml file metrics are updated
* arabic mmlu benchmarks are added
* arabic mmlu group yaml file is updated
* alghafa benchmarks are added
* acva benchmarks are added
* acva utils.py is updated
* light version of arabic leaderboard benchmarks are added
* bugs fixed
* bug fixed
* bug fixed
* bug fixed
* bug fixed
* bug fixed
* library import bug is fixed
* doc to target updated
* bash file is deleted
* results folder is deleted
* leaderboard groups are added
* full arabic leaderboard groups are added, plus some bug fixes to the light version
* Create README.md README.md for arabic_leaderboard_complete
* Create README.md README.md for arabic_leaderboard_light
* Delete lm_eval/tasks/arabic_leaderboard directory
* Update README.md
* Update README.md adding the Arabic leaderboards to the library
* Update README.md 10% of the training set
* Update README.md 10% of the training set
* revert .gitignore to prev version
* Update lm_eval/tasks/README.md Co-authored-by: Hailey Schoelkopf <[email protected]>
* updated main README.md
* Update lm_eval/tasks/README.md
* specify machine translated benchmarks (complete)
* specify machine translated benchmarks (light version)
* add alghafa to the related task names (complete and light)
* add 'acva' to the related task names (complete and light)
* add 'arabic_leaderboard' to all the groups (complete and light)
* all dataset - not a random sample
* added more accurate details to the readme file
* added mt_mmlu from okapi
* Update lm_eval/tasks/README.md Co-authored-by: Hailey Schoelkopf <[email protected]>
* Update lm_eval/tasks/README.md Co-authored-by: Hailey Schoelkopf <[email protected]>
* updated mt_mmlu readme
* renaming 'alghafa' full and light
* renaming 'arabic_mmlu' light and full
* renaming 'acva' full and light
* update readme and standardize dir/file names
* running pre-commit

---------

Co-authored-by: shahrzads <[email protected]>
Co-authored-by: shahrzads <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>

Malikeh97 added 30 commits August 12, 2024 20:56

arabic leaferboard yaml file is added

ef82779

arabic toxigen is implemented

09d428f

Dataset library is imported

8d78ff4

arabic sciq is added

53f107b

util file of arabic toxigen is updated

07396a2

arabic race is added

71dfe8b

arabic piqa is implemented

71534a3

arabic open qa is added

6118a55

arabic copa is implemented

49e4011

arabic boolq ia added

38ff060

arabic arc easy is added

9720eb5

arabic arc challenge is added

045b383

arabic exams benchmark is implemented

850dca3

arabic hellaswag is added

d5c2e55

arabic leaderboard yaml file metrics are updated

d041ae9

arabic mmlu benchmarks are added

1c63080

arabic mmlu group yaml file is updated

5feec66

alghafa benchmarks are added

27f07d4

acva benchmarks are added

10b758f

acva utils.py is updated

40fe782

light version of arabic leaderboard benchmarks are added

c3396db

bugs fixed

b48bc10

bug fixed

38bd735

bug fixed

860b0dd

bug fixed

d92c684

bug fixed

6f3f692

bug fixed

89c5e31

library import bug is fixed

634ac08

doc to target updated

821e101

bash file is deleted

1992457

shahrzads and others added 5 commits August 26, 2024 12:14

Merge branch 'EleutherAI:main' into arabic-leaderboard

0a4adb9

add alghafa to the related task names (complete and light)

7151859

add 'acva' to the related task names (complete and light)

e65f546

add 'arabic_leaderboard' to all the groups (complete and light)

3c0d737

all dataset - not a random sample

7d4d0e9

added more accurate details to the readme file

8c24721

haileyschoelkopf requested changes Aug 28, 2024

shahrzads and others added 11 commits August 28, 2024 21:36

added mt_mmlu from okapi

7004776

Merge branch 'EleutherAI:main' into arabic-leaderboard

fdaa3c1

Update lm_eval/tasks/README.md

d0bcbed

Co-authored-by: Hailey Schoelkopf <[email protected]>

Update lm_eval/tasks/README.md

547ce12

Co-authored-by: Hailey Schoelkopf <[email protected]>

updated mt_mmlu readme

014473c

renaming 'alghafa' full and light

c7aa2ff

Merge branch 'EleutherAI:main' into arabic-leaderboard

b051008

renaming 'arabic_mmlu' light and full

7c641b7

renaming 'acva' full and light

e0f9fd2

update readme and standardize dir/file names

b0366af

running pre-commit

33237fe

haileyschoelkopf approved these changes Sep 9, 2024

Merge branch 'EleutherAI:main' into arabic-leaderboard

f590408

lintangsutawika merged commit decc533 into EleutherAI:main Sep 10, 2024
7 of 9 checks passed
