Contamination analysis for MMLU, Hellaswag, and ARC_c #699
By the way, I am submitting a paper on data contamination to NAACL, where the deadline is 15 Dec. I have finished experiments on all Llama models (see table below), but the paper would be stronger with analysis of more LLMs. For example, we could add multilingual LLMs (Qwen, Yi, Baichuan) on CEval, or more recent models (Mistral, etc.). If you could provide that kind of data, it would be really helpful, and I would love to add you as co-authors.
Found a bug in the contamination annotation on the Contamination Detector side; it is now fixed. If you have already run contamination analysis for MMLU, please clear the cache file.
@Leymore Hi, how was your holiday? Do we have any updates on this PR?
LGTM
* Contamination analysis for ARC_c, mmlu, and Hellaswag
* update `eval_contamination.py`
* update `contamination.py` summarizer
* fix `eval_contamination.py`
* add mmlu groups for contamination analysis
This PR extends contamination analysis to MMLU, Hellaswag, and ARC_c.
Now OpenCompass should be able to conduct contamination analysis on CEval, MMLU, Hellaswag, and ARC_c.
Contamination Detector also supports Winogrande and CommonsenseQA, but since they have very few contaminated samples, I don't think it's necessary to add them to OpenCompass.
See the example results below:
Note that for ARC_c, contamination detection was done on the `test` set, but the default split of ARC_c in OpenCompass is `dev`. So ARC_c may need to be rerun on `test` to complete the data contamination analysis.
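To illustrate why the split matters, here is a minimal sketch of a per-group accuracy breakdown. It is not the code in this PR: the function name `accuracy_by_contamination`, the sample IDs, and the label strings are all hypothetical, and it assumes each annotated sample carries one contamination label. If predictions come from a different split than the annotations, the IDs will not line up and the breakdown cannot be computed.

```python
from collections import defaultdict


def accuracy_by_contamination(correct, annotations):
    """Return accuracy (in percent) per contamination label.

    correct:     dict mapping sample id -> bool (was the prediction right?)
    annotations: dict mapping sample id -> contamination label (hypothetical labels)
    """
    # Predictions from another split (e.g. dev instead of test) have no annotation.
    missing = set(correct) - set(annotations)
    if missing:
        raise ValueError(f"{len(missing)} predictions have no contamination annotation")

    totals = defaultdict(int)
    hits = defaultdict(int)
    for sample_id, is_correct in correct.items():
        label = annotations[sample_id]
        totals[label] += 1
        hits[label] += int(is_correct)

    return {label: 100.0 * hits[label] / totals[label] for label in totals}


# Toy data, invented purely for illustration.
correct = {"q1": True, "q2": False, "q3": True, "q4": True}
annotations = {
    "q1": "clean",
    "q2": "clean",
    "q3": "input contamination",
    "q4": "input-and-label contamination",
}
scores = accuracy_by_contamination(correct, annotations)
for label, acc in scores.items():
    print(f"{label}: {acc:.1f}")
```

Raising on unannotated predictions, rather than silently skipping them, makes a `dev`/`test` mismatch visible immediately instead of producing a partial breakdown.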