Contamination analysis for MMLU, Hellaswag, and ARC_c #699

Merged (5 commits) on Jan 8, 2024

Conversation

liyucheng09
Contributor

This PR extends contamination analysis to MMLU, Hellaswag, and ARC_c.

With this change, OpenCompass should be able to run contamination analysis on CEval, MMLU, Hellaswag, and ARC_c.

Contamination Detector also supports Winogrande and CommonsenseQA, but since they have very few contaminated samples, I don't think it's necessary to add them to OpenCompass.

See the example results below:

dataset    version  mode  model                 accuracy - clean  accuracy - input contaminated  accuracy - input-and-label contaminated
---------  -------  ----  --------------------  ----------------  -----------------------------  ---------------------------------------
mmlu       -        ppl   baichuan2-7b-base-hf  56.76             44.69                          54.93
mmlu       -        ppl   qwen-7b-hf            58.74             48.67                          58.28
mmlu       -        ppl   llama_30b_autogptq    57.46             45.72                          57.16
hellaswag  47bff9   ppl   baichuan2-7b-base-hf  66.87             57.14                          70.97
hellaswag  47bff9   ppl   qwen-7b-hf            86.42             89.29                          90.88
hellaswag  47bff9   ppl   llama_30b_autogptq    76.71             57.14                          82.37
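
To make the table easier to read, here is a minimal sketch that computes how far each contaminated subset's accuracy is from the clean subset for the same model. The numbers are copied from the table above; the `RESULTS` layout and the `gaps_vs_clean` helper are illustrative only and are not part of OpenCompass or its summarizer.

```python
# Accuracy values copied from the example results above.
# Tuple order: (clean, input contaminated, input-and-label contaminated).
# This dictionary layout is an illustrative assumption, not an OpenCompass format.
RESULTS = {
    "mmlu": {
        "baichuan2-7b-base-hf": (56.76, 44.69, 54.93),
        "qwen-7b-hf": (58.74, 48.67, 58.28),
        "llama_30b_autogptq": (57.46, 45.72, 57.16),
    },
    "hellaswag": {
        "baichuan2-7b-base-hf": (66.87, 57.14, 70.97),
        "qwen-7b-hf": (86.42, 89.29, 90.88),
        "llama_30b_autogptq": (76.71, 57.14, 82.37),
    },
}


def gaps_vs_clean(clean, input_contaminated, input_and_label_contaminated):
    """Return each contaminated subset's accuracy minus the clean accuracy."""
    return (
        round(input_contaminated - clean, 2),
        round(input_and_label_contaminated - clean, 2),
    )


for dataset, models in RESULTS.items():
    for model, scores in models.items():
        input_gap, label_gap = gaps_vs_clean(*scores)
        print(
            f"{dataset:10s} {model:22s} "
            f"input: {input_gap:+.2f}  input-and-label: {label_gap:+.2f}"
        )
```

A positive gap means the model scores higher on that contaminated subset than on the clean one. The subsets can differ in size and difficulty, so the gap alone is only a rough signal.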

Note that for ARC_c, contamination detection was done on the test set, but the default split for ARC_c in OpenCompass is dev. So ARC_c may need to be rerun on the test split to complete the contamination analysis.

@liyucheng09
Contributor Author

By the way, I am submitting a paper on data contamination to NAACL; the deadline is 15 Dec.

I have finished experiments on all Llama models (see the table below), but the paper would be stronger with an analysis of more LLMs.

image

For example, we could add multilingual LLMs (Qwen, Yi, Baichuan) on CEval, or more recent models (Mistral, etc.).

If you could provide this kind of data, it would be really helpful. I would love to add you as co-authors.

Let me know what you think. @tonysy @Leymore

@liyucheng09
Contributor Author

Found a bug in the contamination annotations on the Contamination Detector side. Fixed.

If you have already run contamination analysis for MMLU, please delete the cache file `data/mmlu/test/MMLU_test_contamination_annotations.json` and try again.
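
For convenience, a minimal sketch of clearing that cache with the Python standard library. The `clear_annotation_cache` helper is hypothetical, not an OpenCompass function, and the relative path assumes the script is run from the directory that contains `data/`.

```python
from pathlib import Path


def clear_annotation_cache(cache_path):
    """Delete the cached annotation file if present; return True if a file was removed."""
    path = Path(cache_path)
    if path.is_file():
        path.unlink()
        return True
    return False


if __name__ == "__main__":
    # Path taken from the comment above; assumed to be relative to the working directory.
    removed = clear_annotation_cache(
        "data/mmlu/test/MMLU_test_contamination_annotations.json"
    )
    print("removed stale cache" if removed else "no cache file found")
```

After the file is removed, rerunning the analysis should regenerate the annotations with the fix.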

@liyucheng09
Contributor Author

@Leymore Hi, how was your holiday? Do we have any updates on this PR?

@yingfhu yingfhu assigned Leymore and unassigned yingfhu Jan 8, 2024
Contributor

@Leymore Leymore left a comment

LGTM

@Leymore Leymore merged commit 0b28630 into open-compass:main Jan 8, 2024
7 checks passed
Leymore pushed a commit that referenced this pull request Jan 8, 2024
* Contamination analysis for ARC_c, mmlu, and Hellaswag

* update `eval_contamination.py`

* update `contamination.py` summarizer

* fix `eval_contamination.py`

* add mmlu groups for contamination analysis
liuyaox pushed a commit to liuyaox/opencompass that referenced this pull request Jun 26, 2024
…-compass#699)