Merge branch 'main' into zfz/prompt-docs

open-compass · Jul 28, 2023 · 9659468
2 parents 21801e4 + 538b439
Showing 106 changed files with 2,474 additions and 692 deletions.
diff --git a/.codespellrc b/.codespellrc
@@ -2,4 +2,4 @@
 skip = *.ipynb
 count =
 quiet-level = 3
-ignore-words-list = nd, ans, ques
+ignore-words-list = nd, ans, ques, rouge
diff --git a/.github/ISSUE_TEMPLATE/1_bug-report.yml b/.github/ISSUE_TEMPLATE/1_bug-report.yml
@@ -6,6 +6,7 @@ body:
   - type: markdown
     attributes:
       value: |
+        For general questions or idea discussions, please post them to our [**Forum**](https://github.com/InternLM/opencompass/discussions).
         If you have already identified the reason, we strongly appreciate you creating a new PR according to [the tutorial](https://opencompass.readthedocs.io/en/master/community/CONTRIBUTING.html)!
         If you need our help, please fill in the following form to help us to identify the bug.
 

diff --git a/.github/ISSUE_TEMPLATE/2_feature-request.yml b/.github/ISSUE_TEMPLATE/2_feature-request.yml
@@ -6,6 +6,7 @@ body:
   - type: markdown
     attributes:
       value: |
+        For general questions or idea discussions, please post them to our [**Forum**](https://github.com/InternLM/opencompass/discussions).
         If you have already implemented the feature, we strongly appreciate you creating a new PR according to [the tutorial](https://opencompass.readthedocs.io/en/master/community/CONTRIBUTING.html)!
 
   - type: textarea

diff --git a/.github/ISSUE_TEMPLATE/3_bug-report_zh.yml b/.github/ISSUE_TEMPLATE/3_bug-report_zh.yml
@@ -7,7 +7,7 @@ body:
     attributes:
       value: |
         我们推荐使用英语模板 Bug report，以便你的问题帮助更多人。
-
+        如果需要询问一般性的问题或者想法，请在我们的[**论坛**](https://github.com/InternLM/opencompass/discussions)讨论。
         如果你已经有了解决方案，我们非常欢迎你直接创建一个新的 PR 来解决这个问题。创建 PR 的流程可以参考[文档](https://opencompass.readthedocs.io/zh_CN/master/community/CONTRIBUTING.html)。
         如果你需要我们的帮助，请填写以下内容帮助我们定位 Bug。
 

diff --git a/.github/ISSUE_TEMPLATE/4_feature-request_zh.yml b/.github/ISSUE_TEMPLATE/4_feature-request_zh.yml
@@ -7,7 +7,7 @@ body:
     attributes:
       value: |
         推荐使用英语模板 Feature request，以便你的问题帮助更多人。
-
+        如果需要询问一般性的问题或者想法，请在我们的[**论坛**](https://github.com/InternLM/opencompass/discussions)讨论。
         如果你已经实现了该功能，我们非常欢迎你直接创建一个新的 PR 来解决这个问题。创建 PR 的流程可以参考[文档](https://opencompass.readthedocs.io/zh_CN/master/community/CONTRIBUTING.html)。
 
   - type: textarea

diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml
@@ -0,0 +1,23 @@
+name: lint
+
+on: [push, pull_request]
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python 3.10
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+      - name: Install pre-commit hook
+        run: |
+          pip install pre-commit
+          pre-commit install
+      - name: Linting
+        run: pre-commit run --all-files
diff --git a/.gitignore b/.gitignore
@@ -82,3 +82,6 @@ instance/
 # Auto generate documentation
 docs/en/_build/
 docs/zh_cn/_build/
+
+# .zip
+*.zip
diff --git a/.owners.yml b/.owners.yml
@@ -0,0 +1,14 @@
+assign:
+  issues: enabled
+  pull_requests: disabled
+  strategy:
+    # random
+    daily-shift-based
+  scedule:
+    '*/1 * * * *'
+  assignees:
+    - Leymore
+    - gaotongxiao
+    - yingfhu
+    - Ezra-Yu
+    - tonysy
diff --git a/README.md b/README.md
@@ -5,21 +5,31 @@
 
 [![docs](https://readthedocs.org/projects/opencompass/badge)](https://opencompass.readthedocs.io/en)
 [![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](https://github.com/InternLM/opencompass/blob/main/LICENSE)
+
 <!-- [![PyPI](https://badge.fury.io/py/opencompass.svg)](https://pypi.org/project/opencompass/) -->
 
 [🌐Website](https://opencompass.org.cn/) |
 [📘Documentation](https://opencompass.readthedocs.io/en/latest/) |
-[🛠️Installation](https://opencompass.readthedocs.io/en/latest/get_started/install.html) |
+[🛠️Installation](https://opencompass.readthedocs.io/en/latest/get_started.html#installation) |
 [🤔Reporting Issues](https://github.com/InternLM/opencompass/issues/new/choose)
 
 English | [简体中文](README_zh-CN.md)
 
 </div>
 
+<p align="center">
+    👋 Join us on <a href="https://twitter.com/intern_lm" target="_blank">Twitter</a>, <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=internwx" target="_blank">WeChat</a>
+</p>
+
 Welcome to **OpenCompass**!
 
 Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.
 
+## News
+
+- **\[2023.07.19\]** We now support [Llama 2](https://ai.meta.com/llama/)! Its performance report will be available soon. \[[doc](./docs/en/get_started.md#Installation)\]
+- **\[2023.07.13\]** We released [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset for comprehensively evaluating the abilities of multimodal models 🔥🔥🔥.
+
 ## Introduction
 
 OpenCompass is a one-stop platform for large model evaluation that aims to provide a fair, open, and reproducible benchmark. Its main features include:
@@ -281,10 +291,10 @@ We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for commun
 
 ## Installation
 
-Below are the steps for quick installation. Some third-party features may require additional steps to work properly, for detailed steps please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started.html).
+Below are the steps for quick installation and dataset preparation.
 
 ```Python
-conda create --name opencompass python=3.8 pytorch torchvision -c pytorch -y
+conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
 conda activate opencompass
 git clone https://github.com/InternLM/opencompass opencompass
 cd opencompass
@@ -294,9 +304,13 @@ wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompass
 unzip OpenCompassData.zip
 ```
 
+Some third-party features, such as HumanEval and Llama, may require additional steps to work properly. For details, please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started.html).
+
 ## Evaluation
 
-Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html) to learn how to run an evaluation task.
+Make sure you have installed OpenCompass correctly and prepared your datasets according to the above steps. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html#quick-start) to learn how to run an evaluation task.
+
+For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/en/latest/index.html).
 
 ## Acknowledgements
 

diff --git a/README_zh-CN.md b/README_zh-CN.md
@@ -5,21 +5,31 @@
 
 [![docs](https://readthedocs.org/projects/opencompass/badge)](https://opencompass.readthedocs.io/zh_CN)
 [![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](https://github.com/InternLM/opencompass/blob/main/LICENSE)
+
 <!-- [![PyPI](https://badge.fury.io/py/opencompass.svg)](https://pypi.org/project/opencompass/) -->
 
 [🌐Website](https://opencompass.org.cn/) |
-[📘Documentation](https://opencompass.readthedocs.io/en/latest/) |
-[🛠️Installation](https://opencompass.readthedocs.io/en/latest/get_started/install.html) |
+[📘Documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html) |
+[🛠️Installation](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id1) |
 [🤔Reporting Issues](https://github.com/InternLM/opencompass/issues/new/choose)
 
 [English](/README.md) | 简体中文
 
 </div>
 
+<p align="center">
+    👋 加入我们的<a href="https://twitter.com/intern_lm" target="_blank">推特</a>、<a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> 和 <a href="https://r.vansin.top/?r=internwx" target="_blank">微信社区</a>
+</p>
+
 欢迎来到OpenCompass！
 
 就像指南针在我们的旅程中为我们导航一样，我们希望OpenCompass能够帮助你穿越评估大型语言模型的重重迷雾。OpenCompass提供丰富的算法和功能支持，期待OpenCompass能够帮助社区更便捷地对NLP模型的性能进行公平全面的评估。
 
+## 更新
+
+- **\[2023.07.19\]** 新增了 [Llama 2](https://ai.meta.com/llama/)！我们近期将会公布其评测结果。\[[文档](./docs/zh_cn/get_started.md#安装)\]
+- **\[2023.07.13\]** 发布了 [MMBench](https://opencompass.org.cn/MMBench)，该数据集经过细致整理，用于评测多模态模型全方位能力 🔥🔥🔥。
+
 ## 介绍
 
 OpenCompass 是面向大模型评测的一站式平台。其主要特点如下：
@@ -279,12 +289,12 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
   </tbody>
 </table>
 
-# 安装
+## 安装
 
-下面展示了快速安装的步骤。有部分第三方功能可能需要额外步骤才能正常运行，详细步骤请参考[安装指南](https://opencompass.readthedocs.io/zh_cn/latest/get_started.html)。
+下面展示了快速安装以及准备数据集的步骤。
 
 ```Python
-conda create --name opencompass python=3.8 pytorch torchvision -c pytorch -y
+conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
 conda activate opencompass
 git clone https://github.com/InternLM/opencompass opencompass
 cd opencompass
@@ -294,9 +304,13 @@ wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompass
 unzip OpenCompassData.zip
 ```
 
+有部分第三方功能,如 Humaneval 以及 Llama,可能需要额外步骤才能正常运行，详细步骤请参考[安装指南](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html)。
+
 ## 评测
 
-请阅读[快速上手](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id2)了解如何运行一个评测任务。
+确保按照上述步骤正确安装 OpenCompass 并准备好数据集后，请阅读[快速上手](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3)了解如何运行一个评测任务。
+
+更多教程请查看我们的[文档](https://opencompass.readthedocs.io/zh_CN/latest/index.html)。
 
 ## 致谢
 

diff --git a/configs/datasets/FewCLUE_chid/FewCLUE_chid_ppl_8f2872.py b/configs/datasets/FewCLUE_chid/FewCLUE_chid_ppl_8f2872.py
@@ -21,7 +21,7 @@
     retriever=dict(type=ZeroRetriever),
     inferencer=dict(type=PPLInferencer))
 
-chid_eval_cfg = dict(evaluator=dict(type=AccEvaluator), pred_role="BOT")
+chid_eval_cfg = dict(evaluator=dict(type=AccEvaluator))
 
 chid_datasets = [
     dict(

diff --git a/configs/datasets/cmmlu/cmmlu_gen.py b/configs/datasets/cmmlu/cmmlu_gen.py
@@ -0,0 +1,4 @@
+from mmengine.config import read_base
+
+with read_base():
+    from .cmmlu_gen_ffe7c0 import cmmlu_datasets  # noqa: F401, F403
diff --git a/configs/datasets/cmmlu/cmmlu_gen_ffe7c0.py b/configs/datasets/cmmlu/cmmlu_gen_ffe7c0.py
@@ -0,0 +1,122 @@
+from opencompass.openicl.icl_prompt_template import PromptTemplate
+from opencompass.openicl.icl_retriever import FixKRetriever
+from opencompass.openicl.icl_inferencer import GenInferencer
+from opencompass.openicl.icl_evaluator import AccEvaluator
+from opencompass.datasets import CMMLUDataset
+from opencompass.utils.text_postprocessors import first_capital_postprocess
+
+cmmlu_subject_mapping = {
+    'agronomy': '农学',
+    'anatomy': '解剖学',
+    'ancient_chinese': '古汉语',
+    'arts': '艺术学',
+    'astronomy': '天文学',
+    'business_ethics': '商业伦理',
+    'chinese_civil_service_exam': '中国公务员考试',
+    'chinese_driving_rule': '中国驾驶规则',
+    'chinese_food_culture': '中国饮食文化',
+    'chinese_foreign_policy': '中国外交政策',
+    'chinese_history': '中国历史',
+    'chinese_literature': '中国文学',
+    'chinese_teacher_qualification': '中国教师资格',
+    'clinical_knowledge': '临床知识',
+    'college_actuarial_science': '大学精算学',
+    'college_education': '大学教育学',
+    'college_engineering_hydrology': '大学工程水文学',
+    'college_law': '大学法律',
+    'college_mathematics': '大学数学',
+    'college_medical_statistics': '大学医学统计',
+    'college_medicine': '大学医学',
+    'computer_science': '计算机科学',
+    'computer_security': '计算机安全',
+    'conceptual_physics': '概念物理学',
+    'construction_project_management': '建设工程管理',
+    'economics': '经济学',
+    'education': '教育学',
+    'electrical_engineering': '电气工程',
+    'elementary_chinese': '小学语文',
+    'elementary_commonsense': '小学常识',
+    'elementary_information_and_technology': '小学信息技术',
+    'elementary_mathematics': '初等数学',
+    'ethnology': '民族学',
+    'food_science': '食品科学',
+    'genetics': '遗传学',
+    'global_facts': '全球事实',
+    'high_school_biology': '高中生物',
+    'high_school_chemistry': '高中化学',
+    'high_school_geography': '高中地理',
+    'high_school_mathematics': '高中数学',
+    'high_school_physics': '高中物理学',
+    'high_school_politics': '高中政治',
+    'human_sexuality': '人类性行为',
+    'international_law': '国际法学',
+    'journalism': '新闻学',
+    'jurisprudence': '法理学',
+    'legal_and_moral_basis': '法律与道德基础',
+    'logical': '逻辑学',
+    'machine_learning': '机器学习',
+    'management': '管理学',
+    'marketing': '市场营销',
+    'marxist_theory': '马克思主义理论',
+    'modern_chinese': '现代汉语',
+    'nutrition': '营养学',
+    'philosophy': '哲学',
+    'professional_accounting': '专业会计',
+    'professional_law': '专业法学',
+    'professional_medicine': '专业医学',
+    'professional_psychology': '专业心理学',
+    'public_relations': '公共关系',
+    'security_study': '安全研究',
+    'sociology': '社会学',
+    'sports_science': '体育学',
+    'traditional_chinese_medicine': '中医中药',
+    'virology': '病毒学',
+    'world_history': '世界历史',
+    'world_religions': '世界宗教'
+}
+
+
+cmmlu_all_sets = list(cmmlu_subject_mapping.keys())
+
+cmmlu_datasets = []
+for _name in cmmlu_all_sets:
+    _ch_name = cmmlu_subject_mapping[_name]
+    cmmlu_infer_cfg = dict(
+        ice_template=dict(
+            type=PromptTemplate,
+            template=dict(
+                begin="</E>",
+                round=[
+                    dict(
+                        role="HUMAN",
+                        prompt=
+                        f"以下是关于{_ch_name}的单项选择题，请直接给出正确答案的选项。\n题目：{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}"
+                    ),
+                    dict(role="BOT", prompt='答案是: {answer}'),
+                ]),
+            ice_token="</E>",
+        ),
+        retriever=dict(type=FixKRetriever),
+        inferencer=dict(type=GenInferencer, fix_id_list=[0, 1, 2, 3, 4]),
+    )
+
+    cmmlu_eval_cfg = dict(
+        evaluator=dict(type=AccEvaluator),
+        pred_postprocessor=dict(type=first_capital_postprocess))
+
+    cmmlu_datasets.append(
+        dict(
+            type=CMMLUDataset,
+            path="./data/cmmlu/",
+            name=_name,
+            abbr=f"cmmlu-{_name}",
+            reader_cfg=dict(
+                input_columns=["question", "A", "B", "C", "D"],
+                output_column="answer",
+                train_split="dev",
+                test_split='test'),
+            infer_cfg=cmmlu_infer_cfg,
+            eval_cfg=cmmlu_eval_cfg,
+        ))
+
+del _name, _ch_name
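
The prompt in `cmmlu_gen_ffe7c0.py` mixes single and doubled braces inside an f-string, which is easy to misread in a diff. The sketch below is a minimal, standalone illustration of that pattern and does not import OpenCompass. `build_prompt_template` and `first_capital` are hypothetical helpers, and the sample row is made up. `first_capital` only assumes what the name `first_capital_postprocess` suggests (take the first uppercase character of the model output); the real implementation may differ.

```python
# Hypothetical, self-contained sketch; nothing here is imported from OpenCompass.

# Two entries copied from cmmlu_subject_mapping in the diff above.
subject_mapping = {"anatomy": "解剖学", "astronomy": "天文学"}


def build_prompt_template(ch_name: str) -> str:
    """Mirror the f-string in the config.

    {ch_name} is substituted immediately, while doubled braces such as
    {{question}} survive as literal {question} placeholders that a later
    step can fill from the dataset columns.
    """
    return (f"以下是关于{ch_name}的单项选择题，请直接给出正确答案的选项。\n"
            f"题目：{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}")


def first_capital(text: str) -> str:
    """Assumed postprocessing: return the first uppercase character, or ''."""
    for ch in text:
        if ch.isupper():
            return ch
    return ""


# Made-up row using the input columns named in reader_cfg.
row = {"question": "人体最大的器官是什么？",
       "A": "心脏", "B": "皮肤", "C": "肝脏", "D": "肺"}

template = build_prompt_template(subject_mapping["anatomy"])
prompt = template.format(**row)            # fill the per-question placeholders
prediction = first_capital("答案是: B")     # reduce a generated answer to a letter

print(prompt)
print(prediction)
```

Under this assumption, Chinese characters have no case, so they are skipped and only the option letter is kept, which is what lets an accuracy metric compare the prediction with the `answer` column.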
diff --git a/configs/datasets/cmmlu/cmmlu_ppl.py b/configs/datasets/cmmlu/cmmlu_ppl.py
@@ -0,0 +1,4 @@
+from mmengine.config import read_base
+
+with read_base():
+    from .cmmlu_ppl_fd1f2f import cmmlu_datasets  # noqa: F401, F403