diff --git a/README.md b/README.md index d30fa18a8..ffcca3fd4 100644 --- a/README.md +++ b/README.md @@ -70,6 +70,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through ## 🚀 What's New +- **\[2024.09.05\]** We now support answer extraction through model post-processing to provide a more accurate representation of the model's capabilities. As part of this update, we have integrated [XFinder](https://github.com/IAAR-Shanghai/xFinder) as our first post-processing model. For more detailed information, please refer to the [documentation](opencompass/utils/postprocessors/xfinder/README.md), and give it a try! 🔥🔥🔥 - **\[2024.08.20\]** OpenCompass now supports the [SciCode](https://github.com/scicode-bench/SciCode): A Research Coding Benchmark Curated by Scientists. 🔥🔥🔥 - **\[2024.08.16\]** OpenCompass now supports the brand new long-context language model evaluation benchmark — [RULER](https://arxiv.org/pdf/2404.06654). RULER provides an evaluation of long-context including retrieval, multi-hop tracing, aggregation, and question answering through flexible configurations. Check out the [RULER](configs/datasets/ruler/README.md) evaluation config now! 🔥🔥🔥 - **\[2024.08.09\]** We have released the example data and configuration for the CompassBench-202408, welcome to [CompassBench](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/compassbench_intro.html) for more details. 🔥🔥🔥 diff --git a/README_zh-CN.md b/README_zh-CN.md index 7c3a7a9e3..20a131b3d 100644 --- a/README_zh-CN.md +++ b/README_zh-CN.md @@ -69,6 +69,7 @@ ## 🚀 最新进展 +- **\[2024.09.05\]** OpenCompass 现在支持通过模型后处理来进行答案提取,以更准确地展示模型的能力。作为此次更新的一部分,我们集成了 [XFinder](https://github.com/IAAR-Shanghai/xFinder) 作为首个后处理模型。具体信息请参阅 [文档](opencompass/utils/postprocessors/xfinder/README.md),欢迎尝试! 🔥🔥🔥 - **\[2024.08.20\]** OpenCompass 现已支持 [SciCode](https://github.com/scicode-bench/SciCode): A Research Coding Benchmark Curated by Scientists。 🔥🔥🔥 - **\[2024.08.16\]** OpenCompass 现已支持全新的长上下文语言模型评估基准——[RULER](https://arxiv.org/pdf/2404.06654)。RULER 通过灵活的配置,提供了对长上下文包括检索、多跳追踪、聚合和问答等多种任务类型的评测,欢迎访问[RULER](configs/datasets/ruler/README.md)。🔥🔥🔥 - **\[2024.07.23\]** 我们支持了[Gemma2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)模型,欢迎试用!🔥🔥🔥 diff --git a/opencompass/utils/postprocessors/xfinder/README.md b/opencompass/utils/postprocessors/xfinder/README.md index bb32381da..142092ddf 100644 --- a/opencompass/utils/postprocessors/xfinder/README.md +++ b/opencompass/utils/postprocessors/xfinder/README.md @@ -170,3 +170,25 @@ We have tested the model postprocess method with XFinder model on the GSM8K, MML | gsm8k | math | gsm8k_xfinder_gen_a58960 | 73.46 | 78.09 | | nq | short_text | nq_xfinder_gen_3dcea1 | 22.33 | 37.53 | | mmlu | alphabet_option | mmlu_xfinder_gen_4d595a | 67.89 | 67.93 | + +## 🖊️ Citation + +```bibtex +@misc{2023opencompass, + title={OpenCompass: A Universal Evaluation Platform for Foundation Models}, + author={OpenCompass Contributors}, + howpublished = {\url{https://github.com/open-compass/opencompass}}, + year={2023} +} + +@misc{yu2024xfinderrobustpinpointanswer, + title={xFinder: Robust and Pinpoint Answer Extraction for Large Language Models}, + author={Qingchen Yu and Zifan Zheng and Shichao Song and Zhiyu Li and Feiyu Xiong and Bo Tang and Ding Chen}, + year={2024}, + eprint={2405.11874}, + archivePrefix={arXiv}, + primaryClass={cs.CL}, + url={https://arxiv.org/abs/2405.11874}, +} + +```