Releases: open-compass/opencompass
0.3.5
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.5!
🌟 Highlights
- 🚀 Introduction of two new datasets, CMO and AIME, expanding our evaluation capabilities.
- 📖 Several updates to our documentation, ensuring clearer guidance for all users.
- ⚙ Several enhancements and refactoring efforts to make our codebase more robust and maintainable.
🚀 New Features
- 🆕 Added support for the CMO and AIME datasets, broadening the range of tasks we can evaluate. (#1610)
- 🆕 Introduced the `CompassArenaSubjectiveBench`, a new benchmark for subjective evaluations. (#1645)
- 🆕 Added lmdeploy configurations for the DeepSeek model. (#1656)
📖 Documentation
- 📚 Updated the documentation to reflect the latest changes and improvements, making it easier than ever to navigate and understand. (#1655)
🐛 Bug Fixes
- 🔧 Fixed issues with the `ruler_16k_gen` component, ensuring more accurate and reliable results. (#1643)
- 🔧 Resolved an error in the `get_loglikelihood` function when using lmdeploy as the accelerator. (#1659)
- 🔧 Addressed problems with automatic downloads for certain datasets, streamlining the user experience. (#1652)
⚙ Enhancements and Refactors
- 💪 Enhanced the summarizer configurations for models, improving the efficiency and effectiveness of summarization tasks. (#1600)
- 💪 Added new model configurations, keeping up with the latest advancements in machine learning. (#1653)
- 💪 Updated the WildBench maximum sequence length, allowing for better handling of longer input sequences. (#1648)
- 💪 Updated the Needlebench OSS path, ensuring smoother data access and processing. (#1651)
- 💪 Improved the `mmmlu_lite` dataloader, optimizing data loading processes. (#1658)
🎉 Welcome New Contributors
- 👏 A warm welcome to @jnanliu, who has made their first contribution by adding the CMO&AIME datasets! (#1610)
For a complete overview of all changes, please refer to the full changelog: 0.3.4...0.3.5
0.3.4
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.4!
🎉 OpenCompass v0.3.4 brings major enhancements including new benchmarks, improved documentation, and numerous bug fixes.
🌈 Notable features include support for new datasets and the integration of lmdeploy pipeline API.
🔧 Support for New Datasets:
- Addition of GaoKaoMath Dataset for Evaluation.
- Support for MMMLU & MMMLU-lite Benchmark.
- Integration of Judgerbench and reorganization of subeval.
- Support for LiveCodeBench.
📝 Output Format Enhancements:
- Support for printing and saving results as markdown format tables.
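A minimal sketch of what rendering results as a Markdown table involves, assuming results arrive as one dict per row. The helper `to_markdown_table` and any column names you pass to it are hypothetical and not OpenCompass's actual summarizer API:

```python
from typing import Dict, List


def to_markdown_table(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Render result rows as a GitHub-flavored Markdown table.

    Hypothetical helper for illustration; missing cells are shown as "-".
    """
    header = "| " + " | ".join(columns) + " |"
    separator = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(str(row.get(col, "-")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, separator, *body])
```

Saving the table is then a matter of writing the returned string to a `.md` file, for example with `pathlib.Path.write_text`.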
🔧 Pipeline and Integration Improvements:
- Integration of lmdeploy pipeline API.
- Update of TurboMindModel through integration of lmdeploy pipeline API.
- Removal of prefix bos_token from messages when using lmdeploy as the accelerator.
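Dropping a prefixed `bos_token` presumably avoids a duplicated BOS when the inference engine's tokenizer already prepends one. A minimal sketch of the idea, where `strip_bos_prefix` is a hypothetical helper and `<s>` is only an example BOS string:

```python
def strip_bos_prefix(prompt: str, bos_token: str) -> str:
    """Remove a single leading BOS token from a rendered prompt.

    Hypothetical sketch: assumes the accelerator (e.g. lmdeploy) adds BOS
    itself during tokenization, so keeping it in the text would duplicate it.
    """
    if bos_token and prompt.startswith(bos_token):
        return prompt[len(bos_token):]
    return prompt
```

For example, `strip_bos_prefix("<s>Hello", "<s>")` returns `"Hello"`, while a prompt without the prefix comes back unchanged. Only one occurrence is removed, so a deliberately repeated token is not silently dropped.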
🛠️ Miscellaneous Enhancements:
- Updates to the common summarizer regex extraction.
- Internal humaneval postprocess addition and updates.
📖 Documentation Updates
🐛 Bug Fixes
🎉 Welcome New Contributors
👋 New Contributors Joined the Team:
- @BobTsang1995 contributed support for the MMMLU & MMMLU-lite benchmark.
- @noemotiovon provided NPU support fixes.
- @changlan fixed the RULER datasets.
- @BIGWangYuDong added support for printing and saving results as markdown format tables.
Thank you to all contributors who have made this release possible. For a complete list of changes, please see the full changelog linked below.
Full Changelog: 0.3.3...0.3.4
0.3.3
🌟 OpenCompass v0.3.3 Release Log
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.3!
🚀 New Features
- 🔧 Added support for the SciCode summarizer configuration.
- 🛠 Introduced support for internal Followbench.
- 🔧 Updated models and configurations for MathBench & WikiBench under FullBench.
- 🛠 Enhanced support for OpenAI O1 models and Qwen2.5 Instruct.
- 🔧 Included a postprocess function for custom models.
- 🛠 Added InternTrain feature for broader model training scenarios.
📖 Documentation
- 📚 Updated the README with the latest information on how to use OpenCompass effectively.
🐛 Bug Fixes
- 🔧 Fixed issues with the link-check workflow and wildbench.
- 🛠 Resolved errors in partitioning and corrected typos throughout the codebase.
- 🔧 Addressed compatibility issues with lmdeploy interface type changes.
- 🛠 Fixed the followbench dataset configuration and token settings.
⚙ Enhancements and Refactors
- 🛠 Enhanced support for verbose output in OpenAI API interactions.
- 🔧 Updated maximum output length configurations for multiple models.
- 🛠 Improved handling of the "begin section" in meta_template for better parsing.
- 🔧 Added a common summarizer for qabench and expanded test coverage for various models.
🎉 Welcome New Contributors
👋 We'd like to extend a warm welcome to our new contributors who have made their first contributions to OpenCompass:
- @x54-729 introduced InternTrain.
- @chuanyangjin helped correct typos.
- @cuauty added support for reasoning from BaiLing LLM.
Thank you to all our contributors for making this release possible!
Full Changelog: 0.3.2.post1...0.3.3
0.3.2.post1
What's Changed
- [Fix]Init import fix by @MaiziXiao in #1500
- [Bump] Bump version to 0.3.2.post1 by @MaiziXiao in #1502
Full Changelog: 0.3.2...0.3.2.post1
0.3.2
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.2!
🚀 New Features
- 🛠 Added `extra_body` support for OpenAISDK and introduced proxy URL support when connecting to OpenAI's API.
- 🗂 Included auto-download functionality for MMLU-Pro, Needlebench, Longbench and other datasets.
- 🤝 Integrated support for the Rendu API.
- 🧪 Added a model postprocess function.
📖 Documentation
- 📜 Updated the README file for better clarity and guidance.
🐛 Bug Fixes
- 🛠 Fixed CLI evaluation for multiple models.
- 🛠 Updated requirements to resolve dependency issues.
- 🛠 Corrected configurations for the Llama model series.
- 🛠 Addressed bad cases and added environment information to improve testing.
⚙ Enhancements and Refactors
- 🛠 Made OPENAI_API_BASE compatible with OpenAI's default environment settings.
- 🛠 Optimized SciCode for improved performance.
- 🛠 Added an `api_key` attribute to TurboMindAPIModel.
- 🛠 Implemented fixes and improvements to the CI test environment, including baselines for vllm.
🎉 Welcome New Contributors
- 👋 @cpa2001 contributed with the addition of icl_sliding_k_retriever.py and updates to __init__.py.
- 👋 @gyin94 made the OPENAI_API_BASE compatible with OpenAI's default environment.
- 👋 @chengyingshe added an `api_key` attribute to TurboMindAPIModel.
- 👋 @yanzeyu supported the integration of Rendu API.
Full Changelog: 0.3.1...0.3.2
OpenCompass v0.3.1
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.1!
🌟 Highlights
- 🚀 Added support for pip installation and updated the README and evaluation demo.
- 🐛 Fixed various dataset loading issues.
- ⚙️ Enhanced auto-download features for datasets.
🚀 New Features
- 🆕 Introduced support for Ruler datasets.
- 🆕 Enhanced model compatibility.
- 🆕 Improved dataset handling, with auto-download support for various datasets.
📖 Documentation
- 📚 Updated README to reflect the latest changes.
- 📚 Improved documentation for dataset loading procedures.
🐛 Bug Fixes
- 🐞 Resolved modelscope dataset load issues.
- 🐞 Fixed negative evaluation scores on the Lawbench dataset.
- 🐞 Fixed dataset bugs for CommonsenseQA and Longbench.
⚙ Enhancements and Refactors
- 🔧 Retained first and last halves of prompts to avoid max_seq_len issues.
- 🔧 Updated Compassbench to v1.3.
- 🔧 Switched to Python runner for single GPU operations.
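Keeping the first and last halves of an over-long prompt preserves both the instructions at the start and the question at the end, dropping only the middle. A minimal sketch over a list of token IDs; the function name and the choice to give the head the extra token when `max_len` is odd are assumptions, not the actual implementation from #1373:

```python
from typing import List


def truncate_middle(tokens: List[int], max_len: int) -> List[int]:
    """Shorten a token sequence to max_len by keeping its head and tail.

    Hypothetical sketch: the middle of the prompt is dropped; when max_len
    is odd, the head keeps one more token than the tail.
    """
    if max_len <= 0:
        return []
    if len(tokens) <= max_len:
        return list(tokens)
    head = (max_len + 1) // 2
    tail = max_len - head
    # tokens[-0:] would return the whole list, so guard the empty-tail case.
    return tokens[:head] + (tokens[-tail:] if tail > 0 else [])
```

For instance, `truncate_middle(list(range(10)), 4)` yields `[0, 1, 8, 9]`. Working on token IDs rather than characters keeps the result within the model's `max_seq_len` exactly.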
🎉 Welcome New Contributors
- 🙌 @Yunnglin for fixing modelscope dataset load problem.
- 🙌 @changyeyu for addressing max_seq_len issues with prompt handling.
- 🙌 @seetimee for updates to openai_api.py.
- 🙌 @HariSeldon0 for adding the scicode dataset.
What's Changed
- [Fix] Fix modelscope dataset load problem by @Yunnglin in #1406
- [Fix] the issue where scores are negative in the Lawbench dataset evaluation(#1402) by @yaoyingyy in #1403
- [Doc] Update README by @tonysy in #1404
- Retain first and last halves of prompts to avoid max_seq_len issues by @changyeyu in #1373
- [UPDATE] Compassbench v1.3 by @MaiziXiao in #1396
- [Fix] longbench dataset load fix by @MaiziXiao in #1422
- [Fix] Sub summarizer order fix by @bittersweet1999 in #1426
- [Update] Support auto-download of FOFO/MT-Bench-101 by @tonysy in #1423
- [Bug] Commonsenseqa dataset fix by @MaiziXiao in #1425
- [Feature] Add abbr for rolebench dataset by @xu-song in #1431
- [Feature] Add Ruler datasets by @MaiziXiao in #1310
- [Fix] Fix openai api tiktoken bug for api server by @liushz in #1433
- Update openai_api.py by @seetimee in #1438
- [Feature] Add model support for 'huggingface_above_v4_33' when using '-a' by @liushz in #1430
- Add scicode by @HariSeldon0 in #1417
- [Doc] Update Readme by @MaiziXiao in #1439
- [Fix] Update option postprocess & mathbench language summarizer by @liushz in #1413
- [ci] add commond testcase into daily testcase by @zhulinJulia24 in #1447
- [Feature] Switch to python runner for single GPU by @xu-song in #1308
- [Fix] Update SciCode and Gemma model by @tonysy in #1449
- [Bump] Bump version to 0.3.1 by @tonysy in #1450
Full Changelog: 0.3.0...0.3.1
Thank you for your continued support and contributions to OpenCompass!
OpenCompass v0.3.0
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.0! This release brings a variety of new features, enhancements, and bug fixes to improve your experience.
🌟 Highlights
- Support for OpenAI ChatCompletion
- Updated Model Support List
- Support Dataset Automatic Download
- Support for `pip install opencompass`
🚀 New Features
- Support for CompassBench Checklist Evaluation
  - PR #1339 by @bittersweet1999
- Adding support for Doubao API
  - PR #1218 by @LeavittLang
- Support for ModelScope Datasets
  - PR #1289 by @wangxingjun778
📖 Documentation
🐛 Bug Fixes
- Fix Typing and Typo
- Fix Lint Issues
  - PR #1334 by @DseidLi
- Fix Summary Error in subjective.py
⚙ Enhancements and Refactors
- Upgrade Default Math `pred_postprocessor`
- Fix Path and Folder Updates
- Update Get Data Path for LCBench and HumanEval
🔗 Full Change Logs
- [Fix] Change abbr for arenahard dataset by @bittersweet1999 in #1302
- [Fix] Force register by @Leymore in #1311
- [Fix] add bc for alignbench summarizer by @bittersweet1999 in #1306
- [Fix] update Faq by @bittersweet1999 in #1313
- [Fix] Fix rouge evaluator of rolebench_zh by @xu-song in #1322
- [Doc] Update NeedleBench Docs by @DseidLi in #1330
- [Fix] Fix typing and typo by @xu-song in #1331
- [Fix] Fix lint by @DseidLi in #1334
- [Feature] support compassbench Checklist evaluation by @bittersweet1999 in #1339
- Add compassbench wiki&math part by @liushz in #1342
- Compassbench v1_3 subjective evaluation by @MaiziXiao in #1341
- [Fix] Update path and folder by @tonysy in #1344
- Upgrade default math `pred_postprocessor` by @xu-song in #1340
- commit inference ppl datasets by @Quehry in #1315
- CompassBench subjective summarizer added by @MaiziXiao in #1349
- Fix MathBench Generation Config by @liushz in #1351
- [Update] Update model support list by @bittersweet1999 in #1353
- [Update] update Subeval demo config by @bittersweet1999 in #1358
- [Fix] Fix the summary error in subjective.py by @WenjinW in #1363
- [Fix] Support HF models deployed with an OpenAI-compatible API. by @heya5 in #1352
- update docs by @Leymore in #1318
- [Feature] Make NeedleBench available on HF by @DseidLi in #1364
- 【bug fix】: Remove extra ampersands. by @baymax591 in #1365
- [Fix] minor update wildbench by @kleinzcy in #1335
- Adding support for Doubao API by @LeavittLang in #1218
- [Fix] origin_prompt should be None in llm-compression task by @mqy004 in #1225
- Calm dataset by @pengbo807 in #1287
- Add `en` and `zh` groups to longbench summarizer; Fix longbench overall score by @xu-song in #1216
- [Revert] "Calm dataset (#1287)" by @bittersweet1999 in #1366
- Charm by @jxd0712 in #1230
- Support ModelScope datasets by @wangxingjun778 in #1289
- [Feature] Update pip install by @tonysy in #1324
- add support for hf_pulse_7b by @QXY716 in #1255
- [Fix] Update get_data_path for LCBench and HumanEval by @tonysy in #1375
- [Bug] Fix bug in turbomind by @tonysy in #1377
- [Fix] Fix version mismatch of CIBench by @kleinzcy in #1380
- [Fix] Fix InternLM2.5-7B-Chat-1M config by @DseidLi in #1383
- [Feature] Support import configs/models/summarizers from whl by @tonysy in #1376
- Calm dataset by @pengbo807 in #1385
- [Feature] Support OpenAI ChatCompletion by @tonysy in #1389
- [Fix] Fix slurm env by @tonysy in #1392
- [Fix] Fix CaLM import by @tonysy in #1395
- [Bump] Bump version for v0.3.0 by @tonysy in #1398
🎉 Welcome New Contributors
- @MaiziXiao made their first contribution in #1341
- @Quehry made their first contribution in #1315
- @WenjinW made their first contribution in #1363
- @heya5 made their first contribution in #1352
- @LeavittLang made their first contribution in #1218
- @pengbo807 made their first contribution in #1287
- @wangxingjun778 made their first contribution in #1289
- @QXY716 made their first contribution in #1255
Full Changelog: 0.2.6...0.3.0
OpenCompass v0.2.6
The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.6!
🌟 Highlights
- No noteworthy highlights.
🚀 New Features
📖 Documentation
🐛 Bug Fixes
- #1221 Resolve release version installation and import issues
- #1228 Fix pip version issues
- #1282 Update MathBench summarizer & fix cot setting
⚙ Enhancements and Refactors
- #1284 Reorganize subjective eval
🎉 Welcome New Contributors
- @mqy004, @sefira, @Zor-X-L and @baymax591 made their first contributions. Welcome to the OpenCompass community!
🔗 Full Change Logs
- [Fix] fix summarizer by @bittersweet1999 in #1217
- Fix the issue where opencompass.cli.main could not be imported after installing the release version by @mqy004 in #1221
- MT-Bench-101 by @sefira in #1215
- [Feature] add dataset Fofo by @bittersweet1999 in #1224
- [Fix] fix pip version by @bittersweet1999 in #1228
- add ",<2.0.0" to "numpy>=1.23.4" in requirements/runtime.txt, as pand… by @Zor-X-L in #1267
- Support wildbench by @kleinzcy in #1266
- Add doc for accelerator function by @liushz in #1252
- flash attn installation in daily testcase by @zhulinJulia24 in #1272
- Update mtbench101.py by @sefira in #1276
- [Sync] Sync with internal codes 2024.06.28 by @Leymore in #1279
- Update MathBench summarizer & fix cot setting by @liushz in #1282
- NPU adaptation by @baymax591 in #1250
- [ci] update daily testcase by @zhulinJulia24 in #1285
- [Feature] Add InternLM2.5 by @tonysy in #1286
- [Feat] Update owners for issues by @tonysy in #1293
- [Refactor] Reorganize subjective eval by @bittersweet1999 in #1284
- [Doc] quick start swap tabs by @Leymore in #1263
Full Changelog: 0.2.5...0.2.6
OpenCompass v0.2.5
The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.5!
🌟 Highlights
- Simplified the huggingface / vllm / lmdeploy model wrappers: `meta_template` no longer needs to be hand-crafted in model configs.
- Introduced evaluation results READMEs in ~20 dataset config folders.
🚀 New Features
- #1065 Add LLaMA-3 Series Configs
- #1048 Add TheoremQA with 5-shot
- #1094 Support Math evaluation via judgemodel
- #1080 Add gpqa prompt from simple_evals, openai
- #1074 Add mmlu prompt from simple_evals, openai
- #1123 Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs
📖 Documentation
- #1053 Update readme
- #1102 Update NeedleInAHaystack Docs
- #1110 Update README.md
- #1205 Remove --no-batch-padding and Use --hf-num-gpus
🐛 Bug Fixes
- #1036 Update setup.py install_requires
- #1051 Fixed the issue caused by the repeated loading of the VLLM model
- #1043 fix multiround
- #1070 Fix sequential runner
- #1079 Fix Llama-3 meta template
⚙ Enhancements and Refactors
- #1163 enable HuggingFacewithChatTemplate with --accelerator via cli
- #1104 fix prompt template
- #1109 Update performance of common benchmarks
🎉 Welcome New Contributors
- @liuwei130, @IcyFeather233, @VVVenus1212, @binary-husky, @dmitrysarov, @eltociear, @acylam, @lfy79001, @JuhaoLiang1997, @yaoyingyy, and @jxd0712 made their first contributions. Welcome to the OpenCompass community!
🔗 Full Change Logs
- [Fix] Update setup.py install_requires by @Leymore in #1036
- add ChemBench by @liuwei130 in #1032
- [Fix] logger.error -> logger.debug in OpenAI by @Leymore in #1050
- [Sync] Bump version to 0.2.4 by @Leymore in #1052
- [Doc] Update readme by @tonysy in #1053
- [fix]Fixed the issue caused by the repeated loading of VLLM model dur… by @IcyFeather233 in #1051
- [Sync] Sync with internal code 2024.04.19 by @Leymore in #1064
- [Fix] fix multiround by @bittersweet1999 in #1043
- [Feature] Add LLaMA-3 Series Configs by @Leymore in #1065
- [Feature] Add TheoremQA with 5-shot by @Leymore in #1048
- [Fix] Fix sequential runner by @Leymore in #1070
- Add lmdeploy tis python backend model by @ispobock in #1014
- Fix Llama-3 meta template by @liushz in #1079
- Add humaneval prompt from simple_evals, openai by @jingmingzhuo in #1076
- [Feature] Support Math evaluation via judgemodel by @bittersweet1999 in #1094
- [Feature] support arenahard evaluation by @bittersweet1999 in #1096
- Update CIBench by @kleinzcy in #1089
- [Feature] Add gpqa prompt from simple_evals, openai by @Francis-llgg in #1080
- [Deperecate] Remove multi-modal related stuff by @kennymckormick in #1072
- add vllm get_ppl by @VVVenus1212 in #1003
- fix: python path bug by @binary-husky in #1063
- fix output typing, change mutable list to immutable tuple by @dmitrysarov in #989
- [Doc] Update NeedleInAHaystack Docs by @DseidLi in #1102
- [Feature] add support for Flames datasets by @Yggdrasill7D6 in #1093
- adapt to lmdeploy v0.4.0 by @lvhan028 in #1073
- [Fix] fix prompt template by @bittersweet1999 in #1104
- [Fix] Fix Math Evaluation with Judge Model Evaluator & Add README by @liushz in #1103
- [Update] Update performance of common benchmarks by @tonysy in #1109
- [Fix] fix cmb dataset by @bittersweet1999 in #1106
- [Docs] Update README.md by @eltociear in #1110
- [Feature] Adding support for LLM Compression Evaluation by @acylam in #1108
- [Fix] remove redundant pre-commit check by @Leymore in #891
- fix LightllmApi workers bug by @helloyongyang in #1113
- [Feature] Add mmlu prompt from simple_evals, openai by @Leymore in #1074
- [Feature] update drop dataset from openai simple eval by @kleinzcy in #1092
- add mgsm datasets by @Yggdrasill7D6 in #1081
- [Fix] Fix AGIEval chinese sets by @xu-song in #972
- S3Eval Dataset by @lfy79001 in #916
- [Feature] Add AceGPT-MMLUArabic benchmark by @JuhaoLiang1997 in #1099
- [Fix] fix links by @bittersweet1999 in #1120
- [Fix] Fix NeedleBench Summarizer Typo by @DseidLi in #1125
- [Feature] Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs by @acylam in #1123
- [Sync] Update accelerator by @Leymore in #1122
- [Fix] fix alpacaeval while add caching path by @bittersweet1999 in #1139
- [Fix] fix multiround by @bittersweet1999 in #1146
- [Fix] Fix Needlebench Summarizer by @DseidLi in #1143
- [Feature] Add huggingface apply_chat_template by @Leymore in #1098
- [Feat] Support dataset_suffix check for mixed configs by @xu-song in #973
- [Format] Add some config lints by @Leymore in #892
- [Sync] Sync with internal codes 2024.05.14 by @Leymore in #1156
- [Fix] fix arenahard summarizer by @bittersweet1999 in #1154
- [Fix] use ProcessPoolExecutor during mbpp eval by @Leymore in #1159
- [Fix] Update stop_words in huggingface_above_v4_33 by @Leymore in #1160
- Update accelerator by @liushz in #1152
- [Feat] enable HuggingFacewithChatTemplate with --accelerator via cli by @Leymore in #1163
- update test workflow by @zhulinJulia24 in #1167
- [Sync] Sync with internal codes 2024.05.17 by @Leymore in #1171
- add dependency in daily test workflow by @zhulinJulia24 in #1173
- [Sync] Sync with internal codes 2024.05.21.1 by @Leymore in #1175
- Update MathBench by @liushz in #1176
- [Fix] fix template by @bittersweet1999 in #1178
- Fix a bug in drop_gen.py by @kleinzcy in #1191
- [Fix] temporary files using tempfile by @yaoyingyy in #1186
- [Fix] add support for lmdeploy api judge by @bittersweet1999 in #1193
- [Fix] fix length by @bittersweet1999 in #1180
- support CHARM (https://github.com/opendatalab/CHARM) reasoning tasks by @jxd0712 in #1190
- [Feat] Update charm summary by @Leymore in #1194
- Update accelerator by @liushz in #1195
- [Sync] S...
OpenCompass v0.2.5.rc1
- [Feature] Add lmdeploy tis python backend model (#1014)
  - add lmdeploy tis python backend model
  - fix pr check
  - update