Releases: open-compass/opencompass

0.3.5

04 Nov 02:56
db258eb

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.5!

🌟 Highlights

  • 🚀 Introduction of two new datasets: CMO&AIME, expanding our evaluation capabilities.
  • 📖 Several updates to our documentation, ensuring clearer guidance for all users.
  • ⚙ Several enhancements and refactoring efforts to make our codebase more robust and maintainable.

🚀 New Features

  • 🆕 Added support for the CMO&AIME datasets, broadening the scope of models we can evaluate. (#1610)
  • 🆕 Introduced the CompassArenaSubjectiveBench, a new benchmark for subjective evaluations. (#1645)
  • 🆕 Added configurations for the lmdeploy DeepSeek model, enhancing compatibility with cutting-edge technologies. (#1656)

📖 Documentation

  • 📚 Updated the documentation to reflect the latest changes and improvements, making it easier than ever to navigate and understand. (#1655)

🐛 Bug Fixes

  • 🔧 Fixed issues with the ruler_16k_gen component, ensuring more accurate and reliable results. (#1643)
  • 🔧 Resolved an error in the get_loglikelihood function when using lmdeploy as the accelerator. (#1659)
  • 🔧 Addressed problems with automatic downloads for certain datasets, streamlining the user experience. (#1652)

⚙ Enhancements and Refactors

  • 💪 Enhanced the summarizer configurations for models, improving the efficiency and effectiveness of summarization tasks. (#1600)
  • 💪 Added new model configurations, keeping up with the latest advancements in machine learning. (#1653)
  • 💪 Updated the WildBench maximum sequence length, allowing for better handling of longer input sequences. (#1648)
  • 💪 Updated the Needlebench OSS path, ensuring smoother data access and processing. (#1651)
  • 💪 Improved the mmmlu_lite dataloader, optimizing data loading processes. (#1658)

🎉 Welcome New Contributors

  • 👏 A warm welcome to @jnanliu, who has made their first contribution by adding the CMO&AIME datasets! (#1610)

For a complete overview of all changes, please refer to the full changelog: 0.3.4...0.3.5

0.3.4

25 Oct 12:25
9c39cb6

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.4!

🎉 OpenCompass v0.3.4 brings major enhancements including new benchmarks, improved documentation, and numerous bug fixes.
🌈 Notable features include support for new datasets and integration of the lmdeploy pipeline API.

🔧 Support for New Datasets:

  • Addition of GaoKaoMath Dataset for Evaluation.
  • Support for MMMLU & MMMLU-lite Benchmark.
  • Integration of Judgerbench and reorganization of subeval.
  • Support for LiveCodeBench.

📝 Output Format Enhancements:

  • Support for printing and saving results as Markdown tables.
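
As a rough illustration of what a Markdown results table looks like, the sketch below renders headers and rows as a pipe table. The function name to_markdown_table and the sample dataset names, model name, and scores are hypothetical placeholders. They are not taken from the OpenCompass summarizer.

```python
from typing import Sequence


def to_markdown_table(headers: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render headers and rows as a GitHub-style Markdown pipe table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        # Every cell is converted with str(), so numbers and strings can be mixed.
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only, not real evaluation results.
    table = to_markdown_table(
        ["dataset", "metric", "model_a"],
        [["demo_set_1", "accuracy", 0.0], ["demo_set_2", "accuracy", 0.0]],
    )
    print(table)
```

The returned string can be printed to the console or written to a .md file unchanged, which is the convenience a Markdown output option provides.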

🔧 Pipeline and Integration Improvements:

  • Integration of lmdeploy pipeline API.
  • Update of TurboMindModel through integration of lmdeploy pipeline API.
  • Removal of prefix bos_token from messages when using lmdeploy as the accelerator.
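
The BOS-prefix removal can be pictured as stripping a leading BOS string from an already rendered prompt before it is passed to the backend. This is a minimal sketch under assumptions: the helper name strip_bos_prefix and the "<s>" token in the usage lines are hypothetical, and the code is not the OpenCompass or lmdeploy implementation.

```python
def strip_bos_prefix(text: str, bos_token: str) -> str:
    """Remove one leading occurrence of bos_token from text, if present."""
    # An empty bos_token is treated as "nothing to strip".
    if bos_token and text.startswith(bos_token):
        return text[len(bos_token):]
    return text


if __name__ == "__main__":
    # "<s>" is only an example BOS string; real tokenizers define their own.
    print(strip_bos_prefix("<s>Hello", "<s>"))
    print(strip_bos_prefix("Hello", "<s>"))
```

Only a single leading occurrence is removed, so BOS-like text elsewhere in the message is left untouched.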

🛠️ Miscellaneous Enhancements:

  • Updates to the common summarizer regex extraction.
  • Internal humaneval postprocess addition and updates.

📖 Documentation Updates

🐛 Bug Fixes

🎉 Welcome New Contributors
👋 New Contributors Joined the Team:

  • @BobTsang1995 - Contributed support for MMMLU & MMMLU-lite Benchmark.
  • @noemotiovon - Provided NPU support fixes.
  • @changlan - Fixed RULER datasets.
  • @BIGWangYuDong - Added support for printing and saving results as markdown format tables.

Thank you to all contributors who have made this release possible. For a complete list of changes, please see the full changelog linked below.

Full Changelog: 0.3.3...0.3.4

0.3.3

30 Sep 08:58
22a4e76

🌟 OpenCompass v0.3.3 Release Log
The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.3!

🚀 New Features

  • 🔧 Added support for the SciCode summarizer configuration.
  • 🛠 Introduced support for internal Followbench.
  • 🔧 Updated models and configurations for MathBench & WikiBench under FullBench.
  • 🛠 Enhanced support for OpenAI O1 models and Qwen2.5 Instruct.
  • 🔧 Included a postprocess function for custom models.
  • 🛠 Added InternTrain feature for broader model training scenarios.

📖 Documentation

  • 📚 Updated the README with the latest information on how to use OpenCompass effectively.

🐛 Bug Fixes

  • 🔧 Fixed issues with the link-check workflow and wildbench.
  • 🛠 Resolved errors in partitioning and corrected typos throughout the codebase.
  • 🔧 Addressed compatibility issues with lmdeploy interface type changes.
  • 🛠 Fixed the followbench dataset configuration and token settings.

⚙ Enhancements and Refactors

  • 🛠 Enhanced support for verbose output in OpenAI API interactions.
  • 🔧 Updated maximum output length configurations for multiple models.
  • 🛠 Improved handling of the "begin section" in meta_template for better parsing.
  • 🔧 Added a common summarizer for qabench and expanded test coverage for various models.

🎉 Welcome New Contributors
👋 We'd like to extend a warm welcome to our new contributors who have made their first contributions to OpenCompass:

Thank you to all our contributors for making this release possible!

Full Changelog: 0.3.2.post1...0.3.3

0.3.2.post1

06 Sep 10:48
b5f8afb

What's Changed

Full Changelog: 0.3.2...0.3.2.post1

0.3.2

06 Sep 08:21
ff18545

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.2!

🚀 New Features

  • 🛠 Added extra_body support for OpenAISDK and introduced proxy URL support when connecting to OpenAI's API.
  • 🗂 Included auto-download functionality for Mmlu-pro, Needlebench, Longbench and other datasets.
  • 🤝 Integrated support for the Rendu API.
  • 🧪 Added a model postprocess function.

📖 Documentation

  • 📜 Updated the README file for better clarity and guidance.

🐛 Bug Fixes

  • 🛠 Fixed CLI evaluation for multiple models.
  • 🛠 Updated requirements to resolve dependency issues.
  • 🛠 Corrected configurations for the Llama model series.
  • 🛠 Addressed bad cases and added environment information to improve testing.

⚙ Enhancements and Refactors

  • 🛠 Made OPENAI_API_BASE compatible with OpenAI's default environment settings.
  • 🛠 Optimized SciCode for improved performance.
  • 🛠 Added an api_key attribute to TurboMindAPIModel.
  • 🛠 Implemented fixes and improvements to the CI test environment, including baselines for vllm.

🎉 Welcome New Contributors

  • 👋 @cpa2001 contributed with the addition of icl_sliding_k_retriever.py and updates to __init__.py.
  • 👋 @gyin94 made the OPENAI_API_BASE compatible with OpenAI's default environment.
  • 👋 @chengyingshe added an attribute api_key into TurboMindAPIModel.
  • 👋 @yanzeyu supported the integration of Rendu API.

Full Changelog: 0.3.1...0.3.2

OpenCompass v0.3.1

23 Aug 03:00
5485207

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.1!


🌟 Highlights

  • 🚀 Added support for pip installation and updated the README and evaluation demo.
  • 🐛 Fixed various dataset loading issues.
  • ⚙️ Enhanced auto-download features for datasets.

🚀 New Features

  • 🆕 Introduced support for Ruler datasets.
  • 🆕 Enhanced model compatibility.
  • 🆕 Improved dataset handling, with auto-download support for various datasets.

📖 Documentation

  • 📚 Updated README to reflect the latest changes.
  • 📚 Improved documentation for dataset loading procedures.

🐛 Bug Fixes

  • 🐞 Resolved modelscope dataset load issues.
  • 🐞 Corrected evaluation scores for the Lawbench dataset.
  • 🐞 Fixed dataset bugs for CommonsenseQA and Longbench.

⚙ Enhancements and Refactors

  • 🔧 Retained first and last halves of prompts to avoid max_seq_len issues.
  • 🔧 Updated Compassbench to v1.3.
  • 🔧 Switched to Python runner for single GPU operations.
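
Keeping the first and last halves of an over-long prompt amounts to cutting tokens out of the middle. The sketch below shows one way to do that on a list of token ids. The function name keep_head_and_tail and the even split of the budget are assumptions made for illustration, not the exact OpenCompass logic.

```python
from typing import List


def keep_head_and_tail(token_ids: List[int], max_len: int) -> List[int]:
    """Truncate from the middle, keeping the start and end of the sequence."""
    if max_len <= 0:
        return []
    if len(token_ids) <= max_len:
        # Already within budget: return an unchanged copy.
        return list(token_ids)
    head = max_len // 2      # tokens kept from the start
    tail = max_len - head    # tokens kept from the end (always >= 1 here)
    return token_ids[:head] + token_ids[-tail:]


if __name__ == "__main__":
    print(keep_head_and_tail(list(range(10)), 4))
```

Compared with cutting only the end, this keeps both the opening instructions and the final question of a prompt, which are often the parts a model needs most.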

🎉 Welcome New Contributors

  • 🙌 @Yunnglin for fixing modelscope dataset load problem.
  • 🙌 @changyeyu for addressing max_seq_len issues with prompt handling.
  • 🙌 @seetimee for updates to openai_api.py.
  • 🙌 @HariSeldon0 for adding the scicode dataset.

What's Changed

Full Changelog: 0.3.0...0.3.1


Thank you for your continued support and contributions to OpenCompass!

OpenCompass v0.3.0

06 Aug 17:34
264fd23

The OpenCompass team is thrilled to announce the release of OpenCompass v0.3.0! This release brings a variety of new features, enhancements, and bug fixes to improve your experience.

🌟 Highlights

  1. Support for OpenAI ChatCompletion
  2. Updated Model Support List
  3. Support for Automatic Dataset Download
  4. Support for pip install opencompass

🚀 New Features

  1. Support for CompassBench Checklist Evaluation
  2. Support for the Doubao API
  3. Support for ModelScope Datasets

📖 Documentation

  1. Update NeedleBench Docs
  2. Update Documentation

🐛 Bug Fixes

  1. Fix Typing and Typo
  2. Fix Lint Issues
  3. Fix Summary Error in subjective.py

⚙ Enhancements and Refactors

  1. Upgrade Default Math pred_postprocessor
  2. Fix Path and Folder Updates
  3. Update Get Data Path for LCBench and HumanEval

🔗 Full Change Logs

🎉 Welcome New Contributors

Full Changelog: 0.2.6...0.3.0

OpenCompass v0.2.6

05 Jul 16:36
a62c613

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.6!

🌟 Highlights

  • No noteworthy highlights.

🚀 New Features

  1. #1215 #1224 #1266 Add Datasets MT-Bench-101, Fofo, wildbench
  2. #1286 Add Models InternLM2.5-7B

📖 Documentation

  1. #1252 Add doc for accelerator function
  2. #1263 Update quick start guide

🐛 Bug Fixes

  1. #1221 Resolve release version installation and import issues
  2. #1228 Fix pip version issues
  3. #1282 Update MathBench summarizer & fix cot setting

⚙ Enhancements and Refactors

  1. #1284 Reorganize subjective eval

🎉 Welcome New Contributors

🔗 Full Change Logs

Full Changelog: 0.2.5...0.2.6

OpenCompass v0.2.5

29 May 16:35
a77b8a5

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.5!

🌟 Highlights

  • Simplify the huggingface / vllm / lmdeploy model wrapper. meta_template no longer needs to be hand-crafted in model configs.
  • Introduce evaluation results README in ~20 dataset config folders.

🚀 New Features

  1. #1065 Add LLaMA-3 Series Configs
  2. #1048 Add TheoremQA with 5-shot
  3. #1094 Support Math evaluation via judgemodel
  4. #1080 Add gpqa prompt from simple_evals, openai
  5. #1074 Add mmlu prompt from simple_evals, openai
  6. #1123 Add Qwen1.5 MoE 7b and Mixtral 8x22b model configs

📖 Documentation

  1. #1053 Update readme
  2. #1102 Update NeedleInAHaystack Docs
  3. #1110 Update README.md
  4. #1205 Remove --no-batch-padding and Use --hf-num-gpus

🐛 Bug Fixes

  1. #1036 Update setup.py install_requires
  2. #1051 Fixed the issue caused
  3. #1043 fix multiround
  4. #1070 Fix sequential runner
  5. #1079 Fix Llama-3 meta template

⚙ Enhancements and Refactors

  1. #1163 enable HuggingFacewithChatTemplate with --accelerator via cli
  2. #1104 fix prompt template
  3. #1109 Update performance of common benchmarks

🎉 Welcome New Contributors

🔗 Full Change Logs

OpenCompass v0.2.5.rc1

23 Apr 09:21
81d0e4d
Pre-release
[Feature] Add lmdeploy tis python backend model (#1014)

* add lmdeploy tis python backend model

* fix pr check

* update