- [2023.08.18] We now support evaluation for multi-modality learning, including MMBench, SEED-Bench, COCO-Caption, Flickr-30K, OCR-VQA, ScienceQA, and more. A leaderboard is on the way. Feel free to try multi-modality evaluation with OpenCompass!
- [2023.08.18] The dataset card is now online. You are welcome to contribute new evaluation benchmarks to OpenCompass!
- [2023.08.11] Model comparison is now online. We hope this feature offers deeper insights!
- [2023.08.11] We have supported LEval.
- [2023.08.10] OpenCompass is compatible with LMDeploy. You can now follow this instruction to evaluate models accelerated by TurboMind.
- [2023.08.10] We have supported Qwen-7B and XVERSE-13B! Go to our leaderboard for more results! More models are welcome to join OpenCompass.
- [2023.08.09] Several new datasets (CMMLU, TydiQA, SQuAD2.0, DROP) have been added to our leaderboard! More datasets are welcome to join OpenCompass.
- [2023.08.07] We have added a script for users to evaluate the inference results of MMBench-dev.
- [2023.08.05] We have supported GPT-4! Go to our leaderboard for more results! More models are welcome to join OpenCompass.
- [2023.07.27] We have supported CMMLU! More datasets are welcome to join OpenCompass.