diff --git a/clip/benchmark.md b/clip/benchmark.md
index 17d342b..3f4031e 100644
--- a/clip/benchmark.md
+++ b/clip/benchmark.md
@@ -948,4 +948,26 @@ The zero-shot retrieval performance of EVA-CLIP is relatively inferior to the Op
 - The size / capacity of the language tower in EVA-CLIP is much smaller / weaker than that of Open CLIP-H and Open CLIP-g, *i.e.*, `124M` *vs.* `354M`. Meanwhile, retrieval tasks depend more on the capacity of the language branch than classification tasks do.
 - Retrieval tasks also seem to benefit more from a larger training dataset (LAION-2B used by Open CLIP), while we leverage only LAION-400M for EVA-CLIP training.
 
-Nevertheless, it is hard to make a head-to-head comparison between different CLIP models. In the future, we will further scale up the language encoder & training data to improve the retrieval performance.
\ No newline at end of file
+Nevertheless, it is hard to make a head-to-head comparison between different CLIP models. In the future, we will further scale up the language encoder & training data to improve the retrieval performance.
+
+- **Updates (Feb 2023)**: We are training an improved version of EVA-CLIP, EVA-CLIP+ (WIP), which now achieves ~79.5% zero-shot top-1 accuracy on IN-1K and outperforms the previous best CLIP by ~0.5% in zero-shot retrieval. We will update the details soon and release the full EVA-CLIP+ suite in the future. Please stay tuned.
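+
+As a side note on how the retrieval numbers above are obtained: below is a minimal sketch of text-to-image retrieval recall@K over paired CLIP embeddings. It assumes pre-computed, L2-normalized features; `retrieval_recall_at_k` and the random tensors are illustrative placeholders, not part of the EVA-CLIP codebase.
+
+```python
+import torch
+
+def retrieval_recall_at_k(image_embs: torch.Tensor, text_embs: torch.Tensor, k: int = 1) -> float:
+    """Text-to-image recall@k; row i of each matrix is one image-caption pair."""
+    # Cosine similarity (features are assumed L2-normalized): sim[i, j] = <image_i, text_j>.
+    sim = image_embs @ text_embs.T
+    ranks = sim.T.argsort(dim=-1, descending=True)    # per caption, image indices best-first
+    targets = torch.arange(sim.size(0)).unsqueeze(1)  # caption i's correct image is image i
+    hits = (ranks[:, :k] == targets).any(dim=-1)      # is the correct image in the top-k?
+    return hits.float().mean().item()
+
+# Toy usage with random placeholder features (1000 pairs, 1024-dim).
+image_embs = torch.nn.functional.normalize(torch.randn(1000, 1024), dim=-1)
+text_embs = torch.nn.functional.normalize(torch.randn(1000, 1024), dim=-1)
+print(retrieval_recall_at_k(image_embs, text_embs, k=5))
+```
\ No newline at end of file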