diff --git a/index.html b/index.html
index f31cd51..64701da 100644
--- a/index.html
+++ b/index.html
@@ -203,14 +203,9 @@

- geometric reasoning
+ geometric reasoning

- Overview of the GMAI-MMBench. The benchmark is meticulously designed for testing
- LVLMs’ abilities in real-world clinical scenarios with three key features: (1) Comprehensive medical
- knowledge: It consists of 285 diverse clinical-related datasets from worldwide sources, covering 39
- modalities. (2) Well-categorized data structure: It features 18 clinical VQA tasks and 18 clinical
- departments, meticulously organized into a lexical tree. (3) Multi-perceptual granularity: Interactive
- methods span from image to region level, offering varying degrees of perceptual details.
+ Overview of the GMAI-MMBench. The benchmark is meticulously designed for testing LVLMs' abilities in real-world clinical scenarios with three key features: (1) Comprehensive medical knowledge: It consists of 284 diverse clinical-related datasets from worldwide sources, covering 38 modalities. (2) Well-categorized data structure: It features 18 clinical VQA tasks and 18 clinical departments, meticulously organized into a lexical tree. (3) Multi-perceptual granularity: Interactive methods span from image to region level, offering varying degrees of perceptual details.
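The caption above says each question carries a modality, a department, a clinical VQA task, and a perceptual granularity ranging from whole image to region. As a rough sketch of what such an item could look like in code, the dataclass below models one VQA entry with an optional bounding box for region-level questions; all field names, labels, and paths are illustrative assumptions, not the benchmark's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class BenchmarkItem:
    """Hypothetical GMAI-MMBench-style VQA item; field names are illustrative only."""
    image_path: str
    question: str
    options: Dict[str, str]                       # e.g. {"A": "Pleural effusion", ...}
    answer: str                                   # key of the correct option
    modality: str                                 # one of the imaging modalities
    department: str                               # one of the clinical departments
    task: str                                     # one of the clinical VQA tasks
    # Region-level granularity: an optional box (x_min, y_min, x_max, y_max).
    # None means the question is asked at the whole-image level.
    bbox: Optional[Tuple[int, int, int, int]] = None

item = BenchmarkItem(
    image_path="images/chest_xray_0001.png",      # hypothetical path
    question="What abnormality is visible in the highlighted region?",
    options={"A": "Pleural effusion", "B": "Cardiomegaly",
             "C": "Pneumothorax", "D": "No finding"},
    answer="A",
    modality="X-ray",
    department="Respiratory Medicine",
    task="Disease Diagnosis",
    bbox=(120, 80, 260, 210),                     # region-level question
)
```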

@@ -230,25 +225,7 @@

🔔News

Abstract

- Large Vision-Language Models (LVLMs) are capable of handling diverse data
- types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial
- assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Current
- benchmarks are often built upon specific academic literature, mainly focusing on
- a single domain, and lacking varying perceptual granularities. Thus, they face
- specific challenges, including limited clinical relevance, incomplete evaluations,
- and insufficient guidance for interactive LVLMs. To address these limitations, we developed the GMAI-MMBench, the most comprehensive general medical AI
- benchmark with well-categorized data structure and multi-perceptual granularity to
- date. It is constructed from 285 datasets across 39 medical image modalities, 18
- clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual
- Question Answering (VQA) format. Additionally, we implemented a lexical tree
- structure that allows users to customize evaluation tasks, accommodating various
- assessment needs and substantially supporting medical AI research and applications.
- We evaluated 50 LVLMs, and the results show that even the advanced GPT-4o
- only achieves an accuracy of 52%, indicating significant room for improvement.
- Moreover, we identified five key insufficiencies in current cutting-edge LVLMs that
- need to be addressed to advance the development of better medical applications.
- We believe that GMAI-MMBench will stimulate the community to build the next
- generation of LVLMs toward GMAI.
+ Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Current benchmarks are often built upon specific academic literature, mainly focusing on a single domain, and lacking varying perceptual granularities. Thus, they face specific challenges, including limited clinical relevance, incomplete evaluations, and insufficient guidance for interactive LVLMs. To address these limitations, we developed the GMAI-MMBench, the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date. It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format. Additionally, we implemented a lexical tree structure that allows users to customize evaluation tasks, accommodating various assessment needs and substantially supporting medical AI research and applications. We evaluated 50 LVLMs, and the results show that even the advanced GPT-4o only achieves an accuracy of 52%, indicating significant room for improvement. Moreover, we identified five key insufficiencies in current cutting-edge LVLMs that need to be addressed to advance the development of better medical applications. We believe that GMAI-MMBench will stimulate the community to build the next generation of LVLMs toward GMAI.
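The abstract notes that a lexical tree structure lets users customize evaluation tasks. Below is a minimal sketch of how such an organization could support slicing the benchmark by modality, department, or task; the dictionary keys and the toy items are assumptions made for illustration, not the released data format or an official API.

```python
from collections import defaultdict

def build_index(items):
    """Index questions by (modality, department, task), mirroring branches of a lexical tree."""
    index = defaultdict(list)
    for it in items:
        index[(it["modality"], it["department"], it["task"])].append(it)
    return index

def select(index, modality=None, department=None, task=None):
    """Return every question whose branch matches the requested (possibly partial) criteria."""
    chosen = []
    for (m, d, t), questions in index.items():
        if ((modality is None or m == modality)
                and (department is None or d == department)
                and (task is None or t == task)):
            chosen.extend(questions)
    return chosen

# Toy items standing in for the benchmark's ~26K questions.
items = [
    {"modality": "CT", "department": "Oncology", "task": "Tumor Detection", "question": "..."},
    {"modality": "X-ray", "department": "Respiratory Medicine", "task": "Disease Diagnosis", "question": "..."},
]
index = build_index(items)
oncology_subset = select(index, department="Oncology")   # a user-customized evaluation slice
print(len(oncology_subset))  # -> 1
```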

@@ -284,15 +261,7 @@

Overview

- We propose GMAI-MMBench, an innovative benchmark meticulously designed for the medical field,
- capable of providing comprehensive evaluations of LVLMs across various aspects of healthcare.
- We collect 285 datasets from public sources and hospitals, covering medical
- imaging tasks of detection, classification, and segmentation, to form the data fuel for establishing such
- a benchmark. The detailed datasets are listed in the supplementary. Based on the data foundation,
- we design a reliable pipeline to generate question-answering pairs and organize them from different
- perspectives with manual validation. Finally, we carefully select approximately 26K questions with
- varying levels of perceptual granularity from the manually validated cases to construct the final
- GMAI-MMBench.
+ We propose GMAI-MMBench, an innovative benchmark meticulously designed for the medical field, capable of providing comprehensive evaluations of LVLMs across various aspects of healthcare, as shown in the overview figure. We collect 284 datasets from public sources and hospitals, covering medical imaging tasks of detection, classification, and segmentation, to form the data fuel for establishing such a benchmark. The detailed datasets are listed in the supplementary. Based on the data foundation, we design a reliable pipeline to generate question-answering pairs and organize them from different perspectives with manual validation. Finally, we carefully select approximately 26K questions with varying levels of perceptual granularity from the manually validated cases to construct the final GMAI-MMBench.
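The overview describes a pipeline that converts detection, classification, and segmentation annotations into question-answer pairs before manual validation. The following is a hedged sketch of one such conversion for a classification-style label; the label pool, question template, and distractor sampling are hypothetical stand-ins, not the paper's actual generation rules.

```python
import random

# Hypothetical label pool; each real source dataset has its own label space.
LABEL_POOL = ["pleural effusion", "cardiomegaly", "pneumothorax", "atelectasis", "no finding"]

def make_vqa_pair(image_path, true_label, n_options=4, seed=0):
    """Turn one classification-style annotation into a multiple-choice VQA pair."""
    rng = random.Random(seed)
    distractors = rng.sample([lbl for lbl in LABEL_POOL if lbl != true_label], n_options - 1)
    options = distractors + [true_label]
    rng.shuffle(options)
    keys = "ABCDEFG"[:n_options]
    return {
        "image": image_path,
        "question": "Which finding best describes this image?",   # illustrative template
        "options": dict(zip(keys, options)),
        "answer": keys[options.index(true_label)],
    }

pair = make_vqa_pair("images/chest_xray_0002.png", "cardiomegaly")
print(pair["answer"], pair["options"][pair["answer"]])  # correct option key and its label
```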

algebraic reasoning

@@ -306,26 +275,34 @@

Statistics

diff --git a/static/images/Statistics1.jpg b/static/images/Statistics1.jpg
deleted file mode 100644
index 6a47300..0000000
Binary files a/static/images/Statistics1.jpg and /dev/null differ
diff --git a/static/images/Statistics1.png b/static/images/Statistics1.png
new file mode 100644
index 0000000..6d3fd8a
Binary files /dev/null and b/static/images/Statistics1.png differ
diff --git a/static/images/Statistics2.jpg b/static/images/Statistics2.jpg
index 2d5c211..55b0cc8 100644
Binary files a/static/images/Statistics2.jpg and b/static/images/Statistics2.jpg differ
diff --git a/static/images/Statistics3.jpg b/static/images/Statistics3.jpg
index 7bc4eb8..87998fe 100644
Binary files a/static/images/Statistics3.jpg and b/static/images/Statistics3.jpg differ
diff --git a/static/images/Statistics4.jpg b/static/images/Statistics4.jpg
index 979506c..df9d1b9 100644
Binary files a/static/images/Statistics4.jpg and b/static/images/Statistics4.jpg differ
diff --git a/static/images/body_med.png b/static/images/body_med.png
index 6837e6c..14d114a 100644
Binary files a/static/images/body_med.png and b/static/images/body_med.png differ
diff --git a/static/images/cover.jpg b/static/images/cover.jpg
deleted file mode 100644
index 1801daf..0000000
Binary files a/static/images/cover.jpg and /dev/null differ
diff --git a/static/images/cover.png b/static/images/cover.png
new file mode 100644
index 0000000..b026fe4
Binary files /dev/null and b/static/images/cover.png differ
diff --git a/static/images/workflow.png b/static/images/workflow.png
index 16e81e8..35a9923 100644
Binary files a/static/images/workflow.png and b/static/images/workflow.png differ