diff --git a/docs/getting_started/overview.md b/docs/getting_started/overview.md
index fe44e65e2..7be4fee8d 100644
--- a/docs/getting_started/overview.md
+++ b/docs/getting_started/overview.md
@@ -6,9 +6,9 @@ The MedPerf client provides all the necessary tools to run a complete benchmark
 
 - **Model Owner:** The [Model Owner](../roles.md#model-owners) can submit a model to the MedPerf server and request participation in a benchmark.
 
 - **Data Owner:** The [Data Owner](../roles.md#data-providers) can prepare their raw medical data, register the **metadata** of their prepared dataset, request participation in a benchmark, execute a benchmark's models on their data, and submit the results of the execution.
 
-## What's Next?
+
diff --git a/docs/images/benchmark_committee.gif b/docs/images/benchmark_committee.gif
new file mode 100644
index 000000000..e44de890e
Binary files /dev/null and b/docs/images/benchmark_committee.gif differ
diff --git a/docs/images/benefits.png b/docs/images/benefits.png
new file mode 100644
index 000000000..5fcdf849b
Binary files /dev/null and b/docs/images/benefits.png differ
diff --git a/docs/images/fed_eva_example.gif b/docs/images/fed_eva_example.gif
new file mode 100644
index 000000000..88a490ebc
Binary files /dev/null and b/docs/images/fed_eva_example.gif differ
diff --git a/docs/roles.md b/docs/roles.md
index 9bcb43338..a09b536f6 100644
--- a/docs/roles.md
+++ b/docs/roles.md
@@ -1,27 +1,27 @@
 # User Roles and Responsibilities
 
-There are different types of users/roles that can be created in MedPerf:
+Here we introduce the user roles in MedPerf. Depending on their objectives and expectations, a user may assume multiple roles.
 
 ## Benchmark Committee
 
-Include regulatory bodies, groups of experts (e.g., clinicians, patient representative groups), and data or Model Owners wishing to drive the evaluation of their model or data. Benchmark Committee do not have admin privileges on the system, but they have elevated permissions regarding the benchmark they define, such as deciding which model and Data Providers will participate.
+May include healthcare stakeholders (e.g., hospitals, clinicians, patient advocacy groups, payors), regulatory bodies, data providers, and model owners wishing to drive the evaluation of AI models on real-world data. While the *Benchmark Committee* does not have admin privileges on MedPerf, it has elevated permissions regarding benchmark assets (e.g., task, evaluation metrics) and policies (e.g., participation of model owners and data providers, anonymization).
 
 ![](./images/benchmark_committee.png)
 
 ## Data Providers
 
-Include hospitals, medical practices, research organizations, and healthcare insurance providers that own medical data, register medical data, and execute benchmark requests.
+May include hospitals, medical practices, research organizations, and healthcare payors that own medical data, register medical data, and execute benchmarks.
 
 ![](./images/data_owners.png)
 
 ## Model Owners
 
-Include ML researchers and software vendors that own a trained medical ML model and want to evaluate its performance.
+May include ML researchers and software vendors that own a trained medical ML model and want to evaluate its performance against a benchmark.
 
 ![](./images/model_owners.png)
 
 ## Platform Providers
 
-Organizations like [MLCommons](https://mlcommons.org/en/){target="\_blank"} that operate a platform that enables a benchmark committee to run benchmarks by connecting Data Providers with Model Owners.
+Organizations like [MLCommons](https://mlcommons.org/en/){target="\_blank"} that operate the MedPerf platform, enabling benchmark committees to develop and run benchmarks.
 
 ![](./images/platform%20provider.png)
diff --git a/docs/what_is_medperf.md b/docs/what_is_medperf.md
index 75b01a37a..c867a7ccf 100644
--- a/docs/what_is_medperf.md
+++ b/docs/what_is_medperf.md
@@ -1,45 +1,41 @@
-# Introduction
-
-Medical Artificial Intelligence (AI) has the potential to revolutionize healthcare by advancing evidence-based medicine, personalizing patient care, and reducing costs. Unlocking this potential requires reliable methods for evaluating the efficacy of medical machine learning (ML) models on large-scale heterogeneous data while maintaining patient privacy.
+
+MedPerf is an open-source framework for benchmarking medical ML models. It uses *Federated Evaluation*, a method in which medical ML models are securely distributed to multiple global facilities for evaluation, prioritizing patient privacy and mitigating legal and regulatory risks. The goal of *Federated Evaluation* is to make it simple and reliable to share ML models with many data providers, evaluate those ML models against their data in controlled settings, then aggregate and analyze the findings.
 
-## What is MedPerf?
+The MedPerf approach empowers healthcare stakeholders through neutral governance to assess and verify the performance of ML models in an efficient and human-supervised process, without sharing any patient data across facilities.
 
-MedPerf is an open-source framework for benchmarking ML models to deliver clinical efficacy while prioritizing patient privacy and mitigating legal and regulatory risks. It enables federated evaluation in which models are securely distributed to different facilities for evaluation. The goal of federated evaluation is to make it simple and reliable to share models with many data providers, evaluate those models against their data in controlled settings, then aggregate and analyze the findings.
-
-The MedPerf approach empowers healthcare organizations to assess and verify the performance of ML models in an efficient and human-supervised process without sharing any patient data across facilities during the process.
-
-## Why MedPerf?
-
-MedPerf reduces the risks and costs associated with data sharing, maximizing medical and patient outcomes. The platform leads to an effective, broader, and cost-effective adoption of medical ML and improves patient outcomes.
+| ![federated_evaluation.gif](images/fed_eva_example.gif) |
+|:--:|
+| *Federated evaluation of a medical AI model using MedPerf on a hypothetical example* |
 
-Anyone who joins our platform can get several benefits, regardless of the role they will assume.
-
-**Benefits if you are a [Data Provider](roles.md#data-providers):**
+## Why MedPerf?
 
-* Evaluate how well machine learning models perform on your population’s data.
-* Connect to Model Owners to help them improve medical ML in a specific domain.
-* Help define impactful medical ML benchmarks.
+MedPerf aims to identify bias and generalizability issues of medical ML models by evaluating them on diverse medical data across the world. This process allows developers of medical ML to efficiently identify performance and reliability issues in their models, while healthcare stakeholders (e.g., hospitals, practices) can validate such models for clinical efficacy.
 
-**Benefits if you are a [Model Owner](roles.md#model-owners):**
+Importantly, MedPerf supports technology for **neutral governance** in order to enable **full trust** and **transparency** among participating parties (e.g., AI vendor, data provider, regulatory body). This is all encapsulated in the benchmark committee, which is the body overseeing a benchmark.
 
-* Measure model performance on private datasets that you would never have access to.
-* Connect to specific Data Providers that can help you increase the performance of your model.
+| ![benchmark_committee.gif](images/benchmark_committee.gif) |
+|:--:|
+| *Benchmark committee in MedPerf* |
 
-**Benefits if you own a benchmark ([Benchmark Committee](roles.md#benchmark-committee)):**
+## Benefits to healthcare stakeholders
 
-* Hold a leading role in the MedPerf ecosystem by defining specifications of a benchmark for a particular medical ML task.
-* Get help to create a strong community around a specific area.
-* Rule point on creating the guidelines to generate impactful ML for a specific area.
-* Help improve best practices in your area of interest.
-* Ensure the choice of the metrics as well as the proper reference implementations.
+Anyone who joins our platform can get several benefits, regardless of the role they will assume.
 
-**Benefits to the Broad Community:**
+| ![benefits.png](images/benefits.png) |
+|:--:|
+| *Benefits to healthcare stakeholders using MedPerf* |
 
-* Provide consistent and rigorous approaches for evaluating the accuracy of ML models for real-world use in a standardized manner.
-* Enable model usability measurement across institutions while maintaining data privacy and model reliability.
-* Connect with a community of expert groups to employ scientific evaluation methodologies and technical approaches to operate benchmarks that not only have well-defined clinical aspects, such as clinical impact, clinical workflow integration and patient outcome, but also support robust technical aspects, including proper metrics, data preprocessing and reference implementation.
+[Our paper](https://www.nature.com/articles/s42256-023-00652-2){target="_blank"} describes the design philosophy in detail.
 
-## What is a benchmark in the MedPerf perspective?
+
diff --git a/docs/workflow.md b/docs/workflow.md
index beb1c5554..74e244c38 100644
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -1,40 +1,55 @@
-# MedPerf Benchmarking Workflow
+
 
-## Creating a User
+
 
-## Establishing a Benchmark Committee
-The benchmarking process starts with establishing a benchmark committee of healthcare stakeholders (experts, committee), which will identify a clinical problem where an effective ML-based solution can have a significant clinical impact.
+
 
-## Recruiting Data and Model Owners
-The benchmark committee recruits Data Providers and Model Owners either by inviting trusted parties or by making an open call for participation. A higher number of dataset providers recruited can maximize diversity on a global scale.
+A benchmark in MedPerf is a collection of assets, developed by the benchmark committee, that aims to evaluate medical ML on decentralized data providers.
 
-## Benchmark Submission
+The process is simple yet effective, enabling scalability.
 
-[MLCubes](mlcubes/mlcubes.md) are the building blocks of an experiment and are required in order to create a benchmark. Three MLCubes (Data Preparator MLCube, Reference Model MLCube, and Metrics MLCube) need to be submitted. After submitting the three MLCubes, alongside with a sample reference dataset, the Benchmark Committee is capable of creating a benchmark. Once the benchmark is submitted, the Medperf admin must approve it before it can be seen by other users. Follow our [Hands-on Tutorial](getting_started/benchmark_owner_demo.md) for detailed step-by-step guidelines.
+## Step 1. Establish Benchmark Committee
 
-## Submitting and Associating Additional Models
+The benchmarking process starts with establishing a benchmark committee of healthcare stakeholders, which will identify a clinical problem where an effective ML-based solution can have a significant clinical impact.
 
-Once a benchmark is submitted by the Benchmark Committee, any user can [submit their own Model MLCubes](getting_started/model_owner_demo.md) and [request an association with the benchmark](getting_started/model_owner_demo.md#3-request-participation). This association request executes the benchmark locally with the given model on the benchmark's reference dataset to ensure workflow validity and compatibility. If the model successfully passes the compatibility test, and its association is approved by the Benchmark Committee, it becomes a part of the benchmark.
+
+
+## Step 2. Register Benchmark
+
+[MLCubes](mlcubes/mlcubes.md) are the building blocks of an experiment and are required in order to create a benchmark. Three MLCubes (Data Preparator MLCube, Reference Model MLCube, and Metrics MLCube) need to be submitted. After submitting the three MLCubes, along with a sample reference dataset, the Benchmark Committee can create the benchmark. Once the benchmark is submitted, the MedPerf admin must approve it before it can be seen by other users. Follow our [Hands-on Tutorial](getting_started/benchmark_owner_demo.md) for detailed step-by-step guidelines.
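+
+As a rough sketch, this registration flow with the MedPerf CLI looks like the following. The command names and flags shown are illustrative assumptions and all URLs/UIDs are placeholders; the [Hands-on Tutorial](getting_started/benchmark_owner_demo.md) documents the exact syntax.
+
+```bash
+# Submit the three workflow MLCubes (manifest URLs are placeholders)
+medperf mlcube submit --name my-prep --mlcube-file <URL_TO_PREP_MLCUBE_MANIFEST>
+medperf mlcube submit --name my-ref-model --mlcube-file <URL_TO_MODEL_MLCUBE_MANIFEST>
+medperf mlcube submit --name my-metrics --mlcube-file <URL_TO_METRICS_MLCUBE_MANIFEST>
+
+# Register the benchmark, referencing the three MLCubes by the server UIDs
+# returned above, plus a hosted sample reference (demo) dataset
+medperf benchmark submit --name my-benchmark \
+    --demo-url <URL_TO_DEMO_DATASET> \
+    --data-preparation-mlcube <PREP_UID> \
+    --reference-model-mlcube <MODEL_UID> \
+    --evaluator-mlcube <METRICS_UID>
+```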
 
-## Dataset Preparation and Association
+## Step 3. Register Dataset
 
 Data Providers that want to be part of the benchmark can [prepare their own datasets, register them, and associate them](getting_started/data_owner_demo.md) with the benchmark. A dataset will be prepared using the benchmark's Data Preparator MLCube. Then, the prepared dataset's **metadata** is registered within the MedPerf server.
 
-![](./images/flow_preparation_association_folders.PNG)
+| ![flow_preparation.png](images/flow_preparation_association_folders.PNG) |
+|:--:|
+| *Data Preparation* |
+
 
 The data provider then can request to participate in the benchmark with their dataset. Requesting the association will run the benchmark's reference workflow to assure the compatibility of the prepared dataset structure with the workflow. Once the association request is approved by the Benchmark Committee, then the dataset becomes a part of the benchmark.
 
 ![](./images/dataset_preparation_association.png)
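+
+A condensed sketch of this flow with the MedPerf CLI follows; the flag names are assumptions and the paths/UIDs are placeholders, so refer to the [Data Owners tutorial](getting_started/data_owner_demo.md) for the exact commands.
+
+```bash
+# Prepare local raw data with the benchmark's Data Preparator MLCube and
+# register the resulting metadata with the MedPerf server
+medperf dataset create --benchmark <BENCHMARK_UID> \
+    --data_path /path/to/raw/data --labels_path /path/to/labels
+
+# Request association with the benchmark; this runs the reference workflow
+# as a compatibility check before the Benchmark Committee approves it
+medperf dataset associate --benchmark <BENCHMARK_UID> --data_uid <DATASET_UID>
+```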
 
-## Executing the Benchmark
+## Step 4. Register Models
+
+Once a benchmark is submitted by the Benchmark Committee, any user can [submit their own Model MLCubes](getting_started/model_owner_demo.md) and [request an association with the benchmark](getting_started/model_owner_demo.md#3-request-participation). This association request executes the benchmark locally with the given model on the benchmark's reference dataset to ensure workflow validity and compatibility. If the model successfully passes the compatibility test, and its association is approved by the Benchmark Committee, it becomes a part of the benchmark.
+
+![](./images/submitting_associating_additional_models_1.png)
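+
+Sketched with the MedPerf CLI (again, flags are illustrative assumptions and UIDs are placeholders; the [Model Owners tutorial](getting_started/model_owner_demo.md) is the authoritative reference):
+
+```bash
+# Submit the model MLCube (manifest URL is a placeholder)
+medperf mlcube submit --name my-model --mlcube-file <URL_TO_MODEL_MLCUBE_MANIFEST>
+
+# Request association; this triggers the local compatibility test against
+# the benchmark's reference dataset before the Benchmark Committee approves it
+medperf mlcube associate --benchmark <BENCHMARK_UID> --mlcube <MODEL_UID>
+```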
+
+## Step 5. Execute Benchmark
 
 The Benchmark Committee may notify Data Providers that models are available for benchmarking. Data Providers can then [run the benchmark models](getting_started/data_owner_demo.md#4-execute-the-benchmark) locally on their data.
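+
+An illustrative sketch of this step (command shapes are assumptions; see the [execution tutorial](getting_started/data_owner_demo.md#4-execute-the-benchmark) for exact usage):
+
+```bash
+# Run the models associated with the benchmark on the prepared dataset
+medperf benchmark run --benchmark <BENCHMARK_UID> --data_uid <DATASET_UID>
+
+# Review the generated metrics locally, then submit the results for aggregation
+medperf result submit --benchmark <BENCHMARK_UID> --data_uid <DATASET_UID>
+```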
@@ -42,7 +57,7 @@ This procedure retrieves the model MLCubes associated with the benchmark and run
 
 ![](./images/execution_flow_folders.PNG)
 
-## Release Results to Participants
+## Step 6. Aggregate and Release Results
 
 The benchmarking platform aggregates the results of running the models against the datasets and shares them **according to the Benchmark Committee's policy.**
diff --git a/mkdocs.yml b/mkdocs.yml
index d2f796580..e9b2e4296 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -3,22 +3,21 @@ site_url: https://medperf.com/
 repo_url: https://github.com/mlcommons/medperf
 repo_name: mlcommons/medperf
 nav:
-- What's Medperf?: what_is_medperf.md
+- What is MedPerf?: what_is_medperf.md
 - Benchmark Roles: roles.md
 - Benchmark Workflow: workflow.md
-- MedPerf Components: medperf_components.md
-- GETTING STARTED:
-  - Roles Capabilities: getting_started/overview.md
-  - First Steps:
-    - Create an Account: getting_started/signup.md
-    - Installation: getting_started/installation.md
-    - Setup: getting_started/setup.md
-  - Tutorials:
-    - Introduction: getting_started/tutorials_overview.md
-    - Bechmark Committee: getting_started/benchmark_owner_demo.md
-    - Model Owners: getting_started/model_owner_demo.md
-    - Data Owners: getting_started/data_owner_demo.md
+- Components: medperf_components.md
+- GETTING STARTED:
+  # - Roles: getting_started/overview.md
+  - Create an Account: getting_started/signup.md
+  - Installation: getting_started/installation.md
+  - Setup: getting_started/setup.md
+- TUTORIAL:
+  - Introduction: getting_started/tutorials_overview.md
+  - Benchmark Committee: getting_started/benchmark_owner_demo.md
+  - Model Owners: getting_started/model_owner_demo.md
+  - Data Owners: getting_started/data_owner_demo.md
 - ADVANCED CONCEPTS:
   - MedPerf MLCubes:
     - Introduction: mlcubes/mlcubes.md
@@ -27,14 +26,14 @@ nav:
     - Creating Metrics MLCubes from Scratch: mlcubes/mlcube_metrics.md
     - Creating Model MLCubes from GaNDLF: mlcubes/gandlf_mlcube.md
   - Authentication: concepts/auth.md
-# - Client Configuration: concepts/profiles.md
+  # - Client Configuration: concepts/profiles.md
   - MLCube Components: concepts/mlcube_files.md
   - Hosting Files: concepts/hosting_files.md
-# - Benchmark Associations: concepts/associations.md
-# - Model Priority: concepts/priorities.md
-# - Running Specific Models: concepts/single_run.md
-#- CLI Reference: cli_reference.md
-# - Code Reference: reference/
+  # - Benchmark Associations: concepts/associations.md
+  # - Model Priority: concepts/priorities.md
+  # - Running Specific Models: concepts/single_run.md
+  # - CLI Reference: cli_reference.md
+  # - Code Reference: reference/
 theme:
   custom_dir: docs/overrides
   features: