Commit: Merge branch 'main' into image2struct_v1.0.1_fixes

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 14 changed files with 345 additions and 91 deletions.

````diff
@@ -1,16 +1,68 @@
 # HEIM (Text-to-image Model Evaluation)
 
-To run HEIM, follow these steps:
+**Holistic Evaluation of Text-To-Image Models (HEIM)** is an extension of the HELM framework for evaluating **text-to-image models**.
 
+## Holistic Evaluation of Text-To-Image Models
+
+<img src="https://github.com/stanford-crfm/helm/raw/heim/src/helm/benchmark/static/heim/images/heim-logo.png" alt="" width="800"/>
+
+Significant effort has recently been made in developing text-to-image generation models, which take textual prompts as
+input and generate images. As these models are widely used in real-world applications, there is an urgent need to
+comprehensively understand their capabilities and risks. However, existing evaluations primarily focus on image-text
+alignment and image quality. To address this limitation, we introduce a new benchmark,
+**Holistic Evaluation of Text-To-Image Models (HEIM)**.
+
+We identify 12 different aspects that are important in real-world model deployment, including:
+
+- image-text alignment
+- image quality
+- aesthetics
+- originality
+- reasoning
+- knowledge
+- bias
+- toxicity
+- fairness
+- robustness
+- multilinguality
+- efficiency
+
+By curating scenarios encompassing these aspects, we evaluate state-of-the-art text-to-image models using this benchmark.
+Unlike previous evaluations that focused on alignment and quality, HEIM significantly improves coverage by evaluating all
+models across all aspects. Our results reveal that no single model excels in all aspects, with different models
+demonstrating strengths in different aspects.
+
+## References
+
+- [Leaderboard](https://crfm.stanford.edu/helm/heim/latest/)
+- [Paper](https://arxiv.org/abs/2311.04287)
+
+## Installation
+
+First, follow the [installation instructions](installation.md) to install the base HELM Python page.
+
+To install the additional dependencies to run HEIM, run:
+
-1. Create a run specs configuration file. For example, to evaluate
-[Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) against the
-[MS-COCO scenario](https://github.com/stanford-crfm/heim/blob/main/src/helm/benchmark/scenarios/image_generation/mscoco_scenario.py), run:
 ```
-echo 'entries: [{description: "mscoco:model=huggingface/stable-diffusion-v1-4", priority: 1}]' > run_entries.conf
+pip install "crfm-helm[heim]"
 ```
+Some models (e.g., DALLE-mini/mega) and metrics (`DetectionMetric`) require extra dependencies that are
+not available on PyPI. To install these dependencies, download and run the
+[extra install script](https://github.com/stanford-crfm/helm/blob/main/install-heim-extras.sh):
+```
-2. Run the benchmark with certain number of instances (e.g., 10 instances):
-`helm-run --conf-paths run_entries.conf --suite heim_v1 --max-eval-instances 10`
+bash install-heim-extras.sh
+```
+## Getting Started
+The following is an example of evaluating [Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) on the [MS-COCO scenario](https://github.com/stanford-crfm/heim/blob/main/src/helm/benchmark/scenarios/image_generation/mscoco_scenario.py) using 10 instances.
+```sh
+helm-run --run-entries mscoco:model=huggingface/stable-diffusion-v1-4 --suite my-heim-suite --max-eval-instances 10
+```
 
+## Reproducing the Leaderboard
+
-Examples of run specs configuration files can be found [here](https://github.com/stanford-crfm/helm/tree/main/src/helm/benchmark/presentation).
-We used [this configuration file](https://github.com/stanford-crfm/helm/blob/main/src/helm/benchmark/presentation/run_entries_heim.conf)
-to produce results of the paper.
+To reproduce the [entire HEIM leaderboard](https://crfm.stanford.edu/helm/heim/latest/), refer to the instructions for HEIM on the [Reproducing Leaderboards](reproducing_leaderboards.md) documentation.
````
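The updated HEIM documentation drops the old two-step flow: it no longer writes a `run_entries.conf` and passes it via `--conf-paths`. Instead it uses a single `helm-run --run-entries ...` call. If you want to drive that call from a script, below is a minimal sketch that is not part of HELM.

- The helpers `build_helm_run_command` and `run_helm` are hypothetical.
- They use only the three `helm-run` flags that appear in the diff: `--run-entries`, `--suite` and `--max-eval-instances`.
- They assume `helm-run` is on `PATH` after `pip install "crfm-helm[heim]"`.

```python
import shlex
import shutil
import subprocess
from typing import List


def build_helm_run_command(run_entry: str, suite: str, max_eval_instances: int) -> List[str]:
    """Build the argv for one `helm-run` invocation (hypothetical helper, not a HELM API).

    Only the flags shown in the HEIM docs are used: --run-entries, --suite, --max-eval-instances.
    """
    if max_eval_instances <= 0:
        raise ValueError("max_eval_instances must be positive")
    return [
        "helm-run",
        "--run-entries", run_entry,
        "--suite", suite,
        "--max-eval-instances", str(max_eval_instances),
    ]


def run_helm(argv: List[str]) -> int:
    """Execute the command if the executable is on PATH and return its exit code (hypothetical helper)."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH; install crfm-helm[heim] first")
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    cmd = build_helm_run_command(
        "mscoco:model=huggingface/stable-diffusion-v1-4",
        suite="my-heim-suite",
        max_eval_instances=10,
    )
    # Print the equivalent shell command; call run_helm(cmd) to actually start the evaluation.
    print(shlex.join(cmd))
```

Printing the argv with `shlex.join` lets you check that it matches the documented Getting Started command before handing it to `run_helm`. That helper refuses to start if `helm-run` is not installed.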
```diff
@@ -1,6 +1,6 @@
 [metadata]
 name = crfm-helm
-version = 0.5.3
+version = 0.5.4
 author = Stanford CRFM
 author_email = [email protected]
 description = Benchmark for language models
```
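The metadata hunk above bumps the package version in the `[metadata]` section from 0.5.3 to 0.5.4. Below is a minimal sketch of reading that field with Python's standard `configparser` and confirming the bump.

- The inline text mirrors the new side of the hunk, with the `author_email` line omitted.
- The helpers `read_version` and `as_tuple` are hypothetical.
- `as_tuple` only handles plain dotted numeric versions.

```python
import configparser


def read_version(cfg_text: str) -> str:
    """Return the `version` key from the [metadata] section of setup.cfg-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser["metadata"]["version"]


def as_tuple(version: str) -> tuple:
    """Turn a plain dotted release such as '0.5.4' into (0, 5, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Mirrors the new side of the metadata hunk (author_email omitted).
NEW_METADATA = """
[metadata]
name = crfm-helm
version = 0.5.4
author = Stanford CRFM
description = Benchmark for language models
"""

if __name__ == "__main__":
    new_version = read_version(NEW_METADATA)
    # The commit should move the version forward relative to 0.5.3.
    assert as_tuple(new_version) > as_tuple("0.5.3")
    print(new_version)
```

Comparing integer tuples avoids the string-ordering trap where "0.5.10" would sort before "0.5.4". Versions with pre-release or local tags would need a proper version parser instead.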
`...ark/static_build/assets/index-3ee38b3d.js` → `...ark/static_build/assets/index-19bdae52.js` (2 changes: 1 addition & 1 deletion). Large diffs are not rendered by default.