[Doc] Update FAQ (#628)
* update faq

* Update docs/zh_cn/get_started/faq.md

* Update docs/en/get_started/faq.md

* Update docs/zh_cn/get_started/faq.md

---------

Co-authored-by: Songyang Zhang <[email protected]>
Leymore and tonysy committed Nov 23, 2023
1 parent d949e3c commit 79f6449
Showing 3 changed files with 67 additions and 1 deletion.
26 changes: 26 additions & 0 deletions docs/en/get_started/faq.md
@@ -10,6 +10,16 @@ During evaluation, OpenCompass deploys multiple workers to execute tasks in para

For instance, if you're using OpenCompass on a local machine equipped with 8 GPUs, and each task demands 4 GPUs, then by default, OpenCompass will employ all 8 GPUs to concurrently run 2 tasks. However, if you adjust the `--max-num-workers` setting to 1, then only one task will be processed at a time, utilizing just 4 GPUs.

### Why doesn't the GPU usage of HuggingFace models match my expectations?

This is a complex issue that needs to be explained from both the supply and demand sides:

The supply side refers to how many tasks there are to run. A task is a combination of a model and a dataset, so the total primarily depends on how many models and datasets need to be tested. In addition, since OpenCompass splits a larger task into multiple smaller ones, the number of data entries per sub-task (`--max-partition-size`) also affects the number of tasks. (`--max-partition-size` is proportional to the actual number of data entries, but the relationship is not 1:1.)

The demand side refers to how many workers are running. Since OpenCompass instantiates multiple models for inference at the same time, we use `--num-gpus` to specify how many GPUs each instance uses. Note that `--num-gpus` is specific to HuggingFace models; setting it for non-HuggingFace models has no effect. We also use `--max-num-workers` to cap the number of instances running at the same time. Finally, when a single instance does not fully use a GPU's memory or compute, OpenCompass also supports running multiple instances on the same GPU, controlled by `--max-num-workers-per-gpu`. Roughly speaking, OpenCompass will therefore use about `--num-gpus` * `--max-num-workers` / `--max-num-workers-per-gpu` GPUs in total.

In summary, when tasks run slowly or GPU load is low, first check whether the supply is sufficient; if not, consider reducing `--max-partition-size` to split tasks more finely. Then check whether the demand is sufficient; if not, consider increasing `--max-num-workers` and `--max-num-workers-per-gpu`. In general, **we set `--num-gpus` to the minimum value that meets the demand and do not adjust it further.**
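
As a rough, illustrative sketch of the arithmetic above (the config path `configs/eval_demo.py` is only a placeholder; substitute your own), the following launch would be expected to occupy roughly `2 * 4 / 2 = 4` GPUs:

```bash
# Sketch: 2 GPUs per instance, at most 4 instances, 2 instances allowed
# per GPU  =>  roughly 2 * 4 / 2 = 4 GPUs in use overall.
python run.py configs/eval_demo.py \
    --max-partition-size 2000 \
    --num-gpus 2 \
    --max-num-workers 4 \
    --max-num-workers-per-gpu 2
```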

### How do I control the number of GPUs that OpenCompass occupies?

Currently, there isn't a direct method to specify the number of GPUs OpenCompass can utilize. However, the following are some indirect strategies:
@@ -20,6 +30,22 @@ You can limit OpenCompass's GPU access by setting the `CUDA_VISIBLE_DEVICES` env
**If using Slurm or DLC:**
Although OpenCompass doesn't have direct access to the resource pool, you can adjust the `--max-num-workers` parameter to restrict the number of evaluation tasks being submitted simultaneously. This will indirectly manage the number of GPUs that OpenCompass employs. For instance, if each task requires 4 GPUs, and you wish to allocate a total of 8 GPUs, then you should set `--max-num-workers` to 2.
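
For example, assuming the standard `run.py` entry point and its `--slurm`/`-p` options (the config path and partition name below are placeholders), the two setups might look like:

```bash
# Local machine: expose only GPUs 0-3 to OpenCompass.
CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py configs/eval_demo.py

# Slurm: each task uses 4 GPUs; allow at most 2 concurrent tasks (~8 GPUs).
python run.py configs/eval_demo.py --slurm -p my_partition --max-num-workers 2
```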

### `libGL.so.1` not found

opencv-python depends on some system dynamic libraries that may be missing from the environment. The simplest fix is to uninstall opencv-python and install opencv-python-headless instead:

```bash
pip uninstall opencv-python
pip install opencv-python-headless
```

Alternatively, you can install the missing system libraries indicated by the error message:

```bash
sudo apt-get update
sudo apt-get install -y libgl1 libglib2.0-0
```

## Network

### My tasks failed with error: `('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))` or `urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443)`
26 changes: 26 additions & 0 deletions docs/zh_cn/get_started/faq.md
@@ -10,6 +10,16 @@ OpenCompass handles evaluation requests in units called tasks. Each task

For instance, if you're using OpenCompass on a local machine with 8 GPUs and each task requires 4 GPUs, then by default OpenCompass will use all 8 GPUs to run 2 tasks concurrently. However, if you set `--max-num-workers` to 1, only one task will be processed at a time, using just 4 GPUs.

### Why doesn't the GPU usage of HuggingFace models match my expectations?

This is a fairly complex question that needs to be explained from both the supply side and the demand side:

The supply side is how many tasks there are to run. A task is a combination of a model and a dataset, so it first depends on how many models and datasets need to be tested. In addition, since OpenCompass splits a larger task into multiple smaller ones, the number of data entries per sub-task (`--max-partition-size`) also affects the number of tasks. (`--max-partition-size` is proportional to the actual number of data entries, but the relationship is not 1:1.)

The demand side is how many workers are running. Since OpenCompass instantiates multiple models for inference at the same time, we use `--num-gpus` to specify how many GPUs each instance uses. Note that `--num-gpus` is specific to HuggingFace models; setting it for non-HuggingFace models has no effect. We also use `--max-num-workers` to cap the number of instances running at the same time. Finally, because of GPU memory limits and underutilized GPUs, OpenCompass also supports running multiple instances on the same GPU, controlled by `--max-num-workers-per-gpu`. Roughly speaking, a total of `--num-gpus` * `--max-num-workers` / `--max-num-workers-per-gpu` GPUs will be used.

In summary, when tasks run slowly or GPU load is low, first check whether the supply is sufficient; if not, consider reducing `--max-partition-size` to split tasks more finely. Then check whether the demand is sufficient; if not, consider increasing `--max-num-workers` and `--max-num-workers-per-gpu`. In general, **we set `--num-gpus` to the minimum value that meets the demand and do not adjust it further.**

### How do I control the number of GPUs that OpenCompass occupies?

Currently, there is no direct way to specify how many GPUs OpenCompass may use, but here are some indirect strategies:
@@ -20,6 +30,22 @@ OpenCompass handles evaluation requests in units called tasks. Each task
**If using Slurm or DLC:**
Although OpenCompass does not have direct access to the resource pool, you can adjust the `--max-num-workers` parameter to limit how many evaluation tasks are submitted at the same time, which indirectly controls how many GPUs OpenCompass uses. For example, if each task needs 4 GPUs and you want to allocate 8 GPUs in total, set `--max-num-workers` to 2.

### `libGL.so.1` not found

opencv-python depends on some system dynamic libraries that may be missing from the environment. The simplest fix is to uninstall opencv-python and install opencv-python-headless instead:

```bash
pip uninstall opencv-python
pip install opencv-python-headless
```

Alternatively, you can install the missing system libraries indicated by the error message:

```bash
sudo apt-get update
sudo apt-get install -y libgl1 libglib2.0-0
```

## Network

### Run failed with error: `('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))` or `urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443)`
16 changes: 15 additions & 1 deletion opencompass/summarizers/subjective.py
@@ -11,7 +11,21 @@
from datetime import datetime
from typing import List, Optional

import cv2
try:
import cv2
except ImportError:
import traceback

traceback.print_exc()
raise ImportError(
'Import of cv2 failed. Please install it with '
'"pip install opencv-python-headless" and try again.\n\n'
'If the error mentions `ImportError: libGL.so.1`,'
' consider one of the following two fixes:\n'
'Method 1: uninstall opencv-python, then install opencv-python-headless\n'
'pip uninstall opencv-python; pip install opencv-python-headless\n\n'
'Method 2: install the missing dynamic link libraries\n'
'sudo apt-get update; sudo apt-get install -y libgl1 libglib2.0-0')
import mmengine
import numpy as np
import pandas as pd
