Reproduces the problem - code/configuration sample
I am using the following config, called `eval_qwen_instruct.py`:

```python
from opencompass.models import VLLMwithChatTemplate, VLLM
from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.gsm8k.gsm8k_gen import gsm8k_datasets
    from opencompass.configs.summarizers.leaderboard import summarizer

datasets = sum([v for k, v in locals().items() if k.endswith('_datasets') or k == 'datasets'], [])

models = [
    dict(
        type=VLLMwithChatTemplate,
        abbr='qwen2.5-7b-instruct-vllm',
        path='Qwen/Qwen2.5-7B-Instruct',
        model_kwargs=dict(tensor_parallel_size=1, gpu_memory_utilization=0.6),
        max_out_len=4096,
        max_seq_len=4096,
        batch_size=16,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=1),
    )
]

work_dir = 'outputs/debug/qwen_2_5_7b_instruct'
```
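For context, my mental model of what this config does at inference time is roughly the standalone vLLM snippet below. This is only my reading of how `VLLMwithChatTemplate` forwards `model_kwargs` to `vllm.LLM` and `generation_kwargs` to `SamplingParams`; I haven't verified it against the OpenCompass source, so treat it as a sketch, not a description of the wrapper.

```python
# Rough standalone equivalent of the model entry above (my assumption about how
# VLLMwithChatTemplate forwards kwargs, not verified against OpenCompass internals).
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen2.5-7B-Instruct')

# model_kwargs -> vllm.LLM(...)
llm = LLM(
    model='Qwen/Qwen2.5-7B-Instruct',
    tensor_parallel_size=1,
    gpu_memory_utilization=0.6,
)

# generation_kwargs -> SamplingParams; max_out_len -> max_tokens
params = SamplingParams(temperature=0, max_tokens=4096)

# Chat-template the prompt, then generate (example GSM8K-style question)
prompt = tokenizer.apply_chat_template(
    [{'role': 'user', 'content': 'Natalia sold clips to 48 of her friends in April...'}],
    tokenize=False,
    add_generation_prompt=True,
)
print(llm.generate([prompt], params)[0].outputs[0].text)
```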
Reproduces the problem - command or script
The config works in debug mode but fails in normal mode.
```bash
opencompass eval_qwen_instruct.py -a vllm -m infer          # fails, see error below
opencompass eval_qwen_instruct.py -a vllm -m infer --debug  # runs successfully, but is very slow
```
Reproduces the problem - error message
In normal mode the eval fails without a traceback. Here's what the console output looks like:
```text
opencompass]$ opencompass eval_qwen_instruct.py -a vllm -m infer
10/04 21:30:50 - OpenCompass - INFO - Transforming qwen2.5-7b-instruct-vllm to vllm
10/04 21:30:50 - OpenCompass - WARNING - Unsupported model type <class 'opencompass.models.vllm_with_tf_above_v4_33.VLLMwithChatTemplate'>, will keep the original model.
10/04 21:30:50 - OpenCompass - INFO - Current exp folder: outputs/debug/qwen_2_5_7b_instruct/20241004_213050
10/04 21:30:50 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
10/04 21:30:50 - OpenCompass - INFO - Partitioned into 1 tasks.
launch OpenICLInfer[qwen2.5-7b-instruct-vllm/gsm8k] on GPU 0
  0%|          | 0/1 [00:00<?, ?it/s]10/04 21:31:27 - OpenCompass - ERROR - /root/opencompass/opencompass/runners/local.py - _launch - 228 - task OpenICLInfer[qwen2.5-7b-instruct-vllm/gsm8k] fail, see
outputs/debug/qwen_2_5_7b_instruct/20241004_213050/logs/infer/qwen2.5-7b-instruct-vllm/gsm8k.out
100%|███████████████████████████████████████████████████████████████████████| 1/1 [00:36<00:00, 36.54s/it]
10/04 21:31:27 - OpenCompass - ERROR - /opencompass/runners/base.py - summarize - 64 - OpenICLInfer[qwen2.5-7b-instruct-vllm/gsm8k] failed with code -11
```
Here's what the task log at `outputs/debug/qwen_2_5_7b_instruct/20241004_213050/logs/infer/qwen2.5-7b-instruct-vllm/gsm8k.out` looks like:
```text
10/04 21:30:54 - OpenCompass - INFO - Task [qwen2.5-7b-instruct-vllm/gsm8k]
INFO 10-04 21:30:59 llm_engine.py:226] Initializing an LLM engine (v0.6.1.dev238+ge2c6e0a82) with config: model='Qwen/Qwen2.5-7B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2.5-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=Qwen/Qwen2.5-7B-Instruct, use_v2_block_manager=False, num_scheduler_steps=1, multi_step_stream_outputs=False, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
INFO 10-04 21:31:00 model_runner.py:1014] Starting to load model Qwen/Qwen2.5-7B-Instruct...
INFO 10-04 21:31:00 weight_utils.py:242] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:02<00:06, 2.11s/it]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:04<00:04, 2.12s/it]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:06<00:02, 2.05s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:08<00:00, 2.08s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:08<00:00, 2.08s/it]
INFO 10-04 21:31:09 model_runner.py:1025] Loading model weights took 14.2487 GB
INFO 10-04 21:31:11 gpu_executor.py:122] # GPU blocks: 5400, # CPU blocks: 4681
INFO 10-04 21:31:15 model_runner.py:1329] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 10-04 21:31:15 model_runner.py:1333] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 10-04 21:31:25 model_runner.py:1456] Graph capturing finished in 11 secs.
```
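One observation: the `-11` exit code reported by the runner looks like the worker process being killed by a signal rather than raising a Python exception, which would explain why `gsm8k.out` ends right after CUDA graph capture with no traceback. Negative return codes from Python's subprocess machinery are negated signal numbers:

```python
# A negative subprocess return code is the negated signal number,
# so exit code -11 corresponds to signal 11, i.e. SIGSEGV.
import signal

print(signal.Signals(11).name)  # prints: SIGSEGV
```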
Other information
Hi OpenCompass team, I'm curious to understand what the best way to debug something like this is. Also, what is the correct way to set `max_seq_len`? I set it to 4096 in `eval_qwen_instruct.py`, but looking at the `gsm8k.out` log above, the vLLM engine is still initialized with `max_seq_len=32768`. What am I missing?
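One workaround I'm considering, purely as an assumption on my part that `model_kwargs` are passed straight through to `vllm.LLM` (I haven't confirmed this in the OpenCompass source): cap the engine's context window explicitly via vLLM's `max_model_len` argument, since `max_seq_len` on the OpenCompass side doesn't seem to reach the engine.

```python
# Hypothetical tweak: cap the vLLM engine's context window directly.
# Assumes model_kwargs are forwarded verbatim to vllm.LLM(...); max_model_len
# is a real vLLM engine argument, but whether this setup honors it is unverified.
models = [
    dict(
        type=VLLMwithChatTemplate,
        abbr='qwen2.5-7b-instruct-vllm',
        path='Qwen/Qwen2.5-7B-Instruct',
        model_kwargs=dict(
            tensor_parallel_size=1,
            gpu_memory_utilization=0.6,
            max_model_len=4096,  # cap the engine context, in addition to max_seq_len
        ),
        max_out_len=4096,
        max_seq_len=4096,
        batch_size=16,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=1),
    )
]
```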