
v0.2.0 (CUDA 12) performance regression compared to v0.1.13 (CUDA 11) #74

Open
invisifire opened this issue Jun 26, 2024 · 1 comment
Comments

@invisifire

Environment configuration
Environment A: CUDA 12.1, v0.2.0
Environment B: CUDA 11.8, v0.1.13

Hardware
Tested on a single A800 GPU

Model: Qwen 14B
Loaded on a single GPU with INT8 inference; environment variables configured as follows:
```shell
export CUDA_VISIBLE_DEVICES=1
export MODEL_TYPE=qwen_2
export ACT_TYPE=BF16
export WEIGHT_TYPE=INT8
export INT8_KV_CACHE=1
export MAX_SEQ_LEN=32000
export CONCURRENCY_LIMIT=50
export TOKENIZER_PATH="/data/models/Qwen1.5-14B-Chat"
export CHECKPOINT_PATH="/data/models/Qwen1.5-14B-Chat"
export START_PORT=8020
export KV_CACHE_MEM_MB=8000
export PP_SIZE=1
export TP_SIZE=1

python -m maga_transformer.start_server
```

Test data: 10 input tokens, 50 output tokens (very short sequence scenario)

(screenshot: benchmark results)

In testing, other models such as DeepSeek also show some degree of slowdown.
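To compare the two environments client-side, a minimal latency harness like the sketch below can replay the 10-in/50-out workload. It times an arbitrary `generate` callable, so it makes no assumption about the server's request format; the HTTP route and payload for the server started above depend on the deployed maga_transformer version and must be filled in by the user.

```python
import statistics
import time


def benchmark(generate, prompt, n_runs=20):
    """Time repeated calls to `generate` (a callable that sends one
    inference request and blocks until the response arrives) and
    return latency statistics in milliseconds."""
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate(prompt)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(latencies),
        "p50_ms": statistics.median(latencies),
        "max_ms": max(latencies),
    }


# Usage: wrap an HTTP call to the server launched above, e.g. with
# requests.post("http://localhost:8020/<route>", json={...}) -- the
# exact route and JSON schema are deployment-specific assumptions.
```

Running the same harness against both Environment A and Environment B on identical prompts makes the regression directly comparable.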

@invisifire
Author

Additional note: when loading, v0.2.0 emits many warnings that v0.1.13 does not, as shown below:
(screenshot: loading warnings)
