Additional note: loading under v0.2.0 emits many warnings that do not appear under v0.1.13, as shown below.
Environment configuration
Environment A: cuda12.1, v0.2.0
Environment B: cuda11.8, v0.1.13
Hardware: single A800 GPU
Model: qwen14B
Loaded on a single GPU with int8 inference; environment variables configured as follows:
```shell
export CUDA_VISIBLE_DEVICES=1
export MODEL_TYPE=qwen_2
export ACT_TYPE=BF16
export WEIGHT_TYPE=INT8
export INT8_KV_CACHE=1
export MAX_SEQ_LEN=32000
export CONCURRENCY_LIMIT=50
export TOKENIZER_PATH="/data/models/Qwen1.5-14B-Chat"
export CHECKPOINT_PATH="/data/models/Qwen1.5-14B-Chat"
export START_PORT=8020
export KV_CACHE_MEM_MB=8000
export PP_SIZE=1
export TP_SIZE=1

python -m maga_transformer.start_server
```
Test data: 10 input tokens, 50 output tokens (very short sequences).
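To make the A/B comparison reproducible, the 10-in/50-out scenario can be driven by a small timing harness. The `send_request` callable below is a hypothetical placeholder — the actual rtp-llm HTTP route and payload depend on the version — but the latency and throughput bookkeeping is version-independent:

```python
import time
import statistics

def benchmark(send_request, prompts, output_tokens=50):
    """Measure per-request latency and aggregate decode throughput.

    send_request: callable(prompt) -> generated text. Hypothetical hook;
    wire it to whatever HTTP client matches your rtp-llm version.
    output_tokens: tokens generated per request (50 in this scenario).
    """
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        send_request(prompt)
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "p50_s": statistics.median(latencies),
        "mean_s": total / len(latencies),
        "tokens_per_s": output_tokens * len(prompts) / total,
    }
```

Running the same prompt set against environment A (v0.2.0) and environment B (v0.1.13) and comparing `tokens_per_s` gives a like-for-like number for the regression.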
Testing shows that other models, such as deepseek, also exhibit some degree of slowdown.
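To quantify "some degree of slowdown" consistently across models, the throughput drop can be reported as a percentage. The values in the usage comment are placeholders, not measurements from this report:

```python
def slowdown_pct(tps_old, tps_new):
    """Percent throughput drop going from tps_old (e.g. v0.1.13)
    to tps_new (e.g. v0.2.0)."""
    return 100.0 * (tps_old - tps_new) / tps_old

# Placeholder example: 100 tok/s -> 80 tok/s is a 20% slowdown.
```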