

How to evaluate the accuracy of kv cache quantization with OpenCompass #608

Closed
seeyourcell opened this issue Oct 25, 2023 · 5 comments

@seeyourcell

Motivation

How can I evaluate the accuracy of kv cache quantization with OpenCompass?

Related resources

No response

Additional context

No response

@seeyourcell
Author

Is there a corresponding script? I saw that OpenCompass can evaluate Hugging Face models, but how do I evaluate a kv cache quantized model with OpenCompass?

@lvhan028
Collaborator

There is a PR in opencompass:
open-compass/opencompass#484
Once you have prepared the kv cache quantized turbomind model, you can run the evaluation with this PR.
Note, however, that the config in this PR targets the internlm model.

@lvhan028 lvhan028 self-assigned this Oct 25, 2023
@seeyourcell
Author

[screenshot]

1. Once the kv cache quantized turbomind model is ready, which path should I point to: ./workspace, workspace/turbomind, or workspace/triton_models?
2. What is the difference between triton_models and turbomind? When kv cache quantization is enabled, is it the weight under triton_models or under turbomind/1/weight?


@lvhan028
Collaborator

Since v0.4.0, LMDeploy has supported online 4-bit/8-bit kv cache quantization. The old offline approach was removed.
The guide on model evaluation using LMDeploy as an accelerator can be found here.

If kv cache quantization is required for evaluation, set quant_policy in the engine_config: quant_policy=4 enables 4-bit kv quantization, and quant_policy=8 enables 8-bit kv quantization.
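A minimal sketch of what setting quant_policy looks like, assuming LMDeploy's TurbomindEngineConfig and pipeline API (the model name below is a placeholder; substitute your own model):

```python
from lmdeploy import TurbomindEngineConfig, pipeline

# quant_policy=8 enables online 8-bit kv cache quantization (use 4 for 4-bit,
# or 0 to disable quantization). No offline conversion step is needed since v0.4.0.
backend_config = TurbomindEngineConfig(quant_policy=8)

# 'internlm/internlm2-chat-7b' is an example model id, not prescribed by this issue.
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
response = pipe('Hello!')
print(response.text)
```

The same engine_config can be passed to the model entries in an OpenCompass evaluation config, so the benchmark runs against the quantized kv cache.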
