

How to evaluate the accuracy of kv cache quantization with OpenCompass #608

Closed
seeyourcell opened this issue Oct 25, 2023 · 5 comments

@seeyourcell

Motivation

How can I evaluate the accuracy of kv cache quantization with OpenCompass?

Related resources

No response

Additional context

No response

@seeyourcell
Author

Is there a corresponding script? I saw that OpenCompass can evaluate Hugging Face models, but how do I evaluate a kv cache quantized model with OpenCompass?

@lvhan028
Collaborator

There is a PR in opencompass:
open-compass/opencompass#484
Once you have prepared the kv cache quantized turbomind model, you can run the evaluation with this PR.
Note, however, that the config in this PR targets the internlm model.

@lvhan028 lvhan028 self-assigned this Oct 25, 2023
@seeyourcell
Author

[screenshot]

1. Once the kv cache quantized turbomind model is ready, which path should I point to: ./workspace, workspace/turbomind, or workspace/triton_models?
2. What is the difference between triton_models and turbomind? When kv cache quantization is enabled, is it the weight under triton_models or under turbomind/1/weight?


@lvhan028
Collaborator

Since v0.4.0, LMDeploy has supported online 4-bit/8-bit kv cache quantization. The old offline approach was removed.
The guide on model evaluation using LMDeploy as an accelerator can be found here.

If kv cache quantization is required for evaluation, set quant_policy in the engine_config: quant_policy=4 enables 4-bit kv quantization, and quant_policy=8 enables 8-bit kv quantization.
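A minimal sketch of what setting quant_policy looks like, assuming LMDeploy's TurbomindEngineConfig and pipeline API (the model name below is a placeholder; substitute your own model):

```python
from lmdeploy import TurbomindEngineConfig, pipeline

# quant_policy=8 enables online 8-bit kv cache quantization (use 4 for 4-bit,
# or 0 to disable quantization). No offline conversion step is needed since v0.4.0.
backend_config = TurbomindEngineConfig(quant_policy=8)

# 'internlm/internlm2-chat-7b' is an example model id, not prescribed by this issue.
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
response = pipe('Hello!')
print(response.text)
```

The same engine_config can be passed to the model entries in an OpenCompass evaluation config, so the benchmark runs against the quantized kv cache.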
