GPTQ-for-RWKV

Setup

pip install -r requirements.txt
python setup_cuda.py install

Results

Here is a summary of the quantization results for RWKV, measured as perplexity (PPL) on WikiText-2; lower is better.

Tested on a V100 16GB using the commands below.

Wiki2 PPL	FP16	4bit-GPTQ	4g128-GPTQ
RWKV-430M	17.59375	19.28125	18.328125
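
To make the table easier to read at a glance, the short sketch below expresses each quantized perplexity as a percentage increase over the FP16 baseline. The numbers are copied from the table; the helper name `relative_increase` is just an illustrative choice and is not part of this repository.

```python
# Wiki2 perplexities copied from the results table (lower is better).
ppl = {
    "FP16": 17.59375,
    "4bit-GPTQ": 19.28125,
    "4g128-GPTQ": 18.328125,
}


def relative_increase(quantized: float, baseline: float) -> float:
    """Return the perplexity increase over the baseline, in percent."""
    return (quantized - baseline) / baseline * 100.0


baseline = ppl["FP16"]
increase = {
    name: relative_increase(value, baseline)
    for name, value in ppl.items()
    if name != "FP16"
}

for name, pct in increase.items():
    print(f"{name}: +{pct:.2f}% vs FP16")
```

On these figures, plain 4-bit quantization costs roughly 9.6% in perplexity, while adding a group size of 128 brings the gap down to roughly 4.2%.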

All models can be found on the Hugging Face Hub.

# FP16 baseline
python rwkv.py --model RWKV/rwkv-4-430m-pile --dataset wikitext2 --wbits 16 --benchmark 32
# Quantize to 4bit
python rwkv.py --model RWKV/rwkv-4-430m-pile --dataset wikitext2 --wbits 4 --save rwkv430M_4bit.pt
# Bench 4bit
python rwkv.py --model RWKV/rwkv-4-430m-pile --dataset wikitext2 --wbits 4 --load rwkv430M_4bit.pt --benchmark 32

# Quantize to 4bit groupsize 128
python rwkv.py --model RWKV/rwkv-4-430m-pile --dataset wikitext2 --wbits 4 --groupsize 128 --save rwkv430M_4g128.pt
# Bench 4bit groupsize 128
python rwkv.py --model RWKV/rwkv-4-430m-pile --dataset wikitext2 --wbits 4 --groupsize 128 --load rwkv430M_4g128.pt --benchmark 32
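
For intuition about what `--groupsize 128` changes, here is a minimal pure-Python sketch. It is an assumption-laden illustration, not this repository's `quant.py`: it uses plain round-to-nearest quantization rather than GPTQ (which additionally uses approximate second-order information to compensate for rounding error), and the weight row is synthetic. The only point it makes is that giving each run of 128 consecutive weights its own scale and zero point limits the damage that a few outliers do to the rest of the row.

```python
import random


def quantize_rtn(weights, bits=4, groupsize=None):
    """Asymmetric round-to-nearest quantization followed by dequantization.

    With groupsize=None the whole row shares one (scale, zero) pair; with
    groupsize=G each run of G consecutive weights gets its own pair.
    """
    levels = 2 ** bits - 1  # 15 distinct steps for 4-bit
    size = groupsize or len(weights)
    restored = []
    for start in range(0, len(weights), size):
        group = weights[start:start + size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant groups
        for w in group:
            q = round((w - lo) / scale)      # integer code in [0, levels]
            restored.append(q * scale + lo)  # value recovered after dequantization
    return restored


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Hypothetical weight row: mostly small values plus two large outliers.
row = [rng.gauss(0.0, 0.02) for _ in range(1024)]
row[5], row[700] = 0.5, -0.5

err_row = mean_squared_error(row, quantize_rtn(row, bits=4))
err_g128 = mean_squared_error(row, quantize_rtn(row, bits=4, groupsize=128))
print(f"per-row MSE:   {err_row:.3e}")
print(f"group-128 MSE: {err_g128:.3e}")
```

In this toy setting the grouped variant reconstructs the row with a lower error, which matches the direction of the perplexity numbers in the results table. The trade-off is that every group needs its own scale and zero point, so the saved checkpoint carries some extra metadata.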

For text generation:

python rwkv_inference.py --model RWKV/rwkv-4-430m-pile --load rwkv430M_4bit.pt --wbits 4

Todo

Fix HuggingFace bug when dispatching model to CPU & GPU
