Commit 4d2f864 — SunMarc committed Aug 14, 2023 (1 parent: 10c12bd)
Showing 1 changed file with 1 addition and 0 deletions:
docs/source/llm_quantization/usage_guides/quantization.mdx (+1, −0)
@@ -9,6 +9,7 @@ If you want to quantize 🤗 Transformers models with GPTQ, follow this [documen
To learn more about the quantization technique used in GPTQ, please refer to:
- the [GPTQ](https://arxiv.org/pdf/2210.17323.pdf) paper
- the [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) library used as the backend
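As background for the references above, the sketch below shows only the basic round-to-nearest (RTN) step onto a 4-bit grid that GPTQ improves upon; the GPTQ algorithm itself additionally uses second-order (Hessian-based) information to compensate quantization error column by column. All names here are illustrative and do not come from the AutoGPTQ code base.

```python
def quantize_row_4bit(weights):
    """Quantize a row of float weights to a symmetric 4-bit grid and back.

    This is plain round-to-nearest per-row quantization, shown only to
    illustrate what a 4-bit grid looks like; it is NOT the GPTQ algorithm.
    """
    bits = 4
    qmax = 2 ** (bits - 1) - 1               # 7 representable positive levels
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Round each weight to the nearest grid point, clamped to the int4 range.
    qs = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    # Dequantize back to floats to see the error RTN introduces.
    return [q * scale for q in qs], scale

deq, scale = quantize_row_4bit([0.12, -0.7, 0.33, 0.06])
print(scale)  # ≈ 0.1 (largest magnitude 0.7 mapped to level 7)
print(deq)    # each value snapped to a multiple of the scale
```

GPTQ's contribution is choosing the quantized values so that the layer's output error (measured against calibration data) is minimized, rather than rounding each weight independently as above.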

Note that the AutoGPTQ library provides more advanced features (Triton backend, fused attention, fused MLP) that are not integrated with Optimum. For now, we leverage only the CUDA kernel for GPTQ.
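For orientation, a minimal sketch of how such a GPTQ quantization is typically driven from 🤗 Transformers, assuming a CUDA GPU and network access to download the model; the model id and parameter values are illustrative only, and the full option set is documented in the guide this diff modifies.

```python
# Illustrative sketch only: requires `pip install auto-gptq optimum`,
# a CUDA GPU, and downloading the model weights.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # hypothetical small model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization calibrated on the "c4" dataset.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Loading with a GPTQConfig triggers quantization via the AutoGPTQ backend.
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)
```

Since only the CUDA kernel is integrated, the quantized model runs with that kernel regardless of the Triton or fused options AutoGPTQ exposes on its own.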

### Requirements
