From bfff5ff31a64ad9fed00fa78c8bbad1100d63b54 Mon Sep 17 00:00:00 2001
From: Eric Buehler
Date: Wed, 9 Oct 2024 21:44:44 -0400
Subject: [PATCH] Update docs

---
 docs/ISQ.md  | 1 +
 docs/UQFF.md | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/docs/ISQ.md b/docs/ISQ.md
index 76cff4fc0..bfaad1a04 100644
--- a/docs/ISQ.md
+++ b/docs/ISQ.md
@@ -21,6 +21,7 @@ To set the ISQ type for individual layers, use a model [`topology`](TOPOLOGY.md)
 - Q8K (*not available on CUDA*)
 - HQQ4
 - HQQ8
+- FP8
 
 When using ISQ, it will automatically load ISQ-able weights into CPU memory before applying ISQ. The ISQ application process moves the weights to device memory. This process is implemented to avoid memory spikes from loading the model in full precision.
 
diff --git a/docs/UQFF.md b/docs/UQFF.md
index 7dfa4a30b..29bd19fdd 100644
--- a/docs/UQFF.md
+++ b/docs/UQFF.md
@@ -51,6 +51,9 @@ The following quantization formats are supported in UQFF. One can, of course, be
 - HQQ4
 - HQQ8
 
+- FP8:
+  - FP8 E4M3 (4-bit exponent, 3-bit mantissa)
+
 ## Loading a UQFF model
 
 To load a UQFF model, one should specify the artifact path. This can either be a path to a UQFF file locally, or a Hugging Face model ID with the format `/`. For example, the following work:
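
The UQFF doc change above describes FP8 E4M3 as "4-bit exponent, 3-bit mantissa" (plus one sign bit, with exponent bias 7). As a standalone illustration of that bit layout — not code from mistral.rs, and the function name is made up for this sketch — a decoder for one E4M3 byte can be written as:

```python
def fp8_e4m3_to_float(b: int) -> float:
    """Decode one FP8 E4M3 byte: 1 sign bit, 4 exponent bits, 3 mantissa bits.

    Illustrative only (not the mistral.rs implementation). Uses exponent
    bias 7; E4M3 has no infinities and reserves only S.1111.111 for NaN,
    which is how it reaches a max finite value of 448.
    """
    sign = -1.0 if b & 0x80 else 1.0
    exp = (b >> 3) & 0x0F   # 4 exponent bits
    man = b & 0x07          # 3 mantissa bits
    if exp == 0x0F and man == 0x07:
        return float("nan")          # the single NaN bit pattern
    if exp == 0:
        # subnormal: no implicit leading 1, exponent fixed at 1 - bias
        return sign * (man / 8.0) * 2.0 ** (1 - 7)
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)
```

For example, `0x38` (exponent 7, mantissa 0) decodes to 1.0, and `0x7E` (exponent 15, mantissa 6) decodes to the E4M3 maximum of 448.0.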