Commit

Update docs
EricLBuehler committed Oct 10, 2024
1 parent 065236b commit bfff5ff
Showing 2 changed files with 4 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/ISQ.md
@@ -21,6 +21,7 @@ To set the ISQ type for individual layers, use a model [`topology`](TOPOLOGY.md)
- Q8K (*not available on CUDA*)
- HQQ4
- HQQ8
- FP8

When using ISQ, ISQ-able weights are automatically loaded into CPU memory before ISQ is applied; the application process then moves the quantized weights to device memory. This staging is designed to avoid memory spikes from loading the model in full precision.
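
A minimal command-line sketch of applying ISQ at load time (assuming the `mistralrs-server` binary and its `--isq` flag; the model ID is a placeholder):

```bash
# Quantize ISQ-able weights to Q4K while loading: weights are staged in
# CPU memory, quantized, then moved to device memory.
./mistralrs-server -i --isq Q4K plain -m microsoft/Phi-3.5-mini-instruct
```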

3 changes: 3 additions & 0 deletions docs/UQFF.md
@@ -51,6 +51,9 @@ The following quantization formats are supported in UQFF. One can, of course, be
- HQQ4
- HQQ8

- FP8:
  - FP8 E4M3 (4-bit exponent, 3-bit mantissa)

## Loading a UQFF model

To load a UQFF model, one should specify the artifact path. This can either be a path to a local UQFF file or a Hugging Face model ID with the format `<MODEL ID>/<FILE>`. For example, commands like the following work:
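
One hedged sketch (assuming the `mistralrs-server` binary and a `--from-uqff` flag; the model ID and file name are placeholders, and the exact flag shape may differ):

```bash
# Load a prequantized UQFF artifact; the argument may be a local path or,
# per the format above, a Hugging Face <MODEL ID>/<FILE> reference.
./mistralrs-server -i plain -m EricLBuehler/Phi-3.5-mini-instruct-UQFF \
  --from-uqff phi3.5-mini-instruct-q4k.uqff
```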
