Commit

Update docs
EricLBuehler committed Oct 10, 2024
1 parent 065236b commit bfff5ff
Showing 2 changed files with 4 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/ISQ.md
@@ -21,6 +21,7 @@ To set the ISQ type for individual layers, use a model [`topology`](TOPOLOGY.md)
- Q8K (*not available on CUDA*)
- HQQ4
- HQQ8
- FP8

When using ISQ, ISQ-able weights are automatically loaded into CPU memory before ISQ is applied; the application process then moves the quantized weights to device memory. This staging is designed to avoid memory spikes from loading the model in full precision.
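
A minimal command-line sketch of applying ISQ at load time (assuming the `mistralrs-server` binary and its `--isq` flag; the model ID is a placeholder):

```bash
# Quantize ISQ-able weights to Q4K while loading: weights are staged in
# CPU memory, quantized, then moved to device memory.
./mistralrs-server -i --isq Q4K plain -m microsoft/Phi-3.5-mini-instruct
```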

3 changes: 3 additions & 0 deletions docs/UQFF.md
@@ -51,6 +51,9 @@ The following quantization formats are supported in UQFF. One can, of course, be
- HQQ4
- HQQ8

- FP8:
  - FP8 E4M3 (4-bit exponent, 3-bit mantissa)

## Loading a UQFF model

To load a UQFF model, one should specify the artifact path. This can either be a path to a local UQFF file or a Hugging Face model ID with the format `<MODEL ID>/<FILE>`. For example, commands like the following work:
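
One hedged sketch (assuming the `mistralrs-server` binary and a `--from-uqff` flag; the model ID and file name are placeholders, and the exact flag shape may differ):

```bash
# Load a prequantized UQFF artifact; the argument may be a local path or,
# per the format above, a Hugging Face <MODEL ID>/<FILE> reference.
./mistralrs-server -i plain -m EricLBuehler/Phi-3.5-mini-instruct-UQFF \
  --from-uqff phi3.5-mini-instruct-q4k.uqff
```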
