
[Typo] Fix missing links in the bitnet integration's docs (#136)
* fix install with absolute path

* efficient inference with torch compile

* update vllm ckpt tutorial for bitnet

* ReadME Fix.
LeiWang1999 authored Aug 9, 2024
1 parent 7c6bccf commit d52f93d
Showing 1 changed file with 4 additions and 3 deletions: integration/BitNet/README.md
@@ -2,12 +2,13 @@
 license: mit
 ---

-## Latest News
-
-- 08/09/2024 ✨: We provide a more efficient implementation for bitnet with vLLM, which should use special model checkpoints, to make the ckpt, please reach [].

 This is a BitBLAS implementation of the reproduced 1.58-bit model from [1bitLLM/bitnet_b1_58-3B](https://huggingface.co/1bitLLM/bitnet_b1_58-3B). We replaced the original simulated Int8x3bit quantized inference kernel with the BitBLAS INT8xINT2 kernel. We also evaluated the model's correctness and performance through `eval_correctness.py` and `benchmark_inference_latency.py`.

+## Latest News
+
+- 08/09/2024 ✨: We provide a more efficient implementation of BitNet with vLLM, which requires special model checkpoints; to build the checkpoints and learn how to deploy them, please check out [Make Checkpoints for vLLM](#make-checkpoints-for-vllm).
+
 ## Make Checkpoints for vLLM

 We provide two scripts to make the checkpoints for vLLM. The first script, `generate_bitnet_model_native_format.sh`, creates a checkpoint with uncompressed fp16 metadata; the main difference from the original checkpoint is the `quant_config.json`, which allows vLLM to load the model and execute it with a quant extension.
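For readers following the README excerpt above, here is a minimal sketch of how the two evaluation scripts it names might be invoked. The argument-free command lines are an assumption, since this commit does not show their CLI flags.

```bash
# Hypothetical invocations of the evaluation scripts named in the README.
# The scripts may require extra flags (e.g. a model path) not shown here.
cd integration/BitNet

# Check that the BitBLAS INT8xINT2 kernel reproduces the expected outputs
python eval_correctness.py

# Measure end-to-end inference latency with the BitBLAS kernel
python benchmark_inference_latency.py
```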
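Similarly, a hedged sketch of the checkpoint workflow described in the last paragraph: run `generate_bitnet_model_native_format.sh` to produce an fp16 checkpoint plus `quant_config.json`, then point a quant-aware vLLM build at the output directory. The output path and the flag-free invocation below are hypothetical, not taken from this commit.

```bash
# Generate the fp16 checkpoint with quant_config.json
# (the output directory name used below is hypothetical)
bash generate_bitnet_model_native_format.sh

# Serve the generated checkpoint with vLLM's OpenAI-compatible server;
# this assumes a vLLM build that understands the quant_config.json extension.
python -m vllm.entrypoints.openai.api_server --model ./bitnet_b1_58-3B-vllm
```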
