Various improvements & upgrade ggml (#75)
* Use types from typing for better compatibility with older Python versions

* Split last double end of line token as per BlinkDL's suggestion

* Fix MSVC warnings

* Drop Q4_2 support

* Update ggml

* Bump file format version for quantization changes

* Apply suggestions
saharNooby authored May 27, 2023
1 parent 3ca9c7f commit dea929f
Showing 13 changed files with 230 additions and 77 deletions.
34 changes: 34 additions & 0 deletions CODE_STYLE.md
@@ -0,0 +1,34 @@
# Code Style

Please follow this code style when contributing to `rwkv.cpp`.

This list is not complete.

## General

Overall, keep new code in a style similar to the surrounding code.

- Keep lines at 180 characters or shorter.
- Separate logically grouped pieces of code with empty lines.
- Surround `if`, `for`, `while`, `do` and other similar statements with empty lines.
- Write documentation for public functions intended for outside use.
- Place single-line comments on the line before, not right after the code line.
- Start comments with a capital letter, use correct grammar and punctuation.

## C/C++

- Use 4 spaces for indentation.
- Use [The One True Brace Style](https://en.wikipedia.org/wiki/Indentation_style#Variant:_1TBS_(OTBS)):
- Place braces on the same line as the statement.
- Always add braces to `if`, `for`, `while`, `do` and other similar statements.

## Python

- Use 2 spaces for indentation.
- Specify types for functions and parameters.
- For `void` functions, specify `-> None`.
- Specifying types for variables:
  - required, if the variable is global;
  - required, if the type is compound (list, dict, optional, etc.);
  - optional otherwise.
- Use types from `typing` (`List`, `Dict`) instead of built-in (`list`, `dict`).
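A short sketch of these Python rules (hypothetical helpers, not part of `rwkv.cpp`): 2-space indentation, annotated parameters and return types, `-> None` for void functions, and `typing` types instead of built-ins.

```python
from typing import Dict, List

def count_tokens(tokens: List[int]) -> Dict[int, int]:
  # Compound local variable, so its type is annotated explicitly.
  counts: Dict[int, int] = {}

  for token in tokens:
    counts[token] = counts.get(token, 0) + 1

  return counts

def clear_counts(counts: Dict[int, int]) -> None:
  # A function returning nothing is annotated with -> None.
  counts.clear()
```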
19 changes: 17 additions & 2 deletions FILE_FORMAT.md
@@ -11,7 +11,8 @@ RWKVModelFile {
// All ints and floats are in machine byte order.
// Magic is "ggml" string bytes.
int32 magic = 0x67676d66;
int32 version = 100;
// Can be either 100 or 101. See "File versions" section below for details.
int32 version = 101;
int32 n_vocab;
int32 n_embed;
int32 n_layer;
@@ -39,14 +40,28 @@ Parameter {
}
```

## File versions

### `100`

Original version number, chosen so as not to collide with the `llama.cpp` file version number of `1`.

### `101`

Introduced on 2023-05-27, as `ggml` was updated to commit [00b49ec](https://github.com/ggerganov/ggml/commit/00b49ec707d73df0176e21630a6e23c2aa0e938c).

All quantized formats (`QX_Y`) were changed in a backwards-incompatible way: the new version of `ggml` cannot load version `100` quantized models.

`FP32` and `FP16` remain the same.
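Based on the header layout and version rules above, a file check could look like this Python sketch (function and field names are illustrative, not part of the `rwkv.cpp` API; `=` in the format string means native byte order, matching "machine byte order"):

```python
import struct
from typing import Dict

def parse_header(data: bytes) -> Dict[str, int]:
  # First five int32 fields: magic, version, n_vocab, n_embed, n_layer.
  magic, version, n_vocab, n_embed, n_layer = struct.unpack('=5i', data[:20])

  if magic != 0x67676d66:
    raise ValueError(f'Not an rwkv.cpp model file, magic: {magic:#x}')

  if version not in (100, 101):
    raise ValueError(f'Unsupported file version: {version}')

  return {
    'version': version,
    'n_vocab': n_vocab,
    'n_embed': n_embed,
    'n_layer': n_layer,
  }
```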

## Data types

- 0: `FP32`
- 1: `FP16`
- 2: `Q4_0`
- 3: `Q4_1`
- 4: *unused*
- 5: `Q4_2`
- 5: *unused*
- 6: *unused*
- 7: `Q5_0`
- 8: `Q5_1`
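The id-to-name mapping above can be written as a small lookup table (a sketch; unused ids are simply omitted, and ids beyond those shown in this excerpt, such as the one for `Q8_0`, are not included):

```python
from typing import Dict, Optional

DATA_TYPE_NAMES: Dict[int, str] = {
  0: 'FP32',
  1: 'FP16',
  2: 'Q4_0',
  3: 'Q4_1',
  7: 'Q5_0',
  8: 'Q5_1',
}

def data_type_name(type_id: int) -> Optional[str]:
  # Returns None for unused or unknown ids.
  return DATA_TYPE_NAMES.get(type_id)
```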
34 changes: 26 additions & 8 deletions README.md
@@ -2,7 +2,7 @@

This is a port of [BlinkDL/RWKV-LM](https://github.com/BlinkDL/RWKV-LM) to [ggerganov/ggml](https://github.com/ggerganov/ggml).

Besides the usual **FP32**, it supports **FP16**, **quantized INT4** and **quantized INT8** inference. This project is **CPU only**.
Besides the usual **FP32**, it supports **FP16**, **quantized INT4, INT5 and INT8** inference. This project is **CPU only**.

This project provides [a C library rwkv.h](rwkv.h) and [a convenient Python wrapper](rwkv%2Frwkv_cpp_model.py) for it.

@@ -20,7 +20,6 @@ Below table is for reference only. Measurements were made on 4C/8T x86 CPU with
|-----------|-------------------|--------------------|----------------------|
| `Q4_0` | 17.507 | *76* | **1.53** |
| `Q4_1` | 17.187 | **72** | 1.68 |
| `Q4_2` | 17.060 | 85 | **1.53** |
| `Q5_0` | 16.194 | 78 | *1.60* |
| `Q5_1` | 15.851 | 81 | 1.68 |
| `Q8_0` | *15.652* | 89 | 2.13 |
@@ -105,10 +104,10 @@ python rwkv/convert_pytorch_to_ggml.py ~/Downloads/RWKV-4-Pile-169M-20220807-802

```commandline
# Windows
python rwkv\quantize.py C:\rwkv.cpp-169M.bin C:\rwkv.cpp-169M-Q4_2.bin Q4_2
python rwkv\quantize.py C:\rwkv.cpp-169M.bin C:\rwkv.cpp-169M-Q5_1.bin Q5_1
# Linux / MacOS
python rwkv/quantize.py ~/Downloads/rwkv.cpp-169M.bin ~/Downloads/rwkv.cpp-169M-Q4_2.bin Q4_2
python rwkv/quantize.py ~/Downloads/rwkv.cpp-169M.bin ~/Downloads/rwkv.cpp-169M-Q5_1.bin Q5_1
```

### 4. Run the model
@@ -121,20 +120,20 @@ To generate some text, run:

```commandline
# Windows
python rwkv\generate_completions.py C:\rwkv.cpp-169M-Q4_2.bin
python rwkv\generate_completions.py C:\rwkv.cpp-169M-Q5_1.bin
# Linux / MacOS
python rwkv/generate_completions.py ~/Downloads/rwkv.cpp-169M-Q4_2.bin
python rwkv/generate_completions.py ~/Downloads/rwkv.cpp-169M-Q5_1.bin
```

To chat with a bot, run:

```commandline
# Windows
python rwkv\chat_with_bot.py C:\rwkv.cpp-169M-Q4_2.bin
python rwkv\chat_with_bot.py C:\rwkv.cpp-169M-Q5_1.bin
# Linux / MacOS
python rwkv/chat_with_bot.py ~/Downloads/rwkv.cpp-169M-Q4_2.bin
python rwkv/chat_with_bot.py ~/Downloads/rwkv.cpp-169M-Q5_1.bin
```

Edit [generate_completions.py](rwkv%2Fgenerate_completions.py) or [chat_with_bot.py](rwkv%2Fchat_with_bot.py) to change prompts and sampling settings.
@@ -167,3 +166,22 @@ for token in [1, 2, 3]:
model.free()

```

## Compatibility

`ggml` moves fast and can occasionally break compatibility with older file formats.

`rwkv.cpp` will try its best to explain why a model file can't be loaded and what next steps are available to the user.

For reference only, here is a list of the latest versions of `rwkv.cpp` that supported these older formats. **No support is provided for these versions.**

- `Q4_2`, old layout of quantized formats
- [commit 3ca9c7f](https://github.com/saharNooby/rwkv.cpp/commit/3ca9c7f7857a4b9f3de616ec938e71249cfb3f3f), [release with prebuilt binaries](https://github.com/saharNooby/rwkv.cpp/releases/tag/master-3ca9c7f)
- `Q4_3`, `Q4_1_O`
- [commit c736ef5](https://github.com/saharNooby/rwkv.cpp/commit/c736ef5411606b529d3a74c139ee111ef1a28bb9), [release with prebuilt binaries](https://github.com/saharNooby/rwkv.cpp/releases/tag/master-1c363e6)

See also [FILE_FORMAT.md](FILE_FORMAT.md) for version numbers of `rwkv.cpp` model files and their changelog.

## Contributing

There is no complete contributor guide yet, but we have [CODE_STYLE.md](CODE_STYLE.md).
2 changes: 1 addition & 1 deletion ggml
Submodule ggml updated 48 files
+16 −5 README.md
+16 −1 examples/CMakeLists.txt
+0 −6 examples/common-ggml.cpp
+174 −7 examples/common.cpp
+19 −1 examples/common.h
+1 −34 examples/dolly-v2/README.md
+1 −1 examples/dolly-v2/convert-h5-to-ggml.py
+88 −71 examples/dolly-v2/main.cpp
+30 −20 examples/dolly-v2/quantize.cpp
+21 −18 examples/gpt-2/main.cpp
+15 −9 examples/gpt-2/quantize.cpp
+23 −24 examples/gpt-j/main.cpp
+15 −9 examples/gpt-j/quantize.cpp
+4 −4 examples/gpt-neox/CMakeLists.txt
+107 −0 examples/gpt-neox/README.md
+1 −1 examples/gpt-neox/convert-h5-to-ggml.py
+105 −71 examples/gpt-neox/main.cpp
+22 −12 examples/gpt-neox/quantize.cpp
+12 −0 examples/mnist/README.md
+124 −71 examples/mnist/main.cpp
+178 −0 examples/mnist/web/index.html
+13 −0 examples/mpt/CMakeLists.txt
+158 −0 examples/mpt/convert-h5-to-ggml.py
+1,027 −0 examples/mpt/main.cpp
+186 −0 examples/mpt/quantize.cpp
+13 −0 examples/replit/CMakeLists.txt
+113 −0 examples/replit/convert-h5-to-ggml.py
+767 −0 examples/replit/main.cpp
+182 −0 examples/replit/quantize.cpp
+0 −144 examples/stablelm/README.md
+13 −0 examples/starcoder/CMakeLists.txt
+112 −0 examples/starcoder/README.md
+212 −0 examples/starcoder/convert-hf-to-ggml.py
+868 −0 examples/starcoder/main.cpp
+184 −0 examples/starcoder/quantize.cpp
+1 −1 examples/whisper/main.cpp
+10 −4 examples/whisper/quantize.cpp
+41 −39 examples/whisper/whisper.cpp
+216 −13 include/ggml/ggml.h
+2 −0 scripts/sync-whisper.sh
+368 −159 src/ggml-cuda.cu
+4 −0 src/ggml-cuda.h
+85 −122 src/ggml-opencl.c
+4,720 −2,433 src/ggml.c
+8 −0 tests/CMakeLists.txt
+777 −34 tests/test-grad0.c
+7 −7 tests/test-mul-mat0.c
+205 −0 tests/test-opt.c
