Various improvements & upgrade ggml (#75)
* Use types from typing for better compatibility with older Python versions

* Split last double end of line token as per BlinkDL's suggestion

* Fix MSVC warnings

* Drop Q4_2 support

* Update ggml

* Bump file format version for quantization changes

* Apply suggestions
saharNooby authored May 27, 2023
1 parent 3ca9c7f commit dea929f
Showing 13 changed files with 230 additions and 77 deletions.
34 changes: 34 additions & 0 deletions CODE_STYLE.md
@@ -0,0 +1,34 @@
# Code Style

Please follow this code style when contributing to `rwkv.cpp`.

This list is not complete.

## General

Overall, keep new code in a style similar to the surrounding code.

- Keep lines at 180 characters or shorter.
- Separate logically grouped pieces of code with empty lines.
- Surround `if`, `for`, `while`, `do` and other similar statements with empty lines.
- Write documentation for public functions intended for outside use.
- Place single-line comments on the line before, not right after the code line.
- Start comments with a capital letter, use correct grammar and punctuation.

## C/C++

- Use 4 spaces for indentation.
- Use [The One True Brace Style](https://en.wikipedia.org/wiki/Indentation_style#Variant:_1TBS_(OTBS)):
- Place braces on the same line as the statement.
- Always add braces to `if`, `for`, `while`, `do` and other similar statements.

## Python

- Use 2 spaces for indentation.
- Specify types for functions and parameters.
- For `void` functions, specify `-> None`.
- Specifying types for variables:
  - required, if the variable is global;
  - required, if the type is compound (list, dict, optional, etc.);
  - optional otherwise.
- Use types from `typing` (`List`, `Dict`) instead of built-in (`list`, `dict`).
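A short sketch of these Python rules (hypothetical helpers, not part of `rwkv.cpp`): 2-space indentation, annotated parameters and return types, `-> None` for void functions, and `typing` types instead of built-ins.

```python
from typing import Dict, List

def count_tokens(tokens: List[int]) -> Dict[int, int]:
  # Compound local variable, so its type is annotated explicitly.
  counts: Dict[int, int] = {}

  for token in tokens:
    counts[token] = counts.get(token, 0) + 1

  return counts

def clear_counts(counts: Dict[int, int]) -> None:
  # A function returning nothing is annotated with -> None.
  counts.clear()
```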
19 changes: 17 additions & 2 deletions FILE_FORMAT.md
@@ -11,7 +11,8 @@ RWKVModelFile {
// All ints and floats are in machine byte order.
// Magic is "ggml" string bytes.
int32 magic = 0x67676d66;
int32 version = 100;
// Can be either 100 or 101. See "File versions" section below for details.
int32 version = 101;
int32 n_vocab;
int32 n_embed;
int32 n_layer;
@@ -39,14 +40,28 @@ Parameter {
}
```

## File versions

### `100`

Original version number, chosen so as not to collide with the `llama.cpp` file version number of `1`.

### `101`

Introduced on 2023-05-27, as `ggml` was updated to commit [00b49ec](https://github.com/ggerganov/ggml/commit/00b49ec707d73df0176e21630a6e23c2aa0e938c).

All quantized formats (`QX_Y`) were changed in a backwards-incompatible way: the new version of `ggml` cannot load version `100` quantized models.

`FP32` and `FP16` remain the same.
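Based on the header layout and version rules above, a file check could look like this Python sketch (function and field names are illustrative, not part of the `rwkv.cpp` API; `=` in the format string means native byte order, matching "machine byte order"):

```python
import struct
from typing import Dict

def parse_header(data: bytes) -> Dict[str, int]:
  # First five int32 fields: magic, version, n_vocab, n_embed, n_layer.
  magic, version, n_vocab, n_embed, n_layer = struct.unpack('=5i', data[:20])

  if magic != 0x67676d66:
    raise ValueError(f'Not an rwkv.cpp model file, magic: {magic:#x}')

  if version not in (100, 101):
    raise ValueError(f'Unsupported file version: {version}')

  return {
    'version': version,
    'n_vocab': n_vocab,
    'n_embed': n_embed,
    'n_layer': n_layer,
  }
```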

## Data types

- 0: `FP32`
- 1: `FP16`
- 2: `Q4_0`
- 3: `Q4_1`
- 4: *unused*
- 5: `Q4_2`
- 5: *unused*
- 6: *unused*
- 7: `Q5_0`
- 8: `Q5_1`
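The id-to-name mapping above can be written as a small lookup table (a sketch; unused ids are simply omitted, and ids beyond those shown in this excerpt, such as the one for `Q8_0`, are not included):

```python
from typing import Dict, Optional

DATA_TYPE_NAMES: Dict[int, str] = {
  0: 'FP32',
  1: 'FP16',
  2: 'Q4_0',
  3: 'Q4_1',
  7: 'Q5_0',
  8: 'Q5_1',
}

def data_type_name(type_id: int) -> Optional[str]:
  # Returns None for unused or unknown ids.
  return DATA_TYPE_NAMES.get(type_id)
```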
34 changes: 26 additions & 8 deletions README.md
@@ -2,7 +2,7 @@

This is a port of [BlinkDL/RWKV-LM](https://github.com/BlinkDL/RWKV-LM) to [ggerganov/ggml](https://github.com/ggerganov/ggml).

Besides the usual **FP32**, it supports **FP16**, **quantized INT4** and **quantized INT8** inference. This project is **CPU only**.
Besides the usual **FP32**, it supports **FP16**, **quantized INT4, INT5 and INT8** inference. This project is **CPU only**.

This project provides [a C library rwkv.h](rwkv.h) and [a convenient Python wrapper](rwkv%2Frwkv_cpp_model.py) for it.

@@ -20,7 +20,6 @@ Below table is for reference only. Measurements were made on 4C/8T x86 CPU with
|-----------|-------------------|--------------------|----------------------|
| `Q4_0` | 17.507 | *76* | **1.53** |
| `Q4_1` | 17.187 | **72** | 1.68 |
| `Q4_2` | 17.060 | 85 | **1.53** |
| `Q5_0` | 16.194 | 78 | *1.60* |
| `Q5_1` | 15.851 | 81 | 1.68 |
| `Q8_0` | *15.652* | 89 | 2.13 |
@@ -105,10 +104,10 @@ python rwkv/convert_pytorch_to_ggml.py ~/Downloads/RWKV-4-Pile-169M-20220807-802

```commandline
# Windows
python rwkv\quantize.py C:\rwkv.cpp-169M.bin C:\rwkv.cpp-169M-Q4_2.bin Q4_2
python rwkv\quantize.py C:\rwkv.cpp-169M.bin C:\rwkv.cpp-169M-Q5_1.bin Q5_1
# Linux / MacOS
python rwkv/quantize.py ~/Downloads/rwkv.cpp-169M.bin ~/Downloads/rwkv.cpp-169M-Q4_2.bin Q4_2
python rwkv/quantize.py ~/Downloads/rwkv.cpp-169M.bin ~/Downloads/rwkv.cpp-169M-Q5_1.bin Q5_1
```

### 4. Run the model
@@ -121,20 +120,20 @@ To generate some text, run:

```commandline
# Windows
python rwkv\generate_completions.py C:\rwkv.cpp-169M-Q4_2.bin
python rwkv\generate_completions.py C:\rwkv.cpp-169M-Q5_1.bin
# Linux / MacOS
python rwkv/generate_completions.py ~/Downloads/rwkv.cpp-169M-Q4_2.bin
python rwkv/generate_completions.py ~/Downloads/rwkv.cpp-169M-Q5_1.bin
```

To chat with a bot, run:

```commandline
# Windows
python rwkv\chat_with_bot.py C:\rwkv.cpp-169M-Q4_2.bin
python rwkv\chat_with_bot.py C:\rwkv.cpp-169M-Q5_1.bin
# Linux / MacOS
python rwkv/chat_with_bot.py ~/Downloads/rwkv.cpp-169M-Q4_2.bin
python rwkv/chat_with_bot.py ~/Downloads/rwkv.cpp-169M-Q5_1.bin
```

Edit [generate_completions.py](rwkv%2Fgenerate_completions.py) or [chat_with_bot.py](rwkv%2Fchat_with_bot.py) to change prompts and sampling settings.
@@ -167,3 +166,22 @@ for token in [1, 2, 3]:
model.free()

```

## Compatibility

`ggml` moves fast and can occasionally break compatibility with older file formats.

`rwkv.cpp` will try its best to explain why a model file can't be loaded and what next steps are available to the user.

For reference only, here is a list of the latest versions of `rwkv.cpp` that supported these older formats. **No support is provided for these versions.**

- `Q4_2`, old layout of quantized formats
- [commit 3ca9c7f](https://github.com/saharNooby/rwkv.cpp/commit/3ca9c7f7857a4b9f3de616ec938e71249cfb3f3f), [release with prebuilt binaries](https://github.com/saharNooby/rwkv.cpp/releases/tag/master-3ca9c7f)
- `Q4_3`, `Q4_1_O`
- [commit c736ef5](https://github.com/saharNooby/rwkv.cpp/commit/c736ef5411606b529d3a74c139ee111ef1a28bb9), [release with prebuilt binaries](https://github.com/saharNooby/rwkv.cpp/releases/tag/master-1c363e6)

See also [FILE_FORMAT.md](FILE_FORMAT.md) for version numbers of `rwkv.cpp` model files and their changelog.

## Contributing

There is no complete contributor guide yet, but we have [CODE_STYLE.md](CODE_STYLE.md).
2 changes: 1 addition & 1 deletion ggml
Submodule ggml updated 48 files
+16 −5 README.md
+16 −1 examples/CMakeLists.txt
+0 −6 examples/common-ggml.cpp
+174 −7 examples/common.cpp
+19 −1 examples/common.h
+1 −34 examples/dolly-v2/README.md
+1 −1 examples/dolly-v2/convert-h5-to-ggml.py
+88 −71 examples/dolly-v2/main.cpp
+30 −20 examples/dolly-v2/quantize.cpp
+21 −18 examples/gpt-2/main.cpp
+15 −9 examples/gpt-2/quantize.cpp
+23 −24 examples/gpt-j/main.cpp
+15 −9 examples/gpt-j/quantize.cpp
+4 −4 examples/gpt-neox/CMakeLists.txt
+107 −0 examples/gpt-neox/README.md
+1 −1 examples/gpt-neox/convert-h5-to-ggml.py
+105 −71 examples/gpt-neox/main.cpp
+22 −12 examples/gpt-neox/quantize.cpp
+12 −0 examples/mnist/README.md
+124 −71 examples/mnist/main.cpp
+178 −0 examples/mnist/web/index.html
+13 −0 examples/mpt/CMakeLists.txt
+158 −0 examples/mpt/convert-h5-to-ggml.py
+1,027 −0 examples/mpt/main.cpp
+186 −0 examples/mpt/quantize.cpp
+13 −0 examples/replit/CMakeLists.txt
+113 −0 examples/replit/convert-h5-to-ggml.py
+767 −0 examples/replit/main.cpp
+182 −0 examples/replit/quantize.cpp
+0 −144 examples/stablelm/README.md
+13 −0 examples/starcoder/CMakeLists.txt
+112 −0 examples/starcoder/README.md
+212 −0 examples/starcoder/convert-hf-to-ggml.py
+868 −0 examples/starcoder/main.cpp
+184 −0 examples/starcoder/quantize.cpp
+1 −1 examples/whisper/main.cpp
+10 −4 examples/whisper/quantize.cpp
+41 −39 examples/whisper/whisper.cpp
+216 −13 include/ggml/ggml.h
+2 −0 scripts/sync-whisper.sh
+368 −159 src/ggml-cuda.cu
+4 −0 src/ggml-cuda.h
+85 −122 src/ggml-opencl.c
+4,720 −2,433 src/ggml.c
+8 −0 tests/CMakeLists.txt
+777 −34 tests/test-grad0.c
+7 −7 tests/test-mul-mat0.c
+205 −0 tests/test-opt.c
