[BUG] Failed to quantize Qwen2.5-Math-72B-Instruct: Measurement/inference error (3): hidden_states #627

Orion-zhen opened this issue Sep 19, 2024 · 7 comments

@Orion-zhen
Contributor

Orion-zhen commented Sep 19, 2024

OS

Linux

GPU Library

CUDA 12.x

Python version

3.12

Pytorch version

2.4.1

Model

Qwen/Qwen2.5-Math-72B-Instruct

Describe the bug

Calibration dataset: Orion-zhen/math-hard-calibration

Console output:

------------------------------------------------
| Measured: model.layers.79 (Attention)        |
| Duration: 17.22 seconds                      |
| Completed step: 159/163                      |
| Avg time / step (rolling): 38.69 seconds     |
| Estimated remaining time: 2min 34sec         |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
 -- Layer: model.layers.79 (MLP)
 ## Measurement/inference error (3): hidden_states

The quantization process goes fine until it reaches layer 79 (MLP). I have tried 4 times, and the error remains the same.

Reproduction steps

Run the following command:

python convert.py -fst -hb 8 -c ~/ai/math-hard/math-hard.parquet  -o ../tmp -i ~/ai/Models/qwen2.5-math-72b -cf ~/ai/Models/qwen2.5-math-72b-4.5 -b 4.5

Expected behavior

The model should be quantized to 4.5 bpw.

Logs

The full log can be viewed below:

python convert.py -fst -hb 8 -c ~/ai/math-hard/math-hard.parquet  -o ../tmp -i ~/ai/Models/qwen2.5-math-72b -cf ~/ai/Models/qwen2.5-math-72b-4.5 -b 4.5
 -- Resuming job
 !! Note: Overriding options with settings from existing job
 -- Input: /home/orion/ai/Models/qwen2.5-math-72b
 -- Output: ../tmp
 -- Calibration dataset: /home/orion/ai/math-hard/math-hard.parquet, 100 / 16 rows, 2048 tokens per sample
 -- Target bits per weight: 4.5 (decoder), 8 (head)
 -- Max shard size: 8192 MB
 -- Enabled fast_safetensors option.
 -- Full model will be compiled to: /home/orion/ai/Models/qwen2.5-math-72b-4.5
 !! Warning: Output path /home/orion/ai/Models/qwen2.5-math-72b-4.5 exists but is not empty
 -- Measuring quantization impact...
 -- Resuming from layer: model.layers.77 (MLP)
 -- Layer: model.layers.78 (Attention)
 -- model.layers.78.self_attn.q_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.15 bpw
 -- model.layers.78.self_attn.q_proj                   1:4b_128g s4                                       4.04 bpw
 -- model.layers.78.self_attn.q_proj                   1:4b_64g s4                                        4.07 bpw
 -- model.layers.78.self_attn.q_proj                   1:4b_32g s4                                        4.13 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.15 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.15 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.78.self_attn.q_proj                   1:6b_128g s4                                       6.04 bpw
 -- model.layers.78.self_attn.q_proj                   1:6b_32g s4                                        6.13 bpw
 -- model.layers.78.self_attn.q_proj                   1:8b_128g s4                                       8.04 bpw
 -- model.layers.78.self_attn.k_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.15 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.20 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.17 bpw
 -- model.layers.78.self_attn.k_proj                   1:4b_128g s4                                       4.06 bpw
 -- model.layers.78.self_attn.k_proj                   1:4b_64g s4                                        4.10 bpw
 -- model.layers.78.self_attn.k_proj                   1:4b_32g s4                                        4.16 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.17 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.20 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.26 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.17 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.26 bpw
 -- model.layers.78.self_attn.k_proj                   1:6b_128g s4                                       6.06 bpw
 -- model.layers.78.self_attn.k_proj                   1:6b_32g s4                                        6.16 bpw
 -- model.layers.78.self_attn.k_proj                   1:8b_128g s4                                       8.06 bpw
 -- model.layers.78.self_attn.v_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.15 bpw
 -- model.layers.78.self_attn.v_proj                   0.25:3b_64g/0.75:2b_64g s4                         2.35 bpw
 -- model.layers.78.self_attn.v_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.17 bpw
 -- model.layers.78.self_attn.v_proj                   0.1:4b_64g/0.9:3b_64g s4                           3.20 bpw
 -- model.layers.78.self_attn.v_proj                   1:4b_128g s4                                       4.06 bpw
 -- model.layers.78.self_attn.v_proj                   1:4b_64g s4                                        4.10 bpw
 -- model.layers.78.self_attn.v_proj                   1:4b_32g s4                                        4.16 bpw
 -- model.layers.78.self_attn.v_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.20 bpw
 -- model.layers.78.self_attn.v_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.26 bpw
 -- model.layers.78.self_attn.v_proj                   1:5b_64g s4                                        5.10 bpw
 -- model.layers.78.self_attn.v_proj                   1:5b_32g s4                                        5.16 bpw
 -- model.layers.78.self_attn.v_proj                   1:6b_128g s4                                       6.06 bpw
 -- model.layers.78.self_attn.v_proj                   1:6b_32g s4                                        6.16 bpw
 -- model.layers.78.self_attn.v_proj                   1:8b_32g s4                                        8.16 bpw
 -- model.layers.78.self_attn.v_proj                   1:8b_128g s4                                       8.06 bpw
 -- model.layers.78.self_attn.o_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.14 bpw
 -- model.layers.78.self_attn.o_proj                   1:4b_128g s4                                       4.04 bpw
 -- model.layers.78.self_attn.o_proj                   1:4b_64g s4                                        4.07 bpw
 -- model.layers.78.self_attn.o_proj                   1:4b_32g s4                                        4.13 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.14 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.14 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.78.self_attn.o_proj                   1:6b_128g s4                                       6.04 bpw
 -- model.layers.78.self_attn.o_proj                   1:6b_32g s4                                        6.13 bpw
 -- model.layers.78.self_attn.o_proj                   1:8b_128g s4                                       8.04 bpw
 -- 2.1254 bpw  accuracy: 0.98200013
 -- 2.1805 bpw  accuracy: 0.98263463
 -- 2.2265 bpw  accuracy: 0.98464463
 -- 2.6605 bpw  accuracy: 0.98872345
 -- 3.1487 bpw  accuracy: 0.99127257
 -- 3.1501 bpw  accuracy: 0.99148091
 -- 4.0394 bpw  accuracy: 0.99467434
 -- 4.0411 bpw  accuracy: 0.99497946
 -- 4.0742 bpw  accuracy: 0.99520273
 -- 4.1334 bpw  accuracy: 0.99506140
 -- 4.1501 bpw  accuracy: 0.99567622
 -- 4.1758 bpw  accuracy: 0.99595824
 -- 4.2222 bpw  accuracy: 0.99629830
 -- 4.2848 bpw  accuracy: 0.99655357
 -- 5.1982 bpw  accuracy: 0.99788529
 -- 5.2848 bpw  accuracy: 0.99816783
 -- 6.0394 bpw  accuracy: 0.99851872
 -- 6.2445 bpw  accuracy: 0.99891316
 -- 8.0394 bpw  accuracy: 0.99956538
------------------------------------------------
| Measured: model.layers.78 (Attention)        |
| Duration: 17.27 seconds                      |
| Completed step: 157/163                      |
| Avg time / step (rolling): 17.27 seconds     |
| Estimated remaining time: 1min 43sec         |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
 -- Layer: model.layers.78 (MLP)
 -- model.layers.78.mlp.gate_proj                      0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:4b_128g/0.9:3b_128g s4                         3.14 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.78.mlp.gate_proj                      1:4b_128g s4                                       4.03 bpw
 -- model.layers.78.mlp.gate_proj                      1:4b_32g s4                                        4.13 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:5b_128g/0.9:4b_128g s4                         4.14 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:6b_128g/0.9:5b_128g s4                         5.14 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.78.mlp.gate_proj                      1:6b_128g s4                                       6.03 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:8b_128g/0.9:6b_128g s4                         6.25 bpw
 -- model.layers.78.mlp.gate_proj                      1:8b_128g s4                                       8.03 bpw
 -- model.layers.78.mlp.up_proj                        0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.78.mlp.up_proj                        0.25:3b_64g/0.75:2b_64g s4                         2.31 bpw
 -- model.layers.78.mlp.up_proj                        0.3:3b_64g/0.7:2b_64g s4                           2.37 bpw
 -- model.layers.78.mlp.up_proj                        0.25:4b_128g/0.75:3b_128g s4                       3.28 bpw
 -- model.layers.78.mlp.up_proj                        0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.78.mlp.up_proj                        1:4b_32g s4                                        4.13 bpw
 -- model.layers.78.mlp.up_proj                        0.25:5b_128g/0.75:4b_128g s4                       4.28 bpw
 -- model.layers.78.mlp.up_proj                        0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.78.mlp.up_proj                        0.25:6b_128g/0.75:5b_128g s4                       5.28 bpw
 -- model.layers.78.mlp.up_proj                        0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.78.mlp.up_proj                        1:6b_128g s4                                       6.03 bpw
 -- model.layers.78.mlp.up_proj                        0.1:8b_128g/0.9:6b_128g s4                         6.25 bpw
 -- model.layers.78.mlp.up_proj                        1:8b_128g s4                                       8.03 bpw
 -- model.layers.78.mlp.down_proj                      0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.47 bpw
 -- model.layers.78.mlp.down_proj                      0.05:5b_32g/0.95:3b_32g s4                         3.23 bpw
 -- model.layers.78.mlp.down_proj                      0.05:5b_32g/0.95:4b_32g s4                         4.18 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4            3.40 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4              3.48 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.95:4b_128g s4                        4.24 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.95:4b_32g s4                         4.33 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4            4.35 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4              4.43 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4            5.30 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4              5.38 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.95:6b_128g s4                        6.14 bpw
 -- model.layers.78.mlp.down_proj                      0.15:8b_128g/0.85:6b_128g s4                       6.34 bpw
 -- model.layers.78.mlp.down_proj                      1:8b_128g s4                                       8.04 bpw
 -- 2.2370 bpw  accuracy: 0.96749690
 -- 2.3178 bpw  accuracy: 0.96841094
 -- 2.5881 bpw  accuracy: 0.97066376
 -- 2.9045 bpw  accuracy: 0.97190225
 -- 3.2741 bpw  accuracy: 0.98334273
 -- 3.3626 bpw  accuracy: 0.98490667
 -- 3.6158 bpw  accuracy: 0.98598315
 -- 4.1340 bpw  accuracy: 0.99084735
 -- 4.1949 bpw  accuracy: 0.99174905
 -- 4.2572 bpw  accuracy: 0.99142451
 -- 4.3457 bpw  accuracy: 0.99259937
 -- 5.2402 bpw  accuracy: 0.99557532
 -- 5.3287 bpw  accuracy: 0.99625175
 -- 6.0688 bpw  accuracy: 0.99740110
 -- 6.2801 bpw  accuracy: 0.99772034
 -- 6.8458 bpw  accuracy: 0.99798890
 -- 8.0333 bpw  accuracy: 0.99912791
------------------------------------------------
| Measured: model.layers.78 (MLP)              |
| Duration: 81.73 seconds                      |
| Completed step: 158/163                      |
| Avg time / step (rolling): 49.50 seconds     |
| Estimated remaining time: 4min 7sec          |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
 -- Layer: model.layers.79 (Attention)
 -- model.layers.79.self_attn.q_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.15 bpw
 -- model.layers.79.self_attn.q_proj                   1:4b_128g s4                                       4.04 bpw
 -- model.layers.79.self_attn.q_proj                   1:4b_64g s4                                        4.07 bpw
 -- model.layers.79.self_attn.q_proj                   1:4b_32g s4                                        4.13 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.15 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.15 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.79.self_attn.q_proj                   1:6b_128g s4                                       6.04 bpw
 -- model.layers.79.self_attn.q_proj                   1:6b_32g s4                                        6.13 bpw
 -- model.layers.79.self_attn.q_proj                   1:8b_128g s4                                       8.04 bpw
 -- model.layers.79.self_attn.k_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.15 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.20 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.17 bpw
 -- model.layers.79.self_attn.k_proj                   1:4b_128g s4                                       4.06 bpw
 -- model.layers.79.self_attn.k_proj                   1:4b_64g s4                                        4.10 bpw
 -- model.layers.79.self_attn.k_proj                   1:4b_32g s4                                        4.16 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.17 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.20 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.26 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.17 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.26 bpw
 -- model.layers.79.self_attn.k_proj                   1:6b_128g s4                                       6.06 bpw
 -- model.layers.79.self_attn.k_proj                   1:6b_32g s4                                        6.16 bpw
 -- model.layers.79.self_attn.k_proj                   1:8b_128g s4                                       8.06 bpw
 -- model.layers.79.self_attn.v_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.15 bpw
 -- model.layers.79.self_attn.v_proj                   0.25:3b_64g/0.75:2b_64g s4                         2.35 bpw
 -- model.layers.79.self_attn.v_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.17 bpw
 -- model.layers.79.self_attn.v_proj                   0.1:4b_64g/0.9:3b_64g s4                           3.20 bpw
 -- model.layers.79.self_attn.v_proj                   1:4b_128g s4                                       4.06 bpw
 -- model.layers.79.self_attn.v_proj                   1:4b_64g s4                                        4.10 bpw
 -- model.layers.79.self_attn.v_proj                   1:4b_32g s4                                        4.16 bpw
 -- model.layers.79.self_attn.v_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.20 bpw
 -- model.layers.79.self_attn.v_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.26 bpw
 -- model.layers.79.self_attn.v_proj                   1:5b_64g s4                                        5.10 bpw
 -- model.layers.79.self_attn.v_proj                   1:5b_32g s4                                        5.16 bpw
 -- model.layers.79.self_attn.v_proj                   1:6b_128g s4                                       6.06 bpw
 -- model.layers.79.self_attn.v_proj                   1:6b_32g s4                                        6.16 bpw
 -- model.layers.79.self_attn.v_proj                   1:8b_32g s4                                        8.16 bpw
 -- model.layers.79.self_attn.v_proj                   1:8b_128g s4                                       8.06 bpw
 -- model.layers.79.self_attn.o_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.14 bpw
 -- model.layers.79.self_attn.o_proj                   1:4b_128g s4                                       4.04 bpw
 -- model.layers.79.self_attn.o_proj                   1:4b_64g s4                                        4.07 bpw
 -- model.layers.79.self_attn.o_proj                   1:4b_32g s4                                        4.13 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.14 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.14 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.79.self_attn.o_proj                   1:6b_128g s4                                       6.04 bpw
 -- model.layers.79.self_attn.o_proj                   1:6b_32g s4                                        6.13 bpw
 -- model.layers.79.self_attn.o_proj                   1:8b_128g s4                                       8.04 bpw
 -- 2.1254 bpw  accuracy: 0.99430841
 -- 2.1805 bpw  accuracy: 0.99455748
 -- 2.2265 bpw  accuracy: 0.99536601
 -- 2.6605 bpw  accuracy: 0.99650368
 -- 3.1487 bpw  accuracy: 0.99726392
 -- 3.1501 bpw  accuracy: 0.99730426
 -- 4.0394 bpw  accuracy: 0.99843880
 -- 4.0411 bpw  accuracy: 0.99847716
 -- 4.0742 bpw  accuracy: 0.99857392
 -- 4.1334 bpw  accuracy: 0.99864129
 -- 4.1501 bpw  accuracy: 0.99860959
 -- 4.1758 bpw  accuracy: 0.99868624
 -- 4.2222 bpw  accuracy: 0.99883697
 -- 4.2848 bpw  accuracy: 0.99891667
 -- 5.1982 bpw  accuracy: 0.99933463
 -- 5.2848 bpw  accuracy: 0.99941761
 -- 6.0394 bpw  accuracy: 0.99953790
 -- 6.2445 bpw  accuracy: 0.99965662
 -- 8.0394 bpw  accuracy: 0.99981288
------------------------------------------------
| Measured: model.layers.79 (Attention)        |
| Duration: 17.24 seconds                      |
| Completed step: 159/163                      |
| Avg time / step (rolling): 38.75 seconds     |
| Estimated remaining time: 2min 34sec         |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
 -- Layer: model.layers.79 (MLP)
 ## Measurement/inference error (3): hidden_states

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
@Orion-zhen Orion-zhen added the bug Something isn't working label Sep 19, 2024
@Orion-zhen Orion-zhen changed the title [BUG] Failed to quantize Qwen2.5-Math-72B-Instruct [BUG] Failed to quantize Qwen2.5-Math-72B-Instruct: Measurement/inference error (3): hidden_states Sep 19, 2024
@Downtown-Case
Contributor

Out of curiosity, have you tried -hb 6? That tends to work best below 6 bpw anyway.

@Orion-zhen
Contributor Author

The issue persists even after switching -hb to 6:

python convert.py -fst -hb 6 -c ~/ai/math-hard/math-hard.parquet  -o ../tmp -i ~/ai/Models/qwen2.5-math-72b -cf ~/ai/Models/qwen2.5-math-72b-4.5 -b 4.5
 -- Resuming job
 !! Note: Overriding options with settings from existing job
 -- Input: /home/orion/ai/Models/qwen2.5-math-72b
 -- Output: ../tmp
 -- Calibration dataset: /home/orion/ai/math-hard/math-hard.parquet, 100 / 16 rows, 2048 tokens per sample
 -- Target bits per weight: 4.5 (decoder), 6 (head)
 -- Max shard size: 8192 MB
 -- Enabled fast_safetensors option.
 -- Full model will be compiled to: /home/orion/ai/Models/qwen2.5-math-72b-4.5
 !! Warning: Output path /home/orion/ai/Models/qwen2.5-math-72b-4.5 exists but is not empty
 -- Measuring quantization impact...
 -- Resuming from layer: model.layers.77 (MLP)
 -- Layer: model.layers.78 (Attention)
 -- model.layers.78.self_attn.q_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.15 bpw
 -- model.layers.78.self_attn.q_proj                   1:4b_128g s4                                       4.04 bpw
 -- model.layers.78.self_attn.q_proj                   1:4b_64g s4                                        4.07 bpw
 -- model.layers.78.self_attn.q_proj                   1:4b_32g s4                                        4.13 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.15 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.15 bpw
 -- model.layers.78.self_attn.q_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.78.self_attn.q_proj                   1:6b_128g s4                                       6.04 bpw
 -- model.layers.78.self_attn.q_proj                   1:6b_32g s4                                        6.13 bpw
 -- model.layers.78.self_attn.q_proj                   1:8b_128g s4                                       8.04 bpw
 -- model.layers.78.self_attn.k_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.15 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.20 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.17 bpw
 -- model.layers.78.self_attn.k_proj                   1:4b_128g s4                                       4.06 bpw
 -- model.layers.78.self_attn.k_proj                   1:4b_64g s4                                        4.10 bpw
 -- model.layers.78.self_attn.k_proj                   1:4b_32g s4                                        4.16 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.17 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.20 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.26 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.17 bpw
 -- model.layers.78.self_attn.k_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.26 bpw
 -- model.layers.78.self_attn.k_proj                   1:6b_128g s4                                       6.06 bpw
 -- model.layers.78.self_attn.k_proj                   1:6b_32g s4                                        6.16 bpw
 -- model.layers.78.self_attn.k_proj                   1:8b_128g s4                                       8.06 bpw
 -- model.layers.78.self_attn.v_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.15 bpw
 -- model.layers.78.self_attn.v_proj                   0.25:3b_64g/0.75:2b_64g s4                         2.35 bpw
 -- model.layers.78.self_attn.v_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.17 bpw
 -- model.layers.78.self_attn.v_proj                   0.1:4b_64g/0.9:3b_64g s4                           3.20 bpw
 -- model.layers.78.self_attn.v_proj                   1:4b_128g s4                                       4.06 bpw
 -- model.layers.78.self_attn.v_proj                   1:4b_64g s4                                        4.10 bpw
 -- model.layers.78.self_attn.v_proj                   1:4b_32g s4                                        4.16 bpw
 -- model.layers.78.self_attn.v_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.20 bpw
 -- model.layers.78.self_attn.v_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.26 bpw
 -- model.layers.78.self_attn.v_proj                   1:5b_64g s4                                        5.10 bpw
 -- model.layers.78.self_attn.v_proj                   1:5b_32g s4                                        5.16 bpw
 -- model.layers.78.self_attn.v_proj                   1:6b_128g s4                                       6.06 bpw
 -- model.layers.78.self_attn.v_proj                   1:6b_32g s4                                        6.16 bpw
 -- model.layers.78.self_attn.v_proj                   1:8b_32g s4                                        8.16 bpw
 -- model.layers.78.self_attn.v_proj                   1:8b_128g s4                                       8.06 bpw
 -- model.layers.78.self_attn.o_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.14 bpw
 -- model.layers.78.self_attn.o_proj                   1:4b_128g s4                                       4.04 bpw
 -- model.layers.78.self_attn.o_proj                   1:4b_64g s4                                        4.07 bpw
 -- model.layers.78.self_attn.o_proj                   1:4b_32g s4                                        4.13 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.14 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.14 bpw
 -- model.layers.78.self_attn.o_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.78.self_attn.o_proj                   1:6b_128g s4                                       6.04 bpw
 -- model.layers.78.self_attn.o_proj                   1:6b_32g s4                                        6.13 bpw
 -- model.layers.78.self_attn.o_proj                   1:8b_128g s4                                       8.04 bpw
 -- 2.1254 bpw  accuracy: 0.98200013
 -- 2.1805 bpw  accuracy: 0.98263463
 -- 2.2265 bpw  accuracy: 0.98464463
 -- 2.6605 bpw  accuracy: 0.98872345
 -- 3.1487 bpw  accuracy: 0.99127257
 -- 3.1501 bpw  accuracy: 0.99148091
 -- 4.0394 bpw  accuracy: 0.99467434
 -- 4.0411 bpw  accuracy: 0.99497946
 -- 4.0742 bpw  accuracy: 0.99520273
 -- 4.1334 bpw  accuracy: 0.99506140
 -- 4.1501 bpw  accuracy: 0.99567622
 -- 4.1758 bpw  accuracy: 0.99595824
 -- 4.2222 bpw  accuracy: 0.99629830
 -- 4.2848 bpw  accuracy: 0.99655357
 -- 5.1982 bpw  accuracy: 0.99788529
 -- 5.2848 bpw  accuracy: 0.99816783
 -- 6.0394 bpw  accuracy: 0.99851872
 -- 6.2445 bpw  accuracy: 0.99891316
 -- 8.0394 bpw  accuracy: 0.99956538
------------------------------------------------
| Measured: model.layers.78 (Attention)        |
| Duration: 17.24 seconds                      |
| Completed step: 157/163                      |
| Avg time / step (rolling): 17.24 seconds     |
| Estimated remaining time: 1min 43sec         |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
 -- Layer: model.layers.78 (MLP)
 -- model.layers.78.mlp.gate_proj                      0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:4b_128g/0.9:3b_128g s4                         3.14 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.78.mlp.gate_proj                      1:4b_128g s4                                       4.03 bpw
 -- model.layers.78.mlp.gate_proj                      1:4b_32g s4                                        4.13 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:5b_128g/0.9:4b_128g s4                         4.14 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:6b_128g/0.9:5b_128g s4                         5.14 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.78.mlp.gate_proj                      1:6b_128g s4                                       6.03 bpw
 -- model.layers.78.mlp.gate_proj                      0.1:8b_128g/0.9:6b_128g s4                         6.25 bpw
 -- model.layers.78.mlp.gate_proj                      1:8b_128g s4                                       8.03 bpw
 -- model.layers.78.mlp.up_proj                        0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.78.mlp.up_proj                        0.25:3b_64g/0.75:2b_64g s4                         2.31 bpw
 -- model.layers.78.mlp.up_proj                        0.3:3b_64g/0.7:2b_64g s4                           2.37 bpw
 -- model.layers.78.mlp.up_proj                        0.25:4b_128g/0.75:3b_128g s4                       3.28 bpw
 -- model.layers.78.mlp.up_proj                        0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.78.mlp.up_proj                        1:4b_32g s4                                        4.13 bpw
 -- model.layers.78.mlp.up_proj                        0.25:5b_128g/0.75:4b_128g s4                       4.28 bpw
 -- model.layers.78.mlp.up_proj                        0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.78.mlp.up_proj                        0.25:6b_128g/0.75:5b_128g s4                       5.28 bpw
 -- model.layers.78.mlp.up_proj                        0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.78.mlp.up_proj                        1:6b_128g s4                                       6.03 bpw
 -- model.layers.78.mlp.up_proj                        0.1:8b_128g/0.9:6b_128g s4                         6.25 bpw
 -- model.layers.78.mlp.up_proj                        1:8b_128g s4                                       8.03 bpw
 -- model.layers.78.mlp.down_proj                      0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.47 bpw
 -- model.layers.78.mlp.down_proj                      0.05:5b_32g/0.95:3b_32g s4                         3.23 bpw
 -- model.layers.78.mlp.down_proj                      0.05:5b_32g/0.95:4b_32g s4                         4.18 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4            3.40 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4              3.48 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.95:4b_128g s4                        4.24 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.95:4b_32g s4                         4.33 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4            4.35 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4              4.43 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4            5.30 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4              5.38 bpw
 -- model.layers.78.mlp.down_proj                      0.05:8b_32g/0.95:6b_128g s4                        6.14 bpw
 -- model.layers.78.mlp.down_proj                      0.15:8b_128g/0.85:6b_128g s4                       6.34 bpw
 -- model.layers.78.mlp.down_proj                      1:8b_128g s4                                       8.04 bpw
 -- 2.2370 bpw  accuracy: 0.96749690
 -- 2.3178 bpw  accuracy: 0.96841094
 -- 2.5881 bpw  accuracy: 0.97066376
 -- 2.9045 bpw  accuracy: 0.97190225
 -- 3.2741 bpw  accuracy: 0.98334273
 -- 3.3626 bpw  accuracy: 0.98490667
 -- 3.6158 bpw  accuracy: 0.98598315
 -- 4.1340 bpw  accuracy: 0.99084735
 -- 4.1949 bpw  accuracy: 0.99174905
 -- 4.2572 bpw  accuracy: 0.99142451
 -- 4.3457 bpw  accuracy: 0.99259937
 -- 5.2402 bpw  accuracy: 0.99557532
 -- 5.3287 bpw  accuracy: 0.99625175
 -- 6.0688 bpw  accuracy: 0.99740110
 -- 6.2801 bpw  accuracy: 0.99772034
 -- 6.8458 bpw  accuracy: 0.99798890
 -- 8.0333 bpw  accuracy: 0.99912791
------------------------------------------------
| Measured: model.layers.78 (MLP)              |
| Duration: 81.78 seconds                      |
| Completed step: 158/163                      |
| Avg time / step (rolling): 49.51 seconds     |
| Estimated remaining time: 4min 7sec          |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
 -- Layer: model.layers.79 (Attention)
 -- model.layers.79.self_attn.q_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.15 bpw
 -- model.layers.79.self_attn.q_proj                   1:4b_128g s4                                       4.04 bpw
 -- model.layers.79.self_attn.q_proj                   1:4b_64g s4                                        4.07 bpw
 -- model.layers.79.self_attn.q_proj                   1:4b_32g s4                                        4.13 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.15 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.15 bpw
 -- model.layers.79.self_attn.q_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.79.self_attn.q_proj                   1:6b_128g s4                                       6.04 bpw
 -- model.layers.79.self_attn.q_proj                   1:6b_32g s4                                        6.13 bpw
 -- model.layers.79.self_attn.q_proj                   1:8b_128g s4                                       8.04 bpw
 -- model.layers.79.self_attn.k_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.15 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.20 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.17 bpw
 -- model.layers.79.self_attn.k_proj                   1:4b_128g s4                                       4.06 bpw
 -- model.layers.79.self_attn.k_proj                   1:4b_64g s4                                        4.10 bpw
 -- model.layers.79.self_attn.k_proj                   1:4b_32g s4                                        4.16 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.17 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.20 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.26 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.17 bpw
 -- model.layers.79.self_attn.k_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.26 bpw
 -- model.layers.79.self_attn.k_proj                   1:6b_128g s4                                       6.06 bpw
 -- model.layers.79.self_attn.k_proj                   1:6b_32g s4                                        6.16 bpw
 -- model.layers.79.self_attn.k_proj                   1:8b_128g s4                                       8.06 bpw
 -- model.layers.79.self_attn.v_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.15 bpw
 -- model.layers.79.self_attn.v_proj                   0.25:3b_64g/0.75:2b_64g s4                         2.35 bpw
 -- model.layers.79.self_attn.v_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.17 bpw
 -- model.layers.79.self_attn.v_proj                   0.1:4b_64g/0.9:3b_64g s4                           3.20 bpw
 -- model.layers.79.self_attn.v_proj                   1:4b_128g s4                                       4.06 bpw
 -- model.layers.79.self_attn.v_proj                   1:4b_64g s4                                        4.10 bpw
 -- model.layers.79.self_attn.v_proj                   1:4b_32g s4                                        4.16 bpw
 -- model.layers.79.self_attn.v_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.20 bpw
 -- model.layers.79.self_attn.v_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.26 bpw
 -- model.layers.79.self_attn.v_proj                   1:5b_64g s4                                        5.10 bpw
 -- model.layers.79.self_attn.v_proj                   1:5b_32g s4                                        5.16 bpw
 -- model.layers.79.self_attn.v_proj                   1:6b_128g s4                                       6.06 bpw
 -- model.layers.79.self_attn.v_proj                   1:6b_32g s4                                        6.16 bpw
 -- model.layers.79.self_attn.v_proj                   1:8b_32g s4                                        8.16 bpw
 -- model.layers.79.self_attn.v_proj                   1:8b_128g s4                                       8.06 bpw
 -- model.layers.79.self_attn.o_proj                   0.05:3b_64g/0.95:2b_64g s4                         2.12 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:4b_128g/0.9:3b_128g s4                         3.14 bpw
 -- model.layers.79.self_attn.o_proj                   1:4b_128g s4                                       4.04 bpw
 -- model.layers.79.self_attn.o_proj                   1:4b_64g s4                                        4.07 bpw
 -- model.layers.79.self_attn.o_proj                   1:4b_32g s4                                        4.13 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:5b_128g/0.9:4b_128g s4                         4.14 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:5b_64g/0.9:4b_64g s4                           4.17 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:6b_128g/0.9:5b_128g s4                         5.14 bpw
 -- model.layers.79.self_attn.o_proj                   0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.79.self_attn.o_proj                   1:6b_128g s4                                       6.04 bpw
 -- model.layers.79.self_attn.o_proj                   1:6b_32g s4                                        6.13 bpw
 -- model.layers.79.self_attn.o_proj                   1:8b_128g s4                                       8.04 bpw
 -- 2.1254 bpw  accuracy: 0.99430841
 -- 2.1805 bpw  accuracy: 0.99455748
 -- 2.2265 bpw  accuracy: 0.99536601
 -- 2.6605 bpw  accuracy: 0.99650368
 -- 3.1487 bpw  accuracy: 0.99726392
 -- 3.1501 bpw  accuracy: 0.99730426
 -- 4.0394 bpw  accuracy: 0.99843880
 -- 4.0411 bpw  accuracy: 0.99847716
 -- 4.0742 bpw  accuracy: 0.99857254
 -- 4.1334 bpw  accuracy: 0.99864129
 -- 4.1501 bpw  accuracy: 0.99860959
 -- 4.1758 bpw  accuracy: 0.99868624
 -- 4.2222 bpw  accuracy: 0.99883697
 -- 4.2848 bpw  accuracy: 0.99891667
 -- 5.1982 bpw  accuracy: 0.99933463
 -- 5.2848 bpw  accuracy: 0.99941761
 -- 6.0394 bpw  accuracy: 0.99953790
 -- 6.2445 bpw  accuracy: 0.99965662
 -- 8.0394 bpw  accuracy: 0.99981288
------------------------------------------------
| Measured: model.layers.79 (Attention)        |
| Duration: 17.23 seconds                      |
| Completed step: 159/163                      |
| Avg time / step (rolling): 38.75 seconds     |
| Estimated remaining time: 2min 35sec         |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
 -- Layer: model.layers.79 (MLP)
 ## Measurement/inference error (3): hidden_states

@DocShotgun

Head bits aren't relevant here, because the error happens during the measurement phase, not the quantization phase. IIRC this error means that NaN or inf values were produced in the hidden states.

Interestingly enough, this happened on my finetune of Qwen2.5 72B Instruct as well, on the exact same layer (79 MLP), so I wonder whether there's something slightly weird going on with Qwen2.5 models.
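
For what it's worth, the check that presumably trips here amounts to a NaN/inf scan over the layer output. A minimal sketch of the idea (illustrative only, not the actual convert.py code; the function name and exact message are made up):

```python
import torch

def check_hidden_states(hidden_states: torch.Tensor, layer_name: str) -> None:
    # fp16 activations that overflow become inf, and any inf fed into a later
    # matmul or normalization quickly turns into NaN as well.
    if not torch.isfinite(hidden_states).all():
        raise RuntimeError(f"Measurement/inference error (3): hidden_states ({layer_name})")
```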

@jukofyork

jukofyork commented Sep 21, 2024

Is it using fp16 for the computations? If so, it's probably the same problem that llama.cpp had with qwen-2:

ggerganov/llama.cpp#7805

and the values are going outside the ±65504 (~2^16) range that fp16 can represent.

The fact that it's happening right at the end like that makes me strongly suspect this is the problem: when creating control vectors, I found that the last layer often blows up its activations 10-20x compared to the preceding layers... The solution is to use bf16 or fp32.
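
To illustrate the failure mode in isolation (a generic PyTorch sketch, unrelated to exllamav2's actual code paths):

```python
import torch

x = torch.tensor([30000.0, 50000.0], dtype=torch.float16)

# fp16 tops out at ~65504, so even a modest scale-up overflows to inf:
print((x * 2).isinf())                     # tensor([False,  True])

# Upcasting to fp32, or using bf16 (same exponent range as fp32), stays finite:
print((x.float() * 2).isinf())             # tensor([False, False])
print((x.to(torch.bfloat16) * 2).isinf())  # tensor([False, False])
```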

@Orion-zhen
Contributor Author

Sadly, my hardware doesn't support bf16. Is there any way to make exl2 quantizations with fp32? Or, even better, only upcast to fp32 when fp16 runs into errors?

@DocShotgun

> Is it using fp16 for the computations? If so, it's probably the same problem that llama.cpp had with qwen-2:
>
> ggerganov/llama.cpp#7805
>
> and the values are going outside the ±65504 (~2^16) range that fp16 can represent.
>
> The fact that it's happening right at the end like that makes me strongly suspect this is the problem: when creating control vectors, I found that the last layer often blows up its activations 10-20x compared to the preceding layers... The solution is to use bf16 or fp32.

Yes, this is suspected to be the problem.

> Sadly, my hardware doesn't support bf16. Is there any way to make exl2 quantizations with fp32? Or, even better, only upcast to fp32 when fp16 runs into errors?

It's rather more complicated than that and would require changes to the code.

@turboderp
Owner

You could possibly try again with the latest changes in the dev branch. I made it a bit more tolerant of overflows in the hidden state, so now, if only a few channels overflow (as has been observed with Qwen2.5-72B specifically), it clamps them instead of erroring out.
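
In other words, something along these lines (a rough sketch of that behaviour, not the actual dev-branch code; the channel threshold is invented purely for illustration):

```python
import torch

FP16_MAX = 65504.0

def clamp_or_raise(hidden_states: torch.Tensor, max_bad_channels: int = 8) -> torch.Tensor:
    # Collapse all leading dims and look for non-finite values per hidden channel.
    flat = hidden_states.reshape(-1, hidden_states.shape[-1])
    bad_channels = (~torch.isfinite(flat)).any(dim=0)
    if bad_channels.sum().item() > max_bad_channels:
        # Too much of the hidden state has overflowed; give up as before.
        raise RuntimeError("Measurement/inference error (3): hidden_states")
    # Only a handful of channels overflowed: clamp them into fp16 range instead.
    return torch.nan_to_num(hidden_states, nan=0.0, posinf=FP16_MAX, neginf=-FP16_MAX)
```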
