[BUG] Failed to quantize Qwen2.5-Math-72B-Instruct: Measurement/inference error (3): hidden_states #627
Out of curiosity, have you tried -hb 6? That tends to work best below 6 bpw anyway.
The issue persists even after I switched to: python convert.py -fst -hb 6 -c ~/ai/math-hard/math-hard.parquet -o ../tmp -i ~/ai/Models/qwen2.5-math-72b -cf ~/ai/Models/qwen2.5-math-72b-4.5 -b 4.5
-- Resuming job
!! Note: Overriding options with settings from existing job
-- Input: /home/orion/ai/Models/qwen2.5-math-72b
-- Output: ../tmp
-- Calibration dataset: /home/orion/ai/math-hard/math-hard.parquet, 100 / 16 rows, 2048 tokens per sample
-- Target bits per weight: 4.5 (decoder), 6 (head)
-- Max shard size: 8192 MB
-- Enabled fast_safetensors option.
-- Full model will be compiled to: /home/orion/ai/Models/qwen2.5-math-72b-4.5
!! Warning: Output path /home/orion/ai/Models/qwen2.5-math-72b-4.5 exists but is not empty
-- Measuring quantization impact...
-- Resuming from layer: model.layers.77 (MLP)
-- Layer: model.layers.78 (Attention)
-- model.layers.78.self_attn.q_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.78.self_attn.q_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.78.self_attn.q_proj 0.1:4b_128g/0.9:3b_128g s4 3.15 bpw
-- model.layers.78.self_attn.q_proj 1:4b_128g s4 4.04 bpw
-- model.layers.78.self_attn.q_proj 1:4b_64g s4 4.07 bpw
-- model.layers.78.self_attn.q_proj 1:4b_32g s4 4.13 bpw
-- model.layers.78.self_attn.q_proj 0.1:5b_128g/0.9:4b_128g s4 4.15 bpw
-- model.layers.78.self_attn.q_proj 0.1:5b_64g/0.9:4b_64g s4 4.17 bpw
-- model.layers.78.self_attn.q_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.78.self_attn.q_proj 0.1:6b_128g/0.9:5b_128g s4 5.15 bpw
-- model.layers.78.self_attn.q_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.78.self_attn.q_proj 1:6b_128g s4 6.04 bpw
-- model.layers.78.self_attn.q_proj 1:6b_32g s4 6.13 bpw
-- model.layers.78.self_attn.q_proj 1:8b_128g s4 8.04 bpw
-- model.layers.78.self_attn.k_proj 0.05:3b_64g/0.95:2b_64g s4 2.15 bpw
-- model.layers.78.self_attn.k_proj 0.1:3b_64g/0.9:2b_64g s4 2.20 bpw
-- model.layers.78.self_attn.k_proj 0.1:4b_128g/0.9:3b_128g s4 3.17 bpw
-- model.layers.78.self_attn.k_proj 1:4b_128g s4 4.06 bpw
-- model.layers.78.self_attn.k_proj 1:4b_64g s4 4.10 bpw
-- model.layers.78.self_attn.k_proj 1:4b_32g s4 4.16 bpw
-- model.layers.78.self_attn.k_proj 0.1:5b_128g/0.9:4b_128g s4 4.17 bpw
-- model.layers.78.self_attn.k_proj 0.1:5b_64g/0.9:4b_64g s4 4.20 bpw
-- model.layers.78.self_attn.k_proj 0.1:5b_32g/0.9:4b_32g s4 4.26 bpw
-- model.layers.78.self_attn.k_proj 0.1:6b_128g/0.9:5b_128g s4 5.17 bpw
-- model.layers.78.self_attn.k_proj 0.1:6b_32g/0.9:5b_32g s4 5.26 bpw
-- model.layers.78.self_attn.k_proj 1:6b_128g s4 6.06 bpw
-- model.layers.78.self_attn.k_proj 1:6b_32g s4 6.16 bpw
-- model.layers.78.self_attn.k_proj 1:8b_128g s4 8.06 bpw
-- model.layers.78.self_attn.v_proj 0.05:3b_64g/0.95:2b_64g s4 2.15 bpw
-- model.layers.78.self_attn.v_proj 0.25:3b_64g/0.75:2b_64g s4 2.35 bpw
-- model.layers.78.self_attn.v_proj 0.1:4b_128g/0.9:3b_128g s4 3.17 bpw
-- model.layers.78.self_attn.v_proj 0.1:4b_64g/0.9:3b_64g s4 3.20 bpw
-- model.layers.78.self_attn.v_proj 1:4b_128g s4 4.06 bpw
-- model.layers.78.self_attn.v_proj 1:4b_64g s4 4.10 bpw
-- model.layers.78.self_attn.v_proj 1:4b_32g s4 4.16 bpw
-- model.layers.78.self_attn.v_proj 0.1:5b_64g/0.9:4b_64g s4 4.20 bpw
-- model.layers.78.self_attn.v_proj 0.1:5b_32g/0.9:4b_32g s4 4.26 bpw
-- model.layers.78.self_attn.v_proj 1:5b_64g s4 5.10 bpw
-- model.layers.78.self_attn.v_proj 1:5b_32g s4 5.16 bpw
-- model.layers.78.self_attn.v_proj 1:6b_128g s4 6.06 bpw
-- model.layers.78.self_attn.v_proj 1:6b_32g s4 6.16 bpw
-- model.layers.78.self_attn.v_proj 1:8b_32g s4 8.16 bpw
-- model.layers.78.self_attn.v_proj 1:8b_128g s4 8.06 bpw
-- model.layers.78.self_attn.o_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.78.self_attn.o_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.78.self_attn.o_proj 0.1:4b_128g/0.9:3b_128g s4 3.14 bpw
-- model.layers.78.self_attn.o_proj 1:4b_128g s4 4.04 bpw
-- model.layers.78.self_attn.o_proj 1:4b_64g s4 4.07 bpw
-- model.layers.78.self_attn.o_proj 1:4b_32g s4 4.13 bpw
-- model.layers.78.self_attn.o_proj 0.1:5b_128g/0.9:4b_128g s4 4.14 bpw
-- model.layers.78.self_attn.o_proj 0.1:5b_64g/0.9:4b_64g s4 4.17 bpw
-- model.layers.78.self_attn.o_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.78.self_attn.o_proj 0.1:6b_128g/0.9:5b_128g s4 5.14 bpw
-- model.layers.78.self_attn.o_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.78.self_attn.o_proj 1:6b_128g s4 6.04 bpw
-- model.layers.78.self_attn.o_proj 1:6b_32g s4 6.13 bpw
-- model.layers.78.self_attn.o_proj 1:8b_128g s4 8.04 bpw
-- 2.1254 bpw accuracy: 0.98200013
-- 2.1805 bpw accuracy: 0.98263463
-- 2.2265 bpw accuracy: 0.98464463
-- 2.6605 bpw accuracy: 0.98872345
-- 3.1487 bpw accuracy: 0.99127257
-- 3.1501 bpw accuracy: 0.99148091
-- 4.0394 bpw accuracy: 0.99467434
-- 4.0411 bpw accuracy: 0.99497946
-- 4.0742 bpw accuracy: 0.99520273
-- 4.1334 bpw accuracy: 0.99506140
-- 4.1501 bpw accuracy: 0.99567622
-- 4.1758 bpw accuracy: 0.99595824
-- 4.2222 bpw accuracy: 0.99629830
-- 4.2848 bpw accuracy: 0.99655357
-- 5.1982 bpw accuracy: 0.99788529
-- 5.2848 bpw accuracy: 0.99816783
-- 6.0394 bpw accuracy: 0.99851872
-- 6.2445 bpw accuracy: 0.99891316
-- 8.0394 bpw accuracy: 0.99956538
------------------------------------------------
| Measured: model.layers.78 (Attention) |
| Duration: 17.24 seconds |
| Completed step: 157/163 |
| Avg time / step (rolling): 17.24 seconds |
| Estimated remaining time: 1min 43sec |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
-- Layer: model.layers.78 (MLP)
-- model.layers.78.mlp.gate_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.78.mlp.gate_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.78.mlp.gate_proj 0.1:4b_128g/0.9:3b_128g s4 3.14 bpw
-- model.layers.78.mlp.gate_proj 0.1:4b_32g/0.9:3b_32g s4 3.23 bpw
-- model.layers.78.mlp.gate_proj 1:4b_128g s4 4.03 bpw
-- model.layers.78.mlp.gate_proj 1:4b_32g s4 4.13 bpw
-- model.layers.78.mlp.gate_proj 0.1:5b_128g/0.9:4b_128g s4 4.14 bpw
-- model.layers.78.mlp.gate_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.78.mlp.gate_proj 0.1:6b_128g/0.9:5b_128g s4 5.14 bpw
-- model.layers.78.mlp.gate_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.78.mlp.gate_proj 1:6b_128g s4 6.03 bpw
-- model.layers.78.mlp.gate_proj 0.1:8b_128g/0.9:6b_128g s4 6.25 bpw
-- model.layers.78.mlp.gate_proj 1:8b_128g s4 8.03 bpw
-- model.layers.78.mlp.up_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.78.mlp.up_proj 0.25:3b_64g/0.75:2b_64g s4 2.31 bpw
-- model.layers.78.mlp.up_proj 0.3:3b_64g/0.7:2b_64g s4 2.37 bpw
-- model.layers.78.mlp.up_proj 0.25:4b_128g/0.75:3b_128g s4 3.28 bpw
-- model.layers.78.mlp.up_proj 0.25:4b_32g/0.75:3b_32g s4 3.38 bpw
-- model.layers.78.mlp.up_proj 1:4b_32g s4 4.13 bpw
-- model.layers.78.mlp.up_proj 0.25:5b_128g/0.75:4b_128g s4 4.28 bpw
-- model.layers.78.mlp.up_proj 0.25:5b_32g/0.75:4b_32g s4 4.38 bpw
-- model.layers.78.mlp.up_proj 0.25:6b_128g/0.75:5b_128g s4 5.28 bpw
-- model.layers.78.mlp.up_proj 0.25:6b_32g/0.75:5b_32g s4 5.38 bpw
-- model.layers.78.mlp.up_proj 1:6b_128g s4 6.03 bpw
-- model.layers.78.mlp.up_proj 0.1:8b_128g/0.9:6b_128g s4 6.25 bpw
-- model.layers.78.mlp.up_proj 1:8b_128g s4 8.03 bpw
-- model.layers.78.mlp.down_proj 0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4 2.47 bpw
-- model.layers.78.mlp.down_proj 0.05:5b_32g/0.95:3b_32g s4 3.23 bpw
-- model.layers.78.mlp.down_proj 0.05:5b_32g/0.95:4b_32g s4 4.18 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4 3.40 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4 3.48 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.95:4b_128g s4 4.24 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.95:4b_32g s4 4.33 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4 4.35 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4 4.43 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4 5.30 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4 5.38 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.95:6b_128g s4 6.14 bpw
-- model.layers.78.mlp.down_proj 0.15:8b_128g/0.85:6b_128g s4 6.34 bpw
-- model.layers.78.mlp.down_proj 1:8b_128g s4 8.04 bpw
-- 2.2370 bpw accuracy: 0.96749690
-- 2.3178 bpw accuracy: 0.96841094
-- 2.5881 bpw accuracy: 0.97066376
-- 2.9045 bpw accuracy: 0.97190225
-- 3.2741 bpw accuracy: 0.98334273
-- 3.3626 bpw accuracy: 0.98490667
-- 3.6158 bpw accuracy: 0.98598315
-- 4.1340 bpw accuracy: 0.99084735
-- 4.1949 bpw accuracy: 0.99174905
-- 4.2572 bpw accuracy: 0.99142451
-- 4.3457 bpw accuracy: 0.99259937
-- 5.2402 bpw accuracy: 0.99557532
-- 5.3287 bpw accuracy: 0.99625175
-- 6.0688 bpw accuracy: 0.99740110
-- 6.2801 bpw accuracy: 0.99772034
-- 6.8458 bpw accuracy: 0.99798890
-- 8.0333 bpw accuracy: 0.99912791
------------------------------------------------
| Measured: model.layers.78 (MLP) |
| Duration: 81.78 seconds |
| Completed step: 158/163 |
| Avg time / step (rolling): 49.51 seconds |
| Estimated remaining time: 4min 7sec |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
-- Layer: model.layers.79 (Attention)
-- model.layers.79.self_attn.q_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.79.self_attn.q_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.79.self_attn.q_proj 0.1:4b_128g/0.9:3b_128g s4 3.15 bpw
-- model.layers.79.self_attn.q_proj 1:4b_128g s4 4.04 bpw
-- model.layers.79.self_attn.q_proj 1:4b_64g s4 4.07 bpw
-- model.layers.79.self_attn.q_proj 1:4b_32g s4 4.13 bpw
-- model.layers.79.self_attn.q_proj 0.1:5b_128g/0.9:4b_128g s4 4.15 bpw
-- model.layers.79.self_attn.q_proj 0.1:5b_64g/0.9:4b_64g s4 4.17 bpw
-- model.layers.79.self_attn.q_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.79.self_attn.q_proj 0.1:6b_128g/0.9:5b_128g s4 5.15 bpw
-- model.layers.79.self_attn.q_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.79.self_attn.q_proj 1:6b_128g s4 6.04 bpw
-- model.layers.79.self_attn.q_proj 1:6b_32g s4 6.13 bpw
-- model.layers.79.self_attn.q_proj 1:8b_128g s4 8.04 bpw
-- model.layers.79.self_attn.k_proj 0.05:3b_64g/0.95:2b_64g s4 2.15 bpw
-- model.layers.79.self_attn.k_proj 0.1:3b_64g/0.9:2b_64g s4 2.20 bpw
-- model.layers.79.self_attn.k_proj 0.1:4b_128g/0.9:3b_128g s4 3.17 bpw
-- model.layers.79.self_attn.k_proj 1:4b_128g s4 4.06 bpw
-- model.layers.79.self_attn.k_proj 1:4b_64g s4 4.10 bpw
-- model.layers.79.self_attn.k_proj 1:4b_32g s4 4.16 bpw
-- model.layers.79.self_attn.k_proj 0.1:5b_128g/0.9:4b_128g s4 4.17 bpw
-- model.layers.79.self_attn.k_proj 0.1:5b_64g/0.9:4b_64g s4 4.20 bpw
-- model.layers.79.self_attn.k_proj 0.1:5b_32g/0.9:4b_32g s4 4.26 bpw
-- model.layers.79.self_attn.k_proj 0.1:6b_128g/0.9:5b_128g s4 5.17 bpw
-- model.layers.79.self_attn.k_proj 0.1:6b_32g/0.9:5b_32g s4 5.26 bpw
-- model.layers.79.self_attn.k_proj 1:6b_128g s4 6.06 bpw
-- model.layers.79.self_attn.k_proj 1:6b_32g s4 6.16 bpw
-- model.layers.79.self_attn.k_proj 1:8b_128g s4 8.06 bpw
-- model.layers.79.self_attn.v_proj 0.05:3b_64g/0.95:2b_64g s4 2.15 bpw
-- model.layers.79.self_attn.v_proj 0.25:3b_64g/0.75:2b_64g s4 2.35 bpw
-- model.layers.79.self_attn.v_proj 0.1:4b_128g/0.9:3b_128g s4 3.17 bpw
-- model.layers.79.self_attn.v_proj 0.1:4b_64g/0.9:3b_64g s4 3.20 bpw
-- model.layers.79.self_attn.v_proj 1:4b_128g s4 4.06 bpw
-- model.layers.79.self_attn.v_proj 1:4b_64g s4 4.10 bpw
-- model.layers.79.self_attn.v_proj 1:4b_32g s4 4.16 bpw
-- model.layers.79.self_attn.v_proj 0.1:5b_64g/0.9:4b_64g s4 4.20 bpw
-- model.layers.79.self_attn.v_proj 0.1:5b_32g/0.9:4b_32g s4 4.26 bpw
-- model.layers.79.self_attn.v_proj 1:5b_64g s4 5.10 bpw
-- model.layers.79.self_attn.v_proj 1:5b_32g s4 5.16 bpw
-- model.layers.79.self_attn.v_proj 1:6b_128g s4 6.06 bpw
-- model.layers.79.self_attn.v_proj 1:6b_32g s4 6.16 bpw
-- model.layers.79.self_attn.v_proj 1:8b_32g s4 8.16 bpw
-- model.layers.79.self_attn.v_proj 1:8b_128g s4 8.06 bpw
-- model.layers.79.self_attn.o_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.79.self_attn.o_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.79.self_attn.o_proj 0.1:4b_128g/0.9:3b_128g s4 3.14 bpw
-- model.layers.79.self_attn.o_proj 1:4b_128g s4 4.04 bpw
-- model.layers.79.self_attn.o_proj 1:4b_64g s4 4.07 bpw
-- model.layers.79.self_attn.o_proj 1:4b_32g s4 4.13 bpw
-- model.layers.79.self_attn.o_proj 0.1:5b_128g/0.9:4b_128g s4 4.14 bpw
-- model.layers.79.self_attn.o_proj 0.1:5b_64g/0.9:4b_64g s4 4.17 bpw
-- model.layers.79.self_attn.o_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.79.self_attn.o_proj 0.1:6b_128g/0.9:5b_128g s4 5.14 bpw
-- model.layers.79.self_attn.o_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.79.self_attn.o_proj 1:6b_128g s4 6.04 bpw
-- model.layers.79.self_attn.o_proj 1:6b_32g s4 6.13 bpw
-- model.layers.79.self_attn.o_proj 1:8b_128g s4 8.04 bpw
-- 2.1254 bpw accuracy: 0.99430841
-- 2.1805 bpw accuracy: 0.99455748
-- 2.2265 bpw accuracy: 0.99536601
-- 2.6605 bpw accuracy: 0.99650368
-- 3.1487 bpw accuracy: 0.99726392
-- 3.1501 bpw accuracy: 0.99730426
-- 4.0394 bpw accuracy: 0.99843880
-- 4.0411 bpw accuracy: 0.99847716
-- 4.0742 bpw accuracy: 0.99857254
-- 4.1334 bpw accuracy: 0.99864129
-- 4.1501 bpw accuracy: 0.99860959
-- 4.1758 bpw accuracy: 0.99868624
-- 4.2222 bpw accuracy: 0.99883697
-- 4.2848 bpw accuracy: 0.99891667
-- 5.1982 bpw accuracy: 0.99933463
-- 5.2848 bpw accuracy: 0.99941761
-- 6.0394 bpw accuracy: 0.99953790
-- 6.2445 bpw accuracy: 0.99965662
-- 8.0394 bpw accuracy: 0.99981288
------------------------------------------------
| Measured: model.layers.79 (Attention) |
| Duration: 17.23 seconds |
| Completed step: 159/163 |
| Avg time / step (rolling): 38.75 seconds |
| Estimated remaining time: 2min 35sec |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
-- Layer: model.layers.79 (MLP)
## Measurement/inference error (3): hidden_states
The head-bits setting isn't relevant here, because the error is happening during the measurement phase, not the quantization phase. IIRC this error means that NaN or inf values were produced in the hidden states. Interestingly enough, I had this happen on my finetune of Qwen2.5 72B Instruct, on the exact same layer too (79 MLP), so I'm not sure if there's something slightly weird going on with Qwen2.5 models.
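Roughly speaking, the measurement pass just sanity-checks the captured layer output; a hypothetical sketch of that kind of check (illustrative only, not exllamav2's actual code):

```python
import torch

def validate_hidden_states(hidden_states: torch.Tensor, layer_name: str) -> None:
    # Abort the measurement pass if the captured hidden states contain NaN/inf.
    if not torch.isfinite(hidden_states).all():
        raise RuntimeError(f"Measurement/inference error (3): hidden_states ({layer_name})")
```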
Is it using fp16 for the computations? If so, it's probably the same problem, and the values are going outside the ±2^16 range of fp16. The fact that it's happening right at the end would make me strongly suspect this is the problem, as I found when creating control vectors that the last layer often blows up its activations 10-20x compared to the preceding layers... The solution is to use bf16 or fp32.
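For anyone unfamiliar with the dtype ranges, a small standalone PyTorch sketch (not part of the converter) shows how an activation that is fine in fp32/bf16 overflows to inf in fp16:

```python
import torch

x = torch.tensor([70000.0])  # larger than fp16's max finite value (~65504)

print(x.to(torch.float16))   # tensor([inf], dtype=torch.float16) -> NaN/inf then propagates downstream
print(x.to(torch.bfloat16))  # finite: bf16 keeps fp32's exponent range, at reduced precision
print(x.to(torch.float32))   # finite and exact
```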
Sadly my hardware doesn't support bf16. Is there any way to make exl2 quantizations with fp32? Or, even better, to only upscale to fp32 when fp16 runs into errors?
Yes, this is suspected to be the problem.
It's a fair bit more complicated than that, though, and would need changes to the code.
You could possibly try again with the latest changes in the dev branch. I made it a bit more tolerant of overflows in the hidden state, so now if only a few channels overflow (as has been observed with Qwen2.5-72B specifically) it clamps them instead of erroring out. |
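Conceptually, the more tolerant behaviour amounts to something like the sketch below. This is written under my own assumptions; the names, threshold, and placement are illustrative, not the actual dev-branch code:

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # ~65504

def clamp_or_fail(hidden_states: torch.Tensor, max_bad_fraction: float = 1e-3) -> torch.Tensor:
    """Clamp a handful of overflowing values instead of aborting the measurement.

    Illustrative sketch only: the real dev-branch logic, threshold, and error
    handling may differ.
    """
    overflow = ~torch.isfinite(hidden_states) | (hidden_states.abs() > FP16_MAX)
    frac = overflow.float().mean().item()
    if frac > max_bad_fraction:
        # Too many channels blew up: keep the original hard error.
        raise RuntimeError("Measurement/inference error (3): hidden_states")
    # Only a few channels overflowed: clamp them back into fp16 range and continue.
    fixed = torch.nan_to_num(hidden_states, posinf=FP16_MAX, neginf=-FP16_MAX)
    return fixed.clamp_(-FP16_MAX, FP16_MAX)
```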
OS
Linux
GPU Library
CUDA 12.x
Python version
3.12
Pytorch version
2.4.1
Model
Qwen/Qwen2.5-Math-72B-Instruct
Describe the bug
calibration dataset: Orion-zhen/math-hard-calibration
console output:
The quantization process goes fine until it reaches layer 79 (MLP). I have tried 4 times, and the error remains the same.
Reproduction steps
run the command:
Expected behavior
The model should be quantized to 4.5 bpw.
Logs
full log can be viewed here:
Additional context
No response