[BUG] Failed to quantize Qwen2.5-Math-72B-Instruct: Measurement/inference error (3): hidden_states #627
Out of curiosity, have you tried -hb 6? That tends to work best below 6 bpw anyway.
The issue persists even after I switched to: python convert.py -fst -hb 6 -c ~/ai/math-hard/math-hard.parquet -o ../tmp -i ~/ai/Models/qwen2.5-math-72b -cf ~/ai/Models/qwen2.5-math-72b-4.5 -b 4.5
-- Resuming job
!! Note: Overriding options with settings from existing job
-- Input: /home/orion/ai/Models/qwen2.5-math-72b
-- Output: ../tmp
-- Calibration dataset: /home/orion/ai/math-hard/math-hard.parquet, 100 / 16 rows, 2048 tokens per sample
-- Target bits per weight: 4.5 (decoder), 6 (head)
-- Max shard size: 8192 MB
-- Enabled fast_safetensors option.
-- Full model will be compiled to: /home/orion/ai/Models/qwen2.5-math-72b-4.5
!! Warning: Output path /home/orion/ai/Models/qwen2.5-math-72b-4.5 exists but is not empty
-- Measuring quantization impact...
-- Resuming from layer: model.layers.77 (MLP)
-- Layer: model.layers.78 (Attention)
-- model.layers.78.self_attn.q_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.78.self_attn.q_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.78.self_attn.q_proj 0.1:4b_128g/0.9:3b_128g s4 3.15 bpw
-- model.layers.78.self_attn.q_proj 1:4b_128g s4 4.04 bpw
-- model.layers.78.self_attn.q_proj 1:4b_64g s4 4.07 bpw
-- model.layers.78.self_attn.q_proj 1:4b_32g s4 4.13 bpw
-- model.layers.78.self_attn.q_proj 0.1:5b_128g/0.9:4b_128g s4 4.15 bpw
-- model.layers.78.self_attn.q_proj 0.1:5b_64g/0.9:4b_64g s4 4.17 bpw
-- model.layers.78.self_attn.q_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.78.self_attn.q_proj 0.1:6b_128g/0.9:5b_128g s4 5.15 bpw
-- model.layers.78.self_attn.q_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.78.self_attn.q_proj 1:6b_128g s4 6.04 bpw
-- model.layers.78.self_attn.q_proj 1:6b_32g s4 6.13 bpw
-- model.layers.78.self_attn.q_proj 1:8b_128g s4 8.04 bpw
-- model.layers.78.self_attn.k_proj 0.05:3b_64g/0.95:2b_64g s4 2.15 bpw
-- model.layers.78.self_attn.k_proj 0.1:3b_64g/0.9:2b_64g s4 2.20 bpw
-- model.layers.78.self_attn.k_proj 0.1:4b_128g/0.9:3b_128g s4 3.17 bpw
-- model.layers.78.self_attn.k_proj 1:4b_128g s4 4.06 bpw
-- model.layers.78.self_attn.k_proj 1:4b_64g s4 4.10 bpw
-- model.layers.78.self_attn.k_proj 1:4b_32g s4 4.16 bpw
-- model.layers.78.self_attn.k_proj 0.1:5b_128g/0.9:4b_128g s4 4.17 bpw
-- model.layers.78.self_attn.k_proj 0.1:5b_64g/0.9:4b_64g s4 4.20 bpw
-- model.layers.78.self_attn.k_proj 0.1:5b_32g/0.9:4b_32g s4 4.26 bpw
-- model.layers.78.self_attn.k_proj 0.1:6b_128g/0.9:5b_128g s4 5.17 bpw
-- model.layers.78.self_attn.k_proj 0.1:6b_32g/0.9:5b_32g s4 5.26 bpw
-- model.layers.78.self_attn.k_proj 1:6b_128g s4 6.06 bpw
-- model.layers.78.self_attn.k_proj 1:6b_32g s4 6.16 bpw
-- model.layers.78.self_attn.k_proj 1:8b_128g s4 8.06 bpw
-- model.layers.78.self_attn.v_proj 0.05:3b_64g/0.95:2b_64g s4 2.15 bpw
-- model.layers.78.self_attn.v_proj 0.25:3b_64g/0.75:2b_64g s4 2.35 bpw
-- model.layers.78.self_attn.v_proj 0.1:4b_128g/0.9:3b_128g s4 3.17 bpw
-- model.layers.78.self_attn.v_proj 0.1:4b_64g/0.9:3b_64g s4 3.20 bpw
-- model.layers.78.self_attn.v_proj 1:4b_128g s4 4.06 bpw
-- model.layers.78.self_attn.v_proj 1:4b_64g s4 4.10 bpw
-- model.layers.78.self_attn.v_proj 1:4b_32g s4 4.16 bpw
-- model.layers.78.self_attn.v_proj 0.1:5b_64g/0.9:4b_64g s4 4.20 bpw
-- model.layers.78.self_attn.v_proj 0.1:5b_32g/0.9:4b_32g s4 4.26 bpw
-- model.layers.78.self_attn.v_proj 1:5b_64g s4 5.10 bpw
-- model.layers.78.self_attn.v_proj 1:5b_32g s4 5.16 bpw
-- model.layers.78.self_attn.v_proj 1:6b_128g s4 6.06 bpw
-- model.layers.78.self_attn.v_proj 1:6b_32g s4 6.16 bpw
-- model.layers.78.self_attn.v_proj 1:8b_32g s4 8.16 bpw
-- model.layers.78.self_attn.v_proj 1:8b_128g s4 8.06 bpw
-- model.layers.78.self_attn.o_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.78.self_attn.o_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.78.self_attn.o_proj 0.1:4b_128g/0.9:3b_128g s4 3.14 bpw
-- model.layers.78.self_attn.o_proj 1:4b_128g s4 4.04 bpw
-- model.layers.78.self_attn.o_proj 1:4b_64g s4 4.07 bpw
-- model.layers.78.self_attn.o_proj 1:4b_32g s4 4.13 bpw
-- model.layers.78.self_attn.o_proj 0.1:5b_128g/0.9:4b_128g s4 4.14 bpw
-- model.layers.78.self_attn.o_proj 0.1:5b_64g/0.9:4b_64g s4 4.17 bpw
-- model.layers.78.self_attn.o_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.78.self_attn.o_proj 0.1:6b_128g/0.9:5b_128g s4 5.14 bpw
-- model.layers.78.self_attn.o_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.78.self_attn.o_proj 1:6b_128g s4 6.04 bpw
-- model.layers.78.self_attn.o_proj 1:6b_32g s4 6.13 bpw
-- model.layers.78.self_attn.o_proj 1:8b_128g s4 8.04 bpw
-- 2.1254 bpw accuracy: 0.98200013
-- 2.1805 bpw accuracy: 0.98263463
-- 2.2265 bpw accuracy: 0.98464463
-- 2.6605 bpw accuracy: 0.98872345
-- 3.1487 bpw accuracy: 0.99127257
-- 3.1501 bpw accuracy: 0.99148091
-- 4.0394 bpw accuracy: 0.99467434
-- 4.0411 bpw accuracy: 0.99497946
-- 4.0742 bpw accuracy: 0.99520273
-- 4.1334 bpw accuracy: 0.99506140
-- 4.1501 bpw accuracy: 0.99567622
-- 4.1758 bpw accuracy: 0.99595824
-- 4.2222 bpw accuracy: 0.99629830
-- 4.2848 bpw accuracy: 0.99655357
-- 5.1982 bpw accuracy: 0.99788529
-- 5.2848 bpw accuracy: 0.99816783
-- 6.0394 bpw accuracy: 0.99851872
-- 6.2445 bpw accuracy: 0.99891316
-- 8.0394 bpw accuracy: 0.99956538
------------------------------------------------
| Measured: model.layers.78 (Attention) |
| Duration: 17.24 seconds |
| Completed step: 157/163 |
| Avg time / step (rolling): 17.24 seconds |
| Estimated remaining time: 1min 43sec |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
-- Layer: model.layers.78 (MLP)
-- model.layers.78.mlp.gate_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.78.mlp.gate_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.78.mlp.gate_proj 0.1:4b_128g/0.9:3b_128g s4 3.14 bpw
-- model.layers.78.mlp.gate_proj 0.1:4b_32g/0.9:3b_32g s4 3.23 bpw
-- model.layers.78.mlp.gate_proj 1:4b_128g s4 4.03 bpw
-- model.layers.78.mlp.gate_proj 1:4b_32g s4 4.13 bpw
-- model.layers.78.mlp.gate_proj 0.1:5b_128g/0.9:4b_128g s4 4.14 bpw
-- model.layers.78.mlp.gate_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.78.mlp.gate_proj 0.1:6b_128g/0.9:5b_128g s4 5.14 bpw
-- model.layers.78.mlp.gate_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.78.mlp.gate_proj 1:6b_128g s4 6.03 bpw
-- model.layers.78.mlp.gate_proj 0.1:8b_128g/0.9:6b_128g s4 6.25 bpw
-- model.layers.78.mlp.gate_proj 1:8b_128g s4 8.03 bpw
-- model.layers.78.mlp.up_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.78.mlp.up_proj 0.25:3b_64g/0.75:2b_64g s4 2.31 bpw
-- model.layers.78.mlp.up_proj 0.3:3b_64g/0.7:2b_64g s4 2.37 bpw
-- model.layers.78.mlp.up_proj 0.25:4b_128g/0.75:3b_128g s4 3.28 bpw
-- model.layers.78.mlp.up_proj 0.25:4b_32g/0.75:3b_32g s4 3.38 bpw
-- model.layers.78.mlp.up_proj 1:4b_32g s4 4.13 bpw
-- model.layers.78.mlp.up_proj 0.25:5b_128g/0.75:4b_128g s4 4.28 bpw
-- model.layers.78.mlp.up_proj 0.25:5b_32g/0.75:4b_32g s4 4.38 bpw
-- model.layers.78.mlp.up_proj 0.25:6b_128g/0.75:5b_128g s4 5.28 bpw
-- model.layers.78.mlp.up_proj 0.25:6b_32g/0.75:5b_32g s4 5.38 bpw
-- model.layers.78.mlp.up_proj 1:6b_128g s4 6.03 bpw
-- model.layers.78.mlp.up_proj 0.1:8b_128g/0.9:6b_128g s4 6.25 bpw
-- model.layers.78.mlp.up_proj 1:8b_128g s4 8.03 bpw
-- model.layers.78.mlp.down_proj 0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4 2.47 bpw
-- model.layers.78.mlp.down_proj 0.05:5b_32g/0.95:3b_32g s4 3.23 bpw
-- model.layers.78.mlp.down_proj 0.05:5b_32g/0.95:4b_32g s4 4.18 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4 3.40 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4 3.48 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.95:4b_128g s4 4.24 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.95:4b_32g s4 4.33 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4 4.35 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4 4.43 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4 5.30 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4 5.38 bpw
-- model.layers.78.mlp.down_proj 0.05:8b_32g/0.95:6b_128g s4 6.14 bpw
-- model.layers.78.mlp.down_proj 0.15:8b_128g/0.85:6b_128g s4 6.34 bpw
-- model.layers.78.mlp.down_proj 1:8b_128g s4 8.04 bpw
-- 2.2370 bpw accuracy: 0.96749690
-- 2.3178 bpw accuracy: 0.96841094
-- 2.5881 bpw accuracy: 0.97066376
-- 2.9045 bpw accuracy: 0.97190225
-- 3.2741 bpw accuracy: 0.98334273
-- 3.3626 bpw accuracy: 0.98490667
-- 3.6158 bpw accuracy: 0.98598315
-- 4.1340 bpw accuracy: 0.99084735
-- 4.1949 bpw accuracy: 0.99174905
-- 4.2572 bpw accuracy: 0.99142451
-- 4.3457 bpw accuracy: 0.99259937
-- 5.2402 bpw accuracy: 0.99557532
-- 5.3287 bpw accuracy: 0.99625175
-- 6.0688 bpw accuracy: 0.99740110
-- 6.2801 bpw accuracy: 0.99772034
-- 6.8458 bpw accuracy: 0.99798890
-- 8.0333 bpw accuracy: 0.99912791
------------------------------------------------
| Measured: model.layers.78 (MLP) |
| Duration: 81.78 seconds |
| Completed step: 158/163 |
| Avg time / step (rolling): 49.51 seconds |
| Estimated remaining time: 4min 7sec |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
-- Layer: model.layers.79 (Attention)
-- model.layers.79.self_attn.q_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.79.self_attn.q_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.79.self_attn.q_proj 0.1:4b_128g/0.9:3b_128g s4 3.15 bpw
-- model.layers.79.self_attn.q_proj 1:4b_128g s4 4.04 bpw
-- model.layers.79.self_attn.q_proj 1:4b_64g s4 4.07 bpw
-- model.layers.79.self_attn.q_proj 1:4b_32g s4 4.13 bpw
-- model.layers.79.self_attn.q_proj 0.1:5b_128g/0.9:4b_128g s4 4.15 bpw
-- model.layers.79.self_attn.q_proj 0.1:5b_64g/0.9:4b_64g s4 4.17 bpw
-- model.layers.79.self_attn.q_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.79.self_attn.q_proj 0.1:6b_128g/0.9:5b_128g s4 5.15 bpw
-- model.layers.79.self_attn.q_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.79.self_attn.q_proj 1:6b_128g s4 6.04 bpw
-- model.layers.79.self_attn.q_proj 1:6b_32g s4 6.13 bpw
-- model.layers.79.self_attn.q_proj 1:8b_128g s4 8.04 bpw
-- model.layers.79.self_attn.k_proj 0.05:3b_64g/0.95:2b_64g s4 2.15 bpw
-- model.layers.79.self_attn.k_proj 0.1:3b_64g/0.9:2b_64g s4 2.20 bpw
-- model.layers.79.self_attn.k_proj 0.1:4b_128g/0.9:3b_128g s4 3.17 bpw
-- model.layers.79.self_attn.k_proj 1:4b_128g s4 4.06 bpw
-- model.layers.79.self_attn.k_proj 1:4b_64g s4 4.10 bpw
-- model.layers.79.self_attn.k_proj 1:4b_32g s4 4.16 bpw
-- model.layers.79.self_attn.k_proj 0.1:5b_128g/0.9:4b_128g s4 4.17 bpw
-- model.layers.79.self_attn.k_proj 0.1:5b_64g/0.9:4b_64g s4 4.20 bpw
-- model.layers.79.self_attn.k_proj 0.1:5b_32g/0.9:4b_32g s4 4.26 bpw
-- model.layers.79.self_attn.k_proj 0.1:6b_128g/0.9:5b_128g s4 5.17 bpw
-- model.layers.79.self_attn.k_proj 0.1:6b_32g/0.9:5b_32g s4 5.26 bpw
-- model.layers.79.self_attn.k_proj 1:6b_128g s4 6.06 bpw
-- model.layers.79.self_attn.k_proj 1:6b_32g s4 6.16 bpw
-- model.layers.79.self_attn.k_proj 1:8b_128g s4 8.06 bpw
-- model.layers.79.self_attn.v_proj 0.05:3b_64g/0.95:2b_64g s4 2.15 bpw
-- model.layers.79.self_attn.v_proj 0.25:3b_64g/0.75:2b_64g s4 2.35 bpw
-- model.layers.79.self_attn.v_proj 0.1:4b_128g/0.9:3b_128g s4 3.17 bpw
-- model.layers.79.self_attn.v_proj 0.1:4b_64g/0.9:3b_64g s4 3.20 bpw
-- model.layers.79.self_attn.v_proj 1:4b_128g s4 4.06 bpw
-- model.layers.79.self_attn.v_proj 1:4b_64g s4 4.10 bpw
-- model.layers.79.self_attn.v_proj 1:4b_32g s4 4.16 bpw
-- model.layers.79.self_attn.v_proj 0.1:5b_64g/0.9:4b_64g s4 4.20 bpw
-- model.layers.79.self_attn.v_proj 0.1:5b_32g/0.9:4b_32g s4 4.26 bpw
-- model.layers.79.self_attn.v_proj 1:5b_64g s4 5.10 bpw
-- model.layers.79.self_attn.v_proj 1:5b_32g s4 5.16 bpw
-- model.layers.79.self_attn.v_proj 1:6b_128g s4 6.06 bpw
-- model.layers.79.self_attn.v_proj 1:6b_32g s4 6.16 bpw
-- model.layers.79.self_attn.v_proj 1:8b_32g s4 8.16 bpw
-- model.layers.79.self_attn.v_proj 1:8b_128g s4 8.06 bpw
-- model.layers.79.self_attn.o_proj 0.05:3b_64g/0.95:2b_64g s4 2.12 bpw
-- model.layers.79.self_attn.o_proj 0.1:3b_64g/0.9:2b_64g s4 2.17 bpw
-- model.layers.79.self_attn.o_proj 0.1:4b_128g/0.9:3b_128g s4 3.14 bpw
-- model.layers.79.self_attn.o_proj 1:4b_128g s4 4.04 bpw
-- model.layers.79.self_attn.o_proj 1:4b_64g s4 4.07 bpw
-- model.layers.79.self_attn.o_proj 1:4b_32g s4 4.13 bpw
-- model.layers.79.self_attn.o_proj 0.1:5b_128g/0.9:4b_128g s4 4.14 bpw
-- model.layers.79.self_attn.o_proj 0.1:5b_64g/0.9:4b_64g s4 4.17 bpw
-- model.layers.79.self_attn.o_proj 0.1:5b_32g/0.9:4b_32g s4 4.23 bpw
-- model.layers.79.self_attn.o_proj 0.1:6b_128g/0.9:5b_128g s4 5.14 bpw
-- model.layers.79.self_attn.o_proj 0.1:6b_32g/0.9:5b_32g s4 5.23 bpw
-- model.layers.79.self_attn.o_proj 1:6b_128g s4 6.04 bpw
-- model.layers.79.self_attn.o_proj 1:6b_32g s4 6.13 bpw
-- model.layers.79.self_attn.o_proj 1:8b_128g s4 8.04 bpw
-- 2.1254 bpw accuracy: 0.99430841
-- 2.1805 bpw accuracy: 0.99455748
-- 2.2265 bpw accuracy: 0.99536601
-- 2.6605 bpw accuracy: 0.99650368
-- 3.1487 bpw accuracy: 0.99726392
-- 3.1501 bpw accuracy: 0.99730426
-- 4.0394 bpw accuracy: 0.99843880
-- 4.0411 bpw accuracy: 0.99847716
-- 4.0742 bpw accuracy: 0.99857254
-- 4.1334 bpw accuracy: 0.99864129
-- 4.1501 bpw accuracy: 0.99860959
-- 4.1758 bpw accuracy: 0.99868624
-- 4.2222 bpw accuracy: 0.99883697
-- 4.2848 bpw accuracy: 0.99891667
-- 5.1982 bpw accuracy: 0.99933463
-- 5.2848 bpw accuracy: 0.99941761
-- 6.0394 bpw accuracy: 0.99953790
-- 6.2445 bpw accuracy: 0.99965662
-- 8.0394 bpw accuracy: 0.99981288
------------------------------------------------
| Measured: model.layers.79 (Attention) |
| Duration: 17.23 seconds |
| Completed step: 159/163 |
| Avg time / step (rolling): 38.75 seconds |
| Estimated remaining time: 2min 35sec |
| Last checkpoint layer: model.layers.77 (MLP) |
------------------------------------------------
-- Layer: model.layers.79 (MLP)
## Measurement/inference error (3): hidden_states
The head-bits setting isn't relevant here, because the error is happening during the measurement phase, not the quantization phase. IIRC this error means that NaN or inf values were produced in the hidden states. Interestingly enough, I had this happen on my finetune of Qwen2.5 72B Instruct, on the exact same layer too (79 MLP), so I'm not sure if there's something slightly weird going on with Qwen2.5 models.
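Roughly speaking, the measurement pass just sanity-checks the captured layer output; a hypothetical sketch of that kind of check (illustrative only, not exllamav2's actual code):

```python
import torch

def validate_hidden_states(hidden_states: torch.Tensor, layer_name: str) -> None:
    # Abort the measurement pass if the captured hidden states contain NaN/inf.
    if not torch.isfinite(hidden_states).all():
        raise RuntimeError(f"Measurement/inference error (3): hidden_states ({layer_name})")
```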
Is it using fp16 for the computations? If so, it's probably the same problem, and the values are going outside the ±2^16 range of fp16. The fact that it's happening right at the end would make me strongly suspect this is the problem, as I found when creating control vectors that the last layer often blows up its activations 10-20x compared to the preceding layers... The solution is to use bf16 or fp32.
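For anyone unfamiliar with the dtype ranges, a small standalone PyTorch sketch (not part of the converter) shows how an activation that is fine in fp32/bf16 overflows to inf in fp16:

```python
import torch

x = torch.tensor([70000.0])  # larger than fp16's max finite value (~65504)

print(x.to(torch.float16))   # tensor([inf], dtype=torch.float16) -> NaN/inf then propagates downstream
print(x.to(torch.bfloat16))  # finite: bf16 keeps fp32's exponent range, at reduced precision
print(x.to(torch.float32))   # finite and exact
```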
Sadly my hardware doesn't support bf16. Is there any way to make exl2 quantizations with fp32? Or, even better, to only upscale to fp32 when fp16 runs into errors?
Yes, this is suspected to be the problem.
It's a fair bit more complicated than that, though, and would need changes to the code.
You could possibly try again with the latest changes in the dev branch. I made it a bit more tolerant of overflows in the hidden state, so now if only a few channels overflow (as has been observed with Qwen2.5-72B specifically) it clamps them instead of erroring out. |
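Conceptually, the more tolerant behaviour amounts to something like the sketch below. This is written under my own assumptions; the names, threshold, and placement are illustrative, not the actual dev-branch code:

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # ~65504

def clamp_or_fail(hidden_states: torch.Tensor, max_bad_fraction: float = 1e-3) -> torch.Tensor:
    """Clamp a handful of overflowing values instead of aborting the measurement.

    Illustrative sketch only: the real dev-branch logic, threshold, and error
    handling may differ.
    """
    overflow = ~torch.isfinite(hidden_states) | (hidden_states.abs() > FP16_MAX)
    frac = overflow.float().mean().item()
    if frac > max_bad_fraction:
        # Too many channels blew up: keep the original hard error.
        raise RuntimeError("Measurement/inference error (3): hidden_states")
    # Only a few channels overflowed: clamp them back into fp16 range and continue.
    fixed = torch.nan_to_num(hidden_states, posinf=FP16_MAX, neginf=-FP16_MAX)
    return fixed.clamp_(-FP16_MAX, FP16_MAX)
```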
OS
Linux
GPU Library
CUDA 12.x
Python version
3.12
Pytorch version
2.4.1
Model
Qwen/Qwen2.5-Math-72B-Instruct
Describe the bug
calibration dataset: Orion-zhen/math-hard-calibration
console output:
The quantization process goes fine until it reaches layer 79 (MLP). I have tried 4 times, and the error remains the same.
Reproduction steps
run the command:
Expected behavior
The model should be quantized to 4.5 bpw.
Logs
full log can be viewed here:
Additional context
No response