
Add overflow protection for quantization bias to reduce quantization precision loss #21645

Merged
merged 4 commits into microsoft:main on Aug 28, 2024

Conversation

@duanshengliu (Contributor)

Description

When the bias scale is too small, the quantized bias (bias / scale) may exceed the range of int32, and the subsequent cast wraps around, leading to significant loss of precision. Therefore, before converting the quantized bias to int32, it is clipped to the int32 range, which saturates out-of-range values and reduces quantization precision loss.
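
In code, the clip-before-cast pattern looks roughly like this (a minimal sketch; the quantize_bias helper name and signature are illustrative assumptions, not the actual onnxruntime quantization-tool API):

import numpy as np

def quantize_bias(bias: np.ndarray, bias_scale: float) -> np.ndarray:
    # Illustrative sketch of overflow-protected bias quantization.
    # Work in float64 so the int32 bounds are exactly representable
    # (float32 rounds 2147483647 up to 2**31).
    q = np.round(bias.astype(np.float64) / bias_scale)
    # Clip to the int32 range *before* casting; a bare astype would
    # wrap around on overflow instead of saturating.
    q = np.clip(q, np.iinfo(np.int32).min, np.iinfo(np.int32).max)
    return q.astype(np.int32)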

Motivation and Context

Fixes #21000.

@duanshengliu (Contributor Author)

@yihonglyu @adrianlizarraga, could you take a look and start the CI pipelines?

@adrianlizarraga (Contributor)

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@adrianlizarraga (Contributor)

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, Windows x64 QNN CI Pipeline, Linux MIGraphX CI Pipeline, Big Models

@adrianlizarraga (Contributor)

/azp run ONNX Runtime React Native CI Pipeline, orttraining-amd-gpu-ci-pipeline, Linux Android Emulator QNN CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@adrianlizarraga (Contributor)

/azp run Windows GPU CUDA CI Pipeline Windows GPU DML CI Pipeline Windows GPU Doc Gen CI Pipeline Linux Android Emulator QNN CI Pipeline


No pipelines are associated with this pull request.

@adrianlizarraga (Contributor)

/azp run Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Linux Android Emulator QNN CI Pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@yufenglee (Member)

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@yufenglee (Member)

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, Windows x64 QNN CI Pipeline, Linux MIGraphX CI Pipeline, Big Models

@yufenglee (Member)

/azp run ONNX Runtime React Native CI Pipeline, orttraining-amd-gpu-ci-pipeline, Linux Android Emulator QNN CI Pipeline

@yufenglee (Member)

/azp run Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Linux Android Emulator QNN CI Pipeline


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 4 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@yufenglee (Member)

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@yufenglee (Member)

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, Windows x64 QNN CI Pipeline, Linux MIGraphX CI Pipeline, Big Models

@yufenglee (Member)

/azp run ONNX Runtime React Native CI Pipeline, orttraining-amd-gpu-ci-pipeline, Linux Android Emulator QNN CI Pipeline

@yufenglee (Member)

/azp run Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Linux Android Emulator QNN CI Pipeline


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 4 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Commenter does not have sufficient privileges for PR 21645 in repo microsoft/onnxruntime

@duanshengliu (Contributor Author)

@fajin-corp, could you help to start the CI pipelines?

@fajin-corp (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

@fajin-corp (Contributor)

/azp run ONNX Runtime Web CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline

@fajin-corp (Contributor)

/azp run onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed


Azure Pipelines successfully started running 8 pipeline(s).


Azure Pipelines successfully started running 4 pipeline(s).

@duanshengliu (Contributor Author)

@yufenglee, @fajin-corp, @adrianlizarraga, the orttraining-ortmodule-distributed pipeline was canceled. Could you help to restart it?

@fajin-corp (Contributor)

/azp run orttraining-ortmodule-distributed


Azure Pipelines successfully started running 1 pipeline(s).

@duanshengliu (Contributor Author)

@yufenglee, @fajin-corp, @adrianlizarraga, all the checks have passed. Could you help to merge?

fajin-corp merged commit 7df8776 into microsoft:main on Aug 28, 2024
72 checks passed
@yihonglyu (Contributor)

@duanshengliu What's the difference between clip + cast and cast only? Could you add a test for it?

@duanshengliu (Contributor Author) commented Aug 30, 2024

@yihonglyu, for values within the int32 range there is no difference between clip + cast and cast only. However, for values outside the int32 range, cast only wraps around into the representable range of int32, resulting in significant loss of precision. For example:

>>> import numpy as np
>>> bias = np.array(2147483648, dtype=np.float32)
>>> bias.astype(np.int32)
<stdin>:1: RuntimeWarning: invalid value encountered in cast
array(-2147483648, dtype=int32)

If we instead do clip + cast, clipping in float64 so that the int32 upper bound is exactly representable:

>>> import numpy as np
>>> bias = np.array(2147483648, dtype=np.float32)
>>> bias = np.clip(bias.astype(np.float64), np.iinfo(np.int32).min, np.iinfo(np.int32).max)
>>> bias.astype(np.int32)
array(2147483647, dtype=int32)

Obviously, cast only carries the risk of significant precision loss, whereas clip + cast saturates out-of-range values at the int32 boundary, which bounds the error and reduces quantization precision loss.
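
A regression test for this behavior could follow the same pattern; here is a minimal sketch (the test name, helper logic, and values are illustrative assumptions, not the actual onnxruntime test suite):

import numpy as np

def test_bias_overflow_protection():
    # Hypothetical test sketch: a tiny bias scale pushes bias / scale
    # far outside the int32 range.
    info = np.iinfo(np.int32)
    bias = np.array([3.0, -3.0, 0.001], dtype=np.float32)
    bias_scale = 1e-9
    q = np.round(bias.astype(np.float64) / bias_scale)
    q = np.clip(q, info.min, info.max).astype(np.int32)
    # Out-of-range values saturate at the int32 bounds instead of wrapping.
    assert q[0] == info.max
    assert q[1] == info.min
    # An in-range value is quantized normally.
    assert q[2] == 1_000_000

test_bias_overflow_protection()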
