
Add overflow protection for quantization bias to reduce quantization precision loss #21645

Merged
merged 4 commits into microsoft:main on Aug 28, 2024

Conversation

@duanshengliu (Contributor)

Description

When the bias scale is too small, the quantized bias (bias / scale) may exceed the range of int32, and the subsequent cast wraps around, leading to significant loss of precision. Therefore, before converting the quantized bias to int32, it is clipped to the int32 range, which saturates out-of-range values and reduces quantization precision loss.
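
In code, the clip-before-cast pattern looks roughly like this (a minimal sketch; the quantize_bias helper name and signature are illustrative assumptions, not the actual onnxruntime quantization-tool API):

import numpy as np

def quantize_bias(bias: np.ndarray, bias_scale: float) -> np.ndarray:
    # Illustrative sketch of overflow-protected bias quantization.
    # Work in float64 so the int32 bounds are exactly representable
    # (float32 rounds 2147483647 up to 2**31).
    q = np.round(bias.astype(np.float64) / bias_scale)
    # Clip to the int32 range *before* casting; a bare astype would
    # wrap around on overflow instead of saturating.
    q = np.clip(q, np.iinfo(np.int32).min, np.iinfo(np.int32).max)
    return q.astype(np.int32)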

Motivation and Context

Fixes #21000.

@duanshengliu (Contributor Author)

@yihonglyu @adrianlizarraga, could you take a look and start the CI pipelines?

@adrianlizarraga (Contributor)

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@adrianlizarraga (Contributor)

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, Windows x64 QNN CI Pipeline, Linux MIGraphX CI Pipeline, Big Models

@adrianlizarraga (Contributor)

/azp run ONNX Runtime React Native CI Pipeline, orttraining-amd-gpu-ci-pipeline, Linux Android Emulator QNN CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@adrianlizarraga (Contributor)

/azp run Windows GPU CUDA CI Pipeline Windows GPU DML CI Pipeline Windows GPU Doc Gen CI Pipeline Linux Android Emulator QNN CI Pipeline


No pipelines are associated with this pull request.

@adrianlizarraga (Contributor)

/azp run Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Linux Android Emulator QNN CI Pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@yufenglee (Member)

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@yufenglee (Member)

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, Windows x64 QNN CI Pipeline, Linux MIGraphX CI Pipeline, Big Models

@yufenglee (Member)

/azp run ONNX Runtime React Native CI Pipeline, orttraining-amd-gpu-ci-pipeline, Linux Android Emulator QNN CI Pipeline

@yufenglee (Member)

/azp run Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Linux Android Emulator QNN CI Pipeline


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 4 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@yufenglee (Member)

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@yufenglee (Member)

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, Windows x64 QNN CI Pipeline, Linux MIGraphX CI Pipeline, Big Models

@yufenglee (Member)

/azp run ONNX Runtime React Native CI Pipeline, orttraining-amd-gpu-ci-pipeline, Linux Android Emulator QNN CI Pipeline

@yufenglee (Member)

/azp run Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Linux Android Emulator QNN CI Pipeline


Azure Pipelines successfully started running 3 pipeline(s).


Azure Pipelines successfully started running 4 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Commenter does not have sufficient privileges for PR 21645 in repo microsoft/onnxruntime

@duanshengliu (Contributor Author)

@fajin-corp, could you help to start the CI pipelines?

@fajin-corp (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

@fajin-corp (Contributor)

/azp run ONNX Runtime Web CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline

@fajin-corp (Contributor)

/azp run onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed


Azure Pipelines successfully started running 8 pipeline(s).


Azure Pipelines successfully started running 4 pipeline(s).

@duanshengliu (Contributor Author)

@yufenglee, @fajin-corp, @adrianlizarraga, the orttraining-ortmodule-distributed pipeline was canceled. Could you help to restart it?

@fajin-corp (Contributor)

/azp run orttraining-ortmodule-distributed


Azure Pipelines successfully started running 1 pipeline(s).

@duanshengliu (Contributor Author)

@yufenglee, @fajin-corp, @adrianlizarraga, all the checks have passed. Could you help to merge?

fajin-corp merged commit 7df8776 into microsoft:main on Aug 28, 2024
72 checks passed
@yihonglyu (Contributor)

@duanshengliu What's the difference between clip + cast and cast only? Could you add a test for it?

@duanshengliu (Contributor Author) commented Aug 30, 2024

@yihonglyu, for values within the int32 range there is no difference between clip + cast and cast only. However, for values outside the int32 range, cast only wraps around into the representable range of int32, resulting in significant loss of precision. For example:

>>> import numpy as np
>>> bias = np.array(2147483648, dtype=np.float32)
>>> bias.astype(np.int32)
<stdin>:1: RuntimeWarning: invalid value encountered in cast
array(-2147483648, dtype=int32)

If we instead do clip + cast, clipping in float64 so that the int32 upper bound is exactly representable:

>>> import numpy as np
>>> bias = np.array(2147483648, dtype=np.float32)
>>> bias = np.clip(bias.astype(np.float64), np.iinfo(np.int32).min, np.iinfo(np.int32).max)
>>> bias.astype(np.int32)
array(2147483647, dtype=int32)

Obviously, cast only carries the risk of significant precision loss, whereas clip + cast saturates out-of-range values at the int32 boundary, which bounds the error and reduces quantization precision loss.
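
A regression test for this behavior could follow the same pattern; here is a minimal sketch (the test name, helper logic, and values are illustrative assumptions, not the actual onnxruntime test suite):

import numpy as np

def test_bias_overflow_protection():
    # Hypothetical test sketch: a tiny bias scale pushes bias / scale
    # far outside the int32 range.
    info = np.iinfo(np.int32)
    bias = np.array([3.0, -3.0, 0.001], dtype=np.float32)
    bias_scale = 1e-9
    q = np.round(bias.astype(np.float64) / bias_scale)
    q = np.clip(q, info.min, info.max).astype(np.int32)
    # Out-of-range values saturate at the int32 bounds instead of wrapping.
    assert q[0] == info.max
    assert q[1] == info.min
    # An in-range value is quantized normally.
    assert q[2] == 1_000_000

test_bias_overflow_protection()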
