Improve ONNX quantization doc #1451

Merged · 1 commit · Oct 16, 2023
26 changes: 13 additions & 13 deletions optimum/onnxruntime/configuration.py
@@ -252,11 +252,11 @@ class QuantizationConfig:
reduce_range (`bool`, defaults to `False`):
Whether to use reduce-range 7-bit integers instead of 8-bit integers.
nodes_to_quantize (`List[str]`, defaults to `[]`):
List of the nodes names to quantize.
List of the node names to quantize. If empty (the default), all nodes whose operator type appears in `operators_to_quantize` will be quantized.
nodes_to_exclude (`List[str]`, defaults to `[]`):
List of the nodes names to exclude when applying quantization.
List of the node names to exclude when applying quantization. The node names of a model can be found by loading it with `onnx.load`, or by visual inspection with [netron](https://github.com/lutzroeder/netron).
operators_to_quantize (`List[str]`):
List of the operators types to quantize. Defaults to all quantizable operators for the given quantization mode and format.
List of the operator types to quantize. Defaults to all quantizable operators for the given quantization mode and format. The quantizable operators are listed in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/registry.py.
qdq_add_pair_to_weight (`bool`, defaults to `False`):
By default, floating-point weights are quantized and fed to a single inserted DeQuantizeLinear node.
If set to True, the floating-point weights will remain and both QuantizeLinear / DeQuantizeLinear nodes
@@ -404,9 +404,9 @@ def arm64(
nodes_to_quantize (`Optional[List[str]]`, defaults to `None`):
Specific nodes to quantize. If `None`, all nodes being operators from `operators_to_quantize` will be quantized.
nodes_to_exclude (`Optional[List[str]]`, defaults to `None`):
Specific nodes to exclude from quantization.
Specific nodes to exclude from quantization. The node names of a model can be found by loading it with `onnx.load`, or by visual inspection with [netron](https://github.com/lutzroeder/netron).
operators_to_quantize (`Optional[List[str]]`, defaults to `None`):
Type of nodes to perform quantization on. By default, all the quantizable operators will be quantized.
Type of operators to perform quantization on. By default, all quantizable operators are quantized. The quantizable operators are listed in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/registry.py.
"""
format, mode, operators_to_quantize = default_quantization_parameters(
is_static, operators_to_quantize=operators_to_quantize
@@ -462,9 +462,9 @@ def avx2(
nodes_to_quantize (`Optional[List[str]]`, defaults to `None`):
Specific nodes to quantize. If `None`, all nodes being operators from `operators_to_quantize` will be quantized.
nodes_to_exclude (`Optional[List[str]]`, defaults to `None`):
Specific nodes to exclude from quantization.
Specific nodes to exclude from quantization. The node names of a model can be found by loading it with `onnx.load`, or by visual inspection with [netron](https://github.com/lutzroeder/netron).
operators_to_quantize (`Optional[List[str]]`, defaults to `None`):
Type of nodes to perform quantization on. By default, all the quantizable operators will be quantized.
Type of operators to perform quantization on. By default, all quantizable operators are quantized. The quantizable operators are listed in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/registry.py.
"""
format, mode, operators_to_quantize = default_quantization_parameters(
is_static, operators_to_quantize=operators_to_quantize
@@ -518,9 +518,9 @@ def avx512(
nodes_to_quantize (`Optional[List[str]]`, defaults to `None`):
Specific nodes to quantize. If `None`, all nodes being operators from `operators_to_quantize` will be quantized.
nodes_to_exclude (`Optional[List[str]]`, defaults to `None`):
Specific nodes to exclude from quantization.
Specific nodes to exclude from quantization. The node names of a model can be found by loading it with `onnx.load`, or by visual inspection with [netron](https://github.com/lutzroeder/netron).
operators_to_quantize (`Optional[List[str]]`, defaults to `None`):
Type of nodes to perform quantization on. By default, all the quantizable operators will be quantized.
Type of operators to perform quantization on. By default, all quantizable operators are quantized. The quantizable operators are listed in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/registry.py.
"""
format, mode, operators_to_quantize = default_quantization_parameters(
is_static, operators_to_quantize=operators_to_quantize
@@ -575,9 +575,9 @@ def avx512_vnni(
nodes_to_quantize (`Optional[List[str]]`, defaults to `None`):
Specific nodes to quantize. If `None`, all nodes being operators from `operators_to_quantize` will be quantized.
nodes_to_exclude (`Optional[List[str]]`, defaults to `None`):
Specific nodes to exclude from quantization.
Specific nodes to exclude from quantization. The node names of a model can be found by loading it with `onnx.load`, or by visual inspection with [netron](https://github.com/lutzroeder/netron).
operators_to_quantize (`Optional[List[str]]`, defaults to `None`):
Type of nodes to perform quantization on. By default, all the quantizable operators will be quantized.
Type of operators to perform quantization on. By default, all quantizable operators are quantized. The quantizable operators are listed in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/registry.py.
"""
format, mode, operators_to_quantize = default_quantization_parameters(
is_static, operators_to_quantize=operators_to_quantize
@@ -615,9 +615,9 @@ def tensorrt(
nodes_to_quantize (`Optional[List[str]]`, defaults to `None`):
Specific nodes to quantize. If `None`, all nodes being operators from `operators_to_quantize` will be quantized.
nodes_to_exclude (`Optional[List[str]]`, defaults to `None`):
Specific nodes to exclude from quantization.
Specific nodes to exclude from quantization. The node names of a model can be found by loading it with `onnx.load`, or by visual inspection with [netron](https://github.com/lutzroeder/netron).
operators_to_quantize (`Optional[List[str]]`, defaults to `None`):
Type of nodes to perform quantization on. By default, all the quantizable operators will be quantized.
Type of operators to perform quantization on. By default, all quantizable operators are quantized. The quantizable operators are listed in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/registry.py.
"""
format, mode, operators_to_quantize = default_quantization_parameters(
is_static=True, operators_to_quantize=operators_to_quantize
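The node/operator selection these docstrings describe can be summarized with a small plain-Python sketch. This is not Optimum code; it only mirrors the documented semantics (explicit node names win, otherwise nodes are selected by operator type):

```python
def select_nodes(nodes, nodes_to_quantize, operators_to_quantize):
    """Return the node names selected for quantization.

    nodes: list of (name, op_type) pairs, as found in an ONNX graph.
    Mirrors the documented behavior: if nodes_to_quantize is non-empty,
    only those names are selected; otherwise every node whose operator
    type is in operators_to_quantize is selected.
    """
    if nodes_to_quantize:
        wanted = set(nodes_to_quantize)
        return [name for name, _ in nodes if name in wanted]
    ops = set(operators_to_quantize)
    return [name for name, op_type in nodes if op_type in ops]

nodes = [("MatMul_0", "MatMul"), ("Relu_0", "Relu"), ("Add_0", "Add")]
print(select_nodes(nodes, [], ["MatMul", "Add"]))  # ['MatMul_0', 'Add_0']
print(select_nodes(nodes, ["Relu_0"], ["MatMul"]))  # ['Relu_0']
```

`nodes_to_exclude` would then be applied as a final filter on top of this selection.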