
"nodes-to-exclude" in quantization doesnt work #1418

Closed
marziye-A opened this issue Sep 27, 2023 · 4 comments · Fixed by #1451

Comments

@marziye-A

marziye-A commented Sep 27, 2023

Hi,
why doesn't nodes_to_exclude in quantization work? My code is below:

from optimum.onnxruntime.configuration import QuantFormat, QuantizationConfig, QuantizationMode, QuantType

dqconfig = QuantizationConfig(
    is_static=False, format=QuantFormat.QDQ, mode=QuantizationMode.IntegerOps, per_channel=True,
    weights_dtype=QuantType.QUInt8,
    nodes_to_exclude=["Wav2Vec2FeatureEncoder", "Wav2Vec2FeatureProjection", "Wav2Vec2EncoderStableLayerNorm"],
)

Does anyone know the reason?
Any help is really appreciated!

@baskrahmer
Contributor

Hi @marziye-A

I took a look and I think the issue is that the entries in your nodes_to_exclude list are not valid ONNX graph node names but model submodule names. You can run the code below to list all ONNX nodes of a model:

import tempfile
from pathlib import Path

from onnx import load as onnx_load
from optimum.onnxruntime import ORTModelForAudioClassification, ORTQuantizer
from optimum.onnxruntime.configuration import QuantizationConfig, QuantFormat, QuantizationMode, QuantType

qconfig = QuantizationConfig(
    is_static=False,
    format=QuantFormat.QDQ,
    mode=QuantizationMode.IntegerOps,
    per_channel=True,
    weights_dtype=QuantType.QUInt8,
    nodes_to_exclude=["/wav2vec2/feature_extractor/conv_layers.0/conv/Conv"]  # <-- Node from ONNX graph
)

with tempfile.TemporaryDirectory() as tmp_dir:
    output_dir = Path(tmp_dir)
    model = ORTModelForAudioClassification.from_pretrained("hf-internal-testing/tiny-random-wav2vec2", export=True)

    quantizer = ORTQuantizer.from_pretrained(model)
    quantizer.quantize(
        save_dir=output_dir,
        quantization_config=qconfig,
    )
    quantized_model = onnx_load(output_dir.joinpath("model_quantized.onnx"))
    node_list = [node.name for node in quantized_model.graph.node]

    print(node_list)

I don't know what your exact use case is, but if you want to exclude the Wav2Vec2FeatureEncoder part of the graph from quantization, you would have to find the ONNX nodes corresponding to those computations and add them to the nodes_to_exclude argument. You can use a regex to filter those nodes from the full list, or maybe implement something clever to get all nodes of the submodule ;)
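
For example, here is a rough sketch of the regex approach, reusing node_list and the imports from the snippet above. It assumes the feature-encoder nodes all sit under a /wav2vec2/feature_extractor/ name scope, which you should verify against the printed list:

import re

# Hypothetical scope prefix; check it against the node names printed above.
feature_encoder_nodes = [
    name for name in node_list if re.match(r"/wav2vec2/feature_extractor/", name)
]

qconfig_excluding_encoder = QuantizationConfig(
    is_static=False,
    format=QuantFormat.QDQ,
    mode=QuantizationMode.IntegerOps,
    per_channel=True,
    weights_dtype=QuantType.QUInt8,
    nodes_to_exclude=feature_encoder_nodes,  # all matched nodes of the submodule
)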

@fxmarty
Contributor

fxmarty commented Oct 16, 2023

Hi @marziye-A, apologies for the late reply, and thank you @baskrahmer for the correct answer!

I will improve the documentation in this regard.

An alternative that makes it easier to exclude part of a model from quantization may be to use export_modules_as_functions during the export, as suggested in huggingface/transformers#26307 (comment).
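
For reference, here is a rough, untested sketch of what that could look like with a plain torch.onnx.export call. The submodule class, dummy input shape, and opset choice are illustrative assumptions, and this is not the export path Optimum itself uses:

import torch
from transformers import Wav2Vec2Model
from transformers.models.wav2vec2.modeling_wav2vec2 import Wav2Vec2FeatureEncoder

model = Wav2Vec2Model.from_pretrained("hf-internal-testing/tiny-random-wav2vec2")
model.config.return_dict = False  # tuple outputs are simpler to trace
model.eval()

# export_modules_as_functions keeps the listed submodule classes as ONNX local
# functions, so the feature encoder's nodes stay grouped under one scope.
dummy_input = torch.randn(1, 16000)
torch.onnx.export(
    model,
    (dummy_input,),
    "wav2vec2.onnx",
    input_names=["input_values"],
    opset_version=15,  # export_modules_as_functions requires opset >= 15
    export_modules_as_functions={Wav2Vec2FeatureEncoder},
)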

@marziye-A
Author

Thank you very much for your answer.
Is there any limit on the number of nodes that we can exclude?

@baskrahmer
Contributor

Is there any limit on the number of nodes that we can exclude?

As far as I know there is no limit; you can exclude every node in the model.
