Can't get quantized model to run on GPU #14390
elephantpanda asked this question in General
So I took a vae_decoder.onnx and quantized it using the `quant_pre_process()` and `quantize_static()` functions.
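For reference, this is roughly what my script looks like (a minimal sketch: the input name `latent_sample`, the latent shape, and the random calibration data are placeholders, and real calibration should use representative inputs):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static
from onnxruntime.quantization.shape_inference import quant_pre_process

class VaeCalibrationReader(CalibrationDataReader):
    """Feeds a handful of sample latents for calibration."""
    def __init__(self, input_name="latent_sample", num_samples=8):
        # Input name and shape are guesses for a VAE decoder --
        # check the model's real inputs with session.get_inputs() or Netron.
        self.samples = iter(
            {input_name: np.random.randn(1, 4, 64, 64).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        # Return one feed dict per call, then None when exhausted.
        return next(self.samples, None)

# Step 1: shape inference + graph optimization pre-pass.
quant_pre_process("vae_decoder.onnx", "vae_decoder_prep.onnx")

# Step 2: static quantization to uint8 weights and activations.
quantize_static(
    "vae_decoder_prep.onnx",
    "vae_decoder_uint8.onnx",
    calibration_data_reader=VaeCalibrationReader(),
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
)
```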
So now I have a uint8 ONNX model which is half the size of my float16 one. All good so far.
I try it in CPU mode and it works, giving the correct output.
However, in CUDA mode it just hangs during inference.
In DirectML mode it gives the following error when trying to load the model/session:
> Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1819)\onnxruntime.dll!00007FFCC40DB67B: (caller: 00007FFCC40EB1BA) Exception(3) tid(7970) 80070057 The parameter is incorrect.
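For context, this is how I'm setting up the sessions (I'm actually calling it through Microsoft.ML.OnnxRuntime from C#; the Python equivalent below just shows the provider configuration):

```python
import onnxruntime as ort

# Works: CPU gives the correct output.
cpu_sess = ort.InferenceSession(
    "vae_decoder_uint8.onnx", providers=["CPUExecutionProvider"]
)

# Hangs during run(): CUDA.
cuda_sess = ort.InferenceSession(
    "vae_decoder_uint8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Throws "The parameter is incorrect" at session creation: DirectML
# (in Python this needs the onnxruntime-directml package).
dml_sess = ort.InferenceSession(
    "vae_decoder_uint8.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
```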
So it doesn't look like it wants to run on the GPU. I have an NVIDIA Quadro P5000. (I think this card should support int8.)
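In case the build matters, this is a quick way to check which execution providers the installed runtime actually exposes (Python sketch):

```python
import onnxruntime as ort

# Lists the execution providers this onnxruntime build was compiled with;
# CUDA and DirectML come from separate packages
# (onnxruntime-gpu / onnxruntime-directml).
print(ort.get_available_providers())
```

Has anyone managed to run a statically quantized model on the CUDA or DirectML providers?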