Can't get quantized model to run on GPU #14390
elephantpanda asked this question in General
So I took a vae_decoder.onnx and quantized it using the `quant_pre_process()` and `quantize_static()` functions.
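For reference, this is roughly what my script looks like (a minimal sketch: the input name `latent_sample`, the latent shape, and the random calibration data are placeholders, and real calibration should use representative inputs):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static
from onnxruntime.quantization.shape_inference import quant_pre_process

class VaeCalibrationReader(CalibrationDataReader):
    """Feeds a handful of sample latents for calibration."""
    def __init__(self, input_name="latent_sample", num_samples=8):
        # Input name and shape are guesses for a VAE decoder --
        # check the model's real inputs with session.get_inputs() or Netron.
        self.samples = iter(
            {input_name: np.random.randn(1, 4, 64, 64).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        # Return one feed dict per call, then None when exhausted.
        return next(self.samples, None)

# Step 1: shape inference + graph optimization pre-pass.
quant_pre_process("vae_decoder.onnx", "vae_decoder_prep.onnx")

# Step 2: static quantization to uint8 weights and activations.
quantize_static(
    "vae_decoder_prep.onnx",
    "vae_decoder_uint8.onnx",
    calibration_data_reader=VaeCalibrationReader(),
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
)
```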
So now I have a uint8 ONNX model which is half the size of my float16 one. All good so far.
I try it in CPU mode and it works, giving the correct output.
However, in CUDA mode it just hangs during inference.
In DirectML mode it gives the following error when trying to load the model/session:
> Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:RuntimeException] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1819)\onnxruntime.dll!00007FFCC40DB67B: (caller: 00007FFCC40EB1BA) Exception(3) tid(7970) 80070057 The parameter is incorrect.
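For context, this is how I'm setting up the sessions (I'm actually calling it through Microsoft.ML.OnnxRuntime from C#; the Python equivalent below just shows the provider configuration):

```python
import onnxruntime as ort

# Works: CPU gives the correct output.
cpu_sess = ort.InferenceSession(
    "vae_decoder_uint8.onnx", providers=["CPUExecutionProvider"]
)

# Hangs during run(): CUDA.
cuda_sess = ort.InferenceSession(
    "vae_decoder_uint8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Throws "The parameter is incorrect" at session creation: DirectML
# (in Python this needs the onnxruntime-directml package).
dml_sess = ort.InferenceSession(
    "vae_decoder_uint8.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
```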
So it doesn't look like it wants to run on the GPU. I have an NVIDIA Quadro P5000. (I think this card should support int8.)
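In case the build matters, this is a quick way to check which execution providers the installed runtime actually exposes (Python sketch):

```python
import onnxruntime as ort

# Lists the execution providers this onnxruntime build was compiled with;
# CUDA and DirectML come from separate packages
# (onnxruntime-gpu / onnxruntime-directml).
print(ort.get_available_providers())
```

Has anyone managed to run a statically quantized model on the CUDA or DirectML providers?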