NNAPI usage via onnxruntime #10692
-
Hi Team, I also had another query: is it necessary to quantize the ONNX models if I have to utilize the DSP runtime via NNAPI and ONNX Runtime? Other runtime platforms do instruct you to do so.
Replies: 2 comments
-
Hi, do you know how to select different devices now? Could you share your solution?
-
The ORT NNAPI EP will be preferred over the ORT CPU EP when assigning nodes in the model. If the NNAPI EP can handle a specific operator ('handle' meaning convert to the equivalent NNAPI operator), nodes involving that operator will be assigned to the NNAPI EP. Any remaining nodes will be handled by the ORT CPU EP.

The NNAPI EP will create NNAPI model(s) from the nodes assigned to it at runtime. We don't have much control over how NNAPI itself will execute the NNAPI model we create. It will internally pick whatever it thinks is best.

Setting the NNAPI_FLAG_CPU_DISABLED flag will prevent NNAPI from running operators in the NNAPI model that only have a CPU implementation. If there are operators like that in the NNAPI model, the NNAPI model creation will fail. ORT does not fall back to using the ORT CPU EP for that node, as it has already been included in the NNAPI model. Basically that means the NNAPI_FLAG_CPU_DISABLED flag is good for testing whether the nodes assigned to NNAPI will run well (the NNAPI fallback CPU implementation is not optimized in any way).

FWIW, NNAPI performance differs dramatically across devices. Your best bet is to run the model both with the NNAPI EP enabled and disabled and pick whichever option is best on a per-device basis.

Whether you need to quantize or not will most likely depend on the hardware in the particular device. The NNAPI spec supports both quantized and float data types in general, but the device implementation of the spec may only support quantized types. E.g. if the NNAPI implementation for the particular DSP only supports 8-bit data types, you'll need to quantize the model to use that. That doesn't mean NNAPI can't be used - for example, the GPU on the device may handle float data types and your model could potentially be executed by NNAPI using the GPU instead of the DSP.
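And here is a rough sketch of the per-device comparison suggested above: time the same model with the NNAPI EP enabled versus the ORT CPU EP only and keep whichever is faster on that device. The model path, input/output names, and input shape are hypothetical and must be replaced with the values for your model.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

#include "onnxruntime_cxx_api.h"
#include "nnapi_provider_factory.h"

// Placeholder model details for illustration only; substitute your own
// model path, input/output names, and input shape.
static const char* kModelPath = "/data/local/tmp/model.onnx";
static const char* kInputName = "input";
static const char* kOutputName = "output";
static const std::vector<int64_t> kInputShape = {1, 3, 224, 224};

// Average latency (ms) over `iterations` Run() calls on the given session.
static double AverageLatencyMs(Ort::Session& session, int iterations) {
  size_t element_count = 1;
  for (int64_t d : kInputShape) element_count *= static_cast<size_t>(d);
  std::vector<float> input_data(element_count, 0.5f);

  Ort::MemoryInfo mem_info =
      Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem_info, input_data.data(), input_data.size(),
      kInputShape.data(), kInputShape.size());

  const char* input_names[] = {kInputName};
  const char* output_names[] = {kOutputName};

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1, output_names, 1);
  }
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count() / iterations;
}

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "nnapi_vs_cpu");

  // Session with the NNAPI EP registered; unsupported nodes still run on the ORT CPU EP.
  Ort::SessionOptions nnapi_opts;
  Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_Nnapi(nnapi_opts, 0));
  Ort::Session nnapi_session(env, kModelPath, nnapi_opts);

  // Session with the ORT CPU EP only (NNAPI EP not registered).
  Ort::SessionOptions cpu_opts;
  Ort::Session cpu_session(env, kModelPath, cpu_opts);

  std::printf("NNAPI EP average latency: %.2f ms\n", AverageLatencyMs(nnapi_session, 50));
  std::printf("CPU EP average latency:   %.2f ms\n", AverageLatencyMs(cpu_session, 50));
  return 0;
}
```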