-
@trainchoo - it might help to check what the model looks like. Some ops may introduce memcpy between CPU and GPU, which could impact perf. Is that model sharable? Also - did you have your ORT built with CUDA enabled? @yuslepukhin - do you spot any potential issue in the C# code above? Thx
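One way to check for those memcpy nodes is to enable ORT profiling and search the resulting JSON trace for MemcpyToHost / MemcpyFromHost events. A minimal sketch, assuming the Microsoft.ML.OnnxRuntime.Gpu package; the model path, input name, and shape are placeholders, not your actual model:

```csharp
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Run one inference with the CUDA EP and profiling enabled, then
// inspect the profile JSON for Memcpy nodes. "model.onnx" and the
// input name "input_1" are placeholders.
using var options = new SessionOptions();
options.AppendExecutionProvider_CUDA(0);  // device 0; throws if this ORT build lacks CUDA
options.EnableProfiling = true;

using var session = new InferenceSession("model.onnx", options);

var input = new DenseTensor<float>(new[] { 1, 1, 30 });  // batch, seq, features (assumed)
var inputs = new[] { NamedOnnxValue.CreateFromTensor("input_1", input) };
using (session.Run(inputs)) { }

// Path of the chrome-trace JSON; grep it for "MemcpyToHost"/"MemcpyFromHost".
string profilePath = session.EndProfiling();
System.Console.WriteLine($"Profile written to {profilePath}");
```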
-
Thanks. I exported the model using this library: https://github.com/onnx/tensorflow-onnx. I don't mind sharing the model: easyupload link
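For reference, the conversion was along these lines (file names and opset are placeholders, not my exact command):

```
python -m tf2onnx.convert --keras model.h5 --output model.onnx --opset 13
```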
-
I have a small-ish LSTM model built in Keras and exported to ONNX so I can do inference in a C# app.
Running 100 inference loops on the CPU takes ~270 ms total compute time, but if I enable the CUDA execution provider, I get a compute time of 1500+ ms. I've looked through the tutorials and made sure all the versions of CUDA and cuDNN are correct, and I couldn't figure out the cause of the slowdown. My only guess is that running one inference at a time on the GPU is inefficient. Is there any way I can run a batch through the same model to speed up inference?
Model sequence:
Input - 30 floats
LSTM - 120 nodes
LSTM - 120 nodes
Dense - 30 nodes
Dense - 2 nodes
Test code:
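(The original snippet is not shown; below is a hypothetical reconstruction of the benchmark described above. The model path, input name "input_1", and input shape {1, 1, 30} are assumptions, not the poster's actual code.)

```csharp
using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

using var options = new SessionOptions();
options.AppendExecutionProvider_CUDA(0);  // comment out for the CPU baseline

using var session = new InferenceSession("model.onnx", options);

var input = new DenseTensor<float>(new[] { 1, 1, 30 });  // assumed shape
var inputs = new[] { NamedOnnxValue.CreateFromTensor("input_1", input) };

// Warm-up run: the first CUDA Run pays one-time costs (kernel selection,
// memory arena growth) that shouldn't be included in the timing.
using (session.Run(inputs)) { }

var sw = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
    using var results = session.Run(inputs);
}
sw.Stop();
Console.WriteLine($"100 runs: {sw.ElapsedMilliseconds} ms");

// To batch instead of looping, stack the 100 inputs into one tensor of
// shape {100, 1, 30} (requires a dynamic batch dimension in the exported
// model) and call session.Run once.
```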