Hi,
I'm running into a problem using ONNX Runtime for GPU inference (CUDAExecutionProvider) with different intervals between predictions, on a GeForce RTX 2080. When I run the predictions back-to-back in a for loop, the average prediction time is around 4 ms. But if I insert an interval of 0.1 seconds (time.sleep(0.1)) between predictions, the average latency rises to around 16 ms. The same happens with other intervals: the smaller the interval, the smaller the prediction latency.
Has anyone seen a similar problem before, or could you kindly offer some suggestions?
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input
import keras2onnx
import onnxruntime
import datetime
import time

# image preprocessing
img_path = './image/defective_sample_0001.png'  # make sure the image is in img_path
img_size = 384
img = image.load_img(img_path, target_size=(img_size, img_size))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
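As a side note, after np.expand_dims the input already carries the NHWC batch shape the session expects. A minimal sketch with a dummy array (the names here are illustrative, not from the snippet above) confirms the shape:

```python
import numpy as np

# Stand-in for the decoded 384x384 RGB image (dummy data, no file needed).
img_size = 384
img_array = np.zeros((img_size, img_size, 3), dtype=np.float32)

# Add the leading batch dimension, as in the snippet above.
batched = np.expand_dims(img_array, axis=0)
print(batched.shape)  # (1, 384, 384, 3)
```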
providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'gpu_mem_limit': 2 * 1024 * 1024 * 1024,
        'cudnn_conv_algo_search': 'EXHAUSTIVE',
        'do_copy_in_default_stream': True,
    }),
    'CPUExecutionProvider',
]
temp_model_file = 'model.onnx'
keras2onnx.save_model(onnx_model, temp_model_file)  # onnx_model: the converted Keras model
sess = onnxruntime.InferenceSession(temp_model_file, providers=providers)

# runtime prediction
x = x if isinstance(x, list) else [x]
feed = dict([(input.name, x[n]) for n, input in enumerate(sess.get_inputs())])
for i in range(50):
    start_time = datetime.datetime.now()
    pred_onnx = sess.run(None, feed)
    end_time = datetime.datetime.now()
    time_diff = end_time - start_time
    execution_time = time_diff.total_seconds() * 1000
    print("execution time: ", execution_time)
    time.sleep(0.1)  # this is the place to insert intervals among predictions
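For what it's worth, datetime.datetime.now() also adds its own timer noise. Below is a minimal sketch of a more robust harness using time.perf_counter, warm-up iterations, and a median; the predict() function is a dummy stand-in for sess.run (an assumption for illustration, not part of the post):

```python
import statistics
import time

def predict():
    # Dummy stand-in for sess.run(None, feed); replace with the real call.
    time.sleep(0.002)

# Warm-up runs so one-time costs (allocation, autotuning) are excluded.
for _ in range(5):
    predict()

latencies_ms = []
for _ in range(20):
    start = time.perf_counter()
    predict()
    latencies_ms.append((time.perf_counter() - start) * 1000)
    time.sleep(0.05)  # interval under test

print("median latency (ms):", statistics.median(latencies_ms))
```

Reporting the median (or minimum) over many runs makes the comparison between interval settings less sensitive to one-off outliers than a plain average.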
Thanks in advance! One more note: the results above are from a machine with a Xeon processor. On another machine with a Core processor, the latency increase is much less pronounced.