Hi,
I'm running into a problem using ONNX Runtime for GPU inference (CUDAExecutionProvider) with different intervals between predictions, on a GeForce RTX 2080. When I run the predictions back-to-back in a for loop, the average prediction time is around 4 ms. But if I insert an interval of 0.1 seconds (time.sleep(0.1)) between predictions, the average latency rises to around 16 ms. The same happens with other intervals: the smaller the interval, the smaller the prediction latency.
Has anyone seen a similar problem before, or could you kindly offer some suggestions?
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input
import keras2onnx
import onnxruntime
import datetime
import time

# image preprocessing
img_path = './image/defective_sample_0001.png'  # make sure the image is in img_path
img_size = 384
img = image.load_img(img_path, target_size=(img_size, img_size))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
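As a side note, after np.expand_dims the input already carries the NHWC batch shape the session expects. A minimal sketch with a dummy array (the names here are illustrative, not from the snippet above) confirms the shape:

```python
import numpy as np

# Stand-in for the decoded 384x384 RGB image (dummy data, no file needed).
img_size = 384
img_array = np.zeros((img_size, img_size, 3), dtype=np.float32)

# Add the leading batch dimension, as in the snippet above.
batched = np.expand_dims(img_array, axis=0)
print(batched.shape)  # (1, 384, 384, 3)
```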
providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'gpu_mem_limit': 2 * 1024 * 1024 * 1024,
        'cudnn_conv_algo_search': 'EXHAUSTIVE',
        'do_copy_in_default_stream': True,
    }),
    'CPUExecutionProvider',
]
temp_model_file = 'model.onnx'
keras2onnx.save_model(onnx_model, temp_model_file)  # onnx_model: the converted Keras model
sess = onnxruntime.InferenceSession(temp_model_file, providers=providers)

# runtime prediction
x = x if isinstance(x, list) else [x]
feed = dict([(input.name, x[n]) for n, input in enumerate(sess.get_inputs())])
for i in range(50):
    start_time = datetime.datetime.now()
    pred_onnx = sess.run(None, feed)
    end_time = datetime.datetime.now()
    time_diff = end_time - start_time
    execution_time = time_diff.total_seconds() * 1000
    print("execution time: ", execution_time)
    time.sleep(0.1)  # this is the place to insert intervals among predictions
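For what it's worth, datetime.datetime.now() also adds its own timer noise. Below is a minimal sketch of a more robust harness using time.perf_counter, warm-up iterations, and a median; the predict() function is a dummy stand-in for sess.run (an assumption for illustration, not part of the post):

```python
import statistics
import time

def predict():
    # Dummy stand-in for sess.run(None, feed); replace with the real call.
    time.sleep(0.002)

# Warm-up runs so one-time costs (allocation, autotuning) are excluded.
for _ in range(5):
    predict()

latencies_ms = []
for _ in range(20):
    start = time.perf_counter()
    predict()
    latencies_ms.append((time.perf_counter() - start) * 1000)
    time.sleep(0.05)  # interval under test

print("median latency (ms):", statistics.median(latencies_ms))
```

Reporting the median (or minimum) over many runs makes the comparison between interval settings less sensitive to one-off outliers than a plain average.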
Thanks in advance! One more note: the results above are from a machine with a Xeon processor. On another machine with a Core processor, the latency increase is much less pronounced.