# Accuracy and speed benchmark results

## Backends

- CPU: ncnn, ONNX Runtime, OpenVINO
- GPU: ncnn, TensorRT, PPLNN

## Software and hardware environment

- Ubuntu 18.04
- ncnn 20211208
- CUDA 11.3
- TensorRT 7.2.3.4
- Docker 20.10.8
- NVIDIA Tesla T4 Tensor Core GPU for TensorRT

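## Configuration

- Models are exported as static graphs.
- The batch size is 1.
- During testing, latency is averaged over 100 images from each dataset.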
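The averaging procedure described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: `average_latency_ms`, `dummy_model`, and the warm-up count are hypothetical names and values that are not part of MMDeploy, and a real GPU backend would also need device synchronization before each timestamp.

```python
import statistics
import time
from typing import Callable, Iterable, List


def average_latency_ms(run_once: Callable[[object], object],
                       inputs: Iterable[object],
                       warmup: int = 10) -> float:
    """Return the mean per-input latency in milliseconds (batch size 1)."""
    samples = list(inputs)
    # Warm-up runs are excluded from timing. The warm-up count is an
    # assumption of this sketch, not a value stated in the benchmark.
    for sample in samples[:warmup]:
        run_once(sample)
    timings: List[float] = []
    for sample in samples:
        start = time.perf_counter()
        run_once(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


def dummy_model(x: int) -> int:
    # Hypothetical stand-in for a single backend inference call.
    return sum(range(1000)) + x


latency = average_latency_ms(dummy_model, range(100))
print(f"mean latency over 100 inputs: {latency:.3f} ms")
```

Timing each input separately, rather than timing the whole loop once, makes it easy to also inspect the spread of the measurements if needed.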

Users can obtain the speed results they need directly through model profiling. The results measured in our environment are as follows:

## Speed benchmark

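mmpretrain (latency in ms)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | TensorRT JetsonNano2GB fp32 | TensorRT JetsonNano2GB fp16 | TensorRT Jetson TX2 fp32 | PPLNN T4 fp16 | ncnn SnapDragon888 fp32 | ncnn Adreno660 fp32 | Ascend Ascend310 fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet | 224x224 | 2.97 | 1.26 | 1.21 | 59.32 | 30.54 | 24.13 | 1.30 | 33.91 | 25.93 | 2.49 |
| ResNeXt | 224x224 | 4.31 | 1.42 | 1.37 | 88.10 | 49.18 | 37.45 | 1.36 | 133.44 | 69.38 | - |
| SE-ResNet | 224x224 | 3.41 | 1.66 | 1.51 | 74.59 | 48.78 | 29.62 | 1.91 | 107.84 | 80.85 | - |
| ShuffleNetV2 | 224x224 | 1.37 | 1.19 | 1.13 | 15.26 | 10.23 | 7.37 | 4.69 | 9.55 | 10.66 | - |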

mmdet part1 (latency in ms)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | TensorRT Jetson TX2 fp32 | PPLNN T4 fp16 |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv3 | 320x320 | 14.76 | 24.92 | 24.92 | - | 18.07 |
| SSD-Lite | 320x320 | 8.84 | 9.21 | 8.04 | 1.28 | 19.72 |
| RetinaNet | 800x1344 | 97.09 | 25.79 | 16.88 | 780.48 | 38.34 |
| FCOS | 800x1344 | 84.06 | 23.15 | 17.68 | - | - |
| FSAF | 800x1344 | 82.96 | 21.02 | 13.50 | - | 30.41 |
| Faster R-CNN | 800x1344 | 88.08 | 26.52 | 19.14 | 733.81 | 65.40 |
| Mask R-CNN | 800x1344 | 104.83 | 58.27 | - | - | 86.80 |

mmdet part2 (latency in ms)

| model | spatial | ncnn SnapDragon888 fp32 | ncnn Adreno660 fp32 |
| --- | --- | --- | --- |
| MobileNetv2-YOLOv3 | 320x320 | 48.57 | 66.55 |
| SSD-Lite | 320x320 | 44.91 | 66.19 |
| YOLOX | 416x416 | 111.60 | 134.50 |

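mmagic (latency in ms)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | TensorRT Jetson TX2 fp32 | PPLNN T4 fp16 |
| --- | --- | --- | --- | --- | --- | --- |
| ESRGAN | 32x32 | 12.64 | 12.42 | 12.45 | - | 7.67 |
| SRCNN | 32x32 | 0.70 | 0.35 | 0.26 | 58.86 | 0.56 |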

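mmocr (latency in ms)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | PPLNN T4 fp16 | ncnn SnapDragon888 fp32 | ncnn Adreno660 fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet | 640x640 | 10.70 | 5.62 | 5.00 | 34.84 | - | - |
| CRNN | 32x32 | 1.93 | 1.40 | 1.36 | - | 10.57 | 20.00 |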

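mmseg (latency in ms)

| model | spatial | TensorRT T4 fp32 | TensorRT T4 fp16 | TensorRT T4 int8 | TensorRT Jetson TX2 fp32 | PPLNN T4 fp16 |
| --- | --- | --- | --- | --- | --- | --- |
| FCN | 512x1024 | 128.42 | 23.97 | 18.13 | 1682.54 | 27.00 |
| PSPNet | 1x3x512x1024 | 119.77 | 24.10 | 16.33 | 1586.19 | 27.26 |
| DeepLabV3 | 512x1024 | 226.75 | 31.80 | 19.85 | - | 36.01 |
| DeepLabV3+ | 512x1024 | 151.25 | 47.03 | 50.38 | 2534.96 | 34.80 |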
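To read the speed tables as relative gains, the speedup of a reduced-precision engine over fp32 is simply the ratio of the two latencies. The sketch below computes it for a few TensorRT T4 rows copied from the tables above; the helper name `speedup` and the choice of rows are only illustrative.

```python
# (fp32 ms, fp16 ms, int8 ms) for TensorRT on T4, copied from the speed tables.
T4_TENSORRT_MS = {
    "ResNet": (2.97, 1.26, 1.21),
    "RetinaNet": (97.09, 25.79, 16.88),
    "FCN": (128.42, 23.97, 18.13),
}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for model, (fp32, fp16, int8) in T4_TENSORRT_MS.items():
    print(f"{model}: fp16 {speedup(fp32, fp16):.2f}x, "
          f"int8 {speedup(fp32, int8):.2f}x over fp32")
```

Rows where a lower precision is slower than fp32 (for example YOLOv3 on T4) yield a ratio below 1.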

## Accuracy benchmark

mmpretrain

| model | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Ascend fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-18 | top-1 | 69.90 | 69.90 | 69.88 | 69.88 | 69.86 | 69.86 | 69.86 | 69.91 |
| ResNet-18 | top-5 | 89.43 | 89.43 | 89.34 | 89.34 | 89.33 | 89.38 | 89.34 | 89.43 |
| ResNeXt-50 | top-1 | 77.90 | 77.90 | 77.90 | 77.90 | - | 77.78 | 77.89 | - |
| ResNeXt-50 | top-5 | 93.66 | 93.66 | 93.66 | 93.66 | - | 93.64 | 93.65 | - |
| SE-ResNet-50 | top-1 | 77.74 | 77.74 | 77.74 | 77.74 | 77.75 | 77.63 | 77.73 | - |
| SE-ResNet-50 | top-5 | 93.84 | 93.84 | 93.84 | 93.84 | 93.83 | 93.72 | 93.84 | - |
| ShuffleNetV1 1.0x | top-1 | 68.13 | 68.13 | 68.13 | 68.13 | 68.13 | 67.71 | 68.11 | - |
| ShuffleNetV1 1.0x | top-5 | 87.81 | 87.81 | 87.81 | 87.81 | 87.81 | 87.58 | 87.80 | - |
| ShuffleNetV2 1.0x | top-1 | 69.55 | 69.55 | 69.55 | 69.55 | 69.54 | 69.10 | 69.54 | - |
| ShuffleNetV2 1.0x | top-5 | 88.92 | 88.92 | 88.92 | 88.92 | 88.91 | 88.58 | 88.92 | - |
| MobileNet V2 | top-1 | 71.86 | 71.86 | 71.86 | 71.86 | 71.87 | 70.91 | 71.84 | 71.87 |
| MobileNet V2 | top-5 | 90.42 | 90.42 | 90.42 | 90.42 | 90.40 | 89.85 | 90.41 | 90.42 |
| Vision Transformer | top-1 | 85.43 | 85.43 | - | 85.43 | 85.42 | - | - | 85.43 |
| Vision Transformer | top-5 | 97.77 | 97.77 | - | 97.77 | 97.76 | - | - | 97.77 |
| Swin Transformer | top-1 | 81.18 | 81.18 | 81.18 | 81.18 | 81.18 | - | - | - |
| Swin Transformer | top-5 | 95.61 | 95.61 | 95.61 | 95.61 | 95.61 | - | - | - |
| EfficientFormer | top-1 | 80.46 | 80.45 | 80.46 | 80.46 | - | - | - | - |
| EfficientFormer | top-5 | 94.99 | 94.98 | 94.99 | 94.99 | - | - | - | - |

mmdet

| model | task | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Ascend fp32 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3 | Object Detection | COCO2017 | box AP | 33.7 | 33.7 | - | 33.5 | 33.5 | 33.5 | - | - | - |
| SSD | Object Detection | COCO2017 | box AP | 25.5 | 25.5 | - | 25.5 | 25.5 | - | - | - | - |
| RetinaNet | Object Detection | COCO2017 | box AP | 36.5 | 36.4 | - | 36.4 | 36.4 | 36.3 | 36.5 | 36.4 | - |
| FCOS | Object Detection | COCO2017 | box AP | 36.6 | - | - | 36.6 | 36.5 | - | - | - | - |
| FSAF | Object Detection | COCO2017 | box AP | 37.4 | 37.4 | - | 37.4 | 37.4 | 37.2 | 37.4 | - | - |
| CenterNet | Object Detection | COCO2017 | box AP | 25.9 | 26.0 | 26.0 | 26.0 | 25.8 | - | - | - | - |
| YOLOX | Object Detection | COCO2017 | box AP | 40.5 | 40.3 | - | 40.3 | 40.3 | 29.3 | - | - | - |
| Faster R-CNN | Object Detection | COCO2017 | box AP | 37.4 | 37.3 | - | 37.3 | 37.3 | 37.1 | 37.3 | 37.2 | - |
| ATSS | Object Detection | COCO2017 | box AP | 39.4 | - | - | 39.4 | 39.4 | - | - | - | - |
| Cascade R-CNN | Object Detection | COCO2017 | box AP | 40.4 | - | - | 40.4 | 40.4 | - | 40.4 | - | - |
| GFL | Object Detection | COCO2017 | box AP | 40.2 | - | 40.2 | 40.2 | 40.0 | - | - | - | - |
| RepPoints | Object Detection | COCO2017 | box AP | 37.0 | - | - | 36.9 | - | - | - | - | - |
| DETR | Object Detection | COCO2017 | box AP | 40.1 | 40.1 | - | 40.1 | 40.1 | - | - | - | - |
| Mask R-CNN | Instance Segmentation | COCO2017 | box AP | 38.2 | 38.1 | - | 38.1 | 38.1 | - | 38.0 | - | - |
| Mask R-CNN | Instance Segmentation | COCO2017 | mask AP | 34.7 | 34.7 | - | 33.7 | 33.7 | - | - | - | - |
| Swin-Transformer | Instance Segmentation | COCO2017 | box AP | 42.7 | - | 42.7 | 42.5 | 37.7 | - | - | - | - |
| Swin-Transformer | Instance Segmentation | COCO2017 | mask AP | 39.3 | - | 39.3 | 39.3 | 35.4 | - | - | - | - |
| SOLO | Instance Segmentation | COCO2017 | mask AP | 33.1 | - | 32.7 | - | - | - | - | - | 32.7 |
| SOLOv2 | Instance Segmentation | COCO2017 | mask AP | 34.8 | - | 34.5 | - | - | - | - | - | 34.5 |

mmagic

| model | task | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRCNN | Super Resolution | Set5 | PSNR | 28.4316 | 28.4120 | 28.4323 | 28.4323 | 28.4286 | 28.1995 | 28.4311 |
| SRCNN | Super Resolution | Set5 | SSIM | 0.8099 | 0.8106 | 0.8097 | 0.8097 | 0.8096 | 0.7934 | 0.8096 |
| ESRGAN | Super Resolution | Set5 | PSNR | 28.2700 | 28.2619 | 28.2592 | 28.2592 | - | - | 28.2624 |
| ESRGAN | Super Resolution | Set5 | SSIM | 0.7778 | 0.7784 | 0.7764 | 0.7774 | - | - | 0.7765 |
| ESRGAN-PSNR | Super Resolution | Set5 | PSNR | 30.6428 | 30.6306 | 30.6444 | 30.6430 | - | - | 27.0426 |
| ESRGAN-PSNR | Super Resolution | Set5 | SSIM | 0.8559 | 0.8565 | 0.8558 | 0.8558 | - | - | 0.8557 |
| SRGAN | Super Resolution | Set5 | PSNR | 27.9499 | 27.9252 | 27.9408 | 27.9408 | - | - | 27.9388 |
| SRGAN | Super Resolution | Set5 | SSIM | 0.7846 | 0.7851 | 0.7839 | 0.7839 | - | - | 0.7839 |
| SRResNet | Super Resolution | Set5 | PSNR | 30.2252 | 30.2069 | 30.2300 | 30.2300 | - | - | 30.2294 |
| SRResNet | Super Resolution | Set5 | SSIM | 0.8491 | 0.8497 | 0.8488 | 0.8488 | - | - | 0.8488 |
| Real-ESRNet | Super Resolution | Set5 | PSNR | 28.0297 | - | 27.7016 | 27.7016 | - | - | 27.7049 |
| Real-ESRNet | Super Resolution | Set5 | SSIM | 0.8236 | - | 0.8122 | 0.8122 | - | - | 0.8123 |
| EDSR | Super Resolution | Set5 | PSNR | 30.2223 | 30.2192 | 30.2214 | 30.2214 | 30.2211 | 30.1383 | - |
| EDSR | Super Resolution | Set5 | SSIM | 0.8500 | 0.8507 | 0.8497 | 0.8497 | 0.8497 | 0.8469 | - |

mmocr

| model | task | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet* | Text Detection | ICDAR2015 | recall | 0.7310 | 0.7308 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 |
| | | | precision | 0.8714 | 0.8718 | 0.8714 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 |
| | | | hmean | 0.7950 | 0.7949 | 0.7950 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 |
| DBNetpp | Text Detection | ICDAR2015 | recall | 0.8209 | 0.8209 | 0.8209 | 0.8199 | 0.8204 | 0.8204 | - | 0.8209 |
| | | | precision | 0.9079 | 0.9079 | 0.9079 | 0.9117 | 0.9117 | 0.9142 | - | 0.9079 |
| | | | hmean | 0.8622 | 0.8622 | 0.8622 | 0.8634 | 0.8637 | 0.8648 | - | 0.8622 |
| PSENet | Text Detection | ICDAR2015 | recall | 0.7526 | 0.7526 | 0.7526 | 0.7526 | 0.7520 | 0.7496 | - | 0.7526 |
| | | | precision | 0.8669 | 0.8669 | 0.8669 | 0.8669 | 0.8668 | 0.8550 | - | 0.8669 |
| | | | hmean | 0.8057 | 0.8057 | 0.8057 | 0.8057 | 0.8054 | 0.7989 | - | 0.8057 |
| PANet | Text Detection | ICDAR2015 | recall | 0.7401 | 0.7401 | 0.7401 | 0.7357 | 0.7366 | - | - | 0.7401 |
| | | | precision | 0.8601 | 0.8601 | 0.8601 | 0.8570 | 0.8586 | - | - | 0.8601 |
| | | | hmean | 0.7955 | 0.7955 | 0.7955 | 0.7917 | 0.7930 | - | - | 0.7955 |
| TextSnake | Text Detection | CTW1500 | recall | 0.8052 | 0.8052 | 0.8052 | 0.8055 | - | - | - | - |
| | | | precision | 0.8535 | 0.8535 | 0.8535 | 0.8538 | - | - | - | - |
| | | | hmean | 0.8286 | 0.8286 | 0.8286 | 0.8290 | - | - | - | - |
| MaskRCNN | Text Detection | ICDAR2015 | recall | 0.7766 | 0.7766 | 0.7766 | 0.7766 | 0.7761 | 0.7670 | - | - |
| | | | precision | 0.8644 | 0.8644 | 0.8644 | 0.8644 | 0.8630 | 0.8705 | - | - |
| | | | hmean | 0.8182 | 0.8182 | 0.8182 | 0.8182 | 0.8172 | 0.8155 | - | - |
| CRNN | Text Recognition | IIIT5K | acc | 0.8067 | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | 0.8067 | - |
| SAR | Text Recognition | IIIT5K | acc | 0.9517 | - | 0.9287 | - | - | - | - | - |
| SATRN | Text Recognition | IIIT5K | acc | 0.9470 | 0.9487 | 0.9487 | 0.9487 | 0.9483 | 0.9483 | - | - |
| ABINet | Text Recognition | IIIT5K | acc | 0.9603 | 0.9563 | 0.9563 | 0.9573 | 0.9507 | 0.9510 | - | - |

mmseg

| model | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Ascend fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCN | Cityscapes | mIoU | 72.25 | 72.36 | - | 72.36 | 72.35 | 74.19 | 72.35 | 72.35 |
| PSPNet | Cityscapes | mIoU | 78.55 | 78.66 | - | 78.26 | 78.24 | 77.97 | 78.09 | 78.67 |
| DeepLabV3 | Cityscapes | mIoU | 79.09 | 79.12 | - | 79.12 | 79.12 | 78.96 | 79.12 | 79.06 |
| DeepLabV3+ | Cityscapes | mIoU | 79.61 | 79.60 | - | 79.60 | 79.60 | 79.43 | 79.60 | 79.51 |
| Fast-SCNN | Cityscapes | mIoU | 70.96 | 70.96 | - | 70.93 | 70.92 | 66.00 | 70.92 | - |
| UNet | Cityscapes | mIoU | 69.10 | - | - | 69.10 | 69.10 | 68.95 | - | - |
| ANN | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - | - |
| APCNet | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - | - |
| BiSeNetV1 | Cityscapes | mIoU | 74.44 | - | - | 74.44 | 74.43 | - | - | - |
| BiSeNetV2 | Cityscapes | mIoU | 73.21 | - | - | 73.21 | 73.21 | - | - | - |
| CGNet | Cityscapes | mIoU | 68.25 | - | - | 68.27 | 68.27 | - | - | - |
| EMANet | Cityscapes | mIoU | 77.59 | - | - | 77.59 | 77.6 | - | - | - |
| EncNet | Cityscapes | mIoU | 75.67 | - | - | 75.66 | 75.66 | - | - | - |
| ERFNet | Cityscapes | mIoU | 71.08 | - | - | 71.08 | 71.07 | - | - | - |
| FastFCN | Cityscapes | mIoU | 79.12 | - | - | 79.12 | 79.12 | - | - | - |
| GCNet | Cityscapes | mIoU | 77.69 | - | - | 77.69 | 77.69 | - | - | - |
| ICNet | Cityscapes | mIoU | 76.29 | - | - | 76.36 | 76.36 | - | - | - |
| ISANet | Cityscapes | mIoU | 78.49 | - | - | 78.49 | 78.49 | - | - | - |
| OCRNet | Cityscapes | mIoU | 74.30 | - | - | 73.66 | 73.67 | - | - | - |
| PointRend | Cityscapes | mIoU | 76.47 | 76.47 | - | 76.41 | 76.42 | - | - | - |
| Semantic FPN | Cityscapes | mIoU | 74.52 | - | - | 74.52 | 74.52 | - | - | - |
| STDC | Cityscapes | mIoU | 75.10 | - | - | 75.10 | 75.10 | - | - | - |
| STDC | Cityscapes | mIoU | 77.17 | - | - | 77.17 | 77.17 | - | - | - |
| UPerNet | Cityscapes | mIoU | 77.10 | - | - | 77.19 | 77.18 | - | - | - |
| Segmenter | ADE20K | mIoU | 44.32 | 44.29 | 44.29 | 44.29 | 43.34 | 43.35 | - | - |

mmpose

| model | task | dataset | metric | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HRNet | Pose Detection | COCO | AP | 0.748 | 0.748 | 0.748 | 0.748 | - | 0.748 |
| HRNet | Pose Detection | COCO | AR | 0.802 | 0.802 | 0.802 | 0.802 | - | 0.802 |
| LiteHRNet | Pose Detection | COCO | AP | 0.663 | 0.663 | 0.663 | - | - | 0.663 |
| LiteHRNet | Pose Detection | COCO | AR | 0.728 | 0.728 | 0.728 | - | - | 0.728 |
| MSPN | Pose Detection | COCO | AP | 0.762 | 0.762 | 0.762 | 0.762 | - | 0.762 |
| MSPN | Pose Detection | COCO | AR | 0.825 | 0.825 | 0.825 | 0.825 | - | 0.825 |
| Hourglass | Pose Detection | COCO | AP | 0.717 | 0.717 | 0.717 | 0.717 | - | 0.717 |
| Hourglass | Pose Detection | COCO | AR | 0.774 | 0.774 | 0.774 | 0.774 | - | 0.774 |
| SimCC | Pose Detection | COCO | AP | 0.607 | - | 0.608 | - | - | - |
| SimCC | Pose Detection | COCO | AR | 0.668 | - | 0.672 | - | - | - |

mmrotate

| model | task | dataset | metric | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RotatedRetinaNet | Rotated Detection | DOTA-v1.0 | mAP | 0.698 | 0.698 | 0.698 | 0.697 | - | - |
| Oriented RCNN | Rotated Detection | DOTA-v1.0 | mAP | 0.756 | 0.756 | 0.758 | 0.730 | - | - |
| GlidingVertex | Rotated Detection | DOTA-v1.0 | mAP | 0.732 | - | 0.733 | 0.731 | - | - |
| RoI Transformer | Rotated Detection | DOTA-v1.0 | mAP | 0.761 | - | 0.758 | - | - | - |

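mmaction2

| model | task | dataset | metric | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TSN | Recognition | Kinetics-400 | top-1 | 69.71 | - | 69.71 | - | - | - |
| TSN | Recognition | Kinetics-400 | top-5 | 88.75 | - | 88.75 | - | - | - |
| SlowFast | Recognition | Kinetics-400 | top-1 | 74.45 | - | 75.62 | - | - | - |
| SlowFast | Recognition | Kinetics-400 | top-5 | 91.55 | - | 92.10 | - | - | - |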
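One way to scan the accuracy tables for backends that deviate noticeably from the PyTorch baseline is to compute the relative drop per entry and flag anything above a threshold. The sketch below does this for three rows copied from the tables above; the 2% threshold and the helper name `large_drops` are assumptions chosen for illustration, not part of the benchmark.

```python
# (model, metric, PyTorch fp32 reference, {backend: value}), copied from the tables.
RESULTS = [
    ("YOLOX", "box AP", 40.5, {"TensorRT fp16": 40.3, "TensorRT int8": 29.3}),
    ("Swin-Transformer", "box AP", 42.7, {"TensorRT fp32": 42.5, "TensorRT fp16": 37.7}),
    ("ResNet-18", "top-1", 69.90, {"TensorRT fp16": 69.86, "TensorRT int8": 69.86}),
]


def large_drops(results, max_relative_drop=0.02):
    """Return entries whose metric falls more than the threshold below PyTorch."""
    flagged = []
    for model, metric, reference, backends in results:
        for backend, value in backends.items():
            drop = (reference - value) / reference
            if drop > max_relative_drop:
                flagged.append((model, metric, backend, round(drop, 3)))
    return flagged


flagged = large_drops(RESULTS)
for model, metric, backend, drop in flagged:
    print(f"{model} {metric} on {backend}: {drop:.1%} below PyTorch")
```

With these three rows, only the YOLOX int8 and Swin-Transformer fp16 entries exceed the assumed threshold.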

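## Notes

- Because the datasets used by some codebases, such as MMDet, contain images of various resolutions, the speed benchmark is obtained with static configs in MMDeploy, while the accuracy benchmark is obtained with dynamic configs.
- Some TensorRT int8 benchmarks require an NVIDIA card with Tensor Cores; without them, performance drops sharply.
- DBNet uses nearest interpolation in the model neck, for which TensorRT-7 applies a strategy completely different from PyTorch's. To stay compatible with TensorRT-7, we rewrote the neck to use bilinear interpolation, which improved detection performance. To obtain performance that matches PyTorch, TensorRT-8+ is recommended, since its interpolation method is the same as PyTorch's.
- For mmpose models, `flip_test` must be set to `False` in the model config.
- Some models may suffer a large accuracy loss in fp16 mode; adjust the model as needed for your use case.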
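As background for the fp16 note, the sketch below shows how coarse IEEE 754 half precision is, using only the Python standard library (`struct` format `e`). It illustrates the number format alone, not what TensorRT or PPLNN do internally; `to_fp16` is a hypothetical helper written for this example.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Small fractions are stored only approximately.
print(to_fp16(0.1))

# Above 2048, not every integer is representable any more.
print(to_fp16(2049.0))

# Values beyond the largest finite fp16 number (65504) cannot be packed.
try:
    to_fp16(70000.0)
except OverflowError as exc:
    print("overflow:", exc)
```

Layers whose activations have a wide dynamic range or need fine resolution are the usual candidates for being kept in fp32 when fp16 accuracy drops.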
