This tutorial explains how to convert a YOLOv6 model to the NCNN format and describes some common issues that may arise during conversion. It also covers packaging and debugging with lite.ai.toolkit on macOS.
There are currently two paths for converting a model to the NCNN format: the first goes from PyTorch to ONNX to NCNN, and the second goes from PyTorch to TorchScript to NCNN by way of PNNX.
- First path: build NCNN.
- Second path: build NCNN and PNNX. If you don't want to build PNNX yourself, you can try the prebuilt binaries from PNNX releases.
- Prepare the original .pt file under the ./path/to/yolov6 directory.
- (Path 2) Prepare the export_pt.py file under the ./path/to/yolov6/deploy directory, and modify its code as described later in this tutorial.
- Export the ONNX model with the following command:
python deploy/ONNX/export_onnx.py --weights ./path/to/yolov6s.pt --device 0 --simplify --batch [1 or 32]
- Use the onnx2ncnn tool to convert the ONNX model to NCNN format:
./onnx2ncnn ./path/to/yolov6s.onnx ./path/to/save/yolov6s.param ./path/to/save/yolov6s.bin
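If you convert several models, it can help to derive the .param and .bin names from the ONNX file name instead of typing them by hand. The following is a minimal sketch of that idea. The helper name `onnx2ncnn_command` and its `tool` default are illustrative assumptions, not part of YOLOv6 or ncnn; it only builds the argument list in the same order as the command above and does not run anything.

```python
from pathlib import Path


def onnx2ncnn_command(onnx_path, save_dir, tool="./onnx2ncnn"):
    """Build the argument list: tool, ONNX input, .param output, .bin output.

    Hypothetical helper for illustration; the argument order follows the
    onnx2ncnn command shown in this tutorial.
    """
    onnx_path = Path(onnx_path)
    save_dir = Path(save_dir)
    stem = onnx_path.stem  # e.g. "yolov6s"
    return [
        tool,
        str(onnx_path),
        str(save_dir / f"{stem}.param"),
        str(save_dir / f"{stem}.bin"),
    ]


cmd = onnx2ncnn_command("path/to/yolov6s.onnx", "path/to/save")
print(" ".join(cmd))
# To actually run the conversion, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list avoids shell quoting problems when a path contains spaces.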
- Modify export_pt.py as follows:
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import argparse
import sys
import os
import torch
import torch.nn as nn

# Make the YOLOv6 packages importable; run this script from the repository root.
ROOT = os.getcwd()
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))
from yolov6.models.yolo import *
from yolov6.models.effidehead import Detect
from yolov6.layers.common import *
from yolov6.utils.events import LOGGER
from yolov6.utils.checkpoint import load_checkpoint

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='./yolov6s.pt', help='weights path')
    parser.add_argument('--half', action='store_true', help='FP16 half-precision export')
    parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--inplace', action='store_true', help='set Detect() inplace=True')
    args = parser.parse_args()
    print(args)
    cuda = args.device != 'cpu' and torch.cuda.is_available()
    device = torch.device(f'cuda:{args.device}' if cuda else 'cpu')
    assert not (device.type == 'cpu' and args.half), '--half only compatible with GPU export, i.e. use --device 0'
    model = load_checkpoint(args.weights, map_location=device, inplace=True, fuse=True)  # load FP32 model
    # Switch re-parameterizable blocks to their deploy (fused) form.
    for layer in model.modules():
        if isinstance(layer, RepVGGBlock):
            layer.switch_to_deploy()
    if args.half:
        model = model.half()
    model.eval()
    # Replace nn.SiLU with the export-friendly SiLU and set the Detect inplace flag.
    for k, m in model.named_modules():
        if isinstance(m, Conv):
            if isinstance(m.act, nn.SiLU):
                m.act = SiLU()
        elif isinstance(m, Detect):
            m.inplace = args.inplace
    # Dummy input for tracing; keep it on the same device and dtype as the model.
    x = torch.rand(1, 3, 512, 512).to(device)
    if args.half:
        x = x.half()
    mod = torch.jit.trace(model, x)
    mod.save("your_filename.pt")
- Then run export_pt.py from the shell:
python ./path/to/yolov6/deploy/export_pt.py --weights ./path/to/yolov6s.pt
The above code throws an error saying that the model cannot output a List. This happens because the forward function of Model in yolov6/models/yolo.py returns x only if export_mode is True and otherwise returns the List [x, featmaps]. To fix it, modify that forward function so that the traced model returns x alone.
- Copy the newly generated .pt file to the directory where the pnnx executable is located, and then run the command for your platform:
./path/to/pnnx ./path/to/generate.pt inputshape=[1,3,640,640] # Windows
./path/to/pnnx ./path/to/generate.pt inputshape="[1,3,640,640]" # macOS and Linux
With most versions of ncnn, a model generated directly as described above does not work correctly: the xywh outputs are all 0 or random numbers. This is because some versions of ncnn have problems with broadcast multiplication, so the param file has to be modified.
a. Open *.param and find the last Mul operator (a BinaryOp layer with 0=2). Note the blob name of its output and the blob name of its first input.
b. In the last Concat operation, replace the first input, which is the Mul output found in step a, with the Mul's first input found in step a.
# Step a: the last Mul operator is on line 162; its output is 182 and its first input is 180.
# Step b: on line 165, change 182 to 180.
7767517
163 186
Input in0 0 1 in0
Convolution convrelu_0 1 1 in0 1 0=16 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=432 9=1
Convolution convrelu_1 1 1 1 2 0=32 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=4608 9=1
Convolution convrelu_2 1 1 2 3 0=32 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=9216 9=1
Convolution convrelu_3 1 1 3 4 0=32 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=9216 9=1
Split splitncnn_0 1 2 4 5 6
Convolution convrelu_4 1 1 6 7 0=64 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=18432 9=1
Convolution convrelu_5 1 1 7 8 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_6 1 1 8 9 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_7 1 1 9 10 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_8 1 1 10 11 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Split splitncnn_1 1 3 11 12 13 14
Convolution convrelu_9 1 1 14 15 0=128 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=73728 9=1
Convolution convrelu_10 1 1 15 16 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_11 1 1 16 17 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_12 1 1 17 18 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_13 1 1 18 19 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_14 1 1 19 20 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_15 1 1 20 21 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Split splitncnn_2 1 3 21 22 23 24
Convolution convrelu_16 1 1 24 25 0=192 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=221184 9=1
Convolution convrelu_17 1 1 25 26 0=192 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=331776 9=1
Convolution convrelu_18 1 1 26 27 0=192 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=331776 9=1
Split splitncnn_3 1 2 27 28 29
Convolution convrelu_19 1 1 29 30 0=256 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=442368 9=1
Convolution convrelu_20 1 1 30 31 0=256 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=589824 9=1
Convolution convrelu_21 1 1 31 32 0=256 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=589824 9=1
Split splitncnn_4 1 2 32 33 34
Convolution convrelu_22 1 1 34 35 0=128 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=32768 9=1
Convolution convrelu_23 1 1 35 36 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_24 1 1 36 37 0=128 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=16384 9=1
Split splitncnn_5 1 2 37 38 39
Pooling maxpool2d_110 1 1 39 40 0=0 1=5 11=5 12=1 13=2 2=1 3=2 5=1
Split splitncnn_6 1 2 40 41 42
Pooling maxpool2d_111 1 1 42 43 0=0 1=5 11=5 12=1 13=2 2=1 3=2 5=1
Split splitncnn_7 1 2 43 44 45
Pooling maxpool2d_112 1 1 45 46 0=0 1=5 11=5 12=1 13=2 2=1 3=2 5=1
Concat cat_0 4 1 38 41 44 46 47 0=0
Convolution convrelu_27 1 1 33 48 0=128 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=32768 9=1
Convolution convrelu_25 1 1 47 49 0=128 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=65536 9=1
Convolution convrelu_26 1 1 49 50 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Concat cat_1 2 1 48 50 51 0=0
Convolution convrelu_28 1 1 51 52 0=256 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=65536 9=1
Convolution convrelu_30 1 1 52 53 0=128 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=32768 9=1
Split splitncnn_8 1 2 53 54 55
Deconvolution deconv_107 1 1 55 56 0=128 1=2 11=2 12=1 13=2 14=0 18=0 19=0 2=1 3=2 4=0 5=1 6=65536
Convolution convrelu_32 1 1 28 57 0=128 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=24576 9=1
Convolution convrelu_29 1 1 23 58 0=128 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=16384 9=1
Convolution convrelu_31 1 1 58 59 0=128 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=147456 9=1
Concat cat_2 3 1 56 57 59 60 0=0
Convolution convrelu_33 1 1 60 61 0=128 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=49152 9=1
Convolution convrelu_34 1 1 61 62 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_35 1 1 62 63 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_36 1 1 63 64 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_37 1 1 64 65 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_39 1 1 65 66 0=64 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=8192 9=1
Split splitncnn_9 1 2 66 67 68
Deconvolution deconv_108 1 1 68 69 0=64 1=2 11=2 12=1 13=2 14=0 18=0 19=0 2=1 3=2 4=0 5=1 6=16384
Convolution convrelu_41 1 1 22 70 0=64 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=8192 9=1
Convolution convrelu_38 1 1 13 71 0=64 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=4096 9=1
Convolution convrelu_40 1 1 71 72 0=64 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=36864 9=1
Concat cat_3 3 1 69 70 72 73 0=0
Convolution convrelu_42 1 1 73 74 0=64 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=12288 9=1
Convolution convrelu_43 1 1 74 75 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_44 1 1 75 76 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_45 1 1 76 77 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_46 1 1 77 78 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_48 1 1 78 79 0=32 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=2048 9=1
Split splitncnn_10 1 2 79 80 81
Deconvolution deconv_109 1 1 81 82 0=32 1=2 11=2 12=1 13=2 14=0 18=0 19=0 2=1 3=2 4=0 5=1 6=4096
Convolution convrelu_50 1 1 12 83 0=32 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=2048 9=1
Convolution convrelu_47 1 1 5 84 0=32 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=1024 9=1
Convolution convrelu_49 1 1 84 85 0=32 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=9216 9=1
Concat cat_4 3 1 82 83 85 86 0=0
Convolution convrelu_51 1 1 86 87 0=32 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=3072 9=1
Convolution convrelu_52 1 1 87 88 0=32 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=9216 9=1
Convolution convrelu_53 1 1 88 89 0=32 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=9216 9=1
Convolution convrelu_54 1 1 89 90 0=32 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=9216 9=1
Convolution convrelu_55 1 1 90 91 0=32 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=9216 9=1
Split splitncnn_11 1 2 91 92 93
Convolution convrelu_56 1 1 93 94 0=32 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=9216 9=1
Concat cat_5 2 1 94 80 95 0=0
Convolution convrelu_57 1 1 95 96 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_58 1 1 96 97 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_59 1 1 97 98 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Convolution convrelu_60 1 1 98 99 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864 9=1
Split splitncnn_12 1 2 99 100 101
Convolution convrelu_61 1 1 101 102 0=64 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=36864 9=1
Concat cat_6 2 1 102 67 103 0=0
Convolution convrelu_62 1 1 103 104 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_63 1 1 104 105 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_64 1 1 105 106 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Convolution convrelu_65 1 1 106 107 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456 9=1
Split splitncnn_13 1 2 107 108 109
Convolution convrelu_66 1 1 109 110 0=128 1=3 11=3 12=1 13=2 14=1 2=1 3=2 4=1 5=1 6=147456 9=1
Concat cat_7 2 1 110 54 111 0=0
Convolution conv_87 1 1 92 112 0=32 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=1024
Swish silu_4 1 1 112 113
Split splitncnn_14 1 2 113 114 115
Convolution conv_88 1 1 115 116 0=32 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=9216
Swish silu_5 1 1 116 117
Convolution conv_90 1 1 114 118 0=32 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=9216
Swish silu_6 1 1 118 119
Convolution conv_92 1 1 100 120 0=64 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=4096
Swish silu_7 1 1 120 121
Split splitncnn_15 1 2 121 122 123
Convolution conv_93 1 1 123 124 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864
Swish silu_8 1 1 124 125
Convolution conv_95 1 1 122 126 0=64 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=36864
Swish silu_9 1 1 126 127
Convolution conv_97 1 1 108 128 0=128 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=16384
Swish silu_10 1 1 128 129
Split splitncnn_16 1 2 129 130 131
Convolution conv_98 1 1 131 132 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456
Swish silu_11 1 1 132 133
Convolution conv_100 1 1 130 134 0=128 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=147456
Swish silu_12 1 1 134 135
Convolution convrelu_67 1 1 111 136 0=256 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=589824 9=1
Convolution convrelu_68 1 1 136 137 0=256 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=589824 9=1
Convolution convrelu_69 1 1 137 138 0=256 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=589824 9=1
Convolution convrelu_70 1 1 138 139 0=256 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=589824 9=1
Convolution conv_102 1 1 139 140 0=256 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=65536
Swish silu_13 1 1 140 141
Split splitncnn_17 1 2 141 142 143
Convolution conv_103 1 1 143 144 0=256 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=589824
Swish silu_14 1 1 144 145
Convolution conv_105 1 1 142 146 0=256 1=3 11=3 12=1 13=1 14=1 2=1 3=1 4=1 5=1 6=589824
Swish silu_15 1 1 146 147
Convolution convsigmoid_74 1 1 117 148 0=80 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=2560 9=4
Reshape reshape_187 1 1 148 149 0=4096 1=80
Convolution convsigmoid_73 1 1 125 150 0=80 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=5120 9=4
Reshape reshape_186 1 1 150 151 0=1024 1=80
Convolution convsigmoid_72 1 1 133 152 0=80 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=10240 9=4
Reshape reshape_185 1 1 152 153 0=256 1=80
Convolution convsigmoid_71 1 1 145 154 0=80 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=20480 9=4
Reshape reshape_184 1 1 154 155 0=64 1=80
Concat cat_8 4 1 149 151 153 155 156 0=1
Convolution conv_106 1 1 147 157 0=4 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=1024
Convolution conv_101 1 1 135 158 0=4 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=512
Convolution conv_96 1 1 127 159 0=4 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=256
Convolution conv_91 1 1 119 160 0=4 1=1 11=1 12=1 13=1 14=0 2=1 3=1 4=0 5=1 6=128
Reshape reshape_191 1 1 160 161 0=4096 1=4
Reshape reshape_190 1 1 159 162 0=1024 1=4
Reshape reshape_189 1 1 158 163 0=256 1=4
Reshape reshape_188 1 1 157 164 0=64 1=4
Concat cat_9 4 1 161 162 163 164 165 0=1
Permute permute_192 1 1 165 166 0=1
Slice split_0 1 2 166 167 168 -23300=2,2,-233 1=1
MemoryData pnnx_fold_anchor_points.1 0 1 169 0=2 1=5440
MemoryData pnnx_fold_anchor_points.1_1 0 1 170 0=2 1=5440
BinaryOp sub_0 2 1 169 167 171 0=1
Split splitncnn_18 1 2 171 172 173
BinaryOp add_1 2 1 170 168 174 0=0
Split splitncnn_19 1 2 174 175 176
BinaryOp add_2 2 1 172 175 177 0=0
BinaryOp div_3 1 1 177 178 0=3 1=1 2=2.000000e+00
BinaryOp sub_4 2 1 176 173 179 0=1
Concat cat_10 2 1 178 179 180 0=1
MemoryData pnnx_fold_stride_tensor.1 0 1 181 0=1 1=5440
BinaryOp mul_5 2 1 180 181 182 0=2
MemoryData pnnx_fold_925 0 1 183 0=1 1=5440
Permute permute_193 1 1 156 184 0=1
Concat cat_11 3 1 180 183 184 out0 0=1
#origin : Concat cat_11 3 1 182 183 184 out0 0=1
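Steps a and b can also be applied with a small script instead of editing the file by hand. The sketch below assumes the plain-text layout visible in the listing above, where each layer line reads `type name input_count output_count input_blobs... output_blobs... params...` and a multiplication is a two-input BinaryOp carrying `0=2`. The function name `bypass_last_mul` is hypothetical, and the script has only been reasoned through against the lines shown here, so check its result against your own param file.

```python
def bypass_last_mul(param_lines):
    """Return a copy of the lines where the last Concat reads the last Mul's first input."""
    layers = []  # (line index, tokens, input blob names, output blob names, params)
    for idx, line in enumerate(param_lines):
        tokens = line.split()
        # Skip the magic number, the count line, comments and blank lines.
        if len(tokens) < 4 or not (tokens[2].isdigit() and tokens[3].isdigit()):
            continue
        n_in, n_out = int(tokens[2]), int(tokens[3])
        inputs = tokens[4:4 + n_in]
        outputs = tokens[4 + n_in:4 + n_in + n_out]
        params = tokens[4 + n_in + n_out:]
        layers.append((idx, tokens, inputs, outputs, params))

    # Step a: the last two-input BinaryOp whose operation is multiplication (0=2).
    muls = [l for l in layers
            if l[1][0] == "BinaryOp" and len(l[2]) == 2 and "0=2" in l[4]]
    mul_first_input, mul_output = muls[-1][2][0], muls[-1][3][0]

    # Step b: in the last Concat, swap the Mul output for the Mul's first input.
    idx, tokens, inputs, _, _ = [l for l in layers if l[1][0] == "Concat"][-1]
    patched_inputs = [mul_first_input if name == mul_output else name
                      for name in inputs]
    patched = list(param_lines)
    patched[idx] = " ".join(tokens[:4] + patched_inputs + tokens[4 + len(inputs):])
    return patched


example = [
    "BinaryOp mul_5 2 1 180 181 182 0=2",
    "MemoryData pnnx_fold_925 0 1 183 0=1 1=5440",
    "Permute permute_193 1 1 156 184 0=1",
    "Concat cat_11 3 1 182 183 184 out0 0=1",
]
print(bypass_last_mul(example)[-1])  # Concat cat_11 3 1 180 183 184 out0 0=1
```

Only the Concat line is rewritten; every other line, including the layer and blob counts, is passed through unchanged.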
Because this modification bypasses the final multiplication, that operation of the head has to be performed in the post-processing of whichever inference framework you use. The following uses lite.ai.toolkit as an example.
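Before the lite.ai.toolkit code, here is a minimal Python sketch of the step that moves into post-processing: multiplying each anchor's x, y, w, h by its stride. It assumes, as the C++ code in this tutorial does, strides of 8, 16, 32 and 64 with anchors ordered level by level, and a 512x512 input. The function names are illustrative, not from any library.

```python
def anchor_strides(input_height, input_width, strides=(8, 16, 32, 64)):
    """Return one stride value per anchor, ordered level by level."""
    per_anchor = []
    for stride in strides:
        cells = (input_height // stride) * (input_width // stride)
        per_anchor.extend([stride] * cells)
    return per_anchor


def scale_xywh(rows, per_anchor_strides):
    """Multiply the first four values (x, y, w, h) of each row by its anchor's stride."""
    scaled = []
    for row, stride in zip(rows, per_anchor_strides):
        scaled.append([v * stride for v in row[:4]] + list(row[4:]))
    return scaled


strides = anchor_strides(512, 512)
print(len(strides))  # 5440, matching the 1=5440 tensors in the param file above
print(scale_xywh([[1.0, 2.0, 3.0, 4.0, 1.0, 0.9]], [8]))
```

The scores after the first four values are left untouched, since only the box coordinates passed through the bypassed multiplication.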
- Modify ./path/to/lite.ai.Toolkit/examples/lite/cv/test_lite_yolov6.cpp as follows:
//
// Created by DefTruth on 2022/6/25.
//
#include "lite/lite.h"

// Keep the ONNX test to compare results. To switch to a different ONNX model,
// update the corresponding ONNX header and code.
static void test_onnxruntime(std::string onnx)
{
#ifdef ENABLE_ONNXRUNTIME
  std::string onnx_path = "../../../hub/onnx/cv/" + onnx;
  std::string test_img_path = "../../../examples/lite/resources/test_lite_yolov5_2.jpg"; // change to your test image path
  std::string save_img_path = "../../../logs/test_oxr_yolov6_1.jpg";
  // 2. Test Specific Engine ONNXRuntime
  lite::onnxruntime::cv::detection::YOLOv6 *yolov6 =
      new lite::onnxruntime::cv::detection::YOLOv6(onnx_path);
  std::vector<lite::types::Boxf> detected_boxes;
  cv::Mat img_bgr = cv::imread(test_img_path);
  yolov6->detect(img_bgr, detected_boxes, 0.5);
  lite::utils::draw_boxes_inplace(img_bgr, detected_boxes);
  cv::imwrite(save_img_path, img_bgr);
  std::cout << "ONNXRuntime Version Detected Boxes Num: " << detected_boxes.size() << std::endl;
  delete yolov6;
#endif
}

static void test_ncnn(std::string ncnn_param, std::string ncnn_bin)
{
#ifdef ENABLE_NCNN
  std::string param_path = "../../../hub/ncnn/cv/" + ncnn_param;
  std::string bin_path = "../../../hub/ncnn/cv/" + ncnn_bin;
  std::string test_img_path = "../../../examples/lite/resources/test_lite_yolov5_2.jpg"; // change to your test image path
  std::string save_img_path = "../../../logs/test_ncnn_yolov6_2.jpg";
  // 4. Test Specific Engine NCNN
  lite::ncnn::cv::detection::YOLOv6 *yolov6 =
      new lite::ncnn::cv::detection::YOLOv6(param_path, bin_path);
  std::vector<lite::types::Boxf> detected_boxes;
  cv::Mat img_bgr = cv::imread(test_img_path);
  yolov6->detect(img_bgr, detected_boxes);
  lite::utils::draw_boxes_inplace(img_bgr, detected_boxes);
  cv::imwrite(save_img_path, img_bgr);
  std::cout << "NCNN Version Detected Boxes Num: " << detected_boxes.size() << std::endl;
  delete yolov6;
#endif
}

static void test_lite(std::string onnx, std::string ncnn_param, std::string ncnn_bin)
{
  test_onnxruntime(onnx);
  test_ncnn(ncnn_param, ncnn_bin);
}

int main(int argc, char *argv[])
{
  // Expect three arguments: the ONNX file, the NCNN param file and the NCNN bin file.
  if (argc < 4)
  {
    std::cerr << "Usage: " << argv[0] << " <onnx> <ncnn_param> <ncnn_bin>" << std::endl;
    return 1;
  }
  std::string onnx = argv[1];
  std::string ncnn_param = argv[2];
  std::string ncnn_bin = argv[3];
  test_lite(onnx, ncnn_param, ncnn_bin);
  return 0;
}
* Modify lines 28-29 of ./path/to/lite.ai.Toolkit/lite/ncnn/cv/ncnn_yolov6.h (the default _input_height and _input_width) to match the input resolution of the ncnn model.
//
// Created by DefTruth on 2022/6/25.
//
#ifndef LITE_AI_TOOLKIT_NCNN_CV_NCNN_YOLOV6_H
#define LITE_AI_TOOLKIT_NCNN_CV_NCNN_YOLOV6_H

#include "lite/ncnn/core/ncnn_core.h"

namespace ncnncv
{
  class LITE_EXPORTS NCNNYOLOv6
  {
  private:
    ncnn::Net *net = nullptr;
    const char *log_id = nullptr;
    const char *param_path = nullptr;
    const char *bin_path = nullptr;
    std::vector<const char *> input_names;
    std::vector<const char *> output_names;
    std::vector<int> input_indexes;
    std::vector<int> output_indexes;

  public:
    explicit NCNNYOLOv6(const std::string &_param_path,
                        const std::string &_bin_path,
                        unsigned int _num_threads = 1,
                        int _input_height = 512,
                        int _input_width = 512); //
    ~NCNNYOLOv6();

  private:
    // nested classes
    typedef struct GridAndStride
    {
      int grid0;
      int grid1;
      int stride;
    } YOLOv6Anchor;

    typedef struct
    {
      float r;
      int dw;
      int dh;
      int new_unpad_w;
      int new_unpad_h;
      bool flag;
    } YOLOv6ScaleParams;

  private:
    const unsigned int num_threads; // initialize at runtime.
    const int input_height; // 640/320
    const int input_width; // 640/320
    const char *class_names[80] = {
        "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
        "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
        "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
        "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
        "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
        "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
        "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard",
        "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase",
        "scissors", "teddy bear", "hair drier", "toothbrush"
    };
    enum NMS
    {
      HARD = 0, BLEND = 1, OFFSET = 2
    };
    const float mean_vals[3] = {0.f, 0.f, 0.f}; // RGB
    const float norm_vals[3] = {1.0 / 255.f, 1.0 / 255.f, 1.0 / 255.f};
    static constexpr const unsigned int max_nms = 30000;

  protected:
    NCNNYOLOv6(const NCNNYOLOv6 &) = delete; //
    NCNNYOLOv6(NCNNYOLOv6 &&) = delete; //
    NCNNYOLOv6 &operator=(const NCNNYOLOv6 &) = delete; //
    NCNNYOLOv6 &operator=(NCNNYOLOv6 &&) = delete; //

  private:
    void print_debug_string();

    void transform(const cv::Mat &mat_rs, ncnn::Mat &in);

    void resize_unscale(const cv::Mat &mat,
                        cv::Mat &mat_rs,
                        int target_height,
                        int target_width,
                        YOLOv6ScaleParams &scale_params);

    void generate_anchors(const int target_height,
                          const int target_width,
                          std::vector<int> &strides,
                          std::vector<YOLOv6Anchor> &anchors);

    void generate_bboxes(const YOLOv6ScaleParams &scale_params,
                         std::vector<types::Boxf> &bbox_collection,
                         ncnn::Extractor &extractor,
                         float score_threshold, int img_height,
                         int img_width); // rescale & exclude

    void nms(std::vector<types::Boxf> &input, std::vector<types::Boxf> &output,
             float iou_threshold, unsigned int topk, unsigned int nms_type);

  public:
    void detect(const cv::Mat &mat, std::vector<types::Boxf> &detected_boxes,
                float score_threshold = 0.5f, float iou_threshold = 0.45f,
                unsigned int topk = 100, unsigned int nms_type = NMS::OFFSET);
  };
}

#endif //LITE_AI_TOOLKIT_NCNN_CV_NCNN_YOLOV6_H
* Modify ./path/to/lite.ai.Toolkit/lite/ncnn/cv/ncnn_yolov6.cpp as follows:
//
// Created by DefTruth on 2022/6/25.
//
#include "ncnn_yolov6.h"
#include "lite/utils.h"

using ncnncv::NCNNYOLOv6;

NCNNYOLOv6::NCNNYOLOv6(const std::string &_param_path,
                       const std::string &_bin_path,
                       unsigned int _num_threads,
                       int _input_height,
                       int _input_width) :
    log_id(_param_path.data()), param_path(_param_path.data()),
    bin_path(_bin_path.data()), num_threads(_num_threads),
    input_height(_input_height), input_width(_input_width)
{
  net = new ncnn::Net();
  // init net, change this setting for better performance.
  net->opt.use_fp16_arithmetic = false;
  net->opt.use_vulkan_compute = false; // default
  // setup Focus in yolov5
  // net->register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);
  net->load_param(param_path);
  net->load_model(bin_path);
#ifdef LITENCNN_DEBUG
  this->print_debug_string();
#endif
}

NCNNYOLOv6::~NCNNYOLOv6()
{
  if (net) delete net;
  net = nullptr;
}

void NCNNYOLOv6::transform(const cv::Mat &mat_rs, ncnn::Mat &in)
{
  // BGR NHWC -> RGB NCHW
  in = ncnn::Mat::from_pixels(mat_rs.data, ncnn::Mat::PIXEL_BGR2RGB, input_width, input_height);
  in.substract_mean_normalize(mean_vals, norm_vals);
}

// letterbox
void NCNNYOLOv6::resize_unscale(const cv::Mat &mat, cv::Mat &mat_rs,
                                int target_height, int target_width,
                                YOLOv6ScaleParams &scale_params)
{
  if (mat.empty()) return;
  int img_height = static_cast<int>(mat.rows);
  int img_width = static_cast<int>(mat.cols);

  mat_rs = cv::Mat(target_height, target_width, CV_8UC3,
                   cv::Scalar(114, 114, 114));
  // scale ratio (new / old) new_shape(h,w)
  float w_r = (float) target_width / (float) img_width;
  float h_r = (float) target_height / (float) img_height;
  float r = std::min(w_r, h_r);
  // compute padding
  int new_unpad_w = static_cast<int>((float) img_width * r); // floor
  int new_unpad_h = static_cast<int>((float) img_height * r); // floor
  int pad_w = target_width - new_unpad_w; // >=0
  int pad_h = target_height - new_unpad_h; // >=0

  int dw = pad_w / 2;
  int dh = pad_h / 2;

  // resize with unscaling
  cv::Mat new_unpad_mat;
  // cv::Mat new_unpad_mat = mat.clone(); // may not need clone.
  cv::resize(mat, new_unpad_mat, cv::Size(new_unpad_w, new_unpad_h));
  new_unpad_mat.copyTo(mat_rs(cv::Rect(dw, dh, new_unpad_w, new_unpad_h)));

  // record scale params.
  scale_params.r = r;
  scale_params.dw = dw;
  scale_params.dh = dh;
  scale_params.new_unpad_w = new_unpad_w;
  scale_params.new_unpad_h = new_unpad_h;
  scale_params.flag = true;
}

void NCNNYOLOv6::detect(const cv::Mat &mat, std::vector<types::Boxf> &detected_boxes,
                        float score_threshold, float iou_threshold,
                        unsigned int topk, unsigned int nms_type)
{
  if (mat.empty()) return;
  int img_height = static_cast<int>(mat.rows);
  int img_width = static_cast<int>(mat.cols);
  // resize & unscale
  cv::Mat mat_rs;
  YOLOv6ScaleParams scale_params;
  this->resize_unscale(mat, mat_rs, input_height, input_width, scale_params);
  // 1. make input tensor
  ncnn::Mat input;
  this->transform(mat_rs, input);
  // 2. inference & extract
  auto extractor = net->create_extractor();
  extractor.set_light_mode(false); // default
  extractor.set_num_threads(num_threads);
  extractor.input("in0", input);
  // 3.rescale & exclude.
  std::vector<types::Boxf> bbox_collection;
  this->generate_bboxes(scale_params, bbox_collection, extractor, score_threshold, img_height, img_width);
  // 4. hard|blend|offset nms with topk.
  this->nms(bbox_collection, detected_boxes, iou_threshold, topk, nms_type);
}

void NCNNYOLOv6::generate_anchors(const int target_height,
                                  const int target_width,
                                  std::vector<int> &strides,
                                  std::vector<YOLOv6Anchor> &anchors)
{
  for (auto stride: strides)
  {
    int num_grid_w = target_width / stride;
    int num_grid_h = target_height / stride;
    for (int g1 = 0; g1 < num_grid_h; ++g1)
    {
      for (int g0 = 0; g0 < num_grid_w; ++g0)
      {
        YOLOv6Anchor anchor;
        anchor.grid0 = g0;
        anchor.grid1 = g1;
        anchor.stride = stride;
        anchors.push_back(anchor);
      }
    }
  }
}

static inline float sigmoid(float x)
{
  return static_cast<float>(1.f / (1.f + std::exp(-x)));
}

void NCNNYOLOv6::generate_bboxes(const YOLOv6ScaleParams &scale_params,
                                 std::vector<types::Boxf> &bbox_collection,
                                 ncnn::Extractor &extractor,
                                 float score_threshold, int img_height,
                                 int img_width)
{
  ncnn::Mat outputs;
  ncnn::Mat temp;
  ncnn::Mat temp2;
  extractor.extract("out0", outputs); // (1,n=?,85=5+80=cxcy+cwch+obj_conf+cls_conf)
  extractor.extract("181", temp); // stride tensor, the second input of the bypassed Mul
  extractor.extract("180", temp2); // xywh before the stride multiplication
  const float* ptr = temp.channel(0);
  const float* ptr2 = temp2.channel(0);
  std::cout << temp.dims << "\n";

  const unsigned int num_anchors = outputs.h;
  const unsigned int num_classes = outputs.w - 5;

  std::vector<YOLOv6Anchor> anchors;
  std::vector<int> strides = {8, 16, 32, 64}; // might have stride=64
  this->generate_anchors(input_height, input_width, strides, anchors);

  float r_ = scale_params.r;
  int dw_ = scale_params.dw;
  int dh_ = scale_params.dh;

  bbox_collection.clear();
  unsigned int count = 0;
  for (unsigned int i = 0; i < num_anchors; ++i)
  {
    const float *offset_obj_cls_ptr =
        (float *) outputs.data + (i * (num_classes + 5)); // row ptr
    float obj_conf = offset_obj_cls_ptr[4];
    if (obj_conf < score_threshold) continue; // filter first.

    float cls_conf = offset_obj_cls_ptr[5];
    unsigned int label = 0;
    for (unsigned int j = 0; j < num_classes; ++j)
    {
      float tmp_conf = offset_obj_cls_ptr[j + 5];
      if (tmp_conf > cls_conf)
      {
        cls_conf = tmp_conf;
        label = j;
      }
    } // argmax
    float conf = obj_conf * cls_conf; // cls_conf (0.,1.)
    if (conf < score_threshold) continue; // filter

    float dx = offset_obj_cls_ptr[0];
    float dy = offset_obj_cls_ptr[1];
    float dw = offset_obj_cls_ptr[2];
    float dh = offset_obj_cls_ptr[3];

    // Apply the stride multiplication that was bypassed in the param file.
    const int stride = anchors.at(i).stride;
    float cx = dx * stride;
    float cy = dy * stride;
    float w = dw * stride;
    float h = dh * stride;
    // Undo the letterbox padding and scaling.
    float x1 = ((cx - w / 2.f) - (float) dw_) / r_;
    float y1 = ((cy - h / 2.f) - (float) dh_) / r_;
    float x2 = ((cx + w / 2.f) - (float) dw_) / r_;
    float y2 = ((cy + h / 2.f) - (float) dh_) / r_;
    std::cout << "x: " << cx << ", y: " << cy << " | w: " << dw << ", h: "<< dh << ", config: " << class_names[label] << "\n";

    types::Boxf box;
    box.x1 = std::max(0.f, x1);
    box.y1 = std::max(0.f, y1);
    box.x2 = std::min(x2, (float) img_width - 1.f);
    box.y2 = std::min(y2, (float) img_height - 1.f);
    box.score = conf;
    box.label = label;
    box.label_text = class_names[label];
    box.flag = true;
    bbox_collection.push_back(box);

    count += 1; // limit boxes for nms.
    if (count > max_nms)
      break;
  }
#ifdef LITENCNN_DEBUG
  std::cout << "detected num_anchors: " << num_anchors << "\n";
  std::cout << "generate_bboxes num: " << bbox_collection.size() << "\n";
#endif
}

void NCNNYOLOv6::nms(std::vector<types::Boxf> &input, std::vector<types::Boxf> &output,
                     float iou_threshold, unsigned int topk,
                     unsigned int nms_type)
{
  if (nms_type == NMS::BLEND) lite::utils::blending_nms(input, output, iou_threshold, topk);
  else if (nms_type == NMS::OFFSET) lite::utils::offset_nms(input, output, iou_threshold, topk);
  else lite::utils::hard_nms(input, output, iou_threshold, topk);
}

void NCNNYOLOv6::print_debug_string()
{
  std::cout << "LITENCNN_DEBUG LogId: " << log_id << "\n";
  input_indexes = net->input_indexes();
  output_indexes = net->output_indexes();
#ifdef NCNN_STRING
  input_names = net->input_names();
  output_names = net->output_names();
#endif
  std::cout << "=============== Input-Dims ==============\n";
  for (int i = 0; i < input_indexes.size(); ++i)
  {
    std::cout << "Input: ";
    auto tmp_in_blob = net->blobs().at(input_indexes.at(i));
#ifdef NCNN_STRING
    std::cout << input_names.at(i) << ": ";
#endif
    std::cout << "shape: c=" << tmp_in_blob.shape.c
              << " h=" << tmp_in_blob.shape.h << " w=" << tmp_in_blob.shape.w << "\n";
  }
  std::cout << "=============== Output-Dims ==============\n";
  for (int i = 0; i < output_indexes.size(); ++i)
  {
    auto tmp_out_blob = net->blobs().at(output_indexes.at(i));
    std::cout << "Output: ";
#ifdef NCNN_STRING
    std::cout << output_names.at(i) << ": ";
#endif
    std::cout << "shape: c=" << tmp_out_blob.shape.c
              << " h=" << tmp_out_blob.shape.h << " w=" << tmp_out_blob.shape.w << "\n";
  }
  std::cout << "========================================\n";
}
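
The coordinate arithmetic in resize_unscale and generate_bboxes can be checked in isolation. The following Python sketch mirrors those two C++ functions: it computes the letterbox scale and padding, then maps a box given as center and size in the padded input back to corner coordinates in the original image. The function names are illustrative only, and the box values in the example are made up for the check.

```python
def letterbox_params(img_height, img_width, target_height, target_width):
    """Scale ratio and left/top padding, following resize_unscale."""
    r = min(target_width / img_width, target_height / img_height)
    new_unpad_w = int(img_width * r)   # floor, as in the C++ static_cast<int>
    new_unpad_h = int(img_height * r)
    dw = (target_width - new_unpad_w) // 2
    dh = (target_height - new_unpad_h) // 2
    return r, dw, dh


def to_original(cx, cy, w, h, r, dw, dh, img_height, img_width):
    """Map a center/size box in the letterboxed input to clamped corners in the image."""
    x1 = ((cx - w / 2) - dw) / r
    y1 = ((cy - h / 2) - dh) / r
    x2 = ((cx + w / 2) - dw) / r
    y2 = ((cy + h / 2) - dh) / r
    return (max(0.0, x1), max(0.0, y1),
            min(x2, img_width - 1.0), min(y2, img_height - 1.0))


# A 1024x512 (width x height) image letterboxed into a 512x512 input.
r, dw, dh = letterbox_params(512, 1024, 512, 512)
print(r, dw, dh)  # 0.5 0 128
print(to_original(256, 256, 100, 50, r, dw, dh, 512, 1024))  # (412.0, 206.0, 612.0, 306.0)
```

The padding is subtracted before dividing by the ratio because the padding is measured in input pixels, not in original-image pixels.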
* Save yolov6s.onnx, yolov6s.param, and yolov6s.bin under ./path/to/lite.ai.Toolkit/hub (the test code above loads them from hub/onnx/cv and hub/ncnn/cv), compile lite.ai.Toolkit, and then run the following commands:
cd ./path/to/lite.ai.Toolkit/build/lite.ai.toolkit/bin
./lite_yolov6 yolov6s.onnx yolov6s.param yolov6s.bin
Model | Size | Speed NCNN average (fps) | Params (M) | FLOPs (G) |
---|---|---|---|---|
YOLOv6Lite-L | 320*320 | 39.88 | 1.09 | 0.87 |
YOLOv6Lite-L | 320*192 | 64.51 | 1.09 | 0.52 |
YOLOv6Lite-L | 224*128 | 130.05 | 1.09 | 0.24 |
- Speed was measured on macOS with a 2.6 GHz 6-core Intel Core i7 (Coffee Lake architecture). For the measurement, 1000 images were randomly sampled from the COCO dataset, and the average speed over them is reported as the final result.
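
The FLOPs column is consistent with compute scaling in proportion to the number of input pixels while the parameter count stays fixed. This is an observation about the numbers in the table, not a statement made by the benchmark itself. The arithmetic can be checked with a short sketch, where the function name `scaled_flops` is illustrative:

```python
def scaled_flops(base_flops, base_size, new_size):
    """Scale a FLOPs figure by the ratio of input pixel counts."""
    base_a, base_b = base_size
    new_a, new_b = new_size
    return base_flops * (new_a * new_b) / (base_a * base_b)


# Start from the 320*320 row (0.87 GFLOPs) and predict the other two rows.
print(round(scaled_flops(0.87, (320, 320), (320, 192)), 2))  # 0.52
print(round(scaled_flops(0.87, (320, 320), (224, 128)), 2))  # 0.24
```

The measured speed does not follow the same ratio exactly, which is expected since fps also depends on fixed per-frame overhead.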