Getting Started

This page provides details about mmdet2trt.

dynamic shape/batched input

shape_ranges sets the min/opt/max shape of the input tensor. For each dimension, min <= opt <= max must hold. For example:

shape_ranges=dict(
    x=dict(
        min=[1,3,320,320],
        opt=[1,3,800,1344],
        max=[1,3,1344,1344],
    )
)
trt_model = mmdet2trt(  ...,
                        shape_ranges=shape_ranges, # set the opt shape
                        ...)

This config accepts inputs with spatial size between (320, 320) and (1344, 1344). The first dimension of min/opt/max controls the batch size; with the values above the engine is built for a fixed batch size of 1.
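
To build an engine that also accepts batched input, enlarge the first (batch) dimension. A minimal sketch, assuming a maximum batch size of 4:

shape_ranges=dict(
    x=dict(
        min=[1,3,320,320],
        opt=[1,3,800,1344],
        max=[4,3,1344,1344],
    )
)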

Warning

Dynamic input shape and batch support might need more memory. Use a fixed shape (min = opt = max) to avoid unnecessary memory usage.
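
A minimal fixed-shape configuration (all three shapes identical; the 800x1344 size is just an example):

shape_ranges=dict(
    x=dict(
        min=[1,3,800,1344],
        opt=[1,3,800,1344],
        max=[1,3,800,1344],
    )
)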

fp16 support

fp16 mode can accelerate inference. Set fp16_mode=True to enable it.

trt_model = mmdet2trt(  ...,
                        fp16_mode=True, # enable fp16 mode
                        ...)

int8 support

  • Set int8_mode=True.
  • Provide a calibration dataset. The __getitem__() method of the dataset should return a list of tensors with shape (C, H, W); the shape must match shape_ranges['x']['opt'][1:] (the optimize shape), and the tensors should go through the same preprocessing as the model. A default dataset is provided; you can also use a custom one (a sketch follows the example below).
  • Set the calibration algorithm; entropy and minmax are supported.
from mmdet2trt import mmdet2trt, Int8CalibDataset
cfg_path="..."  # MMDetection config path
model_path="..." # MMDetection checkpoint path
image_path_list = [...] # list of image paths
shape_ranges=dict(
    x=dict(
        min=[...],
        opt=[...],
        max=[...],
    )
)
calib_dataset = Int8CalibDataset(image_path_list, cfg_path, shape_ranges)
trt_model = mmdet2trt(cfg_path, model_path,
                    shape_ranges=shape_ranges,
                    int8_mode=True,
                    int8_calib_dataset=calib_dataset,
                    int8_calib_alg="entropy")
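
For a custom calibration dataset, any object whose __getitem__() returns a list of preprocessed tensors of the optimize shape should work. A minimal sketch, assuming images are simply resized to the optimize shape and normalized with illustrative mean/std values (take the real pipeline from your config):

import cv2
import numpy as np
import torch

class MyCalibDataset:
    """Hypothetical custom calibration dataset returning preprocessed (C, H, W) tensors."""

    def __init__(self, image_paths, opt_shape, mean, std):
        self.image_paths = image_paths
        self.opt_wh = (opt_shape[-1], opt_shape[-2])  # (W, H) for cv2.resize
        self.mean = np.array(mean, dtype=np.float32)
        self.std = np.array(std, dtype=np.float32)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        img = cv2.imread(self.image_paths[index])
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)
        img = cv2.resize(img, self.opt_wh)
        img = (img - self.mean) / self.std
        return [torch.from_numpy(img.transpose(2, 0, 1))]  # list of (C, H, W) tensors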

Warning

Not all models support int8 mode.

max workspace size

Some layers need extra GPU workspace, and some optimization tactics also need more space. Enlarging max_workspace_size may accelerate your model at the cost of more memory.
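
For example, to request a 1 GB workspace (the value is just an example; adjust to what your GPU can spare):

trt_model = mmdet2trt(  ...,
                        max_workspace_size=1<<30, # 1 GB
                        ...)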

use in c++

The converted model is a Python wrapper around a TensorRT engine. First, get the serialized engine from trt_model:

with open(engine_path, mode='wb') as f:
    f.write(trt_model.state_dict()['engine'])

Link ${AMIRSTAN_PLUGIN_DIR}/build/lib/libamirstan_plugin.so in your project (or load it at runtime). Compile and load the engine.

Warning

You might need to invoke initLibAmirstanInferPlugins() (declared in amirInferPlugin.h) to load the plugins.

The engine only contains the forward pass. Preprocessing (resize, normalize) and postprocessing (dividing the boxes by the scale factor) must be done in your project.
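
For reference, a minimal sketch of those host-side steps in Python (the keep-ratio resize and the mean/std values are assumptions; take the exact pipeline from your MMDetection config):

import cv2
import numpy as np

# Assumed MMDetection ImageNet normalization values; check your config.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(img_bgr, dst_h, dst_w):
    """Keep-ratio resize into a (dst_h, dst_w) canvas, then normalize."""
    h, w = img_bgr.shape[:2]
    scale = min(dst_w / w, dst_h / h)
    img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB).astype(np.float32)
    resized = cv2.resize(img, (int(w * scale), int(h * scale)))
    canvas = np.zeros((dst_h, dst_w, 3), dtype=np.float32)
    canvas[:resized.shape[0], :resized.shape[1]] = (resized - MEAN) / STD
    return canvas.transpose(2, 0, 1)[None], scale  # (1, C, H, W) input, scale factor

def postprocess(boxes, scale):
    """Map boxes back to original image coordinates by dividing the scale factor."""
    return boxes / scale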

DeepStream support

When converting the model, set the output names:

trt_model = mmdet2trt(  ...,
                        output_names=["num_detections", "boxes", "scores", "classes"], # output names
                        ...)

Create engine file:

with open(engine_path, mode='wb') as f:
    f.write(trt_model.state_dict()['engine'])

In the DeepStream model config file, set the following properties:

[property]
...
net-scale-factor=0.0173         # compute from mean, std
offsets=123.675;116.28;103.53   # compute from mean, std
model-engine-file=trt.engine    # the engine file created by mmdet2trt
labelfile-path=labels.txt       # label file
...
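
DeepStream applies (pixel - offsets) * net-scale-factor per channel, so both values can be derived from the img_norm_cfg of your MMDetection config. A quick check, assuming the common mean=[123.675, 116.28, 103.53] and std=[58.395, 57.12, 57.375] (DeepStream takes a single scale factor, so the std values are averaged):

mean = [123.675, 116.28, 103.53]
std = [58.395, 57.12, 57.375]

offsets = mean                            # -> 123.675;116.28;103.53
net_scale_factor = 1.0 / (sum(std) / 3)   # -> ~0.0173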

In the same config file, set the plugin and the parse function:

[property]
...
parse-bbox-func-name=NvDsInferParseMmdet                # parse function name (built into the amirstan plugin)
output-bbox-name=boxes                                  # output name of the bounding box
output-blob-names=num_detections;boxes;scores;classes   # output blob names, same as convert output_names
custom-lib-path=libamirstan_plugin.so                   # amirstan plugin lib path
...

You might also need to set group-threshold=0, because nvdsinfer would otherwise try to cluster the detected objects generated by the parse function. Read "A problem about parse-bbox-func-name" for more detail. (Thanks @Paweł Pęczek for providing the technical details.)

[class-attrs-all]
...
group-threshold=0
...

Enjoy the model in DeepStream.

Warning: I am not so familiar with DeepStream. If you find anything wrong above, please let me know.

instance segmentation support (experimental)

Set the flag enable_mask to True:

# enable mask
trt_model = mmdet2trt(..., enable_mask=True)

Note

The mask output has shape [batch_size, num_boxes, 28, 28]; mask post-processing is not included in the model. Please implement it yourself if you want to integrate the converted engine into your own project.
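
A minimal mask post-processing sketch, assuming the boxes are already in original-image coordinates and each 28x28 mask holds probabilities (the 0.5 threshold is an assumption):

import cv2
import numpy as np

def paste_mask(mask_28x28, box, img_h, img_w, thr=0.5):
    """Resize a 28x28 mask to its box and paste it into a full-size binary mask."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, img_w), min(y2, img_h)
    full = np.zeros((img_h, img_w), dtype=np.uint8)
    if x2 <= x1 or y2 <= y1:
        return full
    resized = cv2.resize(mask_28x28.astype(np.float32), (x2 - x1, y2 - y1))
    full[y1:y2, x1:x2] = (resized > thr).astype(np.uint8)
    return full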