Performance-focused implementation of Mask R-CNN based on the Tensorpack implementation. The original paper: Mask R-CNN
This implementation of Mask R-CNN is focused on increasing training throughput without sacrificing accuracy. We do this by training with a per-GPU batch size greater than 1, using FP16 and two custom TF ops.
Training on N GPUs (V100s in our experiments) with a per-GPU batch size of M is referred to as NxM training; for example, 32x4 means 32 GPUs with 4 images each, a global batch size of 128.
Training converges to target accuracy for configurations from 8x1 up to 32x4. Training throughput is substantially improved over the original Tensorpack code.
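The exact FP16 plumbing lives in the codebase (see CODEBASE.md); the snippet below is only a minimal sketch of the common TF 1.x pattern this kind of training builds on, fp32 master weights plus loss scaling, with all names and hyperparameters chosen purely for illustration.

```python
import tensorflow as tf

# Illustrative only: store "master" weights in fp32 but expose fp16 casts
# to the graph, so forward/backward compute runs in half precision.
def fp32_storage_getter(getter, name, shape=None, dtype=None,
                        initializer=None, trainable=True, *args, **kwargs):
    storage_dtype = tf.float32 if dtype == tf.float16 else dtype
    var = getter(name, shape, dtype=storage_dtype,
                 initializer=initializer, trainable=trainable,
                 *args, **kwargs)
    if dtype == tf.float16:
        var = tf.cast(var, tf.float16)
    return var

def build_fp16_train_op(loss, loss_scale=128.0):
    # Scale the loss so small fp16 gradients don't underflow to zero,
    # then unscale the gradients before the fp32 weight update.
    variables = tf.trainable_variables()
    grads = tf.gradients(loss * loss_scale, variables)
    grads_and_vars = [(g / loss_scale, v)
                      for g, v in zip(grads, variables) if g is not None]
    opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
    return opt.apply_gradients(grads_and_vars)

# Variables created under this scope get fp32 storage with fp16 compute:
#   with tf.variable_scope("model", custom_getter=fp32_storage_getter):
#       loss = build_model(...)  # hypothetical model-building function
```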
A pre-built Docker image is available on DockerHub as `armandmcqueen/tensorpack-mask-rcnn:master-latest`. It is automatically built on each commit to master.
- Running this codebase requires a custom TF binary, available under GitHub releases (it includes the custom ops and a fix for a bug introduced in TF 1.13); a sketch of how custom ops are typically loaded follows this list
- We give some details on the codebase and optimizations in CODEBASE.md
- Using the container is recommended for training
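The two custom ops ship inside the custom TF binary itself, so no extra loading step is needed here. Purely for illustration, a custom op distributed as a standalone shared library would normally be loaded with `tf.load_op_library`; the path and op name below are hypothetical.

```python
import tensorflow as tf

# Hypothetical: this repo's custom ops are compiled into the custom TF
# binary, but a custom op built as a standalone .so would be loaded so:
roi_module = tf.load_op_library("/path/to/custom_roi_ops.so")

# Ops registered in the library then appear as Python functions on the
# returned module, e.g. roi_module.some_custom_op(...), named after the
# registered op.
```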
The results below were obtained on P3dn.24xl instances using EKS.

12 epochs of training:
Num_GPUs x Images_Per_GPU | Training time | Box mAP | Mask mAP |
---|---|---|---|
8x4 | 5.09h | 37.47% | 34.45% |
16x4 | 3.11h | 37.41% | 34.47% |
32x4 | 1.94h | 37.20% | 34.25% |
24 epochs of training:
Num_GPUs x Images_Per_GPU | Training time | Box mAP | Mask mAP |
---|---|---|---|
8x4 | 9.78h | 38.25% | 35.08% |
16x4 | 5.60h | 38.44% | 35.18% |
32x4 | 3.33h | 38.33% | 35.12% |
Forked from the excellent Tensorpack repo at commit a9dce5b220dca34b15122a9329ba9ff055e8edc6.