3D multi-object tracking plays a critical role in autonomous driving by enabling the real-time monitoring and prediction of multiple objects’ movements. Traditional 3D tracking systems are typically constrained by predefined object categories, limiting their adaptability to novel, unseen objects in dynamic environments. To address this limitation, we introduce open-vocabulary 3D tracking, which extends the scope of 3D tracking to include objects beyond predefined categories. We formulate the problem of open-vocabulary 3D tracking and introduce dataset splits designed to represent various open-vocabulary scenarios. We propose a novel approach that integrates open-vocabulary capabilities into a 3D tracking framework, allowing for generalization to unseen object classes. Our method effectively reduces the performance gap between tracking known and novel objects through strategic adaptation. Experimental results demonstrate the robustness and adaptability of our method in diverse outdoor driving scenarios. To the best of our knowledge, this work is the first to address open-vocabulary 3D tracking, presenting a significant advancement for autonomous systems in real-world settings.
- Clone the repository and submodules:

  ```bash
  git clone --recurse-submodules https://github.com/ayesha-ishaq/Open3DTrack
  ```
- Create and activate the Conda environment:

  ```bash
  conda create -n open3dtrack python=3.10
  conda activate open3dtrack
  ```
- Install dependencies:

  ```bash
  pip install ultralytics
  conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
  pip install nuscenes-devkit matplotlib pandas motmetrics==1.2.0
  conda install pyg -c pyg
  pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
  ```
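  A quick sanity check of the environment (a minimal sketch, assuming the installs above completed successfully):

  ```python
  # Confirm that PyTorch (with CUDA) and PyTorch Geometric are importable
  import torch
  import torch_geometric

  print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
  print("torch_geometric:", torch_geometric.__version__)
  ```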
- Install third-party dependencies: refer to this repository for additional installations.
- Download the NuScenes dataset: NuScenes (keyframes only).
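  To verify the download, here is a minimal sketch using the nuscenes-devkit (assuming the standard `v1.0-trainval` layout under `<path_to_nuscenes>`):

  ```python
  # Load the nuScenes metadata to confirm the dataset is in place
  from nuscenes.nuscenes import NuScenes

  nusc = NuScenes(version="v1.0-trainval", dataroot="<path_to_nuscenes>", verbose=True)
  print(f"Loaded {len(nusc.scene)} scenes")
  ```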
- Generate YOLOWorld Detections: set `$dataset_dir` in `yoloworld.py` to `<path_to_nuscenes>` and `$output_dir` to `<path_to_yoloworld_detections>`, then run:

  ```bash
  python yoloworld.py
  ```
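  For reference, `yoloworld.py` builds on the Ultralytics YOLO-World API; the sketch below shows the underlying call. The checkpoint name, class prompts, and image path are illustrative assumptions, not the repository's actual settings:

  ```python
  from ultralytics import YOLOWorld

  # Hypothetical example: open-vocabulary 2D detection with custom class prompts
  model = YOLOWorld("yolov8x-worldv2.pt")            # assumed checkpoint name
  model.set_classes(["car", "stroller", "scooter"])  # assumed prompt list
  results = model.predict("<path_to_nuscenes>/samples/CAM_FRONT/example.jpg")
  for box in results[0].boxes:
      print(box.cls, box.conf, box.xyxy)
  ```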
- 3D Detector Detections:
  - CenterPoint Detection
  - Megvii Detection
  - BEVFusion Detection (to obtain BEVFusion results, you must install BEVFusion and run inference; alternatively, you can use BEVFusion from mmdetection3d)

  Rename the files to `train.json` and `val.json` and save them at `<path_to_3D_detections>`.
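  These files are expected to follow the nuScenes detection submission format (an assumption based on the standard devkit convention); a minimal sketch for inspecting one of them:

  ```python
  import json

  # Peek at the detection file: a dict with "meta" and "results" keyed by sample token
  with open("<path_to_3D_detections>/val.json") as f:
      detections = json.load(f)

  sample_token, boxes = next(iter(detections["results"].items()))
  print(sample_token, len(boxes))
  print(boxes[0])  # e.g. translation, size, rotation, detection_name, detection_score
  ```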
- Generate Split Data: to preprocess the 3D and 2D detections and obtain the data for the 3D tracker, run the command below. Split scenarios are as proposed in the paper: rare, urban, and diverse; set `<split_name>` to one of these.

  ```bash
  python generate_data_yoloworld.py \
      --dataset_dir <path_to_nuscenes> \
      --detection_dir <path_to_3D_detections> \
      --output_dir <path_to_output> \
      --yoloworld_dir <path_to_yoloworld_detections> \
      --data_split_scenario <split_name> \
      --apply_nms
  ```
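  To prepare all three scenarios in one pass, a small convenience sketch (assuming the script can simply be invoked once per split; the placeholder paths are the same as above and you may want a separate output directory per split):

  ```python
  import subprocess

  # Run the preprocessing script once per split scenario
  for split in ["rare", "urban", "diverse"]:
      subprocess.run(
          [
              "python", "generate_data_yoloworld.py",
              "--dataset_dir", "<path_to_nuscenes>",
              "--detection_dir", "<path_to_3D_detections>",
              "--output_dir", "<path_to_output>",
              "--yoloworld_dir", "<path_to_yoloworld_detections>",
              "--data_split_scenario", split,
              "--apply_nms",
          ],
          check=True,
      )
  ```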
- Update Config File: set the paths for the dataset and processed data in `config/default.json`, and set the field `split` to the desired scenario (rare, urban, or diverse).
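  For example, a minimal sketch that switches the split programmatically (only the `split` field is documented here; all other fields in `config/default.json` are left untouched):

  ```python
  import json

  # Point the tracker at the desired split scenario
  with open("config/default.json") as f:
      config = json.load(f)

  config["split"] = "rare"  # one of: "rare", "urban", "diverse"

  with open("config/default.json", "w") as f:
      json.dump(config, f, indent=2)
  ```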
- Training:

  ```bash
  python train.py -c config/default.json
  ```
- Evaluation:

  ```bash
  python train.py -c config/default.json -r <path_to_checkpoint> --eval_only -o <path_to_result_folder>
  ```
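  If the result folder contains tracking results in the nuScenes submission format (an assumption; the repository's own evaluation already reports AMOTA/AMOTP), the official devkit can score them independently. A hedged sketch with a hypothetical results file name:

  ```python
  from nuscenes.eval.common.config import config_factory
  from nuscenes.eval.tracking.evaluate import TrackingEval

  # Score a tracking submission with the official nuScenes tracking metrics (AMOTA/AMOTP)
  eval_cfg = config_factory("tracking_nips_2019")
  tracking_eval = TrackingEval(
      config=eval_cfg,
      result_path="<path_to_result_folder>/results.json",  # hypothetical file name
      eval_set="val",
      output_dir="<path_to_result_folder>/metrics",
      nusc_version="v1.0-trainval",
      nusc_dataroot="<path_to_nuscenes>",
  )
  tracking_eval.main(render_curves=False)
  ```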
| Split | AMOTA | AMOTP | Bicycle | Bus | Car | Motorcycle | Pedestrian | Trailer | Truck | Checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|
| Rare | 0.578 | 0.783 | 0.445 | 0.612 | 0.779 | 0.469 | 0.752 | 0.477 | 0.511 | weights |
| Urban | 0.590 | 0.677 | 0.400 | 0.683 | 0.788 | 0.702 | 0.548 | 0.488 | 0.522 | weights |
| Diverse | 0.536 | 0.804 | 0.524 | 0.770 | 0.708 | 0.438 | 0.564 | 0.470 | 0.276 | weights |
| UpperBound (3DMOTFormer) | 0.710 | 0.521 | 0.545 | 0.853 | 0.838 | 0.723 | 0.812 | 0.509 | 0.690 | |
The AMOTA and AMOTP columns report overall results; for each class, only AMOTA is shown here. Bold values indicate the novel classes of that split.
We would like to thank the following contributors and organizations for their support and resources:
- 3DMOTFormer
- NuScenes
- Ultralytics
- All collaborators and contributors who have helped shape this project.