TensorFlow implementation of unsupervised single-image depth prediction using a convolutional neural network.
Unsupervised Monocular Depth Estimation with Left-Right Consistency
Clément Godard, Oisin Mac Aodha and Gabriel J. Brostow
CVPR 2017
For more details: project page, arXiv
Check out our follow-up work, Digging Into Self-Supervised Monocular Depth Estimation:
Clément Godard, Oisin Mac Aodha, Michael Firman and Gabriel J. Brostow
Improved accuracy, monocular training, and shorter training times!
This code was tested with TensorFlow 1.0, CUDA 8.0 and Ubuntu 16.04.
Training takes about 30 hours with the default parameters on the kitti split on a single Titan X machine.
You can train on multiple GPUs by setting their number with the --num_gpus flag; make sure your batch_size is divisible by num_gpus.
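The divisibility requirement exists because a multi-GPU setup presumably splits each global batch into equal per-GPU shards. The sketch below only mimics that splitting in plain Python; `split_batch_per_gpu` is a hypothetical helper, not part of monodepth.

```python
def split_batch_per_gpu(batch, num_gpus):
    """Split a list-like batch into num_gpus equal shards (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if len(batch) % num_gpus != 0:
        # Uneven shards would give the GPUs different batch sizes, so refuse.
        raise ValueError(
            "batch_size (%d) must be divisible by num_gpus (%d)" % (len(batch), num_gpus))
    shard = len(batch) // num_gpus
    return [batch[i * shard:(i + 1) * shard] for i in range(num_gpus)]


if __name__ == "__main__":
    print(split_batch_per_gpu(list(range(8)), 2))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With the default batch size of 8, using 2 or 4 GPUs works, while 3 GPUs would be rejected.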
There is a simple mode, monodepth_simple.py, which allows you to quickly run our model on a test image.
Make sure you first download one of the pretrained models; in this example we will use model_cityscapes:
python monodepth_simple.py --image_path ~/my_image.jpg --checkpoint_path ~/models/model_cityscapes
Please note that there is NO extension after the checkpoint name.
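The checkpoint name has no extension because TensorFlow 1.x checkpoints are addressed by a prefix: a `tf.train.Saver` typically writes several files sharing that prefix, such as `<prefix>.index`, `<prefix>.meta` and one or more `<prefix>.data-XXXXX-of-YYYYY` shards. The hypothetical helpers below list the files sharing a prefix, as a quick sanity check of the path passed to --checkpoint_path. The exact file set inside the downloaded models is an assumption.

```python
import glob
import os


def checkpoint_files(prefix):
    """Return the files that share a TensorFlow checkpoint prefix (hypothetical helper)."""
    prefix = os.path.expanduser(prefix)
    # glob.escape guards against wildcard characters in the prefix itself.
    return sorted(glob.glob(glob.escape(prefix) + ".*"))


def looks_like_checkpoint(prefix):
    """Assume a usable TF 1.x checkpoint has an .index file and at least one .data shard."""
    files = checkpoint_files(prefix)
    has_index = any(f.endswith(".index") for f in files)
    has_data = any(".data-" in os.path.basename(f) for f in files)
    return has_index and has_data
```

For example, `looks_like_checkpoint("~/models/model_cityscapes")` should be true, while appending `.index` to the path makes the check fail, mirroring why no extension is expected.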
This model requires rectified stereo pairs for training.
There are two main datasets available, KITTI and Cityscapes.
KITTI: We used two different splits of the data, kitti and eigen, amounting to 29000 and 22600 training samples respectively; you can find them in the filenames folder.
You can download the entire raw dataset by running:
wget -i utils/kitti_archives_to_download.txt -P ~/my/output/folder/
Warning: it weighs about 175GB, so make sure you have enough space to unzip it too!
To save space you can convert the png images to jpeg (the command below uses GNU parallel and ImageMagick's convert):
find ~/my/output/folder/ -name '*.png' | parallel 'convert {.}.png {.}.jpg && rm {}'
Cityscapes: You will need to register in order to download the data, which already has a train/val/test split with 22973 training images.
We used leftImg8bit_trainvaltest.zip, rightImg8bit_trainvaltest.zip, leftImg8bit_trainextra.zip and rightImg8bit_trainextra.zip, which weigh 110GB in total.
Warning: The input sizes need to be multiples of 128 for vgg or 64 for resnet50.
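A quick way to validate or adjust an input size before training is sketched below. The divisors come from the warning above; the presumed reason (the encoder halving the resolution 7 times for vgg and 6 times for resnet50, since 2^7 = 128 and 2^6 = 64) is an inference, and both helpers are hypothetical.

```python
# Required divisors as stated in this README; the helpers themselves are illustrative.
REQUIRED_MULTIPLE = {"vgg": 128, "resnet50": 64}


def check_input_size(height, width, encoder="vgg"):
    """Return True if both dimensions are multiples of the encoder's required divisor."""
    m = REQUIRED_MULTIPLE[encoder]
    return height % m == 0 and width % m == 0


def round_down_size(height, width, encoder="vgg"):
    """Largest valid (height, width) that does not exceed the given size."""
    m = REQUIRED_MULTIPLE[encoder]
    return (height // m) * m, (width // m) * m


if __name__ == "__main__":
    print(check_input_size(256, 512, "vgg"))       # True: the default 512x256 input
    print(round_down_size(375, 1242, "resnet50"))  # (320, 1216)
```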
The model's dataloader expects a data folder path as well as a list of filenames (relative to the root data folder):
python monodepth_main.py --mode train --model_name my_model --data_path ~/data/KITTI/ \
--filenames_file ~/code/monodepth/utils/filenames/kitti_train_files.txt --log_directory ~/tmp/
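The filenames file lists image paths relative to --data_path. Judging from the testing notes further down, a line can hold a left and a right image path. The hypothetical reader below resolves such lines against the data folder; the space separator and the two-paths-per-line layout are assumptions to check against the files in utils/filenames.

```python
import os


def read_filenames(filenames_file, data_path):
    """Parse a filenames file into (left, right) absolute path pairs (hypothetical helper).

    Assumes one stereo pair per line: "<left_rel_path> <right_rel_path>".
    Lines with a single path yield (left, None).
    """
    data_path = os.path.expanduser(data_path)
    pairs = []
    with open(os.path.expanduser(filenames_file)) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            left = os.path.join(data_path, parts[0])
            right = os.path.join(data_path, parts[1]) if len(parts) > 1 else None
            pairs.append((left, right))
    return pairs
```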
You can continue training by loading the last saved checkpoint using --checkpoint_path and pointing it to that checkpoint:
python monodepth_main.py --mode train --model_name my_model --data_path ~/data/KITTI/ \
--filenames_file ~/code/monodepth/utils/filenames/kitti_train_files.txt --log_directory ~/tmp/ \
--checkpoint_path ~/tmp/my_model/model-50000
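Checkpoints are named `model-<global step>`, as in the command above. TensorFlow provides `tf.train.latest_checkpoint(checkpoint_dir)`, which reads the `checkpoint` state file in a directory. The dependency-free sketch below instead scans for `model-<step>.index` files and returns the prefix with the highest step. The `model-` naming and the `.index` suffix are assumptions based on the example paths in this README, and `latest_step_checkpoint` is a hypothetical helper.

```python
import os
import re

# Matches e.g. "model-50000.index"; the naming scheme is assumed from this README.
_CKPT_RE = re.compile(r"^model-(\d+)\.index$")


def latest_step_checkpoint(model_dir):
    """Return the checkpoint prefix with the highest step in model_dir, or None."""
    model_dir = os.path.expanduser(model_dir)
    best_step, best_prefix = -1, None
    for name in os.listdir(model_dir):
        match = _CKPT_RE.match(name)
        # Compare steps numerically: as strings, "9000" would sort after "50000".
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Strip ".index": --checkpoint_path expects the bare prefix.
            best_prefix = os.path.join(model_dir, name[:-len(".index")])
    return best_prefix
```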
You can also fine-tune from a checkpoint using --retrain.
You can monitor the learning process using tensorboard by pointing it to your chosen log_directory.
By default the model only saves a reduced summary to save disk space; you can disable this using --full_summary.
Please look at the main file for all the available options.
To test, change the --mode flag to test; the network will output the disparities in the model folder, or in any other folder you specify with --output_directory.
You will also need to load the checkpoint you want to test, which can be done with --checkpoint_path:
python monodepth_main.py --mode test --data_path ~/data/KITTI/ \
--filenames_file ~/code/monodepth/utils/filenames/kitti_stereo_2015_test_files.txt --log_directory ~/tmp/ \
--checkpoint_path ~/tmp/my_model/model-181250
Please note that there is NO extension after the checkpoint name.
If your test filenames contain two files per line, the model will ignore the second one unless you use the --do_stereo flag.
The network will output two files, disparities.npy and disparities_pp.npy, containing the raw and post-processed disparities respectively.
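The paper's post-processing combines the disparity predicted for the input image with the one predicted for its horizontally flipped copy. The NumPy sketch below is one way to do this, assuming `disp` has shape (2, H, W) holding both predictions: it flips the second map back, averages the two in the middle of the image, and near each vertical border relies on a single map through a linear ramp. The 5% border width and the ramp slope are illustrative assumptions, not values confirmed by this README.

```python
import numpy as np


def post_process_disparity(disp, border=0.05, slope=20.0):
    """Blend the disparities of an image and of its horizontally flipped copy.

    disp: array of shape (2, H, W); disp[0] is the prediction for the image,
    disp[1] the prediction for the flipped image (still in flipped coordinates).
    """
    _, h, w = disp.shape
    l_disp = disp[0]
    r_disp = np.fliplr(disp[1])       # flip the second prediction back
    m_disp = 0.5 * (l_disp + r_disp)  # plain average used in the middle
    # x runs from 0 (left edge) to 1 (right edge) along each row.
    x, _ = np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h))
    # l_mask is 1 near the left border and ramps down to 0 further in.
    l_mask = 1.0 - np.clip(slope * (x - border), 0, 1)
    r_mask = np.fliplr(l_mask)
    # Left border: flipped-back prediction; right border: original prediction.
    return r_mask * l_disp + l_mask * r_disp + (1.0 - l_mask - r_mask) * m_disp
```

Because the three weights sum to one at every pixel, a constant input disparity is returned unchanged.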
To evaluate, run:
python utils/evaluate_kitti.py --split kitti --predicted_disp_path ~/tmp/my_model/disparities.npy \
--gt_path ~/data/KITTI/
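For orientation, monocular depth evaluation on KITTI commonly reports the error and accuracy measures below. This NumPy sketch computes them from matching arrays of ground-truth and predicted depth (in metres) at valid pixels. It illustrates the standard metrics and is not a copy of utils/evaluate_kitti.py, which may differ in details such as masking, depth capping and the disparity-to-depth conversion.

```python
import numpy as np


def compute_depth_errors(gt, pred):
    """Standard depth metrics over valid pixels; gt and pred must be positive."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    # Accuracy under threshold: share of pixels with max(gt/pred, pred/gt) < 1.25**k.
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = np.mean(thresh < 1.25)
    a2 = np.mean(thresh < 1.25 ** 2)
    a3 = np.mean(thresh < 1.25 ** 3)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    return dict(abs_rel=abs_rel, sq_rel=sq_rel, rmse=rmse,
                rmse_log=rmse_log, a1=a1, a2=a2, a3=a3)
```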
The --split flag allows you to choose which dataset you want to test on:
kitti corresponds to the 200 official training set pairs from KITTI stereo 2015.
eigen corresponds to the 697 test images used by Eigen NIPS14 and uses the raw LIDAR points.
Warning: The results on the Eigen split are usually cropped, which you can do by passing the --garg_crop flag.
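The Garg crop restricts evaluation to a central region of each ground-truth depth map. The sketch below builds such a mask from fractional crop bounds. The fractions are the ones commonly used for the Garg crop in follow-up code; treat them as an assumption and compare them with utils/evaluate_kitti.py. `garg_crop_mask` is a hypothetical helper.

```python
import numpy as np

# Crop fractions (top, bottom, left, right) commonly used for the Garg crop;
# assumed here, verify against utils/evaluate_kitti.py.
GARG_CROP = (0.40810811, 0.99189189, 0.03594771, 0.96405229)


def garg_crop_mask(height, width):
    """Boolean mask that is True inside the Garg crop region."""
    top, bottom, left, right = GARG_CROP
    crop = np.array([top * height, bottom * height,
                     left * width, right * width]).astype(np.int32)
    mask = np.zeros((height, width), dtype=bool)
    mask[crop[0]:crop[1], crop[2]:crop[3]] = True
    return mask
```

The mask would typically be combined with the valid LIDAR pixels, e.g. `np.logical_and(gt_depth > 0, garg_crop_mask(*gt_depth.shape))`.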
You can download our pre-trained models to an existing directory by running:
sh ./utils/get_model.sh model_name output_directory
All our models were trained for 50 epochs at 512x256 resolution with a batch size of 8; please see our paper for more details.
We converted KITTI and Cityscapes to jpeg before training.
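The number of training steps follows from epochs × samples / batch size. With the kitti split's 29000 samples, 50 epochs and a batch size of 8, this gives 181250, which matches the model-181250 checkpoint in the testing command. The sketch assumes one optimisation step per batch, with incomplete batches dropped.

```python
def total_steps(num_samples, num_epochs, batch_size):
    """Training steps, assuming one step per batch and incomplete batches dropped."""
    steps_per_epoch = num_samples // batch_size
    return steps_per_epoch * num_epochs


if __name__ == "__main__":
    print(total_steps(29000, 50, 8))  # kitti split: 181250
    print(total_steps(22600, 50, 8))  # eigen split: 141250
```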
Here are all the models available:
model_kitti: Our main model trained on the kitti split
model_eigen: Our main model trained on the eigen split
model_cityscapes: Our main model trained on cityscapes
model_city2kitti: model_cityscapes fine-tuned on kitti
model_city2eigen: model_cityscapes fine-tuned on eigen
model_kitti_stereo: Our stereo model trained on the kitti split for 12 epochs; make sure to use --do_stereo when using it
All our models, except for the stereo one, have a ResNet50 variant, which you can get by adding _resnet to the model name.
To test or train using these variants, you need to use the flag --encoder resnet50.
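The naming rules above can be encoded in a small lookup that returns the extra monodepth_main.py flags a pretrained model needs. `encoder_flags` is a hypothetical helper; it covers only the two rules stated in this README (the `_resnet` suffix and the stereo model).

```python
def encoder_flags(model_name):
    """Extra monodepth_main.py flags for a pretrained model name (hypothetical helper).

    A "_resnet" suffix needs --encoder resnet50, and the stereo model needs --do_stereo.
    """
    flags = []
    if model_name.endswith("_resnet"):
        flags += ["--encoder", "resnet50"]
    if model_name.endswith("_stereo"):
        flags.append("--do_stereo")
    return flags
```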
You can download our results (unscaled disparities at 512x256) on both KITTI splits (kitti and eigen) here.
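Turning these disparities into metric depth requires the target image width, the focal length in pixels and the stereo baseline. We assume here that "unscaled" means the disparities are expressed as a fraction of the image width. The sketch below also assumes a KITTI baseline of about 0.54 m, and the example uses a focal length of 721.5377 px for 1242-pixel-wide images; check both values against the KITTI calibration files. Spatial resizing to the original resolution (e.g. with an image library) is left out, and `disparity_to_depth` is a hypothetical helper.

```python
import numpy as np

KITTI_BASELINE_M = 0.54  # assumed KITTI stereo baseline in metres


def disparity_to_depth(disp_unscaled, image_width, focal_px,
                       baseline_m=KITTI_BASELINE_M, min_disp_px=1e-3):
    """Convert width-normalised disparity to depth: depth = baseline * focal / disparity."""
    disp_px = np.asarray(disp_unscaled, dtype=np.float64) * image_width
    # Clamp tiny disparities to avoid division by zero for very distant points.
    disp_px = np.maximum(disp_px, min_disp_px)
    return baseline_m * focal_px / disp_px
```

For instance, `disparity_to_depth(0.05, 1242, 721.5377)` gives a depth of roughly 6.3 m.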
The naming convention is the same as with the models.
If you find our work useful in your research please consider citing our paper:
@inproceedings{monodepth17,
title = {Unsupervised Monocular Depth Estimation with Left-Right Consistency},
author = {Cl{\'{e}}ment Godard and
Oisin {Mac Aodha} and
Gabriel J. Brostow},
booktitle = {CVPR},
year = {2017}
}
Copyright © Niantic, Inc. 2018. Patent Pending. All rights reserved.
This Software is licensed under the terms of the UCLB ACP-A Licence which allows for non-commercial use only, the full terms of which are made available in the LICENSE file. For any other use of the software not covered by the terms of this licence, please contact [email protected]