Tools to run ops on drive datasets.
To use these tools proficiently you should be familiar with our drive data schema and nomenclature.
The tools are packaged as an installable CLI, plus standalone Python scripts for 3rd-party data.
The tools in this repo are your one-stop shop for querying our Postgres drive database: aggregating statistics, building training sets, and setting up annotation projects.
For 3rd-party datasets you can pull dataset chunks and store them for development.
To fetch this repo and its submodules, clone with:
git clone ssh://[email protected]:443/am/roam/perception/data-catalogue.git --recurse-submodules
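If you already cloned without --recurse-submodules, you can fetch the submodules afterwards with the standard git command:
git submodule update --init --recursive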
The IO ops are performed on S3 and a Postgres database, hence you'd need your AWS credentials under ~/.aws/credentials. To set up your perception credentials for the first time, follow these steps. You can configure each profile, e.g.:
# default profile
aws configure --profile perception
Update your credentials list by adding the following profiles. Please name the profiles as 👇 for sane CLI usage. Use the AWS keys provided here.
# end2end integration testing
das-testing -> aws configure --profile perception-cli-testing
# development: break everything
das-data-catalogue-dev -> aws configure --profile perception-cli-develop
# staging with clean, valid data and models
das-data-catalogue -> aws configure --profile perception-cli-staging
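After configuring, ~/.aws/credentials should contain one section per profile, roughly like this (a sketch following the standard AWS credentials file format; placeholder values shown):
[perception-cli-testing]
aws_access_key_id = <key-id>
aws_secret_access_key = <secret-key>
[perception-cli-develop]
aws_access_key_id = <key-id>
aws_secret_access_key = <secret-key>
[perception-cli-staging]
aws_access_key_id = <key-id>
aws_secret_access_key = <secret-key>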
We provide a convenience Makefile to build and run self-contained, reproducible Docker images:
make
make run datadir=/nas
The datadir=/nas option mounts the host's /nas directory to /data inside the container.
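For reference, make run is roughly equivalent to a bind-mounted docker run; the image name below is hypothetical, since the real tag is defined in the Makefile:
docker run -it -v /nas:/data data-catalogue:latest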
Set up your environment before you push/transform data to our buckets. Choose from default, develop, testing, staging.
# data-catalogue env set <environment name>
# To set up the develop environment
data-catalogue env set develop
# Check it is set up correctly
data-catalogue env get
[configure.py][get_environ:69][INFO] Environment name : develop
[configure.py][get_environ:70][INFO] Profile name : perception-cli-develop
See the beautiful help pages we automagically create:
./bin/data-catalogue --help
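Per-subcommand help is available the same way; for example, for the env subcommand used above (assuming each subcommand exposes --help, as is conventional):
./bin/data-catalogue env --help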
- Waylens instructions
- OnePlus instructions
- HW Kit instructions
The following scripts help you fetch/pull 3rd-party datasets from the web or S3.
Berkeley DeepDrive Dataset
The following command downloads the dataset chunks as zips to the destination location, pulling the zips from the BDD100K website:
# http://dl.yf.io/bdd-data/bdd100k/video_parts/bdd100k_videos_train_00.zip
# http://dl.yf.io/bdd-data/bdd100k/video_parts/bdd100k_videos_train_01.zip
# http://dl.yf.io/bdd-data/bdd100k/video_parts/bdd100k_videos_train_02.zip
python scripts/fetch-bdd100k-data.py -c train -r 0 3 -d /data/datasets/bdd/zips/
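Here -c selects the split and -r the range of zip part indices; judging by the three URLs above, -r 0 3 fetches parts 00 through 02. For example, to fetch the next three training parts (semantics inferred from the example above, so double-check against the script's --help):
python scripts/fetch-bdd100k-data.py -c train -r 3 6 -d /data/datasets/bdd/zips/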
To fetch myTaxi GPS traces you'd first need to set up your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The access keys for myTaxi data give you access to the buckets with GPS traces. Ask your realtime team member to provide you with one :).
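For example, export the keys in your shell before running the fetch script (placeholder values shown):
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>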
python scripts/fetch-mytaxi-light.py --help
Fetch myTaxi dataset
optional arguments:
-h, --help show this help message and exit
-bucket_name BUCKET_NAME, -b BUCKET_NAME
S3 Bucket with objects to download
-object_name OBJECT_NAME, -o OBJECT_NAME
S3 Object in the bucket
-destination DESTINATION, -d DESTINATION
Location where to download the file
python scripts/fetch-mytaxi-light.py -b s3://com.mytaxi.sophie.das.exchange/booking_location_das_2 -o 00000_0 -d /data/datasets/mytaxi/
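To see which objects are available before downloading, you can list the bucket with the standard AWS CLI (using the myTaxi keys exported above):
aws s3 ls s3://com.mytaxi.sophie.das.exchange/booking_location_das_2/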
OpenImagesV5 Dataset
You can download parts of the dataset by desired label or category.
Usage example: python3 downloadOI.py --classes 'vehicle_registration_plate, person' --mode train
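Another example, pulling the same classes from the validation split (this assumes downloadOI.py accepts the standard Open Images split names train/validation/test for --mode):
python3 downloadOI.py --classes 'vehicle_registration_plate, person' --mode validation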
- Berkeley DeepDrive dataset
- myTaxi GPS traces [TODO]: Add link to wiki
- Alpha Fleet Drive data [TODO]: Add link to wiki
- OpenImagesV5
- Add S3/url location for data sources [04/22/2019]
- Mock drive data from BDD100K sample videos [04/30/2019]
- Mock map updates from BDD100K sample videos [05/02/2019]
- Oneplus Transform [07/16/2019]
- CLI - WIP : 29/05/2019 : Harsimrat