Merge branch 'master' of https://github.com/borglab/gtsfm into avg-coord-nms
senselessdev1 committed Sep 28, 2023
2 parents cd41a97 + af7ce85 commit 7896bf1
Showing 3 changed files with 29 additions and 16 deletions.
24 changes: 13 additions & 11 deletions CLUSTER.md
@@ -5,27 +5,29 @@
  GTSfM uses the [SSHCluster](https://docs.dask.org/en/stable/deploying-ssh.html#dask.distributed.SSHCluster) module of [Dask](https://distributed.dask.org/en/stable/) to provide cluster-utilization functionality for SfM execution. This readme is a step-by-step guide on how to set up your machines for a successful run on a cluster.

  1. Choose which machine will serve as the scheduler. The data only needs to be on the scheduler node.
- 2. Enable passwordless SSH between all the workers on the cluster
- - Log in into a machine
- - For each of the other workers on the cluster run
+ 2. Create a config file listing the IP addresses of cluster machines (example in [gtsfm/configs/cluster.yaml](https://github.com/borglab/gtsfm/blob/master/gtsfm/configs/cluster.yaml)).
+ - Note that the first worker in the cluster.yaml file must be the scheduler machine where the data is hosted.
+ 3. Enable passwordless SSH between all the workers (machines) on the cluster.
+ - Log in individually to each machine listed in the cluster config file.
+ - For each of the other machines on the cluster, run:
  * ```bash
- ssh-copy-id username@machine_ip_address_of_another_worker
+ ssh-copy-id {username}@{machine_ip_address_of_another_worker}
  ```
- * Repeat the above two steps on all machines
+ * If you see `/usr/bin/ssh-copy-id: ERROR: No identities found`, then run `ssh-keygen -t rsa` first.
+ * Repeat the two steps above on all machines.
  - Note machines should be able to ssh into themselves passwordless e.g. host1 should be able to ssh into host1.
- 3. Clone gtsfm and follow the main readme file to setup the environment on all nodes in the cluster at an identical path
+ - If the cluster has 5 machines, then `ssh-copy-id` must be run 5*5=25 times.
+ 4. Clone gtsfm and follow the main readme file to setup the environment on all nodes in the cluster at an identical path
  - ```bash
- git clone https://github.com/borglab/gtsfm.git
+ git clone --recursive https://github.com/borglab/gtsfm.git
  conda env create -f environment_linux.yml
  conda activate gtsfm-v1
  ```
- 4. Log into scheduler again and download the data to scheduler machine
- 5. Create a config file listing the cluster workers (example in [gtsfm/configs/cluster.yaml](https://github.com/borglab/gtsfm/blob/master/gtsfm/configs/cluster.yaml))
- 6. Run gtsfm with –cluster_config flag enabled, for example
+ 5. Log into scheduler again and download the data to scheduler machine.
+ 6. Run gtsfm with `-–cluster_config` flag enabled, for example
  - ```
  python /home/username/gtsfm/gtsfm/runner run_scene_optimizer_colmaploader.py --images_dir /home/username/gtsfm/skydio-32/images/ --config_name sift_front_end.yaml --colmap_files_dirpath /home/hstepanyan3/gtsfm/skydio-32/colmap_crane_mast_32imgs/ --cluster_config cluster.yaml
  ```
- - Note that the first worker in the cluster.yaml file must be the scheduler machine where the data is hosted.
  - Always provide absolute paths for all directories
  7. If you would like to check out the dask dashboard, you will need to do port forwarding from machine to your local computer:
  - ```
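Note on the passwordless-SSH step in CLUSTER.md: the "5*5=25" count follows from every machine needing a key on every machine, itself included. The sketch below only illustrates that bookkeeping. The helper `ssh_copy_id_commands` and the IP addresses are hypothetical and not part of GTSfM; the script prints nothing to a shell and runs no SSH command, it just builds the command strings.

```python
from __future__ import annotations

from itertools import product


def ssh_copy_id_commands(username: str, hosts: list[str]) -> dict[str, list[str]]:
    """Map each source host to the ssh-copy-id commands to run while logged in there."""
    commands: dict[str, list[str]] = {host: [] for host in hosts}
    # Every ordered pair, including (host, host): each machine must also reach itself.
    for source, target in product(hosts, repeat=2):
        commands[source].append(f"ssh-copy-id {username}@{target}")
    return commands


# Hypothetical addresses, for illustration only.
hosts = [f"192.168.0.{i}" for i in range(1, 6)]
plan = ssh_copy_id_commands("username", hosts)
total = sum(len(cmds) for cmds in plan.values())
print(total)  # 5 hosts x 5 targets = 25
```

Printing `plan[host]` for a given machine gives the list to run after logging in to that machine.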
15 changes: 10 additions & 5 deletions gtsfm/runner/gtsfm_runner_base.py
@@ -59,13 +59,13 @@ def construct_argparser(self) -> argparse.ArgumentParser:
  "--num_workers",
  type=int,
  default=1,
- help="Number of workers to start (processes, by default)",
+ help="Number of workers to start (processes, by default).",
  )
  parser.add_argument(
  "--threads_per_worker",
  type=int,
  default=1,
- help="Number of threads per each worker",
+ help="Number of threads per each worker.",
  )
  parser.add_argument(
  "--worker_memory_limit", type=str, default="8GB", help="Memory limit per worker, e.g. `8GB`"
@@ -106,7 +106,7 @@ def construct_argparser(self) -> argparse.ArgumentParser:
  "--max_frame_lookahead",
  type=int,
  default=None,
- help="maximum number of consecutive frames to consider for matching/co-visibility",
+ help="Maximum number of consecutive frames to consider for matching/co-visibility.",
  )
  parser.add_argument(
  "--num_matched",
@@ -115,7 +115,7 @@ def construct_argparser(self) -> argparse.ArgumentParser:
  help="Number of K potential matches to provide per query. These are the top `K` matches per query.",
  )
  parser.add_argument(
- "--share_intrinsics", action="store_true", help="Shares the intrinsics between all the cameras"
+ "--share_intrinsics", action="store_true", help="Shares the intrinsics between all the cameras."
  )
  parser.add_argument("--mvs_off", action="store_true", help="Turn off dense MVS reconstruction")
  parser.add_argument(
@@ -148,7 +148,7 @@ def construct_argparser(self) -> argparse.ArgumentParser:
  "--num_retry_cluster_connection",
  type=int,
  default=3,
- help="number of times to retry cluster connection if it fails",
+ help="Number of times to retry cluster connection if it fails.",
  )
  return parser
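
Note on the argparse hunks above: the changes only touch help strings, so parsing behaviour is unchanged. As a rough illustration of how these flags behave, here is a standalone sketch that rebuilds only the arguments whose full definitions are visible in this diff. The function name `build_parser` is hypothetical, and the real `construct_argparser` defines more arguments than are shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Standalone parser with only the flags fully visible in the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--num_workers", type=int, default=1, help="Number of workers to start (processes, by default)."
    )
    parser.add_argument("--threads_per_worker", type=int, default=1, help="Number of threads per each worker.")
    parser.add_argument("--worker_memory_limit", type=str, default="8GB", help="Memory limit per worker, e.g. `8GB`")
    parser.add_argument(
        "--max_frame_lookahead",
        type=int,
        default=None,
        help="Maximum number of consecutive frames to consider for matching/co-visibility.",
    )
    parser.add_argument(
        "--share_intrinsics", action="store_true", help="Shares the intrinsics between all the cameras."
    )
    parser.add_argument("--mvs_off", action="store_true", help="Turn off dense MVS reconstruction")
    parser.add_argument(
        "--num_retry_cluster_connection",
        type=int,
        default=3,
        help="Number of times to retry cluster connection if it fails.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--num_workers", "4", "--share_intrinsics"])
print(args.num_workers, args.threads_per_worker, args.share_intrinsics, args.num_retry_cluster_connection)
```

Flags that are not passed keep their defaults, so `--num_retry_cluster_connection` stays at 3 unless overridden.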

@@ -253,6 +253,11 @@ def setup_ssh_cluster_with_retries(self) -> SSHCluster:
  except Exception as e:
  logger.info(f"Worker failed to start: {str(e)}")
  retry_count += 1
+ if not connected:
+ raise ValueError(
+ f"Connection to cluster could not be established after {self.parsed_args.num_retry_cluster_connection}"
+ " attempts. Aborting..."
+ )
  return cluster

  def run(self) -> GtsfmData:
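
Note on the `setup_ssh_cluster_with_retries` hunk: the added lines make the method fail loudly once the retry budget is exhausted, instead of falling through to `return cluster`. Since only the tail of the method is visible here, the following is a minimal sketch of the same retry-then-raise pattern, not the GTSfM implementation. `connect_with_retries`, `flaky_connect`, and the `connect` callable are hypothetical stand-ins for the SSHCluster construction, and the sketch returns early on success rather than tracking a `connected` flag.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def connect_with_retries(connect: Callable[[], T], max_attempts: int) -> T:
    """Call `connect` until it succeeds, raising once `max_attempts` failures accumulate."""
    attempts = 0
    while attempts < max_attempts:
        try:
            return connect()
        except Exception as e:
            logger.info(f"Worker failed to start: {e}")
            attempts += 1
    # Reached only if every attempt failed.
    raise ValueError(f"Connection to cluster could not be established after {max_attempts} attempts. Aborting...")


# Hypothetical connection that fails twice, then succeeds.
calls = {"count": 0}


def flaky_connect() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("not ready")
    return "cluster"


result = connect_with_retries(flaky_connect, max_attempts=3)
print(result, calls["count"])
```

With `max_attempts=2` the same `flaky_connect` would exhaust the budget and the `ValueError` would propagate to the caller.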
6 changes: 6 additions & 0 deletions scripts/benchmark_wildcat.sh
@@ -55,6 +55,12 @@ for num_matched in ${num_matched_sizes[@]}; do
  continue
  fi

+ if [[ $num_matched == 0 && $max_frame_lookahead == 0 ]]
+ then
+ # Matches must come from at least some retriever.
+ continue
+ fi
+
  for correspondence_generator_config_name in ${correspondence_generator_config_names[@]}; do

  if [[ $correspondence_generator_config_name == *"sift"* ]]
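
Note on the `benchmark_wildcat.sh` hunk: the new guard skips the sweep point where both `num_matched` and `max_frame_lookahead` are 0, since no retriever would then propose image pairs. The sketch below expresses only that one filter in Python. The sweep values and the function name `valid_retriever_settings` are hypothetical, and the earlier `continue` condition in the script is not visible in this diff, so it is not reproduced.

```python
from itertools import product


def valid_retriever_settings(num_matched_sizes, max_frame_lookahead_sizes):
    """Return (num_matched, max_frame_lookahead) pairs where at least one retriever is active."""
    return [
        (num_matched, lookahead)
        for num_matched, lookahead in product(num_matched_sizes, max_frame_lookahead_sizes)
        # Matches must come from at least some retriever.
        if not (num_matched == 0 and lookahead == 0)
    ]


# Hypothetical sweep values, for illustration only.
settings = valid_retriever_settings([0, 5, 10], [0, 5])
print(len(settings))  # 3 * 2 = 6 combinations, minus the (0, 0) case
```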
