Refactoring/Adding Documentation #813

Open · wants to merge 2 commits into master

44 changes: 31 additions & 13 deletions README.md

@@ -4,9 +4,7 @@
|:------------:| :-------------:|
| Ubuntu 20.04.3 | ![Linux CI](https://github.com/borglab/gtsfm/actions/workflows/test-python.yml/badge.svg?branch=master) |

Georgia Tech Structure-from-Motion (GTSfM) is an end-to-end SfM pipeline based on [GTSAM](https://github.com/borglab/gtsam). GTSfM was designed from the ground up to natively support parallel computation using [Dask](https://dask.org/).

For more details, please refer to our [arXiv preprint](https://arxiv.org/abs/2311.18801).

<p align="left">
<img src="https://user-images.githubusercontent.com/16724970/121294002-a4d7a400-c8ba-11eb-895e-a50305c049b6.gif" height="315" title="Olsson Lund Dataset: Door, 12 images">

@@ -24,6 +22,8 @@ The majority of our code is governed by an MIT license and is suitable for comme

## Installation

<details><summary>Click to expand</summary>

GTSfM requires no compilation, as Python wheels are provided for GTSAM. This repository includes external repositories as Git submodules; don't forget to pull submodules with `git submodule update --init --recursive` or clone with `git clone --recursive https://github.com/borglab/gtsfm.git`.

To run GTSfM, first create a conda environment with the required dependencies.

@@ -50,7 +50,11 @@

Make sure that you can run `python -c "import gtsfm; import gtsam; print('hello world')"`, and you are good to go!

</details>

## Usage Guide (Running 3D Reconstruction)

<details><summary>Click to expand</summary>

Before running reconstruction, if you intend to use modules with pre-trained weights, such as SuperPoint, SuperGlue, or PatchmatchNet, please first run:

@@ -107,19 +111,19 @@

The results will be stored at `--output_root`, which is the `results` folder in the repo root by default. The poses and 3D tracks are stored in COLMAP format inside the `ba_output` subdirectory of `--output_root`. These can be visualized using the COLMAP GUI as well.
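
If you prefer to inspect these outputs programmatically rather than through the COLMAP GUI, one option is to read them with `pycolmap`. This is only a hedged illustration: `pycolmap` is not a GTSfM requirement, and the snippet assumes the default `--output_root` of `results/`.

```python
# Sketch: load GTSfM's COLMAP-format bundle adjustment output with pycolmap.
# Assumes `pip install pycolmap` and the default --output_root of `results/`.
from pathlib import Path

import pycolmap

ba_output_dir = Path("results") / "ba_output"
reconstruction = pycolmap.Reconstruction(str(ba_output_dir))

# Quick summary of the recovered camera poses and 3D tracks.
print(f"Registered images: {len(reconstruction.images)}")
print(f"3D points: {len(reconstruction.points3D)}")
```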

</details>

## Pipeline Overview

![GTSfM pipeline overview](assets/gtsfm-overview.svg?raw=true)

GTSfM attempts to make the global Structure-from-Motion process as modular as possible to allow for streamlined integration of new state-of-the-art tools. We provide details for each module of the GTSfM pipeline below.

- [Loader](assets/LOADER.md)
- [Image Pairs Generator](assets/IMAGE_PAIRS_GENERATOR.md)
- [Correspondence Generator](assets/CORRESPONDENCE_GENERATOR.md)
- [Two View Estimator](assets/TWO_VIEW_ESTIMATOR.md)
- [Multiview Optimizer](assets/MULTIVIEW_OPTIMIZER.md)

## Repository Structure

@@ -143,6 +147,20 @@ GTSfM is designed in an extremely modular way. Each module can be swapped out wi
- `utils`: utility functions such as serialization routines and pose comparisons, etc
- `tests`: unit tests on every function and module

## Nerfstudio

We provide a preprocessing script to convert the camera poses estimated by GTSfM to [nerfstudio](https://docs.nerf.studio/en/latest/) format:

```bash
python scripts/prepare_nerfstudio.py --results_path {RESULTS_DIR} --images_dir {IMAGES_DIR}
```

The results are stored in the `nerfstudio_input` subdirectory inside `{RESULTS_DIR}`, which can be used directly with nerfstudio if it is installed:

```bash
ns-train nerfacto --data {RESULTS_DIR}/nerfstudio_input
```

## Contributing

Contributions are always welcome! Please be aware of our [contribution guidelines for this project](CONTRIBUTING.md).

177 changes: 177 additions & 0 deletions assets/CORRESPONDENCE_GENERATOR.md

@@ -0,0 +1,177 @@
# Correspondence Generator

![GTSfM pipeline overview, highlighting the correspondence generator](gtsfm-overview-correspondence-generator.svg?raw=true)

- [Loader](LOADER.md)
- [Image Pairs Generator](IMAGE_PAIRS_GENERATOR.md)
- **Correspondence Generator**
- [Two View Estimator](TWO_VIEW_ESTIMATOR.md)
- [Multiview Optimizer](MULTIVIEW_OPTIMIZER.md)

## What is a Correspondence Generator?

The Correspondence Generator is responsible for taking in putative image pairs from the [`ImagePairsGenerator`](https://github.com/borglab/gtsfm/blob/master/gtsfm/retriever/image_pairs_generator.py) and returning keypoints for each image and correspondences between each specified image pair. Correspondence generation is implemented by the [`CorrespondenceGeneratorBase`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/correspondence_generator/correspondence_generator_base.py) class defined below.

```python
class CorrespondenceGeneratorBase:
    """Base class for correspondence generators."""

    @abstractmethod
    def generate_correspondences(
        self,
        client: Client,
        images: List[Future],
        image_pairs: List[Tuple[int, int]],
    ) -> Tuple[List[Keypoints], Dict[Tuple[int, int], np.ndarray]]:
        """Apply the correspondence generator to generate putative correspondences.

        Args:
            client: Dask client, used to execute the front-end as futures.
            images: List of all images, as futures.
            image_pairs: Indices of the pairs of images to estimate two-view pose and correspondences.

        Returns:
            List of keypoints, one entry for each input image.
            Putative correspondences as indices of keypoints, for each pair of images.
        """
```
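
To make the role of the Dask `client` concrete, here is a rough sketch, not code from the repository, of how an implementation might fan detection and matching out over the cluster. The `detect_keypoints` and `match_pair` helpers are hypothetical stand-ins for whatever detector and matcher you plug in.

```python
# Hypothetical sketch of a CorrespondenceGeneratorBase implementation, showing how the Dask
# client and image futures are typically used. `detect_keypoints` and `match_pair` are
# placeholders for a real detector/descriptor and matcher, not functions from GTSfM.
from typing import Dict, List, Tuple

import numpy as np
from dask.distributed import Client, Future

from gtsfm.frontend.correspondence_generator.correspondence_generator_base import CorrespondenceGeneratorBase


def detect_keypoints(image):
    """Placeholder: run some detector/descriptor on a single image."""
    raise NotImplementedError


def match_pair(keypoints_i1, keypoints_i2) -> np.ndarray:
    """Placeholder: return an (N, 2) array of matching keypoint indices."""
    raise NotImplementedError


class ToyCorrespondenceGenerator(CorrespondenceGeneratorBase):
    def generate_correspondences(
        self,
        client: Client,
        images: List[Future],
        image_pairs: List[Tuple[int, int]],
    ) -> Tuple[List, Dict[Tuple[int, int], np.ndarray]]:
        # Detect keypoints on every image in parallel; each entry is itself a future.
        keypoint_futures = [client.submit(detect_keypoints, image) for image in images]

        # Match each requested pair, reusing the per-image keypoint futures as inputs.
        match_futures = {
            (i1, i2): client.submit(match_pair, keypoint_futures[i1], keypoint_futures[i2])
            for i1, i2 in image_pairs
        }

        # Block until all results are available and return them in the documented format.
        keypoints = client.gather(keypoint_futures)
        corr_idxs = {pair: client.gather(future) for pair, future in match_futures.items()}
        return keypoints, corr_idxs
```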

## Types of Correspondence Generators

We provide support for two correspondence generation paradigms: [feature detection and description _then_ matching](#feature-detection-and-description-then-matching) and [_detector-free_ matching](#detector-free-matching). Both paradigms are detailed below.


### Feature Detection and Description _then_ Matching

This paradigm jointly computes feature detections and descriptors, typically using shared weights in a deep convolutional neural network, followed by feature matching. This is implemented by the [`DetDescCorrespondenceGenerator`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/correspondence_generator/det_desc_correspondence_generator.py) class, which wraps a feature detector and descriptor ([`DetectorDescriptorBase`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/detector_descriptor_base.py)) and a feature matcher ([`MatcherBase`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/matcher/matcher_base.py)).

The feature detector and descriptor module takes in a single image and outputs keypoints and feature descriptors. Joint detection and description is implemented by the [`DetectorDescriptorBase`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/detector_descriptor_base.py) class defined below. We also provide functionality for combining different keypoint detection and feature description modules to form a joint detector-descriptor module (see [`CombinationDetectorDescriptor`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/combination_detector_descriptor.py)). To create your own feature extractor, simply copy the contents of `detector_descriptor_base.py` to a new file corresponding to the new extractor's class name and implement the `detect_and_describe` method.

The feature matcher takes in the keypoints and descriptors for a pair of images and outputs the indices of matching keypoints. Feature matching is implemented by the [`MatcherBase`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/matcher/matcher_base.py) class defined below. To create your own feature matcher, simply copy the contents of `matcher_base.py` to a new file corresponding to the new matcher's class name and implement the `match` method.

```python
class DetectorDescriptorBase(GTSFMProcess):
    """Base class for all methods which provide a joint detector-descriptor to work on a single image."""

    def __init__(self, max_keypoints: int = 5000):
        """Initialize the detector-descriptor.

        Args:
            max_keypoints: Maximum number of keypoints to detect. Defaults to 5000.
        """
        self.max_keypoints = max_keypoints

    @abc.abstractmethod
    def detect_and_describe(self, image: Image) -> Tuple[Keypoints, np.ndarray]:
        """Perform feature detection as well as description.

        Refer to detect() in DetectorBase and describe() in DescriptorBase for
        details about the output format.

        Args:
            image: the input image.

        Returns:
            Detected keypoints, with length N <= max_keypoints.
            Corresponding descriptors, of shape (N, D) where D is the dimension of each descriptor.
        """
```
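
To make the contract concrete, the following sketch shows what the body of a `detect_and_describe` implementation could boil down to, using OpenCV's SIFT on a raw image array. It is illustrative only, not GTSfM's bundled SIFT wrapper; a real implementation would accept GTSfM's `Image` type and wrap the coordinates in `Keypoints`.

```python
# Illustrative only: the core of a detect_and_describe() implementation using OpenCV SIFT.
# A real GTSfM detector-descriptor would take an Image, wrap `coordinates` in Keypoints,
# and return the types documented in the interface above.
import cv2
import numpy as np


def detect_and_describe_sift(image_array: np.ndarray, max_keypoints: int = 5000):
    """Return (N, 2) keypoint coordinates and (N, 128) SIFT descriptors for one image."""
    gray = cv2.cvtColor(image_array, cv2.COLOR_RGB2GRAY) if image_array.ndim == 3 else image_array

    sift = cv2.SIFT_create(nfeatures=max_keypoints)
    cv_keypoints, descriptors = sift.detectAndCompute(gray, None)

    # OpenCV returns KeyPoint objects; keep only the (x, y) coordinates.
    coordinates = np.array([kp.pt for kp in cv_keypoints], dtype=np.float32).reshape(-1, 2)
    return coordinates, descriptors
```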

```python
class MatcherBase(GTSFMProcess):
    """Base class for all matchers."""

    @abc.abstractmethod
    def match(
        self,
        keypoints_i1: Keypoints,
        keypoints_i2: Keypoints,
        descriptors_i1: np.ndarray,
        descriptors_i2: np.ndarray,
        im_shape_i1: Tuple[int, int, int],
        im_shape_i2: Tuple[int, int, int],
    ) -> np.ndarray:
        """Match descriptor vectors.

        Some matcher implementations (such as SuperGlue) utilize keypoint coordinates as
        positional encoding, so our matcher API provides them for optional use.

        Output format:
        1. Each row represents a match.
        2. First column represents keypoint index from image #i1.
        3. Second column represents keypoint index from image #i2.
        4. Matches are sorted in descending order of the confidence (score), if possible.

        Args:
            keypoints_i1: keypoints for image #i1, of length N1.
            keypoints_i2: keypoints for image #i2, of length N2.
            descriptors_i1: descriptors corresponding to keypoints_i1.
            descriptors_i2: descriptors corresponding to keypoints_i2.
            im_shape_i1: shape of image #i1, as (height, width, channel).
            im_shape_i2: shape of image #i2, as (height, width, channel).

        Returns:
            Match indices (sorted by confidence), as a matrix of shape (N, 2), where N <= min(N1, N2).
        """
```
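
As a concrete, if naive, illustration of the expected output format, the snippet below computes mutual-nearest-neighbor matches between two descriptor arrays with plain NumPy. A real `MatcherBase` subclass would place this logic inside `match` (and could also exploit the keypoint coordinates and image shapes, which are ignored here).

```python
# Illustrative only: mutual-nearest-neighbor matching between two descriptor sets,
# producing the (N, 2) index matrix described in the MatcherBase.match() docstring.
import numpy as np


def mutual_nearest_neighbor_matches(descriptors_i1: np.ndarray, descriptors_i2: np.ndarray) -> np.ndarray:
    """Return matches as an (N, 2) array of [index into i1, index into i2], best matches first."""
    # Pairwise Euclidean distances between all descriptors, shape (N1, N2).
    dists = np.linalg.norm(descriptors_i1[:, None, :] - descriptors_i2[None, :, :], axis=-1)

    nn_12 = np.argmin(dists, axis=1)  # best i2 index for each i1 descriptor
    nn_21 = np.argmin(dists, axis=0)  # best i1 index for each i2 descriptor

    # Keep a pair (i, j) only if i's nearest neighbor is j AND j's nearest neighbor is i.
    i1_idxs = np.arange(descriptors_i1.shape[0])
    mutual = nn_21[nn_12] == i1_idxs
    match_indices = np.stack([i1_idxs[mutual], nn_12[mutual]], axis=1)

    # Sort by ascending distance so the most confident matches come first.
    order = np.argsort(dists[match_indices[:, 0], match_indices[:, 1]])
    return match_indices[order].astype(np.uint32)
```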

<details><summary>Supported Feature Detectors & Descriptors</summary>
<ul>
<li><strong>SIFT</strong>, D. G. Lowe, IJCV 2004. <a href="https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/sift.py">[code]</a></li>
<li><strong>BRISK</strong>, S. Leutenegger, <em>et al.</em>, ICCV 2011. <a href="https://margaritachli.com/papers/ICCV2011paper.pdf">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/brisk.py">[code]</a></li>
<li><strong>ORB</strong>, E. Rublee <em>et al.</em>, ICCV 2011. <a href="https://ieeexplore.ieee.org/document/6126544">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/orb.py">[code]</a></li>
<li><strong>KAZE</strong>, P. F. Alcantarilla <em>et al.</em>, ECCV 2012. <a href="https://link.springer.com/chapter/10.1007/978-3-642-33783-3_16">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/kaze.py">[code]</a></li>
<li><strong>SuperPoint</strong>, D. DeTone <em>et al.</em>, CVPRW 2018. <a href="https://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w9/DeTone_SuperPoint_Self-Supervised_Interest_CVPR_2018_paper.pdf">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/superpoint.py">[code]</a></li>
<li><strong>D2-Net</strong>, M. Dusmanu <em>et al.</em>, CVPR 2019. <a href="https://arxiv.org/abs/1905.03561">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/d2net.py">[code]</a></li>
<li><strong>DISK</strong>, M. Tyszkiewicz <em>et al.</em>, NeurIPS 2020. <a href="https://proceedings.neurips.cc/paper/2020/file/a42a596fc71e17828440030074d15e74-Paper.pdf">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/disk.py">[code]</a></li>
</ul>
</details>

<details><summary>Supported Feature Matchers</summary>
<ul>
<li><strong>Mutual Nearest Neighbors (MNN)</strong></li>
<li><strong>SuperGlue (trained for SuperPoint)</strong>, P.-E. Sarlin <em>et al.</em>, CVPR 2020. <a href="http://openaccess.thecvf.com/content_CVPR_2020/papers/Sarlin_SuperGlue_Learning_Feature_Matching_With_Graph_Neural_Networks_CVPR_2020_paper.pdf">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/matcher/superglue_matcher.py">[code]</a></li>
<li><strong>LightGlue (trained for SuperPoint and DISK)</strong>, P. Lindenberger <em>et al.</em>, ICCV 2023. <a href="https://openaccess.thecvf.com/content/ICCV2023/papers/Lindenberger_LightGlue_Local_Feature_Matching_at_Light_Speed_ICCV_2023_paper.pdf">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/matcher/lightglue_matcher.py">[code]</a></li>
</ul>
</details>


### Detector-Free Matching

This paradigm directly regresses _per-pixel_ matches between two input images, as opposed to generating detections for each image followed by matching. This is implemented by the [`ImageCorrespondenceGenerator`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/correspondence_generator/image_correspondence_generator.py) class, which simply wraps a detector-free matcher ([`ImageMatcherBase`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/matcher/image_matcher_base.py)).

Detector-free matching is implemented by the `ImageMatcherBase` class defined below. To create your own detector-free matcher, simply copy the contents of `image_matcher_base.py` to a new file corresponding to the new matcher's class name and implement the `match` method.

```python
class ImageMatcherBase(GTSFMProcess):
    """Base class for matchers that accept an image pair, and directly generate keypoint matches.

    Note: these matchers do NOT use descriptors as input.
    """

    @abc.abstractmethod
    def match(
        self,
        image_i1: Image,
        image_i2: Image,
    ) -> Tuple[Keypoints, Keypoints]:
        """Identify feature matches across two images.

        Args:
            image_i1: first input image of pair.
            image_i2: second input image of pair.

        Returns:
            Keypoints from image 1 (N keypoints will exist).
            Corresponding keypoints from image 2 (there will also be N keypoints). These represent feature matches.
        """
```
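
For a sense of what such a `match` method might wrap, the snippet below runs kornia's off-the-shelf LoFTR model on two grayscale arrays and returns matched pixel coordinates. This is a sketch that assumes `torch` and `kornia` are installed; a real `ImageMatcherBase` subclass would take GTSfM `Image` objects and wrap the returned coordinate arrays in `Keypoints`.

```python
# Illustrative only: dense, detector-free matching with kornia's pretrained LoFTR model.
# Assumes `pip install torch kornia`; a real ImageMatcherBase.match() would wrap the
# returned coordinate arrays in GTSfM's Keypoints type.
import numpy as np
import torch
from kornia.feature import LoFTR


def match_detector_free(gray_i1: np.ndarray, gray_i2: np.ndarray):
    """Return two (N, 2) arrays of matched (x, y) pixel coordinates, one per image."""

    def to_tensor(gray: np.ndarray) -> torch.Tensor:
        # LoFTR expects float tensors of shape (B, 1, H, W) with values in [0, 1].
        return torch.from_numpy(gray.astype(np.float32) / 255.0)[None, None]

    matcher = LoFTR(pretrained="outdoor").eval()
    with torch.no_grad():
        out = matcher({"image0": to_tensor(gray_i1), "image1": to_tensor(gray_i2)})

    return out["keypoints0"].cpu().numpy(), out["keypoints1"].cpu().numpy()
```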

<details><summary>Supported Detector-Free Matchers</summary>
<ul>
<li><strong>LoFTR</strong>, J. Sun, Z. Shen, Y. Wang, <em>et al.</em>, CVPR 2021. <a href="https://zju3dv.github.io/loftr/">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/matcher/loftr.py">[code]</a></li>
<li><strong>DKM</strong>, J. Edstedt <em>et al.</em>, CVPR 2023. <a href="https://parskatt.github.io/DKM/">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/matcher/dkm.py">[code]</a></li>
<li><strong>RoMa</strong>, J. Edstedt <em>et al.</em>, CVPR 2024. <a href="https://parskatt.github.io/RoMa/">[paper]</a> <a href="https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/matcher/roma.py">[code]</a></li>
</ul>
</details>