# Supported Models

All assembly models follow a similar three-stage pipeline (sketched below):

1. A point cloud encoder extracts features from each input part. We support common encoders such as PointNet, PointNet++, and DGCNN.
2. A correlation module performs relation reasoning between part features; it can be an LSTM, a GNN, or a Transformer.
3. An MLP-based PoseRegressor predicts a rotation and translation for each part.
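
For orientation, here is a minimal PyTorch sketch of this pipeline. All names are illustrative stand-ins rather than the repo's actual classes; in particular, a point-wise MLP with max-pooling stands in for the point cloud encoder:

```python
import torch
import torch.nn as nn

class AssemblyModel(nn.Module):
    """Illustrative three-stage assembly pipeline (not the repo's actual code)."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # 1. per-part point cloud encoder (stand-in for PointNet/PointNet++/DGCNN)
        self.encoder = nn.Sequential(
            nn.Linear(3, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        # 2. correlation module (stand-in for the LSTM/GNN/Transformer choices)
        self.correlator = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True),
            num_layers=2)
        # 3. MLP-based pose regressor: quaternion (4) + translation (3) per part
        self.pose_head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 7))

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        # parts: [B, P, N, 3] point clouds of P parts with N points each
        feats = self.encoder(parts).max(dim=2).values  # [B, P, C], pooled per part
        feats = self.correlator(feats)                 # part-to-part reasoning
        return self.pose_head(feats)                   # [B, P, 7] rotation + translation
```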

## Model Details

Below, we briefly describe the methods implemented in this repo:

### Global

A naive baseline from DGL. This model concatenates all part point clouds and extracts a global feature. It then concatenates the global feature with each part feature as the input to the pose regressor, as sketched below.
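
A minimal sketch of this feature construction, assuming an `encoder` that maps point clouds `[..., N, 3]` to pooled features `[..., C]` (all names are illustrative):

```python
import torch
import torch.nn as nn

def global_baseline_feats(encoder: nn.Module, parts: torch.Tensor) -> torch.Tensor:
    """Build the Global baseline's regressor inputs (illustrative sketch).

    parts: [B, P, N, 3]. All part clouds are merged into a single cloud to
    extract one global feature, which is concatenated onto every part feature.
    """
    B, P, N, _ = parts.shape
    part_feats = encoder(parts)                          # [B, P, C] per-part features
    merged = parts.reshape(B, 1, P * N, 3)               # all points as one cloud
    global_feat = encoder(merged).expand(-1, P, -1)      # [B, P, C] shared global feature
    return torch.cat([part_feats, global_feat], dim=-1)  # [B, P, 2C] regressor input
```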

### LSTM

A naive baseline from DGL. This model applies a Bidirectional-LSTM over part features for reasoning and uses the LSTM output for pose prediction.

Note that in PartNet, the order of parts in the pre-processed data follows certain patterns (e.g., from chair legs to seat to back), which causes information leakage when using an LSTM. Therefore, we shuffle the order of parts in the training data, as sketched below.
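
A minimal sketch of such a shuffle (the mask name `valids` is an assumption about how padded parts are tracked):

```python
import torch

def shuffle_parts(parts: torch.Tensor, valids: torch.Tensor):
    """Randomly permute the part order per shape (illustrative sketch).

    parts: [B, P, N, 3] part point clouds; valids: [B, P] mask of real
    (non-padded) parts. Shuffling removes ordering patterns such as
    leg -> seat -> back that a Bidirectional-LSTM could exploit.
    """
    B, P = parts.shape[:2]
    perm = torch.stack([torch.randperm(P) for _ in range(B)])  # [B, P]
    batch_idx = torch.arange(B).unsqueeze(1)                   # [B, 1]
    return parts[batch_idx, perm], valids[batch_idx, perm]
```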

### DGL (NeurIPS'20)

Proposed in DGL. This model leverages a GNN to perform message passing and interactions between parts. In addition, it adopts an iterative refinement process: the model first outputs a rough prediction given the initial input parts, then applies the predicted transformation to each part and runs the model on the transformed parts to predict a residual transformation. DGL repeats this process several times (3 by default), progressively refining the prediction.
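
A minimal sketch of this refinement loop, assuming a `model` that predicts per-part quaternions and translations (helper names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quat_rotate(points: torch.Tensor, quat: torch.Tensor) -> torch.Tensor:
    """Rotate [B, P, N, 3] points by per-part unit quaternions [B, P, 4] (w, x, y, z)."""
    w, xyz = quat[..., :1].unsqueeze(2), quat[..., 1:].unsqueeze(2)  # [B, P, 1, *]
    t = 2 * torch.cross(xyz.expand_as(points), points, dim=-1)
    return points + w * t + torch.cross(xyz.expand_as(t), t, dim=-1)

def iterative_refinement(model: nn.Module, parts: torch.Tensor, num_iters: int = 3):
    """DGL-style iterative refinement (illustrative sketch).

    Each pass predicts a residual pose for the current placement, applies it,
    and re-runs the model on the transformed parts; num_iters=3 matches the
    default mentioned above. Supervision is typically applied at every pass.
    """
    all_poses = []
    for _ in range(num_iters):
        pose = model(parts)                        # [B, P, 7] residual pose
        quat = F.normalize(pose[..., :4], dim=-1)  # unit quaternion
        trans = pose[..., 4:].unsqueeze(2)         # [B, P, 1, 3]
        parts = quat_rotate(parts, quat) + trans   # move parts, then re-run
        all_poses.append(pose)
    return all_poses
```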

### RGL-NET (WACV'22)

Proposed in RGL-NET. Intuitively, RGL-NET is a combination of DGL and LSTM: it applies both a GNN and a Bidirectional-LSTM to reason about part relations (sketched below). It assumes the input parts follow some order, which does not necessarily require GT part labels; it can also be based on, e.g., part volumes (see Table 4 in their paper).
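
A minimal sketch of this combined correlation stage, where a single linear layer stands in for the actual GNN message passing:

```python
import torch
import torch.nn as nn

class RGLCorrelator(nn.Module):
    """GNN reasoning followed by a Bidirectional-LSTM sweep (illustrative sketch)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.gnn = nn.Linear(dim, dim)  # stand-in for real message passing
        self.lstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: [B, P, C] part features, assumed to be in some meaningful order
        feats = self.gnn(feats).relu()  # part-to-part reasoning (simplified)
        out, _ = self.lstm(feats)       # sequential sweep; output is again [B, P, C]
        return out
```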

We do not implement the input sorting operation. On the one hand, the pre-processed PartNet data does follow certain patterns, and we observe that RGL-NET can indeed leverage them. On the other hand, in geometric assembly there is no semantically meaningful order of parts. Indeed, on our Breaking Bad benchmark, RGL-NET performs similarly to DGL, so we do not include it in our paper.

### Transformer-based Methods (our design)

This class of methods simply replaces the GNN with a standard TransformerEncoder to learn part interactions (sketched below). We also provide a variant that adopts the iterative refinement process from DGL.
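
A minimal sketch of the swap; the layer sizes are example values, and the key-padding mask handles shapes with fewer than the maximum number of parts:

```python
import torch
import torch.nn as nn

# Standard TransformerEncoder as the correlation module (illustrative sizes).
correlator = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4)

part_feats = torch.randn(2, 20, 256)           # [B, P, C] per-part features
valids = torch.zeros(2, 20, dtype=torch.bool)  # [B, P] True for real parts
valids[:, :12] = True                          # e.g. 12 real parts, rest padded
refined = correlator(part_feats, src_key_padding_mask=~valids)  # [B, P, C]
```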

Remark: we implement two additional Transformer-based models, which are further discussed in the dev branch.

## Benchmarks

### Semantic Assembly

- Results on PartNet chair:

| Method | Shape Chamfer (SCD) ↓ | Part Accuracy (%) ↑ | Connectivity Accuracy (%) ↑ |
| :-- | :-- | :-- | :-- |
| Global | 0.0128 | 23.82 | 16.29 |
| LSTM | 0.0114 | 22.03 | 14.88 |
| DGL | 0.0079 | 40.56 | 27.58 |
| RGL-NET | 0.0068 | 44.24 | 29.38 |
| Transformer | 0.0089 | 41.90 | 29.11 |
| Refine-Transformer | 0.0079 | 42.97 | 31.25 |

See the wandb report for detailed training logs.

To reproduce the results, taking DGL as an example, simply run:

```
GPUS=1 CPUS_PER_TASK=8 MEM_PER_CPU=4 QOS=normal REPEAT=1 ./scripts/dup_run_sbatch_ddl.sh $PARTITION dgl-32x1-cosine_300e-partnet_chair scripts/train.py configs/dgl/dgl-32x1-cosine_300e-partnet_chair.py --fp16 --cudnn
```

Then, you can go to wandb to find the results.

### Geometric Assembly

- Results on the Breaking Bad dataset (see our paper for more results):

| Method | RMSE (R) $\downarrow$ (degree) | MAE (R) $\downarrow$ (degree) | RMSE (T) $\downarrow$ ($\times 10^{-2}$) | MAE (T) $\downarrow$ ($\times 10^{-2}$) | CD $\downarrow$ ($\times 10^{-3}$) | PA $\uparrow$ (%) |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| Global | 80.7 | 68.0 | 15.1 | 12.0 | 14.6 | 24.6 |
| LSTM | 84.2 | 72.4 | 16.2 | 12.6 | 15.8 | 22.7 |
| DGL | 79.4 | 66.5 | 15.0 | 11.9 | 14.3 | 31.0 |

To reproduce our main results on the everyday subset (paper Table 3), taking DGL as an example, please run:

```
./scripts/train_everyday_categories.sh "GPUS=1 CPUS_PER_GPU=8 MEM_PER_CPU=4 QOS=normal REPEAT=3 ./scripts/dup_run_sbatch.sh $PARTITION dgl-32x1-cosine_200e-everyday-CATEGORY ./scripts/train.py configs/dgl/dgl-32x1-cosine_200e-everyday.py --fp16 --cudnn" configs/dgl/dgl-32x1-cosine_200e-everyday.py
```
- This assumes you are working on a SLURM-based computing cluster. If you work on plain servers, you will need to manually train the model on each category.
- In Table 4, we train one model per category and report the numbers averaged over all categories.
- Since some categories have only a few base shapes, the results may vary across runs. Therefore, we run all experiments 3 times and report the averaged results. You can modify the `REPEAT=3` flag above as needed.

After running the above script, the model weights will be saved in `checkpoint/dgl-32x1-cosine_200e-everyday-$CATEGORY-dup$X`, where `$CATEGORY` is the category (e.g., Bottle, Teapot) and `$X` indexes the different runs. To collect the results, run (assuming you are in a GPU environment):

```
python scripts/collect_test.py --cfg_file configs/dgl/dgl-32x1-cosine_200e-everyday.py --num_dup 3 --ckp_suffix checkpoint/dgl-32x1-cosine_200e-everyday-
```

This will automatically test each model, collect and aggregate the evaluation metrics, and format them into LaTeX, which you can directly copy-paste into your table.

To reproduce our ablation study results (paper Table 4), you need to create a new config file for each model and set `_C.data.max_num_part` to the number you want to try (see the sketch below). Then, you can train the models in the same way as detailed above.
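
The exact override mechanism depends on the repo's config system; the yacs-style sketch below is a hypothetical illustration based on the `_C.data.max_num_part` field named above:

```python
# Hypothetical ablation config sketch; the real configs live under configs/
# (e.g. configs/dgl/dgl-32x1-cosine_200e-everyday.py). Only the piece-count
# limit needs to change between ablation runs.
from yacs.config import CfgNode as CN

_C = CN()
_C.data = CN()
_C.data.max_num_part = 8  # example value; vary this per ablation run
```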

To collect the results, you can again use the `scripts/collect_test.py` script. To control the number of pieces at test time, set the `--min_num_part` and `--max_num_part` flags.

To reproduce our results in the appendix (Table 11, bottom), i.e., training one model on all categories, simply run:

```
GPUS=1 CPUS_PER_GPU=8 MEM_PER_CPU=4 QOS=normal REPEAT=3 ./scripts/dup_run_sbatch.sh $PARTITION dgl-32x1-cosine_200e-everyday scripts/train.py configs/dgl/dgl-32x1-cosine_200e-everyday.py --fp16 --cudnn
```

Then, you can use the same script to collect the results as detailed above (add a `--train_all` flag because the model is trained on all categories jointly).

### Geometric Assembly with Inner-Face-Removed Data

See issue #6 for details on this data update.

- We re-ran the models with the same scripts and report the results below:

| Method | RMSE (R) $\downarrow$ (degree) | MAE (R) $\downarrow$ (degree) | RMSE (T) $\downarrow$ ($\times 10^{-2}$) | MAE (T) $\downarrow$ ($\times 10^{-2}$) | CD $\downarrow$ ($\times 10^{-3}$) | PA $\uparrow$ (%) |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| Global | 82.4 | 69.7 | 14.8 | 11.8 | 15.0 | 21.8 |
| LSTM | 84.7 | 72.7 | 16.2 | 12.7 | 17.1 | 19.4 |
| DGL | 81.8 | 68.9 | 14.8 | 11.8 | 14.6 | 28.2 |

Compared to the original results, the rotation and translation metrics are similar, while there are notable changes in CD and PA. The change in PA follows from CD, because PA is computed by comparing each part's CD against a pre-defined threshold (sketched below).
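
To make this relation concrete, here is a minimal sketch of such a threshold-based part accuracy; the threshold value is an assumed example, not necessarily the one used in the paper:

```python
import torch

def part_accuracy(chamfer_dist: torch.Tensor, threshold: float = 0.01) -> torch.Tensor:
    """Fraction of parts whose per-part Chamfer distance is below a threshold.

    chamfer_dist: [B, P] per-part CD values. Any shift in the CD distribution
    around the threshold therefore shifts PA directly.
    """
    return (chamfer_dist < threshold).float().mean()
```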