Fix bug of multiple pre-processing when segmentation (PyTorch) #645

Open
wants to merge 2 commits into main
Conversation

@Lionelsy

Segmentation inference is very slow (see #531 and #234).

This is because the dataloader applies preprocessing on every access when self.cache_convert is None:

attr = dataset.get_attr(index)
if self.cache_convert:
    # Cached: reuse the already-preprocessed data.
    data = self.cache_convert(attr['name'])
elif self.preprocess:
    # No cache: preprocessing runs on every call to __getitem__.
    data = self.preprocess(dataset.get_data(index), attr)
else:
    data = dataset.get_data(index)

When the run_inference method is called, the dataloader's cache_convert is None because use_cache=False:

infer_split = TorchDataloader(dataset=infer_dataset,
                              preprocess=model.preprocess,
                              transform=model.transform,
                              sampler=infer_sampler,
                              use_cache=False)

This makes inference extremely slow.

I've added a get_cache method that provides a cache, avoiding the slowdown caused by repeated preprocessing during inference.
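As an illustration of the idea, here is a minimal sketch of what such a cache could look like. This is not the exact PR code: only the name get_cache comes from the PR, and the sketch assumes the dataset split exposes get_attr() and get_data() as used in torch_dataloader.py.

def get_cache(preprocess, dataset):
    """Return a cache_convert-style callable that memoizes preprocess results.

    Illustrative sketch only; the actual PR implementation may differ.
    """
    # cache_convert is looked up by attr['name'], so map names to indices.
    name_to_index = {
        dataset.get_attr(i)['name']: i for i in range(len(dataset))
    }
    cache = {}

    def cache_convert(name):
        # Preprocess each scene at most once; reuse the result afterwards.
        if name not in cache:
            i = name_to_index[name]
            cache[name] = preprocess(dataset.get_data(i), dataset.get_attr(i))
        return cache[name]

    return cache_convert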

I tested it with RandLA-Net on the Toronto3D dataset using a GV100 GPU. Inference on a single scene now takes only 2 minutes 37 seconds, considerably faster than before:

After: test 0/1: 100%|██████████████████████████████████████████████████████| 4990714/4990714 [02:37<00:00, 31769.86it/s]

Before: test 0/1:   4%|██                                                     | 187127/4990714 [05:12<2:19:39, 573.27it/s]

@ssheorey self-requested a review July 17, 2024 05:53
@rejexx commented Jul 23, 2024

I applied this fix on a local fork with a custom dataset and saw RandLA-Net inference go from 27 hours to 6 minutes. I can't thank you enough for sharing.

@ssheorey (Member)

Hi @Lionelsy thanks for debugging this and submitting a PR. I have a question:

  • if cache_convert is None, it looks like preprocess is applied only once in line 81 of torch_dataloader.py. Can you point out the multiple pre-processing?

@Lionelsy (Author)

> Hi @Lionelsy thanks for debugging this and submitting a PR. I have a question:
>
>   • if cache_convert is None, it looks like preprocess is applied only once in line 81 of torch_dataloader.py. Can you point out the multiple pre-processing?

Thank you for your continued contributions to Open3D!

In fact, __getitem__ in torch_dataloader.py is called many times during inference, because the sampler repeatedly queries the same scene until every point has been covered, and the very time-consuming preprocess step runs again on each call.

If we use cache_convert to store the preprocessed data, each scene is preprocessed only once, which saves a lot of time.
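For illustration only, wiring such a cache into run_inference could look like the sketch below. The current TorchDataloader constructor does not take a cache_convert argument, so this assumes the PR extends it to accept one; the argument name here is an assumption, not the confirmed diff.

# Hypothetical wiring inside run_inference: build the cache once, then
# hand it to the dataloader so __getitem__ reuses cached results instead
# of re-running model.preprocess on every access.
cache_convert = get_cache(model.preprocess, infer_dataset)
infer_split = TorchDataloader(dataset=infer_dataset,
                              preprocess=model.preprocess,
                              transform=model.transform,
                              sampler=infer_sampler,
                              cache_convert=cache_convert)  # assumed new kwarg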
