
Modify depth dimensions to match our input #2

Open · Santoi wants to merge 4 commits into main

Conversation

@Santoi (Collaborator) commented Aug 6, 2024:

The original application is intended to run while the NeRF input is being captured, or while the input is transmitted with DSS.

This PR makes the tweaks required to use previously recorded datasets in the following format:

dataset-dir
|
|--- depth
|    |
|    `--- 0.png
|--- rgb
|    |
|    `--- 0.png
|
`--- transforms.json

Changes:

  • The depth data arrives with its channels split into 3 separate vectors, but the algorithm expects a single vector. Flattening the last dimension solves this by joining those 3 vectors into one (see the sketch after this list).

  • When the depth mask is reused as the color_mask, the code expects it to have 1 channel instead of 3, so the tiling is adapted for 3 channels. The depth image is grayscale, so it could have been represented with a single channel in the first place.

  • Add the option of generating a point-cloud output with the visualization script.
    WARNING: the visualization script freezes if run without a GUI.
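
A minimal sketch of the reshaping involved (not the PR diff verbatim; the (H, W, 3) layout, shapes, and dtype are assumptions for illustration):

```python
import torch

# Assume the grayscale depth PNG decodes to an (H, W, 3) tensor whose
# three channels are identical.
H, W = 480, 640
depth = torch.randint(0, 2**16, (H, W, 3), dtype=torch.int32)

# Flatten the channel-separated values into the single vector the
# point-cloud code expects.
depth_flat = depth.flatten()                # shape (H * W * 3,)

# Build the validity mask from one channel only (all three are equal),
# then tile it where a 3-channel color_mask is expected.
mask = (depth[..., 0] > 0).reshape(-1)      # shape (H * W,)
color_mask = torch.tile(mask, (3, 1))       # shape (3, H * W)
```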

@Santoi requested review from olmerg and hidmic, August 6, 2024 13:44
@hidmic (Collaborator) left a comment:

Mind explaining the flattening and tiling that is going on?

On the line `opencv-python`:
Collaborator:

@Santoi a missing dependency? If opencv is not used for visualization, consider pulling in opencv-python-headless instead. opencv-python and matplotlib don't play well together in certain cases (due to PyQt compatibility issues).
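
A sketch of the suggested swap, assuming OpenCV is only needed for image I/O and a standard requirements.txt:

```
# Use the headless wheel when OpenCV is only needed for image I/O;
# it avoids the PyQt clash with matplotlib.
opencv-python-headless
```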

Collaborator Author:

Got it, thanks!

@@ -107,9 +107,9 @@ def get_pointcloud(color, depth, intrinsics, w2c, transform_pts=True,

     # Select points based on mask
     if mask is not None:
-        point_cld = point_cld[mask]
+        # point_cld = point_cld[mask]
Collaborator:

@Santoi why comment this out?

Collaborator Author:

Two reasons, neither necessarily valid:

  1. It was causing a tensor dimension mismatch I hadn't been able to solve.
    This is now addressed in a new commit by generating the mask from a single channel instead of all of them.
  2. It didn't seem necessary, since the mask is built from valid depth values and every depth value we had was valid (greater than 0). Still, the masking is applied again just in case invalid values do appear (see the sketch below).
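
A minimal sketch of the single-channel fix (shapes and names are illustrative, not the actual diff):

```python
import torch

H, W = 480, 640
depth = torch.rand(H, W, 3)              # stand-in depth, channels identical
point_cld = torch.rand(H * W, 6)         # (num_points, xyz + rgb)

# Build the mask from a single channel so it matches the point cloud's
# first dimension, then re-enable the masking.
mask = (depth[..., 0] > 0).reshape(-1)   # shape (H * W,)
point_cld = point_cld[mask]              # no dimension mismatch now
```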

Collaborator:

Still don't understand what's going on here. How is it that depth maps have 3 channels?

Collaborator Author:

The capturing app saves the depth as grayscale PNG images, so each pixel is represented with red, green, and blue channels that all hold the same value.
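
For instance, loading one of these PNGs shows the redundant channels (a sketch; the path and shapes are illustrative):

```python
import cv2

img = cv2.imread("dataset-dir/depth/0.png", cv2.IMREAD_UNCHANGED)
print(img.shape)        # e.g. (480, 640, 3): grayscale stored as 3 channels
depth = img[:, :, 0]    # all channels hold the same value, so keep one
```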

@hidmic (Collaborator) commented Aug 14, 2024:

Hmm, that is unusual. Depth maps usually use single-channel, unsigned-integer pixels. How is the app converting depth to that grayscale? Now I wonder whether we are losing information in this conversion.
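
One way to check for precision loss (a sketch; the file path is assumed): if the PNG decodes to 8-bit channels, depth is quantized to 256 levels, whereas a single-channel 16-bit PNG would keep 65536.

```python
import cv2
import numpy as np

img = cv2.imread("dataset-dir/depth/0.png", cv2.IMREAD_UNCHANGED)
print(img.dtype)                                 # uint8 would mean only 256 depth levels
assert np.array_equal(img[..., 0], img[..., 1])  # channels should be identical
assert np.array_equal(img[..., 1], img[..., 2])
```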

Collaborator Author:

The multi-channel depth map is only produced by NeRF Capture's "offline mode", which currently has some issues:

  • On one hand, it hasn't been tested by the SplaTAM authors, so it probably won't work right out of the box.
  • On the other, it seems to be simply broken; see comment.

In the meantime, I will try the suggestions from this issue and see how it goes: spla-tam#59 (comment)

@Santoi requested a review from hidmic, August 13, 2024 20:40
@Santoi (Collaborator Author) commented Aug 13, 2024:

Hi, @hidmic ! Thanks for the review!

PR description has been updated to better explain the changes. PTAL.
