
How to implement the Spatial-temporal positional encoding? #5

Open
Jacklikesironman opened this issue Jul 3, 2021 · 0 comments

According to the Supplementary Materials, the feature map produced by the Extractor, to which the positional embedding is added, has a channel dimension of 64. However, Subsection 4.1 of the main paper notes that the dimension 'd' should be divisible by 3, since the positional encodings of the three dimensions are concatenated to form the final 'd'-channel positional encoding. But 64 is not divisible by 3.

So, how is the spatial-temporal positional encoding actually implemented when d = 64? I look forward to your reply.
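
To make the question concrete, here is a minimal sketch of one workaround I could imagine. This is purely my own assumption, not something stated in the paper or the repository. It splits the 64 channels unevenly across the three axes (22 + 22 + 20, each even so that sin/cos pairs fit) and concatenates a standard sinusoidal encoding per axis. The names `split_channels`, `sinusoid_1d`, and `encode_position` are hypothetical.

```python
import math


def split_channels(d):
    """Split d channels into three even-sized groups (t, y, x) that sum to d.

    Assumption: uneven groups are acceptable when d is not divisible by 3.
    """
    if d % 2 != 0 or d < 6:
        raise ValueError("d must be even and at least 6")
    base = (d // 3) // 2 * 2      # largest even number not exceeding d / 3
    sizes = [base, base, base]
    leftover = d - 3 * base       # always 0, 2 or 4 for even d
    i = 0
    while leftover > 0:           # hand out the remainder two channels at a time
        sizes[i] += 2
        leftover -= 2
        i += 1
    return sizes


def sinusoid_1d(pos, n):
    """Standard sinusoidal encoding of a scalar position into n (even) channels."""
    out = []
    for k in range(n // 2):
        freq = 1.0 / (10000 ** (2 * k / n))
        out.append(math.sin(pos * freq))
        out.append(math.cos(pos * freq))
    return out


def encode_position(t, y, x, d=64):
    """Concatenate the per-axis encodings into one d-channel vector."""
    n_t, n_y, n_x = split_channels(d)
    return sinusoid_1d(t, n_t) + sinusoid_1d(y, n_y) + sinusoid_1d(x, n_x)


if __name__ == "__main__":
    print(split_channels(64))
    print(len(encode_position(1, 2, 3, d=64)))
```

An alternative under the same caveat would be to encode each axis with 22 channels, concatenate to 66, and truncate to 64. Is either of these what the released model does?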
