According to the Supplementary Materials of the proposed method, the features output by the Extractor, to which the positional embedding is added, have a channel dimension of 64. However, Subsection 4.1 of the main paper notes that the dimension 'd' should be divisible by 3, since the positional encodings of the three dimensions are concatenated to form the final 'd'-channel positional encoding. But 64 is not divisible by 3.
So how is the spatial-temporal positional encoding implemented? I am looking forward to your reply.
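To make the mismatch concrete, here is a minimal sketch of one workaround I can think of. It is only my guess, not something confirmed by the paper or the released code: give each of the three axes the smallest even width `w` with `3 * w >= d`, concatenate the three encodings, and truncate to `d` channels. The sinusoidal form, the base 10000, and the names `sinusoid_1d`, `per_axis_width`, and `st_positional_encoding` are all my own assumptions for illustration.

```python
import math

def sinusoid_1d(pos, width):
    """Sinusoidal encoding of one scalar position into `width` channels.
    Assumes `width` is even: first half sines, second half cosines."""
    half = width // 2
    freqs = [1.0 / (10000 ** (2 * i / width)) for i in range(half)]
    return [math.sin(pos * f) for f in freqs] + [math.cos(pos * f) for f in freqs]

def per_axis_width(d, n_axes=3):
    """Smallest even width w such that n_axes * w >= d (hypothetical rule)."""
    w = math.ceil(d / n_axes)
    return w + (w % 2)

def st_positional_encoding(t, y, x, d):
    """Concatenate the t, y, x encodings and truncate to d channels.
    Truncation is an assumed workaround, not the authors' stated method."""
    w = per_axis_width(d)
    enc = sinusoid_1d(t, w) + sinusoid_1d(y, w) + sinusoid_1d(x, w)
    return enc[:d]

d = 64
print(d % 3)                                      # remainder behind the mismatch
print(per_axis_width(d))                          # channels allotted to each axis
print(len(st_positional_encoding(2, 5, 7, d)))    # length after truncation
```

With this rule the last axis loses a few channels to truncation, so the three axes are not encoded symmetrically. Is this what the implementation does, or is `d` padded or projected in some other way?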