When I embedded a relevant text pair with the m2-bert-80M-32k-retrieval model, the cosine similarity obtained with padding="max_length" was 0.7, while with padding=True (used to save memory) it was close to 0. This made semantic retrieval completely impossible with padding=True. The same thing happened with the 2k and 8k models. Why is this the case, and is padding=True completely unusable?
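For reference, a minimal sketch of the comparison being described. The checkpoint name and the `sentence_embedding` output key follow the model card for togethercomputer/m2-bert-80M-32k-retrieval; the text pair is a placeholder.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MAX_LEN = 32768  # context length of the 32k checkpoint

model = AutoModelForSequenceClassification.from_pretrained(
    "togethercomputer/m2-bert-80M-32k-retrieval", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_max_length=MAX_LEN)

texts = ["what is the capital of france?", "Paris is the capital of France."]

def pair_similarity(padding):
    # padding is either True (pad to the longest element in the batch)
    # or "max_length" (pad to the tokenizer's max length)
    inputs = tokenizer(texts, return_tensors="pt", padding=padding,
                       truncation=True, max_length=MAX_LEN,
                       return_token_type_ids=False)
    with torch.no_grad():
        emb = model(**inputs)["sentence_embedding"]
    return F.cosine_similarity(emb[0], emb[1], dim=0).item()

print(pair_similarity("max_length"))  # ~0.7 for a related pair
print(pair_similarity(True))          # close to 0
```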
The bidirectional convolutions in these models use the padding tokens to pass information from layer to layer (like scratch tokens). padding=True pads only to the length of the longest element in the batch, while padding="max_length" pads to the tokenizer's max length, so with padding=True the model loses the scratch space it was trained with.
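Concretely, the two settings produce very different input lengths. A quick illustration with the bert-base-uncased tokenizer that the model cards pair with these checkpoints (shapes are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_max_length=32768)
batch = ["a short query", "a somewhat longer passage about the same topic"]

# padding=True: pad only up to the longest sequence in this batch.
dynamic = tokenizer(batch, padding=True, return_tensors="pt")
print(dynamic["input_ids"].shape)  # [2, ~11] -- no room left for scratch tokens

# padding="max_length": pad every sequence out to the full 32k context.
full = tokenizer(batch, padding="max_length", truncation=True,
                 max_length=32768, return_tensors="pt")
print(full["input_ids"].shape)     # [2, 32768]
```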
We’re working on a version that gracefully interpolates between the 32k/8k/2k versions to save compute, but it’s still active research, so it may not be live for a while.