Thank you for your amazing work on this model. I was able to reproduce your remarkable results. I am looking to contribute and develop downstream inference using faiss, but I am running into a lot of issues. The cosine similarity gives incorrect results.
08/03/2024 22:03:10 - INFO - main - Setup model...
08/03/2024 22:03:11 - INFO - main - Using CLIP pretrained weights...
08/03/2024 22:03:17 - INFO - main - Setup model done!
Loaded existing embeddings.
08/03/2024 22:03:17 - INFO - main - Loading metadata...
08/03/2024 22:03:17 - INFO - main - Metadata loaded
Top 5 results for 'a woman eating':
Distance: 3.3629, Index: 126 Caption: 3d animation music video song Path: video7136.mp4
Distance: 3.0853, Index: 759 Caption: there are some people flying in a helicopter Path: video7769.mp4
Distance: 3.0298, Index: 769 Caption: two men examine a red lamborghini with no tires Path: video7779.mp4
Distance: 3.0025, Index: 176 Caption: a man hugs another man in outer space Path: video7186.mp4
Distance: 2.9686, Index: 55 Caption: a band performs Path: video7065.mp4
I use faiss.IndexFlatIP, which ranks results by inner product. How can I get better retrieval results on the MSRVTT dataset?
Hi @feemthan, thank you for your interest. Could you elaborate on how you are obtaining the distance values? Does the "distance" you print refer to cosine similarity? Cosine similarity has a maximum value of 1, but the distance values you have printed are around 3. Could you check that the embeddings you are using are normalized?
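For reference, here is a minimal sketch of what a normalized search could look like. It is not the code from this repository: the function names `l2_normalize` and `search_cosine`, the 512-dimensional random arrays, and the scale factor are all placeholders standing in for your real video and text embeddings. The idea is that an inner product over L2-normalized vectors equals cosine similarity, so scores from `faiss.IndexFlatIP` should then fall in [-1, 1]. If faiss is not installed, the sketch falls back to a brute-force NumPy search.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit L2 norm (float32, as faiss expects)."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return (x / np.maximum(norms, 1e-12)).astype(np.float32)


def search_cosine(video_emb, text_emb, k=5):
    """Return (scores, ids) of the top-k videos by cosine similarity.

    Both inputs are normalized first, so the inner product computed by
    IndexFlatIP is the cosine similarity.
    """
    video_emb = np.ascontiguousarray(l2_normalize(video_emb))
    text_emb = np.ascontiguousarray(l2_normalize(text_emb))
    try:
        import faiss  # optional dependency in this sketch

        index = faiss.IndexFlatIP(video_emb.shape[1])
        index.add(video_emb)
        scores, ids = index.search(text_emb, k)
    except ImportError:
        # Brute-force fallback: same ranking, computed with NumPy.
        sims = text_emb @ video_emb.T
        ids = np.argsort(-sims, axis=1)[:, :k]
        scores = np.take_along_axis(sims, ids, axis=1)
    return scores, ids


# Placeholder data: random, unnormalized vectors (assumption, not real embeddings).
rng = np.random.default_rng(0)
video_emb = rng.normal(size=(1000, 512)) * 3.0
text_emb = rng.normal(size=(1, 512)) * 3.0

scores, ids = search_cosine(video_emb, text_emb, k=5)
print(scores[0], ids[0])
```

If the scores you get after normalizing both sides are still above 1, the values being printed are probably not the raw inner products returned by the index.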