Downstream inference using Faiss #1

Open
feemthan opened this issue Aug 3, 2024 · 1 comment

Comments

@feemthan

feemthan commented Aug 3, 2024

Hello Team,

Thank you for your amazing work on this model. I was able to reproduce your remarkable results. I would like to contribute by developing downstream inference using Faiss, but I am running into a lot of issues. The cosine similarity search returns incorrect results:

08/03/2024 22:03:10 - INFO - main - Setup model...
08/03/2024 22:03:11 - INFO - main - Using CLIP pretrained weights...
08/03/2024 22:03:17 - INFO - main - Setup model done!
Loaded existing embeddings.
08/03/2024 22:03:17 - INFO - main - Loading metadata...
08/03/2024 22:03:17 - INFO - main - Metadata loaded
Top 5 results for 'a woman eating':

  1. Distance: 3.3629, Index: 126 Caption: 3d animation music video song Path: video7136.mp4
  2. Distance: 3.0853, Index: 759 Caption: there are some people flying in a helicopter Path: video7769.mp4
  3. Distance: 3.0298, Index: 769 Caption: two men examine a red lamborghini with no tires Path: video7779.mp4
  4. Distance: 3.0025, Index: 176 Caption: a man hugs another man in outer space Path: video7186.mp4
  5. Distance: 2.9686, Index: 55 Caption: a band performs Path: video7065.mp4

I use faiss.IndexFlatIP, which ranks results by inner product. How can I get better retrieval results on the MSRVTT dataset?

@angelaaye
Contributor

Hi @feemthan, thank you for your interest. Could you elaborate on how you are obtaining the distance values? Does the "distance" you print refer to cosine similarity? Cosine similarity has a maximum value of 1, but the distance values you have printed are around 3. Could you check that the embeddings you are using are normalized?
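
For reference, the normalization check can be sketched as below. This is a minimal illustration and not the code from this repository: it uses plain NumPy to mimic the exhaustive inner-product search that faiss.IndexFlatIP performs, and the random arrays are placeholders standing in for the real text and video embeddings. The helper names `l2_normalize` and `top_k_inner_product` are hypothetical. With Faiss itself, the equivalent step would be calling `faiss.normalize_L2` on the float32 arrays (it works in place) before `index.add` and `index.search`.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (hypothetical helper)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def top_k_inner_product(queries, database, k):
    """Exhaustive inner-product search, highest score first.

    Mimics what a flat inner-product index computes; not the Faiss API.
    """
    scores = queries @ database.T
    idx = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(scores, idx, axis=1), idx


# Placeholder data: random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
video_emb = rng.normal(size=(1000, 512)).astype(np.float32)
text_emb = rng.normal(size=(1, 512)).astype(np.float32)

# Raw inner product: scores depend on vector length and are unbounded.
raw_scores, raw_idx = top_k_inner_product(text_emb, video_emb, k=5)

# Inner product of unit vectors equals cosine similarity, bounded by [-1, 1].
cos_scores, cos_idx = top_k_inner_product(
    l2_normalize(text_emb), l2_normalize(video_emb), k=5
)

print("raw max score:", float(raw_scores.max()))
print("cosine max score:", float(cos_scores.max()))
```

If the scores returned by the index exceed 1, as in the output above, that suggests the vectors were added or queried without normalization, so the ranking partly reflects embedding magnitude and not only direction.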
