Cross-lingual semantic retrieval #33

Open
artitw opened this issue Mar 26, 2022 · 2 comments

Comments

@artitw
Owner

artitw commented Mar 26, 2022

Perform a study similar to https://arxiv.org/pdf/1907.04307.pdf,
but expand it to support 100 languages using the embeddings from the translator.

Possibly start with the paper's code sample.
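
For orientation, the retrieval task itself can be sketched roughly as below. This is not the paper's code sample: the vectors are made-up toy values standing in for sentence embeddings, the helper names (`cosine`, `retrieve`, `precision_at_1`) are hypothetical, and precision@1 is only one plausible metric to report.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_u * norm_v)

def retrieve(query_vec, candidate_vecs):
    """Return the index of the candidate most similar to the query."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

def precision_at_1(query_vecs, candidate_vecs):
    """Fraction of queries whose top hit is the aligned candidate.

    Assumes query i is a translation of candidate i (parallel data).
    """
    hits = sum(
        1 for i, q in enumerate(query_vecs)
        if retrieve(q, candidate_vecs) == i
    )
    return hits / len(query_vecs)

# Toy stand-ins for embeddings of three parallel sentence pairs (assumed values).
english = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.9]]
french = [[0.8, 0.2, 0.1], [0.2, 0.9, 0.1], [0.1, 0.1, 0.8]]

print(precision_at_1(english, french))
```

In a real run the toy lists would be replaced by embeddings of the same sentences in two languages, and the same score could be computed per language pair.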

@lere01
Contributor

lere01 commented May 30, 2022

@artitw

This looks interesting. Can I begin to look into this?

@artitw
Owner Author

artitw commented May 30, 2022

@lere01 thanks for your interest. I would recommend the following steps:

  1. Try out the code sample mentioned above to ensure that results from the paper are reproducible.
  2. Run the same process but use Text2Text embeddings for 100 languages.
  3. Try different types of Text2Text embeddings: (a) neural, (b) TF-IDF and (c) BM25. We can also try an ensemble of all three.
  4. Share your findings; report on any improvements and other things you learned.
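
One possible way to ensemble the three score types in step 3 is sketched below. It is only an illustration under assumptions: the per-candidate scores are invented, the helper names (`min_max`, `ensemble`) are hypothetical, and min-max normalization followed by a weighted average is just one simple combination scheme, not something Text2Text prescribes.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so different scorers become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # no spread, so no preference
    return [(s - lo) / (hi - lo) for s in scores]

def ensemble(score_lists, weights=None):
    """Weighted average of min-max normalized scores, one value per candidate."""
    if weights is None:
        weights = [1.0] * len(score_lists)
    normalized = [min_max(s) for s in score_lists]
    total = sum(weights)
    n_candidates = len(score_lists[0])
    return [
        sum(w * col[i] for w, col in zip(weights, normalized)) / total
        for i in range(n_candidates)
    ]

# Invented scores of one query against three candidates, per scorer.
neural = [0.82, 0.40, 0.35]  # e.g. cosine similarity of neural embeddings
tfidf = [0.10, 0.30, 0.05]   # e.g. cosine similarity of TF-IDF vectors
bm25 = [7.5, 6.0, 1.2]       # BM25 scores are unbounded, hence the rescaling

combined = ensemble([neural, tfidf, bm25])
best = max(range(len(combined)), key=combined.__getitem__)
print(best, combined)
```

The weights are a natural knob to tune on a held-out language pair; equal weights are only a starting point.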

Let us know what you think, and if you have other ideas.
