Searcher should add a "normalize" argument? #1952

dayuyang1999 · 2024-07-30T20:51:16Z

Hi,

I am using my own embedding model, bge-large-en-v1.5.

Because the model is trained to optimize cosine similarity, the index should be created with the --l2-norm option:

--l2-norm

However, when creating a FaissSearcher for search, there seems to be no option for normalizing the query embedding:

class FaissSearcher:
    """Simple Searcher for dense representation

    Parameters
    ----------
    index_dir : str
        Path to faiss index directory.
    """

    def __init__(self, index_dir: str, query_encoder: Union[QueryEncoder, str],
                 prebuilt_index_name: Optional[str] = None):
        requires_backends(self, "faiss")
        if not isinstance(query_encoder, str):
            self.query_encoder = query_encoder
        else:
            self.query_encoder = self._init_encoder_from_str(query_encoder)
        self.index, self.docids = self.load_index(index_dir)
        self.dimension = self.index.d
        self.num_docs = self.index.ntotal

        assert self.docids is None or self.num_docs == len(self.docids)
        if prebuilt_index_name:
            sparse_index = get_sparse_index(prebuilt_index_name)
            self.ssearcher = LuceneSearcher.from_prebuilt_index(sparse_index)


MXueguang · 2024-07-31T18:13:15Z

Hi @dayuyang1999,
At search time, for L2-normalized vectors, we assume that the index was built with already-normalized vectors and that the query encoder generates normalized vectors. You can set l2-norm=true when you initialize the query encoder and then pass that query encoder to the searcher.
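To illustrate why normalizing on both sides is enough, here is a minimal pure-Python sketch. It is not Pyserini code: `NormalizingEncoder` and `toy_encode` are hypothetical names made up for this example, and the sketch assumes only that the searcher calls an `encode` method on whatever query encoder it is given. It shows that once the document vector and the query vector are both scaled to unit length, a plain inner product equals the cosine similarity of the original vectors.

```python
import math
from typing import Callable, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, the score an inner-product index would compute."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


class NormalizingEncoder:
    """Hypothetical wrapper that L2-normalizes the output of an encode function."""

    def __init__(self, encode_fn: Callable[[str], Sequence[float]]):
        self._encode_fn = encode_fn

    def encode(self, query: str) -> List[float]:
        return l2_normalize(self._encode_fn(query))


def toy_encode(query: str) -> List[float]:
    """Toy stand-in for a real model: returns an unnormalized vector."""
    return [float(len(query)), 2.0, 1.0]


# A document vector as the model would produce it, and as a normalized index would store it.
doc_raw = [3.0, 4.0, 0.0]
doc_indexed = l2_normalize(doc_raw)

# The query goes through the normalizing encoder before scoring.
encoder = NormalizingEncoder(toy_encode)
query_vec = encoder.encode("abc")

score = inner_product(query_vec, doc_indexed)
reference = cosine(toy_encode("abc"), doc_raw)
print(abs(score - reference) < 1e-9)
```

If only the index were normalized and the query were not, the scores would be scaled by the query norm. That leaves the ranking for a single query unchanged but makes scores incomparable across queries, which is one reason to normalize in the encoder as suggested above.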
