[FEATURE] Allow hybrid search querying within the DSL #630

Open
youcandanch opened this issue Dec 14, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@youcandanch

youcandanch commented Dec 14, 2023

Is your feature request related to a problem?

OpenSearch 2.10 introduced hybrid querying, which lets you combine multiple queries and normalize the resulting scores to implement a hybrid search strategy. The low-level client supports this, and the high-level client should be able to support it with something like this:

from opensearchpy import Q, Search

plaintext_query = "hello this is dog"
vector_embeddings = [-0.018416348844766617, -0.02228226698935032, 0.016061007976531982, ...]  # remaining dimensions elided
lexical_search_params = {
    "multi_match": {
        "query": plaintext_query,
        "fields": ["title", "text"],
        "type": "phrase",
    }
}
semantic_search_params = {
    "script_score": {
        "query": {"bool": {}},
        "script": {
            "source": "knn_score",
            "lang": "knn",
            "params": {
                "field": "vectors",
                "query_value": vector_embeddings,
                "space_type": "cosinesimil",
            },
        },
    }
}
search = Search().query(Q({
    'hybrid': {
        'queries': [
            lexical_search_params,
            semantic_search_params
        ]
    }
}))

Not the prettiest code, but it should work! The problem is that constructing the query raises the following error:

  File "hybrid_search_test.py", line 52, in get
    search = Search().query(Q({
  File "lib/python3.8/site-packages/opensearchpy/helpers/query.py", line 49, in Q
    return Query.get_dsl_class(name)(_expand__to_dot=False, **params)
  File "lib/python3.8/site-packages/opensearchpy/helpers/utils.py", line 283, in get_dsl_class
    raise UnknownDslObject(

opensearchpy.exceptions.UnknownDslObject: DSL class `hybrid` does not exist in query.

This prevents hybrid searching via the DSL, and forces dropping down to the low-level client to execute.
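
Until the DSL recognizes `hybrid`, the workaround mentioned above is to build the request body as a plain dictionary and hand it to the low-level client. A minimal sketch of that, assuming a hypothetical helper name `build_hybrid_body` and a short placeholder embedding. The commented-out call at the end also assumes an index name and a search pipeline with a normalization processor already configured, neither of which appears in this issue:

```python
import json


def build_hybrid_body(*sub_queries):
    """Wrap already-built query clauses in a `hybrid` request body.

    Hypothetical helper for illustration; not part of opensearch-py.
    """
    if not sub_queries:
        raise ValueError("a hybrid query needs at least one sub-query")
    return {"query": {"hybrid": {"queries": list(sub_queries)}}}


lexical = {
    "multi_match": {
        "query": "hello this is dog",
        "fields": ["title", "text"],
        "type": "phrase",
    }
}
semantic = {
    "script_score": {
        "query": {"bool": {}},
        "script": {
            "source": "knn_score",
            "lang": "knn",
            "params": {
                "field": "vectors",
                "query_value": [0.1, 0.2, 0.3],  # placeholder embedding
                "space_type": "cosinesimil",
            },
        },
    }
}

body = build_hybrid_body(lexical, semantic)
print(json.dumps(body, indent=2))

# With a configured low-level client, the body would be sent roughly like:
# client.search(index="my-index", body=body,
#               params={"search_pipeline": "my-pipeline"})  # names assumed
```

This keeps the two sub-queries exactly as written in the example above, so moving back to the DSL later should only change how the outer `hybrid` wrapper is built.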

What solution would you like?

Within helpers/query.py, adding:

class Hybrid(Query):
    name = "hybrid"

...works with my example above, but I'm not 100% sure it's the right way to broach adding this to the DSL. I think something like a Search.hybrid_query function might make sense, but haven't dug down that rabbit hole yet.
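
The traceback suggests that `Q` resolves the top-level key of the dict through a name-based class lookup, which would explain why a subclass that only sets `name = "hybrid"` is enough. The sketch below is a simplified stand-in for that mechanism, not opensearch-py's actual implementation: `Query`, `get_dsl_class`, `Q`, and `UnknownDslObject` mirror names from the traceback, while the registry dict and the `__init_subclass__` hook are assumptions made for illustration.

```python
class UnknownDslObject(Exception):
    """Raised when no class is registered under the requested name."""


class Query:
    name = None
    _classes = {}  # assumed registry: query name -> subclass

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name is not None:
            Query._classes[cls.name] = cls

    def __init__(self, **params):
        self._params = params

    @classmethod
    def get_dsl_class(cls, name):
        try:
            return cls._classes[name]
        except KeyError:
            raise UnknownDslObject(
                f"DSL class `{name}` does not exist in query."
            ) from None

    def to_dict(self):
        return {self.name: self._params}


def Q(query_dict):
    """Build a Query from a single-key dict such as {"hybrid": {...}}."""
    if len(query_dict) != 1:
        raise ValueError("expected exactly one top-level query name")
    name, params = next(iter(query_dict.items()))
    return Query.get_dsl_class(name)(**params)


hybrid_dict = {"hybrid": {"queries": [{"match_all": {}}]}}

try:
    Q(hybrid_dict)
except UnknownDslObject as exc:
    print(exc)  # lookup fails while no class is named "hybrid"


class Hybrid(Query):
    name = "hybrid"


print(Q(hybrid_dict).to_dict())  # lookup succeeds once Hybrid is defined
```

Under this reading, the two-line class only fixes the lookup. Whether the sub-queries inside `queries` should also be converted to DSL objects is a separate design question that a dedicated `Search.hybrid_query` function could address.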

@youcandanch added the enhancement and untriaged labels Dec 14, 2023
@youcandanch changed the title from "[FEATURE]" to "[FEATURE] Allow hybrid search querying within the DSL" Dec 14, 2023
@saimedhi removed the untriaged label Dec 14, 2023
@lambda-science

Interested in this

@saimedhi
Collaborator

Please feel free to contribute. Thank you!

@ssharm8-etr

interested in this

@saimedhi
Collaborator

saimedhi commented Apr 1, 2024

@youcandanch, @lambda-science, @ssharm8-etr Your contributions are highly valued and greatly appreciated. Whenever you have a moment, we welcome your input and encourage you to submit a pull request. Thank you!
