Multi word match query minShould #1747

Open
jgschis opened this issue Oct 23, 2022 · 5 comments

@jgschis

jgschis commented Oct 23, 2022

Hi,

Let's say a user enters 5 words into the search bar. I want to return all documents that contain at least 4 of those words. How do I do that in Bleve? It seems that Bleve has only 2 options:

  1. any term (or)
  2. all terms (and)

but nothing in between.

Btw, Elasticsearch can do this:
https://www.elastic.co/guide/en/elasticsearch/guide/current/match-multi-word.html#match-improving-precision

@abhinavdangeti
Member

You'll want to use a disjunction query with the min setting for this. Here's a JSON example:

{
  "query": {
    "disjuncts": [
      {"match": "one"},
      {"match": "two"},
      {"match": "three"},
      {"match": "four"},
      {"match": "five"}
    ],
    "min": 4
  }
}
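
For anyone generating this request from code, here is a minimal sketch that produces the same JSON shape as the example above using only Go's standard library. The type names and the `buildMinShouldRequest` helper are hypothetical, introduced here for illustration; they are not part of Bleve's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchClause mirrors one {"match": "..."} entry from the JSON example.
type matchClause struct {
	Match string `json:"match"`
}

// disjunctionQuery mirrors the {"disjuncts": [...], "min": N} object.
type disjunctionQuery struct {
	Disjuncts []matchClause `json:"disjuncts"`
	Min       int           `json:"min"`
}

// searchRequest wraps the query under a top-level "query" key, as in the example.
type searchRequest struct {
	Query disjunctionQuery `json:"query"`
}

// buildMinShouldRequest (hypothetical helper) encodes one match clause per
// term and requires at least minMatch of them to match.
func buildMinShouldRequest(terms []string, minMatch int) ([]byte, error) {
	if minMatch < 0 || minMatch > len(terms) {
		return nil, fmt.Errorf("min %d out of range for %d terms", minMatch, len(terms))
	}
	clauses := make([]matchClause, 0, len(terms))
	for _, t := range terms {
		clauses = append(clauses, matchClause{Match: t})
	}
	req := searchRequest{Query: disjunctionQuery{Disjuncts: clauses, Min: minMatch}}
	return json.Marshal(req)
}

func main() {
	body, err := buildMinShouldRequest([]string{"one", "two", "three", "four", "five"}, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The range check rejects a min larger than the number of terms, since such a query could never match anything.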

If you were using a query string, doing this would look for any of the 5 words ..

"one two three four five"

.. but will not allow you to set a value for "min".

@abhinavdangeti
Member

abhinavdangeti commented Oct 24, 2022

One more thing: the Elasticsearch section you highlighted is largely supported within Bleve as well.

This query ..

{
  "query": {
    "match": "brown dog",
    "field": "title"
  }
}

.. will look for the occurrence of either brown or dog in the field title (assuming you're using the standard analyzer).
And the following query will look for both brown and dog in the field title:

{
  "query": {
    "match": "brown dog",
    "field": "title",
    "operator": "and"
  }
}

.. but the matching precision is only supported via the min setting within the disjunction query I highlighted earlier.
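
One way to see how the three behaviours relate: "or" corresponds to requiring at least 1 matching term, "and" to requiring all of them, and min to any threshold in between. The sketch below is a toy in-memory model of that counting rule, not Bleve code; `matchesAtLeast` is a hypothetical name, and it assumes the query terms are distinct and already analyzed into tokens.

```go
package main

import "fmt"

// matchesAtLeast reports whether at least minMatch of queryTerms occur in
// docTokens. It assumes queryTerms contains no duplicates.
func matchesAtLeast(docTokens, queryTerms []string, minMatch int) bool {
	inDoc := make(map[string]bool, len(docTokens))
	for _, tok := range docTokens {
		inDoc[tok] = true
	}
	hits := 0
	for _, term := range queryTerms {
		if inDoc[term] {
			hits++
		}
	}
	return hits >= minMatch
}

func main() {
	title := []string{"the", "quick", "brown", "fox"}
	terms := []string{"brown", "dog"}

	// "or" behaviour: one matching term is enough.
	fmt.Println(matchesAtLeast(title, terms, 1)) // prints true

	// "and" behaviour: every term must match.
	fmt.Println(matchesAtLeast(title, terms, len(terms))) // prints false
}
```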

@jgschis
Author

jgschis commented Oct 24, 2022

The problem with what you are saying is that I have to split the user's search query into subqueries myself. But the search-time analyser should be doing it, as only it knows how to do the splitting correctly. I can assume that the query should be split on whitespace, but this will not always be true -- it depends on the analyser.

But 99% of the time, it will be true, so it should be good enough for government work.
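
The whitespace assumption described here can be sketched in a few lines of Go. This is only an approximation of what an analyser would do (no stemming, lowercasing, or stop-word removal), and `splitAndMin` and its `allowMissing` parameter are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAndMin splits input on whitespace and returns the terms together with
// the number of terms that must match when up to allowMissing may be absent.
// The result is clamped to the range [1, len(terms)], or 0 for empty input.
func splitAndMin(input string, allowMissing int) ([]string, int) {
	terms := strings.Fields(input)
	if len(terms) == 0 {
		return terms, 0
	}
	minMatch := len(terms) - allowMissing
	if minMatch < 1 {
		minMatch = 1
	}
	if minMatch > len(terms) {
		minMatch = len(terms)
	}
	return terms, minMatch
}

func main() {
	terms, minMatch := splitAndMin("one two three four five", 1)
	fmt.Println(terms, minMatch) // prints [one two three four five] 4
}
```

Each returned term would become one match clause in the disjunction query, with the returned count used as its min value.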

But it would be good if a match query had a parameter that allowed you to specify the precision. Maybe I could implement it, but I have no familiarity with Go or Bleve.

@abhinavdangeti
Member

Yes, an analyzer is responsible for tokenizing (on unicode or whitespace or something else), stemming, replacing and/or applying other filtering to text. This applies at index time and also at query time, but only for analytic queries (match, match_phrase).

However, an analyzer will not handle an at-least-this-many-matches situation.

The matching precision offered by Elasticsearch is not directly supported by Bleve at the moment (we'll consider supporting it for a future release). The current workaround is to use a disjunction query with the min setting.

@jgschis
Author

jgschis commented Oct 25, 2022

I could look into doing this...I will.
