Adding documentation for filter search in OpenSearch #7900

316 changes: 316 additions & 0 deletions _search-plugins/filter-search.md
@@ -0,0 +1,316 @@
---
layout: default
title: Filter search results
nav_order: 36
---

# Filter search results

In OpenSearch, you can filter search results in two main ways. The first is to use a [DSL Boolean query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html) with a `filter` clause. This approach applies filters to both search hits and aggregations.

You can also filter search results with the `post_filter` parameter in the search API, which applies filters only to search hits, not aggregations.

#### Table of contents
1. TOC
{:toc}

---

## Using `post_filter` to filter search results

Using the `post_filter` parameter lets you calculate aggregations on a broader result set before narrowing down the search hits. You can also rescore the hits that remain after the post filter is applied to improve relevance and reorder results.

### Example of filtering search results

1. Create an index of products:

```
PUT /electronics
{
  "mappings": {
    "properties": {
      "brand": { "type": "keyword" },
      "category": { "type": "keyword" },
      "price": { "type": "float" },
      "features": { "type": "keyword" }
    }
  }
}
```

2. Index data:

```
PUT /electronics/_doc/1?refresh
{
  "brand": "BrandX",
  "category": "Smartphone",
  "price": 699.99,
  "features": ["5G", "Dual Camera"]
}

PUT /electronics/_doc/2?refresh
{
  "brand": "BrandX",
  "category": "Laptop",
  "price": 1199.99,
  "features": ["Touchscreen", "16GB RAM"]
}

PUT /electronics/_doc/3?refresh
{
  "brand": "BrandY",
  "category": "Smartphone",
  "price": 799.99,
  "features": ["5G", "Triple Camera"]
}
```
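
The three documents above can also be sent in a single request through the `_bulk` API. The following Python sketch only builds the newline-delimited request body and does not contact a cluster. The helper name `build_bulk_body` is hypothetical and chosen for illustration.

```python
import json

# Sample documents from the step above, keyed by document ID.
DOCS = {
    "1": {"brand": "BrandX", "category": "Smartphone", "price": 699.99,
          "features": ["5G", "Dual Camera"]},
    "2": {"brand": "BrandX", "category": "Laptop", "price": 1199.99,
          "features": ["Touchscreen", "16GB RAM"]},
    "3": {"brand": "BrandY", "category": "Smartphone", "price": 799.99,
          "features": ["5G", "Triple Camera"]},
}


def build_bulk_body(index, docs):
    """Return an NDJSON body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("electronics", DOCS)
print(body)
```

The resulting body could then be sent with `POST /_bulk?refresh` and the `application/x-ndjson` content type.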

3. Use a Boolean query with a `filter` clause to show only smartphones from BrandX:

```
GET /electronics/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "brand": "BrandX" }},
        { "term": { "category": "Smartphone" }}
      ]
    }
  }
}
```

To refine search results further, you can add a `terms` aggregation. For example, a category field might let users limit their search results to BrandX smartphones or tablets:

```
GET /electronics/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "brand": "BrandX" }},
        { "term": { "category": "Smartphone" }}
      ]
    }
  },
  "aggs": {
    "categories": {
      "terms": { "field": "category" }
    }
  }
}
```

Because the Boolean filter applies to both hits and aggregations, the `categories` aggregation is calculated only over BrandX smartphones. With the sample data, it returns a single `Smartphone` bucket.
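
To make the scoping concrete, the following pure-Python sketch mimics this query on the three sample documents. It illustrates the semantics only and is not how OpenSearch executes the query. The helper `term_filter` is hypothetical.

```python
from collections import Counter

# The three sample documents, reduced to the fields used here.
DOCS = [
    {"id": 1, "brand": "BrandX", "category": "Smartphone"},
    {"id": 2, "brand": "BrandX", "category": "Laptop"},
    {"id": 3, "brand": "BrandY", "category": "Smartphone"},
]


def term_filter(docs, field, value):
    """Keep documents whose field equals value exactly, like a term query on a keyword field."""
    return [d for d in docs if d[field] == value]


# Every clause in the bool filter must match.
matched = term_filter(term_filter(DOCS, "brand", "BrandX"), "category", "Smartphone")

# The terms aggregation counts categories over the matched documents only.
categories = Counter(d["category"] for d in matched)

print([d["id"] for d in matched])  # [1]
print(dict(categories))            # {'Smartphone': 1}
```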

To display how many BrandX products are available in different price ranges, use a `post_filter`:

```
GET /electronics/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "BrandX" }
      }
    }
  },
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 500 },
          { "from": 500, "to": 1000 },
          { "from": 1000 }
        ]
      }
    },
    "category_smartphone": {
      "filter": {
        "term": { "category": "Smartphone" }
      },
      "aggs": {
        "price_ranges": {
          "range": {
            "field": "price",
            "ranges": [
              { "to": 500 },
              { "from": 500, "to": 1000 },
              { "from": 1000 }
            ]
          }
        }
      }
    }
  },
  "post_filter": {
    "term": { "category": "Smartphone" }
  }
}
```
This query finds all products from BrandX. The top-level `price_ranges` aggregation returns price ranges for all BrandX products. The `category_smartphone` aggregation calculates the same price ranges for BrandX smartphones only. The `post_filter` then narrows the search hits to smartphones without affecting either aggregation.
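
The order of operations can be sketched in plain Python on the three sample documents. This is an illustration of the semantics under simplifying assumptions (a single shard, in-memory data), not OpenSearch's implementation. The helper `price_ranges` is hypothetical.

```python
DOCS = [
    {"id": 1, "brand": "BrandX", "category": "Smartphone", "price": 699.99},
    {"id": 2, "brand": "BrandX", "category": "Laptop", "price": 1199.99},
    {"id": 3, "brand": "BrandY", "category": "Smartphone", "price": 799.99},
]

# Bucket bounds from the request: "from" is inclusive, "to" is exclusive.
RANGES = [(None, 500), (500, 1000), (1000, None)]


def price_ranges(docs):
    """Count documents per price bucket, like the range aggregation."""
    counts = []
    for low, high in RANGES:
        counts.append(sum(
            1 for d in docs
            if (low is None or d["price"] >= low)
            and (high is None or d["price"] < high)
        ))
    return counts


# 1. The query selects all BrandX products.
query_matches = [d for d in DOCS if d["brand"] == "BrandX"]

# 2. Aggregations run over the query matches, before the post filter.
all_brandx = price_ranges(query_matches)
smartphones_only = price_ranges(
    [d for d in query_matches if d["category"] == "Smartphone"]
)

# 3. The post filter narrows only the returned hits.
hits = [d["id"] for d in query_matches if d["category"] == "Smartphone"]

print(all_brandx)        # [0, 1, 1]
print(smartphones_only)  # [0, 1, 0]
print(hits)              # [1]
```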

### Rescoring filtered search results

Rescoring improves the relevance of the returned search results by applying a second, typically more expensive, query to only the top results rather than to the entire dataset. Each shard processes the rescore request before the coordinating node aggregates and sorts the final results.

Example of using a rescore query:

```
GET /electronics/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "brand": "BrandX" }},
        { "term": { "category": "Smartphone" }}
      ]
    }
  },
  "post_filter": {
    "term": { "category": "Smartphone" }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "match": {
          "features": "5G"
        }
      },
      "query_weight": 1.0,
      "rescore_query_weight": 2.0
    }
  }
}
```
In this example, the rescore section reorders the top 50 smartphones from BrandX based on whether their features include "5G".

When using pagination, avoid changing `window_size` from one page to the next. A different window can change which hits are rescored, so results may shift between pages and confuse users.
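
The pagination caveat can be illustrated with a small Python sketch. It assumes a single shard and a rescore query that simply adds a bonus to matching documents. The document IDs, scores, and the helper `rescore_top` are invented for illustration.

```python
def rescore_top(hits, window_size, bonus):
    """Rescore only the top `window_size` hits and re-sort them.

    hits:  list of (doc_id, score) pairs sorted by score, descending.
    bonus: dict of doc_id -> extra score from the rescore query (hypothetical values).
    Hits outside the window keep their original relative order.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [(doc_id, score + bonus.get(doc_id, 0.0)) for doc_id, score in window]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return [doc_id for doc_id, _ in rescored + rest]


hits = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
bonus = {"c": 5.0}  # only document "c" matches the rescore query

order_w2 = rescore_top(hits, 2, bonus)  # "c" is outside the window
order_w3 = rescore_top(hits, 3, bonus)  # "c" is inside the window

print(order_w2)  # ['a', 'b', 'c', 'd']
print(order_w3)  # ['c', 'a', 'b', 'd']
```

With a page size of 2, a user who loads page 1 using a window of 2 sees `a` and `b`. If page 2 is then requested with a window of 3, it contains `b` and `d`, so `b` appears twice and `c` is never shown.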

### Query rescorer

In OpenSearch, the query rescorer refines search results by applying an additional query to the top results obtained from the initial search. Instead of evaluating every document, the rescorer evaluates only the subset defined by the `window_size` parameter, which defaults to 10. This keeps relevance adjustments inexpensive.

The rescore query’s influence is balanced with the original query through the `query_weight` and `rescore_query_weight` parameters, both set to 1 by default.
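
As a rough illustration of how the two weights interact, the following Python sketch combines an original score with a rescore query score. It assumes the `score_mode` options `total` (the default), `multiply`, `avg`, `max`, and `min`, and assumes that a document in the window that does not match the rescore query keeps its weighted original score. The function `combine` and the sample scores are hypothetical, not OpenSearch code.

```python
def combine(original, rescore, query_weight=1.0, rescore_query_weight=1.0,
            score_mode="total"):
    """Combine the original score with the rescore query score.

    `rescore` is None when the document does not match the rescore query.
    """
    primary = query_weight * original
    if rescore is None:
        return primary
    secondary = rescore_query_weight * rescore
    if score_mode == "total":
        return primary + secondary
    if score_mode == "multiply":
        return primary * secondary
    if score_mode == "avg":
        return (primary + secondary) / 2
    if score_mode == "max":
        return max(primary, secondary)
    if score_mode == "min":
        return min(primary, secondary)
    raise ValueError(f"unknown score_mode: {score_mode}")


# Weights from the earlier example: query_weight 1.0, rescore_query_weight 2.0.
print(combine(1.5, 0.5, query_weight=1.0, rescore_query_weight=2.0))   # 2.5
print(combine(1.5, None, query_weight=1.0, rescore_query_weight=2.0))  # 1.5
```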

#### Query rescorer example

1. Create an index:

```
PUT /articles
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content": { "type": "text" },
      "views": { "type": "integer" }
    }
  }
}
```

2. Add sample documents:

```
POST /articles/_doc/1
{
  "title": "OpenSearch Basics",
  "content": "Learn the basics of OpenSearch with this guide.",
  "views": 150
}

POST /articles/_doc/2
{
  "title": "Advanced OpenSearch Techniques",
  "content": "Explore advanced features and techniques in OpenSearch.",
  "views": 300
}

POST /articles/_doc/3
{
  "title": "OpenSearch Performance Tuning",
  "content": "Optimize the performance of your OpenSearch cluster.",
  "views": 450
}
```

3. Perform a search with the query rescorer:

The following query matches documents that contain "OpenSearch" in the `content` field. It then rescores the top 10 hits with a `match_phrase` query on the same field, giving more weight to phrase matches.

```
POST /articles/_search
{
  "query": {
    "match": {
      "content": "OpenSearch"
    }
  },
  "rescore": {
    "window_size": 10,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "content": {
            "query": "OpenSearch",
            "slop": 2
          }
        }
      },
      "query_weight": 1,
      "rescore_query_weight": 2
    }
  }
}
```

4. Perform a search with multiple rescorers:

This example first applies a phrase match rescorer to the top 10 hits and then a function score rescorer to the top 5 hits, adjusting the final relevance based on the number of views.

```
POST /articles/_search
{
  "query": {
    "match": {
      "content": "OpenSearch"
    }
  },
  "rescore": [
    {
      "window_size": 10,
      "query": {
        "rescore_query": {
          "match_phrase": {
            "content": {
              "query": "OpenSearch",
              "slop": 2
            }
          }
        },
        "query_weight": 0.7,
        "rescore_query_weight": 1.5
      }
    },
    {
      "window_size": 5,
      "query": {
        "score_mode": "multiply",
        "rescore_query": {
          "function_score": {
            "field_value_factor": {
              "field": "views",
              "factor": 1.2,
              "missing": 1
            }
          }
        }
      }
    }
  ]
}
```
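
The chaining of rescorers can be sketched in Python. The sketch assumes a single shard, invented first-pass and phrase-match scores, and that the function score reduces to `1.2 * views` for these documents (no modifier and a base score of 1). The helper `apply_rescorer` is hypothetical. Real scores from a cluster will differ.

```python
# Views per document ID, taken from the sample articles.
VIEWS = {1: 150, 2: 300, 3: 450}

# Hypothetical scores, invented for illustration only.
first_pass = {1: 0.30, 2: 0.25, 3: 0.20}
phrase = {1: 0.30, 2: 0.25, 3: 0.20}


def apply_rescorer(hits, window_size, rescore_fn):
    """Rescore the top `window_size` hits, re-sort them, and keep the rest as is."""
    window, rest = hits[:window_size], hits[window_size:]
    rescored = sorted(
        ((doc_id, rescore_fn(doc_id, score)) for doc_id, score in window),
        key=lambda hit: hit[1],
        reverse=True,
    )
    return rescored + rest


hits = sorted(first_pass.items(), key=lambda hit: hit[1], reverse=True)

# Rescorer 1: default "total" mode with query_weight 0.7, rescore_query_weight 1.5.
hits = apply_rescorer(hits, 10, lambda d, s: 0.7 * s + 1.5 * phrase[d])
order_after_first = [doc_id for doc_id, _ in hits]

# Rescorer 2: "multiply" mode with default weights of 1; it sees rescorer 1's output.
hits = apply_rescorer(hits, 5, lambda d, s: s * (1.2 * VIEWS[d]))
order_after_second = [doc_id for doc_id, _ in hits]

print(order_after_first)   # [1, 2, 3]
print(order_after_second)  # [3, 2, 1]
```

In this sketch the second rescorer reverses the order because multiplying by a value proportional to `views` outweighs the small differences left by the first rescorer.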