Add has_child query (#8354)
* Add has_child query

Signed-off-by: Fanit Kolchina <[email protected]>

* Rename parameter table header

Signed-off-by: Fanit Kolchina <[email protected]>

* Update _query-dsl/joining/has-child.md

Co-authored-by: Naarcha-AWS <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Naarcha-AWS <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
3 people committed Sep 24, 2024
1 parent d9829d7 commit 19cdfa3
Showing 7 changed files with 275 additions and 15 deletions.
2 changes: 1 addition & 1 deletion _field-types/supported-field-types/join.md
@@ -61,7 +61,7 @@ PUT testindex1/_doc/1
```
{% include copy-curl.html %}

-When indexing child documents, you have to specify the `routing` query parameter because parent and child documents in the same relation have to be indexed on the same shard. Each child document refers to its parent's ID in the `parent` field.
+When indexing child documents, you need to specify the `routing` query parameter because parent and child documents in the same parent/child hierarchy must be indexed on the same shard. For more information, see [Routing]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/routing/). Each child document refers to its parent's ID in the `parent` field.

Index two child documents, one for each parent:

6 changes: 3 additions & 3 deletions _query-dsl/geo-and-xy/geo-bounding-box.md
@@ -173,11 +173,11 @@ GET testindex1/_search
```
{% include copy-curl.html %}

-## Request fields
+## Parameters

-Geo-bounding box queries accept the following fields.
+Geo-bounding box queries accept the following parameters.

-Field | Data type | Description
+Parameter | Data type | Description
:--- | :--- | :---
`_name` | String | The name of the filter. Optional.
`validation_method` | String | The validation method. Valid values are `IGNORE_MALFORMED` (accept geopoints with invalid coordinates), `COERCE` (try to coerce coordinates to valid values), and `STRICT` (return an error when coordinates are invalid). Default is `STRICT`.
6 changes: 3 additions & 3 deletions _query-dsl/geo-and-xy/geodistance.md
@@ -103,11 +103,11 @@ The response contains the matching document:
}
```

-## Request fields
+## Parameters

-Geodistance queries accept the following fields.
+Geodistance queries accept the following parameters.

-Field | Data type | Description
+Parameter | Data type | Description
:--- | :--- | :---
`_name` | String | The name of the filter. Optional.
`distance` | String | The distance within which to match the points. This distance is the radius of a circle centered at the specified point. For supported distance units, see [Distance units]({{site.url}}{{site.baseurl}}/api-reference/common-parameters/#distance-units). Required.
6 changes: 3 additions & 3 deletions _query-dsl/geo-and-xy/geopolygon.md
@@ -161,11 +161,11 @@ However, if you specify the vertices in the following order:

The response returns no results.

-## Request fields
+## Parameters

-Geopolygon queries accept the following fields.
+Geopolygon queries accept the following parameters.

-Field | Data type | Description
+Parameter | Data type | Description
:--- | :--- | :---
`_name` | String | The name of the filter. Optional.
`validation_method` | String | The validation method. Valid values are `IGNORE_MALFORMED` (accept geopoints with invalid coordinates), `COERCE` (try to coerce coordinates to valid values), and `STRICT` (return an error when coordinates are invalid). Optional. Default is `STRICT`.
6 changes: 3 additions & 3 deletions _query-dsl/geo-and-xy/geoshape.md
@@ -721,10 +721,10 @@ The response returns document 1:

Note that when you indexed the geopoints, you specified their coordinates in `"latitude, longitude"` format. When you search for matching documents, the coordinate array is in `[longitude, latitude]` format. Thus, document 1 is returned in the results but document 2 is not.

-## Request fields
+## Parameters

-Geoshape queries accept the following fields.
+Geoshape queries accept the following parameters.

-Field | Data type | Description
+Parameter | Data type | Description
:--- | :--- | :---
`ignore_unmapped` | Boolean | Specifies whether to ignore an unmapped field. If set to `true`, then the query does not return any documents that contain an unmapped field. If set to `false`, then an exception is thrown when the field is unmapped. Optional. Default is `false`.
259 changes: 259 additions & 0 deletions _query-dsl/joining/has-child.md
@@ -0,0 +1,259 @@
---
layout: default
title: Has child
parent: Joining queries
nav_order: 10
---

# Has child query

The `has_child` query returns parent documents whose child documents match a specific query. You can establish parent-child relationships between documents in the same index by using a [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) field type.

The `has_child` query is slower than other queries because of the join operation it performs. Performance decreases as the number of matching child documents pointing to different parent documents increases. Each `has_child` query in your search may significantly impact query performance. If you prioritize speed, avoid using this query or limit its usage as much as possible.
{: .warning}

## Example

Before you can run a `has_child` query, your index must contain a [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) field in order to establish parent-child relationships. The index mapping request uses the following format:

```json
PUT /example_index
{
  "mappings": {
    "properties": {
      "relationship_field": {
        "type": "join",
        "relations": {
          "parent_doc": "child_doc"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

In this example, you'll configure an index that contains documents representing products and their brands.

First, create the index and establish the parent-child relationship between `brand` and `product`:

```json
PUT testindex1
{
  "mappings": {
    "properties": {
      "product_to_brand": {
        "type": "join",
        "relations": {
          "brand": "product"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Index two parent (brand) documents:

```json
PUT testindex1/_doc/1
{
  "name": "Luxury brand",
  "product_to_brand": "brand"
}
```
{% include copy-curl.html %}

```json
PUT testindex1/_doc/2
{
  "name": "Economy brand",
  "product_to_brand": "brand"
}
```
{% include copy-curl.html %}

Index three child (product) documents:

```json
PUT testindex1/_doc/3?routing=1
{
  "name": "Mechanical watch",
  "sales_count": 150,
  "product_to_brand": {
    "name": "product",
    "parent": "1"
  }
}
```
{% include copy-curl.html %}

```json
PUT testindex1/_doc/4?routing=2
{
  "name": "Electronic watch",
  "sales_count": 300,
  "product_to_brand": {
    "name": "product",
    "parent": "2"
  }
}
```
{% include copy-curl.html %}

```json
PUT testindex1/_doc/5?routing=2
{
  "name": "Digital watch",
  "sales_count": 100,
  "product_to_brand": {
    "name": "product",
    "parent": "2"
  }
}
```
{% include copy-curl.html %}

To search for the parent of a child, use a `has_child` query. The following query returns parent documents (brands) that make watches:

```json
GET testindex1/_search
{
  "query": {
    "has_child": {
      "type": "product",
      "query": {
        "match": {
          "name": "watch"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

The response returns both brands:

```json
{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "testindex1",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Luxury brand",
          "product_to_brand": "brand"
        }
      },
      {
        "_index": "testindex1",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "Economy brand",
          "product_to_brand": "brand"
        }
      }
    ]
  }
}
```

## Parameters

The following table lists all top-level parameters supported by `has_child` queries.

| Parameter | Required/Optional | Description |
|:---|:---|:---|
| `type` | Required | Specifies the name of the child relationship as defined in the `join` field mapping. |
| `query` | Required | The query to run on child documents. If a child document matches the query, the parent document is returned. |
| `ignore_unmapped` | Optional | Indicates whether to ignore an unmapped `type` and return no documents instead of throwing an error. You can provide this parameter when querying multiple indexes, some of which may not define the specified `type`. Default is `false`. |
| `max_children` | Optional | The maximum number of matching child documents for a parent document. If exceeded, the parent document is excluded from the search results. |
| `min_children` | Optional | The minimum number of matching child documents required for a parent document to be included in the results. If not met, the parent is excluded. Default is `1`.|
| `score_mode` | Optional | Defines how scores of matching child documents influence the parent document's score. Valid values are: <br> - `none`: Ignores the relevance scores of child documents and assigns a score of `0` to the parent document. <br> - `avg`: Uses the average relevance score of all matching child documents. <br> - `max`: Assigns the highest relevance score from the matching child documents to the parent. <br> - `min`: Assigns the lowest relevance score from the matching child documents to the parent. <br> - `sum`: Sums the relevance scores of all matching child documents. <br> Default is `none`. |

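As an illustrative sketch that is not part of the committed page, the `min_children` parameter from the preceding table can be applied to the sample data indexed earlier. Assuming only the three product documents shown in the example exist, the Economy brand is the only parent with two products whose `name` matches `watch`, so the following request should return that brand alone:

```json
GET testindex1/_search
{
  "query": {
    "has_child": {
      "type": "product",
      "min_children": 2,
      "query": {
        "match": {
          "name": "watch"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Under the same assumption, replacing `"min_children": 2` with `"max_children": 1` should return only the Luxury brand, because a parent whose number of matching children exceeds the limit is excluded.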

## Sorting limitations

The `has_child` query does not support [sorting results]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/sort/) using standard sorting options. If you need to sort parent documents by fields in their child documents, you can use a [`function_score` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/function-score/) and sort by the parent document's score.

In the preceding example, you can sort parent documents (brands) based on the `sales_count` of their child products. This query multiplies the score by the `sales_count` field of the child documents and assigns the highest relevance score from the matching child documents to the parent:

```json
GET testindex1/_search
{
  "query": {
    "has_child": {
      "type": "product",
      "query": {
        "function_score": {
          "script_score": {
            "script": "_score * doc['sales_count'].value"
          }
        }
      },
      "score_mode": "max"
    }
  }
}
```
{% include copy-curl.html %}

The response contains the brands sorted by the highest child `sales_count`:

```json
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 300,
    "hits": [
      {
        "_index": "testindex1",
        "_id": "2",
        "_score": 300,
        "_source": {
          "name": "Economy brand",
          "product_to_brand": "brand"
        }
      },
      {
        "_index": "testindex1",
        "_id": "1",
        "_score": 150,
        "_source": {
          "name": "Luxury brand",
          "product_to_brand": "brand"
        }
      }
    ]
  }
}
```
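
A variation on the preceding request, offered as a sketch rather than as part of the committed page: changing `score_mode` to `sum` adds the child scores together, so each brand is scored by the total `sales_count` of its products instead of by its best-selling product. Assuming the same sample data, the Economy brand should score 400 (300 + 100) and the Luxury brand 150:

```json
GET testindex1/_search
{
  "query": {
    "has_child": {
      "type": "product",
      "query": {
        "function_score": {
          "script_score": {
            "script": "_score * doc['sales_count'].value"
          }
        }
      },
      "score_mode": "sum"
    }
  }
}
```
{% include copy-curl.html %}

The order of the hits would be the same as with `max` for this data set, but the two modes can rank parents differently when a brand has many low-selling products.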
5 changes: 3 additions & 2 deletions _query-dsl/joining/index.md
@@ -3,16 +3,17 @@ layout: default
title: Joining queries
has_children: true
nav_order: 55
+has_toc: false
---

# Joining queries

OpenSearch is a distributed system in which data is spread across multiple nodes. Thus, running a SQL-like JOIN operation in OpenSearch is resource intensive. As an alternative, OpenSearch provides the following queries that perform join operations and are optimized for scaling across multiple nodes:

- `nested` queries: Act as wrappers for other queries to search [nested]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/) fields. The nested field objects are searched as though they were indexed as separate documents.
-- `has_child` queries: Search for parent documents whose child documents match the query.
+- [`has_child`]({{site.url}}{{site.baseurl}}/query-dsl/joining/has-child/) queries: Search for parent documents whose child documents match the query.
- `has_parent` queries: Search for child documents whose parent documents match the query.
-- `parent_id` queries: A [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/) field type establishes a parent/child relationship between documents in the same index. `parent_id` queries search for child documents that are joined to a specific parent document.
+- `parent_id` queries: A [join]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/) field type establishes a parent/child relationship between documents in the same index. `parent_id` queries search for child documents that are joined to a specific parent document.

If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, then joining queries are not executed.
{: .important}
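
For reference, and not as part of the committed page, `search.allow_expensive_queries` is a dynamic cluster setting. A minimal sketch of enabling it, assuming that you have permission to update cluster settings and that you want the change to survive a cluster restart (hence `persistent`):

```json
PUT _cluster/settings
{
  "persistent": {
    "search.allow_expensive_queries": true
  }
}
```
{% include copy-curl.html %}

Setting the value to `false` in the same request body would prevent joining queries such as `has_child` from running.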
