[Backport 2.16] Add metadata fields for mappings (content gap initiative) #8125

Merged
merged 1 commit into from
Aug 29, 2024
182 changes: 85 additions & 97 deletions _field-types/index.md
@@ -12,43 +12,77 @@

# Mappings and field types

You can define how documents and their fields are stored and indexed by creating a _mapping_. The mapping specifies the list of fields for a document. Every field in the document has a _field type_, which defines the type of data the field contains. For example, you may want to specify that the `year` field should be of type `date`. To learn more, see [Supported field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/).
Mappings tell OpenSearch how to store and index your documents and their fields. You can specify the data type for each field (for example, `year` as `date`) to make storage and querying more efficient.

If you're just starting to build out your cluster and data, you may not know exactly how your data should be stored. In those cases, you can use dynamic mappings, which tell OpenSearch to dynamically add data and its fields. However, if you know exactly what types your data falls under and want to enforce that standard, then you can use explicit mappings.
While [dynamic mappings](#dynamic-mapping) automatically add new data and fields, using explicit mappings is recommended. Explicit mappings let you define the exact structure and data types upfront. This helps to maintain data consistency and optimize performance, especially for large datasets or high-volume indexing operations.

For example, if you want to indicate that `year` should be of type `text` instead of an `integer`, and `age` should be an `integer`, you can do so with explicit mappings. By using dynamic mapping, OpenSearch might interpret both `year` and `age` as integers.
For example, with explicit mappings, you can ensure that `year` is treated as text and `age` as an integer instead of both being interpreted as integers by dynamic mapping.

This section provides an example of how to create an index mapping and how to add a document to the index so that its `ip_range` value is validated.

#### Table of contents
1. TOC
{:toc}


---
## Dynamic mapping

When you index a document, OpenSearch adds fields automatically with dynamic mapping. You can also explicitly add fields to an index mapping.

#### Dynamic mapping types
### Dynamic mapping types

Type | Description
:--- | :---
null | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if that field has no values.
boolean | OpenSearch accepts `true` and `false` as boolean values. An empty string is equal to `false`.
float | A single-precision 32-bit floating point number.
double | A double-precision 64-bit floating point number.
integer | A signed 32-bit number.
object | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`.
array | Arrays in OpenSearch can only store values of one type, such as an array of just integers or strings. Empty arrays are treated as though they are fields with no values.
text | A string sequence of characters that represent full-text values.
keyword | A string sequence of structured characters, such as an email address or ZIP code.
`null` | A `null` field can't be indexed or searched. When a field is set to null, OpenSearch behaves as if the field has no value.
`boolean` | OpenSearch accepts `true` and `false` as Boolean values. An empty string is equal to `false`.
`float` | A single-precision, 32-bit floating-point number.
`double` | A double-precision, 64-bit floating-point number.
`integer` | A signed 32-bit number.
`object` | Objects are standard JSON objects, which can have fields and mappings of their own. For example, a `movies` object can have additional properties such as `title`, `year`, and `director`.
`array` | OpenSearch does not have a specific array data type. Arrays are represented as a set of values of the same data type (for example, integers or strings) associated with a field. When indexing, you can pass multiple values for a field, and OpenSearch will treat it as an array. Empty arrays are valid and recognized as array fields with zero elements---not as fields with no values. OpenSearch supports querying and filtering arrays, including checking for values, range queries, and array operations like concatenation and intersection. Nested arrays, which may contain complex objects or other arrays, can also be used for advanced data modeling.
`text` | A string sequence of characters that represent full-text values.
`keyword` | A string sequence of structured characters, such as an email address or ZIP code.
date detection string | Enabled by default. If a new string field matches a date format, the string is processed as a `date` field. For example, `date: "2012/03/11"` is processed as a date.
numeric detection string | If disabled, OpenSearch may automatically process numeric values as strings when they should be processed as numbers. When enabled, OpenSearch can process strings into `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`, and `unsigned_long`. Default is disabled.
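
As a minimal sketch of how these two detection behaviors might be configured, the following request creates an index with numeric detection turned on and date detection turned off. The index name `sample-index2` is an assumption made for illustration; the `numeric_detection` and `date_detection` parameters are set at the top level of `mappings`:

```json
PUT sample-index2
{
  "mappings": {
    "numeric_detection": true,
    "date_detection": false
  }
}
```
{% include copy-curl.html %}

With these settings, a new string value such as `"42"` would be expected to map to a numeric type, while a value such as `"2012/03/11"` would remain a string.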

### Dynamic templates

Dynamic templates are used to define custom mappings for dynamically added fields based on the data type, field name, or field path. They allow you to define a flexible schema for your data that can automatically adapt to changes in the structure or format of the input data.

You can use the following syntax to define a dynamic mapping template:

```json
PUT index
{
"mappings": {
"dynamic_templates": [
{
"fields": {
"mapping": {
"type": "short"
},
"match_mapping_type": "string",
"path_match": "status*"
}
}
]
}
}
```
{% include copy-curl.html %}

This mapping configuration dynamically maps any field with a name starting with `status` (for example, `status_code`) to the `short` data type if the initial value provided during indexing is a string.
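
The behavior described above can be checked with a request sequence like the following sketch. It assumes that the `index` index from the previous request exists; the document ID `1` and the value `"200"` are made up for illustration:

```json
PUT index/_doc/1
{
  "status_code": "200"
}
```
{% include copy-curl.html %}

You can then retrieve the mapping that was generated for the new field:

```json
GET index/_mapping/field/status_code
```
{% include copy-curl.html %}

If the template was applied, the returned mapping for `status_code` should show the `short` type.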

### Dynamic mapping parameters

The `dynamic_templates` setting supports the following parameters for matching conditions and mapping rules. Unless otherwise noted, the default value is `null`.

Parameter | Description
:--- | :---
`match_mapping_type` | Specifies the JSON data type (for example, string, long, double, object, binary, Boolean, date) that triggers the mapping.
`match` | A pattern used to match field names and apply the mapping. The pattern is interpreted according to `match_pattern`.
`unmatch` | A pattern used to exclude field names from the mapping. The pattern is interpreted according to `match_pattern`.
`match_pattern` | Determines the pattern matching behavior, either `regex` (regular expressions) or `simple` (wildcard patterns such as `status*`). Default is `simple`.
`path_match` | Allows you to match the full dotted path of nested fields using a pattern.
`path_unmatch` | Excludes nested field paths from the mapping using a pattern.
`mapping` | The mapping configuration to apply.
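
The following sketch combines several of these parameters in one template. The index name `sample-index3`, the template name `long_counters`, and the field name patterns are hypothetical, and the example assumes the default `simple` value of `match_pattern`, under which `*` acts as a wildcard:

```json
PUT sample-index3
{
  "mappings": {
    "dynamic_templates": [
      {
        "long_counters": {
          "match_mapping_type": "long",
          "match": "*_count",
          "unmatch": "tmp_*",
          "mapping": {
            "type": "integer"
          }
        }
      }
    ]
  }
}
```
{% include copy-curl.html %}

Under these assumptions, a new whole-number field named `view_count` would be mapped as `integer`, while a field named `tmp_view_count` would be excluded by `unmatch` and fall back to the default dynamic mapping.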

## Explicit mapping

If you know exactly what your field data types need to be, you can specify them in your request body when creating your index.
If you know exactly which field data types you need to use, then you can specify them in your request body when creating your index, as shown in the following example request:

```json
PUT sample-index1
@@ -62,17 +96,19 @@
}
}
```
{% include copy-curl.html %}

### Response
#### Response
```json
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "sample-index1"
}
```
{% include copy-curl.html %}

To add mappings to an existing index or data stream, you can send a request to the `_mapping` endpoint using the `PUT` or `POST` HTTP method:
To add mappings to an existing index or data stream, you can send a request to the `_mapping` endpoint using the `PUT` or `POST` HTTP method, as shown in the following example request:

```json
POST sample-index1/_mapping
@@ -84,84 +120,29 @@
}
}
```
{% include copy-curl.html %}

You cannot change the mapping of an existing field; you can only modify the field's mapping parameters.
{: .note}

---
## Mapping example usage
## Mapping parameters

The following example shows how to create a mapping to specify that OpenSearch should ignore any documents with malformed IP addresses that do not conform to the [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) data type. You accomplish this by setting the `ignore_malformed` parameter to `true`.
Mapping parameters are used to configure the behavior of index fields. See [Mappings and field types]({{site.url}}{{site.baseurl}}/field-types/) for more information.

### Create an index with an `ip` mapping
## Mapping limit settings

To create an index, use a PUT request:
OpenSearch enforces limits on index mappings through the settings listed in the following table. You can configure these settings based on your requirements.

```json
PUT /test-index
{
"mappings" : {
"properties" : {
"ip_address" : {
"type" : "ip",
"ignore_malformed": true
}
}
}
}
```

You can add a document that has a malformed IP address to your index:

```json
PUT /test-index/_doc/1
{
"ip_address" : "malformed ip address"
}
```

This indexed IP address does not throw an error because `ignore_malformed` is set to true.

You can query the index using the following request:

```json
GET /test-index/_search
```
| Setting | Default value | Allowed value | Type | Description |
|-|-|-|-|-|
| `index.mapping.nested_fields.limit` | 50 | [0,) | Dynamic | Limits the maximum number of nested fields that can be defined in an index mapping. |
| `index.mapping.nested_objects.limit` | 10,000 | [0,) | Dynamic | Limits the maximum number of nested objects that can be created in a single document. |
| `index.mapping.total_fields.limit` | 1,000 | [0,) | Dynamic | Limits the maximum number of fields that can be defined in an index mapping. |
| `index.mapping.depth.limit` | 20 | [1,100] | Dynamic | Limits the maximum depth of nested objects and nested fields that can be defined in an index mapping. |
| `index.mapping.field_name_length.limit` | 50,000 | [1,50000] | Dynamic | Limits the maximum length of field names that can be defined in an index mapping. |
| `index.mapper.dynamic` | true | {true,false} | Dynamic | Determines whether new fields should be dynamically added to a mapping. |
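
Because the settings in the preceding table are listed as dynamic, they can presumably be changed on an existing index through the `_settings` endpoint. The following sketch raises the total field limit for `sample-index1`; the value `2000` is an arbitrary example, not a recommendation:

```json
PUT sample-index1/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```
{% include copy-curl.html %}

Raising a limit only permits larger mappings. It does not change the mappings that already exist in the index.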

Check failure on line 143 in _field-types/index.md


GitHub Actions / vale

[vale] _field-types/index.md#L143

[OpenSearch.SpacingPunctuation] There should be no space before and one space after the punctuation mark in 'true,false'.
Raw output
{"message": "[OpenSearch.SpacingPunctuation] There should be no space before and one space after the punctuation mark in 'true,false'.", "location": {"path": "_field-types/index.md", "range": {"start": {"line": 143, "column": 36}}}, "severity": "ERROR"}

The response shows that the `ip_address` field is ignored in the indexed document:

```json
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "test-index",
"_id": "1",
"_score": 1,
"_ignored": [
"ip_address"
],
"_source": {
"ip_address": "malformed ip address"
}
}
]
}
}
```
---

## Get a mapping

@@ -170,29 +151,31 @@
```json
GET <index>/_mapping
```
{% include copy-curl.html %}

In the above request, `<index>` may be an index name or a comma-separated list of index names.
In the previous request, `<index>` may be an index name or a comma-separated list of index names.

To get all mappings for all indexes, use the following request:

```json
GET _mapping
```
{% include copy-curl.html %}

To get a mapping for a specific field, provide the index name and the field name:

```json
GET _mapping/field/<fields>
GET /<index>/_mapping/field/<fields>
```
{% include copy-curl.html %}

Both `<index>` and `<fields>` can be specified as one value or a comma-separated list.

For example, the following request retrieves the mapping for the `year` and `age` fields in `sample-index1`:
Both `<index>` and `<fields>` can be specified as either one value or a comma-separated list. For example, the following request retrieves the mapping for the `year` and `age` fields in `sample-index1`:

```json
GET sample-index1/_mapping/field/year,age
```
{% include copy-curl.html %}

The response contains the specified fields:

@@ -220,3 +203,8 @@
}
}
```
{% include copy-curl.html %}

## Mappings use cases

See [Mappings use cases]({{site.url}}{{site.baseurl}}/field-types/mappings-use-cases/) for use case examples, including examples of mapping string fields and ignoring malformed IP addresses.
122 changes: 122 additions & 0 deletions _field-types/mappings-use-cases.md
@@ -0,0 +1,122 @@
---
layout: default
title: Mappings use cases
parent: Mappings and field types
nav_order: 5
nav_exclude: true
---

# Mappings use cases

Mappings provide control over how data is indexed and queried, enabling optimized performance and efficient storage for a range of use cases.

---

## Example: Ignoring malformed IP addresses

The following example shows you how to create a mapping specifying that OpenSearch should ignore malformed IP address values that do not conform to the [`ip`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/ip/) data type instead of rejecting the document that contains them. You can accomplish this by setting the `ignore_malformed` parameter to `true`.

### Create an index with an `ip` mapping

To create an index with an `ip` mapping, use a PUT request:

```json
PUT /test-index
{
"mappings" : {
"properties" : {
"ip_address" : {
"type" : "ip",
"ignore_malformed": true
}
}
}
}
```
{% include copy-curl.html %}

Then add a document with a malformed IP address:

```json
PUT /test-index/_doc/1
{
"ip_address" : "malformed ip address"
}
```
{% include copy-curl.html %}

When you query the index, the `ip_address` field will be ignored. You can query the index using the following request:

```json
GET /test-index/_search
```
{% include copy-curl.html %}

#### Response

```json
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "test-index",
"_id": "1",
"_score": 1,
"_ignored": [
"ip_address"
],
"_source": {
"ip_address": "malformed ip address"
}
}
]
}
}
```
{% include copy-curl.html %}

---

## Mapping string fields to `text` and `keyword` types

To create an index named `movies1` with a dynamic template that maps all string fields to both the `text` and `keyword` types, you can use the following request:

```json
PUT movies1
{
"mappings": {
"dynamic_templates": [
{
"strings": {
"match_mapping_type": "string",
"mapping": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
]
}
}
```
{% include copy-curl.html %}

This dynamic template ensures that any string fields in your documents will be indexed as both a full-text `text` type and a `keyword` type.
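
To see the effect of the template, you could index a document and then query the `keyword` subfield, as in the following sketch. The `title` field, its value, and the document ID are hypothetical and are used only for illustration:

```json
PUT movies1/_doc/1
{
  "title": "Example Movie"
}
```
{% include copy-curl.html %}

An exact-match `term` query against the `title.keyword` subfield should then find the document, while full-text queries can continue to use the `title` field:

```json
GET movies1/_search
{
  "query": {
    "term": {
      "title.keyword": "Example Movie"
    }
  }
}
```
{% include copy-curl.html %}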