Skip to content

Commit

Permalink
Merge branch 'main' into 20230825-edit-document-level-permissions
Browse files Browse the repository at this point in the history
  • Loading branch information
Naarcha-AWS committed Mar 28, 2024
2 parents 6afb195 + 13a3537 commit 06065f6
Show file tree
Hide file tree
Showing 26 changed files with 1,424 additions and 106 deletions.
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
@@ -1 +1 @@
* @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99
* @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99 @epugh
5 changes: 3 additions & 2 deletions MAINTAINERS.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
## Overview

This document contains a list of maintainers in this repo. See [opensearch-project/.github/RESPONSIBILITIES.md](https://github.com/opensearch-project/.github/blob/main/RESPONSIBILITIES.md#maintainer-responsibilities) that explains what the role of maintainer means, what maintainers do in this and other repos, and how they should be doing it. If you're interested in contributing, and becoming a maintainer, see [CONTRIBUTING](CONTRIBUTING.md).
This document lists the maintainers in this repo. See [opensearch-project/.github/RESPONSIBILITIES.md](https://github.com/opensearch-project/.github/blob/main/RESPONSIBILITIES.md#maintainer-responsibilities) for information about the role of a maintainer, what maintainers do in this and other repos, and how they should be doing it. If you're interested in contributing or becoming a maintainer, see [CONTRIBUTING](CONTRIBUTING.md).

## Current Maintainers

Expand All @@ -9,8 +9,9 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
| Heather Halter | [hdhalter](https://github.com/hdhalter) | Amazon |
| Fanit Kolchina | [kolchfa-aws](https://github.com/kolchfa-aws) | Amazon |
| Nate Archer | [Naarcha-AWS](https://github.com/Naarcha-AWS) | Amazon |
| Nate Bower | [natebower](https://github.com/natebower) | Amazon |
| Nathan Bower | [natebower](https://github.com/natebower) | Amazon |
| Melissa Vagi | [vagimeli](https://github.com/vagimeli) | Amazon |
| Miki Barahmand | [AMoo-Miki](https://github.com/AMoo-Miki) | Amazon |
| David Venable | [dlvenable](https://github.com/dlvenable) | Amazon |
| Stephen Crawford | [scraw99](https://github.com/scrawfor99) | Amazon |
| Eric Pugh | [epugh](https://github.com/epugh) | OpenSource Connections |
3 changes: 2 additions & 1 deletion _api-reference/document-apis/reindex.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,10 +73,11 @@ slice | Whether to manually or automatically slice the reindex operation so it e
_source | Whether to reindex source fields. Specify a list of fields to reindex or true to reindex all fields. Default is true.
id | The ID to associate with manual slicing.
max | Maximum number of slices.
dest | Information about the destination index. Valid values are `index`, `version_type`, and `op_type`.
dest | Information about the destination index. Valid values are `index`, `version_type`, `op_type`, and `pipeline`.
index | Name of the destination index.
version_type | The indexing operation's version type. Valid values are `internal`, `external`, `external_gt` (retrieve the document if the specified version number is greater than the document’s current version), and `external_gte` (retrieve the document if the specified version number is greater or equal to than the document’s current version).
op_type | Whether to copy over documents that are missing in the destination index. Valid values are `create` (ignore documents with the same ID from the source index) and `index` (copy everything from the source index).
pipeline | Which ingest pipeline to utilize during the reindex.
script | A script that OpenSearch uses to apply transformations to the data during the reindex operation.
source | The actual script that OpenSearch runs.
lang | The scripting language. Valid options are `painless`, `expression`, `mustache`, and `java`.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,7 @@ extensions:
region: <YOUR_REGION_1>
sts_role_arn: <YOUR_STS_ROLE_ARN_1>
refresh_interval: <YOUR_REFRESH_INTERVAL>
disable_refresh: false
<YOUR_SECRET_CONFIG_ID_2>:
...
```
Expand All @@ -148,7 +149,8 @@ Option | Required | Type | Description
secret_id | Yes | String | The AWS secret name or ARN. |
region | No | String | The AWS region of the secret. Defaults to `us-east-1`.
sts_role_arn | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to the AWS Secrets Manager. Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
refresh_interval | No | Duration | The refreshment interval for AWS secrets extension plugin to poll new secret values. Defaults to `PT1H`. See [Automatically refreshing secrets](#automatically-refreshing-secrets) for details.
refresh_interval | No | Duration | The refreshment interval for the AWS Secrets extension plugin to poll new secret values. Defaults to `PT1H`. For more information, see [Automatically refreshing secrets](#automatically-refreshing-secrets).
disable_refresh | No | Boolean | Disables regular polling on the latest secret values inside the AWS secrets extension plugin. Defaults to `false`. When set to `true`, `refresh_interval` will not be used.

#### Reference secrets
ß
Expand Down
15 changes: 15 additions & 0 deletions _data-prepper/managing-data-prepper/extensions/extensions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
---
layout: default
title: Extensions
parent: Managing Data Prepper
has_children: true
nav_order: 18
---

# Extensions

Data Prepper extensions provide Data Prepper functionality outside of core Data Prepper pipeline components.
Many extensions provide configuration options that give Data Prepper administrators greater flexibility over Data Prepper's functionality.

Extension configurations can be configured in the `data-prepper-config.yaml` file under the `extensions:` YAML block.

67 changes: 67 additions & 0 deletions _data-prepper/managing-data-prepper/extensions/geoip_service.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
---
layout: default
title: geoip_service
nav_order: 5
parent: Extensions
grand_parent: Managing Data Prepper
---

# geoip_service

The `geoip_service` extension configures all [`geoip`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/geoip) processors in Data Prepper.

## Usage

You can configure the GeoIP service that Data Prepper uses for the `geoip` processor.
By default, the GeoIP service comes with the [`maxmind`](#maxmind) option configured.

The following example shows how to configure the `geoip_service` in the `data-prepper-config.yaml` file:

```
extensions:
geoip_service:
maxmind:
database_refresh_interval: PT1H
cache_count: 16_384
```

## maxmind

The GeoIP service supports the MaxMind [GeoIP and GeoLite](https://dev.maxmind.com/geoip) databases.
By default, Data Prepper will use all three of the following [MaxMind GeoLite2](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data) databases:

* City
* Country
* ASN

The service also downloads databases automatically to keep Data Prepper up to date with changes from MaxMind.

You can use the following options to configure the `maxmind` extension.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`databases` | No | [database](#database) | The database configuration.
`database_refresh_interval` | No | Duration | How frequently to check for updates from MaxMind. This can be any duration in the range of 15 minutes to 30 days. Default is `PT7D`.
`cache_count` | No | Integer | The maximum cache count by number of items in the cache, with a range of 100--100,000. Default is `4096`.
`database_destination` | No | String | The name of the directory in which to store downloaded databases. Default is `{data-prepper.dir}/data/geoip`.
`aws` | No | [aws](#aws) | Configures the AWS credentials for downloading the database from Amazon Simple Storage Service (Amazon S3).
`insecure` | No | Boolean | When `true`, this options allows you to download database files over HTTP. Default is `false`.

## database

Option | Required | Type | Description
:--- | :--- | :--- | :---
`city` | No | String | The URL of the city in which the database resides. Can be an HTTP URL for a manifest file, an MMDB file, or an S3 URL.
`country` | No | String | The URL of the country in which the database resides. Can be an HTTP URL for a manifest file, an MMDB file, or an S3 URL.
`asn` | No | String | The URL of the Autonomous System Number (ASN) of where the database resides. Can be an HTTP URL for a manifest file, an MMDB file, or an S3 URL.
`enterprise` | No | String | The URL of the enterprise in which the database resides. Can be an HTTP URL for a manifest file, an MMDB file, or an S3 URL.


## aws

Option | Required | Type | Description
:--- | :--- | :--- | :---
`region` | No | String | The AWS Region to use for the credentials. Default is the [standard SDK behavior for determining the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon S3. Default is `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
`aws_sts_header_overrides` | No | Map | A map of header overrides that the AWS Identity and Access Management (IAM) role assumes when downloading from Amazon S3.
`sts_external_id` | No | String | An STS external ID used when Data Prepper assumes the STS role. For more information, see the `ExternalID` documentation in the [STS AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) API reference.
54 changes: 43 additions & 11 deletions _data-prepper/pipelines/configuration/processors/date.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,24 +9,32 @@ nav_order: 50
# date


The `date` processor adds a default timestamp to an event, parses timestamp fields, and converts timestamp information to the International Organization for Standardization (ISO) 8601 format. This timestamp information can be used as an event timestamp.
The `date` processor adds a default timestamp to an event, parses timestamp fields, and converts timestamp information to the International Organization for Standardization (ISO) 8601 format. This timestamp information can be used as an event timestamp.

## Configuration

The following table describes the options you can use to configure the `date` processor.

<!-- vale off -->
Option | Required | Type | Description
:--- | :--- | :--- | :---
match | Conditionally | List | List of `key` and `patterns` where patterns is a list. The list of match can have exactly one `key` and `patterns`. There is no default value. This option cannot be defined at the same time as `from_time_received`. Include multiple date processors in your pipeline if both options should be used.
from_time_received | Conditionally | Boolean | A boolean that is used for adding default timestamp to event data from event metadata which is the time when source receives the event. Default value is `false`. This option cannot be defined at the same time as `match`. Include multiple date processors in your pipeline if both options should be used.
destination | No | String | Field to store the timestamp parsed by date processor. It can be used with both `match` and `from_time_received`. Default value is `@timestamp`.
source_timezone | No | String | Time zone used to parse dates. It is used in case the zone or offset cannot be extracted from the value. If the zone or offset are part of the value, then timezone is ignored. Find all the available timezones [the list of database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List) in the **TZ database name** column.
destination_timezone | No | String | Timezone used for storing timestamp in `destination` field. The available timezone values are the same as `source_timestamp`.
locale | No | String | Locale is used for parsing dates. It's commonly used for parsing month names(`MMM`). It can have language, country and variant fields using IETF BCP 47 or String representation of [Locale](https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html) object. For example `en-US` for IETF BCP 47 and `en_US` for string representation of Locale. Full list of locale fields which includes language, country and variant can be found [the language subtag registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry). Default value is `Locale.ROOT`.
`match` | Conditionally | [Match](#Match) | The date match configuration. This option cannot be defined at the same time as `from_time_received`. There is no default value.
`from_time_received` | Conditionally | Boolean | When `true`, the timestamp from the event metadata, which is the time at which the source receives the event, is added to the event data. This option cannot be defined at the same time as `match`. Default is `false`.
`date_when` | No | String | Specifies under what condition the `date` processor should perform matching. Default is no condition.
`to_origination_metadata` | No | Boolean | When `true`, the matched time is also added to the event's metadata as an instance of `Instant`. Default is `false`.
`destination` | No | String | The field used to store the timestamp parsed by the date processor. Can be used with both `match` and `from_time_received`. Default is `@timestamp`.
`output_format` | No | String | Determines the format of the timestamp added to an event. Default is `yyyy-MM-dd'T'HH:mm:ss.SSSXXX`.
`source_timezone` | No | String | The time zone used to parse dates, including when the zone or offset cannot be extracted from the value. If the zone or offset are part of the value, then the time zone is ignored. A list of all the available time zones is contained in the **TZ database name** column of [the list of database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List).
`destination_timezone` | No | String | The time zone used for storing the timestamp in the `destination` field. A list of all the available time zones is contained in the **TZ database name** column of [the list of database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List).
`locale` | No | String | The location used for parsing dates. Commonly used for parsing month names (`MMM`). The value can contain language, country, or variant fields in IETF BCP 47, such as `en-US`, or a string representation of the [locale](https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html) object, such as `en_US`. A full list of locale fields, including language, country, and variant, can be found in [the language subtag registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry). Default is `Locale.ROOT`.
<!-- vale on -->

<!---## Configuration
### Match

Content will be added to this section.--->
Option | Required | Type | Description
:--- | :--- | :--- | :---
`key` | Yes | String | Represents the event key against which to match patterns. Required if `match` is configured.
`patterns` | Yes | List | A list of possible patterns that the timestamp value of the key can have. The patterns are based on a sequence of letters and symbols. The `patterns` support all the patterns listed in the Java [DatetimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) reference. The timestamp value also supports `epoch_second`, `epoch_milli`, and `epoch_nano` values, which represent the timestamp as the number of seconds, milliseconds, and nanoseconds since the epoch. Epoch values always use the UTC time zone.

## Metrics

Expand All @@ -40,5 +48,29 @@ The following table describes common [Abstract processor](https://github.com/ope

The `date` processor includes the following custom metrics.

* `dateProcessingMatchSuccessCounter`: Returns the number of records that match with at least one pattern specified by the `match configuration` option.
* `dateProcessingMatchFailureCounter`: Returns the number of records that did not match any of the patterns specified by the `patterns match` configuration option.
* `dateProcessingMatchSuccessCounter`: Returns the number of records that match at least one pattern specified by the `match configuration` option.
* `dateProcessingMatchFailureCounter`: Returns the number of records that did not match any of the patterns specified by the `patterns match` configuration option.

## Example: Add the default timestamp to an event
The following `date` processor configuration can be used to add a default timestamp in the `@timestamp` filed applied to all events:

```yaml
- date:
from_time_received: true
destination: "@timestamp"
```
## Example: Parse a timestamp to convert its format and time zone
The following `date` processor configuration can be used to parse the value of the timestamp applied to `dd/MMM/yyyy:HH:mm:ss` and write it in `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` format:

```yaml
- date:
match:
- key: timestamp
patterns: ["dd/MMM/yyyy:HH:mm:ss"]
destination: "@timestamp"
output_format: "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
source_timezone: "America/Los_Angeles"
destination_timezone: "America/Chicago"
locale: "en_US"
```
49 changes: 49 additions & 0 deletions _data-prepper/pipelines/configuration/processors/decompress.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
---
layout: default
title: decompress
parent: Processors
grand_parent: Pipelines
nav_order: 40
---

# decompress

The `decompress` processor decompresses any Base64-encoded compressed fields inside of an event.

## Configuration

Option | Required | Type | Description
:--- | :--- | :--- | :---
`keys` | Yes | List<String> | The fields in the event that will be decompressed.
`type` | Yes | Enum | The type of decompression to use for the `keys` in the event. Only `gzip` is supported.
`decompress_when` | No | String| A [Data Prepper conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/) that determines when the `decompress` processor will run on certain events.
`tags_on_failure` | No | List<String> | A list of strings with which to tag events when the processor fails to decompress the `keys` inside an event. Defaults to `_decompression_failure`.

## Usage

The following example shows the `decompress` processor used in `pipelines.yaml`:

```yaml
processor:
- decompress:
decompress_when: '/some_key == null'
keys: [ "base_64_gzip_key" ]
type: gzip
```
## Metrics
The following table describes common [abstract processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/processor/AbstractProcessor.java) metrics.
| Metric name | Type | Description |
| ------------- | ---- | -----------|
| `recordsIn` | Counter | The ingress of records to a pipeline component. |
| `recordsOut` | Counter | The egress of records from a pipeline component. |
| `timeElapsed` | Timer | The time elapsed during execution of a pipeline component. |

### Counter

The `decompress` processor accounts for the following metrics:

* `processingErrors`: The number of processing errors that have occurred in the `decompress` processor.

Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ layout: default
title: delete_entries
parent: Processors
grand_parent: Pipelines
nav_order: 51
nav_order: 41
---

# delete_entries
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ layout: default
title: dissect
parent: Processors
grand_parent: Pipelines
nav_order: 52
nav_order: 45
---

# dissect
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ layout: default
title: drop_events
parent: Processors
grand_parent: Pipelines
nav_order: 53
nav_order: 46
---

# drop_events
Expand Down
Loading

0 comments on commit 06065f6

Please sign in to comment.