# Update dissect and user_agent readme (opensearch-project#4100)
* Update dissect and user_agent readme
* Fix format issue

Signed-off-by: Hai Yan <[email protected]>
Showing 2 changed files with 3 additions and 170 deletions.
# Dissect Processor
The dissect processor is useful when dealing with log files or messages that have a known pattern or structure. It extracts specific pieces of information from the text and maps them to individual fields based on user-defined dissect patterns.
## Basic Usage

To get started with the dissect processor in Data Prepper, create the following `pipeline.yaml`:
```yaml
dissect-pipeline:
  source:
    file:
      path: "/full/path/to/dissect_logs_json.log"
      record_type: "event"
      format: "json"
  processor:
    - dissect:
        map:
          log: "%{Date} %{Time} %{Log_Type}: %{Message}"
  sink:
    - stdout:
```
Create the following file named `dissect_logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` with the path of this file.

```
{"log": "07-25-2023 10:00:00 ERROR: Some error"}
```
The dissect processor retrieves the fields `Date`, `Time`, `Log_Type`, and `Message` from the `log` message using the pattern `%{Date} %{Time} %{Log_Type}: %{Message}` configured in the pipeline.

When you run Data Prepper with this `pipeline.yaml`, you should see the following standard output:
```
{
  "log": "07-25-2023 10:00:00 ERROR: Some error",
  "Date": "07-25-2023",
  "Time": "10:00:00",
  "Log_Type": "ERROR",
  "Message": "Some error"
}
```
The fields `Date`, `Time`, `Log_Type`, and `Message` have been extracted from the `log` value.
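The extraction above can be sketched in a few lines of Python. This is an illustrative model only, not Data Prepper's actual implementation: each `%{field}` becomes a lazy named capture group, and the literal text between fields acts as the delimiter.

```python
import re

def dissect(pattern, text):
    """Sketch of dissect-style extraction: %{field} markers capture
    text lazily between the literal delimiters around them."""
    parts = re.split(r"%\{(\w+)\}", pattern)
    regex = "^"
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)     # literal delimiter text
        else:
            regex += f"(?P<{part}>.*?)"  # captured field
    regex += "$"
    match = re.match(regex, text)
    return match.groupdict() if match else {}

print(dissect("%{Date} %{Time} %{Log_Type}: %{Message}",
              "07-25-2023 10:00:00 ERROR: Some error"))
# {'Date': '07-25-2023', 'Time': '10:00:00', 'Log_Type': 'ERROR', 'Message': 'Some error'}
```

Lazy capture groups mirror how dissect consumes text up to the next literal delimiter rather than scanning greedily.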
## Configuration

* `map` (Required): Specifies the dissect patterns. It takes a `Map<String, String>` with event fields as keys and the corresponding dissect patterns as values.
* `target_types` (Optional): A `Map<String, String>` that specifies the target type of a given field. Valid options are `integer`, `double`, `string`, and `boolean`. By default, all values are `string`. Target types are applied after the dissection process.
* `dissect_when` (Optional): A Data Prepper expression string following the [Data Prepper Expression syntax](../../docs/expression_syntax.md). When configured, the processor evaluates the expression before dissection and performs the dissection only if the expression evaluates to `true`.
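To illustrate what "target types are applied after dissection" means, here is a hypothetical sketch of the conversion step. The function and converter names are illustrative, not Data Prepper internals.

```python
# Map each supported target type to a conversion from the dissected string.
CONVERTERS = {
    "integer": int,
    "double": float,
    "string": str,
    "boolean": lambda v: v.lower() == "true",
}

def apply_target_types(event, target_types):
    # Dissection always produces strings; convert configured fields in place.
    for field, type_name in target_types.items():
        if field in event:
            event[field] = CONVERTERS[type_name](event[field])
    return event

event = {"status": "200", "latency": "0.123", "cached": "true"}
apply_target_types(event, {"status": "integer",
                           "latency": "double",
                           "cached": "boolean"})
print(event)  # {'status': 200, 'latency': 0.123, 'cached': True}
```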
## Field Notations

Symbols like `?`, `+`, `->`, `/`, and `&` can be used to perform logical extraction of data.

* **Normal field**: A field without a suffix or prefix. The field is added directly to the output event.
  Example: `%{field_name}`
* **Skip field**: `?` can be used as a prefix to a key to skip that field in the output JSON.
  * Empty skip field: `%{}`
  * Named skip field: `%{?field_name}`
* **Append field**: To append multiple values and put the final value in a field, use `+` before the field name in the dissect pattern.
  * **Usage**:

    Pattern: `"%{+field_name}, %{+field_name}"`
    Text: `"foo, bar"`
    Output: `{"field_name": "foobar"}`

  The order of concatenation can also be defined with the suffix `/<digits>`.
  * **Usage**:

    Pattern: `"%{+field_name/2}, %{+field_name/1}"`
    Text: `"foo, bar"`
    Output: `{"field_name": "barfoo"}`

  If no order is specified, the append operation takes place in the order of the fields in the dissect pattern.
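The append semantics above can be sketched as follows. This is an illustrative model, not the processor's code: `captures` stands for the `(field, order, value)` triples a dissect match would produce, with `order` taken from the `/<digits>` suffix or `None` when absent.

```python
def resolve_append(captures):
    """Combine repeated +field captures into one value per field,
    honoring an explicit /<digits> order when one is given."""
    grouped = {}
    for name, order, value in captures:
        grouped.setdefault(name, []).append((order, value))
    result = {}
    for name, parts in grouped.items():
        # Assumes orders are either all present or all absent for a field.
        if all(order is not None for order, _ in parts):
            parts = sorted(parts, key=lambda p: p[0])
        result[name] = "".join(value for _, value in parts)
    return result

# "%{+field_name}, %{+field_name}" applied to "foo, bar"
print(resolve_append([("field_name", None, "foo"),
                      ("field_name", None, "bar")]))   # {'field_name': 'foobar'}

# "%{+field_name/2}, %{+field_name/1}" applied to "foo, bar"
print(resolve_append([("field_name", 2, "foo"),
                      ("field_name", 1, "bar")]))      # {'field_name': 'barfoo'}
```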
* **Indirect field**: While defining a pattern, prefix a field with `&` to use the value captured earlier under that field name as the key for this field's value.
  * **Usage**:

    Pattern: `"%{?field_name}, %{&field_name}"`
    Text: `"foo, bar"`
    Output: `{"foo": "bar"}`

  Here `foo`, which was captured by the skip field `%{?field_name}`, becomes the key for the value captured by the field `%{&field_name}`.
  * **Usage**:

    Pattern: `"%{field_name}, %{&field_name}"`
    Text: `"foo, bar"`
    Output: `{"field_name": "foo", "foo": "bar"}`

  A value can also be indirectly assigned through an appended field, along with normal and skip fields.
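Both indirect-field examples above can be modeled with a small sketch. The `kind`/`name` representation is hypothetical, standing in for the parsed pattern, not for any real Data Prepper structure.

```python
def dissect_indirect(pattern_fields, values):
    """pattern_fields: list of (kind, name) where kind is 'normal',
    'skip', or 'indirect'; values: captured strings in pattern order."""
    event = {}
    captured = {}  # every capture is remembered, even skipped ones
    for (kind, name), value in zip(pattern_fields, values):
        if kind == "indirect":
            # The key comes from an earlier capture under the same name.
            event[captured[name]] = value
        else:
            captured[name] = value
            if kind == "normal":
                event[name] = value
    return event

# "%{?field_name}, %{&field_name}" on "foo, bar"
print(dissect_indirect([("skip", "field_name"), ("indirect", "field_name")],
                       ["foo", "bar"]))              # {'foo': 'bar'}

# "%{field_name}, %{&field_name}" on "foo, bar"
print(dissect_indirect([("normal", "field_name"), ("indirect", "field_name")],
                       ["foo", "bar"]))  # {'field_name': 'foo', 'foo': 'bar'}
```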
### Padding

* The `->` operator can be used as a suffix to a field to indicate that white space after this field can be ignored.
  * **Usage**:

    Pattern: `"%{field1->} %{field2}"`
    Text: `"firstname   lastname"`
    Output: `{"field1": "firstname", "field2": "lastname"}`

* This operator should be used as the rightmost suffix.
  * **Usage**:

    Pattern: `"%{fieldname/1->} %{fieldname/2}"`

  If `->` is used before `/<digit>`, the `->` operator is treated as part of the field name.
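A regex analogue of the padding example: without `->` the pattern `"%{field1} %{field2}"` expects exactly one space, while `"%{field1->} %{field2}"` lets the delimiter absorb a whole run of spaces. A minimal sketch, not the processor's implementation:

```python
import re

# The '->' suffix corresponds to ' +' (one or more spaces) instead of
# a single literal space after field1.
padded = re.compile(r"^(?P<field1>.*?) +(?P<field2>.*)$")

m = padded.match("firstname   lastname")
print(m.groupdict())  # {'field1': 'firstname', 'field2': 'lastname'}
```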
## Developer Guide

This plugin is compatible with Java 14. See

- [CONTRIBUTING](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md)
- [monitoring](https://github.com/opensearch-project/data-prepper/blob/main/docs/monitoring.md)

See the [`dissect` processor documentation](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/dissect/).
# User Agent Processor

This processor parses the User-Agent (UA) string in an event and adds the parsing result to the event.
## Basic Example

An example configuration for the processor is as follows:

```yaml
...
processor:
  - user_agent:
      source: "ua"
      target: "user_agent"
...
```
Assume the event contains the following user agent string:

```json
{
  "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1"
}
```
The processor will parse the `ua` field and add the result under the specified target key, in a format compatible with the Elastic Common Schema (ECS):

```json
{
  "user_agent": {
    "original": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1",
    "os": {
      "version": "13.5.1",
      "full": "iOS 13.5.1",
      "name": "iOS"
    },
    "name": "Mobile Safari",
    "version": "13.1.1",
    "device": {
      "name": "iPhone"
    }
  },
  "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1"
}
```
## Configuration

* `source` (Required): The key of the user agent string in the event to be parsed.
* `target` (Optional): The key under which to put the parsing result in the event. Defaults to `user_agent`.
* `exclude_original` (Optional): Whether to exclude the original user agent string from the parsing result. Defaults to `false`.
* `cache_size` (Optional): The cache size used by the parser, as a positive integer. Defaults to `1000`.
* `tags_on_parse_failure` (Optional): Tags to add to an event if the processor fails to parse the user agent string.
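To make the ECS example above concrete, here is a deliberately naive sketch that pulls the OS and browser versions out of the sample UA string with hand-written regexes. The real processor relies on a full user-agent parser database; these patterns are illustrative assumptions and would not generalize to arbitrary UA strings.

```python
import re

UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) "
      "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 "
      "Mobile/15E148 Safari/604.1")

# iOS reports its version with underscores ("13_5_1"); ECS uses dots.
os_version = re.search(r"iPhone OS ([\d_]+)", UA).group(1).replace("_", ".")
browser_version = re.search(r"Version/([\d.]+)", UA).group(1)

result = {
    "os": {"name": "iOS", "version": os_version},
    "version": browser_version,
    "device": {"name": "iPhone"},
}
print(result["os"]["version"], result["version"])  # 13.5.1 13.1.1
```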
See the [`user_agent` processor documentation](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/user-agent/).