Commit
Updates documentation for the Avro codec and S3 sink. Resolves opensearch-project#3162. (opensearch-project#3211)

Signed-off-by: David Venable <[email protected]>
Showing 2 changed files with 14 additions and 151 deletions.
@@ -1,89 +1,20 @@
-# Avro Sink/Output Codec
+# Avro codecs

-This is an implementation of the Avro Sink Codec, which parses Data Prepper Events into Avro records and writes them into the underlying OutputStream.
+This project provides [Apache Avro](https://avro.apache.org/) support for Data Prepper. It includes an input codec, an output codec, and common libraries which can be used by other projects using Avro.

-## Usages
+## Usage

-The Avro Output Codec can be configured with sink plugins (e.g. the S3 sink) in the pipeline file.
+For usage information, see the Data Prepper documentation:

-## Configuration Options
+* [S3 source](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sources/s3/)
+* [S3 sink](https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/s3/)
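Since the updated README defers usage examples to the documentation, below is a minimal sketch of reading Avro data through the S3 source using this project's input codec. Apart from the `avro` codec entry, the option names follow the standard S3 source setup, and all values are placeholder assumptions rather than content from this commit:

```yaml
source:
  s3:
    notification_type: sqs          # assumed standard S3 source setup
    codec:
      avro:                         # the Avro input codec provided by this project
    sqs:
      queue_url: <your-queue-url>   # placeholder
    aws:
      region: <your-aws-region>     # placeholder
```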
-```
-pipeline:
-  ...
-  sink:
-    - s3:
-        aws:
-          region: us-east-1
-          sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper
-          sts_header_overrides:
-        max_retries: 5
-        bucket: bucket_name
-        object_key:
-          path_prefix: vpc-flow-logs/%{yyyy}/%{MM}/%{dd}/
-        threshold:
-          event_count: 2000
-          maximum_size: 50mb
-          event_collect_timeout: 15s
-        codec:
-          avro:
-            schema: >
-              {
-                "type" : "record",
-                "namespace" : "org.opensearch.dataprepper.examples",
-                "name" : "VpcFlowLog",
-                "fields" : [
-                  { "name" : "version", "type" : ["null", "string"]},
-                  { "name" : "srcport", "type": ["null", "int"]},
-                  { "name" : "dstport", "type": ["null", "int"]},
-                  { "name" : "accountId", "type" : ["null", "string"]},
-                  { "name" : "interfaceId", "type" : ["null", "string"]},
-                  { "name" : "srcaddr", "type" : ["null", "string"]},
-                  { "name" : "dstaddr", "type" : ["null", "string"]},
-                  { "name" : "start", "type": ["null", "int"]},
-                  { "name" : "end", "type": ["null", "int"]},
-                  { "name" : "protocol", "type": ["null", "int"]},
-                  { "name" : "packets", "type": ["null", "int"]},
-                  { "name" : "bytes", "type": ["null", "int"]},
-                  { "name" : "action", "type": ["null", "string"]},
-                  { "name" : "logStatus", "type" : ["null", "string"]}
-                ]
-              }
-            exclude_keys:
-              - s3
-        buffer_type: in_memory
-```
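For intuition about how the removed example's schema maps onto data, a hypothetical VPC flow log event rendered as a record conforming to that schema might look like the following; all values are invented for illustration:

```json
{
  "version": "2",
  "srcport": 443,
  "dstport": 52000,
  "accountId": "123456789012",
  "interfaceId": "eni-0a1b2c3d",
  "srcaddr": "10.0.0.1",
  "dstaddr": "10.0.0.2",
  "start": 1690000000,
  "end": 1690000060,
  "protocol": 6,
  "packets": 10,
  "bytes": 840,
  "action": "ACCEPT",
  "logStatus": "OK"
}
```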
-## AWS Configuration

-### Codec Configuration

-1) `schema`: A JSON string that the user can provide inline in the pipeline YAML. The codec parses the schema object from this string.
-2) `exclude_keys`: The keys of an event to exclude when converting it to an Avro record.
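As a concrete illustration of the removed `exclude_keys` option (field names invented): with `exclude_keys` set to `[s3]`, an incoming event such as

```json
{ "srcaddr": "10.0.0.1", "dstaddr": "10.0.0.2", "s3": { "bucket": "logs" } }
```

would be converted to an Avro record containing only `srcaddr` and `dstaddr`; the `s3` key is dropped before conversion.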
-### Note

-1) The user can provide only one schema at a time, i.e. through only one of the ways offered in the codec configuration.
-2) If the user wants tags to be part of the resulting Avro data and has set `tagsTargetKey` in the configuration file, the schema must also be modified to accommodate the tags. Another field has to be added to the `schema.json` file:

-   `{
-     "name": "yourTagsTargetKey",
-     "type": {
-       "type": "array",
-       "items": "string"
-     }
-   }`

-3) If the user does not provide a schema, the codec auto-generates one from the first event in the buffer.
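To illustrate note 3 above (a hypothetical sketch, not output captured from the codec): a first event of

```json
{ "message": "hello", "count": 3 }
```

could produce an auto-generated record schema along the lines of

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    { "name": "message", "type": ["null", "string"] },
    { "name": "count", "type": ["null", "int"] }
  ]
}
```

The actual record name and nullability rules are determined by the codec, so treat this only as a sketch.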
 ## Developer Guide

-This plugin is compatible with Java 11. See below:

-- [CONTRIBUTING](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md)
-- [monitoring](https://github.com/opensearch-project/data-prepper/blob/main/docs/monitoring.md)
+See the [CONTRIBUTING](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md) guide for general information on contributions.
 The integration tests for this plugin do not run as part of the Data Prepper build.
+They are included only with the S3 source or S3 sink for now.

-The following command runs the integration tests:

-```
-./gradlew :data-prepper-plugins:s3-sink:integrationTest -Dtests.s3sink.region=<your-aws-region> -Dtests.s3sink.bucket=<your-bucket>
-```
+See the README files for those projects for information on running those tests.