Support for Sink Codecs - Follow up PR to 2881 #2898
Closed
Commits (13):
1ae457e -Support for Sink Codecs (umairofficial)
cad24ed -Support for Sink Codecs (umairofficial)
3f61c27 -Support for Sink Codecs (umairofficial)
4eed69e -Support for Sink Codecs (umairofficial)
d4d5750 -Support for Sink Codecs (umairofficial)
3e8c687 -Support for Sink Codecs (umairofficial)
8e2f0c1 -Support for Sink Codecs (umairofficial)
7992721 -Support for Sink Codecs (umairofficial)
6db256c -Support for Sink Codecs (umairofficial)
fe0a435 -Support for Sink Codecs (umairofficial)
2a0ba3d -Support for Sink Codecs (umairofficial)
5128915 -Support for Sink Codecs (umairofficial)
77aba5e -Support for Sink Codecs (umairofficial)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
# Avro Sink/Output Codec | ||
|
||
This is an implementation of the Avro sink codec. It converts Data Prepper events into Avro records and writes them to the underlying OutputStream. | ||
|
||
## Usage | ||
|
||
The Avro output codec can be configured with sink plugins (e.g. the S3 sink) in the pipeline file. | ||
|
||
## Configuration Options | ||
|
||
``` | ||
pipeline: | ||
  ... | ||
  sink: | ||
    - s3: | ||
        aws: | ||
          region: us-east-1 | ||
          sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper | ||
          sts_header_overrides: | ||
        max_retries: 5 | ||
        bucket: bucket_name | ||
        object_key: | ||
          path_prefix: my-elb/%{yyyy}/%{MM}/%{dd}/ | ||
        threshold: | ||
          event_count: 2000 | ||
          maximum_size: 50mb | ||
          event_collect_timeout: 15s | ||
        codec: | ||
          avro: | ||
            # Provide the schema through only one of the three options below. | ||
            schema: > | ||
              { | ||
                "namespace": "org.example.test", | ||
                "type": "record", | ||
                "name": "TestMessage", | ||
                "fields": [ | ||
                  {"name": "name", "type": "string"}, | ||
                  {"name": "age", "type": "int"}] | ||
              } | ||
            # schema_file_location: "C:\\Users\\OM20254233\\Downloads\\schema.json" | ||
            # schema_registry_url: https://your.schema.registry.url.com | ||
            exclude_keys: | ||
              - s3 | ||
        buffer_type: in_memory | ||
``` | ||
|
||
## AWS Configuration | ||
|
||
### Codec Configuration: | ||
|
||
1) `schema`: A JSON string that the user provides directly in the YAML file. The codec parses the Avro schema from this string. | ||
2) `schema_file_location`: Path to a JSON file that contains the schema. | ||
3) `exclude_keys`: Event keys to leave out when converting events to Avro records. | ||
4) `schema_registry_url`: URL of a schema registry from which the codec fetches the schema. | ||
|
||
### Note: | ||
|
||
Provide the schema through only one of `schema`, `schema_file_location`, or `schema_registry_url` at a time. | ||
|
||
## Developer Guide | ||
|
||
This plugin is compatible with Java 11. See the following: | ||
|
||
- [CONTRIBUTING](https://github.com/opensearch-project/data-prepper/blob/main/CONTRIBUTING.md) | ||
- [monitoring](https://github.com/opensearch-project/data-prepper/blob/main/docs/monitoring.md) | ||
|
||
The integration tests for this plugin do not run as part of the Data Prepper build. | ||
|
||
The following command runs the integration tests: | ||
|
||
``` | ||
./gradlew :data-prepper-plugins:s3-sink:integrationTest -Dtests.s3sink.region=<your-aws-region> -Dtests.s3sink.bucket=<your-bucket> | ||
``` |
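The README note above says that only one schema source may be configured at a time. A minimal sketch of how such a check could look follows. It is not code from this PR: the class `SchemaSourceCheck` and the method `requireSingleSchemaSource` are hypothetical names, and treating blank strings as "not set" is an assumption.

```java
// Hypothetical helper, not part of this PR: enforces the README rule that
// exactly one of schema, schema_file_location, schema_registry_url is set.
final class SchemaSourceCheck {
    private SchemaSourceCheck() {
    }

    /** Returns the name of the single configured option, or throws if zero or several are set. */
    static String requireSingleSchemaSource(final String schema,
                                            final String schemaFileLocation,
                                            final String schemaRegistryUrl) {
        String chosen = null;
        int provided = 0;
        if (isSet(schema)) {
            chosen = "schema";
            provided++;
        }
        if (isSet(schemaFileLocation)) {
            chosen = "schema_file_location";
            provided++;
        }
        if (isSet(schemaRegistryUrl)) {
            chosen = "schema_registry_url";
            provided++;
        }
        if (provided != 1) {
            throw new IllegalArgumentException(
                    "Exactly one of schema, schema_file_location, schema_registry_url must be set, but found "
                            + provided);
        }
        return chosen;
    }

    // Assumption: null and blank values both count as "not configured".
    private static boolean isSet(final String value) {
        return value != null && !value.isBlank();
    }
}
```

Calling `SchemaSourceCheck.requireSingleSchemaSource(null, "/tmp/schema.json", null)` returns `"schema_file_location"`, while passing two non-blank values throws `IllegalArgumentException`. Failing fast at configuration time would surface the mistake before any events are written.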
41 changes: 41 additions & 0 deletions
...-codecs/src/main/java/org/opensearch/dataprepper/plugins/codec/avro/AvroSchemaParser.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
/* | ||
* Copyright OpenSearch Contributors | ||
* SPDX-License-Identifier: Apache-2.0 | ||
*/ | ||
package org.opensearch.dataprepper.plugins.codec.avro; | ||
|
||
import com.fasterxml.jackson.databind.ObjectMapper; | ||
import org.apache.avro.Schema; | ||
|
||
import java.io.FileNotFoundException; | ||
import java.io.IOException; | ||
import java.nio.file.Paths; | ||
import java.util.HashMap; | ||
import java.util.Map; | ||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
public class AvroSchemaParser { | ||
    private static final ObjectMapper mapper = new ObjectMapper(); | ||
    // Log under this class, not AvroOutputCodec. | ||
    private static final Logger LOG = LoggerFactory.getLogger(AvroSchemaParser.class); | ||
|
||
    public static Schema parseSchemaFromJsonFile(final String location) throws IOException { | ||
        final Map<?, ?> jsonMap; | ||
        try { | ||
            jsonMap = mapper.readValue(Paths.get(location).toFile(), Map.class); | ||
        } catch (FileNotFoundException e) { | ||
            LOG.error("Schema file not found, Error: {}", e.getMessage()); | ||
            // Chain the cause so the original stack trace is preserved. | ||
            throw new IOException("Can't proceed without schema.", e); | ||
        } | ||
        final Map<Object, Object> schemaMap = new HashMap<>(); | ||
        for (Map.Entry<?, ?> entry : jsonMap.entrySet()) { | ||
            schemaMap.put(entry.getKey(), entry.getValue()); | ||
        } | ||
        try { | ||
            return new Schema.Parser().parse(mapper.writeValueAsString(schemaMap)); | ||
        } catch (Exception e) { | ||
            LOG.error("Unable to parse schema from the provided schema file, Error: {}", e.getMessage()); | ||
            throw new IOException("Can't proceed without schema.", e); | ||
        } | ||
    } | ||
} |
66 changes: 66 additions & 0 deletions
...ava/org/opensearch/dataprepper/plugins/codec/avro/AvroSchemaParserFromSchemaRegistry.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
package org.opensearch.dataprepper.plugins.codec.avro; | ||
|
||
import com.fasterxml.jackson.databind.JsonNode; | ||
import com.fasterxml.jackson.databind.ObjectMapper; | ||
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
import java.io.BufferedReader; | ||
import java.io.IOException; | ||
import java.io.InputStream; | ||
import java.io.InputStreamReader; | ||
import java.net.HttpURLConnection; | ||
import java.net.URL; | ||
|
||
public class AvroSchemaParserFromSchemaRegistry { | ||
    private static final ObjectMapper mapper = new ObjectMapper(); | ||
    private static final Logger LOG = LoggerFactory.getLogger(AvroSchemaParserFromSchemaRegistry.class); | ||
|
||
    // Returns the "schema" field of the registry response, or null if it is unavailable. | ||
    static String getSchemaType(final String schemaRegistryUrl) { | ||
        try { | ||
            final URL url = new URL(schemaRegistryUrl); | ||
            final HttpURLConnection connection = (HttpURLConnection) url.openConnection(); | ||
            connection.setRequestMethod("GET"); | ||
            final int responseCode = connection.getResponseCode(); | ||
            if (responseCode == HttpURLConnection.HTTP_OK) { | ||
                final StringBuilder response = new StringBuilder(); | ||
                // try-with-resources closes the reader even if reading fails. | ||
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()))) { | ||
                    String inputLine; | ||
                    while ((inputLine = reader.readLine()) != null) { | ||
                        response.append(inputLine); | ||
                    } | ||
                } | ||
                // Parse the response directly; the pretty-print round trip was not needed. | ||
                final JsonNode rootNode = mapper.readTree(response.toString()); | ||
                if (rootNode.get("schema") != null) { | ||
                    return rootNode.get("schema").toString(); | ||
                } | ||
            } else { | ||
                final InputStream errorStream = connection.getErrorStream(); | ||
                final String errorMessage = readErrorMessage(errorStream); | ||
                LOG.error("GET request failed while fetching the schema registry details : {}", errorMessage); | ||
            } | ||
        } catch (IOException e) { | ||
            LOG.error("An error while fetching the schema registry details : ", e); | ||
            // Chain the cause so callers can see why the fetch failed. | ||
            throw new RuntimeException("Failed to fetch the schema from the schema registry.", e); | ||
        } | ||
        return null; | ||
    } | ||
|
||
    private static String readErrorMessage(final InputStream errorStream) throws IOException { | ||
        if (errorStream == null) { | ||
            return null; | ||
        } | ||
        final StringBuilder errorMessage = new StringBuilder(); | ||
        // Closing the reader also closes the wrapped error stream. | ||
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(errorStream))) { | ||
            String line; | ||
            while ((line = reader.readLine()) != null) { | ||
                errorMessage.append(line); | ||
            } | ||
        } | ||
        return errorMessage.toString(); | ||
    } | ||
} |
There is an implicit assumption here that the schema is always provided as a local file. What if the schema is provided as a file in S3?
Fetching Schema from a file present in S3 or elsewhere wasn't a part of the initial requirements.
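If S3-hosted schema files were supported later, the first step would probably be telling an `s3://` location apart from a local path. The sketch below is one possible way to do that and is not part of this PR. The class `SchemaLocation` and its fields are hypothetical, and the `s3://bucket/key` form for `schema_file_location` is an assumption. Actually downloading the object would need an AWS client and is left out.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of this PR: classifies a schema location
// as either an S3 object (s3://bucket/key) or a local file path.
final class SchemaLocation {
    final boolean isS3;
    final String bucket; // null for local files
    final String path;   // object key for S3, file path otherwise

    private SchemaLocation(final boolean isS3, final String bucket, final String path) {
        this.isS3 = isS3;
        this.bucket = bucket;
        this.path = path;
    }

    static SchemaLocation parse(final String location) {
        // Check the prefix first: Windows paths with backslashes are not valid URIs.
        if (location.toLowerCase(Locale.ROOT).startsWith("s3://")) {
            // URI.create throws IllegalArgumentException for keys with illegal
            // characters such as spaces; a real implementation would handle that.
            final URI uri = URI.create(location);
            final String rawPath = uri.getPath() == null ? "" : uri.getPath();
            final String key = rawPath.startsWith("/") ? rawPath.substring(1) : rawPath;
            return new SchemaLocation(true, uri.getAuthority(), key);
        }
        return new SchemaLocation(false, null, location);
    }
}
```

With this sketch, `SchemaLocation.parse("s3://my-bucket/schemas/user.avsc")` yields bucket `my-bucket` and key `schemas/user.avsc`, and any other string is treated as a local path and passed through unchanged.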