-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parquet Sink Codec #2928
Parquet Sink Codec #2928
Conversation
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make integration tests like the ones in https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/s3-source/src/integrationTest/java/org/opensearch/dataprepper/plugins/source?
You haven't addressed my previous comment. I want to see integration tests for all codecs where files are actually written to an S3 bucket and then read and asserted that they are correct. |
Team is working on integration test. |
As suggested by @dlvenable we have written integration test which integrates input and output codec and assert that the input data to input codec is equal to output data from outputcodec. Same we would be writing for Parquet as well. |
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
Signed-off-by: umairofficial <[email protected]>
Where are the integration tests for all the codecs? I would like to see some instructions like the bottom of https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/s3-source/README.md to run the integration tests and I want to see the command + output that runs the integration test. Similar to the the integration tests for the source:
|
The integration of all codecs are in the respective PRs. ITs for CSV, JSON and Avro codecs are in PR #2986 (in S3SinkServiceIT class). |
Signed-off-by: umairofficial <[email protected]>
Please see the readme.md file of |
} else { | ||
modifiedEvent = event; | ||
} | ||
for (final String key : modifiedEvent.toMap().keySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are anyways creating a new parquetRecord, I think it makes sense to add the "tagsTargetKey" also here without creating a 'modifiedEvent', right? I think you should be write like
if (tagsTargetKey != null) {
parquetRecord.put(tagsTargetKey, event.getMetadata().getTags());
}.
writer.write(parquestRecord)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Krishna, the logic behing modifying the event itself is because we are generating schema from event in case user doesn't supply any schema.If event won't be modified to contain tags target key and tags then parquet record won't be able to store it either.
# Conflicts: # data-prepper-plugins/avro-codecs/src/main/java/org/opensearch/dataprepper/plugins/codec/avro/AvroOutputCodec.java # data-prepper-plugins/csv-processor/src/main/java/org/opensearch/dataprepper/plugins/codec/csv/CsvOutputCodec.java # data-prepper-plugins/newline-codecs/src/main/java/org/opensearch/dataprepper/plugins/codec/newline/NewlineDelimitedOutputCodec.java # data-prepper-plugins/parquet-codecs/src/main/java/org/opensearch/dataprepper/plugins/codec/parquet/ParquetOutputCodec.java # data-prepper-plugins/parse-json-processor/src/main/java/org/opensearch/dataprepper/plugins/codec/json/JsonOutputCodec.java # data-prepper-plugins/s3-sink/build.gradle # data-prepper-plugins/s3-sink/src/integrationTest/java/org/opensearch/dataprepper/plugins/sink/S3SinkServiceIT.java # data-prepper-plugins/s3-sink/src/main/java/org/opensearch/dataprepper/plugins/sink/S3SinkService.java # data-prepper-plugins/s3-sink/src/main/java/org/opensearch/dataprepper/plugins/sink/accumulator/Buffer.java # data-prepper-plugins/s3-sink/src/main/java/org/opensearch/dataprepper/plugins/sink/accumulator/InMemoryBuffer.java # data-prepper-plugins/s3-sink/src/main/java/org/opensearch/dataprepper/plugins/sink/accumulator/LocalFileBuffer.java
Signed-off-by: umairofficial <[email protected]>
Description
ParquetSinkCodec
implementation addedIssues Resolved
Resolves #2403
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.