Escape control characters for DynamoDB source #5177
base: main
Signed-off-by: Paul Sasieta Arana <[email protected]>
char c = jsonData.charAt(i);
if (Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r') {
    // Replace control characters with escaped versions (e.g. \u0000 for null, \u0001 for start of heading, etc.)
    sanitizedStringBuilder.append(String.format("\\u%04X", (int) c));
Is there a way to write this string without calling String.format, to avoid the performance penalty?
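One possible answer to the question above is to emit the four hex digits by hand from a lookup table. The following is only a sketch under that assumption: the class name `ControlCharEscaper`, the method name `escapeControlCharacters`, and the `else` branch that copies ordinary characters through are hypothetical and are not taken from this pull request. It keeps the same condition as the diff and the same uppercase, four-digit format as `String.format("\\u%04X", (int) c)`.

```java
public class ControlCharEscaper {
    // Lookup table for uppercase hex digits, matching the %04X format in the diff.
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hypothetical helper: same condition as the pull request, but no String.format.
    static String escapeControlCharacters(String jsonData) {
        StringBuilder sanitized = new StringBuilder(jsonData.length());
        for (int i = 0; i < jsonData.length(); i++) {
            char c = jsonData.charAt(i);
            if (Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r') {
                // Append a backslash, the letter u, then four hex digits, high nibble first.
                sanitized.append('\\').append('u')
                        .append(HEX[(c >> 12) & 0xF])
                        .append(HEX[(c >> 8) & 0xF])
                        .append(HEX[(c >> 4) & 0xF])
                        .append(HEX[c & 0xF]);
            } else {
                // Assumption: all other characters are copied through unchanged.
                sanitized.append(c);
            }
        }
        return sanitized.toString();
    }

    public static void main(String[] args) {
        // The input holds a single START OF HEADING character between 'a' and 'b'.
        System.out.println(escapeControlCharacters("a\u0001b"));
    }
}
```

Whether this is measurably faster than `String.format` on real DynamoDB records would need a benchmark; the sketch only shows that the formatting call is avoidable.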
StringBuilder sanitizedStringBuilder = new StringBuilder();
for (int i = 0; i < jsonData.length(); i++) {
    char c = jsonData.charAt(i);
    if (Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r') {
I tend to think that we should have this as an optional configuration to avoid breaking any existing behavior. Thoughts?
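To make the suggestion concrete, here is a minimal sketch of what gating the escaping behind a setting could look like. The boolean parameter `escapeControlCharacters`, the class name, and the method name `sanitize` are all hypothetical; the pull request does not define such an option, and how it would be exposed in the source configuration is left open. With the flag off, the input is returned untouched, so existing behavior is preserved by default.

```java
public class OptionalEscapingSketch {
    // escapeControlCharacters is a hypothetical option; false returns the input unchanged.
    static String sanitize(String jsonData, boolean escapeControlCharacters) {
        if (!escapeControlCharacters) {
            return jsonData;
        }
        StringBuilder sanitized = new StringBuilder(jsonData.length());
        for (int i = 0; i < jsonData.length(); i++) {
            char c = jsonData.charAt(i);
            if (Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r') {
                // Same formatting call as in the diff under review.
                sanitized.append(String.format("\\u%04X", (int) c));
            } else {
                // Assumption: non-control characters pass through as-is.
                sanitized.append(c);
            }
        }
        return sanitized.toString();
    }

    public static void main(String[] args) {
        String raw = "id\u0002value";
        // Flag off: the string comes back unmodified.
        System.out.println(sanitize(raw, false).equals(raw)); // true
        // Flag on: the control character is replaced by its escaped form.
        System.out.println(sanitize(raw, true));
    }
}
```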
@@ -188,7 +188,8 @@ void test_writeSingleRecordToBuffer() throws Exception {
         "and/or",
         "c:\\Home",
         "I take\nup multiple\nlines",
-        "String with some \"backquotes\"."
+        "String with some \"backquotes\".",
+        "String with some control characters: \0\1\2\3\4\5\6\7\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\28\29\30\31\127\b\t\n\f\r"
We should also have a test that provides both the input and the expected output to verify that the result is what we want.
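A test of the kind requested here could pair a fixed input with a literal expected string. The sketch below is framework-free so it runs on its own; the class name and the `escape` helper are hypothetical stand-ins that copy the condition and formatting call from the diff, and the `else` branch is an assumption. The input uses `\uXXXX` escapes instead of octal ones because Java octal escapes are easy to misread: `\28` is `\2` followed by the digit `8`, and `\127` is octal for the letter `W`, not DEL.

```java
public class EscapeExpectedOutputCheck {
    // Hypothetical stand-in for the method under test, using the logic shown in the diff.
    static String escape(String jsonData) {
        StringBuilder sanitized = new StringBuilder();
        for (int i = 0; i < jsonData.length(); i++) {
            char c = jsonData.charAt(i);
            if (Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r') {
                sanitized.append(String.format("\\u%04X", (int) c));
            } else {
                sanitized.append(c);
            }
        }
        return sanitized.toString();
    }

    public static void main(String[] args) {
        // Input: NUL, UNIT SEPARATOR and DEL mixed with a tab and a newline.
        String input = "a\u0000b\u001Fc\u007Fd\te\nf";
        // Expected: the three control characters escaped, tab and newline untouched.
        String expected = "a\\u0000b\\u001Fc\\u007Fd\te\nf";

        String actual = escape(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("expected <" + expected + "> but got <" + actual + ">");
        }
        System.out.println("ok");
    }
}
```

In the project's own test suite the same input and expected pair would presumably be expressed with its existing assertion style rather than a `main` method.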
Description
Escapes control characters for the DynamoDB source.
Issues Resolved
Resolves #5027
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.