-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RowStringInterpolator utility #32367
Conversation
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
assign set of reviewers |
tests are red because #32366 hasn't been merged yet, but this PR is still ready for review |
Assigning reviewers. If you would like to opt out of this review, comment R: @damondouglas for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Reminder, please take a look at this pr: @damondouglas |
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @robertwb for label java. Available commands:
|
sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowStringInterpolator.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowStringInterpolator.java
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowStringInterpolator.java
Outdated
Show resolved
Hide resolved
fieldsToReplace = new HashSet<>(); | ||
while (m.find()) { | ||
fieldsToReplace.add(StringUtils.strip(m.group(), "{}")); | ||
} | ||
|
||
List<String> rowFields = | ||
fieldsToReplace.stream() | ||
.filter(f -> !WINDOWING_METADATA.contains(f)) | ||
.collect(Collectors.toList()); | ||
|
||
RowFilter.validateSchemaContainsFields(rowSchema, rowFields, "string interpolation"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following results from my concern for doing work inside a constructor. I don't want to block this PR. I will move on if you resolve this comment. However, brainstorming out loud, may we consider a constructor, possibly package private, that accepts the template and fieldsToReplace. Then provide a static method, perhaps called compile, that performs the work of this constructor. The benefits of this is that testing the interpolate could be independent of testing the compile phase. Also, this separation may make changes in the future possibly easier as they seem like separate but related phases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if it was not possible for one to instantiate an invalid interpolator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to not being able to instantiate an invalid interpolator.
} | ||
|
||
/** Like {@link #interpolate(Row)} but also potentially include windowing information. */ | ||
public String interpolate(Row row, BoundedWindow window, PaneInfo paneInfo, Instant timestamp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non blocking but I was concerned about the possibility for future new sources of interpolation. For example, what if someone wants to add a new Thing but include the existing sources of interpolate, row, window, etc? Then in another future, someone else wants to add a new Thing2. This may require some thought. However, what I was envisioning was something like this:
interface Interpolator {
String interpolate();
}
// instantiated with a helper method maybe
class RowInterpolator {
String interpolate() { }
}
class RowStringInterpolator
public String interpolate(Interpolator ...interpolators) { ... }
}
String result = interpolator.interpolate(rowInterpolator, windowInterpolator, paneInfoInterpolator, timestampInterpolator);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I could see this being something like
interface Interpolator<T> {
String interpolate(T);
...
}
RowStringInterpolator implements Interpolator<Row> {}
With that said though, I'd prefer we leave it off until it is warranted. Most of this work is just for cross-language and portability, where Beam Row is used as a standard. I haven't come across a case where we would benefit with String interpolation using another type, so modularizing this may not actually make a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that we would want to support "row-compatible" objects, not just Rows, but I don't think that needs to be done from the start.
Thanks @damondouglas, PTAL |
sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowStringInterpolator.java
Outdated
Show resolved
Hide resolved
fieldsToReplace = new HashSet<>(); | ||
while (m.find()) { | ||
fieldsToReplace.add(StringUtils.strip(m.group(), "{}")); | ||
} | ||
|
||
List<String> rowFields = | ||
fieldsToReplace.stream() | ||
.filter(f -> !WINDOWING_METADATA.contains(f)) | ||
.collect(Collectors.toList()); | ||
|
||
RowFilter.validateSchemaContainsFields(rowSchema, rowFields, "string interpolation"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to not being able to instantiate an invalid interpolator.
public static final String DD = "$DD"; | ||
private static final Set<String> WINDOWING_METADATA = | ||
Sets.newHashSet(WINDOW, PANE_INDEX, YYYY, MM, DD); | ||
private static final Pattern TEMPLATE_PATTERN = Pattern.compile("\\{(.+?)}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not care about being able to escape a {
? (Maybe this is fine as future work.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It complains if we don't escape it in the pattern:
java.util.regex.PatternSyntaxException: Illegal repetition {(.+?)}
Normally, curly brackets signify how many times a character should be repeated
} | ||
|
||
/** Like {@link #interpolate(Row)} but also potentially include windowing information. */ | ||
public String interpolate(Row row, BoundedWindow window, PaneInfo paneInfo, Instant timestamp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that we would want to support "row-compatible" objects, not just Rows, but I don't think that needs to be done from the start.
sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowStringInterpolator.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
/** Performs string interpolation on the template using values from the input {@link Row}. */ | ||
public String interpolate(Row row) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation to have a separate no-windowing implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Figured why not, but on a second thought will remove it to keep windowing properties as first-class citizens
Utility to substitute values from Beam Rows into a template String.
Depends on #32366
Part of #32365
See #31807 for how this can work for portable dynamic destinations