-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-7749][VL] Trim ISOControl characters in string for casting to integral type #7806
Conversation
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxSparkPlanExecApi.scala
Outdated
Show resolved
Hide resolved
Run Gluten Clickhouse CI on x86 |
1 similar comment
Run Gluten Clickhouse CI on x86 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically look good to me. @PHILO-HE any more comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wForget, thanks for your fix! Could you give me some details about where these control chars are skipped when casting to integral types in Spark? I only found white space characters are checked and skipped.
@@ -723,30 +723,29 @@ class VeloxSparkPlanExecApi extends SparkPlanExecApi { | |||
val trimParaSepStr = "\u2029" | |||
// Needs to be trimmed for casting to float/double/decimal | |||
val trimSpaceStr = ('\u0000' to '\u0020').toList.mkString | |||
// ISOControl characters, refer java.lang.Character.isISOControl(int) | |||
val isoControlControlStr = (('\u0000' to '\u001F') ++ ('\u007F' to '\u009F')).toList.mkString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isoControlControlStr
->isoControlStr
, a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, fixed
Run Gluten Clickhouse CI on x86 |
@wForget, I see. What I checked is Spark-3.3.1, which doesn't cover that change. Let's note this is not applicable to all supported Spark versions. But I think it may be acceptable to end users. |
What changes were proposed in this pull request?
Fixes: #7749, to align with Spark's change introduced by apache/spark#41535.
How was this patch tested?
added unit test