[CORE] Fix incorrect precision of decimal literal #6954

Merged: 2 commits into apache:main, Aug 28, 2024

Conversation

jiangjiangtian (Contributor)

For the following SQL:

select (col0 / (col1 + 0.00000001)) from table;

In this case, col0 and col1 are 0. The result may be NULL. The reason is that Decimal(0.00000001).toString() returns "1E-8", which makes the new precision 4. Therefore, we use toPlainString to prevent scientific notation.
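For illustration, here is a minimal sketch of the underlying behavior using java.math.BigDecimal, to which Spark's Decimal ultimately delegates its string form (the demo object name is made up for this example):

import java.math.BigDecimal

object PlainStringDemo extends App {
  val d = new BigDecimal("0.00000001")
  // toString switches to scientific notation when the adjusted exponent is small
  println(d.toString)      // "1E-8" -- a 4-character string
  // toPlainString never uses scientific notation
  println(d.toPlainString) // "0.00000001"
}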

@github-actions github-actions bot added the CORE works for Gluten Core label Aug 21, 2024
jiangjiangtian (Contributor, Author)

@kecookier

jiangjiangtian (Contributor, Author)

@rui-mo Can you review this PR?
I have a question: why do we have this rescale in Gluten? I can't find the same logic in Spark. I have read the comment, but I still don't fully understand it.
Thanks!

@kecookier kecookier requested a review from rui-mo August 26, 2024 03:06
rui-mo (Contributor) commented Aug 27, 2024

Hi @jiangjiangtian, this adjustment, as I recall, is for the situation where an arithmetic operation is performed between a decimal and a number. In that case, the number is converted to decimal, and the precision and scale Spark assigns to it are (38, 18), which do not match the actual values. E.g., in the case you mentioned, Decimal(0.00000001) should have a precision and scale of (8, 8) instead of (38, 18). To produce accurate results, we need additional logic to extract the accurate precision and scale that native computation requires.

Perhaps you could help confirm whether this is the case for the example you provided. Thanks.
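As a hedged sketch of the kind of extraction described here (the helper name is hypothetical, not Gluten's actual DecimalArithmeticUtil code), the real precision and scale can be derived from the plain string form:

import java.math.BigDecimal

// Illustrative only: derive the real (precision, scale) of a decimal value
// from its plain (non-scientific) string representation.
def realPrecisionAndScale(d: BigDecimal): (Int, Int) = {
  val s = d.toPlainString.stripPrefix("-")
  val dot = s.indexOf('.')
  if (dot < 0) (s.length, 0)
  else {
    val scale = s.length - dot - 1
    // drop a leading "0." so that 0.00000001 counts as 8 digits, not 9
    val integral = if (s.startsWith("0.")) 0 else dot
    (integral + scale, scale)
  }
}

// realPrecisionAndScale(new BigDecimal("0.00000001")) == (8, 8), not (38, 18)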

rui-mo (Contributor) left a comment

Thanks for the fix. Would you like to add the buggy case as a unit test?

jiangjiangtian (Contributor, Author)

> Hi @jiangjiangtian, this adjustment, as I recall, is for the situation where an arithmetic operation is performed between a decimal and a number. […]

@rui-mo Thanks! It seems that Spark doesn't have this logic, and I don't know why Spark doesn't need it.

In my case, the type of the literal 0.00000001 is Decimal(19, 8). After the adjustment, the type is still Decimal(19, 8), because the string representation of the decimal contains `.` and is therefore not a valid long number. Perhaps we should not return the original precision and scale at the end of the function. https://github.com/apache/incubator-gluten/blob/main/gluten-core/src/main/scala/org/apache/gluten/utils/DecimalArithmeticUtil.scala#L110
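A hedged sketch of the check being discussed (names are hypothetical; the real code lives in DecimalArithmeticUtil.scala): the rescale only applies when the plain string is a valid long, so any value containing `.` keeps its original precision and scale.

import java.math.BigDecimal
import scala.util.Try

// Illustrative only: a mirror of the described guard, not the actual function.
def adjustedPrecisionAndScale(d: BigDecimal, precision: Int, scale: Int): (Int, Int) = {
  val str = d.toPlainString
  if (!str.contains('.') && Try(str.toLong).isSuccess) {
    (str.stripPrefix("-").length, 0) // rescale as an integral literal
  } else {
    (precision, scale) // e.g., 0.00000001 stays Decimal(19, 8)
  }
}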

@github-actions github-actions bot added the VELOX label Aug 27, 2024
jiangjiangtian (Contributor, Author)

> Thanks for the fix. Would you like to add the buggy case as a unit test?

I added a unit test. Is there anything else I need to add or edit? Thanks.

rui-mo (Contributor) commented Aug 27, 2024

> Because the string representation of the decimal contains `.` and is therefore not a valid long number.

Thanks for reminding me of this. I just remembered that this adjustment is typically for an arithmetic operation between a decimal and an integer/bigint. In your case the literal is a double, so returning the original precision and scale should be fine.

withTable("test") {
  sql("create table test (col0 decimal(10, 0), col1 decimal(10, 0)) using parquet")
  sql("insert into test values (0, 0)")
  runQueryAndCompare("select col0 / (col1 + 1E-8) from test") { _ => }
}
Contributor

There is a test failure as below:

Fix wrong rescale *** FAILED ***
  org.apache.spark.sql.AnalysisException: unknown requires that the data to be inserted have the same number of columns as the target table: target table has 3 column(s) but the inserted data has 2 column(s), including 0 partition column(s) having constant value(s).

jiangjiangtian (Contributor, Author)

Fixed.

@rui-mo rui-mo changed the title [CORE] Fix incorrect precision of Decimal literal [CORE] Fix incorrect precision of decimal literal Aug 28, 2024
@rui-mo rui-mo merged commit 7e800f6 into apache:main Aug 28, 2024
44 checks passed
@jiangjiangtian jiangjiangtian deleted the fix_precision branch August 28, 2024 05:16