GH-41011: [C++] Add an output type resolver for decimal types in CompareFunction so can be casted correctly #41012

ZhangHuiGui · 2024-04-04T15:37:56Z

Rationale for this change

Fix the problem that decimal types array compare with different scales make error results.

What changes are included in this PR?

Add an output type resolver for decimal types in CompareFunction so can be casted correctly

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, decimal type's comparison operations with different scale will be correct.

Co-authored-by ZhangHuiGui [email protected]
Co-authored-by laotan332 [email protected]

GitHub Issue: [C++] Decimal types array compare with different scales make error results #41011

github-actions · 2024-04-04T15:38:26Z

⚠️ GitHub issue #41011 has been automatically assigned in GitHub to PR creator.

ZhangHuiGui · 2024-04-11T07:31:53Z

@bkietz Would you mind take a look at this? A simple fix for the decimal type's compare function:)

felipecrv · 2024-05-10T15:02:34Z

cpp/src/arrow/compute/kernels/scalar_compare.cc

+  }
+  return boolean();
+}
+


It's an interesting approach to use the output-type resolver to assert on the input types, but what you really doing here is asserting on the inputs. I think you should write a custom matcher instead. Compare always returns boolean(), there is nothing to resolve.

That would also help (in the future) dispatching to kernels that can deal with different scales dynamically. Values can still be logically equal even though they are represented physically in memory with different scales.

I've read the issue and it seems that the problem here is that we need a special cast when the scales don't match because the generic cast can't currently handle that.

Or maybe we want users to be explicit with casts before comparing. I still think this should be handled by the input matching machinery.

but what you really doing here is asserting on the inputs.

Yes, this line of code DCHECK_EQ(left_type.id(), right_type.id()); in ResolveDecimalCompareOutputType is problematic because we might be comparing float and decimal or decimal128 and decimal256...

Or maybe we want users to be explicit with casts before comparing. I still think this should be handled by the input matching machinery.

The current matcher system only works on matching of builtin types, and what the compare kernel function needs is whether the dependencies between builtin types meet the requirements. This requires adding a bool Matches(const std::vector<TypeHolder>& types) const; interface to TypeMatcher.

That means the previous semantics of TypeMatcher was to check the validity of builtin types. Now it is necessary to check the legality of dependencies generated when functions use builtin types. Is this reasonable for the design of TypeMatcher?

But in fact, it is indeed more reasonable to use matcher to determine whether the comparison operation of decimal types is legal.

Besides, there is a similar case in IfElse related kernel functions : #41363.
Do the input types' check in the output resolver in IfElse case is reasonable?

Is this reasonable for the design of TypeMatcher?

As long as it checks only the types and the code is simple enough that we can reason about the set of combinations handled.

If some combinations are invalid, but you want to generate a message, I think you could have the type matcher route bad combinations to these fail kernels that produce an error instead of automatically casting inputs.

That way, the output type of "compare" remains boolean as it should be. No extra logic needed for output type resolution, but the function is not total — for some input types, it produces a bad Status.

As long as it checks only the types and the code is simple enough that we can reason about the set of combinations handled.

Thanks, it's more concise to use TypeMatcher check the input types.

ZhangHuiGui · 2024-05-17T08:17:11Z

cpp/src/arrow/compute/kernel.cc

@@ -505,9 +510,14 @@ bool KernelSignature::Equals(const KernelSignature& other) const {
 }

 bool KernelSignature::MatchesInputs(const std::vector<TypeHolder>& types) const {
+  auto is_match_combination_types = [&](const InputType& in_type) {


Add type matcher acceptable rule for the combination types.

mapleFU · 2024-05-17T08:31:15Z

cpp/src/arrow/compute/kernels/scalar_compare.cc

+    //
+    const auto& left_type = checked_cast<const DecimalType&>(*types[0]);
+    const auto& right_type = checked_cast<const DecimalType&>(*types[1]);
+    assert(is_decimal(left_type) && is_decimal(right_type));


Is this duplicate?

Ah, yeah. Thanks, the line is for test. :)

mapleFU · 2024-05-17T08:32:44Z

cpp/src/arrow/compute/kernels/scalar_compare.cc

+    const int32_t s1 = left_type.scale();
+    const int32_t s2 = right_type.scale();


So same scale with different precision can be matched here?

Yes, kAdd promotion rule only care about the scale and the implicit cast with this rule only keep scales same.

arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

Lines 509 to 523 in 2dbc5e2

Result<TypeHolder> ResolveDecimalAdditionOrSubtractionOutput(

KernelContext*, const std::vector<TypeHolder>& types) {

return ResolveDecimalBinaryOperationOutput(

types,

[](int32_t p1, int32_t s1, int32_t p2,

int32_t s2) -> Result<std::pair<int32_t, int32_t>> {

if (s1 != s2) {

return Status::Invalid("Addition or subtraction of two decimal ",

"types scale1 != scale2. (", s1, s2, ").");

}

DCHECK_EQ(s1, s2);

const int32_t scale = s1;

const int32_t precision = std::max(p1 - s1, p2 - s2) + scale + 1;

return std::make_pair(precision, scale);

});

mapleFU · 2024-05-17T08:34:21Z

cpp/src/arrow/compute/kernel.cc

  if (is_varargs_) {
    for (size_t i = 0; i < types.size(); ++i) {
-      if (!in_types_[std::min(i, in_types_.size() - 1)].Matches(*types[i])) {
+      auto in_type = in_types_[std::min(i, in_types_.size() - 1)];


Suggested change

auto in_type = in_types_[std::min(i, in_types_.size() - 1)];

const auto& in_type = in_types_[std::min(i, in_types_.size() - 1)];

…ernel

ZhangHuiGui · 2024-05-21T00:44:51Z

cc @felipecrv , the new changes include the TypeMatcher way to return failure for combination types.

felipecrv · 2024-06-05T20:32:04Z

cc @felipecrv , the new changes include the TypeMatcher way to return failure for combination types.

I've been away. I will take a look soon.

felipecrv · 2024-06-05T21:22:36Z

cpp/src/arrow/compute/kernel.h

+  /// \brief Return true if this matcher accepts the combination types
+  virtual bool Matches(const std::vector<TypeHolder>& types) const { return true; }
+


This class is for matching a single matcher (this) against a single type.

I don't think it's reasonable to extend the type matching this way as it makes type checking Turing-complete (wouldn't be easy/possible to encode the typing rules in a provably-decidable type system).

I'm sorry for this back and forth because I was the one that said "type matching machinery". I still think this logic shouldn't go on the output type resolver so I have a more precise suggestion this time. Since functions already have a way of implementing custom type matching rules -- DispatchBest —— you can add this logic in

arrow/cpp/src/arrow/compute/kernels/scalar_compare.cc

Lines 343 to 345 in 9ee6ea7

if (HasDecimal(*types)) {

RETURN_NOT_OK(CastBinaryDecimalArgs(DecimalPromotion::kAdd, types));

}

and return return arrow::compute::detail::NoMatchingKernel(this, *types); or a Status::NotImplemented with a message that describes that you can't compare two decimals with different scales.

Ah, the root cause of the problem that this PR is trying to solve is actually CompareFunction with two different decimal scales in BindNonRecursive , it is impossible to enter the logic of DispatchBest.

arrow/cpp/src/arrow/compute/expression.cc

Lines 556 to 563 in 9ee6ea7

// First try and bind exactly

Result<const Kernel*> maybe_exact_match = call.function->DispatchExact(types);

if (maybe_exact_match.ok()) {

call.kernel = *maybe_exact_match;

if (FinishBind().ok()) {

return Expression(std::move(call));

}

}

Which means the DispatchExact will always return ok and can't go into DispatchBest when CompareFunction called by expression system with two different decimal scales.

This is why we did type judgment in the output type Resolver before, so that we can return not-ok in the first FinishBind and enter DispatchBest.

DispatchBest does not need to make additional judgments, because it will do cast according to the decimal rules, ensuring that different input scales are cast to the same scales according to DecimalPromotion::kAdd.

github-actions bot added Component: C++ awaiting review Awaiting review labels Apr 4, 2024

fix decimal compare wrong in compute expression

8ca5f83

ZhangHuiGui force-pushed the fix-deciaml-compare-wrong-in-compute-expression branch from d837aeb to 8ca5f83 Compare April 10, 2024 14:46

pitrou requested a review from bkietz April 15, 2024 14:16

ZhangHuiGui requested a review from felipecrv May 10, 2024 13:53

felipecrv requested changes May 10, 2024

View reviewed changes

github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting review Awaiting review awaiting changes Awaiting changes labels May 10, 2024

ZhangHuiGui commented May 17, 2024

View reviewed changes

mapleFU reviewed May 17, 2024

View reviewed changes

support type_matcher for combination input types in decimal compare k…

a7703b9

…ernel

ZhangHuiGui force-pushed the fix-deciaml-compare-wrong-in-compute-expression branch from cd69ab2 to a7703b9 Compare May 17, 2024 10:09

ZhangHuiGui mentioned this pull request May 25, 2024

GH-41336: [C++][Compute] Fix the bug of decimal types skipping cast in IfElse related expression function calls #41363

Open

felipecrv requested changes Jun 5, 2024

View reviewed changes

github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GH-41011: [C++] Add an output type resolver for decimal types in CompareFunction so can be casted correctly #41012

GH-41011: [C++] Add an output type resolver for decimal types in CompareFunction so can be casted correctly #41012

ZhangHuiGui commented Apr 4, 2024 •

edited by github-actions bot

Loading

github-actions bot commented Apr 4, 2024

ZhangHuiGui commented Apr 11, 2024 •

edited

Loading

felipecrv May 10, 2024

felipecrv May 10, 2024 •

edited

Loading

felipecrv May 10, 2024

ZhangHuiGui May 10, 2024

ZhangHuiGui May 10, 2024

felipecrv May 14, 2024

felipecrv May 14, 2024

ZhangHuiGui May 17, 2024

ZhangHuiGui May 17, 2024 •

edited

Loading

mapleFU May 17, 2024

ZhangHuiGui May 17, 2024

mapleFU May 17, 2024

ZhangHuiGui May 17, 2024 •

edited

Loading

mapleFU May 17, 2024

ZhangHuiGui commented May 21, 2024

felipecrv commented Jun 5, 2024

felipecrv Jun 5, 2024

ZhangHuiGui Jun 6, 2024 •

edited

Loading

		const int32_t s1 = left_type.scale();
		const int32_t s2 = right_type.scale();

	Result<TypeHolder> ResolveDecimalAdditionOrSubtractionOutput(
	KernelContext*, const std::vector<TypeHolder>& types) {
	return ResolveDecimalBinaryOperationOutput(
	types,
	[](int32_t p1, int32_t s1, int32_t p2,
	int32_t s2) -> Result<std::pair<int32_t, int32_t>> {
	if (s1 != s2) {
	return Status::Invalid("Addition or subtraction of two decimal ",
	"types scale1 != scale2. (", s1, s2, ").");
	}
	DCHECK_EQ(s1, s2);
	const int32_t scale = s1;
	const int32_t precision = std::max(p1 - s1, p2 - s2) + scale + 1;
	return std::make_pair(precision, scale);
	});

	auto in_type = in_types_[std::min(i, in_types_.size() - 1)];
	const auto& in_type = in_types_[std::min(i, in_types_.size() - 1)];

		/// \brief Return true if this matcher accepts the combination types
		virtual bool Matches(const std::vector<TypeHolder>& types) const { return true; }

	if (HasDecimal(*types)) {
	RETURN_NOT_OK(CastBinaryDecimalArgs(DecimalPromotion::kAdd, types));
	}

	// First try and bind exactly
	Result<const Kernel*> maybe_exact_match = call.function->DispatchExact(types);
	if (maybe_exact_match.ok()) {
	call.kernel = *maybe_exact_match;
	if (FinishBind().ok()) {
	return Expression(std::move(call));
	}
	}

GH-41011: [C++] Add an output type resolver for decimal types in CompareFunction so can be casted correctly #41012

Are you sure you want to change the base?

GH-41011: [C++] Add an output type resolver for decimal types in CompareFunction so can be casted correctly #41012

Conversation

ZhangHuiGui commented Apr 4, 2024 • edited by github-actions bot Loading

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

github-actions bot commented Apr 4, 2024

ZhangHuiGui commented Apr 11, 2024 • edited Loading

Choose a reason for hiding this comment

felipecrv May 10, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZhangHuiGui May 17, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZhangHuiGui May 17, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZhangHuiGui commented May 21, 2024

felipecrv commented Jun 5, 2024

Choose a reason for hiding this comment

ZhangHuiGui Jun 6, 2024 • edited Loading

Choose a reason for hiding this comment

ZhangHuiGui commented Apr 4, 2024 •

edited by github-actions bot

Loading

ZhangHuiGui commented Apr 11, 2024 •

edited

Loading

felipecrv May 10, 2024 •

edited

Loading

ZhangHuiGui May 17, 2024 •

edited

Loading

ZhangHuiGui May 17, 2024 •

edited

Loading

ZhangHuiGui Jun 6, 2024 •

edited

Loading