GH-41121: [C++] Fix: left anti join filter empty rows. (#41122)
### Rationale for this change

The left anti filter implementation is built on the left semi filter, and the left semi filter hits an assertion error when it is given zero rows. As a result, a left anti join with a residual filter can fail when no rows reach the filter. This PR fixes that.
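
To illustrate the failure mode, here is a minimal standalone sketch, not Arrow code. `EvalMask`, `FilterSemi`, and `FilterAnti` are hypothetical names, and the assumption that the evaluator asserts on an empty batch is made only for the example; the exact assertion in Acero is not shown in this commit. The sketch shows an anti filter that delegates to a semi filter, with an early return guarding the empty case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a filter evaluator; NOT Arrow's EvalFilter.
// Assumption for illustration only: it requires at least one row.
std::vector<bool> EvalMask(const std::vector<int32_t>& row_ids,
                           const std::function<bool(int32_t)>& pred) {
  assert(!row_ids.empty() && "evaluator expects a non-empty batch");
  std::vector<bool> mask;
  mask.reserve(row_ids.size());
  for (int32_t id : row_ids) mask.push_back(pred(id));
  return mask;
}

// Semi-style filter: returns the candidate row ids that pass the predicate.
std::vector<int32_t> FilterSemi(const std::vector<int32_t>& candidate_ids,
                                const std::function<bool(int32_t)>& pred) {
  std::vector<int32_t> passing;
  // Guard: with no candidates there is nothing to evaluate, so return early
  // instead of calling the evaluator with an empty batch.
  if (candidate_ids.empty()) return passing;
  const std::vector<bool> mask = EvalMask(candidate_ids, pred);
  for (std::size_t i = 0; i < candidate_ids.size(); ++i) {
    if (mask[i]) passing.push_back(candidate_ids[i]);
  }
  return passing;
}

// Anti-style filter built on the semi filter: keeps the left row ids that
// have no candidate passing the predicate.
std::vector<int32_t> FilterAnti(const std::vector<int32_t>& left_ids,
                                const std::vector<int32_t>& candidate_ids,
                                const std::function<bool(int32_t)>& pred) {
  const std::vector<int32_t> matched = FilterSemi(candidate_ids, pred);
  std::vector<int32_t> result;
  for (int32_t id : left_ids) {
    if (std::find(matched.begin(), matched.end(), id) == matched.end()) {
      result.push_back(id);
    }
  }
  return result;
}
```

In this sketch, calling `FilterAnti` with one left row and no candidates returns that left row: the semi filter sees an empty input and returns early rather than reaching the evaluator.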

### What changes are included in this PR?

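An early return in `JoinResidualFilter::FilterOneBatch` (swiss_join.cc) when the batch has no rows, and a regression test in hash_join_node_test.cc.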

### Are these changes tested?
Yes

### Are there any user-facing changes?
No

* GitHub Issue: #41121

Lead-authored-by: light-city <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
2 people authored and raulcd committed Apr 15, 2024
1 parent 1e40252 commit 9cb361c
Showing 2 changed files with 28 additions and 0 deletions.
23 changes: 23 additions & 0 deletions cpp/src/arrow/acero/hash_join_node_test.cc
@@ -2036,6 +2036,29 @@ TEST(HashJoin, ResidualFilter) {
[3, 4, "alpha", 4, 16, "alpha"]])")});
}

TEST(HashJoin, FilterEmptyRows) {
  // Regression test for GH-41121.
  BatchesWithSchema input_left;
  input_left.batches = {
      ExecBatchFromJSON({int32(), utf8(), int32()}, R"([[2, "Jarry", 28]])")};
  input_left.schema =
      schema({field("id", int32()), field("name", utf8()), field("age", int32())});

  BatchesWithSchema input_right;
  input_right.batches = {ExecBatchFromJSON(
      {int32(), int32(), utf8()},
      R"([[2, 10, "Jack"], [3, 12, "Mark"], [4, 15, "Tom"], [1, 10, "Jack"]])")};
  input_right.schema =
      schema({field("id", int32()), field("stu_id", int32()), field("subject", utf8())});

  const ResidualFilterCaseRunner runner{std::move(input_left), std::move(input_right)};

  Expression filter = greater(field_ref("age"), literal(25));

  runner.Run(JoinType::LEFT_ANTI, {"id"}, {"stu_id"}, std::move(filter),
             {ExecBatchFromJSON({int32(), utf8(), int32()}, R"([[2, "Jarry", 28]])")});
}

TEST(HashJoin, TrivialResidualFilter) {
  Expression always_true =
      equal(call("add", {field_ref("l1"), field_ref("r1")}), literal(2));  // 1 + 1 == 2
5 changes: 5 additions & 0 deletions cpp/src/arrow/acero/swiss_join.cc
@@ -2167,6 +2167,11 @@ Status JoinResidualFilter::FilterOneBatch(const ExecBatch& keypayload_batch,
  ARROW_DCHECK(!output_payload_ids || payload_ids_maybe_null);

  *num_passing_rows = 0;

  if (num_batch_rows == 0) {
    return Status::OK();
  }

  ARROW_ASSIGN_OR_RAISE(Datum mask,
                        EvalFilter(keypayload_batch, num_batch_rows, batch_row_ids,
                                   key_ids_maybe_null, payload_ids_maybe_null));