Relational: Anti-join with IS DISTINCT FROM #8

Closed
1 task done
krlmlr opened this issue Aug 17, 2023 · 4 comments
@krlmlr
Collaborator

krlmlr commented Aug 17, 2023

What happens?

Supporting this might lead to faster execution in duckplyr.

To Reproduce

Needs duckdb/duckdb#8600.

con <- DBI::dbConnect(duckdb::duckdb())
experimental <- FALSE
invisible(
  DBI::dbExecute(con, "CREATE MACRO \"___eq_na_matches_na\"(x, y) AS (x IS DISTINCT FROM y)")
)
df1 <- data.frame(a = 1L)

rel1 <- duckdb:::rel_from_df(con, df1, experimental = experimental)
rel2 <- duckdb:::rel_set_alias(rel1, "lhs")
rel3 <- duckdb:::rel_from_df(con, df1, experimental = experimental)
rel4 <- duckdb:::rel_set_alias(rel3, "rhs")
rel5 <- duckdb:::rel_join(
  rel2,
  rel4,
  list(
    duckdb:::expr_function(
      "___eq_na_matches_na",
      list(duckdb:::expr_reference("a", rel2), duckdb:::expr_reference("a", rel4))
    )
  ),
  "anti"
)
rel5
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Join REGULAR ANTI ___eq_na_matches_na(lhs.a, rhs.a)
#>   r_dataframe_scan(0x12821dee8)
#>   r_dataframe_scan(0x12821dee8)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (INTEGER)
duckdb:::rel_to_altrep(rel5)
#> Error in row.names.data.frame(x): Error evaluating duckdb query: Not implemented Error: Unimplemented comparison type for join!

Created on 2023-08-17 with reprex v2.0.2

OS:

macOS aarch64

DuckDB Version:

0ff709bdc628ea24111265eb66d74220ce3bb6df

DuckDB Client:

R

Full Name:

Kirill Müller

Affiliation:

cynkra GmbH

Have you tried this on the latest master branch?

I have tested with a master build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@hannes hannes transferred this issue from duckdb/duckdb Sep 13, 2023
@Tmonster
Contributor

This works in the client, so this is a bug in the R client:

create table tb1 as select range*2 as a from range(100);
create table tb2 as select range*4 as a from range(100);
insert into tb2 (select NULL from range(20));
insert into tb1 (select NULL from range(20));
select * from tb1 join tb2 on (tb1.a is not distinct from tb2.a);
select * from tb1 anti join tb2 on (tb1.a is not distinct from tb2.a);

Will take a look.

@Tmonster
Contributor

Tmonster commented Sep 20, 2023

Whoops, this seems to be a true DuckDB bug. It looks like the macro is causing the problem.

CREATE MACRO ___eq_na_matches_na(x, y) AS (x IS DISTINCT FROM y);
create table tb1 as select range*2 as a from range(100);
create table tb2 as select range*4 as a from range(100);
insert into tb2 (select NULL from range(20));
insert into tb1 (select NULL from range(20));
SELECT * FROM tb1 AS lhs ANTI JOIN tb2 AS rhs ON (___eq_na_matches_na(lhs.a, rhs.a));
-- Error: Not implemented Error: Unimplemented comparison type for join!

The error is thrown in file nested_loop_join_mark.cpp, in MarkJoinComparisonSwitch (line 131).

@lnkuiper
Contributor

I can send a PR to fix this after duckdb/duckdb#8979 gets merged; we're just missing a case there.

@krlmlr
Collaborator Author

krlmlr commented Nov 8, 2023

Works now (we need IS NOT DISTINCT FROM):

con <- DBI::dbConnect(duckdb::duckdb())
experimental <- FALSE
invisible(
  DBI::dbExecute(con, "CREATE MACRO \"___eq_na_matches_na\"(x, y) AS (x IS NOT DISTINCT FROM y)")
)
df1 <- data.frame(a = 1:3)
df2 <- data.frame(a = 1L)

rel1 <- duckdb:::rel_from_df(con, df1, experimental = experimental)
rel2 <- duckdb:::rel_set_alias(rel1, "lhs")
rel3 <- duckdb:::rel_from_df(con, df2, experimental = experimental)
rel4 <- duckdb:::rel_set_alias(rel3, "rhs")
rel5 <- duckdb:::rel_join(
  rel2,
  rel4,
  list(
    duckdb:::expr_function(
      "___eq_na_matches_na",
      list(duckdb:::expr_reference("a", rel2), duckdb:::expr_reference("a", rel4))
    )
  ),
  "anti"
)
rel5
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Join REGULAR ANTI ___eq_na_matches_na(lhs.a, rhs.a)
#>   r_dataframe_scan(0x106f9fed8)
#>   r_dataframe_scan(0x106fe3730)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (INTEGER)
duckdb:::rel_to_altrep(rel5)
#>   a
#> 1 2
#> 2 3

Created on 2023-11-08 with reprex v2.0.2
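
For context, a minimal sketch of why the macro has to use IS NOT DISTINCT FROM rather than plain equality when NA should match NA. It runs through DBI instead of the internal relational API. The table names `lhs` and `rhs` and the sample values are made up for illustration, and it assumes a DuckDB build that accepts the ANTI JOIN syntax used earlier in this thread.

```r
# Illustrative data only: `lhs` and `rhs` are hypothetical tables.
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(con, "lhs", data.frame(a = c(1L, 2L, NA)))
DBI::dbWriteTable(con, "rhs", data.frame(a = c(1L, NA)))

# Plain equality: NULL = NULL evaluates to NULL, so the NA row in `lhs`
# never finds a match and the anti-join keeps it.
res_eq <- DBI::dbGetQuery(
  con,
  "SELECT * FROM lhs ANTI JOIN rhs ON (lhs.a = rhs.a)"
)

# NULL-safe equality: NULL IS NOT DISTINCT FROM NULL is TRUE, so the NA row
# in `lhs` matches the NA row in `rhs` and the anti-join drops it.
res_nd <- DBI::dbGetQuery(
  con,
  "SELECT * FROM lhs ANTI JOIN rhs ON (lhs.a IS NOT DISTINCT FROM rhs.a)"
)

print(res_eq)
print(res_nd)

DBI::dbDisconnect(con, shutdown = TRUE)
```

With these inputs, `res_eq` should contain the rows 2 and NA, while `res_nd` should contain only the row 2. The second result is the NA-matching behavior the `___eq_na_matches_na` macro is meant to provide.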

@krlmlr krlmlr closed this as completed Nov 8, 2023