Commit

Update doc note and comments
nealrichardson committed Nov 5, 2024
1 parent ac69fba commit 1c470cc
Showing 2 changed files with 7 additions and 8 deletions.
5 changes: 4 additions & 1 deletion r/R/arrow-package.R
@@ -62,7 +62,10 @@ supported_dplyr_methods <- list(
   relocate = NULL,
   compute = NULL,
   collapse = NULL,
-  distinct = "`.keep_all = TRUE` not supported",
+  distinct = c(
+    "`.keep_all = TRUE` returns a non-missing value if present,",
+    "only returning missing values if all are missing."
+  ),
   left_join = "the `copy` argument is ignored",
   right_join = "the `copy` argument is ignored",
   inner_join = "the `copy` argument is ignored",
10 changes: 3 additions & 7 deletions r/R/dplyr-distinct.R
@@ -28,14 +28,10 @@ distinct.arrow_dplyr_query <- function(.data, ..., .keep_all = FALSE) {
   }

   if (isTRUE(.keep_all)) {
-    # (TODO) `.keep_all = TRUE` return first row value,
-    # but this implementation do NOT always return the same result
-    # because `hash_one` skips rows if they contain null value.
-    # Skipping null values is happened by each cols,
-    # so this option has possiblity to destory data.
+    # Note: in regular dplyr, `.keep_all = TRUE` returns the first row's value.
+    # However, Acero's `hash_one` function prefers returning non-null values.
+    # So, you'll get the same shape of data, but the values may differ.
     keeps <- names(.data)[!(names(.data) %in% .data$group_by_vars)]
-    # `one()` is wrapper for calling "hash_one" function (implemented ARROW-13993)
-    # `USAGE: summarize(x = one(x), y = one(y) ...)` for x, y in non-group cols
     exprs <- lapply(keeps, function(x) call2("one", sym(x)))
     names(exprs) <- keeps
   } else {
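The new comment says the result has the same shape as dplyr's but the values may differ. A minimal base R sketch of that difference follows. It does not call arrow or Acero. The helper `first_non_missing()` is a hypothetical stand-in for the documented behavior ("returns a non-missing value if present"), and picking the *first* non-missing value is an assumption made only for illustration.

```r
# Toy data: two rows share the key g = 1; each has one missing value.
df <- data.frame(
  g = c(1, 1, 2),
  x = c(NA, 10, 20),
  y = c("a", NA, "c"),
  stringsAsFactors = FALSE
)

# dplyr-style semantics: keep the first row of each group unchanged.
first_row <- df[!duplicated(df$g), ]
rownames(first_row) <- NULL

# Hypothetical stand-in for the documented behavior: per column, return a
# non-missing value if one exists, otherwise return the missing value.
first_non_missing <- function(v) {
  present <- v[!is.na(v)]
  if (length(present) > 0) present[[1]] else v[[1]]
}

# Apply the helper column by column within each group.
prefer_non_missing <- do.call(rbind, lapply(split(df, df$g), function(grp) {
  data.frame(
    g = grp$g[[1]],
    x = first_non_missing(grp$x),
    y = first_non_missing(grp$y),
    stringsAsFactors = FALSE
  )
}))
rownames(prefer_non_missing) <- NULL

print(first_row)
print(prefer_non_missing)
```

Both results have two rows and three columns. For `g = 1`, the first-row version keeps `x = NA`, while the per-column version yields `x = 10` and `y = "a"`, a combination that appears in no single input row. That is the sense in which the values may differ even though the shape is the same.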
