bump rust-polars to 0.32.0 #334

Merged on Aug 29, 2023 (26 commits)
Commits
ddf7df6
obey compiler halfway
sorhawell Jul 24, 2023
5b5db0b
fix more compiler errors & impl robj_to!(f64,)
sorhawell Jul 25, 2023
6a7178a
refactor Expr_sample
sorhawell Jul 25, 2023
2824957
fix remaining compiler errors
sorhawell Jul 27, 2023
24b25ed
document
sorhawell Jul 27, 2023
fd12433
merge main fix conflict bump to 0.32
sorhawell Aug 21, 2023
61fd186
obey compiler, only half-done on R sise, refactor when-then, fmt->fo…
sorhawell Aug 24, 2023
6c85afd
with last
sorhawell Aug 24, 2023
c4771ee
fix all unit tests and examples
sorhawell Aug 25, 2023
a88b2fe
with last
sorhawell Aug 25, 2023
892c787
move changes notes
sorhawell Aug 25, 2023
8d5891a
try fix docs
sorhawell Aug 28, 2023
1b9b42f
bump flume, ipc-channel, state, make release-optmized use lto="fat", …
sorhawell Aug 28, 2023
9aa4001
update docs msrv + date_range eager = true
sorhawell Aug 28, 2023
d50c292
update msrv to 1.70
sorhawell Aug 28, 2023
e5a1e63
update news 1
sorhawell Aug 28, 2023
a5b91f1
news + minor Makevars.win
sorhawell Aug 28, 2023
c4fd201
erxtendr 0.3.1 not 9000
sorhawell Aug 28, 2023
61bdeee
add more news
sorhawell Aug 28, 2023
a3e906e
make fmt
sorhawell Aug 28, 2023
995c6a3
Merge branch 'main' into bump_rust_31
etiennebacher Aug 28, 2023
6fdd7c0
tweak news
etiennebacher Aug 28, 2023
e687b4c
tweak readme and regen docs
eitsupi Aug 29, 2023
ea00a15
formatting
eitsupi Aug 29, 2023
be54ceb
ref to the main branch
eitsupi Aug 29, 2023
796c12d
some test requiers the package installed
eitsupi Aug 29, 2023
2 changes: 1 addition & 1 deletion .github/workflows/check.yaml
@@ -81,7 +81,7 @@ jobs:
shell: bash
run: |
echo "RPOLARS_FULL_FEATURES=true" >>$GITHUB_ENV
echo "RPOLARS_PROFILE=release-optimized" >>$GITHUB_ENV
echo "RPOLARS_PROFILE=release" >>$GITHUB_ENV

- uses: r-lib/actions/check-r-package@v2
with:
2 changes: 1 addition & 1 deletion .github/workflows/docs.yaml
@@ -23,7 +23,7 @@ concurrency:
env:
RPOLARS_FULL_FEATURES: "true"
RPOLARS_CARGO_CLEAN_DEPS: "true"
RPOLARS_PROFILE: release-optimized
RPOLARS_PROFILE: release

jobs:
documentation:
8 changes: 4 additions & 4 deletions Makefile
@@ -3,7 +3,7 @@
SHELL := /bin/bash
VENV := .venv

RUST_TOOLCHAIN_VERSION := nightly-2023-05-07
RUST_TOOLCHAIN_VERSION := nightly-2023-07-27

MANIFEST_PATH := src/rust/Cargo.toml

@@ -50,7 +50,7 @@ build: ## Compile polars R package with all features and generate Rd files
&& Rscript -e 'if (!(require(arrow)&&require(nanoarrow))) warning("could not load arrow/nanoarrow, igonore changes to nanoarrow.Rd"); rextendr::document()'

.PHONY: install
install:
install: ## Install the R package
export RPOLARS_FULL_FEATURES=true \
&& R CMD INSTALL --no-multiarch --with-keep.source .

@@ -77,8 +77,8 @@ LICENSE.note: src/rust/Cargo.lock ## Update LICENSE.note
Rscript -e 'rextendr::write_license_note(force = TRUE)'

.PHONY: test
test: build ## Run fast unittests
Rscript -e 'devtools::load_all(); devtools::test()'
test: build install ## Run fast unittests
Rscript -e 'devtools::test()'

.PHONY: fmt
fmt: fmt-rs fmt-r ## Format files
20 changes: 12 additions & 8 deletions NAMESPACE
@@ -4,6 +4,8 @@ S3method("!",Expr)
S3method("!=",Expr)
S3method("!=",RPolarsDataType)
S3method("!=",Series)
S3method("$",ChainedThen)
S3method("$",ChainedWhen)
S3method("$",DataFrame)
S3method("$",DataTypeVector)
S3method("$",Expr)
@@ -25,10 +27,9 @@ S3method("$",RPolarsDataType)
S3method("$",RPolarsErr)
S3method("$",RThreadHandle)
S3method("$",Series)
S3method("$",Then)
S3method("$",VecDataFrame)
S3method("$",When)
S3method("$",WhenThen)
S3method("$",WhenThenThen)
S3method("$",pl_polars_env)
S3method("$",private_polars_env)
S3method("$<-",DataFrame)
@@ -56,6 +57,8 @@ S3method(">=",Series)
S3method("[",DataFrame)
S3method("[",ExprArrNameSpace)
S3method("[",LazyFrame)
S3method("[[",ChainedThen)
S3method("[[",ChainedWhen)
S3method("[[",DataFrame)
S3method("[[",DataTypeVector)
S3method("[[",Expr)
@@ -70,12 +73,13 @@ S3method("[[",RPolarsDataType)
S3method("[[",RPolarsErr)
S3method("[[",RThreadHandle)
S3method("[[",Series)
S3method("[[",Then)
S3method("[[",VecDataFrame)
S3method("[[",When)
S3method("[[",WhenThen)
S3method("[[",WhenThenThen)
S3method("^",Expr)
S3method("|",Expr)
S3method(.DollarNames,ChainedThen)
S3method(.DollarNames,ChainedWhen)
S3method(.DollarNames,DataFrame)
S3method(.DollarNames,Expr)
S3method(.DollarNames,GroupBy)
@@ -84,10 +88,9 @@ S3method(.DollarNames,RField)
S3method(.DollarNames,RPolarsErr)
S3method(.DollarNames,RThreadHandle)
S3method(.DollarNames,Series)
S3method(.DollarNames,Then)
S3method(.DollarNames,VecDataFrame)
S3method(.DollarNames,When)
S3method(.DollarNames,WhenThen)
S3method(.DollarNames,WhenThenThen)
S3method(.DollarNames,method_environment)
S3method(.DollarNames,polars_option_list)
S3method(as.character,RPolarsErr)
@@ -123,6 +126,8 @@ S3method(na.omit,DataFrame)
S3method(na.omit,LazyFrame)
S3method(names,DataFrame)
S3method(names,LazyFrame)
S3method(print,ChainedThen)
S3method(print,ChainedWhen)
S3method(print,DataFrame)
S3method(print,Expr)
S3method(print,GroupBy)
@@ -134,9 +139,8 @@ S3method(print,RPolarsDataType)
S3method(print,RPolarsErr)
S3method(print,RThreadHandle)
S3method(print,Series)
S3method(print,Then)
S3method(print,When)
S3method(print,WhenThen)
S3method(print,WhenThenThen)
S3method(print,polars_info)
S3method(print,polars_option_list)
S3method(row.names,DataFrame)
49 changes: 47 additions & 2 deletions NEWS.md
@@ -1,14 +1,59 @@
# polars (development version)

## BREAKING CHANGES
# polars 0.7.0.9000

## CHANGES DUE TO RUST-POLARS 0.32.0

rust-polars was updated to 0.32.0, which comes with many breaking changes and new
features. Unrelated breaking changes and new features are put in separate sections
(#334):

- update of rust toolchain: nightly bumped to nightly-2023-07-27 and MSRV is
now >=1.70.
- param `common_subplan_elimination = TRUE` in `<LazyFrame>` methods `$collect()`,
`$sink_ipc()` and `$sink_parquet()` is renamed and split into
`comm_subplan_elim = TRUE` and `comm_subexpr_elim = TRUE`.
- Series_is_sorted: nulls_last argument is dropped.
- `when-then-otherwise` classes are renamed to `When`, `Then`, `ChainedWhen`
and `ChainedThen`. The syntactically illegal methods have been removed, e.g.
chaining `$when()` twice.
- Github release + R-universe is compiled with `profile=release-optimized`,
which now includes `strip=false`, `lto=fat` & `codegen-units=1`. This should
make the binary a bit smaller and faster. See also FULL_FEATURES=`true` env
flag to enable simd with nightly rust. For development or faster compilation,
use instead `profile=release`.
- `fmt` arg is renamed `format` in `pl$PTime` and `<Expr>$str$strptime`.
- `<Expr>$approx_unique()` changed name to `<Expr>$approx_n_unique()`.
- `<Expr>$str$json_extract` arg `pat` changed to `dtype` and has a new argument
`infer_schema_length = 100`.
- Some arguments in `pl$date_range()` have changed: `low` -> `start`,
`high` -> `end`, `lazy = TRUE` -> `eager = FALSE`. Args `time_zone` and `time_unit`
can no longer be used to implicitly cast time types. These two args can only
be used to annotate a naive time unit. Mixing `time_zone` and `time_unit` for
`start` and `end` is not allowed anymore.
- `<Expr>$is_in()` operation no longer supported for dtype `null`.
- Various subtle changes:
- `(pl$lit(NA_real_) == pl$lit(NA_real_))$lit_to_s()` renders now to `null`
not `true`.
- `pl$lit(NA_real_)$is_in(pl$lit(NULL))$lit_to_s()` renders now to `false`,
not `true` as before.
- `pl$lit(numeric(0))$sum()$lit_to_s()` now yields `0f64` and not `null`.
- `<Expr>$all()` and `<Expr>$any()` have a new arg `drop_nulls = TRUE`.
- `<Expr>$sample()` and `<Expr>$shuffle()` have a new arg `fix_seed`.
- `<DataFrame>$sort()` and `<LazyFrame>$sort()` have a new arg
`maintain_order = FALSE`.

## OTHER BREAKING CHANGES

- `$rpow()` is removed. It should never have been translated. Use `^` and `$pow()`
instead (#346).
- `<LazyFrame>$collect_background()` renamed `<LazyFrame>$collect_in_background()`
and reworked. Likewise `PolarsBackgroundHandle` reworked and renamed to
`RThreadHandle` (#311).
- `pl$scan_arrow_ipc` is now called `pl$scan_ipc` (#343).

## What's changed
## Other changes

- Stream query to file with `pl$sink_ipc()` and `pl$sink_parquet()` (#343)
- New method `$explode()` for `DataFrame` and `LazyFrame` (#314).
- New method `$clone()` for `LazyFrame` (#347).
14 changes: 7 additions & 7 deletions R/PTime.R
@@ -23,7 +23,7 @@ time_unit_conv_factor = c(
#' @param x an integer or double vector of n epochs since midnight OR a char vector of char times
#' passed to as.POSIXct converted to seconds.
#' @param tu timeunit either "s","ms","us","ns"
#' @param fmt a format string passed to as.POSIXct format via ...
#' @param format a format string passed to as.POSIXct format via ...
#'
#' @details
#'
@@ -69,15 +69,15 @@ time_unit_conv_factor = c(
#' pl$lit(pl$PTime("23:59:59"))$lit_to_s()
#'
#' pl$lit(pl$PTime("23:59:59"))$to_r()
pl$PTime = function(x, tu = c("s", "ms", "us", "ns"), fmt = "%H:%M:%S") {
pl$PTime = function(x, tu = c("s", "ms", "us", "ns"), format = "%H:%M:%S") {
tu = tu[1]
if (!is_string(tu) || !tu %in% c("s", "ms", "us", "ns")) {
stopf("tu must be either 's','ms','us' ,or 'ns', not [%s]", str_string(tu))
}

if (is.character(x)) {
x = as.double(as.POSIXct(x, format = fmt, tz = "GMT")) -
as.double(as.POSIXct("00:00:00", format = fmt, tz = "GMT"))
x = as.double(as.POSIXct(x, format = format, tz = "GMT")) -
as.double(as.POSIXct("00:00:00", format = format, tz = "GMT"))
x = x * time_unit_conv_factor[tu]
}

@@ -140,15 +140,15 @@ print.PTime = function(x, ...) {
)
val = unclass(x) / 10^tu_exp
origin = structure(0, tzone = "GMT", class = c("POSIXct", "POSIXt"))
fmt = format(as.POSIXct(val, tz = "GMT", origin = origin), format = "%H:%M:%S")
format = format(as.POSIXct(val, tz = "GMT", origin = origin), format = "%H:%M:%S")

if (tu != "s") {
dgt = formatC((val - floor(val)) * 10^tu_exp, width = tu_exp, flag = 0, big.mark = "_", digits = tu_exp)
fmt = paste0(fmt, ":", dgt, tu)
format = paste0(format, ":", dgt, tu)
}
cat("PTime [", typeof(x), "]: number of epochs [", tu, "] since midnight\n")
print(paste0(
fmt, " val: ", as.character(x)
format, " val: ", as.character(x)
))
invisible(x)
}
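
Reviewer note: the character branch of `pl$PTime()` above turns a clock string into "epochs since midnight" by subtracting a parsed midnight from the parsed time. Below is a minimal base-R sketch of that idea, not the package code. `ptime_epochs` and `conv` are hypothetical names, and the values in `conv` are an assumption about `time_unit_conv_factor`, whose contents are not shown in this diff.

```r
# Sketch only: mirrors the character branch of pl$PTime() from the diff.
# `conv` is an assumed stand-in for time_unit_conv_factor (values not shown in the diff).
conv = c(s = 1, ms = 1e3, us = 1e6, ns = 1e9)

# Hypothetical helper, not part of the package.
ptime_epochs = function(x, tu = "s", format = "%H:%M:%S") {
  # Both strings are parsed on the same (current) date in GMT,
  # so the difference is the number of seconds since midnight.
  secs = as.double(as.POSIXct(x, format = format, tz = "GMT")) -
    as.double(as.POSIXct("00:00:00", format = format, tz = "GMT"))
  # Scale seconds to the requested time unit and drop the name carried over from `conv`.
  unname(secs * conv[tu])
}

ptime_epochs("23:59:59")             # seconds since midnight
ptime_epochs("00:00:01", tu = "ms")  # same idea, scaled to milliseconds
```

Since the argument is now called `format`, it lines up with the `format` argument of `as.POSIXct()` that it is forwarded to.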
7 changes: 4 additions & 3 deletions R/after-wrappers.R
@@ -88,8 +88,9 @@ extendr_method_to_pure_functions = function(env, class_name = NULL) {
.pr$Expr = extendr_method_to_pure_functions(Expr)
.pr$ProtoExprArray = extendr_method_to_pure_functions(ProtoExprArray)
.pr$When = extendr_method_to_pure_functions(When)
.pr$WhenThen = extendr_method_to_pure_functions(WhenThen)
.pr$WhenThenThen = extendr_method_to_pure_functions(WhenThenThen)
.pr$Then = extendr_method_to_pure_functions(Then)
.pr$ChainedWhen = extendr_method_to_pure_functions(ChainedWhen)
.pr$ChainedThen = extendr_method_to_pure_functions(ChainedThen)
.pr$VecDataFrame = extendr_method_to_pure_functions(VecDataFrame)
.pr$RNullValues = extendr_method_to_pure_functions(RNullValues)
.pr$RPolarsErr = extendr_method_to_pure_functions(RPolarsErr)
@@ -265,7 +266,7 @@ DataType = clone_env_one_level_deep(RPolarsDataType)
pl_class_names = sort(
c(
"LazyFrame", "Series", "LazyGroupBy", "DataType", "Expr", "DataFrame",
"When", "WhenThen", "WhenThenThen"
"When", "Then", "ChainedWhen", "ChainedThen"
)
) # TODO discover all public class automatically

29 changes: 10 additions & 19 deletions R/dataframe__frame.R
@@ -670,19 +670,7 @@ DataFrame_to_series = function(idx = 0) {
}

#' DataFrame Sort
#' @description sort a DataFrame by on or more Expr.
#'
#' @param by Column(s) to sort by. Column name strings, character vector of
#' column names, or Iterable `Into<Expr>` (e.g. one Expr, or list mixed Expr and
#' column name strings).
#' @param ... more columns to sort by as above but provided one Expr per argument.
#' @param descending Sort descending? Default = FALSE logical vector of length 1 or same length
#' as number of Expr's from above by + ....
#' @param nulls_last Bool default FALSE, place all nulls_last?
#' @details by and ... args allow to either provide e.g. a list of Expr or something which can
#' be converted into an Expr e.g. `$sort(list(e1,e2,e3))`,
#' or provide each Expr as an individual argument `$sort(e1,e2,e3)`´ ... or both.
#'
#' @inherit LazyFrame_sort details description params
#' @return DataFrame
#' @keywords DataFrame
#' @examples
@@ -697,12 +685,15 @@ DataFrame_to_series = function(idx = 0) {
#' df$sort(c("cyl", "mpg"), descending = c(TRUE, FALSE))
#' df$sort(pl$col("cyl"), pl$col("mpg"))
DataFrame_sort = function(
by, # : IntoExpr | List[IntoExpr],
..., # unnamed Into expr
descending = FALSE, # bool | vector[bool] = False,
nulls_last = FALSE) {
# args after ... must be named
self$lazy()$sort(by, ..., descending = descending, nulls_last = nulls_last)$collect()
by,
...,
descending = FALSE,
nulls_last = FALSE,
maintain_order = FALSE) {
self$lazy()$sort(
by, ...,
descending = descending, nulls_last = nulls_last, maintain_order = maintain_order
)$collect()
}


24 changes: 24 additions & 0 deletions R/error__rpolarserr.R
@@ -52,3 +52,27 @@ upgrade_err.RPolarsErr = function(err) { # already RPolarsErr pass through
bad_robj = function(r) {
.pr$RPolarsErr$new()$bad_robj(r)
}

Err_plain = function(x) {
Err(.pr$RPolarsErr$new()$plain(x))
}

# short hand for extracting an error context in unit testing, will raise error if not an RPolarsErr
get_err_ctx = \(x) unwrap_err(result(x))$contexts()


# wrapper to return Result
err_on_named_args = function(...) {
l = list2(...)
if (is.null(names(l)) || all(names(l) == "")) {
Ok(l)
} else {
bad_names = names(l)[names(l) != ""]
.pr$RPolarsErr$
new()$
bad_arg(paste(bad_names, collapse = ", "))$
plain("... args not allowed to be named here")$
hint("named ... arg was passed, or a non ... arg was misspelled") |>
Err()
}
}
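
Reviewer note: `err_on_named_args()` above rejects named `...` arguments and returns a Result rather than raising. Below is a rough base-R sketch of the same check, assuming plain lists in place of the package's `Ok()`/`Err()` values and `list()` in place of `list2()`. `check_unnamed_dots` is a hypothetical name, not something added by this PR.

```r
# Sketch only: plain lists stand in for the package's Ok()/Err() Result values,
# and base list() stands in for list2(). Hypothetical helper, not package code.
check_unnamed_dots = function(...) {
  l = list(...)
  nms = names(l)
  if (is.null(nms) || all(nms == "")) {
    # No names at all, or only empty names: every argument was positional.
    list(ok = l, err = NULL)
  } else {
    # Collect the offending names so the message can point at them.
    bad = nms[nms != ""]
    list(
      ok = NULL,
      err = paste0("... args not allowed to be named here: ", paste(bad, collapse = ", "))
    )
  }
}

check_unnamed_dots(1, "a")$err                # NULL, every argument is positional
check_unnamed_dots(1, descneding = TRUE)$err  # reports the misspelled argument name
```

Returning a value instead of calling `stop()` lets the caller decide where to unwrap and which context to attach, which matches the hint in the original about a misspelled non-`...` argument landing in `...`.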
12 changes: 8 additions & 4 deletions R/error__trait.R
@@ -4,6 +4,7 @@
#' Internal generic method to add call to error
#' @param err any type which impl as.character
#' @param call calling context
#' @noRd
#' @details
#' Additional details...
#'
@@ -25,9 +26,11 @@ when_calling.default = function(err, call) {
call_to_string = function(call) paste(capture.output(print(call)), collapse = "\n")
# NB collapse is needed to ensure no invalid multi-line error strings

#' Internal generic method to point to which public method the user got wrong

#' where in (lexically) error happened
#' @description Internal generic method to point to which public method the user got wrong
#' @param err any type which impl as.character
#' @param call calling context
#' @param context calling context
#' @keywords internal
#' @return err as string
#' @examples
@@ -52,8 +55,8 @@ where_in.default = function(err, context) {

#' Internal generic method to convert an error_type to condition.
#' @param err any type which impl as.character
#' @param call calling context
#' @keywords internal
#' @noRd
#' @details
#' this method is needed to preserve state of err without upcasting to a string message
#' an implementation will describe how to store the error in the condition
@@ -75,6 +78,7 @@ to_condition.default = function(err) {
#' Internal generic method to add plain text to error message
#' @param err some error type object
#' @param msg string to add
#' @noRd
#' @keywords internal
#' @return condition
plain = function(err, msg) {
@@ -95,7 +99,7 @@ plain.default = function(err, msg) {
#' An error type can choose to implement this to improve the translation.
#' As fall back the error will be deparsed into a string with rust Debug, see rdbg()
#' @param err some error type object
#' @param msg string to add
#' @noRd
#' @keywords internal
#' @return condition
upgrade_err = function(err) {
11 changes: 8 additions & 3 deletions R/error_conversion.R
@@ -1,14 +1,19 @@
# THIS FILE IMPLEMENTS ERROR CONVERSION, FOR R TO Result-list & FOR Result-list TO R

# TODO unwrap should be eventually renamed to unwrap_with_context (or similar)
# a simpler unwrap without where_in and when_calling should be defined in rust_result.R

#' rust-like unwrapping of result. Useful to keep error handling on the R side.
#' unwrap
#' @description rust-like unwrapping of result. Useful to keep error handling on the R side.
#' @noRd
#' @param result a list here either element ok or err is NULL, or both if ok is litteral NULL
#' @param call context of error or string
#' @param context a msg to prefix a raised error with
#'
#' @details
#' unwraps any ok value and raises any err values
#' when raising error value, the error will be called with methods where_in() a simple lexical
#' context and when_calling() to add the call context and finally to_condition() to convert any
#' error into an R error condition. These s3 methods can be implemented for any future error type.
#'
#' @return the ok-element of list , or a error will be thrown
#' @keywords internal
#' @examples