Releases: pola-rs/r-polars

v0.20.0

16 Oct 22:30

Breaking changes

  • Updated rust-polars to 0.43.1 (#1230).
  • In pl$scan_ipc() and pl$read_ipc(), the argument memory_map is removed
    (#1230).
  • In $serialize(), in the field schema, the field inner is renamed fields,
    and the fields output_schema and filter are removed (#1230).

New features

  • New method $cast() for DataFrame and LazyFrame (#1219).
  • New argument strict in $drop() to determine whether unknown column names
    should trigger an error (#1220).
  • New method $to_dummies() for DataFrame (#1225).
  • New argument include_file_paths in pl$scan_csv() and pl$read_csv() (#1235).
  • New method $join_where() for DataFrame and LazyFrame to perform
    inequality joins (#1237).
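
As a rough illustration of three of the additions above, the sketch below uses small made-up data frames (`df`, `east`, `west` and all column names are hypothetical). It assumes polars 0.20.0 or later is installed and is not taken from the package documentation.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(id = 1:3, score = c(1.5, 2.5, 3.5))

# $cast(): convert every column to one dtype in a single call.
as_strings <- df$cast(pl$String)

# $drop(strict = FALSE): a column name that does not exist is ignored
# rather than raising an error.
trimmed <- df$drop("score", "not_a_column", strict = FALSE)

# $join_where(): keep the row pairs that satisfy every inequality predicate.
east <- pl$DataFrame(
  id = c(100, 101, 102),
  dur = c(120, 140, 160),
  rev = c(12, 14, 16)
)
west <- pl$DataFrame(
  t_id = c(404, 498, 676, 742),
  time = c(90, 130, 150, 170),
  cost = c(9, 13, 15, 16)
)
matched <- east$join_where(
  west,
  pl$col("dur") < pl$col("time"),
  pl$col("rev") < pl$col("cost")
)
```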

Bug fixes

  • Converting data of datatype Null to R doesn't error anymore. It now creates
    a column filled with NA (#1217).

Full Changelog: v0.19.0...v0.20.0

lib-v0.43.0

16 Oct 14:28
c564e05
Pre-release
test: the latest nanoarrow supports utf8view type (#1257)

v0.19.1

31 Aug 09:37

This is a maintenance release. No user-facing changes.

lib-v0.42.1

30 Aug 17:34
d6b992f
Pre-release
docs: fix some typos in DEVELOPMENT.md (#1211)

v0.19.0

29 Aug 16:59
cf0f9e0

Breaking changes

  • Updated rust-polars to an unreleased version from 2024-08-20, after 0.42.0 (#1183).
  • $describe_plan() and $describe_optimized_plan() are removed. Use
    respectively $explain(optimized = FALSE) and $explain() instead (#1182).
  • The parameter inherit_optimization is removed from all functions that had it
    (#1183).
  • In $write_parquet() and $sink_parquet(), the parameter data_pagesize_limit
    is renamed data_page_size (#1183).
  • The LazyFrame method $get_optimization_toggle() is removed, and
    $set_optimization_toggle() is renamed $optimization_toggle() (#1183).
  • In $unpivot(), the parameter streamable is removed (#1183).
  • Some functions have a parameter future that determines the compatibility level
    when exporting Polars' internal data structures. This parameter is renamed
    compat_level, which takes FALSE for the oldest flavor (more compatible)
    and TRUE for the newest one (less compatible). It can also take an integer
    determining a specific compatibility level when more are added in the future.
    For now, future = FALSE can be replaced by compat_level = FALSE (#1183).
  • In $scan_parquet() and $read_parquet(), the default value of
    hive_partitioning is now NULL (#1189).
  • In $dt$epoch(), the argument tu is renamed to time_unit (#1196).
  • In $fill_nan() for DataFrame, LazyFrame and Expr, the argument is
    renamed value (#1198).
  • $shift_and_fill() is removed and replaced by a new argument fill_value in
    $shift(). $shift_and_fill(fill_value, periods) can be replaced by
    $shift(n, fill_value) (#1201).
  • In $shift() for various Expr, the argument periods is renamed n (#1201).
  • In $clip(), arguments min and max are renamed lower_bound and
    upper_bound (#1203).
  • $clip_min() and $clip_max() are removed. Use $clip() with only
    lower_bound or upper_bound instead (#1203).
  • In $write_csv() and $sink_csv(), the argument quote is renamed
    quote_char (#1206).
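
A minimal migration sketch for some of the renames above, assuming polars 0.19.0 or later. The data frame `df` and its column `x` are hypothetical, and the "Was:" comments show how the same call might have looked before this release.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(x = c(1, 5, 10))

out <- df$select(
  # Was: pl$col("x")$shift_and_fill(0, 1)
  pl$col("x")$shift(1, fill_value = 0)$alias("shifted"),
  # Was: pl$col("x")$clip_min(2)
  pl$col("x")$clip(lower_bound = 2)$alias("floored"),
  # Was: pl$col("x")$clip(min = 2, max = 6)
  pl$col("x")$clip(lower_bound = 2, upper_bound = 6)$alias("clipped")
)

lf <- df$lazy()$filter(pl$col("x") > 1)

# Was: lf$describe_plan()
plan_raw <- lf$explain(optimized = FALSE)
# Was: lf$describe_optimized_plan()
plan_opt <- lf$explain()
```

The plans are returned as character strings, so `cat(plan_opt)` prints them in readable form.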

New features

  • New method $str$extract_many() (#1163).
  • Converting a nanoarrow_array with zero rows to an RPolarsDataFrame via
    as_polars_df() now keeps the original schema (#1177).
  • $write_parquet() has two new arguments partition_by and
    partition_chunk_size_bytes to write a DataFrame to a hive-partitioned
    directory (#1183).
  • New method $bin$size() (#1183).
  • In $scan_parquet() and $read_parquet(), the parallel argument can take
    the new value "prefiltered" (#1183).
  • $scan_parquet(), $scan_ipc() and $read_parquet() have a new argument
    include_file_paths to automatically add a column containing the path to the
    source file(s) (#1183).
  • $scan_ipc() can read a hive-partitioned directory with its new arguments
    hive_partitioning, hive_schema, and try_parse_hive_dates (#1183).
  • $scan_parquet() and $read_parquet() gain two new arguments for more control
    over importing hive partitions: hive_schema and try_parse_hive_dates (#1189).
  • New method $gather_every() for LazyFrame and DataFrame (#1199).
  • $glimpse() for DataFrame has two new arguments max_items_per_column and
    max_colname_length (#1200).
  • New method $list$sample() (#1204).
  • New argument coalesce in $join_asof() (#1205).
  • New argument maintain_order in $list$unique() (#1207).

Other changes

  • In $unnest() for DataFrame and LazyFrame, the names argument is removed
    and replaced by .... This doesn't change the previous behavior, e.g.
    df$unnest(names = c("a", "b")) still works (#1170).

Full Changelog: v0.18.0...v0.19.0

lib-v0.42.0

29 Aug 12:01
b421545
Pre-release
chore: bump serde_json from 1.0.125 to 1.0.127 in /src/rust (#1209)

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

v0.18.0

05 Jul 15:02

Breaking changes

  • Updated rust-polars to 0.41.3 (#1147, #1156).
  • In $n_chunks(), the default value of strategy is now "first" (#1137).
  • $sample() for Expr and DataFrame (#1136):
    • the argument frac is renamed fraction;
    • all the arguments except n must be named;
    • for the Expr method only, the first argument is now n (it was already the
      case for the DataFrame method);
    • for the Expr method only, the default value for with_replacement is now
      FALSE (it was already the case for the DataFrame method).
  • $melt() had several changes (#1147):
    • $melt() is renamed $unpivot().
    • Some arguments were renamed: id_vars is now index, value_vars is now
      on.
    • The order of arguments has changed: on is now first, then index. The
      order of the other arguments hasn't changed. Note that on can be unnamed
      but all the other arguments must be named.
  • $pivot() had several changes (#1147):
    • The argument columns is renamed on.
    • The order of arguments has changed: on is now first, then index and
      values. The order of the other arguments hasn't changed. Note that on
      can be unnamed but all the other arguments must be named.
  • In $write_parquet() and $sink_parquet(), the default value of argument
    statistics is now TRUE and can take other values than TRUE/FALSE (#1147).
  • In $dt$truncate() and $dt$round(), the argument offset has been removed.
    Use $dt$offset_by() after those functions instead (#1147).
  • In $top_k() and $bottom_k() for Expr, the arguments nulls_last,
    maintain_order and multithreaded have been removed. If any null values
    are in the top/bottom k values, they will always be positioned last (#1147).
  • $replace() has been split in two functions depending on the desired
    behaviour (#1147):
    • $replace() recodes some values in the column, leaving all other values
      unchanged. Compared to the previous version, it doesn't use the arguments
      default and return_dtype anymore.
    • $replace_strict() replaces all values by different values. If a value
      doesn't have a specific mapping, it is replaced by the default value.
  • $str$concat() is deprecated, use $str$join() (with the same arguments)
    instead (#1147).
  • In pl$date_range() and pl$date_ranges(), the arguments time_unit and
    time_zone have been removed. They were deprecated in previous versions
    (#1147).
  • In $join(), when how = "cross", on, left_on and right_on must be
    NULL (#1147).
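
The sketch below illustrates two of the changes above: the `$melt()` to `$unpivot()` rename, and the split between `$replace()` and `$replace_strict()`. It assumes polars 0.18.0 or later. All data and column names are hypothetical.

```r
library(polars)

# Hypothetical wide data
wide <- pl$DataFrame(id = c("a", "b"), x = c(1, 2), y = c(3, 4))

# Was: wide$melt(id_vars = "id", value_vars = c("x", "y"))
# `on` comes first and may be unnamed; `index` must be named.
long <- wide$unpivot(c("x", "y"), index = "id")

# Hypothetical codes to recode
codes <- pl$DataFrame(code = c(1, 2, 3))

recoded <- codes$with_columns(
  # $replace(): only the listed value changes, everything else is kept.
  pl$col("code")$replace(2, 20)$alias("partial"),
  # $replace_strict(): every value is mapped; unmapped values get `default`.
  pl$col("code")$replace_strict(c(1, 2), c(10, 20), default = 0)$alias("full")
)
```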

New features

  • New method $has_nulls() (#1133).
  • New method $list$explode() (#1139).
  • $over() gains a new argument order_by to specify the order of values
    within each group. This is useful when the operation depends on the order of
    values, such as $shift() (#1147).
  • $value_counts() gains an argument normalize to give relative frequencies
    of unique values instead of their count (#1147).

Full Changelog: v0.17.0...v0.18.0

lib-v0.41.0

05 Jul 14:18
d67c57d
Pre-release
test: temporarily disable the test of `pl$mem_address` (#1161)

v0.17.0

04 Jun 03:22

Breaking changes

  • Updated rust-polars to unreleased version (> 0.40.0) (#1104, #1110, #1117, #1124):
    • In $join(), there is a new argument coalesce and the how options now accept "full" instead of "outer" and "outer_coalesce".
    • $top_k() and $bottom_k() gain three arguments nulls_last, maintain_order and multithreaded.
    • All $rolling_*() functions lose the arguments by, closed and warn_if_unsorted. Rolling computations based on by must be made via the corresponding $rolling_*_by(), e.g. $rolling_mean_by() instead of $rolling_mean(by =) (#1115).
    • pl$scan_parquet() and pl$read_parquet() gain an argument glob which defaults to TRUE. Set it to FALSE to avoid considering * as a globbing pattern.
    • $is_not_nan() on a null value (NA in R) now returns null. Previously, it returned TRUE.
    • In $reshape(), argument dims is renamed dimensions and there is a new argument nested_type specifying if the output should be of type List or Array.
    • In $value_counts(), all arguments must be named and there is a new argument name to specify the name of the output.
    • In all functions accepting optimization parameters (such as projection_pushdown), there is a new parameter cluster_with_columns to combine sequential independent calls to $with_columns().
    • $str$explode() is removed.
    • The check_sorted argument is removed from $rolling() and $group_by_dynamic(). Sortedness is now verified quickly, so this argument is no longer needed (pola-rs/polars#16494).
    • $name$map() hangs on Linux, so this method is deprecated and its documentation is removed. Please use other methods like <LazyFrame>$rename(<function>) instead (#1123).
  • As warned in v0.16.0, the order of arguments in pl$Series is changed (#1071). The first argument is now name, and the second argument is values.
  • $to_struct() on an Expr is removed. This method is now only available for Series, DataFrame, and in the $list and $arr subnamespaces. For example, pl$col("a", "b", "c")$to_struct() should be replaced with pl$struct(c("a", "b", "c")) (#1092).
  • pl$Struct() now only accepts named inputs and objects of class RPolarsField. For example, pl$Struct(pl$Boolean) doesn't work anymore and should be named like pl$Struct(a = pl$Boolean) (#1053).
  • In $all() and $any(), the argument drop_nulls is renamed ignore_nulls, and this argument must be named (#1050).
  • New method $struct$with_fields() (#1109) and new function pl$field() to be used in expressions in $struct$with_fields() (#1113).
  • New methods for RPolarsDataType: $is_enum(), $is_categorical(), $is_known(), $is_string(), $contains_views(), $contains_categorical() (#1112).
  • In $dt$combine(), the arguments tm and tu are renamed time and time_unit (#1116).
  • The default value of the rechunk argument of pl$concat() is changed from TRUE to FALSE (#1125).
  • In $rename() for LazyFrame and DataFrame, key-value pairs of names are changed to old_name = "new_name" instead of new_name = "old_name" (#1129).
  • In $rename() for LazyFrame and DataFrame, calling the method with no arguments is no longer allowed (#1129).
  • In all $rolling_*() functions, the arguments center and ddof must be named (#1115).
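
A short sketch of two of the changes above, assuming polars 0.17.0 or later: the new key-value order in `$rename()`, and `pl$struct()` as the replacement for `$to_struct()` on an Expr. The data frame and column names are hypothetical.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(a = 1:2, b = c("x", "y"))

# The pair is now old_name = "new_name".
# Was: df$rename(id = "a")
renamed <- df$rename(a = "id")

# Was: pl$col("a", "b")$to_struct()
packed <- df$select(pl$struct(c("a", "b"))$alias("s"))
```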

New features

  • $rename() for LazyFrame and DataFrame now accepts a function. This is equivalent to polars.LazyFrame.rename(mapping: Callable[[str], str]) or polars.DataFrame.rename(mapping: Callable[[str], str]) in Python Polars (#1122, #1129).
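
A minimal sketch of renaming with a function, assuming polars 0.17.0 or later. The data frame and the choice of `toupper()` are hypothetical; any function that maps a column name to a new name should work in the same way.

```r
library(polars)

# Hypothetical example data
df <- pl$DataFrame(foo = 1:2, bar = c("x", "y"))

# The function receives existing column names and returns the new ones.
upper <- df$rename(function(column_name) toupper(column_name))
```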

Full Changelog: v0.16.4...v0.17.0

lib-v0.40.0

03 Jun 22:56
3e3eece
Pre-release
Add `$rolling_*_by()` expressions (#1115)

Co-authored-by: eitsupi