Releases: pola-rs/r-polars
v0.20.0
Breaking changes
- Updated rust-polars to 0.43.1 (#1230).
- In `pl$scan_ipc()` and `pl$read_ipc()`, the argument `memory_map` is removed (#1230).
- In `$serialize()`, in the field `schema`, the field `inner` is renamed `fields`, and the fields `output_schema` and `filter` are removed (#1230).
New features
- New method `$cast()` for `DataFrame` and `LazyFrame` (#1219).
- New argument `strict` in `$drop()` to determine whether unknown column names should trigger an error (#1220).
- New method `$to_dummies()` for `DataFrame` (#1225).
- New argument `include_file_paths` in `pl$scan_csv()` and `pl$read_csv()` (#1235).
- New method `$join_where()` for `DataFrame` and `LazyFrame` to perform inequality joins (#1237).
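As an illustration of two of the additions above, here is a minimal sketch of `$cast()` and of `$drop()` with `strict = FALSE`. It assumes the polars R package at v0.20.0 is installed; the data frame and its column names (`foo`, `bar`, `not_a_column`) are made up for the example.

```r
library(polars)

# Hypothetical data: `foo` is an integer column, `bar` a double column.
df <- pl$DataFrame(foo = 1:3, bar = c(6, 7, 8))

# Cast selected columns by passing a named list of datatypes.
casted <- df$cast(list(foo = pl$Float64))

# With strict = FALSE, unknown column names are ignored instead of
# triggering an error.
dropped <- df$drop("bar", "not_a_column", strict = FALSE)
dropped$columns
```

Leaving `strict` at its default should make the same `$drop()` call fail on the unknown name, which is the safer choice when a typo in a column name should not pass silently.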
Bug fixes
- Converting data of datatype `Null` to R doesn't error anymore. It now creates a column filled with `NA` (#1217).
New Contributors
Full Changelog: v0.19.0...v0.20.0
lib-v0.43.0
test: the latest nanoarrow supports utf8view type (#1257)
v0.19.1
lib-v0.42.1
docs: fix some typos in DEVELOPMENT.md (#1211)
v0.19.0
Breaking changes
- Updated rust-polars to unreleased 2024-08-20, after 0.42.0 (#1183).
- `$describe_plan()` and `$describe_optimized_plan()` are removed. Use respectively `$explain(optimized = FALSE)` and `$explain()` instead (#1182).
- The parameter `inherit_optimization` is removed from all functions that had it (#1183).
- In `$write_parquet()` and `$sink_parquet()`, the parameter `data_pagesize_limit` is renamed `data_page_size` (#1183).
- The LazyFrame method `$get_optimization_toggle()` is removed, and `$set_optimization_toggle()` is renamed `$optimization_toggle()` (#1183).
- In `$unpivot()`, the parameter `streamable` is removed (#1183).
- Some functions have a parameter `future` that determines the compatibility level when exporting Polars' internal data structures. This parameter is renamed `compat_level`, which takes `FALSE` for the oldest flavor (more compatible) and `TRUE` for the newest one (less compatible). It can also take an integer determining a specific compatibility level when more are added in the future. For now, `future = FALSE` can be replaced by `compat_level = FALSE` (#1183).
- In `$scan_parquet()` and `$read_parquet()`, the default value of `hive_partitioning` is now `NULL` (#1189).
- In `$dt$epoch()`, the argument `tu` is renamed to `time_unit` (#1196).
- In `$fill_nan()` for `DataFrame`, `LazyFrame` and `Expr`, the argument is renamed `value` (#1198).
- `$shift_and_fill()` is removed and replaced by a new argument `fill_value` in `$shift()`. `$shift_and_fill(fill_value, periods)` can be replaced by `$shift(n, fill_value)` (#1201).
- In `$shift()` for various `Expr`, the argument `periods` is renamed `n` (#1201).
- In `$clip()`, arguments `min` and `max` are renamed `lower_bound` and `upper_bound` (#1203).
- `$clip_min()` and `$clip_max()` are removed. Use `$clip()` with only `lower_bound` or `upper_bound` instead (#1203).
- In `$write_csv()` and `$sink_csv()`, the argument `quote` is renamed `quote_char` (#1206).
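The renames above for `$shift()`, `$clip()` and the plan-printing methods could be migrated roughly as follows. This is a sketch that assumes the polars R package at v0.19.0 or a later 0.x release; the data and the output column names (`shifted`, `floored`, `clipped`) are invented for illustration.

```r
library(polars)

# Hypothetical data with a single double column.
df <- pl$DataFrame(a = c(1, 5, 10))

out <- df$select(
  # Was: pl$col("a")$shift_and_fill(0, 1)
  shifted = pl$col("a")$shift(1, fill_value = 0),
  # Was: pl$col("a")$clip_min(2)
  floored = pl$col("a")$clip(lower_bound = 2),
  # Was: pl$col("a")$clip(min = 2, max = 6)
  clipped = pl$col("a")$clip(lower_bound = 2, upper_bound = 6)
)

lf <- df$lazy()$filter(pl$col("a") > 1)
# Was: lf$describe_plan()
plan <- lf$explain(optimized = FALSE)
# Was: lf$describe_optimized_plan()
optimized_plan <- lf$explain()
cat(plan, "\n")
```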
New features
- New method `$str$extract_many()` (#1163).
- Converting a `nanoarrow_array` with zero rows to an `RPolarsDataFrame` via `as_polars_df()` now keeps the original schema (#1177).
- `$write_parquet()` has two new arguments `partition_by` and `partition_chunk_size_bytes` to write a `DataFrame` to a hive-partitioned directory (#1183).
- New method `$bin$size()` (#1183).
- In `$scan_parquet()` and `$read_parquet()`, the `parallel` argument can take the new value `"prefiltered"` (#1183).
- `$scan_parquet()`, `$scan_ipc()` and `$read_parquet()` have a new argument `include_file_paths` to automatically add a column containing the path to the source file(s) (#1183).
- `$scan_ipc()` can read a hive-partitioned directory with its new arguments `hive_partitioning`, `hive_schema`, and `try_parse_hive_dates` (#1183).
- `$scan_parquet()` and `$read_parquet()` gain two new arguments for more control on importing hive partitions: `hive_schema` and `try_parse_hive_dates` (#1189).
- New method `$gather_every()` for `LazyFrame` and `DataFrame` (#1199).
- `$glimpse()` for `DataFrame` has two new arguments `max_items_per_column` and `max_colname_length` (#1200).
- New method `$list$sample()` (#1204).
- New argument `coalesce` in `$join_asof()` (#1205).
- New argument `maintain_order` in `$list$unique()` (#1207).
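Two of the features listed above, `$gather_every()` and the `maintain_order` argument of `$list$unique()`, might be used as in the sketch below. It assumes the polars R package at v0.19.0 or a later 0.x release; the data is made up.

```r
library(polars)

# Hypothetical data: six rows numbered 1 to 6.
df <- pl$DataFrame(a = 1:6)

# Keep every second row, starting from the first one.
every_other <- df$gather_every(2)

# A single-row data frame holding one list of doubles.
lists <- pl$DataFrame(l = list(c(2, 1, 2, 3)))

# maintain_order = TRUE keeps unique values in order of first appearance.
uniq <- lists$select(pl$col("l")$list$unique(maintain_order = TRUE))
```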
Other changes
- In `$unnest()` for `DataFrame` and `LazyFrame`, the `names` argument is removed and replaced by `...`. This doesn't change the previous behavior, e.g. `df$unnest(names = c("a", "b"))` still works (#1170).
Full Changelog: v0.18.0...v0.19.0
lib-v0.42.0
chore: bump serde_json from 1.0.125 to 1.0.127 in /src/rust (#1209) Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
v0.18.0
Breaking changes
- Updated rust-polars to 0.41.3 (#1147, #1156).
- In `$n_chunks()`, the default value of `strategy` now is `"first"` (#1137).
- `$sample()` for Expr and DataFrame (#1136):
  - the argument `frac` is renamed `fraction`;
  - all the arguments except `n` must be named;
  - for the Expr method only, the first argument is now `n` (it was already the case for the DataFrame method);
  - for the Expr method only, the default value for `with_replacement` is now `FALSE` (it was already the case for the DataFrame method).
- `$melt()` had several changes (#1147):
  - `$melt()` is renamed `$unpivot()`.
  - Some arguments were renamed: `id_vars` is now `index`, `value_vars` is now `on`.
  - The order of arguments has changed: `on` is now first, then `index`. The order of the other arguments hasn't changed. Note that `on` can be unnamed but all the other arguments must be named.
- `$pivot()` had several changes (#1147):
  - The argument `columns` is renamed `on`.
  - The order of arguments has changed: `on` is now first, then `index` and `values`. The order of the other arguments hasn't changed. Note that `on` can be unnamed but all the other arguments must be named.
- In `$write_parquet()` and `$sink_parquet()`, the default value of argument `statistics` is now `TRUE` and can take other values than `TRUE/FALSE` (#1147).
- In `$dt$truncate()` and `$dt$round()`, the argument `offset` has been removed. Use `$dt$offset_by()` after those functions instead (#1147).
- In `$top_k()` and `$bottom_k()` for `Expr`, the arguments `nulls_last`, `maintain_order` and `multithreaded` have been removed. If any `null` values are in the top/bottom `k` values, they will always be positioned last (#1147).
- `$replace()` has been split in two functions depending on the desired behaviour (#1147):
  - `$replace()` recodes some values in the column, leaving all other values unchanged. Compared to the previous version, it doesn't use the arguments `default` and `return_dtype` anymore.
  - `$replace_strict()` replaces all values by different values. If a value doesn't have a specific mapping, it is replaced by the `default` value.
- `$str$concat()` is deprecated, use `$str$join()` (with the same arguments) instead (#1147).
- In `pl$date_range()` and `pl$date_ranges()`, the arguments `time_unit` and `time_zone` have been removed. They were deprecated in previous versions (#1147).
- In `$join()`, when `how = "cross"`, `on`, `left_on` and `right_on` must be `NULL` (#1147).
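To make the `$melt()` to `$unpivot()` rename and the `$replace()` / `$replace_strict()` split more concrete, here is a small sketch. It assumes the polars R package at v0.18.0 or a later 0.x release; the data frames and the output column names (`kept`, `strict`) are hypothetical.

```r
library(polars)

# Hypothetical wide data: one identifier column and two value columns.
df <- pl$DataFrame(id = c("x", "y"), b = c(1, 3), c = c(2, 4))

# Was: df$melt(id_vars = "id", value_vars = c("b", "c"))
# `on` comes first and may be unnamed; `index` must be named.
long <- df$unpivot(c("b", "c"), index = "id")

codes <- pl$DataFrame(a = c(1, 2, 3))$select(
  # $replace() recodes 2 to 20 and leaves the other values unchanged.
  kept = pl$col("a")$replace(2, 20),
  # $replace_strict() sends every value without a mapping to `default`.
  strict = pl$col("a")$replace_strict(2, 20, default = -1)
)
```

The choice between the two comes down to whether unmapped values should survive (`$replace()`) or be overwritten by a fallback (`$replace_strict()` with `default`).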
New features
- New method `$has_nulls()` (#1133).
- New method `$list$explode()` (#1139).
- `$over()` gains a new argument `order_by` to specify the order of values within each group. This is useful when the operation depends on the order of values, such as `$shift()` (#1147).
- `$value_counts()` gains an argument `normalize` to give relative frequencies of unique values instead of their count (#1147).
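The `order_by` argument of `$over()` and the `normalize` argument of `$value_counts()` could be exercised as in the sketch below. It assumes the polars R package at v0.18.0 or a later 0.x release; the columns `g`, `t`, `v` and the output name `prev_v` are invented for the example.

```r
library(polars)

# Hypothetical data: two groups whose rows are not sorted by `t`.
df <- pl$DataFrame(
  g = c("a", "a", "b", "b"),
  t = c(2, 1, 2, 1),
  v = c(10, 20, 30, 40)
)

# Shift `v` within each group `g`, with rows ordered by `t` inside the group.
out <- df$with_columns(
  prev_v = pl$col("v")$shift(1)$over("g", order_by = "t")
)

# Relative frequencies of the unique values of `g` instead of raw counts.
freq <- df$select(pl$col("g")$value_counts(normalize = TRUE))
```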
New Contributors
- @ju6ge made their first contribution in #1135
- @shikokuchuo made their first contribution in #1160
Full Changelog: v0.17.0...v0.18.0
lib-v0.41.0
test: temporarily disable the test of `pl$mem_address` (#1161)
v0.17.0
Breaking changes
- Updated rust-polars to unreleased version (> 0.40.0) (#1104, #1110, #1117, #1124):
  - In `$join()`, there is a new argument `coalesce` and the `how` options now accept `"full"` instead of `"outer"` and `"outer_coalesce"`.
  - `$top_k()` and `$bottom_k()` gain three arguments `nulls_last`, `maintain_order` and `multithreaded`.
  - All `$rolling_*()` functions lose the arguments `by`, `closed` and `warn_if_unsorted`. Rolling computations based on `by` must be made via the corresponding `rolling_*_by()`, e.g. `rolling_mean_by()` instead of `rolling_mean(by =)` (#1115).
  - `pl$scan_parquet()` and `pl$read_parquet()` gain an argument `glob` which defaults to `TRUE`. Set it to `FALSE` to avoid considering `*` as a globbing pattern.
  - `$is_not_nan()` on a `null` value (`NA` in R) now returns `null`. Previously, it returned `TRUE`.
  - In `$reshape()`, argument `dims` is renamed `dimensions` and there is a new argument `nested_type` specifying if the output should be of type List or Array.
  - In `$value_counts()`, all arguments must be named and there is a new argument `name` to specify the name of the output.
  - In all functions accepting optimization parameters (such as `projection_pushdown`), there is a new parameter `cluster_with_columns` to combine sequential independent calls to `$with_columns()`.
  - `$str$explode()` is removed.
  - The `check_sorted` argument is removed from `$rolling()` and `$group_by_dynamic()`. Sortedness is now verified in a quick manner, so this argument is no longer needed (pola-rs/polars#16494).
  - `$name$map()` stacks on Linux, so this method is deprecated and its documentation is removed. Please use other methods like `<LazyFrame>$rename(<function>)` instead (#1123).
- As warned in v0.16.0, the order of arguments in `pl$Series` is changed (#1071). The first argument is now `name`, and the second argument is `values`.
- `$to_struct()` on an Expr is removed. This method is now only available for `Series`, `DataFrame`, and in the `$list` and `$arr` subnamespaces. For example, `pl$col("a", "b", "c")$to_struct()` should be replaced with `pl$struct(c("a", "b", "c"))` (#1092).
- `pl$Struct()` now only accepts named inputs and objects of class `RPolarsField`. For example, `pl$Struct(pl$Boolean)` doesn't work anymore and should be named like `pl$Struct(a = pl$Boolean)` (#1053).
- In `$all()` and `$any()`, the argument `drop_nulls` is renamed `ignore_nulls`, and this argument must be named (#1050).
- New method `$struct$with_fields()` (#1109) and new function `pl$field()` to be used in expressions in `$struct$with_fields()` (#1113).
- New methods for `RPolarsDataType`: `$is_enum()`, `$is_categorical()`, `$is_known()`, `$is_string()`, `$contains_views()`, `$contains_categorical()` (#1112).
- In `$dt$combine()`, the arguments `tm` and `tu` are renamed `time` and `time_unit` (#1116).
- The default value of the `rechunk` argument of `pl$concat()` is changed from `TRUE` to `FALSE` (#1125).
- In `$rename()` for LazyFrame and DataFrame, key-value pairs of names are changed to `old_name = "new_name"` instead of `new_name = "old_name"` (#1129).
- In `$rename()` for LazyFrame and DataFrame, passing no arguments is not allowed (#1129).
- In all `$rolling_*()` functions, the arguments `center` and `ddof` must be named (#1115).
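Several of the changes above affect everyday calls. The sketch below shows how the new argument order of `pl$Series`, the new key-value direction in `$rename()`, and the replacements for `$to_struct()` and unnamed `pl$Struct()` might look. It assumes the polars R package at v0.17.0 or a later 0.x release; the data, the new column name `id` and the alias `ab` are hypothetical.

```r
library(polars)

# `name` is now the first argument and `values` the second.
s <- pl$Series(name = "a", values = 1:3)

# Hypothetical data frame with two columns.
df <- pl$DataFrame(a = 1:2, b = c("x", "y"))

# Keys are the old names and values are the new names.
# Was: df$rename(id = "a")
renamed <- df$rename(a = "id")

# Was: pl$col("a", "b")$to_struct()
packed <- df$select(pl$struct(c("a", "b"))$alias("ab"))

# Fields of pl$Struct() must be named.
# Was: pl$Struct(pl$Boolean)
dtype <- pl$Struct(a = pl$Boolean)
```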
New features
- `$rename()` for LazyFrame and DataFrame now accepts a function. This is equivalent to `polars.LazyFrame.rename(mapping: Callable[[str], str])` or `polars.DataFrame.rename(mapping: Callable[[str], str])` in Python Polars (#1122, #1129).
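A function passed to `$rename()` might be used as follows. This is a sketch assuming the polars R package at v0.17.0 or a later 0.x release; the column names and the choice of `toupper()` are only illustrative.

```r
library(polars)

# Hypothetical data frame with lower-case column names.
df <- pl$DataFrame(foo = 1:2, bar = c("x", "y"))

# The function receives column names and returns the new names.
upper <- df$rename(function(column_name) toupper(column_name))
upper$columns
```

A vectorized function such as `toupper()` is a safe choice here, since it behaves the same whether it is handed one name or all of them at once.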
Full Changelog: v0.16.4...v0.17.0
lib-v0.40.0
Add `$rolling_*_by()` expressions (#1115) Co-authored-by: eitsupi