tidypolars 0.8.0

etiennebacher released this 04 Jun 08:50

tidypolars requires polars >= 0.17.0.

Breaking changes

  • As announced in tidypolars 0.7.0, the behavior of collect() has changed.
    It now returns a standard R data.frame instead of a Polars DataFrame.
    To keep the old behavior, replace collect() with compute(), which takes
    the same arguments.

  • In bind_rows_polars(), if .id is passed, the resulting column is now of
    type character instead of integer.
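
The collect() change can be illustrated with a minimal sketch. It assumes that polars, dplyr and tidypolars are installed and that dplyr is attached to provide the generics; the object names (lazy_query, res_r, res_polars) are purely illustrative and do not come from the release notes.

```r
library(polars)
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

# Build a lazy query on a built-in dataset (illustrative object names).
lazy_query <- as_polars_df(mtcars)$lazy() |>
  filter(cyl == 4)

# From 0.8.0 on, collect() returns a standard R data.frame ...
res_r <- collect(lazy_query)

# ... while compute() keeps the result as a Polars DataFrame,
# which is what collect() used to return.
res_polars <- compute(lazy_query)

is.data.frame(res_r)
class(res_polars)
```

Code that went on to call Polars methods on the output of collect() is the code most likely to need the switch to compute().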

New features

  • Add support for several functions:

    • from package base: all(), any(), diff(), ISOdatetime(),
      length(), rev(), unique().

    • from package dplyr: consecutive_id(), min_rank(), na_if(),
      n_distinct(), nth().

    • from package lubridate: make_datetime().

    • from package stringr: str_dup(), str_split(), str_split_i(),
      str_trunc().

    • from package tidyr: replace_na() (the data.frame method was already
      translated, but not the vector method, which can be used in mutate()
      for example).

  • It is now possible to use explicit namespaces (such as dplyr::first() instead
    of first()) in mutate(), summarize() and filter() (#114).

  • In bind_rows_polars(), if all elements are named and .id is specified, the
    .id column will use the names of the elements (#116).

  • It is now possible to rename variables in select() (#117).

  • Add support for the argument na_matches in all join functions (except
    cross_join(), which doesn't need it) (#109).
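
Some of the features above can be tried with a short sketch like the one below. It is only an illustration: all object and column names are made up, dplyr is attached on the assumption that it provides the generics, and the value "never" for na_matches is assumed to behave as it does in dplyr.

```r
library(polars)
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

cars <- as_polars_df(mtcars)

# Explicit namespace inside mutate() (#114).
with_ns <- cars |>
  mutate(mpg_minus_first = mpg - dplyr::first(mpg))

# Renaming variables while selecting them (#117).
renamed <- cars |>
  select(miles_per_gallon = mpg, cylinders = cyl)

# Named elements are used to fill the .id column (#116).
# `first_batch` and `second_batch` are hypothetical names.
batch_a <- as_polars_df(data.frame(x = c("a", "b")))
batch_b <- as_polars_df(data.frame(x = c("c", "d")))
combined <- bind_rows_polars(
  list(first_batch = batch_a, second_batch = batch_b),
  .id = "source"
)

# na_matches in a join (#109): with "never", missing keys are assumed
# not to match each other, as in dplyr.
left_tbl <- as_polars_df(data.frame(id = c("k1", NA), x = c("a", "b")))
right_tbl <- as_polars_df(data.frame(id = c("k1", NA), y = c("c", "d")))
joined <- left_join(left_tbl, right_tbl, by = "id", na_matches = "never")
```

Printing with_ns, renamed, combined and joined shows the resulting Polars DataFrames.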

Bug fixes

  • Local variables in custom functions could not be used in tidypolars functions
    (reported in a blog post by Art Steinmetz). This is now fixed.

  • across() now works when .cols contains only one variable and .fns contains
    only one function.

  • In across(), the .cols argument now takes into account variables created
    in the same mutate() or summarize() call before across().

    as_polars_df(mtcars) |> 
      head(n = 3) |> 
      mutate(
        foo = 1, 
        across(.cols = contains("oo"), \(x) x - 1)
      )
    
    shape: (3, 12)
    ┌──────┬─────┬───────┬───────┬───┬─────┬──────┬──────┬─────┐
    │ mpg  ┆ cyl ┆ disp  ┆ hp    ┆ … ┆ am  ┆ gear ┆ carb ┆ foo │
    │ ---  ┆ --- ┆ ---   ┆ ---   ┆   ┆ --- ┆ ---  ┆ ---  ┆ --- │
    │ f64  ┆ f64 ┆ f64   ┆ f64   ┆   ┆ f64 ┆ f64  ┆ f64  ┆ f64 │
    ╞══════╪═════╪═══════╪═══════╪═══╪═════╪══════╪══════╪═════╡
    │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 4.0  ┆ 4.0  ┆ 0.0 │
    │ 21.0 ┆ 6.0 ┆ 160.0 ┆ 110.0 ┆ … ┆ 1.0 ┆ 4.0  ┆ 4.0  ┆ 0.0 │
    │ 22.8 ┆ 4.0 ┆ 108.0 ┆ 93.0  ┆ … ┆ 1.0 ┆ 4.0  ┆ 1.0  ┆ 0.0 │
    └──────┴─────┴───────┴───────┴───┴─────┴──────┴──────┴─────┘

    Note that the where() function is not supported here. For example:

    as_polars_df(mtcars) |> 
      mutate(
        foo = 1, 
        across(.cols = where(is.numeric), \(x) x - 1)
      )

    will not return 0 for the variable foo. A warning is emitted about this
    behavior.

  • Better handling of negative values in c() when called in mutate() and
    summarize().