Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow n() in data_modify() #535

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Allow n() in data_modify() #535

wants to merge 13 commits into from

Conversation

strengejacke
Copy link
Member

@strengejacke strengejacke commented Aug 27, 2024

@etiennebacher Maybe, if checks pass, you can merge this PR for the planned release? Just a minor code change.

Copy link
Member

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think there are cases where this implementation would fail, e.g. if someone passes 1:fun(). Can you check for this?

You have more usecases than me with datawizard but it seems that we gradually try to replace all functions from dplyr, which I thought was not the main objective of datawizard. I don't know if this needed in another package internally but if not, do we actually need to implement this instead of using dplyr?


Also, if you want to add more bug fixes/features, I don't plan on releasing before the end of the week.

NEWS.md Outdated Show resolved Hide resolved
R/data_modify.R Outdated Show resolved Hide resolved
R/data_modify.R Outdated Show resolved Hide resolved
tests/testthat/test-data_modify.R Outdated Show resolved Hide resolved
strengejacke and others added 3 commits August 27, 2024 20:22
Co-authored-by: Etienne Bacher <[email protected]>
Co-authored-by: Etienne Bacher <[email protected]>
@strengejacke
Copy link
Member Author

but if not, do we actually need to implement this

You're right, but we had this already for data_summary(), because it's an more often used function, and I thought it should then also work for data_modify(). That's the main reason why I added this. And actually, I don't use dplyr and tidyr anymore, since 99.5% of my tasks can be done with datawizard. :-)

Copy link
Member

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I have no good idea for a use-case for fun()?

I think you can just check that the error message of data_modify(mtcars, b = fun()) is comprehensible.

But before that, please check my other comment.

R/data_modify.R Show resolved Hide resolved
@etiennebacher
Copy link
Member

You're right, but we had this already for data_summary(), because it's an more often used function, and I thought it should then also work for data_modify().

I'm also not a fan of having this in data_summary(). I think it adds confusion for a tidyverse user that could think "why is n() supported but not first() or last()?" and basically we don't have a good answer to that. To me the tidyverse and datawizard are more complements than substitutes: datawizard provides nice statistical functions that can be included in a pipeline, and it also provides dependency-free functions that we need internally in other packages anyway. Adding support for n() doesn't seem to meet any of those cases.

@strengejacke
Copy link
Member Author

But I have no good idea for a use-case for fun()?

I think you can just check that the error message of data_modify(mtcars, b = fun()) is comprehensible.

But before that, please check my other comment.

But does fun() necessarily result in an error? I'm not sure whether there are use cases of user-defined functions?

@strengejacke
Copy link
Member Author

strengejacke commented Aug 28, 2024

datawizard provides nice statistical functions that can be included in a pipeline, and it also provides dependency-free functions that we need internally in other packages anyway. Adding support for n() doesn't seem to meet any of those cases.

Yes, that was certainly the initial idea of the package, but a) there are probably other people who use datawizard instead of dplyr/tidyr (e.g., @DominiqueMakowski - at least in the past - never really liked the NSE of tidyverse and prefers working with strings to identify variables etc., same like me) and b) maybe there will be other packages that prefer using datawizard for its minimal dependencies in their packages, and then it could be nice to have some features that are also supported by other packages. I personally don't mind implementing "common" features, but what do the other @easystats/maintainers think? Should we have a "stopping rule" of implementing features?

@etiennebacher
Copy link
Member

But does fun() necessarily result in an error? I'm not sure whether there are use cases of user-defined functions?

I just meant that we should have a test for the error below:

library(datawizard)

head(mtcars) |> 
  data_modify(x = n())
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb x
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 6
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 6
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 6
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 6
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 6
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 6

head(mtcars) |> 
  data_modify(x = fun())
#> Error: There was an error in the first expression. Attempt to apply
#>   non-function. Possibly misspelled or not yet defined?

@strengejacke
Copy link
Member Author

@easystats/core-team Any opinions to
#535 (comment)
#535 (comment)

?

@rempsyc
Copy link
Sponsor Member

rempsyc commented Sep 11, 2024

I personally don't mind implementing "common" features, but what do the other easystats/maintainers think?

Personally, I think the harm in adding extra features is limited (although I can also see Etienne's point about creating confusion). I think the potential benefit of new features ouweight the cons (although it also adds to future maintenance and development), so as long as you're motivated to do the associated work Daniel (and I think you are), I'd say go ahead 🥸

Sometimes in my packages I might want to internally rely on datawizard/easystats and I need dplyr just for one or two functions like n(), then it might be a plus that I have the option to rely on datawizard instead of adding an additional dependency (dplyr) or of rewriting the code myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants