Add a vignette about equivalence with tidyverse #183

Merged
merged 33 commits into from
Jul 26, 2022

Conversation

etiennebacher
Member

@etiennebacher etiennebacher commented Jun 30, 2022

Closes #130

I put datawizard and tidyverse chunks side by side to easily compare the syntax, for example:

[image: datawizard and tidyverse chunks side by side]

(I wanted to give each chunk a different background color to distinguish the two syntaxes, but apparently this is not yet supported by pkgdown (r-lib/downlit#149), so I added comments in each chunk instead.)

@etiennebacher etiennebacher marked this pull request as draft June 30, 2022 13:24
@codecov-commenter

codecov-commenter commented Jun 30, 2022

Codecov Report

Merging #183 (87d84e0) into main (6dbcdba) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #183   +/-   ##
=======================================
  Coverage   83.79%   83.79%           
=======================================
  Files          52       52           
  Lines        3196     3196           
=======================================
  Hits         2678     2678           
  Misses        518      518           


one of its main features is that it has very few dependencies: `stats` and `utils`
(included in base R) and `insight`, which is the core package of the easystats
ecosystem. One drawback of this approach is that not all features of the
`tidyverse` packages are supported and we will have to rely on base R, or on
Member

This is ambiguous, because we don't rely on poorman anywhere in the code.


`data_select()` is the equivalent of `dplyr::select()`. The main difference
between these two functions is that `data_select()` uses two arguments (`select`
and `exclude`) and requires quoted column names, while `dplyr::select()` accepts
Member

No, there are many possibilities, including unquoted column names, but only for a single variable, I think (because we don't use ... to capture comma-separated, unquoted names). But maybe this is too much to explain here, and for now, we can keep it like this.

Member

@DominiqueMakowski DominiqueMakowski Jul 1, 2022

Passing thought: maybe we should add to the easystats style guide that we prefer quoted names over non-standard evaluation (NSE), as opposed to the tidyverse, because they are easier to program with, more flexible, make it clearer which things are names and which are objects, and are a much more standard programming practice. And in general, NSE is an eldritch invention.

Member Author

(also, quoted names are much easier to work with for debugging)
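
To make the contrast discussed in this thread concrete, here is a minimal sketch comparing the two styles on the built-in `mtcars` data. It assumes a recent datawizard release and that dplyr is installed. The object names (`dw_keep`, `dp_keep`, `cols`, and so on) are illustrative only and do not come from the vignette.

```r
library(datawizard)

# datawizard: columns are passed as quoted names through `select` / `exclude`
dw_keep <- data_select(mtcars, select = c("mpg", "cyl"))
dw_drop <- data_select(mtcars, exclude = c("mpg", "cyl"))

# dplyr: unquoted names captured through `...`, with `-` to exclude
dp_keep <- dplyr::select(mtcars, mpg, cyl)
dp_drop <- dplyr::select(mtcars, -mpg, -cyl)

# Programming with column names stored in a variable:
# a character vector is passed straight to data_select(),
# while dplyr needs a tidyselect helper such as all_of()
cols <- c("mpg", "cyl")
dw_prog <- data_select(mtcars, select = cols)
dp_prog <- dplyr::select(mtcars, dplyr::all_of(cols))
```

With quoted names, the same call works whether the names are typed inline or held in a variable, which is the "easier to program with" point made above.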

@strengejacke
Member

Nice draft! I like the structure of this vignette very much!

_pkgdown.yml Outdated
@IndrajeetPatil IndrajeetPatil linked an issue Jul 1, 2022 that may be closed by this pull request
@IndrajeetPatil
Member

This is a fantastic start, @etiennebacher! Thanks for working on this. Lovely use of pandoc's fenced div!

I am going to make some additions and add a few edits, but overall this is a good structure. I think, once finalized, we can also melt this down a bit and convert it to a JOSS paper (#59).

@etiennebacher
Member Author

Thanks @IndrajeetPatil and @strengejacke!

Before finishing this vignette, we need to improve reshape_longer and reshape_wider so that they match the tidyr implementations and outputs more precisely. I'll skim through datawizard later to see if I forgot some major functions, but maybe you already have some suggestions of additional functions to include?

@IndrajeetPatil
Member

but maybe you already have some suggestions of additional functions to include

I have added a couple of TODOs for this in the vignette. I will keep updating them as I think about more.

@IndrajeetPatil IndrajeetPatil changed the title [WIP] Add a vignette about basic data wrangling [WIP] Add a vignette about equivalence with tidyverse Jul 1, 2022
@IndrajeetPatil
Member

@etiennebacher I am changing the targeted issue for this PR to #130 because that's what the PR is targeting.

For #90, I had a very different kind of vignette in mind.
Specifically, one where we start with a messy dataset and achieve the desired result using data wrangling functions. Something similar to this article.

@etiennebacher
Member Author

For #90, I had a very different kind of vignette in mind. Specifically, one where we start with a messy dataset and achieve the desired result using data wrangling functions. Something similar to this article.

I see. I think a good example of a messy dataset would be a CSV file downloaded from the World Bank data catalogue (not through the WDI package, because it provides already cleaned data). We could use:

  • filtering to keep countries only (remove aggregates)
  • pivoting, because there is one column per year
  • selecting, renaming and relocating for various other cleanup steps
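
The three steps listed above could look roughly like the sketch below. It uses a small made-up data frame rather than a real World Bank download: the column names, country codes, and values are all hypothetical stand-ins. It also assumes a recent datawizard release in which `data_to_long()` accepts `names_to` and `values_to`.

```r
library(datawizard)

# Toy stand-in for a World Bank-style CSV (hypothetical values):
# one column per year, and an aggregate row ("World") mixed in with countries
wb <- data.frame(
  country_name = c("France", "Japan", "World"),
  code = c("FRA", "JPN", "WLD"),
  yr2000 = c(1, 2, 3),
  yr2001 = c(4, 5, 6)
)

# 1. Filtering: keep countries only (drop the aggregate row)
countries <- data_filter(wb, code != "WLD")

# 2. Pivoting: go from one column per year to one row per country-year
long <- data_to_long(
  countries,
  select = c("yr2000", "yr2001"),
  names_to = "year",
  values_to = "value"
)

# 3. Selecting, renaming and relocating
long <- data_select(long, exclude = "code")
long <- data_rename(long, "country_name", "country")
long <- data_relocate(long, "year", before = "country")
```

A real file would need extra handling, such as skipping metadata rows and identifying aggregates from a lookup table, which this sketch leaves out.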

@bwiernik
Contributor

bwiernik commented Jul 7, 2022

An example:

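library(datawizard)
library(ggplot2)

colors <- data.frame(
  group = c("a", "b", "c"),
  color = c("black", "forestgreen", "lightblue")
)
# Extract `color` as a vector named by `group` and use it as the manual scale values
scale_color_manual(values = data_extract(colors, "color", name = group))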

@etiennebacher
Member Author

I think it's now ok to review this, but it should be merged after #189 because the vignette uses some arguments of reshape_longer that are not yet in the main branch.

Also, I prefer using reshape_longer and reshape_wider because to me it's easier to make the connection with pivot_*, but all other functions start with data_ so should we use data_to_long and data_to_wide in the vignette instead?

@etiennebacher etiennebacher marked this pull request as ready for review July 8, 2022 10:33
@IndrajeetPatil
Member

I don't have strong opinions on that, but I personally prefer showcasing data wrangling functions having the data_*() prefix because it's an easy pattern for the users to remember.

@IndrajeetPatil
Member

I also feel that we should remove the data_*() prefix from statistical transformation function names.

@etiennebacher

This comment was marked as resolved.

@IndrajeetPatil
Member

@etiennebacher I'd like to merge this soon. I think most of the TODOs have been addressed, so this should be ready.

We can keep updating it as we add more tidyverse equivalents (e.g. arrange() <-> data_arrange()).

@IndrajeetPatil IndrajeetPatil changed the title [WIP] Add a vignette about equivalence with tidyverse Add a vignette about equivalence with tidyverse Jul 26, 2022
@etiennebacher
Member Author

@IndrajeetPatil if my last commit doesn't break anything, I agree to merge this and update it later, I think it's ready

@IndrajeetPatil IndrajeetPatil self-requested a review July 26, 2022 13:41
@IndrajeetPatil IndrajeetPatil merged commit ecceea3 into easystats:main Jul 26, 2022
@etiennebacher etiennebacher deleted the vignette-data-wrangling branch July 26, 2022 13:41
@IndrajeetPatil
Member

Merci beaucoup, Étienne!

Development

Successfully merging this pull request may close these issues.

Add some doc about transition from tidyverse?
6 participants