Add a vignette about equivalence with tidyverse #183

etiennebacher · 2022-06-30T13:24:17Z

Closes #130

I put datawizard and tidyverse chunks side by side to easily compare the syntax, for example:

(I wanted to give each chunk a different background color to distinguish between the two syntaxes, but apparently this is not yet supported by pkgdown (r-lib/downlit#149), so I added comments to each chunk instead.)

codecov-commenter · 2022-06-30T13:27:34Z

Codecov Report

Merging #183 (87d84e0) into main (6dbcdba) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #183   +/-   ##
=======================================
  Coverage   83.79%   83.79%           
=======================================
  Files          52       52           
  Lines        3196     3196           
=======================================
  Hits         2678     2678           
  Misses        518      518



strengejacke · 2022-06-30T15:43:30Z

vignettes/basic_data_wrangling.Rmd

+one of its main features is that it has very few dependencies: `stats` and `utils`
+(included in base R) and `insight`, which is the core package of the easystats 
+ecosystem. One drawback of this approach is that not all features of the 
+`tidyverse` packages are not supported and we will have to rely on base R, or on 


This is ambiguous, because we don't rely on poorman anywhere in the code.

strengejacke · 2022-06-30T15:45:17Z

vignettes/basic_data_wrangling.Rmd

+
+`data_select()` is the equivalent of `dplyr::select()`. The main difference 
+between these two functions is that `data_select()` uses two arguments (`select`
+and `exclude`) and requires quoted column names, while `dplyr::select()` accepts


No, there are several possibilities, including unquoted column names, but I think only for a single variable (because we don't use `...` to capture comma-separated, unquoted names). But maybe this is too much to explain here; for now, we can keep it like this.
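To make the side-by-side comparison concrete, here is a minimal sketch, assuming current releases of datawizard and dplyr are installed, that keeps and drops the same columns of the built-in `iris` data with both functions. It only uses the quoted-name form of `select` and `exclude` described in the excerpt above, not the unquoted single-variable form.

```r
library(datawizard)

# datawizard: quoted column names passed to `select` / `exclude`
dw_keep <- data_select(iris, select = c("Sepal.Length", "Species"))
dw_drop <- data_select(iris, exclude = "Species")

# dplyr: unquoted (tidy-select) names, with `-` to drop a column
dp_keep <- dplyr::select(iris, Sepal.Length, Species)
dp_drop <- dplyr::select(iris, -Species)

# Both approaches return the same columns
names(dw_keep)
names(dw_drop)
```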

Passing thought: maybe we should add to the easystats style guide that we prefer quoted names over non-standard evaluation (NSE), as opposed to the tidyverse, because quoted names are easier to program with, they are flexible, it makes more sense to distinguish names from objects, it is a much more standard good programming practice, and in general NSE is an eldritch invention.

(also, quoted names are much easier to work with for debugging)
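A minimal base-R sketch of the "easier to program with" point: when column names are plain strings, they can be stored in a variable, looped over, and passed to functions without any quoting machinery. The chosen columns and the helper `summarise_column()` are hypothetical, for illustration only.

```r
# Column names stored as ordinary character values (hypothetical choice of columns)
cols <- c("Sepal.Length", "Petal.Length")

# Because the names are strings, they can be looped over and used with `[[`
col_means <- vapply(cols, function(col) mean(iris[[col]]), numeric(1))

# A hypothetical helper that receives the column name as a regular argument
summarise_column <- function(data, col) {
  x <- data[[col]]
  c(min = min(x), max = max(x))
}

summarise_column(iris, "Sepal.Width")
```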

strengejacke · 2022-06-30T15:48:11Z

Nice draft! I like the structure of this vignette very much!

_pkgdown.yml

vignettes/basic_data_wrangling.Rmd

IndrajeetPatil · 2022-07-01T05:15:29Z

This is a fantastic start, @etiennebacher! Thanks for working on this. Lovely use of pandoc's fenced div!

I am going to make some additions and add a few edits, but overall this is a good structure. I think, once finalized, we can also boil this down a bit and convert it into a JOSS paper (#59).

etiennebacher · 2022-07-01T07:09:25Z

Thanks @IndrajeetPatil and @strengejacke!

Before finishing this vignette, we need to improve reshape_longer and reshape_wider so that they more closely match the tidyr implementations and outputs. I'll skim through datawizard later to see if I forgot any major functions, but maybe you already have some suggestions of additional functions to include?

IndrajeetPatil · 2022-07-01T08:17:47Z

> but maybe you already have some suggestions of additional functions to include

I have added a couple of TODOs for this in the vignette. I will keep updating them as I think about more.

IndrajeetPatil · 2022-07-01T08:22:18Z

@etiennebacher I am changing the targeted issue for this PR to #130 because that's what the PR is targeting.

For #90, I had a very different kind of vignette in mind.
Specifically, one where we start with a messy dataset and achieve the desired result using data wrangling functions. Something similar to this article.

etiennebacher · 2022-07-01T08:26:43Z

> For #90, I had a very different kind of vignette in mind. Specifically, one where we start with a messy dataset and achieve the desired result using data wrangling functions. Something similar to this article.

I see. I think a good example of a messy dataset would be a CSV file downloaded from the World Bank catalogue (rather than using the WDI package, because it provides already-cleaned data). We could use:

- filtering, to keep countries only (removing aggregates)
- pivoting, because there is one column per year
- selecting, renaming, and relocating columns for various purposes
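The three steps listed above can be sketched in base R on a small made-up table. The column layout, the aggregate codes, and the `indicator` column name are assumptions for illustration, not the real World Bank file; in the vignette, each step would presumably be done with datawizard's own filtering, reshaping (e.g. reshape_longer / data_to_long), and renaming functions instead.

```r
# Hypothetical toy table mimicking a World Bank-style CSV (values are made up):
# one row per country or aggregate, one column per year.
wide <- data.frame(
  country = c("France", "Germany", "World", "Euro area"),
  code    = c("FRA", "DEU", "WLD", "EMU"),
  `2019`  = c(1.0, 2.0, 3.0, 4.0),
  `2020`  = c(1.5, 2.5, 3.5, 4.5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 1. Filtering: drop aggregate rows (the list of aggregate codes is an assumption)
aggregates <- c("WLD", "EMU")
countries <- wide[!(wide$code %in% aggregates), ]

# 2. Pivoting: turn the one-column-per-year layout into one row per country-year
year_cols <- c("2019", "2020")
long <- do.call(rbind, lapply(year_cols, function(y) {
  data.frame(
    country = countries$country,
    year = as.integer(y),
    value = countries[[y]],
    stringsAsFactors = FALSE
  )
}))

# 3. Selecting/relocating and renaming: put `year` first, give `value` a clearer name
long <- long[, c("year", "country", "value")]
names(long)[names(long) == "value"] <- "indicator"
```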

bwiernik · 2022-07-07T18:47:43Z

An example:

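library(datawizard)
library(ggplot2)

colors <- data.frame(
  group = c("a", "b", "c"),
  color = c("black", "forestgreen", "lightblue")
)
# named vector of colors, one per group: c(a = "black", b = "forestgreen", c = "lightblue")
scale_color_manual(values = data_extract(colors, "color", name = group))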
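For the tidyverse-equivalence vignette, the counterpart of this extraction is presumably `dplyr::pull()`, which also has a `name` argument (dplyr 1.0.0 or later). The sketch below, assuming datawizard and dplyr are installed, builds the same named color vector three ways so they can be compared; the base-R `setNames()` line is included only as a reference point.

```r
library(datawizard)

colors <- data.frame(
  group = c("a", "b", "c"),
  color = c("black", "forestgreen", "lightblue"),
  stringsAsFactors = FALSE
)

# datawizard, as in the example above: `name` sets the names of the extracted vector
dw_colors <- data_extract(colors, "color", name = group)

# Presumed tidyverse counterpart: dplyr::pull() with its `name` argument
dp_colors <- dplyr::pull(colors, color, name = group)

# Base R equivalent, for reference
base_colors <- setNames(colors$color, colors$group)
```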

vignettes/tidyverse_translation.Rmd

etiennebacher · 2022-07-08T10:33:38Z

I think it's now OK to review this, but it should be merged after #189 because the vignette uses some arguments of reshape_longer that are not yet in the main branch.

Also, I prefer using reshape_longer and reshape_wider because it's easier for me to make the connection with pivot_*, but all the other functions start with data_, so should we use data_to_long and data_to_wide in the vignette instead?

IndrajeetPatil · 2022-07-08T10:37:48Z

I don't have strong opinions on that, but I personally prefer showcasing data wrangling functions having the data_*() prefix because it's an easy pattern for the users to remember.

IndrajeetPatil · 2022-07-08T11:31:51Z

I also feel that we should remove the data_*() prefix from statistical transformation function names.

IndrajeetPatil · 2022-07-26T13:28:12Z

@etiennebacher I'd like to merge this soon. I think most of the TODOs have been addressed, so this should be ready.

We can keep updating it as we add more tidyverse equivalents (e.g. arrange() <-> data_arrange()).

etiennebacher · 2022-07-26T13:31:59Z

@IndrajeetPatil If my last commit doesn't break anything, I agree to merge this and update it later. I think it's ready.

IndrajeetPatil · 2022-07-26T13:42:08Z

Thank you very much, Étienne!

etiennebacher added 2 commits June 27, 2022 18:43

start vignette about data wrangling

e5f69ab

add "title" to each chunk since coloring is not supported by pkgdown yet

01882d8

etiennebacher marked this pull request as draft June 30, 2022 13:24

strengejacke and others added 4 commits June 30, 2022 15:32

Merge branch 'master' into vignette-data-wrangling

da21f11

remove native pipe for compat with old R

4e401fa

Merge branch 'vignette-data-wrangling' of https://github.com/etiennebacher/datawizard into vignette-data-wrangling

a71ed5b

Update basic_data_wrangling.Rmd

6e2c9f5

strengejacke reviewed Jun 30, 2022
