Commit

Merge branch 'main' into strengejacke/issue441
strengejacke authored Sep 12, 2023
2 parents 898b229 + ad96b50 commit c95b6ad
Showing 39 changed files with 1,995 additions and 734 deletions.
1 change: 0 additions & 1 deletion .Rbuildignore
@@ -46,6 +46,5 @@ references.bib
^hextools/.
^WIP/.
^CRAN-SUBMISSION$
^LICENSE$
docs
^.dev$
8 changes: 5 additions & 3 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: datawizard
Title: Easy Data Wrangling and Statistical Transformations
Version: 0.8.0.5
Version: 0.8.0.13
Authors@R: c(
person("Indrajeet", "Patil", , "[email protected]", role = "aut",
comment = c(ORCID = "0000-0003-1995-6531", Twitter = "@patilindrajeets")),
@@ -27,13 +27,13 @@ Description: A lightweight package to assist in key steps involved in any data
(3) compute statistical summaries of data properties and distributions.
It is also the data wrangling backend for packages in 'easystats' ecosystem.
References: Patil et al. (2022) <doi:10.21105/joss.04684>.
License: GPL (>= 3)
License: MIT + file LICENSE
URL: https://easystats.github.io/datawizard/
BugReports: https://github.com/easystats/datawizard/issues
Depends:
R (>= 3.6)
Imports:
insight (>= 0.19.1),
insight (>= 0.19.3.2),
stats,
utils
Suggests:
@@ -43,6 +43,7 @@ Suggests:
data.table,
dplyr (>= 1.0),
effectsize,
emmeans,
gamm4,
ggplot2,
gt,
@@ -77,3 +78,4 @@ Config/Needs/website:
rstudio/bslib,
r-lib/pkgdown,
easystats/easystatstemplate
Remotes: easystats/insight
676 changes: 2 additions & 674 deletions LICENSE

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions LICENSE.md
@@ -0,0 +1,21 @@
# MIT License

Copyright (c) 2023 datawizard authors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
10 changes: 10 additions & 0 deletions NAMESPACE
@@ -67,6 +67,7 @@ S3method(describe_distribution,numeric)
S3method(format,data_codebook)
S3method(format,dw_data_peek)
S3method(format,dw_data_tabulate)
S3method(format,dw_groupmeans)
S3method(format,parameters_distribution)
S3method(kurtosis,data.frame)
S3method(kurtosis,default)
@@ -76,6 +77,9 @@ S3method(labels_to_levels,data.frame)
S3method(labels_to_levels,default)
S3method(labels_to_levels,factor)
S3method(makepredictcall,dw_transformer)
S3method(means_by_group,data.frame)
S3method(means_by_group,default)
S3method(means_by_group,numeric)
S3method(normalize,data.frame)
S3method(normalize,factor)
S3method(normalize,grouped_df)
@@ -86,6 +90,8 @@ S3method(print,data_codebook)
S3method(print,dw_data_peek)
S3method(print,dw_data_tabulate)
S3method(print,dw_data_tabulates)
S3method(print,dw_groupmeans)
S3method(print,dw_groupmeans_list)
S3method(print,dw_transformer)
S3method(print,parameters_distribution)
S3method(print,parameters_kurtosis)
@@ -176,6 +182,7 @@ S3method(to_numeric,logical)
S3method(to_numeric,numeric)
S3method(unnormalize,data.frame)
S3method(unnormalize,default)
S3method(unnormalize,grouped_df)
S3method(unnormalize,numeric)
S3method(unstandardize,array)
S3method(unstandardize,character)
@@ -202,6 +209,7 @@ export(coef_var)
export(coerce_to_numeric)
export(colnames_to_row)
export(column_as_rownames)
export(contr.deviation)
export(convert_na_to)
export(convert_to_na)
export(data_addprefix)
@@ -252,6 +260,7 @@ export(get_columns)
export(kurtosis)
export(labels_to_levels)
export(mean_sd)
export(means_by_group)
export(median_mad)
export(normalize)
export(print_html)
@@ -270,6 +279,7 @@ export(reshape_longer)
export(reshape_wider)
export(reverse)
export(reverse_scale)
export(row_means)
export(row_to_colnames)
export(rowid_as_column)
export(rownames_as_column)
33 changes: 33 additions & 0 deletions NEWS.md
@@ -1,22 +1,55 @@
# datawizard (devel)

NEW FUNCTIONS

* `row_means()`, to compute row means, optionally only for the rows with at
least `min_valid` non-missing values.

* `contr.deviation()` for sum-deviation contrast coding of factors.

* `means_by_group()`, to compute mean values of variables, grouped by levels
of specified factors.
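
The `min_valid` behaviour described for `row_means()` can be illustrated with a minimal base R sketch. The helper `row_means_sketch()` below is hypothetical and written only for illustration; it is not the package's implementation, and the real function's arguments and edge-case handling may differ.

```r
# Hypothetical helper (NOT datawizard's implementation): compute row means and
# return NA for rows with fewer than `min_valid` non-missing values.
row_means_sketch <- function(data, min_valid = NULL) {
  x <- as.matrix(data)
  out <- rowMeans(x, na.rm = TRUE)
  # rows with no valid values yield NaN; report them as NA instead
  out[is.nan(out)] <- NA_real_
  if (!is.null(min_valid)) {
    n_valid <- rowSums(!is.na(x))
    out[n_valid < min_valid] <- NA_real_
  }
  out
}

d <- data.frame(
  a = c(1, 2, NA),
  b = c(3, NA, NA),
  c = c(5, 6, 7)
)

res_all <- row_means_sketch(d)                # mean of whatever is available
res_min2 <- row_means_sketch(d, min_valid = 2) # third row has only one valid value
print(res_all)
print(res_min2)
```

In this sketch, a row that falls below the threshold is set to `NA` rather than dropped, so the result keeps one value per input row.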

CHANGES

* `recode_into()` gains an `overwrite` argument to skip overwriting already
recoded cases when multiple recode patterns apply to the same case.

* `recode_into()` gains a `preserve_na` argument to preserve `NA` values
when recoding.

* `data_read()` now passes the `encoding` argument to `data.table::fread()`.
This allows reading files with non-ASCII characters.

* `datawizard` moves from the GPL-3 license to the MIT license.

* `unnormalize()` and `unstandardize()` now work with grouped data (#415).

* `unnormalize()` now errors instead of emitting a warning if it doesn't have the
necessary info (#415).

BUG FIXES

* Fixed issue in `labels_to_levels()` when values of labels were not in sorted
order and values were not sequentially numbered.

* Fixed issues in `data_write()` when writing labelled data into SPSS format
and vectors were of a different type than their value labels.

* Fixed issue in `recode_into()` where the warning could report the wrong case
number when several recode patterns matched the same case.

* Fixed issue in `recode_into()` when original data contained `NA` values and
`NA` was not included in the recode pattern.

* Fixed issue in `data_filter()` where functions containing a `=` (e.g. when
naming arguments, like `grepl(pattern, x = a)`) were mistakenly seen as
faulty syntax.

* Fixed issue in `empty_column()` for strings containing invalid multibyte
sequences. For such data frames or files, `empty_column()` or `data_read()`
no longer fails.

# datawizard 0.8.0

BREAKING CHANGES
99 changes: 99 additions & 0 deletions R/contrs.R
@@ -0,0 +1,99 @@
#' Deviation Contrast Matrix
#'
#' Build a deviation contrast matrix, a type of _effects contrast_ matrix.
#'
#' @inheritParams stats::contr.sum
#'
#' @details
#' In effects coding, unlike treatment/dummy coding
#' ([stats::contr.treatment()]), each contrast sums to 0. In regression models,
#' this results in an intercept that represents the (unweighted) average of the
#' group means. In ANOVA settings, this also guarantees that lower order effects
#' represent _main_ effects (and not _simple_ or _conditional_ effects, as is
#' the case when using R's default [stats::contr.treatment()]).
#' \cr\cr
#' Deviation coding (`contr.deviation`) is a type of effects coding. With
#' deviation coding, the coefficients for factor variables are interpreted as
#' the difference of each factor level from the base level (this is the same
#' interpretation as with treatment/dummy coding). For example, for a factor
#' `group` with levels "A", "B", and "C", with `contr.deviation`, the intercept
#' represents the overall mean (average of the group means for the 3 groups),
#' and the coefficients `groupB` and `groupC` represent the differences between
#' the A group mean and the B and C group means, respectively.
#' \cr\cr
#' Sum coding ([stats::contr.sum()]) is another type of effects coding. With sum
#' coding, the coefficients for factor variables are interpreted as the
#' difference of each factor level from **the grand (across-groups) mean**. For
#' example, for a factor `group` with levels "A", "B", and "C", with
#' `contr.sum`, the intercept represents the overall mean (average of the group
#' means for the 3 groups), and the coefficients `group1` and `group2` represent
#' the differences of the **A** and **B** group means from the overall mean,
#' respectively.
#'
#' @seealso [stats::contr.sum()]
#'
#' @examples
#' if (FALSE) {
#' data("mtcars")
#'
#' mtcars <- data_modify(mtcars, cyl = factor(cyl))
#'
#' c.treatment <- cbind(Intercept = 1, contrasts(mtcars$cyl))
#' solve(c.treatment)
#' #> 4 6 8
#' #> Intercept 1 0 0 # mean of the 1st level
#' #> 6 -1 1 0 # 2nd level - 1st level
#' #> 8 -1 0 1 # 3rd level - 1st level
#'
#' contrasts(mtcars$cyl) <- contr.sum
#' c.sum <- cbind(Intercept = 1, contrasts(mtcars$cyl))
#' solve(c.sum)
#' #> 4 6 8
#' #> Intercept 0.333 0.333 0.333 # overall mean
#' #> 0.667 -0.333 -0.333 # deviation of 1st from overall mean
#' #> -0.333 0.667 -0.333 # deviation of 2nd from overall mean
#'
#'
#' contrasts(mtcars$cyl) <- contr.deviation
#' c.deviation <- cbind(Intercept = 1, contrasts(mtcars$cyl))
#' solve(c.deviation)
#' #> 4 6 8
#' #> Intercept 0.333 0.333 0.333 # overall mean
#' #> 6 -1.000 1.000 0.000 # 2nd level - 1st level
#' #> 8 -1.000 0.000 1.000 # 3rd level - 1st level
#'
#' ## With Interactions -----------------------------------------
#' mtcars <- data_modify(mtcars, am = C(am, contr = contr.deviation))
#' mtcars <- data_arrange(mtcars, select = c("cyl", "am"))
#'
#' mm <- unique(model.matrix(~ cyl * am, data = mtcars))
#' rownames(mm) <- c(
#' "cyl4.am0", "cyl4.am1", "cyl6.am0",
#' "cyl6.am1", "cyl8.am0", "cyl8.am1"
#' )
#'
#' solve(mm)
#' #> cyl4.am0 cyl4.am1 cyl6.am0 cyl6.am1 cyl8.am0 cyl8.am1
#' #> (Intercept) 0.167 0.167 0.167 0.167 0.167 0.167 # overall mean
#' #> cyl6 -0.500 -0.500 0.500 0.500 0.000 0.000 # cyl MAIN eff: 2nd - 1st
#' #> cyl8 -0.500 -0.500 0.000 0.000 0.500 0.500 # cyl MAIN eff: 3rd - 1st
#' #> am1 -0.333 0.333 -0.333 0.333 -0.333 0.333 # am MAIN eff
#' #> cyl6:am1 1.000 -1.000 -1.000 1.000 0.000 0.000
#' #> cyl8:am1 1.000 -1.000 0.000 0.000 -1.000 1.000
#' }
#'
#' @export
contr.deviation <- function(n, base = 1,
contrasts = TRUE,
sparse = FALSE) {
cont <- stats::contr.treatment(n,
base = base,
contrasts = contrasts,
sparse = sparse
)
if (contrasts) {
n <- nrow(cont)
cont <- cont - 1 / n
}
cont
}
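
The property described in the documentation above (contrast columns that sum to 0, with coefficients still interpreted as differences from the base level) can be checked with base R alone. The sketch below copies `contr.deviation()` from this diff so that it runs without datawizard installed; the names `cont`, `col_sums` and `coef_weights` are chosen here for illustration only.

```r
# Copy of contr.deviation() as shown in the diff above, so that the sketch
# runs with base R only (no datawizard installation assumed).
contr.deviation <- function(n, base = 1, contrasts = TRUE, sparse = FALSE) {
  cont <- stats::contr.treatment(n,
    base = base,
    contrasts = contrasts,
    sparse = sparse
  )
  if (contrasts) {
    n <- nrow(cont)
    cont <- cont - 1 / n
  }
  cont
}

cont <- contr.deviation(3)

# Each contrast column sums to 0, the defining property of effects coding.
col_sums <- colSums(cont)

# Inverting the full coding matrix shows what each coefficient estimates:
# row 1 weights all group means equally (overall mean), rows 2 and 3 are
# differences from the first (base) level.
coef_weights <- solve(cbind(Intercept = 1, cont))
print(round(coef_weights, 3))
```

The printed matrix should match the `c.deviation` output in the roxygen example: equal weights of one third in the intercept row and simple differences from the first level in the remaining rows.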
File renamed without changes.
6 changes: 5 additions & 1 deletion R/data_read.R
@@ -271,7 +271,11 @@ data_read <- function(path,

.read_text <- function(path, encoding, verbose, ...) {
if (insight::check_if_installed("data.table", quietly = TRUE)) {
out <- data.table::fread(input = path, ...)
# set proper default encoding-value for fread
if (is.null(encoding)) {
encoding <- "unknown"
}
out <- data.table::fread(input = path, encoding = encoding, ...)
class(out) <- "data.frame"
return(out)
}
File renamed without changes.
4 changes: 2 additions & 2 deletions R/demean.R
@@ -28,13 +28,13 @@
#' @inheritParams center
#'
#' @return
#'
#' A data frame with the group-/de-meaned variables, which get the suffix
#' `"_between"` (for the group-meaned variable) and `"_within"` (for the
#' de-meaned variable) by default.
#'
#' @seealso If grand-mean centering (instead of centering within-clusters)
#' is required, see [center()].
#' is required, see [center()]. See [`performance::check_heterogeneity_bias()`]
#' to check for heterogeneity bias.
#'
#' @details
#'