standardize with refit without centering #164

alaindanet · 2022-04-12T14:32:53Z

Thank you so much for the package!

I would like to know whether an option could be added to standardize without centering when using the "refit" method.

The rationale is that the negative and positive values of some predictor variables can be meaningful in themselves (e.g. a difference in price over a period). In that case, it is valuable to only scale the variables and not center them, as suggested by Andrew Gelman (Gelman, 2008; actually cited in the documentation of the standardize function):

subtracting the mean of each input variable and dividing by its standard deviation. (Strictly
speaking, subtracting the mean is not necessary, but this step allows main effects to be more
easily interpreted in the presence of interactions.)

We also center each input variable to have a mean of zero so that interactions are more
interpretable. Again, in some applications it can make sense for variables to be centered around
some particular baseline value, but we believe our automatic procedure is better than the current
default of using whatever value happens to be zero on the scale of the data, which all too commonly
results in absurdities such as age = 0 years or party identification = 0 on a 1–7 scale.

In the case where negative and positive values of predictor variables have different meanings, I believe that centering can change the meaning of the regression coefficients.

I noticed this in my own data analysis, where a positive coefficient became negative after centering, with the kind of explanatory variable I mentioned above.

mattansb · 2022-04-12T19:05:24Z

Do you have an example (reprex) --without an interaction-- where centering vs non-centering changes the signs on the coefficients (other than the intercept)? It really shouldn't...

strengejacke · 2022-04-12T19:11:43Z

Maybe standardizing the data before fitting the model can help; you have options to control the reference for centering and dispersion: https://easystats.github.io/datawizard/reference/standardize.html

alaindanet · 2022-04-12T21:45:29Z

@mattansb The change of sign appears with two-way interactions. It happens where the predictors are log2 ratios over temporal data, i.e. log2(x1/x0). I do not have much time right now to reproduce this, but it took me a long time to figure out why the results changed.

@strengejacke I agree.

I was thinking about an option in the compare_models() function, which I was using a bit too automatically.

This discrepancy made me realize that I should really think about what I am doing when standardizing variables. As highlighted in my first post, centering may be misleading in some cases.

To elaborate a bit, when standardizing coefficients, we usually think of the formula:
$$r_\delta = \beta \dfrac{\sigma_x}{\sigma_y}$$
which involves only scaling. Standardization with the "refit" option, however, both centers and scales the variables, and this is not obvious from the compare_models() function. I guess that some users (including me) are not thinking about centering.

It is not a big deal, but I wanted to raise this issue to see whether a sentence could be added to the documentation of compare_models(), or an option to specify whether variables should be centered or not, or at least a reference to Gelman regarding the difference between scaling and centering/scaling.

That said, thank you again for your great package.

mattansb · 2022-04-13T04:58:17Z

Indeed, if there is an interaction, the simple slopes will change after centering - this is usually something people want (to have the simple slopes represent "main effects").
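To make the point about simple slopes concrete, here is a short derivation for a model with two predictors and their interaction. The symbols $x$, $z$, $\bar{x}$, $\bar{z}$ and the coefficient names are chosen here for illustration and do not come from the thread. Writing $x_c = x - \bar{x}$ and $z_c = z - \bar{z}$ and substituting into the uncentered model gives:

$$
\begin{aligned}
y &= \beta_0 + \beta_1 x + \beta_2 z + \beta_3 x z \\
  &= \beta_0' + (\beta_1 + \beta_3 \bar{z})\, x_c + (\beta_2 + \beta_3 \bar{x})\, z_c + \beta_3\, x_c z_c, \\
\beta_0' &= \beta_0 + \beta_1 \bar{x} + \beta_2 \bar{z} + \beta_3 \bar{x} \bar{z}.
\end{aligned}
$$

The interaction coefficient $\beta_3$ is unchanged, but the slope of $x$ becomes $\beta_1 + \beta_3 \bar{z}$: the slope at $z = \bar{z}$ rather than at $z = 0$. Its sign can therefore differ from that of $\beta_1$ whenever $\beta_3 \bar{z}$ has the opposite sign and a larger magnitude. Without an interaction ($\beta_3 = 0$), centering changes only the intercept.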

As @strengejacke pointed out, if you want more fine-grained control, you can standardize each variable manually as you see fit, prior to model fitting.

Seeing how the back-end function (datawizard::standardize.data.frame()) is set up, I don't see the suggested functionality being added right now.
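A minimal sketch of the manual route described above, using base R only so that no datawizard argument names have to be assumed. The data, the variable names (`change`, `z`, `y`) and the helper `scale_only()` are hypothetical and serve only as an illustration:

```r
# Hypothetical data: `change` is a signed predictor where 0 means "no change".
set.seed(1)
n <- 200
dat <- data.frame(
  change = rnorm(n, mean = 1, sd = 2),
  z      = rnorm(n)
)
dat$y <- 0.5 * dat$change + 0.3 * dat$z + 0.4 * dat$change * dat$z + rnorm(n)

# Scale only: divide by the SD and keep the original zero point.
# Note that scale(x, center = FALSE, scale = TRUE) divides by the root mean
# square rather than the SD, so the SD is applied explicitly here.
scale_only <- function(x) x / sd(x)

dat_scaled <- as.data.frame(lapply(dat, scale_only))  # scaled, not centered
dat_z      <- as.data.frame(scale(dat))               # centered and scaled

fit_scaled <- lm(y ~ change * z, data = dat_scaled)
fit_z      <- lm(y ~ change * z, data = dat_z)

# Side-by-side coefficients: the interaction term matches across the two
# fits, while the intercept and the lower-order slopes differ.
round(cbind(scaled_only = coef(fit_scaled), centered = coef(fit_z)), 3)
```

In the scale-only fit, the slope of `z` is the slope at `change = 0` (no change), whereas in the centered fit it is the slope at the mean change. Which of the two is the more useful default depends on whether zero is a meaningful value for the predictor.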

alaindanet · 2022-04-13T21:08:53Z

Well @mattansb, I agree with you in the case where the 0 value of a predictor gives little insight, such as age in an epidemiological study of an adult population.
But in the case where the 0 value is of interest, such as a variable describing changes in weight, it seems more relevant not to center the variable on the mean change in weight.

Here again, I quote Gelman (2008):

We also center each input variable to have a mean of zero so that interactions are more
interpretable. Again, in some applications it can make sense for variables to be centered around
some particular baseline value, but we believe our automatic procedure is better than the current
default of using whatever value happens to be zero on the scale of the data, which all too commonly
results in absurdities such as age = 0 years or party identification = 0 on a 1–7 scale. Even with
such scaling, the correct interpretation of the model can be untangled from the regression by
pulling out the right combination of coefficients (for example, evaluating interactions at different
plausible values of age such as 20, 40, and 60); the advantage of our procedure is that the default
outputs in the regression table can be compared and understood in a consistent way.

It is fine that this is low priority! At least people who have questions about centering/scaling may end up here and read Gelman (2008).

Thank you so much!

mattansb · 2022-04-14T06:21:08Z

@alaindanet I am aware of these points, even though their application is less commonly used - yes, ideally people would understand their scales and units of measure and would center (or not) variables around sensible values that are derived from domain specific knowledge.
But if this is the case, effectsize::standardize() wouldn't be used anyway 😉

mattansb added enhancement 🔥 Low priority 😴 labels Apr 13, 2022

bwiernik closed this as completed Apr 14, 2022

mattansb reopened this Apr 17, 2022

mattansb transferred this issue from easystats/effectsize May 3, 2022

IndrajeetPatil removed the enhancement 🔥 label May 13, 2022
