Add tests for StandardScaler covering with_mean and with_std #15
Conversation
force-pushed from 3bfd854 to 575c77b
The documentation for this PR is (or will soon be) available on readthedocs: https://equisolve--15.org.readthedocs.build/en/15/
force-pushed from f5a843f to f120d35
@@ -39,6 +48,7 @@ def __init__(
    column_wise: bool = False,
    rtol: float = 0.0,
    atol: float = 1e-12,
    gradient_compliant: bool = True,
Why do we need this flag? I don't see a use case for gradient_compliant=False at all, so it feels like this is only adding needless complexity =/
It goes back to the discussion in metatensor/metatensor#118.
I am not sure; I would like to try it out. It's not much added complexity in terms of code, and it makes the user aware of a subtlety they should know about anyway. So in the worst case the flag is never used and only serves to make the user aware of what is happening here.
I would apply YAGNI here. If we don't have a use case for it, do not add it, even if it is a simple change, because a lot of unnecessary little changes can lead to death by a thousand cuts.
If you want to play with it, this change does not have to be in the main branch; it can stay in a personal branch until we find a use case for it.
> makes the user aware of what is happening here.
I would rather do this with documentation on the with_mean parameter, saying there that only the values are modified.
Fair enough. I thought about it and I cannot think of any transformer benefiting from this option; if it turns out to be useful for some other transformer, we can think about it when implementing that transformer.
@@ -25,6 +25,9 @@ class TestStandardScaler:

    rng = np.random.default_rng(0x122578748CFF12AD12)
    n_strucs = 5

    # this computation is only needed because equistore does not offer random
    # initialization TODO change as soon as equistore has random initialization
What do you mean with random initialization? Something like np.rand((3, 4))? I doubt we will be able to provide something like this, since we also need metadata =/ You could have a helper function here that creates a tensormap with known metadata and random data, in the same way this works: https://github.com/lab-cosmo/equistore/blob/master/python/tests/utils.py#L6
Both things seem useful. For now I would just use an adapted version of the file you linked here, and put it in a test utils folder as you did in equistore.
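A minimal sketch of what such a test helper could look like, in plain NumPy. The function name and metadata layout here are illustrative only; the real helper would wrap these arrays into equistore Labels/TensorBlock/TensorMap objects, as in the linked utils.py.

```python
import numpy as np

def random_single_block_data(rng, n_samples=4, n_properties=3):
    # Known, deterministic metadata paired with random values: only the
    # values change between runs (given a seeded rng), so tests can rely
    # on the sample/property indices being fixed.
    values = rng.random((n_samples, n_properties))
    samples = np.arange(n_samples).reshape(-1, 1)        # e.g. structure index
    properties = np.arange(n_properties).reshape(-1, 1)  # e.g. feature index
    return values, samples, properties

rng = np.random.default_rng(0)
values, samples, properties = random_single_block_data(rng)
```

Seeding the generator keeps the "random" data reproducible across test runs, which matters for the tolerance-based assertions in the scaler tests.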
force-pushed from 9b3542d to 99725cc
Changes addressing the review are in commit 65a18ad. After approval I will rebase this into two commits, the first one adding the tests/numpy/utils.py file and the second with the rest, but I think this way is easier for reviewing.
force-pushed from 99725cc to 778b393
for parameter in self.parameter_keys:
    if parameter == "values":
        continue
for parameter in X_block.gradients_list():
this change was required because we allow applying the standardizer only to certain gradient keys via parameter_keys, so if parameter_keys did not contain all keys, the returned TensorBlock only contained the gradients in parameter_keys. That was not the intention, so we just set the means to zero and the scales to 1 when a parameter is not in parameter_keys
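The fix described here can be sketched as follows; the function and variable names are hypothetical, not the actual equisolve code. Gradients excluded via parameter_keys get identity scaling parameters instead of being dropped from the output.

```python
import numpy as np

def block_scaling(parameter, parameter_keys, mean, scale):
    # For parameters the user selected, return the fitted mean and scale;
    # for everything else return mean 0 and scale 1, so that gradient is
    # passed through unchanged rather than removed from the result.
    if parameter in parameter_keys:
        return mean, scale
    return np.zeros_like(mean), np.ones_like(scale)
```

With this, iterating over X_block.gradients_list() keeps every gradient in the transformed block, while only the selected ones are actually standardized.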
This makes me think the usage of parameter_keys is a bit fragile. I guess this can be addressed when re-writing this class with operations, so it is fine for now.
Could you remind me why this functionality is not built on top of remove_gradients? Is there something missing?
We thought that a user might, for example, want to fit only on energies but still predict forces (for a quick and dirty model) without removing the gradients, because we still want to output the force error at the end. How useful this actually is can be discussed, but I wouldn't touch it in this PR.
force-pushed from 778b393 to 65a18ad
sys.path.append("../")
from utils import random_single_block_no_components_tensor_map  # noqa
Do you want to fix #20 before merging this? Should be relatively easy
This is good to go for me. If you want to clean it up a bit and fix #20 first, go for it; otherwise it will happen in a separate PR!
force-pushed from a82df88 to dec8e7e
X_mean = np.average(X_mat, weights=sample_weights, axis=0)
var = np.average((X_mat - X_mean) ** 2, axis=0)
Maybe it is actually useful to use the mean_over_samples and variance_over_samples functions here.
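For reference, the computation in the quoted snippet in plain NumPy. Note that this sketch also passes sample_weights to the variance, whereas the diff computes the variance unweighted; whether a weighted variance is the intended behavior is an assumption on my part.

```python
import numpy as np

def weighted_mean_var(X_mat, sample_weights):
    # Column-wise weighted mean, then the weighted average of squared
    # deviations from that mean (a biased weighted variance estimate).
    X_mean = np.average(X_mat, weights=sample_weights, axis=0)
    var = np.average((X_mat - X_mean) ** 2, weights=sample_weights, axis=0)
    return X_mean, var
```

With uniform weights this reduces to the ordinary mean and (biased) variance, so the two versions only diverge when sample_weights is non-uniform.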
adding to issue #18
Tests for all combinations of with_mean=[True, False] and with_std=[True, False] for the StandardScaler have been added; this also fixes the small bugs that raised errors in the new tests.
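The combination coverage described here could be expressed with pytest parametrization; the test name and placeholder body below are illustrative, not the actual test code in this PR.

```python
import itertools

import pytest

# All four combinations of the two scaler flags.
FLAG_COMBINATIONS = list(itertools.product([True, False], repeat=2))

@pytest.mark.parametrize("with_mean, with_std", FLAG_COMBINATIONS)
def test_standard_scaler_flags(with_mean, with_std):
    # Placeholder body: the real tests would fit a StandardScaler with
    # these flags and check the mean/std of the transformed data.
    assert isinstance(with_mean, bool) and isinstance(with_std, bool)
```

Parametrizing keeps each flag combination a separate test case, so a failure report names exactly which combination broke.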
force-pushed from dec8e7e to 4d478aa
I hope it is clear from the commit messages and title. I can also split it up into two PRs, if requested.