This repository has been archived by the owner on Apr 24, 2024. It is now read-only.

Add tests for StandardScaler covering with_mean and with_std #15

Merged
merged 1 commit into main from add-standardscaler-test
Feb 28, 2023

Conversation

agoscinski
Collaborator

@agoscinski agoscinski commented Feb 23, 2023

I hope it is clear from the commit messages and title. I can also split it up into two PRs, if requested.

  • add tests for StandardScaler covering with_mean and with_std
  • add gradient_compliant flag to StandardScaler

@agoscinski agoscinski changed the title from "Add tests for StandardScaler covering with_mean with_std and adds gradient_compliant falg" to "Add tests for StandardScaler covering with_mean and with_std, and adds gradient_compliant flag to StandardScaler" Feb 23, 2023
@agoscinski agoscinski force-pushed the add-standardscaler-test branch 2 times, most recently from 3bfd854 to 575c77b on February 23, 2023 07:32
@github-actions

The documentation for this PR is (or will soon be) available on readthedocs: https://equisolve--15.org.readthedocs.build/en/15/

@agoscinski agoscinski force-pushed the add-standardscaler-test branch 2 times, most recently from f5a843f to f120d35 on February 23, 2023 07:56
@@ -39,6 +48,7 @@ def __init__(
         column_wise: bool = False,
         rtol: float = 0.0,
         atol: float = 1e-12,
+        gradient_compliant: bool = True,
Collaborator

Why do we need this flag? I don't see a use case for gradient_compliant=False at all, so it feels like this is only adding needless complexity =/

Collaborator Author

It goes back to the discussion in metatensor/metatensor#118.
I am not sure; I would like to try it out. It's not much added complexity in terms of code, and it just makes the user aware of a subtlety that they should be aware of anyway. So in the worst case this option goes unused and only makes the user aware of what is happening here.

Collaborator

I would apply YAGNI here. If we don't have a use case for it, do not add it, even if it is a simple change, because a lot of unnecessary little changes can lead to death by a thousand cuts.

If you want to play with it, this change does not have to be in the main branch; it can stay in a personal branch until we find a use case for it.

> makes the user aware of what is happening here.

I would rather do this with documentation on the with_mean parameter, saying there that only the values are modified.
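
A minimal sketch of that documentation-based alternative (hypothetical wording, not the actual equisolve docstring):

class StandardScaler:
    """Standardize features by removing the mean and scaling to unit variance.

    :param with_mean:
        If True, center the data before scaling. Centering only modifies the
        values of each block; gradients are left unchanged, because
        subtracting a constant mean has zero derivative.
    :param with_std:
        If True, scale the data to unit variance. Scaling affects values and
        gradients alike, because multiplying by a constant rescales the
        derivatives as well.
    """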

Collaborator Author

@agoscinski agoscinski Feb 23, 2023

Fair enough. I thought about it and I cannot think of any transformer benefiting from this option; if it turns out to be useful for some other transformer, we can think about it when implementing that transformer.

@@ -25,6 +25,9 @@ class TestStandardScaler:
 
     rng = np.random.default_rng(0x122578748CFF12AD12)
     n_strucs = 5
+
+    # this computation is only needed because equistore does not offer random
+    # initialization. TODO: change as soon as equistore has random initialization
Collaborator

What do you mean by random initialization? Something like np.random.rand(3, 4)? I doubt we will be able to provide something like this, since we also need metadata =/ You could have a helper function here that creates a TensorMap with known metadata and random data, in the same way this works: https://github.com/lab-cosmo/equistore/blob/master/python/tests/utils.py#L6
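
A minimal sketch of such a helper, assuming the equistore Python API of the time (Labels, TensorBlock, TensorMap constructors; label names and shapes are illustrative):

import numpy as np
from equistore import Labels, TensorBlock, TensorMap

def random_single_block_tensor_map(rng, n_samples=4, n_properties=3):
    """Create a single-block TensorMap with fixed metadata and random values."""
    block = TensorBlock(
        values=rng.random((n_samples, n_properties)),
        samples=Labels(
            ["structure"], np.arange(n_samples, dtype=np.int32).reshape(-1, 1)
        ),
        components=[],
        properties=Labels(
            ["property"], np.arange(n_properties, dtype=np.int32).reshape(-1, 1)
        ),
    )
    return TensorMap(Labels.single(), [block])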

Collaborator Author

Both things seem useful. For now I would just use an adapted version of the file you linked here, and put it in a test utils folder as you did in equistore.

@agoscinski agoscinski changed the title from "Add tests for StandardScaler covering with_mean and with_std, and adds gradient_compliant flag to StandardScaler" to "Add tests for StandardScaler covering with_mean and with_std" Feb 23, 2023
@agoscinski agoscinski force-pushed the add-standardscaler-test branch 2 times, most recently from 9b3542d to 99725cc on February 23, 2023 14:59
@agoscinski
Collaborator Author

agoscinski commented Feb 23, 2023

Changes addressing the review are in commit 65a18ad.

After approval I will rebase this into two commits: the first adding the tests/numpy/utils.py file, and the second with the rest. But I think this way is easier for reviewing.

-        for parameter in self.parameter_keys:
-            if parameter == "values":
-                continue
+        for parameter in X_block.gradients_list():
Collaborator Author

This change was required because we allow applying the standardizer only to certain gradient keys using parameter_keys. If parameter_keys did not contain all keys, the returned TensorBlock later contained only the gradients in parameter_keys. That was not the intention, so we just set the means to zero and the scales to one when a parameter is not in parameter_keys.
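
A hedged sketch of the described behavior (the helper name and variables are hypothetical, not the actual equisolve code):

def gradient_statistics(X_block, parameter_keys):
    """Mean/scale per gradient parameter; identity for unlisted parameters."""
    means, scales = {}, {}
    # iterate over all gradients in the block, not just parameter_keys,
    # so that no gradient is dropped from the returned TensorBlock
    for parameter in X_block.gradients_list():
        if parameter not in parameter_keys:
            # identity transform: keep this gradient untouched
            means[parameter] = 0.0
            scales[parameter] = 1.0
            continue
        gradient = X_block.gradient(parameter)
        means[parameter] = gradient.data.mean(axis=0)
        scales[parameter] = gradient.data.std(axis=0)
    return means, scales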

Collaborator

This makes me think the usage of parameter_keys is a bit fragile. I guess this can be addressed when re-writing this class with operations, so it is fine for now.

Could you remind me why this functionality is not built on top of remove_gradients? Is there something missing?

Collaborator Author

We thought that a user might, for example, want to fit only on energies but still predict forces (for a quick and dirty model) without removing the gradients, because we still want to output the force error at the end. I think it can be discussed how useful this actually is, but I wouldn't touch it in this PR.
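
A hedged usage sketch of that scenario (the import path and the scikit-learn-style fit/transform API are assumptions based on the surrounding code, not verified equisolve signatures):

from equisolve.numpy.preprocessing import StandardScaler  # path assumed

# fit the statistics on values (energies) only; gradients (forces) stay in
# the output but pass through with mean 0 and scale 1
scaler = StandardScaler(parameter_keys=["values"])
scaler.fit(X)  # X: an equistore TensorMap carrying "positions" gradients
X_scaled = scaler.transform(X)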

Comment on lines 20 to 21
sys.path.append("../")
from utils import random_single_block_no_components_tensor_map # noqa
Collaborator

Do you want to fix #20 before merging this? Should be relatively easy

Collaborator

@Luthaf Luthaf left a comment

This is good to go for me. If you want to clean it up a bit and fix #20 first, go for it; otherwise it will happen in a separate PR!

@agoscinski agoscinski force-pushed the add-standardscaler-test branch 3 times, most recently from a82df88 to dec8e7e on February 27, 2023 11:20
Comment on lines 192 to 193
X_mean = np.average(X_mat, weights=sample_weights, axis=0)
var = np.average((X_mat - X_mean) ** 2, axis=0)
Collaborator

Maybe it is actually useful to use the mean_over_samples and variance_over_samples functions here.
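
For reference, the plain-numpy version of these two moments with the weights applied to both (a sketch; the quoted snippet weights only the mean, and whether the variance should be weighted too is left open in this thread):

import numpy as np

rng = np.random.default_rng(0)
X_mat = rng.random((5, 3))      # 5 samples, 3 features
sample_weights = rng.random(5)

# weighted mean over the sample axis, as in the quoted snippet
X_mean = np.average(X_mat, weights=sample_weights, axis=0)

# weighted analogue of the variance; the quoted snippet omits weights here
var = np.average((X_mat - X_mean) ** 2, weights=sample_weights, axis=0)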

Collaborator Author

adding to issue #18

tests for all combinations of with_mean=[True, False] and
with_std=[True, False] for the StandardScaler have been added;
fixes the small bugs that raised errors in the new tests
@agoscinski agoscinski merged commit 4d478aa into main Feb 28, 2023
@agoscinski agoscinski deleted the add-standardscaler-test branch February 28, 2023 10:22
@agoscinski agoscinski temporarily deployed to github-pages February 28, 2023 10:23