Hashing made with element and md5 checksum #168

unkcpz · 2023-12-08T13:55:41Z

fixes #167

For pseudo data, all we care about for the node is the hash of the file. So we could even think of overriding the _get_objects_to_hash and just return the md5 and element.

unkcpz · 2023-12-08T13:58:41Z

Somehow the objcets are the same but the hash values are different? I think I miss some key behaviour of make_hash function.

sphuber · 2023-12-09T14:36:15Z

It is not a misunderstanding of the make_hash function but rather that Node.get_hash does not call Node._get_objects_to_hash() but Node.base.caching._get_objects_to_hash. You should override the Node._CLS_NODE_CACHING attribute with a custom version of NodeCaching, i.e., something like:

from aiida.orm.nodes.caching import NodeCaching

class PseudoPotentialDataCaching(NodeCaching):
    """Class to define caching behavior of ``PseudoPotentialData`` nodes."""

    def _get_objects_to_hash(self) -> list:
        """Return a list of objects which should be included in the node hash."""
        return [self.element, self.md5]


class PseudoPotentialData(Data):

    _CLS_NODE_CACHING = PseudoPotentialDataCaching

That should make the tests pass. You have to change the original._get_objects_to_hash to original.base.caching._get_objects_to_hash of course.

unkcpz · 2023-12-12T21:21:18Z

@sphuber It works! thanks!

sphuber · 2023-12-13T13:07:37Z

Last thing I was wondering @unkcpz . Why do we need to include the element? Should the file content not really just be sufficient? Could it really be possible where you have two identical files but they should be for different elements?

Pinging also @mbercx for his thoughts

unkcpz · 2023-12-13T14:04:23Z

Could it really be possible where you have two identical files but they should be for different elements?

Because I saw the setter function for the element and then I was thinking that there might be different site labels for the same element using the same UPF file.
I think it all depends on whether we assume user will change the element attribute or not. Afterall, the element is parsed from file, the same md5 will guarantee the same element.

sphuber · 2023-12-13T14:09:33Z

Afterall, the element is parsed from file, the same md5 will guarantee the same element.

For the subclasses it is yes, but for the PseudoPotentialData it has to be set manually. But regardless on how it is set, if the file content is identical for two different pseudos, how could it possibly for two differente elements? Functionally, there would be zero difference, so would it be necessary to even include the element attribute?

unkcpz · 2023-12-13T14:27:44Z

I don't have any use case involved with subfix for element, so I am leaning to remove element from caching.
Is there use case in aiida-quantumespresso that may set for instance Fe to Fe_1 and Fe_2, even so that should have nothing to do with the aiida-pseudo I guess?

By default, the hash for data nodes is computed from a set of attributes that includes the version of `aiida-core` as well as the plugin package that provides the data plugin, if applicable. This means that the hash of a data node changes merely if the version of `aiida-core` or the plugin package changes, even if its content is identical. This choice was made in `aiida-core` to err on the safe side. However, for pseudopotentials, this is unnecessary and actually reduces the effectiveness of caching. If a calculation is run with pseudos that has already run before, but the version of `aiida-pseudo` is different with which the pseudo was installed, the calculation will not be cached but run again. The `PseudoPotentialData` now customizes the caching class, ensuring the hash is calculated just based on the element and the file content. This should uniquely define a pseudopotential and so should be safe enough to base the hash upon.

sphuber

Thanks @unkcpz . Let's get this merged. The concept just makes sense. We will find out later if there really is some use-case out there that is not compatible with this change.

unkcpz force-pushed the fix/167/hash-with-md5 branch from bb47f4e to f4de94d Compare December 12, 2023 21:21

sphuber force-pushed the fix/167/hash-with-md5 branch from f4de94d to a7f5335 Compare December 13, 2023 13:01

sphuber force-pushed the fix/167/hash-with-md5 branch from a7f5335 to 48635ee Compare December 22, 2023 18:53

sphuber force-pushed the fix/167/hash-with-md5 branch from 48635ee to 4ba2454 Compare December 22, 2023 18:55

sphuber self-requested a review December 22, 2023 18:58

sphuber approved these changes Dec 22, 2023

View reviewed changes

sphuber merged commit 26a23aa into aiidateam:main Dec 22, 2023
6 checks passed

unkcpz deleted the fix/167/hash-with-md5 branch December 23, 2023 01:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hashing made with element and md5 checksum #168

Hashing made with element and md5 checksum #168

unkcpz commented Dec 8, 2023 •

edited

Loading

unkcpz commented Dec 8, 2023

sphuber commented Dec 9, 2023

unkcpz commented Dec 12, 2023

sphuber commented Dec 13, 2023 •

edited

Loading

unkcpz commented Dec 13, 2023

sphuber commented Dec 13, 2023

unkcpz commented Dec 13, 2023

sphuber left a comment

Hashing made with element and md5 checksum #168

Hashing made with element and md5 checksum #168

Conversation

unkcpz commented Dec 8, 2023 • edited Loading

unkcpz commented Dec 8, 2023

sphuber commented Dec 9, 2023

unkcpz commented Dec 12, 2023

sphuber commented Dec 13, 2023 • edited Loading

unkcpz commented Dec 13, 2023

sphuber commented Dec 13, 2023

unkcpz commented Dec 13, 2023

sphuber left a comment

Choose a reason for hiding this comment

unkcpz commented Dec 8, 2023 •

edited

Loading

sphuber commented Dec 13, 2023 •

edited

Loading