Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashing made with element and md5 checksum #168

Merged
merged 1 commit into from
Dec 22, 2023

Conversation

unkcpz
Copy link
Member

@unkcpz unkcpz commented Dec 8, 2023

fixes #167

For pseudo data, all we care about for the node is the hash of the file. So we could even think of overriding the _get_objects_to_hash and just return the md5 and element.

@unkcpz
Copy link
Member Author

unkcpz commented Dec 8, 2023

Somehow the objcets are the same but the hash values are different? I think I miss some key behaviour of make_hash function.

@sphuber
Copy link
Contributor

sphuber commented Dec 9, 2023

It is not a misunderstanding of the make_hash function but rather that Node.get_hash does not call Node._get_objects_to_hash() but Node.base.caching._get_objects_to_hash. You should override the Node._CLS_NODE_CACHING attribute with a custom version of NodeCaching, i.e., something like:

from aiida.orm.nodes.caching import NodeCaching

class PseudoPotentialDataCaching(NodeCaching):
    """Class to define caching behavior of ``PseudoPotentialData`` nodes."""

    def _get_objects_to_hash(self) -> list:
        """Return a list of objects which should be included in the node hash."""
        return [self.element, self.md5]


class PseudoPotentialData(Data):

    _CLS_NODE_CACHING = PseudoPotentialDataCaching

That should make the tests pass. You have to change the original._get_objects_to_hash to original.base.caching._get_objects_to_hash of course.

@unkcpz
Copy link
Member Author

unkcpz commented Dec 12, 2023

@sphuber It works! thanks!

@sphuber
Copy link
Contributor

sphuber commented Dec 13, 2023

Last thing I was wondering @unkcpz . Why do we need to include the element? Should the file content not really just be sufficient? Could it really be possible where you have two identical files but they should be for different elements?

Pinging also @mbercx for his thoughts

@unkcpz
Copy link
Member Author

unkcpz commented Dec 13, 2023

Could it really be possible where you have two identical files but they should be for different elements?

Because I saw the setter function for the element and then I was thinking that there might be different site labels for the same element using the same UPF file.
I think it all depends on whether we assume user will change the element attribute or not. Afterall, the element is parsed from file, the same md5 will guarantee the same element.

@sphuber
Copy link
Contributor

sphuber commented Dec 13, 2023

Afterall, the element is parsed from file, the same md5 will guarantee the same element.

For the subclasses it is yes, but for the PseudoPotentialData it has to be set manually. But regardless on how it is set, if the file content is identical for two different pseudos, how could it possibly for two differente elements? Functionally, there would be zero difference, so would it be necessary to even include the element attribute?

@unkcpz
Copy link
Member Author

unkcpz commented Dec 13, 2023

I don't have any use case involved with subfix for element, so I am leaning to remove element from caching.
Is there use case in aiida-quantumespresso that may set for instance Fe to Fe_1 and Fe_2, even so that should have nothing to do with the aiida-pseudo I guess?

By default, the hash for data nodes is computed from a set of attributes
that includes the version of `aiida-core` as well as the plugin package
that provides the data plugin, if applicable. This means that the hash
of a data node changes merely if the version of `aiida-core` or the
plugin package changes, even if its content is identical.

This choice was made in `aiida-core` to err on the safe side. However,
for pseudopotentials, this is unnecessary and actually reduces the
effectiveness of caching. If a calculation is run with pseudos that has
already run before, but the version of `aiida-pseudo` is different with
which the pseudo was installed, the calculation will not be cached but
run again.

The `PseudoPotentialData` now customizes the caching class, ensuring the
hash is calculated just based on the element and the file content. This
should uniquely define a pseudopotential and so should be safe enough to
base the hash upon.
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @unkcpz . Let's get this merged. The concept just makes sense. We will find out later if there really is some use-case out there that is not compatible with this change.

@sphuber sphuber merged commit 26a23aa into aiidateam:main Dec 22, 2023
6 checks passed
@unkcpz unkcpz deleted the fix/167/hash-with-md5 branch December 23, 2023 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignore version and filename when compute hash for the pseudo node
2 participants