add facilities for crystal structures and polymers #33
Comments
Probably makes sense to default to MEGNet for ease of use. @sp8rks mentioned that the Liverpool group has crystal similarity measures based on an attention network that we could use. Ideally, that crystal similarity measure would be packaged on PyPI (i.e., pip-installable) and expose a function that takes two pymatgen Structure objects and returns a similarity or distance score.
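A minimal sketch of the kind of interface that would be convenient, using pymatgen's StructureMatcher as a stand-in for the actual attention-network similarity (the function name `structure_distance` and the matcher settings are illustrative assumptions, not an existing package):

```python
from pymatgen.core import Structure
from pymatgen.analysis.structure_matcher import StructureMatcher

def structure_distance(s1: Structure, s2: Structure) -> float:
    """Return a scalar dissimilarity between two pymatgen Structures.

    Placeholder implementation: RMS displacement from pymatgen's
    StructureMatcher; the Liverpool similarity model would slot in here.
    """
    matcher = StructureMatcher(primitive_cell=True, attempt_supercell=True)
    rms = matcher.get_rms_dist(s1, s2)  # (rms, max_dist) or None if no match
    return rms[0] if rms is not None else float("inf")
```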
Some places that need to change:
Probably best to start by modifying and testing the bare-bones example. This is something that a collaborator can modify without deep knowledge of the rest of the codebase.

EDIT: For evaluation metrics, we could keep the element-wise metrics and, instead of checking for a new chemical formula, check whether a new space group is represented. It could also be new space group + new number of sites (see the sketch below).
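A hedged sketch of what that space-group novelty check could look like, assuming training and candidate structures are available as pymatgen Structure objects (the function and variable names are illustrative, not existing mat_discover API):

```python
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer

def novelty_labels(train_structures, candidate_structures, symprec=0.1):
    """Flag candidates whose (space group, n_sites) pair is absent from training."""
    seen = {
        (SpacegroupAnalyzer(s, symprec=symprec).get_space_group_number(), len(s))
        for s in train_structures
    }
    return [
        (SpacegroupAnalyzer(s, symprec=symprec).get_space_group_number(), len(s)) not in seen
        for s in candidate_structures
    ]
```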
Based on email discussion: Taylor brought up some great points, and I think this is an exciting project. There has been encouragement, both internal and external, to incorporate structure into the search for high-performing, novel materials, and I think this will be a timely extension of DiSCoVeR.

Weighting
For the weighting, perhaps we could use Chimera as the scalarizing function. Alternatively, I think it would be interesting (and best practice) to treat these two as separate objectives in a multi-objective optimization via, e.g., expected hypervolume improvement - in other words, a mathematically robust way of collapsing multiple objectives, in the context of the observed data, into a single number. Expected hypervolume improvement is handled implicitly by most sophisticated multi-objective optimization platforms. Another option would be an expected-improvement-style acquisition function in which the novelty proxy takes the place of the uncertainty predictions (a sketch follows below).

How do we validate performance?

Interesting idea about recognizing new motifs. The structural prototypes from AFLOW seem relevant, since they're going for a set of canonical prototypes IIUC. Some other issues related to validating performance:
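A minimal NumPy sketch of that last option, assuming we already have per-candidate performance predictions and a novelty proxy (the UCB-style form and the `novelty_weight` knob are assumptions, not anything prescribed in this thread):

```python
import numpy as np

def novelty_weighted_acquisition(pred_performance, novelty_proxy, novelty_weight=1.0):
    """Rank candidates by predicted performance plus a novelty bonus.

    Analogous to an upper-confidence-bound acquisition, but with the novelty
    proxy standing in for the model's uncertainty estimate.
    """
    pred = np.asarray(pred_performance, dtype=float)
    nov = np.asarray(novelty_proxy, dtype=float)
    # Standardize so the two terms are on comparable scales before weighting.
    pred = (pred - pred.mean()) / (pred.std() + 1e-12)
    nov = (nov - nov.mean()) / (nov.std() + 1e-12)
    return pred + novelty_weight * nov

# Example: pick the top-5 candidates by the combined score.
# scores = novelty_weighted_acquisition(y_pred, novelty)
# top5 = np.argsort(scores)[::-1][:5]
```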
Comments on the plumbing to modify for structure:

The easiest place to start testing things out is via the mat_discover bare-bones script. Today, I adapted this to use a matbench elasticity dataset with pymatgen Structures, M3GNet instead of CrabNet, and a Euclidean fingerprint-based structural distance instead of ElMD; everything else is the same. See the notebook below. When your structural distance metric of choice is ready, it can be swapped in for the fingerprint-based structural distance. After that comes the most difficult part - validation (hence Taylor's comments). Validation can proceed in a similar fashion to the original and/or include some extensions/modifications to how validation is performed.
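For reference, a fingerprint-based structural distance along these lines can be computed with matminer's site fingerprints; a sketch assuming matminer is installed (the featurizer preset and summary stats here are choices for illustration, not necessarily what the notebook uses):

```python
import numpy as np
from matminer.featurizers.site import CrystalNNFingerprint
from matminer.featurizers.structure import SiteStatsFingerprint

# Structure-level fingerprint: site fingerprints aggregated by summary statistics.
ssf = SiteStatsFingerprint(
    CrystalNNFingerprint.from_preset("ops"),
    stats=("mean", "std_dev", "minimum", "maximum"),
)

def fingerprint_distance(s1, s2):
    """Euclidean distance between structure fingerprints of two pymatgen Structures."""
    v1 = np.asarray(ssf.featurize(s1))
    v2 = np.asarray(ssf.featurize(s2))
    return float(np.linalg.norm(v1 - v2))
```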
Feature request
Requires valid distance metrics for crystal structures and polymers that encode chemo-structural novelty and polymeric novelty, respectively, as well as structure-based regression models. After that, just some basic plumbing.
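To make "structure-based regression models" concrete for the plumbing, what's needed is essentially a fit/predict interface over pymatgen Structures; a hypothetical adapter sketch (not existing mat_discover API), which a MEGNet or M3GNet wrapper could implement:

```python
from typing import Sequence
from pymatgen.core import Structure

class StructureRegressor:
    """Hypothetical adapter for a structure-based regression model so it can
    be swapped in where a composition-based model (e.g., CrabNet) is used."""

    def fit(self, structures: Sequence[Structure], targets: Sequence[float]) -> "StructureRegressor":
        raise NotImplementedError

    def predict(self, structures: Sequence[Structure]) -> Sequence[float]:
        raise NotImplementedError
```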