rholearn

A torch-based workflow for training descriptor-based equivariant neural networks to predict real-space electronic density scalar fields of molecules and materials at near-DFT accuracy.

Author: Joseph W. Abbott, PhD Student @ Lab COSMO, EPFL

Note: under active development, breaking changes are likely!

rholearn workflow summary

Background

Electronic densities, such as the electron density and local density of states, are central quantities in understanding the electronic properties of molecules and materials on the atomic scale. First principles quantum simulations such as density-functional theory (DFT) are able to accurately predict such fields as a linear combination of single-particle solutions to the Kohn-Sham equations. While reliable and accurate, such methods scale unfavourably with the number of electrons in the system.

Machine learning methods offer a complementary route to probing the electronic structure of matter on the atomic scale. With a sufficiently expressive model, one can learn the mapping between nuclear geometry and real-space electronic density and predict such quantities with more favourable scaling. Typically, predictions are used either to accelerate DFT by providing initial guesses, or to probe electronic structure directly.

There are many approaches to learn the aforementioned mapping. In the density fitting approach, the real-space target electronic density $\rho^{\text{DFT}}(\mathbf{r})$ is decomposed onto a linear atom-centered basis set:

$$ \rho^{\text{DFT}}(\mathbf{r}) \approx \rho^{\text{RI}}(\mathbf{r}) = \sum_{b} d_b^{\text{RI}} \ \varphi_b^{\text{RI}}(\mathbf{r}) $$

where $\rho^{\text{RI}}(\mathbf{r})$ is the basis set approximation to the target density, $\varphi_b^{\text{RI}}(\mathbf{r})$ are fitted basis functions (each a product of a radial function and a spherical harmonic) generated by the resolution-of-the-identity (RI) approach, and $d_b^{\text{RI}}$ are the coefficients that minimize the basis set expansion error for the given basis set definition.
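The expansion above can be illustrated with a minimal 1D sketch. It is a toy model, not the RI procedure used by FHI-aims or rholearn: the Gaussian basis functions, the grid, and all variable names are hypothetical stand-ins. Minimising the integrated squared error between the target and its expansion leads to the linear system $S d = w$, where $S$ is the basis overlap matrix and $w$ is the projection of the target onto each basis function.

```python
import numpy as np


def gaussian(x, center, width):
    """Unnormalised 1D Gaussian, a hypothetical stand-in for a basis function."""
    return np.exp(-((x - center) ** 2) / (2.0 * width**2))


# Uniform 1D grid standing in for the real-space integration grid
x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]

# Hypothetical basis: two Gaussians on each of two "atoms" at x = -1 and x = +1
centers = [-1.0, -1.0, 1.0, 1.0]
widths = [0.5, 1.0, 0.5, 1.0]
phi = np.stack([gaussian(x, c, s) for c, s in zip(centers, widths)])  # (n_basis, n_grid)

# Toy target density, chosen to lie exactly in the span of the basis
d_true = np.array([0.8, 0.3, 0.5, 0.6])
rho_target = d_true @ phi

# Least-squares fit: overlap matrix S, projections w, then solve S d = w
overlap = phi @ phi.T * dx
projection = phi @ rho_target * dx
d_fit = np.linalg.solve(overlap, projection)

# Reconstruct the density from the fitted coefficients and measure the L2 error
rho_fit = d_fit @ phi
fit_error = np.sqrt(np.sum((rho_fit - rho_target) ** 2) * dx)
print("fitted coefficients:", d_fit)
print("L2 expansion error:", fit_error)
```

Because the toy target was built from the basis itself, the fit recovers the coefficients up to numerical precision. For a real DFT density the expansion error is non-zero and depends on the quality of the basis set.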

An equivariant model is then trained to predict coefficients $d_b^{\text{ML}}$ that reconstruct a density in real-space, ideally minimising the generalisation error on the real-space DFT densities of a test set.
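As a worked illustration of how an error on real-space densities relates to an error on coefficients, assume the predicted density is expanded on the same real-valued basis, $\rho^{\text{ML}}(\mathbf{r}) = \sum_{b} d_b^{\text{ML}} \ \varphi_b^{\text{RI}}(\mathbf{r})$. The integrated squared difference to the RI density then reduces to a quadratic form in the coefficient differences. This is one common choice of error measure, shown here as a sketch, and is not necessarily the loss implemented in rholearn; the symbols $S_{bb'}$ and $\Delta d_b$ are introduced only for this illustration.

$$ \int \left| \rho^{\text{ML}}(\mathbf{r}) - \rho^{\text{RI}}(\mathbf{r}) \right|^2 \mathrm{d}\mathbf{r} = \sum_{b, b'} \Delta d_b \ S_{bb'} \ \Delta d_{b'}, \qquad \Delta d_b = d_b^{\text{ML}} - d_b^{\text{RI}}, \qquad S_{bb'} = \int \varphi_b^{\text{RI}}(\mathbf{r}) \ \varphi_{b'}^{\text{RI}}(\mathbf{r}) \ \mathrm{d}\mathbf{r} $$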

For one of the original workflows for predicting the electron density under the density-fitting framework, readers are referred to SALTED. This uses a symmetry-adapted Gaussian process regression (SA-GPR) method via sparse kernel ridge regression to learn and predict $d_b^{\text{ML}}$.

Goals

rholearn also operates under the density fitting approach. The nuclear coordinates $\to$ electronic density mapping is learned via a feature-based equivariant neural network whose outputs are the predicted coefficients. Currently, rholearn is integrated with the electronic structure code FHI-aims for both data generation and building of real-space fields from predicted coefficients.

rholearn aims to improve the scalability of the density-fitting approach to learning electronic densities. It is built on top of a modular software ecosystem, with the following packages forming the main components of the workflow:

  • metatensor (GitHub) is used as the self-describing block-sparse data storage format, wrapping multidimensional tensors with metadata. Subpackages metatensor-operations and metatensor-learn are used to provide convenient sparse operations and ML building blocks respectively that operate on the metatensor.TensorMap object.
  • rascaline (GitHub) is used to transform the nuclear coordinates into local equivariant descriptors that encode physical symmetries and geometric information for input into the neural network.
  • PyTorch is used as the learning framework, allowing definition of arbitrarily complex neural networks that can be trained by minibatch gradient descent.

By leveraging the speed- and memory-efficient operations of torch, and building on top of metatensor and rascaline, descriptors, models, and learning methodologies can be flexibly prototyped and customized for a specific learning task.

Getting Started

Installing rholearn

With a working conda installation, first set up an environment:

conda create -n rho python==3.11
conda activate rho

Then clone and install rholearn:

git clone https://github.com/lab-cosmo/rholearn.git
cd rholearn
# Specify CPU-only torch
pip install --extra-index-url https://download.pytorch.org/whl/cpu .

Running tox from the top directory will run linting and formatting. To run some tests (currently limited to testing rholearn.loss), run pytest tests/rholearn/loss.py.

Installing FHI-aims

To generate reference data using the aims_interface of rholearn, a working installation of FHI-aims >= 240926 is required. FHI-aims is not open source but is free for academic use. Follow the instructions on their website fhi-aims.org/get-the-code to get and build the code. The end result should be an executable, compiled for your specific system.

There are also useful tutorials on the basics of running FHI-aims here.

Basic usage

In a run directory, user options are defined in YAML files named "dft-options.yaml", "hpc-options.yaml", and "ml-options.yaml". Any options specified in these files override the defaults.

Default options can be found in the rholearn/options/ directory, and some templates for user options can be found in the examples/options/ directory.
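The override behaviour described above can be sketched as a merge of two dictionaries, such as those obtained by parsing a default and a user YAML file. This is an illustrative sketch only: the function `merge_options` and the option names (`SEED`, `TRAIN`, `n_epochs`, `batch_size`) are hypothetical, and whether rholearn merges nested sections recursively is an assumption rather than something stated here.

```python
import copy


def merge_options(defaults, user):
    """Return a new dict where values in `user` take precedence over `defaults`.

    Nested dicts are merged recursively (an assumption for this sketch), so
    overriding one key in a section keeps the other defaults in that section.
    """
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_options(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical option names, standing in for parsed YAML contents
defaults = {"SEED": 42, "TRAIN": {"n_epochs": 100, "batch_size": 4}}
user = {"TRAIN": {"n_epochs": 500}}

options = merge_options(defaults, user)
print(options)
```

Only the keys present in the user dictionary change; everything else keeps its default value, and the `defaults` dictionary itself is left unmodified.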

Data generation, model training, and model evaluation are then run with the following CLI commands:

rholearn_run_scf  # run SCF with FHI-aims

rholearn_process_scf  # process SCF outputs

rholearn_setup_ri_fit  # set up RI fitting calculation

rholearn_run_ri_fit  # run RI fitting with FHI-aims

rholearn_process_ri_fit  # process RI outputs

rholearn_train  # train model with rholearn

rholearn_eval  # evaluate model with rholearn
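The order of the commands above matters, as each step consumes the outputs of the previous one. A minimal sketch of a driver that walks through them in sequence is shown below. The function `run_workflow` is hypothetical and not part of rholearn. It assumes that each command takes no arguments, is run from the run directory, and blocks until its work is complete. On an HPC system where a step submits jobs to a scheduler and returns immediately, the next step must wait until those jobs have finished, which this sketch does not handle.

```python
import shutil
import subprocess

# CLI commands in the order they are listed above
STEPS = [
    "rholearn_run_scf",
    "rholearn_process_scf",
    "rholearn_setup_ri_fit",
    "rholearn_run_ri_fit",
    "rholearn_process_ri_fit",
    "rholearn_train",
    "rholearn_eval",
]


def run_workflow(steps=STEPS, dry_run=True):
    """Run each step in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    handled = []
    for step in steps:
        if dry_run:
            print(f"[dry run] {step}")
        else:
            if shutil.which(step) is None:
                raise FileNotFoundError(f"'{step}' not found on PATH")
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run([step], check=True)
        handled.append(step)
    return handled


planned = run_workflow(dry_run=True)
```

Calling `run_workflow(dry_run=False)` from a prepared run directory would execute the steps for real, under the assumptions stated above.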

Tutorial

For a more in-depth walkthrough of the functionality, see this tutorial on data generation using FHI-aims and model training using rholearn.

Citing this work

@software{abbott_2024_13891847,
  author       = {Abbott, Joseph W. and
                  Fraux, Guillaume and
                  Ceriotti, Michele},
  title        = {lab-cosmo/rholearn: rholearn v0.1.0},
  month        = oct,
  year         = 2024,
  publisher    = {Zenodo},
  version      = {v0.1.0},
  doi          = {10.5281/zenodo.13891847},
  url          = {https://doi.org/10.5281/zenodo.13891847}
}