diff --git a/README.md b/README.md index 0ce6ede..5601e0e 100644 --- a/README.md +++ b/README.md @@ -1,121 +1,89 @@ -WorkFlow for Automated Cluster Expansion Regression (WFACER) +WorkFlow for Automated Cluster Expansion Regression (WFacer) =================================================== -*Modulated automation of cluster expansion based on atomate2 and Jobflow* +*Modulated automation of cluster expansion model construction based on atomate2 and Jobflow* ----------------------------------------------------------------------------- -**WFacer** is a light-weight package based on [**smol**](https://github.com/CederGroupHub/smol.git) -to automate the building of energy models in crystalline material systems, based on the -*cluster expansion* method from alloy theory. Beyond metallic alloys, **WFacer** is also designed -to handle ionic systems through enabling charge **Decorator**s and external **EwaldTerm**. With the -support of [**Atomate2**](https://github.com/materialsproject/atomate2.git), -[**Jobflow**](https://github.com/materialsproject/jobflow.git) -and [**Fireworks**](https://github.com/materialsproject/fireworks.git), **WFacer** is able to fully automate the +**WFacer** ("Wall"Facer) is a light-weight package based on [smol](https://github.com/CederGroupHub/smol.git) +to automate the fitting of lattice models in disordered crystalline solids using +the *cluster expansion* method. Beyond metallic alloys, **WFacer** is also designed +to handle ionic systems by enabling charge **Decorator**s and the external **EwaldTerm**. Powered by [Atomate2](https://github.com/materialsproject/atomate2.git), +[Jobflow](https://github.com/materialsproject/jobflow.git) +and [Fireworks](https://github.com/materialsproject/fireworks.git), **WFacer** is able to fully automate the cluster expansion building process on super-computing clusters, and can easily interface -with materials-project style MongoDB data storage.
+with MongoDB data storage in the **Materials Project** style. Functionality ------------- **WFacer** currently supports the following functionalities: -- Preprocess setup to a cluster expansion workflow as dictionary. -- Enumerating and choosing the least aliasing super-cell matrices with given number of sites; - enumerating charge balanced compositions in super-cells; Enumerating and selecting low-energy, - non-duplicated structures into the training set at the beginning of each iteration. -- Computing enumerated structures using **Atomate2** **VASP** interfaces. -- Extracting and saving relaxed structures information and energy in **Atomate2** schemas. -- Decorating structures. Currently, supports charge decoration from fixed labels, from Pymatgen guesses, - or from [a gaussian process](https://doi.org/10.1038/s41524-022-00818-3) to partition site magnetic moments. -- Fitting effective cluster interactions (ECIs) from structures and energies with sparse linear - regularization methods and model selection methods provided by - [**sparse-lm**](https://github.com/CederGroupHub/sparse-lm.git), - except for overlapped group Lasso regularization. -- Checking convergence of cluster expansion model using the minimum energy convergence per composition, - the cross validation error, and the difference of ECIs (if needed). -- Creating an **atomate2** style workflow to be executed locally or with **Fireworks**. +- Preprocessing the setup of a cluster expansion workflow as a dictionary. +- Enumerating and choosing the least aliasing super-cell matrices with a given number of sites; + enumerating charge-balanced compositions in super-cells; enumerating and selecting low-energy, + non-duplicated structures into the training set at the beginning of each iteration. +- Computing enumerated structures using **Atomate2** **VASP** interfaces. +- Extracting and saving relaxed structure information and energies in **Atomate2** schemas. +- Decorating structures.
It currently supports charge decoration from fixed labels, from Pymatgen guesses, + or from [a Gaussian optimization process](https://doi.org/10.1038/s41524-022-00818-3) based on partitioning site magnetic moments. +- Fitting effective cluster interactions (ECIs) from structures and energies with sparse linear + regularization methods and model selection methods provided by + [sparse-lm](https://github.com/CederGroupHub/sparse-lm.git), + except for overlapped group Lasso regularization. +- Checking convergence of the cluster expansion model using the minimum energy convergence per composition, + the cross validation error, and the difference of ECIs (if needed). +- Creating an **atomate2** style workflow to be executed locally or with **Fireworks**. Installation ------------ -1. Install the latest [**smol**](https://github.com/CederGroupHub/smol.git) - and [**sparse-lm**](https://github.com/CederGroupHub/sparse-lm.git) from repository. - (Deprecate after **smol**>=0.3.2 and **sparse-lm**>=0.3.2 update). -2. Install WFacer: - * From pypi: `pip install WFacer` - * From source: `Clone` the repository. The latest tag in the `main` branch is the stable version of the +* From PyPI: `pip install WFacer` +* From source: clone the repository. The latest tag in the `main` branch is the stable version of the code. The `main` branch has the newest tested features, but may have more -lingering bugs. From the top level directory: `pip install .` +lingering bugs. From the top-level directory, do `pip install -r requirements.txt`, then `pip install .` If +you wish to use **Fireworks** as the workflow manager, do `pip install -r requirements-optional.txt` as well. Post-installation configuration ------------ Specific configurations are required before you can properly use **WFacer**. -* **Firework** job management is optional but not required.
- To use job management with **Fireworks** and **Atomate2**, - configuring **Fireworks** and **Atomate2** with your MongoDB storage is necessary. - Users are advised to follow the guidance in - [**Atomate2**](https://materialsproject.github.io/atomate2/user/install.html) and - [**Atomate**](https://atomate.org/installation.html#configure-database-connections-and-computing-center-parameters) - installation guides, and run a simple [test workflow](https://materialsproject.github.io/atomate2/user/fireworks.html) - to see if it is able to run on your queue. - - Important notice: instead of writing in **my_qadapter.yaml** - ```commandline - rlaunch -c <>/config rapidfire - ``` - we suggest using singleshot in rlaunch instead, because by doing this a queue task will - be terminated upon one structure is finished, rather than trying to fetch another waiting structure - from the launchpad. This will guarantee that each structure to be able to use up the maximum wall-time - possible. - By switching to singleshot in rlaunch, you will need to qlaunch after every iteration trigger job because for some reason Fireworks sets the enumeration job to ready, but can not continue executing it. -* A mixed integer programming (MIP) solver would be necessary when a MIQP based - regularization method is used. A list of available MIP solvers can be found in - [**cvxpy** documentations](https://www.cvxpy.org/tutorial/advanced/index.html#choosing-a-solver). - Commercial solvers such as **Gurobi** and **CPLEX** are typically pre-compiled - but require specific licenses to run on a super-computing system. For open-source solvers, - the users are recommended to install **SCIP** in a dedicated conda environment following - the installation instructions in [**PySCIPOpt**](https://github.com/scipopt/PySCIPOpt.git). - -Quick example for semi-automation using Fireworks +- We highly recommend using **Fireworks**, though it is not required.
+ To use job management with **Fireworks** and **Atomate2**, + configuring **Fireworks** and **Atomate2** with your MongoDB storage is necessary. + Users are advised to follow the guidance in + [**Atomate2**](https://materialsproject.github.io/atomate2/user/install.html) and + [**Atomate**](https://atomate.org/installation.html#configure-database-connections-and-computing-center-parameters) + installation guides, and run a simple [test workflow](https://materialsproject.github.io/atomate2/user/fireworks.html) + to see if it is able to run on your queue. + + Instead of writing in **my_qadapter.yaml** as + ```commandline + rlaunch -c <>/config rapidfire + ``` + we suggest using: + ```commandline + rlaunch -c <>/config singleshot + ``` + because by using *singleshot* with rlaunch, a task in the submission queue will + be terminated once a structure is finished instead of trying to fetch another structure + from the launchpad. This can be used in combination with: + ```commandline + qlaunch rapidfire -m + ``` + to guarantee that each structure is able to use up the maximum wall-time in + its computation. + +* A mixed integer programming (MIP) solver is necessary when an MIQP-based + regularization method is used. A list of available MIP solvers can be found in + [the cvxpy documentation](https://www.cvxpy.org/tutorial/advanced/index.html#choosing-a-solver). + Commercial solvers such as **Gurobi** and **CPLEX** are typically pre-compiled + but require specific licenses to run on a super-computing system. For open-source solvers, + we recommend installing **SCIP** in a dedicated conda environment following + the installation instructions in [PySCIPOpt](https://github.com/scipopt/PySCIPOpt.git). + +A quick example for fully automating cluster expansion ------------------------------- -examples/semi_automation_BCC_AlLi shows a use case where you can semi-automate building the cluster expansion for -the Al-Li system on BCC lattice.
- -You will need to manually execute **initialize.py** in the first iteration, and then in each of the following iterations, -execute **generate.py** to enumerate new structures and load their corresponding workflows to fireworks launcpad. Then in -the command line, call - -```commandline -nohup qlaunch rapidfire -m {n_jobs} --sleep {time} > qlaunch.log -``` - -in order to run all workflows. Check the status of the queue until all queue tasks are terminated, -and no firework on the launchpad is lost (i.e., some firework in the "RUNNING" state but nothing is being run on -the queue). If lost jobs are found, you may choose to fizzle or rerun them with - -```commandline -lpad detect_lostruns --time 1 --fizzle -``` - -or - -```commandline -lpad detect_lostruns --time 1 --rerun -``` - -When all structures are finished, call **fit_model.py** to parse calculations and fit ECIs. Start the next iteration -by enumerating new structures with **generate.py** again. - - -Quick example for full automation[beta] -------------------------------- -Notice: -Since cluster expansion might include structures that takes a long time to compute, or may fail to relax, -and Jobflow + Fireworks might not always handle these cases properly, the following full automation workflow -could be flakey. - A simple workflow to run automatic cluster expansion in a Ag-Li alloy on FCC lattice is as follows -(see other available options in preprocessing documentations.): +(see other available options in the documentation of [*preprocessing.py*](WFacer/preprocessing.py)): ```python from fireworks import LaunchPad @@ -146,7 +114,7 @@ lpad.add_wf(wf) After running this script, a workflow with the name *"agli_fcc_ce"* should have been added to **Fireworks**' launchpad.
-Submit the workflow to queue using the following command after you have correctly configured **Fireworks** +Submit the workflow to the queue using the following command once you have correctly configured **Fireworks** queue adapter, ```bash nohup qlaunch rapidfire -m {n_jobs} --sleep {time} > qlaunch.log @@ -155,9 +123,11 @@ where `n_jobs` is the number of jobs you want to keep in queue, and `time` is th time between two queue submission attempts. `qlaunch` will keep submitting jobs to the queue until no unfinished job could be found on launchpad. -After the workflow is finished, use the following codes to retrieve the computed results from MongoDB, -(Assume you run the workflow generation script and the following dataloading script -on the same machine, otherwise you will have to figure out which `JOB_STORE` to use!): +> Note: You may still need to qlaunch manually after every cluster expansion iteration +> because **Fireworks** could occasionally set the enumeration job to the READY state +> but fail to continue executing the job. + +After the workflow is finished, use the following code to retrieve the computed results from MongoDB: ```python from jobflow import SETTINGS @@ -181,3 +151,23 @@ print("Cluster subspace:", doc.cluster_subspace) print("Wrangler:", doc.data_wrangler) print("coefficients:", doc.coefs_history[-1]) ``` +> Note: Check that the **Jobflow** installations on the computer cluster and the query +> terminal are configured to use the same **JOB_STORE**. + +Copyright Notice ---------------- +Workflow for automated cluster expansion regression (WFacer) Copyright (c) 2023, +The Regents of the University of California, through Lawrence Berkeley National +Laboratory (subject to receipt of any required approvals from the U.S. +Dept. of Energy) and the University of California, Berkeley. All rights reserved.
+ +If you have questions about your rights to use or distribute this software, +please contact Berkeley Lab's Intellectual Property Office at +IPO@lbl.gov. + +> NOTICE: This Software was developed under funding from the U.S. Department +> of Energy and the U.S. Government consequently retains certain rights. As +> such, the U.S. Government has been granted for itself and others acting on +> its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the +> Software to reproduce, distribute copies to the public, prepare derivative +> works, and perform publicly and display publicly, and to permit others to do so. diff --git a/WFacer/convergence.py b/WFacer/convergence.py index a4892be..685be05 100644 --- a/WFacer/convergence.py +++ b/WFacer/convergence.py @@ -9,24 +9,25 @@ def compare_min_energy_structures_by_composition(min_e1, min_e2, matcher=None): """Compare minimum energy and structure by composition for convergence check. - We will only compare keys that exist in both older and newer iterations. - If one composition appears in the older one but not the newer one, we will not - claim convergence. + We will only compare keys that exist in both older and newer iterations. + If one composition appears in the older one but not the newer one, we will not + claim convergence. Args: min_e1 (defaultdict): Minimum energies and structures from an earlier iteration. min_e2 (defaultdict): Minimum energies and structures from a later iteration. - See docs in WFacer.wrangling. + See documentation of :mod:`WFacer.wrangling`. matcher (StructureMatcher): optional A StructureMatcher used compare structures. wrangler.cluster_subspace._site_matcher is recommended. - Return: + Returns: float, bool: - maximum energy difference in eV/site, - and whether a new ground state structure appeared. + The maximum energy difference compared across compositions + (unit: eV/site), and whether a new ground-state structure + has appeared. 
""" diffs = [] matches = [] @@ -48,15 +49,18 @@ def compare_fitted_coefs(cluster_subspace, coefs_prev, coefs_now): Args: cluster_subspace(ClusterSubspace): The cluster subspace used in fitting. - coefs_prev(1d arrayLike): + coefs_prev(1D ArrayLike): Cluster coefficients fitted in the previous iteration. - Not ECIs because not divided by multiplicity! - coefs_now(1d arrayLike): + They are not ECIs as they are not divided by multiplicity! + coefs_now(1D ArrayLike): Cluster coefficients fitted in the latest iteration. Returns: float: - || ECI' - ECI ||_1 / ||ECI||_1. + :math:`|| J' - J ||_1 / ||J||_1`, + where :math:`J` represents the coefficients from the last + iteration and :math:`J'` represents coefficients from the + current iteration. """ # Get ECIs from coefficients. eci_prev = ClusterExpansion(cluster_subspace, coefficients=coefs_prev).eci @@ -71,11 +75,11 @@ def ce_converged( """Check whether the ce workflow has converged. Args: - coefs_history(list[list[float]]): + coefs_history(list of lists of float): CE coefficients from all past iterations. - cv_history(list[float]): + cv_history(list of float): Past cross validation errors. - cv_std_history(list[float]): + cv_std_history(list of float): Past cross validation standard deviations. The length of the first three arguments must be equal. @@ -87,7 +91,8 @@ def ce_converged( Pre-processed convergence criterion. Returns: - bool. + bool: + Whether the cluster expansion has converged. """ # Wrangler is not empty, but its maximum iteration index does not match the # last iteration. diff --git a/WFacer/enumeration.py b/WFacer/enumeration.py index 22a1528..9d0f27f 100644 --- a/WFacer/enumeration.py +++ b/WFacer/enumeration.py @@ -1,10 +1,10 @@ """This module implements a StructureEnumerator class for CE sampling. -Algorithm based on: +The algorithm is based on the work of +`A. Seko et al `_. -Ground state structures will also be added to the structure pool, but -they are not added here. 
They will be added in the convergence checker -module. +Ground state structures will also be included in the structure pool if not +included yet. """ __author__ = "Fengyu Xie" @@ -26,7 +26,7 @@ from .utils.supercells import get_three_factors, is_duplicate_sc -# TODO: in the future, may employ mcsqs type algos. +# TODO: in the future, may employ mcsqs-like algos. def enumerate_matrices( objective_sc_size, cluster_subspace, @@ -44,12 +44,14 @@ objective_sc_size(int): Objective supercell size in the number of primitive cells. Better be a multiple of det(conv_mat). - cluster_subspace(smol.ClusterSubspace): + cluster_subspace(ClusterSubspace): The cluster subspace. cluster_subspace.structure must be pre-processed such that it is the true primitive cell in under its space group symmetry. - Note: The cluster_subspace.structure must be reduced to a - primitive cell! + + .. note:: The structure of :class:`ClusterSubspace` must be reduced to a + primitive cell! + supercell_from_conventional(bool): optional Whether to enumerate supercell matrices in the form M@T, where M is an integer matrix, T is the primitive to conventional cell @@ -61,11 +63,12 @@ min_sc_angle(float): Minimum allowed angle of the supercell lattice. By default, set to 30, to prevent over-skewing. - kwargs: - keyword arguments to pass into SpaceGroupAnalyzer. + **kwargs: + Keyword arguments to pass into :class:`SpacegroupAnalyzer`. Returns: - List of 2D lists. + list of 2D lists: + Enumerated super-cell matrices. """ if not supercell_from_conventional: conv_mat = np.eye(3, dtype=int) @@ -184,7 +187,8 @@ Enumerated super-cell matrices. Returns: - ClusterSubspace: truncated subspace without aliased orbits. + ClusterSubspace: + The truncated cluster subspace without aliased orbits.
""" alias = [] for m in sc_matrices: @@ -221,30 +225,31 @@ def enumerate_compositions_as_counts( ): """Enumerate compositions in a given supercell size. - Results will be returned in "counts" format - (see smol.moca.CompositionSpace). + Results will be returned in "counts" format, + see documentation of :mod:`smol.moca.composition` for details. Args: sc_size(int): The super-cell size in the number of prim cells. comp_space(CompositionSpace): optional - Composition space in a primitive cell. If not given, - arguments "bits" and "sublattice_sizes" must be given. - bits(List[List[Species|DummySpecies|Element|Vacancy]]): + Composition space in a primitive cell. If not given, the + arguments **bits** and **sublattice_sizes** must be given. + bits(list of Lists of Species or DummySpecies or Element or Vacancy): Allowed species on each sub-lattice. - sublattice_sizes(List[int]): - Number of sites in each sub-lattice in a prim cell. + sublattice_sizes(list of int): + The number of sites in each sub-lattice in a prim cell. comp_enumeration_step(int): Step in returning the enumerated compositions. If step = N > 1, on each dimension of the composition space, we will only yield one composition every N compositions. Default to 1. - kwargs: - Other keyword arguments to initialize CompositionSpace. + **kwargs: + Other keyword arguments used to initialize a :class:`CompositionSpace`. Returns: - Enumerated possible compositions in "counts" format, not normalized: - 2D np.ndarray[int] + 2D np.ndarray of int: + Enumerated possible compositions in "counts" format + (**NOT** normalized by supercell size). """ if comp_space is None: if bits is None or sublattice_sizes is None: @@ -271,7 +276,7 @@ def enumerate_compositions_as_counts( def get_num_structs_to_sample( all_counts, num_structs_select, scale=3, min_num_per_composition=2 ): - """Get number of structures to sample in each McSampleGenerator. + """Get number of structures to sample in each :class:`McSampleGenerator`. 
Args: all_counts(ArrayLike): @@ -390,18 +395,18 @@ def generate_training_structures( ce(ClusterExpansion): ClusterExpansion object initialized as null. If charge decorated, will contain an ewald contribution at 100% - enumerated_matrices(list[3*3 ArrayLike[int]]): + enumerated_matrices(list of 3*3 ArrayLike of int): Previously enumerated supercell matrices. Must be the same super-cell size. - enumerated_counts(list[1D ArrayLike]): + enumerated_counts(list of 1D ArrayLike): Previously enumerated compositions in "counts" format. Must fit in the super-cell size. Note: Different super-cell sizes not supported! - previous_sampled_structures(list[Structure]): optional + previous_sampled_structures(list of Structure): optional Sample structures already calculated in past iterations. If given, that means you will add structures to an existing training set. - previous_feature_matrix(list[list[[float]]): optional + previous_feature_matrix(list of lists of float): optional Correlation vectors of structures already calculated in past iterations. keep_ground_states(bool): optional Whether always to include the electrostatic ground states. @@ -421,18 +426,20 @@ def generate_training_structures( duplicacy_criteria(str): The criteria when to consider two structures as the same and old to add one of them into the candidate training set. - Default is "correlations", which means to assert duplication - if two structures have the same correlation vectors. While - "structure" means two structures must be symmetrically equivalent - after being reduced. No other option is allowed. - Note that option "structure" might be significantly slower since - it has to attempt reducing every structure to its primitive cell - before matching. It should be used with caution. - kwargs: - Keyword arguments for utils.selection.select_initial_rows. + + #. (Default) "correlations", which means to assert duplication + if two structures have the same correlation vectors. + #. 
"structure" means two structures must be symmetrically equivalent + after being reduced. No other option is allowed. + + .. note:: The option "structure" could be considerably slower as + it attempts to reduce every structure into a primitive cell + before matching. Used with caution! + **kwargs: + Keyword arguments for :func:`WFacer.utils.selection.select_initial_rows`. Returns: - list[Structure], list[3*3 list[list[int]]], list[list[float]]: + list of Structure, list of 3*3 list of lists of int, list of lists of float: Initial training structures, super-cell matrices, and normalized correlation vectors. """ diff --git a/WFacer/fit.py b/WFacer/fit.py index 339d322..cb5061c 100644 --- a/WFacer/fit.py +++ b/WFacer/fit.py @@ -27,30 +27,33 @@ def fit_ecis_from_wrangler( ): """Fit ECIs from a fully processed wrangler. - No weights will be used. + .. note:: Currently, this function does not support adjusting sample weights. Args: wrangler(CeDataWrangler): - A CeDataWrangler storing all training structures. + A :class:`CeDataWrangler` to store all training structures. estimator_name(str): The name of estimator, following the rules in - smol.utils.class_name_from_str. + :mod:`smol.utils.class_name_from_str`. optimizer_name(str): - Name of hyperparameter optimizer. Currently, only supports GridSearch and - LineSearch. + The name of model optimizer. Currently, only supports + :class:`GridSearch` and :class:`LineSearch` + from :mod:`sparse-lm.model_selection`. param_grid(dict|list[tuple]): - Parameter grid to initialize the optimizer. See docs of - sparselm.model_selection. + Parameter grid to initialize the optimizer. See documentation of + :mod:`sparselm.model_selection`. use_hierarchy(bool): optional Whether to use cluster hierarchy constraints when available. Default to true. center_point_external(bool): optional - Whether to fit the point and external terms with linear regression - first, then fit the residue with regressor. 
Default to None, which means - when the feature matrix is full rank, will not use centering, otherwise - centers. If set to True, will force centering, but use at your own risk - because this may cause very large CV. If set to False, will never use - centering. + Whether to perform the centering operation, which means to fit the point and + the external terms using linear regression first, then fit the residue + with the specified regressor. Default to None, which means centering + will not be used when the feature matrix is full rank, and will be + used otherwise. + If set to True, will always use centering, but use at your own risk + because this may cause very large CV when the feature matrix is full rank. + If set to False, will never perform centering. filter_unique_correlations(bool): If the wrangler have structures with duplicated correlation vectors, whether to fit with only the one with the lowest energy. @@ -59,12 +62,14 @@ Other keyword arguments to initialize an estimator. optimizer_kwargs(dict): optional Other keyword arguments to initialize an optimizer. - kwargs: - Keyword arguments used by estimator._fit. For example, solver arguments. + **kwargs: + Keyword arguments used by the estimator._fit method. + For example, solver specifications. Returns: Estimator, 1D np.ndarray, float, float, float, 1D np.ndarray: - Fitted estimator, coefficients (not ECIs), cross validation error (meV/site), + Fitted estimator, coefficients (not ECIs), + cross validation error (meV/site), standard deviation of CV (meV/site) , RMSE(meV/site) and corresponding best parameters.
""" diff --git a/WFacer/jobs.py b/WFacer/jobs.py index 9e071aa..7c2367a 100644 --- a/WFacer/jobs.py +++ b/WFacer/jobs.py @@ -1,4 +1,4 @@ -"""Unitary jobs used by Maker.""" +"""Unitary jobs used by an atomate2 workflow.""" import logging from copy import deepcopy from warnings import warn @@ -46,6 +46,7 @@ def _preprocess_options(options): + """Pre-process and concatenate options.""" sc_options = process_supercell_options(options) comp_options = process_composition_options(options) struct_options = process_structure_options(options) @@ -110,7 +111,7 @@ def _enumerate_structures( def _get_vasp_makers(options): - """Get required vasp makers.""" + """Get the required VASP makers.""" relax_gen_kwargs = options["relax_generator_kwargs"] relax_generator = RelaxSetGenerator(**relax_gen_kwargs) relax_maker_kwargs = options["relax_maker_kwargs"] @@ -140,7 +141,7 @@ def _get_vasp_makers(options): def _check_flow_convergence(taskdoc): - """Check vasp convergence for a single structure.""" + """Check VASP convergence for a single structure.""" try: status = taskdoc.calcs_reversed[0].has_vasp_completed if status == TaskState.FAILED: @@ -198,7 +199,7 @@ def enumerate_structures(last_ce_document): Returns: dict: - Newly enumerated structures, super-cell matrices + The newly enumerated structures with super-cell matrices and feature vectors. """ @@ -264,14 +265,14 @@ def get_structure_calculation_flows(enum_output, last_ce_document): Args: enum_output(dict): - Output by enumeration job. + Output from the enumeration job. last_ce_document(CeOutputsDocument): - The last cluster expansion outputs document. + The cluster expansion outputs document from the last iteration. Returns: - list[Flow], list[OutputReference]: + list of Flow, list of OutputReference: Flows for each structure and their output references pointing - at the final TaskDoc. + at the final :class:`TaskDoc`. 
""" project_name = last_ce_document.project_name iter_id = last_ce_document.last_iter_id + 1 @@ -320,8 +321,9 @@ def get_structure_calculation_flows(enum_output, last_ce_document): def calculate_structures_job(enum_output, last_ce_document): """Calculate newly enumerated structures. - Note: it will replace itself with workflows to run for - each structure. + .. note:: This job will replace itself with the calculation jobs to run for + each structure. + Args: enum_output(dict): Output by enumeration job. @@ -329,8 +331,8 @@ def calculate_structures_job(enum_output, last_ce_document): The last cluster expansion outputs document. Returns: - list[TaskDoc]: - Results of VASP calculations as TaskDoc. + list of TaskDoc: + Results of VASP calculations, in the form of :class:`emmet.core.TaskDoc`. """ project_name = last_ce_document.project_name iter_id = last_ce_document.last_iter_id + 1 @@ -344,23 +346,24 @@ def calculate_structures_job(enum_output, last_ce_document): def parse_calculations(taskdocs, enum_output, last_ce_document): - """Parse finished calculations into CeDataWrangler. + """Parse finished calculations into :class:`CeDataWrangler`. Gives CeDataEntry with full decoration. Each computed structure will be re-decorated and re-inserted every iteration. + Args: - taskdocs(list[TaskDoc]): - Task documents generated by vasp computations of - added structures. + taskdocs(list of TaskDoc): + Task documents generated as results of VASP computations. enum_output(dict): Output by enumeration job. last_ce_document(CeOutputsDocument): The last cluster expansion outputs document. Returns: - dict - Updated wrangler, all entries before decoration, - and all computed properties. + dict: + A dictionary containing the updated wrangler with successfully decorated + and mapped calculations, the computed structure entries of all structures + before decoration, and the computed properties for all structures. 
""" options = last_ce_document.ce_options prim_specs = last_ce_document.prim_specs @@ -478,7 +481,8 @@ def fit_calculations(parse_output, last_ce_document): Returns: dict: - Dictionary containing fitted CE information. + A dictionary containing the CE coefficients, the cross-validation error, + the RMSE, and the optimal hyperparameters. """ options = last_ce_document.ce_options _, coefs, cv, cv_std, rmse, params = fit_ecis_from_wrangler( @@ -514,7 +518,8 @@ def update_document(enum_output, parse_output, fit_output, last_ce_document): Returns: CeOutputDocument: - The updated document. + The updated :class:`CeOutputDocument` upon finishing the current + iteration. """ ce_document = deepcopy(last_ce_document) ce_document.data_wrangler = deepcopy(parse_output["wrangler"]) @@ -554,8 +559,9 @@ def update_document(enum_output, parse_output, fit_output, last_ce_document): def initialize_document(prim, project_name="ace-work", options=None): """Initialize an empty cluster expansion document. - In this job, a cluster subspace will be created, super-cells - and compositions will also be enumerated. + A :class:`ClusterSubspace` instance will be created and trimmed out of duplicacy + in this job, the supercell matrices and compositions to be used for structure + generation will also be enumerated. Args: prim(structure): @@ -569,7 +575,8 @@ def initialize_document(prim, project_name="ace-work", options=None): options(dict): optional A dictionary including all options to set up the automatic workflow. - For available options, see docs in preprocessing.py. + For available options, see documentation of + :mod:'WFacer.preprocessing'. """ # Pre-process options. 
options = options or {} diff --git a/WFacer/maker.py b/WFacer/maker.py index f1529ea..fd133cc 100644 --- a/WFacer/maker.py +++ b/WFacer/maker.py @@ -1,4 +1,4 @@ -"""Automatic jobflow maker.""" +"""Automatic cluster expansion workflow maker.""" from dataclasses import dataclass, field from warnings import warn @@ -16,7 +16,7 @@ @job def ce_step_trigger(last_ce_document): - """Trigger a step in CE iteration. + """Triggers a CE iteration. Args: last_ce_document(CeOutputsDocument): @@ -25,7 +25,7 @@ def ce_step_trigger(last_ce_document): Returns: Response: - Either a CeOutputsDocument if converged, or a + A :class:`CeOutputsDocument` if converged, or a response to replace with another step. """ iter_id = last_ce_document.last_iter_id + 1 @@ -87,13 +87,16 @@ def ce_step_trigger(last_ce_document): @dataclass class AutoClusterExpansionMaker(Maker): - """The cluster expansion automatic workflow maker. + """Automatic cluster expansion workflow maker. Attributes: name(str): - Name of the cluster expansion project. Since the underscore - will be used to separate fields of job names, it should not - appear in the project name! + The name of the cluster expansion project. + + .. note:: Since the underscore ("_") will be used to separate + the fields in job names, it should not appear in the project + name here! + options(dict): A dictionary including all options to set up the automatic workflow. @@ -124,7 +127,7 @@ def make(self, prim, last_document=None, add_num_iterations=None): Returns: Flow: - The iterative cluster expansion workflow. + The iterative automatic cluster expansion workflow. """ if last_document is None: initialize = initialize_document_job( diff --git a/WFacer/preprocessing.py b/WFacer/preprocessing.py index a33c484..22d9c33 100644 --- a/WFacer/preprocessing.py +++ b/WFacer/preprocessing.py @@ -20,11 +20,12 @@ def reduce_prim(prim, **kwargs): Args: prim(Structure): A primitive cell with partial occupancy to be expanded. 
- kwargs: - Keyword arguments for SpacegroupAnalyzer. + **kwargs: + Keyword arguments for initializing :class:`SpacegroupAnalyzer`. Returns: - Structure + Structure: + The primitive cell reduced from the input structure. """ sa = SpacegroupAnalyzer(prim, **kwargs) # TODO: maybe we can re-define site_properties transformation @@ -38,23 +39,24 @@ def construct_prim(bits, sublattice_sites, lattice, frac_coords, **kwargs): Provides a helper method to initialize a primitive cell. Of course, a prim cell can also be parsed directly from a given Structure object or file. + Args: - bits(List[List[Specie]]): + bits(list of lists of Species): Allowed species on each sublattice. No sorting required. - sublattice_sites(List[List[int]]): + sublattice_sites(list of lists of int): Site indices in each sub-lattice of a primitive cell. Must include all site indices in range(len(frac_coords)) lattice(Lattice): Lattice of the primitive cell. - frac_coords(ArrayLike): + frac_coords(2D ArrayLike): Fractional coordinates of sites. - kwargs: - Keyword arguments for SpacegroupAnalyzer. + **kwargs: + Keyword arguments for initializing :class:`SpacegroupAnalyzer`. Returns: - a reduced primitive cell (not necessarily charge neutral): - Structure + Structure: + A reduced primitive cell (not necessarily charge neutral). """ n_sites = len(frac_coords) if not np.allclose( @@ -95,8 +97,10 @@ def get_prim_specs(prim): and species concentrations are considered the same sub-lattice! Returns: dict: - a spec dict containing bits, sub-lattice sites, - sub-lattice sizes, and more. + A specification dictionary of the cluster expansion space, containing + the species on each sub-lattice, the indices of sites belonging to each + sub-lattice in a primitive cell, whether the system requires charge + decoration and the nearest-neighbor distance. """ unique_spaces = sorted(set(get_site_spaces(prim))) @@ -149,26 +153,25 @@ def get_cluster_subspace( prim(Structure): Reduced primitive cell. 
charge_decorated(bool): - Whether to use a charge deocration in CE. + Whether to perform a charge decoration in CE. nn_distance(float): Nearest neighbor distance in structure, used to guess cluster cutoffs if argument "cutoffs" is not given. cutoffs(dict): optional Cluster cutoff diameters in Angstrom. - If cutoff values not given, will use a guessing from the nearest neighbor - distance d: - pair=3.5d, triplet=2d, quad=2d. + If cutoff values not given, will guess based on the nearest neighbor + distance d, such as pair=3.5d, triplet=2d, quad=2d. This guessing is formed based on empirical cutoffs in DRX, but not always good for your system. Setting your own cutoffs is highly recommended. use_ewald(bool): optional Whether to use the EwaldTerm when CE is charge decorated. Default to True. ewald_kwargs(dict): optional Keyword arguments to initialize EwaldTerm. See docs in smol.cofe.extern. - other_terms(list[ExternalTerm]): optional + other_terms(list of ExternalTerm): optional List of other external terms to be added besides the EwaldTerm. (Reserved for extensibility.) - kwargs: - Other keyword arguments for ClusterSubspace.from_cutoffs. + **kwargs: + Other keyword arguments of :func:`ClusterSubspace.from_cutoffs`. Returns: ClusterSubspace: @@ -200,38 +203,39 @@ def process_supercell_options(d): Returns: dict: A dict containing supercell matrix options, including the following keys: - supercell_from_conventional(bool): - Whether to find out primitive cell to conventional - standard structure transformation matrix T, and enumerate - super-cell matrices in the form of: M = M'T. - Default to true. If not, will set T to eye(3). - objective_num_sites(int): - The Supercel sizes (in number of sites, both active and inactive) - to approach. - Default to 64. Enumerated super-cell size will be - a multiple of det(T) but the closest one to this objective - size. 
- Note: since super-cell matrices with too high a conditional - number will be dropped, do not use a super-cell size whose - decompose to 3 integer factors are different in scale. - For example, 17 = 1 * 1 * 17 is the only possible factor - decomposition for 17, whose matrix conditional number will - always be larger than the cut-off (8). - Currently, we only support enumerating super-cells with the - same size. - spacegroup_kwargs(dict): - Keyword arguments used to initialize a SpaceGroupAnalyzer. - Will also be used in reducing the primitive cell. - max_sc_condition_number(float): - Maximum conditional number of the supercell lattice matrix. - Default to 8, prevent overly slender super-cells. - min_sc_angle(float): - Minimum allowed angle of the supercell lattice. - Default to 30, prevent overly skewed super-cells. - sc_matrices(List[3*3 ArrayLike[int]]): - Supercell matrices. Will not enumerate super-cells if this - is given. Default to None. Note: if given, all supercell matrices - must be of the same size! + supercell_from_conventional(bool): + Whether to find out primitive cell to conventional + standard structure transformation matrix T, and enumerate + super-cell matrices in the form of: M = M'T. + Default to true. If not, will set T to eye(3). + objective_num_sites(int): + The supercell size (in number of sites, both active and inactive) + to approach. + Default to 64. Enumerated super-cell size will be + a multiple of det(T) but the closest one to this objective + size. + + .. note:: Since super-cell matrices with too high a condition + number will be dropped, do not use a super-cell size that could only + be decomposed into three factors largely different in scale. + For example, 17 = 1 * 1 * 17 is the only possible three-factor + decomposition for 17 with a large condition number of 17. + + Currently, we only support enumerating super-cells with the + same size. + spacegroup_kwargs(dict): + Keyword arguments used to initialize a SpaceGroupAnalyzer.
+ Will also be used in reducing the primitive cell. + max_sc_condition_number(float): + Maximum condition number of the supercell lattice matrix. + Default to 8, to prevent overly slender super-cells. + min_sc_angle(float): + Minimum allowed angle of the supercell lattice. + Default to 30, to prevent overly skewed super-cells. + sc_matrices(list of 3*3 ArrayLike of int): + Supercell matrices. Will not enumerate super-cells if this + is given. Default to None. Note: if given, all supercell matrices + must be of the same size! """ return { "supercell_from_conventional": d.get("supercell_from_conventional", True), @@ -253,63 +257,68 @@ def process_composition_options(d): Returns: dict: A dict containing composition options, including the following keys: - charge_neutral (bool): optional + charge_neutral (bool): optional Whether to add charge balance constraint. Default to true. - other_constraints: - (list of tuples of (1D arrayLike[float], float, str) or str): optional + other_constraints(list): optional Other composition constraints to be applied to restrict the - enumerated compositions. - Allows two formats for each constraint in the list: - 1, A string that encodes the constraint equation. - For example: "2 Ag+(0) + Cl-(1) +3 H+(2) <= 3 Mn2+ +4". - A string representation of constraint must satisfy the following - rules, - a, Contains a relation symbol ("==", "<=", ">=" or "=") are - allowed. - The relation symbol must have exactly one space before and one - space after to separate the left and the right sides. - b, Species strings must be readable by get_species in smol.cofe - .space.domain. No space is allowed within a species string. - For the format of a legal species string, refer to - pymatgen.core.species and smol.cofe. - c, You can add a number in brackets following a species string - to specify constraining the amount of species in a particular - sub-lattice. If not given, will apply the constraint to this - species on all sub-lattices.
- This sub-lattice index label must not be separated from - the species string with space or any other character. - d, Species strings along with any sub-lattice index label must - be separated from other parts (such as operators and numbers) - with at least one space. - e, The intercept terms (a number with no species that follows) - must always be written at the end on both side of the equation. - 2, The equation expression, which is a tuple containing a list of - floats of length self.n_dims to give the left-hand side coefficients - of each component in the composition "counts" format, a float to - give the right-hand side, and a string to specify the comparative - relationship between the left- and right-hand sides. Constrained in - the form of a_left @ n = (or <= or >=) b_right. - The components in the left-hand side are in the same order as in - itertools.chain(*self.bits). - Note that all numerical values in the constraints must be set as they are - to be satisfied per primitive cell given the sublattice_sizes! - For example, if each primitive cell contains 1 site in 1 sub-lattice - specified as sublattice_sizes=[1], with the requirement that species - A, B and C sum up to occupy less than 0.6 sites per sub-lattice, then - you must write: "A + B + C <= 0.6". - While if you specify sublattice_sizes=[2] in the same system per - primitive cell, to specify the same constraint, write - "A + B + C <= 1.2" or "0.5 A + 0.5 B + 0.5 C <= 0.6", etc. - See documentation of smol.moca.composition.space. - comp_enumeration_step (int): optional - Skip step in returning the enumerated compositions. - If step > 1, on each dimension of the composition space, - we will only yield one composition in every N compositions. - Default to 1. - compositions (2D arrayLike[int]): optional - Fixed compositions with which to enumerate the structures. If - given, will not enumerate other compositions. - Should be provided as the "species count"-format of CompositionSpace. 
+ enumerated compositions, each allowing two formats: + + #. A string that encodes the constraint equation. + For example, **"2 Ag+(0) + Cl-(1) +3 H+(2) <= 3 Mn2+ +4"**. + A string representation of constraint must satisfy the following + rules: + + #. Contains a relation symbol; "==", "<=", ">=" and "=" are + allowed. The relation symbol must have exactly one space before + and one space after to separate the left and the right sides. + #. Species strings must be readable by function :func:`get_species` + in :mod:`smol.cofe.space.domain`. + No space is allowed within a species string. + For the requirements on a legal species string, refer to + the documentation of :mod:`pymatgen.core.species` + and :mod:`smol.cofe`. + #. You can add a number in brackets following a species string + to specify constraining the amount of species in a particular + sub-lattice. If not given, will apply the constraint to this + species on all sub-lattices. + This sub-lattice index label must not be separated from + the species string with space or any other character. + #. Species strings along with any sub-lattice index label must + be separated from other parts (such as operators and numbers) + with at least one space. + #. The intercept terms (a number with no species that follows) + must always be written at the end on both sides of the equation. + + #. The equation expression, which is a tuple containing a list of + floats of length self.n_dims to give the left-hand side coefficients + of each component in the composition "counts" format, a float to + give the right-hand side, and a string to specify the comparative + relationship between the left- and right-hand sides. Constrained in + the form of a_left @ n = (or <= or >=) b_right. + The components in the left-hand side are in the same order as + in ``itertools.chain(*self.bits)``. + + .. note:: All numerical values in the constraints must be set as they are + to be satisfied per primitive cell given the sublattice_sizes!
+ For example, if each primitive cell contains 1 site in 1 sub-lattice + specified as sublattice_sizes=[1], with the requirement that species + A, B and C sum up to occupy less than 0.6 sites per sub-lattice, then + you must write: "A + B + C <= 0.6". + While if you specify sublattice_sizes=[2] in the same system per + primitive cell, to specify the same constraint, write + "A + B + C <= 1.2" or "0.5 A + 0.5 B + 0.5 C <= 0.6", etc. + + See documentation of :mod:`smol.moca.composition`. + comp_enumeration_step (int): optional + Skip step in returning the enumerated compositions. + If step > 1, on each dimension of the composition space, + we will only yield one composition in every N compositions. + Default to 1. + compositions (2D ArrayLike of int): optional + Fixed compositions with which to enumerate the structures. If + given, will not enumerate other compositions. + Should be provided as the "species count" format, + see :mod:`smol.moca.composition`. """ return { "comp_enumeration_step": d.get("comp_enumeration_step", 1), @@ -329,40 +338,42 @@ def process_structure_options(d): Returns: dict: A dict containing structure options, including the following keys: - num_structs_per_iter_init (int): - Number of new structures to enumerate in the first iteration. - It is recommended that in each iteration, at least 2~3 - structures are added for each composition. - Default is 60. - num_structs_per_iter_add (int): - Number of new structures to enumerate in each followed iteration. - Default is 40. - sample_generator_kwargs(Dict): - kwargs of CanonicalSampleGenerator. - init_method(str): - Structure selection method in the first iteration. - Default is "leverage". Allowed options include: "leverage" and - "random". - add_method(str): - Structure selection method in subsequent iterations. - Default is 'leverage'. Allowed options are: 'leverage' - and 'random'. 
- duplicacy_criteria(str): - The criteria when to consider two structures as the same and - old to add one of them into the candidate training set. - Default is "correlations", which means to assert duplication - if two structures have the same correlation vectors. While - "structure" means two structures must be symmetrically equivalent - after being reduced. No other option is allowed. - Note that option "structure" might be significantly slower since - it has to attempt reducing every structure to its primitive cell - before matching. It should be used with caution. - n_parallel(int): optional - Number of generators to run in parallel. Default is to use - a quarter of cpu count. - keep_ground_states(bool): - Whether always to add new ground states to the training set. - Default to True. + num_structs_per_iter_init (int): + Number of new structures to enumerate in the first iteration. + It is recommended that in each iteration, at least 2~3 + structures are added for each composition. + Default is 60. + num_structs_per_iter_add (int): + Number of new structures to enumerate in each subsequent iteration. + Default is 40. + sample_generator_kwargs(dict): + kwargs of CanonicalSampleGenerator. + init_method(str): + Structure selection method in the first iteration. + Default is "leverage". Allowed options include: "leverage" and + "random". + add_method(str): + Structure selection method in subsequent iterations. + Default is 'leverage'. Allowed options are: 'leverage' + and 'random'. + duplicacy_criteria(str): + The criteria when to consider two structures as the same and + only add one of them into the candidate training set. + Default is "correlations", which means to assert duplication + if two structures have the same correlation vectors. While + "structure" means two structures must be symmetrically equivalent + after being reduced. No other option is allowed. + + .. 
note:: The option "structure" might be significantly slower since + it has to attempt reducing every structure to its primitive cell + before matching. It should be used with caution. + + n_parallel(int): optional + Number of generators to run in parallel. Default is to use + a quarter of cpu count. + keep_ground_states(bool): + Whether always to add new ground states to the training set. + Default to True. """ return { "num_structs_per_iter_init": d.get("num_structs_per_iter_init", 60), @@ -386,53 +397,55 @@ def process_calculation_options(d): Returns: dict: A dict containing calculation options, including the following keys: - apply_strain(3*3 ArrayLike or 1D ArrayLike[float] of 3): - Strain matrix to apply to the structure before relaxation, - in order to break structural symmetry of forces. - Default is [1.03, 1.02, 1.01], which means to - stretch the structure by 3%, 2% and 1% along a, b, and c - directions, respectively. - relax_generator_kwargs(dict): - Additional arguments to pass into an atomate2 - VaspInputGenerator that is used to initialize RelaxMaker. - This is where the pymatgen vaspset arguments should go. - relax_maker_kwargs(dict): - Additional arguments to initialize an atomate2 RelaxMaker. - Not frequently used. - add_tight_relax(bool): - Whether to add a tight relaxation job after a coarse - relaxation. Default to True. - You may want to disable this if your system has - difficulty converging forces or energies. - tight_generator_kwargs(dict): - Additional arguments to pass into an atomate2 VaspInputGenerator - that is used to initialize TightRelaxMaker. - This is where the pymatgen vaspset arguments should go. - tight_maker_kwargs(dict): - Additional arguments to pass into an atomate2 - TightRelaxMaker. A tight relax is performed after - relaxation, if add_tight_relax is True. - Not frequently used. - static_generator_kwargs(dict): - Additional arguments to pass into an atomate2 - VaspInputGenerator that is used to initialize StaticMaker.
- This is where the pymatgen vaspset arguments should go. - static_maker_kwargs(dict): - Additional arguments to pass into an atomate2 - StaticMaker. - Not frequently used. - other_properties(list[(str, str)| str]): optional - Other property names beyond "energy" and "uncorrected_energy" - to be retrieved from taskdoc and recorded into the wrangler, - and the query string to retrieve them, paired in tuples. - If only strings are given, will also query with the given - string. - For the rules in writing the query string, refer to utils.query. - By default, will not record any other property. - Refer to the atomate2 documentation for more information. - Note: the default vasp sets in atomate 2 are not specifically - chosen for specific systems. Using your own vasp set input - settings is highly recommended! + apply_strain(3*3 ArrayLike or 1D ArrayLike[float] of three numbers): + Strain matrix to apply to the structure before relaxation, + in order to break structural symmetry of forces. + Default is [1.03, 1.02, 1.01], which means to + stretch the structure by 3%, 2% and 1% along a, b, and c + directions, respectively. + relax_generator_kwargs(dict): + Additional arguments to pass into an :mod:`atomate2` + class :class:`VaspInputGenerator` that is used to specify + a :class:`RelaxMaker`. + This is where the :mod:`pymatgen` VASP set arguments should go. + relax_maker_kwargs(dict): + Additional arguments to initialize an :mod:`atomate2` + class :class:`RelaxMaker`. **Not frequently used**. + add_tight_relax(bool): + Whether to add a tight relaxation job after a coarse + relaxation. Default to True. + You may want to disable this if your system has + difficulty converging forces or energies. + tight_generator_kwargs(dict): + Additional arguments to pass into an :mod:`atomate2` + class :class:`VaspInputGenerator` + that is used to initialize a :class:`TightRelaxMaker`. + This is where the :mod:`pymatgen` VASP set arguments should go. 
+ tight_maker_kwargs(dict): + Additional arguments to pass into an :mod:`atomate2` + class :class:`TightRelaxMaker`. A tight relax is performed after + relaxation, if add_tight_relax is True. **Not frequently used**. + static_generator_kwargs(dict): + Additional arguments to pass into an :mod:`atomate2` + class :class:`VaspInputGenerator` that is used to initialize + a :class:`StaticMaker`. + This is where the :mod:`pymatgen` VASP set arguments should go. + static_maker_kwargs(dict): + Additional arguments to pass into an :mod:`atomate2` + :class:`StaticMaker`. **Not frequently used**. + other_properties(list[(str, str) | str]): optional + Other property names beyond "energy" and "uncorrected_energy" + to be retrieved from :class:`TaskDoc` and recorded into the wrangler, + and the query string to retrieve them, paired in tuples. + If only strings are given, will also query with the given + string. + For the rules in writing the query string, refer + to :mod:`WFacer.utils.query`. + By default, will not record any other property beyond energy. + Refer to :mod:`atomate2` documentation for more information. + .. note:: The default VASP sets in :mod:`atomate2` are not tailored + to specific systems. Using your own VASP input settings + is highly recommended! """ strain_before_relax = d.get("apply_strain", [1.03, 1.02, 1.01]) strain_before_relax = np.array(strain_before_relax) @@ -469,15 +482,16 @@ def process_decorator_options(d): Returns: dict: A dict containing calculation options, including the following keys: - decorator_types(list(str)): optional - Name of decorators to use for each property. If None, will - choose the first one in all implemented decorators - (see specie_decorators module). - decorator_kwargs(list[dict]): optional - Arguments to pass into each decorator. See the doc of each specific - decorator. - decorator_train_kwargs(list[dict]): optional - Arguments to pass into each decorator when calling decorator.train.
+ decorator_types(list(str)): optional + Name of decorators to use for each property. If None, will + choose the first one in all implemented decorators + (see :mod:`WFacer.specie_decorators`). + decorator_kwargs(list[dict]): optional + Arguments to pass into each decorator. See the documentation + of each specific decorator for its usage. + decorator_train_kwargs(list[dict]): optional + Arguments to pass into each decorator when calling + the function :func:`decorator.train`. """ # Update these pre-processing rules when necessary, # if you have new decorators implemented. @@ -520,20 +534,21 @@ def process_subspace_options(d): Returns: dict: A dict containing fit options, including the following keys: - cutoffs(dict{int: float}): - Cluster cutoff diameters of each type of clusters. If not given, - will guess with nearest neighbor distance in the structure. - Setting your own is highly recommended. - use_ewald(bool): - Whether to use the EwaldTerm as an ExternalTerm in the cluster - space. Only available when the expansion is charge decorated. - Default to True. - ewald_kwargs(dict): - Keyword arguments used to initialize EwaldTerm. - Note: Other external terms than ewald term not supported yet. - from_cutoffs_kwargs(dict): - Other keyword arguments to be used in ClusterSubspace.from_cutoffs, - for example, the cluster basis type. Check smol.cofe for detail. + cutoffs(dict{int: float}): + Cluster cutoff diameters of each type of clusters. If not given, + will guess with nearest neighbor distance in the structure. + Setting your own is highly recommended. + use_ewald(bool): + Whether to use the :class:`EwaldTerm` as an external term in the cluster + space. Only available when the expansion is charge decorated. + Default to True. + ewald_kwargs(dict): + Keyword arguments used to initialize EwaldTerm. + Note: external terms other than the Ewald term are not supported yet.
+ from_cutoffs_kwargs(dict): + Other keyword arguments to be used in + the function :func:`ClusterSubspace.from_cutoffs`. + For example, the cluster basis type. """ return { "cutoffs": d.get("cutoffs", None), @@ -553,39 +568,40 @@ def process_fit_options(d): Returns: dict: A dict containing fit options, including the following keys: - estimator_type(str): - The name of an estimator class in sparce-lm. Default to - 'Lasso'. - use_hierarchy(str): - Whether to use hierarchy in regularization fitting, when - estimator type is mixedL0. Default to True. - center_point_external(bool): optional - Whether to fit the point and external terms with linear regression - first, then fit the residue with regressor. Default to None, which means - when the feature matrix is full rank, will not use centering, otherwise - centers. If set to True, will force centering, but use at your own risk - because this may cause very large CV. If set to False, will never use - centering. - filter_unique_correlations(bool): - If the wrangler have structures with duplicated correlation vectors, - whether to fit with only the one with the lowest energy. - Default to True. - estimator_kwargs(dict): - Other keyword arguments to pass in when constructing an - estimator. See sparselm.models - optimizer_type(str): - The name of optimizer class used to optimize model hyperparameters - over cross validation. Default is None. Supports "grid-search-CV" and - "line-search-CV" optimizers. See sparselm.model_selection. - param_grid(dict|list(tuple)): - Parameters grid to search for estimator hyperparameters. - See sparselm.optimizer. - optimizer_kwargs(dict): - Keyword arguments when constructing GridSearch or LineSearch class. - See sparselm.optimizer. - fit_kwargs(dict): - Keyword arguments when calling GridSearch/LineSearch/Estimator.fit. - See docs of the specific estimator. + estimator_type(str): + The name of an estimator class in :mod:`sparse-lm`. Default to + 'Lasso'.
+ use_hierarchy(str): + Whether to use hierarchy in regularization fitting, when + estimator belongs to :class:`mixedL0`. Default to True. + center_point_external(bool): optional + Whether to fit the point and external terms with linear regression + first, then fit the residue with regressor. Default to None, which means + when the feature matrix is full rank, will not use centering, otherwise + centers. + If set to True, will force centering, but use at your own risk + because this may cause very large CV. If set to False, will never use + centering. + filter_unique_correlations(bool): + If the wrangler has structures with duplicated correlation vectors, + whether to fit with only the one with the lowest energy. + Default to True. + estimator_kwargs(dict): + Other keyword arguments to pass in when constructing an + estimator. See :mod:`sparselm.models`. + optimizer_type(str): + The name of optimizer class used to optimize model hyperparameters + over cross validation. Default is None. Supports :class:`GridSearch` and + :class:`LineSearch` optimizers. See :mod:`sparselm.model_selection`. + param_grid(dict|list(tuple)): + Parameters grid to search for estimator hyperparameters. + See :mod:`sparselm.model_selection`. + optimizer_kwargs(dict): + Keyword arguments when constructing GridSearch or LineSearch class. + See :mod:`sparselm.model_selection`. + fit_kwargs(dict): + Keyword arguments when calling :func:`Estimator.fit`. + Refer to the documentation of the specific estimator. """ return { "estimator_type": d.get("estimator_type", "lasso"), @@ -616,40 +632,43 @@ def process_convergence_options(d): Returns: dict: A dict containing convergence options, including the following keys: - cv_tol(float): optional - Maximum allowed CV value in meV per site (including vacancies). - (not eV per atom because some CE may contain Vacancies.) - Default to None, but better set it manually!
- std_cv_rtol(float): optional - Maximum standard deviation of CV allowed in cross validations, - normalized by mean CV value. - Dimensionless, default to None, which means this standard deviation - of cv will not be checked. - delta_cv_rtol(float): optional - Maximum difference of CV allowed between the last 2 iterations, - divided by the standard deviation of CV in cross validation. - Dimensionless, default to 0.5. - delta_eci_rtol(float): optional - Maximum allowed mangnitude of change in ECIs, measured by: - ||J' - J||_1 | / || J' ||_1. (L1-norms) - Dimensionless. If not given, will not check ECI values for - convergence, because this may significantly increase the - number of iterations. - delta_min_e_rtol(float): optional - Maximum difference allowed to the predicted minimum CE and DFT energy - at every composition between the last 2 iterations. Dimensionless, - divided by the value of CV. - Default set to 2. - continue_on_finding_new_gs(bool): optional - If true, whenever a new ground-state structure is detected ( - symmetrically distinct), the CE iteration will - continue even if all other criterion are satisfied. - Default to False because this may also increase the - number of iterations. - max_iter(int): optional - Maximum number of iterations allowed. Will not limit number - of iterations if set to None, but setting one limit is still - recommended. Default to 10. + cv_tol(float): optional + Maximum allowed CV value in meV per site (including vacancies). + (not eV per atom because some CE may contain Vacancies.) + Default to None, but better set it manually! + std_cv_rtol(float): optional + Maximum standard deviation of CV allowed in cross validations, + normalized by mean CV value. + Dimensionless, default to None, which means this standard deviation + of cv will not be checked. + delta_cv_rtol(float): optional + Maximum difference of CV allowed between the last 2 iterations, + divided by the standard deviation of CV in cross validation. 
+ Dimensionless, default to 0.5. + delta_eci_rtol(float): optional + Maximum allowed magnitude of change in ECIs, measured by: + + .. math:: + \| J' - J \|_1 / \| J' \|_1 + + Dimensionless. If not given, will not check ECI values for + convergence, because this may significantly increase the + number of iterations required to converge. + delta_min_e_rtol(float): optional + Maximum allowed difference between the predicted minimum CE energy + and the minimum DFT energy at every composition, compared between the + last 2 iterations. Dimensionless, divided by the value of CV. + Default set to 2. + continue_on_finding_new_gs(bool): optional + If true, whenever a new ground-state structure is detected + (symmetrically distinct), the CE iteration will + continue even if all other criteria are satisfied. + Default to False because this may also increase the + number of iterations. + max_iter(int): optional + Maximum number of iterations allowed. Will not limit number + of iterations if set to None, but setting one limit is still + recommended. Default to 10. """ return { "cv_tol": d.get("cv_tol"), @@ -663,18 +682,20 @@ def process_convergence_options(d): def get_initial_ce_coefficients(cluster_subspace): - """Initialize null ce coefficients. + """Initialize a set of null CE coefficients. - Any coefficient, except those for external terms, will be initialized to 0. + Any coefficient, except those for external terms, will be initialized as 0. This guarantees that for ionic systems, structures with lower ewald energy are always selected first. + External term coefficients are all initialized as 1. Args: cluster_subspace(ClusterSubspace): The initial cluster subspace. Returns: - np.ndarray[float]. + 1D np.ndarray of float: + Initialized CE coefficients.
""" return np.array( [0 for _ in range(cluster_subspace.num_corr_functions)] diff --git a/WFacer/sample_generators/mc_generators.py b/WFacer/sample_generators/mc_generators.py index c2e4531..9bb6b6a 100644 --- a/WFacer/sample_generators/mc_generators.py +++ b/WFacer/sample_generators/mc_generators.py @@ -41,23 +41,23 @@ def __init__( duplicacy_criteria="correlations", remove_decorations_before_duplicacy=False, ): - """Initialize McSampleGenerator. + """Initialize. Args: ce(ClusterExpansion): A cluster expansion object to enumerate with. sc_matrix(3*3 ArrayLike): Supercell matrix to solve on. - anneal_temp_series(list[float]): optional + anneal_temp_series(list of float): optional A series of temperatures to use in simulated annealing. - Must be mono-decreasing. - heat_temp_series(list[float]): optional + Must be strictly decreasing. + heat_temp_series(list of float): optional A series of increasing temperatures to sample on. - Must be mono-increasing + Must be strictly increasing num_steps_anneal(int): optional - Number of MC steps to run per annealing temperature step. + The number of MC steps to run per annealing temperature. num_steps_heat(int): optional - Number of MC steps to run per heat temperature step. + The number of MC steps to run per heating temperature. duplicacy_criteria(str): The criteria when to consider two structures as the same and old to add one of them into the candidate training set. @@ -109,9 +109,9 @@ def processor(self): def sublattices(self): """Get sublattices in ensemble. - Note: If you wish to do delicate operations such as sub-lattice - splitting, please do it on self.ensemble. - See docs of smol.moca.ensemble. + .. note:: If you wish to do delicate operations such as sub-lattice + splitting, please do it on self.ensemble. Refer to + :class:`smol.moca.ensemble` for further details. 
""" return self.ensemble.sublattices @@ -149,8 +149,10 @@ def get_ground_state_occupancy(self): """Use simulated annealing to solve the ground state occupancy. Returns: - ground state in encoded occupancy array: - list[int] + list of int: + The ground-state occupancy string obtained through + simulated annealing. + """ if self._gs_occu is None: init_occu = self._get_init_occu() @@ -185,7 +187,7 @@ def get_ground_state_features(self): """Get the feature vector of the ground state. Returns: - list[float]. + list of float. """ gs_occu = self.get_ground_state_occupancy() return ( @@ -202,10 +204,10 @@ def get_unfrozen_sample( """Generate a sample of structures by heating the ground state. Args: - previous_sampled_structures(list[Structure]): optional + previous_sampled_structures(list of Structure): optional Sample structures already calculated in past iterations. - previous_sampled_features(list[arrayLike]): optional + previous_sampled_features(list of ArrayLike): optional Feature vectors of sample structures already calculated in past iterations. num_samples(int): optional @@ -215,7 +217,7 @@ def get_unfrozen_sample( threshold. Default to 100. Return: - list[Structure], list[list[int]], list[list[float]]: + list of Structure, list of lists of int, list of lists of float: New samples structures, NOT including the ground-state, sampled occupancy arrays, and feature vectors of sampled structures. @@ -321,7 +323,7 @@ def get_unfrozen_sample( class CanonicalSampleGenerator(McSampleGenerator): - """Sample generator in canonical ensemble.""" + """Sample generator in canonical ensembles.""" def __init__( self, @@ -342,20 +344,20 @@ def __init__( A cluster expansion object to enumerate with. sc_matrix(3*3 ArrayLike): Supercell matrix to solve on. - counts(1D ArrayLike[int]): + counts(1D ArrayLike of int): Composition in the "counts " format, not normalized by number of primitive cells per super-cell. Refer to - smol.moca.Composition space for explanation. 
- anneal_temp_series(list[float]): optional + :mod:`smol.moca.composition` for explanation. + anneal_temp_series(list of float): optional A series of temperatures to use in simulated annealing. - Must be mono-decreasing. - heat_temp_series(list[float]): optional + Must be strictly decreasing. + heat_temp_series(list of float): optional A series of increasing temperatures to sample on. - Must be mono-increasing + Must be strictly increasing num_steps_anneal(int): optional - Number of steps to run per simulated annealing temperature. + The number of steps to run per simulated annealing temperature. num_steps_heat(int): optional - Number of steps to run per heat temperature. + The number of steps to run per heat temperature. duplicacy_criteria(str): The criteria when to consider two structures as the same and old to add one of them into the candidate training set. @@ -411,7 +413,7 @@ def _get_init_occu(self): # Grand-canonical generator will not be used very often. class SemigrandSampleGenerator(McSampleGenerator): - """Sample generator in canonical ensemble.""" + """Sample generator in semi-grand canonical ensembles.""" def __init__( self, @@ -434,17 +436,17 @@ def __init__( Supercell matrix to solve on. chemical_potentials(dict): Chemical potentials of each species. See documentation - of smol.moca Ensemble. - anneal_temp_series(list[float]): optional + of :mod:`smol.moca.ensemble`. + anneal_temp_series(list of float): optional A series of temperatures to use in simulated annealing. - Must be mono-decreasing. - heat_temp_series(list[float]): optional + Must be strictly decreasing. + heat_temp_series(list of float): optional A series of increasing temperatures to sample on. - Must be mono-increasing + Must be strictly increasing. num_steps_anneal(int): optional - Number of steps to run per simulated annealing temperature. + The number of steps to run per simulated annealing temperature. num_steps_heat(int): optional - Number of steps to run per heat temperature. 
+ The number of steps to run per heat temperature. duplicacy_criteria(str): The criteria when to consider two structures as the same and old to add one of them into the candidate training set. @@ -525,11 +527,11 @@ def _get_init_occu(self): def mcgenerator_factory(mcgenerator_name, *args, **kwargs): - """Create a MCHandler with given name. + """Create a McSampleGenerator with its subclass name. Args: mcgenerator_name(str): - Name of a McSampleGenerator sub-class. + The name of a subclass of :class:`McSampleGenerator`. *args, **kwargs: Arguments used to initialize the class. """ diff --git a/WFacer/schema.py b/WFacer/schema.py index 8b8f940..76bfa8a 100644 --- a/WFacer/schema.py +++ b/WFacer/schema.py @@ -1,4 +1,4 @@ -"""Defines the data schema for WFacer jobs.""" +"""Defines the standard output schema for automated cluster expansion workflows.""" from typing import Any, Dict, List, Union from pydantic import BaseModel, Field @@ -11,7 +11,7 @@ class CeOutputsDocument(BaseModel): - """Summary of cluster expansion workflow as outputs.""" + """Summary document of cluster expansion outputs.""" project_name: str = Field( "ace-work", description="The name of cluster expansion" " project." @@ -87,7 +87,7 @@ class CeOutputsDocument(BaseModel): # This is to make feature matrix validated correctly. class Config: - """Setting configuration for schema.""" + """Setting configurations for the schema.""" arbitrary_types_allowed = True @@ -140,7 +140,7 @@ def last_iter_id(self): @property def converged(self): - """Check convergence based on given output doc. + """Check convergence based on the current outputs. Returns: bool. diff --git a/WFacer/specie_decorators/base.py b/WFacer/specie_decorators/base.py index 79fa837..b7e669a 100644 --- a/WFacer/specie_decorators/base.py +++ b/WFacer/specie_decorators/base.py @@ -1,10 +1,17 @@ """Decorate properties to a structure composed of Element. 
+This module offers generic classes and functions for defining an algorithm +used to map VASP calculated site properties into the label of species. For +example, :class:`BaseDecorator`, :class:`MixtureGaussianDecorator`, +:class:`GpOptimizedDecorator` and :class:`NoTrainDecorator`. These abstract +classes are meant to be inherited by any decorator class that maps specific +site properties. + Currently, we can only decorate charge. Plan to allow decorating spin in the future updates. -#Note: all entries should be re-decorated and all decorators should be -be-retrained after each iteration. +.. note:: All entries should be re-decorated and all decorators + should be retrained after an iteration. """ __author__ = "Fengyu Xie, Julia H. Yang" @@ -58,14 +65,15 @@ def explore_key_path(path, d): class BaseDecorator(MSONable, metaclass=ABCMeta): """Abstract decorator class. - 1, Each decorator should only be used to decorate one property. - 2, Currently, only supports assigning labels from one scalar site property, - and requires that the site property can be accessed from ComputedStructureEntry, - which should be sufficient for most purposes. - 3, Can not decorate entries with partial disorder. + #. Each decorator should only be used to decorate one property. + #. Currently, only supports assigning labels from one scalar site property, + and requires that the site property can be accessed from + :class:`ComputedStructureEntry`, which should be sufficient for most + purposes. + #. Can not decorate entries with partial disorder. """ - # Edit this as you implement new child classes. + # Edit these if you implement new child classes. decorated_prop_name = None required_prop_names = None @@ -73,7 +81,7 @@ def __init__(self, labels=None, **kwargs): """Initialize. Args: - labels(dict{str|Species:list}): optional + labels(dict of str or Species to list): optional A table of labels to decorate each element with. 
keys are species symbol, values are possible decorated property values, such as oxidation states, magnetic spin directions. @@ -101,12 +109,12 @@ def group_site_by_species(entries): """Group required properties on sites by species. Args: - entries(List[ComputedStructureEntry]): + entries(list of ComputedStructureEntry): Entries of computed structures. Return: - (Entry index, site index) occupied by each species: - defaultdict + defaultdict: + (Entry index, site index) belonging to each species. """ groups_by_species = defaultdict(lambda: []) @@ -128,7 +136,8 @@ def is_trained(self): If trained, will be blocked from training again. Returns: - bool. + bool: + Whether the model has been trained. """ return @@ -140,12 +149,12 @@ def train(self, entries, reset=False): object. Args: - entries(List[ComputedStructureEntry]): + entries(list of ComputedStructureEntry): Entries of computed structures. - reset(Boolean): optional + reset(bool): optional If you want to re-train the decorator model, set this value - to true. Otherwise, we will skip training if the model is - trained before. Default to false. + to true. Otherwise, will skip training if the model is + trained. Default to false. """ return @@ -154,15 +163,16 @@ def decorate(self, entries): """Give decoration to entries based on trained model. If an assigned entry is not valid, - for example, in charge assignment, if an assigned structure is not - charge neutral, then this entry will be returned as None. + for example, in charge assignment, if a decorated structure is not + charge neutral, this entry will be returned as None. Args: - entries(List[ComputedStructureEntry]): - Entries of computed structures. + entries(list of ComputedStructureEntry): + Entries of computed, undecorated structures. Returns: - List[NoneType|ComputedStructureEntry] + list of NoneType or ComputedStructureEntry: + Entries with decorated structures or failed structures. 
""" return @@ -271,8 +281,8 @@ def _filter(self, entries): """Filter out entries by some criteria. Must be implemented for every decorator class. - For entries that does not satisfy criteria, will - be replaced with None. + The entries that fail to satisfy the specific criteria + defined here will be returned as None. """ return entries @@ -296,9 +306,9 @@ def from_dict(cls, d): class MixtureGaussianDecorator(BaseDecorator, metaclass=ABCMeta): """Mixture of Gaussians (MoGs) decorator class. - Uses mixture of gaussian to label each species. + Uses mixture of Gaussians method to label each species. - Note: not tested yet. + .. note:: No test has been added for this specific class yet. """ decorated_prop_name = None @@ -318,7 +328,7 @@ def __init__(self, labels, gaussian_models=None, **kwargs): """Initialize. Args: - labels(dict{str:list}): optional + labels(dict of str to list): optional A table of labels to decorate each element with. keys are species symbol, values are possible decorated property values, such as oxidation states, magnetic spin directions. @@ -337,8 +347,8 @@ def __init__(self, labels, gaussian_models=None, **kwargs): GuessChargeDecorator. Be sure to provide labels for all the species you wish to assign a property to, otherwise, you are the cause of your own error! - gaussian_models(dict{str|Element|Species:GaussianMixture}): - Gaussian models corresponding to each key in labels. + gaussian_models(dict of str or Element or Species to GaussianMixture): + Gaussian models corresponding to each key in argument **labels**. """ super().__init__(labels, **kwargs) if gaussian_models is None: @@ -386,7 +396,8 @@ def is_trained(self): """Determine whether the decorator has been trained. Returns: - bool. + bool: + Whether the model has been trained. """ return all([self.is_trained_gaussian_model(m) for m in self._gms.values()]) @@ -397,9 +408,9 @@ def train(self, entries, reset=False): object. 
Args: - entries(List[ComputedStructureEntry]): + entries(list of ComputedStructureEntry): Entries of computed structures. - reset(Boolean): optional + reset(bool): optional If you want to re-train the decorator model, set this value to true. Otherwise, we will skip training if the model is trained before. Default to false. @@ -428,13 +439,12 @@ def decorate(self, entries): charge neutral, then this entry will be returned as None. Args: - entries(List[ComputedStructureEntry]): - Entries of computed structures. + entries(list of ComputedStructureEntry): + Entries of computed, undecorated structures. Returns: - Entries with structures decorated. Returns None if decoration - failed (not charge balanced, etc.) - List[NoneType|ComputedStructureEntry] + List of NoneType or ComputedStructureEntry: + Entries with decorated structures or failed structures. """ if not self.is_trained: raise ValueError("Can not make predictions from un-trained" " models!") @@ -489,9 +499,11 @@ def from_dict(cls, d): class GpOptimizedDecorator(BaseDecorator, metaclass=ABCMeta): """Gaussian process decorator class. - Uses Gaussian optimization process described by J. Yang - et.al. Can only handle decoration from a single scalar - property up to now. + Uses Gaussian optimization process described by `J. H. Yang + et al. `_ + + Up to now, this class can only take as input a single scalar + property per site. """ # Edit this as you implement new child classes. @@ -502,7 +514,7 @@ def __init__(self, labels, cuts=None, **kwargs): """Initialize. Args: - labels(dict{str:list}): optional + labels(dict of str to list): optional A table of labels to decorate each element with. keys are species symbol, values are possible decorated property values, such as oxidation states, magnetic spin directions. @@ -510,7 +522,7 @@ def __init__(self, labels, cuts=None, **kwargs): required property is increasing. 
For example, in Mn(2, 3, 4)+ all high spin, the magnetic moments is sorted as [Mn4+, Mn3+, Mn2+], thus you should provide labels as {Element("Mn"):[4, 3, 2]}. - Keys can be either Element|Species object, or their + Keys can be either Element and Species object, or their string representations. Currently, do not support decoration of Vacancy. If you have multiple required properties, or required properties @@ -521,7 +533,7 @@ def __init__(self, labels, cuts=None, **kwargs): GuessChargeDecorator. Be sure to provide labels for all the species you wish to assign a property to, otherwise, you are the cause of your own error! - cuts(dict{str|Species: list}): optional + cuts(dict of str or Species over list): optional Cuts to divide required property value into sectors, so as to decide the label they belong to. Keys are the same as argument "labels". @@ -532,8 +544,9 @@ def __init__(self, labels, cuts=None, **kwargs): < 1.0 will be assigned label 3, and atoms with magnetic moment >= 1.0 will be assigned label 2. If provided: - 1, Must be monotonically ascending, - 2, Must be len(labels[key]) = len(cuts[key]) + 1 for any key. + + #. Cut values must be monotonically increasing, + #. Must satisfy len(labels[key]) = len(cuts[key]) + 1 for any key. """ super().__init__(labels, **kwargs) if cuts is not None: @@ -555,7 +568,8 @@ def is_trained(self): If trained, will be blocked from training again. Returns: - bool. + bool: + Whether the model is trained. """ return self._cuts is not None @@ -666,14 +680,15 @@ def train(self, entries, reset=False, n_calls=50): optimize some objective function with gaussian process. Args: - entries(List[ComputedStructureEntry]): + entries(list of ComputedStructureEntry): Entries of computed structures. - reset(Boolean): optional + reset(bool): optional If you want to re-train the decorator model, set this value - to true. Otherwise, we will skip training if the model is - trained before. Default to false. + to true. 
Otherwise, training will be skipped if the model is
+                trained. Default to false.
             n_calls(int): optional
-                Number of iterations used in gp_minimize. Default is 50.
+                The number of iterations to be used by :func:`gp_minimize`.
+                Default is 50.
         """
         if self.is_trained and not reset:
             return
@@ -713,15 +728,16 @@ def decorate(self, entries):
         """Give decoration to entries based on trained model.
 
         If an assigned entry is not valid,
-        for example, in charge assignment, if an assigned structure is not
-        charge neutral, then this entry will be returned as None.
+        for example, in charge assignment, if a decorated structure is not
+        charge neutral, then its corresponding entry will be returned as None.
 
         Args:
-            entries(List[ComputedStructureEntry]):
-                Entries of computed structures.
+            entries(list of ComputedStructureEntry):
+                Entries of computed, undecorated structures.
 
         Returns:
-            List[NoneType|ComputedStructureEntry]
+            list of NoneType or ComputedStructureEntry:
+                Entries with decorated structures or failed structures.
         """
         decoration_rules = self._decoration_rules_from_cuts(entries, self._cuts)
         entries_decorated = self._process(entries, decoration_rules)
@@ -749,7 +765,7 @@ def __init__(self, labels, **kwargs):
         """Initialize.
 
         Args:
-            labels(dict{str|Species:list}): optional
+            labels(dict of str or Species to list): optional
                 A table of labels to decorate each element with.
                 keys are species symbol, values are possible decorated property
                 values, such as oxidation states, magnetic spin directions.
@@ -757,7 +773,7 @@ def __init__(self, labels, **kwargs):
                 required property is increasing.
                 For example, in Mn(2, 3, 4)+ (high spin), the magnetic moments
                 is sorted as [Mn4+, Mn3+, Mn2+], thus you should provide labels
                 as {Element("Mn"):[4, 3, 2]}.
-                Keys can be either Element|Species object, or their
+                Keys can be either an Element or Species object, or their
                 string representations. Currently, do not support decoration
                 of Vacancy.
If you have multiple required properties, or required properties @@ -786,13 +802,17 @@ def train(self, entries=None, reset=False): def decorator_factory(decorator_type, *args, **kwargs): - """Create a species decorator with given name. + """Create a BaseDecorator with its subclass name. Args: decorator_type(str): - Name of a BaseDecorator subclass. - args, kwargs: + The name of a subclass of :class:`BaseDecorator`. + *args, **kwargs: Arguments used to initialize the class. + + Returns: + BaseDecorator: + The initialized decorator. """ if "decorator" not in decorator_type and "Decorator" not in decorator_type: decorator_type += "-decorator" @@ -801,15 +821,15 @@ def decorator_factory(decorator_type, *args, **kwargs): def get_site_property_query_names_from_decorator(decname): - """Get the name of required properties from decorator name. + """Get the required properties from a decorator name. Args: decname(str): Decorator name. Returns: - list[str]: - List of names of required site properties by the + list of str: + The list of names of required site properties by the decorator. """ if "decorator" not in decname and "decorator" not in decname: diff --git a/WFacer/specie_decorators/charge.py b/WFacer/specie_decorators/charge.py index 1150b8f..7a19bdb 100644 --- a/WFacer/specie_decorators/charge.py +++ b/WFacer/specie_decorators/charge.py @@ -11,7 +11,7 @@ class ChargeDecorator(BaseDecorator): - """A type of decorators to assign charge.""" + """Abstract decorators to assign charge.""" decorated_prop_name = "oxi_state" required_prop_names = None @@ -20,7 +20,7 @@ def __init__(self, labels=None, max_allowed_abs_charge=0): """Initialize. Args: - labels(dict{str|Species:list}): optional + labels(dict of str or Species to list): optional A table of labels to decorate each element with. keys are species symbol, values are possible decorated property values, such as oxidation states, magnetic spin directions. 
@@ -28,7 +28,7 @@ def __init__(self, labels=None, max_allowed_abs_charge=0):
                 required property is increasing.
                 For example, in Mn(2, 3, 4)+ (high spin), the magnetic moments
                 is sorted as [Mn4+, Mn3+, Mn2+], thus you should provide labels
                 as {Element("Mn"):[4, 3, 2]}.
-                Keys can be either Element|Species object, or their
+                Keys can be either an Element or Species object, or their
                 string representations. Currently, do not support decoration
                 of Vacancy.
                 If you have multiple required properties, or required properties
@@ -41,8 +41,8 @@ def __init__(self, labels=None, max_allowed_abs_charge=0):
                 a property to, otherwise, you are responsible for your own error!
             max_allowed_abs_charge(float): optional
                 Maximum allowed absolute value of charge in a decorated structure
-                entry. If abs(structure.charge) exceeds this value, the entry
-                will be filtered and returned as a NoneType.
+                entry. If the absolute value of structure charge exceeds this value,
+                the entry will be filtered and returned as a NoneType.
                 Default to 0, which means we require absolute charge balance.
         """
         super().__init__(labels=labels)
@@ -72,11 +72,12 @@ def from_dict(cls, d):
 
 
 class PmgGuessChargeDecorator(ChargeDecorator, NoTrainDecorator):
-    """Assign charges from pymatgen auto guesses.
+    """Assign charges from :mod:`pymatgen` automatic guesses.
+
+    .. note:: This class does not need labels.
 
-    Notice: This class does not need labels at all.
-    Warning: This Decorator should not be used with
-    structures that include multi-valent elements!
+    .. warning:: This Decorator should never be used on
+        structures with multi-valent elements!
     """
 
     decorated_prop_name = "oxi_state"
@@ -98,16 +99,13 @@ def train(self, entries=None, reset=False):
     def decorate(self, entries):
         """Decorate entries by guessed charges.
 
-        Warning: Do not use this with multi-valent
-        elements, unless you know what you want
-        clearly!!!
-
         Args:
-            entries(List[ComputedStructureEntry]):
-                Entries of computed structures.
+            entries(list of ComputedStructureEntry):
+                The entries of computed structures.
 
         Returns:
-            List[NoneType|ComputedStructureEntry]
+            list of NoneType or ComputedStructureEntry:
+                Entries with decorated structures or failed structures.
         """
         entries_decor = []
         for entry in entries:
@@ -132,8 +130,8 @@ def decorate(self, entries):
 class FixedChargeDecorator(ChargeDecorator, NoTrainDecorator):
     """Assign fixed charge to each element from setting.
 
-    Warning: This Decorator should not be used with
-    structures that include multi-valent elements!
+    .. warning:: This Decorator should never be used on
+        structures with multi-valent elements!
     """
 
     decorated_prop_name = "oxi_state"
@@ -151,7 +149,8 @@ def decorate(self, entries):
                 Entries of computed structures.
 
         Returns:
-            List[NoneType|ComputedStructureEntry]
+            list of NoneType or ComputedStructureEntry:
+                Entries with decorated structures or failed structures.
         """
         entries_decor = []
         for entry in entries:
@@ -177,7 +176,7 @@ def decorate(self, entries):
 class MagneticChargeDecorator(GpOptimizedDecorator, ChargeDecorator):
     """Assign charges from magnitudes of total magnetic moments on sites.
 
-    Is a sub-class of GPOptimizedDecorator.
+    Uses a Gaussian process to optimize charge assignment.
     """
 
     decorated_prop_name = "oxi_state"
@@ -191,7 +190,7 @@ def __init__(self, labels, cuts=None, max_allowed_abs_charge=0):
         """Initialize.
 
         Args:
-            labels(dict{str: List[int|float]...}):
+            labels(dict of str to list of int):
                 A table of species as key, and charges to decorate to the
                 species in the key. Values of a key should be sorted as
                 the decorated species should have increasing magnetic
@@ -206,14 +205,14 @@ def __init__(self, labels, cuts=None, max_allowed_abs_charge=0):
                 GuessChargeDecorator. Be sure to provide labels for all the species
                 you wish to assign a property to, otherwise, you are the cause of
                 your own error!
-            cuts(dict{str: List[int|float]...}): optional
+            cuts(dict of str to list of int or float): optional
                 A table of species and cutting points of the magnetic
                 moments, so that a magnetic moment is compared with
                 each of these cutting values, and decided which charge
                 label it should be assigned with.
             max_allowed_abs_charge(float): optional
                 Maximum allowed absolute value of charge in a decorated structure
-                entry. If abs(structure.charge) exceeds this value, the entry
-                will be filtered and returned as a NoneType.
+                entry. If the absolute value of structure charge exceeds this value,
+                the entry will be filtered and returned as a NoneType.
                 Default to 0, which means we require absolute charge balance.
         """
         super().__init__(
diff --git a/WFacer/utils/convex_hull.py b/WFacer/utils/convex_hull.py
index 6637922..faea2e6 100644
--- a/WFacer/utils/convex_hull.py
+++ b/WFacer/utils/convex_hull.py
@@ -1,9 +1,4 @@
-"""Utilities related to min energies and convex hull.
-
-Notice: when generating and adding training structures, distinguish
-element oxidation states. But when generating hulls for comparing
-convergence, will not distinguish oxidation states.
-"""
+"""Utilities to obtain minimum energies per composition and the convex hull."""
 from collections import defaultdict
 
 import numpy as np
@@ -12,24 +7,25 @@
 
 
 def get_min_energy_structures_by_composition(wrangler, max_iter_id=None):
-    """Get minimum energy and structure at each composition.
+    """Get the minimum energy and its corresponding structure at each composition.
 
     This function provides quick tools to compare minimum DFT energies.
-    Remember this is NOT hull!
-    Sublattice and oxidation state degrees of freedom in compositions
-    are not distinguished in generating hull.
+
+    .. note:: Oxidation states are not distinguished when computing minimum energies
+        for determining hull convergence.
 
     Args:
         wrangler(CeDataWrangler):
-            Datawangler object.
+            A :class:`CeDataWrangler` object storing the structure data.
        max_iter_id(int): optional
            Maximum iteration index included in the energy comparison.
            If none given, will read existing maximum iteration number.
 
    Returns:
        defaultdict:
-            element compositions as keys, energy per site and structure
-            as values.
+            Elemental compositions (:class:`Composition` objects accounting for
+            only the amount of each element instead of species) as keys,
+            energy per site and structure as values.
     """
     min_e = defaultdict(lambda: (np.inf, None))
     prim_size = len(wrangler.cluster_subspace.structure)
@@ -52,22 +48,23 @@ def get_min_energy_structures_by_composition(wrangler, max_iter_id=None):
 
 
 def get_hull(wrangler, max_iter_id=None):
-    """Get the energies and compositions on the convex hull.
+    """Get the convex hull of compositions at zero Kelvin.
 
-    Sublattice and oxidation state degrees of freedom in compositions
-    are not distinguished in generating hull.
+    .. note:: Oxidation states are not distinguished when computing hulls
+        for determining hull convergence.
 
     Args:
         wrangler(CeDataWrangler):
-            Datawangler object.
+            A :class:`CeDataWrangler` object storing the structure data.
         max_iter_id(int): optional
            Maximum iteration index included in the energy comparison.
            If none given, will read existing maximum iteration number.
 
    Returns:
        dict:
-            element compositions as keys, energy per site and structure
-            as values.
+            Elemental compositions (:class:`Composition` objects accounting for
+            only the amount of each element instead of species) as keys,
+            energy per site and structure as values.
     """
     if max_iter_id is None:
         max_iter_id = wrangler.max_iter_id
diff --git a/WFacer/utils/duplicacy.py b/WFacer/utils/duplicacy.py
index 7db4021..9869359 100644
--- a/WFacer/utils/duplicacy.py
+++ b/WFacer/utils/duplicacy.py
@@ -8,7 +8,7 @@
 def clean_up_decoration(s):
     """Remove all decoration from a structure.
 
-    Used before comparing two structures before sending to compute.
+    Typically used before comparing two structures.
    Args:
        s(Structure):
@@ -44,7 +44,7 @@ def get_element(p):
 
 
 def is_duplicate(s1, s2, remove_decorations=False, matcher=None):
-    """Check duplication between structures.
+    """Check the duplicacy between structures.
 
     Args:
         s1(Structure):
@@ -52,10 +52,10 @@ def is_duplicate(s1, s2, remove_decorations=False, matcher=None):
         s2(Structure):
             Same as s1.
         remove_decorations(bool): optional
-            Whether or not to remove all decorations from species (i.e,
+            Whether to remove all decorations from species (i.e.,
             charge and other properties). Default to false.
         matcher(StructureMatcher): optional
-            A StructureMatcher to compare two structures. Using the same
+            A :class:`StructureMatcher` to compare two structures. Using the same
             _site_matcher as cluster_subspace is highly recommended.
 
     Returns:
@@ -74,30 +74,30 @@ def is_duplicate(s1, s2, remove_decorations=False, matcher=None):
 
 
 def is_corr_duplicate(s1, proc1, s2=None, proc2=None, features2=None):
-    """Check whether two structures have the same correlations.
+    """Check whether two structures have the same correlation vectors.
 
-    Note: This is to mostly used criteria for checking structure
-    duplicacy, because two structures with the same correlation
-    vector should typically not be included in the training set
-    together! Also, comparing correlation vectors should be much
-    faster that comparing two structures, because comparing two
-    structures might involve reducing them to primitive cells
-    in advance, which can occasionally be very slow.
+    .. note:: This is the most commonly used criterion for structure duplicacy,
+        as two structures with the same correlation vector should in principle
+        not be included in the training set together! Also, comparing
+        correlation vectors can be much faster than comparing two structures
+        with :class:`StructureMatcher`.
 
     Args:
-        s1 (Structure):
+        s1(Structure):
             A structure to be checked.
-        proc1 (CompositeProcessor):
-            A processor established with the super-cell matrix of s1.
- (Must be ClusterExpansionProcessor rather than - ClusterDecompositionProcessor!) - s2 (Structure): optional + proc1(CompositeProcessor): + A processor established on the super-cell matrix of s1. + + .. note:: Must use :class:`ClusterExpansionProcessor` instead of + :class:`ClusterDecompositionProcessor`. + + s2(Structure): optional Same as s1, but if a feature vector is already given, no need to give s2. - proc2 (CompositeProcessor): optional + proc2(CompositeProcessor): optional Same as proc1. But if a feature vector is already given, no need to give. - features2 (1D arrayLike): optional + features2(1D arrayLike): optional The feature vector of s2. If not given, must give both s2 and proc2. """ diff --git a/WFacer/utils/occu.py b/WFacer/utils/occu.py index 72f7ef3..fe7bc11 100644 --- a/WFacer/utils/occu.py +++ b/WFacer/utils/occu.py @@ -1,4 +1,4 @@ -"""Generate random occupancy.""" +"""Random occupancy generation.""" import numpy as np @@ -7,10 +7,10 @@ def get_random_occupancy_from_counts(ensemble, counts): Args: ensemble(Ensemble): - An ensemble object to generate occupancy in. + An :class:`Ensemble` object to generate occupancy in. counts(1D arrayLike): Species composition in "counts" format. - See smol.moca.composition. + See :mod:`smol.moca.composition`. Returns: np.ndarray: diff --git a/WFacer/utils/query.py b/WFacer/utils/query.py index c3e7a74..abec4fa 100644 --- a/WFacer/utils/query.py +++ b/WFacer/utils/query.py @@ -1,4 +1,4 @@ -"""Define rules to query a nested task documents and dictionaries.""" +"""Rules to query nested task documents and dictionaries.""" import random from warnings import warn @@ -9,13 +9,14 @@ def query_keypath(obj, keypath): """Query attributes of an object along a path. Args: - obj(Object|dict): + obj(Object or dict): The object to be queried. - keypath(list[str]): + keypath(list of str): A path of attribute names to query. Returns: - Any: the queried result. + Any: + The result of query. 
""" if not isinstance(keypath, (list, tuple)): raise ValueError("A key path must be a list or tuple!") @@ -66,15 +67,18 @@ def query_keypath(obj, keypath): def query_name_iteratively(obj, name): """Query an attribute from a nested object. + .. note:: The first match encountered at the lowest search level + will always be returned first. + Args: - obj(Object|dict): + obj(Object or dict): The object to be queried. name(str): The attribute name. Returns: - Any: the queried result. Will always return the first one - found at the shallowest reference level. + Any: + The result of query. """ if isinstance(obj, dict): if name in obj: @@ -119,8 +123,9 @@ def get_property_from_object(obj, query_string): Args: obj(Object): - An object to recursively parse property from. - A task document generated as vasp task output by atomate2. + An object to recursively parse property from, typically + a :class:`emmet-core.TaskDoc` generated as vasp task + output by atomate2. query_string(str): A string that defines the rule to query the object. Three special characters are reserved: ".", "-" and "^": @@ -132,23 +137,22 @@ def get_property_from_object(obj, query_string): each level. If a level of reference is a list or tuple: - 1, Using f"{some_ind}-" as the prefix to this level will yield - the corresponding key/attribute of the "some_ind"'th member - in the list. - 2, Using "^" as the prefix to this level will yield the - corresponding key/attribute of all the members in the list and - return them as a list in the original order. - 3, Using f"{some_ind}-" as the prefix to this level will yield - the corresponding key/attribute of the first member - in the list. - Do not use "-" or "^" prefix when the corresponding level is - not a list or tuple. If a corresponding level is a set, a random - element will be yielded. - - For example, "calcs_reversed.0-output.outcar.magnetization.^tot" + #. 
Using f"{some index}-" as the prefix to this level will yield + the corresponding key/attribute of the "some_ind"'th member + in the list. + #. Using "^" as the prefix to this level will yield the + corresponding key/attribute of all the members in the list and + return them as a list in the original order. + #. Using f"{some_ind}-" as the prefix to this level will yield + the corresponding key/attribute of the first member in the list. + #. Do not use "-" or "^" prefix when the corresponding level is + not a list or tuple. If a corresponding level is a set, a random + element will be yielded. + + For example, *"calcs_reversed.0-output.outcar.magnetization.^tot"* will give you the total magnetization on each site of the structure - in the final ionic step, if the input object is a valid atomate2 - TaskDoc. + in the final ionic step, if the input object is a valid emmet-core + :class:`TaskDoc`. If a string with no special character is given, we will iteratively search through each level of attributes and dict keys until the @@ -159,7 +163,8 @@ def get_property_from_object(obj, query_string): have specified the exact full path to retrieve the desired item. Returns: - any: value of the queried property. + Any: + The result of query. """ # Add more special conversion rules if needed. query = query_string.split(".") diff --git a/WFacer/utils/selection.py b/WFacer/utils/selection.py index a9b4195..f35f324 100644 --- a/WFacer/utils/selection.py +++ b/WFacer/utils/selection.py @@ -1,4 +1,4 @@ -"""Provide structure selection methods.""" +"""Structure selection methods from feature matrices.""" from warnings import warn import numpy as np @@ -23,12 +23,14 @@ def select_initial_rows( num_external_terms(int): optional Number of external terms in cluster subspace. These terms should not be compared in a structure selection. - keep_indices(list[int]): optional + keep_indices(list of int): optional Indices of structures that must be selected. 
Usually those of important ground state structures. Returns: - List[int]: indices of selected structures. + list of int: + Indices of selected rows in the feature matrix, + corresponding to the selected structures. """ # Leave out external terms. a = np.array(femat)[:, : len(femat[0]) - num_external_terms] @@ -95,8 +97,9 @@ def select_added_rows( We select structures by minimizing the leverage score under a certain domain matrix, or fully at random. - Refer to: Phys. Rev. B 82, 184107 (2010). - Inputs: + Refer to `T. Mueller et al. `_ + + Args: femat(2D arraylike): Correlation vectors of new structures. old_femat(2D arraylike): @@ -107,7 +110,7 @@ def select_added_rows( The method used to select structures. Default is by maximizing leverage score reduction ("leverage"). "random" is also supported. - keep_indices(List[int]): optional + keep_indices(list of int): optional Indices of structures that must be selected. Usually those of important ground state structures. num_external_terms(int): optional @@ -117,8 +120,10 @@ def select_added_rows( The domain matrix used to compute leverage score. By default, we use an identity matrix. - Outputs: - List of ints. Indices of selected rows in femat. + Returns: + list of int: + Indices of selected rows in the feature matrix, + corresponding to the selected structures. """ # Leave out external terms. a = np.array(femat)[:, : len(femat[0]) - num_external_terms] diff --git a/WFacer/utils/sparselm_estimators.py b/WFacer/utils/sparselm_estimators.py index 7718d95..f10468d 100644 --- a/WFacer/utils/sparselm_estimators.py +++ b/WFacer/utils/sparselm_estimators.py @@ -1,4 +1,4 @@ -"""Utility functions to manage sparselm estimators.""" +"""Utility functions to prepare sparse-lm estimators.""" from warnings import warn import numpy as np @@ -12,16 +12,18 @@ def is_subclass(classname, parent_classname): - """Check whether an estimator is a subclass of some parent. + """Check whether the estimator is a subclass of some parent. 
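Review note: the leverage-score criterion cited for `select_added_rows` can be sketched as below. The novelty measure `x^T (A^T A)^+ x` is an assumption standing in for the actual objective (which maximizes leverage-score reduction under a domain matrix); the helper name is hypothetical:

```python
import numpy as np

def pick_most_novel_row(old_femat, cand_femat):
    """Pick the candidate row with the largest leverage w.r.t. existing rows.

    score(x) = x^T (A^T A)^+ x is a leverage-style measure: a larger score
    means the candidate is less well represented by the current matrix A,
    so adding it contributes the most new information.
    """
    A = np.asarray(old_femat, dtype=float)
    G_inv = np.linalg.pinv(A.T @ A)  # pseudo-inverse of the Gram matrix
    scores = [x @ G_inv @ x for x in np.asarray(cand_femat, dtype=float)]
    return int(np.argmax(scores))
```

With `old_femat` rows clustered along one direction, a candidate pointing in an unexplored direction wins the selection.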
Args: classname(str): - Name of the sparselm estimator class. + Name of the :mod:`sparse-lm` estimator class. parent_classname(str): - Name of the parent class. Also in sparselm.model. + Name of the parent class. Also in :mod:`sparselm.model`. Returns: - bool. + bool: + Whether the given class is a subclass of + another given class. """ cls = getattr(sparselm.model, classname) if hasattr(sparselm.model, parent_classname): @@ -58,16 +60,19 @@ def is_subclass(classname, parent_classname): # smol 0.3.1 cannot correctly identify subclasses in sparse-lm. # Temporarily writing as import __all__. def estimator_factory(estimator_name, **kwargs): - """Get an estimator object from class name. + """Get an estimator object from its class name. Args: estimator_name (str): - Name of the estimator. - kwargs: + The name of the estimator. + **kwargs: Other keyword arguments to initialize an estimator. Depends on the specific class + Returns: - Estimator + Estimator: + Packed estimator or stepwise estimator to be used + directly for fitting. """ class_name = class_name_from_str(estimator_name) @@ -80,21 +85,23 @@ def estimator_factory(estimator_name, **kwargs): def optimizer_factory(optimizer_name, estimator, param_grid=None, **kwargs): - """Get an optimizer object from class name. + """Get an optimizer object from its class name. Args: - optimizer_name (str): + optimizer_name(str): Name of the optimizer. estimator(CVXRegressor): An estimator used to initialize the optimizer. - param_grid(dict|list[tuple]): + param_grid(dict or list of tuple): Parameters grid used to initialize the optimizer. Format - depends on the type of optimizer. See sparselm.model_selection. - kwargs: + depends on the type of optimizer. See :mod:`sparselm.model_selection`. + **kwargs: Other keyword arguments to initialize an optimizer. - Depends on the specific class + Depends on the specific class used. + Returns: - GridSearchCV or LineSearchCV. 
+ GridSearchCV or LineSearchCV: + An initialized model selection object. """ all_optimizers = {"GridSearchCV": GridSearchCV, "LineSearchCV": LineSearchCV} if ( @@ -121,25 +128,26 @@ def prepare_estimator( estimator_kwargs=None, optimizer_kwargs=None, ): - """Prepare an estimator for the direct call of fit. + """Prepare an estimator for fitting. - No weights will be used. + .. note:: Sample weights are not supported yet. Args: cluster_subspace(ClusterSubspace): - A cluster subspace to expand with. + A :class:`ClusterSubspace` to expand with. estimator_name(str): The name of estimator, following the rules in - smol.utils.class_name_from_str. + :mod:`smol.utils`. optimizer_name(str): - Name of hyperparameter optimizer. Currently, only supports GridSearch and - LineSearch. + The name of the model optimizer. + Currently, only supports :class:`GridSearch` and :class:`LineSearch`. param_grid(dict|list[tuple]): - Parameter grid to initialize the optimizer. See docs of - sparselm.model_selection. Not needed for OrdinaryLeastSquares. + Parameter grid to initialize the optimizer. See docs of the + :mod:`sparselm.model_selection` module. + **Not needed when using** :class:`OrdinaryLeastSquares`. use_hierarchy(bool): optional - Whether to use cluster hierarchy constraints when available. Default to - true. + Whether to use the cluster hierarchy constraints when available. + Default to true. center_point_external(bool): optional Whether to fit the point and external terms with linear regression first, then fit the residue with regressor. Default to true, @@ -151,8 +159,8 @@ def prepare_estimator( Other keyword arguments to initialize an optimizer. Returns: - GridSearchCV/LineSearchCV, StepwiseEstimator, - or OrdinaryLeastSquares. + Estimator: + The estimator wrapped up for fitting. """ # Corrected and normalized DFT energy in eV/prim. 
point_func_inds = cluster_subspace.function_inds_by_size[1] diff --git a/WFacer/utils/supercells.py b/WFacer/utils/supercells.py index b0f0124..6366d8d 100644 --- a/WFacer/utils/supercells.py +++ b/WFacer/utils/supercells.py @@ -10,18 +10,19 @@ def get_three_factors(n): - """Enumerate all 3 factor decompositions of an integer. + """Enumerate all three-factor decompositions of an integer. + + .. note:: Do not use this to factorize a large integer with many + allowed factors! - Note: - Do not use this to factorize an integer with many - possible factors. Args: n(int): The integer to factorize. Returns: - All 3 factor decompositions: - List[tuple(int)] + list of tuples of int: + All three-factor decompositions of the input integer. + """ def enumerate_three_summations(c): @@ -59,22 +60,25 @@ def is_proper_sc(sc_matrix, lat, max_cond=8, min_angle=30): """Assess the quality of a given supercell matrix. If too skewed or too slender, this matrix will be dropped - because it does not fit for DFT calculation. + because it typically causes poor DFT convergence. + Args: sc_matrix(3 * 3 ArrayLike): Supercell matrix lat(Lattice): Lattice of the primitive cell max_cond(float): optional Maximum condition number allowed in the supercell lattice matrix. This is to avoid an overly large imbalance in the lengths of the three lattice vectors. By default, set to 8. min_angle(float): optional - Minimum allowed angle of the supercell lattice. By default, set - to 30, to prevent over-skewing. + Minimum allowed angle of the supercell lattice. + By default, set to 30 degrees to prevent over-skewing. Returns: - Boolean. + bool: + Whether the super-cell matrix is proper to be used in structure + enumeration. """ new_mat = np.dot(sc_matrix, lat.matrix) new_lat = Lattice(new_mat) @@ -91,18 +95,19 @@ def is_proper_sc(sc_matrix, lat, max_cond=8, min_angle=30): def is_duplicate_sc(m1, m2, prim): - """Give whether two super-cell matrices give identical super-cell.
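Review note: the `is_proper_sc` checks described above (condition number of the lattice matrix, minimum angle between lattice vectors) can be sketched with plain NumPy. Thresholds follow the documented defaults; the helper name is hypothetical and this is not WFacer's implementation:

```python
import numpy as np

def is_reasonable_cell(matrix, max_cond=8.0, min_angle=30.0):
    """Reject skewed or slender supercells.

    Bounds the condition number of the 3x3 lattice matrix (rows are
    lattice vectors) and the minimum angle, in degrees, between any
    pair of lattice vectors.
    """
    matrix = np.asarray(matrix, dtype=float)
    if np.linalg.cond(matrix) > max_cond:
        return False  # vector lengths too imbalanced / cell too distorted
    a, b, c = matrix
    for u, v in ((a, b), (b, c), (a, c)):
        cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        if angle < min_angle:
            return False  # cell too skewed
    return True
```

A cubic cell passes; a 1:1:20 slender cell or a nearly collinear pair of vectors is rejected.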
+ """Check whether two super-cell matrices are symmetrically identical. Args: - m1(3*3 ArrayLike[int]): + m1(3*3 ArrayLike of int): Supercell matrices to compare. - m2(3*3 ArrayLike[int]): + m2(3*3 ArrayLike of int): Supercell matrices to compare. - prim(pymatgen.Structure): + prim(Structure): Primitive cell object. Returns: - bool. + bool: + Whether the two super-cell matrices are symmetrically equivalent. """ s1 = prim.copy() s2 = prim.copy() diff --git a/WFacer/utils/task_document.py b/WFacer/utils/task_document.py index 5a55010..478b937 100644 --- a/WFacer/utils/task_document.py +++ b/WFacer/utils/task_document.py @@ -11,16 +11,18 @@ def _merge_computed_structure_entry(entry, structure): - """Merge structure into ComputedEntry. + """Merge a structure into :class:`ComputedEntry`. Args: entry(ComputedEntry): - A computed Entry given by taskdoc. + A computed Entry extracted from a :class:`TaskDoc`. structure(Structure): - A structure given by taskdoc. + A structure from the same :class:`TaskDoc`. Return: - ComputedStuctureEntry. + ComputedStuctureEntry: + A :class:`ComputedStructureEntry` created from + class :class:`ComputedEntry`. """ return ComputedStructureEntry( structure, @@ -35,12 +37,12 @@ def _merge_computed_structure_entry(entry, structure): def get_entry_from_taskdoc(taskdoc, property_and_queries=None, decorator_names=None): - """Get the computed structure entry from taskdoc. + """Get the computed structure entry from :class:`TaskDoc`. Args: taskdoc(TaskDoc): A task document generated as vasp task output by emmet-core. - property_and_queries(list[(str, str)|str]): optional + property_and_queries(list of (str, str) or str): optional A list of property names to be retrieved from taskdoc, and the query string to retrieve them, paired in tuples. 
If only strings are given, will also query with the given @@ -48,7 +50,7 @@ def get_entry_from_taskdoc(taskdoc, property_and_queries=None, decorator_names=N These are properties that you wish to record besides "energy" and "uncorrected_energy", etc. By default, will not record any other property. - decorator_names(list[str]): optional + decorator_names(list of str): optional The name of decorators used in this CE workflow, used to determine what site properties to retrieve from TaskDoc and to include in the returned entry. @@ -57,7 +59,7 @@ ComputedStructureEntry, dict: The computed structure entry, with each site having the site property required by decorator, and the properties - dict for insertion into CeDataWangler. + dict ready to be inserted into a :class:`CeDataWrangler`. diff --git a/WFacer/wrangling.py b/WFacer/wrangling.py index e9c6e9e..d768b8b 100644 --- a/WFacer/wrangling.py +++ b/WFacer/wrangling.py @@ -1,7 +1,8 @@ -"""CeDataWrangler. +"""Defines :class:`CeDataWrangler`. -This file includes a modified version of StructureWrangler, which stores -more information that CE might use. +The class :class:`CeDataWrangler` is modified from +class :class:`smol.cofe.wrangling.StructureWrangler` +to store the iteration information required by :mod:`WFacer`. """ __author__ = "Fengyu Xie" @@ -19,10 +20,10 @@ class CeDataWrangler(StructureWrangler): """CeDataWrangler class. - Interfaces WFacer generated data, does insertion and deletion, - but will not generate any data. + Merely a data storage and validation structure; it will not perform any + operation on the data. - Note: This DataWrangler is not compatible with legacy version of smol. + .. note:: Not compatible with :mod:`smol` < 0.5.0. 
""" def _check_structure_duplicacy(self, entry, sm=None): @@ -45,7 +46,9 @@ def _check_structure_duplicacy(self, entry, sm=None): def max_iter_id(self): """Maximum index of iteration existing. - Iteration counted from 0. + Returns: + int: + Maximum iteration index (counting from 0). """ return ( max(entry.data["properties"]["spec"]["iter_id"] for entry in self.entries) @@ -74,43 +77,46 @@ def add_entry( Usually failures are caused by the StructureMatcher in the given ClusterSubspace failing to map structures to the primitive structure. - Same as StructureWrangler but refuses to insert symmetrically equivalent - entries. It also records the iteration number when then entry was added. + Same as :class:`StructureWrangler` but refuses to insert symmetrically + equivalent entries. It also records the iteration number when then entry + was added. Args: - entry (ComputedStructureEntry): + entry(ComputedStructureEntry): A ComputedStructureEntry with a training structure, energy and properties - properties (dict): + properties(dict): Dictionary with a key describing the property and the target value for the corresponding structure. For example if only a single property {'energy': value} but can also add more than one, i.e. {'total_energy': value1, 'formation_energy': value2}. You are free to make up the keys for each property but make sure you are consistent for all structures that you add. - weights (dict): - the weight given to the structure when doing the fit. The key + weights(dict): + The weight given to the structure when doing the fit. The key must match at least one of the given properties. - supercell_matrix (ndarray): optional - if the corresponding structure has already been matched to the + supercell_matrix(ndarray): optional + If the corresponding structure has already been matched to the ClusterSubspace prim structure, passing the supercell_matrix will use that instead of trying to re-match. 
If using this, the user is responsible for having the correct supercell_matrix. Here you are the cause of your own bugs. - site_mapping (list): optional - site mapping as obtained by StructureMatcher.get_mapping - such that the elements of site_mapping represent the indices + site_mapping(list): optional + Site mapping as obtained by + function :func:`StructureMatcher.get_mapping` such that the + elements of site_mapping represent the indices of the matching sites to the prim structure. If you pass this option, you are fully responsible that the mappings are correct! check_struct_duplicacy(bool): optional - if true, will check structure duplicacy, and skip an entry if it + If true, will check structure duplicacy, and skip an entry if it is symmetrically equivalent to an existing one. Default to true. - verbose (bool): optional - if True, will raise warning regarding structures that fail in - StructureMatcher, and structures that have duplicate corr vectors. - raise_failed (bool): optional - if True, will raise the thrown error when adding a structure - that fails. This can be helpful to keep a list of structures that + verbose(bool): optional + If True, will raise warning regarding structures that fail in + :class:`StructureMatcher`, and structures that have duplicated + correlation vectors. + raise_failed(bool): optional + If True, will raise the thrown error when adding a structure + that fails. This can be helpful to keep a list of structures that fail for further inspection. """ # Add property "spec" to store iter_id and enum_id to record in which iteration diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 0000000..d0c3cbf --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. 
+SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/make.bat b/docs/make.bat new file mode 100644 index 0000000..dc1312a --- /dev/null +++ b/docs/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=source +set BUILDDIR=build + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.https://www.sphinx-doc.org/ + exit /b 1 +) + +if "%1" == "" goto help + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/docs/src/api_reference/WFacer/convergence.rst b/docs/src/api_reference/WFacer/convergence.rst new file mode 100644 index 0000000..fd5e922 --- /dev/null +++ b/docs/src/api_reference/WFacer/convergence.rst @@ -0,0 +1,8 @@ +==================================================== +Cluster expansion convergence check +==================================================== + +.. 
automodule:: WFacer.convergence + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/WFacer/enumeration.rst b/docs/src/api_reference/WFacer/enumeration.rst new file mode 100644 index 0000000..04da84f --- /dev/null +++ b/docs/src/api_reference/WFacer/enumeration.rst @@ -0,0 +1,8 @@ +==================================================== +Structure enumeration +==================================================== + +.. automodule:: WFacer.enumeration + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/WFacer/fit.rst b/docs/src/api_reference/WFacer/fit.rst new file mode 100644 index 0000000..4a8a9e3 --- /dev/null +++ b/docs/src/api_reference/WFacer/fit.rst @@ -0,0 +1,8 @@ +==================================================== +Cluster expansion fitting +==================================================== + +.. automodule:: WFacer.fit + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/WFacer/index.rst b/docs/src/api_reference/WFacer/index.rst new file mode 100644 index 0000000..73a03b2 --- /dev/null +++ b/docs/src/api_reference/WFacer/index.rst @@ -0,0 +1,18 @@ +==================================================== +WFacer main functionalities +==================================================== + +This module contains major functionalities to construct cluster expansion +jobs and workflows in **WFacer**. + +.. toctree:: + :maxdepth: 1 + + convergence + enumeration + fit + jobs + maker + preprocessing + schema + wrangling diff --git a/docs/src/api_reference/WFacer/jobs.rst b/docs/src/api_reference/WFacer/jobs.rst new file mode 100644 index 0000000..12740c1 --- /dev/null +++ b/docs/src/api_reference/WFacer/jobs.rst @@ -0,0 +1,8 @@ +==================================================== +Cluster expansion jobs (Atomate2) +==================================================== + +.. 
automodule:: WFacer.jobs + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/WFacer/maker.rst b/docs/src/api_reference/WFacer/maker.rst new file mode 100644 index 0000000..02728bd --- /dev/null +++ b/docs/src/api_reference/WFacer/maker.rst @@ -0,0 +1,8 @@ +==================================================== +Cluster expansion workflow maker (Atomate2) +==================================================== + +.. automodule:: WFacer.maker + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/WFacer/preprocessing.rst b/docs/src/api_reference/WFacer/preprocessing.rst new file mode 100644 index 0000000..ef872cf --- /dev/null +++ b/docs/src/api_reference/WFacer/preprocessing.rst @@ -0,0 +1,8 @@ +==================================================== +Preprocess options and inputs +==================================================== + +.. automodule:: WFacer.preprocessing + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/WFacer/schema.rst b/docs/src/api_reference/WFacer/schema.rst new file mode 100644 index 0000000..0f8a06f --- /dev/null +++ b/docs/src/api_reference/WFacer/schema.rst @@ -0,0 +1,8 @@ +==================================================== +The CeOutputsDocument schema +==================================================== + +.. automodule:: WFacer.schema + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/WFacer/wrangling.rst b/docs/src/api_reference/WFacer/wrangling.rst new file mode 100644 index 0000000..01551b9 --- /dev/null +++ b/docs/src/api_reference/WFacer/wrangling.rst @@ -0,0 +1,8 @@ +==================================================== +CeDataWrangler (WFacer) +==================================================== + +.. 
automodule:: WFacer.wrangling + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/index.rst b/docs/src/api_reference/index.rst new file mode 100644 index 0000000..3535360 --- /dev/null +++ b/docs/src/api_reference/index.rst @@ -0,0 +1,23 @@ +.. _api ref: + +============= +API Reference +============= + +The full API documentation for all classes and functions in the **WFacer** package is +listed below. + +.. toctree:: + :maxdepth: 2 + + sample_generators/index + species_decorators/index + utils/index + WFacer/index + +Autogenerated Indices +--------------------- + +* :ref:`modindex` +* :ref:`genindex` +* :ref:`search` diff --git a/docs/src/api_reference/sample_generators/index.rst b/docs/src/api_reference/sample_generators/index.rst new file mode 100644 index 0000000..61263d1 --- /dev/null +++ b/docs/src/api_reference/sample_generators/index.rst @@ -0,0 +1,13 @@ +==================================================== +Structure generators +==================================================== + +This module contains classes and functions to sample training +structures using an existing cluster expansion. + +.. toctree:: + :maxdepth: 1 + + mc_generators + +More to come... diff --git a/docs/src/api_reference/sample_generators/mc_generators.rst b/docs/src/api_reference/sample_generators/mc_generators.rst new file mode 100644 index 0000000..43da7bd --- /dev/null +++ b/docs/src/api_reference/sample_generators/mc_generators.rst @@ -0,0 +1,17 @@ +.. _monte_carlo_generators : + +==================================================== +Monte-Carlo sample generators +==================================================== + +This module contains classes and functions to sample training +structures using Monte-Carlo at an increasing series of temperatures, +given a specific supercell matrix and a constant composition or +chemical potentials. + +Classes are implemented with :mod:`smol.moca`. + +.. 
automodule:: WFacer.sample_generators.mc_generators + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/species_decorators/base.rst b/docs/src/api_reference/species_decorators/base.rst new file mode 100644 index 0000000..68b433c --- /dev/null +++ b/docs/src/api_reference/species_decorators/base.rst @@ -0,0 +1,8 @@ +==================================================== +Generic decorators +==================================================== + +.. automodule:: WFacer.specie_decorators.base + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/species_decorators/charge.rst b/docs/src/api_reference/species_decorators/charge.rst new file mode 100644 index 0000000..46d15b0 --- /dev/null +++ b/docs/src/api_reference/species_decorators/charge.rst @@ -0,0 +1,8 @@ +==================================================== +Charge decorators +==================================================== + +.. automodule:: WFacer.specie_decorators.charge + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/species_decorators/index.rst b/docs/src/api_reference/species_decorators/index.rst new file mode 100644 index 0000000..fbe1ce4 --- /dev/null +++ b/docs/src/api_reference/species_decorators/index.rst @@ -0,0 +1,15 @@ +==================================================== +Species decorators +==================================================== + +This module contains classes and functions to label species with charge, +magnetic moment or other site properties given a +:class:`pymatgen.entries.ComputedStructureEntry`. + +.. toctree:: + :maxdepth: 1 + + base + charge + +More to come... 
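Review note: the charge decorators indexed above assign oxidation states to sites after DFT. As a toy illustration of threshold-based decoration only (hypothetical cutoff rules and helper name; WFacer actually uses fixed labels, pymatgen guesses, or a gaussian process on site magnetic moments):

```python
def decorate_charges(species, magmoms, rules):
    """Assign a charge label to each site from its magnetic moment.

    `rules` maps an element symbol to an ordered list of
    (max_abs_magmom, charge) cutoffs; the first cutoff that covers the
    site's moment wins. Cutoffs must cover every expected moment,
    otherwise the site is silently skipped.
    """
    labels = []
    for elem, moment in zip(species, magmoms):
        for cutoff, charge in rules[elem]:
            if abs(moment) <= cutoff:
                labels.append((elem, charge))
                break
    return labels
```

For example, with cutoffs `{"Mn": [(1.0, 4), (2.5, 3), (99.0, 2)]}`, a low-moment Mn site is labeled Mn4+ and a higher-moment one Mn3+.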
diff --git a/docs/src/api_reference/utils/convex_hull.rst b/docs/src/api_reference/utils/convex_hull.rst new file mode 100644 index 0000000..c10c19e --- /dev/null +++ b/docs/src/api_reference/utils/convex_hull.rst @@ -0,0 +1,8 @@ +==================================================== +Convex hull +==================================================== + +.. automodule:: WFacer.utils.convex_hull + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/utils/duplicacy.rst b/docs/src/api_reference/utils/duplicacy.rst new file mode 100644 index 0000000..5067ad9 --- /dev/null +++ b/docs/src/api_reference/utils/duplicacy.rst @@ -0,0 +1,8 @@ +==================================================== +Duplicacy check +==================================================== + +.. automodule:: WFacer.utils.duplicacy + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/utils/index.rst b/docs/src/api_reference/utils/index.rst new file mode 100644 index 0000000..5ed9785 --- /dev/null +++ b/docs/src/api_reference/utils/index.rst @@ -0,0 +1,19 @@ +==================================================== +Miscellaneous utility functions +==================================================== + +This module contains utility functions to handle the convex hull, the occupancy +strings, the calculation data, the structure selection process and the sparse-lm +estimators, etc. + +.. toctree:: + :maxdepth: 1 + + convex_hull + duplicacy + occu + query + selection + sparselm + supercells + taskdoc diff --git a/docs/src/api_reference/utils/occu.rst b/docs/src/api_reference/utils/occu.rst new file mode 100644 index 0000000..ebaacd9 --- /dev/null +++ b/docs/src/api_reference/utils/occu.rst @@ -0,0 +1,8 @@ +==================================================== +Occupancy +==================================================== + +.. 
automodule:: WFacer.utils.occu + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/utils/query.rst b/docs/src/api_reference/utils/query.rst new file mode 100644 index 0000000..5798a58 --- /dev/null +++ b/docs/src/api_reference/utils/query.rst @@ -0,0 +1,8 @@ +==================================================== +Data query +==================================================== + +.. automodule:: WFacer.utils.query + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/utils/selection.rst b/docs/src/api_reference/utils/selection.rst new file mode 100644 index 0000000..a124814 --- /dev/null +++ b/docs/src/api_reference/utils/selection.rst @@ -0,0 +1,8 @@ +==================================================== +Structure selection +==================================================== + +.. automodule:: WFacer.utils.selection + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/utils/sparselm.rst b/docs/src/api_reference/utils/sparselm.rst new file mode 100644 index 0000000..d488acc --- /dev/null +++ b/docs/src/api_reference/utils/sparselm.rst @@ -0,0 +1,8 @@ +==================================================== +Sparse-lm estimator handling +==================================================== + +.. automodule:: WFacer.utils.sparselm_estimators + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/utils/supercells.rst b/docs/src/api_reference/utils/supercells.rst new file mode 100644 index 0000000..a15d154 --- /dev/null +++ b/docs/src/api_reference/utils/supercells.rst @@ -0,0 +1,8 @@ +==================================================== +Supercell matrix +==================================================== + +.. 
automodule:: WFacer.utils.supercells + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/api_reference/utils/taskdoc.rst b/docs/src/api_reference/utils/taskdoc.rst new file mode 100644 index 0000000..b47bacf --- /dev/null +++ b/docs/src/api_reference/utils/taskdoc.rst @@ -0,0 +1,8 @@ +==================================================== +TaskDoc handling +==================================================== + +.. automodule:: WFacer.utils.task_document + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/src/conf.py b/docs/src/conf.py new file mode 100644 index 0000000..c931aec --- /dev/null +++ b/docs/src/conf.py @@ -0,0 +1,161 @@ +# Configuration file for the Sphinx documentation builder. +# +# For the full list of built-in configuration values, see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Project information ----------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information +from datetime import date + +import WFacer + +project = "WFacer" +copyright = f"2022-{date.today().year}, Ceder Group" +author = "Fengyu Xie" +release = WFacer.__version__ +version = WFacer.__version__ + +# -- General configuration --------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration + +extensions = [ + "sphinx.ext.autodoc", + "sphinx.ext.napoleon", + "sphinx.ext.intersphinx", + "sphinx.ext.mathjax", + "sphinx.ext.autosummary", + "sphinx_autodoc_typehints", + # "sphinx.ext.coverage", + # "sphinx.ext.doctest", + # "sphinx.ext.todo", + "sphinx.ext.viewcode", + "sphinx_mdinclude", +] + +# Generate the API documentation when building +autosummary_generate = True +add_module_names = False +autoclass_content = "both" + +napoleon_google_docstring = True +napoleon_include_init_with_doc = False +napoleon_include_private_with_doc = False 
+napoleon_include_special_with_doc = False +napoleon_use_admonition_for_examples = False +napoleon_use_admonition_for_notes = False +napoleon_use_admonition_for_references = False +napoleon_use_ivar = False +napoleon_custom_sections = None + +# Add any paths that contain templates here, relative to this directory. +source_suffix = [".rst"] + +# The encoding of src files. +source_encoding = "utf-8" + +# Add any paths that contain templates here, relative to this directory. +templates_path = ["_templates"] + +# List of patterns, relative to src directory, that match files and +# directories to ignore when looking for src files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = [] + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = "sphinx" + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. 
+ +html_theme = "pydata_sphinx_theme" + +# TODO: update this when fixed version of pydata-sphinx-theme is released +# html_logo = "_static/logo.png"  # banner.svg needs text as paths to avoid font missing + +html_theme_options = { +    # "logo": { +    #     "image_light": "logo.png", +    #     "image_dark": "logo.png", +    # }, +    "github_url": "https://github.com/CederGroupHub/WFacer", +    "use_edit_page_button": True, +    "show_toc_level": 2, +    # "navbar_align": "left",  # [left, content, right] For testing that the navbar +    # items align properly +    # "navbar_start": ["navbar-logo", "navbar-version"], +    # "navbar_center": ["navbar-nav", "navbar-version"],  # Just for testing +    "navigation_depth": 2, +    "show_nav_level": 2, +    "navbar_end": ["version-switcher", "theme-switcher", "navbar-icon-links"],  # +    # "left_sidebar_end": ["custom-template.html", "sidebar-ethical-ads.html"], +    # "footer_items": ["copyright", "sphinx-version", ""] +    "switcher": { +        # "json_url": "/_static/switcher.json", +        "json_url": "https://pydata-sphinx-theme.readthedocs.io/en/latest/_static/" +        "switcher.json", +        # "url_template": "https://pydata-sphinx-theme.readthedocs.io/en/v{version}/", +        "version_match": version, +    }, +    "external_links": [ +        { +            "name": "Changes", +            "url": "https://github.com/CederGroupHub/WFacer/blob/master/CHANGES.md", +        }, +        {"name": "Issues", "url": "https://github.com/CederGroupHub/WFacer/issues"}, +    ], +} + +html_context = { +    "github_url": "https://github.com",  # or your GitHub Enterprise instance +    "github_user": "CederGroupHub", +    "github_repo": "WFacer", +    "github_version": "main", +    "doc_path": "docs/src", +    "source_suffix": source_suffix, +    "default_mode": "auto", +} + +# Custom sidebar templates, maps page names to templates.
+html_sidebars = { + "contribute/index": [ + "search-field", + "sidebar-nav-bs", + "custom-template", + ], # This ensures we test for custom sidebars + # "demo/no-sidebar": [], # Test what page looks like with no sidebar items +} + +# The style sheet to use for HTML and HTML Help pages. A file of that name +# must exist either in Sphinx' static/ path, or in one of the custom paths +# given in html_static_path. +# html_style = '' + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". + +html_static_path = ["_static"] + +# If not '', a 'Last updated on:' timestamp is inserted at every page bottom, +# using the given strftime format. +html_last_updated_fmt = "%b %d, %Y" + +# If true, SmartyPants will be used to convert quotes and dashes to +# typographically correct entities. +# html_use_smartypants = True + +# Content template for the index page. +html_index = "index.html" + +# If false, no module index is generated. +html_use_modindex = True + +html_file_suffix = ".html" + +# If true, the reST sources are included in the HTML build as _sources/. +html_copy_source = False + +# Output file base name for HTML help builder. +htmlhelp_basename = "WFacer" diff --git a/docs/src/contributing.rst b/docs/src/contributing.rst new file mode 100644 index 0000000..5823038 --- /dev/null +++ b/docs/src/contributing.rst @@ -0,0 +1,139 @@ +.. _contributing : + +==================================== +Contributing Guidelines +==================================== + +For optimal teamwork, it's crucial to set clear and pragmatic guidelines upfront, +rather than addressing the confusion that arises from overlooking them later. +If you're committed to making impactful contributions, please take a moment to +thoroughly review the following guidelines! 
+ +Bugs, issues, input, questions, etc +=================================== +Please use the +`issue tracker `_ to share any +of the following: + +- Bugs +- Issues +- Questions +- Feature requests +- Ideas +- Input + +Having these reported and saved in the issue tracker helps ensure that they are +properly addressed. Please be as descriptive and neat as possible when +opening up an issue. When available, please also attach your I/O data and the +full error message. + +Developing +========== +Code contributions can be anything from fixing the simplest bugs to adding +extensive new features or subpackages. If you have written code or want to start +writing new code that you think will improve **WFacer**, then please follow the +steps below to make a contribution. + +Guidelines +---------- + +* All code should have unit tests. +* Code should be well documented following `google style `_ docstrings. +* All code should pass the pre-commit hook. The code follows the `black code style `_. +* Additional dependencies should only be added when they are critical or if they are +  already a :mod:`smol` or a :mod:`sparse-lm` dependency. More often than not it is best to avoid adding +  a new dependency by simply using the external packages directly rather +  than adding them to the source code. +* Implementing new features should be more fun than tedious. + +Installing a development version +-------------------------------- + +#. *Clone* the main repository, or *fork* it and *clone* your fork using git. +   If you plan to contribute back to the project, then you should create a fork and +   clone that:: + +       git clone https://github.com//WFacer.git + +   Where ```` is your GitHub username, or if you are cloning the main repository +   directly then `` = CederGroupHub``. + +#. Install Python 3.8 or higher. We recommend using Python 3.9 or higher: +   `conda `_. + +#. We recommend developing using a virtual environment.
You can do so using +   `conda `_ +   or using `virtualenv `_. + +#. Install the development version of *WFacer* in *editable* mode:: + +       pip install --verbose --editable .[dev,test] + +   This will install the package in *editable* mode, meaning that any changes +   you make to the source code will be reflected in the installed package. + +Adding code contributions +------------------------- + +#. If you are contributing for the first time: + +   * Install a development version of *WFacer* in *editable* mode as described above. +   * Make sure to also add the *upstream* repository as a remote:: + +       git remote add upstream https://github.com/CederGroupHub/WFacer.git + +   * You should always keep your ``main`` branch or any feature branch up to date +     with the upstream repository ``main`` branch. Be good about doing *fast forward* +     merges of the upstream ``main`` into your fork branches while developing. + +#. In order to have changes available without having to re-install the package: + +   * Install the package in *editable* mode:: + +       pip install -e . + + +#. You are free to develop your contributions in your *main* branch or any feature +   branch in your fork. + +   * We recommend using your fork's *main* branch only for short/easy fixes and additions. +   * For more complex features, try to use a feature branch with a descriptive name. +   * For very complex features feel free to open up a PR even before your contribution is finished, with +     [WIP] in its name, and optionally mark it as a *draft*. + +#. While developing we recommend you use the pre-commit hook that is set up to ensure that your +   code will satisfy all lint, documentation and black requirements. To do so, install pre-commit and run +   in your clone's top directory:: + +       pre-commit install + +   * All code should use `google style `_ docstrings +     and `black `_ style formatting. + +#. Make sure to test your contribution and write unit tests for any new features. All tests should go in the +   ``WFacer/tests`` directory.
The CI will run tests upon opening a PR, but running them locally will help you find +   problems beforehand:: + +       pytest tests + + +#. To submit a contribution, open a *pull request* to the upstream repository. If your contribution changes +   the API (adds new features, edits or removes existing features), please add a description to the +   `change log `_. + +#. If your contribution includes novel published (or to-be-published) methodology, you should also edit the +   citing page accordingly. + + +Adding examples +--------------- + +On many occasions, novel use of the package does not necessarily require introducing new source code, but rather +relies on existing functionality, and possibly external packages (that are requirements), for particular or +advanced calculations. + +#. Create a sub-directory with a descriptive name in the ``docs/src/example_scripts`` directory. +#. Implement the functionality with enough sections to carefully describe the background, theory, +   and steps in the index.rst file. +#. Once the script is ready, add an entry to the :ref:`examples` page's rst file so your example shows up in the +   documentation. diff --git a/docs/src/example_scripts/full_automation_FCC_AgLi/analyze_wf.py b/docs/src/example_scripts/full_automation_FCC_AgLi/analyze_wf.py new file mode 100644 index 0000000..bd56afe --- /dev/null +++ b/docs/src/example_scripts/full_automation_FCC_AgLi/analyze_wf.py @@ -0,0 +1,21 @@ +from jobflow import SETTINGS +from pydantic import parse_obj_as + +from WFacer.schema import CeOutputsDocument + +store = SETTINGS.JOB_STORE +store.connect() + +# Just an example. You need to check what your maximum iteration number is on your own. +max_iter = 10 +# Find the output of a trigger job, which should be the CeOutputsDocument of the final +# iteration. +job_return = store.query_one({"name": f"agli_fcc_ce_iter_{max_iter}_trigger"}) +raw_doc = job_return["output"] +# De-serialize everything.
+doc = parse_obj_as(CeOutputsDocument, raw_doc) + +# See WFacer.schema for more. +print("Cluster subspace:", doc.cluster_subspace) +print("Wrangler:", doc.data_wrangler) +print("coefficients:", doc.coefs_history[-1]) diff --git a/docs/src/example_scripts/full_automation_FCC_AgLi/generate_wf.py b/docs/src/example_scripts/full_automation_FCC_AgLi/generate_wf.py new file mode 100644 index 0000000..71ce93a --- /dev/null +++ b/docs/src/example_scripts/full_automation_FCC_AgLi/generate_wf.py @@ -0,0 +1,26 @@ +from fireworks import LaunchPad +from jobflow.managers.fireworks import flow_to_workflow +from pymatgen.core import Structure + +from WFacer.maker import AutoClusterExpansionMaker + +# construct a rock salt Ag-Li structure +agli_prim = Structure( +    lattice=[[0, 2.13, 2.13], [2.13, 0, 2.13], [2.13, 2.13, 0]], +    species=[ +        {"Ag": 0.5, "Li": 0.5}, +        {"Ag": 0.5, "Li": 0.5}, +    ], +    coords=[[0, 0, 0], [0.5, 0.5, 0.5]], +) +# Use the default for every option. +ce_flow = AutoClusterExpansionMaker(name="agli_fcc_ce", options={}).make(agli_prim) + +# convert the flow to a fireworks WorkFlow object +# If argument "store" is not specified, all documents will be saved to the JOB_STORE +# defined by the local configuration files where you run THIS script from. +wf = flow_to_workflow(ce_flow) + +# submit the workflow to the FireWorks launchpad +lpad = LaunchPad.auto_load() +lpad.add_wf(wf) diff --git a/docs/src/example_scripts/full_automation_FCC_AgLi/index.rst b/docs/src/example_scripts/full_automation_FCC_AgLi/index.rst new file mode 100644 index 0000000..d318f9f --- /dev/null +++ b/docs/src/example_scripts/full_automation_FCC_AgLi/index.rst @@ -0,0 +1,38 @@ +.. _full_automation: + +================================== +Fully automate a cluster expansion +================================== + +We provide a simple example workflow to run an automatic cluster expansion in an Ag-Li alloy on an FCC lattice +(see other available options in the documentation of *preprocessing.py*): + +..
literalinclude:: generate_wf.py +   :language: python + +After running this script, a workflow with the name **agli_fcc_ce** should have appeared on the **Fireworks** +launchpad. + +Make sure you have correctly configured **Fireworks**, **Jobflow** and **atomate2**, +then submit the workflow to the computing cluster by running the following command, + +.. code-block:: bash + +   nohup qlaunch rapidfire -m {n_jobs} --sleep {time} > qlaunch.log + +where *n_jobs* is the number of jobs you want to keep in the queue, and *time* is the amount of sleep +time in seconds between two queue submission attempts. +*qlaunch* will keep submitting jobs to the queue until no job in the **READY** state can be found +on the launchpad. + +.. note:: You may still need to qlaunch manually after every cluster expansion iteration, +   because Fireworks can occasionally set the enumeration job to the READY state +   but fail to continue executing the job. + +After finishing, use the following code to query the computation results from MongoDB: + +.. note:: Check that the **Jobflow** installations on the computing cluster and the query +   terminal are configured to use the same **JOB_STORE**. + +..
literalinclude:: analyze_wf.py +   :language: python diff --git a/examples/semi_automation_BCC_AlLi/fit_model.py b/docs/src/example_scripts/semi_automation_BCC_AlLi/fit_model.py similarity index 100% rename from examples/semi_automation_BCC_AlLi/fit_model.py rename to docs/src/example_scripts/semi_automation_BCC_AlLi/fit_model.py diff --git a/examples/semi_automation_BCC_AlLi/generate.py b/docs/src/example_scripts/semi_automation_BCC_AlLi/generate.py similarity index 100% rename from examples/semi_automation_BCC_AlLi/generate.py rename to docs/src/example_scripts/semi_automation_BCC_AlLi/generate.py diff --git a/docs/src/example_scripts/semi_automation_BCC_AlLi/index.rst b/docs/src/example_scripts/semi_automation_BCC_AlLi/index.rst new file mode 100644 index 0000000..eb2ac0a --- /dev/null +++ b/docs/src/example_scripts/semi_automation_BCC_AlLi/index.rst @@ -0,0 +1,29 @@ +.. _Semi-automate a basic cluster expansion: + +================================= +Semi-automate a cluster expansion +================================= + +.. note:: In the context of the **WFacer** documentation, **semi-automation** refers to manual execution of scripts for structure generation +   and model fitting, while Jobflow or Fireworks manage only the computation +   of each individual enumerated structure. + +The following scripts demonstrate how to use classes and utility functions to manually perform the semi-automated steps +in a *cluster expansion* iteration. + +At the beginning of the first iteration, parameters for the cluster expansion and first-principles calculations +must be *initialized*. The following script provides an example of doing so: + +.. literalinclude:: initialize.py +   :language: python + +Using the cluster expansion constructed in the last iteration, you can enumerate new structures to +be added in the current iteration and compute them with **atomate2**: + +..
literalinclude:: generate.py +   :language: python + +In the final step, refit the cluster expansion model using the updated training set: + +.. literalinclude:: fit_model.py +   :language: python diff --git a/examples/semi_automation_BCC_AlLi/initialize.py b/docs/src/example_scripts/semi_automation_BCC_AlLi/initialize.py similarity index 100% rename from examples/semi_automation_BCC_AlLi/initialize.py rename to docs/src/example_scripts/semi_automation_BCC_AlLi/initialize.py diff --git a/docs/src/examples.rst b/docs/src/examples.rst new file mode 100644 index 0000000..6a770a6 --- /dev/null +++ b/docs/src/examples.rst @@ -0,0 +1,21 @@ +.. _examples: + +================= +Examples +================= + +The following scripts demonstrate several examples of using **WFacer** for practical +applications. You can simply view an example Python script by clicking its link. + + +Basic Examples +-------------- + +.. toctree:: +   :maxdepth: 2 + +   example_scripts/semi_automation_BCC_AlLi/index +   example_scripts/full_automation_FCC_AgLi/index + + +More to come... diff --git a/docs/src/index.rst b/docs/src/index.rst new file mode 100644 index 0000000..1b9ae03 --- /dev/null +++ b/docs/src/index.rst @@ -0,0 +1,118 @@ +.. WFacer documentation master file, created by +   sphinx-quickstart on Thu Sep 14 13:17:04 2023. +   You can adapt this file completely to your liking, but it should at least +   contain the root toctree directive. + +:notoc: + +..
toctree:: +   :maxdepth: 1 +   :hidden: + +   examples +   contributing +   api_reference/index + + +============================================================ +WorkFlow for Automated Cluster Expansion Regression (WFacer) +============================================================ + +*Modulated automation of cluster expansion model construction based on atomate2 and Jobflow* + +************** + +WFacer ("Wall"Facer) is a lightweight package based on `smol `_ +to automate the fitting of lattice models in disordered crystalline solids using the +*cluster expansion* method. Beyond metallic alloys, **WFacer** is also designed +to handle ionic systems by enabling decorators in +:mod:`WFacer.species_decorators.charge` and the :class:`smol.cofe.extern.ewald.EwaldTerm`. + +Powered by `Atomate2 `_, +`Jobflow `_ +and `Fireworks `_, **WFacer** is able to fully automate the +cluster expansion building process on super-computing clusters, and can easily interface +with MongoDB data storage in the **Materials Project** style. + +Functionality +------------- +**WFacer** currently offers the following functionalities: + +- Preprocessing the setup of a cluster expansion workflow as a dictionary. +- Enumerating and choosing the least aliasing super-cell matrices with a given number of sites; +  enumerating charge-balanced compositions in super-cells; enumerating and selecting low-energy, +  non-duplicated structures into the training set at the beginning of each iteration. +- Computing enumerated structures using **Atomate2** **VASP** interfaces. +- Extracting and saving relaxed structure information and energies in **Atomate2** schemas. +- Decorating structures. Currently supports charge decoration from fixed labels, from Pymatgen guesses, +  or from `a Gaussian optimization process `_ based on partitioning +  site magnetic moments.
+- Fitting effective cluster interactions (ECIs) from structures and energies with sparse linear +  regularization methods and model selection methods provided by +  `sparse-lm `_, +  except for overlapped group Lasso regularization. +- Checking convergence of the cluster expansion model using the minimum energy convergence per composition, +  the cross-validation error, and the difference of ECIs (if needed). +- Creating an **atomate2**-style workflow to be executed locally or with **Fireworks**. + +Installation +------------ +* From PyPI: :code:`pip install WFacer` +* From source: :code:`git clone` the repository. The latest tag in the *main* branch is the stable version of the +  code. The **main** branch has the latest tested features, but may have more +  lingering bugs. From the top-level directory, do :code:`pip install -r requirements.txt`, then :code:`pip install .`. If +  you wish to use **Fireworks** as the calculation manager, do :code:`pip install -r requirements-optional.txt` as well. + +Post-installation configuration +------------------------------- +Specific configurations are required before you can properly use **WFacer**. + +* **Fireworks** job management is highly recommended but not required. +  To use job management with **Fireworks** and **Atomate2**, +  configuring **Fireworks** and **Atomate2** with your MongoDB storage is necessary. +  Users are advised to follow the guidance in the +  **Atomate2** and `Atomate `_ +  installation guides, and run a simple `test workflow `_ +  to see if it is able to run on your queue. + +  Instead of writing in **my_qadapter.yaml** + +  .. code-block:: bash + +     rlaunch -c <>/config rapidfire + +  we suggest using + +  .. code-block:: bash + +     rlaunch -c <>/config singleshot + +  because with **singleshot** within **rlaunch**, a task in the submission queue will +  be terminated once a structure is finished, instead of trying to fetch another structure +  from the launchpad. This can be used in combination with: + +  ..
code-block:: bash + +     qlaunch rapidfire -m + +  to guarantee that each structure is able to use up the maximum wall-time in +  its computation. + +* A mixed-integer programming (MIP) solver is necessary when a MIQP-based +  regularization method is used. A list of available MIP solvers can be found in the +  `cvxpy `_ documentation. +  Commercial solvers such as **Gurobi** and **CPLEX** are typically pre-compiled +  but require specific licenses to run on a super-computing system. For open-source solvers, +  we recommend installing **SCIP** in a dedicated conda environment following +  the installation instructions in `PySCIPOpt `_. + +Examples +------------ +See :ref:`examples` for some typical use cases. + +License +------------------------------- + +**WFacer** is distributed openly under a modified 3-clause BSD license. + +.. include:: ../../LICENSE diff --git a/pyproject.toml b/pyproject.toml index 2fbaaf2..101cd4e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -51,8 +51,16 @@ classifiers = [ dev = ["pre-commit", "black", "isort", "flake8", "pylint", "pydocstyle", "flake8-pyproject"] # Gurobipy needed by mixedL0 tests. tests = ["pytest >=7.2.0", "pytest-cov >=4.0.0", "coverage", "gurobipy", "pyscipopt>=4.0.0"] -docs = ["sphinx >= 5.3", "pydata-sphinx-theme >=0.12.0", "ipython >=8.2.0", "nbsphinx >=0.8.10", -    "nbsphinx-link >=1.3.0", "nb2plots >=0.6.1"] +docs = [ +    "sphinx >=7.0.0", +    "pydata-sphinx-theme >=0.13.3", +#    "ipython >=8.2.0", +#    "nbsphinx >=0.9.0", +#    "nbsphinx-link >=1.3.0", +    "sphinx-copybutton >=0.5.2", +    "sphinx-autodoc-typehints >=1.24.0", +    "sphinx-mdinclude" +] optional = ["gurobipy"] # Specify to only package WFacer
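The ECI-fitting step that the docs above describe ("fitting effective cluster interactions from structures and energies") is, at its core, a linear regression of energies onto correlation functions. The sketch below illustrates that idea on hypothetical toy data with a plain least-squares fit; the feature matrix, energies, and ECI values are all made up for illustration, and the real workflow delegates the regularized fit to **sparse-lm** estimators rather than `numpy.linalg.lstsq`:

```python
import numpy as np

# Hypothetical toy data standing in for a real training set: each row of X is
# one structure's correlation-function vector, and y holds its energy per site.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))               # correlation (feature) matrix
true_ecis = np.array([1.0, -0.5, 0.0, 0.25, 0.0])
y = X @ true_ecis                          # noiseless synthetic energies

# Plain least-squares fit of the ECIs. WFacer instead uses sparse linear
# regularization (Lasso-type estimators from sparse-lm) to promote sparsity.
ecis, *_ = np.linalg.lstsq(X, y, rcond=None)

# Root-mean-square training error; tiny here because the data are noiseless.
rmse = np.sqrt(np.mean((X @ ecis - y) ** 2))
```

With real DFT energies the data are noisy and the subspace is over-complete, which is why the convergence checks above (cross-validation error, ECI differences between iterations) matter far more than training RMSE.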