Merge pull request #1 from QuantumChemist/main
A small revision of the jobflow tutorial
JaGeo authored Jul 8, 2024
2 parents 73b06f6 + 5c9e516 commit f22a413
Showing 3 changed files with 27 additions and 12 deletions.
4 changes: 2 additions & 2 deletions environment.yml
@@ -2,11 +2,11 @@ channels:
- conda-forge
dependencies:
- python=3.11
- pyiron_base=0.9.5
- pyiron_base=0.9.6
- qe=7.2
- qe-tools=2.0.0
- ase=3.23.0
- matplotlib=3.9.0
- matplotlib=3.8.4
- xmlschema=3.3.1
- jobflow=0.1.17
- pymatgen=2024.3.1
33 changes: 24 additions & 9 deletions jobflow.ipynb
@@ -31,15 +31,17 @@
{
"metadata": {},
"cell_type": "markdown",
"source": "[`jobflow`](https://materialsproject.github.io/jobflow/index.html) and [`atomate2`](https://materialsproject.github.io/atomate2/index.html) are key packages of the [Materials Project](https://materialsproject.org/) . `jobflow` was especially designed to simplify the execution of dynamic workflows -- when the actual number of jobs is dynamically determined upon runtime instead of being statically fixed before running the workflow(s). `jobflow`'s overall flexibility allows for building workflows that go beyond the usage in materials science. `jobflow` forms the basis of `atomate2`. `atomate2` implements data generation workflows in the context of materials science and will be responsible for data generation in the Materials Project in the future. "
"source": [
"[`jobflow`](https://materialsproject.github.io/jobflow/index.html) and [`atomate2`](https://materialsproject.github.io/atomate2/index.html) are key packages of the [Materials Project](https://materialsproject.org/). `jobflow` was designed especially to simplify the execution of dynamic workflows -- where the actual number of jobs is determined at runtime instead of being statically fixed before running the workflow(s). `jobflow`'s overall flexibility allows for building workflows that go beyond materials science applications. `jobflow` serves as the basis of `atomate2`, which implements data generation workflows in the context of materials science and will be used for data generation in the Materials Project in the future. "
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Installation / Setup\n",
"`jobflow` can be installed via `pip install jobflow` and run directly with the default setup. It then internally relies on an in-memory database as defined in the software package [`maggma`](https://materialsproject.github.io/maggma/). For large-scale usage, a [MongoDB](https://www.mongodb.com/)-like database can be specified in a `jobflow.yaml` file.\n",
"A high-throughput setup (i.e., parallel execution of independent parts in the workflow) of `jobflow` can be achieved using additional packages like [`fireworks`](https://materialsproject.github.io/fireworks/) or [`jobflow-remote`](https://matgenix.github.io/jobflow-remote/). Both packages require a MongoDB database. In case of `FireWorks`, however, the MongoDB database needs to be directly connected to the compute nodes. `jobflow-remote` allows remote submission options that only require a MongoDB database on the submitting computer but not the compute nodes. It can also deal with multi-factor authentification."
"A high-throughput setup (i.e., parallel execution of independent parts in the workflow) of `jobflow` can be achieved using additional packages like [`fireworks`](https://materialsproject.github.io/fireworks/) or [`jobflow-remote`](https://matgenix.github.io/jobflow-remote/). Both packages require a MongoDB database. In case of `FireWorks`, however, the MongoDB database needs to be directly connected to the compute nodes. `jobflow-remote` allows remote submission options that only require a MongoDB database on the submitting computer but not the compute nodes. It can also deal with multi-factor authentication."
]
},
{
@@ -240,7 +242,9 @@
},
{
"cell_type": "markdown",
"source": "Then, we import tools for data plotting and mathematical operations and manipulation.",
"source": [
"Then, we import tools for data plotting as well as mathematical operations and manipulation."
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
@@ -325,7 +329,7 @@
"\n",
"\n",
"def generate_structures(\n",
" structure: Atoms, strain_lst: list(float)\n",
" structure: Atoms, strain_lst: list[float]\n",
"): # structure should be of ase Atoms type\n",
" structure = MSONAtoms(structure)\n",
" structure_lst = []\n",
@@ -358,7 +362,7 @@
{
"cell_type": "code",
"source": [
"def plot_energy_volume_curve(volume_lst: list(float), energy_lst: list(float)):\n",
"def plot_energy_volume_curve(volume_lst: list[float], energy_lst: list[float]):\n",
" plt.plot(volume_lst, energy_lst)\n",
" plt.xlabel(\"Volume\")\n",
" plt.ylabel(\"Energy\")\n",
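The type-hint fix in these hunks (`list(float)` to `list[float]`) matters beyond style: annotations in a `def` are evaluated eagerly, and `list(float)` calls the `list` constructor on the `float` type. A small self-contained illustration:

```python
# `list(float)` tries to iterate over the `float` type and raises a
# TypeError the moment the `def` statement is executed.
try:
    def broken(xs: list(float)):
        return xs
except TypeError as exc:
    print(f"annotation failed: {exc}")

# `list[float]` is a subscripted generic (PEP 585, Python >= 3.9) and is
# valid as an annotation.
def works(xs: list[float]) -> float:
    return sum(xs)

print(works([1.0, 2.0, 3.0]))  # 6.0
```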
@@ -524,7 +528,7 @@
{
"cell_type": "markdown",
"source": [
"The next steps are all concerned about handling the execution of QE and the output data collection. We start by defining a QE task document `QETaskDoc` to systematically collect the output data. For the (E, V) curve, the energy and volume are of course the most important information. The task document could be further extended to contain information that is relevant for other purposes. Next, we define a `BaseQEMaker` to handle generic QE jobs (in our case for the structural relaxation) and a separate `StaticQEMaker` for the static QE calculations. The `BaseQEMaker` is expecting the generic input set generated by `QEInputGenerator`, while `StaticQEMaker` expects the `QEInputStaticGenerator` type. As the `StaticQEMaker` inherits from the `BaseQEMaker`, we only need to make sure to pass the correct input set generator type."
"The next steps are all about handling the execution of QE and the output data collection. We start by defining a QE task document `QETaskDoc` to systematically collect the output data. For the (E, V) curve, the energy and volume are of course the most important information. The task document could be further extended to contain other relevant information. Next, we define a `BaseQEMaker` to handle generic QE jobs (in our case, the structural relaxation) and a separate `StaticQEMaker` for the static QE calculations. The `BaseQEMaker` expects the generic input set generated by `QEInputGenerator`, while `StaticQEMaker` expects the `QEInputStaticGenerator` type. As `StaticQEMaker` inherits from `BaseQEMaker`, we only need to make sure to pass the correct input set generator type."
],
"metadata": {
"collapsed": false
@@ -636,7 +640,9 @@
},
{
"cell_type": "markdown",
"source": "Finally, it's time to orchestrate all functions and classes together in an actual flow. Note how the number of jobs in `get_ev_curve` can be flexibly controlled by using `strain_lst` and therefore we use a `Response` object to handle the flexible job output. By making `get_ev_curve` and `plot_energy_volume_curve_job` into `job` objects using the `@job` decorator, we ensure that first all the (E, V) data points are calculated before they are plotted. The `qe_flow` contains the list of the jobs that need to be executed in this workflow. The jobs are connected by the respective `job.output` objects that also ensures the correct order in executing the jobs.",
"source": [
"Finally, it's time to orchestrate all functions and classes together into an actual flow. Note how the number of jobs in `get_ev_curve` can be flexibly controlled via `strain_lst`; we therefore use a `Response` object to handle the flexible job output. By turning `get_ev_curve` and `plot_energy_volume_curve_job` into `job` objects with the `@job` decorator, we ensure that all the (E, V) data points are calculated before they are plotted. `qe_flow` contains the list of jobs to be executed in this workflow. The jobs are connected through their respective `job.output` objects, which also ensure the correct execution order."
],
"metadata": {
"collapsed": false
}
@@ -687,7 +693,7 @@
"# qe_flow is the QE flow that consists of the job for structural optimization, calculating the (E, V) curve data points and plotting the curve\n",
"run_locally(\n",
" qe_fw, create_folders=True\n",
") # order of the jobs in the flow is determined by connectivity\n",
") # order of the jobs in the flow determined by connectivity\n",
"\n",
"graph = to_mermaid(qe_fw, show_flow_boxes=True)\n",
"mm(graph)"
@@ -908,11 +914,20 @@
"cell_type": "markdown",
"source": [
"## Publication of the workflow\n",
"The `jobflow` infrastructure does not provide a dedicated platform for publishing a workflow currently. However, workflows related to computational materials science have been collected in the package `atomate2`. In addition, users can build their own package by relying on jobflow and share it as a new Python-based program. Additional packages in materials science using `jobflow` exist."
"The `jobflow` infrastructure currently does not provide a dedicated platform for publishing workflows. However, workflows related to computational materials science have been collected in the package `atomate2`. In addition, users can build their own package on top of `jobflow` and share it as a new Python-based program. There are also additional materials science packages like [NanoParticleTools](https://github.com/BlauGroup/NanoParticleTools) or [QuAcc](https://github.com/Quantum-Accelerators/quacc) that rely on `jobflow`."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [],
"metadata": {
"collapsed": false
}
}
]
}
2 changes: 1 addition & 1 deletion pyiron_base.ipynb

Large diffs are not rendered by default.
