From 28bff949d8310b3cca62393e8ee58ca508f645e7 Mon Sep 17 00:00:00 2001 From: JaGeo Date: Tue, 2 Jul 2024 22:28:35 +0200 Subject: [PATCH 1/5] Update Tutorial text --- jobflow.ipynb | 192 +++++++++++++++++++++++++++++--------------------- 1 file changed, 111 insertions(+), 81 deletions(-) diff --git a/jobflow.ipynb b/jobflow.ipynb index 47ab152..65934c8 100644 --- a/jobflow.ipynb +++ b/jobflow.ipynb @@ -31,17 +31,15 @@ { "metadata": {}, "cell_type": "markdown", - "source": [ - "The Python package [`jobflow`](https://materialsproject.github.io/jobflow/index.html) was developed in the context of the [Materials Project](https://next-gen.materialsproject.org/) and its overall flexibility allows for building workflows that go beyond the usage in material science. `jobflow` forms the basis of [`atomate2`](https://materialsproject.github.io/atomate2/index.html), a software package for data generation workflows. `jobflow` and `atomate2` are key packages of the Materials Project. `jobflow` was especially designed to simplify the execution of dynamic workflows -- when the actual number of jobs is dynamically determined upon runtime instead of being statically fixed before running the workflow(s)." - ] + "source": "[`jobflow`](https://materialsproject.github.io/jobflow/index.html) and [`atomate2`](https://materialsproject.github.io/atomate2/index.html) are key packages of the [Materials Project](https://materialsproject.org/) . `jobflow` was especially designed to simplify the execution of dynamic workflows -- when the actual number of jobs is dynamically determined upon runtime instead of being statically fixed before running the workflow(s). `jobflow`'s overall flexibility allows for building workflows that go beyond the usage in material science. `jobflow` forms the basis of `atomate2`. `atomate2` implements data generation workflows in the context of materials science and will be responsible for data generation in the Materials Project in the future. 
" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Installation / Setup\n", - "`jobflow` can be installed via 'pip install' and be run with the default setup. It will then internally rely on a predefined memory database as defined in the software package [`maggma`](https://materialsproject.github.io/maggma/). For large-scale usage, a [MongoDB](https://www.mongodb.com/)-like database might be specified for each project in a jobflow.yaml file.\n", - "A high-throughput setup (i.e., parallel execution of independent parts in the workflow) of `jobflow` can be achieved using additional packages like [`fireworks`](https://materialsproject.github.io/fireworks/) or [`jobflow-remote`](https://matgenix.github.io/jobflow-remote/). Both packages require a MongoDB database. In case of `FireWorks`, however, the MongoDB database needs to be connected to the compute nodes. `jobflow-remote` allows remote submission options that only require a MongoDB database on the submitting computer but not the compute nodes. It can also deal with multi-factor authentification." + "`jobflow` can be installed via `pip install jobflow` and run directly with the default setup. It then internally relies on an in-memory database (a `MemoryStore`) provided by the software package [`maggma`](https://materialsproject.github.io/maggma/). For large-scale usage, a [MongoDB](https://www.mongodb.com/)-like database can be specified in a `jobflow.yaml` configuration file.\n", + "A high-throughput setup (i.e., parallel execution of independent parts of the workflow) of `jobflow` can be achieved with additional packages such as [`fireworks`](https://materialsproject.github.io/fireworks/) or [`jobflow-remote`](https://matgenix.github.io/jobflow-remote/). Both packages require a MongoDB database. In the case of `FireWorks`, however, the compute nodes need a direct connection to the MongoDB database. In contrast, `jobflow-remote` offers remote submission options in which only the submitting computer, not the compute nodes, needs access to the MongoDB database. 
It can also handle multi-factor authentication." ] }, { @@ -65,11 +63,14 @@ "cell_type": "code", "source": [ "from jobflow import job\n", + "\n", + "\n", "@job\n", "def my_function(my_parameter: int):\n", " return my_parameter\n", "\n", - "job1 = my_function(my_parameter=1)\n" + "\n", + "job1 = my_function(my_parameter=1)" ], "outputs": [], "execution_count": 2 @@ -77,9 +78,7 @@ { "metadata": {}, "cell_type": "markdown", - "source": [ - "When we connect several `jobs` to generate a `Flow` object, we can construct a whole workflow. The list of connected `jobs` has to be passed to initialize a `Flow`. The order of the `jobs` is automatically determined by the job connectivity via the `job.output` upon runtime. We can also connect several `jobs` and `Flows` to create a new `Flow`.\n" - ] + "source": "By connecting several `job`s in a `Flow` object, we can construct a whole workflow. The list of connected `job`s has to be passed to initialize a `Flow`. The execution order of the `job`s is determined automatically at runtime from their connectivity via `job.output`. 
We can also connect several `job`s and `Flow`s to create a new `Flow`.\n" }, { "metadata": { @@ -93,13 +92,18 @@ "execution_count": 3, "source": [ "from jobflow import job, Flow\n", + "\n", + "\n", "@job\n", "def my_function(my_parameter: int):\n", " return my_parameter\n", + "\n", + "\n", "@job\n", "def my_second_function(my_parameter: int):\n", " return my_parameter\n", "\n", + "\n", "job1 = my_function(my_parameter=1)\n", "job2 = my_second_function(job1.output)\n", "flow = Flow([job1, job2], job2.output)" @@ -132,15 +136,17 @@ "from IPython.display import Image, display\n", "from jobflow.utils.graph import to_mermaid\n", "\n", + "\n", "def mm(graph):\n", " graphbytes = graph.encode(\"utf8\")\n", " base64_bytes = base64.b64encode(graphbytes)\n", " base64_string = base64_bytes.decode(\"ascii\")\n", " display(Image(url=\"https://mermaid.ink/img/\" + base64_string))\n", "\n", + "\n", "graph_source = to_mermaid(flow, show_flow_boxes=True)\n", "\n", - "mm(graph_source)\n" + "mm(graph_source)" ], "metadata": { "collapsed": false, @@ -153,9 +159,7 @@ { "metadata": {}, "cell_type": "markdown", - "source": [ - "With a so-called `Maker` object, the implementation of `job` and `Flow` objects can be further simplified. It is an extended dataclass object that generates a `job` or `Flow` with the `make` method. Dataclasses are used as they allow defining immutable default values. These `Makers` can then be used to reuse code via the usual inheritance in Python.\n" - ] + "source": "With a so-called `Maker` object, the implementation of `job` and `Flow` objects can be further simplified. A `Maker` is an extended dataclass that generates a `job` or `Flow` via its `make` method. Dataclasses are used because they allow defining immutable default values and simplify the initialization of attributes. 
Code can then be reused by subclassing these `Maker`s via standard Python inheritance.\n" }, { "metadata": { @@ -170,6 +174,8 @@ "source": [ "from dataclasses import dataclass\n", "from jobflow import Maker\n", + "\n", + "\n", "@dataclass\n", "class MyMaker(Maker):\n", " name: str = \"My Maker\"\n", @@ -178,30 +184,35 @@ " @job\n", " def make(self, my_parameter: int):\n", " return my_parameter * self.scaling\n", - " \n", + "\n", + "\n", "@dataclass\n", "class MyInheritedMaker(MyMaker):\n", " name: str = \"My inherited Maker\"\n", "\n", - "job1=MyMaker().make(my_parameter=1)\n", - "job2=MyInheritedMaker().make(my_parameter=job1.output)" + "\n", + "job1 = MyMaker().make(my_parameter=1)\n", + "job2 = MyInheritedMaker().make(my_parameter=job1.output)" ] }, { "metadata": {}, "cell_type": "markdown", - "source": [ - "When faced with dynamic workflows (where the number of jobs is unclear before execution) a `Response` object instead of a `job` or `Flow` object can be returned. There are several options to insert the new list of jobs into the `Response` object.\n", - "We will now use this knowledge to implement the Quantum Espresso-related tasks for calculating a \"Energy vs. Volume\" curve. It’s important to point out that this is only a basic implementation, and further extensions towards data validation or for nicer user experience can be added. " - ] + "source": "For dynamic workflows (where the number of jobs is not known before execution), a `job` can return a `Response` object instead of a `job` or `Flow` object. A `Response` offers several options to insert the new jobs into the workflow, e.g., via its `replace`, `addition`, or `detour` arguments." }, { "metadata": {}, "cell_type": "markdown", "source": [ - "> ℹ️ Note: all outputs of a job need to be transformable into a JSON-like format. Therefore, we use a pymatgen `Structure` object here instead of an ase `Atoms` object. `Structure` inherits from the Python package monty’s MSONable class allowing an easy serialization into a `dict`. 
In addition, commands to execute jobs might be set in such a way that users can easily adapt them in a configuration file." + "## Implementation of Quantum Espresso Workflow\n", + "We will use this knowledge to implement the Quantum Espresso-related tasks for calculating an \"Energy vs. Volume\" curve. It’s important to note that this is only a basic implementation, and further extensions towards data validation or a simplified user experience can be added. In atomate2, for example, the run commands of electronic-structure codes can typically be set in configuration files." ] }, { + "metadata": {}, + "cell_type": "markdown", + "source": "> ℹ️ Note: all outputs of a job need to be transformable into a JSON-like format. Therefore, we use a pymatgen `MSONAtoms` object here instead of a plain ASE `Atoms` object. `MSONAtoms` inherits from the `MSONable` class of the Python package `monty`, which allows easy serialization into a `dict`. " + }, { "cell_type": "markdown", "source": [ @@ -229,9 +240,7 @@ }, { "cell_type": "markdown", - "source": [ - "Then we import tools for data plotting and mathematical operations and manipulation." - ], + "source": "Then, we import tools for plotting, mathematical operations, and data manipulation.", "metadata": { "collapsed": false, "ExecuteTime": { @@ -281,9 +290,7 @@ }, { "cell_type": "markdown", - "source": [ - "From ADIS we take the QE XML output parser." - ], + "source": "Additionally, we import the QE XML output parser written for this tutorial.", "metadata": { "collapsed": false } @@ -305,9 +312,7 @@ }, { "cell_type": "markdown", - "source": [ - "As we want to calculate a (E, V) curve with QE, we start by implementing a function to apply a certain strain to the original structure." 
- ], + "source": "To calculate an (E, V) curve with QE, we implement a function that applies a given volumetric strain to the original structure.", "metadata": { "collapsed": false } @@ -320,8 +325,7 @@ " for strain in strain_lst:\n", " structure_strain = structure.copy()\n", " structure_strain.set_cell(\n", - " structure_strain.cell * strain**(1/3), \n", - " scale_atoms=True\n", + " structure_strain.cell * strain ** (1 / 3), scale_atoms=True\n", " )\n", " structure_lst.append(structure_strain)\n", " return structure_lst" @@ -375,21 +379,21 @@ "cell_type": "code", "source": [ "def write_input(input_dict, working_directory=\".\"):\n", - " filename = os.path.join(working_directory, 'input.pwi')\n", + " filename = os.path.join(working_directory, \"input.pwi\")\n", " os.makedirs(working_directory, exist_ok=True)\n", " write(\n", - " filename=filename, \n", - " images=input_dict[\"structure\"], \n", - " Crystal=True, \n", - " kpts=input_dict[\"kpts\"], \n", + " filename=filename,\n", + " images=input_dict[\"structure\"],\n", + " Crystal=True,\n", + " kpts=input_dict[\"kpts\"],\n", " input_data={\n", - " 'calculation': input_dict[\"calculation\"],\n", - " 'occupations': 'smearing',\n", - " 'degauss': input_dict[\"smearing\"],\n", - " }, \n", + " \"calculation\": input_dict[\"calculation\"],\n", + " \"occupations\": \"smearing\",\n", + " \"degauss\": input_dict[\"smearing\"],\n", + " },\n", " pseudopotentials=input_dict[\"pseudopotentials\"],\n", - " tstress=True, \n", - " tprnfor=True\n", + " tstress=True,\n", + " tprnfor=True,\n", " )" ], "metadata": { @@ -414,11 +418,11 @@ "cell_type": "code", "source": [ "def collect_output(working_directory=\".\"):\n", - " output = parse_pw(os.path.join(working_directory, 'pwscf.xml'))\n", + " output = parse_pw(os.path.join(working_directory, \"pwscf.xml\"))\n", " return {\n", - " \"structure\": output['ase_structure'],\n", + " \"structure\": output[\"ase_structure\"],\n", " \"energy\": output[\"energy\"],\n", - " \"volume\": 
output['ase_structure'].get_volume(),\n", + " \"volume\": output[\"ase_structure\"].get_volume(),\n", " }" ], "metadata": { @@ -454,6 +458,7 @@ " \"\"\"\n", " Writes an QE input based on an input_dict\n", " \"\"\"\n", + "\n", " def __init__(self, input_dict):\n", " self.input_dict = input_dict\n", "\n", @@ -466,32 +471,36 @@ " \"\"\"\n", " Generates an QE input based on the format given in QEInputSet.\n", " \"\"\"\n", - " pseudopotentials: dict = field(default_factory=lambda: {\"Al\": \"Al.pbe-n-kjpaw_psl.1.0.0.UPF\"})\n", - " kpts: tuple = (3,3,3)\n", + "\n", + " pseudopotentials: dict = field(\n", + " default_factory=lambda: {\"Al\": \"Al.pbe-n-kjpaw_psl.1.0.0.UPF\"}\n", + " )\n", + " kpts: tuple = (3, 3, 3)\n", " calculation: str = \"vc-relax\"\n", " smearing: float = 0.02\n", - " \n", "\n", " def get_input_set(self, structure) -> QEInputSet:\n", "\n", - " input_dict={\"structure\":structure,\n", - " \"pseudopotentials\":self.pseudopotentials, \n", + " input_dict = {\n", + " \"structure\": structure,\n", + " \"pseudopotentials\": self.pseudopotentials,\n", " \"kpts\": self.kpts,\n", " \"calculation\": self.calculation,\n", " \"smearing\": self.smearing,\n", - " }\n", + " }\n", " return QEInputSet(input_dict=input_dict)\n", "\n", + "\n", "@dataclass\n", "class QEInputStaticGenerator(QEInputGenerator):\n", " calculation: str = \"scf\"\n", "\n", - " \n", - "def write_qe_input_set(structure, input_set_generator=QEInputGenerator(), working_directory=\".\"):\n", + "\n", + "def write_qe_input_set(\n", + " structure, input_set_generator=QEInputGenerator(), working_directory=\".\"\n", + "):\n", " qis = input_set_generator.get_input_set(structure=structure)\n", - " qis.write_input(working_directory=working_directory)\n", - " \n", - " " + " qis.write_input(working_directory=working_directory)" ], "metadata": { "collapsed": false, @@ -521,21 +530,28 @@ "from typing import Any, Optional, Union\n", "\n", "\n", - "QE_CMD= \"mpirun -np 1 pw.x -in input.pwi > output.pwo\"\n", + 
"QE_CMD = \"mpirun -np 1 pw.x -in input.pwi > output.pwo\"\n", + "\n", + "\n", "def run_qe(qe_cmd=QE_CMD):\n", " subprocess.check_output(qe_cmd, shell=True, universal_newlines=True)\n", - " \n", - " \n", + "\n", + "\n", "class QETaskDoc(BaseModel):\n", " structure: Optional[MSONAtoms] = Field(None, description=\"ASE structure\")\n", " energy: Optional[float] = Field(None, description=\"DFT energy in eV\")\n", " volume: Optional[float] = Field(None, description=\"volume in Angstrom^3\")\n", - " \n", + "\n", " @classmethod\n", " def from_directory(cls, working_directory):\n", - " output=collect_output(working_directory=working_directory)\n", + " output = collect_output(working_directory=working_directory)\n", " # structure object needs to be serializable, i.e., we need an additional transformation\n", - " return cls(structure=MSONAtoms(output[\"structure\"]), energy=output[\"energy\"], volume=output[\"volume\"])\n", + " return cls(\n", + " structure=MSONAtoms(output[\"structure\"]),\n", + " energy=output[\"energy\"],\n", + " volume=output[\"volume\"],\n", + " )\n", + "\n", "\n", "@dataclass\n", "class BaseQEMaker(Maker):\n", @@ -554,9 +570,7 @@ " input_set_generator: QEInputGenerator = field(default_factory=QEInputGenerator)\n", "\n", " @job(output_schema=QETaskDoc)\n", - " def make(\n", - " self, structure\n", - " ) -> QETaskDoc:\n", + " def make(self, structure) -> QETaskDoc:\n", " \"\"\"\n", " Run a QE calculation.\n", "\n", @@ -564,7 +578,7 @@ " ----------\n", " structure : MSONAtoms|Atoms\n", " An Atoms or MSONAtoms object.\n", - " \n", + "\n", " Returns\n", " -------\n", " Output of a QE calculation\n", @@ -573,16 +587,18 @@ "\n", " # write qe input files\n", " write_qe_input_set(\n", - " structure=structure, input_set_generator=self.input_set_generator)\n", + " structure=structure, input_set_generator=self.input_set_generator\n", + " )\n", "\n", " # run the QE software\n", " run_qe()\n", "\n", " # parse QE outputs in form of a task document\n", - " 
task_doc=QETaskDoc.from_directory(\".\")\n", - " \n", + " task_doc = QETaskDoc.from_directory(\".\")\n", + "\n", " return task_doc\n", "\n", + "\n", "@dataclass\n", "class StaticQEMaker(BaseQEMaker):\n", " \"\"\"\n", @@ -597,7 +613,9 @@ " \"\"\"\n", "\n", " name: str = \"static qe job\"\n", - " input_set_generator: QEInputGenerator = field(default_factory=QEInputStaticGenerator)\n" + " input_set_generator: QEInputGenerator = field(\n", + " default_factory=QEInputStaticGenerator\n", + " )" ], "metadata": { "ExecuteTime": { @@ -752,11 +770,12 @@ "from jobflow import job, Response, Flow, run_locally\n", "\n", "# set up QE PP env\n", - "os.environ['ESPRESSO_PSEUDO'] = f\"{os.getcwd()}/espresso/pseudo\"\n", + "os.environ[\"ESPRESSO_PSEUDO\"] = f\"{os.getcwd()}/espresso/pseudo\"\n", + "\n", "\n", "@job\n", "def get_ev_curve(structure, strain_lst):\n", - " structures=generate_structures(structure,strain_lst=strain_lst)\n", + " structures = generate_structures(structure, strain_lst=strain_lst)\n", " jobs = []\n", " volumes = []\n", " energies = []\n", @@ -765,22 +784,33 @@ " jobs.append(new_job)\n", " volumes.append(new_job.output.volume)\n", " energies.append(new_job.output.energy)\n", - " return Response(replace=Flow(jobs, output={\"energies\": energies, \"volumes\": volumes}))\n", - " \n", + " return Response(\n", + " replace=Flow(jobs, output={\"energies\": energies, \"volumes\": volumes})\n", + " )\n", + "\n", + "\n", "@job\n", "def plot_energy_volume_curve_job(volume_lst, energy_lst):\n", " plot_energy_volume_curve(volume_lst=volume_lst, energy_lst=energy_lst)\n", "\n", - "structure = bulk('Al', a=4.15, cubic=True)\n", + "\n", + "structure = bulk(\"Al\", a=4.15, cubic=True)\n", "relax = BaseQEMaker().make(structure=MSONAtoms(structure))\n", - "ev_curve_data = get_ev_curve(relax.output.structure, strain_lst=np.linspace(0.9, 1.1, 5))\n", + "ev_curve_data = get_ev_curve(\n", + " relax.output.structure, strain_lst=np.linspace(0.9, 1.1, 5)\n", + ")\n", "# structure 
optimization job and (E, V) curve data job connected via relax.output\n", - "plot_curve = plot_energy_volume_curve_job(volume_lst=ev_curve_data.output[\"volumes\"], energy_lst=ev_curve_data.output[\"energies\"])\n", + "plot_curve = plot_energy_volume_curve_job(\n", + " volume_lst=ev_curve_data.output[\"volumes\"],\n", + " energy_lst=ev_curve_data.output[\"energies\"],\n", + ")\n", "# (E, V) curve data job and plotting the curve job connected via ev_curve_data.output\n", - "qe_flow = [relax, ev_curve_data, plot_curve] \n", + "qe_flow = [relax, ev_curve_data, plot_curve]\n", "qe_fw = Flow(qe_flow)\n", "# qe_flow is the QE flow that consists of the job for structural optimization, calculating the (E, V) curve data points and plotting the curve\n", - "run_locally(qe_fw, create_folders=True) # order of the jobs in the flow is determined by connectivity\n", + "run_locally(\n", + " qe_fw, create_folders=True\n", + ") # order of the jobs in the flow is determined by connectivity\n", "\n", "graph = to_mermaid(qe_fw, show_flow_boxes=True)\n", "mm(graph)" @@ -817,7 +847,7 @@ ], "source": [ "mm(\n", - "\"\"\"\n", + " \"\"\"\n", "flowchart TD\n", " 6883bfe0-2b20-49de-92df-c166d6f91dbc(base qe job) --> |output.structure| dfc9a5cd-fb91-4582-b6c6-c42b4c65cb83(get_ev_curve)\n", " dfc9a5cd-fb91-4582-b6c6-c42b4c65cb83(get_ev_curve) --> |'volumes', 'energies'| 92d14a25-bb90-4c86-b970-af05db90550e(plot_energy_volume_curve_job)\n", From d3d4a6c45869fd2181453869afb40c8f80367405 Mon Sep 17 00:00:00 2001 From: JaGeo Date: Wed, 3 Jul 2024 14:01:42 +0200 Subject: [PATCH 2/5] Additional text iteration and code improvements --- jobflow.ipynb | 413 +++++++++++++++++++++++++------------------------- 1 file changed, 204 insertions(+), 209 deletions(-) diff --git a/jobflow.ipynb b/jobflow.ipynb index 65934c8..3e7f9eb 100644 --- a/jobflow.ipynb +++ b/jobflow.ipynb @@ -31,7 +31,7 @@ { "metadata": {}, "cell_type": "markdown", - "source": 
"[`jobflow`](https://materialsproject.github.io/jobflow/index.html) and [`atomate2`](https://materialsproject.github.io/atomate2/index.html) are key packages of the [Materials Project](https://materialsproject.org/) . `jobflow` was especially designed to simplify the execution of dynamic workflows -- when the actual number of jobs is dynamically determined upon runtime instead of being statically fixed before running the workflow(s). `jobflow`'s overall flexibility allows for building workflows that go beyond the usage in material science. `jobflow` forms the basis of `atomate2`. `atomate2` implements data generation workflows in the context of materials science and will be responsible for data generation in the Materials Project in the future. " + "source": "[`jobflow`](https://materialsproject.github.io/jobflow/index.html) and [`atomate2`](https://materialsproject.github.io/atomate2/index.html) are key packages of the [Materials Project](https://materialsproject.org/). `jobflow` was specifically designed to simplify the execution of dynamic workflows, i.e., workflows in which the actual number of jobs is determined at runtime instead of being fixed before the workflow is run. Its overall flexibility allows for building workflows that go beyond materials science. `jobflow` forms the basis of `atomate2`, which implements data generation workflows for materials science and will be responsible for data generation in the Materials Project in the future. 
" }, { "metadata": {}, @@ -52,28 +52,26 @@ }, { "metadata": { - "jupyter": { - "is_executing": true - }, "ExecuteTime": { - "end_time": "2024-06-30T21:48:46.425490304Z", - "start_time": "2024-06-30T21:48:45.207125865Z" + "end_time": "2024-07-03T11:59:43.383373Z", + "start_time": "2024-07-03T11:59:42.489756Z" } }, "cell_type": "code", "source": [ + "from __future__ import annotations\n", "from jobflow import job\n", "\n", "\n", "@job\n", - "def my_function(my_parameter: int):\n", + "def my_function(my_parameter: int) -> int:\n", " return my_parameter\n", "\n", "\n", "job1 = my_function(my_parameter=1)" ], "outputs": [], - "execution_count": 2 + "execution_count": 1 }, { "metadata": {}, @@ -83,31 +81,31 @@ { "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:48:46.430179016Z", - "start_time": "2024-06-30T21:48:46.428480615Z" + "end_time": "2024-07-03T11:59:43.391580Z", + "start_time": "2024-07-03T11:59:43.386059Z" } }, "cell_type": "code", - "outputs": [], - "execution_count": 3, "source": [ "from jobflow import job, Flow\n", "\n", "\n", "@job\n", - "def my_function(my_parameter: int):\n", + "def my_function(my_parameter: int) -> int:\n", " return my_parameter\n", "\n", "\n", "@job\n", - "def my_second_function(my_parameter: int):\n", + "def my_second_function(my_parameter: int) -> int:\n", " return my_parameter\n", "\n", "\n", "job1 = my_function(my_parameter=1)\n", "job2 = my_second_function(job1.output)\n", "flow = Flow([job1, job2], job2.output)" - ] + ], + "outputs": [], + "execution_count": 2 }, { "cell_type": "markdown", @@ -120,17 +118,6 @@ }, { "cell_type": "code", - "execution_count": 4, - "outputs": [ - { - "data": { - "text/html": "", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ], "source": [ "import base64\n", "from IPython.display import Image, display\n", @@ -151,10 +138,25 @@ "metadata": { "collapsed": false, "ExecuteTime": { - "end_time": "2024-06-30T21:48:48.253403373Z", - "start_time": 
"2024-06-30T21:48:48.050654446Z" + "end_time": "2024-07-03T11:59:43.810036Z", + "start_time": "2024-07-03T11:59:43.392933Z" } - } + }, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "execution_count": 3 }, { "metadata": {}, @@ -164,13 +166,11 @@ { "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:48:50.046004482Z", - "start_time": "2024-06-30T21:48:50.036716636Z" + "end_time": "2024-07-03T11:59:43.817918Z", + "start_time": "2024-07-03T11:59:43.811514Z" } }, "cell_type": "code", - "outputs": [], - "execution_count": 5, "source": [ "from dataclasses import dataclass\n", "from jobflow import Maker\n", @@ -182,7 +182,7 @@ " scaling: float = 1.1\n", "\n", " @job\n", - " def make(self, my_parameter: int):\n", + " def make(self, my_parameter: int) -> float:\n", " return my_parameter * self.scaling\n", "\n", "\n", @@ -193,7 +193,9 @@ "\n", "job1 = MyMaker().make(my_parameter=1)\n", "job2 = MyInheritedMaker().make(my_parameter=job1.output)" - ] + ], + "outputs": [], + "execution_count": 4 }, { "metadata": {}, @@ -215,9 +217,7 @@ }, { "cell_type": "markdown", - "source": [ - "For building the Quantum Espresso (QE)-related tasks, we start by importing essential tools to be able to create and navigate directories (`os`) and execute programs like the QE binary (`subprocess`). `pydantic` is very convenient for managing all kinds of data types." - ], + "source": "For building the Quantum Espresso (QE)-related tasks, we start by importing essential tools to create and navigate directories (`os`) and to execute programs such as the QE binary (`subprocess`). `pydantic` is very convenient for validating all kinds of data types. 
Pydantic models can also be used directly for API development, e.g., with [FastAPI](https://fastapi.tiangolo.com/).", "metadata": { "collapsed": false } @@ -231,12 +231,12 @@ ], "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:48:52.311875326Z", - "start_time": "2024-06-30T21:48:52.302010359Z" + "end_time": "2024-07-03T11:59:43.822983Z", + "start_time": "2024-07-03T11:59:43.820356Z" } }, - "execution_count": 6, - "outputs": [] + "outputs": [], + "execution_count": 5 }, { "cell_type": "markdown", @@ -257,12 +257,12 @@ ], "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:48:53.766161056Z", - "start_time": "2024-06-30T21:48:53.587585176Z" + "end_time": "2024-07-03T11:59:44.103958Z", + "start_time": "2024-07-03T11:59:43.824324Z" } }, - "execution_count": 7, - "outputs": [] + "outputs": [], + "execution_count": 6 }, { "cell_type": "markdown", @@ -281,12 +281,12 @@ ], "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:48:54.880093570Z", - "start_time": "2024-06-30T21:48:54.760953766Z" + "end_time": "2024-07-03T11:59:44.490896Z", + "start_time": "2024-07-03T11:59:44.105647Z" } }, - "execution_count": 8, - "outputs": [] + "outputs": [], + "execution_count": 7 }, { "cell_type": "markdown", @@ -297,18 +297,18 @@ }, { "cell_type": "code", - "execution_count": 9, - "outputs": [], "source": [ "from adis_tools.parsers import parse_pw" ], "metadata": { "collapsed": false, "ExecuteTime": { - "end_time": "2024-06-30T21:48:55.896681674Z", - "start_time": "2024-06-30T21:48:55.826305046Z" + "end_time": "2024-07-03T11:59:44.614351Z", + "start_time": "2024-07-03T11:59:44.492579Z" } - } + }, + "outputs": [], + "execution_count": 8 }, { "cell_type": "markdown", @@ -320,7 +320,10 @@ { "cell_type": "code", "source": [ - "def generate_structures(structure, strain_lst): # structure should be of ase Atoms type\n", + "from pymatgen.io.ase import MSONAtoms\n", + "from ase import Atoms\n", + "def generate_structures(structure: Atoms, strain_lst: 
list[float]): # structure should be of ase Atoms type\n", + " structure = MSONAtoms(structure)\n", " structure_lst = []\n", " for strain in strain_lst:\n", " structure_strain = structure.copy()\n", @@ -332,12 +335,12 @@ ], "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:48:56.861445479Z", - "start_time": "2024-06-30T21:48:56.853288871Z" + "end_time": "2024-07-03T11:59:45.187502Z", + "start_time": "2024-07-03T11:59:44.615864Z" } }, - "execution_count": 10, - "outputs": [] + "outputs": [], + "execution_count": 9 }, { "cell_type": "markdown", @@ -351,7 +354,7 @@ { "cell_type": "code", "source": [ - "def plot_energy_volume_curve(volume_lst, energy_lst):\n", + "def plot_energy_volume_curve(volume_lst: list[float], energy_lst: list[float]):\n", " plt.plot(volume_lst, energy_lst)\n", " plt.xlabel(\"Volume\")\n", " plt.ylabel(\"Energy\")\n", @@ -359,12 +362,12 @@ ], "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:48:57.897136033Z", - "start_time": "2024-06-30T21:48:57.887720116Z" + "end_time": "2024-07-03T11:59:45.196059Z", + "start_time": "2024-07-03T11:59:45.190692Z" } }, - "execution_count": 11, - "outputs": [] + "outputs": [], + "execution_count": 10 }, { "cell_type": "markdown", @@ -378,7 +381,7 @@ { "cell_type": "code", "source": [ - "def write_input(input_dict, working_directory=\".\"):\n", + "def write_input(input_dict: dict, working_directory: str = \".\"):\n", " filename = os.path.join(working_directory, \"input.pwi\")\n", " os.makedirs(working_directory, exist_ok=True)\n", " write(\n", @@ -398,12 +401,12 @@ ], "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:48:58.798035578Z", - "start_time": "2024-06-30T21:48:58.791457316Z" + "end_time": "2024-07-03T11:59:45.203075Z", + "start_time": "2024-07-03T11:59:45.198354Z" } }, - "execution_count": 12, - "outputs": [] + "outputs": [], + "execution_count": 11 }, { "cell_type": "markdown", @@ -420,19 +423,19 @@ "def collect_output(working_directory=\".\"):\n", " output = 
parse_pw(os.path.join(working_directory, \"pwscf.xml\"))\n", " return {\n", - " \"structure\": output[\"ase_structure\"],\n", + " \"structure\": MSONAtoms(output[\"ase_structure\"]),\n", " \"energy\": output[\"energy\"],\n", " \"volume\": output[\"ase_structure\"].get_volume(),\n", " }" ], "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:48:59.718169308Z", - "start_time": "2024-06-30T21:48:59.711684823Z" + "end_time": "2024-07-03T11:59:45.208945Z", + "start_time": "2024-07-03T11:59:45.205071Z" } }, - "execution_count": 13, - "outputs": [] + "outputs": [], + "execution_count": 12 }, { "cell_type": "markdown", @@ -497,20 +500,20 @@ "\n", "\n", "def write_qe_input_set(\n", - " structure, input_set_generator=QEInputGenerator(), working_directory=\".\"\n", - "):\n", + " structure: Atoms, input_set_generator: InputGenerator = QEInputGenerator(), working_directory: str = \".\"\n", + ") -> None:\n", " qis = input_set_generator.get_input_set(structure=structure)\n", " qis.write_input(working_directory=working_directory)" ], "metadata": { "collapsed": false, "ExecuteTime": { - "end_time": "2024-06-30T21:49:00.757248540Z", - "start_time": "2024-06-30T21:49:00.748276484Z" + "end_time": "2024-07-03T11:59:45.220507Z", + "start_time": "2024-07-03T11:59:45.210945Z" } }, - "execution_count": 14, - "outputs": [] + "outputs": [], + "execution_count": 13 }, { "cell_type": "markdown", @@ -526,14 +529,13 @@ "source": [ "from dataclasses import dataclass, field\n", "from jobflow import job, Maker\n", - "from pymatgen.io.ase import MSONAtoms\n", "from typing import Any, Optional, Union\n", "\n", "\n", "QE_CMD = \"mpirun -np 1 pw.x -in input.pwi > output.pwo\"\n", "\n", "\n", - "def run_qe(qe_cmd=QE_CMD):\n", + "def run_qe(qe_cmd: str = QE_CMD):\n", " subprocess.check_output(qe_cmd, shell=True, universal_newlines=True)\n", "\n", "\n", @@ -547,7 +549,7 @@ " output = collect_output(working_directory=working_directory)\n", " # structure object needs to be serializable, i.e., we 
need an 
additional transformation\n", " return cls(\n", - " structure=MSONAtoms(output[\"structure\"]),\n", + " structure=output[\"structure\"],\n", " energy=output[\"energy\"],\n", " volume=output[\"volume\"],\n", " )\n", @@ -570,7 +572,7 @@ " input_set_generator: QEInputGenerator = field(default_factory=QEInputGenerator)\n", "\n", " @job(output_schema=QETaskDoc)\n", - " def make(self, structure) -> QETaskDoc:\n", + " def make(self, structure: Atoms | MSONAtoms) -> QETaskDoc:\n", " \"\"\"\n", " Run a QE calculation.\n", "\n", @@ -619,209 +621,207 @@ ], "metadata": { "ExecuteTime": { - "end_time": "2024-06-30T21:49:02.075801908Z", - "start_time": "2024-06-30T21:49:01.757212927Z" + "end_time": "2024-07-03T11:59:45.236298Z", + "start_time": "2024-07-03T11:59:45.224011Z" } }, - "execution_count": 15, - "outputs": [] + "outputs": [], + "execution_count": 14 }, { "cell_type": "markdown", - "source": [ - "Finally, it's time to orchestrate all functions and classes together in an actual flow. Note how the number of jobs in `get_ev_curve` can be flexibly controlled by using `strain_lst` and therefore we use `Response` to handle the job output. By making `get_ev_curve` and `plot_energy_volume_curve_job` into `job` objects using the `@job` decorator, we ensure that first all the (E, V) data points are calculated before they are plotted. The `qe_flow` contains the list of the jobs that need to be executed in this workflow. The jobs are connected by the respective `job.output` objects that also ensures the correct order in executing the jobs." - ], + "source": "Finally, it's time to orchestrate all functions and classes together in an actual flow. Note that the number of jobs created in `get_ev_curve` is determined by the length of `strain_lst`; therefore, `get_ev_curve` returns a `Response` object that replaces it with a dynamically generated `Flow` of static QE jobs. 
By making `get_ev_curve` and `plot_energy_volume_curve_job` into `job` objects using the `@job` decorator, we ensure that first all the (E, V) data points are calculated before they are plotted. The `qe_flow` contains the list of the jobs that need to be executed in this workflow. The jobs are connected by the respective `job.output` objects that also ensures the correct order in executing the jobs.", "metadata": { "collapsed": false } }, { "cell_type": "code", - "execution_count": 16, + "source": [ + "from jobflow import job, Response, Flow, run_locally\n", + "\n", + "# set up QE PP env\n", + "os.environ[\"ESPRESSO_PSEUDO\"] = f\"{os.getcwd()}/espresso/pseudo\"\n", + "\n", + "\n", + "@job\n", + "def get_ev_curve(structure: Atoms|MSONAtoms, strain_lst: list(float)):\n", + " structures = generate_structures(structure, strain_lst=strain_lst)\n", + " jobs = []\n", + " volumes = []\n", + " energies = []\n", + " for istructure in range(len(strain_lst)):\n", + " new_job = StaticQEMaker().make(structures[istructure])\n", + " jobs.append(new_job)\n", + " volumes.append(new_job.output.volume)\n", + " energies.append(new_job.output.energy)\n", + " return Response(\n", + " replace=Flow(jobs, output={\"energies\": energies, \"volumes\": volumes})\n", + " )\n", + "\n", + "\n", + "@job\n", + "def plot_energy_volume_curve_job(volume_lst: list[float], energy_lst: list[float]):\n", + " plot_energy_volume_curve(volume_lst=volume_lst, energy_lst=energy_lst)\n", + "\n", + "\n", + "structure = bulk(\"Al\", a=4.15, cubic=True)\n", + "relax = BaseQEMaker().make(structure=MSONAtoms(structure))\n", + "ev_curve_data = get_ev_curve(\n", + " relax.output.structure, strain_lst=np.linspace(0.9, 1.1, 5)\n", + ")\n", + "# structure optimization job and (E, V) curve data job connected via relax.output\n", + "plot_curve = plot_energy_volume_curve_job(\n", + " volume_lst=ev_curve_data.output[\"volumes\"],\n", + " energy_lst=ev_curve_data.output[\"energies\"],\n", + ")\n", + "# (E, V) curve data job
and plotting the curve job connected via ev_curve_data.output\n", + "qe_flow = [relax, ev_curve_data, plot_curve]\n", + "qe_fw = Flow(qe_flow)\n", + "# qe_flow is the QE flow that consists of the job for structural optimization, calculating the (E, V) curve data points and plotting the curve\n", + "run_locally(\n", + " qe_fw, create_folders=True\n", + ") # order of the jobs in the flow is determined by connectivity\n", + "\n", + "graph = to_mermaid(qe_fw, show_flow_boxes=True)\n", + "mm(graph)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2024-07-03T12:01:06.585706Z", + "start_time": "2024-07-03T11:59:45.237648Z" + } + }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "2024-06-30 23:49:02,706 INFO Started executing jobs locally\n", - "2024-06-30 23:49:02,710 INFO Starting job - base qe job (6883bfe0-2b20-49de-92df-c166d6f91dbc)\n" + "2024-07-03 13:59:45,257 INFO Started executing jobs locally\n", + "2024-07-03 13:59:45,266 INFO Starting job - base qe job (fb7e4f95-15fe-4ae0-aa3b-3d0f5dc8b65a)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "Authorization required, but no authorization protocol specified\n", - "Authorization required, but no authorization protocol specified\n" + "Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ - "2024-06-30 23:49:32,255 INFO Finished job - base qe job (6883bfe0-2b20-49de-92df-c166d6f91dbc)\n", - "2024-06-30 23:49:32,256 INFO Starting job - get_ev_curve (dfc9a5cd-fb91-4582-b6c6-c42b4c65cb83)\n", - "2024-06-30 23:49:32,268 INFO Finished job - get_ev_curve (dfc9a5cd-fb91-4582-b6c6-c42b4c65cb83)\n", - "2024-06-30 23:49:32,271 INFO Starting job - static qe job (0a57429a-2360-4293-8f2a-5323a1173570)\n" + "2024-07-03 14:00:31,075 INFO Finished job - base qe job (fb7e4f95-15fe-4ae0-aa3b-3d0f5dc8b65a)\n", + "2024-07-03 14:00:31,078 INFO Starting job - get_ev_curve 
(8df443c4-f690-4c1b-9126-8455b6458d9d)\n", + "2024-07-03 14:00:31,099 INFO Finished job - get_ev_curve (8df443c4-f690-4c1b-9126-8455b6458d9d)\n", + "2024-07-03 14:00:31,105 INFO Starting job - static qe job (8c2ce097-6b17-43dd-9f03-d4477338688b)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "Authorization required, but no authorization protocol specified\n", - "Authorization required, but no authorization protocol specified\n" + "Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ - "2024-06-30 23:49:35,977 INFO Finished job - static qe job (0a57429a-2360-4293-8f2a-5323a1173570)\n", - "2024-06-30 23:49:35,979 INFO Starting job - static qe job (71996726-fb51-4a1f-8585-373a470bd16c)\n" + "2024-07-03 14:00:37,026 INFO Finished job - static qe job (8c2ce097-6b17-43dd-9f03-d4477338688b)\n", + "2024-07-03 14:00:37,029 INFO Starting job - static qe job (046ab976-1dc9-4b45-816d-86297a209912)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "Authorization required, but no authorization protocol specified\n", - "Authorization required, but no authorization protocol specified\n" + "Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ - "2024-06-30 23:49:39,977 INFO Finished job - static qe job (71996726-fb51-4a1f-8585-373a470bd16c)\n", - "2024-06-30 23:49:39,982 INFO Starting job - static qe job (2cf7b6c9-1f19-4b2f-b85f-df201d546c26)\n" + "2024-07-03 14:00:43,878 INFO Finished job - static qe job (046ab976-1dc9-4b45-816d-86297a209912)\n", + "2024-07-03 14:00:43,883 INFO Starting job - static qe job (2e1199a8-12c0-4df0-afdc-2543f0a1652b)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "Authorization required, but no authorization protocol specified\n", - "Authorization required, but no authorization protocol specified\n" + "Note: The following 
floating-point exceptions are signalling: IEEE_INVALID_FLAG\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ - "2024-06-30 23:49:44,274 INFO Finished job - static qe job (2cf7b6c9-1f19-4b2f-b85f-df201d546c26)\n", - "2024-06-30 23:49:44,275 INFO Starting job - static qe job (a04f788d-9bc1-4453-9897-5d0b4e9321d3)\n" + "2024-07-03 14:00:50,753 INFO Finished job - static qe job (2e1199a8-12c0-4df0-afdc-2543f0a1652b)\n", + "2024-07-03 14:00:50,759 INFO Starting job - static qe job (b6118649-654f-445c-ba44-091c6bf6a5af)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "Authorization required, but no authorization protocol specified\n", - "Authorization required, but no authorization protocol specified\n" + "Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ - "2024-06-30 23:49:49,249 INFO Finished job - static qe job (a04f788d-9bc1-4453-9897-5d0b4e9321d3)\n", - "2024-06-30 23:49:49,254 INFO Starting job - static qe job (0ed3d090-bd30-49da-96ed-143b8e7f5b4a)\n" + "2024-07-03 14:00:58,591 INFO Finished job - static qe job (b6118649-654f-445c-ba44-091c6bf6a5af)\n", + "2024-07-03 14:00:58,598 INFO Starting job - static qe job (ed35e52e-fd72-495a-ae7f-4f2f949d8031)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ - "Authorization required, but no authorization protocol specified\n", - "Authorization required, but no authorization protocol specified\n" + "Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ - "2024-06-30 23:49:54,070 INFO Finished job - static qe job (0ed3d090-bd30-49da-96ed-143b8e7f5b4a)\n", - "2024-06-30 23:49:54,071 INFO Starting job - store_inputs (dfc9a5cd-fb91-4582-b6c6-c42b4c65cb83, 2)\n", - "2024-06-30 23:49:54,073 INFO Finished job - store_inputs (dfc9a5cd-fb91-4582-b6c6-c42b4c65cb83, 2)\n", - "2024-06-30 23:49:54,074 INFO 
Starting job - plot_energy_volume_curve_job (92d14a25-bb90-4c86-b970-af05db90550e)\n", - "2024-06-30 23:49:54,163 INFO Finished job - plot_energy_volume_curve_job (92d14a25-bb90-4c86-b970-af05db90550e)\n", - "2024-06-30 23:49:54,173 INFO Finished executing jobs locally\n" + "2024-07-03 14:01:06,332 INFO Finished job - static qe job (ed35e52e-fd72-495a-ae7f-4f2f949d8031)\n", + "2024-07-03 14:01:06,337 INFO Starting job - store_inputs (8df443c4-f690-4c1b-9126-8455b6458d9d, 2)\n", + "2024-07-03 14:01:06,340 INFO Finished job - store_inputs (8df443c4-f690-4c1b-9126-8455b6458d9d, 2)\n", + "2024-07-03 14:01:06,343 INFO Starting job - plot_energy_volume_curve_job (4bb64989-7cc3-4dc4-900f-19c10d959dfa)\n", + "2024-07-03 14:01:06,497 INFO Finished job - plot_energy_volume_curve_job (4bb64989-7cc3-4dc4-900f-19c10d959dfa)\n", + "2024-07-03 14:01:06,498 INFO Finished executing jobs locally\n" ] }, { "data": { - "text/html": "", - "text/plain": "" + "text/html": [ + "" + ], + "text/plain": [ + "" + ] }, "metadata": {}, "output_type": "display_data" }, { "data": { - "text/plain": "
", - "image/png": "" + "text/plain": [ + "
" + ], + "image/png": "" }, "metadata": {}, "output_type": "display_data" } ], - "source": [ - "from jobflow import job, Response, Flow, run_locally\n", - "\n", - "# set up QE PP env\n", - "os.environ[\"ESPRESSO_PSEUDO\"] = f\"{os.getcwd()}/espresso/pseudo\"\n", - "\n", - "\n", - "@job\n", - "def get_ev_curve(structure, strain_lst):\n", - " structures = generate_structures(structure, strain_lst=strain_lst)\n", - " jobs = []\n", - " volumes = []\n", - " energies = []\n", - " for istructure in range(len(strain_lst)):\n", - " new_job = StaticQEMaker().make(structures[istructure])\n", - " jobs.append(new_job)\n", - " volumes.append(new_job.output.volume)\n", - " energies.append(new_job.output.energy)\n", - " return Response(\n", - " replace=Flow(jobs, output={\"energies\": energies, \"volumes\": volumes})\n", - " )\n", - "\n", - "\n", - "@job\n", - "def plot_energy_volume_curve_job(volume_lst, energy_lst):\n", - " plot_energy_volume_curve(volume_lst=volume_lst, energy_lst=energy_lst)\n", - "\n", - "\n", - "structure = bulk(\"Al\", a=4.15, cubic=True)\n", - "relax = BaseQEMaker().make(structure=MSONAtoms(structure))\n", - "ev_curve_data = get_ev_curve(\n", - " relax.output.structure, strain_lst=np.linspace(0.9, 1.1, 5)\n", - ")\n", - "# structure optimization job and (E, V) curve data job connected via relax.output\n", - "plot_curve = plot_energy_volume_curve_job(\n", - " volume_lst=ev_curve_data.output[\"volumes\"],\n", - " energy_lst=ev_curve_data.output[\"energies\"],\n", - ")\n", - "# (E, V) curve data job and plotting the curve job connected via ev_curve_data.output\n", - "qe_flow = [relax, ev_curve_data, plot_curve]\n", - "qe_fw = Flow(qe_flow)\n", - "# qe_flow is the QE flow that consists of the job for structural optimization, calculating the (E, V) curve data points and plotting the curve\n", - "run_locally(\n", - " qe_fw, create_folders=True\n", - ") # order of the jobs in the flow is determined by connectivity\n", - "\n", - "graph = to_mermaid(qe_fw, 
show_flow_boxes=True)\n", - "mm(graph)" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2024-06-30T21:49:54.258057729Z", - "start_time": "2024-06-30T21:49:02.701155072Z" - } - } + "execution_count": 15 }, { "cell_type": "markdown", @@ -834,17 +834,6 @@ }, { "cell_type": "code", - "execution_count": 21, - "outputs": [ - { - "data": { - "text/html": "", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ], "source": [ "mm(\n", " \"\"\"\n", @@ -857,10 +846,25 @@ "metadata": { "collapsed": false, "ExecuteTime": { - "end_time": "2024-06-30T21:52:49.647846141Z", - "start_time": "2024-06-30T21:52:49.604021289Z" + "end_time": "2024-07-03T12:01:06.592117Z", + "start_time": "2024-07-03T12:01:06.587277Z" } - } + }, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "execution_count": 16 }, { "metadata": { @@ -898,20 +902,11 @@ "cell_type": "markdown", "source": [ "## Publication of the workflow\n", - "The `jobflow` infrastructure does not provide a dedicated platform for publishing a workflow currently. However, workflows related to computational materials science have been collected in the package `atomate2`. In addition, users can build their own package by relying on jobflow and share it as any Python project." + "The `jobflow` infrastructure does not provide a dedicated platform for publishing a workflow currently. However, workflows related to computational materials science have been collected in the package `atomate2`. In addition, users can build their own package by relying on jobflow and share it as a new Python-based program. Additional packages in materials science using `jobflow` exist." 
], "metadata": { "collapsed": false } - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [], - "metadata": { - "collapsed": false - } } ] } From 73b06f691cd26971495269446f08f60b222be61d Mon Sep 17 00:00:00 2001 From: JaGeo Date: Wed, 3 Jul 2024 15:17:35 +0200 Subject: [PATCH 3/5] reformatting --- jobflow.ipynb | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/jobflow.ipynb b/jobflow.ipynb index 3e7f9eb..8a5ce20 100644 --- a/jobflow.ipynb +++ b/jobflow.ipynb @@ -64,7 +64,7 @@ "\n", "\n", "@job\n", - "def my_function(my_parameter: int)->int:\n", + "def my_function(my_parameter: int) -> int:\n", " return my_parameter\n", "\n", "\n", @@ -91,12 +91,12 @@ "\n", "\n", "@job\n", - "def my_function(my_parameter: int)->int:\n", + "def my_function(my_parameter: int) -> int:\n", " return my_parameter\n", "\n", "\n", "@job\n", - "def my_second_function(my_parameter: int)->int:\n", + "def my_second_function(my_parameter: int) -> int:\n", " return my_parameter\n", "\n", "\n", @@ -182,7 +182,7 @@ " scaling: float = 1.1\n", "\n", " @job\n", - " def make(self, my_parameter: int)->float:\n", + " def make(self, my_parameter: int) -> float:\n", " return my_parameter * self.scaling\n", "\n", "\n", @@ -322,8 +322,12 @@ "source": [ "from pymatgen.io.ase import MSONAtoms\n", "from ase import Atoms\n", - "def generate_structures(structure: Atoms, strain_lst: list(float)): # structure should be of ase Atoms type\n", - " structure=MSONAtoms(structure)\n", + "\n", + "\n", + "def generate_structures(\n", + " structure: Atoms, strain_lst: list(float)\n", + "): # structure should be of ase Atoms type\n", + " structure = MSONAtoms(structure)\n", " structure_lst = []\n", " for strain in strain_lst:\n", " structure_strain = structure.copy()\n", @@ -354,7 +358,7 @@ { "cell_type": "code", "source": [ - "def plot_energy_volume_curve(volume_lst: list(float), energy_lst:list(float)):\n", + "def 
plot_energy_volume_curve(volume_lst: list(float), energy_lst: list(float)):\n", " plt.plot(volume_lst, energy_lst)\n", " plt.xlabel(\"Volume\")\n", " plt.ylabel(\"Energy\")\n", @@ -381,7 +385,7 @@ { "cell_type": "code", "source": [ - "def write_input(input_dict: dict, working_directory: str=\".\"):\n", + "def write_input(input_dict: dict, working_directory: str = \".\"):\n", " filename = os.path.join(working_directory, \"input.pwi\")\n", " os.makedirs(working_directory, exist_ok=True)\n", " write(\n", @@ -500,8 +504,10 @@ "\n", "\n", "def write_qe_input_set(\n", - " structure: Atoms, input_set_generator: InputGenerator=QEInputGenerator(), working_directory: str=\".\"\n", - ")->None:\n", + " structure: Atoms,\n", + " input_set_generator: InputGenerator = QEInputGenerator(),\n", + " working_directory: str = \".\",\n", + ") -> None:\n", " qis = input_set_generator.get_input_set(structure=structure)\n", " qis.write_input(working_directory=working_directory)" ], @@ -535,7 +541,7 @@ "QE_CMD = \"mpirun -np 1 pw.x -in input.pwi > output.pwo\"\n", "\n", "\n", - "def run_qe(qe_cmd: str=QE_CMD):\n", + "def run_qe(qe_cmd: str = QE_CMD):\n", " subprocess.check_output(qe_cmd, shell=True, universal_newlines=True)\n", "\n", "\n", @@ -572,7 +578,7 @@ " input_set_generator: QEInputGenerator = field(default_factory=QEInputGenerator)\n", "\n", " @job(output_schema=QETaskDoc)\n", - " def make(self, structure: Atoms|MSONAtoms) -> QETaskDoc:\n", + " def make(self, structure: Atoms | MSONAtoms) -> QETaskDoc:\n", " \"\"\"\n", " Run a QE calculation.\n", "\n", @@ -645,7 +651,7 @@ "\n", "\n", "@job\n", - "def get_ev_curve(structure: Atoms|MSONAtoms, strain_lst: list(float)):\n", + "def get_ev_curve(structure: Atoms | MSONAtoms, strain_lst: list[float]):\n", " structures = generate_structures(structure, strain_lst=strain_lst)\n", " jobs = []\n", " volumes = []\n", From 5f3c2000579d530476ca7815f722e89a0dd64a82 Mon Sep 17 00:00:00 2001 From: QuantumChemist Date: Sun, 7 Jul 2024 18:15:51 +0200
Subject: [PATCH 4/5] a small revision --- jobflow.ipynb | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/jobflow.ipynb b/jobflow.ipynb index 8a5ce20..cd5a238 100644 --- a/jobflow.ipynb +++ b/jobflow.ipynb @@ -31,7 +31,9 @@ { "metadata": {}, "cell_type": "markdown", - "source": "[`jobflow`](https://materialsproject.github.io/jobflow/index.html) and [`atomate2`](https://materialsproject.github.io/atomate2/index.html) are key packages of the [Materials Project](https://materialsproject.org/) . `jobflow` was especially designed to simplify the execution of dynamic workflows -- when the actual number of jobs is dynamically determined upon runtime instead of being statically fixed before running the workflow(s). `jobflow`'s overall flexibility allows for building workflows that go beyond the usage in materials science. `jobflow` forms the basis of `atomate2`. `atomate2` implements data generation workflows in the context of materials science and will be responsible for data generation in the Materials Project in the future. " + "source": [ + "[`jobflow`](https://materialsproject.github.io/jobflow/index.html) and [`atomate2`](https://materialsproject.github.io/atomate2/index.html) are key packages of the [Materials Project](https://materialsproject.org/). `jobflow` was specifically designed to simplify the execution of dynamic workflows, i.e., workflows in which the actual number of jobs is determined at runtime instead of being fixed before the workflow is run. `jobflow`'s overall flexibility allows for building workflows that go beyond materials science. `jobflow` serves as the basis of `atomate2`, which implements data generation workflows for materials science and will be used for data generation in the Materials Project in the future. 
" + ] }, { "metadata": {}, @@ -39,7 +41,7 @@ "source": [ "## Installation / Setup\n", "`jobflow` can be installed via `pip install` and directly run with the default setup. It will then internally rely on a memory database as defined in the software package [`maggma`](https://materialsproject.github.io/maggma/). For large-scale usage, a [MongoDB](https://www.mongodb.com/)-like database might be specified in a jobflow.yaml file.\n", - "A high-throughput setup (i.e., parallel execution of independent parts in the workflow) of `jobflow` can be achieved using additional packages like [`fireworks`](https://materialsproject.github.io/fireworks/) or [`jobflow-remote`](https://matgenix.github.io/jobflow-remote/). Both packages require a MongoDB database. In case of `FireWorks`, however, the MongoDB database needs to be directly connected to the compute nodes. `jobflow-remote` allows remote submission options that only require a MongoDB database on the submitting computer but not the compute nodes. It can also deal with multi-factor authentification." + "A high-throughput setup (i.e., parallel execution of independent parts in the workflow) of `jobflow` can be achieved using additional packages like [`fireworks`](https://materialsproject.github.io/fireworks/) or [`jobflow-remote`](https://matgenix.github.io/jobflow-remote/). Both packages require a MongoDB database. In case of `FireWorks`, however, the MongoDB database needs to be directly connected to the compute nodes. `jobflow-remote` allows remote submission options that only require a MongoDB database on the submitting computer but not the compute nodes. It can also deal with multi-factor authentication." ] }, { @@ -240,7 +242,9 @@ }, { "cell_type": "markdown", - "source": "Then, we import tools for data plotting and mathematical operations and manipulation.", + "source": [ + "Then, we import tools for data plotting as well as mathematical operations and manipulation." 
+ ], "metadata": { "collapsed": false, "ExecuteTime": { @@ -325,7 +329,7 @@ "\n", "\n", "def generate_structures(\n", - " structure: Atoms, strain_lst: list(float)\n", + " structure: Atoms, strain_lst: list[float]\n", "): # structure should be of ase Atoms type\n", " structure = MSONAtoms(structure)\n", " structure_lst = []\n", @@ -358,7 +362,7 @@ { "cell_type": "code", "source": [ - "def plot_energy_volume_curve(volume_lst: list(float), energy_lst: list(float)):\n", + "def plot_energy_volume_curve(volume_lst: list[float], energy_lst: list[float]):\n", " plt.plot(volume_lst, energy_lst)\n", " plt.xlabel(\"Volume\")\n", " plt.ylabel(\"Energy\")\n", @@ -524,7 +528,7 @@ { "cell_type": "markdown", "source": [ - "The next steps are all concerned about handling the execution of QE and the output data collection. We start by defining a QE task document `QETaskDoc` to systematically collect the output data. For the (E, V) curve, the energy and volume are of course the most important information. The task document could be further extended to contain information that is relevant for other purposes. Next, we define a `BaseQEMaker` to handle generic QE jobs (in our case for the structural relaxation) and a separate `StaticQEMaker` for the static QE calculations. The `BaseQEMaker` is expecting the generic input set generated by `QEInputGenerator`, while `StaticQEMaker` expects the `QEInputStaticGenerator` type. As the `StaticQEMaker` inherits from the `BaseQEMaker`, we only need to make sure to pass the correct input set generator type." + "The next steps are all about handling the execution of QE and the output data collection. We start by defining a QE task document `QETaskDoc` to systematically collect the output data. For the (E, V) curve, the energy and volume are of course the most important information. The task document could be further extended to contain other relevant information. 
Next, we define a `BaseQEMaker` to handle generic QE jobs (in our case, the structural relaxation) and a separate `StaticQEMaker` for the static QE calculations. The `BaseQEMaker` expects the generic input set generated by `QEInputGenerator`, while `StaticQEMaker` expects the `QEInputStaticGenerator` type. As `StaticQEMaker` inherits from `BaseQEMaker`, we only need to make sure to pass the correct input set generator type." ], "metadata": { "collapsed": false @@ -636,7 +640,9 @@ }, { "cell_type": "markdown", - "source": "Finally, it's time to orchestrate all functions and classes together in an actual flow. Note how the number of jobs in `get_ev_curve` can be flexibly controlled by using `strain_lst` and therefore we use a `Response` object to handle the flexible job output. By making `get_ev_curve` and `plot_energy_volume_curve_job` into `job` objects using the `@job` decorator, we ensure that first all the (E, V) data points are calculated before they are plotted. The `qe_flow` contains the list of the jobs that need to be executed in this workflow. The jobs are connected by the respective `job.output` objects that also ensures the correct order in executing the jobs.", + "source": [ + "Finally, it's time to orchestrate all functions and classes together into an actual flow. Note how the number of jobs in `get_ev_curve` is set flexibly via `strain_lst`, which is why we use a `Response` object to handle the dynamic job output. By turning `get_ev_curve` and `plot_energy_volume_curve_job` into `job` objects with the `@job` decorator, we ensure that all (E, V) data points are calculated before they are plotted. `qe_flow` contains the list of jobs that need to be executed in this workflow. The jobs are connected through their respective `job.output` objects, which also ensure that the jobs are executed in the correct order."
+ ], "metadata": { "collapsed": false } @@ -687,7 +693,7 @@ "# qe_flow is the QE flow that consists of the job for structural optimization, calculating the (E, V) curve data points and plotting the curve\n", "run_locally(\n", " qe_fw, create_folders=True\n", - ") # order of the jobs in the flow is determined by connectivity\n", + ") # order of the jobs in the flow determined by connectivity\n", "\n", "graph = to_mermaid(qe_fw, show_flow_boxes=True)\n", "mm(graph)" @@ -908,11 +914,20 @@ "cell_type": "markdown", "source": [ "## Publication of the workflow\n", - "The `jobflow` infrastructure does not provide a dedicated platform for publishing a workflow currently. However, workflows related to computational materials science have been collected in the package `atomate2`. In addition, users can build their own package by relying on jobflow and share it as a new Python-based program. Additional packages in materials science using `jobflow` exist." + "The `jobflow` infrastructure does not provide a dedicated platform for publishing a workflow currently. However, workflows related to computational materials science have been collected in the package `atomate2`. In addition, users can build their own package by relying on `jobflow` and share it as a new Python-based program. There are also additional materials science packages that rely on `jobflow`." 
], "metadata": { "collapsed": false } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [], + "metadata": { + "collapsed": false + } } ] } From 68dc0518fd146f7c8ca4bf670e4e4028ecb90da2 Mon Sep 17 00:00:00 2001 From: QuantumChemist Date: Sun, 7 Jul 2024 18:51:59 +0200 Subject: [PATCH 5/5] giving some examples for outside MP use of jobflow --- jobflow.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/jobflow.ipynb b/jobflow.ipynb index cd5a238..5de51ee 100644 --- a/jobflow.ipynb +++ b/jobflow.ipynb @@ -914,7 +914,7 @@ "cell_type": "markdown", "source": [ "## Publication of the workflow\n", - "The `jobflow` infrastructure does not provide a dedicated platform for publishing a workflow currently. However, workflows related to computational materials science have been collected in the package `atomate2`. In addition, users can build their own package by relying on `jobflow` and share it as a new Python-based program. There are also additional materials science packages that rely on `jobflow`." + "The `jobflow` infrastructure currently does not provide a dedicated platform for publishing workflows. However, workflows related to computational materials science have been collected in the `atomate2` package. In addition, users can build their own package on top of `jobflow` and share it like any other Python package. There are also other materials science packages, such as [NanoParticleTools](https://github.com/BlauGroup/NanoParticleTools) or [QuAcc](https://github.com/Quantum-Accelerators/quacc), that rely on `jobflow`." ], "metadata": { "collapsed": false
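Background on the `list(float)` to `list[float]` annotation change in PATCH 4/5: `list(float)` is a call, not a type hint. `list` tries to iterate over the class `float`, which raises `TypeError`, and on interpreters that evaluate annotations eagerly (for example Python 3.13 and earlier without `from __future__ import annotations`) that error surfaces as soon as the `def` statement runs. `list[float]` (Python 3.9+) is a proper generic alias. The stand-alone sketch below illustrates the difference; `plot_energy_volume_curve_stub` is a hypothetical stand-in, not a function from the notebook.

```python
import typing


def plot_energy_volume_curve_stub(
    volume_lst: list[float], energy_lst: list[float]
) -> None:
    """Hypothetical stand-in using list[float] annotations; body omitted."""


# `list[float]` is a generic alias (Python 3.9+), so the hints resolve cleanly.
hints = typing.get_type_hints(plot_energy_volume_curve_stub)
print(hints["volume_lst"])  # list[float]

# `list(float)` is a call, not a type: `list` tries to iterate over the
# class `float`, which is not iterable. With eager annotation evaluation,
# the same TypeError would be raised while the `def` statement executes.
error_message = ""
try:
    list(float)
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)  # TypeError: 'type' object is not iterable
```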