From fd14c2010ac0856ae654ca21ca53f14a20245924 Mon Sep 17 00:00:00 2001 From: Garrett Birkel Date: Thu, 17 Oct 2024 13:58:12 -0700 Subject: [PATCH 1/4] Moving some setup/configuration info to the main README since it applies to the whole project. Moving beamline-specific docs into the docs folder and linking them (keeping them in orchestration/flows will get messy as we start re-using flows across beamlines). Creating an .env-example file to aid in initial configuration. --- .env-example | 4 + README.md | 89 ++++++++++++++++++- .../flows/bl7012/README.md => docs/bl7012.md | 64 +------------ scripts/README.md => docs/bl832_ALCF.md | 0 docs/{globus_endpoint.md => globus.md} | 19 ++-- 5 files changed, 106 insertions(+), 70 deletions(-) create mode 100644 .env-example rename orchestration/flows/bl7012/README.md => docs/bl7012.md (62%) rename scripts/README.md => docs/bl832_ALCF.md (100%) rename docs/{globus_endpoint.md => globus.md} (53%) diff --git a/.env-example b/.env-example new file mode 100644 index 0000000..2b8fdb8 --- /dev/null +++ b/.env-example @@ -0,0 +1,4 @@ +GLOBUS_CLIENT_ID= +GLOBUS_CLIENT_SECRET= +PREFECT_API_URL= +PREFECT_API_KEY= \ No newline at end of file diff --git a/README.md b/README.md index 3dceeb6..cd7cd5c 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,89 @@ # Splash Flows Globus -This repo contains code that can be used to run Prefect Orion workflows for Globus data movement. -These flows can be run from the command line or built into a self-contained docker container. \ No newline at end of file +This repo contains configuration and code for Prefect workflows to move data and run computing tasks. Many of the workflows use Globus for data movement between local servers and back and forth to NERSC. + +These flows can be run from the command line or built into a self-contained docker container. 
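For orientation, each Globus endpoint entry defined in `config.yml` (see below) reduces to three values: a collection UUID, a host URI, and a root path. A minimal sketch of how such an entry can be modeled and used to build absolute transfer paths — the `GlobusEndpoint` class and `full_path` helper here are illustrative assumptions, not necessarily this repo's actual API:

```python
# Hypothetical sketch: model a config.yml endpoint entry and build absolute
# paths for transfers. Names are illustrative, not the repo's actual API.
from dataclasses import dataclass

@dataclass
class GlobusEndpoint:
    uuid: str
    uri: str
    root_path: str

    def full_path(self, relative_path: str) -> str:
        # Join root_path and a relative path without doubling slashes.
        return self.root_path.rstrip("/") + "/" + relative_path.lstrip("/")

# Mirrors the spot832 example entry from config.yml
spot832 = GlobusEndpoint(
    uuid="44ae904c-ab64-4145-a8f0-7287de38324d",
    uri="spot832.lbl.gov",
    root_path="/",
)
print(spot832.full_path("raw/scan_0001.h5"))  # /raw/scan_0001.h5
```

In the real flows, paths built this way would be handed to the Globus transfer layer (see `globus/transfer.py`).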
+
+## Getting started
+
+### Clone this repo and set up the Python environment:
+```
+$ git clone git@github.com:als-computing/splash_flows_globus.git
+$ cd splash_flows_globus
+$ pip3 install -e .
+```
+
+### Provide Prefect and Globus authentication in `.env`:
+
+Use `.env-example` as a template.
+
+```
+GLOBUS_CLIENT_ID=
+GLOBUS_CLIENT_SECRET=
+PREFECT_API_URL=
+PREFECT_API_KEY=
+```
+
+## General configuration
+
+### Define your Globus collection endpoints in `config.yml`:
+
+```
+globus:
+  globus_endpoints:
+    spot832:
+      root_path: /
+      uri: spot832.lbl.gov
+      uuid: 44ae904c-ab64-4145-a8f0-7287de38324d
+```
+
+### Create a Prefect deployment workflow `create_deployments.sh` file:
+
+```
+prefect deployment build : -n 'name_of_the_workflow' -q
+prefect deployment apply -deployment.yaml
+```
+
+The following example creates a Prefect workflow for the function of `process_new_file` in file of `./orchestration/flows/bl7012/move.py`
+
+```
+prefect deployment build ./orchestration/flows/bl7012/move.py:process_new_file -n 'process_newdata7012' -q bl7012
+prefect deployment apply process_new_file-deployment.yaml
+```
+
+## Starting a Prefect workflow manually
+
+Below is the command to start the Prefect workflow:
+```
+python -m orchestration.flows.bl7012.move
+```
+
+## Submitting workflow via Prefect API
+
+An example in `example.ipynb` shows how to submit a Prefect workflow to the Prefect server.
+
+Once the job is submitted, a workflow agent is needed to work on jobs in the queue. A workflow agent can be launched by:
+
+```
+prefect agent start -q
+```
+
+The agent requires the `PREFECT_API_URL` and `PREFECT_API_KEY` environment variables so that it knows where to find the work queue. Once the agent is launched, a message like the following indicates which queue it is listening to.
+
+```
+Starting v2.7.9 agent connected to https://.../api...
+ + ___ ___ ___ ___ ___ ___ _____ _ ___ ___ _ _ _____ + | _ \ _ \ __| __| __/ __|_ _| /_\ / __| __| \| |_ _| + | _/ / _|| _|| _| (__ | | / _ \ (_ | _|| .` | | | + |_| |_|_\___|_| |___\___| |_| /_/ \_\___|___|_|\_| |_| + + +Agent started! Looking for work from queue(s): ... +``` + +## More specific documentation can be found in the `docs` folder: + +* ["Globus configuration"](./docs/globus.md) +* ["8.3.2 ALCF Globus Flow And Reconstruction Setup"](./docs/bl832_ALCF.md) +* ["Data movement for BL7012"](./docs/bl7012.md) \ No newline at end of file diff --git a/orchestration/flows/bl7012/README.md b/docs/bl7012.md similarity index 62% rename from orchestration/flows/bl7012/README.md rename to docs/bl7012.md index 0d703e5..fd85bd2 100644 --- a/orchestration/flows/bl7012/README.md +++ b/docs/bl7012.md @@ -1,65 +1,6 @@ -# Data movement BL7012 -This package contains functions to initiate Prefect data processing workflows for COSMIC data. The current implemented Prefect workflow enables data movement from a Globus beamline endpoint to the desinated NERSC Globus endpoint. +# Data movement for BL7012 -# Getting start -Set up a python enviornment as follow: -``` -$ git clone https://github.com/grace227/splash_flows_globus.git -$ cd splash_flows_globus -$ pip install -e . -``` -Provide information of a Globus collection endpoint in `splash_flows_globus/config.yml`: -``` -globus: - globus_endpoints: - spot832: - root_path: / - uri: spot832.lbl.gov - uuid: 44ae904c-ab64-4145-a8f0-7287de38324d -``` -Provide information of Prefect and Globus authentication in `.env` file: -``` -GLOBUS_CLIENT_ID= -GLOBUS_CLIENT_SECRET= -PREFECT_API_URL= -PREFECT_API_KEY= -``` -Create a Prefect deployment workflow. 
Append new workflow in the `create_deployments.sh` file:
-```
-prefect deployment build : -n 'name_of_the_workflow' -q
-prefect deployment apply -deployment.yaml
-```
-Following example creates a Prefect workflow for the function of `process_new_file` in file of `./orchestration/flows/bl7012/move.py`
-```
-prefect deployment build ./orchestration/flows/bl7012/move.py:process_new_file -n 'process_newdata7012' -q bl7012
-prefect deployment apply process_new_file-deployment.yaml
-```
-
-# Starting a Prefect workflow manually
-Below is the command to start the Prefect workflow:
-```
-python -m orchestration.flows.bl7012.move
-```
-
-# Submitting workflow via Prefect API
-An example is shown `example.ipynb` to submit a PREFECT workflow to PREFECT server.
-
-Once the job is submitted, a workflow agent is needed to work on jobs in queue. A workflow agent can be launched by:
-```
-prefect agent start -q
-```
-It requires to have the `PREFECT_API_URL` and `PREFECT_API_KEY` stored as an environment variables, such that the agent knows where to get the work queue. Once the agent is launched, the following message indicates where the agent is currently listening to.
-```
-Starting v2.7.9 agent connected to https://.../api...
-
- ___ ___ ___ ___ ___ ___ _____ _ ___ ___ _ _ _____
- | _ \ _ \ __| __| __/ __|_ _| /_\ / __| __| \| |_ _|
- | _/ / _|| _|| _| (__ | | / _ \ (_ | _|| .` | | |
- |_| |_|_\___|_| |___\___| |_| /_/ \_\___|___|_|\_| |_|
-
-
-Agent started! Looking for work from queue(s): ...
-```
+This package contains functions to initiate Prefect data processing workflows for COSMIC data. The currently implemented Prefect workflow enables data movement from a Globus beamline endpoint to the designated NERSC Globus endpoint.

 # Deploy a Ptychography NERSC Prefect Agent
 A Prefect Agent is a process that exists to run the code in a Prefect Flow. This repository supports building a container that runs Prefect agents for several beamline operations at NERSC.
The bl7012 agent can copy data to NERSC and launch a reconstruction there; it uses the [Super Facility API](https://docs.nersc.gov/services/sfapi/) to submit reconstruction jobs.

@@ -103,7 +44,6 @@ title: Deployment
 ---
 graph
-    bc[Beamline Computer] --new flow--> prefect_api
     subgraph ps [Prefect Server]

diff --git a/scripts/README.md b/docs/bl832_ALCF.md
similarity index 100%
rename from scripts/README.md
rename to docs/bl832_ALCF.md
diff --git a/docs/globus_endpoint.md b/docs/globus.md
similarity index 53%
rename from docs/globus_endpoint.md
rename to docs/globus.md
index 4de178a..5858d56 100644
--- a/docs/globus_endpoint.md
+++ b/docs/globus.md
@@ -1,13 +1,20 @@
-# Notes about Globus Endpoints
+# Globus
+
+[Globus](https://www.globus.org/) is a file transport service, developed and operated as a not-for-profit by the University of Chicago. There is also [Globus Compute](https://www.globus.org/compute), which uses similar infrastructure to link compute services.
+
+The University of Chicago operates the service, and we deploy client software to create "endpoints" on our machines and move files between them.
+
+## How to set up a Globus endpoint

-## Configuring an endpoint
 As of globus server v5, the configuration of a globus endpoint for data transfer via API has changed a bit.

-The data orchestration code has a ("config.yml")[../config.yml"] file that contains configuration for endpoints.
+The data orchestration code has a ["config.yml"](../config.yml) file that contains configuration for endpoints.
+
+In the following steps, a "Guest Collection" is created, your application is assigned to it, and your client application is given permission to it.

- > Note: A "Guest Collection" is created and your application is assigned to it and your client application is given permission to it. Before performing this step, have your Globus Application configurated.
+ > Note: Before performing these steps, have your Globus Application configured.

-To setup a new endpoint,
+To set up a new endpoint,
 * Create a collection for transfer.
 * In the globus web app, find the endpoint and click on the vertical ellipse to view the endpoint details.
 * Click on the "Collections" tab
@@ -21,7 +28,7 @@ To setup a new endpoint,
 * Add an email
 * Click "Add Permission"

- Now that you have collection for sharing, the "Endpoint UUID" of this collection is your new endpoint UUID. Set it in ("config.yml")[../config.yml"]
+ Now that you have a collection for sharing, the "Endpoint UUID" of this collection is your new endpoint UUID. Set it in ["config.yml"](../config.yml)

From 9d52babe35a1e9a2ab652252a2e67bf8c362748c Mon Sep 17 00:00:00 2001
From: Garrett Birkel
Date: Thu, 17 Oct 2024 14:28:13 -0700
Subject: [PATCH 2/4] Using a '.' to match
 https://github.com/als-computing/service_configs/blob/main/flow-prd/.env.example

---
 .env-example => .env.example | 0
 README.md                    | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename .env-example => .env.example (100%)

diff --git a/.env-example b/.env.example
similarity index 100%
rename from .env-example
rename to .env.example
diff --git a/README.md b/README.md
index cd7cd5c..65bfe59 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ $ pip3 install -e .

 ### Provide Prefect and Globus authentication in `.env`:

-Use `.env-example` as a template.
+Use `.env.example` as a template.
```
GLOBUS_CLIENT_ID=
GLOBUS_CLIENT_SECRET=
PREFECT_API_URL=
PREFECT_API_KEY=

From dfb42d56d3eabf6d42ffd1f04b4bb6e3bce45d4e Mon Sep 17 00:00:00 2001
From: Garrett Birkel
Date: Mon, 21 Oct 2024 11:42:02 -0700
Subject: [PATCH 3/4] Some clarity on where create_deployment scripts are meant
 to be run

---
 README.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 65bfe59..7ebe001 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ PREFECT_API_KEY=

 ## General configuration

-### Define your Globus collection endpoints in `config.yml`:
+### Globus collection endpoints are defined in `config.yml`:

 ```
 globus:
@@ -37,14 +37,18 @@ globus:
       uuid: 44ae904c-ab64-4145-a8f0-7287de38324d
 ```

-### Create a Prefect deployment workflow `create_deployments.sh` file:
+### Prefect workflows are deployed using the `create_deployments_[name].sh` scripts.
+
+These are meant to be run on `flow-prd`, in `bl832_agent`, with properly set `.env` variables (e.g. Prefect ID/secret, Globus ID/secret).
+
+General anatomy of a file:

 ```
 prefect deployment build : -n 'name_of_the_workflow' -q
 prefect deployment apply -deployment.yaml
 ```

-The following example creates a Prefect workflow for the function of `process_new_file` in file of `./orchestration/flows/bl7012/move.py`
+Example: The following creates a Prefect workflow for the `process_new_file` function in `./orchestration/flows/bl7012/move.py`:

 ```
 prefect deployment build ./orchestration/flows/bl7012/move.py:process_new_file -n 'process_newdata7012' -q bl7012

From a80f61ba930a1a5d7d52194104d73fae8740a3e3 Mon Sep 17 00:00:00 2001
From: Garrett Birkel
Date: Mon, 21 Oct 2024 12:07:31 -0700
Subject: [PATCH 4/4] Overview of existing flows/scripts

---
 README.md          | 20 +++++++++++++++++++-
 docs/bl832_ALCF.md |  1 +
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7ebe001..79f5564 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,25 @@ PREFECT_API_URL=
 PREFECT_API_KEY=
 ```

-## General configuration
+## Current workflow overview and status:
+
+| Name | Description | Status | Notes |
+|------|-------------|:------:|-------|
+| `move.py` | Move data from spot832 to data832, schedule pruning, and ingest into SciCat | Deployed | [Details](./docs/bl832_ALCF.md) |
+| `prune.py` | Run data pruning flows as scheduled | Deployed | |
+| `alcf.py` | Run tomography reconstruction Globus Compute Flows at ALCF | Deployed | |
+| `nersc.py` | Run tomography reconstruction using SFAPI at NERSC | WIP | |
+| `dispatcher.py` | Dispatch flow to control beamline subflows | WIP | |
+| `create_deployments_.sh` | Deploy functions as Prefect flows. Run once, or after updating underlying flow code | | |
+| `globus/flows.py` and `globus/transfer.py` | Connect Python with the Globus API – could use better error handling | | |
+| `scripts/check_globus_compute.py` | Check if the CC has access to the compute endpoint and if it is available | | |
+| `scripts/check_globus_transfer.py` | Check if the CC has r/w/d access to an endpoint. Also able to delete data | | |
+| `source scripts/login_to_globus_and_prefect.py` | Log in to Globus/Prefect in the current terminal from the `.env` file | | |
+| `init__globus_flow.py` | Register and update Globus flow UUIDs in Prefect | | |
+| `orchestration/tests/.py` | Test scripts using pytest | | |
+
+## Further development

 ### Globus collection endpoints are defined in `config.yml`:

diff --git a/docs/bl832_ALCF.md b/docs/bl832_ALCF.md
index 2e19887..b6f9f8c 100644
--- a/docs/bl832_ALCF.md
+++ b/docs/bl832_ALCF.md
@@ -484,6 +484,7 @@ Read more about Blocks here: https://docs.prefect.io/latest/concepts/blocks/

 # Helper Scripts
+
 We also provide several scripts for registering a new Globus Compute Flow, checking that the Globus Compute Endpoint is available, and ensuring that Globus Transfer has the correct permissions for reading, writing, and deleting data at a given transfer endpoint.

 ### Check Globus Compute Status
 [`orchestration/scripts/check_globus_compute.py`](orchestration/scripts/check_globus_compute.py)
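As a sketch of the kind of decision `check_globus_compute.py` makes, the availability test can be reduced to inspecting the status payload returned for an endpoint. The payload shape assumed here (a dict with a `status` field) is an illustration, not a documented contract; the real script obtains the payload from the Globus Compute SDK:

```python
# Hypothetical sketch of an endpoint-availability check. The dict shape is an
# assumption for illustration; consult the Globus Compute SDK docs for the
# actual response format returned by an endpoint-status query.

def endpoint_is_available(status_payload: dict) -> bool:
    """Return True when the endpoint reports itself online."""
    return status_payload.get("status", "").lower() == "online"

# In the real script, the payload would come from a Globus Compute client
# query against the configured endpoint UUID.
print(endpoint_is_available({"status": "online"}))   # True
print(endpoint_is_available({"status": "offline"}))  # False
print(endpoint_is_available({}))                     # False
```

Keeping the decision logic separate from the SDK call like this makes the check easy to unit-test without credentials.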