Merge pull request #34 from als-computing/2024/10-Docs_Organizing
Some minor documentation cleanup
davramov authored Oct 21, 2024
2 parents 72bb7e1 + a80f61b commit f0044d1
Showing 5 changed files with 129 additions and 70 deletions.
4 changes: 4 additions & 0 deletions .env.example
@@ -0,0 +1,4 @@
GLOBUS_CLIENT_ID=<globus_client_id>
GLOBUS_CLIENT_SECRET=<globus_client_secret>
PREFECT_API_URL=<url_of_prefect_server>
PREFECT_API_KEY=<prefect_client_secret>
111 changes: 109 additions & 2 deletions README.md
@@ -1,4 +1,111 @@
# Splash Flows Globus
This repo contains code that can be used to run Prefect Orion workflows for Globus data movement.

This repo contains configuration and code for Prefect workflows to move data and run computing tasks. Many of the workflows use Globus for data movement between local servers and back and forth to NERSC.

These flows can be run from the command line or built into a self-contained Docker container.

## Getting started

### Clone this repo and set up the Python environment:
```
$ git clone git@github.com:als-computing/splash_flows_globus.git
$ cd splash_flows_globus
$ pip3 install -e .
```

### Provide Prefect and Globus authentication in `.env`:

Use `.env.example` as a template.

```
GLOBUS_CLIENT_ID=<globus_client_id>
GLOBUS_CLIENT_SECRET=<globus_client_secret>
PREFECT_API_URL=<url_of_prefect_server>
PREFECT_API_KEY=<prefect_client_secret>
```
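
For illustration, here is a minimal sketch of loading these variables in Python, assuming the `python-dotenv` package (an assumption for this example, not necessarily what this repo uses):

```
import os

from dotenv import load_dotenv  # assumed dependency: python-dotenv

# Read key/value pairs from .env into the process environment.
load_dotenv()

globus_client_id = os.environ["GLOBUS_CLIENT_ID"]
prefect_api_url = os.environ["PREFECT_API_URL"]
```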

## Current workflow overview and status:

| Name | Description | Status | Notes |
|------------------------------|-------------------------------------------------------------------------------------|:--------:|-------|
| `move.py` | Move data from spot832 to data832, schedule pruning, and ingest into SciCat | Deployed | [Details](./docs/bl832_ALCF.md) |
| `prune.py` | Run data pruning flows as scheduled | Deployed | |
| `alcf.py` | Run tomography reconstruction Globus Compute Flows at ALCF | Deployed | |
| `nersc.py` | Run tomography reconstruction using SFAPI at NERSC | WIP | |
| `dispatcher.py` | Dispatch flow to control beamline subflows | WIP | |
| `create_deployments_<bl>.sh` | Deploy functions as Prefect flows. Run once, or after updating underlying flow code | | |
| `globus/flows.py` and `globus/transfer.py` | Connect Python with the Globus API – could use better error handling | | |
| `scripts/check_globus_compute.py` | Check if the CC has access to the compute endpoint and whether it is available | | |
| `scripts/check_globus_transfer.py` | Check if the CC has r/w/d access to an endpoint; can also delete data | | |
| `source scripts/login_to_globus_and_prefect.py` | Log in to Globus/Prefect in the current terminal using the `.env` file | | |
| `init_<data_task>_globus_flow.py` | Register and update Globus flow UUIDs in Prefect | | |
| `orchestration/tests/<pytest_scripts>.py` | Test scripts using pytest | | |

## Further development

### Globus collection endpoints are defined in `config.yml`:

```
globus:
  globus_endpoints:
    spot832:
      root_path: /
      uri: spot832.lbl.gov
      uuid: 44ae904c-ab64-4145-a8f0-7287de38324d
```
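
As an illustration, here is a hedged sketch of how such an entry could be read and used with the Globus SDK; the structure follows `config.yml` above, but the code itself is an assumption, not this repo's actual helpers:

```
import os

import globus_sdk
import yaml

# Load the endpoint registry from config.yml.
with open("config.yml") as f:
    config = yaml.safe_load(f)

endpoint = config["globus"]["globus_endpoints"]["spot832"]

# Authenticate as a confidential client with the credentials from .env.
auth_client = globus_sdk.ConfidentialAppAuthClient(
    os.environ["GLOBUS_CLIENT_ID"], os.environ["GLOBUS_CLIENT_SECRET"]
)
authorizer = globus_sdk.ClientCredentialsAuthorizer(
    auth_client, scopes=globus_sdk.scopes.TransferScopes.all
)
transfer_client = globus_sdk.TransferClient(authorizer=authorizer)

# For example, list the collection's root path.
for entry in transfer_client.operation_ls(endpoint["uuid"], path=endpoint["root_path"]):
    print(entry["name"])
```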

### Prefect workflows are deployed using the `create_deployments_[name].sh` scripts.

These are meant to be run on `flow-prd`, in `bl832_agent`, with properly set `.env` variables (e.g., the Prefect ID/secret and Globus ID/secret).

General anatomy of a deployment script:

```
prefect deployment build <path_of_file>:<prefect_function> -n 'name_of_the_workflow' -q <tag>
prefect deployment apply <prefect_function>-deployment.yaml
```

Example: the following creates a Prefect workflow for the `process_new_file` function in `./orchestration/flows/bl7012/move.py`:

```
prefect deployment build ./orchestration/flows/bl7012/move.py:process_new_file -n 'process_newdata7012' -q bl7012
prefect deployment apply process_new_file-deployment.yaml
```

## Starting a Prefect workflow manually

Below is the command to start the Prefect workflow:
```
python -m orchestration.flows.bl7012.move <relative path of the file with respect to the root_path defined in the Globus endpoint>
```

## Submitting workflow via Prefect API

An example of submitting a Prefect workflow to the Prefect server is shown in `example.ipynb`.
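
For reference, here is a minimal sketch of triggering a deployed flow from Python with the Prefect 2 client; the flow/deployment name and parameters below are placeholder assumptions, not values confirmed by this repo:

```
import asyncio

from prefect import get_client


async def submit():
    async with get_client() as client:
        # Look up the deployment as "<flow-name>/<deployment-name>" (assumed name).
        deployment = await client.read_deployment_by_name(
            "process-new-file/process_newdata7012"
        )
        # Queue a flow run; an agent watching the queue will pick it up.
        flow_run = await client.create_flow_run_from_deployment(
            deployment.id, parameters={"file_path": "path/to/new/file"}
        )
        print(flow_run.id)


asyncio.run(submit())
```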

Once the job is submitted, a workflow agent is needed to work on jobs in the queue. A workflow agent can be launched with:

```
prefect agent start -q <name-of-work-queue>
```

The agent requires the `PREFECT_API_URL` and `PREFECT_API_KEY` environment variables to be set so that it knows where to find the work queue. Once the agent is launched, a message like the following indicates which queue it is listening to.

```
Starting v2.7.9 agent connected to https://.../api...
___ ___ ___ ___ ___ ___ _____ _ ___ ___ _ _ _____
| _ \ _ \ __| __| __/ __|_ _| /_\ / __| __| \| |_ _|
| _/ / _|| _|| _| (__ | | / _ \ (_ | _|| .` | | |
|_| |_|_\___|_| |___\___| |_| /_/ \_\___|___|_|\_| |_|
Agent started! Looking for work from queue(s): <name-of-work-queue>...
```

## More specific documentation can be found in the `docs` folder:

* ["Globus configuration"](./docs/globus.md)
* ["8.3.2 ALCF Globus Flow And Reconstruction Setup"](./docs/bl832_ALCF.md)
* ["Data movement for BL7012"](./docs/bl7012.md)
64 changes: 2 additions & 62 deletions orchestration/flows/bl7012/README.md → docs/bl7012.md
@@ -1,65 +1,6 @@
# Data movement BL7012
# Data movement for BL7012

# Getting started
Set up a Python environment as follows:
```
$ git clone https://github.com/grace227/splash_flows_globus.git
$ cd splash_flows_globus
$ pip install -e .
```
Provide information about a Globus collection endpoint in `splash_flows_globus/config.yml`:
```
globus:
  globus_endpoints:
    spot832:
      root_path: /
      uri: spot832.lbl.gov
      uuid: 44ae904c-ab64-4145-a8f0-7287de38324d
```
Provide Prefect and Globus authentication information in the `.env` file:
```
GLOBUS_CLIENT_ID=<globus_client_id>
GLOBUS_CLIENT_SECRET=<globus_client_secret>
PREFECT_API_URL=<url_of_prefect_server>
PREFECT_API_KEY=<prefect_client_secret>
```
Create a Prefect deployment workflow. Append the new workflow to the `create_deployments.sh` file:
```
prefect deployment build <path_of_file>:<prefect_function> -n 'name_of_the_workflow' -q <tag>
prefect deployment apply <prefect_function>-deployment.yaml
```
The following example creates a Prefect workflow for the `process_new_file` function in `./orchestration/flows/bl7012/move.py`:
```
prefect deployment build ./orchestration/flows/bl7012/move.py:process_new_file -n 'process_newdata7012' -q bl7012
prefect deployment apply process_new_file-deployment.yaml
```

# Starting a Prefect workflow manually
Below is the command to start the Prefect workflow:
```
python -m orchestration.flows.bl7012.move <relative path of the file with respect to the root_path defined in the Globus endpoint>
```

# Submitting workflow via Prefect API
An example of submitting a Prefect workflow to the Prefect server is shown in `example.ipynb`.

Once the job is submitted, a workflow agent is needed to work on jobs in the queue. A workflow agent can be launched with:
```
prefect agent start -q <name-of-work-queue>
```
The agent requires the `PREFECT_API_URL` and `PREFECT_API_KEY` environment variables to be set so that it knows where to find the work queue. Once the agent is launched, a message like the following indicates which queue it is listening to.
```
Starting v2.7.9 agent connected to https://.../api...
___ ___ ___ ___ ___ ___ _____ _ ___ ___ _ _ _____
| _ \ _ \ __| __| __/ __|_ _| /_\ / __| __| \| |_ _|
| _/ / _|| _|| _| (__ | | / _ \ (_ | _|| .` | | |
|_| |_|_\___|_| |___\___| |_| /_/ \_\___|___|_|\_| |_|
Agent started! Looking for work from queue(s): <name-of-work-queue>...
```
This package contains functions to initiate Prefect data processing workflows for COSMIC data. The currently implemented Prefect workflow enables data movement from a Globus beamline endpoint to the designated NERSC Globus endpoint.

# Deploy a Ptychography NERSC Prefect Agent
A Prefect Agent is a process that runs the code in a Prefect Flow. This repository supports building a container that runs Prefect agents for several beamline operations at NERSC. The bl7012 agent can copy data to NERSC and launch a reconstruction there. The reconstruction is performed at NERSC, and the agent uses the [Super Facility API](https://docs.nersc.gov/services/sfapi/) to launch reconstruction jobs.
@@ -103,7 +44,6 @@ title: Deployment
---
graph


bc[Beamline Computer] --new flow--> prefect_api

subgraph ps [Prefect Server]
1 change: 1 addition & 0 deletions scripts/README.md → docs/bl832_ALCF.md
@@ -484,6 +484,7 @@ Read more about Blocks here: https://docs.prefect.io/latest/concepts/blocks/


# Helper Scripts

We also provide several scripts for registering a new Globus Compute Flow, checking that the Globus Compute Endpoint is available, and ensuring that Globus Transfer has the correct permissions for reading, writing, and deleting data at a given transfer endpoint.
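
For instance, the core of such an availability check might look like this sketch using the Globus Compute SDK (a simplified assumption of what the script does, not its actual contents):

```
from globus_compute_sdk import Client

# UUID of the Globus Compute endpoint to check (placeholder value).
ENDPOINT_ID = "00000000-0000-0000-0000-000000000000"

client = Client()
status = client.get_endpoint_status(ENDPOINT_ID)
print(status["status"])  # e.g. "online" or "offline"
```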

### Check Globus Compute Status [`orchestration/scripts/check_globus_compute.py`](orchestration/scripts/check_globus_compute.py)
19 changes: 13 additions & 6 deletions docs/globus_endpoint.md → docs/globus.md
@@ -1,13 +1,20 @@
# Notes about Globus Endpoints
# Globus

[Globus](https://www.globus.org/) is a file transfer service, developed and operated as a not-for-profit by the University of Chicago. There is also [Globus Compute](https://www.globus.org/compute), which uses similar infrastructure to link compute services.

The UoC operates the server, and we deploy client software to make "endpoints" on our machines and move files between them.

## How to set up a Globus endpoint

## Configuring an endpoint
As of Globus server v5, configuring a Globus endpoint for data transfer via the API has changed a bit.

The data orchestration code has a ("config.yml")[../config.yml"] file that contains configuration for endpoints.
The data orchestration code has a ["config.yml"](../config.yml) file that contains configuration for endpoints.

In the following steps, a "Guest Collection" is created, your application is assigned to it, and your client application is given permission to it.

> Note: A "Guest Collection" is created and your application is assigned to it and your client application is given permission to it. Before performing this step, have your Globus Application configurated.
> Note: Before performing these steps, have your Globus Application configured.
To setup a new endpoint,
To set up a new endpoint,
* Create a collection for transfer.
* In the Globus web app, find the endpoint and click on the vertical ellipsis to view the endpoint details.
* Click on the "Collections" tab
@@ -21,7 +28,7 @@ To setup a new endpoint,
* Add an email
* Click "Add Permission"

Now that you have collection for sharing, the "Endpoint UUID" of this collection is your new endpoint UUID. Set it in ("config.yml")[../config.yml"]
Now that you have a collection for sharing, the "Endpoint UUID" of this collection is your new endpoint UUID. Set it in ["config.yml"](../config.yml)
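
Permission can also be granted programmatically; here is a hedged sketch with the Globus SDK, assuming an authenticated `globus_sdk.TransferClient` named `transfer_client` and with the angle-bracket placeholders to be filled in:

```
# Grant an identity read/write access to the guest collection's root path.
rule = {
    "DATA_TYPE": "access",
    "principal_type": "identity",
    "principal": "<identity_uuid_of_your_client_application>",
    "path": "/",
    "permissions": "rw",
}
transfer_client.add_endpoint_acl_rule("<guest_collection_uuid>", rule)
```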



