
ECS worker setup guide improvements #401

Merged 11 commits on Apr 5, 2024
226 changes: 141 additions & 85 deletions docs/ecs_guide.md

## Why use ECS for flow run execution?

ECS (Elastic Container Service) tasks are a good option for executing Prefect flow runs for several reasons:

1. **Scalability**: ECS scales your infrastructure in response to demand, effectively managing Prefect flow runs. ECS automatically distributes containers across multiple instances based on demand.
2. **Flexibility**: ECS lets you choose between AWS Fargate and Amazon EC2 for container operation. Fargate abstracts the underlying infrastructure, while EC2 has faster job start times and offers additional control over instance management and configuration.
Prefect enables remote flow execution via [workers](https://docs.prefect.io/concepts/work-pools/#worker-overview) and [work pools](https://docs.prefect.io/concepts/work-pools/).
For details on how workers and work pools are implemented for ECS, see the diagram below.

```mermaid
%%{
init: {
'theme': 'base',
'themeVariables': {
'primaryColor': '#2D6DF6',
'primaryTextColor': '#fff',
'lineColor': '#FE5A14',
'secondaryColor': '#E04BF0',
'tertiaryColor': '#fff'
}
}
}%%
graph TB

subgraph ecs_cluster[ECS cluster]
fr_task_definition[Flow run task definition]


subgraph ecs_task["ECS task execution <br> (Flow run infrastructure)"]
style ecs_task text-align:center

flow_run((Flow run))
subgraph ecs_task["ECS task execution"]
style ecs_task text-align:center,display:flex


flow_run((Flow run))

end
fr_task_definition -->|defines| ecs_task
end
end

subgraph github["GitHub"]
subgraph github["ECR"]
flow_code{{"Flow code"}}
end
flow_code --> |pulls| ecs_task
```

## ECS and Prefect

!!! tip "ECS tasks != Prefect tasks"
An ECS task is **not** the same thing as a [Prefect task](https://docs.prefect.io/latest/concepts/tasks/#tasks-overview).

ECS tasks are groupings of containers that run within an ECS Cluster. An ECS task's behavior is determined by its task definition.

An [*ECS task definition*](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html) is the blueprint for the ECS task. It describes which Docker containers to run and what you want to have happen inside these containers.
You can use either EC2 or Fargate as the capacity provider. Fargate simplifies initiation, but lengthens infrastructure setup time for each flow run.

<hr>


!!! tip
    If you prefer infrastructure as code, check out this [Terraform module](https://github.com/PrefectHQ/prefect-recipes/tree/main/devops/infrastructure-as-code/aws/tf-prefect2-ecs-worker) to provision an ECS cluster with a worker.

## Prerequisites

- An AWS account with permissions to create ECS services and IAM roles.
- The AWS CLI installed on your local machine. You can [download it from the AWS website](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html). A quick way to confirm your CLI is configured is shown just after this list.
- An [ECS Cluster](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/clusters.html) to host both the worker and the flow runs it submits. This guide uses the default cluster. To create your own, follow [this guide](https://docs.aws.amazon.com/AmazonECS/latest/userguide/create_cluster.html).
- A [VPC](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html) configured for your ECS tasks. This guide uses the default VPC.
- Prefect Cloud account or Prefect self-managed instance.
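
Before moving on, you can optionally confirm that the AWS CLI is installed and can reach your account. This is only a sanity check; the region shown below is an example and any region you plan to use will work:

```bash
# Confirm the AWS CLI is installed and authenticated.
aws --version
aws sts get-caller-identity

# Optionally set a default region for the commands in this guide (example region).
aws configure set region us-east-1
```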

## Step 1: Set up an ECS work pool

Before setting up the worker, create a [work pool](https://docs.prefect.io/latest/concepts/work-pools/#work-pool-configuration) of type ECS for the worker to pull work from. If doing so from the CLI, be sure to [authenticate with Prefect Cloud](https://docs.prefect.io/latest/cloud/cloud-quickstart/#log-into-prefect-cloud-from-a-terminal).

Create a work pool from the CLI:

```bash
prefect work-pool create --type ecs my-ecs-pool
```

Or from the Prefect UI:

![WorkPool](img/Workpool_UI.png)

Configuring custom fields is easiest from the UI.

![ECSCluster](img/ECSCluster_UI.png)

!!! Warning
    You need to have a VPC specified for your work pool if you are using AWS Fargate.

![Launch](img/LaunchType_UI.png)

Because this guide uses Fargate as the capacity provider and the default VPC and ECS cluster, no further configuration is needed.
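
To double-check the pool from the CLI, the following read-only commands should list the new pool and show its current defaults (assuming a recent Prefect 2 CLI):

```bash
# Confirm the pool exists and is of type "ecs".
prefect work-pool ls

# Show the pool's configuration, including its base job template defaults.
prefect work-pool inspect my-ecs-pool
```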

Next, set up a Prefect ECS worker that will discover and pull work from this work pool.

## Step 2: Start a Prefect worker in your ECS cluster

Start by creating the [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-custom.html#roles-creatingrole-custom-trust-policy-console) required for your worker and flows to run. The sample flow in this guide doesn't interact with many other AWS services, so you will only be creating one role, `ecsTaskExecutionRole`. To create an IAM role for the ECS task using the AWS CLI, follow these steps:

### 1. Create a trust policy

The trust policy will specify that the ECS service containing the Prefect worker will be able to assume the role required for calling other AWS services.

Save this policy to a file, such as `ecs-trust-policy.json`:

```json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ecs-tasks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

### 2. Create the IAM role

Use the `aws iam create-role` command to create the role. For this guide, `ecsTaskExecutionRole` will be used by the worker to start ECS tasks, and will also be the role assigned to the ECS tasks running your Prefect flows.

```bash
aws iam create-role \
    --role-name ecsTaskExecutionRole \
    --assume-role-policy-document file://ecs-trust-policy.json
```

!!! tip
Depending on the requirements of your flows, it is advised to create a [second role for your ECS tasks](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html). This role will contain the permissions required by the ECS tasks in which your flows will run. For example, if your workflow loads data into an S3 bucket, you would need a role with additional permissions to access S3.

### 3. Attach the policy to the role

For this guide, the ECS worker will require permissions to pull images from ECR and publish logs to CloudWatch. Amazon has a managed policy named `AmazonECSTaskExecutionRolePolicy` that grants the permissions necessary for starting ECS tasks. [See here](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html) for other common execution role permissions. Attach this policy to your task execution role:

```bash
aws iam attach-role-policy \
    --role-name ecsTaskExecutionRole \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
```


Now, you have a role named `ecsTaskExecutionRole` that you can assign to your ECS tasks. This role has the necessary permissions to pull container images and publish logs to CloudWatch.
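
If you'd like to verify the role before moving on, these read-only calls should confirm that the role exists and that the managed policy is attached (assuming the names used above):

```bash
# Print the role's ARN; you will need it again when registering the task definition.
aws iam get-role --role-name ecsTaskExecutionRole --query 'Role.Arn' --output text

# Confirm that AmazonECSTaskExecutionRolePolicy is attached.
aws iam list-attached-role-policies --role-name ecsTaskExecutionRole
```
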
## Step 3: Create an ECS worker service

### 1. Launch an ECS service to host the worker

Next, create an ECS task definition that specifies the Docker image for the Prefect worker, the resources it requires, and the command it should run. In this example, the command to start the worker is `prefect worker start --pool my-ecs-pool`.

**Create a JSON file with the following contents:**

```json
{
    "family": "prefect-worker-task",
    "networkMode": "awsvpc",
    "requiresCompatibilities": [
        "FARGATE"
],
"cpu": "512",
"memory": "1024",
"executionRoleArn": "<your-ecs-task-role-arn>",
"taskRoleArn": "<your-ecs-task-role-arn>",
"executionRoleArn": "<ecs-task-role-arn>",
"taskRoleArn": "<ecs-task-role-arn>",
"containerDefinitions": [
{
"name": "prefect-worker",
            "image": "prefecthq/prefect:2-latest",
            "cpu": 512,
            "memory": 1024,
            "essential": true,
            "command": [
                "/bin/sh",
                "-c",
                "pip install prefect-aws && prefect worker start --pool my-ecs-pool --type ecs"
            ],
"environment": [
{
"name": "PREFECT_API_URL",
"value": "https://api.prefect.cloud/api/accounts/<your-account-id>/workspaces/<your-workspace-id>"
"value": "prefect-api-url>"
},
{
"name": "PREFECT_API_KEY",
"value": "<your-prefect-api-key>"
"value": "<prefect-api-key>"
}
]
}
]
}
```

- Use `prefect config view` to view the `PREFECT_API_URL` for your current Prefect profile. Use this to replace `<prefect-api-url>`.

- For the `PREFECT_API_KEY`, if you are on a paid plan you can create a [service account](https://docs.prefect.io/latest/cloud/users/service-accounts/) for the worker. If you are on a free plan, you can pass a user's API key.

- Replace both instances of `<ecs-task-role-arn>` with the ARN of the IAM role you created in Step 2. You can grab this by running:

```bash
aws iam get-role --role-name ecsTaskExecutionRole --query 'Role.[RoleName, Arn]' --output text
```

- Notice that the CPU and Memory allocations are relatively small. The worker's main responsibility is to submit work through API calls to AWS, _not_ to execute your Prefect flow code.

!!! tip
To avoid hardcoding your API key into the task definition JSON see [how to add sensitive data using AWS secrets manager to the container definition](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data-tutorial.html#specifying-sensitive-data-tutorial-create-taskdef).
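
As a rough sketch of that approach, you could store the key in Secrets Manager first and then reference the returned ARN from a `secrets` entry in the container definition instead of the `environment` entry above; the secret name here is just an example:

```bash
# Store the Prefect API key in AWS Secrets Manager (secret name is illustrative).
aws secretsmanager create-secret \
    --name prefect-api-key \
    --secret-string "<prefect-api-key>"

# Note the ARN in the output. Reference it from the container definition's
# "secrets" section, and make sure the task execution role is allowed to read it.
```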

### 2. Register the task definition


Before creating a service, you first need to register a task definition. You can do that using the `register-task-definition` command in the AWS CLI. Here is an example:

```bash
Expand All @@ -219,7 +237,7 @@ aws ecs register-task-definition --cli-input-json file://task-definition.json

Replace `task-definition.json` with the name of your JSON file.
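
The service you create in the next step needs the task definition's ARN. Assuming the `prefect-worker-task` family name used in the JSON above, one way to look it up is:

```bash
# Print the ARN of the latest revision of the registered task definition.
aws ecs describe-task-definition \
    --task-definition prefect-worker-task \
    --query 'taskDefinition.taskDefinitionArn' \
    --output text
```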

### 3. Create an ECS service to host your worker

Finally, create a service that will manage your Prefect worker:

Open a terminal window and run the following command to create an ECS Fargate service:
```bash
aws ecs create-service \
--service-name prefect-worker-service \
--cluster <ecs-cluster> \
--task-definition <task-definition-arn> \
--launch-type FARGATE \
--desired-count 1 \
--network-configuration "awsvpcConfiguration={subnets=[<your-subnet-ids>],securityGroups=[<your-security-group-ids>]}"
--network-configuration "awsvpcConfiguration={subnets=[<subnet-ids>],securityGroups=[<security-group-ids>],assignPublicIp='ENABLED'}"
```

- Replace `<ecs-cluster>` with the name of your ECS cluster.
- Replace `<task-definition-arn>` with the ARN of the task definition you just registered.
- Replace `<subnet-ids>` with a comma-separated list of your VPC subnet IDs. Ensure that these subnets are aligned with the VPC specified on the work pool in Step 1. You can view subnet IDs with the following command:
`aws ec2 describe-subnets --filters "Name=vpc-id,Values=<vpc-id>"`
- Replace `<security-group-ids>` with a comma-separated list of your VPC security group IDs.

!!! tip "Sanity check"
The work pool page in the Prefect UI allows you to check the health of your workers - make sure your new worker is live! Note that it can take a few minutes for an ECS service to come online.
If your worker does not come online and you are using the command from this guide, you may not be using the default VPC. For connectivity issues, check your VPC's configuration and refer to the [ECS outbound networking guide](https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/networking-outbound.html).
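
For a closer look from the AWS side, these read-only calls can help diagnose a service that never becomes healthy (the cluster name `default` is assumed here):

```bash
# Review the service's recent events for placement, IAM, or networking errors.
aws ecs describe-services \
    --cluster default \
    --services prefect-worker-service \
    --query 'services[0].events[:5]'

# Confirm that a worker task is actually running.
aws ecs list-tasks --cluster default --service-name prefect-worker-service
```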

## Step 4: Pick up a flow run with your new worker

This guide uses ECR to store a Docker image containing your flow code. To do this, we will write a flow, then deploy it using build and push steps that copy flow code into a Docker image and push that image to an ECR repository.

### 1. Write a simple test flow

`my_flow.py`

```python
from prefect import flow, get_run_logger

@flow
def my_flow():
    logger = get_run_logger()
    logger.info("Hello from ECS!!")


if __name__ == "__main__":
my_flow()
```

### 2. Create an ECR repository

Use the following AWS CLI command to create an ECR repository. The name you choose for your repository will be reused in the next step when defining your Prefect deployment.

```bash
aws ecr create-repository \
--repository-name <my-ecr-repo> \
--region <region>
```
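
If you intend to build and push the image from your local machine, Docker also needs to authenticate to the new registry. A typical login, with placeholder account ID and region, looks like this:

```bash
# Authenticate Docker to your ECR registry (replace the placeholders).
aws ecr get-login-password --region <region> | \
    docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.<region>.amazonaws.com
```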

### 3. Create a `prefect.yaml` file

To have Prefect build your image when deploying your flow, create a `prefect.yaml` file with the following specification:

```yaml
name: ecs-worker-guide
# this is pre-populated by running prefect init
prefect-version: 2.14.20

# build section allows you to manage and build docker images
build:
- prefect_docker.deployments.steps.build_docker_image:
id: build_image
requires: prefect-docker>=0.3.1
image_name: <my-ecr-repo>
tag: latest
dockerfile: auto

# push section allows you to manage if and how this project is uploaded to remote locations
push:
- prefect_docker.deployments.steps.push_docker_image:
requires: prefect-docker>=0.3.1
image_name: '{{ build_image.image_name }}'
tag: '{{ build_image.tag }}'

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: my_ecs_deployment
version:
tags: []
description:
  entrypoint: my_flow.py:my_flow
parameters: {}
work_pool:
    name: my-ecs-pool
work_queue_name:
job_variables:
image: '{{ build_image.image }}'
schedules: []
pull:
- prefect.deployments.steps.set_working_directory:
directory: /opt/prefect/ecs-worker-guide

```
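
Rather than writing this file entirely by hand, you can scaffold it and then edit the generated sections to match the example above; the `docker` recipe is assumed to be available in your Prefect version:

```bash
# Scaffold a prefect.yaml with Docker build/push steps, then customize it.
prefect init --recipe docker
```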

### 4. [Deploy](https://docs.prefect.io/tutorial/deployments/#create-a-deployment) the flow to Prefect Cloud or your self-managed server instance, specifying the ECS work pool when prompted

```bash
prefect deploy my_flow.py:my_flow -n my_ecs_deployment
```

### 5. Find the deployment in the UI and click the **Quick Run** button!
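
If you prefer the CLI, you can also trigger a run directly; the flow name below assumes the default name Prefect derives from the `my_flow` function:

```bash
# Create a flow run from the deployment and let the ECS worker pick it up.
prefect deployment run 'my-flow/my_ecs_deployment'
```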

## Optional next steps


Now that you are confident your ECS worker is healthy, you can experiment with different work pool configurations.

- Do your flow runs require higher `CPU`?
- Would an EC2 `Launch Type` speed up your flow run execution?

These infrastructure configuration values can be set on your ECS work pool or they can be overridden on the deployment level through [job_variables](https://docs.prefect.io/concepts/infrastructure/#kubernetesjob-overrides-and-customizations) if desired.
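
As one sketch of the work pool route, assuming your Prefect version ships these work pool commands, you can export the default ECS base job template, adjust values such as `cpu`, `memory`, or `launch_type`, and apply the result to the pool:

```bash
# Export the default ECS base job template, edit it, then update the pool.
prefect work-pool get-default-base-job-template --type ecs > ecs-base-job-template.json
# ... edit ecs-base-job-template.json as needed ...
prefect work-pool update my-ecs-pool --base-job-template ecs-base-job-template.json
```
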
Binary file added docs/img/Workpool_UI.png