diff --git a/search/search_index.json b/search/search_index.json
index 7bc9908a..afb125dd 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"
prefect-aws
","text":""},{"location":"#welcome","title":"Welcome!","text":"
prefect-aws
makes it easy to leverage the capabilities of AWS in your flows, featuring support for ECSTask, S3, Secrets Manager, Batch Job, and Client Waiter.
"},{"location":"#getting-started","title":"Getting Started","text":""},{"location":"#saving-credentials-to-a-block","title":"Saving credentials to a block","text":"
You will need an AWS account and credentials in order to use prefect-aws
.
- Refer to the AWS Configuration documentation for instructions on retrieving your access key ID and secret access key
- Copy the access key ID and secret access key
- Create a short script and replace the placeholders with your credential information and desired block name:
from prefect_aws import AwsCredentials\nAwsCredentials(\n aws_access_key_id=\"PLACEHOLDER\",\n aws_secret_access_key=\"PLACEHOLDER\",\n aws_session_token=None, # replace this with token if necessary\n region_name=\"us-east-2\"\n).save(\"BLOCK-NAME-PLACEHOLDER\")\n
Congrats! You can now load the saved block to use your credentials in your Python code:
from prefect_aws import AwsCredentials\nAwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\n
Registering blocks
Register blocks in this module to view and edit them on Prefect Cloud:
prefect block register -m prefect_aws\n
"},{"location":"#using-prefect-with-aws-ecs","title":"Using Prefect with AWS ECS","text":"
prefect_aws
allows you to use AWS ECS as infrastructure for your deployments. Using ECS for scheduled flow runs enables the dynamic provisioning of infrastructure for containers and unlocks greater scalability.
The snippets below show how you can use prefect_aws
to run a task on ECS. The first example uses the ECSTask
block as infrastructure and the second example shows using ECS within a flow.
"},{"location":"#as-deployment-infrastructure","title":"As deployment Infrastructure","text":""},{"location":"#set-variables","title":"Set variables","text":"
To expedite copy/paste without needing to update placeholders manually, update and execute the following.
export CREDENTIALS_BLOCK_NAME=\"aws-credentials\"\nexport VPC_ID=\"vpc-id\"\nexport ECS_TASK_BLOCK_NAME=\"ecs-task-example\"\nexport S3_BUCKET_BLOCK_NAME=\"ecs-task-bucket-example\"\n
"},{"location":"#save-an-infrastructure-and-storage-block","title":"Save an infrastructure and storage block","text":"
Save custom infrastructure and storage blocks by executing the following snippet.
import os\nfrom prefect_aws import AwsCredentials, ECSTask, S3Bucket\n\naws_credentials = AwsCredentials.load(os.environ[\"CREDENTIALS_BLOCK_NAME\"])\n\necs_task = ECSTask(\n image=\"prefecthq/prefect:2-python3.10\",\n aws_credentials=aws_credentials,\n vpc_id=os.environ[\"VPC_ID\"],\n)\necs_task.save(os.environ[\"ECS_TASK_BLOCK_NAME\"], overwrite=True)\n\nbucket_name = \"ecs-task-bucket-example\"\ns3_client = aws_credentials.get_s3_client()\ns3_client.create_bucket(\n Bucket=bucket_name,\n CreateBucketConfiguration={\"LocationConstraint\": aws_credentials.region_name}\n)\ns3_bucket = S3Bucket(\n bucket_name=bucket_name,\n credentials=aws_credentials,\n)\ns3_bucket.save(os.environ[\"S3_BUCKET_BLOCK_NAME\"], overwrite=True)\n
"},{"location":"#write-a-flow","title":"Write a flow","text":"
Then, create a deployment from an existing flow, or use the flow below if you don't have one handy.
from prefect import flow\n\n@flow(log_prints=True)\ndef ecs_task_flow():\n print(\"Hello, Prefect!\")\n\nif __name__ == \"__main__\":\n ecs_task_flow()\n
"},{"location":"#create-a-deployment","title":"Create a deployment","text":"
If the script was named \"ecs_task_script.py\", build a deployment manifest with the following command.
prefect deployment build ecs_task_script.py:ecs_task_flow \\\n -n ecs-task-deployment \\\n -ib ecs-task/${ECS_TASK_BLOCK_NAME} \\\n -sb s3-bucket/${S3_BUCKET_BLOCK_NAME} \\\n --override env.EXTRA_PIP_PACKAGES=prefect-aws\n
Now apply the deployment!
prefect deployment apply ecs_task_flow-deployment.yaml\n
"},{"location":"#test-the-deployment","title":"Test the deployment","text":"
Start an agent in a separate terminal. The agent will poll the Prefect API for scheduled flow runs on the 'default' work queue.
prefect agent start -q 'default'\n
Run the deployment once to test it:
prefect deployment run ecs-task-flow/ecs-task-deployment\n
Once the flow run has completed, you will see Hello, Prefect!
logged in the CLI and the Prefect UI.
No class found for dispatch key
If you encounter an error message like KeyError: \"No class found for dispatch key 'ecs-task' in registry for type 'Block'.\"
, ensure prefect-aws
is installed in the environment in which your agent is running!
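For example, assuming the agent runs in the same environment you start it from, installing the package there and restarting the agent is usually enough:
pip install prefect-aws\nprefect agent start -q 'default'\n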
Another tutorial on ECSTask
can be found here.
"},{"location":"#within-flow","title":"Within Flow","text":"
You can also execute commands with an ECSTask
block directly within a Prefect flow. Running containers via ECS in your flows is useful for executing non-Python code in a distributed manner while using Prefect.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.ecs import ECSTask\n\n@flow\ndef ecs_task_flow():\n ecs_task = ECSTask(\n image=\"prefecthq/prefect:2-python3.10\",\n credentials=AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\"),\n region=\"us-east-2\",\n command=[\"echo\", \"Hello, Prefect!\"],\n )\n return ecs_task.run()\n
This setup gives you all of the observability and orchestration benefits of Prefect, while also providing the scalability of ECS.
"},{"location":"#using-prefect-with-aws-s3","title":"Using Prefect with AWS S3","text":"
prefect_aws
allows you to read and write objects with AWS S3 within your Prefect flows.
The provided code snippet shows how you can use prefect_aws
to upload a file to an AWS S3 bucket and download the same file under a different file name.
Note that the following code assumes the bucket already exists (a sketch for creating one follows the snippet).
from pathlib import Path\nfrom prefect import flow\nfrom prefect_aws import AwsCredentials, S3Bucket\n\n@flow\ndef s3_flow():\n # create a dummy file to upload\n file_path = Path(\"test-example.txt\")\n file_path.write_text(\"Hello, Prefect!\")\n\n aws_credentials = AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\n s3_bucket = S3Bucket(\n bucket_name=\"BUCKET-NAME-PLACEHOLDER\",\n aws_credentials=aws_credentials\n )\n\n s3_bucket_path = s3_bucket.upload_from_path(file_path)\n downloaded_file_path = s3_bucket.download_object_to_path(\n s3_bucket_path, \"downloaded-test-example.txt\"\n )\n return downloaded_file_path.read_text()\n\ns3_flow()\n
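If the bucket does not exist yet, it can be created beforehand with the boto3 client exposed by the credentials block. A minimal sketch, assuming the same placeholder block name and a region other than us-east-1 configured on the block:
from prefect_aws import AwsCredentials\n\n# load the saved credentials block and build a boto3 S3 client from it\naws_credentials = AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\ns3_client = aws_credentials.get_s3_client()\n\n# create the bucket in the block's configured region\ns3_client.create_bucket(\n    Bucket=\"BUCKET-NAME-PLACEHOLDER\",\n    CreateBucketConfiguration={\"LocationConstraint\": aws_credentials.region_name}\n)\n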
"},{"location":"#using-prefect-with-aws-secrets-manager","title":"Using Prefect with AWS Secrets Manager","text":"
prefect_aws
allows you to read and write secrets with AWS Secrets Manager within your Prefect flows.
The provided code snippet shows how you can use prefect_aws
to write a secret to AWS Secrets Manager, read the secret data, delete the secret, and finally return the secret data.
from prefect import flow\nfrom prefect_aws import AwsCredentials, AwsSecret\n\n@flow\ndef secrets_manager_flow():\n aws_credentials = AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\n aws_secret = AwsSecret(secret_name=\"test-example\", aws_credentials=aws_credentials)\n aws_secret.write_secret(secret_data=b\"Hello, Prefect!\")\n secret_data = aws_secret.read_secret()\n aws_secret.delete_secret()\n return secret_data\n\nsecrets_manager_flow()\n
"},{"location":"#resources","title":"Resources","text":"
Refer to the API documentation in the sidebar to explore all the capabilities of Prefect AWS!
For more tips on how to use tasks and flows in a Collection, check out Using Collections!
"},{"location":"#recipes","title":"Recipes","text":"
For additional recipes and examples, check out prefect-recipes
.
"},{"location":"#installation","title":"Installation","text":"
Install prefect-aws
pip install prefect-aws\n
A list of available blocks in prefect-aws
and their setup instructions can be found here.
Requires an installation of Python 3.7+
We recommend using a Python virtual environment manager such as pipenv, conda, or virtualenv.
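For example, a minimal setup with the built-in venv module (assuming a Unix-like shell):
python -m venv .venv\nsource .venv/bin/activate\npip install prefect-aws\n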
These tasks are designed to work with Prefect 2.0. For more information about how to use Prefect, please refer to the Prefect documentation.
"},{"location":"#feedback","title":"Feedback","text":"
If you encounter any bugs while using prefect-aws
, feel free to open an issue in the prefect-aws
repository.
If you have any questions or issues while using prefect-aws
, you can find help in either the Prefect Discourse forum or the Prefect Slack community.
Feel free to star or watch prefect-aws
for updates too!
"},{"location":"batch/","title":"Batch","text":""},{"location":"batch/#prefect_aws.batch","title":"
prefect_aws.batch
","text":"
Tasks for interacting with AWS Batch
"},{"location":"batch/#prefect_aws.batch-functions","title":"Functions","text":""},{"location":"batch/#prefect_aws.batch.batch_submit","title":"
batch_submit
async
","text":"
Submit a job to the AWS Batch job service.
Parameters:
Name Type Description Default
job_name
str
The AWS batch job name.
required
job_definition
str
The AWS batch job definition.
required
job_queue
str
Name of the AWS batch job queue.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
**batch_kwargs
Optional[Dict[str, Any]]
Additional keyword arguments to pass to the boto3 submit_job
function. See the documentation for submit_job for more details.
{}
Returns:
Type Description
str
The id corresponding to the job.
Examples:
Submits a job to AWS Batch.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.batch import batch_submit\n\n\n@flow\ndef example_batch_submit_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n job_id = batch_submit(\n job_name=\"job_name\",\n job_queue=\"job_queue\",\n job_definition=\"job_definition\",\n aws_credentials=aws_credentials\n )\n return job_id\n\nexample_batch_submit_flow()\n
Source code in
prefect_aws/batch.py
@task\nasync def batch_submit(\n job_name: str,\n job_queue: str,\n job_definition: str,\n aws_credentials: AwsCredentials,\n **batch_kwargs: Optional[Dict[str, Any]],\n) -> str:\n \"\"\"\n Submit a job to the AWS Batch job service.\n\n Args:\n job_name: The AWS batch job name.\n job_definition: The AWS batch job definition.\n job_queue: Name of the AWS batch job queue.\n aws_credentials: Credentials to use for authentication with AWS.\n **batch_kwargs: Additional keyword arguments to pass to the boto3\n `submit_job` function. See the documentation for\n [submit_job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/batch.html#Batch.Client.submit_job)\n for more details.\n\n Returns:\n The id corresponding to the job.\n\n Example:\n Submits a job to batch.\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.batch import batch_submit\n\n\n @flow\n def example_batch_submit_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n job_id = batch_submit(\n \"job_name\",\n \"job_definition\",\n \"job_queue\",\n aws_credentials\n )\n return job_id\n\n example_batch_submit_flow()\n ```\n\n \"\"\" # noqa\n logger = get_run_logger()\n logger.info(\"Preparing to submit %s job to %s job queue\", job_name, job_queue)\n\n batch_client = aws_credentials.get_boto3_session().client(\"batch\")\n\n response = await run_sync_in_worker_thread(\n batch_client.submit_job,\n jobName=job_name,\n jobQueue=job_queue,\n jobDefinition=job_definition,\n **batch_kwargs,\n )\n return response[\"jobId\"]\n
"},{"location":"blocks_catalog/","title":"Blocks Catalog","text":"
Below is a list of Blocks available for registration in prefect-aws
.
To register blocks in this module to view and edit them on Prefect Cloud, first install the required packages, then
prefect block register -m prefect_aws\n
Note, to use the
load
method on Blocks, you must already have a block document saved through code or saved through the UI."},{"location":"blocks_catalog/#credentials-module","title":"Credentials Module","text":"
AwsCredentials
Block used to manage authentication with AWS. AWS authentication is handled via the boto3
module. Refer to the boto3 docs for more info about the possible credential configurations.
To load the AwsCredentials:
from prefect import flow\nfrom prefect_aws.credentials import AwsCredentials\n\n@flow\ndef my_flow():\n my_block = AwsCredentials.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
MinIOCredentials
Block used to manage authentication with MinIO. Refer to the MinIO docs: https://docs.min.io/docs/minio-server-configuration-guide.html for more info about the possible credential configurations.
To load the MinIOCredentials:
from prefect import flow\nfrom prefect_aws.credentials import MinIOCredentials\n\n@flow\ndef my_flow():\n my_block = MinIOCredentials.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
For additional examples, check out the Credentials Module under Examples Catalog."},{"location":"blocks_catalog/#s3-module","title":"S3 Module","text":"
S3Bucket
Block used to store data using AWS S3 or S3-compatible object storage like MinIO.
To load the S3Bucket:
from prefect import flow\nfrom prefect_aws.s3 import S3Bucket\n\n@flow\ndef my_flow():\n my_block = S3Bucket.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
For additional examples, check out the S3 Module under Examples Catalog."},{"location":"blocks_catalog/#ecs-module","title":"Ecs Module","text":"
ECSTask
Run a command as an ECS task.
To load the ECSTask:
from prefect import flow\nfrom prefect_aws.ecs import ECSTask\n\n@flow\ndef my_flow():\n my_block = ECSTask.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
For additional examples, check out the Ecs Module under Examples Catalog."},{"location":"blocks_catalog/#secrets-manager-module","title":"Secrets Manager Module","text":"
AwsSecret
Manages a secret in AWS's Secrets Manager.
To load the AwsSecret:
from prefect import flow\nfrom prefect_aws.secrets_manager import AwsSecret\n\n@flow\ndef my_flow():\n my_block = AwsSecret.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
For additional examples, check out the Secrets Manager Module under Examples Catalog."},{"location":"client_waiter/","title":"Client Waiter","text":""},{"location":"client_waiter/#prefect_aws.client_waiter","title":"
prefect_aws.client_waiter
","text":"
Task for waiting on a long-running AWS job
"},{"location":"client_waiter/#prefect_aws.client_waiter-functions","title":"Functions","text":""},{"location":"client_waiter/#prefect_aws.client_waiter.client_waiter","title":"
client_waiter
async
","text":"
Uses the underlying boto3 waiter functionality.
Parameters:
Name Type Description Default
client
str
The AWS client on which to wait (e.g., 'client_wait', 'ec2', etc).
required
waiter_name
str
The name of the waiter to instantiate. You may also use a custom waiter name, if you supply an accompanying waiter definition dict.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
waiter_definition
Optional[Dict[str, Any]]
A valid custom waiter model, as a dict. Note that if you supply a custom definition, it is assumed that the provided 'waiter_name' is contained within the waiter definition dict.
None
**waiter_kwargs
Optional[Dict[str, Any]]
Arguments to pass to the waiter.wait(...)
method. Will depend upon the specific waiter being called.
{}
Examples:
Run an ec2 waiter until instance_exists.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.client_waiter import client_waiter\n\n@flow\ndef example_client_wait_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n\n waiter = client_waiter(\n \"ec2\",\n \"instance_exists\",\n aws_credentials\n )\n\n return waiter\nexample_client_wait_flow()\n
Source code in
prefect_aws/client_waiter.py
@task\nasync def client_waiter(\n client: str,\n waiter_name: str,\n aws_credentials: AwsCredentials,\n waiter_definition: Optional[Dict[str, Any]] = None,\n **waiter_kwargs: Optional[Dict[str, Any]],\n):\n \"\"\"\n Uses the underlying boto3 waiter functionality.\n\n Args:\n client: The AWS client on which to wait (e.g., 'client_wait', 'ec2', etc).\n waiter_name: The name of the waiter to instantiate.\n You may also use a custom waiter name, if you supply\n an accompanying waiter definition dict.\n aws_credentials: Credentials to use for authentication with AWS.\n waiter_definition: A valid custom waiter model, as a dict. Note that if\n you supply a custom definition, it is assumed that the provided\n 'waiter_name' is contained within the waiter definition dict.\n **waiter_kwargs: Arguments to pass to the `waiter.wait(...)` method. Will\n depend upon the specific waiter being called.\n\n Example:\n Run an ec2 waiter until instance_exists.\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.client_wait import client_waiter\n\n @flow\n def example_client_wait_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n\n waiter = client_waiter(\n \"ec2\",\n \"instance_exists\",\n aws_credentials\n )\n\n return waiter\n example_client_wait_flow()\n ```\n \"\"\"\n logger = get_run_logger()\n logger.info(\"Waiting on %s job\", client)\n\n boto_client = aws_credentials.get_boto3_session().client(client)\n\n if waiter_definition is not None:\n # Use user-provided waiter definition\n waiter_model = WaiterModel(waiter_definition)\n waiter = create_waiter_with_client(waiter_name, waiter_model, boto_client)\n elif waiter_name in boto_client.waiter_names:\n waiter = boto_client.get_waiter(waiter_name)\n else:\n raise ValueError(\n f\"The waiter name, {waiter_name}, is not a valid boto waiter; \"\n \"if using a custom waiter, you must provide a waiter definition\"\n )\n\n await run_sync_in_worker_thread(waiter.wait, **waiter_kwargs)\n
"},{"location":"contributing/","title":"Contributing","text":"
If you'd like to help contribute to fix an issue or add a feature to prefect-aws
, please propose changes through a pull request from a fork of the repository.
Here are the steps:
- Fork the repository
- Clone the forked repository
- Install the repository and its dependencies:
pip install -e \".[dev]\"\n
- Make desired changes
- Add tests
- Add an entry to CHANGELOG.md
- Install
pre-commit
to perform quality checks prior to commit: pre-commit install\n
git commit
, git push
, and create a pull request
"},{"location":"credentials/","title":"Credentials","text":""},{"location":"credentials/#prefect_aws.credentials","title":"
prefect_aws.credentials
","text":"
Module handling AWS credentials
"},{"location":"credentials/#prefect_aws.credentials-classes","title":"Classes","text":""},{"location":"credentials/#prefect_aws.credentials.AwsCredentials","title":"
AwsCredentials (CredentialsBlock)
pydantic-model
","text":"
Block used to manage authentication with AWS. AWS authentication is handled via the boto3
module. Refer to the boto3 docs for more info about the possible credential configurations.
Examples:
Load stored AWS credentials:
from prefect_aws import AwsCredentials\n\naws_credentials_block = AwsCredentials.load(\"BLOCK_NAME\")\n
Source code in
prefect_aws/credentials.py
class AwsCredentials(CredentialsBlock):\n \"\"\"\n Block used to manage authentication with AWS. AWS authentication is\n handled via the `boto3` module. Refer to the\n [boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html)\n for more info about the possible credential configurations.\n\n Example:\n Load stored AWS credentials:\n ```python\n from prefect_aws import AwsCredentials\n\n aws_credentials_block = AwsCredentials.load(\"BLOCK_NAME\")\n ```\n \"\"\" # noqa E501\n\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n _block_type_name = \"AWS Credentials\"\n _documentation_url = \"https://prefecthq.github.io/prefect-aws/credentials/#prefect_aws.credentials.AwsCredentials\" # noqa\n\n aws_access_key_id: Optional[str] = Field(\n default=None,\n description=\"A specific AWS access key ID.\",\n title=\"AWS Access Key ID\",\n )\n aws_secret_access_key: Optional[SecretStr] = Field(\n default=None,\n description=\"A specific AWS secret access key.\",\n title=\"AWS Access Key Secret\",\n )\n aws_session_token: Optional[str] = Field(\n default=None,\n description=(\n \"The session key for your AWS account. \"\n \"This is only needed when you are using temporary credentials.\"\n ),\n title=\"AWS Session Token\",\n )\n profile_name: Optional[str] = Field(\n default=None, description=\"The profile to use when creating your session.\"\n )\n region_name: Optional[str] = Field(\n default=None,\n description=\"The AWS Region where you want to create new connections.\",\n )\n aws_client_parameters: AwsClientParameters = Field(\n default_factory=AwsClientParameters,\n description=\"Extra parameters to initialize the Client.\",\n title=\"AWS Client Parameters\",\n )\n\n def get_boto3_session(self) -> boto3.Session:\n \"\"\"\n Returns an authenticated boto3 session that can be used to create clients\n for AWS services\n\n Example:\n Create an S3 client from an authorized boto3 session:\n ```python\n aws_credentials = AwsCredentials(\n aws_access_key_id = \"access_key_id\",\n aws_secret_access_key = \"secret_access_key\"\n )\n s3_client = aws_credentials.get_boto3_session().client(\"s3\")\n ```\n \"\"\"\n\n if self.aws_secret_access_key:\n aws_secret_access_key = self.aws_secret_access_key.get_secret_value()\n else:\n aws_secret_access_key = None\n\n return boto3.Session(\n aws_access_key_id=self.aws_access_key_id,\n aws_secret_access_key=aws_secret_access_key,\n aws_session_token=self.aws_session_token,\n profile_name=self.profile_name,\n region_name=self.region_name,\n )\n\n def get_client(self, client_type: Union[str, ClientType]) -> Any:\n \"\"\"\n Helper method to dynamically get a client type.\n\n Args:\n client_type: The client's service name.\n\n Returns:\n An authenticated client.\n\n Raises:\n ValueError: if the client is not supported.\n \"\"\"\n if isinstance(client_type, ClientType):\n client_type = client_type.value\n\n client = self.get_boto3_session().client(\n service_name=client_type, **self.aws_client_parameters.get_params_override()\n )\n return client\n\n def get_s3_client(self) -> S3Client:\n \"\"\"\n Gets an authenticated S3 client.\n\n Returns:\n An authenticated S3 client.\n \"\"\"\n return self.get_client(client_type=ClientType.S3)\n\n def get_secrets_manager_client(self) -> SecretsManagerClient:\n \"\"\"\n Gets an authenticated Secrets Manager client.\n\n Returns:\n An authenticated Secrets Manager client.\n \"\"\"\n return 
self.get_client(client_type=ClientType.SECRETS_MANAGER)\n
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials-attributes","title":"Attributes","text":""},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.aws_access_key_id","title":"
aws_access_key_id: str
pydantic-field
","text":"
A specific AWS access key ID.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.aws_client_parameters","title":"
aws_client_parameters: AwsClientParameters
pydantic-field
","text":"
Extra parameters to initialize the Client.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.aws_secret_access_key","title":"
aws_secret_access_key: SecretStr
pydantic-field
","text":"
A specific AWS secret access key.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.aws_session_token","title":"
aws_session_token: str
pydantic-field
","text":"
The session key for your AWS account. This is only needed when you are using temporary credentials.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.profile_name","title":"
profile_name: str
pydantic-field
","text":"
The profile to use when creating your session.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.region_name","title":"
region_name: str
pydantic-field
","text":"
The AWS Region where you want to create new connections.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials-methods","title":"Methods","text":""},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, args, *keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.get_boto3_session","title":"
get_boto3_session
","text":"
Returns an authenticated boto3 session that can be used to create clients for AWS services
Examples:
Create an S3 client from an authorized boto3 session:
aws_credentials = AwsCredentials(\n aws_access_key_id = \"access_key_id\",\n aws_secret_access_key = \"secret_access_key\"\n )\ns3_client = aws_credentials.get_boto3_session().client(\"s3\")\n
Source code in
prefect_aws/credentials.py
def get_boto3_session(self) -> boto3.Session:\n \"\"\"\n Returns an authenticated boto3 session that can be used to create clients\n for AWS services\n\n Example:\n Create an S3 client from an authorized boto3 session:\n ```python\n aws_credentials = AwsCredentials(\n aws_access_key_id = \"access_key_id\",\n aws_secret_access_key = \"secret_access_key\"\n )\n s3_client = aws_credentials.get_boto3_session().client(\"s3\")\n ```\n \"\"\"\n\n if self.aws_secret_access_key:\n aws_secret_access_key = self.aws_secret_access_key.get_secret_value()\n else:\n aws_secret_access_key = None\n\n return boto3.Session(\n aws_access_key_id=self.aws_access_key_id,\n aws_secret_access_key=aws_secret_access_key,\n aws_session_token=self.aws_session_token,\n profile_name=self.profile_name,\n region_name=self.region_name,\n )\n
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.get_client","title":"
get_client
","text":"
Helper method to dynamically get a client type.
Parameters:
Name Type Description Default
client_type
Union[str, prefect_aws.credentials.ClientType]
The client's service name.
required
Returns:
Type Description
Any
An authenticated client.
Exceptions:
Type Description
ValueError
if the client is not supported.
Source code in
prefect_aws/credentials.py
def get_client(self, client_type: Union[str, ClientType]) -> Any:\n \"\"\"\n Helper method to dynamically get a client type.\n\n Args:\n client_type: The client's service name.\n\n Returns:\n An authenticated client.\n\n Raises:\n ValueError: if the client is not supported.\n \"\"\"\n if isinstance(client_type, ClientType):\n client_type = client_type.value\n\n client = self.get_boto3_session().client(\n service_name=client_type, **self.aws_client_parameters.get_params_override()\n )\n return client\n
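A small usage sketch, assuming a saved credentials block; the service name can be any string accepted by boto3:
from prefect_aws import AwsCredentials\n\n# load the saved credentials block and build an authenticated Batch client\naws_credentials = AwsCredentials.load(\"BLOCK_NAME\")\nbatch_client = aws_credentials.get_client(\"batch\")\n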
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.get_s3_client","title":"
get_s3_client
","text":"
Gets an authenticated S3 client.
Returns:
Type Description
S3Client
An authenticated S3 client.
Source code in
prefect_aws/credentials.py
def get_s3_client(self) -> S3Client:\n \"\"\"\n Gets an authenticated S3 client.\n\n Returns:\n An authenticated S3 client.\n \"\"\"\n return self.get_client(client_type=ClientType.S3)\n
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.get_secrets_manager_client","title":"
get_secrets_manager_client
","text":"
Gets an authenticated Secrets Manager client.
Returns:
Type Description
SecretsManagerClient
An authenticated Secrets Manager client.
Source code in
prefect_aws/credentials.py
def get_secrets_manager_client(self) -> SecretsManagerClient:\n \"\"\"\n Gets an authenticated Secrets Manager client.\n\n Returns:\n An authenticated Secrets Manager client.\n \"\"\"\n return self.get_client(client_type=ClientType.SECRETS_MANAGER)\n
"},{"location":"credentials/#prefect_aws.credentials.ClientType","title":"
ClientType (Enum)
","text":"
An enumeration.
Source code in
prefect_aws/credentials.py
class ClientType(Enum):\n S3 = \"s3\"\n ECS = \"ecs\"\n BATCH = \"batch\"\n SECRETS_MANAGER = \"secretsmanager\"\n
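The enum members map to boto3 service names and may be passed to get_client in place of raw strings. A minimal sketch, assuming a saved credentials block:
from prefect_aws import AwsCredentials\nfrom prefect_aws.credentials import ClientType\n\n# enum members resolve to boto3 service names, e.g. \"secretsmanager\"\naws_credentials = AwsCredentials.load(\"BLOCK_NAME\")\nsecrets_client = aws_credentials.get_client(ClientType.SECRETS_MANAGER)\n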
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials","title":"
MinIOCredentials (CredentialsBlock)
pydantic-model
","text":"
Block used to manage authentication with MinIO. Refer to the MinIO docs for more info about the possible credential configurations.
Attributes:
Name Type Description
minio_root_user
str
Admin or root user.
minio_root_password
SecretStr
Admin or root password.
region_name
Optional[str]
Location of server, e.g. \"us-east-1\".
Examples:
Load stored MinIO credentials:
from prefect_aws import MinIOCredentials\n\nminio_credentials_block = MinIOCredentials.load(\"BLOCK_NAME\")\n
Source code in
prefect_aws/credentials.py
class MinIOCredentials(CredentialsBlock):\n \"\"\"\n Block used to manage authentication with MinIO. Refer to the\n [MinIO docs](https://docs.min.io/docs/minio-server-configuration-guide.html)\n for more info about the possible credential configurations.\n\n Attributes:\n minio_root_user: Admin or root user.\n minio_root_password: Admin or root password.\n region_name: Location of server, e.g. \"us-east-1\".\n\n Example:\n Load stored MinIO credentials:\n ```python\n from prefect_aws import MinIOCredentials\n\n minio_credentials_block = MinIOCredentials.load(\"BLOCK_NAME\")\n ```\n \"\"\" # noqa E501\n\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/676cb17bcbdff601f97e0a02ff8bcb480e91ff40-250x250.png\" # noqa\n _block_type_name = \"MinIO Credentials\"\n _description = (\n \"Block used to manage authentication with MinIO. Refer to the MinIO \"\n \"docs: https://docs.min.io/docs/minio-server-configuration-guide.html \"\n \"for more info about the possible credential configurations.\"\n )\n _documentation_url = \"https://prefecthq.github.io/prefect-aws/credentials/#prefect_aws.credentials.MinIOCredentials\" # noqa\n\n minio_root_user: str = Field(default=..., description=\"Admin or root user.\")\n minio_root_password: SecretStr = Field(\n default=..., description=\"Admin or root password.\"\n )\n region_name: Optional[str] = Field(\n default=None,\n description=\"The AWS Region where you want to create new connections.\",\n )\n aws_client_parameters: AwsClientParameters = Field(\n default_factory=AwsClientParameters,\n description=\"Extra parameters to initialize the Client.\",\n )\n\n def get_boto3_session(self) -> boto3.Session:\n \"\"\"\n Returns an authenticated boto3 session that can be used to create clients\n and perform object operations on MinIO server.\n\n Example:\n Create an S3 client from an authorized boto3 session\n\n ```python\n minio_credentials = MinIOCredentials(\n minio_root_user = \"minio_root_user\",\n minio_root_password = \"minio_root_password\"\n )\n s3_client = minio_credentials.get_boto3_session().client(\n service=\"s3\",\n endpoint_url=\"http://localhost:9000\"\n )\n ```\n \"\"\"\n\n minio_root_password = (\n self.minio_root_password.get_secret_value()\n if self.minio_root_password\n else None\n )\n\n return boto3.Session(\n aws_access_key_id=self.minio_root_user,\n aws_secret_access_key=minio_root_password,\n region_name=self.region_name,\n )\n\n def get_client(self, client_type: Union[str, ClientType]) -> Any:\n \"\"\"\n Helper method to dynamically get a client type.\n\n Args:\n client_type: The client's service name.\n\n Returns:\n An authenticated client.\n\n Raises:\n ValueError: if the client is not supported.\n \"\"\"\n if isinstance(client_type, ClientType):\n client_type = client_type.value\n\n client = self.get_boto3_session().client(\n service_name=client_type, **self.aws_client_parameters.get_params_override()\n )\n return client\n\n def get_s3_client(self) -> S3Client:\n \"\"\"\n Gets an authenticated S3 client.\n\n Returns:\n An authenticated S3 client.\n \"\"\"\n return self.get_client(client_type=ClientType.S3)\n
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials-attributes","title":"Attributes","text":""},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.aws_client_parameters","title":"
aws_client_parameters: AwsClientParameters
pydantic-field
","text":"
Extra parameters to initialize the Client.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.minio_root_password","title":"
minio_root_password: SecretStr
pydantic-field
required
","text":"
Admin or root password.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.minio_root_user","title":"
minio_root_user: str
pydantic-field
required
","text":"
Admin or root user.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.region_name","title":"
region_name: str
pydantic-field
","text":"
The AWS Region where you want to create new connections.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials-methods","title":"Methods","text":""},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, args, *keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.get_boto3_session","title":"
get_boto3_session
","text":"
Returns an authenticated boto3 session that can be used to create clients and perform object operations on MinIO server.
Examples:
Create an S3 client from an authorized boto3 session
minio_credentials = MinIOCredentials(\n minio_root_user = \"minio_root_user\",\n minio_root_password = \"minio_root_password\"\n)\ns3_client = minio_credentials.get_boto3_session().client(\n service=\"s3\",\n endpoint_url=\"http://localhost:9000\"\n)\n
Source code in
prefect_aws/credentials.py
def get_boto3_session(self) -> boto3.Session:\n \"\"\"\n Returns an authenticated boto3 session that can be used to create clients\n and perform object operations on MinIO server.\n\n Example:\n Create an S3 client from an authorized boto3 session\n\n ```python\n minio_credentials = MinIOCredentials(\n minio_root_user = \"minio_root_user\",\n minio_root_password = \"minio_root_password\"\n )\n s3_client = minio_credentials.get_boto3_session().client(\n service=\"s3\",\n endpoint_url=\"http://localhost:9000\"\n )\n ```\n \"\"\"\n\n minio_root_password = (\n self.minio_root_password.get_secret_value()\n if self.minio_root_password\n else None\n )\n\n return boto3.Session(\n aws_access_key_id=self.minio_root_user,\n aws_secret_access_key=minio_root_password,\n region_name=self.region_name,\n )\n
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.get_client","title":"
get_client
","text":"
Helper method to dynamically get a client type.
Parameters:
Name Type Description Default
client_type
Union[str, prefect_aws.credentials.ClientType]
The client's service name.
required
Returns:
Type Description
Any
An authenticated client.
Exceptions:
Type Description
ValueError
if the client is not supported.
Source code in
prefect_aws/credentials.py
def get_client(self, client_type: Union[str, ClientType]) -> Any:\n \"\"\"\n Helper method to dynamically get a client type.\n\n Args:\n client_type: The client's service name.\n\n Returns:\n An authenticated client.\n\n Raises:\n ValueError: if the client is not supported.\n \"\"\"\n if isinstance(client_type, ClientType):\n client_type = client_type.value\n\n client = self.get_boto3_session().client(\n service_name=client_type, **self.aws_client_parameters.get_params_override()\n )\n return client\n
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.get_s3_client","title":"
get_s3_client
","text":"
Gets an authenticated S3 client.
Returns:
Type Description
S3Client
An authenticated S3 client.
Source code in
prefect_aws/credentials.py
def get_s3_client(self) -> S3Client:\n \"\"\"\n Gets an authenticated S3 client.\n\n Returns:\n An authenticated S3 client.\n \"\"\"\n return self.get_client(client_type=ClientType.S3)\n
"},{"location":"ecs/","title":"ECS","text":""},{"location":"ecs/#prefect_aws.ecs","title":"
prefect_aws.ecs
","text":"
Integrations with the Amazon Elastic Container Service.
Examples:
Run a task using ECS Fargate
ECSTask(command=[\"echo\", \"hello world\"]).run()\n
Run a task using ECS Fargate with a spot container instance
ECSTask(command=[\"echo\", \"hello world\"], launch_type=\"FARGATE_SPOT\").run()\n
Run a task using ECS with an EC2 container instance
ECSTask(command=[\"echo\", \"hello world\"], launch_type=\"EC2\").run()\n
Run a task on a specific VPC using ECS Fargate
ECSTask(command=[\"echo\", \"hello world\"], vpc_id=\"vpc-01abcdf123456789a\").run()\n
Run a task and stream the container's output to the local terminal. Note an execution role must be provided with permissions: logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents.
ECSTask(\n command=[\"echo\", \"hello world\"],\n stream_output=True,\n execution_role_arn=\"...\"\n)\n
Run a task using an existing task definition as a base
ECSTask(command=[\"echo\", \"hello world\"], task_definition_arn=\"arn:aws:ecs:...\")\n
Run a task with a specific image
ECSTask(command=[\"echo\", \"hello world\"], image=\"alpine:latest\")\n
Run a task with custom memory and CPU requirements
ECSTask(command=[\"echo\", \"hello world\"], memory=4096, cpu=2048)\n
Run a task with custom environment variables
ECSTask(command=[\"echo\", \"hello $PLANET\"], env={\"PLANET\": \"earth\"})\n
Run a task in a specific ECS cluster
ECSTask(command=[\"echo\", \"hello world\"], cluster=\"my-cluster-name\")\n
Run a task with custom VPC subnets
ECSTask(\n command=[\"echo\", \"hello world\"],\n task_customizations=[\n {\n \"op\": \"add\",\n \"path\": \"/networkConfiguration/awsvpcConfiguration/subnets\",\n \"value\": [\"subnet-80b6fbcd\", \"subnet-42a6fdgd\"],\n },\n ]\n)\n
Run a task without a public IP assigned
ECSTask(\n command=[\"echo\", \"hello world\"],\n vpc_id=\"vpc-01abcdf123456789a\",\n task_customizations=[\n {\n \"op\": \"replace\",\n \"path\": \"/networkConfiguration/awsvpcConfiguration/assignPublicIp\",\n \"value\": \"DISABLED\",\n },\n ]\n)\n
Run a task with custom VPC security groups
ECSTask(\n command=[\"echo\", \"hello world\"],\n vpc_id=\"vpc-01abcdf123456789a\",\n task_customizations=[\n {\n \"op\": \"add\",\n \"path\": \"/networkConfiguration/awsvpcConfiguration/securityGroups\",\n \"value\": [\"sg-d72e9599956a084f5\"],\n },\n ],\n)\n
"},{"location":"ecs/#prefect_aws.ecs-classes","title":"Classes","text":""},{"location":"ecs/#prefect_aws.ecs.ECSTask","title":"
ECSTask (Infrastructure)
pydantic-model
","text":"
Run a command as an ECS task.
Attributes:
Name Type Description
type
Literal['ecs-task']
The slug for this task type with a default value of \"ecs-task\".
aws_credentials
AwsCredentials
The AWS credentials to use to connect to ECS with a default factory of AwsCredentials.
task_definition_arn
Optional[str]
An optional identifier for an existing task definition to use. If fields are set on the ECSTask that conflict with the task definition, a new copy will be registered with the required values. Cannot be used with task_definition. If not provided, Prefect will generate and register a minimal task definition.
task_definition
Optional[dict]
An optional ECS task definition to use. Prefect may set defaults or override fields on this task definition to match other ECSTask fields. Cannot be used with task_definition_arn. If not provided, Prefect will generate and register a minimal task definition.
family
Optional[str]
An optional family for the task definition. If not provided, it will be inferred from the task definition. If the task definition does not have a family, the name will be generated. When flow and deployment metadata is available, the generated name will include their names. Values for this field will be slugified to match AWS character requirements.
image
Optional[str]
An optional image to use for the Prefect container in the task. If this value is not null, it will override the value in the task definition. This value defaults to a Prefect base image matching your local versions.
auto_deregister_task_definition
bool
A boolean that controls if any task definitions that are created by this block will be deregistered or not. Existing task definitions linked by ARN will never be deregistered. Deregistering a task definition does not remove it from your AWS account, instead it will be marked as INACTIVE.
cpu
int
The amount of CPU to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of ECS_DEFAULT_CPU will be used unless present on the task definition.
memory
int
The amount of memory to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of ECS_DEFAULT_MEMORY will be used unless present on the task definition.
execution_role_arn
str
An execution role to use for the task. This controls the permissions of the task when it is launching. If this value is not null, it will override the value in the task definition. An execution role must be provided to capture logs from the container.
configure_cloudwatch_logs
bool
A boolean that controls if the Prefect container will be configured to send its output to the AWS CloudWatch logs service or not. This functionality requires an execution role with permissions to create log streams and groups.
cloudwatch_logs_options
Dict[str, str]
A dictionary of options to pass to the CloudWatch logs configuration.
stream_output
bool
A boolean indicating whether logs will be streamed from the Prefect container to the local console.
launch_type
Optional[Literal['FARGATE', 'EC2', 'EXTERNAL', 'FARGATE_SPOT']]
An optional launch type for the ECS task run infrastructure.
vpc_id
Optional[str]
An optional VPC ID to link the task run to. This is only applicable when using the 'awsvpc' network mode for your task.
cluster
Optional[str]
An optional ECS cluster to run the task in. The ARN or name may be provided. If not provided, the default cluster will be used.
env
Dict[str, Optional[str]]
A dictionary of environment variables to provide to the task run. These variables are set on the Prefect container at task runtime.
task_role_arn
str
An optional role to attach to the task run. This controls the permissions of the task while it is running.
task_customizations
JsonPatch
A list of JSON 6902 patches to apply to the task run request. If a string is given, it will be parsed as a JSON expression.
task_start_timeout_seconds
int
The amount of time to watch for the start of the ECS task before marking it as failed. The task must enter a RUNNING state to be considered started.
task_watch_poll_interval
float
The amount of time to wait between AWS API calls while monitoring the state of an ECS task.
Source code in
prefect_aws/ecs.py
class ECSTask(Infrastructure):\n \"\"\"\n Run a command as an ECS task.\n\n Attributes:\n type: The slug for this task type with a default value of \"ecs-task\".\n aws_credentials: The AWS credentials to use to connect to ECS with a\n default factory of AwsCredentials.\n task_definition_arn: An optional identifier for an existing task definition\n to use. If fields are set on the ECSTask that conflict with the task\n definition, a new copy will be registered with the required values.\n Cannot be used with task_definition. If not provided, Prefect will\n generate and register a minimal task definition.\n task_definition: An optional ECS task definition to use. Prefect may set\n defaults or override fields on this task definition to match other\n ECSTask fields. Cannot be used with task_definition_arn.\n If not provided, Prefect will generate and register\n a minimal task definition.\n family: An optional family for the task definition. If not provided,\n it will be inferred from the task definition. If the task definition\n does not have a family, the name will be generated. When flow and\n deployment metadata is available, the generated name will include\n their names. Values for this field will be slugified to match\n AWS character requirements.\n image: An optional image to use for the Prefect container in the task.\n If this value is not null, it will override the value in the task\n definition. This value defaults to a Prefect base image matching\n your local versions.\n auto_deregister_task_definition: A boolean that controls if any task\n definitions that are created by this block will be deregistered\n or not. Existing task definitions linked by ARN will never be\n deregistered. Deregistering a task definition does not remove\n it from your AWS account, instead it will be marked as INACTIVE.\n cpu: The amount of CPU to provide to the ECS task. Valid amounts are\n specified in the AWS documentation. If not provided, a default\n value of ECS_DEFAULT_CPU will be used unless present on\n the task definition.\n memory: The amount of memory to provide to the ECS task.\n Valid amounts are specified in the AWS documentation.\n If not provided, a default value of ECS_DEFAULT_MEMORY\n will be used unless present on the task definition.\n execution_role_arn: An execution role to use for the task.\n This controls the permissions of the task when it is launching.\n If this value is not null, it will override the value in the task\n definition. An execution role must be provided to capture logs\n from the container.\n configure_cloudwatch_logs: A boolean that controls if the Prefect\n container will be configured to send its output to the\n AWS CloudWatch logs service or not. This functionality requires\n an execution role with permissions to create log streams and groups.\n cloudwatch_logs_options: A dictionary of options to pass to\n the CloudWatch logs configuration.\n stream_output: A boolean indicating whether logs will be\n streamed from the Prefect container to the local console.\n launch_type: An optional launch type for the ECS task run infrastructure.\n vpc_id: An optional VPC ID to link the task run to.\n This is only applicable when using the 'awsvpc' network mode for your task.\n cluster: An optional ECS cluster to run the task in.\n The ARN or name may be provided. If not provided,\n the default cluster will be used.\n env: A dictionary of environment variables to provide to\n the task run. 
These variables are set on the\n Prefect container at task runtime.\n task_role_arn: An optional role to attach to the task run.\n This controls the permissions of the task while it is running.\n task_customizations: A list of JSON 6902 patches to apply to the task\n run request. If a string is given, it will parsed as a JSON expression.\n task_start_timeout_seconds: The amount of time to watch for the\n start of the ECS task before marking it as failed. The task must\n enter a RUNNING state to be considered started.\n task_watch_poll_interval: The amount of time to wait between AWS API\n calls while monitoring the state of an ECS task.\n \"\"\"\n\n _block_type_slug = \"ecs-task\"\n _block_type_name = \"ECS Task\"\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n _description = \"Run a command as an ECS task.\" # noqa\n _documentation_url = (\n \"https://prefecthq.github.io/prefect-aws/ecs/#prefect_aws.ecs.ECSTask\" # noqa\n )\n\n type: Literal[\"ecs-task\"] = Field(\n \"ecs-task\", description=\"The slug for this task type.\"\n )\n\n aws_credentials: AwsCredentials = Field(\n title=\"AWS Credentials\",\n default_factory=AwsCredentials,\n description=\"The AWS credentials to use to connect to ECS.\",\n )\n\n # Task definition settings\n task_definition_arn: Optional[str] = Field(\n default=None,\n description=(\n \"An identifier for an existing task definition to use. If fields are set \"\n \"on the `ECSTask` that conflict with the task definition, a new copy \"\n \"will be registered with the required values. \"\n \"Cannot be used with `task_definition`. If not provided, Prefect will \"\n \"generate and register a minimal task definition.\"\n ),\n )\n task_definition: Optional[dict] = Field(\n default=None,\n description=(\n \"An ECS task definition to use. Prefect may set defaults or override \"\n \"fields on this task definition to match other `ECSTask` fields. \"\n \"Cannot be used with `task_definition_arn`. If not provided, Prefect will \"\n \"generate and register a minimal task definition.\"\n ),\n )\n family: Optional[str] = Field(\n default=None,\n description=(\n \"A family for the task definition. If not provided, it will be inferred \"\n \"from the task definition. If the task definition does not have a family, \"\n \"the name will be generated. When flow and deployment metadata is \"\n \"available, the generated name will include their names. Values for this \"\n \"field will be slugified to match AWS character requirements.\"\n ),\n )\n image: Optional[str] = Field(\n default=None,\n description=(\n \"The image to use for the Prefect container in the task. If this value is \"\n \"not null, it will override the value in the task definition. This value \"\n \"defaults to a Prefect base image matching your local versions.\"\n ),\n )\n auto_deregister_task_definition: bool = Field(\n default=True,\n description=(\n \"If set, any task definitions that are created by this block will be \"\n \"deregistered. Existing task definitions linked by ARN will never be \"\n \"deregistered. Deregistering a task definition does not remove it from \"\n \"your AWS account, instead it will be marked as INACTIVE.\"\n ),\n )\n\n # Mixed task definition / run settings\n cpu: int = Field(\n title=\"CPU\",\n default=None,\n description=(\n \"The amount of CPU to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. 
If not provided, a default value of \"\n f\"{ECS_DEFAULT_CPU} will be used unless present on the task definition.\"\n ),\n )\n memory: int = Field(\n default=None,\n description=(\n \"The amount of memory to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_MEMORY} will be used unless present on the task definition.\"\n ),\n )\n execution_role_arn: str = Field(\n title=\"Execution Role ARN\",\n default=None,\n description=(\n \"An execution role to use for the task. This controls the permissions of \"\n \"the task when it is launching. If this value is not null, it will \"\n \"override the value in the task definition. An execution role must be \"\n \"provided to capture logs from the container.\"\n ),\n )\n configure_cloudwatch_logs: bool = Field(\n default=None,\n description=(\n \"If `True`, the Prefect container will be configured to send its output \"\n \"to the AWS CloudWatch logs service. This functionality requires an \"\n \"execution role with logs:CreateLogStream, logs:CreateLogGroup, and \"\n \"logs:PutLogEvents permissions. The default for this field is `False` \"\n \"unless `stream_output` is set.\"\n ),\n )\n cloudwatch_logs_options: Dict[str, str] = Field(\n default_factory=dict,\n description=(\n \"When `configure_cloudwatch_logs` is enabled, this setting may be used to \"\n \"pass additional options to the CloudWatch logs configuration or override \"\n \"the default options. See the AWS documentation for available options. \"\n \"https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html#create_awslogs_logdriver_options.\" # noqa\n ),\n )\n stream_output: bool = Field(\n default=None,\n description=(\n \"If `True`, logs will be streamed from the Prefect container to the local \"\n \"console. Unless you have configured AWS CloudWatch logs manually on your \"\n \"task definition, this requires the same prerequisites outlined in \"\n \"`configure_cloudwatch_logs`.\"\n ),\n )\n\n # Task run settings\n launch_type: Optional[Literal[\"FARGATE\", \"EC2\", \"EXTERNAL\", \"FARGATE_SPOT\"]] = (\n Field(\n default=\"FARGATE\",\n description=(\n \"The type of ECS task run infrastructure that should be used. Note that\"\n \" 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure\"\n \" the proper capacity provider stategy if set here.\"\n ),\n )\n )\n vpc_id: Optional[str] = Field(\n title=\"VPC ID\",\n default=None,\n description=(\n \"The AWS VPC to link the task run to. This is only applicable when using \"\n \"the 'awsvpc' network mode for your task. FARGATE tasks require this \"\n \"network mode, but for EC2 tasks the default network mode is 'bridge'. \"\n \"If using the 'awsvpc' network mode and this field is null, your default \"\n \"VPC will be used. If no default VPC can be found, the task run will fail.\"\n ),\n )\n cluster: Optional[str] = Field(\n default=None,\n description=(\n \"The ECS cluster to run the task in. The ARN or name may be provided. If \"\n \"not provided, the default cluster will be used.\"\n ),\n )\n env: Dict[str, Optional[str]] = Field(\n title=\"Environment Variables\",\n default_factory=dict,\n description=(\n \"Environment variables to provide to the task run. These variables are set \"\n \"on the Prefect container at task runtime. These will not be set on the \"\n \"task definition.\"\n ),\n )\n task_role_arn: str = Field(\n title=\"Task Role ARN\",\n default=None,\n description=(\n \"A role to attach to the task run. 
This controls the permissions of the \"\n \"task while it is running.\"\n ),\n )\n task_customizations: JsonPatch = Field(\n default_factory=lambda: JsonPatch([]),\n description=(\n \"A list of JSON 6902 patches to apply to the task run request. \"\n \"If a string is given, it will parsed as a JSON expression.\"\n ),\n )\n\n # Execution settings\n task_start_timeout_seconds: int = Field(\n default=120,\n description=(\n \"The amount of time to watch for the start of the ECS task \"\n \"before marking it as failed. The task must enter a RUNNING state to be \"\n \"considered started.\"\n ),\n )\n task_watch_poll_interval: float = Field(\n default=5.0,\n description=(\n \"The amount of time to wait between AWS API calls while monitoring the \"\n \"state of an ECS task.\"\n ),\n )\n\n @root_validator(pre=True)\n def set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n\n @root_validator\n def configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_definition_arn\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n\n @root_validator\n def cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n\n @root_validator(pre=True)\n def image_is_required(cls, values: dict) -> dict:\n \"\"\"\n Enforces that an image is available if image is `None`.\n \"\"\"\n has_image = bool(values.get(\"image\"))\n has_task_definition_arn = bool(values.get(\"task_definition_arn\"))\n\n # The image can only be null when the task_definition_arn is set\n if has_image or has_task_definition_arn:\n return values\n\n prefect_container = (\n get_prefect_container(\n (values.get(\"task_definition\") or {}).get(\"containerDefinitions\", [])\n )\n or {}\n )\n image_in_task_definition = prefect_container.get(\"image\")\n\n # If a task_definition is given with a prefect container image, use that value\n if image_in_task_definition:\n values[\"image\"] = image_in_task_definition\n # Otherwise, it should default to the Prefect base image\n else:\n values[\"image\"] = get_prefect_image_name()\n return values\n\n @validator(\"task_customizations\", pre=True)\n def cast_customizations_to_a_json_patch(\n cls, value: 
Union[List[Dict], JsonPatch, str]\n ) -> JsonPatch:\n \"\"\"\n Casts lists to JsonPatch instances.\n \"\"\"\n if isinstance(value, str):\n value = json.loads(value)\n if isinstance(value, list):\n return JsonPatch(value)\n return value # type: ignore\n\n class Config:\n \"\"\"Configuration of pydantic.\"\"\"\n\n # Support serialization of the 'JsonPatch' type\n arbitrary_types_allowed = True\n json_encoders = {JsonPatch: lambda p: p.patch}\n\n def dict(self, *args, **kwargs) -> Dict:\n \"\"\"\n Convert to a dictionary.\n \"\"\"\n # Support serialization of the 'JsonPatch' type\n d = super().dict(*args, **kwargs)\n d[\"task_customizations\"] = self.task_customizations.patch\n return d\n\n def prepare_for_flow_run(\n self: Self,\n flow_run: \"FlowRun\",\n deployment: Optional[\"Deployment\"] = None,\n flow: Optional[\"Flow\"] = None,\n ) -> Self:\n \"\"\"\n Return an copy of the block that is prepared to execute a flow run.\n \"\"\"\n new_family = None\n\n # Update the family if not specified elsewhere\n if (\n not self.family\n and not self.task_definition_arn\n and not (self.task_definition and self.task_definition.get(\"family\"))\n ):\n if flow and deployment:\n new_family = f\"{ECS_DEFAULT_FAMILY}__{flow.name}__{deployment.name}\"\n elif flow and not deployment:\n new_family = f\"{ECS_DEFAULT_FAMILY}__{flow.name}\"\n elif deployment and not flow:\n # This is a weird case and should not be see in the wild\n new_family = f\"{ECS_DEFAULT_FAMILY}__unknown-flow__{deployment.name}\"\n\n new = super().prepare_for_flow_run(flow_run, deployment=deployment, flow=flow)\n\n if new_family:\n return new.copy(update={\"family\": new_family})\n else:\n # Avoid an extra copy if not needed\n return new\n\n @sync_compatible\n async def run(self, task_status: Optional[TaskStatus] = None) -> ECSTaskResult:\n \"\"\"\n Run the configured task on ECS.\n \"\"\"\n boto_session, ecs_client = await run_sync_in_worker_thread(\n self._get_session_and_client\n )\n\n (\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition,\n ) = await run_sync_in_worker_thread(\n self._create_task_and_wait_for_start, boto_session, ecs_client\n )\n\n # Display a nice message indicating the command and image\n command = self.command or get_prefect_container(\n task_definition[\"containerDefinitions\"]\n ).get(\"command\", [])\n self.logger.info(\n f\"{self._log_prefix}: Running command {' '.join(command)!r} \"\n f\"in container {PREFECT_ECS_CONTAINER_NAME!r} ({self.image})...\"\n )\n\n # The task identifier is \"{cluster}::{task}\" where we use the configured cluster\n # if set to preserve matching by name rather than arn\n # Note \"::\" is used despite the Prefect standard being \":\" because ARNs contain\n # single colons.\n identifier = (self.cluster if self.cluster else cluster_arn) + \"::\" + task_arn\n\n if task_status:\n task_status.started(identifier)\n\n status_code = await run_sync_in_worker_thread(\n self._watch_task_and_get_exit_code,\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition and self.auto_deregister_task_definition,\n boto_session,\n ecs_client,\n )\n\n return ECSTaskResult(\n identifier=identifier,\n # If the container does not start the exit code can be null but we must\n # still report a status code. 
We use a -1 to indicate a special code.\n status_code=status_code if status_code is not None else -1,\n )\n\n @sync_compatible\n async def kill(self, identifier: str, grace_seconds: int = 30) -> None:\n \"\"\"\n Kill a task running on ECS.\n\n Args:\n identifier: A cluster and task arn combination. This should match a value\n yielded by `ECSTask.run`.\n \"\"\"\n if grace_seconds != 30:\n self.logger.warning(\n f\"Kill grace period of {grace_seconds}s requested, but AWS does not \"\n \"support dynamic grace period configuration so 30s will be used. \"\n \"See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html for configuration of grace periods.\" # noqa\n )\n cluster, task = parse_task_identifier(identifier)\n await run_sync_in_worker_thread(self._stop_task, cluster, task)\n\n @staticmethod\n def get_corresponding_worker_type() -> str:\n \"\"\"Return the corresponding worker type for this infrastructure block.\"\"\"\n return ECSWorker.type\n\n async def generate_work_pool_base_job_template(self) -> dict:\n \"\"\"\n Generate a base job template for a cloud-run work pool with the same\n configuration as this block.\n\n Returns:\n - dict: a base job template for a cloud-run work pool\n \"\"\"\n base_job_template = copy.deepcopy(ECSWorker.get_default_base_job_template())\n for key, value in self.dict(exclude_unset=True, exclude_defaults=True).items():\n if key == \"command\":\n base_job_template[\"variables\"][\"properties\"][\"command\"][\"default\"] = (\n shlex.join(value)\n )\n elif key in [\n \"type\",\n \"block_type_slug\",\n \"_block_document_id\",\n \"_block_document_name\",\n \"_is_anonymous\",\n \"task_customizations\",\n ]:\n continue\n elif key == \"aws_credentials\":\n if not self.aws_credentials._block_document_id:\n raise BlockNotSavedError(\n \"It looks like you are trying to use a block that\"\n \" has not been saved. 
Please call `.save` on your block\"\n \" before publishing it as a work pool.\"\n )\n base_job_template[\"variables\"][\"properties\"][\"aws_credentials\"][\n \"default\"\n ] = {\n \"$ref\": {\n \"block_document_id\": str(\n self.aws_credentials._block_document_id\n )\n }\n }\n elif key == \"task_definition\":\n base_job_template[\"job_configuration\"][\"task_definition\"] = value\n elif key in base_job_template[\"variables\"][\"properties\"]:\n base_job_template[\"variables\"][\"properties\"][key][\"default\"] = value\n else:\n self.logger.warning(\n f\"Variable {key!r} is not supported by Cloud Run work pools.\"\n \" Skipping.\"\n )\n\n if self.task_customizations:\n try:\n base_job_template[\"job_configuration\"][\"task_run_request\"] = (\n self.task_customizations.apply(\n base_job_template[\"job_configuration\"][\"task_run_request\"]\n )\n )\n except JsonPointerException:\n self.logger.warning(\n \"Unable to apply task customizations to the base job template.\"\n \"You may need to update the template manually.\"\n )\n\n return base_job_template\n\n def _stop_task(self, cluster: str, task: str) -> None:\n \"\"\"\n Stop a running ECS task.\n \"\"\"\n if self.cluster is not None and cluster != self.cluster:\n raise InfrastructureNotAvailable(\n \"Cannot stop ECS task: this infrastructure block has access to \"\n f\"cluster {self.cluster!r} but the task is running in cluster \"\n f\"{cluster!r}.\"\n )\n\n _, ecs_client = self._get_session_and_client()\n try:\n ecs_client.stop_task(cluster=cluster, task=task)\n except Exception as exc:\n # Raise a special exception if the task does not exist\n if \"ClusterNotFound\" in str(exc):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the cluster {cluster!r} could not be found.\"\n ) from exc\n if \"not find task\" in str(exc) or \"referenced task was not found\" in str(\n exc\n ):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the task {task!r} could not be found in \"\n f\"cluster {cluster!r}.\"\n ) from exc\n if \"no registered tasks\" in str(exc):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the cluster {cluster!r} has no tasks.\"\n ) from exc\n\n # Reraise unknown exceptions\n raise\n\n @property\n def _log_prefix(self) -> str:\n \"\"\"\n Internal property for generating a prefix for logs where `name` may be null\n \"\"\"\n if self.name is not None:\n return f\"ECSTask {self.name!r}\"\n else:\n return \"ECSTask\"\n\n def _get_session_and_client(self) -> Tuple[boto3.Session, _ECSClient]:\n \"\"\"\n Retrieve a boto3 session and ECS client\n \"\"\"\n boto_session = self.aws_credentials.get_boto3_session()\n ecs_client = boto_session.client(\"ecs\")\n return boto_session, ecs_client\n\n def _create_task_and_wait_for_start(\n self, boto_session: boto3.Session, ecs_client: _ECSClient\n ) -> Tuple[str, str, dict, bool]:\n \"\"\"\n Register the task definition, create the task run, and wait for it to start.\n\n Returns a tuple of\n - The task ARN\n - The task's cluster ARN\n - The task definition\n - A bool indicating if the task definition is newly registered\n \"\"\"\n new_task_definition_registered = False\n requested_task_definition = (\n self._retrieve_task_definition(ecs_client, self.task_definition_arn)\n if self.task_definition_arn\n else self.task_definition\n ) or {}\n task_definition_arn = requested_task_definition.get(\"taskDefinitionArn\", None)\n\n task_definition = self._prepare_task_definition(\n requested_task_definition, region=ecs_client.meta.region_name\n )\n\n # We must register the task 
definition if the arn is null or changes were made\n if task_definition != requested_task_definition or not task_definition_arn:\n # Before registering, check if the latest task definition in the family\n # can be used\n latest_task_definition = self._retrieve_latest_task_definition(\n ecs_client, task_definition[\"family\"]\n )\n if self._task_definitions_equal(latest_task_definition, task_definition):\n self.logger.debug(\n f\"{self._log_prefix}: The latest task definition matches the \"\n \"required task definition; using that instead of registering a new \"\n \" one.\"\n )\n task_definition_arn = latest_task_definition[\"taskDefinitionArn\"]\n else:\n if task_definition_arn:\n self.logger.warning(\n f\"{self._log_prefix}: Settings require changes to the linked \"\n \"task definition. A new task definition will be registered. \"\n + (\n \"Enable DEBUG level logs to see the difference.\"\n if self.logger.level > logging.DEBUG\n else \"\"\n )\n )\n self.logger.debug(\n f\"{self._log_prefix}: Diff for requested task definition\"\n + _pretty_diff(requested_task_definition, task_definition)\n )\n else:\n self.logger.info(\n f\"{self._log_prefix}: Registering task definition...\"\n )\n self.logger.debug(\n \"Task definition payload\\n\" + yaml.dump(task_definition)\n )\n\n task_definition_arn = self._register_task_definition(\n ecs_client, task_definition\n )\n new_task_definition_registered = True\n\n if task_definition.get(\"networkMode\") == \"awsvpc\":\n network_config = self._load_vpc_network_config(self.vpc_id, boto_session)\n else:\n network_config = None\n\n task_run = self._prepare_task_run(\n network_config=network_config,\n task_definition_arn=task_definition_arn,\n )\n self.logger.info(f\"{self._log_prefix}: Creating task run...\")\n self.logger.debug(\"Task run payload\\n\" + yaml.dump(task_run))\n\n try:\n task = self._run_task(ecs_client, task_run)\n task_arn = task[\"taskArn\"]\n cluster_arn = task[\"clusterArn\"]\n except Exception as exc:\n self._report_task_run_creation_failure(task_run, exc)\n\n # Raises an exception if the task does not start\n self.logger.info(f\"{self._log_prefix}: Waiting for task run to start...\")\n self._wait_for_task_start(\n task_arn, cluster_arn, ecs_client, timeout=self.task_start_timeout_seconds\n )\n\n return task_arn, cluster_arn, task_definition, new_task_definition_registered\n\n def _watch_task_and_get_exit_code(\n self,\n task_arn: str,\n cluster_arn: str,\n task_definition: dict,\n deregister_task_definition: bool,\n boto_session: boto3.Session,\n ecs_client: _ECSClient,\n ) -> Optional[int]:\n \"\"\"\n Wait for the task run to complete and retrieve the exit code of the Prefect\n container.\n \"\"\"\n\n # Wait for completion and stream logs\n task = self._wait_for_task_finish(\n task_arn, cluster_arn, task_definition, ecs_client, boto_session\n )\n\n if deregister_task_definition:\n ecs_client.deregister_task_definition(\n taskDefinition=task[\"taskDefinitionArn\"]\n )\n\n # Check the status code of the Prefect container\n prefect_container = get_prefect_container(task[\"containers\"])\n assert (\n prefect_container is not None\n ), f\"'prefect' container missing from task: {task}\"\n status_code = prefect_container.get(\"exitCode\")\n self._report_container_status_code(PREFECT_ECS_CONTAINER_NAME, status_code)\n\n return status_code\n\n def _task_definitions_equal(self, taskdef_1, taskdef_2) -> bool:\n \"\"\"\n Compare two task definitions.\n\n Since one may come from the AWS API and have populated defaults, we do our best\n to 
homogenize the definitions without changing their meaning.\n \"\"\"\n if taskdef_1 == taskdef_2:\n return True\n\n if taskdef_1 is None or taskdef_2 is None:\n return False\n\n taskdef_1 = copy.deepcopy(taskdef_1)\n taskdef_2 = copy.deepcopy(taskdef_2)\n\n def _set_aws_defaults(taskdef):\n \"\"\"Set defaults that AWS would set after registration\"\"\"\n container_definitions = taskdef.get(\"containerDefinitions\", [])\n essential = any(\n container.get(\"essential\") for container in container_definitions\n )\n if not essential:\n container_definitions[0].setdefault(\"essential\", True)\n\n taskdef.setdefault(\"networkMode\", \"bridge\")\n\n _set_aws_defaults(taskdef_1)\n _set_aws_defaults(taskdef_2)\n\n def _drop_empty_keys(dict_):\n \"\"\"Recursively drop keys with 'empty' values\"\"\"\n for key, value in tuple(dict_.items()):\n if not value:\n dict_.pop(key)\n if isinstance(value, dict):\n _drop_empty_keys(value)\n if isinstance(value, list):\n for v in value:\n if isinstance(v, dict):\n _drop_empty_keys(v)\n\n _drop_empty_keys(taskdef_1)\n _drop_empty_keys(taskdef_2)\n\n # Clear fields that change on registration for comparison\n for field in POST_REGISTRATION_FIELDS:\n taskdef_1.pop(field, None)\n taskdef_2.pop(field, None)\n\n return taskdef_1 == taskdef_2\n\n def preview(self) -> str:\n \"\"\"\n Generate a preview of the task definition and task run that will be sent to AWS.\n \"\"\"\n preview = \"\"\n\n task_definition_arn = self.task_definition_arn or \"<registered at runtime>\"\n\n if self.task_definition or not self.task_definition_arn:\n task_definition = self._prepare_task_definition(\n self.task_definition or {},\n region=self.aws_credentials.region_name\n or \"<loaded from client at runtime>\",\n )\n preview += \"---\\n# Task definition\\n\"\n preview += yaml.dump(task_definition)\n preview += \"\\n\"\n else:\n task_definition = None\n\n if task_definition and task_definition.get(\"networkMode\") == \"awsvpc\":\n vpc = \"the default VPC\" if not self.vpc_id else self.vpc_id\n network_config = {\n \"awsvpcConfiguration\": {\n \"subnets\": f\"<loaded from {vpc} at runtime>\",\n \"assignPublicIp\": \"ENABLED\",\n }\n }\n else:\n network_config = None\n\n task_run = self._prepare_task_run(network_config, task_definition_arn)\n preview += \"---\\n# Task run request\\n\"\n preview += yaml.dump(task_run)\n\n return preview\n\n def _report_container_status_code(\n self, name: str, status_code: Optional[int]\n ) -> None:\n \"\"\"\n Display a log for the given container status code.\n \"\"\"\n if status_code is None:\n self.logger.error(\n f\"{self._log_prefix}: Task exited without reporting an exit status \"\n f\"for container {name!r}.\"\n )\n elif status_code == 0:\n self.logger.info(\n f\"{self._log_prefix}: Container {name!r} exited successfully.\"\n )\n else:\n self.logger.warning(\n f\"{self._log_prefix}: Container {name!r} exited with non-zero exit \"\n f\"code {status_code}.\"\n )\n\n def _report_task_run_creation_failure(self, task_run: dict, exc: Exception) -> None:\n \"\"\"\n Wrap common AWS task run creation failures with nicer user-facing messages.\n \"\"\"\n # AWS generates exception types at runtime so they must be captured a bit\n # differently than normal.\n if \"ClusterNotFoundException\" in str(exc):\n cluster = task_run.get(\"cluster\", \"default\")\n raise RuntimeError(\n f\"Failed to run ECS task, cluster {cluster!r} not found. 
\"\n \"Confirm that the cluster is configured in your region.\"\n ) from exc\n elif \"No Container Instances\" in str(exc) and self.launch_type == \"EC2\":\n cluster = task_run.get(\"cluster\", \"default\")\n raise RuntimeError(\n f\"Failed to run ECS task, cluster {cluster!r} does not appear to \"\n \"have any container instances associated with it. Confirm that you \"\n \"have EC2 container instances available.\"\n ) from exc\n elif (\n \"failed to validate logger args\" in str(exc)\n and \"AccessDeniedException\" in str(exc)\n and self.configure_cloudwatch_logs\n ):\n raise RuntimeError(\n \"Failed to run ECS task, the attached execution role does not appear \"\n \"to have sufficient permissions. Ensure that the execution role \"\n f\"{self.execution_role!r} has permissions logs:CreateLogStream, \"\n \"logs:CreateLogGroup, and logs:PutLogEvents.\"\n )\n else:\n raise\n\n def _watch_task_run(\n self,\n task_arn: str,\n cluster_arn: str,\n ecs_client: _ECSClient,\n current_status: str = \"UNKNOWN\",\n until_status: str = None,\n timeout: int = None,\n ) -> Generator[None, None, dict]:\n \"\"\"\n Watches an ECS task run by querying every `poll_interval` seconds. After each\n query, the retrieved task is yielded. This function returns when the task run\n reaches a STOPPED status or the provided `until_status`.\n\n Emits a log each time the status changes.\n \"\"\"\n last_status = status = current_status\n t0 = time.time()\n while status != until_status:\n tasks = ecs_client.describe_tasks(\n tasks=[task_arn], cluster=cluster_arn, include=[\"TAGS\"]\n )[\"tasks\"]\n\n if tasks:\n task = tasks[0]\n\n status = task[\"lastStatus\"]\n if status != last_status:\n self.logger.info(f\"{self._log_prefix}: Status is {status}.\")\n\n yield task\n\n # No point in continuing if the status is final\n if status == \"STOPPED\":\n break\n\n last_status = status\n\n else:\n # Intermittently, the task will not be described. We wat to respect the\n # watch timeout though.\n self.logger.debug(f\"{self._log_prefix}: Task not found.\")\n\n elapsed_time = time.time() - t0\n if timeout is not None and elapsed_time > timeout:\n raise RuntimeError(\n f\"Timed out after {elapsed_time}s while watching task for status \"\n f\"{until_status or 'STOPPED'}\"\n )\n time.sleep(self.task_watch_poll_interval)\n\n def _wait_for_task_start(\n self, task_arn: str, cluster_arn: str, ecs_client: _ECSClient, timeout: int\n ) -> dict:\n \"\"\"\n Waits for an ECS task run to reach a RUNNING status.\n\n If a STOPPED status is reached instead, an exception is raised indicating the\n reason that the task run did not start.\n \"\"\"\n for task in self._watch_task_run(\n task_arn, cluster_arn, ecs_client, until_status=\"RUNNING\", timeout=timeout\n ):\n # TODO: It is possible that the task has passed _through_ a RUNNING\n # status during the polling interval. 
In this case, there is not an\n # exception to raise.\n if task[\"lastStatus\"] == \"STOPPED\":\n code = task.get(\"stopCode\")\n reason = task.get(\"stoppedReason\")\n # Generate a dynamic exception type from the AWS name\n raise type(code, (RuntimeError,), {})(reason)\n\n return task\n\n def _wait_for_task_finish(\n self,\n task_arn: str,\n cluster_arn: str,\n task_definition: dict,\n ecs_client: _ECSClient,\n boto_session: boto3.Session,\n ):\n \"\"\"\n Watch an ECS task until it reaches a STOPPED status.\n\n If configured, logs from the Prefect container are streamed to stderr.\n\n Returns a description of the task on completion.\n \"\"\"\n can_stream_output = False\n\n if self.stream_output:\n container_def = get_prefect_container(\n task_definition[\"containerDefinitions\"]\n )\n if not container_def:\n self.logger.warning(\n f\"{self._log_prefix}: Prefect container definition not found in \"\n \"task definition. Output cannot be streamed.\"\n )\n elif not container_def.get(\"logConfiguration\"):\n self.logger.warning(\n f\"{self._log_prefix}: Logging configuration not found on task. \"\n \"Output cannot be streamed.\"\n )\n elif not container_def[\"logConfiguration\"].get(\"logDriver\") == \"awslogs\":\n self.logger.warning(\n f\"{self._log_prefix}: Logging configuration uses unsupported \"\n \" driver {container_def['logConfiguration'].get('logDriver')!r}. \"\n \"Output cannot be streamed.\"\n )\n else:\n # Prepare to stream the output\n log_config = container_def[\"logConfiguration\"][\"options\"]\n logs_client = boto_session.client(\"logs\")\n can_stream_output = True\n # Track the last log timestamp to prevent double display\n last_log_timestamp: Optional[int] = None\n # Determine the name of the stream as \"prefix/container/run-id\"\n stream_name = \"/\".join(\n [\n log_config[\"awslogs-stream-prefix\"],\n PREFECT_ECS_CONTAINER_NAME,\n task_arn.rsplit(\"/\")[-1],\n ]\n )\n self.logger.info(\n f\"{self._log_prefix}: Streaming output from container \"\n f\"{PREFECT_ECS_CONTAINER_NAME!r}...\"\n )\n\n for task in self._watch_task_run(\n task_arn, cluster_arn, ecs_client, current_status=\"RUNNING\"\n ):\n if self.stream_output and can_stream_output:\n # On each poll for task run status, also retrieve available logs\n last_log_timestamp = self._stream_available_logs(\n logs_client,\n log_group=log_config[\"awslogs-group\"],\n log_stream=stream_name,\n last_log_timestamp=last_log_timestamp,\n )\n\n return task\n\n def _stream_available_logs(\n self,\n logs_client: Any,\n log_group: str,\n log_stream: str,\n last_log_timestamp: Optional[int] = None,\n ) -> Optional[int]:\n \"\"\"\n Stream logs from the given log group and stream since the last log timestamp.\n\n Will continue on paginated responses until all logs are returned.\n\n Returns the last log timestamp which can be used to call this method in the\n future.\n \"\"\"\n last_log_stream_token = \"NO-TOKEN\"\n next_log_stream_token = None\n\n # AWS will return the same token that we send once the end of the paginated\n # response is reached\n while last_log_stream_token != next_log_stream_token:\n last_log_stream_token = next_log_stream_token\n\n request = {\n \"logGroupName\": log_group,\n \"logStreamName\": log_stream,\n }\n\n if last_log_stream_token is not None:\n request[\"nextToken\"] = last_log_stream_token\n\n if last_log_timestamp is not None:\n # Bump the timestamp by one ms to avoid retrieving the last log again\n request[\"startTime\"] = last_log_timestamp + 1\n\n try:\n response = 
logs_client.get_log_events(**request)\n except Exception:\n self.logger.error(\n (\n f\"{self._log_prefix}: Failed to read log events with request \"\n f\"{request}\"\n ),\n exc_info=True,\n )\n return last_log_timestamp\n\n log_events = response[\"events\"]\n for log_event in log_events:\n # TODO: This doesn't forward to the local logger, which can be\n # bad for customizing handling and understanding where the\n # log is coming from, but it avoid nesting logger information\n # when the content is output from a Prefect logger on the\n # running infrastructure\n print(log_event[\"message\"], file=sys.stderr)\n\n if (\n last_log_timestamp is None\n or log_event[\"timestamp\"] > last_log_timestamp\n ):\n last_log_timestamp = log_event[\"timestamp\"]\n\n next_log_stream_token = response.get(\"nextForwardToken\")\n if not log_events:\n # Stop reading pages if there was no data\n break\n\n return last_log_timestamp\n\n def _retrieve_latest_task_definition(\n self, ecs_client: _ECSClient, task_definition_family: str\n ) -> Optional[dict]:\n try:\n latest_task_definition = self._retrieve_task_definition(\n ecs_client, task_definition_family\n )\n except Exception:\n # The family does not exist...\n return None\n\n return latest_task_definition\n\n def _retrieve_task_definition(\n self, ecs_client: _ECSClient, task_definition_arn: str\n ):\n \"\"\"\n Retrieve an existing task definition from AWS.\n \"\"\"\n self.logger.info(\n f\"{self._log_prefix}: Retrieving task definition {task_definition_arn!r}...\"\n )\n response = ecs_client.describe_task_definition(\n taskDefinition=task_definition_arn\n )\n return response[\"taskDefinition\"]\n\n def _register_task_definition(\n self, ecs_client: _ECSClient, task_definition: dict\n ) -> str:\n \"\"\"\n Register a new task definition with AWS.\n \"\"\"\n # TODO: Consider including a global cache for this task definition since\n # registration of task definitions is frequently rate limited\n task_definition_request = copy.deepcopy(task_definition)\n\n # We need to remove some fields here if copying an existing task definition\n for field in POST_REGISTRATION_FIELDS:\n task_definition_request.pop(field, None)\n\n response = ecs_client.register_task_definition(**task_definition_request)\n return response[\"taskDefinition\"][\"taskDefinitionArn\"]\n\n def _prepare_task_definition(self, task_definition: dict, region: str) -> dict:\n \"\"\"\n Prepare a task definition by inferring any defaults and merging overrides.\n \"\"\"\n task_definition = copy.deepcopy(task_definition)\n\n # Configure the Prefect runtime container\n task_definition.setdefault(\"containerDefinitions\", [])\n container = get_prefect_container(task_definition[\"containerDefinitions\"])\n if container is None:\n container = {\"name\": PREFECT_ECS_CONTAINER_NAME}\n task_definition[\"containerDefinitions\"].append(container)\n\n if self.image:\n container[\"image\"] = self.image\n\n # Remove any keys that have been explicitly \"unset\"\n unset_keys = {key for key, value in self.env.items() if value is None}\n for item in tuple(container.get(\"environment\", [])):\n if item[\"name\"] in unset_keys:\n container[\"environment\"].remove(item)\n\n if self.configure_cloudwatch_logs:\n container[\"logConfiguration\"] = {\n \"logDriver\": \"awslogs\",\n \"options\": {\n \"awslogs-create-group\": \"true\",\n \"awslogs-group\": \"prefect\",\n \"awslogs-region\": region,\n \"awslogs-stream-prefix\": self.name or \"prefect\",\n **self.cloudwatch_logs_options,\n },\n }\n\n family = self.family or 
task_definition.get(\"family\") or ECS_DEFAULT_FAMILY\n task_definition[\"family\"] = slugify(\n family,\n max_length=255,\n regex_pattern=r\"[^a-zA-Z0-9-_]+\",\n )\n\n # CPU and memory are required in some cases, retrieve the value to use\n cpu = self.cpu or task_definition.get(\"cpu\") or ECS_DEFAULT_CPU\n memory = self.memory or task_definition.get(\"memory\") or ECS_DEFAULT_MEMORY\n\n if self.launch_type == \"FARGATE\" or self.launch_type == \"FARGATE_SPOT\":\n # Task level memory and cpu are required when using fargate\n task_definition[\"cpu\"] = str(cpu)\n task_definition[\"memory\"] = str(memory)\n\n # The FARGATE compatibility is required if it will be used as as launch type\n requires_compatibilities = task_definition.setdefault(\n \"requiresCompatibilities\", []\n )\n if \"FARGATE\" not in requires_compatibilities:\n task_definition[\"requiresCompatibilities\"].append(\"FARGATE\")\n\n # Only the 'awsvpc' network mode is supported when using FARGATE\n # However, we will not enforce that here if the user has set it\n network_mode = task_definition.setdefault(\"networkMode\", \"awsvpc\")\n\n if network_mode != \"awsvpc\":\n warnings.warn(\n f\"Found network mode {network_mode!r} which is not compatible with \"\n f\"launch type {self.launch_type!r}. Use either the 'EC2' launch \"\n \"type or the 'awsvpc' network mode.\"\n )\n\n elif self.launch_type == \"EC2\":\n # Container level memory and cpu are required when using ec2\n container.setdefault(\"cpu\", int(cpu))\n container.setdefault(\"memory\", int(memory))\n\n if self.execution_role_arn and not self.task_definition_arn:\n task_definition[\"executionRoleArn\"] = self.execution_role_arn\n\n if self.configure_cloudwatch_logs and not task_definition.get(\n \"executionRoleArn\"\n ):\n raise ValueError(\n \"An execution role arn must be set on the task definition to use \"\n \"`configure_cloudwatch_logs` or `stream_logs` but no execution role \"\n \"was found on the task definition.\"\n )\n\n return task_definition\n\n def _prepare_task_run_overrides(self) -> dict:\n \"\"\"\n Prepare the 'overrides' payload for a task run request.\n \"\"\"\n overrides = {\n \"containerOverrides\": [\n {\n \"name\": PREFECT_ECS_CONTAINER_NAME,\n \"environment\": [\n {\"name\": key, \"value\": value}\n for key, value in {\n **self._base_environment(),\n **self.env,\n }.items()\n if value is not None\n ],\n }\n ],\n }\n\n prefect_container_overrides = overrides[\"containerOverrides\"][0]\n\n if self.command:\n prefect_container_overrides[\"command\"] = self.command\n\n if self.execution_role_arn:\n overrides[\"executionRoleArn\"] = self.execution_role_arn\n\n if self.task_role_arn:\n overrides[\"taskRoleArn\"] = self.task_role_arn\n\n if self.memory:\n overrides[\"memory\"] = str(self.memory)\n prefect_container_overrides.setdefault(\"memory\", self.memory)\n\n if self.cpu:\n overrides[\"cpu\"] = str(self.cpu)\n prefect_container_overrides.setdefault(\"cpu\", self.cpu)\n\n return overrides\n\n def _load_vpc_network_config(\n self, vpc_id: Optional[str], boto_session: boto3.Session\n ) -> dict:\n \"\"\"\n Load settings from a specific VPC or the default VPC and generate a task\n run request's network configuration.\n \"\"\"\n ec2_client = boto_session.client(\"ec2\")\n vpc_message = \"the default VPC\" if not vpc_id else f\"VPC with ID {vpc_id}\"\n\n if not vpc_id:\n # Retrieve the default VPC\n describe = {\"Filters\": [{\"Name\": \"isDefault\", \"Values\": [\"true\"]}]}\n else:\n describe = {\"VpcIds\": [vpc_id]}\n\n vpcs = 
ec2_client.describe_vpcs(**describe)[\"Vpcs\"]\n if not vpcs:\n help_message = (\n \"Pass an explicit `vpc_id` or configure a default VPC.\"\n if not vpc_id\n else \"Check that the VPC exists in the current region.\"\n )\n raise ValueError(\n f\"Failed to find {vpc_message}. \"\n \"Network configuration cannot be inferred. \"\n + help_message\n )\n\n vpc_id = vpcs[0][\"VpcId\"]\n subnets = ec2_client.describe_subnets(\n Filters=[{\"Name\": \"vpc-id\", \"Values\": [vpc_id]}]\n )[\"Subnets\"]\n if not subnets:\n raise ValueError(\n f\"Failed to find subnets for {vpc_message}. \"\n \"Network configuration cannot be inferred.\"\n )\n\n return {\n \"awsvpcConfiguration\": {\n \"subnets\": [s[\"SubnetId\"] for s in subnets],\n \"assignPublicIp\": \"ENABLED\",\n \"securityGroups\": [],\n }\n }\n\n def _prepare_task_run(\n self,\n network_config: Optional[dict],\n task_definition_arn: str,\n ) -> dict:\n \"\"\"\n Prepare a task run request payload.\n \"\"\"\n task_run = {\n \"overrides\": self._prepare_task_run_overrides(),\n \"tags\": [\n {\n \"key\": slugify(\n key,\n regex_pattern=_TAG_REGEX,\n allow_unicode=True,\n lowercase=False,\n ),\n \"value\": slugify(\n value,\n regex_pattern=_TAG_REGEX,\n allow_unicode=True,\n lowercase=False,\n ),\n }\n for key, value in self.labels.items()\n ],\n \"taskDefinition\": task_definition_arn,\n }\n\n if self.cluster:\n task_run[\"cluster\"] = self.cluster\n\n if self.launch_type:\n if self.launch_type == \"FARGATE_SPOT\":\n task_run[\"capacityProviderStrategy\"] = [\n {\"capacityProvider\": \"FARGATE_SPOT\", \"weight\": 1}\n ]\n else:\n task_run[\"launchType\"] = self.launch_type\n\n if network_config:\n task_run[\"networkConfiguration\"] = network_config\n\n task_run = self.task_customizations.apply(task_run)\n return task_run\n\n def _run_task(self, ecs_client: _ECSClient, task_run: dict):\n \"\"\"\n Run the task using the ECS client.\n\n This is isolated as a separate method for testing purposes.\n \"\"\"\n return ecs_client.run_task(**task_run)[\"tasks\"][0]\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask-attributes","title":"Attributes","text":""},{"location":"ecs/#prefect_aws.ecs.ECSTask.auto_deregister_task_definition","title":"
auto_deregister_task_definition: bool
pydantic-field
","text":"
If set, any task definitions that are created by this block will be deregistered. Existing task definitions linked by ARN will never be deregistered. Deregistering a task definition does not remove it from your AWS account; instead, it is marked as INACTIVE.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.aws_credentials","title":"
aws_credentials: AwsCredentials
pydantic-field
","text":"
The AWS credentials to use to connect to ECS.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cloudwatch_logs_options","title":"
cloudwatch_logs_options: Dict[str, str]
pydantic-field
","text":"
When configure_cloudwatch_logs
is enabled, this setting may be used to pass additional options to the CloudWatch logs configuration or override the default options. See the AWS documentation for available options. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html#create_awslogs_logdriver_options.
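For example, a block that ships container output to CloudWatch might be configured roughly like the sketch below (illustrative only; the credentials block name, execution role ARN, and stream prefix are placeholders):
from prefect_aws import AwsCredentials, ECSTask\n\necs_task = ECSTask(\n    aws_credentials=AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\"),\n    configure_cloudwatch_logs=True,\n    # Placeholder ARN; the role needs logs:CreateLogStream, logs:CreateLogGroup,\n    # and logs:PutLogEvents permissions\n    execution_role_arn=\"arn:aws:iam::123456789012:role/ecsTaskExecutionRole\",\n    cloudwatch_logs_options={\"awslogs-stream-prefix\": \"my-flows\"},\n)\n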
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cluster","title":"
cluster: str
pydantic-field
","text":"
The ECS cluster to run the task in. The ARN or name may be provided. If not provided, the default cluster will be used.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.configure_cloudwatch_logs","title":"
configure_cloudwatch_logs: bool
pydantic-field
","text":"
If True
, the Prefect container will be configured to send its output to the AWS CloudWatch logs service. This functionality requires an execution role with logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents permissions. The default for this field is False
unless stream_output
is set.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cpu","title":"
cpu: int
pydantic-field
","text":"
The amount of CPU to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 1024 will be used unless present on the task definition.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.execution_role_arn","title":"
execution_role_arn: str
pydantic-field
","text":"
An execution role to use for the task. This controls the permissions of the task when it is launching. If this value is not null, it will override the value in the task definition. An execution role must be provided to capture logs from the container.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.family","title":"
family: str
pydantic-field
","text":"
A family for the task definition. If not provided, it will be inferred from the task definition. If the task definition does not have a family, the name will be generated. When flow and deployment metadata is available, the generated name will include their names. Values for this field will be slugified to match AWS character requirements.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.image","title":"
image: str
pydantic-field
","text":"
The image to use for the Prefect container in the task. If this value is not null, it will override the value in the task definition. This value defaults to a Prefect base image matching your local versions.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.launch_type","title":"
launch_type: Literal['FARGATE', 'EC2', 'EXTERNAL', 'FARGATE_SPOT']
pydantic-field
","text":"
The type of ECS task run infrastructure that should be used. Note that 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure the proper capacity provider strategy if set here.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.memory","title":"
memory: int
pydantic-field
","text":"
The amount of memory to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 2048 will be used unless present on the task definition.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.stream_output","title":"
stream_output: bool
pydantic-field
","text":"
If True
, logs will be streamed from the Prefect container to the local console. Unless you have configured AWS CloudWatch logs manually on your task definition, this requires the same prerequisites outlined in configure_cloudwatch_logs
.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_customizations","title":"
task_customizations: JsonPatch
pydantic-field
","text":"
A list of JSON 6902 patches to apply to the task run request. If a string is given, it will be parsed as a JSON expression.
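For instance, a hypothetical patch that attaches a specific security group to the generated task run request could look like the sketch below (the block name, VPC ID, and security group ID are placeholders):
from prefect_aws import AwsCredentials, ECSTask\n\necs_task = ECSTask(\n    aws_credentials=AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\"),\n    vpc_id=\"vpc-0123456789abcdef0\",  # placeholder\n    task_customizations=[\n        {\n            \"op\": \"add\",\n            \"path\": \"/networkConfiguration/awsvpcConfiguration/securityGroups\",\n            \"value\": [\"sg-0123456789abcdef0\"],  # placeholder\n        },\n    ],\n)\n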
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_definition","title":"
task_definition: dict
pydantic-field
","text":"
An ECS task definition to use. Prefect may set defaults or override fields on this task definition to match other ECSTask
fields. Cannot be used with task_definition_arn
. If not provided, Prefect will generate and register a minimal task definition.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_definition_arn","title":"
task_definition_arn: str
pydantic-field
","text":"
An identifier for an existing task definition to use. If fields are set on the ECSTask
that conflict with the task definition, a new copy will be registered with the required values. Cannot be used with task_definition
. If not provided, Prefect will generate and register a minimal task definition.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_role_arn","title":"
task_role_arn: str
pydantic-field
","text":"
A role to attach to the task run. This controls the permissions of the task while it is running.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_start_timeout_seconds","title":"
task_start_timeout_seconds: int
pydantic-field
","text":"
The amount of time to watch for the start of the ECS task before marking it as failed. The task must enter a RUNNING state to be considered started.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_watch_poll_interval","title":"
task_watch_poll_interval: float
pydantic-field
","text":"
The amount of time to wait between AWS API calls while monitoring the state of an ECS task.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.vpc_id","title":"
vpc_id: str
pydantic-field
","text":"
The AWS VPC to link the task run to. This is only applicable when using the 'awsvpc' network mode for your task. FARGATE tasks require this network mode, but for EC2 tasks the default network mode is 'bridge'. If using the 'awsvpc' network mode and this field is null, your default VPC will be used. If no default VPC can be found, the task run will fail.
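Putting several of these fields together, a Fargate-based block pinned to a specific cluster and VPC might look like the sketch below (all names and IDs are placeholders, and every field shown is optional):
from prefect_aws import AwsCredentials, ECSTask\n\necs_task = ECSTask(\n    aws_credentials=AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\"),\n    cluster=\"my-ecs-cluster\",  # placeholder; omit to use the default cluster\n    launch_type=\"FARGATE\",\n    cpu=512,  # overrides the default of 1024\n    memory=1024,  # overrides the default of 2048\n    vpc_id=\"vpc-0123456789abcdef0\",  # placeholder; omit to use the default VPC\n)\necs_task.save(\"BLOCK-NAME-PLACEHOLDER\", overwrite=True)\n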
"},{"location":"ecs/#prefect_aws.ecs.ECSTask-classes","title":"Classes","text":""},{"location":"ecs/#prefect_aws.ecs.ECSTask.Config","title":"
Config
","text":"
Configuration of pydantic.
Source code in
prefect_aws/ecs.py
class Config:\n \"\"\"Configuration of pydantic.\"\"\"\n\n # Support serialization of the 'JsonPatch' type\n arbitrary_types_allowed = True\n json_encoders = {JsonPatch: lambda p: p.patch}\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask-methods","title":"Methods","text":""},{"location":"ecs/#prefect_aws.ecs.ECSTask.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cast_customizations_to_a_json_patch","title":"
cast_customizations_to_a_json_patch
classmethod
","text":"
Casts lists to JsonPatch instances.
Source code in
prefect_aws/ecs.py
@validator(\"task_customizations\", pre=True)\ndef cast_customizations_to_a_json_patch(\n cls, value: Union[List[Dict], JsonPatch, str]\n) -> JsonPatch:\n \"\"\"\n Casts lists to JsonPatch instances.\n \"\"\"\n if isinstance(value, str):\n value = json.loads(value)\n if isinstance(value, list):\n return JsonPatch(value)\n return value # type: ignore\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cloudwatch_logs_options_requires_configure_cloudwatch_logs","title":"
cloudwatch_logs_options_requires_configure_cloudwatch_logs
classmethod
","text":"
Enforces that configure_cloudwatch_logs is enabled when cloudwatch_logs_options is provided.
Source code in
prefect_aws/ecs.py
@root_validator\ndef cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.configure_cloudwatch_logs_requires_execution_role_arn","title":"
configure_cloudwatch_logs_requires_execution_role_arn
classmethod
","text":"
Enforces that an execution role arn is provided (or could be provided by a runtime task definition) when configuring logging.
Source code in
prefect_aws/ecs.py
@root_validator\ndef configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_definition_arn\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.dict","title":"
dict
","text":"
Convert to a dictionary.
Source code in
prefect_aws/ecs.py
def dict(self, *args, **kwargs) -> Dict:\n \"\"\"\n Convert to a dictionary.\n \"\"\"\n # Support serialization of the 'JsonPatch' type\n d = super().dict(*args, **kwargs)\n d[\"task_customizations\"] = self.task_customizations.patch\n return d\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.generate_work_pool_base_job_template","title":"
generate_work_pool_base_job_template
async
","text":"
Generate a base job template for an ECS work pool with the same configuration as this block.
Returns:
Type Description
- dict
a base job template for an ECS work pool
Source code in
prefect_aws/ecs.py
async def generate_work_pool_base_job_template(self) -> dict:\n \"\"\"\n Generate a base job template for a cloud-run work pool with the same\n configuration as this block.\n\n Returns:\n - dict: a base job template for a cloud-run work pool\n \"\"\"\n base_job_template = copy.deepcopy(ECSWorker.get_default_base_job_template())\n for key, value in self.dict(exclude_unset=True, exclude_defaults=True).items():\n if key == \"command\":\n base_job_template[\"variables\"][\"properties\"][\"command\"][\"default\"] = (\n shlex.join(value)\n )\n elif key in [\n \"type\",\n \"block_type_slug\",\n \"_block_document_id\",\n \"_block_document_name\",\n \"_is_anonymous\",\n \"task_customizations\",\n ]:\n continue\n elif key == \"aws_credentials\":\n if not self.aws_credentials._block_document_id:\n raise BlockNotSavedError(\n \"It looks like you are trying to use a block that\"\n \" has not been saved. Please call `.save` on your block\"\n \" before publishing it as a work pool.\"\n )\n base_job_template[\"variables\"][\"properties\"][\"aws_credentials\"][\n \"default\"\n ] = {\n \"$ref\": {\n \"block_document_id\": str(\n self.aws_credentials._block_document_id\n )\n }\n }\n elif key == \"task_definition\":\n base_job_template[\"job_configuration\"][\"task_definition\"] = value\n elif key in base_job_template[\"variables\"][\"properties\"]:\n base_job_template[\"variables\"][\"properties\"][key][\"default\"] = value\n else:\n self.logger.warning(\n f\"Variable {key!r} is not supported by Cloud Run work pools.\"\n \" Skipping.\"\n )\n\n if self.task_customizations:\n try:\n base_job_template[\"job_configuration\"][\"task_run_request\"] = (\n self.task_customizations.apply(\n base_job_template[\"job_configuration\"][\"task_run_request\"]\n )\n )\n except JsonPointerException:\n self.logger.warning(\n \"Unable to apply task customizations to the base job template.\"\n \"You may need to update the template manually.\"\n )\n\n return base_job_template\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.get_corresponding_worker_type","title":"
get_corresponding_worker_type
staticmethod
","text":"
Return the corresponding worker type for this infrastructure block.
Source code in
prefect_aws/ecs.py
@staticmethod\ndef get_corresponding_worker_type() -> str:\n \"\"\"Return the corresponding worker type for this infrastructure block.\"\"\"\n return ECSWorker.type\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.image_is_required","title":"
image_is_required
classmethod
","text":"
Enforces that an image is available if image is None
.
Source code in
prefect_aws/ecs.py
@root_validator(pre=True)\ndef image_is_required(cls, values: dict) -> dict:\n \"\"\"\n Enforces that an image is available if image is `None`.\n \"\"\"\n has_image = bool(values.get(\"image\"))\n has_task_definition_arn = bool(values.get(\"task_definition_arn\"))\n\n # The image can only be null when the task_definition_arn is set\n if has_image or has_task_definition_arn:\n return values\n\n prefect_container = (\n get_prefect_container(\n (values.get(\"task_definition\") or {}).get(\"containerDefinitions\", [])\n )\n or {}\n )\n image_in_task_definition = prefect_container.get(\"image\")\n\n # If a task_definition is given with a prefect container image, use that value\n if image_in_task_definition:\n values[\"image\"] = image_in_task_definition\n # Otherwise, it should default to the Prefect base image\n else:\n values[\"image\"] = get_prefect_image_name()\n return values\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.kill","title":"
kill
async
","text":"
Kill a task running on ECS.
Parameters:
Name Type Description Default
identifier
str
A cluster and task arn combination. This should match a value yielded by ECSTask.run
.
required Source code in
prefect_aws/ecs.py
@sync_compatible\nasync def kill(self, identifier: str, grace_seconds: int = 30) -> None:\n \"\"\"\n Kill a task running on ECS.\n\n Args:\n identifier: A cluster and task arn combination. This should match a value\n yielded by `ECSTask.run`.\n \"\"\"\n if grace_seconds != 30:\n self.logger.warning(\n f\"Kill grace period of {grace_seconds}s requested, but AWS does not \"\n \"support dynamic grace period configuration so 30s will be used. \"\n \"See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html for configuration of grace periods.\" # noqa\n )\n cluster, task = parse_task_identifier(identifier)\n await run_sync_in_worker_thread(self._stop_task, cluster, task)\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.prepare_for_flow_run","title":"
prepare_for_flow_run
","text":"
Return a copy of the block that is prepared to execute a flow run.
Source code in
prefect_aws/ecs.py
def prepare_for_flow_run(\n self: Self,\n flow_run: \"FlowRun\",\n deployment: Optional[\"Deployment\"] = None,\n flow: Optional[\"Flow\"] = None,\n) -> Self:\n \"\"\"\n Return an copy of the block that is prepared to execute a flow run.\n \"\"\"\n new_family = None\n\n # Update the family if not specified elsewhere\n if (\n not self.family\n and not self.task_definition_arn\n and not (self.task_definition and self.task_definition.get(\"family\"))\n ):\n if flow and deployment:\n new_family = f\"{ECS_DEFAULT_FAMILY}__{flow.name}__{deployment.name}\"\n elif flow and not deployment:\n new_family = f\"{ECS_DEFAULT_FAMILY}__{flow.name}\"\n elif deployment and not flow:\n # This is a weird case and should not be see in the wild\n new_family = f\"{ECS_DEFAULT_FAMILY}__unknown-flow__{deployment.name}\"\n\n new = super().prepare_for_flow_run(flow_run, deployment=deployment, flow=flow)\n\n if new_family:\n return new.copy(update={\"family\": new_family})\n else:\n # Avoid an extra copy if not needed\n return new\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.preview","title":"
preview
","text":"
Generate a preview of the task definition and task run that will be sent to AWS.
Source code in
prefect_aws/ecs.py
def preview(self) -> str:\n \"\"\"\n Generate a preview of the task definition and task run that will be sent to AWS.\n \"\"\"\n preview = \"\"\n\n task_definition_arn = self.task_definition_arn or \"<registered at runtime>\"\n\n if self.task_definition or not self.task_definition_arn:\n task_definition = self._prepare_task_definition(\n self.task_definition or {},\n region=self.aws_credentials.region_name\n or \"<loaded from client at runtime>\",\n )\n preview += \"---\\n# Task definition\\n\"\n preview += yaml.dump(task_definition)\n preview += \"\\n\"\n else:\n task_definition = None\n\n if task_definition and task_definition.get(\"networkMode\") == \"awsvpc\":\n vpc = \"the default VPC\" if not self.vpc_id else self.vpc_id\n network_config = {\n \"awsvpcConfiguration\": {\n \"subnets\": f\"<loaded from {vpc} at runtime>\",\n \"assignPublicIp\": \"ENABLED\",\n }\n }\n else:\n network_config = None\n\n task_run = self._prepare_task_run(network_config, task_definition_arn)\n preview += \"---\\n# Task run request\\n\"\n preview += yaml.dump(task_run)\n\n return preview\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.run","title":"
run
async
","text":"
Run the configured task on ECS.
Source code in
prefect_aws/ecs.py
@sync_compatible\nasync def run(self, task_status: Optional[TaskStatus] = None) -> ECSTaskResult:\n \"\"\"\n Run the configured task on ECS.\n \"\"\"\n boto_session, ecs_client = await run_sync_in_worker_thread(\n self._get_session_and_client\n )\n\n (\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition,\n ) = await run_sync_in_worker_thread(\n self._create_task_and_wait_for_start, boto_session, ecs_client\n )\n\n # Display a nice message indicating the command and image\n command = self.command or get_prefect_container(\n task_definition[\"containerDefinitions\"]\n ).get(\"command\", [])\n self.logger.info(\n f\"{self._log_prefix}: Running command {' '.join(command)!r} \"\n f\"in container {PREFECT_ECS_CONTAINER_NAME!r} ({self.image})...\"\n )\n\n # The task identifier is \"{cluster}::{task}\" where we use the configured cluster\n # if set to preserve matching by name rather than arn\n # Note \"::\" is used despite the Prefect standard being \":\" because ARNs contain\n # single colons.\n identifier = (self.cluster if self.cluster else cluster_arn) + \"::\" + task_arn\n\n if task_status:\n task_status.started(identifier)\n\n status_code = await run_sync_in_worker_thread(\n self._watch_task_and_get_exit_code,\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition and self.auto_deregister_task_definition,\n boto_session,\n ecs_client,\n )\n\n return ECSTaskResult(\n identifier=identifier,\n # If the container does not start the exit code can be null but we must\n # still report a status code. We use a -1 to indicate a special code.\n status_code=status_code if status_code is not None else -1,\n )\n
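As a rough usage sketch (assuming a previously saved block named 'ecs-task-example'; run is sync-compatible, so it can be called directly from a script):
from prefect_aws import ECSTask\n\necs_task = ECSTask.load(\"ecs-task-example\")\nresult = ecs_task.run()  # blocks until the ECS task stops\nprint(result.identifier, result.status_code)\n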
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.set_default_configure_cloudwatch_logs","title":"
set_default_configure_cloudwatch_logs
classmethod
","text":"
Streaming output generally requires CloudWatch logs to be configured.
To avoid entangled arguments in the simple case, configure_cloudwatch_logs
defaults to matching the value of stream_output
.
Source code in
prefect_aws/ecs.py
@root_validator(pre=True)\ndef set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTaskResult","title":"
ECSTaskResult (InfrastructureResult)
pydantic-model
","text":"
The result of a run of an ECS task
Source code in
prefect_aws/ecs.py
class ECSTaskResult(InfrastructureResult):\n \"\"\"The result of a run of an ECS task\"\"\"\n
"},{"location":"ecs/#prefect_aws.ecs-functions","title":"Functions","text":""},{"location":"ecs/#prefect_aws.ecs.get_container","title":"
get_container
","text":"
Extract a container from a list of containers or container definitions. If not found, None
is returned.
Source code in
prefect_aws/ecs.py
def get_container(containers: List[dict], name: str) -> Optional[dict]:\n \"\"\"\n Extract a container from a list of containers or container definitions.\n If not found, `None` is returned.\n \"\"\"\n for container in containers:\n if container.get(\"name\") == name:\n return container\n return None\n
"},{"location":"ecs/#prefect_aws.ecs.get_prefect_container","title":"
get_prefect_container
","text":"
Extract the Prefect container from a list of containers or container definitions. If not found, None
is returned.
Source code in
prefect_aws/ecs.py
def get_prefect_container(containers: List[dict]) -> Optional[dict]:\n \"\"\"\n Extract the Prefect container from a list of containers or container definitions.\n If not found, `None` is returned.\n \"\"\"\n return get_container(containers, PREFECT_ECS_CONTAINER_NAME)\n
"},{"location":"ecs/#prefect_aws.ecs.parse_task_identifier","title":"
parse_task_identifier
","text":"
Splits identifier into its cluster and task components, e.g. input \"cluster_name::task_arn\" outputs (\"cluster_name\", \"task_arn\").
Source code in
prefect_aws/ecs.py
def parse_task_identifier(identifier: str) -> Tuple[str, str]:\n \"\"\"\n Splits identifier into its cluster and task components, e.g.\n input \"cluster_name::task_arn\" outputs (\"cluster_name\", \"task_arn\").\n \"\"\"\n cluster, task = identifier.split(\"::\", maxsplit=1)\n return cluster, task\n
"},{"location":"ecs_guide/","title":"ECS Worker Guide","text":""},{"location":"ecs_guide/#why-use-ecs-for-flow-run-execution","title":"Why use ECS for flow run execution?","text":"
ECS (Elastic Container Service) tasks are a good option for executing Prefect 2 flow runs for several reasons:
- Scalability: ECS scales your infrastructure in response to demand, effectively managing Prefect flow runs. ECS automatically distributes containers across multiple instances based on demand.
- Flexibility: ECS lets you choose between AWS Fargate and Amazon EC2 for container operation. Fargate abstracts the underlying infrastructure, while EC2 has faster job start times and offers additional control over instance management and configuration.
- AWS Integration: Easily connect with other AWS services, such as AWS IAM and CloudWatch.
- Containerization: ECS supports Docker containers and offers managed execution. Containerization encourages reproducible deployments.
"},{"location":"ecs_guide/#ecs-flow-run-execution","title":"ECS Flow Run Execution","text":"
Prefect enables remote flow execution via workers and work pools. To learn more about these concepts please see our deployment tutorial.
For details on how workers and work pools are implemented for ECS, see the diagram below:
"},{"location":"ecs_guide/#architecture-diagram","title":"Architecture Diagram","text":"
graph TB\n\n subgraph ecs_cluster[ECS cluster]\n subgraph ecs_service[ECS service]\n td_worker[Worker task definition] --> |defines| prefect_worker((Prefect worker))\n end\n prefect_worker -->|kicks off| ecs_task\n fr_task_definition[Flow run task definition]\n\n\n subgraph ecs_task[\"ECS task execution <br> (Flow run infrastructure)\"]\n style ecs_task text-align:center\n\n flow_run((Flow run))\n\n end\n fr_task_definition -->|defines| ecs_task\n end\n\n subgraph prefect_cloud[Prefect Cloud]\n subgraph prefect_workpool[ECS work pool]\n workqueue[Work queue]\n end\n end\n\n subgraph github[\"GitHub\"]\n flow_code{{\"Flow code\"}}\n end\n flow_code --> |pulls| ecs_task\n prefect_worker -->|polls| workqueue\n prefect_workpool -->|configures| fr_task_definition
"},{"location":"ecs_guide/#ecs-in-prefect-terms","title":"ECS in Prefect Terms","text":"
ECS tasks != Prefect tasks
An ECS task is not the same thing as a Prefect task.
ECS tasks are groupings of containers that run within an ECS Cluster. An ECS task's behavior is determined by its task definition.
An ECS task definition is the blueprint for the ECS task. It describes which Docker containers to run and what you want to have happen inside these containers.
ECS tasks are instances of a task definition. A Task Execution launches container(s) as defined in the task definition until they are stopped or exit on their own. This setup is ideal for ephemeral processes such as a Prefect flow run.
The ECS task running the Prefect worker should be an ECS Service, given its long-running nature and need for auto-recovery in case of failure. An ECS service automatically replaces any task that fails, which is ideal for managing a long-running process such as a Prefect worker.
When a Prefect flow is scheduled to run it goes into the work pool specified in the flow's deployment. Work pools are typed according to the infrastructure the flow will run on. Flow runs scheduled in an ecs
typed work pool are executed as ECS tasks. Only Prefect ECS workers can poll an ecs
typed work pool.
When the ECS worker receives a scheduled flow run from the ECS work pool it is polling, it spins up the specified infrastructure on AWS ECS. The worker knows to build an ECS task definition for each flow run based on the configuration specified in the work pool.
Once the flow run completes, the ECS containers of the cluster are spun down to a single container that continues to run the Prefect worker. This worker continues polling for work from the Prefect work pool.
If you specify a task definition ARN (Amazon Resource Name) in the work pool, the worker will use that ARN when spinning up the ECS Task, rather than creating a task definition from the fields supplied in the work pool configuration.
You can use either EC2 or Fargate as the capacity provider. Fargate simplifies setup, but increases the infrastructure startup time for each flow run. Using EC2 for the ECS cluster can reduce startup time. In this guide, we will use Fargate.
"},{"location":"ecs_guide/#aws-cli-guide","title":"AWS CLI Guide","text":"
Tip
If you prefer infrastructure as code check out this Terraform module to provision an ECS cluster with a worker.
"},{"location":"ecs_guide/#prerequisites","title":"Prerequisites","text":"
Before you begin, make sure you have:
- An AWS account with permissions to create ECS services and IAM roles.
- The AWS CLI installed on your local machine. You can download it from the AWS website.
- An ECS Cluster to host both the worker and the flow runs it submits. Follow this guide to create an ECS cluster or simply use the default cluster.
- A VPC configured for your ECS tasks. A VPC is a good idea if using EC2 and required if using Fargate.
"},{"location":"ecs_guide/#step-1-set-up-an-ecs-work-pool","title":"Step 1: Set Up an ECS work pool","text":"
Before setting up the worker, create a simple work pool of type ECS for the worker to pull work from.
Create a work pool from the Prefect UI or CLI:
prefect work-pool create --type ecs my-ecs-pool\n
Configure the VPC and ECS cluster for your work pool via the UI:
Configuring custom fields is easiest from the UI.
Warning
You need to have a VPC specified for your work pool if you are using AWS Fargate.
Next, set up a Prefect ECS worker that will discover and pull work from this work pool.
"},{"location":"ecs_guide/#step-2-start-a-prefect-worker-in-your-ecs-cluster","title":"Step 2: Start a Prefect worker in your ECS cluster.","text":"
To create an IAM role for the ECS task using the AWS CLI, follow these steps:
-
Create a trust policy
The trust policy will specify that ECS can assume the role.
Save this policy to a file, such as ecs-trust-policy.json
:
{\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Effect\": \"Allow\",\n \"Principal\": {\n \"Service\": \"ecs-tasks.amazonaws.com\"\n },\n \"Action\": \"sts:AssumeRole\"\n }\n ]\n}\n
-
Create the IAM role
Use the aws iam create-role
command to create the role:
aws iam create-role \\ \n--role-name ecsTaskExecutionRole \\\n--assume-role-policy-document file://ecs-trust-policy.json\n
-
Attach the policy to the role
Amazon has a managed policy named AmazonECSTaskExecutionRolePolicy
that grants the permissions necessary for ECS tasks. Attach this policy to your role:
aws iam attach-role-policy \\\n--role-name ecsTaskExecutionRole \\ \n--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy\n
Remember to replace the --role-name
and --policy-arn
with the actual role name and policy Amazon Resource Name (ARN) you want to use.
Now, you have a role named ecsTaskExecutionRole
that you can assign to your ECS tasks. This role has the necessary permissions to pull container images and publish logs to CloudWatch.
-
Launch an ECS Service to host the worker
Next, create an ECS task definition that specifies the Docker image for the Prefect worker, the resources it requires, and the command it should run. In this example, the command to start the worker is prefect worker start --pool my-ecs-pool
.
Create a JSON file with the following contents:
{\n \"family\": \"prefect-worker-task\",\n \"networkMode\": \"awsvpc\",\n \"requiresCompatibilities\": [\n \"FARGATE\"\n ],\n \"cpu\": \"512\",\n \"memory\": \"1024\",\n \"executionRoleArn\": \"<your-ecs-task-role-arn>\",\n \"taskRoleArn\": \"<your-ecs-task-role-arn>\",\n \"containerDefinitions\": [\n {\n \"name\": \"prefect-worker\",\n \"image\": \"prefecthq/prefect:2-latest\",\n \"cpu\": 512,\n \"memory\": 1024,\n \"essential\": true,\n \"command\": [\n \"/bin/sh\",\n \"-c\",\n \"pip install prefect-aws && prefect worker start --pool my-ecs-pool --type ecs\"\n ],\n \"environment\": [\n {\n \"name\": \"PREFECT_API_URL\",\n \"value\": \"https://api.prefect.cloud/api/accounts/<your-account-id>/workspaces/<your-workspace-id>\"\n },\n {\n \"name\": \"PREFECT_API_KEY\",\n \"value\": \"<your-prefect-api-key>\"\n }\n ]\n }\n ]\n}\n
-
Use prefect config view
to view the PREFECT_API_URL
for your current Prefect profile. Use this to replace both <your-account-id>
and <your-workspace-id>
.
-
For the PREFECT_API_KEY
, individuals on the organization tier can create a service account for the worker. If on a personal tier, you can pass a user\u2019s API key.
-
Replace both instances of <your-ecs-task-role-arn>
with the ARN of the IAM role you created in Step 2.
-
Notice that the CPU and Memory allocations are relatively small. The worker's main responsibility is to submit work through API calls to AWS, not to execute your Prefect flow code.
Tip
To avoid hardcoding your API key into the task definition JSON, see how to add environment variables to the container definition. The API key must be stored as plain text, not as the key/value pair dictionary that it is formatted in by default.
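One option is to keep the API key in AWS Secrets Manager and reference its ARN from the container definition's secrets list rather than embedding it in the JSON. The sketch below only creates the secret and prints its ARN; the secret name is illustrative, and your execution role must be allowed to read the secret:
import boto3\n\nsecrets_manager = boto3.client(\"secretsmanager\")\n\n# Store the Prefect API key as a plain-text secret (name is illustrative)\nresponse = secrets_manager.create_secret(\n    Name=\"prefect-api-key\",\n    SecretString=\"<your-prefect-api-key>\",\n)\n\n# Reference this ARN from the container definition, for example:\n# \"secrets\": [{\"name\": \"PREFECT_API_KEY\", \"valueFrom\": \"<secret-arn>\"}]\nprint(response[\"ARN\"])\n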
-
Register the task definition:
Before creating a service, you first need to register a task definition. You can do that using the register-task-definition
command in the AWS CLI. Here is an example:
aws ecs register-task-definition --cli-input-json file://task-definition.json\n
Replace task-definition.json
with the name of your JSON file.
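If you'd rather register the task definition from Python, a rough boto3 equivalent of the CLI command above looks like this (assuming the file is named task-definition.json):
import json\nimport boto3\n\necs = boto3.client(\"ecs\")\n\n# Load the task definition JSON created above and register it with ECS\nwith open(\"task-definition.json\") as f:\n    task_definition = json.load(f)\n\nresponse = ecs.register_task_definition(**task_definition)\nprint(response[\"taskDefinition\"][\"taskDefinitionArn\"])\n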
-
Create an ECS service to host your worker:
Finally, create a service that will manage your Prefect worker:
Open a terminal window and run the following command to create an ECS Fargate service:
aws ecs create-service \\\n --service-name prefect-worker-service \\\n --cluster <your-ecs-cluster> \\\n --task-definition <task-definition-arn> \\\n --launch-type FARGATE \\\n --desired-count 1 \\\n --network-configuration \"awsvpcConfiguration={subnets=[<your-subnet-ids>],securityGroups=[<your-security-group-ids>]}\"\n
- Replace
<your-ecs-cluster>
with the name of your ECS cluster. - Replace
<your-subnet-ids>
with a comma-separated list of your VPC subnet IDs. Ensure that these subnets belong to the VPC specified on the work pool in Step 1. - Replace
<your-security-group-ids>
with a comma-separated list of your VPC security group IDs. - Replace
<task-definition-arn>
with the ARN of the task definition you just registered.
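If you prefer Python over the AWS CLI for this step as well, a rough boto3 sketch of the same service creation follows; the cluster, task definition, subnet, and security group values are placeholders you must replace:
import boto3\n\necs = boto3.client(\"ecs\")\n\necs.create_service(\n    cluster=\"<your-ecs-cluster>\",\n    serviceName=\"prefect-worker-service\",\n    taskDefinition=\"<task-definition-arn>\",\n    desiredCount=1,\n    launchType=\"FARGATE\",\n    networkConfiguration={\n        \"awsvpcConfiguration\": {\n            \"subnets\": [\"<your-subnet-id>\"],\n            \"securityGroups\": [\"<your-security-group-id>\"],\n            # A public IP (or private routing such as a NAT gateway) is needed\n            # so the worker container can reach the Prefect API\n            \"assignPublicIp\": \"ENABLED\",\n        }\n    },\n)\n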
Sanity check
The work pool page in the Prefect UI allows you to check the health of your workers - make sure your new worker is live!
"},{"location":"ecs_guide/#step-4-pick-up-a-flow-run-with-your-new-worker","title":"Step 4: Pick up a flow run with your new worker!","text":"
-
Write a simple test flow in a repo of your choice:
my_flow.py
from prefect import flow, get_run_logger\n\n@flow\ndef my_flow():\n logger = get_run_logger()\n logger.info(\"Hello from ECS!!\")\n\nif __name__ == \"__main__\":\n my_flow()\n
-
Deploy the flow to Prefect Cloud or your Prefect server, specifying the ECS work pool when prompted. A Python-based deployment sketch follows this list.
prefect deploy my_flow.py:my_flow\n
-
Find the deployment in the UI and click the Quick Run button!
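If you prefer deploying from Python instead of the interactive prefect deploy CLI, recent Prefect 2 releases also support a from_source/deploy pattern. The following is a sketch under the assumption that your flow code lives in a Git repository the worker can pull from; the repository URL and deployment name are placeholders:
from prefect import flow\n\nif __name__ == \"__main__\":\n    # Pull the flow code from your repository and create a deployment\n    # targeting the ECS work pool created earlier\n    flow.from_source(\n        source=\"https://github.com/<your-org>/<your-repo>.git\",\n        entrypoint=\"my_flow.py:my_flow\",\n    ).deploy(\n        name=\"ecs-worker-guide-example\",\n        work_pool_name=\"my-ecs-pool\",\n    )\n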
"},{"location":"ecs_guide/#optional-next-steps","title":"Optional Next Steps","text":"
-
Now that you are confident your ECS worker is healthy, you can experiment with different work pool configurations.
- Do your flow runs require higher
CPU
? - Would an EC2
Launch Type
speed up your flow run execution?
These infrastructure configuration values can be set on your ECS work pool, or they can be overridden at the deployment level through job_variables if desired; a sketch of a job_variables override follows at the end of this section.
-
Consider adding a build action to your Prefect Project prefect.yaml
if you want to automatically build a Docker image and push it to an image registry whenever prefect deploy
is run.
Here is an example build action for ECR:
build:\n- prefect.deployments.steps.run_shell_script:\n id: get-commit-hash\n script: git rev-parse --short HEAD\n stream_output: false\n- prefect.deployments.steps.run_shell_script:\n id: ecr-auth-step\n script: aws ecr get-login-password --region <region> | docker login --username\n AWS --password-stdin <>.dkr.ecr.<region>.amazonaws.com\n stream_output: false\n- prefect_docker.deployments.steps.build_docker_image:\n requires: prefect-docker>=0.3.0\n image_name: <your-AWS-account-number>.dkr.ecr.us-east-2.amazonaws.com/<registry>\n tag: '{{ get-commit-hash.stdout }}'\n dockerfile: auto\n push: true\n
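As noted above, work pool values such as CPU, memory, and launch type can also be overridden per deployment through job variables. The following is a hedged sketch using the same from_source/deploy pattern shown earlier, assuming your work pool's base job template exposes the cpu, memory, and launch_type fields:
from prefect import flow\n\nif __name__ == \"__main__\":\n    flow.from_source(\n        source=\"https://github.com/<your-org>/<your-repo>.git\",\n        entrypoint=\"my_flow.py:my_flow\",\n    ).deploy(\n        name=\"ecs-high-memory-example\",\n        work_pool_name=\"my-ecs-pool\",\n        # Override work pool defaults for this deployment only\n        job_variables={\"cpu\": 2048, \"memory\": 4096, \"launch_type\": \"EC2\"},\n    )\n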
"},{"location":"ecs_worker/","title":"ECS Worker","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker","title":"
prefect_aws.workers.ecs_worker
","text":"
Prefect worker for executing flow runs as ECS tasks.
Get started by creating a work pool:
$ prefect work-pool create --type ecs my-ecs-pool\n
Then, you can start a worker for the pool:
$ prefect worker start --pool my-ecs-pool\n
It's common to deploy the worker as an ECS task as well. However, you can run the worker locally to get started.
The worker may work without any additional configuration, but this depends on your specific AWS setup; we recommend opening the work pool editor in the UI to see the available options.
By default, the worker will register a task definition for each flow run and run a task in your default ECS cluster using AWS Fargate. Fargate requires tasks to configure subnets, which we will infer from your default VPC. If you do not have a default VPC, you must provide a VPC ID or manually set up the network configuration for your tasks.
Note, the worker caches task definitions for each deployment to avoid excessive registration. The worker will check that the cached task definition is compatible with your configuration before using it.
The launch type option can be used to run your tasks in different modes. For example, FARGATE_SPOT
can be used to run your Fargate tasks on spot instances, or EC2
can be used to run your tasks on a cluster backed by EC2 instances.
Generally, it is very useful to enable CloudWatch logging for your ECS tasks; this can help you debug task failures. To enable CloudWatch logging, you must provide an execution role ARN with permissions to create and write to log streams. See the configure_cloudwatch_logs
field documentation for details.
The worker can be configured to use an existing task definition by setting the task definition ARN variable or by providing a \"taskDefinition\" in the task run request. When a task definition is provided, the worker will never create a new task definition, which may result in variables that are templated into the task definition payload being ignored.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker-classes","title":"Classes","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier","title":"
ECSIdentifier (tuple)
","text":"
The identifier for a running ECS task.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSIdentifier(NamedTuple):\n \"\"\"\n The identifier for a running ECS task.\n \"\"\"\n\n cluster: str\n task_arn: str\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier-methods","title":"Methods","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier.__getnewargs__","title":"
__getnewargs__
special
","text":"
Return self as a plain tuple. Used by copy and pickle.
Source code in
prefect_aws/workers/ecs_worker.py
def __getnewargs__(self):\n 'Return self as a plain tuple. Used by copy and pickle.'\n return _tuple(self)\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier.__new__","title":"
__new__
special
staticmethod
","text":"
Create new instance of ECSIdentifier(cluster, task_arn)
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier.__repr__","title":"
__repr__
special
","text":"
Return a nicely formatted representation string
Source code in
prefect_aws/workers/ecs_worker.py
def __repr__(self):\n 'Return a nicely formatted representation string'\n return self.__class__.__name__ + repr_fmt % self\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration","title":"
ECSJobConfiguration (BaseJobConfiguration)
pydantic-model
","text":"
Job configuration for an ECS worker.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSJobConfiguration(BaseJobConfiguration):\n \"\"\"\n Job configuration for an ECS worker.\n \"\"\"\n\n aws_credentials: Optional[AwsCredentials] = Field(default_factory=AwsCredentials)\n task_definition: Optional[Dict[str, Any]] = Field(\n template=_default_task_definition_template()\n )\n task_run_request: Dict[str, Any] = Field(\n template=_default_task_run_request_template()\n )\n configure_cloudwatch_logs: Optional[bool] = Field(default=None)\n cloudwatch_logs_options: Dict[str, str] = Field(default_factory=dict)\n network_configuration: Dict[str, Any] = Field(default_factory=dict)\n stream_output: Optional[bool] = Field(default=None)\n task_start_timeout_seconds: int = Field(default=300)\n task_watch_poll_interval: float = Field(default=5.0)\n auto_deregister_task_definition: bool = Field(default=False)\n vpc_id: Optional[str] = Field(default=None)\n container_name: Optional[str] = Field(default=None)\n cluster: Optional[str] = Field(default=None)\n\n @root_validator\n def task_run_request_requires_arn_if_no_task_definition_given(cls, values) -> dict:\n \"\"\"\n If no task definition is provided, a task definition ARN must be present on the\n task run request.\n \"\"\"\n if not values.get(\"task_run_request\", {}).get(\n \"taskDefinition\"\n ) and not values.get(\"task_definition\"):\n raise ValueError(\n \"A task definition must be provided if a task definition ARN is not \"\n \"present on the task run request.\"\n )\n return values\n\n @root_validator\n def container_name_default_from_task_definition(cls, values) -> dict:\n \"\"\"\n Infers the container name from the task definition if not provided.\n \"\"\"\n if values.get(\"container_name\") is None:\n values[\"container_name\"] = _container_name_from_task_definition(\n values.get(\"task_definition\")\n )\n\n # We may not have a name here still; for example if someone is using a task\n # definition arn. 
In that case, we'll perform similar logic later to find\n # the name to treat as the \"orchestration\" container.\n\n return values\n\n @root_validator(pre=True)\n def set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n\n @root_validator\n def configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # TODO: Does not match\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_run_request\", {}).get(\"taskDefinition\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n\n @root_validator\n def cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n\n @root_validator\n def network_configuration_requires_vpc_id(cls, values: dict) -> dict:\n \"\"\"\n Enforces a `vpc_id` is provided when custom network configuration mode is\n enabled for network settings.\n \"\"\"\n if values.get(\"network_configuration\") and not values.get(\"vpc_id\"):\n raise ValueError(\n \"You must provide a `vpc_id` to enable custom `network_configuration`.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration-methods","title":"Methods","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.cloudwatch_logs_options_requires_configure_cloudwatch_logs","title":"
cloudwatch_logs_options_requires_configure_cloudwatch_logs
classmethod
","text":"
Enforces that an execution role arn is provided (or could be provided by a runtime task definition) when configuring logging.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.configure_cloudwatch_logs_requires_execution_role_arn","title":"
configure_cloudwatch_logs_requires_execution_role_arn
classmethod
","text":"
Enforces that an execution role arn is provided (or could be provided by a runtime task definition) when configuring logging.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # TODO: Does not match\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_run_request\", {}).get(\"taskDefinition\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.container_name_default_from_task_definition","title":"
container_name_default_from_task_definition
classmethod
","text":"
Infers the container name from the task definition if not provided.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef container_name_default_from_task_definition(cls, values) -> dict:\n \"\"\"\n Infers the container name from the task definition if not provided.\n \"\"\"\n if values.get(\"container_name\") is None:\n values[\"container_name\"] = _container_name_from_task_definition(\n values.get(\"task_definition\")\n )\n\n # We may not have a name here still; for example if someone is using a task\n # definition arn. In that case, we'll perform similar logic later to find\n # the name to treat as the \"orchestration\" container.\n\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.network_configuration_requires_vpc_id","title":"
network_configuration_requires_vpc_id
classmethod
","text":"
Enforces a vpc_id
is provided when custom network configuration mode is enabled for network settings.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef network_configuration_requires_vpc_id(cls, values: dict) -> dict:\n \"\"\"\n Enforces a `vpc_id` is provided when custom network configuration mode is\n enabled for network settings.\n \"\"\"\n if values.get(\"network_configuration\") and not values.get(\"vpc_id\"):\n raise ValueError(\n \"You must provide a `vpc_id` to enable custom `network_configuration`.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.set_default_configure_cloudwatch_logs","title":"
set_default_configure_cloudwatch_logs
classmethod
","text":"
Streaming output generally requires CloudWatch logs to be configured.
To avoid entangled arguments in the simple case, configure_cloudwatch_logs
defaults to matching the value of stream_output
.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator(pre=True)\ndef set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.task_run_request_requires_arn_if_no_task_definition_given","title":"
task_run_request_requires_arn_if_no_task_definition_given
classmethod
","text":"
If no task definition is provided, a task definition ARN must be present on the task run request.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef task_run_request_requires_arn_if_no_task_definition_given(cls, values) -> dict:\n \"\"\"\n If no task definition is provided, a task definition ARN must be present on the\n task run request.\n \"\"\"\n if not values.get(\"task_run_request\", {}).get(\n \"taskDefinition\"\n ) and not values.get(\"task_definition\"):\n raise ValueError(\n \"A task definition must be provided if a task definition ARN is not \"\n \"present on the task run request.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables","title":"
ECSVariables (BaseVariables)
pydantic-model
","text":"
Variables for templating an ECS job.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSVariables(BaseVariables):\n \"\"\"\n Variables for templating an ECS job.\n \"\"\"\n\n task_definition_arn: Optional[str] = Field(\n default=None,\n description=(\n \"An identifier for an existing task definition to use. If set, options that\"\n \" require changes to the task definition will be ignored. All contents of \"\n \"the task definition in the job configuration will be ignored.\"\n ),\n )\n env: Dict[str, Optional[str]] = Field(\n title=\"Environment Variables\",\n default_factory=dict,\n description=(\n \"Environment variables to provide to the task run. These variables are set \"\n \"on the Prefect container at task runtime. These will not be set on the \"\n \"task definition.\"\n ),\n )\n aws_credentials: AwsCredentials = Field(\n title=\"AWS Credentials\",\n default_factory=AwsCredentials,\n description=(\n \"The AWS credentials to use to connect to ECS. If not provided, credentials\"\n \" will be inferred from the local environment following AWS's boto client's\"\n \" rules.\"\n ),\n )\n cluster: Optional[str] = Field(\n default=None,\n description=(\n \"The ECS cluster to run the task in. An ARN or name may be provided. If \"\n \"not provided, the default cluster will be used.\"\n ),\n )\n family: Optional[str] = Field(\n default=None,\n description=(\n \"A family for the task definition. If not provided, it will be inferred \"\n \"from the task definition. If the task definition does not have a family, \"\n \"the name will be generated. When flow and deployment metadata is \"\n \"available, the generated name will include their names. Values for this \"\n \"field will be slugified to match AWS character requirements.\"\n ),\n )\n launch_type: Optional[Literal[\"FARGATE\", \"EC2\", \"EXTERNAL\", \"FARGATE_SPOT\"]] = (\n Field(\n default=ECS_DEFAULT_LAUNCH_TYPE,\n description=(\n \"The type of ECS task run infrastructure that should be used. Note that\"\n \" 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure\"\n \" the proper capacity provider stategy if set here.\"\n ),\n )\n )\n image: Optional[str] = Field(\n default=None,\n description=(\n \"The image to use for the Prefect container in the task. If this value is \"\n \"not null, it will override the value in the task definition. This value \"\n \"defaults to a Prefect base image matching your local versions.\"\n ),\n )\n cpu: int = Field(\n title=\"CPU\",\n default=None,\n description=(\n \"The amount of CPU to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_CPU} will be used unless present on the task definition.\"\n ),\n )\n memory: int = Field(\n default=None,\n description=(\n \"The amount of memory to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_MEMORY} will be used unless present on the task definition.\"\n ),\n )\n container_name: str = Field(\n default=None,\n description=(\n \"The name of the container flow run orchestration will occur in. If not \"\n f\"specified, a default value of {ECS_DEFAULT_CONTAINER_NAME} will be used \"\n \"and if that is not found in the task definition the first container will \"\n \"be used.\"\n ),\n )\n task_role_arn: str = Field(\n title=\"Task Role ARN\",\n default=None,\n description=(\n \"A role to attach to the task run. 
This controls the permissions of the \"\n \"task while it is running.\"\n ),\n )\n execution_role_arn: str = Field(\n title=\"Execution Role ARN\",\n default=None,\n description=(\n \"An execution role to use for the task. This controls the permissions of \"\n \"the task when it is launching. If this value is not null, it will \"\n \"override the value in the task definition. An execution role must be \"\n \"provided to capture logs from the container.\"\n ),\n )\n vpc_id: Optional[str] = Field(\n title=\"VPC ID\",\n default=None,\n description=(\n \"The AWS VPC to link the task run to. This is only applicable when using \"\n \"the 'awsvpc' network mode for your task. FARGATE tasks require this \"\n \"network mode, but for EC2 tasks the default network mode is 'bridge'. \"\n \"If using the 'awsvpc' network mode and this field is null, your default \"\n \"VPC will be used. If no default VPC can be found, the task run will fail.\"\n ),\n )\n configure_cloudwatch_logs: bool = Field(\n default=None,\n description=(\n \"If enabled, the Prefect container will be configured to send its output \"\n \"to the AWS CloudWatch logs service. This functionality requires an \"\n \"execution role with logs:CreateLogStream, logs:CreateLogGroup, and \"\n \"logs:PutLogEvents permissions. The default for this field is `False` \"\n \"unless `stream_output` is set.\"\n ),\n )\n cloudwatch_logs_options: Dict[str, str] = Field(\n default_factory=dict,\n description=(\n \"When `configure_cloudwatch_logs` is enabled, this setting may be used to\"\n \" pass additional options to the CloudWatch logs configuration or override\"\n \" the default options. See the [AWS\"\n \" documentation](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html#create_awslogs_logdriver_options)\" # noqa\n \" for available options. \"\n ),\n )\n\n network_configuration: Dict[str, Any] = Field(\n default_factory=dict,\n description=(\n \"When `network_configuration` is supplied it will override ECS Worker's\"\n \"awsvpcConfiguration that defined in the ECS task executing your workload. \"\n \"See the [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-service-awsvpcconfiguration.html)\" # noqa\n \" for available options.\"\n ),\n )\n\n stream_output: bool = Field(\n default=None,\n description=(\n \"If enabled, logs will be streamed from the Prefect container to the local \"\n \"console. Unless you have configured AWS CloudWatch logs manually on your \"\n \"task definition, this requires the same prerequisites outlined in \"\n \"`configure_cloudwatch_logs`.\"\n ),\n )\n task_start_timeout_seconds: int = Field(\n default=300,\n description=(\n \"The amount of time to watch for the start of the ECS task \"\n \"before marking it as failed. The task must enter a RUNNING state to be \"\n \"considered started.\"\n ),\n )\n task_watch_poll_interval: float = Field(\n default=5.0,\n description=(\n \"The amount of time to wait between AWS API calls while monitoring the \"\n \"state of an ECS task.\"\n ),\n )\n auto_deregister_task_definition: bool = Field(\n default=False,\n description=(\n \"If enabled, any task definitions that are created by this block will be \"\n \"deregistered. Existing task definitions linked by ARN will never be \"\n \"deregistered. Deregistering a task definition does not remove it from \"\n \"your AWS account, instead it will be marked as INACTIVE.\"\n ),\n )\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables-attributes","title":"Attributes","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.auto_deregister_task_definition","title":"
auto_deregister_task_definition: bool
pydantic-field
","text":"
If enabled, any task definitions that are created by this block will be deregistered. Existing task definitions linked by ARN will never be deregistered. Deregistering a task definition does not remove it from your AWS account, instead it will be marked as INACTIVE.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.aws_credentials","title":"
aws_credentials: AwsCredentials
pydantic-field
","text":"
The AWS credentials to use to connect to ECS. If not provided, credentials will be inferred from the local environment following AWS's boto client's rules.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.cloudwatch_logs_options","title":"
cloudwatch_logs_options: Dict[str, str]
pydantic-field
","text":"
When configure_cloudwatch_logs
is enabled, this setting may be used to pass additional options to the CloudWatch logs configuration or override the default options. See the AWS documentation for available options.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.cluster","title":"
cluster: str
pydantic-field
","text":"
The ECS cluster to run the task in. An ARN or name may be provided. If not provided, the default cluster will be used.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.configure_cloudwatch_logs","title":"
configure_cloudwatch_logs: bool
pydantic-field
","text":"
If enabled, the Prefect container will be configured to send its output to the AWS CloudWatch logs service. This functionality requires an execution role with logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents permissions. The default for this field is False
unless stream_output
is set.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.container_name","title":"
container_name: str
pydantic-field
","text":"
The name of the container flow run orchestration will occur in. If not specified, a default value of prefect will be used and if that is not found in the task definition the first container will be used.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.cpu","title":"
cpu: int
pydantic-field
","text":"
The amount of CPU to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 1024 will be used unless present on the task definition.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.execution_role_arn","title":"
execution_role_arn: str
pydantic-field
","text":"
An execution role to use for the task. This controls the permissions of the task when it is launching. If this value is not null, it will override the value in the task definition. An execution role must be provided to capture logs from the container.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.family","title":"
family: str
pydantic-field
","text":"
A family for the task definition. If not provided, it will be inferred from the task definition. If the task definition does not have a family, the name will be generated. When flow and deployment metadata is available, the generated name will include their names. Values for this field will be slugified to match AWS character requirements.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.image","title":"
image: str
pydantic-field
","text":"
The image to use for the Prefect container in the task. If this value is not null, it will override the value in the task definition. This value defaults to a Prefect base image matching your local versions.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.launch_type","title":"
launch_type: Literal['FARGATE', 'EC2', 'EXTERNAL', 'FARGATE_SPOT']
pydantic-field
","text":"
The type of ECS task run infrastructure that should be used. Note that 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure the proper capacity provider strategy if set here.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.memory","title":"
memory: int
pydantic-field
","text":"
The amount of memory to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 2048 will be used unless present on the task definition.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.network_configuration","title":"
network_configuration: Dict[str, Any]
pydantic-field
","text":"
When network_configuration
is supplied, it will override the ECS worker's awsvpcConfiguration defined in the ECS task executing your workload. See the AWS documentation for available options.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.stream_output","title":"
stream_output: bool
pydantic-field
","text":"
If enabled, logs will be streamed from the Prefect container to the local console. Unless you have configured AWS CloudWatch logs manually on your task definition, this requires the same prerequisites outlined in configure_cloudwatch_logs
.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.task_definition_arn","title":"
task_definition_arn: str
pydantic-field
","text":"
An identifier for an existing task definition to use. If set, options that require changes to the task definition will be ignored. All contents of the task definition in the job configuration will be ignored.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.task_role_arn","title":"
task_role_arn: str
pydantic-field
","text":"
A role to attach to the task run. This controls the permissions of the task while it is running.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.task_start_timeout_seconds","title":"
task_start_timeout_seconds: int
pydantic-field
","text":"
The amount of time to watch for the start of the ECS task before marking it as failed. The task must enter a RUNNING state to be considered started.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.task_watch_poll_interval","title":"
task_watch_poll_interval: float
pydantic-field
","text":"
The amount of time to wait between AWS API calls while monitoring the state of an ECS task.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.vpc_id","title":"
vpc_id: str
pydantic-field
","text":"
The AWS VPC to link the task run to. This is only applicable when using the 'awsvpc' network mode for your task. FARGATE tasks require this network mode, but for EC2 tasks the default network mode is 'bridge'. If using the 'awsvpc' network mode and this field is null, your default VPC will be used. If no default VPC can be found, the task run will fail.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker","title":"
ECSWorker (BaseWorker)
","text":"
A Prefect worker to run flow runs as ECS tasks.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSWorker(BaseWorker):\n \"\"\"\n A Prefect worker to run flow runs as ECS tasks.\n \"\"\"\n\n type = \"ecs\"\n job_configuration = ECSJobConfiguration\n job_configuration_variables = ECSVariables\n _description = (\n \"Execute flow runs within containers on AWS ECS. Works with EC2 \"\n \"and Fargate clusters. Requires an AWS account.\"\n )\n _display_name = \"AWS Elastic Container Service\"\n _documentation_url = \"https://prefecthq.github.io/prefect-aws/ecs_worker/\"\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n\n async def run(\n self,\n flow_run: \"FlowRun\",\n configuration: ECSJobConfiguration,\n task_status: Optional[anyio.abc.TaskStatus] = None,\n ) -> BaseWorkerResult:\n \"\"\"\n Runs a given flow run on the current worker.\n \"\"\"\n boto_session, ecs_client = await run_sync_in_worker_thread(\n self._get_session_and_client, configuration\n )\n\n logger = self.get_flow_run_logger(flow_run)\n\n (\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition,\n ) = await run_sync_in_worker_thread(\n self._create_task_and_wait_for_start,\n logger,\n boto_session,\n ecs_client,\n configuration,\n flow_run,\n )\n\n # The task identifier is \"{cluster}::{task}\" where we use the configured cluster\n # if set to preserve matching by name rather than arn\n # Note \"::\" is used despite the Prefect standard being \":\" because ARNs contain\n # single colons.\n identifier = (\n (configuration.cluster if configuration.cluster else cluster_arn)\n + \"::\"\n + task_arn\n )\n\n if task_status:\n task_status.started(identifier)\n\n status_code = await run_sync_in_worker_thread(\n self._watch_task_and_get_exit_code,\n logger,\n configuration,\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition and configuration.auto_deregister_task_definition,\n boto_session,\n ecs_client,\n )\n\n return ECSWorkerResult(\n identifier=identifier,\n # If the container does not start the exit code can be null but we must\n # still report a status code. 
We use a -1 to indicate a special code.\n status_code=status_code if status_code is not None else -1,\n )\n\n def _get_session_and_client(\n self,\n configuration: ECSJobConfiguration,\n ) -> Tuple[boto3.Session, _ECSClient]:\n \"\"\"\n Retrieve a boto3 session and ECS client\n \"\"\"\n boto_session = configuration.aws_credentials.get_boto3_session()\n ecs_client = boto_session.client(\"ecs\")\n return boto_session, ecs_client\n\n def _create_task_and_wait_for_start(\n self,\n logger: logging.Logger,\n boto_session: boto3.Session,\n ecs_client: _ECSClient,\n configuration: ECSJobConfiguration,\n flow_run: FlowRun,\n ) -> Tuple[str, str, dict, bool]:\n \"\"\"\n Register the task definition, create the task run, and wait for it to start.\n\n Returns a tuple of\n - The task ARN\n - The task's cluster ARN\n - The task definition\n - A bool indicating if the task definition is newly registered\n \"\"\"\n task_definition_arn = configuration.task_run_request.get(\"taskDefinition\")\n new_task_definition_registered = False\n\n if not task_definition_arn:\n cached_task_definition_arn = _TASK_DEFINITION_CACHE.get(\n flow_run.deployment_id\n )\n task_definition = self._prepare_task_definition(\n configuration, region=ecs_client.meta.region_name\n )\n\n if cached_task_definition_arn:\n # Read the task definition to see if the cached task definition is valid\n try:\n cached_task_definition = self._retrieve_task_definition(\n logger, ecs_client, cached_task_definition_arn\n )\n except Exception as exc:\n logger.warning(\n \"Failed to retrieve cached task definition\"\n f\" {cached_task_definition_arn!r}: {exc!r}\"\n )\n # Clear from cache\n _TASK_DEFINITION_CACHE.pop(flow_run.deployment_id, None)\n cached_task_definition_arn = None\n else:\n if not cached_task_definition[\"status\"] == \"ACTIVE\":\n # Cached task definition is not active\n logger.warning(\n \"Cached task definition\"\n f\" {cached_task_definition_arn!r} is not active\"\n )\n _TASK_DEFINITION_CACHE.pop(flow_run.deployment_id, None)\n cached_task_definition_arn = None\n elif not self._task_definitions_equal(\n task_definition, cached_task_definition\n ):\n # Cached task definition is not valid\n logger.warning(\n \"Cached task definition\"\n f\" {cached_task_definition_arn!r} does not meet\"\n \" requirements\"\n )\n _TASK_DEFINITION_CACHE.pop(flow_run.deployment_id, None)\n cached_task_definition_arn = None\n\n if not cached_task_definition_arn:\n task_definition_arn = self._register_task_definition(\n logger, ecs_client, task_definition\n )\n new_task_definition_registered = True\n else:\n task_definition_arn = cached_task_definition_arn\n else:\n task_definition = self._retrieve_task_definition(\n logger, ecs_client, task_definition_arn\n )\n if configuration.task_definition:\n logger.warning(\n \"Ignoring task definition in configuration since task definition\"\n \" ARN is provided on the task run request.\"\n )\n\n self._validate_task_definition(task_definition, configuration)\n\n # Update the cached task definition ARN to avoid re-registering the task\n # definition on this worker unless necessary; registration is agressively\n # rate limited by AWS\n _TASK_DEFINITION_CACHE[flow_run.deployment_id] = task_definition_arn\n\n logger.info(f\"Using ECS task definition {task_definition_arn!r}...\")\n logger.debug(\n f\"Task definition {json.dumps(task_definition, indent=2, default=str)}\"\n )\n\n # Prepare the task run request\n task_run_request = self._prepare_task_run_request(\n boto_session,\n configuration,\n task_definition,\n 
task_definition_arn,\n )\n\n logger.info(\"Creating ECS task run...\")\n logger.debug(\n \"Task run request\"\n f\"{json.dumps(mask_api_key(task_run_request), indent=2, default=str)}\"\n )\n\n try:\n task = self._create_task_run(ecs_client, task_run_request)\n task_arn = task[\"taskArn\"]\n cluster_arn = task[\"clusterArn\"]\n except Exception as exc:\n self._report_task_run_creation_failure(configuration, task_run_request, exc)\n raise\n\n # Raises an exception if the task does not start\n logger.info(\"Waiting for ECS task run to start...\")\n self._wait_for_task_start(\n logger,\n configuration,\n task_arn,\n cluster_arn,\n ecs_client,\n timeout=configuration.task_start_timeout_seconds,\n )\n\n return task_arn, cluster_arn, task_definition, new_task_definition_registered\n\n def _watch_task_and_get_exit_code(\n self,\n logger: logging.Logger,\n configuration: ECSJobConfiguration,\n task_arn: str,\n cluster_arn: str,\n task_definition: dict,\n deregister_task_definition: bool,\n boto_session: boto3.Session,\n ecs_client: _ECSClient,\n ) -> Optional[int]:\n \"\"\"\n Wait for the task run to complete and retrieve the exit code of the Prefect\n container.\n \"\"\"\n\n # Wait for completion and stream logs\n task = self._wait_for_task_finish(\n logger,\n configuration,\n task_arn,\n cluster_arn,\n task_definition,\n ecs_client,\n boto_session,\n )\n\n if deregister_task_definition:\n ecs_client.deregister_task_definition(\n taskDefinition=task[\"taskDefinitionArn\"]\n )\n\n container_name = (\n configuration.container_name\n or _container_name_from_task_definition(task_definition)\n or ECS_DEFAULT_CONTAINER_NAME\n )\n\n # Check the status code of the Prefect container\n container = _get_container(task[\"containers\"], container_name)\n assert (\n container is not None\n ), f\"'{container_name}' container missing from task: {task}\"\n status_code = container.get(\"exitCode\")\n self._report_container_status_code(logger, container_name, status_code)\n\n return status_code\n\n def _report_container_status_code(\n self, logger: logging.Logger, name: str, status_code: Optional[int]\n ) -> None:\n \"\"\"\n Display a log for the given container status code.\n \"\"\"\n if status_code is None:\n logger.error(\n f\"Task exited without reporting an exit status for container {name!r}.\"\n )\n elif status_code == 0:\n logger.info(f\"Container {name!r} exited successfully.\")\n else:\n logger.warning(\n f\"Container {name!r} exited with non-zero exit code {status_code}.\"\n )\n\n def _report_task_run_creation_failure(\n self, configuration: ECSJobConfiguration, task_run: dict, exc: Exception\n ) -> None:\n \"\"\"\n Wrap common AWS task run creation failures with nicer user-facing messages.\n \"\"\"\n # AWS generates exception types at runtime so they must be captured a bit\n # differently than normal.\n if \"ClusterNotFoundException\" in str(exc):\n cluster = task_run.get(\"cluster\", \"default\")\n raise RuntimeError(\n f\"Failed to run ECS task, cluster {cluster!r} not found. \"\n \"Confirm that the cluster is configured in your region.\"\n ) from exc\n elif (\n \"No Container Instances\" in str(exc) and task_run.get(\"launchType\") == \"EC2\"\n ):\n cluster = task_run.get(\"cluster\", \"default\")\n raise RuntimeError(\n f\"Failed to run ECS task, cluster {cluster!r} does not appear to \"\n \"have any container instances associated with it. 
Confirm that you \"\n \"have EC2 container instances available.\"\n ) from exc\n elif (\n \"failed to validate logger args\" in str(exc)\n and \"AccessDeniedException\" in str(exc)\n and configuration.configure_cloudwatch_logs\n ):\n raise RuntimeError(\n \"Failed to run ECS task, the attached execution role does not appear\"\n \" to have sufficient permissions. Ensure that the execution role\"\n f\" {configuration.execution_role!r} has permissions\"\n \" logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents.\"\n )\n else:\n raise\n\n def _validate_task_definition(\n self, task_definition: dict, configuration: ECSJobConfiguration\n ) -> None:\n \"\"\"\n Ensure that the task definition is compatible with the configuration.\n\n Raises `ValueError` on incompatibility. Returns `None` on success.\n \"\"\"\n launch_type = configuration.task_run_request.get(\n \"launchType\", ECS_DEFAULT_LAUNCH_TYPE\n )\n if (\n launch_type != \"EC2\"\n and \"FARGATE\" not in task_definition[\"requiresCompatibilities\"]\n ):\n raise ValueError(\n \"Task definition does not have 'FARGATE' in 'requiresCompatibilities'\"\n f\" and cannot be used with launch type {launch_type!r}\"\n )\n\n if launch_type == \"FARGATE\" or launch_type == \"FARGATE_SPOT\":\n # Only the 'awsvpc' network mode is supported when using FARGATE\n network_mode = task_definition.get(\"networkMode\")\n if network_mode != \"awsvpc\":\n raise ValueError(\n f\"Found network mode {network_mode!r} which is not compatible with \"\n f\"launch type {launch_type!r}. Use either the 'EC2' launch \"\n \"type or the 'awsvpc' network mode.\"\n )\n\n if configuration.configure_cloudwatch_logs and not task_definition.get(\n \"executionRoleArn\"\n ):\n raise ValueError(\n \"An execution role arn must be set on the task definition to use \"\n \"`configure_cloudwatch_logs` or `stream_logs` but no execution role \"\n \"was found on the task definition.\"\n )\n\n def _register_task_definition(\n self,\n logger: logging.Logger,\n ecs_client: _ECSClient,\n task_definition: dict,\n ) -> str:\n \"\"\"\n Register a new task definition with AWS.\n\n Returns the ARN.\n \"\"\"\n logger.info(\"Registering ECS task definition...\")\n logger.debug(\n \"Task definition request\"\n f\"{json.dumps(task_definition, indent=2, default=str)}\"\n )\n response = ecs_client.register_task_definition(**task_definition)\n return response[\"taskDefinition\"][\"taskDefinitionArn\"]\n\n def _retrieve_task_definition(\n self,\n logger: logging.Logger,\n ecs_client: _ECSClient,\n task_definition_arn: str,\n ):\n \"\"\"\n Retrieve an existing task definition from AWS.\n \"\"\"\n logger.info(f\"Retrieving ECS task definition {task_definition_arn!r}...\")\n response = ecs_client.describe_task_definition(\n taskDefinition=task_definition_arn\n )\n return response[\"taskDefinition\"]\n\n def _wait_for_task_start(\n self,\n logger: logging.Logger,\n configuration: ECSJobConfiguration,\n task_arn: str,\n cluster_arn: str,\n ecs_client: _ECSClient,\n timeout: int,\n ) -> dict:\n \"\"\"\n Waits for an ECS task run to reach a RUNNING status.\n\n If a STOPPED status is reached instead, an exception is raised indicating the\n reason that the task run did not start.\n \"\"\"\n for task in self._watch_task_run(\n logger,\n configuration,\n task_arn,\n cluster_arn,\n ecs_client,\n until_status=\"RUNNING\",\n timeout=timeout,\n ):\n # TODO: It is possible that the task has passed _through_ a RUNNING\n # status during the polling interval. 
In this case, there is not an\n # exception to raise.\n if task[\"lastStatus\"] == \"STOPPED\":\n code = task.get(\"stopCode\")\n reason = task.get(\"stoppedReason\")\n # Generate a dynamic exception type from the AWS name\n raise type(code, (RuntimeError,), {})(reason)\n\n return task\n\n def _wait_for_task_finish(\n self,\n logger: logging.Logger,\n configuration: ECSJobConfiguration,\n task_arn: str,\n cluster_arn: str,\n task_definition: dict,\n ecs_client: _ECSClient,\n boto_session: boto3.Session,\n ):\n \"\"\"\n Watch an ECS task until it reaches a STOPPED status.\n\n If configured, logs from the Prefect container are streamed to stderr.\n\n Returns a description of the task on completion.\n \"\"\"\n can_stream_output = False\n container_name = (\n configuration.container_name\n or _container_name_from_task_definition(task_definition)\n or ECS_DEFAULT_CONTAINER_NAME\n )\n\n if configuration.stream_output:\n container_def = _get_container(\n task_definition[\"containerDefinitions\"], container_name\n )\n if not container_def:\n logger.warning(\n \"Prefect container definition not found in \"\n \"task definition. Output cannot be streamed.\"\n )\n elif not container_def.get(\"logConfiguration\"):\n logger.warning(\n \"Logging configuration not found on task. \"\n \"Output cannot be streamed.\"\n )\n elif not container_def[\"logConfiguration\"].get(\"logDriver\") == \"awslogs\":\n logger.warning(\n \"Logging configuration uses unsupported \"\n \" driver {container_def['logConfiguration'].get('logDriver')!r}. \"\n \"Output cannot be streamed.\"\n )\n else:\n # Prepare to stream the output\n log_config = container_def[\"logConfiguration\"][\"options\"]\n logs_client = boto_session.client(\"logs\")\n can_stream_output = True\n # Track the last log timestamp to prevent double display\n last_log_timestamp: Optional[int] = None\n # Determine the name of the stream as \"prefix/container/run-id\"\n stream_name = \"/\".join(\n [\n log_config[\"awslogs-stream-prefix\"],\n container_name,\n task_arn.rsplit(\"/\")[-1],\n ]\n )\n self._logger.info(\n f\"Streaming output from container {container_name!r}...\"\n )\n\n for task in self._watch_task_run(\n logger,\n configuration,\n task_arn,\n cluster_arn,\n ecs_client,\n current_status=\"RUNNING\",\n ):\n if configuration.stream_output and can_stream_output:\n # On each poll for task run status, also retrieve available logs\n last_log_timestamp = self._stream_available_logs(\n logger,\n logs_client,\n log_group=log_config[\"awslogs-group\"],\n log_stream=stream_name,\n last_log_timestamp=last_log_timestamp,\n )\n\n return task\n\n def _stream_available_logs(\n self,\n logger: logging.Logger,\n logs_client: Any,\n log_group: str,\n log_stream: str,\n last_log_timestamp: Optional[int] = None,\n ) -> Optional[int]:\n \"\"\"\n Stream logs from the given log group and stream since the last log timestamp.\n\n Will continue on paginated responses until all logs are returned.\n\n Returns the last log timestamp which can be used to call this method in the\n future.\n \"\"\"\n last_log_stream_token = \"NO-TOKEN\"\n next_log_stream_token = None\n\n # AWS will return the same token that we send once the end of the paginated\n # response is reached\n while last_log_stream_token != next_log_stream_token:\n last_log_stream_token = next_log_stream_token\n\n request = {\n \"logGroupName\": log_group,\n \"logStreamName\": log_stream,\n }\n\n if last_log_stream_token is not None:\n request[\"nextToken\"] = last_log_stream_token\n\n if last_log_timestamp is not None:\n # 
Bump the timestamp by one ms to avoid retrieving the last log again\n request[\"startTime\"] = last_log_timestamp + 1\n\n try:\n response = logs_client.get_log_events(**request)\n except Exception:\n logger.error(\n f\"Failed to read log events with request {request}\",\n exc_info=True,\n )\n return last_log_timestamp\n\n log_events = response[\"events\"]\n for log_event in log_events:\n # TODO: This doesn't forward to the local logger, which can be\n # bad for customizing handling and understanding where the\n # log is coming from, but it avoid nesting logger information\n # when the content is output from a Prefect logger on the\n # running infrastructure\n print(log_event[\"message\"], file=sys.stderr)\n\n if (\n last_log_timestamp is None\n or log_event[\"timestamp\"] > last_log_timestamp\n ):\n last_log_timestamp = log_event[\"timestamp\"]\n\n next_log_stream_token = response.get(\"nextForwardToken\")\n if not log_events:\n # Stop reading pages if there was no data\n break\n\n return last_log_timestamp\n\n def _watch_task_run(\n self,\n logger: logging.Logger,\n configuration: ECSJobConfiguration,\n task_arn: str,\n cluster_arn: str,\n ecs_client: _ECSClient,\n current_status: str = \"UNKNOWN\",\n until_status: str = None,\n timeout: int = None,\n ) -> Generator[None, None, dict]:\n \"\"\"\n Watches an ECS task run by querying every `poll_interval` seconds. After each\n query, the retrieved task is yielded. This function returns when the task run\n reaches a STOPPED status or the provided `until_status`.\n\n Emits a log each time the status changes.\n \"\"\"\n last_status = status = current_status\n t0 = time.time()\n while status != until_status:\n tasks = ecs_client.describe_tasks(\n tasks=[task_arn], cluster=cluster_arn, include=[\"TAGS\"]\n )[\"tasks\"]\n\n if tasks:\n task = tasks[0]\n\n status = task[\"lastStatus\"]\n if status != last_status:\n logger.info(f\"ECS task status is {status}.\")\n\n yield task\n\n # No point in continuing if the status is final\n if status == \"STOPPED\":\n break\n\n last_status = status\n\n else:\n # Intermittently, the task will not be described. 
We wat to respect the\n # watch timeout though.\n logger.debug(\"Task not found.\")\n\n elapsed_time = time.time() - t0\n if timeout is not None and elapsed_time > timeout:\n raise RuntimeError(\n f\"Timed out after {elapsed_time}s while watching task for status \"\n f\"{until_status or 'STOPPED'}.\"\n )\n time.sleep(configuration.task_watch_poll_interval)\n\n def _prepare_task_definition(\n self,\n configuration: ECSJobConfiguration,\n region: str,\n ) -> dict:\n \"\"\"\n Prepare a task definition by inferring any defaults and merging overrides.\n \"\"\"\n task_definition = copy.deepcopy(configuration.task_definition)\n\n # Configure the Prefect runtime container\n task_definition.setdefault(\"containerDefinitions\", [])\n\n # Remove empty container definitions\n task_definition[\"containerDefinitions\"] = [\n d for d in task_definition[\"containerDefinitions\"] if d\n ]\n\n container_name = configuration.container_name\n if not container_name:\n container_name = (\n _container_name_from_task_definition(task_definition)\n or ECS_DEFAULT_CONTAINER_NAME\n )\n\n container = _get_container(\n task_definition[\"containerDefinitions\"], container_name\n )\n if container is None:\n if container_name != ECS_DEFAULT_CONTAINER_NAME:\n raise ValueError(\n f\"Container {container_name!r} not found in task definition.\"\n )\n\n # Look for a container without a name\n for container in task_definition[\"containerDefinitions\"]:\n if \"name\" not in container:\n container[\"name\"] = container_name\n break\n else:\n container = {\"name\": container_name}\n task_definition[\"containerDefinitions\"].append(container)\n\n # Image is required so make sure it's present\n container.setdefault(\"image\", get_prefect_image_name())\n\n # Remove any keys that have been explicitly \"unset\"\n unset_keys = {key for key, value in configuration.env.items() if value is None}\n for item in tuple(container.get(\"environment\", [])):\n if item[\"name\"] in unset_keys or item[\"value\"] is None:\n container[\"environment\"].remove(item)\n\n if configuration.configure_cloudwatch_logs:\n container[\"logConfiguration\"] = {\n \"logDriver\": \"awslogs\",\n \"options\": {\n \"awslogs-create-group\": \"true\",\n \"awslogs-group\": \"prefect\",\n \"awslogs-region\": region,\n \"awslogs-stream-prefix\": configuration.name or \"prefect\",\n **configuration.cloudwatch_logs_options,\n },\n }\n\n family = task_definition.get(\"family\") or ECS_DEFAULT_FAMILY\n task_definition[\"family\"] = slugify(\n family,\n max_length=255,\n regex_pattern=r\"[^a-zA-Z0-9-_]+\",\n )\n\n # CPU and memory are required in some cases, retrieve the value to use\n cpu = task_definition.get(\"cpu\") or ECS_DEFAULT_CPU\n memory = task_definition.get(\"memory\") or ECS_DEFAULT_MEMORY\n\n launch_type = configuration.task_run_request.get(\n \"launchType\", ECS_DEFAULT_LAUNCH_TYPE\n )\n\n if launch_type == \"FARGATE\" or launch_type == \"FARGATE_SPOT\":\n # Task level memory and cpu are required when using fargate\n task_definition[\"cpu\"] = str(cpu)\n task_definition[\"memory\"] = str(memory)\n\n # The FARGATE compatibility is required if it will be used as as launch type\n requires_compatibilities = task_definition.setdefault(\n \"requiresCompatibilities\", []\n )\n if \"FARGATE\" not in requires_compatibilities:\n task_definition[\"requiresCompatibilities\"].append(\"FARGATE\")\n\n # Only the 'awsvpc' network mode is supported when using FARGATE\n # However, we will not enforce that here if the user has set it\n task_definition.setdefault(\"networkMode\", 
\"awsvpc\")\n\n elif launch_type == \"EC2\":\n # Container level memory and cpu are required when using ec2\n container.setdefault(\"cpu\", cpu)\n container.setdefault(\"memory\", memory)\n\n # Ensure set values are cast to integers\n container[\"cpu\"] = int(container[\"cpu\"])\n container[\"memory\"] = int(container[\"memory\"])\n\n # Ensure set values are cast to strings\n if task_definition.get(\"cpu\"):\n task_definition[\"cpu\"] = str(task_definition[\"cpu\"])\n if task_definition.get(\"memory\"):\n task_definition[\"memory\"] = str(task_definition[\"memory\"])\n\n return task_definition\n\n def _load_network_configuration(\n self, vpc_id: Optional[str], boto_session: boto3.Session\n ) -> dict:\n \"\"\"\n Load settings from a specific VPC or the default VPC and generate a task\n run request's network configuration.\n \"\"\"\n ec2_client = boto_session.client(\"ec2\")\n vpc_message = \"the default VPC\" if not vpc_id else f\"VPC with ID {vpc_id}\"\n\n if not vpc_id:\n # Retrieve the default VPC\n describe = {\"Filters\": [{\"Name\": \"isDefault\", \"Values\": [\"true\"]}]}\n else:\n describe = {\"VpcIds\": [vpc_id]}\n\n vpcs = ec2_client.describe_vpcs(**describe)[\"Vpcs\"]\n if not vpcs:\n help_message = (\n \"Pass an explicit `vpc_id` or configure a default VPC.\"\n if not vpc_id\n else \"Check that the VPC exists in the current region.\"\n )\n raise ValueError(\n f\"Failed to find {vpc_message}. \"\n \"Network configuration cannot be inferred. \"\n + help_message\n )\n\n vpc_id = vpcs[0][\"VpcId\"]\n subnets = ec2_client.describe_subnets(\n Filters=[{\"Name\": \"vpc-id\", \"Values\": [vpc_id]}]\n )[\"Subnets\"]\n if not subnets:\n raise ValueError(\n f\"Failed to find subnets for {vpc_message}. \"\n \"Network configuration cannot be inferred.\"\n )\n\n return {\n \"awsvpcConfiguration\": {\n \"subnets\": [s[\"SubnetId\"] for s in subnets],\n \"assignPublicIp\": \"ENABLED\",\n \"securityGroups\": [],\n }\n }\n\n def _custom_network_configuration(\n self, vpc_id: str, network_configuration: dict, boto_session: boto3.Session\n ) -> dict:\n \"\"\"\n Load settings from a specific VPC or the default VPC and generate a task\n run request's network configuration.\n \"\"\"\n ec2_client = boto_session.client(\"ec2\")\n vpc_message = f\"VPC with ID {vpc_id}\"\n\n vpcs = ec2_client.describe_vpcs(VpcIds=[vpc_id]).get(\"Vpcs\")\n\n if not vpcs:\n raise ValueError(\n f\"Failed to find {vpc_message}. \"\n + \"Network configuration cannot be inferred. \"\n + \"Pass an explicit `vpc_id`.\"\n )\n\n vpc_id = vpcs[0][\"VpcId\"]\n subnets = ec2_client.describe_subnets(\n Filters=[{\"Name\": \"vpc-id\", \"Values\": [vpc_id]}]\n )[\"Subnets\"]\n\n if not subnets:\n raise ValueError(\n f\"Failed to find subnets for {vpc_message}. 
\"\n + \"Network configuration cannot be inferred.\"\n )\n\n subnet_ids = [subnet[\"SubnetId\"] for subnet in subnets]\n\n config_subnets = network_configuration.get(\"subnets\", [])\n if not all(conf_sn in subnet_ids for conf_sn in config_subnets):\n raise ValueError(\n f\"Subnets {config_subnets} not found within {vpc_message}.\"\n + \"Please check that VPC is associated with supplied subnets.\"\n )\n\n return {\"awsvpcConfiguration\": network_configuration}\n\n def _prepare_task_run_request(\n self,\n boto_session: boto3.Session,\n configuration: ECSJobConfiguration,\n task_definition: dict,\n task_definition_arn: str,\n ) -> dict:\n \"\"\"\n Prepare a task run request payload.\n \"\"\"\n task_run_request = deepcopy(configuration.task_run_request)\n\n task_run_request.setdefault(\"taskDefinition\", task_definition_arn)\n assert task_run_request[\"taskDefinition\"] == task_definition_arn\n\n if task_run_request.get(\"launchType\") == \"FARGATE_SPOT\":\n # Should not be provided at all for FARGATE SPOT\n task_run_request.pop(\"launchType\", None)\n\n # A capacity provider strategy is required for FARGATE SPOT\n task_run_request.setdefault(\n \"capacityProviderStrategy\",\n [{\"capacityProvider\": \"FARGATE_SPOT\", \"weight\": 1}],\n )\n\n overrides = task_run_request.get(\"overrides\", {})\n container_overrides = overrides.get(\"containerOverrides\", [])\n\n # Ensure the network configuration is present if using awsvpc for network mode\n if (\n task_definition.get(\"networkMode\") == \"awsvpc\"\n and not task_run_request.get(\"networkConfiguration\")\n and not configuration.network_configuration\n ):\n task_run_request[\"networkConfiguration\"] = self._load_network_configuration(\n configuration.vpc_id, boto_session\n )\n\n # Use networkConfiguration if supplied by user\n if (\n task_definition.get(\"networkMode\") == \"awsvpc\"\n and configuration.network_configuration\n and configuration.vpc_id\n ):\n task_run_request[\"networkConfiguration\"] = (\n self._custom_network_configuration(\n configuration.vpc_id,\n configuration.network_configuration,\n boto_session,\n )\n )\n\n # Ensure the container name is set if not provided at template time\n\n container_name = (\n configuration.container_name\n or _container_name_from_task_definition(task_definition)\n or ECS_DEFAULT_CONTAINER_NAME\n )\n\n if container_overrides and not container_overrides[0].get(\"name\"):\n container_overrides[0][\"name\"] = container_name\n\n # Ensure configuration command is respected post-templating\n\n orchestration_container = _get_container(container_overrides, container_name)\n\n if orchestration_container:\n # Override the command if given on the configuration\n if configuration.command:\n orchestration_container[\"command\"] = configuration.command\n\n # Clean up templated variable formatting\n\n for container in container_overrides:\n if isinstance(container.get(\"command\"), str):\n container[\"command\"] = shlex.split(container[\"command\"])\n if isinstance(container.get(\"environment\"), dict):\n container[\"environment\"] = [\n {\"name\": k, \"value\": v} for k, v in container[\"environment\"].items()\n ]\n\n # Remove null values \u2014 they're not allowed by AWS\n container[\"environment\"] = [\n item\n for item in container.get(\"environment\", [])\n if item[\"value\"] is not None\n ]\n\n if isinstance(task_run_request.get(\"tags\"), dict):\n task_run_request[\"tags\"] = [\n {\"key\": k, \"value\": v} for k, v in task_run_request[\"tags\"].items()\n ]\n\n if overrides.get(\"cpu\"):\n 
overrides[\"cpu\"] = str(overrides[\"cpu\"])\n\n if overrides.get(\"memory\"):\n overrides[\"memory\"] = str(overrides[\"memory\"])\n\n # Ensure configuration tags and env are respected post-templating\n\n tags = [\n item\n for item in task_run_request.get(\"tags\", [])\n if item[\"key\"] not in configuration.labels.keys()\n ] + [\n {\"key\": k, \"value\": v}\n for k, v in configuration.labels.items()\n if v is not None\n ]\n\n # Slugify tags keys and values\n tags = [\n {\n \"key\": slugify(\n item[\"key\"],\n regex_pattern=_TAG_REGEX,\n allow_unicode=True,\n lowercase=False,\n ),\n \"value\": slugify(\n item[\"value\"],\n regex_pattern=_TAG_REGEX,\n allow_unicode=True,\n lowercase=False,\n ),\n }\n for item in tags\n ]\n\n if tags:\n task_run_request[\"tags\"] = tags\n\n if orchestration_container:\n environment = [\n item\n for item in orchestration_container.get(\"environment\", [])\n if item[\"name\"] not in configuration.env.keys()\n ] + [\n {\"name\": k, \"value\": v}\n for k, v in configuration.env.items()\n if v is not None\n ]\n if environment:\n orchestration_container[\"environment\"] = environment\n\n # Remove empty container overrides\n\n overrides[\"containerOverrides\"] = [v for v in container_overrides if v]\n\n return task_run_request\n\n @retry(\n stop=stop_after_attempt(MAX_CREATE_TASK_RUN_ATTEMPTS),\n wait=wait_fixed(CREATE_TASK_RUN_MIN_DELAY_SECONDS)\n + wait_random(\n CREATE_TASK_RUN_MIN_DELAY_JITTER_SECONDS,\n CREATE_TASK_RUN_MAX_DELAY_JITTER_SECONDS,\n ),\n )\n def _create_task_run(self, ecs_client: _ECSClient, task_run_request: dict) -> str:\n \"\"\"\n Create a run of a task definition.\n\n Returns the task run ARN.\n \"\"\"\n return ecs_client.run_task(**task_run_request)[\"tasks\"][0]\n\n def _task_definitions_equal(self, taskdef_1, taskdef_2) -> bool:\n \"\"\"\n Compare two task definitions.\n\n Since one may come from the AWS API and have populated defaults, we do our best\n to homogenize the definitions without changing their meaning.\n \"\"\"\n if taskdef_1 == taskdef_2:\n return True\n\n if taskdef_1 is None or taskdef_2 is None:\n return False\n\n taskdef_1 = copy.deepcopy(taskdef_1)\n taskdef_2 = copy.deepcopy(taskdef_2)\n\n for taskdef in (taskdef_1, taskdef_2):\n # Set defaults that AWS would set after registration\n container_definitions = taskdef.get(\"containerDefinitions\", [])\n essential = any(\n container.get(\"essential\") for container in container_definitions\n )\n if not essential:\n container_definitions[0].setdefault(\"essential\", True)\n\n taskdef.setdefault(\"networkMode\", \"bridge\")\n\n _drop_empty_keys_from_task_definition(taskdef_1)\n _drop_empty_keys_from_task_definition(taskdef_2)\n\n # Clear fields that change on registration for comparison\n for field in ECS_POST_REGISTRATION_FIELDS:\n taskdef_1.pop(field, None)\n taskdef_2.pop(field, None)\n\n return taskdef_1 == taskdef_2\n\n async def kill_infrastructure(\n self,\n configuration: ECSJobConfiguration,\n infrastructure_pid: str,\n grace_seconds: int = 30,\n ) -> None:\n \"\"\"\n Kill a task running on ECS.\n\n Args:\n infrastructure_pid: A cluster and task arn combination. This should match a\n value yielded by `ECSWorker.run`.\n \"\"\"\n if grace_seconds != 30:\n self._logger.warning(\n f\"Kill grace period of {grace_seconds}s requested, but AWS does not \"\n \"support dynamic grace period configuration so 30s will be used. 
\"\n \"See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html for configuration of grace periods.\" # noqa\n )\n cluster, task = parse_identifier(infrastructure_pid)\n await run_sync_in_worker_thread(self._stop_task, configuration, cluster, task)\n\n def _stop_task(\n self, configuration: ECSJobConfiguration, cluster: str, task: str\n ) -> None:\n \"\"\"\n Stop a running ECS task.\n \"\"\"\n if configuration.cluster is not None and cluster != configuration.cluster:\n raise InfrastructureNotAvailable(\n \"Cannot stop ECS task: this infrastructure block has access to \"\n f\"cluster {configuration.cluster!r} but the task is running in cluster \"\n f\"{cluster!r}.\"\n )\n\n _, ecs_client = self._get_session_and_client(configuration)\n try:\n ecs_client.stop_task(cluster=cluster, task=task)\n except Exception as exc:\n # Raise a special exception if the task does not exist\n if \"ClusterNotFound\" in str(exc):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the cluster {cluster!r} could not be found.\"\n ) from exc\n if \"not find task\" in str(exc) or \"referenced task was not found\" in str(\n exc\n ):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the task {task!r} could not be found in \"\n f\"cluster {cluster!r}.\"\n ) from exc\n if \"no registered tasks\" in str(exc):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the cluster {cluster!r} has no tasks.\"\n ) from exc\n\n # Reraise unknown exceptions\n raise\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker-classes","title":"Classes","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.job_configuration","title":"
job_configuration (BaseJobConfiguration)
pydantic-model
","text":"
Job configuration for an ECS worker.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSJobConfiguration(BaseJobConfiguration):\n \"\"\"\n Job configuration for an ECS worker.\n \"\"\"\n\n aws_credentials: Optional[AwsCredentials] = Field(default_factory=AwsCredentials)\n task_definition: Optional[Dict[str, Any]] = Field(\n template=_default_task_definition_template()\n )\n task_run_request: Dict[str, Any] = Field(\n template=_default_task_run_request_template()\n )\n configure_cloudwatch_logs: Optional[bool] = Field(default=None)\n cloudwatch_logs_options: Dict[str, str] = Field(default_factory=dict)\n network_configuration: Dict[str, Any] = Field(default_factory=dict)\n stream_output: Optional[bool] = Field(default=None)\n task_start_timeout_seconds: int = Field(default=300)\n task_watch_poll_interval: float = Field(default=5.0)\n auto_deregister_task_definition: bool = Field(default=False)\n vpc_id: Optional[str] = Field(default=None)\n container_name: Optional[str] = Field(default=None)\n cluster: Optional[str] = Field(default=None)\n\n @root_validator\n def task_run_request_requires_arn_if_no_task_definition_given(cls, values) -> dict:\n \"\"\"\n If no task definition is provided, a task definition ARN must be present on the\n task run request.\n \"\"\"\n if not values.get(\"task_run_request\", {}).get(\n \"taskDefinition\"\n ) and not values.get(\"task_definition\"):\n raise ValueError(\n \"A task definition must be provided if a task definition ARN is not \"\n \"present on the task run request.\"\n )\n return values\n\n @root_validator\n def container_name_default_from_task_definition(cls, values) -> dict:\n \"\"\"\n Infers the container name from the task definition if not provided.\n \"\"\"\n if values.get(\"container_name\") is None:\n values[\"container_name\"] = _container_name_from_task_definition(\n values.get(\"task_definition\")\n )\n\n # We may not have a name here still; for example if someone is using a task\n # definition arn. 
In that case, we'll perform similar logic later to find\n # the name to treat as the \"orchestration\" container.\n\n return values\n\n @root_validator(pre=True)\n def set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n\n @root_validator\n def configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # TODO: Does not match\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_run_request\", {}).get(\"taskDefinition\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n\n @root_validator\n def cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n\n @root_validator\n def network_configuration_requires_vpc_id(cls, values: dict) -> dict:\n \"\"\"\n Enforces a `vpc_id` is provided when custom network configuration mode is\n enabled for network settings.\n \"\"\"\n if values.get(\"network_configuration\") and not values.get(\"vpc_id\"):\n raise ValueError(\n \"You must provide a `vpc_id` to enable custom `network_configuration`.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.job_configuration-methods","title":"Methods","text":"
cloudwatch_logs_options_requires_configure_cloudwatch_logs
classmethod
Enforces that configure_cloudwatch_logs is enabled when cloudwatch_logs_options are provided.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that `configure_cloudwatch_logs` is enabled when\n `cloudwatch_logs_options` are provided.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_logs` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n
configure_cloudwatch_logs_requires_execution_role_arn
classmethod
Enforces that an execution role arn is provided (or could be provided by a runtime task definition) when configuring logging.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # TODO: Does not match\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_run_request\", {}).get(\"taskDefinition\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n
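The execution role referenced by this check is what lets the task create and write CloudWatch log streams. A minimal sketch of the permissions it needs, expressed here as a Python dict (the wildcard resource scope is an illustrative assumption; tighten it for production):
ecs_logging_policy = {\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Effect\": \"Allow\",\n # Permissions named in the configure_cloudwatch_logs description\n \"Action\": [\n \"logs:CreateLogGroup\",\n \"logs:CreateLogStream\",\n \"logs:PutLogEvents\",\n ],\n \"Resource\": \"*\", # assumption: scope this down in real deployments\n },\n ],\n}\n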
container_name_default_from_task_definition
classmethod
Infers the container name from the task definition if not provided.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef container_name_default_from_task_definition(cls, values) -> dict:\n \"\"\"\n Infers the container name from the task definition if not provided.\n \"\"\"\n if values.get(\"container_name\") is None:\n values[\"container_name\"] = _container_name_from_task_definition(\n values.get(\"task_definition\")\n )\n\n # We may not have a name here still; for example if someone is using a task\n # definition arn. In that case, we'll perform similar logic later to find\n # the name to treat as the \"orchestration\" container.\n\n return values\n
network_configuration_requires_vpc_id
classmethod
Enforces a vpc_id
is provided when custom network configuration mode is enabled for network settings.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef network_configuration_requires_vpc_id(cls, values: dict) -> dict:\n \"\"\"\n Enforces a `vpc_id` is provided when custom network configuration mode is\n enabled for network settings.\n \"\"\"\n if values.get(\"network_configuration\") and not values.get(\"vpc_id\"):\n raise ValueError(\n \"You must provide a `vpc_id` to enable custom `network_configuration`.\"\n )\n return values\n
set_default_configure_cloudwatch_logs
classmethod
Streaming output generally requires CloudWatch logs to be configured.
To avoid entangled arguments in the simple case, configure_cloudwatch_logs
defaults to matching the value of stream_output
.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator(pre=True)\ndef set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n
task_run_request_requires_arn_if_no_task_definition_given
classmethod
If no task definition is provided, a task definition ARN must be present on the task run request.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef task_run_request_requires_arn_if_no_task_definition_given(cls, values) -> dict:\n \"\"\"\n If no task definition is provided, a task definition ARN must be present on the\n task run request.\n \"\"\"\n if not values.get(\"task_run_request\", {}).get(\n \"taskDefinition\"\n ) and not values.get(\"task_definition\"):\n raise ValueError(\n \"A task definition must be provided if a task definition ARN is not \"\n \"present on the task run request.\"\n )\n return values\n
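Concretely, either of the following input shapes satisfies this validator (the ARN and container name below are illustrative placeholders, not real resources):
# Option 1: supply a task definition for the worker to register\nvalues_with_task_definition = {\n \"task_definition\": {\"containerDefinitions\": [{\"name\": \"prefect\"}]},\n}\n\n# Option 2: point the task run request at an existing task definition ARN\nvalues_with_arn = {\n \"task_run_request\": {\n \"taskDefinition\": \"arn:aws:ecs:us-east-1:123456789012:task-definition/prefect:1\"\n },\n}\n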
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.job_configuration_variables","title":"
job_configuration_variables (BaseVariables)
pydantic-model
","text":"
Variables for templating an ECS job.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSVariables(BaseVariables):\n \"\"\"\n Variables for templating an ECS job.\n \"\"\"\n\n task_definition_arn: Optional[str] = Field(\n default=None,\n description=(\n \"An identifier for an existing task definition to use. If set, options that\"\n \" require changes to the task definition will be ignored. All contents of \"\n \"the task definition in the job configuration will be ignored.\"\n ),\n )\n env: Dict[str, Optional[str]] = Field(\n title=\"Environment Variables\",\n default_factory=dict,\n description=(\n \"Environment variables to provide to the task run. These variables are set \"\n \"on the Prefect container at task runtime. These will not be set on the \"\n \"task definition.\"\n ),\n )\n aws_credentials: AwsCredentials = Field(\n title=\"AWS Credentials\",\n default_factory=AwsCredentials,\n description=(\n \"The AWS credentials to use to connect to ECS. If not provided, credentials\"\n \" will be inferred from the local environment following AWS's boto client's\"\n \" rules.\"\n ),\n )\n cluster: Optional[str] = Field(\n default=None,\n description=(\n \"The ECS cluster to run the task in. An ARN or name may be provided. If \"\n \"not provided, the default cluster will be used.\"\n ),\n )\n family: Optional[str] = Field(\n default=None,\n description=(\n \"A family for the task definition. If not provided, it will be inferred \"\n \"from the task definition. If the task definition does not have a family, \"\n \"the name will be generated. When flow and deployment metadata is \"\n \"available, the generated name will include their names. Values for this \"\n \"field will be slugified to match AWS character requirements.\"\n ),\n )\n launch_type: Optional[Literal[\"FARGATE\", \"EC2\", \"EXTERNAL\", \"FARGATE_SPOT\"]] = (\n Field(\n default=ECS_DEFAULT_LAUNCH_TYPE,\n description=(\n \"The type of ECS task run infrastructure that should be used. Note that\"\n \" 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure\"\n \" the proper capacity provider stategy if set here.\"\n ),\n )\n )\n image: Optional[str] = Field(\n default=None,\n description=(\n \"The image to use for the Prefect container in the task. If this value is \"\n \"not null, it will override the value in the task definition. This value \"\n \"defaults to a Prefect base image matching your local versions.\"\n ),\n )\n cpu: int = Field(\n title=\"CPU\",\n default=None,\n description=(\n \"The amount of CPU to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_CPU} will be used unless present on the task definition.\"\n ),\n )\n memory: int = Field(\n default=None,\n description=(\n \"The amount of memory to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_MEMORY} will be used unless present on the task definition.\"\n ),\n )\n container_name: str = Field(\n default=None,\n description=(\n \"The name of the container flow run orchestration will occur in. If not \"\n f\"specified, a default value of {ECS_DEFAULT_CONTAINER_NAME} will be used \"\n \"and if that is not found in the task definition the first container will \"\n \"be used.\"\n ),\n )\n task_role_arn: str = Field(\n title=\"Task Role ARN\",\n default=None,\n description=(\n \"A role to attach to the task run. 
This controls the permissions of the \"\n \"task while it is running.\"\n ),\n )\n execution_role_arn: str = Field(\n title=\"Execution Role ARN\",\n default=None,\n description=(\n \"An execution role to use for the task. This controls the permissions of \"\n \"the task when it is launching. If this value is not null, it will \"\n \"override the value in the task definition. An execution role must be \"\n \"provided to capture logs from the container.\"\n ),\n )\n vpc_id: Optional[str] = Field(\n title=\"VPC ID\",\n default=None,\n description=(\n \"The AWS VPC to link the task run to. This is only applicable when using \"\n \"the 'awsvpc' network mode for your task. FARGATE tasks require this \"\n \"network mode, but for EC2 tasks the default network mode is 'bridge'. \"\n \"If using the 'awsvpc' network mode and this field is null, your default \"\n \"VPC will be used. If no default VPC can be found, the task run will fail.\"\n ),\n )\n configure_cloudwatch_logs: bool = Field(\n default=None,\n description=(\n \"If enabled, the Prefect container will be configured to send its output \"\n \"to the AWS CloudWatch logs service. This functionality requires an \"\n \"execution role with logs:CreateLogStream, logs:CreateLogGroup, and \"\n \"logs:PutLogEvents permissions. The default for this field is `False` \"\n \"unless `stream_output` is set.\"\n ),\n )\n cloudwatch_logs_options: Dict[str, str] = Field(\n default_factory=dict,\n description=(\n \"When `configure_cloudwatch_logs` is enabled, this setting may be used to\"\n \" pass additional options to the CloudWatch logs configuration or override\"\n \" the default options. See the [AWS\"\n \" documentation](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html#create_awslogs_logdriver_options)\" # noqa\n \" for available options. \"\n ),\n )\n\n network_configuration: Dict[str, Any] = Field(\n default_factory=dict,\n description=(\n \"When `network_configuration` is supplied it will override ECS Worker's\"\n \"awsvpcConfiguration that defined in the ECS task executing your workload. \"\n \"See the [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-service-awsvpcconfiguration.html)\" # noqa\n \" for available options.\"\n ),\n )\n\n stream_output: bool = Field(\n default=None,\n description=(\n \"If enabled, logs will be streamed from the Prefect container to the local \"\n \"console. Unless you have configured AWS CloudWatch logs manually on your \"\n \"task definition, this requires the same prerequisites outlined in \"\n \"`configure_cloudwatch_logs`.\"\n ),\n )\n task_start_timeout_seconds: int = Field(\n default=300,\n description=(\n \"The amount of time to watch for the start of the ECS task \"\n \"before marking it as failed. The task must enter a RUNNING state to be \"\n \"considered started.\"\n ),\n )\n task_watch_poll_interval: float = Field(\n default=5.0,\n description=(\n \"The amount of time to wait between AWS API calls while monitoring the \"\n \"state of an ECS task.\"\n ),\n )\n auto_deregister_task_definition: bool = Field(\n default=False,\n description=(\n \"If enabled, any task definitions that are created by this block will be \"\n \"deregistered. Existing task definitions linked by ARN will never be \"\n \"deregistered. Deregistering a task definition does not remove it from \"\n \"your AWS account, instead it will be marked as INACTIVE.\"\n ),\n )\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.job_configuration_variables-attributes","title":"Attributes","text":"
auto_deregister_task_definition: bool
pydantic-field
If enabled, any task definitions that are created by this block will be deregistered. Existing task definitions linked by ARN will never be deregistered. Deregistering a task definition does not remove it from your AWS account; instead, it will be marked as INACTIVE.
aws_credentials: AwsCredentials
pydantic-field
The AWS credentials to use to connect to ECS. If not provided, credentials will be inferred from the local environment following AWS's boto client's rules.
cloudwatch_logs_options: Dict[str, str]
pydantic-field
When configure_cloudwatch_logs
is enabled, this setting may be used to pass additional options to the CloudWatch logs configuration or override the default options. See the AWS documentation for available options.
cluster: str
pydantic-field
The ECS cluster to run the task in. An ARN or name may be provided. If not provided, the default cluster will be used.
configure_cloudwatch_logs: bool
pydantic-field
If enabled, the Prefect container will be configured to send its output to the AWS CloudWatch logs service. This functionality requires an execution role with logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents permissions. The default for this field is False
unless stream_output
is set.
container_name: str
pydantic-field
The name of the container flow run orchestration will occur in. If not specified, a default value of prefect will be used and if that is not found in the task definition the first container will be used.
cpu: int
pydantic-field
The amount of CPU to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 1024 will be used unless present on the task definition.
execution_role_arn: str
pydantic-field
An execution role to use for the task. This controls the permissions of the task when it is launching. If this value is not null, it will override the value in the task definition. An execution role must be provided to capture logs from the container.
family: str
pydantic-field
A family for the task definition. If not provided, it will be inferred from the task definition. If the task definition does not have a family, the name will be generated. When flow and deployment metadata is available, the generated name will include their names. Values for this field will be slugified to match AWS character requirements.
image: str
pydantic-field
The image to use for the Prefect container in the task. If this value is not null, it will override the value in the task definition. This value defaults to a Prefect base image matching your local versions.
launch_type: Literal['FARGATE', 'EC2', 'EXTERNAL', 'FARGATE_SPOT']
pydantic-field
The type of ECS task run infrastructure that should be used. Note that 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure the proper capacity provider strategy if set here.
memory: int
pydantic-field
The amount of memory to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 2048 will be used unless present on the task definition.
network_configuration: Dict[str, Any]
pydantic-field
When network_configuration
is supplied, it will override the ECS Worker's awsvpcConfiguration defined in the ECS task executing your workload. See the AWS documentation for available options. A combined sketch of these variables appears after this attributes list.
stream_output: bool
pydantic-field
If enabled, logs will be streamed from the Prefect container to the local console. Unless you have configured AWS CloudWatch logs manually on your task definition, this requires the same prerequisites outlined in configure_cloudwatch_logs
.
task_definition_arn: str
pydantic-field
An identifier for an existing task definition to use. If set, options that require changes to the task definition will be ignored. All contents of the task definition in the job configuration will be ignored.
task_role_arn: str
pydantic-field
A role to attach to the task run. This controls the permissions of the task while it is running.
task_start_timeout_seconds: int
pydantic-field
The amount of time to watch for the start of the ECS task before marking it as failed. The task must enter a RUNNING state to be considered started.
task_watch_poll_interval: float
pydantic-field
The amount of time to wait between AWS API calls while monitoring the state of an ECS task.
vpc_id: str
pydantic-field
The AWS VPC to link the task run to. This is only applicable when using the 'awsvpc' network mode for your task. FARGATE tasks require this network mode, but for EC2 tasks the default network mode is 'bridge'. If using the 'awsvpc' network mode and this field is null, your default VPC will be used. If no default VPC can be found, the task run will fail.
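Putting several of these variables together, a minimal sketch of values you might supply to an ECS work pool (every ID, ARN, and name below is a placeholder, not a real resource; the exact mechanism for supplying them, such as a work pool's base job template or deployment overrides, is assumed from your Prefect setup):
example_job_variables = {\n \"launch_type\": \"FARGATE_SPOT\",\n \"vpc_id\": \"vpc-0123456789abcdef0\",\n \"execution_role_arn\": \"arn:aws:iam::123456789012:role/prefect-ecs-execution\",\n \"configure_cloudwatch_logs\": True,\n \"cloudwatch_logs_options\": {\n # stream prefix is one of the awslogs driver options described above\n \"awslogs-stream-prefix\": \"my-flow\",\n },\n \"network_configuration\": {\n # this dict becomes the task's awsvpcConfiguration\n \"subnets\": [\"subnet-0123456789abcdef0\"],\n \"securityGroups\": [\"sg-0123456789abcdef0\"],\n \"assignPublicIp\": \"DISABLED\",\n },\n \"auto_deregister_task_definition\": False,\n}\n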
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker-methods","title":"Methods","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.kill_infrastructure","title":"
kill_infrastructure
async
","text":"
Kill a task running on ECS.
Parameters:
Name Type Description Default
infrastructure_pid
str
A cluster and task arn combination. This should match a value yielded by ECSWorker.run
.
required Source code in
prefect_aws/workers/ecs_worker.py
async def kill_infrastructure(\n self,\n configuration: ECSJobConfiguration,\n infrastructure_pid: str,\n grace_seconds: int = 30,\n) -> None:\n \"\"\"\n Kill a task running on ECS.\n\n Args:\n infrastructure_pid: A cluster and task arn combination. This should match a\n value yielded by `ECSWorker.run`.\n \"\"\"\n if grace_seconds != 30:\n self._logger.warning(\n f\"Kill grace period of {grace_seconds}s requested, but AWS does not \"\n \"support dynamic grace period configuration so 30s will be used. \"\n \"See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html for configuration of grace periods.\" # noqa\n )\n cluster, task = parse_identifier(infrastructure_pid)\n await run_sync_in_worker_thread(self._stop_task, configuration, cluster, task)\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.run","title":"
run
async
","text":"
Runs a given flow run on the current worker.
Source code in
prefect_aws/workers/ecs_worker.py
async def run(\n self,\n flow_run: \"FlowRun\",\n configuration: ECSJobConfiguration,\n task_status: Optional[anyio.abc.TaskStatus] = None,\n) -> BaseWorkerResult:\n \"\"\"\n Runs a given flow run on the current worker.\n \"\"\"\n boto_session, ecs_client = await run_sync_in_worker_thread(\n self._get_session_and_client, configuration\n )\n\n logger = self.get_flow_run_logger(flow_run)\n\n (\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition,\n ) = await run_sync_in_worker_thread(\n self._create_task_and_wait_for_start,\n logger,\n boto_session,\n ecs_client,\n configuration,\n flow_run,\n )\n\n # The task identifier is \"{cluster}::{task}\" where we use the configured cluster\n # if set to preserve matching by name rather than arn\n # Note \"::\" is used despite the Prefect standard being \":\" because ARNs contain\n # single colons.\n identifier = (\n (configuration.cluster if configuration.cluster else cluster_arn)\n + \"::\"\n + task_arn\n )\n\n if task_status:\n task_status.started(identifier)\n\n status_code = await run_sync_in_worker_thread(\n self._watch_task_and_get_exit_code,\n logger,\n configuration,\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition and configuration.auto_deregister_task_definition,\n boto_session,\n ecs_client,\n )\n\n return ECSWorkerResult(\n identifier=identifier,\n # If the container does not start the exit code can be null but we must\n # still report a status code. We use a -1 to indicate a special code.\n status_code=status_code if status_code is not None else -1,\n )\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorkerResult","title":"
ECSWorkerResult (BaseWorkerResult)
pydantic-model
","text":"
The result of an ECS job.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSWorkerResult(BaseWorkerResult):\n \"\"\"\n The result of an ECS job.\n \"\"\"\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker-functions","title":"Functions","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.parse_identifier","title":"
parse_identifier
","text":"
Splits identifier into its cluster and task components, e.g. input \"cluster_name::task_arn\" outputs (\"cluster_name\", \"task_arn\").
Source code in
prefect_aws/workers/ecs_worker.py
def parse_identifier(identifier: str) -> ECSIdentifier:\n \"\"\"\n Splits identifier into its cluster and task components, e.g.\n input \"cluster_name::task_arn\" outputs (\"cluster_name\", \"task_arn\").\n \"\"\"\n cluster, task = identifier.split(\"::\", maxsplit=1)\n return ECSIdentifier(cluster, task)\n
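For example (the cluster name and task ARN below are illustrative placeholders):
from prefect_aws.workers.ecs_worker import parse_identifier\n\ncluster, task = parse_identifier(\n \"my-cluster::arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef\"\n)\nprint(cluster)  # my-cluster\nprint(task)  # arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef\n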
"},{"location":"examples_catalog/","title":"Examples Catalog","text":"
Below is a list of examples for prefect-aws
.
"},{"location":"examples_catalog/#batch-module","title":"Batch Module","text":"
Submits a job to AWS Batch.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.batch import batch_submit\n\n\n@flow\ndef example_batch_submit_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n job_id = batch_submit(\n \"job_name\",\n \"job_definition\",\n \"job_queue\",\n aws_credentials\n )\n return job_id\n\nexample_batch_submit_flow()\n
"},{"location":"examples_catalog/#client-waiter-module","title":"Client Waiter Module","text":"
Run an ec2 waiter until instance_exists.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.client_wait import client_waiter\n\n@flow\ndef example_client_wait_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n\n waiter = client_waiter(\n \"ec2\",\n \"instance_exists\",\n aws_credentials\n )\n\n return waiter\nexample_client_wait_flow()\n
"},{"location":"examples_catalog/#credentials-module","title":"Credentials Module","text":"
Create an S3 client from an authorized boto3 session:
from prefect_aws import AwsCredentials\n\naws_credentials = AwsCredentials(\n aws_access_key_id = \"access_key_id\",\n aws_secret_access_key = \"secret_access_key\"\n )\ns3_client = aws_credentials.get_boto3_session().client(\"s3\")\n
Create an S3 client for MinIO from an authorized boto3 session:
from prefect_aws.credentials import MinIOCredentials\n\nminio_credentials = MinIOCredentials(\n minio_root_user = \"minio_root_user\",\n minio_root_password = \"minio_root_password\"\n)\ns3_client = minio_credentials.get_boto3_session().client(\n service_name=\"s3\",\n endpoint_url=\"http://localhost:9000\"\n)\n
"},{"location":"examples_catalog/#s3-module","title":"S3 Module","text":"
Read \"subfolder/file1\" contents from an S3 bucket named \"bucket\":
from prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import S3Bucket\n\naws_creds = AwsCredentials(\n aws_access_key_id=AWS_ACCESS_KEY_ID,\n aws_secret_access_key=AWS_SECRET_ACCESS_KEY\n)\n\ns3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n aws_credentials=aws_creds,\n basepath=\"subfolder\"\n)\n\nkey_contents = s3_bucket_block.read_path(path=\"subfolder/file1\")\n
Download my_folder to a local folder named my_folder.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.download_folder_to_path(\"my_folder\", \"my_folder\")\n
Read and upload a file to an S3 bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_upload\n\n\n@flow\nasync def example_s3_upload_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n with open(\"data.csv\", \"rb\") as file:\n key = await s3_upload(\n bucket=\"bucket\",\n key=\"data.csv\",\n data=file.read(),\n aws_credentials=aws_credentials,\n )\n\nexample_s3_upload_flow()\n
Download my_folder/notes.txt object to a BytesIO object.
from io import BytesIO\n\nfrom prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith BytesIO() as buf:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", buf)\n
Download my_folder/notes.txt object to a BufferedWriter.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"wb\") as f:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", f)\n
Download a file from an S3 bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_download\n\n\n@flow\nasync def example_s3_download_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n data = await s3_download(\n bucket=\"bucket\",\n key=\"key\",\n aws_credentials=aws_credentials,\n )\n\nexample_s3_download_flow()\n
Stream notes.txt from your-bucket/notes.txt to my-bucket/landed/notes.txt.
from prefect_aws.s3 import S3Bucket\n\nyour_s3_bucket = S3Bucket.load(\"your-bucket\")\nmy_s3_bucket = S3Bucket.load(\"my-bucket\")\n\nmy_s3_bucket.stream_from(\n your_s3_bucket,\n \"notes.txt\",\n to_path=\"landed/notes.txt\"\n)\n
Download my_folder/notes.txt object to notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.download_object_to_path(\"my_folder/notes.txt\", \"notes.txt\")\n
List all objects in a bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_list_objects\n\n\n@flow\nasync def example_s3_list_objects_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n objects = await s3_list_objects(\n bucket=\"data_bucket\",\n aws_credentials=aws_credentials\n )\n\nexample_s3_list_objects_flow()\n
List objects under the
base_folder
.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.list_objects(\"base_folder\")\n
Upload contents from my_folder to new_folder.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.upload_from_folder(\"my_folder\", \"new_folder\")\n
Upload notes.txt to my_folder/notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.upload_from_path(\"notes.txt\", \"my_folder/notes.txt\")\n
Upload BytesIO object to my_folder/notes.txt.
from io import BytesIO\n\nfrom prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(f, \"my_folder/notes.txt\")\n
Upload BufferedReader object to my_folder/notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(\n f, \"my_folder/notes.txt\"\n )\n
"},{"location":"examples_catalog/#secrets-manager-module","title":"Secrets Manager Module","text":"
Reads a secret.
from prefect_aws.secrets_manager import SecretsManager\n\nsecrets_manager = SecretsManager.load(\"MY_BLOCK\")\nsecrets_manager.read_secret()\n
Delete a secret immediately:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import delete_secret\n\n@flow\ndef example_delete_secret_immediately():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n force_delete_without_recovery=True\n )\n\nexample_delete_secret_immediately()\n
Delete a secret with a 90 day recovery window:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import delete_secret\n\n@flow\ndef example_delete_secret_with_recovery_window():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n recovery_window_in_days=90\n )\n\nexample_delete_secret_with_recovery_window()\n
Write some secret data.
from prefect_aws.secrets_manager import SecretsManager\n\nsecrets_manager = SecretsManager.load(\"MY_BLOCK\")\nsecrets_manager.write_secret(b\"my_secret_data\")\n
Deletes the secret with a recovery window of 15 days.
from prefect_aws.secrets_manager import SecretsManager\n\nsecrets_manager = SecretsManager.load(\"MY_BLOCK\")\nsecrets_manager.delete_secret(recovery_window_in_days=15)\n
Read a secret value:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import read_secret\n\n@flow\ndef example_read_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n secret_value = read_secret(\n secret_name=\"db_password\",\n aws_credentials=aws_credentials\n )\n\nexample_read_secret()\n
Update a secret value:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import update_secret\n\n@flow\ndef example_update_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n update_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\nexample_update_secret()\n
"},{"location":"s3/","title":"S3","text":""},{"location":"s3/#prefect_aws.s3","title":"
prefect_aws.s3
","text":"
Tasks for interacting with AWS S3
"},{"location":"s3/#prefect_aws.s3-classes","title":"Classes","text":""},{"location":"s3/#prefect_aws.s3.S3Bucket","title":"
S3Bucket (WritableFileSystem, WritableDeploymentStorage, ObjectStorageBlock)
pydantic-model
","text":"
Block used to store data using AWS S3 or S3-compatible object storage like MinIO.
Attributes:
Name Type Description
bucket_name
str
Name of your bucket.
credentials
Union[prefect_aws.credentials.AwsCredentials, prefect_aws.credentials.MinIOCredentials]
A block containing your credentials to AWS or MinIO.
bucket_folder
str
A default path to a folder within the S3 bucket to use for reading and writing objects.
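A minimal sketch of configuring the block so that reads and writes are prefixed with a folder (the bucket, block, and folder names below are placeholders; assumes an AwsCredentials block has already been saved):
from prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket(\n bucket_name=\"my-bucket\",\n credentials=AwsCredentials.load(\"my-aws-creds\"),\n bucket_folder=\"data/raw\",\n)\n\n# Written to s3://my-bucket/data/raw/example.txt because of bucket_folder\ns3_bucket.write_path(\"example.txt\", content=b\"hello\")\nprint(s3_bucket.read_path(\"example.txt\"))\n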
Source code in
prefect_aws/s3.py
class S3Bucket(WritableFileSystem, WritableDeploymentStorage, ObjectStorageBlock):\n\n \"\"\"\n Block used to store data using AWS S3 or S3-compatible object storage like MinIO.\n\n Attributes:\n bucket_name: Name of your bucket.\n credentials: A block containing your credentials to AWS or MinIO.\n bucket_folder: A default path to a folder within the S3 bucket to use\n for reading and writing objects.\n \"\"\"\n\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n _block_type_name = \"S3 Bucket\"\n _documentation_url = (\n \"https://prefecthq.github.io/prefect-aws/s3/#prefect_aws.s3.S3Bucket\" # noqa\n )\n\n bucket_name: str = Field(default=..., description=\"Name of your bucket.\")\n\n credentials: Union[AwsCredentials, MinIOCredentials] = Field(\n default_factory=AwsCredentials,\n description=\"A block containing your credentials to AWS or MinIO.\",\n )\n\n bucket_folder: str = Field(\n default=\"\",\n description=(\n \"A default path to a folder within the S3 bucket to use \"\n \"for reading and writing objects.\"\n ),\n )\n\n class Config:\n smart_union = True\n\n # Property to maintain compatibility with storage block based deployments\n @property\n def basepath(self) -> str:\n \"\"\"\n The base path of the S3 bucket.\n\n Returns:\n str: The base path of the S3 bucket.\n \"\"\"\n return self.bucket_folder\n\n @basepath.setter\n def basepath(self, value: str) -> None:\n self.bucket_folder = value\n\n def _resolve_path(self, path: str) -> str:\n \"\"\"\n A helper function used in write_path to join `self.basepath` and `path`.\n\n Args:\n\n path: Name of the key, e.g. \"file1\". Each object in your\n bucket has a unique key (or key name).\n\n \"\"\"\n # If bucket_folder provided, it means we won't write to the root dir of\n # the bucket. So we need to add it on the front of the path.\n #\n # AWS object key naming guidelines require '/' for bucket folders.\n # Get POSIX path to prevent `pathlib` from inferring '\\' on Windows OS\n path = (\n (Path(self.bucket_folder) / path).as_posix() if self.bucket_folder else path\n )\n\n return path\n\n def _get_s3_client(self) -> boto3.client:\n \"\"\"\n Authenticate MinIO credentials or AWS credentials and return an S3 client.\n This is a helper function called by read_path() or write_path().\n \"\"\"\n return self.credentials.get_s3_client()\n\n def _get_bucket_resource(self) -> boto3.resource:\n \"\"\"\n Retrieves boto3 resource object for the configured bucket\n \"\"\"\n params_override = self.credentials.aws_client_parameters.get_params_override()\n bucket = (\n self.credentials.get_boto3_session()\n .resource(\"s3\", **params_override)\n .Bucket(self.bucket_name)\n )\n return bucket\n\n @sync_compatible\n async def get_directory(\n self, from_path: Optional[str] = None, local_path: Optional[str] = None\n ) -> None:\n \"\"\"\n Copies a folder from the configured S3 bucket to a local directory.\n\n Defaults to copying the entire contents of the block's basepath to the current\n working directory.\n\n Args:\n from_path: Path in S3 bucket to download from. Defaults to the block's\n configured basepath.\n local_path: Local path to download S3 contents to. 
Defaults to the current\n working directory.\n \"\"\"\n bucket_folder = self.bucket_folder\n if from_path is None:\n from_path = str(bucket_folder) if bucket_folder else \"\"\n\n if local_path is None:\n local_path = str(Path(\".\").absolute())\n else:\n local_path = str(Path(local_path).expanduser())\n\n bucket = self._get_bucket_resource()\n for obj in bucket.objects.filter(Prefix=from_path):\n if obj.key[-1] == \"/\":\n # object is a folder and will be created if it contains any objects\n continue\n target = os.path.join(\n local_path,\n os.path.relpath(obj.key, from_path),\n )\n os.makedirs(os.path.dirname(target), exist_ok=True)\n bucket.download_file(obj.key, target)\n\n @sync_compatible\n async def put_directory(\n self,\n local_path: Optional[str] = None,\n to_path: Optional[str] = None,\n ignore_file: Optional[str] = None,\n ) -> int:\n \"\"\"\n Uploads a directory from a given local path to the configured S3 bucket in a\n given folder.\n\n Defaults to uploading the entire contents the current working directory to the\n block's basepath.\n\n Args:\n local_path: Path to local directory to upload from.\n to_path: Path in S3 bucket to upload to. Defaults to block's configured\n basepath.\n ignore_file: Path to file containing gitignore style expressions for\n filepaths to ignore.\n\n \"\"\"\n to_path = \"\" if to_path is None else to_path\n\n if local_path is None:\n local_path = \".\"\n\n included_files = None\n if ignore_file:\n with open(ignore_file, \"r\") as f:\n ignore_patterns = f.readlines()\n\n included_files = filter_files(local_path, ignore_patterns)\n\n uploaded_file_count = 0\n for local_file_path in Path(local_path).expanduser().rglob(\"*\"):\n if (\n included_files is not None\n and str(local_file_path.relative_to(local_path)) not in included_files\n ):\n continue\n elif not local_file_path.is_dir():\n remote_file_path = Path(to_path) / local_file_path.relative_to(\n local_path\n )\n with open(local_file_path, \"rb\") as local_file:\n local_file_content = local_file.read()\n\n await self.write_path(\n remote_file_path.as_posix(), content=local_file_content\n )\n uploaded_file_count += 1\n\n return uploaded_file_count\n\n @sync_compatible\n async def read_path(self, path: str) -> bytes:\n \"\"\"\n Read specified path from S3 and return contents. Provide the entire\n path to the key in S3.\n\n Args:\n path: Entire path to (and including) the key.\n\n Example:\n Read \"subfolder/file1\" contents from an S3 bucket named \"bucket\":\n ```python\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import S3Bucket\n\n aws_creds = AwsCredentials(\n aws_access_key_id=AWS_ACCESS_KEY_ID,\n aws_secret_access_key=AWS_SECRET_ACCESS_KEY\n )\n\n s3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n aws_credentials=aws_creds,\n basepath=\"subfolder\"\n )\n\n key_contents = s3_bucket_block.read_path(path=\"subfolder/file1\")\n ```\n \"\"\"\n path = self._resolve_path(path)\n\n return await run_sync_in_worker_thread(self._read_sync, path)\n\n def _read_sync(self, key: str) -> bytes:\n \"\"\"\n Called by read_path(). Creates an S3 client and retrieves the\n contents from a specified path.\n \"\"\"\n\n s3_client = self._get_s3_client()\n\n with io.BytesIO() as stream:\n s3_client.download_fileobj(Bucket=self.bucket_name, Key=key, Fileobj=stream)\n stream.seek(0)\n output = stream.read()\n return output\n\n @sync_compatible\n async def write_path(self, path: str, content: bytes) -> str:\n \"\"\"\n Writes to an S3 bucket.\n\n Args:\n\n path: The key name. 
Each object in your bucket has a unique\n key (or key name).\n content: What you are uploading to S3.\n\n Example:\n\n Write data to the path `dogs/small_dogs/havanese` in an S3 Bucket:\n ```python\n from prefect_aws import MinioCredentials\n from prefect_aws.s3 import S3Bucket\n\n minio_creds = MinIOCredentials(\n minio_root_user = \"minioadmin\",\n minio_root_password = \"minioadmin\",\n )\n\n s3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n minio_credentials=minio_creds,\n basepath=\"dogs/smalldogs\",\n endpoint_url=\"http://localhost:9000\",\n )\n s3_havanese_path = s3_bucket_block.write_path(path=\"havanese\", content=data)\n ```\n \"\"\"\n\n path = self._resolve_path(path)\n\n await run_sync_in_worker_thread(self._write_sync, path, content)\n\n return path\n\n def _write_sync(self, key: str, data: bytes) -> None:\n \"\"\"\n Called by write_path(). Creates an S3 client and uploads a file\n object.\n \"\"\"\n\n s3_client = self._get_s3_client()\n\n with io.BytesIO(data) as stream:\n s3_client.upload_fileobj(Fileobj=stream, Bucket=self.bucket_name, Key=key)\n\n # NEW BLOCK INTERFACE METHODS BELOW\n @staticmethod\n def _list_objects_sync(page_iterator: PageIterator) -> List[Dict[str, Any]]:\n \"\"\"\n Synchronous method to collect S3 objects into a list\n\n Args:\n page_iterator: AWS Paginator for S3 objects\n\n Returns:\n List[Dict]: List of object information\n \"\"\"\n return [\n content for page in page_iterator for content in page.get(\"Contents\", [])\n ]\n\n def _join_bucket_folder(self, bucket_path: str = \"\") -> str:\n \"\"\"\n Joins the base bucket folder to the bucket path.\n NOTE: If a method reuses another method in this class, be careful to not\n call this twice because it'll join the bucket folder twice.\n See https://github.com/PrefectHQ/prefect-aws/issues/141 for a past issue.\n \"\"\"\n if not self.bucket_folder and not bucket_path:\n # there's a difference between \".\" and \"\", at least in the tests\n return \"\"\n\n bucket_path = str(bucket_path)\n if self.bucket_folder != \"\" and bucket_path.startswith(self.bucket_folder):\n self.logger.info(\n f\"Bucket path {bucket_path!r} is already prefixed with \"\n f\"bucket folder {self.bucket_folder!r}; is this intentional?\"\n )\n\n return (Path(self.bucket_folder) / bucket_path).as_posix() + (\n \"\" if not bucket_path.endswith(\"/\") else \"/\"\n )\n\n @sync_compatible\n async def list_objects(\n self,\n folder: str = \"\",\n delimiter: str = \"\",\n page_size: Optional[int] = None,\n max_items: Optional[int] = None,\n jmespath_query: Optional[str] = None,\n ) -> List[Dict[str, Any]]:\n \"\"\"\n Args:\n folder: Folder to list objects from.\n delimiter: Character used to group keys of listed objects.\n page_size: Number of objects to return in each request to the AWS API.\n max_items: Maximum number of objects that to be returned by task.\n jmespath_query: Query used to filter objects based on object attributes refer to\n the [boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#filtering-results-with-jmespath)\n for more information on how to construct queries.\n\n Returns:\n List of objects and their metadata in the bucket.\n\n Examples:\n List objects under the `base_folder`.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.list_objects(\"base_folder\")\n ```\n \"\"\" # noqa: E501\n bucket_path = self._join_bucket_folder(folder)\n client = self.credentials.get_s3_client()\n paginator = 
client.get_paginator(\"list_objects_v2\")\n page_iterator = paginator.paginate(\n Bucket=self.bucket_name,\n Prefix=bucket_path,\n Delimiter=delimiter,\n PaginationConfig={\"PageSize\": page_size, \"MaxItems\": max_items},\n )\n if jmespath_query:\n page_iterator = page_iterator.search(f\"{jmespath_query} | {{Contents: @}}\")\n\n self.logger.info(f\"Listing objects in bucket {bucket_path}.\")\n objects = await run_sync_in_worker_thread(\n self._list_objects_sync, page_iterator\n )\n return objects\n\n @sync_compatible\n async def download_object_to_path(\n self,\n from_path: str,\n to_path: Optional[Union[str, Path]],\n **download_kwargs: Dict[str, Any],\n ) -> Path:\n \"\"\"\n Downloads an object from the S3 bucket to a path.\n\n Args:\n from_path: The path to the object to download; this gets prefixed\n with the bucket_folder.\n to_path: The path to download the object to. If not provided, the\n object's name will be used.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_file`.\n\n Returns:\n The absolute path that the object was downloaded to.\n\n Examples:\n Download my_folder/notes.txt object to notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.download_object_to_path(\"my_folder/notes.txt\", \"notes.txt\")\n ```\n \"\"\"\n if to_path is None:\n to_path = Path(from_path).name\n\n # making path absolute, but converting back to str here\n # since !r looks nicer that way and filename arg expects str\n to_path = str(Path(to_path).absolute())\n bucket_path = self._join_bucket_folder(from_path)\n client = self.credentials.get_s3_client()\n\n self.logger.debug(\n f\"Preparing to download object from bucket {self.bucket_name!r} \"\n f\"path {bucket_path!r} to {to_path!r}.\"\n )\n await run_sync_in_worker_thread(\n client.download_file,\n Bucket=self.bucket_name,\n Key=from_path,\n Filename=to_path,\n **download_kwargs,\n )\n self.logger.info(\n f\"Downloaded object from bucket {self.bucket_name!r} path {bucket_path!r} \"\n f\"to {to_path!r}.\"\n )\n return Path(to_path)\n\n @sync_compatible\n async def download_object_to_file_object(\n self,\n from_path: str,\n to_file_object: BinaryIO,\n **download_kwargs: Dict[str, Any],\n ) -> BinaryIO:\n \"\"\"\n Downloads an object from the object storage service to a file-like object,\n which can be a BytesIO object or a BufferedWriter.\n\n Args:\n from_path: The path to the object to download from; this gets prefixed\n with the bucket_folder.\n to_file_object: The file-like object to download the object to.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_fileobj`.\n\n Returns:\n The file-like object that the object was downloaded to.\n\n Examples:\n Download my_folder/notes.txt object to a BytesIO object.\n ```python\n from io import BytesIO\n\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with BytesIO() as buf:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", buf)\n ```\n\n Download my_folder/notes.txt object to a BufferedWriter.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"wb\") as f:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", f)\n ```\n \"\"\"\n client = self.credentials.get_s3_client()\n bucket_path = self._join_bucket_folder(from_path)\n\n self.logger.debug(\n f\"Preparing to download object from bucket {self.bucket_name!r} \"\n f\"path {bucket_path!r} to file 
object.\"\n )\n await run_sync_in_worker_thread(\n client.download_fileobj,\n Bucket=self.bucket_name,\n Key=bucket_path,\n Fileobj=to_file_object,\n **download_kwargs,\n )\n self.logger.info(\n f\"Downloaded object from bucket {self.bucket_name!r} path {bucket_path!r} \"\n \"to file object.\"\n )\n return to_file_object\n\n @sync_compatible\n async def download_folder_to_path(\n self,\n from_folder: str,\n to_folder: Optional[Union[str, Path]] = None,\n **download_kwargs: Dict[str, Any],\n ) -> Path:\n \"\"\"\n Downloads objects *within* a folder (excluding the folder itself)\n from the S3 bucket to a folder.\n\n Args:\n from_folder: The path to the folder to download from.\n to_folder: The path to download the folder to.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_file`.\n\n Returns:\n The absolute path that the folder was downloaded to.\n\n Examples:\n Download my_folder to a local folder named my_folder.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.download_folder_to_path(\"my_folder\", \"my_folder\")\n ```\n \"\"\"\n if to_folder is None:\n to_folder = \"\"\n to_folder = Path(to_folder).absolute()\n\n client = self.credentials.get_s3_client()\n objects = await self.list_objects(folder=from_folder)\n\n # do not call self._join_bucket_folder for filter\n # because it's built-in to that method already!\n # however, we still need to do it because we're using relative_to\n bucket_folder = self._join_bucket_folder(from_folder)\n\n async_coros = []\n for object in objects:\n bucket_path = Path(object[\"Key\"]).relative_to(bucket_folder)\n # this skips the actual directory itself, e.g.\n # `my_folder/` will be skipped\n # `my_folder/notes.txt` will be downloaded\n if bucket_path.is_dir():\n continue\n to_path = to_folder / bucket_path\n to_path.parent.mkdir(parents=True, exist_ok=True)\n to_path = str(to_path) # must be string\n self.logger.info(\n f\"Downloading object from bucket {self.bucket_name!r} path \"\n f\"{bucket_path.as_posix()!r} to {to_path!r}.\"\n )\n async_coros.append(\n run_sync_in_worker_thread(\n client.download_file,\n Bucket=self.bucket_name,\n Key=object[\"Key\"],\n Filename=to_path,\n **download_kwargs,\n )\n )\n await asyncio.gather(*async_coros)\n\n return Path(to_folder)\n\n @sync_compatible\n async def stream_from(\n self,\n bucket: \"S3Bucket\",\n from_path: str,\n to_path: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n ) -> str:\n \"\"\"Streams an object from another bucket to this bucket. Requires the\n object to be downloaded and uploaded in chunks. If `self`'s credentials\n allow for writes to the other bucket, try using `S3Bucket.copy_object`.\n\n Args:\n bucket: The bucket to stream from.\n from_path: The path of the object to stream.\n to_path: The path to stream the object to. 
Defaults to the object's name.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Stream notes.txt from your-bucket/notes.txt to my-bucket/landed/notes.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n your_s3_bucket = S3Bucket.load(\"your-bucket\")\n my_s3_bucket = S3Bucket.load(\"my-bucket\")\n\n my_s3_bucket.stream_from(\n your_s3_bucket,\n \"notes.txt\",\n to_path=\"landed/notes.txt\"\n )\n ```\n\n \"\"\"\n if to_path is None:\n to_path = Path(from_path).name\n\n # Get the source object's StreamingBody\n from_path: str = bucket._join_bucket_folder(from_path)\n from_client = bucket.credentials.get_s3_client()\n obj = await run_sync_in_worker_thread(\n from_client.get_object, Bucket=bucket.bucket_name, Key=from_path\n )\n body: StreamingBody = obj[\"Body\"]\n\n # Upload the StreamingBody to this bucket\n bucket_path = str(self._join_bucket_folder(to_path))\n to_client = self.credentials.get_s3_client()\n await run_sync_in_worker_thread(\n to_client.upload_fileobj,\n Fileobj=body,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n f\"Streamed s3://{bucket.bucket_name}/{from_path} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n\n @sync_compatible\n async def upload_from_path(\n self,\n from_path: Union[str, Path],\n to_path: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n ) -> str:\n \"\"\"\n Uploads an object from a path to the S3 bucket.\n\n Args:\n from_path: The path to the file to upload from.\n to_path: The path to upload the file to.\n **upload_kwargs: Additional keyword arguments to pass to `Client.upload`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Upload notes.txt to my_folder/notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.upload_from_path(\"notes.txt\", \"my_folder/notes.txt\")\n ```\n \"\"\"\n from_path = str(Path(from_path).absolute())\n if to_path is None:\n to_path = Path(from_path).name\n\n bucket_path = str(self._join_bucket_folder(to_path))\n client = self.credentials.get_s3_client()\n\n await run_sync_in_worker_thread(\n client.upload_file,\n Filename=from_path,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n f\"Uploaded from {from_path!r} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n\n @sync_compatible\n async def upload_from_file_object(\n self, from_file_object: BinaryIO, to_path: str, **upload_kwargs: Dict[str, Any]\n ) -> str:\n \"\"\"\n Uploads an object to the S3 bucket from a file-like object,\n which can be a BytesIO object or a BufferedReader.\n\n Args:\n from_file_object: The file-like object to upload from.\n to_path: The path to upload the object to.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Upload BytesIO object to my_folder/notes.txt.\n ```python\n from io import BytesIO\n\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(f, \"my_folder/notes.txt\")\n ```\n\n Upload BufferedReader object to my_folder/notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", 
\"rb\") as f:\n s3_bucket.upload_from_file_object(\n f, \"my_folder/notes.txt\"\n )\n ```\n \"\"\"\n bucket_path = str(self._join_bucket_folder(to_path))\n client = self.credentials.get_s3_client()\n await run_sync_in_worker_thread(\n client.upload_fileobj,\n Fileobj=from_file_object,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n \"Uploaded from file object to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n\n @sync_compatible\n async def upload_from_folder(\n self,\n from_folder: Union[str, Path],\n to_folder: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n ) -> str:\n \"\"\"\n Uploads files *within* a folder (excluding the folder itself)\n to the object storage service folder.\n\n Args:\n from_folder: The path to the folder to upload from.\n to_folder: The path to upload the folder to.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the folder was uploaded to.\n\n Examples:\n Upload contents from my_folder to new_folder.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.upload_from_folder(\"my_folder\", \"new_folder\")\n ```\n \"\"\"\n from_folder = Path(from_folder)\n bucket_folder = self._join_bucket_folder(to_folder or \"\")\n\n num_uploaded = 0\n client = self.credentials.get_s3_client()\n\n async_coros = []\n for from_path in from_folder.rglob(\"**/*\"):\n # this skips the actual directory itself, e.g.\n # `my_folder/` will be skipped\n # `my_folder/notes.txt` will be uploaded\n if from_path.is_dir():\n continue\n bucket_path = (\n Path(bucket_folder) / from_path.relative_to(from_folder)\n ).as_posix()\n self.logger.info(\n f\"Uploading from {str(from_path)!r} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n async_coros.append(\n run_sync_in_worker_thread(\n client.upload_file,\n Filename=str(from_path),\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n )\n num_uploaded += 1\n await asyncio.gather(*async_coros)\n\n if num_uploaded == 0:\n self.logger.warning(f\"No files were uploaded from {str(from_folder)!r}.\")\n else:\n self.logger.info(\n f\"Uploaded {num_uploaded} files from {str(from_folder)!r} to \"\n f\"the bucket {self.bucket_name!r} path {bucket_path!r}\"\n )\n\n return to_folder\n\n @sync_compatible\n async def copy_object(\n self,\n from_path: Union[str, Path],\n to_path: Union[str, Path],\n to_bucket: Optional[Union[\"S3Bucket\", str]] = None,\n **copy_kwargs,\n ) -> str:\n \"\"\"Uses S3's internal\n [CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html)\n to copy objects within or between buckets. To copy objects between buckets,\n `self`'s credentials must have permission to read the source object and write\n to the target object. If the credentials do not have those permissions, try\n using `S3Bucket.stream_from`.\n\n Args:\n from_path: The path of the object to copy.\n to_path: The path to copy the object to.\n to_bucket: The bucket to copy to. Defaults to the current bucket.\n **copy_kwargs: Additional keyword arguments to pass to\n `S3Client.copy_object`.\n\n Returns:\n The path that the object was copied to. 
Excludes the bucket name.\n\n Examples:\n\n Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.copy_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n ```\n\n Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in\n another bucket.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.copy_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n )\n ```\n \"\"\"\n s3_client = self.credentials.get_s3_client()\n\n source_path = self._resolve_path(Path(from_path).as_posix())\n target_path = self._resolve_path(Path(to_path).as_posix())\n\n source_bucket_name = self.bucket_name\n target_bucket_name = self.bucket_name\n if isinstance(to_bucket, S3Bucket):\n target_bucket_name = to_bucket.bucket_name\n target_path = to_bucket._resolve_path(target_path)\n elif isinstance(to_bucket, str):\n target_bucket_name = to_bucket\n elif to_bucket is not None:\n raise TypeError(\n \"to_bucket must be a string or S3Bucket, not\"\n f\" {type(target_bucket_name)}\"\n )\n\n self.logger.info(\n \"Copying object from bucket %s with key %s to bucket %s with key %s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n s3_client.copy_object(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n **copy_kwargs,\n )\n\n return target_path\n\n @sync_compatible\n async def move_object(\n self,\n from_path: Union[str, Path],\n to_path: Union[str, Path],\n to_bucket: Optional[Union[\"S3Bucket\", str]] = None,\n ) -> str:\n \"\"\"Uses S3's internal CopyObject and DeleteObject to move objects within or\n between buckets. To move objects between buckets, `self`'s credentials must\n have permission to read and delete the source object and write to the target\n object. If the credentials do not have those permissions, this method will\n raise an error. If the credentials have permission to read the source object\n but not delete it, the object will be copied but not deleted.\n\n Args:\n from_path: The path of the object to move.\n to_path: The path to move the object to.\n to_bucket: The bucket to move to. Defaults to the current bucket.\n\n Returns:\n The path that the object was moved to. 
Excludes the bucket name.\n\n Examples:\n\n Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.move_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n ```\n\n Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in\n another bucket.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.move_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n )\n ```\n \"\"\"\n s3_client = self.credentials.get_s3_client()\n\n source_path = self._resolve_path(Path(from_path).as_posix())\n target_path = self._resolve_path(Path(to_path).as_posix())\n\n source_bucket_name = self.bucket_name\n target_bucket_name = self.bucket_name\n if isinstance(to_bucket, S3Bucket):\n target_bucket_name = to_bucket.bucket_name\n target_path = to_bucket._resolve_path(target_path)\n elif isinstance(to_bucket, str):\n target_bucket_name = to_bucket\n elif to_bucket is not None:\n raise TypeError(\n \"to_bucket must be a string or S3Bucket, not\"\n f\" {type(target_bucket_name)}\"\n )\n\n self.logger.info(\n \"Moving object from s3://%s/%s to s3://%s/%s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n # If invalid, should error and prevent next operation\n s3_client.copy(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n )\n s3_client.delete_object(Bucket=source_bucket_name, Key=source_path)\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket-attributes","title":"Attributes","text":""},{"location":"s3/#prefect_aws.s3.S3Bucket.basepath","title":"
basepath: str
property
writable
","text":"
The base path of the S3 bucket.
Returns:
Type Description
str
The base path of the S3 bucket.
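A minimal sketch of reading and assigning this writable property (the block name is a placeholder; whether the value mirrors bucket_folder is an assumption, not confirmed here):
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nprint(s3_bucket.basepath)  # read the configured base path\ns3_bucket.basepath = \"projects/etl\"  # the property is writable\n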
"},{"location":"s3/#prefect_aws.s3.S3Bucket.bucket_folder","title":"
bucket_folder: str
pydantic-field
","text":"
A default path to a folder within the S3 bucket to use for reading and writing objects.
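For illustration, a minimal sketch (the bucket, block, and folder names are placeholders) of configuring bucket_folder so that reads and writes through the block are prefixed with it:
from prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import S3Bucket\n\n# objects read and written through this block are expected to be prefixed with \"projects/etl\"\ns3_bucket = S3Bucket(\n    bucket_name=\"my-bucket\",\n    credentials=AwsCredentials.load(\"my-creds\"),\n    bucket_folder=\"projects/etl\",\n)\n\n# uploads notes.txt to projects/etl/notes.txt in my-bucket\ns3_bucket.upload_from_path(\"notes.txt\", \"notes.txt\")\n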
"},{"location":"s3/#prefect_aws.s3.S3Bucket.bucket_name","title":"
bucket_name: str
pydantic-field
required
","text":"
Name of your bucket.
"},{"location":"s3/#prefect_aws.s3.S3Bucket.credentials","title":"
credentials: Union[prefect_aws.credentials.AwsCredentials, prefect_aws.credentials.MinIOCredentials]
pydantic-field
","text":"
A block containing your credentials to AWS or MinIO.
"},{"location":"s3/#prefect_aws.s3.S3Bucket-methods","title":"Methods","text":""},{"location":"s3/#prefect_aws.s3.S3Bucket.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"s3/#prefect_aws.s3.S3Bucket.copy_object","title":"
copy_object
async
","text":"
Uses S3's internal CopyObject to copy objects within or between buckets. To copy objects between buckets, self
's credentials must have permission to read the source object and write to the target object. If the credentials do not have those permissions, try using S3Bucket.stream_from
.
Parameters:
Name Type Description Default
from_path
Union[str, pathlib.Path]
The path of the object to copy.
required
to_path
Union[str, pathlib.Path]
The path to copy the object to.
required
to_bucket
Union[S3Bucket, str]
The bucket to copy to. Defaults to the current bucket.
None
**copy_kwargs
Additional keyword arguments to pass to S3Client.copy_object
.
{}
Returns:
Type Description
str
The path that the object was copied to. Excludes the bucket name.
Examples:
Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.copy_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n
Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in another bucket.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.copy_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def copy_object(\n self,\n from_path: Union[str, Path],\n to_path: Union[str, Path],\n to_bucket: Optional[Union[\"S3Bucket\", str]] = None,\n **copy_kwargs,\n) -> str:\n \"\"\"Uses S3's internal\n [CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html)\n to copy objects within or between buckets. To copy objects between buckets,\n `self`'s credentials must have permission to read the source object and write\n to the target object. If the credentials do not have those permissions, try\n using `S3Bucket.stream_from`.\n\n Args:\n from_path: The path of the object to copy.\n to_path: The path to copy the object to.\n to_bucket: The bucket to copy to. Defaults to the current bucket.\n **copy_kwargs: Additional keyword arguments to pass to\n `S3Client.copy_object`.\n\n Returns:\n The path that the object was copied to. Excludes the bucket name.\n\n Examples:\n\n Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.copy_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n ```\n\n Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in\n another bucket.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.copy_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n )\n ```\n \"\"\"\n s3_client = self.credentials.get_s3_client()\n\n source_path = self._resolve_path(Path(from_path).as_posix())\n target_path = self._resolve_path(Path(to_path).as_posix())\n\n source_bucket_name = self.bucket_name\n target_bucket_name = self.bucket_name\n if isinstance(to_bucket, S3Bucket):\n target_bucket_name = to_bucket.bucket_name\n target_path = to_bucket._resolve_path(target_path)\n elif isinstance(to_bucket, str):\n target_bucket_name = to_bucket\n elif to_bucket is not None:\n raise TypeError(\n \"to_bucket must be a string or S3Bucket, not\"\n f\" {type(target_bucket_name)}\"\n )\n\n self.logger.info(\n \"Copying object from bucket %s with key %s to bucket %s with key %s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n s3_client.copy_object(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n **copy_kwargs,\n )\n\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.download_folder_to_path","title":"
download_folder_to_path
async
","text":"
Downloads objects within a folder (excluding the folder itself) from the S3 bucket to a folder.
Parameters:
Name Type Description Default
from_folder
str
The path to the folder to download from.
required
to_folder
Union[str, pathlib.Path]
The path to download the folder to.
None
**download_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.download_file
.
{}
Returns:
Type Description
Path
The absolute path that the folder was downloaded to.
Examples:
Download my_folder to a local folder named my_folder.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.download_folder_to_path(\"my_folder\", \"my_folder\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def download_folder_to_path(\n self,\n from_folder: str,\n to_folder: Optional[Union[str, Path]] = None,\n **download_kwargs: Dict[str, Any],\n) -> Path:\n \"\"\"\n Downloads objects *within* a folder (excluding the folder itself)\n from the S3 bucket to a folder.\n\n Args:\n from_folder: The path to the folder to download from.\n to_folder: The path to download the folder to.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_file`.\n\n Returns:\n The absolute path that the folder was downloaded to.\n\n Examples:\n Download my_folder to a local folder named my_folder.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.download_folder_to_path(\"my_folder\", \"my_folder\")\n ```\n \"\"\"\n if to_folder is None:\n to_folder = \"\"\n to_folder = Path(to_folder).absolute()\n\n client = self.credentials.get_s3_client()\n objects = await self.list_objects(folder=from_folder)\n\n # do not call self._join_bucket_folder for filter\n # because it's built-in to that method already!\n # however, we still need to do it because we're using relative_to\n bucket_folder = self._join_bucket_folder(from_folder)\n\n async_coros = []\n for object in objects:\n bucket_path = Path(object[\"Key\"]).relative_to(bucket_folder)\n # this skips the actual directory itself, e.g.\n # `my_folder/` will be skipped\n # `my_folder/notes.txt` will be downloaded\n if bucket_path.is_dir():\n continue\n to_path = to_folder / bucket_path\n to_path.parent.mkdir(parents=True, exist_ok=True)\n to_path = str(to_path) # must be string\n self.logger.info(\n f\"Downloading object from bucket {self.bucket_name!r} path \"\n f\"{bucket_path.as_posix()!r} to {to_path!r}.\"\n )\n async_coros.append(\n run_sync_in_worker_thread(\n client.download_file,\n Bucket=self.bucket_name,\n Key=object[\"Key\"],\n Filename=to_path,\n **download_kwargs,\n )\n )\n await asyncio.gather(*async_coros)\n\n return Path(to_folder)\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.download_object_to_file_object","title":"
download_object_to_file_object
async
","text":"
Downloads an object from the object storage service to a file-like object, which can be a BytesIO object or a BufferedWriter.
Parameters:
Name Type Description Default
from_path
str
The path to the object to download from; this gets prefixed with the bucket_folder.
required
to_file_object
BinaryIO
The file-like object to download the object to.
required
**download_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.download_fileobj
.
{}
Returns:
Type Description
BinaryIO
The file-like object that the object was downloaded to.
Examples:
Download my_folder/notes.txt object to a BytesIO object.
from io import BytesIO\n\nfrom prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith BytesIO() as buf:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", buf)\n
Download my_folder/notes.txt object to a BufferedWriter.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"wb\") as f:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", f)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def download_object_to_file_object(\n self,\n from_path: str,\n to_file_object: BinaryIO,\n **download_kwargs: Dict[str, Any],\n) -> BinaryIO:\n \"\"\"\n Downloads an object from the object storage service to a file-like object,\n which can be a BytesIO object or a BufferedWriter.\n\n Args:\n from_path: The path to the object to download from; this gets prefixed\n with the bucket_folder.\n to_file_object: The file-like object to download the object to.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_fileobj`.\n\n Returns:\n The file-like object that the object was downloaded to.\n\n Examples:\n Download my_folder/notes.txt object to a BytesIO object.\n ```python\n from io import BytesIO\n\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with BytesIO() as buf:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", buf)\n ```\n\n Download my_folder/notes.txt object to a BufferedWriter.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"wb\") as f:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", f)\n ```\n \"\"\"\n client = self.credentials.get_s3_client()\n bucket_path = self._join_bucket_folder(from_path)\n\n self.logger.debug(\n f\"Preparing to download object from bucket {self.bucket_name!r} \"\n f\"path {bucket_path!r} to file object.\"\n )\n await run_sync_in_worker_thread(\n client.download_fileobj,\n Bucket=self.bucket_name,\n Key=bucket_path,\n Fileobj=to_file_object,\n **download_kwargs,\n )\n self.logger.info(\n f\"Downloaded object from bucket {self.bucket_name!r} path {bucket_path!r} \"\n \"to file object.\"\n )\n return to_file_object\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.download_object_to_path","title":"
download_object_to_path
async
","text":"
Downloads an object from the S3 bucket to a path.
Parameters:
Name Type Description Default
from_path
str
The path to the object to download; this gets prefixed with the bucket_folder.
required
to_path
Union[str, pathlib.Path]
The path to download the object to. If not provided, the object's name will be used.
required
**download_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.download_file
.
{}
Returns:
Type Description
Path
The absolute path that the object was downloaded to.
Examples:
Download my_folder/notes.txt object to notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.download_object_to_path(\"my_folder/notes.txt\", \"notes.txt\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def download_object_to_path(\n self,\n from_path: str,\n to_path: Optional[Union[str, Path]],\n **download_kwargs: Dict[str, Any],\n) -> Path:\n \"\"\"\n Downloads an object from the S3 bucket to a path.\n\n Args:\n from_path: The path to the object to download; this gets prefixed\n with the bucket_folder.\n to_path: The path to download the object to. If not provided, the\n object's name will be used.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_file`.\n\n Returns:\n The absolute path that the object was downloaded to.\n\n Examples:\n Download my_folder/notes.txt object to notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.download_object_to_path(\"my_folder/notes.txt\", \"notes.txt\")\n ```\n \"\"\"\n if to_path is None:\n to_path = Path(from_path).name\n\n # making path absolute, but converting back to str here\n # since !r looks nicer that way and filename arg expects str\n to_path = str(Path(to_path).absolute())\n bucket_path = self._join_bucket_folder(from_path)\n client = self.credentials.get_s3_client()\n\n self.logger.debug(\n f\"Preparing to download object from bucket {self.bucket_name!r} \"\n f\"path {bucket_path!r} to {to_path!r}.\"\n )\n await run_sync_in_worker_thread(\n client.download_file,\n Bucket=self.bucket_name,\n Key=from_path,\n Filename=to_path,\n **download_kwargs,\n )\n self.logger.info(\n f\"Downloaded object from bucket {self.bucket_name!r} path {bucket_path!r} \"\n f\"to {to_path!r}.\"\n )\n return Path(to_path)\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.get_directory","title":"
get_directory
async
","text":"
Copies a folder from the configured S3 bucket to a local directory.
Defaults to copying the entire contents of the block's basepath to the current working directory.
Parameters:
Name Type Description Default
from_path
Optional[str]
Path in S3 bucket to download from. Defaults to the block's configured basepath.
None
local_path
Optional[str]
Local path to download S3 contents to. Defaults to the current working directory.
None
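Examples:
Download the contents of my_folder in the bucket to a local folder (a minimal sketch; the paths are placeholders):
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.get_directory(from_path=\"my_folder\", local_path=\"local_folder\")\n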
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def get_directory(\n self, from_path: Optional[str] = None, local_path: Optional[str] = None\n) -> None:\n \"\"\"\n Copies a folder from the configured S3 bucket to a local directory.\n\n Defaults to copying the entire contents of the block's basepath to the current\n working directory.\n\n Args:\n from_path: Path in S3 bucket to download from. Defaults to the block's\n configured basepath.\n local_path: Local path to download S3 contents to. Defaults to the current\n working directory.\n \"\"\"\n bucket_folder = self.bucket_folder\n if from_path is None:\n from_path = str(bucket_folder) if bucket_folder else \"\"\n\n if local_path is None:\n local_path = str(Path(\".\").absolute())\n else:\n local_path = str(Path(local_path).expanduser())\n\n bucket = self._get_bucket_resource()\n for obj in bucket.objects.filter(Prefix=from_path):\n if obj.key[-1] == \"/\":\n # object is a folder and will be created if it contains any objects\n continue\n target = os.path.join(\n local_path,\n os.path.relpath(obj.key, from_path),\n )\n os.makedirs(os.path.dirname(target), exist_ok=True)\n bucket.download_file(obj.key, target)\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.list_objects","title":"
list_objects
async
","text":"
Parameters:
Name Type Description Default
folder
str
Folder to list objects from.
''
delimiter
str
Character used to group keys of listed objects.
''
page_size
Optional[int]
Number of objects to return in each request to the AWS API.
None
max_items
Optional[int]
Maximum number of objects to be returned by the task.
None
jmespath_query
Optional[str]
Query used to filter objects based on object attributes; refer to the boto3 docs for more information on how to construct queries.
None
Returns:
Type Description
List[Dict[str, Any]]
List of objects and their metadata in the bucket.
Examples:
List objects under the base_folder
.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.list_objects(\"base_folder\")\n
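List objects under the base_folder that are larger than 100 bytes (an illustrative jmespath_query; see the boto3 paginator docs for query syntax):
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.list_objects(\n    \"base_folder\",\n    jmespath_query=\"Contents[?Size > `100`][]\",\n)\n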
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def list_objects(\n self,\n folder: str = \"\",\n delimiter: str = \"\",\n page_size: Optional[int] = None,\n max_items: Optional[int] = None,\n jmespath_query: Optional[str] = None,\n) -> List[Dict[str, Any]]:\n \"\"\"\n Args:\n folder: Folder to list objects from.\n delimiter: Character used to group keys of listed objects.\n page_size: Number of objects to return in each request to the AWS API.\n max_items: Maximum number of objects that to be returned by task.\n jmespath_query: Query used to filter objects based on object attributes refer to\n the [boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#filtering-results-with-jmespath)\n for more information on how to construct queries.\n\n Returns:\n List of objects and their metadata in the bucket.\n\n Examples:\n List objects under the `base_folder`.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.list_objects(\"base_folder\")\n ```\n \"\"\" # noqa: E501\n bucket_path = self._join_bucket_folder(folder)\n client = self.credentials.get_s3_client()\n paginator = client.get_paginator(\"list_objects_v2\")\n page_iterator = paginator.paginate(\n Bucket=self.bucket_name,\n Prefix=bucket_path,\n Delimiter=delimiter,\n PaginationConfig={\"PageSize\": page_size, \"MaxItems\": max_items},\n )\n if jmespath_query:\n page_iterator = page_iterator.search(f\"{jmespath_query} | {{Contents: @}}\")\n\n self.logger.info(f\"Listing objects in bucket {bucket_path}.\")\n objects = await run_sync_in_worker_thread(\n self._list_objects_sync, page_iterator\n )\n return objects\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.move_object","title":"
move_object
async
","text":"
Uses S3's internal CopyObject and DeleteObject to move objects within or between buckets. To move objects between buckets, self
's credentials must have permission to read and delete the source object and write to the target object. If the credentials do not have those permissions, this method will raise an error. If the credentials have permission to read the source object but not delete it, the object will be copied but not deleted.
Parameters:
Name Type Description Default
from_path
Union[str, pathlib.Path]
The path of the object to move.
required
to_path
Union[str, pathlib.Path]
The path to move the object to.
required
to_bucket
Union[S3Bucket, str]
The bucket to move to. Defaults to the current bucket.
None
Returns:
Type Description
str
The path that the object was moved to. Excludes the bucket name.
Examples:
Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.move_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n
Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in another bucket.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.move_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def move_object(\n self,\n from_path: Union[str, Path],\n to_path: Union[str, Path],\n to_bucket: Optional[Union[\"S3Bucket\", str]] = None,\n) -> str:\n \"\"\"Uses S3's internal CopyObject and DeleteObject to move objects within or\n between buckets. To move objects between buckets, `self`'s credentials must\n have permission to read and delete the source object and write to the target\n object. If the credentials do not have those permissions, this method will\n raise an error. If the credentials have permission to read the source object\n but not delete it, the object will be copied but not deleted.\n\n Args:\n from_path: The path of the object to move.\n to_path: The path to move the object to.\n to_bucket: The bucket to move to. Defaults to the current bucket.\n\n Returns:\n The path that the object was moved to. Excludes the bucket name.\n\n Examples:\n\n Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.move_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n ```\n\n Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in\n another bucket.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.move_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n )\n ```\n \"\"\"\n s3_client = self.credentials.get_s3_client()\n\n source_path = self._resolve_path(Path(from_path).as_posix())\n target_path = self._resolve_path(Path(to_path).as_posix())\n\n source_bucket_name = self.bucket_name\n target_bucket_name = self.bucket_name\n if isinstance(to_bucket, S3Bucket):\n target_bucket_name = to_bucket.bucket_name\n target_path = to_bucket._resolve_path(target_path)\n elif isinstance(to_bucket, str):\n target_bucket_name = to_bucket\n elif to_bucket is not None:\n raise TypeError(\n \"to_bucket must be a string or S3Bucket, not\"\n f\" {type(target_bucket_name)}\"\n )\n\n self.logger.info(\n \"Moving object from s3://%s/%s to s3://%s/%s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n # If invalid, should error and prevent next operation\n s3_client.copy(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n )\n s3_client.delete_object(Bucket=source_bucket_name, Key=source_path)\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.put_directory","title":"
put_directory
async
","text":"
Uploads a directory from a given local path to the configured S3 bucket in a given folder.
Defaults to uploading the entire contents of the current working directory to the block's basepath.
Parameters:
Name Type Description Default
local_path
Optional[str]
Path to local directory to upload from.
None
to_path
Optional[str]
Path in S3 bucket to upload to. Defaults to block's configured basepath.
None
ignore_file
Optional[str]
Path to file containing gitignore style expressions for filepaths to ignore.
None
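Examples:
Upload the contents of a local folder to a folder under the block's basepath (a minimal sketch; the paths are placeholders):
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.put_directory(local_path=\"local_folder\", to_path=\"remote_folder\")\n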
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def put_directory(\n self,\n local_path: Optional[str] = None,\n to_path: Optional[str] = None,\n ignore_file: Optional[str] = None,\n) -> int:\n \"\"\"\n Uploads a directory from a given local path to the configured S3 bucket in a\n given folder.\n\n Defaults to uploading the entire contents the current working directory to the\n block's basepath.\n\n Args:\n local_path: Path to local directory to upload from.\n to_path: Path in S3 bucket to upload to. Defaults to block's configured\n basepath.\n ignore_file: Path to file containing gitignore style expressions for\n filepaths to ignore.\n\n \"\"\"\n to_path = \"\" if to_path is None else to_path\n\n if local_path is None:\n local_path = \".\"\n\n included_files = None\n if ignore_file:\n with open(ignore_file, \"r\") as f:\n ignore_patterns = f.readlines()\n\n included_files = filter_files(local_path, ignore_patterns)\n\n uploaded_file_count = 0\n for local_file_path in Path(local_path).expanduser().rglob(\"*\"):\n if (\n included_files is not None\n and str(local_file_path.relative_to(local_path)) not in included_files\n ):\n continue\n elif not local_file_path.is_dir():\n remote_file_path = Path(to_path) / local_file_path.relative_to(\n local_path\n )\n with open(local_file_path, \"rb\") as local_file:\n local_file_content = local_file.read()\n\n await self.write_path(\n remote_file_path.as_posix(), content=local_file_content\n )\n uploaded_file_count += 1\n\n return uploaded_file_count\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.read_path","title":"
read_path
async
","text":"
Read specified path from S3 and return contents. Provide the entire path to the key in S3.
Parameters:
Name Type Description Default
path
str
Entire path to (and including) the key.
required
Examples:
Read \"subfolder/file1\" contents from an S3 bucket named \"bucket\":
from prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import S3Bucket\n\naws_creds = AwsCredentials(\n aws_access_key_id=AWS_ACCESS_KEY_ID,\n aws_secret_access_key=AWS_SECRET_ACCESS_KEY\n)\n\ns3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n aws_credentials=aws_creds,\n basepath=\"subfolder\"\n)\n\nkey_contents = s3_bucket_block.read_path(path=\"subfolder/file1\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def read_path(self, path: str) -> bytes:\n \"\"\"\n Read specified path from S3 and return contents. Provide the entire\n path to the key in S3.\n\n Args:\n path: Entire path to (and including) the key.\n\n Example:\n Read \"subfolder/file1\" contents from an S3 bucket named \"bucket\":\n ```python\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import S3Bucket\n\n aws_creds = AwsCredentials(\n aws_access_key_id=AWS_ACCESS_KEY_ID,\n aws_secret_access_key=AWS_SECRET_ACCESS_KEY\n )\n\n s3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n aws_credentials=aws_creds,\n basepath=\"subfolder\"\n )\n\n key_contents = s3_bucket_block.read_path(path=\"subfolder/file1\")\n ```\n \"\"\"\n path = self._resolve_path(path)\n\n return await run_sync_in_worker_thread(self._read_sync, path)\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.stream_from","title":"
stream_from
async
","text":"
Streams an object from another bucket to this bucket. Requires the object to be downloaded and uploaded in chunks. If self
's credentials allow for writes to the other bucket, try using S3Bucket.copy_object
.
Parameters:
Name Type Description Default
bucket
S3Bucket
The bucket to stream from.
required
from_path
str
The path of the object to stream.
required
to_path
Optional[str]
The path to stream the object to. Defaults to the object's name.
None
**upload_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.upload_fileobj
.
{}
Returns:
Type Description
str
The path that the object was uploaded to.
Examples:
Stream notes.txt from your-bucket/notes.txt to my-bucket/landed/notes.txt.
from prefect_aws.s3 import S3Bucket\n\nyour_s3_bucket = S3Bucket.load(\"your-bucket\")\nmy_s3_bucket = S3Bucket.load(\"my-bucket\")\n\nmy_s3_bucket.stream_from(\n your_s3_bucket,\n \"notes.txt\",\n to_path=\"landed/notes.txt\"\n)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def stream_from(\n self,\n bucket: \"S3Bucket\",\n from_path: str,\n to_path: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n) -> str:\n \"\"\"Streams an object from another bucket to this bucket. Requires the\n object to be downloaded and uploaded in chunks. If `self`'s credentials\n allow for writes to the other bucket, try using `S3Bucket.copy_object`.\n\n Args:\n bucket: The bucket to stream from.\n from_path: The path of the object to stream.\n to_path: The path to stream the object to. Defaults to the object's name.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Stream notes.txt from your-bucket/notes.txt to my-bucket/landed/notes.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n your_s3_bucket = S3Bucket.load(\"your-bucket\")\n my_s3_bucket = S3Bucket.load(\"my-bucket\")\n\n my_s3_bucket.stream_from(\n your_s3_bucket,\n \"notes.txt\",\n to_path=\"landed/notes.txt\"\n )\n ```\n\n \"\"\"\n if to_path is None:\n to_path = Path(from_path).name\n\n # Get the source object's StreamingBody\n from_path: str = bucket._join_bucket_folder(from_path)\n from_client = bucket.credentials.get_s3_client()\n obj = await run_sync_in_worker_thread(\n from_client.get_object, Bucket=bucket.bucket_name, Key=from_path\n )\n body: StreamingBody = obj[\"Body\"]\n\n # Upload the StreamingBody to this bucket\n bucket_path = str(self._join_bucket_folder(to_path))\n to_client = self.credentials.get_s3_client()\n await run_sync_in_worker_thread(\n to_client.upload_fileobj,\n Fileobj=body,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n f\"Streamed s3://{bucket.bucket_name}/{from_path} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.upload_from_file_object","title":"
upload_from_file_object
async
","text":"
Uploads an object to the S3 bucket from a file-like object, which can be a BytesIO object or a BufferedReader.
Parameters:
Name Type Description Default
from_file_object
BinaryIO
The file-like object to upload from.
required
to_path
str
The path to upload the object to.
required
**upload_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.upload_fileobj
.
{}
Returns:
Type Description
str
The path that the object was uploaded to.
Examples:
Upload BytesIO object to my_folder/notes.txt.
from io import BytesIO\n\nfrom prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith BytesIO(b\"hello\") as buf:\n    s3_bucket.upload_from_file_object(buf, \"my_folder/notes.txt\")\n
Upload BufferedReader object to my_folder/notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(\n f, \"my_folder/notes.txt\"\n )\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def upload_from_file_object(\n self, from_file_object: BinaryIO, to_path: str, **upload_kwargs: Dict[str, Any]\n) -> str:\n \"\"\"\n Uploads an object to the S3 bucket from a file-like object,\n which can be a BytesIO object or a BufferedReader.\n\n Args:\n from_file_object: The file-like object to upload from.\n to_path: The path to upload the object to.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Upload BytesIO object to my_folder/notes.txt.\n ```python\n from io import BytesIO\n\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(f, \"my_folder/notes.txt\")\n ```\n\n Upload BufferedReader object to my_folder/notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(\n f, \"my_folder/notes.txt\"\n )\n ```\n \"\"\"\n bucket_path = str(self._join_bucket_folder(to_path))\n client = self.credentials.get_s3_client()\n await run_sync_in_worker_thread(\n client.upload_fileobj,\n Fileobj=from_file_object,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n \"Uploaded from file object to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.upload_from_folder","title":"
upload_from_folder
async
","text":"
Uploads files within a folder (excluding the folder itself) to the object storage service folder.
Parameters:
Name Type Description Default
from_folder
Union[str, pathlib.Path]
The path to the folder to upload from.
required
to_folder
Optional[str]
The path to upload the folder to.
None
**upload_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.upload_fileobj
.
{}
Returns:
Type Description
str
The path that the folder was uploaded to.
Examples:
Upload contents from my_folder to new_folder.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.upload_from_folder(\"my_folder\", \"new_folder\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def upload_from_folder(\n self,\n from_folder: Union[str, Path],\n to_folder: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n) -> str:\n \"\"\"\n Uploads files *within* a folder (excluding the folder itself)\n to the object storage service folder.\n\n Args:\n from_folder: The path to the folder to upload from.\n to_folder: The path to upload the folder to.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the folder was uploaded to.\n\n Examples:\n Upload contents from my_folder to new_folder.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.upload_from_folder(\"my_folder\", \"new_folder\")\n ```\n \"\"\"\n from_folder = Path(from_folder)\n bucket_folder = self._join_bucket_folder(to_folder or \"\")\n\n num_uploaded = 0\n client = self.credentials.get_s3_client()\n\n async_coros = []\n for from_path in from_folder.rglob(\"**/*\"):\n # this skips the actual directory itself, e.g.\n # `my_folder/` will be skipped\n # `my_folder/notes.txt` will be uploaded\n if from_path.is_dir():\n continue\n bucket_path = (\n Path(bucket_folder) / from_path.relative_to(from_folder)\n ).as_posix()\n self.logger.info(\n f\"Uploading from {str(from_path)!r} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n async_coros.append(\n run_sync_in_worker_thread(\n client.upload_file,\n Filename=str(from_path),\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n )\n num_uploaded += 1\n await asyncio.gather(*async_coros)\n\n if num_uploaded == 0:\n self.logger.warning(f\"No files were uploaded from {str(from_folder)!r}.\")\n else:\n self.logger.info(\n f\"Uploaded {num_uploaded} files from {str(from_folder)!r} to \"\n f\"the bucket {self.bucket_name!r} path {bucket_path!r}\"\n )\n\n return to_folder\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.upload_from_path","title":"
upload_from_path
async
","text":"
Uploads an object from a path to the S3 bucket.
Parameters:
Name Type Description Default
from_path
Union[str, pathlib.Path]
The path to the file to upload from.
required
to_path
Optional[str]
The path to upload the file to.
None
**upload_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.upload_file
.
{}
Returns:
Type Description
str
The path that the object was uploaded to.
Examples:
Upload notes.txt to my_folder/notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.upload_from_path(\"notes.txt\", \"my_folder/notes.txt\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def upload_from_path(\n self,\n from_path: Union[str, Path],\n to_path: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n) -> str:\n \"\"\"\n Uploads an object from a path to the S3 bucket.\n\n Args:\n from_path: The path to the file to upload from.\n to_path: The path to upload the file to.\n **upload_kwargs: Additional keyword arguments to pass to `Client.upload`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Upload notes.txt to my_folder/notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.upload_from_path(\"notes.txt\", \"my_folder/notes.txt\")\n ```\n \"\"\"\n from_path = str(Path(from_path).absolute())\n if to_path is None:\n to_path = Path(from_path).name\n\n bucket_path = str(self._join_bucket_folder(to_path))\n client = self.credentials.get_s3_client()\n\n await run_sync_in_worker_thread(\n client.upload_file,\n Filename=from_path,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n f\"Uploaded from {from_path!r} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.write_path","title":"
write_path
async
","text":"
Writes to an S3 bucket.
Parameters:
Name Type Description Default
path
str
The key name. Each object in your bucket has a unique key (or key name).
required
content
bytes
What you are uploading to S3.
required
Examples:
Write data to the path dogs/small_dogs/havanese
in an S3 Bucket:
from prefect_aws import MinIOCredentials\nfrom prefect_aws.s3 import S3Bucket\n\nminio_creds = MinIOCredentials(\n    minio_root_user = \"minioadmin\",\n    minio_root_password = \"minioadmin\",\n)\n\ns3_bucket_block = S3Bucket(\n    bucket_name=\"bucket\",\n    minio_credentials=minio_creds,\n    basepath=\"dogs/smalldogs\",\n    endpoint_url=\"http://localhost:9000\",\n)\ns3_havanese_path = s3_bucket_block.write_path(path=\"havanese\", content=data)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def write_path(self, path: str, content: bytes) -> str:\n \"\"\"\n Writes to an S3 bucket.\n\n Args:\n\n path: The key name. Each object in your bucket has a unique\n key (or key name).\n content: What you are uploading to S3.\n\n Example:\n\n Write data to the path `dogs/small_dogs/havanese` in an S3 Bucket:\n ```python\n from prefect_aws import MinioCredentials\n from prefect_aws.s3 import S3Bucket\n\n minio_creds = MinIOCredentials(\n minio_root_user = \"minioadmin\",\n minio_root_password = \"minioadmin\",\n )\n\n s3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n minio_credentials=minio_creds,\n basepath=\"dogs/smalldogs\",\n endpoint_url=\"http://localhost:9000\",\n )\n s3_havanese_path = s3_bucket_block.write_path(path=\"havanese\", content=data)\n ```\n \"\"\"\n\n path = self._resolve_path(path)\n\n await run_sync_in_worker_thread(self._write_sync, path, content)\n\n return path\n
"},{"location":"s3/#prefect_aws.s3-functions","title":"Functions","text":""},{"location":"s3/#prefect_aws.s3.s3_copy","title":"
s3_copy
async
","text":"
Uses S3's internal CopyObject to copy objects within or between buckets. To copy objects between buckets, the credentials must have permission to read the source object and write to the target object. If the credentials do not have those permissions, try using S3Bucket.stream_from
.
Parameters:
Name Type Description Default
source_path
str
The path to the object to copy. Can be a string or Path
.
required
target_path
str
The path to copy the object to. Can be a string or Path
.
required
source_bucket_name
str
The bucket to copy the object from.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
target_bucket_name
Optional[str]
The bucket to copy the object to. If not provided, defaults to source_bucket_name
.
None
**copy_kwargs
Additional keyword arguments to pass to S3Client.copy_object
.
{}
Returns:
Type Description
str
The path that the object was copied to. Excludes the bucket name.
Examples:
Copy notes.txt from s3://my-bucket/my_folder/notes.txt to s3://my-bucket/my_folder/notes_copy.txt.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_copy\n\naws_credentials = AwsCredentials.load(\"my-creds\")\n\n@flow\nasync def example_copy_flow():\n await s3_copy(\n source_path=\"my_folder/notes.txt\",\n target_path=\"my_folder/notes_copy.txt\",\n source_bucket_name=\"my-bucket\",\n aws_credentials=aws_credentials,\n )\n\nexample_copy_flow()\n
Copy notes.txt from s3://my-bucket/my_folder/notes.txt to s3://other-bucket/notes_copy.txt.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_copy\n\naws_credentials = AwsCredentials.load(\"shared-creds\")\n\n@flow\nasync def example_copy_flow():\n await s3_copy(\n source_path=\"my_folder/notes.txt\",\n target_path=\"notes_copy.txt\",\n source_bucket_name=\"my-bucket\",\n aws_credentials=aws_credentials,\n target_bucket_name=\"other-bucket\",\n )\n\nexample_copy_flow()\n
Source code in
prefect_aws/s3.py
@task\nasync def s3_copy(\n source_path: str,\n target_path: str,\n source_bucket_name: str,\n aws_credentials: AwsCredentials,\n target_bucket_name: Optional[str] = None,\n **copy_kwargs,\n) -> str:\n \"\"\"Uses S3's internal\n [CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html)\n to copy objects within or between buckets. To copy objects between buckets, the\n credentials must have permission to read the source object and write to the target\n object. If the credentials do not have those permissions, try using\n `S3Bucket.stream_from`.\n\n Args:\n source_path: The path to the object to copy. Can be a string or `Path`.\n target_path: The path to copy the object to. Can be a string or `Path`.\n source_bucket_name: The bucket to copy the object from.\n aws_credentials: Credentials to use for authentication with AWS.\n target_bucket_name: The bucket to copy the object to. If not provided, defaults\n to `source_bucket`.\n **copy_kwargs: Additional keyword arguments to pass to `S3Client.copy_object`.\n\n Returns:\n The path that the object was copied to. Excludes the bucket name.\n\n Examples:\n\n Copy notes.txt from s3://my-bucket/my_folder/notes.txt to\n s3://my-bucket/my_folder/notes_copy.txt.\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_copy\n\n aws_credentials = AwsCredentials.load(\"my-creds\")\n\n @flow\n async def example_copy_flow():\n await s3_copy(\n source_path=\"my_folder/notes.txt\",\n target_path=\"my_folder/notes_copy.txt\",\n source_bucket_name=\"my-bucket\",\n aws_credentials=aws_credentials,\n )\n\n example_copy_flow()\n ```\n\n Copy notes.txt from s3://my-bucket/my_folder/notes.txt to\n s3://other-bucket/notes_copy.txt.\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_copy\n\n aws_credentials = AwsCredentials.load(\"shared-creds\")\n\n @flow\n async def example_copy_flow():\n await s3_copy(\n source_path=\"my_folder/notes.txt\",\n target_path=\"notes_copy.txt\",\n source_bucket_name=\"my-bucket\",\n aws_credentials=aws_credentials,\n target_bucket_name=\"other-bucket\",\n )\n\n example_copy_flow()\n ```\n\n \"\"\"\n logger = get_run_logger()\n\n s3_client = aws_credentials.get_s3_client()\n\n target_bucket_name = target_bucket_name or source_bucket_name\n\n logger.info(\n \"Copying object from bucket %s with key %s to bucket %s with key %s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n s3_client.copy_object(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n **copy_kwargs,\n )\n\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.s3_download","title":"
s3_download
async
","text":"
Downloads an object with a given key from a given S3 bucket.
Parameters:
Name Type Description Default
bucket
str
Name of bucket to download object from. Required if a default value was not supplied when creating the task.
required
key
str
Key of object to download. Required if a default value was not supplied when creating the task.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
aws_client_parameters
AwsClientParameters
Custom parameter for the boto3 client initialization.
AwsClientParameters(api_version=None, use_ssl=True, verify=True, verify_cert_path=None, endpoint_url=None, config=None)
Returns:
Type Description
bytes
A bytes
representation of the downloaded object.
Examples:
Download a file from an S3 bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_download\n\n\n@flow\nasync def example_s3_download_flow():\n    aws_credentials = AwsCredentials(\n        aws_access_key_id=\"access_key_id\",\n        aws_secret_access_key=\"secret_access_key\"\n    )\n    data = await s3_download(\n        bucket=\"bucket\",\n        key=\"key\",\n        aws_credentials=aws_credentials,\n    )\n\nexample_s3_download_flow()\n
Source code in
prefect_aws/s3.py
@task\nasync def s3_download(\n bucket: str,\n key: str,\n aws_credentials: AwsCredentials,\n aws_client_parameters: AwsClientParameters = AwsClientParameters(),\n) -> bytes:\n \"\"\"\n Downloads an object with a given key from a given S3 bucket.\n\n Args:\n bucket: Name of bucket to download object from. Required if a default value was\n not supplied when creating the task.\n key: Key of object to download. Required if a default value was not supplied\n when creating the task.\n aws_credentials: Credentials to use for authentication with AWS.\n aws_client_parameters: Custom parameter for the boto3 client initialization.\n\n\n Returns:\n A `bytes` representation of the downloaded object.\n\n Example:\n Download a file from an S3 bucket:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_download\n\n\n @flow\n async def example_s3_download_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n data = await s3_download(\n bucket=\"bucket\",\n key=\"key\",\n aws_credentials=aws_credentials,\n )\n\n example_s3_download_flow()\n ```\n \"\"\"\n logger = get_run_logger()\n logger.info(\"Downloading object from bucket %s with key %s\", bucket, key)\n\n s3_client = aws_credentials.get_boto3_session().client(\n \"s3\", **aws_client_parameters.get_params_override()\n )\n stream = io.BytesIO()\n await run_sync_in_worker_thread(\n s3_client.download_fileobj, Bucket=bucket, Key=key, Fileobj=stream\n )\n stream.seek(0)\n output = stream.read()\n\n return output\n
"},{"location":"s3/#prefect_aws.s3.s3_list_objects","title":"
s3_list_objects
async
","text":"
Lists details of objects in a given S3 bucket.
Parameters:
Name Type Description Default
bucket
str
Name of bucket to list items from. Required if a default value was not supplied when creating the task.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
aws_client_parameters
AwsClientParameters
Custom parameter for the boto3 client initialization.
AwsClientParameters(api_version=None, use_ssl=True, verify=True, verify_cert_path=None, endpoint_url=None, config=None)
prefix
str
Used to filter objects with keys starting with the specified prefix.
''
delimiter
str
Character used to group keys of listed objects.
''
page_size
Optional[int]
Number of objects to return in each request to the AWS API.
None
max_items
Optional[int]
Maximum number of objects to be returned by the task.
None
jmespath_query
Optional[str]
Query used to filter objects based on object attributes. Refer to the boto3 docs for more information on how to construct queries.
None
Returns:
Type Description
List[Dict[str, Any]]
A list of dictionaries containing information about the objects retrieved. Refer to the boto3 docs for an example response.
Examples:
List all objects in a bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_list_objects\n\n\n@flow\nasync def example_s3_list_objects_flow():\n    aws_credentials = AwsCredentials(\n        aws_access_key_id=\"access_key_id\",\n        aws_secret_access_key=\"secret_access_key\"\n    )\n    objects = await s3_list_objects(\n        bucket=\"data_bucket\",\n        aws_credentials=aws_credentials\n    )\n\nexample_s3_list_objects_flow()\n
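As a sketch (the bucket, prefix, and block names are placeholders), the same task can list only the objects under a given prefix:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_list_objects\n\n\n@flow\nasync def example_s3_list_prefix_flow():\n    # Placeholder credentials block\n    aws_credentials = AwsCredentials.load(\"my-creds\")\n    # Only keys starting with \"reports/2023/\" are returned\n    objects = await s3_list_objects(\n        bucket=\"data_bucket\",\n        prefix=\"reports/2023/\",\n        aws_credentials=aws_credentials,\n    )\n    return objects\n\nexample_s3_list_prefix_flow()\n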
Source code in
prefect_aws/s3.py
@task\nasync def s3_list_objects(\n bucket: str,\n aws_credentials: AwsCredentials,\n aws_client_parameters: AwsClientParameters = AwsClientParameters(),\n prefix: str = \"\",\n delimiter: str = \"\",\n page_size: Optional[int] = None,\n max_items: Optional[int] = None,\n jmespath_query: Optional[str] = None,\n) -> List[Dict[str, Any]]:\n \"\"\"\n Lists details of objects in a given S3 bucket.\n\n Args:\n bucket: Name of bucket to list items from. Required if a default value was not\n supplied when creating the task.\n aws_credentials: Credentials to use for authentication with AWS.\n aws_client_parameters: Custom parameter for the boto3 client initialization..\n prefix: Used to filter objects with keys starting with the specified prefix.\n delimiter: Character used to group keys of listed objects.\n page_size: Number of objects to return in each request to the AWS API.\n max_items: Maximum number of objects that to be returned by task.\n jmespath_query: Query used to filter objects based on object attributes refer to\n the [boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#filtering-results-with-jmespath)\n for more information on how to construct queries.\n\n Returns:\n A list of dictionaries containing information about the objects retrieved. Refer\n to the boto3 docs for an example response.\n\n Example:\n List all objects in a bucket:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_list_objects\n\n\n @flow\n async def example_s3_list_objects_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n objects = await s3_list_objects(\n bucket=\"data_bucket\",\n aws_credentials=aws_credentials\n )\n\n example_s3_list_objects_flow()\n ```\n \"\"\" # noqa E501\n logger = get_run_logger()\n logger.info(\"Listing objects in bucket %s with prefix %s\", bucket, prefix)\n\n s3_client = aws_credentials.get_boto3_session().client(\n \"s3\", **aws_client_parameters.get_params_override()\n )\n paginator = s3_client.get_paginator(\"list_objects_v2\")\n page_iterator = paginator.paginate(\n Bucket=bucket,\n Prefix=prefix,\n Delimiter=delimiter,\n PaginationConfig={\"PageSize\": page_size, \"MaxItems\": max_items},\n )\n if jmespath_query:\n page_iterator = page_iterator.search(f\"{jmespath_query} | {{Contents: @}}\")\n\n return await run_sync_in_worker_thread(_list_objects_sync, page_iterator)\n
"},{"location":"s3/#prefect_aws.s3.s3_move","title":"
s3_move
async
","text":"
Move an object from one S3 location to another. To move objects between buckets, the credentials must have permission to read and delete the source object and write to the target object. If the credentials do not have those permissions, this method will raise an error. If the credentials have permission to read the source object but not delete it, the object will be copied but not deleted.
Parameters:
Name Type Description Default
source_path
str
The path of the object to move
required
target_path
str
The path to move the object to
required
source_bucket_name
str
The name of the bucket containing the source object
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
target_bucket_name
Optional[str]
The bucket to copy the object to. If not provided, defaults to source_bucket
.
None
Returns:
Type Description
str
The path that the object was moved to. Excludes the bucket name.
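Examples:
Move notes.txt from one bucket to an archive folder in another bucket (a sketch; bucket names, paths, and the credentials block name are placeholders):
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_move\n\n\n@flow\nasync def example_s3_move_flow():\n    # Placeholder credentials block\n    aws_credentials = AwsCredentials.load(\"my-creds\")\n    # Copies the object to the target bucket, then deletes the original\n    await s3_move(\n        source_path=\"my_folder/notes.txt\",\n        target_path=\"archive/notes.txt\",\n        source_bucket_name=\"my-bucket\",\n        target_bucket_name=\"archive-bucket\",\n        aws_credentials=aws_credentials,\n    )\n\nexample_s3_move_flow()\n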
Source code in
prefect_aws/s3.py
@task\nasync def s3_move(\n source_path: str,\n target_path: str,\n source_bucket_name: str,\n aws_credentials: AwsCredentials,\n target_bucket_name: Optional[str] = None,\n) -> str:\n \"\"\"\n Move an object from one S3 location to another. To move objects between buckets,\n the credentials must have permission to read and delete the source object and write\n to the target object. If the credentials do not have those permissions, this method\n will raise an error. If the credentials have permission to read the source object\n but not delete it, the object will be copied but not deleted.\n\n Args:\n source_path: The path of the object to move\n target_path: The path to move the object to\n source_bucket_name: The name of the bucket containing the source object\n aws_credentials: Credentials to use for authentication with AWS.\n target_bucket_name: The bucket to copy the object to. If not provided, defaults\n to `source_bucket`.\n\n Returns:\n The path that the object was moved to. Excludes the bucket name.\n \"\"\"\n logger = get_run_logger()\n\n s3_client = aws_credentials.get_s3_client()\n\n # If target bucket is not provided, assume it's the same as the source bucket\n target_bucket_name = target_bucket_name or source_bucket_name\n\n logger.info(\n \"Moving object from s3://%s/%s s3://%s/%s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n # Copy the object to the new location\n s3_client.copy_object(\n Bucket=target_bucket_name,\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Key=target_path,\n )\n\n # Delete the original object\n s3_client.delete_object(Bucket=source_bucket_name, Key=source_path)\n\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.s3_upload","title":"
s3_upload
async
","text":"
Uploads data to an S3 bucket.
Parameters:
Name Type Description Default
data
bytes
Bytes representation of data to upload to S3.
required
bucket
str
Name of bucket to upload data to. Required if a default value was not supplied when creating the task.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
aws_client_parameters
AwsClientParameters
Custom parameter for the boto3 client initialization.
AwsClientParameters(api_version=None, use_ssl=True, verify=True, verify_cert_path=None, endpoint_url=None, config=None)
key
Optional[str]
Key to assign to the uploaded object. Defaults to a UUID string.
None
Returns:
Type Description
str
The key of the uploaded object
Examples:
Read and upload a file to an S3 bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_upload\n\n\n@flow\nasync def example_s3_upload_flow():\n    aws_credentials = AwsCredentials(\n        aws_access_key_id=\"access_key_id\",\n        aws_secret_access_key=\"secret_access_key\"\n    )\n    with open(\"data.csv\", \"rb\") as file:\n        key = await s3_upload(\n            bucket=\"bucket\",\n            key=\"data.csv\",\n            data=file.read(),\n            aws_credentials=aws_credentials,\n        )\n\nexample_s3_upload_flow()\n
Source code in
prefect_aws/s3.py
@task\nasync def s3_upload(\n data: bytes,\n bucket: str,\n aws_credentials: AwsCredentials,\n aws_client_parameters: AwsClientParameters = AwsClientParameters(),\n key: Optional[str] = None,\n) -> str:\n \"\"\"\n Uploads data to an S3 bucket.\n\n Args:\n data: Bytes representation of data to upload to S3.\n bucket: Name of bucket to upload data to. Required if a default value was not\n supplied when creating the task.\n aws_credentials: Credentials to use for authentication with AWS.\n aws_client_parameters: Custom parameter for the boto3 client initialization..\n key: Key of object to download. Defaults to a UUID string.\n\n Returns:\n The key of the uploaded object\n\n Example:\n Read and upload a file to an S3 bucket:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_upload\n\n\n @flow\n async def example_s3_upload_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n with open(\"data.csv\", \"rb\") as file:\n key = await s3_upload(\n bucket=\"bucket\",\n key=\"data.csv\",\n data=file.read(),\n aws_credentials=aws_credentials,\n )\n\n example_s3_upload_flow()\n ```\n \"\"\"\n logger = get_run_logger()\n\n key = key or str(uuid.uuid4())\n\n logger.info(\"Uploading object to bucket %s with key %s\", bucket, key)\n\n s3_client = aws_credentials.get_boto3_session().client(\n \"s3\", **aws_client_parameters.get_params_override()\n )\n stream = io.BytesIO(data)\n await run_sync_in_worker_thread(\n s3_client.upload_fileobj, stream, Bucket=bucket, Key=key\n )\n\n return key\n
"},{"location":"secrets_manager/","title":"Secrets Manager","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager","title":"
prefect_aws.secrets_manager
","text":"
Tasks for interacting with AWS Secrets Manager
"},{"location":"secrets_manager/#prefect_aws.secrets_manager-classes","title":"Classes","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret","title":"
AwsSecret (SecretBlock)
pydantic-model
","text":"
Manages a secret in AWS's Secrets Manager.
Attributes:
Name Type Description
aws_credentials
AwsCredentials
The credentials to use for authentication with AWS.
secret_name
str
The name of the secret.
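A minimal sketch of configuring and saving the block (the credentials block, secret name, and block name are placeholders):
from prefect_aws import AwsCredentials, AwsSecret\n\naws_secret = AwsSecret(\n    aws_credentials=AwsCredentials.load(\"my-creds\"),  # placeholder block name\n    secret_name=\"my-secret\",\n)\naws_secret.save(\"my-aws-secret\", overwrite=True)\n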
Source code in
prefect_aws/secrets_manager.py
class AwsSecret(SecretBlock):\n \"\"\"\n Manages a secret in AWS's Secrets Manager.\n\n Attributes:\n aws_credentials: The credentials to use for authentication with AWS.\n secret_name: The name of the secret.\n \"\"\"\n\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n _block_type_name = \"AWS Secret\"\n _documentation_url = \"https://prefecthq.github.io/prefect-aws/secrets_manager/#prefect_aws.secrets_manager.AwsSecret\" # noqa\n\n aws_credentials: AwsCredentials\n secret_name: str = Field(default=..., description=\"The name of the secret.\")\n\n @sync_compatible\n async def read_secret(\n self,\n version_id: str = None,\n version_stage: str = None,\n **read_kwargs: Dict[str, Any],\n ) -> bytes:\n \"\"\"\n Reads the secret from the secret storage service.\n\n Args:\n version_id: The version of the secret to read. If not provided, the latest\n version will be read.\n version_stage: The version stage of the secret to read. If not provided,\n the latest version will be read.\n read_kwargs: Additional keyword arguments to pass to the\n `get_secret_value` method of the boto3 client.\n\n Returns:\n The secret data.\n\n Examples:\n Reads a secret.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.read_secret()\n ```\n \"\"\"\n client = self.aws_credentials.get_secrets_manager_client()\n if version_id is not None:\n read_kwargs[\"VersionId\"] = version_id\n if version_stage is not None:\n read_kwargs[\"VersionStage\"] = version_stage\n response = await run_sync_in_worker_thread(\n client.get_secret_value, SecretId=self.secret_name, **read_kwargs\n )\n if \"SecretBinary\" in response:\n secret = response[\"SecretBinary\"]\n elif \"SecretString\" in response:\n secret = response[\"SecretString\"]\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret {arn!r} data was successfully read.\")\n return secret\n\n @sync_compatible\n async def write_secret(\n self, secret_data: bytes, **put_or_create_secret_kwargs: Dict[str, Any]\n ) -> str:\n \"\"\"\n Writes the secret to the secret storage service as a SecretBinary;\n if it doesn't exist, it will be created.\n\n Args:\n secret_data: The secret data to write.\n **put_or_create_secret_kwargs: Additional keyword arguments to pass to\n put_secret_value or create_secret method of the boto3 client.\n\n Returns:\n The path that the secret was written to.\n\n Examples:\n Write some secret data.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.write_secret(b\"my_secret_data\")\n ```\n \"\"\"\n client = self.aws_credentials.get_secrets_manager_client()\n try:\n response = await run_sync_in_worker_thread(\n client.put_secret_value,\n SecretId=self.secret_name,\n SecretBinary=secret_data,\n **put_or_create_secret_kwargs,\n )\n except client.exceptions.ResourceNotFoundException:\n self.logger.info(\n f\"The secret {self.secret_name!r} does not exist yet, creating it now.\"\n )\n response = await run_sync_in_worker_thread(\n client.create_secret,\n Name=self.secret_name,\n SecretBinary=secret_data,\n **put_or_create_secret_kwargs,\n )\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret data was written successfully to {arn!r}.\")\n return arn\n\n @sync_compatible\n async def delete_secret(\n self,\n recovery_window_in_days: int = 30,\n force_delete_without_recovery: bool = False,\n **delete_kwargs: Dict[str, Any],\n ) -> str:\n \"\"\"\n Deletes the secret from the secret storage service.\n\n Args:\n 
recovery_window_in_days: The number of days to wait before permanently\n deleting the secret. Must be between 7 and 30 days.\n force_delete_without_recovery: If True, the secret will be deleted\n immediately without a recovery window.\n **delete_kwargs: Additional keyword arguments to pass to the\n delete_secret method of the boto3 client.\n\n Returns:\n The path that the secret was deleted from.\n\n Examples:\n Deletes the secret with a recovery window of 15 days.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.delete_secret(recovery_window_in_days=15)\n ```\n \"\"\"\n if force_delete_without_recovery and recovery_window_in_days:\n raise ValueError(\n \"Cannot specify recovery window and force delete without recovery.\"\n )\n elif not (7 <= recovery_window_in_days <= 30):\n raise ValueError(\n \"Recovery window must be between 7 and 30 days, got \"\n f\"{recovery_window_in_days}.\"\n )\n\n client = self.aws_credentials.get_secrets_manager_client()\n response = await run_sync_in_worker_thread(\n client.delete_secret,\n SecretId=self.secret_name,\n RecoveryWindowInDays=recovery_window_in_days,\n ForceDeleteWithoutRecovery=force_delete_without_recovery,\n **delete_kwargs,\n )\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret {arn} was deleted successfully.\")\n return arn\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret-attributes","title":"Attributes","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.secret_name","title":"
secret_name: str
pydantic-field
required
","text":"
The name of the secret.
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret-methods","title":"Methods","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.delete_secret","title":"
delete_secret
async
","text":"
Deletes the secret from the secret storage service.
Parameters:
Name Type Description Default
recovery_window_in_days
int
The number of days to wait before permanently deleting the secret. Must be between 7 and 30 days.
30
force_delete_without_recovery
bool
If True, the secret will be deleted immediately without a recovery window.
False
**delete_kwargs
Dict[str, Any]
Additional keyword arguments to pass to the delete_secret method of the boto3 client.
{}
Returns:
Type Description
str
The path that the secret was deleted from.
Examples:
Deletes the secret with a recovery window of 15 days.
aws_secret = AwsSecret.load(\"MY_BLOCK\")\naws_secret.delete_secret(recovery_window_in_days=15)\n
Source code in
prefect_aws/secrets_manager.py
@sync_compatible\nasync def delete_secret(\n self,\n recovery_window_in_days: int = 30,\n force_delete_without_recovery: bool = False,\n **delete_kwargs: Dict[str, Any],\n) -> str:\n \"\"\"\n Deletes the secret from the secret storage service.\n\n Args:\n recovery_window_in_days: The number of days to wait before permanently\n deleting the secret. Must be between 7 and 30 days.\n force_delete_without_recovery: If True, the secret will be deleted\n immediately without a recovery window.\n **delete_kwargs: Additional keyword arguments to pass to the\n delete_secret method of the boto3 client.\n\n Returns:\n The path that the secret was deleted from.\n\n Examples:\n Deletes the secret with a recovery window of 15 days.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.delete_secret(recovery_window_in_days=15)\n ```\n \"\"\"\n if force_delete_without_recovery and recovery_window_in_days:\n raise ValueError(\n \"Cannot specify recovery window and force delete without recovery.\"\n )\n elif not (7 <= recovery_window_in_days <= 30):\n raise ValueError(\n \"Recovery window must be between 7 and 30 days, got \"\n f\"{recovery_window_in_days}.\"\n )\n\n client = self.aws_credentials.get_secrets_manager_client()\n response = await run_sync_in_worker_thread(\n client.delete_secret,\n SecretId=self.secret_name,\n RecoveryWindowInDays=recovery_window_in_days,\n ForceDeleteWithoutRecovery=force_delete_without_recovery,\n **delete_kwargs,\n )\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret {arn} was deleted successfully.\")\n return arn\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.read_secret","title":"
read_secret
async
","text":"
Reads the secret from the secret storage service.
Parameters:
Name Type Description Default
version_id
str
The version of the secret to read. If not provided, the latest version will be read.
None
version_stage
str
The version stage of the secret to read. If not provided, the latest version will be read.
None
read_kwargs
Dict[str, Any]
Additional keyword arguments to pass to the get_secret_value
method of the boto3 client.
{}
Returns:
Type Description
bytes
The secret data.
Examples:
Reads a secret.
aws_secret = AwsSecret.load(\"MY_BLOCK\")\naws_secret.read_secret()\n
Source code in
prefect_aws/secrets_manager.py
@sync_compatible\nasync def read_secret(\n self,\n version_id: str = None,\n version_stage: str = None,\n **read_kwargs: Dict[str, Any],\n) -> bytes:\n \"\"\"\n Reads the secret from the secret storage service.\n\n Args:\n version_id: The version of the secret to read. If not provided, the latest\n version will be read.\n version_stage: The version stage of the secret to read. If not provided,\n the latest version will be read.\n read_kwargs: Additional keyword arguments to pass to the\n `get_secret_value` method of the boto3 client.\n\n Returns:\n The secret data.\n\n Examples:\n Reads a secret.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.read_secret()\n ```\n \"\"\"\n client = self.aws_credentials.get_secrets_manager_client()\n if version_id is not None:\n read_kwargs[\"VersionId\"] = version_id\n if version_stage is not None:\n read_kwargs[\"VersionStage\"] = version_stage\n response = await run_sync_in_worker_thread(\n client.get_secret_value, SecretId=self.secret_name, **read_kwargs\n )\n if \"SecretBinary\" in response:\n secret = response[\"SecretBinary\"]\n elif \"SecretString\" in response:\n secret = response[\"SecretString\"]\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret {arn!r} data was successfully read.\")\n return secret\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.write_secret","title":"
write_secret
async
","text":"
Writes the secret to the secret storage service as a SecretBinary; if it doesn't exist, it will be created.
Parameters:
Name Type Description Default
secret_data
bytes
The secret data to write.
required
**put_or_create_secret_kwargs
Dict[str, Any]
Additional keyword arguments to pass to put_secret_value or create_secret method of the boto3 client.
{}
Returns:
Type Description
str
The path that the secret was written to.
Examples:
Write some secret data.
aws_secret = AwsSecret.load(\"MY_BLOCK\")\naws_secret.write_secret(b\"my_secret_data\")\n
Source code in
prefect_aws/secrets_manager.py
@sync_compatible\nasync def write_secret(\n self, secret_data: bytes, **put_or_create_secret_kwargs: Dict[str, Any]\n) -> str:\n \"\"\"\n Writes the secret to the secret storage service as a SecretBinary;\n if it doesn't exist, it will be created.\n\n Args:\n secret_data: The secret data to write.\n **put_or_create_secret_kwargs: Additional keyword arguments to pass to\n put_secret_value or create_secret method of the boto3 client.\n\n Returns:\n The path that the secret was written to.\n\n Examples:\n Write some secret data.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.write_secret(b\"my_secret_data\")\n ```\n \"\"\"\n client = self.aws_credentials.get_secrets_manager_client()\n try:\n response = await run_sync_in_worker_thread(\n client.put_secret_value,\n SecretId=self.secret_name,\n SecretBinary=secret_data,\n **put_or_create_secret_kwargs,\n )\n except client.exceptions.ResourceNotFoundException:\n self.logger.info(\n f\"The secret {self.secret_name!r} does not exist yet, creating it now.\"\n )\n response = await run_sync_in_worker_thread(\n client.create_secret,\n Name=self.secret_name,\n SecretBinary=secret_data,\n **put_or_create_secret_kwargs,\n )\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret data was written successfully to {arn!r}.\")\n return arn\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager-functions","title":"Functions","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager.create_secret","title":"
create_secret
async
","text":"
Creates a secret in AWS Secrets Manager.
Parameters:
Name Type Description Default
secret_name
str
The name of the secret to create.
required
secret_value
Union[str, bytes]
The value to store in the created secret.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
description
Optional[str]
A description for the created secret.
None
tags
Optional[List[Dict[str, str]]]
A list of tags to attach to the secret. Each tag should be specified as a dictionary in the following format:
{\n \"Key\": str,\n \"Value\": str\n}\n
None
Returns:
Type Description
Dict[str, str]
A dict containing the secret ARN (Amazon Resource Name), name, and current version ID: {\"ARN\": str, \"Name\": str, \"VersionId\": str}
Examples:
Create a secret:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import create_secret\n\n@flow\ndef example_create_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n create_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\nexample_create_secret()\n
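A variation that also sets a description and tags, following the tag format above (all values and the block name are placeholders):
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import create_secret\n\n@flow\ndef example_create_tagged_secret():\n    # Placeholder credentials block\n    aws_credentials = AwsCredentials.load(\"my-creds\")\n    create_secret(\n        secret_name=\"db_password\",\n        secret_value=\"hunter2\",\n        aws_credentials=aws_credentials,\n        description=\"Database password for the analytics service\",\n        tags=[{\"Key\": \"environment\", \"Value\": \"dev\"}],\n    )\n\nexample_create_tagged_secret()\n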
Source code in
prefect_aws/secrets_manager.py
@task\nasync def create_secret(\n secret_name: str,\n secret_value: Union[str, bytes],\n aws_credentials: AwsCredentials,\n description: Optional[str] = None,\n tags: Optional[List[Dict[str, str]]] = None,\n) -> Dict[str, str]:\n \"\"\"\n Creates a secret in AWS Secrets Manager.\n\n Args:\n secret_name: The name of the secret to create.\n secret_value: The value to store in the created secret.\n aws_credentials: Credentials to use for authentication with AWS.\n description: A description for the created secret.\n tags: A list of tags to attach to the secret. Each tag should be specified as a\n dictionary in the following format:\n ```python\n {\n \"Key\": str,\n \"Value\": str\n }\n ```\n\n Returns:\n A dict containing the secret ARN (Amazon Resource Name),\n name, and current version ID.\n ```python\n {\n \"ARN\": str,\n \"Name\": str,\n \"VersionId\": str\n }\n ```\n Example:\n Create a secret:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import create_secret\n\n @flow\n def example_create_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n create_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\n example_create_secret()\n ```\n\n\n \"\"\"\n create_secret_kwargs: Dict[str, Union[str, bytes, List[Dict[str, str]]]] = dict(\n Name=secret_name\n )\n if description is not None:\n create_secret_kwargs[\"Description\"] = description\n if tags is not None:\n create_secret_kwargs[\"Tags\"] = tags\n if isinstance(secret_value, bytes):\n create_secret_kwargs[\"SecretBinary\"] = secret_value\n elif isinstance(secret_value, str):\n create_secret_kwargs[\"SecretString\"] = secret_value\n else:\n raise ValueError(\"Please provide a bytes or str value for secret_value\")\n\n logger = get_run_logger()\n logger.info(\"Creating secret named %s\", secret_name)\n\n client = aws_credentials.get_boto3_session().client(\"secretsmanager\")\n\n try:\n response = await run_sync_in_worker_thread(\n client.create_secret, **create_secret_kwargs\n )\n print(response.pop(\"ResponseMetadata\", None))\n return response\n except ClientError:\n logger.exception(\"Unable to create secret %s\", secret_name)\n raise\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.delete_secret","title":"
delete_secret
async
","text":"
Deletes a secret from AWS Secrets Manager.
Secrets can either be deleted immediately by setting force_delete_without_recovery
equal to True
. Otherwise, secrets will be marked for deletion and available for recovery for the number of days specified in recovery_window_in_days.
Parameters:
Name Type Description Default
secret_name
str
Name of the secret to be deleted.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
recovery_window_in_days
int
Number of days a secret should be recoverable for before permanent deletion. Minimum window is 7 days and maximum window is 30 days. If force_delete_without_recovery
is set to True
, this value will be ignored.
30
force_delete_without_recovery
bool
If True
, the secret will be immediately deleted and will not be recoverable.
False
Returns:
Type Description
Dict[str, str]
A dict containing the secret ARN (Amazon Resource Name), name, and deletion date of the secret. DeletionDate is the date and time of the delete request plus the number of days in `recovery_window_in_days`: {\"ARN\": str, \"Name\": str, \"DeletionDate\": datetime.datetime}
Examples:
Delete a secret immediately:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import delete_secret\n\n@flow\ndef example_delete_secret_immediately():\n    aws_credentials = AwsCredentials(\n        aws_access_key_id=\"access_key_id\",\n        aws_secret_access_key=\"secret_access_key\"\n    )\n    delete_secret(\n        secret_name=\"life_the_universe_and_everything\",\n        aws_credentials=aws_credentials,\n        force_delete_without_recovery=True\n    )\n\nexample_delete_secret_immediately()\n
Delete a secret with a 90 day recovery window:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import delete_secret\n\n@flow\ndef example_delete_secret_with_recovery_window():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n recovery_window_in_days=90\n )\n\nexample_delete_secret_with_recovery_window()\n
Source code in
prefect_aws/secrets_manager.py
@task\nasync def delete_secret(\n secret_name: str,\n aws_credentials: AwsCredentials,\n recovery_window_in_days: int = 30,\n force_delete_without_recovery: bool = False,\n) -> Dict[str, str]:\n \"\"\"\n Deletes a secret from AWS Secrets Manager.\n\n Secrets can either be deleted immediately by setting `force_delete_without_recovery`\n equal to `True`. Otherwise, secrets will be marked for deletion and available for\n recovery for the number of days specified in `recovery_window_in_days`\n\n Args:\n secret_name: Name of the secret to be deleted.\n aws_credentials: Credentials to use for authentication with AWS.\n recovery_window_in_days: Number of days a secret should be recoverable for\n before permanent deletion. Minium window is 7 days and maximum window\n is 30 days. If `force_delete_without_recovery` is set to `True`, this\n value will be ignored.\n force_delete_without_recovery: If `True`, the secret will be immediately\n deleted and will not be recoverable.\n\n Returns:\n A dict containing the secret ARN (Amazon Resource Name),\n name, and deletion date of the secret. DeletionDate is the date and\n time of the delete request plus the number of days in\n `recovery_window_in_days`.\n ```python\n {\n \"ARN\": str,\n \"Name\": str,\n \"DeletionDate\": datetime.datetime\n }\n ```\n\n Examples:\n Delete a secret immediately:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import delete_secret\n\n @flow\n def example_delete_secret_immediately():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n force_delete_without_recovery: True\n )\n\n example_delete_secret_immediately()\n ```\n\n Delete a secret with a 90 day recovery window:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import delete_secret\n\n @flow\n def example_delete_secret_with_recovery_window():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n recovery_window_in_days=90\n )\n\n example_delete_secret_with_recovery_window()\n ```\n\n\n \"\"\"\n if not force_delete_without_recovery and not (7 <= recovery_window_in_days <= 30):\n raise ValueError(\"Recovery window must be between 7 and 30 days.\")\n\n delete_secret_kwargs: Dict[str, Union[str, int, bool]] = dict(SecretId=secret_name)\n if force_delete_without_recovery:\n delete_secret_kwargs[\"ForceDeleteWithoutRecovery\"] = (\n force_delete_without_recovery\n )\n else:\n delete_secret_kwargs[\"RecoveryWindowInDays\"] = recovery_window_in_days\n\n logger = get_run_logger()\n logger.info(\"Deleting secret %s\", secret_name)\n\n client = aws_credentials.get_boto3_session().client(\"secretsmanager\")\n\n try:\n response = await run_sync_in_worker_thread(\n client.delete_secret, **delete_secret_kwargs\n )\n response.pop(\"ResponseMetadata\", None)\n return response\n except ClientError:\n logger.exception(\"Unable to delete secret %s\", secret_name)\n raise\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.read_secret","title":"
read_secret
async
","text":"
Reads the value of a given secret from AWS Secrets Manager.
Parameters:
Name Type Description Default
secret_name
str
Name of stored secret.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
version_id
Optional[str]
Specifies version of secret to read. Defaults to the most recent version if not given.
None
version_stage
Optional[str]
Specifies the version stage of the secret to read. Defaults to AWS_CURRENT if not given.
None
Returns:
Type Description
Union[str, bytes]
The secret values as a str
or bytes
depending on the format in which the secret was stored.
Examples:
Read a secret value:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import read_secret\n\n@flow\ndef example_read_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n secret_value = read_secret(\n secret_name=\"db_password\",\n aws_credentials=aws_credentials\n )\n\nexample_read_secret()\n
Source code in
prefect_aws/secrets_manager.py
@task\nasync def read_secret(\n secret_name: str,\n aws_credentials: AwsCredentials,\n version_id: Optional[str] = None,\n version_stage: Optional[str] = None,\n) -> Union[str, bytes]:\n \"\"\"\n Reads the value of a given secret from AWS Secrets Manager.\n\n Args:\n secret_name: Name of stored secret.\n aws_credentials: Credentials to use for authentication with AWS.\n version_id: Specifies version of secret to read. Defaults to the most recent\n version if not given.\n version_stage: Specifies the version stage of the secret to read. Defaults to\n AWS_CURRENT if not given.\n\n Returns:\n The secret values as a `str` or `bytes` depending on the format in which the\n secret was stored.\n\n Example:\n Read a secret value:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import read_secret\n\n @flow\n def example_read_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n secret_value = read_secret(\n secret_name=\"db_password\",\n aws_credentials=aws_credentials\n )\n\n example_read_secret()\n ```\n \"\"\"\n logger = get_run_logger()\n logger.info(\"Getting value for secret %s\", secret_name)\n\n client = aws_credentials.get_boto3_session().client(\"secretsmanager\")\n\n get_secret_value_kwargs = dict(SecretId=secret_name)\n if version_id is not None:\n get_secret_value_kwargs[\"VersionId\"] = version_id\n if version_stage is not None:\n get_secret_value_kwargs[\"VersionStage\"] = version_stage\n\n try:\n response = await run_sync_in_worker_thread(\n client.get_secret_value, **get_secret_value_kwargs\n )\n except ClientError:\n logger.exception(\"Unable to get value for secret %s\", secret_name)\n raise\n else:\n return response.get(\"SecretString\") or response.get(\"SecretBinary\")\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.update_secret","title":"
update_secret
async
","text":"
Updates the value of a given secret in AWS Secrets Manager.
Parameters:
Name Type Description Default
secret_name
str
Name of secret to update.
required
secret_value
Union[str, bytes]
Desired value of the secret. Can be either str
or bytes
.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
description
Optional[str]
Desired description of the secret.
None
Returns:
Type Description
Dict[str, str]
A dict containing the secret ARN (Amazon Resource Name), name, and current version ID: {\"ARN\": str, \"Name\": str, \"VersionId\": str}
Examples:
Update a secret value:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import update_secret\n\n@flow\ndef example_update_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n update_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\nexample_update_secret()\n
Source code in
prefect_aws/secrets_manager.py
@task\nasync def update_secret(\n secret_name: str,\n secret_value: Union[str, bytes],\n aws_credentials: AwsCredentials,\n description: Optional[str] = None,\n) -> Dict[str, str]:\n \"\"\"\n Updates the value of a given secret in AWS Secrets Manager.\n\n Args:\n secret_name: Name of secret to update.\n secret_value: Desired value of the secret. Can be either `str` or `bytes`.\n aws_credentials: Credentials to use for authentication with AWS.\n description: Desired description of the secret.\n\n Returns:\n A dict containing the secret ARN (Amazon Resource Name),\n name, and current version ID.\n ```python\n {\n \"ARN\": str,\n \"Name\": str,\n \"VersionId\": str\n }\n ```\n\n Example:\n Update a secret value:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import update_secret\n\n @flow\n def example_update_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n update_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\n example_update_secret()\n ```\n\n \"\"\"\n update_secret_kwargs: Dict[str, Union[str, bytes]] = dict(SecretId=secret_name)\n if description is not None:\n update_secret_kwargs[\"Description\"] = description\n if isinstance(secret_value, bytes):\n update_secret_kwargs[\"SecretBinary\"] = secret_value\n elif isinstance(secret_value, str):\n update_secret_kwargs[\"SecretString\"] = secret_value\n else:\n raise ValueError(\"Please provide a bytes or str value for secret_value\")\n\n logger = get_run_logger()\n logger.info(\"Updating value for secret %s\", secret_name)\n\n client = aws_credentials.get_boto3_session().client(\"secretsmanager\")\n\n try:\n response = await run_sync_in_worker_thread(\n client.update_secret, **update_secret_kwargs\n )\n response.pop(\"ResponseMetadata\", None)\n return response\n except ClientError:\n logger.exception(\"Unable to update secret %s\", secret_name)\n raise\n
"},{"location":"deployments/steps/","title":"Steps","text":""},{"location":"deployments/steps/#prefect_aws.deployments.steps","title":"
prefect_aws.deployments.steps
","text":"
Prefect deployment steps for code storage and retrieval in S3 and S3 compatible services.
"},{"location":"deployments/steps/#prefect_aws.deployments.steps-classes","title":"Classes","text":""},{"location":"deployments/steps/#prefect_aws.deployments.steps.PullFromS3Output","title":"
PullFromS3Output (dict)
","text":"
The output of the pull_from_s3
step.
Source code in
prefect_aws/deployments/steps.py
class PullFromS3Output(TypedDict):\n \"\"\"\n The output of the `pull_from_s3` step.\n \"\"\"\n\n bucket: str\n folder: str\n directory: str\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.PullProjectFromS3Output","title":"
PullProjectFromS3Output (dict)
","text":"
Deprecated. Use PullFromS3Output
instead.
Source code in
prefect_aws/deployments/steps.py
@deprecated_callable(start_date=\"Jun 2023\", help=\"Use `PullFromS3Output` instead.\")\nclass PullProjectFromS3Output(PullFromS3Output):\n \"\"\"Deprecated. Use `PullFromS3Output` instead..\"\"\"\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.PushProjectToS3Output","title":"
PushProjectToS3Output (dict)
","text":"
Deprecated. Use PushToS3Output
instead.
Source code in
prefect_aws/deployments/steps.py
@deprecated_callable(start_date=\"Jun 2023\", help=\"Use `PushToS3Output` instead.\")\nclass PushProjectToS3Output(PushToS3Output):\n \"\"\"Deprecated. Use `PushToS3Output` instead.\"\"\"\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.PushToS3Output","title":"
PushToS3Output (dict)
","text":"
The output of the push_to_s3
step.
Source code in
prefect_aws/deployments/steps.py
class PushToS3Output(TypedDict):\n \"\"\"\n The output of the `push_to_s3` step.\n \"\"\"\n\n bucket: str\n folder: str\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps-functions","title":"Functions","text":""},{"location":"deployments/steps/#prefect_aws.deployments.steps.pull_from_s3","title":"
pull_from_s3
","text":"
Pulls the contents of an S3 bucket folder to the current working directory.
Parameters:
Name Type Description Default
bucket
str
The name of the S3 bucket where files are stored.
required
folder
str
The folder in the S3 bucket where files are stored.
required
credentials
Optional[Dict]
A dictionary of AWS credentials (aws_access_key_id, aws_secret_access_key, aws_session_token).
None
client_parameters
Optional[Dict]
A dictionary of additional parameters to pass to the boto3 client.
None
Returns:
Type Description
PullFromS3Output
A dictionary containing the bucket, folder, and local directory where files were downloaded.
Examples:
Pull files from S3 using the default credentials and client parameters:
pull:\n - prefect_aws.deployments.steps.pull_from_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n
Pull files from S3 using credentials stored in a block:
pull:\n - prefect_aws.deployments.steps.pull_from_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n credentials: \"{{ prefect.blocks.aws-credentials.dev-credentials }}\"\n
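Pull files from an S3-compatible service by passing an endpoint through client_parameters (the endpoint URL is a placeholder; this sketch assumes the client parameters are forwarded to the boto3 client, which accepts endpoint_url):
pull:\n    - prefect_aws.deployments.steps.pull_from_s3:\n        requires: prefect-aws\n        bucket: my-bucket\n        folder: my-project\n        client_parameters:\n            endpoint_url: https://minio.example.com\n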
Source code in
prefect_aws/deployments/steps.py
def pull_from_s3(\n bucket: str,\n folder: str,\n credentials: Optional[Dict] = None,\n client_parameters: Optional[Dict] = None,\n) -> PullFromS3Output:\n \"\"\"\n Pulls the contents of an S3 bucket folder to the current working directory.\n\n Args:\n bucket: The name of the S3 bucket where files are stored.\n folder: The folder in the S3 bucket where files are stored.\n credentials: A dictionary of AWS credentials (aws_access_key_id,\n aws_secret_access_key, aws_session_token).\n client_parameters: A dictionary of additional parameters to pass to the\n boto3 client.\n\n Returns:\n A dictionary containing the bucket, folder, and local directory where\n files were downloaded.\n\n Examples:\n Pull files from S3 using the default credentials and client parameters:\n ```yaml\n pull:\n - prefect_aws.deployments.steps.pull_from_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n ```\n\n Pull files from S3 using credentials stored in a block:\n ```yaml\n pull:\n - prefect_aws.deployments.steps.pull_from_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n credentials: \"{{ prefect.blocks.aws-credentials.dev-credentials }}\"\n ```\n \"\"\"\n s3 = get_s3_client(credentials=credentials, client_parameters=client_parameters)\n\n local_path = Path.cwd()\n\n paginator = s3.get_paginator(\"list_objects_v2\")\n for result in paginator.paginate(Bucket=bucket, Prefix=folder):\n for obj in result.get(\"Contents\", []):\n remote_key = obj[\"Key\"]\n\n if remote_key[-1] == \"/\":\n # object is a folder and will be created if it contains any objects\n continue\n\n target = PurePosixPath(\n local_path\n / relative_path_to_current_platform(remote_key).relative_to(folder)\n )\n Path.mkdir(Path(target.parent), parents=True, exist_ok=True)\n s3.download_file(bucket, remote_key, str(target))\n\n return {\n \"bucket\": bucket,\n \"folder\": folder,\n \"directory\": str(local_path),\n }\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.pull_project_from_s3","title":"
pull_project_from_s3
","text":"
Deprecated. Use pull_from_s3
instead.
Source code in
prefect_aws/deployments/steps.py
@deprecated_callable(start_date=\"Jun 2023\", help=\"Use `pull_from_s3` instead.\")\ndef pull_project_from_s3(*args, **kwargs):\n \"\"\"Deprecated. Use `pull_from_s3` instead.\"\"\"\n pull_from_s3(*args, **kwargs)\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.push_project_to_s3","title":"
push_project_to_s3
","text":"
Deprecated. Use push_to_s3
instead.
Source code in
prefect_aws/deployments/steps.py
@deprecated_callable(start_date=\"Jun 2023\", help=\"Use `push_to_s3` instead.\")\ndef push_project_to_s3(*args, **kwargs):\n \"\"\"Deprecated. Use `push_to_s3` instead.\"\"\"\n push_to_s3(*args, **kwargs)\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.push_to_s3","title":"
push_to_s3
","text":"
Pushes the contents of the current working directory to an S3 bucket, excluding files and folders specified in the ignore_file.
Parameters:
Name Type Description Default
bucket
str
The name of the S3 bucket where files will be uploaded.
required
folder
str
The folder in the S3 bucket where files will be uploaded.
required
credentials
Optional[Dict]
A dictionary of AWS credentials (aws_access_key_id, aws_secret_access_key, aws_session_token).
None
client_parameters
Optional[Dict]
A dictionary of additional parameters to pass to the boto3 client.
None
ignore_file
Optional[str]
The name of the file containing ignore patterns.
'.prefectignore'
Returns:
Type Description
PushToS3Output
A dictionary containing the bucket and folder where files were uploaded.
Examples:
Push files to an S3 bucket:
push:\n - prefect_aws.deployments.steps.push_to_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n
Push files to an S3 bucket using credentials stored in a block:
push:\n - prefect_aws.deployments.steps.push_to_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n credentials: \"{{ prefect.blocks.aws-credentials.dev-credentials }}\"\n
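A sketch of pairing the two steps in a prefect.yaml, where the pull step reuses the push step's outputs through an id (the id and the bucket/folder values are placeholders):
push:\n    - prefect_aws.deployments.steps.push_to_s3:\n        id: push_code\n        requires: prefect-aws\n        bucket: my-bucket\n        folder: my-project\npull:\n    - prefect_aws.deployments.steps.pull_from_s3:\n        requires: prefect-aws\n        bucket: \"{{ push_code.bucket }}\"\n        folder: \"{{ push_code.folder }}\"\n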
Source code in
prefect_aws/deployments/steps.py
def push_to_s3(\n bucket: str,\n folder: str,\n credentials: Optional[Dict] = None,\n client_parameters: Optional[Dict] = None,\n ignore_file: Optional[str] = \".prefectignore\",\n) -> PushToS3Output:\n \"\"\"\n Pushes the contents of the current working directory to an S3 bucket,\n excluding files and folders specified in the ignore_file.\n\n Args:\n bucket: The name of the S3 bucket where files will be uploaded.\n folder: The folder in the S3 bucket where files will be uploaded.\n credentials: A dictionary of AWS credentials (aws_access_key_id,\n aws_secret_access_key, aws_session_token).\n client_parameters: A dictionary of additional parameters to pass to the boto3\n client.\n ignore_file: The name of the file containing ignore patterns.\n\n Returns:\n A dictionary containing the bucket and folder where files were uploaded.\n\n Examples:\n Push files to an S3 bucket:\n ```yaml\n push:\n - prefect_aws.deployments.steps.push_to_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n ```\n\n Push files to an S3 bucket using credentials stored in a block:\n ```yaml\n push:\n - prefect_aws.deployments.steps.push_to_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n credentials: \"{{ prefect.blocks.aws-credentials.dev-credentials }}\"\n ```\n\n \"\"\"\n s3 = get_s3_client(credentials=credentials, client_parameters=client_parameters)\n\n local_path = Path.cwd()\n\n included_files = None\n if ignore_file and Path(ignore_file).exists():\n with open(ignore_file, \"r\") as f:\n ignore_patterns = f.readlines()\n\n included_files = filter_files(str(local_path), ignore_patterns)\n\n for local_file_path in local_path.expanduser().rglob(\"*\"):\n if (\n included_files is not None\n and str(local_file_path.relative_to(local_path)) not in included_files\n ):\n continue\n elif not local_file_path.is_dir():\n remote_file_path = Path(folder) / local_file_path.relative_to(local_path)\n s3.upload_file(\n str(local_file_path), bucket, str(remote_file_path.as_posix())\n )\n\n return {\n \"bucket\": bucket,\n \"folder\": folder,\n }\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"
prefect-aws
","text":""},{"location":"#welcome","title":"Welcome!","text":"
prefect-aws
makes it easy to leverage the capabilities of AWS in your flows, featuring support for ECSTask, S3, Secrets Manager, Batch Job, and Client Waiter.
"},{"location":"#getting-started","title":"Getting Started","text":""},{"location":"#saving-credentials-to-a-block","title":"Saving credentials to a block","text":"
You will need an AWS account and credentials in order to use prefect-aws
.
- Refer to the AWS Configuration documentation on how to retrieve your access key ID and secret access key
- Copy the access key ID and secret access key
- Create a short script and replace the placeholders with your credential information and desired block name:
from prefect_aws import AwsCredentials\nAwsCredentials(\n aws_access_key_id=\"PLACEHOLDER\",\n aws_secret_access_key=\"PLACEHOLDER\",\n aws_session_token=None, # replace this with token if necessary\n region_name=\"us-east-2\"\n).save(\"BLOCK-NAME-PLACEHOLDER\")\n
Congrats! You can now load the saved block to use your credentials in your Python code:
from prefect_aws import AwsCredentials\nAwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\n
Registering blocks
Register blocks in this module to view and edit them on Prefect Cloud:
prefect block register -m prefect_aws\n
"},{"location":"#using-prefect-with-aws-ecs","title":"Using Prefect with AWS ECS","text":"
prefect_aws
allows you to use AWS ECS as infrastructure for your deployments. Using ECS for scheduled flow runs enables the dynamic provisioning of infrastructure for containers and unlocks greater scalability.
The snippets below show how you can use prefect_aws
to run a task on ECS. The first example uses the ECSTask
block as infrastructure and the second example shows using ECS within a flow.
"},{"location":"#as-deployment-infrastructure","title":"As deployment Infrastructure","text":""},{"location":"#set-variables","title":"Set variables","text":"
To expedite copy/paste without the needing to update placeholders manually, update and execute the following.
export CREDENTIALS_BLOCK_NAME=\"aws-credentials\"\nexport VPC_ID=\"vpc-id\"\nexport ECS_TASK_BLOCK_NAME=\"ecs-task-example\"\nexport S3_BUCKET_BLOCK_NAME=\"ecs-task-bucket-example\"\n
"},{"location":"#save-an-infrastructure-and-storage-block","title":"Save an infrastructure and storage block","text":"
Save a custom infrastructure and storage block by executing the following snippet.
import os\nfrom prefect_aws import AwsCredentials, ECSTask, S3Bucket\n\naws_credentials = AwsCredentials.load(os.environ[\"CREDENTIALS_BLOCK_NAME\"])\n\necs_task = ECSTask(\n image=\"prefecthq/prefect:2-python3.10\",\n aws_credentials=aws_credentials,\n vpc_id=os.environ[\"VPC_ID\"],\n)\necs_task.save(os.environ[\"ECS_TASK_BLOCK_NAME\"], overwrite=True)\n\nbucket_name = \"ecs-task-bucket-example\"\ns3_client = aws_credentials.get_s3_client()\ns3_client.create_bucket(\n Bucket=bucket_name,\n CreateBucketConfiguration={\"LocationConstraint\": aws_credentials.region_name}\n)\ns3_bucket = S3Bucket(\n bucket_name=bucket_name,\n credentials=aws_credentials,\n)\ns3_bucket.save(os.environ[\"S3_BUCKET_BLOCK_NAME\"], overwrite=True)\n
"},{"location":"#write-a-flow","title":"Write a flow","text":"
Then, use an existing flow to create a deployment with, or use the flow below if you don't have an existing flow handy.
from prefect import flow\n\n@flow(log_prints=True)\ndef ecs_task_flow():\n print(\"Hello, Prefect!\")\n\nif __name__ == \"__main__\":\n ecs_task_flow()\n
"},{"location":"#create-a-deployment","title":"Create a deployment","text":"
If the script was named \"ecs_task_script.py\", build a deployment manifest with the following command.
prefect deployment build ecs_task_script.py:ecs_task_flow \\\n -n ecs-task-deployment \\\n -ib ecs-task/${ECS_TASK_BLOCK_NAME} \\\n -sb s3-bucket/${S3_BUCKET_BLOCK_NAME} \\\n --override env.EXTRA_PIP_PACKAGES=prefect-aws\n
Now apply the deployment!
prefect deployment apply ecs_task_flow-deployment.yaml\n
"},{"location":"#test-the-deployment","title":"Test the deployment","text":"
Start an agent in a separate terminal. The agent will poll the Prefect API's work pool for scheduled flow runs.
prefect agent start -q 'default'\n
Run the deployment once to test it:
prefect deployment run ecs-task-flow/ecs-task-deployment\n
Once the flow run has completed, you will see Hello, Prefect!
logged in the CLI and the Prefect UI.
No class found for dispatch key
If you encounter an error message like KeyError: \"No class found for dispatch key 'ecs-task' in registry for type 'Block'.\"
, ensure prefect-aws
is installed in the environment in which your agent is running!
Another tutorial on ECSTask
can be found here.
"},{"location":"#within-flow","title":"Within Flow","text":"
You can also execute commands with an ECSTask
block directly within a Prefect flow. Running containers via ECS in your flows is useful for executing non-Python code in a distributed manner while using Prefect.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.ecs import ECSTask\n\n@flow\ndef ecs_task_flow():\n ecs_task = ECSTask(\n image=\"prefecthq/prefect:2-python3.10\",\n credentials=AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\"),\n region=\"us-east-2\",\n command=[\"echo\", \"Hello, Prefect!\"],\n )\n return ecs_task.run()\n
This setup gives you all of the observation and orchestration benefits of Prefect, while also providing you the scalability of ECS.
"},{"location":"#using-prefect-with-aws-s3","title":"Using Prefect with AWS S3","text":"
prefect_aws
allows you to read and write objects with AWS S3 within your Prefect flows.
The provided code snippet shows how you can use prefect_aws
to upload a file to an AWS S3 bucket and download the same file under a different file name.
Note, the following code assumes that the bucket already exists.
from pathlib import Path\nfrom prefect import flow\nfrom prefect_aws import AwsCredentials, S3Bucket\n\n@flow\ndef s3_flow():\n # create a dummy file to upload\n file_path = Path(\"test-example.txt\")\n file_path.write_text(\"Hello, Prefect!\")\n\n aws_credentials = AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\n s3_bucket = S3Bucket(\n bucket_name=\"BUCKET-NAME-PLACEHOLDER\",\n aws_credentials=aws_credentials\n )\n\n s3_bucket_path = s3_bucket.upload_from_path(file_path)\n downloaded_file_path = s3_bucket.download_object_to_path(\n s3_bucket_path, \"downloaded-test-example.txt\"\n )\n return downloaded_file_path.read_text()\n\ns3_flow()\n
"},{"location":"#using-prefect-with-aws-secrets-manager","title":"Using Prefect with AWS Secrets Manager","text":"
prefect_aws
allows you to read and write secrets with AWS Secrets Manager within your Prefect flows.
The provided code snippet shows how you can use prefect_aws
to write a secret to Secrets Manager, read the secret data, delete the secret, and finally return the secret data.
from prefect import flow\nfrom prefect_aws import AwsCredentials, AwsSecret\n\n@flow\ndef secrets_manager_flow():\n aws_credentials = AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\n aws_secret = AwsSecret(secret_name=\"test-example\", aws_credentials=aws_credentials)\n aws_secret.write_secret(secret_data=b\"Hello, Prefect!\")\n secret_data = aws_secret.read_secret()\n aws_secret.delete_secret()\n return secret_data\n\nsecrets_manager_flow()\n
"},{"location":"#resources","title":"Resources","text":"
Refer to the API documentation on the sidebar to explore all the capabilities of Prefect AWS!
For more tips on how to use tasks and flows in a Collection, check out Using Collections!
"},{"location":"#recipes","title":"Recipes","text":"
For additional recipes and examples, check out prefect-recipes
.
"},{"location":"#installation","title":"Installation","text":"
Install prefect-aws
pip install prefect-aws\n
A list of available blocks in prefect-aws
and their setup instructions can be found here.
Requires an installation of Python 3.7+
We recommend using a Python virtual environment manager such as pipenv, conda or virtualenv.
These tasks are designed to work with Prefect 2.0. For more information about how to use Prefect, please refer to the Prefect documentation.
"},{"location":"#feedback","title":"Feedback","text":"
If you encounter any bugs while using prefect-aws
, feel free to open an issue in the prefect-aws
repository.
If you have any questions or issues while using prefect-aws
, you can find help in either the Prefect Discourse forum or the Prefect Slack community.
Feel free to star or watch prefect-aws
for updates too!
"},{"location":"batch/","title":"Batch","text":""},{"location":"batch/#prefect_aws.batch","title":"
prefect_aws.batch
","text":"
Tasks for interacting with AWS Batch
"},{"location":"batch/#prefect_aws.batch-functions","title":"Functions","text":""},{"location":"batch/#prefect_aws.batch.batch_submit","title":"
batch_submit
async
","text":"
Submit a job to the AWS Batch job service.
Parameters:
Name Type Description Default
job_name
str
The AWS batch job name.
required
job_definition
str
The AWS batch job definition.
required
job_queue
str
Name of the AWS batch job queue.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
**batch_kwargs
Optional[Dict[str, Any]]
Additional keyword arguments to pass to the boto3 submit_job
function. See the documentation for submit_job for more details.
{}
Returns:
Type Description
str
The id corresponding to the job.
Examples:
Submits a job to batch.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.batch import batch_submit\n\n\n@flow\ndef example_batch_submit_flow():\n    aws_credentials = AwsCredentials(\n        aws_access_key_id=\"access_key_id\",\n        aws_secret_access_key=\"secret_access_key\"\n    )\n    job_id = batch_submit(\n        \"job_name\",\n        \"job_queue\",\n        \"job_definition\",\n        aws_credentials\n    )\n    return job_id\n\nexample_batch_submit_flow()\n
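Additional keyword arguments are forwarded directly to boto3's submit_job. The sketch below is illustrative only (the job name, queue, definition, and block name are placeholders) and overrides the container command via the containerOverrides parameter:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.batch import batch_submit\n\n@flow\ndef batch_submit_with_overrides_flow():\n    aws_credentials = AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\n    # containerOverrides is passed straight through to boto3's submit_job\n    job_id = batch_submit(\n        \"job_name\",\n        \"job_queue\",\n        \"job_definition\",\n        aws_credentials,\n        containerOverrides={\"command\": [\"echo\", \"hello world\"]},\n    )\n    return job_id\n\nbatch_submit_with_overrides_flow()\n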
Source code in
prefect_aws/batch.py
@task\nasync def batch_submit(\n job_name: str,\n job_queue: str,\n job_definition: str,\n aws_credentials: AwsCredentials,\n **batch_kwargs: Optional[Dict[str, Any]],\n) -> str:\n \"\"\"\n Submit a job to the AWS Batch job service.\n\n Args:\n job_name: The AWS batch job name.\n job_definition: The AWS batch job definition.\n job_queue: Name of the AWS batch job queue.\n aws_credentials: Credentials to use for authentication with AWS.\n **batch_kwargs: Additional keyword arguments to pass to the boto3\n `submit_job` function. See the documentation for\n [submit_job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/batch.html#Batch.Client.submit_job)\n for more details.\n\n Returns:\n The id corresponding to the job.\n\n Example:\n Submits a job to batch.\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.batch import batch_submit\n\n\n @flow\n def example_batch_submit_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n job_id = batch_submit(\n \"job_name\",\n \"job_definition\",\n \"job_queue\",\n aws_credentials\n )\n return job_id\n\n example_batch_submit_flow()\n ```\n\n \"\"\" # noqa\n logger = get_run_logger()\n logger.info(\"Preparing to submit %s job to %s job queue\", job_name, job_queue)\n\n batch_client = aws_credentials.get_boto3_session().client(\"batch\")\n\n response = await run_sync_in_worker_thread(\n batch_client.submit_job,\n jobName=job_name,\n jobQueue=job_queue,\n jobDefinition=job_definition,\n **batch_kwargs,\n )\n return response[\"jobId\"]\n
"},{"location":"blocks_catalog/","title":"Blocks Catalog","text":"
Below is a list of Blocks available for registration in prefect-aws
.
To register blocks in this module to view and edit them on Prefect Cloud, first install the required packages, then
prefect block register -m prefect_aws\n
Note, to use the
load
method on Blocks, you must already have a block document saved through code or saved through the UI."},{"location":"blocks_catalog/#credentials-module","title":"Credentials Module","text":"
AwsCredentials
Block used to manage authentication with AWS. AWS authentication is handled via the boto3
module. Refer to the boto3 docs for more info about the possible credential configurations.
To load the AwsCredentials:
from prefect import flow\nfrom prefect_aws.credentials import AwsCredentials\n\n@flow\ndef my_flow():\n my_block = AwsCredentials.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
MinIOCredentials
Block used to manage authentication with MinIO. Refer to the MinIO docs: https://docs.min.io/docs/minio-server-configuration-guide.html for more info about the possible credential configurations.
To load the MinIOCredentials:
from prefect import flow\nfrom prefect_aws.credentials import MinIOCredentials\n\n@flow\ndef my_flow():\n my_block = MinIOCredentials.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
For additional examples, check out the Credentials Module under Examples Catalog."},{"location":"blocks_catalog/#lambda-function-module","title":"Lambda Function Module","text":"
LambdaFunction
Invoke a Lambda function. This block is part of the prefect-aws collection. Install prefect-aws with pip install prefect-aws
to use this block.
To load the LambdaFunction:
from prefect import flow\nfrom prefect_aws.lambda_function import LambdaFunction\n\n@flow\ndef my_flow():\n my_block = LambdaFunction.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
For additional examples, check out the Lambda Function Module under Examples Catalog."},{"location":"blocks_catalog/#s3-module","title":"S3 Module","text":"
S3Bucket
Block used to store data using AWS S3 or S3-compatible object storage like MinIO.
To load the S3Bucket:
from prefect import flow\nfrom prefect_aws.s3 import S3Bucket\n\n@flow\ndef my_flow():\n my_block = S3Bucket.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
For additional examples, check out the S3 Module under Examples Catalog."},{"location":"blocks_catalog/#ecs-module","title":"Ecs Module","text":"
ECSTask
Run a command as an ECS task.
To load the ECSTask:
from prefect import flow\nfrom prefect_aws.ecs import ECSTask\n\n@flow\ndef my_flow():\n my_block = ECSTask.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
For additional examples, check out the Ecs Module under Examples Catalog."},{"location":"blocks_catalog/#secrets-manager-module","title":"Secrets Manager Module","text":"
AwsSecret
Manages a secret in AWS's Secrets Manager.
To load the AwsSecret:
from prefect import flow\nfrom prefect_aws.secrets_manager import AwsSecret\n\n@flow\ndef my_flow():\n my_block = AwsSecret.load(\"MY_BLOCK_NAME\")\n\nmy_flow()\n
For additional examples, check out the Secrets Manager Module under Examples Catalog."},{"location":"client_waiter/","title":"Client Waiter","text":""},{"location":"client_waiter/#prefect_aws.client_waiter","title":"
prefect_aws.client_waiter
","text":"
Task for waiting on a long-running AWS job
"},{"location":"client_waiter/#prefect_aws.client_waiter-functions","title":"Functions","text":""},{"location":"client_waiter/#prefect_aws.client_waiter.client_waiter","title":"
client_waiter
async
","text":"
Uses the underlying boto3 waiter functionality.
Parameters:
Name Type Description Default
client
str
The AWS client on which to wait (e.g., 'batch', 'ec2', etc.).
required
waiter_name
str
The name of the waiter to instantiate. You may also use a custom waiter name, if you supply an accompanying waiter definition dict.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
waiter_definition
Optional[Dict[str, Any]]
A valid custom waiter model, as a dict. Note that if you supply a custom definition, it is assumed that the provided 'waiter_name' is contained within the waiter definition dict.
None
**waiter_kwargs
Optional[Dict[str, Any]]
Arguments to pass to the waiter.wait(...)
method. Will depend upon the specific waiter being called.
{}
Examples:
Run an ec2 waiter until instance_exists.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.client_waiter import client_waiter\n\n@flow\ndef example_client_wait_flow():\n    aws_credentials = AwsCredentials(\n        aws_access_key_id=\"access_key_id\",\n        aws_secret_access_key=\"secret_access_key\"\n    )\n\n    waiter = client_waiter(\n        \"ec2\",\n        \"instance_exists\",\n        aws_credentials\n    )\n\n    return waiter\nexample_client_wait_flow()\n
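If no built-in waiter fits your use case, you can supply a custom waiter definition. The sketch below is illustrative only; the waiter name, operation, delay, and acceptors follow the botocore waiter model format but are assumptions chosen for demonstration rather than values from the original docs:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.client_waiter import client_waiter\n\n# Hypothetical waiter that polls Batch DescribeJobs until the job finishes\ncustom_waiter_definition = {\n    \"version\": 2,\n    \"waiters\": {\n        \"JobComplete\": {\n            \"operation\": \"DescribeJobs\",\n            \"delay\": 30,\n            \"maxAttempts\": 40,\n            \"acceptors\": [\n                {\n                    \"matcher\": \"pathAll\",\n                    \"argument\": \"jobs[].status\",\n                    \"expected\": \"SUCCEEDED\",\n                    \"state\": \"success\",\n                },\n                {\n                    \"matcher\": \"pathAny\",\n                    \"argument\": \"jobs[].status\",\n                    \"expected\": \"FAILED\",\n                    \"state\": \"failure\",\n                },\n            ],\n        }\n    },\n}\n\n@flow\ndef example_custom_waiter_flow(job_id: str):\n    aws_credentials = AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\n    # Extra keyword arguments (here, jobs=[...]) are passed to waiter.wait(...)\n    client_waiter(\n        \"batch\",\n        \"JobComplete\",\n        aws_credentials,\n        waiter_definition=custom_waiter_definition,\n        jobs=[job_id],\n    )\n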
Source code in
prefect_aws/client_waiter.py
@task\nasync def client_waiter(\n client: str,\n waiter_name: str,\n aws_credentials: AwsCredentials,\n waiter_definition: Optional[Dict[str, Any]] = None,\n **waiter_kwargs: Optional[Dict[str, Any]],\n):\n \"\"\"\n Uses the underlying boto3 waiter functionality.\n\n Args:\n client: The AWS client on which to wait (e.g., 'client_wait', 'ec2', etc).\n waiter_name: The name of the waiter to instantiate.\n You may also use a custom waiter name, if you supply\n an accompanying waiter definition dict.\n aws_credentials: Credentials to use for authentication with AWS.\n waiter_definition: A valid custom waiter model, as a dict. Note that if\n you supply a custom definition, it is assumed that the provided\n 'waiter_name' is contained within the waiter definition dict.\n **waiter_kwargs: Arguments to pass to the `waiter.wait(...)` method. Will\n depend upon the specific waiter being called.\n\n Example:\n Run an ec2 waiter until instance_exists.\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.client_wait import client_waiter\n\n @flow\n def example_client_wait_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n\n waiter = client_waiter(\n \"ec2\",\n \"instance_exists\",\n aws_credentials\n )\n\n return waiter\n example_client_wait_flow()\n ```\n \"\"\"\n logger = get_run_logger()\n logger.info(\"Waiting on %s job\", client)\n\n boto_client = aws_credentials.get_boto3_session().client(client)\n\n if waiter_definition is not None:\n # Use user-provided waiter definition\n waiter_model = WaiterModel(waiter_definition)\n waiter = create_waiter_with_client(waiter_name, waiter_model, boto_client)\n elif waiter_name in boto_client.waiter_names:\n waiter = boto_client.get_waiter(waiter_name)\n else:\n raise ValueError(\n f\"The waiter name, {waiter_name}, is not a valid boto waiter; \"\n \"if using a custom waiter, you must provide a waiter definition\"\n )\n\n await run_sync_in_worker_thread(waiter.wait, **waiter_kwargs)\n
"},{"location":"contributing/","title":"Contributing","text":"
If you'd like to contribute a fix for an issue or add a feature to prefect-aws
, please propose changes through a pull request from a fork of the repository.
Here are the steps:
- Fork the repository
- Clone the forked repository
- Install the repository and its dependencies:
pip install -e \".[dev]\"\n
- Make desired changes
- Add tests
- Add an entry to CHANGELOG.md
- Install
pre-commit
to perform quality checks prior to commit: pre-commit install\n
git commit
, git push
, and create a pull request
"},{"location":"credentials/","title":"Credentials","text":""},{"location":"credentials/#prefect_aws.credentials","title":"
prefect_aws.credentials
","text":"
Module handling AWS credentials
"},{"location":"credentials/#prefect_aws.credentials-classes","title":"Classes","text":""},{"location":"credentials/#prefect_aws.credentials.AwsCredentials","title":"
AwsCredentials (CredentialsBlock)
pydantic-model
","text":"
Block used to manage authentication with AWS. AWS authentication is handled via the boto3
module. Refer to the boto3 docs for more info about the possible credential configurations.
Examples:
Load stored AWS credentials:
from prefect_aws import AwsCredentials\n\naws_credentials_block = AwsCredentials.load(\"BLOCK_NAME\")\n
Source code in
prefect_aws/credentials.py
class AwsCredentials(CredentialsBlock):\n \"\"\"\n Block used to manage authentication with AWS. AWS authentication is\n handled via the `boto3` module. Refer to the\n [boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html)\n for more info about the possible credential configurations.\n\n Example:\n Load stored AWS credentials:\n ```python\n from prefect_aws import AwsCredentials\n\n aws_credentials_block = AwsCredentials.load(\"BLOCK_NAME\")\n ```\n \"\"\" # noqa E501\n\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n _block_type_name = \"AWS Credentials\"\n _documentation_url = \"https://prefecthq.github.io/prefect-aws/credentials/#prefect_aws.credentials.AwsCredentials\" # noqa\n\n aws_access_key_id: Optional[str] = Field(\n default=None,\n description=\"A specific AWS access key ID.\",\n title=\"AWS Access Key ID\",\n )\n aws_secret_access_key: Optional[SecretStr] = Field(\n default=None,\n description=\"A specific AWS secret access key.\",\n title=\"AWS Access Key Secret\",\n )\n aws_session_token: Optional[str] = Field(\n default=None,\n description=(\n \"The session key for your AWS account. \"\n \"This is only needed when you are using temporary credentials.\"\n ),\n title=\"AWS Session Token\",\n )\n profile_name: Optional[str] = Field(\n default=None, description=\"The profile to use when creating your session.\"\n )\n region_name: Optional[str] = Field(\n default=None,\n description=\"The AWS Region where you want to create new connections.\",\n )\n aws_client_parameters: AwsClientParameters = Field(\n default_factory=AwsClientParameters,\n description=\"Extra parameters to initialize the Client.\",\n title=\"AWS Client Parameters\",\n )\n\n def get_boto3_session(self) -> boto3.Session:\n \"\"\"\n Returns an authenticated boto3 session that can be used to create clients\n for AWS services\n\n Example:\n Create an S3 client from an authorized boto3 session:\n ```python\n aws_credentials = AwsCredentials(\n aws_access_key_id = \"access_key_id\",\n aws_secret_access_key = \"secret_access_key\"\n )\n s3_client = aws_credentials.get_boto3_session().client(\"s3\")\n ```\n \"\"\"\n\n if self.aws_secret_access_key:\n aws_secret_access_key = self.aws_secret_access_key.get_secret_value()\n else:\n aws_secret_access_key = None\n\n return boto3.Session(\n aws_access_key_id=self.aws_access_key_id,\n aws_secret_access_key=aws_secret_access_key,\n aws_session_token=self.aws_session_token,\n profile_name=self.profile_name,\n region_name=self.region_name,\n )\n\n def get_client(self, client_type: Union[str, ClientType]) -> Any:\n \"\"\"\n Helper method to dynamically get a client type.\n\n Args:\n client_type: The client's service name.\n\n Returns:\n An authenticated client.\n\n Raises:\n ValueError: if the client is not supported.\n \"\"\"\n if isinstance(client_type, ClientType):\n client_type = client_type.value\n\n client = self.get_boto3_session().client(\n service_name=client_type, **self.aws_client_parameters.get_params_override()\n )\n return client\n\n def get_s3_client(self) -> S3Client:\n \"\"\"\n Gets an authenticated S3 client.\n\n Returns:\n An authenticated S3 client.\n \"\"\"\n return self.get_client(client_type=ClientType.S3)\n\n def get_secrets_manager_client(self) -> SecretsManagerClient:\n \"\"\"\n Gets an authenticated Secrets Manager client.\n\n Returns:\n An authenticated Secrets Manager client.\n \"\"\"\n return 
self.get_client(client_type=ClientType.SECRETS_MANAGER)\n
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials-attributes","title":"Attributes","text":""},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.aws_access_key_id","title":"
aws_access_key_id: str
pydantic-field
","text":"
A specific AWS access key ID.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.aws_client_parameters","title":"
aws_client_parameters: AwsClientParameters
pydantic-field
","text":"
Extra parameters to initialize the Client.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.aws_secret_access_key","title":"
aws_secret_access_key: SecretStr
pydantic-field
","text":"
A specific AWS secret access key.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.aws_session_token","title":"
aws_session_token: str
pydantic-field
","text":"
The session key for your AWS account. This is only needed when you are using temporary credentials.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.profile_name","title":"
profile_name: str
pydantic-field
","text":"
The profile to use when creating your session.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.region_name","title":"
region_name: str
pydantic-field
","text":"
The AWS Region where you want to create new connections.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials-methods","title":"Methods","text":""},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.get_boto3_session","title":"
get_boto3_session
","text":"
Returns an authenticated boto3 session that can be used to create clients for AWS services
Examples:
Create an S3 client from an authorized boto3 session:
aws_credentials = AwsCredentials(\n aws_access_key_id = \"access_key_id\",\n aws_secret_access_key = \"secret_access_key\"\n )\ns3_client = aws_credentials.get_boto3_session().client(\"s3\")\n
Source code in
prefect_aws/credentials.py
def get_boto3_session(self) -> boto3.Session:\n \"\"\"\n Returns an authenticated boto3 session that can be used to create clients\n for AWS services\n\n Example:\n Create an S3 client from an authorized boto3 session:\n ```python\n aws_credentials = AwsCredentials(\n aws_access_key_id = \"access_key_id\",\n aws_secret_access_key = \"secret_access_key\"\n )\n s3_client = aws_credentials.get_boto3_session().client(\"s3\")\n ```\n \"\"\"\n\n if self.aws_secret_access_key:\n aws_secret_access_key = self.aws_secret_access_key.get_secret_value()\n else:\n aws_secret_access_key = None\n\n return boto3.Session(\n aws_access_key_id=self.aws_access_key_id,\n aws_secret_access_key=aws_secret_access_key,\n aws_session_token=self.aws_session_token,\n profile_name=self.profile_name,\n region_name=self.region_name,\n )\n
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.get_client","title":"
get_client
","text":"
Helper method to dynamically get a client type.
Parameters:
Name Type Description Default
client_type
Union[str, prefect_aws.credentials.ClientType]
The client's service name.
required
Returns:
Type Description
Any
An authenticated client.
Exceptions:
Type Description
ValueError
if the client is not supported.
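For example, a minimal sketch (the block name is a placeholder) showing that the client type may be passed either as a string or as a ClientType member:
from prefect_aws import AwsCredentials\nfrom prefect_aws.credentials import ClientType\n\naws_credentials = AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\")\n\n# Both calls return an authenticated boto3 ECS client\necs_client = aws_credentials.get_client(\"ecs\")\necs_client = aws_credentials.get_client(ClientType.ECS)\n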
Source code in
prefect_aws/credentials.py
def get_client(self, client_type: Union[str, ClientType]) -> Any:\n \"\"\"\n Helper method to dynamically get a client type.\n\n Args:\n client_type: The client's service name.\n\n Returns:\n An authenticated client.\n\n Raises:\n ValueError: if the client is not supported.\n \"\"\"\n if isinstance(client_type, ClientType):\n client_type = client_type.value\n\n client = self.get_boto3_session().client(\n service_name=client_type, **self.aws_client_parameters.get_params_override()\n )\n return client\n
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.get_s3_client","title":"
get_s3_client
","text":"
Gets an authenticated S3 client.
Returns:
Type Description
S3Client
An authenticated S3 client.
Source code in
prefect_aws/credentials.py
def get_s3_client(self) -> S3Client:\n \"\"\"\n Gets an authenticated S3 client.\n\n Returns:\n An authenticated S3 client.\n \"\"\"\n return self.get_client(client_type=ClientType.S3)\n
"},{"location":"credentials/#prefect_aws.credentials.AwsCredentials.get_secrets_manager_client","title":"
get_secrets_manager_client
","text":"
Gets an authenticated Secrets Manager client.
Returns:
Type Description
SecretsManagerClient
An authenticated Secrets Manager client.
Source code in
prefect_aws/credentials.py
def get_secrets_manager_client(self) -> SecretsManagerClient:\n \"\"\"\n Gets an authenticated Secrets Manager client.\n\n Returns:\n An authenticated Secrets Manager client.\n \"\"\"\n return self.get_client(client_type=ClientType.SECRETS_MANAGER)\n
"},{"location":"credentials/#prefect_aws.credentials.ClientType","title":"
ClientType (Enum)
","text":"
An enumeration.
Source code in
prefect_aws/credentials.py
class ClientType(Enum):\n S3 = \"s3\"\n ECS = \"ecs\"\n BATCH = \"batch\"\n SECRETS_MANAGER = \"secretsmanager\"\n
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials","title":"
MinIOCredentials (CredentialsBlock)
pydantic-model
","text":"
Block used to manage authentication with MinIO. Refer to the MinIO docs for more info about the possible credential configurations.
Attributes:
Name Type Description
minio_root_user
str
Admin or root user.
minio_root_password
SecretStr
Admin or root password.
region_name
Optional[str]
Location of server, e.g. \"us-east-1\".
Examples:
Load stored MinIO credentials:
from prefect_aws import MinIOCredentials\n\nminio_credentials_block = MinIOCredentials.load(\"BLOCK_NAME\")\n
Source code in
prefect_aws/credentials.py
class MinIOCredentials(CredentialsBlock):\n \"\"\"\n Block used to manage authentication with MinIO. Refer to the\n [MinIO docs](https://docs.min.io/docs/minio-server-configuration-guide.html)\n for more info about the possible credential configurations.\n\n Attributes:\n minio_root_user: Admin or root user.\n minio_root_password: Admin or root password.\n region_name: Location of server, e.g. \"us-east-1\".\n\n Example:\n Load stored MinIO credentials:\n ```python\n from prefect_aws import MinIOCredentials\n\n minio_credentials_block = MinIOCredentials.load(\"BLOCK_NAME\")\n ```\n \"\"\" # noqa E501\n\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/676cb17bcbdff601f97e0a02ff8bcb480e91ff40-250x250.png\" # noqa\n _block_type_name = \"MinIO Credentials\"\n _description = (\n \"Block used to manage authentication with MinIO. Refer to the MinIO \"\n \"docs: https://docs.min.io/docs/minio-server-configuration-guide.html \"\n \"for more info about the possible credential configurations.\"\n )\n _documentation_url = \"https://prefecthq.github.io/prefect-aws/credentials/#prefect_aws.credentials.MinIOCredentials\" # noqa\n\n minio_root_user: str = Field(default=..., description=\"Admin or root user.\")\n minio_root_password: SecretStr = Field(\n default=..., description=\"Admin or root password.\"\n )\n region_name: Optional[str] = Field(\n default=None,\n description=\"The AWS Region where you want to create new connections.\",\n )\n aws_client_parameters: AwsClientParameters = Field(\n default_factory=AwsClientParameters,\n description=\"Extra parameters to initialize the Client.\",\n )\n\n def get_boto3_session(self) -> boto3.Session:\n \"\"\"\n Returns an authenticated boto3 session that can be used to create clients\n and perform object operations on MinIO server.\n\n Example:\n Create an S3 client from an authorized boto3 session\n\n ```python\n minio_credentials = MinIOCredentials(\n minio_root_user = \"minio_root_user\",\n minio_root_password = \"minio_root_password\"\n )\n s3_client = minio_credentials.get_boto3_session().client(\n service=\"s3\",\n endpoint_url=\"http://localhost:9000\"\n )\n ```\n \"\"\"\n\n minio_root_password = (\n self.minio_root_password.get_secret_value()\n if self.minio_root_password\n else None\n )\n\n return boto3.Session(\n aws_access_key_id=self.minio_root_user,\n aws_secret_access_key=minio_root_password,\n region_name=self.region_name,\n )\n\n def get_client(self, client_type: Union[str, ClientType]) -> Any:\n \"\"\"\n Helper method to dynamically get a client type.\n\n Args:\n client_type: The client's service name.\n\n Returns:\n An authenticated client.\n\n Raises:\n ValueError: if the client is not supported.\n \"\"\"\n if isinstance(client_type, ClientType):\n client_type = client_type.value\n\n client = self.get_boto3_session().client(\n service_name=client_type, **self.aws_client_parameters.get_params_override()\n )\n return client\n\n def get_s3_client(self) -> S3Client:\n \"\"\"\n Gets an authenticated S3 client.\n\n Returns:\n An authenticated S3 client.\n \"\"\"\n return self.get_client(client_type=ClientType.S3)\n
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials-attributes","title":"Attributes","text":""},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.aws_client_parameters","title":"
aws_client_parameters: AwsClientParameters
pydantic-field
","text":"
Extra parameters to initialize the Client.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.minio_root_password","title":"
minio_root_password: SecretStr
pydantic-field
required
","text":"
Admin or root password.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.minio_root_user","title":"
minio_root_user: str
pydantic-field
required
","text":"
Admin or root user.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.region_name","title":"
region_name: str
pydantic-field
","text":"
The AWS Region where you want to create new connections.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials-methods","title":"Methods","text":""},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.get_boto3_session","title":"
get_boto3_session
","text":"
Returns an authenticated boto3 session that can be used to create clients and perform object operations on MinIO server.
Examples:
Create an S3 client from an authorized boto3 session
minio_credentials = MinIOCredentials(\n minio_root_user = \"minio_root_user\",\n minio_root_password = \"minio_root_password\"\n)\ns3_client = minio_credentials.get_boto3_session().client(\n service=\"s3\",\n endpoint_url=\"http://localhost:9000\"\n)\n
Source code in
prefect_aws/credentials.py
def get_boto3_session(self) -> boto3.Session:\n \"\"\"\n Returns an authenticated boto3 session that can be used to create clients\n and perform object operations on MinIO server.\n\n Example:\n Create an S3 client from an authorized boto3 session\n\n ```python\n minio_credentials = MinIOCredentials(\n minio_root_user = \"minio_root_user\",\n minio_root_password = \"minio_root_password\"\n )\n s3_client = minio_credentials.get_boto3_session().client(\n service=\"s3\",\n endpoint_url=\"http://localhost:9000\"\n )\n ```\n \"\"\"\n\n minio_root_password = (\n self.minio_root_password.get_secret_value()\n if self.minio_root_password\n else None\n )\n\n return boto3.Session(\n aws_access_key_id=self.minio_root_user,\n aws_secret_access_key=minio_root_password,\n region_name=self.region_name,\n )\n
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.get_client","title":"
get_client
","text":"
Helper method to dynamically get a client type.
Parameters:
Name Type Description Default
client_type
Union[str, prefect_aws.credentials.ClientType]
The client's service name.
required
Returns:
Type Description
Any
An authenticated client.
Exceptions:
Type Description
ValueError
if the client is not supported.
Source code in
prefect_aws/credentials.py
def get_client(self, client_type: Union[str, ClientType]) -> Any:\n \"\"\"\n Helper method to dynamically get a client type.\n\n Args:\n client_type: The client's service name.\n\n Returns:\n An authenticated client.\n\n Raises:\n ValueError: if the client is not supported.\n \"\"\"\n if isinstance(client_type, ClientType):\n client_type = client_type.value\n\n client = self.get_boto3_session().client(\n service_name=client_type, **self.aws_client_parameters.get_params_override()\n )\n return client\n
"},{"location":"credentials/#prefect_aws.credentials.MinIOCredentials.get_s3_client","title":"
get_s3_client
","text":"
Gets an authenticated S3 client.
Returns:
Type Description
S3Client
An authenticated S3 client.
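As an illustrative sketch (assuming the AwsClientParameters class lives in prefect_aws.client_parameters, and using placeholder credentials and a placeholder URL), you can point the returned client at a local MinIO server by supplying an endpoint_url:
from prefect_aws import MinIOCredentials\nfrom prefect_aws.client_parameters import AwsClientParameters\n\nminio_credentials = MinIOCredentials(\n    minio_root_user=\"minio_root_user\",\n    minio_root_password=\"minio_root_password\",\n    # endpoint_url is forwarded to the boto3 client so requests go to MinIO\n    aws_client_parameters=AwsClientParameters(endpoint_url=\"http://localhost:9000\"),\n)\ns3_client = minio_credentials.get_s3_client()\n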
Source code in
prefect_aws/credentials.py
def get_s3_client(self) -> S3Client:\n \"\"\"\n Gets an authenticated S3 client.\n\n Returns:\n An authenticated S3 client.\n \"\"\"\n return self.get_client(client_type=ClientType.S3)\n
"},{"location":"ecs/","title":"ECS","text":""},{"location":"ecs/#prefect_aws.ecs","title":"
prefect_aws.ecs
","text":"
Integrations with the Amazon Elastic Container Service.
Examples:
Run a task using ECS Fargate
ECSTask(command=[\"echo\", \"hello world\"]).run()\n
Run a task using ECS Fargate with a spot container instance
ECSTask(command=[\"echo\", \"hello world\"], launch_type=\"FARGATE_SPOT\").run()\n
Run a task using ECS with an EC2 container instance
ECSTask(command=[\"echo\", \"hello world\"], launch_type=\"EC2\").run()\n
Run a task on a specific VPC using ECS Fargate
ECSTask(command=[\"echo\", \"hello world\"], vpc_id=\"vpc-01abcdf123456789a\").run()\n
Run a task and stream the container's output to the local terminal. Note an execution role must be provided with permissions: logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents.
ECSTask(\n command=[\"echo\", \"hello world\"],\n stream_output=True,\n execution_role_arn=\"...\"\n)\n
Run a task using an existing task definition as a base
ECSTask(command=[\"echo\", \"hello world\"], task_definition_arn=\"arn:aws:ecs:...\")\n
Run a task with a specific image
ECSTask(command=[\"echo\", \"hello world\"], image=\"alpine:latest\")\n
Run a task with custom memory and CPU requirements
ECSTask(command=[\"echo\", \"hello world\"], memory=4096, cpu=2048)\n
Run a task with custom environment variables
ECSTask(command=[\"echo\", \"hello $PLANET\"], env={\"PLANET\": \"earth\"})\n
Run a task in a specific ECS cluster
ECSTask(command=[\"echo\", \"hello world\"], cluster=\"my-cluster-name\")\n
Run a task with custom VPC subnets
ECSTask(\n command=[\"echo\", \"hello world\"],\n task_customizations=[\n {\n \"op\": \"add\",\n \"path\": \"/networkConfiguration/awsvpcConfiguration/subnets\",\n \"value\": [\"subnet-80b6fbcd\", \"subnet-42a6fdgd\"],\n },\n ]\n)\n
Run a task without a public IP assigned
ECSTask(\n command=[\"echo\", \"hello world\"],\n vpc_id=\"vpc-01abcdf123456789a\",\n task_customizations=[\n {\n \"op\": \"replace\",\n \"path\": \"/networkConfiguration/awsvpcConfiguration/assignPublicIp\",\n \"value\": \"DISABLED\",\n },\n ]\n)\n
Run a task with custom VPC security groups
ECSTask(\n command=[\"echo\", \"hello world\"],\n vpc_id=\"vpc-01abcdf123456789a\",\n task_customizations=[\n {\n \"op\": \"add\",\n \"path\": \"/networkConfiguration/awsvpcConfiguration/securityGroups\",\n \"value\": [\"sg-d72e9599956a084f5\"],\n },\n ],\n)\n
"},{"location":"ecs/#prefect_aws.ecs-classes","title":"Classes","text":""},{"location":"ecs/#prefect_aws.ecs.ECSTask","title":"
ECSTask (Infrastructure)
pydantic-model
","text":"
Run a command as an ECS task.
Attributes:
Name Type Description
type
Literal['ecs-task']
The slug for this task type with a default value of \"ecs-task\".
aws_credentials
AwsCredentials
The AWS credentials to use to connect to ECS with a default factory of AwsCredentials.
task_definition_arn
Optional[str]
An optional identifier for an existing task definition to use. If fields are set on the ECSTask that conflict with the task definition, a new copy will be registered with the required values. Cannot be used with task_definition. If not provided, Prefect will generate and register a minimal task definition.
task_definition
Optional[dict]
An optional ECS task definition to use. Prefect may set defaults or override fields on this task definition to match other ECSTask fields. Cannot be used with task_definition_arn. If not provided, Prefect will generate and register a minimal task definition.
family
Optional[str]
An optional family for the task definition. If not provided, it will be inferred from the task definition. If the task definition does not have a family, the name will be generated. When flow and deployment metadata is available, the generated name will include their names. Values for this field will be slugified to match AWS character requirements.
image
Optional[str]
An optional image to use for the Prefect container in the task. If this value is not null, it will override the value in the task definition. This value defaults to a Prefect base image matching your local versions.
auto_deregister_task_definition
bool
A boolean that controls if any task definitions that are created by this block will be deregistered or not. Existing task definitions linked by ARN will never be deregistered. Deregistering a task definition does not remove it from your AWS account, instead it will be marked as INACTIVE.
cpu
int
The amount of CPU to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of ECS_DEFAULT_CPU will be used unless present on the task definition.
memory
int
The amount of memory to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of ECS_DEFAULT_MEMORY will be used unless present on the task definition.
execution_role_arn
str
An execution role to use for the task. This controls the permissions of the task when it is launching. If this value is not null, it will override the value in the task definition. An execution role must be provided to capture logs from the container.
configure_cloudwatch_logs
bool
A boolean that controls if the Prefect container will be configured to send its output to the AWS CloudWatch logs service or not. This functionality requires an execution role with permissions to create log streams and groups.
cloudwatch_logs_options
Dict[str, str]
A dictionary of options to pass to the CloudWatch logs configuration.
stream_output
bool
A boolean indicating whether logs will be streamed from the Prefect container to the local console.
launch_type
Optional[Literal['FARGATE', 'EC2', 'EXTERNAL', 'FARGATE_SPOT']]
An optional launch type for the ECS task run infrastructure.
vpc_id
Optional[str]
An optional VPC ID to link the task run to. This is only applicable when using the 'awsvpc' network mode for your task.
cluster
Optional[str]
An optional ECS cluster to run the task in. The ARN or name may be provided. If not provided, the default cluster will be used.
env
Dict[str, Optional[str]]
A dictionary of environment variables to provide to the task run. These variables are set on the Prefect container at task runtime.
task_role_arn
str
An optional role to attach to the task run. This controls the permissions of the task while it is running.
task_customizations
JsonPatch
A list of JSON 6902 patches to apply to the task run request. If a string is given, it will be parsed as a JSON expression.
task_start_timeout_seconds
int
The amount of time to watch for the start of the ECS task before marking it as failed. The task must enter a RUNNING state to be considered started.
task_watch_poll_interval
float
The amount of time to wait between AWS API calls while monitoring the state of an ECS task.
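As an illustrative sketch (mirroring the disable-public-IP example above), task_customizations may also be given as a JSON string, which the block parses into a JSON 6902 patch:
from prefect_aws.ecs import ECSTask\n\necs_task = ECSTask(\n    command=[\"echo\", \"hello world\"],\n    vpc_id=\"vpc-01abcdf123456789a\",\n    # A JSON string is parsed and applied as a JSON 6902 patch at run time\n    task_customizations='[{\"op\": \"replace\", \"path\": \"/networkConfiguration/awsvpcConfiguration/assignPublicIp\", \"value\": \"DISABLED\"}]',\n)\n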
Source code in
prefect_aws/ecs.py
class ECSTask(Infrastructure):\n \"\"\"\n Run a command as an ECS task.\n\n Attributes:\n type: The slug for this task type with a default value of \"ecs-task\".\n aws_credentials: The AWS credentials to use to connect to ECS with a\n default factory of AwsCredentials.\n task_definition_arn: An optional identifier for an existing task definition\n to use. If fields are set on the ECSTask that conflict with the task\n definition, a new copy will be registered with the required values.\n Cannot be used with task_definition. If not provided, Prefect will\n generate and register a minimal task definition.\n task_definition: An optional ECS task definition to use. Prefect may set\n defaults or override fields on this task definition to match other\n ECSTask fields. Cannot be used with task_definition_arn.\n If not provided, Prefect will generate and register\n a minimal task definition.\n family: An optional family for the task definition. If not provided,\n it will be inferred from the task definition. If the task definition\n does not have a family, the name will be generated. When flow and\n deployment metadata is available, the generated name will include\n their names. Values for this field will be slugified to match\n AWS character requirements.\n image: An optional image to use for the Prefect container in the task.\n If this value is not null, it will override the value in the task\n definition. This value defaults to a Prefect base image matching\n your local versions.\n auto_deregister_task_definition: A boolean that controls if any task\n definitions that are created by this block will be deregistered\n or not. Existing task definitions linked by ARN will never be\n deregistered. Deregistering a task definition does not remove\n it from your AWS account, instead it will be marked as INACTIVE.\n cpu: The amount of CPU to provide to the ECS task. Valid amounts are\n specified in the AWS documentation. If not provided, a default\n value of ECS_DEFAULT_CPU will be used unless present on\n the task definition.\n memory: The amount of memory to provide to the ECS task.\n Valid amounts are specified in the AWS documentation.\n If not provided, a default value of ECS_DEFAULT_MEMORY\n will be used unless present on the task definition.\n execution_role_arn: An execution role to use for the task.\n This controls the permissions of the task when it is launching.\n If this value is not null, it will override the value in the task\n definition. An execution role must be provided to capture logs\n from the container.\n configure_cloudwatch_logs: A boolean that controls if the Prefect\n container will be configured to send its output to the\n AWS CloudWatch logs service or not. This functionality requires\n an execution role with permissions to create log streams and groups.\n cloudwatch_logs_options: A dictionary of options to pass to\n the CloudWatch logs configuration.\n stream_output: A boolean indicating whether logs will be\n streamed from the Prefect container to the local console.\n launch_type: An optional launch type for the ECS task run infrastructure.\n vpc_id: An optional VPC ID to link the task run to.\n This is only applicable when using the 'awsvpc' network mode for your task.\n cluster: An optional ECS cluster to run the task in.\n The ARN or name may be provided. If not provided,\n the default cluster will be used.\n env: A dictionary of environment variables to provide to\n the task run. 
These variables are set on the\n Prefect container at task runtime.\n task_role_arn: An optional role to attach to the task run.\n This controls the permissions of the task while it is running.\n task_customizations: A list of JSON 6902 patches to apply to the task\n run request. If a string is given, it will parsed as a JSON expression.\n task_start_timeout_seconds: The amount of time to watch for the\n start of the ECS task before marking it as failed. The task must\n enter a RUNNING state to be considered started.\n task_watch_poll_interval: The amount of time to wait between AWS API\n calls while monitoring the state of an ECS task.\n \"\"\"\n\n _block_type_slug = \"ecs-task\"\n _block_type_name = \"ECS Task\"\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n _description = \"Run a command as an ECS task.\" # noqa\n _documentation_url = (\n \"https://prefecthq.github.io/prefect-aws/ecs/#prefect_aws.ecs.ECSTask\" # noqa\n )\n\n type: Literal[\"ecs-task\"] = Field(\n \"ecs-task\", description=\"The slug for this task type.\"\n )\n\n aws_credentials: AwsCredentials = Field(\n title=\"AWS Credentials\",\n default_factory=AwsCredentials,\n description=\"The AWS credentials to use to connect to ECS.\",\n )\n\n # Task definition settings\n task_definition_arn: Optional[str] = Field(\n default=None,\n description=(\n \"An identifier for an existing task definition to use. If fields are set \"\n \"on the `ECSTask` that conflict with the task definition, a new copy \"\n \"will be registered with the required values. \"\n \"Cannot be used with `task_definition`. If not provided, Prefect will \"\n \"generate and register a minimal task definition.\"\n ),\n )\n task_definition: Optional[dict] = Field(\n default=None,\n description=(\n \"An ECS task definition to use. Prefect may set defaults or override \"\n \"fields on this task definition to match other `ECSTask` fields. \"\n \"Cannot be used with `task_definition_arn`. If not provided, Prefect will \"\n \"generate and register a minimal task definition.\"\n ),\n )\n family: Optional[str] = Field(\n default=None,\n description=(\n \"A family for the task definition. If not provided, it will be inferred \"\n \"from the task definition. If the task definition does not have a family, \"\n \"the name will be generated. When flow and deployment metadata is \"\n \"available, the generated name will include their names. Values for this \"\n \"field will be slugified to match AWS character requirements.\"\n ),\n )\n image: Optional[str] = Field(\n default=None,\n description=(\n \"The image to use for the Prefect container in the task. If this value is \"\n \"not null, it will override the value in the task definition. This value \"\n \"defaults to a Prefect base image matching your local versions.\"\n ),\n )\n auto_deregister_task_definition: bool = Field(\n default=True,\n description=(\n \"If set, any task definitions that are created by this block will be \"\n \"deregistered. Existing task definitions linked by ARN will never be \"\n \"deregistered. Deregistering a task definition does not remove it from \"\n \"your AWS account, instead it will be marked as INACTIVE.\"\n ),\n )\n\n # Mixed task definition / run settings\n cpu: int = Field(\n title=\"CPU\",\n default=None,\n description=(\n \"The amount of CPU to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. 
If not provided, a default value of \"\n f\"{ECS_DEFAULT_CPU} will be used unless present on the task definition.\"\n ),\n )\n memory: int = Field(\n default=None,\n description=(\n \"The amount of memory to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_MEMORY} will be used unless present on the task definition.\"\n ),\n )\n execution_role_arn: str = Field(\n title=\"Execution Role ARN\",\n default=None,\n description=(\n \"An execution role to use for the task. This controls the permissions of \"\n \"the task when it is launching. If this value is not null, it will \"\n \"override the value in the task definition. An execution role must be \"\n \"provided to capture logs from the container.\"\n ),\n )\n configure_cloudwatch_logs: bool = Field(\n default=None,\n description=(\n \"If `True`, the Prefect container will be configured to send its output \"\n \"to the AWS CloudWatch logs service. This functionality requires an \"\n \"execution role with logs:CreateLogStream, logs:CreateLogGroup, and \"\n \"logs:PutLogEvents permissions. The default for this field is `False` \"\n \"unless `stream_output` is set.\"\n ),\n )\n cloudwatch_logs_options: Dict[str, str] = Field(\n default_factory=dict,\n description=(\n \"When `configure_cloudwatch_logs` is enabled, this setting may be used to \"\n \"pass additional options to the CloudWatch logs configuration or override \"\n \"the default options. See the AWS documentation for available options. \"\n \"https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html#create_awslogs_logdriver_options.\" # noqa\n ),\n )\n stream_output: bool = Field(\n default=None,\n description=(\n \"If `True`, logs will be streamed from the Prefect container to the local \"\n \"console. Unless you have configured AWS CloudWatch logs manually on your \"\n \"task definition, this requires the same prerequisites outlined in \"\n \"`configure_cloudwatch_logs`.\"\n ),\n )\n\n # Task run settings\n launch_type: Optional[Literal[\"FARGATE\", \"EC2\", \"EXTERNAL\", \"FARGATE_SPOT\"]] = (\n Field(\n default=\"FARGATE\",\n description=(\n \"The type of ECS task run infrastructure that should be used. Note that\"\n \" 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure\"\n \" the proper capacity provider stategy if set here.\"\n ),\n )\n )\n vpc_id: Optional[str] = Field(\n title=\"VPC ID\",\n default=None,\n description=(\n \"The AWS VPC to link the task run to. This is only applicable when using \"\n \"the 'awsvpc' network mode for your task. FARGATE tasks require this \"\n \"network mode, but for EC2 tasks the default network mode is 'bridge'. \"\n \"If using the 'awsvpc' network mode and this field is null, your default \"\n \"VPC will be used. If no default VPC can be found, the task run will fail.\"\n ),\n )\n cluster: Optional[str] = Field(\n default=None,\n description=(\n \"The ECS cluster to run the task in. The ARN or name may be provided. If \"\n \"not provided, the default cluster will be used.\"\n ),\n )\n env: Dict[str, Optional[str]] = Field(\n title=\"Environment Variables\",\n default_factory=dict,\n description=(\n \"Environment variables to provide to the task run. These variables are set \"\n \"on the Prefect container at task runtime. These will not be set on the \"\n \"task definition.\"\n ),\n )\n task_role_arn: str = Field(\n title=\"Task Role ARN\",\n default=None,\n description=(\n \"A role to attach to the task run. 
This controls the permissions of the \"\n \"task while it is running.\"\n ),\n )\n task_customizations: JsonPatch = Field(\n default_factory=lambda: JsonPatch([]),\n description=(\n \"A list of JSON 6902 patches to apply to the task run request. \"\n \"If a string is given, it will parsed as a JSON expression.\"\n ),\n )\n\n # Execution settings\n task_start_timeout_seconds: int = Field(\n default=120,\n description=(\n \"The amount of time to watch for the start of the ECS task \"\n \"before marking it as failed. The task must enter a RUNNING state to be \"\n \"considered started.\"\n ),\n )\n task_watch_poll_interval: float = Field(\n default=5.0,\n description=(\n \"The amount of time to wait between AWS API calls while monitoring the \"\n \"state of an ECS task.\"\n ),\n )\n\n @root_validator(pre=True)\n def set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n\n @root_validator\n def configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_definition_arn\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n\n @root_validator\n def cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n\n @root_validator(pre=True)\n def image_is_required(cls, values: dict) -> dict:\n \"\"\"\n Enforces that an image is available if image is `None`.\n \"\"\"\n has_image = bool(values.get(\"image\"))\n has_task_definition_arn = bool(values.get(\"task_definition_arn\"))\n\n # The image can only be null when the task_definition_arn is set\n if has_image or has_task_definition_arn:\n return values\n\n prefect_container = (\n get_prefect_container(\n (values.get(\"task_definition\") or {}).get(\"containerDefinitions\", [])\n )\n or {}\n )\n image_in_task_definition = prefect_container.get(\"image\")\n\n # If a task_definition is given with a prefect container image, use that value\n if image_in_task_definition:\n values[\"image\"] = image_in_task_definition\n # Otherwise, it should default to the Prefect base image\n else:\n values[\"image\"] = get_prefect_image_name()\n return values\n\n @validator(\"task_customizations\", pre=True)\n def cast_customizations_to_a_json_patch(\n cls, value: 
Union[List[Dict], JsonPatch, str]\n ) -> JsonPatch:\n \"\"\"\n Casts lists to JsonPatch instances.\n \"\"\"\n if isinstance(value, str):\n value = json.loads(value)\n if isinstance(value, list):\n return JsonPatch(value)\n return value # type: ignore\n\n class Config:\n \"\"\"Configuration of pydantic.\"\"\"\n\n # Support serialization of the 'JsonPatch' type\n arbitrary_types_allowed = True\n json_encoders = {JsonPatch: lambda p: p.patch}\n\n def dict(self, *args, **kwargs) -> Dict:\n \"\"\"\n Convert to a dictionary.\n \"\"\"\n # Support serialization of the 'JsonPatch' type\n d = super().dict(*args, **kwargs)\n d[\"task_customizations\"] = self.task_customizations.patch\n return d\n\n def prepare_for_flow_run(\n self: Self,\n flow_run: \"FlowRun\",\n deployment: Optional[\"Deployment\"] = None,\n flow: Optional[\"Flow\"] = None,\n ) -> Self:\n \"\"\"\n Return an copy of the block that is prepared to execute a flow run.\n \"\"\"\n new_family = None\n\n # Update the family if not specified elsewhere\n if (\n not self.family\n and not self.task_definition_arn\n and not (self.task_definition and self.task_definition.get(\"family\"))\n ):\n if flow and deployment:\n new_family = f\"{ECS_DEFAULT_FAMILY}__{flow.name}__{deployment.name}\"\n elif flow and not deployment:\n new_family = f\"{ECS_DEFAULT_FAMILY}__{flow.name}\"\n elif deployment and not flow:\n # This is a weird case and should not be see in the wild\n new_family = f\"{ECS_DEFAULT_FAMILY}__unknown-flow__{deployment.name}\"\n\n new = super().prepare_for_flow_run(flow_run, deployment=deployment, flow=flow)\n\n if new_family:\n return new.copy(update={\"family\": new_family})\n else:\n # Avoid an extra copy if not needed\n return new\n\n @sync_compatible\n async def run(self, task_status: Optional[TaskStatus] = None) -> ECSTaskResult:\n \"\"\"\n Run the configured task on ECS.\n \"\"\"\n boto_session, ecs_client = await run_sync_in_worker_thread(\n self._get_session_and_client\n )\n\n (\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition,\n ) = await run_sync_in_worker_thread(\n self._create_task_and_wait_for_start, boto_session, ecs_client\n )\n\n # Display a nice message indicating the command and image\n command = self.command or get_prefect_container(\n task_definition[\"containerDefinitions\"]\n ).get(\"command\", [])\n self.logger.info(\n f\"{self._log_prefix}: Running command {' '.join(command)!r} \"\n f\"in container {PREFECT_ECS_CONTAINER_NAME!r} ({self.image})...\"\n )\n\n # The task identifier is \"{cluster}::{task}\" where we use the configured cluster\n # if set to preserve matching by name rather than arn\n # Note \"::\" is used despite the Prefect standard being \":\" because ARNs contain\n # single colons.\n identifier = (self.cluster if self.cluster else cluster_arn) + \"::\" + task_arn\n\n if task_status:\n task_status.started(identifier)\n\n status_code = await run_sync_in_worker_thread(\n self._watch_task_and_get_exit_code,\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition and self.auto_deregister_task_definition,\n boto_session,\n ecs_client,\n )\n\n return ECSTaskResult(\n identifier=identifier,\n # If the container does not start the exit code can be null but we must\n # still report a status code. 
We use a -1 to indicate a special code.\n status_code=status_code if status_code is not None else -1,\n )\n\n @sync_compatible\n async def kill(self, identifier: str, grace_seconds: int = 30) -> None:\n \"\"\"\n Kill a task running on ECS.\n\n Args:\n identifier: A cluster and task arn combination. This should match a value\n yielded by `ECSTask.run`.\n \"\"\"\n if grace_seconds != 30:\n self.logger.warning(\n f\"Kill grace period of {grace_seconds}s requested, but AWS does not \"\n \"support dynamic grace period configuration so 30s will be used. \"\n \"See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html for configuration of grace periods.\" # noqa\n )\n cluster, task = parse_task_identifier(identifier)\n await run_sync_in_worker_thread(self._stop_task, cluster, task)\n\n @staticmethod\n def get_corresponding_worker_type() -> str:\n \"\"\"Return the corresponding worker type for this infrastructure block.\"\"\"\n return ECSWorker.type\n\n async def generate_work_pool_base_job_template(self) -> dict:\n \"\"\"\n Generate a base job template for a cloud-run work pool with the same\n configuration as this block.\n\n Returns:\n - dict: a base job template for a cloud-run work pool\n \"\"\"\n base_job_template = copy.deepcopy(ECSWorker.get_default_base_job_template())\n for key, value in self.dict(exclude_unset=True, exclude_defaults=True).items():\n if key == \"command\":\n base_job_template[\"variables\"][\"properties\"][\"command\"][\"default\"] = (\n shlex.join(value)\n )\n elif key in [\n \"type\",\n \"block_type_slug\",\n \"_block_document_id\",\n \"_block_document_name\",\n \"_is_anonymous\",\n \"task_customizations\",\n ]:\n continue\n elif key == \"aws_credentials\":\n if not self.aws_credentials._block_document_id:\n raise BlockNotSavedError(\n \"It looks like you are trying to use a block that\"\n \" has not been saved. 
Please call `.save` on your block\"\n \" before publishing it as a work pool.\"\n )\n base_job_template[\"variables\"][\"properties\"][\"aws_credentials\"][\n \"default\"\n ] = {\n \"$ref\": {\n \"block_document_id\": str(\n self.aws_credentials._block_document_id\n )\n }\n }\n elif key == \"task_definition\":\n base_job_template[\"job_configuration\"][\"task_definition\"] = value\n elif key in base_job_template[\"variables\"][\"properties\"]:\n base_job_template[\"variables\"][\"properties\"][key][\"default\"] = value\n else:\n self.logger.warning(\n f\"Variable {key!r} is not supported by Cloud Run work pools.\"\n \" Skipping.\"\n )\n\n if self.task_customizations:\n try:\n base_job_template[\"job_configuration\"][\"task_run_request\"] = (\n self.task_customizations.apply(\n base_job_template[\"job_configuration\"][\"task_run_request\"]\n )\n )\n except JsonPointerException:\n self.logger.warning(\n \"Unable to apply task customizations to the base job template.\"\n \"You may need to update the template manually.\"\n )\n\n return base_job_template\n\n def _stop_task(self, cluster: str, task: str) -> None:\n \"\"\"\n Stop a running ECS task.\n \"\"\"\n if self.cluster is not None and cluster != self.cluster:\n raise InfrastructureNotAvailable(\n \"Cannot stop ECS task: this infrastructure block has access to \"\n f\"cluster {self.cluster!r} but the task is running in cluster \"\n f\"{cluster!r}.\"\n )\n\n _, ecs_client = self._get_session_and_client()\n try:\n ecs_client.stop_task(cluster=cluster, task=task)\n except Exception as exc:\n # Raise a special exception if the task does not exist\n if \"ClusterNotFound\" in str(exc):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the cluster {cluster!r} could not be found.\"\n ) from exc\n if \"not find task\" in str(exc) or \"referenced task was not found\" in str(\n exc\n ):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the task {task!r} could not be found in \"\n f\"cluster {cluster!r}.\"\n ) from exc\n if \"no registered tasks\" in str(exc):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the cluster {cluster!r} has no tasks.\"\n ) from exc\n\n # Reraise unknown exceptions\n raise\n\n @property\n def _log_prefix(self) -> str:\n \"\"\"\n Internal property for generating a prefix for logs where `name` may be null\n \"\"\"\n if self.name is not None:\n return f\"ECSTask {self.name!r}\"\n else:\n return \"ECSTask\"\n\n def _get_session_and_client(self) -> Tuple[boto3.Session, _ECSClient]:\n \"\"\"\n Retrieve a boto3 session and ECS client\n \"\"\"\n boto_session = self.aws_credentials.get_boto3_session()\n ecs_client = boto_session.client(\"ecs\")\n return boto_session, ecs_client\n\n def _create_task_and_wait_for_start(\n self, boto_session: boto3.Session, ecs_client: _ECSClient\n ) -> Tuple[str, str, dict, bool]:\n \"\"\"\n Register the task definition, create the task run, and wait for it to start.\n\n Returns a tuple of\n - The task ARN\n - The task's cluster ARN\n - The task definition\n - A bool indicating if the task definition is newly registered\n \"\"\"\n new_task_definition_registered = False\n requested_task_definition = (\n self._retrieve_task_definition(ecs_client, self.task_definition_arn)\n if self.task_definition_arn\n else self.task_definition\n ) or {}\n task_definition_arn = requested_task_definition.get(\"taskDefinitionArn\", None)\n\n task_definition = self._prepare_task_definition(\n requested_task_definition, region=ecs_client.meta.region_name\n )\n\n # We must register the task 
definition if the arn is null or changes were made\n if task_definition != requested_task_definition or not task_definition_arn:\n # Before registering, check if the latest task definition in the family\n # can be used\n latest_task_definition = self._retrieve_latest_task_definition(\n ecs_client, task_definition[\"family\"]\n )\n if self._task_definitions_equal(latest_task_definition, task_definition):\n self.logger.debug(\n f\"{self._log_prefix}: The latest task definition matches the \"\n \"required task definition; using that instead of registering a new \"\n \" one.\"\n )\n task_definition_arn = latest_task_definition[\"taskDefinitionArn\"]\n else:\n if task_definition_arn:\n self.logger.warning(\n f\"{self._log_prefix}: Settings require changes to the linked \"\n \"task definition. A new task definition will be registered. \"\n + (\n \"Enable DEBUG level logs to see the difference.\"\n if self.logger.level > logging.DEBUG\n else \"\"\n )\n )\n self.logger.debug(\n f\"{self._log_prefix}: Diff for requested task definition\"\n + _pretty_diff(requested_task_definition, task_definition)\n )\n else:\n self.logger.info(\n f\"{self._log_prefix}: Registering task definition...\"\n )\n self.logger.debug(\n \"Task definition payload\\n\" + yaml.dump(task_definition)\n )\n\n task_definition_arn = self._register_task_definition(\n ecs_client, task_definition\n )\n new_task_definition_registered = True\n\n if task_definition.get(\"networkMode\") == \"awsvpc\":\n network_config = self._load_vpc_network_config(self.vpc_id, boto_session)\n else:\n network_config = None\n\n task_run = self._prepare_task_run(\n network_config=network_config,\n task_definition_arn=task_definition_arn,\n )\n self.logger.info(f\"{self._log_prefix}: Creating task run...\")\n self.logger.debug(\"Task run payload\\n\" + yaml.dump(task_run))\n\n try:\n task = self._run_task(ecs_client, task_run)\n task_arn = task[\"taskArn\"]\n cluster_arn = task[\"clusterArn\"]\n except Exception as exc:\n self._report_task_run_creation_failure(task_run, exc)\n\n # Raises an exception if the task does not start\n self.logger.info(f\"{self._log_prefix}: Waiting for task run to start...\")\n self._wait_for_task_start(\n task_arn, cluster_arn, ecs_client, timeout=self.task_start_timeout_seconds\n )\n\n return task_arn, cluster_arn, task_definition, new_task_definition_registered\n\n def _watch_task_and_get_exit_code(\n self,\n task_arn: str,\n cluster_arn: str,\n task_definition: dict,\n deregister_task_definition: bool,\n boto_session: boto3.Session,\n ecs_client: _ECSClient,\n ) -> Optional[int]:\n \"\"\"\n Wait for the task run to complete and retrieve the exit code of the Prefect\n container.\n \"\"\"\n\n # Wait for completion and stream logs\n task = self._wait_for_task_finish(\n task_arn, cluster_arn, task_definition, ecs_client, boto_session\n )\n\n if deregister_task_definition:\n ecs_client.deregister_task_definition(\n taskDefinition=task[\"taskDefinitionArn\"]\n )\n\n # Check the status code of the Prefect container\n prefect_container = get_prefect_container(task[\"containers\"])\n assert (\n prefect_container is not None\n ), f\"'prefect' container missing from task: {task}\"\n status_code = prefect_container.get(\"exitCode\")\n self._report_container_status_code(PREFECT_ECS_CONTAINER_NAME, status_code)\n\n return status_code\n\n def _task_definitions_equal(self, taskdef_1, taskdef_2) -> bool:\n \"\"\"\n Compare two task definitions.\n\n Since one may come from the AWS API and have populated defaults, we do our best\n to 
homogenize the definitions without changing their meaning.\n \"\"\"\n if taskdef_1 == taskdef_2:\n return True\n\n if taskdef_1 is None or taskdef_2 is None:\n return False\n\n taskdef_1 = copy.deepcopy(taskdef_1)\n taskdef_2 = copy.deepcopy(taskdef_2)\n\n def _set_aws_defaults(taskdef):\n \"\"\"Set defaults that AWS would set after registration\"\"\"\n container_definitions = taskdef.get(\"containerDefinitions\", [])\n essential = any(\n container.get(\"essential\") for container in container_definitions\n )\n if not essential:\n container_definitions[0].setdefault(\"essential\", True)\n\n taskdef.setdefault(\"networkMode\", \"bridge\")\n\n _set_aws_defaults(taskdef_1)\n _set_aws_defaults(taskdef_2)\n\n def _drop_empty_keys(dict_):\n \"\"\"Recursively drop keys with 'empty' values\"\"\"\n for key, value in tuple(dict_.items()):\n if not value:\n dict_.pop(key)\n if isinstance(value, dict):\n _drop_empty_keys(value)\n if isinstance(value, list):\n for v in value:\n if isinstance(v, dict):\n _drop_empty_keys(v)\n\n _drop_empty_keys(taskdef_1)\n _drop_empty_keys(taskdef_2)\n\n # Clear fields that change on registration for comparison\n for field in POST_REGISTRATION_FIELDS:\n taskdef_1.pop(field, None)\n taskdef_2.pop(field, None)\n\n return taskdef_1 == taskdef_2\n\n def preview(self) -> str:\n \"\"\"\n Generate a preview of the task definition and task run that will be sent to AWS.\n \"\"\"\n preview = \"\"\n\n task_definition_arn = self.task_definition_arn or \"<registered at runtime>\"\n\n if self.task_definition or not self.task_definition_arn:\n task_definition = self._prepare_task_definition(\n self.task_definition or {},\n region=self.aws_credentials.region_name\n or \"<loaded from client at runtime>\",\n )\n preview += \"---\\n# Task definition\\n\"\n preview += yaml.dump(task_definition)\n preview += \"\\n\"\n else:\n task_definition = None\n\n if task_definition and task_definition.get(\"networkMode\") == \"awsvpc\":\n vpc = \"the default VPC\" if not self.vpc_id else self.vpc_id\n network_config = {\n \"awsvpcConfiguration\": {\n \"subnets\": f\"<loaded from {vpc} at runtime>\",\n \"assignPublicIp\": \"ENABLED\",\n }\n }\n else:\n network_config = None\n\n task_run = self._prepare_task_run(network_config, task_definition_arn)\n preview += \"---\\n# Task run request\\n\"\n preview += yaml.dump(task_run)\n\n return preview\n\n def _report_container_status_code(\n self, name: str, status_code: Optional[int]\n ) -> None:\n \"\"\"\n Display a log for the given container status code.\n \"\"\"\n if status_code is None:\n self.logger.error(\n f\"{self._log_prefix}: Task exited without reporting an exit status \"\n f\"for container {name!r}.\"\n )\n elif status_code == 0:\n self.logger.info(\n f\"{self._log_prefix}: Container {name!r} exited successfully.\"\n )\n else:\n self.logger.warning(\n f\"{self._log_prefix}: Container {name!r} exited with non-zero exit \"\n f\"code {status_code}.\"\n )\n\n def _report_task_run_creation_failure(self, task_run: dict, exc: Exception) -> None:\n \"\"\"\n Wrap common AWS task run creation failures with nicer user-facing messages.\n \"\"\"\n # AWS generates exception types at runtime so they must be captured a bit\n # differently than normal.\n if \"ClusterNotFoundException\" in str(exc):\n cluster = task_run.get(\"cluster\", \"default\")\n raise RuntimeError(\n f\"Failed to run ECS task, cluster {cluster!r} not found. 
\"\n \"Confirm that the cluster is configured in your region.\"\n ) from exc\n elif \"No Container Instances\" in str(exc) and self.launch_type == \"EC2\":\n cluster = task_run.get(\"cluster\", \"default\")\n raise RuntimeError(\n f\"Failed to run ECS task, cluster {cluster!r} does not appear to \"\n \"have any container instances associated with it. Confirm that you \"\n \"have EC2 container instances available.\"\n ) from exc\n elif (\n \"failed to validate logger args\" in str(exc)\n and \"AccessDeniedException\" in str(exc)\n and self.configure_cloudwatch_logs\n ):\n raise RuntimeError(\n \"Failed to run ECS task, the attached execution role does not appear \"\n \"to have sufficient permissions. Ensure that the execution role \"\n f\"{self.execution_role!r} has permissions logs:CreateLogStream, \"\n \"logs:CreateLogGroup, and logs:PutLogEvents.\"\n )\n else:\n raise\n\n def _watch_task_run(\n self,\n task_arn: str,\n cluster_arn: str,\n ecs_client: _ECSClient,\n current_status: str = \"UNKNOWN\",\n until_status: str = None,\n timeout: int = None,\n ) -> Generator[None, None, dict]:\n \"\"\"\n Watches an ECS task run by querying every `poll_interval` seconds. After each\n query, the retrieved task is yielded. This function returns when the task run\n reaches a STOPPED status or the provided `until_status`.\n\n Emits a log each time the status changes.\n \"\"\"\n last_status = status = current_status\n t0 = time.time()\n while status != until_status:\n tasks = ecs_client.describe_tasks(\n tasks=[task_arn], cluster=cluster_arn, include=[\"TAGS\"]\n )[\"tasks\"]\n\n if tasks:\n task = tasks[0]\n\n status = task[\"lastStatus\"]\n if status != last_status:\n self.logger.info(f\"{self._log_prefix}: Status is {status}.\")\n\n yield task\n\n # No point in continuing if the status is final\n if status == \"STOPPED\":\n break\n\n last_status = status\n\n else:\n # Intermittently, the task will not be described. We wat to respect the\n # watch timeout though.\n self.logger.debug(f\"{self._log_prefix}: Task not found.\")\n\n elapsed_time = time.time() - t0\n if timeout is not None and elapsed_time > timeout:\n raise RuntimeError(\n f\"Timed out after {elapsed_time}s while watching task for status \"\n f\"{until_status or 'STOPPED'}\"\n )\n time.sleep(self.task_watch_poll_interval)\n\n def _wait_for_task_start(\n self, task_arn: str, cluster_arn: str, ecs_client: _ECSClient, timeout: int\n ) -> dict:\n \"\"\"\n Waits for an ECS task run to reach a RUNNING status.\n\n If a STOPPED status is reached instead, an exception is raised indicating the\n reason that the task run did not start.\n \"\"\"\n for task in self._watch_task_run(\n task_arn, cluster_arn, ecs_client, until_status=\"RUNNING\", timeout=timeout\n ):\n # TODO: It is possible that the task has passed _through_ a RUNNING\n # status during the polling interval. 
In this case, there is not an\n # exception to raise.\n if task[\"lastStatus\"] == \"STOPPED\":\n code = task.get(\"stopCode\")\n reason = task.get(\"stoppedReason\")\n # Generate a dynamic exception type from the AWS name\n raise type(code, (RuntimeError,), {})(reason)\n\n return task\n\n def _wait_for_task_finish(\n self,\n task_arn: str,\n cluster_arn: str,\n task_definition: dict,\n ecs_client: _ECSClient,\n boto_session: boto3.Session,\n ):\n \"\"\"\n Watch an ECS task until it reaches a STOPPED status.\n\n If configured, logs from the Prefect container are streamed to stderr.\n\n Returns a description of the task on completion.\n \"\"\"\n can_stream_output = False\n\n if self.stream_output:\n container_def = get_prefect_container(\n task_definition[\"containerDefinitions\"]\n )\n if not container_def:\n self.logger.warning(\n f\"{self._log_prefix}: Prefect container definition not found in \"\n \"task definition. Output cannot be streamed.\"\n )\n elif not container_def.get(\"logConfiguration\"):\n self.logger.warning(\n f\"{self._log_prefix}: Logging configuration not found on task. \"\n \"Output cannot be streamed.\"\n )\n elif not container_def[\"logConfiguration\"].get(\"logDriver\") == \"awslogs\":\n self.logger.warning(\n f\"{self._log_prefix}: Logging configuration uses unsupported \"\n \" driver {container_def['logConfiguration'].get('logDriver')!r}. \"\n \"Output cannot be streamed.\"\n )\n else:\n # Prepare to stream the output\n log_config = container_def[\"logConfiguration\"][\"options\"]\n logs_client = boto_session.client(\"logs\")\n can_stream_output = True\n # Track the last log timestamp to prevent double display\n last_log_timestamp: Optional[int] = None\n # Determine the name of the stream as \"prefix/container/run-id\"\n stream_name = \"/\".join(\n [\n log_config[\"awslogs-stream-prefix\"],\n PREFECT_ECS_CONTAINER_NAME,\n task_arn.rsplit(\"/\")[-1],\n ]\n )\n self.logger.info(\n f\"{self._log_prefix}: Streaming output from container \"\n f\"{PREFECT_ECS_CONTAINER_NAME!r}...\"\n )\n\n for task in self._watch_task_run(\n task_arn, cluster_arn, ecs_client, current_status=\"RUNNING\"\n ):\n if self.stream_output and can_stream_output:\n # On each poll for task run status, also retrieve available logs\n last_log_timestamp = self._stream_available_logs(\n logs_client,\n log_group=log_config[\"awslogs-group\"],\n log_stream=stream_name,\n last_log_timestamp=last_log_timestamp,\n )\n\n return task\n\n def _stream_available_logs(\n self,\n logs_client: Any,\n log_group: str,\n log_stream: str,\n last_log_timestamp: Optional[int] = None,\n ) -> Optional[int]:\n \"\"\"\n Stream logs from the given log group and stream since the last log timestamp.\n\n Will continue on paginated responses until all logs are returned.\n\n Returns the last log timestamp which can be used to call this method in the\n future.\n \"\"\"\n last_log_stream_token = \"NO-TOKEN\"\n next_log_stream_token = None\n\n # AWS will return the same token that we send once the end of the paginated\n # response is reached\n while last_log_stream_token != next_log_stream_token:\n last_log_stream_token = next_log_stream_token\n\n request = {\n \"logGroupName\": log_group,\n \"logStreamName\": log_stream,\n }\n\n if last_log_stream_token is not None:\n request[\"nextToken\"] = last_log_stream_token\n\n if last_log_timestamp is not None:\n # Bump the timestamp by one ms to avoid retrieving the last log again\n request[\"startTime\"] = last_log_timestamp + 1\n\n try:\n response = 
logs_client.get_log_events(**request)\n except Exception:\n self.logger.error(\n (\n f\"{self._log_prefix}: Failed to read log events with request \"\n f\"{request}\"\n ),\n exc_info=True,\n )\n return last_log_timestamp\n\n log_events = response[\"events\"]\n for log_event in log_events:\n # TODO: This doesn't forward to the local logger, which can be\n # bad for customizing handling and understanding where the\n # log is coming from, but it avoid nesting logger information\n # when the content is output from a Prefect logger on the\n # running infrastructure\n print(log_event[\"message\"], file=sys.stderr)\n\n if (\n last_log_timestamp is None\n or log_event[\"timestamp\"] > last_log_timestamp\n ):\n last_log_timestamp = log_event[\"timestamp\"]\n\n next_log_stream_token = response.get(\"nextForwardToken\")\n if not log_events:\n # Stop reading pages if there was no data\n break\n\n return last_log_timestamp\n\n def _retrieve_latest_task_definition(\n self, ecs_client: _ECSClient, task_definition_family: str\n ) -> Optional[dict]:\n try:\n latest_task_definition = self._retrieve_task_definition(\n ecs_client, task_definition_family\n )\n except Exception:\n # The family does not exist...\n return None\n\n return latest_task_definition\n\n def _retrieve_task_definition(\n self, ecs_client: _ECSClient, task_definition_arn: str\n ):\n \"\"\"\n Retrieve an existing task definition from AWS.\n \"\"\"\n self.logger.info(\n f\"{self._log_prefix}: Retrieving task definition {task_definition_arn!r}...\"\n )\n response = ecs_client.describe_task_definition(\n taskDefinition=task_definition_arn\n )\n return response[\"taskDefinition\"]\n\n def _register_task_definition(\n self, ecs_client: _ECSClient, task_definition: dict\n ) -> str:\n \"\"\"\n Register a new task definition with AWS.\n \"\"\"\n # TODO: Consider including a global cache for this task definition since\n # registration of task definitions is frequently rate limited\n task_definition_request = copy.deepcopy(task_definition)\n\n # We need to remove some fields here if copying an existing task definition\n for field in POST_REGISTRATION_FIELDS:\n task_definition_request.pop(field, None)\n\n response = ecs_client.register_task_definition(**task_definition_request)\n return response[\"taskDefinition\"][\"taskDefinitionArn\"]\n\n def _prepare_task_definition(self, task_definition: dict, region: str) -> dict:\n \"\"\"\n Prepare a task definition by inferring any defaults and merging overrides.\n \"\"\"\n task_definition = copy.deepcopy(task_definition)\n\n # Configure the Prefect runtime container\n task_definition.setdefault(\"containerDefinitions\", [])\n container = get_prefect_container(task_definition[\"containerDefinitions\"])\n if container is None:\n container = {\"name\": PREFECT_ECS_CONTAINER_NAME}\n task_definition[\"containerDefinitions\"].append(container)\n\n if self.image:\n container[\"image\"] = self.image\n\n # Remove any keys that have been explicitly \"unset\"\n unset_keys = {key for key, value in self.env.items() if value is None}\n for item in tuple(container.get(\"environment\", [])):\n if item[\"name\"] in unset_keys:\n container[\"environment\"].remove(item)\n\n if self.configure_cloudwatch_logs:\n container[\"logConfiguration\"] = {\n \"logDriver\": \"awslogs\",\n \"options\": {\n \"awslogs-create-group\": \"true\",\n \"awslogs-group\": \"prefect\",\n \"awslogs-region\": region,\n \"awslogs-stream-prefix\": self.name or \"prefect\",\n **self.cloudwatch_logs_options,\n },\n }\n\n family = self.family or 
task_definition.get(\"family\") or ECS_DEFAULT_FAMILY\n task_definition[\"family\"] = slugify(\n family,\n max_length=255,\n regex_pattern=r\"[^a-zA-Z0-9-_]+\",\n )\n\n # CPU and memory are required in some cases, retrieve the value to use\n cpu = self.cpu or task_definition.get(\"cpu\") or ECS_DEFAULT_CPU\n memory = self.memory or task_definition.get(\"memory\") or ECS_DEFAULT_MEMORY\n\n if self.launch_type == \"FARGATE\" or self.launch_type == \"FARGATE_SPOT\":\n # Task level memory and cpu are required when using fargate\n task_definition[\"cpu\"] = str(cpu)\n task_definition[\"memory\"] = str(memory)\n\n # The FARGATE compatibility is required if it will be used as as launch type\n requires_compatibilities = task_definition.setdefault(\n \"requiresCompatibilities\", []\n )\n if \"FARGATE\" not in requires_compatibilities:\n task_definition[\"requiresCompatibilities\"].append(\"FARGATE\")\n\n # Only the 'awsvpc' network mode is supported when using FARGATE\n # However, we will not enforce that here if the user has set it\n network_mode = task_definition.setdefault(\"networkMode\", \"awsvpc\")\n\n if network_mode != \"awsvpc\":\n warnings.warn(\n f\"Found network mode {network_mode!r} which is not compatible with \"\n f\"launch type {self.launch_type!r}. Use either the 'EC2' launch \"\n \"type or the 'awsvpc' network mode.\"\n )\n\n elif self.launch_type == \"EC2\":\n # Container level memory and cpu are required when using ec2\n container.setdefault(\"cpu\", int(cpu))\n container.setdefault(\"memory\", int(memory))\n\n if self.execution_role_arn and not self.task_definition_arn:\n task_definition[\"executionRoleArn\"] = self.execution_role_arn\n\n if self.configure_cloudwatch_logs and not task_definition.get(\n \"executionRoleArn\"\n ):\n raise ValueError(\n \"An execution role arn must be set on the task definition to use \"\n \"`configure_cloudwatch_logs` or `stream_logs` but no execution role \"\n \"was found on the task definition.\"\n )\n\n return task_definition\n\n def _prepare_task_run_overrides(self) -> dict:\n \"\"\"\n Prepare the 'overrides' payload for a task run request.\n \"\"\"\n overrides = {\n \"containerOverrides\": [\n {\n \"name\": PREFECT_ECS_CONTAINER_NAME,\n \"environment\": [\n {\"name\": key, \"value\": value}\n for key, value in {\n **self._base_environment(),\n **self.env,\n }.items()\n if value is not None\n ],\n }\n ],\n }\n\n prefect_container_overrides = overrides[\"containerOverrides\"][0]\n\n if self.command:\n prefect_container_overrides[\"command\"] = self.command\n\n if self.execution_role_arn:\n overrides[\"executionRoleArn\"] = self.execution_role_arn\n\n if self.task_role_arn:\n overrides[\"taskRoleArn\"] = self.task_role_arn\n\n if self.memory:\n overrides[\"memory\"] = str(self.memory)\n prefect_container_overrides.setdefault(\"memory\", self.memory)\n\n if self.cpu:\n overrides[\"cpu\"] = str(self.cpu)\n prefect_container_overrides.setdefault(\"cpu\", self.cpu)\n\n return overrides\n\n def _load_vpc_network_config(\n self, vpc_id: Optional[str], boto_session: boto3.Session\n ) -> dict:\n \"\"\"\n Load settings from a specific VPC or the default VPC and generate a task\n run request's network configuration.\n \"\"\"\n ec2_client = boto_session.client(\"ec2\")\n vpc_message = \"the default VPC\" if not vpc_id else f\"VPC with ID {vpc_id}\"\n\n if not vpc_id:\n # Retrieve the default VPC\n describe = {\"Filters\": [{\"Name\": \"isDefault\", \"Values\": [\"true\"]}]}\n else:\n describe = {\"VpcIds\": [vpc_id]}\n\n vpcs = 
ec2_client.describe_vpcs(**describe)[\"Vpcs\"]\n if not vpcs:\n help_message = (\n \"Pass an explicit `vpc_id` or configure a default VPC.\"\n if not vpc_id\n else \"Check that the VPC exists in the current region.\"\n )\n raise ValueError(\n f\"Failed to find {vpc_message}. \"\n \"Network configuration cannot be inferred. \"\n + help_message\n )\n\n vpc_id = vpcs[0][\"VpcId\"]\n subnets = ec2_client.describe_subnets(\n Filters=[{\"Name\": \"vpc-id\", \"Values\": [vpc_id]}]\n )[\"Subnets\"]\n if not subnets:\n raise ValueError(\n f\"Failed to find subnets for {vpc_message}. \"\n \"Network configuration cannot be inferred.\"\n )\n\n return {\n \"awsvpcConfiguration\": {\n \"subnets\": [s[\"SubnetId\"] for s in subnets],\n \"assignPublicIp\": \"ENABLED\",\n \"securityGroups\": [],\n }\n }\n\n def _prepare_task_run(\n self,\n network_config: Optional[dict],\n task_definition_arn: str,\n ) -> dict:\n \"\"\"\n Prepare a task run request payload.\n \"\"\"\n task_run = {\n \"overrides\": self._prepare_task_run_overrides(),\n \"tags\": [\n {\n \"key\": slugify(\n key,\n regex_pattern=_TAG_REGEX,\n allow_unicode=True,\n lowercase=False,\n ),\n \"value\": slugify(\n value,\n regex_pattern=_TAG_REGEX,\n allow_unicode=True,\n lowercase=False,\n ),\n }\n for key, value in self.labels.items()\n ],\n \"taskDefinition\": task_definition_arn,\n }\n\n if self.cluster:\n task_run[\"cluster\"] = self.cluster\n\n if self.launch_type:\n if self.launch_type == \"FARGATE_SPOT\":\n task_run[\"capacityProviderStrategy\"] = [\n {\"capacityProvider\": \"FARGATE_SPOT\", \"weight\": 1}\n ]\n else:\n task_run[\"launchType\"] = self.launch_type\n\n if network_config:\n task_run[\"networkConfiguration\"] = network_config\n\n task_run = self.task_customizations.apply(task_run)\n return task_run\n\n def _run_task(self, ecs_client: _ECSClient, task_run: dict):\n \"\"\"\n Run the task using the ECS client.\n\n This is isolated as a separate method for testing purposes.\n \"\"\"\n return ecs_client.run_task(**task_run)[\"tasks\"][0]\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask-attributes","title":"Attributes","text":""},{"location":"ecs/#prefect_aws.ecs.ECSTask.auto_deregister_task_definition","title":"
auto_deregister_task_definition: bool
pydantic-field
","text":"
If set, any task definitions that are created by this block will be deregistered. Existing task definitions linked by ARN will never be deregistered. Deregistering a task definition does not remove it from your AWS account; instead, it will be marked as INACTIVE.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.aws_credentials","title":"
aws_credentials: AwsCredentials
pydantic-field
","text":"
The AWS credentials to use to connect to ECS.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cloudwatch_logs_options","title":"
cloudwatch_logs_options: Dict[str, str]
pydantic-field
","text":"
When configure_cloudwatch_logs
is enabled, this setting may be used to pass additional options to the CloudWatch logs configuration or override the default options. See the AWS documentation for available options. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html#create_awslogs_logdriver_options.
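For example, a minimal sketch of overriding the log group and stream prefix is shown below; the block name, execution role ARN, and log group name are assumed placeholders, and the execution role must already have the CloudWatch permissions described under configure_cloudwatch_logs.
from prefect_aws import AwsCredentials, ECSTask\n\n# Sketch only: the block name, execution role, and log group below are assumed placeholders\necs_task = ECSTask(\n    aws_credentials=AwsCredentials.load(\"BLOCK-NAME-PLACEHOLDER\"),\n    execution_role_arn=\"arn:aws:iam::123456789012:role/prefect-execution-role\",\n    configure_cloudwatch_logs=True,\n    cloudwatch_logs_options={\n        \"awslogs-group\": \"my-prefect-logs\",\n        \"awslogs-stream-prefix\": \"my-flows\",\n    },\n)\n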
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cluster","title":"
cluster: str
pydantic-field
","text":"
The ECS cluster to run the task in. The ARN or name may be provided. If not provided, the default cluster will be used.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.configure_cloudwatch_logs","title":"
configure_cloudwatch_logs: bool
pydantic-field
","text":"
If True
, the Prefect container will be configured to send its output to the AWS CloudWatch logs service. This functionality requires an execution role with logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents permissions. The default for this field is False
unless stream_output
is set.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cpu","title":"
cpu: int
pydantic-field
","text":"
The amount of CPU to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 1024 will be used unless present on the task definition.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.execution_role_arn","title":"
execution_role_arn: str
pydantic-field
","text":"
An execution role to use for the task. This controls the permissions of the task when it is launching. If this value is not null, it will override the value in the task definition. An execution role must be provided to capture logs from the container.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.family","title":"
family: str
pydantic-field
","text":"
A family for the task definition. If not provided, it will be inferred from the task definition. If the task definition does not have a family, the name will be generated. When flow and deployment metadata is available, the generated name will include their names. Values for this field will be slugified to match AWS character requirements.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.image","title":"
image: str
pydantic-field
","text":"
The image to use for the Prefect container in the task. If this value is not null, it will override the value in the task definition. This value defaults to a Prefect base image matching your local versions.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.launch_type","title":"
launch_type: Literal['FARGATE', 'EC2', 'EXTERNAL', 'FARGATE_SPOT']
pydantic-field
","text":"
The type of ECS task run infrastructure that should be used. Note that 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure the proper capacity provider strategy if set here.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.memory","title":"
memory: int
pydantic-field
","text":"
The amount of memory to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 2048 will be used unless present on the task definition.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.stream_output","title":"
stream_output: bool
pydantic-field
","text":"
If True
, logs will be streamed from the Prefect container to the local console. Unless you have configured AWS CloudWatch logs manually on your task definition, this requires the same prerequisites outlined in configure_cloudwatch_logs
.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_customizations","title":"
task_customizations: JsonPatch
pydantic-field
","text":"
A list of RFC 6902 JSON patches to apply to the task run request. If a string is given, it will be parsed as a JSON expression.
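As a hedged sketch, a patch list like the following could pin the task to a specific security group; the security group ID is an assumed placeholder, not a default.
from prefect_aws import ECSTask\n\n# Sketch only: the security group ID is an assumed placeholder\necs_task = ECSTask(\n    task_customizations=[\n        {\n            \"op\": \"add\",\n            \"path\": \"/networkConfiguration/awsvpcConfiguration/securityGroups\",\n            \"value\": [\"sg-0123456789abcdef0\"],\n        },\n    ],\n)\n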
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_definition","title":"
task_definition: dict
pydantic-field
","text":"
An ECS task definition to use. Prefect may set defaults or override fields on this task definition to match other ECSTask
fields. Cannot be used with task_definition_arn
. If not provided, Prefect will generate and register a minimal task definition.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_definition_arn","title":"
task_definition_arn: str
pydantic-field
","text":"
An identifier for an existing task definition to use. If fields are set on the ECSTask
that conflict with the task definition, a new copy will be registered with the required values. Cannot be used with task_definition
. If not provided, Prefect will generate and register a minimal task definition.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_role_arn","title":"
task_role_arn: str
pydantic-field
","text":"
A role to attach to the task run. This controls the permissions of the task while it is running.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_start_timeout_seconds","title":"
task_start_timeout_seconds: int
pydantic-field
","text":"
The amount of time to watch for the start of the ECS task before marking it as failed. The task must enter a RUNNING state to be considered started.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.task_watch_poll_interval","title":"
task_watch_poll_interval: float
pydantic-field
","text":"
The amount of time to wait between AWS API calls while monitoring the state of an ECS task.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.vpc_id","title":"
vpc_id: str
pydantic-field
","text":"
The AWS VPC to link the task run to. This is only applicable when using the 'awsvpc' network mode for your task. FARGATE tasks require this network mode, but for EC2 tasks the default network mode is 'bridge'. If using the 'awsvpc' network mode and this field is null, your default VPC will be used. If no default VPC can be found, the task run will fail.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask-classes","title":"Classes","text":""},{"location":"ecs/#prefect_aws.ecs.ECSTask.Config","title":"
Config
","text":"
Configuration of pydantic.
Source code in
prefect_aws/ecs.py
class Config:\n \"\"\"Configuration of pydantic.\"\"\"\n\n # Support serialization of the 'JsonPatch' type\n arbitrary_types_allowed = True\n json_encoders = {JsonPatch: lambda p: p.patch}\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask-methods","title":"Methods","text":""},{"location":"ecs/#prefect_aws.ecs.ECSTask.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cast_customizations_to_a_json_patch","title":"
cast_customizations_to_a_json_patch
classmethod
","text":"
Casts lists to JsonPatch instances.
Source code in
prefect_aws/ecs.py
@validator(\"task_customizations\", pre=True)\ndef cast_customizations_to_a_json_patch(\n cls, value: Union[List[Dict], JsonPatch, str]\n) -> JsonPatch:\n \"\"\"\n Casts lists to JsonPatch instances.\n \"\"\"\n if isinstance(value, str):\n value = json.loads(value)\n if isinstance(value, list):\n return JsonPatch(value)\n return value # type: ignore\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.cloudwatch_logs_options_requires_configure_cloudwatch_logs","title":"
cloudwatch_logs_options_requires_configure_cloudwatch_logs
classmethod
","text":"
Enforces that configure_cloudwatch_logs is enabled when cloudwatch_logs_options are provided.
Source code in
prefect_aws/ecs.py
@root_validator\ndef cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.configure_cloudwatch_logs_requires_execution_role_arn","title":"
configure_cloudwatch_logs_requires_execution_role_arn
classmethod
","text":"
Enforces that an execution role arn is provided (or could be provided by a runtime task definition) when configuring logging.
Source code in
prefect_aws/ecs.py
@root_validator\ndef configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_definition_arn\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.dict","title":"
dict
","text":"
Convert to a dictionary.
Source code in
prefect_aws/ecs.py
def dict(self, *args, **kwargs) -> Dict:\n \"\"\"\n Convert to a dictionary.\n \"\"\"\n # Support serialization of the 'JsonPatch' type\n d = super().dict(*args, **kwargs)\n d[\"task_customizations\"] = self.task_customizations.patch\n return d\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.generate_work_pool_base_job_template","title":"
generate_work_pool_base_job_template
async
","text":"
Generate a base job template for an ECS work pool with the same configuration as this block.
Returns:
Type Description
- dict
a base job template for an ECS work pool
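A hedged usage sketch, assuming the block (and its credentials) has already been saved and that this is run from an async context:
import asyncio\nfrom prefect_aws import ECSTask\n\nasync def main():\n    # Assumed block name; adjust to a block you have saved\n    ecs_task = await ECSTask.load(\"ecs-task-example\")\n    template = await ecs_task.generate_work_pool_base_job_template()\n    print(sorted(template[\"variables\"][\"properties\"]))\n\nasyncio.run(main())\n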
Source code in
prefect_aws/ecs.py
async def generate_work_pool_base_job_template(self) -> dict:\n \"\"\"\n Generate a base job template for a cloud-run work pool with the same\n configuration as this block.\n\n Returns:\n - dict: a base job template for a cloud-run work pool\n \"\"\"\n base_job_template = copy.deepcopy(ECSWorker.get_default_base_job_template())\n for key, value in self.dict(exclude_unset=True, exclude_defaults=True).items():\n if key == \"command\":\n base_job_template[\"variables\"][\"properties\"][\"command\"][\"default\"] = (\n shlex.join(value)\n )\n elif key in [\n \"type\",\n \"block_type_slug\",\n \"_block_document_id\",\n \"_block_document_name\",\n \"_is_anonymous\",\n \"task_customizations\",\n ]:\n continue\n elif key == \"aws_credentials\":\n if not self.aws_credentials._block_document_id:\n raise BlockNotSavedError(\n \"It looks like you are trying to use a block that\"\n \" has not been saved. Please call `.save` on your block\"\n \" before publishing it as a work pool.\"\n )\n base_job_template[\"variables\"][\"properties\"][\"aws_credentials\"][\n \"default\"\n ] = {\n \"$ref\": {\n \"block_document_id\": str(\n self.aws_credentials._block_document_id\n )\n }\n }\n elif key == \"task_definition\":\n base_job_template[\"job_configuration\"][\"task_definition\"] = value\n elif key in base_job_template[\"variables\"][\"properties\"]:\n base_job_template[\"variables\"][\"properties\"][key][\"default\"] = value\n else:\n self.logger.warning(\n f\"Variable {key!r} is not supported by Cloud Run work pools.\"\n \" Skipping.\"\n )\n\n if self.task_customizations:\n try:\n base_job_template[\"job_configuration\"][\"task_run_request\"] = (\n self.task_customizations.apply(\n base_job_template[\"job_configuration\"][\"task_run_request\"]\n )\n )\n except JsonPointerException:\n self.logger.warning(\n \"Unable to apply task customizations to the base job template.\"\n \"You may need to update the template manually.\"\n )\n\n return base_job_template\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.get_corresponding_worker_type","title":"
get_corresponding_worker_type
staticmethod
","text":"
Return the corresponding worker type for this infrastructure block.
Source code in
prefect_aws/ecs.py
@staticmethod\ndef get_corresponding_worker_type() -> str:\n \"\"\"Return the corresponding worker type for this infrastructure block.\"\"\"\n return ECSWorker.type\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.image_is_required","title":"
image_is_required
classmethod
","text":"
Enforces that an image is available if image is None
.
Source code in
prefect_aws/ecs.py
@root_validator(pre=True)\ndef image_is_required(cls, values: dict) -> dict:\n \"\"\"\n Enforces that an image is available if image is `None`.\n \"\"\"\n has_image = bool(values.get(\"image\"))\n has_task_definition_arn = bool(values.get(\"task_definition_arn\"))\n\n # The image can only be null when the task_definition_arn is set\n if has_image or has_task_definition_arn:\n return values\n\n prefect_container = (\n get_prefect_container(\n (values.get(\"task_definition\") or {}).get(\"containerDefinitions\", [])\n )\n or {}\n )\n image_in_task_definition = prefect_container.get(\"image\")\n\n # If a task_definition is given with a prefect container image, use that value\n if image_in_task_definition:\n values[\"image\"] = image_in_task_definition\n # Otherwise, it should default to the Prefect base image\n else:\n values[\"image\"] = get_prefect_image_name()\n return values\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.kill","title":"
kill
async
","text":"
Kill a task running on ECS.
Parameters:
Name Type Description Default
identifier
str
A cluster and task arn combination. This should match a value yielded by ECSTask.run
.
required Source code in
prefect_aws/ecs.py
@sync_compatible\nasync def kill(self, identifier: str, grace_seconds: int = 30) -> None:\n \"\"\"\n Kill a task running on ECS.\n\n Args:\n identifier: A cluster and task arn combination. This should match a value\n yielded by `ECSTask.run`.\n \"\"\"\n if grace_seconds != 30:\n self.logger.warning(\n f\"Kill grace period of {grace_seconds}s requested, but AWS does not \"\n \"support dynamic grace period configuration so 30s will be used. \"\n \"See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html for configuration of grace periods.\" # noqa\n )\n cluster, task = parse_task_identifier(identifier)\n await run_sync_in_worker_thread(self._stop_task, cluster, task)\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.prepare_for_flow_run","title":"
prepare_for_flow_run
","text":"
Return a copy of the block that is prepared to execute a flow run.
Source code in
prefect_aws/ecs.py
def prepare_for_flow_run(\n self: Self,\n flow_run: \"FlowRun\",\n deployment: Optional[\"Deployment\"] = None,\n flow: Optional[\"Flow\"] = None,\n) -> Self:\n \"\"\"\n Return an copy of the block that is prepared to execute a flow run.\n \"\"\"\n new_family = None\n\n # Update the family if not specified elsewhere\n if (\n not self.family\n and not self.task_definition_arn\n and not (self.task_definition and self.task_definition.get(\"family\"))\n ):\n if flow and deployment:\n new_family = f\"{ECS_DEFAULT_FAMILY}__{flow.name}__{deployment.name}\"\n elif flow and not deployment:\n new_family = f\"{ECS_DEFAULT_FAMILY}__{flow.name}\"\n elif deployment and not flow:\n # This is a weird case and should not be see in the wild\n new_family = f\"{ECS_DEFAULT_FAMILY}__unknown-flow__{deployment.name}\"\n\n new = super().prepare_for_flow_run(flow_run, deployment=deployment, flow=flow)\n\n if new_family:\n return new.copy(update={\"family\": new_family})\n else:\n # Avoid an extra copy if not needed\n return new\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.preview","title":"
preview
","text":"
Generate a preview of the task definition and task run that will be sent to AWS.
Source code in
prefect_aws/ecs.py
def preview(self) -> str:\n \"\"\"\n Generate a preview of the task definition and task run that will be sent to AWS.\n \"\"\"\n preview = \"\"\n\n task_definition_arn = self.task_definition_arn or \"<registered at runtime>\"\n\n if self.task_definition or not self.task_definition_arn:\n task_definition = self._prepare_task_definition(\n self.task_definition or {},\n region=self.aws_credentials.region_name\n or \"<loaded from client at runtime>\",\n )\n preview += \"---\\n# Task definition\\n\"\n preview += yaml.dump(task_definition)\n preview += \"\\n\"\n else:\n task_definition = None\n\n if task_definition and task_definition.get(\"networkMode\") == \"awsvpc\":\n vpc = \"the default VPC\" if not self.vpc_id else self.vpc_id\n network_config = {\n \"awsvpcConfiguration\": {\n \"subnets\": f\"<loaded from {vpc} at runtime>\",\n \"assignPublicIp\": \"ENABLED\",\n }\n }\n else:\n network_config = None\n\n task_run = self._prepare_task_run(network_config, task_definition_arn)\n preview += \"---\\n# Task run request\\n\"\n preview += yaml.dump(task_run)\n\n return preview\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.run","title":"
run
async
","text":"
Run the configured task on ECS.
Source code in
prefect_aws/ecs.py
@sync_compatible\nasync def run(self, task_status: Optional[TaskStatus] = None) -> ECSTaskResult:\n \"\"\"\n Run the configured task on ECS.\n \"\"\"\n boto_session, ecs_client = await run_sync_in_worker_thread(\n self._get_session_and_client\n )\n\n (\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition,\n ) = await run_sync_in_worker_thread(\n self._create_task_and_wait_for_start, boto_session, ecs_client\n )\n\n # Display a nice message indicating the command and image\n command = self.command or get_prefect_container(\n task_definition[\"containerDefinitions\"]\n ).get(\"command\", [])\n self.logger.info(\n f\"{self._log_prefix}: Running command {' '.join(command)!r} \"\n f\"in container {PREFECT_ECS_CONTAINER_NAME!r} ({self.image})...\"\n )\n\n # The task identifier is \"{cluster}::{task}\" where we use the configured cluster\n # if set to preserve matching by name rather than arn\n # Note \"::\" is used despite the Prefect standard being \":\" because ARNs contain\n # single colons.\n identifier = (self.cluster if self.cluster else cluster_arn) + \"::\" + task_arn\n\n if task_status:\n task_status.started(identifier)\n\n status_code = await run_sync_in_worker_thread(\n self._watch_task_and_get_exit_code,\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition and self.auto_deregister_task_definition,\n boto_session,\n ecs_client,\n )\n\n return ECSTaskResult(\n identifier=identifier,\n # If the container does not start the exit code can be null but we must\n # still report a status code. We use a -1 to indicate a special code.\n status_code=status_code if status_code is not None else -1,\n )\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTask.set_default_configure_cloudwatch_logs","title":"
set_default_configure_cloudwatch_logs
classmethod
","text":"
Streaming output generally requires CloudWatch logs to be configured.
To avoid entangled arguments in the simple case, configure_cloudwatch_logs
defaults to matching the value of stream_output
.
Source code in
prefect_aws/ecs.py
@root_validator(pre=True)\ndef set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n
"},{"location":"ecs/#prefect_aws.ecs.ECSTaskResult","title":"
ECSTaskResult (InfrastructureResult)
pydantic-model
","text":"
The result of a run of an ECS task
Source code in
prefect_aws/ecs.py
class ECSTaskResult(InfrastructureResult):\n \"\"\"The result of a run of an ECS task\"\"\"\n
"},{"location":"ecs/#prefect_aws.ecs-functions","title":"Functions","text":""},{"location":"ecs/#prefect_aws.ecs.get_container","title":"
get_container
","text":"
Extract a container from a list of containers or container definitions. If not found, None
is returned.
Source code in
prefect_aws/ecs.py
def get_container(containers: List[dict], name: str) -> Optional[dict]:\n \"\"\"\n Extract a container from a list of containers or container definitions.\n If not found, `None` is returned.\n \"\"\"\n for container in containers:\n if container.get(\"name\") == name:\n return container\n return None\n
"},{"location":"ecs/#prefect_aws.ecs.get_prefect_container","title":"
get_prefect_container
","text":"
Extract the Prefect container from a list of containers or container definitions. If not found, None
is returned.
Source code in
prefect_aws/ecs.py
def get_prefect_container(containers: List[dict]) -> Optional[dict]:\n \"\"\"\n Extract the Prefect container from a list of containers or container definitions.\n If not found, `None` is returned.\n \"\"\"\n return get_container(containers, PREFECT_ECS_CONTAINER_NAME)\n
"},{"location":"ecs/#prefect_aws.ecs.parse_task_identifier","title":"
parse_task_identifier
","text":"
Splits identifier into its cluster and task components, e.g. input \"cluster_name::task_arn\" outputs (\"cluster_name\", \"task_arn\").
Source code in
prefect_aws/ecs.py
def parse_task_identifier(identifier: str) -> Tuple[str, str]:\n \"\"\"\n Splits identifier into its cluster and task components, e.g.\n input \"cluster_name::task_arn\" outputs (\"cluster_name\", \"task_arn\").\n \"\"\"\n cluster, task = identifier.split(\"::\", maxsplit=1)\n return cluster, task\n
"},{"location":"ecs_guide/","title":"ECS Worker Guide","text":""},{"location":"ecs_guide/#why-use-ecs-for-flow-run-execution","title":"Why use ECS for flow run execution?","text":"
ECS (Elastic Container Service) tasks are a good option for executing Prefect 2 flow runs for several reasons:
- Scalability: ECS scales your infrastructure in response to demand, effectively managing Prefect flow runs. It automatically distributes containers across multiple instances as demand changes.
- Flexibility: ECS lets you choose between AWS Fargate and Amazon EC2 for container operation. Fargate abstracts the underlying infrastructure, while EC2 has faster job start times and offers additional control over instance management and configuration.
- AWS Integration: Easily connect with other AWS services, such as AWS IAM and CloudWatch.
- Containerization: ECS supports Docker containers and offers managed execution. Containerization encourages reproducible deployments.
"},{"location":"ecs_guide/#ecs-flow-run-execution","title":"ECS Flow Run Execution","text":"
Prefect enables remote flow execution via workers and work pools. To learn more about these concepts please see our deployment tutorial.
For details on how workers and work pools are implemented for ECS, see the diagram below:
"},{"location":"ecs_guide/#architecture-diagram","title":"Architecture Diagram","text":"
graph TB\n\n subgraph ecs_cluster[ECS cluster]\n subgraph ecs_service[ECS service]\n td_worker[Worker task definition] --> |defines| prefect_worker((Prefect worker))\n end\n prefect_worker -->|kicks off| ecs_task\n fr_task_definition[Flow run task definition]\n\n\n subgraph ecs_task[\"ECS task execution <br> (Flow run infrastructure)\"]\n style ecs_task text-align:center\n\n flow_run((Flow run))\n\n end\n fr_task_definition -->|defines| ecs_task\n end\n\n subgraph prefect_cloud[Prefect Cloud]\n subgraph prefect_workpool[ECS work pool]\n workqueue[Work queue]\n end\n end\n\n subgraph github[\"GitHub\"]\n flow_code{{\"Flow code\"}}\n end\n flow_code --> |pulls| ecs_task\n prefect_worker -->|polls| workqueue\n prefect_workpool -->|configures| fr_task_definition
"},{"location":"ecs_guide/#ecs-in-prefect-terms","title":"ECS in Prefect Terms","text":"
ECS tasks != Prefect tasks
An ECS task is not the same thing as a Prefect task.
ECS tasks are groupings of containers that run within an ECS Cluster. An ECS task's behavior is determined by its task definition.
An ECS task definition is the blueprint for the ECS task. It describes which Docker containers to run and what you want to have happen inside these containers.
ECS tasks are instances of a task definition. A Task Execution launches container(s) as defined in the task definition until they are stopped or exit on their own. This setup is ideal for ephemeral processes such as a Prefect flow run.
The ECS task running the Prefect worker should be an ECS Service, given its long-running nature and need for auto-recovery in case of failure. An ECS service automatically replaces any task that fails, which is ideal for managing a long-running process such as a Prefect worker.
When a Prefect flow is scheduled to run it goes into the work pool specified in the flow's deployment. Work pools are typed according to the infrastructure the flow will run on. Flow runs scheduled in an ecs
typed work pool are executed as ECS tasks. Only Prefect ECS workers can poll an ecs
typed work pool.
When the ECS worker receives a scheduled flow run from the ECS work pool it is polling, it spins up the specified infrastructure on AWS ECS. The worker knows to build an ECS task definition for each flow run based on the configuration specified in the work pool.
Once the flow run completes, the ECS containers of the cluster are spun down to a single container that continues to run the Prefect worker. This worker continues polling for work from the Prefect work pool.
If you specify a task definition ARN (Amazon Resource Name) in the work pool, the worker will use that ARN when spinning up the ECS Task, rather than creating a task definition from the fields supplied in the work pool configuration.
You can use either EC2 or Fargate as the capacity provider. Fargate simplifies initiation, but lengthens infrastructure setup time for each flow run. Using EC2 for the ECS cluster can reduce setup time. In this example, we will show how to use Fargate.
"},{"location":"ecs_guide/#aws-cli-guide","title":"AWS CLI Guide","text":"
Tip
If you prefer infrastructure as code, check out this Terraform module to provision an ECS cluster with a worker.
"},{"location":"ecs_guide/#prerequisites","title":"Prerequisites","text":"
Before you begin, make sure you have:
- An AWS account with permissions to create ECS services and IAM roles.
- The AWS CLI installed on your local machine. You can download it from the AWS website.
- An ECS Cluster to host both the worker and the flow runs it submits. Follow this guide to create an ECS cluster or simply use the default cluster.
- A VPC configured for your ECS tasks. A VPC is a good idea if using EC2 and required if using Fargate.
"},{"location":"ecs_guide/#step-1-set-up-an-ecs-work-pool","title":"Step 1: Set Up an ECS work pool","text":"
Before setting up the worker, create a simple work pool of type ECS for the worker to pull work from.
Create a work pool from the Prefect UI or CLI:
prefect work-pool create --type ecs my-ecs-pool\n
Configure the VPC and ECS cluster for your work pool via the UI:
Configuring custom fields is easiest from the UI.
Warning
You need to have a VPC specified for your work pool if you are using AWS Fargate.
Next, set up a Prefect ECS worker that will discover and pull work from this work pool.
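If you want to confirm the pool exists before moving on, a hedged sketch using the Prefect client is shown here; the exact client API may vary between Prefect 2 releases.
import asyncio\nfrom prefect.client.orchestration import get_client\n\nasync def check_pool():\n    # Sketch only: assumes the pool name created above\n    async with get_client() as client:\n        pool = await client.read_work_pool(\"my-ecs-pool\")\n        print(pool.name, pool.type)\n\nasyncio.run(check_pool())\n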
"},{"location":"ecs_guide/#step-2-start-a-prefect-worker-in-your-ecs-cluster","title":"Step 2: Start a Prefect worker in your ECS cluster.","text":"
To create an IAM role for the ECS task using the AWS CLI, follow these steps:
-
Create a trust policy
The trust policy will specify that ECS can assume the role.
Save this policy to a file, such as ecs-trust-policy.json
:
{\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Effect\": \"Allow\",\n \"Principal\": {\n \"Service\": \"ecs-tasks.amazonaws.com\"\n },\n \"Action\": \"sts:AssumeRole\"\n }\n ]\n}\n
-
Create the IAM role
Use the aws iam create-role
command to create the role:
aws iam create-role \\\n--role-name ecsTaskExecutionRole \\\n--assume-role-policy-document file://ecs-trust-policy.json\n
-
Attach the policy to the role
Amazon has a managed policy named AmazonECSTaskExecutionRolePolicy
that grants the permissions necessary for ECS tasks. Attach this policy to your role:
aws iam attach-role-policy \\\n--role-name ecsTaskExecutionRole \\\n--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy\n
Remember to replace the --role-name
and --policy-arn
with the actual role name and policy Amazon Resource Name (ARN) you want to use.
Now, you have a role named ecsTaskExecutionRole
that you can assign to your ECS tasks. This role has the necessary permissions to pull container images and publish logs to CloudWatch.
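To double-check the role from Python rather than the console, a hedged boto3 sketch like the following can be used; it assumes your local AWS credentials are allowed to read IAM.
import boto3\n\n# Sketch only: assumes local AWS credentials with permission to read IAM\niam = boto3.client(\"iam\")\nrole = iam.get_role(RoleName=\"ecsTaskExecutionRole\")\nprint(role[\"Role\"][\"Arn\"])\n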
-
Launch an ECS Service to host the worker
Next, create an ECS task definition that specifies the Docker image for the Prefect worker, the resources it requires, and the command it should run. In this example, the command to start the worker is prefect worker start --pool my-ecs-pool
.
Create a JSON file with the following contents:
{\n \"family\": \"prefect-worker-task\",\n \"networkMode\": \"awsvpc\",\n \"requiresCompatibilities\": [\n \"FARGATE\"\n ],\n \"cpu\": \"512\",\n \"memory\": \"1024\",\n \"executionRoleArn\": \"<your-ecs-task-role-arn>\",\n \"taskRoleArn\": \"<your-ecs-task-role-arn>\",\n \"containerDefinitions\": [\n {\n \"name\": \"prefect-worker\",\n \"image\": \"prefecthq/prefect:2-latest\",\n \"cpu\": 512,\n \"memory\": 1024,\n \"essential\": true,\n \"command\": [\n \"/bin/sh\",\n \"-c\",\n \"pip install prefect-aws && prefect worker start --pool my-ecs-pool --type ecs\"\n ],\n \"environment\": [\n {\n \"name\": \"PREFECT_API_URL\",\n \"value\": \"https://api.prefect.cloud/api/accounts/<your-account-id>/workspaces/<your-workspace-id>\"\n },\n {\n \"name\": \"PREFECT_API_KEY\",\n \"value\": \"<your-prefect-api-key>\"\n }\n ]\n }\n ]\n}\n
-
Use prefect config view
to view the PREFECT_API_URL
for your current Prefect profile. Use this to replace both <your-account-id>
and <your-workspace-id>
.
-
For the PREFECT_API_KEY
, individuals on the organization tier can create a service account for the worker. If on a personal tier, you can pass a user\u2019s API key.
-
Replace both instances of <your-ecs-task-role-arn>
with the ARN of the IAM role you created in Step 2.
-
Notice that the CPU and Memory allocations are relatively small. The worker's main responsibility is to submit work through API calls to AWS, not to execute your Prefect flow code.
Tip
To avoid hardcoding your API key into the task definition JSON, see how to add environment variables to the container definition. The API key must be stored as plain text, not as the key-value pair dictionary that it is formatted in by default.
-
Register the task definition:
Before creating a service, you first need to register a task definition. You can do that using the register-task-definition
command in the AWS CLI. Here is an example:
aws ecs register-task-definition --cli-input-json file://task-definition.json\n
Replace task-definition.json
with the name of your JSON file.
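To verify the registration from Python, a hedged boto3 sketch such as the following lists the latest revisions in the family used in the example task definition above.
import boto3\n\n# Sketch only: the family name matches the example task definition above\necs = boto3.client(\"ecs\")\nresponse = ecs.list_task_definitions(familyPrefix=\"prefect-worker-task\", sort=\"DESC\")\nprint(response[\"taskDefinitionArns\"][:3])\n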
-
Create an ECS service to host your worker:
Finally, create a service that will manage your Prefect worker:
Open a terminal window and run the following command to create an ECS Fargate service:
aws ecs create-service \\\n --service-name prefect-worker-service \\\n --cluster <your-ecs-cluster> \\\n --task-definition <task-definition-arn> \\\n --launch-type FARGATE \\\n --desired-count 1 \\\n --network-configuration \"awsvpcConfiguration={subnets=[<your-subnet-ids>],securityGroups=[<your-security-group-ids>]}\"\n
- Replace
<your-ecs-cluster>
with the name of your ECS cluster. - Replace
<path-to-task-definition-file>
with the path to the JSON file you created in Step 2, <your-subnet-ids>
with a comma-separated list of your VPC subnet IDs. Ensure that these subnets are aligned with the vpc specified on the work pool in step 1. - Replace
<your-security-group-ids>
with a comma-separated list of your VPC security group IDs. - Replace
<task-definition-arn>
with the ARN of the task definition you just registered.
Sanity check
The work pool page in the Prefect UI allows you to check the health of your workers - make sure your new worker is live!
"},{"location":"ecs_guide/#step-4-pick-up-a-flow-run-with-your-new-worker","title":"Step 4: Pick up a flow run with your new worker!","text":"
-
Write a simple test flow in a repo of your choice:
my_flow.py
from prefect import flow, get_run_logger\n\n@flow\ndef my_flow():\n logger = get_run_logger()\n logger.info(\"Hello from ECS!!\")\n\nif __name__ == \"__main__\":\n my_flow()\n
-
Deploy the flow to the server, specifying the ECS work pool when prompted.
prefect deploy my_flow.py:my_flow\n
-
Find the deployment in the UI and click the Quick Run button!
"},{"location":"ecs_guide/#optional-next-steps","title":"Optional Next Steps","text":"
-
Now that you are confident your ECS worker is healthy, you can experiment with different work pool configurations.
- Do your flow runs require higher
CPU
? - Would an EC2
Launch Type
speed up your flow run execution?
These infrastructure configuration values can be set on your ECS work pool or they can be overridden on the deployment level through job_variables if desired.
-
Consider adding a build action to your Prefect Project prefect.yaml
if you want to automatically build a Docker image and push it to an image registry prefect deploy
is run.
Here is an example build action for ECR:
build:\n- prefect.deployments.steps.run_shell_script:\n id: get-commit-hash\n script: git rev-parse --short HEAD\n stream_output: false\n- prefect.deployments.steps.run_shell_script:\n id: ecr-auth-step\n script: aws ecr get-login-password --region <region> | docker login --username\n AWS --password-stdin <>.dkr.ecr.<region>.amazonaws.com\n stream_output: false\n- prefect_docker.deployments.steps.build_docker_image:\n requires: prefect-docker>=0.3.0\n image_name: <your-AWS-account-number>.dkr.ecr.us-east-2.amazonaws.com/<registry>\n tag: '{{ get-commit-hash.stdout }}'\n dockerfile: auto\n push: true\n
"},{"location":"ecs_worker/","title":"ECS Worker","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker","title":"
prefect_aws.workers.ecs_worker
","text":"
Prefect worker for executing flow runs as ECS tasks.
Get started by creating a work pool:
$ prefect work-pool create --type ecs my-ecs-pool\n
Then, you can start a worker for the pool:
$ prefect worker start --pool my-ecs-pool\n
It's common to deploy the worker as an ECS task as well. However, you can run the worker locally to get started.
The worker may work without any additional configuration, but this depends on your specific AWS setup; we recommend opening the work pool editor in the UI to see the available options.
By default, the worker will register a task definition for each flow run and run a task in your default ECS cluster using AWS Fargate. Fargate requires tasks to configure subnets, which we will infer from your default VPC. If you do not have a default VPC, you must provide a VPC ID or manually set up the network configuration for your tasks.
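If you are unsure whether your account has a default VPC in the target region, a quick boto3 check (a sketch that mirrors the lookup the worker itself performs) is shown below:
import boto3\n\nec2 = boto3.client(\"ec2\")\n# Look up the default VPC using the same filter the worker uses\nvpcs = ec2.describe_vpcs(Filters=[{\"Name\": \"isDefault\", \"Values\": [\"true\"]}])[\"Vpcs\"]\nif vpcs:\n print(f\"Default VPC: {vpcs[0]['VpcId']}\")\nelse:\n print(\"No default VPC found; set a vpc_id on the work pool instead.\")\n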
Note that the worker caches task definitions for each deployment to avoid excessive registration. The worker will check that the cached task definition is compatible with your configuration before using it.
The launch type option can be used to run your tasks in different modes. For example, FARGATE_SPOT
runs your Fargate tasks on spot instances, and EC2
runs your tasks on a cluster backed by EC2 instances.
Generally, it is very useful to enable CloudWatch logging for your ECS tasks; this can help you debug task failures. To enable CloudWatch logging, you must provide an execution role ARN with permissions to create and write to log streams. See the configure_cloudwatch_logs
field documentation for details.
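The exact IAM setup is environment-specific, but as a rough sketch, a policy on the execution role granting the permissions referenced above could look like the following (shown as a Python dict; the wildcard resource is only for illustration and should be scoped down):
# Hypothetical IAM policy document for the ECS execution role.\n# Scope \"Resource\" to your log groups in practice.\ncloudwatch_logging_policy = {\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Effect\": \"Allow\",\n \"Action\": [\n \"logs:CreateLogGroup\",\n \"logs:CreateLogStream\",\n \"logs:PutLogEvents\",\n ],\n \"Resource\": \"*\",\n }\n ],\n}\n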
The worker can be configured to use an existing task definition by setting the task definition arn variable or by providing a \"taskDefinition\" in the task run request. When a task definition ARN is provided, the worker will never create a new task definition, which may result in variables that are templated into the task definition payload being ignored.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker-classes","title":"Classes","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier","title":"
ECSIdentifier (tuple)
","text":"
The identifier for a running ECS task.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSIdentifier(NamedTuple):\n \"\"\"\n The identifier for a running ECS task.\n \"\"\"\n\n cluster: str\n task_arn: str\n
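For illustration, the identifier behaves like any other named tuple (the values below are placeholders):
from prefect_aws.workers.ecs_worker import ECSIdentifier\n\nidentifier = ECSIdentifier(cluster=\"my-cluster\", task_arn=\"<task-arn>\")\ncluster, task_arn = identifier # unpacks like a plain tuple\n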
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier-methods","title":"Methods","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier.__getnewargs__","title":"
__getnewargs__
special
","text":"
Return self as a plain tuple. Used by copy and pickle.
Source code in
prefect_aws/workers/ecs_worker.py
def __getnewargs__(self):\n 'Return self as a plain tuple. Used by copy and pickle.'\n return _tuple(self)\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier.__new__","title":"
__new__
special
staticmethod
","text":"
Create new instance of ECSIdentifier(cluster, task_arn)
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSIdentifier.__repr__","title":"
__repr__
special
","text":"
Return a nicely formatted representation string
Source code in
prefect_aws/workers/ecs_worker.py
def __repr__(self):\n 'Return a nicely formatted representation string'\n return self.__class__.__name__ + repr_fmt % self\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration","title":"
ECSJobConfiguration (BaseJobConfiguration)
pydantic-model
","text":"
Job configuration for an ECS worker.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSJobConfiguration(BaseJobConfiguration):\n \"\"\"\n Job configuration for an ECS worker.\n \"\"\"\n\n aws_credentials: Optional[AwsCredentials] = Field(default_factory=AwsCredentials)\n task_definition: Optional[Dict[str, Any]] = Field(\n template=_default_task_definition_template()\n )\n task_run_request: Dict[str, Any] = Field(\n template=_default_task_run_request_template()\n )\n configure_cloudwatch_logs: Optional[bool] = Field(default=None)\n cloudwatch_logs_options: Dict[str, str] = Field(default_factory=dict)\n network_configuration: Dict[str, Any] = Field(default_factory=dict)\n stream_output: Optional[bool] = Field(default=None)\n task_start_timeout_seconds: int = Field(default=300)\n task_watch_poll_interval: float = Field(default=5.0)\n auto_deregister_task_definition: bool = Field(default=False)\n vpc_id: Optional[str] = Field(default=None)\n container_name: Optional[str] = Field(default=None)\n cluster: Optional[str] = Field(default=None)\n\n @root_validator\n def task_run_request_requires_arn_if_no_task_definition_given(cls, values) -> dict:\n \"\"\"\n If no task definition is provided, a task definition ARN must be present on the\n task run request.\n \"\"\"\n if not values.get(\"task_run_request\", {}).get(\n \"taskDefinition\"\n ) and not values.get(\"task_definition\"):\n raise ValueError(\n \"A task definition must be provided if a task definition ARN is not \"\n \"present on the task run request.\"\n )\n return values\n\n @root_validator\n def container_name_default_from_task_definition(cls, values) -> dict:\n \"\"\"\n Infers the container name from the task definition if not provided.\n \"\"\"\n if values.get(\"container_name\") is None:\n values[\"container_name\"] = _container_name_from_task_definition(\n values.get(\"task_definition\")\n )\n\n # We may not have a name here still; for example if someone is using a task\n # definition arn. 
In that case, we'll perform similar logic later to find\n # the name to treat as the \"orchestration\" container.\n\n return values\n\n @root_validator(pre=True)\n def set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n\n @root_validator\n def configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # TODO: Does not match\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_run_request\", {}).get(\"taskDefinition\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n\n @root_validator\n def cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n\n @root_validator\n def network_configuration_requires_vpc_id(cls, values: dict) -> dict:\n \"\"\"\n Enforces a `vpc_id` is provided when custom network configuration mode is\n enabled for network settings.\n \"\"\"\n if values.get(\"network_configuration\") and not values.get(\"vpc_id\"):\n raise ValueError(\n \"You must provide a `vpc_id` to enable custom `network_configuration`.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration-methods","title":"Methods","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.cloudwatch_logs_options_requires_configure_cloudwatch_logs","title":"
cloudwatch_logs_options_requires_configure_cloudwatch_logs
classmethod
","text":"
Enforces that an execution role arn is provided (or could be provided by a runtime task definition) when configuring logging.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.configure_cloudwatch_logs_requires_execution_role_arn","title":"
configure_cloudwatch_logs_requires_execution_role_arn
classmethod
","text":"
Enforces that an execution role arn is provided (or could be provided by a runtime task definition) when configuring logging.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # TODO: Does not match\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_run_request\", {}).get(\"taskDefinition\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.container_name_default_from_task_definition","title":"
container_name_default_from_task_definition
classmethod
","text":"
Infers the container name from the task definition if not provided.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef container_name_default_from_task_definition(cls, values) -> dict:\n \"\"\"\n Infers the container name from the task definition if not provided.\n \"\"\"\n if values.get(\"container_name\") is None:\n values[\"container_name\"] = _container_name_from_task_definition(\n values.get(\"task_definition\")\n )\n\n # We may not have a name here still; for example if someone is using a task\n # definition arn. In that case, we'll perform similar logic later to find\n # the name to treat as the \"orchestration\" container.\n\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.network_configuration_requires_vpc_id","title":"
network_configuration_requires_vpc_id
classmethod
","text":"
Enforces a vpc_id
is provided when custom network configuration mode is enabled for network settings.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef network_configuration_requires_vpc_id(cls, values: dict) -> dict:\n \"\"\"\n Enforces a `vpc_id` is provided when custom network configuration mode is\n enabled for network settings.\n \"\"\"\n if values.get(\"network_configuration\") and not values.get(\"vpc_id\"):\n raise ValueError(\n \"You must provide a `vpc_id` to enable custom `network_configuration`.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.set_default_configure_cloudwatch_logs","title":"
set_default_configure_cloudwatch_logs
classmethod
","text":"
Streaming output generally requires CloudWatch logs to be configured.
To avoid entangled arguments in the simple case, configure_cloudwatch_logs
defaults to matching the value of stream_output
.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator(pre=True)\ndef set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSJobConfiguration.task_run_request_requires_arn_if_no_task_definition_given","title":"
task_run_request_requires_arn_if_no_task_definition_given
classmethod
","text":"
If no task definition is provided, a task definition ARN must be present on the task run request.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef task_run_request_requires_arn_if_no_task_definition_given(cls, values) -> dict:\n \"\"\"\n If no task definition is provided, a task definition ARN must be present on the\n task run request.\n \"\"\"\n if not values.get(\"task_run_request\", {}).get(\n \"taskDefinition\"\n ) and not values.get(\"task_definition\"):\n raise ValueError(\n \"A task definition must be provided if a task definition ARN is not \"\n \"present on the task run request.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables","title":"
ECSVariables (BaseVariables)
pydantic-model
","text":"
Variables for templating an ECS job.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSVariables(BaseVariables):\n \"\"\"\n Variables for templating an ECS job.\n \"\"\"\n\n task_definition_arn: Optional[str] = Field(\n default=None,\n description=(\n \"An identifier for an existing task definition to use. If set, options that\"\n \" require changes to the task definition will be ignored. All contents of \"\n \"the task definition in the job configuration will be ignored.\"\n ),\n )\n env: Dict[str, Optional[str]] = Field(\n title=\"Environment Variables\",\n default_factory=dict,\n description=(\n \"Environment variables to provide to the task run. These variables are set \"\n \"on the Prefect container at task runtime. These will not be set on the \"\n \"task definition.\"\n ),\n )\n aws_credentials: AwsCredentials = Field(\n title=\"AWS Credentials\",\n default_factory=AwsCredentials,\n description=(\n \"The AWS credentials to use to connect to ECS. If not provided, credentials\"\n \" will be inferred from the local environment following AWS's boto client's\"\n \" rules.\"\n ),\n )\n cluster: Optional[str] = Field(\n default=None,\n description=(\n \"The ECS cluster to run the task in. An ARN or name may be provided. If \"\n \"not provided, the default cluster will be used.\"\n ),\n )\n family: Optional[str] = Field(\n default=None,\n description=(\n \"A family for the task definition. If not provided, it will be inferred \"\n \"from the task definition. If the task definition does not have a family, \"\n \"the name will be generated. When flow and deployment metadata is \"\n \"available, the generated name will include their names. Values for this \"\n \"field will be slugified to match AWS character requirements.\"\n ),\n )\n launch_type: Optional[Literal[\"FARGATE\", \"EC2\", \"EXTERNAL\", \"FARGATE_SPOT\"]] = (\n Field(\n default=ECS_DEFAULT_LAUNCH_TYPE,\n description=(\n \"The type of ECS task run infrastructure that should be used. Note that\"\n \" 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure\"\n \" the proper capacity provider stategy if set here.\"\n ),\n )\n )\n image: Optional[str] = Field(\n default=None,\n description=(\n \"The image to use for the Prefect container in the task. If this value is \"\n \"not null, it will override the value in the task definition. This value \"\n \"defaults to a Prefect base image matching your local versions.\"\n ),\n )\n cpu: int = Field(\n title=\"CPU\",\n default=None,\n description=(\n \"The amount of CPU to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_CPU} will be used unless present on the task definition.\"\n ),\n )\n memory: int = Field(\n default=None,\n description=(\n \"The amount of memory to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_MEMORY} will be used unless present on the task definition.\"\n ),\n )\n container_name: str = Field(\n default=None,\n description=(\n \"The name of the container flow run orchestration will occur in. If not \"\n f\"specified, a default value of {ECS_DEFAULT_CONTAINER_NAME} will be used \"\n \"and if that is not found in the task definition the first container will \"\n \"be used.\"\n ),\n )\n task_role_arn: str = Field(\n title=\"Task Role ARN\",\n default=None,\n description=(\n \"A role to attach to the task run. 
This controls the permissions of the \"\n \"task while it is running.\"\n ),\n )\n execution_role_arn: str = Field(\n title=\"Execution Role ARN\",\n default=None,\n description=(\n \"An execution role to use for the task. This controls the permissions of \"\n \"the task when it is launching. If this value is not null, it will \"\n \"override the value in the task definition. An execution role must be \"\n \"provided to capture logs from the container.\"\n ),\n )\n vpc_id: Optional[str] = Field(\n title=\"VPC ID\",\n default=None,\n description=(\n \"The AWS VPC to link the task run to. This is only applicable when using \"\n \"the 'awsvpc' network mode for your task. FARGATE tasks require this \"\n \"network mode, but for EC2 tasks the default network mode is 'bridge'. \"\n \"If using the 'awsvpc' network mode and this field is null, your default \"\n \"VPC will be used. If no default VPC can be found, the task run will fail.\"\n ),\n )\n configure_cloudwatch_logs: bool = Field(\n default=None,\n description=(\n \"If enabled, the Prefect container will be configured to send its output \"\n \"to the AWS CloudWatch logs service. This functionality requires an \"\n \"execution role with logs:CreateLogStream, logs:CreateLogGroup, and \"\n \"logs:PutLogEvents permissions. The default for this field is `False` \"\n \"unless `stream_output` is set.\"\n ),\n )\n cloudwatch_logs_options: Dict[str, str] = Field(\n default_factory=dict,\n description=(\n \"When `configure_cloudwatch_logs` is enabled, this setting may be used to\"\n \" pass additional options to the CloudWatch logs configuration or override\"\n \" the default options. See the [AWS\"\n \" documentation](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html#create_awslogs_logdriver_options)\" # noqa\n \" for available options. \"\n ),\n )\n\n network_configuration: Dict[str, Any] = Field(\n default_factory=dict,\n description=(\n \"When `network_configuration` is supplied it will override ECS Worker's\"\n \"awsvpcConfiguration that defined in the ECS task executing your workload. \"\n \"See the [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-service-awsvpcconfiguration.html)\" # noqa\n \" for available options.\"\n ),\n )\n\n stream_output: bool = Field(\n default=None,\n description=(\n \"If enabled, logs will be streamed from the Prefect container to the local \"\n \"console. Unless you have configured AWS CloudWatch logs manually on your \"\n \"task definition, this requires the same prerequisites outlined in \"\n \"`configure_cloudwatch_logs`.\"\n ),\n )\n task_start_timeout_seconds: int = Field(\n default=300,\n description=(\n \"The amount of time to watch for the start of the ECS task \"\n \"before marking it as failed. The task must enter a RUNNING state to be \"\n \"considered started.\"\n ),\n )\n task_watch_poll_interval: float = Field(\n default=5.0,\n description=(\n \"The amount of time to wait between AWS API calls while monitoring the \"\n \"state of an ECS task.\"\n ),\n )\n auto_deregister_task_definition: bool = Field(\n default=False,\n description=(\n \"If enabled, any task definitions that are created by this block will be \"\n \"deregistered. Existing task definitions linked by ARN will never be \"\n \"deregistered. Deregistering a task definition does not remove it from \"\n \"your AWS account, instead it will be marked as INACTIVE.\"\n ),\n )\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables-attributes","title":"Attributes","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.auto_deregister_task_definition","title":"
auto_deregister_task_definition: bool
pydantic-field
","text":"
If enabled, any task definitions that are created by this block will be deregistered. Existing task definitions linked by ARN will never be deregistered. Deregistering a task definition does not remove it from your AWS account, instead it will be marked as INACTIVE.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.aws_credentials","title":"
aws_credentials: AwsCredentials
pydantic-field
","text":"
The AWS credentials to use to connect to ECS. If not provided, credentials will be inferred from the local environment following AWS's boto client's rules.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.cloudwatch_logs_options","title":"
cloudwatch_logs_options: Dict[str, str]
pydantic-field
","text":"
When configure_cloudwatch_logs
is enabled, this setting may be used to pass additional options to the CloudWatch logs configuration or override the default options. See the AWS documentation for available options.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.cluster","title":"
cluster: str
pydantic-field
","text":"
The ECS cluster to run the task in. An ARN or name may be provided. If not provided, the default cluster will be used.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.configure_cloudwatch_logs","title":"
configure_cloudwatch_logs: bool
pydantic-field
","text":"
If enabled, the Prefect container will be configured to send its output to the AWS CloudWatch logs service. This functionality requires an execution role with logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents permissions. The default for this field is False
unless stream_output
is set.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.container_name","title":"
container_name: str
pydantic-field
","text":"
The name of the container flow run orchestration will occur in. If not specified, a default value of prefect will be used and if that is not found in the task definition the first container will be used.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.cpu","title":"
cpu: int
pydantic-field
","text":"
The amount of CPU to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 1024 will be used unless present on the task definition.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.execution_role_arn","title":"
execution_role_arn: str
pydantic-field
","text":"
An execution role to use for the task. This controls the permissions of the task when it is launching. If this value is not null, it will override the value in the task definition. An execution role must be provided to capture logs from the container.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.family","title":"
family: str
pydantic-field
","text":"
A family for the task definition. If not provided, it will be inferred from the task definition. If the task definition does not have a family, the name will be generated. When flow and deployment metadata is available, the generated name will include their names. Values for this field will be slugified to match AWS character requirements.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.image","title":"
image: str
pydantic-field
","text":"
The image to use for the Prefect container in the task. If this value is not null, it will override the value in the task definition. This value defaults to a Prefect base image matching your local versions.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.launch_type","title":"
launch_type: Literal['FARGATE', 'EC2', 'EXTERNAL', 'FARGATE_SPOT']
pydantic-field
","text":"
The type of ECS task run infrastructure that should be used. Note that 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure the proper capacity provider strategy if set here.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.memory","title":"
memory: int
pydantic-field
","text":"
The amount of memory to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 2048 will be used unless present on the task definition.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.network_configuration","title":"
network_configuration: Dict[str, Any]
pydantic-field
","text":"
When network_configuration
is supplied, it will override the ECS worker's awsvpcConfiguration defined in the ECS task executing your workload. See the AWS documentation for available options.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.stream_output","title":"
stream_output: bool
pydantic-field
","text":"
If enabled, logs will be streamed from the Prefect container to the local console. Unless you have configured AWS CloudWatch logs manually on your task definition, this requires the same prerequisites outlined in configure_cloudwatch_logs
.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.task_definition_arn","title":"
task_definition_arn: str
pydantic-field
","text":"
An identifier for an existing task definition to use. If set, options that require changes to the task definition will be ignored. All contents of the task definition in the job configuration will be ignored.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.task_role_arn","title":"
task_role_arn: str
pydantic-field
","text":"
A role to attach to the task run. This controls the permissions of the task while it is running.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.task_start_timeout_seconds","title":"
task_start_timeout_seconds: int
pydantic-field
","text":"
The amount of time to watch for the start of the ECS task before marking it as failed. The task must enter a RUNNING state to be considered started.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.task_watch_poll_interval","title":"
task_watch_poll_interval: float
pydantic-field
","text":"
The amount of time to wait between AWS API calls while monitoring the state of an ECS task.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSVariables.vpc_id","title":"
vpc_id: str
pydantic-field
","text":"
The AWS VPC to link the task run to. This is only applicable when using the 'awsvpc' network mode for your task. FARGATE tasks require this network mode, but for EC2 tasks the default network mode is 'bridge'. If using the 'awsvpc' network mode and this field is null, your default VPC will be used. If no default VPC can be found, the task run will fail.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker","title":"
ECSWorker (BaseWorker)
","text":"
A Prefect worker to run flow runs as ECS tasks.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSWorker(BaseWorker):\n \"\"\"\n A Prefect worker to run flow runs as ECS tasks.\n \"\"\"\n\n type = \"ecs\"\n job_configuration = ECSJobConfiguration\n job_configuration_variables = ECSVariables\n _description = (\n \"Execute flow runs within containers on AWS ECS. Works with EC2 \"\n \"and Fargate clusters. Requires an AWS account.\"\n )\n _display_name = \"AWS Elastic Container Service\"\n _documentation_url = \"https://prefecthq.github.io/prefect-aws/ecs_worker/\"\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n\n async def run(\n self,\n flow_run: \"FlowRun\",\n configuration: ECSJobConfiguration,\n task_status: Optional[anyio.abc.TaskStatus] = None,\n ) -> BaseWorkerResult:\n \"\"\"\n Runs a given flow run on the current worker.\n \"\"\"\n boto_session, ecs_client = await run_sync_in_worker_thread(\n self._get_session_and_client, configuration\n )\n\n logger = self.get_flow_run_logger(flow_run)\n\n (\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition,\n ) = await run_sync_in_worker_thread(\n self._create_task_and_wait_for_start,\n logger,\n boto_session,\n ecs_client,\n configuration,\n flow_run,\n )\n\n # The task identifier is \"{cluster}::{task}\" where we use the configured cluster\n # if set to preserve matching by name rather than arn\n # Note \"::\" is used despite the Prefect standard being \":\" because ARNs contain\n # single colons.\n identifier = (\n (configuration.cluster if configuration.cluster else cluster_arn)\n + \"::\"\n + task_arn\n )\n\n if task_status:\n task_status.started(identifier)\n\n status_code = await run_sync_in_worker_thread(\n self._watch_task_and_get_exit_code,\n logger,\n configuration,\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition and configuration.auto_deregister_task_definition,\n boto_session,\n ecs_client,\n )\n\n return ECSWorkerResult(\n identifier=identifier,\n # If the container does not start the exit code can be null but we must\n # still report a status code. 
We use a -1 to indicate a special code.\n status_code=status_code if status_code is not None else -1,\n )\n\n def _get_session_and_client(\n self,\n configuration: ECSJobConfiguration,\n ) -> Tuple[boto3.Session, _ECSClient]:\n \"\"\"\n Retrieve a boto3 session and ECS client\n \"\"\"\n boto_session = configuration.aws_credentials.get_boto3_session()\n ecs_client = boto_session.client(\"ecs\")\n return boto_session, ecs_client\n\n def _create_task_and_wait_for_start(\n self,\n logger: logging.Logger,\n boto_session: boto3.Session,\n ecs_client: _ECSClient,\n configuration: ECSJobConfiguration,\n flow_run: FlowRun,\n ) -> Tuple[str, str, dict, bool]:\n \"\"\"\n Register the task definition, create the task run, and wait for it to start.\n\n Returns a tuple of\n - The task ARN\n - The task's cluster ARN\n - The task definition\n - A bool indicating if the task definition is newly registered\n \"\"\"\n task_definition_arn = configuration.task_run_request.get(\"taskDefinition\")\n new_task_definition_registered = False\n\n if not task_definition_arn:\n cached_task_definition_arn = _TASK_DEFINITION_CACHE.get(\n flow_run.deployment_id\n )\n task_definition = self._prepare_task_definition(\n configuration, region=ecs_client.meta.region_name\n )\n\n if cached_task_definition_arn:\n # Read the task definition to see if the cached task definition is valid\n try:\n cached_task_definition = self._retrieve_task_definition(\n logger, ecs_client, cached_task_definition_arn\n )\n except Exception as exc:\n logger.warning(\n \"Failed to retrieve cached task definition\"\n f\" {cached_task_definition_arn!r}: {exc!r}\"\n )\n # Clear from cache\n _TASK_DEFINITION_CACHE.pop(flow_run.deployment_id, None)\n cached_task_definition_arn = None\n else:\n if not cached_task_definition[\"status\"] == \"ACTIVE\":\n # Cached task definition is not active\n logger.warning(\n \"Cached task definition\"\n f\" {cached_task_definition_arn!r} is not active\"\n )\n _TASK_DEFINITION_CACHE.pop(flow_run.deployment_id, None)\n cached_task_definition_arn = None\n elif not self._task_definitions_equal(\n task_definition, cached_task_definition\n ):\n # Cached task definition is not valid\n logger.warning(\n \"Cached task definition\"\n f\" {cached_task_definition_arn!r} does not meet\"\n \" requirements\"\n )\n _TASK_DEFINITION_CACHE.pop(flow_run.deployment_id, None)\n cached_task_definition_arn = None\n\n if not cached_task_definition_arn:\n task_definition_arn = self._register_task_definition(\n logger, ecs_client, task_definition\n )\n new_task_definition_registered = True\n else:\n task_definition_arn = cached_task_definition_arn\n else:\n task_definition = self._retrieve_task_definition(\n logger, ecs_client, task_definition_arn\n )\n if configuration.task_definition:\n logger.warning(\n \"Ignoring task definition in configuration since task definition\"\n \" ARN is provided on the task run request.\"\n )\n\n self._validate_task_definition(task_definition, configuration)\n\n # Update the cached task definition ARN to avoid re-registering the task\n # definition on this worker unless necessary; registration is agressively\n # rate limited by AWS\n _TASK_DEFINITION_CACHE[flow_run.deployment_id] = task_definition_arn\n\n logger.info(f\"Using ECS task definition {task_definition_arn!r}...\")\n logger.debug(\n f\"Task definition {json.dumps(task_definition, indent=2, default=str)}\"\n )\n\n # Prepare the task run request\n task_run_request = self._prepare_task_run_request(\n boto_session,\n configuration,\n task_definition,\n 
task_definition_arn,\n )\n\n logger.info(\"Creating ECS task run...\")\n logger.debug(\n \"Task run request\"\n f\"{json.dumps(mask_api_key(task_run_request), indent=2, default=str)}\"\n )\n\n try:\n task = self._create_task_run(ecs_client, task_run_request)\n task_arn = task[\"taskArn\"]\n cluster_arn = task[\"clusterArn\"]\n except Exception as exc:\n self._report_task_run_creation_failure(configuration, task_run_request, exc)\n raise\n\n # Raises an exception if the task does not start\n logger.info(\"Waiting for ECS task run to start...\")\n self._wait_for_task_start(\n logger,\n configuration,\n task_arn,\n cluster_arn,\n ecs_client,\n timeout=configuration.task_start_timeout_seconds,\n )\n\n return task_arn, cluster_arn, task_definition, new_task_definition_registered\n\n def _watch_task_and_get_exit_code(\n self,\n logger: logging.Logger,\n configuration: ECSJobConfiguration,\n task_arn: str,\n cluster_arn: str,\n task_definition: dict,\n deregister_task_definition: bool,\n boto_session: boto3.Session,\n ecs_client: _ECSClient,\n ) -> Optional[int]:\n \"\"\"\n Wait for the task run to complete and retrieve the exit code of the Prefect\n container.\n \"\"\"\n\n # Wait for completion and stream logs\n task = self._wait_for_task_finish(\n logger,\n configuration,\n task_arn,\n cluster_arn,\n task_definition,\n ecs_client,\n boto_session,\n )\n\n if deregister_task_definition:\n ecs_client.deregister_task_definition(\n taskDefinition=task[\"taskDefinitionArn\"]\n )\n\n container_name = (\n configuration.container_name\n or _container_name_from_task_definition(task_definition)\n or ECS_DEFAULT_CONTAINER_NAME\n )\n\n # Check the status code of the Prefect container\n container = _get_container(task[\"containers\"], container_name)\n assert (\n container is not None\n ), f\"'{container_name}' container missing from task: {task}\"\n status_code = container.get(\"exitCode\")\n self._report_container_status_code(logger, container_name, status_code)\n\n return status_code\n\n def _report_container_status_code(\n self, logger: logging.Logger, name: str, status_code: Optional[int]\n ) -> None:\n \"\"\"\n Display a log for the given container status code.\n \"\"\"\n if status_code is None:\n logger.error(\n f\"Task exited without reporting an exit status for container {name!r}.\"\n )\n elif status_code == 0:\n logger.info(f\"Container {name!r} exited successfully.\")\n else:\n logger.warning(\n f\"Container {name!r} exited with non-zero exit code {status_code}.\"\n )\n\n def _report_task_run_creation_failure(\n self, configuration: ECSJobConfiguration, task_run: dict, exc: Exception\n ) -> None:\n \"\"\"\n Wrap common AWS task run creation failures with nicer user-facing messages.\n \"\"\"\n # AWS generates exception types at runtime so they must be captured a bit\n # differently than normal.\n if \"ClusterNotFoundException\" in str(exc):\n cluster = task_run.get(\"cluster\", \"default\")\n raise RuntimeError(\n f\"Failed to run ECS task, cluster {cluster!r} not found. \"\n \"Confirm that the cluster is configured in your region.\"\n ) from exc\n elif (\n \"No Container Instances\" in str(exc) and task_run.get(\"launchType\") == \"EC2\"\n ):\n cluster = task_run.get(\"cluster\", \"default\")\n raise RuntimeError(\n f\"Failed to run ECS task, cluster {cluster!r} does not appear to \"\n \"have any container instances associated with it. 
Confirm that you \"\n \"have EC2 container instances available.\"\n ) from exc\n elif (\n \"failed to validate logger args\" in str(exc)\n and \"AccessDeniedException\" in str(exc)\n and configuration.configure_cloudwatch_logs\n ):\n raise RuntimeError(\n \"Failed to run ECS task, the attached execution role does not appear\"\n \" to have sufficient permissions. Ensure that the execution role\"\n f\" {configuration.execution_role!r} has permissions\"\n \" logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents.\"\n )\n else:\n raise\n\n def _validate_task_definition(\n self, task_definition: dict, configuration: ECSJobConfiguration\n ) -> None:\n \"\"\"\n Ensure that the task definition is compatible with the configuration.\n\n Raises `ValueError` on incompatibility. Returns `None` on success.\n \"\"\"\n launch_type = configuration.task_run_request.get(\n \"launchType\", ECS_DEFAULT_LAUNCH_TYPE\n )\n if (\n launch_type != \"EC2\"\n and \"FARGATE\" not in task_definition[\"requiresCompatibilities\"]\n ):\n raise ValueError(\n \"Task definition does not have 'FARGATE' in 'requiresCompatibilities'\"\n f\" and cannot be used with launch type {launch_type!r}\"\n )\n\n if launch_type == \"FARGATE\" or launch_type == \"FARGATE_SPOT\":\n # Only the 'awsvpc' network mode is supported when using FARGATE\n network_mode = task_definition.get(\"networkMode\")\n if network_mode != \"awsvpc\":\n raise ValueError(\n f\"Found network mode {network_mode!r} which is not compatible with \"\n f\"launch type {launch_type!r}. Use either the 'EC2' launch \"\n \"type or the 'awsvpc' network mode.\"\n )\n\n if configuration.configure_cloudwatch_logs and not task_definition.get(\n \"executionRoleArn\"\n ):\n raise ValueError(\n \"An execution role arn must be set on the task definition to use \"\n \"`configure_cloudwatch_logs` or `stream_logs` but no execution role \"\n \"was found on the task definition.\"\n )\n\n def _register_task_definition(\n self,\n logger: logging.Logger,\n ecs_client: _ECSClient,\n task_definition: dict,\n ) -> str:\n \"\"\"\n Register a new task definition with AWS.\n\n Returns the ARN.\n \"\"\"\n logger.info(\"Registering ECS task definition...\")\n logger.debug(\n \"Task definition request\"\n f\"{json.dumps(task_definition, indent=2, default=str)}\"\n )\n response = ecs_client.register_task_definition(**task_definition)\n return response[\"taskDefinition\"][\"taskDefinitionArn\"]\n\n def _retrieve_task_definition(\n self,\n logger: logging.Logger,\n ecs_client: _ECSClient,\n task_definition_arn: str,\n ):\n \"\"\"\n Retrieve an existing task definition from AWS.\n \"\"\"\n logger.info(f\"Retrieving ECS task definition {task_definition_arn!r}...\")\n response = ecs_client.describe_task_definition(\n taskDefinition=task_definition_arn\n )\n return response[\"taskDefinition\"]\n\n def _wait_for_task_start(\n self,\n logger: logging.Logger,\n configuration: ECSJobConfiguration,\n task_arn: str,\n cluster_arn: str,\n ecs_client: _ECSClient,\n timeout: int,\n ) -> dict:\n \"\"\"\n Waits for an ECS task run to reach a RUNNING status.\n\n If a STOPPED status is reached instead, an exception is raised indicating the\n reason that the task run did not start.\n \"\"\"\n for task in self._watch_task_run(\n logger,\n configuration,\n task_arn,\n cluster_arn,\n ecs_client,\n until_status=\"RUNNING\",\n timeout=timeout,\n ):\n # TODO: It is possible that the task has passed _through_ a RUNNING\n # status during the polling interval. 
In this case, there is not an\n # exception to raise.\n if task[\"lastStatus\"] == \"STOPPED\":\n code = task.get(\"stopCode\")\n reason = task.get(\"stoppedReason\")\n # Generate a dynamic exception type from the AWS name\n raise type(code, (RuntimeError,), {})(reason)\n\n return task\n\n def _wait_for_task_finish(\n self,\n logger: logging.Logger,\n configuration: ECSJobConfiguration,\n task_arn: str,\n cluster_arn: str,\n task_definition: dict,\n ecs_client: _ECSClient,\n boto_session: boto3.Session,\n ):\n \"\"\"\n Watch an ECS task until it reaches a STOPPED status.\n\n If configured, logs from the Prefect container are streamed to stderr.\n\n Returns a description of the task on completion.\n \"\"\"\n can_stream_output = False\n container_name = (\n configuration.container_name\n or _container_name_from_task_definition(task_definition)\n or ECS_DEFAULT_CONTAINER_NAME\n )\n\n if configuration.stream_output:\n container_def = _get_container(\n task_definition[\"containerDefinitions\"], container_name\n )\n if not container_def:\n logger.warning(\n \"Prefect container definition not found in \"\n \"task definition. Output cannot be streamed.\"\n )\n elif not container_def.get(\"logConfiguration\"):\n logger.warning(\n \"Logging configuration not found on task. \"\n \"Output cannot be streamed.\"\n )\n elif not container_def[\"logConfiguration\"].get(\"logDriver\") == \"awslogs\":\n logger.warning(\n \"Logging configuration uses unsupported \"\n \" driver {container_def['logConfiguration'].get('logDriver')!r}. \"\n \"Output cannot be streamed.\"\n )\n else:\n # Prepare to stream the output\n log_config = container_def[\"logConfiguration\"][\"options\"]\n logs_client = boto_session.client(\"logs\")\n can_stream_output = True\n # Track the last log timestamp to prevent double display\n last_log_timestamp: Optional[int] = None\n # Determine the name of the stream as \"prefix/container/run-id\"\n stream_name = \"/\".join(\n [\n log_config[\"awslogs-stream-prefix\"],\n container_name,\n task_arn.rsplit(\"/\")[-1],\n ]\n )\n self._logger.info(\n f\"Streaming output from container {container_name!r}...\"\n )\n\n for task in self._watch_task_run(\n logger,\n configuration,\n task_arn,\n cluster_arn,\n ecs_client,\n current_status=\"RUNNING\",\n ):\n if configuration.stream_output and can_stream_output:\n # On each poll for task run status, also retrieve available logs\n last_log_timestamp = self._stream_available_logs(\n logger,\n logs_client,\n log_group=log_config[\"awslogs-group\"],\n log_stream=stream_name,\n last_log_timestamp=last_log_timestamp,\n )\n\n return task\n\n def _stream_available_logs(\n self,\n logger: logging.Logger,\n logs_client: Any,\n log_group: str,\n log_stream: str,\n last_log_timestamp: Optional[int] = None,\n ) -> Optional[int]:\n \"\"\"\n Stream logs from the given log group and stream since the last log timestamp.\n\n Will continue on paginated responses until all logs are returned.\n\n Returns the last log timestamp which can be used to call this method in the\n future.\n \"\"\"\n last_log_stream_token = \"NO-TOKEN\"\n next_log_stream_token = None\n\n # AWS will return the same token that we send once the end of the paginated\n # response is reached\n while last_log_stream_token != next_log_stream_token:\n last_log_stream_token = next_log_stream_token\n\n request = {\n \"logGroupName\": log_group,\n \"logStreamName\": log_stream,\n }\n\n if last_log_stream_token is not None:\n request[\"nextToken\"] = last_log_stream_token\n\n if last_log_timestamp is not None:\n # 
Bump the timestamp by one ms to avoid retrieving the last log again\n request[\"startTime\"] = last_log_timestamp + 1\n\n try:\n response = logs_client.get_log_events(**request)\n except Exception:\n logger.error(\n f\"Failed to read log events with request {request}\",\n exc_info=True,\n )\n return last_log_timestamp\n\n log_events = response[\"events\"]\n for log_event in log_events:\n # TODO: This doesn't forward to the local logger, which can be\n # bad for customizing handling and understanding where the\n # log is coming from, but it avoid nesting logger information\n # when the content is output from a Prefect logger on the\n # running infrastructure\n print(log_event[\"message\"], file=sys.stderr)\n\n if (\n last_log_timestamp is None\n or log_event[\"timestamp\"] > last_log_timestamp\n ):\n last_log_timestamp = log_event[\"timestamp\"]\n\n next_log_stream_token = response.get(\"nextForwardToken\")\n if not log_events:\n # Stop reading pages if there was no data\n break\n\n return last_log_timestamp\n\n def _watch_task_run(\n self,\n logger: logging.Logger,\n configuration: ECSJobConfiguration,\n task_arn: str,\n cluster_arn: str,\n ecs_client: _ECSClient,\n current_status: str = \"UNKNOWN\",\n until_status: str = None,\n timeout: int = None,\n ) -> Generator[None, None, dict]:\n \"\"\"\n Watches an ECS task run by querying every `poll_interval` seconds. After each\n query, the retrieved task is yielded. This function returns when the task run\n reaches a STOPPED status or the provided `until_status`.\n\n Emits a log each time the status changes.\n \"\"\"\n last_status = status = current_status\n t0 = time.time()\n while status != until_status:\n tasks = ecs_client.describe_tasks(\n tasks=[task_arn], cluster=cluster_arn, include=[\"TAGS\"]\n )[\"tasks\"]\n\n if tasks:\n task = tasks[0]\n\n status = task[\"lastStatus\"]\n if status != last_status:\n logger.info(f\"ECS task status is {status}.\")\n\n yield task\n\n # No point in continuing if the status is final\n if status == \"STOPPED\":\n break\n\n last_status = status\n\n else:\n # Intermittently, the task will not be described. 
We wat to respect the\n # watch timeout though.\n logger.debug(\"Task not found.\")\n\n elapsed_time = time.time() - t0\n if timeout is not None and elapsed_time > timeout:\n raise RuntimeError(\n f\"Timed out after {elapsed_time}s while watching task for status \"\n f\"{until_status or 'STOPPED'}.\"\n )\n time.sleep(configuration.task_watch_poll_interval)\n\n def _prepare_task_definition(\n self,\n configuration: ECSJobConfiguration,\n region: str,\n ) -> dict:\n \"\"\"\n Prepare a task definition by inferring any defaults and merging overrides.\n \"\"\"\n task_definition = copy.deepcopy(configuration.task_definition)\n\n # Configure the Prefect runtime container\n task_definition.setdefault(\"containerDefinitions\", [])\n\n # Remove empty container definitions\n task_definition[\"containerDefinitions\"] = [\n d for d in task_definition[\"containerDefinitions\"] if d\n ]\n\n container_name = configuration.container_name\n if not container_name:\n container_name = (\n _container_name_from_task_definition(task_definition)\n or ECS_DEFAULT_CONTAINER_NAME\n )\n\n container = _get_container(\n task_definition[\"containerDefinitions\"], container_name\n )\n if container is None:\n if container_name != ECS_DEFAULT_CONTAINER_NAME:\n raise ValueError(\n f\"Container {container_name!r} not found in task definition.\"\n )\n\n # Look for a container without a name\n for container in task_definition[\"containerDefinitions\"]:\n if \"name\" not in container:\n container[\"name\"] = container_name\n break\n else:\n container = {\"name\": container_name}\n task_definition[\"containerDefinitions\"].append(container)\n\n # Image is required so make sure it's present\n container.setdefault(\"image\", get_prefect_image_name())\n\n # Remove any keys that have been explicitly \"unset\"\n unset_keys = {key for key, value in configuration.env.items() if value is None}\n for item in tuple(container.get(\"environment\", [])):\n if item[\"name\"] in unset_keys or item[\"value\"] is None:\n container[\"environment\"].remove(item)\n\n if configuration.configure_cloudwatch_logs:\n container[\"logConfiguration\"] = {\n \"logDriver\": \"awslogs\",\n \"options\": {\n \"awslogs-create-group\": \"true\",\n \"awslogs-group\": \"prefect\",\n \"awslogs-region\": region,\n \"awslogs-stream-prefix\": configuration.name or \"prefect\",\n **configuration.cloudwatch_logs_options,\n },\n }\n\n family = task_definition.get(\"family\") or ECS_DEFAULT_FAMILY\n task_definition[\"family\"] = slugify(\n family,\n max_length=255,\n regex_pattern=r\"[^a-zA-Z0-9-_]+\",\n )\n\n # CPU and memory are required in some cases, retrieve the value to use\n cpu = task_definition.get(\"cpu\") or ECS_DEFAULT_CPU\n memory = task_definition.get(\"memory\") or ECS_DEFAULT_MEMORY\n\n launch_type = configuration.task_run_request.get(\n \"launchType\", ECS_DEFAULT_LAUNCH_TYPE\n )\n\n if launch_type == \"FARGATE\" or launch_type == \"FARGATE_SPOT\":\n # Task level memory and cpu are required when using fargate\n task_definition[\"cpu\"] = str(cpu)\n task_definition[\"memory\"] = str(memory)\n\n # The FARGATE compatibility is required if it will be used as as launch type\n requires_compatibilities = task_definition.setdefault(\n \"requiresCompatibilities\", []\n )\n if \"FARGATE\" not in requires_compatibilities:\n task_definition[\"requiresCompatibilities\"].append(\"FARGATE\")\n\n # Only the 'awsvpc' network mode is supported when using FARGATE\n # However, we will not enforce that here if the user has set it\n task_definition.setdefault(\"networkMode\", 
\"awsvpc\")\n\n elif launch_type == \"EC2\":\n # Container level memory and cpu are required when using ec2\n container.setdefault(\"cpu\", cpu)\n container.setdefault(\"memory\", memory)\n\n # Ensure set values are cast to integers\n container[\"cpu\"] = int(container[\"cpu\"])\n container[\"memory\"] = int(container[\"memory\"])\n\n # Ensure set values are cast to strings\n if task_definition.get(\"cpu\"):\n task_definition[\"cpu\"] = str(task_definition[\"cpu\"])\n if task_definition.get(\"memory\"):\n task_definition[\"memory\"] = str(task_definition[\"memory\"])\n\n return task_definition\n\n def _load_network_configuration(\n self, vpc_id: Optional[str], boto_session: boto3.Session\n ) -> dict:\n \"\"\"\n Load settings from a specific VPC or the default VPC and generate a task\n run request's network configuration.\n \"\"\"\n ec2_client = boto_session.client(\"ec2\")\n vpc_message = \"the default VPC\" if not vpc_id else f\"VPC with ID {vpc_id}\"\n\n if not vpc_id:\n # Retrieve the default VPC\n describe = {\"Filters\": [{\"Name\": \"isDefault\", \"Values\": [\"true\"]}]}\n else:\n describe = {\"VpcIds\": [vpc_id]}\n\n vpcs = ec2_client.describe_vpcs(**describe)[\"Vpcs\"]\n if not vpcs:\n help_message = (\n \"Pass an explicit `vpc_id` or configure a default VPC.\"\n if not vpc_id\n else \"Check that the VPC exists in the current region.\"\n )\n raise ValueError(\n f\"Failed to find {vpc_message}. \"\n \"Network configuration cannot be inferred. \"\n + help_message\n )\n\n vpc_id = vpcs[0][\"VpcId\"]\n subnets = ec2_client.describe_subnets(\n Filters=[{\"Name\": \"vpc-id\", \"Values\": [vpc_id]}]\n )[\"Subnets\"]\n if not subnets:\n raise ValueError(\n f\"Failed to find subnets for {vpc_message}. \"\n \"Network configuration cannot be inferred.\"\n )\n\n return {\n \"awsvpcConfiguration\": {\n \"subnets\": [s[\"SubnetId\"] for s in subnets],\n \"assignPublicIp\": \"ENABLED\",\n \"securityGroups\": [],\n }\n }\n\n def _custom_network_configuration(\n self, vpc_id: str, network_configuration: dict, boto_session: boto3.Session\n ) -> dict:\n \"\"\"\n Load settings from a specific VPC or the default VPC and generate a task\n run request's network configuration.\n \"\"\"\n ec2_client = boto_session.client(\"ec2\")\n vpc_message = f\"VPC with ID {vpc_id}\"\n\n vpcs = ec2_client.describe_vpcs(VpcIds=[vpc_id]).get(\"Vpcs\")\n\n if not vpcs:\n raise ValueError(\n f\"Failed to find {vpc_message}. \"\n + \"Network configuration cannot be inferred. \"\n + \"Pass an explicit `vpc_id`.\"\n )\n\n vpc_id = vpcs[0][\"VpcId\"]\n subnets = ec2_client.describe_subnets(\n Filters=[{\"Name\": \"vpc-id\", \"Values\": [vpc_id]}]\n )[\"Subnets\"]\n\n if not subnets:\n raise ValueError(\n f\"Failed to find subnets for {vpc_message}. 
\"\n + \"Network configuration cannot be inferred.\"\n )\n\n subnet_ids = [subnet[\"SubnetId\"] for subnet in subnets]\n\n config_subnets = network_configuration.get(\"subnets\", [])\n if not all(conf_sn in subnet_ids for conf_sn in config_subnets):\n raise ValueError(\n f\"Subnets {config_subnets} not found within {vpc_message}.\"\n + \"Please check that VPC is associated with supplied subnets.\"\n )\n\n return {\"awsvpcConfiguration\": network_configuration}\n\n def _prepare_task_run_request(\n self,\n boto_session: boto3.Session,\n configuration: ECSJobConfiguration,\n task_definition: dict,\n task_definition_arn: str,\n ) -> dict:\n \"\"\"\n Prepare a task run request payload.\n \"\"\"\n task_run_request = deepcopy(configuration.task_run_request)\n\n task_run_request.setdefault(\"taskDefinition\", task_definition_arn)\n assert task_run_request[\"taskDefinition\"] == task_definition_arn\n\n if task_run_request.get(\"launchType\") == \"FARGATE_SPOT\":\n # Should not be provided at all for FARGATE SPOT\n task_run_request.pop(\"launchType\", None)\n\n # A capacity provider strategy is required for FARGATE SPOT\n task_run_request.setdefault(\n \"capacityProviderStrategy\",\n [{\"capacityProvider\": \"FARGATE_SPOT\", \"weight\": 1}],\n )\n\n overrides = task_run_request.get(\"overrides\", {})\n container_overrides = overrides.get(\"containerOverrides\", [])\n\n # Ensure the network configuration is present if using awsvpc for network mode\n if (\n task_definition.get(\"networkMode\") == \"awsvpc\"\n and not task_run_request.get(\"networkConfiguration\")\n and not configuration.network_configuration\n ):\n task_run_request[\"networkConfiguration\"] = self._load_network_configuration(\n configuration.vpc_id, boto_session\n )\n\n # Use networkConfiguration if supplied by user\n if (\n task_definition.get(\"networkMode\") == \"awsvpc\"\n and configuration.network_configuration\n and configuration.vpc_id\n ):\n task_run_request[\"networkConfiguration\"] = (\n self._custom_network_configuration(\n configuration.vpc_id,\n configuration.network_configuration,\n boto_session,\n )\n )\n\n # Ensure the container name is set if not provided at template time\n\n container_name = (\n configuration.container_name\n or _container_name_from_task_definition(task_definition)\n or ECS_DEFAULT_CONTAINER_NAME\n )\n\n if container_overrides and not container_overrides[0].get(\"name\"):\n container_overrides[0][\"name\"] = container_name\n\n # Ensure configuration command is respected post-templating\n\n orchestration_container = _get_container(container_overrides, container_name)\n\n if orchestration_container:\n # Override the command if given on the configuration\n if configuration.command:\n orchestration_container[\"command\"] = configuration.command\n\n # Clean up templated variable formatting\n\n for container in container_overrides:\n if isinstance(container.get(\"command\"), str):\n container[\"command\"] = shlex.split(container[\"command\"])\n if isinstance(container.get(\"environment\"), dict):\n container[\"environment\"] = [\n {\"name\": k, \"value\": v} for k, v in container[\"environment\"].items()\n ]\n\n # Remove null values \u2014 they're not allowed by AWS\n container[\"environment\"] = [\n item\n for item in container.get(\"environment\", [])\n if item[\"value\"] is not None\n ]\n\n if isinstance(task_run_request.get(\"tags\"), dict):\n task_run_request[\"tags\"] = [\n {\"key\": k, \"value\": v} for k, v in task_run_request[\"tags\"].items()\n ]\n\n if overrides.get(\"cpu\"):\n 
overrides[\"cpu\"] = str(overrides[\"cpu\"])\n\n if overrides.get(\"memory\"):\n overrides[\"memory\"] = str(overrides[\"memory\"])\n\n # Ensure configuration tags and env are respected post-templating\n\n tags = [\n item\n for item in task_run_request.get(\"tags\", [])\n if item[\"key\"] not in configuration.labels.keys()\n ] + [\n {\"key\": k, \"value\": v}\n for k, v in configuration.labels.items()\n if v is not None\n ]\n\n # Slugify tags keys and values\n tags = [\n {\n \"key\": slugify(\n item[\"key\"],\n regex_pattern=_TAG_REGEX,\n allow_unicode=True,\n lowercase=False,\n ),\n \"value\": slugify(\n item[\"value\"],\n regex_pattern=_TAG_REGEX,\n allow_unicode=True,\n lowercase=False,\n ),\n }\n for item in tags\n ]\n\n if tags:\n task_run_request[\"tags\"] = tags\n\n if orchestration_container:\n environment = [\n item\n for item in orchestration_container.get(\"environment\", [])\n if item[\"name\"] not in configuration.env.keys()\n ] + [\n {\"name\": k, \"value\": v}\n for k, v in configuration.env.items()\n if v is not None\n ]\n if environment:\n orchestration_container[\"environment\"] = environment\n\n # Remove empty container overrides\n\n overrides[\"containerOverrides\"] = [v for v in container_overrides if v]\n\n return task_run_request\n\n @retry(\n stop=stop_after_attempt(MAX_CREATE_TASK_RUN_ATTEMPTS),\n wait=wait_fixed(CREATE_TASK_RUN_MIN_DELAY_SECONDS)\n + wait_random(\n CREATE_TASK_RUN_MIN_DELAY_JITTER_SECONDS,\n CREATE_TASK_RUN_MAX_DELAY_JITTER_SECONDS,\n ),\n )\n def _create_task_run(self, ecs_client: _ECSClient, task_run_request: dict) -> str:\n \"\"\"\n Create a run of a task definition.\n\n Returns the task run ARN.\n \"\"\"\n return ecs_client.run_task(**task_run_request)[\"tasks\"][0]\n\n def _task_definitions_equal(self, taskdef_1, taskdef_2) -> bool:\n \"\"\"\n Compare two task definitions.\n\n Since one may come from the AWS API and have populated defaults, we do our best\n to homogenize the definitions without changing their meaning.\n \"\"\"\n if taskdef_1 == taskdef_2:\n return True\n\n if taskdef_1 is None or taskdef_2 is None:\n return False\n\n taskdef_1 = copy.deepcopy(taskdef_1)\n taskdef_2 = copy.deepcopy(taskdef_2)\n\n for taskdef in (taskdef_1, taskdef_2):\n # Set defaults that AWS would set after registration\n container_definitions = taskdef.get(\"containerDefinitions\", [])\n essential = any(\n container.get(\"essential\") for container in container_definitions\n )\n if not essential:\n container_definitions[0].setdefault(\"essential\", True)\n\n taskdef.setdefault(\"networkMode\", \"bridge\")\n\n _drop_empty_keys_from_task_definition(taskdef_1)\n _drop_empty_keys_from_task_definition(taskdef_2)\n\n # Clear fields that change on registration for comparison\n for field in ECS_POST_REGISTRATION_FIELDS:\n taskdef_1.pop(field, None)\n taskdef_2.pop(field, None)\n\n return taskdef_1 == taskdef_2\n\n async def kill_infrastructure(\n self,\n configuration: ECSJobConfiguration,\n infrastructure_pid: str,\n grace_seconds: int = 30,\n ) -> None:\n \"\"\"\n Kill a task running on ECS.\n\n Args:\n infrastructure_pid: A cluster and task arn combination. This should match a\n value yielded by `ECSWorker.run`.\n \"\"\"\n if grace_seconds != 30:\n self._logger.warning(\n f\"Kill grace period of {grace_seconds}s requested, but AWS does not \"\n \"support dynamic grace period configuration so 30s will be used. 
\"\n \"See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html for configuration of grace periods.\" # noqa\n )\n cluster, task = parse_identifier(infrastructure_pid)\n await run_sync_in_worker_thread(self._stop_task, configuration, cluster, task)\n\n def _stop_task(\n self, configuration: ECSJobConfiguration, cluster: str, task: str\n ) -> None:\n \"\"\"\n Stop a running ECS task.\n \"\"\"\n if configuration.cluster is not None and cluster != configuration.cluster:\n raise InfrastructureNotAvailable(\n \"Cannot stop ECS task: this infrastructure block has access to \"\n f\"cluster {configuration.cluster!r} but the task is running in cluster \"\n f\"{cluster!r}.\"\n )\n\n _, ecs_client = self._get_session_and_client(configuration)\n try:\n ecs_client.stop_task(cluster=cluster, task=task)\n except Exception as exc:\n # Raise a special exception if the task does not exist\n if \"ClusterNotFound\" in str(exc):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the cluster {cluster!r} could not be found.\"\n ) from exc\n if \"not find task\" in str(exc) or \"referenced task was not found\" in str(\n exc\n ):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the task {task!r} could not be found in \"\n f\"cluster {cluster!r}.\"\n ) from exc\n if \"no registered tasks\" in str(exc):\n raise InfrastructureNotFound(\n f\"Cannot stop ECS task: the cluster {cluster!r} has no tasks.\"\n ) from exc\n\n # Reraise unknown exceptions\n raise\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker-classes","title":"Classes","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.job_configuration","title":"
job_configuration (BaseJobConfiguration)
pydantic-model
","text":"
Job configuration for an ECS worker.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSJobConfiguration(BaseJobConfiguration):\n \"\"\"\n Job configuration for an ECS worker.\n \"\"\"\n\n aws_credentials: Optional[AwsCredentials] = Field(default_factory=AwsCredentials)\n task_definition: Optional[Dict[str, Any]] = Field(\n template=_default_task_definition_template()\n )\n task_run_request: Dict[str, Any] = Field(\n template=_default_task_run_request_template()\n )\n configure_cloudwatch_logs: Optional[bool] = Field(default=None)\n cloudwatch_logs_options: Dict[str, str] = Field(default_factory=dict)\n network_configuration: Dict[str, Any] = Field(default_factory=dict)\n stream_output: Optional[bool] = Field(default=None)\n task_start_timeout_seconds: int = Field(default=300)\n task_watch_poll_interval: float = Field(default=5.0)\n auto_deregister_task_definition: bool = Field(default=False)\n vpc_id: Optional[str] = Field(default=None)\n container_name: Optional[str] = Field(default=None)\n cluster: Optional[str] = Field(default=None)\n\n @root_validator\n def task_run_request_requires_arn_if_no_task_definition_given(cls, values) -> dict:\n \"\"\"\n If no task definition is provided, a task definition ARN must be present on the\n task run request.\n \"\"\"\n if not values.get(\"task_run_request\", {}).get(\n \"taskDefinition\"\n ) and not values.get(\"task_definition\"):\n raise ValueError(\n \"A task definition must be provided if a task definition ARN is not \"\n \"present on the task run request.\"\n )\n return values\n\n @root_validator\n def container_name_default_from_task_definition(cls, values) -> dict:\n \"\"\"\n Infers the container name from the task definition if not provided.\n \"\"\"\n if values.get(\"container_name\") is None:\n values[\"container_name\"] = _container_name_from_task_definition(\n values.get(\"task_definition\")\n )\n\n # We may not have a name here still; for example if someone is using a task\n # definition arn. 
In that case, we'll perform similar logic later to find\n # the name to treat as the \"orchestration\" container.\n\n return values\n\n @root_validator(pre=True)\n def set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n\n @root_validator\n def configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # TODO: Does not match\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_run_request\", {}).get(\"taskDefinition\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n\n @root_validator\n def cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n ) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n\n @root_validator\n def network_configuration_requires_vpc_id(cls, values: dict) -> dict:\n \"\"\"\n Enforces a `vpc_id` is provided when custom network configuration mode is\n enabled for network settings.\n \"\"\"\n if values.get(\"network_configuration\") and not values.get(\"vpc_id\"):\n raise ValueError(\n \"You must provide a `vpc_id` to enable custom `network_configuration`.\"\n )\n return values\n
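As the root validators above show, a configuration must carry either an inline task definition or a task definition ARN on the task run request. The sketch below illustrates the two accepted shapes; it assumes the inherited base job configuration fields keep their defaults, and the container name and ARN are hypothetical.

```python
# Minimal sketch (hypothetical values) of the two configuration shapes accepted by
# task_run_request_requires_arn_if_no_task_definition_given.
from prefect_aws.workers.ecs_worker import ECSJobConfiguration

# Option 1: provide an inline task definition; the worker registers it for you.
inline = ECSJobConfiguration(
    task_definition={"containerDefinitions": [{"name": "prefect"}]},
    task_run_request={},
)

# Option 2: point the task run request at an already-registered task definition.
by_arn = ECSJobConfiguration(
    task_run_request={
        "taskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/my-task:1"
    },
)

# Omitting both raises a ValueError from the root validator.
```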
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.job_configuration-methods","title":"Methods","text":"
cloudwatch_logs_options_requires_configure_cloudwatch_logs
classmethod
Enforces that configure_cloudwatch_logs is enabled whenever cloudwatch_logs_options are provided.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef cloudwatch_logs_options_requires_configure_cloudwatch_logs(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if values.get(\"cloudwatch_logs_options\") and not values.get(\n \"configure_cloudwatch_logs\"\n ):\n raise ValueError(\n \"`configure_cloudwatch_log` must be enabled to use \"\n \"`cloudwatch_logs_options`.\"\n )\n return values\n
configure_cloudwatch_logs_requires_execution_role_arn
classmethod
Enforces that an execution role arn is provided (or could be provided by a runtime task definition) when configuring logging.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef configure_cloudwatch_logs_requires_execution_role_arn(\n cls, values: dict\n) -> dict:\n \"\"\"\n Enforces that an execution role arn is provided (or could be provided by a\n runtime task definition) when configuring logging.\n \"\"\"\n if (\n values.get(\"configure_cloudwatch_logs\")\n and not values.get(\"execution_role_arn\")\n # TODO: Does not match\n # Do not raise if they've linked to another task definition or provided\n # it without using our shortcuts\n and not values.get(\"task_run_request\", {}).get(\"taskDefinition\")\n and not (values.get(\"task_definition\") or {}).get(\"executionRoleArn\")\n ):\n raise ValueError(\n \"An `execution_role_arn` must be provided to use \"\n \"`configure_cloudwatch_logs` or `stream_logs`.\"\n )\n return values\n
container_name_default_from_task_definition
classmethod
Infers the container name from the task definition if not provided.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef container_name_default_from_task_definition(cls, values) -> dict:\n \"\"\"\n Infers the container name from the task definition if not provided.\n \"\"\"\n if values.get(\"container_name\") is None:\n values[\"container_name\"] = _container_name_from_task_definition(\n values.get(\"task_definition\")\n )\n\n # We may not have a name here still; for example if someone is using a task\n # definition arn. In that case, we'll perform similar logic later to find\n # the name to treat as the \"orchestration\" container.\n\n return values\n
network_configuration_requires_vpc_id
classmethod
Enforces that a vpc_id
is provided when a custom network_configuration is supplied.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef network_configuration_requires_vpc_id(cls, values: dict) -> dict:\n \"\"\"\n Enforces a `vpc_id` is provided when custom network configuration mode is\n enabled for network settings.\n \"\"\"\n if values.get(\"network_configuration\") and not values.get(\"vpc_id\"):\n raise ValueError(\n \"You must provide a `vpc_id` to enable custom `network_configuration`.\"\n )\n return values\n
set_default_configure_cloudwatch_logs
classmethod
Streaming output generally requires CloudWatch logs to be configured.
To avoid entangled arguments in the simple case, configure_cloudwatch_logs
defaults to matching the value of stream_output
.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator(pre=True)\ndef set_default_configure_cloudwatch_logs(cls, values: dict) -> dict:\n \"\"\"\n Streaming output generally requires CloudWatch logs to be configured.\n\n To avoid entangled arguments in the simple case, `configure_cloudwatch_logs`\n defaults to matching the value of `stream_output`.\n \"\"\"\n configure_cloudwatch_logs = values.get(\"configure_cloudwatch_logs\")\n if configure_cloudwatch_logs is None:\n values[\"configure_cloudwatch_logs\"] = values.get(\"stream_output\")\n return values\n
task_run_request_requires_arn_if_no_task_definition_given
classmethod
If no task definition is provided, a task definition ARN must be present on the task run request.
Source code in
prefect_aws/workers/ecs_worker.py
@root_validator\ndef task_run_request_requires_arn_if_no_task_definition_given(cls, values) -> dict:\n \"\"\"\n If no task definition is provided, a task definition ARN must be present on the\n task run request.\n \"\"\"\n if not values.get(\"task_run_request\", {}).get(\n \"taskDefinition\"\n ) and not values.get(\"task_definition\"):\n raise ValueError(\n \"A task definition must be provided if a task definition ARN is not \"\n \"present on the task run request.\"\n )\n return values\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.job_configuration_variables","title":"
job_configuration_variables (BaseVariables)
pydantic-model
","text":"
Variables for templating an ECS job.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSVariables(BaseVariables):\n \"\"\"\n Variables for templating an ECS job.\n \"\"\"\n\n task_definition_arn: Optional[str] = Field(\n default=None,\n description=(\n \"An identifier for an existing task definition to use. If set, options that\"\n \" require changes to the task definition will be ignored. All contents of \"\n \"the task definition in the job configuration will be ignored.\"\n ),\n )\n env: Dict[str, Optional[str]] = Field(\n title=\"Environment Variables\",\n default_factory=dict,\n description=(\n \"Environment variables to provide to the task run. These variables are set \"\n \"on the Prefect container at task runtime. These will not be set on the \"\n \"task definition.\"\n ),\n )\n aws_credentials: AwsCredentials = Field(\n title=\"AWS Credentials\",\n default_factory=AwsCredentials,\n description=(\n \"The AWS credentials to use to connect to ECS. If not provided, credentials\"\n \" will be inferred from the local environment following AWS's boto client's\"\n \" rules.\"\n ),\n )\n cluster: Optional[str] = Field(\n default=None,\n description=(\n \"The ECS cluster to run the task in. An ARN or name may be provided. If \"\n \"not provided, the default cluster will be used.\"\n ),\n )\n family: Optional[str] = Field(\n default=None,\n description=(\n \"A family for the task definition. If not provided, it will be inferred \"\n \"from the task definition. If the task definition does not have a family, \"\n \"the name will be generated. When flow and deployment metadata is \"\n \"available, the generated name will include their names. Values for this \"\n \"field will be slugified to match AWS character requirements.\"\n ),\n )\n launch_type: Optional[Literal[\"FARGATE\", \"EC2\", \"EXTERNAL\", \"FARGATE_SPOT\"]] = (\n Field(\n default=ECS_DEFAULT_LAUNCH_TYPE,\n description=(\n \"The type of ECS task run infrastructure that should be used. Note that\"\n \" 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure\"\n \" the proper capacity provider stategy if set here.\"\n ),\n )\n )\n image: Optional[str] = Field(\n default=None,\n description=(\n \"The image to use for the Prefect container in the task. If this value is \"\n \"not null, it will override the value in the task definition. This value \"\n \"defaults to a Prefect base image matching your local versions.\"\n ),\n )\n cpu: int = Field(\n title=\"CPU\",\n default=None,\n description=(\n \"The amount of CPU to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_CPU} will be used unless present on the task definition.\"\n ),\n )\n memory: int = Field(\n default=None,\n description=(\n \"The amount of memory to provide to the ECS task. Valid amounts are \"\n \"specified in the AWS documentation. If not provided, a default value of \"\n f\"{ECS_DEFAULT_MEMORY} will be used unless present on the task definition.\"\n ),\n )\n container_name: str = Field(\n default=None,\n description=(\n \"The name of the container flow run orchestration will occur in. If not \"\n f\"specified, a default value of {ECS_DEFAULT_CONTAINER_NAME} will be used \"\n \"and if that is not found in the task definition the first container will \"\n \"be used.\"\n ),\n )\n task_role_arn: str = Field(\n title=\"Task Role ARN\",\n default=None,\n description=(\n \"A role to attach to the task run. 
This controls the permissions of the \"\n \"task while it is running.\"\n ),\n )\n execution_role_arn: str = Field(\n title=\"Execution Role ARN\",\n default=None,\n description=(\n \"An execution role to use for the task. This controls the permissions of \"\n \"the task when it is launching. If this value is not null, it will \"\n \"override the value in the task definition. An execution role must be \"\n \"provided to capture logs from the container.\"\n ),\n )\n vpc_id: Optional[str] = Field(\n title=\"VPC ID\",\n default=None,\n description=(\n \"The AWS VPC to link the task run to. This is only applicable when using \"\n \"the 'awsvpc' network mode for your task. FARGATE tasks require this \"\n \"network mode, but for EC2 tasks the default network mode is 'bridge'. \"\n \"If using the 'awsvpc' network mode and this field is null, your default \"\n \"VPC will be used. If no default VPC can be found, the task run will fail.\"\n ),\n )\n configure_cloudwatch_logs: bool = Field(\n default=None,\n description=(\n \"If enabled, the Prefect container will be configured to send its output \"\n \"to the AWS CloudWatch logs service. This functionality requires an \"\n \"execution role with logs:CreateLogStream, logs:CreateLogGroup, and \"\n \"logs:PutLogEvents permissions. The default for this field is `False` \"\n \"unless `stream_output` is set.\"\n ),\n )\n cloudwatch_logs_options: Dict[str, str] = Field(\n default_factory=dict,\n description=(\n \"When `configure_cloudwatch_logs` is enabled, this setting may be used to\"\n \" pass additional options to the CloudWatch logs configuration or override\"\n \" the default options. See the [AWS\"\n \" documentation](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html#create_awslogs_logdriver_options)\" # noqa\n \" for available options. \"\n ),\n )\n\n network_configuration: Dict[str, Any] = Field(\n default_factory=dict,\n description=(\n \"When `network_configuration` is supplied it will override ECS Worker's\"\n \"awsvpcConfiguration that defined in the ECS task executing your workload. \"\n \"See the [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-service-awsvpcconfiguration.html)\" # noqa\n \" for available options.\"\n ),\n )\n\n stream_output: bool = Field(\n default=None,\n description=(\n \"If enabled, logs will be streamed from the Prefect container to the local \"\n \"console. Unless you have configured AWS CloudWatch logs manually on your \"\n \"task definition, this requires the same prerequisites outlined in \"\n \"`configure_cloudwatch_logs`.\"\n ),\n )\n task_start_timeout_seconds: int = Field(\n default=300,\n description=(\n \"The amount of time to watch for the start of the ECS task \"\n \"before marking it as failed. The task must enter a RUNNING state to be \"\n \"considered started.\"\n ),\n )\n task_watch_poll_interval: float = Field(\n default=5.0,\n description=(\n \"The amount of time to wait between AWS API calls while monitoring the \"\n \"state of an ECS task.\"\n ),\n )\n auto_deregister_task_definition: bool = Field(\n default=False,\n description=(\n \"If enabled, any task definitions that are created by this block will be \"\n \"deregistered. Existing task definitions linked by ARN will never be \"\n \"deregistered. Deregistering a task definition does not remove it from \"\n \"your AWS account, instead it will be marked as INACTIVE.\"\n ),\n )\n
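The variables above are the fields a deployment or work pool can template for an ECS job. A minimal sketch of declaring them directly is shown below; every field is optional and falls back to the defaults documented in the attributes that follow, and all values (cluster, VPC ID, role ARN, log options) are hypothetical.

```python
# Minimal sketch (hypothetical IDs and ARNs) of the templatable ECS job variables.
from prefect_aws.workers.ecs_worker import ECSVariables

variables = ECSVariables(
    cluster="my-ecs-cluster",
    launch_type="FARGATE",
    cpu=1024,
    memory=2048,
    vpc_id="vpc-0123456789abcdef0",
    execution_role_arn="arn:aws:iam::123456789012:role/prefect-ecs-execution",
    configure_cloudwatch_logs=True,
    cloudwatch_logs_options={"awslogs-stream-prefix": "prefect"},
    stream_output=True,
)
```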
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.job_configuration_variables-attributes","title":"Attributes","text":"
auto_deregister_task_definition: bool
pydantic-field
If enabled, any task definitions that are created by this block will be deregistered. Existing task definitions linked by ARN will never be deregistered. Deregistering a task definition does not remove it from your AWS account; instead, it will be marked as INACTIVE.
aws_credentials: AwsCredentials
pydantic-field
The AWS credentials to use to connect to ECS. If not provided, credentials will be inferred from the local environment following AWS's boto client's rules.
cloudwatch_logs_options: Dict[str, str]
pydantic-field
When configure_cloudwatch_logs
is enabled, this setting may be used to pass additional options to the CloudWatch logs configuration or override the default options. See the AWS documentation for available options.
cluster: str
pydantic-field
The ECS cluster to run the task in. An ARN or name may be provided. If not provided, the default cluster will be used.
configure_cloudwatch_logs: bool
pydantic-field
If enabled, the Prefect container will be configured to send its output to the AWS CloudWatch logs service. This functionality requires an execution role with logs:CreateLogStream, logs:CreateLogGroup, and logs:PutLogEvents permissions. The default for this field is False
unless stream_output
is set.
container_name: str
pydantic-field
The name of the container flow run orchestration will occur in. If not specified, a default value of prefect will be used and if that is not found in the task definition the first container will be used.
cpu: int
pydantic-field
The amount of CPU to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 1024 will be used unless present on the task definition.
execution_role_arn: str
pydantic-field
An execution role to use for the task. This controls the permissions of the task when it is launching. If this value is not null, it will override the value in the task definition. An execution role must be provided to capture logs from the container.
family: str
pydantic-field
A family for the task definition. If not provided, it will be inferred from the task definition. If the task definition does not have a family, the name will be generated. When flow and deployment metadata is available, the generated name will include their names. Values for this field will be slugified to match AWS character requirements.
image: str
pydantic-field
The image to use for the Prefect container in the task. If this value is not null, it will override the value in the task definition. This value defaults to a Prefect base image matching your local versions.
launch_type: Literal['FARGATE', 'EC2', 'EXTERNAL', 'FARGATE_SPOT']
pydantic-field
The type of ECS task run infrastructure that should be used. Note that 'FARGATE_SPOT' is not a formal ECS launch type, but we will configure the proper capacity provider strategy if set here.
memory: int
pydantic-field
The amount of memory to provide to the ECS task. Valid amounts are specified in the AWS documentation. If not provided, a default value of 2048 will be used unless present on the task definition.
network_configuration: Dict[str, Any]
pydantic-field
When network_configuration
is supplied, it will override the ECS worker's awsvpcConfiguration defined in the ECS task executing your workload. See the AWS documentation for available options.
stream_output: bool
pydantic-field
If enabled, logs will be streamed from the Prefect container to the local console. Unless you have configured AWS CloudWatch logs manually on your task definition, this requires the same prerequisites outlined in configure_cloudwatch_logs
.
task_definition_arn: str
pydantic-field
An identifier for an existing task definition to use. If set, options that require changes to the task definition will be ignored. All contents of the task definition in the job configuration will be ignored.
task_role_arn: str
pydantic-field
A role to attach to the task run. This controls the permissions of the task while it is running.
task_start_timeout_seconds: int
pydantic-field
The amount of time to watch for the start of the ECS task before marking it as failed. The task must enter a RUNNING state to be considered started.
task_watch_poll_interval: float
pydantic-field
The amount of time to wait between AWS API calls while monitoring the state of an ECS task.
vpc_id: str
pydantic-field
The AWS VPC to link the task run to. This is only applicable when using the 'awsvpc' network mode for your task. FARGATE tasks require this network mode, but for EC2 tasks the default network mode is 'bridge'. If using the 'awsvpc' network mode and this field is null, your default VPC will be used. If no default VPC can be found, the task run will fail.
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker-methods","title":"Methods","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.kill_infrastructure","title":"
kill_infrastructure
async
","text":"
Kill a task running on ECS.
Parameters:
Name Type Description Default
infrastructure_pid
str
A cluster and task arn combination. This should match a value yielded by ECSWorker.run
.
required Source code in
prefect_aws/workers/ecs_worker.py
async def kill_infrastructure(\n self,\n configuration: ECSJobConfiguration,\n infrastructure_pid: str,\n grace_seconds: int = 30,\n) -> None:\n \"\"\"\n Kill a task running on ECS.\n\n Args:\n infrastructure_pid: A cluster and task arn combination. This should match a\n value yielded by `ECSWorker.run`.\n \"\"\"\n if grace_seconds != 30:\n self._logger.warning(\n f\"Kill grace period of {grace_seconds}s requested, but AWS does not \"\n \"support dynamic grace period configuration so 30s will be used. \"\n \"See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html for configuration of grace periods.\" # noqa\n )\n cluster, task = parse_identifier(infrastructure_pid)\n await run_sync_in_worker_thread(self._stop_task, configuration, cluster, task)\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorker.run","title":"
run
async
","text":"
Runs a given flow run on the current worker.
Source code in
prefect_aws/workers/ecs_worker.py
async def run(\n self,\n flow_run: \"FlowRun\",\n configuration: ECSJobConfiguration,\n task_status: Optional[anyio.abc.TaskStatus] = None,\n) -> BaseWorkerResult:\n \"\"\"\n Runs a given flow run on the current worker.\n \"\"\"\n boto_session, ecs_client = await run_sync_in_worker_thread(\n self._get_session_and_client, configuration\n )\n\n logger = self.get_flow_run_logger(flow_run)\n\n (\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition,\n ) = await run_sync_in_worker_thread(\n self._create_task_and_wait_for_start,\n logger,\n boto_session,\n ecs_client,\n configuration,\n flow_run,\n )\n\n # The task identifier is \"{cluster}::{task}\" where we use the configured cluster\n # if set to preserve matching by name rather than arn\n # Note \"::\" is used despite the Prefect standard being \":\" because ARNs contain\n # single colons.\n identifier = (\n (configuration.cluster if configuration.cluster else cluster_arn)\n + \"::\"\n + task_arn\n )\n\n if task_status:\n task_status.started(identifier)\n\n status_code = await run_sync_in_worker_thread(\n self._watch_task_and_get_exit_code,\n logger,\n configuration,\n task_arn,\n cluster_arn,\n task_definition,\n is_new_task_definition and configuration.auto_deregister_task_definition,\n boto_session,\n ecs_client,\n )\n\n return ECSWorkerResult(\n identifier=identifier,\n # If the container does not start the exit code can be null but we must\n # still report a status code. We use a -1 to indicate a special code.\n status_code=status_code if status_code is not None else -1,\n )\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.ECSWorkerResult","title":"
ECSWorkerResult (BaseWorkerResult)
pydantic-model
","text":"
The result of an ECS job.
Source code in
prefect_aws/workers/ecs_worker.py
class ECSWorkerResult(BaseWorkerResult):\n \"\"\"\n The result of an ECS job.\n \"\"\"\n
"},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker-functions","title":"Functions","text":""},{"location":"ecs_worker/#prefect_aws.workers.ecs_worker.parse_identifier","title":"
parse_identifier
","text":"
Splits identifier into its cluster and task components, e.g. input \"cluster_name::task_arn\" outputs (\"cluster_name\", \"task_arn\").
Source code in
prefect_aws/workers/ecs_worker.py
def parse_identifier(identifier: str) -> ECSIdentifier:\n \"\"\"\n Splits identifier into its cluster and task components, e.g.\n input \"cluster_name::task_arn\" outputs (\"cluster_name\", \"task_arn\").\n \"\"\"\n cluster, task = identifier.split(\"::\", maxsplit=1)\n return ECSIdentifier(cluster, task)\n
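Because ARNs themselves contain single colons, the worker joins the cluster and task with a double colon, and parse_identifier splits on the first "::". A brief usage sketch follows; the task ARN is a hypothetical example.

```python
# Usage sketch for parse_identifier (the task ARN is a hypothetical example).
from prefect_aws.workers.ecs_worker import parse_identifier

cluster, task = parse_identifier(
    "my-cluster::arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef"
)
assert cluster == "my-cluster"
assert task == "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef"
```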
"},{"location":"examples_catalog/","title":"Examples Catalog","text":"
Below is a list of examples for prefect-aws
.
"},{"location":"examples_catalog/#batch-module","title":"Batch Module","text":"
Submits a job to batch.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.batch import batch_submit\n\n\n@flow\ndef example_batch_submit_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n job_id = batch_submit(\n \"job_name\",\n \"job_definition\",\n \"job_queue\",\n aws_credentials\n )\n return job_id\n\nexample_batch_submit_flow()\n
"},{"location":"examples_catalog/#client-waiter-module","title":"Client Waiter Module","text":"
Run an ec2 waiter until instance_exists.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.client_wait import client_waiter\n\n@flow\ndef example_client_wait_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n\n waiter = client_waiter(\n \"ec2\",\n \"instance_exists\",\n aws_credentials\n )\n\n return waiter\nexample_client_wait_flow()\n
"},{"location":"examples_catalog/#credentials-module","title":"Credentials Module","text":"
Create an S3 client from an authorized boto3 session
from prefect_aws import MinIOCredentials\n\nminio_credentials = MinIOCredentials(\n minio_root_user=\"minio_root_user\",\n minio_root_password=\"minio_root_password\"\n)\ns3_client = minio_credentials.get_boto3_session().client(\n service_name=\"s3\",\n endpoint_url=\"http://localhost:9000\"\n)\n
Create an S3 client from an authorized boto3 session:
from prefect_aws import AwsCredentials\n\naws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n)\ns3_client = aws_credentials.get_boto3_session().client(\"s3\")\n
"},{"location":"examples_catalog/#s3-module","title":"S3 Module","text":"
Download my_folder/notes.txt object to notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.download_object_to_path(\"my_folder/notes.txt\", \"notes.txt\")\n
Download my_folder to a local folder named my_folder.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.download_folder_to_path(\"my_folder\", \"my_folder\")\n
List objects under the
base_folder
.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.list_objects(\"base_folder\")\n
Stream notes.txt from your-bucket/notes.txt to my-bucket/landed/notes.txt.
from prefect_aws.s3 import S3Bucket\n\nyour_s3_bucket = S3Bucket.load(\"your-bucket\")\nmy_s3_bucket = S3Bucket.load(\"my-bucket\")\n\nmy_s3_bucket.stream_from(\n your_s3_bucket,\n \"notes.txt\",\n to_path=\"landed/notes.txt\"\n)\n
Upload BytesIO object to my_folder/notes.txt.
from io import BytesIO\n\nfrom prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(f, \"my_folder/notes.txt\")\n
Upload BufferedReader object to my_folder/notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(\n f, \"my_folder/notes.txt\"\n )\n
Upload notes.txt to my_folder/notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.upload_from_path(\"notes.txt\", \"my_folder/notes.txt\")\n
Read and upload a file to an S3 bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_upload\n\n\n@flow\nasync def example_s3_upload_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n with open(\"data.csv\", \"rb\") as file:\n key = await s3_upload(\n bucket=\"bucket\",\n key=\"data.csv\",\n data=file.read(),\n aws_credentials=aws_credentials,\n )\n\nexample_s3_upload_flow()\n
Upload contents from my_folder to new_folder.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.upload_from_folder(\"my_folder\", \"new_folder\")\n
Download a file from an S3 bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_download\n\n\n@flow\nasync def example_s3_download_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n data = await s3_download(\n bucket=\"bucket\",\n key=\"key\",\n aws_credentials=aws_credentials,\n )\n\nexample_s3_download_flow()\n
List all objects in a bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_list_objects\n\n\n@flow\nasync def example_s3_list_objects_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n objects = await s3_list_objects(\n bucket=\"data_bucket\",\n aws_credentials=aws_credentials\n )\n\nexample_s3_list_objects_flow()\n
Download my_folder/notes.txt object to a BytesIO object.
from io import BytesIO\n\nfrom prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith BytesIO() as buf:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", buf)\n
Download my_folder/notes.txt object to a BufferedWriter.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"wb\") as f:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", f)\n
Read \"subfolder/file1\" contents from an S3 bucket named \"bucket\":
from prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import S3Bucket\n\naws_creds = AwsCredentials(\n aws_access_key_id=AWS_ACCESS_KEY_ID,\n aws_secret_access_key=AWS_SECRET_ACCESS_KEY\n)\n\ns3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n aws_credentials=aws_creds,\n basepath=\"subfolder\"\n)\n\nkey_contents = s3_bucket_block.read_path(path=\"subfolder/file1\")\n
"},{"location":"examples_catalog/#secrets-manager-module","title":"Secrets Manager Module","text":"
Deletes the secret with a recovery window of 15 days.
from prefect_aws.secrets_manager import SecretsManager\n\nsecrets_manager = SecretsManager.load(\"MY_BLOCK\")\nsecrets_manager.delete_secret(recovery_window_in_days=15)\n
Read a secret value:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import read_secret\n\n@flow\ndef example_read_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n secret_value = read_secret(\n secret_name=\"db_password\",\n aws_credentials=aws_credentials\n )\n\nexample_read_secret()\n
Write some secret data.
from prefect_aws.secrets_manager import SecretsManager\n\nsecrets_manager = SecretsManager.load(\"MY_BLOCK\")\nsecrets_manager.write_secret(b\"my_secret_data\")\n
Delete a secret immediately:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import delete_secret\n\n@flow\ndef example_delete_secret_immediately():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n force_delete_without_recovery=True\n )\n\nexample_delete_secret_immediately()\n
Delete a secret with a 90 day recovery window:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import delete_secret\n\n@flow\ndef example_delete_secret_with_recovery_window():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n recovery_window_in_days=90\n )\n\nexample_delete_secret_with_recovery_window()\n
Update a secret value:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import update_secret\n\n@flow\ndef example_update_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n update_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\nexample_update_secret()\n
Reads a secret.
from prefect_aws.secrets_manager import SecretsManager\n\nsecrets_manager = SecretsManager.load(\"MY_BLOCK\")\nsecrets_manager.read_secret()\n
"},{"location":"lambda_function/","title":"Lambda","text":""},{"location":"lambda_function/#prefect_aws.lambda_function","title":"
prefect_aws.lambda_function
","text":"
Integrations with AWS Lambda.
Examples:
Run a lambda function with a payload
LambdaFunction(\n function_name=\"test-function\",\n aws_credentials=aws_credentials,\n).invoke(payload={\"foo\": \"bar\"})\n
Specify a version of a lambda function
LambdaFunction(\n function_name=\"test-function\",\n qualifier=\"1\",\n aws_credentials=aws_credentials,\n).invoke()\n
Invoke a lambda function asynchronously
LambdaFunction(\n function_name=\"test-function\",\n aws_credentials=aws_credentials,\n).invoke(invocation_type=\"Event\")\n
Invoke a lambda function and return the last 4 KB of logs
LambdaFunction(\n function_name=\"test-function\",\n aws_credentials=aws_credentials,\n).invoke(tail=True)\n
Invoke a lambda function with a client context
LambdaFunction(\n function_name=\"test-function\",\n aws_credentials=aws_credentials,\n).invoke(client_context={\"bar\": \"foo\"})\n
"},{"location":"lambda_function/#prefect_aws.lambda_function-classes","title":"Classes","text":""},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction","title":"
LambdaFunction (Block)
pydantic-model
","text":"
Invoke a Lambda function. This block is part of the prefect-aws collection. Install prefect-aws with pip install prefect-aws
to use this block.
Attributes:
Name Type Description
function_name
str
The name, ARN, or partial ARN of the Lambda function to run. This must be the name of a function that is already deployed to AWS Lambda.
qualifier
Optional[str]
The version or alias of the Lambda function to use when invoked. If not specified, the latest (unqualified) version of the Lambda function will be used.
aws_credentials
AwsCredentials
The AWS credentials to use to connect to AWS Lambda with a default factory of AwsCredentials.
Source code in
prefect_aws/lambda_function.py
class LambdaFunction(Block):\n \"\"\"Invoke a Lambda function. This block is part of the prefect-aws\n collection. Install prefect-aws with `pip install prefect-aws` to use this\n block.\n\n Attributes:\n function_name: The name, ARN, or partial ARN of the Lambda function to\n run. This must be the name of a function that is already deployed\n to AWS Lambda.\n qualifier: The version or alias of the Lambda function to use when\n invoked. If not specified, the latest (unqualified) version of the\n Lambda function will be used.\n aws_credentials: The AWS credentials to use to connect to AWS Lambda\n with a default factory of AwsCredentials.\n\n \"\"\"\n\n _block_type_name = \"Lambda Function\"\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n _documentation_url = \"https://prefecthq.github.io/prefect-aws/s3/#prefect_aws.lambda_function.LambdaFunction\" # noqa\n\n function_name: str = Field(\n title=\"Function Name\",\n description=(\n \"The name, ARN, or partial ARN of the Lambda function to run. This\"\n \" must be the name of a function that is already deployed to AWS\"\n \" Lambda.\"\n ),\n )\n qualifier: Optional[str] = Field(\n default=None,\n title=\"Qualifier\",\n description=(\n \"The version or alias of the Lambda function to use when invoked. \"\n \"If not specified, the latest (unqualified) version of the Lambda \"\n \"function will be used.\"\n ),\n )\n aws_credentials: AwsCredentials = Field(\n title=\"AWS Credentials\",\n default_factory=AwsCredentials,\n description=\"The AWS credentials to invoke the Lambda with.\",\n )\n\n class Config:\n \"\"\"Lambda's pydantic configuration.\"\"\"\n\n smart_union = True\n\n def _get_lambda_client(self):\n \"\"\"\n Retrieve a boto3 session and Lambda client\n \"\"\"\n boto_session = self.aws_credentials.get_boto3_session()\n lambda_client = boto_session.client(\"lambda\")\n return lambda_client\n\n @sync_compatible\n async def invoke(\n self,\n payload: dict = None,\n invocation_type: Literal[\n \"RequestResponse\", \"Event\", \"DryRun\"\n ] = \"RequestResponse\",\n tail: bool = False,\n client_context: Optional[dict] = None,\n ) -> dict:\n \"\"\"\n [Invoke](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/lambda/client/invoke.html)\n the Lambda function with the given payload.\n\n Args:\n payload: The payload to send to the Lambda function.\n invocation_type: The invocation type of the Lambda function. This\n can be one of \"RequestResponse\", \"Event\", or \"DryRun\". 
Uses\n \"RequestResponse\" by default.\n tail: If True, the response will include the base64-encoded last 4\n KB of log data produced by the Lambda function.\n client_context: The client context to send to the Lambda function.\n Limited to 3583 bytes.\n\n Returns:\n The response from the Lambda function.\n\n Examples:\n\n ```python\n from prefect_aws.lambda_function import LambdaFunction\n from prefect_aws.credentials import AwsCredentials\n\n credentials = AwsCredentials()\n lambda_function = LambdaFunction(\n function_name=\"test_lambda_function\",\n aws_credentials=credentials,\n )\n response = lambda_function.invoke(\n payload={\"foo\": \"bar\"},\n invocation_type=\"RequestResponse\",\n )\n response[\"Payload\"].read()\n ```\n ```txt\n b'{\"foo\": \"bar\"}'\n ```\n\n \"\"\"\n # Add invocation arguments\n kwargs = dict(FunctionName=self.function_name)\n\n if payload:\n kwargs[\"Payload\"] = json.dumps(payload).encode()\n\n # Let boto handle invalid invocation types\n kwargs[\"InvocationType\"] = invocation_type\n\n if self.qualifier is not None:\n kwargs[\"Qualifier\"] = self.qualifier\n\n if tail:\n kwargs[\"LogType\"] = \"Tail\"\n\n if client_context is not None:\n # For some reason this is string, but payload is bytes\n kwargs[\"ClientContext\"] = json.dumps(client_context)\n\n # Get client and invoke\n lambda_client = await run_sync_in_worker_thread(self._get_lambda_client)\n return await run_sync_in_worker_thread(lambda_client.invoke, **kwargs)\n
"},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction-attributes","title":"Attributes","text":""},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction.aws_credentials","title":"
aws_credentials: AwsCredentials
pydantic-field
","text":"
The AWS credentials to invoke the Lambda with.
"},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction.function_name","title":"
function_name: str
pydantic-field
required
","text":"
The name, ARN, or partial ARN of the Lambda function to run. This must be the name of a function that is already deployed to AWS Lambda.
"},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction.qualifier","title":"
qualifier: str
pydantic-field
","text":"
The version or alias of the Lambda function to use when invoked. If not specified, the latest (unqualified) version of the Lambda function will be used.
"},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction-classes","title":"Classes","text":""},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction.Config","title":"
Config
","text":"
Lambda's pydantic configuration.
Source code in
prefect_aws/lambda_function.py
class Config:\n \"\"\"Lambda's pydantic configuration.\"\"\"\n\n smart_union = True\n
"},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction-methods","title":"Methods","text":""},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, args, *keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"lambda_function/#prefect_aws.lambda_function.LambdaFunction.invoke","title":"
invoke
async
","text":"
Invoke the Lambda function with the given payload.
Parameters:
Name Type Description Default
payload
dict
The payload to send to the Lambda function.
None
invocation_type
Literal['RequestResponse', 'Event', 'DryRun']
The invocation type of the Lambda function. This can be one of \"RequestResponse\", \"Event\", or \"DryRun\". Uses \"RequestResponse\" by default.
'RequestResponse'
tail
bool
If True, the response will include the base64-encoded last 4 KB of log data produced by the Lambda function.
False
client_context
Optional[dict]
The client context to send to the Lambda function. Limited to 3583 bytes.
None
Returns:
Type Description
dict
The response from the Lambda function.
Examples:
from prefect_aws.lambda_function import LambdaFunction\nfrom prefect_aws.credentials import AwsCredentials\n\ncredentials = AwsCredentials()\nlambda_function = LambdaFunction(\n function_name=\"test_lambda_function\",\n aws_credentials=credentials,\n)\nresponse = lambda_function.invoke(\n payload={\"foo\": \"bar\"},\n invocation_type=\"RequestResponse\",\n)\nresponse[\"Payload\"].read()\n
b'{\"foo\": \"bar\"}'\n
Source code in
prefect_aws/lambda_function.py
@sync_compatible\nasync def invoke(\n self,\n payload: dict = None,\n invocation_type: Literal[\n \"RequestResponse\", \"Event\", \"DryRun\"\n ] = \"RequestResponse\",\n tail: bool = False,\n client_context: Optional[dict] = None,\n) -> dict:\n \"\"\"\n [Invoke](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/lambda/client/invoke.html)\n the Lambda function with the given payload.\n\n Args:\n payload: The payload to send to the Lambda function.\n invocation_type: The invocation type of the Lambda function. This\n can be one of \"RequestResponse\", \"Event\", or \"DryRun\". Uses\n \"RequestResponse\" by default.\n tail: If True, the response will include the base64-encoded last 4\n KB of log data produced by the Lambda function.\n client_context: The client context to send to the Lambda function.\n Limited to 3583 bytes.\n\n Returns:\n The response from the Lambda function.\n\n Examples:\n\n ```python\n from prefect_aws.lambda_function import LambdaFunction\n from prefect_aws.credentials import AwsCredentials\n\n credentials = AwsCredentials()\n lambda_function = LambdaFunction(\n function_name=\"test_lambda_function\",\n aws_credentials=credentials,\n )\n response = lambda_function.invoke(\n payload={\"foo\": \"bar\"},\n invocation_type=\"RequestResponse\",\n )\n response[\"Payload\"].read()\n ```\n ```txt\n b'{\"foo\": \"bar\"}'\n ```\n\n \"\"\"\n # Add invocation arguments\n kwargs = dict(FunctionName=self.function_name)\n\n if payload:\n kwargs[\"Payload\"] = json.dumps(payload).encode()\n\n # Let boto handle invalid invocation types\n kwargs[\"InvocationType\"] = invocation_type\n\n if self.qualifier is not None:\n kwargs[\"Qualifier\"] = self.qualifier\n\n if tail:\n kwargs[\"LogType\"] = \"Tail\"\n\n if client_context is not None:\n # For some reason this is string, but payload is bytes\n kwargs[\"ClientContext\"] = json.dumps(client_context)\n\n # Get client and invoke\n lambda_client = await run_sync_in_worker_thread(self._get_lambda_client)\n return await run_sync_in_worker_thread(lambda_client.invoke, **kwargs)\n
"},{"location":"s3/","title":"S3","text":""},{"location":"s3/#prefect_aws.s3","title":"
prefect_aws.s3
","text":"
Tasks for interacting with AWS S3
"},{"location":"s3/#prefect_aws.s3-classes","title":"Classes","text":""},{"location":"s3/#prefect_aws.s3.S3Bucket","title":"
S3Bucket (WritableFileSystem, WritableDeploymentStorage, ObjectStorageBlock)
pydantic-model
","text":"
Block used to store data using AWS S3 or S3-compatible object storage like MinIO.
Attributes:
Name Type Description
bucket_name
str
Name of your bucket.
credentials
Union[prefect_aws.credentials.MinIOCredentials, prefect_aws.credentials.AwsCredentials]
A block containing your credentials to AWS or MinIO.
bucket_folder
str
A default path to a folder within the S3 bucket to use for reading and writing objects.
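A minimal construction sketch is shown below, before the full source; the bucket name, folder, and uploaded content are hypothetical, and the empty AwsCredentials block simply falls back to the local AWS credential chain.

```python
# Minimal sketch (hypothetical bucket, folder, and content) of how bucket_folder
# prefixes every key the block reads or writes.
from prefect_aws import AwsCredentials
from prefect_aws.s3 import S3Bucket

s3_bucket = S3Bucket(
    bucket_name="my-bucket",
    credentials=AwsCredentials(),  # falls back to the local AWS credential chain
    bucket_folder="team-data",     # "notes.txt" resolves to "team-data/notes.txt"
)
s3_bucket.write_path("notes.txt", content=b"hello")
```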
Source code in
prefect_aws/s3.py
class S3Bucket(WritableFileSystem, WritableDeploymentStorage, ObjectStorageBlock):\n\n \"\"\"\n Block used to store data using AWS S3 or S3-compatible object storage like MinIO.\n\n Attributes:\n bucket_name: Name of your bucket.\n credentials: A block containing your credentials to AWS or MinIO.\n bucket_folder: A default path to a folder within the S3 bucket to use\n for reading and writing objects.\n \"\"\"\n\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n _block_type_name = \"S3 Bucket\"\n _documentation_url = (\n \"https://prefecthq.github.io/prefect-aws/s3/#prefect_aws.s3.S3Bucket\" # noqa\n )\n\n bucket_name: str = Field(default=..., description=\"Name of your bucket.\")\n\n credentials: Union[MinIOCredentials, AwsCredentials] = Field(\n default_factory=AwsCredentials,\n description=\"A block containing your credentials to AWS or MinIO.\",\n )\n\n bucket_folder: str = Field(\n default=\"\",\n description=(\n \"A default path to a folder within the S3 bucket to use \"\n \"for reading and writing objects.\"\n ),\n )\n\n # Property to maintain compatibility with storage block based deployments\n @property\n def basepath(self) -> str:\n \"\"\"\n The base path of the S3 bucket.\n\n Returns:\n str: The base path of the S3 bucket.\n \"\"\"\n return self.bucket_folder\n\n @basepath.setter\n def basepath(self, value: str) -> None:\n self.bucket_folder = value\n\n def _resolve_path(self, path: str) -> str:\n \"\"\"\n A helper function used in write_path to join `self.basepath` and `path`.\n\n Args:\n\n path: Name of the key, e.g. \"file1\". Each object in your\n bucket has a unique key (or key name).\n\n \"\"\"\n # If bucket_folder provided, it means we won't write to the root dir of\n # the bucket. So we need to add it on the front of the path.\n #\n # AWS object key naming guidelines require '/' for bucket folders.\n # Get POSIX path to prevent `pathlib` from inferring '\\' on Windows OS\n path = (\n (Path(self.bucket_folder) / path).as_posix() if self.bucket_folder else path\n )\n\n return path\n\n def _get_s3_client(self) -> boto3.client:\n \"\"\"\n Authenticate MinIO credentials or AWS credentials and return an S3 client.\n This is a helper function called by read_path() or write_path().\n \"\"\"\n return self.credentials.get_s3_client()\n\n def _get_bucket_resource(self) -> boto3.resource:\n \"\"\"\n Retrieves boto3 resource object for the configured bucket\n \"\"\"\n params_override = self.credentials.aws_client_parameters.get_params_override()\n bucket = (\n self.credentials.get_boto3_session()\n .resource(\"s3\", **params_override)\n .Bucket(self.bucket_name)\n )\n return bucket\n\n @sync_compatible\n async def get_directory(\n self, from_path: Optional[str] = None, local_path: Optional[str] = None\n ) -> None:\n \"\"\"\n Copies a folder from the configured S3 bucket to a local directory.\n\n Defaults to copying the entire contents of the block's basepath to the current\n working directory.\n\n Args:\n from_path: Path in S3 bucket to download from. Defaults to the block's\n configured basepath.\n local_path: Local path to download S3 contents to. 
Defaults to the current\n working directory.\n \"\"\"\n bucket_folder = self.bucket_folder\n if from_path is None:\n from_path = str(bucket_folder) if bucket_folder else \"\"\n\n if local_path is None:\n local_path = str(Path(\".\").absolute())\n else:\n local_path = str(Path(local_path).expanduser())\n\n bucket = self._get_bucket_resource()\n for obj in bucket.objects.filter(Prefix=from_path):\n if obj.key[-1] == \"/\":\n # object is a folder and will be created if it contains any objects\n continue\n target = os.path.join(\n local_path,\n os.path.relpath(obj.key, from_path),\n )\n os.makedirs(os.path.dirname(target), exist_ok=True)\n bucket.download_file(obj.key, target)\n\n @sync_compatible\n async def put_directory(\n self,\n local_path: Optional[str] = None,\n to_path: Optional[str] = None,\n ignore_file: Optional[str] = None,\n ) -> int:\n \"\"\"\n Uploads a directory from a given local path to the configured S3 bucket in a\n given folder.\n\n Defaults to uploading the entire contents the current working directory to the\n block's basepath.\n\n Args:\n local_path: Path to local directory to upload from.\n to_path: Path in S3 bucket to upload to. Defaults to block's configured\n basepath.\n ignore_file: Path to file containing gitignore style expressions for\n filepaths to ignore.\n\n \"\"\"\n to_path = \"\" if to_path is None else to_path\n\n if local_path is None:\n local_path = \".\"\n\n included_files = None\n if ignore_file:\n with open(ignore_file, \"r\") as f:\n ignore_patterns = f.readlines()\n\n included_files = filter_files(local_path, ignore_patterns)\n\n uploaded_file_count = 0\n for local_file_path in Path(local_path).expanduser().rglob(\"*\"):\n if (\n included_files is not None\n and str(local_file_path.relative_to(local_path)) not in included_files\n ):\n continue\n elif not local_file_path.is_dir():\n remote_file_path = Path(to_path) / local_file_path.relative_to(\n local_path\n )\n with open(local_file_path, \"rb\") as local_file:\n local_file_content = local_file.read()\n\n await self.write_path(\n remote_file_path.as_posix(), content=local_file_content\n )\n uploaded_file_count += 1\n\n return uploaded_file_count\n\n @sync_compatible\n async def read_path(self, path: str) -> bytes:\n \"\"\"\n Read specified path from S3 and return contents. Provide the entire\n path to the key in S3.\n\n Args:\n path: Entire path to (and including) the key.\n\n Example:\n Read \"subfolder/file1\" contents from an S3 bucket named \"bucket\":\n ```python\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import S3Bucket\n\n aws_creds = AwsCredentials(\n aws_access_key_id=AWS_ACCESS_KEY_ID,\n aws_secret_access_key=AWS_SECRET_ACCESS_KEY\n )\n\n s3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n aws_credentials=aws_creds,\n basepath=\"subfolder\"\n )\n\n key_contents = s3_bucket_block.read_path(path=\"subfolder/file1\")\n ```\n \"\"\"\n path = self._resolve_path(path)\n\n return await run_sync_in_worker_thread(self._read_sync, path)\n\n def _read_sync(self, key: str) -> bytes:\n \"\"\"\n Called by read_path(). Creates an S3 client and retrieves the\n contents from a specified path.\n \"\"\"\n\n s3_client = self._get_s3_client()\n\n with io.BytesIO() as stream:\n s3_client.download_fileobj(Bucket=self.bucket_name, Key=key, Fileobj=stream)\n stream.seek(0)\n output = stream.read()\n return output\n\n @sync_compatible\n async def write_path(self, path: str, content: bytes) -> str:\n \"\"\"\n Writes to an S3 bucket.\n\n Args:\n\n path: The key name. 
Each object in your bucket has a unique\n key (or key name).\n content: What you are uploading to S3.\n\n Example:\n\n Write data to the path `dogs/small_dogs/havanese` in an S3 Bucket:\n ```python\n from prefect_aws import MinioCredentials\n from prefect_aws.s3 import S3Bucket\n\n minio_creds = MinIOCredentials(\n minio_root_user = \"minioadmin\",\n minio_root_password = \"minioadmin\",\n )\n\n s3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n minio_credentials=minio_creds,\n basepath=\"dogs/smalldogs\",\n endpoint_url=\"http://localhost:9000\",\n )\n s3_havanese_path = s3_bucket_block.write_path(path=\"havanese\", content=data)\n ```\n \"\"\"\n\n path = self._resolve_path(path)\n\n await run_sync_in_worker_thread(self._write_sync, path, content)\n\n return path\n\n def _write_sync(self, key: str, data: bytes) -> None:\n \"\"\"\n Called by write_path(). Creates an S3 client and uploads a file\n object.\n \"\"\"\n\n s3_client = self._get_s3_client()\n\n with io.BytesIO(data) as stream:\n s3_client.upload_fileobj(Fileobj=stream, Bucket=self.bucket_name, Key=key)\n\n # NEW BLOCK INTERFACE METHODS BELOW\n @staticmethod\n def _list_objects_sync(page_iterator: PageIterator) -> List[Dict[str, Any]]:\n \"\"\"\n Synchronous method to collect S3 objects into a list\n\n Args:\n page_iterator: AWS Paginator for S3 objects\n\n Returns:\n List[Dict]: List of object information\n \"\"\"\n return [\n content for page in page_iterator for content in page.get(\"Contents\", [])\n ]\n\n def _join_bucket_folder(self, bucket_path: str = \"\") -> str:\n \"\"\"\n Joins the base bucket folder to the bucket path.\n NOTE: If a method reuses another method in this class, be careful to not\n call this twice because it'll join the bucket folder twice.\n See https://github.com/PrefectHQ/prefect-aws/issues/141 for a past issue.\n \"\"\"\n if not self.bucket_folder and not bucket_path:\n # there's a difference between \".\" and \"\", at least in the tests\n return \"\"\n\n bucket_path = str(bucket_path)\n if self.bucket_folder != \"\" and bucket_path.startswith(self.bucket_folder):\n self.logger.info(\n f\"Bucket path {bucket_path!r} is already prefixed with \"\n f\"bucket folder {self.bucket_folder!r}; is this intentional?\"\n )\n\n return (Path(self.bucket_folder) / bucket_path).as_posix() + (\n \"\" if not bucket_path.endswith(\"/\") else \"/\"\n )\n\n @sync_compatible\n async def list_objects(\n self,\n folder: str = \"\",\n delimiter: str = \"\",\n page_size: Optional[int] = None,\n max_items: Optional[int] = None,\n jmespath_query: Optional[str] = None,\n ) -> List[Dict[str, Any]]:\n \"\"\"\n Args:\n folder: Folder to list objects from.\n delimiter: Character used to group keys of listed objects.\n page_size: Number of objects to return in each request to the AWS API.\n max_items: Maximum number of objects that to be returned by task.\n jmespath_query: Query used to filter objects based on object attributes refer to\n the [boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#filtering-results-with-jmespath)\n for more information on how to construct queries.\n\n Returns:\n List of objects and their metadata in the bucket.\n\n Examples:\n List objects under the `base_folder`.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.list_objects(\"base_folder\")\n ```\n \"\"\" # noqa: E501\n bucket_path = self._join_bucket_folder(folder)\n client = self.credentials.get_s3_client()\n paginator = 
client.get_paginator(\"list_objects_v2\")\n page_iterator = paginator.paginate(\n Bucket=self.bucket_name,\n Prefix=bucket_path,\n Delimiter=delimiter,\n PaginationConfig={\"PageSize\": page_size, \"MaxItems\": max_items},\n )\n if jmespath_query:\n page_iterator = page_iterator.search(f\"{jmespath_query} | {{Contents: @}}\")\n\n self.logger.info(f\"Listing objects in bucket {bucket_path}.\")\n objects = await run_sync_in_worker_thread(\n self._list_objects_sync, page_iterator\n )\n return objects\n\n @sync_compatible\n async def download_object_to_path(\n self,\n from_path: str,\n to_path: Optional[Union[str, Path]],\n **download_kwargs: Dict[str, Any],\n ) -> Path:\n \"\"\"\n Downloads an object from the S3 bucket to a path.\n\n Args:\n from_path: The path to the object to download; this gets prefixed\n with the bucket_folder.\n to_path: The path to download the object to. If not provided, the\n object's name will be used.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_file`.\n\n Returns:\n The absolute path that the object was downloaded to.\n\n Examples:\n Download my_folder/notes.txt object to notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.download_object_to_path(\"my_folder/notes.txt\", \"notes.txt\")\n ```\n \"\"\"\n if to_path is None:\n to_path = Path(from_path).name\n\n # making path absolute, but converting back to str here\n # since !r looks nicer that way and filename arg expects str\n to_path = str(Path(to_path).absolute())\n bucket_path = self._join_bucket_folder(from_path)\n client = self.credentials.get_s3_client()\n\n self.logger.debug(\n f\"Preparing to download object from bucket {self.bucket_name!r} \"\n f\"path {bucket_path!r} to {to_path!r}.\"\n )\n await run_sync_in_worker_thread(\n client.download_file,\n Bucket=self.bucket_name,\n Key=from_path,\n Filename=to_path,\n **download_kwargs,\n )\n self.logger.info(\n f\"Downloaded object from bucket {self.bucket_name!r} path {bucket_path!r} \"\n f\"to {to_path!r}.\"\n )\n return Path(to_path)\n\n @sync_compatible\n async def download_object_to_file_object(\n self,\n from_path: str,\n to_file_object: BinaryIO,\n **download_kwargs: Dict[str, Any],\n ) -> BinaryIO:\n \"\"\"\n Downloads an object from the object storage service to a file-like object,\n which can be a BytesIO object or a BufferedWriter.\n\n Args:\n from_path: The path to the object to download from; this gets prefixed\n with the bucket_folder.\n to_file_object: The file-like object to download the object to.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_fileobj`.\n\n Returns:\n The file-like object that the object was downloaded to.\n\n Examples:\n Download my_folder/notes.txt object to a BytesIO object.\n ```python\n from io import BytesIO\n\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with BytesIO() as buf:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", buf)\n ```\n\n Download my_folder/notes.txt object to a BufferedWriter.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"wb\") as f:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", f)\n ```\n \"\"\"\n client = self.credentials.get_s3_client()\n bucket_path = self._join_bucket_folder(from_path)\n\n self.logger.debug(\n f\"Preparing to download object from bucket {self.bucket_name!r} \"\n f\"path {bucket_path!r} to file 
object.\"\n )\n await run_sync_in_worker_thread(\n client.download_fileobj,\n Bucket=self.bucket_name,\n Key=bucket_path,\n Fileobj=to_file_object,\n **download_kwargs,\n )\n self.logger.info(\n f\"Downloaded object from bucket {self.bucket_name!r} path {bucket_path!r} \"\n \"to file object.\"\n )\n return to_file_object\n\n @sync_compatible\n async def download_folder_to_path(\n self,\n from_folder: str,\n to_folder: Optional[Union[str, Path]] = None,\n **download_kwargs: Dict[str, Any],\n ) -> Path:\n \"\"\"\n Downloads objects *within* a folder (excluding the folder itself)\n from the S3 bucket to a folder.\n\n Args:\n from_folder: The path to the folder to download from.\n to_folder: The path to download the folder to.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_file`.\n\n Returns:\n The absolute path that the folder was downloaded to.\n\n Examples:\n Download my_folder to a local folder named my_folder.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.download_folder_to_path(\"my_folder\", \"my_folder\")\n ```\n \"\"\"\n if to_folder is None:\n to_folder = \"\"\n to_folder = Path(to_folder).absolute()\n\n client = self.credentials.get_s3_client()\n objects = await self.list_objects(folder=from_folder)\n\n # do not call self._join_bucket_folder for filter\n # because it's built-in to that method already!\n # however, we still need to do it because we're using relative_to\n bucket_folder = self._join_bucket_folder(from_folder)\n\n async_coros = []\n for object in objects:\n bucket_path = Path(object[\"Key\"]).relative_to(bucket_folder)\n # this skips the actual directory itself, e.g.\n # `my_folder/` will be skipped\n # `my_folder/notes.txt` will be downloaded\n if bucket_path.is_dir():\n continue\n to_path = to_folder / bucket_path\n to_path.parent.mkdir(parents=True, exist_ok=True)\n to_path = str(to_path) # must be string\n self.logger.info(\n f\"Downloading object from bucket {self.bucket_name!r} path \"\n f\"{bucket_path.as_posix()!r} to {to_path!r}.\"\n )\n async_coros.append(\n run_sync_in_worker_thread(\n client.download_file,\n Bucket=self.bucket_name,\n Key=object[\"Key\"],\n Filename=to_path,\n **download_kwargs,\n )\n )\n await asyncio.gather(*async_coros)\n\n return Path(to_folder)\n\n @sync_compatible\n async def stream_from(\n self,\n bucket: \"S3Bucket\",\n from_path: str,\n to_path: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n ) -> str:\n \"\"\"Streams an object from another bucket to this bucket. Requires the\n object to be downloaded and uploaded in chunks. If `self`'s credentials\n allow for writes to the other bucket, try using `S3Bucket.copy_object`.\n\n Args:\n bucket: The bucket to stream from.\n from_path: The path of the object to stream.\n to_path: The path to stream the object to. 
Defaults to the object's name.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Stream notes.txt from your-bucket/notes.txt to my-bucket/landed/notes.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n your_s3_bucket = S3Bucket.load(\"your-bucket\")\n my_s3_bucket = S3Bucket.load(\"my-bucket\")\n\n my_s3_bucket.stream_from(\n your_s3_bucket,\n \"notes.txt\",\n to_path=\"landed/notes.txt\"\n )\n ```\n\n \"\"\"\n if to_path is None:\n to_path = Path(from_path).name\n\n # Get the source object's StreamingBody\n from_path: str = bucket._join_bucket_folder(from_path)\n from_client = bucket.credentials.get_s3_client()\n obj = await run_sync_in_worker_thread(\n from_client.get_object, Bucket=bucket.bucket_name, Key=from_path\n )\n body: StreamingBody = obj[\"Body\"]\n\n # Upload the StreamingBody to this bucket\n bucket_path = str(self._join_bucket_folder(to_path))\n to_client = self.credentials.get_s3_client()\n await run_sync_in_worker_thread(\n to_client.upload_fileobj,\n Fileobj=body,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n f\"Streamed s3://{bucket.bucket_name}/{from_path} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n\n @sync_compatible\n async def upload_from_path(\n self,\n from_path: Union[str, Path],\n to_path: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n ) -> str:\n \"\"\"\n Uploads an object from a path to the S3 bucket.\n\n Args:\n from_path: The path to the file to upload from.\n to_path: The path to upload the file to.\n **upload_kwargs: Additional keyword arguments to pass to `Client.upload`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Upload notes.txt to my_folder/notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.upload_from_path(\"notes.txt\", \"my_folder/notes.txt\")\n ```\n \"\"\"\n from_path = str(Path(from_path).absolute())\n if to_path is None:\n to_path = Path(from_path).name\n\n bucket_path = str(self._join_bucket_folder(to_path))\n client = self.credentials.get_s3_client()\n\n await run_sync_in_worker_thread(\n client.upload_file,\n Filename=from_path,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n f\"Uploaded from {from_path!r} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n\n @sync_compatible\n async def upload_from_file_object(\n self, from_file_object: BinaryIO, to_path: str, **upload_kwargs: Dict[str, Any]\n ) -> str:\n \"\"\"\n Uploads an object to the S3 bucket from a file-like object,\n which can be a BytesIO object or a BufferedReader.\n\n Args:\n from_file_object: The file-like object to upload from.\n to_path: The path to upload the object to.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Upload BytesIO object to my_folder/notes.txt.\n ```python\n from io import BytesIO\n\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(f, \"my_folder/notes.txt\")\n ```\n\n Upload BufferedReader object to my_folder/notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", 
\"rb\") as f:\n s3_bucket.upload_from_file_object(\n f, \"my_folder/notes.txt\"\n )\n ```\n \"\"\"\n bucket_path = str(self._join_bucket_folder(to_path))\n client = self.credentials.get_s3_client()\n await run_sync_in_worker_thread(\n client.upload_fileobj,\n Fileobj=from_file_object,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n \"Uploaded from file object to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n\n @sync_compatible\n async def upload_from_folder(\n self,\n from_folder: Union[str, Path],\n to_folder: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n ) -> str:\n \"\"\"\n Uploads files *within* a folder (excluding the folder itself)\n to the object storage service folder.\n\n Args:\n from_folder: The path to the folder to upload from.\n to_folder: The path to upload the folder to.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the folder was uploaded to.\n\n Examples:\n Upload contents from my_folder to new_folder.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.upload_from_folder(\"my_folder\", \"new_folder\")\n ```\n \"\"\"\n from_folder = Path(from_folder)\n bucket_folder = self._join_bucket_folder(to_folder or \"\")\n\n num_uploaded = 0\n client = self.credentials.get_s3_client()\n\n async_coros = []\n for from_path in from_folder.rglob(\"**/*\"):\n # this skips the actual directory itself, e.g.\n # `my_folder/` will be skipped\n # `my_folder/notes.txt` will be uploaded\n if from_path.is_dir():\n continue\n bucket_path = (\n Path(bucket_folder) / from_path.relative_to(from_folder)\n ).as_posix()\n self.logger.info(\n f\"Uploading from {str(from_path)!r} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n async_coros.append(\n run_sync_in_worker_thread(\n client.upload_file,\n Filename=str(from_path),\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n )\n num_uploaded += 1\n await asyncio.gather(*async_coros)\n\n if num_uploaded == 0:\n self.logger.warning(f\"No files were uploaded from {str(from_folder)!r}.\")\n else:\n self.logger.info(\n f\"Uploaded {num_uploaded} files from {str(from_folder)!r} to \"\n f\"the bucket {self.bucket_name!r} path {bucket_path!r}\"\n )\n\n return to_folder\n\n @sync_compatible\n async def copy_object(\n self,\n from_path: Union[str, Path],\n to_path: Union[str, Path],\n to_bucket: Optional[Union[\"S3Bucket\", str]] = None,\n **copy_kwargs,\n ) -> str:\n \"\"\"Uses S3's internal\n [CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html)\n to copy objects within or between buckets. To copy objects between buckets,\n `self`'s credentials must have permission to read the source object and write\n to the target object. If the credentials do not have those permissions, try\n using `S3Bucket.stream_from`.\n\n Args:\n from_path: The path of the object to copy.\n to_path: The path to copy the object to.\n to_bucket: The bucket to copy to. Defaults to the current bucket.\n **copy_kwargs: Additional keyword arguments to pass to\n `S3Client.copy_object`.\n\n Returns:\n The path that the object was copied to. 
Excludes the bucket name.\n\n Examples:\n\n Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.copy_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n ```\n\n Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in\n another bucket.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.copy_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n )\n ```\n \"\"\"\n s3_client = self.credentials.get_s3_client()\n\n source_path = self._resolve_path(Path(from_path).as_posix())\n target_path = self._resolve_path(Path(to_path).as_posix())\n\n source_bucket_name = self.bucket_name\n target_bucket_name = self.bucket_name\n if isinstance(to_bucket, S3Bucket):\n target_bucket_name = to_bucket.bucket_name\n target_path = to_bucket._resolve_path(target_path)\n elif isinstance(to_bucket, str):\n target_bucket_name = to_bucket\n elif to_bucket is not None:\n raise TypeError(\n \"to_bucket must be a string or S3Bucket, not\"\n f\" {type(target_bucket_name)}\"\n )\n\n self.logger.info(\n \"Copying object from bucket %s with key %s to bucket %s with key %s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n s3_client.copy_object(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n **copy_kwargs,\n )\n\n return target_path\n\n @sync_compatible\n async def move_object(\n self,\n from_path: Union[str, Path],\n to_path: Union[str, Path],\n to_bucket: Optional[Union[\"S3Bucket\", str]] = None,\n ) -> str:\n \"\"\"Uses S3's internal CopyObject and DeleteObject to move objects within or\n between buckets. To move objects between buckets, `self`'s credentials must\n have permission to read and delete the source object and write to the target\n object. If the credentials do not have those permissions, this method will\n raise an error. If the credentials have permission to read the source object\n but not delete it, the object will be copied but not deleted.\n\n Args:\n from_path: The path of the object to move.\n to_path: The path to move the object to.\n to_bucket: The bucket to move to. Defaults to the current bucket.\n\n Returns:\n The path that the object was moved to. 
Excludes the bucket name.\n\n Examples:\n\n Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.move_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n ```\n\n Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in\n another bucket.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.move_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n )\n ```\n \"\"\"\n s3_client = self.credentials.get_s3_client()\n\n source_path = self._resolve_path(Path(from_path).as_posix())\n target_path = self._resolve_path(Path(to_path).as_posix())\n\n source_bucket_name = self.bucket_name\n target_bucket_name = self.bucket_name\n if isinstance(to_bucket, S3Bucket):\n target_bucket_name = to_bucket.bucket_name\n target_path = to_bucket._resolve_path(target_path)\n elif isinstance(to_bucket, str):\n target_bucket_name = to_bucket\n elif to_bucket is not None:\n raise TypeError(\n \"to_bucket must be a string or S3Bucket, not\"\n f\" {type(target_bucket_name)}\"\n )\n\n self.logger.info(\n \"Moving object from s3://%s/%s to s3://%s/%s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n # If invalid, should error and prevent next operation\n s3_client.copy(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n )\n s3_client.delete_object(Bucket=source_bucket_name, Key=source_path)\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket-attributes","title":"Attributes","text":""},{"location":"s3/#prefect_aws.s3.S3Bucket.basepath","title":"
basepath: str
property
writable
","text":"
The base path of the S3 bucket.
Returns:
Type Description
str
The base path of the S3 bucket.
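For example (a minimal sketch, not from the source; assumes a saved block named \"my-bucket\"):
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nprint(s3_bucket.basepath)  # the folder prefix this block reads from and writes to\n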
"},{"location":"s3/#prefect_aws.s3.S3Bucket.bucket_folder","title":"
bucket_folder: str
pydantic-field
","text":"
A default path to a folder within the S3 bucket to use for reading and writing objects.
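For illustration, a minimal sketch (not from the source; the bucket, folder, and credentials block names are placeholders) of how a configured bucket_folder prefixes the paths used by the block's read and write methods:
from prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import S3Bucket\n\n# assumes an AwsCredentials block named \"my-creds\" has already been saved\ns3_bucket = S3Bucket(\n    bucket_name=\"my-bucket\",\n    bucket_folder=\"data/raw\",\n    credentials=AwsCredentials.load(\"my-creds\"),\n)\n# with this configuration, the upload below is expected to land under the\n# \"data/raw/\" prefix in the bucket\ns3_bucket.upload_from_path(\"report.csv\")\n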
"},{"location":"s3/#prefect_aws.s3.S3Bucket.bucket_name","title":"
bucket_name: str
pydantic-field
required
","text":"
Name of your bucket.
"},{"location":"s3/#prefect_aws.s3.S3Bucket.credentials","title":"
credentials: Union[prefect_aws.credentials.MinIOCredentials, prefect_aws.credentials.AwsCredentials]
pydantic-field
","text":"
A block containing your credentials to AWS or MinIO.
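A brief sketch (not from the source; block names are placeholders) of populating the credentials field from a saved AwsCredentials block and saving the configured S3Bucket for reuse:
from prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import S3Bucket\n\ncredentials = AwsCredentials.load(\"my-creds\")  # previously saved credentials block\ns3_bucket = S3Bucket(bucket_name=\"my-bucket\", credentials=credentials)\ns3_bucket.save(\"my-s3-bucket\", overwrite=True)\n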
"},{"location":"s3/#prefect_aws.s3.S3Bucket-methods","title":"Methods","text":""},{"location":"s3/#prefect_aws.s3.S3Bucket.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, args, *keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"s3/#prefect_aws.s3.S3Bucket.copy_object","title":"
copy_object
async
","text":"
Uses S3's internal CopyObject to copy objects within or between buckets. To copy objects between buckets, self
's credentials must have permission to read the source object and write to the target object. If the credentials do not have those permissions, try using S3Bucket.stream_from
.
Parameters:
Name Type Description Default
from_path
Union[str, pathlib.Path]
The path of the object to copy.
required
to_path
Union[str, pathlib.Path]
The path to copy the object to.
required
to_bucket
Union[S3Bucket, str]
The bucket to copy to. Defaults to the current bucket.
None
**copy_kwargs
Additional keyword arguments to pass to S3Client.copy_object
.
{}
Returns:
Type Description
str
The path that the object was copied to. Excludes the bucket name.
Examples:
Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.copy_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n
Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in another bucket.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.copy_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def copy_object(\n self,\n from_path: Union[str, Path],\n to_path: Union[str, Path],\n to_bucket: Optional[Union[\"S3Bucket\", str]] = None,\n **copy_kwargs,\n) -> str:\n \"\"\"Uses S3's internal\n [CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html)\n to copy objects within or between buckets. To copy objects between buckets,\n `self`'s credentials must have permission to read the source object and write\n to the target object. If the credentials do not have those permissions, try\n using `S3Bucket.stream_from`.\n\n Args:\n from_path: The path of the object to copy.\n to_path: The path to copy the object to.\n to_bucket: The bucket to copy to. Defaults to the current bucket.\n **copy_kwargs: Additional keyword arguments to pass to\n `S3Client.copy_object`.\n\n Returns:\n The path that the object was copied to. Excludes the bucket name.\n\n Examples:\n\n Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.copy_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n ```\n\n Copy notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in\n another bucket.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.copy_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n )\n ```\n \"\"\"\n s3_client = self.credentials.get_s3_client()\n\n source_path = self._resolve_path(Path(from_path).as_posix())\n target_path = self._resolve_path(Path(to_path).as_posix())\n\n source_bucket_name = self.bucket_name\n target_bucket_name = self.bucket_name\n if isinstance(to_bucket, S3Bucket):\n target_bucket_name = to_bucket.bucket_name\n target_path = to_bucket._resolve_path(target_path)\n elif isinstance(to_bucket, str):\n target_bucket_name = to_bucket\n elif to_bucket is not None:\n raise TypeError(\n \"to_bucket must be a string or S3Bucket, not\"\n f\" {type(target_bucket_name)}\"\n )\n\n self.logger.info(\n \"Copying object from bucket %s with key %s to bucket %s with key %s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n s3_client.copy_object(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n **copy_kwargs,\n )\n\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.download_folder_to_path","title":"
download_folder_to_path
async
","text":"
Downloads objects within a folder (excluding the folder itself) from the S3 bucket to a folder.
Parameters:
Name Type Description Default
from_folder
str
The path to the folder to download from.
required
to_folder
Union[str, pathlib.Path]
The path to download the folder to.
None
**download_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.download_file
.
{}
Returns:
Type Description
Path
The absolute path that the folder was downloaded to.
Examples:
Download my_folder to a local folder named my_folder.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.download_folder_to_path(\"my_folder\", \"my_folder\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def download_folder_to_path(\n self,\n from_folder: str,\n to_folder: Optional[Union[str, Path]] = None,\n **download_kwargs: Dict[str, Any],\n) -> Path:\n \"\"\"\n Downloads objects *within* a folder (excluding the folder itself)\n from the S3 bucket to a folder.\n\n Args:\n from_folder: The path to the folder to download from.\n to_folder: The path to download the folder to.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_file`.\n\n Returns:\n The absolute path that the folder was downloaded to.\n\n Examples:\n Download my_folder to a local folder named my_folder.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.download_folder_to_path(\"my_folder\", \"my_folder\")\n ```\n \"\"\"\n if to_folder is None:\n to_folder = \"\"\n to_folder = Path(to_folder).absolute()\n\n client = self.credentials.get_s3_client()\n objects = await self.list_objects(folder=from_folder)\n\n # do not call self._join_bucket_folder for filter\n # because it's built-in to that method already!\n # however, we still need to do it because we're using relative_to\n bucket_folder = self._join_bucket_folder(from_folder)\n\n async_coros = []\n for object in objects:\n bucket_path = Path(object[\"Key\"]).relative_to(bucket_folder)\n # this skips the actual directory itself, e.g.\n # `my_folder/` will be skipped\n # `my_folder/notes.txt` will be downloaded\n if bucket_path.is_dir():\n continue\n to_path = to_folder / bucket_path\n to_path.parent.mkdir(parents=True, exist_ok=True)\n to_path = str(to_path) # must be string\n self.logger.info(\n f\"Downloading object from bucket {self.bucket_name!r} path \"\n f\"{bucket_path.as_posix()!r} to {to_path!r}.\"\n )\n async_coros.append(\n run_sync_in_worker_thread(\n client.download_file,\n Bucket=self.bucket_name,\n Key=object[\"Key\"],\n Filename=to_path,\n **download_kwargs,\n )\n )\n await asyncio.gather(*async_coros)\n\n return Path(to_folder)\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.download_object_to_file_object","title":"
download_object_to_file_object
async
","text":"
Downloads an object from the object storage service to a file-like object, which can be a BytesIO object or a BufferedWriter.
Parameters:
Name Type Description Default
from_path
str
The path to the object to download from; this gets prefixed with the bucket_folder.
required
to_file_object
BinaryIO
The file-like object to download the object to.
required
**download_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.download_fileobj
.
{}
Returns:
Type Description
BinaryIO
The file-like object that the object was downloaded to.
Examples:
Download my_folder/notes.txt object to a BytesIO object.
from io import BytesIO\n\nfrom prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith BytesIO() as buf:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", buf)\n
Download my_folder/notes.txt object to a BufferedWriter.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"wb\") as f:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", f)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def download_object_to_file_object(\n self,\n from_path: str,\n to_file_object: BinaryIO,\n **download_kwargs: Dict[str, Any],\n) -> BinaryIO:\n \"\"\"\n Downloads an object from the object storage service to a file-like object,\n which can be a BytesIO object or a BufferedWriter.\n\n Args:\n from_path: The path to the object to download from; this gets prefixed\n with the bucket_folder.\n to_file_object: The file-like object to download the object to.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_fileobj`.\n\n Returns:\n The file-like object that the object was downloaded to.\n\n Examples:\n Download my_folder/notes.txt object to a BytesIO object.\n ```python\n from io import BytesIO\n\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with BytesIO() as buf:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", buf)\n ```\n\n Download my_folder/notes.txt object to a BufferedWriter.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"wb\") as f:\n s3_bucket.download_object_to_file_object(\"my_folder/notes.txt\", f)\n ```\n \"\"\"\n client = self.credentials.get_s3_client()\n bucket_path = self._join_bucket_folder(from_path)\n\n self.logger.debug(\n f\"Preparing to download object from bucket {self.bucket_name!r} \"\n f\"path {bucket_path!r} to file object.\"\n )\n await run_sync_in_worker_thread(\n client.download_fileobj,\n Bucket=self.bucket_name,\n Key=bucket_path,\n Fileobj=to_file_object,\n **download_kwargs,\n )\n self.logger.info(\n f\"Downloaded object from bucket {self.bucket_name!r} path {bucket_path!r} \"\n \"to file object.\"\n )\n return to_file_object\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.download_object_to_path","title":"
download_object_to_path
async
","text":"
Downloads an object from the S3 bucket to a path.
Parameters:
Name Type Description Default
from_path
str
The path to the object to download; this gets prefixed with the bucket_folder.
required
to_path
Union[str, pathlib.Path]
The path to download the object to. If not provided, the object's name will be used.
required
**download_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.download_file
.
{}
Returns:
Type Description
Path
The absolute path that the object was downloaded to.
Examples:
Download my_folder/notes.txt object to notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.download_object_to_path(\"my_folder/notes.txt\", \"notes.txt\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def download_object_to_path(\n self,\n from_path: str,\n to_path: Optional[Union[str, Path]],\n **download_kwargs: Dict[str, Any],\n) -> Path:\n \"\"\"\n Downloads an object from the S3 bucket to a path.\n\n Args:\n from_path: The path to the object to download; this gets prefixed\n with the bucket_folder.\n to_path: The path to download the object to. If not provided, the\n object's name will be used.\n **download_kwargs: Additional keyword arguments to pass to\n `Client.download_file`.\n\n Returns:\n The absolute path that the object was downloaded to.\n\n Examples:\n Download my_folder/notes.txt object to notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.download_object_to_path(\"my_folder/notes.txt\", \"notes.txt\")\n ```\n \"\"\"\n if to_path is None:\n to_path = Path(from_path).name\n\n # making path absolute, but converting back to str here\n # since !r looks nicer that way and filename arg expects str\n to_path = str(Path(to_path).absolute())\n bucket_path = self._join_bucket_folder(from_path)\n client = self.credentials.get_s3_client()\n\n self.logger.debug(\n f\"Preparing to download object from bucket {self.bucket_name!r} \"\n f\"path {bucket_path!r} to {to_path!r}.\"\n )\n await run_sync_in_worker_thread(\n client.download_file,\n Bucket=self.bucket_name,\n Key=from_path,\n Filename=to_path,\n **download_kwargs,\n )\n self.logger.info(\n f\"Downloaded object from bucket {self.bucket_name!r} path {bucket_path!r} \"\n f\"to {to_path!r}.\"\n )\n return Path(to_path)\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.get_directory","title":"
get_directory
async
","text":"
Copies a folder from the configured S3 bucket to a local directory.
Defaults to copying the entire contents of the block's basepath to the current working directory.
Parameters:
Name Type Description Default
from_path
Optional[str]
Path in S3 bucket to download from. Defaults to the block's configured basepath.
None
local_path
Optional[str]
Local path to download S3 contents to. Defaults to the current working directory.
None
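Example (a minimal sketch, not from the source; the block name and paths are placeholders): download everything under a bucket prefix into a local folder.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\n# copies the objects under \"project/\" in the bucket into ./local_project\ns3_bucket.get_directory(from_path=\"project\", local_path=\"local_project\")\n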
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def get_directory(\n self, from_path: Optional[str] = None, local_path: Optional[str] = None\n) -> None:\n \"\"\"\n Copies a folder from the configured S3 bucket to a local directory.\n\n Defaults to copying the entire contents of the block's basepath to the current\n working directory.\n\n Args:\n from_path: Path in S3 bucket to download from. Defaults to the block's\n configured basepath.\n local_path: Local path to download S3 contents to. Defaults to the current\n working directory.\n \"\"\"\n bucket_folder = self.bucket_folder\n if from_path is None:\n from_path = str(bucket_folder) if bucket_folder else \"\"\n\n if local_path is None:\n local_path = str(Path(\".\").absolute())\n else:\n local_path = str(Path(local_path).expanduser())\n\n bucket = self._get_bucket_resource()\n for obj in bucket.objects.filter(Prefix=from_path):\n if obj.key[-1] == \"/\":\n # object is a folder and will be created if it contains any objects\n continue\n target = os.path.join(\n local_path,\n os.path.relpath(obj.key, from_path),\n )\n os.makedirs(os.path.dirname(target), exist_ok=True)\n bucket.download_file(obj.key, target)\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.list_objects","title":"
list_objects
async
","text":"
Parameters:
Name Type Description Default
folder
str
Folder to list objects from.
''
delimiter
str
Character used to group keys of listed objects.
''
page_size
Optional[int]
Number of objects to return in each request to the AWS API.
None
max_items
Optional[int]
Maximum number of objects to be returned by the task.
None
jmespath_query
Optional[str]
Query used to filter objects based on object attributes; refer to the boto3 docs for more information on how to construct queries.
None
Returns:
Type Description
List[Dict[str, Any]]
List of objects and their metadata in the bucket.
Examples:
List objects under the base_folder
.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.list_objects(\"base_folder\")\n
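A further sketch (not from the source; the size threshold and page size are arbitrary) combining page_size with a jmespath_query written in the boto3 JMESPath filtering syntax:
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\n# list objects larger than 1000 bytes, fetching 100 keys per API request\nlarge_objects = s3_bucket.list_objects(\n    folder=\"base_folder\",\n    page_size=100,\n    jmespath_query=\"Contents[?Size > `1000`][]\",\n)\n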
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def list_objects(\n self,\n folder: str = \"\",\n delimiter: str = \"\",\n page_size: Optional[int] = None,\n max_items: Optional[int] = None,\n jmespath_query: Optional[str] = None,\n) -> List[Dict[str, Any]]:\n \"\"\"\n Args:\n folder: Folder to list objects from.\n delimiter: Character used to group keys of listed objects.\n page_size: Number of objects to return in each request to the AWS API.\n max_items: Maximum number of objects that to be returned by task.\n jmespath_query: Query used to filter objects based on object attributes refer to\n the [boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#filtering-results-with-jmespath)\n for more information on how to construct queries.\n\n Returns:\n List of objects and their metadata in the bucket.\n\n Examples:\n List objects under the `base_folder`.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.list_objects(\"base_folder\")\n ```\n \"\"\" # noqa: E501\n bucket_path = self._join_bucket_folder(folder)\n client = self.credentials.get_s3_client()\n paginator = client.get_paginator(\"list_objects_v2\")\n page_iterator = paginator.paginate(\n Bucket=self.bucket_name,\n Prefix=bucket_path,\n Delimiter=delimiter,\n PaginationConfig={\"PageSize\": page_size, \"MaxItems\": max_items},\n )\n if jmespath_query:\n page_iterator = page_iterator.search(f\"{jmespath_query} | {{Contents: @}}\")\n\n self.logger.info(f\"Listing objects in bucket {bucket_path}.\")\n objects = await run_sync_in_worker_thread(\n self._list_objects_sync, page_iterator\n )\n return objects\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.move_object","title":"
move_object
async
","text":"
Uses S3's internal CopyObject and DeleteObject to move objects within or between buckets. To move objects between buckets, self
's credentials must have permission to read and delete the source object and write to the target object. If the credentials do not have those permissions, this method will raise an error. If the credentials have permission to read the source object but not delete it, the object will be copied but not deleted.
Parameters:
Name Type Description Default
from_path
Union[str, pathlib.Path]
The path of the object to move.
required
to_path
Union[str, pathlib.Path]
The path to move the object to.
required
to_bucket
Union[S3Bucket, str]
The bucket to move to. Defaults to the current bucket.
None
Returns:
Type Description
str
The path that the object was moved to. Excludes the bucket name.
Examples:
Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.move_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n
Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in another bucket.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.move_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def move_object(\n self,\n from_path: Union[str, Path],\n to_path: Union[str, Path],\n to_bucket: Optional[Union[\"S3Bucket\", str]] = None,\n) -> str:\n \"\"\"Uses S3's internal CopyObject and DeleteObject to move objects within or\n between buckets. To move objects between buckets, `self`'s credentials must\n have permission to read and delete the source object and write to the target\n object. If the credentials do not have those permissions, this method will\n raise an error. If the credentials have permission to read the source object\n but not delete it, the object will be copied but not deleted.\n\n Args:\n from_path: The path of the object to move.\n to_path: The path to move the object to.\n to_bucket: The bucket to move to. Defaults to the current bucket.\n\n Returns:\n The path that the object was moved to. Excludes the bucket name.\n\n Examples:\n\n Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.move_object(\"my_folder/notes.txt\", \"my_folder/notes_copy.txt\")\n ```\n\n Move notes.txt from my_folder/notes.txt to my_folder/notes_copy.txt in\n another bucket.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.move_object(\n \"my_folder/notes.txt\",\n \"my_folder/notes_copy.txt\",\n to_bucket=\"other-bucket\"\n )\n ```\n \"\"\"\n s3_client = self.credentials.get_s3_client()\n\n source_path = self._resolve_path(Path(from_path).as_posix())\n target_path = self._resolve_path(Path(to_path).as_posix())\n\n source_bucket_name = self.bucket_name\n target_bucket_name = self.bucket_name\n if isinstance(to_bucket, S3Bucket):\n target_bucket_name = to_bucket.bucket_name\n target_path = to_bucket._resolve_path(target_path)\n elif isinstance(to_bucket, str):\n target_bucket_name = to_bucket\n elif to_bucket is not None:\n raise TypeError(\n \"to_bucket must be a string or S3Bucket, not\"\n f\" {type(target_bucket_name)}\"\n )\n\n self.logger.info(\n \"Moving object from s3://%s/%s to s3://%s/%s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n # If invalid, should error and prevent next operation\n s3_client.copy(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n )\n s3_client.delete_object(Bucket=source_bucket_name, Key=source_path)\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.put_directory","title":"
put_directory
async
","text":"
Uploads a directory from a given local path to the configured S3 bucket in a given folder.
Defaults to uploading the entire contents of the current working directory to the block's basepath.
Parameters:
Name Type Description Default
local_path
Optional[str]
Path to local directory to upload from.
None
to_path
Optional[str]
Path in S3 bucket to upload to. Defaults to block's configured basepath.
None
ignore_file
Optional[str]
Path to file containing gitignore style expressions for filepaths to ignore.
None
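Example (a minimal sketch, not from the source; the block name, paths, and ignore file are placeholders): upload a local project folder, skipping files matched by a gitignore-style file.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nuploaded = s3_bucket.put_directory(\n    local_path=\"my_project\",\n    to_path=\"deployments/my_project\",\n    ignore_file=\".prefectignore\",  # hypothetical gitignore-style file\n)\nprint(f\"uploaded {uploaded} files\")\n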
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def put_directory(\n self,\n local_path: Optional[str] = None,\n to_path: Optional[str] = None,\n ignore_file: Optional[str] = None,\n) -> int:\n \"\"\"\n Uploads a directory from a given local path to the configured S3 bucket in a\n given folder.\n\n Defaults to uploading the entire contents the current working directory to the\n block's basepath.\n\n Args:\n local_path: Path to local directory to upload from.\n to_path: Path in S3 bucket to upload to. Defaults to block's configured\n basepath.\n ignore_file: Path to file containing gitignore style expressions for\n filepaths to ignore.\n\n \"\"\"\n to_path = \"\" if to_path is None else to_path\n\n if local_path is None:\n local_path = \".\"\n\n included_files = None\n if ignore_file:\n with open(ignore_file, \"r\") as f:\n ignore_patterns = f.readlines()\n\n included_files = filter_files(local_path, ignore_patterns)\n\n uploaded_file_count = 0\n for local_file_path in Path(local_path).expanduser().rglob(\"*\"):\n if (\n included_files is not None\n and str(local_file_path.relative_to(local_path)) not in included_files\n ):\n continue\n elif not local_file_path.is_dir():\n remote_file_path = Path(to_path) / local_file_path.relative_to(\n local_path\n )\n with open(local_file_path, \"rb\") as local_file:\n local_file_content = local_file.read()\n\n await self.write_path(\n remote_file_path.as_posix(), content=local_file_content\n )\n uploaded_file_count += 1\n\n return uploaded_file_count\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.read_path","title":"
read_path
async
","text":"
Read specified path from S3 and return contents. Provide the entire path to the key in S3.
Parameters:
Name Type Description Default
path
str
Entire path to (and including) the key.
required
Examples:
Read \"subfolder/file1\" contents from an S3 bucket named \"bucket\":
from prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import S3Bucket\n\naws_creds = AwsCredentials(\n aws_access_key_id=AWS_ACCESS_KEY_ID,\n aws_secret_access_key=AWS_SECRET_ACCESS_KEY\n)\n\ns3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n aws_credentials=aws_creds,\n basepath=\"subfolder\"\n)\n\nkey_contents = s3_bucket_block.read_path(path=\"subfolder/file1\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def read_path(self, path: str) -> bytes:\n \"\"\"\n Read specified path from S3 and return contents. Provide the entire\n path to the key in S3.\n\n Args:\n path: Entire path to (and including) the key.\n\n Example:\n Read \"subfolder/file1\" contents from an S3 bucket named \"bucket\":\n ```python\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import S3Bucket\n\n aws_creds = AwsCredentials(\n aws_access_key_id=AWS_ACCESS_KEY_ID,\n aws_secret_access_key=AWS_SECRET_ACCESS_KEY\n )\n\n s3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n aws_credentials=aws_creds,\n basepath=\"subfolder\"\n )\n\n key_contents = s3_bucket_block.read_path(path=\"subfolder/file1\")\n ```\n \"\"\"\n path = self._resolve_path(path)\n\n return await run_sync_in_worker_thread(self._read_sync, path)\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.stream_from","title":"
stream_from
async
","text":"
Streams an object from another bucket to this bucket. Requires the object to be downloaded and uploaded in chunks. If self
's credentials allow for writes to the other bucket, try using S3Bucket.copy_object
.
Parameters:
Name Type Description Default
bucket
S3Bucket
The bucket to stream from.
required
from_path
str
The path of the object to stream.
required
to_path
Optional[str]
The path to stream the object to. Defaults to the object's name.
None
**upload_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.upload_fileobj
.
{}
Returns:
Type Description
str
The path that the object was uploaded to.
Examples:
Stream notes.txt from your-bucket/notes.txt to my-bucket/landed/notes.txt.
from prefect_aws.s3 import S3Bucket\n\nyour_s3_bucket = S3Bucket.load(\"your-bucket\")\nmy_s3_bucket = S3Bucket.load(\"my-bucket\")\n\nmy_s3_bucket.stream_from(\n your_s3_bucket,\n \"notes.txt\",\n to_path=\"landed/notes.txt\"\n)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def stream_from(\n self,\n bucket: \"S3Bucket\",\n from_path: str,\n to_path: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n) -> str:\n \"\"\"Streams an object from another bucket to this bucket. Requires the\n object to be downloaded and uploaded in chunks. If `self`'s credentials\n allow for writes to the other bucket, try using `S3Bucket.copy_object`.\n\n Args:\n bucket: The bucket to stream from.\n from_path: The path of the object to stream.\n to_path: The path to stream the object to. Defaults to the object's name.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Stream notes.txt from your-bucket/notes.txt to my-bucket/landed/notes.txt.\n\n ```python\n from prefect_aws.s3 import S3Bucket\n\n your_s3_bucket = S3Bucket.load(\"your-bucket\")\n my_s3_bucket = S3Bucket.load(\"my-bucket\")\n\n my_s3_bucket.stream_from(\n your_s3_bucket,\n \"notes.txt\",\n to_path=\"landed/notes.txt\"\n )\n ```\n\n \"\"\"\n if to_path is None:\n to_path = Path(from_path).name\n\n # Get the source object's StreamingBody\n from_path: str = bucket._join_bucket_folder(from_path)\n from_client = bucket.credentials.get_s3_client()\n obj = await run_sync_in_worker_thread(\n from_client.get_object, Bucket=bucket.bucket_name, Key=from_path\n )\n body: StreamingBody = obj[\"Body\"]\n\n # Upload the StreamingBody to this bucket\n bucket_path = str(self._join_bucket_folder(to_path))\n to_client = self.credentials.get_s3_client()\n await run_sync_in_worker_thread(\n to_client.upload_fileobj,\n Fileobj=body,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n f\"Streamed s3://{bucket.bucket_name}/{from_path} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.upload_from_file_object","title":"
upload_from_file_object
async
","text":"
Uploads an object to the S3 bucket from a file-like object, which can be a BytesIO object or a BufferedReader.
Parameters:
Name Type Description Default
from_file_object
BinaryIO
The file-like object to upload from.
required
to_path
str
The path to upload the object to.
required
**upload_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.upload_fileobj
.
{}
Returns:
Type Description
str
The path that the object was uploaded to.
Examples:
Upload BytesIO object to my_folder/notes.txt.
from io import BytesIO\n\nfrom prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith BytesIO(b\"hello\") as buf:\n    s3_bucket.upload_from_file_object(buf, \"my_folder/notes.txt\")\n
Upload BufferedReader object to my_folder/notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\nwith open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(\n f, \"my_folder/notes.txt\"\n )\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def upload_from_file_object(\n self, from_file_object: BinaryIO, to_path: str, **upload_kwargs: Dict[str, Any]\n) -> str:\n \"\"\"\n Uploads an object to the S3 bucket from a file-like object,\n which can be a BytesIO object or a BufferedReader.\n\n Args:\n from_file_object: The file-like object to upload from.\n to_path: The path to upload the object to.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Upload BytesIO object to my_folder/notes.txt.\n ```python\n from io import BytesIO\n\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(f, \"my_folder/notes.txt\")\n ```\n\n Upload BufferedReader object to my_folder/notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n with open(\"notes.txt\", \"rb\") as f:\n s3_bucket.upload_from_file_object(\n f, \"my_folder/notes.txt\"\n )\n ```\n \"\"\"\n bucket_path = str(self._join_bucket_folder(to_path))\n client = self.credentials.get_s3_client()\n await run_sync_in_worker_thread(\n client.upload_fileobj,\n Fileobj=from_file_object,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n \"Uploaded from file object to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.upload_from_folder","title":"
upload_from_folder
async
","text":"
Uploads files within a folder (excluding the folder itself) to the object storage service folder.
Parameters:
Name Type Description Default
from_folder
Union[str, pathlib.Path]
The path to the folder to upload from.
required
to_folder
Optional[str]
The path to upload the folder to.
None
**upload_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.upload_fileobj
.
{}
Returns:
Type Description
str
The path that the folder was uploaded to.
Examples:
Upload contents from my_folder to new_folder.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.upload_from_folder(\"my_folder\", \"new_folder\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def upload_from_folder(\n self,\n from_folder: Union[str, Path],\n to_folder: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n) -> str:\n \"\"\"\n Uploads files *within* a folder (excluding the folder itself)\n to the object storage service folder.\n\n Args:\n from_folder: The path to the folder to upload from.\n to_folder: The path to upload the folder to.\n **upload_kwargs: Additional keyword arguments to pass to\n `Client.upload_fileobj`.\n\n Returns:\n The path that the folder was uploaded to.\n\n Examples:\n Upload contents from my_folder to new_folder.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.upload_from_folder(\"my_folder\", \"new_folder\")\n ```\n \"\"\"\n from_folder = Path(from_folder)\n bucket_folder = self._join_bucket_folder(to_folder or \"\")\n\n num_uploaded = 0\n client = self.credentials.get_s3_client()\n\n async_coros = []\n for from_path in from_folder.rglob(\"**/*\"):\n # this skips the actual directory itself, e.g.\n # `my_folder/` will be skipped\n # `my_folder/notes.txt` will be uploaded\n if from_path.is_dir():\n continue\n bucket_path = (\n Path(bucket_folder) / from_path.relative_to(from_folder)\n ).as_posix()\n self.logger.info(\n f\"Uploading from {str(from_path)!r} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n async_coros.append(\n run_sync_in_worker_thread(\n client.upload_file,\n Filename=str(from_path),\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n )\n num_uploaded += 1\n await asyncio.gather(*async_coros)\n\n if num_uploaded == 0:\n self.logger.warning(f\"No files were uploaded from {str(from_folder)!r}.\")\n else:\n self.logger.info(\n f\"Uploaded {num_uploaded} files from {str(from_folder)!r} to \"\n f\"the bucket {self.bucket_name!r} path {bucket_path!r}\"\n )\n\n return to_folder\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.upload_from_path","title":"
upload_from_path
async
","text":"
Uploads an object from a path to the S3 bucket.
Parameters:
Name Type Description Default
from_path
Union[str, pathlib.Path]
The path to the file to upload from.
required
to_path
Optional[str]
The path to upload the file to.
None
**upload_kwargs
Dict[str, Any]
Additional keyword arguments to pass to Client.upload
.
{}
Returns:
Type Description
str
The path that the object was uploaded to.
Examples:
Upload notes.txt to my_folder/notes.txt.
from prefect_aws.s3 import S3Bucket\n\ns3_bucket = S3Bucket.load(\"my-bucket\")\ns3_bucket.upload_from_path(\"notes.txt\", \"my_folder/notes.txt\")\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def upload_from_path(\n self,\n from_path: Union[str, Path],\n to_path: Optional[str] = None,\n **upload_kwargs: Dict[str, Any],\n) -> str:\n \"\"\"\n Uploads an object from a path to the S3 bucket.\n\n Args:\n from_path: The path to the file to upload from.\n to_path: The path to upload the file to.\n **upload_kwargs: Additional keyword arguments to pass to `Client.upload`.\n\n Returns:\n The path that the object was uploaded to.\n\n Examples:\n Upload notes.txt to my_folder/notes.txt.\n ```python\n from prefect_aws.s3 import S3Bucket\n\n s3_bucket = S3Bucket.load(\"my-bucket\")\n s3_bucket.upload_from_path(\"notes.txt\", \"my_folder/notes.txt\")\n ```\n \"\"\"\n from_path = str(Path(from_path).absolute())\n if to_path is None:\n to_path = Path(from_path).name\n\n bucket_path = str(self._join_bucket_folder(to_path))\n client = self.credentials.get_s3_client()\n\n await run_sync_in_worker_thread(\n client.upload_file,\n Filename=from_path,\n Bucket=self.bucket_name,\n Key=bucket_path,\n **upload_kwargs,\n )\n self.logger.info(\n f\"Uploaded from {from_path!r} to the bucket \"\n f\"{self.bucket_name!r} path {bucket_path!r}.\"\n )\n return bucket_path\n
"},{"location":"s3/#prefect_aws.s3.S3Bucket.write_path","title":"
write_path
async
","text":"
Writes to an S3 bucket.
Parameters:
Name Type Description Default
path
str
The key name. Each object in your bucket has a unique key (or key name).
required
content
bytes
What you are uploading to S3.
required
Examples:
Write data to the path dogs/small_dogs/havanese
in an S3 Bucket:
from prefect_aws import MinIOCredentials\nfrom prefect_aws.s3 import S3Bucket\n\nminio_creds = MinIOCredentials(\n    minio_root_user = \"minioadmin\",\n    minio_root_password = \"minioadmin\",\n)\n\ns3_bucket_block = S3Bucket(\n    bucket_name=\"bucket\",\n    minio_credentials=minio_creds,\n    basepath=\"dogs/smalldogs\",\n    endpoint_url=\"http://localhost:9000\",\n)\ndata = b\"havanese data\"  # the bytes to upload\ns3_havanese_path = s3_bucket_block.write_path(path=\"havanese\", content=data)\n
Source code in
prefect_aws/s3.py
@sync_compatible\nasync def write_path(self, path: str, content: bytes) -> str:\n \"\"\"\n Writes to an S3 bucket.\n\n Args:\n\n path: The key name. Each object in your bucket has a unique\n key (or key name).\n content: What you are uploading to S3.\n\n Example:\n\n Write data to the path `dogs/small_dogs/havanese` in an S3 Bucket:\n ```python\n from prefect_aws import MinioCredentials\n from prefect_aws.s3 import S3Bucket\n\n minio_creds = MinIOCredentials(\n minio_root_user = \"minioadmin\",\n minio_root_password = \"minioadmin\",\n )\n\n s3_bucket_block = S3Bucket(\n bucket_name=\"bucket\",\n minio_credentials=minio_creds,\n basepath=\"dogs/smalldogs\",\n endpoint_url=\"http://localhost:9000\",\n )\n s3_havanese_path = s3_bucket_block.write_path(path=\"havanese\", content=data)\n ```\n \"\"\"\n\n path = self._resolve_path(path)\n\n await run_sync_in_worker_thread(self._write_sync, path, content)\n\n return path\n
"},{"location":"s3/#prefect_aws.s3-functions","title":"Functions","text":""},{"location":"s3/#prefect_aws.s3.s3_copy","title":"
s3_copy
async
","text":"
Uses S3's internal CopyObject to copy objects within or between buckets. To copy objects between buckets, the credentials must have permission to read the source object and write to the target object. If the credentials do not have those permissions, try using S3Bucket.stream_from
.
Parameters:
Name Type Description Default
source_path
str
The path to the object to copy. Can be a string or Path
.
required
target_path
str
The path to copy the object to. Can be a string or Path
.
required
source_bucket_name
str
The bucket to copy the object from.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
target_bucket_name
Optional[str]
The bucket to copy the object to. If not provided, defaults to source_bucket_name
.
None
**copy_kwargs
Additional keyword arguments to pass to S3Client.copy_object
.
{}
Returns:
Type Description
str
The path that the object was copied to. Excludes the bucket name.
Examples:
Copy notes.txt from s3://my-bucket/my_folder/notes.txt to s3://my-bucket/my_folder/notes_copy.txt.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_copy\n\naws_credentials = AwsCredentials.load(\"my-creds\")\n\n@flow\nasync def example_copy_flow():\n await s3_copy(\n source_path=\"my_folder/notes.txt\",\n target_path=\"my_folder/notes_copy.txt\",\n source_bucket_name=\"my-bucket\",\n aws_credentials=aws_credentials,\n )\n\nexample_copy_flow()\n
Copy notes.txt from s3://my-bucket/my_folder/notes.txt to s3://other-bucket/notes_copy.txt.
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_copy\n\naws_credentials = AwsCredentials.load(\"shared-creds\")\n\n@flow\nasync def example_copy_flow():\n await s3_copy(\n source_path=\"my_folder/notes.txt\",\n target_path=\"notes_copy.txt\",\n source_bucket_name=\"my-bucket\",\n aws_credentials=aws_credentials,\n target_bucket_name=\"other-bucket\",\n )\n\nexample_copy_flow()\n
Source code in
prefect_aws/s3.py
@task\nasync def s3_copy(\n source_path: str,\n target_path: str,\n source_bucket_name: str,\n aws_credentials: AwsCredentials,\n target_bucket_name: Optional[str] = None,\n **copy_kwargs,\n) -> str:\n \"\"\"Uses S3's internal\n [CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html)\n to copy objects within or between buckets. To copy objects between buckets, the\n credentials must have permission to read the source object and write to the target\n object. If the credentials do not have those permissions, try using\n `S3Bucket.stream_from`.\n\n Args:\n source_path: The path to the object to copy. Can be a string or `Path`.\n target_path: The path to copy the object to. Can be a string or `Path`.\n source_bucket_name: The bucket to copy the object from.\n aws_credentials: Credentials to use for authentication with AWS.\n target_bucket_name: The bucket to copy the object to. If not provided, defaults\n to `source_bucket`.\n **copy_kwargs: Additional keyword arguments to pass to `S3Client.copy_object`.\n\n Returns:\n The path that the object was copied to. Excludes the bucket name.\n\n Examples:\n\n Copy notes.txt from s3://my-bucket/my_folder/notes.txt to\n s3://my-bucket/my_folder/notes_copy.txt.\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_copy\n\n aws_credentials = AwsCredentials.load(\"my-creds\")\n\n @flow\n async def example_copy_flow():\n await s3_copy(\n source_path=\"my_folder/notes.txt\",\n target_path=\"my_folder/notes_copy.txt\",\n source_bucket_name=\"my-bucket\",\n aws_credentials=aws_credentials,\n )\n\n example_copy_flow()\n ```\n\n Copy notes.txt from s3://my-bucket/my_folder/notes.txt to\n s3://other-bucket/notes_copy.txt.\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_copy\n\n aws_credentials = AwsCredentials.load(\"shared-creds\")\n\n @flow\n async def example_copy_flow():\n await s3_copy(\n source_path=\"my_folder/notes.txt\",\n target_path=\"notes_copy.txt\",\n source_bucket_name=\"my-bucket\",\n aws_credentials=aws_credentials,\n target_bucket_name=\"other-bucket\",\n )\n\n example_copy_flow()\n ```\n\n \"\"\"\n logger = get_run_logger()\n\n s3_client = aws_credentials.get_s3_client()\n\n target_bucket_name = target_bucket_name or source_bucket_name\n\n logger.info(\n \"Copying object from bucket %s with key %s to bucket %s with key %s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n s3_client.copy_object(\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Bucket=target_bucket_name,\n Key=target_path,\n **copy_kwargs,\n )\n\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.s3_download","title":"
s3_download
async
","text":"
Downloads an object with a given key from a given S3 bucket.
Parameters:
Name Type Description Default
bucket
str
Name of bucket to download object from. Required if a default value was not supplied when creating the task.
required
key
str
Key of object to download. Required if a default value was not supplied when creating the task.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
aws_client_parameters
AwsClientParameters
Custom parameter for the boto3 client initialization.
AwsClientParameters(api_version=None, use_ssl=True, verify=True, verify_cert_path=None, endpoint_url=None, config=None)
Returns:
Type Description
bytes
A bytes
representation of the downloaded object.
Examples:
Download a file from an S3 bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_download\n\n\n@flow\nasync def example_s3_download_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n data = await s3_download(\n bucket=\"bucket\",\n key=\"key\",\n aws_credentials=aws_credentials,\n )\n\nexample_s3_download_flow()\n
Source code in
prefect_aws/s3.py
@task\nasync def s3_download(\n bucket: str,\n key: str,\n aws_credentials: AwsCredentials,\n aws_client_parameters: AwsClientParameters = AwsClientParameters(),\n) -> bytes:\n \"\"\"\n Downloads an object with a given key from a given S3 bucket.\n\n Args:\n bucket: Name of bucket to download object from. Required if a default value was\n not supplied when creating the task.\n key: Key of object to download. Required if a default value was not supplied\n when creating the task.\n aws_credentials: Credentials to use for authentication with AWS.\n aws_client_parameters: Custom parameter for the boto3 client initialization.\n\n\n Returns:\n A `bytes` representation of the downloaded object.\n\n Example:\n Download a file from an S3 bucket:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_download\n\n\n @flow\n async def example_s3_download_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n data = await s3_download(\n bucket=\"bucket\",\n key=\"key\",\n aws_credentials=aws_credentials,\n )\n\n example_s3_download_flow()\n ```\n \"\"\"\n logger = get_run_logger()\n logger.info(\"Downloading object from bucket %s with key %s\", bucket, key)\n\n s3_client = aws_credentials.get_boto3_session().client(\n \"s3\", **aws_client_parameters.get_params_override()\n )\n stream = io.BytesIO()\n await run_sync_in_worker_thread(\n s3_client.download_fileobj, Bucket=bucket, Key=key, Fileobj=stream\n )\n stream.seek(0)\n output = stream.read()\n\n return output\n
"},{"location":"s3/#prefect_aws.s3.s3_list_objects","title":"
s3_list_objects
async
","text":"
Lists details of objects in a given S3 bucket.
Parameters:
Name Type Description Default
bucket
str
Name of bucket to list items from. Required if a default value was not supplied when creating the task.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
aws_client_parameters
AwsClientParameters
Custom parameter for the boto3 client initialization.
AwsClientParameters(api_version=None, use_ssl=True, verify=True, verify_cert_path=None, endpoint_url=None, config=None)
prefix
str
Used to filter objects with keys starting with the specified prefix.
''
delimiter
str
Character used to group keys of listed objects.
''
page_size
Optional[int]
Number of objects to return in each request to the AWS API.
None
max_items
Optional[int]
Maximum number of objects to be returned by the task.
None
jmespath_query
Optional[str]
Query used to filter objects based on object attributes; refer to the boto3 docs for more information on how to construct queries.
None
Returns:
Type Description
List[Dict[str, Any]]
A list of dictionaries containing information about the objects retrieved. Refer to the boto3 docs for an example response.
Examples:
List all objects in a bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_list_objects\n\n\n@flow\nasync def example_s3_list_objects_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n objects = await s3_list_objects(\n bucket=\"data_bucket\",\n aws_credentials=aws_credentials\n )\n\nexample_s3_list_objects_flow()\n
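List only objects under a prefix, filtered with a JMESPath query (an illustrative sketch; the bucket, prefix, and saved credentials block "my-creds" are placeholders):
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_list_objects\n\n@flow\nasync def example_filtered_list_flow():\n    aws_credentials = AwsCredentials.load(\"my-creds\")\n    # Keep only objects under the prefix whose Size exceeds 100 bytes\n    return await s3_list_objects(\n        bucket=\"data_bucket\",\n        prefix=\"exports/\",\n        jmespath_query=\"Contents[?Size > `100`]\",\n        aws_credentials=aws_credentials,\n    )\n\nexample_filtered_list_flow()\n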
Source code in
prefect_aws/s3.py
@task\nasync def s3_list_objects(\n bucket: str,\n aws_credentials: AwsCredentials,\n aws_client_parameters: AwsClientParameters = AwsClientParameters(),\n prefix: str = \"\",\n delimiter: str = \"\",\n page_size: Optional[int] = None,\n max_items: Optional[int] = None,\n jmespath_query: Optional[str] = None,\n) -> List[Dict[str, Any]]:\n \"\"\"\n Lists details of objects in a given S3 bucket.\n\n Args:\n bucket: Name of bucket to list items from. Required if a default value was not\n supplied when creating the task.\n aws_credentials: Credentials to use for authentication with AWS.\n aws_client_parameters: Custom parameter for the boto3 client initialization..\n prefix: Used to filter objects with keys starting with the specified prefix.\n delimiter: Character used to group keys of listed objects.\n page_size: Number of objects to return in each request to the AWS API.\n max_items: Maximum number of objects that to be returned by task.\n jmespath_query: Query used to filter objects based on object attributes refer to\n the [boto3 docs](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html#filtering-results-with-jmespath)\n for more information on how to construct queries.\n\n Returns:\n A list of dictionaries containing information about the objects retrieved. Refer\n to the boto3 docs for an example response.\n\n Example:\n List all objects in a bucket:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_list_objects\n\n\n @flow\n async def example_s3_list_objects_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n objects = await s3_list_objects(\n bucket=\"data_bucket\",\n aws_credentials=aws_credentials\n )\n\n example_s3_list_objects_flow()\n ```\n \"\"\" # noqa E501\n logger = get_run_logger()\n logger.info(\"Listing objects in bucket %s with prefix %s\", bucket, prefix)\n\n s3_client = aws_credentials.get_boto3_session().client(\n \"s3\", **aws_client_parameters.get_params_override()\n )\n paginator = s3_client.get_paginator(\"list_objects_v2\")\n page_iterator = paginator.paginate(\n Bucket=bucket,\n Prefix=prefix,\n Delimiter=delimiter,\n PaginationConfig={\"PageSize\": page_size, \"MaxItems\": max_items},\n )\n if jmespath_query:\n page_iterator = page_iterator.search(f\"{jmespath_query} | {{Contents: @}}\")\n\n return await run_sync_in_worker_thread(_list_objects_sync, page_iterator)\n
"},{"location":"s3/#prefect_aws.s3.s3_move","title":"
s3_move
async
","text":"
Move an object from one S3 location to another. To move objects between buckets, the credentials must have permission to read and delete the source object and write to the target object. If the credentials do not have those permissions, this method will raise an error. If the credentials have permission to read the source object but not delete it, the object will be copied but not deleted.
Parameters:
Name Type Description Default
source_path
str
The path of the object to move
required
target_path
str
The path to move the object to
required
source_bucket_name
str
The name of the bucket containing the source object
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
target_bucket_name
Optional[str]
The bucket to copy the object to. If not provided, defaults to source_bucket
.
None
Returns:
Type Description
str
The path that the object was moved to. Excludes the bucket name.
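Examples:
Move notes.txt from s3://my-bucket/my_folder/notes.txt to s3://archive-bucket/notes.txt (an illustrative sketch; the bucket names and the saved credentials block "my-creds" are placeholders):
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_move\n\naws_credentials = AwsCredentials.load(\"my-creds\")\n\n@flow\nasync def example_move_flow():\n    # Copies the object to the target bucket, then deletes the source object\n    await s3_move(\n        source_path=\"my_folder/notes.txt\",\n        target_path=\"notes.txt\",\n        source_bucket_name=\"my-bucket\",\n        target_bucket_name=\"archive-bucket\",\n        aws_credentials=aws_credentials,\n    )\n\nexample_move_flow()\n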
Source code in
prefect_aws/s3.py
@task\nasync def s3_move(\n source_path: str,\n target_path: str,\n source_bucket_name: str,\n aws_credentials: AwsCredentials,\n target_bucket_name: Optional[str] = None,\n) -> str:\n \"\"\"\n Move an object from one S3 location to another. To move objects between buckets,\n the credentials must have permission to read and delete the source object and write\n to the target object. If the credentials do not have those permissions, this method\n will raise an error. If the credentials have permission to read the source object\n but not delete it, the object will be copied but not deleted.\n\n Args:\n source_path: The path of the object to move\n target_path: The path to move the object to\n source_bucket_name: The name of the bucket containing the source object\n aws_credentials: Credentials to use for authentication with AWS.\n target_bucket_name: The bucket to copy the object to. If not provided, defaults\n to `source_bucket`.\n\n Returns:\n The path that the object was moved to. Excludes the bucket name.\n \"\"\"\n logger = get_run_logger()\n\n s3_client = aws_credentials.get_s3_client()\n\n # If target bucket is not provided, assume it's the same as the source bucket\n target_bucket_name = target_bucket_name or source_bucket_name\n\n logger.info(\n \"Moving object from s3://%s/%s s3://%s/%s\",\n source_bucket_name,\n source_path,\n target_bucket_name,\n target_path,\n )\n\n # Copy the object to the new location\n s3_client.copy_object(\n Bucket=target_bucket_name,\n CopySource={\"Bucket\": source_bucket_name, \"Key\": source_path},\n Key=target_path,\n )\n\n # Delete the original object\n s3_client.delete_object(Bucket=source_bucket_name, Key=source_path)\n\n return target_path\n
"},{"location":"s3/#prefect_aws.s3.s3_upload","title":"
s3_upload
async
","text":"
Uploads data to an S3 bucket.
Parameters:
Name Type Description Default
data
bytes
Bytes representation of data to upload to S3.
required
bucket
str
Name of bucket to upload data to. Required if a default value was not supplied when creating the task.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
aws_client_parameters
AwsClientParameters
Custom parameter for the boto3 client initialization.
AwsClientParameters(api_version=None, use_ssl=True, verify=True, verify_cert_path=None, endpoint_url=None, config=None)
key
Optional[str]
Key to upload the object to. Defaults to a UUID string.
None
Returns:
Type Description
str
The key of the uploaded object
Examples:
Read and upload a file to an S3 bucket:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.s3 import s3_upload\n\n\n@flow\nasync def example_s3_upload_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n with open(\"data.csv\", \"rb\") as file:\n key = await s3_upload(\n bucket=\"bucket\",\n key=\"data.csv\",\n data=file.read(),\n aws_credentials=aws_credentials,\n )\n\nexample_s3_upload_flow()\n
Source code in
prefect_aws/s3.py
@task\nasync def s3_upload(\n data: bytes,\n bucket: str,\n aws_credentials: AwsCredentials,\n aws_client_parameters: AwsClientParameters = AwsClientParameters(),\n key: Optional[str] = None,\n) -> str:\n \"\"\"\n Uploads data to an S3 bucket.\n\n Args:\n data: Bytes representation of data to upload to S3.\n bucket: Name of bucket to upload data to. Required if a default value was not\n supplied when creating the task.\n aws_credentials: Credentials to use for authentication with AWS.\n aws_client_parameters: Custom parameter for the boto3 client initialization..\n key: Key of object to download. Defaults to a UUID string.\n\n Returns:\n The key of the uploaded object\n\n Example:\n Read and upload a file to an S3 bucket:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.s3 import s3_upload\n\n\n @flow\n async def example_s3_upload_flow():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"acccess_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n with open(\"data.csv\", \"rb\") as file:\n key = await s3_upload(\n bucket=\"bucket\",\n key=\"data.csv\",\n data=file.read(),\n aws_credentials=aws_credentials,\n )\n\n example_s3_upload_flow()\n ```\n \"\"\"\n logger = get_run_logger()\n\n key = key or str(uuid.uuid4())\n\n logger.info(\"Uploading object to bucket %s with key %s\", bucket, key)\n\n s3_client = aws_credentials.get_boto3_session().client(\n \"s3\", **aws_client_parameters.get_params_override()\n )\n stream = io.BytesIO(data)\n await run_sync_in_worker_thread(\n s3_client.upload_fileobj, stream, Bucket=bucket, Key=key\n )\n\n return key\n
"},{"location":"secrets_manager/","title":"Secrets Manager","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager","title":"
prefect_aws.secrets_manager
","text":"
Tasks for interacting with AWS Secrets Manager
"},{"location":"secrets_manager/#prefect_aws.secrets_manager-classes","title":"Classes","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret","title":"
AwsSecret (SecretBlock)
pydantic-model
","text":"
Manages a secret in AWS's Secrets Manager.
Attributes:
Name Type Description
aws_credentials
AwsCredentials
The credentials to use for authentication with AWS.
secret_name
str
The name of the secret.
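A minimal usage sketch (the saved credentials block "my-creds", the block name "my-aws-secret", and the secret name "my-secret" are illustrative placeholders) that saves an AwsSecret block and reads its value:
from prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import AwsSecret\n\n# Point the block at an existing secret in AWS Secrets Manager\naws_secret = AwsSecret(\n    aws_credentials=AwsCredentials.load(\"my-creds\"),\n    secret_name=\"my-secret\",\n)\naws_secret.save(\"my-aws-secret\", overwrite=True)\n\n# Later, load the block and read the stored secret value\nsecret_value = AwsSecret.load(\"my-aws-secret\").read_secret()\n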
Source code in
prefect_aws/secrets_manager.py
class AwsSecret(SecretBlock):\n \"\"\"\n Manages a secret in AWS's Secrets Manager.\n\n Attributes:\n aws_credentials: The credentials to use for authentication with AWS.\n secret_name: The name of the secret.\n \"\"\"\n\n _logo_url = \"https://cdn.sanity.io/images/3ugk85nk/production/d74b16fe84ce626345adf235a47008fea2869a60-225x225.png\" # noqa\n _block_type_name = \"AWS Secret\"\n _documentation_url = \"https://prefecthq.github.io/prefect-aws/secrets_manager/#prefect_aws.secrets_manager.AwsSecret\" # noqa\n\n aws_credentials: AwsCredentials\n secret_name: str = Field(default=..., description=\"The name of the secret.\")\n\n @sync_compatible\n async def read_secret(\n self,\n version_id: str = None,\n version_stage: str = None,\n **read_kwargs: Dict[str, Any],\n ) -> bytes:\n \"\"\"\n Reads the secret from the secret storage service.\n\n Args:\n version_id: The version of the secret to read. If not provided, the latest\n version will be read.\n version_stage: The version stage of the secret to read. If not provided,\n the latest version will be read.\n read_kwargs: Additional keyword arguments to pass to the\n `get_secret_value` method of the boto3 client.\n\n Returns:\n The secret data.\n\n Examples:\n Reads a secret.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.read_secret()\n ```\n \"\"\"\n client = self.aws_credentials.get_secrets_manager_client()\n if version_id is not None:\n read_kwargs[\"VersionId\"] = version_id\n if version_stage is not None:\n read_kwargs[\"VersionStage\"] = version_stage\n response = await run_sync_in_worker_thread(\n client.get_secret_value, SecretId=self.secret_name, **read_kwargs\n )\n if \"SecretBinary\" in response:\n secret = response[\"SecretBinary\"]\n elif \"SecretString\" in response:\n secret = response[\"SecretString\"]\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret {arn!r} data was successfully read.\")\n return secret\n\n @sync_compatible\n async def write_secret(\n self, secret_data: bytes, **put_or_create_secret_kwargs: Dict[str, Any]\n ) -> str:\n \"\"\"\n Writes the secret to the secret storage service as a SecretBinary;\n if it doesn't exist, it will be created.\n\n Args:\n secret_data: The secret data to write.\n **put_or_create_secret_kwargs: Additional keyword arguments to pass to\n put_secret_value or create_secret method of the boto3 client.\n\n Returns:\n The path that the secret was written to.\n\n Examples:\n Write some secret data.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.write_secret(b\"my_secret_data\")\n ```\n \"\"\"\n client = self.aws_credentials.get_secrets_manager_client()\n try:\n response = await run_sync_in_worker_thread(\n client.put_secret_value,\n SecretId=self.secret_name,\n SecretBinary=secret_data,\n **put_or_create_secret_kwargs,\n )\n except client.exceptions.ResourceNotFoundException:\n self.logger.info(\n f\"The secret {self.secret_name!r} does not exist yet, creating it now.\"\n )\n response = await run_sync_in_worker_thread(\n client.create_secret,\n Name=self.secret_name,\n SecretBinary=secret_data,\n **put_or_create_secret_kwargs,\n )\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret data was written successfully to {arn!r}.\")\n return arn\n\n @sync_compatible\n async def delete_secret(\n self,\n recovery_window_in_days: int = 30,\n force_delete_without_recovery: bool = False,\n **delete_kwargs: Dict[str, Any],\n ) -> str:\n \"\"\"\n Deletes the secret from the secret storage service.\n\n Args:\n 
recovery_window_in_days: The number of days to wait before permanently\n deleting the secret. Must be between 7 and 30 days.\n force_delete_without_recovery: If True, the secret will be deleted\n immediately without a recovery window.\n **delete_kwargs: Additional keyword arguments to pass to the\n delete_secret method of the boto3 client.\n\n Returns:\n The path that the secret was deleted from.\n\n Examples:\n Deletes the secret with a recovery window of 15 days.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.delete_secret(recovery_window_in_days=15)\n ```\n \"\"\"\n if force_delete_without_recovery and recovery_window_in_days:\n raise ValueError(\n \"Cannot specify recovery window and force delete without recovery.\"\n )\n elif not (7 <= recovery_window_in_days <= 30):\n raise ValueError(\n \"Recovery window must be between 7 and 30 days, got \"\n f\"{recovery_window_in_days}.\"\n )\n\n client = self.aws_credentials.get_secrets_manager_client()\n response = await run_sync_in_worker_thread(\n client.delete_secret,\n SecretId=self.secret_name,\n RecoveryWindowInDays=recovery_window_in_days,\n ForceDeleteWithoutRecovery=force_delete_without_recovery,\n **delete_kwargs,\n )\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret {arn} was deleted successfully.\")\n return arn\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret-attributes","title":"Attributes","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.secret_name","title":"
secret_name: str
pydantic-field
required
","text":"
The name of the secret.
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret-methods","title":"Methods","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.__json_encoder__","title":"
__json_encoder__
special
staticmethod
","text":"
partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.delete_secret","title":"
delete_secret
async
","text":"
Deletes the secret from the secret storage service.
Parameters:
Name Type Description Default
recovery_window_in_days
int
The number of days to wait before permanently deleting the secret. Must be between 7 and 30 days.
30
force_delete_without_recovery
bool
If True, the secret will be deleted immediately without a recovery window.
False
**delete_kwargs
Dict[str, Any]
Additional keyword arguments to pass to the delete_secret method of the boto3 client.
{}
Returns:
Type Description
str
The path that the secret was deleted from.
Examples:
Deletes the secret with a recovery window of 15 days.
aws_secret = AwsSecret.load(\"MY_BLOCK\")\naws_secret.delete_secret(recovery_window_in_days=15)\n
Source code in
prefect_aws/secrets_manager.py
@sync_compatible\nasync def delete_secret(\n self,\n recovery_window_in_days: int = 30,\n force_delete_without_recovery: bool = False,\n **delete_kwargs: Dict[str, Any],\n) -> str:\n \"\"\"\n Deletes the secret from the secret storage service.\n\n Args:\n recovery_window_in_days: The number of days to wait before permanently\n deleting the secret. Must be between 7 and 30 days.\n force_delete_without_recovery: If True, the secret will be deleted\n immediately without a recovery window.\n **delete_kwargs: Additional keyword arguments to pass to the\n delete_secret method of the boto3 client.\n\n Returns:\n The path that the secret was deleted from.\n\n Examples:\n Deletes the secret with a recovery window of 15 days.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.delete_secret(recovery_window_in_days=15)\n ```\n \"\"\"\n if force_delete_without_recovery and recovery_window_in_days:\n raise ValueError(\n \"Cannot specify recovery window and force delete without recovery.\"\n )\n elif not (7 <= recovery_window_in_days <= 30):\n raise ValueError(\n \"Recovery window must be between 7 and 30 days, got \"\n f\"{recovery_window_in_days}.\"\n )\n\n client = self.aws_credentials.get_secrets_manager_client()\n response = await run_sync_in_worker_thread(\n client.delete_secret,\n SecretId=self.secret_name,\n RecoveryWindowInDays=recovery_window_in_days,\n ForceDeleteWithoutRecovery=force_delete_without_recovery,\n **delete_kwargs,\n )\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret {arn} was deleted successfully.\")\n return arn\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.read_secret","title":"
read_secret
async
","text":"
Reads the secret from the secret storage service.
Parameters:
Name Type Description Default
version_id
str
The version of the secret to read. If not provided, the latest version will be read.
None
version_stage
str
The version stage of the secret to read. If not provided, the latest version will be read.
None
read_kwargs
Dict[str, Any]
Additional keyword arguments to pass to the get_secret_value
method of the boto3 client.
{}
Returns:
Type Description
bytes
The secret data.
Examples:
Reads a secret.
aws_secret = AwsSecret.load(\"MY_BLOCK\")\naws_secret.read_secret()\n
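Read an earlier version by passing a staging label (an illustrative sketch assuming the block was saved as "my-aws-secret"; AWSPREVIOUS is the standard AWS label for the prior version):
from prefect_aws.secrets_manager import AwsSecret\n\naws_secret = AwsSecret.load(\"my-aws-secret\")\n# version_stage is forwarded to get_secret_value as VersionStage\nprevious_value = aws_secret.read_secret(version_stage=\"AWSPREVIOUS\")\n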
Source code in
prefect_aws/secrets_manager.py
@sync_compatible\nasync def read_secret(\n self,\n version_id: str = None,\n version_stage: str = None,\n **read_kwargs: Dict[str, Any],\n) -> bytes:\n \"\"\"\n Reads the secret from the secret storage service.\n\n Args:\n version_id: The version of the secret to read. If not provided, the latest\n version will be read.\n version_stage: The version stage of the secret to read. If not provided,\n the latest version will be read.\n read_kwargs: Additional keyword arguments to pass to the\n `get_secret_value` method of the boto3 client.\n\n Returns:\n The secret data.\n\n Examples:\n Reads a secret.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.read_secret()\n ```\n \"\"\"\n client = self.aws_credentials.get_secrets_manager_client()\n if version_id is not None:\n read_kwargs[\"VersionId\"] = version_id\n if version_stage is not None:\n read_kwargs[\"VersionStage\"] = version_stage\n response = await run_sync_in_worker_thread(\n client.get_secret_value, SecretId=self.secret_name, **read_kwargs\n )\n if \"SecretBinary\" in response:\n secret = response[\"SecretBinary\"]\n elif \"SecretString\" in response:\n secret = response[\"SecretString\"]\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret {arn!r} data was successfully read.\")\n return secret\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.AwsSecret.write_secret","title":"
write_secret
async
","text":"
Writes the secret to the secret storage service as a SecretBinary; if it doesn't exist, it will be created.
Parameters:
Name Type Description Default
secret_data
bytes
The secret data to write.
required
**put_or_create_secret_kwargs
Dict[str, Any]
Additional keyword arguments to pass to put_secret_value or create_secret method of the boto3 client.
{}
Returns:
Type Description
str
The path that the secret was written to.
Examples:
Write some secret data.
aws_secret = AwsSecret.load(\"MY_BLOCK\")\naws_secret.write_secret(b\"my_secret_data\")\n
Source code in
prefect_aws/secrets_manager.py
@sync_compatible\nasync def write_secret(\n self, secret_data: bytes, **put_or_create_secret_kwargs: Dict[str, Any]\n) -> str:\n \"\"\"\n Writes the secret to the secret storage service as a SecretBinary;\n if it doesn't exist, it will be created.\n\n Args:\n secret_data: The secret data to write.\n **put_or_create_secret_kwargs: Additional keyword arguments to pass to\n put_secret_value or create_secret method of the boto3 client.\n\n Returns:\n The path that the secret was written to.\n\n Examples:\n Write some secret data.\n ```python\n secrets_manager = SecretsManager.load(\"MY_BLOCK\")\n secrets_manager.write_secret(b\"my_secret_data\")\n ```\n \"\"\"\n client = self.aws_credentials.get_secrets_manager_client()\n try:\n response = await run_sync_in_worker_thread(\n client.put_secret_value,\n SecretId=self.secret_name,\n SecretBinary=secret_data,\n **put_or_create_secret_kwargs,\n )\n except client.exceptions.ResourceNotFoundException:\n self.logger.info(\n f\"The secret {self.secret_name!r} does not exist yet, creating it now.\"\n )\n response = await run_sync_in_worker_thread(\n client.create_secret,\n Name=self.secret_name,\n SecretBinary=secret_data,\n **put_or_create_secret_kwargs,\n )\n arn = response[\"ARN\"]\n self.logger.info(f\"The secret data was written successfully to {arn!r}.\")\n return arn\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager-functions","title":"Functions","text":""},{"location":"secrets_manager/#prefect_aws.secrets_manager.create_secret","title":"
create_secret
async
","text":"
Creates a secret in AWS Secrets Manager.
Parameters:
Name Type Description Default
secret_name
str
The name of the secret to create.
required
secret_value
Union[str, bytes]
The value to store in the created secret.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
description
Optional[str]
A description for the created secret.
None
tags
Optional[List[Dict[str, str]]]
A list of tags to attach to the secret. Each tag should be specified as a dictionary in the following format:
{\n \"Key\": str,\n \"Value\": str\n}\n
None
Returns:
Type Description
A dict containing the secret ARN (Amazon Resource Name), name, and current version ID: {\"ARN\": str, \"Name\": str, \"VersionId\": str}
Examples:
Create a secret:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import create_secret\n\n@flow\ndef example_create_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n create_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\nexample_create_secret()\n
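Attach a description and tags at creation time (an illustrative sketch; the secret name, description, tag values, and the saved credentials block "my-creds" are placeholders):
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import create_secret\n\n@flow\ndef example_create_tagged_secret():\n    aws_credentials = AwsCredentials.load(\"my-creds\")\n    create_secret(\n        secret_name=\"db_password\",\n        secret_value=\"42\",\n        description=\"Password for the analytics database\",\n        tags=[{\"Key\": \"team\", \"Value\": \"data\"}],\n        aws_credentials=aws_credentials,\n    )\n\nexample_create_tagged_secret()\n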
Source code in
prefect_aws/secrets_manager.py
@task\nasync def create_secret(\n secret_name: str,\n secret_value: Union[str, bytes],\n aws_credentials: AwsCredentials,\n description: Optional[str] = None,\n tags: Optional[List[Dict[str, str]]] = None,\n) -> Dict[str, str]:\n \"\"\"\n Creates a secret in AWS Secrets Manager.\n\n Args:\n secret_name: The name of the secret to create.\n secret_value: The value to store in the created secret.\n aws_credentials: Credentials to use for authentication with AWS.\n description: A description for the created secret.\n tags: A list of tags to attach to the secret. Each tag should be specified as a\n dictionary in the following format:\n ```python\n {\n \"Key\": str,\n \"Value\": str\n }\n ```\n\n Returns:\n A dict containing the secret ARN (Amazon Resource Name),\n name, and current version ID.\n ```python\n {\n \"ARN\": str,\n \"Name\": str,\n \"VersionId\": str\n }\n ```\n Example:\n Create a secret:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import create_secret\n\n @flow\n def example_create_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n create_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\n example_create_secret()\n ```\n\n\n \"\"\"\n create_secret_kwargs: Dict[str, Union[str, bytes, List[Dict[str, str]]]] = dict(\n Name=secret_name\n )\n if description is not None:\n create_secret_kwargs[\"Description\"] = description\n if tags is not None:\n create_secret_kwargs[\"Tags\"] = tags\n if isinstance(secret_value, bytes):\n create_secret_kwargs[\"SecretBinary\"] = secret_value\n elif isinstance(secret_value, str):\n create_secret_kwargs[\"SecretString\"] = secret_value\n else:\n raise ValueError(\"Please provide a bytes or str value for secret_value\")\n\n logger = get_run_logger()\n logger.info(\"Creating secret named %s\", secret_name)\n\n client = aws_credentials.get_boto3_session().client(\"secretsmanager\")\n\n try:\n response = await run_sync_in_worker_thread(\n client.create_secret, **create_secret_kwargs\n )\n print(response.pop(\"ResponseMetadata\", None))\n return response\n except ClientError:\n logger.exception(\"Unable to create secret %s\", secret_name)\n raise\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.delete_secret","title":"
delete_secret
async
","text":"
Deletes a secret from AWS Secrets Manager.
Secrets can either be deleted immediately by setting force_delete_without_recovery
equal to True
. Otherwise, secrets will be marked for deletion and available for recovery for the number of days specified in recovery_window_in_days
Parameters:
Name Type Description Default
secret_name
str
Name of the secret to be deleted.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
recovery_window_in_days
int
Number of days a secret should be recoverable for before permanent deletion. Minimum window is 7 days and maximum window is 30 days. If force_delete_without_recovery
is set to True
, this value will be ignored.
30
force_delete_without_recovery
bool
If True
, the secret will be immediately deleted and will not be recoverable.
False
Returns:
Type Description
A dict containing the secret ARN (Amazon Resource Name), name, and deletion date of the secret. DeletionDate is the date and time of the delete request plus the number of days in `recovery_window_in_days`: {\"ARN\": str, \"Name\": str, \"DeletionDate\": datetime.datetime}
Examples:
Delete a secret immediately:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import delete_secret\n\n@flow\ndef example_delete_secret_immediately():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n force_delete_without_recovery=True\n )\n\nexample_delete_secret_immediately()\n
Delete a secret with a 90 day recovery window:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import delete_secret\n\n@flow\ndef example_delete_secret_with_recovery_window():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n recovery_window_in_days=90\n )\n\nexample_delete_secret_with_recovery_window()\n
Source code in
prefect_aws/secrets_manager.py
@task\nasync def delete_secret(\n secret_name: str,\n aws_credentials: AwsCredentials,\n recovery_window_in_days: int = 30,\n force_delete_without_recovery: bool = False,\n) -> Dict[str, str]:\n \"\"\"\n Deletes a secret from AWS Secrets Manager.\n\n Secrets can either be deleted immediately by setting `force_delete_without_recovery`\n equal to `True`. Otherwise, secrets will be marked for deletion and available for\n recovery for the number of days specified in `recovery_window_in_days`\n\n Args:\n secret_name: Name of the secret to be deleted.\n aws_credentials: Credentials to use for authentication with AWS.\n recovery_window_in_days: Number of days a secret should be recoverable for\n before permanent deletion. Minium window is 7 days and maximum window\n is 30 days. If `force_delete_without_recovery` is set to `True`, this\n value will be ignored.\n force_delete_without_recovery: If `True`, the secret will be immediately\n deleted and will not be recoverable.\n\n Returns:\n A dict containing the secret ARN (Amazon Resource Name),\n name, and deletion date of the secret. DeletionDate is the date and\n time of the delete request plus the number of days in\n `recovery_window_in_days`.\n ```python\n {\n \"ARN\": str,\n \"Name\": str,\n \"DeletionDate\": datetime.datetime\n }\n ```\n\n Examples:\n Delete a secret immediately:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import delete_secret\n\n @flow\n def example_delete_secret_immediately():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n force_delete_without_recovery: True\n )\n\n example_delete_secret_immediately()\n ```\n\n Delete a secret with a 90 day recovery window:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import delete_secret\n\n @flow\n def example_delete_secret_with_recovery_window():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n delete_secret(\n secret_name=\"life_the_universe_and_everything\",\n aws_credentials=aws_credentials,\n recovery_window_in_days=90\n )\n\n example_delete_secret_with_recovery_window()\n ```\n\n\n \"\"\"\n if not force_delete_without_recovery and not (7 <= recovery_window_in_days <= 30):\n raise ValueError(\"Recovery window must be between 7 and 30 days.\")\n\n delete_secret_kwargs: Dict[str, Union[str, int, bool]] = dict(SecretId=secret_name)\n if force_delete_without_recovery:\n delete_secret_kwargs[\"ForceDeleteWithoutRecovery\"] = (\n force_delete_without_recovery\n )\n else:\n delete_secret_kwargs[\"RecoveryWindowInDays\"] = recovery_window_in_days\n\n logger = get_run_logger()\n logger.info(\"Deleting secret %s\", secret_name)\n\n client = aws_credentials.get_boto3_session().client(\"secretsmanager\")\n\n try:\n response = await run_sync_in_worker_thread(\n client.delete_secret, **delete_secret_kwargs\n )\n response.pop(\"ResponseMetadata\", None)\n return response\n except ClientError:\n logger.exception(\"Unable to delete secret %s\", secret_name)\n raise\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.read_secret","title":"
read_secret
async
","text":"
Reads the value of a given secret from AWS Secrets Manager.
Parameters:
Name Type Description Default
secret_name
str
Name of stored secret.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
version_id
Optional[str]
Specifies version of secret to read. Defaults to the most recent version if not given.
None
version_stage
Optional[str]
Specifies the version stage of the secret to read. Defaults to AWS_CURRENT if not given.
None
Returns:
Type Description
Union[str, bytes]
The secret value as a str
or bytes
depending on the format in which the secret was stored.
Examples:
Read a secret value:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import read_secret\n\n@flow\ndef example_read_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n secret_value = read_secret(\n secret_name=\"db_password\",\n aws_credentials=aws_credentials\n )\n\nexample_read_secret()\n
Source code in
prefect_aws/secrets_manager.py
@task\nasync def read_secret(\n secret_name: str,\n aws_credentials: AwsCredentials,\n version_id: Optional[str] = None,\n version_stage: Optional[str] = None,\n) -> Union[str, bytes]:\n \"\"\"\n Reads the value of a given secret from AWS Secrets Manager.\n\n Args:\n secret_name: Name of stored secret.\n aws_credentials: Credentials to use for authentication with AWS.\n version_id: Specifies version of secret to read. Defaults to the most recent\n version if not given.\n version_stage: Specifies the version stage of the secret to read. Defaults to\n AWS_CURRENT if not given.\n\n Returns:\n The secret values as a `str` or `bytes` depending on the format in which the\n secret was stored.\n\n Example:\n Read a secret value:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import read_secret\n\n @flow\n def example_read_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n secret_value = read_secret(\n secret_name=\"db_password\",\n aws_credentials=aws_credentials\n )\n\n example_read_secret()\n ```\n \"\"\"\n logger = get_run_logger()\n logger.info(\"Getting value for secret %s\", secret_name)\n\n client = aws_credentials.get_boto3_session().client(\"secretsmanager\")\n\n get_secret_value_kwargs = dict(SecretId=secret_name)\n if version_id is not None:\n get_secret_value_kwargs[\"VersionId\"] = version_id\n if version_stage is not None:\n get_secret_value_kwargs[\"VersionStage\"] = version_stage\n\n try:\n response = await run_sync_in_worker_thread(\n client.get_secret_value, **get_secret_value_kwargs\n )\n except ClientError:\n logger.exception(\"Unable to get value for secret %s\", secret_name)\n raise\n else:\n return response.get(\"SecretString\") or response.get(\"SecretBinary\")\n
"},{"location":"secrets_manager/#prefect_aws.secrets_manager.update_secret","title":"
update_secret
async
","text":"
Updates the value of a given secret in AWS Secrets Manager.
Parameters:
Name Type Description Default
secret_name
str
Name of secret to update.
required
secret_value
Union[str, bytes]
Desired value of the secret. Can be either str
or bytes
.
required
aws_credentials
AwsCredentials
Credentials to use for authentication with AWS.
required
description
Optional[str]
Desired description of the secret.
None
Returns:
Type Description
A dict containing the secret ARN (Amazon Resource Name), name, and current version ID: {\"ARN\": str, \"Name\": str, \"VersionId\": str}
Examples:
Update a secret value:
from prefect import flow\nfrom prefect_aws import AwsCredentials\nfrom prefect_aws.secrets_manager import update_secret\n\n@flow\ndef example_update_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n update_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\nexample_update_secret()\n
Source code in
prefect_aws/secrets_manager.py
@task\nasync def update_secret(\n secret_name: str,\n secret_value: Union[str, bytes],\n aws_credentials: AwsCredentials,\n description: Optional[str] = None,\n) -> Dict[str, str]:\n \"\"\"\n Updates the value of a given secret in AWS Secrets Manager.\n\n Args:\n secret_name: Name of secret to update.\n secret_value: Desired value of the secret. Can be either `str` or `bytes`.\n aws_credentials: Credentials to use for authentication with AWS.\n description: Desired description of the secret.\n\n Returns:\n A dict containing the secret ARN (Amazon Resource Name),\n name, and current version ID.\n ```python\n {\n \"ARN\": str,\n \"Name\": str,\n \"VersionId\": str\n }\n ```\n\n Example:\n Update a secret value:\n\n ```python\n from prefect import flow\n from prefect_aws import AwsCredentials\n from prefect_aws.secrets_manager import update_secret\n\n @flow\n def example_update_secret():\n aws_credentials = AwsCredentials(\n aws_access_key_id=\"access_key_id\",\n aws_secret_access_key=\"secret_access_key\"\n )\n update_secret(\n secret_name=\"life_the_universe_and_everything\",\n secret_value=\"42\",\n aws_credentials=aws_credentials\n )\n\n example_update_secret()\n ```\n\n \"\"\"\n update_secret_kwargs: Dict[str, Union[str, bytes]] = dict(SecretId=secret_name)\n if description is not None:\n update_secret_kwargs[\"Description\"] = description\n if isinstance(secret_value, bytes):\n update_secret_kwargs[\"SecretBinary\"] = secret_value\n elif isinstance(secret_value, str):\n update_secret_kwargs[\"SecretString\"] = secret_value\n else:\n raise ValueError(\"Please provide a bytes or str value for secret_value\")\n\n logger = get_run_logger()\n logger.info(\"Updating value for secret %s\", secret_name)\n\n client = aws_credentials.get_boto3_session().client(\"secretsmanager\")\n\n try:\n response = await run_sync_in_worker_thread(\n client.update_secret, **update_secret_kwargs\n )\n response.pop(\"ResponseMetadata\", None)\n return response\n except ClientError:\n logger.exception(\"Unable to update secret %s\", secret_name)\n raise\n
"},{"location":"deployments/steps/","title":"Steps","text":""},{"location":"deployments/steps/#prefect_aws.deployments.steps","title":"
prefect_aws.deployments.steps
","text":"
Prefect deployment steps for code storage and retrieval in S3 and S3 compatible services.
"},{"location":"deployments/steps/#prefect_aws.deployments.steps-classes","title":"Classes","text":""},{"location":"deployments/steps/#prefect_aws.deployments.steps.PullFromS3Output","title":"
PullFromS3Output (dict)
","text":"
The output of the pull_from_s3
step.
Source code in
prefect_aws/deployments/steps.py
class PullFromS3Output(TypedDict):\n \"\"\"\n The output of the `pull_from_s3` step.\n \"\"\"\n\n bucket: str\n folder: str\n directory: str\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.PullProjectFromS3Output","title":"
PullProjectFromS3Output (dict)
","text":"
Deprecated. Use PullFromS3Output
instead.
Source code in
prefect_aws/deployments/steps.py
@deprecated_callable(start_date=\"Jun 2023\", help=\"Use `PullFromS3Output` instead.\")\nclass PullProjectFromS3Output(PullFromS3Output):\n \"\"\"Deprecated. Use `PullFromS3Output` instead.\"\"\"\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.PushProjectToS3Output","title":"
PushProjectToS3Output (dict)
","text":"
Deprecated. Use PushToS3Output
instead.
Source code in
prefect_aws/deployments/steps.py
@deprecated_callable(start_date=\"Jun 2023\", help=\"Use `PushToS3Output` instead.\")\nclass PushProjectToS3Output(PushToS3Output):\n \"\"\"Deprecated. Use `PushToS3Output` instead.\"\"\"\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.PushToS3Output","title":"
PushToS3Output (dict)
","text":"
The output of the push_to_s3
step.
Source code in
prefect_aws/deployments/steps.py
class PushToS3Output(TypedDict):\n \"\"\"\n The output of the `push_to_s3` step.\n \"\"\"\n\n bucket: str\n folder: str\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps-functions","title":"Functions","text":""},{"location":"deployments/steps/#prefect_aws.deployments.steps.pull_from_s3","title":"
pull_from_s3
","text":"
Pulls the contents of an S3 bucket folder to the current working directory.
Parameters:
Name Type Description Default
bucket
str
The name of the S3 bucket where files are stored.
required
folder
str
The folder in the S3 bucket where files are stored.
required
credentials
Optional[Dict]
A dictionary of AWS credentials (aws_access_key_id, aws_secret_access_key, aws_session_token).
None
client_parameters
Optional[Dict]
A dictionary of additional parameters to pass to the boto3 client.
None
Returns:
Type Description
PullFromS3Output
A dictionary containing the bucket, folder, and local directory where files were downloaded.
Examples:
Pull files from S3 using the default credentials and client parameters:
pull:\n - prefect_aws.deployments.steps.pull_from_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n
Pull files from S3 using credentials stored in a block:
pull:\n - prefect_aws.deployments.steps.pull_from_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n credentials: \"{{ prefect.blocks.aws-credentials.dev-credentials }}\"\n
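Because the step is an ordinary Python function, it can also be called directly, for example against an S3-compatible endpoint (an illustrative sketch; the bucket, folder, and endpoint URL are assumptions, and endpoint_url is passed through client_parameters to the boto3 client):
from prefect_aws.deployments.steps import pull_from_s3\n\n# Download the folder contents into the current working directory\noutput = pull_from_s3(\n    bucket=\"my-bucket\",\n    folder=\"my-project\",\n    client_parameters={\"endpoint_url\": \"http://localhost:9000\"},\n)\nprint(output[\"directory\"])\n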
Source code in
prefect_aws/deployments/steps.py
def pull_from_s3(\n bucket: str,\n folder: str,\n credentials: Optional[Dict] = None,\n client_parameters: Optional[Dict] = None,\n) -> PullFromS3Output:\n \"\"\"\n Pulls the contents of an S3 bucket folder to the current working directory.\n\n Args:\n bucket: The name of the S3 bucket where files are stored.\n folder: The folder in the S3 bucket where files are stored.\n credentials: A dictionary of AWS credentials (aws_access_key_id,\n aws_secret_access_key, aws_session_token).\n client_parameters: A dictionary of additional parameters to pass to the\n boto3 client.\n\n Returns:\n A dictionary containing the bucket, folder, and local directory where\n files were downloaded.\n\n Examples:\n Pull files from S3 using the default credentials and client parameters:\n ```yaml\n pull:\n - prefect_aws.deployments.steps.pull_from_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n ```\n\n Pull files from S3 using credentials stored in a block:\n ```yaml\n pull:\n - prefect_aws.deployments.steps.pull_from_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n credentials: \"{{ prefect.blocks.aws-credentials.dev-credentials }}\"\n ```\n \"\"\"\n s3 = get_s3_client(credentials=credentials, client_parameters=client_parameters)\n\n local_path = Path.cwd()\n\n paginator = s3.get_paginator(\"list_objects_v2\")\n for result in paginator.paginate(Bucket=bucket, Prefix=folder):\n for obj in result.get(\"Contents\", []):\n remote_key = obj[\"Key\"]\n\n if remote_key[-1] == \"/\":\n # object is a folder and will be created if it contains any objects\n continue\n\n target = PurePosixPath(\n local_path\n / relative_path_to_current_platform(remote_key).relative_to(folder)\n )\n Path.mkdir(Path(target.parent), parents=True, exist_ok=True)\n s3.download_file(bucket, remote_key, str(target))\n\n return {\n \"bucket\": bucket,\n \"folder\": folder,\n \"directory\": str(local_path),\n }\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.pull_project_from_s3","title":"
pull_project_from_s3
","text":"
Deprecated. Use pull_from_s3
instead.
Source code in
prefect_aws/deployments/steps.py
@deprecated_callable(start_date=\"Jun 2023\", help=\"Use `pull_from_s3` instead.\")\ndef pull_project_from_s3(*args, **kwargs):\n \"\"\"Deprecated. Use `pull_from_s3` instead.\"\"\"\n return pull_from_s3(*args, **kwargs)\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.push_project_to_s3","title":"
push_project_to_s3
","text":"
Deprecated. Use push_to_s3
instead.
Source code in
prefect_aws/deployments/steps.py
@deprecated_callable(start_date=\"Jun 2023\", help=\"Use `push_to_s3` instead.\")\ndef push_project_to_s3(*args, **kwargs):\n \"\"\"Deprecated. Use `push_to_s3` instead.\"\"\"\n return push_to_s3(*args, **kwargs)\n
"},{"location":"deployments/steps/#prefect_aws.deployments.steps.push_to_s3","title":"
push_to_s3
","text":"
Pushes the contents of the current working directory to an S3 bucket, excluding files and folders specified in the ignore_file.
Parameters:
Name Type Description Default
bucket
str
The name of the S3 bucket where files will be uploaded.
required
folder
str
The folder in the S3 bucket where files will be uploaded.
required
credentials
Optional[Dict]
A dictionary of AWS credentials (aws_access_key_id, aws_secret_access_key, aws_session_token).
None
client_parameters
Optional[Dict]
A dictionary of additional parameters to pass to the boto3 client.
None
ignore_file
Optional[str]
The name of the file containing ignore patterns.
'.prefectignore'
Returns:
Type Description
PushToS3Output
A dictionary containing the bucket and folder where files were uploaded.
Examples:
Push files to an S3 bucket:
push:\n - prefect_aws.deployments.steps.push_to_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n
Push files to an S3 bucket using credentials stored in a block:
push:\n - prefect_aws.deployments.steps.push_to_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n credentials: \"{{ prefect.blocks.aws-credentials.dev-credentials }}\"\n
Source code in
prefect_aws/deployments/steps.py
def push_to_s3(\n bucket: str,\n folder: str,\n credentials: Optional[Dict] = None,\n client_parameters: Optional[Dict] = None,\n ignore_file: Optional[str] = \".prefectignore\",\n) -> PushToS3Output:\n \"\"\"\n Pushes the contents of the current working directory to an S3 bucket,\n excluding files and folders specified in the ignore_file.\n\n Args:\n bucket: The name of the S3 bucket where files will be uploaded.\n folder: The folder in the S3 bucket where files will be uploaded.\n credentials: A dictionary of AWS credentials (aws_access_key_id,\n aws_secret_access_key, aws_session_token).\n client_parameters: A dictionary of additional parameters to pass to the boto3\n client.\n ignore_file: The name of the file containing ignore patterns.\n\n Returns:\n A dictionary containing the bucket and folder where files were uploaded.\n\n Examples:\n Push files to an S3 bucket:\n ```yaml\n push:\n - prefect_aws.deployments.steps.push_to_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n ```\n\n Push files to an S3 bucket using credentials stored in a block:\n ```yaml\n push:\n - prefect_aws.deployments.steps.push_to_s3:\n requires: prefect-aws\n bucket: my-bucket\n folder: my-project\n credentials: \"{{ prefect.blocks.aws-credentials.dev-credentials }}\"\n ```\n\n \"\"\"\n s3 = get_s3_client(credentials=credentials, client_parameters=client_parameters)\n\n local_path = Path.cwd()\n\n included_files = None\n if ignore_file and Path(ignore_file).exists():\n with open(ignore_file, \"r\") as f:\n ignore_patterns = f.readlines()\n\n included_files = filter_files(str(local_path), ignore_patterns)\n\n for local_file_path in local_path.expanduser().rglob(\"*\"):\n if (\n included_files is not None\n and str(local_file_path.relative_to(local_path)) not in included_files\n ):\n continue\n elif not local_file_path.is_dir():\n remote_file_path = Path(folder) / local_file_path.relative_to(local_path)\n s3.upload_file(\n str(local_file_path), bucket, str(remote_file_path.as_posix())\n )\n\n return {\n \"bucket\": bucket,\n \"folder\": folder,\n }\n
"}]}
\ No newline at end of file
diff --git a/secrets_manager/index.html b/secrets_manager/index.html
index 8e812d35..ff553072 100644
--- a/secrets_manager/index.html
+++ b/secrets_manager/index.html
@@ -18,7 +18,7 @@
-
+
@@ -530,6 +530,27 @@
+ Lambda
@@ -540,10 +561,10 @@
-
+
-