This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

put_directory creates a new s3_client for each file uploaded #332

Closed
1 task done
mattiamatrix opened this issue Nov 1, 2023 · 2 comments · Fixed by #369
Labels
bug Something isn't working

Comments

@mattiamatrix

mattiamatrix commented Nov 1, 2023

Hello all,

I am executing build_from_flow as below to upload all my Prefect files into an S3Bucket.

from prefect.deployments import Deployment
from prefect_aws import S3Bucket

Deployment.build_from_flow(
    flow=...,
    skip_upload=False,
    storage=S3Bucket.load(...),
    # ...other arguments
)

It works, but upload_to_storage takes a very long time: I have about 400 files to upload, and each one takes around 1 second.
Looking at the source code and the botocore output, it appears that a new boto3 client is created, and credentials are checked, for every file.

version: prefect-aws==0.4.1

Expectation / Proposal

Uploading a batch of files to S3 should be very fast.

Traceback / Example

@desertaxle added the bug (Something isn't working) label on Nov 1, 2023
@rudeb0y

rudeb0y commented Nov 7, 2023

Not sure if there is much of a difference, but ours looks like this:

from prefect.deployments import Deployment
from prefect.filesystems import S3

default_storage = S3.load("flow-storage")

Deployment.build_from_flow(
    storage=default_storage,
    # ...other arguments
)

@mattiamatrix (Author)

mattiamatrix commented Dec 19, 2023

I realized what is happening here, and I started working on #361.

The _write_sync function, which is what actually talks to S3, is called inside a loop. Each call runs s3_client = self._get_s3_client(), which creates a new S3 client every time.
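A minimal sketch of the pattern described above, for illustration only. `FakeS3Client` and `PerFileBucket` are hypothetical stand-ins, not prefect-aws code; the method names merely mirror the ones mentioned in the comment. The fake client counts how many times it is constructed, so the per-file cost is visible without any AWS access.

```python
class FakeS3Client:
    """Hypothetical stand-in for a boto3 S3 client; counts constructions."""

    instances = 0

    def __init__(self):
        # A real client would resolve credentials and load service metadata
        # here, which is the expensive part that gets repeated per file.
        FakeS3Client.instances += 1
        self.uploaded = []

    def upload_fileobj(self, data, bucket, key):
        # Record the upload instead of sending anything over the network.
        self.uploaded.append((bucket, key))


class PerFileBucket:
    """Hypothetical storage class reproducing the one-client-per-file pattern."""

    def __init__(self, bucket_name):
        self.bucket_name = bucket_name

    def _get_s3_client(self):
        # Builds a brand-new client on every call.
        return FakeS3Client()

    def _write_sync(self, key, data):
        s3_client = self._get_s3_client()
        s3_client.upload_fileobj(data, self.bucket_name, key)

    def put_directory(self, files):
        # One _write_sync call, and therefore one new client, per file.
        for key, data in files.items():
            self._write_sync(key, data)


bucket = PerFileBucket("my-bucket")
bucket.put_directory({f"flows/file_{i}.py": b"" for i in range(420)})
print(FakeS3Client.instances)  # 420
```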

The fix is to create the s3_client once and reuse it across calls, rather than building a new client for every file.
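One way to create the client once and reuse it is to cache it on the instance with `functools.cached_property`. This is a sketch under the same assumptions as before: `FakeS3Client` is a hypothetical stand-in for a real boto3 client, and `ReusingBucket` is a hypothetical class, not the actual change made in #361 or #369.

```python
from functools import cached_property


class FakeS3Client:
    """Hypothetical stand-in for a boto3 S3 client; counts constructions."""

    instances = 0

    def __init__(self):
        FakeS3Client.instances += 1
        self.uploaded = []

    def upload_fileobj(self, data, bucket, key):
        # Record the upload instead of sending anything over the network.
        self.uploaded.append((bucket, key))


class ReusingBucket:
    """Hypothetical storage class that builds its client only once."""

    def __init__(self, bucket_name):
        self.bucket_name = bucket_name

    @cached_property
    def s3_client(self):
        # Built on first access, then stored on the instance and reused.
        return FakeS3Client()

    def _write_sync(self, key, data):
        self.s3_client.upload_fileobj(data, self.bucket_name, key)

    def put_directory(self, files):
        for key, data in files.items():
            self._write_sync(key, data)


bucket = ReusingBucket("my-bucket")
bucket.put_directory({f"flows/file_{i}.py": b"" for i in range(420)})
print(FakeS3Client.instances)  # 1
print(len(bucket.s3_client.uploaded))  # 420
```

With a real client, the cached property would return something like `boto3.client("s3")`. An alternative is to create the client once in `put_directory` and pass it to each write. boto3 documents low-level clients as generally thread-safe, so a single cached client could also be shared by concurrent uploads, whereas sessions should not be shared across threads.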

The improvement is substantial: in my setup, with 420 files, the upload time drops from 5-7 minutes to 2 minutes.
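As a quick sanity check, the reported totals can be converted to per-file times. This uses only the figures from the comment above (420 files, 5-7 minutes before, 2 minutes after). The "before" range is consistent with the roughly 1 second per file reported in the original post.

```python
FILES = 420
BEFORE_SECONDS = (5 * 60, 7 * 60)  # reported: 5-7 minutes
AFTER_SECONDS = 2 * 60             # reported: 2 minutes

per_file_before = tuple(s / FILES for s in BEFORE_SECONDS)
per_file_after = AFTER_SECONDS / FILES
speedup = tuple(s / AFTER_SECONDS for s in BEFORE_SECONDS)

print(f"before: {per_file_before[0]:.2f}-{per_file_before[1]:.2f} s/file")  # before: 0.71-1.00 s/file
print(f"after: {per_file_after:.2f} s/file")  # after: 0.29 s/file
print(f"speedup: {speedup[0]:.1f}x-{speedup[1]:.1f}x")  # speedup: 2.5x-3.5x
```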

@mattiamatrix changed the title from "put_directory (might) create a new client for each file" to "put_directory creates a new s3_client for each file uploaded" on Dec 19, 2023
3 participants