Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multipart blob download #5715

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

wayner0628
Copy link
Contributor

@wayner0628 wayner0628 commented Sep 1, 2024

Tracking issue

#3632

Why are the changes needed?

Supporting multipart blob downloads allows us to completely copy the specified directory into the input path.

What changes were proposed in this pull request?

  • Using new storage List api to collect items under container before download
  • Implement List api for memory storage
  • Parallel download

How was this patch tested?

unit tests, specifically in download_test.go

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

flyteorg/flytekit#2258

Docs link

NA

Copy link

codecov bot commented Sep 1, 2024

Codecov Report

Attention: Patch coverage is 60.22727% with 35 lines in your changes missing coverage. Please review.

Project coverage is 36.25%. Comparing base (59bf191) to head (e008444).

Files with missing lines Patch % Lines
flytecopilot/data/download.go 71.62% 14 Missing and 7 partials ⚠️
flytestdlib/storage/mem_store.go 0.00% 12 Missing ⚠️
flytestdlib/storage/storage.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5715      +/-   ##
==========================================
+ Coverage   36.21%   36.25%   +0.03%     
==========================================
  Files        1303     1303              
  Lines      109644   109713      +69     
==========================================
+ Hits        39710    39774      +64     
+ Misses      65810    65802       -8     
- Partials     4124     4137      +13     
Flag Coverage Δ
unittests-datacatalog 51.37% <ø> (ø)
unittests-flyteadmin 55.63% <ø> (+0.03%) ⬆️
unittests-flytecopilot 20.34% <71.62%> (+8.16%) ⬆️
unittests-flytectl 62.21% <ø> (ø)
unittests-flyteidl 7.12% <ø> (ø)
unittests-flyteplugins 53.35% <ø> (ø)
unittests-flytepropeller 41.75% <ø> (ø)
unittests-flytestdlib 55.04% <0.00%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
@wayner0628 wayner0628 marked this pull request as ready for review September 5, 2024 19:14
Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wayner0628 - i think this is good. I want to get @eapolinario or @EngHabu to take a quick look at this as well though. This is a pretty core interface that's changing in this PR.

@@ -78,6 +78,9 @@ type RawStore interface {
// Head gets metadata about the reference. This should generally be a light weight operation.
Head(ctx context.Context, reference DataReference) (Metadata, error)

// GetItems retrieves the paths of all items from the Blob store or an error
GetItems(ctx context.Context, reference DataReference) ([]string, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this be more accurately named ListItems? Also what is retrieved? The relative path to the reference input? can we add comment?

flytecopilot/data/download.go Show resolved Hide resolved
@@ -54,6 +55,23 @@ func (s *InMemoryStore) Head(ctx context.Context, reference DataReference) (Meta
}, nil
}

func (s *InMemoryStore) GetItems(ctx context.Context, reference DataReference) ([]string, error) {
var items []string
prefix := string(reference) + "/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will reference ever already have a /?

Copy link
Member

@Future-Outlier Future-Outlier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @wayner0628
Can you test cases like this PR?
flyteorg/flytekit#2258
To be more specifically, this case

flyte_dir_io = ContainerTask(
    name="flyte_dir_io",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(inputs=FlyteDirectory),
    outputs=kwtypes(out=FlyteDirectory),
    image="futureoutlier/rawcontainer:0320",
    command=[
        "python",
        "write_flytedir.py",
        "{{.inputs.inputs}}",
        "/var/outputs/out",
    ],
)

If possible, please proivde screenshot, thank you.

@wild-endeavor
Copy link
Contributor

There is also this PR, https://github.com/flyteorg/flyte/pull/5674/files which I think we should merge first. The change to core api should probably be done separately.

@wild-endeavor
Copy link
Contributor

@wayner0628 #5741 this was just merged, adding a list api to the storage client. mind using the new interface to do this?

@wayner0628
Copy link
Contributor Author

@wild-endeavor No problem, I'll update this PR to align with the new interface.

Copy link
Member

@Future-Outlier Future-Outlier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tips to develop copilot in single binary.

  1. config
plugins:
  logs:
    dynamic-log-links:
      - comet-ml-execution-id:
          displayName: Comet
          templateUris: "{{ .taskConfig.host }}/{{ .taskConfig.workspace }}/{{ .taskConfig.project_name }}/{{ .executionName }}{{ .nodeId }}{{ .taskRetryAttempt }}{{ .taskConfig.link_suffix }}"
      - comet-ml-custom-id:
          displayName: Comet
          templateUris: "{{ .taskConfig.host }}/{{ .taskConfig.workspace }}/{{ .taskConfig.project_name }}/{{ .taskConfig.experiment_key }}"

    kubernetes-enabled: true
    kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{.namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
    cloudwatch-enabled: false
    stackdriver-enabled: false
  k8s:
    default-env-vars:
      - FLYTE_AWS_ENDPOINT: "http://flyte-sandbox-minio.flyte:9000"
      - FLYTE_AWS_ACCESS_KEY_ID: minio
      - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
      - MLFLOW_TRACKING_URI: postgresql+psycopg2://postgres:@postgres.flyte.svc.cluster.local:5432/flyteadmin
    co-pilot:
          image: "localhost:30000/copilot-flytefile:0603"
  1. how to build copilot image?
    use Dockerfile.flytecopilot to build it.

@wayner0628 wayner0628 closed this Sep 15, 2024
@wayner0628 wayner0628 reopened this Sep 15, 2024
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Signed-off-by: wayner0628 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants