[SDK] Allow customising base trainer and storage images in Train API #2261

varshaprasad96 · 2024-09-17T21:29:41Z

What this PR does / why we need it:
Allow customising the base storage_initializer and trainer images through environment variables.
Example use case: Train API could be expanded to use ROCm libs in addition to CUDA.
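A minimal sketch of the override pattern described above, assuming the image is looked up in the environment and falls back to a default. The variable name `TRAINER_TRANSFORMER_IMAGE` and the default image come from the review discussion on this PR. The `resolve_image` helper and the ROCm image reference are hypothetical and are not part of the SDK.

```python
import os

# Default image as quoted in the review discussion on this PR.
DEFAULT_TRAINER_IMAGE = "docker.io/kubeflow/trainer-huggingface"


def resolve_image(env_name: str, default: str) -> str:
    """Return the image named by env_name, or default when unset or empty.

    Hypothetical helper for illustration; not part of the Kubeflow SDK.
    """
    return os.environ.get(env_name) or default


# Without an override, the default image is used.
os.environ.pop("TRAINER_TRANSFORMER_IMAGE", None)
default_image = resolve_image("TRAINER_TRANSFORMER_IMAGE", DEFAULT_TRAINER_IMAGE)

# With an override (hypothetical ROCm-based image), the custom image is used.
os.environ["TRAINER_TRANSFORMER_IMAGE"] = (
    "registry.example.com/my-org/trainer-rocm:latest"
)
custom_image = resolve_image("TRAINER_TRANSFORMER_IMAGE", DEFAULT_TRAINER_IMAGE)

print(default_image)
print(custom_image)
```

Unlike a plain `os.getenv(name, default)`, this sketch also falls back to the default when the variable is set to an empty string. That is a design choice of the example, not something this PR states.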

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2247

TODO: Docs to be updated in https://github.com/kubeflow/website.

Checklist:

Docs included if any changes are user facing

Allow customizing base storage_initializer and trainer images through Env vars. Signed-off-by: Varsha Prasad Narsing <[email protected]>

coveralls · 2024-09-18T18:29:35Z

Pull Request Test Coverage Report for Build 10927951593

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall first build on sdk/fetch-base-image at 100.0%

Totals
Change from base Build 10927738808:	100.0%
Covered Lines:	66
Relevant Lines:	66

💛 - Coveralls

tenzen-y

Thank you for creating this PR!
/approve

@deepanker13 Do you have any other comments?
If not, you can just say /lgtm, and then this will be merged into the master branch.

google-oss-prow · 2024-09-18T18:46:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~sdk/python/OWNERS~~ [tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tenzen-y · 2024-09-18T19:01:23Z

/assign @deepanker13

deepanker13 · 2024-09-19T08:47:37Z

Thanks @varshaprasad96
/lgtm

andreyvelich

Thank you for doing this @varshaprasad96!
Could you please submit a PR to address my comment?

andreyvelich · 2024-09-23T14:00:25Z

sdk/python/kubeflow/training/constants/constants.py

@@ -82,14 +82,14 @@


 # TODO (andreyvelich): We should add image tag for Storage Initializer and Trainer.
-STORAGE_INITIALIZER_IMAGE = "docker.io/kubeflow/storage-initializer"
+STORAGE_INITIALIZER_IMAGE_DEFAULT = "docker.io/kubeflow/storage-initializer"


@varshaprasad96 Could you please submit a PR that adds the following change to constants.py:

STORAGE_INITIALIZER_IMAGE = os.getenv("STORAGE_INITIALIZER_IMAGE", "docker.io/kubeflow/storage-initializer")
TRAINER_TRANSFORMER_IMAGE = os.getenv("TRAINER_TRANSFORMER_IMAGE", "docker.io/kubeflow/trainer-huggingface")

That will allow users to quickly see which environment variables they can modify, instead of searching through training_client.py.
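One consequence of reading the variables at module level, sketched below: the value is captured once, when the assignment runs. If constants.py follows the suggestion above, the override would need to be set before the SDK is imported. This sketch assumes the variable is spelled `STORAGE_INITIALIZER_IMAGE`, matching the constant name, and both custom image references are hypothetical.

```python
import os

# Hypothetical custom image, set BEFORE the constant is evaluated.
os.environ["STORAGE_INITIALIZER_IMAGE"] = (
    "registry.example.com/my-org/storage-initializer:v1"
)

# Evaluated once, the way a module-level constant is evaluated at import time.
STORAGE_INITIALIZER_IMAGE = os.getenv(
    "STORAGE_INITIALIZER_IMAGE", "docker.io/kubeflow/storage-initializer"
)

# Changing the environment afterwards does not update the constant.
os.environ["STORAGE_INITIALIZER_IMAGE"] = (
    "registry.example.com/my-org/storage-initializer:v2"
)

print(STORAGE_INITIALIZER_IMAGE)
```

In practice this means exporting the variable in the shell, or setting `os.environ`, before the first import of the training SDK in a process.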

I see! Here we go: #2268

Follow up from kubeflow/training-operator#2261 as this is a user facing change. Signed-off-by: Varsha Prasad Narsing <[email protected]>

google-oss-prow bot requested review from jinchihe and kuizhiqing September 17, 2024 21:29

google-oss-prow bot added the size/XS label Sep 17, 2024

[SDK] Allow customizing base trainer and storage images in Train API

40c5c10

Allow customizing base storage_initializer and trainer images through Env vars. Signed-off-by: Varsha Prasad Narsing <[email protected]>

varshaprasad96 force-pushed the sdk/fetch-base-image branch from 9cda074 to 40c5c10 Compare September 18, 2024 18:24

google-oss-prow bot added size/S and removed size/XS labels Sep 18, 2024

tenzen-y reviewed Sep 18, 2024

google-oss-prow bot added the approved label Sep 18, 2024

google-oss-prow bot assigned deepanker13 Sep 18, 2024

google-oss-prow bot added the lgtm label Sep 19, 2024

google-oss-prow bot merged commit ee6756b into kubeflow:master Sep 19, 2024
39 checks passed

andreyvelich reviewed Sep 23, 2024

varshaprasad96 mentioned this pull request Sep 23, 2024

[SDK] move env var to constants.py #2268

Merged

1 task

varshaprasad96 mentioned this pull request Sep 24, 2024

[KFTO-SDK] Add doc on customising base images for Train API kubeflow/website#3879

Open
