Add user roles diagram
Signed-off-by: Andrey Velichkevich <[email protected]>
andreyvelich committed Jul 18, 2024
1 parent 1c49568 commit 85dee06
Showing 4 changed files with 15 additions and 2 deletions.
docs/proposals/2170-kubeflow-training-v2/README.md: 9 changes (7 additions, 2 deletions)
@@ -102,9 +102,10 @@ We propose these APIs:
 to configure infrastructure parameters that are required for the **TrainJob**.
 For example, failure policy or gang-scheduling.

-The below diagram shows which resources will be created for LLM fine-tuning with PyTorch.
+The below diagram shows that platform engineers manage `TrainingRuntime` and data scientists create
+`TrainJob`:

-![trainjob-diagram](./trainjob-diagram.jpg)
+![user-roles](./user-roles.drawio.svg)

 ### Worker and Node Definition

@@ -409,6 +410,10 @@ spec:
 path: custom-datasets/yelp-review
 ```

+The below diagram shows which resources will be created for LLM fine-tuning with PyTorch:
+
+![trainjob-diagram](./trainjob-diagram.drawio.svg)
+
 ### The Trainer Config API

 The `TrainerConfig` represents the APIs that data scientists can use to configure trainer settings: