Move diagrams after design
Signed-off-by: Andrey Velichkevich <[email protected]>
andreyvelich committed Jul 18, 2024
1 parent 85dee06 commit 8a4b58d
Showing 1 changed file with 12 additions and 6 deletions.
18 changes: 12 additions & 6 deletions docs/proposals/2170-kubeflow-training-v2/README.md
@@ -102,11 +102,21 @@ We propose these APIs:
 to configure infrastructure parameters that are required for the **TrainJob**.
 For example, failure policy or gang-scheduling.

-The below diagram shows that platform engineers manage `TrainingRuntime` and data scientists create
-`TrainJob`:
+### User Roles Diagram
+
+The below diagram shows how platform engineers manage `TrainingRuntime` and how data scientists
+create `TrainJob`:

 ![user-roles](./user-roles.drawio.svg)

+`TrainJob` can be created using `kubectl` or Kubeflow Python SDK.
+
+### LLM Fine-Tuning Diagram
+
+The below diagram shows which resources will be created for LLM fine-tuning with PyTorch:
+
+![trainjob-diagram](./trainjob-diagram.drawio.svg)
+
 ### Worker and Node Definition

 To better understand what does Nodes and Worker mean in the diagram above,
@@ -410,10 +420,6 @@ spec:
 path: custom-datasets/yelp-review
 ```
-The below diagram shows which resources will be created for LLM fine-tuning with PyTorch:
-![trainjob-diagram](./trainjob-diagram.drawio.svg)
 ### The Trainer Config API
 The `TrainerConfig` represents the APIs that data scientists can use to configure trainer settings: