KEP-2170: Kubeflow Training V2 API (#2171)
* KEP-2170: Kubeflow Training V2 API
* Fix some comments
* Add user roles diagram
* Move diagrams after design
* Update diagram
* Refactor Model and Dataset configs
* Update runtime timelines
* Address readability comments
* Explanation for Trainer
* Update LLM Fine-Tuning Diagram
* Fix Llama model name
* Add goal for integration with Kueue
* Add links for Job run policies
* Add some alternatives
* Fix more API types
* Fix empty number of nodes
* Rename to Coscheduling
* Change parameters to env; add runLauncherAsNode parameter
* Update PodSpecOverride with scheduling directives
* Fix TrainingRuntime field
* Refactor PodGroupSpec APIs
* Add note about scheduler name
* Add initial TrainJob status field
* Fix pre-commit

Signed-off-by: Andrey Velichkevich <[email protected]>