KEP-2170: Adding CEL validations on v2 TrainJob CRD #2260
Thanks for doing this.
I left my first feedback.
Could you add integration tests to verify that these validations work?
We can implement those tests in https://github.com/kubeflow/training-operator/tree/126110fd4d76439bd04ca9fdf96bafb7ea3b6910/test/integration/webhook.v2.
Pull Request Test Coverage Report for Build 11412640672 (Coveralls)
/hold
Additionally, could you sign the DCO?
5c876bb to 2b97162
2b97162 to fba853b
/ok-to-test
/assign @saileshd1402 @varshaprasad96
@andreyvelich: GitHub didn't allow me to assign the following users: saileshd1402, varshaprasad96. Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
Thank you for this @akshaychitneni!
I left my initial comments.
/assign @kubeflow/wg-training-leads
@@ -56,6 +56,7 @@ type TrainJobList struct {
}

// TrainJobSpec represents specification of the desired TrainJob.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.managedBy) || has(self.managedBy)", message="ManagedBy is required once set"
Why do we need this?
This is a type-scoped rule that ensures managedBy cannot be removed once it is set. I'm not sure it is necessary, since a default value is being set.
Don't we set it here?
// +kubebuilder:validation:XValidation:rule="self == oldSelf", message="ManagedBy value is immutable"
Ack. I agree that this rule only covers removal, and it does not need to be added here because we always set a default value. I have updated the PR. Thanks.
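To make the difference between the two rules in this thread concrete, here is a minimal Go sketch that models them as plain functions. It is not how the API server evaluates CEL; the function names are hypothetical, and a nil pointer stands in for an unset managedBy field. It assumes the documented Kubernetes behavior that a field-scoped transition rule is evaluated only when both the old and the new value exist.

```go
package main

import "fmt"

// removalRuleOK models the type-scoped rule
// "!has(oldSelf.managedBy) || has(self.managedBy)":
// once the field has been set, it must stay set.
// (Hypothetical helper; nil means "field not set".)
func removalRuleOK(oldVal, newVal *string) bool {
	return oldVal == nil || newVal != nil
}

// immutableRuleOK models the field-scoped rule "self == oldSelf".
// Assumption: the rule is skipped (treated as passing) when either
// side is missing, which is why it cannot catch removal on its own.
func immutableRuleOK(oldVal, newVal *string) bool {
	if oldVal == nil || newVal == nil {
		return true
	}
	return *oldVal == *newVal
}

func main() {
	a, b := "controller-a", "controller-b"

	fmt.Println(immutableRuleOK(&a, &a)) // unchanged value passes
	fmt.Println(immutableRuleOK(&a, &b)) // changed value is rejected
	fmt.Println(immutableRuleOK(&a, nil)) // removal is not caught here
	fmt.Println(removalRuleOK(&a, nil))   // removal is caught here
}
```

When a default value is always applied, the removal case should not occur in practice, which matches the conclusion reached in the thread.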
@@ -56,6 +56,7 @@ type TrainJobList struct {
}

// TrainJobSpec represents specification of the desired TrainJob.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.managedBy) || has(self.managedBy)", message="ManagedBy is required once set"
type TrainJobSpec struct {
@akshaychitneni Do we want to add validations/defaults for other parts of TrainJob (e.g. Trainer, DatasetConfig, ModelConfig) as part of this PR?
@andreyvelich I was planning to cover the other validations in the webhook, as they seem too complicated for CEL validation. For example, validating the dataset config against the rule "training runtime must have the dataset-initializer container in the Initializer Job" requires access to the referenced TrainingRuntime object. Let me know if you think otherwise.
Let's revisit the discussion of which validations we should implement after webhook validations are implemented.
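As an illustration of why the dataset-config rule mentioned above fits a webhook better than CEL, the following Go sketch shows a check that has to look up a second object. All types and the function name are hypothetical, simplified stand-ins rather than the project's API types, and the map lookup stands in for fetching the referenced runtime from the cluster. Only the "dataset-initializer" container and "Initializer" Job names come from the discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// Job, TrainingRuntime and TrainJob are hypothetical, simplified stand-ins.
type Job struct {
	Name       string
	Containers []string
}

type TrainingRuntime struct {
	Jobs []Job
}

type TrainJob struct {
	RuntimeName      string
	HasDatasetConfig bool
}

// validateDatasetConfig (hypothetical) rejects a TrainJob that sets a dataset
// config when the referenced runtime has no "dataset-initializer" container
// in its "Initializer" Job. A CEL rule on the TrainJob CRD only sees the
// TrainJob itself, so it cannot perform this cross-object lookup.
func validateDatasetConfig(tj TrainJob, runtimes map[string]TrainingRuntime) error {
	if !tj.HasDatasetConfig {
		return nil
	}
	rt, ok := runtimes[tj.RuntimeName]
	if !ok {
		return fmt.Errorf("training runtime %q not found", tj.RuntimeName)
	}
	for _, job := range rt.Jobs {
		if job.Name != "Initializer" {
			continue
		}
		for _, c := range job.Containers {
			if c == "dataset-initializer" {
				return nil
			}
		}
	}
	return errors.New("training runtime must have the dataset-initializer container in the Initializer Job")
}

func main() {
	runtimes := map[string]TrainingRuntime{
		"with-init": {Jobs: []Job{{Name: "Initializer", Containers: []string{"dataset-initializer"}}}},
		"no-init":   {Jobs: []Job{{Name: "Trainer", Containers: []string{"trainer"}}}},
	}
	fmt.Println(validateDatasetConfig(TrainJob{RuntimeName: "with-init", HasDatasetConfig: true}, runtimes))
	fmt.Println(validateDatasetConfig(TrainJob{RuntimeName: "no-init", HasDatasetConfig: true}, runtimes))
}
```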
9d739ea to 4299515
Sorry for the late response.
4299515 to 54ee0b8
Thank you!
Basically, lgtm
@@ -71,4 +73,120 @@ var _ = ginkgo.Describe("TrainJob controller", ginkgo.Ordered, func() {
gomega.Expect(k8sClient.Create(ctx, trainJob)).Should(gomega.Succeed())
})
})

ginkgo.When("TrainJob CR Validation", func() {
This is just a comment. We will revisit the directory structure discussion after we implement the webhook validations.
8e4291f to b580661
Thank you for updating!
/lgtm
/approve
@andreyvelich Do you want to recheck this PR?
Thank you for this @akshaychitneni!
Just a few small comments
ManagedBy: &managedBy,
}
trainJob := &kubeflowv2.TrainJob{
TypeMeta: metav1.TypeMeta{
@tenzen-y @akshaychitneni Is this TypeMeta required for ginkgo if we use the TrainJob struct?
I believe it is needed to infer the group/version for the create API call.
Signed-off-by: Akshay Chitneni <[email protected]>
b580661 to 8258232
Thank you for this contribution @akshaychitneni 🎉
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andreyvelich, tenzen-y
/hold cancel
What this PR does / why we need it:
This PR relates to #2209 and adds CEL validations to the TrainJob CRD. I will follow up with the webhook-based validations in a separate PR.
cc @andreyvelich @tenzen-y