KEP-2170: Adding CEL validations on v2 TrainJob CRD #2260

Merged: 1 commit, Oct 19, 2024

Contributor

@akshaychitneni akshaychitneni commented Sep 16, 2024

What this PR does / why we need it:
This PR relates to #2209 and adds CEL validations to the TrainJob CRD. I will follow up with the webhook-based validations in a separate PR.

cc @andreyvelich @tenzen-y

Member

@tenzen-y tenzen-y left a comment

Thanks for doing this.
I left my first feedback.

Member

Could you add integration tests to verify that these validations work?
We can implement those tests in https://github.com/kubeflow/training-operator/tree/126110fd4d76439bd04ca9fdf96bafb7ea3b6910/test/integration/webhook.v2.

@coveralls

coveralls commented Sep 17, 2024

Pull Request Test Coverage Report for Build 11412640672

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 11390874125: 0.0%
Covered Lines: 73
Relevant Lines: 73

💛 - Coveralls

@tenzen-y
Member

/hold

@tenzen-y
Member

Additionally, could you sign the DCO?

@andreyvelich
Member

/ok-to-test
/rerun-all

@andreyvelich andreyvelich changed the title from "Adding CEL validations on v2 TrainJob CRD" to "KEP-2170: Adding CEL validations on v2 TrainJob CRD" on Sep 30, 2024
@andreyvelich
Member

/assign @saileshd1402 @varshaprasad96


@andreyvelich: GitHub didn't allow me to assign the following users: saileshd1402, varshaprasad96.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

Member

@andreyvelich andreyvelich left a comment

Thank you for this @akshaychitneni!
I left my initial comments.
/assign @kubeflow/wg-training-leads

@@ -56,6 +56,7 @@ type TrainJobList struct {
}

// TrainJobSpec represents specification of the desired TrainJob.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.managedBy) || has(self.managedBy)", message="ManagedBy is required once set"
Member

Why do we need this?

Contributor

@varshaprasad96 varshaprasad96 Oct 2, 2024

This is a type-scoped rule that ensures managedBy is not removed once set. I'm not sure it is necessary, since a default is being set.

Member

Don't we set it here?

// +kubebuilder:validation:XValidation:rule="self == oldSelf", message="ManagedBy value is immutable"

Contributor Author

Ack. I agree that this rule covers removal, which need not be handled here because we always set a default value. I updated the PR. Thanks.
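
For readers following this thread, here is a minimal Go sketch of how the two rules differ on update. It is plain Go that mimics the rule semantics, not the real CEL evaluator. The `spec` type, the function names, and the managedBy values are hypothetical. It assumes the documented Kubernetes behaviour that a transition rule on an optional field is not applied to updates that set or unset the field, which is why a field-level `self == oldSelf` rule alone would not catch removal.

```go
package main

import "fmt"

// spec is a hypothetical stand-in for TrainJobSpec; a nil pointer means
// managedBy is unset.
type spec struct {
	managedBy *string
}

// fieldRuleOK mimics the field-level rule `self == oldSelf`.
// Assumption: the rule is skipped (passes) unless the field exists in both
// the old and the new object.
func fieldRuleOK(oldS, newS spec) bool {
	if oldS.managedBy == nil || newS.managedBy == nil {
		return true
	}
	return *oldS.managedBy == *newS.managedBy
}

// typeRuleOK mimics the type-scoped rule
// `!has(oldSelf.managedBy) || has(self.managedBy)`.
func typeRuleOK(oldS, newS spec) bool {
	return oldS.managedBy == nil || newS.managedBy != nil
}

func strPtr(s string) *string { return &s }

func main() {
	old := spec{managedBy: strPtr("example.com/controller-a")}
	changed := spec{managedBy: strPtr("example.com/controller-b")}
	removed := spec{}

	// Changing the value: rejected by the field rule, allowed by the type rule.
	fmt.Println(fieldRuleOK(old, changed), typeRuleOK(old, changed))
	// Removing the value: skipped by the field rule, rejected by the type rule.
	fmt.Println(fieldRuleOK(old, removed), typeRuleOK(old, removed))
}
```

Because managedBy has a default, an update that drops the field would presumably be defaulted again before validation, which is consistent with the decision above to drop the type-scoped rule.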

pkg/apis/kubeflow.org/v2alpha1/trainjob_types.go (outdated, resolved)
@@ -56,6 +56,7 @@ type TrainJobList struct {
}

// TrainJobSpec represents specification of the desired TrainJob.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.managedBy) || has(self.managedBy)", message="ManagedBy is required once set"
type TrainJobSpec struct {
Member

@akshaychitneni Do we want to add validations/defaults for other parts of TrainJob (e.g. Trainer, DatasetConfig, ModelConfig) as part of this PR?

Contributor Author

@andreyvelich I was planning to cover the other validations in the webhook, as they seem too complicated for CEL validation. For example, validating the dataset config with the rule "training runtime must have the dataset-initializer container in the Initializer Job" requires accessing the referenced TrainingRuntime object. Let me know if you think otherwise.
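
As an illustration of why this check does not fit in a CEL rule on the TrainJob itself, here is a rough Go sketch of the kind of cross-object lookup it needs. All types, field names, the `lookup` map, and the job name `initializer` are hypothetical simplifications, not the real API types or the webhook that was later implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the real API types.
type job struct {
	name       string
	containers []string
}

type trainingRuntime struct {
	jobs []job
}

type trainJob struct {
	runtimeRef       string
	hasDatasetConfig bool
}

// validateDatasetConfig must read the referenced runtime, which a CEL rule
// attached to the TrainJob schema cannot do: CEL only sees the object itself.
// The lookup map stands in for an API client (an assumption of this sketch).
func validateDatasetConfig(tj trainJob, lookup map[string]trainingRuntime) error {
	if !tj.hasDatasetConfig {
		return nil
	}
	rt, ok := lookup[tj.runtimeRef]
	if !ok {
		return fmt.Errorf("training runtime %q not found", tj.runtimeRef)
	}
	for _, j := range rt.jobs {
		if j.name != "initializer" { // assumed name of the Initializer Job
			continue
		}
		for _, c := range j.containers {
			if c == "dataset-initializer" {
				return nil
			}
		}
	}
	return errors.New("training runtime must have the dataset-initializer container in the Initializer Job")
}

func main() {
	runtimes := map[string]trainingRuntime{
		"with-init": {jobs: []job{{name: "initializer", containers: []string{"dataset-initializer"}}}},
		"no-init":   {jobs: []job{{name: "trainer", containers: []string{"trainer"}}}},
	}
	fmt.Println(validateDatasetConfig(trainJob{runtimeRef: "with-init", hasDatasetConfig: true}, runtimes))
	fmt.Println(validateDatasetConfig(trainJob{runtimeRef: "no-init", hasDatasetConfig: true}, runtimes))
}
```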

Member

Let's revisit the discussion of which validations we should implement after webhook validations are implemented.

test/integration/cel.v2/trainjob_crd_test.go (outdated, resolved)
test/integration/cel.v2/trainjob_crd_test.go (outdated, resolved)
@tenzen-y
Member

Sorry for the late response.
I will revisit here and start reviewing tomorrow.

Member

@tenzen-y tenzen-y left a comment

Thank you!
Basically, lgtm

pkg/apis/kubeflow.org/v2alpha1/trainjob_types.go (outdated, resolved)
test/integration/controller.v2/trainjob_controller_test.go (outdated, resolved)
test/integration/cel.v2/trainjob_crd_test.go (outdated, resolved)
@@ -71,4 +73,120 @@ var _ = ginkgo.Describe("TrainJob controller", ginkgo.Ordered, func() {
gomega.Expect(k8sClient.Create(ctx, trainJob)).Should(gomega.Succeed())
})
})

ginkgo.When("TrainJob CR Validation", func() {
Member

This is just a comment. We will revisit the directory structure discussion after we implement the webhook validations.

test/integration/controller.v2/trainjob_controller_test.go (outdated, resolved)
Member

@tenzen-y tenzen-y left a comment

Thank you for updating!
/lgtm
/approve

@tenzen-y
Member

@andreyvelich Do you want to recheck this PR?

Member

@andreyvelich andreyvelich left a comment

Thank you for this @akshaychitneni!
Just a few small comments

test/integration/controller.v2/trainjob_controller_test.go (outdated, resolved)
ManagedBy: &managedBy,
}
trainJob := &kubeflowv2.TrainJob{
TypeMeta: metav1.TypeMeta{
Member

@tenzen-y @akshaychitneni Is this TypeMeta required for ginkgo if we use the TrainJob struct?

Contributor Author

@akshaychitneni akshaychitneni Oct 18, 2024

I believe it is needed to infer the group/version for the create API call.

Member

@andreyvelich andreyvelich left a comment

Thank you for this contribution @akshaychitneni 🎉
/lgtm
/approve


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [andreyvelich,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andreyvelich
Member

/hold cancel

@google-oss-prow google-oss-prow bot merged commit 0149eb0 into kubeflow:master Oct 19, 2024
40 of 41 checks passed
5 participants