KUBESAW-187: Adjust ksctl adm restart command to use rollout-restart #79

Open · wants to merge 27 commits into base: master

Conversation

@fbm3307 (Contributor) commented Sep 12, 2024

This adjusts the restart command logic. It does the following:

  1. If the command is run for the host operator, it restarts the whole host operator: it deletes the OLM-based pods (host-operator pods), waits for the new deployment to come up, then uses the rollout-restart command for the non-OLM-based deployment (registration-service).
  2. If the command is run for the member operator, it restarts the whole member operator: it deletes the OLM-based pods (member-operator pods), waits for the new deployment to come up, then uses the rollout-restart command for the non-OLM-based deployments (webhooks).
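
A minimal sketch of that flow, assuming a controller-runtime client; the label selectors are the ones that appear in the review snippets below, and deleteAndWaitForPods / rolloutRestart are hypothetical stand-ins for the PR's actual helpers:

// Sketch only; assumed imports:
//   "context"
//   appsv1 "k8s.io/api/apps/v1"
//   runtimeclient "sigs.k8s.io/controller-runtime/pkg/client"
func restartOperator(ctx context.Context, cl runtimeclient.Client, ns string) error {
	// The OLM-managed operator deployment carries the kubesaw-control-plane label.
	olmDeployments := &appsv1.DeploymentList{}
	if err := cl.List(ctx, olmDeployments, runtimeclient.InNamespace(ns),
		runtimeclient.MatchingLabels{"kubesaw-control-plane": "kubesaw-controller-manager"}); err != nil {
		return err
	}
	for _, d := range olmDeployments.Items {
		// Deleting the pods is enough here: the deployment spec is owned by
		// OLM, so its ReplicaSet recreates the pods right away.
		if err := deleteAndWaitForPods(ctx, cl, d); err != nil { // hypothetical helper
			return err
		}
	}
	// Non-OLM deployments (registration-service on host, webhooks on member).
	nonOlmDeployments := &appsv1.DeploymentList{}
	if err := cl.List(ctx, nonOlmDeployments, runtimeclient.InNamespace(ns),
		runtimeclient.MatchingLabels{"provider": "codeready-toolchain"}); err != nil {
		return err
	}
	for _, d := range nonOlmDeployments.Items {
		// Equivalent of `kubectl rollout restart deployment/<name>`.
		if err := rolloutRestart(ctx, cl, d); err != nil { // hypothetical helper
			return err
		}
	}
	return nil
}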

@MatousJobanek (Contributor) left a comment:

Could you please add a few print info lines for better UX?

@mfrancisc (Contributor) left a comment:

Nice job 👍

I left a few minor comments. Also, I haven't gone through the test code yet since it still looks WIP.

@filariow (Member) left a comment:

Nice, just some minor feedback.

@mfrancisc (Contributor) left a comment:

Looks good overall 👍

I have a few aesthetic comments regarding the code and a few questions/suggestions for the tests.

return err
}

ctx.Printlnf("The deployment was scaled back to '%d'", originalReplicas)
if len(olmDeploymentList.Items) == 0 {
Contributor:

Alternatively, you could simply print the number of deployments of each kind found at the beginning:

fmt.Printf("OLM based deployments found in %s namespace: %d\n", ns, len(olmDeploymentList.Items))
fmt.Printf("NON-OLM based deployments found in %s namespace: %d\n", ns, len(nonOlmDeploymentlist.Items))

and avoid the if statements at lines 93 and 104.

@fbm3307 (Contributor, Author) commented Sep 26, 2024:

For a better user experience, should we let users know that there were 0 deployments and hence we did not go ahead with the restart, rather than going through the rest of the code? That was the reason I wrote the if/else.

Contributor:

I think locally it doesn't change much: either have an else branch that prints that there were no deployments, or print the number of deployments found at the beginning and go through the rest of the code, which will just skip the for loops since there are no items to iterate on. But I'm fine with whatever looks more readable.

fakeClient.MockUpdate = requireDeploymentBeingUpdated(t, fakeClient, namespacedName, 3, &numberOfUpdateCalls)
fakeClient.MockGet = func(ctx context.Context, key runtimeclient.ObjectKey, obj runtimeclient.Object, opts ...runtimeclient.GetOption) error {
return fmt.Errorf("some error")
tests := map[string]struct {
Contributor:

If it's not too tricky, I think we could add some tests/scenarios also for the following (a sketch of the first one is shown after this list):

  • there are some other noise pods in the namespace that should not be restarted
  • the pod/deployment doesn't restart correctly (both OLM and non-OLM)
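
If it helps, a hedged sketch of the first scenario, reusing the NewFakeClients helper visible in the diff below; the pod name, labels, and the elided command wiring are illustrative assumptions:

// Assumed imports: corev1 "k8s.io/api/core/v1",
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1",
// "k8s.io/apimachinery/pkg/types", "github.com/stretchr/testify/require".

// given: a pod that carries none of the operator labels
noisePod := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{
	Name:      "some-other-pod", // hypothetical name
	Namespace: "toolchain-host-operator",
	Labels:    map[string]string{"app": "unrelated"},
}}
newClient, fakeClient := NewFakeClients(t, deployment, noisePod)

// when: the restart command is run against newClient (wiring elided)

// then: the noise pod must still exist, untouched
pod := &corev1.Pod{}
err := fakeClient.Get(context.TODO(),
	types.NamespacedName{Namespace: "toolchain-host-operator", Name: "some-other-pod"}, pod)
require.NoError(t, err)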

deployments := &appsv1.DeploymentList{}
if err := cl.List(context.TODO(), deployments, runtimeclient.InNamespace(ns)); err != nil {
return err
func checkRolloutStatus(f cmdutil.Factory, ioStreams genericclioptions.IOStreams, labelSelector string) error {
Contributor:

This is fine for checking the latest rollout status, but it doesn't ensure that we actually have new pods, right? Should we introduce a check and verify that there are newly created pods after the restart?

Contributor:

This may be useful with the OLM-based deployments, since there we directly delete the pods and thus want to make sure new ones are created.
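
For instance (a sketch using the apimachinery wait helpers; oldPods, ns, and the label selector are assumptions taken from the surrounding snippets):

// Record the UIDs of the pods that are about to be deleted, then poll until
// every pod matching the selector carries a UID outside that set.
// Assumed imports: "time", "k8s.io/apimachinery/pkg/types",
// "k8s.io/apimachinery/pkg/util/wait", corev1 "k8s.io/api/core/v1".
before := map[types.UID]bool{}
for _, p := range oldPods.Items {
	before[p.UID] = true
}
err := wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, true,
	func(ctx context.Context) (bool, error) {
		pods := &corev1.PodList{}
		if err := cl.List(ctx, pods, runtimeclient.InNamespace(ns),
			runtimeclient.MatchingLabels{"kubesaw-control-plane": "kubesaw-controller-manager"}); err != nil {
			return false, err
		}
		if len(pods.Items) == 0 {
			return false, nil // replacement pods not created yet
		}
		for _, p := range pods.Items {
			if before[p.UID] {
				return false, nil // an old pod is still around
			}
		}
		return true, nil // only newly created pods remain
	})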

@MatousJobanek (Contributor) left a comment:

We also need to update the permissions in the corresponding Roles:

- kind: Role
  apiVersion: rbac.authorization.k8s.io/v1
  metadata:
    name: restart-deployment
    labels:
      provider: ksctl
  rules:
    - apiGroups:
        - apps
      resources:
        - deployments
      verbs:
        - "get"
        - "list"
        - "patch"
        - "update"

- kind: Role
  apiVersion: rbac.authorization.k8s.io/v1
  metadata:
    name: restart-deployment
    labels:
      provider: ksctl
  rules:
    - apiGroups:
        - apps
      resources:
        - deployments
      verbs:
        - "get"
        - "list"
        - "patch"
        - "update"

nonOlmDeployments := &appsv1.DeploymentList{}
if err := cl.List(ctx, nonOlmDeployments,
runtimeclient.InNamespace(ns),
runtimeclient.MatchingLabels{"provider": "codeready-toolchain"}); err != nil {
Contributor:

Looking at our code (and the deployments) in the member operator, we use the fully-qualified name of the provider label there:

toolchain.dev.openshift.com/provider: codeready-toolchain

But I see that there is a mismatch in the labels: reg-service uses the short version. Could you please fix it there?

Comment on lines 193 to 195
if err := cl.List(ctx, nonOlmDeployments,
runtimeclient.InNamespace(ns),
runtimeclient.MatchingLabels{"provider": "codeready-toolchain"}); err != nil {
Contributor:

Let's make sure that we don't restart the autoscaler deployment in the member cluster - this is not really needed.

Comment on lines 96 to 115
	for _, olmDeployment := range olmDeploymentList.Items {
		ctx.Printlnf("Proceeding to delete the Pods of %v", olmDeployment)

		if err := deleteAndWaitForPods(ctx, cl, olmDeployment, f, ioStreams); err != nil {
			return err
		}
	}
}
if len(nonOlmDeploymentlist.Items) != 0 {
	for _, nonOlmDeployment := range nonOlmDeploymentlist.Items {

		ctx.Printlnf("Proceeding to restart the non-OLM deployment %v", nonOlmDeployment)

		if err := restartNonOlmDeployments(ctx, nonOlmDeployment, f, ioStreams); err != nil {
			return err
		}
		// check the rollout status
		ctx.Printlnf("Checking the status of the rolled out deployment %v", nonOlmDeployment)
		if err := checkRolloutStatus(ctx, f, ioStreams, "provider=codeready-toolchain"); err != nil {
			return err
Contributor:

Let's try to simplify this. You can group some logic together and make it more generic. As I mentioned in the other comment, when we are getting the list of deployments, we don't have to care whether they are OLM or non-OLM based - we just:

  • list them based on the labels
  • check that the operator deployments are there (if not, then fail)
  • filter out the autoscaler one

then you can iterate over each of the deployments:

  1. check if the deployment has some owner-reference
    i. if it has, then delete the pods
    ii. if not, then use the rollout-restart logic
  2. wait for the rollout status

With this approach the logic gets simpler: only one for-loop, only one if-else statement, and the checkRolloutStatus function is called from only one place.
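
A minimal sketch of that shape, assuming a controller-runtime client; deletePods and labelSelectorOf are hypothetical helpers, and the restartedAt annotation patch is the same mechanism `kubectl rollout restart` uses under the hood:

// Assumed imports: "fmt", "time", "k8s.io/apimachinery/pkg/types",
// runtimeclient "sigs.k8s.io/controller-runtime/pkg/client".
for _, deployment := range deployments.Items {
	d := deployment // avoid capturing the loop variable
	if len(d.OwnerReferences) > 0 {
		// OLM-owned: delete the pods and let the ReplicaSet recreate them.
		if err := deletePods(ctx, cl, &d); err != nil { // hypothetical helper
			return err
		}
	} else {
		// Non-OLM: patch the pod-template annotation, which is what
		// `kubectl rollout restart` does.
		patch := fmt.Sprintf(
			`{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":%q}}}}}`,
			time.Now().Format(time.RFC3339))
		if err := cl.Patch(ctx, &d,
			runtimeclient.RawPatch(types.StrategicMergePatchType, []byte(patch))); err != nil {
			return err
		}
	}
	// Either way, wait for the rollout to finish before moving on.
	if err := checkRolloutStatus(f, ioStreams, labelSelectorOf(&d)); err != nil { // selector helper assumed
		return err
	}
}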

Comment on lines 142 to 144
ctx.Printlnf("Checking the status of the deleted pod's deployment %v", deployment)
//check the rollout status
if err := checkRolloutStatus(ctx, f, ioStreams, "kubesaw-control-plane=kubesaw-controller-manager"); err != nil {
Contributor:

The pods that the code is killing all belong to a single deployment, right? You don't need to call checkRolloutStatus for each one of them.

@@ -62,5 +62,5 @@ func UnregisterMemberCluster(ctx *clicontext.CommandContext, clusterName string)
 	}
 	ctx.Printlnf("\nThe deletion of the Toolchain member cluster from the Host cluster has been triggered")

-	return restartHostOperator(ctx, hostClusterClient, hostClusterConfig.OperatorNamespace)
+	return restart(ctx, clusterName)
Contributor:

The clusterName represents the member that is being unregistered; we want to restart the host only.

Comment on lines -18 to 21
-	deployment.Labels = map[string]string{"olm.owner.namespace": "toolchain-host-operator"}
+	deployment.Labels = map[string]string{"kubesaw-control-plane": "kubesaw-controller-manager"}

 	newClient, fakeClient := NewFakeClients(t, toolchainCluster, deployment)
 	numberOfUpdateCalls := 0
Contributor:

I'm surprised that this still works without any additional changes - most likely because you restart the member operator (which doesn't contain any deployments).

Comment on lines +200 to 202
type RolloutRestartRESTClient struct {
	*fake.RESTClient
}

Contributor:

Why do we need this wrapper?

Comment on lines 29 to 30
func TestRestartDeployment(t *testing.T) {
// given
Contributor:

Correct me if I'm wrong, but I haven't found anywhere in the tests a verification that the pods were really deleted and that the deployment was modified. Could you add some interceptor for this?
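
For example (a hedged sketch, assuming the fake client exposes a MockDelete hook analogous to the MockGet/MockUpdate hooks used earlier in these tests):

// Record every deletion so the test can assert that the operator pods were removed.
var deletedPods []string
fakeClient.MockDelete = func(ctx context.Context, obj runtimeclient.Object, opts ...runtimeclient.DeleteOption) error {
	deletedPods = append(deletedPods, obj.GetName())
	return fakeClient.Client.Delete(ctx, obj, opts...) // delegate to the embedded client (assumed)
}

// ...run the restart command, then:
require.NotEmpty(t, deletedPods, "expected the operator pods to be deleted")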

Comment on lines +89 to +111
fw := watch.NewFake()
dep := &appsv1.Deployment{}
dep.Name = deployment1.Name
dep.Status = appsv1.DeploymentStatus{
	Replicas:            1,
	UpdatedReplicas:     1,
	ReadyReplicas:       1,
	AvailableReplicas:   1,
	UnavailableReplicas: 0,
	Conditions: []appsv1.DeploymentCondition{{
		Type: appsv1.DeploymentAvailable,
	}},
}
dep.Labels = make(map[string]string)
dep.Labels[tc.labelKey] = tc.labelValue
c, err := runtime.DefaultUnstructuredConverter.ToUnstructured(dep.DeepCopyObject())
if err != nil {
	t.Errorf("unexpected err %s", err)
}
u := &unstructured.Unstructured{}
u.SetUnstructuredContent(c)
go fw.Add(u)
return true, fw, nil
Contributor:

Let's record for which resources (and how many times) this was called.
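
Something like this could work (a sketch that assumes the fake event above is emitted from a watch reactor registered on a client-go fake clientset; fakeClientset, u, and the assertion values are illustrative):

// Count reactor invocations per resource so the test can assert on them later.
// Assumed import: clienttesting "k8s.io/client-go/testing".
watchCalls := map[string]int{}
fakeClientset.PrependWatchReactor("deployments",
	func(action clienttesting.Action) (bool, watch.Interface, error) {
		watchCalls[action.GetResource().Resource]++ // e.g. "deployments"
		fw := watch.NewFake()
		go fw.Add(u) // the same fake deployment event as in the snippet above
		return true, fw, nil
	})

// later, in the assertions:
require.Equal(t, 1, watchCalls["deployments"],
	"the rollout-status check should watch deployments exactly once")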

codecov bot commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 62.62626% with 37 lines in your changes missing coverage. Please review.

Project coverage is 69.37%. Comparing base (bd2bf12) to head (f5c19de).

Files with missing lines Patch % Lines
pkg/cmd/adm/restart.go 62.24% 23 Missing and 14 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #79      +/-   ##
==========================================
- Coverage   69.66%   69.37%   -0.29%     
==========================================
  Files          43       43              
  Lines        2571     2596      +25     
==========================================
+ Hits         1791     1801      +10     
- Misses        589      596       +7     
- Partials      191      199       +8     
Files with missing lines Coverage Δ
pkg/cmd/adm/unregister_member.go 51.42% <100.00%> (ø)
pkg/cmd/adm/restart.go 55.17% <62.24%> (-4.17%) ⬇️
