Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support bluegreen release: rollout controller #232

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

myname4423
Copy link
Contributor

@myname4423 myname4423 commented Sep 12, 2024

Ⅰ. Describe what this PR does

the first patch is about rollout controller

  1. add a bluegreen release manager for bluegreen
  2. move some common functions of both bluegreen and canary to releaseManager.go

Ⅱ. Does this pull request fix one issue?

Ⅲ. Special notes for reviews

@myname4423 myname4423 changed the title support bluegreen release: webhook update support bluegreen release: rollout controller Sep 13, 2024
tr.RecheckDuration = time.Duration(trafficrouting.GetGraceSeconds(c.Rollout.Spec.Strategy.GetTrafficRouting(), defaultGracePeriodSeconds)) * time.Second
}
expectedTime := time.Now().Add(tr.RecheckDuration)
expectedTime := time.Now().Add(time.Duration(defaultGracePeriodSeconds) * time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change the code here ? if one set TrafficRoutingRef.GracePeriodSeconds, we should respect the user setting

Copy link
Contributor Author

@myname4423 myname4423 Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How the tr.RecheckDuration works:
if we do some traffic-related operations (eg. some functions in manager.go), the operations will update the tr.RecheckDuration automatically, if operation A need to recheck 300ms later, operations B need to recheck 500ms later, then tr.RecheckDuration will be the max of 300ms and 500ms. However, the initial value of tr.RecheckDuration is 0, if we doesn't do any traffic-related operations, the tr.RecheckDuration will keep the initial value. When RecheckTime is 0, requeue won't happen. For function DoTrafficRouting, it won't always update tr.RecheckDuration, thus the tr.RecheckDuration may be 0. That's why i added a check (the removed lines) before. However, I realize that

  1. the check won't save significant time due to the complex logic in the DoTrafficRouting function
  2. this check is a little hard to understand for others, that's why i removed these lines and use the original defaultGracePeriodSeconds again (ie. the removed are lines added by me in a recent commit, and the line to add is the original logic)

@@ -506,7 +505,10 @@ func (r *RolloutReconciler) recalculateCanaryStep(c *RolloutContext) (int32, err
if c.NewStatus != nil {
currentIndex = c.NewStatus.GetSubStatus().CurrentStepIndex - 1
}
steps := append([]int{}, int(currentIndex))
steps := make([]int, 0)
if ci := int(currentIndex); ci >= 0 && ci < len(c.Rollout.Spec.Strategy.GetSteps()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz comment when ci will be less than zero ?

Copy link
Contributor Author

@myname4423 myname4423 Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's unlikely to be less than 0, just to make sure the slice doesn't go out of index

@@ -94,6 +94,13 @@ func (r *ControllerFinder) GetWorkloadForRef(rollout *rolloutv1beta1.Rollout) (*
return workload, err
}
}
} else if rollout.Spec.Strategy.GetRollingStyle() == rolloutv1beta1.BlueGreenRollingStyle {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz add ut for GetWorkloadForRef

if workload != nil || err != nil {
return workload, err
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider extract the following code block as a common code

		for _, finder := range finders {
			workload, err := finder(rollout.Namespace, &workloadRef)
			if workload != nil || err != nil {
				return workload, err
			}
		}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

blueGreenStatus.CurrentStepState = v1beta1.CanaryStepStateUpgrade
klog.Infof("rollout(%s/%s) step(%d) state from(%s) -> to(%s)", c.Rollout.Namespace, c.Rollout.Name,
blueGreenStatus.CurrentStepIndex, v1beta1.CanaryStepStateInit, blueGreenStatus.CurrentStepState)
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider using a ’labeled’ fallthrough to avoid unnecessary requeue , e.g

if currentStep.Traffic == nil && len(currentStep.Matches) == 0 {
   ...
   goto nextCase
}
....
nextCase:
  fallthrough
case: v1beta1.CanaryStepStateUpgrade:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}
//TODO - consider istio subsets
if blueGreenStatus.CurrentStepIndex == 1 {
klog.Infof("special case detected: rollout(%s/%s) patch stable Service", c.Rollout.Namespace, c.Rollout.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not special case, we should always patch stable service here. consider revise or removing to logging to avoid confusion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

} else {
klog.Infof("rollout(%s/%s) bluegreen run all steps, and completed", c.Rollout.Namespace, c.Rollout.Name)
blueGreenStatus.LastUpdateTime = &metav1.Time{Time: time.Now()}
blueGreenStatus.CurrentStepState = v1beta1.CanaryStepStateCompleted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we fallthough to the case CanaryStepStateCompleted to avoid unnecessary reenqueue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

} else {
bluegreenStatus.LastUpdateTime = &metav1.Time{Time: time.Now()}
bluegreenStatus.CurrentStepState = v1beta1.CanaryStepStateInit
klog.Infof("rollout(%s/%s) step(%d) state from(%s) -> to(%s)", c.Rollout.Namespace, c.Rollout.Name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz move the following lines out of if clause since they can be shared by every if conditions

			bluegreenStatus.LastUpdateTime = &metav1.Time{Time: time.Now()}
			klog.Infof("rollout(%s/%s) step(%d) state from(%s) -> to(%s)", c.Rollout.Namespace, c.Rollout.Name,
				bluegreenStatus.CurrentStepIndex, currentStepStateBackup, bluegreenStatus.CurrentStepState)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a little hard to do so, here is why

  1. only if a step is done, can we run bluegreenStatus.LastUpdateTime = &metav1.Time{Time: time.Now()}, if not done, we cannot update the bluegreenStatus.LastUpdateTime.
  2. klog Infof is reasonable to not shared by every if conditions either, it can help us to realize the fallthrough, what's more, if we use a new variable currentStepStateBackup additionally, the total lines seems keep the same....

case v1beta1.FinalisingStepTypeRemoveCanaryService:
retry, err = m.trafficRoutingManager.RemoveCanaryService(tr)
// route all traffic to new version
case v1beta1.FinalisingStepTypeRouteAllTraffic:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider rename these finalizing steps so that the name can be more differentiating. now it is very hard to tell the difference between FinalisingStepTypeRouteAllTraffic and FinalisingStepTypeGateway, and it is hard to guess the meaning of step FinalisingStepTypeBatchRelease and FinalisingStepTypeGateway, consider rename them to
FinalisingStepTypeRestoreWorkload and FinalisingStepTypeRestoreGateway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok!

but CurrentStepIndex remains 2, which could cause out of range.
*/
resetCurrentIndex := false
if int(bluegreenStatus.CurrentStepIndex) > len(c.Rollout.Spec.Strategy.BlueGreen.Steps) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to forbid adding or removing steps during rollout so that the code can be more simpler?

// if current step is empty, set it with the first step
// if current step is end, we just return
if len(blueGreenStatus.FinalisingStep) == 0 {
blueGreenStatus.FinalisingStep = nextStep
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to move finalizing status to rollotu.status.conditions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the finished finalizing steps in saved in rollout.status.conditions, it is still possible to determine the current step in nextBlueGreenTask by iterating over taskSequence and find the last recorded step

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants