Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add timeout for pausing pipeline. Fixes #992 #1138

Merged
merged 32 commits into from
Oct 10, 2023

Conversation

dpadhiar
Copy link
Contributor

@dpadhiar dpadhiar commented Oct 2, 2023

Explain what this PR does.

Fixes #992

Allows user to add timeout (grace period) for pausing a pipeline. Default is 30 seconds and can be configured in spec.Lifecycle.PauseGracePeriodSeconds.

pkg/apis/numaflow/v1alpha1/pipeline_types.go Show resolved Hide resolved
pkg/reconciler/pipeline/controller.go Outdated Show resolved Hide resolved
pkg/reconciler/pipeline/controller.go Outdated Show resolved Hide resolved
pkg/reconciler/pipeline/controller.go Outdated Show resolved Hide resolved
pkg/reconciler/pipeline/controller.go Outdated Show resolved Hide resolved
@dpadhiar dpadhiar changed the title feat: add timeout for pausing pipeline (WIP) feat: add timeout for pausing pipeline. Fixes #992 Oct 4, 2023
@dpadhiar dpadhiar marked this pull request as ready for review October 4, 2023 18:55
@dpadhiar dpadhiar requested a review from vigith as a code owner October 4, 2023 18:55
@dpadhiar dpadhiar requested a review from whynowy October 4, 2023 18:55
@dpadhiar dpadhiar requested a review from whynowy October 4, 2023 21:27
@dpadhiar dpadhiar requested a review from whynowy October 5, 2023 16:21
@dpadhiar dpadhiar requested a review from whynowy October 5, 2023 23:33
@whynowy
Copy link
Member

whynowy commented Oct 6, 2023

@dpadhiar - I tested the change, the annotation causes one extra reconciliation. Can you look into it?

@dpadhiar
Copy link
Contributor Author

dpadhiar commented Oct 6, 2023

@dpadhiar - I tested the change, the annotation causes one extra reconciliation. Can you look into it?

With a log, I am seeing that using update causes the reconciliation to happen twice when resuming the pipeline. When pausing the pipeline, reconciliation occurs twice to change the Status from Running to Pausing to Paused which is expected.

If we use patch instead of update, reconciliation only occurs once when resuming.

if pl.GetAnnotations() == nil || pl.GetAnnotations()[dfv1.KeyPauseTimestamp] == "" {
		pl.SetAnnotations(map[string]string{dfv1.KeyPauseTimestamp: time.Now().Format(time.RFC3339)})
		body, err := json.Marshal(pl)
		if err != nil {
			return false, err
		}
		err = r.client.Patch(ctx, pl, client.RawPatch(types.MergePatchType, body))
		if err != nil && !apierrors.IsNotFound(err) {
			return false, err
		}
	}

Signed-off-by: Dillen Padhiar <[email protected]>
pkg/reconciler/pipeline/controller.go Outdated Show resolved Hide resolved
}
err = r.client.Patch(ctx, pl, client.RawPatch(types.MergePatchType, body))
if err != nil && !apierrors.IsNotFound(err) {
return false, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we should let it re-queue, right?

pkg/reconciler/pipeline/controller.go Show resolved Hide resolved
pkg/reconciler/pipeline/controller.go Outdated Show resolved Hide resolved
pkg/reconciler/pipeline/controller.go Show resolved Hide resolved
Copy link
Member

@whynowy whynowy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the rest looks good to me, except the one commented.

err := r.client.Patch(ctx, pl, client.RawPatch(types.JSONPatchType, []byte(dfv1.RemovePauseTimestampPatch)))
if err != nil && !apierrors.IsNotFound(err) {
return true, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a tiny issue here:
if err != nill && IsNotFound, return false, nil

Somehow the pipeline can not be found, and we skip it.

Copy link
Contributor Author

@dpadhiar dpadhiar Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if err != nil {
  if apierrors.IsNotFound(err) {
    return false, nil // skip pl if we can't find it
  } else {
      return true, err // otherwise requeue resume
  }
}

@dpadhiar dpadhiar requested a review from whynowy October 10, 2023 17:04
Copy link
Member

@whynowy whynowy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@dpadhiar
Copy link
Contributor Author

@whynowy Not sure why this jetstream e2e test continues to fail, otherwise should be done.

@whynowy whynowy merged commit b962f8e into numaproj:main Oct 10, 2023
16 of 17 checks passed
whynowy pushed a commit that referenced this pull request Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No grace period (timeout) for pausing a pipeline
2 participants