Consider how to handle case that a pipeline cannot be drained before timeout #74

Closed
xdevxy opened this issue Jun 21, 2024 · 4 comments
Labels
enhance PPND enhancement New feature or request

Comments

xdevxy (Collaborator) commented Jun 21, 2024

Summary

If a pipeline cannot be drained before the timeout, consider ways to update the pipeline without data loss, e.g. automatic replay from the last successfully processed offset.


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

xdevxy added the "enhancement" (New feature or request) label Jun 21, 2024
juliev0 (Collaborator) commented Sep 13, 2024

Decisions from meeting with @vigith, @whynowy, and @xdevxy:

  • Ultimately, the user needs to be able to configure whether their Pipeline times out, although at Intuit the default will be not to time out.
  • For now, "not timing out" can be configured as simply an extremely long drain time in the Pipeline.
  • If the user chooses to time out and the drain isn't completed by then: the Pipeline will go to the "Paused" state, PipelineRollout can record in its Status that the Pipeline didn't fully drain (unless we think this is redundant with what the Pipeline shows?), and Numaplane will update the Pipeline, which will then be back to "Running". The CI will pass. <-- or do we think it should not?

If the user chooses not to time out:

  • The Pipeline will remain in the "Pausing" phase, the PipelineRollout will remain in the "Pending" phase, and the Jenkins job will time out, so CI fails.
  • If the Pipeline eventually finishes pausing, ArgoCD will show that the PipelineRollout and Pipeline are good again. The user would just need to re-trigger the job.
  • If the Pipeline does not finish, the user needs a way to forcibly re-apply the spec without pausing first. Ideally, we would like to do this through an ArgoCD extension (@whynowy will follow up with Darshan/ArgoCD to see if the user will have permission to write to the Rollout spec), in which case the extension can add an annotation to the PipelineRollout to indicate that the Pipeline needs to be reapplied without pausing. If Numaplane sees this annotation, it unpauses the Pipeline and updates it directly. If we follow the principle that Numaplane does not modify annotations, then ArgoCD would need to periodically check the Rollout object to determine whether things are good again so that it can remove the annotation.
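
To make the decisions above easier to compare, here is a minimal Python sketch of how they might fit together as a single decision function. It is not Numaplane code: the `RolloutView` type, the `Action` names, and the annotation key `example.com/force-apply-without-pause` are all hypothetical and were invented for illustration. A timeout of `None` stands in for "never time out".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional

# Hypothetical annotation key, invented for this sketch only.
FORCE_APPLY_ANNOTATION = "example.com/force-apply-without-pause"


class Action(Enum):
    WAIT = "wait"                        # stay in "Pausing"; rollout stays "Pending"
    APPLY = "apply"                      # drain finished: apply the new spec
    APPLY_UNDRAINED = "apply_undrained"  # timed out: apply, record the incomplete drain
    FORCE_APPLY = "force_apply"          # user asked to re-apply without pausing


@dataclass
class RolloutView:
    """Hypothetical snapshot of what the controller knows about one rollout."""
    drained: bool
    seconds_pausing: float
    drain_timeout_seconds: Optional[float] = None  # None means never time out
    annotations: Dict[str, str] = field(default_factory=dict)


def next_action(view: RolloutView) -> Action:
    # A finished drain always wins: nothing needs to be forced.
    if view.drained:
        return Action.APPLY
    # The force annotation lets the user re-apply without waiting for the drain.
    if view.annotations.get(FORCE_APPLY_ANNOTATION) == "true":
        return Action.FORCE_APPLY
    # With a timeout configured, give up on draining once it has elapsed.
    timeout = view.drain_timeout_seconds
    if timeout is not None and view.seconds_pausing >= timeout:
        return Action.APPLY_UNDRAINED
    # Otherwise keep waiting (the CI job may time out in the meantime).
    return Action.WAIT
```

In this sketch the annotation is only read, never written, which matches the principle that Numaplane does not modify annotations; removing it would be left to whoever added it.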

juliev0 (Collaborator) commented Sep 13, 2024

We also need to consider the case of the Numaflow Controller updating and the isbsvc updating. If Pipelines are set not to time out, then the NumaflowControllerRollout and ISBServiceRollout will be considered "Progressing" in ArgoCD, and the Jenkins pipeline will time out. The user can take the same corrective actions with their Pipelines as listed above.

juliev0 (Collaborator) commented Sep 20, 2024

I feel like the simplest thing we could do initially, which would take care of most needs, is to make the timeout value configurable on the PavedRoad side. The user can either make it fairly short (~5 minutes) or make it something like 1 hour. If a Pipeline doesn't drain within 1 hour, there is likely something broken. In that case, if there is not yet a capability to force the Pipeline to be reapplied, the user would need to wait out the full hour before their Pipeline is reapplied and starts running again.
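
As a rough illustration of what a configurable timeout implies for the user, the Python sketch below parses a timeout value and computes how long a stuck Pipeline would still have to wait before being reapplied. The string format (`"5m"`, `"1h"`) and both function names are assumptions made for this example; the actual PavedRoad setting may look quite different.

```python
import re

# Seconds per unit for the assumed "<number><unit>" timeout format.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_timeout(value: str) -> int:
    """Convert a value such as "5m" or "1h" into seconds (assumed format)."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported timeout value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def remaining_wait_seconds(timeout: str, seconds_pausing: int) -> int:
    """Seconds left before a Pipeline that never drains would be reapplied."""
    return max(0, parse_timeout(timeout) - seconds_pausing)
```

For example, with a timeout of `"1h"`, a Pipeline that has been pausing for 10 minutes would wait another 3000 seconds in this model, whereas with `"5m"` it would already be eligible for reapplication.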

juliev0 (Collaborator) commented Sep 25, 2024

I have opened a new issue to track the work on the Numaplane Backend side here: #295

Therefore, closing this one.
