Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly propagate orchestration results of a failed orchestration #2748

Merged

Conversation

sebastianburckhardt
Copy link
Collaborator

As diagnosed in #2743, the current code in OutOfProcMiddleware.CallOrchestratorAsync does not correctly propagate the results of a failed orchestration to the backend.

The problem is that when an orchestrator function fails, the OrchestratorExecutionResult that was constructed by the executor is ignored, and instead an alternate OrchestratorExecutionResult.ForFailure is constructed from the exception returned by the function result. This is a problem because

  1. the original failure details are lost instead of being persisted in the history.
  2. any orchestrator actions that took place prior to the failure (e.g. sending lock release messages) are ignored.

This PR fixes the problem by propagating the original OrchestratorExecutionResult that describes the failure (if there is one).

  • Otherwise: I've added my notes to release_notes.md
  • I have added all required tests (Unit tests, E2E tests)
  • My changes do not change the version of the WebJobs.Extensions.DurableTask package
  • My changes should be added to v3.x branch.

@lilyjma lilyjma requested review from jviau and cgillum March 14, 2024 18:59
@lilyjma
Copy link
Contributor

lilyjma commented Mar 14, 2024

@cgillum / @jviau - requested for your review because there are some customers asking about this. There's a DF release starting next week, so it'd be great if this fix could go out with that.

Copy link
Collaborator

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I'm working on a process for creating end-to-end tests. Is there an orchestration that you used for testing this fix which we can borrow and include as part of an end-to-end test? (Creating such a test can be done separately from this PR - I'm just asking now in case I forget later)

@sebastianburckhardt
Copy link
Collaborator Author

Is there an orchestration that you used for testing this fix which we can borrow and include as part of an end-to-end test?

Yes. All the tests are in here: #2612
The specific test that would have detected this problem is called FaultyCriticalSection.

@sebastianburckhardt sebastianburckhardt merged commit 7472dad into dev Mar 18, 2024
20 checks passed
@sebastianburckhardt sebastianburckhardt deleted the sebastian/fix-isolated-orchestration-error-propagation branch March 19, 2024 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants