Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Errors in XState #4

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

RFC: Errors in XState #4

wants to merge 2 commits into from

Conversation

Andarist
Copy link
Member

@VanTanev
Copy link

VanTanev commented Aug 19, 2021

Overall, I'm very much in favor of immediately terminating when an unhandled exception occurs. It is by far the safest approach, instead of leaving the system in an indeterminate state.


My two cents on some specifis:

Effect of uncaught/unhandled errors on the XState runtime

100% agree that the program should terminate on unhandled errors
It's a bit wordy, it can probably be trimmed down by almost half

Actor hierarchy considerations

Super agree with recursive crashing until some parent actor handles the error, or crash the whole thing
Maybe elaborate on error.* - how do you propose that to work?

States hierarchy

The raise('EV') example is confusing - I personally had forgotten what raise() does, and probably a lot of people without deep familiarity with XState won't know what it does - It almost looks like it's trying to raise an exception! Does send('EV', { to: 'actor'}) do something similar, for the purposes of the example?
The example of how a transition might halt is very useful and important to understand the changes proposed. I do think that the trade-off is worth it.

Communicating intentional errors

This is good to have but should be an extra API on top of being able to raise normal errors of any type.

@davidkpiano
Copy link
Member

According to the SCXML semantics, errors are just events - internal or external ones, depending on the source of an error. This makes them processed like any other event of a given type.

I find this problematic. Consider a simple example like this: ...

With this section, I agree with the problem (that the next state is settled before executing actions which may throw) but I don't fully agree with the solution (execute actions while determining the next state).

Here's how I think it should go: machine.transition(...) should always deterministically return the potential next state as if none of the actions will throw errors. This doesn't mean that the interpreter will settle on this state; before settling on the state, the actions should be executed, and if one throws an error, then the current state should not change and the fail-fast behavior you described should be followed (raising an error event). This means that we are still in the state where onError can be applied as expected.

An example of this behavior, given this example machine:

createMachine({
  initial: 'A',
  states: {
    A: {
      on: { EVENT: { target: 'B', actions: ['action1', 'throws', 'action3'] } },
      onError: 'X'
    },
    B: {},
    X: {}
  }
});
service state machine state
initial state A state A
EVENT state A state B
(n actions)
state B is determined but isn't the settled state yet
❌ execute n actions (throws) state A state B
(n actions)
execute actions first before settling on state B (assume one throws)
raise error.* state A state X
(0 actions)
state X is determined (state A + error.* event via .onError)
✅ execute 0 actions state A state X no actions to execute, assume no errors
🏁 settle state X state X since no errors were thrown, only now settle the state

The way we can implement this is by inverting the order of updating the state in the interpreter; instead of settling on the state and then executing actions, we determine what the next state should be (before settling on it), execute actions, and then settle. Between executing actions and settling, an error may occur which may raise an event, and the new potential state will be determined based on the previously settled state and the raised event (eg., an error.* event).

@nobrayner
Copy link

This means that we are still in the state where onError can be applied as expected.

I may be completely on the wrong train of thought, but this only seems to work for exit and transition actions, as you are effectively still in the process of leaving the previous state. However for entry actions, aren't you effectively in the new state at that point in time?

This leads me to think that if exit or transition actions throw, then use the current state's onError; if entry actions throw, then use the next state's onError.


Though as I write this, there is the possibility of performing multiple exit, transtion, enter actions for different states... Any interim states between current and next. So that would then mean an action needs an associated source state, or at least have the correct onError transition associated with it. 🤔

@jbeard4
Copy link

jbeard4 commented Oct 7, 2021

Hi @Andarist,

Here are my thoughts on the RFC:

I agree with your criticism about the spec. The default behavior is to swallow JS errors, which is an easy gotcha, and not very friendly to new users. It puts the responsibility on the developer to explicitly watch for these errors, which is not a great design choice.

At the same time, I think catching errors and emitting them back into the state machine is a good design choice, because then you can handle the errors in your state machine logic, which is a useful thing to do. A common pattern I have implemented in SCXML is to create a top-level Error state, a top-level transition on "error.*", with generic error handling behavior that operates on _event.data (which is populated with a JavaScript Error object). You want to potentially capture this "recoverable error" state, as you want to respond to events with different behavior based on being in this state.

Finally, letting the application crash on error is not a bad idea. This reminds me of Erlang's philosophy, which is to just let the program crash and resurrect itself into a good state.

Therefore, the approach I like best with regard to handling errors in actions, is to:

  1. Raise a error.execution event back into the state machine. This facilitates compliance with the specification, and also allows the state machine the option of handling the error with its internal logic.

  2. If the error event is not handled by the state machine by the end of the macrostep (according to normal state machine execution rules - the error event is raised and added to the internal event queue, and processed in the same order of events), then the interpreter should re-throw the error. This has the advantage that the error event will not be swallowed by the interpreter. This forces the user to handle unhandled errors, and then the user can decide what to do with it (i.e. they could wrap it in a try/catch and choose to ignore it, or let the program crash and restart if they want to).

Thank you for your time and attention to this.

@Andarist
Copy link
Member Author

Overall, I'm very much in favor of immediately terminating when an unhandled exception occurs. It is by far the safest approach, instead of leaving the system in an indeterminate state.

That's exactly my line of thinking 👍

Maybe elaborate on error.* - how do you propose that to work?

In v5 we have introduced "partial" descriptors, so for instance on: {'foo.*': {}} would match foo.bar and foo.bar.xyz. And since errors would "just" be events you could still write "custom" handlers/transitions for them, based on the naming conventions cause their types would always start with error.

The raise('EV') example is confusing - I personally had forgotten what raise() does, and probably a lot of people without deep familiarity with XState won't know what it does - It almost looks like it's trying to raise an exception! Does send('EV', { to: 'actor'}) do something similar, for the purposes of the example?

No, this example is specifically illustrating a problem with internal events. Internal events (raised ones) are basically always resolved first. Imagine that XState maintains 2 message queues:

internalQueue: [],
externalQueue: [],

and events can be sent to a particular machine at any time. We always "drain" the internalQueue before even checking if there is anything to be handled in the externalQueue. Also note that while processing internalQueue we still accept events to be pushed to it, so, in theory, it's possible that we won't ever get to processing the external events, if the machine would constantly raise internal events. This would be somewhat an equivalent of an infinite loop though - so in practice this doesn't happen often and is usually a sign of a problem with the statechart.

This is good to have but should be an extra API on top of being able to raise normal errors of any type.

Agreed, unhandled errors might always happen and we must be able to capture and process them.

Here's how I think it should go: machine.transition(...) should always deterministically return the potential next state as if none of the actions will throw errors. This doesn't mean that the interpreter will settle on this state; before settling on the state, the actions should be executed, and if one throws an error, then the current state should not change and the fail-fast behavior you described should be followed (raising an error event). This means that we are still in the state where onError can be applied as expected.

I don't think this would play out well in practice. Consider this example:

createMachine({
  initial: 'A',
  states: {
    A: {
      on: { EVENT: 'B },
      onError: 'X'
    },
    B: {
      entry: ['action1', 'throws', 'action3'],
      onError: 'Y'
    },
    X: {},
    Y: {},
  }
});

What would you expect, as the reader, of this? Would we get to X or Y state? IMHO the reasonable approach is to end up in Y but you kinda propose ending up in X.

I think it's very important to allow for "local" reasoning. If I take a look at any particular action I should be able to only check state node on which this action is defined (and its ancestors) for onError handlers. The rest of the state tree should always be irrelevant for that - cause if we make it relevant the we, as readers, have to consider all of the statechart to understand which error handler can be called based on an error thrown by a given action. This doesn't scale and would be very, very hard to do "manually" for complex/bigger cases. By focusing on the "definition state path" we limit the possibilities and we don't overload the reader with information as this is very easy to scan (especially on the viz).

Wehn I think about I see states being wrapped with try/catches and it builds a familiar and nice mental model for this:

try {
  try {

  } catch (err) {

  }
} catch (err) {

}

Notice how this structure resembles the state tree! It's also hierarchical - so if I imagine this imaginary thing then it just "fits" nicely together:

{
  initial: 'A',
  states: {
    // A      
    try {
      on: { EVENT: 'B },
    } catch (err) {
      'X'
    },
    // B
    try {
      entry: ['action1', 'throws', 'action3'],
    } catch (err) {
      'Y'
    },
    // X
    try {} catch (err) {},
    // Y
    try {} catch (err) {},
  }
}

By looking at this I can always expect going into Y block when something throws in the B block.

I may be completely on the wrong train of thought, but this only seems to work for exit and transition actions, as you are effectively still in the process of leaving the previous state. However for entry actions, aren't you effectively in the new state at that point in time?

@nobrayner I believe that you are agreeing here with me and with the example that I've just put here, in this comment, right? Could you confirm that we are on the same page? Or maybe I've misunderstood you.

This leads me to think that if exit or transition actions throw, then use the current state's onError; if entry actions throw, then use the next state's onError.

If we consider what "state" is in XState then I would classify this as slightly incorrect approach. I propose to use "state nodes" as the base for onError selection. Cosnider this:

createMachine({
  initial: 'A',
  states: {
    A: {
      on: { EVENT: 'B },
    },
    B: {
      entry: ['action1', 'throws', 'action3'],
      onError: 'X'
      states: {
        C: {
          onError: 'Y'
        } 
      }
    },
    X: {},
    Y: {},
  }
});

I think that in this exaample, it makes sense to transition to X because from the reader's PoV the C state has not yet been "entered" when the error was thrown by the throws action. However, traditionally in XState, the "next" state here would be the one containing C state node. So it's an important distinction for me to use state nodes for this selection, and not states.

Though as I write this, there is the possibility of performing multiple exit, transtion, enter actions for different states... Any interim states between current and next. So that would then mean an action needs an associated source state, or at least have the correct onError transition associated with it. 🤔

Yes, this :P (sorry, I just go through all comments top-down and respond bit by bit). So I guess we are also on the same page after all, when it comes to the example above.


At the same time, I think catching errors and emitting them back into the state machine is a good design choice, because then you can handle the errors in your state machine logic, which is a useful thing to do. A common pattern I have implemented in SCXML is to create a top-level Error state, a top-level transition on "error.*", with generic error handling behavior that operates on _event.data (which is populated with a JavaScript Error object). You want to potentially capture this "recoverable error" state, as you want to respond to events with different behavior based on being in this state.

Yep, this reflects my tran of thought. I want to make users to explicitly implement those top-level transitions for error.* so they can think about designing their state machine logic when errors happen.

Finally, letting the application crash on error is not a bad idea. This reminds me of Erlang's philosophy, which is to just let the program crash and resurrect itself into a good state.

I won't lie - Erlang's philosophy was a good validation metric for me when proposing this :)

Raise a error.execution event back into the state machine. This facilitates compliance with the specification, and also allows the state machine the option of handling the error with its internal logic.

I've mentioned in the RFC that I find it a little bit problematic that those error events in SCXML are "just" raised. This introduces risks because other actions in the current microstep (and even in next microsteps if the internalQueue isnt currently empty) to just "continue". I propose to make those raised error events of even a higher priority - so the transition would be selected based on it ASAP. On top of that I propose to skip executing action3 in this example:

entry: ['action1', 'throws', 'action3'],

What are your thoughts on that?

If the error event is not handled by the state machine by the end of the macrostep (according to normal state machine execution rules - the error event is raised and added to the internal event queue, and processed in the same order of events), then the interpreter should re-throw the error.

Where it should be rethrown though? How the user is supposed to "catch" them? Should they be able to prevent machine failure from such a catch handler that would not be the part of machine's transitions? And if you agree with the "let the program crash" philosophy, then shouldnt we... crash that machine? And notify its parent (if there is any)?

@dc0d
Copy link

dc0d commented Mar 11, 2022

Context: xstate/statecharts noob + used state machines in different context / wanted to try xstate on server-side inside some domain service (not UI facing)

♥️ I believe that the only reasonable thing to do when dealing with an uncaught exception is to terminate the program.

♥️ I propose to treat all errors the same /

❌ adding onError /

Any error that has no clear meaning in the domain should halt the state machine and propagate up to the application - halt as killed/frozen; no state changes or anything.

On the server-side, most of the non-domain errors are fallacies of distributed computing. Timeouts are there; retries are needed. The core service has to guarantee the consistency of an aggregate (inside its boundaries) - so it also has to be idempotent while reacting to domain events and such.

(IMHO) Those are not the concerns that should be covered by xstate. So, the resulted errors should not be handled inside the xstate (or be represented - in any way - inside xstate - injected behaviors aside which could be invoked inside some action and is another topic).

The way I see xstate is as an actor that represents an aggregate/entity inside a domain - and comes to life only in response to meaningful events.

Unless it is taking a path of becoming some "cloud-native"y, all-in-one solution (I am not saying it would be bad or good. It's just a very different thing).

(I could be wrong)

@mattpocock
Copy link

@Andarist Would love some more details about how spawn works with this RFC. I'm guessing that spawned machines do this when they error:

  1. Escalate the error to the parent machine
  2. The parent machine checks which state it's in and calls the appropriate onError.

This feels like it makes sense to me. It would be more ergonomic to add an onError to the spawn itself, but this breaks the rules of statecharts (can't specify a dynamic transition).

@davidkpiano
Copy link
Member

We should also preserve the original error message (including stack trace) as much as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants