feat!: add app.state #80

Open · fubuloubu wants to merge 9 commits into main from feat/add-app-state

Conversation

@fubuloubu (Member) commented May 5, 2024

What I did

This PR adds worker-shared state (accessible via `app.state`) plus 2 new system tasks to support loading and storing a snapshot of that managed state. Currently the snapshot contains just `last_block_processed` and `last_block_seen`.
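For illustration only (this sketch is not part of the PR's diff): a task handler reading and writing the shared state. The `SilverbackApp` import and `@app.on_(chain.blocks)` decorator are assumed from the existing SDK, and `blocks_handled` is a made-up attribute name:

```python
from ape import chain
from silverback import SilverbackApp

app = SilverbackApp()


@app.on_(chain.blocks)
def handle_block(block):
    # Read a value previously stored on the worker-shared state,
    # defaulting to 0 the first time this task runs.
    count = getattr(app.state, "blocks_handled", 0)

    # Write back to the shared state; other tasks on this worker see the update.
    app.state.blocks_handled = count + 1
    return {"blocks_handled": app.state.blocks_handled}
```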

This paves the way for the `Parameter` feature, but does not include that just yet.

BREAKING CHANGE: state snapshotting migrates from the runner to the worker and must be triggered by the runner

How I did it

How to verify it

Checklist

  • Passes all linting checks (pre-commit and CI jobs)
  • New test cases have been added and are passing
  • Documentation has been updated
  • PR title follows Conventional Commit standard (will be automatically included in the changelog)

@fubuloubu requested a review from mikeshultz May 5, 2024 02:23
@fubuloubu changed the title from "feat: add app.state and parameters API" to "feat!: add app.state and parameters API" May 5, 2024
@mikeshultz (Member) left a comment

Took a quick initial pass. Kind of wish "state" wasn't used for so many different things, it's getting confusing.

Inline review threads on silverback/application.py and silverback/runner.py (resolved)
@fubuloubu left a comment that was later marked as outdated.

@fubuloubu force-pushed the feat/add-app-state branch 2 times, most recently from 20ee2b1 to 0af18b8 on May 30, 2024 23:47
@fubuloubu changed the title from "feat!: add app.state and parameters API" to "feat!: add app.state" May 30, 2024
@fubuloubu marked this pull request as ready for review May 30, 2024 23:55
@fubuloubu (Member Author) commented:

So given that distributed TaskIQ sequencing currently operates via multi-threading (and not multi-processing), this approach of worker-managed application state is okay as long as multiple workers aren't trying to write the same state at the same time. We may need to add a lock to the AppState datatype to ensure this, but in terms of the mechanism it will function correctly as long as there is only one container with one process operating the multi-threaded workers.
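To make the locking concern concrete, here is a rough sketch (not part of this PR) of guarding a read-modify-write on shared state; the `bump_nonce` helper and `state.nonce` attribute are illustrative, not SDK APIs:

```python
import threading

_state_lock = threading.Lock()


def bump_nonce(state) -> int:
    # Without the lock, two worker threads could both read the same value and
    # both write value + 1, silently losing an increment. Holding the lock for
    # the whole read-modify-write serializes concurrent writers.
    with _state_lock:
        current = getattr(state, "nonce", 0)
        state.nonce = current + 1
        return state.nonce
```

A lock baked into the AppState datatype would make single attribute writes atomic, but compound updates like this would still need their own guard.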

@fubuloubu requested a review from mikeshultz May 30, 2024 23:58
@mikeshultz previously approved these changes May 31, 2024
Inline review threads on example.py (outdated, resolved)
@fubuloubu (Member Author) commented May 31, 2024

TODO:

  • write documentation about `app.state` and how it gets used

@mikeshultz previously approved these changes Aug 16, 2024

@mikeshultz (Member) left a comment

I still don't really understand why the datastore is on the worker side. I'm going to have to basically re-invent that whole thing for the platform side, more or less reverting or duplicating some of this design. But I think a bunch of this was already touched in previous discussions so probably pointless to mention.

Will try and figure out how to make this work on the cluster side when I get a chance.

@fubuloubu (Member Author) commented Aug 17, 2024

> I still don't really understand why the datastore is on the worker side.

Originally I don't think I actually understood what you were saying, but yes, it is very clear to me now that this needs to be inverted. Having the application state live worker-side is what I wanted here, but the state snapshot loading and backup should be performed entirely by the runner client. My mistake.

I hope you agree the solution should be to update the startup system task to accept a startup_state parameter it receives from the runner and use it to update the local state on startup; update the snapshot system task to just return the state snapshot back to the runner, which writes it to disk; and lastly, move the AppDatastore back to the SDK's BaseRunner (and maybe make that a configurable ABC if you need to use it downstream, up to you).
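Roughly, the proposed shape; the function names and signatures below are illustrative, not the SDK's actual task identifiers:

```python
# Worker side (illustrative): two system tasks that only touch in-memory state.

def load_startup_state(state: dict, startup_state: dict) -> None:
    # The runner loads the last snapshot from its datastore and passes it in;
    # the worker just seeds its local state with it on startup.
    state.update(startup_state)


def create_snapshot(state: dict) -> dict:
    # Return the current snapshot to the runner, which owns the AppDatastore
    # and writes it to disk at each checkpoint.
    return {
        "last_block_seen": state.get("system:last_block_seen", -1),
        "last_block_processed": state.get("system:last_block_processed", -1),
    }
```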

Commits pushed: add 2 new system tasks to load and store a snapshot of state (currently just `last_block_processed` and `last_block_seen`), paving the way for the `Parameter` feature; BREAKING CHANGE: state snapshotting migrates from runner to worker; also updated docstrings within `silverback/application.py`.
@fubuloubu (Member Author) commented:

> I still don't really understand why the datastore is on the worker side. I'm going to have to basically re-invent that whole thing for the platform side, more or less reverting or duplicating some of this design. But I think a bunch of this was already touched in previous discussions so probably pointless to mention.
>
> Will try and figure out how to make this work on the cluster side when I get a chance.

@mikeshultz resolved in 07dd9f9

@mikeshultz (Member) left a comment

One question on design inline. I'm good with it as is though.

The inline question below refers to this diff excerpt:

```python
# TODO: Migrate these to parameters (remove explicitly from state)
last_block_seen=self.state.get("system:last_block_seen", -1),
last_block_processed=self.state.get("system:last_block_processed", -1),
)
```
@mikeshultz (Member) commented:

question: Do we have future plans for this handler? This more or less is just the runner sending these two values to the worker to send back to the runner.

Or maybe we intend to make this available to user code since it's called every checkpoint?

@fubuloubu (Member Author) replied Aug 26, 2024:

State snapshots in general will be created using this mechanism (parameters will make use of this), but I added `system:` to the front of these specific internal state variables so that they cannot conflict with any attribute added to state directly by users (e.g. `app.state.something = ...` by definition cannot start with `system:`).

One of my ideas when adding parameters is that those are only allowed to be specific types, and only parameters will end up in state snapshots that get communicated back to the cluster services. All other non-parameter state will not be backed up in snapshots.

There may be other features that make use of this, like nonce tracking, although those may need better concurrency guarantees (like using a lock).
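A small illustration of that namespacing argument (a toy model, not the SDK's state implementation):

```python
# User writes go through attribute access, so their keys are Python identifiers:
state = {}
state["something"] = "user value"        # what `app.state.something = ...` stores

# Internal bookkeeping keys are always built with the reserved prefix:
state["system:last_block_seen"] = 123

# An attribute name can never contain ":", so user code can never shadow
# or overwrite a reserved system key:
assert not "system:last_block_seen".isidentifier()
```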
