- Feature Name: Clustering - Stage 1
- Start Date: 2023-11-14
- Tremor Issue: [tremor-rs/tremor-runtime#2219](https://github.com/tremor-rs/tremor-runtime/pull/2219)
- RFC PR: [tremor-rs/tremor-www#308](https://github.com/tremor-rs/tremor-www/pull/308)

## Summary
[summary]: #summary

This RFC covers the introduction of the most basic clustering functionality to Tremor. The new features are a shared KV store (exposed as a connector), cluster membership management, and cluster application deployment (in an all-node or single-node fashion). No further functionality is introduced in this RFC; expansions on the clustering functionality are out of scope and expected to be handled in further RFCs.

## Motivation
[motivation]: #motivation

In scenarios with a significant event load, it is easy to saturate a single pipeline's capability to process all the data in real time. As of today, the workaround is to deploy multiple independent Tremor nodes, each taking a part of the event load. This is not ideal, as it requires manual intervention to scale up and down and does not allow a single Tremor instance to be used as a single point of entry for all events.

In addition, methods for storing dynamic configuration, such as files or the `kv` connector, are limited to a single instance, and configuration changes need to be made separately on every node in a multi-node deployment.

The distributed application framework, the cluster management, and the shared datastore resolve those issues by allowing a cluster deployment to be treated as a single entity that can share applications and configuration.

Aside from the deprecation of the old, largely defunct API for remotely managing nodes, this change does not affect existing use cases: single-node Tremor instances can still be run and are unaffected by this RFC or the changes it introduces.

## Guide-level Explanation
[guide-level-explanation]: #guide-level-explanation

Stage 1 clustering consists of three main components:

### Cluster Membership Management

The cluster membership is managed by Raft consensus, a strongly consistent algorithm that ensures any committed change will persist as long as a majority of the cluster's nodes are reachable. To interact with this, the CLI introduces a new `cluster` namespace for creating, joining, leaving, and inspecting the cluster; the implementation details of the CLI are out of scope for this RFC, as they will change fluidly.

### Shared KV store

Based on the same Raft consensus mechanism, a key-value store is implemented. This store serves as a place to store and share cluster configuration and applications, and it builds the foundation of the functionality provided by the cluster. An API for inspecting the KV store is exposed for testing purposes.

In addition, a new `cluster_kv` connector is introduced that allows Tremor pipelines to interact with the KV store, giving a pipeline access to data that is shared between all of its incarnations in a cluster. It has simple `read`, `consistent read`, and `write` semantics.

The `cluster_kv` connector is not meant to replace a dedicated database; it is intended for situations where some simple dynamic information needs to be stored and occasionally retrieved by a pipeline.
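
As a rough sketch, wiring the connector into a flow could look as follows. This is an assumption, not a fixed interface: the RFC does not pin down the configuration surface of `cluster_kv`, so the definition below is modeled on the existing `kv` connector, minus any node-local path (the data lives in the Raft-replicated store).

```troy
define flow kv_example
flow
  use tremor::pipelines;

  # Assumed definition; unlike the node-local `kv` connector, no path or
  # other node-specific configuration should be required, since the data
  # lives in the cluster-wide store.
  define connector shared from cluster_kv;
  create connector shared;

  create pipeline passthrough from pipelines::passthrough;

  # Read, consistent-read, and write commands travel as events into the
  # connector; results come back out as events.
  connect /pipeline/passthrough to /connector/shared;
  connect /connector/shared to /pipeline/passthrough;
end;
```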

### Tremor Applications

Deploying into a cluster comes with some new challenges, the biggest one being that there is no guarantee that multiple nodes are identical. Historically, since modularity was introduced, a pipeline depended on the standard library along with any additional libraries that were installed on the machine the pipeline ran on.

This method does not work well in a cluster. For one, the system initiating the deployment is likely not part of the cluster but an administrative system, meaning a user would have to keep that system's libraries directly in sync with the cluster. In addition, the cluster nodes are not guaranteed to be identical: they may be deployed at different times and differ in available libraries.

To resolve this problem, clustering stage 1 introduces the concept of a **tremor application**. This is essentially an archive that contains all the code, libraries, configuration, and metadata required to run a pipeline, packed as a tar archive (fortunately the naming works both with the file format and with it being a Tremor ARchive ;) ). This archive is deployed to the cluster and can be started via the API. The cluster will then ensure the application is started on all nodes (or, in the case of a single-node application, on a single node).

## Reference-level Explanation
[reference-level-explanation]: #reference-level-explanation

### Tremor Application

The tremor application archive is a tar file with some well-defined entries:

- `app.json` - a JSON file containing the application metadata
- `main.troy` - the main entry point to the application, containing at least one flow definition
- any additional libraries referenced in `main.troy`

#### `main.troy`

The entry point file needs to contain at least one flow definition; any deploy statements will be ignored, as deployment of an application is handled via the API.

The flow `main` takes on a special meaning as the default flow of the application. When starting an application, this flow is started by default. It is, however, possible to provide more than one flow, or a flow with a name other than `main`; those flows can then be started separately via the API.

A flow's arguments, defined via its `args` clause, can be provided via the API when starting the flow.
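
As a sketch, a minimal `main.troy` with a default flow and one additional flow might look as follows; the flow bodies are ordinary standard-library plumbing, and `greeting` and `side` are invented names used purely for illustration:

```troy
# Default flow: started automatically when the application is started.
define flow main
args
  # Invented argument; its value can be overridden via the API when the
  # flow is started.
  greeting = "hello"
flow
  use tremor::{connectors, pipelines};
  create connector console from connectors::console;
  create pipeline passthrough from pipelines::passthrough;
  connect /connector/console to /pipeline/passthrough;
  connect /pipeline/passthrough to /connector/console;
end;

# Additional flow: not started by default, but can be started separately
# via the API. Note that any `deploy` statement in this file would be
# ignored; deployment happens through the API.
define flow side
flow
  use tremor::{connectors, pipelines};
  create connector console from connectors::console;
  create pipeline passthrough from pipelines::passthrough;
  connect /connector/console to /pipeline/passthrough;
  connect /pipeline/passthrough to /connector/console;
end;
```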

#### `app.json`

This metadata file is automatically created and contains details on the application, its flows, its dependencies, and the files in the archive. It is not meant to be manually created or modified.

#### Included libraries

These libraries are automatically bundled into the archive and indexed in `app.json`. When the application is deployed, they take the place of the libraries installed on the node; this means that no node-local files are consulted when deploying an application.
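
For illustration, a `main.troy` that pulls in a local module (here `my_lib` is a hypothetical library name) would have that module bundled at packaging time rather than resolved from the node's library path:

```troy
define flow main
flow
  use tremor::connectors;
  # `my_lib::pipelines` is a hypothetical local module; at packaging time
  # it is copied into the archive and indexed in `app.json`, so the node's
  # own library path is never consulted when the application is deployed.
  use my_lib::pipelines;
  create connector console from connectors::console;
  create pipeline enrich from pipelines::enrich;
  connect /connector/console to /pipeline/enrich;
  connect /pipeline/enrich to /connector/console;
end;
```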

## Drawbacks
[drawbacks]: #drawbacks

Since this is a pure addition and not a change, there are no drawbacks compared to prior versions of Tremor; however, there are some possible drawbacks in the approach taken.

* Strong consistency is a demanding constraint for node management: it poses the problem that, in the face of a partial outage, cluster reconfiguration might not be possible. This is, however, just that: a tradeoff. The alternative of an eventually consistent backend for all configuration imposes other challenges, and for better or worse the decision was made to go with strong consistency.

* The `cluster_kv` connector is not tested for high-performance needs, as building a distributed KV store is in itself a massive undertaking, and its purpose is to allow shared configuration, not to be a fully fledged KV store. Improvements can be made here, but they can happen after the initial release.

* Raft expects all actions committed to the log to be successful; however, Tremor's internal API can't make a strong guarantee of this, so we are in the situation of having to operate on a best-effort basis. For example, parsing the application could technically fail, even though logically it shouldn't, since that was already done during application creation.

## Rationale and Alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

An alternative would have been an eventually consistent backend. The main reasons to avoid this were that there are no good libraries for it in Rust at the time of writing, and that it is easier to relax constraints later than to enforce additional ones.

## Prior Art
[prior-art]: #prior-art

Raft is a well-known algorithm for strongly consistent systems.

The Tremor Archive is inspired by Erlang application archives and the JVM's JAR files.

## Unresolved Questions
[unresolved-questions]: #unresolved-questions

The biggest unresolved question is how to handle failed requests in the log well; while they should:tm: not happen, there is currently no guarantee that they won't during an application's lifecycle.

## Future Possibilities
[future-possibilities]: #future-possibilities

The design of stage 1 was internally named the **micro ring**, as the future goal is to limit cluster membership and configuration storage to Raft while outsourcing work to a **macro ring** that works as an eventually consistent system in a ring-based configuration. This would allow combining the benefits of strongly consistent configuration with eventually consistent workload management.

Another later-stage implementation is cluster-wide communication: the option to forward events from one node to another and create a form of location independence. This would allow splitting out ingress, egress, and processing, as well as more elaborate deployment and data-partitioning strategies.

Last but not least, cluster-aware deployments and more advanced placement strategies are a reasonable follow-up to this RFC. So far only `all` and `one` node deployments exist, but more advanced strategies are possible with further development. This would combine nicely with the partitioning of data and the cluster-wide communication mentioned above.
