Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Modeling writes as an extensible workflow #3237

Open
kartg opened this issue May 6, 2022 · 4 comments
Open

[RFC] Modeling writes as an extensible workflow #3237

kartg opened this issue May 6, 2022 · 4 comments
Labels
discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing RFC Issues requesting major changes Roadmap:Cost/Performance/Scale Project-wide roadmap label

Comments

@kartg
Copy link
Member

kartg commented May 6, 2022

Note 1 - The discussion in this issue focuses on the write paths in OpenSearch, though the assertions herein are probably true for other parts of the codebase.

Note 2 - For clarity, the term “replication” will only be used to describe code paths that result in segment files being created on a shard. The act of sending requests from the primary shard to replicas will be termed “forwarding”.

What's the problem?

Much of the code in the write path for segments is tightly coupled together. For example:

  1. The implementation of index and delete operations uses a class hierarchy that mandates that these operations be replicated.
  2. The notion of “replication” is coupled to “forwarding” (see Note 2 above)
  3. The notion of a “write” is tightly coupled with a translog update.

(I'll talk about why we would want to solve for these in a moment.)

This coupling is driven by the code architecture. In technical terms, I'd say it uses a compile-time, top-down (inheritance-based) chain of responsibility (CoR) design pattern. Put more simply, it's like spaghetti lasagna - lots of layers encasing lots of noodley code.

Such a design pattern poses two problems:

  1. The compile-time nature precludes the ability to configure behavior at run-time
  2. The inheritance-based CoR pattern implicitly defines a fixed set of steps for the code, but misses out on the benefits of a unified orchestrator class or workflow definition - for example, the ability for a step to react to the result of a previous step

I think we can make this better (though that recipe is beyond saving IMO).

What should we do about it?

We should rearchitect write-path operations as a workflow comprised of the following configurable steps:

  • Reroute (route the incoming request to the correct shard/node)
  • Ingest (process the request)
  • Persist (make the results of the request durable)
    • This would include separate configuration/extension points for storage and translog
  • Forward (send the request to another node)

Persist and Forward will be conditional steps that rely on the output of prior steps to determine if they should execute.

Why do this now?

Because extensibility is one of key themes for Opensearch (#2095). It is essential that we start tackling this architectural limitation now since we have multiple ongoing initiatives for OpenSearch extensibility that require more run-time configurability:

  1. With the introduction of replication strategies like segment replication being defined per-index, write code paths can no longer simply mandate replication. Segment replication no longer needs “replication” to be coupled with “forwarding”.
  2. With a remote translog, the need for “forwarding” is removed entirely.
  3. The introduction of remote storage will affect the behavior of both replication and recovery.

Open Questions (aka things I'm mulling over)

  • What situations/architectures (if any) would require the Reroute step to be optional/configurable?
  • Is there a way to remove the need for an Engine class, so that ingest and translog can be configured independent of one another?
  • How does this workflow and the decoupling of replication vs forwarding affect sync actions?

Given the sheer breadth of functionality in the Opensearch codebase, there are probably other coupled components that I haven't considered. Please comment below if there are things that would break with this workflow approach, or other areas that may benefit from a similar approach.

@kartg kartg added enhancement Enhancement or improvement to existing feature or request discuss Issues intended to help drive brainstorming and decision making distributed framework RFC Issues requesting major changes labels May 6, 2022
@Bukhtawar
Copy link
Collaborator

Few observations

With #1319, the need for “forwarding” is removed entirely.

There are few cases today that inherently rely on forwarding writes for correctness, for instance if the primary is partitioned off from the rest of the cluster, it continues to accept writes, once it forwards the request to the replica which meanwhile has been promoted to the primary, it then becomes aware of the problem and hence respond with a failure which otherwise would have been acknowledged and writes diverged.
Forwarding is a nice property to detect such anomalies. We need to give it more thought before we get rid of this

Is there a way to remove the need for an Engine class, so that ingest and translog can be configured independent of one another?

With #1319, translogs would be decoupled from the Engine and would be made optional and extracted out
Would that simplify the Engine?

@dblock
Copy link
Member

dblock commented May 10, 2022

At a high level this makes a lot of sense to me. If you see a refactor increment that would make the code better I would PR that on main. An alternative could be to try and setup the new workflow as proposed without touching the existing implementation, that could be a good start in a feature branch.

@kartg
Copy link
Member Author

kartg commented May 11, 2022

There are few cases today that inherently rely on forwarding writes for correctness, for instance if the primary is partitioned off from the rest of the cluster, it continues to accept writes, once it forwards the request to the replica which meanwhile has been promoted to the primary, it then becomes aware of the problem and hence respond with a failure which otherwise would have been acknowledged and writes diverged.

Thanks @Bukhtawar ! These are exactly the kind of blind spots I was hoping to get feedback on. I'd like to understand this particular behavior more in depth. Are you aware of any reading material to learn more about this? Or could you point me to other resources (code, exceptions, etc.) from where I could start learning?

With #1319, translogs would be decoupled from the Engine and would be made optional and extracted out
Would that simplify the Engine?

Decoupling translog behavior from Engine would indeed simplify its code. Assuming the next step is to decouple ingest/storage from Engine, I'm left wondering what value the Engine class itself is adding by encapsulating those two extension points.

To use a food analogy again 😉 - what I'm asking is if the two extension points are like a bagel 🥯 and cream cheese (where it doesn't make sense to have them separately) or like fries 🍟 and soda (where it does)

@kartg
Copy link
Member Author

kartg commented May 12, 2022

@nknize I saw this PR comment from you so I'd love to hear what you think about the ideas here, especially around what value the Engine class provides (as discussed above 👆 )

@anasalkouz anasalkouz added Indexing Indexing, Bulk Indexing and anything related to indexing and removed distributed framework labels Sep 19, 2023
@msfroh msfroh added the Roadmap:Cost/Performance/Scale Project-wide roadmap label label May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing RFC Issues requesting major changes Roadmap:Cost/Performance/Scale Project-wide roadmap label
Projects
Status: New
Development

No branches or pull requests

5 participants