-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove min_isr #789
Remove min_isr #789
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit needs a lot more explanation
Something like this? Or are you looking for something else? Setting minimum in sync replicas ( Several workarounds for this where taken into consideration, but we deemed none of them to be good enough. Which is why we decided to remove |
What can happen without a properly working |
Sure, added this to the comment: |
Setting minimum in sync replicas (
min_isr
) > 0 with the current implementation will block LavinMQ from starting. The reason for this is that LavinMQ does some file actions on startup (cleanup, truncate, etc - for example the actions taken here). Withmin_isr
set, these actions needs to be replicated to at least n (min_isr
) number of nodes before LavinMQ can continue. However, the listener for the replication server has not been started yet when these actions are taken, which makes it impossible for any followers to connect to the leader, and therefore makes it impossible to replicate the actions to the required number of nodes. This leaves the leader node stuck on waiting for actions to be replicated to followers who are themselves waiting for the leader to accept connections.Using clustering without
min_isr
will work the same way as ifmin_isr
would be set to 0: The leader will not wait for followers to connect/reconnect before accepting connections and data. If followers are connected and in-sync, they will be kept in-sync for as long as possible. But if a follower disconnects, the leader node will not wait for it to come back online before accepting new connections and data. This means that it is possible to have data that isn't synced to any follower.Several workarounds for this where taken into consideration, but we deemed none of them to be good enough. Which is why we decided to remove
min_isr
for now, until we have a better idea of how to implement it.WHAT is this pull request doing?
Removes settings and code related to
min_isr
.The current implementation has been saved as a backup in https://github.com/cloudamqp/lavinmq/tree/min_isr_backup.
HOW can this pull request be tested?
Run specs