[cluster] The replication log exception results in an unlimited retry election #1657

Open
WorkingChen opened this issue Sep 11, 2024 · 0 comments

@WorkingChen (Contributor):

If log replication or another exception causes an election to fail, the current code resets the election state to INIT. If the underlying exception cannot be resolved, the node fails and restarts the election indefinitely, which disrupts normal operation of the entire cluster.

Could a delay be added when an exception occurs during an election, so that elections are not retried repeatedly in a short period?
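
To illustrate the kind of delay I have in mind, here is a minimal sketch of a capped exponential backoff between election attempts. It is not Aeron code: `ElectionBackoff` and all of its methods are hypothetical names, and how it would be wired into the election reset path is an assumption left open here.

```java
/**
 * Hypothetical helper (not part of Aeron): tracks consecutive election
 * failures and delays the next attempt with a capped exponential backoff.
 */
final class ElectionBackoff
{
    private final long baseDelayNs;
    private final long maxDelayNs;
    private int consecutiveFailures;
    private long nextAttemptDeadlineNs;

    ElectionBackoff(final long baseDelayNs, final long maxDelayNs)
    {
        if (baseDelayNs <= 0 || maxDelayNs < baseDelayNs)
        {
            throw new IllegalArgumentException("require 0 < baseDelayNs <= maxDelayNs");
        }
        this.baseDelayNs = baseDelayNs;
        this.maxDelayNs = maxDelayNs;
    }

    /** Record a failed election and schedule the earliest time for the next one. */
    void onElectionFailure(final long nowNs)
    {
        if (consecutiveFailures < Integer.MAX_VALUE)
        {
            consecutiveFailures++;
        }
        nextAttemptDeadlineNs = nowNs + currentDelayNs();
    }

    /** A completed election clears the failure history. */
    void onElectionSuccess()
    {
        consecutiveFailures = 0;
        nextAttemptDeadlineNs = 0;
    }

    /** True when no failure is pending or the backoff deadline has passed. */
    boolean canStartElection(final long nowNs)
    {
        // Subtraction keeps the comparison valid if a nanosecond clock wraps.
        return consecutiveFailures == 0 || nowNs - nextAttemptDeadlineNs >= 0;
    }

    /** Delay for the current failure count: base, 2x base, 4x base, ... capped at max. */
    long currentDelayNs()
    {
        if (consecutiveFailures == 0)
        {
            return 0;
        }
        long delay = baseDelayNs;
        for (int i = 1; i < consecutiveFailures && delay < maxDelayNs; i++)
        {
            // Saturate at the cap instead of overflowing on the doubling.
            delay = delay > maxDelayNs / 2 ? maxDelayNs : delay * 2;
        }
        return Math.min(delay, maxDelayNs);
    }
}
```

With a base of 100 ms and a cap of 1 s, for example, repeated failures would wait 100 ms, 200 ms, 400 ms, 800 ms and then 1 s between attempts, so an unrecoverable error like the one below would no longer trigger elections back to back.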

Log from the cluster node with the exception:
io.aeron.archive.client.ArchiveException: ERROR - requested replay start position=214368015799296 is less than recording start position=214673337090048 for recording 0
    at io.aeron.archive.ReplicationSession.hasResponse(ReplicationSession.java:742) ~[aeron-archive-1.44.1.jar!/:1.44.1]
    at io.aeron.archive.ReplicationSession.replay(ReplicationSession.java:576) ~[aeron-archive-1.44.1.jar!/:1.44.1]
    at io.aeron.archive.ReplicationSession.doWork(ReplicationSession.java:220) ~[aeron-archive-1.44.1.jar!/:1.44.1]
    at io.aeron.archive.SessionWorker.doWork(SessionWorker.java:64) ~[aeron-archive-1.44.1.jar!/:1.44.1]
    at io.aeron.archive.ArchiveConductor.doWork(ArchiveConductor.java:303) ~[aeron-archive-1.44.1.jar!/:1.44.1]
    at io.aeron.archive.DedicatedModeArchiveConductor.doWork(DedicatedModeArchiveConductor.java:58) ~[aeron-archive-1.44.1.jar!/:1.44.1]
    at org.agrona.concurrent.AgentRunner.doWork(AgentRunner.java:304) ~[agrona-1.21.1.jar!/:1.21.1]
    at org.agrona.concurrent.AgentRunner.workLoop(AgentRunner.java:296) ~[agrona-1.21.1.jar!/:1.21.1]
    at org.agrona.concurrent.AgentRunner.run(AgentRunner.java:162) ~[agrona-1.21.1.jar!/:1.21.1]
    at java.base/java.lang.Thread.run(Thread.java:898) [?:?]

Log from the leader node:
io.aeron.exceptions.AeronException: ERROR - Driver events adapter is invalid
    at io.aeron.ClientConductor.service(ClientConductor.java:1368) ~[aeron-client-1.44.1.jar!/:1.44.1]
    at io.aeron.ClientConductor.doWork(ClientConductor.java:196) ~[aeron-client-1.44.1.jar!/:1.44.1]
    at org.agrona.concurrent.AgentInvoker.invoke(AgentInvoker.java:147) ~[agrona-1.21.1.jar!/:1.21.1]
    at io.aeron.cluster.ConsensusModuleAgent.slowTickWork(ConsensusModuleAgent.java:2114) ~[aeron-cluster-1.44.1.jar!/:1.44.1]
    at io.aeron.cluster.ConsensusModuleAgent.doWork(ConsensusModuleAgent.java:346) ~[aeron-cluster-1.44.1.jar!/:1.44.1]
    at org.agrona.concurrent.AgentRunner.doWork(AgentRunner.java:304) ~[agrona-1.21.1.jar!/:1.21.1]
    at org.agrona.concurrent.AgentRunner.workLoop(AgentRunner.java:296) ~[agrona-1.21.1.jar!/:1.21.1]
    at org.agrona.concurrent.AgentRunner.run(AgentRunner.java:162) ~[agrona-1.21.1.jar!/:1.21.1]
    at java.base/java.lang.Thread.run(Thread.java:898) [?:?]
Caused by: java.lang.IllegalStateException: unable to keep up with broadcast
    at org.agrona.concurrent.broadcast.CopyBroadcastReceiver.receive(CopyBroadcastReceiver.java:97) ~[agrona-1.21.1.jar!/:1.21.1]
    at io.aeron.DriverEventsAdapter.receive(DriverEventsAdapter.java:68) ~[aeron-client-1.44.1.jar!/:1.44.1]
    at io.aeron.ClientConductor.service(ClientConductor.java:1349) ~[aeron-client-1.44.1.jar!/:1.44.1]
    ... 8 more
