Add Option for reconnect Back-Off (to prevent continual reconnection attempts when connection closed by remote host) #589

chess555 · 2022-03-17T19:15:49Z

Due to the way connect/reconnects are handled, no delay is used when a remote host closes a connection.

The simplest example is publishing to a bad topic with QoS > 0.
The client stores the bad message and gets stuck in a cycle of connect, publish, connection closed, reconnect, ...

Similar behavior has been noted when:
- The connection is dropped due to duplicate client IDs on a broker.
- Performing an operation (subscribe/publish) on an AWS data broker when the device lacks the required policy settings or has had its policy permissions revoked.

MattBrittan · 2022-03-17T19:32:48Z

I'm not sure that there is a solution to this unfortunately; as per the spec:

If a Server implementation does not authorize a PUBLISH to be performed by a Client; it has no way of informing that Client. It MUST either make a positive acknowledgement, according to the normal QoS rules, or close the Network Connection [MQTT-3.3.5-2].

So most 'well behaved' brokers will complete the handshake and throw the message away. While disconnecting is given as an option, the client has no way to determine why the broker dropped the connection and, as such, cannot remove the offending message from its store (the spec does not make any provision for 'rejected' messages).

The same applies in other situations (e.g. another client connects with the same client ID); the broker just drops the connection and we have no way to differentiate that from a network issue.

The above means that I don't think we really have any option other than to attempt to reconnect immediately and send through any queued messages. If you can suggest an alternative I'll definitely consider it (but most libraries use the same approach).

chess555 · 2022-03-17T22:19:31Z

Sorry for not being clear, but the only issue I'm pointing at is the frequency of reconnects performed when a close event occurs. (The examples were just meant as situations where this occurs.)

When a remote host closes a connection in this manner, I'm seeing upwards of 50-100 connect/disconnect events per second, and am concerned about this behavior (particularly on constrained networks).

In my case, I was able to use the Reconnect handler to add a delay after receiving an io.EOF event.
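
A minimal sketch of that workaround, written against the standard library only so that it runs on its own. The type `eofThrottle` and its methods are hypothetical names, not part of paho.mqtt.golang. The assumption is that `noteLost` is called from the client's connection-lost callback and `pauseBeforeReconnect` from a reconnect callback that runs before each attempt, where the returned duration would be passed to `time.Sleep`. The 2-second delay is an arbitrary value for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
	"time"
)

// eofThrottle remembers the most recent connection-lost error so that a
// later reconnect hook can decide whether to pause first.
// Hypothetical helper, not a library type.
type eofThrottle struct {
	mu      sync.Mutex
	lastErr error
	delay   time.Duration // pause applied after a remote close (assumed value)
}

// noteLost is meant to be called from the connection-lost callback.
func (t *eofThrottle) noteLost(err error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.lastErr = err
}

// pauseBeforeReconnect is meant to be called from the reconnect callback.
// It returns the pause to apply and clears the remembered error, so only
// the attempt that follows a remote close (io.EOF) is delayed.
func (t *eofThrottle) pauseBeforeReconnect() time.Duration {
	t.mu.Lock()
	defer t.mu.Unlock()
	if errors.Is(t.lastErr, io.EOF) {
		t.lastErr = nil
		return t.delay
	}
	return 0
}

// demo simulates one remote close followed by two reconnect attempts.
func demo() (first, second time.Duration) {
	th := &eofThrottle{delay: 2 * time.Second}
	th.noteLost(fmt.Errorf("read: %w", io.EOF))
	first = th.pauseBeforeReconnect()
	second = th.pauseBeforeReconnect()
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println("pause after remote close:", first)
	fmt.Println("pause on the following attempt:", second)
}
```

A fixed delay like this only limits the reconnect rate; it does not grow when the broker keeps closing the connection.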

MattBrittan · 2022-03-17T22:33:42Z

No worries - I have amended the title so that it more clearly covers what I believe you are requesting.

The initial reconnection attempt needs to be immediate (because the issue may be a momentary network glitch), but I agree that continually attempting to reconnect is counterproductive. Some form of back-off algorithm (reset after the connection has been up for more than a user-specified time) would be beneficial.
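
One way such a back-off could look, as a sketch under stated assumptions rather than the library's implementation: the first reconnect is immediate, each further loss doubles the pause up to a cap, and the pause resets once a connection has stayed up for at least `resetAfter`. The `backoff` type, its field names, and the values used in `main` are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// backoff computes the pause before each reconnection attempt.
// All names here are hypothetical, not part of the library.
type backoff struct {
	initial    time.Duration // pause before the second consecutive attempt
	max        time.Duration // upper bound on the pause
	resetAfter time.Duration // uptime after which a connection counts as stable
	next       time.Duration // pause to apply on the next loss (0 = immediate)
}

// onConnectionLost returns how long to wait before reconnecting, given how
// long the connection that just dropped had been up.
func (b *backoff) onConnectionLost(uptime time.Duration) time.Duration {
	if uptime >= b.resetAfter {
		b.next = 0 // stable connection: treat the loss as a momentary glitch
	}
	wait := b.next
	switch {
	case b.next == 0:
		b.next = b.initial
	case b.next*2 > b.max:
		b.next = b.max
	default:
		b.next *= 2
	}
	return wait
}

func main() {
	b := &backoff{initial: time.Second, max: 8 * time.Second, resetAfter: 30 * time.Second}
	// Six losses right after connecting, one after a minute of uptime,
	// then another immediate loss.
	uptimes := []time.Duration{0, 0, 0, 0, 0, 0, time.Minute, 0}
	for _, up := range uptimes {
		fmt.Printf("uptime %v -> wait %v\n", up, b.onConnectionLost(up))
	}
}
```

Keying the reset on uptime rather than on a successful connect matters here, because in the cases reported above the connect itself succeeds and the broker closes the connection only afterwards.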

tomatod · 2022-12-28T01:03:16Z

Summary of this issue (I think)

There seem to be at least 3 points that should have an appropriate sleep with a back-off algorithm.

| No. | Situation | Back-off implemented? | Cause |
|---|---|---|---|
| 1 | Unsuccessful initial connection | No | Mere connection failure |
| 2 | Unsuccessful reconnection after connection lost | Yes | Mere connection failure |
| 3 | Connection lost immediately after successful reconnection | No | Unexpectedly disconnected immediately after connection |

In this GitHub issue, the following triggers are newly reported for each point.

Trigger of No.1:
- Connecting to an AWS broker without appropriate authorization.
Trigger of No.3:
- Duplicate client IDs
- Invalid publish

Cause of No.3

1. When the incoming loop receives a connection-lost error, the internalConnLost method is called.
   https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L673-L695
2. internalConnLost then calls the reconnect method.
   https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L560
3. Reconnecting to the broker does not take long. Although the network connection comes back, some brokers then disconnect because of MQTT-level issues (duplicate client IDs, invalid publish, ...). The flow then returns to step 1, and no back-off sleep is applied anywhere in this cycle.
   https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L314-L317
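
To get a feel for the cost of the cycle described above, here is a small simulation on a virtual clock. It is only a sketch: `countReconnects`, `noPause`, and `doublingPause` are hypothetical names, and the 10 ms round trip (connect, rejected operation, remote close) is an assumed figure, not a measurement.

```go
package main

import (
	"fmt"
	"time"
)

// countReconnects simulates the connect/drop cycle for a window of simulated
// time. Each round trip is assumed to take roundTrip; pause(n) is the sleep
// inserted before attempt n (n starts at 0).
func countReconnects(window, roundTrip time.Duration, pause func(n int) time.Duration) int {
	var elapsed time.Duration
	attempts := 0
	for {
		elapsed += pause(attempts) + roundTrip
		if elapsed > window {
			return attempts
		}
		attempts++
	}
}

// noPause models the behavior described above: reconnect immediately.
func noPause(int) time.Duration { return 0 }

// doublingPause keeps the first attempt immediate, then sleeps 100ms,
// 200ms, 400ms, ... capped at 1s.
func doublingPause(n int) time.Duration {
	if n == 0 {
		return 0
	}
	d := 100 * time.Millisecond
	for i := 1; i < n && d < time.Second; i++ {
		d *= 2
	}
	if d > time.Second {
		d = time.Second
	}
	return d
}

func main() {
	window, roundTrip := time.Second, 10*time.Millisecond
	fmt.Println("attempts without back-off:", countReconnects(window, roundTrip, noPause))
	fmt.Println("attempts with doubling back-off:", countReconnects(window, roundTrip, doublingPause))
}
```

Under these assumptions the no-pause case completes 100 cycles in one simulated second, while the doubling pause completes 4.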

…reconnect loops Add back-off controller for sleep time of reconnection when connection lost is detected immediately after connecting. #589 This issue could be caused by an invalid publish request (which leads to the broker dropping the connection immediately).

MattBrittan changed the title ~~Reconnect "Thrashing" when connection closed by remote host~~ Add Option for Back-Off when connection lost (to prevent continual reconnection attempts when connection closed by remote host) Mar 17, 2022

MattBrittan added enhancement help wanted labels Mar 17, 2022

tomatod added a commit to tomatod/paho.mqtt.golang that referenced this issue Dec 26, 2022

Add MaxConnectRetryInterval client option for reconnect exponential b…

f458ea1

…ackoff related to eclipse-paho#589 Signed-off-by: Daichi Tomaru <[email protected]>

tomatod mentioned this issue Dec 26, 2022

Add back-off controller for sleep time of reconnection when connection lost is detected immediately after connecting. #589 #625

Merged

tomatod added a commit to tomatod/paho.mqtt.golang that referenced this issue Dec 27, 2022

Add back-off controller for sleep time of reconnection when connectio…

2749ad4

…n lost is detected immediately after connecting. eclipse-paho#589 Signed-off-by: Daichi Tomaru <[email protected]>

tomatod added a commit to tomatod/paho.mqtt.golang that referenced this issue Dec 31, 2022

Add back-off controller for sleep before reconnection when connection…

d174b9a

… lost is detected immediately after connecting. eclipse-paho#589 Signed-off-by: Daichi Tomaru <[email protected]>
