Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Condition for telemetry message resend #2499

Closed
ClaesIvarsson opened this issue Jul 6, 2023 · 13 comments
Closed

Condition for telemetry message resend #2499

ClaesIvarsson opened this issue Jul 6, 2023 · 13 comments

Comments

@ClaesIvarsson
Copy link

Environment:
Wifi connection over mesh network (NAT)
ESP32
MQTT protocol

I have noticed that some telemetry messages sent from user level will actually be sent twice to the IoT hub .
Enabling Trace on the SDK stack shows a resend of the message, both will later get PUBACK response but only after the message is sent twice.

This happens only for some messages and not for all. At first I thought it was related to size of the payload but it is not consistent.

I have tried executing the IoTHubDeviceClient_LL_DoWork with different period but it is also not clear even though it seems to be less frequent with longer period (changed from 40 to 100 ms).

How long is the PUBACK timeout ?
I have understood that there is no time set by the MQTT standard , even though I have seen 60s mentioned. In my case the retransmission happens within less than a second.

The environment with wifi connection and possible delays might be a reason for this to happen but I do not understand why the retransmission occurs so fast after first message being sent.

Any other explanation that come to mind ?

@ericwolz
Copy link
Contributor

ericwolz commented Jul 6, 2023

Timeout is documented in the below link. Please post your device logs.

https://github.com/Azure/azure-iot-sdk-c/blob/main/doc/connection_and_messaging_reliability.md

@ClaesIvarsson
Copy link
Author

Timeout is documented in the below link. Please post your device logs.

https://github.com/Azure/azure-iot-sdk-c/blob/main/doc/connection_and_messaging_reliability.md

Thanks for you reply but I think that my questions regarding timeout management for PUBACK in MQTT is not answered in the documentation you are referring to.

I can't find anything related to timeout waiting for PUBACK when using MQTT.

OPTION_MESSAGE_TIMEOUT is marked as deprecated
OPTION_RETRY_INTERVAL_SEC is used for connection retries
OPTION_DO_WORK_FREQUENCY_IN_MS is not applicable when using IoTHubDeviceClient_LL_* api's

In the code, there is two timeouts define in ProcessPendingTelemetryMessages.
These timeouts are not possible to change in user scope and is set to 60s and 120s.

Here is a log where I get duplication of telemetry messages
-> (159454) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices//messages/events/%24.ct=application%2fjson&%24.ce=utf-8 | PACKET_ID: 18 | PAYLOAD_LEN: 950
-> (159508) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices//messages/events/%24.ct=application%2fjson&%24.ce=utf-8 | PACKET_ID: 18 | PAYLOAD_LEN: 950
-> (159559) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/PATCH/properties/reported/?$rid=19 | PAYLOAD_LEN: 412
-> (159564) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices//messages/events/%24.ct=application%2fjson&%24.ce=utf-8 | PACKET_ID: 20 | PAYLOAD_LEN: 196
<- (159573) PUBACK | PACKET_ID: 18
<- (159656) PUBACK | PACKET_ID: 18
<- (159790) PUBACK | PACKET_ID: 20
<- (159915) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/res/204/?$rid=19&$version=7661 | PAYLOAD_LEN: 0

Number in parentheses is millisec from start
DoWork is executed approximately every 20ms and the delta times between original PUB and the second PUB is 54.
For me this is strange that it happens so fast since all timeouts are on a completely different level.

I am just trying to understand why this happens! As telemetry messages are DELIVER_AT_LEAST_ONCE, it could happen that there are duplicates but I can't find an explanation when this will happen.

@ericwolz
Copy link
Contributor

You have removed pertinent information from the SDK logs for us to debug your situation.

@ClaesIvarsson
Copy link
Author

ClaesIvarsson commented Jul 13, 2023

-> (159454) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/a584297f-a9eb-4eab-9fdc-d392fa863dc4/messages/events/%24.ct=application%2fjson&%24.ce=utf-8 | PACKET_ID: 18 | PAYLOAD_LEN: 950
-> (159508) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/a584297f-a9eb-4eab-9fdc-d392fa863dc4/messages/events/%24.ct=application%2fjson&%24.ce=utf-8 | PACKET_ID: 18 | PAYLOAD_LEN: 950
-> (159559) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/PATCH/properties/reported/?$rid=19 | PAYLOAD_LEN: 412
-> (159564) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_LEAST_ONCE | TOPIC_NAME: devices/a584297f-a9eb-4eab-9fdc-d392fa863dc4/messages/events/%24.ct=application%2fjson&%24.ce=utf-8 | PACKET_ID: 20 | PAYLOAD_LEN: 196
<- (159573) PUBACK | PACKET_ID: 18
<- (159656) PUBACK | PACKET_ID: 18
<- (159790) PUBACK | PACKET_ID: 20
<- (159915) PUBLISH | IS_DUP: false | RETAIN: 0 | QOS: DELIVER_AT_MOST_ONCE | TOPIC_NAME: $iothub/twin/res/204/?$rid=19&$version=7661 | PAYLOAD_LEN: 0

@ericwolz
Copy link
Contributor

ericwolz commented Jul 13, 2023

There are no service logs with a device ID of a584297f-a9eb-4eab-9fdc-d392fa863dc4. What hub is this on? What are the UTC timestamp of these messages?

  • What SDK version are you using?
  • What SDK APIs are you using for the messages? (IoTHubDeviceClient_LL_SendEventAsync)?
  • What are the message callback status?

@ClaesIvarsson
Copy link
Author

Do you suspect that this is invalid behavior or can it really happen under certain circumstances ?

I am using 1.11.0 of the azure-iot-sdk-c on an ESP32 WROVER-IE in a wifi mesh network with NAT (might impact delays in the network).
command used for sending is IoTHubDeviceClient_LL_SendEventAsync
I use only IoTHubDeviceClient_LL_* api's
I always call these from the same task.
DoWork is called every 50ms

I will not be able to provide a more detailed log until after my vacation, starting today. Back in 4 weeks.

@ericwolz
Copy link
Contributor

ericwolz commented Jul 14, 2023

This could happen if your clocks drift and there is a significant time jump when updating NTP. The ESP32 internal oscillator is not very good. The message callback should have status on what is going on.

@ClaesIvarsson
Copy link
Author

OK, so how much is a significant time jump ?

@ericwolz
Copy link
Contributor

greater than the message timeout.

@ClaesIvarsson
Copy link
Author

So, I guess we are back to my original question: How long is the timeout for PUBACK of an MQTT message ?
I cannot find any documentation stating this information.

@ericwolz
Copy link
Contributor

ericwolz commented Aug 8, 2023

The max timeout wait for PUBACK is 2 minutes.

#define TELEMETRY_MSG_TIMEOUT_MIN 2*60

@ClaesIvarsson
Copy link
Author

Problem I have is not related to the 2 minute timeout, I have checked that there is no time jump that could explain the 2 minute timeout to expire.

@ericwolz
Copy link
Contributor

You need to trace thru the code to debug it.

@ericwolz ericwolz reopened this Oct 12, 2023
@ericwolz ericwolz closed this as not planned Won't fix, can't repro, duplicate, stale Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants