Resolve server/client stuck at the test end before results-exchange #1527
a3df342
to
dbe8d68
Compare
Re-implementation for the multi-threaded iperf3. I am not sure whether the original problem still happens with multi-threading. Some of the changes may be worth implementing in any case, so if needed I can submit a PR with only these changes:
The PR also includes the code changes to resolve the original issue:
@@ -562,7 +545,6 @@ iperf_run_server(struct iperf_test *test)
        }

        memcpy(&read_set, &test->read_set, sizeof(fd_set));
        memcpy(&write_set, &test->write_set, sizeof(fd_set));
Removing this means that write_set may hold uninitialized values on line 581. Should you just remove the variable entirely, since it is unused?
Thanks for the comment. I forgot to remove all occurrences of write_set... Now removed (with rebase).
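For context on why the copy matters: select() overwrites the sets it is given, so a loop usually copies a master set before each call, and a set that is not monitored can be passed as NULL instead of being left as an uninitialized fd_set. A minimal sketch of that pattern follows; the helper name wait_readable is hypothetical and is not part of iperf3.

```c
#include <string.h>
#include <sys/select.h>

/*
 * Hypothetical helper, not iperf3 code.
 * Copies the caller's master set into *ready, then waits up to timeout_us
 * microseconds for any descriptor in it to become readable.
 * No write set is monitored, so NULL is passed rather than a stale fd_set.
 * Returns what select() returns: the number of ready descriptors,
 * 0 on timeout, or -1 on error.
 */
int wait_readable(const fd_set *master, int max_fd, long timeout_us, fd_set *ready)
{
    struct timeval tv;

    tv.tv_sec = timeout_us / 1000000L;
    tv.tv_usec = timeout_us % 1000000L;

    /* select() modifies its sets in place, so always work on a fresh copy. */
    memcpy(ready, master, sizeof(fd_set));

    return select(max_fd + 1, ready, NULL, NULL, &tv);
}
```

Passing NULL for the write set also avoids select() returning immediately just because a socket happens to be writable.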
dbe8d68
to
fcf1c46
Compare
@@ -580,8 +583,9 @@ iperf_run_client(struct iperf_test * test)

    /* Begin calculating CPU utilization */
    cpu_util(NULL);
    rcv_timeout_value_in_us = (test->settings->rcv_timeout.secs * SEC_TO_US) + test->settings->rcv_timeout.usecs;
This change creates a scenario with an early timeout.
Given:
test->mode == SENDER
rcv_timeout_value_in_us > 0
test->state == TEST_END (or EXCHANGE_RESULTS or DISPLAY_RESULTS)
This implies that rcv_timeout_us = 0. Then on line 642, the if statement will evaluate to something like if (t_usecs > 0), which is always true since t_usecs will pretty much always be greater than 0.
It might make more sense to split the blocks so the correct timeout is used:
(I've also renamed rcv_timeout_value_in_us -> end_rcv_timeout and rcv_timeout_us -> running_rcv_timeout to hopefully make their use clearer.)
    if (result < 0 && errno != EINTR) {
        i_errno = IESELECT;
        goto cleanup_and_fail;
    } else if (result == 0 && (running_rcv_timeout > 0 && test->state == TEST_RUNNING)) {
        /*
         * If nothing was received in non-reverse running state
         * then probably something got stuck - either client,
         * server or network, and test should be terminated.
         */
        iperf_time_now(&now);
        if (iperf_time_diff(&now, &last_receive_time, &diff_time) == 0) {
            t_usecs = iperf_time_in_usecs(&diff_time);
            if (t_usecs > running_rcv_timeout) {
                /* Idle timeout if no new blocks received */
                if (test->blocks_received == last_receive_blocks) {
                    i_errno = IENOMSG;
                    goto cleanup_and_fail;
                }
            }
        }
    } else if (result == 0 && (end_rcv_timeout > 0 && (test->state == TEST_END
                                                      || test->state == EXCHANGE_RESULTS
                                                      || test->state == DISPLAY_RESULTS))) {
        /* Same idle check for the end-of-test states, using end_rcv_timeout */
        iperf_time_now(&now);
        if (iperf_time_diff(&now, &last_receive_time, &diff_time) == 0) {
            t_usecs = iperf_time_in_usecs(&diff_time);
            if (t_usecs > end_rcv_timeout) {
                /* Idle timeout if no new blocks received */
                if (test->blocks_received == last_receive_blocks) {
                    i_errno = IENOMSG;
                    goto cleanup_and_fail;
                }
            }
        }
    }
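The two branches above differ only in which timeout applies to the current state. As an illustration of that idea, and not as code from the PR, the choice and the check could be factored into two small helpers. The enum and function names below are hypothetical stand-ins; the real iperf3 states are plain constants such as TEST_RUNNING.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the iperf3 test states named above. */
enum sketch_state {
    SKETCH_TEST_RUNNING,
    SKETCH_TEST_END,
    SKETCH_EXCHANGE_RESULTS,
    SKETCH_DISPLAY_RESULTS,
    SKETCH_OTHER
};

/* Select the idle timeout (microseconds) for a state; 0 disables the check. */
int64_t idle_timeout_for_state(enum sketch_state state,
                               int64_t running_rcv_timeout,
                               int64_t end_rcv_timeout)
{
    switch (state) {
    case SKETCH_TEST_RUNNING:
        return running_rcv_timeout;
    case SKETCH_TEST_END:
    case SKETCH_EXCHANGE_RESULTS:
    case SKETCH_DISPLAY_RESULTS:
        return end_rcv_timeout;
    default:
        return 0;
    }
}

/* True when the timeout is enabled, has elapsed, and no new blocks arrived. */
bool idle_timed_out(int64_t idle_usecs, int64_t timeout_us,
                    uint64_t blocks_received, uint64_t last_receive_blocks)
{
    return timeout_us > 0
        && idle_usecs > timeout_us
        && blocks_received == last_receive_blocks;
}
```

With helpers like these, a sender with running_rcv_timeout set to 0 never compares t_usecs against 0 during the end states, which is the early-timeout case this comment is concerned about.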
Only the rcv_timeout_us value is used for the timeout (in the select statement). The value of rcv_timeout_value_in_us is only used to initialize rcv_timeout_us and to indicate, for the ending states, whether a receive timeout was requested (as there is no "timeout requested" setting). Therefore, the current code is correct in that respect.
I agree that rcv_timeout_value_in_us could probably have a better name, like rcv_timeout_setting_us, but I won't make a change just for that.
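To make the distinction between the configured setting and the value handed to select() concrete, here is a minimal sketch of both conversions. The struct and function names are hypothetical, the constant mirrors SEC_TO_US from the diff with an assumed value, and treating 0 as "no timeout" is an assumption of this sketch, not a statement about the PR's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

#define SKETCH_SEC_TO_US 1000000LL  /* assumed value of SEC_TO_US */

/* Hypothetical stand-in for the secs/usecs pair in the settings. */
struct sketch_rcv_timeout {
    int64_t secs;
    int64_t usecs;
};

/* Collapse the setting into a single microsecond count. */
int64_t rcv_timeout_setting_us(const struct sketch_rcv_timeout *t)
{
    return t->secs * SKETCH_SEC_TO_US + t->usecs;
}

/*
 * Build the timeout argument for select().
 * Returns NULL (wait without a receive timeout) when timeout_us is 0 or
 * negative; otherwise fills *tv and returns it.
 */
struct timeval *select_timeout(int64_t timeout_us, struct timeval *tv)
{
    if (timeout_us <= 0)
        return NULL;
    tv->tv_sec = (time_t)(timeout_us / SKETCH_SEC_TO_US);
    tv->tv_usec = (suseconds_t)(timeout_us % SKETCH_SEC_TO_US);
    return tv;
}
```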
Hi,
The data transfers are performed between:
The data transfers occur every X minutes, alternating between client-to-server and server-to-client directions. Sometimes, the iperf3 server hangs, and all subsequent tests fail with the message
When the iperf3 server hangs:
The issue is not systematic but occurs after several hours. I have also tried the "--rcv-timeout 30000" option, but it did not resolve the issue.
The iperf3 server is started with the following command:
The iperf3 client command (on the Android phone) is:
Since I need a solution, I am available to experiment with other iperf3 command options or to run a debug version of iperf3. Thank you.
@RizziMau does it happen if you don't use
On a side note, with your workaround, you may be interested in this PR. Using systemd socket-based activation would remove your 2-second downtime between one-off tests. #1171
@MattCatz As an additional test, does it make sense to try the "--forceflush" option on the server command?
oh duhh I misunderstood your original comment. You are right,
@RizziMau, I also suggest running the server using
You may also try building the server from PR #1734, which adds more debug information about the server's state changes. This may be important, as one of the possible reasons for the problem is that the server is waiting for a state change message from the client.
Thank you for the suggestions:
Version of iperf3 (or development branch, such as master or 3.1-STABLE) to which this pull request applies: master
Issues fixed (if any): Iperf3 server is getting stuck after printing the result #819
Brief description of code changes (suitable for use as a commit message):
A suggested fix for the issue where the server and client got stuck at the end of the test because the client did not receive the EXCHANGE_RESULTS state, based on @ffolkes1911's test results and the discussion starting at this comment.
The issue occurred on a cellular network, so the fix seems to be important for any iperf3 use over cellular networks. As the changes are for the end of the test, they are probably also relevant to the multi-thread version.
The root cause of the issue was that the server sent (reverse mode) only about 60KB from each 128KB TCP packet. Therefore, the last read by the client did not receive a full 128KB packet. Since the read was in blocking mode, the client got stuck and was not able to respond to the server. Therefore, the server also got stuck while waiting for the client's reply. It is not clear whether the EXCHANGE_RESULTS was lost or whether it was just delayed, but the fix handles both cases.
The main changes done:
[...] TEST_END. This is to allow the client to receive non-full late packets.
[...] --rcv-timeout in the client in sending mode - used at the end of the test to allow a timeout for reading exchange-results messages.
[...] TEST_END. This is because select() returns immediately when monitoring closed sockets, and the client's select() timeout is not effective in this case.
[...] TEST_END. Otherwise, even when no input is received from the server, the client's select() for late packets will return immediately (as the write sockets are available), which will not allow using the --rcv-timeout.
[...] IPERF_DONE or CLIENT_TERMINATE, it sends an additional 3 null bytes. This makes sure that if the server is waiting for the exchanged-results JSON length (4 bytes), it will not get stuck in the read command (the server will then fail because it doesn't get legal JSON).
[...] TEST_END, as they are redundant at that point and may interfere with the --rcv-timeout, because select() may return after a timer expires and before the receive timer has expired.
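The root cause described above comes down to one hazard: a blocking read for N bytes can hang when fewer than N bytes ever arrive. A minimal sketch of a read that is bounded by a select() timeout and returns whatever partial data showed up is given below. The function name is hypothetical, it bounds each wait with select() instead of switching the descriptor to non-blocking mode, and it is not the code from this PR.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not iperf3 code.
 * Tries to read up to `count` bytes from fd, waiting at most timeout_us
 * microseconds each time it has to wait for more data.
 * Returns the number of bytes read (possibly fewer than count, possibly 0),
 * or -1 on error.
 */
ssize_t read_with_timeout(int fd, char *buf, size_t count, long timeout_us)
{
    size_t got = 0;

    while (got < count) {
        fd_set rset;
        struct timeval tv;
        int ready;
        ssize_t n;

        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        tv.tv_sec = timeout_us / 1000000L;
        tv.tv_usec = timeout_us % 1000000L;

        /* Only the read set is monitored; a writable socket must not wake us. */
        ready = select(fd + 1, &rset, NULL, NULL, &tv);
        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (ready == 0)
            break;              /* timed out: hand back the partial data */

        n = read(fd, buf + got, count - got);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue;
            return -1;
        }
        if (n == 0)
            break;              /* peer closed: select() reported it at once */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The n == 0 branch also shows why closing sockets early defeats a select() timeout: a closed peer makes the descriptor readable immediately, so the wait never lasts for the configured interval.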