samples: net: zperf: Optimize configuration for better performance #75281
Conversation
samples/net/zperf/prj.conf
Outdated
CONFIG_NET_PKT_RX_COUNT=50
CONFIG_NET_PKT_TX_COUNT=50
CONFIG_NET_BUF_RX_COUNT=300
CONFIG_NET_BUF_TX_COUNT=300
Tried this with the NXP imxrt1050-evkb board, and the Zephyr download throughput drops to 79.06 Mbps.
With slightly lower values:
CONFIG_NET_PKT_RX_COUNT=40
CONFIG_NET_PKT_TX_COUNT=40
CONFIG_NET_BUF_RX_COUNT=160
CONFIG_NET_BUF_TX_COUNT=160
the throughput rises to 94.4 Mbps.
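As a sketch (the overlay path is an assumption based on the usual Zephyr sample layout, not something from this PR), those lower values could be kept board-specific by placing them in an overlay such as boards/<board>.conf under samples/net/zperf:

CONFIG_NET_PKT_RX_COUNT=40
CONFIG_NET_PKT_TX_COUNT=40
CONFIG_NET_BUF_RX_COUNT=160
CONFIG_NET_BUF_TX_COUNT=160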
I'll check this with my STM board; I recall I couldn't reach the maximum TCP download throughput with a lower buffer count.
Ok, both TCP TX and RX degraded a bit with those configs:
TCP TX: 74.19 Mbps
TCP RX: 85.51 Mbps
But they're still not bad, so I guess it's fine?
It is odd how the numbers change when there are more buffers on the NXP board, while with the STM board the results are more in line with what you would expect.
@dleach02 any idea why increasing the network buffer count for NXP boards gives worse results than a lower net buf count? We could discuss the zperf issues next week in the network forum meeting.
A slight inconsistency aside, great work; I'm quite amazed at how well the TCP stack is performing.
samples/net/zperf/README.rst
Outdated
.. code-block:: cfg

   CONFIG_NET_PKT_RX_COUNT=50
Ensure that these values match the prj.conf
Thanks, updated
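For illustration, the updated README snippet would presumably mirror the prj.conf values from this PR, along the lines of:

.. code-block:: cfg

   CONFIG_NET_PKT_RX_COUNT=50
   CONFIG_NET_PKT_TX_COUNT=50
   CONFIG_NET_BUF_RX_COUNT=300
   CONFIG_NET_BUF_TX_COUNT=300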
Interesting things are going on lately. What I do not understand is why Zephyr's UDP TX performance is so poor in comparison. This was also very clear in the report that NXP released recently.
Increase the number of network packets and buffers for better TCP performance in the sample out-of-the-box.

Decrease the network buffer data size for better buffer management in the sample (less buffer space wasted for the L2 header). The only drawback of this is reduced TCP TX performance, but less than 2 Mbps in my case.

Finally, enable speed optimizations for another small performance boost.

As the RAM requirements of the sample now increase considerably in the default configuration, add a note in the readme file about it, and about how to make it fit into smaller boards.

Tested on nucleo_h723zg:

Before (current defaults):
UDP TX: 76.47 Mbps, RX: 93.48 Mbps
TCP TX: 76.18 Mbps, RX: 67.75 Mbps

After (new defaults):
UDP TX: 76.08 Mbps, RX: 93.62 Mbps
TCP TX: 74.19 Mbps, RX: 85.51 Mbps

Signed-off-by: Robert Lubos <[email protected]>
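As a rough sketch of the Kconfig symbols the commit message appears to refer to (the buffer data size value below is an assumed illustrative number, not taken from this diff):

# Speed optimizations mentioned in the commit message
CONFIG_SPEED_OPTIMIZATIONS=y
# Reduced network buffer data size; 64 is only an assumed example value
CONFIG_NET_BUF_DATA_SIZE=64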
I just looked at the UDP TX path and recently had a mail conversation with @dleach02 about the performance. Is the fact that transmitted packets always go through a separate thread a significant contributor to the lower (UDP) TX performance? A reasonably simple way to test this would be to see if playing with CONFIG_NET_TC_SKIP_FOR_HIGH_PRIO has a significant impact (see line 356 in 2e956c2).
If this has a significant influence, we might need to look at whether this can be implemented in a different way.
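For reference, a minimal way to run that experiment (enabling the option in the sample's prj.conf is my assumption of the simplest approach; the symbol itself is the one mentioned above) would be:

CONFIG_NET_TC_SKIP_FOR_HIGH_PRIO=y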
Actually, when I was testing the impact of a separate TX thread in the past, I observed quite the contrary; see the table in #23302 (comment). But that was because the driver I used blocked the thread for the transmission (hence offloading it to a separate thread allowed the net stack to already prepare the next packet instead of waiting).
With the default settings there is no separate TX thread in the system. If userspace is enabled, then there needs to be one TX thread (see line 200 in 1ed04e7).
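If one instead wanted to force a dedicated TX thread to measure its cost, the relevant knob would presumably be CONFIG_NET_TC_TX_COUNT (my assumption of the symbol controlling the number of TX traffic-class threads):

CONFIG_NET_TC_TX_COUNT=1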