
samples: net: zperf: Optimize configuration for better performance #75281

Merged

Conversation

@rlubos (Contributor) commented Jul 1, 2024:

Increase the number of network packets and buffers for better out-of-the-box TCP performance in the sample.

Decrease the network buffer data size for better buffer management in the sample (less buffer space wasted on the L2 header). The only drawback is slightly reduced TCP TX performance, by less than 2 Mbps in my case.

Finally, enable speed optimizations for another small performance boost.

As the RAM requirements of the sample increase considerably with the new default configuration, add a note about this in the readme file, along with instructions on how to make the sample fit into smaller boards.

Tested on nucleo_h723zg:

  Before (current defaults):
    UDP      TX          RX
         76.47 Mbps  93.48 Mbps
    TCP      TX          RX
         76.18 Mbps  67.75 Mbps

  After (new defaults):
    UDP      TX          RX
         76.08 Mbps  93.62 Mbps
    TCP      TX          RX
         74.19 Mbps  85.51 Mbps
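
For reference, a minimal sketch of the kind of prj.conf fragment the description refers to. The option names are standard Zephyr networking and build Kconfig symbols; the packet/buffer counts echo values discussed in the review below, and the remaining values are illustrative rather than taken from the merged configuration:

# More network packets and buffers for better out-of-the-box TCP throughput
CONFIG_NET_PKT_RX_COUNT=40
CONFIG_NET_PKT_TX_COUNT=40
CONFIG_NET_BUF_RX_COUNT=160
CONFIG_NET_BUF_TX_COUNT=160
# Smaller buffer data size so less space is wasted on the L2 header
# (illustrative value, see the sample's prj.conf for the actual one)
CONFIG_NET_BUF_DATA_SIZE=64
# Compile with speed optimizations for another small boost
CONFIG_SPEED_OPTIMIZATIONS=y

On RAM-constrained boards, the packet and buffer counts can be reduced again to make the sample fit, trading away some throughput, as the readme note added in this PR describes.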

@pdgendt previously approved these changes on Jul 1, 2024
Comment on lines 11 to 14
CONFIG_NET_PKT_RX_COUNT=50
CONFIG_NET_PKT_TX_COUNT=50
CONFIG_NET_BUF_RX_COUNT=300
CONFIG_NET_BUF_TX_COUNT=300

Member:

Tried this with the NXP imxrt1050-evkb board; the Zephyr download throughput drops to 79.06 Mbps with this configuration.
With slightly lower values

CONFIG_NET_PKT_RX_COUNT=40
CONFIG_NET_PKT_TX_COUNT=40
CONFIG_NET_BUF_RX_COUNT=160
CONFIG_NET_BUF_TX_COUNT=160

the throughput rises to 94.4 Mbps.

@rlubos (Contributor, Author):

I'll check this with my STM board; I recall I couldn't reach the maximum TCP download throughput with a lower buffer count.

@rlubos (Contributor, Author):

Ok, both TCP TX and RX degraded a bit with those configs:

 TCP      TX          RX
     74.19 Mbps  85.51 Mbps

But they're still not bad, so I guess it's fine?

Member:

It is weird how the numbers change when having more buffers on NXP, while with STM the results are more in line with what to expect.

(A further review comment on samples/net/zperf/prj.conf was marked as resolved.)

@jukkar (Member) commented Jul 2, 2024:

@dleach02, any idea why increasing network buffers for NXP boards gives worse results than having a lower net buf count?

We could discuss the zperf issues next week in the network forum meeting.

@jukkar added this to the v3.7.0 milestone on Jul 4, 2024
@jukkar previously approved these changes on Jul 4, 2024
@jukkar requested a review from @aescolar on Jul 4, 2024

@ssharks (Collaborator) left a comment:

A slight inconsistency aside, great work; I'm quite amazed at how well the TCP stack is performing.


.. code-block:: cfg

   CONFIG_NET_PKT_RX_COUNT=50

@ssharks (Collaborator):

Ensure that these values match the prj.conf

@rlubos (Contributor, Author) replied Jul 5, 2024:

Thanks, updated

@ssharks (Collaborator) commented Jul 5, 2024:

Interesting things are going on lately. What I do not understand is why Zephyr's UDP TX performance is so poor in comparison. This was also very clear in the report that NXP released recently.

(The commit message repeats the PR description and test results above, followed by: Signed-off-by: Robert Lubos <[email protected]>)

@ssharks (Collaborator) commented Jul 6, 2024:

I just looked at the UDP TX path and recently had a mail conversation with @dleach02 on the performance. Is the fact that the transmitted packets always go through a separate thread a significant contributor to the lower (UDP) TX performance?

A reasonably simple way to test this would be to see whether playing with CONFIG_NET_TC_SKIP_FOR_HIGH_PRIO has a significant impact. See:

if ((IS_ENABLED(CONFIG_NET_TC_SKIP_FOR_HIGH_PRIO) &&

If this has a significant influence, we might need to look at whether this can be implemented in a different way.
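
A minimal overlay sketch for such an experiment, assuming the test traffic is marked with a high enough priority for the bypass to take effect (how that priority gets assigned is outside this fragment):

# Push high-priority packets directly to the network driver,
# bypassing the TX traffic-class queue/thread.
CONFIG_NET_TC_SKIP_FOR_HIGH_PRIO=y

Comparing the zperf UDP TX numbers with this option enabled and disabled should show whether the TX queuing path is a significant contributor.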

@aescolar merged commit e91c47d into zephyrproject-rtos:main on Jul 6, 2024 (17 checks passed)

@rlubos (Contributor, Author) commented Jul 8, 2024:

> I just looked at the UDP TX path and recently had a mail conversation with @dleach02 on the performance. Is the fact that the transmitted packets always go through a separate thread a significant contributor to the lower (UDP) TX performance?
>
> A reasonably simple way to test this would be to see whether playing with CONFIG_NET_TC_SKIP_FOR_HIGH_PRIO has a significant impact. See:
>
> if ((IS_ENABLED(CONFIG_NET_TC_SKIP_FOR_HIGH_PRIO) &&
>
> If this has a significant influence, we might need to look at whether this can be implemented in a different way.

Actually, when I was testing the impact of a separate TX thread in the past, I observed quite the contrary; see the table in #23302 (comment). But that was because the driver I used blocked the thread for the duration of the transmission (hence offloading it to a separate thread allowed the net stack to prepare the next packet instead of waiting).

@jukkar (Member) commented Jul 8, 2024:

> I just looked at the UDP TX path and recently had a mail conversation with @dleach02 on the performance. Is the fact that the transmitted packets always go through a separate thread a significant contributor to the lower (UDP) TX performance?

With default settings, there is no separate TX thread in the system. If userspace is enabled, then there needs to be one TX thread. See:

config NET_TC_TX_COUNT
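
For context, a sketch of what that means in configuration terms (my reading of the option based on the comment above; the Kconfig help text is authoritative):

# 0 TX traffic-class queues: packets are pushed to the network driver
# directly from the sender's context, i.e. no separate TX thread.
CONFIG_NET_TC_TX_COUNT=0
# With CONFIG_USERSPACE=y at least one TX queue (and thread) is required:
# CONFIG_NET_TC_TX_COUNT=1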
