TCP bandwidth with iperf/qperf/sockperf #778

Open
zorrohahaha opened this issue Dec 6, 2018 · 9 comments

@zorrohahaha

I am running libvma on the aarch64 platform and it works. When using iperf/qperf/sockperf for evaluation, I find the latency is better than through the kernel socket stack. The UDP bandwidth via qperf is also better.
However, the TCP bandwidth is much worse than through the kernel: only several hundred Mbps to 1 Gbps on a 25 Gbps NIC.
After setting the parameters "VMA_TCP_NODELAY=1 VMA_TX_BUFS=4000000", the bandwidth improves somewhat, up to about 10 Gbps, but it is still far from line rate.
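
For reference, the invocations look roughly like the following (the address 192.168.1.10 and the libvma.so path are placeholders for my actual setup):

```
# Server side, preloading VMA:
LD_PRELOAD=/usr/lib64/libvma.so iperf -s

# Client side, with the tuning parameters mentioned above:
VMA_TCP_NODELAY=1 VMA_TX_BUFS=4000000 \
LD_PRELOAD=/usr/lib64/libvma.so iperf -c 192.168.1.10 -t 30
```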

@zorrohahaha
Author

zorrohahaha commented Dec 6, 2018

Using the recommended sockperf, the TCP bandwidth is still quite low when VMA_SPEC is not set before preloading the dynamic library.
I used the Linux perf tool to check the thread status. It seems that the TX buffer gets exhausted and the sender then waits for the ACK from the other side; only a few dozen KB of window are then released so that some data can be sent again. The "rx wait" path takes a lot of CPU time. (Correct me if my understanding is wrong.)

When UDP is used, the bandwidth is much better since no ACKs are required.
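
The sockperf runs and the perf check look roughly like this (address, port, message size, and the VMA_SPEC=latency profile are placeholders; see the VMA README for the available profiles):

```
# Server:
LD_PRELOAD=libvma.so sockperf server --tcp -i 192.168.1.10 -p 11111

# Client, TCP throughput test (message size and duration are illustrative):
VMA_SPEC=latency LD_PRELOAD=libvma.so \
sockperf throughput --tcp -i 192.168.1.10 -p 11111 -m 65000 -t 30

# Checking where the CPU time goes on the client:
perf top -p "$(pidof sockperf)"
```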

@zorrohahaha
Author

I searched the Internet and found a slide deck named "Data Transfer Node Performance GÉANT & AENEAS", which also covers TCP bandwidth with iperf2.
There too, the bandwidth tested through libvma appears lower than through the kernel stack.

@liranoz12
Contributor

Hi @zorrohahaha,

VMA is optimized for TCP packets below MTU/MSS size. We are aware of the disadvantage for high TCP bandwidth above MTU/MSS size.
Resolving this gap is on our roadmap.
Please consider using the VMA_RX_POLL_ON_TX_TCP=1 parameter, which improves ACK processing time.
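
For example, something along these lines on the sending side (a sketch only; the client command, address, and library path are placeholders), combined with the earlier tuning:

```
VMA_RX_POLL_ON_TX_TCP=1 VMA_TCP_NODELAY=1 VMA_TX_BUFS=4000000 \
LD_PRELOAD=libvma.so iperf -c 192.168.1.10 -t 30
```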

Thanks,
Liran

@igor-ivanov
Collaborator

Hi @zorrohahaha, please try https://github.com/Mellanox/libvma/releases/tag/8.9.2 and compile VMA with --enable-tso.
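
A rough sketch of such a build, assuming the standard libvma autotools flow (the install prefix is a placeholder):

```
git clone -b 8.9.2 https://github.com/Mellanox/libvma.git
cd libvma
./autogen.sh
./configure --prefix=/usr --enable-tso
make -j
sudo make install
```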

@littleyanglovegithub

Hi, we also see this issue: the TCP bandwidth with VMA kernel bypass is much worse than through the kernel.

@igor-ivanov
Collaborator

@littleyanglovegithub
Did you try VMA 8.9.2 or later with the --enable-tso configuration option?
Did you run the server under the OS stack and the client under VMA?
What is the message size?

@daversun

daversun commented Nov 5, 2020

I strongly suggest stating the limitation "VMA is optimized for TCP packets below MTU/MSS size; there is a known disadvantage for high TCP bandwidth above MTU/MSS size" on the official website.

@zhentaoyang1982

> Please consider using the VMA_RX_POLL_ON_TX_TCP=1 parameter, which improves ACK processing time.

Hi, is there any update on VMA throughput performance for packet sizes larger than the MTU?

@igor-ivanov
Collaborator

igor-ivanov commented Jul 1, 2022

VMA can improve throughput compared with Linux for payload sizes below 16 KB. This is not always visible in simple benchmark scenarios such as iperf/sockperf. NGINX configured with multiple workers under properly configured VMA (--enable-tso and the related VMA environment options) demonstrates better throughput results than Linux.
As an example, the analysis at https://www.microsoft.com/en-us/research/uploads/prod/2019/08/p90-li.pdf (page 100) can be studied.
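
A minimal sketch of that kind of setup, assuming a preloaded NGINX with an illustrative worker count and option set (none of these values come from the comment above):

```
# nginx.conf (fragment): worker_processes 8;
VMA_RX_POLL_ON_TX_TCP=1 VMA_TX_BUFS=2000000 \
LD_PRELOAD=libvma.so nginx -c /etc/nginx/nginx.conf -g 'daemon off;'
```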
