Bonding interfaces cannot be offloaded on Linux Kernel 4.9 #313

Open
Maokaman1 opened this issue Mar 1, 2017 · 11 comments

@Maokaman1

Maokaman1 commented Mar 1, 2017

Hello!

It seems that something changed in Linux 4.9 in the way it represents bonded Mellanox interfaces, which breaks VMA offloading on bonded interfaces.

[root@host2 ~]# uname -a
Linux host2 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux

[root@host2 ~]# LD_PRELOAD=libvma.so sockperf server
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.2.8-0 Development Snapshot built on Feb 27 2017 17:27:29
VMA INFO: Cmd Line: sockperf server
VMA INFO: Current Time: Wed Mar 1 09:59:12 2017
VMA INFO: Pid: 18020
VMA INFO: Architecture: x86_64
VMA INFO: Node: host2
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: ---------------------------------------------------------------------------
VMA WARNING: ************************************************************************
VMA WARNING: Your current max locked memory is: 65536. Please change it to unlimited.
VMA WARNING: Set this user's default to ulimit -l unlimited.
VMA WARNING: Read more about this topic in the VMA's User Manual.
VMA WARNING: ************************************************************************
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Bond bond0 will not be offloaded due to problem with it's slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: *******************************************************************************************************
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Bond bond0 will not be offloaded due to problem with it's slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: *******************************************************************************************************
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Bond bond0.10 will not be offloaded due to problem with it's slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: *******************************************************************************************************
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Bond bond0.8 will not be offloaded due to problem with it's slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: *******************************************************************************************************
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Bond bond0.8 will not be offloaded due to problem with it's slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: *******************************************************************************************************
VMA WARNING: **************************************************************
VMA WARNING: * NO IMMEDIATE ACTION NEEDED!
VMA WARNING: * Not enough hugepage resources for VMA memory allocation.
VMA WARNING: * VMA will continue working with regular memory allocation.
VMA INFO: * Optional:
VMA INFO: * 1. Switch to a different memory allocation type
VMA INFO: * (VMA_MEM_ALLOC_TYPE= 0 or 1)
VMA INFO: * 2. Restart process after increasing the number of
VMA INFO: * hugepages resources in the system:
VMA INFO: * "cat /proc/meminfo | grep -i HugePage"
VMA INFO: * "echo 1000000000 > /proc/sys/kernel/shmmax"
VMA INFO: * "echo 800 > /proc/sys/vm/nr_hugepages"
VMA WARNING: * Please refer to the memory allocation section in the VMA's
VMA WARNING: * User Manual for more information
VMA WARNING: ***************************************************************
sockperf: == version #2.7-54.git4e9e71bf405b ==
sockperf: [SERVER] listen on:
[ 0] IP = 0.0.0.0 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 18020] using recvfrom() to block on socket(s)
^Csockperf: Test end (interrupted by user)
sockperf: No messages were received on the server.
sockperf: cleanupAfterLoop() exit
[root@host2 ~]#

@liranoz12
Contributor

Hi @Maokaman1 ,

I could not reproduce the issue using kernel 4.9.11, Red Hat 6.4, and VMA 8.2.8.
Do you use Mellanox OFED? If so, please try reinstalling it with the --vma --add-kernel-support parameters.
What is the output of the ibstat command?
Can you please attach a VMA log at debug log level (run with VMA_TRACELEVEL=DEBUG)?

Thanks.
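
A debug-level log like the one requested here could be captured along these lines. This is only a minimal sketch: `VMA_TRACELEVEL` and the `LD_PRELOAD=libvma.so` invocation come from this thread, while the helper names, the log file name, and the assumption that VMA writes its log to the process's standard output or standard error are illustrative.

```python
import os
import subprocess


def vma_debug_env(base_env=None, libvma="libvma.so"):
    """Return a copy of the environment with VMA preloaded at DEBUG log level."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = libvma        # same preload used in the original report
    env["VMA_TRACELEVEL"] = "DEBUG"   # log level requested by the maintainers
    return env


def run_with_debug_log(cmd, log_path="vma_debug.log"):
    """Run cmd under VMA, writing stdout and stderr to log_path (hypothetical name)."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, env=vma_debug_env(), stdout=log,
                                stderr=subprocess.STDOUT)
    return result.returncode
```

For example, `run_with_debug_log(["sockperf", "server"])` would repeat the run from the report and leave the output in `vma_debug.log` for attaching to the issue.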

@Maokaman1
Author

Hello @liranoz12 ,

We use Arch Linux, which is not supported by Mellanox OFED.
So we only have these tools and libraries: Arch AUR Infiniband

Unfortunately, I have already returned the 2 dual-port MCX416A-CCAT (100Gb, Ethernet only) adapters that I had requested for testing, so I cannot investigate further at the moment.
I've attached a log that I saved back then (mlx5_bond_0 is a pretty suspicious device name).

Now I have only 2 single-port MCX455A-FCAT (56Gb VPI) adapters, and I cannot reproduce the problem with them.

@Maokaman1
Author

Hi @liranoz12 ,

Is there any ETA on resolving this issue with dual-port adapters?

@NirNitzani

Hi @Maokaman1 ,

We are not familiar with such an issue when using Mellanox OFED.
Have you been able to obtain a new board and test it with Mellanox OFED?

@Maokaman1
Author

Hi @NirNitzani ,
I've got a bunch of new MCX456A-ECAT adapters (dual-port again) and the problem is still there.
According to the community post "HowTo Configure RoCE over LAG (ConnectX-4)", the appearance of an aggregated mlx5_bond_0 device instead of two separate ones is typical behaviour when the requirements described in its "Setup" section are met. So it seems that libvma doesn't support this so-called "RoCE LAG mode". Can I somehow disable this mode to make libvma work again?
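
Whether a host has ended up with such an aggregated device can be checked by listing the RDMA devices the kernel exposes. The sketch below assumes the standard `/sys/class/infiniband` directory and treats the `mlx5_bond_` name prefix (taken from the `mlx5_bond_0` device seen in this thread) as the marker of RoCE LAG mode; the function name is hypothetical.

```python
import os


def find_lag_devices(sysfs_root="/sys/class/infiniband", prefix="mlx5_bond_"):
    """List RDMA devices whose name suggests an aggregated (RoCE LAG) device."""
    try:
        names = sorted(os.listdir(sysfs_root))
    except FileNotFoundError:
        return []  # RDMA stack not loaded, or not a Linux host
    return [name for name in names if name.startswith(prefix)]
```

An empty result means no device with that prefix is present; a non-empty one points at the devices that VMA declines to offload in the reports here.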

@NirNitzani

Hi @Maokaman1 ,

VMA does not support RoCE. You can work in ETH mode or IPoIB (supported in the latest OFED).
I suggest starting with our latest OFED/VMA release to ensure that everything works, and only then switching to your specific OS.

@Maokaman1
Author

Hi @NirNitzani ,
Unfortunately, CentOS 7.4 with Mellanox OFED installed creates that aggregated mlx5_bond_0 (RoCE LAG) device too.

# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)

# uname -a
Linux centos-1.local 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# modinfo mlx5_ib
filename: /lib/modules/3.10.0-693.2.2.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko
version: 4.1-1.0.2
license: Dual BSD/GPL
description: Mellanox Connect-IB HCA IB driver
author: Eli Cohen [email protected]
rhelversion: 7.4
srcversion: D88500BEA6DD3896298C88C
depends: mlx5_core,ib_core,mlx_compat
vermagic: 3.10.0-693.2.2.el7.x86_64 SMP mod_unload modversions


# /etc/init.d/openibd status

HCA driver loaded

Configured Mellanox EN devices:
mlx0
mlx1

Currently active Mellanox devices:
mlx0
mlx1

The following OFED modules are loaded:

rdma_ucm
rdma_cm
ib_ipoib
mlx4_core
mlx4_ib
mlx4_en
mlx5_core
mlx5_ib
ib_uverbs
ib_umad
ib_ucm
ib_cm
ib_core


# ibstat
CA 'mlx5_bond_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.20.1010
Hardware version: 0
Node GUID: 0x248a070300b1bcd8
System image GUID: 0x248a070300b1bcd8
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffeb1bcd8
Link layer: Ethernet


# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: net0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 70:4d:7b:63:25:c7 brd ff:ff:ff:ff:ff:ff
inet 192.168.110.145/24 brd 192.168.110.255 scope global net0
valid_lft forever preferred_lft forever
inet6 fe80::724d:7bff:fe63:25c7/64 scope link
valid_lft forever preferred_lft forever
7: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 02:56:fd:62:fd:1d brd ff:ff:ff:ff:ff:ff
8: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether 24:8a:07:b1:bc:d8 brd ff:ff:ff:ff:ff:ff
inet 10.17.17.2/24 brd 10.17.17.255 scope global bond1
valid_lft forever preferred_lft forever
inet 10.17.17.20/24 brd 10.17.17.255 scope global secondary bond1
valid_lft forever preferred_lft forever
inet6 fe80::268a:7ff:feb1:bcd8/64 scope link
valid_lft forever preferred_lft forever
9: mlx0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP qlen 1000
link/ether 24:8a:07:b1:bc:d8 brd ff:ff:ff:ff:ff:ff
10: mlx1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP qlen 1000
link/ether 24:8a:07:b1:bc:d8 brd ff:ff:ff:ff:ff:ff


[root@centos-1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-mlx0
DEVICE=mlx0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes
USERCTL=no

[root@centos-1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-mlx1
DEVICE=mlx1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes
USERCTL=no

[root@centos-1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
BONDING_OPTS="mode=4 miimon=100 fail_over_mac=0"
BOOTPROTO=none
ONBOOT=yes
IPADDR0=10.17.17.2
PREFIX0="24"
IPADDR1=10.17.17.20
PREFIX1="24"
USERCTL=no


libvma 8.3.7 bundled with MLNX_OFED:
[root@centos-1 ~]# LD_PRELOAD=/usr/lib64/libvma.so.8.3.7 sockperf sr
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.3.7-0 Release built on Aug 2 2017 03:21:48
VMA INFO: Cmd Line: sockperf sr
VMA INFO: OFED Version: MLNX_OFED_LINUX-4.1-1.0.2.0:
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: ---------------------------------------------------------------------------
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Bond bond1 will not be offloaded due to problem with it's slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: *******************************************************************************************************
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Bond bond1 will not be offloaded due to problem with it's slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: *******************************************************************************************************
sockperf: == version #3.1-16.gitc6a0d0e3ab53 ==
sockperf: [SERVER] listen on:
[ 0] IP = 0.0.0.0 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 5212] using recvfrom() to block on socket(s)


libvma 8.4.4 compiled from git:
[root@centos-1 ~]# LD_PRELOAD=/usr/lib64/libvma.so.8.4.4 sockperf sr
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 8.4.4-0 Development Snapshot built on Sep 18 2017 14:06:27
VMA INFO: Git: d2c8f24
VMA INFO: Cmd Line: sockperf sr
VMA INFO: Current Time: Mon Sep 18 16:21:48 2017
VMA INFO: Pid: 5384
VMA INFO: OFED Version: MLNX_OFED_LINUX-4.1-1.0.2.0:
VMA INFO: Architecture: x86_64
VMA INFO: Node: centos-1.local
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: ---------------------------------------------------------------------------
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Bond bond1 will not be offloaded due to problem with it's slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: *******************************************************************************************************
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Bond bond1 will not be offloaded due to problem with it's slaves.
VMA WARNING: * Check warning messages for more information.
VMA WARNING: *******************************************************************************************************
VMA WARNING: **************************************************************
VMA WARNING: * NO IMMEDIATE ACTION NEEDED!
VMA WARNING: * Not enough hugepage resources for VMA memory allocation.
VMA WARNING: * VMA will continue working with regular memory allocation.
VMA INFO: * Optional:
VMA INFO: * 1. Switch to a different memory allocation type
VMA INFO: * (VMA_MEM_ALLOC_TYPE!= 2)
VMA INFO: * 2. Restart process after increasing the number of
VMA INFO: * hugepages resources in the system:
VMA INFO: * "echo 1000000000 > /proc/sys/kernel/shmmax"
VMA INFO: * "echo 800 > /proc/sys/vm/nr_hugepages"
VMA WARNING: * Please refer to the memory allocation section in the VMA's
VMA WARNING: * User Manual for more information
VMA WARNING: ***************************************************************
sockperf: == version #3.1-16.gitc6a0d0e3ab53 ==
sockperf: [SERVER] listen on:
[ 0] IP = 0.0.0.0 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 5384] using recvfrom() to block on socket(s)

Debug mode on:
debug_libvma.so.8.4.4.txt

@liranoz12
Contributor

Hi @Maokaman1,

Thanks for your informative update.
This is a known issue when using VMA with CentOS 7.4.
Starting with kernel version 3.10.0-693 (the 7.4 kernel), a bond LAG consisting of exactly two ports is not offloaded if both ports belong to a single device.
Workaround: when creating a bond LAG, enslave at least two ports that belong to different devices under the bond.
A fix for this issue is on our roadmap.

Liran.
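
The condition described in this comment (exactly two slaves, both on one device) can be checked from sysfs. The sketch below is not VMA's own logic: it assumes that the two ports of a dual-port adapter appear as two functions of the same PCI slot (for example `0000:03:00.0` and `0000:03:00.1`), and all function names are hypothetical.

```python
import os


def bond_slaves(bond, net_root="/sys/class/net"):
    """Return the interfaces enslaved under a bond, as listed by the bonding driver."""
    with open(os.path.join(net_root, bond, "bonding", "slaves")) as f:
        return f.read().split()


def pci_device(iface, net_root="/sys/class/net"):
    """Return the PCI address of iface without its function suffix, e.g. '0000:03:00'."""
    target = os.path.realpath(os.path.join(net_root, iface, "device"))
    return os.path.basename(target).rsplit(".", 1)[0]


def slaves_share_one_device(bond, net_root="/sys/class/net"):
    """True when the bond has exactly two slaves and both sit on the same PCI device."""
    slaves = bond_slaves(bond, net_root)
    return len(slaves) == 2 and len({pci_device(s, net_root) for s in slaves}) == 1
```

On the CentOS host shown earlier, `slaves_share_one_device("bond1")` would be expected to return `True` if mlx0 and mlx1 are the two ports of one adapter, which is the case the workaround is meant to avoid.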

@Maokaman1
Author

Hi @liranoz12,

I've found another workaround that seems to work even on dual-port adapters: create a "dummy" bridge interface on top of the bond interface (and do not forget to migrate the IP address(es) from the bond to the bridge interface).
I'm not sure it is a production-ready workaround, but someone may nevertheless find this information useful.
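
One reading of this workaround, expressed as iproute2 commands, is sketched below. The script only builds and prints the commands as a dry run and changes nothing on the host. The bridge name `br0` is hypothetical, the addresses are the ones from the ifcfg-bond1 file earlier in this thread, and flushing every IPv4 address from the bond before re-adding them on the bridge is an assumption about how the migration would be done.

```python
def bridge_workaround_commands(bond, addresses, bridge="br0"):
    """Build iproute2 commands that put a bridge on top of bond and move its addresses."""
    cmds = [
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        # Remove all IPv4 addresses from the bond in one step.
        ["ip", "-4", "addr", "flush", "dev", bond],
        # Attach the bond to the bridge as its only port.
        ["ip", "link", "set", "dev", bond, "master", bridge],
    ]
    for addr in addresses:
        cmds.append(["ip", "addr", "add", addr, "dev", bridge])
    cmds.append(["ip", "link", "set", "dev", bridge, "up"])
    return cmds


# Dry run: print the commands for review instead of executing them.
for cmd in bridge_workaround_commands("bond1", ["10.17.17.2/24", "10.17.17.20/24"]):
    print(" ".join(cmd))
```

These commands are not persistent across reboots; on a CentOS system the equivalent settings would normally go into the network-scripts files instead.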

@liranoz12
Contributor

@Maokaman1,

Thanks for your update. We will check this workaround.
Liran.

@DanielLibenson
Collaborator

Hi @Maokaman1,
Thank you for the hint. A "dummy" bridge is a good, working workaround for mlx5 devices.
You may also use a "dummy" interface as an alternative workaround.
We will update our release notes accordingly.

Daniel
