Permanent MAC_Binding entries cause traffic blackhole #238

Open
ardenisov opened this issue Apr 2, 2024 · 5 comments

Comments

@ardenisov

ardenisov commented Apr 2, 2024

Hello!
I want to report a bug in the patch for MAC_Binding functionality: it seems to only add MAC_Binding rows, never update or delete them.
a2b88dc

In my setup I have a logical router with DNAT rules.
Whenever I create or delete logical routers, or DNAT rules on them, reusing the same IPs, I can see that MAC_Binding rows are not updated or deleted in the SBDB.
Below are some examples of the relationship between Port_Binding and MAC_Binding that I expect the OVN controller's pinctrl module to handle properly. But it does not :(

  • DNAT rule added
    Port_Binding
_uuid               : 7c16e012-5acc-498e-b656-79a19f5bb4d1
chassis             : 1f4aa70f-804a-4e98-b8ba-22db389be1e2
datapath            : 09b37624-d7d2-4a21-8758-3809ae319f62
encap               : []
external_ids        : {"neutron:cidrs"="10.14.0.253/24", "neutron:device_id"="9788b99d-351c-4741-92d3-2ee27ecd1e3f", "neutron:device_owner"="network:router_gateway", "neutron:network_name"=neutron-a0f9f5fd-e94b-44d9-a4b3-66082dd9dd5a, "neutron:port_name"="", "neutron:project_id"="", "neutron:revision_number"="1", "neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="", "neutron:subnet_pool_addr_scope6"=""}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : "ad363382-c4e1-42f1-a103-85a0decf8b73"
mac                 : [router]
nat_addresses       : ["fa:16:3e:a1:d9:1e 10.14.0.250"] <<<< This field updated!!!
options             : {l3gateway-chassis=az14-network-2, peer=lrp-ad363382-c4e1-42f1-a103-85a0decf8b73, shadow-port="true"}
parent_port         : []
requested_chassis   : []
tag                 : []
tunnel_key          : 6
type                : l3gateway
up                  : true
virtual_parent      : []

MAC_binding

_uuid               : b05316bd-4293-44c1-890c-ca2ca869241d  <<<< OVN must create this row if it does not exist!
datapath            : 06cb9489-07d5-4328-8543-aab635b1d8d1
ip                  : "10.14.0.250"
logical_port        : lrp-8c60913e-1e3f-44fe-ba6e-49ecf6ced01e
mac                 : "fa:16:3e:a1:d9:1e"   <<<< OVN must update this field with the new MAC if the row already exists!
  • DNAT rule deleted
    Port_Binding
_uuid               : 7c16e012-5acc-498e-b656-79a19f5bb4d1
chassis             : 1f4aa70f-804a-4e98-b8ba-22db389be1e2
datapath            : 09b37624-d7d2-4a21-8758-3809ae319f62
encap               : []
external_ids        : {"neutron:cidrs"="10.14.0.253/24", "neutron:device_id"="9788b99d-351c-4741-92d3-2ee27ecd1e3f", "neutron:device_owner"="network:router_gateway", "neutron:network_name"=neutron-a0f9f5fd-e94b-44d9-a4b3-66082dd9dd5a, "neutron:port_name"="", "neutron:project_id"="", "neutron:revision_number"="1", "neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="", "neutron:subnet_pool_addr_scope6"=""}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : "ad363382-c4e1-42f1-a103-85a0decf8b73"
mac                 : [router]
nat_addresses       : [] <<<< This field cleared!!!
options             : {l3gateway-chassis=az14-network-2, peer=lrp-ad363382-c4e1-42f1-a103-85a0decf8b73, shadow-port="true"}
parent_port         : []
requested_chassis   : []
tag                 : []
tunnel_key          : 6
type                : l3gateway
up                  : true
virtual_parent      : []

MAC_binding

_uuid               : b05316bd-4293-44c1-890c-ca2ca869241d  <<<< OVN must destroy this row!
datapath            : 06cb9489-07d5-4328-8543-aab635b1d8d1
ip                  : "10.14.0.250"
logical_port        : lrp-8c60913e-1e3f-44fe-ba6e-49ecf6ced01e
mac                 : "fa:16:3e:a1:d9:1e" 
  • Logical router added
    Port_Binding
_uuid               : 4fba6716-0eb1-4cb2-ac57-66184939e623  <<<< This row created!
chassis             : 1f4aa70f-804a-4e98-b8ba-22db389be1e2
datapath            : 890aef9d-0dd6-48e5-935b-d07951143c37
encap               : []
external_ids        : {"neutron:network_name"=neutron-a0f9f5fd-e94b-44d9-a4b3-66082dd9dd5a, "neutron:revision_number"="1", "neutron:router_name"="9788b99d-351c-4741-92d3-2ee27ecd1e3f", "neutron:subnet_ids"="8525a5ff-7e20-40f1-a768-b82b30378ac2"}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lrp-ad363382-c4e1-42f1-a103-85a0decf8b73
mac                 : ["fa:16:3e:a1:d9:1e 10.14.0.253/24"]
nat_addresses       : []
options             : {chassis-redirect-port=cr-lrp-ad363382-c4e1-42f1-a103-85a0decf8b73, l3gateway-chassis=az14-network-2, peer="ad363382-c4e1-42f1-a103-85a0decf8b73"}
parent_port         : []
requested_chassis   : []
tag                 : []
tunnel_key          : 1
type                : l3gateway
up                  : true
virtual_parent      : []

MAC_binding

_uuid               : f3435a82-05f8-4c08-8689-6ebbc4f6c7b4  <<<< OVN must create this row if it does not exist!
datapath            : 06cb9489-07d5-4328-8543-aab635b1d8d1
ip                  : "10.14.0.253"
logical_port        : lrp-8c60913e-1e3f-44fe-ba6e-49ecf6ced01e
mac                 : "fa:16:3e:a1:d9:1e"  <<<< OVN must update this field with the new MAC if the row already exists!
  • Logical router deleted
    Port_Binding
_uuid               : 4fba6716-0eb1-4cb2-ac57-66184939e623  <<<< This row deleted!
chassis             : 1f4aa70f-804a-4e98-b8ba-22db389be1e2
datapath            : 890aef9d-0dd6-48e5-935b-d07951143c37
encap               : []
external_ids        : {"neutron:network_name"=neutron-a0f9f5fd-e94b-44d9-a4b3-66082dd9dd5a, "neutron:revision_number"="1", "neutron:router_name"="9788b99d-351c-4741-92d3-2ee27ecd1e3f", "neutron:subnet_ids"="8525a5ff-7e20-40f1-a768-b82b30378ac2"}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lrp-ad363382-c4e1-42f1-a103-85a0decf8b73
mac                 : ["fa:16:3e:a1:d9:1e 10.14.0.253/24"]
nat_addresses       : []
options             : {chassis-redirect-port=cr-lrp-ad363382-c4e1-42f1-a103-85a0decf8b73, l3gateway-chassis=az14-network-2, peer="ad363382-c4e1-42f1-a103-85a0decf8b73"}
parent_port         : []
requested_chassis   : []
tag                 : []
tunnel_key          : 1
type                : l3gateway
up                  : true
virtual_parent      : []

MAC_binding

_uuid               : f3435a82-05f8-4c08-8689-6ebbc4f6c7b4  <<<< OVN must destroy this row!
datapath            : 06cb9489-07d5-4328-8543-aab635b1d8d1
ip                  : "10.14.0.253"
logical_port        : lrp-8c60913e-1e3f-44fe-ba6e-49ecf6ced01e
mac                 : "fa:16:3e:a1:d9:1e"
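The four cases above amount to keeping MAC_Binding rows in sync with the addresses OVN itself advertises. A minimal Python sketch of that expectation follows. It is illustrative only, not ovn-controller's pinctrl code, and the function names `parse_nat_addresses` and `reconcile` are hypothetical. It assumes `nat_addresses` entries have the form `"MAC IP [IP ...]"` as in the dumps, and that `existing` holds only rows that originated from OVN-owned addresses on one datapath and logical port (e.g. `lrp-8c60913e-1e3f-44fe-ba6e-49ecf6ced01e`). Applied to every row, the same diff would also delete bindings for IPs outside OVN, which is the difficulty raised in the maintainer's reply below.

```python
import ipaddress


def parse_nat_addresses(entries):
    """Map each IP listed in Port_Binding.nat_addresses to its MAC.

    Assumes entries like "fa:16:3e:a1:d9:1e 10.14.0.250"; tokens after the
    MAC that are not IP addresses are skipped.
    """
    bindings = {}
    for entry in entries:
        tokens = entry.split()
        if not tokens:
            continue  # empty entry, nothing to bind
        mac, rest = tokens[0], tokens[1:]
        for token in rest:
            try:
                ip = str(ipaddress.ip_address(token))
            except ValueError:
                continue  # not an IP address, ignore it
            bindings[ip] = mac
    return bindings


def reconcile(desired, existing):
    """Diff the desired {ip: mac} map against existing MAC_Binding rows.

    Returns (to_create, to_update, to_delete) for one datapath/logical port.
    """
    to_create = {ip: mac for ip, mac in desired.items() if ip not in existing}
    to_update = {ip: mac for ip, mac in desired.items()
                 if ip in existing and existing[ip] != mac}
    to_delete = sorted(ip for ip in existing if ip not in desired)
    return to_create, to_update, to_delete


# "DNAT rule added": the row for 10.14.0.250 must be created.
print(reconcile(parse_nat_addresses(["fa:16:3e:a1:d9:1e 10.14.0.250"]), {}))
# ({'10.14.0.250': 'fa:16:3e:a1:d9:1e'}, {}, [])

# "DNAT rule deleted": nat_addresses is [], so the stale row must be removed.
print(reconcile(parse_nat_addresses([]), {"10.14.0.250": "fa:16:3e:a1:d9:1e"}))
# ({}, {}, ['10.14.0.250'])
```

The diff is kept separate from the parsing so that the "create", "update" and "delete" expectations in the bullets map one-to-one onto the three returned collections.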

In the examples above, I created and deleted logical routers with the same external IP address twice; when the router was created the second time, traffic to its external IP was blackholed.
I see the same picture when some DNAT rules are added to a first router, that router is deleted, and the same DNAT rules (same IPs) are then created on a second router: traffic is blackholed again.

My setup:

  • OVN 22.03.3
  • OVS 2.17.7
@dceara
Collaborator

dceara commented Apr 18, 2024

Hi @ardenisov! Thanks for your report!

CC: @almusil

However, this is something we can't easily fix in the general case, because MAC bindings (the ARP cache) may also exist for IPs outside of OVN.

Partially related to this, we added support for MAC binding aging in 22.09:
1a947dd3073

Quoting from the original bug report that triggered that change:

In OpenStack, we have been doing some tricks in the past to workaround
the limitation of MAC_Binding entries not expiring.

Some of those tricks involve not monitoring the MAC_Binding table at all
to avoid OOM killers [0] or delete the entries upon association/disassociation
of a Floating IP [1].

Ideally, old (or better, unused) entries should be deleted helping reduce the
size of the database but also avoiding issues when reusing IP addresses.

Would it be an option for you to upgrade to a version that has it and enable the feature?

Thanks,
Dumitru

@george-shagov-cloud-ru

george-shagov-cloud-ru commented Jul 12, 2024

seems that it just adding but not updating or deleting MAC_Bindig rows.

It does; please have a look at the ovn.at "delete Mac bindings" test (133-134). It works, with no issues.
Please also have a look at the related issue, Issue-251.

The only case we have not yet managed to reproduce is SNAT/DNAT addition/removal together with the corresponding addition/removal of a MAC_Binding record in the SBDB.

Given the aging functionality, this case seems to be losing priority.

@ardenisov
Author

ardenisov commented Jul 25, 2024

Hello @dceara!
I set up two routers connected to the same logical network.
Both are configured with mac_binding_age_threshold.
Each has a MAC_Binding for the other.
But the MAC aging mechanism is not working.
r1:

_uuid               : f9c4f755-8295-47cd-94a7-d83d6aa53a17
copp                : []
enabled             : true
external_ids        : {}
load_balancer       : []
load_balancer_group : []
name                : neutron-10e8bd1d-807a-4c05-a075-f616be1b109d
nat                 : []
options             : {always_learn_from_arp_request="false", chassis=pd32-nsrv-006, dynamic_neigh_routers="true", mac_binding_age_threshold="300"}
policies            : []
ports               : [89a90541-6658-450b-a738-366313958e01, 91e8436c-0ac3-4c51-864e-8398cdbc1221, 932ce364-b68a-4b26-81c0-10ac470358fa]
static_routes       : [10209a70-39a2-4362-8b40-ad8d4254f48d]

r2:

_uuid               : c3a51f98-07d1-493a-bd2d-f9c71166cd55
copp                : []
enabled             : true
external_ids        : {}
load_balancer       : []
load_balancer_group : []
name                : neutron-adf35c8a-d2f7-4117-ab6a-0d62d44a187e
nat                 : [80e70386-c9fc-4af0-b330-de8d38441e34, c08d6f90-e4e5-4dd7-ae5f-19f89f0e7b63]
options             : {always_learn_from_arp_request="false", chassis=pd32-nsrv-005, dynamic_neigh_routers="true", mac_binding_age_threshold="300"}
policies            : []
ports               : [4e3404c1-8253-456d-801f-6f7704570036, 5935813e-3682-4d25-9b08-9f9554e23ea1]
static_routes       : [95b33cb4-9e51-479a-a0f4-c8dfbb8b3bf3, a05169cb-7f0a-48fa-aba7-e523983f81cc, b2587a92-8a5e-4a90-8f07-8ca088409030]

p1:

_uuid               : 932ce364-b68a-4b26-81c0-10ac470358fa
enabled             : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:97:7b:79"
name                : lrp-bd4d4e9d-f652-4ee6-be7b-19b76c0f470b
networks            : ["10.32.0.1/24"]
options             : {}
peer                : []
status              : {}

p2:

_uuid               : 4e3404c1-8253-456d-801f-6f7704570036
enabled             : []
external_ids        : {}
gateway_chassis     : [c8cb4a1b-c633-416d-8aca-6a61f76ede54, f242df81-7a19-493f-b12a-4e6868bae442]
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:b1:ea:5f"
name                : lrp-3ef7214f-1ccb-4b0d-b3ae-527d9b03cf15
networks            : ["10.32.0.253/24"]
options             : {}
peer                : []

mac1:

_uuid               : ff5056ea-172e-4517-831d-230dbee64dab
datapath            : a1b2ef69-5b91-48bf-958e-a1ba223b27d2
ip                  : "10.32.0.1"
logical_port        : lrp-3ef7214f-1ccb-4b0d-b3ae-527d9b03cf15
mac                 : "fa:16:3e:97:7b:79"
timestamp           : 1721899748761

mac2:

_uuid               : 0113ac8d-2ba8-4d52-9356-75462e238eec
datapath            : 3998ece8-bb0c-4070-948b-3127f49aca9b
ip                  : "10.32.0.253"
logical_port        : lrp-bd4d4e9d-f652-4ee6-be7b-19b76c0f470b
mac                 : "fa:16:3e:b1:ea:5f"
timestamp           : 1721899748728

There is no traffic between the routers, and the timestamps in the MAC_Bindings are not updated.
The MAC_Bindings above are still present in the SBDB even after the 5-minute timer has expired.
Our OVN build includes the mac_binding_age_threshold tests in the ovn.at file, and they pass.
What is wrong with the MAC_Binding rows above? Why are they not removed from the SBDB after the aging timer expires?
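The expectation in this comment reduces to a simple check: a row whose timestamp has not been refreshed within the threshold should be removed. The Python sketch below assumes that `MAC_Binding.timestamp` is in milliseconds since the epoch and that `mac_binding_age_threshold` is a plain whole number of seconds (our reading of the OVN man pages); the helper names are hypothetical and this is not OVN's aging code. With mac1's values, the row was last refreshed at 2024-07-25 09:29:08 UTC and should have aged out five minutes later.

```python
from datetime import datetime, timezone


def expiry_deadline_ms(timestamp_ms: int, threshold_s: int) -> int:
    """Time (ms since the epoch) after which the row counts as aged out."""
    return timestamp_ms + threshold_s * 1000


def is_expired(timestamp_ms: int, now_ms: int, threshold_s: int) -> bool:
    """True if the row has not been refreshed within threshold_s seconds.

    A threshold of 0 (or less) is treated here as "aging disabled".
    """
    if threshold_s <= 0:
        return False
    return now_ms > expiry_deadline_ms(timestamp_ms, threshold_s)


def to_utc(timestamp_ms: int) -> datetime:
    """Convert a millisecond timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


# mac1 from the dump above, with mac_binding_age_threshold="300".
ts = 1721899748761
print(to_utc(ts))                           # when the row was last refreshed
print(to_utc(expiry_deadline_ms(ts, 300)))  # when it should be gone
```

If the rows survive past that deadline while no traffic refreshes them, the problem lies in whatever performs the removal rather than in the timestamps themselves.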

@almusil
Contributor

almusil commented Jul 25, 2024

Hi, which OVN version are you testing the aging with? Also, could you please show the relevant datapath bindings (a1b2ef69-5b91-48bf-958e-a1ba223b27d2 and 3998ece8-bb0c-4070-948b-3127f49aca9b), so we can check whether they actually correspond to those two routers?

@ardenisov
Author

Hello, @almusil! Sorry, I missed your request.
My setup was recreated with different IDs, and the issue reproduces.
I am using OVN 24.03.
The new setup is:
r1:

_uuid               : c7698e88-019d-48dd-a6e8-97978f404624
copp                : []
enabled             : true
external_ids        : {}
load_balancer       : []
load_balancer_group : []
name                : neutron-ad7c01e9-b3c9-429b-9744-4cb0a24556b3
nat                 : [a13b62df-13af-40a5-aac6-82f410db70e6, da15b3c2-9f9c-48ef-ab7d-27e92e820a5e]
options             : {always_learn_from_arp_request="false", chassis=pd32-nsrv-006, dynamic_neigh_routers="true", mac_binding_age_threshold="5"}
policies            : []
ports               : [055e2390-891f-4c49-80ab-1d8466fc0735, a7f2a1a5-1e91-4cbe-b7ec-7cd48be0cae6, da099f13-2d94-4402-bc2d-c3a995f792c7]
static_routes       : [d72144e3-eeeb-4d14-ab2f-09e67a52886c]

r2:

_uuid               : c476e2ef-5cad-474c-8595-5ae6751ce3f8
copp                : []
enabled             : true
external_ids        : {}
load_balancer       : []
load_balancer_group : []
name                : neutron-63e4812c-8518-4076-9b52-554b2ed5d6eb
nat                 : [a2bc1bd4-c1ee-4402-9679-b4e2aefa3049, c2a28c7d-fd6a-41d9-b8d9-b9d39c0aa0ef]
options             : {always_learn_from_arp_request="false", chassis=pd32-nsrv-005, dynamic_neigh_routers="true", mac_binding_age_threshold="5"}
policies            : []
ports               : [2c548c50-735c-4b69-a58f-d05131074e73, 2f08362a-7822-4f7a-ac1f-c23104b80c5d]
static_routes       : [70140877-605b-4cf6-a23e-c2a7707fcea0, e1ed760c-3bec-4117-9f89-22f416e6ebd8]

p1:

_uuid               : 055e2390-891f-4c49-80ab-1d8466fc0735
enabled             : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:ac:43:76"
name                : lrp-7c33c1cc-d60d-42ff-87a0-2a2f12364ab5
networks            : ["10.32.0.1/24"]
options             : {}
peer                : []
status              : {}

p2:

_uuid               : 2c548c50-735c-4b69-a58f-d05131074e73
enabled             : []
external_ids        : {}
gateway_chassis     : [ba621169-c551-4da2-a429-ed0057259eb0, c8365b35-1ba1-40b1-a90a-2b2d1ba2693d]
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:f3:cf:0b"
name                : lrp-8bd496d6-acaa-416e-9042-54920959e72f
networks            : ["10.32.0.254/24"]
options             : {}
peer                : []
status              : {}

mac1:

_uuid               : 727a07a9-f83b-4087-bac1-b4186af9ddf7
datapath            : 497d0260-2754-4672-abf0-256bfd02cd17
ip                  : "10.32.0.254"
logical_port        : lrp-7c33c1cc-d60d-42ff-87a0-2a2f12364ab5
mac                 : "fa:16:3e:f3:cf:0b"
timestamp           : 1722947850158

dp1:

_uuid               : 497d0260-2754-4672-abf0-256bfd02cd17
external_ids        : {always_learn_from_arp_request="false", logical-router="c7698e88-019d-48dd-a6e8-97978f404624", mac_binding_age_threshold="5"}
load_balancers      : []
tunnel_key          : 26

mac2:

_uuid               : 1f335763-aa6c-41aa-ae3e-59067257bb5f
datapath            : c36656c4-c46c-46e1-8d7c-9ff0d3dc1220
ip                  : "10.32.0.1"
logical_port        : lrp-8bd496d6-acaa-416e-9042-54920959e72f
mac                 : "fa:16:3e:ac:43:76"
timestamp           : 1722947850192

dp2:

_uuid               : c36656c4-c46c-46e1-8d7c-9ff0d3dc1220
external_ids        : {always_learn_from_arp_request="false", logical-router="c476e2ef-5cad-474c-8595-5ae6751ce3f8", mac_binding_age_threshold="5"}
load_balancers      : []
tunnel_key          : 108

As you can see, mac_binding_age_threshold=5, but the related MAC_Bindings stay alive for as long as the related routers exist, even without any traffic between them, and their timestamps are not updated.
I'll keep this configuration for further investigation.
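almusil's question was whether each MAC_Binding's datapath really belongs to the router configured with the threshold. The Python sketch below performs that cross-check over the rows pasted in this comment. The dicts are hand-copied subsets of the dumps, and resolving a binding through `Datapath_Binding.external_ids:logical-router` is an assumption based on the dp1/dp2 rows shown, not an OVN API; `check_binding` is a hypothetical helper. Against this data both bindings resolve to the expected router with a 5-second threshold, which suggests the datapath mapping itself is not what prevents aging.

```python
# Subset of the NB/SB rows shown above; only the columns this check needs.
ROUTERS = {
    "c7698e88-019d-48dd-a6e8-97978f404624": {
        "name": "r1", "ports": {"lrp-7c33c1cc-d60d-42ff-87a0-2a2f12364ab5"}},
    "c476e2ef-5cad-474c-8595-5ae6751ce3f8": {
        "name": "r2", "ports": {"lrp-8bd496d6-acaa-416e-9042-54920959e72f"}},
}
DATAPATHS = {
    "497d0260-2754-4672-abf0-256bfd02cd17": {
        "logical-router": "c7698e88-019d-48dd-a6e8-97978f404624",
        "mac_binding_age_threshold": "5"},
    "c36656c4-c46c-46e1-8d7c-9ff0d3dc1220": {
        "logical-router": "c476e2ef-5cad-474c-8595-5ae6751ce3f8",
        "mac_binding_age_threshold": "5"},
}
MAC_BINDINGS = [
    {"ip": "10.32.0.254", "datapath": "497d0260-2754-4672-abf0-256bfd02cd17",
     "logical_port": "lrp-7c33c1cc-d60d-42ff-87a0-2a2f12364ab5"},
    {"ip": "10.32.0.1", "datapath": "c36656c4-c46c-46e1-8d7c-9ff0d3dc1220",
     "logical_port": "lrp-8bd496d6-acaa-416e-9042-54920959e72f"},
]


def check_binding(mb):
    """Resolve a MAC_Binding to (router_name, threshold_s, port_matches).

    Returns (None, None, False) when the datapath does not map to a known
    router through its logical-router external_id.
    """
    dp = DATAPATHS.get(mb["datapath"], {})
    router = ROUTERS.get(dp.get("logical-router"))
    if router is None:
        return None, None, False
    threshold = dp.get("mac_binding_age_threshold")
    return (router["name"],
            int(threshold) if threshold is not None else None,
            mb["logical_port"] in router["ports"])


for mb in MAC_BINDINGS:
    print(mb["ip"], check_binding(mb))
# 10.32.0.254 ('r1', 5, True)
# 10.32.0.1 ('r2', 5, True)
```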
