-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/revert ipv4 mapped ipv6 stuff 10.1 #29
Open
RodrigoMNardi
wants to merge
126
commits into
master
Choose a base branch
from
fix/revert_ipv4_mapped_ipv6_stuff_10.1
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Christian Hopps <[email protected]>
lib: fix incorrect use of error checking macro
When removing a large number of routes, the linux kernel can take the cpu for an extended amount of time, leaving a situation where FRR detects a starvation event. r1# sharp install routes 10.0.0.0 nexthop 192.168.44.33 1000000 repeat 10 2024-06-14 12:55:49.365 [NTFY] sharpd: [M7Q4P-46WDR] vty[5]@# sharp install routes 10.0.0.0 nexthop 192.168.44.33 1000000 repeat 10 2024-06-14 12:55:49.365 [DEBG] sharpd: [YP4TQ-01TYK] Inserting 1000000 routes 2024-06-14 12:55:57.256 [DEBG] sharpd: [TPHKD-3NYSB] Installed All Items 7.890085 2024-06-14 12:55:57.256 [DEBG] sharpd: [YJ486-NX5R1] Removing 1000000 routes 2024-06-14 12:56:07.802 [WARN] zebra: [QH9AB-Y4XMZ][EC 100663314] STARVATION: task dplane_thread_loop (634377bc8f9e) ran for 7078ms (cpu time 220ms) 2024-06-14 12:56:25.039 [DEBG] sharpd: [WTN53-GK9Y5] Removed all Items 27.783668 2024-06-14 12:56:25.039 [DEBG] sharpd: [YP4TQ-01TYK] Inserting 1000000 routes 2024-06-14 12:56:32.783 [DEBG] sharpd: [TPHKD-3NYSB] Installed All Items 7.743524 2024-06-14 12:56:32.783 [DEBG] sharpd: [YJ486-NX5R1] Removing 1000000 routes 2024-06-14 12:56:41.447 [WARN] zebra: [QH9AB-Y4XMZ][EC 100663314] STARVATION: task dplane_thread_loop (634377bc8f9e) ran for 5175ms (cpu time 179ms) Let's modify the loop in dplane_thread_loop such that after a provider has been run, check to see if the event should yield, if so, stop and reschedule this for the future. Signed-off-by: Donald Sharp <[email protected]> (cherry picked from commit 6faad86)
Problem statement: ================== When a vrf is deleted from the kernel, before its removed from the FRR config, zebra gets to delete the the vrf and assiciated state. It does so by sending a request to delete the l3 vni associated with the vrf followed by a request to delete the vrf itself. 2023/10/06 06:22:18 ZEBRA: [JAESH-BABB8] Send L3_VNI_DEL 1001 VRF testVRF1001 to bgp 2023/10/06 06:22:18 ZEBRA: [XC3P3-1DG4D] MESSAGE: ZEBRA_VRF_DELETE testVRF1001 The zebra client communication is asynchronous and about 1/5 cases the bgp client process them in a different order. 2023/10/06 06:22:18 BGP: [VP18N-HB5R6] VRF testVRF1001(766) is to be deleted. 2023/10/06 06:22:18 BGP: [RH4KQ-X3CYT] VRF testVRF1001(766) is to be disabled. 2023/10/06 06:22:18 BGP: [X8ZE0-9TS5H] VRF disable testVRF1001 id 766 2023/10/06 06:22:18 BGP: [X67AQ-923PR] Deregistering VRF 766 2023/10/06 06:22:18 BGP: [K52W0-YZ4T8] VRF Deletion: testVRF1001(4294967295) .. and a bit later : 2023/10/06 06:22:18 BGP: [MRXGD-9MHNX] DJERNAES: process L3VNI 1001 DEL 2023/10/06 06:22:18 BGP: [NCEPE-BKB1G][EC 33554467] Cannot process L3VNI 1001 Del - Could not find BGP instance When the bgp vrf config is removed later it fails on the sanity check if l3vni is removed. if (bgp->l3vni) { vty_out(vty, "%% Please unconfigure l3vni %u\n", bgp->l3vni); return CMD_WARNING_CONFIG_FAILED; } Solution: ========= The solution is to make bgp cleanup the l3vni a bgp instance is going down. The fix: ======== The fix is to add a function in bgp_evpn.c to be responsible for for deleting the local vni, if it should be needed, and call the function from bgp_instance_down(). Testing: ======== Created a test, which can run in container lab that remove the vrf on the host before removing the vrf and the bgp config form frr. Running this test in a loop trigger the problem 18 times of 100 runs. After the fix it did not fail. To verify the fix a log message (which is not in the code any longer) were used when we had a stale l3vni and needed to call bgp_evpn_local_l3vni_del() to do the cleanup. This were hit 20 times in 100 test runs. Signed-off-by: Kacper Kwasny <[email protected]> bgpd: braces {} are not necessary for single line block Signed-off-by: Kacper Kwasny <[email protected]> (cherry picked from commit 171d258)
The backup_nexthop entry list has been populated by mistake, and should not. Fix this by reverting the introduced behavior. Fixes: 237ebf8 ("bgpd: rework bgp_zebra_announce() function, separate nexthop handling") Signed-off-by: Philippe Guibert <[email protected]> (cherry picked from commit d4390fc)
In case of EVPN MH bond, a member port going in protodown state due to external reason (one case being linkflap), frr updates the state correctly but upon manually clearing external reason trigger FRR to reinstate protodown without any reason code. Fix is to ensure if the protodown reason was external and new state is to have protodown 'off' then do no reinstate protodown. Ticket: #3947432 Testing: switch:#ip link show swp1 4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9216 qdisc pfifo_fast master bond1 state DOWN mode DEFAULT group default qlen 1000 link/ether 1c:34:da:2c:aa:68 brd ff:ff:ff:ff:ff:ff protodown on protodown_reason <linkflap> switch:#ip link set swp1 protodown off protodown_reason linkflap off switch:#ip link show swp1 4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9216 qdisc pfifo_fast master bond1 state DOWN mode DEFAULT group default qlen 1000 link/ether 1c:34:da:2c:aa:68 brd ff:ff:ff:ff:ff:ff Signed-off-by: Chirag Shah <[email protected]> (cherry picked from commit e4d843b)
bgpd: fix do not use api.backup_nexthop in ZAPI message (backport #16260)
zebra: fix evpn mh bond member proto reinstall (backport #16252)
bgpd: fixed failing to remove VRF if there is a stale l3vni (backport #16059)
Before this patch, we always printed the last reason "Waiting for OPEN", but if it's a manual shutdown, then we technically are not waiting for OPEN. Signed-off-by: Donatas Abraitis <[email protected]> (cherry picked from commit c25c7e9)
…ailed peer Before: ``` Neighbor EstdCnt DropCnt ResetTime Reason 127.0.0.1 0 0 never Waiting for peer OPEN (n/a) ``` After: ``` Neighbor EstdCnt DropCnt ResetTime Reason 127.0.0.1 0 0 never Waiting for peer OPEN (n/a) ``` Signed-off-by: Donatas Abraitis <[email protected]> (cherry picked from commit b5bd626)
…tware version If we receive CAPABILITY message (software-version), we SHOULD check if we really have enough data before doing memcpy(), that could also lead to buffer overflow. (data + len > end) is not enough, because after this check we do data++ and later memcpy(..., data, len). That means we have one more byte. Hit this through fuzzing by ``` 0 0xaaaaaadf872c in __asan_memcpy (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x35872c) (BuildId: 9c6e455d0d9a20f5a4d2f035b443f50add9564d7) 1 0xaaaaab06bfbc in bgp_dynamic_capability_software_version /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3713:3 2 0xaaaaab05ccb4 in bgp_capability_msg_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3839:4 3 0xaaaaab05c074 in bgp_capability_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3980:9 4 0xaaaaab05e48c in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4109:11 5 0xaaaaaae36150 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3 ``` Hit this again by Iggy \m/ Reported-by: Iggy Frankovic <[email protected]> Signed-off-by: Donatas Abraitis <[email protected]> (cherry picked from commit 5d7af51)
…N capability We advance data pointer (data++), but we do memcpy() with the length that is 1-byte over, which is technically heap overflow. ``` ==411461==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x50600011da1a at pc 0xc4f45a9786f0 bp 0xffffed1e2740 sp 0xffffed1e1f30 READ of size 4 at 0x50600011da1a thread T0 0 0xc4f45a9786ec in __asan_memcpy (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x3586ec) (BuildId: e794c5f796eee20c8973d7efb9bf5735e54d44cd) 1 0xc4f45abf15f8 in bgp_dynamic_capability_fqdn /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3457:4 2 0xc4f45abdd408 in bgp_capability_msg_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3911:4 3 0xc4f45abdbeb4 in bgp_capability_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3980:9 4 0xc4f45abde2cc in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4109:11 5 0xc4f45a9b6110 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3 ``` Found by fuzzing. Reported-by: Iggy Frankovic <[email protected]> Signed-off-by: Donatas Abraitis <[email protected]> (cherry picked from commit b685ab5)
clear evpn dup-addr cli returns error-msg for below conditions, - If evpn is not enabled & - If there is no VNI exists. supported command: ``` clear evpn dup-addr vni <vni-id> ``` Ticket: #3495573 Testing: bharat# clear evpn dup-addr vni all Error type: validation Error description: % EVPN not enabled bharat# clear evpn dup-addr vni 20 Error type: validation Error description: % VNI 20 does not exist Signed-off-by: Sindhu Parvathi Gopinathan's <[email protected]> Signed-off-by: Chirag Shah <[email protected]> (cherry picked from commit 56c16ee)
…t issues whith the following config router bgp 65001 no bgp ebgp-requires-policy neighbor 192.168.1.2 remote-as external neighbor 192.168.1.2 timers 3 10 ! address-family ipv4 unicast neighbor 192.168.1.2 route-map r2 in exit-address-family exit ! bgp as-path access-list FIRST seq 5 permit ^65 bgp as-path access-list SECOND seq 5 permit 2$ ! route-map r2 permit 6 match ip address prefix-list p2 set as-path exclude as-path-access-list SECOND exit ! route-map r2 permit 10 match ip address prefix-list p1 set as-path exclude 65003 exit ! route-map r2 permit 20 match ip address prefix-list p3 set as-path exclude all exit making some no bgp as-path access-list SECOND permit 2$ bgp as-path access-list SECOND permit 3$ clear bgp * no bgp as-path access-list SECOND permit 3$ bgp as-path access-list SECOND permit 2$ clear bgp * will induce some crashes thus we rework the links between aslists and aspath_exclude Signed-off-by: Francois Dumontet <[email protected]> (cherry picked from commit 094dcc3)
add some match in route map rules add some set unset bgp access path list add another prefix for better tests discrimination update expected results Signed-off-by: Francois Dumontet <[email protected]> (cherry picked from commit 0df2e14)
bgpd: Set last reset reason to admin shutdown if it was manually (backport #16242)
zebra: Prevent starvation in dplane_thread_loop (backport #16224)
bgpd: Check if we have really enough data before doing memcpy for software version (backport #16211)
bgpd: Check if we have really enough data before doing memcpy for FQDN capability (backport #16213)
RFC 8212 defines leak prevention for eBGP peers, but BGP-OAD defines a new peering type One Administrative Domain (OAD), where multiple ASNs could be used inside a single administrative domain. OAD allows sending non-transitive attributes, so this prevention should be relaxed too. Signed-off-by: Donatas Abraitis <[email protected]> (cherry picked from commit 3b98ddf)
Fixes: 79563af ("bgpd: Get 1 or 2 octets for Sub-TLV length (Tunnel Encap attr)") Signed-off-by: Donatas Abraitis <[email protected]> (cherry picked from commit 34b209f)
…tlvs When the packet is malformed it can use whatever values it wants. Let's check what the real data we have in a stream instead of relying on malformed values. Reported-by: Iggy Frankovic <[email protected]> Signed-off-by: Donatas Abraitis <[email protected]> (cherry picked from commit 9929486)
bgpd: Relax OAD (One-Administration-Domain) for RFC8212 (backport #16273)
bgpd: A couple more fixes for Tunnel encapsulation handling (backport #16214)
zebra: clear evpn dup-addr return error-msg when there is no vni (backport #16261)
bgpd: fix "bgp as-pah access-list" with "set aspath exclude" set/unset issue (backport #15838)
Under heavy system load with many peers in passive mode and a large number of routes, bgpd can enter an infinite loop. This occurs while processing timeout BGP_OPEN messages, which prevents it from accepting new connections. The following log entries illustrate the issue: >bgpd[6151]: [VX6SM-8YE5W][EC 33554460] 3.3.2.224: nexthop_set failed, resetting connection - intf 0x0 >bgpd[6151]: [P790V-THJKS][EC 100663299] bgp_open_receive: bgp_getsockname() failed for peer: 3.3.2.224 >bgpd[6151]: [HTQD2-0R1WR][EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 3.3.2.224 ... repeating The issue occurs when bgpd handles a massive number of routes in the RIB while receiving numerous BGP_OPEN packets. If bgpd is overloaded, it fails to process these packets promptly, leading the remote peer to close the connection and resend BGP_OPEN packets. When bgpd eventually starts processing these timeout BGP_OPEN packets, it finds the TCP connection closed by the remote peer, resulting in "bgp_stop()" being called. For each timeout peer, bgpd must iterate through the routing table, which is time-consuming and causes new incoming BGP_OPEN packets to timeout, perpetuating the infinite loop. To address this issue, the code is modified to check if the peer has been established at least once before calling "bgp_clear_route_all()". This ensures that routes are only cleared for peers that had a successful session, preventing unnecessary iterations over the routing table for peers that never established a connection. With this change, BGP_OPEN timeout messages may still occur, but in the worst case, bgpd will stabilize. Before this patch, bgpd could enter a loop where it was unable to accpet any new connections. Signed-off-by: Loïc Sang <[email protected]> (cherry picked from commit e0ae285)
bgpd: avoid clearing routes for peers that were never established (backport #16271)
Fix for a bug, where FRR fails to install route received for an unknown but later-created VRF - detailed description can be found here FRRouting/frr#13708 Signed-off-by: Piotr Suchy <[email protected]> (cherry picked from commit 8044d73)
This reverts commit 8599fe2.
This reverts commit f1b8364.
This reverts commit 5dd731a.
This reverts commit fc5a738.
This reverts commit 04c220b.
This reverts commit 2de4dfc.
This reverts commit ee0378c.
This reverts commit fc1dd2e.
This reverts commit 62913cb. Signed-off-by: Donatas Abraitis <[email protected]>
A crash happens when executing the following command: > ubuntu2204hwe# conf > ubuntu2204hwe(config)# router bgp 65500 > ubuntu2204hwe(config-router)# ! > ubuntu2204hwe(config-router)# address-family ipv4 unicast > ubuntu2204hwe(config-router-af)# sid vpn export auto > ubuntu2204hwe(config-router-af)# exit-address-family > ubuntu2204hwe(config-router)# ! > ubuntu2204hwe(config-router)# address-family ipv4 vpn > ubuntu2204hwe(config-router-af)# network 4.4.4.4/32 rd 55:55 label 556 > ubuntu2204hwe(config-router-af)# network 5.5.5.5/32 rd 662:33 label 232 > ubuntu2204hwe(config-router-af)# exit-address-family > ubuntu2204hwe(config-router)# exit > ubuntu2204hwe(config)# ! > ubuntu2204hwe(config)# no router bgp The crash analysis indicates a memory item has been freed. > #6 0x000076066a629c15 in mt_count_free (mt=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0) > at lib/memory.c:73 > #7 mt_count_free (ptr=0x60200038b4f0, mt=0x56b57be85e00 <MTYPE_BGP_NAME>) at lib/memory.c:69 > #8 qfree (mt=mt@entry=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0) at lib/memory.c:129 > #9 0x000056b57bb09ce9 in bgp_free (bgp=<optimized out>) at bgpd/bgpd.c:4120 > #10 0x000056b57bb0aa73 in bgp_unlock (bgp=<optimized out>) at ./bgpd/bgpd.h:2513 > #11 peer_free (peer=0x62a000000200) at bgpd/bgpd.c:1313 > #12 0x000056b57bb0aca8 in peer_unlock_with_caller (name=<optimized out>, peer=<optimized out>) > at bgpd/bgpd.c:1344 > #13 0x000076066a6dbb2c in event_call (thread=thread@entry=0x7ffc8cae1d60) at lib/event.c:2011 > #14 0x000076066a60aa88 in frr_run (master=0x613000000040) at lib/libfrr.c:1214 > #15 0x000056b57b8b2c44 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:543 Actually, the BGP_NAME item has not been used at allocation for static->prd_pretty, and this results in reaching 0 quicker at bgp deletion. Fix this by reassigning MTYPE_BGP_NAME to prd_pretty. Fixes: 16600df2c4f4 ("bgpd: fix show run of network route-distinguisher") Signed-off-by: Philippe Guibert <[email protected]> (cherry picked from commit 64594f8)
bgpd: fix memory type for static->prd_pretty (backport #16585)
…vert_ipv4_mapped_ipv6_stuff_10.1 # Conflicts: # bgpd/bgp_nht.c # bgpd/bgp_updgrp_packet.c # bgpd/bgp_vty.c # configure.ac # tests/topotests/bgp_nexthop_mp_ipv4_6/r2/bgp_ipv6_step2.json # tests/topotests/bgp_nexthop_mp_ipv4_6/r3/bgp_ipv6_step2.json # tests/topotests/bgp_nexthop_mp_ipv4_6/test_nexthop_mp_ipv4_6.py
This reverts commit 424fe0b.
This reverts commit b083885.
This reverts commit 778e0df.
This reverts commit 8599fe2.
This reverts commit f1b8364.
This reverts commit 5dd731a.
This reverts commit fc5a738.
This reverts commit 04c220b.
This reverts commit 2de4dfc.
This reverts commit ee0378c.
This reverts commit fc1dd2e.
This reverts commit 62913cb. Signed-off-by: Donatas Abraitis <[email protected]>
rzalamena
force-pushed
the
fix/revert_ipv4_mapped_ipv6_stuff_10.1
branch
from
August 15, 2024 14:52
e5030e5
to
0b550c4
Compare
ci:rerun |
…f_10.1' into fix/revert_ipv4_mapped_ipv6_stuff_10.1
…ped_ipv6_stuff_10.1
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.