From f727c63790606055979d9d4bd98d0ba3a04abbd0 Mon Sep 17 00:00:00 2001 From: zhou-run Date: Thu, 11 Jul 2024 11:35:34 +0800 Subject: [PATCH] isisd: fix crash when calculating the neighbor spanning tree based on the fragmented LSP 1. When the root IS regenerates an LSP, it calls lsp_build() -> lsp_clear_data() to free the TLV memory of the first fragment and all other fragments. If the number of fragments in the regenerated LSP decreases or if no fragmentation is needed, the extra LSP fragments are not immediately deleted. Instead, lsp_seqno_update() -> lsp_purge() is called to set the remaining time to zero and start aging, while also notifying other IS nodes to age these fragments. lsp_purge() usually does not reset lsp->hdr.seqno to zero because the LSP might recover during the aging process. 2. When other IS nodes receive an LSP, they always call process_lsp() -> isis_unpack_tlvs() to allocate TLV memory for the LSP. This does not differentiate whether the received LSP has a remaining lifetime of zero. Therefore, it is rare for an LSP of a non-root IS to have empty TLVs. Of course, if an LSP with a remaining time of zero and already corrupted is received, lsp_update() -> lsp_purge() will be called to free the TLV memory of the LSP, but this scenario is rare. 3. In LFA calculations, neighbors of the root IS are traversed, and each neighbor is taken as a new root to compute the neighbor SPT. During this process, the old root IS will serve as a neighbor of the new root IS, triggering a call to isis_spf_process_lsp() to parse the LSP of the old root IS and obtain its IP vertices and neighboring IS vertices. However, isis_spf_process_lsp() only checks whether the TLVs in the first fragment of the LSP exist, and does not check the TLVs in the fragmented LSP. If the TLV memory of the fragmented LSP of the old root IS has been freed, it can lead to a null pointer access, causing the current crash. Additionally, for the base SPT, there are only two places where the LSP of the root IS is parsed: 1. When obtaining the UP neighbors of the root IS via spf_adj_list_parse_lsp(). 2. When preloading the IP vertices of the root IS via isis_lsp_iterate_ip_reach(). Both of these checks ensure that frag->tlvs is not null, and they do not subsequently call isis_spf_process_lsp() to parse the root IS's LSP. It is very rare for non-root IS LSPs to have empty TLVs unless they are corrupted LSPs awaiting deletion. If it happens, a crash will occur. The backtrace is as follows: (gdb) bt #0 0x00007f3097281fe1 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f30973a2972 in core_handler (signo=11, siginfo=0x7ffce66c2870, context=0x7ffce66c2740) at ../lib/sigevent.c:261 #2 #3 0x000055dfa805512b in isis_spf_process_lsp (spftree=0x55dfa950eee0, lsp=0x55dfa94cb590, cost=10, depth=1, root_sysid=0x55dfa950ef6c "", parent=0x55dfa952fca0) at ../isisd/isis_spf.c:898 #4 0x000055dfa805743b in isis_spf_loop (spftree=0x55dfa950eee0, root_sysid=0x55dfa950ef6c "") at ../isisd/isis_spf.c:1688 #5 0x000055dfa805784f in isis_run_spf (spftree=0x55dfa950eee0) at ../isisd/isis_spf.c:1808 #6 0x000055dfa8037ff5 in isis_spf_run_neighbors (spftree=0x55dfa9474440) at ../isisd/isis_lfa.c:1259 #7 0x000055dfa803ac17 in isis_spf_run_lfa (area=0x55dfa9477510, spftree=0x55dfa9474440) at ../isisd/isis_lfa.c:2300 #8 0x000055dfa8057964 in isis_run_spf_with_protection (area=0x55dfa9477510, spftree=0x55dfa9474440) at ../isisd/isis_spf.c:1827 #9 0x000055dfa8057c15 in isis_run_spf_cb (thread=0x7ffce66c38e0) at ../isisd/isis_spf.c:1889 #10 0x00007f30973bbf04 in thread_call (thread=0x7ffce66c38e0) at ../lib/thread.c:1990 #11 0x00007f309735497b in frr_run (master=0x55dfa91733c0) at ../lib/libfrr.c:1198 #12 0x000055dfa8029d5d in main (argc=5, argv=0x7ffce66c3b08, envp=0x7ffce66c3b38) at ../isisd/isis_main.c:273 (gdb) f 3 #3 0x000055dfa805512b in isis_spf_process_lsp (spftree=0x55dfa950eee0, lsp=0x55dfa94cb590, cost=10, depth=1, root_sysid=0x55dfa950ef6c "", parent=0x55dfa952fca0) at ../isisd/isis_spf.c:898 898 ../isisd/isis_spf.c: No such file or directory. (gdb) p te_neighs $1 = (struct isis_item_list *) 0x120 (gdb) p lsp->tlvs $2 = (struct isis_tlvs *) 0x0 (gdb) p lsp->hdr $3 = {pdu_len = 27, rem_lifetime = 0, lsp_id = "\000\000\000\000\000\001\000\001", seqno = 4, checksum = 59918, lsp_bits = 1 '\001'} The backtrace provided above pertains to version 8.5.4, but it seems that the same issue exists in the code of the master branch as well. I have reviewed the process for calculating the SPT based on the LSP, and isis_spf_process_lsp() is the only function that does not check whether the TLVs in the fragments are empty. Therefore, I believe that modifying this function alone should be sufficient. If the TLVs of the current fragment are already empty, we do not need to continue processing subsequent fragments. This is consistent with the behavior where we do not process fragments if the TLVs of the first fragment are empty. Of course, one could argue that lsp_purge() should still retain the TLV memory, freeing it and then reallocating it if needed. However, this is a debatable point because in some scenarios, it is permissible for the LSP to have empty TLVs. For example, after receiving an SNP (Sequence Number PDU) message, an empty LSP (with lsp->hdr.seqno = 0) might be created by calling lsp_new. If the corresponding LSP message is discarded due to domain or area authentication failure, the TLV memory wouldn't be allocated. Test scenario: In an LFA network, importing a sufficient number of static routes to cause LSP fragmentation, and then rolling back the imported static routes so that the LSP is no longer fragmented, can easily result in this issue. Signed-off-by: zhou-run (cherry picked from commit e905177a8c9d67713682d46934c7a87a0913c250) --- isisd/isis_spf.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/isisd/isis_spf.c b/isisd/isis_spf.c index e349373372..1197f8c372 100644 --- a/isisd/isis_spf.c +++ b/isisd/isis_spf.c @@ -873,6 +873,9 @@ static int isis_spf_process_lsp(struct isis_spftree *spftree, || (mt_router_info && !mt_router_info->overload)); lspfragloop: + if (!lsp->tlvs) + return ISIS_OK; + if (lsp->hdr.seqno == 0) { zlog_warn("%s: lsp with 0 seq_num - ignore", __func__); return ISIS_WARNING;