Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock on ssl_flush_wbio #3906

Closed
XQTING-Wout opened this issue Mar 26, 2024 · 1 comment · Fixed by #3905
Closed

Deadlock on ssl_flush_wbio #3906

XQTING-Wout opened this issue Mar 26, 2024 · 1 comment · Fixed by #3905

Comments

@XQTING-Wout
Copy link

Describe the bug

Same setup as in #3898 but with an older pjsip version. A loadtest in which an IVR sends multiple short SAVP calls to an simulation of an agent. (both based on pjsip without the pjsua layer).

The deadlock happens with two threads trying to join the same thread with pj_thead_join(clock->thread) in clock_thread.c. When the first thread joins, the second one can't join that thread anymore.

Hopefully this gives some more context of what goes wrong together with #3898.
I will try to recreate this deadlock on pjsip 2.14.1 as well.

Steps to reproduce

  1. Send multiple calls invites (SAVP) (without pjsua layer)
  2. Disconnect those calls from caller side after a short amount of time (150-250ms)

PJSIP version

2.13

Context

This issue happens on Linux
config_site.h contents: config_site.h.txt
OpenSSL 3.1.3

Log, call stack, etc

SipStack event thread 

[2024-03-13 10:56:35.193] [SipStack] [trace] [433102] 6152caf91e33e52e90370b61b0ec0bfa: event onCallDisconnected(normal)
[2024-03-13 10:56:35.193] [PjSip] [trace] [433102]                     L4 !UDP media transport already stopped
[2024-03-13 10:56:35.193] [PjSip] [trace] [433102]     dtls0x558ea8c0fb50 !dtls_media_stop()
[2024-03-13 10:56:35.193] [PjSip] [trace] [433102] event processing threa  Joining thread clock
[2024-03-13 10:56:35.440] [PjSip] [trace] [433102]    clock0x7f45a801f550 !reset(): cap=512, used=240(46%)
[2024-03-13 10:56:35.440] [PjSip] [trace] [433102] dtls_ssl_lock0x558ea8b  Mutex: thread event processing thread is waiting
[2024-03-13 10:56:35.440] [PjSip] [trace] [433102] dtls_ssl_lock0x558ea8b  Mutex acquired by thread event processing thread
[2024-03-13 10:56:35.440] [PjSip] [trace] [433102]     dtls0x558ea8c0fb50  Sent SSL3_RT_HEADER: Version = DTLS 1.2 (0xfefd)
[2024-03-13 10:56:35.440] [PjSip] [trace] [433102]     dtls0x558ea8c0fb50  Sent Alert Level=warning(1), description=close notify(0)
[2024-03-13 10:56:35.440] [PjSip] [trace] [433102] dtls_ssl_lock0x558ea8b  Mutex released by thread event processing thread
[2024-03-13 10:56:35.440] [PjSip] [debug] [433102]                     L4  Stopping ICE, reason=media stop requested
[2024-03-13 10:56:35.440] [PjSip] [trace] [433102]     glck0x558ea8c09730  Mutex: thread event processing thread is waiting

grep glck0x558ea8c09730
	count acquired 80
	count released 78

[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]     glck0x558ea8c09730  Mutex acquired by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]     glck0x558ea8c09730 !Mutex released by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]     glck0x558ea8c09730  Mutex released by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]     glck0x558ea8c09730  Mutex: thread media is waiting
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]     glck0x558ea8c09730  Mutex acquired by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]     glck0x558ea8c09730  Mutex released by thread media
[2024-03-13 10:56:35.248] [PjSip] [trace] [433101]     glck0x558ea8c09730  Mutex: thread SIP event loop thread is waiting
[2024-03-13 10:56:35.440] [PjSip] [trace] [433102]     glck0x558ea8c09730  Mutex: thread event processing thread is waiting
[2024-03-13 11:35:01.860] [PjSip] [trace] [433097]     glck0x558ea8c09730  .Mutex: thread thr0x7f45dcff0240 is waiting

thread media

[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]     glck0x558ea8c09730  Mutex: thread media is waiting
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]     glck0x558ea8c09730  Mutex acquired by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]      atm0x558ea8c09878  Mutex: thread media is waiting
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]      atm0x558ea8c09878  Mutex acquired by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]      atm0x558ea8c09878  Mutex released by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]     glck0x558ea8c09730  Mutex released by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]      atm0x558ea8c09878  Mutex: thread media is waiting
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]      atm0x558ea8c09878  Mutex acquired by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]      atm0x558ea8c09878  Mutex released by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104] dtls_ssl_lock0x558ea8b  Mutex: thread media is waiting
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104] dtls_ssl_lock0x558ea8b !Mutex acquired by thread media
[2024-03-13 10:56:35.207] [PjSip] [warning] [433104]     dtls0x558ea8c0fb50  DTLS-SRTP negotiation completed!
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104] dtls_ssl_lock0x558ea8b  Mutex released by thread media
[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]                  media  Joining thread clock

gdb of dump of deadlock + analysis

[Switching to thread 6 (Thread 0x7f45c5ffb700 (LWP 433102))]
#0  0x00007f45dcbc376d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f45dcbbcb54 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x0000558ea6bbf415 in pj_mutex_lock (mutex=0x558ea8c098e8) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjlib/src/pj/os_core_unix.c:1433
#3  0x0000558ea6bb1d11 in grp_lock_acquire (p=0x558ea8c097d8) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjlib/src/pj/lock.c:297
#4  0x0000558ea6bb1d75 in pj_grp_lock_acquire (grp_lock=<optimized out>) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjlib/src/pj/lock.c:486
#5  0x0000558ea6bda9c5 in pj_ice_strans_stop_ice (ice_st=0x558ea8c08748)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjnath/src/pjnath/ice_strans.c:1743
#6  0x0000558ea6b96c31 in set_no_ice (tp_ice=0x558ea8c02888, reason=reason@entry=0x558ea7011a75 "media stop requested", err=err@entry=0)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/src/pjmedia/transport_ice.c:789
#7  0x0000558ea6b96c82 in transport_media_stop (tp=<optimized out>)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/src/pjmedia/transport_ice.c:2249
#8  0x0000558ea6b9e9d6 in pjmedia_transport_media_stop (tp=<optimized out>)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/include/pjmedia/transport.h:1018
#9  transport_media_stop (tp=0x558ea8c0e7e8) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/src/pjmedia/transport_srtp.c:1939
#10 0x0000558ea6afd051 in pjmedia_transport_media_stop (tp=<optimized out>)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/shared-lib-dep/SipStackLib/../../third-party-lib/pjsip/pjmedia/include/pjmedia/transport.h:1018
#11 MediaLine::stopTransport (this=0x558ea8c04810, callId=...) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/shared-lib-dep/SipStackLib/MediaLine.cpp:139
#12 0x0000558ea6afd0bc in MediaLine::freeLine (this=<optimized out>, callId=...) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/shared-lib-dep/SipStackLib/MediaLine.cpp:125

frame #2

(gdb) display mutex->mutex
1: mutex->mutex = {__data = {__lock = 2, __count = 2, __owner = 433104, __nusers = 1, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\002\000\000\000Л\006\000\001\000\000\000\001", '\000' <repeats 22 times>, __align = 8589934594}
  
owner = 433104 = thread media  

[Switching to thread 8 (Thread 0x7f45c4ff9700 (LWP 433104))]
#0  0x00007f45dcbbb67d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
#1  0x0000558ea6bbf2ed in pj_thread_join (p=0x7f45a801f5f8) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjlib/src/pj/os_core_unix.c:876
#2  0x0000558ea6ba7c5f in pjmedia_clock_stop (clock=0x7f45a801ee18)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/src/pjmedia/clock_thread.c:245
#3  0x0000558ea6b9f71f in ssl_flush_wbio (ds=ds@entry=0x558ea8c0fbf8)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/src/pjmedia/transport_srtp_dtls.c:922
#4  0x0000558ea6ba1303 in ssl_on_recv_packet (len=<optimized out>, data=<optimized out>, ds=0x558ea8c0fbf8)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/src/pjmedia/transport_srtp_dtls.c:1265
#5  dtls_on_recv_rtp (tp=0x558ea8c0fbf8, pkt=<optimized out>, size=<optimized out>)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/src/pjmedia/transport_srtp_dtls.c:1377
#6  0x0000558ea6ba18ed in pjmedia_transport_send_rtp (size=1245, pkt=0x558ea8c0c7b8, tp=<optimized out>)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/include/pjmedia/transport.h:869
#7  srtp_rtp_cb (param=0x7f45c4ff2a70) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/src/pjmedia/transport_srtp.c:1465
#8  0x0000558ea6b968c6 in ice_on_rx_data (ice_st=<optimized out>, comp_id=<optimized out>, pkt=0x558ea8c0c7b8, size=1245, src_addr=0x558ea8c0d0b0, src_addr_len=<optimized out>)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjmedia/src/pjmedia/transport_ice.c:2561
#9  0x0000558ea6bd65c3 in pj_ice_sess_on_rx_pkt (ice=0x7f45b8071a38, comp_id=1, transport_id=<optimized out>, pkt=pkt@entry=0x558ea8c0c7b8, pkt_size=pkt_size@entry=1245,
    src_addr=src_addr@entry=0x558ea8c0d0b0, src_addr_len=16) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjnath/src/pjnath/ice_session.c:3692
#10 0x0000558ea6bd67a1 in stun_on_rx_data (stun_sock=<optimized out>, pkt=0x558ea8c0c7b8, pkt_len=1245, src_addr=0x558ea8c0d0b0, addr_len=16)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjnath/src/pjnath/ice_strans.c:2364
#11 0x0000558ea6be0d3b in on_data_recvfrom (asock=<optimized out>, data=0x558ea8c0c7b8, size=1245, src_addr=0x558ea8c0d0b0, addr_len=16, status=<optimized out>)
    at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjnath/src/pjnath/stun_sock.c:1001
	
(gdb) display rec->thread
2: rec->thread = 139937550628608 = 0x7F45BFFEF700

Joining thread clock

thread id doens't exist?

theory: thread doesn't exist anymore because already stopped outside of pj_lock_release(ds->ossl_lock); in ssl_flush_wbio

	[2024-03-13 10:56:35.193] [PjSip] [trace] [433102]     dtls0x558ea8c0fb50 !dtls_media_stop()
	[2024-03-13 10:56:35.193] [PjSip] [trace] [433102] event processing threa  Joining thread clock

	[2024-03-13 10:56:35.207] [PjSip] [warning] [433104]     dtls0x558ea8c0fb50  DTLS-SRTP negotiation completed!
	[2024-03-13 10:56:35.207] [PjSip] [trace] [433104] dtls_ssl_lock0x558ea8b  Mutex released by thread media
	[2024-03-13 10:56:35.207] [PjSip] [trace] [433104]                  media  Joining thread clock


grep thread clock -> id's 433108, 433109, 433110, 433111, 433112
conference bridges ???

thread 12
[Switching to thread 12 (Thread 0x7f45c2ff5700 (LWP 433108))]
frame 4
#4  0x0000558ea6bbf73d in thread_main (param=0x558ea97665a8) at /mnt/c/Users/WoutVanKets/Novomind/ecom-call-sipstack/ecom-call-sip-stack-native/third-party-lib/pjsip/pjlib/src/pj/os_core_unix.c:685
display *(pj_thread_t*)0x558ea97665a8
5: *(pj_thread_t*)0x558ea97665a8 = {obj_name = "clock", '\000' <repeats 26 times>, thread = 139937600984832, proc = 0x558ea6ba76a0 <clock_thread>, arg = 0x558ea9765338, signature1 = 0, signature2 = 0,
  suspended_mutex = 0x0}
  
thread 13	--> thread = 139937592592128  
thread 14   --> thread = 139937584199424
thread 15	--> thread = 139937575806720
thread 16	--> thread = 139937567414016
@sauwming sauwming linked a pull request Mar 27, 2024 that will close this issue
@sauwming
Copy link
Member

I have created a patch for this in #3905. Let us know if it resolves the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants