Trying to stop non existing thread #3898

Closed
XQTING-Wout opened this issue Mar 22, 2024 · 3 comments · Fixed by #3901

@XQTING-Wout

Describe the bug

I was trying to reproduce a deadlock in a newer version of pjsip with a load test. The load test was set up as follows: an IVR sends multiple short SAVP calls to a simulation of an agent (both based on pjsip). The agent crashes when it handles the disconnect of those calls (sent by the IVR).

The exception is that the pool argument of pj_pool_get_used_size() is a nullptr (see call stack).
Presumably pjmedia_clock_stop() had already been called on the same clock from another thread just before this thread tried to do the same.
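
To illustrate the suspected race, here is a minimal sketch of a stop routine that tolerates two threads calling it at the same time. Everything in it is hypothetical: demo_clock, demo_clock_stop() and the other demo_ names are stand-ins, not the pjmedia_clock layout or API, and POSIX threads are assumed purely for illustration (the crash above was seen on Windows).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a media clock; NOT the real pjmedia_clock. */
typedef struct demo_clock {
    pthread_mutex_t lock;   /* serializes stop requests */
    void *pool;             /* resource released on stop, NULL afterwards */
    int teardown_count;     /* how many times the teardown actually ran */
} demo_clock;

static void demo_clock_init(demo_clock *clk)
{
    assert(clk != NULL);
    pthread_mutex_init(&clk->lock, NULL);
    clk->pool = malloc(64);
    assert(clk->pool != NULL);
    clk->teardown_count = 0;
}

/* Safe to call from several threads: only the first caller tears down,
 * later callers see pool == NULL under the lock and do nothing. */
static void demo_clock_stop(demo_clock *clk)
{
    assert(clk != NULL);
    pthread_mutex_lock(&clk->lock);
    if (clk->pool != NULL) {
        free(clk->pool);
        clk->pool = NULL;
        clk->teardown_count++;
    }
    pthread_mutex_unlock(&clk->lock);
}

static void *stop_worker(void *arg)
{
    demo_clock_stop((demo_clock *)arg);
    return NULL;
}

/* Races two stop calls on one clock and returns the number of teardowns. */
static int demo_race_two_stops(void)
{
    demo_clock clk;
    pthread_t a, b;
    int count;

    demo_clock_init(&clk);
    pthread_create(&a, NULL, stop_worker, &clk);
    pthread_create(&b, NULL, stop_worker, &clk);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    count = clk.teardown_count;
    pthread_mutex_destroy(&clk.lock);
    return count;
}
```

Without the lock and the NULL check, the second caller could touch the pool after the first caller has already released it, which would match the nullptr seen in the call stack.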

Steps to reproduce

  1. Send multiple call INVITEs (SAVP)
  2. Disconnect those calls from caller side after a short amount of time (150-250ms)

PJSIP version

2.14.1

Context

  • This issue happens on Windows
  • config_site.h contents: config_site.h.txt
  • OpenSSL 3.1.3

Log, call stack, etc

IvrLoadTest.exe!pj_pool_get_used_size(pj_pool_t * pool) Line 31
	at C:\Users\WoutVanKets\Novomind\ecom-call-sipstack\ecom-call-sip-stack-native\third-party-lib\pjsip\pjlib\include\pj\pool_i.h(31)
IvrLoadTest.exe!pj_pool_reset(pj_pool_t * pool) Line 281
	at C:\Users\WoutVanKets\Novomind\ecom-call-sipstack\ecom-call-sip-stack-native\third-party-lib\pjsip\pjlib\src\pj\pool.c(281)
IvrLoadTest.exe!pjmedia_clock_stop(pjmedia_clock * clock) Line 268
	at C:\Users\WoutVanKets\Novomind\ecom-call-sipstack\ecom-call-sip-stack-native\third-party-lib\pjsip\pjmedia\src\pjmedia\clock_thread.c(268)
[Inline Frame] IvrLoadTest.exe!dtls_media_stop_channel(dtls_srtp *) Line 1477
	at C:\Users\WoutVanKets\Novomind\ecom-call-sipstack\ecom-call-sip-stack-native\third-party-lib\pjsip\pjmedia\src\pjmedia\transport_srtp_dtls.c(1477)
IvrLoadTest.exe!dtls_media_stop(pjmedia_transport * tp) Line 1893
	at C:\Users\WoutVanKets\Novomind\ecom-call-sipstack\ecom-call-sip-stack-native\third-party-lib\pjsip\pjmedia\src\pjmedia\transport_srtp_dtls.c(1893)
[Inline Frame] IvrLoadTest.exe!pjmedia_transport_media_stop(pjmedia_transport * tp) Line 1021
	at C:\Users\WoutVanKets\Novomind\ecom-call-sipstack\ecom-call-sip-stack-native\third-party-lib\pjsip\pjmedia\include\pjmedia\transport.h(1021)
IvrLoadTest.exe!transport_media_stop(pjmedia_transport * tp) Line 2070
	at C:\Users\WoutVanKets\Novomind\ecom-call-sipstack\ecom-call-sip-stack-native\third-party-lib\pjsip\pjmedia\src\pjmedia\transport_srtp.c(2070)
[Inline Frame] IvrLoadTest.exe!pjmedia_transport_media_stop(pjmedia_transport * tp) Line 1021
	at C:\Users\WoutVanKets\Novomind\ecom-call-sipstack\ecom-call-sip-stack-native\third-party-lib\pjsip\pjmedia\include\pjmedia\transport.h(1021)
IvrLoadTest.exe!MediaLine::stopTransport(const std::string & callId) Line 140
	at C:\Users\WoutVanKets\Novomind\ecom-call-sipstack\ecom-call-sip-stack-native\shared-lib-dep\SipStackLib\MediaLine.cpp(140)
@sauwming
Member

I have created a patch to fix it in #3901.

One thing worth mentioning is that the app seems to initiate media stoppage by calling MediaLine::stopTransport(). I believe this shouldn't be necessary since typically the media should be stopped by other components, such as the call itself via pjsua_media_channel_deinit().

If the issue still persists, please provide us with the stack traces of the two threads that race, i.e. the threads that call pjmedia_transport_media_stop() at the same time.

@XQTING-Wout
Author

That seems to fix it, thanks!

We don't use pjsua for our IVR (for performance reasons); we use pjsip and pjmedia directly, which is why pjsua_media_channel_deinit() isn't called.

@XQTING-Wout
Author

Hi,
After looking into the fix some more, I saw that pjmedia_clock_stop() is now protected by a mutex only in dtls_media_stop(). However, pjmedia_clock_stop() is also called from within ssl_flush_wbio() without mutex protection, so we could still hit the error when a thread tries to stop a clock that another thread has already stopped.
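
One way to cover every call path is to route them all through a single helper that claims the clock exactly once. The sketch below uses a C11 atomic exchange for this. It is only an illustration of the idea, not the patch in #3901: demo_dtls, demo_stop_clock_once(), demo_media_stop() and demo_flush_wbio() are hypothetical names that merely mirror the two call sites mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical clock; it only counts how often it was stopped. */
typedef struct demo_clock {
    int stop_calls;
} demo_clock;

/* Hypothetical DTLS state holding the clock pointer atomically. */
typedef struct demo_dtls {
    _Atomic(demo_clock *) clock;  /* NULL once some path claimed the stop */
} demo_dtls;

static void demo_clock_stop(demo_clock *clk)
{
    assert(clk != NULL);
    clk->stop_calls++;
}

/* Shared helper: the atomic exchange hands the pointer to exactly one
 * caller; every later caller receives NULL and skips the stop.
 * Returns 1 if this call performed the stop, 0 otherwise. */
static int demo_stop_clock_once(demo_dtls *ds)
{
    demo_clock *clk = atomic_exchange(&ds->clock, (demo_clock *)NULL);
    if (clk == NULL)
        return 0;
    demo_clock_stop(clk);
    return 1;
}

/* Two stand-in call sites that both go through the shared helper. */
static int demo_media_stop(demo_dtls *ds) { return demo_stop_clock_once(ds); }
static int demo_flush_wbio(demo_dtls *ds) { return demo_stop_clock_once(ds); }

/* Calls both paths on the same state and returns the total stop count. */
static int demo_both_paths(void)
{
    demo_clock clk = { 0 };
    demo_dtls ds;

    atomic_init(&ds.clock, &clk);
    (void)demo_media_stop(&ds);   /* claims and stops the clock */
    (void)demo_flush_wbio(&ds);   /* finds NULL, does nothing */
    return clk.stop_calls;
}
```

The demo calls the two paths one after the other to stay deterministic. The design point is that no call site stops the clock directly, so a path that is added later cannot bypass the guard.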
