Deadlock when snd-auto-close=1 (the default)

SM
Scott M Ober
Thu, Apr 21, 2016 6:48 PM

I'm running PJSUA from PJSIP version 2.4.5 on both a Mac and on Linux,

I've modified the PJSUA to be a test app for our SIP servers.
I've added a thread that makes calls from all configured SIP URI's to a test number.

A call is made and when the DISCONNECTED state happens it starts the next call after waiting 3 seconds.

This works for about 6 hours typically when a deadlock occurs.

The deadlock always occurs because the close_snd_timer_cb has obtained a PJSUA_LOCK and some flaw in the code
in pa_unix_util.c PaUnixThread_Terminate is waiting for a join that never happens.

The setting for snd-auto-close is 1 (the default)

I'm retrying the system with snd-auto-close set to -1 and will relay the results.
(setting it to -1 works after 2 days)

It seems dangerous to keep the entire system waiting for a sound device to close and a thread to stop on a system wide lock.

Has anyone else experienced this problem?

-- Scott.

-- Code where lock is obtained

pjsua_aud.c

/* Timer callback to close sound device */
static void close_snd_timer_cb( pj_timer_heap_t *th,
pj_timer_entry *entry)
{
PJ_UNUSED_ARG(th);

 PJSUA_LOCK();
 if (entry->id) {
PJ_LOG(4,(THIS_FILE,"Closing sound device after idle for %d second(s)",
	  pjsua_var.media_cfg.snd_auto_close_time));

entry->id = PJ_FALSE;

close_snd_dev();
 }
 PJSUA_UNLOCK();

}

-- GDB backtrace of thread which owns the lock

(gdb) bt
#0  0x00007ffff707c66b in pthread_join (threadid=140737260861184, thread_return=0x7ffff3f0d8a8) at pthread_join.c:92
#1  0x000000000055a4fb in PaUnixThread_Terminate (self=0x7fffec003120, wait=1, exitResult=0x7ffff3f0d8e8) at src/../../../portaudio/src/os/unix/pa_unix_util.c:433
#2  0x0000000000561216 in RealStop (stream=0x7fffec002fa0, abort=0) at src/../../../portaudio/src/hostapi/alsa/pa_linux_alsa.c:2251
#3  0x00000000005612ca in StopStream (s=0x7fffec002fa0) at src/../../../portaudio/src/hostapi/alsa/pa_linux_alsa.c:2279
#4  0x00000000005599ad in Pa_StopStream (stream=0x7fffec002fa0) at src/../../../portaudio/src/common/pa_front.c:1423
#5  0x00000000004da37c in strm_stop (s=0x7fffec001b58) at ../src/pjmedia-audiodev/pa_dev.c:1239
#6  0x00000000004d7b8e in pjmedia_aud_stream_stop (strm=0x7fffec001b58) at ../src/pjmedia-audiodev/audiodev.c:838
#7  0x00000000004b6ac7 in stop_sound_device (snd_port=0x7fffec000bc8) at ../src/pjmedia/sound_port.c:347
#8  0x00000000004b709b in pjmedia_snd_port_destroy (snd_port=0x7fffec000bc8) at ../src/pjmedia/sound_port.c:538
#9  0x0000000000448f10 in close_snd_dev () at ../src/pjsua-lib/pjsua_aud.c:1902
#10 0x0000000000445ca7 in close_snd_timer_cb (th=0x920ee8, entry=0x8fe580 <pjsua_var+591008>) at ../src/pjsua-lib/pjsua_aud.c:448
#11 0x0000000000589e90 in pj_timer_heap_poll (ht=0x920ee8, next_delay=0x7ffff3f0de50) at ../src/pj/timer.c:643
#12 0x00000000004752f9 in pjsip_endpt_handle_events2 (endpt=0x920bc8, max_timeout=0x7ffff3f0de90, p_count=0x7ffff3f0de88) at ../src/pjsip/sip_endpoint.c:712
#13 0x000000000043386e in pjsua_handle_events (msec_timeout=10) at ../src/pjsua-lib/pjsua_core.c:1856
#14 0x000000000043139d in worker_thread (arg=0x0) at ../src/pjsua-lib/pjsua_core.c:696
#15 0x00000000005739f3 in thread_main (param=0x984e48) at ../src/pj/os_core_unix.c:541
#16 0x00007ffff707b182 in start_thread (arg=0x7ffff3f0e700) at pthread_create.c:312
#17 0x00007ffff663a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

-- Screen Output

[Thread 0x7ffff270b700 (LWP 32303) exited]
[New Thread 0x7ffff270b700 (LWP 32312)]
[Thread 0x7ffff270b700 (LWP 32312) exited]
[New Thread 0x7ffff270b700 (LWP 32317)]
[Thread 0x7ffff270b700 (LWP 32317) exited]
[New Thread 0x7ffff270b700 (LWP 32326)]
[Thread 0x7ffff270b700 (LWP 32326) exited]
[New Thread 0x7ffff270b700 (LWP 32335)]
[Thread 0x7ffff270b700 (LWP 32335) exited]
[New Thread 0x7ffff270b700 (LWP 32344)]
[Thread 0x7ffff270b700 (LWP 32344) exited]
[New Thread 0x7ffff270b700 (LWP 32351)]
13:12:49.382 tcpc0x7fffe800  TCP connect() error: Connection timed out [code=120110]
13:12:49.382    pjsua_app.c  SIP TCP transport is disconnected from [54.252.148.80:5062]: Connection timed out [status=120110]
[Thread 0x7ffff270b700 (LWP 32351) exited]
[New Thread 0x7ffff270b700 (LWP 32363)]
[Thread 0x7ffff270b700 (LWP 32363) exited]
[New Thread 0x7ffff270b700 (LWP 32370)]
[Thread 0x7ffff270b700 (LWP 32370) exited]
[New Thread 0x7ffff270b700 (LWP 32379)]
13:14:07.605 tcpc0x7fffe800  TCP connect() error: Connection refused [code=120111]
13:14:07.605    pjsua_app.c  SIP TCP transport is disconnected from [182.23.65.18:5062]: Connection refused [status=120111]
13:14:15.927  pjsua_media.c  ......pjmedia_transport_media_start() failed for call_id 0 media 0: Invalid SRTP crypto attribute (PJMEDIA_SRTP_ESDPINCRYPTO)
13:14:15.927  pjsua_media.c  ......Error updating media call00:0: Invalid SRTP crypto attribute (PJMEDIA_SRTP_ESDPINCRYPTO)
13:14:15.927  pjsua_call.c  .....Unable to create media session: No active media stream after negotiation (PJMEDIA_SDPNEG_ENOMEDIA) [status=220048]
[Thread 0x7ffff270b700 (LWP 32379) exited]
[New Thread 0x7ffff270b700 (LWP 32382)]
[Thread 0x7ffff270b700 (LWP 32382) exited]
[New Thread 0x7ffff270b700 (LWP 32391)]
[Thread 0x7ffff270b700 (LWP 32391) exited]
[New Thread 0x7ffff270b700 (LWP 32398)]
[Thread 0x7ffff270b700 (LWP 32398) exited]
[New Thread 0x7ffff270b700 (LWP 32405)]
[Thread 0x7ffff270b700 (LWP 32405) exited]
[New Thread 0x7ffff270b700 (LWP 32414)]
[Thread 0x7ffff270b700 (LWP 32414) exited]
[New Thread 0x7ffff270b700 (LWP 32432)]
13:17:18.433  pjsua_media.c  ......pjmedia_transport_media_start() failed for call_id 2 media 0: Invalid SRTP crypto attribute (PJMEDIA_SRTP_ESDPINCRYPTO)
13:17:18.433  pjsua_media.c  ......Error updating media call02:0: Invalid SRTP crypto attribute (PJMEDIA_SRTP_ESDPINCRYPTO)
13:17:18.434  pjsua_call.c  .....Unable to create media session: No active media stream after negotiation (PJMEDIA_SDPNEG_ENOMEDIA) [status=220048]
[Thread 0x7ffff270b700 (LWP 32432) exited]
[New Thread 0x7ffff270b700 (LWP 32437)]
[Thread 0x7ffff270b700 (LWP 32437) exited]
[New Thread 0x7ffff270b700 (LWP 32450)]

I'm running PJSUA from PJSIP version 2.4.5 on both a Mac and on Linux, I've modified the PJSUA to be a test app for our SIP servers. I've added a thread that makes calls from all configured SIP URI's to a test number. A call is made and when the DISCONNECTED state happens it starts the next call after waiting 3 seconds. This works for about 6 hours typically when a deadlock occurs. The deadlock always occurs because the close_snd_timer_cb has obtained a PJSUA_LOCK and some flaw in the code in pa_unix_util.c PaUnixThread_Terminate is waiting for a join that never happens. The setting for snd-auto-close is 1 (the default) I'm retrying the system with snd-auto-close set to -1 and will relay the results. (setting it to -1 works after 2 days) It seems dangerous to keep the entire system waiting for a sound device to close and a thread to stop on a system wide lock. Has anyone else experienced this problem? -- Scott. -- Code where lock is obtained pjsua_aud.c /* Timer callback to close sound device */ static void close_snd_timer_cb( pj_timer_heap_t *th, pj_timer_entry *entry) { PJ_UNUSED_ARG(th); PJSUA_LOCK(); if (entry->id) { PJ_LOG(4,(THIS_FILE,"Closing sound device after idle for %d second(s)", pjsua_var.media_cfg.snd_auto_close_time)); entry->id = PJ_FALSE; close_snd_dev(); } PJSUA_UNLOCK(); } -- GDB backtrace of thread which owns the lock (gdb) bt #0 0x00007ffff707c66b in pthread_join (threadid=140737260861184, thread_return=0x7ffff3f0d8a8) at pthread_join.c:92 #1 0x000000000055a4fb in PaUnixThread_Terminate (self=0x7fffec003120, wait=1, exitResult=0x7ffff3f0d8e8) at src/../../../portaudio/src/os/unix/pa_unix_util.c:433 #2 0x0000000000561216 in RealStop (stream=0x7fffec002fa0, abort=0) at src/../../../portaudio/src/hostapi/alsa/pa_linux_alsa.c:2251 #3 0x00000000005612ca in StopStream (s=0x7fffec002fa0) at src/../../../portaudio/src/hostapi/alsa/pa_linux_alsa.c:2279 #4 0x00000000005599ad in Pa_StopStream (stream=0x7fffec002fa0) at src/../../../portaudio/src/common/pa_front.c:1423 #5 0x00000000004da37c in strm_stop (s=0x7fffec001b58) at ../src/pjmedia-audiodev/pa_dev.c:1239 #6 0x00000000004d7b8e in pjmedia_aud_stream_stop (strm=0x7fffec001b58) at ../src/pjmedia-audiodev/audiodev.c:838 #7 0x00000000004b6ac7 in stop_sound_device (snd_port=0x7fffec000bc8) at ../src/pjmedia/sound_port.c:347 #8 0x00000000004b709b in pjmedia_snd_port_destroy (snd_port=0x7fffec000bc8) at ../src/pjmedia/sound_port.c:538 #9 0x0000000000448f10 in close_snd_dev () at ../src/pjsua-lib/pjsua_aud.c:1902 #10 0x0000000000445ca7 in close_snd_timer_cb (th=0x920ee8, entry=0x8fe580 <pjsua_var+591008>) at ../src/pjsua-lib/pjsua_aud.c:448 #11 0x0000000000589e90 in pj_timer_heap_poll (ht=0x920ee8, next_delay=0x7ffff3f0de50) at ../src/pj/timer.c:643 #12 0x00000000004752f9 in pjsip_endpt_handle_events2 (endpt=0x920bc8, max_timeout=0x7ffff3f0de90, p_count=0x7ffff3f0de88) at ../src/pjsip/sip_endpoint.c:712 #13 0x000000000043386e in pjsua_handle_events (msec_timeout=10) at ../src/pjsua-lib/pjsua_core.c:1856 #14 0x000000000043139d in worker_thread (arg=0x0) at ../src/pjsua-lib/pjsua_core.c:696 #15 0x00000000005739f3 in thread_main (param=0x984e48) at ../src/pj/os_core_unix.c:541 #16 0x00007ffff707b182 in start_thread (arg=0x7ffff3f0e700) at pthread_create.c:312 #17 0x00007ffff663a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 -- Screen Output [Thread 0x7ffff270b700 (LWP 32303) exited] [New Thread 0x7ffff270b700 (LWP 32312)] [Thread 0x7ffff270b700 (LWP 32312) exited] [New Thread 0x7ffff270b700 (LWP 32317)] [Thread 0x7ffff270b700 (LWP 32317) exited] [New Thread 0x7ffff270b700 (LWP 32326)] [Thread 0x7ffff270b700 (LWP 32326) exited] [New Thread 0x7ffff270b700 (LWP 32335)] [Thread 0x7ffff270b700 (LWP 32335) exited] [New Thread 0x7ffff270b700 (LWP 32344)] [Thread 0x7ffff270b700 (LWP 32344) exited] [New Thread 0x7ffff270b700 (LWP 32351)] 13:12:49.382 tcpc0x7fffe800 TCP connect() error: Connection timed out [code=120110] 13:12:49.382 pjsua_app.c SIP TCP transport is disconnected from [54.252.148.80:5062]: Connection timed out [status=120110] [Thread 0x7ffff270b700 (LWP 32351) exited] [New Thread 0x7ffff270b700 (LWP 32363)] [Thread 0x7ffff270b700 (LWP 32363) exited] [New Thread 0x7ffff270b700 (LWP 32370)] [Thread 0x7ffff270b700 (LWP 32370) exited] [New Thread 0x7ffff270b700 (LWP 32379)] 13:14:07.605 tcpc0x7fffe800 TCP connect() error: Connection refused [code=120111] 13:14:07.605 pjsua_app.c SIP TCP transport is disconnected from [182.23.65.18:5062]: Connection refused [status=120111] 13:14:15.927 pjsua_media.c ......pjmedia_transport_media_start() failed for call_id 0 media 0: Invalid SRTP crypto attribute (PJMEDIA_SRTP_ESDPINCRYPTO) 13:14:15.927 pjsua_media.c ......Error updating media call00:0: Invalid SRTP crypto attribute (PJMEDIA_SRTP_ESDPINCRYPTO) 13:14:15.927 pjsua_call.c .....Unable to create media session: No active media stream after negotiation (PJMEDIA_SDPNEG_ENOMEDIA) [status=220048] [Thread 0x7ffff270b700 (LWP 32379) exited] [New Thread 0x7ffff270b700 (LWP 32382)] [Thread 0x7ffff270b700 (LWP 32382) exited] [New Thread 0x7ffff270b700 (LWP 32391)] [Thread 0x7ffff270b700 (LWP 32391) exited] [New Thread 0x7ffff270b700 (LWP 32398)] [Thread 0x7ffff270b700 (LWP 32398) exited] [New Thread 0x7ffff270b700 (LWP 32405)] [Thread 0x7ffff270b700 (LWP 32405) exited] [New Thread 0x7ffff270b700 (LWP 32414)] [Thread 0x7ffff270b700 (LWP 32414) exited] [New Thread 0x7ffff270b700 (LWP 32432)] 13:17:18.433 pjsua_media.c ......pjmedia_transport_media_start() failed for call_id 2 media 0: Invalid SRTP crypto attribute (PJMEDIA_SRTP_ESDPINCRYPTO) 13:17:18.433 pjsua_media.c ......Error updating media call02:0: Invalid SRTP crypto attribute (PJMEDIA_SRTP_ESDPINCRYPTO) 13:17:18.434 pjsua_call.c .....Unable to create media session: No active media stream after negotiation (PJMEDIA_SDPNEG_ENOMEDIA) [status=220048] [Thread 0x7ffff270b700 (LWP 32432) exited] [New Thread 0x7ffff270b700 (LWP 32437)] [Thread 0x7ffff270b700 (LWP 32437) exited] [New Thread 0x7ffff270b700 (LWP 32450)]