Dear PJSIP List / Maintainers,
As I've just been through debugging a deadlock in our application using
PJSIP / pjsua2, I would like to suggest a documentation improvement that
should help other users avoid running into the same trap as we did.
In our case, we implemented some code in the pj::Call::onCallState()
callback that used a mutex to modify our application state. Another
thread, which took the same application-state mutex, called out into PJSIP
with that mutex held. It was an innocent Call::getInfo() made under our own
mutex, but PJSIP deadlocked on this pattern in some very rare cases.
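For illustration, here is a minimal sketch of the pattern that bit us
(MyCall, appStateMutex_ and pollState() are our own, hypothetical names;
only pj::Call, onCallState() and getInfo() come from pjsua2):

    #include <mutex>
    #include <pjsua2.hpp>

    class MyCall : public pj::Call {
    public:
        using pj::Call::Call;

        // Runs on a PJSIP worker thread, with PJSUA_LOCK already held.
        void onCallState(pj::OnCallStateParam &prm) override {
            std::lock_guard<std::mutex> lock(appStateMutex_); // (A) app mutex taken while PJSUA_LOCK is held
            // ... update application state ...
        }

        // Called from an application thread.
        void pollState() {
            std::lock_guard<std::mutex> lock(appStateMutex_); // (B) app mutex held ...
            pj::CallInfo ci = getInfo();                      // ... then PJSUA_LOCK requested: lock-order inversion
            // ... use ci ...
        }

    private:
        std::mutex appStateMutex_;
    };

If (A) and (B) race, each thread ends up waiting for the lock that the
other one holds, which is exactly what the backtraces below show.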
To make developers aware, I would suggest improving the
pj::Call::onCallState() documentation as follows:
"
virtual void pj::Call::onCallState ( OnCallStateParam & prm )  [inline, virtual]
(OnCallStateParam: https://www.pjsip.org/pjsip/docs/html/structpj_1_1OnCallStateParam.htm)

Notify application when call state has changed. Application may then query
the call info to get the detail call states by calling the getInfo()
function
(https://www.pjsip.org/pjsip/docs/html/classpj_1_1Call.htm#a5057b298a0f5f5a74a8d37a144ca538e).

Note that this callback is invoked with PJSUA_LOCK held. The callback code
must therefore not acquire any lock or other resource that another thread
may be holding while that thread calls into PJSIP, otherwise a deadlock
can result.

Parameters
  prm   Callback parameter.

References PJ_UNUSED_ARG
(https://www.pjsip.org/pjlib/docs/html/group__pj__config.htm#ga7523f8576efd271c22b188eab05d7870).
"
What do you think?
Martin
PS: here are the backtraces of our deadlock, from pjsip 2.7.1 on Linux/armhf:
Thread 18 "pjsip"
#0  0x76acadf4 in __lll_lock_wait (futex=futex@entry=0x6263c77c, private=0) at lowlevellock.c:46  # trying to acquire our application mutex
#1  0x76ac3c64 in __GI___pthread_mutex_lock  # trying to acquire our application mutex
#2  0x00228320 in ?? ()  # our application code callback
#3  0x000ffd7c in ?? ()  # our application code callback; PJSUA_LOCK is apparently held
#4  0x76fd0d34 in pj::Call::processStateChange (this=this@entry=0x6263c6c0, prm=...) at ../src/pjsua2/call.cpp:723
#5  0x76face6c in pj::Endpoint::on_call_state (call_id=<optimized out>, e=0x711fe50c) at ../src/pjsua2/endpoint.cpp:1005
#6  0x76eff200 in pjsua_call_on_state_changed (inv=0x722a612c, e=0x711fe50c) at ../src/pjsua-lib/pjsua_call.c:3812
#7  0x76ec192c in inv_set_state (inv=0x722a612c, state=<optimized out>, e=<optimized out>) at ../src/pjsip-ua/sip_inv.c:317
#8  0x76ebd67c in mod_inv_on_tsx_state (tsx=0x626fcb54, e=0x711fe50c) at ../src/pjsip-ua/sip_inv.c:717
#9  0x76e73aa0 in pjsip_dlg_on_tsx_state (dlg=0x6240bd9c, tsx=0x626fcb54, e=0x711fe50c) at ../src/pjsip/sip_dialog.c:2066
#10 0x76e6bce0 in tsx_set_state (tsx=tsx@entry=0x626fcb54, state=state@entry=PJSIP_TSX_STATE_PROCEEDING, event_src_type=event_src_type@entry=PJSIP_EVENT_TX_MSG, event_src=event_src@entry=0x624095f4, flag=flag@entry=0) at ../src/pjsip/sip_transaction.c:1267
#11 0x76e6e228 in tsx_on_state_proceeding_uas (tsx=0x626fcb54, event=<optimized out>) at ../src/pjsip/sip_transaction.c:2710
#12 0x76e70188 in pjsip_tsx_send_msg (tsx=tsx@entry=0x626fcb54, tdata=tdata@entry=0x624095f4) at ../src/pjsip/sip_transaction.c:1789
#13 0x76e732dc in pjsip_dlg_send_response (dlg=0x6240bd9c, tsx=0x626fcb54, tdata=tdata@entry=0x624095f4) at ../src/pjsip/sip_dialog.c:1531
#14 0x76ec16f8 in pjsip_inv_send_msg (inv=0x722a612c, tdata=0x624095f4) at ../src/pjsip-ua/sip_inv.c:3231
#15 0x76efb7f8 in pjsua_call_answer2 (call_id=2, opt=0x0, code=180, reason=0x0, msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2317
#16 0x76efb94c in process_pending_call_answer (call=call@entry=0x76f3d3b8 <pjsua_var+30440>) at ../src/pjsua-lib/pjsua_call.c:978
#17 0x76efd280 in pjsua_call_on_incoming (rdata=rdata@entry=0x626efd4c) at ../src/pjsua-lib/pjsua_call.c:1666
#18 0x76f00698 in mod_pjsua_on_rx_request (rdata=<optimized out>) at ../src/pjsua-lib/pjsua_core.c:582
#19 0x76e569fc in pjsip_endpt_process_rx_data (endpt=0x983094, rdata=0x626efd4c, p=<optimized out>, p_handled=0x711fe8f4) at ../src/pjsip/sip_endpoint.c:893
#20 0x76e56bd4 in endpt_on_rx_msg (endpt=0x983094, status=<optimized out>, rdata=0x626efd4c) at ../src/pjsip/sip_endpoint.c:1043
#21 0x76e5dfa8 in pjsip_tpmgr_receive_packet (mgr=0x98e310, rdata=rdata@entry=0x626efd4c) at ../src/pjsip/sip_transport.c:1965
#22 0x76e60d68 in udp_on_read_complete (key=0x98b1f0, op_key=<optimized out>, bytes_read=1058) at ../src/pjsip/sip_transport_udp.c:170
#23 0x76cba7b4 in ioqueue_dispatch_read_event (ioqueue=<optimized out>, h=0x98b1f0) at ../src/pj/ioqueue_common_abs.c:605
#24 0x76cbc1c8 in pj_ioqueue_poll (ioqueue=0x98af40, timeout=timeout@entry=0x711fed88) at ../src/pj/ioqueue_epoll.c:812
#25 0x76e566fc in pjsip_endpt_handle_events2 (endpt=0x983094, max_timeout=max_timeout@entry=0x711fedc0, p_count=p_count@entry=0x711fedbc) at ../src/pjsip/sip_endpoint.c:744
#26 0x76f01704 in pjsua_handle_events (msec_timeout=<optimized out>) at ../src/pjsua-lib/pjsua_core.c:1988
#27 0x76f01744 in worker_thread (arg=<optimized out>) at ../src/pjsua-lib/pjsua_core.c:704
#28 0x76cbdbec in thread_main (param=0x9a93f4) at ../src/pj/os_core_unix.c:541
#29 0x76ac107c in start_thread (arg=0x711ff410) at pthread_create.c:339
Thread 22 "application"
#0  0x76acadf4 in __lll_lock_wait (futex=0x982c8c, private=0) at lowlevellock.c:46
#1  0x76ac3cc8 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x982c8c) at pthread_mutex_lock.c:116
#2  0x76cbd740 in pj_mutex_lock (mutex=0x982c8c) at ../src/pj/os_core_unix.c:1265
#3  0x76efa7e0 in PJSUA_LOCK () at ../include/pjsua-lib/pjsua_internal.h:580
#4  pjsua_call_get_info (call_id=2, info=info@entry=0x733fd2dc) at ../src/pjsua-lib/pjsua_call.c:1805
#5  0x76fd14ac in pj::Call::getInfo (this=0x6263c6c0) at ../src/pjsua2/call.cpp:408
#6  0x001c5cbc in ?? ()  # our application code, holding our application mutex
Thanks,
Martin