janu@sympalog.de
Fri, Mar 17, 2017 8:58 AM
This also seems like the one I reported (it also happens if the application calls pjsua_call_hangup() between receiving a SIP BYE and sending the corresponding OK). I also got no response to that.
Regards,
Thomas
------ Original message ------
From: Alex Hermann
Date: Fri, 17 March 2017 09:49
To: pjsip@lists.pjsip.org
Subject: Re: [pjsip] Interesting deadlock bug found causing three threads to deadlock on the PJSUA, UA, and transaction locks.
On Thursday 16 March 2017 14:18:25 CET David Richards wrote:
> I found a bug that causes my application (using the PJSUA API) to deadlock
This looks like the same deadlock I reported on 7 March 2017. Unfortunately, I got no response to it.
--
Alex Hermann
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
Ming
Mon, Mar 20, 2017 10:33 AM
Hi all,
Thanks for the report and sorry for the delay in answering.
For the problem reported by David, we think it is actually caused by
thread 1 (not thread 3), which shouldn't try to obtain the PJSUA lock
while holding the transaction lock. So we propose the attached fix.
Please revert your temporary fix (where you removed the acquisition of the
group lock in pjsip_tsx_layer_find_tsx()), use our patch instead, and let
us know whether it fixes the issue. Thanks.
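Ming's rule (do not take the PJSUA lock while a transaction lock is held) is an instance of a lock hierarchy: every lock gets a rank, and a thread may only acquire locks in increasing rank order, which rules out the circular wait behind a multi-thread deadlock. The C sketch below shows one way to check such a hierarchy at runtime. ranked_mutex, RANK_PJSUA, RANK_TSX and the demo locks are hypothetical names, not pjsip APIs, and the two ranks only encode the single ordering stated in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical ranks: outer locks get lower numbers. Ranking the PJSUA lock
 * below the transaction lock encodes "never take the PJSUA lock while a
 * transaction lock is held". */
enum { RANK_PJSUA = 1, RANK_TSX = 2 };

typedef struct {
    pthread_mutex_t mutex;
    int rank;        /* position in the hierarchy, must be > 0 */
    int saved_rank;  /* holder's previous rank; only valid while held */
} ranked_mutex;

/* Highest rank held by the calling thread (0 = none). Assumes locks are
 * released in reverse order of acquisition; recursive locking is not modeled. */
static _Thread_local int held_rank = 0;

/* Returns 0 on success, or -1 (without locking) on an ordering violation. */
static int ranked_lock(ranked_mutex *m) {
    assert(m->rank > 0);
    if (m->rank <= held_rank) {
        fprintf(stderr, "lock order violation: rank %d requested while holding rank %d\n",
                m->rank, held_rank);
        return -1;
    }
    pthread_mutex_lock(&m->mutex);
    m->saved_rank = held_rank;
    held_rank = m->rank;
    return 0;
}

static void ranked_unlock(ranked_mutex *m) {
    held_rank = m->saved_rank;   /* read before unlocking: another thread may take m next */
    pthread_mutex_unlock(&m->mutex);
}

/* Hypothetical stand-ins for the two locks discussed in the thread. */
static ranked_mutex demo_pjsua_lock = { PTHREAD_MUTEX_INITIALIZER, RANK_PJSUA, 0 };
static ranked_mutex demo_tsx_lock   = { PTHREAD_MUTEX_INITIALIZER, RANK_TSX, 0 };
```

A checker like this reports the wrong order at the moment it is attempted, even when no other thread happens to be contending, so ordering bugs can surface in ordinary testing instead of only as a rare production deadlock.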
As for the deadlock reported by Alex, despite the similarity, I believe
it is an unrelated issue, so I'll reply to it separately in the original
thread.
Finally, Janu: without a stack trace, I can't be certain that your
problem is the same as one or both of these issues. So my suggestion is to
apply both patches and see whether the problem disappears. I would also
recommend upgrading to version 2.6, if you haven't already, since 2.3 is
quite old.
Best regards,
Ming
David Richards
Mon, Mar 20, 2017 8:05 PM
Hi Ming.
Thanks for the quick response. However, your patch doesn't fix the problem;
it just moves it, as shown in the Thread 1 backtrace below. Your patch
stopped the PJSUA lock from being acquired in pjsua_call_on_state_changed()
(frame 7); however, just two frames deeper in the same call chain, my
application's on_call_state() (frame 6) calls pjsua_call_get_info()
(frame 5), which attempts to take the PJSUA lock and deadlocks.
I didn't include the backtraces of the other threads because they are
blocked essentially the same way as before. I admit I don't know this
software well, but why was my initial analysis incorrect? Don't the locks
always have to be acquired in the same order?
Thanks for your help,
Dave
Thread 1 (Thread 0x7f1a3935d840 (LWP 27501)):
#0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 in _L_lock_839 () from /lib64/libpthread.so.0
#2 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 in pj_mutex_lock (mutex=0x4594238) at ../src/pj/os_core_unix.c:1265
#4 in PJSUA_LOCK () at ../include/pjsua-lib/pjsua_internal.h:575
#5 in pjsua_call_get_info (call_id=1091, info=0x7fff7ce7c610) at ../src/pjsua-lib/pjsua_call.c:1817
#6 in on_call_state (call_id=1091, e=0x7fff7ce7ceb0) at TERMmain.c:509
#7 in pjsua_call_on_state_changed (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsua-lib/pjsua_call.c:3824
#8 in inv_set_state (inv=0x7f1a05272688, state=PJSIP_INV_STATE_DISCONNECTED, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:317
#9 in inv_on_state_incoming (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:4310
#10 in mod_inv_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:717
#11 in pjsip_dlg_on_tsx_state (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_dialog.c:2064
#12 in mod_ua_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_ua_layer.c:178
#13 in tsx_set_state (tsx=0x7f1a05273608, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_TX_MSG, event_src=0x7f1a043c0c38, flag=0) at ../src/pjsip/sip_transaction.c:1235
#14 in tsx_on_state_proceeding_uas (tsx=0x7f1a05273608, event=0x7fff7ce7cfb0) at ../src/pjsip/sip_transaction.c:2819
#15 in pjsip_tsx_send_msg (tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_transaction.c:1751
#16 in pjsip_dlg_send_response (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_dialog.c:1529
#17 in pjsip_inv_send_msg (inv=0x7f1a05272688, tdata=0x7f1a043c0c38) at ../src/pjsip-ua/sip_inv.c:3227
#18 in pjsua_call_hangup (call_id=1091, code=606, reason=0x0, msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2426
#19 in timer_callback (timer_heap=0x7f1a15cc85f8, timer=0xbf6e90 <dialer_info+3972080>) at TERMmain.c:313
#20 in pj_timer_heap_poll (ht=0x7f1a15cc85f8, next_delay=0x0) at ../src/pj/timer.c:643
#21 in TIMERpoll () at TIMERmain.c:72
#22 in main_func (argc=1, argv=0x7fff7ce7d788) at main.c:352
#23 in pj_run_app (main_func=0x40c6a7 <main_func>, argc=1, argv=0x7fff7ce7d788, flags=0) at ../src/pj/os_core_unix.c:1952
#24 in main (argc=1, argv=0x7fff7ce7d788) at main.c:782
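Frames 4 to 6 of the backtrace show where the lock is requested: the application's on_call_state() callback calls pjsua_call_get_info(), which takes the PJSUA lock while library locks further down the stack are still held. Until a library fix is in place, one application-side mitigation (an assumption on our part, not something proposed in this thread) is to keep the callback trivial: record the call ID in an application-owned queue and do the lock-taking work later from the application's own loop, where no library lock is held. The minimal, self-contained C sketch below shows such a queue; call_queue, queue_push(), queue_pop() and demo_pending are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAP 64

/* Hypothetical app-side FIFO of call IDs. Its mutex is a leaf lock: nothing
 * else is ever acquired while it is held, so it cannot join a lock cycle. */
typedef struct {
    pthread_mutex_t mutex;
    int items[QUEUE_CAP];
    int head;   /* index of the oldest item */
    int count;  /* number of queued items */
} call_queue;

static call_queue demo_pending = { PTHREAD_MUTEX_INITIALIZER, {0}, 0, 0 };

/* Safe to call from a library callback: takes only the queue's own mutex.
 * Returns false if the queue is full. */
static bool queue_push(call_queue *q, int call_id) {
    bool ok = false;
    pthread_mutex_lock(&q->mutex);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = call_id;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->mutex);
    return ok;
}

/* Called from the application's own loop, where no library lock is held.
 * Returns false if the queue is empty. */
static bool queue_pop(call_queue *q, int *call_id) {
    bool ok = false;
    assert(call_id != NULL);
    pthread_mutex_lock(&q->mutex);
    if (q->count > 0) {
        *call_id = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        ok = true;
    }
    pthread_mutex_unlock(&q->mutex);
    return ok;
}
```

In this arrangement, on_call_state() would only call queue_push(&demo_pending, call_id), and the main loop would drain the queue with queue_pop() before calling pjsua_call_get_info(). The deferred handler has to tolerate the call having already ended by the time it runs, and the callback's event pointer (a stack address, e=0x7fff7ce7ceb0 above) must not be stored for later use.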
Ming
Tue, Mar 21, 2017 10:08 AM
Hi David,
Ah, right, I completely forgot that the application can acquire the PJSUA
lock in the callback. So even though, technically, the library itself
doesn't cause the deadlock, the previous patch is still not a practical
solution.
Your initial analysis seems to be correct. It's just that last time, I
hadn't yet found a way to prevent the transaction from being destroyed (as
previously reported in ticket #1706, https://trac.pjsip.org/repos/ticket/1706),
but now I think I have (in the attached patch).
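Releasing the lock before invoking a callback avoids the lock-ordering problem, but another thread may then destroy the object while the callback runs, which is presumably the risk Ming refers to. A common remedy is reference counting: take an extra reference while still holding the lock, unlock, call out, then drop the reference. The C sketch below illustrates that technique with a hypothetical tsx struct and helper names; it is not the contents of the attached patch, which is not shown here. (pjlib's group lock, pj_grp_lock_t, provides reference counting through pj_grp_lock_add_ref() and pj_grp_lock_dec_ref(); whether the patch relies on it is not stated in the thread.)

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transaction (not pjsip's pjsip_transaction). */
typedef struct tsx tsx;
typedef void (*tsx_state_cb)(tsx *t, int new_state);

struct tsx {
    pthread_mutex_t lock;
    int refcount;          /* protected by lock */
    int state;             /* protected by lock */
    tsx_state_cb on_state; /* may take other (outer) locks */
};

static int tsx_freed = 0;  /* instrumentation for the example only */

static tsx *tsx_create(tsx_state_cb cb) {
    tsx *t = calloc(1, sizeof *t);
    if (t == NULL)
        return NULL;
    pthread_mutex_init(&t->lock, NULL);
    t->refcount = 1;       /* the owner's reference */
    t->on_state = cb;
    return t;
}

/* Drops one reference; frees the object when the last one is gone. */
static void tsx_dec_ref(tsx *t) {
    pthread_mutex_lock(&t->lock);
    assert(t->refcount > 0);
    int remaining = --t->refcount;
    pthread_mutex_unlock(&t->lock);
    if (remaining == 0) {  /* no one else can reach t any more */
        pthread_mutex_destroy(&t->lock);
        free(t);
        tsx_freed++;
    }
}

/* Updates the state under the lock, but invokes the callback only after
 * unlocking. The extra reference keeps t alive until the callback returns,
 * even if the callback (or another thread) drops the owner's reference. */
static void tsx_set_state(tsx *t, int new_state) {
    pthread_mutex_lock(&t->lock);
    t->state = new_state;
    t->refcount++;                 /* keep-alive reference */
    tsx_state_cb cb = t->on_state;
    pthread_mutex_unlock(&t->lock);

    if (cb != NULL)
        cb(t, new_state);          /* no lock of ours is held here */

    tsx_dec_ref(t);                /* may free t */
}

/* Example callback: on the (hypothetical) terminal state 99 it drops the
 * owner's reference, as if the transaction were destroyed from inside. */
static int demo_observed_state = -1;
static void demo_on_state(tsx *t, int new_state) {
    demo_observed_state = new_state;
    if (new_state == 99)
        tsx_dec_ref(t);            /* safe: tsx_set_state still holds a ref */
}
```

The design choice is that the object's lifetime is decoupled from its lock: the callback may run with no internal lock held, and the final free happens in whichever path drops the last reference.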
So thanks for testing the previous patch, and please help us test this new one as well.
Best regards,
Ming
David Richards
Thu, Mar 23, 2017 2:13 PM
Hi Ming.
The patch is working great so far. I'll be doing more testing. Can you
send me a single patch file that also contains Alex's deadlock patch (and
any other deadlock fixes you might have) so I can test them all together?
Thanks,
Dave Richards
On Tue, Mar 21, 2017 at 5:08 AM, Ming ming@teluu.com wrote:
Hi David,
Ah, right, I completely forgot that application can obtain PJSUA lock in
the callback, so even though the library doesn't technically cause the
deadlock, it's still not a practical solution.
Your initial analysis seems to be correct, it's just that last time, I
didn't find a solution yet to prevent the transaction from getting
destroyed (as previously reported in ticket #1706 (
https://trac.pjsip.org/repos/ticket/1706), but now I think I do (in the
attached patch).
So, thanks for testing it before and please help us test the patch again.
Best regards,
Ming
On Tue, Mar 21, 2017 at 4:05 AM, David Richards <
david.brian.richards@gmail.com> wrote:
Hi Ming.
Thanks for the quick response. However your patch doesn't fix the
problem, it just moved it as shown in the Thread 1 stack backtrace below.
Your patch stopped the PJSUA lock from being obtained in
pjsua_call_on_state_changed()(frame 7), however, just a few frames later
(frame 5), my application calls pjsua_call_on_state_changed() from
on_call_state() where an attempt on the PJSUA lock is made and deadlocked
.
I didn't include the stack back traces of the other tasks because they
were essentially blocked the same as before. I admit I don't know this
software well, but why was my initial analysis incorrect? Don't the locks
always have to be locked in the same order?
Thanks for your help,
Dave
Thread 1 (Thread 0x7f1a3935d840 (LWP 27501)):
#0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 in _L_lock_839 () from /lib64/libpthread.so.0
#2 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 in pj_mutex_lock (mutex=0x4594238) at ../src/pj/os_core_unix.c:1265
#4 in PJSUA_LOCK () at ../include/pjsua-lib/pjsua_internal.h:575
#5 in pjsua_call_get_info (call_id=1091, info=0x7fff7ce7c610) at
../src/pjsua-lib/pjsua_call.c:1817
#6 in on_call_state (call_id=1091, e=0x7fff7ce7ceb0) at TERMmain.c:509
#7 in pjsua_call_on_state_changed (inv=0x7f1a05272688, e=0x7fff7ce7ceb0)
at ../src/pjsua-lib/pjsua_call.c:3824
#8 in inv_set_state (inv=0x7f1a05272688, state=PJSIP_INV_STATE_DISCONNECTED,
e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:317
#9 in inv_on_state_incoming (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at
../src/pjsip-ua/sip_inv.c:4310
#10 in mod_inv_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at
../src/pjsip-ua/sip_inv.c:717
#11 in pjsip_dlg_on_tsx_state (dlg=0x7f1a05c55598, tsx=0x7f1a05273608,
e=0x7fff7ce7ceb0) at ../src/pjsip/sip_dialog.c:2064
#12 in mod_ua_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at
../src/pjsip/sip_ua_layer.c:178
#13 in tsx_set_state (tsx=0x7f1a05273608, state=PJSIP_TSX_STATE_COMPLETED,
event_src_type=PJSIP_EVENT_TX_MSG, event_src=0x7f1a043c0c38, flag=0) at
../src/pjsip/sip_transaction.c:1235
#14 in tsx_on_state_proceeding_uas (tsx=0x7f1a05273608,
event=0x7fff7ce7cfb0) at ../src/pjsip/sip_transaction.c:2819
#15 in pjsip_tsx_send_msg (tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at
../src/pjsip/sip_transaction.c:1751
#16 in pjsip_dlg_send_response (dlg=0x7f1a05c55598, tsx=0x7f1a05273608,
tdata=0x7f1a043c0c38) at ../src/pjsip/sip_dialog.c:1529
#17 in pjsip_inv_send_msg (inv=0x7f1a05272688, tdata=0x7f1a043c0c38) at
../src/pjsip-ua/sip_inv.c:3227
#18 in pjsua_call_hangup (call_id=1091, code=606, reason=0x0,
msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2426
#19 in timer_callback (timer_heap=0x7f1a15cc85f8, timer=0xbf6e90
<dialer_info+3972080>) at TERMmain.c:313
#20 in pj_timer_heap_poll (ht=0x7f1a15cc85f8, next_delay=0x0) at
../src/pj/timer.c:643
#21 in TIMERpoll () at TIMERmain.c:72
#22 in main_func (argc=1, argv=0x7fff7ce7d788) at main.c:352
#23 in pj_run_app (main_func=0x40c6a7 <main_func>, argc=1,
argv=0x7fff7ce7d788, flags=0) at ../src/pj/os_core_unix.c:1952
#24 in main (argc=1, argv=0x7fff7ce7d788) at main.c:782
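On David's question: yes, the usual rule for mutex-based code is that every thread acquires locks in one global order. Ming's diagnosis is that thread 1 takes PJSUA's lock after already holding the transaction lock. That inverts the order used elsewhere and closes the cycle over the PJSUA, UA, and transaction locks named in the subject line.

The sketch below is a plain pthreads illustration with hypothetical ranks, not PJSIP code. `lock_both()` always takes the lower-ranked lock first, so two threads that name the locks in opposite orders still cannot deadlock. What makes this thread harder is that callbacks run with the inner lock already held. An application callback that takes the outer lock therefore breaks the order even though no single function does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A mutex tagged with a fixed rank. Assumed rule: a thread may only
 * acquire locks in increasing rank order. */
typedef struct {
    pthread_mutex_t mu;
    int rank;
} ranked_mutex;

void ranked_init(ranked_mutex *m, int rank) {
    pthread_mutex_init(&m->mu, NULL);
    m->rank = rank;
}

/* Acquire two locks in a globally consistent order regardless of the order
 * the caller names them in. Since every thread takes the lower-ranked lock
 * first, no thread holds the higher one while waiting for the lower one,
 * so the A->B / B->A cycle cannot form. */
void lock_both(ranked_mutex *a, ranked_mutex *b) {
    assert(a != b && a->rank != b->rank);
    if (a->rank > b->rank) {
        ranked_mutex *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->mu);
    pthread_mutex_lock(&b->mu);
}

void unlock_both(ranked_mutex *a, ranked_mutex *b) {
    pthread_mutex_unlock(&a->mu);
    pthread_mutex_unlock(&b->mu);
}

/* Demo: two threads name the locks in opposite orders. The names are
 * hypothetical stand-ins for a "global" lock and a "transaction" lock. */
static ranked_mutex g_outer, g_inner;
static long g_counter;

static void *worker(void *arg) {
    int reversed = *(int *)arg;
    for (int i = 0; i < 100000; i++) {
        if (reversed)
            lock_both(&g_inner, &g_outer);
        else
            lock_both(&g_outer, &g_inner);
        g_counter++;                 /* protected by both locks */
        unlock_both(&g_outer, &g_inner);
    }
    return NULL;
}

/* Runs both workers to completion; call once per process. */
long run_demo(void) {
    pthread_t t1, t2;
    int fwd = 0, rev = 1;
    ranked_init(&g_outer, 1);
    ranked_init(&g_inner, 2);
    g_counter = 0;
    pthread_create(&t1, NULL, worker, &fwd);
    pthread_create(&t2, NULL, worker, &rev);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return g_counter;
}
```

`run_demo()` terminates and returns 200000 (two threads times 100000 increments). If the swap inside `lock_both()` is removed, the same demo can hang.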
On Mon, Mar 20, 2017 at 5:33 AM, Ming <ming@teluu.com> wrote:
Hi all,
Thanks for the report and sorry for the delay in answering.
For the problem reported by David, we think it is actually caused by
thread 1 (not thread 3), which shouldn't try to obtain PJSUA's lock after
holding the transaction lock. So we propose the attached fix. Please revert
your temporary fix (where you removed the acquisition of the group lock in
pjsip_tsx_layer_find_tsx()), apply our patch instead, and let us know
whether it resolves the issue. Thanks.
As for the deadlock reported by Alex, despite the similarity I believe it
is an unrelated issue, so I'll reply to it separately in the original
thread.
Finally, Janu: without a stack trace, I can't be certain that your problem
is the same as one or both of these issues, so my suggestion is to apply
both patches and see whether the problem disappears. I would also
recommend upgrading to version 2.6, if you haven't already, since 2.3 is
quite old.
Best regards,
Ming
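Ordering bugs like this one tend to surface only under load, because the deadlock needs a specific interleaving of threads. One debugging aid is to tag each lock with a rank and check the order on every acquisition. Ming's rule (no PJSUA lock after the transaction lock) is then reported the first time thread 1's pattern occurs, even when no other thread is contending. This is a sketch under stated assumptions: the rank values and the `checked_*` wrappers are hypothetical, not part of PJSIP, and the place of the dialog/UA lock between the two is a guess.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical ranks, assuming the order described in this thread: the
 * global (PJSUA-style) lock is taken before any transaction lock. The
 * dialog rank in between is an assumption for illustration. */
enum { RANK_GLOBAL = 1, RANK_DIALOG = 2, RANK_TSX = 3 };

#define MAX_HELD 16

/* Per-thread stack of the ranks this thread currently holds (C11). */
static _Thread_local int held_ranks[MAX_HELD];
static _Thread_local int held_count;

typedef struct {
    pthread_mutex_t mu;
    int rank;
    const char *name;
} checked_mutex;

void checked_init(checked_mutex *m, int rank, const char *name) {
    pthread_mutex_init(&m->mu, NULL);
    m->rank = rank;
    m->name = name;
}

/* Returns 0 after locking, or -1 without blocking if this acquisition
 * would break the increasing-rank order. Ranks on the stack are strictly
 * increasing, so the top entry is the highest rank held. */
int checked_lock(checked_mutex *m) {
    if (held_count > 0 && held_ranks[held_count - 1] >= m->rank) {
        fprintf(stderr, "lock order violation: %s (rank %d) requested "
                "while holding rank %d\n",
                m->name, m->rank, held_ranks[held_count - 1]);
        return -1;
    }
    assert(held_count < MAX_HELD);
    pthread_mutex_lock(&m->mu);
    held_ranks[held_count++] = m->rank;
    return 0;
}

/* Assumes locks are released in reverse order of acquisition. */
void checked_unlock(checked_mutex *m) {
    assert(held_count > 0 && held_ranks[held_count - 1] == m->rank);
    held_count--;
    pthread_mutex_unlock(&m->mu);
}
```

This sketch treats re-acquiring a lock the thread already holds as a violation. Wrapping a recursive mutex this way would need a per-lock depth count instead.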
On Fri, Mar 17, 2017 at 4:58 PM, janu@sympalog.de <janu@sympalog.de>
wrote:
This also seems like the one I reported (it also happens if the
application calls pjsua_call_hangup() between receiving a SIP BYE and
sending the corresponding OK).
I also got no response to that.
Regards,
Thomas
------ Original message ------
*From: *Alex Hermann
*Date: *Fri, 17 Mar 2017 09:49
*To: *pjsip@lists.pjsip.org;
*Cc: *
*Subject: *Re: [pjsip] Interesting deadlock bug found causing three
threads to deadlock on the PJSUA, UA, and transaction locks.
On Thursday 16 March 2017 14:18:25 CET David Richards wrote:
> I found a bug that causes my application (using the PJSUA API) to deadlock

This looks like the same deadlock I reported on 7 March 2017.
Unfortunately, I got no response to it.
--
Alex Hermann
Ming
Fri, Mar 24, 2017 12:26 AM
Hi David,
Please find the attached combined patch.
Regards,
Ming
David Richards
Fri, Mar 24, 2017 2:24 PM
Thanks Ming.
I'll start load testing with it.
-Dave
Thanks Ming.
I'll start load testing with it.
-Dave
On Thu, Mar 23, 2017 at 7:26 PM, Ming <ming@teluu.com> wrote:
> Hi David,
>
> Please find the attached combined patch.
>
> Regards,
> Ming
>
> On Thu, Mar 23, 2017 at 10:13 PM, David Richards <
> david.brian.richards@gmail.com> wrote:
>
>> Hi Ming.
>>
>> The patch is working great so far. I'll be doing more testing. Can you
>> send me a single patch file that contains Alex's deadlock patch too (and
>> any other deadlock fixes you might have) so I can test them all together?
>>
>> Thanks,
>> Dave Richards
>>
>> On Tue, Mar 21, 2017 at 5:08 AM, Ming <ming@teluu.com> wrote:
>>
>>> Hi David,
>>>
>>> Ah, right, I completely forgot that the application can obtain the PJSUA
>>> lock in the callback, so even though the library doesn't technically cause
>>> the deadlock, my previous patch is still not a practical solution.
>>>
>>> Your initial analysis seems to be correct. It's just that last time I
>>> hadn't yet found a way to prevent the transaction from being destroyed
>>> (as previously reported in ticket #1706,
>>> https://trac.pjsip.org/repos/ticket/1706), but now I think I have one (in
>>> the attached patch).
>>>
>>> So, thanks for testing the previous patch, and please help us test this
>>> one too.
>>>
>>> Best regards,
>>> Ming
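Ming's patch itself is not shown in the thread, so the following is only a sketch of a common general technique for the problem he mentions (keeping an object alive while its lock is dropped to restore the correct lock order), not a description of what the patch does. The idea: pin the object with a reference count, release its lock, acquire the outer lock and then the inner lock in the correct order, and re-check whether the object was destroyed in the meantime. `tsx_obj`, `outer_lock` (standing in for the PJSUA lock) and all function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the PJSUA lock (must be acquired first). */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical transaction-like object whose lock ranks below outer_lock. */
typedef struct {
    pthread_mutex_t lock;
    atomic_int refcnt;   /* memory is freed when this drops to 0 */
    bool destroyed;      /* set under lock; object must not be used after */
} tsx_obj;

tsx_obj *tsx_create(void)
{
    tsx_obj *t = calloc(1, sizeof *t);
    if (t == NULL)
        return NULL;
    pthread_mutex_init(&t->lock, NULL);
    atomic_init(&t->refcnt, 1);   /* the owner's reference */
    t->destroyed = false;
    return t;
}

void tsx_add_ref(tsx_obj *t) { atomic_fetch_add(&t->refcnt, 1); }

void tsx_dec_ref(tsx_obj *t)
{
    int old = atomic_fetch_sub(&t->refcnt, 1);
    assert(old > 0);
    if (old == 1) {               /* last reference gone */
        pthread_mutex_destroy(&t->lock);
        free(t);
    }
}

/* Owner-side destruction: mark dead under the lock, then drop the
 * owner's reference. Memory survives while anyone else holds a ref. */
void tsx_destroy(tsx_obj *t)
{
    pthread_mutex_lock(&t->lock);
    t->destroyed = true;
    pthread_mutex_unlock(&t->lock);
    tsx_dec_ref(t);
}

/* Called with t->lock held when outer_lock is also needed. Taking
 * outer_lock now would invert the order, so: pin t, drop t->lock, lock
 * outer then inner, and re-check. Returns true with both locks held, or
 * false (t was destroyed meanwhile) with neither lock held. */
bool tsx_relock_with_outer(tsx_obj *t)
{
    tsx_add_ref(t);                    /* keep t's memory valid while unlocked */
    pthread_mutex_unlock(&t->lock);
    pthread_mutex_lock(&outer_lock);   /* correct order: outer first... */
    pthread_mutex_lock(&t->lock);      /* ...then inner */
    if (t->destroyed) {
        pthread_mutex_unlock(&t->lock);
        pthread_mutex_unlock(&outer_lock);
        tsx_dec_ref(t);                /* may free t; do not touch it after */
        return false;
    }
    /* Not destroyed, so the owner's reference still exists and this
     * decrement cannot free t while we hold its lock. */
    tsx_dec_ref(t);
    return true;
}
```

Because the inner lock is briefly released, any state the caller read before the call may have changed by the time it returns, so the caller has to re-validate it.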
>>>
>>> On Tue, Mar 21, 2017 at 4:05 AM, David Richards <
>>> david.brian.richards@gmail.com> wrote:
>>>
>>>> Hi Ming.
>>>>
>>>> Thanks for the quick response. However, your patch doesn't fix the
>>>> problem; it just moves it, as shown in the Thread 1 stack backtrace below.
>>>> Your patch stopped the PJSUA lock from being obtained in
>>>> pjsua_call_on_state_changed() (frame 7); however, two frames deeper, my
>>>> application's on_call_state() callback (frame 6) calls
>>>> pjsua_call_get_info() (frame 5), which attempts to take the PJSUA lock
>>>> and deadlocks.
>>>>
>>>> I didn't include the stack backtraces of the other threads because they
>>>> were blocked essentially the same way as before. I admit I don't know this
>>>> software well, but why was my initial analysis incorrect? Don't the locks
>>>> always have to be acquired in the same order?
>>>>
>>>> Thanks for your help,
>>>> Dave
>>>>
>>>> Thread 1 (Thread 0x7f1a3935d840 (LWP 27501)):
>>>> #0 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>> #1 in _L_lock_839 () from /lib64/libpthread.so.0
>>>> #2 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>> #3 in pj_mutex_lock (mutex=0x4594238) at ../src/pj/os_core_unix.c:1265
>>>> #4 in PJSUA_LOCK () at ../include/pjsua-lib/pjsua_internal.h:575
>>>> #5 in pjsua_call_get_info (call_id=1091, info=0x7fff7ce7c610) at ../src/pjsua-lib/pjsua_call.c:1817
>>>> #6 in on_call_state (call_id=1091, e=0x7fff7ce7ceb0) at TERMmain.c:509
>>>> #7 in pjsua_call_on_state_changed (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsua-lib/pjsua_call.c:3824
>>>> #8 in inv_set_state (inv=0x7f1a05272688, state=PJSIP_INV_STATE_DISCONNECTED, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:317
>>>> #9 in inv_on_state_incoming (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:4310
>>>> #10 in mod_inv_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:717
>>>> #11 in pjsip_dlg_on_tsx_state (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_dialog.c:2064
>>>> #12 in mod_ua_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at ../src/pjsip/sip_ua_layer.c:178
>>>> #13 in tsx_set_state (tsx=0x7f1a05273608, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_TX_MSG, event_src=0x7f1a043c0c38, flag=0) at ../src/pjsip/sip_transaction.c:1235
>>>> #14 in tsx_on_state_proceeding_uas (tsx=0x7f1a05273608, event=0x7fff7ce7cfb0) at ../src/pjsip/sip_transaction.c:2819
>>>> #15 in pjsip_tsx_send_msg (tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_transaction.c:1751
>>>> #16 in pjsip_dlg_send_response (dlg=0x7f1a05c55598, tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at ../src/pjsip/sip_dialog.c:1529
>>>> #17 in pjsip_inv_send_msg (inv=0x7f1a05272688, tdata=0x7f1a043c0c38) at ../src/pjsip-ua/sip_inv.c:3227
>>>> #18 in pjsua_call_hangup (call_id=1091, code=606, reason=0x0, msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2426
>>>> #19 in timer_callback (timer_heap=0x7f1a15cc85f8, timer=0xbf6e90 <dialer_info+3972080>) at TERMmain.c:313
>>>> #20 in pj_timer_heap_poll (ht=0x7f1a15cc85f8, next_delay=0x0) at ../src/pj/timer.c:643
>>>> #21 in TIMERpoll () at TIMERmain.c:72
>>>> #22 in main_func (argc=1, argv=0x7fff7ce7d788) at main.c:352
>>>> #23 in pj_run_app (main_func=0x40c6a7 <main_func>, argc=1, argv=0x7fff7ce7d788, flags=0) at ../src/pj/os_core_unix.c:1952
>>>> #24 in main (argc=1, argv=0x7fff7ce7d788) at main.c:782
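Dave's backtrace shows the re-entrant path: the application's on_call_state() (frame 6) calls back into pjsua_call_get_info() (frame 5) while locks taken further up the stack are still held. Ming's later patch aims to make this safe in the library, but a defensive application-side pattern is to keep such callbacks minimal: record the event and handle it later from the application's own loop, where no library locks are held. The C sketch below shows that deferral pattern in generic form; `call_queue`, `pending`, `app_on_call_state()` and `app_drain_events()` are hypothetical names, and no pjsip API is used. One caveat: by the time a deferred event is handled, the call may already have changed state, so the handler should re-query rather than trust a snapshot.

```c
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAP 64

/* Bounded FIFO of call IDs. Its mutex is a leaf lock: it is never held
 * while calling into the SIP library, so it cannot join a lock cycle. */
typedef struct {
    pthread_mutex_t mu;
    int items[QUEUE_CAP];
    int head;   /* index of the oldest entry */
    int count;  /* number of queued entries */
} call_queue;

static call_queue pending = { PTHREAD_MUTEX_INITIALIZER, {0}, 0, 0 };

bool queue_push(call_queue *cq, int call_id)
{
    bool ok = false;
    pthread_mutex_lock(&cq->mu);
    if (cq->count < QUEUE_CAP) {
        cq->items[(cq->head + cq->count) % QUEUE_CAP] = call_id;
        cq->count++;
        ok = true;
    }
    pthread_mutex_unlock(&cq->mu);
    return ok;   /* false: queue full, event dropped */
}

bool queue_pop(call_queue *cq, int *call_id)
{
    bool ok = false;
    pthread_mutex_lock(&cq->mu);
    if (cq->count > 0) {
        *call_id = cq->items[cq->head];
        cq->head = (cq->head + 1) % QUEUE_CAP;
        cq->count--;
        ok = true;
    }
    pthread_mutex_unlock(&cq->mu);
    return ok;
}

/* Hypothetical callback body: may run with library locks held, so it
 * only records the call ID and returns. */
void app_on_call_state(int call_id)
{
    (void)queue_push(&pending, call_id);
}

/* Hypothetical: run from the application's own loop (e.g. after the
 * timer poll returns), with no library locks held. This is where
 * lock-taking queries such as fetching call info would go. */
int app_drain_events(void (*handle)(int call_id))
{
    int id, n = 0;
    while (queue_pop(&pending, &id)) {
        handle(id);
        n++;
    }
    return n;
}
```

In Dave's case this would mean the timer callback's hangup (frame 18) returns before any call-info query runs, since the query moves to the drain step.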
Ross Beer
Tue, Mar 28, 2017 1:05 PM
Hi Ming,
Will this be committed to SVN?
Kind regards,
Ross
From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Ming ming@teluu.com
Sent: 24 March 2017 00:26
To: pjsip list
Subject: Re: [pjsip] Interesting deadlock bug found causing three threads to deadlock on the PJSUA, UA, and transaction locks.
Hi David,
Please find the attached combined patch.
Regards,
Ming
David Richards
Tue, Mar 28, 2017 2:32 PM
Hi Ming.
I tested your deadlock patch over the weekend and it worked great with no
problems!
Thanks again,
Dave
On Thu, Mar 23, 2017 at 7:26 PM, Ming ming@teluu.com wrote:
Hi David,
Please find the attached combined patch.
Regards,
Ming
On Thu, Mar 23, 2017 at 10:13 PM, David Richards <
david.brian.richards@gmail.com> wrote:
Hi Ming.
The patch is working great so far. I'll be doing more testing. Can you
send me a single patch file that contains Alex's deadlock patch too (and
any other deadlock fixes you might have) so i can test them all together?
Thanks,
Dave Richards
On Tue, Mar 21, 2017 at 5:08 AM, Ming ming@teluu.com wrote:
Hi David,
Ah, right, I completely forgot that application can obtain PJSUA lock in
the callback, so even though the library doesn't technically cause the
deadlock, it's still not a practical solution.
Your initial analysis seems to be correct, it's just that last time, I
didn't find a solution yet to prevent the transaction from getting
destroyed (as previously reported in ticket #1706 (
https://trac.pjsip.org/repos/ticket/1706), but now I think I do (in the
attached patch).
So, thanks for testing it before and please help us test the patch again.
Best regards,
Ming
On Tue, Mar 21, 2017 at 4:05 AM, David Richards <
david.brian.richards@gmail.com> wrote:
Hi Ming.
Thanks for the quick response. However your patch doesn't fix the
problem, it just moved it as shown in the Thread 1 stack backtrace below.
Your patch stopped the PJSUA lock from being obtained in
pjsua_call_on_state_changed()(frame 7), however, just a few frames
later (frame 5), my application calls pjsua_call_on_state_changed() from
on_call_state() where an attempt on the PJSUA lock is made and
deadlocked.
I didn't include the stack back traces of the other tasks because they
were essentially blocked the same as before. I admit I don't know this
software well, but why was my initial analysis incorrect? Don't the locks
always have to be locked in the same order?
Thanks for your help,
Dave
Thread 1 (Thread 0x7f1a3935d840 (LWP 27501)):
#0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 in _L_lock_839 () from /lib64/libpthread.so.0
#2 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 in pj_mutex_lock (mutex=0x4594238) at ../src/pj/os_core_unix.c:1265
#4 in PJSUA_LOCK () at ../include/pjsua-lib/pjsua_internal.h:575
#5 in pjsua_call_get_info (call_id=1091, info=0x7fff7ce7c610) at
../src/pjsua-lib/pjsua_call.c:1817
#6 in on_call_state (call_id=1091, e=0x7fff7ce7ceb0) at TERMmain.c:509
#7 in pjsua_call_on_state_changed (inv=0x7f1a05272688,
e=0x7fff7ce7ceb0) at ../src/pjsua-lib/pjsua_call.c:3824
#8 in inv_set_state (inv=0x7f1a05272688, state=PJSIP_INV_STATE_DISCONNECTED,
e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:317
#9 in inv_on_state_incoming (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at
../src/pjsip-ua/sip_inv.c:4310
#10 in mod_inv_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at
../src/pjsip-ua/sip_inv.c:717
#11 in pjsip_dlg_on_tsx_state (dlg=0x7f1a05c55598, tsx=0x7f1a05273608,
e=0x7fff7ce7ceb0) at ../src/pjsip/sip_dialog.c:2064
#12 in mod_ua_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at
../src/pjsip/sip_ua_layer.c:178
#13 in tsx_set_state (tsx=0x7f1a05273608, state=PJSIP_TSX_STATE_COMPLETED,
event_src_type=PJSIP_EVENT_TX_MSG, event_src=0x7f1a043c0c38, flag=0)
at ../src/pjsip/sip_transaction.c:1235
#14 in tsx_on_state_proceeding_uas (tsx=0x7f1a05273608,
event=0x7fff7ce7cfb0) at ../src/pjsip/sip_transaction.c:2819
#15 in pjsip_tsx_send_msg (tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at
../src/pjsip/sip_transaction.c:1751
#16 in pjsip_dlg_send_response (dlg=0x7f1a05c55598, tsx=0x7f1a05273608,
tdata=0x7f1a043c0c38) at ../src/pjsip/sip_dialog.c:1529
#17 in pjsip_inv_send_msg (inv=0x7f1a05272688, tdata=0x7f1a043c0c38) at
../src/pjsip-ua/sip_inv.c:3227
#18 in pjsua_call_hangup (call_id=1091, code=606, reason=0x0,
msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2426
#19 in timer_callback (timer_heap=0x7f1a15cc85f8, timer=0xbf6e90
<dialer_info+3972080>) at TERMmain.c:313
#20 in pj_timer_heap_poll (ht=0x7f1a15cc85f8, next_delay=0x0) at
../src/pj/timer.c:643
#21 in TIMERpoll () at TIMERmain.c:72
#22 in main_func (argc=1, argv=0x7fff7ce7d788) at main.c:352
#23 in pj_run_app (main_func=0x40c6a7 <main_func>, argc=1,
argv=0x7fff7ce7d788, flags=0) at ../src/pj/os_core_unix.c:1952
#24 in main (argc=1, argv=0x7fff7ce7d788) at main.c:782
On Mon, Mar 20, 2017 at 5:33 AM, Ming ming@teluu.com wrote:
Hi all,
Thanks for the report and sorry for the delay in answering.
For the problem reported by David, we think the problem is actually
caused by thread 1 (instead of thread 3) which shouldn't try to obtain
PJSUA's lock after holding the transaction lock. So we propose the fix
attached. Please revert your temporary fix (where you remove the
acquisition of group lock in pjsip_tsx_layer_find_tsx()) and use our
patch instead, and share with us whether it rectifies the issue. Thanks.
While for the deadlock issue reported by Alex, despite the similarity,
I believe this is an unrelated issue, so I'll reply it separately in the
original thread.
Finally, for Janu, without the stack trace, I can't be certain that
your problem is the same as one or both of these issues. So my suggestion
is to apply both patches and see if the problem disappears. Also, I would
recommend to upgrade to version 2.6, if you haven't, since 2.3 is already
quite old.
Best regards,
Ming
On Fri, Mar 17, 2017 at 4:58 PM, janu@sympalog.de janu@sympalog.de
wrote:
Also seems like the one I reprted (happens also if the application
calls pjsua_call_hangup() between receiving a sip BYE and sending the
corresponding OK).
I also got no response to that.
Regards,
Thomas
------ Originalnachricht------
*Von: *Alex Hermann
*Datum: *Fr., 17. März 2017 09:49
*An: *pjsip@lists.pjsip.org;
*Cc: *
*Betreff:*Re: [pjsip] Interesting deadlock bug found causing three
threads to deadlock on the PJSUA, UA, and transaction locks.
On donderdag 16 maart 2017 14:18:25 CET David Richards wrote:> I found a bug that causes my application (using the PJSUA API) to deadlockThis looks like the same deadlock I reported on 7-3-2017. Unfortunately, I got no response to it.-- Alex Hermann_______________________________________________Visit our blog: http://blog.pjsip.orgpjsip mailing listpjsip@lists.pjsip.orghtt +listpjsip@lists.pjsip.orghttp://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
Hi Ming.
I tested your deadlock patch over the weekend and it worked great with no
problems!
Thanks again.,
Dave
On Thu, Mar 23, 2017 at 7:26 PM, Ming <ming@teluu.com> wrote:
> Hi David,
>
> Please find the attached combined patch.
>
> Regards,
> Ming
>
> On Thu, Mar 23, 2017 at 10:13 PM, David Richards <
> david.brian.richards@gmail.com> wrote:
>
>> Hi Ming.
>>
>> The patch is working great so far. I'll be doing more testing. Can you
>> send me a single patch file that contains Alex's deadlock patch too (and
>> any other deadlock fixes you might have) so i can test them all together?
>>
>> Thanks,
>> Dave Richards
>>
>> On Tue, Mar 21, 2017 at 5:08 AM, Ming <ming@teluu.com> wrote:
>>
>>> Hi David,
>>>
>>> Ah, right, I completely forgot that the application can obtain the PJSUA
>>> lock in the callback, so even though the library doesn't technically
>>> cause the deadlock, it's still not a practical solution.
>>>
>>> Your initial analysis seems to be correct; it's just that last time I
>>> hadn't yet found a way to prevent the transaction from being destroyed
>>> (as previously reported in ticket #1706,
>>> https://trac.pjsip.org/repos/ticket/1706), but now I think I have (in
>>> the attached patch).
>>>
>>> So, thanks for testing it before, and please help us test the patch again.
>>>
>>> Best regards,
>>> Ming
>>>
>>> On Tue, Mar 21, 2017 at 4:05 AM, David Richards <
>>> david.brian.richards@gmail.com> wrote:
>>>
>>>> Hi Ming.
>>>>
>>>> Thanks for the quick response. However, your patch doesn't fix the
>>>> problem; it just moves it, as shown in the Thread 1 stack backtrace below.
>>>> Your patch stopped the PJSUA lock from being obtained in
>>>> pjsua_call_on_state_changed() (frame 7); however, a couple of frames
>>>> deeper, my application's on_call_state() callback (frame 6) calls
>>>> pjsua_call_get_info() (frame 5), which attempts to take the PJSUA lock
>>>> and deadlocks.
>>>>
>>>> I didn't include the stack backtraces of the other threads because they
>>>> were blocked essentially the same way as before. I admit I don't know
>>>> this software well, but why was my initial analysis incorrect? Don't the
>>>> locks always have to be acquired in the same order?
>>>>
>>>> Thanks for your help,
>>>> Dave
>>>>
>>>> Thread 1 (Thread 0x7f1a3935d840 (LWP 27501)):
>>>> #0 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>> #1 in _L_lock_839 () from /lib64/libpthread.so.0
>>>> #2 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>> #3 in pj_mutex_lock (mutex=0x4594238) at ../src/pj/os_core_unix.c:1265
>>>> #4 in PJSUA_LOCK () at ../include/pjsua-lib/pjsua_internal.h:575
>>>> #5 in pjsua_call_get_info (call_id=1091, info=0x7fff7ce7c610) at
>>>> ../src/pjsua-lib/pjsua_call.c:1817
>>>> #6 in on_call_state (call_id=1091, e=0x7fff7ce7ceb0) at TERMmain.c:509
>>>> #7 in pjsua_call_on_state_changed (inv=0x7f1a05272688,
>>>> e=0x7fff7ce7ceb0) at ../src/pjsua-lib/pjsua_call.c:3824
>>>> #8 in inv_set_state (inv=0x7f1a05272688, state=PJSIP_INV_STATE_DISCONNECTED,
>>>> e=0x7fff7ce7ceb0) at ../src/pjsip-ua/sip_inv.c:317
>>>> #9 in inv_on_state_incoming (inv=0x7f1a05272688, e=0x7fff7ce7ceb0) at
>>>> ../src/pjsip-ua/sip_inv.c:4310
>>>> #10 in mod_inv_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at
>>>> ../src/pjsip-ua/sip_inv.c:717
>>>> #11 in pjsip_dlg_on_tsx_state (dlg=0x7f1a05c55598, tsx=0x7f1a05273608,
>>>> e=0x7fff7ce7ceb0) at ../src/pjsip/sip_dialog.c:2064
>>>> #12 in mod_ua_on_tsx_state (tsx=0x7f1a05273608, e=0x7fff7ce7ceb0) at
>>>> ../src/pjsip/sip_ua_layer.c:178
>>>> #13 in tsx_set_state (tsx=0x7f1a05273608, state=PJSIP_TSX_STATE_COMPLETED,
>>>> event_src_type=PJSIP_EVENT_TX_MSG, event_src=0x7f1a043c0c38, flag=0)
>>>> at ../src/pjsip/sip_transaction.c:1235
>>>> #14 in tsx_on_state_proceeding_uas (tsx=0x7f1a05273608,
>>>> event=0x7fff7ce7cfb0) at ../src/pjsip/sip_transaction.c:2819
>>>> #15 in pjsip_tsx_send_msg (tsx=0x7f1a05273608, tdata=0x7f1a043c0c38) at
>>>> ../src/pjsip/sip_transaction.c:1751
>>>> #16 in pjsip_dlg_send_response (dlg=0x7f1a05c55598, tsx=0x7f1a05273608,
>>>> tdata=0x7f1a043c0c38) at ../src/pjsip/sip_dialog.c:1529
>>>> #17 in pjsip_inv_send_msg (inv=0x7f1a05272688, tdata=0x7f1a043c0c38) at
>>>> ../src/pjsip-ua/sip_inv.c:3227
>>>> #18 in pjsua_call_hangup (call_id=1091, code=606, reason=0x0,
>>>> msg_data=0x0) at ../src/pjsua-lib/pjsua_call.c:2426
>>>> #19 in timer_callback (timer_heap=0x7f1a15cc85f8, timer=0xbf6e90
>>>> <dialer_info+3972080>) at TERMmain.c:313
>>>> #20 in pj_timer_heap_poll (ht=0x7f1a15cc85f8, next_delay=0x0) at
>>>> ../src/pj/timer.c:643
>>>> #21 in TIMERpoll () at TIMERmain.c:72
>>>> #22 in main_func (argc=1, argv=0x7fff7ce7d788) at main.c:352
>>>> #23 in pj_run_app (main_func=0x40c6a7 <main_func>, argc=1,
>>>> argv=0x7fff7ce7d788, flags=0) at ../src/pj/os_core_unix.c:1952
>>>> #24 in main (argc=1, argv=0x7fff7ce7d788) at main.c:782
>>>>
>>>>
>>>> On Mon, Mar 20, 2017 at 5:33 AM, Ming <ming@teluu.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Thanks for the report and sorry for the delay in answering.
>>>>>
>>>>> For the problem reported by David, we think it is actually caused by
>>>>> thread 1 (rather than thread 3), which shouldn't try to obtain the
>>>>> PJSUA lock while holding the transaction lock. So we propose the
>>>>> attached fix. Please revert your temporary fix (where you removed the
>>>>> group lock acquisition in pjsip_tsx_layer_find_tsx()), apply our patch
>>>>> instead, and let us know whether it resolves the issue. Thanks.
>>>>>
>>>>> As for the deadlock issue reported by Alex, despite the similarity,
>>>>> I believe it is unrelated, so I'll reply to it separately in the
>>>>> original thread.
>>>>>
>>>>> Finally, Janu: without the stack trace, I can't be certain that your
>>>>> problem is the same as one or both of these issues. So my suggestion is
>>>>> to apply both patches and see whether the problem disappears. I would
>>>>> also recommend upgrading to version 2.6, if you haven't already, since
>>>>> 2.3 is quite old.
>>>>>
>>>>> Best regards,
>>>>> Ming
>>>>>
>>>>> On Fri, Mar 17, 2017 at 4:58 PM, janu@sympalog.de <janu@sympalog.de>
>>>>> wrote:
>>>>>
>>>>>> This also seems like the one I reported (it also happens if the
>>>>>> application calls pjsua_call_hangup() between receiving a SIP BYE and
>>>>>> sending the corresponding OK).
>>>>>> I also got no response to that.
>>>>>>
>>>>>> Regards,
>>>>>> Thomas
>>>>>>
>>>>>> ------ Original message ------
>>>>>> *From: *Alex Hermann
>>>>>> *Date: *Fri, 17 March 2017 09:49
>>>>>> *To: *pjsip@lists.pjsip.org;
>>>>>> *Cc: *
>>>>>> *Subject: *Re: [pjsip] Interesting deadlock bug found causing three
>>>>>> threads to deadlock on the PJSUA, UA, and transaction locks.
>>>>>>
>>>>>> On Thursday, 16 March 2017 14:18:25 CET, David Richards wrote:
>>>>>> > I found a bug that causes my application (using the PJSUA API) to deadlock
>>>>>>
>>>>>> This looks like the same deadlock I reported on 7 March 2017.
>>>>>> Unfortunately, I got no response to it.
>>>>>> --
>>>>>> Alex Hermann
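Dave's question above, whether the locks always have to be acquired in the same order, is the usual rule for avoiding this class of deadlock: per Ming's 20 March reply, Thread 1 is already holding the transaction lock when on_call_state() reaches pjsua_call_get_info() and requests the PJSUA lock. The following is a minimal C sketch of a ranked-lock check, assuming a hierarchy of PJSUA lock first, then dialog (UA) lock, then transaction lock. The type ranked_mutex, the enum lock_rank and the helper functions are hypothetical and not part of PJSIP; a request that would break the order is refused instead of blocking, turning a silent deadlock into a reportable error.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical lock hierarchy (an assumption, not PJSIP's API): a thread
 * may only acquire locks in increasing rank order. */
enum lock_rank { RANK_PJSUA = 0, RANK_DIALOG = 1, RANK_TSX = 2 };

typedef struct {
    pthread_mutex_t mutex;
    enum lock_rank rank;
} ranked_mutex;

/* Bit i is set while the calling thread holds a lock of rank i. */
static _Thread_local unsigned held_ranks = 0;

void ranked_mutex_init(ranked_mutex *m, enum lock_rank rank)
{
    int rc = pthread_mutex_init(&m->mutex, NULL);
    assert(rc == 0);
    (void)rc;
    m->rank = rank;
}

/* Acquire m, or return false without blocking if this thread already
 * holds a lock of the same or a higher rank (an ordering violation). */
bool ranked_mutex_lock(ranked_mutex *m)
{
    unsigned same_or_higher = ~((1u << m->rank) - 1u);
    if (held_ranks & same_or_higher)
        return false;
    pthread_mutex_lock(&m->mutex);
    held_ranks |= 1u << m->rank;
    return true;
}

void ranked_mutex_unlock(ranked_mutex *m)
{
    held_ranks &= ~(1u << m->rank);
    pthread_mutex_unlock(&m->mutex);
}
```

The check is per thread and only reports violations. Re-acquiring a lock of the same rank is also refused, so re-entrant locking, which real PJSIP mutexes may permit, is not modeled here.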
M
Ming
Wed, Mar 29, 2017 2:40 AM
Hi David,
Thanks for all your help in testing the patches.
@Ross: Yes, we have just committed the patches in tickets #2001
(https://trac.pjsip.org/repos/ticket/2001) and #2002
(https://trac.pjsip.org/repos/ticket/2002).
Best regards,
Ming
On Tue, Mar 28, 2017 at 10:32 PM, David Richards <
david.brian.richards@gmail.com> wrote:
> Hi Ming.
>
> I tested your deadlock patch over the weekend and it worked great with no
> problems!
>
> Thanks again,
> Dave
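Ming's 21 March reply ties the fix to preventing the transaction from being destroyed (ticket #1706). The committed patches are not shown in this thread, so the following is only a generic C sketch of one common technique, not the actual fix: take an extra reference, release the object's own lock before invoking the upward callback, and drop the reference afterwards. The names tsx_t, tsx_create(), tsx_add_ref(), tsx_dec_ref(), tsx_set_state_and_notify() and demo_release_owner_cb() are hypothetical and unrelated to PJSIP's real transaction API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical transaction object (not PJSIP's pjsip_transaction). */
typedef struct tsx {
    pthread_mutex_t lock;
    atomic_int refcnt;
    int state;
} tsx_t;

typedef void (*tsx_state_cb)(tsx_t *t, int new_state, void *user);

tsx_t *tsx_create(void)
{
    tsx_t *t = calloc(1, sizeof *t);
    if (t == NULL)
        return NULL;
    pthread_mutex_init(&t->lock, NULL);
    atomic_init(&t->refcnt, 1);          /* the owner's reference */
    return t;
}

void tsx_add_ref(tsx_t *t) { atomic_fetch_add(&t->refcnt, 1); }

/* Drop one reference; frees t and returns 1 when it was the last one. */
int tsx_dec_ref(tsx_t *t)
{
    int prev = atomic_fetch_sub(&t->refcnt, 1);
    assert(prev > 0);                    /* catches a double release */
    if (prev == 1) {
        pthread_mutex_destroy(&t->lock);
        free(t);
        return 1;
    }
    return 0;
}

/* Must be called with t->lock held; returns with it released.
 * The callback runs without the transaction lock, so it may take
 * higher-level locks (e.g. an application-wide one) without inverting
 * the lock order, and the extra reference keeps t alive even if the
 * callback drops every other reference. */
void tsx_set_state_and_notify(tsx_t *t, int new_state,
                              tsx_state_cb cb, void *user)
{
    t->state = new_state;
    tsx_add_ref(t);
    pthread_mutex_unlock(&t->lock);
    cb(t, new_state, user);
    tsx_dec_ref(t);                      /* may free t */
}

/* Example callback: records the state and releases the owner's
 * reference, as a hangup path might. */
struct demo_result { int seen_state; int freed_in_cb; };

void demo_release_owner_cb(tsx_t *t, int new_state, void *user)
{
    struct demo_result *r = user;
    r->seen_state = new_state;
    r->freed_in_cb = tsx_dec_ref(t);
}
```

Returning with the lock released, instead of re-locking after the callback, avoids touching a mutex whose object may be freed by the final tsx_dec_ref() in the same call.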