Pjsip 2.9 crash in pjsip_tx_data_add_ref

Josep Bort
Thu, Apr 2, 2020 10:44 AM

Hi,

We are using pjsip 2.9 with Asterisk 13.32.0, using the WebRTC transport with 'rel100' activated. There are about 170 SIP endpoints connected and 150 simultaneous calls.

We had a crash last week and have not been able to reproduce it yet. The stack of the segfaulting thread is:

#0  pjsip_tx_data_add_ref (tdata=0x0) at ../src/pjsip/sip_transport.c:512
#1  0x00007f8bfd51ee12 in on_retransmit (timer_heap=<optimized out>, entry=0x7f8b9437d748) at ../src/pjsip-ua/sip_100rel.c:599
#2  0x00007f8bfd5d2fa7 in pj_timer_heap_poll (ht=0x36f0850, next_delay=next_delay@entry=0x7f8beb697ce0) at ../src/pj/timer.c:659
#3  0x00007f8bfd536dad in pjsip_endpt_handle_events2 (endpt=0x36f0568, max_timeout=max_timeout@entry=0x7f8beb697d40, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:716
#4  0x00007f8bfd536ec7 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7f8beb697d40) at ../src/pjsip/sip_endpoint.c:777
#5  0x00007f8b877a6f30 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:4465
#6  0x00007f8bfd5bc000 in thread_main (param=0x379a3a8) at ../src/pj/os_core_unix.c:541
#7  0x00007f8bfb609e65 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f8bfa9ab88d in clone () from /usr/lib64/libc.so.6

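It seems to be related to timers and/or rel100/PRACK: frame #1 is the 100rel retransmission timer callback (on_retransmit), and it calls pjsip_tx_data_add_ref with a NULL tdata.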
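
To illustrate the kind of failure we suspect, here is a minimal standalone sketch. It is not pjsip code: the types tx_data and pending_list and the function on_retransmit_guarded are hypothetical stand-ins that we made up for this example. It assumes the timer callback can run after the response it was supposed to retransmit is no longer pending, and shows a NULL check that would skip the retransmission instead of dereferencing a NULL pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted transmit buffer. */
typedef struct tx_data {
    int ref_cnt;
} tx_data;

/* Hypothetical stand-in for the list of unacknowledged reliable responses. */
typedef struct pending_list {
    tx_data *head;   /* NULL once nothing is waiting for a PRACK */
} pending_list;

/* Like frame #0 in the backtrace, this dereferences its argument,
 * so calling it with NULL would crash. */
static void tx_data_add_ref(tx_data *tdata)
{
    tdata->ref_cnt++;
}

static void tx_data_dec_ref(tx_data *tdata)
{
    tdata->ref_cnt--;
}

/* Guarded retransmission callback (assumption: a stale timer may fire
 * after the pending list has been emptied).
 * Returns 1 if a retransmission was attempted, 0 if it was skipped. */
static int on_retransmit_guarded(pending_list *list)
{
    tx_data *tdata = list->head;

    if (tdata == NULL) {
        /* Nothing left to retransmit: ignore the stale timer. */
        return 0;
    }

    tx_data_add_ref(tdata);   /* hold a reference while "sending" */
    /* ... the actual send would happen here ... */
    tx_data_dec_ref(tdata);   /* release the reference again */
    return 1;
}
```

With a pending response in the list, on_retransmit_guarded returns 1 and leaves the reference count unchanged; after list.head is set to NULL, it returns 0 instead of crashing. Whether the real on_retransmit can reach this state, and how, is exactly what we are unsure about.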

We attach additional information.

We have verified that the related timer fixes (#2230 and #2172) are applied.

Additionally, we think the issue could be related to network latency or other network problems, because some endpoints are connected over Wi-Fi networks.

Does anyone know if this is a known issue?
Can anyone help us?

Regards,

Josep Bort
Thu, Apr 2, 2020 3:43 PM

Hi,

It seems that #2350 could fix the crash (https://github.com/pjsip/pjproject/pull/2350).

#2350 has been applied on master (2.11). Does anyone know whether it is safe to apply fix #2350 to pjsip 2.9?

Regards.
