Segfault in copy_node() called from pjsip_endpt_handle_events2()

Alex Hermann
Thu, Sep 27, 2018 12:26 PM

Hi all,

I've recently seen some segfaults that occur at random times, but always
at the same spot. I have not been able to correlate them with specific SIP
messages. From the backtrace, it seems a timer is trying to access
memory that has already been reused. That would indicate that I (or pjsip)
forgot to increase a refcount somewhere.

With the very little information I have at hand, it's probably very hard to
find the cause, but maybe someone can give me a hint about what may be
happening from the backtrace below:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/asterisk -g -f -U asterisk'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  copy_node (ht=0x7f33e831bd90, ht=0x7f33e831bd90, moved_node=<optimized out>, slot=60) at ../src/pj/timer.c:137
137    ../src/pj/timer.c: No such file or directory.
[Current thread is 1 (Thread 0x7f344d46c700 (LWP 17018))]
(gdb) bt full
#0  copy_node (ht=0x7f33e831bd90, ht=0x7f33e831bd90, moved_node=<optimized out>, slot=60) at ../src/pj/timer.c:137
No locals.
#1  reheap_down (child=121, slot=60, moved_node=0x7f33e831bd90, ht=0x7f33e831bd90) at ../src/pj/timer.c:185
No locals.
#2  remove_node (ht=ht@entry=0x55c82246a350, slot=slot@entry=0) at ../src/pj/timer.c:252
parent = <optimized out>
moved_node = 0x7f33e831bd90
removed_node = 0x7f33a88940b0
#3  0x00007f3455eed261 in pj_timer_heap_poll (ht=0x55c82246a350, next_delay=next_delay@entry=0x7f344d46bcc0) at ../src/pj/timer.c:643
node = <optimized out>
grp_lock = <optimized out>
now = {sec = 18760889, msec = 90}
count = 0
#4  0x00007f3455e54712 in pjsip_endpt_handle_events2 (endpt=0x55c82246a068, max_timeout=0x7f344d46bd10, p_count=0x0) at ../src/pjsip/sip_endpoint.c:715
timeout = {sec = 0, msec = 0}
count = 0
net_event_count = 0
c = <optimized out>
#5  0x00007f344da0b4e8 in ?? () from /usr/lib/asterisk/modules/res_pjsip.so
No symbol table info available.
#6  0x00007f3455ed7680 in thread_main (param=0x55c822467e08) at ../src/pj/os_core_unix.c:541
rec = 0x55c822467e08
result = <optimized out>
#7  0x00007f3453f4b494 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x00007f34531ffacf in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
(gdb) f 1
#1  reheap_down (child=121, slot=60, moved_node=0x7f33e831bd90, ht=0x7f33e831bd90) at ../src/pj/timer.c:185
185    in ../src/pj/timer.c
(gdb) p *ht
$2 = {pool = 0x7f33e831bbd8, max_size = 2, cur_size = 139862756213392, max_entries_per_poll = 331, lock = 0x11e44bd, auto_delete_lock = 989, heap = 0x7f33d41161f8, timer_ids = 0x0, timer_ids_freelist = 0, callback = 0x0}
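
The numbers in that last `p *ht` are easier to judge after a little arithmetic. The sketch below is only a reading aid, not a diagnosis: it converts `cur_size` to hex, converts `lock` to decimal and compares it with `now.sec` from frame #3, and checks that `slot=60` / `child=121` are consistent with ordinary binary-heap indexing. It assumes, without confirmation from the source, that `ht` in frame #1 is really the `moved_node` pointer (both print as 0x7f33e831bd90) and that a `pj_timer_entry` begins with `user_data`, `id`, `cb`, `_timer_id`, `_timer_value`, `_grp_lock`; that layout depends on the pjproject version in use. All helper names are made up for illustration.

```python
# Values copied from the gdb session above.
fields = {
    "pool": 0x7F33E831BBD8,
    "max_size": 2,
    "cur_size": 139862756213392,
    "max_entries_per_poll": 331,
    "lock": 0x11E44BD,
    "auto_delete_lock": 989,
    "heap": 0x7F33D41161F8,
}
now_sec = 18760889            # now.sec in frame #3
poll_pc = 0x00007F3455EED261  # address inside pj_timer_heap_poll (frame #3)

# cur_size is far too large to be an element count; as hex it looks like an address.
cur_size_hex = hex(fields["cur_size"])

# Same upper bits as an address inside the pj library: possibly a function
# pointer (assumption: this slot is the `cb` member of a pj_timer_entry).
same_region = (fields["cur_size"] >> 24) == (poll_pc >> 24)

# `lock` read as a decimal number, and its distance from now.sec
# (assumption: this slot is `_timer_value.sec`).
lock_dec = fields["lock"]
delta = lock_dec - now_sec


def left_child(slot: int) -> int:
    """Index of the left child in a 0-based binary heap."""
    return 2 * slot + 1


def path_to_root(slot: int) -> list:
    """Slots visited when walking from `slot` up to the root (slot 0)."""
    path = [slot]
    while slot > 0:
        slot = (slot - 1) // 2
        path.append(slot)
    return path


path = path_to_root(60)

if __name__ == "__main__":
    print("cur_size as hex:", cur_size_hex)
    print("cur_size in same region as pj_timer_heap_poll:", same_region)
    print("lock as decimal:", lock_dec, "delta to now.sec:", delta)
    print("left child of slot 60:", left_child(60))
    print("path from slot 60 to root:", path)
```

If those assumptions hold, the struct printed in frame #1 would be a timer entry displayed through the wrong type rather than a corrupted timer heap, and the heap indices themselves look sane; the real heap is the `ht=0x55c82246a350` shown in frames #2 and #3.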

--
Alex Hermann

Ming
Fri, Sep 28, 2018 1:44 AM

Hi Alex,

You should probably report this to Asterisk first.

Regards,
Ming


Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

Alex Hermann
Fri, Sep 28, 2018 7:57 AM

On Friday, 28 September 2018 03:44:48 CEST Ming wrote:

> You should probably report this to Asterisk first.

This is a heavily modified version of Asterisk, so they probably won't
recognize some parts of it.

Alex Hermann
