Empathy List Archives

AG

Alexei Gradinari

Wed, Sep 21, 2016 3:38 PM

Hello,

On heavy loaded system with TLS,
one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
while another thread still uses this socket which
was already freed, so we get segfault.
Attached 2 backtraces.

To avoid race condition need to lock the socket before destroying it.

The attached patch adds the socket lock on destroying it
and adds a checking on all openssl calls that the socket wasn't destroyed.

Regards,
Alexei

Hello, On heavy loaded system with TLS, one thread could destroy the ssl socket on SSL_ERROR_SYSCALL while another thread still uses this socket which was already freed, so we get segfault. Attached 2 backtraces. To avoid race condition need to lock the socket before destroying it. The attached patch adds the socket lock on destroying it and adds a checking on all openssl calls that the socket wasn't destroyed. Regards, Alexei

RB

Ross Beer

Mon, Oct 3, 2016 9:23 AM

Hi PJSIP team,

Could this patch be considered for inclusion in the project?

This issue is causing segfaults a least once a day.

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Alexei Gradinari alex2grad@gmail.com
Sent: 21 September 2016 16:38
To: pjsip list
Subject: [pjsip] patch: crash on using already destroyed ssl socket

Hello,

On heavy loaded system with TLS,
one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
while another thread still uses this socket which
was already freed, so we get segfault.
Attached 2 backtraces.

To avoid race condition need to lock the socket before destroying it.

The attached patch adds the socket lock on destroying it
and adds a checking on all openssl calls that the socket wasn't destroyed.

Regards,
Alexei

Hi PJSIP team, Could this patch be considered for inclusion in the project? This issue is causing segfaults a least once a day. Kind regards, Ross ________________________________ From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Alexei Gradinari <alex2grad@gmail.com> Sent: 21 September 2016 16:38 To: pjsip list Subject: [pjsip] patch: crash on using already destroyed ssl socket Hello, On heavy loaded system with TLS, one thread could destroy the ssl socket on SSL_ERROR_SYSCALL while another thread still uses this socket which was already freed, so we get segfault. Attached 2 backtraces. To avoid race condition need to lock the socket before destroying it. The attached patch adds the socket lock on destroying it and adds a checking on all openssl calls that the socket wasn't destroyed. Regards, Alexei

M

Ming

Mon, Oct 3, 2016 11:34 AM

Hi Ross,

Yes, the patch is undergoing some changes and currently under
(hopefully) final review.

Regards,
Ming

On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer ross.beer@outlook.com wrote:

Hi PJSIP team,

Could this patch be considered for inclusion in the project?

This issue is causing segfaults a least once a day.

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Alexei Gradinari
alex2grad@gmail.com
Sent: 21 September 2016 16:38
To: pjsip list
Subject: [pjsip] patch: crash on using already destroyed ssl socket

Hello,

On heavy loaded system with TLS,
one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
while another thread still uses this socket which
was already freed, so we get segfault.
Attached 2 backtraces.

To avoid race condition need to lock the socket before destroying it.

The attached patch adds the socket lock on destroying it
and adds a checking on all openssl calls that the socket wasn't destroyed.

Regards,
Alexei

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

Hi Ross, Yes, the patch is undergoing some changes and currently under (hopefully) final review. Regards, Ming On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer <ross.beer@outlook.com> wrote: > Hi PJSIP team, > > > Could this patch be considered for inclusion in the project? > > > This issue is causing segfaults a least once a day. > > > Kind regards, > > > Ross > > > > ________________________________ > From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Alexei Gradinari > <alex2grad@gmail.com> > Sent: 21 September 2016 16:38 > To: pjsip list > Subject: [pjsip] patch: crash on using already destroyed ssl socket > > Hello, > > On heavy loaded system with TLS, > one thread could destroy the ssl socket on SSL_ERROR_SYSCALL > while another thread still uses this socket which > was already freed, so we get segfault. > Attached 2 backtraces. > > To avoid race condition need to lock the socket before destroying it. > > The attached patch adds the socket lock on destroying it > and adds a checking on all openssl calls that the socket wasn't destroyed. > > Regards, > Alexei > > _______________________________________________ > Visit our blog: http://blog.pjsip.org > > pjsip mailing list > pjsip@lists.pjsip.org > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org >

RB

Ross Beer

Fri, Oct 7, 2016 9:47 AM

Hi Ming,

Sorry to chase, however is this still undergoing review?

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Ming ming@teluu.com
Sent: 03 October 2016 12:34
To: pjsip list
Subject: Re: [pjsip] patch: crash on using already destroyed ssl socket

Hi Ross,

Yes, the patch is undergoing some changes and currently under
(hopefully) final review.

Regards,
Ming

On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer ross.beer@outlook.com wrote:

Hi PJSIP team,

Could this patch be considered for inclusion in the project?

This issue is causing segfaults a least once a day.

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Alexei Gradinari
alex2grad@gmail.com
Sent: 21 September 2016 16:38
To: pjsip list
Subject: [pjsip] patch: crash on using already destroyed ssl socket

Hello,

On heavy loaded system with TLS,
one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
while another thread still uses this socket which
was already freed, so we get segfault.
Attached 2 backtraces.

To avoid race condition need to lock the socket before destroying it.

The attached patch adds the socket lock on destroying it
and adds a checking on all openssl calls that the socket wasn't destroyed.

Regards,
Alexei

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

pjsip Info Pagehttp://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
lists.pjsip.org
This is the mailing list to discuss pjsip, the open source sip stack. We also have answers to frequently asked questions. To see the collection of prior postings to ...

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

Hi Ming, Sorry to chase, however is this still undergoing review? Kind regards, Ross ________________________________ From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Ming <ming@teluu.com> Sent: 03 October 2016 12:34 To: pjsip list Subject: Re: [pjsip] patch: crash on using already destroyed ssl socket Hi Ross, Yes, the patch is undergoing some changes and currently under (hopefully) final review. Regards, Ming On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer <ross.beer@outlook.com> wrote: > Hi PJSIP team, > > > Could this patch be considered for inclusion in the project? > > > This issue is causing segfaults a least once a day. > > > Kind regards, > > > Ross > > > > ________________________________ > From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Alexei Gradinari > <alex2grad@gmail.com> > Sent: 21 September 2016 16:38 > To: pjsip list > Subject: [pjsip] patch: crash on using already destroyed ssl socket > > Hello, > > On heavy loaded system with TLS, > one thread could destroy the ssl socket on SSL_ERROR_SYSCALL > while another thread still uses this socket which > was already freed, so we get segfault. > Attached 2 backtraces. > > To avoid race condition need to lock the socket before destroying it. > > The attached patch adds the socket lock on destroying it > and adds a checking on all openssl calls that the socket wasn't destroyed. > > Regards, > Alexei > > _______________________________________________ > Visit our blog: http://blog.pjsip.org > > pjsip mailing list > pjsip@lists.pjsip.org > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org pjsip Info Page<http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org> lists.pjsip.org This is the mailing list to discuss pjsip, the open source sip stack. We also have answers to frequently asked questions. To see the collection of prior postings to ... > _______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@lists.pjsip.org http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

RS

Riza Sulistyo

Sun, Oct 9, 2016 11:51 PM

Hi Ross/Alexei,

We have made some modification to the patch in order to incorporate the
group lock.
Please find the patch on the attachment, could you help us in testing it?

Best Regards,

Riza

On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer ross.beer@outlook.com wrote:

Hi Ming,

Sorry to chase, however is this still undergoing review?

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Ming <
ming@teluu.com>
Sent: 03 October 2016 12:34
To: pjsip list
Subject: Re: [pjsip] patch: crash on using already destroyed ssl socket

Hi Ross,

Yes, the patch is undergoing some changes and currently under
(hopefully) final review.

Regards,
Ming

On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer ross.beer@outlook.com wrote:

Hi PJSIP team,

Could this patch be considered for inclusion in the project?

This issue is causing segfaults a least once a day.

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Alexei

Gradinari

alex2grad@gmail.com
Sent: 21 September 2016 16:38
To: pjsip list
Subject: [pjsip] patch: crash on using already destroyed ssl socket

Hello,

On heavy loaded system with TLS,
one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
while another thread still uses this socket which
was already freed, so we get segfault.
Attached 2 backtraces.

To avoid race condition need to lock the socket before destroying it.

The attached patch adds the socket lock on destroying it
and adds a checking on all openssl calls that the socket wasn't

destroyed.

Regards,
Alexei

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

pjsip Info Page
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
lists.pjsip.org
This is the mailing list to discuss pjsip, the open source sip stack. We
also have answers to frequently asked questions. To see the collection of
prior postings to ...

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

Hi Ross/Alexei, We have made some modification to the patch in order to incorporate the group lock. Please find the patch on the attachment, could you help us in testing it? Best Regards, Riza On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer <ross.beer@outlook.com> wrote: > Hi Ming, > > > Sorry to chase, however is this still undergoing review? > > > Kind regards, > > > Ross > > > ------------------------------ > *From:* pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Ming < > ming@teluu.com> > *Sent:* 03 October 2016 12:34 > *To:* pjsip list > *Subject:* Re: [pjsip] patch: crash on using already destroyed ssl socket > > Hi Ross, > > Yes, the patch is undergoing some changes and currently under > (hopefully) final review. > > Regards, > Ming > > On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer <ross.beer@outlook.com> wrote: > > Hi PJSIP team, > > > > > > Could this patch be considered for inclusion in the project? > > > > > > This issue is causing segfaults a least once a day. > > > > > > Kind regards, > > > > > > Ross > > > > > > > > ________________________________ > > From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Alexei > Gradinari > > <alex2grad@gmail.com> > > Sent: 21 September 2016 16:38 > > To: pjsip list > > Subject: [pjsip] patch: crash on using already destroyed ssl socket > > > > Hello, > > > > On heavy loaded system with TLS, > > one thread could destroy the ssl socket on SSL_ERROR_SYSCALL > > while another thread still uses this socket which > > was already freed, so we get segfault. > > Attached 2 backtraces. > > > > To avoid race condition need to lock the socket before destroying it. > > > > The attached patch adds the socket lock on destroying it > > and adds a checking on all openssl calls that the socket wasn't > destroyed. > > > > Regards, > > Alexei > > > > _______________________________________________ > > Visit our blog: http://blog.pjsip.org > > > > pjsip mailing list > > pjsip@lists.pjsip.org > > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org > pjsip Info Page > <http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org> > lists.pjsip.org > This is the mailing list to discuss pjsip, the open source sip stack. We > also have answers to frequently asked questions. To see the collection of > prior postings to ... > > > > > > _______________________________________________ > Visit our blog: http://blog.pjsip.org > > pjsip mailing list > pjsip@lists.pjsip.org > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org > > _______________________________________________ > Visit our blog: http://blog.pjsip.org > > pjsip mailing list > pjsip@lists.pjsip.org > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org > >

AG

Alexei Gradinari

Tue, Oct 11, 2016 3:44 PM

Re: [pjsip] patch: crash on using already destroyed ssl socket Hello Riza,

Could you, please, explain why did you remove all checking on openssl calls from my patch?
Your patch didn't fix race condition.
Imagine
one thread calls reset_ssl_sock_state
while another thread calls BIO_pending(ssock->ossl_wbio) on already closed socket.

Pleas, look at commit 189e20c68c2399782034fec9f65d6b9ee88aac3d
https://mta.openssl.org/pipermail/openssl-commits/2015-May/001035.html
Reusing an SSL object when it has encountered a fatal error can
have bad consequences. This is a bug in application code not libssl
but libssl should be more forgiving and not crash.

P.S.
I don't understand why my patch isn't acceptable for you.
Could you, please, explain.

Regards,
Alexei

Sunday, October 9, 2016, 7:51:54 PM, you wrote:
<a name="m_8131166262091593318divtagdefaultwrapper"></a>
Hi Ross/Alexei,

We have made some modification to the patch in order to incorporate the group lock.
Please find the patch on the attachment, could you help us in testing it?

Best Regards,

Riza

On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer <ross.beer@outlook.com> wrote:
Hi Ming,

Sorry to chase, however is this still undergoing review?

Kind regards,

Ross

From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Ming <ming@teluu.com>
Sent: 03 October 2016 12:34
To: pjsip list
Subject: Re: [pjsip] patch: crash on using already destroyed ssl socket

Hi Ross,

Yes, the patch is undergoing some changes and currently under
(hopefully) final review.

Regards,
Ming

On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer <ross.beer@outlook.com> wrote:
> Hi PJSIP team,
>
>
> Could this patch be considered for inclusion in the project?
>
>
> This issue is causing segfaults a least once a day.
>
>
> Kind regards,
>
>
> Ross
>
>
>
> ________________________________
> From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Alexei Gradinari
> <alex2grad@gmail.com>
> Sent: 21 September 2016 16:38
> To: pjsip list
> Subject: [pjsip] patch: crash on using already destroyed ssl socket
>
> Hello,
>
> On heavy loaded system with TLS,
> one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
> while another thread still uses this socket which
> was already freed, so we get segfault.
> Attached 2 backtraces.
>
> To avoid race condition need to lock the socket before destroying it.
>
> The attached patch adds the socket lock on destroying it
> and adds a checking on all openssl calls that the socket wasn't destroyed.
>
> Regards,
> Alexei
>
> _______________________________________________
> Visit our blog: http://blog.pjsip.org
>
> pjsip mailing list
> pjsip@lists.pjsip.org
> http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
pjsip Info Page
lists.pjsip.org
This is the mailing list to discuss pjsip, the open source sip stack. We also have answers to frequently asked questions. To see the collection of prior postings to ...

>

_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

--
Best regards,
Alexei mailto:alex2grad@gmail.com

RS

Riza Sulistyo

Wed, Oct 12, 2016 12:36 PM

Hi Alexei,

When doing the review your patch, we identified another potential race
issue which might occur.
e.g: write_mutex get destroyed before send() is called.

We feel that using group lock will solve these issues.
As you can see from the patch, the reset_ssl_sock_state() will not destroy
the ssl.
We have moved all the code of destroying ssl and write_mutex to the
ssl_on_destroy()
which is called when the group lock is no longer held or reference count
reach 0.

As for your concern regarding the call to BIO_pending() on a closed socket,
it shouldn't be
a problem since we don't use OpenSSL for the socket operation.
Or, maybe I'm missing something here?

Best Regards,

Riza

On Tue, Oct 11, 2016 at 10:44 PM, Alexei Gradinari alex2grad@gmail.com
wrote:

Hello Riza,

Could you, please, explain why did you remove all checking on openssl
calls from my patch?
Your patch didn't fix race condition.
Imagine
one thread calls reset_ssl_sock_state
while another thread calls BIO_pending(ssock->ossl_wbio) on already closed
socket.

Pleas, look at commit 189e20c68c2399782034fec9f65d6b9ee88aac3d
https://mta.openssl.org/pipermail/openssl-commits/2015-May/001035.html
Reusing an SSL object when it has encountered a fatal error can
have bad consequences. This is a bug in application code not libssl
but libssl should be more forgiving and not crash.

P.S.
I don't understand why my patch isn't acceptable for you.
Could you, please, explain.

Regards,
Alexei

Sunday, October 9, 2016, 7:51:54 PM, you wrote:

Hi Ross/Alexei,

We have made some modification to the patch in order to incorporate the
group lock.
Please find the patch on the attachment, could you help us in testing it?

Best Regards,

Riza

On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer ross.beer@outlook.com wrote:
Hi Ming,

Sorry to chase, however is this still undergoing review?

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Ming <
ming@teluu.com>
Sent: 03 October 2016 12:34
To: pjsip list
Subject: Re: [pjsip] patch: crash on using already destroyed ssl socket

Hi Ross,

Yes, the patch is undergoing some changes and currently under
(hopefully) final review.

Regards,
Ming

On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer ross.beer@outlook.com wrote:

Hi PJSIP team,

Could this patch be considered for inclusion in the project?

This issue is causing segfaults a least once a day.

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Alexei

Gradinari

alex2grad@gmail.com
Sent: 21 September 2016 16:38
To: pjsip list
Subject: [pjsip] patch: crash on using already destroyed ssl socket

Hello,

On heavy loaded system with TLS,
one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
while another thread still uses this socket which
was already freed, so we get segfault.
Attached 2 backtraces.

To avoid race condition need to lock the socket before destroying it.

The attached patch adds the socket lock on destroying it
and adds a checking on all openssl calls that the socket wasn't

destroyed.

Regards,
Alexei

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

pjsip Info Page
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
lists.pjsip.org
This is the mailing list to discuss pjsip, the open source sip stack. We
also have answers to frequently asked questions. To see the collection of
prior postings to ...

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

*-- Best regards, Alexei *
mailto:alex2grad@gmail.com alex2grad@gmail.com

Hi Alexei, When doing the review your patch, we identified another potential race issue which might occur. e.g: write_mutex get destroyed before send() is called. We feel that using group lock will solve these issues. As you can see from the patch, the reset_ssl_sock_state() will not destroy the ssl. We have moved all the code of destroying ssl and write_mutex to the ssl_on_destroy() which is called when the group lock is no longer held or reference count reach 0. As for your concern regarding the call to BIO_pending() on a closed socket, it shouldn't be a problem since we don't use OpenSSL for the socket operation. Or, maybe I'm missing something here? Best Regards, Riza On Tue, Oct 11, 2016 at 10:44 PM, Alexei Gradinari <alex2grad@gmail.com> wrote: > Hello Riza, > > Could you, please, explain why did you remove all checking on openssl > calls from my patch? > Your patch didn't fix race condition. > Imagine > one thread calls reset_ssl_sock_state > while another thread calls BIO_pending(ssock->ossl_wbio) on already closed > socket. > > Pleas, look at commit 189e20c68c2399782034fec9f65d6b9ee88aac3d > https://mta.openssl.org/pipermail/openssl-commits/2015-May/001035.html > Reusing an SSL object when it has encountered a fatal error can > have bad consequences. This is a bug in application code not libssl > but libssl should be more forgiving and not crash. > > > P.S. > I don't understand why my patch isn't acceptable for you. > Could you, please, explain. > > Regards, > Alexei > > > Sunday, October 9, 2016, 7:51:54 PM, you wrote: > > Hi Ross/Alexei, > > We have made some modification to the patch in order to incorporate the > group lock. > Please find the patch on the attachment, could you help us in testing it? > > Best Regards, > > Riza > > On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer <ross.beer@outlook.com> wrote: > Hi Ming, > > Sorry to chase, however is this still undergoing review? > > Kind regards, > > Ross > > > ------------------------------ > *From:* pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Ming < > ming@teluu.com> > *Sent:* 03 October 2016 12:34 > *To:* pjsip list > *Subject:* Re: [pjsip] patch: crash on using already destroyed ssl socket > > Hi Ross, > > Yes, the patch is undergoing some changes and currently under > (hopefully) final review. > > Regards, > Ming > > On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer <ross.beer@outlook.com> wrote: > > Hi PJSIP team, > > > > > > Could this patch be considered for inclusion in the project? > > > > > > This issue is causing segfaults a least once a day. > > > > > > Kind regards, > > > > > > Ross > > > > > > > > ________________________________ > > From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Alexei > Gradinari > > <alex2grad@gmail.com> > > Sent: 21 September 2016 16:38 > > To: pjsip list > > Subject: [pjsip] patch: crash on using already destroyed ssl socket > > > > Hello, > > > > On heavy loaded system with TLS, > > one thread could destroy the ssl socket on SSL_ERROR_SYSCALL > > while another thread still uses this socket which > > was already freed, so we get segfault. > > Attached 2 backtraces. > > > > To avoid race condition need to lock the socket before destroying it. > > > > The attached patch adds the socket lock on destroying it > > and adds a checking on all openssl calls that the socket wasn't > destroyed. > > > > Regards, > > Alexei > > > > _______________________________________________ > > Visit our blog: http://blog.pjsip.org > > > > pjsip mailing list > > pjsip@lists.pjsip.org > > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org > pjsip Info Page > <http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org> > lists.pjsip.org > This is the mailing list to discuss pjsip, the open source sip stack. We > also have answers to frequently asked questions. To see the collection of > prior postings to ... > > > > > > _______________________________________________ > Visit our blog: http://blog.pjsip.org > > pjsip mailing list > pjsip@lists.pjsip.org > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org > > _______________________________________________ > Visit our blog: http://blog.pjsip.org > > pjsip mailing list > pjsip@lists.pjsip.org > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org > > > > > > > *-- Best regards, Alexei * > mailto:alex2grad@gmail.com <alex2grad@gmail.com> >

AG

Alexei Gradinari

Wed, Oct 12, 2016 5:23 PM

Re: [pjsip] patch: crash on using already destroyed ssl socket Hello Riza,

Please, see the next OpenSSL calls.
BIO_pending(ssock->ossl_wbio)
SSL_do_handshake(ssock->ossl_ssl)
SSL_read(ssock->ossl_ssl, data_, size_)
SSL_write(ssock->ossl_ssl, data, (int)size)

All of these calls could use "SSL object when it has encountered a fatal error" or
the SSL object was destroyed.
My patch added this checking, but you removed it.
Why?

The destroy_ssl doesn't destroy write_mutex.
The write_mutex is destroyed in pj_ssl_sock_close.
My patch does nothing with this, so this potential race existed before.

The ssl should be destroyed as soon as possible to release a TCP socket/port.

To fix potential race issue with write_mutex the group locking could use.

So I combined my patch with your group locking patch.
Attached a new patch.

Regards,
Alexei

Wednesday, October 12, 2016, 8:36:13 AM, you wrote:

Hi Alexei,

When doing the review your patch, we identified another potential race issue which might occur.
e.g: write_mutex get destroyed before send() is called.

We feel that using group lock will solve these issues.
As you can see from the patch, the reset_ssl_sock_state() will not destroy the ssl.
We have moved all the code of destroying ssl and write_mutex to the ssl_on_destroy()
which is called when the group lock is no longer held or reference count reach 0.

As for your concern regarding the call to BIO_pending() on a closed socket, it shouldn't be
a problem since we don't use OpenSSL for the socket operation.
Or, maybe I'm missing something here?

Best Regards,

Riza

On Tue, Oct 11, 2016 at 10:44 PM, Alexei Gradinari <alex2grad@gmail.com> wrote:
Hello Riza,

Could you, please, explain why did you remove all checking on openssl calls from my patch?
Your patch didn't fix race condition.
Imagine
one thread calls reset_ssl_sock_state
while another thread calls BIO_pending(ssock->ossl_wbio) on already closed socket.

Pleas, look at commit 189e20c68c2399782034fec9f65d6b9ee88aac3d
https://mta.openssl.org/pipermail/openssl-commits/2015-May/001035.html
Reusing an SSL object when it has encountered a fatal error can
have bad consequences. This is a bug in application code not libssl
but libssl should be more forgiving and not crash.

P.S.
I don't understand why my patch isn't acceptable for you.
Could you, please, explain.

Regards,
Alexei

Sunday, October 9, 2016, 7:51:54 PM, you wrote:

Hi Ross/Alexei,

We have made some modification to the patch in order to incorporate the group lock.
Please find the patch on the attachment, could you help us in testing it?

Best Regards,

Riza

On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer <ross.beer@outlook.com> wrote:
Hi Ming,

Sorry to chase, however is this still undergoing review?

Kind regards,

Ross

From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Ming <ming@teluu.com>
Sent: 03 October 2016 12:34
To: pjsip list
Subject: Re: [pjsip] patch: crash on using already destroyed ssl socket

Hi Ross,

Yes, the patch is undergoing some changes and currently under
(hopefully) final review.

Regards,
Ming

On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer <ross.beer@outlook.com> wrote:
> Hi PJSIP team,
>
>
> Could this patch be considered for inclusion in the project?
>
>
> This issue is causing segfaults a least once a day.
>
>
> Kind regards,
>
>
> Ross
>
>
>
> ________________________________
> From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Alexei Gradinari
> <alex2grad@gmail.com>
> Sent: 21 September 2016 16:38
> To: pjsip list
> Subject: [pjsip] patch: crash on using already destroyed ssl socket
>
> Hello,
>
> On heavy loaded system with TLS,
> one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
> while another thread still uses this socket which
> was already freed, so we get segfault.
> Attached 2 backtraces.
>
> To avoid race condition need to lock the socket before destroying it.
>
> The attached patch adds the socket lock on destroying it
> and adds a checking on all openssl calls that the socket wasn't destroyed.
>
> Regards,
> Alexei
>
> _______________________________________________
> Visit our blog: http://blog.pjsip.org
>
> pjsip mailing list
> pjsip@lists.pjsip.org
> http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
pjsip Info Page
lists.pjsip.org
This is the mailing list to discuss pjsip, the open source sip stack. We also have answers to frequently asked questions. To see the collection of prior postings to ...

>

RS

Riza Sulistyo

Thu, Oct 13, 2016 4:12 AM

Hi Alexei,

Please find the reply, inline.

Best Regards,

Riza

On Thu, Oct 13, 2016 at 12:23 AM, Alexei Gradinari alex2grad@gmail.com
wrote:

Hello Riza,

Please, see the next OpenSSL calls.
BIO_pending(ssock->ossl_wbio)
SSL_do_handshake(ssock->ossl_ssl)
SSL_read(ssock->ossl_ssl, data_, size_)
SSL_write(ssock->ossl_ssl, data, (int)size)

All of these calls could use "SSL object when it has encountered a fatal
error" or
the SSL object was destroyed.
My patch added this checking, but you removed it.
Why?

The checks will reduce the potential of of SSL object reuse, however it's
not bullet proof.
e.g:
1. thread 1: fatal error occurs in on_read(), but before
  reset_ssl_sock_state() is called and ssock->last_err is updated, context
  switch
2. thread 2: ssock_send() executed, call BIO_pending() etc
write_mutex actually is to protect write operation, as for the fatal
error reported (from the call stack),
it happened on read operation which is not protected by write_mutex
(write_mutex will be diacquire when destroying SSL state),
so, adding these checks around write_mutex would be less effective.
based on our test, the crash didn't happen again. This was confirmed by
George and another of our user with the same issue.
If you have a different result, please forward them to us.
lastly (although minor), the fatal error scenario +
reset_ssl_sock_state() happened on read operation but the SSL object being
accessed was for write (e.g: write BIO).
However we are not certain if there's a correlation between the BIO write
and the read operation.

The destroy_ssl doesn't destroy write_mutex.
The write_mutex is destroyed in pj_ssl_sock_close.
My patch does nothing with this, so this potential race existed before.

we were not saying that the issue was introduce by your patch.

The ssl should be destroyed as soon as possible to release a TCP
socket/port.

To fix potential race issue with write_mutex the group locking could use.

So I combined my patch with your group locking patch.
Attached a new patch.

Regards,
Alexei

Wednesday, October 12, 2016, 8:36:13 AM, you wrote:

Hi Alexei,

When doing the review your patch, we identified another potential race
issue which might occur.
e.g: write_mutex get destroyed before send() is called.

We feel that using group lock will solve these issues.
As you can see from the patch, the reset_ssl_sock_state() will not destroy
the ssl.
We have moved all the code of destroying ssl and write_mutex to the
ssl_on_destroy()
which is called when the group lock is no longer held or reference count
reach 0.

As for your concern regarding the call to BIO_pending() on a closed
socket, it shouldn't be
a problem since we don't use OpenSSL for the socket operation.
Or, maybe I'm missing something here?

Best Regards,

Riza

On Tue, Oct 11, 2016 at 10:44 PM, Alexei Gradinari alex2grad@gmail.com
wrote:
Hello Riza,

Could you, please, explain why did you remove all checking on openssl
calls from my patch?
Your patch didn't fix race condition.
Imagine
one thread calls reset_ssl_sock_state
while another thread calls BIO_pending(ssock->ossl_wbio) on already closed
socket.

Pleas, look at commit 189e20c68c2399782034fec9f65d6b9ee88aac3d
https://mta.openssl.org/pipermail/openssl-commits/2015-May/001035.html
Reusing an SSL object when it has encountered a fatal error can
have bad consequences. This is a bug in application code not libssl
but libssl should be more forgiving and not crash.

P.S.
I don't understand why my patch isn't acceptable for you.
Could you, please, explain.

Regards,
Alexei

Sunday, October 9, 2016, 7:51:54 PM, you wrote:

Hi Ross/Alexei,

We have made some modification to the patch in order to incorporate the
group lock.
Please find the patch on the attachment, could you help us in testing it?

Best Regards,

Riza

On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer ross.beer@outlook.com wrote:
Hi Ming,

Sorry to chase, however is this still undergoing review?

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Ming <
ming@teluu.com>
Sent: 03 October 2016 12:34
To: pjsip list
Subject: Re: [pjsip] patch: crash on using already destroyed ssl socket

Hi Ross,

Yes, the patch is undergoing some changes and currently under
(hopefully) final review.

Regards,
Ming

On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer ross.beer@outlook.com wrote:

Hi PJSIP team,

Could this patch be considered for inclusion in the project?

This issue is causing segfaults a least once a day.

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Alexei

Gradinari

alex2grad@gmail.com
Sent: 21 September 2016 16:38
To: pjsip list
Subject: [pjsip] patch: crash on using already destroyed ssl socket

Hello,

On heavy loaded system with TLS,
one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
while another thread still uses this socket which
was already freed, so we get segfault.
Attached 2 backtraces.

To avoid race condition need to lock the socket before destroying it.

The attached patch adds the socket lock on destroying it
and adds a checking on all openssl calls that the socket wasn't

destroyed.

Regards,
Alexei

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

pjsip Info Page
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
lists.pjsip.org
This is the mailing list to discuss pjsip, the open source sip stack. We
also have answers to frequently asked questions. To see the collection of
prior postings to ...

http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

alex2grad@gmail.com

Hi Alexei, Please find the reply, inline. Best Regards, Riza On Thu, Oct 13, 2016 at 12:23 AM, Alexei Gradinari <alex2grad@gmail.com> wrote: > Hello Riza, > > 1. Please, see the next OpenSSL calls. > BIO_pending(ssock->ossl_wbio) > SSL_do_handshake(ssock->ossl_ssl) > SSL_read(ssock->ossl_ssl, data_, size_) > SSL_write(ssock->ossl_ssl, data, (int)size) > > All of these calls could use "SSL object when it has encountered a fatal > error" or > the SSL object was destroyed. > My patch added this checking, but you removed it. > Why? > - The checks will reduce the potential of of SSL object reuse, however it's not bullet proof. e.g: 1. thread 1: fatal error occurs in on_read(), but before reset_ssl_sock_state() is called and ssock->last_err is updated, context switch 2. thread 2: ssock_send() executed, call BIO_pending() etc - write_mutex actually is to protect write operation, as for the fatal error reported (from the call stack), it happened on read operation which is not protected by write_mutex (write_mutex will be diacquire when destroying SSL state), so, adding these checks around write_mutex would be less effective. - based on our test, the crash didn't happen again. This was confirmed by George and another of our user with the same issue. If you have a different result, please forward them to us. - lastly (although minor), the fatal error scenario + reset_ssl_sock_state() happened on read operation but the SSL object being accessed was for write (e.g: write BIO). However we are not certain if there's a correlation between the BIO write and the read operation. > 2. The destroy_ssl doesn't destroy write_mutex. > The write_mutex is destroyed in pj_ssl_sock_close. > My patch does nothing with this, so this potential race existed before. > we were not saying that the issue was introduce by your patch. > > The ssl should be destroyed as soon as possible to release a TCP > socket/port. > > To fix potential race issue with write_mutex the group locking could use. > > So I combined my patch with your group locking patch. > Attached a new patch. > > Regards, > Alexei > > > Wednesday, October 12, 2016, 8:36:13 AM, you wrote: > > Hi Alexei, > > When doing the review your patch, we identified another potential race > issue which might occur. > e.g: write_mutex get destroyed before send() is called. > > We feel that using group lock will solve these issues. > As you can see from the patch, the reset_ssl_sock_state() will not destroy > the ssl. > We have moved all the code of destroying ssl and write_mutex to the > ssl_on_destroy() > which is called when the group lock is no longer held or reference count > reach 0. > > As for your concern regarding the call to BIO_pending() on a closed > socket, it shouldn't be > a problem since we don't use OpenSSL for the socket operation. > Or, maybe I'm missing something here? > > Best Regards, > > Riza > > > On Tue, Oct 11, 2016 at 10:44 PM, Alexei Gradinari <alex2grad@gmail.com> > wrote: > Hello Riza, > > Could you, please, explain why did you remove all checking on openssl > calls from my patch? > Your patch didn't fix race condition. > Imagine > one thread calls reset_ssl_sock_state > while another thread calls BIO_pending(ssock->ossl_wbio) on already closed > socket. > > Pleas, look at commit 189e20c68c2399782034fec9f65d6b9ee88aac3d > https://mta.openssl.org/pipermail/openssl-commits/2015-May/001035.html > Reusing an SSL object when it has encountered a fatal error can > have bad consequences. This is a bug in application code not libssl > but libssl should be more forgiving and not crash. > > > P.S. > I don't understand why my patch isn't acceptable for you. > Could you, please, explain. > > Regards, > Alexei > > > Sunday, October 9, 2016, 7:51:54 PM, you wrote: > > Hi Ross/Alexei, > > We have made some modification to the patch in order to incorporate the > group lock. > Please find the patch on the attachment, could you help us in testing it? > > Best Regards, > > Riza > > On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer <ross.beer@outlook.com> wrote: > Hi Ming, > > Sorry to chase, however is this still undergoing review? > > Kind regards, > > Ross > > > ------------------------------ > *From:* pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Ming < > ming@teluu.com> > *Sent:* 03 October 2016 12:34 > *To:* pjsip list > *Subject:* Re: [pjsip] patch: crash on using already destroyed ssl socket > > Hi Ross, > > Yes, the patch is undergoing some changes and currently under > (hopefully) final review. > > Regards, > Ming > > On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer <ross.beer@outlook.com> wrote: > > Hi PJSIP team, > > > > > > Could this patch be considered for inclusion in the project? > > > > > > This issue is causing segfaults a least once a day. > > > > > > Kind regards, > > > > > > Ross > > > > > > > > ________________________________ > > From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Alexei > Gradinari > > <alex2grad@gmail.com> > > Sent: 21 September 2016 16:38 > > To: pjsip list > > Subject: [pjsip] patch: crash on using already destroyed ssl socket > > > > Hello, > > > > On heavy loaded system with TLS, > > one thread could destroy the ssl socket on SSL_ERROR_SYSCALL > > while another thread still uses this socket which > > was already freed, so we get segfault. > > Attached 2 backtraces. > > > > To avoid race condition need to lock the socket before destroying it. > > > > The attached patch adds the socket lock on destroying it > > and adds a checking on all openssl calls that the socket wasn't > destroyed. > > > > Regards, > > Alexei > > > > _______________________________________________ > > Visit our blog: http://blog.pjsip.org > > > > pjsip mailing list > > pjsip@lists.pjsip.org > > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org > pjsip Info Page > <http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org> > lists.pjsip.org > This is the mailing list to discuss pjsip, the open source sip stack. We > also have answers to frequently asked questions. To see the collection of > prior postings to ... > > > > > > <http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org> > > <alex2grad@gmail.com> >

GJ

George Joseph

Thu, Oct 13, 2016 2:25 PM

On Wed, Oct 12, 2016 at 10:12 PM, Riza Sulistyo riza@pjsip.org wrote:

Hi Alexei,

Please find the reply, inline.

Best Regards,

Riza

On Thu, Oct 13, 2016 at 12:23 AM, Alexei Gradinari alex2grad@gmail.com
wrote:

Hello Riza,

Please, see the next OpenSSL calls.
BIO_pending(ssock->ossl_wbio)
SSL_do_handshake(ssock->ossl_ssl)
SSL_read(ssock->ossl_ssl, data_, size_)
SSL_write(ssock->ossl_ssl, data, (int)size)

All of these calls could use "SSL object when it has encountered a fatal
error" or
the SSL object was destroyed.
My patch added this checking, but you removed it.
Why?

The checks will reduce the potential of of SSL object reuse, however
it's not bullet proof.
e.g:
1. thread 1: fatal error occurs in on_read(), but before
  reset_ssl_sock_state() is called and ssock->last_err is updated, context
  switch
2. thread 2: ssock_send() executed, call BIO_pending() etc
write_mutex actually is to protect write operation, as for the fatal
error reported (from the call stack),
it happened on read operation which is not protected by write_mutex
(write_mutex will be diacquire when destroying SSL state),
so, adding these checks around write_mutex would be less effective.
based on our test, the crash didn't happen again. This was confirmed by
George and another of our user with the same issue.

Actually I'm going to defer to Alexei and Ross on this.

If you have a different result, please forward them to us.

lastly (although minor), the fatal error scenario +
reset_ssl_sock_state() happened on read operation but the SSL object being
accessed was for write (e.g: write BIO).
However we are not certain if there's a correlation between the BIO
write and the read operation.

The destroy_ssl doesn't destroy write_mutex.
The write_mutex is destroyed in pj_ssl_sock_close.
My patch does nothing with this, so this potential race existed before.

we were not saying that the issue was introduce by your patch.

The ssl should be destroyed as soon as possible to release a TCP
socket/port.

To fix potential race issue with write_mutex the group locking could use.

So I combined my patch with your group locking patch.
Attached a new patch.

Regards,
Alexei

Wednesday, October 12, 2016, 8:36:13 AM, you wrote:

Hi Alexei,

When doing the review your patch, we identified another potential race
issue which might occur.
e.g: write_mutex get destroyed before send() is called.

We feel that using group lock will solve these issues.
As you can see from the patch, the reset_ssl_sock_state() will not
destroy the ssl.
We have moved all the code of destroying ssl and write_mutex to the
ssl_on_destroy()
which is called when the group lock is no longer held or reference count
reach 0.

As for your concern regarding the call to BIO_pending() on a closed
socket, it shouldn't be
a problem since we don't use OpenSSL for the socket operation.
Or, maybe I'm missing something here?

Best Regards,

Riza

On Tue, Oct 11, 2016 at 10:44 PM, Alexei Gradinari alex2grad@gmail.com
wrote:
Hello Riza,

Could you, please, explain why did you remove all checking on openssl
calls from my patch?
Your patch didn't fix race condition.
Imagine
one thread calls reset_ssl_sock_state
while another thread calls BIO_pending(ssock->ossl_wbio) on already
closed socket.

Pleas, look at commit 189e20c68c2399782034fec9f65d6b9ee88aac3d
https://mta.openssl.org/pipermail/openssl-commits/2015-May/001035.html
Reusing an SSL object when it has encountered a fatal error can
have bad consequences. This is a bug in application code not libssl
but libssl should be more forgiving and not crash.

P.S.
I don't understand why my patch isn't acceptable for you.
Could you, please, explain.

Regards,
Alexei

Sunday, October 9, 2016, 7:51:54 PM, you wrote:

Hi Ross/Alexei,

We have made some modification to the patch in order to incorporate the
group lock.
Please find the patch on the attachment, could you help us in testing it?

Best Regards,

Riza

On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer ross.beer@outlook.com wrote:
Hi Ming,

Sorry to chase, however is this still undergoing review?

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Ming <
ming@teluu.com>
Sent: 03 October 2016 12:34
To: pjsip list
Subject: Re: [pjsip] patch: crash on using already destroyed ssl
socket

Hi Ross,

Yes, the patch is undergoing some changes and currently under
(hopefully) final review.

Regards,
Ming

On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer ross.beer@outlook.com wrote:

Hi PJSIP team,

Could this patch be considered for inclusion in the project?

This issue is causing segfaults a least once a day.

Kind regards,

Ross

From: pjsip pjsip-bounces@lists.pjsip.org on behalf of Alexei

Gradinari

alex2grad@gmail.com
Sent: 21 September 2016 16:38
To: pjsip list
Subject: [pjsip] patch: crash on using already destroyed ssl socket

Hello,

On heavy loaded system with TLS,
one thread could destroy the ssl socket on SSL_ERROR_SYSCALL
while another thread still uses this socket which
was already freed, so we get segfault.
Attached 2 backtraces.

To avoid race condition need to lock the socket before destroying it.

The attached patch adds the socket lock on destroying it
and adds a checking on all openssl calls that the socket wasn't

destroyed.

Regards,
Alexei

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

pjsip Info Page
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
lists.pjsip.org
This is the mailing list to discuss pjsip, the open source sip stack. We
also have answers to frequently asked questions. To see the collection of
prior postings to ...

http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

alex2grad@gmail.com

Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

--
George Joseph
Digium, Inc. | Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org

On Wed, Oct 12, 2016 at 10:12 PM, Riza Sulistyo <riza@pjsip.org> wrote: > Hi Alexei, > > Please find the reply, inline. > > Best Regards, > > Riza > > On Thu, Oct 13, 2016 at 12:23 AM, Alexei Gradinari <alex2grad@gmail.com> > wrote: > >> Hello Riza, >> >> 1. Please, see the next OpenSSL calls. >> BIO_pending(ssock->ossl_wbio) >> SSL_do_handshake(ssock->ossl_ssl) >> SSL_read(ssock->ossl_ssl, data_, size_) >> SSL_write(ssock->ossl_ssl, data, (int)size) >> >> All of these calls could use "SSL object when it has encountered a fatal >> error" or >> the SSL object was destroyed. >> My patch added this checking, but you removed it. >> Why? >> > > - The checks will reduce the potential of of SSL object reuse, however > it's not bullet proof. > e.g: > 1. thread 1: fatal error occurs in on_read(), but before > reset_ssl_sock_state() is called and ssock->last_err is updated, context > switch > 2. thread 2: ssock_send() executed, call BIO_pending() etc > > - write_mutex actually is to protect write operation, as for the fatal > error reported (from the call stack), > it happened on read operation which is not protected by write_mutex > (write_mutex will be diacquire when destroying SSL state), > so, adding these checks around write_mutex would be less effective. > > - based on our test, the crash didn't happen again. This was confirmed by > George and another of our user with the same issue. > Actually I'm going to defer to Alexei and Ross on this. > If you have a different result, please forward them to us. > > - lastly (although minor), the fatal error scenario + > reset_ssl_sock_state() happened on read operation but the SSL object being > accessed was for write (e.g: write BIO). > However we are not certain if there's a correlation between the BIO > write and the read operation. > > >> 2. The destroy_ssl doesn't destroy write_mutex. >> The write_mutex is destroyed in pj_ssl_sock_close. >> My patch does nothing with this, so this potential race existed before. >> > > we were not saying that the issue was introduce by your patch. > > >> >> The ssl should be destroyed as soon as possible to release a TCP >> socket/port. >> >> To fix potential race issue with write_mutex the group locking could use. >> >> So I combined my patch with your group locking patch. >> Attached a new patch. >> >> Regards, >> Alexei >> >> >> Wednesday, October 12, 2016, 8:36:13 AM, you wrote: >> >> Hi Alexei, >> >> When doing the review your patch, we identified another potential race >> issue which might occur. >> e.g: write_mutex get destroyed before send() is called. >> >> We feel that using group lock will solve these issues. >> As you can see from the patch, the reset_ssl_sock_state() will not >> destroy the ssl. >> We have moved all the code of destroying ssl and write_mutex to the >> ssl_on_destroy() >> which is called when the group lock is no longer held or reference count >> reach 0. >> >> As for your concern regarding the call to BIO_pending() on a closed >> socket, it shouldn't be >> a problem since we don't use OpenSSL for the socket operation. >> Or, maybe I'm missing something here? >> >> Best Regards, >> >> Riza >> >> >> On Tue, Oct 11, 2016 at 10:44 PM, Alexei Gradinari <alex2grad@gmail.com> >> wrote: >> Hello Riza, >> >> Could you, please, explain why did you remove all checking on openssl >> calls from my patch? >> Your patch didn't fix race condition. >> Imagine >> one thread calls reset_ssl_sock_state >> while another thread calls BIO_pending(ssock->ossl_wbio) on already >> closed socket. >> >> Pleas, look at commit 189e20c68c2399782034fec9f65d6b9ee88aac3d >> https://mta.openssl.org/pipermail/openssl-commits/2015-May/001035.html >> Reusing an SSL object when it has encountered a fatal error can >> have bad consequences. This is a bug in application code not libssl >> but libssl should be more forgiving and not crash. >> >> >> P.S. >> I don't understand why my patch isn't acceptable for you. >> Could you, please, explain. >> >> Regards, >> Alexei >> >> >> Sunday, October 9, 2016, 7:51:54 PM, you wrote: >> >> Hi Ross/Alexei, >> >> We have made some modification to the patch in order to incorporate the >> group lock. >> Please find the patch on the attachment, could you help us in testing it? >> >> Best Regards, >> >> Riza >> >> On Fri, Oct 7, 2016 at 4:47 PM, Ross Beer <ross.beer@outlook.com> wrote: >> Hi Ming, >> >> Sorry to chase, however is this still undergoing review? >> >> Kind regards, >> >> Ross >> >> >> ------------------------------ >> *From:* pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Ming < >> ming@teluu.com> >> *Sent:* 03 October 2016 12:34 >> *To:* pjsip list >> *Subject:* Re: [pjsip] patch: crash on using already destroyed ssl >> socket >> >> Hi Ross, >> >> Yes, the patch is undergoing some changes and currently under >> (hopefully) final review. >> >> Regards, >> Ming >> >> On Mon, Oct 3, 2016 at 5:23 PM, Ross Beer <ross.beer@outlook.com> wrote: >> > Hi PJSIP team, >> > >> > >> > Could this patch be considered for inclusion in the project? >> > >> > >> > This issue is causing segfaults a least once a day. >> > >> > >> > Kind regards, >> > >> > >> > Ross >> > >> > >> > >> > ________________________________ >> > From: pjsip <pjsip-bounces@lists.pjsip.org> on behalf of Alexei >> Gradinari >> > <alex2grad@gmail.com> >> > Sent: 21 September 2016 16:38 >> > To: pjsip list >> > Subject: [pjsip] patch: crash on using already destroyed ssl socket >> > >> > Hello, >> > >> > On heavy loaded system with TLS, >> > one thread could destroy the ssl socket on SSL_ERROR_SYSCALL >> > while another thread still uses this socket which >> > was already freed, so we get segfault. >> > Attached 2 backtraces. >> > >> > To avoid race condition need to lock the socket before destroying it. >> > >> > The attached patch adds the socket lock on destroying it >> > and adds a checking on all openssl calls that the socket wasn't >> destroyed. >> > >> > Regards, >> > Alexei >> > >> > _______________________________________________ >> > Visit our blog: http://blog.pjsip.org >> > >> > pjsip mailing list >> > pjsip@lists.pjsip.org >> > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org >> pjsip Info Page >> <http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org> >> lists.pjsip.org >> This is the mailing list to discuss pjsip, the open source sip stack. We >> also have answers to frequently asked questions. To see the collection of >> prior postings to ... >> >> >> >> > >> <http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org> >> >> <alex2grad@gmail.com> >> > > > _______________________________________________ > Visit our blog: http://blog.pjsip.org > > pjsip mailing list > pjsip@lists.pjsip.org > http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org > > -- George Joseph Digium, Inc. | Software Developer 445 Jan Davis Drive NW - Huntsville, AL 35806 - US Check us out at: www.digium.com & www.asterisk.org