time-nuts@lists.febo.com

Discussion of precise time and frequency measurement

Re: [time-nuts] ISS NTP operation problems.

Hal Murray
Fri, Jan 8, 2021 3:58 PM

phk@phk.freebsd.dk said:
> If your path is not stable, or you flip between different servers with
> different delays and/or asymmetries, your time will not be stable.

Ahh.  Thanks for the reminder.

There is a huff-puff filter, I think it's optional.  It assumes the physical
path is stable and that increases in the round trip time are due to queuing
delays, which are typically asymmetric, so it drops answers if the delay is
longer than in previous samples.  I'd have to look at the code for the details.

But that doesn't match the crazy graph with the drift way off.

--
These are my opinions.  I hate spam.

Poul-Henning Kamp
Fri, Jan 8, 2021 4:10 PM

Hal Murray writes:

> There is a huff-puff filter, I think it's optional.

I think it is more likely that the median-filter is causing trouble.

If you let the poll-rate ramp all the way up to 1024 seconds, the
median filter can get "stuck" for almost an hour before it notices
that something is horribly wrong[1].

These days there are almost no circumstances under which you should not
clamp your poll interval to about one minute (64 seconds):
"server bla maxpoll 6"

On a strange path like the ISS, I would clamp it to the minimum
16 seconds ('maxpoll 4').

When the backbone of the ARPAnet was 56kbit/s, being able to ramp
up to 1024 seconds poll rate made sense.  It never makes sense now.
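
To put rough numbers on the "stuck" effect: the maxpoll value is an exponent, so the poll interval is 2**maxpoll seconds (16, 64 and 1024 seconds for 4, 6 and 10). The sketch below is only an illustration of how a generic running median lags a step change; the window length of 5 samples and the function name are assumptions made for the example and are not taken from the ntpd source.

```python
from statistics import median


def seconds_until_median_follows(poll_exponent, window, step=1.0):
    """Return the seconds that pass before a running median reports a step.

    poll_exponent: NTP-style poll exponent; the interval is 2**poll_exponent s.
    window: number of samples in the median filter (hypothetical value).
    step: size of the sudden offset change being simulated.
    """
    poll_interval = 2 ** poll_exponent
    samples = [0.0] * window            # history before the step: offset 0
    elapsed = 0
    while median(samples) < step:
        # One new poll arrives carrying the stepped offset; the oldest drops out.
        samples = samples[1:] + [step]
        elapsed += poll_interval
    return elapsed


for exponent in (4, 6, 10):
    lag = seconds_until_median_follows(exponent, window=5)
    print(f"maxpoll {exponent}: poll every {2 ** exponent} s, median follows after {lag} s")
```

With these assumed numbers the median needs three new samples before it moves, which is 48 seconds at maxpoll 4 and 3072 seconds (about 51 minutes) at maxpoll 10.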

Poul-Henning

[1] The median filter should be automatically disabled and the most
recent sample used, if the samples in the shift register are
monotonic or spread too much.  That's another change I never
managed to "sell" to Dave Mills.
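
A minimal sketch of what the rule in [1] could look like, assuming the shift register is a simple list of offsets, oldest first. The function name, the strict-monotonic test and the max_spread threshold are all hypothetical choices for illustration; this is not code from ntpd or from any proposed patch.

```python
from statistics import median


def filtered_offset(samples, max_spread):
    """Return the median of samples, or the newest sample if the median is suspect.

    samples: non-empty list of offsets, oldest first (stand-in for the shift register).
    max_spread: hypothetical threshold on (max - min) above which the median
                is considered untrustworthy.
    """
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    # Strictly rising or strictly falling samples suggest real drift, not noise.
    monotonic = len(diffs) > 1 and (
        all(d > 0 for d in diffs) or all(d < 0 for d in diffs)
    )
    spread = max(samples) - min(samples)
    if monotonic or spread > max_spread:
        return samples[-1]              # bypass the median, use the latest sample
    return median(samples)


print(filtered_offset([1.0, 3.0, 2.0, 2.5, 1.5], max_spread=5.0))  # noisy: median
print(filtered_offset([1.0, 2.0, 3.0, 4.0, 5.0], max_spread=10.0))  # drifting: newest
```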

--
Poul-Henning Kamp      | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG        | TCP/IP since RFC 956
FreeBSD committer      | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

Magnus Danielson
Fri, Jan 8, 2021 10:11 PM

On 2021-01-08 16:58, Hal Murray wrote:

> phk@phk.freebsd.dk said:
>> If your path is not stable, or you flip between different servers with
>> different delays and/or asymmetries, your time will not be stable.
> Ahh.  Thanks for the reminder.
>
> There is a huff-puff filter, I think it's optional.  It assumes the physical
> path is stable and that increases in the round trip time are due to queuing
> delays which are typically asymmetric so it drops answers if the delay is
> longer than previous samples.  I'd have to look at the code for the details.

Huff-'n-puff is there only to handle packet jitter. The same method is
called the min-delay algorithm or lucky packet filtering in other
techniques. For the one-way delay estimate, over some window of raw
measurements, you pick the one with the minimum delay. Then you do the
same for the other direction. Using only these two values, you then do
your two-way time-transfer calculation.
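
The min-delay idea described above can be sketched as follows. This is an illustration only, not the ntpd huff-'n-puff code: the function name and the (t1, t2, t3, t4) tuple layout are assumptions for the example, with t1/t4 read on the client clock and t2/t3 on the server clock, and the clock offset is assumed constant across the window.

```python
def min_delay_offset(exchanges):
    """Two-way time transfer using the luckiest packet in each direction.

    exchanges: list of (t1, t2, t3, t4) tuples
        t1 = client transmit, t2 = server receive,
        t3 = server transmit, t4 = client receive.
    Returns (offset, round_trip_delay) in the same unit as the timestamps.
    """
    # Smallest apparent one-way transit, client to server (contains +offset).
    forward = min(t2 - t1 for t1, t2, t3, t4 in exchanges)
    # Smallest apparent one-way transit, server to client (contains -offset).
    reverse = min(t4 - t3 for t1, t2, t3, t4 in exchanges)
    offset = (forward - reverse) / 2
    delay = forward + reverse
    return offset, delay


# Server clock is 0.5 s ahead; the unloaded one-way delay is 10 ms.
# Exchange 1 is queued on the way back, exchange 2 on the way out.
window = [
    (0.0, 0.510, 0.511, 0.041),
    (16.0, 16.540, 16.541, 16.051),
]
print(min_delay_offset(window))
```

Taken one at a time, these two exchanges would give offsets of 0.490 s and 0.515 s; combining the minimum of each direction recovers 0.5 s and a 20 ms round trip, up to floating-point rounding.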

You should also be cautioned that it works only when the time difference
between the nodes is stable, that is, when the frequency difference
between them is virtually zero. The assumption made breaks down whenever
there is a phase-drift / frequency error.

The success of this method depends on how large a window of measurements
you have. You can get significant gains as you make the window cover
more samples. In the end, the success depends on packet rate, as you end
up having some maximum time between regulations.

This does not really solve problems with asymmetric routes,
route-changes etc.

> But that doesn't match the crazy graph with the drift way off.

I don't recall seeing the graph, did I forget to check it in the
original posting? Yes, lazy me forgot to check that. OK. Will look at it.

Cheers,
Magnus

Magnus Danielson
Fri, Jan 8, 2021 10:20 PM

Poul-Henning,

On 2021-01-08 17:10, Poul-Henning Kamp wrote:


> Hal Murray writes:
>
>> There is a huff-puff filter, I think it's optional.
> I think it is more likely that the median-filter is causing trouble.
>
> If you let the poll-rate ramp all the way up to 1024 seconds, the
> median filter can get "stuck" for almost an hour before it notices
> that something is horribly wrong[1].
>
> These days there are almost no circumstances under which you should not
> clamp your poll-rate to one minute "server bla maxpoll 6"
>
> On a strange path like the ISS, I would clamp it to the minimum
> 16 seconds ('maxpoll 4').
>
> When the backbone of the ARPAnet was 56kbit/s, being able to ramp
> up to 1024 seconds poll rate made sense.  It never makes sense now.
>
> Poul-Henning
>
> [1] The median filter should be automatically disabled and the most
> recent sample used, if the samples in the shift register are
> monotonic or spread too much.  That's another change I never
> managed to "sell" to Dave Mills.

The Allan intercept concept comes out of the analysis of the ACTS system,
where modem signals were used over the public phone network, and the noise
analysis in that context. From its system view, it's a good analysis, and
it made it possible to reduce the number of calls, associated with
significant cost, into the ACTS system. Then, this was re-applied to NTP
under the assumption that the noise models work. NEWSFLASH: They don't.
The non-zero mean character of network delay variations just plays havoc
with the underlying assumption. As one applies knowledge on how to handle
that noise, uses the min-delay algorithm, plays with packet rates etc.,
you end up with quite a different solution. Increasing packet-rate for NTP
today is for most scenarios very cheap, so the very fine property of low
load of NTP ends up being not so greatly needed at all times. If we do the
right processing, we can increase the packet rate for the benefit of
better performance.
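
A small simulation of the non-zero mean problem, as a sketch under stated assumptions: queuing delay is modelled as exponentially distributed and much heavier in one direction than the other. The delay values, the distribution and the function name are invented for illustration and do not come from any measurement in this thread.

```python
import random


def simulate(n_samples=200, true_offset=0.0, base_delay=0.010,
             mean_queue_forward=0.005, mean_queue_reverse=0.0005, seed=1):
    """Compare averaging against min-delay filtering under one-sided queuing.

    All delay parameters are in seconds and are hypothetical. Queuing delay
    is drawn from an exponential distribution, so it is never negative and
    its mean is not zero.
    Returns (mean_estimate, min_estimate) of the clock offset.
    """
    rng = random.Random(seed)
    forward, reverse = [], []
    for _ in range(n_samples):
        d_f = base_delay + rng.expovariate(1 / mean_queue_forward)
        d_r = base_delay + rng.expovariate(1 / mean_queue_reverse)
        forward.append(true_offset + d_f)   # apparent transit, t2 - t1
        reverse.append(d_r - true_offset)   # apparent transit, t4 - t3
    # Averaging keeps the difference of the mean queuing delays as a bias.
    mean_estimate = (sum(forward) - sum(reverse)) / (2 * n_samples)
    # Min-delay keeps only the packets that saw almost no queuing.
    min_estimate = (min(forward) - min(reverse)) / 2
    return mean_estimate, min_estimate


mean_est, min_est = simulate()
print(f"averaged offset error:  {mean_est * 1e3:.3f} ms")
print(f"min-delay offset error: {min_est * 1e3:.3f} ms")
```

With a true offset of zero, the averaged estimate settles near half the difference of the mean queuing delays (about 2 ms for these assumed values) no matter how many samples are added, while the min-delay estimate tightens as the window covers more samples.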

Now, if one has worked enough with these things, what I say is really
neither dramatic nor new, but there is still a bit of history involved,
and times have changed significantly.

Cheers,
Magnus
