
IEEE 1588 Precision Time Protocol (PTP)

Introduction
The Precision Time Protocol (PTP) is described by the IEEE 1588 standard. There are two versions of the standard: the first was released in 2002, and the 2008 revision introduced PTPv2. The two versions are not backward compatible.
I work with the second version of the protocol, which has many improvements over the first (accuracy and stability, according to the wiki). I will not compare it with NTP in detail and will only mention synchronization accuracy: with hardware support, PTP really does reach tens of nanoseconds, which is a clear advantage over NTP.
Hardware support for the protocol can be implemented in different ways in different devices. The minimum that PTP requires is that the hardware can timestamp the moment a message is received on the port. That timestamp is then used to calculate the clock error.
Why do clocks drift?
Errors can come from everywhere. To begin with, the oscillators in different devices differ, and there is very little chance that two devices will tick at exactly the same rate. On top of that, constantly changing environmental conditions affect the generated frequency.
What are we trying to achieve?
Suppose we have a device that works under ideal conditions, some kind of atomic clock that will not drift until the end of the world (the real one, of course, not the one predicted by the Mayan calendar), and the task is to make another clock show at least approximately the same time (to within 10^-9 s). We need to synchronize that clock, and to do this we can implement PTP.
The difference between a purely software implementation and one with hardware support
A purely software implementation will not achieve the promised accuracy. The time that elapses between the arrival of a message (more precisely, the device's signal that a message has been received) and entry into the interrupt handler or callback cannot be determined precisely. Smart hardware with PTP support can apply these timestamps itself (for example, chips from Micrel; I am writing a driver for the KSZ8463MLI).
Besides timestamping, hardware support can also include the ability to tune the crystal oscillator (to align its frequency with the master) or to adjust the clock itself (changing the clock value by X ns at each step). More on this below.
Let's move on to the IEEE 1588 standard
The standard runs to 289 pages, so let's consider only the minimum required to implement the protocol. PTP is a client-server synchronization protocol, i.e. an implementation requires at least two devices. So, the Master device is the atomic clock, and the Slave device is the clock that must be made to run accurately.
Language of exchange
Announce message - an announcement that the master sends to all Slave devices. Using this message, a Slave device can select the best master (the BMC, Best Master Clock, algorithm exists for this). BMC is not very interesting here, and the algorithm is easy to find in the standard. The choice is based on message fields such as accuracy, variance, class, priority and so on. Let's move on to the other messages.
Sync / FollowUp, DelayResp, PDelayResp / PDelayRespFollowUp - sent by the master; we will look at them in more detail below.
DelayReq, PDelayReq - requests sent by Slave devices.
As you can see, the Slave device is not talkative: the Master provides almost all the information itself. Messages are sent to multicast addresses that are strictly defined in the standard (unicast mode can be used if desired). There is a separate address for PDelay messages (01-80-C2-00-00-0E for Ethernet and 224.0.0.107 for UDP). The remaining messages are sent to 01-1B-19-00-00-00 or 224.0.1.129. Packets are distinguished by the ClockIdentity (clock identifier) and SequenceId (packet identifier) fields.
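The address rule above can be sketched as a small lookup. This is only an illustration: the `MessageType` enum, the `destination` helper and the transport keys are names invented for this example, while the numeric values are the PTPv2 messageType codes and the addresses are the ones quoted in the text.

```python
from enum import IntEnum


class MessageType(IntEnum):
    """PTPv2 messageType codes (names spelled as in this article)."""
    SYNC = 0x0
    DELAY_REQ = 0x1
    PDELAY_REQ = 0x2
    PDELAY_RESP = 0x3
    FOLLOW_UP = 0x8
    DELAY_RESP = 0x9
    PDELAY_RESP_FOLLOW_UP = 0xA
    ANNOUNCE = 0xB


# Multicast destinations quoted in the text above.
PDELAY_DEST = {"ethernet": "01-80-C2-00-00-0E", "udp": "224.0.0.107"}
DEFAULT_DEST = {"ethernet": "01-1B-19-00-00-00", "udp": "224.0.1.129"}

PDELAY_TYPES = frozenset({
    MessageType.PDELAY_REQ,
    MessageType.PDELAY_RESP,
    MessageType.PDELAY_RESP_FOLLOW_UP,
})


def destination(msg_type, transport="ethernet"):
    """Return the multicast address a message of this type is sent to."""
    table = PDELAY_DEST if msg_type in PDELAY_TYPES else DEFAULT_DEST
    return table[transport]
```

For example, `destination(MessageType.SYNC)` gives the general Ethernet address, while any of the three PDelay types maps to the separate peer-delay address.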
Work session
Let's say the master was selected using the BMC algorithm, or it is the only master on the network. The figure shows the exchange between the master and the device being synchronized.

- It all starts with the Master sending a Sync message and recording its sending time t1. There are one-step and two-step modes of operation. They are easy to tell apart: if a FollowUp message is present, we are dealing with a two-step implementation. The dotted arrow shows optional messages.
- The FollowUp message is sent right after Sync and contains the time t1. In one-step operation, Sync itself carries t1 in the message body. Either way, our device receives t1. When the Sync message arrives at the Slave, the timestamp t2 is generated. Thus we have t1 and t2.
- The Slave sends a DelayReq message and records its sending time t3.
- The Master receives the DelayReq message and records its arrival time t4.
- t4 is sent to the Slave device in a DelayResp message.
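
With all four timestamps in hand, the Slave can estimate both the one-way delivery time and its own offset from the Master. The sketch below uses the usual delay request-response arithmetic and assumes a symmetric path (the same delay in both directions); the function name and the sample numbers are made up for the example.

```python
def delay_and_offset(t1, t2, t3, t4):
    """Estimate path delay and Slave offset from one Sync/DelayReq exchange.

    t1: Master time when Sync was sent
    t2: Slave time when Sync was received
    t3: Slave time when DelayReq was sent
    t4: Master time when DelayReq was received
    Assumes the delay is the same in both directions.
    """
    master_to_slave = t2 - t1   # delay + offset
    slave_to_master = t4 - t3   # delay - offset
    delay = (master_to_slave + slave_to_master) / 2
    offset = (master_to_slave - slave_to_master) / 2
    return delay, offset


# Hypothetical exchange in nanoseconds: the Slave runs 500 ns ahead of the
# Master and the one-way delay is 100 ns.
delay_ns, offset_ns = delay_and_offset(t1=1000, t2=1600, t3=2000, t4=1600)
```

Here `delay_ns` comes out as 100 and `offset_ns` as 500; subtracting the offset from the Slave clock would bring it in line with the Master.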

Messages on the network
The exchange session shown above is sufficient on its own only if the quartz oscillators of the synchronized devices generate perfectly identical frequencies. In practice the clock frequencies differ: on one device the clock advances by 1 second per second, while on another it advances by, say, 1.000001 seconds. This is where the divergence of the clocks comes from.
The standard gives an example of calculating the ratio of the time elapsed on the Master to the time elapsed on the Slave over a certain interval. This ratio serves as a correction coefficient for the Slave's clock frequency. The standard also notes that the adjustment can be carried out in various ways. Consider two of them:
- Change the clock frequency of the Slave device (the example in the standard)
- Leave the clock frequency unchanged, but on each tick of duration T advance the clock value not by T but by T + ∆t (used in my implementation)
Both methods require the difference between time values on the Master over a certain interval, as well as the time difference over the same interval on the Slave. The coefficient in the first method:

The second method requires calculating ∆t, the value that is added to the clock on each tick. You can see in the figure that while 22 - 15 = 7 seconds passed on the Master, (75 + (87 - 75) / 2) - (30 + (37 - 30) / 2) = 47.5 passed on the Slave.

Frequency is the processor frequency; for example, at 25 MHz one processor cycle lasts 1 / (25 * 10^6) s = 40 ns.
Depending on the capabilities of the device, the most suitable method is selected.
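Both methods can be sketched in a few lines. The formulas in this part of the article did not survive, so the code below is an assumption about their general shape rather than a reproduction: the ratio is the Master interval divided by the Slave interval, and ∆t is chosen so that the Slave's ticks add up to the Master interval. The function names and sample numbers are hypothetical.

```python
def rate_ratio(master_interval, slave_interval):
    """Method 1: coefficient by which the Slave frequency should be scaled."""
    return master_interval / slave_interval


def per_tick_correction(master_interval, slave_interval, tick):
    """Method 2: value of dt to add to every tick of nominal duration `tick`.

    Over the interval the Slave counted slave_interval / tick ticks; for
    them to sum to master_interval each tick must contribute
    tick * master_interval / slave_interval, i.e. tick + dt.
    """
    return tick * (master_interval - slave_interval) / slave_interval


# Hypothetical numbers: while 1 s passed on the Master, the Slave counted
# 1.000001 s (it runs 1 ppm fast). The tick is 40 ns (25 MHz).
master_ns = 1_000_000_000
slave_ns = 1_000_001_000
ratio = rate_ratio(master_ns, slave_ns)
dt_ns = per_tick_correction(master_ns, slave_ns, tick=40.0)
```

With these numbers the ratio is slightly below 1 and `dt_ns` is about -0.00004 ns per tick, so each 40 ns tick is shortened by a tiny fraction to slow the fast Slave clock down.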
To move on to the next section, let's express the offset a little differently:

PTP Modes
The standard offers more than one way to calculate the delivery time. PTPv2 has two modes of operation: E2E (End-to-End), which was discussed above, and P2P (Peer-to-Peer). Let's see where each applies and how they differ.
In principle, either mode can be used, but they cannot be combined in one network.
- In E2E mode, the delivery time is calculated from messages that pass through many devices. Each of them adds to the correction field of the Sync or FollowUp message (in two-step operation) the time the packet was delayed in that device. If the devices are connected directly, no correction is applied, so we will not consider this in detail. Messages used: Sync / FollowUp, DelayReq / DelayResp
- In P2P mode, the correction field receives not only the time the packet was delayed in the device; (t2 - t1) is added to it as well (see the standard). Messages used: Sync / FollowUp, PDelayReq / PDelayResp / PDelayRespFollowUp
In the standard, clocks that PTP messages pass through and that update the correction field are called Transparent Clocks (TC). Let us see in the figures how messages are transmitted in these two modes. Blue arrows indicate Sync and FollowUp messages.
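
How a Transparent Clock uses the correction field in E2E mode can be sketched as follows. The correction field is carried as nanoseconds multiplied by 2^16; the helper names and the timestamps are invented for the example, and real hardware does this in the switch rather than in software.

```python
SCALE = 1 << 16  # correctionField unit: nanoseconds multiplied by 2**16


def add_residence_time(correction_field, ingress_ns, egress_ns):
    """What an E2E Transparent Clock does: add the time the packet spent
    inside the device to the message's correction field."""
    return correction_field + (egress_ns - ingress_ns) * SCALE


def correction_to_ns(correction_field):
    """Convert the scaled correction field back to nanoseconds."""
    return correction_field / SCALE


def corrected_master_to_slave(t1, t2, correction_field):
    """Sync transit time as seen by the Slave, with the residence times
    accumulated along the path removed."""
    return (t2 - t1) - correction_to_ns(correction_field)


# Hypothetical path with two Transparent Clocks holding the Sync message
# for 250 ns and 120 ns respectively.
cf = add_residence_time(0, ingress_ns=100, egress_ns=350)
cf = add_residence_time(cf, ingress_ns=1000, egress_ns=1120)
```

After both hops `correction_to_ns(cf)` is 370, and the Slave subtracts that amount from t2 - t1 before computing the delivery time.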

End-to-End mode

Peer-to-Peer mode
We see that in P2P mode some red arrows have appeared. These are the remaining messages that we have not yet considered, namely PDelayReq, PDelayResp and PDelayRespFollowUp. Here is the messaging session:
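
The delay of a single link that such a peer-to-peer session yields is commonly computed from four timestamps. The sketch below is an assumption about the usual arithmetic, not a transcription of the figure; note that t1..t4 here belong to the PDelay exchange, not to the Sync exchange described earlier, and the function name is hypothetical.

```python
def peer_link_delay(t1, t2, t3, t4):
    """Estimate the one-way delay of a single link.

    t1: requester time when PDelayReq was sent
    t2: responder time when PDelayReq was received
    t3: responder time when PDelayResp was sent
    t4: requester time when PDelayResp was received
    The round trip measured by the requester, minus the turnaround time
    measured by the responder, is split equally between both directions.
    """
    round_trip = t4 - t1
    turnaround = t3 - t2
    return (round_trip - turnaround) / 2


# Hypothetical numbers in nanoseconds: 50 ns link delay, 200 ns turnaround,
# and a responder clock that is 1000 ns ahead of the requester.
link_delay_ns = peer_link_delay(t1=0, t2=1050, t3=1250, t4=300)
```

The offset between the two peers cancels out, which is why `link_delay_ns` is 50 even though the clocks disagree by 1000 ns.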

Delivery Time Error
The standard describes implementing the protocol on various types of networks. I used an Ethernet network and received messages at the Ethernet level. In such networks the packet delivery time changes constantly (this is especially noticeable when working with nanosecond accuracy). Various filters are applied to smooth these values.
What needs to be filtered:
- Delivery time
- ∆t
- Offset
My driver uses roughly the same filtering scheme as the Linux PTPd daemon, whose source code is publicly available. I will give only a diagram:

An LP IIR filter (a low-pass filter with an infinite impulse response), described by the formula:

where s is a coefficient that allows you to adjust the filter cutoff.
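The formula itself is not reproduced in the text, so the sketch below shows just one common first-order form of such a filter, s * y[n] - (s - 1) * y[n-1] = (x[n] + x[n-1]) / 2. Treat the exact form, the class name and the sample values as assumptions made for illustration.

```python
class LowPassIIR:
    """First-order IIR low-pass filter.

    Assumed form: s*y[n] - (s-1)*y[n-1] = (x[n] + x[n-1]) / 2.
    A larger s lowers the cutoff, i.e. smooths more strongly.
    """

    def __init__(self, s):
        if s < 1:
            raise ValueError("s must be at least 1")
        self.s = s
        self.y = None       # previous output, y[n-1]
        self.x_prev = None  # previous input, x[n-1]

    def update(self, x):
        if self.y is None:
            # The first sample initializes the filter state.
            self.y = float(x)
        else:
            self.y = ((self.s - 1) * self.y + (x + self.x_prev) / 2) / self.s
        self.x_prev = x
        return self.y


# Hypothetical delivery-time samples in nanoseconds with one outlier.
lp = LowPassIIR(s=4)
smoothed = [lp.update(x) for x in (100, 100, 200, 100)]
```

The jump from 100 to 200 moves the output only to 112.5, so a single delayed packet barely disturbs the filtered delivery time.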
Adjustment calculation
Let's move on to the adjustment, that is, the delta that must be added to the value of each second. The calculation scheme used in my system:

I used a Kalman filter to remove the strong jitter in the adjustment caused by network interference; I really liked one article on the subject. In general, you can use any filter you like; the main thing is to smooth the curve. PTPd, for example, uses simpler filtering: it averages the current and previous values. The graph shows the results of the Kalman filter in my driver (the adjustment error, expressed in sub-nanoseconds on a 25 MHz chip):
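
For readers who have not met the Kalman filter, here is a minimal scalar version that treats the adjustment as a slowly changing constant observed through noisy measurements. It is a generic textbook sketch, not the filter from my driver; the class name, the noise parameters `q` and `r` and the sample values are all assumptions.

```python
class ScalarKalman:
    """One-dimensional Kalman filter with a constant-value process model."""

    def __init__(self, q, r, x0=0.0, p0=1.0):
        self.q = q    # process noise variance: how fast the true value may wander
        self.r = r    # measurement noise variance: how jittery the samples are
        self.x = x0   # current estimate
        self.p = p0   # variance of the current estimate

    def update(self, z):
        # Predict: the value is assumed unchanged, only uncertainty grows.
        self.p += self.q
        # Correct: blend the prediction and the measurement z.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


# Hypothetical run: two identical measurements of 10.0, starting from 0.0.
kf = ScalarKalman(q=0.0, r=1.0, x0=0.0, p0=1.0)
first = kf.update(10.0)
second = kf.update(10.0)
```

The estimate approaches the measured value gradually; raising `r` relative to `q` makes the filter trust new samples less and smooths the curve more.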

Next comes regulating the adjustment itself: the adjustment should converge to a constant, so a PI controller is used. In PTPd the controller acts on the clock offset, whereas I use it to regulate the adjustment (a feature of the KSZ8463MLI). We can see that the controller is not tuned perfectly, but in my case this is sufficient:
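
A generic PI controller can be sketched as below. For simplicity it is driven by the clock offset (the PTPd-style arrangement mentioned above) in a toy simulation, so it illustrates the idea rather than my driver; the gains, the drift value and all names are assumptions.

```python
class PIController:
    """Proportional-integral controller."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.integral = 0.0

    def update(self, error):
        # The integral term accumulates past errors and ends up holding
        # the constant part of the correction.
        self.integral += error
        return self.kp * error + self.ki * self.integral


def simulate(steps, drift_ns=5.0, offset_ns=1000.0, kp=0.5, ki=0.1):
    """Toy model: each interval the Slave gains drift_ns and the
    controller output is subtracted from its clock."""
    pi = PIController(kp, ki)
    adjustment = 0.0
    for _ in range(steps):
        adjustment = pi.update(offset_ns)
        offset_ns += drift_ns - adjustment
    return offset_ns, adjustment


final_offset, final_adjustment = simulate(200)
```

In this toy run the offset decays to practically zero while the adjustment settles at the drift value of 5 ns per interval, which is the "converge to a constant" behavior described above.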

Work result

The result is shown in the graph: the clock offset stays within -50 ns to 50 ns. I have therefore achieved the accuracy mentioned in numerous articles. Of course, many small implementation details remain behind the scenes, but the necessary minimum has been demonstrated.