Special purpose receivers are available for many time-dissemination services, including the Global Position System (GPS) and other services operated by various national governments. For reasons of cost and convenience, it is not possible to equip every computer with one of these receivers. However, it is possible to equip some number of computers acting as primary time servers to synchronize a much larger number of secondary servers and clients connected by a common network. In order to do this, a distributed network clock synchronization protocol is required which can read a server clock, transmit the reading to one or more clients and adjust each client clock as required. Protocols that do this include the Network Time Protocol (NTP), Digital Time Synchronization Protocol (DTSS) and others found in the literature (See "Further Reading" at the end of this article.)
The community served by the synchronization protocol can be very large. For instance, the NTP community in the Internet of 1998 includes over 230 primary time servers, synchronized by radio, satellite and modem, and well over 100,000 secondary servers and clients. In addition, there are many thousands of private communities in large government, corporate and institution networks. Each community is organized as a tree graph or subnet, with the primary servers at the root and secondary servers and clients at increasing hop count, or stratum level, in corporate, department and desktop networks. It is usually necessary at each stratum level to employ redundant servers and diverse network paths in order to protect against broken software, hardware and network links.
Synchronization protocols work in one or more association modes, depending on the protocol design. Client/server mode, also called master/slave mode, is supported in both DTSS and NTP. In this mode, a client synchronizes to a stateless server as in the conventional RPC model. NTP also supports symmetric mode, which allows either of two peer servers to synchronize to the other, in order to provide mutual backup. DTSS and NTP support a broadcast mode which allows many clients to synchronize to one or a few servers, reducing network traffic when large numbers of clients are involved. In NTP, IP multicast can be used when the subnet spans multiple networks.
Configuration management can be a serious problem in large subnets. Various schemes which index public databases and network directory services are used in DTSS and NTP to discover servers. Both protocols use broadcast modes to support large client populations; but, since listen-only clients cannot calibrate the delay, accuracy can suffer. In NTP, clients determine the delay at the time a server is first discovered by polling the server in client/server mode and then reverting to listen-only mode. In addition, NTP clients can broadcast a special "manycast" message to solicit responses from nearby servers and continue in client/server mode with the respondents.
Clock errors are due to variations in network delay and latencies in computer hardware and software (jitter), as well as clock oscillator instability (wander). The time of a client relative to its server can be expressed
T(t) = T(t0) + R(t - t0) + 1/2 D(t - t0)2,
where t is the current time, T is the time offset at the last measurement update t0, R is the frequency offset and D is the drift due to resonator ageing. All three terms include systematic offsets that can be corrected and random variations that cannot. Some protocols, including DTSS, estimate only the first term in this expression, while others, including NTP, estimate the first two terms. Errors due to the third term, while important to model resonator aging in precision applications, are neglected, since they are usually dominated by errors in the first two terms.
The synchronization protocol estimates T(t0) (and R(t0), where relevant) at regular intervals t0 and adjusts the clock to minimize T(t) in future. In common cases, R can have systematic offsets of several hundred parts-per-million (PPM) with random variations of several PPM due to ambient temperature changes. If not corrected, the resulting errors can accumulate to seconds per day. In order that these errors do not exceed a nominal specification, the protocol must periodically re-estimate T and R and compensate for variations by adjusting the clock at regular intervals. As a practical matter, for nominal accuracies of tens of milliseconds, this requires clients to exchange messages with servers at intervals in the order of tens of minutes.
Analysis of quartz-resonator stabilized oscillators show that errors are a function of the averaging time, which in turn depends on the interval between corrections. At correction intervals less than a few hundred seconds, errors are dominated by jitter, while, at intervals greater than this, errors are dominated by wander. As explained later, the characteristics of each regime determine the algorithm used to discipline the clock. These errors accumulate at each stratum level from the root to the leaves of the subnet tree. It is possible to quantify these errors by statistical means, as in NTP. This allows real-time applications to adjust audio or video playout delay, for example. However, the required statistics may be different for various classes of applications. Some applications need absolute error bounds guaranteed never to exceeded, as provided by the following correctness principles.
In the Probabilistic Clock Synchronization (PCS) scheme devised by Cristian, a maximum error tolerance is established in advance and time value samples associated with roundtrip delays that exceed twice this value are discarded. By the above argument, the remaining samples must represent time values within the specified tolerance. As the tolerance is decreased, more samples fail the test until a point where no samples survive. The tolerance can be adjusted for the best compromise between the highest accuracy consistent with acceptable sample survival rate.
In a scheme devised by Marzullo and exploited in NTP and DTSS, the worst-case error determined for each server determines a correctness interval. If each of a number of servers are in fact synchronized to a common timescale, the actual time must be contained in the intersection of their correctness intervals. If some intervals do not intersect, then the clique containing the maximum number of intersections is assumed correct truechimers and the others assumed incorrect falsetickers. Only the truechimers are used to adjust the system clock.
The same principle could be used when selecting the best subset of servers and combining their offsets to determine the clock adjustment. However, different servers often show different systematic offsets, so the best statistic for the central tendency of the server population may not be obvious. Various kinds of clustering algorithms have been found useful for this purpose. The one used in NTP sorts the offsets by a quality metric, then calculates the variance of all servers relative to each server separately. The algorithm repeatedly discards the outlyer with the largest variance until further discards will not improve the residual variance or until a minimum number of servers remain. The final clock adjustment is computed as a weighted average of the survivors.
At the heart of the synchronization protocol is the algorithm used to adjust the system clock in accordance with the final adjustment determined by the above algorithms. This is called the clock discipline algorithm or simply the discipline. Such algorithms can be classed according to whether they minimize the time offset or frequency offset or both. For instance, the discipline used in DTSS minimizes only the time offset, while the one used in NTP minimizes both time and frequency offsets. While the DTSS algorithm cannot remove residual errors due to systematic frequency errors, the NTP algorithm is more complicated and less forgiving of design and implementation mistakes.
All clock disciplines function as a feedback loop, with measured offsets used to adjust the clock oscillator phase and frequency to match the external synchronization source. The behavior of feedback loops is well understood and modelled by mathematical analysis. The significant design parameter is the time constant, or responsiveness to external or internal variations in time or frequency. Optimum selection of time constant depends on the interval between update messages. In general, the longer these intervals, the larger the time constant and vice versa. In practice and with typical network configurations the optimal poll intervals vary between one and twenty minutes for network paths to some thousands of minutes for modem paths.