A 2.5-ps Bin Size and 6.7-ps Resolution FPGA Time-to-Digital Converter Based on Delay Wrapping and Averaging


A high-resolution time-to-digital converter (TDC) implemented on a field-programmable gate array (FPGA) and based on delay wrapping and averaging is presented. The fundamental idea is to pass a single clock through a series of delay elements to generate multiple reference clocks with different phases for input time quantization. Because the clock is periodic, those phases are equivalently wrapped within one reference clock period to achieve the required fine resolution. In practice, a hybrid delay matrix is created to significantly reduce the required number of delay cells. Multiple TDC cores are constructed for parallel measurements, and careful routing control and averaging are then applied to smooth out the large quantization errors caused by the inhomogeneity of the TDC delay lines, improving both linearity and single-shot precision. To reduce temperature sensitivity, a cancellation circuit is created that substantially reduces the offset and confines the output difference to within 2 LSB for the same input interval over the full operating temperature range of the FPGA. With a bin size of 2.5 ps, the integral nonlinearity is measured to be only −2.98 to 3.23 LSB, and the corresponding rms resolution is 4.99–6.72 ps. The proposed TDC is tested to be fully functional over an ambient temperature range of 0 °C–50 °C with extremely low resolution variation. Its performance is even superior to that of many full-custom-designed TDCs. This paper also analyzes the logic size, area, and power consumption of the proposed architecture using Xilinx ISE 14.2.


Conventionally, a TDC with sub-nanosecond resolution can be realized with emitter-coupled logic (ECL), which consumes considerable power and area and is therefore unsuitable for portable systems or integrated chips. Many different techniques have been developed to achieve a high resolution and a wide measurement range, such as time-to-amplitude conversion, the Vernier principle, time stretching, and time interpolation. In theory, the simplest implementation of a TDC is a high-frequency counter that increments every clock cycle. A 3-ps incremental resolution has been achieved with the help of a time-consuming statistical method: the lookup table (LUT). Meanwhile, multistage interpolation can be applied straightforwardly to obtain a wide measurement range while keeping a high resolution. Fig. 1 shows the conceptual timing diagram of the two-stage time interpolation technique based on the classic Nutt method. The input interval Tin is segmented into T12, T1, and T2. T12 is synchronous with the reference clock CLK and can be readily digitized by a coarse counter, while T1 and T2, each shorter than one clock period TCLK, are processed by fine TDCs or interpolators with resolutions much smaller than TCLK. Tin can be measured as

Tin = T12 + T1 − T2.    (1)
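
As an illustration of (1), the following minimal Python sketch combines a coarse counter value with two fine interpolator readings. It assumes that T12 equals an integer number of clock periods; the function name and the numeric values are hypothetical and are not taken from the paper.

```python
def nutt_interval(coarse_count, t_clk_ps, t1_ps, t2_ps):
    """Return Tin = T12 + T1 - T2 in picoseconds.

    Assumption: T12 = coarse_count * TCLK, i.e. the coarse counter
    counts whole reference clock periods.
    """
    # Both fine intervals must be shorter than one clock period.
    if not (0 <= t1_ps < t_clk_ps and 0 <= t2_ps < t_clk_ps):
        raise ValueError("T1 and T2 must lie in [0, TCLK)")
    t12_ps = coarse_count * t_clk_ps
    return t12_ps + t1_ps - t2_ps


# Hypothetical readings: 7 coarse counts of a 5000-ps clock,
# T1 = 1230 ps and T2 = 410 ps from the fine interpolators.
t_in = nutt_interval(coarse_count=7, t_clk_ps=5000.0, t1_ps=1230.0, t2_ps=410.0)
print(t_in)  # 35820.0
```

The coarse counter sets the measurement range, while the accuracy of T1 and T2 sets the effective resolution.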

Since the interpolator dominates the effective resolution of a TDC, many structures have been created to enhance its accuracy. The most commonly used are the tapped delay line, pulse stretcher (dual-slope conversion), pulse shrinking, and Vernier delay line (differential delay line), which achieve sub-gate-delay resolution. After decades of evolution, it is still a challenge even for experienced designers to accomplish an effective resolution better than 10 ps. More subtle techniques are required. Time amplification has been adopted to implement a 9-b TDC with a 1.25-ps bin size and an output standard deviation of <1 LSB. The measured differential nonlinearity (DNL) and integral nonlinearity (INL) are 0.8 LSB and 3 LSB, respectively, with a limited dynamic range. Cyclic time-domain successive approximation has been used to get a 1.2-ps resolution and a 327-μs dynamic range. An rms single-shot precision of 3.2 ps is achieved using an external INL-LUT for the interpolators. A Vernier ring has been used to generate an 8-ps LSB width, also with an output standard deviation of <1 LSB. The performance is further improved by a gated Vernier ring structure that realizes an equivalent resolution of 3.2 ps with an oversampling ratio of 16. An 8-b cyclic TDC has been proposed to achieve a 1.25-ps LSB width, a ±0.7 LSB DNL, and a −3 to +1 LSB INL. To enhance dynamic accuracy for applications with periodic TDC input, time-domain delta-sigma modulation for noise shaping has been adopted to get an effective resolution of around 6 ps.


  • worst performance


Assuming that n wrapped phases are uniformly distributed in one reference clock period, the bin size of the TDC can be calculated as

LSB = TCLK/n = 1/(n × f)    (2)

where f = 1/TCLK is the reference clock frequency.
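
To make (2) and the wrapping idea concrete, the sketch below folds the phases of n delayed clocks into a single period and computes the resulting bin size. The delay cells are assumed to be ideal and identical; the clock period, cell delay, and function names are hypothetical.

```python
def bin_size_ps(t_clk_ps, n):
    """Bin size from (2): LSB = TCLK / n."""
    return t_clk_ps / n


def wrapped_phases_ps(cell_delay_ps, n, t_clk_ps):
    """Phases of n delayed clocks folded into one clock period.

    Tap k carries a total delay of k * cell_delay_ps; because the
    clock is periodic, only the delay modulo TCLK matters.
    """
    return sorted((k * cell_delay_ps) % t_clk_ps for k in range(n))


# Hypothetical example: a 1000-ps clock and eight taps of 375 ps each.
# The cell delay is much larger than the bin size, yet the wrapped
# phases are spaced 125 ps apart.
print(bin_size_ps(1000, 8))             # 125.0
print(wrapped_phases_ps(375, 8, 1000))  # [0, 125, 250, 375, 500, 625, 750, 875]
```

In this idealized case the bin size is set by n and TCLK rather than by the delay of a single cell.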

During circuit implementation, the pulse-shrinking/stretching mechanism caused by the aspect ratio mismatch among adjacent devices limits the realizable length of the clock delay line [36]. To accomplish picosecond-level resolution, at least hundreds of delay cells are required. After being fed into such a long delay line, the duty cycle of a high-frequency reference clock will be shrunk to 0% or stretched to 100% before the clock reaches the end of the delay line. No delayed clock signal is generated for the remaining delay stages once the duty cycle reaches 0% or 100%, which ruins the TDC accuracy. In theory, the clock frequency can be lowered to get a larger pulse width so that the reference clock can propagate to the end of the delay line. However, the delay line must then be lengthened correspondingly to maintain the same resolution, as revealed by (2). The impact of the pulse-shrinking/stretching mechanism increases proportionally, which spoils the benefit of lowering the clock frequency. Instead, the input signal can be made with a larger pulse width than the reference clock and fed into the delay line to solve the dilemma. The conceptual timing diagram is shown in Fig. 1. Since all the wrapped clocks quantize the same input signal, Tin can be duplicated in theory so that each clock is paired with one specific input signal (e.g., Ci with Tin,i), as depicted in Fig. 1(a). Then, all clocks can be aligned while the input signals are shifted accordingly to keep exactly the same timing relation between each pair of signals Ci and Tin,i, as in Fig. 1(b). Equivalently, Tin is fed into the same delay line, and all delayed input signals are then quantized by the same reference clock to get the same output for the proposed TDC. The cost is a long dead time, since the TDC obtains the final conversion output only after Tin has propagated to the last delay stage.

Figure 1: Timing diagram with (a) delayed clocks and (b) delayed inputs
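
The equivalence between delaying the clocks and delaying the input can be checked numerically under idealized assumptions. The sketch below is not the authors' circuit: it assumes ideal delays whose wrapped values are uniformly spaced over one period, and it compares only the set of edge-to-edge times seen by the samplers. All names and values are hypothetical.

```python
def time_to_next_edge(t_event_ps, clk_phase_ps, t_clk_ps):
    """Time from an event to the next rising edge of a clock whose
    rising edges occur at clk_phase_ps + k * t_clk_ps."""
    return (clk_phase_ps - t_event_ps) % t_clk_ps


def delayed_clock_view(t_edge_ps, phases_ps, t_clk_ps):
    """One input edge measured against many delayed clocks."""
    return [time_to_next_edge(t_edge_ps, p, t_clk_ps) for p in phases_ps]


def delayed_input_view(t_edge_ps, delays_ps, t_clk_ps):
    """Many delayed copies of the input edge measured against one clock."""
    return [time_to_next_edge(t_edge_ps + d, 0, t_clk_ps) for d in delays_ps]


# Hypothetical setup: 1000-ps clock, eight uniform 125-ps steps,
# and an input edge arriving 340 ps after a clock edge.
T_CLK = 1000
steps = [k * 125 for k in range(8)]
a = sorted(delayed_clock_view(340, steps, T_CLK))
b = sorted(delayed_input_view(340, steps, T_CLK))
print(a)       # [35, 160, 285, 410, 535, 660, 785, 910]
print(a == b)  # True
```

Both views yield the same set of residual times, so in this idealized model the quantization result does not depend on which signal travels through the delay line.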

Another problem is raised by the above modification of delaying Tin instead of CLK. For a much finer resolution, the input delay line is expected to be very long, with a significant pulse-shrinking/stretching impact that limits the smallest measurable width of the input signal Tin. Consequently, a large TDC offset can be expected. To reduce the offset and logic utilization, a delay matrix with multiple short delay lines can be used for a single input Tin to generate a sufficient number of delayed signals, as revealed in Fig. 2(a). In theory, different delay cells or strict timing constraints need to be adopted for the vertical and horizontal delay lines to ensure maximum uniformity among the wrapped phases. Since both Tin and CLK can be delayed to generate the required phase shifts, a hybrid delay matrix, the so-called 2-D Vernier, is constructed to substantially reduce the number of delay cells from approximately H×V to H+V, as shown in Fig. 2(b).

Figure 2: (a) Delay matrix. (b) Hybrid delay matrix
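
The saving in delay cells can be illustrated with a small model. In this sketch, Tin passes through an H-tap line and CLK through a V-tap line, and each (i, j) tap pair contributes one relative phase. The delays are assumed ideal, and the tap delays and clock period are hypothetical values chosen so that the wrapped phases come out uniform.

```python
def hybrid_phases(h, v, tau_h_ps, tau_v_ps, t_clk_ps):
    """Relative phase of every (input tap i, clock tap j) pair,
    wrapped into one clock period."""
    return sorted(
        (i * tau_h_ps - j * tau_v_ps) % t_clk_ps
        for i in range(h)
        for j in range(v)
    )


def cell_counts(h, v):
    """Approximate delay-cell cost of a full matrix versus the hybrid one."""
    return {"matrix": h * v, "hybrid": h + v}


# Hypothetical example: 1200-ps clock, 4 input taps of 100 ps,
# 3 clock taps of 400 ps.
phases = hybrid_phases(4, 3, 100, 400, 1200)
print(phases)  # [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
print(cell_counts(4, 3))  # {'matrix': 12, 'hybrid': 7}
```

Twelve uniformly spaced phases are obtained from roughly seven cells instead of twelve, and the gap widens as H and V grow.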

One feasible way to evenly distribute the phases among reference clocks is to use the FPGA's embedded multi-output phase-locked loop (PLL) for phase division, as depicted in Fig. 3. Only one H-stage delay line is used.

Figure 3: Hybrid delay matrix with PLL for clock phase division.


  • Better performance


  • Modelsim
  • Xilinx ISE