Low-Power Scan-Based Built-In Self-Test Based on Weighted Pseudorandom Test Pattern Generation and Reseeding


 

ABSTRACT:

A new low-power (LP) scan-based built-in self-test (BIST) technique is proposed based on weighted pseudorandom test pattern generation and reseeding. A new LP scan architecture is proposed, which supports both pseudorandom testing and deterministic BIST. During the pseudorandom testing phase, an LP weighted random test pattern generation scheme is proposed by disabling a part of the scan chains. During the deterministic BIST phase, the design-for-testability architecture is modified slightly while the linear-feedback shift register (LFSR) is kept short. In both cases, only a small number of scan chains are activated in a single cycle. Sufficient experimental results are presented to demonstrate the performance of the proposed LP BIST approach. The logic size, area, and power consumption of the proposed architecture are analyzed using Xilinx 14.2.

 

EXISTING SYSTEM:

Recent methods aim at reducing the switching activity during scan shift cycles; their test generators allow automatic selection of parameters for LP pseudorandom test generation. However, many of the previous LP BIST approaches cause some loss of fault coverage. Therefore, achieving high fault coverage in an LP BIST scheme is also very important. Weighted pseudorandom testing schemes can effectively improve fault coverage; however, these approaches usually result in much higher power consumption due to more frequent transitions at the scan flip-flops. Therefore, we propose an LP scan-based pseudorandom pattern generator (PRPG).

Scan flip-flops, especially the ones close to the scan-in pins, are not observable in most shift cycles. Tsai et al. proposed a novel BIST scheme that inserts multiple capture cycles after the scan shift cycles during a test cycle; thus, the fault coverage of scan-based BIST can be greatly improved. An improved method of the earlier work selects different numbers of capture cycles after the shift cycles. In this paper, a new LP scan-based BIST technique is proposed based on weighted pseudorandom test pattern generation and reseeding. A new LP scan architecture is proposed, which supports both pseudorandom testing and deterministic BIST.

Weighted pseudorandom testing schemes can effectively improve fault coverage. A weighted test-enable signal-based pseudorandom test pattern generation scheme was proposed for scan-based BIST, according to which the number of shift cycles and the number of capture cycles in a single test cycle are not fixed. A reconfigurable scan architecture was used for the deterministic BIST scheme using the weighted test-enable signal-based pseudorandom test generation scheme. Lai et al. proposed a new scan segmentation approach for more effective BIST. LP BIST approaches were proposed early on. Zorian proposed a distributed BIST control scheme in order to simplify the BIST execution of complex ICs; both the average power and the temperature were reduced. Other methods reduced switching activity during scan shifts by adding extra logic. A new random single-input-change test generation scheme generates LP test patterns that provide a high level of defect coverage during LP BIST of digital circuits. An LP BIST scheme was also proposed based on circuit partitioning.

New pseudorandom test generators were proposed to reduce power consumption during testing. A new encoding scheme is proposed, which can be used in conjunction with any LFSR-reseeding scheme to significantly reduce test power and further reduce test data volume. Lai et al. [28] proposed a new LP PRPG for scan-based BIST using a restricted scan-chain reordering method to recover the fault coverage loss.

A low-transition test pattern generator was proposed to reduce the average and peak power of a circuit during test by reducing the transitions among patterns. Transitions are reduced in two dimensions:

1) Between consecutive patterns and

2) Between consecutive bits.

Abu-Issa and Quigley proposed a PRPG to generate test vectors for test-per-scan BISTs in order to reduce the switching activity while shifting test vectors into the scan chain. Furthermore, a novel algorithm for scan-chain ordering has been presented.

 

DISADVANTAGES:

  • On-chip test data storage is high

 

PROPOSED SYSTEM:

New low-power weighted pseudo random pattern test generator:

We propose a new LP scan-based BIST architecture, which supports LP pseudorandom testing, LP deterministic BIST and LP reseeding.

DFT Architecture:

As shown in Fig. 1, the scan-forest architecture is used for pseudorandom testing in the first phase. Each stage of the phase shifter (PS) drives multiple scan chains, where all scan chains in the same scan tree are driven by the same stage of the PS. Unlike the multiple scan-chain architecture used in the previous methods, the scan-forest architecture is adopted to compress test data and reduce the deterministic test data volume. Separate weighted signals e0, e1, …, and en are assigned to the scan chains in the weighted pseudorandom testing phase (phase = 0), as shown in Fig. 1; they are replaced by the regular test-enable signal in the deterministic BIST phase (phase = 1). Each scan-in signal drives multiple scan chains, as shown in Fig. 1, where different scan chains are assigned different weights. This technique can also significantly reduce the size of the PS compared with the multiple scan-chain architecture, where each stage of the PS drives one scan chain. The compactor connected to the combinational part of the circuit reduces the size of the MISR. The shadow register is used for LP deterministic BIST and reseeding.

Figure 1: General DFT architecture for LP scan-based BIST

The size of the LFSR needed for deterministic BIST depends on the maximum number of care bits of all deterministic test vectors for most of the previous deterministic BIST methods. In some cases, the size of the LFSR can be very large because of a few vectors with a large number of care bits even when a well-designed PS is adopted. This may significantly increase the test data volume in order to keep the seeds. This problem can be solved by adding a small number of extra variables to the LFSR or ring generator without keeping a big seed for each vector.
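The reseeding idea can be sketched in a few lines: a short LFSR expands each stored seed into a long pseudorandom pattern, which is why the stored test data only has to cover the care bits of each test cube. The following sketch is purely illustrative; the tap positions and seed are assumed, not taken from the proposed design.

```python
# Illustrative sketch (not the paper's exact PRPG): a Fibonacci LFSR
# expands a short seed into a long pseudorandom bit stream. In LFSR
# reseeding, each deterministic test cube is encoded as a seed whose
# length must cover the cube's care bits.

def lfsr_stream(seed_bits, taps, length):
    """Generate `length` output bits from an LFSR.

    seed_bits : list of 0/1, initial register state (the seed)
    taps      : stage positions XORed to form the feedback bit
    """
    state = list(seed_bits)
    out = []
    for _ in range(length):
        out.append(state[-1])            # output the last stage
        fb = 0
        for t in taps:
            fb ^= state[t]               # XOR feedback network
        state = [fb] + state[:-1]        # shift right, insert feedback
    return out

# A 4-bit maximal-length LFSR (taps assumed for illustration) cycles
# through 2**4 - 1 = 15 distinct nonzero states.
bits = lfsr_stream([1, 0, 0, 1], taps=[3, 2], length=15)
print(bits)
```

A longer LFSR or a few extra injected variables, as described above, lets a few high-care-bit vectors be encoded without enlarging every seed.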

Weighted Pseudorandom Test Pattern Generation:

Our method generates the degraded subcircuits for all subsets of scan chains in the following way. All PPIs related to the disabled scan chains are randomly assigned specified values (1 and 0). Note that all scan flip-flops at the same level of the same scan tree share the same PPI. Any gate is removed if its output is specified. An input can be removed from a NAND, NOR, AND, or OR gate if the input is assigned a non-controlling value and the gate has at least three inputs. For a two-input AND or OR gate, the gate is removed if one of its inputs is assigned a non-controlling value. For a two-input NAND or NOR gate, the gate degrades to an inverter if one of its inputs is assigned a non-controlling value.

For an XOR or NXOR gate with at least three inputs, an input assigned value 0 is simply removed from the circuit; an input assigned value 1 is also removed, but an XOR gate changes to an NXOR gate and an NXOR gate changes to an XOR gate. If one of its inputs is assigned value 0, a two-input XOR gate is deleted from the circuit, while a two-input NXOR gate degrades to an inverter. If one of its inputs is assigned value 1, a two-input XOR gate degrades to an inverter, while a two-input NXOR gate can be removed from the circuit.
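As a compact illustration, the degradation rules above can be written as a case analysis per gate. This sketch is ours, not the paper's implementation; it only classifies what happens to a single gate when one of its inputs is tied to a constant.

```python
# Sketch of the gate degradation rules described above. `kind` is the
# gate type, `n` its fan-in, and `val` the constant assigned to one
# input. The return value names what the gate becomes.

def degrade(kind, n, val):
    controlling = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}
    if kind in controlling:
        if val == controlling[kind]:
            return "constant-output, gate removed"
        # non-controlling value on one input
        if n >= 3:
            return "input removed"
        if kind in ("AND", "OR"):           # two-input AND/OR
            return "gate removed (wire)"
        return "degrades to inverter"       # two-input NAND/NOR
    if kind in ("XOR", "NXOR"):
        if n >= 3:
            if val == 0:
                return "input removed"
            return "input removed, XOR<->NXOR swapped"
        # two-input XOR/NXOR
        if (kind == "XOR") == (val == 0):
            return "gate removed (wire)"
        return "degrades to inverter"
    raise ValueError(kind)

print(degrade("NAND", 2, 1))   # non-controlling 1 on a 2-input NAND
print(degrade("XOR", 2, 1))    # 2-input XOR with one input tied to 1
```

Applying such rules netlist-wide, with constants propagated through removed gates, yields the smaller subcircuit used for weight selection.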

Figure 2: Weighted pseudorandom test generator for scan-tree-based LP BIST

In the scan-based BIST architecture, as shown in Fig. 2, different weights e0, e1, …, and ek are assigned to the test-enable signals of the scan chains SC0, SC1, …, and SCk, respectively, where e0, e1, …, ek ∈ {0.5, 0.625, 0.75, 0.875}. Scan flip-flops in all disabled scan chains are set to constant values. Our method randomly assigns constant values to all scan flip-flops in the disabled scan chains, degrading the circuit into a smaller subcircuit. All weights on the test-enable signals are selected in the degraded subcircuit.

The gating logic is presented in Fig. 1. We do not assign weights less than 0.5 to the test-enable signals, because we do not want to insert more capture cycles than scan shift cycles. We have developed an efficient method to select weights for the test-enable signals of the scan chains.
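The weight set {0.5, 0.625, 0.75, 0.875} can be realized by combining independent equiprobable pseudorandom bits, as the following sketch demonstrates. The exact weighting logic in the proposed design may differ; this only verifies that simple AND/OR networks produce the stated probabilities.

```python
import random

# Sketch: biased test-enable bits built from equiprobable random bits.
# P(a) = 1/2; a|b gives 3/4; a|b|c gives 7/8; a|(b&c) gives 5/8.

def weighted_bit(w, rnd=random.random):
    a, b, c = (rnd() < 0.5 for _ in range(3))
    if w == 0.5:
        return a
    if w == 0.625:
        return a or (b and c)   # 1/2 + 1/2 * 1/4 = 0.625
    if w == 0.75:
        return a or b           # 1 - 1/4 = 0.75
    if w == 0.875:
        return a or b or c      # 1 - 1/8 = 0.875
    raise ValueError(w)

random.seed(1)
for w in (0.5, 0.625, 0.75, 0.875):
    est = sum(weighted_bit(w) for _ in range(100000)) / 100000
    print(f"target {w}: measured {est:.3f}")
```

In hardware these combinations cost only a couple of gates per test-enable signal, which is why weights below 0.5 (requiring the complementary structures) are simply not used.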

ADVANTAGES:

  • Reduces on-chip test data storage

 

SOFTWARE IMPLEMENTATION:

  • Modelsim
  • Xilinx ISE

 

Coordinate Rotation-Based Low Complexity K-Means Clustering Architecture


 

ABSTRACT:

In this brief, we propose a low-complexity architectural implementation of the K-means-based clustering algorithm used widely in mobile health monitoring applications for unsupervised and supervised learning. The iterative nature of the algorithm, which computes the distance of each data point from a respective centroid until convergence for successful cluster formation, presents a significant challenge to mapping it onto a low-power architecture. This has been addressed by the use of a 2-D Coordinate Rotation Digital Computer-based low-complexity engine for computing the n-dimensional Euclidean distance involved during clustering. The proposed clustering engine was synthesized using the TSMC 130-nm technology library, and a place and route was performed, following which the core area and power were estimated as 0.36 mm2 and 9.21 mW at 100 MHz, respectively, making the design applicable for low-power real-time operations within a sensor node. The logic size, area, and power consumption of the proposed architecture are analyzed using Xilinx 14.2.

 

 

 

EXISTING SYSTEM:

The fundamental concept of cluster analysis is to form groups of similar objects as a means of distinguishing them from each other. Clustering techniques have been successfully used in diverse fields, such as medicine (EEG and activity recognition) and marketing, involving multivariate data, and can be conveniently deployed with limited resources (memory and CPU). The K-means clustering algorithm, owing to its computational simplicity and efficiency, has been an attractive choice for a wide variety of signal processing applications. It is a well-perceived fact in the research community that cluster analysis is primarily used for unsupervised learning, where the class labels for the training data are not available. However, the K-means algorithm can also be used for supervised learning, where the class labels of the training data are known a priori. Apart from using it as a learning algorithm, K-means has also been utilized for signal preprocessing, feature reduction, and time-domain signal analysis. Hence, using K-means for real-time cluster analysis requiring computation in resource-constrained sensor nodes for remote health care monitoring systems, where online multimodal data acquisition and analysis is the key (e.g., cardiovascular disease prognosis), requires an effective implementation strategy. The fundamental requirement for such applications is to cut down on continuous transmission and to maintain low-power operation to prolong battery life. Hence, an effective algorithm-to-architecture holistic mapping is required to fulfill the notion of low-power operation aimed for long durations.

The K-means algorithm exhibits an iterative nature, where it computes the distance of each data sample from the centroids until convergence. This is generally achieved by the use of power-hungry multipliers, square rooters (for Euclidean distance computation), and multiplexers, thereby rendering direct mapping of this algorithm to architecture infeasible for implementation on resource-constrained platforms. An attempt was made to replace the Euclidean distance by a combination of Manhattan and Max distances, but at the cost of trading accuracy for power consumption. Therefore, an optimization is necessary to achieve a tradeoff between algorithm efficiency and architectural complexity. Coordinate Rotation Digital Computer (CORDIC)-based architectures, exploiting its different transcendental functions to compute complex arithmetic operations, have been widely used for computationally intensive signal processing algorithms, including those that apply the K-means clustering algorithm. Hence, in this brief, we investigate the use of a CORDIC-based low-complexity engine to implement the K-means clustering algorithm.

DISADVANTAGES:

  • Complex design

 

 

PROPOSED SYSTEM:

The K-means algorithm iterates to minimize the squared error between the empirical mean of a cluster and the individual data points, defined as the cost function. Initially, k centroids are defined and data vectors are assigned a cluster label depending on how close they are to each centroid. The k centroids are recalculated from the newly defined clusters, and the process of reassignment of each data vector to each new centroid is repeated. The algorithm iterates over this loop until the data vectors form clusters and the cost function is minimized. CORDIC is an efficient implementation technique for vector rotation and arctangent computation using shift-add operations. In this brief, we use CORDIC in vectoring mode for our implementation. The architecture of the proposed CORDIC-based K-means clustering engine is shown in Fig. 1. The input data are stored in memory and transmitted to different blocks via the control unit (CU). The Euclidean distance is calculated in the distance unit using a low-complexity CORDIC vectoring module. The distances from each point to each of the centroids are sent to a comparator block to identify the cluster to which it belongs. Once the clustering is done, the centroid calculation block is activated to compute the new centroids. If these new centroids differ significantly from the previous iteration's values, clustering is repeated; otherwise, the clustered data are sent to the output. The CU governs the data flow among all the modules. The proposed engine utilizes CORDIC to compute the Euclidean distance between two points, which is the metric used to form the clusters, and has been explained through an illustrative example.

Figure 1: Complete architecture of the proposed system
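The clustering loop described above can be sketched as follows. This is a plain-Python illustration with ordinary Euclidean distance; the proposed engine replaces the distance computation with a CORDIC vectoring module and maps the steps onto hardware blocks (CU, distance unit, comparator, centroid calculation).

```python
# Minimal K-means sketch: assign each point to its nearest centroid,
# recompute centroids as cluster means, repeat until they stop moving.

def kmeans(points, centroids, iters=100, tol=1e-9):
    labels = []
    for _ in range(iters):
        # assignment step: label each point with its nearest centroid
        labels = [min(range(len(centroids)),
                      key=lambda j: sum((p - c) ** 2
                                        for p, c in zip(pt, centroids[j])))
                  for pt in points]
        # update step: each centroid becomes the mean of its members
        new = []
        for j in range(len(centroids)):
            members = [pt for pt, l in zip(points, labels) if l == j]
            if members:
                new.append(tuple(sum(x) / len(members)
                                 for x in zip(*members)))
            else:
                new.append(centroids[j])    # keep an empty cluster's centroid
        if all(max(abs(a - b) for a, b in zip(c, n)) < tol
               for c, n in zip(centroids, new)):
            break                           # converged
        centroids = new
    return centroids, labels

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
cents, labs = kmeans(pts, [(0, 0), (10, 10)])
print(cents)   # two centroids near (0, 0.5) and (10, 10.5)
```

The convergence test on centroid movement corresponds to the engine's check of whether the new centroids "differ significantly from the previous values of the iteration."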

In this brief, our focus is to propose a methodology for utilizing CORDIC to compute the Euclidean distance between two points, which is the metric used to form the clusters. In 2-D signal space, if (x1, x2) and (y1, y2) are two points, the Euclidean distance between these two points will be

d = ((x1 − y1)2 + (x2 − y2)2)1/2

One square-root, two squaring, one addition, and two subtraction operations are involved in this computation. If we give a and b as the x- and y-inputs to the vectoring-mode CORDIC, the x output will be the magnitude of the vector (a, b), which is (a2 + b2)1/2. So, with (x1 − y1) and (x2 − y2) as the x- and y-inputs, respectively, the vectoring mode generates the distance between the two points. The architecture of the 2-D distance measurement unit using vectoring-mode CORDIC is shown in Fig. 2(a). We can extend this methodology to n-dimensional (nD) signal space to formulate the distance between two nD vectors.

Figure 2: CORDIC-based distance measurement. (a) 2-D vectoring. (b) 3-D vectoring. (c) 3-D multiplexed architecture. (d) nD multiplexed architecture
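A behavioral model of vectoring-mode CORDIC shows how the magnitude, and hence the Euclidean distance, emerges from shift-add iterations. The iteration count and the gain-correction step below are illustrative choices, not the paper's fixed-point design.

```python
import math

# Vectoring-mode CORDIC sketch: rotate (x, y) toward the x-axis with
# shift-add micro-rotations; the final x approaches K * sqrt(x^2 + y^2),
# where K ~= 1.6468 is the accumulated CORDIC gain. Feeding
# (x1 - y1, x2 - y2) therefore yields the 2-D Euclidean distance
# after gain correction.

def cordic_magnitude(x, y, iters=16):
    for i in range(iters):
        d = -1 if y < 0 else 1                  # drive y toward zero
        x, y = x + d * y * 2 ** -i, y - d * x * 2 ** -i
    gain = math.prod(math.sqrt(1 + 2 ** (-2 * i)) for i in range(iters))
    return x / gain                             # remove the accumulated gain

# distance between (x1, x2) = (1, 2) and (y1, y2) = (4, 6):
d = cordic_magnitude(4 - 1, 6 - 2)
print(d)   # close to 5.0
```

In hardware the multiplications by 2**-i are barrel shifts and the gain correction is a single constant multiply (or is folded into later stages), which is what makes the engine low-complexity.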

 

 

ADVANTAGES:

  • Design complexity is less

SOFTWARE IMPLEMENTATION:

  • Modelsim
  • Xilinx ISE

 

10T SRAM Using Half-VDD Precharge and Row-Wise Dynamically Powered Read Port for Low Switching Power and Ultralow RBL Leakage


ABSTRACT:

We present, in this paper, a new 10T static random access memory cell having a single-ended decoupled read-bitline (RBL) with a 4T read port for low-power operation and leakage reduction. The RBL is precharged at half the cell's supply voltage, and is allowed to charge and discharge according to the stored data bit. An inverter, driven by the complementary data node (QB), connects the RBL to the virtual power rails through a transmission gate during the read operation. The RBL rises toward the VDD level for a read-1, and discharges toward the ground level for a read-0. The virtual power rails have the same value as the RBL precharging level during the write and hold modes, and are connected to the true supply levels only during the read operation. Dynamic control of the virtual rails substantially reduces the RBL leakage. The proposed 10T cell in a commercial 65-nm technology is 2.47× the size of 6T with β = 2, provides 2.3× the read static noise margin, and reduces the read power dissipation by 50% compared with that of 6T. The RBL leakage is reduced by more than three orders of magnitude, and the (ION/IOFF) ratio is greatly improved compared with the 6T BL leakage. The overall leakage characteristics of 6T and 10T are similar, and competitive performance is achieved. The logic size, area, and power consumption of the proposed architecture are analyzed using the Tanner tool.

 

EXISTING SYSTEM:

An SRAM cell must operate robustly in hold, read, and write modes. It uses the positive feedback of cross-coupled inverters (INVs) to store a single bit of information in a complementary fashion. Access transistors provide the mechanism for the read and write operations. Before every access, the column BL pair (BL and BLB) is precharged to the supply voltage. For the write operation, one of the precharged BLs is discharged through the write driver.

Figure 1: Conventional 6T SRAM read. (a) Column of M bit-cells during read. (b) Top: hold and read SNM butterfly curve (with worst case noise polarity during hold). Bottom: transient behavior showing read disturbance

Fig. 1(a) shows a single column of M 6T SRAM cells, where one cell is accessed in read mode with data = 0 (Qa = 0), while the other M−1 cells are in the hold mode. Leakage components are labeled; for the worst case leakage, all M−1 cells store data = 1 (Qu = 1). Iread flows from BL to VSS through AL and NL of the accessed cell, and the BL voltage decreases. The unaccessed cells on the BL exhibit BL leakage. IuLeak0 is the main component of BL leakage, while IuLeak1 is negligible, as VDS of AR of the unaccessed cell is large, while VDS of its AL is very small (varying from 0 to VBL). These leakage components reduce the differential BL voltage development. As there are a large number of cells in a single column, the worst case BL leakage can decrease the BLB voltage enough to cause an erroneous read. Thus, Iread must be greater than (M−1) × IuLeak0, where M is the number of cells in a single column.

Figure 2: SRAM read ports (a) 6T. (b) 8T.(c) 9T.(d) 9T.(e) 10T.

In essence, the 6T SRAM has conflicting read and write requirements, and transistor sizing cannot be done independently. Also, 6T has an inherent RSNM problem, as the read current passes through the cell's internal node, and it further degrades with VDD scaling. Moreover, considered as the baseline design, 6T has overall higher power dissipation and higher BL leakage, since low-power techniques employ specific mechanisms to lower the dynamic power dissipation (e.g., charge sharing and hierarchical BLs) and the leakage (by employing virtual rails). The read port of the 6T SRAM cell is shown in Fig. 2(a), which highlights the internal node Q in the read current path. Many alternative bit cells and techniques have been proposed in the literature to improve SRAM cell stability, reduce the leakage currents, and achieve low-power operation compared with the conventional 6T design.

An 8T SRAM cell adds a separate 2T read port, shown in Fig. 2(b), and essentially solves the problem of read stability. The internal nodes are isolated from the read current path, and thus a high RSNM is achieved. Also, sizing of the 8T read port can be done independently without affecting the write operation.

In the 6T SRAM read operation, one of the BLs stays at VDD while the other decreases by the amount ΔVBL. However, in the case of 8T SRAM, there is only one BL (RBL), and it either decreases or stays at the VDD level depending on the bit read. The sensing of the SE BL can be done using different circuits, such as: 1) domino sensing, which requires a full VDD swing on the local BL; 2) pseudo-differential sensing, which requires a reference signal; and 3) ac-coupled sensing, which requires the use of capacitors. Using a reference-based sense amplifier, only a small voltage difference is required.

 

DISADVANTAGES:

  • Power consumption is high

PROPOSED SYSTEM:

We present our half-VDD precharge and charge-cycling technique for low-power read operation. A 4T read port is designed to employ the proposed technique. The read BL (RBL) is charged and discharged through the read port according to the state of the stored bit. The read port is powered by virtual power rails that run horizontally and are shared by the cells of a word. The dynamic control of the read-port power rails reduces the RBL leakage substantially.

Figure 3: Proposed 10T SRAM cell with row-wise read port dynamic power lines

Proposed cell and low power technique:

The proposed 10T SRAM cell with SE RBL is shown in Fig. 3. We have added a 4T read port to the 6T cell to decouple the internal nodes during the read operation. Read port consists of an INV P1-N1 driven by node QB, and a transmission gate (TG) P2-N2. The output (Z) of the INV is connected to RBL during the read operation through TG, which is controlled by (read) control signals. Furthermore, read port is powered by virtual power rails, VVDD and VVSS, which are dynamically controlled. These virtual power rails (control signals) run horizontally, and have the true rail values only during the read operation. For the RBL leakage reduction, both the virtual rails have the same level as the precharge level of RBL.

  • The 10T SRAM cell using an INV and a TG has been proposed earlier. However, our proposed 10T scheme is different from the previous design in the following aspects. The previous INV+TG-based 10T cell was application specific, while our proposed design is generic.
  • We have used the dynamically controlled power rails for the read port.
  • We precharge RBL at VDD/2, while the previous 10T design eliminated the precharge phase, and used INV to fully charge or discharge the RBL.
  • The basic read technique of both the designs is completely different. The main idea of the proposed design is “the charging or the discharging of the read BL from VDD/2 for every read operation.” The previous design either discharges from VDD to VSS, or charges from VSS to VDD.
  • A powerful INV was used previously to produce full VDD swing on the RBL. In the proposed design, RBL is precharged at VDD/2, and only a small voltage difference (comparable with 6T) is produced for every read cycle.
  • In the proposed design, for every read cycle the RBL will exhibit some change (positive or negative) from its precharged value of VDD/2. However, the RBL would not change for consecutive reads of the same bit value; it changes only if consecutive read bits differ.
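The switching pattern in the list above can be modeled in a few lines. The supply and swing values below are assumed for illustration; the sketch only demonstrates that the half-VDD-biased RBL toggles only when consecutive read bits differ, which is where the read-power saving comes from.

```python
# Behavioral sketch (voltages in volts, values assumed): the RBL sits
# near VDD/2 and develops a small swing DV upward for a read-1 and
# downward for a read-0, so a transition occurs only when consecutive
# read bits differ.

VDD = 1.0
DV = 0.125                    # assumed developed swing per read

def rbl_trace(bits):
    levels, transitions = [], 0
    prev = VDD / 2            # initial precharge level
    for b in bits:
        level = VDD / 2 + DV if b else VDD / 2 - DV
        if level != prev:
            transitions += 1  # RBL actually switches
        levels.append(level)
        prev = level
    return levels, transitions

levels, t = rbl_trace([1, 1, 1, 0, 0, 1])
print(levels, t)   # 3 transitions: initial read-1, 1->0, 0->1
```

A conventional full-swing SE RBL would instead charge or discharge across the whole VDD range on many cycles, which is the dissipation the proposed scheme avoids.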

 

ADVANTAGES:

  • Power consumption is low

 

SOFTWARE IMPLEMENTATION:

  • Tanner tool

 

A 2.5-ps Bin Size and 6.7-ps Resolution FPGA Time-to-Digital Converter Based on Delay Wrapping and Averaging


ABSTRACT:

A high-resolution time-to-digital converter (TDC) implemented on a field programmable gate array (FPGA) based on delay wrapping and averaging is presented. The fundamental idea is to pass a single clock through a series of delay elements to generate multiple reference clocks with different phases for input time quantization. Due to periodicity, those phases will be equivalently wrapped within one reference clock period to achieve the required fine resolution. In practice, a hybrid delay matrix is created to significantly reduce the required number of delay cells. Multiple TDC cores are constructed for parallel measurements, and then exquisite routing control and averaging are applied to smooth out the large quantization errors caused by the inhomogeneity of the TDC delay lines, for both linearity and single-shot precision enhancement. To reduce the impact of temperature sensitivity, a cancellation circuit is created to substantially reduce the offset and confine the output difference within 2 LSB for the same input interval over the full operating temperature range of the FPGA. With such a fine resolution of 2.5 ps, the integral nonlinearity is measured to be from merely −2.98 to 3.23 LSB, and the corresponding rms resolution is 4.99–6.72 ps. The proposed TDC is tested to be fully functional over the 0 °C–50 °C ambient temperature range with extremely low resolution variation. Its performance is even superior to many full-custom-designed TDCs. The logic size, area, and power consumption of the proposed architecture are analyzed using Xilinx 14.2.

EXISTING SYSTEM:

Conventionally, a TDC with sub-nanosecond resolution can be realized with emitter-coupled logic (ECL), which is not only power consuming but also area consuming and unsuitable for portable systems or integrated chips. Many different techniques have been developed to achieve a high resolution and a wide measurement range, such as time-to-amplitude conversion, the Vernier principle, time stretching, and time interpolation. In theory, the simplest implementation of a TDC is a high-frequency counter that increments every clock cycle. A 3-ps incremental resolution is achieved with the help of a time-consuming statistical method: the lookup table (LUT). Meanwhile, multistage interpolation can be applied straightforwardly to obtain a wide measurement range while keeping high resolution at the same time. Fig. 1 shows the conceptual timing diagram of the two-stage time interpolation technique based on the classic Nutt method. The input interval Tin is segmented into T12, T1, and T2. T12 is synchronous with the reference clock CLK and can be readily digitized by a coarse counter, while T1 and T2, with durations less than one clock period TCLK, are processed by fine TDCs or interpolators with resolutions much smaller than TCLK. Tin can be measured as

Tin = T12 + T1 − T2.                                                                                                (1)
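A small worked example of (1): the coarse counter contributes whole clock periods, and the two interpolators add and subtract the fractional parts. All numbers below are assumed for illustration.

```python
# Two-stage Nutt method, equation (1): Tin = T12 + T1 - T2.
# The coarse counter measures T12 in whole clock periods, while the
# fine interpolators resolve T1 and T2 below one period.

TCLK = 10.0e-9               # 100 MHz reference clock (assumed)

coarse_counts = 7            # counter value -> T12 = 70 ns
T12 = coarse_counts * TCLK
T1 = 3.2e-9                  # start-side interpolator result (assumed)
T2 = 1.7e-9                  # stop-side interpolator result (assumed)

Tin = T12 + T1 - T2          # equation (1)
print(Tin)                   # 71.5 ns
```

Because T12 is synchronous with CLK, only T1 and T2 need the fine interpolators, which is what makes the wide range and fine resolution compatible.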

Since the interpolator dominates the effective resolution of the TDC, many structures have been created to enhance its accuracy. The most commonly used are the tapped delay line, the pulse stretcher (dual-slope conversion), pulse shrinking, and the Vernier delay line (differential delay line) to achieve sub-gate-delay resolution. After tens of years of evolution, it is still a challenge for experienced designers to accomplish an effective resolution better than 10 ps for TDCs; more subtle techniques are required. Time amplification is adopted to implement a TDC with 9 b, a 1.25-ps bin size, and an output standard deviation of <1 LSB. The measured differential nonlinearity (DNL) and integral nonlinearity (INL) are 0.8 LSB and 3 LSB, respectively, with a limited dynamic range. Cyclic time-domain successive approximation is used to achieve a 1.2-ps resolution and a 327-μs dynamic range. An RMS single-shot precision of 3.2 ps is achieved using an external INL-LUT for the interpolators. The Vernier ring is invented to generate an 8-ps LSB width, also with an output standard deviation of <1 LSB. The performance is further improved by a gated Vernier ring structure to realize an equivalent resolution of 3.2 ps with an oversampling ratio of 16. An 8-b cyclic TDC is proposed to achieve a 1.25-ps LSB width, a ±0.7 LSB DNL, and a −3 to +1 LSB INL. To enhance dynamic accuracy for applications with periodic TDC input, time-domain delta-sigma modulation for noise shaping is adopted to achieve an effective resolution around 6 ps.

DISADVANTAGES:

  • Worse performance

PROPOSED SYSTEM:

Assuming that n wrapped phases are uniformly distributed in one reference clock period, the bin size of the TDC can be calculated as

LSB = TCLK/n = 1/(n × f)                                                                                            (2)

During circuit implementation, the pulse-shrinking/stretching mechanism caused by the aspect-ratio mismatch among adjacent devices will limit the realizable length of the clock delay line [36]. To accomplish picosecond-level resolution, at least hundreds of delay cells are required. After being fed into such a long delay line, the duty cycle of the high-frequency reference clock will be either shrunk or stretched to 0% or 100% before reaching the end of the delay line. No delayed clock signal will be generated for the remaining delay stages after the duty cycle reaches 0% or 100%, ruining the TDC accuracy. In theory, the clock frequency can be lowered to obtain a larger pulse width and ensure that the reference clock propagates to the end of the delay line. However, the delay line must be lengthened correspondingly to maintain the same resolution, as revealed by (2), so the impact of the pulse-shrinking/stretching mechanism is proportionally increased, spoiling the effectiveness of clock frequency lowering. Instead, the input signal can be made with a larger pulse width than the reference clock and fed into the delay line to solve the dilemma. The conceptual timing diagram is shown in Fig. 1. Since all the wrapped clocks quantize the same input signal, Tin can be duplicated in theory so that each clock can be paired up with one specific input signal (e.g., Ci with Tin,i) as depicted in Fig. 1(a). Then, we can align all clocks while shifting the input signals accordingly to keep exactly the same timing relation between each pair of signals Ci and Tin,i in Fig. 1(b). Equivalently, Tin is fed into the same delay line and then all delayed input signals are quantized by the same reference clock to obtain the same output for the proposed TDC. The expense is a long dead time, since only when Tin propagates to the last delay stage can the TDC produce the final conversion output.

Figure 1: Timing diagram with (a) delayed clocks and (b) delayed inputs
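The wrapping in (2) can be illustrated numerically: with each delay cell slightly longer than one clock period, the accumulated delays fold back modulo TCLK into evenly spaced fine phases. The numbers below (TCLK, n, and the cell delay) are assumed for illustration only.

```python
# Delay-wrapping sketch: cumulative delays larger than one clock
# period wrap modulo TCLK, and the wrapped phases partition the
# period into fine bins of width TCLK / n (values assumed).

TCLK = 10.0               # ns, reference clock period
n = 8                     # number of delayed copies
delay = TCLK + TCLK / n   # each cell delays a bit more than one period

# wrapped phase of the i-th delayed clock within one period
phases = sorted((i * delay) % TCLK for i in range(n))
print(phases)             # 0.0, 1.25, 2.5, ... : uniform 1.25-ns bins

def quantize(t):
    """Count how many wrapped phase edges the input instant has passed."""
    return sum(p <= (t % TCLK) for p in phases)

print(quantize(3.1))      # 3.1 ns has passed 3 phase edges -> code 3
```

In the real design the delays are neither ideal nor uniform, which is exactly why multiple TDC cores, routing control, and averaging are applied to smooth out the resulting quantization errors.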

Another problem is raised by the above modification to delay Tin instead of CLK. For very fine resolution, the input delay line is expected to be very long, with a significant pulse-shrinking/stretching impact that limits the smallest measurable width of the input signal Tin. Consequently, a large TDC offset can be expected. To reduce the offset and logic utilization, a delay matrix with multiple short delay lines can be used for a single input Tin to generate a sufficient number of delayed signals, as revealed in Fig. 2(a). In theory, different delay cells or strict timing constraints need to be adopted for the vertical and horizontal delay lines to ensure maximum uniformity among the wrapped phases. Since both Tin and CLK can be delayed to generate the required phase shifts, a hybrid delay matrix, or the so-called 2-D Vernier, is thus constructed to substantially reduce the number of delay cells from approximately H × V to H + V, as shown in Fig. 2(b).

Figure 2: (a) Delay matrix. (b) Hybrid delay matrix

One feasible way to evenly distribute the phases among the reference clocks is to use the FPGA-embedded multi-output phase-locked loop (PLL) for phase division, as depicted in Fig. 3. Only one H-stage delay line is used.

Figure 3: Hybrid delay matrix with PLL for clock phase division

ADVANTAGES:

  • Better performance

SOFTWARE IMPLEMENTATION:

  • Modelsim
  • Xilinx ISE