A 92-dB DR, 24.3-mW, 1.25-MHz BW Sigma–Delta Modulator Using Dynamically Biased Op Amp Sharing


 

ABSTRACT:

A 2–2 cascaded switched-capacitor sigma–delta modulator is presented for low-voltage, low-power, broadband analog-to-digital conversion. To reduce power dissipation in both the analog and digital circuits and to ensure low-voltage operation, a half-sample delayed-input feedforward architecture is employed in combination with 4-bit quantization, which reduces the integrator output swings and relaxes the timing constraint in the feedback path. The integrator power is further reduced by sharing an op amp between the two integrators in each stage and periodically switching the op amp bias between a high-current and a low-current mode using a fast, low-power, high-precision charge pump circuit. Implemented in a 0.18-μm CMOS technology, the experimental prototype achieves a 92-dB dynamic range, a 91-dB peak signal-to-noise ratio, and an 84-dB peak signal-to-noise-plus-distortion ratio for a signal bandwidth of 1.25 MHz. Operated at a 40-MHz sampling rate, the modulator dissipates 24.3 mW from a 1-V supply. The proposed architecture is analyzed for logic size, area, and power consumption using the Tanner tool.

EXISTING SYSTEM:

Oversampling ADCs based on sigma–delta modulation offer a means of overcoming the constraints that technology scaling imposes on analog circuit performance, by exchanging resolution in time for resolution in amplitude. As a result, a high-precision ADC output can be generated using back-end digital decimation filtering. However, the use of oversampling sigma–delta ADCs is typically limited to low-bandwidth applications, such as digital audio and sensor systems, because of the high oversampling ratio required to achieve good conversion accuracy. Meanwhile, sigma–delta modulators operating at relatively low oversampling ratios have been shown to be an effective means of implementing high-performance ADCs for signal bandwidths of several megahertz. Design techniques for low-voltage, low-power operation have also been introduced from an architectural viewpoint, such as the input feedforward and one-sample delayed-input feedforward modulator architectures, and from a circuit design viewpoint, such as double sampling, inverter- or zero-crossing-based stage implementation, op amp switching, and op amp sharing.

A megahertz-bandwidth 2–2 cascaded sigma–delta modulator is presented, which employs fast dynamic biasing to reduce power in an integrator op amp that is shared by two integrators. A half-sample delayed-input feedforward architecture is used for the implementation of each stage in the 2–2 cascaded modulator. This approach, in combination with 4-bit quantization, reduces the integrator output swing and relaxes the timing constraints imposed on the feedback path without requiring any additional circuitry to implement the added delay. As a result, low-voltage, low-power operation is enabled for the integrators, the quantizer, and the feedback digital-to-analog converter (DAC). To further reduce the integrator power, the two integrators in each stage share a single op amp, and the bias condition of the shared op amp is periodically changed between a high-current mode and a low-current mode to exploit op amp power scaling in discrete-time (DT) integrators. The dynamic biasing of the op amp is performed by a fast, low-power charge pump circuit, which precisely changes the op amp bias during the non-overlapping periods of the two-phase clocks.

DISADVANTAGES:

  • Area coverage is high
  • Power consumption is high

PROPOSED SYSTEM:

Stage Architecture:

Three different modulator architectures, the distributed feedback architecture, the input feedforward architecture, and the half-sample delayed-input feedforward architecture, are compared in this section based on the magnitude of the first integrator output swing. This basis is chosen because the integrator output swing greatly affects the integrator power, and the first integrator is one of the most power-hungry blocks in a sigma-delta modulator. To simplify the comparison and ensure modulator stability, only second-order modulators are considered.

Figure 1: Second-order modulator with (a) distributed feedback architecture, (b) input feed forward architecture, and (c) half-sample delayed-input feed forward architecture

In the distributed feedback architecture shown in Fig. 1(a), the first integrator processes the modulator input along with the quantization error. This, in turn, results in a very large integrator output swing as the input amplitude approaches full scale. In contrast, the integrators in the input feedforward architecture in Fig. 1(b) process only the quantization error, and thus the integrator output swings are significantly reduced if multi-bit quantization is employed. In the half-sample delayed-input feedforward architecture in Fig. 1(c), the signal components at the first and second integrator outputs are not completely removed. However, since they are attenuated by a second-order and a first-order difference at the first and second integrator outputs, respectively, the integrator output swings are very small regardless of the modulator input, provided the oversampling ratio is not too low.

Behavioral simulations were performed on the magnitude of the first integrator output with respect to the modulator input power. In these simulations, the quantization resolution was 4 bits, and the input frequency was at the edge of the signal bandwidth for an oversampling ratio of 16. The band-edge input frequency is chosen to reflect the worst-case scenario in Fig. 1(c), where the signal transfer function (STF) has a high-pass characteristic. All integrators and quantizers were assumed to be ideal. As shown in Fig. 2, the first integrator output swing in Fig. 1(a) grows proportionally as the input power increases, whereas those in Fig. 1(b) and (c) remain nearly constant and small. This reduced integrator output swing results in less nonlinear distortion in the integrator, and thus relaxes the dc gain and the speed required for the integrator op amp.
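The comparison above can be reproduced with a short behavioral model. The sketch below is a simplified illustration, not the authors' simulation code: it models an ideal second-order distributed-feedback (Boser-Wooley style) modulator and an ideal second-order input-feedforward modulator with a 4-bit quantizer, and reports the peak first-integrator output. The unity loop coefficients, the ±1 full scale, and the 0.6-amplitude band-edge tone are illustrative assumptions.

```python
import math

def quantize_4bit(y):
    """Ideal 4-bit quantizer: 16 uniform levels spanning [-1, +1]."""
    step = 2.0 / 15
    code = max(0, min(15, round((y + 1.0) / step)))
    return code * step - 1.0

def max_swing_feedback(u):
    """Second-order distributed-feedback modulator; returns the largest
    first-integrator output magnitude (delaying integrators assumed)."""
    x1 = x2 = 0.0
    peak = 0.0
    for un in u:
        v = quantize_4bit(x2)                 # quantizer sees integrator 2
        x1, x2 = x1 + (un - v), x2 + (x1 - 2.0 * v)
        peak = max(peak, abs(x1))
    return peak

def max_swing_feedforward(u):
    """Second-order input-feedforward modulator (y = u + 2*x1 + x2 gives
    the same (1 - z^-1)^2 noise shaping); the integrators process only
    the quantization error, so x1 stays small."""
    x1 = x2 = 0.0
    peak = 0.0
    for un in u:
        v = quantize_4bit(un + 2.0 * x1 + x2) # input fed forward to quantizer
        x1, x2 = x1 + (un - v), x2 + x1
        peak = max(peak, abs(x1))
    return peak

# Band-edge tone for an oversampling ratio of 16 (fs/32), amplitude 0.6
u = [0.6 * math.sin(2 * math.pi * n / 32) for n in range(4096)]
print(max_swing_feedback(u), max_swing_feedforward(u))
```

With these assumptions, the feedback modulator's first integrator swings beyond the signal amplitude, while the feedforward one stays within a couple of quantizer steps, mirroring Fig. 2.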

Op Amp Sharing:

Power and area of DT, multistage, op amp-based analog circuits can be reduced by means of op amp sharing. In a pipelined ADC, each stage op amp is reset during one of the two clock phases and transfers charge during the other clock phase. Therefore, an op amp can be shared by either adjacent stages or distant stages, with charge transfer occurring at the opposite clock phase.

In contrast, the op amps in a DT switched-capacitor sigma-delta modulator are always in a closed-loop configuration formed through integration capacitors. Typically, an integrator op amp transfers and integrates charge during one clock phase and drives the sampling capacitor in the following stage during the other clock phase. However, if a sigma-delta modulator is to be implemented using half-sample delayed integrators, as shown in Fig. 1(c), the integrator op amp integrates charge and drives the subsequent circuitry at the same time. Then, the op amp can be turned OFF in its sampling phase, or it can be shared by two integrators by alternately disconnecting integration capacitors from the op amp.

Figure 2: Second-order sigma-delta modulator with a shared op amp with two input pairs

To avoid the drawbacks of conventional sharing, an op amp with two distinct input pairs is used for the integrator op amps in this paper. A single-ended version of the second-order stage implementation is shown in Fig. 2, where 1-bit quantization is assumed for simplicity. The two op amp input pairs are alternately turned ON and OFF, and the charges on the integration capacitors CI1 and CI2 are preserved by correspondingly turning OFF the series-connected switches SW1 and SW2 when an integrator is in its sampling phase. While the input signal is being sampled onto CS1 when ɸ1 and ɸ1d are high, the sampled charge on CS2 is integrated onto CI2. When ɸ2 and ɸ2d are high, the sampled charge on CS1 is integrated onto CI1, and the integrated signal on CI1 is sampled onto CS2. The added switches SW1 and SW2 should be sized large enough not to slow the settling of the integrator.
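The ping-pong schedule can be illustrated with a small behavioral model. The sketch below is an abstraction of Fig. 2, not the circuit: capacitor charges are reduced to real numbers, capacitor ratios are assumed to be unity, and one pass through the loop body corresponds to one full clock cycle. The preserved values of ci1/ci2 while "disconnected" model the opened series switches SW1/SW2.

```python
def shared_opamp_second_order(u):
    """One op amp ping-ponged between two half-delay integrators.
    ci1/ci2 are the integration-capacitor charges; cs1/cs2 the sampling
    capacitors. Each iteration = one clock cycle (phases 1 then 2)."""
    ci1 = ci2 = cs1 = cs2 = 0.0
    out = []
    for un in u:
        # phase 1: input sampled onto CS1; the op amp serves integrator 2,
        # integrating the charge previously sampled onto CS2 (CI1 is held)
        cs1 = un
        ci2 += cs2
        # phase 2: the op amp serves integrator 1: it integrates the CS1
        # charge and simultaneously drives CS2 with the new output (CI2 held)
        ci1 += cs1
        cs2 = ci1
        out.append(ci2)
    return out

def ideal_cascade(u):
    """Reference: two ideal cascaded accumulators with the same delays."""
    s1 = s2 = 0.0
    out = []
    for un in u:
        out.append(s2)
        s1 += un
        s2 += s1
    return out
```

Running both on the same input shows the shared-op-amp schedule is functionally identical to two dedicated integrators, which is the point of the sharing scheme.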

Dynamic Biasing:

Although op amp sharing reduces the integrator power and area, the efficiency of the power saving is not that high by itself. This is because, in a second-order sigma-delta modulator, for example, nonidealities introduced in the second integrator are greatly attenuated by the noise shaping provided by the loop filter in the feedback path. Thus, smaller capacitors and less op amp power are needed in the second integrator compared with the first integrator. Consequently, power dissipation in op amp-shared integrators can be further reduced by dynamically changing the op amp bias condition between a high-current mode for the first integrator and a low-current mode for the second integrator, as shown in Fig. 3. The key to this dynamic biasing is that the bias current change must be fast enough to be carried out during the non-overlapping clock periods, so that it does not eat into the op amp settling time.
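The power argument can be made concrete with back-of-the-envelope arithmetic. The sketch below uses purely illustrative current values (not figures from the paper) to compare two always-on dedicated op amps against a single shared op amp whose bias alternates between the two modes with a 50% duty.

```python
def average_bias_current(i_high, i_low, duty_high=0.5):
    """Time-averaged supply current of a dynamically biased op amp."""
    return duty_high * i_high + (1.0 - duty_high) * i_low

# Illustrative numbers only: suppose the first integrator needs 1.0 mA
# and the noise-shaped second integrator only 0.25 mA.
i_high, i_low = 1.0e-3, 0.25e-3
static = i_high + i_low                        # two dedicated, always-on op amps
shared = average_bias_current(i_high, i_low)   # one op amp, ping-ponged bias
saving = 1.0 - shared / static                 # 50% for a 50/50 duty split
```

With equal time in each mode, the shared, dynamically biased op amp draws half the total current of the two dedicated op amps, whatever the mode currents are; the catch, as the text notes, is that the bias transition must fit inside the non-overlap interval.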

Figure 3: Dynamic biasing of an op amp.

Fig. 4 shows conventional dynamic biasing schemes. The output current of the bias circuit, IOUT, is changed by a current mirror [Fig. 4(a)] or a current-steering quad [Fig. 4(b)]. IOUT is then amplified by a factor k to establish the op amp bias current IOP. Despite their relatively simple structure, both circuits raise a primary concern regarding the capability of fast op amp bias change. In the case of Fig. 4(a), it may take significant time to charge the gate of M3 when the switch control signal SW goes high, resulting in a slow change in IOUT. Although this can be resolved by using a high-speed current-steering quad as shown in Fig. 4(b), there still exists a charging and discharging node that can cause a substantial delay in the change of the op amp bias current. If the ratio of IOP to IOUT, here denoted as k, is large, IOP may change very slowly due to the large capacitive load at node X. In contrast, if k is small, the power of the bias circuit itself will be comparable with the op amp power. This is particularly important in high-speed sampled-data systems, where the constituent op amps dissipate fairly high power and their devices are typically large.

Figure 4: Conventional dynamic biasing using (a) switched current mirror and (b) current steering circuits.

ADVANTAGES:

  • Area coverage is low
  • Power consumption is low

SOFTWARE IMPLEMENTATION:

  • Tanner tool

 

Energy-Efficient TCAM Search Engine Design Using Priority-Decision in Memory Technology


ABSTRACT:

Ternary content-addressable memory (TCAM)-based search engines generally need a priority encoder (PE) to select the highest-priority match entry, resolving the multiple-match problem caused by the don't care (X) feature of TCAM. In contemporary network security, TCAM-based search engines are widely used in regular expression matching across multiple packets to protect against attacks, such as viruses and spam. However, the use of PEs results in increased energy consumption for pattern updates and search operations. Instead of using PEs to determine the match, our solution is a three-phase search operation that utilizes the length information of the matched patterns to decide the longest pattern match. This paper proposes a promising memory technology called priority-decision in memory (PDM), which eliminates the need for PEs and removes restrictions on ordering, meaning that patterns can be stored in an arbitrary order without sorting by length. Moreover, we present a sequential input-state (SIS) scheme to disable the mass of redundant search operations in state segments, on the basis of an analysis of the distribution of hex signatures in a virus database. Experimental results demonstrate that the PDM-based technology can improve the update energy consumption of nonvolatile TCAM (nvTCAM) search engines by 36%–67%, because most of the update energy in these search engines is spent on reordering entries. By adopting the SIS-based method to avoid unnecessary search operations in the TCAM array, the search energy is reduced by around 64% relative to nvTCAM search engines. The proposed architecture is analyzed for logic size, area, and power consumption using the Tanner tool.

EXISTING SYSTEM:

Regular expression matching algorithms have been implemented in ternary content-addressable memory (TCAM)-based search engines for exploiting their parallel comparison and wildcard search abilities to achieve high speeds. These TCAM-based search engines can be used to monitor patterns spread across multiple packets in a flow, because they run a unique state machine instance for each flow. To check the packet head (IP address/port number/protocol) and payload, the TCAM-based search engine is implemented in firewalls, as shown in Fig. 1. When a new connection is established, it is scanned by the search engine to confirm that it is secure. Then, these packets are forwarded to the host. According to this flow, attacks, such as spam, spyware, worms, and viruses, can be immediately detected for network security because of the parallel and don’t care (X) search abilities of TCAM. Moreover, TCAM-based search designs can be used not only in network security of firewalls but also in broader applications, such as wireless sensor networks, biometrics, face recognition, and vehicle license plate recognition, as shown in Fig. 2.

Figure 1: Firewall router architecture

Unfortunately, TCAM-based search engines need to maintain lists sorted by pattern length so that multiple matches can be resolved by the priority encoder (PE). This results in slow updates and increased energy consumption in both update and search operations. On the other hand, high energy consumption in the TCAM array is the most critical challenge for TCAM designers, because all entries of the TCAM are compared in parallel, which causes a large amount of power dissipation in match-line (ML) switching. Therefore, the ordering restrictions on TCAM arrays and the high energy consumption of search operations are major issues in TCAM-based search engines. Being composed of low-density memory cells with high leakage power is yet another disadvantage of SRAM-based TCAM devices.

Figure 2: TCAM search engine in applications

DISADVANTAGES:

  • Power consumption is high

PROPOSED SYSTEM:

We propose an energy-efficient TCAM search engine employing the priority-decision in memory (PDM) technology for search operations without using PEs. A key challenge of this design lies in obtaining the longest pattern length in order to determine the longest-pattern match entry when entries are stored in an arbitrary order.

TCAM Search Engine for Regular Expression Matching:

Fig. 3 shows the details of the TCAM-based search engine architecture for implementing the state machine, which consists of a TCAM array, a PE, and an SRAM array. Each TCAM entry is conceptually partitioned into two fields to represent a pattern that consists of the current state and an input character (hex format). The corresponding data (next state) are stored in the SRAM array at an address computed from the TCAM and PE output. The PE function is composed of a multiple match resolver and match address encoder. It selects the highest priority match entry and encodes this match location into binary format, which is used to retrieve the corresponding data in the SRAM array.

Figure 3: Traditional TCAM-based search engines for regular expression matching

In this architecture, each signature has several transitions in the state machine, which represent many patterns in TCAM entries. The current state register is initialized to state 0, and the search data from the input buffer are stored in the input register. If there is an entry that matches the state and input character in the TCAM array, the PE outputs the index of the matching entry to fetch the next-state information from the SRAM array. The don't care (X) feature of TCAM can efficiently reduce the number of data entries, as shown in Fig. 3. On the other hand, multiple entries can be matched simultaneously because of don't care (X) bits, which always match regardless of the search key. Therefore, the PE outputs the index of the first matched (highest-priority) entry. This process is repeated until no pattern remains to be monitored in the packets. If there is no matched entry in the TCAM, the current state register is reset to the initial state and the input pointer is advanced to the next input.
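The search flow above can be sketched functionally. The model below is a behavioral abstraction, not the paper's implementation: a hypothetical 16-bit key packs an 8-bit state and an 8-bit input character, each entry carries a per-bit mask whose set bits are don't care (X) positions, and first-match-wins models the PE's multiple-match resolution.

```python
def pe_match(entries, key):
    """Return the index of the highest-priority (lowest-index) matching
    TCAM entry, or None. Each entry is (value, mask); mask bits set to 1
    are don't care (X) positions."""
    for i, (value, mask) in enumerate(entries):
        if (value ^ key) & ~mask == 0:   # mismatches only count on cared bits
            return i
    return None

def step(entries, next_states, state, char):
    """One state-machine transition: search on (state, char); on a miss,
    fall back to the initial state 0, as described in the text."""
    idx = pe_match(entries, (state << 8) | char)
    return next_states[idx] if idx is not None else 0
```

For example, with an exact entry for (state 1, 'A') ahead of a wildcard entry for (state 1, any character), a search on (1, 'A') returns the exact entry's index; the wildcard entry also matches, but the PE picks the lower index.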

Conventional Asymmetric TCAM:

A typical AS-TCAM cell consists of three major components, as shown in Fig. 4. The first component is an 8T XOR-type cell used to compare the stored data with the search data. The second component stores a mask bit indicating whether the status of the TCAM cell is don't care (X). The third component is the logic that determines the search result based on whether the ML is pulled down or not; it is implemented with two nMOS transistors in series, controlled by the mask bit and the XOR result of the CAM cell.

Figure 4: Structure of AS-TCAM cell.

Conventional Symmetric TCAM:

Distinct differences between S-TCAM and AS-TCAM include the symmetric cell structure, the difference in meaning of the cell data, and the difference in meaning of the mask bit. Fig. 5 shows that an S-TCAM cell consists of two SRAM cells and four transistors, which are used to store data and to compare the stored data with the search data, respectively. The four transistors form the evaluation logic needed to generate the comparison result based on the charge of the ML.

Figure 5: Structure of S-TCAM cell

RCSD-4T2R nv TCAM:

nvTCAMs have been designed to achieve small area and fast, low-power wake-up operations. Fig. 6 shows the circuit scheme for a resistive memory (RRAM)-based nvTCAM comprising two RRAM devices (RT/RB), two comparison transistors (NC/NCB), a write-control transistor (NWC), and an ML-driver transistor (NML). In the standby mode, the word line, write-voltage control, and dynamic source line (DSL) are maintained at 0, and the ML is kept at a precharge voltage (VPRE). After the precharging operation, different search data are put on the data lines to perform search operations.

Figure 6: Schematic of RCSD-4T2R nvTCAM cell

ADVANTAGES:

  • Low power consumption

SOFTWARE IMPLEMENTATION:

  • Tanner tool

 

On Micro-architectural Mechanisms for Cache Wear out Reduction


 

 

ABSTRACT:

Hot carrier injection (HCI) and bias temperature instability (BTI) are two of the main deleterious effects that increase a transistor's threshold voltage over the lifetime of a microprocessor. This voltage degradation causes slower transistor switching and can eventually result in faulty operation. HCI manifests itself when transistors switch from logic “0” to “1” and vice versa, whereas BTI is the result of a transistor maintaining the same logic value for an extended period of time. These failure mechanisms are especially acute in the transistors used to implement the SRAM cells of first-level (L1) caches, which are frequently accessed, critical to performance, and continuously aging. This paper focuses on microarchitectural solutions to reduce transistor aging effects induced by both HCI and BTI in the data array of L1 data caches. First, we show that the majority of cell flips are concentrated in a small number of specific bits within each data word. In addition, we build upon previous studies showing that logic “0” is the most frequently written value in a cache by identifying which cells hold a given logic value for a significant amount of time. Based on these observations, this paper introduces a number of architectural techniques that spread the flips evenly across memory cells and reduce the amount of time that logic “0” values are stored in the cells by switching OFF specific data bytes. Experimental results show that the threshold voltage degradation savings range from 21.8% to 44.3%, depending on the application. The proposed architecture is analyzed for logic size, area, and power consumption using the Tanner tool.

EXISTING SYSTEM:

Modern-day computer systems have benefited from being designed and manufactured using an ever-increasing budget of transistors on very reliable integrated circuits. However, as technology moves forward, such a “free lunch” is over, as increasingly smaller technology nodes pose significant reliability challenges. Not only do variations in the manufacturing process make the resulting transistors unreliable at low-voltage operation, but they also take less and less time to wear out, decreasing their lifetimes (from tens of years in current systems to 1–2 years or fewer in the near future) and making them more prone to failures in the field. Thus, lifetime reliability must be treated as a major design constraint. This concern holds for all kinds of computing devices, ranging from server processors to embedded systems such as tablets and mobile phones, where lifetime is a critical requirement and market share strongly depends on reliability.

The two main phenomena that speed up aging are referred to as hot carrier injection (HCI) and bias temperature instability (BTI). The former effect increases with transistor activity over the lifetime of the processor; that is, when a transistor flips from being ON to OFF and vice versa, leading to threshold voltage (Vth) degradation, which in turn causes an increase in transistor switching delay and can result in timing violations and faulty operation when the critical paths become longer than the processor's clock period. Overall, HCI is accentuated in microprocessor components with frequent switching. On the other hand, BTI accelerates transistor degradation when a transistor is kept ON for a long time, and takes two forms: negative BTI (NBTI), which affects pMOS transistors when a “0” is applied to the gate, and positive BTI (PBTI), which affects nMOS transistors when a “1” is applied.

A significant fraction of the transistors in most modern chip multiprocessors is used to implement SRAM storage along the cache hierarchy. Therefore, it is important to target these structures to slow down aging. The first-level (L1) data cache is a prime candidate, since it is regularly written, yet stores data for significant amounts of time. Besides, its availability is critical to system performance. The SRAM cell transistors are stressed by HCI when the stored logic value flips and by BTI when it is retained for a long period without flipping (i.e., a duty cycle). Note that these situations are strongly related to each other. Thus, a technique designed to exclusively attack BTI might exacerbate HCI as a side effect, and vice versa.

Prior architectural research has analyzed cache degradation mainly due to BTI effects. There have been some attempts to diminish BTI aging by periodically inverting the stored logic values in the cells, by implementing redundant cell regions in the cache, and by reducing the cache supply voltage. Gunadi et al. propose a tentative approach to combat BTI and HCI by balancing the cache utilization. However, the cache contents are flushed from time to time, which might incur significant performance degradation. Unlike the previous works, we extensively analyze the data patterns of the stored contents in L1 data caches in terms of how they affect BTI and HCI, and based on the results of this paper, we propose micro-architectural mechanisms to extend the cache data array lifetime by reducing the Vth degradation, or simply dVth, caused by both phenomena, without incurring performance losses.

DISADVANTAGES:

  • Threshold voltage degradation is high
  • Duty cycle distribution is highly imbalanced

PROPOSED SYSTEM:

This paper makes two main contributions. First, we characterize the cell flips and the duty cycle patterns that high-performance applications cause to each specific memory cell. We find that most applications exhibit regular flip and duty cycle patterns, although they are not always uniformly distributed, which exacerbates the HCI and BTI effects on a small number of cells within the 512-bit cache lines. Results also confirm the previous work, claiming that most applications write a significant number of near-zero and zero data values into the cache. This behavior has been exploited in the past to address static energy consumption and performance with data compression. Unlike these works, this paper takes advantage of such a behavior to mitigate aging.

Second, based on the previous characterization study, we devise micro architectural techniques that exploit such a behavior to mitigate aging. The proposal provides a homogeneous degradation of the different cell transistors belonging to the same cache line. For this purpose, the devised techniques aim to reduce cell aging from bit flips and duty cycle and pursue two objectives: 1) to spread the bit flips evenly across the memory cells and 2) to balance the duty cycle distribution of the cells. To accomplish the former objective, we propose to progressively shift the bytes of the incoming data lines according to a given rotation shift value that is regularly updated. To attain the latter objective, the mechanism is enhanced to power OFF those memory cells storing a zero byte value. The result is a switch-OFF or sleep-state, in which all the cell transistor gate terminals are isolated from electric field stress, thus allowing a partial recovery from BTI.
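The two mechanisms can be sketched together in a few lines. The model below is a behavioral illustration, not the hardware: it uses a hypothetical 4-byte word (matching the four shift functions described for the byte-wise scheme), rotates bytes on a write, flags zero bytes that the switch-off scheme would power down, and realigns the bytes on a read.

```python
def bw_write(word_bytes, shift):
    """Rotate the bytes of an incoming word by `shift` positions and flag
    zero bytes, which the switch-off mechanism would power down."""
    n = len(word_bytes)
    stored = [word_bytes[(i - shift) % n] for i in range(n)]
    powered_off = [b == 0 for b in stored]   # sleeping cells hold zero bytes
    return stored, powered_off

def bw_read(stored, shift):
    """Inverse rotation: realign the stored bytes to recover the word, so
    no power restoration is needed for the zero-byte cells on a read."""
    n = len(stored)
    return [stored[(i + shift) % n] for i in range(n)]
```

Because the shift value advances between phases, a byte position that always receives zeros in program data is mapped to a different cell group each phase, spreading both the flips and the “0” duty cycles across the line.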

Figure 1: Implementation of a 6T SRAM cell. The labeled transistors refer to the inverter loop of the cell

The proposed approaches attack aging in the data array. Given that the tag array is much smaller than the data array, resilient technologies could be used to address tag wear out. For example, resilient 8T cells introduce a 19% area overhead compared with typical 6T cells. According to CACTI, implementing the tags with 8T cells results in just a 1.95% area overhead for a 16-kB L1 cache.

To help microprocessor architects understand how the logic value (i.e., “0” or “1”) distribution among the cache cells, as well as the bit flips caused by write operations, affects wear out, this section summarizes the implementation of a typical SRAM cell and explains how it suffers from BTI and HCI effects.

As shown in Fig. 1, each cached bit is implemented with an SRAM memory cell consisting of six transistors (6T). The labeled transistors form an inverter loop that holds the stored logic value; this paper uses these labels to refer to these transistors. The remaining pass transistors, controlled by the word line signal, allow read and write operations to the cell through the bit line (BL) and its complement (BLB).

When the SRAM cell is under a “0” duty cycle, that is, when the cell is stable and storing a “0,” the pMOS transistor TP1 and the nMOS transistor TN2 are under stress and they suffer from NBTI and PBTI, respectively. On the contrary, under a “1” duty cycle, transistors TP2 and TN1 are affected by NBTI and PBTI, respectively. The wear out effects induced by each type of duty cycle are complementary, meaning that, for a given duty cycle, the pair of transistors not under stress is partially recovering from BTI degradation. Thus, if every cache cell experiences a balanced distribution (i.e., 50%) of “0” and “1” duty cycles, wear out effects due to BTI are minimized and evenly distributed among the inverter loop transistors. Moreover, this reduces the probability of the circuit failing due to static noise margin (SNM) changes.

On the other hand, HCI affects all SRAM cell transistors on a write operation if the logic value flips, regardless of the type of transistor. This effect can be mitigated by avoiding bit flips during write operations. In addition, in order to minimize the chances of SRAM cell faults due to HCI wear out, the remaining bit flips must be evenly distributed among the cells.

To sum up, the inverter loop transistors are continuously aging regardless of whether the cell stores “0” or “1,” or is transitioning. This fact makes such transistors particularly sensitive to wear out. Note that the nMOS pass transistors age only when the SRAM cell is being accessed, which represents a very small fraction of the overall execution time, making them much less aging-sensitive than the inverter loop.

Hardware Implementation and Operation:

1) Hardware Components and Area Overhead: The BW mechanism can be implemented with 16 4-to-1 multiplexers, one for each data word within the incoming line. Fig. 2 shows one of the multiplexers and its associated inputs used in the write circuit. Label Bi refers to the different data bytes of the word, B0 and B3 being the LSB and MSB, respectively. Each data input consists of the data bytes ordered according to one of the four possible shift functions. The multiplexer is controlled by the CBW0 and CBW1 control bits that correspond to the current shift function. For the read circuit, another 16 4-to-1 multiplexers can be used for the requested line; however, the order of the data inputs differs from those of the write circuit, since in this operation the contents must be realigned instead of shifted. These multiplexers are only used when reading and writing a given line; thus, they are shared among all the lines in the data array.

Figure 2: Write circuit for the BW mechanism.

2) Control Bit Inversion: Both HCI and BTI phenomena should be evaluated not only in the data array bits but also in the additional control bits added by our mechanisms and implemented as SRAM cells. Recall that the CBW0 and CBW1 bits make up a 2-bit counter and are updated between regular shift phases of 8M processor cycles, which results in an implicitly balanced (i.e., near-optimal) duty cycle distribution in such bits. However, the CpBW bit is set to “1” when the associated line is written for the first time within a phase, and set to “0” every time a new phase starts. We have evaluated that such writes normally come soon after the phase begins, causing a highly biased “1” duty cycle in these bits, which exacerbates BTI in transistors TP2 and TN1.

3) Read/Write Operations: To clarify how the BW and SZB schemes work together, Fig. 3 plots a cache block diagram with both mechanisms represented as gray boxes. On a cache read hit, after the way multiplexer selects the target line from the selected set, its contents and the associated control bits are forwarded to the SZB read circuit. Once the SZB tri-state buffers have forwarded the zero bytes, the BW multiplexers realign the bytes and serve the original line to the processor. Note that, on a read operation, there is no need to restore the power to those memory cells that originally would hold zero bytes.

Figure 3: Block diagram of the L1 data cache access, including the proposed components (gray boxes)

ADVANTAGES:

  • Threshold voltage degradation is reduced
  • Duty cycle distribution is more balanced

SOFTWARE IMPLEMENTATION:

  • Tanner tool

 

Sense Amplifier Half-Buffer (SAHB): A Low-Power High-Performance Asynchronous Logic QDI Cell Template


 

ABSTRACT:

We propose a novel asynchronous logic (async) quasi-delay-insensitive (QDI) sense-amplifier half-buffer (SAHB) cell design approach, with emphases on high operational robustness, high speed, and low power dissipation. There are five key features of our proposed SAHB. First, the SAHB cell embodies the async QDI 4-phase (4φ) signaling protocol to accommodate process–voltage–temperature variations. Second, the sense amplifier (SA) block in SAHB cells embodies a cross-coupled latch with a positive feedback mechanism to speed up the output evaluation. Third, the evaluation block in the SAHB comprises both nMOS pull-up and pull-down networks with minimum transistor sizing to reduce the parasitic capacitance. Fourth, both the evaluation block and the SA block are tightly coupled to reduce redundant internal switching nodes. Fifth, the SAHB cell is designed in CMOS static logic and is hence appropriate for full-range dynamic voltage scaling operation for VDD ranging from the nominal voltage (1 V) to a subthreshold voltage (∼0.3 V). When six library cells embodying our proposed SAHB are compared with those embodying the conventional async QDI pre-charged half-buffer (PCHB) approach, the proposed SAHB cells collectively feature simultaneous ∼64% lower power, ∼21% higher speed, and ∼6% smaller IC area; the PCHB cell is inappropriate for subthreshold operation. A prototype 64-bit Kogge–Stone pipeline adder based on the SAHB approach (at 65 nm CMOS) is designed. For a 1-GHz throughput at nominal VDD, the design based on the SAHB approach simultaneously features ∼56% lower energy and ∼24% lower transistor count than its PCHB counterpart. When benchmarked against the ubiquitous synchronous logic counterpart, our SAHB dissipates ∼39% lower energy at the 1-GHz throughput. The proposed architecture is analyzed for logic size, area, and power consumption using the Tanner tool.

EXISTING SYSTEM:

Fig. 1 broadly classifies digital logic for the realization of operationally robust digital circuits. At the highest level, there are the sync and async digital logic design philosophies. As the sync design philosophy relies on timing assumptions associated with the clock (e.g., clock skews and setup/hold times), realizing operationally robust circuits under large PVT variations is challenging: large timing margins are required to accommodate worst-case conditions. In contrast, the async digital logic design philosophy, particularly the quasi-delay-insensitive (QDI) approach, is an alternative that mitigates these timing assumptions.

Figure 1: General classification of digital logic circuits

There are nevertheless other challenges, which are discussed in the following two paragraphs. Fig. 1 also depicts the classifications within the async digital logic design philosophy. From the timing-approach perspective, there are three async types: 1) delay-insensitive (DI); 2) bundled-data (BD); and 3) QDI/timed-pipeline (TP)/single-track (ST). The first type, DI circuits, is largely impractical because no assumption is made on gate/wire delays, leading to circuit realizations comprising only buffer cells and Muller C-elements. The second type, BD circuits, is similar to sync circuits in requiring delay assumptions for circuit realization. As their operation relies on bounded gate/wire delays, much like sync circuits, it is difficult to guarantee their operational robustness under unknown operating conditions. The third type groups QDI, TP, and ST circuits together for their similar completion detection mechanisms. QDI circuits operate error-free for arbitrary wire delays and assume only isochronic forks, i.e., the same wire delay is assumed for the different branches of a fork. This assumption can be satisfied easily in the placement-and-routing stage. On the other hand, although TP and ST circuits have completion detection mechanisms, they require delay assumptions for their circuit realizations; these delay assumptions consequently reduce the reliability of the circuits under unknown operating conditions. In short, as the QDI async approach detects the completion of data according to actual workloads and/or operating conditions, it offers the most practical means of accommodating unknown PVT variations.
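Since DI circuits reduce to compositions of buffers and Muller C-elements, a brief behavioral model of the C-element may help; the following Python sketch is an illustration of the standard textbook primitive, not a circuit from the paper.

```python
# Behavioral sketch of a Muller C-element, the basic state-holding
# primitive used in DI/QDI asynchronous circuits: the output follows
# the inputs only when they agree, and holds its value otherwise.
class CElement:
    def __init__(self):
        self.out = 0  # reset state

    def update(self, a, b):
        if a == b:       # both inputs equal -> output follows them
            self.out = a
        return self.out  # inputs disagree -> previous value is held

c = CElement()
assert c.update(1, 0) == 0   # disagree: output holds reset value
assert c.update(1, 1) == 1   # both high: output rises
assert c.update(0, 1) == 1   # disagree again: output holds 1
assert c.update(0, 0) == 0   # both low: output falls
```

The state-holding behavior is exactly what makes C-elements suitable for completion detection: the output only transitions once all monitored events have occurred.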

DISADVANTAGES:

  • Low speed
  • High energy

PROPOSED SYSTEM:

We further describe a 64-bit Kogge–Stone (KS) pipeline adder embodying the proposed SAHB approach for a power management application. Our SAHB pipeline adder is experimentally verified to be operationally robust over a wide supply voltage range (0.3–1.4 V) and a wide temperature range (−40 °C to 100 °C). When benchmarked against its competing async PCHB and sync equivalents at a 1-GHz throughput, our SAHB pipeline adder is more energy efficient.
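The Kogge–Stone adder named above is a parallel-prefix structure. As a behavioral reference (a sketch of the standard prefix algorithm, not the authors' SAHB circuit), its carry computation can be modeled as follows:

```python
def kogge_stone_add(a, b, width=64):
    """Behavioral model of a Kogge-Stone parallel-prefix adder.

    Generate/propagate pairs are combined with doubling strides, so all
    carries are resolved in log2(width) prefix stages.
    """
    mask = (1 << width) - 1
    g = a & b          # per-bit generate
    p = a ^ b          # per-bit propagate
    d = 1
    while d < width:
        # Prefix combine: bit i merges with bit i-d.
        g = (g | (p & (g << d))) & mask
        p = (p & (p << d)) & mask
        d <<= 1
    carries = (g << 1) & mask      # carry into bit i is G[i-1:0]
    return ((a ^ b) ^ carries) & mask

assert kogge_stone_add(7, 9) == 16
assert kogge_stone_add((1 << 64) - 1, 1) == 0   # wraps modulo 2**64
```

Six prefix stages suffice for 64 bits, which is why the structure pipelines well at high throughput.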

Figure 2: SAHB cell template.

Sense Amplifier Half-Buffer:

Fig. 2 depicts the generic interface signals for the proposed dual-rail SAHB cell template. The data inputs are Datain and nDatain, and the data outputs are Q.T/Q.F and nQ.T/nQ.F. The left-channel handshake outputs are Lack and nLack, and the right-channel handshake inputs are Rack and nRack. nDatain, nQ.T, nQ.F, nLack, and nRack are the logical complements of the primary input/output signals Datain, Q.T, Q.F, Lack, and Rack, respectively. For the sake of brevity, we will only use the primary input/output signals to delineate the operation of an SAHB cell. The SAHB cell strictly abides by the async 4-phase (4φ) handshake protocol, having two alternating operation sequences: evaluation and reset. Initially, Lack and Rack are reset to 0 and both Datain and Q.T/Q.F are empty, i.e., both rails of each signal are 0. During the evaluation sequence, when Datain is valid (i.e., one rail of each signal is 1) and Rack is 0, Q.T/Q.F is evaluated and latched, and Lack is asserted to 1 to indicate the validity of the output. During the reset sequence, when Datain is empty and Rack is 1, Q.T/Q.F becomes empty and Lack is deasserted to 0. Subsequently, the SAHB cell is ready for the next operation.
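The evaluation/reset sequences described above can be captured in a toy model. The sketch below is illustrative only; signal names follow the paper's Lack/Rack convention, and a dual-rail value is modeled as a (T, F) pair with (0, 0) as the empty spacer.

```python
# Toy model of the 4-phase dual-rail handshake of an SAHB buffer cell.
EMPTY = (0, 0)  # both rails low -> no data (spacer)

class SAHBBuffer:
    def __init__(self):
        self.q = EMPTY   # dual-rail output Q.T/Q.F
        self.lack = 0    # left-channel acknowledge

    def step(self, data_in, rack):
        if data_in != EMPTY and rack == 0:    # evaluation sequence
            self.q = data_in                  # latch the valid token
            self.lack = 1                     # signal output validity
        elif data_in == EMPTY and rack == 1:  # reset sequence
            self.q = EMPTY
            self.lack = 0
        return self.q, self.lack

buf = SAHBBuffer()
assert buf.step((1, 0), 0) == ((1, 0), 1)   # logical 1 latched, Lack asserted
assert buf.step(EMPTY, 1) == (EMPTY, 0)     # spacer + Rack=1 -> reset
```

Each data token is therefore always followed by a spacer, which is what makes the protocol delay-insensitive to data-path timing.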

Figure 3: Circuit schematic of a buffer cell embodying SAHB. (a) Evaluation block powered by VDD_L. (b) SA block powered by VDD

For illustration, Fig. 3(a) and (b) depicts the respective circuit schematics of the evaluation block and the SA block of a buffer cell embodying SAHB; the various sub-blocks are shown within the dotted boxes. The evaluation and SA blocks are powered by VDD_L and VDD, respectively, which can be the same or different voltages (see Section II-B). The nMOS transistor in green with RST is optional, for cell initialization. In Fig. 3(a), the evaluation block comprises an nMOS pull-up network and an nMOS pull-down network to, respectively, evaluate and reset the dual-rail output Q.T/Q.F. Of particular interest, the nMOS pull-up network features low parasitic capacitance (lower than that of the usual pMOS pull-up network, whose transistors are often sized 2× larger than the nMOS).

Figure 4: Dual-rail SAHB library cells. (a) Two-input AND/NAND. (b) Two-input XOR/XNOR. (c) Three-input AO/AOI

Fig. 4(a)–(c) depicts the circuit schematics of three basic SAHB library cells: 1) two-input AND/NAND; 2) two-input XOR/XNOR; and 3) three-input AO/AOI cells. The logic functions of the pull-up network for the AND/NAND, XOR/XNOR, and AO/AOI cells are, respectively, expressed in (2), (3), and (4). Similar to the buffer cell, the structures of the evaluation block and SA block of these cells are constructed based on their logic functions and input signals. These library cells will be used for benchmarking and for realizing the 64-bit SAHB pipeline adder.

Circuit Configuration and Supply Voltage Setup:

In the evaluation block, there are two ways to configure the transistor connections for a multiple-input SAHB cell. Fig. 5(a) and (b) depicts two different circuit configurations for Q.F of the two-input AND/NAND SAHB cell. The configuration in Fig. 5(a) is adopted in the cell library for its lower transistor count, where Q.F is partially charged up to VDD_L when either A.F or B.F is 1. The voltage levels of the supplies VDD_L and VDD are critical to prevent an early output transition before all the inputs (A.F and B.F) are valid.

Figure 5: Circuit configurations in a two-input SAHB AND/NAND cell. (a) Transistors are shared and (b) transistors are not shared. The drawings depict the scenario when only input A is valid.

ADVANTAGES:

  • High speed
  • Low power consumption

SOFTWARE IMPLEMENTATION:

  • Tanner EDA

A 100-mA, 99.11% Current Efficiency, 2-mVpp Ripple Digitally Controlled LDO with Active Ripple Suppression

 

ABSTRACT:

Digital low-dropout (DLDO) regulators are gaining attention due to their design scalability for the distributed multiple-voltage-domain applications required in state-of-the-art systems-on-chip. Due to the discrete nature of the output current and the discrete-time control loop, the steady-state response of the DLDO has an inherent output voltage ripple. A hybrid DLDO (HD-LDO) with fast response and stable operation across a wide load range, while reducing the output voltage ripple, is proposed. In the HD-LDO, a DLDO and a low-current analog ripple cancelation amplifier (RCA) work in parallel. The output dc of the RCA is sensed by a 2-bit analog-to-digital converter, and the digitized linear-stage current is fed into the DLDO as an error signal. During load transients, a gear-shift controller enables fast transient response using dynamic load estimation. The DLDO suppresses the output dc of the RCA within its current resolution. With this arrangement, the majority of the dc load current is provided by the DLDO and the RCA supplies the ripple cancelation current. The HD-LDO is designed and fabricated in a 180-nm CMOS technology and occupies 0.697 mm² of die area. The HD-LDO operates with an input voltage range of 1.43–2.0 V and an output voltage range of 1.0–1.57 V. At a 100-mA load current, the HD-LDO achieves a peak current efficiency of 99.11% and a settling time of 15 clock periods with a 0.5-MHz clock for a load current switching between 10 and 90 mA. The RCA suppresses the fundamental, second, and third harmonics of the switching frequency by 13.7, 13.3, and 14.1 dB, respectively. The proposed architecture of this paper analyses the logic size, area, and power consumption using the Tanner tool.

EXISTING SYSTEM:

A typical analog LDO (ALDO) regulator, as shown in Fig. 1, is a second-order system with a high-gain error amplifier A0 and an output power transistor M0. An external output capacitor COUT and its equivalent series resistance (RESR) are added for loop compensation. This configuration has been used in many applications, but it suffers from several issues. The capacitor COUT introduces a zero to ensure a sufficient loop phase margin, but the RESR increases the output ripple voltage during load transients. Output load current variations move the ALDO output pole, which changes the ALDO phase margin [Fig. 1(b)]. The RESR of the capacitor is also not well controlled and can vary widely for different types of capacitors. Given these factors, the frequency stability of an analog LDO depends significantly on the load current as well as on the external capacitor COUT and its RESR.

Figure 1: Conventional analog LDO (a) block diagram and (b) its frequency response

Several methods have been proposed for the analog LDO linear regulator to eliminate these issues. One approach is to use a tracking-zero circuit to cancel the load-dependent LDO output pole. However, this approach suffers from a mismatch in the pole-zero cancelation, as the output pole varies linearly with the load current while the introduced zero is a nonlinear function of the load current. An alternative method is to generate an internal fixed zero at the output node of the resistive feedback using a frequency-dependent voltage-controlled current source. However, this method cannot be realized in a regulator without resistive feedback. The error amplifier output pole is required to be at a higher frequency to circumvent stability and load-transient ripple issues, which limits the power transistor M0 size, resulting in a lower maximum output load current.

Figure 2: Conventional synchronous DLDO. (a) Block diagram. (b) Load-dependent output ripple

The digital LDO (DLDO) regulator is an innovative approach that uses a digital controller to minimize the external compensation network. The block diagram of a synchronous DLDO is shown in Fig. 2(a). An analog-to-digital converter (ADC) is used to digitize the output voltage. The generated error voltage is fed to the input of the digital controller to generate the compensation digital code for the output power digital-to-analog converter (DAC). The DLDO offers advantages in terms of integration, scalability, size, programmability, and stability over a wide range of load current variations, and a lower sensitivity to process variations. In most recent DLDO implementations, the output transistors in the power DAC operate in the linear (triode) region to reduce the silicon area. However, triode-region operation yields poor power supply rejection (PSR) and lower efficiency at higher loads. An additional feedback loop can be implemented to improve the PSR; however, the additional loop increases the complexity and reduces the efficiency.

Although DLDO regulators show better transient and stability performance over a wide load range, in the steady state their output voltage suffers from a load-current-dependent ripple [5], [13]. The DLDO supplies current to the output load in discrete steps, and in steady-state conditions, the digital loop compensation causes a voltage ripple at the output. If the output voltage changes by more than one ADC LSB, the feedback digital error signal changes, resulting in a new controller code. This causes the analog output voltage to cycle continuously around the regulated dc output voltage level to provide the required output load current. The output ripple voltage of a synchronous DLDO for various load currents is shown in Fig. 2(b). In Fig. 2(b), it is assumed that kIDAC−LSB < ILOAD < (k+1)IDAC−LSB and Ierr = ILOAD − kIDAC−LSB, where IDAC−LSB is the current resolution of the power DAC and k is an integer. Traces labeled ILOAD−a, ILOAD−b, and ILOAD−c represent the output ripple voltages when Ierr < 0.5IDAC−LSB, Ierr ≈ 0.5IDAC−LSB, and Ierr > 0.5IDAC−LSB, respectively. This result shows that the output ripple voltage is a function of the load current. The ripple period depends on control loop parameters such as the loop response time, the output node capacitance, the ADC resolution, the output load current, and the clock frequency. This ripple generates supply noise that can impact sensitive analog and mixed-signal circuits powered by the DLDO.
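The limit-cycle mechanism just described can be demonstrated with a minimal discrete-time model. The sketch below is illustrative only; the LSB current, capacitor, and clock values are invented, and the controller is reduced to a bang-bang code update around code k.

```python
# Minimal model of DLDO steady-state limit cycling: the DAC code
# toggles around k while the output capacitor integrates the residual
# current I_err = I_LOAD - k*I_LSB (all parameter values assumed).
def dldo_ripple(i_load, i_lsb=1e-3, c_out=1e-6, f_clk=0.5e6, steps=8):
    k = int(i_load // i_lsb)       # nearest-below DAC code
    v, code, trace = 0.0, k, []
    dt = 1.0 / f_clk
    for _ in range(steps):
        i_dac = code * i_lsb
        v += (i_dac - i_load) * dt / c_out  # capacitor integrates error
        code = k + 1 if v < 0 else k        # bang-bang code update
        trace.append(v)
    return trace

ripple = dldo_ripple(10.4e-3)      # I_err = 0.4 * I_LSB
assert max(ripple) > min(ripple)   # nonzero steady-state ripple
```

Even this toy model shows why the ripple never settles: no single code delivers exactly ILOAD, so the voltage must cycle.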

DISADVANTAGES:

  • Poor steady-state performance (load-dependent output ripple)

PROPOSED SYSTEM:

A hybrid DLDO (HD-LDO) regulator is proposed that utilizes a DLDO regulator in parallel with a wideband ripple cancelation amplifier (RCA) to cancel the switching ripple at the output due to the DLDO. During steady-state operation, the RCA supplies the error current between the DLDO output current and the required load current, and suppresses the output voltage ripple. The maximum output current of the RCA is limited to the minimum current-step resolution of the DLDO, which is typically less than 1% of the maximum load current. The low current requirement simplifies the RCA design, which results in a fast RCA feedback loop without the complex compensation techniques of a conventional analog linear regulator. To enable fast load transients, a gear-shift controller is designed based on dynamic load estimation. In a typical DLDO, a large output capacitor suppresses the output voltage ripple, whereas in the proposed HD-LDO the RCA suppresses the output voltage ripple without an output capacitance.

The architectural block diagram of the proposed HD-LDO is shown in Fig. 3(a). The hybrid combination consists of the digital loop with the DLDO in parallel with the analog loop with the RCA. The analog loop compares the output voltage (VOUT) with the reference voltage (VREF) and generates the current IRCA proportional to the error voltage between VREF and VOUT. The DLDO senses the linear-stage current IRCA in every clock cycle and changes the DLDO output current (IDLDO) to force IRCA to zero. In the steady state, due to the power DAC quantization error, the output voltage cycles around the reference output voltage level. The analog loop is formed by a wideband, high-gain, low-power amplifier connected in a unity-gain configuration and supplies the residual current within one LSB of the DLDO to reduce the output voltage ripple. The analog and digital regulation loops continuously track and minimize the output voltage error and the linear-stage current, respectively, and an accurate output regulation is achieved.

Figure 3: Block diagram of proposed HD-LDO. (a) Architecture. (b) Structure

Ripple Cancelation Amplifier:

In the proposed hybrid architecture, the analog loop is in parallel with the DLDO to cancel the quantization error of the DLDO and reduce the ripple caused by the limit-cycle behavior of the DLDO. The analog loop operates within one LSB of the DLDO output, and its current is bounded by −IDAC−LSB < IRCA < IDAC−LSB. The ripple cancelation amplifier output current IRCA can be sourced or sunk depending on the digital regulator output current.

Figure 4: Class-AB folded cascode RCA

Feedback Analog-to-Digital Converter:

The ADC in the DLDO converts the output current of the RCA into the digital equivalent for the PID controller input. To reduce the power consumption, a 2-bit ADC is selected. The output current of the RCA is sensed by a resistor RSEN, which converts the RCA output current into a voltage VSEN. The negative feedback loop of the RCA forces the HD-LDO output voltage VOUT to equal VREF. When the RCA sources/sinks current to/from the load, VSEN becomes positive/negative. The voltage VSEN is compared using the three clocked comparators of the 2-bit ADC, as shown in Fig. 5. Signal D2 is low for VSEN > VTH, and D0 becomes high for VSEN < −VTH. The output D1 of comparator C1 becomes low for VSEN > 0 and high for VSEN < 0. Therefore, D1 carries information about the direction of the RCA output current.

Figure 5: 2-bit flash ADC.
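The comparator decisions described above amount to a small truth-table function of VSEN. The sketch below follows the stated D2/D1/D0 rules; the threshold value v_th is an assumption for illustration, not from the paper.

```python
def flash_2bit(v_sen, v_th=0.05):
    """Comparator outputs (D2, D1, D0) for a sensed voltage VSEN.

    D2 is low for VSEN > +VTH, D0 is high for VSEN < -VTH, and D1
    encodes the RCA current direction (low for VSEN > 0).
    """
    d2 = 0 if v_sen > v_th else 1
    d1 = 0 if v_sen > 0 else 1
    d0 = 1 if v_sen < -v_th else 0
    return d2, d1, d0

assert flash_2bit(0.10) == (0, 0, 0)   # sourcing more than one LSB
assert flash_2bit(-0.10) == (1, 1, 1)  # sinking more than one LSB
assert flash_2bit(0.01) == (1, 0, 0)   # within the dead band, sourcing
```

The middle case is the steady-state target: the DLDO adjusts its code until VSEN sits inside the ±VTH window.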

ADVANTAGES:

  • Better performance

SOFTWARE IMPLEMENTATION:

  • Tanner tool

 

Preweighted Linearized VCO Analog-to-Digital Converter

 

ABSTRACT:

A linearization technique for the voltage-to-frequency characteristic of voltage-controlled oscillator (VCO) analog-to-digital converters (ADCs) is presented. In contrast to previous works, the proposed technique is an open-loop, calibration-free configuration, so it can operate at higher frequencies. It is also independent of the delay element structure, so it can be applied to various VCO ADC topologies. The analog input signal is first mapped through a preweighted resistor network in which every delay element experiences a different version of the input and produces the corresponding delay. As a result, the proposed approach suppresses the impact of V/F nonlinearity on the ADC performance by expanding the linear region of the transfer curve over the full rail-to-rail input. This technique shows substantial improvement, keeping the nonlinearity within ±0.5% over the full input scale and achieving a peak signal-to-noise and distortion ratio (SNDR) of 75.7 and 60.4 dB for inputs of −8 and 0 dBFS, respectively.

EXISTING SYSTEM:

Technology scaling makes the functionality of analog blocks more and more challenging. A conventional voltage-domain analog-to-digital converter (ADC) basically relies on the voltage headroom swing to fulfill the desired performance. Consequently, designing voltage-domain ADCs with high resolution targets in recent deep-submicrometer technologies is hard. Digital circuits are highly immune to noise, are area- and power-efficient, and provide a substantial reduction in design cost; their associated problems are fewer and much easier to eliminate. To comply with digital implementation, the time-domain ADC paradigm comes into sight, in which the analog level is first translated to time intervals (frequency), which are quantized and then encoded to a digital word, as shown in Fig. 1(a). With technology shrinking, the available voltage headroom is small (i.e., ≤1 V), and at the same time, the parasitics are reduced due to the smaller dimensions, which leads to faster switching; thus, representing a signal in the time domain is an efficient alternative for achieving high resolution. In other words, the quantization process is now performed in a horizontal fashion (time) instead of a vertical one (voltage), as depicted in Fig. 1(b). The voltage-controlled oscillator (VCO)-based ADC is the most popular time-based ADC among researchers due to its highly digital components [1], [2]. However, the major shortcoming of this type of ADC is the nonlinearity of the voltage-to-time (frequency) transfer curve [3], [4]. This nonlinearity translates into unwanted harmonics in the ADC output spectrum, degrading the signal-to-noise and distortion ratio (SNDR), as illustrated in Fig. 1(c). Many approaches have been presented to alleviate the nonlinearity of the VCO ADC in order to obtain a better effective number of bits (ENOB) over a wider bandwidth (BW).
Most of these techniques either impose significant power constraints or are complex, leading to imprecision in combating the nonlinearity and thus compromising the performance [5]–[22]. In contrast, a simple open-loop, calibration-free VCO ADC linearization approach is presented here. It is independent of the VCO topology and exhibits 9 ENOB over the rail-to-rail input.
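As background for the frequency-quantization idea in Fig. 1(a), a first-order behavioral model of a counter-based VCO ADC is sketched below. The model assumes an ideal (linear) V-to-F characteristic, and the f0/Kvco/sample-rate values are invented for illustration.

```python
# Behavioral sketch of VCO-based quantization: the input voltage sets
# the oscillation frequency, and the edges counted in one sample
# period form the digital output word.
def vco_adc_sample(v_in, f_s=1e6, f0=50e6, kvco=100e6, phase=0.0):
    """Return (count, residual_phase) for one sample period 1/f_s."""
    f = f0 + kvco * v_in     # ideal linear V-to-F characteristic
    phase += f / f_s         # cycles accumulated in this period
    count = int(phase)       # whole edges observed by the counter
    # The fractional phase carries over to the next sample, which is
    # the origin of the first-order quantization-noise shaping.
    return count, phase - count

count, residual = vco_adc_sample(0.25)
assert count == 75           # (50 MHz + 100 MHz * 0.25 V) / 1 MHz
```

A real delay line has a tanh-like V/F curve rather than this straight line, which is exactly the nonlinearity the preweighting technique targets.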

DISADVANTAGES:

  • More power consumption
  • More noise

PROPOSED SYSTEM:

The techniques for mitigating the VCO nonlinearity can be categorized into closed-loop or open-loop configurations. There are several closed-loop architectures. For instance, a switched-capacitor feedback approach [3], [4] has been incorporated in the VCO ADC [5], or a high-speed implementation of the VCO quantizer [6] was placed inside a high-order loop [7], [8], as displayed in Fig. 2(a). Compared to frequency measurement [7], phase measurement [8] provides better linearity improvement when used as the key output variable of the VCO quantizer fed back to the loop.

In view of all the above approaches, the circuit level seems to be an attractive alternative, since the VCO represents the bottleneck of the full ADC. Therefore, a circuit-level approach that alleviates the VCO nonlinearity is presented. It is suitable for different VCO delay topologies without closing the loop; as a result, no further architectural-level linearity correction is necessary. The proposed approach suppresses the impact of V/F nonlinearity on the ADC performance by expanding the linear region of the transfer curve over the full rail-to-rail input. Fig. 4 shows the ADC with the proposed technique. By introducing a preweighted (e.g., binary) resistor network, the analog input voltage is first premapped in a binary fashion at each node of the VCO line. This ensures that each delay element experiences a different version of the input voltage and produces its corresponding delay (i.e., phase).
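The intuition behind the preweighted mapping can be checked numerically. The sketch below is a deliberately simplified model, not the paper's circuit: each stage's V/F curve is approximated by a tanh, and binary weights stand in for the resistor network.

```python
import math

# Sketch of the linearization idea: each delay stage sees a scaled
# copy of the input through a preweighted (here binary) divider, so
# most stages stay in the linear part of their tanh-like transfer
# curve and the summed response is more linear overall.
def stage_response(v):
    return math.tanh(v)            # toy nonlinear stage characteristic

def line_response(v_in, n_stages=5):
    weights = [2.0 ** -(i + 1) for i in range(n_stages)]  # 1/2, 1/4, ...
    return sum(stage_response(v_in * w) for w in weights)

# Compare full-scale deviation from an ideal straight line (extrapolated
# from the small-signal gain) with and without the preweighting.
gain_w = line_response(0.1) / 0.1
err_weighted = abs(line_response(1.0) - gain_w * 1.0)
gain_s = stage_response(0.1) / 0.1
err_single = abs(stage_response(1.0) - gain_s * 1.0)
assert err_weighted < err_single   # preweighting flattens the curve
```

The weights, stage count, and tanh model are all assumptions; the point is only that distributing scaled inputs across the line keeps the aggregate transfer curve closer to linear over the full input range.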

ADVANTAGES:

  • Less Power Consumption
  • Less noise

SOFTWARE IMPLEMENTATION

  • TANNER

 

A Fault Tolerance Technique for Combinational Circuits Based on Selective-Transistor Redundancy

 

ABSTRACT:

With fabrication technology reaching the nanometer scale, systems are becoming more prone to manufacturing defects, with higher susceptibility to soft errors. This paper focuses on designing combinational circuits for soft error tolerance with minimal area overhead. The idea is based on analyzing the random-pattern testability of faults in a circuit and protecting sensitive transistors, whose soft error detection probability is relatively high, until the desired circuit reliability is achieved or a given area overhead constraint is met. Transistors are protected by duplicating and sizing the subset of transistors necessary for providing the protection. In addition, a novel gate-level reliability evaluation technique is proposed that provides results similar to reliability evaluation at the transistor level (using SPICE) with orders-of-magnitude reduction in CPU time. LGSynth'91 benchmark circuits are used to evaluate the proposed algorithm. Simulation results show that the proposed algorithm achieves better reliability than other transistor-sizing-based techniques and the triple modular redundancy technique, with significantly lower area overhead, for a 130-nm process technology at ground level. The proposed architecture of this paper analyses the logic size, area, and power consumption using the Tanner tool.

EXISTING SYSTEM:

Reliability in systems can be achieved by redundancy. Redundancy can be added at the module level, gate level, transistor level, or even at the software level. The design of reliable systems using redundant unreliable components was proposed early on. Since then, a plethora of research has been done to rectify soft errors in combinational and sequential circuits by applying hardware redundancy. Triple modular redundancy (TMR), a popular and widely used technique, creates three identical copies of the system and combines their outputs using a majority voter. The generalized modular redundancy scheme considers the probability of occurrence of each combination at the output of a circuit. Redundancy is then added to protect only those combinations that have a high probability of occurrence, while the remaining combinations are left unprotected to save area. El-Maleh and Al-Qahtani proposed a fault tolerance technique for sequential circuits that enhances the reliability of sequential circuits by introducing redundant equivalent states for states with a high probability of occurrence. Mohanram and Touba proposed a partial error masking scheme based on TMR, which targets the nodes with the highest soft error susceptibility. Two reduction heuristics are used to reduce the soft error failure rate, namely, cluster sharing reduction and dominant value reduction. Instead of triplicating the whole logic as in TMR, only those nodes with high soft error susceptibility are triplicated; the rest of the nodes are clustered and shared among the triplicated logic. Sensitive gates are duplicated and their outputs are connected together.

Physically placing the two gates at a sufficient distance reduces the probability of both gates being hit by a particle strike simultaneously and, therefore, reduces the soft error rate (SER). Another technique based on TMR maintains a history index of the correctly computing module to select the correct result. Teifel proposed a double/dual modular redundancy (DMR) scheme that utilizes voting and self-voting circuits to mitigate the effects of SETs in digital integrated circuits. The Bayesian detection technique from communication theory has been applied to the voter in DMR, called soft NMR. In most cases, it is able to identify the correct output even if all redundant modules are in error, but at the expense of a very high area overhead for the voters.
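For reference, the TMR majority voter discussed above implements a simple bitwise majority function. This is the generic textbook form, not a specific circuit from the paper:

```python
def tmr_vote(a, b, c):
    """Bitwise majority voter for triple modular redundancy:
    each output bit is 1 iff at least two of the three copies agree."""
    return (a & b) | (b & c) | (a & c)

# A single faulty module output is masked by the other two copies.
assert tmr_vote(0b1010, 0b1010, 0b0101) == 0b1010
```

The soft-NMR voter mentioned above replaces this fixed majority rule with a Bayesian decision, which is how it can sometimes recover the correct word even when all three copies disagree with it.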

Another class of techniques enhances fault tolerance by increasing soft error masking, based on modifying the structure of the circuit by the addition and/or removal of redundant wires or by resynthesizing parts of the circuit. The SER is reduced based on redundancy addition and removal of wires. Redundant wires are added based on the existing implications between a pair of nodes in the circuit. Two-level circuits are synthesized by assigning don't-care conditions to improve input error resilience, which minimizes the propagation of fault effects. An algorithm has been proposed to synthesize two-level circuits to maximize logical masking utilizing input pattern probabilities.

DISADVANTAGES:

  • High area overhead

PROPOSED SYSTEM:

Now, consider the transistor arrangement shown in Fig. 2(a), where duplicate pMOS transistors are connected in parallel. The width of the redundant transistors must also be increased to allow dissipation (sinking) of the deposited charge as quickly as it is deposited, so that the transient does not achieve sufficient magnitude and duration to propagate to the output. If the output is currently high and an energetic particle hits the drain N1 of the nMOS transistor (with the same current source used in the simulations shown in Fig. 1), this should result in a lowered voltage observed at the output. However, due to the employed transistor configuration, the net negative voltage effect is compensated, as evident from Fig. 2(b), resulting in a spike of lesser magnitude compared with the one shown in Fig. 1(b). The spike magnitude is reduced due to the increased output capacitance and the reduced resistance between Vdd and the output.

Figure 1: Effect of energetic particle strike on CMOS inverter att =5 ns. (a) Particle strike model. (b) Effect of particle strike at nMOS drain. (c) Effect of particle strike at pMOS drain

Consider another arrangement of transistors in Fig. 2(c), where redundant nMOS transistors are connected in parallel. If the output is low and an incident energetic particle strikes the drain P1 of the pMOS transistor, then the raised voltage effect at the output shown in Fig. 1(c) is reduced, as shown in Fig. 2(d). This reduction in spike magnitude is due to the same reasons mentioned for the nMOS transistor. Similarly, to protect against both sa0 and sa1 faults, the transistor structures in Fig. 2(a) and (c) can be combined to fully protect the NOT gate. A fully protected NOT gate offers the best hardening by design, but at the cost of higher area overhead and power. It must be noted that the optimal transistor size for SEU immunity depends on the charge Q of the incident energetic particle.

Figure 2: Proposed protection schemes and their effect. (a) Particle hit at nMOS drain, OUT=HIGH. (b) Reduced effect of particle strike at nMOS drain. (c) Particle hit at pMOS drain, OUT=LOW. (d) Reduced effect of particle strike at pMOS drain.

PROPOSED ALGORITHM:

The proposed STR algorithm is presented next. It protects sensitive transistors whose probability of failure (POF) is relatively high. The algorithm can be utilized in two capacities: 1) apply protection until the POF of the circuit reaches a certain threshold, and 2) apply protection until a certain area overhead constraint is met. We will first discuss the different relations that determine the circuit POF; these relations are then used in the proposed algorithm.
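The two usage modes above can be sketched as a single greedy loop. This is an illustrative simplification with invented POF and cost numbers; the paper's actual POF relations and protection model are more detailed.

```python
# Greedy sketch of selective-transistor protection: protect the
# highest-POF transistors first, stopping at either the circuit POF
# target or the area budget (simplified: protection is assumed to
# fully mask a protected transistor's faults).
def select_transistors(pof, cost, pof_target=0.01, area_budget=10.0):
    """pof: {name: failure prob.}; cost: {name: area to protect it}."""
    protected, total_pof, area = [], sum(pof.values()), 0.0
    for name in sorted(pof, key=pof.get, reverse=True):
        if total_pof <= pof_target or area + cost[name] > area_budget:
            break
        protected.append(name)
        total_pof -= pof[name]   # protected fault no longer contributes
        area += cost[name]
    return protected, total_pof, area

# Hypothetical circuit: protecting the two worst transistors already
# meets the POF target, so the third is left unprotected.
chosen, rem, used = select_transistors(
    {'t1': 0.05, 't2': 0.03, 't3': 0.001},
    {'t1': 4.0, 't2': 4.0, 't3': 4.0})
assert chosen == ['t1', 't2'] and rem <= 0.01
```

Sorting by POF is what keeps the area overhead minimal: each unit of added area buys the largest available reliability gain first.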

ADVANTAGES:

  • Low area overhead

SOFTWARE IMPLEMENTATION:

  • Tanner tool

 

A 65-nm CMOS Constant Current Source with Reduced PVT Variation

 

ABSTRACT:

This paper presents a new nanometer-scale low-power constant current reference that attains a small total process–voltage–temperature (PVT) variation. The circuit architecture is based on the embodiment of a process-tolerant bias current circuit and a scaled process-tracking bias voltage source for dedicated temperature-compensated voltage-to-current conversion in a preregulator loop. Fabricated in a UMC 65-nm CMOS process, it consumes 7.18 µW with a 1.4-V supply. The measured results indicate that the current reference achieves an average temperature coefficient of 119 ppm/°C over 12 samples in a temperature range from −30 °C to 90 °C without any calibration. Besides, a low line sensitivity of 180 ppm/V is obtained. This work offers a better sensitivity figure of merit with respect to the reported representative counterparts. The proposed architecture of this paper analyses the logic size, area, and power consumption using the Tanner tool.

EXISTING SYSTEM:

Oguey and Aebischer presented a self-biased topology that biases a triode-biased nMOS transistor through a saturation-biased transistor in a feedback loop for current generation. However, it has difficulty canceling the temperature effect arising from the matching between the mobility temperature exponent and the mobility degradation factor. Besides, the VTH mismatch between the tracking device pair degrades the current accuracy. Due to the topology, the current reference suffers from poor line sensitivity (10%/V). All of these nonideal effects cause the accuracy of the reference current to deviate significantly under process variations (∼±30%). Alternatively, a low-voltage, low-power MOSFET-only self-biased current source was reported; its temperature dependence could reach as high as 2500 ppm/°C over the operating range of −20 °C to 70 °C. To improve the temperature coefficient (T.C.), second-order temperature compensation was suggested. Since both the reference current and its T.C. are process sensitive, unavoidable trimming is required to preserve the accuracy. Later, Bendali and Audet showed a reference circuit utilizing the zero-T.C. point of the transistor to generate a constant output current. This relies on mutual temperature compensation between the carrier mobility and the threshold voltage. Following a similar technique, Ueno et al. realized a current reference with an improved T.C. of 46 ppm/°C. Although a low line sensitivity is achieved, it may not be adequate if the circuit is designed in a nanometer technology. At this juncture, the output current, which is obtained from the saturation-biased transistor serving as a V-I converter, is sensitive to the offset of the driving op-amp, thus increasing the process sensitivity. In another design, a current reference was generated by means of a constant overdrive voltage.
However, the line sensitivity becomes a major concern because the supply voltage needs to be well controlled. Turning to the current-summing design technique, the power consumption is generally high because of the circuit's complexity. Although the floating-gate transistor-based current reference offers a precise current, its trimming method is expensive. The same is true of the low-T.C. current reference, which depends upon a low-T.C. precision-trimmed voltage reference at the expense of drawing extra power. It is of particular note that these reported designs are implemented in 0.18-μm CMOS technology or above. As the technology is further scaled down to sub-100 nm, the performance of current references is degraded by process–voltage–temperature (PVT) variations. This stems from the fact that process variations arising from lithography imperfections and uncontrollable factors such as random dopant fluctuations, the well proximity effect, and layout-dependent stress variation impose challenges for robust circuit design. In addition, the MOS transistors suffer from a high current-leakage level. The temperature-induced variation in the carrier mobility is significantly higher for MOSFET devices in the exemplary 65-nm CMOS technology than in the 0.18-μm technology. Finally, the short-channel effect (SCE) contributes another factor that limits circuit performance in advanced nanometer technology. In brief, the shorter the channel length in a technology, the more difficult it is to achieve a stable reference design, because short-channel transistors exhibit relatively poor output characteristics compared with those in longer-channel technologies. As a consequence, the design in a 65-nm process turns out to be more challenging than that in other processes (>65 nm), even when transistors larger than the minimum size are used.

DISADVANTAGES:

  • High sensitivity to process, voltage, and temperature variations

PROPOSED SYSTEM:

Table I summarizes the acronyms and nomenclature adopted in this paper. The constant current generation is devised from a process-tolerant temperature-compensated voltage-to-current (V-to-I) converter. It aims at establishing a constant VTH0 reference compensation voltage having a first-order T.C. with reduced process sensitivity, in series with an auxiliary compensation voltage having a second-order T.C. with low process sensitivity. The combined temperature characteristic matches the corresponding linear and nonlinear T.C. of the integrated resistor in the V-to-I converter. The outcome is a constant current reference with reduced PVT variation. The operation principle of the proposed circuit is illustrated in Fig. 1.

Figure 1: Operation principle of the proposed IREF circuit with the temperature characteristic of (a) VGS(T), (b) VAux_Comp(T), (c) VR_Comp(T), (d) RO(T), and (e) IREF.

A process-tolerant bias current (IPTol) and a process-tracking voltage (VPTrack) are generated through the IPTol and VPTrack bias circuits in a self-biasing topology. When a scaled IPTol is injected into a MOSFET transistor, it generates a gate–source voltage VGS(T) with a first-order negative T.C., as shown in Fig. 1(a). Fig. 1(c) depicts the target reference compensation voltage VR_Comp(T), which is formed by summing the nonlinear auxiliary compensation voltage VAux_Comp(T) in Fig. 1(b) with the gate–source voltage VGS(T). On the other hand, VAux_Comp(T) is synthesized from a current-to-voltage (I-V) conversion in which the scaled IPTol is passed through an active resistor. The resistor realization is based on a scaled VPTrack biasing the gate of a triode transistor. Since VR_Comp(T) exhibits a T.C. similar to that of the sense resistor RO(T), whose temperature characteristic is shown in Fig. 1(d), the final output current IREF can be made temperature independent over the operating temperature range, as illustrated in Fig. 1(e).
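The matching condition above can be checked with a small numerical sketch: if the combined compensation voltage carries the same first- and second-order T.C. as the sense resistor, their ratio is temperature independent. All coefficient values below are illustrative assumptions, not figures from the paper.

```python
# Illustrative numerical model of the temperature-compensation idea in Fig. 1.
# All coefficients are assumed for illustration, not taken from the paper.
T0 = 27.0                      # reference temperature, degC
a1, a2 = 1.2e-3, 8.0e-7        # assumed linear / quadratic T.C. of sense resistor RO
R0 = 50e3                      # assumed nominal resistance, ohm
V0 = 0.45                      # assumed nominal compensation voltage, V

def RO(T):
    """Sense resistor with first- and second-order temperature coefficients."""
    dT = T - T0
    return R0 * (1.0 + a1 * dT + a2 * dT**2)

def VGS(T):
    """Gate-source voltage: supplies the matching first-order T.C."""
    return V0 * (1.0 + a1 * (T - T0))

def VAux_Comp(T):
    """Auxiliary voltage: supplies the matching second-order T.C."""
    return V0 * a2 * (T - T0)**2

def IREF(T):
    """Reference current: (VGS + VAux_Comp) / RO is temperature independent."""
    return (VGS(T) + VAux_Comp(T)) / RO(T)

currents = [IREF(t) for t in range(-30, 91, 10)]
spread_ppm = (max(currents) - min(currents)) / (sum(currents) / len(currents)) * 1e6
```

With exactly matched coefficients the spread is zero up to rounding; in practice the residual T.C. is set by how well VGS(T) and VAux_Comp(T) track RO(T).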

Implementation of proposed current reference:

The current reference depicted in Fig. 2 consists of a pre-regulator, a process-tolerant current bias circuit with an embedded process-tracking voltage bias circuit, and a temperature-compensated V-to-I converter. For simplicity, the capacitive startup circuit and the biasing circuit are not shown. The pre-regulator loop, which has been reported in a nanometer-based MOSFET VTH measurement circuit, is used to provide good line sensitivity. Referring to Fig. 2, when the op-amp is employed in the bias current circuit, the drain voltages at nodes N1 and N2 are close to each other. This establishes identical currents flowing through M1 and M2, which are biased in the sub-threshold region. On the other hand, MR1 operates in the triode region as an active resistor. It is self-biased by VPTrack, which is generated from M8 in the saturation region. This yields a process-tolerant active resistor RMR1. As such, the current IPTol flowing through M1 or M2 becomes process tolerant.

Figure 2: Schematic of the proposed constant current reference circuit.

ADVANTAGES:

  • Low sensitivity to process, voltage, and temperature variations

SOFTWARE IMPLEMENTATION:

  • Tanner tool

 

An All-MOSFET Sub-1-V Voltage Reference With a −51-dB PSR up to 60 MHz


 

ABSTRACT:

This paper presents a voltage reference (VR) with a power supply rejection (PSR) better than 50 dB for frequencies of up to 60 MHz, using MOSFETs biased in strong inversion. Another innovation is a compact MOSFET low-pass filter, which was developed along with a feedback technique to achieve a wide-bandwidth PSR not attained in previous works. The proposed all-MOSFET VR was fabricated using a standard 0.18-µm CMOS process. The proposed architecture of this paper is analyzed for logic size, area, and power consumption using the Tanner tool.

EXISTING SYSTEM:

Sub-threshold-based MOSFET (STBM) VRs, compared with BGRs, have an inherent advantage in power consumption. However, due to the effects of the temperature-dependent gate-to-surface coupling coefficient, poor transistor-ratio matching, and model inaccuracies, designing STBM VRs is usually an inexact process, resulting in inferior performance. Furthermore, it is difficult to achieve compact and low-noise VRs due to either the lack of accurate models or the low sheet resistance in digital CMOS processes. For modern SoC applications, embedded VRs must have high power supply rejection (PSR) over a wide frequency bandwidth in order to reject noise from the high-speed on-chip digital circuits. Conventional techniques, such as using long channel lengths, cascode structures, and pre-regulation, are usually adopted in VRs to improve PSR. However, such techniques can only improve the low-frequency performance at the expense of headroom, area, and power dissipation. A resistor-less low-power non-band-gap VR is presented in this paper. All the MOSFETs in the core are standard CMOS transistors biased in strong inversion. Using strong inversion costs some extra power compared with weak-inversion designs; however, it yields a more accurate reference, since strong inversion leads to better transistor-ratio matching than weak inversion. In the proposed VR, a proportional-to-absolute-temperature (PTAT) voltage is converted into a current (I) proportional to the mobility (μ) and the square of temperature (T²), i.e., I ∝ μT². This current is used to extract a temperature-stable reference voltage from the VGS of a diode-connected nMOS. In addition to a compact all-MOSFET passive low-pass filter (LPF), a feedback technique is also proposed for wide-bandwidth PSR enhancement. The VR is fabricated in a standard 0.18-μm CMOS process.
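The I ∝ μT² idea can be sketched numerically: for a square-law diode-connected nMOS, the mobility term cancels inside the square root of the overdrive, leaving a PTAT overdrive that can cancel the CTAT slope of VTH. The device parameters below are assumed for illustration only, not taken from the paper.

```python
import math

# Sketch of the I ∝ μT² compensation described above; all device numbers are assumed.
T0 = 300.0                 # reference temperature, K
VTH0 = 0.45                # assumed threshold voltage at T0, V
kv = 1.0e-3                # assumed VTH slope magnitude, V/K (VTH falls with T)
k_uCox = 200e-6            # assumed μ(T0)·Cox·(W/L), A/V^2
mu_exp = -1.5              # assumed mobility exponent: μ(T) = μ(T0)·(T/T0)^mu_exp

def mobility_factor(T):
    return (T / T0) ** mu_exp

def bias_current(T):
    # I ∝ μT²: the constant is chosen so the overdrive slope cancels the VTH slope.
    c = kv  # overdrive per kelvin
    return 0.5 * k_uCox * mobility_factor(T) * (c * T) ** 2

def vref(T):
    # Diode-connected nMOS in strong inversion: VGS = VTH + sqrt(2I/(μCox·W/L)).
    # The μ(T) factor cancels between I and the square-root denominator.
    vth = VTH0 - kv * (T - T0)
    vov = math.sqrt(2.0 * bias_current(T) / (k_uCox * mobility_factor(T)))
    return vth + vov

refs = [vref(t) for t in (250.0, 300.0, 350.0)]
```

In this idealized model the overdrive is exactly kv·T, so VREF stays at VTH0 + kv·T0 at every temperature; a real device adds curvature that the trimming and compensation branches must absorb.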

DISADVANTAGES:

  • Inflexible design
  • Low performance

PROPOSED SYSTEM:

The conceptual block diagram of the proposed VR circuit is shown in Fig. 1. A PTAT voltage is generated using a PTAT voltage generator and converted into a current proportional to μT², which serves as the bias current for the whole VR.

Figure 1: Conceptual block diagram of the proposed VR circuit. The PTAT voltage generator provides the voltage that is converted into a bias current for the whole VR. Finally, a diode-connected nMOS transistor (active load) is used to generate a temperature-independent reference.

A 5-bit trimming is applied to the output current to cancel the effects of process variation. Also, in order to compensate for the unwanted parasitic diode leakage current at high temperatures, a temperature-compensation branch is introduced to increase the output bias current at high temperatures. The PTAT voltage generator, the V-to-I converter, and the feedback form a self-biased current source. The feedback in the self-biased current source provides feed forward and feedback paths for the power supply noise (PSN). These paths enhance the PSR performance of the VR up to medium frequencies. The passive LPF is a compact all-MOSFET LPF used to attenuate the supply noise at high frequencies.

PTAT Voltage Generator:

The minimum supply voltage in most VRs is dependent on the reference output voltage. Hence, a simple way to decrease the minimum supply voltage of a VR circuit is to reduce the reference's output voltage. In BGRs, this can be done either by making sure that only a fraction of the material band gap affects the circuit or by lowering the material band gap. The latter is not feasible in standard CMOS technologies, but the former solution is practical and can be implemented in several ways. The most power-efficient and compact technique for low-power, low-voltage BGR design is to virtually lower the material band gap using an electrostatic field. By this, the circuit "senses" a band gap that is the material band gap lowered by the electrostatic field. To implement this method, one has to replace the conventional bipolar junction transistor (BJT) with a dynamic-threshold MOS transistor (DTMOS).

Figure 2: (a) Cross section of a DTMOS transistor and schematic connection. (b) PTAT generation from DTMOS. (c) Simulated temperature characteristics of DTMOS and corresponding PTAT voltage.

A cross section of a DTMOS is shown in Fig. 2(a). As can be seen, this device is a standard pMOS transistor with an interconnected body and gate, and its operation is similar to that of a BJT or a weak-inversion MOS transistor; hence, it can be viewed as a lateral BJT with an extra gate over the base, or as an MOS transistor whose threshold voltage is dynamically controlled by the VGS voltage.
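As a rough model of the PTAT generation in Fig. 2(b), the difference between the gate–source voltages of two DTMOS devices run at different current densities behaves like a bipolar/weak-inversion pair: ΔVGS = n·(kT/q)·ln(N). The slope factor n and density ratio N below are assumed values for illustration.

```python
import math

# Minimal sketch of PTAT generation from a DTMOS pair (cf. Fig. 2(b)),
# modeled like a weak-inversion/bipolar pair; n and N are assumed values.
K_B_OVER_Q = 8.617e-5      # thermal-voltage coefficient kB/q, V/K
n = 1.2                    # assumed slope factor of the DTMOS devices
N = 8                      # assumed current-density ratio between the pair

def v_ptat(T_kelvin):
    """Difference of the two DTMOS gate-source voltages: n*(kT/q)*ln(N)."""
    return n * K_B_OVER_Q * T_kelvin * math.log(N)

v250 = v_ptat(250.0)
v350 = v_ptat(350.0)
slope = (v350 - v250) / 100.0   # V/K, constant for an ideal PTAT voltage
```

The voltage is strictly proportional to absolute temperature in this ideal model, which is what makes it usable as the input of the V-to-I converter.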

Self-Biased Current Source:

The self-biased current source serves as the core of the proposed VR and consists of an asymmetric source-input voltage-to-current converter (formed by transistors MN1, MN2, MP1, and MP2), the PTAT voltage generator formed by DTMOS1 and DTMOS2, and a feedback branch (consisting of DTMOS0, MN0, and MP0). In Fig. 3, the asymmetric source-input voltage-to-current converter (transconductance) is used to convert the PTAT voltage from the DTMOS into a bias current for the VR. In addition to minimizing the voltage difference between nodes A and B, the feedback is also used to realize a partial feed forward transfer function for the PSN.

Figure 3: Proposed VR circuit

Startup Circuit:

A startup circuit (formed by MS1–MS3, as shown in Fig. 3) is used to ensure that the desired operating state is reached. At startup, the transistor MS1 is OFF since VREF is zero. This makes the voltage across the MOSFET capacitor zero; hence, node H is pulled up to VDD. As a result, MS3 is turned ON, pulling down node G and thereby allowing IBIAS to flow.

Figure 4: Leakage compensation configuration. (a) Low temperature. (b) High temperature.

A thermal switch, formed by transistors MN4, MN5, and MP6, is used to allow ICOMP (a copied multiple of IBIAS) to flow into the drain of MN3 to compensate for the leakage current. MN5, with its gate connected across VDTMOS1, is OFF at low temperatures (below 50 °C) since its threshold voltage at these temperatures is higher than VDTMOS1. As a result, MP6 is also OFF, cutting off ICOMP. MN5 gradually begins to turn on as the temperature increases, because VTH decreases faster with temperature than VDTMOS1 does. MP6 is progressively turned ON as the temperature increases since its gate voltage is slowly lowered. This allows ICOMP to increase steadily to compensate for the leakage current as the temperature increases. Fig. 4(a) and (b) illustrates the active load and leakage-compensation circuit configuration for low- and high-temperature operation, respectively.
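The behavior of the thermal switch can be sketched with a simple model: junction leakage roughly doubling every 10 °C (a common rule of thumb, not a figure from the paper) and a compensation current that is cut off below the 50 °C threshold mentioned in the text. The leakage magnitude is an assumed value.

```python
# Rough behavioral model of the leakage-compensation branch. The
# doubling-per-10-degC law is a rule of thumb and the 50 degC switch
# threshold follows the text; the leakage magnitude is assumed.
I_LEAK_50 = 1e-9          # assumed parasitic diode leakage at 50 degC, A

def leakage(T_c):
    """Junction leakage roughly doubles every 10 degC."""
    return I_LEAK_50 * 2.0 ** ((T_c - 50.0) / 10.0)

def i_comp(T_c):
    """Thermal switch: ICOMP is cut off below 50 degC, then tracks the leakage."""
    if T_c < 50.0:
        return 0.0
    return leakage(T_c)      # ideal case: the copied bias current matches the leakage

def net_error(T_c):
    """Current error seen by the active load after compensation."""
    return leakage(T_c) - i_comp(T_c)
```

Below 50 °C the uncompensated leakage is negligible; above it, the ideal model cancels the exponentially growing leakage exactly, while a real circuit only tracks it approximately through the gradual turn-on of MN5 and MP6.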

ADVANTAGES:

  • Design flexibility
  • High performance

SOFTWARE IMPLEMENTATION:

  • Tanner tool

 

Conditional-Boosting Flip-Flop for Near-Threshold Voltage Application


 

ABSTRACT:

A conditional-boosting flip-flop is proposed for ultra-low-voltage applications where the supply voltage is scaled down to the near-threshold region. The proposed flip-flop adopts voltage boosting to provide low latency with reduced performance variability in the near-threshold voltage region. It also adopts conditional capture to minimize the switching power consumption by eliminating redundant boosting operations. Experimental results in a 65-nm CMOS process indicated that the proposed flip-flop provided up to 72% lower latency with 75% less performance variability due to process variation, and up to 67% improved energy-delay product at 25% switching activity, compared with conventional precharged differential flip-flops. The proposed architecture of this paper is analyzed for logic size, area, and power consumption using the Tanner tool.

EXISTING SYSTEM:

Capacitive boosting can be a solution to overcome the problems caused by aggressive voltage scaling. It allows the gate-source voltage of some MOS transistors to be boosted above the supply voltage or below ground. The enhanced driving capability of transistors thus obtained can reduce the latency and its sensitivity to process variations. The bootstrapped CMOS driver presented in [8] relies on this technique to drive heavy capacitive loads with substantially reduced latency. However, since it is a static driver, every input transition causes a bootstrapping operation, so if some of the transitions are redundant, a large amount of redundant power consumption may occur. The conditional-bootstrapping latched CMOS driver [9] proposes the concept of conditional bootstrapping to eliminate this redundant power consumption. As a latched driver, it allows boosting only when the input and output logic values differ, resulting in no redundant boosting and improved energy efficiency, especially at low switching activity. Recently, a differential CMOS logic family adopting the boosting technique has also been proposed for fast operation in the near-threshold voltage region.

DISADVANTAGES:

  • Low speed
  • Redundant boosting wastes power

PROPOSED SYSTEM:

To incorporate conditional boosting into a precharged differential flip-flop, four different scenarios regarding input-data capture should be considered, determined by the logic states of the input and output. These scenarios are as follows:

  • For low output data, high input data should trigger boosting for a fast capture of the incoming data;
  • For low output data, low input data should trigger no boosting since the input need not be captured;
  • For high output data, low input data should trigger boosting for a fast capture of the incoming data;
  • For high output data, high input data should trigger no boosting.

These scenarios can be embodied in a circuit topology using a single boosting capacitor by combining two operating principles. One is that the voltage presetting of the boosting-capacitor terminals must be determined by the data stored at the output (so-called output-dependent presetting). The other is that boosting operations must be conditional on the input data given to the flip-flop (so-called input-dependent boosting). The conceptual circuit diagrams supporting these principles are shown in Fig. 1. To support the output-dependent presetting, the preset voltages of capacitor terminals N and NB are determined by outputs Q and QB, as shown in Fig. 1(a). If Q and QB are low and high, N and NB are preset low and high [left diagram in Fig. 1(a)], and if Q and QB are high and low, N and NB are preset high and low [right diagram in Fig. 1(a)], respectively. To support the input-dependent boosting, the non-inverting input (D) is coupled to NB through an nMOS transistor, and the inverting input (DB) is coupled to N through another nMOS transistor, as shown in Fig. 1(b). Then, in one case, in which a low data is stored in the flip-flop, resulting in the capacitor presetting given in the left diagram in Fig. 1(a), a high input allows NB to be pulled down to ground, letting N be boosted toward −VDD due to capacitive coupling [upper left diagram in Fig. 1(b)]. Meanwhile, a low input allows N to be connected to ground, but since that node is already preset to VSS, there is no voltage change and hence no boosting [lower left diagram in Fig. 1(b)]. In the other case, in which a high data is stored in the flip-flop, resulting in the capacitor presetting given in the right diagram in Fig. 1(a), a low input allows N to be pulled down to ground, letting NB be boosted toward −VDD due to capacitive coupling [lower right diagram in Fig. 1(b)].

Figure 1: Conceptual circuit diagrams for (a) output-dependent presetting and (b) input-dependent boosting.

Meanwhile, a high input allows NB to be connected to ground, but since that node is already preset to VSS, there is no voltage change and hence no boosting [upper right diagram in Fig. 1(b)]. Table I summarizes these operations. With these operations, any redundant boosting is eliminated, resulting in a significant power reduction, especially at low switching activity.

Table 1: DATA-DEPENDENT PRESETTING AND BOOSTING
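The four scenarios collapse to a single condition, boost exactly when the input differs from the stored output, which a short sketch makes explicit:

```python
# The four capture scenarios reduce to: boost exactly when D differs from Q.
# This mirrors the summary table; HIGH/LOW are modeled as booleans.
def should_boost(q_stored, d_input):
    """Conditional boosting: capture (and boost) only when D differs from Q."""
    return d_input != q_stored

scenarios = {
    ("Q=LOW",  "D=HIGH"): should_boost(False, True),   # boost: capture incoming 1
    ("Q=LOW",  "D=LOW"):  should_boost(False, False),  # no boost: nothing to capture
    ("Q=HIGH", "D=LOW"):  should_boost(True,  False),  # boost: capture incoming 0
    ("Q=HIGH", "D=HIGH"): should_boost(True,  True),   # no boost
}
```

Only two of the four input/output combinations trigger boosting, which is where the power saving at low switching activity comes from.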

Circuit Implementation:

The structure of the proposed conditional-boosting flip-flop (CBFF), based on the concepts described in the previous section, is shown in Fig. 2. It consists of a conditional-boosting differential stage, a symmetric latch, and an explicit brief-pulse generator. In the conditional-boosting differential stage shown in Fig. 2(a), MP5/MP6/MP7 and MN8/MN9 perform the output-dependent presetting, whereas MN5/MN6/MN7 with boosting capacitor CBOOT perform the input-dependent boosting. MP8–MP13 and MN10–MN15 constitute the symmetric latch, as shown in Fig. 2(b). Some transistors in the differential stage are driven by a brief pulsed signal PS generated by a novel explicit pulse generator shown in Fig. 2(c). Unlike conventional pulse generators, the proposed pulse generator has no pMOS keeper, resulting in higher speed and lower power because there is no signal fighting during the pull-down of PSB. The role of the keeper in maintaining a high logic value at PSB is taken over by MP1, added in parallel with MN1, which also helps a fast pull-down of PSB. At the rising edge of CLK, PSB is rapidly discharged by MN1, MP1, and I1, driving PS high. After the latency of I2 and I3, PSB is charged by MP2, and PS returns low, resulting in a brief positive pulse at PS whose width is determined by the latency of I2 and I3. When CLK is low, PSB is maintained high by MP1, although MP2 is OFF. According to our evaluation, the energy reduction is up to 9% for the same slew rate and pulse width.

Figure 2: Proposed CBFF. (a) Conditional-boosting differential stage. (b) Symmetric latch. (c) Explicit brief pulse generator.
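The magnitude of the boosting can be estimated with a simple charge-sharing model: when the far plate of CBOOT swings by ΔV, the boosted node moves by ΔV scaled by the capacitive divider CBOOT/(CBOOT + Cparasitic). The capacitor and parasitic values below are assumptions, not taken from the paper.

```python
# Back-of-the-envelope estimate of the boosted node voltage; CBOOT and the
# parasitic capacitance are assumed values, not taken from the paper.
VDD = 0.5                  # near-threshold supply, V
C_BOOT = 20e-15            # assumed boosting capacitor, F
C_PAR = 4e-15              # assumed parasitic capacitance at the boosted node, F

def boosted_voltage(v_preset, dv_far_plate):
    """Charge sharing: the boosted node moves by dv scaled by CBOOT/(CBOOT+CPAR)."""
    return v_preset + dv_far_plate * C_BOOT / (C_BOOT + C_PAR)

# N preset to 0 V; NB falls from VDD to 0 V, so N is pushed toward -VDD.
v_n = boosted_voltage(0.0, -VDD)
```

With these assumed values the node reaches only a fraction of −VDD; minimizing the parasitic capacitance at the boosted node is what keeps the coupling ratio, and hence the extra gate drive, close to the ideal −VDD.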

ADVANTAGES:

  • High speed
  • High performance

SOFTWARE IMPLEMENTATION:

  • Tanner EDA