Low-Power Near-Threshold 10T SRAM Bit Cells With Enhanced Data-Independent Read Port Leakage for Array Augmentation in 32-nm CMOS
The conventional six-transistor static random access memory (SRAM) cell allows high density and fast differential sensing but suffers from half-select and read-disturb issues. Although the conventional eight-transistor SRAM cell solves the read-disturb issue, it still suffers from low array efficiency due to deterioration of read bit-line (RBL) swing and Ion/Ioff ratio with increase in the number of cells per column. Previous approaches to solve these issues have been afflicted by low performance, datadependent leakage, large area, and high energy per access. Therefore, in this paper, we present three iterations of SRAM bit cells with nMOS-only based read ports aimed to greatly reduce datadependent read port leakage to enable 1k cells/RBL, improve read performance, and reduce area and power over conventional and 10T cell-based works. We compare the proposed work with other works by recording metrics from the simulation of a 128-kb SRAM constructed with divided-wordline-decoding architecture and a 32-bit word size. Apart from large improvements observed over conventional cells, up to 100-mV improvement in read-access performance, up to 19.8% saving in energy per access, and up to 19.5% saving in the area are also observed over other 10T cells, thereby enlarging the design and application gamut for memory designers in low-power sensors and battery-enabled devices.
STATIC Random Access Memory (SRAM) occupies a significant portion of a system-on-a-chip (SoC) and has a notable contribution to the total power consumption and area of the SoC. Since area is an important factor when designing circuits, memory design engineers aim to place as many cells as possible per column to allow sharing of peripheral circuitry. The conventional 6T and 8T cells are greatly limited by their inability to work in longer columns. This is because they suffer from data dependent leakage and degraded ION/IOFF ratio and read bit-line swing as more cells are placed on a single column. Therefore, there is a need to design new circuits to address this issue. Previous approaches have tried to solve this issue by improving the ION/IOFF ratio to enable up to 1k cells per column. Although these approaches have been successful at this task, these still suffer from large area or varying data-dependent performance. Some also fail to account for the minimum energy point in SRAMs and therefore, consume a lot of energy per access at ultra-low voltages. This work describes three iterations of SRAM bit cells with nMOS-only based read ports aimed to greatly reduce data-dependent read port leakage to enable 1k cells per RBL, improve read performance, and reduce area and power over conventional 6T and 8T cells and other novel read-port based cells. With a unique topology in each of the three cells’ read port, we obtain improved read access performance, low energy per access, and low area respectively, thereby enlarging the design and application gamut for memory designers in low power sensors and battery enabled devices.
SRAM’s impact has become especially important due to the emergence of battery powered portable devices and low power sensor applications. Most SRAM design effort has been led to facilitate voltage scaling and improving yield. The conventionally implemented six transistor (6T) cell in SRAMs allows high density, bit-interleaving and fast differential sensing but suffers from half-select stability, read-disturb stability, and conflicting read and write sizing. Previous attempts to solve these issues have included the implementation of assist techniques, novel cell design, architectural improvements, or technological developments.
Half-select and read-disturb issues in SRAMs can be mitigated by optimization of word-line voltage level. This includes word-line under-drive assists using process corner tracking or using replica access transistors. Delayed word-line boost to match the internal voltage of half-selected cells to that of the bit-line during a read operation helps to improve their stability but requires fine tuning to establish the sensitive tradeoff between read stability and write ability. Cell supply boost assist can also be used to improve half-select stability by increasing the drive strength of pull down nMOS.
Negative cell ground implementation to improve read stability is the most effective assist but has high energy cost due to use of multiple GND rails. Disturb issues can also be mitigated by partial precharge of bit-lines to decrease the strength of access transistors. Pilo et al. make use of regulators to reduce the precharge voltage level of the bit-lines to around 70% of supply voltage to improve the read stability. Alternatively, the bit-lines can be precharged using an nMOS instead of a pMOS to obtain a single VTH drop on the bit-lines. A process variation tolerant selective precharge assist has also been used to decrease bit-line voltage level using charge sharing to improve half-select disturb issues. However, such partial bit-line precharge techniques reduce read ability and become less effective at lower voltages due to reduced VDS of the access transistors. Multiple supply line assist can also be used to improve read and write half-select stability issues in SRAMs. In a column-based dynamic supply technique was proposed. By implementing different supply voltages for read, write and standby modes, it relieved half-select stability issues and allowed bit-interleaving. However, this resulted in increase in dynamic power, design and routing effort and area due to generation of multiple supply voltages.
Although assist techniques can be beneficial in improving the performance and yield of SRAMs, they can often have a deteriorating complementary effect on write and read operations. They can also incur large area overhead, increase the energy per access, and have a limited and saturating effect on yield. Furthermore, since read and write stability is greatly dependent on temperature variations, an SRAM can either be write-limited at lower temperatures or read-limited at higher temperatures. Therefore, assists often require process and temperature tracking for effective yield improvement.
Apart from assist techniques, improvements on the architectural front have also been made to address half-select and read disturb stability issues. These include cross-point selection of words using both row and column word-lines to improve half select stability. Shorter bit-lines can also be used to improve read stability. These work by reducing bit line capacitance, thereby improving dynamic read margin. However, this comes at the expense of large area overhead due to greater number of cell banks. In another work, an array architecture with an area overhead of 12% was implemented in order to address the half-select disturb issue by decoupling the large bit-line capacitance from half-selected cells. Read and-write-back scheme has also been used to alleviate the write-disturb in half-select cells. It allows data retention by writing back the stored data after each read. However, such techniques increase the dynamic power consumption since every column is subjected to full voltage swings. Additionally, the sense amplifier cannot be shared amongst several columns and has to be integrated in each column, thereby incurring a large area overhead.
Figure 1 : Schematic of (a) 6T (b) 8T SRAM cell.
With the 6T SRAM cell being afflicted by various stability issues, the 8T SRAM cell has been proposed (shown in Fig. 1). It has a decoupled read path comprising of two nMOS transistors. Although it eliminates the read-disturb issue, it is still afflicted by a pseudo-read during a write operation in half-selected cells on the same row. As such, the issue of loss of bit-interleaving capability arises. Bit-interleaving is essential to low voltage SRAM operation since it is combined with Error-Correction Code (ECC) to combat soft errors and achieve required yield targets. Soft errors, including Single Bit Upsets (SBUs) and Multiple Cell Upsets (MCUs) are caused by bombardment of alpha-particles, thermal neutrons or high energy cosmic rays. The rate of soft errors increases by 18% for every 10% decrease in supply voltage. This is especially problematic for low voltage SRAMs, since in sub-threshold operation region, the critical charge in nodes is significantly reduced, leading to frequent MCUs. In MCUs have been mitigated by implementing and combining bit-interleaving structure with ECC. In addition, bit-interleaving capable cell structures such as the column-decoupled 8T cell in , disturb-free 9T cell in, two-port disturb-free 9T cell, multi-port 9T cell in and the differential 10T cell to enable bit-interleaving and remove half-select disturb issues by using both row and column word-lines. For cell structures without interleaving capability such as the single ended 8T cell, additional parity or ECC bits can be interleaved per word for soft error correction.
Even if the read and write disturb issues are alleviated using the methods described above, an array implemented using the 8T cells has low array efficiency. This is because, its single ended mechanism requires a hierarchical sensing architecture which implements as few as eight cells per local RBL and multiple local RBLs per global RBL. Additionally, unlike the fast differential sensing in the 6T cell, the single ended sensing has a slow full swing operation. As greater number of cells are put on the same local RBL in order to improve array efficiency, both delay and the read bit-line voltage swing are greatly affected. Therefore, this form of hierarchical sensing does not compare to differential sensing in terms of both performance and array efficiency. Although many techniques have been proposed to improve the single ended read sensing performance, the area overhead still remains large. In order to improve the array efficiency and read bit-line voltage swing of single-ended-read cells, many modified read ports have been proposed.
These designs aim to put up to 1k cells per bit-line by improving the ION/IOFF ratio of SRAM read ports. This approach helps to greatly improve the array efficiency as peripheral circuitry can be shared amongst greater number of cells. Although these approaches have been successful at this task, these still suffer from large area, varying data-dependent performance and high energy consumption. In this work, we propose three iterations of SRAM bit cells with nMOS-only based read ports and compare them with conventional 6T and 8T cells and previous 10T cell-based works by measuring metrics from simulation of a 128kb array on the 32nm technology node. We compute minimum energy per access for all cells considering different activity factors for various levels of caches and calculate dynamic failure rate based on operating frequency and process variations.
- More area and power consumptions
- Maximum energy point in SRAM
- More Transistor Count
PROPOSED SRAM BIT CELLS:
Topology of Proposed Bit Cells:
The schematic of the proposed 10T SRAM cells is shown in Fig. 2. Each of them comprises of cross coupled inverters (PUL-PDL and PUR-PDR) and two access transistors (ACL and ACR). The read port of each cell consists of four nMOS (R1, R2, R3 and R4). The read port in Fig. 2(a) has improved data-dependent read bit-line leakage and is aimed at high performance. The read ports in Fig. 2(b) and (c) have complete data-independent read bit-line leakage and are aimed at very low power and high density respectively. The working of each port has been explained in the next section. From here on, the proposed cells are referred to as 10T-P1, 10T-P2 and 10T-P3.
Figure 2 Schematic of the proposed (a) 10T-P1 (b) 10T-P3 (c) 10T-P2 cells.
Bit Cell Working Mechanism:
When operating in near and sub-threshold region, the ION/IOFF is severely degraded and it becomes increasingly difficult to implement greater number of cells on a single column. As the number of cells increase, the combined pass gate leakage becomes comparable to the read current, thereby making it difficult for the sense amplifier to correctly evaluate the read bit-line voltage level. Furthermore, the data stored in the cell also affects the read bit-line leakage, thereby making the off-state read bit-line leakage current to fluctuate highly. This is exacerbated at ultra-low voltages, where the worst-case data pattern can lead to the RBL voltage level of ‘zero’ becoming greater than the RBL voltage level of ‘one’.
Figure 3 : Schematic of read port of (a) Calhoun and Chandrakasan  (b) Kim et al.  (c) Pasandi and Fakhraie  (d) Proposed 10T-P1 (e) 10T-P2 (f) 10T-P3 cell.
In order to improve the ION/IOFF ratio, the read port shown in Fig. 3(a) was proposed. When the cell stores ‘one,’ the R2 pMOS charges the intermediate node, thereby greatly reducing the read bit-line leakage through R1 nMOS. However, this also leads to flow of leakage current from intermediate node into the RBL. The combined leakage of all cells on the same column can raise the low logic level of RBL to several hundred millivolts, thereby leading to reduced voltage swing and sensing margin. The conceptual scenario of the effective read bit-line voltage swing for this case has been depicted in Fig. 4(a). On the other hand, when the cell stores ‘zero,’ the RBL leakage is reduced through the stacking effect of nMOS. Therefore, such a topology makes the effective RBL swing largely dependent on the data pattern in the column. In another work , the data dependency was removed by creating a data-independent leakage path between the cell’s read port and the RBL. This led to a significant voltage swing on the RBL even at lower voltages. The read port and the corresponding effective RBL swing for the same has been shown in Fig. 3(b) and Fig. 4(b) respectively. A recent work , also proposed a modified read port [shown in Fig. 3.(c)], to improve the ION/IOFF ratio. However, it is also afflicted by the data-dependent leakage path issue. Depending upon the data stored in the cell, the leakage from intermediate node to RBL can change drastically, thereby leading to varying low logic voltage levels of RBL. Despite this issue, it is able to maintain an RBL swing, as shown in Fig. 4(c). From here on, the cells in Fig. 3(a), (b) and (c) will be referred to as the 10T-C, 10T-K and 10T-P cells respectively. Like the proposed cells, these cells also have the same topology for the write port and differ in terms of the read port only.
Figure 4 : Conceptual Effective Read Bit-Line Swing of (a) Calhoun and Chandrakasan  (b) Kim et al.  (c) Pasandi and Fakhraie  (d) Proposed 10T-P1 (e) 10T-P2 (f) 10T-P3 cell.
The schematic of the proposed read ports is shown in Fig. 3(d)–(f). The proposed 10T-P2 and 10T-P3 cells are aimed at low power and low area respectively while simultaneously maintaining a data-independent ION/IOFF ratio. The principle behind their working is depicted in Fig. 5(c) and (d). As seen in Fig. 5, the magnitude of Ileak becomes equal in both read ‘zero’ and read ‘one’ case. This helps to maintain the required difference in magnitude between accessed-cell current in both cases. As such, a significant effective RBL swing can be observed, as shown in Fig. 4(e) and (f). This is not possible in the case of conventional 8T cell sensing, because of the large dependence of leakage current on the data pattern.
Although the proposed 10T-P1 cell decreases its data dependency in comparison to the 10T-C cell as seen in Fig. 4(d), it largely remains incapable of performing a read operation at ultra-low voltages. However, in the following subsection, we show that operating at ultra-low voltages increases the energy per access and operating near the threshold point is optimal for lowest energy consumption. As such, the 10T-P1 cell is operated near the sub-threshold region for lowest energy consumption and highest performance. At near-threshold and super-threshold voltages, the read bit-line swing is not an issue for the 10T-P1 cell. A more comprehensive analysis of RBL swing of each cell with respect to data pattern, supply voltage and temperature is presented in the next section.
- Less area and power consumptions
- Minimum energy point in SRAM
- Reduced Transistor Count
 B. H. Calhoun and A. P. Chandrakasan, “A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation,” IEEE J. Solid-State Circuits, vol. 42, no. 3, pp. 680–688, Mar. 2007.