Extending 3-bit Burst Error-Correction Codes With Quadruple Adjacent Error Correction
The use of error-correction codes (ECCs) with advanced correction capability is a common system-level strategy to harden the memory against multiple bit upsets (MBUs). Therefore, the construction of ECCs with advanced error correction and low redundancy has become an important problem, especially for adjacent ECCs. Existing codes for mitigating MBUs mainly focus on the correction of up to 3-bit burst errors. As the technology scales and cell interval distance decrease, the number of affected bits can easily extend to more than 3 bit. The previous methods are therefore not enough to satisfy the reliability requirement of the applications in harsh environments. In this paper, a technique to extend 3-bit burst error-correction (BEC) codes with quadruple adjacent error correction (QAEC) is presented. First, the design rules are specified and then a searching algorithm is developed to find the codes that comply with those rules. The H matrices of the 3-bit BEC with QAEC obtained are presented. They do not require additional parity check bits compared with a 3-bit BEC code. By applying the new algorithm to previous 3-bit BEC codes, the performance of 3-bit BEC is also remarkably improved. The encoding and decoding procedure of the proposed codes is illustrated with an example. Then, the encoders and decoders are implemented using a 65-nm library and the results show that our codes have moderate total area and delay overhead to achieve the correction ability extension.
- Xilinx 14.2
Reliability is an important requirement for space applications. Memories as the data storing components play a significant role in the electronic systems. They are widely used in the system on a chip and application-specific integrated circuits. In these applications, memories
Fig. 1. Memory cell area of different technology (cell area shape is simplified to a square, and Length is the length of side
account for a large portion of the circuit area. This makes memories suffer more space radiation than other components. Therefore, the sensitivity to radiation of memories has become a critical issue to ensure the reliability of electronic systems. In modern static random access memories (SRAMs), radiation-induced soft errors in the form of the single event upset (SEU) and multiple bit upset (MBU) are two prominent single event effects. As semiconductor technology develops from the submicrometer technology to the ultradeep submicrometer (UDSM) technology, the size of memory cells is smaller and more cells are included in the radius affected by a particle as shown in Fig. 1.
When a particle from a cosmic ray hits the basic memory cell, it generates a radial distribution of electron–hole pairs along the transport track. These generated electron–hole pairs can cause soft errors by changing the values stored in the memory cell leading to data corruption and system failure. For the transistors with a large feature size, a radiation event just affects one memory cell, which means that only the SEU occurs. In this case, the use of single error-correction (SEC)- double error-detection (DED) codes is enough to protect the memory from radiation effects
As the feature size enters into DSM range, the critical charge keeps decreasing and the area of the memory cell scales down for each successive technology node. This makes more memory cells affected by a particle hit as shown in Fig. 2. For the CMOS bulk technology, with the cell-to-cell spacing decreasing, the electron–hole pairs generated in the substrate can diffuse to nearby cells and induce MBUs .
Fig. 2. Schematic description of memory cells included in the radiation effect with variation of the technology node.
This compares with the FDSOI technology, which isolates transistors and limits the multicollection mechanism. Therefore, the multicollection mechanism is more prominent for a bulk technology, and the MBU probability is higher. Although multiple bit error-correction codes (ECCs) can correct multiple bit errors in any error patterns not limited to the adjacent bits, the complexity of the decoding process and the limitation of the code block size limit their use. Meanwhile, from the generation principle of MBUs , the type of the MBUs depends on the initial angle of incidence and scattering angle of the secondary particles. Based on this, adjacent bit errors are dominant error patterns among the multiple bit errors. Therefore, adjacent bits correction ECCs become popular in memory-hardened designs. Many codes are proposed, and the capability of adjacent bits correction mainly focuses on the double adjacent error correction (DAEC), triple adjacent error correction (TAEC), and 3-bit burst error correction (BEC). An alternative to codes that can correct adjacent errors is to use an SEC or SEC-DED code combined with an interleaving of the memory cells. Interleaving ensures that cells that belong to the same logical word are placed physically apart. This means that an error on multiple adjacent cells affects multiple words each having only one bit error that can be corrected by an SEC code. As noted in previous studies, interleaving makes the interconnections and routing of the memory more complex and it will lead to an increase area and power consumption or limitations in the aspect ratio. Therefore, whether it is better to use SEC plus interleaving or a code that can correct adjacent errors will be design-dependent and both alternatives are of interest
As the technology comes to the UDSM, the area of the memory cell keeps further decreasing and even memories having atomic-dimension transistors appear. The ionization range of ions with the order of magnitude in micrometer can include more memory cells in the word-line direction as shown in Fig. 2 than the three bits previously considered. This means that the SEC-DAEC-TAEC codes may not be effective to ensure the memory reliability. Codes with more advanced correction ability are demanded. For example, codes designed with low redundancy for SEC-DAEC-TAEC and 3-bit BEC are presented. Therefore, extending the correction ability to quadruple adjacent bit errors would be of interest, especially if it can be done without adding extra parity bits.
In this paper, we present an improvement of 3-bit BEC codes to also provide quadruple adjacent error correction (QAEC). The code design technique for the QAEC with low redundancy is specified from two aspects: 1) error space satisfiability; and 2) unique syndrome satisfiability. Codes with QAEC for 16, 32, and 64 data bits are presented. From the view of the integrated circuits design, two criteria have been used to optimize the decoder complexity and decoder delay at the ECCs level: 1) minimizing the total number of ones in the parity check matrix and 2) minimizing the number of ones in the heaviest row of the parity check matrix. Additionally, based on the traditional recursive backtracing algorithm, an algorithm with the function of weight restriction and recording the past procedure is developed. The new algorithm not only reduces the cost of program run time, but also remarkably improves the performance of previous 3-bit BEC codes. The encoders and decoders for the QAEC codes are implemented in Verilog hardware description language (HDL). Area overhead and delay overhead are obtained by using a TSMC bulk 65-nm library. Compared with the previous 3-bit BEC codes, the area and delay overhead is moderate to achieve the correction ability extension.
- Not reliable in operation
- Performance of 3-bit BEC is not amended.
Searching algorithms and tool development
In this section, an algorithm is proposed to solve the Boolean satisfiability problem based on the discussion in the former section. Based on the algorithm, a code optimization tool is developed to obtain the target H matrix with custom optimization restrictions. The introduction of the algorithm is divided into two subsections. In the basic part of the algorithm is introduced to find the solutions meeting the requirement of the Boolean satisfiability. Based on the basic part, the method with column weight restriction is designed to force the optimization process to use as few ones as possible, thus optimizing the total number of ones in the matrix and the number of ones in the heaviest row. This optimized version of the algorithm has been used to obtain all the codes presented in this paper.
Basic Part of Code Design Algorithm
In order to construct the expected codes, the first step is to ensure the number of the check bits. The number of the check bits is seven for codes with 16 data bits, eight for codes with 32 data bits, and nine for codes with 64 data bits.
The main idea of the algorithm is based on the recursive backtracing algorithm. At first, an identity matrix with block size (n − k) is constructed as the initial matrix, and the corresponding syndromes of the error patterns are added to the syndrome set. Then, a column vector selected from Fig. 3. Flow of code design algorithm. the 2n−k − 1 column candidates is added to the right side of the initial matrix. This process is defined as columnadded action. Meanwhile, the new syndromes which belong to the new added column are calculated. If none of new syndromes is equal to the elements in a syndrome set, the column-added action is successful and the corresponding new syndromes are added into the syndrome set. The base-matrix is updated to the previous matrix with the added column. Otherwise, the column-added action fails and another new column from the candidates is selected. If all the candidates are tried and the column-added action still fails, one column from the right side of previous matrix and the corresponding syndromes are reduced from the base-matrix and the syndrome set, respectively. Then, the algorithm continues the columnadded action until the matrix dimension reaches the expected value. The code design algorithm flow is shown in Fig. 3.
Normally, the recursive backtracing algorithm demands a large amount of computing resources and computing time. In order to accelerate the computing speed of the algorithm operation, firstly, we adopt the decimal operation instead of the matrix operation by conversing the column vectors into decimal numbers. Even so, the algorithm completing the execution of all conditions is not possible. In general, if the code we expect exists, it is easy to obtain the first solution. With different optimization criteria, the algorithm can get better solutions. However, searching the best solution requires the complete result of the whole searching process, which is in most cases, unfeasible with today’s computing resources. Therefore, it is more practical to use the best result obtained in a reasonable computation time. To be consistent with , that time was set to one week for all the results presented in this paper. Fig. 4. Flow of algorithm with column weight restriction and past procedure record is shown below
it can be observed that the new algorithms are able to find better solutions than the existing algorithms such as the one used in . Therefore, they can be applied for the finding process of the QAEC codes. The solutions found for QAEC codes are presented in the paper.
- Reliable in operation
- Performance of 3-bit BEC is improved.
Jiaqiang Li , Student Member, IEEE, Pedro Reviriego, Senior Member, IEEE, Liyi Xiao, Member, IEEE,Costas Argyrides, Senior Member, IEEE, and Jie Li, “Extending 3-bit Burst Error-Correction Codes WithQuadruple Adjacent Error Correction”, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 2, FEBRUARY 2018.