Hardware-Efficient Post-processing Architectures for True Random Number Generators
In this brief we present novel post-processing modules for use in True Random Number Generators (TRNGs). These modules are based on mathematical constructs called strong blenders, which provide theoretical guarantees for the randomness of the output. We start by pointing out problems with the post-processing methods used in state-of-the-art TRNG designs. We then present three novel hardware-efficient architectures and provide guidelines for choosing the design parameters.
Hardware True Random Number Generators (TRNGs) are used in all devices that require secure communication, device authentication or data encryption. Applications include smart cards, RFID tags and IoT devices. TRNGs used in cryptography are subject to strict certification procedures. In the past, the security of TRNG designs was evaluated by running a set of statistical tests such as NIST 800-22 and DIEHARD. However, as has been pointed out, the statistical features exploited by future cryptanalysis techniques cannot be foreseen. It is therefore a risky practice to rely only on a finite set of statistical tests to verify the security of a random number generator. A notable incident happened in 2003, when the Motorola TRNG was attacked only one year after the details of the design were disclosed.
Today’s certification authorities require a theoretical explanation for the unpredictability of the generated data. Based on a theoretical model of the digital noise source (DNS), the designer has to make an entropy claim, i.e., a lower bound on the generated entropy. Once this bound is determined, an appropriate digital post-processing method is used to compress the sequence of raw numbers into a shorter sequence of full-entropy random numbers that can be used by the application.
While TRNGs presented in the open literature often achieve impressive results in terms of throughput, energy and hardware area, they rarely fulfill all the requirements for use in cryptography. A common mistake is the wrong choice of the post-processing algorithm. For example, several published designs use Von Neumann’s post-processing. This method works only when the probability of generating the output bit 1 does not change over time and when there is no correlation between the generated bits. Unfortunately, these conditions are never met in practice. Other published TRNGs use a parity filter for post-processing, while one design uses an XOR gate to combine the outputs of two independent physical sources of randomness. In some specific cases, these methods increase the min-entropy of the output, but they do not provide general-case theoretical guarantees. Some TRNG designs use LFSR-based whitening schemes instead of post-processing. Such schemes do not compress the data and thus do not increase the entropy per bit. In addition, many designs do not use any post-processing because the raw bits pass the NIST 800-22 statistical tests. As illustrated by the attack on the Motorola TRNG, this is not a good practice; we stress that the Motorola TRNG was also able to pass all statistical tests from the DIEHARD suite.
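To make the limitation concrete, the following minimal sketch of Von Neumann’s corrector (the function name and bit-list interface are illustrative, not from the designs discussed above) shows how strongly it relies on the independence and constant-bias assumptions:

```python
def von_neumann(bits):
    """Von Neumann's corrector: map the bit pair 01 to 0 and 10 to 1,
    and discard the pairs 00 and 11.

    The output is unbiased only because Pr[01] = Pr[10] holds for
    independent bits with a fixed (possibly unknown) bias p; any
    drift of p over time or correlation between bits breaks this
    argument entirely.
    """
    out = []
    for a, b in zip(bits[::2], bits[1::2]):  # non-overlapping pairs
        if a != b:
            out.append(a)  # 01 -> 0, 10 -> 1
    return out
```

For example, the raw sequence `[0, 1, 1, 0, 1, 1, 0, 0]` is compressed to `[0, 1]`, with the pairs `11` and `00` discarded.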
NIST Special Publication 800-90B recommends using one of the vetted post-processing methods based on cryptographically secure primitives such as block ciphers or cryptographic hashes. However, these methods have a high cost in terms of area, energy and throughput, which makes them unsuitable for lightweight applications. To the best of our knowledge, the only attempt to implement a mathematically secure, hardware-efficient post-processing method was made by Intel in their µRNG design for IoT applications. This method was based on a single finite-field addition and multiplication, a construct that had previously been proposed and theoretically analyzed. Unfortunately, the implementation uses a wrong choice of the finite field and the design parameters, and thus fails to provide security guarantees. Motivated by the current lack of mathematically secure post-processing modules in the TRNG state of the art, we propose three hardware-efficient post-processing architectures suitable for compact implementations. These architectures are based on mathematical constructs called strong blenders, which provide theoretically proven guarantees for the statistical quality and unpredictability of the output. The only requirement imposed on the digital noise source is that it produces a sufficient amount of min-entropy, which makes these post-processing methods compatible with all physical sources of randomness. For all three architectures we provide a method for selecting the design parameters given the min-entropy of the digital noise source.
We propose three hardware-efficient architectures for post-processing modules. These architectures are based on a two-source strong blender which consumes l bits from each source and produces a single w-bit word. Such a blender can be constructed using any set of l×l matrices with the property given in Equation (5). We opt for the right-shift matrices because of their efficient hardware implementation and their compatibility with bit-serial DNS architectures. These are the superdiagonal matrices given by (A_i)_{j,k} = 1 if k = j + i, and 0 otherwise.
A bit vector multiplied by A_i results in the same bit vector shifted by i positions to the right. In bit-serial architectures this multiplication is implemented by simply delaying the bit sequence for i clock cycles. A sum of any subset of the matrices A_1, …, A_w results in a matrix with a rank of at least l−w. Thus, by Equation (5), r = w, and by Equation (7), the statistical distance of the output from U_w is bounded by:
δ < 2^{−(b + 1 − w − l/2)}.
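As a quick sanity check on this bound, one can solve for the min-entropy needed to push the statistical distance below a security target 2^{−s}. The sketch below treats b as the min-entropy of each l-bit input block (an assumption for this illustration; the function name is also illustrative):

```python
def min_entropy_required(l, w, s):
    """Smallest per-block min-entropy b such that
    delta < 2^{-(b + 1 - w - l/2)} <= 2^{-s}.

    Rearranging the exponent: b + 1 - w - l/2 >= s,
    i.e. b >= s + w + l/2 - 1.
    """
    return s + w + l / 2 - 1
```

For instance, with l = 256 input bits per source, a w = 16 bit output word and a target distance below 2^{−64}, this gives b ≥ 207, i.e. roughly 0.81 bits of min-entropy per raw bit.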
Fig. 1 shows the architectures of the proposed post-processing modules. A straightforward implementation using two DNSs is shown in Fig. 1a. In this architecture the strong blender is used as a two-source extractor. The multiplications A_i x are implemented by delaying an input bit-stream by i clock cycles. The inner products are implemented using an AND gate, an XOR gate and a flip-flop for storing intermediate results. The computation is performed in l clock cycles while the sources generate raw bits, after which the result is stored in the w-bit output register.
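The bit-serial datapath above can be captured in a short behavioral model (a sketch, not RTL; the function name and bit-list interface are illustrative):

```python
def strong_blend(x, y, w):
    """Behavioral model of the shift-matrix strong blender.

    x, y: lists of l bits (0/1) from two independent sources.
    Output bit i (i = 1..w) is the GF(2) inner product <A_i x, y>,
    where A_i shifts x right by i positions (top i bits become 0).
    """
    l = len(x)
    assert len(y) == l and 0 < w < l
    out = []
    for i in range(1, w + 1):
        shifted = [0] * i + x[:l - i]  # A_i x: delay the stream by i cycles
        bit = 0
        for xs, yb in zip(shifted, y):
            bit ^= xs & yb  # AND gate feeding an XOR/flip-flop accumulator
        out.append(bit)
    return out
```

Each iteration of the inner loop corresponds to one clock cycle of the bit-serial core; the w output bits are what the hardware latches into the output register after l cycles.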
Figure 1: Architectures of the post-processing modules based on strong blenders
Higher utilization of available entropy can be achieved by exploiting the independence of the strong blender output from one of the inputs. In the architecture shown in Fig. 1b one of the input sequences is reused for generating multiple output words. This architecture uses the same computational core (shown in gray) as the two-source architecture, but it requires only one DNS. The operation consists of two phases: the setup phase and the entropy extraction phase. In the setup phase, an l-bit sequence is generated and stored in the circular-shift register. During the extraction phase, this sequence is rotated through this register providing one input for the computational core, while the DNS generates the data for the second input. Between these two phases, the DNS should be restarted in order to guarantee the independence of the two inputs. This way, we bypass the requirement for two distinct entropy sources.
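The two-phase operation of the single-source architecture can be sketched as follows (a behavioral model with an illustrative interface; the inner-product core mirrors the hardware's AND/XOR/flip-flop datapath):

```python
def one_source_words(setup_bits, fresh_bits, w, n_words):
    """Behavioral sketch of the single-source architecture (Fig. 1b).

    setup_bits: l bits captured into the circular shift register
                during the setup phase.
    fresh_bits: n_words * l bits generated by the (restarted) DNS
                during the extraction phase.
    Each extraction round blends the stored sequence against one
    fresh l-bit block, producing one w-bit output word.
    """
    l = len(setup_bits)
    assert len(fresh_bits) == n_words * l
    words = []
    for n in range(n_words):
        y = fresh_bits[n * l:(n + 1) * l]  # fresh DNS block for this round
        word = []
        for i in range(1, w + 1):
            # rotate-and-delay of the register contents: A_i applied to setup_bits
            shifted = [0] * i + setup_bits[:l - i]
            bit = 0
            for xs, yb in zip(shifted, y):
                bit ^= xs & yb
            word.append(bit)
        words.append(word)
    return words
```

Restarting the DNS between the two phases, as described above, is what justifies modeling `setup_bits` and `fresh_bits` as independent inputs.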
Fig. 1c shows an architecture that is suitable for high throughput designs. This architecture uses m sources and m−1 computational cores. Each generated bit-sequence is used at most twice, thereby avoiding the risk that a single corrupted sequence affects many output words.
The selection of the optimal architecture should be guided by the application requirements and the properties of the used digital noise source. The choice between the one-source and the two-source architecture comes down to the choice between an l-bit shift register and a DNS. If the selected DNS performs better in terms of the metric of interest (area, power or energy) than the shift register, the two-source architecture should be used. Otherwise, the single-source architecture is optimal. The multiple-source architecture shown in Fig. 1c should be used when the throughput requirement cannot be achieved using the other two architectures.
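This selection guideline can be stated as a small decision procedure (all parameter names are illustrative; costs are in whatever metric the designer cares about, throughputs in bits per second):

```python
def choose_architecture(dns_cost, shift_reg_cost,
                        required_throughput, single_core_throughput):
    """Heuristic architecture selection following the guidelines above.

    dns_cost, shift_reg_cost: cost of one DNS vs. one l-bit shift
        register in the metric of interest (area, power or energy).
    required_throughput: throughput demanded by the application.
    single_core_throughput: throughput of one computational core.
    """
    if required_throughput > single_core_throughput:
        # Neither single- nor two-source variant can keep up.
        return "multi-source (Fig. 1c)"
    if dns_cost < shift_reg_cost:
        # A second DNS is cheaper than storing and rotating a sequence.
        return "two-source (Fig. 1a)"
    return "single-source (Fig. 1b)"
```

For a DNS that is large relative to a shift register, the function falls through to the single-source variant, which matches the reuse-based design rationale above.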
A. Rukhin et al., “A statistical test suite for random and pseudorandom number generators for cryptographic applications,” NIST Special Publication 800-22, Aug. 2008.