A Combined SDC-SDF Architecture for Normal I/O Pipelined Radix-2 FFT

  • September 24, 2015
  • Blog
No Comments

A Combined SDC-SDF Architecture for Normal I/O Pipelined Radix-2 FFT


We present an efficient combined single-path delay commutator-feedback (SDCSDF) radix-2 pipelined fast Fourier transform architecture, which includes log2 N − 1 SDC stages, and 1 SDF stage. The SDC processing engine is proposed to achieve 100% hardware resource utilization by sharing the common arithmetic resource in the time-multiplexed approach, including both adders and multipliers. Thus, the required number of complex multipliers is reduced to log4 N − 0.5, compared with log2 N − 1 for the other radix-2 SDC/SDF architectures. In addition, the proposed architecture requires roughly minimum number of complex adders log2 N + 1 and complex delay memory 2N + 1.5 log2 N − 1.5.


Fast Fourier transform (FFT) has played a significant role in digital signal processing field, especially in the advanced communication systems, such as orthogonal frequency division multiplexing (OFDM) [1] and asymmetric digital subscriber line [2]. All these systems require that the FFT computation must be high throughput and low latency. Therefore, designing a high performance FFT circuit is an efficient solution to the abovementioned problems. In particular, the pipelined FFT architectures have mainly been adopted to address the difficulties due to their attractive properties, such as small chip area, high throughput, and low power consumption. To the best to our knowledge, two types of pipelined FFT architectures can be found in this brief: delay feedback (DF) and delay commutator (DC). Further, according to the number of input data stream paths, they can be classified into multiple-path (M) or single-path (S) architectures. The two classifications form four kinds of pipelined FFT architectures [e.g., single-path DC (SDC)]. Multiple-path (M) architectures [3]–[9], are often adopted when the throughput requirement is beyond the theoretical limitation that the single-path architecture can offer at a given clock frequency. However, they require concurrent read (write) operations for the multipath input (output) data. Therefore, single-path (S) architectures could be appropriate in some cases when the system cannot ensure concurrent operations. However, the arithmetic utilization is relatively low, compared with 100% utilization of the existing MDF/MDC architectures [4]. In this brief, we focus on the SDC radix-2 pipelined FFT architecture, which can also achieve 100% multiplier utilization by reordering the inner data sequence


The proposed FFT architecture consists of 1 pre-stage, log2 N− 1 SDC stages, 1 post-stage, 1 SDF stage, and 1 bit reverser, shown in Fig. 2(a). The pre-stage shuffles the complex input data to a new sequence that consists of real part followed by the corresponding imaginary part, shown in Table I. The corresponding post-stage shuffles back the new sequence to the complex format. The SDC stage t (t = 1, 2, . . . , log2N − 1) contains an SDC PE, which can achieve 100% arithmetic resource utilization of both complex adders and complex multipliers. The last stage, SDF stage, is identical to the radix-2 SDF, containing a complex adder and a complex subtracter. By using the modified addressing method [12], the data with an even index are written into memory in normal order, and they are then retrieved from memory in bit-reversed order while the ones with an odd index are written in bit-reversed order. Final, the even data are retrieved in normal order. Thus, the bit reverser requires only N/2 data buffer.


(a) Block diagram of the proposed FFT architecture. (b) Block diagram of the SDC PE, including a data commutator, a real add/sub unit, and an optimum complex multiplier unit. a (b) means the real (imaginary) part of subtraction result, c (d) means the real (imaginary) part of the twiddle factor. G1, G2, and G3 mean three one-cycle delay elements. The signal s controls the behavior of two multiplexers (M1 and M2) and the Real Adder. When s is 1 (0), both multiplexers perform “through” (“swap”) and the Real Adder performs addition (subtraction) operation.



  • Modelsim 6.0
  • Xilinx 14.2


Fields marked with an * are required



Please, let us know the best time to contact you by phone (if provided).

I would like to get new blog posts by email