Efficient Dynamic Virtual Channel Organization and Architecture for NoC Systems
The growing number of processing cores on a chip requires an efficient and scalable communication structure such as a network on chip (NoC). The channel buffer organization of an NoC uses virtual channels (VCs) to improve the data flow and performance of the NoC system. Dynamically allocated multiqueues (DAMQs) are an effective mechanism to achieve VC flow control with maximum buffer utilization. In this model, VCs employ a variable number of buffer slots depending on the traffic. Despite their performance merits, DAMQs have some limitations. We propose a new input-port micro-architecture to support our efficient dynamic VC (EDVC) approach, which is built on DAMQ buffers. To demonstrate the advantages of EDVC, we compare its micro-architecture with that of the conventional dynamic VC (CDVC), which also employs link-list tables for buffer organization. In terms of hardware, the EDVC input-port organization consumes on average 61% less power in an application-specific integrated circuit design than the CDVC input port. The saving is even greater when compared with the VC regulator methodology. The EDVC approach can improve NoC latency by 48%–50% and throughput by 100% on average as compared with the CDVC mechanism. The logic size, area, and power consumption of the proposed architecture are analyzed using Xilinx 14.2.
A link-list-based mechanism has frequently been used to implement DAMQ buffers. An initial microarchitecture and VLSI implementation of a link-list-based DAMQ buffer was presented. DAMQ with recruit registers (DAMQWR) and VC DAMQ were proposed in an effort to overcome some drawbacks of DAMQ. DAMQWR combines DAMQ with recruit registers to implement adaptive routing for on-chip communication. The recruit registers reassign the packets of blocked subqueues to less congested subqueues. However, in addition to its hardware overhead, the DAMQWR method incurs extra delays due to recruit register updates and operations.
CONVENTIONAL DAMQ MICROARCHITECTURE
A CDVC router consists of input-port modules, an arbiter, and a crossbar switch. The CDVC input port contains a static random access memory (SRAM) buffer, five lookup tables, and other logic circuits and ports, as shown in Fig. 1. The slot size of the SRAM buffer is equal to the flit size, and the data pointed to by the read-pointer appear at the SRAM output. On the activation of credit-in, the data are stored in the SRAM slot pointed to by the write-pointer.
Fig. 1. Link-list-based CDVC input port.
Five lookup tables are used to implement the link-list-based DAMQ; three of these tables are shown in Fig. 2. The VC-state and slot-state tables keep a Boolean value (empty/occupied) for each VC and each slot, respectively, so the slot-state table records which slots of the SRAM buffer are occupied. The header-list table holds the input-port buffer addresses of the header flits of the VCs. The tail-list table holds the buffer addresses of the tail flits. The link-list table holds, for each buffer slot, the address of the next slot, and thereby links the addresses of the flits of each VC in a FIFO manner.
Fig. 2. CDVC router (4-VC and 16-slot) lookup tables. (a) Link addresses of VCs: 16 registers (4 bit). (b) Addresses of the first flit of each VC: four registers (4 bit). (c) Address of the last stored flit of each VC.
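The interplay of the five tables can be illustrated with a small behavioral sketch in Python. It is not the CDVC hardware: the class name LinkListDAMQ, the methods write and read, and the policy of taking the lowest-numbered free slot are assumptions made only for illustration, and the read/write pointers and all timing are left out.

```python
class LinkListDAMQ:
    """Behavioral sketch of a link-list-based DAMQ (hypothetical names)."""

    def __init__(self, num_vcs=4, num_slots=16):
        self.sram = [None] * num_slots         # one flit per slot
        self.slot_state = [False] * num_slots  # True = slot occupied
        self.vc_state = [False] * num_vcs      # True = VC holds at least one flit
        self.header_list = [0] * num_vcs       # slot of the first flit of each VC
        self.tail_list = [0] * num_vcs         # slot of the last stored flit of each VC
        self.link_list = [0] * num_slots       # next slot in the same VC's chain

    def write(self, vc, flit):
        """Store a flit for `vc`; return False if every slot is occupied."""
        if all(self.slot_state):
            return False
        # Assumption: pick the lowest-numbered free slot.
        slot = self.slot_state.index(False)
        self.sram[slot] = flit
        self.slot_state[slot] = True
        if self.vc_state[vc]:
            # Chain the new slot behind the current tail of this VC.
            self.link_list[self.tail_list[vc]] = slot
        else:
            # First flit of an empty VC becomes its header.
            self.header_list[vc] = slot
            self.vc_state[vc] = True
        self.tail_list[vc] = slot
        return True

    def read(self, vc):
        """Remove and return the oldest flit of `vc`, or None if it is empty."""
        if not self.vc_state[vc]:
            return None
        slot = self.header_list[vc]
        flit = self.sram[slot]
        self.slot_state[slot] = False
        if slot == self.tail_list[vc]:
            self.vc_state[vc] = False  # that was the last flit of this VC
        else:
            self.header_list[vc] = self.link_list[slot]
        return flit
```

Because each VC is a chain through the shared slots, flits of different VCs can be interleaved in the SRAM while each VC is still drained in FIFO order.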
We employ asynchronous communication among routers, sink cores, and source cores. Credit-signal-based handshaking is used to establish communication between the source, intermediate, and destination routers. A credit signal is generated when a source core sends a packet flit. At a destination router, the credit signal causes the data to be stored in the input-port buffer. If the buffer is full, an acknowledge signal, buffer-full, is sent back to the source to tell it to stop sending flits to that input port.
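The handshake described above can be sketched as a toy Python model. The names InputPort, credit_in, buffer_full, and send_packet are hypothetical, and signal timing is not modeled; the sketch only shows the source stopping once buffer-full is asserted.

```python
class InputPort:
    """Toy receiver side of the credit handshake (hypothetical names)."""

    def __init__(self, depth):
        self.depth = depth
        self.buffer = []

    @property
    def buffer_full(self):
        # Acknowledge signal sent back to the source when no slot is free.
        return len(self.buffer) >= self.depth

    def credit_in(self, flit):
        # A credit signal accompanies each flit and causes it to be stored.
        if self.buffer_full:
            return False
        self.buffer.append(flit)
        return True


def send_packet(flits, port):
    """Source side: send flits until done or until buffer-full is seen."""
    sent = 0
    for flit in flits:
        if port.buffer_full:
            break  # stop sending flits to this input port
        port.credit_in(flit)
        sent += 1
    return sent
```

For example, sending a three-flit packet to a port with two free slots delivers two flits, after which buffer-full holds the third flit back at the source.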
- High power consumption
- High NoC latency
The structure of the EDVC input port, along with the NoC router microarchitecture, is shown in Fig. 3. Our proposed EDVC router consists of five input-port modules, an arbiter, and a crossbar switch, as shown in Fig. 3(a). However, the architecture of the EDVC input port is much simpler, with less and more efficient hardware and buffering, as shown in Fig. 3(b).
Fig. 3. 5 × 5 EDVC (a) router and (b) input-port microarchitecture
Buffers in NoC routers can be placed at three locations: 1) input ports; 2) output ports; or 3) both input and output ports.
We employ asynchronous communication in our EDVC mechanism for NoCs. The following steps describe the operation of EDVC in detail.
1) Flit Arrival (Clk-Edge #2): A credit-in signal causes the incoming flit and its VC-ID to be saved in a slot pointed by the write-pointer. Meanwhile, the corresponding bit of the slot-state table is set.
2) Request Signal (Clk-Edge #2): When the read-pointer points to a slot and its slot-state bit is set, a request signal is issued according to the VC-ID. The arbiter will read the flit information and perform arbitration.
3) Grant Signal (Clk-Edge #3): If the requested output port is open, the arbiter allocates the proper address for the crossbar switch and VC-ID before issuing a grant signal.
4) Flit Departure (Clk-Edge #3): The grant signal causes the flit to leave the buffer. The corresponding bit of the slot-state table is also reset.
5) Credit Signal (Clk-Edge #4): The high level of grant at the negative clock edge causes the credit-out and grant signals to be set and reset, respectively.
6) Blocking: If the requested output port of a VC is closed, the arbiter issues the VC-block signal to close the corresponding VC. Closing a VC means that a request is not issued and no flit enters the buffer for the VC. In Fig. 4, the EDVC working process listed above is contrasted with the CDVC working process.
Fig. 4. EDVC versus CDVC input-port pipelines.
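The EDVC steps listed above can be sketched as a behavioral Python model of one input port. This is an illustration under assumptions, not the authors' design: the class EDVCInputPort and its methods are hypothetical names, clock edges and the credit-out signal are not modeled, pointer advancement is left out, and refusing a flit when the slot under the write-pointer is occupied is an assumed policy.

```python
class EDVCInputPort:
    """Behavioral sketch of the EDVC steps (hypothetical names, no timing)."""

    def __init__(self, num_slots=16, num_vcs=4):
        self.flits = [None] * num_slots
        self.vc_ids = [0] * num_slots          # VC-ID stored with each flit
        self.slot_state = [False] * num_slots  # True = slot occupied
        self.vc_closed = [False] * num_vcs     # True = VC closed by VC-block
        self.write_ptr = 0
        self.read_ptr = 0

    def flit_arrival(self, flit, vc_id):
        """Step 1: credit-in saves the flit and VC-ID at the write-pointer."""
        if self.vc_closed[vc_id] or self.slot_state[self.write_ptr]:
            return False  # closed VC or occupied slot: flit does not enter
        self.flits[self.write_ptr] = flit
        self.vc_ids[self.write_ptr] = vc_id
        self.slot_state[self.write_ptr] = True
        return True

    def request(self):
        """Step 2: return the VC-ID to request for, or None if no request."""
        p = self.read_ptr
        if self.slot_state[p] and not self.vc_closed[self.vc_ids[p]]:
            return self.vc_ids[p]
        return None

    def grant(self):
        """Steps 3-4: a grant makes the flit leave and resets its state bit."""
        p = self.read_ptr
        self.slot_state[p] = False
        return self.flits[p]

    def vc_block(self, vc_id, closed=True):
        """Step 6: the arbiter closes (or reopens) a VC."""
        self.vc_closed[vc_id] = closed
```

In this sketch a closed VC neither raises a request nor accepts new flits, which matches the blocking step; only the slot-state table is needed to track buffer occupancy.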
Input Port Microarchitecture
The block diagram of Fig. 5 shows the simplicity of our EDVC mechanism. Only the slot-state table, which holds one Boolean value per slot, is required to manage the DAMQ structure. The depth of the slot-state table is equal to the depth of the input-port buffer. The input-port architecture includes an SRAM, a slot-state table, two counters, and some other logic circuits and ports, as shown in Fig. 3(b).
Fig. 5. EDVC input-port buffer
Operation of Read and Write Pointers: The read-pointer works like a counter that counts the clock cycles, as shown in Fig. 6(a). The write-pointer also works like a counter, but it is controlled by the slot-state table: it counts a clock cycle only when the slot of the input-port buffer it points to is full, as shown in Fig. 6(b). When data are stored in the slot, the corresponding slot-state bit is set, which causes the write-pointer to increment at the next clock.
Fig. 6. 4-bit simple read and write pointers. (a) Read-pointer. (b) Write-pointer.
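As a minimal sketch of the two counters, the following Python functions compute the next pointer value for one clock cycle. The function names are hypothetical, and wrap-around at 16 slots is assumed from the 4-bit width in Fig. 6.

```python
PTR_BITS = 4                 # pointer width, as in Fig. 6
MASK = (1 << PTR_BITS) - 1   # wrap around after slot 15 (assumed)


def next_read_ptr(read_ptr):
    """Read-pointer: a free-running counter, advances every clock."""
    return (read_ptr + 1) & MASK


def next_write_ptr(write_ptr, slot_state):
    """Write-pointer: advances only while the slot it points to is occupied."""
    if slot_state[write_ptr]:
        return (write_ptr + 1) & MASK
    return write_ptr
```

With slots 0 and 1 occupied, a write-pointer starting at 0 moves to 1, then to 2, and then stays at 2 until a flit is stored there, so it always comes to rest on an empty slot.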
- Reduced power consumption
- Low NoC latency
- Xilinx ISE