PROCEED: A Pareto Optimization-Based Circuit-Level Evaluator for Emerging Devices
Evaluation of novel devices in the context of circuits is crucial to identifying and maximizing their value. We propose a new framework, the Pareto optimization-based circuit-level evaluator for emerging devices (PROCEED), that uses comprehensive performance, power, and area metrics for accurate device-circuit coevaluation through optimization of digital circuit benchmarks. PROCEED assesses technology suitability over a wide operating region (megahertz to gigahertz) by leveraging available circuit knobs (threshold voltage assignment, power management, sizing, and so on). It improves benchmark accuracy by 3× to 115× compared with existing methods while offering orders-of-magnitude improvements in runtime over full physical design implementation flows. To illustrate PROCEED’s capabilities, we deploy it to assess emerging technologies, including novel tunneling field-effect transistors (TFETs), and compare them with conventional silicon CMOS. As a further illustration, we extend PROCEED to evaluate future heterogeneous integration of varied devices onto the same silicon substrate. In this paper, the area and power consumption of the proposed architecture are analyzed using Tanner tools.
Device-circuit assessments must consider several factors to draw realistic conclusions. First, any effective power and delay evaluation of modern circuits should span several orders of magnitude, since their operating frequencies range from kilohertz to gigahertz. Second, chip area, which is ignored by current evaluation methods, should be considered simultaneously because of its impact on manufacturing cost and interconnect length. Third, the crucial tuning knobs, such as logic gate sizing and supply voltage (Vdd) or threshold voltage (Vt) selection, must be optimized for the particular circuit under evaluation. Fourth, since circuit performance depends critically on the device operating point, benchmarks should consider the full device current–voltage (I–V) characteristics rather than only simplified metrics such as ON-state (saturation) current (ION) or OFF-state leakage (IOFF). Fifth, a given device may not suit all circuit architectures, because circuits differ in their logic depth histogram (LDH) patterns and in their logical and physical structure. Sixth, as technologies scale down, device variability due to ambient process fluctuations becomes ever more important and affects circuit viability. Seventh, the benefits of cooperatively using several device types through heterogeneous integration (HGI) depend strongly on design adaptability and circuit topology, which any assessment must consider. Taken together, these complexities would seem to mandate a complete circuit design flow for performance evaluation, but such a flow is impractically time-consuming. Therefore, an alternative evaluation method is needed that accounts for all of these factors with reasonable computational runtime.
- Delay-power product and delay-area product are high
Fig. 1. Overview of the PROCEED framework.
As shown in Fig. 1, typical inputs to PROCEED include interconnect information, such as average wire resistance (R), average wire capacitance (C), and chip size; the circuit benchmark design (i.e., the design LDH and average fan-out); variability (through Vdd drops, Vt shifts, and so on); full device I–V models; and operating activity. Optional constraints can also be placed on Vdd, Vt, chip area, and the ratio of average to peak throughput (with throughput measured in clock cycles per second).
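To make the input list concrete, the following Python sketch groups these inputs into plain data classes. The class and field names (ProceedInputs, Interconnect, avg_to_peak_throughput, and so on), the units, and the validation rules are hypothetical choices for illustration, not PROCEED's actual interface; the I–V model is represented simply as a callable that returns drain current for given gate and drain voltages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical container for the PROCEED inputs; names and units are
# illustrative assumptions, not the framework's real API.

@dataclass
class Interconnect:
    r_wire_ohm: float      # average wire resistance R per stage
    c_wire_f: float        # average wire capacitance C per stage
    chip_size_mm2: float   # chip size

@dataclass
class Benchmark:
    ldh: Dict[int, int]    # logic depth histogram: depth -> number of paths
    avg_fanout: float

@dataclass
class Variability:
    vdd_drop_v: float = 0.0  # supply-voltage drop
    vt_shift_v: float = 0.0  # threshold-voltage shift

@dataclass
class Constraints:
    # Every constraint is optional; None means "unconstrained".
    vdd_range_v: Optional[Tuple[float, float]] = None
    vt_range_v: Optional[Tuple[float, float]] = None
    max_area_mm2: Optional[float] = None
    avg_to_peak_throughput: Optional[float] = None  # ratio in (0, 1]

@dataclass
class ProceedInputs:
    interconnect: Interconnect
    benchmark: Benchmark
    variability: Variability
    iv_model: Callable[[float, float], float]  # I_D(V_GS, V_DS) in amperes
    activity: float                            # switching activity in [0, 1]
    constraints: Constraints = field(default_factory=Constraints)

    def __post_init__(self) -> None:
        # Basic sanity checks on the scalar inputs (assumed ranges).
        if not 0.0 <= self.activity <= 1.0:
            raise ValueError("activity must lie in [0, 1]")
        ratio = self.constraints.avg_to_peak_throughput
        if ratio is not None and not 0.0 < ratio <= 1.0:
            raise ValueError("avg_to_peak_throughput must lie in (0, 1]")
```

Leaving every constraint optional mirrors the text, where Vdd, Vt, area, and throughput limits are extras rather than required inputs.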
Canonical Circuit Construction
A complete and exact optimization is intractable for large digital circuits. Since the goal of PROCEED is to predict the best performance and power tradeoffs for emerging devices, detailed circuit design is not our target, and it contributes little to the evaluation. We therefore use only essential design information to maximize performance and to determine the optimal Vdd, Vt, and gate sizes for a given power consumption limit.
Fig. 2. (a) Example of simulation block allocation in PROCEED based on logic depth. (b) Circuit schematic used for simulation and optimization.
In Fig. 2, we show an example of the simulation blocks used to construct a specific circuit. For simplicity, we first divide logic paths into n bins based on logic depth; in Fig. 2(a), for instance, n = 5. A larger number of bins improves accuracy at the expense of computation time. Each bin i is modeled by a corresponding simulation block Si [S1–S5 in Fig. 2(a)], which is in turn made of i gate stages. We use the gate design of Si to construct the logic paths belonging to bin i.
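The binning step can be sketched as follows. This is a minimal illustration assuming equal-width bins over logic depth; the helper name bin_ldh is hypothetical, and the bin boundaries PROCEED actually uses (see the rule stated in the next paragraph) may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: equal-width binning is an assumption for illustration.
def bin_ldh(ldh: Dict[int, int], n_bins: int) -> List[Tuple[int, int, int]]:
    """Split a logic depth histogram into n_bins equal-width depth ranges.

    ldh maps logic depth (a positive integer) to the number of paths with
    that depth. Returns one (lowest_depth, deepest_depth, path_count) tuple
    per bin; deepest_depth is the depth the bin's simulation block covers.
    """
    max_depth = max(ldh)
    if not 1 <= n_bins <= max_depth:
        raise ValueError("need 1 <= n_bins <= maximum logic depth")
    bins = []
    for i in range(n_bins):
        # Integer edges partition depths 1..max_depth without gaps or overlap.
        lo = i * max_depth // n_bins + 1
        hi = (i + 1) * max_depth // n_bins
        count = sum(c for d, c in ldh.items() if lo <= d <= hi)
        bins.append((lo, hi, count))
    return bins
```

For example, with an LDH spanning depths 1 to 10 and n = 5, each bin covers two consecutive depths, and the returned path counts sum to the total number of paths.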
The LDH is divided in such a way that the longest path in each bin has the same delay if all these blocks have the same delay. The logic gate and interconnect used for a single stage in the simulation blocks are shown in Fig. 2(b). The gate can be a NAND, a NOR, or something more complex such as an XNOR, depending on the average number of transistors per gate in the chosen benchmark.
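PROCEED evaluates these blocks with full device I–V models. Purely to show how a block's delay scales with stage count, wire RC, and fan-out, the sketch below swaps in a first-order Elmore RC approximation: a driver resistance charges a π-modeled wire and the fan-out load, and the 50% delay is taken as ln 2 times the resulting time constant. The function names and parameters are hypothetical, and driver self-loading is ignored.

```python
import math

# Hypothetical first-order stand-in for the gate-plus-wire stage of Fig. 2(b);
# not PROCEED's I-V-based simulation.
def stage_delay(r_drv: float, c_in: float, r_wire: float, c_wire: float,
                fanout: float) -> float:
    """50% delay (s) of one gate-plus-wire stage from its Elmore time constant.

    r_drv:  effective driver (gate) resistance, ohm
    c_in:   input capacitance of one load gate, F
    r_wire: wire resistance, ohm; c_wire: wire capacitance, F
    fanout: number of load gates driven by the stage
    """
    c_load = fanout * c_in
    # The driver charges all wire and load capacitance; the distributed wire
    # (pi model) sees half its own capacitance plus the full load.
    tau = r_drv * (c_wire + c_load) + r_wire * (c_wire / 2.0 + c_load)
    return math.log(2.0) * tau

def block_delay(n_stages: int, **stage_params: float) -> float:
    """Delay of a simulation block built from n_stages identical stages."""
    return n_stages * stage_delay(**stage_params)
```

Because the stages are identical, block delay here grows linearly with the number of stages i, which is why deeper bins need faster (larger or lower-Vt) gates to meet the same clock period.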
Delay and Power Modeling
Delay, power, and area are the three most important gross metrics in the design of digital circuits, but usage constraints lead to tradeoffs between them that must be balanced to maximize the overall efficiency of the design. Hence, we use them as evaluation metrics in PROCEED.
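As an illustration of how power might be scored alongside delay, the sketch below uses the standard first-order model of switching power (activity × switched capacitance × Vdd² × frequency) plus continuously flowing leakage (Vdd × leakage current), and derives the average clock rate from the ratio of average to peak throughput listed among the inputs. This textbook model and the function names are assumptions made for illustration; PROCEED's own evaluation relies on full device I–V models. The same sketch computes the delay-power and delay-area products used to compare designs.

```python
# Hypothetical first-order power model (textbook form), not PROCEED's own.
def average_power(activity: float, c_switched: float, vdd: float,
                  i_leak: float, t_cycle: float,
                  avg_to_peak: float = 1.0) -> float:
    """Average power (W): switching plus leakage.

    activity:    switching activity factor (0->1 transitions per cycle)
    c_switched:  total switchable capacitance, F
    vdd:         supply voltage, V
    i_leak:      total OFF-state leakage current, A
    t_cycle:     clock period at peak throughput, s
    avg_to_peak: ratio of average to peak throughput (1.0 = always busy)
    """
    f_avg = avg_to_peak / t_cycle        # average clock cycles per second
    dynamic = activity * c_switched * vdd ** 2 * f_avg
    leakage = vdd * i_leak               # assumed to flow continuously
    return dynamic + leakage

def delay_power_product(delay: float, power: float) -> float:
    return delay * power

def delay_area_product(delay: float, area: float) -> float:
    return delay * area
```

Lowering the average-to-peak ratio reduces only the switching term here, so at low utilization leakage dominates; this is the regime in which steep-slope devices such as TFETs are expected to help.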
The area of the gates used in the canonical circuit construction is estimated using UCLADRE, which minimizes the gate layouts in accordance with the input design rules and gate netlists.
Fig. 3. (a) NAND gate schematic and layouts for (b) CMOS and (c) TFET.
Fig. 3(a) shows a NAND gate logic schematic, where adjacent transistors share a source/drain at node n1. In Fig. 3(b), the two nMOS devices are stacked to create a compact layout in traditional CMOS technology. However, because of the source/drain asymmetry of TFETs, a TFET layout of the same circuit must split the stack, which adds area overhead, as shown in Fig. 3(c). To account for this effect, we modify UCLADRE so that it generates area-optimal TFET layouts for any input circuit netlist.
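The area penalty of splitting the TFET stack can be captured in a simple model that stays linear in gate width. In the sketch below, every coefficient (K_WIDTH, K_BREAK, A_FIXED) is a hypothetical placeholder rather than a value from UCLADRE or any design-rule set, and the split nMOS stack of Fig. 3(c) is modeled as one extra diffusion break.

```python
from typing import Sequence

# Hypothetical coefficients for illustration only; real values would come
# from layout generation under the target design rules.
K_WIDTH = 0.05   # area per um of transistor width, um^2/um (assumed)
K_BREAK = 0.04   # area of one extra diffusion break, um^2 (assumed)
A_FIXED = 0.02   # width-independent cell overhead, um^2 (assumed)

def gate_area(widths_um: Sequence[float], extra_breaks: int) -> float:
    """Linear-in-width cell area plus a fixed penalty per extra diffusion break."""
    return A_FIXED + K_WIDTH * sum(widths_um) + K_BREAK * extra_breaks
```

For a NAND2 with four equal-width devices, the CMOS layout would use extra_breaks=0 and the TFET layout extra_breaks=1, so the TFET cell is larger by exactly K_BREAK while the model remains linear in width.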
Fig. 4. Optimizer overview. The adaptive weight is chosen from the slope of the existing front. From a starting point, a metamodel is built, and gradient descent is used to find candidate points, which are then simulated to obtain new Pareto points.
PROCEED can simultaneously optimize any two metrics out of delay, power, and area, while the third is treated as a constraint; for instance, we can perform a Pareto optimization of delay and power with a maximum area constraint. As described in Section II-B, the chosen area model is linear in gate width and hence easier to optimize than delay and power. Therefore, in the remainder of this section, we will describe in detail the Pareto optimization of delay and power with constrained area. Fig. 4 shows an overview of our Pareto optimization process.
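The loop in Fig. 4 can be sketched in Python as three pieces: filtering simulated (delay, power, area) points down to the non-dominated delay-power front under a maximum-area constraint; choosing a delay weight w so that the weighted sum w·delay + (1 − w)·power has level sets parallel to a segment of the current front; and running gradient descent on that weighted sum. This is a minimal sketch under stated assumptions: the function names are hypothetical, the metamodel is represented by any differentiable Python callable (PROCEED builds it from simulations), gradients are taken numerically, and the final step of re-simulating the candidate points is omitted.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch of the Fig. 4 loop; not PROCEED's implementation.
Point = Tuple[float, float, float]  # (delay, power, area)

def pareto_front(points: Sequence[Point], max_area: float) -> List[Point]:
    """Non-dominated (delay, power) points among those meeting the area cap."""
    feasible = sorted((p for p in points if p[2] <= max_area),
                      key=lambda p: (p[0], p[1]))
    front: List[Point] = []
    best_power = float("inf")
    for p in feasible:
        # Sorted by delay, so p survives only if it beats all faster points on power.
        if p[1] < best_power:
            front.append(p)
            best_power = p[1]
    return front

def adaptive_weight(a: Point, b: Point) -> float:
    """Delay weight w for w*delay + (1-w)*power whose level sets are parallel
    to front segment a-b (a is faster than b, so the slope is negative)."""
    slope = (b[1] - a[1]) / (b[0] - a[0])
    return -slope / (1.0 - slope)

def descend(obj: Callable[[List[float]], float], x0: Sequence[float],
            lr: float = 0.1, steps: int = 500, h: float = 1e-6) -> List[float]:
    """Plain gradient descent on obj using central-difference gradients."""
    x = list(x0)
    for _ in range(steps):
        grad = []
        for k in range(len(x)):
            xp, xm = x.copy(), x.copy()
            xp[k] += h
            xm[k] -= h
            grad.append((obj(xp) - obj(xm)) / (2.0 * h))
        x = [xk - lr * gk for xk, gk in zip(x, grad)]
    return x
```

For instance, with a toy surrogate in which delay ≈ 1/x and power ≈ x² for a normalized supply knob x (an assumed model, not device data), w = 0.5 drives descend toward x = 0.5^(1/3) ≈ 0.794, the minimizer of 0.5/x + 0.5x². Choosing w from the slope of the widest front segment steers the search toward the least-explored part of the tradeoff curve.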
- Delay-power product and delay-area product are reduced
- Tanner tools