272 File Paper
Abstract—Energy efficiency is a critical focus in contemporary integrated circuit (IC) design, particularly for ultra-low-power applications. Sub-threshold voltage scaling is a promising way to reduce energy consumption, though it introduces performance challenges that call for custom design libraries. In this work, we present a comprehensive methodology for creating a specialized sub-threshold standard cell library, using the SPEA-2 genetic algorithm for optimal transistor sizing. Our library demonstrates an average 44% improvement in energy efficiency (power-delay product) over unoptimized cells. We validated its performance by synthesizing RISC-V processors and EPFL benchmark circuits and comparing them against standard foundry libraries operating at nominal (1.2 V) and near-threshold (600 mV) voltages. The results show a 13.26× and 1.49× reduction in energy consumption, respectively, for the sub-threshold library, although this comes with a 5.21× and 1.3× increase in area, respectively. These findings underscore the viability of sub-threshold libraries as an effective option for energy-constrained designs.

Index Terms—Optimization, Sub-threshold Standard Cells, Energy Efficiency, Strength Pareto Evolutionary Algorithm 2

I. INTRODUCTION

Increasing demand for computing power and growing design complexity make a strong case for pushing the limits of energy efficiency. We view circuit design as a multi-level problem and see a need to optimize the standard cells themselves. We study standard cell design methodologies and identify the potential of supply voltage scaling for reducing energy consumption. Scaling the supply voltage below the transistor threshold voltage can yield the largest energy savings [1]. Leakage currents, otherwise

The rest of the paper is structured as follows. Section II presents the design methodology, including logic design, optimization, layout design, and characterization. Section III details the experimental setup used in this research. Section IV presents the design and implementation results of our SubVt library. Section V compares our implementation with other SubVt implementations. Finally, Section VI concludes the paper by discussing our findings and outlining potential future directions for this work.

II. SUB-THRESHOLD DESIGN METHODOLOGY

In this section, we outline the steps to design an efficient SubVt standard cell library. The process starts by selecting the supply voltage. Next, transistor schematics are created for a chosen set of logic gates. Transistor sizing for these gates is then optimized automatically using the SPEA-2 algorithm [4]. Physical layouts are designed according to standard cell design rules. Once all the cells are created, they are characterized to generate the library files used for design implementation.

A. Sub-Threshold Logic Design

[Figure: energy components Etotal, Edynamic and Estatic, with the Minimum Energy Point marked]
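Selecting the supply voltage at the minimum energy point (MEP), as motivated by [1], can be illustrated with a minimal sketch. It assumes a simplified sub-threshold energy model in the spirit of [1]: dynamic energy scales as Vdd², while static (leakage) energy grows as the clock period stretches exponentially at low Vdd. The slope factor, switching activity, logic depth, and delay constant below are hypothetical values, not UMC 55 nm data, so the resulting MEP is only illustrative.

```python
import math

# Hypothetical, illustrative parameters (NOT taken from the UMC 55 nm PDK).
N_SLOPE = 1.5        # sub-threshold slope factor n
V_THERMAL = 0.02585  # thermal voltage kT/q at 300 K (V)
ALPHA = 0.1          # average switching activity per gate and cycle
LOGIC_DEPTH = 20     # gates on the critical path; sets the clock period
K_DELAY = 1.0        # delay fitting constant


def energy_per_cycle(vdd, c_gate=1.0, n_gates=1):
    """Return (E_dynamic, E_static, E_total) per clock cycle in normalized units.

    Assumes I_on = I0*exp((Vdd - Vth)/(n*vT)) and I_leak = I0*exp(-Vth/(n*vT)),
    so I0 and Vth cancel in the ratio I_leak / I_on = exp(-Vdd/(n*vT)).
    """
    e_dynamic = n_gates * ALPHA * c_gate * vdd ** 2
    # Clock period T = LOGIC_DEPTH * K_DELAY * C * Vdd / I_on, and
    # static energy = n_gates * I_leak * Vdd * T, which simplifies to:
    e_static = (n_gates * LOGIC_DEPTH * K_DELAY * c_gate * vdd ** 2
                * math.exp(-vdd / (N_SLOPE * V_THERMAL)))
    return e_dynamic, e_static, e_dynamic + e_static


def find_mep(v_min=0.10, v_max=0.50, step=0.001):
    """Sweep Vdd on a uniform grid and return the voltage with minimum E_total."""
    n_steps = round((v_max - v_min) / step)
    grid = [v_min + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda v: energy_per_cycle(v)[2])


if __name__ == "__main__":
    v_mep = find_mep()
    e_dyn, e_stat, e_tot = energy_per_cycle(v_mep)
    print(f"MEP near {v_mep * 1e3:.0f} mV: E_dynamic={e_dyn:.4f}, E_static={e_stat:.4f}")
```

At low Vdd the static term dominates because the clock period grows exponentially, while at higher Vdd the quadratic dynamic term dominates, so E_total has an interior minimum; with these made-up parameters it falls around 230 mV. The exponential on-current model is only meaningful below or near Vth, and a real MEP has to come from simulations with the PDK.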
are translated into a physical layout adhering to standard cell design rules.

[Figure: optimization flow with PDK, Optimization and Evaluate Fitness stages]

[Figure: Total Power (W) and Propagation Delay (s)]

III. EXPERIMENTAL SETUP

This section details the design and optimization process of our sub-threshold standard cell library. We describe the steps

TABLE II: Comparison of our work against other low-power sub-threshold standard cell library implementations.
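The SPEA-2 sizing step [4] can be sketched as follows, under stated assumptions. The decision variables are the shared pull-down and pull-up widths (Wn, Wp), matching the equal-width assumption described for the cells, and the two minimized objectives are power and delay. A toy analytical inverter model stands in for circuit simulations with the PDK; its constants, the population and archive sizes, and the blend-crossover and Gaussian-mutation operators are hypothetical choices, not the authors' configuration. Fitness assignment (strength, raw fitness, k-th-nearest-neighbour density), archive truncation, and binary tournament selection follow the structure of SPEA-2 by Zitzler et al. [4]; only the 60-generation budget mirrors the stopping point reported in the results.

```python
import math
import random

# Hypothetical inverter model (stand-in for SPICE runs with the PDK); arbitrary units.
C_LOAD, C_PAR = 2.0, 0.5   # load capacitance and parasitic capacitance per unit width
K_N, K_P = 1.0, 0.4        # relative drive per unit width (PMOS assumed weaker)
W_MIN, W_MAX = 0.5, 8.0    # allowed transistor width range


def objectives(x):
    """Return (power, delay) for widths x = (Wn, Wp); both are minimized."""
    wn, wp = x
    c_total = C_LOAD + C_PAR * (wn + wp)
    delay = 0.5 * c_total * (1.0 / (K_N * wn) + 1.0 / (K_P * wp))  # mean of fall/rise
    power = 0.01 * (K_N * wn + K_P * wp) + 0.05 * c_total          # leakage + switching
    return power, delay


def dominates(a, b):
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def spea2_fitness(objs):
    """SPEA-2 fitness F(i) = R(i) + D(i); lower is better, non-dominated F < 1."""
    n = len(objs)
    strength = [sum(dominates(objs[i], objs[j]) for j in range(n)) for i in range(n)]
    raw = [sum(strength[j] for j in range(n) if dominates(objs[j], objs[i]))
           for i in range(n)]
    k = int(math.sqrt(n))
    fitness = []
    for i in range(n):
        dists = sorted(math.dist(objs[i], objs[j]) for j in range(n) if j != i)
        fitness.append(raw[i] + 1.0 / (dists[k - 1] + 2.0))  # density from k-th neighbour
    return fitness


def environmental_selection(objs, fit, archive_size):
    """Return indices kept in the next archive."""
    keep = [i for i in range(len(objs)) if fit[i] < 1.0]  # non-dominated individuals
    if len(keep) < archive_size:  # fill with the best dominated individuals
        dominated = sorted((i for i in range(len(objs)) if fit[i] >= 1.0), key=lambda i: fit[i])
        keep += dominated[: archive_size - len(keep)]
    while len(keep) > archive_size:  # truncation: drop the most crowded member
        crowded = min(keep, key=lambda i: sorted(math.dist(objs[i], objs[j])
                                                 for j in keep if j != i))
        keep.remove(crowded)
    return keep


def tournament(archive, arch_fit, rng):
    i, j = rng.randrange(len(archive)), rng.randrange(len(archive))
    return archive[i] if arch_fit[i] <= arch_fit[j] else archive[j]


def make_child(p1, p2, rng, sigma=0.3):
    """Blend crossover plus Gaussian mutation, clipped to the width bounds."""
    a = rng.random()
    return tuple(min(W_MAX, max(W_MIN, a * u + (1 - a) * v + rng.gauss(0.0, sigma)))
                 for u, v in zip(p1, p2))


def spea2(pop_size=20, archive_size=10, generations=60, seed=1):
    rng = random.Random(seed)
    pop = [(rng.uniform(W_MIN, W_MAX), rng.uniform(W_MIN, W_MAX)) for _ in range(pop_size)]
    archive = []
    for gen in range(generations):  # fixed generation budget as the stop criterion
        union = pop + archive
        objs = [objectives(x) for x in union]
        fit = spea2_fitness(objs)
        keep = environmental_selection(objs, fit, archive_size)
        archive, arch_fit = [union[i] for i in keep], [fit[i] for i in keep]
        if gen < generations - 1:  # mating selection + variation for next generation
            pop = [make_child(tournament(archive, arch_fit, rng),
                              tournament(archive, arch_fit, rng), rng)
                   for _ in range(pop_size)]
    arch_objs = [objectives(x) for x in archive]
    return [x for x, o in zip(archive, arch_objs)
            if not any(dominates(p, o) for p in arch_objs)]  # final Pareto front


if __name__ == "__main__":
    front = spea2()
    best = min(front, key=lambda x: objectives(x)[0] * objectives(x)[1])  # minimum PDP
    power, delay = objectives(best)
    print(f"Wn={best[0]:.2f}, Wp={best[1]:.2f}, PDP={power * delay:.4f}")
```

The algorithm's output is the Pareto front of power and delay; picking the point with the minimum power-delay product, as the usage example does, is one reasonable rule for choosing a single final sizing.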
[Figure: PDP (J) versus Generation (0 to 60)]

Fig. 5: Evolution of the power-delay product (PDP) of the dominant solution across generations for the CMOS inverter designed in UMC 55 nm technology.

During the optimization, we observed that beyond 60 generations the objective values of the Pareto-optimal points (solutions offering the best trade-off between objectives) no longer improved significantly, which met our termination criterion. The transistor dimensions obtained at the end of the 60th generation are therefore adopted as the final optimized values and used in subsequent design stages.

The results in Fig. 4 show the power and delay values of the dominant solution across generations for the inverter logic gate. The power and delay values do not change significantly after the 55th generation, even as mutation and crossover continue to be applied. Additionally, after the 55th generation, we observed consistent fitness values across both the entire population and the dominant set over multiple generations.

Table I compares the PDP of each logic gate before and after optimization using the SPEA-2 algorithm. As evident from the table, all logic cells improved significantly, with an average improvement of 44% across the library. Notably, the AOI cell exhibited the smallest PDP improvement among the basic cells. We attribute this difference to the AOI cell's topology: unlike some other basic gates, the pull-down path in an AOI cell typically consists of multiple transistors connected in series and parallel. In our optimization process, we assumed that all transistors in the pull-up and pull-down paths share the same widths, Wp and Wn, respectively. Because the individual widths were not optimized separately, this approach may have resulted in imbalanced transistor sizing, limiting the overall PDP improvement for the AOI logic gate.

B. Synthesis Results

The EPFL Benchmark [10] results in Fig. 7 for the adder, barrel shifter, sin and square circuits demonstrate that our SubVt library offers the lowest total power consumption and PDP compared

These cores were evaluated based on power, delay, power-delay product (PDP), gate count, area, and frequency. In the SubVt region, the PicoRV32 core exhibited the highest frequency at 2.33 MHz, outperforming both the PULP and VEXRISCV cores. Meanwhile, the PULP core achieved the lowest PDP of 0.608 pJ, making it the most energy-efficient core in this region. Notably, as shown in Fig. 6, the PULP RISC-V processor improves PDP by 13.26× and 1.49× compared to the SuperVt and NearVt libraries, respectively. The maximum achievable frequency with the NearVt library was around 45 MHz, at a total power consumption of 69.2 µW, whereas our SubVt library achieved a maximum frequency of 1.33 MHz at a much lower power consumption of 0.80 µW.

The synthesis results of the EPFL benchmarks and RISC-V processors showcase the potential of the optimization method and motivate designers to use the custom-designed SubVt library for energy-efficient designs.

V. RELATED WORK

Several previous works have explored low-power standard cell design and optimization techniques. Blesken et al. [2] pioneered multi-objective optimization for standard logic cells, aiming to strike a balance between resource allocation and robustness. Building on this, Lutkemeier et al. [14] designed a sub-threshold (SubVt) standard cell library operating at 325 mV and implemented it on a CoreVA processor, which achieved a maximum frequency of 133 kHz and a PDP of 9.9 pJ. Geisler et al. [3] investigated multi-objective optimization tools for transistor sizing, comparing algorithms such as NSGA-II, SPEA-2, and GAIO. They concluded that SPEA-2 provided superior optimization because it maintains diversity in the solution set, although their research was limited to designing inverters and NAND gates.

Several other studies have examined ultra-low-power designs across different process technologies (Table II). In [9], a 90 nm MSP430 processor operated at 400 mV with a PDP of 5.2 pJ, while Djupdal et al. [12] compared low-power cores such as PicoRV32 [16] and SERV, and developed a
130 nm library for 250 mV to 600 mV operation. They focused on sizing transistors around the threshold voltage to enhance noise margins and switching speeds while using a conventional layout. Their PicoRV32 [16] implementation achieved a PDP of 1.68 pJ at 300 mV, with a frequency of 1.7 MHz and an area of 0.23 mm².

In comparison, our work builds upon the research by Geisler et al. [3] by developing a SubVt library in a 55 nm node. We optimize at the minimum energy point (MEP) supply voltage using the SPEA-2 algorithm, which effectively narrows the design space to identify the aspect ratios that minimize PDP. We evaluated this library on the EPFL benchmarks and open-source RISC-V processors (Figs. 6 and 7). Our library achieves better PDP and frequency than other implementations, at the cost of a relatively larger area.

[Figure: panels (a) to (d) for VEXRISCV, PICORV32 and PULP with SubVt, NearVt and SuperVt libraries]

Fig. 6: Comparison of processor synthesis results with SubVt, NearVt and SuperVt library implementations. Panel (a) depicts the area occupied by each processor, panels (b) and (c) display the frequency and power, respectively, and panel (d) shows the PDP.

[Figure: panels (a) to (d) for Adder, Barrel shifter, Sin and Square with SubVt, NearVt and SuperVt libraries]

Fig. 7: Comparison of EPFL benchmark synthesis results between SubVt, NearVt and SuperVt library implementations. Panel (a) depicts the area occupied by each benchmark, panels (b) and (c) display the frequency and power, respectively, and panel (d) shows the PDP, where SubVt achieves the best results.

VI. CONCLUSION

Our study demonstrated the effectiveness of the evolutionary search algorithm SPEA-2 in finding optimal transistor aspect ratios at an energy-optimal supply voltage. Using the proposed methodology, we designed a sub-threshold standard cell library of 12 cells that achieved a 44% average improvement in energy efficiency. The PULP (CV32E40P) low-power RISC-V processor synthesized with this library consumed 0.6 pJ per cycle, a 13.26× and 1.49× reduction in energy compared to commercial libraries operating at 1.2 V and 600 mV, respectively, in exchange for lower operating speeds. Our findings underscore the potential of optimized sub-threshold standard cell libraries as viable alternatives for designs where energy efficiency is more important than operating speed. Future work will add more logic gates to the library to further improve its efficiency.

REFERENCES

[1] B. H. Calhoun, A. Wang and A. Chandrakasan, “Modeling and sizing for minimum energy operation in subthreshold circuits,” in IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp. 1778-1786, Sept. 2005, doi: 10.1109/JSSC.2005.852162.
[2] M. Blesken, S. Lütkemeier and U. Rückert, “Multiobjective optimization for transistor sizing sub-threshold CMOS logic standard cells,” Proceedings of 2010 IEEE International Symposium on Circuits and Systems, Paris, France, 2010, pp. 1480-1483, doi: 10.1109/ISCAS.2010.5537349.
[3] M. Vohrmann, P. Geisler, T. Jungeblut and U. Rückert, “Design-space exploration of ultra-low power CMOS logic gates in a 28 nm FD-SOI technology,” 2017 European Conference on Circuit Theory and Design (ECCTD), Catania, Italy, 2017, pp. 1-4, doi: 10.1109/ECCTD.2017.8093232.
[4] E. Zitzler, M. Laumanns and L. Thiele, “SPEA2: Improving the strength Pareto evolutionary algorithm,” TIK-Report 103, ETH Zurich, 2001.
[5] A. Wang and A. Chandrakasan, “A 180-mV subthreshold FFT processor using a minimum energy design methodology,” in IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 310-319, Jan. 2005, doi: 10.1109/JSSC.2004.837945.
[6] J. Kwong, Y. K. Ramadass, N. Verma and A. P. Chandrakasan, “A 65 nm Sub-Vt Microcontroller With Integrated SRAM and Switched Capacitor DC-DC Converter,” in IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 115-126, Jan. 2009, doi: 10.1109/JSSC.2008.2007160.
[7] S. Hanson et al., “Performance and Variability Optimization Strategies in a Sub-200mV, 3.5pJ/inst, 11nW Subthreshold Processor,” 2007 IEEE Symposium on VLSI Circuits, Kyoto, Japan, 2007, pp. 152-153, doi: 10.1109/VLSIC.2007.4342694.
[8] J. Morris, P. Prabhat, J. Myers and A. Yakovlev, “Unconventional Layout Techniques for a High Performance, Low Variability Subthreshold Standard Cell Library,” 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Bochum, Germany, 2017, pp. 19-24, doi: 10.1109/ISVLSI.2017.14.
[9] A. Roy, P. J. Grossmann, S. A. Vitale and B. H. Calhoun, “A 1.3µW,
5pJ/cycle sub-threshold MSP430 processor in 90nm xLP FDSOI for
energy-efficient IoT applications,” 2016 17th International Symposium
on Quality Electronic Design (ISQED), Santa Clara, CA, USA, 2016,
pp. 158-162, doi: 10.1109/ISQED.2016.7479193.
[10] L. Amarú, P.-E. Gaillardon and G. De Micheli, “The EPFL combinational benchmark suite,” Proceedings of the 24th International Workshop on Logic & Synthesis (IWLS), 2015.
[11] M. Gautschi et al., “Near-Threshold RISC-V Core With DSP Extensions
for Scalable IoT Endpoint Devices,” in IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 25, no. 10, pp. 2700-2713, Oct.
2017, doi: 10.1109/TVLSI.2017.2654506.
[12] A. Djupdal, M. Själander, M. Jahre and S. Aunet, “Minimizing the Energy Usage of Tiny RISC-V Cores,” CARRV ’23, Orlando, FL, USA, June 17, 2023.
[13] M. Vohrmann, S. Chatterjee, S. Lütkemeier, T. Jungeblut, M. Porrmann and U. Rückert, “A 65 nm standard cell library for ultra low-power applications,” 2015 European Conference on Circuit Theory and Design (ECCTD), Trondheim, Norway, 2015, pp. 1-4, doi: 10.1109/ECCTD.2015.7300041.
[14] S. Lutkemeier, T. Jungeblut, H. K. O. Berge, S. Aunet, M. Porrmann
and U. Ruckert, “A 65 nm 32 b Subthreshold Processor With 9T Multi-
Vt SRAM and Adaptive Supply Voltage Control,” in IEEE Journal
of Solid-State Circuits, vol. 48, no. 1, pp. 8-19, Jan. 2013, doi:
10.1109/JSSC.2012.2220671.
[15] J. Kwong, Y. K. Ramadass, N. Verma and A. P. Chandrakasan, “A 65 nm
Sub-Vt Microcontroller With Integrated SRAM and Switched Capacitor
DC-DC Converter,” in IEEE Journal of Solid-State Circuits, vol. 44, no.
1, pp. 115-126, Jan. 2009, doi: 10.1109/JSSC.2008.2007160.
[16] C. Wolf, “PicoRV32, PicoSoC,” 2021. [Online]. Available: https://github.com/cliffordwolf/picorv32
[17] B. J. LaMeres, “The Texas Instruments MSP430 family of ultra-low-power microcontrollers,” [Online]. Available: https://www.ti.com/lit/ds/symlink/msp430g2553.pdf
[18] B. Hübener, G. Sievers, T. Jungeblut, M. Porrmann and U. Rückert, “CoreVA: A Configurable Resource-Efficient VLIW Processor Architecture,” 2014 12th IEEE International Conference on Embedded and Ubiquitous Computing, Milan, Italy, 2014, pp. 9-16, doi: 10.1109/EUC.2014.11.