Near Threshold Computing Overcoming Performance de
All content following this page was uploaded by Michael Wieckowski on 15 October 2015.
WEED Workshop on Energy-Efficient Design held in conjunction with ISCA 36 pp. 44-49
increase in a near-exponential fashion. This rise in leakage energy eventually dominates any reduction in switching energy, creating an energy minimum seen in Figure 2.

of voltages. As was discussed in Section 2, there is a Vmin operating point that occurs in the subthreshold operating region but is tied to operating points of less than 1 MHz. On the other hand, only a modest increase in energy is seen operating at the NTC region (around 0.5V), while frequency characteristics at that point are significantly better. At nominal operating points Subliminal operates at 20.5 MHz and 33.1 pJ/inst, showing approximately a 6.6x reduction in energy and an 11.4x reduction in frequency at the NTC operating point.
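
The reduction factors quoted above can be turned into an approximate NTC operating point with a few lines of arithmetic. The following is a minimal sketch, not a result from the paper: the function names are hypothetical, the derived NTC values are only as precise as the rounded 6.6x and 11.4x factors, and the power estimate assumes that the number of instructions per cycle is the same at both operating points.

```python
# Figures quoted in the text for the Subliminal processor.
NOMINAL_FREQ_MHZ = 20.5    # frequency at the nominal operating point
NOMINAL_ENERGY_PJ = 33.1   # energy per instruction at the nominal operating point
ENERGY_REDUCTION = 6.6     # approximate energy reduction at the NTC point
FREQ_REDUCTION = 11.4      # approximate frequency reduction at the NTC point


def ntc_operating_point(freq_mhz, energy_pj, freq_reduction, energy_reduction):
    """Return (frequency in MHz, energy per instruction in pJ) after scaling."""
    return freq_mhz / freq_reduction, energy_pj / energy_reduction


def power_reduction(freq_reduction, energy_reduction):
    """Ratio of nominal power to NTC power.

    ASSUMPTION (not from the paper): instructions per cycle are unchanged,
    so power scales as energy per instruction times clock frequency.
    """
    return freq_reduction * energy_reduction


ntc_freq_mhz, ntc_energy_pj = ntc_operating_point(
    NOMINAL_FREQ_MHZ, NOMINAL_ENERGY_PJ, FREQ_REDUCTION, ENERGY_REDUCTION
)
print(f"NTC point: about {ntc_freq_mhz:.2f} MHz, {ntc_energy_pj:.2f} pJ/inst")
print(f"Implied power reduction: about "
      f"{power_reduction(FREQ_REDUCTION, ENERGY_REDUCTION):.0f}x")
```

Under these assumptions the NTC point lands at roughly 1.8 MHz and 5.0 pJ/inst, which makes the tradeoff concrete: frequency falls by about an order of magnitude while energy per instruction falls by a smaller but still substantial factor.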
4. NTC Barriers:
Although NTC provides excellent energy-frequency tradeoffs, it does not come without its own set of complications. NTC faces three key barriers that must be overcome for widespread use: performance loss, performance variation, and functional failure. In the following subsections we discuss why each of these exists and why it poses a problem for the widespread adoption of NTC.
Figure 2: Energy and delay in different supply voltage operating regions.

[Plot residue: Energy/Inst (pJ) and Frequency (kHz) axes; annotated point at Vdd = 350 mV, 3.52 pJ/inst, 354 kHz]

The identification of an energy minimum has led to
technology optimization that opportunistically leverages the significantly improved silicon wearout characteristics (e.g., oxide breakdown) observed in low voltage NTC can be used to regain a substantial portion of the lost performance.

[Figure 4 plot: delay variation across the Sub Vt, NTC, and Super Vt regions; annotation: 5x]
Figure 4: Impact of voltage scaling on gate delay variation.

4.2. Increased performance variation. In the near-threshold regime, the dependencies of MOSFET drive current on Vth, Vdd, and temperature approach exponential. As a result, NTC designs display a dramatic increase in performance uncertainty. Figure 4 shows that performance variation due to process variation alone increases by approximately 5X, from ~30% (1.3X) [18] at nominal operating voltage to as much as 150% (2.5X) at 400mV. When combined with approximately 2X performance variation due to supply voltage ripple and 2X variation due to temperature, a total performance uncertainty of 10X emerges (2.5 x 2 x 2 = 10). Compared with a total performance uncertainty of ~1.5X at nominal voltage, the increased performance uncertainty of NTC circuits looms as a daunting challenge that has caused most designers to pass over low voltage design entirely. Simply adding margin so that all chips will meet the needed performance specification in the worst case is effective in nominal voltage design. However, in NTC design this approach results in some chips running at 1/10th their potential performance, which is wasteful both in performance and in energy due to leakage currents. Several techniques exist to help mitigate these problems, including Adaptive Body Biasing [19] and soft edge clocking [20].

4.3. Increased functional failure. The increased sensitivity to process, temperature and voltage variations not only impacts circuit performance but also circuit functionality. In particular, the mismatch in device strength due to process variations such as random dopant fluctuations (RDF) can compromise state elements as the feedback loop develops a natural inclination for one state over the other. This issue has been most pronounced in SRAM, where high yield requirements and the use of minimum sized devices limit variability tolerance. For instance, a typical 45nm SRAM cell has a failure probability of ~10^-7 at nominal voltage (see Figure 5). This low failure rate allows failing cells to be readily swapped using spare columns after fabrication. However, at an NTC voltage of 500mV, this failure rate increases by ~5 orders of magnitude to approximately 4% (4x10^-2). In this case, nearly every row and column will have at least one failing cell, and possibly multiple failures, rendering simple redundancy methods completely ineffective. Several approaches help overcome these failures: alternative SRAM designs that address this variability [21,22,23], new failure rate estimation techniques [24], and alternative cache designs [25].

Figure 5: Impact of voltage scaling on SRAM failure rates.

5. Addressing Performance Loss:
One of the most formidable challenges to widespread NTC penetration is overcoming the ~10x performance loss associated with NTC operation while maintaining energy efficiency. Below, we explore architectural and device level methods that form a complementary approach to address this challenge.

5.1 Cluster Based Architecture
In order to regain the performance lost in NTC without increasing supply voltage, Zhai et al. [26,27] propose the use of NTC based parallelism. In applications where there is an abundance of thread-level parallelism the intention is to use 10s to 100s of NTC processor cores that will regain 10-50X the performance, while remaining energy efficient. While traditional superthreshold many-core solutions have been studied, the NTC domain presents unique challenges and opportunities in these architectures. Of particular impact are the reliability of NTC memory cells and differing energy optimal voltage points for

situation can be common in high performance applications where threads work on independent data. However, these workloads often execute the same instruction sequences, allowing opportunity for savings with a clustered instruction cache. Initial research of this architecture shows that with a few processors (6-12), a 5-6X performance improvement can be achieved.
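
The parallelism argument of Section 5.1 can be illustrated with a back-of-the-envelope model. This is a minimal sketch under stated assumptions, not the model used by Zhai et al.: it takes the ~10x per-core performance loss quoted in Section 5 as the default slowdown, and it introduces a hypothetical `parallel_efficiency` factor (1.0 means perfect scaling across cores) that does not appear in the paper. The function names are likewise illustrative.

```python
import math


def aggregate_throughput(n_cores, per_core_slowdown=10.0, parallel_efficiency=1.0):
    """Throughput of n_cores NTC cores relative to ONE nominal-voltage core.

    per_core_slowdown: how much slower one NTC core is than a nominal core
        (~10x per Section 5).
    parallel_efficiency: ASSUMED fraction of ideal scaling that is achieved
        (hypothetical parameter, not from the paper).
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if per_core_slowdown <= 0 or not 0 < parallel_efficiency <= 1:
        raise ValueError("slowdown must be positive and efficiency in (0, 1]")
    return n_cores * parallel_efficiency / per_core_slowdown


def cores_to_match(target, per_core_slowdown=10.0, parallel_efficiency=1.0):
    """Smallest core count whose aggregate throughput reaches `target`."""
    if target <= 0 or per_core_slowdown <= 0 or not 0 < parallel_efficiency <= 1:
        raise ValueError("target and slowdown must be positive, efficiency in (0, 1]")
    return math.ceil(target * per_core_slowdown / parallel_efficiency)


for n in (10, 50, 100):
    print(f"{n:3d} NTC cores -> {aggregate_throughput(n):.1f}x one nominal core")
print("cores to match one nominal core at 50% efficiency:",
      cores_to_match(1.0, parallel_efficiency=0.5))
```

In this idealized model, 10 to 100 cores are needed just to span 1x to 10x the throughput of a single nominal-voltage core, and any loss of parallel efficiency raises the required core count proportionally. That is one way to see why the approach depends on abundant thread-level parallelism.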