VLSI Power Efficiency, Leakage, Dissipation and Management Techniques: A Survey
157 www.erpublication.org
systems and hence, we take the following approach to limit the scope of the paper. We include only those research works that propose methods for improving energy efficiency and also evaluate it. Works which only evaluate performance improvement are not included, although they may also lead to better energy efficiency. We review application-level and architectural-level techniques and not circuit-level techniques. Since different techniques have been evaluated using different platforms and methodologies, we focus only on their fundamental research idea and do not present the quantitative results.

A. Energy and Power Modelling
Power consumption in CMOS circuits can be static or dynamic. Current drawn continuously from the power supply causes static power dissipation in the system. Dynamic dissipation occurs during output switching because of short-circuit current and the charging and discharging of load capacitance. For existing CMOS technology, dynamic power is the dominant source of power consumption, although this might change for future high-scale integration.

B. Sources of Power Consumption
We briefly review the sources of power consumption in embedded systems and refer the reader to previous work [8, 9] for more details. The power consumption of embedded systems can be broadly divided into two categories, namely dynamic power and static power. The dynamic power ($P_{dyn}$) consumption arises from charging and discharging of the load capacitance, and from the short-circuit currents. The leakage power ($P_{leak}$) arises due to leakage currents that flow even when the device is inactive. Thus, we have

$$P_{dyn} = \alpha C V^{2} F \qquad (1)$$
$$P_{leak} = I_{leak} V \qquad (2)$$

Here $\alpha$ denotes the switching activity, $C$ the load capacitance, $F$ the operating frequency and $V$ the operating voltage. $I_{leak}$ denotes the leakage current. With CMOS scaling, the leakage power is increasing dramatically [7]. DVFS-based techniques work by reducing dynamic energy, while the techniques which transition the system to low-power modes aim to reduce leakage energy. For a given CMOS technology generation, dynamic power consumption can be reduced by adjusting the voltage and frequency of operation or by reducing the activity factor. For a given CMOS technology generation, the opportunity for saving leakage energy lies in redesigning the circuit to use low-power cells, reducing the total number of transistors, or putting some parts of caches into low (or zero) leakage mode. Based on these essential principles, several architectural techniques have been proposed.

C. Terminology
Test power is a possible major engineering problem in the future of SoC development. As both SoC designs and deep-submicron geometries become prevalent, larger designs, tighter timing constraints, higher operating frequencies, and lower applied voltages all affect the power consumption of silicon devices [4].

D. Energy
Energy is the total switching activity generated during test application. It affects the battery lifetime during power-up or periodic self-test of battery-operated devices.

E. Average Power
Average power is the total distribution of power over a time period. The ratio of energy to test time gives the average power. Elevated average power increases the thermal load that must be vented away from the device under test to prevent structural damage (hot spots) to the silicon, bonding wires, or package.

F. Instantaneous Power
Instantaneous power is the value of power consumed at any given instant. Usually, it is defined as the power consumed right after the application of a synchronizing clock signal. Elevated instantaneous power might overload the power distribution systems of the silicon or package, causing brown-out.

G. Peak Power
Peak power, the highest power value at any given instant, determines the component's thermal and electrical limits and the system packaging requirements. If peak power exceeds a certain limit, designers can no longer guarantee that the entire circuit will function correctly. In fact, the time window for defining peak power is related to the chip's thermal capacity, and forcing this window to one clock period is sometimes just a simplifying assumption. For example, consider a circuit that has peak power consumption during only one cycle but consumes power within the chip's thermal capacity for all other cycles. In this case, the circuit is not damaged, because the energy consumed, which corresponds to the peak power consumption times one cycle, will not be enough to elevate the temperature over the chip's thermal capacity limit (unless the peak power consumption is far higher than normal).

H. Sources of Power Dissipation
Power dissipation in digital CMOS circuits is caused by the following sources: the leakage current, which depends on the fabrication technology and consists of the reverse current in the parasitic diodes formed between the source and drain junction diffusions and the bulk substrate region of a MOS transistor, together with the sub-threshold current, which arises due to the inversion charge that exists at gate voltages below the threshold voltage; the standby current, which is the DC current drawn continuously from Vdd to ground; the short-circuit (rush-through) current, which is due to the DC path between the supply rails during output transitions; and the capacitance current, which flows to charge and discharge capacitive loads during logic changes.

II. POWER MANAGEMENT FOR LIMITED SIZE AND BATTERY
Power management is important for battery-operated mobile embedded systems, where energy supply is a crucial limitation. Power consumption leads to heating, which is undesirable in several domains such as embedded systems. Further, the small size of these systems limits the amount of heat dissipation that can be managed. Lower power consumption enables the use of smaller power supplies and reduces the heat dissipation overhead, which also reduces the cost, weight and area of embedded systems. Thus power management can lead to easier system design.
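As a rough numerical illustration of equations (1) and (2) and of the energy, average power and peak power terms defined above, the following Python sketch may help. It is only a sketch under stated assumptions: the function names (`dynamic_power`, `leakage_power`, `trace_metrics`) and every numeric value are hypothetical and are not taken from the surveyed works, and peak power is evaluated over a one-cycle window, which the text notes is a simplifying assumption.

```python
# Minimal sketch of equations (1) and (2) plus trace-based power metrics.
# All numeric values below are hypothetical and chosen only for illustration.

def dynamic_power(alpha, c_load, v_dd, freq):
    """Equation (1): P_dyn = alpha * C * V^2 * F, in watts."""
    return alpha * c_load * v_dd ** 2 * freq


def leakage_power(i_leak, v_dd):
    """Equation (2): P_leak = I_leak * V, in watts."""
    return i_leak * v_dd


def trace_metrics(power_trace, cycle_time):
    """Return (energy, average power, peak power) for a per-cycle power trace.

    power_trace holds one power sample (W) per clock cycle and
    cycle_time is the clock period in seconds.
    """
    energy = sum(p * cycle_time for p in power_trace)
    average = energy / (len(power_trace) * cycle_time)  # energy / test time
    peak = max(power_trace)  # highest sample, using a one-cycle window
    return energy, average, peak


# Assumed nominal operating point: alpha = 0.2, C = 1 nF, V = 1.0 V, F = 1 GHz.
nominal = dynamic_power(alpha=0.2, c_load=1e-9, v_dd=1.0, freq=1e9)
# Assumed DVFS point: voltage lowered to 0.8 V and frequency halved.
scaled = dynamic_power(alpha=0.2, c_load=1e-9, v_dd=0.8, freq=0.5e9)
# Assumed leakage current of 10 mA at the nominal voltage.
leak = leakage_power(i_leak=10e-3, v_dd=1.0)

# Assumed test trace: one high-power cycle among otherwise quiet cycles.
energy, average, peak = trace_metrics([1.0, 1.0, 4.0, 1.0, 1.0], cycle_time=1e-9)

print(f"P_dyn nominal = {nominal:.3f} W, scaled = {scaled:.3f} W")
print(f"P_leak = {leak:.3f} W")
print(f"energy = {energy:.2e} J, average = {average:.2f} W, peak = {peak:.2f} W")
```

Because voltage enters equation (1) quadratically, lowering both voltage and frequency in the assumed DVFS point reduces dynamic power by more than the frequency reduction alone, while the leakage term of equation (2) only scales linearly with voltage. The trace example also shows how a single high-power cycle raises the peak value far more than it raises the average.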
International Journal of Engineering and Technical Research (IJETR)
ISSN: 2321-0869, Volume-2, Issue-7, July 2014
III. LEAKAGE ENERGY SAVING APPROACHES

A. An Overview
As explained before, leakage energy saving approaches work by turning off a part of the cache to reduce the leakage energy consumption of the cache. Based on the data retentiveness of turned-off blocks, the leakage energy saving techniques are classified into two broad types, namely state-preserving and state-destroying techniques. The state-preserving techniques turn off a block while preserving its state (e.g., [10], [11]). This means that when the block is reactivated, it does not need to be fetched from the next level of memory. State-destroying techniques, in contrast, lose the contents of the block, so a later access must fetch it again. The energy saving techniques turn off the cache at the granularity (unit) of a certain cache space, such as a single way or a single block at a time. Based on this granularity, leakage energy saving techniques can be classified as way-level [12], [13] or cache sub-block level [14]. To demonstrate the typical values of the different cache parameters, we take the example of an 8-way set-associative cache of 2MB size with 64B block size and 8-byte sub-block. In such a cache, each way is 256KB, each set of 8 blocks is 512B, and the cache holds 32,768 blocks in 4096 sets. Achieving high granularity with the selective-ways approach requires the use of highly-associative caches, which also have high access time and energy. The selective-sets approach can potentially provide large granularity; however, in practice, it is observed that reducing the cache size below 1/8 or 1/16 significantly increases the miss-rate [15], [16]. Since leakage energy varies exponentially with temperature, an increase in chip temperature increases the leakage energy dissipation in caches, which, in turn, further increases the chip temperature. To take chip temperature into account while modelling and minimizing leakage energy, several techniques have been proposed [17], [18], [19].

For both state-preserving and state-destroying leakage control, architectural techniques make use of some well-known circuit-level mechanisms. Powell et al. [20] propose a circuit design named gated Vdd, which facilitates state-destroying leakage control. This technique adds an extra transistor in the supply voltage path or ground path of the SRAM (static random access memory) cell. To reduce the leakage energy of the SRAM cell, this transistor is turned off, and by the stacking effect of the transistor, the leakage current is reduced by orders of magnitude. In the drowsy cache, a state-preserving design, the cache controller reduces the leakage energy of the SRAM cell by switching the operating voltage of the cell to a low voltage, thus putting the cell in low-leakage mode. When this line is accessed the next time, the supply voltage is again switched to high, so that the cache block consumes normal power. Kim et al. [10] propose a super-drowsy circuit design and Agarwal et al. [21] propose a gated-ground circuit design, both of which behave similarly to the drowsy cache, except that they only require a single voltage supply. Similarly, another state-preserving circuit design, named multi-threshold CMOS (MTCMOS), dynamically changes the threshold voltage of the SRAM cell by modulating the back-gate bias voltage to transition the cell to low-leakage mode.

Mohyuddin et al. [22] propose a technique for saving leakage energy by maintaining different ways of a cache in different state-preserving power saving modes depending on their replacement priorities. Going from the MRU way to the LRU way, cache lines are kept in increasingly aggressive power saving modes, which also have increasingly larger cache line wakeup penalties. To dynamically reconfigure caches using the selective-ways approach, the program response for different numbers of cache ways needs to be estimated. For this purpose, researchers generally utilize utility monitors based on the Mattson stack algorithm. Similarly, for the selective-sets approach, researchers generally use the set-sampling method and multiple auxiliary tags to obtain profiling information.

IV. APPROACHES FOR SAVING BOTH DYNAMIC AND LEAKAGE ENERGY
Several studies present reconfigurable cache architectures which offer the flexibility to change one or more parameters of the cache. By taking advantage of this flexibility, both dynamic and leakage energy can be saved. Several researchers have presented techniques for synergistically using both leakage and dynamic energy saving techniques. For example, Giorgi and Bennati [24] demonstrate that using a filter cache [23] reduces the number of accesses to the L1 cache, which, in turn, enables leakage energy saving techniques to be used effectively in L1 caches. Similarly, Keramidas et al. [25] propose a way-selection based technique for additionally saving dynamic energy in caches which use decay-based leakage energy management. Their technique works on the observation that in a cache which uses the cache-decay mechanism [26] for saving leakage energy, several cache blocks may be dead. Thus, by making an early determination of these dead blocks, accesses to them can be avoided, which saves dynamic energy of the cache. Since the way-selection mechanism, unlike the way-prediction mechanism, gives definite information about a cache miss, it always leads to uniform cache hit latency.

A. Enabling Green Computing
It has been estimated that ICT (information and communications technology) contributes nearly 3% of the overall carbon footprint [27]. Thus, power management in embedded systems is also important for achieving the goals of green computing.

B. Using Power Modes
In embedded systems, the hardware typically provides a range of operating modes which can be used to save energy. Different modes consume different amounts of power and take different times to return to the normal mode. In general, the modes with lower energy consumption also take the longest time to return to the normal mode, and vice versa. To save energy while keeping the performance loss bounded, these modes should be used judiciously. Also, while a low-power mode can be used when the system is idle, the system must return to the normal mode to actually service a request or perform a task.

Hoeller et al. [28] propose an interface for power management of hardware and software components. Their method allows applications to express when certain components are not being used and, based on this information, individual components, subsystems or the whole system can be transitioned to low-power modes. This frees the programmer from the task of individually managing the power consumption of each component. Huang et al. [29] propose an energy saving technique which works by adaptively controlling the power mode of the embedded system according to historical arrivals of tasks. Their
technique decides when to transition the system from normal-power mode to low-power mode or vice versa, based on the relative time overhead and energy advantage of the mode transition and on the need to meet the deadlines of the tasks.

Awan et al. [30] propose an approach for saving energy in embedded systems using multiple low-power modes. Their technique computes the break-even time for each mode using offline analysis. Further, since early completion of a high-priority task creates slack, their technique accumulates this slack and uses it to save extra leakage energy in lower priority tasks by allowing the device to stay in low-power mode for a longer time.

C. Saving Energy in Specific Components
Several researchers propose micro-architectural techniques for saving energy in specific components of embedded systems. These techniques leverage application properties or variation in workload to dynamically reconfigure a component of the system to save energy. One such technique uses software-based RAM compression to increase the effective size of the memory. The memory compression is used only for those applications which may benefit in performance or energy from the compression. For such applications, compression of memory data and swapped-out pages is performed in an online manner, thus dynamically adjusting the size of the compressed RAM area.

D. Problems Induced by Excessive Test Power
When dealing with high-density systems such as modern ASICs and SoCs, a non-destructive test must satisfy all the power constraints defined in the design phase. In addition to preventing destruction of the circuit under test (CUT), cost, reliability, autonomy, performance-verification, and yield-related issues motivate power consumption minimization during test [31]. The cost constraints of consumer electronic products typically require plastic packages, which impose a tight limitation on power dissipation. Unfortunately, excessive switching activity during test leads to increased current flows in the CUT, making the use of expensive packages for the removal of excessive heat imperative. Moreover, electromigration causes the erosion of conductors and subsequently leads to circuit failure. As temperature and current density are major factors that determine the electromigration rate, elevated temperature and current density severely decrease CUT reliability. This phenomenon is even more severe in circuits equipped with BIST because such circuits might be tested frequently in, for example, online BIST strategies. Not only the reliability but also the autonomy of battery-powered remote and portable systems suffers from increased activity. Remote system operation occurs mostly in standby mode with almost no power consumption, interrupted by periodic self-tests. Hence, power savings during test mode directly prolong battery lifetime.

V. METHODS FOR POWER TESTING

A. Low Transition TPGs
One common technique to reduce test power consumption is the design of low transition test pattern generators (TPGs). Most of these techniques modify the design of the LFSR (or other forms of TPGs such as cellular automata) in such a way as to reduce the transitions in the primary inputs of the CUT for test-per-clock BIST or inside the scan-chain for scan-based BIST. An example of a low transition TPG for test-per-clock schemes is the approach presented in [23], called DS-LFSR. Another design, called low transition random test pattern generator (LT-RTPG), is composed of an LFSR, a k-input AND gate, and a toggle flip-flop (T-FF). Some cells of the LFSR are connected with the inputs of the k-input AND gate, the output of the AND gate is connected with the input of the T-FF, and the output of the T-FF is connected with the scan-chain input (Sin). Since the output of the AND gate (the input of the T-FF) is 0 in most of the clock cycles, the T-FF output will not toggle and neighbouring scan cells will be loaded with the same value in most cases; hence the transition probability in the CUT will decrease. The main drawback of this system is that it reduces the average power only while scanning in a test vector, not while scanning out the captured response. Also, in order to get a high fault-coverage, a long test sequence is needed.

B. Test Vectors Reordering
The test vector reordering techniques aim to reduce the switching activity by modifying the order in which the test vectors are applied to the CUT. If the number of transitions between two consecutive vectors is reduced (i.e. the Hamming distance between two consecutive vectors is minimum), then the WSA will be reduced in the whole CUT [32].

C. Scan Cells Reordering Techniques
Another category of techniques used to reduce the power consumption in scan-based BIST is the use of scan-chain cell ordering techniques [33]. Changing the order of the scan cells in each scan-chain can reduce the switching activity, and hence the power dissipation, in scan designs. In the case of a deterministic set of test patterns, the best order of cells is the one that gives the best compromise between reducing the transitions in the scan cells while scanning in test patterns and while scanning out captured responses.

D. Vector Filtering Techniques
The test vectors that are generated by TPGs such as LFSRs are pseudorandom vectors. The fault detection capability of these vectors quickly reaches diminishing returns. Hence, after a sequence of test vectors has been run and many faults have been detected, only a few of the subsequent test vectors can still detect new faults. The vectors that do not detect new faults, but do consume power when applied to the CUT, can be filtered or inhibited from being applied to the CUT [34]. These techniques, in general, use extra logic (e.g. decoder circuitry). Using prior knowledge of the sequences of test vectors generated by TPGs such as the LFSR, they can prevent some sequences from being transmitted to the CUT by knowing the first and last vectors of each such sequence. Thus they reduce the power consumption in the CUT.

E. Low Power Test Vector Compaction
In scan-based circuits, compaction techniques are introduced to merge several test cubes in order to reduce the test data volume. However, compacting test vectors greatly increases the power dissipation (it could be several times higher). Thus, low power test vector compaction techniques have been introduced to minimize the number of test cubes generated by
the ATPG tool by merging test cubes that are compatible in all bit positions under a power constraint [35]. By carefully merging the test cubes in a specific manner, the number of transitions in the scan-chain can be minimized.

F. Scan Architecture Modification
This technique involves modifying the scan architecture by inserting new elements and partitioning the scan-chain into segments. In [36] the scan-chain is partitioned into N segments where only one segment is active at a time. This technique reduces the average power consumption in the CUT, but it will not affect the power [...] will be enabled by using the gated clock trees instead of scan enable signals as was used in the previous technique.

G. Adaptive Shift Power Control Technique
To reduce the scan shift power consumption in logic BIST by using highly correlated test stimulus bits among adjacent scan cells, all existing methods only manipulate the test stimulus sequences generated by the LFSR in various ways, and the test responses are ignored completely. Although it has been observed that the Hamming distance between a test stimulus and its captured test response is typically small, the test stimulus of a test pattern is loaded into the scan chains at the same time as the test response of the previous test pattern is unloaded from the scan chains.

VI. INCREASING ENCODING EFFICIENCY OF LFSR RESEEDING-BASED TEST COMPRESSION
Usually, the deterministic test set to be encoded by LFSR reseeding tends to have a biased probability for the logic value 1 or 0 at each primary input. The biased inputs are fixed to the logic value 1 or 0 with some combinational logic, so that the amount of data to be encoded by the LFSR can be reduced considerably. The combinational logic for bit fixing has to set a primary input to the logic value 0 (or 1) if the corresponding probability of the logic value 0 (or 1) is one. Otherwise, the test pattern from the pseudorandom test pattern generator, such as an LFSR, is directly applied to the CUT.

VII. CONCLUSION
Driven by continuous innovations in CMOS fabrication technology, recent years have witnessed widespread use of multi-core processors and large on-chip caches for achieving high performance. However, due to this, the total power consumption of processors is rapidly approaching the power wall imposed by the thermal limitations of cooling solutions and power delivery. Thus, to be able to continue achieving higher performance using technological scaling, managing the power consumption of processors has become a vital necessity. In this paper, we have reviewed several architectural techniques proposed for managing dynamic and leakage power in caches. A qualitative survey of low power testing techniques and their methodology was carried out. In the analysis, all dimensions of power during chip testing were considered as parameters. Low power design requires a rethinking of the conventional design process, where power concerns are often overridden by performance and area considerations. This clearly highlights the need for power management in embedded systems. To cope with these challenges, power management is necessary at all levels, viz. chip-design level, micro-architectural level, application level and system level. We believe that our survey will enable researchers and engineers to understand the state of the art in micro-architectural techniques for improving cache energy efficiency, motivating them to design novel solutions for addressing the challenges posed by future trends of CMOS fabrication and processor design, and for addressing the challenges of power consumption and architecting the highly energy-efficient embedded systems of tomorrow.

REFERENCES
[1] S. Murugesan, Harnessing green IT: Principles and practices, IT Professional, vol. 10, no. 1, pp. 24-33, 2008.
[2] S. Borkar, Design challenges of technology scaling, IEEE Micro, vol. 19, no. 4, pp. 23-29, Jul. 1999.
[3] G. Gammie, A. Wang, H. Mair, R. Lagerquist, M. Chau, P. Royannez, S. Gururajarao, and U. Ko, Smartreflex power and performance management technologies for 90 nm, 65 nm, and 45 nm mobile application processors, Proceedings of the IEEE, vol. 98, no. 2, pp. 144-159, 2010.
[4] International technology roadmap for semiconductors (ITRS), http://www.itrs.net/Links/2011ITRS/2011Chapters/2011ExecSum.pdf, 2011.
[5] S. Borkar, Thousand core chips: a technology perspective, in 44th annual Design Automation Conference. ACM, 2007, pp. 746-749.
[6] J. Lorch and A. Smith, Software strategies for portable computer energy management, IEEE Personal Commun., vol. 5, pp. 60-73, June 1998.
[7] S. Wang and S.K. Gupta, DS-LFSR: A New BIST TPG for Low Heat Dissipation, Proc. Int'l Test Conf. (ITC 97), IEEE Press, Piscataway, N.J., 1997, pp. 848-857.
[8] X. Zhang, K. Roy, and S. Bhawmik, POWERTEST: A tool for energy conscious weighted random pattern testing, Proceedings of International Conference on VLSI Design, pp. 416-422, January 1999.
[9] B. Pouya and A. Crouch, Optimization Trade-offs for Vector Volume and Test Power, Proc. Int'l Test Conf. (ITC 00), IEEE Press, Piscataway, N.J., 2000, pp. 873-881.
[10] N. Kim, K. Flautner, D. Blaauw, and T. Mudge, Single-VDD and single-VT super-drowsy techniques for low-leakage high-performance instruction caches, in International Symposium on Low Power Electronics and Design (ISLPED), 2004, pp. 54-57.
[11] H. Hanson, M. Hrishikesh, V. Agarwal, S. Keckler, and D. Burger, Static energy reduction techniques for microprocessor caches, IEEE Transactions on VLSI Systems, vol. 11, no. 3, pp. 303-313, 2003.
[12] A. Bardine, M. Comparetti, P. Foglia, G. Gabrielli, C. Prete, and P. Stenström, Leveraging data promotion for low power D-NUCA caches, in 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools (DSD). IEEE, 2008, pp. 307-316.
[13] S. Petit, J. Sahuquillo, J. Such, and D. Kaeli, Exploiting temporal locality in drowsy cache policies, in 2nd Conference on Computing Frontiers. ACM, 2005, pp. 371-377.
[14] M. A. Z. Alves et al., Energy savings via dead sub-block prediction, in International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2012.
[15] S. Mittal and Z. Zhang, EnCache: Improving cache energy efficiency using a software-controlled profiling cache, in IEEE International Conference on Electro/Information Technology, Indianapolis, USA, May 2012.
[16] S.-H. Yang, B. Falsafi, M. D. Powell, K. Roy, and T. N. Vijaykumar, An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches, in 7th International Symposium on High-Performance Computer Architecture (HPCA), 2001.
[17] S. Kaxiras, P. Xekalakis, and G. Keramidas, A simple mechanism to adapt leakage-control policies to temperature, in International Symposium on Low Power Electronics and Design (ISLPED), 2005, pp. 54-59.
[18] L. Yuan, S. Leventhal, J. Gu, and G. Qu, TALk: A Temperature Aware Leakage Minimization Technique for Real-Time Systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 10, pp. 1564-1568, 2011.
[19] L. He, W. Liao, and M. Stan, System level leakage reduction considering the interdependence of temperature and leakage, in Design Automation Conference, 2004, pp. 12-17.
[20] M. Powell, S.-H. Yang, B. Falsafi, K. Roy, and T. Vijaykumar, Gated Vdd: a circuit technique to reduce leakage in deep-submicron cache