Physics of Power Dissipation in CMOS FET Devices
2. Physics of Power Dissipation in CMOS FET Devices
• For an ideal MIS diode, the energy difference ψms between the metal work function ψm and the semiconductor work function ψs is zero:
• $\psi_{ms} \equiv \psi_m - \left(\chi + \frac{E_g}{2q} + \psi_B\right) = 0$ (2.1)
CMOS Gate Power equations
• $P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
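The three terms of the gate power equation (switching, short-circuit, and leakage) can be evaluated separately to see which one dominates. The sketch below is illustrative only: the function name and every numeric value are assumptions, not figures from these slides.

```python
def cmos_gate_power(c_load, v_dd, f_01, t_sc, i_peak, i_leak):
    """Return (switching, short_circuit, leakage, total) power in watts.

    c_load : load capacitance CL in farads
    v_dd   : supply voltage in volts
    f_01   : frequency of 0->1 output transitions in hertz
    t_sc   : short-circuit conduction time per transition in seconds
    i_peak : peak short-circuit current in amperes
    i_leak : leakage current in amperes
    """
    p_switching = c_load * v_dd**2 * f_01
    p_short_circuit = t_sc * v_dd * i_peak * f_01
    p_leakage = v_dd * i_leak
    total = p_switching + p_short_circuit + p_leakage
    return p_switching, p_short_circuit, p_leakage, total


# Hypothetical operating point, chosen only for illustration.
p_sw, p_sc, p_lk, p_tot = cmos_gate_power(
    c_load=50e-15, v_dd=1.8, f_01=100e6,
    t_sc=50e-12, i_peak=100e-6, i_leak=1e-9,
)
print(f"switching={p_sw:.3e} W, short-circuit={p_sc:.3e} W, "
      f"leakage={p_lk:.3e} W, total={p_tot:.3e} W")
```

With these assumed values the switching term is the largest, which is the usual situation for a gate that toggles often.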
• Maxwell-Boltzmann statistics relate the equilibrium hole concentration to the intrinsic Fermi level:
• $p_0 = n_i \exp\left(\frac{E_i - E_F}{kT}\right)$ (2.2)
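As a quick numerical reading of Eq. (2.2), the sketch below evaluates p0 for a few values of Ei − EF. The intrinsic concentration of 1e10 cm^-3 and the temperature of 300 K are assumed, typical textbook values for silicon, and the function name is hypothetical.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K


def hole_concentration(n_i, ei_minus_ef_ev, temperature_k=300.0):
    """Equilibrium hole concentration p0 = ni * exp((Ei - EF) / kT).

    n_i is returned unchanged when Ei == EF (intrinsic material).
    Energies are in eV; the result has the same units as n_i.
    """
    return n_i * math.exp(ei_minus_ef_ev / (K_B_EV * temperature_k))


N_I_SI = 1.0e10  # assumed intrinsic carrier concentration of Si, cm^-3

for delta_ev in (0.0, 0.2, 0.4):
    p0 = hole_concentration(N_I_SI, delta_ev)
    print(f"Ei - EF = {delta_ev:.1f} eV -> p0 = {p0:.3e} cm^-3")
```

Each additional kT of separation between Ei and EF multiplies p0 by e, so a few tenths of an eV span many orders of magnitude.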
P substrate (the Fermi level EF in the semiconductor is now –qV below the Fermi level in the metal gate)
• If the applied voltage is increased sufficiently, the bands bend far enough that level Ei at the surface crosses over to the other side of level EF.
• This is brought about by the tendency of carriers to occupy states with the lowest total energy.
• In the present condition of inversion, the level Ei bends to be closer to level Ec and electrons outnumber holes at the surface.
• Ei at the surface is now below EF by an amount of energy equal to 2ψB, where ψB is the potential difference between the Fermi level EF and the intrinsic Fermi level Ei in the bulk.
• The value of V necessary to reach the onset of strong inversion is called the threshold voltage.
Surface Space Charge Region and the Threshold Voltage
• Poisson equation
• $\nabla \cdot \mathbf{D} = \rho(x, y, z)$ (2.3)
• where D, the electric displacement vector, is equal to εs E under low-frequency or static conditions; εs is the permittivity of Si; E is the electric field vector; and ρ(x, y, z) is the total electric charge density.
Threshold voltage
• $V_T = \frac{2d}{\varepsilon_i}\sqrt{q\,\varepsilon_s N_A \psi_B \left(1 - e^{-2\beta\psi_B}\right)} + 2\psi_B$
• The total voltage needed to offset the effect of a nonzero work function difference and the presence of the charges is referred to as the flat-band voltage VFB.
• $V_{FB} = \psi_{ms} - \frac{Q_T\, d}{\varepsilon_i}$
Threshold voltage
• $V_T = \frac{2d}{\varepsilon_i}\sqrt{q\,\varepsilon_s N_A \psi_B \left(1 - e^{-2\beta\psi_B}\right)} + 2\psi_B + V_{FB}$
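To get a feel for the magnitudes in the threshold voltage expression, the sketch below evaluates it in SI units. It assumes β = q/kT and obtains ψB from Eq. (2.2) by setting p0 = NA, which gives ψB = (kT/q) ln(NA/ni). The oxide thickness, doping, intrinsic concentration, and flat-band voltage are illustrative assumptions, and the function names are hypothetical.

```python
import math

Q = 1.602176634e-19      # elementary charge, C
K_B = 1.380649e-23       # Boltzmann constant, J/K
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def bulk_potential(n_a, n_i, temperature_k=300.0):
    """psi_B = (kT/q) * ln(NA/ni), from Eq. (2.2) with p0 = NA."""
    return (K_B * temperature_k / Q) * math.log(n_a / n_i)


def threshold_voltage(d, eps_i, eps_s, n_a, psi_b, v_fb=0.0,
                      temperature_k=300.0):
    """VT = (2d/eps_i) * sqrt(q eps_s NA psi_B (1 - exp(-2 beta psi_B)))
            + 2 psi_B + VFB, with beta = q/kT (assumed definition)."""
    beta = Q / (K_B * temperature_k)
    charge_term = math.sqrt(
        Q * eps_s * n_a * psi_b * (1.0 - math.exp(-2.0 * beta * psi_b))
    )
    return (2.0 * d / eps_i) * charge_term + 2.0 * psi_b + v_fb


# Assumed example device: 10 nm SiO2 on p-type Si doped at 1e17 cm^-3.
n_a = 1e17 * 1e6          # acceptors per m^3
n_i = 1e10 * 1e6          # assumed intrinsic concentration per m^3
psi_b = bulk_potential(n_a, n_i)
vt_ideal = threshold_voltage(10e-9, 3.9 * EPS0, 11.7 * EPS0, n_a, psi_b)
vt_real = threshold_voltage(10e-9, 3.9 * EPS0, 11.7 * EPS0, n_a, psi_b,
                            v_fb=-0.9)  # hypothetical flat-band voltage
print(f"psi_B = {psi_b:.3f} V, VT (VFB = 0) = {vt_ideal:.3f} V, "
      f"VT (VFB = -0.9 V) = {vt_real:.3f} V")
```

The flat-band voltage enters purely as an additive shift, so a negative VFB lowers the threshold by the same amount.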
Effects Influencing Threshold Voltage
• VT decreases when L (length) is decreased,
varies with Z (width), and decreases when
the drain-source voltage VDS is increased.
• Drain-induced barrier lowering (DIBL) is
the basis for a number of more complex
models of the threshold voltage shift.
• It refers to the decrease in threshold voltage
due to the depletion region charges in the
potential barrier between the source and the
channel at the semiconductor surface.
• A recent model adopts a quasi-two-dimensional approach to solving the two-dimensional Poisson equation.
• dEx/dx at each point (x, y) can be replaced
with the average of its value at (0, y) and at
(W, y)
Short channel effect
• The minimum value of the surface potential
increases with decreasing channel length
and increasing VDS.
Subsurface Drain-Induced Barrier Lowering (Punchthrough)
• The punchthrough voltage VPT is defined as the value of VDS at which ID,st reaches some specific magnitude with VGS = 0.
• The parameter VPT can be roughly approximated as the value of VDS for which the sum of the widths of the source and the drain depletion regions becomes equal to L.
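The rough approximation of VPT can be turned into a number once a depletion-width model is chosen. The sketch below assumes one-sided abrupt junctions in a uniformly doped p-substrate, so each width is W = sqrt(2 εs V / (q NA)), with V = Vbi at the source and V = Vbi + VDS at the drain. That junction model, the built-in potential, and the device values are assumptions for illustration and are not taken from these slides.

```python
import math

Q = 1.602176634e-19              # elementary charge, C
EPS_S = 11.7 * 8.8541878128e-12  # permittivity of Si, F/m


def depletion_width(v_total, n_a):
    """One-sided abrupt junction width for total potential v_total (assumed model)."""
    return math.sqrt(2.0 * EPS_S * v_total / (Q * n_a))


def punchthrough_voltage(length, n_a, v_bi):
    """VDS at which source and drain depletion widths sum to the channel length.

    Returns 0.0 if the two regions already touch at VDS = 0.
    """
    w_source = depletion_width(v_bi, n_a)
    if 2.0 * w_source >= length:
        return 0.0
    w_drain = length - w_source
    # Invert W = sqrt(2 eps_s (Vbi + VDS) / (q NA)) for VDS.
    return Q * n_a * w_drain**2 / (2.0 * EPS_S) - v_bi


# Hypothetical device: L = 0.5 um, NA = 1e17 cm^-3, Vbi = 0.9 V.
L_CH, N_A, V_BI = 0.5e-6, 1e23, 0.9
v_pt = punchthrough_voltage(L_CH, N_A, V_BI)
print(f"Estimated VPT = {v_pt:.2f} V")
```

Because the required drain depletion width grows with L while the voltage grows with its square, the estimate drops quickly as the channel is shortened.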
• If the field in the oxide, Eox, is large enough, the voltage drop across the depletion layer suffices to enable tunneling in the drain via a near-surface trap.
• The minority carriers emitted to the incipient inversion layer are laterally removed to the substrate, completing a path for a gate-induced drain leakage (GIDL) current. In CMOS circuits this leakage current contributes to standby power.
2.3 Power Dissipation in CMOS
• The first ICs ever fabricated used a PMOS process. This is due to the simplicity of fabrication of a p-channel enhancement mode MOS field-effect transistor (PMOST) with threshold voltage VTp < 0.
• The charge mobility factor caused the move to the
NMOS process.
• The move from NMOS to CMOS then followed because of the power dissipation problem.
• This advantage of CMOS over NMOS has proven
to be important enough that the shortcomings of
CMOS are overlooked.
• The shortcomings: the CMOS process is more complex than the NMOS process, CMOS requires guard rings to get around the latch-up problem, and CMOS circuits require more transistors than the equivalent NMOS circuits.
• The threshold voltages place a limit on the minimum supply voltage that can be used without incurring unreasonable delay penalties.
• If the threshold voltage is too low, the static component of the power due to subthreshold currents becomes significant.
2.3.1 Short-Circuit Dissipation
• The short-circuit dissipation of the gate varies with the output load and the input signal
• The short-circuit dissipation decreases roughly linearly, both in absolute terms and as a fraction of the total dissipation, as the output load is increased up to a critical value; beyond that value it increases again rapidly.
• For simplicity, a symmetrical inverter (i.e., βn = βp and VTn = −VTp) and a symmetrical input signal (rise time = fall time) are considered.
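• $I = \frac{\beta}{2}\left(V_{in} - V_T\right)^2$ for $0 \le I \le I_{max}$
• $I_{mean} = \frac{1}{T}\int_0^T I(t)\,dt = 2\cdot\frac{2}{T}\int_{t_1}^{t_2}\frac{\beta}{2}\left(V_{in}(t) - V_T\right)^2 dt$
• Assuming the rising and falling portions of the input voltage waveform to be linear ramps:
• $V_{in}(t) = \frac{V_{DD}}{\tau}\,t$
• $I_{mean} = 2\cdot\frac{2}{T}\int_{(V_T/V_{DD})\tau}^{\tau/2}\frac{\beta}{2}\left(\frac{V_{DD}}{\tau}\,t - V_T\right)^2 dt$
• Let $\theta = \frac{V_{DD}}{\tau}\,t - V_T$, so that $d\theta = \frac{V_{DD}}{\tau}\,dt$:
• $I_{mean} = \frac{2\beta}{T}\,\frac{\tau}{V_{DD}}\int_0^{V_{DD}/2 - V_T}\theta^2\,d\theta$
• $I_{mean} = \frac{1}{12}\,\frac{\beta}{V_{DD}}\left(V_{DD} - 2V_T\right)^3\frac{\tau}{T}$
• The short-circuit power dissipation of an unloaded inverter is
• $P_{SC} = V_{DD}\,I_{mean} = \frac{\beta}{12}\left(V_{DD} - 2V_T\right)^3\frac{\tau}{T}$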
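The closed form for the unloaded symmetrical inverter, PSC = (β/12)(VDD − 2VT)³ τ/T, can be checked by integrating the saturation current numerically over the conducting part of a linear input ramp, from t1 = (VT/VDD)τ to t2 = τ/2, and counting that interval four times per period. The sketch below does this with a midpoint rule; the device parameters are assumed values for illustration only.

```python
def i_mean_closed_form(beta, v_dd, v_t, tau, period):
    """Imean = beta / (12 VDD) * (VDD - 2 VT)^3 * tau / T."""
    return beta / (12.0 * v_dd) * (v_dd - 2.0 * v_t) ** 3 * tau / period


def i_mean_numeric(beta, v_dd, v_t, tau, period, steps=100_000):
    """Midpoint-rule integration of I = beta/2 (Vin - VT)^2 on a linear ramp."""
    t1 = v_t / v_dd * tau      # input crosses VT, conduction starts
    t2 = tau / 2.0             # input reaches VDD/2, current peaks
    dt = (t2 - t1) / steps
    charge = 0.0
    for k in range(steps):
        t = t1 + (k + 0.5) * dt
        v_in = v_dd * t / tau
        charge += 0.5 * beta * (v_in - v_t) ** 2 * dt
    # Two symmetric halves per edge, two edges per period.
    return 4.0 * charge / period


# Hypothetical inverter: beta = 100 uA/V^2, VDD = 3.3 V, VT = 0.7 V,
# input edge time tau = 1 ns, period T = 10 ns.
params = dict(beta=100e-6, v_dd=3.3, v_t=0.7, tau=1e-9, period=10e-9)
i_closed = i_mean_closed_form(**params)
i_num = i_mean_numeric(**params)
p_sc = params["v_dd"] * i_closed
print(f"Imean closed form = {i_closed:.4e} A, numeric = {i_num:.4e} A")
print(f"PSC = {p_sc:.4e} W")
```

The cubic dependence on (VDD − 2VT) means the short-circuit term vanishes when VDD falls to 2VT, since the two transistors then never conduct simultaneously.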
• If the inverter is lightly loaded, causing output rise
and fall times that are relatively shorter than the
input rise and fall times, the short-circuit
dissipation increases to become comparable to
dynamic dissipation.
• To minimize dissipation, an inverter should be designed so that the input rise and fall times are about equal to the output rise and fall times.
2.3.2 Dynamic Dissipation
• Assuming that the input Vin is a square wave having a period T and that the rise and fall times of the input are much less than the repetition period, the dynamic dissipation is given by
• $P_D = C_L V_{DD}^2 / T$
• When V = VDD, $E_{0\to1} = C_L V_{DD}^2$.
• Since the energy stored in a capacitor with capacitance CL and voltage VDD across its plates is $C_L V_{DD}^2/2$, the rest of the energy, another $C_L V_{DD}^2/2$, is converted into heat.
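The equal split between stored energy and heat can be verified by modeling the pull-up path as a fixed resistance R charging CL, so that i(t) = (VDD/R) exp(−t/RC). The resistor model and the numeric values below are assumptions for illustration; the result does not depend on the value chosen for R.

```python
import math


def charging_energies(c_load, v_dd, r_on, steps=200_000, n_tau=20.0):
    """Integrate supply energy and resistive heat for a 0 -> VDD transition.

    Returns (e_supply, e_heat, e_stored) in joules. The integration runs
    for n_tau time constants, after which the remaining current is negligible.
    """
    tau = r_on * c_load
    dt = n_tau * tau / steps
    e_supply = 0.0
    e_heat = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = v_dd / r_on * math.exp(-t / tau)
        e_supply += v_dd * i * dt      # energy drawn from the supply
        e_heat += i * i * r_on * dt    # energy dissipated in the resistor
    e_stored = 0.5 * c_load * v_dd**2  # energy left on the capacitor
    return e_supply, e_heat, e_stored


# Hypothetical values: CL = 50 fF, VDD = 1.8 V, R = 5 kOhm.
e_supply, e_heat, e_stored = charging_energies(50e-15, 1.8, 5e3)
print(f"supply = {e_supply:.4e} J, heat = {e_heat:.4e} J, "
      f"stored = {e_stored:.4e} J")
```

On the following 1→0 transition the stored half is dissipated in the pull-down path, so a full cycle converts CL VDD² into heat, which gives PD = CL VDD²/T.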
Networks of pass transistors
2.3.3 The Load Capacitance
• The overall load capacitance is modeled as the parallel combination of four capacitors: the gate capacitance Cg, the overlap capacitance Cov, the diffusion capacitance Cdiff, and the interconnect capacitance Cint.
The Overlap Capacitance
• Cgd1 = Cgd2 = 2 Cox xd W
• Cgd3 = Cgd4 = Cgs3 = Cgs4 = Cox xd W
• The total overlap capacitance is simply the sum of all the above:
– Cov = Cgd1 + Cgd2 + Cgd3 + Cgd4 + Cgs3 + Cgs4
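The overlap sum can be written out directly from the six terms above. The sketch assumes, as the formulas do, a single width W for all transistors; the oxide capacitance, overlap length, and width are illustrative values, not data from the slides.

```python
def overlap_capacitance(c_ox, x_d, width):
    """Total overlap capacitance Cov from the six gate-overlap terms.

    c_ox  : oxide capacitance per unit area, F/m^2
    x_d   : gate overlap length, m
    width : transistor width W, m (same W assumed for every device)
    """
    c_unit = c_ox * x_d * width
    c_gd1 = c_gd2 = 2.0 * c_unit               # Cgd1 = Cgd2 = 2 Cox xd W
    c_gd3 = c_gd4 = c_gs3 = c_gs4 = c_unit     # remaining terms: Cox xd W
    return c_gd1 + c_gd2 + c_gd3 + c_gd4 + c_gs3 + c_gs4


# Hypothetical values: Cox = 3.45e-3 F/m^2, xd = 50 nm, W = 1 um.
c_ov = overlap_capacitance(3.45e-3, 50e-9, 1e-6)
print(f"Cov = {c_ov:.3e} F")
```

With one common width the sum collapses to 8 Cox xd W; a per-transistor width would only require passing each W separately.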
Diffusion Capacitance
• Two components: the bottomwall area capacitance and the sidewall capacitance
2.4.1 Principles of Low-Power
• Using the lowest possible supply voltage
• Using the smallest geometry, highest frequency devices
but operating them at the lowest possible frequency
• Using parallelism and pipelining to lower required
frequency of operation
• Power management by disconnecting the power source
when the system is idle
• Designing systems to have lowest requirements on
subsystem performance for the given user level
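The first and third principles interact through the dynamic power relation P = C VDD² f: duplicating a datapath raises the switched capacitance, but each copy can run at half the frequency and therefore tolerate a lower supply voltage. The sketch below compares the two cases; the capacitance overhead factor and both supply voltages are hypothetical numbers chosen only to illustrate the trade-off.

```python
def dynamic_power(c_eff, v_dd, freq):
    """Dynamic power P = C * VDD^2 * f (any consistent units)."""
    return c_eff * v_dd**2 * freq


# Reference design, normalized: C = 1, f = 1, VDD = 5.0 V (assumed).
p_reference = dynamic_power(c_eff=1.0, v_dd=5.0, freq=1.0)

# Two-way parallel design (all values assumed for illustration):
# 2.15x capacitance including routing and multiplexing overhead,
# half the clock frequency, supply lowered to 2.9 V.
p_parallel = dynamic_power(c_eff=2.15, v_dd=2.9, freq=0.5)

ratio = p_parallel / p_reference
print(f"parallel / reference power = {ratio:.3f}")
```

Because the supply voltage enters squared, the voltage reduction outweighs the extra capacitance in this example.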
2.4.3 Fundamental Limits
• The limit from thermodynamic principles results from the need to have, at any node with an equivalent resistor R to the ground, the signal power Ps exceed the available noise power Pavail.
• The quantum theoretic limit on low power comes from the Heisenberg uncertainty principle. In order to be able to measure the effect of a switching transition of duration Δt, it must involve an energy greater than h/Δt:
• $P \ge h/(\Delta t)^2$, where h is Planck's constant.
• Finally, the fundamental limit based on electromagnetic theory requires that the velocity of propagation of a high-speed pulse on an interconnect always be less than the speed of light in free space, c0:
• $L/\tau \le c_0$, where L is the length of the interconnect and τ is the interconnect transit time.
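The quantum and electromagnetic limits are simple enough to evaluate directly. The sketch below computes the minimum power h/(Δt)² for a given switching time and the minimum transit time L/c0 for a given interconnect length; the 1 ps and 1 cm inputs are arbitrary example values and the function names are hypothetical.

```python
H_PLANCK = 6.62607015e-34  # Planck's constant, J*s
C0 = 299_792_458.0         # speed of light in free space, m/s


def quantum_power_limit(delta_t):
    """Lower bound on switching power, P >= h / (delta_t)^2, in watts."""
    return H_PLANCK / delta_t**2


def min_transit_time(length):
    """Lower bound on interconnect transit time, tau >= L / c0, in seconds."""
    return length / C0


p_min = quantum_power_limit(1e-12)   # 1 ps switching transition (example)
tau_min = min_transit_time(1e-2)     # 1 cm interconnect (example)
print(f"P >= {p_min:.3e} W for a 1 ps transition")
print(f"tau >= {tau_min:.3e} s for a 1 cm interconnect")
```

Both bounds sit far below the operating points of practical CMOS circuits, which is why material, device, and system limits matter more in design.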
2.4.4 Material Limits
• The attributes of a semiconductor material that determine the properties of a device built with the material are
• Carrier mobility μ
• Carrier saturation velocity σs
• Self-ionizing electric field strength Ec
• Thermal conductivity K
• Consider an SOI structure formed by surrounding the above generic device with a hemispherical shell of SiO2 of radius ri, which introduces a two-order-of-magnitude reduction in thermal conductivity.
• The response time of the global interconnect circuit is
• $\tau = (2.3\,R_{tr} + R_{int})\,C_{int}$, where Rtr is the output resistance of the driving transistor and Rint and Cint are the total resistance and capacitance, respectively, of the global interconnect.
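A short numerical example of the interconnect response time formula follows. The driver resistance, wire resistance, and wire capacitance are assumed values for illustration.

```python
def interconnect_response_time(r_tr, r_int, c_int):
    """tau = (2.3 * Rtr + Rint) * Cint, in seconds.

    r_tr  : output resistance of the driving transistor, ohms
    r_int : total interconnect resistance, ohms
    c_int : total interconnect capacitance, farads
    """
    return (2.3 * r_tr + r_int) * c_int


# Hypothetical global wire: Rtr = 1 kOhm, Rint = 500 Ohm, Cint = 1 pF.
tau = interconnect_response_time(1e3, 500.0, 1e-12)
print(f"tau = {tau:.2e} s")
```

The driver term carries the larger weight, so for short or wide wires the response is set mainly by the driving transistor, while long thin wires shift the balance toward Rint.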
2.4.7 System Limits
• The architecture of the chip
• The power-delay product of the CMOS
technology used to implement the chip
• The heat removal capacity of the chip
• The clock frequency
• Its physical size
Energy characterization
• Transition-sensitive energy models
– Single energy tables
• Bit independent modules e.g., flipflops
– Multiple energy tables
• Large bit dependent modules e.g., 32-b adders
• Large multi-element modules e.g., register files
– Transition sensitive energy equations
– System level interconnect capacitance values
• Analytical energy models
– Cache and main memory
Transition-sensitive energy
• Must first design and layout a functional unit and
then simulate it to capture switch capacitances
– Bit independent – bus lines, pipeline registers
• One bit switching does not affect other bit slices' operations
– Bit dependent – ALU, decoders
• Once constructed, the models can be reused in
simulations of other architectures built with the
same technology
Switch Capacitance Table
Previous Input Vector    Current Input Vector    Switch Capacitance
0…00                     0…00                    cap0→0
…                        …                       …
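A switch capacitance table of this shape maps naturally onto a dictionary keyed by (previous vector, current vector). The sketch below looks up the capacitance for each consecutive pair in an input trace and sums the results; the one-bit table and its capacitance values are made up for illustration and do not come from any characterized module.

```python
from typing import Dict, Sequence, Tuple

SwitchCapTable = Dict[Tuple[str, str], float]


def total_switch_capacitance(table: SwitchCapTable,
                             vectors: Sequence[str]) -> float:
    """Sum the table entries for every consecutive (previous, current) pair."""
    return sum(table[(prev, curr)]
               for prev, curr in zip(vectors, vectors[1:]))


# Hypothetical table for a one-bit module (capacitances in fF).
table: SwitchCapTable = {
    ("0", "0"): 0.0,
    ("0", "1"): 1.2,
    ("1", "0"): 0.8,
    ("1", "1"): 0.1,
}

trace = ["0", "1", "1", "0"]
total = total_switch_capacitance(table, trace)
print(f"total switched capacitance = {total:.2f} fF")
```

An architectural simulator would perform this lookup once per cycle for each functional unit and convert the accumulated capacitance into energy using the supply voltage.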
Table Compression
• Problem
– Results in a large uncompressed table (e.g., 2^32 rows for a 16-bit adder)
– Excessive simulation (e.g., 2^32 simulations)
• Solution
– Clustering algorithm. Reference: Huzefa Mehta, et al., "Module Energy Characterization using Clustering", DA
– For a 16-bit adder, keeping the average error at 12% takes 1000 simulation points and 97 rows
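To illustrate the idea of trading table size for accuracy, the sketch below compresses sampled transitions by grouping them on the number of toggled input bits and storing one average capacitance per group. This is a deliberately simple stand-in and is not the clustering algorithm of the cited reference; the sample data are invented.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def compress_by_toggle_count(samples):
    """Collapse (previous, current, capacitance) samples into one row per
    toggle count, holding the mean capacitance of that group."""
    groups = defaultdict(list)
    for prev, curr, cap in samples:
        groups[hamming(prev, curr)].append(cap)
    return {toggles: sum(caps) / len(caps)
            for toggles, caps in sorted(groups.items())}


# Invented simulation samples for a 3-input module (capacitance in pF).
samples = [
    ("000", "000", 0.0),
    ("000", "001", 0.1),
    ("010", "011", 0.2),
    ("000", "011", 0.3),
    ("111", "000", 0.5),
]
compressed = compress_by_toggle_count(samples)
for toggles, cap in compressed.items():
    print(f"{toggles} toggled bits -> {cap:.2f} pF")
```

A table compressed this way has at most n + 1 rows for an n-bit input, at the cost of averaging away the differences between transitions that toggle the same number of bits.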
2:1 Multiplexer Table
Uncompressed: 64 rows
Previous Input Vector    Current Input Vector    Switch Capacitance
000                      000                     0.00
000                      001                     0.00
[Figure: compressed table]
Architectural Level Analysis
• Very computationally efficient
– Requires predefined analytical and transition-
sensitive energy characterization models
– Requires design only to RTL (with some idea
as to the kind of functional units planned)
– Coarse grain – use of gated clocks implicit
• Reasonably accurate (within 5% - 15% of
• Simulation based so can be used to support
architectural, compiler, OS, and application
level experimentation
• WattWatcher (Sente), DesignPower and PowerCompiler (Synopsys), prototype academic tools (Wattch – Princeton, SimplePower – PSU)