Vol. 37, No. 4 Journal of Semiconductors April 2016

Design of power balance SRAM for DPA-resistance

Zhou Keji(周可基)1, Wang Pengjun(汪鹏君)1,†, and Wen Liang(温亮)2
1 Institute of Circuits and Systems, Ningbo University, Ningbo 315211, China
2 State Key Laboratory of ASIC & System, Fudan University, Shanghai 201203, China

Abstract: A power balance static random-access memory (SRAM) for resistance to differential power analysis
(DPA) is proposed. In the proposed design, the switch power consumption and short-circuit power consumption
are balanced by discharging and pre-charging the key nodes of the output circuit and adding an additional short-
circuit current path. Thus, the power consumption is constant in every read cycle. As a result, the DPA-resistant
ability of the SRAM is improved. In 65 nm CMOS technology, the power balance SRAM is fully custom designed
with a layout area of 5863.6 μm². The post-simulation results show that the normalized energy deviation (NED) and
normalized standard deviation (NSD) are 0.099% and 0.04%, respectively. Compared to existing power balance
circuits, the power balance ability of the proposed SRAM is improved by 53%.

Key words: differential power analysis (DPA); static random access memory (SRAM); power balance; informa-
tion security
DOI: 10.1088/1674-4926/37/4/045002 PACS: 85.40.-e EEACC: 2560

* Project supported by the Zhejiang Provincial Natural Science Foundation of China (No. LQ14F040001), the National Natural Science
Foundation of China (Nos. 61274132, 61234002), and the K. C. Wong Magna Fund in Ningbo University, China.
† Corresponding author. Email: wangpengjun@nbu.edu.cn
Received 19 August 2015, revised manuscript received 11 October 2015 © 2016 Chinese Institute of Electronics

J. Semicond. 2016, 37(4) Zhou Keji et al.

1. Introduction

With the rapid development of network and wireless communication technologies, the security of personal information storage and transmission is attracting increasing attention. Personal information is typically stored in electronic devices and protected by encryption algorithms. However, the process of hardware encryption leaks physical information such as power consumption, time and electromagnetic waves. Attackers can utilize this information to reveal confidential data by side channel attack (SCA)[1, 2]. Of all SCAs, differential power analysis (DPA) is considered to be the most effective and popular attack method. By exploiting the correlation between the power consumption and data during the hardware encryption process, DPA can reveal the data hidden in the hardware. Consequently, the security of encrypted devices is deeply threatened[3, 4].

In encryption circuits, static random-access memory (SRAM) is widely used. However, the power consumption of SRAM is associated with the read data. As a result, the data stored in SRAM may be revealed by DPA. In recent years, researchers have found that eliminating the correlation between power consumption and processed data can effectively defend against DPA. Accordingly, an increasing number of transistor-level circuit design techniques have been proposed to eliminate the correlation between data and power consumption. Three representative technologies are sense-amplifier-based logic (SABL)[5, 6], wave dynamic differential logic (WDDL)[7] and charge sharing symmetric adiabatic logic (CSSAL)[8]. These circuits are dual-rail pre-charge logic, which attempts to balance the power consumption of each cycle by using complementary outputs. They can ensure that the power consumption is basically constant in every cycle when the loads on the complementary outputs are the same. Most SRAMs, however, are single-ended and do not have complementary outputs. Therefore, these methods are not suitable for the design of SRAM. Based on SABL, three-phase dual-rail pre-charge logic (TDPL)[9, 10] and self-timed three-phase dual-rail pre-charge logic (ST-TDPL)[11] have been proposed. They add an extra discharge phase, so that the key nodes of the circuit are pre-charged and discharged in each cycle. As a result, the difference of power consumption caused by inconsistent load and wiring capacitance is eliminated. However, prior to the end of the cycle, the output nodes of these circuits must be reset. In other words, these circuits cannot hold the read data, which makes them inappropriate for designing SRAM.

In this paper, a power balance SRAM for DPA-resistance is proposed. In order to eliminate the correlation between power consumption and processed data, the switch power consumption and short-circuit power consumption are balanced by discharging and pre-charging the key nodes of the output circuit and adding an extra short-circuit current path. As a result, the power consumption is constant in every read cycle, which enhances the DPA-resistant ability of the SRAM.

2. Power balance SRAM design

2.1. Power consumption difference of conventional SRAM

The conventional structure of SRAM is shown in Figure 1. Its read critical path includes the timer, decoder (DEC), SRAM array and output circuit (column I/O)[12, 13]. The output circuit consists of a sense amplifier and a latch circuit. In the read critical path, the power consumption of the timer is basically constant, and that of the decoder is only related to the address signal. As for the SRAM array and output circuit, their power consumption may be influenced by the internal storage data.

Figure 1. Analysis of SRAM read operation.

The SRAM cell adopts the conventional 6T cell. The 6T SRAM cell is completely symmetrical and has a dual-rail pre-charge structure. Accordingly, the two bitlines must be pre-charged before the read operation. During a read cycle, only one bitline is discharged, for a constant time, according to the stored data. Because the sizes of the discharge transistors are the same and the structure of the 6T cell is completely symmetrical, the discharged charge is the same when the discharge time is constant. Therefore, the power consumption of the SRAM array is basically constant in each read cycle.

Conventional SRAM output (CSO) is composed of a latch type sense amplifier[12] and a latch circuit. Its timing diagram is depicted in Figure 2. According to the data read in the current cycle and in the last cycle, CSO has four working states, as shown in Table 1: read 1 in the last cycle and read 1 in the current cycle (S11); read 1 in the last cycle and read 0 in the current cycle (S10); read 0 in the last cycle and read 0 in the current cycle (S00); and read 0 in the last cycle and read 1 in the current cycle (S01).

Figure 2. Timing diagram of CSO.

Table 1. Different working states of conventional SRAM output.
State   Reading data in last cycle   Reading data in current cycle
S11     1                            1
S10     1                            0
S00     0                            0
S01     0                            1

In the reading phase, the bitlines are discharged by the storage cell, and the voltage difference is established. Meanwhile, the output nodes of the sense amplifier, SOUT and SOUTB, maintain a high level, and the latch circuit holds the original output data. In the evaluation phase, the pull-down path is opened. Owing to the existing voltage difference between SOUT and SOUTB, the pull-down currents of N3 and N4 are different. Accordingly, the voltage difference is rapidly amplified by the feedback of the two internal cross-coupled inverters. Meanwhile, if the read data are the same in the current and last cycle, node 1, node 2 and output node Q are held. If they are not the same, node 1 and node 2 are flipped and output node Q is charged or discharged. These different operations cause the power consumption difference. The power supply current of CSO in different working states is shown in Figure 3. A large difference in supply current is observed in the evaluation phase, which causes the energy consumption difference shown in the figure. By leveraging this difference, the data may be revealed.

2.2. Power balance SRAM output circuit

According to the aforementioned analysis, the main power consumption difference is caused by different operations on
nodes of the SRAM output circuit. To balance the power consumption in every read cycle, a power balance SRAM output (PBSO) circuit is proposed, which is shown in Figure 4.

Figure 3. Power supply current of CSO in different working states.

Figure 4. Power balance SRAM output (PBSO).

The power consumption difference on output node Q is mainly caused by the different switch power consumption of the four working states. Of the four working states, only S01 charges output node Q and generates switch power consumption. The remaining three working states hold or discharge output node Q and do not generate any switch power consumption. If node Q is charged in every read cycle, the switch power consumption difference is eliminated. To that end, node Q is first discharged and then pre-charged before the evaluation phase. Hence, in the evaluation phase, node Q can only be discharged or held, and it does not generate switch power consumption. As a result, node Q has one and only one pre-charge operation in each read cycle. To ensure this, a discharge phase and a pre-charge phase are added at the beginning of the cycle. During the discharge phase, a signal OUTDIS is added to discharge node Q. During the pre-charge phase, SOUT and SOUTB are charged by two charge transistors P4 and P5. Because a conventional RS flip–flop consists of two cross-coupled NAND gates, it can only hold the output data when the input signals are all high. Thus, the conventional RS flip–flop cannot ensure that node 1 is at a low level before the evaluation phase and thus charge node Q. In the new latch circuit, the RS flip–flop is built from NOR gates instead of NAND gates. Nodes 1 and 2 are discharged when SAOUT and SAOUTB are pre-charged to a high level, and node Q is charged. With this design, node Q is first discharged and then pre-charged before the evaluation phase. Consequently, the switch power consumption difference of node Q is eliminated.

In the pre-charge phase, the new latch circuit discharges nodes 1 and 2. In the evaluation phase, node 1 or node 2 is charged according to the read data, which results in node Q being discharged or held, respectively. On account of the inconsistent load and wiring capacitance, the pre-charge or discharge operation causes the switch power consumption and short-circuit power consumption to differ. Thus, transistors N10 and P10 are added to node 2 as load transistors, and the sources and drains of N10 and P10 are all connected to VSS. As a result, the power consumption difference can be eliminated by adjusting the sizes of the new transistors. In addition, the output circuit must hold a stable output after the evaluation. Therefore, a hold phase is added after the evaluation phase. Because the new latch circuit holds data when SAOUT and SAOUTB are both at a low level, transistors N4 and N5, which are controlled by signal SADIS, are added to discharge SAOUT and SAOUTB.

With the addition of signal OUTDIS to discharge node Q, reading data 1 in the last cycle discharges node Q one more time in the current cycle compared to reading data 0 in the last cycle. This causes the short-circuit power consumption to differ. To eliminate the difference, additional short-circuit power is generated in the current cycle when data 0 was read in the last cycle. Thus, a discharge transistor N9 is added to node Q, and its gate is connected to node 1 by a pass gate which is always open. When 0 was read in the last cycle, node 1 is charged to VDD. In the pre-charge phase of the current cycle, node 1 is discharged, and then node Q is charged. Owing to the pass gate, the discharge transistor is turned on for a short time while node Q is charged to VDD. Thus, the additional short-circuit power consumption is generated. By changing the pass gate size, the generated power consumption is controlled and the difference in the short-circuit power consumption is eliminated.

The PBSO timing diagram is depicted in Figure 5. The PBSO operation comprises discharge, pre-charge, read, evaluation and hold phases in sequence. Output node Q is pre-charged once before the evaluation phase, and can only be discharged or held in the evaluation phase. Nodes 1 and 2 are discharged in the pre-charge phase; then, either node 1 or node 2 is charged in the evaluation phase. The output nodes of the sense amplifier, SOUT and SOUTB, are charged and discharged in every read cycle. Consequently, the power of the output circuit is balanced.

Figure 5. Timing diagram of PBSO.

CSO and PBSO were tested for 100 cycles in different working states. The energy consumption distributions of CSO and PBSO in different working states are shown in Figure 6. Owing to the test precision and circuit design, the energy consumption distribution of PBSO in S01 was divided into two parts: 21 cycles at 124.3 pJ and 4 cycles at 124.4 pJ.

Figure 6. (Color online) Circuit energy consumption distribution in different working states. (a) Energy consumption of CSO. (b) Energy consumption of PBSO.

In terms of CSO, a large difference in energy consumption exists between the working states. In S01 and S10, CSO must change the data stored in the latch circuit, which causes switch power consumption and short-circuit power consumption. As a result, the energy consumption in working states S01 and S10 is much larger than in S11 and S00. Conversely, because PBSO eliminates the difference of switch power consumption and short-circuit power consumption, it maintains an energy consumption that is basically constant across the working states.

2.3. Structure of power balance SRAM

Based on the analysis and research of References [12–14], the structure of a 256 × 8 power balance SRAM is proposed, as shown in Figure 7. It consists of a timer, address latch, DEC, replica, SRAM array, power balance output circuit and input circuit.

The storage cell of the SRAM array adopts the conventional 6T storage structure. The output circuit adopts the PBSO designed in the present study. Thus, the correlation between power consumption and read data is basically eliminated.
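To illustrate why the PBSO removes this correlation, the following minimal Python sketch counts the charge and discharge events on output node Q for each working state of Table 1. It is only an idealized logic-level model, not the authors' circuit simulation: node Q is treated as a binary value, the CSO cycle is assumed to move Q directly from the previous data to the new data, and the PBSO cycle is assumed to follow the discharge, pre-charge and evaluation sequence described in Section 2.2. All function names (`cso_trace`, `pbso_trace`, `rising_edges`, `falling_edges`, `count_per_state`) are hypothetical.

```python
from itertools import product

# Working states of Table 1, keyed by (data read in last cycle, data read in current cycle).
STATES = {(1, 1): "S11", (1, 0): "S10", (0, 0): "S00", (0, 1): "S01"}


def rising_edges(trace):
    """Count 0 -> 1 transitions, i.e. events that charge node Q from the supply."""
    return sum(1 for a, b in zip(trace, trace[1:]) if (a, b) == (0, 1))


def falling_edges(trace):
    """Count 1 -> 0 transitions, i.e. events that discharge node Q."""
    return sum(1 for a, b in zip(trace, trace[1:]) if (a, b) == (1, 0))


def cso_trace(last, current):
    """Idealized CSO (assumption): Q goes straight from the previous data to the new data."""
    return [last, current]


def pbso_trace(last, current):
    """Idealized PBSO (assumption): discharge, pre-charge, then hold or discharge in evaluation."""
    return [last, 0, 1, current]


def count_per_state(trace_fn, counter):
    """Apply an edge counter to the Q trace of every working state."""
    return {STATES[s]: counter(trace_fn(*s)) for s in product((0, 1), repeat=2)}


if __name__ == "__main__":
    print("CSO  charges:   ", count_per_state(cso_trace, rising_edges))
    print("PBSO charges:   ", count_per_state(pbso_trace, rising_edges))
    print("PBSO discharges:", count_per_state(pbso_trace, falling_edges))
```

In this model only S01 charges node Q in the CSO, whereas the PBSO charges it exactly once in every state. The PBSO discharge count still depends on the previous data (one extra discharge when 1 was read in the last cycle), which corresponds to the residual short-circuit imbalance that the text attributes to OUTDIS and compensates with transistor N9.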

Figure 7. Structure of 256 × 8 power balance SRAM.

Figure 8. (Color online) Layout and performance of power balance SRAM for DPA-resistance.

The replica bitline control circuit is placed on the left side of the SRAM array, resulting in an environment more similar to that of the SRAM array. Therefore, the replica bitline control circuit can more effectively track the bitline delay, which is affected by threshold voltage, supply voltage, and temperature variations. A decoder is placed on the left side of the SRAM array and replica bitline control circuit. Because it is close to the SRAM array, the wordline can be shortened, which reduces the switch power consumption. The address latch is located at the bottom of the decoder, which shortens the interconnecting wire between the address latch and decoder and reduces the setup time. The longest interconnecting wire is the clock line. Considering that all circuits except the SRAM array must be controlled by the clock, the timer is set at the bottom of the decoder. The left and right sides of the timer are the address latch and replica bitline control circuit, respectively.

3. Experimental results and performance comparison

A 256 × 8 power balance SRAM for DPA-resistance is implemented in SMIC 65 nm technology. The layout and performance are shown in Figure 8. Among these performance indices, normalized energy deviation (NED) and normalized standard deviation (NSD) are often used to measure the DPA resistance of a circuit. They are defined as:

$$\mathrm{NED} = \frac{\max(E) - \min(E)}{\max(E)}, \tag{1}$$

$$\mathrm{NSD} = \frac{\sigma_E}{\bar{E}}, \tag{2}$$

where $E$ is the power of the read cycle, $\sigma_E$ is its standard deviation, and $\bar{E}$ is the average power over the read cycles. The SRAM proposed in this paper achieves 1.25 GHz @ 7.1 mW at voltage 1.2 V, and 220 MHz @ 575 μW at voltage 0.7 V. To test the power balance ability, the read data were stepped from 255 to 156 in sequence at the same address. The superimposed power supply current traces of the SRAM with CSO and with PBSO are shown in Figure 9. It can be observed that the current of the SRAM with CSO differs in the evaluation phase, whereas the SRAM with PBSO unifies the current in the evaluation phase, thereby eliminating the correlation between power consumption and read data.

Figure 9. Superimposition of the power supply current traces. (a) SRAM with PBSO. (b) SRAM with CSO.

The performance of the power balance SRAM for DPA-resistance at different reference voltages and process corners is shown in Table 2. The reference voltages are varied from
0.7 to 1.2 V, and the process corners adopt five typical corners: TT, FF, SS, SF and FS. The environments of the five corners are as follows: the supply voltage of TT is equal to the reference voltage, and the temperature is 25 °C; the supply voltage of FF is 10% higher than the reference voltage, and the temperature is −40 °C; and the environments of corners SS, SF, and FS are the same, with a supply voltage 10% lower than the reference voltage and a temperature of 125 °C. At different voltages and process corners, the differences of power are all below 1.6%, effectively ensuring the power balance of the SRAM. The comparison of power balance performance between the proposed power balance SRAM and other power balance circuits is shown in Table 3. Compared to TDPL[9], which has the best power balance ability among the existing circuits, the proposed SRAM improves the power balance ability by up to 53%, which enhances its ability to resist DPA attacks.

Table 2. Performance of the power balance SRAM for DPA-resistance at different voltages and process corners.

Process   1.2 V @ 667 MHz               1.1 V @ 334 MHz               1.0 V @ 250 MHz
corner    Pmin    Pmax    Difference    Pmin    Pmax    Difference    Pmin    Pmax    Difference
          (mW)    (mW)                  (mW)    (mW)                  (mW)    (mW)
TT        3.522   3.526   0.113%        1.463   1.470   0.476%        0.897   0.904   0.774%
SS        2.608   2.617   0.344%        1.128   1.134   0.529%        0.722   0.723   0.138%
FF        3.867   3.870   0.078%        1.765   1.770   0.282%        1.101   1.108   0.632%
SF        2.763   2.789   0.932%        1.250   1.260   0.794%        0.793   0.801   0.999%
FS        2.747   2.762   0.615%        1.260   1.264   0.316%        0.786   0.788   0.254%

Process   0.9 V @ 200 MHz               0.8 V @ 125 MHz               0.7 V @ 62.5 MHz
corner    Pmin    Pmax    Difference    Pmin    Pmax    Difference    Pmin    Pmax    Difference
          (μW)    (μW)                  (μW)    (μW)                  (μW)    (μW)
TT        576.6   582.8   1.064%        281.0   285.1   1.438%        105.6   107.2   1.493%
SS        464.2   465.2   0.215%        228.8   231.9   1.337%        85.29   86.00   0.826%
FF        705.6   712.4   0.955%        344.7   349.3   1.489%        129.2   131.3   1.599%
SF        526.6   533.4   1.275%        245.3   248.8   1.407%        110.3   111.8   1.342%
FS        508.6   509.4   0.157%        247.7   247.8   0.040%        107.6   108.6   0.921%

Table 3. Comparison of power balance performance.

Parameter         Proposed   CSSAL[8]   SABL[6]   WDDL[7]   TDPL[9]
Voltage (V)       1.2        1.8        1.8       1.8       1.2
Technology (nm)   65         180        180       180       65
NED (%)           0.099      4.23       3.2       1.12      0.21
NSD (%)           0.019      1.01       0.6       0.22      0.04

4. Conclusions

DPA attacks are easy to implement and highly effective; therefore, the security of encryption devices is seriously threatened. Because the power consumption of SRAM is associated with the read data, SRAM security is threatened by DPA. To overcome this disadvantage of conventional SRAM, a power balance SRAM for DPA resistance is proposed based on the concept of power balance circuit design. In the proposed design, the key nodes of the output circuit are discharged and pre-charged in every read cycle, and an additional short-circuit current path is added. Thus, the difference of short-circuit power consumption and switch power consumption in different working states is eliminated. As a result, the power balance SRAM can ensure that the current and power consumption are basically constant in every read cycle. Compared to existing power balance circuits, our simulation results show that the power balance ability of the proposed SRAM provides an improvement of 53%.

References

[1] Wang P J, Zhang Y J, Zhang X L. Research of differential power analysis countermeasures. Journal of Electronics and Information Technology, 2012, 34(11): 2774
[2] Yu Bo, Li Xiangyu, Chen Cong, et al. An AES chip with DPA resistance using hardware-based random order execution. Journal of Semiconductors, 2012, 33(6): 065009
[3] Kocher P, Jaffe J, Jun B. Differential power analysis. In: Advances in cryptology. Berlin Heidelberg, Springer, 1999: 388
[4] Martinasek Z, Clupek V, Krisztina T. General scheme of differential power analysis. 36th International Conference on Telecommunications and Signal Processing, 2013: 358
[5] Tiri K, Akmal M, Verbauwhede I. A dynamic and differential CMOS logic with signal independent power consumption to withstand differential power analysis on smart cards. Proceedings of the 28th European Solid-State Circuits Conference, 2002: 403
[6] Wang P J, Zhang Y J, Zhang X L. Design of two-phase SABL flip–flop for resistant DPA attacks. Chinese Journal of Electronics, 2013, 22(4): 833
[7] Tiri K, Verbauwhede I. A logic level design methodology for a secure DPA resistant ASIC or FPGA implementation. Proceedings of Design, Automation and Test in Europe Conference and Exhibition, 2004: 246
[8] Monteiro C, Takahashi Y, Sekine T. DPA resistance of charge-sharing symmetric adiabatic logic. IEEE International Symposium on Circuits and Systems, 2013: 2581
[9] Bucci M, Giancane L, Luzzi R, et al. Three-phase dual-rail pre-charge logic. Proceedings of Workshop on Cryptographic Hardware and Embedded Systems, Lecture Notes in Computer Science, Springer-Verlag, 2006, 4249: 232
[10] Bucci M, Giancane L, Luzzi R, et al. A flip–flop for the DPA resistant three-phase dual-rail pre-charge logic family. IEEE Trans Very Large Scale Integration Systems, 2012, 20(11): 2128
[11] Akkaya N E C, Erbagci B, Carley R, et al. A DPA-resistant self-timed three-phase dual-rail pre-charge logic family. IEEE International Symposium on Hardware Oriented Security and Trust, 2015: 112
[12] Wang Y, Ahn H, Bhattacharya U, et al. A 1.1 GHz 12 μA/Mb-leakage SRAM design in 65 nm ultra-low-power CMOS technology with integrated leakage reduction for mobile applications. IEEE J Solid-State Circuits, 2008, 43(1): 172
[13] Karl E, Wang Y, Ng Y G, et al. A 4.6 GHz 162Mb SRAM design in 22 nm tri-gate CMOS technology with integrated active VMIN-enhancing assist circuitry. IEEE International Solid-State Circuits Conference Digest of Technical Papers, 2012: 230
[14] Ding Lili, Yao Zhibin, Guo Hongxia, et al. Worst-case total dose radiation effect in deep-submicron SRAM circuits. Journal of Semiconductors, 2012, 33(7): 075010
[15] Na T, Woo S H, Kim J, et al. Comparative study of various latch-type sense amplifiers. IEEE Trans Very Large Scale Integration Systems, 2014, 22(2): 425
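
As a supplementary numerical cross-check of the NED and NSD metrics of Section 3, the following minimal Python sketch evaluates them on values quoted in the paper. It rests on stated assumptions rather than on the authors' scripts: NED is taken as (max − min)/max, which matches the "Difference" column of Table 2; NSD is taken as the population standard deviation divided by the mean; and the function names `ned`, `nsd` and `improvement` are hypothetical.

```python
from statistics import mean, pstdev


def ned(samples):
    """Normalized energy deviation, assumed here to be (max - min) / max."""
    return (max(samples) - min(samples)) / max(samples)


def nsd(samples):
    """Normalized standard deviation: population std / mean (an assumed convention)."""
    return pstdev(samples) / mean(samples)


def improvement(reference, proposed):
    """Relative reduction of a metric with respect to a reference value."""
    return (reference - proposed) / reference


if __name__ == "__main__":
    # TT corner at 1.2 V in Table 2: Pmin and Pmax in mW.
    print(f"TT @ 1.2 V difference: {100 * ned([3.522, 3.526]):.3f}%")
    # PBSO in state S01 (Section 2.2): 21 cycles at 124.3 pJ and 4 cycles at 124.4 pJ.
    s01 = [124.3] * 21 + [124.4] * 4
    print(f"S01 NED: {100 * ned(s01):.3f}%, NSD: {100 * nsd(s01):.3f}%")
    # NED of TDPL versus the proposed SRAM in Table 3.
    print(f"NED improvement over TDPL: {100 * improvement(0.21, 0.099):.0f}%")
```

Under these assumptions the TT entry at 1.2 V evaluates to 0.113%, as listed in Table 2, and the NED reduction relative to TDPL rounds to 53%, the improvement quoted in the text.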


