Bit magnitude comparator
Advances in Electronics
Volume 2015, Article ID 713843, 13 pages
http://dx.doi.org/10.1155/2015/713843
Research Article
FPGA-Based Synthesis of High-Speed Hybrid Carry Select Adders
Copyright © 2015 V. Kokilavani et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The carry select adder is a square-root time high-speed adder. In this paper, FPGA-based synthesis of conventional and hybrid carry select adders is described with a focus on high speed. Conventionally, carry select adders are realized using the following: (i) full adders and 2 : 1 multiplexers, (ii) full adders, binary to excess 1 code converters, and 2 : 1 multiplexers, and (iii) sharing of common Boolean logic. On the other hand, hybrid carry select adders involve a combination of carry select and carry lookahead adders with/without the use of binary to excess 1 code converters. In this work, two new hybrid carry select adders are proposed involving the carry select and section-carry based carry lookahead subadders with/without binary to excess 1 converters. Seven different carry select adders were implemented in Verilog HDL and their performances were analyzed under two scenarios, dual-operand addition and multioperand addition, where individual operands are of sizes 32 and 64 bits. In the case of dual-operand additions, the hybrid carry select adder comprising the proposed carry select and section-carry based carry lookahead configurations is the fastest. With respect to multioperand additions, the hybrid carry select adders containing the carry select and conventional carry lookahead or section-carry based carry lookahead structures produce similar optimized performance.
(iii) CSLA based on CBL sharing
(iv) Hybrid CSLA and CLA structures
(v) Hybrid CSLA and CLA including BECs.

In general, CSLAs are composed using a carry select architecture with/without BECs or may consist of a mix of carry select and carry lookahead configurations with/without BECs. CSLAs constructed using pure carry select structures are called “homogeneous CSLAs” and CSLAs realized using a combination of carry select and carry lookahead structures are labeled as “heterogeneous/hybrid CSLAs.” The interest in hybrid CSLAs is supported by the fact that heterogeneous adders tend to better optimize the design metrics compared to homogeneous adders [24]. In a recent work [25], section-carry based CLAs (SCBCLAs) were proposed as an alternative to conventional CLAs; for a 32-bit addition operation, the SCBCLA was found to exhibit a propagation delay 15.2% lower than that of the conventional CLA. Motivated by this result, two new hybrid CSLA architectures are proposed in this work: a hybrid CSLA incorporating CSLA and SCBCLA, and another hybrid CSLA embedding CSLA, SCBCLA, and BECs. This paper builds upon our prior work [21] by analyzing the performance of different CSLA architectures with respect to diverse input partitions for different addition widths for the case of dual-operand addition and further evaluates the efficacy of the conventional and proposed CSLAs with respect to multioperand additions.

The remaining part of this paper is organized as follows. With 8-bit addition as a running example, Section 2 describes the conventional CSLA topologies with and without BEC logic and also the CSLA based on sharing of CBL. Section 3 presents the architectures of hybrid CSLAs incorporating CLAs and SCBCLAs with/without BEC logic. In Section 4, the performance of different CSLA topologies is evaluated for dual-operand and multioperand additions with operand sizes of 32 and 64 bits. Finally, the conclusions follow in Section 5.

2. Homogeneous CSLA Architectures

The RCA and homogeneous CSLA architectures are shown in Figure 1 for an example case of 8-bit addition. Figure 1(a) depicts an 8-bit RCA, which is formed by a cascade of full adder modules; the full adder [9] is an arithmetic building block that adds an augend bit and an addend bit (say, 𝑎 and 𝑏) along with any carry input (cin) and produces two outputs, namely, sum (Sum) and carry overflow (Cout). Since the carry ripples from one full adder stage to the next, the propagation delay of the RCA grows linearly with the adder width. The CSLA partitions the input data into groups, and addition within the groups is carried out in parallel; that is, the CSLA is composed of partitioned and duplicated RCAs. It can be seen from Figure 1 that the least significant 4-bit adder stages of the RCA and the CSLAs are identical. However, in the RCA the carry produced by the least significant nibble is simply propagated bit-by-bit through the more significant nibble, while in the CSLAs the carry from the least significant nibble serves as the selection input for the MUXes present in the more significant position.

Figure 1: (a) 8-bit RCA, (b) representative 8-bit homogeneous CSLA, and (c) representative 8-bit homogeneous CSLA with BEC logic.
(b) 8-bit conventional CSLA comprising full adders and 2 : 1 MUXes (CSLA type)
(c) 8-bit conventional CSLA comprising full adders, 2 : 1 MUXes, and BEC logic (CSLA_BEC type)

Figure 1(b) shows the 8-bit conventional CSLA comprising full adders and 2 : 1 MUXes, henceforth referred to as simply “CSLA.” In the case of the CSLA shown in Figure 1(b), the full adders present in the most significant nibble position are duplicated with carry inputs (cin) of 0 and 1 assumed; that is, one 4-bit RCA with a carry input of 0 and another 4-bit RCA with a carry input of 1 are used. Notice that both these RCAs have the same augend and addend inputs. While the least significant 4-bit RCA adds the augend inputs (𝑎3 to 𝑎0) to the addend inputs (𝑏3 to 𝑏0), the more significant 4-bit RCAs simultaneously add the augend inputs (𝑎7 to 𝑎4) to the addend inputs (𝑏7 to 𝑏4), with presumed carry inputs of 0 and 1. The two additions produce two sets of sum and carry outputs, one based on 0 as the carry input and another based on 1 as the carry input, which are in turn fed as inputs to the 2 : 1 MUXes. The number of MUXes used depends on the size of the RCA duplicated. To determine the true sum outputs and the real value of the carry overflow pertaining to the most significant nibble position, the carry output (𝑐4) from the least significant 4-bit RCA is used as the common select input for all the MUXes; thereby the correct result, corresponding to either the RCA with 0 as the carry input or the RCA with 1 as the carry input, is delivered as output.

Figure 1(c) portrays the 8-bit CSLA containing full adders, 2 : 1 MUXes, and BEC logic, henceforth identified as “CSLA_BEC.” Figure 1(c) also shows the internals of the 5-bit BEC, which is depicted by the circuit shown within the oval. The CSLA_BEC differs from the CSLA in that, instead of having an RCA with a presumed carry input of 1 in the more significant nibble position, the BEC circuit is introduced. The BEC logic adds binary 1 to the least significant bit of its binary inputs and produces the resultant sum and carry as output. As seen in Figure 1(c), the BEC accepts as inputs the sum and carry outputs of the RCA having a presumed carry input of 0, adds binary 1 to this input, and produces the resulting sum and carry overflow as output. The correct result is then either the output of the RCA with a presumed carry input of 0 or the output of the BEC logic. The carry output 𝑐4 of the least significant RCA is used to select the correct set of sum and carry outputs for the most significant nibble position. The logic equations governing the 5-bit BEC are given below. In the equations, ∼ signifies logical inversion, ⊕ implies logical XOR, and ∙ represents logical conjunction; the superscripts 0 and 1 denote the outputs corresponding to carry inputs of 0 and 1, respectively. Consider

\[
\begin{aligned}
\mathrm{Sum}_{4}^{1} &= {\sim}\,\mathrm{Sum}_{4}^{0} \\
\mathrm{Sum}_{5}^{1} &= \mathrm{Sum}_{5}^{0} \oplus \mathrm{Sum}_{4}^{0} \\
\mathrm{Sum}_{6}^{1} &= \mathrm{Sum}_{6}^{0} \oplus (\mathrm{Sum}_{5}^{0} \bullet \mathrm{Sum}_{4}^{0}) \\
\mathrm{Sum}_{7}^{1} &= \mathrm{Sum}_{7}^{0} \oplus (\mathrm{Sum}_{6}^{0} \bullet \mathrm{Sum}_{5}^{0} \bullet \mathrm{Sum}_{4}^{0}) \\
c_{8}^{1} &= c_{8}^{0} \oplus (\mathrm{Sum}_{7}^{0} \bullet \mathrm{Sum}_{6}^{0} \bullet \mathrm{Sum}_{5}^{0} \bullet \mathrm{Sum}_{4}^{0}).
\end{aligned}
\tag{1}
\]

The CSLA constructed on the basis of sharing of CBL, henceforth referred to as “CSLA_CBL,” is depicted in Figure 2. The CSLA_CBL adder is founded upon the full adder logic, whose underlying equations are given below with 𝑎, 𝑏, and cin being the primary inputs, and Sum and Cout being the primary outputs. In (3), “+” implies logical disjunction:

\[ \mathrm{Sum} = a \oplus b \oplus \mathrm{cin} \tag{2} \]

\[ \mathrm{Cout} = (a + b) \bullet \mathrm{cin} + (a \bullet b)({\sim}\,\mathrm{cin}). \tag{3} \]

Figure 2: 8-bit homogeneous CSLA utilizing shared CBL (CSLA_CBL architecture).

Referring to (2) and (3), it may be understood that, for a carry input (cin) of 0, (2) and (3) reduce to Sum = (𝑎 ⊕ 𝑏) and Cout = (𝑎 ∙ 𝑏). With cin = 1, (2) and (3) become Sum = ∼(𝑎 ⊕ 𝑏) and Cout = (𝑎 + 𝑏). Based on this principle, sum and carry outputs for both possible values of the input carry are generated simultaneously and fed as inputs to two 2 : 1 MUXes. The correct sum and carry outputs are determined by the carry input, which serves as the select input for the two MUXes. Although this approach eliminates the costly duplicated RCA and RCA with BEC logic structures, leading to savings in terms of area, the carry still propagates from stage to stage, so the critical data path delay tends to be proportional to the length of the full adder cascade. As a consequence, the delay of the CSLA_CBL adder may be close to that of the RCA, which is confirmed by the simulation results given in Section 4.

3. Heterogeneous/Hybrid CSLA Architectures

Apart from synthesizing basic CSLA topologies viz. CSLA, CSLA_BEC, and CSLA_CBL, hybrid CSLA architectures involving CSLA and CLA/SCBCLA were also implemented with the intention of minimizing the maximum propagation path delay. It is well known that a CLA is faster than an RCA, and hence it may be worthwhile to have a CLA as a replacement for the least significant RCA in the CSLA structure. Although the concept of carry lookahead is widely understood, the concept of section-carry based carry lookahead may not be that well known; hence, to explain the distinction between the two, sample 4-bit lookahead logic realized using these two approaches is portrayed in Figure 3 for illustration. For details on different section-carry based carry lookahead structures and SCBCLA constructions using them, the interested reader is directed to references [25–27], which constitute prior works in the realm of synchronous and asynchronous designs.

[Figure 3. Panel label recovered: 4-bit conventional carry lookahead generator (excluding generate and propagate signals).]

The section-carry based carry lookahead generator shown enclosed within the circle in Figure 3 produces a single lookahead carry signal corresponding to a “section” or “group” of the adder inputs (hence the term “section-carry”), while the conventional carry lookahead generator encapsulated within the rectangle produces multiple lookahead carry signals corresponding to each pair of augend and addend primary inputs. The section-carry based carry lookahead generator differs from the traditional carry lookahead generator in that bit-wise lookahead carry signals are not required to be computed for the former. The XOR and AND gates used for producing the necessary propagate and generate signals (𝑃3 to 𝑃0 and 𝐺3 to 𝐺0) are highlighted using dotted lines in Figure 3; these constitute the propagate-generate logic referred to in Figures 4 and 5.

8-bit hybrid CSLAs with/without BEC logic and comprising a CLA in the least significant stage viz. “CSLA-CLA” and “CSLA_BEC-CLA” adder types are shown in Figure 4. On the other hand, 8-bit hybrid CSLAs with/without BEC logic and incorporating a SCBCLA in the least significant stage viz. “CSLA-SCBCLA” and “CSLA_BEC-SCBCLA” adder varieties are portrayed in Figure 5. Both the conventional CLA and the SCBCLA comprise three functional blocks: propagate-generate logic, lookahead carry generator, and the sum producing logic. Not only is the carry lookahead generator different for CLA and SCBCLA adders, but the sum producing logic is also different; in the case of the CLA, the sum producing logic comprises only XOR gates, whereas in the SCBCLA, the sum producing logic consists of full adders and an XOR gate, with the XOR gate providing the sum of the inputs 𝑎3, 𝑏3, and 𝑐3. While rippling of carries occurs internally within the carry-propagate adder constituting the SCBCLA and producing the requisite sums, the lookahead carry signal corresponding to an adder section is generated independently (in parallel) and serves as the lookahead carry input for the successive CSLA stage.

4. Results and Discussion

Three homogeneous CSLA architectures viz. CSLA, CSLA_BEC, and CSLA_CBL and four heterogeneous CSLA architectures viz. CSLA-CLA, CSLA_BEC-CLA, CSLA-SCBCLA, and CSLA_BEC-SCBCLA were described topologically in Verilog HDL similar to previous works [16, 21–23, 25] to perform two kinds of addition operations viz. dual-operand addition and multioperand addition. For dual-operand addition, two binary operands having corresponding sizes of 32 bits and 64 bits were considered. For multioperand addition, addition of four binary operands, each of size 32 bits, and another multioperand addition involving four binary operands, each of size 64 bits, were considered. Moreover, two types of multioperand additions were performed based on (i) carry save adder (CSA) topology, and (ii) bit-partitioned addition scheme. All the adders were synthesized using a 90 nm FPGA (XC3S1600E) [28], with speed optimization specified as the design goal in the Xilinx 9.1i ISE design suite. The critical path delay and area values (in terms of number of basic logic elements viz. BELs) were ascertained after automatic place-and-route. The results of dual-operand additions shall be presented first, followed by the results obtained for multioperand additions.

4.1. Dual-Operand Addition. CSLAs can be implemented on the basis of uniform or nonuniform primary input partitions; accordingly they are labeled as “uniform” or “non-uniform” CSLAs, in a structural sense. “Input partitioning” basically means splitting up the primary inputs into groups so that addition can be done in parallel within the partitions; it should be noted that input partitioning is inherent to all CSLAs except the CSLA_CBL type (shown in Figure 2), which has a regular carry select structure and hence is devoid of input partitions. Referring to Figure 1(b), it can be seen that 8 pairs of inputs have been split into two uniform or equal-sized groups of 4 input pairs; thus it can be said that the 8-bit CSLA is realized according to a 4-4 input partition.

For synthesis, 3 uniform input partitions (4-4-4-4-4-4-4-4, 8-8-8-8, and 16-16) and 2 optimum nonuniform input partitions (3-7-6-5-4-3-2-2 [29] and 8-7-6-4-3-2-2 [15]) were considered for realizing the 32-bit CSLAs. Figure 6 visually portrays the variations in propagation delay corresponding to different primary input partitions for the six CSLA types. On the other hand, 4 uniform input partitions viz. 4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4, 8-8-8-8-8-8-8-8, 16-16-16-16, and 32-32, and a nonuniform input partition viz. 8-10-9-8-7-6-5-4-3-2-2 [29] were considered for realizing the 64-bit CSLAs. Figure 7 depicts the propagation delay variations subject to different primary input partitions for the six CSLA architectures. The trend line highlighted in Figure 6 shows that the uniform 8-8-8-8 input partition consistently yields the least propagation delay (varying from 17 ns to 20 ns) across the various 32-bit homogeneous and heterogeneous CSLAs. Similarly, the trend line indicated in black in Figure 7 conveys that the uniform 16-16-16-16 input partition results in the least data path delay (varying from 27 ns to 29 ns) for the different homogeneous and heterogeneous 64-bit CSLAs.
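The notion of an input partition, together with the carry select and BEC schemes of Section 2, can be mimicked with a small behavioral model. The Python sketch below is only an illustration and not the Verilog HDL description used for synthesis: the function names `ripple_add` and `csla_add` are hypothetical, each group adder is modeled with integer arithmetic rather than gates, and the partition is assumed to be listed from the least significant group upward. It says nothing about delay or area; it only checks that any partition, uniform or nonuniform, yields the same sum.

```python
from typing import List, Tuple


def ripple_add(a: int, b: int, width: int, cin: int) -> Tuple[int, int]:
    """Behavioral stand-in for a width-bit RCA: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def csla_add(a: int, b: int, partition: List[int], use_bec: bool = False) -> int:
    """Carry select addition over the given input partition.

    `partition` lists group widths starting from the least significant group
    (an ordering assumed for this sketch), e.g. [8, 8, 8, 8] for 32 bits.
    """
    result, shift, carry = 0, 0, 0
    for index, width in enumerate(partition):
        mask = (1 << width) - 1
        ga, gb = (a >> shift) & mask, (b >> shift) & mask
        if index == 0:
            # Least significant group: one adder only, nothing to select.
            group_sum, carry = ripple_add(ga, gb, width, 0)
        else:
            s0, c0 = ripple_add(ga, gb, width, 0)
            if use_bec:
                # BEC: add binary 1 to the (carry, sum) word of the cin = 0 adder.
                word = ((c0 << width) | s0) + 1
                s1, c1 = word & mask, word >> width
            else:
                # Duplicated adder with a presumed carry input of 1.
                s1, c1 = ripple_add(ga, gb, width, 1)
            # 2:1 MUXes: the incoming group carry is the common select input.
            group_sum, carry = (s1, c1) if carry else (s0, c0)
        result |= group_sum << shift
        shift += width
    return result | (carry << shift)
```

For instance, `csla_add(a, b, [8, 8, 8, 8])` and `csla_add(a, b, [2, 2, 3, 4, 6, 7, 8], use_bec=True)` both return `a + b` for 32-bit operands; the partitions differ only in how the work is grouped, which is what the synthesis results compare.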
Figure 4: Hybrid CSLAs without/with BEC logic comprising a CLA: (a) CSLA-CLA type and (b) CSLA_BEC-CLA type.
(a) 8-bit hybrid CSLA with a conventional CLA in the least significant stage
(b) 8-bit hybrid CSLA featuring BEC with a least significant CLA stage

Figure 5: Hybrid CSLAs without/with BEC logic comprising a SCBCLA: (a) CSLA-SCBCLA type and (b) CSLA_BEC-SCBCLA type.
(a) 8-bit hybrid CSLA with 4-bit SCBCLA in the least significant stage
(b) 8-bit hybrid CSLA incorporating BEC with a least significant SCBCLA stage
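The difference between the two least significant stages used in Figures 4 and 5 can be sketched behaviorally. The Python model below is a simplified illustration under stated assumptions, not the gate-level circuits of Figure 3 or of [25]: the helper names are hypothetical, propagate and generate are taken as XOR and AND of the input bits as described in Section 3, and every lookahead carry is written as a flat sum of products. The conventional CLA computes a lookahead carry for each bit position, whereas the section-carry variant looks ahead only for the carry leaving the 4-bit section and lets the sum bits be formed by an internal ripple.

```python
from typing import List, Tuple


def propagate_generate(a: int, b: int, n: int) -> Tuple[List[int], List[int]]:
    """Propagate (XOR) and generate (AND) signals, index 0 = LSB."""
    P = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    return P, G


def lookahead_carry(P: List[int], G: List[int], c0: int, k: int) -> int:
    """c_k = G[k-1] + P[k-1]G[k-2] + ... + P[k-1]...P[0]c0 (sum of products)."""
    carry, prefix = 0, 1
    for j in range(k - 1, -1, -1):
        carry |= prefix & G[j]
        prefix &= P[j]
    return carry | (prefix & c0)


def cla_add(a: int, b: int, c0: int, n: int = 4) -> Tuple[int, int]:
    """Conventional CLA: a lookahead carry for every bit position."""
    P, G = propagate_generate(a, b, n)
    c = [lookahead_carry(P, G, c0, k) for k in range(n + 1)]
    total = sum((P[i] ^ c[i]) << i for i in range(n))
    return total, c[n]


def scbcla_add(a: int, b: int, c0: int, n: int = 4) -> Tuple[int, int]:
    """Section-carry variant: only the section carry c_n is looked ahead."""
    P, G = propagate_generate(a, b, n)
    section_carry = lookahead_carry(P, G, c0, n)
    total, ripple = 0, c0
    for i in range(n):
        total |= (P[i] ^ ripple) << i
        ripple = G[i] | (P[i] & ripple)  # internal ripple; not used as the output carry
    return total, section_carry
```

Both functions return the same (sum, carry) pair for every 4-bit input; in a hybrid CSLA the returned carry would play the role of the select input for the MUXes of the next stage.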
Figure 6: Capturing worst-case delay variations of 32-bit homogeneous and heterogeneous CSLAs for different input partitions. 𝑋-axis: CSLA type; 𝑌-axis: Delay in ns.

Figure 7: Portraying critical path delay variations of 64-bit homogeneous and heterogeneous CSLAs for different input partitions. 𝑋-axis: CSLA type; 𝑌-axis: Delay in ns.

The maximum combinational path delay (also called “critical path delay”) encountered and the total number of BELs consumed by different homogeneous and heterogeneous CSLAs to perform the addition of two 32-bit operands and two 64-bit operands separately are shown in Tables 1 and 2, respectively. The optimum delay and area values are in bold font in the tables. Note that the symbol ∗ signifies the proposed hybrid CSLA architectures in the tables.

Table 1 (32-bit CSLAs; leading rows missing):

Input partition    Type of CSLA architecture    Critical path delay (ns)    Area (# BELs)
4-4-4-4-4-4-4-4    CSLA-CLA                     30.398                      106
                   CSLA_BEC-CLA                 22.781                      106
                   CSLA-SCBCLA∗                 29.359                      108
                   CSLA_BEC-SCBCLA∗             22.864                      108
8-8-8-8            CSLA                         20.280                      117
                   CSLA_BEC                     19.176                      104
                   CSLA-CLA                     19.260                      121
                   CSLA_BEC-CLA                 19.059                      104
                   CSLA-SCBCLA∗                 17.897                      123
                   CSLA_BEC-SCBCLA∗             18.052                      110
16-16              CSLA                         23.722                      105
                   CSLA_BEC                     22.986                      91
                   CSLA-CLA                     21.384                      114
                   CSLA_BEC-CLA                 22.835                      91
                   CSLA-SCBCLA∗                 21.097                      119
                   CSLA_BEC-SCBCLA∗             22.255                      106
3-7-6-5-4-3-2-2    CSLA                         23.337                      110
                   CSLA_BEC                     22.411                      108
                   CSLA-CLA                     23.337                      110
                   CSLA_BEC-CLA                 22.411                      108
                   CSLA-SCBCLA∗                 23.408                      110
                   CSLA_BEC-SCBCLA∗             22.482                      108
8-7-6-4-3-2-2      CSLA                         20.218                      118
                   CSLA_BEC                     20.743                      111
                   CSLA-CLA                     20.218                      118
                   CSLA_BEC-CLA                 20.473                      111
                   CSLA-SCBCLA∗                 21.403                      117
                   CSLA_BEC-SCBCLA∗             20.544                      111

From Table 1, it is evident that the CSLA-SCBCLA hybrid adder based on the 8-8-8-8 input partition features the least propagation delay (17.897 ns) amongst all homogeneous and hybrid CSLAs, and hence the 8-8-8-8 input partition is deemed to be optimum. The 32-bit RCA has a critical path delay of 30.604 ns, while the 32-bit CSLA_CBL adder is found to have the longest path delay of 37.604 ns. Compared to the maximum delay of the hybrid CSLA-SCBCLA, the hybrid CSLA_BEC-SCBCLA adder, which is the other proposed hybrid CSLA topology, has a comparable speed performance of 18.052 ns. However, with respect to area, the RCA and CSLA_CBL structures require fewer BELs than all the CSLAs. Hence it is inferred from Figure 6 and Table 1 that, for the addition of two 32-bit input operands, the hybrid CSLA-SCBCLA adder is preferable over all other homogeneous and heterogeneous CSLAs and the favorable input data partition is 8-8-8-8.

Based on a similar observation, by referring to Figure 7 and Table 2, it can be seen that the 16-16-16-16 input partition is found to be optimum from a delay (i.e., speed) perspective for 64-bit dual-operand addition. The proposed CSLA_BEC-SCBCLA constructed using the 16-16-16-16 input data partition leads to the least latency amongst all adder topologies; however, the other proposed CSLA viz. CSLA-SCBCLA based on the same input partition features almost the same delay. In terms of area occupancy, though, the 64-bit RCA is optimized. Nevertheless, the RCA encounters considerably more data path delay by 1.6× in comparison with the proposed CSLA_BEC-SCBCLA based on a 16-16-16-16 input partition.

Table 2: Maximum propagation delay and area (# BELs) of 64-bit homogeneous and heterogeneous CSLAs corresponding to different input partitions.

Input partition                    Type of CSLA architecture    Critical path delay (ns)    Area (# BELs)
Not applicable                     RCA                          71.555                      127
Not applicable                     CSLA_CBL                     70.525                      129
4-4-4-4-4-4-4-4-4-4-4-4-4-4-4-4    CSLA                         56.091                      217
                                   CSLA_BEC                     40.870                      209
                                   CSLA-CLA                     56.101                      218
                                   CSLA_BEC-CLA                 34.799                      215
                                   CSLA-SCBCLA∗                 55.062                      220
                                   CSLA_BEC-SCBCLA∗             34.882                      217
8-8-8-8-8-8-8-8                    CSLA                         31.866                      251
                                   CSLA_BEC                     29.119                      224
                                   CSLA-CLA                     30.846                      255
                                   CSLA_BEC-CLA                 29.002                      224
                                   CSLA-SCBCLA∗                 29.483                      257
                                   CSLA_BEC-SCBCLA∗             27.995                      230
16-16-16-16                        CSLA                         29.625                      252
                                   CSLA_BEC                     28.259                      212
                                   CSLA-CLA                     27.759                      261
                                   CSLA_BEC-CLA                 28.029                      213
                                   CSLA-SCBCLA∗                 27.427                      266
                                   CSLA_BEC-SCBCLA∗             27.322                      227
32-32                              CSLA                         40.705                      217
                                   CSLA_BEC                     40.742                      189
                                   CSLA-CLA                     38.591                      215
                                   CSLA_BEC-CLA                 40.157                      189
                                   CSLA-SCBCLA∗                 38.591                      247
                                   CSLA_BEC-SCBCLA∗             39.682                      219
8-10-9-8-7-6-5-4-3-2-2             CSLA                         32.983                      251
                                   CSLA_BEC                     31.204                      226
                                   CSLA-CLA                     32.983                      251
                                   CSLA_BEC-CLA                 31.204                      226
                                   CSLA-SCBCLA∗                 33.054                      251
                                   CSLA_BEC-SCBCLA∗             31.276                      226

4.2. Multioperand Addition. The performance of different homogeneous and heterogeneous CSLAs is evaluated based on the case studies of multioperand addition involving 4 binary operands, with respective sizes of 32 bits and 64 bits. Two multioperand addition schemes are considered, one involving the carry save adder (CSA) topology, and another involving the bit-partitioning method.

4.2.1. CSA Based Multioperand Addition. The structure of an example CSA used to add four 𝑛-bit binary numbers is shown in Figure 8. Here, 𝑎𝑛−1 to 𝑎0, 𝑏𝑛−1 to 𝑏0, 𝑐𝑛−1 to 𝑐0, and 𝑑𝑛−1 to 𝑑0 represent the primary inputs, and the sum bits Sum𝑛+1 to Sum0 represent the primary outputs. The subscript 0 denotes the LSB and the subscript (𝑛 − 1) denotes the MSB. As shown in Figure 8, there are three adders in three levels to perform the addition of four input operands. In each CSA, the carry output signal of the current bit at a level is not transferred to the next bit adder of the same level as the carry input; instead, the carry output is transferred to the next bit adder in the lower level as the carry input. In the top-level adder, three numbers (𝑎, 𝑏, and 𝑐) are added simultaneously; that is, the bits corresponding to any number could act as input carries for the full adders of the first level CSA. In the next lower level, an extra number (𝑑) is added. The adder in the bottom level, shown within the ellipse in Figure 8, is portrayed as a simple RCA, but it may be any dual-operand adder that can be used to compute the final sum.

[Figure 8. Labels recovered: three levels of full adders; inputs 𝑑𝑛−1 to 𝑑0 entering the second level; outputs Sum𝑛+1 to Sum0; dual-operand adder in the bottom level.]

Experimentation was performed by having different dual-operand adders viz. RCA and various homogeneous and heterogeneous CSLAs in the final adder stage of the CSA, shown in Figure 8, to analyze their relative performance for two different addition scenarios: (i) addition of four binary operands, each of size 32 bits, and (ii) addition of four binary operands, each of size 64 bits.

Table 3: Critical path delay and area figures for CSA-based multioperand addition of four 32-bit operands, with RCA/homogeneous/heterogeneous CSLAs used in the final adder stage.

Table 4: Critical path delay and area for CSA-based multioperand addition of four 64-bit operands, with RCA/homogeneous/heterogeneous CSLAs used in the final adder stage.

The FPGA-based synthesis results viz. delay and area obtained for the addition of four binary operands, each of size 32 bits, are given in Table 3 with the optimized values in bold font. Since the 8-8-8-8 primary input partition was found to yield the least data path delay, as evident from Figure 6 and Table 1, it was preferred for the various CSLA realizations. It can be seen from Table 3 that the hybrid CSLA_BEC-CLA, when used in the final adder stage of the CSA, encounters the least propagation delay, with the proposed CSLA_BEC-SCBCLA adder closely following it with just a 1.7% delay difference. The conventional CLA, when used in the final adder stage of the CSA as a “homogeneous adder,” reports a critical path delay of 34.306 ns. On the contrary, when the conventional CLA is used along with the CSLA inclusive of the BEC as a “heterogeneous adder” (CSLA_BEC-CLA), it enables a considerable decrease in maximum data path delay of 37.8%, vindicating the observation made in [24] that heterogeneous adders are preferable over homogeneous adders for delay optimization. Although the use of RCA and CSLA_CBL adders in the final adder stage of the CSA helps to minimize the area occupancy compared to their counterparts, they suffer from an increase in delay of about 87% over the CSLA_BEC-CLA type.

The synthesis results obtained for the addition of four binary operands, each of size 64 bits, are shown in Table 4 and the optimized values are in bold font. Since the 16-16-16-16 uniform input partition was found to be delay optimal (refer to Figure 7 and Table 2), it was adopted for implementing all the CSLAs. Again, the CSLA_BEC-CLA variant reports the least propagation delay compared to the others, as in the previous case, with the proposed CSLA_BEC-SCBCLA reporting almost the same performance. However, due to lower logic complexity, the usage of RCA or CSLA_CBL in the final adder stage of the CSA results in the least area occupancy in comparison with the rest, albeit at the expense of a considerable increase in delay by about 1.4×.

4.2.2. Bit-Partitioned Multioperand Addition. In CSAs, row-wise parallel addition is performed where the tree height (i.e., number of adder levels) grows approximately linearly with the number of input operands. To reduce the logic depth of the adder tree, a bit-partitioning strategy was presented in [30] in the context of self-timed multioperand addition, which involves splitting up the entire group of data operands into a desired number of subgroups; the intermediate addition results of the subgroups are finally added to produce the final sum. The bit-partitioning approach basically parallelizes the multioperand addition and is illustrated through Figure 9 for an example scenario where addition of “𝑛” binary operands, each of size “𝑚” bits, is considered whilst assuming “𝑛” to be even. A “dot” represents a bit position in Figure 9.

[Figure 9. Labels recovered: bit positions (𝑚 − 1) to 0; number of operands 0 to (𝑛 − 1), split at (𝑛/2 − 1) and (𝑛/2); X_field; Y_field; final sum.]

The entire set of input operands from bit position 0 to bit position (𝑛 − 1) is divided into two equal-sized groups (as an example): the X_field, which comprises inputs from bit positions 0 to (𝑛/2 − 1), and the Y_field, consisting of inputs from bit positions (𝑛/2) to (𝑛 − 1). Addition within the individual fields (i.e., X_field and Y_field) is performed simultaneously and the sum bits generated as intermediate outputs from these individual fields are then added together using a final dual-operand adder to produce the required sum. The bit-partitioning scheme might help to speed up the addition, especially when several operands have to be added, by way of performing parallel column-wise addition of row-wise partitions. For example, considering the addition of 32 data operands, each of size 32 bits, the CSA topology would encounter thirty full adder delays plus the delay associated with the final dual-operand adder. On the other hand, based on the bit-partitioning technique, considering eight partitions with each partition comprising four data operands, the bit-partitioned multioperand adder based upon the CSA topology could encounter a reduced propagation delay of about four full adder delays plus the delay of a dual-operand adder, depending upon the implementation. Also, a high regularity would be implicit within the overall architecture as the gate-level hardware is being duplicated.

In this work, the bit-partitioning scheme was employed to partition the set of four inputs into two input groups (X_field and Y_field, as shown in Figure 9) and the outputs of the X and Y fields were then added to produce the final sum. Several dual-operand adders were used to realize the bit-partitioned addition separately viz. RCA, CSLA_CBL, CSLA, CSLA_BEC, CSLA-CLA, CSLA_BEC-CLA, CSLA-SCBCLA, and CSLA_BEC-SCBCLA. The different bit-partitioned addition structures were individually synthesized using the same FPGA (XC3S1600E). It should be noted that the focus here is only on evaluating the performance of the RCA and different CSLAs as employed for multioperand addition and not on the efficacy of the bit-partitioning scheme as such (i.e., no comparison is made with the results of the previous subsection). This is because, as mentioned in the preceding discussion, the bit-partitioning technique is scalable, can be custom-defined, and could potentially reduce latency primarily for additions involving larger numbers of operands as compared with conventional combinational tree structures.

Table 5 presents the timing and area results obtained for the synthesis of bit-partitioned multi-input addition of 4 binary operands, each of size 32 bits, on the basis of RCA and various homogeneous and heterogeneous CSLAs. Since the 8-8-8-8 uniform input partition was found to be delay-optimum for realizing the 32-bit CSLAs (refer to Figure 6 and Table 1), only this uniform input partition has been considered for implementing the various homogeneous and hybrid CSLAs corresponding to the X_field and Y_field of the bit-partitioned multioperand addition. To sum up the outputs of the X_field and Y_field, a 33-bit dual-operand adder is required, in which case an extra bit has been added to the most significant position of the various CSLA input partitions.
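The two multioperand schemes of Section 4.2 can be contrasted with a short behavioral model for the four-operand case studied here. The Python sketch below is illustrative only: the function names are hypothetical, the final dual-operand adder is replaced by the built-in integer addition (so the choice of RCA or CSLA is abstracted away), and operands are assumed to be nonnegative 𝑚-bit values. It shows why, for 𝑚 = 32, the field sums need 33 bits and the final result needs 34 bits.

```python
from typing import Tuple


def carry_save_level(x: int, y: int, z: int) -> Tuple[int, int]:
    """One CSA level: a full adder per bit, carries passed to the next bit below."""
    partial_sum = x ^ y ^ z
    carries = ((x & y) | (y & z) | (x & z)) << 1
    return partial_sum, carries


def csa_add4(a: int, b: int, c: int, d: int) -> int:
    """Two carry save levels followed by a final dual-operand addition."""
    s1, c1 = carry_save_level(a, b, c)      # top level adds a, b, and c
    s2, c2 = carry_save_level(s1, c1, d)    # next level brings in d
    return s2 + c2                          # stand-in for the final RCA/CSLA


def bit_partitioned_add4(a: int, b: int, c: int, d: int, m: int = 32) -> int:
    """X_field and Y_field are summed independently, then added together."""
    x_field = a + b
    y_field = c + d
    # Each field sum of two m-bit operands fits in (m + 1) bits.
    assert x_field < (1 << (m + 1)) and y_field < (1 << (m + 1))
    return x_field + y_field
```

Both functions return the same value as `a + b + c + d`; they differ in structure, with the carry save version deferring all carry propagation to the last adder and the bit-partitioned version performing two independent additions before the final one.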
12 Advances in Electronics
Table 5: Critical path delay and area metrics for bit-partitioned multioperand addition of four 32-bit operands, with RCA and various homogeneous/hybrid CSLA architectures used.

Input partition   Type of adder architecture    Critical path delay (ns)    Area (# BELs)
Not applicable    RCA                           39.928                      190
Not applicable    CSLA CBL                      42.241                      195
8-8-8-8           CSLA                          32.303                      458
8-8-8-8           CSLA BEC                      29.278                      311
8-8-8-8           CSLA-CLA                      31.727                      359
8-8-8-8           CSLA BEC-CLA                  28.207                      325
8-8-8-8           CSLA-SCBCLA∗                  27.628                      365
8-8-8-8           CSLA BEC-SCBCLA∗              27.056                      328

The optimum synthesis metrics obtained for the example multi-input addition are the minimum delay and area entries of Table 5. It can be seen that the proposed CSLA BEC-SCBCLA achieves the least computation time (27.056 ns) of all. In comparison, the undesirable increases in delay values for the other bit-partitioned multioperand adders incorporating RCA, CSLA CBL, CSLA, CSLA BEC, CSLA-CLA, CSLA BEC-CLA, and CSLA-SCBCLA types are found to be 47.6%, 56.1%, 15.9%, 3%, 15.9%, 3%, and 2.1%, respectively. However, the RCA results in the lowest area occupancy (190 BELs), and the CSLA CBL adder occupies nearly the same area with just 5 more BELs. Nevertheless, the bit-partitioned multioperand adder based upon the RCA pays a 47.6% delay penalty in comparison with that utilizing the CSLA BEC-SCBCLA.

Table 6: Critical path delay and area parameters for bit-partitioned multioperand addition of four 64-bit operands, with RCA and various homogeneous/hybrid CSLA architectures used.

Input partition   Type of adder architecture    Critical path delay (ns)    Area (# BELs)
Not applicable    RCA                           73.840                      382
Not applicable    CSLA CBL                      77.946                      388
16-16-16-16       CSLA                          50.957                      748
16-16-16-16       CSLA BEC                      46.559                      637
16-16-16-16       CSLA-CLA                      50.426                      781
16-16-16-16       CSLA BEC-CLA                  45.679                      648
16-16-16-16       CSLA-SCBCLA∗                  45.608                      800
16-16-16-16       CSLA BEC-SCBCLA∗              45.665                      691

Table 6 shows the delay and area values obtained for the synthesis of bit-partitioned addition of four input operands, each of size 64 bits, corresponding to different adder architectures, with the CSLAs utilizing the 16-16-16-16 uniform input partition since this partition was found to be delay optimal (refer to Figure 7 and Table 2). With respect to area, the RCA is found to be the optimum architecture. However, in terms of critical path delay, the proposed CSLA-SCBCLA achieves a delay reduction of 38.2% compared to the maximum path delay of the RCA-based bit-partitioned multioperand adder.

5. Conclusions

CSLA is an important member of the high-speed adder family. In this paper, existing CSLA architectures, viz. homogeneous and heterogeneous, have been described and two new hybrid CSLA topologies were put forward: (i) carry select-cum-section-carry based carry lookahead adder (CSLA-SCBCLA) and (ii) carry select-cum-section-carry based carry lookahead adder including BEC logic (CSLA BEC-SCBCLA). The speed performances of the various CSLA structures have been analyzed based on the case studies of 32-bit and 64-bit dual-operand and multioperand additions. Both uniform and nonuniform input data partitions were considered for the various CSLA implementations and FPGA-based synthesis was performed. It has been found that, for dual-operand additions, the proposed CSLA-SCBCLA/CSLA BEC-SCBCLA architecture is faster and outperforms all other homogeneous and heterogeneous CSLAs. For bit-partitioned multi-input additions, the proposed CSLA-SCBCLA/CSLA BEC-SCBCLA architecture promises high speed. Nevertheless, for multioperand addition based on the CSA topology, the conventional CSLA BEC-CLA and the proposed CSLA BEC-SCBCLA architectures were found to exhibit an optimized and comparable speed performance. From the inferences derived through this work, it is likely that the proposed hybrid CSLA architectures could achieve enhanced performance over conventional CSLAs for ASIC-based synthesis as well.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors thank the reviewers for their constructive comments, and in particular one reviewer for pointing out some typos in the initially submitted version, which has helped to improve this paper’s presentation.

References

[1] O. J. Bedrij, “Carry-select adder,” IRE Transactions on Electronic Computers, vol. 11, no. 3, pp. 340–346, 1962.
[2] A. R. Omondi, Computer Arithmetic Systems: Algorithms, Architecture and Implementation, Prentice Hall, 1994.
[3] I. Koren, Computer Arithmetic Algorithms, A K Peters/CRC Press, 2nd edition, 2001.
[4] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, New York, NY, USA, 2nd edition, 2010.
[5] T.-Y. Chang and M.-J. Hsiao, “Carry-select adder using single ripple-carry adder,” Electronics Letters, vol. 34, no. 22, pp. 2101–2103, 1998.
[6] Y. Kim and L.-S. Kim, “64-bit carry-select adder with reduced area,” Electronics Letters, vol. 37, no. 10, pp. 614–615, 2001.
[7] B. Ramkumar and H. M. Kittur, “Low-power and area-efficient carry select adder,” IEEE Transactions on VLSI Systems, vol. 20, no. 2, pp. 371–375, 2012.
[8] I.-C. Wey, C.-C. Ho, Y.-S. Lin, and C.-C. Peng, “An area-efficient carry select adder design by sharing the common boolean logic term,” in Proceedings of the International MultiConference of Engineers and Computer Scientists (IMECS ’12), vol. 2, pp. 1091–1094, March 2012.
[9] P. Balasubramanian and N. E. Mastorakis, “High speed gate level synchronous full adder designs,” WSEAS Transactions on Circuits and Systems, vol. 8, no. 2, pp. 290–300, 2009.
[10] W. Jeong and K. Roy, “Robust high-performance low-power carry select adder,” in Proceedings of the Asia and South Pacific Design Automation Conference, pp. 503–506, Kitakyushu, Japan, January 2003.
[11] M. Alioto, G. Palumbo, and M. Poli, “A gate-level strategy to design carry select adders,” in Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 2, pp. 465–468, IEEE, May 2004.
[12] M. Alioto, G. Palumbo, and M. Poli, “Optimized design of parallel carry-select adders,” Integration, the VLSI Journal, vol. 44, no. 1, pp. 62–74, 2011.
[13] A. Nève, H. Schettler, T. Ludwig, and D. Flandre, “Power-delay product minimization in high-performance 64-bit carry select adders,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 3, pp. 235–244, 2004.
[14] Y. He, C.-H. Chang, and J. Gu, “An area efficient 64-bit square root carry-select adder for low power applications,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS ’05), vol. 4, pp. 4082–4085, May 2005.
[15] B. K. Mohanty and S. K. Patel, “Area-delay-power efficient carry select adder,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 61, no. 6, pp. 418–422, 2014.
[16] J. Monteiro, J. L. Güntzel, and L. Agostini, “A1CSA: an energy-efficient fast adder architecture for cell-based VLSI design,” in Proceedings of the 18th IEEE International Conference on Electronics, Circuits and Systems (ICECS ’11), pp. 442–445, Beirut, Lebanon, December 2011.
[17] Y. Chen, H. Li, K. Roy, and C.-K. Koh, “Cascaded carry-select adder (C2SA): a new structure for low-power CSA design,” in Proceedings of the International Symposium on Low Power Electronics and Design, pp. 115–118, August 2005.
[18] Y. Wang, C. Pai, and X. Song, “The design of hybrid carry-lookahead/carry-select adders,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 49, no. 1, pp. 16–24, 2002.
[19] G. A. Ruiz and M. Granda, “An area-efficient static CMOS carry-select adder based on a compact carry look-ahead unit,” Microelectronics Journal, vol. 35, no. 12, pp. 939–944, 2004.
[20] H. G. Tamar, A. G. Tamar, K. Hadidi, A. Khoei, and P. Hoseini, “High speed area reduced 64-bit static hybrid carry-lookahead/carry-select adder,” in Proceedings of the 18th IEEE International Conference on Electronics, Circuits and Systems (ICECS ’11), pp. 460–463, December 2011.
[21] V. Kokilavani, P. Balasubramanian, and H. R. Arabnia, “FPGA realization of hybrid carry select-cum-section-carry based carry lookahead adders,” in Proceedings of the 12th International Conference on Embedded Systems and Applications, pp. 81–85, 2014.
[22] R. Yousuf and Najeeb-ud-din, “Synthesis of carry select adder in 65nm FPGA,” in Proceedings of the IEEE Region 10 Conference (TENCON ’08), pp. 1–6, November 2008.
[23] U. Sajesh Kumar and K. K. Mohamed Salih, “Efficient carry select adder design for FPGA implementation,” Procedia Engineering, vol. 30, pp. 449–456, 2012.
[24] J.-G. Lee, J.-A. Lee, B.-S. Lee, and M. D. Ercegovac, “A design method for heterogeneous adders,” in Embedded Software and Systems, vol. 4523 of Lecture Notes in Computer Science, pp. 121–132, Springer, 2007.
[25] K. Preethi and P. Balasubramanian, “FPGA implementation of synchronous section-carry based carry look-ahead adders,” in Proceedings of the IEEE 2nd International Conference on Devices, Circuits and Systems (ICDCS ’14), pp. 1–4, IEEE, Coimbatore, India, March 2014.
[26] P. Balasubramanian, D. A. Edwards, and W. B. Toms, “Self-timed section-carry based carry lookahead adders and the concept of alias logic,” Journal of Circuits, Systems and Computers, vol. 22, no. 4, Article ID 1350028, 2013.
[27] P. Balasubramanian, D. A. Edwards, and H. R. Arabnia, “Robust asynchronous carry lookahead adders,” in Proceedings of the 11th International Conference on Computer Design, pp. 119–124, 2011.
[28] Xilinx, http://www.xilinx.com.
[29] K. K. Parhi, “Low-energy CSMT carry generators and binary adders,” IEEE Transactions on VLSI Systems, vol. 7, no. 4, pp. 450–462, 1999.
[30] P. Balasubramanian, D. A. Edwards, and W. B. Toms, “Self-timed multi-operand addition,” International Journal of Circuits, Systems and Signal Processing, vol. 6, no. 1, pp. 1–11, 2012.