AdvVLSI Module1

Uploaded by

priya

21EC71:Advanced VLSI

Module 1: INTRODUCTION TO ASICs

Syllabus: Introduction to ASICs: Full custom, Semi-custom and Programmable ASICs, ASIC Design flow,
ASIC cell libraries. CMOS Logic: Data path Logic Cells: Data Path Elements, Adders: Carry skip, Carry
bypass, Carry save, Carry select, Conditional sum, Multiplier (Booth encoding), Data path Operators, I/O
cells, Cell Compilers.

An ASIC (pronounced "a-sick"; bold typeface defines a new term) is an application-specific integrated
circuit, or at least that is what the acronym stands for. Before we answer the question of what that means
we first look at the evolution of the silicon chip or integrated circuit (IC).
Figure 1.1(a) shows an IC package (this is a pin-grid array, or PGA, shown upside down; the pins will go
through holes in a printed-circuit board). People often call the package a chip, but, as you can see in
Figure 1.1(b), the silicon chip itself (more properly called a die) is mounted in the cavity under the
sealed lid. A PGA package is usually made from a ceramic material, but plastic packages are also
common.

FIGURE 1.1 An integrated circuit (IC). (a) A pin-grid array (PGA) package. (b) The silicon die or chip is
under the package lid.

The physical size of a silicon die varies from a few millimeters on a side to over 1 inch on a side, but
instead we often measure the size of an IC by the number of logic gates or the number of transistors that
the IC contains. As a unit of measure a gate equivalent corresponds to a two-input NAND gate (a circuit
that performs the logic function F = NOT(A · B)). Often we just use the term gates instead of gate equivalents
when we are measuring chip size, not to be confused with the gate terminal of a transistor. For
example, a 100 k-gate IC contains the equivalent of 100,000 two-input NAND gates.

The semiconductor industry has evolved from the first ICs of the early 1970s and matured rapidly since
then. Early small-scale integration ( SSI ) ICs contained a few (1 to 10) logic gates—NAND gates, NOR
gates, and so on—amounting to a few tens of transistors. The era of medium-scale integration ( MSI )
increased the range of integrated logic available to counters and similar, larger scale, logic functions. The
era of large-scale integration ( LSI ) packed even larger logic functions, such as the first microprocessors,
into a single chip. The era of very large-scale integration ( VLSI ) now offers 64-bit microprocessors,
complete with cache memory and floating-point arithmetic units—well over a million transistors—on a
single piece of silicon. As CMOS process technology improves, transistors continue to get smaller and ICs
hold more and more transistors. Some people (especially in Japan) use the term ultralarge scale
integration (ULSI ), but most people stop at the term VLSI; otherwise we have to start inventing new
words.

Mrs. Kavya M P,Dept. of ECE, PESITM Page 1



1.1 Types of ASICs


ICs are made on a thin (a few hundred microns thick), circular silicon wafer , with each wafer holding
hundreds of die (sometimes people use dies or dice for the plural of die). The transistors and wiring are
made from many layers (usually between 10 and 15 distinct layers) built on top of one another. Each
successive mask layer has a pattern that is defined using a mask similar to a glass photographic slide.
The first half-dozen or so layers define the transistors. The last half-dozen or so layers define the metal
wires between the transistors (the interconnect).

A full-custom IC includes some (possibly all) logic cells that are customized and all mask layers that are
customized. A microprocessor is an example of a full-custom IC—designers spend many hours
squeezing the most out of every last square micron of microprocessor chip space by hand. Customizing
all of the IC features in this way allows designers to include analog circuits, optimized memory cells, or
mechanical structures on an IC, for example. Full-custom ICs are the most expensive to manufacture and
to design.
The manufacturing lead time (the time it takes just to make an IC—not including design time) is
typically eight weeks for a full-custom IC. These specialized full-custom ICs are often intended for a
specific application, so we might call some of them full-custom ASICs.

1.1.1 Full-Custom ASICs

In a full-custom ASIC an engineer designs some or all of the logic cells, circuits, or layout specifically for
one ASIC. This means the designer abandons the approach of using pretested and pre-characterized cells
for all or part of that design. It makes sense to take this approach only if there are no suitable existing
cell libraries available that can be used for the entire design. This might be because existing cell libraries
are not fast enough, or the logic cells are not small enough or consume too much power. You may need
to use full-custom design if the ASIC technology is new or so specialized that there are no existing cell
libraries or because the ASIC is so specialized that some circuits must be custom designed. Fewer and
fewer full-custom ICs are being designed because of the problems with these special parts of the ASIC.
There is one growing member of this family, though, the mixed analog/digital ASIC, which we shall
discuss next.

1.1.2. Semicustom ASICs


ASICs for which all of the logic cells are predesigned and some (possibly all) of the mask layers are
customized are called semicustom ASICs. Using predesigned cells from a cell library makes the
design much easier. There are two types of semicustom ASICs:
(i) Standard-cell based ASICs (ii) Gate-array based ASICs.


(i) Standard-Cell Based ASICs

• A cell-based ASIC (cell-based IC, or CBIC pronounced sea-bick) uses predesigned logic cells
(AND gates, OR gates, multiplexers, and flip-flops, for example) known as standard cells.
• One can apply the term CBIC to any IC that uses cells, but it is generally accepted that a cell-
based ASIC or CBIC means a standard-cell based ASIC.
• The standard-cell areas (also called flexible blocks) in a CBIC are built of rows of standard cells
like a wall built of bricks. The standard-cell areas may be used in combination with
microcontrollers or even microprocessors, known as mega cells. Mega cells are also called mega
functions, full-custom blocks, system-level macros (SLMs), fixed blocks, cores, or Functional
Standard Blocks (FSBs).
• A typical CBIC die might contain a single standard-cell area (a flexible block) together with
four fixed blocks.
• The ASIC designer defines only the placement of the standard cells and the interconnect in a
CBIC. However, the standard cells can be placed anywhere on the silicon; this means that all the
mask layers of a CBIC are customized and are unique to a particular customer.
• The advantage of CBICs is that designers save time and money and reduce risk by using a
predesigned, pretested, and precharacterized standard-cell library.
• In addition each standard cell can be optimized individually. During the design of the cell library
each and every transistor in every standard cell can be chosen to maximize speed or minimize
area.
• The disadvantages are the time or expense of designing or buying the standard-cell library and
the time needed to fabricate all layers of the ASIC for each new design.

(ii). Gate-Array Based ASICs

• In a gate array (sometimes abbreviated GA) or gate-array based ASIC the transistors are
predefined on the silicon wafer.
• The predefined pattern of transistors on a gate array is the base array , and the smallest element
that is replicated to make the base array is the base cell (sometimes called a primitive cell ).


• Only the top few layers of metal, which define the interconnect between transistors, are defined
by the designer using custom masks. To distinguish this type of gate array from other types of
gate array, it is often called a masked gate array ( MGA ).
• The designer chooses from a gate-array library of predesigned and precharacterized logic cells.
• The logic cells in a gate-array library are often called macros. The reason for this is that the
base-cell layout is the same for each logic cell, and only the interconnect (inside cells and
between cells) is customized, which is similar to a software macro.

• Types of MGA or Gate-array based ASICs

There are three types of gate-array based ASICs:

   • Channeled gate arrays.
   • Channelless gate arrays.
   • Structured gate arrays.

Channeled Gate Arrays

• The channeled gate array was the first to be developed. In a channeled gate array space is left
between the rows of transistors for wiring.
• A channeled gate array is similar to a CBIC. Both use rows of cells separated by channels
used for interconnect. One difference is that the space for interconnect between rows of cells is
fixed in height in a channeled gate array, whereas the space between rows of cells may be
adjusted in a CBIC.

Channelless Gate Arrays

• The channelless gate-array architecture is now more widely used. The routing on a channelless
gate array uses rows of unused transistors.
• The key difference between a channelless gate array and a channeled gate array is that there are
no predefined areas set aside for routing between cells on a channelless gate array. Instead we
route over the top of the gate-array devices. We can do this because we customize the contact
layer that defines the connections between metal 1, the first layer of metal, and the transistors.
• Features of Channelless Gate Arrays
   • Only the top few mask layers, which define the interconnect, are customized.
   • Manufacturing lead time is around two days to two weeks.
   • When we use an area of transistors for routing in a channelless array, we do not make
   any contacts to the devices lying underneath; we simply leave the transistors unused.

Structured Gate Arrays

• This design combines some of the features of CBICs and MGAs. It is known as an embedded
gate array or structured gate array (also called a master slice or master image).

• One of the limitations of the MGA is the fixed gate-array base cell. This makes the
implementation of memory difficult and inefficient.

• In an embedded gate array some of the IC area is set aside and dedicated to a specific function.
This embedded area can either contain a different base cell that is more suitable for building
memory cells, or it can contain a complete circuit block, such as a microcontroller.


• Features of Structured Gate Arrays

   • Only the interconnect is customized.
   • Custom blocks (the same for each design) can be embedded.
   • Manufacturing lead time is between two days and two weeks.
   • An embedded gate array gives the improved area efficiency and increased performance
   of a CBIC but with the lower cost and faster turnaround of an MGA.
   • The disadvantage of an embedded gate array is that the embedded function is fixed.

1.1.3. Programmable Logic Devices

• Programmable logic devices ( PLDs ) are standard ICs that are available in standard
configurations.

• However, PLDs may be configured or programmed to create a part customized to a specific
application, and so they also belong to the family of ASICs.

• PLDs use different technologies to allow programming of the device.

• Features of PLDs

   • No customized mask layers or logic cells.
   • Fast design turnaround.
   • A single large block of programmable interconnect.
   • A matrix of logic macrocells that usually consist of programmable array logic followed
   by a flip-flop or latch.

• The simplest type of programmable IC is a read-only memory( ROM ). The most common types
of ROM use a metal fuse that can be blown permanently (a programmable ROM or PROM ).

• An electrically programmable ROM, or EPROM, uses programmable MOS transistors whose
characteristics are altered by applying a high voltage.


• One can erase an EPROM either by using another high voltage (an electrically erasable PROM ,
or EEPROM ) or by exposing the device to ultraviolet light (UV-erasable PROM, or UVPROM).

• There is another type of ROM that can be placed on any ASIC: a mask-programmable ROM
(mask-programmed ROM or masked ROM). A masked ROM is a regular array of transistors
permanently programmed using custom mask patterns.

• So, an embedded masked ROM is a large, specialized, logic cell.

Field-Programmable Gate Arrays (FPGAs)

• FPGAs are the newest member of the ASIC family and are rapidly growing in importance, replacing TTL in
microelectronic systems. Even though an FPGA is a type of gate array, we do not consider the
term gate-array based ASICs to include FPGAs.
• There is very little difference between an FPGA and a PLD. An FPGA is usually just larger and
more complex than a PLD. In fact, some vendors that manufacture programmable ASICs call their
products FPGAs and some call them complex PLDs.

• Features of FPGAs

   • None of the mask layers are customized.
   • There is a method for programming the basic logic cells and the interconnect.
   • The core is a regular array of programmable basic logic cells that can implement
   combinational as well as sequential logic (flip-flops).
   • A matrix of programmable interconnect surrounds the basic logic cells.
   • Programmable I/O cells surround the core.
   • Design turnaround is a few hours.


1.2. ASIC design flow diagram

The sequence of steps to design an ASIC is known as the design flow. The steps involved in
the ASIC design flow are given below.

1. Design entry: Enter the design (the microarchitecture) in a hardware description language
(HDL) such as VHDL, Verilog, or SystemVerilog.
In the early days a schematic editor was used for design entry, and designers instantiated
gates directly. The increased complexity of current designs requires the use of HDLs to gain
productivity. Another advantage is that HDLs are independent of process technology and
hence can be reused over time.
2. Logic synthesis: Use an HDL (VHDL or Verilog) and a logic synthesis tool to produce a
netlist, a description of the logic cells and their connections.
3. System partitioning: Divide a large system into ASIC-sized pieces.
4. Prelayout simulation: Check to see if the design functions correctly.
5. Floorplanning: Arrange the blocks of the netlist on the chip.
6. Placement: Decide the locations of cells in a block.
7. Routing: Make the connections between cells and blocks.
8. Extraction: Determine the resistance and capacitance of the interconnect.
9. Postlayout simulation: Check to see that the design still works with the added loads of the
interconnect.
Steps 1–4 are part of logical design, and steps 5–9 are part of physical design. There is some overlap.
For example, system partitioning might be considered as either logical or physical design. To put it
another way, when we are performing system partitioning we have to consider both logical and physical
factors. Chapters 9–14 of this book are largely about logical design and Chapters 15–17 largely about
physical design.


1.3. Datapath Logic Cells

Suppose we wish to build an n -bit adder (that adds two n -bit numbers) and to exploit the regularity of
this function in the layout. We can do so using a datapath structure. The following two functions, SUM
and COUT, implement the sum and carry out for a full adder ( FA ) with two data inputs (A, B) and a
carry in, CIN

SUM = A ⊕ B ⊕ CIN = SUM(A, B, CIN) = PARITY(A, B, CIN)

COUT = A · B +A· CIN + B · CIN = MAJ(A, B, CIN).

The sum uses the parity function ('1' if there is an odd number of '1's in the inputs). The carry out,
COUT, uses the 2-of-3 majority function ('1' if the majority of the inputs are '1'). We can combine these
two functions in a single FA logic cell, ADD(A[i], B[i], CIN, S[i], COUT), shown in Figure 2.20(a).

FIGURE 2.20 A datapath adder. (a) A full-adder (FA) cell with inputs (A and B), a carry in, CIN,
sum output, S, and carry out, COUT. (b) A 4-bit adder. (c) The layout, using two-level metal, with
data in m1 and control in m2. In this example the wiring is completed outside the cell; it is also
possible to design the datapath cells to contain the wiring. Using three levels of metal, it is
possible to wire over the top of the datapath cells. (d) The datapath layout.
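
The SUM and COUT functions can be checked with a short script. This is only a sketch: the function names `parity`, `majority`, and `full_adder` are chosen here for illustration and do not come from any cell library.

```python
from itertools import product

def parity(a, b, c):
    """SUM: '1' when an odd number of the inputs are '1'."""
    return a ^ b ^ c

def majority(a, b, c):
    """COUT: '1' when at least two of the three inputs are '1'."""
    return (a & b) | (a & c) | (b & c)

def full_adder(a, b, cin):
    """Return (sum, cout) for one full-adder cell."""
    return parity(a, b, cin), majority(a, b, cin)

# Exhaustive check against ordinary integer addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

The loop covers all eight input combinations, so the parity and majority forms together behave as a 1-bit adder.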

Now we can build a 4-bit ripple-carry adder ( RCA ) by connecting four of these ADD cells together as
shown in Figure 2.20(b). The i th ADD cell is arranged with the following: two bus inputs A[ i ], B[ i ]; one
bus output S[ i ]; an input, CIN, that is the carry in from stage (i – 1) below and is also passed up to the
cell above as an output; and an output, COUT, that is the carry out to stage ( i + 1) above. In the 4-bit
adder shown in Figure 2.20(b) we connect the carry input, CIN[0], to VSS and use COUT[3] and COUT[2]
to indicate arithmetic overflow (in Section 2.6.1 we shall see why we may need both signals). Notice that
we build the ADD cell so that COUT[2] is available at the top of the datapath when we need it.
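
A behavioural sketch of the 4-bit ripple-carry adder follows. The helper names (`ripple_carry_add`, `to_bits`, `from_bits`) are made up for this example, and the overflow test assumes two's complement operands, for which overflow is the XOR of COUT[3] and COUT[2].

```python
def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first).

    Returns (sum_bits, carries), where carries[i] is COUT of stage i.
    """
    sum_bits, carries = [], []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry                             # parity
        carry = (a & b) | (a & carry) | (b & carry)   # majority
        sum_bits.append(s)
        carries.append(carry)
    return sum_bits, carries

def to_bits(value, n):
    """Integer to a list of n bits, LSB first."""
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))

# 5 + 6 = 11, which does not fit in a 4-bit two's complement number.
s, c = ripple_carry_add(to_bits(5, 4), to_bits(6, 4))
overflow = c[3] ^ c[2]
```

Here CIN[0] defaults to '0', which corresponds to tying the carry input to VSS.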

Figure 2.20(c) shows a layout of the ADD cell. The A inputs, B inputs, and S outputs all use m1
interconnect running in the horizontal direction—we call these data signals. Other signals can enter or
exit from the top or bottom and run vertically across the datapath in m2—we call these control signals.


We can also use m1 for control and m2 for data, but we normally do not mix these approaches in the
same structure. Control signals are typically clocks and other signals common to elements. For example,
in Figure 2.20(c) the carry signals, CIN and COUT, run vertically in m2 between cells. To build a 4-bit
adder we stack four ADD cells creating the array structure shown in Figure 2.20(d). In this case the A
and B data bus inputs enter from the left and bus S, the sum, exits at the right, but we can connect A, B,
and S to either side if we want.

The layout of buswide logic that operates on data signals in this fashion is called a datapath. The module
ADD is a datapath cell or datapath element. Just as we do for standard cells we make all the datapath
cells in a library the same height so we can abut other datapath cells on either side of the adder to create
a more complex datapath. When people talk about a datapath they always assume that it is oriented so
that increasing the size in bits makes the datapath grow in height, upwards in the vertical direction, and
adding different datapath elements to increase the function makes the datapath grow in width, in the
horizontal direction—but we can rotate and position a completed datapath in any direction we want on
a chip.

1.4. Datapath Elements

Figure 2.21 shows some typical datapath symbols for an adder (people rarely use the IEEE standards in
ASIC datapath libraries). I use heavy lines (they are 1.5 point wide) with a stroke to denote a data bus
(that flows in the horizontal direction in a datapath), and regular lines (0.5 point) to denote the control
signals (that flow vertically in a datapath). At the risk of adding confusion where there is none, this
stroke to indicate a data bus has nothing to do with mixed-logic conventions. For a bus, A[31:0] denotes
a 32-bit bus with A[31] as the leftmost or most-significant bit or MSB, and A[0] as the least-significant
bit or LSB. Sometimes we shall use A[MSB] or A[LSB] to refer to these bits. Notice that if we have an n-bit
bus and LSB = 0, then MSB = n – 1. Also, for example, A[4] is the fifth bit on the bus (from the LSB).
We use a 'Σ' or 'ADD' inside the symbol to denote an adder instead of '+', so we can attach '–' or '+/–' to
the inputs for a subtracter or adder/subtracter.

FIGURE 2.21 Symbols for a datapath adder. (a) A data bus is shown by a heavy line (1.5 point) and a bus
symbol. If the bus is n -bits wide then MSB = n – 1. (b) An alternative symbol for an adder. (c) Control signals
are shown as lightweight (0.5 point) lines.


1.5. Adders

1.5.1. Ripple Carry Adder

We can view addition in terms of generate , G[ i ], and propagate , P[ i ], signals.

Method 1                          Method 2
G[i] = A[i] · B[i]                G[i] = A[i] · B[i]                (2.42)
P[i] = A[i] ⊕ B[i]                P[i] = A[i] + B[i]                (2.43)
C[i] = G[i] + P[i] · C[i–1]       C[i] = G[i] + P[i] · C[i–1]       (2.44)
S[i] = P[i] ⊕ C[i–1]              S[i] = A[i] ⊕ B[i] ⊕ C[i–1]       (2.45)

where C[ i ] is the carry-out signal from stage i , equal to the carry in of stage ( i + 1). Thus, C[ i ] = COUT[
i ] = CIN[ i + 1]. We need to be careful because C[0] might represent either the carry in or the carry out of
the LSB stage. For an adder we set the carry in to the first stage (stage zero), C[–1] or CIN[0], to '0'. Some
people use delete (D) or kill (K) in various ways for the complements of G[i] and P[i], but unfortunately
others use C for COUT and D for CIN—so I avoid using any of these. Do not confuse the two different
methods (both of which are used) in Eqs. 2.42–2.45 when forming the sum, since the propagate signal,
P[ i ] , is different for each method.
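
The two methods can be compared exhaustively for a single stage. This sketch uses a hypothetical `stage` function; it shows that both methods give the same carry and sum, and that mixing them (the OR form of P used in the method 1 sum) does not.

```python
from itertools import product

def stage(a, b, c_prev, method):
    """One adder stage using Eqs. 2.42-2.45; method is 1 (XOR) or 2 (OR)."""
    g = a & b
    p = (a ^ b) if method == 1 else (a | b)
    c = g | (p & c_prev)
    # Method 1 may reuse P for the sum; method 2 must use the full XOR.
    s = (p ^ c_prev) if method == 1 else (a ^ b ^ c_prev)
    return s, c

for a, b, c_prev in product((0, 1), repeat=3):
    assert stage(a, b, c_prev, 1) == stage(a, b, c_prev, 2)

# Mixing the methods: OR-propagate XOR carry in, for A = B = 1 and carry in 0.
# The correct sum bit is 0, but the mixed expression gives 1.
wrong_sum = (1 | 1) ^ 0
```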

Figure 2.22(a) shows a conventional RCA. The delay of an n-bit RCA is proportional to n and is limited
by the propagation of the carry signal through all of the stages. We can reduce delay by using pairs
of "go-faster" bubbles to change AND and OR gates to fast two-input NAND gates as shown in
Figure 2.22(a). Alternatively, we can write the equations for the carry signal in two different ways:

either  C[i] = A[i] · B[i] + P[i] · C[i–1]            (2.46)
or      C[i] = (A[i] + B[i]) · (P[i]' + C[i–1]),      (2.47)

where P[ i ]'= NOT(P[ i ]). Equations 2.46 and 2.47 allow us to build the carry chain from two-input
NAND gates, one per cell, using different logic in even and odd stages (Figure 2.22b):

even stages                            odd stages
C1[i]' = P[i] · C3[i–1] · C4[i–1]      C3[i]' = P[i] · C1[i–1] · C2[i–1]      (2.48)
C2[i] = A[i] + B[i]                    C4[i]' = A[i] · B[i]
C[i] = C1[i] · C2[i]                   C[i] = C3[i]' + C4[i]'

(the carry inputs to stage zero are C3[–1] = C4[–1] = '0'). We can use the RCA of Figure 2.22(b) in a
datapath, with standard cells, or on a gate array.
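
That Eqs. 2.46 and 2.47 describe the same carry can be confirmed by trying every input combination. The sketch below assumes P[i] = A[i] ⊕ B[i] (method 1); the function names are illustrative only.

```python
from itertools import product

def carry_sum_of_products(a, b, c_prev):
    p = a ^ b
    return (a & b) | (p & c_prev)            # Eq. 2.46

def carry_product_of_sums(a, b, c_prev):
    p = a ^ b
    return (a | b) & ((1 - p) | c_prev)      # Eq. 2.47, with P' = 1 - P

# Collect every input combination on which the two forms disagree.
mismatches = [
    bits for bits in product((0, 1), repeat=3)
    if carry_sum_of_products(*bits) != carry_product_of_sums(*bits)
]
```

The list `mismatches` is empty, so either form may be used in a given stage, which is what allows the alternating even and odd stages.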

1.5.2. Carry Save Adder

Instead of propagating the carries through each stage of an RCA, Figure 2.23 shows a different approach.
A carry-save adder ( CSA ) cell CSA(A1[ i ], A2[ i ], A3[ i ], CIN, S1[ i ], S2[ i ], COUT) has three outputs:

S1[ i ] = CIN , (2.51)

S2[ i ] = A1[ i ] ⊕ A2[ i ] ⊕ A3[ i ] = PARITY(A1[ i ], A2[ i ], A3[ i ]) , (2.52)

COUT = A1[ i ] · A2[ i ] + [(A1[ i ] + A2[ i ]) · A3[ i ]] = MAJ(A1[ i ], A2[ i ], A3[ i ]) . (2.53)

The inputs, A1, A2, and A3; and outputs, S1 and S2, are buses. The input, CIN, is the carry from stage ( i –
1). The carry in, CIN, is connected directly to the output bus S1—indicated by the schematic symbol
(Figure 2.23a). We connect CIN[0] to VSS. The output, COUT, is the carry out to stage ( i + 1).

A 4-bit CSA is shown in Figure 2.23(b). The arithmetic overflow signal for ones' complement or two's
complement arithmetic, OV, is XOR(COUT[MSB], COUT[MSB – 1]) as shown in Figure 2.23(c). In a CSA
the carries are "saved" at each stage and shifted left onto the bus S1. There is thus no carry propagation
and the delay of a CSA is constant. At the output of a CSA we still need to add the S1 bus (all the saved
carries) and the S2 bus (all the sums) to get an n-bit result using a final stage that is not shown in Figure
2.23(c). We might regard the n-bit sum as being encoded in the two buses, S1 and S2, in the form of the
parity and majority functions.


We can use a CSA to add multiple inputs—as an example, an adder with four 4-bit inputs is shown in
Figure 2.23(d). The last stage sums two input buses using a carry-propagate adder ( CPA ). We have
used an RCA as the CPA in Figure 2.23(d) and (e), but we can use any type of adder. Notice in
Figure 2.23(e) how the two CSA cells and the RCA cell abut together horizontally to form a bit slice (or
slice) and then the slices are stacked vertically to form the datapath.
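
The four-input adder can be modelled at the word level as two carry-save stages followed by a carry-propagate adder. In this sketch the names `csa` and `add_four` are assumptions, integers stand in for the buses, and Python's `+` stands in for the CPA; results wrap modulo 2^n as they would in an n-bit datapath.

```python
def csa(a1, a2, a3, n):
    """Carry-save stage on n-bit integers: returns (s1, s2).

    s2 holds the bitwise sums (parity); s1 holds the saved carries
    (majority) shifted left by one bit, as on the S1 bus.
    """
    mask = (1 << n) - 1
    s2 = (a1 ^ a2 ^ a3) & mask
    carries = (a1 & a2) | ((a1 | a2) & a3)
    s1 = (carries << 1) & mask          # carry out of the MSB is dropped
    return s1, s2

def add_four(w, x, y, z, n=4):
    """Add four n-bit inputs with two CSA stages and a final CPA."""
    s1, s2 = csa(w, x, y, n)
    t1, t2 = csa(s1, s2, z, n)
    return (t1 + t2) & ((1 << n) - 1)   # Python '+' stands in for the CPA
```

Only the last step involves carry propagation; each `csa` call is a purely bitwise operation, which is why its delay does not depend on n.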

FIGURE 2.23 The carry-save adder (CSA). (a) A CSA cell. (b) A 4-bit CSA. (c) Symbol for a CSA. (d) A
four-input CSA. (e) The datapath for a four-input, 4-bit adder using CSAs with a ripple-carry adder
(RCA) as the final stage. (f) A pipelined adder. (g) The datapath for the pipelined version showing
the pipeline registers as well as the clock control lines that use m2.

We can register the CSA stages by adding vectors of flip-flops as shown in Figure 2.23(f). This reduces
the adder delay to that of the slowest adder stage, usually the CPA. By using registers between stages of
combinational logic we use pipelining to increase the speed, and pay a price of increased area (for the
registers) and added latency. It takes a few clock cycles (the latency, equal to n clock cycles for an n-stage
pipeline) to fill the pipeline, but once it is filled, the answers emerge every clock cycle. Ferris
wheels work much the same way. When the fair opens it takes a while (latency) to fill the wheel, but
once it is full the people can get on and off every few seconds. (We can also pipeline the RCA of Figure
2.20. We add i registers on the A and B inputs before ADD[i] and add (n – i) registers after the output
S[i], with a single register before each C[i].)
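
The latency and throughput of a pipeline can be illustrated with a toy cycle-by-cycle model. This is a sketch only: `run_pipeline` is a hypothetical helper, and the three stage functions are arbitrary arithmetic standing in for blocks of combinational logic.

```python
def run_pipeline(inputs, stages):
    """Push inputs through a list of stage functions, one register per stage.

    Returns the output seen at the end of every clock cycle (None while
    the pipeline is still filling or after it has drained).
    """
    regs = [None] * len(stages)
    outputs = []
    feed = list(inputs) + [None] * len(stages)   # extra cycles to drain
    for item in feed:
        # Update from the last stage backwards so each value moves one stage.
        for k in range(len(stages) - 1, 0, -1):
            regs[k] = None if regs[k - 1] is None else stages[k](regs[k - 1])
        regs[0] = None if item is None else stages[0](item)
        outputs.append(regs[-1])
    return outputs

stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]   # made-up logic
out = run_pipeline([1, 2, 3, 4], stages)
# out == [None, None, 1, 3, 5, 7, None]
```

With three stages the first answer appears at the end of the third clock cycle, and after that one answer emerges on every cycle.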

1.5.3. Carry Skip / Bypass Adder.

The problem with an RCA is that every stage has to wait to make its carry decision, C[i], until the
previous stage has calculated C[i – 1]. If we examine the propagate signals we can bypass this critical
path. Thus, for example, to bypass the carries for bits 4–7 (stages 5–8) of an adder we can compute
BYPASS = P[4] · P[5] · P[6] · P[7] and then use a MUX as follows:

C[7] = (G[7] + P[7] · C[6]) · BYPASS' + C[3] · BYPASS . (2.54)

Adders based on this principle are called carry-bypass adders ( CBA ) [Sato et al., 1992]. Large, custom
adders employ Manchester-carry chains to compute the carries and the bypass operation using TGs or
just pass transistors [Weste and Eshraghian, 1993, pp. 530–531]. These types of carry chains may be
part of a predesigned ASIC adder cell, but are not used by ASIC designers.
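
Eq. 2.54 can be checked against a plain ripple carry over the same four bits. The sketch assumes the method 1 propagate signal, P[i] = A[i] ⊕ B[i]; with the OR form of P the bypass would not be valid, because a stage with A = B = '1' generates a carry regardless of its carry in. The function names are illustrative.

```python
from itertools import product

def ripple_carry(a_bits, b_bits, c_in):
    """Carry out of a block of stages, rippling C[i] = G[i] + P[i]·C[i-1]."""
    c = c_in
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a ^ b) & c)
    return c

def bypass_carry(a_bits, b_bits, c_in):
    """Eq. 2.54 for one block: a 2:1 MUX controlled by BYPASS."""
    bypass = 1
    for a, b in zip(a_bits, b_bits):
        bypass &= a ^ b                  # P[i] = A[i] XOR B[i] (method 1)
    return c_in if bypass else ripple_carry(a_bits, b_bits, c_in)

# Exhaustive check over bits 4-7 of A and B and both values of C[3].
agree = all(
    bypass_carry(bits[:4], bits[4:8], bits[8])
    == ripple_carry(bits[:4], bits[4:8], bits[8])
    for bits in product((0, 1), repeat=9)
)
```

The value is the same either way; the benefit in hardware is that when BYPASS is '1' the carry C[3] reaches C[7] through one MUX instead of four stages.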

Instead of checking the propagate signals we can check the inputs. For example we can compute

SKIP = (A[i – 1] ⊕ B[i – 1]) · (A[i] ⊕ B[i])

(the carry can skip two stages only if both of them propagate it) and then use a 2:1 MUX to select C[i]. Thus,

CSKIP[i] = (G[i] + P[i] · C[i – 1]) · SKIP' + C[i – 2] · SKIP . (2.55)

This is a carry-skip adder [Keutzer, Malik, and Saldanha, 1991; Lehman, 1961]. Carry-bypass and carry-
skip adders may include redundant logic (since the carry is computed in two different ways—we just
take the first signal to arrive). We must be careful that the redundant logic is not optimized away during
logic synthesis.
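
A two-stage carry skip can be checked in the same exhaustive way. In this sketch the skip condition is taken as the AND of the two propagate terms, since C[i – 2] may replace C[i] only when both stages propagate; for comparison the code also counts the mismatches that an OR of the two terms would produce. All names are illustrative.

```python
from itertools import product

def carry(a, b, c_prev):
    return (a & b) | ((a ^ b) & c_prev)      # G + P·C[i-1], with P = A XOR B

def cskip(a_prev, b_prev, a, b, c_prev2, skip_op):
    """Eq. 2.55 for stage i, given the inputs of stages i-1 and i and C[i-2]."""
    skip = skip_op(a_prev ^ b_prev, a ^ b)
    c_prev = carry(a_prev, b_prev, c_prev2)  # C[i-1]
    return c_prev2 if skip else carry(a, b, c_prev)

def count_errors(skip_op):
    """Number of input combinations where the skip result is not the true carry."""
    return sum(
        cskip(ap, bp, a, b, c, skip_op) != carry(a, b, carry(ap, bp, c))
        for ap, bp, a, b, c in product((0, 1), repeat=5)
    )

errors_with_and = count_errors(lambda p0, p1: p0 & p1)
errors_with_or = count_errors(lambda p0, p1: p0 | p1)
```

`errors_with_and` is zero, while `errors_with_or` is not: for example, if stage i – 1 propagates but stage i generates a carry, C[i] is '1' even when C[i – 2] is '0'.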

1.5.4. Carry Look-ahead Adder

If we evaluate Eq. 2.44 recursively for i = 1, we get the following (with the carry in, C[–1], set to '0'):

C[1] = G[1] + P[1] · C[0]

= G[1] + P[1] · (G[0] + P[0] · C[–1])

= G[1] + P[1] · G[0] . (2.56)

This result means that we can "look ahead" by two stages and calculate the carry into the third stage
(bit 2), which is C[1], using only the first-stage inputs (to calculate G[0]) and the second-stage inputs.
This is a carry-lookahead adder (CLA) [MacSorley, 1961]. If we continue expanding Eq. 2.44, we find:

C[2] = G[2] + P[2] · G[1] + P[2] · P[1] · G[0] ,

C[3] = G[3] + P[3] · G[2] + P[3] · P[2] · G[1] + P[3] · P[2] · P[1] · G[0] . (2.57)

As we look ahead further these equations become more complex, take longer to calculate, and the logic
becomes less regular when implemented using cells with a limited number of inputs. Datapath layout
must fit in a bit slice, so the physical and logical structure of each bit must be similar. In a standard cell
or gate array we are not so concerned about a regular physical structure, but a regular logical structure
simplifies design. The Brent–Kung adder reduces the delay and increases the regularity of the
carry-lookahead scheme [Brent and Kung, 1982]. Figure 2.24(a) shows a regular 4-bit CLA, using the
carry-lookahead generator cell (CLG) shown in Figure 2.24(b).
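
The expanded lookahead equations can be verified against the recursive form of Eq. 2.44. The sketch below writes out C[0] to C[3] for a 4-bit adder with the carry in set to '0'; it models only the flat two-level expressions, not the Brent–Kung tree, and the function names are made up for the example.

```python
from itertools import product

def lookahead_carries(a_bits, b_bits):
    """C[0]..C[3] of a 4-bit adder from the expanded equations, C[-1] = 0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    c0 = g[0]
    c1 = g[1] | (p[1] & g[0])
    c2 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
    c3 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]))
    return [c0, c1, c2, c3]

def ripple_carries(a_bits, b_bits):
    """The same carries from the recursive form, Eq. 2.44."""
    carries, c = [], 0
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a ^ b) & c)
        carries.append(c)
    return carries

# Bit lists are LSB first; check all 256 combinations of A[3:0] and B[3:0].
agree = all(
    lookahead_carries(bits[:4], bits[4:]) == ripple_carries(bits[:4], bits[4:])
    for bits in product((0, 1), repeat=8)
)
```

Each lookahead carry depends only on G and P terms, so all four can be computed in parallel, at the cost of wider gates for the higher bits.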

FIGURE 2.24 The Brent–Kung carry-lookahead adder (CLA). (a) Carry generation in a 4-bit CLA. (b) A cell to
generate the lookahead terms, C[0]–C[3]. (c) Cells L1, L2, and L3 are rearranged into a tree that has less
delay. Cell L4 is added to calculate C[2] that is lost in the translation. (d) and (e) Simplified representations of
parts a and c. (f) The lookahead logic for an 8-bit adder. The inputs, 0–7, are the propagate and carry terms
formed from the inputs to the adder. (g) An 8-bit Brent–Kung CLA. The outputs of the lookahead logic are the
carry bits that (together with the inputs) form the sum. One advantage of this adder is that delays from the
inputs to the outputs are more nearly equal than in other adders. This tends to reduce the number of
unwanted and unnecessary switching events and thus reduces power dissipation.

1.5.5. Carry Select Adder & Conditional Sum Adder

In a carry-select adder we duplicate two small adders (usually 4-bit or 8-bit adders—often CLAs)
for the cases CIN = '0' and CIN = '1' and then use a MUX to select the case that we need—wasteful,
but fast [Bedrij, 1962]. A carry-select adder is often used as the fast adder in a datapath library
because its layout is regular.
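
A behavioural sketch of a carry-select adder is given below. The block width of 4 bits and the names `add_block` and `carry_select_add` are assumptions for illustration; Python's `+` stands in for whatever small adder is used inside each block.

```python
def add_block(a, b, cin, width):
    """A small width-bit adder; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, n=16, width=4):
    """n-bit addition from width-bit blocks, each computed for both carries."""
    mask = (1 << width) - 1
    result, carry = 0, 0
    for shift in range(0, n, width):
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        sum0, cout0 = add_block(a_blk, b_blk, 0, width)   # assumes CIN = '0'
        sum1, cout1 = add_block(a_blk, b_blk, 1, width)   # assumes CIN = '1'
        # The MUX: the real carry picks one precomputed result.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << shift
    return result, carry
```

In hardware the two sums of every block are formed at the same time, so the only serial path is the chain of MUX selections; the loop here models the function, not the timing.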

We can extend the idea behind a carry-select adder as follows. Suppose we have an n-bit adder
that generates two sums: one sum assumes a carry-in condition of '0', the other sum assumes a
carry-in condition of '1'. We can split this n-bit adder into an i-bit adder for the i LSBs and an
(n – i)-bit adder for the n – i MSBs. Both of the smaller adders generate two conditional sums as well as
true and complement carry signals. The two (true and complement) carry signals from the LSB
adder are used to select between the two (n – i + 1)-bit conditional sums from the MSB adder using
2(n – i + 1) two-input MUXes. This is a conditional-sum adder (also often abbreviated to CSA)
[Sklansky, 1960]. We can recursively apply this technique. For example, we can split a 16-bit adder
using n = 16 and i = 8; then we can split one or both 8-bit adders again, and so on.

FIGURE 2.25 The conditional-sum adder. (a) A 1-bit conditional adder that calculates the sum and carry out assuming the
carry in is either '1' or '0'. (b) The multiplexer that selects between sums and carries. (c) A 4-bit conditional-sum adder
with carry input, C[0].

Figure 2.25 shows the simplest form of an n -bit conditional-sum adder that uses n single-bit conditional
adders, H (each with four outputs: two conditional sums, true carry, and complement carry), together
with a tree of 2:1 MUXes (Qi_j). The conditional-sum adder is usually the fastest of all the adders we
have discussed (it is the fastest when logic cell delay increases with the number of inputs—this is true
for all ASICs except FPGAs).
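
The recursive splitting can be sketched in Python as follows. This is a behavioral illustration under stated assumptions, not the circuit of Figure 2.25: the function names conditional_sum and conditional_sum_add are hypothetical, the split point is fixed at half the width, and the inputs are assumed to fit in the given width.

```python
def conditional_sum(a, b, width):
    """Return ((sum0, cout0), (sum1, cout1)) for carry-in '0' and carry-in '1'."""
    if width == 1:
        # 1-bit conditional cell: both outcomes are computed directly.
        s0, c0 = a ^ b, a & b          # carry-in assumed '0'
        s1, c1 = (a ^ b) ^ 1, a | b    # carry-in assumed '1'
        return (s0, c0), (s1, c1)
    i = width // 2                     # i LSBs and (width - i) MSBs
    lo_mask = (1 << i) - 1
    lo = conditional_sum(a & lo_mask, b & lo_mask, i)
    hi = conditional_sum(a >> i, b >> i, width - i)
    results = []
    for lo_sum, lo_cout in lo:
        # The carry out of the LSB adder selects one conditional MSB result.
        hi_sum, hi_cout = hi[lo_cout]
        results.append((lo_sum | (hi_sum << i), hi_cout))
    return tuple(results)


def conditional_sum_add(a, b, cin, width):
    """Pick the conditional result that matches the actual carry-in."""
    return conditional_sum(a, b, width)[cin]
```

Every level of the recursion corresponds to one level of 2:1 MUXes, which is why the depth of the selection tree grows with the logarithm of the word length.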

1.6. Other Datapath Operators


Figure 2.31 shows symbols for some other datapath elements. The combinational datapath cells, NAND,
NOR, and so on, and sequential datapath cells (flip-flops and latches) have standard-cell equivalents and
function identically. I use a bold outline (1 point) for datapath cells instead of the regular (0.5 point) line
I use for scalar symbols. We call a set of identical cells a vector of datapath elements in the same way
that a bold symbol, A , represents a vector and A represents a scalar.


FIGURE 2.31 Symbols for datapath elements. (a) An array or vector of flip-flops (a register). (b) A two-input
NAND cell with databus inputs. (c) A two-input NAND cell with a control input. (d) A buswide MUX. (e) An
incrementer/decrementer. (f) An all-zeros detector. (g) An all-ones detector. (h) An adder/subtracter.

1. A subtracter is similar to an adder, except in a full subtracter we have a borrow-in signal, BIN; a
borrow-out signal, BOUT; and a difference signal, DIFF:

DIFF = A ⊕ NOT(B) ⊕ NOT(BIN)

NOT(BOUT) = A · NOT(B) + A · NOT(BIN) + NOT(B) · NOT(BIN)

These equations are the same as those for the FA except that the B input is inverted and the
sense of the carry chain is inverted. To build a subtracter that calculates (A – B) we invert the
entire B input bus and connect the BIN[0] input to VDD (not to VSS as we did for CIN[0] in an
adder). As an example, to subtract B = '0011' from A = '1001' we calculate '1001' + '1100' + '1' =
'0110'. As with an adder, the true overflow is XOR(BOUT[MSB], BOUT[MSB – 1]).

2. An adder/subtracter has a control signal that gates the A input with an exclusive-OR cell
(forming a programmable inversion) to switch between an adder and a subtracter. Some
adder/subtracters gate both inputs to allow us to compute (–A – B). We must be careful to
connect the input to the LSB of the carry chain (CIN[0] or BIN[0]) correctly when changing
between addition (connect to VSS) and subtraction (connect to VDD).

3. A barrel shifter rotates or shifts an input bus by a specified amount. For example, if we have an
eight-input barrel shifter with input '1111 0000' and we specify a shift of '0000 1000' (3, coded
by bit position, with bit zero as the LSB), the right-shifted 8-bit output is '0001 1110'. A barrel
shifter may rotate left or right (or switch between the two under a separate control). A barrel
shifter may also have an output width that is smaller than the input. To use a simple example, we
may have an 8-bit input and a 4-bit output. This situation is equivalent to having a barrel shifter
with two 4-bit inputs and a 4-bit output.

4. A leading-one detector is used with a normalizing (left-shift) barrel shifter to align mantissas in
floating-point numbers. The input is an n-bit bus, A; the output is an n-bit bus, S, with a single '1'
in the bit position corresponding to the most significant '1' in the input. Thus, for example, if the
input is A = '0000 0101' the leading-one detector output is S = '0000 0100', indicating the
leading one in A is in bit position 2 (bit 7 is the MSB, bit zero is the LSB).

5. The output of a priority encoder is the binary-encoded position of the leading one in an input.
For example, with an input A = '0000 0101' the leading one is in bit position 2 (the MSB is bit
position 7), so the output of a 4-bit priority encoder would be Z = '0010' (2). In some cell
libraries the encoding is reversed so that the MSB has an output code of zero; in this case
Z = '0101' (5). This second, reversed, encoding scheme is useful in floating-point arithmetic. If A
is a mantissa and we normalize A to '1010 0000' we have to subtract 5 from the exponent; this
exponent correction is equal to the output of the priority encoder.

6. An accumulator is an adder/subtracter and a register. Sometimes these are combined with a
multiplier to form a multiplier–accumulator (MAC). An incrementer adds 1 to the input bus,
Z = A + 1, so we can use this function, together with a register, to negate a two's complement
number, for example. The implementation is Z[i] = XOR(A[i], CIN[i]) and COUT[i] =
AND(A[i], CIN[i]). The carry-in control input, CIN[0], thus acts as an enable: if it is set to '0'
the output is the same as the input. Because CMOS logic is naturally inverting, it is faster to
build the even-numbered stages with an inverting carry gate:

Z[i (even)] = XOR(A[i], CIN[i]) and COUT[i (even)] = NAND(A[i], CIN[i]).

This inverts COUT, so that in the following stage we must invert it again. If we push an inverting
bubble to the input CIN we find that:

Z[i (odd)] = XNOR(A[i], CIN[i]) and COUT[i (odd)] = NOR(NOT(A[i]), CIN[i]).

7. A register file is a bank of flip-flops arranged across the bus; sometimes these have the
option of multiple ports (multiport register files) for read and write. Normally these
register files are the densest logic and hardest to fit in a datapath. For large register files it
may be more appropriate to use a multiport memory. We can add control logic to a register
file to create a first-in first-out register (FIFO), or last-in first-out register (LIFO).
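
Several of the operators in this list can be checked with a short behavioral model. The Python sketch below is illustrative only: all function names and the msb_is_zero flag are assumptions, the buses are modeled as non-negative integers with bit zero as the LSB, and no claim is made about how a cell library implements these functions.

```python
def subtract(a, b, width):
    """A - B computed as A + NOT(B) + 1; returns (difference, carry_out)."""
    mask = (1 << width) - 1
    total = a + (~b & mask) + 1          # inverted B bus, carry-in tied to '1'
    return total & mask, total >> width


def shift_right(a, amount, width):
    """Logical right shift of a width-bit bus (zeros are shifted in)."""
    return (a & ((1 << width) - 1)) >> amount


def leading_one(a):
    """One-hot bus marking the most significant '1' of a (0 if a is 0)."""
    return 1 << (a.bit_length() - 1) if a else 0


def priority_encode(a, width, msb_is_zero=False):
    """Binary position of the leading one; None when no input bit is set."""
    if a == 0:
        return None
    pos = a.bit_length() - 1
    return (width - 1 - pos) if msb_is_zero else pos


def increment(a, cin0, width):
    """Ripple incrementer: Z[i] = XOR(A[i], CIN[i]), COUT[i] = AND(A[i], CIN[i])."""
    z, carry = 0, cin0
    for i in range(width):
        bit = (a >> i) & 1
        z |= (bit ^ carry) << i
        carry = bit & carry
    return z, carry
```

For instance, subtract(0b1001, 0b0011, 4) follows the '1001' + '1100' + '1' calculation from item 1, and increment(a, 0, width) returns its input unchanged, which matches the description of CIN[0] as an enable.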


1.7. I/O Cells

Figure 2.32 shows a three-state bidirectional output buffer (Tri-State ® is a registered trademark of
National Semiconductor). When the output enable (OE) signal is high, the circuit functions as a
noninverting buffer driving the value of DATAin onto the I/O pad. When OE is low, the output
transistors or drivers, M1 and M2, are disconnected. This allows multiple drivers to be connected on a
bus. It is up to the designer to make sure that a bus never has two drivers enabled at the same time,
a problem known as contention.

FIGURE 2.32 A three-state bidirectional output buffer. When the output enable, OE, is '1' the
output section is enabled and drives the I/O pad. When OE is '0' the output buffer is placed in a
high-impedance state.
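
The bus-sharing rule can be expressed as a small check. This Python sketch is a simplified model under stated assumptions: the function name resolve_bus is hypothetical, each driver is reduced to an (OE, DATAin) pair, and any two simultaneously enabled drivers are treated as contention even if they happen to drive the same value.

```python
def resolve_bus(drivers):
    """Resolve a shared bus from (oe, data) pairs, one per three-state driver.

    Returns 'Z' (high impedance) if no driver is enabled, the driven bit if
    exactly one driver is enabled, and raises ValueError on contention.
    """
    active = [data for oe, data in drivers if oe]
    if not active:
        return 'Z'
    if len(active) > 1:
        raise ValueError("bus contention: %d drivers enabled" % len(active))
    return active[0]
```

A check of this kind belongs in simulation or verification; the buffer itself does nothing to prevent two enabled drivers from fighting over the pad.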


1.8. Cell Compilers
