AdvVLSI Module1
Syllabus: Introduction to ASICs: Full custom, Semi-custom and Programmable ASICs, ASIC Design flow,
ASIC cell libraries. CMOS Logic: Data path Logic Cells: Data Path Elements, Adders: Carry skip, Carry
bypass, Carry save, Carry select, Conditional sum, Multiplier (Booth encoding), Data path Operators, I/O
cells, Cell Compilers.
An ASIC (pronounced 'a-sick'; bold typeface defines a new term) is an application-specific integrated circuit, at least that is what the acronym stands for. Before we answer the question of what that means, we first look at the evolution of the silicon chip or integrated circuit (IC).
Figure 1.1(a) shows an IC package (this is a pin-grid array, or PGA, shown upside down; the pins will go
through holes in a printed-circuit board). People often call the package a chip, but, as you can see in
Figure 1.1(b), the silicon chip itself (more properly called a die ) is mounted in the cavity under the
sealed lid. A PGA package is usually made from a ceramic material, but plastic packages are also
common.
FIGURE 1.1 An integrated circuit (IC). (a) A pin-grid array (PGA) package. (b) The silicon die or chip is
under the package lid.
The physical size of a silicon die varies from a few millimeters on a side to over 1 inch on a side, but
instead we often measure the size of an IC by the number of logic gates or the number of transistors that
the IC contains. As a unit of measure a gate equivalent corresponds to a two-input NAND gate (a circuit that performs the logic function F = (A • B)'). Often we just use the term gates instead of gate equivalents
when we are measuring chip size—not to be confused with the gate terminal of a transistor. For
example, a 100 k-gate IC contains the equivalent of 100,000 two-input NAND gates.
The semiconductor industry has evolved from the first ICs of the early 1970s and matured rapidly since
then. Early small-scale integration ( SSI ) ICs contained a few (1 to 10) logic gates—NAND gates, NOR
gates, and so on—amounting to a few tens of transistors. The era of medium-scale integration ( MSI )
increased the range of integrated logic available to counters and similar, larger scale, logic functions. The
era of large-scale integration ( LSI ) packed even larger logic functions, such as the first microprocessors,
into a single chip. The era of very large-scale integration ( VLSI ) now offers 64-bit microprocessors,
complete with cache memory and floating-point arithmetic units—well over a million transistors—on a
single piece of silicon. As CMOS process technology improves, transistors continue to get smaller and ICs
hold more and more transistors. Some people (especially in Japan) use the term ultralarge-scale integration (ULSI), but most people stop at the term VLSI; otherwise we have to start inventing new
words.
A full-custom IC includes some (possibly all) logic cells that are customized and all mask layers that are
customized. A microprocessor is an example of a full-custom IC—designers spend many hours
squeezing the most out of every last square micron of microprocessor chip space by hand. Customizing
all of the IC features in this way allows designers to include analog circuits, optimized memory cells, or
mechanical structures on an IC, for example. Full-custom ICs are the most expensive to manufacture and
to design.
The manufacturing lead time (the time it takes just to make an IC—not including design time) is
typically eight weeks for a full-custom IC. These specialized full-custom ICs are often intended for a
specific application, so we might call some of them full-custom ASICs.
In a full-custom ASIC an engineer designs some or all of the logic cells, circuits, or layout specifically for
one ASIC. This means the designer abandons the approach of using pretested and pre-characterized cells
for all or part of that design. It makes sense to take this approach only if there are no suitable existing
cell libraries available that can be used for the entire design. This might be because existing cell libraries
are not fast enough, or the logic cells are not small enough or consume too much power. You may need
to use full-custom design if the ASIC technology is new or so specialized that there are no existing cell
libraries or because the ASIC is so specialized that some circuits must be custom designed. Fewer and
fewer full-custom ICs are being designed because of the problems with these special parts of the ASIC.
There is one growing member of this family, though, the mixed analog/digital ASIC, which we shall
discuss next.
• A cell-based ASIC (cell-based IC, or CBIC pronounced sea-bick) uses predesigned logic cells
(AND gates, OR gates, multiplexers, and flip-flops, for example) known as standard cells.
• One can apply the term CBIC to any IC that uses cells, but it is generally accepted that a cell-
based ASIC or CBIC means a standard-cell based ASIC.
• The standard-cell areas (also called flexible blocks) in a CBIC are built of rows of standard cells like a wall built of bricks. The standard-cell areas may be used in combination with larger predesigned cells, such as microcontrollers or even microprocessors, known as megacells. Megacells are also called megafunctions, full-custom blocks, system-level macros (SLMs), fixed blocks, cores, or Functional Standard Blocks (FSBs).
• For example, a CBIC die might contain a single standard-cell area (a flexible block) together with four fixed blocks.
• The ASIC designer defines only the placement of the standard cells and the interconnect in a
CBIC. However, the standard cells can be placed anywhere on the silicon; this means that all the
mask layers of a CBIC are customized and are unique to a particular customer.
• The advantage of CBICs is that designers save time and money, and reduce risk, by using a predesigned, pretested, and pre-characterized standard-cell library.
• In addition each standard cell can be optimized individually. During the design of the cell library
each and every transistor in every standard cell can be chosen to maximize speed or minimize
area.
• The disadvantages are the time or expense of designing or buying the standard-cell library and
the time needed to fabricate all layers of the ASIC for each new design.
• In a gate array (sometimes abbreviated GA) or gate-array based ASIC the transistors are
predefined on the silicon wafer.
• The predefined pattern of transistors on a gate array is the base array , and the smallest element
that is replicated to make the base array is the base cell (sometimes called a primitive cell ).
• Only the top few layers of metal, which define the interconnect between transistors, are defined
by the designer using custom masks. To distinguish this type of gate array from other types of
gate array, it is often called a masked gate array ( MGA ).
• The designer chooses from a gate-array library of predesigned and pre-characterized logic cells
• The logic cells in a gate-array library are often called macros . The reason for this is that the
base-cell layout is the same for each logic cell, and only the interconnect (inside cells and
between cells) is customized, which is similar to a software macro.
• The channeled gate array was the first to be developed. In a channeled gate array, space is left between the rows of transistors for wiring.
• A channeled gate array is similar to a CBIC. Both use rows of cells separated by channels used for interconnect. One difference is that the space for interconnect between rows of cells is fixed in height in a channeled gate array, whereas the space between rows of cells may be adjusted in a CBIC.
• The channelless gate-array architecture is now more widely used. The routing on a channelless gate array uses rows of unused transistors.
• The key difference between a channelless gate array and a channeled gate array is that there are no predefined areas set aside for routing between cells on a channelless gate array. Instead we route over the top of the gate-array devices. We can do this because we customize the contact layer that defines the connections between metal 1, the first layer of metal, and the transistors.
• Features of a channelless gate array:
Only the interconnect is customized.
The interconnect uses predefined spaces between rows of base cells.
Manufacturing lead time is around two days to two weeks.
When we use an area of transistors for routing in a channelless array, we do not make any contacts to the devices lying underneath; we simply leave the transistors unused.
• This design combines some of the features of CBICs and MGAs. It is also known as an embedded gate array or structured gate array (also called a master slice or master image).
• One of the limitations of the MGA is the fixed gate-array base cell. This makes the implementation of memory difficult and inefficient.
• In an embedded gate array some of the IC area is set aside and dedicated to a specific function. This embedded area can either contain a different base cell that is more suitable for building memory cells, or it can contain a complete circuit block, such as a microcontroller.
• Programmable logic devices ( PLDs ) are standard ICs that are available in standard
configurations.
• Features of PLDs
• The simplest type of programmable IC is a read-only memory (ROM). The most common types of ROM use a metal fuse that can be blown permanently (a programmable ROM or PROM).
• One can erase an erasable PROM (EPROM) either by using another high voltage (an electrically erasable PROM, or EEPROM) or by exposing the device to ultraviolet light (a UV-erasable PROM, or UVPROM).
• There is another type of ROM that can be placed on any ASIC: a mask-programmable ROM (mask-programmed ROM or masked ROM). A masked ROM is a regular array of transistors permanently programmed using custom mask patterns.
• FPGAs are the newest member of the ASIC family and are rapidly growing in importance, replacing TTL in microelectronic systems. Even though an FPGA is a type of gate array, we do not consider the term gate-array based ASICs to include FPGAs.
• There is very little difference between an FPGA and a PLD. An FPGA is usually just larger and more complex than a PLD. In fact, some vendors that manufacture programmable ASICs call their products FPGAs and some call them complex PLDs.
The sequence of steps used to design an ASIC is known as the design flow. The steps involved in the ASIC design flow are given below.
1. Design entry: Design entry is the stage where the microarchitecture is implemented in a hardware description language (HDL) such as VHDL, Verilog, or SystemVerilog.
In the early days, a schematic editor was used for design entry, where designers instantiated gates. The increased complexity of current designs requires the use of HDLs to gain productivity. Another advantage is that HDLs are independent of process technology and hence can be reused over time.
2. Logic synthesis: Use an HDL (VHDL or Verilog) and a logic-synthesis tool to produce a netlist: a description of the logic cells and their connections.
3. System partitioning: Divide a large system into ASIC-sized pieces.
4. Pre-layout simulation: Check to see if the design functions correctly.
5. Floor planning: Arrange the blocks of the netlist on the chip.
6. Placement: Decide the locations of cells in a block.
7. Routing: Make the connections between cells and blocks.
8. Extraction: Determine the resistance and capacitance of the interconnect.
9. Postlayout simulation: Check that the design still works with the added loads of the interconnect.
Steps 1–4 are part of logical design , and steps 5–9 are part of physical design . There is some overlap.
For example, system partitioning might be considered as either logical or physical design. To put it
another way, when we are performing system partitioning we have to consider both logical and physical
factors. Chapters 9–14 of this book are largely about logical design and Chapters 15–17 largely about
physical design.
Suppose we wish to build an n -bit adder (that adds two n -bit numbers) and to exploit the regularity of
this function in the layout. We can do so using a datapath structure. The following two functions, SUM
and COUT, implement the sum and carry out for a full adder (FA) with two data inputs (A, B) and a carry in, CIN:
SUM = A ⊕ B ⊕ CIN = PARITY(A, B, CIN)
COUT = A · B + A · CIN + B · CIN = MAJ(A, B, CIN)
The sum uses the parity function ('1' if there is an odd number of '1's among the inputs). The carry out, COUT, uses the 2-of-3 majority function ('1' if the majority of the inputs are '1'). We can combine these two functions in a single FA logic cell, ADD(A[ i ], B[ i ], CIN, S[ i ], COUT), shown in Figure 2.20(a).
FIGURE 2.20 A datapath adder. (a) A full-adder (FA) cell with inputs (A and B), a carry in, CIN,
sum output, S, and carry out, COUT. (b) A 4-bit adder. (c) The layout, using two-level metal, with
data in m1 and control in m2. In this example the wiring is completed outside the cell; it is also
possible to design the datapath cells to contain the wiring. Using three levels of metal, it is
possible to wire over the top of the datapath cells. (d) The datapath layout.
Now we can build a 4-bit ripple-carry adder ( RCA ) by connecting four of these ADD cells together as
shown in Figure 2.20(b). The i th ADD cell is arranged with the following: two bus inputs A[ i ], B[ i ]; one
bus output S[ i ]; an input, CIN, that is the carry in from stage (i – 1) below and is also passed up to the
cell above as an output; and an output, COUT, that is the carry out to stage ( i + 1) above. In the 4-bit
adder shown in Figure 2.20(b) we connect the carry input, CIN[0], to VSS and use COUT[3] and COUT[2]
to indicate arithmetic overflow (in Section 2.6.1 we shall see why we may need both signals). Notice that
we build the ADD cell so that COUT[2] is available at the top of the datapath when we need it.
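The behavior of the ADD cell and the 4-bit RCA can be sketched in software. The following Python fragment is a minimal behavioral sketch (the names full_adder and ripple_carry_add are illustrative, not from any library); it shows the parity and majority functions and the overflow check formed from COUT[3] and COUT[2]:

```python
def full_adder(a, b, cin):
    """One ADD cell: the sum is the parity of the inputs, the carry out is the 2-of-3 majority."""
    s = a ^ b ^ cin                           # parity function
    cout = (a & b) | (a & cin) | (b & cin)    # majority function
    return s, cout

def ripple_carry_add(A, B, n=4):
    """Add two n-bit numbers (passed as integers); CIN[0] is tied to 0 (VSS)."""
    carry = 0
    sum_bits = []
    couts = []
    for i in range(n):                        # stage i uses the carry from stage i - 1
        s, carry = full_adder((A >> i) & 1, (B >> i) & 1, carry)
        sum_bits.append(s)
        couts.append(carry)
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    # Two's-complement overflow uses both COUT[n-1] and COUT[n-2] (Figure 2.20b).
    overflow = couts[n - 1] ^ couts[n - 2]
    return total, overflow

print(ripple_carry_add(0b0111, 0b0001))       # (8, 1): 7 + 1 overflows 4-bit two's complement
```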
Figure 2.20(c) shows a layout of the ADD cell. The A inputs, B inputs, and S outputs all use m1
interconnect running in the horizontal direction—we call these data signals. Other signals can enter or
exit from the top or bottom and run vertically across the datapath in m2—we call these control signals.
We can also use m1 for control and m2 for data, but we normally do not mix these approaches in the
same structure. Control signals are typically clocks and other signals common to elements. For example,
in Figure 2.20(c) the carry signals, CIN and COUT, run vertically in m2 between cells. To build a 4-bit
adder we stack four ADD cells creating the array structure shown in Figure 2.20(d). In this case the A
and B data bus inputs enter from the left and bus S, the sum, exits at the right, but we can connect A, B,
and S to either side if we want.
The layout of buswide logic that operates on data signals in this fashion is called a datapath . The module
ADD is a datapath cell or datapath element. Just as we do for standard cells we make all the datapath
cells in a library the same height so we can abut other datapath cells on either side of the adder to create
a more complex datapath. When people talk about a datapath they always assume that it is oriented so
that increasing the size in bits makes the datapath grow in height, upwards in the vertical direction, and
adding different datapath elements to increase the function makes the datapath grow in width, in the
horizontal direction—but we can rotate and position a completed datapath in any direction we want on
a chip.
Figure 2.21 shows some typical datapath symbols for an adder (people rarely use the IEEE standards in
ASIC datapath libraries). I use heavy lines (they are 1.5 point wide) with a stroke to denote a data bus
(that flows in the horizontal direction in a datapath), and regular lines (0.5 point) to denote the control
signals (that flow vertically in a datapath). At the risk of adding confusion where there is none, this
stroke to indicate a data bus has nothing to do with mixed-logic conventions. For a bus, A[31:0] denotes
a 32-bit bus with A[31] as the leftmost or most-significant bit or MSB, and A[0] as the least-significant bit or LSB. Sometimes we shall use A[MSB] or A[LSB] to refer to these bits. Notice that if we have an n -
bit bus and LSB = 0, then MSB = n – 1. Also, for example, A[4] is the fifth bit on the bus (from the LSB).
We use a 'Σ' or 'ADD' inside the symbol to denote an adder instead of '+', so we can attach '–' or '+/–' to
the inputs for a subtracter or adder/subtracter.
FIGURE 2.21 Symbols for a datapath adder. (a) A data bus is shown by a heavy line (1.5 point) and a bus
symbol. If the bus is n -bits wide then MSB = n – 1. (b) An alternative symbol for an adder. (c) Control signals
are shown as lightweight (0.5 point) lines.
1.5. Adders
Two common methods are used to define the carry-generate signal, G[i], the carry-propagate signal, P[i], the carry, C[i], and the sum, S[i], of stage i:

Method 1:
G[ i ] = A[ i ] · B[ i ] (2.42)
P[ i ] = A[ i ] ⊕ B[ i ] (2.43)
C[ i ] = G[ i ] + P[ i ] · C[ i –1] (2.44)
S[ i ] = P[ i ] ⊕ C[ i –1] (2.45)

Method 2:
G[ i ] = A[ i ] · B[ i ] (2.42)
P[ i ] = A[ i ] + B[ i ] (2.43)
C[ i ] = G[ i ] + P[ i ] · C[ i –1] (2.44)
S[ i ] = A[ i ] ⊕ B[ i ] ⊕ C[ i –1] (2.45)
where C[ i ] is the carry-out signal from stage i , equal to the carry in of stage ( i + 1). Thus, C[ i ] = COUT[
i ] = CIN[ i + 1]. We need to be careful because C[0] might represent either the carry in or the carry out of
the LSB stage. For an adder we set the carry in to the first stage (stage zero), C[–1] or CIN[0], to '0'. Some
people use delete (D) or kill (K) in various ways for the complements of G[i] and P[i], but unfortunately
others use C for COUT and D for CIN—so I avoid using any of these. Do not confuse the two different
methods (both of which are used) in Eqs. 2.42–2.45 when forming the sum, since the propagate signal,
P[ i ] , is different for each method.
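A short sketch can confirm that the two methods of Eqs. 2.42–2.45 give the same carries and sums even though P[i] differs. This is illustrative Python (operands passed as unsigned integers), not a library routine:

```python
def gp_adder(A, B, n, method=1):
    """n-bit adder built from the generate/propagate equations; C[-1] = '0'."""
    c = 0
    sum_bits = []
    for i in range(n):
        a, b = (A >> i) & 1, (B >> i) & 1
        g = a & b                   # G[i] is the same for both methods
        if method == 1:
            p = a ^ b               # P[i] = A[i] xor B[i]
            s = p ^ c               # S[i] = P[i] xor C[i-1]
        else:
            p = a | b               # P[i] = A[i] + B[i]
            s = a ^ b ^ c           # S[i] must use A[i] xor B[i] xor C[i-1] here
        c = g | (p & c)             # C[i] = G[i] + P[i] . C[i-1]
        sum_bits.append(s)
    return sum(bit << i for i, bit in enumerate(sum_bits)), c

assert gp_adder(9, 6, 4, method=1) == gp_adder(9, 6, 4, method=2) == (15, 0)
```

Mixing the method-2 propagate (OR) with the method-1 sum equation would give wrong results, which is exactly the caution stated above.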
Figure 2.22(a) shows a conventional RCA. The delay of an n -bit RCA is proportional to n and is limited
by the propagation of the carry signal through all of the stages. We can reduce delay by using pairs of 'go-faster' bubbles to change AND and OR gates to fast two-input NAND gates as shown in Figure 2.22(a). Alternatively, we can write the equations for the carry signal in two different ways:
where P[ i ]'= NOT(P[ i ]). Equations 2.46 and 2.47 allow us to build the carry chain from two-input
NAND gates, one per cell, using different logic in even and odd stages (Figure 2.22b):
(the carry inputs to stage zero are C3[–1] = C4[–1] = '0'). We can use the RCA of Figure 2.22(b) in a
datapath, with standard cells, or on a gate array.
Instead of propagating the carries through each stage of an RCA, Figure 2.23 shows a different approach.
A carry-save adder ( CSA ) cell CSA(A1[ i ], A2[ i ], A3[ i ], CIN, S1[ i ], S2[ i ], COUT) has three outputs:
S1[ i ] = CIN
S2[ i ] = A1[ i ] ⊕ A2[ i ] ⊕ A3[ i ] = PARITY(A1[ i ], A2[ i ], A3[ i ])
COUT = A1[ i ] · A2[ i ] + [(A1[ i ] + A2[ i ]) · A3[ i ]] = MAJ(A1[ i ], A2[ i ], A3[ i ]) . (2.53)
The inputs, A1, A2, and A3; and outputs, S1 and S2, are buses. The input, CIN, is the carry from stage ( i –
1). The carry in, CIN, is connected directly to the output bus S1—indicated by the schematic symbol
(Figure 2.23a). We connect CIN[0] to VSS. The output, COUT, is the carry out to stage ( i + 1).
A 4-bit CSA is shown in Figure 2.23(b). The arithmetic overflow signal for ones' complement or two's complement arithmetic, OV, is XOR(COUT[MSB], COUT[MSB – 1]) as shown in Figure 2.23(c). In a CSA the carries are 'saved' at each stage and shifted left onto the bus S1. There is thus no carry propagation
and the delay of a CSA is constant. At the output of a CSA we still need to add the S1 bus (all the saved
carries) and the S2 bus (all the sums) to get an n -bit result using a final stage that is not shown in Figure
2.23(c). We might regard the n -bit sum as being encoded in the two buses, S1 and S2, in the form of the
parity and majority functions.
We can use a CSA to add multiple inputs—as an example, an adder with four 4-bit inputs is shown in
Figure 2.23(d). The last stage sums two input buses using a carry-propagate adder ( CPA ). We have
used an RCA as the CPA in Figure 2.23(d) and (e), but we can use any type of adder. Notice in
Figure 2.23(e) how the two CSA cells and the RCA cell abut together horizontally to form a bit slice (or
slice) and then the slices are stacked vertically to form the datapath.
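The reduction performed by a multi-input CSA tree can be sketched as follows. This is illustrative Python (csa and csa_add are assumed names, and Python's built-in addition stands in for the final carry-propagate adder):

```python
def csa(a1, a2, a3):
    """One carry-save level: three operands in, a sum word and a shifted carry word out."""
    s2 = a1 ^ a2 ^ a3                                   # per-bit parity (the 'sums')
    s1 = ((a1 & a2) | (a1 & a3) | (a2 & a3)) << 1       # per-bit majority, shifted left (the 'saved carries')
    return s2, s1

def csa_add(values):
    """Reduce a list of operands three at a time with CSAs, then do one carry-propagate add."""
    vals = list(values)
    while len(vals) > 2:
        s2, s1 = csa(*vals[:3])
        vals = [s2, s1] + vals[3:]
    return sum(vals)

print(csa_add([3, 5, 6, 7]))   # 21; no carry propagates until the final CPA stage
```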
FIGURE 2.23 The carry-save adder (CSA). (a) A CSA cell. (b) A 4-bit CSA. (c) Symbol for a CSA. (d) A
four-input CSA. (e) The datapath for a four-input, 4-bit adder using CSAs with a ripple-carry adder
(RCA) as the final stage. (f) A pipelined adder. (g) The datapath for the pipelined version showing
the pipeline registers as well as the clock control lines that use m2.
We can register the CSA stages by adding vectors of flip-flops as shown in Figure 2.23(f). This reduces
the adder delay to that of the slowest adder stage, usually the CPA. By using registers between stages of
combinational logic we use pipelining to increase the speed, at the price of increased area (for the registers) and added latency. It takes a few clock cycles (the latency, equal to n clock cycles for an n -
stage pipeline) to fill the pipeline, but once it is filled, the answers emerge every clock cycle. Ferris
wheels work much the same way. When the fair opens it takes a while (latency) to fill the wheel, but
once it is full the people can get on and off every few seconds. (We can also pipeline the RCA of Figure
2.20. We add i registers on the A and B inputs before ADD[ i ] and add ( n– i ) registers after the output S[
i ], with a single register before each C[ i ].)
The problem with an RCA is that every stage has to wait to make its carry decision, C[ i ], until the
previous stage has calculated C[ i – 1]. If we examine the propagate signals we can bypass this critical
path. Thus, for example, to bypass the carries for bits 4–7 (stages 5–8) of an adder we can compute BYPASS = P[4] · P[5] · P[6] · P[7] and then use a 2:1 MUX, controlled by BYPASS, to select between the incoming carry, C[3], and the rippled carry out of bit 7.
Adders based on this principle are called carry-bypass adders ( CBA ) [Sato et al., 1992]. Large, custom
adders employ Manchester-carry chains to compute the carries and the bypass operation using TGs or
just pass transistors [Weste and Eshraghian, 1993, pp. 530–531]. These types of carry chains may be
part of a predesigned ASIC adder cell, but are not used by ASIC designers.
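The bypass principle can be sketched behaviorally: if every stage in the group propagates (using the XOR-style propagate of method 1), the group's carry out is simply its carry in, selected by a 2:1 MUX. This Python sketch of a single 4-bit group (bits 4–7 in the example above) is illustrative only and does not model the redundant, race-based logic of a real CBA:

```python
def bypass_group_carry(a_bits, b_bits, cin):
    """a_bits, b_bits: the group's input bits (group LSB first); cin: carry into the group."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate, P[i] = A[i] xor B[i]
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    bypass = all(p)                               # BYPASS = P[4] . P[5] . P[6] . P[7]
    c = cin
    for gi, pi in zip(g, p):                      # the ordinary ripple path inside the group
        c = gi | (pi & c)
    return cin if bypass else c                   # the 2:1 MUX selects the bypass path when possible

print(bypass_group_carry([1, 0, 1, 0], [0, 1, 0, 1], cin=1))  # 1, taken from the bypass path
```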
Instead of checking the propagate signals we can check the inputs. For example we can compute SKIP = (A[ i – 1] ⊕ B[ i – 1]) + (A[ i ] ⊕ B[ i ]) and then use a 2:1 MUX to select C[ i ].
This is a carry-skip adder [Keutzer, Malik, and Saldanha, 1991; Lehman, 1961]. Carry-bypass and carry-
skip adders may include redundant logic (since the carry is computed in two different ways—we just
take the first signal to arrive). We must be careful that the redundant logic is not optimized away during
logic synthesis.
Expanding Eq. 2.44 gives C[1] = G[1] + P[1] · G[0] + P[1] · P[0] · C[–1], and since C[–1] = '0' for an adder, C[1] = G[1] + P[1] · G[0]. This result means that we can 'look ahead' by two stages and calculate the carry into the third stage (bit 2), which is C[1], using only the first-stage inputs (to calculate G[0]) and the second-stage inputs. This is a carry-lookahead adder ( CLA ) [MacSorley, 1961]. If we continue expanding Eq. 2.44, we find:
C[3] = G[3] + P[3] · G[2] + P[3] · P[2] · G[1] + P[3] · P[2] · P[1] · G[0] . (2.57)
As we look ahead further these equations become more complex, take longer to calculate, and the logic
becomes less regular when implemented using cells with a limited number of inputs. Datapath layout
must fit in a bit slice, so the physical and logical structure of each bit must be similar. In a standard cell
or gate array we are not so concerned about a regular physical structure, but a regular logical structure
simplifies design. The Brent–Kung adder reduces the delay and increases the regularity of the carry-
lookahead scheme [Brent and Kung, 1982]. Figure 2.24(a) shows a regular 4- bit CLA, using the carry-
lookahead generator cell (CLG) shown in Figure 2.24(b).
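Equation 2.57 can be checked against straightforward rippling of Eq. 2.44, for example with this small Python sketch (exhaustive over all generate/propagate patterns, with the carry into stage zero assumed '0'):

```python
from itertools import product

def lookahead_c3(g, p):
    """C[3] computed directly from G[0..3] and P[0..3] (Eq. 2.57)."""
    return (g[3]
            | (p[3] & g[2])
            | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0]))

def ripple_c3(g, p):
    """C[3] computed by iterating C[i] = G[i] + P[i] . C[i-1]."""
    c = 0
    for i in range(4):
        c = g[i] | (p[i] & c)
    return c

assert all(lookahead_c3(bits[:4], bits[4:]) == ripple_c3(bits[:4], bits[4:])
           for bits in product([0, 1], repeat=8))
```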
FIGURE 2.24 The Brent–Kung carry-lookahead adder (CLA). (a) Carry generation in a 4-bit CLA. (b) A cell to
generate the lookahead terms, C[0]–C[3]. (c) Cells L1, L2, and L3 are rearranged into a tree that has less
delay. Cell L4 is added to calculate C[2] that is lost in the translation. (d) and (e) Simplified representations of
parts a and c. (f) The lookahead logic for an 8-bit adder. The inputs, 0–7, are the propagate and carry terms
formed from the inputs to the adder. (g) An 8-bit Brent–Kung CLA. The outputs of the lookahead logic are the
carry bits that (together with the inputs) form the sum. One advantage of this adder is that delays from the
inputs to the outputs are more nearly equal than in other adders. This tends to reduce the number of
unwanted and unnecessary switching events and thus reduces power dissipation.
In a carry-select adder we duplicate two small adders (usually 4-bit or 8-bit adders—often CLAs)
for the cases CIN = '0' and CIN = '1' and then use a MUX to select the case that we need—wasteful,
but fast [Bedrij, 1962]. A carry-select adder is often used as the fast adder in a datapath library
because its layout is regular.
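A behavioral sketch of the carry-select idea follows (illustrative Python; add_block stands in for the duplicated small adders and the if/else for the selecting MUX):

```python
def add_block(A, B, cin, n):
    """n-bit ripple-carry add; returns (n-bit sum, carry out)."""
    c, s = cin, 0
    for i in range(n):
        a, b = (A >> i) & 1, (B >> i) & 1
        s |= (a ^ b ^ c) << i
        c = (a & b) | (a & c) | (b & c)
    return s, c

def carry_select_add(A, B, n=8, i=4):
    """Split an n-bit add at bit i; compute the upper part twice and select with the LSB carry."""
    lo_sum, lo_carry = add_block(A & ((1 << i) - 1), B & ((1 << i) - 1), 0, i)
    hi0 = add_block(A >> i, B >> i, 0, n - i)     # upper result assuming carry in = '0'
    hi1 = add_block(A >> i, B >> i, 1, n - i)     # upper result assuming carry in = '1'
    hi_sum, cout = hi1 if lo_carry else hi0       # the selecting MUX
    return (hi_sum << i) | lo_sum, cout

print(carry_select_add(200, 100))                 # (44, 1): 300 = 256 + 44
```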
We can extend the idea behind a carry-select adder as follows. Suppose we have an n -bit adder
that generates two sums: One sum assumes a carry-in condition of '0', the other sum assumes a
carry-in condition of '1'. We can split this n -bit adder into an i -bit adder for the i LSBs and an ( n –
i )-bit adder for the n – i MSBs. Both of the smaller adders generate two conditional sums as well as
true and complement carry signals. The two (true and complement) carry signals from the LSB
adder are used to select between the two (n – i + 1)-bit conditional sums from the MSB adder using
2( n – i + 1) two-input MUXes.
FIGURE 2.25 The conditional-sum adder. (a) A 1-bit conditional adder that calculates the sum and carry out assuming the
carry in is either '1' or '0'. (b) The multiplexer that selects between sums and carries. (c) A 4-bit conditional-sum adder
with carry input, C[0].
This is a conditional-sum adder (also often abbreviated to CSA) [Sklansky, 1960]. We can
recursively apply this technique. For example, we can split a 16-bit adder using i = 8 and n = 8; then we
can split one or both 8–bit adders again—and so on.
Figure 2.25 shows the simplest form of an n -bit conditional-sum adder that uses n single-bit conditional
adders, H (each with four outputs: two conditional sums, true carry, and complement carry), together
with a tree of 2:1 MUXes (Qi_j). The conditional-sum adder is usually the fastest of all the adders we
have discussed (it is the fastest when logic cell delay increases with the number of inputs—this is true
for all ASICs except FPGAs).
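The recursive splitting can be sketched in Python; each call returns both conditional results (sum, carry out) for a carry in of '0' and of '1', and the low half's carries act as the MUX select signals (cond_sum is an illustrative name, not part of any library):

```python
def cond_sum(A, B, n):
    """Return ((sum, cout) for carry-in 0, (sum, cout) for carry-in 1) of an n-bit add."""
    if n == 1:                                   # the 1-bit conditional adder, H
        a, b = A & 1, B & 1
        return (a ^ b, a & b), (a ^ b ^ 1, a | b)
    half = n // 2
    lo = cond_sum(A & ((1 << half) - 1), B & ((1 << half) - 1), half)
    hi = cond_sum(A >> half, B >> half, n - half)
    results = []
    for cin in (0, 1):
        lo_sum, lo_carry = lo[cin]
        hi_sum, hi_carry = hi[lo_carry]          # MUX: the low carry selects the high result
        results.append(((hi_sum << half) | lo_sum, hi_carry))
    return tuple(results)

s, cout = cond_sum(0b1011, 0b0110, 4)[0]         # pick the carry-in = '0' case, C[0] = '0'
print(s, cout)                                   # 1 1, i.e. 11 + 6 = 17 = '1 0001'
```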
FIGURE 2.31 Symbols for datapath elements. (a) An array or vector of flip-flops (a register). (b) A two-input
NAND cell with databus inputs. (c) A two-input NAND cell with a control input. (d) A buswide MUX. (e) An
incrementer/decrementer. (f) An all-zeros detector. (g) An all-ones detector. (h) An adder/subtracter.
1. A subtracter is similar to an adder, except in a full subtracter we have a borrow-in signal, BIN; a
borrow-out signal, BOUT; and a difference signal, DIFF:
These equations are the same as those for the FA except that the B input is inverted and the
sense of the carry chain is inverted. To build a subtracter that calculates (A – B) we invert the
entire B input bus and connect the BIN[0] input to VDD (not to VSS as we did for CIN[0] in an
adder). As an example, to subtract B = '0011' from A = '1001' we calculate '1001' + '1100' + '1' =
'0110'. As with an adder, the true overflow is XOR(BOUT[MSB], BOUT[MSB – 1]).
2. An adder/subtracter has a control signal that gates the A input with an exclusive-OR cell
(forming a programmable inversion) to switch between an adder or subtracter. Some
adder/subtracters gate both inputs to allow us to compute (–A – B). We must be careful to
connect the input to the LSB of the carry chain (CIN[0] or BIN[0]) when changing between
addition (connect to VSS) and subtraction (connect to VDD).
3. A barrel shifter rotates or shifts an input bus by a specified amount. For example, if we have an eight-input barrel shifter with input '1111 0000' and we specify a shift of '0001 0000' (3, coded by bit position), the right-shifted 8-bit output is '0001 1110'. A barrel shifter may rotate left or
right (or switch between the two under a separate control). A barrel shifter may also have an
output width that is smaller than the input. To use a simple example, we may have an 8-bit input
and a 4-bit output.
This situation is equivalent to having a barrel shifter with two 4-bit inputs and a 4-bit output.
4. A leading-one detector is used with a normalizing (left-shift) barrel shifter to align mantissas in
floating-point numbers. The input is an n -bit bus A, the output is an n -bit bus, S, with a single '1'
in the bit position corresponding to the most significant '1' in the input. Thus, for example, if the
input is A = '0000 0101' the leading-one detector output is S = '0000 0100', indicating the
leading one in A is in bit position 2 (bit 7 is the MSB, bit zero is the LSB).
5. The output of a priority encoder is the binary-encoded position of the leading one in an input.
For example, with an input A = '0000 0101' the leading 1 is in bit position 2 (MSB is bit position 7) so the output of a 4-bit priority encoder would be Z = '0010' (2). In some cell libraries the encoding is reversed so that the MSB has an output code of zero; in this case Z = '0101' (5). This second, reversed, encoding scheme is useful in floating-point arithmetic. If A is a mantissa and we normalize A to '1010 0000' we have to subtract 5 from the exponent; this exponent correction is equal to the output of the priority encoder. (A behavioral sketch of a few of these datapath elements is given after this list.)
This inverts COUT, so that in the following stage we must invert it again. If we push an inverting
bubble to the input CIN we find that:
Z[ i (odd)] = XNOR(A[ i ], CIN[ i ]) and COUT[ i (even)] = NOR(NOT(A[ i ]), CIN[ i ]).
7. A register file is a bank of flip-flops arranged across the bus; sometimes these have the
option of multiple ports (multiport register files) for read and write. Normally these
register files are the densest logic and hardest to fit in a datapath. For large register files it
may be more appropriate to use a multiport memory. We can add control logic to a register
file to create a first-in first-out register (FIFO), or last-in first-out register (LIFO).
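A few of the datapath elements described above can be sketched behaviorally. This is illustrative Python: the bus width N and the names add_sub, leading_one, and priority_encode are assumptions for the sketch, and the adder/subtracter here inverts B to compute A – B (matching the subtracter example in item 1), whereas the text also mentions designs that gate A or both inputs:

```python
N = 8  # assumed bus width

def add_sub(A, B, sub):
    """Adder/subtracter: the control 'sub' programmably inverts B with XOR and
    also drives the LSB of the carry chain (0/VSS for add, 1/VDD for subtract)."""
    mask = (1 << N) - 1
    b = (B ^ (mask if sub else 0)) & mask
    return (A + b + sub) & mask

def leading_one(A):
    """Leading-one detector: a one-hot word marking the most significant '1' (0 if A == 0)."""
    for i in reversed(range(N)):
        if (A >> i) & 1:
            return 1 << i
    return 0

def priority_encode(A, reversed_code=False):
    """Priority encoder: position of the leading one; the reversed code gives the
    left shift needed to normalize A (the floating-point exponent correction)."""
    for i in reversed(range(N)):
        if (A >> i) & 1:
            return (N - 1 - i) if reversed_code else i
    return 0

print(add_sub(0b1001, 0b0011, sub=1))                    # 6: '1001' - '0011' = '0110'
print(bin(leading_one(0b00000101)))                      # 0b100: leading one in bit position 2
print(priority_encode(0b00000101, reversed_code=True))   # 5: shift left by 5 to normalize
```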
Figure 2.32 shows a three-state bidirectional output buffer (Tri-State ® is a registered trademark of
National Semiconductor). When the output enable (OE) signal is high, the circuit functions as a
noninverting buffer driving the value of DATAin onto the I/O pad. When OE is low, the output
transistors or drivers , M1 and M2, are disconnected. This allows multiple drivers to be connected on a
bus. It is up to the designer to make sure that a bus never has two drivers—a problem known as
contention .