21EC71_Module-1-3

Uploaded by

swaroop
21EC71

Advanced VLSI

Dr. Manasa M G; Dinesh M A


Adapted from the slides of Prof. Jagannath B R
Module-1
INTRODUCTION TO ASICs

• An ASIC (“a-sick”) is an application-specific integrated circuit.


Types of ASICs

• ICs are made on a wafer. Circuits are built up with
successive mask layers.
• The number of masks used to define the interconnect and
other layers is different between full-custom ICs and
programmable ASICs
• Full-custom ICs
• Semi-custom ICs
• Standard-cell-based ASICs
• Gate-array-based ASICs
Types of ASICs

• Standard-cell-based ASICs
• Gate-array-based ASICs
• Channelled
• Channelless
• Structured gate array
• Programmable ASICs
• Programmable gate array
• Field-programmable gate array
Full-Custom ASICs

• All mask layers are customized in a full-custom ASIC. It only makes sense to design a full-custom IC if there are no suitable libraries available (in terms of speed, size, and power).
• Full-custom offers the highest performance and lowest part cost (smallest die size), with the disadvantages of increased design time, complexity, design expense, and highest risk.
• Microprocessors were exclusively full-custom, but designers are increasingly turning to semicustom ASIC techniques in this area too.
• Other examples of full-custom ICs or ASICs arise from requirements for high-voltage (automobile), analog/digital (communications), or sensors and actuators.
Standard-Cell–Based ASICs

• A cell-based ASIC (CBIC, pronounced “sea-bick”) uses:
• predesigned logic cells (AND gates, OR gates, multiplexers, and flip-flops, for example) known as standard cells
• possibly megacells, megafunctions, full-custom blocks, system-level macros (SLMs), fixed blocks, cores, or Functional Standard Blocks (FSBs)
• All mask layers are customized: transistors and interconnect.
• Custom blocks can be embedded.
• Manufacturing lead time is about eight weeks.
Standard-Cell–Based ASICs

• In datapath (DP) logic we may use a datapath compiler and a datapath library. Cells such as arithmetic and logical units (ALUs) are pitch-matched to each other to improve timing and density.
Gate-Array–Based ASICs

• A gate array, masked gate array (MGA), or pre-diffused array uses macros (books) to reduce turnaround time and comprises a base array made from a base cell or primitive cell.
• Types of gate array:
• Channelled gate array
• Channelless gate array
• Structured gate array
Channelled Gate Array

• Only the interconnect is customized.
• The interconnect uses predefined spaces between rows of base cells.
• Manufacturing lead time is 2 days to 2 weeks.
• One difference from a CBIC is that the space for interconnect between rows of cells is fixed in height in a channeled gate array, whereas the space between rows of cells may be adjusted in a CBIC.
Channelless Gate Array

• Also called a sea-of-gates array.
• Only the top few mask layers are customized.
• Manufacturing lead time is 2 days to 2 weeks.

The key difference between a channelless gate array and channeled gate array is that
there are no predefined areas set aside for routing between cells on a channelless
gate array.
Instead we route over the top of the gate-array devices.
The logic density (the amount of logic that can be implemented in a given silicon area) is
higher for channelless gate arrays than for channeled gate arrays. This is usually attributed
Structured Gate Array

• Only the interconnect is customized.
• Custom blocks can be embedded.
• Manufacturing lead time is 2 days to 2 weeks.
Programmable Logic Devices

Examples and types of PLDs:


• read-only memory (ROM)
• programmable ROM or PROM
• electrically programmable ROM, or EPROM
• An erasable PLD (EPLD)
• electrically erasable PROM, or EEPROM
• UV-erasable PROM, or UVPROM
• mask-programmable ROM
• A mask-programmed PLD usually uses bipolar technology.
• Logic arrays may be either a Programmable Array Logic (PAL®, a registered trademark of AMD) or a programmable logic array (PLA); both have an AND plane and an OR plane.
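
To make the AND-plane/OR-plane structure concrete, here is a minimal Python sketch of a two-level logic array. The function name pla_eval and the way the two planes are encoded are illustrative assumptions, not any vendor's programming format.

```python
def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a two-level AND-OR logic array.

    inputs:    tuple of 0/1 input values.
    and_plane: one row per product term; each entry is 1 (use the input),
               0 (use its complement) or None (input not connected).
    or_plane:  one row per output, listing the product terms to OR together.
    """
    products = []
    for row in and_plane:
        term = 1
        for value, literal in zip(inputs, row):
            if literal is not None:
                term &= value if literal == 1 else 1 - value
        products.append(term)
    return tuple(int(any(products[i] for i in row)) for row in or_plane)


# Hypothetical programming: a half adder, sum = A'B + AB' and carry = AB.
AND_PLANE = [(0, 1), (1, 0), (1, 1)]
OR_PLANE = [(0, 1), (2,)]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, pla_eval((a, b), AND_PLANE, OR_PLANE))
```

Changing only the two tables reprograms the function, which is the point of a programmable array.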
Programmable Logic Devices

Important features that all PLDs have in common:
• No customized mask layers or logic cells
• Fast design turnaround
• A single large block of programmable
interconnect
• A matrix of logic macrocells that usually
consist of programmable array logic
followed by a flip-flop or latch



A field-programmable gate array
(FPGA) or complex PLD
• A step above the PLD in complexity is the field-programmable gate array (FPGA).
• There is very little difference between an FPGA and a PLD; an FPGA is usually just larger and more complex than a PLD.
A field-programmable gate array
(FPGA) or complex PLD
• None of the mask layers are customized
• A method for programming the basic
logic cells and the interconnect
• The core is a regular array of
programmable basic logic cells that can
implement combinational as well as
sequential logic (flip-flops)
• A matrix of programmable interconnect
surrounds the basic logic cells
• Programmable I/O cells surround the
core
• Design turnaround is a few hours
Design Flow
A design flow is a sequence of steps to design an ASIC
1. Design entry. Using a hardware description
language (HDL) or schematic entry.
2. Logic synthesis. Produces a netlist—logic cells
and their connections.
3. System partitioning. Divide a large system into
ASIC-sized pieces.
4. Prelayout simulation. Check to see if the design
functions correctly.
5. Floorplanning. Arrange the blocks of the netlist
on the chip.
6. Placement. Decide the locations of cells in a
block.
7. Routing. Make the connections between cells and
blocks.
8. Extraction. Determine the resistance and
capacitance of the interconnect.
9. Postlayout simulation. Check to see the design
still works with the added loads of the interconnect.
ASIC Cell Libraries

You can:
1. use a design kit from the ASIC vendor
2. buy an ASIC-vendor library from a library vendor
3. build your own cell library
(1) is usually a phantom library: the cells are empty boxes, or phantoms. You hand off your design to the ASIC vendor and they perform phantom instantiation (Synopsys CBA).
(2) involves a buy-or-build decision. You need a qualified cell library (qualified by the ASIC foundry). If you own the masks (the tooling) you have a customer-owned tooling (COT) solution (which is becoming very popular).
(3) involves a complex library development process: cell layout • behavioural model • Verilog/VHDL model • timing model • test strategy • characterization • circuit extraction • process control monitors (PCMs) or drop-ins • cell schematic • cell icon • layout versus schematic (LVS) check • logic synthesis • retargeting • wire-load model • routing model
• We have looked at the difference between full-custom ASICs, semi-custom ASICs, and programmable ASICs.



CMOS Logic:
Datapath Logic Cells

• Suppose we wish to build an n-bit adder (that adds two n-bit numbers) and to exploit the regularity of this function in the layout. We can do so using a datapath structure.


4-bit ripple-carry adder (RCA)

FIGURE 2.20 A datapath adder. (a) A full-adder (FA) cell with inputs (A and B), a carry in, CIN, sum output, S, and carry out, COUT. (b) A 4-bit adder. (c) The layout, using two-level metal, with data in m1 and control in m2. In this example the wiring is completed outside the cell; it is also possible to design the datapath cells to contain the wiring. Using three levels of metal, it is possible to wire over the top of the datapath cells. (d) The datapath layout.
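
The 4-bit adder built from full-adder cells can be modelled behaviourally as below. This is a minimal sketch: the function names are illustrative, and it describes only the logic, not the layout.

```python
def full_adder(a, b, cin):
    """One FA cell: returns (S, COUT) for single-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, n=4, cin=0):
    """Chain n FA cells; the carry ripples from bit 0 up to bit n-1."""
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(9, 8))  # (1, 1): 9 + 8 = 17 = 0b1_0001
```

Each loop iteration corresponds to one bit slice, so the carry passes through n cells in the worst case.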
4-bit ripple-carry adder (RCA)

• What is the difference between using a datapath, standard cells, or gate arrays?
• Cells are placed together in rows on a CBIC or an MGA, but there is generally no regularity to the arrangement of the cells within the rows; we let software arrange the cells and complete the interconnect.
• Datapath layout automatically takes care of most of the interconnect between the cells, with the following advantages:
• Regular layout produces predictable and equal delay for each bit.
• Interconnect between cells can be built into each cell.
4-bit ripple-carry adder (RCA)

There are some disadvantages of using a datapath:
• The overhead (buffering and routing the control signals, for example) can make a narrow (small number of bits) datapath larger and slower than a standard-cell (or even gate-array) implementation.
• Datapath cells have to be predesigned (otherwise we are using full-custom design) for use in a wide range of datapath sizes. Datapath cell design can be harder than designing gate-array macros or standard cells.
• Software to assemble a datapath is more complex and not as widely used as software for assembling standard cells or gate arrays.


Datapath Elements

Adders

• The delay of an n-bit RCA is proportional to n and is limited by the propagation of the carry signal through all of the stages.
• We can reduce delay by using pairs of go-faster bubbles to change AND and OR gates to fast two-input NAND gates.
The Carry equations allow us to build the carry
chain from two-input NAND gates, one per
cell, using different logic in even and odd
stages



Carry-save adder (CSA)

• CSA (A1[i], A2[i], A3[i], CIN, S1[i], S2[i], COUT) has three outputs: S1[i], S2[i], and COUT.


Carry-save adder (CSA)

• The carries are “saved” at each stage and shifted left onto the bus S1.
• So there is no carry propagation and the delay of a
CSA is constant.
• At the output we still need to add all the saved carries
and all the sums to get n-bit result.
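
The carry-save idea can be sketched in a few lines of Python. The output equations are not reproduced in the text above, so this sketch assumes the usual definition: each sum bit is the parity of the three input bits and each saved carry is their majority. The function name is illustrative.

```python
def carry_save_add3(a1, a2, a3):
    """Reduce three numbers to two (sum bus, saved-carry bus).

    Every bit position is independent, so no carry propagates here.
    Assumption: sum = parity, carry = majority of the three input bits.
    """
    sums = a1 ^ a2 ^ a3                         # bitwise parity
    carries = (a1 & a2) | (a1 & a3) | (a2 & a3)  # bitwise majority
    return sums, carries << 1                   # carries are shifted left


s, c = carry_save_add3(5, 6, 7)
print(s, c)   # 4 14
print(s + c)  # 18, after the final carry-propagate addition
```

The final `s + c` stands in for the carry-propagate adder that is still needed to produce the n-bit result.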



• We can use a CSA to add multiple inputs (for example, an adder with four 4-bit inputs).
• The last stage sums two input buses using a carry-propagate adder (CPA). An RCA has been used as the CPA here, but any adder type can be used.
• Two CSA cells and one RCA cell form a bit-slice (or slice).
• Four slices are stacked vertically to form the datapath.
• By using registers between stages of combinational logic (pipelining) we increase the speed, at the price of increased area (for the registers) and added latency. (Here latency is the number of clock cycles between an input entering the pipeline and the corresponding result emerging.)
• It takes a few clock cycles to fill the pipeline, but once it is filled the answers emerge every clock cycle.
Carry-bypass Adder (CBA)

• The problem with an RCA is that every stage has to wait to make
its carry decision, C[i], until the previous stage has calculated C[i -
1].
• If we examine the propagate signals we can bypass this critical
path.
• For example, to bypass the carries for bits 4-7 (stages 5-8) of an adder we can compute BYPASS = P[4]·P[5]·P[6]·P[7] and then use a MUX: C[7] = (G[7] + P[7]·C[6])·BYPASS' + C[3]·BYPASS
• Manchester-carry chains can compute the carries and the
bypass operation using TGs or just pass transistors.
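
The bypass equation can be checked with a short behavioural model. This sketch assumes G[i] = A[i]·B[i] and P[i] = A[i] XOR B[i]; with that choice, BYPASS = 1 guarantees that no stage in bits 4-7 generates a carry, so C[7] equals C[3]. The function names are illustrative.

```python
def ripple_carries(a, b, n=8, cin=0):
    """Carries C[0..n-1] from the recurrence C[i] = G[i] + P[i].C[i-1]."""
    carries, c = [], cin
    for i in range(n):
        g = ((a >> i) & (b >> i)) & 1
        p = ((a >> i) ^ (b >> i)) & 1   # assumption: XOR propagate
        c = g | (p & c)
        carries.append(c)
    return carries


def c7_with_bypass(a, b, cin=0):
    """C[7] computed with the bypass MUX around bits 4-7."""
    g7 = ((a >> 7) & (b >> 7)) & 1
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(8)]
    c = ripple_carries(a, b, 8, cin)
    bypass = p[4] & p[5] & p[6] & p[7]
    rippled = g7 | (p[7] & c[6])
    return (rippled & (1 - bypass)) | (c[3] & bypass)


print(c7_with_bypass(0x0F, 0xF1))  # 1: C[3] = 1 is bypassed straight to C[7]
```

In hardware the benefit is timing: when BYPASS is 1 the MUX forwards C[3] without waiting for the carry to ripple through four stages. The model above only confirms that the logic value is unchanged.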
Carry-skip Adder



Carry-lookahead Adder (CLA)

• If we evaluate the above equation recursively for i = 1, 2, 3, …
• We can “look ahead” by two stages and calculate the carry into the third stage (bit 2), which is C[1], using only the first-stage inputs.


• As we look ahead further these equations become more complex, take longer to calculate, and the logic becomes less regular.
• The physical and logical structure of each bit must be similar, so that the datapath layout fits in a bit slice.
• Regularity is not a concern in standard-cell or gate-array logic.
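
The growth of the lookahead equations can be seen by expanding the carry recurrence in code. The equation the slide refers to is not reproduced in the text, so this sketch assumes the usual form C[i] = G[i] + P[i]·C[i-1], with G = A·B and P = A + B. The function name is illustrative.

```python
def lookahead_carries(a, b, n=4, cin=0):
    """Compute each C[i] directly from the G and P terms and the carry-in.

    Expanded form (assumed recurrence C[i] = G[i] + P[i].C[i-1]):
    C[i] = G[i] + P[i].G[i-1] + ... + P[i]...P[1].G[0] + P[i]...P[0].cin
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    carries = []
    for i in range(n):
        c = 0
        chain = 1  # running product P[i].P[i-1]...P[j+1]
        for j in range(i, -1, -1):
            c |= chain & g[j]
            chain &= p[j]
        c |= chain & cin
        carries.append(c)
    return carries


print(lookahead_carries(0b0111, 0b0001))  # [1, 1, 1, 0]
```

The inner loop runs i + 1 times for C[i], which mirrors how the number of terms grows with every extra stage of lookahead.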
Brent-Kung adder

• Reduces the delay and increases the regularity of the carry-lookahead scheme.


Brent-Kung adder

• Carry generation in a 4-bit CLA.
• A cell to generate the lookahead terms, C[0]-C[3].
• Cells L1, L2, and L3 are rearranged into a tree that has less delay. Cell L4 is added to calculate C[2] that is lost in the translation.
• Simplified representations of parts a and c.
• The lookahead logic for an 8-bit adder. The inputs 0-7 are the propagate and carry terms formed from the inputs to the adder.
• An 8-bit Brent-Kung CLA. The outputs of the lookahead logic are the carry bits that (together with the inputs) form the sum.
• One advantage of this adder is that delays from the inputs to the outputs are more nearly equal than in other adders. This tends to reduce the number of unwanted and unnecessary switching events and thus reduces power dissipation.
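
The tree arrangement can be sketched as a parallel-prefix computation on (generate, propagate) pairs. This is a minimal model of the usual Brent-Kung formulation, assuming G = A·B and P = A XOR B and a width that is a power of two; it does not reproduce the cells L1-L4 of the figure, and the function names are illustrative.

```python
def combine(hi, lo):
    """Merge the (generate, propagate) pairs of two adjacent bit groups."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def brent_kung_add(a, b, n=8, cin=0):
    """Add two n-bit numbers; n must be a power of two in this sketch."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    x = [(g[i], p[i]) for i in range(n)]
    x[0] = (g[0] | (p[0] & cin), p[0])  # fold the carry-in into bit 0

    d = 1
    while d < n:  # up-sweep: build group terms of width 2, 4, 8, ...
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d *= 2
    d = n // 4
    while d >= 1:  # down-sweep: fill in the prefixes the tree skipped
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d //= 2

    carries = [gi for gi, _ in x]  # carries[i] = carry out of bit i
    total = 0
    for i in range(n):
        carry_in = cin if i == 0 else carries[i - 1]
        total |= (p[i] ^ carry_in) << i
    return total, carries[n - 1]


print(brent_kung_add(200, 100))  # (44, 1): 300 = 256 + 44
```

Every prefix is produced by the same two-input combine cell, which is where the regularity of the scheme comes from.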
Carry-select Adder

• We duplicate two small adders (usually 4-bit or 8-bit adders, often CLAs) for the cases CIN = '0' and CIN = '1' and then use a MUX to select the case that we need: wasteful but fast.
• A carry-select adder is often used as the fast adder in a datapath library because its layout is regular.
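
A behavioural sketch of carry select follows. Each block computes both candidate results and the incoming carry only picks one of them. The block width, the function names, and the use of Python integer addition for the small block adders are illustrative assumptions.

```python
def add_block(a, b, width, cin):
    """Small block adder (stands in for a 4-bit CLA, for example)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n=16, block=4, cin=0):
    """n-bit add from n/block blocks; n must be a multiple of block."""
    result, carry = 0, cin
    mask = (1 << block) - 1
    for lo in range(0, n, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        # Both cases are computed without knowing the real carry-in.
        sum0, cout0 = add_block(a_blk, b_blk, block, 0)
        sum1, cout1 = add_block(a_blk, b_blk, block, 1)
        # The MUX: the incoming carry selects one precomputed result.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << lo
    return result, carry


print(carry_select_add(12345, 54321))  # (1130, 1): 66666 = 65536 + 1130
```

In hardware all blocks compute their two candidates at the same time, so only the chain of MUX selections remains on the carry path.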



Conditional-sum Adder (CSA)

• An n-bit adder that generates two sums:
• One sum assumes a carry-in condition of '0'.
• The other sum assumes a carry-in condition of '1'.
• We can split this n-bit adder into an i-bit adder for the i LSBs and an (n-i)-bit adder for the (n-i) MSBs.
• Both adders generate two conditional sums as well as true and complement carry signals.
• The two (true and complement) carry signals from the LSB adder are used to select between the two conditional sums from the MSB adder using two-input MUXes.
• Example: we can split a 16-bit adder using i = 8 (so n - i = 8); then we can split one or both 8-bit adders again, and so on.
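
The recursive splitting described above can be sketched as follows. The function returns both conditional results, and the carry from the LSB half selects between the two results of the MSB half. The function name and the choice of splitting exactly in the middle are illustrative assumptions.

```python
def conditional_sum(a, b, n):
    """Return ((sum0, cout0), (sum1, cout1)) for carry-in 0 and carry-in 1.

    a and b must fit in n bits.
    """
    if n == 1:
        # 1-bit conditional adder: both cases computed directly.
        return ((a ^ b, a & b), (1 - (a ^ b), a | b))
    i = n // 2                        # i LSBs, (n - i) MSBs
    low_mask = (1 << i) - 1
    lsb = conditional_sum(a & low_mask, b & low_mask, i)
    msb = conditional_sum(a >> i, b >> i, n - i)
    results = []
    for low_sum, low_cout in lsb:
        high_sum, high_cout = msb[low_cout]  # MUX selected by the LSB carry
        results.append((low_sum | (high_sum << i), high_cout))
    return tuple(results)


print(conditional_sum(0x00FF, 0x0001, 16)[0])  # (256, 0)
```

Indexing the returned tuple with the real carry-in (0 or 1) gives the final sum and carry out.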
Conditional-sum Adder (CSA)

• The simplest form of an n-bit CSA uses n single-bit conditional adders, H (each with four outputs: two conditional sums, true carry, and complementary carry), together with a tree of 2:1 MUXes (Qi_j).
Conditional-sum Adder (CSA)

(a) A 1-bit conditional adder that calculates the sum and carry out assuming the carry-in is either '1' or '0'.
(b) The multiplexer that selects between sums and carries.
(c) A 4-bit conditional-sum adder with carry input, C[0].


FIGURE 2.26: Datapath adders. This data is from a series of submicron datapath libraries. (a) Delay normalized to a two-input NAND logic cell delay (approximately equal to 250 ps in a 0.5 μm process). For example, a 64-bit ripple-carry adder (RCA) has a delay of approximately 30 ns in a 0.5 μm process. The spread in delay is due to variations in delays between different inputs and outputs. An n-bit RCA has a delay proportional to n. The delay of an n-bit carry-select adder is approximately proportional to log₂ n. The carry-save adder delay is constant (but requires a carry-propagate adder to complete an addition). (b) In a datapath library the area of every adder is proportional to the bit size.
Multipliers

• A symmetric 6-bit array multiplier (an n-bit multiplier multiplies two n-bit numbers; we shall use “n-bit by m-bit multiplier” if the lengths are different).


Multipliers
• Adders a0-f0 may be eliminated, which then eliminates adders a1-a6, leaving an asymmetric CSA array of 30 (5 x 6) adders (including one half adder).
• An n-bit array multiplier has a delay proportional to n plus the delay of the CPA (adders b6-f6 in Figure 2.27).
can attack to improve the
performance of a
multiplier: the number of
partial products and the


Wallace tree Multiplier

• Collapse the chain of adders a0-f5 (5 adder delays) to the Wallace tree consisting of adders 5.1-5.4.
• At each stage we have the following three choices:
• (1) sum three outputs using a full adder (denoted by a box enclosing three dots),
• (2) sum two outputs using a half adder (a box with two dots),
• (3) pass the outputs directly to the next stage.


• The object is to choose (1), (2), or (3) at each stage to maximize the performance of the multiplier.
• In tree-based multipliers there are two ways to do this: working forward and working backward.
• In a Wallace-tree multiplier we work forward from the multiplier inputs, compressing the number of signals to be added at each stage.
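
The forward compression can be sketched as a column-by-column reduction of the partial-product bits. This is a simplified model under stated assumptions: it greedily applies a full adder to every group of three bits in a column, a half adder to a leftover pair, and passes a leftover single bit through, which is one common way to describe a Wallace-style reduction and not necessarily the exact tree in the figure. The function name is illustrative, and the final CPA is modelled by ordinary integer addition.

```python
def wallace_multiply(a, b, n):
    """Multiply two n-bit numbers by compressing partial-product columns."""
    # columns[k] holds every bit of weight 2**k still to be added.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    stages = 0
    while max(len(col) for col in columns) > 2:
        nxt = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            pos = 0
            while len(col) - pos >= 3:           # choice (1): full adder
                x, y, z = col[pos:pos + 3]
                nxt[k].append(x ^ y ^ z)
                nxt[k + 1].append((x & y) | (x & z) | (y & z))
                pos += 3
            if len(col) - pos == 2:              # choice (2): half adder
                x, y = col[pos:pos + 2]
                nxt[k].append(x ^ y)
                nxt[k + 1].append(x & y)
            elif len(col) - pos == 1:            # choice (3): pass through
                nxt[k].append(col[pos])
        columns = nxt
        stages += 1

    # At most two bits per column remain: the final carry-propagate add.
    product = sum(bit << k for k, col in enumerate(columns) for bit in col)
    return product, stages


print(wallace_multiply(45, 27, 6)[0])  # 1215
```

The second return value counts the compression stages, which is the quantity a tree-based multiplier tries to keep small.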

Dadda multiplier

• In a Dadda multiplier we work backward from the final product.


Ferrari–Stefanelli Multipliers

• Nests multipliers; the 2-bit submultipliers reduce the number of partial products.


Deciding between parallel multiplier
architectures

• A Wallace-tree multiplier is more suited to full-custom layout but is slightly larger than a Dadda multiplier; both are less regular than an array multiplier. For cell-based ASICs, a Dadda multiplier is smaller than a Wallace-tree multiplier.
• The overall multiplier speed does depend on the size and architecture of the final CPA, but this may be optimized independently of the CSA array. This means a Dadda multiplier is always at least as fast as the Wallace-tree version.
Deciding between parallel multiplier
architectures

• The low-order bits of any parallel multiplier settle first and can be added in the CPA before the remaining bits settle. This allows multiplication and the final addition to be overlapped in time.
• Any of the parallel multiplier architectures may be
pipelined. We may also use a variably pipelined
approach that tailors the register locations to the size
of the multiplier.



Deciding between parallel multiplier
architectures

• Using appropriate counters increases the stage compression and permits the size of the stages to be tuned. There is a trade-off in using these counters between the speed and size of the logic cells and the delay as well as area of the interconnect.
• Power dissipation is reduced by the tree-based structures. The simplified carry-save logic produces fewer signal transitions and the tree structures produce fewer glitches than a chain.
I/O Cells

• A three-stage bidirectional output buffer.
• When the output enable, OE, is ‘1’ the output section is enabled and drives the I/O pad.
• When OE is ‘0’ the output buffer is placed in a high-impedance state.
• This allows multiple drivers to be connected on a bus. Two related topics are contention (more than one driver enabled at the same time) and the bus keeper.
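
The behaviour of several three-state drivers sharing one bus can be sketched as below. This is a simplified logical model: the function name, the 'X' and 'Z' markers, and the way the bus keeper is represented (as the last value held on the bus) are assumptions for illustration.

```python
def resolve_bus(drivers, kept=None):
    """Resolve a shared bus from a list of (OE, value) driver pairs.

    Returns the driven value, 'X' if more than one driver is enabled
    (contention), or, when no driver is enabled, the value held by a
    bus keeper if one is modelled (kept), else 'Z' (floating).
    """
    active = [value for oe, value in drivers if oe == 1]
    if len(active) > 1:
        return 'X'
    if len(active) == 1:
        return active[0]
    return kept if kept is not None else 'Z'


print(resolve_bus([(1, 0), (0, 1)]))          # 0: one driver enabled
print(resolve_bus([(1, 0), (1, 1)]))          # X: contention
print(resolve_bus([(0, 0), (0, 1)]))          # Z: bus floats
print(resolve_bus([(0, 0), (0, 1)], kept=1))  # 1: keeper holds last value
```

The model treats any two enabled drivers as contention, even if they happen to drive the same value, since the design rule is that a bus should never have two active drivers.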
