Physical Design: Methodologies and Developments

Abhay Chopde∗, Fellow, IEEE and Atharva M. Kulkarni∗


∗ Dept. of Electronics and Telecommunication Engineering, Vishwakarma Institute of Technology, Pune, India

arXiv:2409.04726v1 [eess.SY] 7 Sep 2024

Abstract: The design and production of VLSI chips is a multilevel hierarchical process. As the demand for reduced die area and smaller technology nodes grows, it becomes increasingly challenging to optimize Power, Performance and Area (PPA) parameters to accommodate the ever-increasing core logic on a chip. A well-defined hierarchical flow is thus essential to the VLSI design process. A robust hierarchical flow should encompass all stages, from Gate-level RTL Synthesis (Front End Design) to Logic Placement and Verification (Back End Physical Design), culminating in tapeout / production. Physical Design in this flow is the process of translating a logical circuit description into a physically realizable GDSII form. This involves defining the best possible placement and routing for standard cells, macros and I/Os in the design to optimize PPA for any given netlist. This paper captures the details of the methodologies and algorithms pertinent to building and optimizing an efficient and robust physical design flow in the VLSI chip-design process.

Index Terms: Power, Performance, Area, hierarchical flow, synthesis, back end flow, physical layout, GDSII, floor planning, placement, routing, verification, static timing analysis

I. INTRODUCTION

Very-large-scale Integration (VLSI) technology revolves around designing Generic or Application Specific Integrated Circuits (ASICs) with the core logic sometimes comprising billions and even trillions of transistors, all embedded within a single small chip. Monumental developments in the field of VLSI date back to the 1970s, when we were experiencing a new dawn in semiconductor physics and its application in building Systems-on-Chip (SoCs), faster processors and communication technologies.[1,2] Prior to these technological enhancements, most ICs and processors could perform just a handful of operations with limited logic and instruction sets. With the help of developments in the VLSI field, however, it has become possible to implement circuits displaying desired levels of performance parameters based on industry requirements. Such designs may contain billions of standard and physical cells, I/O pins, macros, analog IPs, PLLs, data generators and numerous other blocks. The trend in VLSI technologies was observed and documented by Gordon E. Moore as early as 1965. The famed Moore's law states "The number of transistors in a microchip doubles every two years, though the cost of computers is halved".[1,2] However, we have now entered an era wherein this law may no longer hold, owing to the complexities involved in working at lower process nodes. Despite these complexities, the cutting-edge EDA tools developed by Cadence Design Systems® and Synopsys have been aiding Design Engineers to produce chips showing desired characteristics, applications, processing parameters and portability.

Physical Design is the process of translating the gate-level RTL logical functionality of a design (.vg) into a physical geometricized form (GDSII) which can be taped out for production / packaging.

Need for Physical Design:
• Current IC designs have millions of transistors and other complex logic, interconnected through several layers of metal in a given metal stack.
• Such designs furthermore need to be optimized for Power, Performance, Timing and Area so as to ensure the best possible performance output with fewer setbacks.
• Manually optimizing the Placement and Routing of all components in a design has become a Herculean task, being time consuming and error-prone. As a result, automation in the EDA industry is essential to carry out such an arduous task.
• Thus, a well-defined robust PD flow improves our time-to-market capability and lets us get more work done.

Physical Design in itself is a complex multi-domain process. As a result, it is broken down into simpler sequential hierarchies to expedite the design process:

Partitioning breaks up a complex top-level circuit into smaller blocks or modules which can each be designed, optimized and analyzed in isolation, before merging them back into top-level analysis.

Floor Planning determines the block dimensions, upsizing of HIC cells; assignment of pins, boundary and well-tap cells along with preplacement of any critical logic in the design.

Power Planning, often done in conjunction with floor planning, distributes power rings, rails and subsequent VDD, VSS, VDDQ, VDDR etc. power domains across the design. Power stripes are added in accordance with the metal stack approved by the industry.

Placement is done following the floor planning stage, where the tool automates the placement of core logic, macros and decap cells within the die area.

Clock Tree Synthesis involves building and routing clock architecture throughout the design. Critical parameters like Root pin, Through pin, Unsync pin and splits need to be defined prior to this stage, to balance skews and slack timings in the clock path.

Routing is done to define metal fill and connectivity between all the logic implemented in a design. Metal layers and corresponding vias are dropped into the placed logic in accordance with the metal stack and sub-circuit requirement.

Timing Closure helps optimize timing measures like clock skew, delay skew, max-trans, setup and hold slack
in clock as well as data paths. Timing closure is critical, as failure to meet timing requirements can cause corruption in data or on-chip instruction sets.

Verification stages include:
• Static Timing Analysis (STA)
– Setup and hold check
– DRVs (max tran, max cap, fanout) check
– Clock and Data skew check
• Power Distribution Network Analysis (PDN)
• Physical Verification (PV)
– SI noise and Antenna check
– Design Rule Checks (DRCs)
– Layout vs Schematic (LVS)
– DFM, LPA Analysis
• Formal Verification (FV)
• Low Power Verification (CLP)

II. LITERATURE SURVEY

Application Specific Integrated Circuit (ASIC) design, as the name suggests, caters to a specific design purpose sought by the industry. Such purposes may pertain to chip performance, clocking frequency, data requirement or process node, to name a few. The term "Technology Node" or "Process Node" is held in high regard in the semiconductor industry. Process node denotes the size of the smallest gate being implemented in the chip. The smaller the process node, the smaller the gate size and the greater the gate density per micron, resulting in greater scope to implement logic onto the chip. For example, a 7nm process node can hold roughly 100 million transistors per sq. mm.[3,4] In 2021, Intel revealed its plans to launch technologies based on a 2nm process node as early as 2024.[5] Although greater gate density gives designers more room to implement logic, it further complicates the design process, leading to difficulties in closing timing and optimizing PPA. Lower process nodes (especially below 32nm) inherently add a layer of complexity, especially in terms of DRCs and congestion hotspots. Inamul Hussain and Saurabh Chaudhury published research in 2020 discussing the prominence of power dissipation and leakage problems at such process nodes. Selection of an appropriate logic family is therefore deemed critical at the start of designing any functional chip. As far as static logic families go, their research claims that CNTFETs (Carbon Nanotube Field-Effect Transistors) surpass MOSFETs (Metal Oxide Semiconductor Field-Effect Transistors) owing to their lower static power and leakage levels at lower technology nodes.[6]

Developments in the EDA industry have been critical in overcoming these challenges. With hi-tech EDA tools offered by prominent benefactors to the silicon industry in Cadence Design Systems and Synopsys Inc, it is possible to optimize results for the majority of designs and logic families, even MOSFETs. These tools are compatible with representing data from all major foundries including Samsung, TSMC, Intel etc.; reading in multiple design inputs like timing libraries, derate values, LEFs and metal stacks; synthesizing the apt netlist and constraints to serve chip functionality; carrying out placement and routing of all cells and logic in the design; and implementing verification mechanics for timing parameters, LVS, DRCs etc.[7] Recently, Cadence Design Systems has even been able to integrate Artificial Intelligence into chip design with the development of its "Cerebrus" AI-oriented EDA solution. The EDA industry is always on the move to provide better and more robust solutions to the challenges faced by the ever-changing silicon market.

The solutions offered by EDA tools are gauged with the aid of several industry metrics to ensure accurate outputs and eradication of bottlenecks when the chip is out for silicon testing, fabrication and packaging. Considering the ever-increasing demands in the near future in terms of processing power and supercomputing, combined with the desire to implement greater logic on a smaller area, Aaron Ng and Igor Markov have proposed a robust benchmark mechanic to test the performance of EDA tools. Such a benchmark helps gauge the tool in terms of accuracy, possibilities of failure, computing capacity, ability to extrapolate graphical and tabular results, Pareto Regression Analysis to compare Quality of Results (QoR) with similar tools or previous versions, ability to control the instruction flow etc.[8]

Cutting-edge EDA tools provide us with high-level design automation to help meet design requirements and constraints with minimal manual work. However, it is essential that we provide these tools with robust design methodologies and flows so as to get the best possible results. Three design architectures were predominantly in use from the early 1970s all the way up to the 90s to propose semiconductor designs for embedded CMOS applications, namely ASIC, DSP and RISC.[9] However, since the dawn of this century, owing to the progress made in the silicon frame pertaining to chip size and on-chip computation, ASIC architecture has been at the heart of nearly all proposed VLSI designs. A standard ASIC flow covers all steps from RTL synthesis to optimal Physical Design and Verification, finally culminating in production (tapeout). All stages in this process are critical. Failure to meet design constraints in any of these stages may cause hiccups like data corruption, false outputs or, even worse, chip failure. As a result, it is equally critical to maintain a well-defined flow which ensures that performance during all the aforementioned stages is met as per desired requirements. Such ASIC flows can be implemented on FPGAs or CPLDs depending upon the size of the design along with functionality and re-programmability. FPGAs generally support larger designs than CPLDs as they can host a large number of sequential flops and registers. Being RAM-based or volatile in nature, FPGAs offer more flexibility at the cost of a slightly higher booting or configuration time. Once configured, however, FPGAs are capable of fast-computing on-chip signals. CPLDs, being ROM-based, are able to load up memories immediately on booting up, but are slower in computational capability in the longer run.[10,11] Traian Tulbure has published some of his work on
the reconfigurable nature of CPLD logic and its viability in ASIC implementation in 2011. His research talks about the dynamic reconfigurable nature of CPLDs and how it could have an edge over SRAM-based implementation for smaller designs. Satisfactory results were obtained from a timing point of view over the course of this research.[10]

In the ASIC Design flow discussed earlier, Physical Design is the process following Gate-level RTL synthesis and involves representation of gates defined in the synthesis verilog netlists into their geometricized forms, complete with physical connectivity provided using appropriate metal layers. This geometric representation is in GDSII form, which is easily realizable for masking and tape-out during the production stage. As we look deeper into the Physical Design flow and all the methodologies that accompany it, we realize that it follows a sequential hierarchy consisting of several sub-stages involving layout, timing and verification of the layout. The new age IC design flow consists of three most critical sub-sections.
• RTL Synthesis (Front End Design)
– This step involves synthesizing a verilog netlist and constraints in Hardware Description Language (HDL) and Synthesis EDA tools in accordance with the die area, process node, performance, chip functionality and metal stack amongst numerous other requirements set by the Foundry.
• Physical Design (Back End Design)
– As discussed earlier, Physical Design takes the Synthesis .vg and constraints as input and translates them to physical geometries to be implemented on the chip. Throughout this process, the goal is to optimize the design from a timing and PPA point of view.
• Physical Verification
– The objective of Physical Verification is to check the design for any opens, shorts, unconnected pins and Design Rule checks set by the foundry as well as validating the physical layout of our design against the schematic obtained from the foundry.

After the design is validated from a verification point of view, it is sent out for tape-out and fabrication, followed by packaging. This post-verification process involving masking and chip production is handled by the Wafer Fabrication Houses before being packaged, tested and implemented as ICs. All these stages in the ASIC design flow possess their own methodologies and architecture, which offers ease of interface for EDA tools as well as better conduct of tasks. Along with the verilog netlists (.vg) and constraints generated during the Synthesis stage, there is some additional foundry data which needs to be provided as input to the Physical Design stage. This data is commonly known as "foundry collaterals" or "process collaterals" and contains information regarding attributes such as leakage power, area and functionality of standard cells with respect to different PVT corners, abstract view for the design and definition of metal layers.

Cortadella et al. have published a journal article in 2015 taking us through the steps from combinational and sequential logic synthesis leading up to automated pipelining options for High-level Synthesis (HLS). Execution time for any synthesis algorithm is governed by the performance equation of Fig. 1, where N refers to the number of cycles required to carry out the instruction set and P denotes the time period for each of the N cycles, so that the execution time is the product N × P.

Fig. 1. Performance Equation for Synthesis Algorithms

In combinational logic synthesis, instructions are provided to try optimizing critical paths by working around the value of P and other combinational logic. Sequential logic synthesis deals with optimizing false and multicycle paths in the design and modifying flops and other sequential components in the design.[12] Research work done by Satoshi Ohtake et al. further elaborates on accurately identifying false paths in an RTL design. They have proposed a novel methodology titled 'Mapping Point Preserving-Logic Synthesis (MPP-LS)' which maps path-to-path logical connectivity between the sequential logic in a circuit and distinguishes actual logical paths from redundant ones.[13] Front End Design further encompasses the following design processes:

Design entry: Each chip or IC is designed for a specific purpose set by the industry. This purpose is defined in the form of implementation logic on the chip. The objective of this stage is to identify this core logic and process settings along with the list of collaterals and desired requirements specified by the industry. Foundry collaterals for this stage typically include architectural recommendations, target frequency for the design, timing windows and corresponding waveforms, MMMC corners etc.

Logic Synthesis: The functionality defined in the previous stage is put into code during Logic Synthesis. This code is written in any Hardware Description Language (HDL) such as Verilog or VHDL. This RTL logic is passed over to EDA Synthesis tools such as Genus™ by Cadence® to obtain the Gate-level netlist as output.

Gate level Simulation: This is a post-synthesis validation procedure to verify the functionality of the generated verilog against industry expectations. This stage also involves generation and analysis of power, timing and density reports for the synthesized netlist and corresponding constraints.

Authors A. Kahng, J. Lienig et al. in their 2011 Springer publication titled "VLSI Physical Design: From Graph Partitioning to Timing Closure" have garnered a comprehensive research base on Back End VLSI Flow. Their work encompasses detailed study of the following Physical Design methodologies. The back-end design includes the following steps.[14]

Schematic Entry: Similar to the Design Entry stage in Front End flow, this stage involves reading in the logic design needed.
Pre Layout Simulation: The logic design extracted in the previous step is validated before moving forward to the layout stage. The simulation involves verification of the synthesized netlist against the schematic provided by the foundry with the help of the Cadence® UltraSim EDA tool.

Design Layout: After the logical equivalence of the synthesized netlist is verified, the design is ready to enter the layout stage. The Innovus™ and IC Compiler II EDA tools developed by Cadence® and Synopsys respectively are widely used for layout and subsequent optimization. The layout stage spans the following sequential methodologies:
• Floor Planning
• Power Planning
• Placement
• Clock Tree Synthesis
• Routing
• Timing Optimization

Extracted Simulation: The design obtained from first-cut P&R needs to be optimized for power, performance, timing and area while also being wary of the timing and physical violations present in the design. Quantus™, an extraction engine developed by Cadence®, aids in extraction of resistive and capacitive parasitics arising due to interconnect wires added to the layout while routing. This net delay information is stored in Standard Parasitic Exchange Format (SPEF) files and is further used for timing analysis. While clearing violating paths occurring in the timing report, it is typical for designers to try and optimize cell and net delay values in the layout.

The book "Fundamentals of Layout Design for Electronic Circuits" (Springer 2020) authored by J. Lienig and J. Scheible talks in depth about various library interfaces, design rule checks and resulting violations. Extensive research into different EDA tools, along with the different hierarchies and flows followed by each tool, is another highlight of this book.[15]

III. BACK END FLOW

A. Physical Design

A 2021 study by Dmitry Bulakh et al. published at the International Seminar on Electron Devices Design and Production was aimed at developing a Graphical User Interface (GUI) to visualize, control and interpret the various stages of Physical Design. The framework was developed in C++ using a few inherited classes from the cross-platform Qt library. Some .hpp, .dll and lib files can be provided as input along with layer map files, and the framework would then render a GDSII file as output.[17]

The main steps in Physical Design flow are:
• Post-synthesis Netlist Optimization
• Floor Planning
• Power Planning
• Placement (Pre-CTS)
• Clock Tree Synthesis (CTS)
• Routing
• Post Route Timing Optimization
• Physical Verification

Fig. 2. ASIC Design Flow

B. Design Collaterals

Gate Level Netlist (.vg): This is the output file generated after synthesis. It is the gate level representation of the design.

Technology file (.tech): This file is provided by the foundry / fabrication team. It provides technology specific information like physical and electrical characteristics of metal layers, vias and metal widths, spacing, pitch and routing design rules.

Logical libraries / Liberty files (.lib): This file is provided by the process foundry. It contains information regarding PVT requirements, net delays, cell delays, transition, recovery, removal, setup and hold time requirements. It also contains information about area of cell, leakage power, capacitance etc. LIB files are generated using one of the Composite Current Source (CCS), Non-linear Delay Model (NLDM) or ECSM methodologies. For smaller technology nodes, CCS is preferred. The design needs to be validated for certain PVT (Process, Voltage and Temperature) corners to ensure seamless functionality under even the harshest of conditions. Timing is different for different analysis views and corners. Hence, there is a .lib file for every PVT corner.

Library Exchange Format (.lef): This is provided by the foundry itself. The LEF is an abstract view of the cells. It contains information about cell geometries, routing and via placements.
Synopsys Design Constraints file (.sdc): Constraint files are generated during the synthesis phase in accordance with foundry requirements pertaining to timing, power, performance and area requirements of the design. These files also define the following: operating conditions, DRVs (max trans, fanout and capacitance), frequencies of source and generated clocks along with clock uncertainty and latency, multicycle and false paths etc., along with numerous other constraints.

Table Look Up (TLU): TLU is a binary file used for RC estimation and extraction, although the header is in ASCII format. TLU contains wire capacitance at different spacing and width in the form of a look-up table, which provides high accuracy and runtime benefits. This file provides the RC parasitics of metals per unit length, which are subsequently used to calculate net delay.

C. Floor Planning

As the name suggests, Floor Planning helps create a skeletal framework for the spatial locations of standard cells, macros, analog IPs and all other blocks on a circuit. The goal is to optimize the design layout on the given die area while keeping close tabs on probable congestion and density violations. Typically, floor planning is done to make the layout compact, with logically connected instances placed in close proximity to each other. This is done to make effective use of the routing resources available.

Fig. 3. Example of efficient floor plan

Goals for Floor Planning:
• Minimize the total chip area and dead space
• Minimize total wire length
• Minimize interconnection complexity
• Improve the performance by minimizing delay

Naushad Manzoor Laskar et al. have documented a comprehensive study of all the prominent Floorplan Representations and the means of achieving them in their 2015 publication titled "A Survey on VLSI Floorplanning: Its Representation and Modern Approaches of Optimization". This detailed research covers all aspects of floorplans and floorplanning algorithms, starting with the different ways of denoting a floorplan.[18] Different ways of representing a floor plan are as follows:
• Polish
Polish notation is widely used in the world of computing to read and express logic embedded in data structures, most prominently binary trees, in the form of equations. Such a notation is used to read the information conveyed by the sliced-up floorplan binary tree in post-fix form.

Fig. 4. Non-slicing v Slicing Floor Plan

• Sequence Pair
• Bounded Slicing Grid (BSG)
• Transitive Closure Graph (TCG)
• O-Tree
The O-Tree algorithm is one of the least computationally expensive algorithms, with a complexity of O(n). It is a local-search algorithm and hence deterministic by nature. Being deterministic means that this algorithm will try to optimize the layout on the basis of immediate short-term solutions, in contrast to a few greedy algorithms which are a bit more experimental in their approach and tend to be better at optimizing. Despite being one of the most computationally efficient algorithms, the O-Tree based approach may not always find the best possible solution owing to its deterministic nature. Research done in 2005 by Maolin Tang and Alvin Sebastian addresses this very issue and proposes a more greedy Genetic Algorithm (GA) to optimize the Floor Plan based on the O-Tree representation.[19]
• B* Tree
Similar to O-Tree, the B* algorithm showcases a computational complexity of O(n). It is widely regarded as one of the most efficient and flexible Floor Plan notations to optimize upon. It makes use of an ordered weighted binary tree, the root of which is located at the bottom left corner of the placement area at the coordinate (0,0). Once the root is fixed, the rest of the tree is built recursively, first populating the left branch and then the right.

Optimization
Once the initial Floorplan notation is decided upon, the process of optimizing the layout begins, with the aim of the smallest area and the most optimum core utilization.[18]
• Simulated Annealing
Simulated Annealing is the oldest Floorplanning algorithm and has been extensively used over the years. It can be used effectively with slicing as well as compacted notations like Polish, B* Tree etc. This algorithm can converge to a fairly optimal solution but is irregular in doing so. In modern day, this algorithm is used as
a preliminary benchmark for researchers to test newer algorithms on.[18]
• Genetic Algorithm
Soon after the success of the Simulated Annealing algorithm, the Genetic Algorithm (GA) was developed. GA optimization begins with arbitrary placement of blocks on the pre-defined die area. The Cost Function of this initial floorplan is calculated using some distance metrics. Then any two of the blocks are spatially swapped and a new revised cost function is deduced. This is then compared with the initial value and a decision is made whether the swapping has affected the layout positively or adversely. This process is recursively carried out until the most optimum value is obtained for the cost function.[18]
• Particle Swarm Optimization (PSO)
In PSO, each block is treated as an individual entity with the same weight. The premise of this algorithm is to spatially change the position of these individual blocks so that each block determines its own best position with respect to its nearest neighbours in the design. Research published in 2019 by S.B. Vinay Kumar et al. proposed an Adaptive PSO model which tunes the PSO model further by adding a weight factor to the block. All blocks are assigned weight values at the initial stage. Over successive iterations, as the block gets closer and closer to determining its optimum position, its weight and hence priority for optimization keeps on reducing. So all blocks will start with a high inertia value and end at a smaller value towards the completion of the optimization process. Such an adaptive architecture is more likely to give better results than GA and traditional PSO models, as claimed by the study.[20]

As all the above discussed algorithms point out, optimization is a recursive process and hence may take up a lot of time to converge on an optimal floorplan. To reduce the time taken by traditional floorplan algorithms, Yanling Zhou et al. have proposed a quicker floorplanning method which involves breaking the full initial netlist into smaller parts and carrying out floorplanning for each individual sub-netlist in a parallel threaded manner, thereby saving time on converging to the optimal floorplan.[21]

As discussed earlier, the floor plan lays out a framework for the physical design to be built upon. While the process of defining a floor plan may be entirely different for designs with different levels of hierarchy, there are a few common fields to be defined to ensure an efficient design layout. To begin with, the floor plan defines the block dimensions, thereby setting the die area. To make optimum use of this die area, it is critical to identify all the logic, IPs, macros, I/Os and memories which need to be placed in the design. For placement of all the aforementioned components, the floor plan should take into account the possibility of cell density issues or routing congestion that may arise owing to poor or incorrect placement. The designer should also aim to make the layout compact so as to use routing resources efficiently as well as to avoid wastage of die area. Core area utilization should also be taken into account while defining the floor plan. Most industries target 60-70% initial core utilization to generate margin for timing optimization at a later stage. A compact floor plan presents a few advantages related to the speed of operation of the chip: the more compact the design, the closer the placement of on-chip components, the fewer the routing resources used, and the shorter the interconnect length and subsequent net delay, thereby reducing net latency and increasing the speed of operations. This gives rise to an active trade-off between speed of design and resulting routing congestion.

Common steps spanning the Floor Plan stage include:
• Partitioning
• Defining Block Dimensions
• Pin Placement
• Adding Decap, Tap and End Cap cells
• Macro Placement
• Adding Routing and Placement blockages
• Adding IO buffers

Fig. 5. Adding Decap cells

D. Power Planning

Power Planning is the process involving power grid creation to facilitate equal distribution of energy to all parts of the design.

What creates the Power Grid? There are 3 levels of power distribution involved:
1. Rings: Carry VDD and VSS around the chip
2. Stripes: Carry VDD and VSS from the Rings across the chip
3. Rails: Connect VDD and VSS from chip level to standard cell level

Steps involved in this stage:
• Width, pitch and offset dimensions of power stripes wrt each metal layer in accordance with provided metal stack.
• Block and I/O Power connection at top level using power rings, bumps and stripes.
• PG connection at standard cell and block level via power rails.
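The stripe-dimension step listed above can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model that is not taken from the paper or from any EDA tool: vertical stripes on a single metal layer, alternating between VDD and VSS, with `offset` measured from the left core edge to the first stripe and `pitch` measured between the left edges of consecutive stripes. The names `Stripe` and `plan_vertical_stripes`, and all parameter names, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stripe:
    net: str        # "VDD" or "VSS"
    x_left: float   # left edge of the stripe, in microns
    x_right: float  # right edge of the stripe, in microns


def plan_vertical_stripes(core_width: float, width: float,
                          pitch: float, offset: float) -> List[Stripe]:
    """Place alternating VDD/VSS vertical stripes across the core.

    offset: distance from the left core edge to the first stripe's left edge
    pitch:  distance between the left edges of consecutive stripes
    width:  stripe width; must be smaller than the pitch
    """
    if width <= 0 or pitch <= width or offset < 0:
        raise ValueError("need 0 < width < pitch and offset >= 0")
    stripes: List[Stripe] = []
    index = 0
    x = offset
    # Keep adding stripes while the whole stripe still fits inside the core.
    while x + width <= core_width:
        net = "VDD" if index % 2 == 0 else "VSS"
        stripes.append(Stripe(net, x, x + width))
        index += 1
        x = offset + index * pitch  # recompute from index to avoid drift
    return stripes


if __name__ == "__main__":
    # Illustrative numbers only: 100 um core, 2 um stripes, 20 um pitch.
    for stripe in plan_vertical_stripes(100.0, 2.0, 20.0, 5.0):
        print(stripe)
```

In a real flow these values would come from the metal stack and the IR-drop budget for each layer; the sketch only shows how width, pitch and offset together fix the stripe locations.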
Fig. 6. Power Planning

Power Planning is a very critical stage in Physical Design as it defines the Power/Ground mesh for any layout. The mesh needs to be built considering the power requirement for all pins and macros in the design; standard cell placement and the possibility of congestion or high IR drop must also be taken into consideration while doing power and bump planning for any design. Any high voltage drop in the layout caused by EM/IR imbalance may cause circuit failure. Zhu Qing et al., in their publication titled "Simulation and Planning Method for On-Chip Power Distribution – An Industry Perspective", have studied the different parameters affecting power distribution and have come up with a set of steps to be followed while power planning to minimize the chances of fatal voltage drop across the design.[22] The proposed steps are as follows:
• Categorize all standard cells in the design into one of two groups: those giving rise to severe IR drop and those with nominal IR drop.
• Based on the IR drop metrics obtained in the previous step, draw a schematic for the power grid, with the horizontal and vertical layers being represented by a metal resistor. Compute the total RC parasitic offered by this schematic grid arrangement. For better accuracy, we may even include via parasitics in the calculation.
• Add a current source to each cell of the mesh, to compute the magnitude of switching current originating from that cell.
• The RC parasitic and switching current information thus obtained is used to calculate dynamic power dissipation for the schematic.
• If the power dissipation numbers are below a pre-defined threshold, we can realize the schematic grid using VDD and VSS stripes.

Fig. 8. Power Dissipation

Researchers have also tried integrating Power Planning into the Floor Planning stage, thereby letting the optimizing algorithms optimize the floor plan while taking into consideration the parallel creation of an apt power mesh.[23,24] Shuo Zhou et al. documented the integration of power planning with the Bounded Slicing Grid (BSG) floorplan representation as early as the year 2001. The proposed algorithm aimed to complete the two planning stages simultaneously on a BSG representation with a view to optimize placement area and power resources.[23] Han Liying et al. in the year 2009 documented an integrated power and floor planning algorithm based on the Genetic Algorithm (GA) method of optimization.[24]
E. Placement
Based on the ground-work defined in the Floor Plan stage, the objective of the Placement stage is to optimize timing and routing within the design. Timing for any block is a function of the cell and net delay between all logically connected paths in the design. Cell delay is a property of the cell. By manipulating the VT flavor of an instance, its cell delay can be changed: the lower the threshold voltage (VT), the lower the cell delay for the instance. Net delay, on the other hand, is a function of the parasitics corresponding to interconnects in the design. These RC parasitics can be derived either from Wire Load Models (WLMs), which are in the form of look-up tables provided by the foundry, or from Virtual Route (VR) parameters. The Virtual Route model tends to provide more accurate parasitic extraction than WLMs, leading to more refined timing calculations.
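To illustrate how interconnect RC parasitics turn into net delay, here is a minimal sketch using the Elmore delay, a standard first-order estimate. The paper does not name a specific delay model, so the choice of Elmore delay is an assumption, as are the function names `elmore_delay` and `uniform_wire` and every numeric value.

```python
def elmore_delay(r_drv, segments, c_load):
    """Elmore delay in seconds from a driver to the far end of an RC ladder.

    r_drv    : driver output resistance in ohms (a property of the cell)
    segments : list of (r_ohm, c_farad) wire pieces, ordered driver -> sink;
               each piece's capacitance is lumped at its far end
    c_load   : input pin capacitance of the receiving cell in farads
    """
    if not segments:
        return r_drv * c_load
    delay = 0.0
    r_path = r_drv
    last = len(segments) - 1
    for idx, (r_seg, c_seg) in enumerate(segments):
        r_path += r_seg  # resistance from the source up to this node
        c_node = c_seg + (c_load if idx == last else 0.0)
        delay += r_path * c_node
    return delay


def uniform_wire(length_um, r_per_um, c_per_um, pieces=10):
    """Split a uniform wire into equal RC pieces (hypothetical helper)."""
    step = length_um / pieces
    return [(r_per_um * step, c_per_um * step)] * pieces


# Made-up per-micron parasitics; a real flow would take them from a
# wire load model or from virtual-route extraction.
short_net = elmore_delay(200.0, uniform_wire(100.0, 0.5, 0.2e-15), 1e-15)
long_net = elmore_delay(200.0, uniform_wire(200.0, 0.5, 0.2e-15), 1e-15)
```

Because both wire resistance and wire capacitance grow with length, the wire's own contribution grows roughly quadratically, which is why placement that shortens nets has a direct effect on timing.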
Fig. 7. Schematic Power Grid

Need for placement:
Key factor in determining performance of the circuit: As it indirectly dictates the routing length of wires, it plays a role in determining the delay associated with each wire.
Determines routing ability of the design: A well placed design will have no problems with routing.
Decides distribution of heat on the die surface; uneven temperature profiles can lead to reliability and timing issues.
Power consumption also gets affected by placement.
Goal for placement stage:
• optimize routing resources
• optimize die utilization
• reduce routing congestion and hotspots
F. Clock Tree Synthesis
Once all the core logic has been placed during the previous stage, we start building paths for transmission of the clock signal throughout the block. Clock Tree Synthesis is a very critical process owing to the numerous functional failures that may arise out of incorrectly built clock architectures. Misalignment in clock paths may cause setbacks such as failure to meet timing and skews, false outputs, data corruption or even chip failure. CTS involves building accurate clock paths and adding buffers to balance these clock paths by minimizing clock skew. This addition of buffers also helps in fixing hold slack violations. At the time of defining the clock tree, the signal is derived from the source clock for the block and traverses numerous splits in the tree to ultimately reach the CK pins of flops. If any of these CK endpoint pins are defined to be 'Don't touch' or 'ignore pins', then they are bypassed during clock tree synthesis. Alternatively, if any pin is defined to be a 'through pin', it is assumed that the clock path is already built and signal input to the pin is provided.
Types of skews:
• Global skew achieves zero skew between two synchronous pins without considering logic relationship.
• Local skew achieves zero skew between two synchronous pins while considering logic relationship.
• If clock is skewed intentionally to improve setup slack then it is known as useful skew.
Inputs to CTS stage:
• Placement Data
• Clock Specification File
– Maximum and Minimum insertion delay
– Target Skew
– Maximum transition value
– Non Default Rules (NDR)
– Auto CTS root pin
– Preferred metal layers for clock
– Type of buffers
– Latency
– Maximum fanout
– Maximum capacitance value, etc.
Steps involved in CTS:
• Synthesize the Clock Tree
• Optimize the Clock tree. This is done by
– Buffer relocation
– Buffer sizing
– Gate relocation
– Gate sizing
– Improve skew
– Delay insertion
• Perform inter-clock balancing
– Delay balancing has to be done between two flip-flops
– Clock groups between which balancing has to be done must be specified
While working on lower process nodes, it is observed that with the increase in sequential logic coupled with shorter and more dominant wire delays, it becomes increasingly complex and critical to balance clock skew. While building clock architecture, we generally have a source clock which distributes the clock signal throughout the block via clock nets defined using CTS root pins. Clock specifications like max capacitance and max transition constraints, along with max and/or min insertion delays, are defined for every root pin and are thus implemented across all clock nets originating from the pin. Guirong Wu et al., in the year 2009, proposed a more efficient clock splitting methodology to build better clock architectures. The proposed methodology talks about splitting the main source clock into multiple pseudo clock sources at transistor level based on the number of fanouts for every split. This would ultimately help with DRVs as well, owing to the split distribution of fanouts, along with more balanced clock skew.[25]
Siong Kiong Teng et al. have proposed another "Regional Clock-Splitting" methodology in their 2010 publication titled "Regional Clock Gate Splitting Algorithm for Clock Tree Synthesis".[26] The said methodology involves the following steps:
• Clock Gate Marking
Upon placement of standard cells, macros and other physical cells in the layout, all Clock Gating (asynchronous) cells are identified. Each of these Clock Gating (CG) cells is allotted a bounding box that encompasses all direct fanouts related to that cell. This bounding box is demarcated to compare the skews across clock nets affiliated to each of these cells. Simply put, a larger bounding box area implies that the fanouts are located farther away from the CG cell, thereby drawing additional interconnect wire length and giving rise to higher skew.
• Clock Gate Splitting
In the event where any CG cell is associated with any setup violations and the area encompassed by the corresponding bounding box is greater than a pre-defined threshold, the clock gate cell will undergo splitting. While splitting, the bounding box will split at half the original length in its dominant direction, i.e. if any bounding box is horizontally oriented, it will split at X/2 distance keeping its Y metric stable, and vice-versa. This splitting is recursively carried out until the bounding box area meets the threshold criteria defined earlier. Once such a stage is achieved, the parent clock gate will split 'n' times, where 'n' is the number of times the corresponding bounding box had split in order to meet the threshold. The study claims to have achieved reduced post-split insertion delay along with a robust clock architecture with balanced skews.[26]
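The recursive bounding-box split described above can be sketched as follows. This is only an illustration of the idea as summarized here, not the implementation from [26]. It assumes that the fanouts of a clock gate are given as (x, y) sink coordinates, that the bounding box is recomputed from the sinks in each half after every cut, and that a sink lying exactly on the cut line goes to the lower half. The names `bbox`, `bbox_area` and `split_clock_gate` are hypothetical.

```python
def bbox(sinks):
    """Tight bounding box (x0, y0, x1, y1) around a list of (x, y) sinks."""
    xs = [x for x, _ in sinks]
    ys = [y for _, y in sinks]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_area(sinks):
    x0, y0, x1, y1 = bbox(sinks)
    return (x1 - x0) * (y1 - y0)


def _split(sinks, max_area):
    x0, y0, x1, y1 = bbox(sinks)
    width, height = x1 - x0, y1 - y0
    if width * height <= max_area:
        return [sinks]  # box already meets the area threshold
    if width >= height:
        # Horizontally oriented box: cut at half its X length.
        cut = x0 + width / 2.0
        low = [p for p in sinks if p[0] <= cut]
        high = [p for p in sinks if p[0] > cut]
    else:
        # Vertically oriented box: cut at half its Y length.
        cut = y0 + height / 2.0
        low = [p for p in sinks if p[1] <= cut]
        high = [p for p in sinks if p[1] > cut]
    return _split(low, max_area) + _split(high, max_area)


def split_clock_gate(sinks, max_area, has_setup_violation=True):
    """Return one group of sinks per clock gate after splitting."""
    if max_area < 0:
        raise ValueError("max_area must be non-negative")
    if not has_setup_violation:
        return [list(sinks)]  # only violating gates are split
    return _split(list(sinks), max_area)


# Four sinks spread along X; the area threshold of 5.0 is made up.
groups = split_clock_gate([(0, 0), (10, 0), (0, 2), (10, 2)], max_area=5.0)
```

Each returned group would be driven by its own copy of the parent clock gate. Under this interpretation the number of cuts performed is `len(groups) - 1`; how the paper counts 'n' exactly is not stated here.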
G. Routing
The Physical Design flow should have completed placement of standard cells, IPs, macros and physical cells, and built the entire clock architecture, before going for routing. Simply put, routing provides net or wire connectivity between all instances on the layout. Routing creates a grid for transmission of all signals throughout the design. Hence the interconnect nets it creates are also called "signal nets". Similarly, the CTS stage creates "clock nets" and Power Planning generates "power nets". The aim of routing is to facilitate appropriate connectivity between instances in the design in accordance with the metal stack approved by industry, while also trying to optimize the design with respect to possible congestion hotspots. There are two stages within routing methodology:
• Global Routing
Global Routing, also called Early Routing in some cases, is the stage wherein routing metal is allocated to appropriate layers along with channel track alignment. It helps build a general topology for the resources to be used during Detailed Routing.
• Detailed Routing
Detailed Routing, also known as Nano Route in some cases, makes use of the foundation laid down by Early Routing to actually route all interconnects in the design. Detailed routing weighs the trade-off between conservation of routing resources and LPA congestions, and optimizes its function accordingly. DFM and congestion checks are carried out for the same purpose.
Hao Tang et al. have published a comprehensive survey on the Steiner Tree Algorithm for Global Routing in their 2020 research work titled "A Survey on Steiner Tree Construction and Global Routing for VLSI Design". Steiner Tree is an extension of the Minimum Spanning Tree problem that is commonly used in processing methodologies. The objective of this algorithm is to optimize the routing graph or network to have minimum traversal cost. All vertices in the said graph that constitute the minimum cost tree are referred to as "demand points". All such demand points are connected via horizontal and vertical lines on a mesh referred to as the "underlying graph".[27,28]

Fig. 9. Dominant Points and the Underlying Graph

The 1-Steiner optimization is further implemented on these underlying graphs, wherein a few common points are identified as "1-Steiner Points". These 1-Steiner points are chosen so as to minimize the routing distance (horizontal and vertical) between the dominant points.[28]

Fig. 10. Identification of 1-Steiner points

Just like the Steiner Tree algorithm, other minimum distance algorithms like PRIM, Bounded PRIM and Bounded Radius Spanning Tree have also been implemented to optimize routing for a layout.[28]
Inputs to Routing stage:
• Netlist with location of blocks and location of pins (after CTS has been completed)
• Timing budget for critical net
• Technology File
• TLU+ File (Commonly included along with the Technology File .tf)
• SDC
Checklist before Routing:
• Placement completed
• CTS completed
• Power and ground nets routed
• Estimated congestion is acceptable
• Estimated Timing – acceptable (0 ns slack)
• Estimated max cap/trans – no violations
Routing Congestion: When designing chips on lower process nodes and at critical utilization factors, it may happen that routing resources get crowded in a certain area. Such congestion may be caused by a lack of routing resources or tracks, or simply by a lack of adequate area to establish the routes. This is a common cause for Density-related DRCs and even EM-IR failures on-chip.
H. Physical Verification
Physical verification is a process whereby the complete Layout design is verified via EDA software tools to ensure correct
logical functionality and manufacturability.
Physical Verification involves the following validation checks:
• Design Rule Check
DRCs are of two kinds: Base DRCs and Metal layer DRCs. Base DRCs need to be cleared in the floorplan or placement stage of the flow. These may include placement legalization issues, track misalignment violations, density and utilization requirements not being met, incorrect instance orientation and physical cell violations. Metal layer DRCs, on the other hand, can be cleared in the Post-Route database as well. These issues range from overlapping vias to metal shorts to even minimum spacing violations between instances and nets.
• Layout versus Schematic
In this process, the final streamout layout GDS is compared against the golden schematic netlist to check for any mismatches or missing instances. All connectivity issues (i.e. opens and shorts) are also checked and fixed while cleaning LVS for a design.
• Antenna Check
• Electrical Rule Check

Fig. 11. Back End Flow

CONCLUSION
In this paper, we have garnered and compiled a comprehensive study on all the different aspects and methodologies encompassing any robust ASIC Back End Flow. As we move further onto a path that supersedes Moore's Law, it is observed to be increasingly complex to develop even lower process nodes owing to the difficulties in power management, timing closure and the conservative routing versus congestion trade-off.[29] While automation aided by EDA tools plays a key role in simplifying and accomplishing design scope, their optimization could be hindered by the aforementioned challenges. Research work compiled by Inamul Hussain et al. captures the gist of these challenges and points towards the use of the CNTFET logic family to design chips at lower technology nodes.[6] However, the ever-developing EDA tools, prominently aided by Cadence Design Systems and Synopsys Inc., are able to provide adequate coverage to even use MOSFETs at such lower processes. Ng et al. have proposed a benchmark for gauging the accuracy and QoR obtained from these EDA tools, which measures the tool performance as well as post-optimization utility in formulating and displaying results.[8] However, throughout our study it was observed that a well-curated Physical Design flow is as important as the computational calibre of EDA tools to achieve optimum results and desired functionality.[30,31]

The input to such a flow is in the form of a synthesized Verilog netlist supported by a set of design constraints. Research proposed by Cortadella et al. and Satoshi Ohtake et al. captures the different ways of optimizing the synthesis process with critical path targeting methodologies like High-level Synthesis and Mapping Point Preserving-Logic Synthesis respectively.[12,13] These synthesized .vg and .sdc files, along with foundry collaterals like timing libraries, LEF files etc., are provided as input to the PD flow, which is covered in great detail by A. Kahng et al. in their publication titled "VLSI Physical Design: From Graph Partitioning to Timing Closure".[14] The introductory stage to Physical Design is Floor Planning. Floor planning involves laying a ground-work for the instances to be placed upon. There are a few different ways to represent a floorplan, of which the O-Tree and B*Tree are widely regarded as the most efficient. Naushad Manzoor Laskar et al. have compiled a comprehensive survey on all the different Floor Plan representations and algorithms like Simulated Annealing (SA), Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) that are used for optimizing the floorplan for the said representations.[18] Researchers have also tried integration of the Power Planning stage into Floor Plan optimization. Once the floorplan is decided upon and the Power/Ground (PG) mesh is generated, the design enters the P&R stage, which comprises Placement, Clock Tree Synthesis (CTS) and Routing. Based on the ground-work defined in the Floor Plan stage, the objective of the Placement stage is to optimize timing and routing within the design. In CTS, the clock hierarchies are defined throughout the design for timely and balanced propagation of the clock signal. However, balancing clock skews at lower process nodes is a complex yet critical task which even some EDA tools may find challenging to optimize. Siong Kiong Teng et al. have proposed an efficient "Clock Gate Splitting" algorithm to allow a more robust clock architecture with balanced skews and insertion delays.[26] Once the clock hierarchy is established, the design is ready to enter the Routing phase, wherein interconnects between on-chip instances are established in a two step process: the superficial Global Routing and the more in-depth Detailed Routing. As the main aim of the routing phase is to optimize and conserve the routing resources throughout the design, a fair few minimum distance optimization algorithms come into play. Of these, the Steiner Tree Algorithm has been studied and documented in detail during the course of this survey.

The design methodologies, optimization algorithms and ASIC flow information garnered and documented in this review may serve as a comprehensive knowledge-point for improving upon the existing methodologies and practices.
REFERENCES
[1] Schaller, R. R. (1997). Moore's law: past, present and future. IEEE Spectrum, 34(6), 52–59.
[2] Ethan R. Mollick (2006). Establishing Moore's Law. IEEE Annals of the History of Computing, 28(3), 62–75, August 2006.
[3] Vishesh S, Manu Srinath et al. (2017). Case Study of 32nm, 22nm, 14nm and 10nm Semiconductor Process Technologies. International Journal of Advanced Research in Computer and Communication Engineering, Vol. 6, Issue 4, April 2017.
[4] J. K. Lorenz et al. (2018). Process Variability for Devices at and beyond the 7 nm Node. ECS J. Solid State Sci. Technol. 7, P595.
[5] Dexter Johnson (2021). IBM Introduces the World's First 2-nm Node Chip. New chip milestone offers greater efficiency and performance. IEEE Spectrum Article, 06 May 2021.
[6] Hussain, Inamul; Chaudhury, Saurabh (2020). A comparative study on the effects of technology nodes and logic styles for low power high speed VLSI applications. International Journal of Nanoparticles, 12.
[7] Whiteley, S. R.; Kawa, J. (2019). Progress Toward VLSI-Capable EDA Tools for Superconductive Digital Electronics. 2019 IEEE International Superconductive Electronics Conference (ISEC).
[8] Ng, A.; Markov, I. L. (2005). Toward Quality EDA Tools and Tool Flows Through High-Performance Computing. Sixth International Symposium on Quality of Electronic Design (ISQED'05), San Jose, CA, USA, 21-23 March 2005.
[9] Campbell, M. C. (1998). Evaluating ASIC, DSP, and RISC architectures for embedded applications. Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing, Orlando, FL, USA, 30 March-3 April 1998.
[10] Traian Tulbure (2011). A Dynamic Reconfigurable CPLD Architecture for Structured ASIC Technology. International Symposium on Applied Reconfigurable Computing, ARC 2011: Reconfigurable Computing: Architectures, Tools and Applications, pp. 296-301.
[11] Balboni, A.; Valenti, L. (1996). ASIC design and FPGA design: A unified design methodology applied to different technologies. Field-Programmable Logic: Smart Applications, New Paradigms and Compilers.
[12] Cortadella, Jordi; Galceran-Oms, Marc; Kishinevsky, Mike; Sapatnekar, Sachin S. (2015). RTL Synthesis: From Logic Synthesis to Automatic Pipelining. Proceedings of the IEEE.
[13] Ohtake, Satoshi; Iwata, Hiroshi; Fujiwara, Hideo (2010). 13th IEEE Symposium on Design and Diagnostics of Electronic Circuits and Systems, Vienna, Austria.
[14] A. Kahng, J. Lienig, I. Markov, J. Hu (2011). VLSI Physical Design: From Graph Partitioning to Timing Closure. Springer.
[15] J. Lienig, J. Scheible (2020). Chap. 3.3: Mask Data: Layout Post Processing. Fundamentals of Layout Design for Electronic Circuits. Springer.
[16] Mehrotra, Alok; Van Ginneken, Lukas P P P; Trivedi, Yatin. Design flow and methodology for 50M gate ASIC. IEEE Conference Publications.
[17] Bulakh, D.; Korshunov, A.; Datsuk, A. (2021). An Academic Framework for IC Physical Design Algorithms Development. 2021 International Seminar on Electron Devices Design and Production (SED).
[18] Naushad Manzoor Laskar; Sen, Rahul; Paul, P. K.; Baishnab, K. L. (2015). A survey on VLSI Floorplanning: Its representation and modern approaches of optimization. 2015 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), Coimbatore, India, 19-20 March 2015.
[19] Maolin Tang; Alvin Sebastian (2005). A Genetic Algorithm for VLSI Floorplanning Using O-Tree Representation. Workshops on Applications of Evolutionary Computation, EvoWorkshops 2005: Applications of Evolutionary Computing.
[20] Vinay Kumar, S. B.; Rao, P. V.; Singh, Manoj Kumar (2019). Optimal floor planning in VLSI using improved adaptive particle swarm optimization. Evolutionary Intelligence.
[21] Zhou, Yanling; Yan, Yunyao; Yan, Wei (2017). A method to speed up VLSI hierarchical physical design in floorplanning. 2017 IEEE 12th International Conference on ASIC (ASICON), Guiyang, 25-28 October 2017.
[22] Zhu, Qing K.; Bars, Vincent (2009). Simulation and planning method for on-chip power distribution – An industry perspective. 2009 12th International Symposium on Design and Diagnostics of Electronic Circuits & Systems, Liberec, Czech Republic, 15-17 April 2009.
[23] Shuo Zhou; Sheqin Dong; Xiaohai Wu; Xianlong Hong (2001). Integrated floorplanning and power supply planning. ASICON 2001: 4th International Conference on ASIC Proceedings (Cat. No.01TH8549), Shanghai, China, 23-25 Oct. 2001.
[24] Liying, Han; Hongmei, Tang; Ruoyan, Zhang; Cunshan, Zhang; Hongdong, Zhao (2009). An Optimization Method for Power/Ground Network Based on Genetic Algorithm. 2009 International Conference on Computer and Communications Security (ICCCS), Hong Kong, 5-6 December 2009.
[25] Guirong Wu; Song Jia; Yuan Wang; Ganggang Zhang (2009). An efficient clock tree synthesis method in physical design. 2009 IEEE International Conference of Electron Devices and Solid-State Circuits (EDSSC), Xi'an, 25-27 December 2009.
[26] Teng, Siong Kiong; Soin, Norhayati (2010). Regional clock gate splitting algorithm for clock tree synthesis. 2010 IEEE International Conference on Semiconductor Electronics (ICSE2010), Malacca, Malaysia, 28-30 June 2010.
[27] Tang, Hao; Liu, Genggeng; Chen, Xiaohua; Xiong, Naixue (2020). A Survey on Steiner Tree Construction and Global Routing for VLSI Design. IEEE Access.
[28] Du, D.-Z.; Lu, B.; Ngo, H.; Pardalos, P. M. (2001). Steiner Tree Problems. In: Floudas, C. A.; Pardalos, P. M. (eds) Encyclopedia of Optimization. Springer, Boston, MA. https://doi.org/10.1007/0-306-48332-7_489
[29] J. Ganesh Prasad; S. R. Karbari; S. Ammikkallingal; S. K. Bellal (2018). Analysis, Physical Design and Power Optimization of Design Block at Lower Technology Node. 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), pp. 732-737. doi: 10.1109/RTEICT42901.2018.9012556
[30] Y. Zhou; Y. Yan; W. Yan (2017). A method to speed up VLSI hierarchical physical design in floorplanning. 2017 IEEE 12th International Conference on ASIC (ASICON), pp. 347-350. doi: 10.1109/ASICON.2017.8252484
[31] Jess, J. (2008). Physical Design and Validation. In: Lauwereins, R.; Madsen, J. (eds) Design, Automation, and Test in Europe. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6488-3_25
