Mentor Graphics Tutorial: Schematic Capture, Simulation, & Placement/Routing
1.0 Introduction
This tutorial demonstrates a simple VLSI circuit design process, from concept to chip
layout, for an 8-bit Modified Booth Multiplier on a 0.5 µm process using software from
Mentor Graphics Corp. The topics covered in this tutorial include schematic capture &
design, simulation, and placement & routing.
2.0 Schematic Capture & Design
The specification for the design calls for an 8-bit unsigned multiplier that accepts two 8-bit
unsigned inputs and produces a 16-bit unsigned result. Three additional control signals are
required: DONE, START, and CLOCK. The implementation presented in this tutorial meets these
baseline requirements and adds a RESET control signal. In addition, the multiplier accepts
signed inputs and therefore performs signed multiplication. We use Booth's Modified Algorithm
as the underlying architecture for the design because it produces a result quickly and
reliably. To clarify how the design is derived, a brief description of Booth's Modified
Algorithm follows.
2.1 Booth's Modified Algorithm
On average, Booth's Modified Algorithm produces a result in approximately half the time of
the traditional add-and-shift multiplier. This is because it examines strings of three bits
at a time, with a one-bit overlap between successive comparisons, to decide what to do next.
Since one bit of each string of three bits overlaps with the previous triplet, two new bits
are effectively considered during each clock cycle. For an 8-bit multiplier, this means the
calculation can be performed in as few as 4 clock cycles, excluding additional control
states. The following is a more formal definition of the Modified Booth Algorithm. Let $x$ be
the multiplier and $m$ be the multiplicand, and assume $x$ and $m$ are $n$-bit signed binary
numbers. Let two bits of $x$ plus the last bit of the previous pair form the triplet $x_y$,
which can be represented by the following vector:

$$x_y = \left(x_{2y+1},\; x_{2y},\; x_{2y-1}\right) \qquad (1)$$

where $y = 0, 1, 2, \ldots, n/2 - 1$; $x_{2y+1}$ is the first bit of the triplet, $x_{2y}$ is
the second bit of the triplet, and $x_{2y-1}$ is the overlapped bit from the previous triplet.
Letting $x_i$ be the $i$-th bit of $x$ and $x_{-1} = 0$, the two's complement value of $x$ can
be written as

$$
\begin{aligned}
x &= -x_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} x_i\,2^{i} && (2)\\
  &= \sum_{y=0}^{n/2-1} 2^{2y}\left(-2x_{2y+1} + x_{2y} + x_{2y-1}\right) && (3)
\end{aligned}
$$

Multiplying both sides by $m$ gives the product:

$$
\begin{aligned}
x \cdot m &= \sum_{y=0}^{n/2-1} 2^{2y}\left(-2x_{2y+1} + x_{2y} + x_{2y-1}\right) m && (4)\\
          &= \sum_{y=0}^{n/2-1} 2^{2y}\,\phi(x_y, m) && (5)
\end{aligned}
$$
where $\phi(x_y, m)$ represents the Modified Booth recoding function and is defined by the
piecewise function:

$$
\phi(x_y, m) =
\begin{cases}
0,   & x_y = (0,0,0)\\
m,   & x_y = (0,0,1)\\
m,   & x_y = (0,1,0)\\
2m,  & x_y = (0,1,1)\\
-2m, & x_y = (1,0,0)\\
-m,  & x_y = (1,0,1)\\
-m,  & x_y = (1,1,0)\\
0,   & x_y = (1,1,1)
\end{cases}
\qquad (6)
$$
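The recoding defined above can be checked numerically. The following Python sketch splits an n-bit two's complement multiplier into its triplets, applies the recoding function, and evaluates the sum in equation (5); it models the arithmetic only, not the schematic, and the function names are illustrative.

```python
def booth_digit(x2y1: int, x2y: int, x2ym1: int) -> int:
    """Recoded digit -2*x_{2y+1} + x_{2y} + x_{2y-1}, one of -2..2 (equation 3)."""
    return -2 * x2y1 + x2y + x2ym1


def phi(triplet: tuple[int, int, int], m: int) -> int:
    """Modified Booth recoding function phi(x_y, m) from equation (6)."""
    return booth_digit(*triplet) * m


def triplets(x: int, n: int = 8):
    """Yield x_y = (x_{2y+1}, x_{2y}, x_{2y-1}) for y = 0 .. n/2 - 1 (equation 1)."""
    bits = x & ((1 << n) - 1)  # n-bit two's complement pattern of x

    def bit(i: int) -> int:
        return 0 if i < 0 else (bits >> i) & 1  # x_{-1} = 0

    for y in range(n // 2):
        yield (bit(2 * y + 1), bit(2 * y), bit(2 * y - 1))


def booth_product(x: int, m: int, n: int = 8) -> int:
    """Evaluate equation (5): sum over y of 2^(2y) * phi(x_y, m)."""
    return sum((1 << (2 * y)) * phi(t, m) for y, t in enumerate(triplets(x, n)))


# Example: x = 5 recodes to digits (1, 1, 0, 0), so 5 * 3 = 3 + 4 * 3.
print([booth_digit(*t) for t in triplets(5)])  # [1, 1, 0, 0]
print(booth_product(5, 3))                     # 15
```

Looping `booth_product(x, m) == x * m` over all 8-bit signed pairs is a quick way to confirm that the four recoded digits reproduce every product.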
The Mentor Graphics Design Architect tool is used in this tutorial for schematic capture
and design. Because of the nature of schematic capture, the design is organized as a
hierarchy that uses encapsulation and abstraction to make it more modular and
comprehensible. The ADK libraries contain all of the standard cells needed to build each
functional unit of the circuit, so they are used extensively. The top-level circuit
schematic, presented in figure 1, consists of 7 primary functional units. In total, the
design comprises 11 functional units.
A bus does not have to be named when two symbols with the same bus width are to be
connected. Ports (inputs/outputs), GND & VDD, basic logic gates, flip-flops, transistors,
pads, etc. can be added by navigating to the ADK libraries under the Libraries pull-down
menu, which displays the ADK libraries palette on the right side of the screen. This
concludes the background needed to reproduce the multiplier presented in this tutorial.
The remainder of this section presents each functional unit along with a brief description.
2.2.1 The Control Unit
Figure 2 presents the Design Architect circuit schematic for the control unit. The main
control unit is implemented as a finite state machine consisting of eight states.
CONTROL STATE   STATE DEFINITION
000             CLEAR / WAIT FOR RDY SIGNAL
001             RDY ASSERTED / LOAD MULTIPLICAND
010             LOAD MULTIPLIER
011             ADD RECODED MULTIPLIER AND SHIFT BY 2
100             ADD RECODED MULTIPLIER AND SHIFT BY 2
101             ADD RECODED MULTIPLIER AND SHIFT BY 2
110             ADD RECODED MULTIPLIER AND SHIFT BY 2
111             WAIT FOR RDY SIGNAL AND ASSERT DONE
The following is a brief explanation of the derivation of the state table listed in figure 3.
Upon initialization, the state of the multiplier is unknown. To put the multiplier in a
known state, the external reset signal must be asserted, which clears the contents of the
state machine as well as the result register. The multiplier then enters state 0 and stays
there until the ready signal is asserted. When the ready signal is asserted, the multiplier
enters state 1 and loads the multiplicand into an 8-bit storage register for the recoded
multiplier calculations. At this time, the state machine asserts the enable signal so that
the multiplicand is loaded, and it de-asserts the enable signal on the next clock edge.
During the next clock cycle the state machine enters state 2, where it loads the multiplier
from the input into the lower byte of the result register. Because of external pin
limitations, the two 8-bit operands (the multiplicand and the multiplier) are loaded
sequentially over the same 8-bit input bus. The next four states (3 through 6) produce the
product: in each clock cycle, the appropriate recoded multiplier is added to the upper byte
of the result register, and the register is then shifted right by 2 bits. In state 7, the
done signal is asserted and the content of the result register is stable and valid. The
multiplier stays in this state, with a valid output, until the ready signal is asserted
again. The figure below presents a timing diagram of the circuit.
[Timing diagram: states S0 through S7; signals CLOCK, A (7:0), B (7:0), RESET, DONE, READY, R (15:0)]
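The state sequence described above can be summarized as a next-state function. The Python sketch below is a behavioral model, not the gate-level control unit of figure 2. Two details are assumptions rather than statements from the text: RESET is modeled as synchronous, and asserting READY in state 7 is assumed to return the machine to state 1 (this matches the seven-cycle-per-multiplication timing used in the simulation, but the actual state diagram may differ).

```python
def next_state(state: int, reset: int, ready: int) -> int:
    """Hypothetical next-state logic for the eight-state control unit."""
    if reset:
        return 0                  # RESET forces state 0 (modeled as synchronous)
    if state == 0:
        return 1 if ready else 0  # wait for RDY
    if state == 7:
        return 1 if ready else 7  # assumption: RDY restarts at the multiplicand load
    return state + 1              # states 1-6 advance on every clock edge


def done(state: int) -> int:
    """DONE is asserted only in state 7."""
    return int(state == 7)


def run(inputs, state: int = 0) -> list[int]:
    """Apply a list of (reset, ready) pairs, one per clock edge."""
    visited = []
    for reset, ready in inputs:
        state = next_state(state, reset, ready)
        visited.append(state)
    return visited


# One reset cycle, then READY held high.
print(run([(1, 0)] + [(0, 1)] * 9))  # [0, 1, 2, 3, 4, 5, 6, 7, 1, 2]
```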
An 8-bit register built from D-Flip-Flops serves solely to store the input multiplicand for
the recoded multiplier calculations. Figure 7 presents the circuit schematic for this
component.
Figure 8 presents the circuit schematic for the multiplicand select decoder. This component
selects the appropriate recoded multiplier based on the lower 2 bits of the result register
and the carry-out bit, that is, the bit held in the result register's shift-out flip-flop,
which plays the role of the overlapped bit $x_{2y-1}$. The input to this component is a 3-bit
bus carrying these three bits, and the 4-bit output bus carries one enable bit for each of
the four non-zero recoded multipliers (m, 2m, -m, and -2m).
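One way to express this decoder as two-level logic is sketched below in Python, derived from the recoding table in equation (6). The input packing (a, b, c) = (result bit 1, result bit 0, shift-out bit) and the output order (+m, +2m, -m, -2m) are assumptions; the schematic in figure 8 may order the bus differently. Triplets 000 and 111 raise no enable, which is how a zero recoded multiplier is produced.

```python
def select_decoder(a: int, b: int, c: int) -> tuple[int, int, int, int]:
    """Decode a Booth triplet into one-hot enables (+m, +2m, -m, -2m).

    a = x_{2y+1} (result bit 1), b = x_{2y} (result bit 0),
    c = x_{2y-1} (shift-out bit); each argument is 0 or 1.
    """
    not_a, not_b, not_c = 1 - a, 1 - b, 1 - c
    plus_m = not_a & (b ^ c)       # triplets 001, 010
    plus_2m = not_a & b & c        # triplet 011
    minus_m = a & (b ^ c)          # triplets 101, 110
    minus_2m = a & not_b & not_c   # triplet 100
    return plus_m, plus_2m, minus_m, minus_2m


for t in range(8):
    bits = ((t >> 2) & 1, (t >> 1) & 1, t & 1)
    print(bits, select_decoder(*bits))
```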
This component, the recoded multiplier unit, generates each of the four non-zero recoded
multipliers. Figure 9 presents the circuit schematic for this component. One of the four
recoded multipliers is selected by a 4-bit input signal generated by the multiplicand select
decoder, and the contents of the 8-bit multiplicand register serve as the other input. The
outputs are four 10-bit recoded multipliers, only one of which is active at a time,
depending on the select signal from the multiplicand select decoder.
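The values this unit produces can be modeled behaviorally in Python, assuming (as equation (6) implies) that the four non-zero recoded multipliers are +m, +2m, -m, and -2m. The sketch computes them arithmetically rather than through the gates of figure 9, and it shows why 10 bits suffice: the largest magnitude is |2m| = 256, which lies inside the 10-bit signed range -512 to 511. The helper names are illustrative.

```python
WIDTH = 10                  # recoded multipliers are 10 bits wide
MASK = (1 << WIDTH) - 1


def to_signed(pattern: int, width: int = WIDTH) -> int:
    """Interpret a width-bit pattern as a two's complement integer."""
    return pattern - (1 << width) if (pattern >> (width - 1)) & 1 else pattern


def recoded_multipliers(m: int) -> tuple[int, ...]:
    """Return (+m, +2m, -m, -2m) as 10-bit two's complement patterns.

    m is the 8-bit signed multiplicand.
    """
    if not -128 <= m <= 127:
        raise ValueError("multiplicand must fit in 8 signed bits")
    return tuple((k * m) & MASK for k in (1, 2, -1, -2))


print([hex(p) for p in recoded_multipliers(-3)])  # ['0x3fd', '0x3fa', '0x3', '0x6']
```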
This unit produces the 2's complement of its input. The input is 8 bits wide, and the output
is 8 bits wide plus two additional bits for sign extension (10 bits in total). A separate
sign output bit gives the polarity of the result. This component is used in generating the
recoded multipliers for the recoded multiplier unit. The circuit schematic for this
component is presented in figure 10.
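A bit-level Python sketch of this unit follows: it sign-extends the 8-bit input to 10 bits, then inverts and adds one. It assumes the sign output is simply the most significant bit of the 10-bit result, which may not match exactly how figure 10 derives it. The -128 case (input 0x80) shows why the extra bits are needed: its negation, +128, does not fit in 8 signed bits.

```python
def twos_complement_unit(a: int) -> tuple[int, int]:
    """Negate an 8-bit two's complement input (sketch).

    Returns (result, sign): result is -a as a 10-bit two's complement
    pattern, and sign is its most significant bit (1 when -a < 0).
    """
    a &= 0xFF
    extended = (a | 0x300) if a & 0x80 else a  # sign-extend 8 -> 10 bits
    result = (~extended + 1) & 0x3FF           # invert all bits, add one
    sign = (result >> 9) & 1
    return result, sign


print(twos_complement_unit(0x05))  # (1019, 1): 0x3FB, i.e. -5
print(twos_complement_unit(0x80))  # (128, 0): +128 needs the extra bits
```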
This unit consists of ten 4-input OR gates. The four recoded multiplier paths reconverge at
these gates, allowing one of the four recoded multipliers to pass as input to the partial
sum of the product register. The multiplicand select decoder enables only one of the
recoded multipliers by AND-ing that multiplier with a logical high and AND-ing the remaining
three recoded multipliers with a logical low. Therefore, each 4-input OR gate has three
inputs that are guaranteed to be logic 0, while its fourth input is the i-th bit of the
enabled recoded multiplier. The output is 10 bits wide. The circuit schematic for this
component is shown in figure 11.
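The AND-then-OR gating described above can be sketched in Python as follows. The enable order (+m, +2m, -m, -2m) is an assumption carried over for illustration. When no enable is set (the zero recoded multiplier), every OR gate sees four logic 0 inputs and the addend is 0.

```python
def addend_unit(enables: tuple[int, ...], candidates: tuple[int, ...], width: int = 10) -> int:
    """AND each candidate with its enable bit, then OR the results bit by bit."""
    out = 0
    for i in range(width):
        column = 0
        for enable, value in zip(enables, candidates):
            column |= enable & ((value >> i) & 1)  # AND gate feeding the i-th OR gate
        out |= column << i                         # output of the i-th 4-input OR gate
    return out


# Hypothetical example: candidates are +m, +2m, -m, -2m for m = 3 (10-bit patterns).
candidates = (0x003, 0x006, 0x3FD, 0x3FA)
print(hex(addend_unit((0, 1, 0, 0), candidates)))  # 0x6: +2m selected
print(addend_unit((0, 0, 0, 0), candidates))       # 0: zero recoded multiplier
```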
A 10-bit full adder is used to compute the partial sum in the upper byte of the result
register. One input to the adder is the upper byte of the result register together with its
two sign-extension bits; the other 10-bit input comes from the addend unit. The 10-bit sum
is written back into the upper byte and sign bits of the result register. The circuit
schematic for this component is shown in figure 12.
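A bit-level Python sketch of a 10-bit adder built as a ripple-carry chain of full adders is shown below. The ripple-carry structure is an assumption; the schematic in figure 12 may be organized differently, but any correct 10-bit adder produces the same sums. Dropping the final carry makes the result correct modulo 2^10, which is what two's complement addition requires when no overflow occurs.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a: int, b: int, width: int = 10) -> int:
    """Add two width-bit patterns with a chain of full adders.

    The final carry-out is discarded, so the result is (a + b) mod 2**width.
    """
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total


print(hex(ripple_add(0x3FD, 0x006)))  # 0x3: -3 + 6 = 3 in 10-bit two's complement
```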
This component is by far the most complex logic block in the circuit. The result register
holds the 16-bit result plus three additional D-Flip-Flops: two for sign extension and one
to store the shift-out bit. The result register is therefore actually 19 bits wide, but only
16 bits are available to the user. The result register operates in one of three modes:
load, hold, or shift. The load mode has two separate contexts for the upper and lower
bytes of the result register. For the lower byte, the load mode loads the input multiplier.
For the upper byte, the load mode simply loads 0, effectively clearing the byte. This mode
corresponds to state 2 of the finite state machine. The hold mode keeps the contents of the
result register regardless of changes at the input; it applies in state 7 of the finite
state machine. The shift mode shifts the contents of the result register right by two bits
on each successive clock cycle; it applies in states 3, 4, 5, and 6 of the finite state
machine. The circuit schematic for this component is presented in figure 13.
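Tying the datapath together, the Python sketch below models one multiplication at the register-transfer level. The result register is split into a 10-bit upper part (upper byte plus two sign-extension bits), the 8-bit lower byte, and the shift-out bit; each shift-mode cycle adds the recoded multiplier to the upper part and then shifts right by two. This is a behavioral sketch of the description above, not a translation of figure 13; in particular, treating the 2-bit shift as arithmetic (sign-preserving) is an assumption about how the sign-extension bits are used.

```python
def multiply(multiplicand: int, multiplier: int) -> int:
    """Cycle-level behavioral sketch of one multiplication (8-bit signed operands).

    upper:     10-bit partial sum (upper byte plus two sign-extension bits)
    lower:     8-bit lower byte, initially the multiplier
    shift_out: the extra flip-flop holding the last bit shifted out
    """
    m = multiplicand                                   # state 1: multiplicand register
    upper, lower, shift_out = 0, multiplier & 0xFF, 0  # state 2: load mode

    for _ in range(4):                                 # states 3-6: shift mode
        x2y1, x2y = (lower >> 1) & 1, lower & 1
        digit = -2 * x2y1 + x2y + shift_out            # Booth digit for this triplet
        upper += digit * m                             # add recoded multiplier
        assert -512 <= upper <= 511, "partial sum must fit in 10 bits"
        combined = (upper << 8) | lower                # 18-bit signed value
        combined >>= 2                                 # arithmetic shift right by 2
        shift_out = x2y1                               # last bit shifted out
        upper, lower = combined >> 8, combined & 0xFF

    return (upper << 8) | lower                        # state 7: hold mode


product = multiply(-7, 13)
print(product, product & 0xFFFF)  # -91 65445 (the second value is R(15:0) as unsigned bits)
```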
Load, hold, & shift modes of operation were added to the standard-cell D-Flip-Flops to
avoid adding a separate 16-bit register to hold a stable and valid result after a
multiplication cycle completes. These special flip-flops are needed because three inputs
must be multiplexed into each D-Flip-Flop of the result register, and some control logic is
required to drive the multiplexer select signals correctly. The state of the result
register for each of the 8 states generated by the state machine is listed in the table of
figure 14.
REGISTER CONTROL   DEFINITION
000                CLEAR REGISTERS
001                CLEAR MULTIPLICAND
010                LOAD MULTIPLIER
011                SHIFT BY 2
100                SHIFT BY 2
101                SHIFT BY 2
110                SHIFT BY 2
111                HOLD / QNEXT = QPRESENT
Figure 17. D-Flip-Flops with Load, Hold, & Shift Circuit Schematic (dfflhs)
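For a single flip-flop, the load/hold/shift behavior amounts to a three-way multiplexer in front of the D input. The Python sketch below assumes one-hot select lines and an AND-OR multiplexer; the actual gates in the dfflhs schematic of figure 17 may differ. The helper `shift_right_by_2` (a hypothetical name) shows how shift mode wires each flip-flop to the one two positions above it, with the top two positions filled from a supplied bit such as the sign.

```python
def dfflhs_next(q: int, load_in: int, shift_in: int,
                load: int, hold: int, shift: int) -> int:
    """Next value of one flip-flop with load, hold and shift inputs (sketch).

    The select lines are expected to be one-hot; the D input is modeled as an
    AND-OR multiplexer of the three candidate values.
    """
    if load + hold + shift != 1:
        raise ValueError("exactly one of load, hold, shift must be 1")
    return (load & load_in) | (hold & q) | (shift & shift_in)


def shift_right_by_2(bits: list[int], fill: int) -> list[int]:
    """Apply shift mode to a register given LSB first (bits[0] is the LSB)."""
    n = len(bits)
    return [dfflhs_next(bits[i], 0, bits[i + 2] if i + 2 < n else fill, 0, 0, 1)
            for i in range(n)]


print(shift_right_by_2([0, 1, 1, 0, 1, 1], fill=1))  # [1, 0, 1, 1, 1, 1]
```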
2.2.10 The Register Mode Decoder
This is the last significant logic block to be discussed. It decodes the current state of
the control unit into select lines that multiplex the inputs of the D-Flip-Flops with load,
hold, and shift capabilities. Its inputs are the 3-bit state of the control unit (which
serves as the select input) and the load, shift, and hold lines. The output is one of the
load, hold, or shift lines. Figure 18 shows the schematic diagram for this component.
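As a reference model, the decoder can be written in Python as a lookup from the 3-bit control state to one-hot load, hold, and shift select lines, using the register control table of figure 14. Modeling the decoder this way, and leaving all three lines low in the two clear states, are assumptions of this sketch; figure 18 may encode those states differently.

```python
# State -> register operation, transcribed from the register control table.
MODE_BY_STATE = {
    0b000: "clear",  # CLEAR REGISTERS
    0b001: "clear",  # CLEAR MULTIPLICAND
    0b010: "load",   # LOAD MULTIPLIER
    0b011: "shift",  # SHIFT BY 2
    0b100: "shift",
    0b101: "shift",
    0b110: "shift",
    0b111: "hold",   # HOLD / QNEXT = QPRESENT
}


def register_mode_decoder(q2: int, q1: int, q0: int) -> tuple[int, int, int]:
    """Return one-hot (load, hold, shift) select lines for a 3-bit state.

    In this sketch the two clear states drive none of the three lines,
    assuming clearing is handled elsewhere (for example by the reset path).
    """
    mode = MODE_BY_STATE[(q2 << 2) | (q1 << 1) | q0]
    return int(mode == "load"), int(mode == "hold"), int(mode == "shift")


for state in range(8):
    bits = format(state, "03b")
    print(bits, register_mode_decoder(*map(int, bits)))
```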
The inputs of the modified Booth multiplier (the multiplier and the multiplicand) must each
be driven with their own binary values. If the pattern generator is set to count from 0 to
255 on two separate sets of 8 binary digits, it will not step through every combination of
operands as expected. For this reason, the 8-bit multiplier and multiplicand should be
combined into a 16-bit bus. If the pattern generator counts a 16-bit binary number from 0 to
65535, the multiplier counts from 0 to 255 each time the multiplicand increments by one. In
this implementation, buffers are used to combine the multiplier and multiplicand: a 16-bit
input bus goes into the bus combine component and is split into two 8-bit buses (one goes
to the multiplier and the other to the multiplicand).
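The effect of the bus combine component can be sketched in Python as a byte split. It assumes the high byte drives the multiplicand and the low byte the multiplier, which is what makes the multiplier sweep 0 to 255 for each multiplicand value; `bus_combine` is a hypothetical name. Counting once through all 16-bit values then visits every operand pair exactly once.

```python
def bus_combine(pattern: int) -> tuple[int, int]:
    """Split a 16-bit pattern into (multiplicand, multiplier) bytes (assumed order)."""
    if not 0 <= pattern <= 0xFFFF:
        raise ValueError("pattern must fit in 16 bits")
    return (pattern >> 8) & 0xFF, pattern & 0xFF


print(bus_combine(0x00FF), bus_combine(0x0100))  # (0, 255) (1, 0)

# Counting 0..65535 on one 16-bit bus visits every operand pair exactly once.
pairs = {bus_combine(p) for p in range(1 << 16)}
print(len(pairs))  # 65536
```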
For this simulation, the clock is set to the maximum operating frequency of 30.3 MHz, which
corresponds to a period of 33 ns. To do this, click on the clock input line and add a clock
force with a period of 33 ns (see figure 20).
Next, the multiplier must be cleared on the first clock cycle. To do this, click on the
reset input line and add a force to the reset line: the reset line has a value of 1 at time
0 ns and a value of 0 at 33 ns. After the reset cycle, the multiplier should be ready to
multiply all 65,536 input combinations. To make the multiplier ready, click on the ready
input line and add a force to the ready line: the ready line has a value of 0 at time 0 ns
and a value of 1 at 33 ns.
Next, create the input pattern for the multiplier and multiplicand. Click on the 16-bit
input bus going into the bus combine component, then click on the PATTERN GENERATOR icon.
The pattern generation should begin 33 ns after the assertion of the reset signal. Since the
multiplier needs 7 clock cycles per multiplication, each pattern must be held for
7 × 33 ns = 231 ns. A total of 65,536 patterns are needed, so the entire pattern sequence
is 65,536 × 231 ns = 15,138,816 ns (about 15.1 ms) long.
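These timing figures follow from simple arithmetic. The short Python sketch below reproduces them so that the run length can be recomputed if the clock period or the cycle count changes; the constant names are illustrative.

```python
CLOCK_PERIOD_NS = 33      # clock force period
CYCLES_PER_MULTIPLY = 7   # clock cycles per multiplication
PATTERNS = 1 << 16        # every 16-bit input pattern

frequency_mhz = 1_000 / CLOCK_PERIOD_NS                    # 1 / 33 ns
pattern_period_ns = CYCLES_PER_MULTIPLY * CLOCK_PERIOD_NS  # hold time per pattern
sequence_ns = PATTERNS * pattern_period_ns                 # whole pattern sequence

print(f"clock: {frequency_mhz:.1f} MHz")       # clock: 30.3 MHz
print(f"per pattern: {pattern_period_ns} ns")  # per pattern: 231 ns
print(f"sequence: {sequence_ns:,} ns = {sequence_ns / 1e6:.1f} ms")
# sequence: 15,138,816 ns = 15.1 ms
```

The run length used below (15,139,000 ns) simply rounds this total up so the last pattern completes.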
To view the relevant traces, select the multiplicand, multiplier, and result lines, then
add the selected traces in hexadecimal format. The other single-bit lines can be added
directly by clicking on the TRACE or LIST button. At this point the inputs are set and the
signals are listed; however, the simulation still needs to be run. Allow the simulation to
run slightly longer than the total duration of the pattern sequence: type "run 15139000" to
run the pattern sequence.
To start the creation of an IC design, the 88mult component must be connected to pads for
I/O and power. Certain pads may be reserved for VDD and ground, depending on your chip
layout type. AMI 0.5 µm technology is used to create the pad layout for this multiplier.
The core layout is generated automatically, as shown in figure 25. Some design rules may be
violated during generation, and these violations must be corrected manually. See Dr.
Milenkovic's website for some tutorial tips. Typing the command Peek followed by a number
reveals that many hidden layers, so that errors can be fixed more easily.
You might encounter one or more of the following errors when performing a design rule
check (DRC). All DRC errors should be corrected to increase the probability that the chip
will perform as expected after fabrication.
4.3.1 Via must NOT be stacked with contact
Select the via, right-click, and select Edit > Move > Unconstrained. Move the via over by
2λ. Then add metal 1 (or metal 2) between the via and the trace, using the same metal layer
that the trace is made of: right-click, select Add > Shape, and click Options on the pop-up
menu that appears. Select metal 1 (or metal 2) to add between the via and the trace.
Figure 26 shows this DRC error and the corresponding correction.
Add metal 1 (or metal 2) over the white indicator box, using the same metal layer that the
trace is made of. Right-click, select Add > Shape, click Options on the pop-up menu, and
select metal 1 (or metal 2) to cover the white indicator box with metal. Figure 27 shows an
example of this DRC error.
Add metal 1 (or metal 2) between the white indicator lines, using the same metal layer that
the trace is made of, and extend the metal coverage only to the length of the shorter white
line. Right-click, select Add > Shape, click Options on the pop-up menu, and select metal 1
(or metal 2) to cover the gap between the two white indicator lines. Figure 28 shows this
DRC error and a typical fix.
The core of the chip should be centered inside the pad frame, and traces should connect the
core to the pad frame. Traces can be routed on two layers. If manual routing is required,
route each connection from the core to the frame, changing layers where necessary to avoid
inadvertent connections between separate traces. The preferred route facility of the
auto-route feature may be useful when performing manual routing.