Signed Integers: 2's Complement: Arithmetic Circuits & Multipliers

Arithmetic Circuits & Multipliers
6.111 Fall 2017 Lecture 8

• Addition, subtraction
• Performance issues
  -- ripple carry
  -- carry bypass
  -- carry skip
  -- carry lookahead
• Multipliers

Reminder: Lab #3 due tonight! Pizza Wed 6p

Signed integers: 2’s complement

N bits, with the “sign bit” on the left and the “decimal” point to the right of the LSB. Bit weights:
  $-2^{N-1}$   $2^{N-2}$   …   $2^3$   $2^2$   $2^1$   $2^0$

Range: $-2^{N-1}$ to $2^{N-1} - 1$

8-bit 2’s complement example:
  $11010110 = -2^7 + 2^6 + 2^4 + 2^2 + 2^1 = -128 + 64 + 16 + 4 + 2 = -42$

If we use a two’s complement representation for signed integers, the same binary addition mod $2^N$ procedure will work for adding positive and negative numbers (we don’t need separate subtraction rules). The same procedure will also handle unsigned numbers!

By moving the implicit location of the “decimal” point, we can represent fractions too:
  $1101.0110 = -2^3 + 2^2 + 2^0 + 2^{-2} + 2^{-3} = -8 + 4 + 1 + 0.25 + 0.125 = -2.625$
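A minimal Verilog sketch of the claim above, assuming a hypothetical module named `add8`: the hardware only computes the sum modulo $2^8$, and whether the bits mean an unsigned or a 2’s complement value is purely a matter of interpretation.

```verilog
// Hypothetical module: 8-bit adder whose result is truncated to 8 bits,
// i.e. the sum is computed modulo 2^8. The same bits are correct whether
// they are read as unsigned (0..255) or as 2's complement (-128..127).
module add8 (
  input  [7:0] a,
  input  [7:0] b,
  output [7:0] sum
);
  assign sum = a + b;  // carry out of bit 7 is simply dropped
endmodule
```

With a = 8'b11010110 (214 unsigned, -42 signed) and b = 5, the output bits 8'hDB read as 219 unsigned or, through `$signed(sum)`, as -37.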
Sign extension

Consider the 8-bit 2’s complement representation of:
  42 = 00101010
  -5 = ~00000101 + 1 = 11111010 + 1 = 11111011

What is their 16-bit 2’s complement representation?
  42 = ________00101010 = 0000000000101010
  -5 = ________11111011 = 1111111111111011

Extend the MSB (aka the “sign bit”) into the higher-order bit positions.

Adder: a circuit that does addition

Here’s an example of binary addition as one might do it by “hand”:

    1101        carries from previous column
     1101
   + 0101
    -----
    10010       adding two N-bit numbers produces an (N+1)-bit result

If we build a circuit that implements one column, we can quickly build a circuit to add two 4-bit numbers… the “ripple-carry adder”.
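The one-column circuit and the ripple-carry chain can be sketched structurally as below. The module names `full_adder` and `ripple_adder` and the parameter `N` are illustrative assumptions; the carry out of each column simply feeds the carry in of the next.

```verilog
// One column: sum and carry of three input bits.
module full_adder (input a, b, cin, output s, cout);
  assign s    = a ^ b ^ cin;                       // column sum
  assign cout = (a & b) | (a & cin) | (b & cin);   // carry to next column
endmodule

// Hypothetical N-bit ripple-carry adder: column i's carry out feeds
// column i+1, so two N-bit inputs give an (N+1)-bit result {cout, sum}.
module ripple_adder #(parameter N = 4) (
  input  [N-1:0] a, b,
  output [N-1:0] sum,
  output         cout
);
  wire [N:0] c;           // c[i] = carry into column i
  assign c[0] = 1'b0;
  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : col
      full_adder fa (.a(a[i]), .b(b[i]), .cin(c[i]), .s(sum[i]), .cout(c[i+1]));
    end
  endgenerate
  assign cout = c[N];
endmodule
```

For the hand-worked example, a = 4'b1101 and b = 4'b0101 give {cout, sum} = 5'b10010.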

“Full Adder” building block

  A B C | S CO
  0 0 0 | 0 0
  0 0 1 | 1 0
  0 1 0 | 1 0
  0 1 1 | 0 1
  1 0 0 | 1 0
  1 0 1 | 0 1
  1 1 0 | 0 1
  1 1 1 | 1 1

The “half adder” circuit has only the A and B inputs.

  $S = A \oplus B \oplus C$
  $CO = \bar{A}BC + A\bar{B}C + AB\bar{C} + ABC = (\bar{A} + A)BC + (\bar{B} + B)AC + AB(\bar{C} + C) = BC + AC + AB$

Subtraction: A-B = A + (-B)

Using 2’s complement representation: –B = ~B + 1   (~ = bit-wise complement)

So let’s build an arithmetic unit that does both addition and subtraction. Operation selected by control input:
[Figure: add/subtract unit]

But what about the “+1”?
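One way to answer the “+1” question, sketched with a hypothetical `addsub` module and 1-bit control `sub`: XOR each bit of B with `sub` (giving ~B when subtracting) and feed `sub` in as the adder's carry-in, which supplies the +1. The behavioural `+` stands in for the ripple-carry chain.

```verilog
// Hypothetical add/subtract unit.
//   sub = 0: y = a + b
//   sub = 1: y = a + ~b + 1 = a - b
// b ^ {N{sub}} is b or ~b; adding sub acts as the carry-in ("+1").
module addsub #(parameter N = 4) (
  input  [N-1:0] a, b,
  input          sub,
  output [N-1:0] y,
  output         cout   // carry out of MSB; when subtracting, 1 = no borrow
);
  wire [N-1:0] bx = b ^ {N{sub}};
  assign {cout, y} = a + bx + sub;
endmodule
```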
Condition Codes

Besides the sum, one often wants four other bits of information from an arithmetic unit:
  Z (zero): result is = 0 (big NOR gate)
  N (negative): result is < 0 ($S_{N-1}$)
  C (carry): indicates an add in the most significant position produced a carry, e.g., 1111 + 0001 (from last FA)
  V (overflow): indicates that the answer has too many bits to be represented correctly by the result width, e.g., 0111 + 0111
    $V = A_{N-1}B_{N-1}\overline{S_{N-1}} + \overline{A_{N-1}}\,\overline{B_{N-1}}\,S_{N-1}$
    $V = COUT_{N-1} \oplus CIN_{N-1}$

To compare A and B, perform A–B and use the condition codes:

Signed comparison:
  LT   N⊕V
  LE   Z+(N⊕V)
  EQ   Z
  NE   ~Z
  GE   ~(N⊕V)
  GT   ~(Z+(N⊕V))

Unsigned comparison (here C must signal a borrow out of A–B; if C is the carry out of A + ~B + 1, it is 1 exactly when A ≥ B, so use ~C in its place):
  LTU  C
  LEU  C+Z
  GEU  ~C
  GTU  ~(C+Z)

Condition Codes in Verilog

  wire signed [31:0] a,b,s;
  wire z,n,v,c;
  // {1'b0,x} is unsigned, so a and b are zero-extended and c is the true
  // carry out; a plain signed a+b would be sign-extended to 33 bits instead.
  assign {c,s} = {1'b0,a} + {1'b0,b};
  assign z = ~|s;
  assign n = s[31];
  assign v = a[31]^b[31]^s[31]^c;

Might be better to use the sum-of-products formula for V from the previous slide if using a LUT implementation (only 3 variables instead of 4).
Modular Arithmetic

The Verilog arithmetic operators (+,-,*) can all produce full-precision results, e.g., adding two 8-bit numbers produces a 9-bit result, provided the destination is wide enough to hold it.

In many designs one chooses a “word size” (many computers use 32 or 64 bits) and all arithmetic results are truncated to that number of bits, i.e., arithmetic is performed modulo $2^{\text{word size}}$.

Using a fixed word size can lead to overflow, e.g., when the operation produces a result that’s too large to fit in the word size. One can
• Avoid overflow: choose a sufficiently large word size
• Detect overflow: have the hardware remember if an operation produced an overflow – trap or check status at end
• Embrace overflow: sometimes this is exactly what you want, e.g., when doing index arithmetic for circular buffers of size $2^N$.
• “Correct” overflow: replace result with most positive or most negative number as appropriate, aka saturating arithmetic. Good for digital signal processing.

Speed: tPD of Ripple-carry Adder

  CO = AB + A·CI + B·CI

Worst-case path: carry propagation from LSB to MSB, e.g., when adding 11…111 to 00…001.

  $t_{PD} = (N-1)(t_{PD,OR} + t_{PD,AND}) + t_{PD,XOR} \approx \Theta(N)$
  (CI to CO through the first N-1 stages, then $CI_{N-1}$ to $S_{N-1}$)

Θ(N) is read “order N”: it means that the latency of our adder grows at worst in proportion to the number of bits in the operands.
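Saturating (“corrected overflow”) addition can be sketched as follows; `sat_add` and its width parameter `W` are hypothetical names. Overflow is detected as on the condition-code slide: both operands have the same sign but the wrapped sum has the other sign.

```verilog
// Hypothetical saturating adder for W-bit 2's complement operands.
// On overflow the result is clamped to the most positive or most
// negative representable value instead of wrapping around mod 2^W.
module sat_add #(parameter W = 8) (
  input  signed [W-1:0] a, b,
  output signed [W-1:0] y
);
  wire signed [W-1:0] s = a + b;                         // wraps mod 2^W
  wire ovf = (a[W-1] == b[W-1]) && (s[W-1] != a[W-1]);   // same-sign inputs, sign flipped
  localparam signed [W-1:0] MAX = {1'b0, {(W-1){1'b1}}}; // most positive
  localparam signed [W-1:0] MIN = {1'b1, {(W-1){1'b0}}}; // most negative
  assign y = ovf ? (a[W-1] ? MIN : MAX) : s;
endmodule
```

For W = 8, 100 + 100 gives 127 and (-100) + (-100) gives -128, while in-range sums pass through unchanged.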
How about the tPD of this circuit?
[Figure: circuit built from two N-bit ripple-carry adders; carries C0 … Cn-1]

Is the tPD of this circuit = 2 * tPD,N-BIT RIPPLE?
Nope! tPD of this circuit = tPD,N-BIT RIPPLE + tPD,FA !!!

Timing analysis is tricky!

Alternate Adder Logic Formulation

How to Speed up the Critical (Carry) Path? (How to Build a Fast Adder?)
[Figure: full adder with inputs A, B, Cin and outputs S, Co]
  Generate (G) = AB
  Propagate (P) = A ⊕ B
Note: can also use P = A + B for Co

Faster carry logic

Let’s see if we can improve the speed by rewriting the equations for COUT:
  COUT = AB + A·CIN + B·CIN
       = AB + (A + B)·CIN
       = G + P·CIN    where G = AB (generate) and P = A + B (propagate)

Actually, P is usually defined as P = A^B, which won’t change COUT (A + B and A ⊕ B differ only when A = B = 1, and then G = AB = 1 already sets COUT) but will allow us to express S as a simple function: S = P^CIN

  module fa(input a,b,cin, output s,cout);
    wire g = a & b;
    wire p = a ^ b;
    assign s = p ^ cin;
    assign cout = g | (p & cin);
  endmodule

Virtex II Adder Implementation
[Figure: one half-slice: a LUT computes P = A ⊕ B; dedicated adder logic produces Cout and Y = A ⊕ B ⊕ Cin]
1 half-Slice = 1-bit adder
Virtex II Carry Chain

1 CLB = 4 Slices = 2, 4-bit adders
64-bit Adder: 16 CLBs
[Figure: Y[63:0] = A[63:0] + B[63:0] built from CLB0 (bits 3:0), CLB1 (bits 7:4), …, CLB15 (bits 63:60), carry out Y[64]]
CLBs must be in same column

Carry Bypass Adder
[Figure: P,G and C/S cells for A0,B0 … A3,B3 with carries Ci,0, Co,0 … Co,3; a mux controlled by BP selects Ci,0 or the rippled Co,3]
Can compute P, G in parallel for all bits
BP = P0P1P2P3
Key Idea: if (P0 P1 P2 P3) then Co,3 = Ci,0
16-bit Carry Bypass Adder
[Figure: four 4-bit bypass blocks in a chain, BP = P0P1P2P3, BP2 = P4P5P6P7, BP3 = P8P9P10P11, BP4 = P12P13P14P15; block carries Co,3, Co,7, Co,11, Co,15]

What is the worst case propagation delay for the 16-bit adder?
Assume the following delay for each gate:
  P, G from A, B: 1 delay unit
  P, G, Ci to Co or Sum for a C/S: 1 delay unit
  2:1 mux delay: 1 delay unit

Critical Path Analysis
For the second stage, is the critical path: BP2 = 0 or BP2 = 1?
Message: Timing analysis is very tricky – must carefully consider data dependencies for false paths
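A behavioural sketch of the 16-bit carry-bypass adder, assuming 4-bit blocks (M = 4) and hypothetical module names. The bypass mux never changes the computed value, because when BP = 1 the rippled carry out already equals the carry in; its benefit is only in the critical path, which a functional simulation does not show, and a synthesis tool may simplify the logically redundant mux away.

```verilog
// One 4-bit block: ripple carry inside, plus a bypass mux on the carry out.
module bypass_block4 (
  input  [3:0] a, b,
  input        cin,
  output [3:0] s,
  output       cout
);
  wire [3:0] p = a ^ b;   // bit propagate
  wire [3:0] g = a & b;   // bit generate
  wire [4:0] c;           // ripple carries inside the block
  assign c[0] = cin;
  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : ripple
      assign s[i]   = p[i] ^ c[i];
      assign c[i+1] = g[i] | (p[i] & c[i]);
    end
  endgenerate
  wire bp = &p;                   // BP = P0 P1 P2 P3
  assign cout = bp ? cin : c[4];  // if BP, Co,3 = Ci,0 (skip the ripple)
endmodule

// Hypothetical 16-bit adder: four blocks chained through their bypass muxes.
module bypass_adder16 (
  input  [15:0] a, b,
  input         cin,
  output [15:0] s,
  output        cout
);
  wire [4:0] c;           // carry into each block
  assign c[0] = cin;
  genvar k;
  generate
    for (k = 0; k < 4; k = k + 1) begin : blk
      bypass_block4 u (.a(a[4*k+3:4*k]), .b(b[4*k+3:4*k]), .cin(c[k]),
                       .s(s[4*k+3:4*k]), .cout(c[k+1]));
    end
  endgenerate
  assign cout = c[4];
endmodule
```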
Carry Bypass vs Ripple Carry

Ripple Carry: $t_{adder} = (N-1)\,t_{carry} + t_{sum}$
Carry Bypass: $t_{adder} = 2(M-1)\,t_{carry} + t_{sum} + (N/M - 1)\,t_{bypass}$
M = bypass word size, N = number of bits being added
[Figure: t_adder versus N for the ripple adder and the bypass adder, with the crossover marked at N = 4..8]

Carry Lookahead Adder (CLA)

• Recall that COUT = G + P·CIN where G = A&B and P = A^B
• For adding two N-bit numbers:
$$\begin{aligned}
C_N &= G_{N-1} + P_{N-1}C_{N-1} \\
    &= G_{N-1} + P_{N-1}G_{N-2} + P_{N-1}P_{N-2}C_{N-2} \\
    &= G_{N-1} + P_{N-1}G_{N-2} + P_{N-1}P_{N-2}G_{N-3} + \dots + P_{N-1}\cdots P_0\,C_{IN}
\end{aligned}$$
  C_N in only 3 gate delays*: 1 for P/G generation, 1 for ANDs, 1 for final OR
  *assuming gates with N inputs
• Idea: pre-compute all carry bits as f(Gs, Ps, CIN)
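A 4-bit carry-lookahead adder written directly from the expanded carry equations above; `cla4` is a hypothetical name. Each carry is a sum of products of the G's, P's and CIN, so no carry waits on the previous one.

```verilog
// Hypothetical 4-bit CLA: every carry is a two-level function of g, p, cin.
module cla4 (
  input  [3:0] a, b,
  input        cin,
  output [3:0] s,
  output       cout
);
  wire [3:0] g = a & b;   // generate
  wire [3:0] p = a ^ b;   // propagate
  wire c1 = g[0] | (p[0] & cin);
  wire c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin);
  wire c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & cin);
  assign cout = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
              | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & cin);
  assign s = p ^ {c3, c2, c1, cin};   // S_i = P_i ^ C_i
endmodule
```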

Carry Lookahead Circuits
[Figure: carry lookahead circuits]

The 74182 Carry Lookahead Unit
[Figure: 74182 carry lookahead unit]

Block Generate and Propagate

G and P can be computed for groups of bits (instead of just for individual bits). This allows us to choose the maximum fan-in we want for our logic gates and then build a hierarchical carry chain using these equations:

  $C_{J+1} = G_{I,J} + P_{I,J}\,C_I$
  $G_{I,K} = G_{J+1,K} + P_{J+1,K}\,G_{I,J}$
  $P_{I,K} = P_{I,J}\,P_{J+1,K}$
  where I < J and J+1 < K

“generate a carry from bits I thru K if it is generated in the high-order (J+1,K) part of the block or if it is generated in the low-order (I,J) part of the block and then propagated thru the high part”

Hierarchical building block (from Hennessy & Patterson, Appendix A)

8-bit CLA (P/G generation), 8-bit CLA (carry generation), 8-bit CLA (complete)
[Figures: log2(N) levels of lookahead, starting with P/G generation and the 1st level of lookahead]
$t_{PD} = \Theta(\log N)$
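The block generate/propagate equations can be sketched for 8 bits split into two 4-bit groups. The module and function names are hypothetical, and the groups are fixed at (0,3) and (4,7) for brevity rather than built as a full log2(N)-level tree.

```verilog
// Hypothetical 8-bit two-level CLA: groups (0,3) and (4,7).
module cla8_2level (
  input  [7:0] a, b,
  input        cin,
  output [7:0] s,
  output       cout
);
  wire [7:0] g = a & b;
  wire [7:0] p = a ^ b;

  // carries into bits 0..3 of one 4-bit group, from lookahead equations
  function [3:0] carries4;
    input [3:0] gg, pp;
    input       ci;
    begin
      carries4[0] = ci;
      carries4[1] = gg[0] | (pp[0] & ci);
      carries4[2] = gg[1] | (pp[1] & gg[0]) | (pp[1] & pp[0] & ci);
      carries4[3] = gg[2] | (pp[2] & gg[1]) | (pp[2] & pp[1] & gg[0])
                  | (pp[2] & pp[1] & pp[0] & ci);
    end
  endfunction

  // block {G, P} of one 4-bit group
  function [1:0] block_gp;
    input [3:0] gg, pp;
    begin
      block_gp[1] = gg[3] | (pp[3] & gg[2]) | (pp[3] & pp[2] & gg[1])
                  | (pp[3] & pp[2] & pp[1] & gg[0]);
      block_gp[0] = &pp;
    end
  endfunction

  wire [1:0] gp_lo = block_gp(g[3:0], p[3:0]);
  wire [1:0] gp_hi = block_gp(g[7:4], p[7:4]);
  wire G03 = gp_lo[1], P03 = gp_lo[0];
  wire G47 = gp_hi[1], P47 = gp_hi[0];

  wire c4  = G03 | (P03 & cin);   // C_{J+1} = G_{I,J} + P_{I,J} C_I
  wire G07 = G47 | (P47 & G03);   // G_{I,K} = G_{J+1,K} + P_{J+1,K} G_{I,J}
  wire P07 = P03 & P47;           // P_{I,K} = P_{I,J} P_{J+1,K}
  assign cout = G07 | (P07 & cin);

  assign s[3:0] = p[3:0] ^ carries4(g[3:0], p[3:0], cin);
  assign s[7:4] = p[7:4] ^ carries4(g[7:4], p[7:4], c4);
endmodule
```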
Unsigned Multiplication

                 A3   A2   A1   A0
               x B3   B2   B1   B0
                 -------------------
                 A3B0 A2B0 A1B0 A0B0
            A3B1 A2B1 A1B1 A0B1
       A3B2 A2B2 A1B2 A0B2
+ A3B3 A2B3 A1B3 A0B3

A·B_i called a “partial product”
Multiplying N-bit number by M-bit number gives (N+M)-bit result
Easy part: forming partial products (just an AND gate since B_I is either 0 or 1)
Hard part: adding M N-bit partial products

Combinational Multiplier (unsigned)

                      X3   X2   X1   X0     multiplicand
                    * Y3   Y2   Y1   Y0     multiplier
                      -------------------
                      X3Y0 X2Y0 X1Y0 X0Y0   partial products, one for each bit in
                 X3Y1 X2Y1 X1Y1 X0Y1        the multiplier (each bit needs just
            X3Y2 X2Y2 X1Y2 X0Y2             one AND gate)
     + X3Y3 X2Y3 X1Y3 X0Y3
  --------------------------------------
  Z7   Z6   Z5   Z4   Z3   Z2   Z1   Z0

[Figure: array of AND gates feeding rows of HA/FA cells; inputs x3..x0 and y0..y3, outputs z0..z7]
Propagation delay ~2N
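The partial-product formulation can be written behaviourally as a sketch; `umul` and its parameters `N` and `M` are hypothetical names. Each partial product is A ANDed with one multiplier bit and shifted into place; the loop unrolls into M partial products, and the synthesis tool decides how to add them.

```verilog
// Hypothetical N x M unsigned multiplier built from AND-gate partial products.
module umul #(parameter N = 4, parameter M = 4) (
  input      [N-1:0]   a,   // multiplicand
  input      [M-1:0]   b,   // multiplier
  output reg [N+M-1:0] p    // (N+M)-bit product
);
  integer i;
  always @* begin
    p = 0;
    for (i = 0; i < M; i = i + 1)
      // partial product i: every bit of a ANDed with b[i], weighted by 2^i
      p = p + (({N{b[i]}} & a) << i);
  end
endmodule
```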
Combinational Multiplier (signed!)

                       X3   X2   X1   X0
                     * Y3   Y2   Y1   Y0
                       -------------------
   X3Y0 X3Y0 X3Y0 X3Y0 X3Y0 X2Y0 X1Y0 X0Y0
 + X3Y1 X3Y1 X3Y1 X3Y1 X2Y1 X1Y1 X0Y1
 + X3Y2 X3Y2 X3Y2 X2Y2 X1Y2 X0Y2
 - X3Y3 X3Y3 X2Y3 X1Y3 X0Y3
 -----------------------------------------
   Z7   Z6   Z5   Z4   Z3   Z2   Z1   Z0

[Figure: array of FA/HA cells, rows y0..y3, outputs z0..z7]

2’s Complement Multiplication (Baugh-Wooley)

Step 1: two’s complement operands so high order bit is $-2^{N-1}$. Must sign extend partial products and subtract the last one.
Step 2: don’t want all those extra additions, so add a carefully chosen constant, remembering to subtract it at the end. Convert subtraction into add of (complement + 1), using –B = ~B + 1.
Step 3: add the ones to the partial products and propagate the carries. All the sign extension bits go away!
Step 4: finish computing the constants…

Resulting array (~ marks a complemented AND term; the 1 in column 7 stands for $-2^7$, which equals $+2^7$ modulo $2^8$):

                           ~X3Y0  X2Y0  X1Y0  X0Y0
                     ~X3Y1  X2Y1  X1Y1  X0Y1
               ~X3Y2  X2Y2  X1Y2  X0Y2
          X3Y3 ~X2Y3 ~X1Y3 ~X0Y3
 +     1                 1
   -----------------------------------------------
      Z7    Z6    Z5    Z4    Z3    Z2    Z1    Z0

Result: multiplying 2’s complement operands takes just about the same amount of hardware as multiplying unsigned operands!
NB: There are tricks we can use to eliminate the extra circuitry we added…
Baugh-Wooley Formulation – The Math (no insight required)

Assuming X and Y are 4-bit two’s complement numbers:
$$X = -2^3 x_3 + \sum_{i=0}^{2} x_i 2^i \qquad Y = -2^3 y_3 + \sum_{j=0}^{2} y_j 2^j$$

The product of X and Y is:
$$XY = x_3 y_3 2^6 - \sum_{i=0}^{2} x_i y_3 2^{i+3} - \sum_{j=0}^{2} x_3 y_j 2^{j+3} + \sum_{i=0}^{2}\sum_{j=0}^{2} x_i y_j 2^{i+j}$$

For two’s complement, the following is true:
$$-\sum_{i=0}^{3} x_i 2^i = -2^4 + \sum_{i=0}^{3} \overline{x_i}\, 2^i + 1$$

The product then becomes:
$$\begin{aligned}
XY &= x_3 y_3 2^6 + \sum_{i=0}^{2} \overline{x_i y_3}\, 2^{i+3} + 2^3 - 2^6 + \sum_{j=0}^{2} \overline{x_3 y_j}\, 2^{j+3} + 2^3 - 2^6 + \sum_{i=0}^{2}\sum_{j=0}^{2} x_i y_j 2^{i+j} \\
   &= x_3 y_3 2^6 + \sum_{i=0}^{2} \overline{x_i y_3}\, 2^{i+3} + \sum_{j=0}^{2} \overline{x_3 y_j}\, 2^{j+3} + \sum_{i=0}^{2}\sum_{j=0}^{2} x_i y_j 2^{i+j} + 2^4 - 2^7 \\
   &= -2^7 + x_3 y_3 2^6 + (\overline{x_2 y_3} + \overline{x_3 y_2})\,2^5 + (\overline{x_1 y_3} + \overline{x_3 y_1} + x_2 y_2 + 1)\,2^4 \\
   &\quad + (\overline{x_0 y_3} + \overline{x_3 y_0} + x_1 y_2 + x_2 y_1)\,2^3 + (x_0 y_2 + x_1 y_1 + x_2 y_0)\,2^2 + (x_0 y_1 + x_1 y_0)\,2^1 + (x_0 y_0)\,2^0
\end{aligned}$$

2’s Complement Multiplication
[Figure: 4x4 Baugh-Wooley array of FA/HA cells with constant 1 inputs; rows y0..y3, outputs z0..z7]
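The last line of the derivation maps directly onto a 4x4 Baugh-Wooley multiplier. In this sketch (hypothetical module `bw_mul4`), the six cross terms involving exactly one sign bit are complemented, the constant $2^4$ is added, and $-2^7$ is added as $+2^7$ because the 8-bit result is taken modulo $2^8$.

```verilog
// Hypothetical 4x4 Baugh-Wooley multiplier for 2's complement x and y.
// Row j holds the AND terms x_i & y_j; terms pairing a sign bit with a
// non-sign bit are complemented, x3&y3 is not.
module bw_mul4 (
  input  [3:0] x, y,
  output [7:0] z      // product, read as 2's complement
);
  wire [3:0] r0 = {~(x[3] & y[0]), x[2:0] & {3{y[0]}}};   // weight 2^0
  wire [3:0] r1 = {~(x[3] & y[1]), x[2:0] & {3{y[1]}}};   // weight 2^1
  wire [3:0] r2 = {~(x[3] & y[2]), x[2:0] & {3{y[2]}}};   // weight 2^2
  wire [3:0] r3 = { (x[3] & y[3]), ~(x[2:0] & {3{y[3]}})}; // weight 2^3
  // constants: +2^4 and -2^7 (== +2^7 mod 2^8), i.e. 8'b1001_0000
  assign z = r0 + (r1 << 1) + (r2 << 2) + (r3 << 3) + 8'b1001_0000;
endmodule
```

For example, x = -8 and y = -8 give z = 8'd64, and x = 3, y = -2 give z = 8'hFA (-6).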
Multiplication in Verilog

You can use the “*” operator to multiply two numbers:
  wire [9:0] a,b;
  wire [19:0] result = a*b;          // unsigned multiplication!

If you want Verilog to treat your operands as signed two’s complement numbers, add the keyword signed to your wire or reg declaration:
  wire signed [9:0] a,b;
  wire signed [19:0] result = a*b;   // signed multiplication!

Remember: unlike addition and subtraction, you need different circuitry if your multiplication operands are signed vs. unsigned. The same is true of the >>> (arithmetic right shift) operator. To get signed operations all operands must be signed.
To make a signed constant: 10'sh37C

Multiplication on the FPGA

Hardware multiplier block: two 18-bit twos complement (signed) operands, tPD ≈ 10ns
In the XC2V6000: 6 columns of mults, 24 in each column = 144 mults
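A small sketch contrasting unsigned and signed multiplication of the same bits; `mul_demo` is a hypothetical name, and `$signed()` is used to make both operands signed, since a signed product requires all operands to be signed.

```verilog
// Hypothetical demo: identical 10-bit operand bits give different 20-bit
// products depending on whether the multiply is unsigned or signed.
module mul_demo (
  input  [9:0]  a, b,
  output [19:0] p_unsigned,
  output [19:0] p_signed
);
  assign p_unsigned = a * b;                    // operands zero-extended
  assign p_signed   = $signed(a) * $signed(b);  // operands sign-extended
endmodule
```

With a = b = 10'h3FF, `p_unsigned` is 20'hFF801 (1023 * 1023) while `p_signed` is 20'h00001 ((-1) * (-1)). With a = 10'sh37C (-132 signed) and b = 2, the results are 1784 and -264.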

Sequential Multiplier

Assume the multiplicand (A) has N bits and the multiplier (B) has M bits. If we only want to invest in a single N-bit adder, we can build a sequential circuit that processes a single partial product at a time and then cycle the circuit M times:

[Figure: registers P (N bits) and B (M bits) shifting right together; an N-bit adder forms P + (B_LSB ? A : 0) with an (N+1)-bit output]

  Init: P ← 0, load A and B
  Repeat M times {
    P ← P + (B_LSB==1 ? A : 0)
    shift P/B right one bit
  }
  Done: (N+M)-bit result in P/B

Bit-Serial Multiplication

  Init: P = 0; Load A,B
  Repeat M times {
    Repeat N times {
      shift A,P:
        Amsb = Alsb
        Pmsb = Plsb + Alsb*Blsb + C/0
    }
    shift P,B: Pmsb = C, Bmsb = Plsb
  }
  (N+M)-bit result in P/B

[Figure: bit-serial datapath with a single FA and carry flip-flop C]
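The shift-and-add algorithm can be sketched as a clocked module; `seq_mul`, its ports and the one-cycle `start` pulse are assumptions for illustration. Each cycle adds A (or 0) to P with one N-bit adder whose output is N+1 bits wide, then shifts {sum, B} right by one bit, so after M cycles {P, B} holds the product.

```verilog
// Hypothetical sequential (shift-and-add) multiplier: one N-bit adder,
// M clock cycles per product. P is the high half, B the low half.
module seq_mul #(parameter N = 8, parameter M = 8) (
  input                clk,
  input                start,    // pulse high for one cycle to load A and B
  input      [N-1:0]   a_in,
  input      [M-1:0]   b_in,
  output     [N+M-1:0] product,  // valid when done = 1
  output reg           done
);
  reg [N-1:0] a, p;
  reg [M-1:0] b;
  reg [$clog2(M+1)-1:0] count;   // cycles remaining

  // N-bit adder with (N+1)-bit output: P + (B_LSB ? A : 0)
  wire [N:0] sum = p + (b[0] ? a : {N{1'b0}});

  always @(posedge clk) begin
    if (start) begin
      a <= a_in; b <= b_in; p <= 0; count <= M; done <= 1'b0;
    end else if (count != 0) begin
      {p, b} <= {sum, b} >> 1;   // shift carry, sum and B right one bit
      count  <= count - 1;
      if (count == 1) done <= 1'b1;
    end
  end

  assign product = {p, b};
endmodule
```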

Useful building block: Carry-Save Adder

[Figure: the unsigned combinational multiplier array (partial products X_jY_i, rows y0..y3, outputs z0..z7) rebuilt from carry-save adder (CSA) rows]

Good for pipelining: delay through each partial product (except the last) is just tPD,AND + tPD,FA. No carry propagation time!
Last stage is still a carry-propagate adder (CPA).
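A W-bit carry-save adder is just a row of independent full adders (3:2 counters). In this sketch (hypothetical module `csa`), no carry moves between bit positions; the invariant is x + y + z = s + 2c.

```verilog
// Hypothetical W-bit carry-save adder: three operands in, two words out.
// Each bit position is an independent full adder, so the delay is one FA
// regardless of W; the carry word has weight 2^(i+1) for bit i.
module csa #(parameter W = 8) (
  input  [W-1:0] x, y, z,
  output [W-1:0] s,   // per-bit sum
  output [W-1:0] c    // per-bit carry (to be added in shifted left by 1)
);
  assign s = x ^ y ^ z;
  assign c = (x & y) | (x & z) | (y & z);
endmodule
```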
Wallace Tree Multiplier

[Figure: Wallace tree* for a four-bit multiplier: layers of CSAs feeding a final CPA]

The full adder in each CSA bit position is called a 3:2 counter by multiplier hackers: it counts the number of 1’s on its 3 inputs and outputs a 2-bit result.
Combine groups of three bits at a time.
Wallace Tree: $O(\log_{1.5} M)$
Higher fan-in adders can be used to further reduce delays for large M. 4:2 compressors and 5:3 counters are popular building blocks.

*Digital Integrated Circuits, J. Rabaey, A. Chandrakasan, B. Nikolic
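Combining carry-save adders gives a reduction tree in the spirit of a Wallace tree. This sketch (hypothetical module `csa_tree_mul4`) reduces four 4-bit partial products to two rows with two levels of 3:2 compression and finishes with a single carry-propagate adder (the final `+`).

```verilog
// Hypothetical 4x4 unsigned multiplier: 4 rows -> 3 -> 2 via 3:2 counters,
// then one carry-propagate add.
module csa_tree_mul4 (
  input  [3:0] a, b,
  output [7:0] p
);
  // partial products, already shifted into 8-bit columns
  wire [7:0] pp0 = {4'b0, a & {4{b[0]}}};
  wire [7:0] pp1 = {3'b0, a & {4{b[1]}}, 1'b0};
  wire [7:0] pp2 = {2'b0, a & {4{b[2]}}, 2'b0};
  wire [7:0] pp3 = {1'b0, a & {4{b[3]}}, 3'b0};

  // level 1: compress pp0, pp1, pp2 (carries move one column left)
  wire [7:0] s1 = pp0 ^ pp1 ^ pp2;
  wire [7:0] c1 = ((pp0 & pp1) | (pp0 & pp2) | (pp1 & pp2)) << 1;

  // level 2: compress s1, c1, pp3
  wire [7:0] s2 = s1 ^ c1 ^ pp3;
  wire [7:0] c2 = ((s1 & c1) | (s1 & pp3) | (c1 & pp3)) << 1;

  // final carry-propagate adder (CPA)
  assign p = s2 + c2;
endmodule
```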
Multiplication by a constant

• If one of the operands is a constant, make it the multiplier (B in the earlier examples). For each “1” bit in the constant we get a partial product (PP) – may be noticeably fewer PPs than in the general case.
  – For example, in general multiplying two 4-bit operands generates four PPs (3 rows of full adders). If the multiplier is, say, 12 (4'b1100), then there are only two PPs: 8*A+4*A (only 1 row of full adders).
  – But lots of “1”s means lots of PPs… can we improve on this?
• If we allow ourselves to subtract PPs as well as adding them (the hardware cost is virtually the same), we can re-encode arbitrarily long contiguous runs of “1” bits in the multiplier to produce just two PPs.
    …011110… = …100000… - …000010… = …010001̄0…
  where 1̄ indicates subtracting a PP instead of adding it. Thus we’ve re-encoded the multiplier using 1, 0, -1 digits – aka canonical signed digit – greatly reducing the number of additions required.

Booth Recoding: Higher-radix mult.

Idea: If we could use, say, 2 bits of the multiplier in generating each partial product we would halve the number of columns and halve the latency of the multiplier!
[Figure: A_{N-1} … A0 × B_{M-1} … B0 with the multiplier bits taken 2 at a time, giving M/2 partial products]

Booth’s insight: rewrite the 2*A and 3*A cases, leave 4A for the next partial product to do!
  B_{K+1,K}*A = 0*A → 0
              = 1*A → A
              = 2*A → 4A – 2A
              = 3*A → 4A – A
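A sketch of the constant-multiplier examples, with a hypothetical module `const_mul`: 12 = 8 + 4 gives two shifted partial products, and the run of ones in 30 = 5'b11110 (four partial products if added directly) is recoded as 32 - 2, one addition term and one subtraction term.

```verilog
// Hypothetical constant multipliers built only from shifts and add/subtract.
module const_mul (
  input  [7:0]  a,
  output [11:0] a_times_12,   // 255 * 12 = 3060 fits in 12 bits
  output [12:0] a_times_30    // 255 * 30 = 7650 fits in 13 bits
);
  assign a_times_12 = (a << 3) + (a << 2);   // 8*A + 4*A: two PPs
  assign a_times_30 = (a << 5) - (a << 1);   // 32*A - 2*A: run of 1s recoded
endmodule
```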
Booth recoding

On-the-fly canonical signed digit encoding!

  current bit pair   from previous bit pair
  B_{K+1}  B_K       B_{K-1}     action
    0       0          0         add 0
    0       0          1         add A
    0       1          0         add A
    0       1          1         add 2*A
    1       0          0         sub 2*A
    1       0          1         sub A    (-2*A+A)
    1       1          0         sub A
    1       1          1         add 0    (-A+A)

A “1” in the B_{K-1} bit means the previous stage needed to add 4*A. Since this stage is shifted by 2 bits with respect to the previous stage, adding 4*A in the previous stage is like adding A in this stage!

Summary

• Performance of arithmetic blocks dictates the performance of a digital system
• Architectural and logic transformations can enable significant speed up (e.g., adder delay from O(N) to O(log2(N)))
• Similar concepts and formulation can be applied at the system level
• Timing analysis is tricky: watch out for false paths!
• Area-Delay trade-offs (serial vs. parallel implementations)
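The Booth recoding table can be sketched as an 8x8 signed multiplier; `booth4_mul8` is a hypothetical name. The multiplier is padded with B[-1] = 0, each overlapping 3-bit group {B[K+1], B[K], B[K-1]} selects 0, ±A or ±2A, and only M/2 = 4 shifted partial products are summed. It is written behaviourally for clarity, not as an optimised array.

```verilog
// Hypothetical 8x8 signed multiplier using radix-4 Booth recoding.
module booth4_mul8 (
  input  signed [7:0]      a,   // multiplicand
  input  signed [7:0]      b,   // multiplier (2's complement)
  output reg signed [15:0] p
);
  wire [8:0] bx = {b, 1'b0};    // append B[-1] = 0
  reg signed [15:0] pp;         // current partial product
  integer k;
  always @* begin
    p = 0;
    for (k = 0; k < 8; k = k + 2) begin
      case (bx[k +: 3])           // {B[k+1], B[k], B[k-1]}
        3'b000, 3'b111: pp = 0;              // add 0
        3'b001, 3'b010: pp = a;              // add A
        3'b011:         pp = a <<< 1;        // add 2*A
        3'b100:         pp = -(a <<< 1);     // sub 2*A
        3'b101, 3'b110: pp = -a;             // sub A
        default:        pp = 0;
      endcase
      p = p + (pp <<< k);         // each stage is shifted 2 bits further
    end
  end
endmodule
```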
Lab 4 Car Alarm - Design Approach
• Read lab/specifications carefully, use reasonable interpretation
• Use modular design – don’t put everything into labkit.v
• Design the FSM!
  – Define the inputs
  – Define the outputs
  – Transition rules
• Logical modules:
  – fsm.v
  – timer.v   // the hardest module!!
  – siren.v
  – fuel_pump.v
• Run simulation on each module!
• Use hex display: show state and time
• Use logic analyzer in Vivado

Car Alarm – Inputs & Outputs
Inputs:
• passenger door switch
• driver door switch
• ignition switch
• hidden switch
• brake pedal switch
Outputs:
• fuel pump power
• status indicator
• siren
Figure 1: System diagram showing sensors (inputs) and actuators (outputs)
Car Alarm – CMOS Implementation
• Design Specs
  – Operating voltage 8-18VDC
  – Operating temp: -10C +65C
  – Altitude: sea level
  – Shock/Vibration
• Notes
  – Protected against 24V power surges
  – CMOS implementation
  – CMOS inputs protected against 200V noise spikes
  – On state DC current <10mA
  – Include T_PASSENGER_DELAY and Fuel Pump Disable
  – System disabled (cloaked) when being serviced.
[Figure labels: Fuel pump relay, Cloaking device]

Debugging Hints – Lab 4
• Implement a warp speed debug mode for the one hz clock. This will allow for viewing signals on the logic analyzer or Modelsim without waiting for 27/25 million clock cycles. Avoids recompilations.

  reg [24:0] count = 0;      // must hold 26_999_999 (needs 25 bits)
  wire warp_speed, one_hz;
  assign warp_speed = sw[6];
  always @ (posedge clk) begin
    if (count == (warp_speed ? 3 : 26_999_999)) count <= 0;
    else count <= count + 1;
  end
  assign one_hz = (count == (warp_speed ? 3 : 26_999_999));

One Hz Ticks in Modelsim

To create a one hz tick, use the following in the Verilog test fixture:

  always #5 clk=!clk;
  always begin
    #5 tick = 1;
    #10 tick = 0;
    #15;
  end

  initial begin
    // Initialize Inputs
    clk = 0;
    tick = 0;
    . . .

For Loops, Repeat Loops in Simulation

  integer i;        // loop index (an integer is the usual choice)
  integer irepeat;

  // this will just wait 10ns, repeated 32x.
  // simulation only! Cannot implement #10 in hardware!
  irepeat = 0;
  repeat(32) begin
    #10;
    irepeat = irepeat + 1;
  end

  // this will wait #10ns before incrementing the for loop
  for (i=0; i<16; i=i+1) begin
    #10;            // wait #10 before increment.
    // @(posedge clk);
    // add to index on posedge
  end
  // other loops: forever, while
Edge Detection

  reg signal_delayed;
  always @(posedge clk)
    signal_delayed <= signal;
  assign rising_edge = signal && !signal_delayed;
  assign falling_edge = !signal && signal_delayed;

Vivado ILA
• Integrated Logic Analyzer (ILA) IP core
  – logic analyzer core that can be used to monitor the internal signals of a design
  – includes many advanced features of modern logic analyzers
    • Boolean trigger equations,
    • edge transition triggers ...
  – no physical probes to hook up!
• Bit file must be loaded on target device. Not simulation.
• Tutorial: http://web.mit.edu/6.111/www/f2017/handouts/labs/ila.html
Student Comments
• “All very reasonable except for lab 4, Car Alarm. Total pain in the ass.”
• “The labs were incredibly useful, interesting, and helpful for learning. Lab 4 (car alarm) is long and difficult, but overall the labs are not unreasonable.”
