ALU and MAC Notes
2.1 OVERVIEW
This chapter describes the architecture and function of the three
computational units: the arithmetic/logic unit, the multiplier/
accumulator and the barrel shifter.
2.1.2 Unsigned
Unsigned binary numbers may be thought of as positive, having nearly
twice the magnitude of a signed number of the same length. The least
significant words of multiple precision numbers are treated as unsigned
numbers.
2 Computational Units
2.1.4 Fractional Representation: 1.15
ADSP-2100 family arithmetic is optimized for numerical values in a
fractional binary format denoted by 1.15 (“one dot fifteen”). In the 1.15
format, there is one sign bit (the MSB) and fifteen fractional bits
representing values from –1 up to one LSB less than +1.
Figure 2.1 shows the bit weighting for 1.15 numbers. Below are examples
of 1.15 numbers and their decimal equivalents.
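The 1.15 interpretation can be sketched in a few lines of Python. This is an illustration only; it assumes the usual twos-complement weighting (the sign bit weighs -1 and the fifteen fractional bits weigh 2^-1 down to 2^-15), and the function names are hypothetical, not part of any ADSP-2100 tool.

```python
def q15_to_float(word):
    """Interpret a 16-bit pattern as a signed 1.15 fraction."""
    word &= 0xFFFF
    if word & 0x8000:           # sign bit set: the value is negative
        word -= 0x10000
    return word / 32768.0       # 2**15 scales the fifteen fractional bits


def float_to_q15(value):
    """Quantize a value to the nearest 1.15 pattern, clamping to the valid range."""
    scaled = int(round(value * 32768.0))
    scaled = max(-32768, min(32767, scaled))
    return scaled & 0xFFFF


if __name__ == "__main__":
    for pattern in (0x0001, 0x7FFF, 0x8000, 0xFFFF):
        print(f"0x{pattern:04X} -> {q15_to_float(pattern):+.6f}")
```

The largest positive pattern, 0x7FFF, is one LSB less than +1, and 0x8000 is exactly -1.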
2.1.6 MAC Arithmetic
The multiplier produces results that are binary strings. The inputs are
“interpreted” according to the information given in the instruction itself
(signed times signed, unsigned times unsigned, a mixture, or a rounding
operation). The 32-bit result from the multiplier is assumed to be signed,
in that it is sign-extended across the full 40-bit width of the MR register
set.
When the processor multiplies two 1.15 operands, the result is a 2.30
(2 sign bits, 30 fractional bits) number. In the fractional mode, the MAC
automatically shifts the multiplier product (P) left one bit before
transferring the result to the multiplier result register (MR). This shift
causes the multiplier result to be in 1.31 format, which can be rounded to
1.15 format. Figure 2.7, in the MAC section of this chapter, shows this.
In the integer mode, the left shift does not occur. For example, if the
operands are in the 16.0 format, the 32-bit multiplier result would be in
32.0 format. A left shift is not needed; it would change the numerical
representation. Figure 2.8 in the MAC section of this chapter shows this.
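The difference between the two modes can be illustrated with a short Python model. It is a sketch that assumes a signed-times-signed multiply and models only the product placement described above; `to_signed` and `mac_product` are hypothetical helper names.

```python
def to_signed(value, bits):
    """Reinterpret an unsigned bit pattern as a twos-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac_product(x, y, fractional=True):
    """Return the 40-bit MR pattern for a signed x signed 16-bit multiply."""
    p = to_signed(x, 16) * to_signed(y, 16)   # 32-bit product (2.30 for 1.15 inputs)
    if fractional:
        p <<= 1                               # 2.30 -> 1.31: drop the redundant sign bit
    return p & ((1 << 40) - 1)                # sign-extended across 40 bits


if __name__ == "__main__":
    mr = mac_product(0x4000, 0x4000)          # 0.5 * 0.5 in 1.15
    print(f"fractional MR = 0x{mr:010X}")     # upper word 0x2000 is 0.25 in 1.15
    print(f"integer MR    = 0x{mac_product(3, 5, fractional=False):010X}")
```

In fractional mode the upper 16 bits of the 32-bit result are already a 1.15 value; in integer mode the product keeps its 32.0 weighting.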
2.1.8 Summary
Table 2.1 summarizes some of the arithmetic characteristics of ADSP-2100
family operations. In addition to the numeric types described in this
section, the ADSP-2100 Family C Compiler supports a form of 32-bit
floating-point in which one 16-bit word is the exponent and the other
word is the mantissa. See the ADSP-2100 Family C Tools Manual.
Operation            Operand Formats                     Result Formats
MAC, Fractional
Multiplication (P)   1.15, explicitly signed/unsigned    32 bits (2.30)
Multiplication (MR)  1.15, explicitly signed/unsigned    2.30 shifted to 1.31
Mult / Add           1.15, explicitly signed/unsigned    2.30 shifted to 1.31
Mult / Subtract      1.15, explicitly signed/unsigned    2.30 shifted to 1.31
MAC Saturation       Signed                              Same as operands
Shifter
2.2 ARITHMETIC/LOGIC UNIT (ALU)
The arithmetic/logic unit (ALU) provides a standard set of arithmetic and
logical functions. The arithmetic functions are add, subtract, negate,
increment, decrement and absolute value. These are supplemented by two
division primitives with which multiple cycle division can be constructed.
The logic functions are AND, OR, XOR (exclusive OR) and NOT.
The ALU is 16 bits wide with two 16-bit input ports, X and Y, and one
output port, R. The ALU accepts a carry-in signal (CI) which is the carry
bit from the processor arithmetic status register (ASTAT). The ALU
generates six status signals: the zero (AZ) status, the negative (AN) status,
the carry (AC) status, the overflow (AV) status, the X-input sign (AS)
status, and the quotient (AQ) status. All arithmetic status signals are
latched into the arithmetic status register (ASTAT) at the end of the cycle.
Please see the “Instruction Set Reference” chapter of this manual for
information on how each instruction affects the ALU flags.
The X input port of the ALU can accept data from two sources: the AX
register file or the result (R) bus. The R bus connects the output registers of
all the computational units, permitting them to be used as input operands
directly. The AX register file is dedicated to the X input port and consists
of two registers, AX0 and AX1. These AX registers are readable and
writable from the DMD bus. The instruction set also provides for reading
these registers over the PMD bus, but there is no direct connection; this
operation uses the DMD-PMD bus exchange unit. The AX register file
outputs are dual-ported so that one register can provide input to the ALU
while either one simultaneously drives the DMD bus.
The Y input port of the ALU can also accept data from two sources: the
AY register file and the ALU feedback (AF) register. The AY register file is
dedicated to the Y input port and consists of two registers, AY0 and AY1.
These registers are readable and writable from the DMD bus and writable
from the PMD bus. The instruction set also provides for reading these
registers over the PMD bus, but there is no direct connection; this
operation uses the DMD-PMD bus exchange unit. The AY register file
outputs are also dual-ported: one AY register can provide input to the
ALU while either one simultaneously drives the DMD bus.
The output of the ALU is loaded into either the ALU feedback (AF)
register or the ALU result (AR) register. The AF register is an ALU
internal register which allows the ALU result to be used directly as the
ALU Y input. The AR register can drive both the DMD bus and the R bus.
It is also loadable directly from the DMD bus. The instruction set also
provides for reading AR over the PMD bus, but there is no direct
connection; this operation uses the DMD-PMD bus exchange unit.
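[Figure 2.2: ALU block diagram. The 16-bit DMD bus and the upper 16 bits of
the 24-bit PMD bus feed the AX and AY register files (2 x 16 each).
Multiplexers select the X and Y inputs, the ALU output R is loaded into the
AF or AR register, and AR drives the 16-bit R bus. Status outputs: AZ, AN,
AC, AV, AS, AQ. Carry input: CI.]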
The ALU contains a duplicate bank of registers, shown in Figure 2.2 behind the
primary registers. There are actually two sets of AR, AF, AX, and AY register
files. Only one bank is accessible at a time. The additional bank of registers can
be activated (such as during an interrupt service routine) for extremely fast
context switching. A new task, like an interrupt service routine, can be
executed without transferring current states to storage.
2.2.3 ALU Input/Output Registers
The sources of ALU input and output registers are shown below.
MR0, MR1 and MR2 are multiplier/accumulator result registers; SR0 and
SR1 are shifter result registers.
When the ALU saturation mode is used, only the AR register saturates; if
the AF register is the destination, wrap-around will occur but the flags
will reflect the saturated result.
2.2.7 Division
The ALU supports division. The divide function is achieved with
additional shift circuitry not shown in Figure 2.2. Division is accomplished
with two special divide primitives. These are used to implement a non-
restoring conditional add-subtract division algorithm. The division can be
either signed or unsigned; however, the dividend and divisor must both
be of the same type. Appendix B details various exceptions to the normal
division operation as described in this section.
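[Figure 2.3: sign divide primitive. Labels recoverable from the diagram:
upper dividend (AY1 or AF), lower dividend (AY0, shifted left), divisor
(AX0, AX1 or R bus), dividend MSB and divisor MSB forming AQ, ALU operation
R = PASS Y.]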
divisor MSB and the ALU output MSB, and the quotient bit is this value
inverted. The quotient bit is loaded into the LSB of the AY0 register which is
also shifted left by one bit. The DIVQ operation is illustrated in Figure 2.4.
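[Figure 2.4: DIVQ operation. Labels recoverable from the diagram: partial
remainder (AF), lower dividend (AY0, shifted left), divisor (AX0, AX1 or
R bus), divisor MSB and ALU output MSB forming AQ, ALU operation
R = Y+X if AQ = 1, R = Y-X if AQ = 0.]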
Some format manipulation may be necessary to guarantee the validity of
the quotient. For example, if both operands are signed and fully fractional
(dividend in 1.31 format and divisor in 1.15 format) the result is fully
fractional (in 1.15 format) and therefore the dividend must be smaller than
the divisor for a valid result.
To divide two integers (dividend in 32.0 format and divisor in 16.0 format)
and produce an integer quotient (in 16.0 format), you must shift the
dividend one bit to the left (into 31.1 format) before dividing. Additional
discussion and code examples can be found in the handbook Digital Signal
Processing Applications Using the ADSP-2100 Family, Volume 1.
Dividend   BBBBB.BBBBBBBBBBBBBBBBBBBBBBBBBBB    (NL bits . NR bits)
Divisor    BB.BBBBBBBBBBBBBB                    (DL bits . DR bits)
Quotient   BBBB.BBBBBBBBBBBB
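The quotient format can be predicted from the operand formats. The sketch below assumes the relationship suggested by the example above (a 5.27 dividend and a 2.14 divisor giving a 4.12 quotient): the quotient has NL - DL + 1 bits to the left of the binary point and NR - DR - 1 bits to the right. The function name is hypothetical.

```python
def quotient_format(nl, nr, dl, dr):
    """Return (left_bits, right_bits) of the quotient for an NL.NR dividend
    and a DL.DR divisor. The rule used here is inferred from the format
    example in the text; treat it as an assumption."""
    if nl + nr != 32 or dl + dr != 16:
        raise ValueError("expected a 32-bit dividend and a 16-bit divisor")
    return nl - dl + 1, nr - dr - 1


if __name__ == "__main__":
    print(quotient_format(5, 27, 2, 14))    # the example shown above
    print(quotient_format(1, 31, 1, 15))    # fully fractional operands
    print(quotient_format(31, 1, 16, 0))    # integer divide after the one-bit left shift
```

Under this rule a 32.0 dividend and a 16.0 divisor would give a 17.-1 quotient, which does not fit in 16 bits; this is consistent with the requirement to shift the dividend into 31.1 format before an integer divide.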
2.2.8 ALU Status
The ALU status bits in the ASTAT register are defined below. Complete
information about the ASTAT register and specific bit mnemonics and
positions is provided in the Program Control chapter.
2.3 MULTIPLIER/ACCUMULATOR (MAC)
The multiplier has two 16-bit input ports X and Y, and a 32-bit product output
port P. The 32-bit product is passed to a 40-bit adder/subtracter which adds
or subtracts the new product from the content of the multiplier result (MR)
register, or passes the new product directly to MR. The MR register is 40 bits
wide. In this manual, we refer to the entire register as MR. The register
actually consists of three smaller registers: MR0 and MR1 which are 16 bits
wide and MR2 which is 8 bits wide.
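[Figure 2.6: MAC block diagram. The 16-bit DMD bus and the upper 16 bits of
the 24-bit PMD bus feed the MX and MY register files (2 x 16 each).
Multiplexers select the X and Y inputs of the multiplier, whose 32-bit
product P feeds the 40-bit add/subtract unit. The result is loaded into
the MF register (16 bits) or the MR register (MR2, 8 bits; MR1 and MR0,
16 bits each), which drives the 16-bit R bus. Status output: MV.]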
The input/output registers of the MAC are similar to those of the ALU.
The X input port can accept data from either the MX register file or from
any register on the result (R) bus. The R bus connects the output registers
of all the computational units, permitting them to be used as input
operands directly. There are two registers in the MX register file, MX0 and
MX1. These registers can be read and written from the DMD bus. The MX
register file outputs are dual-ported so that one register can provide input
to the multiplier while either one simultaneously drives the DMD bus.
The Y input port can accept data from either the MY register file or the MF
register. The MY register file has two registers, MY0 and MY1; these
registers can be read and written from the DMD bus and written from the
PMD bus. The instruction set also provides for reading these registers over
the PMD bus, but there is no direct connection; this operation uses the
DMD-PMD bus exchange unit. The MY register file outputs are also dual-
ported so that one register can provide input to the multiplier while either
one simultaneously drives the DMD bus.
Any of the registers associated with the MAC can be both read and
written in the same cycle. Registers are read at the beginning of the cycle
and written at the end of the cycle. A register read, therefore, reads the
value loaded at the end of a previous cycle. A new value written to a
register cannot be read out until a subsequent cycle. This allows an input
register to provide an operand to the MAC at the beginning of the cycle
and be updated with the next operand from memory at the end of the
same cycle. It also allows a result register to be stored in memory and
updated with a new result in the same cycle. See the discussion of
“Multifunction Instructions” in Chapter 15 “Instruction Set Reference” for
an illustration of this same-cycle read and write.
The MAC contains a duplicate bank of registers, shown in Figure 2.6
behind the primary registers. There are actually two sets of MR, MF, MX,
and MY register files. Only one bank is accessible at a time. The additional
bank of registers can be activated for extremely fast context switching. A
new task, such as an interrupt service routine, can be executed without
transferring current states to storage.
The ADSP-2100 family provides two modes for the standard multiply/
accumulate function: fractional mode for fractional numbers (1.15), and
integer mode for integers (16.0).
In the fractional mode, the 32-bit P output is format adjusted, that is, sign-
extended and shifted one bit to the left before being added to MR. For
example, bit 31 of P lines up with bit 32 of MR (which is bit 0 of MR2) and
bit 0 of P lines up with bit 1 of MR (which is bit 1 of MR0). The LSB is zero-
filled. The fractional multiplier result format is shown in Figure 2.7.
In the integer mode, the 32-bit P register is not shifted before being added
to MR. Figure 2.8 shows the integer-mode result placement.
The mode is selected by bit 4 of the mode status register (MSTAT). If this
bit is a 1, the integer mode is selected. Otherwise, the fractional mode is
selected. In either mode, the multiplier output P is fed into a 40-bit adder/
subtracter which adds the new product to, or subtracts it from, the current
contents of the MR register to form the final 40-bit result R.
[Figure 2.7: fractional multiplier result format]
MR2 bits 7-0     <-  P bit 31 (sign, repeated)
MR1 bits 15-0    <-  P bits 30-15
MR0 bits 15-1    <-  P bits 14-0
MR0 bit 0        <-  0 (zero fill)

[Figure 2.8: integer multiplier result format]
MR2 bits 7-0     <-  P bit 31 (sign extension)
MR1 bits 15-0    <-  P bits 31-16
MR0 bits 15-0    <-  P bits 15-0
2.3.2.2 Input Formats
To facilitate multiprecision multiplications, the multiplier accepts X and Y
inputs represented in any combination of signed twos-complement format
and unsigned format.
X input Y input
signed x signed
unsigned x signed
signed x unsigned
unsigned x unsigned
The input formats are specified as part of the instruction. These are
dynamically selectable each time the multiplier is used.
The (signed x signed) mode is used when multiplying two signed single
precision numbers or the two upper portions of two signed multiprecision
numbers.
The (unsigned x signed) and (signed x unsigned) modes are used when
multiplying the upper portion of a signed multiprecision number with the
lower portion of another or when multiplying a signed single precision
number by an unsigned single precision number.
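As an illustration of why the mixed input formats matter, the following Python sketch builds a signed 32 x 32-bit product from four 16 x 16-bit partial products. It is a model of the arithmetic only, not of the instruction sequence; the helper names are hypothetical, and the use of (unsigned x unsigned) for the two lower portions is an extension of the cases listed above.

```python
def signed16(word):
    """Interpret the low 16 bits of word as a twos-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mul32x32(a, b):
    """Signed 32 x 32 -> 64-bit product from four 16 x 16 partial products."""
    a_hi, a_lo = signed16(a >> 16), a & 0xFFFF   # upper half signed, lower half unsigned
    b_hi, b_lo = signed16(b >> 16), b & 0xFFFF
    total = a_lo * b_lo                          # unsigned x unsigned
    total += (a_hi * b_lo) << 16                 # signed x unsigned
    total += (a_lo * b_hi) << 16                 # unsigned x signed
    total += (a_hi * b_hi) << 32                 # signed x signed
    return total


if __name__ == "__main__":
    a, b = -123456789, 987654321
    print(mul32x32(a, b), a * b)
```

Only the upper portions carry sign information, so only they are treated as signed; the lower portions are plain magnitude bits.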
The 8-bit MR2 register is tied to the lower 8 bits of these buses. When MR2
is output onto the DMD bus or the R bus, it is sign extended to form a 16-
bit value. MR1 also has an automatic sign-extend capability. When MR1 is
loaded from the DMD bus, every bit in MR2 will be set to the sign bit
(MSB) of MR1, so that MR2 appears as an extension of MR1. To load the
MR2 register with a value other than MR1’s sign extension, you must load
MR2 after MR1 has been loaded. Loading MR0 affects neither MR1 nor
MR2; no sign extension occurs in MR0 loads.
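The load behavior described above can be modeled with a small Python class. This is a toy model under the assumptions stated in the text (MR1 loads overwrite MR2 with the sign extension, MR0 loads affect nothing else); the class and method names are hypothetical.

```python
class MRRegister:
    """Toy model of the 40-bit MR register: MR2 (8 bits), MR1 and MR0 (16 bits each)."""

    def __init__(self):
        self.mr2 = self.mr1 = self.mr0 = 0

    def load_mr0(self, word):
        self.mr0 = word & 0xFFFF                          # no sign extension on MR0 loads

    def load_mr1(self, word):
        self.mr1 = word & 0xFFFF
        self.mr2 = 0xFF if self.mr1 & 0x8000 else 0x00    # MR2 follows the sign bit of MR1

    def load_mr2(self, word):
        self.mr2 = word & 0xFF                            # load after MR1 to keep this value

    def read_mr2(self):
        """MR2 as seen on the DMD or R bus: sign-extended to 16 bits."""
        return (self.mr2 | 0xFF00) if self.mr2 & 0x80 else self.mr2

    def value(self):
        return (self.mr2 << 32) | (self.mr1 << 16) | self.mr0


if __name__ == "__main__":
    mr = MRRegister()
    mr.load_mr1(0x8000)
    print(f"MR2 after MR1 load: 0x{mr.mr2:02X}")
    mr.load_mr2(0x01)
    print(f"MR after MR2 load:  0x{mr.value():010X}")
```

The load order matters: loading MR2 first and MR1 second would leave MR2 holding the sign extension of MR1.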
Overflowing beyond the MSB of MR2 should never be allowed. The true
sign bit of the result is then irretrievably lost and saturation may not
produce a correct value. It takes more than 255 overflows (MV type) to
reach this state, however.
2.3.2.6 Rounding Mode
The accumulator has the capability for rounding the 40-bit result R at the
boundary between bit 15 and bit 16. Rounding can be specified as part of
the instruction code. The rounded output is directed to either MR or MF.
When rounding is invoked with MF as the output register, register
contents in MF represent the rounded 16-bit result. Similarly, when MR is
selected as the output, MR1 contains the rounded 16-bit result; the
rounding effect in MR1 affects MR2 as well and MR2 and MR1 represent
the rounded 24-bit result.
Using x to represent any bit pattern (not all zeros), here are two examples
of rounding. The first example is the typical rounding operation.
Bit 15 = 1
Add 1 to bit 15 and carry 1
The compensation to avoid net bias becomes visible when the lower 15
bits are all zero and bit 15 is one, i.e. the midpoint value.
Example 2 MR2 MR1 MR0
In this last case, bit 16 is forced to zero. This algorithm is employed on every
rounding operation, but is only evident when the bit patterns shown in the
lower 16 bits of the last example are present.
The biased rounding mode has an effect only when the MR0 register contains
0x8000; all other rounding operations work normally. This mode allows more
efficient implementation of bit-specified algorithms that use biased
rounding, for example the GSM speech compression routines. Unbiased rounding
is preferred for most algorithms.
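The two rounding behaviors can be compared with a short Python sketch. It assumes the mechanism described above (add 1 at bit 15 and, in the unbiased case, force bit 16 to zero at the midpoint) and returns only the upper 24 bits (MR2 and MR1), since the text does not specify the state of MR0 after rounding. The function name is hypothetical.

```python
MASK40 = (1 << 40) - 1


def round_mr(mr, biased=False):
    """Round a 40-bit accumulator pattern at the bit 15/16 boundary.

    Returns the upper 24 bits (MR2:MR1) of the rounded result."""
    mr &= MASK40
    rounded = (mr + 0x8000) & MASK40          # add 1 at bit 15 and let the carry ripple
    if not biased and (mr & 0xFFFF) == 0x8000:
        rounded &= ~(1 << 16)                 # midpoint: force bit 16 to zero
    return rounded >> 16


if __name__ == "__main__":
    for mr in (0x00_0025_8001, 0x00_0025_8000, 0x00_0026_8000):
        print(f"0x{mr:010X}: unbiased 0x{round_mr(mr):06X}, "
              f"biased 0x{round_mr(mr, biased=True):06X}")
```

The two modes differ only for the midpoint pattern with an even upper word, such as the last input in the usage loop.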
2.4 BARREL SHIFTER
The shifter provides a complete set of shifting functions for 16-bit inputs,
yielding a 32-bit output. These include arithmetic shift, logical shift and
normalization. The shifter also performs derivation of exponent and
derivation of common exponent for an entire block of numbers. These
basic functions can be combined to efficiently implement any degree of
numerical format control, including full floating-point representation.
The shifter array is a 16x32 barrel shifter. It accepts a 16-bit input and can
place it anywhere in the 32-bit output field, from off-scale right to off-scale
left, in a single cycle. This gives 49 possible placements within the 32-bit
field. The placement of the 16 input bits is determined by a control code
(C) and a HI/LO reference signal.
The shifter array and its associated logic are surrounded by a set of
registers. The shifter input (SI) register provides input to the shifter array
and the exponent detector. The SI register is 16 bits wide and is readable
and writable from the DMD bus. The shifter array and the exponent
detector also take as inputs AR, SR or MR via the R bus. The shifter result
(SR) register is 32 bits wide and is divided into two 16-bit sections, SR0
and SR1. The SR0 and SR1 registers can be loaded from the DMD bus and
output to either the DMD bus or the R bus. The SR register is also fed back
to the OR/PASS logic to allow double-precision shift operations.
The SE register (“shifter exponent”) is 8 bits wide and holds the exponent
during the normalize and denormalize operations. The SE register is
loadable and readable from the lower 8 bits of the DMD bus. It is a twos-
complement, 8.0 value.
Whenever the SE or SB registers are output onto the DMD bus, they are
sign-extended to form a 16-bit value.
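[Figure 2.9: shifter block diagram. The 16-bit DMD bus loads the SI, SE and
SB registers. The SI register or the R bus feeds the shifter array and the
exponent detector; the exponent detector and compare logic update SE and SB.
The shifter array takes the input (I), extension bit (X), HI/LO reference
(R) and 8-bit control code (C, from SE, negated SE or the instruction) and
produces a 32-bit output (O), which passes through the OR/PASS logic into
SR1 and SR0 (16 bits each). SR drives the 16-bit R bus. Status output: SS.]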
Any of the SI, SE or SR registers can be read and written in the same cycle.
Registers are read at the beginning of the cycle and written at the end of
the cycle. All register reads, therefore, read values loaded at the end of a
previous cycle. A new value written to a register cannot be read out until a
subsequent cycle. This allows an input register to provide an operand to
the shifter at the beginning of the cycle and be updated with the next
operand at the end of the same cycle. It also allows a result register to be
stored in memory and updated with a new result in the same cycle. See
the discussion of “Multifunction Instructions” in Chapter 15, “Instruction
Set Reference” for an illustration of this same-cycle read and write.
The shifter contains a duplicate bank of registers, shown in Figure 2.9
behind the primary registers. There are actually two sets of SE, SB, SI, SR1,
and SR0 registers. Only one bank is accessible at a time. The additional
bank of registers can be activated for extremely fast context switching. A
new task, such as an interrupt service routine, can then be executed
without transferring current states to storage.
The shifting of the input is determined by a control code (C) and a HI/LO
reference signal. The control code is an 8-bit signed value which indicates
the direction and number of places the input is to be shifted. Positive
codes indicate a left shift (upshift) and negative codes indicate a right shift
(downshift). The control code can come from three sources: the content of
the shifter exponent (SE) register, the negated content of the SE register or
an immediate value from the instruction.
The HI/LO signal determines the reference point for the shifting. In the HI
state, all shifts are referenced to SR1 (the upper half of the output field),
and in the LO state, all shifts are referenced to SR0 (the lower half). The
HI/LO reference feature is useful when shifting 32-bit values since it
allows both halves of the number to be shifted with the same control code.
The HI/LO reference signal is selectable each time the shifter is used.
The shifter fills any bits to the right of the input value in the output field
with zeros, and bits to the left are filled with the extension bit (X). The
extension bit can be fed by three possible sources depending on the
instruction being performed. The three sources are the MSB of the input,
the AC bit from the arithmetic status register (ASTAT) or a zero.
Table 2.4 shows the shifter array output as a function of the control code
and HI/LO signal.
ABCDEFGHIJKLMNPR represents the 16-bit input pattern; X stands for the
extension bit. The first two columns give the control code for the HI and
LO references; the remaining columns give the 32-bit shifter array output.
HI reference LO reference Shifter Array Output
+16 to +127 +32 to +127 00000000 00000000 00000000 00000000
+15 +31 R0000000 00000000 00000000 00000000
+14 +30 PR000000 00000000 00000000 00000000
+13 +29 NPR00000 00000000 00000000 00000000
+12 +28 MNPR0000 00000000 00000000 00000000
+11 +27 LMNPR000 00000000 00000000 00000000
+10 +26 KLMNPR00 00000000 00000000 00000000
+9 +25 JKLMNPR0 00000000 00000000 00000000
+8 +24 IJKLMNPR 00000000 00000000 00000000
+7 +23 HIJKLMNP R0000000 00000000 00000000
+6 +22 GHIJKLMN PR000000 00000000 00000000
+5 +21 FGHIJKLM NPR00000 00000000 00000000
+4 +20 EFGHIJKL MNPR0000 00000000 00000000
+3 +19 DEFGHIJK LMNPR000 00000000 00000000
+2 +18 CDEFGHIJ KLMNPR00 00000000 00000000
+1 +17 BCDEFGHI JKLMNPR0 00000000 00000000
0 +16 ABCDEFGH IJKLMNPR 00000000 00000000
-1 +15 XABCDEFG HIJKLMNP R0000000 00000000
-2 +14 XXABCDEF GHIJKLMN PR000000 00000000
-3 +13 XXXABCDE FGHIJKLM NPR00000 00000000
-4 +12 XXXXABCD EFGHIJKL MNPR0000 00000000
-5 +11 XXXXXABC DEFGHIJK LMNPR000 00000000
-6 +10 XXXXXXAB CDEFGHIJ KLMNPR00 00000000
-7 +9 XXXXXXXA BCDEFGHI JKLMNPR0 00000000
-8 +8 XXXXXXXX ABCDEFGH IJKLMNPR 00000000
-9 +7 XXXXXXXX XABCDEFG HIJKLMNP R0000000
-10 +6 XXXXXXXX XXABCDEF GHIJKLMN PR000000
-11 +5 XXXXXXXX XXXABCDE FGHIJKLM NPR00000
-12 +4 XXXXXXXX XXXXABCD EFGHIJKL MNPR0000
-13 +3 XXXXXXXX XXXXXABC DEFGHIJK LMNPR000
-14 +2 XXXXXXXX XXXXXXAB CDEFGHIJ KLMNPR00
-15 +1 XXXXXXXX XXXXXXXA BCDEFGHI JKLMNPR0
-16 0 XXXXXXXX XXXXXXXX ABCDEFGH IJKLMNPR
-17 -1 XXXXXXXX XXXXXXXX XABCDEFG HIJKLMNP
-18 -2 XXXXXXXX XXXXXXXX XXABCDEF GHIJKLMN
-19 -3 XXXXXXXX XXXXXXXX XXXABCDE FGHIJKLM
-20 -4 XXXXXXXX XXXXXXXX XXXXABCD EFGHIJKL
-21 -5 XXXXXXXX XXXXXXXX XXXXXABC DEFGHIJK
-22 -6 XXXXXXXX XXXXXXXX XXXXXXAB CDEFGHIJ
-23 -7 XXXXXXXX XXXXXXXX XXXXXXXA BCDEFGHI
-24 -8 XXXXXXXX XXXXXXXX XXXXXXXX ABCDEFGH
-25 -9 XXXXXXXX XXXXXXXX XXXXXXXX XABCDEFG
-26 -10 XXXXXXXX XXXXXXXX XXXXXXXX XXABCDEF
-27 -11 XXXXXXXX XXXXXXXX XXXXXXXX XXXABCDE
-28 -12 XXXXXXXX XXXXXXXX XXXXXXXX XXXXABCD
-29 -13 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXABC
-30 -14 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXAB
-31 -15 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXA
-32 to -128 -16 to -128 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
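The table above can be reproduced with a small Python model of the shifter array. This is a sketch under the assumption that the HI reference is equivalent to adding 16 to the control code; the function name and the `ext` parameter (the extension bit X) are hypothetical.

```python
def shifter_array(word, code, hi=True, ext=0):
    """Place a 16-bit input in the 32-bit output field.

    code is the signed control code, hi selects the HI or LO reference and
    ext is the extension bit that fills positions to the left of the input."""
    shift = code + (16 if hi else 0)      # HI references the upper half of the field
    value = word & 0xFFFF
    if ext:
        value |= ~0xFFFF                  # ones above bit 15 model the extension bit
    value = value << shift if shift >= 0 else value >> -shift
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    for code in (0, -1, -16, -32):
        out = shifter_array(0x8001, code, hi=True, ext=1)
        print(f"HI {code:+4d}: {out:032b}")
```

A control code of 0 with the HI reference places the input in the upper half of the field, and a code of -32 shifts it completely off scale, leaving only extension bits.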
The exponent detector derives an exponent for the shifter input value. The
exponent detector operates in one of three ways which determine how the
input value is interpreted. In the HI state, the input is interpreted as a
single precision number or the upper half of a double precision number.
The exponent detector determines the number of leading sign bits and
produces a code which indicates how many places the input must be
upshifted to eliminate all but one of the sign bits. The code is negative
so that it can become the effective exponent for the mantissa formed by
removing the redundant sign bits.
The exponent compare logic is used to find the largest exponent value in
an array of shifter input values. The exponent compare logic in
conjunction with the exponent detector derives a block exponent. The
comparator compares the exponent value derived by the exponent
detector with the value stored in the shifter block exponent (SB) register
and updates the SB register only when the derived exponent value is
larger than the value in SB register. See the examples shown in the
following sections.
S = Sign bit
N = Non-sign bit
D = Don’t care bit
HI Mode                           HIX Mode
Shifter Array Input   Output      AV   Shifter Array Input   Output
                                  1    DDDDDDDD DDDDDDDD     +1
SNDDDDDD DDDDDDDD     0           0    SNDDDDDD DDDDDDDD     0
SSNDDDDD DDDDDDDD     -1          0    SSNDDDDD DDDDDDDD     -1
SSSNDDDD DDDDDDDD     -2          0    SSSNDDDD DDDDDDDD     -2
SSSSNDDD DDDDDDDD     -3          0    SSSSNDDD DDDDDDDD     -3
SSSSSNDD DDDDDDDD     -4          0    SSSSSNDD DDDDDDDD     -4
SSSSSSND DDDDDDDD     -5          0    SSSSSSND DDDDDDDD     -5
SSSSSSSN DDDDDDDD     -6          0    SSSSSSSN DDDDDDDD     -6
SSSSSSSS NDDDDDDD     -7          0    SSSSSSSS NDDDDDDD     -7
SSSSSSSS SNDDDDDD     -8          0    SSSSSSSS SNDDDDDD     -8
SSSSSSSS SSNDDDDD     -9          0    SSSSSSSS SSNDDDDD     -9
SSSSSSSS SSSNDDDD     -10         0    SSSSSSSS SSSNDDDD     -10
SSSSSSSS SSSSNDDD     -11         0    SSSSSSSS SSSSNDDD     -11
SSSSSSSS SSSSSNDD     -12         0    SSSSSSSS SSSSSNDD     -12
SSSSSSSS SSSSSSND     -13         0    SSSSSSSS SSSSSSND     -13
SSSSSSSS SSSSSSSN     -14         0    SSSSSSSS SSSSSSSN     -14
SSSSSSSS SSSSSSSS     -15         0    SSSSSSSS SSSSSSSS     -15
LO Mode
SS Shifter Array Input Output
S NDDDDDDD DDDDDDDD -15
S SNDDDDDD DDDDDDDD -16
S SSNDDDDD DDDDDDDD -17
S SSSNDDDD DDDDDDDD -18
S SSSSNDDD DDDDDDDD -19
S SSSSSNDD DDDDDDDD -20
S SSSSSSND DDDDDDDD -21
S SSSSSSSN DDDDDDDD -22
S SSSSSSSS NDDDDDDD -23
S SSSSSSSS SNDDDDDD -24
S SSSSSSSS SSNDDDDD -25
S SSSSSSSS SSSNDDDD -26
S SSSSSSSS SSSSNDDD -27
S SSSSSSSS SSSSSNDD -28
S SSSSSSSS SSSSSSND -29
S SSSSSSSS SSSSSSSN -30
S SSSSSSSS SSSSSSSS -31
Table 2.5 Exponent Detector Characteristics
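The three columns of Table 2.5 can be expressed as one Python function. This is a sketch that follows the table directly; the function names, the `mode` strings and the `av` and `ss` parameters are hypothetical stand-ins for the detector state and the AV and SS status bits.

```python
def leading_run(word, bit_value):
    """Count how many leading bits of a 16-bit word equal bit_value."""
    run = 0
    for pos in range(15, -1, -1):
        if ((word >> pos) & 1) != bit_value:
            break
        run += 1
    return run


def exp_detect(word, mode="HI", av=0, ss=0):
    """Return the exponent code of Table 2.5 for a 16-bit input."""
    word &= 0xFFFF
    if mode == "HIX" and av:
        return 1                                  # overflowed result: one-bit downshift
    if mode in ("HI", "HIX"):
        sign = word >> 15
        return -(leading_run(word, sign) - 1)     # keep one sign bit
    if mode == "LO":
        return -(15 + leading_run(word, ss))      # continues the count from the upper word
    raise ValueError("mode must be 'HI', 'HIX' or 'LO'")


if __name__ == "__main__":
    print(exp_detect(0b1111010110110001))              # three redundant sign bits
    print(exp_detect(0xFA32, mode="HIX", av=1))
    print(exp_detect(0xFFFF, mode="LO", ss=1))
```

In LO mode the comparison is against the sign of the upper word, which is why `ss` is passed in rather than taken from the lower word itself.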
2.4.2 Shifter Operations
The shifter performs the following functions (instruction mnemonics
shown in parentheses):
The shift functions (arithmetic shift, logical shift, and normalize) can be
optionally specified with [SR OR] and HI/LO modes to facilitate
multiprecision operations. [SR OR] logically ORs the shift result with the
current contents of SR. This option is used to join two 16-bit quantities into
a 32-bit value in SR. When [SR OR] is not used, the shift value is passed
through to SR directly. The HI and LO modifiers reference the shift to the
upper or lower half of the 32-bit SR register. These shift functions take
inputs from either the SI register or any other result register and load the
32-bit shifted result into the SR register.
2.4.2.2 Derive Block Exponent
This function detects the exponent of the number largest in magnitude in
an array of numbers. The EXPADJ instruction performs this function. The
sequence of steps for a typical example is shown below.
The SB register is used to contain the exponent for the entire block. The
possible values at the conclusion of a series of EXPADJ operations range
from –15 to 0. The exponent compare logic updates the SB register if the
new value is greater than the current value. Loading the register with –16
initializes it to a value certain to be less than any actual exponents
detected.
First array element: Exponent = –3; –3 > SB (–16), so SB gets –3.
Next array element: Exponent = –6; –6 < –3, so SB remains –3.
When and if an array element is found whose exponent is greater than SB,
that value is loaded into SB. When all array elements have been processed,
the SB register contains the exponent of the largest number in the entire
block. No normalization is performed. EXPADJ is purely an inspection
operation. The value in SB could be transferred to SE and used to
normalize the block on the next pass through the shifter. Or it could be
simply associated with that data for subsequent interpretation.
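The block exponent derivation can be sketched in Python as a running maximum. The sketch assumes HI-mode exponent detection for each element; the function names are hypothetical, and the two input words in the usage example are illustrative values chosen to produce exponents of -3 and -6.

```python
def exponent_hi(word):
    """Exponent code of a 16-bit twos-complement word (0 down to -15)."""
    value = word & 0xFFFF
    if value & 0x8000:
        value -= 0x10000
    # Bits needed for the magnitude; the remaining leading bits are sign bits.
    magnitude_bits = (value if value >= 0 else ~value).bit_length()
    return magnitude_bits - 15


def block_exponent(words):
    """Model a series of EXPADJ operations over an array of 16-bit words."""
    sb = -16                          # initial value, below any real exponent
    for word in words:
        exponent = exponent_hi(word)
        if exponent > sb:             # SB is updated only by a larger exponent
            sb = exponent
    return sb


if __name__ == "__main__":
    print(block_exponent([0xF5B1, 0x0176]))
```

As in the text, no element is modified; the function only inspects the data and returns the value that SB would hold.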
2.4.2.3 Immediate Shifts
An immediate shift simply shifts the input bit pattern to the right
(downshift) or left (upshift) by a given number of bits. Immediate shift
instructions use the data value in the instruction itself to control the
amount and direction of the shifting operation. (See the chapter
“Instruction Set Overview” for an example of this instruction.) The data
value controlling the shift is an 8-bit signed number. The SE register is not
used or changed by an immediate shift.
The following example shows the input value downshifted relative to the
upper half of SR (SR1). This is the (HI) version of the shift:
SI=0xB6A3;
SR=LSHIFT SI BY -5 (HI);
Shift value: –5
Here is the same input value shifted in the other direction, referenced to
the lower half (LO) of SR:
SI=0xB6A3;
SR=LSHIFT SI BY 5 (LO);
Shift value: +5
In addition to the direction of the shifting operation, the shift may be
either arithmetic (ASHIFT) or logical (LSHIFT). For example, the following
shows a logical shift, relative to the upper half of SR (HI):
SI=0xB6A3;
SR=LSHIFT SI BY -5 (HI);
Shift value: -5
This example shows an arithmetic shift of the same input and shift code:
SI=0xB6A3;
SR=ASHIFT SI BY -5 (HI);
Shift value: -5
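The results of the immediate-shift examples above can be worked out with a short Python model. It is a sketch, not a description of the hardware: it treats ASHIFT as a shift of the sign-extended input and LSHIFT as a shift of the zero-extended input, and the function name is hypothetical.

```python
def immediate_shift(si, code, arithmetic=False, hi=True):
    """Return the 32-bit SR pattern for ASHIFT/LSHIFT SI BY code (HI or LO)."""
    value = si & 0xFFFF
    if arithmetic and value & 0x8000:
        value -= 0x10000                  # ASHIFT: treat the input as twos-complement
    code += 16 if hi else 0               # HI references the upper half (SR1)
    result = value << code if code >= 0 else value >> -code
    return result & 0xFFFFFFFF


if __name__ == "__main__":
    cases = [
        ("LSHIFT SI BY -5 (HI)", dict(code=-5, arithmetic=False, hi=True)),
        ("LSHIFT SI BY 5 (LO)", dict(code=5, arithmetic=False, hi=False)),
        ("ASHIFT SI BY -5 (HI)", dict(code=-5, arithmetic=True, hi=True)),
    ]
    for text, args in cases:
        sr = immediate_shift(0xB6A3, **args)
        print(f"{text:22s} SR1=0x{sr >> 16:04X} SR0=0x{sr & 0xFFFF:04X}")
```

With the same input and shift code, the logical shift fills the vacated upper bits with zeros while the arithmetic shift fills them with copies of the input sign bit.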
2.4.2.4 Denormalize
Denormalizing refers to shifting a number according to a predefined
exponent. The operation is effectively a floating-point to fixed-point
conversion.
Two examples of denormalizing a double-precision number are given
below. The first shows a denormalization in which the upper half of the
number is shifted first, followed by the lower half. Since computations
may produce output in either order, the second example shows the same
operation in the other order, i.e. lower half first.
Always select the arithmetic shift for the higher half (HI) of the twos-
complement input (or logical for unsigned). Likewise, the first half
processed does not use the [SR OR] option.
Now the lower half is processed. Always select a logical shift for the lower
half of the input. Likewise, the second half processed must use the
[SR OR] option to avoid overwriting the previous half of the output value.
Here is the same input processed in the reverse order. The higher half is
always arithmetically shifted and the lower half is logically shifted. The
first input is passed straight through to SR, but the second half is ORed to
create a double-precision value in SR.
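The two-step denormalization can be modeled in Python as follows. The sketch assumes the exponent in SE is used directly as the shift code, with an arithmetic shift for the upper half and a logical shift ORed in for the lower half; the helper names are hypothetical.

```python
def place(word, code, hi, arithmetic):
    """Shift a 16-bit word into the 32-bit SR field (HI or LO reference)."""
    value = word & 0xFFFF
    if arithmetic and value & 0x8000:
        value -= 0x10000                  # sign-extend for an arithmetic shift
    code += 16 if hi else 0
    out = value << code if code >= 0 else value >> -code
    return out & 0xFFFFFFFF


def denormalize32(hi_word, lo_word, se):
    """Denormalize a double-precision value by the exponent se."""
    sr = place(hi_word, se, hi=True, arithmetic=True)       # first half: no [SR OR]
    sr |= place(lo_word, se, hi=False, arithmetic=False)    # second half: [SR OR]
    return sr


if __name__ == "__main__":
    print(f"0x{denormalize32(0xB6A3, 0x765D, -3):08X}")
```

Because the two partial results occupy disjoint bit positions, the OR gives the same value whichever half is processed first.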
2.4.2.5 Normalize
Numbers with redundant sign bits require normalizing. Normalizing a
number is the process of shifting a twos-complement number within a
field so that the rightmost sign bit lines up with the MSB position of the
field and recording how many places the number was shifted. The
operation can be thought of as a fixed-point to floating-point conversion,
generating an exponent and a mantissa.
SE=EXP AR (HI);
SE set to: –3
For a single precision input, the normalize operation can use either the
(HI) or (LO) modifier, depending on whether you want the result in SR1
or SR0, respectively.
Double precision values follow the same general scheme. The first stage
detects the exponent and the second stage normalizes the two halves of
the input. For double precision, however, there are two operations in each
stage.
For the first stage, the upper half of the input must be operated on first.
This first exponent derivation loads the exponent value into SE. The
second exponent derivation, operating on the lower half of the number,
will not alter the SE register unless SE = –15. This happens only when the
first half contained all sign bits. In this case, the second operation will load
a value into SE (see Table 2.5). This value is used to control both parts of
the normalization that follows.
For the second stage, now that SE contains the correct exponent value, the
order of operations is immaterial. The first half (whether HI or LO) is
normalized without the [SR OR] and the second half is normalized with
[SR OR] to create one double-precision value in SR. The (HI) and (LO)
modifiers identify which half is being processed.
SE set to: -3
SE unchanged, still -3
If the upper half of the input contains all sign bits, the SE register value is
determined by the second derive exponent operation as shown below.
All values of SE less than –15 (resulting in a shift of +16 or more) upshift
the input completely off scale.
There is one additional normalization situation, requiring the HI-extended
(HIX) state. This is specifically when normalizing ALU results (AR) that
may have overflowed. This operation reads the arithmetic status word
(ASTAT) overflow bit (AV) and the carry bit (AC) in conjunction with the
value in AR. AV is set (1) if an overflow has occurred. AC contains the true
sign of the twos-complement value.
AR = 11111010 00110010
AV = 1, indicating overflow
AC = 0, the true sign bit of this value
SE gets set to +1
AR = 11111010 00110010
SR = 01111101 00011001
The HIX operation executes properly whether or not there has actually been
an overflow. Consider this example:
AR = 11100011 01011011
AV = 0, indicating no overflow
AC = 0, not meaningful if AV = 0
SE set to –2
AR = 11100011 01011011
SR = 10001101 01101100 00000000 00000000
The AC bit is not used as the sign bit. A brief examination of Table 2.5
shows that the HIX mode is identical to the HI mode when AV is not set.
When the NORM, LO operation is done, the extension bit is zero; when the
NORM, HI operation is done, the extension bit is AC.
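Both HIX examples above can be checked with a short Python model. This is a sketch under the assumptions that the normalize step shifts by the negated SE relative to SR1 and that AC supplies the extension bit; the function name is hypothetical.

```python
def norm_hix(ar, av, ac):
    """Return (SE, SR) for an exponent detect (HIX) followed by a normalize (HI)."""
    ar &= 0xFFFF
    if av:
        se = 1                                    # overflow: one-bit downshift needed
    else:
        sign = ar >> 15
        run = 1                                   # the sign bit itself
        while run < 16 and ((ar >> (15 - run)) & 1) == sign:
            run += 1
        se = -(run - 1)
    shift = 16 - se                               # negated SE, referenced to SR1
    value = ar | (~0xFFFF if ac & 1 else 0)       # AC fills positions left of the input
    return se, (value << shift) & 0xFFFFFFFF


if __name__ == "__main__":
    for ar, av, ac in ((0xFA32, 1, 0), (0xE35B, 0, 0)):
        se, sr = norm_hix(ar, av, ac)
        print(f"AR=0x{ar:04X} AV={av} AC={ac} -> SE={se:+d} SR=0x{sr:08X}")
```

When AV is 0 the shift is always to the left, so the extension bit never appears in the result and the value of AC has no effect.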