DSP Notes


A Digital Signal Processor, or DSP, is a
semiconductor device used for processing signals
digitally. A signal, in this context, traditionally refers to
an analog signal (such as analog voltage) that has
been converted into a digital one so that it can be
processed mathematically. Nowadays, however,
almost every piece of information has been digitized,
so a digital signal may be any stream of digital data: digital audio/video data, betting odds, or even the
weight of clothes in a washing machine. Analysis of
such digital signals for a variety of purposes can be
easily accomplished by a DSP.
Signal processing encompasses a large variety of
actions performed on signals - filtering,
encoding/decoding, compression/decompression,
amplification, modulation, level detection, pattern
matching, mathematical/logical operations, and much
more. These processes are performed on a signal for
a number of reasons: to enhance it; reduce its
component noise; make its transmission and reception
more effective, efficient, and faster; transform it; make
it interact with other signals in special ways; facilitate
its use in digital analysis, monitoring, or control; etc. A
DSP has built-in capabilities to perform these signal
processing functions easily.
A DSP is very similar to a microprocessor. In fact, it is
regarded by many as a special microprocessor
created particularly to process signals. Both a
microprocessor and a DSP can execute instructions,
accept input digital data, perform operations on them,
and output digital data. The fundamental difference
between a DSP and a microprocessor is what their
built-in processing capabilities were designed for.
A DSP is a highly specialized device that's equipped
with a multitude of mathematical functions specifically
intended for processing a digital signal, whereas a
microprocessor is designed to be a general-purpose
device. A microprocessor would be able to
handle many different applications, such as word
processing, spreadsheets, databases, and, well, even
digital signal processing. However, it cannot be as
good as a DSP when it comes to serious DSP
applications.
Current trends in technology seem to indicate, though,
that the distinction between a DSP
and a microprocessor will soon be gone.
Microprocessors are becoming so
sophisticated that some of them are now equipped
with true DSP capabilities. It will just be a matter of
time before high-end microprocessors have the
capability to perform high-end signal processing, or
any high-end task for that matter.
A DSP is also very similar to a microprocessor as far
as architecture is concerned, i.e., it has many parts
that are also seen in a microprocessor, such as data
and address buses, an Arithmetic-Logic Unit (ALU), a
program control unit, assorted flags and registers, etc.
It also has its own native instruction set, which defines
what it can be programmed to do. Programming
DSP's is no longer complicated either, given the
various development kits on the market that support
DSP software development using high-level
programming languages such as C.
Many DSP applications deal with real-world analog
signals (such as sound, light, analog voltage, analog
current, temperature, pressure). Since a DSP can only
process digital signals, there is a need to convert
analog signals first into digital data before they can be
processed by a DSP. After processing, there is again
a need to convert the digital data back
into the original real-world analog signal format. In
such applications, the DSP must be supported by
an analog-to-digital converter (ADC) and a digital-to-analog
converter (DAC), which will perform the
required analog-digital and digital-analog conversions,
respectively.
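The ADC, DSP, DAC chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ideal converters, a hypothetical full-scale input of -1.0 to 1.0 volts, 16-bit signed codes, and a made-up processing step (halving the amplitude). None of these names or values come from the notes.

```python
import math

FULL_SCALE = 32767  # largest signed 16-bit code (assumed converter width)


def adc(voltage: float) -> int:
    """Ideal ADC: map a voltage in [-1.0, 1.0] to a signed 16-bit code."""
    clipped = max(-1.0, min(1.0, voltage))
    return int(round(clipped * FULL_SCALE))


def dsp_process(code: int) -> int:
    """Stand-in for the DSP's work: attenuate the signal by half."""
    return code // 2


def dac(code: int) -> float:
    """Ideal DAC: map a signed 16-bit code back to a voltage."""
    return code / FULL_SCALE


# One period of a sine wave, sampled at 8 points, passed through the chain.
analog_in = [math.sin(2 * math.pi * n / 8) for n in range(8)]
analog_out = [dac(dsp_process(adc(v))) for v in analog_in]

for v_in, v_out in zip(analog_in, analog_out):
    print(f"{v_in:+.4f} V -> {v_out:+.4f} V")
```

The DSP itself only ever sees the integer codes; the voltages exist only on the analog side of the two converters.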
Applications where DSP's are commonly used
include: 1) digital sound and image processing; 2)
digital communications; 3) consumer electronics (e.g.,
mobile phones, faxes, computer peripherals such as
modems and sound cards, and digital entertainment
systems such as DVD players and digital TV); 4)
medical electronics; and 5) industrial and automation
electronics.
There are currently four major companies that
produce DSP's, namely, Texas Instruments, Analog
Devices, Motorola, and Lucent Technologies.
Examples of commercially available DSP's include:
- Analog Devices' ADSP-21xx: 10 to 50 MIPS 16-bit
fixed-point DSP's; 40-bit accumulator; 24-bit
instructions;
- Analog Devices' ADSP-2106x ("SHARC"): 40 MIPS,
32-bit floating point DSP's;
- Lucent Technologies' DSP32xx: 32-bit floating-point
with 40-bit accumulator and 16/24-bit fixed point
DSP's;
- Motorola's DSP568xx: 20 MIPS 16-bit fixed-point
DSP's;
- Motorola's DSP96002: IEEE format floating-point
DSP with two complete 32-bit data and address
buses;
- Texas Instruments' TMS320C1x: Low cost fixed-point DSP's with 16-bit data, 32-bit registers;
- Texas Instruments' TMS320C8x: Multiple 50 MHz
32-bit fixed-point processors combined with a RISC
supervisory processor in a single multi-chip module.
Notes on Texas Instruments Processors
Prof. Brian L. Evans

At present, TI is developing new processors within three
digital signal processor families:

TMS320C2000 (formerly known as TMS320C20)
o disk drives, e.g. Seagate
TMS320C5000 (formerly known as TMS320C54)
o voiceband modems, e.g. modems by 3Com
and the modem for the compact Sun-Denshi
Online Station for Playstation 2
o cell phone handsets, e.g. by Nokia and
Ericsson
o portable MP3 players, e.g. Sanyo Internet
audio player
o digital still cameras, e.g. Sony
o digital video, e.g. JVC's GR-DVM90 eCyberCam
TMS320C6000
o ADSL modems, e.g. TI's ADSL modems
o VDSL modems
o cell phone basestations
o modem banks
o laser printers
o video conferencing systems

TI has produced many other families of digital signal
processors which they still support but for which they are
not developing new members of the families. These families
include the TMS32010, TMS320C30, TMS320C40,
TMS320C50, and TMS320C80. Note that the TMS32010
family does not have a "C" in it because it was originally
designed in NMOS and not CMOS.
Conventional Fixed-Point DSP Processors
The family of conventional fixed-point DSP processors
includes the TMS32010, TMS320C20, TMS320C50,
TMS320C54, and TMS320C55. These processors have 16-bit data words and 16-bit program words. The 10 (1982) and
C20 (1985) fixed-point processors are being widely used in
control applications. The C203, a derivative of the C20, was
released in 1995 in response to disk drive manufacturers'
needs. The C203 delivers 40 MIPS (80 MHz) and costs
under $5.00 in volume. The 10 is widely used as essentially
a powerful microcontroller. The C24 is dedicated for motion
control.
The C54x is a smaller, low-power version of the C50 meant
for use in wireless basestations and handsets. The C54x
instruction set is not compatible with the C50. The C54x
reminds me of Freescale's DSP56000 in that it can perform
parallel reads:
2 data reads from block 1
1 data write to block 2
1 instruction fetch from block 3

The TMS320C5509 DSP is targeted for portable handheld Internet appliances. It has an
extensive set of on-board peripherals.

Clock rate: 144/200 MHz (up to 288/400 MIPS)
On-chip Memory: 128 kw RAM and 32 kw ROM
Interfaces: USB 1.1 port, I2C, Memory Stick,
MMC, SD, three serial ports
Data converter: on-chip 10-bit ADC

The TMS320C5502 is a low-cost member of the C5000
family for personal systems at $9.95/unit in quantities of
10,000 units:

Clock rate: * 200 MHz (up to 400 MIPS)
On-chip Memory: 32 kw DARAM and 16 kw
ROM
Interfaces: UART, I2C, three serial ports

Conventional Floating-Point DSP Processors


The first two TI floating-point DSP processors were the
TMS320C30 (1988) and TMS320C40 processors. These
two processors are very similar. The key difference is that
the C40 has extra communications features that allow it to
be more easily used in parallel.
TMS320C30 Family
The C30 is the base processor in the TMS320C30 family. A
DSP Starter Kit (DSK) board with the C31 (August, 1996)
sells for $99. This is much cheaper than the $750 for the
C30 evaluation module (EVM) board, which has the original
C30 on it. Like the EVM, the DSK does not come with a
compiler. However, an extension to the GNU C compiler
generates code for the C30.
By Fall of 1999, the price for the TMS320C3x family of
processors had dropped to $5 per processor. The
TMS320VC33 sells for $5. The C33 provides a full 1-Mbits
of random access memory (RAM) and delivers 120
MFLOPS. A 150-MHz version of the C33 is also available
for $8. TI is still maintaining the C30 line by continuing to
design C33s for faster clock speeds, e.g. the SM320VC33EP which runs at 150 MHz (August 2002).
Applications of the TMS320C30 Family

The C54x has a special instruction for Viterbi decoding.
Other features include three idle modes (controlled by host
processor) to reduce power consumption and flash
memory (must write in 2 kword blocks). A C compiler
exists. A low-cost C54x DSP Starter Kit (DSK) also exists.
The C54x is also used for servo-control in high-end disk
drives.
A variation of the C54x, the C54xx family, has 8 Mwords of
addressable memory due to the addition of a page pointer.
The TMS320C5416 has 128K words of on-chip SRAM and
runs at 160 MHz. Applications include Voice over Internet
Protocol (VoIP), communications servers, PBX add-ons and
other computer telephony and customer premise equipment.
The C55 is in the C5000 family but has lower power
consumption than the C54. The TMS320C5509 DSP is a member of the C55 line.
A TMS320C30 ($20) DSP was used by the Miniature
Sensor Technology Integration (Misty) 3 Satellite which
orbited from 1995 to 1996. Southwest Research Institute
in San Antonio, TX, under contract from SAIC, built a VME
card cage containing a single processor, military
specification TMS320C30 for the infrared satellite imaging
subsystem on Misty. The TMS320C31 ($20) was used by
Dr. Thomas P. Barnwell (Atlanta Signal Processors Inc.,
Atlanta, GA, which is now part of Polycom) to prototype a
DirectTV decoder before it was implemented on a fixed-point
processor. The TMS320C32 sells for $10 each with a
volume purchase being required. In the late 1990s, the C32
was used in the Concur Systems Inc. thin Internet data
acquisition systems.
The TMS320C40 Family

The C40 was intended for use in parallel processing. No
more C40 derivatives will be developed. The C44 is a scaled
down version of the C40. The fixed-point C80 family briefly
superseded the C40 for parallel processing, but no more C80
derivatives will be developed. The C80 is described next.
The primary TI processor family for parallel processing is
the C6x.
Unconventional DSP Processors
The members of this family include the TMS320C80,
TMS320C62x, TMS320C4x, and TMS320C67x.
The C80 contains four fixed-point DSPs plus a RISC on a
single chip and is meant for video processing. The reality is
that the C80 was too expensive, consumed too much power,
and development tools for it were poor. TI is no longer
developing new members of the C8x family, but third-party
C8x boards and tools are still being developed, e.g.
the Genesis board by Matrox.

a T-1 line on a single chip). More recently, the
C6203 is also used for the AR7 DSL gateway.
C67x Processor
It is pin compatible with the 'C62x. The C67x is in volume
production. At 100-MHz, the 'C6711 delivers 600 MFLOPS
for only $20. A 150-MHz version of the device, also new,
increases performance to 900 MFLOPS. The 'C67x family
offers a code-compatible roadmap to 3 GFLOPS and
beyond. Applications include beamforming base stations, 3D virtual reality, graphics, speech recognition, radar/sonar,
precision instrumentation, and medical imaging.
Problems with TI Tools

The C6x (C6000) family is a Very Long Instruction Word
(VLIW) Reduced Instruction Set Computer (RISC) Digital
Signal Processor (DSP) with eight parallel function units: 6
are ALUs and 2 are multipliers. The C6x has three key
members: C6200 and C6400 for 16-bit fixed-point and
C6700 for 32-bit floating-point processing. A 32-bit
floating-point multiplication has 1 cycle throughput and 3
cycles of delay. According to an October 29, 1999, press
release by TI, the market share for the C6x family hit $1.5
billion. The DSP processor market in 1999 was about $4.5
billion.

No code translators between C5x and C20x and
between C54x and C6x exist
No simulators and debuggers are publicly
available, except for the C31.
C compilers are very poor for the traditional fixed-point
DSP processors (C2x/C5x/C54x), but
relatively poor for the C6000 processors, when
compared to C compilers for desktop computers.

Harvard and von Neumann Architecture

When TI reports MIPS for the C6000, they are computing
RISC MIPS using 8 times the clock rate. These MIPS are
*not* DSP processor MIPS. Another useful figure of merit
is million multiply-accumulates per second (MMACS),
which is 2 x clock rate for the C6200 and C6400.
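The two figures of merit above are simple multiples of the clock rate, which a short Python sketch makes concrete. The function names are illustrative only; the factors 8 and 2 are the ones quoted in the paragraph above.

```python
def risc_mips(clock_mhz: float) -> float:
    """RISC MIPS as TI reports it for the C6000: 8 times the clock rate."""
    return 8 * clock_mhz


def mmacs(clock_mhz: float) -> float:
    """Million multiply-accumulates per second for the C6200/C6400: 2 x clock rate."""
    return 2 * clock_mhz


for clock in (150, 200, 250, 300):
    print(f"{clock} MHz: {risc_mips(clock):.0f} RISC MIPS, {mmacs(clock):.0f} MMACS")
```

A 200 MHz part therefore advertises 1600 RISC MIPS but sustains at most 400 million multiply-accumulates per second, which is the number that matters for filter-style workloads.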
C62x Processor
The C62x has 8 arithmetic units (2 multipliers and 6
adders/shifters). Applications include wireless basestations,
modem pools, cable modems, remote access servers, digital
subscriber loop modems, and wireless PDAs. Members of
the family include:

TMS320C6211: 150 MHz (1200 RISC MIPS) for
$25 (in 25K unit quantities); 64 kbits on-chip
memory (32 kbits data; 32 kbits program) plus L2
cache (512 kbits)
TMS320C6201: 167 MHz (1333 RISC MIPS) and
200 MHz (1600 RISC MIPS); 1 Mbit on-chip
memory (512 kbits data; 512 kbits program); low-power
version C6201B at 200 MHz consumes 1.94
W of power
TMS320C6202: 250 MHz (2000 RISC MIPS).
TMS320C6203: 250 MHz (2000 RISC MIPS) and
300 MHz (2400 RISC MIPS); 7 Mbits on-chip
memory (3 Mb program; 4 Mb data); used in
digital communication systems. In 1999, it was
being used for third-generation wireless
communication systems (wireless data networks)
and modem banks (a bank of 24 V.90 modems for

Now let us have a closer look at the internal architecture of
computers so we can see how this has affected
the design of DSP chips.

Stored Program Machines
Computers need instructions to operate. At every clock
cycle, they must be told what to do. If the
instructions are stored, the computer just has to fetch and
execute them. Such computers are called stored
program machines. Our computer typically fetches an
instruction and then data, operates on the data, and
returns the resulting data to the store.
Stored program machines use two well-known and widely
used computer architectures: von Neumann and
Harvard. The following diagram shows the structure of the
two architectures.
von Neumann Architecture
The von Neumann machines store program and data in
the same memory area. In this type of machine, an
instruction contains the operation command and the address
of the data on which the operation is performed. There are
two basic operation units within these machines: the
arithmetic logic unit (ALU) and the input/output unit. The
ALU performs the core operations: multiply, add, subtract,
and many more. It is on these very simple core operations
that complex software, such as word processing software,
can be built. The input/output unit manages the flow of
external data for the machine.

Digital Computers
INSTRUCTOR'S GUIDE INTRODUCTION TO DSP

CHAPTER 1 1-8
Harvard Architecture

The primary difference between Harvard architecture and
von Neumann architecture is that with Harvard, program and
data memories are physically separate and have separate
transmission paths. This enables the machine to fetch
instructions and data simultaneously, which can greatly
enhance performance. Harvard machines also have ALUs and
input/output units.
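A toy cycle count shows why simultaneous fetching matters. The sketch below is a deliberately simplified model, not a description of any real processor: it assumes each bus carries one memory access per clock cycle and ignores caches, pipelining, and wait states. The function names are hypothetical.

```python
def von_neumann_cycles(n_instructions: int, data_accesses_per_instr: int = 1) -> int:
    """Single shared bus: each instruction fetch and each data access takes its own cycle."""
    return n_instructions * (1 + data_accesses_per_instr)


def harvard_cycles(n_instructions: int, data_accesses_per_instr: int = 1) -> int:
    """Separate program and data buses: the instruction fetch overlaps a data access."""
    return n_instructions * max(1, data_accesses_per_instr)


n = 1000
print("von Neumann:", von_neumann_cycles(n), "cycles")
print("Harvard:    ", harvard_cycles(n), "cycles")
```

Under these assumptions, a loop of instructions that each read one operand needs half as many memory cycles on the Harvard machine.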
Von Neumann and Harvard Architecture History
The history of these two architectures is very interesting.
The Harvard architecture was developed by
Howard Aiken in the late 1930s at Harvard University, with
the Harvard Mark 1 becoming operational in
1944. The University of Pennsylvania followed in 1946 with
the development of the Electronic Numerical
Integrator and Computer (ENIAC).
John von Neumann, a Hungarian-born mathematician,
suggested a simpler and lower cost architecture,
namely a single memory for program and data. This
simple solution has set the standard ever since. In
1951, the Institute for Advanced Study in Princeton built
the first von Neumann machine.

Which Architecture is Best Suited for DSP?
Common general-purpose personal computers use
processors designed with the von Neumann architecture while
the Harvard architecture is more commonly used in
specialized microprocessors for real-time and embedded
applications. DSPs typically use Harvard architecture,
although von Neumann DSPs also exist. Many signal and
image processing applications require fast, real-time
machines. The drawback to using a true Harvard
architecture is that since it uses separate program and data
memories, it needs twice as many address and data pins on
the chip and twice as much external memory. Unfortunately,
as the number of pins or chips increases, so does the price.
Electronic designers, who have had to tackle problems like
these before, have come up with an elegant solution: a single
data and address bus is used externally, while two (or more)
separate buses for program and data are used internally.
Timing (multiplexing) handles the separation of program
and data information. In one clock cycle, the program
information flows on the pins, and in the second cycle, data
follows on the same pins. Program and data information is
then routed onto separate internal program and data buses.
Such machines are called modified Harvard architecture
processors because the internal architecture is Harvard while
the external architecture is von Neumann. The performance
of modified Harvard architecture processors typically
compares well with the performance of true Harvard
architecture processors because most DSP chips also
incorporate multiple internal RAM/ROM cells for high-use
instructions and data. This significantly reduces the time
used for external sequential program and data access
associated with classic von Neumann processors.

Fixed-Point Notation
Conventions

Number range is between 1 and -1
Decimal point is always in a fixed location (e.g., 0.74, 0.34, etc.)
Multiplying a fraction by a fraction always results in a fraction and will
not produce an overflow (e.g., 0.99 x 0.9999 = less than 1)
Successive additions may cause overflow

Why?

Signal processing is multiplication-intensive
Fixed-point notation prevents overflow (useful with a small dynamic
range)
Fixed-point notation is less expensive

How is fixed-point notation realized in a DSP?

Most fixed-point DSPs are 16 bits
The range of numbers that can be represented is 32767 to -32768
The most common fixed-point format is Q15

Fixed-Point Notation

Fixed-point notation, sometimes called fractional-point
notation or Q format, uses an implied binary point
to represent binary fractions. This point always remains at a
fixed location. The dynamic range of a
processor is the range between the smallest and the largest
number it can represent.
In a 16-bit processor, the dynamic range is limited to 32767 to -32768.
Such a small dynamic range can easily create
overflows. For example, 200 x 350 = 70000, which is an
overflow!
However, if the number range is limited, or more precisely
scaled, to +1 to -1, a multiplication could never
produce an overflow. For example, the multiplication of two
fractional numbers within the range of -1 to 1
must always produce a result that is also a fraction. The
result is therefore confined to be within the range of
-1 to 1. Unfortunately, successive additions can produce
overflow values outside the range of -1 to 1. This
point should be remembered when performing fixed-point
arithmetic.
Signal processing is both multiplication and addition
intensive. An overflow can have serious consequences
(e.g., unintentionally clipping a large signal). A fixed-point
system can solve this problem by either
checking for overflows after each math operation, or by
knowing that the inputs and outputs of the operation
are input bounded or well behaved.
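The overflow behaviour discussed above can be tried out with ordinary Python numbers standing in for 16-bit registers. This is a rough sketch: the helper names are made up for illustration, and clipping (saturating) the result is just one assumed way of "checking for overflows after each math operation".

```python
INT16_MIN, INT16_MAX = -32768, 32767


def fits_int16(value: int) -> bool:
    """True if the value can be held in a signed 16-bit word."""
    return INT16_MIN <= value <= INT16_MAX


def saturate16(value: int) -> int:
    """Clip an out-of-range result to the nearest representable 16-bit value."""
    return max(INT16_MIN, min(INT16_MAX, value))


# Integer multiplication: 200 x 350 = 70000 does not fit in 16 bits.
product = 200 * 350
print(product, fits_int16(product))

# Fractional multiplication: magnitudes below 1 give a product of magnitude below 1.
frac_product = 0.99 * 0.9999
print(frac_product, abs(frac_product) < 1.0)

# Successive additions of fractions can still leave the -1 to 1 range.
total = 0.75 + 0.5
print(total, abs(total) < 1.0)

# Checking after the operation: clip an out-of-range integer sum.
print(saturate16(30000 + 10000))
```

Saturation keeps the result at the nearest valid value rather than letting it wrap around, at the cost of clipping the signal.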
Why Use a Fixed-Point System?
The cost of implementing many DSP systems is strongly
dependent on the amount of chip silicon used to get
the job done, with most of the chip silicon being either in the
processor or in the surrounding memory. If the
chip silicon is mostly used for data storage, such as long
audio delay buffers, video or coefficient tables, the
difference between 16- and 32-bit data storage can be as
much as 2:1.
Furthermore, routing twice as many signals around the chip
and system board can consume extra space and
drive up the power consumption. Another advantage of short
16-bit fixed-point chips is that by making the
core processor small, not only are the chips smaller and less
expensive, they are also usually a bit faster.
This may again lower the price of the DSP chip that, in
price-sensitive volume applications, is an important
consideration. However, if a 16-bit system must also
perform 32-bit operations, these advantages can be lost
and end up costing more. If a system can tolerate a smaller
dynamic range and resolution, then the use of
16-bit data can be an economic advantage.
Fixed-Point Q Notation
As we have seen in multiplication and addition, overflows
can be a problem for fixed-point DSPs. To
eliminate this problem, a programming convention called Q
format is introduced where fixed-point DSPs
operate on fractional numbers which, by definition, cannot
saturate. The principle of Q notation is the
application of a simple scaling coefficient to convert
fractions to integers that a fixed-point DSP is designed
to handle. (Note that this is not an issue for floating-point
DSP).
The letter Q represents the Quantity of fractional bits and
the number following the Q indicates the number
of bits that are used for the fraction. This divides the number
into an upper and lower region of bits where
the upper region contains the sign bit and any whole integer
values, and the lower bits hold the fraction.
Any Q format is possible, but Q15 is the most widespread in
16-bit DSPs and Q31 is most often used for 32-bit DSPs.
In Q15 format, an imaginary decimal point is placed
between bits 15 and 16. The upper range in this case is
only one MSB (for a 16-bit DSP) which is essentially the
sign bit, or bits 16-31 in a 32-bit DSP. The
remaining 15 bits are used to represent the fractional part of
the number. To convert a Q-format integer to a
floating-point value, a scaling coefficient is needed. If the Q
number is 15, the coefficient or resolution of
the fraction will be 2^-15 or 30.518e-6.
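The Q15 scaling can be illustrated with a small Python sketch. The function names are hypothetical, and clipping at the ends of the 16-bit range is an assumption made here so that an input of exactly 1.0 still yields a valid code.

```python
Q = 15
SCALE = 1 << Q  # 2**15 = 32768


def float_to_q15(x: float) -> int:
    """Scale a fraction to a Q15 integer, clipping to the 16-bit range."""
    n = int(round(x * SCALE))
    return max(-32768, min(32767, n))


def q15_to_float(n: int) -> float:
    """Apply the scaling coefficient 2**-15 to recover the fraction."""
    return n / SCALE


print(float_to_q15(0.5))     # 16384
print(q15_to_float(16384))   # 0.5
print(q15_to_float(1))       # one least significant bit, 2**-15
```

The smallest step between two Q15 values is one least significant bit, 2^-15, which is about 30.518e-6.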

smaller than -1. There are some instances where this can be
safely violated. For example, a property of a
2's complement adder is that if an addition overflow occurs,
exceeding the available 16-bit range, a
subtraction can unwrap the result back down into a valid
range. Generally, however, it is best to avoid the
problem in the first place. If a dynamic range greater than
32767 to -32768 (i.e., a 16-bit system) is
required, it is also possible to perform longhand arithmetic
in pieces, but this consumes CPU cycles and
data.
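The unwrapping property of a 2's complement adder can be demonstrated by emulating 16-bit wraparound in Python. The helper name and the operand values below are chosen purely for illustration.

```python
def wrap16(value: int) -> int:
    """Reduce an integer to 16-bit two's complement, as a wrapping adder would."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000


a, b, c = 30000, 10000, 20000

intermediate = wrap16(a + b)      # 40000 overflows and wraps to -25536
final = wrap16(intermediate - c)  # the subtraction wraps back to 20000

print(intermediate, final)
```

The intermediate sum is wrong, yet the final result equals the true value of a + b - c because that value fits in 16 bits. This only works when the adder wraps; a saturating adder would lose the information.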
The bottom portion of the slide shows an example where
multiplying 0.5 and 0.45 (unscaled for clarity)
results in another fraction, which is not a problem.
Multiplying the product by 2 can be done using two
methods. One method is to multiply one of the inputs by 2
first. If the result of this intermediate operation
exceeds +/-1.0, we will have a problem. The inputs could be
scaled down first and then scaled up
afterwards, but this is also far from efficient. An alternative
method is to add the product to itself,
effectively multiplying by 2. This is one of the difficulties of
using the fixed-point operation. The
programmer needs to think about these issues and plan
ahead.
Another important rule is that all numbers must be scaled to
the same Q format (Q15 in our examples),
placing the decimal points of both operands in the same
place, before an addition or subtraction is
performed. Generally, this is also practiced in multiplication.
However since the scaling coefficients are
multiplied, the correct fraction can be retrieved using yet
another scaling constant. Nevertheless, mixing Q
formats is not desired.
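The 0.5 x 0.45 example and the two scaling rules can be followed in Python. This sketch uses made-up helper names and assumes the common convention of shifting the Q30 product right by 15 bits to return to Q15; real DSPs may round or shift differently.

```python
Q15_ONE = 1 << 15  # scaling coefficient 2**15


def to_q15(x: float) -> int:
    """Convert a fraction to its Q15 integer representation."""
    return int(round(x * Q15_ONE))


def q15_mul(a: int, b: int) -> int:
    """Q15 x Q15 gives a Q30 product; shifting right by 15 returns it to Q15."""
    return (a * b) >> 15


a = to_q15(0.5)    # 16384
b = to_q15(0.45)   # 14746

product = q15_mul(a, b)      # 7373, about 0.225
doubled = product + product  # 14746, about 0.45: doubling by addition

print(product, product / Q15_ONE)
print(doubled, doubled / Q15_ONE)
```

Doubling by addition never forms the out-of-range intermediate 2 x 0.5 = 1.0, and both operands of the addition are already in the same Q format.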

Dynamic Range in Q15

The dynamic range, or ratio of largest to smallest magnitude
levels, is the same for Q15 and normal integers.
It is the scaling coefficient that sets the two apart, and other
than this, you may have difficulty knowing
which format is in use. As mentioned previously, to prevent
overflows the inputs and outputs can be
constrained to fractions in the range of -1 to 1 by simply
applying a scaling coefficient.
Number Representation in Q15
Scaling a number is simple:
Integer = Q15_fractional_number x 2^15
The second table on the slide shows several examples of
scaling.
Rules for Operations
The most important rule in using the Q15 fixed-point format
is to avoid using a number larger than 1 or

Floating-Point Formats

Although the C54x device is fixed-point, a popular floating-point
format standard (used, for example, in
TMS320C67x devices) is IEEE 754. The
differences between various floating-point formats are
actually insignificant, and conversion can be performed in
ASIC hardware or software.
TMS320 Single-Precision Floating-Point Format
The preceding table shows an example of TMS320C67x
floating-point bit assignments. The top eight bits
represent the exponent (e) in twos complement notation. Bit
23, (s), is the sign bit of the mantissa, and the
lower 23 bits are the fraction (f) of the mantissa. A value of
1.0 is also implied in the mantissa, but is not
allocated a bit position since it is always present. This
format is called floating-point because the implied
binary point floats around, depending on how large the
exponent is. The exponent is essentially a variable Q
value that is automatically adjusted for maximum precision
and range by the hardware.
Conversion Equations
The middle table on the slide shows the conversion
equations for the TMS320 single-precision floating-point
format. The second column shows the binary and the third
column shows the decimal version of the same
equation. The decimal version of the equation is easier to
understand. There are two different equations for
positive and negative mantissa. We will use decimal
examples of both equations to aid in the understanding
of this format. The representation of 0.0 is a special case
where any number with an exponent of -128 (0x80) is
treated as zero. Since -128 is the smallest possible value for
the exponent, the scaling coefficient for these numbers
would produce very small values. The convention used in
the assembler is to represent zero as 0x80000000.
For example, all of the following numbers are treated as the
value 0:
0x80000000
0x80123456
0x80876345
This is a special case worth remembering.
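A decoder for this bit layout can be sketched in Python. The slide's conversion table is not reproduced in these notes, so the mantissa equations below are an assumption: they follow the two's complement mantissa convention of the TMS320C3x family, where a positive mantissa is 1 + f and a negative mantissa is -2 + f. The function name is hypothetical.

```python
def tms320_float_to_python(word: int) -> float:
    """Decode a 32-bit word laid out as: exponent (bits 31-24), sign (bit 23), fraction (bits 22-0)."""
    exponent = (word >> 24) & 0xFF
    if exponent >= 0x80:
        exponent -= 0x100  # interpret the top eight bits as two's complement

    # Special case: an exponent of -128 (0x80) always represents zero.
    if exponent == -128:
        return 0.0

    sign = (word >> 23) & 1
    fraction = (word & 0x7FFFFF) / float(1 << 23)

    # Assumed equations: 1 + f for a positive mantissa, -2 + f for a negative one.
    mantissa = (-2.0 + fraction) if sign else (1.0 + fraction)
    return mantissa * 2.0 ** exponent


for word in (0x80000000, 0x80123456, 0x80876345):
    print(hex(word), tms320_float_to_python(word))
```

All three words from the list above decode to zero because only the exponent field is examined in that case.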
