Linear Predictive Coding
Introduction:
Linear predictive coding (LPC) is a digital method for encoding an
analog signal in which each value is predicted by a linear function of the past
values of the signal. It was first proposed as a method for encoding human speech by
the United States Department of Defense in Federal Standard 1015, published in
1984. Human speech is produced in the vocal tract, which can be approximated as a
variable-diameter tube. The LPC model is based on a
mathematical approximation of the vocal tract represented by this tube of varying
diameter. At a particular time t, the speech sample s(t) is represented as a linear
sum of the p previous samples. The most important aspect of LPC is the linear
predictive filter which allows the value of the next sample to be determined by a
linear combination of previous samples. Under normal circumstances, speech
is sampled at 8000 samples/second with 8 bits used to represent each sample. This
provides a rate of 64000 bits/second. Linear predictive coding reduces this to 2400
bits/second. At this reduced rate the speech has a distinctive synthetic sound and
there is a noticeable loss of quality. However, the speech is still audible and it can
still be easily understood. Since there is information loss in linear predictive coding,
it is a lossy form of compression.
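
The prediction step described above can be illustrated with a minimal Python sketch. The
function name predict_next, the coefficient ordering, and the example coefficient values
are assumptions made for illustration; they are not taken from Federal Standard 1015.

```python
def predict_next(history, coeffs):
    """Predict the next sample as a linear sum of the p most recent samples.

    coeffs[0] weights the most recent sample, coeffs[1] the one before it,
    and so on (an assumed ordering, chosen only for this sketch).
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past samples")
    recent = history[-1:-p - 1:-1]  # the p latest samples, most recent first
    return sum(a * s for a, s in zip(coeffs, recent))


# Hypothetical second-order predictor: 2*s(t-1) - s(t-2) extrapolates a straight line.
coeffs = [2.0, -1.0]
history = [1.0, 2.0, 3.0]
prediction = predict_next(history, coeffs)
print(prediction)
```

For this ramp the predictor continues the line exactly. Real speech requires coefficients
estimated from the signal itself, and the prediction is then only approximate.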
Many different types of speech compression exist, and they make use of a variety of
techniques. However, most methods of speech compression exploit the fact that speech
production occurs through slow anatomical movements and that the speech produced has a
limited frequency range. The frequency of human speech production ranges from around
300 Hz to 3400 Hz. Speech compression is often referred to as speech coding, which is
defined as a method for reducing the amount of information needed to represent a speech
signal. Most forms of speech coding are based on a lossy algorithm. Lossy algorithms are
considered acceptable when encoding speech because the loss of quality is often
undetectable to the human ear.
Many other characteristics of speech production can be exploited by speech coding
algorithms. One fact that is often used is that periods of silence take up more than 50%
of conversations. An easy way to save bandwidth and reduce the amount of information
needed to represent the speech signal is to not transmit the silence. Another fact about
speech production that can be taken advantage of is that, mechanically, there is a high
correlation between adjacent samples of speech. Most forms of speech compression are
achieved by modelling the process of speech production as a linear digital filter. The
digital filter and its slowly changing parameters are usually encoded to achieve
compression of the speech signal.
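
Both properties mentioned above, silence that need not be transmitted and the high
correlation between adjacent samples, can be illustrated with a short Python sketch. The
function names, the 160-sample frame length, the energy threshold, and the synthetic
300 Hz test tone are all assumptions chosen for illustration; a practical coder would
tune them to real speech.

```python
import math


def lag1_correlation(samples):
    """Normalized correlation between each sample and the one before it."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    num = sum(a * b for a, b in zip(centered[1:], centered[:-1]))
    den = sum(c * c for c in centered)
    return num / den if den else 0.0


def active_frames(samples, frame_len, threshold):
    """Return the indices of frames whose mean energy exceeds the threshold."""
    keep = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            keep.append(start // frame_len)
    return keep


fs = 8000  # samples/second, as in the text
tone = [math.sin(2 * math.pi * 300 * n / fs) for n in range(800)]
silence = [0.0] * 800
signal = silence + tone + silence

# Adjacent samples of a low-frequency tone are strongly correlated.
print(lag1_correlation(tone))

# Only the frames that contain the tone would need to be transmitted.
print(active_frames(signal, frame_len=160, threshold=0.01))
```

In this synthetic example only one third of the frames carry signal, so skipping the
silent frames alone would remove two thirds of the data before any further coding.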
Linear Predictive Coding (LPC) is one of the methods of compression that models the
process of speech production. Specifically, LPC models this process as a linear sum of
earlier samples using a digital filter driven by an excitation signal. An alternate
explanation is that linear prediction filters attempt to predict future values of the
input signal based on past values. LPC “...models speech as an autoregressive process,
and sends the parameters of the process as opposed to sending the speech itself” [4]. It
was first proposed as a method for encoding human speech by the United States Department
of Defense in Federal Standard 1015, published in 1984. Another name for Federal
Standard 1015 is LPC-10, which is the method of linear predictive coding that will be
described in this paper.
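
To make the idea of sending the parameters of an autoregressive process more concrete,
the following Python sketch estimates predictor coefficients from a block of samples. It
uses the autocorrelation method with the Levinson-Durbin recursion, a common textbook
approach that is assumed here for illustration and is not claimed to be the procedure
specified by LPC-10. The function names, the second-order test process, and its
coefficients are likewise hypothetical.

```python
import random


def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the block x."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..order] such that s[n] is approximated by sum_k a[k]*s[n-k].

    Returns the coefficient list and the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def synth_ar(coeffs, n, seed=0):
    """Generate n samples of an autoregressive process driven by white noise."""
    rng = random.Random(seed)
    p = len(coeffs)
    s = [0.0] * p  # zero initial state
    for _ in range(n):
        past = s[-1:-p - 1:-1]  # most recent sample first
        s.append(sum(a * v for a, v in zip(coeffs, past)) + rng.gauss(0.0, 1.0))
    return s[p:]


true_coeffs = [1.3, -0.6]  # hypothetical, stable second-order process
block = synth_ar(true_coeffs, 5000)
r = autocorrelation(block, 2)
est_coeffs, err = levinson_durbin(r, 2)
print(est_coeffs)
```

The estimated coefficients should come out close to the values used to generate the
block, which is the sense in which a few parameters can stand in for thousands of samples.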
Speech coding or compression is usually conducted with the use of voice coders, or
vocoders. There are two types of voice coders: waveform-following coders and model-based
coders. Waveform-following coders will exactly reproduce the original speech signal if
no quantization errors occur. Model-based coders will never exactly reproduce the
original speech signal, regardless of the presence of quantization errors, because they
use a parametric model of speech production, which involves encoding and transmitting
the parameters, not the signal. LPC vocoders are considered model-based coders, which
means that LPC coding is lossy even if no quantization errors occur.
All vocoders, including LPC vocoders, have four main attributes: bit rate, delay,
complexity, and quality. Any voice coder, regardless of the algorithm it uses, will have
to make trade-offs between these attributes. The first attribute, the bit rate,
determines the degree of compression that a vocoder achieves. Uncompressed speech is
usually transmitted at 64 kb/s, using 8 bits/sample and a sampling rate of 8 kHz. Any
bit rate below 64 kb/s is considered compression. The linear predictive coder transmits
speech at a bit rate of 2.4 kb/s, a compression factor of roughly 27. Delay is
another important attribute for vocoders that are involved with the transmission of an
encoded speech signal. Vocoders that are involved with the storage of the compressed
speech, as opposed to transmission, are not as concerned with delay. The general delay
standard for transmitted speech conversations is that any delay greater than 300 ms is
considered unacceptable. The third attribute of voice coders is the complexity of the
algorithm used. The complexity affects both the cost and the power of the vocoder.
Linear predictive coding, because of its high compression rate, is very complex and
involves executing millions of instructions per second. LPC often requires more than one
processor to run in real time. The final attribute of vocoders is quality. Quality is a
subjective attribute and it depends on how the speech sounds to a given listener. One of
the most common tests for speech quality is the absolute category rating (ACR) test. This
test involves subjects being given pairs of sentences and asked to rate them as
excellent, good, fair, poor, or bad. Linear predictive coders sacrifice quality in order
to achieve a low bit rate and as a result often sound synthetic. An alternate method of
speech compression called adaptive differential pulse code modulation (ADPCM) reduces
the bit rate only by a factor of 2 to 4, to between 16 kb/s and 32 kb/s, but has a much
higher quality of speech than LPC.
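
The bit-rate figures quoted above follow from simple arithmetic, sketched below in
Python. The constant and function names are assumptions introduced for this example; the
rates themselves are the ones given in the text.

```python
SAMPLE_RATE_HZ = 8000   # samples per second
BITS_PER_SAMPLE = 8

# Uncompressed rate: 8000 samples/s * 8 bits/sample
uncompressed_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def compression_ratio(coded_bps):
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return uncompressed_bps / coded_bps


rates_bps = {
    "LPC": 2400,
    "ADPCM (lower rate)": 16000,
    "ADPCM (upper rate)": 32000,
}

for name, bps in rates_bps.items():
    print(f"{name}: {bps} b/s, about {compression_ratio(bps):.1f}:1")
```

The comparison makes the trade-off explicit: LPC compresses roughly 27:1 at the cost of
synthetic-sounding speech, while ADPCM settles for 2:1 to 4:1 and keeps higher quality.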
The general algorithm for linear predictive coding involves an analysis or encoding
part and a synthesis or decoding part. In the encoding, LPC takes the speech signal in
blocks or frames of speech and determines the input signal and the coefficients of the
filter that will be capable of reproducing the current block of speech. This information
is quantized and transmitted. In the decoding, LPC rebuilds the filter based on the
coefficients received. The filter can be thought of as a tube which, when given an input
signal, attempts to output speech. Additional information about the original speech
signal is used by the decoder to determine the input or excitation signal t