Voice Encoding/Decoding (The Vocoder)

Mar 17, 2011 by alperen


Let’s first look at the microphone and its impact on uplink offered traffic. In Chapter 1
we described briefly the process of talking into a microphone to produce an analog
waveform (a varying voltage), which is then digitized in an analog-to-digital converter
(ADC). In GSM, this produces a digital bit stream of 104 kbps, which then has to
be compressed, typically to 13 kbps or lower, using a transform from the time to the
frequency domain, which is the basis for all speech synthesis codecs.
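
As a quick check on the figures above, the raw bit rate and the compression ratio can be worked out in a few lines. This is only a sketch: the 8 kHz sampling rate and 13-bit sample size are assumptions that reproduce the 104 kbps figure and are not stated in the text.

```python
# Assumed ADC parameters (not given in the text): 8 kHz sampling, 13 bits per sample.
SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 13
CODEC_RATE_KBPS = 13.0  # compressed rate quoted in the text


def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Raw PCM bit rate in kbps for a single voice channel."""
    return sample_rate_hz * bits_per_sample / 1000


raw_kbps = pcm_bit_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
compression_ratio = raw_kbps / CODEC_RATE_KBPS

print(f"raw: {raw_kbps} kbps, compression ratio: {compression_ratio}:1")
```

Under these assumptions the vocoder has to remove roughly seven out of every eight bits coming out of the ADC.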

Initially, cellular handset codecs were constant rate. The codecs specified for 3G are
variable rate. In the case of the adaptive multirate (AMR) codec specified by 3GPP1
(the standards group for IMT2000DS/W-CDMA/UMTS), the rate is switchable
between 4.75 and 12.2 kbps. The rate can be chosen to provide capacity gain (lower bit
rate) or quality gain (higher bit rate). The codec is an adaptive codebook excitation linear
prediction codec, which means speech waveforms are stored in a lookup table in
the receiver.
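
The capacity/quality choice can be sketched as picking the highest codec mode that fits a given bit-rate budget. The text only gives the two end points (4.75 and 12.2 kbps); the intermediate modes and the 20 ms frame length below come from the 3GPP AMR narrowband specification, and the function names and the budget value are hypothetical, for illustration only.

```python
# AMR narrowband modes in kbps. Only 4.75 and 12.2 appear in the text;
# the intermediate values are taken from the 3GPP AMR specification.
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
FRAME_MS = 20  # AMR speech frame duration


def select_amr_mode(budget_kbps: float) -> float:
    """Return the highest mode within the budget, or the lowest mode if none fits."""
    candidates = [rate for rate in AMR_MODES_KBPS if rate <= budget_kbps]
    return max(candidates) if candidates else min(AMR_MODES_KBPS)


def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Speech bits carried in one frame (kbps x ms = bits)."""
    return round(rate_kbps * frame_ms)


mode = select_amr_mode(8.0)  # hypothetical budget of 8 kbps
print(mode, bits_per_frame(mode))
```

A real network makes this decision from measured channel conditions and load rather than a fixed budget; the sketch only shows the direction of the trade-off.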
3GPP2 (the standards group responsible for CDMA2000) has specified a variable-rate
vocoder described as a selectable mode vocoder (SMV). It adapts dynamically to
the audio input waveform.
Figure 4.2 shows performance comparisons between the SMV and AMR codecs,
with the SMV codec providing a better quality/capacity trade-off—at the cost of some
additional processing overhead.
Voice quality is measured using a mean opinion score (MOS). Mean opinion scoring is
essentially an objective method for comparing subjective responses to quality: a
group of users listen to the voice quality from a handset and provide a score. A score of
5 is very good (equivalent to a wireline connection); a score of 2.5 would be comprehensible
but uncomfortable to listen to, and many of the harmonic qualities of the person’s
voice will have been lost, to the point where it is difficult to recognize who is
speaking. Figure 4.3 shows typical SMV and AMR vocoder performance. The SMV
codec, used in 3GPP2, performs better than the AMR codec, used in 3GPP1,
albeit with some additional processing and delay overhead not shown on the graph.
The G.711 reference is a 64-kbps μ-law PCM waveform encoder used in wireline voice
compression and used in this example as a quality benchmark.
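
The scoring itself is just an average over the listening panel. The sketch below assumes a 1-to-5 rating scale (the text only names 5 as the top score); the function name and the panel scores are made up for illustration and are not measurements from the figure.

```python
from statistics import mean


def mean_opinion_score(scores):
    """Average a panel's ratings; assumes each rating is on a 1-to-5 scale."""
    if not scores:
        raise ValueError("at least one listener score is required")
    for score in scores:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
    return mean(scores)


# Hypothetical panel of five listeners rating the same speech sample.
panel = [4, 4, 3, 5, 4]
mos = mean_opinion_score(panel)
print(mos)
```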

3GPP1 has also specified a wideband version of AMR (AMR-WB) for higher-quality
audio signals (16 kHz sampling, giving about 7 kHz of audio bandwidth versus about
3 kHz of narrowband voice bandwidth). The codec rates are
6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, and 23.85 kbps. This
implies an associated need to increase speaker or headset quality in the handset and
audio amplifier efficiency.
Parallel work has been undertaken to standardize speech recognition algorithms,
with competing candidates from Qualcomm and Motorola/France Telecom/Alcatel.
Typical recognition accuracy—that is, user-to-user distance—is >90 percent in a noisy
car, five-language test environment. Recognition accuracy is a quality metric.
As we add complexity to audio processing, we increase processing delay, and this
delay is a not insignificant part of the overall end-to-end delay budget. Table
4.1 details the delay introduced in the send/receive path for each of the audio encoding
and decoding processes, including radio transmission framing and channel encoding/
decoding. The particular example is a CDMA2000 handset/base station.
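
A delay budget of this kind is simply the sum of the per-stage delays along the send and receive paths. The sketch below shows the bookkeeping only: the stage names follow the processes listed above, but every millisecond value is a placeholder chosen for illustration and is not taken from Table 4.1.

```python
# Placeholder per-stage delays in milliseconds (NOT the values from Table 4.1).
send_path_ms = {
    "speech_encode": 20.0,
    "channel_encode": 5.0,
    "radio_framing": 20.0,
}
receive_path_ms = {
    "channel_decode": 5.0,
    "speech_decode": 10.0,
}


def path_delay_ms(stages: dict) -> float:
    """Total delay of one path as the sum of its stage delays."""
    return sum(stages.values())


total_ms = path_delay_ms(send_path_ms) + path_delay_ms(receive_path_ms)
print(f"send {path_delay_ms(send_path_ms)} ms, "
      f"receive {path_delay_ms(receive_path_ms)} ms, total {total_ms} ms")
```

Laying the stages out this way makes it easy to see which process dominates when a more complex codec is swapped in.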
