Voice Encoding/Decoding (The Vocoder)
Let’s first look at the microphone and its impact on uplink offered traffic. In Chapter 1 we briefly described the process of talking into a microphone to produce an analog waveform (a varying voltage), which is then digitized in an analog-to-digital converter (ADC). In GSM, this produces a digital bit stream of 104 kbps. That stream then has to be compressed, typically to 13 kbps or lower, using a transform from the time domain to the frequency domain, which is the basis for all speech synthesis codecs.
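The 104-kbps figure follows from simple arithmetic. The sketch below assumes the standard narrowband telephony input of 8 kHz sampling with 13-bit linear PCM, and takes 13 kbps as the GSM full-rate coded output. The constant and function names are illustrative only.

```python
# Worked arithmetic for GSM narrowband speech (assumed parameters, see lead-in).
SAMPLE_RATE_HZ = 8_000    # narrowband telephony sampling rate
BITS_PER_SAMPLE = 13      # 13-bit linear PCM from the ADC
CODED_RATE_BPS = 13_000   # GSM full-rate speech codec output, 13 kbps


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(raw_bps: int, coded_bps: int) -> float:
    """How many times smaller the coded stream is than the raw PCM stream."""
    return raw_bps / coded_bps


raw_bps = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(raw_bps)                                     # 104000
print(compression_ratio(raw_bps, CODED_RATE_BPS))  # 8.0
```

An 8:1 reduction is therefore needed at 13 kbps. At the lowest AMR rate of 4.75 kbps the ratio rises to roughly 22:1.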
Initially, cellular handset codecs were constant rate. The codecs specified for 3G are variable rate. In the case of the adaptive multirate (AMR) codec specified by 3GPP1 (the standards group for IMT2000DS/W-CDMA/UMTS), the rate is switchable between 4.75 and 12.2 kbps. A lower bit rate can be chosen to provide a capacity gain, or a higher bit rate to provide a quality gain. The codec is an algebraic code-excited linear prediction (ACELP) codec. Excitation signals are held in codebooks (lookup tables) at both ends of the link, so the encoder sends codebook indices and filter parameters rather than the waveform itself. 3GPP2 (the standards group responsible for CDMA2000) has specified a variable-rate vocoder known as the selectable mode vocoder (SMV), which adapts dynamically to the audio input waveform. Figure 4.2 shows performance comparisons between the SMV and AMR codecs. The SMV codec provides a better quality/capacity trade-off at the cost of some additional processing overhead. Voice quality is measured using a mean opinion score (MOS). Mean opinion scoring is essentially a way of attaching objective numbers to subjective judgments of quality: a group of users listen to the voice quality from a handset and each provides a score. A score of 5 is very good (equivalent to a wireline connection). A score of 2.5 would be comprehensible but uncomfortable to listen to, and many of the harmonic qualities of the person’s voice will have been lost, to the point where it is difficult to recognize who is speaking. Figure 4.3 shows typical SMV and AMR vocoder performance. The SMV codec, used in 3GPP2, performs better than the AMR codec, used in 3GPP1, albeit with some additional processing and delay overhead not shown on the graph. The G.711 reference is a 64-kbps μ-law PCM waveform encoder used in wireline voice compression, and it serves in this example as a quality benchmark.
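To make the capacity/quality trade-off concrete, the sketch below lists the eight narrowband AMR modes between 4.75 and 12.2 kbps and converts each into speech bits per 20-ms frame. The `pick_mode` rule, which selects the highest mode that fits a given bit-rate budget, is a hypothetical policy for illustration. It is not the 3GPP codec mode adaptation procedure, which is driven by link conditions.

```python
# Narrowband AMR codec modes in kbps, with 20 ms speech frames.
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
FRAME_MS = 20


def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Speech bits in one frame: 1 kbit/s for 1 ms is exactly 1 bit."""
    return round(rate_kbps * frame_ms)


def pick_mode(budget_kbps: float) -> float:
    """Hypothetical policy: the highest AMR mode that fits the budget.

    A smaller budget yields a lower mode (more capacity, less quality)."""
    fitting = [m for m in AMR_MODES_KBPS if m <= budget_kbps]
    if not fitting:
        raise ValueError("budget is below the lowest AMR mode")
    return max(fitting)


print([bits_per_frame(m) for m in AMR_MODES_KBPS])
# [95, 103, 118, 134, 148, 159, 204, 244]
print(pick_mode(8.0))  # 7.95
```

Halving the budget from 12.2 to about 6 kbps roughly halves the bits per frame. Under this assumed policy, that is where the capacity gain comes from.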
3GPP1 has also specified a wideband version of AMR (AMR-WB) that extends audio bandwidth well beyond narrowband voice. The codec samples at 16 kHz and carries audio up to about 7 kHz, versus roughly 3 kHz for narrowband voice. The codec rates are 6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, and 23.85 kbps. This implies an associated need for better speaker or headset quality in the handset and for more efficient audio amplifiers. Parallel work has been undertaken to standardize speech recognition algorithms, with competing candidates from Qualcomm and Motorola/France Telecom/Alcatel. Typical recognition accuracy (that is, user-to-user distance) is better than 90 percent in a noisy-car, five-language test environment. Recognition accuracy is a quality metric. As we add complexity to audio processing, we increase processing delay, and this delay is a significant part of the overall end-to-end delay budget. Table 4.1 details the delay introduced in the send/receive path by each of the audio encoding and decoding processes, including radio transmission framing and channel encoding/decoding. The particular example is a CDMA2000 handset/base station.
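The wideband modes can be handled the same way. This sketch assumes AMR-WB uses 20-ms frames, as narrowband AMR does. It converts each mode to bits per frame and shows the Nyquist limit (half the sampling rate) that bounds the audio bandwidth at 8 kHz and 16 kHz sampling. The names are illustrative only.

```python
# AMR-WB codec modes in kbps; 20 ms frames assumed, as for narrowband AMR.
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]
FRAME_MS = 20
NARROWBAND_FS_HZ = 8_000   # narrowband sampling rate
WIDEBAND_FS_HZ = 16_000    # AMR-WB sampling rate


def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Speech bits in one frame: 1 kbit/s for 1 ms is exactly 1 bit."""
    return round(rate_kbps * frame_ms)


def nyquist_limit_hz(sample_rate_hz: int) -> int:
    """Highest frequency representable at this sampling rate (half the rate)."""
    return sample_rate_hz // 2


print([bits_per_frame(m) for m in AMR_WB_MODES_KBPS])
# [132, 177, 253, 285, 317, 365, 397, 461, 477]
print(nyquist_limit_hz(NARROWBAND_FS_HZ), nyquist_limit_hz(WIDEBAND_FS_HZ))
# 4000 8000
```

The usable audio band sits somewhat below each Nyquist limit (about 3.4 kHz and about 7 kHz, respectively), which leaves room for the roll-off of the anti-aliasing filter.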