Telephony: Measuring Voice Quality in a Packet Network
Everyone knows the ability to guarantee voice quality is a prerequisite to providing primary line telephony service, but how do you really know that your system meets the standards of carrier-grade quality? For that matter, do you know what those standards are? When I started to dig into this topic, I found there was a lot more than meets the ear. (Now, you really didn’t think I was going to say "eye" in an article on voice quality, did you?)
As a first cut on voice quality measurement, most cable engineers would probably point to metrics related to quality of service (QoS) in a packet network, and argue that a network that achieves QoS objectives provides carrier-grade voice quality. While that is probably true in most cases, the parameters that define acceptable QoS are only indirect measurements of voice quality. The bottom line in voice services is not jitter, lost packets or delay, but how a conversation sounds to the human being at the telephone receiver.
Sound, however, is a qualitative parameter, and quality is subjective. One person's "good" may be another person's "average" or even "poor." Engineers prefer to work with quantitative parameters that can be numerically measured and compared. Mean opinion score (MOS) was one of the first such indicators created to measure voice quality.
Relying on the experts
To arrive at an MOS, a tester assembles a panel of "expert listeners" who rate the quality of speech samples that have been processed by the system under test. Ideally, the panel consists of a mix of male and female listeners of various ages, and the samples reflect a range of typical voice conversations spoken by both male and female speakers of various ages. The panel rates the output of the system under test from 1 to 5, with 1 indicating the worst quality and 5 indicating the best. The panelists' scores are then averaged.
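Computationally, an MOS is nothing more than an average. As a minimal sketch in Python, with a made-up set of panel ratings, the calculation might look like this:

    # Minimal sketch of the MOS averaging step; the panel ratings are hypothetical.
    def mean_opinion_score(ratings):
        """Average a panel's 1-to-5 quality ratings into a single MOS."""
        if any(r < 1 or r > 5 for r in ratings):
            raise ValueError("ratings must fall between 1 and 5")
        return sum(ratings) / len(ratings)

    panel = [4, 4, 3, 5, 4, 3, 4, 4]   # eight listeners rate the same processed sample
    print(f"MOS = {mean_opinion_score(panel):.2f}")   # prints MOS = 3.88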
The obvious drawback of this method is the need for panels of listeners for each test. Over the past five years, innovations in mathematical speech analysis and modeling techniques have greatly alleviated that need, but they have also multiplied the choices for quality metrics.
The ITU E-model, for example, uses a set of impairment factors derived from predicted network jitter, delay, packet loss and codec performance to rate a network design on a scale of 0 to 100. The sum of the impairment factors is subtracted from an ideal score of 100 to arrive at an end-to-end quality index. The model, however, was originally designed to evaluate network designs, not networks already in service.
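The arithmetic itself is simple subtraction. The sketch below illustrates it with hypothetical impairment values; a real E-model calculation (ITU-T G.107) derives each factor from detailed delay, loss, echo and codec parameters rather than the round numbers used here.

    # Illustration only: subtract impairment factors from the ideal rating of 100.
    def e_model_rating(impairments):
        r = 100.0 - sum(impairments.values())
        return max(0.0, min(100.0, r))        # keep the result on the 0-100 scale

    impairments = {
        "delay": 8.0,         # hypothetical impairment for one-way delay
        "packet_loss": 11.0,  # hypothetical impairment for loss and jitter
        "codec": 10.0,        # hypothetical impairment for a low-bit-rate codec
    }
    print(f"R = {e_model_rating(impairments):.1f}")   # prints R = 71.0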
Other models have been created to test networks. Perceptual speech quality measurement (PSQM), defined by ITU-T Recommendation P.861, compares a processed voice sample in the 300-3,400 Hz range against a clean original sample and arrives at a relative score that indicates the amount of distortion. Created to evaluate codec performance, PSQM is not designed to reflect the effects of network packet loss.
Perceptual analysis measurement system (PAMS) uses a different algorithm than PSQM and rates voice quality on a different scale. PAMS scores indicate listening quality and listening effort on a 1-to-5 scale that can be correlated to MOS scores.
Perceptual evaluation of speech quality (PESQ), defined in ITU-T Recommendation P.862, is the successor to PSQM. PESQ was developed from PAMS and an improved version of PSQM called PSQM+. Its target application is networks that mix a variety of voice technologies, including voice over Internet protocol (VoIP), integrated services digital network (ISDN) and global system for mobile communications (GSM), in addition to plain old telephone service (POTS). PsyTechnics Ltd. and OPTICOM GmbH hold the license to the PESQ algorithm. PESQ produces a score from -0.5 to 4.5, with higher scores indicating better quality.
Iain Wood, director of marketing at PsyTechnics, acknowledges that PESQ tests are intrusive, meaning that the scores are derived under test conditions with the network out of service. A known IP voice data stream is injected into the network under test, and the output of the network is compared to the original signal to determine a quality measure.
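In practice, an intrusive test boils down to comparing two recordings. The sketch below assumes the open-source "pesq" Python package, which wraps the P.862 reference algorithm, and hypothetical file names for the injected and captured clips; it is meant only to show the pattern, not to serve as a production test harness.

    # Intrusive test pattern: inject a known reference clip, capture the network's
    # output, and score the degraded copy against the original.
    import numpy as np
    from scipy.io import wavfile
    from pesq import pesq          # pip install pesq (ITU-T P.862 wrapper)

    def load_mono(path):
        rate, samples = wavfile.read(path)
        return rate, samples.astype(np.float64)

    rate, reference = load_mono("reference_8khz.wav")   # clip injected into the network
    _, degraded = load_mono("captured_8khz.wav")        # same clip captured at the far end

    # Narrowband mode matches the 8 kHz sampling of a POTS-style voice path.
    score = pesq(rate, reference, degraded, "nb")
    print(f"PESQ score: {score:.2f}")    # roughly -0.5 (bad) to 4.5 (excellent)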
The PESQ algorithm finds its widest application in test equipment used to evaluate networks as they are prototyped in the lab or first brought into service. Mark McClain, product manager for the voice quality test suite at PESQ licensee Empirix, says voice quality measurement is an evolving field.
"We found that our earliest customers were equipment manufacturers. The next market was the service providers, and now we see interest by enterprise customers who need to interface their networks to a provider’s network," McClain explains. These customers need to do "active" testing, which is intrusive, to characterize their networks prior to service.
Can you hear me now?
Monitoring actual calls in a live network is a different scenario, and experts have varying opinions on its accuracy. "The only true measure of voice quality in a live network requires a person sitting with a handset," says McClain.
Work is being done to create nonintrusive algorithms that use software embedded in a network element, such as a gateway or multimedia terminal adapter (MTA), to monitor quality and send results to an operations support system. Such algorithms would make it possible to change network configurations dynamically, effectively automating the "Can you hear me now?" scenario.
In addition to making dynamic reconfiguration possible, real-time monitoring lets an operator detect and possibly correct transient network conditions.
Bob Massad, vice president of marketing for Telchemy, notes that most of these conditions cause dropped packets. "Burstiness is one characteristic of a packet network that causes variable packet loss. In addition to being concerned about how packet loss degrades voice conversations in an unpredictable manner, a service provider needs to consider the ‘recency factor.’ Degradation of quality near the end of a conversation will be more noticeable than degradation which occurs during the conversation."
VQMon, offered by Telchemy, uses software agents embedded in network elements to observe and analyze Real-time Transport Protocol (RTP) packet streams carrying voice calls in real time. According to Massad, VQMon will indicate when degradation occurs in a call. It also will give a VQMon-MOS score for the call, similar to the MOS score a listening panel would produce. In addition, VQMon is designed to forward call quality information to operations systems for alarms and record keeping.
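VQMon's internals are proprietary, but the general idea of passive monitoring is easy to sketch: watch the RTP stream of a live call, tally loss from sequence-number gaps, and translate the result into a rough quality estimate. The toy monitor below is not Telchemy's algorithm; the loss-to-R curve is a hypothetical stand-in, and sequence-number wraparound and jitter are ignored for brevity. Only the final R-to-MOS conversion follows the standard E-model formula.

    # Toy passive monitor: estimate call quality from observed RTP sequence numbers.
    class RtpCallMonitor:
        def __init__(self):
            self.expected_seq = None
            self.received = 0
            self.lost = 0

        def observe(self, seq):
            """Feed the monitor one observed RTP sequence number."""
            if self.expected_seq is not None and seq > self.expected_seq:
                self.lost += seq - self.expected_seq      # gap means packets went missing
            self.received += 1
            self.expected_seq = seq + 1

        def loss_fraction(self):
            total = self.received + self.lost
            return self.lost / total if total else 0.0

        def estimated_mos(self):
            r = 93.2 - 2.5 * (self.loss_fraction() * 100)   # hypothetical loss penalty
            r = max(0.0, min(100.0, r))
            return 1.0 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6   # E-model R-to-MOS

    monitor = RtpCallMonitor()
    for seq in [1, 2, 3, 5, 6, 7, 8, 9, 10, 11]:   # packet 4 never arrived
        monitor.observe(seq)
    print(f"loss = {monitor.loss_fraction():.1%}, estimated MOS = {monitor.estimated_mos():.2f}")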
With so many choices, which is the right one? The answer depends on the application. For network element design and evaluation, such as for codecs and gateways, the comparative results of a PESQ-based test may be the right answer. On the other hand, if the problem is resolution of subscriber complaints about poor call quality, real-time monitoring is probably the solution.
Justin J. Junkus is president of KnowledgeLink, Inc. To discuss this topic further, email him at [email protected].