Comprehensive Guide to VoIP Voice Quality

From VoIPmonitor.org
Revision as of 16:53, 8 January 2026 by Admin (Rewrite: consolidation and structural improvements)


Introduction

Voice over IP quality depends on managing both traditional telephony impairments (loudness, noise, echo) and IP-network-specific issues (jitter, packet loss, delay). This guide covers:

  • Voice quality degradation factors and their measurement
  • Subjective and objective quality metrics (MOS, PESQ, POLQA, E-model)
  • Practical monitoring with VoIPmonitor

Voice Quality Degradation Factors

Traditional Impairments

Volume and Loudness

Loudness is quantified using Loudness Ratings:

  • Send Loudness Rating (SLR): Microphone/transmit gain
  • Receive Loudness Rating (RLR): Earphone/receive sensitivity
  • Overall Loudness Rating (OLR): OLR=SLR+RLR

OLR Range | Impact
< 5 dB | Too loud, discomfort
5-15 dB | Optimal range
> 15 dB | Too quiet, speech difficult to understand
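
As a minimal sketch of the relationship above, the following Python snippet adds SLR and RLR and classifies the result against the table's ranges. The function name and the example values (SLR 8 dB, RLR 2 dB) are illustrative assumptions, not part of VoIPmonitor.

```python
def classify_olr(slr_db, rlr_db):
    """Return (OLR in dB, assessment) using the ranges from the table above."""
    olr_db = slr_db + rlr_db  # OLR = SLR + RLR
    if olr_db < 5:
        verdict = "too loud"
    elif olr_db <= 15:
        verdict = "optimal"
    else:
        verdict = "too quiet"
    return olr_db, verdict


# Illustrative values only
print(classify_olr(8.0, 2.0))  # (10.0, 'optimal')
```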

Noise

Circuit noise (electrical hiss/hum) and background noise (ambient sound) reduce Signal-to-Noise Ratio.

Circuit Noise Impact
Level (dBmp) | Impact
-80 to -70 | Negligible
-65 to -55 | Noticeable degradation
> -40 | Severe degradation

Sidetone

Sidetone is hearing your own voice in the earpiece. Measured by Sidetone Masking Rating (STMR):

  • Optimal: 8-12 dB
  • Too high → "dead phone" feeling
  • Too low → distracting echo

Echo

  • Talker Echo: Speaker hears their own voice reflected back
  • Listener Echo: Listener hears the talker's speech followed by a delayed, doubly reflected copy

Parameter | Description
Echo Return Loss (ERL) | Attenuation of the echo relative to the original signal
Echo Delay | Round-trip time to the echo source
TELR | Talker Echo Loudness Rating

ℹ️ Note: Echo becomes more annoying as delay increases. At <30ms it merges with sidetone; at >100ms it's highly disruptive.

Mitigation: Echo cancellers (ITU-T G.168), proper impedance matching, acoustic echo cancellation for speakerphones.

Frequency Response

  • Narrowband (G.711): 300 Hz - 3.4 kHz
  • Wideband (G.722, Opus): Up to 7 kHz (more natural sound)

Delay (Latency)

End-to-end one-way delay includes codec, packetization, network, and jitter buffer delays.

ITU-T G.114 Delay Recommendations
One-way Delay | Assessment
0-150 ms | Acceptable (unnoticeable)
150-400 ms | Acceptable but impacts conversation flow
> 400 ms | Generally unacceptable
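
To make the delay budget concrete, here is a small Python sketch that sums the four components named above and classifies the total against the G.114 ranges in the table. The component values in the example are hypothetical.

```python
def assess_one_way_delay(codec_ms, packetization_ms, network_ms, jitter_buffer_ms):
    """Sum the delay components and classify the total per the table above."""
    total_ms = codec_ms + packetization_ms + network_ms + jitter_buffer_ms
    if total_ms <= 150:
        verdict = "acceptable (unnoticeable)"
    elif total_ms <= 400:
        verdict = "acceptable but impacts conversation flow"
    else:
        verdict = "generally unacceptable"
    return total_ms, verdict


# Hypothetical budget: 5 ms codec, 20 ms packetization, 60 ms network, 40 ms jitter buffer
print(assess_one_way_delay(5, 20, 60, 40))  # (125, 'acceptable (unnoticeable)')
```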

Codec Impairments

Each codec has inherent quality limits expressed as Equipment Impairment Factor (Ie):

Codec | Ie | Max MOS
G.711 (64 kbit/s) | 0 | 4.4-4.5
G.729A (8 kbit/s) | 10-11 | ~4.0
G.723.1 (5.3 kbit/s) | 15-19 | ~3.6

IP Network Impairments

Jitter (Packet Delay Variation)

Jitter is the variation in packet arrival times. Receivers use jitter buffers to absorb this variation:

  • Static buffer: Fixed size (simpler, higher latency)
  • Adaptive buffer: Adjusts dynamically (lower latency, handles varying conditions)

Trade-off: Larger buffer = fewer late packets dropped, but more delay.

How the buffer handles each packet:

  • Packet arrives within buffer window → played in sequence
  • Packet arrives after playout deadline → effectively lost
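
The playout rule above can be sketched as a simplified fixed-buffer simulation. This is an illustration only, not VoIPmonitor's jitter buffer simulator: it assumes packets are sent every 20 ms, that playout starts one buffer length after the first packet arrives, and that the per-packet network delays are known.

```python
def late_packets(delays_ms, buffer_ms, ptime_ms=20):
    """Return indices of packets that miss their playout deadline.

    delays_ms[i] is the network delay of packet i, which was sent at i * ptime_ms.
    """
    if not delays_ms:
        return []
    first_arrival = delays_ms[0]  # packet 0 is sent at t = 0
    late = []
    for i, delay in enumerate(delays_ms):
        arrival = i * ptime_ms + delay
        deadline = first_arrival + buffer_ms + i * ptime_ms
        if arrival > deadline:
            late.append(i)  # arrived after its playout slot: effectively lost
    return late


# Hypothetical delays: two spikes, none of the packets is lost on the wire
delays = [20, 25, 20, 90, 30, 240, 20, 20]
print(late_packets(delays, buffer_ms=50))   # [3, 5]
print(late_packets(delays, buffer_ms=200))  # [5]
```

The larger buffer saves packet 3 but adds 150 ms of playout delay, which is the trade-off described above.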

Packet Loss

Loss Rate | Impact
0-1% | Negligible with good PLC
1-3% | Minor degradation
3-5% | Noticeable degradation
> 10% | Severe, often unacceptable

Burst vs Random Loss: Burst losses (consecutive packets) are far more damaging than evenly distributed losses at the same percentage. The Burst Ratio (BurstR) quantifies this pattern.

Packet Loss Concealment (PLC): Modern codecs hide isolated losses by interpolating or repeating audio frames.
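
The difference between burst and random loss can be illustrated with a short Python sketch. It assumes the G.107-style definition of BurstR as the observed mean burst length divided by the mean burst length expected under random loss, 1 / (1 - p), where p is the loss fraction. The input format (one boolean per expected packet) is a hypothetical convenience, not a VoIPmonitor data structure.

```python
def loss_stats(received):
    """received: list of bools in sequence order, True if the packet arrived."""
    total = len(received)
    bursts = []  # lengths of consecutive-loss runs
    run = 0
    for ok in received:
        if not ok:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    loss_rate = sum(bursts) / total if total else 0.0
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    # Random loss gives a mean burst length of 1 / (1 - p), so BurstR is about 1
    burst_r = mean_burst * (1 - loss_rate) if bursts else None
    return {"loss_rate": loss_rate, "max_burst": max(bursts, default=0), "burst_r": burst_r}


# Same 4% loss, different patterns (100 packets each)
bursty = [not (10 <= i <= 13) for i in range(100)]
spread = [i not in (10, 35, 60, 85) for i in range(100)]
print(loss_stats(bursty))  # max_burst 4, burst_r about 3.84
print(loss_stats(spread))  # max_burst 1, burst_r about 0.96
```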

Voice Quality Measurement Methods

MOS and Subjective Testing

Mean Opinion Score (MOS) is a 1-5 scale:

Score | Quality
5 | Excellent
4 | Good
3 | Fair
2 | Poor
1 | Bad

Listening-Only MOS (MOS-LQ)

Listeners rate one-way speech samples. Evaluates fidelity and distortion.

Conversational MOS (MOS-CQ)

Two people rate a real conversation. Captures delay, echo, and interaction factors.

Objective Measurement

Intrusive Models (PESQ, POLQA)

Compare reference audio with degraded audio:

  • PESQ (P.862): Narrowband, widely used
  • POLQA (P.863): Successor, supports wideband/super-wideband

The E-Model (ITU-T G.107)

Parametric model that predicts quality from network/codec parameters:

$$R = R_0 - I_s - I_d - I_e + A$$

Where:

  • R0 (~94): Base signal-to-noise rating
  • Is: Simultaneous impairments (noise, sidetone)
  • Id: Delay impairment
  • Ie: Equipment/codec impairment
  • A: Advantage factor (usually 0)

R-Factor to MOS mapping:

R-Factor | MOS | Quality
90+ | > 4.3 | Excellent
80 | ~4.0 | Good
70 | ~3.6 | Acceptable
60 | ~3.1 | Fair
< 50 | < 2.6 | Poor
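
The mapping in the table follows the R-to-MOS conversion given in ITU-T G.107. The Python sketch below implements that conversion together with the basic R computation (base rating minus impairments plus advantage factor). The impairment values in the example are illustrative assumptions, not measured data.

```python
def e_model_r(r0, i_s, i_d, i_e, a=0.0):
    """Transmission rating: base rating minus impairments plus advantage factor."""
    return r0 - i_s - i_d - i_e + a


def r_to_mos(r):
    """Convert an R-factor to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


# Illustrative inputs: R0 = 94, Is = 1.5, Id = 2.5, Ie = 10 (G.729A-like codec)
r = e_model_r(94, 1.5, 2.5, 10)
print(r, round(r_to_mos(r), 2))  # 80.0 4.02

for r_value in (90, 80, 70, 60, 50):
    print(r_value, round(r_to_mos(r_value), 2))
```

The loop reproduces the table rows: roughly 4.34, 4.02, 3.60, 3.10, and 2.58.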

ℹ️ Note: VoIPmonitor does NOT calculate R-Factor separately because it's mathematically redundant with MOS. See Glossary for details.

Monitoring Voice Quality with VoIPmonitor

VoIPmonitor passively captures SIP/RTP traffic and computes quality metrics for every call.

Triple MOS Calculation

VoIPmonitor provides three MOS values simulating different jitter buffer behaviors:

MOS Type | Buffer Size | Use Case
MOS F1 | Fixed 50 ms | Simulates basic endpoints
MOS F2 | Fixed 200 ms | Simulates buffered endpoints
MOS Adapt | Adaptive up to 500 ms | Simulates modern endpoints

Interpreting the values:

  • If MOS F1 << MOS F2: Network jitter often exceeded 50ms but stayed under 200ms
  • If all three are similar and high: Jitter was low
  • If all three are low: Significant packet loss or extreme jitter
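
The interpretation rules above can be written as a rough decision helper. This is a sketch only: the thresholds (`good`, `low`, `gap`) are hypothetical cut-offs chosen for illustration and are not defined by VoIPmonitor.

```python
def interpret_triple_mos(f1, f2, adapt, good=4.0, low=3.0, gap=0.5):
    """Map the three MOS values to the interpretations listed above."""
    values = (f1, f2, adapt)
    if max(values) <= low:
        return "significant packet loss or extreme jitter"
    if f2 - f1 >= gap:
        return "jitter often exceeded 50 ms but stayed under 200 ms"
    if min(values) >= good:
        return "jitter was low"
    return "inconclusive"


print(interpret_triple_mos(3.1, 4.2, 4.3))  # jitter often exceeded 50 ms but stayed under 200 ms
print(interpret_triple_mos(4.3, 4.4, 4.4))  # jitter was low
print(interpret_triple_mos(2.1, 2.3, 2.4))  # significant packet loss or extreme jitter
```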

⚠️ Warning: MOS Calculation Requires RTP Flow Charts

MOS calculation REQUIRES enabling RTP flow charts:

  • Set savegraph=yes in voipmonitor.conf, OR
  • Set recordGRAPH=ON in capture rules

Without this, MOS scores will NOT be calculated.

Why High PDV Causes Low MOS Even With Zero Packet Loss

A call can show 0% packet loss and 0ms average jitter but still have low MOS. This occurs when:

  • Packets arrive with high variation (e.g., some at 20ms, others at 200ms+)
  • These delayed packets exceed jitter buffer capacity
  • The E-model treats them as "effectively lost" for playout purposes

This is correct behavior - the audio stream cannot be reconstructed smoothly regardless of all packets eventually arriving.

Jitter Analysis

VoIPmonitor records jitter as a distribution across delay bins:

  • 0-50 ms, 50-70 ms, 70-90 ms, 90-120 ms, 120-150 ms, 150-200 ms, >300 ms

Filter calls by PDV events: "find calls with at least 10 packets delayed > 120ms"
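
The logic behind such a filter can be sketched in a few lines of Python. The per-call delay lists below are hypothetical inputs used for illustration; they do not reflect how VoIPmonitor stores PDV data.

```python
def has_pdv_events(delays_ms, threshold_ms=120, min_packets=10):
    """True if at least min_packets packets were delayed by more than threshold_ms."""
    delayed = sum(1 for d in delays_ms if d > threshold_ms)
    return delayed >= min_packets


# Hypothetical calls: mostly 20 ms delay with a few spikes at 130 ms
calls = {
    "call-a": [20] * 500 + [130] * 12,
    "call-b": [20] * 500 + [130] * 3,
}
matching = [call_id for call_id, delays in calls.items() if has_pdv_events(delays)]
print(matching)  # ['call-a']
```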

Packet Loss Analysis

  • Total loss percentage
  • Consecutive loss counters (burst analysis)
  • Filter by patterns: "calls with > X consecutive packets lost"

MOS XR (RTCP-XR) Metrics

MOS XR values are reported by endpoints (phones, gateways) via RTCP-XR packets (RFC 3611), NOT calculated by VoIPmonitor.

Feature | VoIPmonitor MOS | MOS XR
Source | VoIPmonitor sensor | Endpoint device
Method | E-model (parametric) | Endpoint algorithm
Perspective | Network path quality | User experience at device
Availability | All calls with RTP captured | Only if endpoint sends RTCP-XR

Database columns: mos_xr_avg_all, mos_xr_min_all, mos_xr_avg_caller_all, mos_xr_avg_called_all, mos_xr_min_caller_all, mos_xr_min_called_all

Troubleshooting use:

  • VoIPmonitor MOS low, MOS XR high → Endpoints received clean audio; suspect loss at the capture point (SPAN port or sensor drops) rather than on the call path
  • VoIPmonitor MOS high, MOS XR low → Problem is at endpoint or last mile
  • Both low → Network issues affecting entire path

Clipping and Silence Detection

  • Clipping detection: Counts audio samples at maximum amplitude (distortion)
  • Silence detection: Measures silence percentage (>95% indicates one-way audio)

For configuration details, see Silence_detection.
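
To show what these two measurements amount to, here is a simplified Python sketch over decoded 16-bit PCM samples. It is not VoIPmonitor's implementation: the silence threshold of 200 is a hypothetical amplitude cut-off, and the sample list is synthetic.

```python
def clipping_and_silence(samples, silence_threshold=200):
    """Return (clipped sample count, silence percentage) for 16-bit PCM samples."""
    # Samples pinned at the 16-bit limits indicate clipping
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
    # Samples below the amplitude threshold are counted as silence
    silent = sum(1 for s in samples if abs(s) < silence_threshold)
    silence_pct = 100.0 * silent / len(samples) if samples else 0.0
    return clipped, silence_pct


# Synthetic stream: 96 silent samples, 2 clipped samples, 2 normal samples
samples = [0] * 96 + [32767, -32768, 1000, -1500]
clipped, silence_pct = clipping_and_silence(samples)
print(clipped, silence_pct)  # 2 96.0
print(silence_pct > 95)      # True: possible one-way audio
```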

MOS Configuration

# /etc/voipmonitor.conf

# Jitterbuffer simulation (all enabled by default)
jitterbuffer_f1 = yes
jitterbuffer_f2 = yes
jitterbuffer_adapt = yes

# Packet Loss Concealment (keep enabled)
plcdisable = no

# G.729 specific MOS (enable if using G.729 extensively)
mos_g729 = no

jitterbuffer_* Value | Effect
yes | MOS calculated normally
no | Static 4.5 pushed to GUI
null | NULL pushed to GUI

Troubleshooting Quality Issues

Diagnosing Where Issues Occur

When MOS is low but users report no issues (or vice versa), determine the problem location:

Diagnostic Framework
Metric | Before Probe | Between Probe & Endpoint | Sensor Overload
VoIPmonitor MOS | Low | Acceptable | Low
RTP Graph Loss/PDV | High | Low | High (sniffer drops)
RTCP Loss (endpoint) | High | High | Low
User Complaints | Yes | Yes | No
RRD Buffer Usage | Normal | Normal | 100%

Scenario A: Issue Before Probe

Indicators: Low MOS + high RTP graph loss/PDV

Cause: Network problems in path TO monitoring point (congestion, jitter, loss upstream)

Scenario B: Issue Between Probe and Endpoint

Indicators: Acceptable VoIPmonitor MOS + high RTCP loss from endpoints

Cause: Problems in network segment FROM probe TO endpoint (last mile, endpoint network)

Scenario C: Sensor Overload (False Low MOS)

Indicators:

  • Low MOS across many calls
  • Users report no issues
  • RTP shows loss, but RTCP reports show low/no loss
  • RRD charts show buffer usage at 100%, packet drops

Cause: Sensor is CPU/IO bound, dropping packets at capture interface
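
The three scenarios can be condensed into a small decision function. This is a sketch of the framework's logic under the assumption that each indicator has already been reduced to a boolean; the function and parameter names are hypothetical.

```python
def diagnose(sniffer_mos_low, rtp_graph_loss_high, rtcp_loss_high, buffer_full):
    """Map the diagnostic indicators to one of the scenarios described above."""
    if sniffer_mos_low and rtp_graph_loss_high and not rtcp_loss_high and buffer_full:
        return "C: sensor overload (false low MOS)"
    if sniffer_mos_low and rtp_graph_loss_high and rtcp_loss_high:
        return "A: issue before probe"
    if not sniffer_mos_low and rtcp_loss_high:
        return "B: issue between probe and endpoint"
    return "no scenario matched"


print(diagnose(True, True, True, False))    # A: issue before probe
print(diagnose(False, False, True, False))  # B: issue between probe and endpoint
print(diagnose(True, True, False, True))    # C: sensor overload (false low MOS)
```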

Fixing Sensor Overload

# /etc/voipmonitor.conf

# Enable modern threading
threading_expanded = yes
# For high traffic: threading_expanded = high_traffic

# Increase ring buffer (default 50, use 500+ for >100 Mbit)
ringbuffer = 500

# Reduce processing load if needed
pcap_dump = no

Also check:

  • Switch SPAN port for drops (interface counters)
  • SPAN buffer capacity during peak traffic

Validate fix:

systemctl restart voipmonitor

Check RRD charts: buffer usage should stay well below 100%.

Missing Quality Metrics on One Leg

If one call leg shows 0 packets or missing metrics:

Asymmetric Routing

RTP from one endpoint bypasses monitoring point.

Solution: Move sensor to see all traffic, or use ERSPAN from remote switch.

NAT Mismatch

SDP contains public IPs but RTP uses private IPs.

Solution: Configure IP mapping:

# /etc/voipmonitor.conf
# Map public IP (from SDP) to private IP (actual RTP)
natalias = 1.1.1.1 10.0.0.3

Multiple natalias lines can be used for multiple mappings.

AI Summary for RAG

Summary: Comprehensive guide to VoIP voice quality covering degradation factors (loudness, noise, echo, delay, codec impairments, jitter, packet loss), measurement methods (MOS, PESQ, POLQA, E-model), and VoIPmonitor monitoring features. VoIPmonitor calculates triple MOS (F1: 50ms buffer, F2: 200ms buffer, Adapt: 500ms adaptive) using E-model to simulate different jitter buffer behaviors. CRITICAL: MOS requires savegraph=yes or recordGRAPH=ON. High PDV causes low MOS even with 0% packet loss because delayed packets are treated as effectively lost. MOS XR (RTCP-XR, RFC 3611) are endpoint-reported quality scores vs VoIPmonitor's network-side parametric MOS. Troubleshooting framework: compare VoIPmonitor MOS, RTP graph loss/PDV, and RTCP loss to diagnose if issues are before probe, between probe and endpoint, or sensor overload. Sensor overload indicators: low MOS, no user complaints, high sniffer loss but low RTCP loss, RRD buffer at 100%. Fix with threading_expanded=yes, ringbuffer=500+. Missing metrics on one leg: check asymmetric routing or NAT mismatch (use natalias).

Keywords: voice quality, MOS, Mean Opinion Score, jitter, PDV, packet loss, burst ratio, E-model, R-factor, G.711, G.729, codec impairment, echo, TELR, delay, latency, PESQ, POLQA, VoIPmonitor, triple MOS, MOS F1, MOS F2, MOS Adapt, savegraph, MOS XR, RTCP-XR, RFC 3611, jitter buffer, sensor overload, ringbuffer, threading_expanded, natalias, clipping detection, silence detection, troubleshooting

Key Questions:

  • What factors affect VoIP voice quality?
  • How does VoIPmonitor calculate MOS scores?
  • What is the difference between MOS F1, F2, and Adapt?
  • Why is MOS low even with 0% packet loss?
  • How do I enable MOS calculation in VoIPmonitor?
  • What is MOS XR and how does it differ from VoIPmonitor MOS?
  • How do I diagnose where quality issues are occurring?
  • What indicates sensor overload vs real network problems?
  • How do I fix sensor overload causing false low MOS?
  • Why are quality metrics missing on one call leg?
  • How do I configure natalias for NAT environments?
  • What is the E-model and how is R-factor converted to MOS?
  • What are ITU-T G.114 delay recommendations?
  • How does packet loss burst ratio affect quality?