Introduction

Voice over IP quality depends on managing both traditional telephony impairments (loudness, noise, echo) and IP-network-specific issues (jitter, packet loss, delay). This guide covers:

Voice quality degradation factors and their measurement
Subjective and objective quality metrics (MOS, PESQ, POLQA, E-model)
Practical monitoring with VoIPmonitor

Voice Quality Degradation Factors

Traditional Impairments

Volume and Loudness

Loudness is quantified using Loudness Ratings:

Send Loudness Rating (SLR): Microphone/transmit gain
Receive Loudness Rating (RLR): Earphone/receive sensitivity
Overall Loudness Rating (OLR): $OLR = SLR + RLR$

OLR Range	Impact
< 5 dB	Too loud, discomfort
5-15 dB	Optimal range
> 15 dB	Too quiet, speech difficult to understand
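As a worked illustration of the rating arithmetic and the ranges in the table above, here is a minimal Python sketch. The function names and the sample SLR/RLR values are illustrative assumptions; only the sum and the thresholds come from this section.

```python
def overall_loudness_rating(slr_db: float, rlr_db: float) -> float:
    """OLR is the sum of the send and receive loudness ratings (dB)."""
    return slr_db + rlr_db


def classify_olr(olr_db: float) -> str:
    """Map an OLR value onto the ranges from the table above."""
    if olr_db < 5:
        return "too loud"
    if olr_db <= 15:
        return "optimal"
    return "too quiet"


# Illustrative values only: SLR = 8 dB, RLR = 2 dB
olr = overall_loudness_rating(8, 2)
print(olr, classify_olr(olr))
```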

Noise

Circuit noise (electrical hiss/hum) and background noise (ambient sound) both reduce the signal-to-noise ratio (SNR).

Circuit Noise Impact (dBmp)
Level	Impact
-80 to -70	Negligible
-65 to -55	Noticeable degradation
> -40	Severe degradation

Sidetone

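Sidetone is hearing your own voice in the earpiece. It is measured by the Sidetone Masking Rating (STMR), a loss value, so a higher STMR means quieter sidetone:

Optimal: 8-12 dB
STMR too high (sidetone too quiet) → "dead phone" feeling
STMR too low (sidetone too loud) → distracting echo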

Echo

Talker Echo: Speaker hears their own voice reflected back
Listener Echo: Listener hears the talker's voice followed by a delayed copy of it

Parameter	Description
Echo Return Loss (ERL)	Attenuation of echo vs original
Echo Delay	Round-trip time to echo source
TELR	Talker Echo Loudness Rating

ℹ️ Note: Echo becomes more annoying as delay increases. At <30ms it merges with sidetone; at >100ms it's highly disruptive.

Mitigation: Echo cancellers (ITU-T G.168), proper impedance matching, acoustic echo cancellation for speakerphones.

Frequency Response

Narrowband (G.711): 300 Hz - 3.4 kHz
Wideband (G.722, Opus): Up to 7 kHz (more natural sound)

Delay (Latency)

End-to-end one-way delay includes codec, packetization, network, and jitter buffer delays.

ITU-T G.114 Delay Recommendations
One-way Delay	Assessment
0-150 ms	Acceptable (unnoticeable)
150-400 ms	Acceptable but impacts conversation flow
> 400 ms	Generally unacceptable
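The delay budget described above can be sketched as a simple sum checked against the G.114 ranges in the table. This is a minimal Python illustration; the function names and the component values in the example are hypothetical, not measurements.

```python
def one_way_delay_ms(codec_ms: float, packetization_ms: float,
                     network_ms: float, jitter_buffer_ms: float) -> float:
    """End-to-end one-way delay as the sum of its components."""
    return codec_ms + packetization_ms + network_ms + jitter_buffer_ms


def assess_delay(delay_ms: float) -> str:
    """Classify a one-way delay using the ranges from the table above."""
    if delay_ms <= 150:
        return "acceptable"
    if delay_ms <= 400:
        return "acceptable, impacts conversation flow"
    return "generally unacceptable"


# Hypothetical budget: 5 ms codec + 20 ms packetization
# + 80 ms network + 60 ms jitter buffer
total = one_way_delay_ms(5, 20, 80, 60)
print(total, assess_delay(total))
```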

Codec Impairments

Each codec has an inherent quality ceiling, expressed as its Equipment Impairment Factor (I_e). A higher I_e means a lower maximum MOS:

Codec	I_e	Max MOS
G.711 (64k)	0	4.4-4.5
G.729A (8k)	10-11	~4.0
G.723.1 (5.3k)	15-19	~3.6

IP Network Impairments

Jitter (Packet Delay Variation)

Jitter is the variation in packet arrival times. Receivers use jitter buffers to absorb this variation:

Static buffer: Fixed size (simpler, higher latency)
Adaptive buffer: Adjusts dynamically (lower latency, handles varying conditions)

Trade-off: Larger buffer = fewer late packets dropped, but more delay.

How the jitter buffer treats each packet:

Packet arrives within buffer window → played in sequence
Packet arrives after playout deadline → effectively lost
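The jitter that these buffers absorb is commonly quantified with the running interarrival jitter estimator from RFC 3550 (RTP). The sketch below is a simplified Python version that works in milliseconds rather than RTP timestamp units; the function name and input layout are assumptions made for illustration.

```python
def interarrival_jitter_ms(send_ms, recv_ms):
    """Running jitter estimate in the style of RFC 3550.

    send_ms / recv_ms: per-packet send and receive times in milliseconds,
    in the same order. Returns the smoothed jitter after the last packet.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: change in transit time between two consecutive packets
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        # Exponential smoothing with gain 1/16, as in RFC 3550
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets sent every 20 ms; the third one arrives 20 ms late
print(interarrival_jitter_ms([0, 20, 40], [0, 20, 60]))
```

Because of the 1/16 smoothing, a single late packet moves the estimate only slightly, which is one reason an average jitter figure can hide short delay spikes.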

Packet Loss

Loss Rate	Impact
0-1%	Negligible with good PLC
1-3%	Minor degradation
3-5%	Noticeable degradation
> 10%	Severe, often unacceptable

Burst vs Random Loss: Burst losses (consecutive packets) are far more damaging than evenly distributed losses at the same percentage. The Burst Ratio (BurstR) quantifies this pattern.
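To make the burst ratio concrete, the following Python sketch computes it from a per-packet loss trace, assuming the usual definition: the mean observed burst length divided by the mean burst length expected under independent (random) loss at the same loss rate, which is 1 / (1 - p). The function name and the trace format are illustrative assumptions.

```python
def burst_ratio(lost):
    """Burst ratio of a loss trace.

    lost: sequence of booleans, True where a packet was lost.
    Returns None when the ratio is undefined (no loss, or nothing received).
    """
    total = len(lost)
    n_lost = sum(1 for x in lost if x)
    if total == 0 or n_lost == 0 or n_lost == total:
        return None

    # Count runs of consecutive lost packets
    bursts = 0
    previous_lost = False
    for x in lost:
        if x and not previous_lost:
            bursts += 1
        previous_lost = bool(x)

    mean_burst_len = n_lost / bursts
    loss_rate = n_lost / total
    expected_random_len = 1.0 / (1.0 - loss_rate)
    return mean_burst_len / expected_random_len


# Same 30% loss rate, different patterns
bursty = [False, True, True, True, False, False, False, False, False, False]
spread = [False, True, False, False, True, False, False, True, False, False]
print(burst_ratio(bursty), burst_ratio(spread))
```

A value above 1 indicates loss that is more clustered than random loss at the same rate.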

Packet Loss Concealment (PLC): Modern codecs hide isolated losses by interpolating or repeating audio frames.

Voice Quality Measurement Methods

MOS and Subjective Testing

Mean Opinion Score (MOS) is a 1-5 scale:

Score	Quality
5	Excellent
4	Good
3	Fair
2	Poor
1	Bad

Listening-Only MOS (MOS-LQ)

Listeners rate one-way speech samples. Evaluates fidelity and distortion.

Conversational MOS (MOS-CQ)

Two people rate a real conversation. Captures delay, echo, and interaction factors.

Objective Measurement

Intrusive Models (PESQ, POLQA)

Compare reference audio with degraded audio:

PESQ (P.862): Narrowband, widely used
POLQA (P.863): Successor, supports wideband/super-wideband

The E-Model (ITU-T G.107)

Parametric model that predicts quality from network/codec parameters:

$R = R_0 - I_s - I_d - I_e + A$

Where:

R₀ (~94): Base signal-to-noise rating
I_s: Simultaneous impairments (noise, sidetone)
I_d: Delay impairment
I_e: Equipment/codec impairment
A: Advantage factor (usually 0)
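As a worked example of the formula, the Python sketch below plugs in illustrative numbers. The default R₀ of 94 and the I_e of 11 for G.729A follow the values quoted in this guide; the I_s and I_d values are hypothetical, and a real G.107 calculation derives each term from many more input parameters.

```python
def r_factor(r0: float = 94.0, i_s: float = 0.0, i_d: float = 0.0,
             i_e: float = 0.0, a: float = 0.0) -> float:
    """Simplified E-model sum: R = R0 - Is - Id - Ie + A."""
    return r0 - i_s - i_d - i_e + a


# Hypothetical G.729A call: I_e = 11, delay impairment I_d = 5
print(r_factor(i_d=5.0, i_e=11.0))
```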

R-Factor to MOS mapping:

R-Factor	MOS	Quality
90+	> 4.3	Excellent
80	~4.0	Good
70	~3.6	Acceptable
60	~3.1	Fair
< 50	< 2.6	Poor
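The mapping in the table follows the R-to-MOS conversion given in ITU-T G.107. A minimal Python sketch of that conversion is shown here; the function name is an assumption made for illustration.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-model R-factor to an estimated MOS (G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


for r in (90, 80, 70, 60, 50):
    print(r, round(r_to_mos(r), 2))
```

Because the conversion is a fixed function of R, reporting both values adds no new information.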

ℹ️ Note: VoIPmonitor does NOT calculate R-Factor separately because it's mathematically redundant with MOS. See Glossary for details.

Monitoring Voice Quality with VoIPmonitor

VoIPmonitor passively captures SIP/RTP traffic and computes quality metrics for every call.

Triple MOS Calculation

VoIPmonitor provides three MOS values simulating different jitter buffer behaviors:

MOS Type	Buffer Size	Use Case
MOS F1	Fixed 50 ms	Simulates basic endpoints
MOS F2	Fixed 200 ms	Simulates buffered endpoints
MOS Adapt	Adaptive up to 500 ms	Simulates modern endpoints

Interpreting the values:

If MOS F1 << MOS F2: Network jitter often exceeded 50ms but stayed under 200ms
If all three are similar and high: Jitter was low
If all three are low: Significant packet loss or extreme jitter
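The interpretation rules above can be sketched as a small helper. This is only an illustration in Python: the function name and the numeric thresholds (gap, low, high) are hypothetical choices, not values defined by VoIPmonitor.

```python
def interpret_triple_mos(f1: float, f2: float, adapt: float,
                         gap: float = 0.5, low: float = 3.6,
                         high: float = 4.0) -> str:
    """Apply the three interpretation rules to one call's MOS values.

    gap, low and high are hypothetical thresholds for illustration.
    """
    values = (f1, f2, adapt)
    if max(values) < low:
        return "significant packet loss or extreme jitter"
    if f2 - f1 >= gap:
        return "jitter often exceeded 50 ms but stayed under 200 ms"
    if min(values) >= high and max(values) - min(values) < gap:
        return "jitter was low"
    return "inconclusive"


print(interpret_triple_mos(3.2, 4.3, 4.3))
```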

⚠️ Warning: MOS Calculation Requires RTP Flow Charts

Enable RTP flow charts using one of the following:

Set savegraph=yes in voipmonitor.conf, OR
Set recordGRAPH=ON in capture rules

Without this, MOS scores will NOT be calculated.

Why High PDV Causes Low MOS Even With Zero Packet Loss

A call can show 0% packet loss and 0ms average jitter but still have low MOS. This occurs when:

Packets arrive with high variation (e.g., some at 20ms, others at 200ms+)
These delayed packets exceed jitter buffer capacity
The E-model treats them as "effectively lost" for playout purposes

This is correct behavior: the audio stream cannot be played out smoothly even though every packet eventually arrives.
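The effect can be reproduced with a toy model of a fixed jitter buffer, sketched below in Python. It is a deliberate simplification (each packet is described only by how late it arrives relative to its nominal time) and is not VoIPmonitor's actual simulator; the function name and the sample trace are assumptions.

```python
def effective_loss(lateness_ms, buffer_ms):
    """Fraction of packets that miss the playout deadline.

    lateness_ms: per-packet delay beyond the nominal arrival time (ms).
    buffer_ms: size of the fixed jitter buffer (ms).
    """
    if not lateness_ms:
        return 0.0
    late = sum(1 for d in lateness_ms if d > buffer_ms)
    return late / len(lateness_ms)


# 100 packets, none lost on the network: 90 on time, 10 delayed by 120 ms
trace = [0.0] * 90 + [120.0] * 10
for size in (50, 200):
    print(size, effective_loss(trace, size))
```

With the 50 ms buffer, 10% of the packets are discarded as late even though network loss is 0%; with the 200 ms buffer none are, which matches the pattern of MOS F1 being much lower than MOS F2.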

Jitter Analysis

VoIPmonitor records jitter as a distribution across delay bins:

0-50 ms, 50-70 ms, 70-90 ms, 90-120 ms, 120-150 ms, 150-200 ms, >300 ms

Filter calls by PDV events: "find calls with at least 10 packets delayed > 120ms"
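A filter like the one quoted above amounts to summing the counts of all bins at or above the threshold. The Python sketch below illustrates the idea using the bin labels listed in this section; the dictionary layout and function names are hypothetical and do not reflect VoIPmonitor's database schema.

```python
# Lower edge (ms) of each delay bin, using the labels listed above
BIN_LOWER_EDGE_MS = {
    "0-50": 0,
    "50-70": 50,
    "70-90": 70,
    "90-120": 90,
    "120-150": 120,
    "150-200": 150,
    ">300": 300,
}


def packets_delayed_over(bin_counts, threshold_ms):
    """Sum the packet counts of all bins starting at or above threshold_ms."""
    return sum(
        count
        for label, count in bin_counts.items()
        if BIN_LOWER_EDGE_MS[label] >= threshold_ms
    )


def matches_filter(bin_counts, threshold_ms=120, min_packets=10):
    """True if at least min_packets were delayed beyond threshold_ms."""
    return packets_delayed_over(bin_counts, threshold_ms) >= min_packets


# Hypothetical per-call distribution
call = {"0-50": 950, "50-70": 30, "70-90": 8, "90-120": 4,
        "120-150": 6, "150-200": 3, ">300": 2}
print(packets_delayed_over(call, 120), matches_filter(call))
```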

Packet Loss Analysis

Total loss percentage
Consecutive loss counters (burst analysis)
Filter by patterns: "calls with > X consecutive packets lost"

MOS XR (RTCP-XR) Metrics

MOS XR values are reported by endpoints (phones, gateways) via RTCP-XR packets (RFC 3611), NOT calculated by VoIPmonitor.

Feature	VoIPmonitor MOS	MOS XR
Source	VoIPmonitor sensor	Endpoint device
Method	E-model (parametric)	Endpoint algorithm
Perspective	Network path quality	User experience at device
Availability	All calls with RTP captured	Only if endpoint sends RTCP-XR

Database columns: mos_xr_avg_all, mos_xr_min_all, mos_xr_avg_caller_all, mos_xr_avg_called_all, mos_xr_min_caller_all, mos_xr_min_called_all

Troubleshooting use:

VoIPmonitor MOS low, MOS XR high → The endpoint received the stream cleanly; suspect packet drops at the sensor or mirror port (see Scenario C: Sensor Overload)
VoIPmonitor MOS high, MOS XR low → Problem is at endpoint or last mile
Both low → Network issues affecting entire path

Clipping and Silence Detection

Clipping detection: Counts audio samples at maximum amplitude (distortion)
Silence detection: Measures silence percentage (>95% indicates one-way audio)

For configuration details, see Silence_detection.
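Both checks can be illustrated on decoded audio. The Python sketch below assumes 16-bit signed PCM samples and a hypothetical amplitude threshold for silence; it shows the general idea only and is not VoIPmonitor's implementation.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def clipped_samples(samples):
    """Count samples sitting at the maximum 16-bit amplitude."""
    return sum(1 for s in samples if s >= INT16_MAX or s <= INT16_MIN)


def silence_percent(samples, threshold=100):
    """Percentage of samples whose amplitude is at or below threshold.

    threshold is a hypothetical value chosen for illustration.
    """
    if not samples:
        return 0.0
    quiet = sum(1 for s in samples if abs(s) <= threshold)
    return 100.0 * quiet / len(samples)


pcm = [0, 12, -8, 32767, -32768, 5000, 3, 0]
print(clipped_samples(pcm), silence_percent(pcm))
```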

MOS Configuration

# /etc/voipmonitor.conf

# Jitterbuffer simulation (all enabled by default)
jitterbuffer_f1 = yes
jitterbuffer_f2 = yes
jitterbuffer_adapt = yes

# Packet Loss Concealment (keep enabled)
plcdisable = no

# G.729 specific MOS (enable if using G.729 extensively)
mos_g729 = no

jitterbuffer_* Value	Effect
yes	MOS calculated normally
no	Static 4.5 pushed to GUI
null	NULL pushed to GUI

Troubleshooting Quality Issues

Diagnosing Where Issues Occur

When MOS is low but users report no issues (or vice versa), determine the problem location:

Diagnostic Framework
Metric	Before Probe	Between Probe & Endpoint	Sensor Overload
VoIPmonitor MOS	Low	Acceptable	Low
RTP Graph Loss/PDV	High	Low	High (sniffer drops)
RTCP Loss (endpoint)	High	High	Low
User Complaints	Yes	Yes	No
RRD Buffer Usage	Normal	Normal	100%

Scenario A: Issue Before Probe

Indicators: Low MOS + high RTP graph loss/PDV

Cause: Network problems in path TO monitoring point (congestion, jitter, loss upstream)

Scenario B: Issue Between Probe and Endpoint

Indicators: Acceptable VoIPmonitor MOS + high RTCP loss from endpoints

Cause: Problems in network segment FROM probe TO endpoint (last mile, endpoint network)

Scenario C: Sensor Overload (False Low MOS)

Indicators:

Low MOS across many calls
Users report no issues
RTP shows loss, but RTCP reports show low/no loss
RRD charts show buffer usage at 100%, packet drops

Cause: Sensor is CPU/IO bound, dropping packets at capture interface
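The three scenarios can be summarized as a decision helper. The Python sketch below encodes the indicator combinations from the diagnostic table as boolean flags; the function and parameter names are hypothetical, and deciding what counts as "low" or "high" is left to the operator.

```python
def diagnose(mos_low: bool, rtp_loss_high: bool,
             rtcp_loss_high: bool, buffer_full: bool) -> str:
    """Map the indicators from the diagnostic table to a scenario."""
    if buffer_full and rtp_loss_high and not rtcp_loss_high:
        # Sensor sees loss that the endpoints do not report
        return "sensor overload"
    if mos_low and rtp_loss_high and rtcp_loss_high:
        # Loss is visible both at the probe and at the endpoint
        return "issue before probe"
    if not mos_low and not rtp_loss_high and rtcp_loss_high:
        # Stream is clean at the probe but degraded at the endpoint
        return "issue between probe and endpoint"
    return "inconclusive"


print(diagnose(mos_low=True, rtp_loss_high=True,
               rtcp_loss_high=False, buffer_full=True))
```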

Fixing Sensor Overload

# /etc/voipmonitor.conf

# Enable modern threading
threading_expanded = yes
# For high traffic: threading_expanded = high_traffic

# Increase ring buffer (default 50, use 500+ for >100 Mbit)
ringbuffer = 500

# Reduce processing load if needed
pcap_dump = no

Also check:

Switch SPAN port for drops (interface counters)
SPAN buffer capacity during peak traffic

Validate fix:

systemctl restart voipmonitor

Check RRD charts: buffer usage should stay well below 100%.

Missing Quality Metrics on One Leg

If one call leg shows 0 packets or missing metrics:

Asymmetric Routing

RTP from one endpoint bypasses monitoring point.

Solution: Move sensor to see all traffic, or use ERSPAN from remote switch.

NAT Mismatch

SDP contains public IPs but RTP uses private IPs.

Solution: Configure IP mapping:

# /etc/voipmonitor.conf
# Map public IP (from SDP) to private IP (actual RTP)
natalias = 1.1.1.1 10.0.0.3

Multiple natalias lines can be used for multiple mappings.

AI Summary for RAG

Summary: Comprehensive guide to VoIP voice quality covering degradation factors (loudness, noise, echo, delay, codec impairments, jitter, packet loss), measurement methods (MOS, PESQ, POLQA, E-model), and VoIPmonitor monitoring features. VoIPmonitor calculates triple MOS (F1: 50ms buffer, F2: 200ms buffer, Adapt: 500ms adaptive) using E-model to simulate different jitter buffer behaviors. CRITICAL: MOS requires savegraph=yes or recordGRAPH=ON. High PDV causes low MOS even with 0% packet loss because delayed packets are treated as effectively lost. MOS XR (RTCP-XR, RFC 3611) are endpoint-reported quality scores vs VoIPmonitor's network-side parametric MOS. Troubleshooting framework: compare VoIPmonitor MOS, RTP graph loss/PDV, and RTCP loss to diagnose if issues are before probe, between probe and endpoint, or sensor overload. Sensor overload indicators: low MOS, no user complaints, high sniffer loss but low RTCP loss, RRD buffer at 100%. Fix with threading_expanded=yes, ringbuffer=500+. Missing metrics on one leg: check asymmetric routing or NAT mismatch (use natalias).

Keywords: voice quality, MOS, Mean Opinion Score, jitter, PDV, packet loss, burst ratio, E-model, R-factor, G.711, G.729, codec impairment, echo, TELR, delay, latency, PESQ, POLQA, VoIPmonitor, triple MOS, MOS F1, MOS F2, MOS Adapt, savegraph, MOS XR, RTCP-XR, RFC 3611, jitter buffer, sensor overload, ringbuffer, threading_expanded, natalias, clipping detection, silence detection, troubleshooting

Key Questions:

What factors affect VoIP voice quality?
How does VoIPmonitor calculate MOS scores?
What is the difference between MOS F1, F2, and Adapt?
Why is MOS low even with 0% packet loss?
How do I enable MOS calculation in VoIPmonitor?
What is MOS XR and how does it differ from VoIPmonitor MOS?
How do I diagnose where quality issues are occurring?
What indicates sensor overload vs real network problems?
How do I fix sensor overload causing false low MOS?
Why are quality metrics missing on one call leg?
How do I configure natalias for NAT environments?
What is the E-model and how is R-factor converted to MOS?
What are ITU-T G.114 delay recommendations?
How does packet loss burst ratio affect quality?