Audio Codecs - Comprehensive Guide

Latest revision as of 23:53, 11 December 2025

VoIPmonitor can decode a wide range of voice codecs commonly used in telephony and VoIP. This guide covers all the audio codecs that VoIPmonitor supports: G.711 (PCM A-law/μ-law), G.722, G.723.1, G.726, G.729a, Opus, AMR (Adaptive Multi-Rate), AMR-WB (Wideband), iLBC, Speex, GSM (Full Rate), SILK (from Skype), iSAC, and MP4A-LATM (MPEG-4 Audio in LATM format).

Each codec is described in detail – including its bit rate, audio bandwidth, typical quality (MOS), and how it handles packet loss (PLC). VoIPmonitor supports decoding audio from all of these codecs in captured calls, ensuring you can analyze call quality across narrowband, wideband, and full-band audio.

Audio Bandwidth Terminology

Term	Frequency Range	Sample Rate	Description
Narrowband	300–3400 Hz	8 kHz	Traditional phone audio
Wideband (HD Voice)	50–7000 Hz	16 kHz	Clearer, more natural sound
Super-wideband	50–14000 Hz	32 kHz	Extended clarity
Full-band	20–20000 Hz	48 kHz	CD-quality audio

Generally, wideband and full-band codecs provide better clarity and listener comfort than narrowband codecs. However, they often require higher bitrates. The codecs below span from narrowband to full-band, each balancing quality, bandwidth, and complexity differently.

G.711 (PCM A-law/μ-law) – 64 kbps Pulse Code Modulation

G.711 is the original PCM voice codec standard from 1972, providing uncompressed toll-quality audio. It samples analog voice at 8 kHz with 8-bit nonlinear quantization, resulting in a 64 kbps bitrate. Two companding laws are defined:

μ-law (Mu-law) – used in North America & Japan
A-law – used in Europe and elsewhere

These are logarithmic compression curves that reduce quantization noise for small signal amplitudes. In practical terms, G.711 delivers audio quality equivalent to classic landline telephony – narrowband 300–3400 Hz frequency range and very high fidelity for speech. It has a Mean Opinion Score around 4.1–4.3 (out of 5), essentially toll quality.
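The effect of a logarithmic companding curve can be sketched in a few lines of Python. This is an illustration only: it uses the continuous μ-law formula with μ = 255, whereas G.711 itself specifies a segmented (piecewise-linear) approximation of the curve, and the function names and the simple uniform quantizer are assumptions made for the example.

```python
import math

MU = 255  # mu value of the continuous mu-law curve (assumed for this sketch)

def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] onto the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(y: float, levels: int = 127) -> float:
    """Uniform quantizer with `levels` steps per polarity (roughly 8 bits)."""
    return round(y * levels) / levels

def linear_error(x: float) -> float:
    """Error when a sample is quantized uniformly without companding."""
    return abs(quantize(x) - x)

def companded_error(x: float) -> float:
    """Error when a sample is compressed, quantized, then expanded."""
    return abs(mu_law_expand(quantize(mu_law_compress(x))) - x)
```

For a quiet sample such as `x = 0.01`, `companded_error(x)` comes out far smaller than `linear_error(x)`, which is the noise reduction for small amplitudes described above.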

Parameter	Value
Bitrate	64 kbps
Bandwidth	Narrowband (8 kHz sample rate, ~300–3400 Hz)
Network bandwidth	~87 kbps per call (with overhead)
MOS	4.1–4.3 (toll quality)
Compression	None (PCM with companding only)
Algorithmic delay	~0 ms
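The ~87 kbps figure in the table can be reproduced with a short calculation. The sketch below assumes 20 ms packetization, 40 bytes of IPv4/UDP/RTP headers (20 + 8 + 12), and 18 bytes of Ethernet header plus frame check sequence; none of these values are stated in the table, so treat them as assumptions. The function name is hypothetical.

```python
def on_wire_kbps(codec_kbps: float,
                 ptime_ms: int = 20,
                 ip_udp_rtp_bytes: int = 40,
                 link_bytes: int = 18) -> float:
    """Estimate one-way bandwidth per call, including packet overhead.

    codec_kbps       -- codec payload bitrate
    ptime_ms         -- audio carried per packet (assumed 20 ms)
    ip_udp_rtp_bytes -- IPv4 (20) + UDP (8) + RTP (12) headers
    link_bytes       -- Ethernet header (14) + FCS (4); no VLAN tag, no preamble
    """
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    packets_per_second = 1000 / ptime_ms
    bits_per_second = (payload_bytes + ip_udp_rtp_bytes + link_bytes) * 8 * packets_per_second
    return bits_per_second / 1000

print(on_wire_kbps(64))  # G.711: 87.2
print(on_wire_kbps(8))   # G.729: 31.2
```

Shorter packetization intervals raise the overhead share, because the fixed headers are sent more often for the same amount of audio.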

Pros:

Excellent voice quality and interoperability (default for PSTN interconnect)
Very low latency (~0 added delay)
Tolerant of multiple encode/decode passes

Cons:

High bitrate (64k per call)
Narrowband only – cannot reproduce higher frequencies

Packet Loss Concealment: Because G.711 has no inter-frame dependency (each sample stands alone), lost packets can be concealed by simple techniques. The ITU G.711 standard includes Appendix I, which defines a PLC algorithm. With PLC enabled, G.711 can tolerate ~5% packet loss while maintaining MOS above 3.6, whereas without PLC even 2% loss can drop MOS below acceptable levels.
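As a rough illustration of a "simple technique", the sketch below repeats the last good frame with a decaying gain. This is not the Appendix I algorithm, which is more elaborate; it is a minimal repeat-and-fade scheme, and the function name, the `None` marker for lost frames, and the attenuation factor are assumptions made for the example.

```python
def conceal_stream(frames, frame_len=160, attenuation=0.5):
    """Fill lost frames (None) by repeating the last good frame with fading gain.

    frames      -- list of sample lists; None marks a lost frame
    frame_len   -- samples per frame (160 = 20 ms at 8 kHz), used for leading loss
    attenuation -- gain multiplier applied for each consecutive lost frame
    """
    out = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good = frame
            gain = 1.0              # reset the fade once real audio returns
            out.append(list(frame))
        elif last_good is not None:
            gain *= attenuation     # fade further with each consecutive loss
            out.append([s * gain for s in last_good])
        else:
            out.append([0] * frame_len)  # nothing to repeat yet: play silence
    return out
```

Because each G.711 sample decodes independently, the decoder needs no state repair after the gap; the first good frame after a loss plays back unchanged.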


G.722 – 64 kbps Wideband Audio (SB-ADPCM)

G.722 is a wideband speech codec from ITU (approved 1988) that delivers 7 kHz audio bandwidth at 48–64 kbps. It extends G.711 quality into the wideband range for clearer voice. G.722 uses Sub-Band ADPCM (SB-ADPCM): the audio (16 kHz sampling, 14-bit input) is split into two sub-bands (low 0–4 kHz and high 4–8 kHz) and each is encoded with ADPCM.

The standard allows 64, 56, or 48 kbps; 64 kbps (the most common) allocates 48 kbps to the lower band (which carries most speech energy) and 16 kbps to the upper band. Because of its wideband frequency response (~50–7000 Hz), G.722 yields noticeably richer and more natural voice than G.711.
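The 48 + 16 kbps split follows from the bits spent per sub-band sample. The sketch below assumes each sub-band is sampled at 8 kHz after the split, with 6 bits per low-band sample and 2 bits per high-band sample at 64 kbps. The 56 and 48 kbps cases assume that only the low-band word shrinks (to 5 or 4 bits), a detail not stated in this guide.

```python
SUBBAND_RATE_HZ = 8000   # each sub-band runs at half the 16 kHz input rate
HIGH_BAND_BITS = 2       # assumed fixed across all three modes

def g722_rates_kbps(low_band_bits: int):
    """Return (low-band, high-band, total) bitrate in kbps."""
    low = low_band_bits * SUBBAND_RATE_HZ // 1000
    high = HIGH_BAND_BITS * SUBBAND_RATE_HZ // 1000
    return low, high, low + high

for bits in (6, 5, 4):
    print(bits, g722_rates_kbps(bits))
```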

Parameter	Value
Bitrate	64/56/48 kbps
Bandwidth	Wideband (50–7000 Hz, 16 kHz sampling)
MOS	~4.5 (wideband scale)
Frame length	10 ms
Algorithmic delay	~10 ms

Pros:

Much better audio fidelity than G.711 at the same bitrate
Low latency and simple decoding
Often supported in VoIP phones for HD audio calls

Cons:

Still uses 64k (unless lower rate negotiated)
Not as efficient as newer codecs (Opus, AMR-WB)
Optimized for voice, not suitable for music

Packet Loss Concealment: ITU later provided G.722 Appendix IV that defines a PLC method using adaptive muting and pitch extrapolation. If a frame is lost, the decoder extrapolates the low-band signal from previous data and mutes the high-band, then cross-fades when new data arrives.

G.723.1 – 6.3/5.3 kbps Dual-Rate Codec (ACELP/MP-MLQ)

G.723.1 is a low-bitrate speech codec standardized by ITU-T in 1996, designed for early VoIP and videoconferencing (used in H.323 systems). It provides two bit rates:

6.3 kbps – using Multi-Pulse LPC with Maximum Likelihood Quantization (MP-MLQ), MOS ~3.9
5.3 kbps – using Algebraic CELP, MOS ~3.7

Both operate on 30 ms frames. Despite the name, G.723.1 is unrelated to G.723 – it's a separate standard targeting much lower bit rates.

Parameter	Value
Bitrate	6.3 or 5.3 kbps
Frame length	30 ms (240 samples)
Lookahead	7.5 ms
Total algorithmic delay	37.5 ms
Bandwidth	Narrowband (8 kHz)
MOS	3.7–3.9 (fair)

Packet Loss Concealment: G.723.1 has frame-erasure concealment built into its decoder. If a frame is lost, the decoder uses the last good frame's parameters to synthesize a replacement, repeating the last pitch period and gradually attenuating the signal. Given the long frame length (30 ms), losing even one frame can be noticeable.

G.723.1 is mostly of historical interest now, supplanted by better codecs (G.729, Opus) that achieve more quality at similar bit rates.

G.726 – ADPCM at 16–40 kbps (Waveform Codec)

G.726 is an ADPCM (Adaptive Differential PCM) codec that compresses 64 kbps PCM down to 40, 32, 24, or 16 kbps. It was standardized in 1990 as a replacement for older ADPCM codecs (G.721 at 32k and G.723 at 24k/40k).

Mode	Bitrate	Bits/sample	MOS	Notes
G.726-40	40 kbps	5-bit	~4.2	Almost indistinguishable from G.711
G.726-32	32 kbps	4-bit	~3.9–4.0	Most commonly used (DECT phones)
G.726-24	24 kbps	3-bit	~3.7	Somewhat muffled
G.726-16	16 kbps	2-bit	~3.5	Only for extreme bandwidth limits

G.726 is a waveform codec like G.711, operating on audio samples directly, but uses differential encoding to reduce bitrate. At 32 kbps, it achieves near-toll quality speech at half the bitrate of G.711, making it valuable in bandwidth-limited systems.

Resilience and PLC: G.726 is somewhat sensitive to packet loss because the internal predictor state can go off track when samples are missing. It does not have a defined PLC in the standard. The codec is relatively insensitive to bit errors on a continuous channel, but with packet loss, the gap in the waveform can cause an audible click or jump.
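Why a differential codec is sensitive to gaps can be seen with a toy example. The sketch below is plain first-order DPCM, not G.726: the real codec uses an adaptive quantizer and adaptive predictor whose state tends to reconverge over time, while this toy has no adaptation at all. It only illustrates how a missing difference value shifts everything decoded after it.

```python
from itertools import accumulate

def dpcm_encode(samples):
    """Encode each sample as the difference from the previous one."""
    prev = 0
    deltas = []
    for s in samples:
        deltas.append(s - prev)
        prev = s
    return deltas

def dpcm_decode(deltas):
    """Rebuild samples by summing the differences."""
    return list(accumulate(deltas))

samples = [0, 10, 20, 30, 40, 50]
deltas = dpcm_encode(samples)

received = deltas.copy()
received[2] = 0                    # one difference value lost, replaced by zero
decoded = dpcm_decode(received)    # every later sample is now offset by 10
print(decoded)
```

With a memoryless codec such as G.711, the same loss would damage only the missing sample.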

G.729a – 8 kbps CS-ACELP Codec (Widely Used in VoIP)

G.729 is a popular 8 kbps speech codec that uses Conjugate-Structure Algebraic Code-Excited Linear Prediction (CS-ACELP). It was standardized by ITU-T in 1995 and became a go-to codec for VoIP due to its low bandwidth requirement and good voice quality for narrowband speech.

The common variant used is G.729a (Annex A), which is a slightly lower-complexity version compatible with the original G.729 bitstream.

Parameter	Value
Bitrate	8 kbps
Sample rate	8 kHz (narrowband)
Frame length	10 ms (80 bits/frame)
Lookahead	5 ms
Total algorithmic delay	15 ms
MOS	~3.9–4.0
Payload	20 ms packet = 2 frames = 20 bytes
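The frame and payload figures in the table follow directly from the bitrate. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def rtp_payload(bitrate_bps: int, frame_ms: int, ptime_ms: int):
    """Return (frames per packet, bits per frame, payload bytes per packet)."""
    bits_per_frame = bitrate_bps * frame_ms // 1000
    frames_per_packet = ptime_ms // frame_ms
    payload_bytes = frames_per_packet * bits_per_frame // 8
    return frames_per_packet, bits_per_frame, payload_bytes

# G.729: 8 kbps, 10 ms frames, 20 ms packets
print(rtp_payload(8000, 10, 20))  # (2, 80, 20)
```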

Quality: One of the best codecs in the narrowband 8 kbps class, with MOS ~4.0 after a single encode. Quality degrades if audio is transcoded multiple times or if the codec is fed non-speech signals (music, DTMF tones), and it does not carry fax/modem signals well.

Packet Loss Concealment: G.729 has an effective PLC built into the decoder. Upon detecting loss, it uses the last good frame's parameters (pitch period, energy) to synthesize a replacement, typically by repeating the last pitch cycle and gradually reducing gain. According to developers, the PLC "works surprisingly well even under high packet loss rates."

Patents on G.729 expired in 2017, so it's now free to use. G.729 was the workhorse of VoIP for many years, now increasingly supplanted by Opus in new systems.

Opus – Adaptive Full-Band Codec (6–510 kbps, State-of-the-Art)

Opus is a modern, highly versatile audio codec standardized by the IETF in 2012 (RFC 6716). It is often considered the "Swiss Army knife" of audio codecs, as it can handle everything from narrowband speech at very low bitrates (~6 kbps) up to full-band stereo music at 510 kbps, all with low latency.

Opus is a hybrid codec that combines:

SILK (Skype's speech codec) for linear-prediction voice coding
CELT (Xiph.Org's codec) for MDCT-based music coding

Opus can dynamically switch or mix these modes to optimize quality. In VoIP and WebRTC, Opus has become the go-to codec because it delivers unmatched voice quality across a wide range of network conditions. Many consider Opus the best available VoIP codec as of 2025.

Parameter	Value
Bitrate	6–510 kbps (adjustable)
Common VoIP usage	16–64 kbps (mono speech)
Bandwidth	Full-band (48 kHz capable)
Frame size	2.5–60 ms (default 20 ms)
Algorithmic delay	~26.5 ms (typical)
MOS	4.5–4.8 at high bitrates

Adaptive Features: Opus can dynamically adjust bitrate, audio bandwidth, and complexity in response to network conditions. It can also detect the type of content (speech, music, or a mix) and choose its coding mode accordingly.

Packet Loss Concealment: Opus has built-in PLC and an optional in-band FEC (Forward Error Correction) mode for voice, in which a packet can carry a redundant, lower-bitrate copy of the previous frame. As a result, Opus maintains quality even on lossy networks and is noted for "superior packet loss handling." It can still sound acceptable at 5–10% packet loss, where older codecs would be breaking up.
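The in-band FEC idea can be sketched from the receiver's point of view. The code below is a simplified model, not the libopus API: the `Packet` structure, its field names, and the `"PLC"` placeholder are hypothetical, and frames are represented as strings purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Packet:
    seq: int
    frame: str                # primary encoding of frame `seq`
    fec_prev: Optional[str]   # redundant lower-quality copy of frame `seq - 1`

def reconstruct(received: Dict[int, Packet], first_seq: int, last_seq: int) -> List[str]:
    """Choose a source for every frame: primary data, FEC copy, or PLC."""
    out = []
    for seq in range(first_seq, last_seq + 1):
        nxt = received.get(seq + 1)
        if seq in received:
            out.append(received[seq].frame)      # packet arrived normally
        elif nxt is not None and nxt.fec_prev is not None:
            out.append(nxt.fec_prev)             # recover from the next packet
        else:
            out.append("PLC")                    # nothing usable: conceal
    return out

sent = [Packet(1, "F1", None), Packet(2, "F2", "f1"), Packet(3, "F3", "f2"),
        Packet(4, "F4", "f3"), Packet(5, "F5", "f4")]
received = {p.seq: p for p in sent if p.seq in (1, 3)}   # packets 2, 4, 5 lost
print(reconstruct(received, 1, 5))
```

In this model an isolated loss (frame 2) is recovered from the following packet, while a burst (frames 4 and 5) falls back to concealment. Using the redundant copy also means the receiver has to wait for the next packet before playing the missing frame.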

Use Cases: Virtually everywhere in WebRTC (browsers, Zoom/Teams audio, etc.), streaming, and even music production. It's royalty-free and open source.


AMR (Adaptive Multi-Rate) – 3GPP Narrowband Codec (4.75–12.2 kbps)

AMR is an adaptive speech codec used primarily in mobile networks (2G/3G GSM and UMTS). Standardized by 3GPP in 1998, AMR-NB operates on 20 ms frames and can switch between 8 different bitrates:

Mode	Bitrate	Quality Notes
7	12.2 kbps	Best quality, similar to GSM EFR
6	10.2 kbps	Very good
5	7.95 kbps	Good, common in decent conditions
4	7.4 kbps	Typical balance mode
3	6.7 kbps	Compressed but understandable
2	5.9 kbps	Noticeable artifacts
1	5.15 kbps	Lower quality
0	4.75 kbps	Emergency mode, rough quality

The codec can dynamically mode-switch based on channel quality (controlled by the network). At 12.2 kbps (mode 7), AMR-NB is roughly on par with G.729 or slightly better, yielding near toll quality for speech.

Adaptation: When the network detects many errors, it commands the sender to switch to a lower-rate mode, which leaves more of the channel capacity for error protection. When the channel is clear, it switches back up to a higher bitrate for better quality.
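The adaptation loop can be pictured as a simple step-up/step-down rule over the eight modes. The sketch below is illustrative only: real networks base the decision on radio channel measurements and signal it to the sender, and the error-rate thresholds and function name used here are hypothetical values chosen for the example.

```python
# AMR-NB bitrates in kbps, indexed by mode number (0 = lowest, 7 = highest)
AMR_NB_KBPS = [4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2]

def next_mode(current_mode: int,
              frame_error_rate: float,
              step_down_above: float = 0.05,   # hypothetical threshold
              step_up_below: float = 0.01) -> int:  # hypothetical threshold
    """Pick the next codec mode from the observed frame error rate."""
    if frame_error_rate > step_down_above:
        return max(current_mode - 1, 0)                     # poor channel
    if frame_error_rate < step_up_below:
        return min(current_mode + 1, len(AMR_NB_KBPS) - 1)  # clean channel
    return current_mode                                     # hold steady

mode = 7
for fer in (0.10, 0.08, 0.02, 0.0):
    mode = next_mode(mode, fer)
    print(mode, AMR_NB_KBPS[mode])
```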

Packet Loss/Error Concealment: The AMR codec includes Frame Erasure Concealment. If a frame is marked as bad, the decoder uses the last received good frame's parameters to conceal the loss. AMR is robust against isolated frame losses: a single lost 20 ms frame might be barely noticeable.

AMR-WB (G.722.2) – Adaptive Multi-Rate Wideband (HD Voice at 6.6–23.85 kbps)

AMR-WB (Adaptive Multi-Rate Wideband), also known as ITU G.722.2, is the wideband extension of AMR. It is the codec behind HD Voice in 3G/4G networks, providing 50–7000 Hz audio (16 kHz sampling) for much clearer calls.

Mode	Bitrate	MOS
8	23.85 kbps	~4.5
7	23.05 kbps	~4.4
6	19.85 kbps	–
5	18.25 kbps	~4.3
4	15.85 kbps	~4.2
3	14.25 kbps	~4.1
2	12.65 kbps	~4.0 (VoLTE baseline)
1	8.85 kbps	~3.9
0	6.6 kbps	~3.7

At top rate (23.85 kbps), AMR-WB delivers wireline-quality or better voice: voices sound natural and crisp. Even lower modes maintain surprisingly good clarity. VoLTE (voice over LTE) initially used AMR-WB as the mandatory codec for HD voice calls.

AMR-WB at 12.65 kbps often outperforms older wideband codecs at higher bitrates (e.g., G.722 at 64k) due to its efficiency and optimization for speech.

iLBC – Internet Low Bitrate Codec (13.3/15.2 kbps, Loss-Tolerant)

iLBC is an open-source narrowband voice codec designed specifically for VoIP with robustness to packet loss in mind. It was developed by Global IP Solutions and published as RFC 3951 in 2004.

Mode	Frame Length	Bitrate	Bits/frame
Mode 20	20 ms	15.2 kbps	304 bits
Mode 30	30 ms	13.33 kbps	400 bits

What sets iLBC apart is its frame independence – each frame is coded largely independently of others, so that losses do not propagate errors. This means a lost frame doesn't ruin the decoding of subsequent frames.

Quality: Under ideal conditions (no loss), iLBC's quality is on par with G.729 or slightly better (MOS ~4.0–4.14). Where iLBC shines is when there is packet loss: its audio degrades much less dramatically than other codecs.

Packet Loss Concealment: Essentially built in by design. Because each iLBC frame is encoded independently, the decoder can just play out a PLC-generated segment without having to reset internal state. Tests indicated iLBC could handle up to 15–20% loss and still yield understandable speech.

iLBC has been largely superseded by Opus, but remains an excellent fallback in high-loss scenarios.

Speex – Open-Source CELP Codec (2–44 kbps, Flexible)

Speex is an open-source speech codec project (by Xiph.org) released in the early 2000s. It was designed as a patent-free alternative to proprietary codecs. Speex is a CELP-based codec supporting multiple sampling rates:

Mode	Sample Rate	Bitrate Range
Narrowband	8 kHz	~2–24 kbps
Wideband	16 kHz	~4–36 kbps
Ultra-wideband	32 kHz	~8–44 kbps

Speex is configured by quality level (0 to 10). It supports VBR (variable bit rate), VAD (voice activity detection), and noise suppression. MOS range is 3.5–4.2 depending on bitrate.

While Speex provided good quality, it has been effectively succeeded by Opus (which the Speex authors also helped create). Nonetheless, Speex is still found in some legacy VoIP systems.

GSM Full Rate (GSM 06.10) – 13 kbps Early Cellular Codec

GSM 06.10 (GSM Full Rate) is the original speech codec for 2G GSM networks. It operates at 13 kbps and uses RPE-LTP (Regular Pulse Excitation – Long Term Prediction).

Parameter	Value
Bitrate	13 kbps
Frame length	20 ms
Sample rate	8 kHz
MOS	~3.5–3.7

GSM-FR audio quality is a bit lower than modern codecs; it sounds somewhat muffled compared to G.711. It was optimized for early GSM conditions. Modern networks have moved on to AMR, but GSM-FR remains a historical reference.

SILK – Skype's Wideband Codec (6–40 kbps, Adaptive)

SILK is an audio codec developed by Skype (introduced around 2009) for encoding speech at variable bitrates with an emphasis on wideband and super-wideband quality and low latency. It later became one half of the Opus codec.

Parameter	Value
Bitrate	~6–40 kbps
Sample rates	8, 12, 16, or 24 kHz
MOS	~4.0–4.5
Frame length	20 ms (typical)

SILK wideband (16 kHz) at moderate bitrates (~20–25 kbps) sounds very clear, often better than G.722. SILK can dynamically adjust bitrate in real time based on network conditions.

Since Opus contains SILK, standalone SILK use is less common now, but it's effectively still around inside Opus for low/medium bitrate voice.

iSAC – Internet Speech Audio Codec (Adaptive Wideband Codec)

iSAC (Internet Speech Audio Codec) is another wideband speech codec from Global IP Solutions (the company behind iLBC). It is a wideband (16 kHz) codec that can also operate up to 32 kHz (super-wideband).

Parameter	Value
Bitrate	~10–32 kbps (adaptive)
Sample rate	16 kHz (wideband), 32 kHz (super-wideband)
Frame length	30–60 ms (adjustable)
MOS	~4.1–4.3

iSAC is an adaptive bit rate codec: instead of fixed modes, it can adjust its bit rate dynamically between roughly 10 kbps and 32 kbps depending on network conditions. Google integrated iSAC into early WebRTC.

Since the arrival of Opus, iSAC has been largely superseded (Opus covers iSAC's territory and more), but it remains part of the wideband codec legacy.

MP4A-LATM (AAC-LD) – MPEG-4 Audio in LATM Transport (High-Fidelity Codec)

MP4A-LATM refers to an AAC (Advanced Audio Coding) stream carried in the Low-overhead MPEG-4 Audio Transport Multiplex (LATM) format, often used in RTP. In real-time communication this is typically the AAC-LD (Low Delay AAC) or AAC-ELD (Enhanced Low Delay) codec.

Parameter	Value
Bandwidth	Full-band (48 kHz typical)
Bitrate	Variable (64 kbps typical for mono)
Latency	~20–30 ms
Quality	Studio-quality at sufficient bitrates

AAC-LD provides full-band audio with algorithmic delay low enough for conversations, while delivering excellent fidelity. It handles music far better than speech-specific codecs.

Use Cases: High-end telepresence systems (Cisco/Polycom), FaceTime (uses AAC-ELD), and scenarios requiring better sound quality than traditional voice codecs.

VoIPmonitor Codec Support and Best Practices

VoIPmonitor supports decoding of all the above codecs. This allows telecom engineers and analysts to capture calls and listen to or measure quality (MOS, waveform analysis) regardless of the codec used.

Codec Comparison Summary

Codec	Bitrate (kbps)	Bandwidth	MOS	PLC Quality	Primary Use
G.711	64	Narrowband	4.1–4.3	Good (Appendix I)	PSTN interconnect, default
G.722	48–64	Wideband	~4.5	Good	HD Voice VoIP
G.723.1	5.3–6.3	Narrowband	3.7–3.9	Basic	Legacy systems
G.726	16–40	Narrowband	3.5–4.2	Basic	DECT, bandwidth-limited
G.729a	8	Narrowband	3.9–4.0	Excellent	VoIP trunking
Opus	6–510	Full-band	4.5–4.8	Excellent (FEC)	WebRTC, modern VoIP
AMR	4.75–12.2	Narrowband	3.0–3.7	Good	GSM/UMTS
AMR-WB	6.6–23.85	Wideband	3.7–4.5	Good	VoLTE, HD Voice mobile
iLBC	13.3–15.2	Narrowband	~4.0	Excellent	Lossy networks
Speex	2–44	NB/WB/UWB	3.5–4.2	Good	Legacy open-source
GSM FR	13	Narrowband	3.5–3.7	Basic	Legacy GSM
SILK	6–40	WB/SWB	4.0–4.5	Good	Skype, Opus layer
iSAC	10–32	WB/SWB	4.1–4.3	Good	Early WebRTC
AAC-LD	Variable	Full-band	High	Basic	High-fidelity conferencing

Best Practices

Narrowband vs Wideband: Wideband codecs (G.722, AMR-WB, Opus, SILK, AAC-LD) naturally have higher user satisfaction due to richer sound. Narrowband codecs max out around MOS 4.2 even in perfect conditions.

Bitrate and Network Conditions: Lower-bitrate codecs save bandwidth but may introduce more artifacts. Ensure the chosen codec matches your scenario (Opus or G.722 for quality priority; G.729 or AMR for bandwidth-limited links).

Packet Loss Concealment: PLC algorithms significantly affect call quality in adverse conditions. With PLC enabled, G.711 can tolerate nearly 5% packet loss before quality falls below MOS 3.6. Modern codecs like Opus combine very good PLC with FEC.

Monitoring MOS and QoE: VoIPmonitor uses the ITU E-model to estimate MOS for calls.
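For reference, the E-model (ITU-T G.107) produces a transmission rating factor R, which is then mapped to an estimated MOS. The sketch below shows only that standard R-to-MOS mapping. How VoIPmonitor derives R from measured loss, jitter, and delay is not described in this guide, so the function is an illustration and not its implementation.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-model R factor to an estimated MOS (G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

for r in (50, 70, 80, 93.2):
    print(r, round(r_to_mos(r), 2))
```

An R of 93.2, the E-model's default for an unimpaired narrowband connection, maps to a MOS of about 4.41.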

Recommended codec: If available, Opus should be the codec of choice for voice calls due to its clarity, robustness, and low latency.

See Also

Comprehensive Guide to VoIP Voice Quality: https://www.voipmonitor.org/doc/Comprehensive_Guide_to_VoIP_Voice_Quality
Understanding the SIP Protocol: https://www.voipmonitor.org/doc/Understanding_the_SIP_Protocol
Understanding the RTP Protocol: https://www.voipmonitor.org/doc/Understanding_the_RTP_Protocol