Audio Codecs - Comprehensive Guide

From VoIPmonitor.org

Revision as of 23:51, 11 December 2025

VoIPmonitor can decode a wide range of voice codecs commonly used in telephony and VoIP. This guide covers all the audio codecs that VoIPmonitor supports: G.711 (PCM A-law/μ-law), G.722, G.723.1, G.726, G.729a, Opus, AMR (Adaptive Multi-Rate), AMR-WB (Wideband), iLBC, Speex, GSM (Full Rate), Skype's SILK, iSAC, and MP4A-LATM (MPEG-4 Audio in LATM format).

Each codec is described in detail – including its bit rate, audio bandwidth, typical quality (MOS), and how it handles packet loss (PLC). VoIPmonitor supports decoding audio from all of these codecs in captured calls, ensuring you can analyze call quality across narrowband, wideband, and full-band audio.

Audio Bandwidth Terminology

Term Frequency Range Sample Rate Description
Narrowband 300–3400 Hz 8 kHz Traditional phone audio
Wideband (HD Voice) 50–7000 Hz 16 kHz Clearer, more natural sound
Super-wideband 50–14000 Hz 32 kHz Extended clarity
Full-band 20–20000 Hz 48 kHz CD-quality audio
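
The sample rates in the table follow from the Nyquist limit: a sampled signal can only represent frequencies up to half its sample rate. A minimal sketch that checks each row against that rule (the dictionary and function names are illustrative, not part of VoIPmonitor):

```python
# Upper audio frequency (Hz) and sample rate (Hz) for each band in the table above.
BANDS = {
    "narrowband": (3400, 8000),
    "wideband": (7000, 16000),
    "super-wideband": (14000, 32000),
    "full-band": (20000, 48000),
}


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2


def band_fits(name: str) -> bool:
    """True if the band's upper frequency is within the Nyquist limit."""
    upper_hz, rate_hz = BANDS[name]
    return upper_hz <= nyquist_limit_hz(rate_hz)


for band in BANDS:
    print(band, band_fits(band))
```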

Generally, wideband and full-band codecs provide better clarity and listener comfort than narrowband codecs. However, they often require higher bitrates. The codecs below span from narrowband to full-band, each balancing quality, bandwidth, and complexity differently.

G.711 (PCM A-law/μ-law) – 64 kbps Pulse Code Modulation

Comparison of A-law (blue) and μ-law (red) companding curves used in G.711, showing how these logarithmic quantization laws compress audio dynamic range. μ-law (red) has a slightly larger dynamic range than A-law (blue), but both achieve 8-bit PCM quality sufficient for telephone audio.

G.711 is the original PCM voice codec standard from 1972, providing toll-quality audio with no compression beyond companding. It samples analog voice at 8 kHz with 8-bit nonlinear quantization, resulting in a 64 kbps bitrate. Two companding laws are defined:

  • μ-law (Mu-law) – used in North America & Japan
  • A-law – used in Europe and elsewhere

These are logarithmic compression curves that reduce quantization noise for small signal amplitudes. In practical terms, G.711 delivers audio quality equivalent to classic landline telephony – narrowband 300–3400 Hz frequency range and very high fidelity for speech. It has a Mean Opinion Score around 4.1–4.3 (out of 5), essentially toll quality.
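
The shape of the two companding laws can be sketched with their continuous textbook formulas. This is only an illustration of the curves: real G.711 encoders use a piecewise-linear segment approximation that produces 8-bit codes, and the function names below are illustrative.

```python
import math

MU = 255.0  # mu-law parameter (North America, Japan)
A = 87.6    # A-law parameter (Europe and elsewhere)


def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] onto the mu-law curve in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def a_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] onto the A-law curve in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))          # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # logarithmic segment
    return math.copysign(y, x)


# A quiet sample (1% of full scale) is lifted well above 1% of the output range,
# which is why small amplitudes get finer quantization steps.
print(mu_law_compress(0.01), a_law_compress(0.01))
```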

Parameter Value
Bitrate 64 kbps
Bandwidth Narrowband (8 kHz sample rate, ~300–3400 Hz)
Network bandwidth ~87 kbps per call (with overhead)
MOS 4.1–4.3 (toll quality)
Compression None (PCM with companding only)
Algorithmic delay ~0 ms
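
The ~87 kbps network figure can be reproduced from the codec bitrate plus per-packet headers. A rough sketch, assuming 20 ms packetization, IPv4, and plain Ethernet framing without a VLAN tag (these assumptions and the function names are not from the table itself):

```python
RTP_UDP_IP_OVERHEAD = 40  # bytes: 12 RTP + 8 UDP + 20 IPv4
ETHERNET_OVERHEAD = 18    # bytes: 14 header + 4 FCS (assumed; preamble and gap ignored)


def payload_bytes(codec_bps: int, ptime_ms: int) -> int:
    """Codec payload carried in one RTP packet."""
    return codec_bps * ptime_ms // 8000


def wire_kbps(codec_bps: int, ptime_ms: int = 20) -> float:
    """Approximate one-direction bandwidth on the wire, in kbps."""
    packet = payload_bytes(codec_bps, ptime_ms) + RTP_UDP_IP_OVERHEAD + ETHERNET_OVERHEAD
    packets_per_second = 1000 / ptime_ms
    return packet * 8 * packets_per_second / 1000


print(wire_kbps(64000))  # G.711 at 20 ms packetization
```

The same function shows why low-bitrate codecs save less than their nominal bitrate suggests: the fixed 58 bytes of headers dominate once the payload shrinks.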

Pros:

  • Excellent voice quality and interoperability (default for PSTN interconnect)
  • Very low latency (~0 added delay)
  • Tolerant of multiple encode/decode passes

Cons:

  • High bitrate (64 kbps per call)
  • Narrowband only – cannot reproduce higher frequencies

Packet Loss Concealment: Because G.711 has no inter-frame dependency (each sample stands alone), lost packets can be concealed by simple techniques. The ITU G.711 standard includes Appendix I, which defines a PLC algorithm. With PLC enabled, G.711 can tolerate ~5% packet loss while maintaining MOS above 3.6, whereas without PLC even 2% loss can drop MOS below acceptable levels.


G.722 – 64 kbps Wideband Audio (SB-ADPCM)

G.722 is a wideband speech codec from ITU (approved 1988) that delivers 7 kHz audio bandwidth at 48–64 kbps. It extends G.711 quality into the wideband range for clearer voice. G.722 uses Sub-Band ADPCM (SB-ADPCM): the audio (16 kHz sampling, 14-bit input) is split into two sub-bands (low 0–4 kHz and high 4–8 kHz) and each is encoded with ADPCM.

The standard allows 64, 56, or 48 kbps; 64 kbps (the most common) allocates 48 kbps to the lower band (which carries most speech energy) and 16 kbps to the upper band. Because of its wideband frequency response (~50–7000 Hz), G.722 yields noticeably richer and more natural voice than G.711.
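
The split between the two sub-bands is simple arithmetic: each sub-band is sampled at 8 kHz, the upper band always gets 2 bits per sample, and the lower band gets 6, 5, or 4 bits depending on the mode. A small sketch (names are illustrative):

```python
SUBBAND_RATE_HZ = 8000   # each sub-band is decimated to 8 kHz
HIGH_BAND_BITS = 2       # bits per high-band sample in every mode
LOW_BAND_BITS = {64: 6, 56: 5, 48: 4}  # bits per low-band sample, keyed by mode (kbps)


def g722_split_kbps(mode_kbps: int) -> tuple:
    """Return (low-band kbps, high-band kbps) for a G.722 mode."""
    low = LOW_BAND_BITS[mode_kbps] * SUBBAND_RATE_HZ // 1000
    high = HIGH_BAND_BITS * SUBBAND_RATE_HZ // 1000
    return low, high


for mode in (64, 56, 48):
    print(mode, g722_split_kbps(mode))
```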

Parameter Value
Bitrate 64/56/48 kbps
Bandwidth Wideband (50–7000 Hz, 16 kHz sampling)
MOS ~4.5 (wideband scale)
Frame length 10 ms
Algorithmic delay ~10 ms

Pros:

  • Much better audio fidelity than G.711 at the same bitrate
  • Low latency and simple decoding
  • Often supported in VoIP phones for HD audio calls

Cons:

  • Still uses 64 kbps (unless a lower rate is negotiated)
  • Not as efficient as newer codecs (Opus, AMR-WB)
  • Optimized for voice, not suitable for music

Packet Loss Concealment: ITU later provided G.722 Appendix IV that defines a PLC method using adaptive muting and pitch extrapolation. If a frame is lost, the decoder extrapolates the low-band signal from previous data and mutes the high-band, then cross-fades when new data arrives.

G.723.1 – 6.3/5.3 kbps Dual-Rate Codec (ACELP/MP-MLQ)

G.723.1 is a low-bitrate speech codec standardized by ITU-T in 1996, designed for early VoIP and videoconferencing (used in H.323 systems). It provides two bit rates:

  • 6.3 kbps – using Multi-Pulse LPC with Maximum Likelihood Quantization (MP-MLQ), MOS ~3.9
  • 5.3 kbps – using Algebraic CELP, MOS ~3.7

Both operate on 30 ms frames. Despite the name, G.723.1 is unrelated to G.723 – it's a separate standard targeting much lower bit rates.

Parameter Value
Bitrate 6.3 or 5.3 kbps
Frame length 30 ms (240 samples)
Lookahead 7.5 ms
Total algorithmic delay 37.5 ms
Bandwidth Narrowband (8 kHz)
MOS 3.7–3.9 (fair)
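
The total algorithmic delay in the table is the frame length plus the encoder lookahead. A minimal sketch of that sum, with a second helper for the encoder-side buffering when several frames share one packet (a simplification that ignores processing, network, and jitter-buffer delay; function names are illustrative):

```python
def algorithmic_delay_ms(frame_ms: float, lookahead_ms: float) -> float:
    """Delay inherent to the codec: one full frame plus lookahead."""
    return frame_ms + lookahead_ms


def packetization_delay_ms(frame_ms: float, lookahead_ms: float,
                           frames_per_packet: int = 1) -> float:
    """Time the encoder must buffer before a packet can be sent."""
    return frames_per_packet * frame_ms + lookahead_ms


print(algorithmic_delay_ms(30, 7.5))        # G.723.1
print(packetization_delay_ms(30, 7.5, 2))   # G.723.1 with two frames per packet
```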

Packet Loss Concealment: G.723.1 has a frame erasure concealment technique built into its decoder. If a frame is lost, the decoder uses the last good frame's parameters to synthesize a replacement, repeating the last pitch period and gradually attenuating the signal. Given the long frame length (30 ms), losing even one frame can be noticeable.

G.723.1 is mostly of historical interest now, supplanted by better codecs (G.729, Opus) that achieve more quality at similar bit rates.

G.726 – ADPCM at 16–40 kbps (Waveform Codec)

G.726 is an ADPCM (Adaptive Differential PCM) codec that compresses 64 kbps PCM down to 40, 32, 24, or 16 kbps. It was standardized in 1990 as a replacement for older ADPCM codecs (G.721 at 32 kbps and G.723 at 24/40 kbps).

Mode Bitrate Bits/sample MOS Notes
G.726-40 40 kbps 5-bit ~4.2 Almost indistinguishable from G.711
G.726-32 32 kbps 4-bit ~3.9–4.0 Most commonly used (DECT phones)
G.726-24 24 kbps 3-bit ~3.7 Somewhat muffled
G.726-16 16 kbps 2-bit ~3.5 Only for extreme bandwidth limits

G.726 is a waveform codec like G.711, operating on audio samples directly, but uses differential encoding to reduce bitrate. At 32 kbps, it achieves near-toll quality speech at half the bitrate of G.711, making it valuable in bandwidth-limited systems.

Resilience and PLC: G.726 is somewhat sensitive to packet loss because the internal predictor state can go off track when samples are missing. It does not have a defined PLC in the standard. The codec is relatively insensitive to bit errors on a continuous channel, but with packet loss, the gap in the waveform can cause an audible click or jump.

G.729a – 8 kbps CS-ACELP Codec (Widely Used in VoIP)

G.729 is a popular 8 kbps speech codec that uses Conjugate-Structure Algebraic Code-Excited Linear Prediction (CS-ACELP). It was standardized by ITU-T in 1995 and became a go-to codec for VoIP due to its low bandwidth requirement and good voice quality for narrowband speech.

The common variant used is G.729a (Annex A), which is a slightly lower-complexity version compatible with the original G.729 bitstream.

Parameter Value
Bitrate 8 kbps
Sample rate 8 kHz (narrowband)
Frame length 10 ms (80 bits/frame)
Lookahead 5 ms
Total algorithmic delay 15 ms
MOS ~3.9–4.0
Payload 20 ms packet = 2 frames = 20 bytes

Quality: One of the best in the narrowband 8 kbps class, with MOS ~4.0 after a single encode/decode pass. G.729 can degrade if audio is transcoded multiple times or if it is fed non-speech signals (music, DTMF tones). It doesn't carry fax/modem signals well.

Packet Loss Concealment: G.729 has an effective PLC built into the decoder. Upon detecting loss, it uses the last good frame's parameters (pitch period, energy) to synthesize a replacement, typically by repeating the last pitch cycle and gradually reducing gain. According to developers, the PLC "works surprisingly well even under high packet loss rates."
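
The "repeat the last pitch cycle and fade" idea can be illustrated on raw samples. This is a deliberately simplified waveform-domain sketch, not the G.729 algorithm (which works on decoder parameters rather than output samples); the function name, the linear fade, and the `gain_step` value are assumptions for illustration.

```python
def conceal_frame(history, pitch_period, frame_len, start_gain=1.0, gain_step=0.002):
    """Synthesize a replacement frame by repeating the last pitch cycle.

    history      -- most recent good output samples (list of floats)
    pitch_period -- estimated pitch period in samples
    Returns (samples, final_gain) so consecutive losses keep fading.
    """
    if pitch_period <= 0 or pitch_period > len(history):
        raise ValueError("pitch_period must be within the available history")
    cycle = history[-pitch_period:]          # last full pitch cycle
    out = []
    gain = start_gain
    for n in range(frame_len):
        out.append(cycle[n % pitch_period] * gain)
        gain = max(0.0, gain - gain_step)    # gradual attenuation
    return out, gain


# Toy signal with a 4-sample period; conceal one 10 ms frame (80 samples at 8 kHz).
history = [0.0, 1.0, 0.0, -1.0] * 20
frame, gain = conceal_frame(history, pitch_period=4, frame_len=80)
print(len(frame), round(gain, 3))
```

Passing the returned gain as `start_gain` for the next lost frame continues the fade, so a long burst of losses decays toward silence instead of looping the same cycle at full volume.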

Patents on G.729 expired in 2017, so it's now free to use. G.729 was the workhorse of VoIP for many years, now increasingly supplanted by Opus in new systems.

Opus – Adaptive Full-Band Codec (6–510 kbps, State-of-the-Art)

Quality vs. bitrate comparison of various codecs. Opus (green) consistently achieves higher quality (MOS) at a given bitrate compared to older codecs like Speex, AMR, G.722, MP3, AAC, etc. Notably, Opus in wideband mode outperforms AMR-WB, and at low bitrates Opus far exceeds codecs like AMR-NB or iLBC.

Opus is a modern, highly versatile audio codec standardized by the IETF in 2012 (RFC 6716). It is often considered the "Swiss Army knife" of audio codecs, as it can handle everything from narrowband speech at very low bitrates (~6 kbps) up to full-band stereo music at 510 kbps, all with low latency.

Opus is a hybrid codec that combines:

  • SILK (Skype's speech codec) for linear-prediction voice coding
  • CELT (Xiph.Org's codec) for MDCT-based music coding

Opus can dynamically switch or mix these modes to optimize quality. In VoIP and WebRTC, Opus has become the go-to codec because it delivers unmatched voice quality across a wide range of network conditions. Many consider Opus the best available VoIP codec as of 2025.

Parameter Value
Bitrate 6–510 kbps (adjustable)
Common VoIP usage 16–64 kbps (mono speech)
Bandwidth Full-band (48 kHz capable)
Frame size 2.5–60 ms (default 20 ms)
Algorithmic delay ~26.5 ms (typical)
MOS 4.5–4.8 at high bitrates

Adaptive Features: Opus can dynamically adjust bitrate, audio bandwidth, and complexity in response to network conditions. The encoder can also detect the type of content (speech, music, or a mix) and choose its coding mode accordingly.

Packet Loss Concealment: Opus has advanced PLC built into the decoder, plus an optional in-band FEC (Forward Error Correction) mode for voice: each packet can carry a redundant, lower-quality copy of the previous frame, so a single lost packet can be rebuilt from the one that follows. As a result, Opus maintains quality even on lossy networks – it's noted for "superior packet loss handling." Opus can still sound acceptable at 5–10% packet loss, where older codecs would be breaking up.
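
The receiver-side decision between normal decoding, FEC recovery, and PLC can be sketched as follows. This models only the decision logic with made-up names; it does not call a real Opus decoder. In libopus the equivalent choice is made through the `decode_fec` argument of `opus_decode`, and in-band FEC is signalled in SDP with `useinbandfec=1`.

```python
from typing import Dict, List


def playout_plan(received: Dict[int, bool], total_frames: int) -> List[str]:
    """Decide how each frame would be produced.

    received maps a sequence number to True if that packet carries a
    redundant copy of the previous frame (in-band FEC), False otherwise.
    Missing keys are lost packets.
    """
    plan = []
    for seq in range(total_frames):
        if seq in received:
            plan.append("decode")   # packet arrived: decode normally
        elif received.get(seq + 1, False):
            plan.append("fec")      # lost, but the next packet holds a redundant copy
        else:
            plan.append("plc")      # nothing to recover from: conceal
    return plan


# Packets 1, 3 and 4 are lost; every received packet carries FEC data.
print(playout_plan({0: True, 2: True, 5: True}, 6))
```

FEC recovery requires the following packet to already be in the jitter buffer, so it trades a little extra delay for fewer concealed frames. Two consecutive losses still leave one frame to PLC, as frame 3 shows in the example.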

Use Cases: Virtually everywhere in WebRTC (browsers, Zoom/Teams audio, etc.), streaming, and even music production. It's royalty-free and open source.


AMR (Adaptive Multi-Rate) – 3GPP Narrowband Codec (4.75–12.2 kbps)

AMR is an adaptive speech codec used primarily in mobile networks (2G/3G GSM and UMTS). Standardized by 3GPP in 1998, AMR-NB operates on 20 ms frames and can switch between 8 different bitrates:

Mode Bitrate Quality Notes
7 12.2 kbps Best quality, similar to GSM EFR
6 10.2 kbps Very good
5 7.95 kbps Good, common in decent conditions
4 7.4 kbps Typical balance mode
3 6.7 kbps Compressed but understandable
2 5.9 kbps Noticeable artifacts
1 5.15 kbps Lower quality
0 4.75 kbps Emergency mode, rough quality

The codec can dynamically mode-switch based on channel quality (controlled by the network). At 12.2 kbps (mode 7), AMR-NB is roughly on par with G.729 or slightly better, yielding near toll quality for speech.

Adaptation: When the network detects a high error rate, it commands the sender to use a lower-rate mode, which leaves more of the channel's capacity for error protection. When the channel is clear, it switches back up to a higher bitrate for better quality.
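
A toy version of this mode adaptation is shown below. The bitrate list matches the AMR-NB mode table, but the frame-error thresholds and the one-step-at-a-time policy are hypothetical; real networks use their own link-quality measurements and signalling.

```python
# AMR-NB bitrates in kbps, indexed by mode number (0..7).
AMR_NB_MODES_KBPS = [4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2]


def next_mode(current: int, frame_error_rate: float,
              down_threshold: float = 0.05, up_threshold: float = 0.01) -> int:
    """Pick the next codec mode from the observed frame error rate.

    Thresholds are illustrative assumptions, not values from any standard.
    """
    top = len(AMR_NB_MODES_KBPS) - 1
    if frame_error_rate > down_threshold:
        return max(0, current - 1)      # bad channel: step down
    if frame_error_rate < up_threshold:
        return min(top, current + 1)    # clean channel: step up
    return current                      # in between: hold


mode = 7
for fer in (0.10, 0.08, 0.02, 0.0):
    mode = next_mode(mode, fer)
    print(fer, mode, AMR_NB_MODES_KBPS[mode])
```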

Packet Loss/Error Concealment: The AMR codec includes Frame Erasure Concealment. If a frame is marked as bad, the decoder uses the last received good frame's parameters to conceal the loss. AMR is robust against isolated frame losses – a single lost 20 ms frame might be barely noticeable.

AMR-WB (G.722.2) – Adaptive Multi-Rate Wideband (HD Voice at 6.6–23.85 kbps)

AMR-WB (Adaptive Multi-Rate Wideband), also known as ITU G.722.2, is the wideband extension of AMR. It is the codec behind HD Voice in 3G/4G networks, providing 50–7000 Hz audio (16 kHz sampling) for much clearer calls.

Mode Bitrate MOS
8 23.85 kbps ~4.5
7 23.05 kbps ~4.4
6 19.85 kbps (no MOS estimate listed)
5 18.25 kbps ~4.3
4 15.85 kbps ~4.2
3 14.25 kbps ~4.1
2 12.65 kbps ~4.0 (VoLTE baseline)
1 8.85 kbps ~3.9
0 6.6 kbps ~3.7

At top rate (23.85 kbps), AMR-WB delivers wireline-quality or better voice: voices sound natural and crisp. Even lower modes maintain surprisingly good clarity. VoLTE (voice over LTE) initially used AMR-WB as the mandatory codec for HD voice calls.

AMR-WB at 12.65 kbps often outperforms older wideband codecs at higher bitrates (e.g., G.722 at 64 kbps) due to its efficiency and optimization for speech.

iLBC – Internet Low Bitrate Codec (13.3/15.2 kbps, Loss-Tolerant)

iLBC is an open-source narrowband voice codec designed specifically for VoIP with robustness to packet loss in mind. It was developed by Global IP Solutions and published as RFC 3951 in 2004.

Mode Frame Length Bitrate Bits/frame
Mode 20 20 ms 15.2 kbps 304 bits
Mode 30 30 ms 13.33 kbps 400 bits
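
The bitrates in the table follow directly from the bits per frame and the frame length. A quick check (function names are illustrative):

```python
def bitrate_kbps(bits_per_frame: int, frame_ms: int) -> float:
    """Bits per millisecond equals kilobits per second."""
    return bits_per_frame / frame_ms


def frame_bytes(bits_per_frame: int) -> int:
    """Payload size of one frame in bytes."""
    return bits_per_frame // 8


for name, bits, ms in (("iLBC mode 20", 304, 20), ("iLBC mode 30", 400, 30)):
    print(name, round(bitrate_kbps(bits, ms), 2), "kbps,", frame_bytes(bits), "bytes/frame")
```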

What sets iLBC apart is its frame independence – each frame is coded largely independently of others, so that losses do not propagate errors. This means a lost frame doesn't ruin the decoding of subsequent frames.

Quality: Under ideal conditions (no loss), iLBC's quality is on par with G.729 or slightly better (MOS ~4.0–4.14). Where iLBC shines is when there is packet loss: its audio degrades much less dramatically than other codecs.

Packet Loss Concealment: Essentially built-in by design. Because each iLBC frame is encoded independently, the decoder can just play out a PLC-generated segment without having to reset internal state. Tests indicated iLBC could handle up to 15-20% loss and still yield understandable speech.

iLBC has been largely superseded by Opus, but remains an excellent fallback in high-loss scenarios.

Speex – Open-Source CELP Codec (2–44 kbps, Flexible)

Speex is an open-source speech codec project (by Xiph.org) released in the early 2000s. It was designed as a patent-free alternative to proprietary codecs. Speex is a CELP-based codec supporting multiple sampling rates:

Mode Sample Rate Bitrate Range
Narrowband 8 kHz ~2–24 kbps
Wideband 16 kHz ~4–36 kbps
Ultra-wideband 32 kHz ~8–44 kbps

Speex is configured by quality level (0 to 10). It supports VBR (variable bit rate), VAD (voice activity detection), and noise suppression. MOS range is 3.5–4.2 depending on bitrate.

While Speex provided good quality, it has been effectively succeeded by Opus (which the Speex authors also helped create). Nonetheless, Speex is still found in some legacy VoIP systems.

GSM Full Rate (GSM 06.10) – 13 kbps Early Cellular Codec

GSM 06.10 (GSM Full Rate) is the original speech codec for 2G GSM networks. It operates at 13 kbps and uses RPE-LTP (Regular Pulse Excitation – Long Term Prediction).

Parameter Value
Bitrate 13 kbps
Frame length 20 ms
Sample rate 8 kHz
MOS ~3.5–3.7

GSM-FR audio quality is a bit lower than modern codecs; it sounds somewhat muffled compared to G.711. It was optimized for early GSM conditions. Modern networks have moved on to AMR, but GSM-FR remains a historical reference.

SILK – Skype's Wideband Codec (6–40 kbps, Adaptive)

SILK is an audio codec developed by Skype (introduced around 2009) for encoding speech at variable bitrates with an emphasis on wideband and super-wideband quality and low latency. It later became one half of the Opus codec.

Parameter Value
Bitrate ~6–40 kbps
Sample rates 8, 12, 16, or 24 kHz
MOS ~4.0–4.5
Frame length 20 ms (typical)

SILK wideband (16 kHz) at moderate bitrates (~20-25 kbps) sounds very clear, often better than G.722. SILK can dynamically adjust bitrate in real-time based on network conditions.

Since Opus contains SILK, standalone SILK use is less common now, but it's effectively still around inside Opus for low/medium bitrate voice.

iSAC – Internet Speech Audio Codec (Adaptive Wideband Codec)

iSAC (Internet Speech Audio Codec) is another wideband speech codec from Global IP Solutions (the company behind iLBC). It is a wideband (16 kHz) codec that can also operate up to 32 kHz (super-wideband).

Parameter Value
Bitrate ~10–32 kbps (adaptive)
Sample rate 16 kHz (wideband), 32 kHz (super-wideband)
Frame length 30–60 ms (adjustable)
MOS ~4.1–4.3

iSAC is an adaptive bit rate codec: instead of fixed modes, it can adjust its bit rate dynamically between roughly 10 kbps and 32 kbps depending on network conditions. Google integrated iSAC into early WebRTC.

Since the arrival of Opus, iSAC has been largely superseded (Opus fullband mode covers iSAC's territory and more), but it remains part of the wideband codec legacy.

MP4A-LATM (AAC-LD) – MPEG-4 Audio in LATM Transport (High-Fidelity Codec)

MP4A-LATM refers to an AAC (Advanced Audio Coding) stream carried in the Low-overhead MPEG-4 Audio Transport Multiplex (LATM) format, often used in RTP. In real-time communication this is typically AAC-LD (Low Delay AAC) or AAC-ELD (Enhanced Low Delay AAC).

Parameter Value
Bandwidth Full-band (48 kHz typical)
Bitrate Variable (64 kbps typical for mono)
Latency ~20–30 ms
Quality Studio-quality at sufficient bitrates

AAC-LD provides full-band audio with algorithmic delay low enough for conversations, while delivering excellent fidelity. It handles music far better than speech-specific codecs.

Use Cases: High-end telepresence systems (Cisco/Polycom), FaceTime (uses AAC-ELD), and scenarios requiring better sound quality than traditional voice codecs.

VoIPmonitor Codec Support and Best Practices

VoIPmonitor supports decoding of all the above codecs. This allows telecom engineers and analysts to capture calls and listen to or measure quality (MOS, waveform analysis) regardless of the codec used.

Codec Comparison Summary

Codec Bitrate (kbps) Bandwidth MOS PLC Quality Primary Use
G.711 64 Narrowband 4.1–4.3 Good (Appendix I) PSTN interconnect, default
G.722 48–64 Wideband ~4.5 Good HD Voice VoIP
G.723.1 5.3–6.3 Narrowband 3.7–3.9 Basic Legacy systems
G.726 16–40 Narrowband 3.5–4.2 Basic DECT, bandwidth-limited
G.729a 8 Narrowband 3.9–4.0 Excellent VoIP trunking
Opus 6–510 Full-band 4.5–4.8 Excellent (FEC) WebRTC, modern VoIP
AMR 4.75–12.2 Narrowband 3.0–3.7 Good GSM/UMTS
AMR-WB 6.6–23.85 Wideband 3.7–4.5 Good VoLTE, HD Voice mobile
iLBC 13.3–15.2 Narrowband ~4.0 Excellent Lossy networks
Speex 2–44 NB/WB/UWB 3.5–4.2 Good Legacy open-source
GSM FR 13 Narrowband 3.5–3.7 Basic Legacy GSM
SILK 6–40 WB/SWB 4.0–4.5 Good Skype, Opus layer
iSAC 10–32 WB/SWB 4.1–4.3 Good Early WebRTC
AAC-LD Variable Full-band High Basic High-fidelity conferencing

Best Practices

  • Narrowband vs Wideband: Wideband codecs (G.722, AMR-WB, Opus, SILK, AAC-LD) naturally have higher user satisfaction due to richer sound. Narrowband codecs max out around MOS 4.2 even in perfect conditions.
  • Bitrate and Network Conditions: Lower-bitrate codecs save bandwidth but may introduce more artifacts. Ensure the chosen codec matches your scenario (Opus or G.722 for quality priority; G.729 or AMR for bandwidth-limited links).
  • Packet Loss Concealment: PLC algorithms significantly affect call quality in adverse conditions. Enabling PLC on G.711 can allow tolerating nearly 5% packet loss before quality falls below MOS 3.6. Modern codecs like Opus have very good PLC and FEC.
  • Monitoring MOS and QoE: VoIPmonitor uses the ITU E-model to estimate MOS for calls.
  • Recommended codec: If available, Opus should be the codec of choice for voice calls due to its clarity, robustness, and low latency.
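
To make the E-model and PLC points above concrete, here is a reduced sketch of the ITU-T G.107 calculation: packet loss raises the effective equipment impairment, which lowers the R-factor, which maps to MOS. It is not VoIPmonitor's implementation. The default R of 93.2 and the R-to-MOS polynomial are from G.107; the Ie and Bpl values are those commonly quoted from G.113 Appendix I and should be verified against the recommendation before use. Delay impairment is left as a plain input and burst loss is ignored.

```python
def effective_ie(ie: float, bpl: float, loss_pct: float) -> float:
    """Effective equipment impairment under random packet loss (percent)."""
    return ie + (95.0 - ie) * loss_pct / (loss_pct + bpl)


def r_to_mos(r: float) -> float:
    """Map an R-factor to an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def estimate_mos(ie: float, bpl: float, loss_pct: float,
                 delay_impairment: float = 0.0, r0: float = 93.2) -> float:
    """Estimate MOS from codec impairment, loss robustness, and loss rate."""
    r = r0 - delay_impairment - effective_ie(ie, bpl, loss_pct)
    return r_to_mos(r)


# Assumed codec parameters (Ie, Bpl); check against ITU-T G.113 Appendix I.
G711_NO_PLC = (0.0, 4.3)
G711_WITH_PLC = (0.0, 25.1)

print(round(estimate_mos(*G711_WITH_PLC, loss_pct=5.0), 2))
print(round(estimate_mos(*G711_NO_PLC, loss_pct=2.0), 2))
```

With these assumed parameters, G.711 with PLC stays above MOS 3.6 at 5% loss, while G.711 without PLC falls below it at 2% loss.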
