Audio Codecs - Comprehensive Guide
Latest revision as of 23:53, 11 December 2025
VoIPmonitor is capable of decoding a wide range of voice codecs commonly used in telephony and VoIP. This guide covers all the audio codecs that VoIPmonitor supports: G.711 (PCM A-law/μ-law), G.722, G.723.1, G.726, G.729a, Opus, AMR (Adaptive Multi-Rate), AMR-WB (Wideband), iLBC, Speex, GSM (Full Rate), Skype's Silk, iSAC, and MP4A-LATM (MPEG-4 Audio in LATM format).
Each codec is described in detail – including its bit rate, audio bandwidth, typical quality (MOS), and how it handles packet loss (PLC). VoIPmonitor supports decoding audio from all of these codecs in captured calls, ensuring you can analyze call quality across narrowband, wideband, and full-band audio.
Audio Bandwidth Terminology
| Term | Frequency Range | Sample Rate | Description |
|---|---|---|---|
| Narrowband | 300–3400 Hz | 8 kHz | Traditional phone audio |
| Wideband (HD Voice) | 50–7000 Hz | 16 kHz | Clearer, more natural sound |
| Super-wideband | 50–14000 Hz | 32 kHz | Extended clarity |
| Full-band | 20–20000 Hz | 48 kHz | CD-quality audio |
Generally, wideband and full-band codecs provide better clarity and listener comfort than narrowband codecs. However, they often require higher bitrates. The codecs below span from narrowband to full-band, each balancing quality, bandwidth, and complexity differently.
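The terminology table can be turned into a small lookup, which is handy when labelling captured streams by sample rate. This is a minimal sketch: the function names are hypothetical, and the Nyquist limit (half the sample rate) is added here only to show why each sample rate caps the usable frequency range.

```python
# Sample rates and labels taken from the terminology table above.
BANDWIDTH_BY_SAMPLE_RATE = {
    8000: "Narrowband",
    16000: "Wideband (HD Voice)",
    32000: "Super-wideband",
    48000: "Full-band",
}


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def classify(sample_rate_hz):
    """Return the bandwidth term for a sample rate, or 'Unknown' if unlisted."""
    return BANDWIDTH_BY_SAMPLE_RATE.get(sample_rate_hz, "Unknown")


for rate in sorted(BANDWIDTH_BY_SAMPLE_RATE):
    print(f"{rate} Hz -> {classify(rate)}, Nyquist limit {nyquist_limit_hz(rate):.0f} Hz")
```

The listed frequency ranges sit below the Nyquist limit in each case (for example 3400 Hz against 4000 Hz for narrowband), which leaves room for the anti-aliasing filter.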
G.711 (PCM A-law/μ-law) – 64 kbps Pulse Code Modulation

G.711 is the original PCM voice codec standard from 1972, providing uncompressed toll-quality audio. It samples analog voice at 8 kHz with 8-bit nonlinear quantization, resulting in a 64 kbps bitrate. Two companding laws are defined:
- μ-law (Mu-law) – used in North America & Japan
- A-law – used in Europe and elsewhere
These are logarithmic compression curves that reduce quantization noise for small signal amplitudes. In practical terms, G.711 delivers audio quality equivalent to classic landline telephony – narrowband 300–3400 Hz frequency range and very high fidelity for speech. It has a Mean Opinion Score around 4.1–4.3 (out of 5), essentially toll quality.
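The effect of a logarithmic companding curve can be illustrated with the continuous μ-law formula. This is a sketch only: G.711 itself uses a segmented 8-bit approximation of this curve, and the function names `mu_compress` and `mu_expand` are hypothetical.

```python
import math

MU = 255  # mu parameter of the continuous mu-law curve


def mu_compress(x):
    """Map a linear sample in [-1, 1] onto the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress: map a companded value back to linear."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


# A quiet sample at 1% of full scale is lifted to well over 20% of the
# companded range, so it receives far more quantization levels.
quiet = 0.01
print(mu_compress(quiet))
print(mu_expand(mu_compress(quiet)))
```

Because small amplitudes are stretched before quantization and large ones are squeezed, the quantization noise stays roughly proportional to the signal level.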
| Parameter | Value |
|---|---|
| Bitrate | 64 kbps |
| Bandwidth | Narrowband (8 kHz sample rate, ~300–3400 Hz) |
| Network bandwidth | ~87 kbps per call (with overhead) |
| MOS | 4.1–4.3 (toll quality) |
| Compression | None (PCM with companding only) |
| Algorithmic delay | ~0 ms |
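The ~87 kbps figure in the table can be reproduced with simple arithmetic. The sketch below assumes 20 ms packetization, IPv4, plain RTP over UDP, and Ethernet framing without a VLAN tag; none of these assumptions are stated in the table, and other transports will give different totals.

```python
# Per-packet overhead in bytes (assumed: IPv4, no VLAN tag, no SRTP).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_OVERHEAD = 18  # 14-byte header + 4-byte frame check sequence


def network_kbps(codec_bps, ptime_ms=20):
    """Estimate on-the-wire bandwidth for one direction of a call."""
    payload_bytes = codec_bps * ptime_ms / 1000 / 8
    packet_bytes = (payload_bytes + RTP_HEADER + UDP_HEADER
                    + IPV4_HEADER + ETHERNET_OVERHEAD)
    packets_per_second = 1000 / ptime_ms
    return packet_bytes * 8 * packets_per_second / 1000


print(round(network_kbps(64000), 1))  # G.711 at 20 ms
print(round(network_kbps(8000), 1))   # an 8 kbps codec at 20 ms
```

With these assumptions G.711 comes to about 87.2 kbps, while an 8 kbps codec still needs about 31.2 kbps, because the fixed 58 bytes of headers dominate small payloads.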
Pros:
- Excellent voice quality and interoperability (default for PSTN interconnect)
- Very low latency (~0 added delay)
- Tolerant of multiple encode/decode passes
Cons:
- High bitrate (64 kbps per call)
- Narrowband only – cannot reproduce higher frequencies
Packet Loss Concealment: Because G.711 has no inter-frame dependency (each sample stands alone), lost packets can be concealed by simple techniques. The ITU G.711 standard includes Appendix I, which defines a PLC algorithm. With PLC enabled, G.711 can tolerate ~5% packet loss while maintaining MOS above 3.6, whereas without PLC even 2% loss can drop MOS below acceptable levels.
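As an illustration of the kind of simple concealment that independent samples make possible, the sketch below replaces each lost frame with a progressively attenuated copy of the last good frame. It is a deliberately simplified stand-in and not the Appendix I algorithm; the function name and the decay factor are hypothetical.

```python
def conceal(last_good, lost_frames, decay=0.5):
    """Replace each lost frame with an attenuated copy of the last good frame.

    last_good   -- list of linear PCM samples from the last received frame
    lost_frames -- number of consecutive frames to synthesize
    decay       -- gain multiplier applied once per lost frame
    """
    concealed = []
    gain = 1.0
    for _ in range(lost_frames):
        gain *= decay  # fade further with every additional lost frame
        concealed.append([int(sample * gain) for sample in last_good])
    return concealed


frames = conceal([100, -100, 60, -60], lost_frames=2)
print(frames)
```

Fading toward silence avoids the buzzing that results from repeating the same frame at full level during a long burst of loss.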
G.722 – 64 kbps Wideband Audio (SB-ADPCM)
G.722 is a wideband speech codec from ITU (approved 1988) that delivers 7 kHz audio bandwidth at 48–64 kbps. It extends G.711 quality into the wideband range for clearer voice. G.722 uses Sub-Band ADPCM (SB-ADPCM): the audio (16 kHz sampling, 14-bit input) is split into two sub-bands (low 0–4 kHz and high 4–8 kHz) and each is encoded with ADPCM.
The standard allows 64, 56, or 48 kbps; 64 kbps (the most common) allocates 48 kbps to the lower band (which carries most speech energy) and 16 kbps to the upper band. Because of its wideband frequency response (~50–7000 Hz), G.722 yields noticeably richer and more natural voice than G.711.
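The 48 + 16 kbps split follows from the sub-band structure: after the split, each sub-band runs at 8000 samples per second. A minimal sketch of the arithmetic, assuming 2 bits per high-band sample and 6, 5, or 4 bits per low-band sample for the 64, 56, and 48 kbps modes (the per-sample bit counts are an assumption here, not stated in the text above):

```python
SUBBAND_SAMPLE_RATE = 8000  # samples per second in each sub-band after the split
HIGH_BAND_BITS = 2          # assumed bits per high-band sample in every mode


def g722_rates(low_band_bits):
    """Return (low-band, high-band, total) bit rates in bps."""
    low = SUBBAND_SAMPLE_RATE * low_band_bits
    high = SUBBAND_SAMPLE_RATE * HIGH_BAND_BITS
    return low, high, low + high


for bits in (6, 5, 4):
    print(bits, g722_rates(bits))
```

Only the low band shrinks in the reduced-rate modes, which is consistent with the low band carrying most of the speech energy.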
| Parameter | Value |
|---|---|
| Bitrate | 64/56/48 kbps |
| Bandwidth | Wideband (50–7000 Hz, 16 kHz sampling) |
| MOS | ~4.5 (wideband scale) |
| Frame length | 10 ms |
| Algorithmic delay | ~10 ms |
Pros:
- Much better audio fidelity than G.711 at the same bitrate
- Low latency and simple decoding
- Often supported in VoIP phones for HD audio calls
Cons:
- Still uses 64 kbps (unless a lower rate is negotiated)
- Not as efficient as newer codecs (Opus, AMR-WB)
- Optimized for voice, not suitable for music
Packet Loss Concealment: ITU later provided G.722 Appendix IV that defines a PLC method using adaptive muting and pitch extrapolation. If a frame is lost, the decoder extrapolates the low-band signal from previous data and mutes the high-band, then cross-fades when new data arrives.
G.723.1 – 6.3/5.3 kbps Dual-Rate Codec (ACELP/MP-MLQ)
G.723.1 is a low-bitrate speech codec standardized by ITU-T in 1996, designed for early VoIP and videoconferencing (used in H.323 systems). It provides two bit rates:
- 6.3 kbps – using Multi-Pulse LPC with Maximum Likelihood Quantization (MP-MLQ), MOS ~3.9
- 5.3 kbps – using Algebraic CELP, MOS ~3.7
Both operate on 30 ms frames. Despite the name, G.723.1 is unrelated to G.723 – it's a separate standard targeting much lower bit rates.
| Parameter | Value |
|---|---|
| Bitrate | 6.3 or 5.3 kbps |
| Frame length | 30 ms (240 samples) |
| Lookahead | 7.5 ms |
| Total algorithmic delay | 37.5 ms |
| Bandwidth | Narrowband (8 kHz) |
| MOS | 3.7–3.9 (fair) |
Packet Loss Concealment: G.723.1 includes a built-in frame-erasure concealment technique in the decoder. If a frame is lost, the decoder uses the last good frame's parameters to synthesize a replacement, repeating the last pitch period and gradually attenuating the signal. Given the long frame length (30 ms), losing even one frame can be noticeable.
G.723.1 is mostly of historical interest now, supplanted by better codecs (G.729, Opus) that achieve more quality at similar bit rates.
G.726 – ADPCM at 16–40 kbps (Waveform Codec)
G.726 is an ADPCM (Adaptive Differential PCM) codec that compresses 64 kbps PCM down to 40, 32, 24, or 16 kbps. It was standardized in 1990 as a replacement for older ADPCM codecs (G.721 at 32 kbps and G.723 at 24/40 kbps).
| Mode | Bitrate | Bits/sample | MOS | Notes |
|---|---|---|---|---|
| G.726-40 | 40 kbps | 5-bit | ~4.2 | Almost indistinguishable from G.711 |
| G.726-32 | 32 kbps | 4-bit | ~3.9–4.0 | Most commonly used (DECT phones) |
| G.726-24 | 24 kbps | 3-bit | ~3.7 | Somewhat muffled |
| G.726-16 | 16 kbps | 2-bit | ~3.5 | Only for extreme bandwidth limits |
G.726 is a waveform codec like G.711, operating on audio samples directly, but uses differential encoding to reduce bitrate. At 32 kbps, it achieves near-toll quality speech at half the bitrate of G.711, making it valuable in bandwidth-limited systems.
Resilience and PLC: G.726 is somewhat sensitive to packet loss because the internal predictor state can go off track when samples are missing. It does not have a defined PLC in the standard. The codec is relatively insensitive to bit errors on a continuous channel, but with packet loss, the gap in the waveform can cause an audible click or jump.
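Why a differential codec is sensitive to gaps can be shown with a toy DPCM coder that transmits only the difference between consecutive samples. This is a sketch for illustration, not G.726 itself, which uses an adaptive quantizer and adaptive predictor; the function names are hypothetical.

```python
def dpcm_encode(samples):
    """Encode each sample as its difference from the previous sample."""
    previous, diffs = 0, []
    for sample in samples:
        diffs.append(sample - previous)
        previous = sample
    return diffs


def dpcm_decode(diffs):
    """Rebuild samples by accumulating differences; None marks a lost value."""
    value, decoded = 0, []
    for diff in diffs:
        value += 0 if diff is None else diff
        decoded.append(value)
    return decoded


signal = [0, 10, 20, 30, 40, 50]
diffs = dpcm_encode(signal)
diffs[2] = None  # simulate one lost difference
decoded = dpcm_decode(diffs)
errors = [s - d for s, d in zip(signal, decoded)]
print(decoded)
print(errors)
```

In this toy the decoder stays offset by the missing difference for the rest of the stream, which is the "off track" state described above. A real adaptive predictor would drift back over time, but the discontinuity at the gap remains audible.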
G.729a – 8 kbps CS-ACELP Codec (Widely Used in VoIP)
G.729 is a popular 8 kbps speech codec that uses Conjugate-Structure Algebraic Code-Excited Linear Prediction (CS-ACELP). It was standardized by ITU-T in 1995 and became a go-to codec for VoIP due to its low bandwidth requirement and good voice quality for narrowband speech.
The common variant used is G.729a (Annex A), which is a slightly lower-complexity version compatible with the original G.729 bitstream.
| Parameter | Value |
|---|---|
| Bitrate | 8 kbps |
| Sample rate | 8 kHz (narrowband) |
| Frame length | 10 ms (80 bits/frame) |
| Lookahead | 5 ms |
| Total algorithmic delay | 15 ms |
| MOS | ~3.9–4.0 |
| Payload | 20 ms packet = 2 frames = 20 bytes |
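The payload row implies that a G.729 RTP payload is a whole number of 10-byte frames. The sketch below splits a payload accordingly. It additionally assumes the RFC 3551 layout in which an optional 2-byte Annex B comfort-noise (SID) frame may follow the speech frames; Annex B is not covered in this guide, and the function name is hypothetical.

```python
G729_FRAME_BYTES = 10  # 80 bits per 10 ms speech frame
G729_SID_BYTES = 2     # assumed Annex B comfort-noise frame size (RFC 3551)


def split_g729_payload(payload):
    """Return (speech_frames, sid_frame_or_None) for one RTP payload."""
    n_speech, remainder = divmod(len(payload), G729_FRAME_BYTES)
    if remainder not in (0, G729_SID_BYTES):
        raise ValueError(f"unexpected G.729 payload length: {len(payload)}")
    frames = [payload[i * G729_FRAME_BYTES:(i + 1) * G729_FRAME_BYTES]
              for i in range(n_speech)]
    sid = payload[n_speech * G729_FRAME_BYTES:] if remainder else None
    return frames, sid


frames, sid = split_g729_payload(bytes(20))
print(len(frames), "frames,", len(frames) * 10, "ms of speech, SID:", sid)
```

Counting frames this way gives the audio duration of a packet without decoding it, which helps when checking negotiated packetization against what is actually on the wire.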
Quality: One of the best in the narrowband 8 kbps class, with MOS ~4.0 after a single encode/decode pass. G.729 can degrade if audio is transcoded multiple times or if it is fed non-speech signals (music, DTMF tones). It doesn't carry fax/modem signals well.
Packet Loss Concealment: G.729 has an effective PLC built into the decoder. Upon detecting loss, it uses the last good frame's parameters (pitch period, energy) to synthesize a replacement, typically by repeating the last pitch cycle and gradually reducing gain. According to developers, the PLC "works surprisingly well even under high packet loss rates."
Patents on G.729 expired in 2017, so it's now free to use. G.729 was the workhorse of VoIP for many years, now increasingly supplanted by Opus in new systems.
Opus – Adaptive Full-Band Codec (6–510 kbps, State-of-the-Art)

Opus is a modern, highly versatile audio codec standardized by the IETF in 2012 (RFC 6716). It is often considered the "Swiss Army knife" of audio codecs, as it can handle everything from narrowband speech at very low bitrates (~6 kbps) up to full-band stereo music at 510 kbps, all with low latency.
Opus is a hybrid codec that combines two technologies:
- SILK (Skype's speech codec) for linear-prediction voice coding
- CELT (Xiph.Org's codec) for MDCT-based music coding
Opus can dynamically switch or mix these modes to optimize quality. In VoIP and WebRTC, Opus has become the go-to codec because it delivers unmatched voice quality across a wide range of network conditions. Many consider Opus the best available VoIP codec as of 2025.
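Which of the two layers a given packet uses is visible in the first byte of every Opus packet, the TOC byte defined in RFC 6716. The sketch below decodes it using the configuration table from that RFC; the function name and the returned dictionary layout are hypothetical.

```python
def parse_opus_toc(toc):
    """Decode the first byte (TOC) of an Opus packet per RFC 6716 section 3.1."""
    config = (toc >> 3) & 0x1F  # top 5 bits: mode, bandwidth, frame size
    stereo = bool(toc & 0x04)   # 1 bit: mono or stereo
    code = toc & 0x03           # 2 bits: frame-count code

    if config < 12:             # configs 0-11: SILK-only
        mode = "SILK"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:           # configs 12-15: hybrid SILK + CELT
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:                       # configs 16-31: CELT-only
        mode = "CELT"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]

    return {"mode": mode, "bandwidth": bandwidth, "frame_ms": frame_ms,
            "stereo": stereo, "code": code}


print(parse_opus_toc(0x48))  # config 9
print(parse_opus_toc(0xFC))  # config 31, stereo
```

Tallying these fields over a captured stream shows how often the encoder switched modes or bandwidths during a call.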
| Parameter | Value |
|---|---|
| Bitrate | 6–510 kbps (adjustable) |
| Common VoIP usage | 16–64 kbps (mono speech) |
| Bandwidth | Full-band (48 kHz capable) |
| Frame size | 2.5–60 ms (default 20 ms) |
| Algorithmic delay | ~26.5 ms (typical) |
| MOS | 4.5–4.8 at high bitrates |
Adaptive Features: Opus can dynamically adjust bitrate, audio bandwidth, and complexity in response to network conditions. The encoder can also detect whether the input is speech, music, or mixed content and choose its coding mode accordingly.
Packet Loss Concealment: Opus has advanced PLC built into the decoder, plus an optional in-band FEC (Forward Error Correction) mode for voice, in which the encoder sends a redundant lower-quality copy of a frame inside the next packet. As a result, Opus maintains quality even on lossy networks – it's noted for "superior packet loss handling." Opus might still sound acceptable at 5–10% packet loss, where older codecs would be breaking up.
Use Cases: Virtually everywhere in WebRTC (browsers, Zoom/Teams audio, etc.), streaming, and even music production. It's royalty-free and open source.
AMR (Adaptive Multi-Rate) – 3GPP Narrowband Codec (4.75–12.2 kbps)
AMR is an adaptive speech codec used primarily in mobile networks (2G/3G GSM and UMTS). Standardized by 3GPP in 1998, AMR-NB operates on 20 ms frames and can switch between 8 different bitrates:
| Mode | Bitrate | Quality Notes |
|---|---|---|
| 7 | 12.2 kbps | Best quality, similar to GSM EFR |
| 6 | 10.2 kbps | Very good |
| 5 | 7.95 kbps | Good, common in decent conditions |
| 4 | 7.4 kbps | Typical balance mode |
| 3 | 6.7 kbps | Compressed but understandable |
| 2 | 5.9 kbps | Noticeable artifacts |
| 1 | 5.15 kbps | Lower quality |
| 0 | 4.75 kbps | Emergency mode, rough quality |
The codec can dynamically mode-switch based on channel quality (controlled by the network). At 12.2 kbps (mode 7), AMR-NB is roughly on par with G.729 or slightly better, yielding near toll quality for speech.
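Because every mode uses the same 20 ms frame, the number of speech bits per frame follows directly from the bitrate in the table. A minimal sketch of that arithmetic (the function name is hypothetical, and RTP payload headers or padding are not included):

```python
# Bitrates from the mode table above, indexed by mode number 0-7.
AMR_NB_BITRATES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
FRAME_MS = 20


def speech_bits_per_frame(mode):
    """Speech bits carried by one 20 ms AMR-NB frame in the given mode."""
    return AMR_NB_BITRATES_BPS[mode] * FRAME_MS // 1000


for mode in range(8):
    print(mode, AMR_NB_BITRATES_BPS[mode] / 1000, "kbps ->",
          speech_bits_per_frame(mode), "bits")
```

The frame size therefore identifies the mode in use, so mode switches show up in a capture as changes in payload length from one packet to the next.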
Adaptation: When the network detects a lot of errors, it commands the sender to use a lower-rate codec mode, which leaves more of the channel's capacity for error protection. When the channel is clear, it switches up to a higher bitrate for better quality.
Packet Loss/Error Concealment: The AMR codec includes Frame Erasure Concealment. If a frame is marked as bad, the decoder uses the last received good frame's parameters to conceal the loss. AMR is robust against isolated frame losses: a single lost 20 ms frame might be barely noticeable.
AMR-WB (G.722.2) – Adaptive Multi-Rate Wideband (HD Voice at 6.6–23.85 kbps)
AMR-WB (Adaptive Multi-Rate Wideband), also known as ITU G.722.2, is the wideband extension of AMR. It is the codec behind HD Voice in 3G/4G networks, providing 50–7000 Hz audio (16 kHz sampling) for much clearer calls.
| Mode | Bitrate | MOS |
|---|---|---|
| 8 | 23.85 kbps | ~4.5 |
| 7 | 23.05 kbps | ~4.4 |
| 6 | 19.85 kbps | n/a |
| 5 | 18.25 kbps | ~4.3 |
| 4 | 15.85 kbps | ~4.2 |
| 3 | 14.25 kbps | ~4.1 |
| 2 | 12.65 kbps | ~4.0 (VoLTE baseline) |
| 1 | 8.85 kbps | ~3.9 |
| 0 | 6.6 kbps | ~3.7 |
At top rate (23.85 kbps), AMR-WB delivers wireline-quality or better voice: voices sound natural and crisp. Even lower modes maintain surprisingly good clarity. VoLTE (voice over LTE) initially used AMR-WB as the mandatory codec for HD voice calls.
AMR-WB at 12.65 kbps often outperforms older wideband codecs at higher bitrates (e.g., G.722 at 64k) due to its efficiency and optimization for speech.
iLBC – Internet Low Bitrate Codec (13.3/15.2 kbps, Loss-Tolerant)
iLBC is an open-source narrowband voice codec designed specifically for VoIP with robustness to packet loss in mind. It was developed by Global IP Solutions and published as RFC 3951 in 2004.
| Mode | Frame Length | Bitrate | Bits/frame |
|---|---|---|---|
| Mode 20 | 20 ms | 15.2 kbps | 304 bits |
| Mode 30 | 30 ms | 13.33 kbps | 400 bits |
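The two bitrates follow from the frame sizes in the table, and the frame size in bytes also distinguishes the modes in a capture. A minimal sketch, assuming one iLBC frame per RTP packet (the function names are hypothetical):

```python
# Frame length in ms -> bits per frame, from the table above.
ILBC_MODES = {20: 304, 30: 400}


def bitrate_kbps(frame_ms):
    """Bits per millisecond equals kilobits per second."""
    return ILBC_MODES[frame_ms] / frame_ms


def mode_from_payload(payload_len):
    """Guess the iLBC mode from a single-frame payload length in bytes."""
    for frame_ms, bits in ILBC_MODES.items():
        if payload_len == bits // 8:
            return frame_ms
    return None


print(bitrate_kbps(20), round(bitrate_kbps(30), 2))
print(mode_from_payload(38), mode_from_payload(50))
```

A 38-byte payload therefore indicates Mode 20 and a 50-byte payload indicates Mode 30. Packets carrying several frames would need the negotiated mode from signalling instead of this guess.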
What sets iLBC apart is its frame independence – each frame is coded largely independently of others, so that losses do not propagate errors. This means a lost frame doesn't ruin the decoding of subsequent frames.
Quality: Under ideal conditions (no loss), iLBC's quality is on par with G.729 or slightly better (MOS ~4.0–4.14). Where iLBC shines is when there is packet loss: its audio degrades much less dramatically than other codecs.
Packet Loss Concealment: Essentially built-in by design. Because each iLBC frame is encoded independently, the decoder can just play out a PLC-generated segment without having to reset internal state. Tests indicated iLBC could handle up to 15-20% loss and still yield understandable speech.
iLBC has been largely superseded by Opus, but remains an excellent fallback in high-loss scenarios.
Speex – Open-Source CELP Codec (2–44 kbps, Flexible)
Speex is an open-source speech codec project (by Xiph.org) released in the early 2000s. It was designed as a patent-free alternative to proprietary codecs. Speex is a CELP-based codec supporting multiple sampling rates:
| Mode | Sample Rate | Bitrate Range |
|---|---|---|
| Narrowband | 8 kHz | ~2–24 kbps |
| Wideband | 16 kHz | ~4–36 kbps |
| Ultra-wideband | 32 kHz | ~8–44 kbps |
Speex is configured by quality level (0 to 10). It supports VBR (variable bit rate), VAD (voice activity detection), and noise suppression. MOS range is 3.5–4.2 depending on bitrate.
While Speex provided good quality, it has been effectively succeeded by Opus (which the Speex authors also helped create). Nonetheless, Speex is still found in some legacy VoIP systems.
GSM Full Rate (GSM 06.10) – 13 kbps Early Cellular Codec
GSM 06.10 (GSM Full Rate) is the original speech codec for 2G GSM networks. It operates at 13 kbps and uses RPE-LTP (Regular Pulse Excitation – Long Term Prediction).
| Parameter | Value |
|---|---|
| Bitrate | 13 kbps |
| Frame length | 20 ms |
| Sample rate | 8 kHz |
| MOS | ~3.5–3.7 |
GSM-FR audio quality is a bit lower than that of modern codecs; it sounds somewhat muffled compared to G.711. It was optimized for early GSM conditions. Modern networks have moved on to AMR, but GSM-FR remains a historical reference.
SILK – Skype's Wideband Codec (6–40 kbps, Adaptive)
SILK is an audio codec developed by Skype (introduced around 2009) for encoding speech at variable bitrates with an emphasis on wideband and super-wideband quality and low latency. It later became one half of the Opus codec.
| Parameter | Value |
|---|---|
| Bitrate | ~6–40 kbps |
| Sample rates | 8, 12, 16, or 24 kHz |
| MOS | ~4.0–4.5 |
| Frame length | 20 ms (typical) |
SILK wideband (16 kHz) at moderate bitrates (~20-25 kbps) sounds very clear, often better than G.722. SILK can dynamically adjust bitrate in real-time based on network conditions.
Since Opus contains SILK, standalone SILK use is less common now, but it's effectively still around inside Opus for low/medium bitrate voice.
iSAC – Internet Speech Audio Codec (Adaptive Wideband Codec)
iSAC (Internet Speech Audio Codec) is another wideband speech codec from Global IP Solutions (the company behind iLBC). It is a wideband (16 kHz) codec that can also operate up to 32 kHz (super-wideband).
| Parameter | Value |
|---|---|
| Bitrate | ~10–32 kbps (adaptive) |
| Sample rate | 16 kHz (wideband), 32 kHz (super-wideband) |
| Frame length | 30–60 ms (adjustable) |
| MOS | ~4.1–4.3 |
iSAC is an adaptive bit rate codec: instead of fixed modes, it can adjust its bit rate dynamically between roughly 10 kbps and 32 kbps depending on network conditions. Google integrated iSAC into early WebRTC.
Since Opus arrived, iSAC has been largely superseded (Opus fullband mode covers iSAC's territory and more), but it's part of the wideband codec legacy.
MP4A-LATM (AAC-LD) – MPEG-4 Audio in LATM Transport (High-Fidelity Codec)
MP4A-LATM refers to an AAC (Advanced Audio Coding) stream carried in the Low-overhead MPEG-4 Audio Transport Multiplex (LATM) format, often used in RTP. In real-time communication this is typically AAC-LD (Low Delay AAC) or AAC-ELD (Enhanced Low Delay).
| Parameter | Value |
|---|---|
| Bandwidth | Full-band (48 kHz typical) |
| Bitrate | Variable (64 kbps typical for mono) |
| Latency | ~20–30 ms |
| Quality | Studio-quality at sufficient bitrates |
AAC-LD provides full-band audio with algorithmic delay low enough for conversations, while delivering excellent fidelity. It handles music far better than speech-specific codecs.
Use Cases: High-end telepresence systems (Cisco/Polycom), FaceTime (uses AAC-ELD), and scenarios requiring better sound quality than traditional voice codecs.
VoIPmonitor Codec Support and Best Practices
VoIPmonitor supports decoding of all the above codecs. This allows telecom engineers and analysts to capture calls and listen to or measure quality (MOS, waveform analysis) regardless of the codec used.
Codec Comparison Summary
| Codec | Bitrate (kbps) | Bandwidth | MOS | PLC Quality | Primary Use |
|---|---|---|---|---|---|
| G.711 | 64 | Narrowband | 4.1–4.3 | Good (Appendix I) | PSTN interconnect, default |
| G.722 | 48–64 | Wideband | ~4.5 | Good | HD Voice VoIP |
| G.723.1 | 5.3–6.3 | Narrowband | 3.7–3.9 | Basic | Legacy systems |
| G.726 | 16–40 | Narrowband | 3.5–4.2 | Basic | DECT, bandwidth-limited |
| G.729a | 8 | Narrowband | 3.9–4.0 | Excellent | VoIP trunking |
| Opus | 6–510 | Full-band | 4.5–4.8 | Excellent (FEC) | WebRTC, modern VoIP |
| AMR | 4.75–12.2 | Narrowband | 3.0–3.7 | Good | GSM/UMTS |
| AMR-WB | 6.6–23.85 | Wideband | 3.7–4.5 | Good | VoLTE, HD Voice mobile |
| iLBC | 13.3–15.2 | Narrowband | ~4.0 | Excellent | Lossy networks |
| Speex | 2–44 | NB/WB/UWB | 3.5–4.2 | Good | Legacy open-source |
| GSM FR | 13 | Narrowband | 3.5–3.7 | Basic | Legacy GSM |
| SILK | 6–40 | NB/WB/SWB | 4.0–4.5 | Good | Skype, Opus layer |
| iSAC | 10–32 | WB/SWB | 4.1–4.3 | Good | Early WebRTC |
| AAC-LD | Variable | Full-band | High | Basic | High-fidelity conferencing |
Best Practices
- Narrowband vs Wideband: Wideband codecs (G.722, AMR-WB, Opus, SILK, AAC-LD) tend to score higher on user satisfaction because they reproduce a richer sound. Narrowband codecs max out around MOS 4.2 even in perfect conditions.
- Bitrate and Network Conditions: Lower-bitrate codecs save bandwidth but may introduce more artifacts. Ensure the chosen codec matches your scenario (Opus or G.722 for quality priority; G.729 or AMR for bandwidth-limited links).
- Packet Loss Concealment: PLC algorithms significantly affect call quality in adverse conditions. With PLC enabled, G.711 can tolerate nearly 5% packet loss before quality falls below MOS 3.6. Modern codecs like Opus have very good PLC and FEC.
- Monitoring MOS and QoE: VoIPmonitor uses the ITU E-model to estimate MOS for calls.
- Recommended codec: If available, Opus should be the codec of choice for voice calls due to its clarity, robustness, and low latency.
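The E-model mentioned in the list above produces a transmission rating factor R, which is then mapped to an estimated MOS. The sketch below shows only that final mapping, using the conversion formula from ITU-T G.107; how VoIPmonitor derives R from delay, loss, and codec is not described here, and the function name is hypothetical.

```python
def r_to_mos(r):
    """Convert an E-model R factor to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


for r in (50, 70, 80, 93.2):
    print(r, round(r_to_mos(r), 2))
```

The mapping is non-linear: R = 93.2, the E-model default for a clean narrowband connection, gives roughly MOS 4.4, while R = 50 falls to about 2.6.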