Understanding the WebRTC Protocol
Web Real-Time Communication (WebRTC) is a suite of protocols and APIs enabling real-time audio, video, and data exchange directly between browsers or other peers without requiring an intermediary server for the media path. It was designed to facilitate peer-to-peer (P2P) communication, tackling challenges of NAT traversal, media transport, encryption, and more via a collection of standards (defined in numerous RFCs and W3C specifications). This guide provides a detailed yet approachable overview of WebRTC's signaling mechanisms, network negotiation (ICE/STUN/TURN), media and data transport protocols, and security, all while distilling key points into cheat sheet recaps for easy reference.
Architecture Overview of WebRTC
At a high level, a WebRTC application consists of two communicating WebRTC agents (e.g. browser peers or other WebRTC endpoints) that establish a direct connection to send media (audio/video) and arbitrary data. Unlike traditional client-server connections, WebRTC employs a peer-to-peer model where each side acts as both client and server, negotiating a connection cooperatively. This P2P approach yields benefits in bandwidth usage and latency: media travels directly between peers rather than through a central server.
To achieve this, WebRTC relies on several building blocks:
Signaling
- A mechanism (left to the application) to coordinate session setup by exchanging metadata between peers (namely Session Description Protocol offers/answers and ICE candidates). Signaling is not part of WebRTC's wire protocol but is an essential first step to get both peers on the same page.
Session Description Protocol (SDP)
- The format of the session metadata exchanged via signaling. SDP describes the media formats, transport addresses (candidates), and other negotiation parameters needed to establish the connection.
Interactive Connectivity Establishment (ICE)
- The process of finding a viable network path between peers, often through NATs. ICE leverages STUN (Session Traversal Utilities for NAT) servers to discover public reflexive addresses and TURN (Traversal Using Relays around NAT) servers as fallback relays if direct peer-to-peer paths cannot be found.
Media Transport (RTP/RTCP)
- Once connected, media is sent using the Real-time Transport Protocol (RTP), with the RTP Control Protocol (RTCP) providing feedback. WebRTC mandates secure transport of RTP using DTLS-SRTP (SRTP keyed by a Datagram Transport Layer Security handshake) and typically multiplexes all media on a single network 5-tuple (using mechanisms like BUNDLE).
Data Channels (SCTP)
- WebRTC also supports generic data transfer between peers via data channels, which use SCTP (Stream Control Transmission Protocol) layered over the same ICE+DTLS transport. This enables reliable or partially-reliable messaging akin to TCP/UDP but integrated into the peer connection.
Security
- Security is baked in at multiple layers. All WebRTC communications are encrypted (DTLS for handshakes and control messages, SRTP for media, and SCTP over DTLS for data channels). Additionally, session descriptions include cryptographic fingerprints to prevent man-in-the-middle attacks, and browser APIs enforce user consent for camera/microphone access and can mask local IP addresses for privacy.
WebRTC Protocol Stack
On each peer, user media (or data) is captured and sent into a PeerConnection API instance. The application's signaling service exchanges SDP offers/answers between peers, which contain ICE information and media parameters. The WebRTC stack gathers ICE candidates (IP/port endpoints) via STUN/TURN, and the ICE protocol finds a working route between the two peers. The peers perform a DTLS handshake over the chosen route to establish keys, then begin exchanging SRTP packets for media and SCTP packets for data, all securely over UDP (or TCP/TLS as fallback).
Signaling and Session Description (SDP)
WebRTC signaling is the coordination process that allows two peers to agree on the parameters of a communication session. Each WebRTC peer initially knows nothing about the other side's capabilities or network address; signaling is the "bootstrapping" that makes the call possible. In practice, signaling involves exchanging a few messages (typically over a web server, WebSocket, or any arbitrary method chosen by the app) containing session descriptions and network candidate information.
Important: WebRTC does not mandate a specific signaling protocol or transport. It can be done via an existing application server, SIP, or any other messaging mechanism. The content of these signaling messages, however, is standardized: they consist of Session Description Protocol (SDP) data and related information that both peers can interpret.
Session Description Protocol (SDP)
SDP is a text-based protocol (defined in RFC 8866) for describing multimedia sessions, widely used in WebRTC to negotiate calls. An SDP message is essentially a series of newline-separated key-value lines. It contains a session section with global attributes (protocol version, session ID, etc.) and one or more media descriptions (each describing a media stream like audio or video).
For WebRTC usage, SDP conveys crucial information including:
| SDP Element | Description |
|---|---|
| Media format capabilities | Codecs and their parameters for each media type (e.g. Opus, VP8) via m= lines and a=rtpmap attributes |
| Transport parameters | Type of transport (UDP/TLS/RTP/SAVPF for secure RTP in WebRTC), indications of multiplexing (a=mid, a=group:BUNDLE) and RTCP multiplexing (a=rtcp-mux) |
| Network candidates | ICE candidates (lines beginning with a=candidate) listing possible IP/port endpoints where each peer can be reached |
| ICE credentials | Username fragment and password (a=ice-ufrag and a=ice-pwd) that authenticate STUN connectivity checks between peers |
| DTLS fingerprint | Cryptographic hash of the local DTLS certificate (a=fingerprint) used by the remote side to verify the DTLS handshake |
| Media direction and identification | a=sendrecv (or sendonly/recvonly) for direction, a=mid to identify media sections, a=msid to correlate media with MediaStream tracks |
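To make the table concrete, here is a minimal Python sketch that pulls those attributes out of an SDP blob. The sample SDP, the shortened fingerprint, and the `parse_sdp` helper are all illustrative assumptions; a real SDP carries many more lines, and a production parser would scope attributes per media section.

```python
# Illustrative SDP fragment; the credentials and fingerprint are made up.
SAMPLE_SDP = """v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=mid:0
a=ice-ufrag:F7gI
a=ice-pwd:x9cml/YzichV2+XlhiMu8g
a=fingerprint:sha-256 AB:CD:EF
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=candidate:1 1 udp 2130706431 192.0.2.10 54321 typ host
"""


def parse_sdp(sdp: str) -> dict:
    """Collect a few WebRTC-relevant attributes from an SDP blob."""
    info = {"media": [], "candidates": [], "bundle": [],
            "ice_ufrag": None, "ice_pwd": None, "fingerprint": None}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            kind, _port, proto, *formats = line[2:].split()
            info["media"].append({"kind": kind, "proto": proto, "formats": formats})
        elif line.startswith("a=ice-ufrag:"):
            info["ice_ufrag"] = line.split(":", 1)[1]
        elif line.startswith("a=ice-pwd:"):
            info["ice_pwd"] = line.split(":", 1)[1]
        elif line.startswith("a=fingerprint:"):
            algorithm, digest = line.split(":", 1)[1].split(" ", 1)
            info["fingerprint"] = (algorithm, digest)
        elif line.startswith("a=group:BUNDLE"):
            info["bundle"] = line.split()[1:]
        elif line.startswith("a=candidate:"):
            info["candidates"].append(line[2:])
    return info


info = parse_sdp(SAMPLE_SDP)
print(info["media"], info["ice_ufrag"], info["fingerprint"])
```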
WebRTC's use of SDP is constrained and codified by the JavaScript Session Establishment Protocol (JSEP) (RFC 8829, updated by RFC 9429). JSEP defines how the browser's WebRTC API exposes SDP generation and processing to the application. The application drives the offer/answer exchange:
- One peer calls createOffer() to generate an SDP offer based on its local media setup
- Sets it as its local description
- Sends it to the other peer via the signaling channel
- The other peer sets that as a remote description
- Uses createAnswer() to generate an SDP answer that matches the offer
- Sets the answer as its own local description
- Sends back the answer
The offerer then sets the received answer as its remote description, completing the offer/answer negotiation as described by RFC 3264.
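The offer/answer exchange can be modeled as a small state machine. The sketch below is written in Python rather than the browser API and uses the JSEP signaling state names ("stable", "have-local-offer", "have-remote-offer"). The `SignalingState` class is a hypothetical helper, and provisional answers and rollback are deliberately left out.

```python
class SignalingState:
    """Tiny model of the signaling states in a basic offer/answer exchange."""

    # (current state, action, description type) -> next state
    TRANSITIONS = {
        ("stable", "set_local", "offer"): "have-local-offer",
        ("stable", "set_remote", "offer"): "have-remote-offer",
        ("have-local-offer", "set_remote", "answer"): "stable",
        ("have-remote-offer", "set_local", "answer"): "stable",
    }

    def __init__(self):
        self.state = "stable"

    def apply(self, action: str, sdp_type: str) -> str:
        key = (self.state, action, sdp_type)
        if key not in self.TRANSITIONS:
            raise ValueError(f"invalid transition: {key}")
        self.state = self.TRANSITIONS[key]
        return self.state


offerer, answerer = SignalingState(), SignalingState()
offerer.apply("set_local", "offer")     # after createOffer()
answerer.apply("set_remote", "offer")   # offer arrives via signaling
answerer.apply("set_local", "answer")   # after createAnswer()
offerer.apply("set_remote", "answer")   # answer arrives via signaling
print(offerer.state, answerer.state)
```

Applying a description in the wrong state (for example, an answer while still "stable") raises an error here, which mirrors the kind of failure an application sees when signaling messages arrive out of order.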
Trickle ICE
WebRTC allows candidates to be sent incrementally (called Trickle ICE, RFC 8838) rather than waiting to gather all candidates before sending an SDP offer/answer. In practice, a peer may send an initial SDP offer with some candidates, then send additional ice-candidate messages via the signaling channel as they are discovered. Trickle ICE accelerates connection setup by not delaying the offer/answer exchange.
If trickle ICE is used, the SDP will often include:
- A special marker: a=ice-options:trickle
- Possibly an empty candidate at end to signal more candidates will come
- Eventually, an "end-of-candidates" indication when gathering is complete
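As a rough illustration of what a trickled candidate might look like on the signaling channel, the Python sketch below serializes one as JSON. The inner field names follow the browser's RTCIceCandidateInit dictionary; the outer "type" envelope and both helper functions are assumptions, since WebRTC leaves the signaling format to the application. The empty candidate string as an end-of-candidates marker follows the browser API convention.

```python
import json


def candidate_message(candidate: str, sdp_mid: str, sdp_mline_index: int, ufrag: str) -> str:
    """Wrap one ICE candidate in a JSON signaling message."""
    return json.dumps({
        "type": "ice-candidate",  # hypothetical envelope field chosen by the app
        "candidate": {
            "candidate": candidate,
            "sdpMid": sdp_mid,
            "sdpMLineIndex": sdp_mline_index,
            "usernameFragment": ufrag,
        },
    })


def is_end_of_candidates(message: str) -> bool:
    """An empty candidate string marks the end of gathering in this sketch."""
    return json.loads(message)["candidate"]["candidate"] == ""


first = candidate_message(
    "candidate:1 1 udp 2130706431 192.0.2.10 54321 typ host", "0", 0, "F7gI")
last = candidate_message("", "0", 0, "F7gI")
print(is_end_of_candidates(first), is_end_of_candidates(last))
```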
Signaling Channel Security
The integrity of the signaling channel is critical: if an attacker were to tamper with the SDP in transit (e.g., altering ICE candidates or the DTLS fingerprint), they could attempt a man-in-the-middle attack. WebRTC's encryption (DTLS/SRTP) ensures confidentiality of media, but it does not automatically secure the signaling path. Thus, applications must protect signaling (typically by using TLS-encrypted transport like HTTPS/WSS and proper authentication) to prevent session hijacking or eavesdropping.
Signaling & SDP Quick Reference
| Topic | Key Points |
|---|---|
| Signaling | Out-of-band process of exchanging control messages (SDPs and ICE candidates) between peers before media can flow. WebRTC does not define how to transport these messages; use any secure method (WebSocket, SIP over WebSocket, etc.). |
| SDP Offers/Answers | Per RFC 3264, used to negotiate media and connection info. Describes codec capabilities, media types, bandwidth, and includes ICE (NAT traversal) and DTLS (security) info. WebRTC uses SDP as defined in RFC 8866. |
| JSEP | Browser API follows JSEP: the app calls createOffer/createAnswer and exchanges the resulting SDP blobs via signaling. Offers flexibility to integrate with any signaling protocol. |
| ICE info in SDP | Contains a=candidate lines for each network candidate (host, STUN reflexive, TURN relay) and matching ice-ufrag/ice-pwd pair for authentication. |
| DTLS fingerprint | a=fingerprint attribute provides hash of peer's certificate. Each side verifies DTLS handshake uses expected certificate to prevent MITM. Ensure signaling channel is secure. |
| Trickle ICE | RFC 8838 - send ICE candidates progressively, speeding up connection setup. Offer/answer sent as soon as possible, new candidates signaled as they arrive. |
NAT Traversal and ICE (STUN/TURN)
Establishing a direct P2P connection on the internet is challenging because most users are behind NAT (Network Address Translation) devices or firewalls. NATs hide internal IP addresses, meaning a peer's local network address often isn't directly reachable from outside its network. WebRTC tackles this via the Interactive Connectivity Establishment (ICE) framework (RFC 8445).
ICE is essentially a battle-tested method for two peers to find some way to talk to each other, by collecting all possible network addresses each can use and systematically testing combinations to see what works.
Gathering Candidates
Each WebRTC agent gathers a set of candidate addresses that might reach it. There are several types of ICE candidates:
| Candidate Type | SDP Type | Description |
|---|---|---|
| Host | host | Direct IP addresses of the host's network interfaces (e.g., LAN IP or public IP). Simplest candidates - if both peers are in same LAN or have public IPs, host candidates can work directly. Modern WebRTC may use mDNS hostnames (UUID.local) instead of revealing private IPs for privacy. |
| Server-Reflexive | srflx | Public NAT-mapped addresses obtained by querying a STUN server. STUN (RFC 8489) asks "What is my IP and port as you see it?" and the server provides the reflexive address. This is the public IP/port the NAT assigned to the peer's outgoing request. |
| Relay | relay | Addresses from TURN servers (RFC 8656). If direct UDP is blocked (symmetric NATs, firewalls), TURN relays traffic through a server both peers can contact. Fallback when no direct candidate pair succeeds. Adds latency but ensures connectivity. |
| Peer-Reflexive | prflx | Not gathered upfront but discovered during ICE. When a peer receives a STUN check from an address not in SDP, and communication works, this new viable address becomes a peer-reflexive candidate. |
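Each candidate also carries a priority. The Python sketch below applies the priority formula and the recommended type preferences from RFC 8445 and parses a candidate line into its fields. The function names, the default local preference of 65535, and the sample address are illustrative assumptions; real agents choose local preferences per interface.

```python
# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def parse_candidate(line: str) -> dict:
    """Parse 'candidate:<foundation> <component> <transport> <priority> <addr> <port> typ <type> ...'."""
    fields = line.split(":", 1)[1].split()
    foundation, component, transport, priority, address, port, _typ, cand_type = fields[:8]
    return {
        "foundation": foundation,
        "component": int(component),
        "transport": transport.lower(),
        "priority": int(priority),
        "address": address,
        "port": int(port),
        "type": cand_type,
    }


parsed = parse_candidate("a=candidate:1 1 udp 2130706431 192.0.2.10 54321 typ host")
print(parsed["type"], parsed["priority"] == candidate_priority("host"))
```

With these defaults a host candidate scores higher than a server-reflexive one, which in turn scores higher than a relay candidate, matching the preference order described in the table.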
Connectivity Checks
After gathering, each peer sends its list of candidates to the other. The two ICE agents pair up candidates (each local candidate with each remote candidate, forming candidate pairs). For each candidate pair, the agents send STUN ping messages (Binding requests) to each other to see if a packet can get through.
These pings include authentication:
- Each contains the sender's ICE username fragment
- Message integrity hash using the shared password (from SDP)
- Recipient validates before responding
This ensures only the intended remote peer can respond to checks and prevents interference from stray STUN traffic.
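The authentication described above can be sketched in Python with only the standard library. This builds a STUN Binding request carrying USERNAME and MESSAGE-INTEGRITY (HMAC-SHA1 keyed with the receiving peer's ICE password) and verifies it on the other side. It is a simplified illustration: the helper names and credentials are made up, and a real ICE check also carries PRIORITY, ICE-CONTROLLING or ICE-CONTROLLED, and FINGERPRINT attributes, which are omitted here.

```python
import hashlib
import hmac
import os
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
ATTR_USERNAME = 0x0006
ATTR_MESSAGE_INTEGRITY = 0x0008
MI_ATTR_SIZE = 24  # 4-byte attribute header + 20-byte HMAC-SHA1


def _attr(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute, padding the value to a 4-byte boundary."""
    pad = (4 - len(value) % 4) % 4
    return struct.pack("!HH", attr_type, len(value)) + value + b"\x00" * pad


def _header(length: int, txid: bytes) -> bytes:
    return struct.pack("!HHI", BINDING_REQUEST, length, MAGIC_COOKIE) + txid


def build_check(local_ufrag: str, remote_ufrag: str, remote_pwd: str, txid: bytes = None) -> bytes:
    txid = txid or os.urandom(12)
    body = _attr(ATTR_USERNAME, f"{remote_ufrag}:{local_ufrag}".encode())
    # The length field must already include MESSAGE-INTEGRITY when the HMAC is computed.
    signed = _header(len(body) + MI_ATTR_SIZE, txid) + body
    mac = hmac.new(remote_pwd.encode(), signed, hashlib.sha1).digest()
    body += _attr(ATTR_MESSAGE_INTEGRITY, mac)
    return _header(len(body), txid) + body


def verify_check(packet: bytes, own_pwd: str) -> bool:
    msg_type, _length, cookie = struct.unpack("!HHI", packet[:8])
    if cookie != MAGIC_COOKIE:
        return False
    offset = 20  # attributes start after the 20-byte header
    while offset + 4 <= len(packet):
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        if attr_type == ATTR_MESSAGE_INTEGRITY:
            # Recompute over everything before this attribute, with the adjusted length.
            adjusted = struct.pack("!HH", msg_type, offset - 20 + MI_ATTR_SIZE) + packet[4:offset]
            expected = hmac.new(own_pwd.encode(), adjusted, hashlib.sha1).digest()
            return hmac.compare_digest(expected, packet[offset + 4:offset + 24])
        offset += 4 + attr_len + (4 - attr_len % 4) % 4
    return False


packet = build_check("Lfrag", "Rfrag", "remote-ice-password")
print(verify_check(packet, "remote-ice-password"), verify_check(packet, "wrong-password"))
```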
Pair Selection Process:
- ICE checks happen in priority order (host-to-host tried before reflexive or relay)
- As soon as bi-directional check succeeds on a pair, it's considered valid
- The controlling agent (usually the offerer) picks one of the valid pairs to nominate
- Nomination is done by sending a STUN request that carries the USE-CANDIDATE attribute
- When controlled side acknowledges, both parties mark that as the selected candidate pair
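The priority ordering of checks can be illustrated with the candidate pair priority formula from RFC 8445, where G is the controlling agent's candidate priority and D is the controlled agent's. The Python sketch below forms every local/remote pair and sorts them; the candidate lists and function names are illustrative, and real ICE agents additionally prune redundant pairs and use foundations to decide which checks start first.

```python
from itertools import product


def pair_priority(controlling_prio: int, controlled_prio: int) -> int:
    """pair priority = 2^32 * MIN(G, D) + 2 * MAX(G, D) + (1 if G > D else 0)."""
    g, d = controlling_prio, controlled_prio
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def ordered_pairs(local, remote, local_is_controlling=True):
    """Return (priority, local name, remote name) tuples, highest priority first."""
    pairs = []
    for (l_name, l_prio), (r_name, r_prio) in product(local, remote):
        g, d = (l_prio, r_prio) if local_is_controlling else (r_prio, l_prio)
        pairs.append((pair_priority(g, d), l_name, r_name))
    return sorted(pairs, reverse=True)


# Candidate priorities computed with the RFC 8445 recommended type preferences.
LOCAL = [("host", 2130706431), ("srflx", 1694498815), ("relay", 16777215)]
REMOTE = [("host", 2130706431), ("srflx", 1694498815), ("relay", 16777215)]

check_list = ordered_pairs(LOCAL, REMOTE)
for priority, l_name, r_name in check_list:
    print(f"{l_name:>5} -> {r_name:<5} {priority}")
```

Because the minimum of the two priorities dominates the formula, any pair involving a relay candidate sorts below every pair that avoids relays.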
All of this happens quickly, typically within a few hundred milliseconds to a couple of seconds. ICE is robust: if the initially selected path fails, ICE can retry or restart to re-establish connectivity.
ICE and Networking Requirements
| Requirement | Description |
|---|---|
| Full ICE | WebRTC requires full ICE (not ICE-Lite) - both sides perform checks and both can be behind NATs |
| IPv4/IPv6 | Endpoints must handle both IPv4 and IPv6 candidates, trying IPv6 when possible |
| Happy Eyeballs | RFC 8421 approach for dual-stack scenarios to favor whatever works fastest |
| TCP Fallback | If UDP is blocked entirely, ICE can fall back to ICE-TCP or TURN-TCP (on port 443) |
NAT Traversal Quick Reference
| Topic | Key Points |
|---|---|
| ICE | RFC 8445 - protocol WebRTC uses to establish P2P connection through NATs and firewalls. Each peer gathers all possible candidates and exchanges them, then systematically tests connectivity. |
| STUN | RFC 8489 - used to: (1) get server-reflexive candidates by asking "what is my IP?", (2) perform ICE connectivity checks (peers send STUN binding requests to test paths). Simple request/response over UDP with "magic cookie" and transaction ID. |
| TURN | RFC 8656 - provides relay candidates when direct UDP isn't possible. Peer connects to TURN server (over UDP or TCP/TLS) and obtains relayed address. Ensures connectivity at cost of higher latency. WebRTC endpoints must support TURN. |
| ICE Checks | Peers authenticate STUN checks using ICE username/password from SDP. Controlling peer nominates candidate pair once good path found. |
| ICE States | Transitions: checking → connected → completed. If connectivity lost, can go to failed state. Renegotiation can restart ICE with new candidates. |
| Privacy | mDNS for hostnames instead of local IPs in SDP. Initial checks are authenticated STUN. |
Secure Media Transport (RTP/SRTP, RTCP, and Media Parameters)
Once signaling and ICE have done their jobs, the peers have a direct line of communication. For media (audio and video), WebRTC uses the established protocols from the real-time media world: RTP (Real-time Transport Protocol) for media streams and RTCP (RTP Control Protocol) for periodic feedback and control information.
RTP and Media Streams
RTP (RFC 3550) provides the basic framing and metadata for media frames in transit:
| RTP Field | Purpose |
|---|---|
| Timestamp | Timing information for synchronization |
| Sequence Number | Detect packet loss and reordering |
| Payload Type | Format indicator (codec identification) |
| SSRC | Synchronization Source identifier - unique per RTP stream |
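These fields sit in the 12-byte fixed RTP header defined by RFC 3550. A minimal Python sketch of decoding it follows; the `parse_rtp_header` name and the sample packet values are illustrative, and CSRC lists and header extensions are not decoded.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    return {
        "version": version,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hand-built header: version 2, payload type 111, seq 1000, timestamp 48000.
sample = struct.pack("!BBHII", 0x80, 111, 1000, 48000, 0x1234ABCD) + b"payload"
header = parse_rtp_header(sample)
print(header)
```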
In WebRTC, each media track (audio or video) is sent as one or more RTP streams, identified by SSRC and negotiated via SDP. For example:
- Audio track: one RTP stream using Opus codec (payload type 111)
- Video track: another RTP stream using VP8 (payload type 96)
For simulcasting or scalable video coding (SVC), multiple RTP streams may correspond to one logical video source, distinguished by different SSRCs and signaled via a=simulcast or RID attributes.
Transport Multiplexing
WebRTC endpoints by default multiplex all media over a single transport. Early RTP implementations used separate UDP ports for each stream and RTCP, but WebRTC assumes:
- BUNDLE - multiplexing multiple RTP streams on one transport five-tuple
- RTCP-mux - sending RTCP on the same port as RTP
Negotiated in SDP with:
- a=group:BUNDLE - grouping all media "m=" lines to use one transport
- a=rtcp-mux - no separate RTCP ports
RTCP multiplexing is required of WebRTC endpoints (RFC 8834). By default, if a peer doesn't support rtcp-mux, the call won't proceed, because WebRTC does not fall back to separate RTCP ports.
Because everything is on one UDP flow, demultiplexing is handled internally by inspecting packet data per RFC 7983:
- STUN messages: characteristic magic cookie value
- DTLS: signature bytes
- RTP/RTCP: first-byte value range (version bits), with RTCP told apart from RTP by its packet type range
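The demultiplexing rule can be sketched as a first-byte classifier using the ranges from RFC 7983. The RTP versus RTCP split on the second byte (192 to 223) is the heuristic commonly used with rtcp-mux rather than something RFC 7983 itself defines, and the `classify_packet` name is illustrative.

```python
def classify_packet(data: bytes) -> str:
    """Classify a datagram received on the shared WebRTC UDP flow."""
    if not data:
        return "empty"
    first = data[0]
    if 0 <= first <= 3:
        return "stun"
    if 16 <= first <= 19:
        return "zrtp"
    if 20 <= first <= 63:
        return "dtls"
    if 64 <= first <= 79:
        return "turn-channel"
    if 128 <= first <= 191:
        # RTP and RTCP both start with version 2; RTCP packet types
        # occupy 192..223 in the second byte when rtcp-mux is in use.
        if len(data) > 1 and 192 <= data[1] <= 223:
            return "rtcp"
        return "rtp"
    return "unknown"


samples = {
    "stun binding request": b"\x00\x01\x00\x00",
    "dtls handshake record": b"\x16\xfe\xfd",
    "rtp with payload type 111": b"\x80\x6f",
    "rtcp sender report": b"\x80\xc8",
}
for name, payload in samples.items():
    print(name, "->", classify_packet(payload))
```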
SRTP and Encryption
A critical WebRTC requirement is that all media is secure. WebRTC uses Secure RTP (SRTP, RFC 3711) - RTP with encryption and authentication applied to payload and partially to headers. Keys are established via the DTLS handshake (DTLS-SRTP, RFC 5764).
After the DTLS handshake:
- Media packets protected by SRTP (typically AES encryption + HMAC-SHA1/SHA-256 authentication, or AES-GCM)
- RTCP packets sent as SRTCP (Secure RTCP) with similar protection
- DTLS connection only used for initial handshaking - keys provided to SRTP, DTLS channel kept alive for data channels
Note: While SRTP encryption protects media from eavesdropping, network administrators can still monitor WebRTC quality using tools like VoIPmonitor that decrypt traffic using the server's private TLS key. This enables quality analysis without compromising end-user security.
Media Quality and Codec Considerations
WebRTC audio and video must work under a wide range of network conditions. Several mechanisms are in place:
RTCP Feedback (RFC 4585, 5104):
- PLI (Picture Loss Indication) - request keyframe after video losses
- FIR (Full Intra Request) - similar to PLI
- NACK - signal specific packet losses for retransmission
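- REMB (Receiver Estimated Maximum Bitrate) and Transport-CC (transport-wide congestion control feedback) - give the sender the information it needs for bandwidth estimation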
Congestion Control (RFC 8836):
- Monitor packet loss, delay (RTT), and jitter
- Adjust sending rate using algorithms like Google's GCC
- Reduce bitrate by dropping quality or frame rate
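Of the signals monitored above, jitter has a standard estimator: the interarrival jitter from RFC 3550, a running average updated as J = J + (|D| - J) / 16. The Python sketch below applies it to a list of (arrival time, RTP timestamp) pairs. The function name and sample values are illustrative, and RTP timestamp wraparound is ignored for brevity.

```python
def interarrival_jitter(packets, clock_rate: int) -> float:
    """RFC 3550 interarrival jitter, in RTP timestamp units.

    packets: iterable of (arrival_time_seconds, rtp_timestamp) in arrival order.
    """
    jitter = 0.0
    previous = None
    for arrival_s, rtp_ts in packets:
        arrival_units = arrival_s * clock_rate  # convert arrival time to timestamp units
        if previous is not None:
            # D = difference in arrival spacing minus difference in send spacing
            d = (arrival_units - previous[0]) - (rtp_ts - previous[1])
            jitter += (abs(d) - jitter) / 16.0
        previous = (arrival_units, rtp_ts)
    return jitter


# 20 ms Opus frames at a 48 kHz clock: 960 timestamp units per packet.
steady = [(0.00, 0), (0.02, 960), (0.04, 1920)]
delayed = [(0.00, 0), (0.02, 960), (0.05, 1920)]  # third packet arrives 10 ms late
print(interarrival_jitter(steady, 48000), interarrival_jitter(delayed, 48000))
```

In the delayed case the late packet contributes |D| = 480 units (10 ms at 48 kHz), so a single update moves the estimate to about 480 / 16 = 30 units.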
Error Resilience:
- FEC (Forward Error Correction) - RFC 8854
- RTX (Retransmission) - resend lost packets upon NACK
- Opus built-in FEC for audio
- VP8/VP9 redundancy modes for video
Mandatory and Commonly Supported Codecs:
| Media Type | Codec | Status | Notes |
|---|---|---|---|
| Audio | Opus | Mandatory (RFC 7874) | Primary codec - adaptive bitrate, FEC support, fullband stereo |
| Audio | G.711 (PCMU/PCMA) | Mandatory (RFC 7874) | Legacy interoperability |
| Video | VP8 | Mandatory (RFC 7742) | Royalty-free, widely supported |
| Video | H.264 | Mandatory (RFC 7742) | Constrained Baseline profile, hardware acceleration common |
| Video | VP9 | Optional | Widely supported, better compression than VP8 |
| Video | AV1 | Optional | Newest, best compression, growing support |
Multiple Streams and Identification
For complex scenarios with multiple media streams:
| Attribute | Purpose |
|---|---|
| a=mid | Labels each media section in SDP; RTP header extension carries MID for demuxing bundled flows |
| a=msid | Ties RTP streams to MediaStream IDs and track IDs from web API |
| a=rid | RTP Stream ID (RFC 8851) - labels individual encoding streams in simulcast scenarios |
Quality of Service (QoS)
WebRTC endpoints can mark packets with DSCP (Differentiated Services Code Point) values to indicate priority to routers (RFC 8837). Audio packets are typically marked as high priority, since audio quality is more sensitive to delay than video. However, many networks ignore or rewrite DSCP markings.
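For a non-browser endpoint, DSCP marking usually comes down to setting the IP TOS byte on the sending socket. The Python sketch below shows the conversion (DSCP occupies the upper six bits of the byte) and an optional helper that applies it. Using EF (46) for audio is a common convention assumed here, `mark_socket` is a hypothetical helper, and some operating systems ignore or restrict this socket option.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, commonly used for interactive audio


def dscp_to_tos(dscp: int) -> int:
    """DSCP is the upper 6 bits of the IPv4 TOS byte, so shift left by 2."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the OS to mark outgoing IPv4 packets on this socket (may be ignored)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


print(hex(dscp_to_tos(DSCP_EF)))
```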
Media Transport Quick Reference
| Topic | Key Points |
|---|---|
| Encryption | All media encrypted and authenticated. RTP sent as SRTP using keys from DTLS handshake (DTLS-SRTP per RFC 5764). No option for unencrypted media. |
| Single 5-tuple | One transport for multiplexing all media and data (via BUNDLE and RTCP mux). Conserves resources and simplifies NAT traversal. |
| RTP specifics | Implement RTP/RTCP per RFC 3550/3551. Support RTCP feedback (PLI, FIR, NACK). Audio typically 20ms packets (Opus frame). Lost packets handled via jitter buffers and NACK/RTX. |
| Mandatory codecs | Opus for audio, VP8 and H.264 for video. Modern browsers also support VP9 and AV1. Codec negotiation via SDP offer/answer. |
| Adaptive bitrate | Monitor network conditions and dynamically adjust. May reduce video resolution/frame rate or audio bitrate. |
| RTCP use | Sender Reports (SR) and Receiver Reports (RR) provide statistics (packet counts, loss fraction, jitter). Used for RTT calculation and quality assessment. |
Data Channels (SCTP over DTLS)
In addition to media, WebRTC allows peer-to-peer data channels that can carry arbitrary application data (chat messages, file transfers, game state, etc.). Data channels are built on:
- SCTP (Stream Control Transmission Protocol, RFC 4960) - for reliability and ordering
- DTLS - SCTP runs on top of DTLS
- ICE/UDP - DTLS runs over ICE (or ICE/TCP if needed)
Data channels piggyback on the same secure connection used for media, avoiding separate port or ICE negotiation.
Why SCTP?
SCTP was chosen because it supports:
- Multiple logical streams within a single association
- Configurable ordering (ordered or unordered per stream)
- Configurable reliability (reliable or partially reliable)
One SCTP association is established between peers over DTLS, with up to 65,534 streams available, each representing a separate data channel.
Channel Configuration Options
| Option | Values | Description |
|---|---|---|
| Ordered | true / false | In-order delivery (like TCP) or messages arrive as soon as possible (unordered) |
| Reliable | true / partial / false | Full retransmission (reliable), limited retries (partial), or no retransmission (unreliable like UDP) |
| maxRetransmits | number | Limit number of retransmission attempts (partial reliability) |
| maxPacketLifeTime | milliseconds | Timeout for retries (partial reliability via PR-SCTP extension, RFC 3758) |
Default data channel is ordered and reliable (like TCP), but configurable for each channel.
Establishing Data Channels (DCEP)
The Data Channel Establishment Protocol (DCEP) (RFC 8832) defines control messages for opening channels:
| Message | Purpose |
|---|---|
| DATA_CHANNEL_OPEN | Sent in-band on the SCTP stream that the new channel will use. Includes: label, priority, reliability parameters, optional subprotocol. |
| DATA_CHANNEL_ACK | Response acknowledging channel opening. |
Channels can also be negotiated out-of-band via SDP (RFC 8864), but in-band DCEP negotiation is more common.
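To show how the channel options map onto the wire, here is a Python sketch that encodes and decodes a DATA_CHANNEL_OPEN message using the layout and channel type codes from RFC 8832. The function names are illustrative, and the message would still need to be carried over SCTP with the DCEP payload protocol identifier, which is outside this sketch.

```python
import struct

DATA_CHANNEL_ACK = 0x02
DATA_CHANNEL_OPEN = 0x03


def channel_type(ordered=True, max_retransmits=None, max_packet_lifetime=None):
    """Map API-style options to the (channel type, reliability parameter) pair."""
    if max_retransmits is not None and max_packet_lifetime is not None:
        raise ValueError("set at most one of max_retransmits / max_packet_lifetime")
    if max_retransmits is not None:
        base, param = 0x01, max_retransmits        # partial reliable, retransmit limit
    elif max_packet_lifetime is not None:
        base, param = 0x02, max_packet_lifetime    # partial reliable, lifetime in ms
    else:
        base, param = 0x00, 0                      # fully reliable
    return (base if ordered else base | 0x80), param  # 0x80 marks unordered


def encode_open(label: str, protocol: str = "", priority: int = 0, **options) -> bytes:
    ctype, param = channel_type(**options)
    label_b, proto_b = label.encode(), protocol.encode()
    header = struct.pack("!BBHIHH", DATA_CHANNEL_OPEN, ctype, priority, param,
                         len(label_b), len(proto_b))
    return header + label_b + proto_b


def decode_open(message: bytes) -> dict:
    msg_type, ctype, priority, param, label_len, proto_len = struct.unpack("!BBHIHH", message[:12])
    if msg_type != DATA_CHANNEL_OPEN:
        raise ValueError("not a DATA_CHANNEL_OPEN message")
    label = message[12:12 + label_len].decode()
    protocol = message[12 + label_len:12 + label_len + proto_len].decode()
    return {"channel_type": ctype, "priority": priority, "reliability": param,
            "label": label, "protocol": protocol, "ordered": not ctype & 0x80}


open_msg = encode_open("chat", ordered=False, max_retransmits=0)
print(decode_open(open_msg))
```

An unordered channel with zero retransmissions, as in the usage line, is the configuration that behaves most like plain UDP.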
Integration with SDP
Data channel support indicated in SDP by:
- m= section of type application with protocol DTLS/SCTP or UDP/DTLS/SCTP
- a=sctp-port - SCTP port number (often 5000 or 5001)
- a=max-message-size - support for large messages
Data Channels Quick Reference
| Topic | Key Points |
|---|---|
| DataChannel API | Simple message-based pipes - .send() data and receive 'message' events. All data channels share one SCTP association between peers. |
| Protocols | SCTP (RFC 4960) provides transport with streams. DCEP (RFC 8832) for opening channels within SCTP. All secured by DTLS (SCTP-over-DTLS-over-ICE). |
| Stream independence | Each data channel is one SCTP stream. Loss on one ordered stream doesn't block unordered ones. Up to 65k channels possible. |
| Ordered vs Unordered | Ordered: messages arrive in send order (like TCP). Unordered: messages arrive ASAP, even if earlier ones delayed/lost. |
| Reliable vs Partial | Reliable: retransmit indefinitely (like TCP). Partial: limit retries or time, trading completeness for timeliness. Configured via maxRetransmits or maxPacketLifeTime. |
| Opening handshake | createDataChannel(label, options) → SCTP association initiated → DCEP Open sent → Remote ondatachannel fires → Ack sent → Messages can flow. |
| Use cases | Text chat, file transfer, gaming synchronization, tunneling protocols, P2P CDN. Low-latency direct delivery reduces server load. |
Security and Privacy in WebRTC
Security is a first-class concern in WebRTC's design. The goal is that users can communicate freely without eavesdropping or tampering. Media and data are encrypted on the wire (end-to-end between peers) using DTLS and SRTP.
For enterprises: While WebRTC encryption protects against external threats, organizations often need visibility into their own WebRTC traffic for quality monitoring, troubleshooting, and compliance. Solutions like VoIPmonitor can decrypt and analyze WebRTC calls when configured with the server's private key, providing full CDR and quality metrics without weakening security against external attackers.
Encryption & Authentication
Every WebRTC connection is encrypted such that no third party can decipher the media or data in transit, nor inject malicious packets without detection. Authentication of the remote party is achieved through DTLS certificate fingerprint verification:
- SDP includes fingerprint (SHA-256 hash of DTLS certificate)
- During DTLS handshake, each side verifies certificate matches SDP fingerprint
- If attacker tries MITM, they would have to alter fingerprint and present their own certificate
- Legitimate peer would detect mismatch
This is why secure signaling channel is crucial - if fingerprint and ICE info are delivered accurately, the peer connection is secure.
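The fingerprint check itself is simple to sketch in Python: hash the DER-encoded certificate presented in the DTLS handshake and compare it with the a=fingerprint value received via signaling. The function names and the placeholder certificate bytes are illustrative; a real stack takes the certificate from its DTLS library and may support hash algorithms other than SHA-256.

```python
import hashlib
import hmac


def certificate_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint in SDP notation: uppercase hex pairs joined by colons."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_matches(sdp_line: str, der_bytes: bytes) -> bool:
    """Compare an 'a=fingerprint:sha-256 <hex>' line with the presented certificate."""
    algorithm, expected = sdp_line.split(":", 1)[1].split(" ", 1)
    if algorithm.lower() != "sha-256":
        return False  # this sketch only handles SHA-256
    return hmac.compare_digest(expected.strip().upper(), certificate_fingerprint(der_bytes))


# Placeholder bytes standing in for a DER-encoded certificate.
peer_certificate = b"placeholder certificate bytes"
signaled_line = "a=fingerprint:sha-256 " + certificate_fingerprint(peer_certificate)

print(fingerprint_matches(signaled_line, peer_certificate))        # same certificate
print(fingerprint_matches(signaled_line, b"attacker certificate"))  # substituted certificate
```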
Browser Security Model
- WebRTC only available on secure contexts (HTTPS) in modern browsers
- User permission required for microphone/camera access
- Data channels don't require special permission but page must be trusted context
Privacy: IP Address Exposure
Early WebRTC implementations exposed local IP addresses via ICE candidates, enabling browser fingerprinting. Mitigations:
| Mitigation | Description |
|---|---|
| mDNS obfuscation | Browsers use .local hostnames (UUID.local) instead of IP addresses in host candidates |
| Delayed gathering | No candidates gathered until media or data component authorized |
| Relay-only mode | Enterprise policies can force using only TURN relay candidates |
| Limited API access | Non-HTTPS pages get restricted or no WebRTC access |
Additional Security Features
| Feature | Description |
|---|---|
| Perfect Forward Secrecy | DTLS with modern cipher suites provides PFS - past communications cannot be decrypted even if private keys compromised later |
| Key Continuity | Renegotiation maintains or refreshes DTLS connection. ICE restart triggers new DTLS handshake with new keys. |
| Identity Framework | Optional IdP (Identity Provider) integration for cryptographic peer identity verification |
| DoS Protection | STUN requires ICE credentials, DTLS has handshake backoff, browsers rate-limit operations |
Security & Privacy Quick Reference
| Topic | Key Points |
|---|---|
| Encryption mandatory | No plaintext audio/video ever sent. DTLS and SRTP used. If not encrypted, it's not WebRTC. |
| DTLS Fingerprints | Each peer's certificate SHA-256 fingerprint in SDP ensures talking to right peer (assuming signaling not compromised). |
| Secure signaling | Use TLS for signaling transport and authenticate server. WebRTC doesn't protect offer/answer itself. |
| NAT traversal vs Privacy | Host candidates might reveal local IPs. Mitigated with mDNS and relay-only settings. |
| Browser constraints | HTTPS required. Camera/mic need permission. Data channel doesn't need permission but needs trusted context. |
| Avoiding pitfalls | Don't log SDP unnecessarily (contains IPs). Clean up unused PeerConnections. Use standard WebRTC libraries. |
| Future updates | DTLS 1.3, newer cipher suites (ECDSA, Ed25519), Oblivious relay research. Stay updated with browser releases. |
Monitoring WebRTC with VoIPmonitor
WebRTC's mandatory encryption presents a unique challenge for network monitoring and troubleshooting. Unlike traditional SIP/RTP where traffic can be captured in plaintext, WebRTC requires specialized tools capable of decrypting both the signaling and media layers.
VoIPmonitor WebRTC Capabilities
VoIPmonitor provides comprehensive WebRTC monitoring by decrypting both encryption layers:
| Layer | Protocol | What VoIPmonitor Captures |
|---|---|---|
| Signaling | SIP over WSS (Secure WebSocket) | Call setup, Offer/Answer SDP exchange, ICE candidates, call metadata |
| Media | DTLS-SRTP | RTP streams, audio/video quality metrics, packet loss, jitter, MOS scores |
This enables full visibility into WebRTC calls for:
- Quality monitoring - Track MOS scores, packet loss, jitter, and latency in real-time
- Troubleshooting - Analyze call setup failures, ICE negotiation issues, codec problems
- CDR generation - Generate detailed call records for encrypted WebRTC sessions
- SLA compliance - Monitor voice quality against service level agreements
How It Works
VoIPmonitor decrypts WebRTC traffic by using the PBX's private TLS key. The decryption process works as follows:
- WSS Decryption - VoIPmonitor uses the private key to decrypt TLS-protected WebSocket traffic, revealing the SIP signaling (INVITE, 200 OK, BYE, etc.)
- DTLS-SRTP Key Extraction - From the decrypted SDP, VoIPmonitor extracts DTLS parameters and performs the DTLS handshake to obtain SRTP master keys
- SRTP Decryption - Using the derived keys, VoIPmonitor decrypts the actual audio/video RTP streams for quality analysis
Configuration
To enable WebRTC monitoring, configure /etc/voipmonitor.conf with the SSL module and your PBX's private key:
    ssl = yes
    ssl_ipport = 192.168.1.100:8089 /etc/asterisk/keys/asterisk.key
Where:
- 192.168.1.100:8089 is the IP and port of your PBX's WSS interface
- /etc/asterisk/keys/asterisk.key is the path to the private key file
This configuration works with:
- Asterisk with PJSIP and WebSocket transport
- FreeSWITCH with mod_verto or mod_sofia WebSocket support
- Other PBX systems that use standard WSS for WebRTC signaling
Quality Metrics Available
Once decryption is configured, VoIPmonitor provides the same comprehensive quality metrics for WebRTC as for traditional VoIP:
| Metric | Description |
|---|---|
| MOS (Mean Opinion Score) | Calculated voice quality score (1.0-4.5) |
| Packet Loss | Percentage of lost RTP packets |
| Jitter | Variation in packet arrival times |
| Latency/RTT | Round-trip time measurements from RTCP |
| Codec Detection | Identifies Opus, VP8, H.264, and other WebRTC codecs |
| ICE Candidate Analysis | Tracks which ICE candidate pairs were used (host/srflx/relay) |
For detailed setup instructions, see the VoIPmonitor WebRTC Configuration Guide.
Conclusion
WebRTC brings together a complex set of protocols (SDP for session setup, ICE with STUN/TURN for connectivity, DTLS for security, SRTP for media, SCTP for data) to enable seamless real-time communication between peers. While each component can be intricate, this guide has walked through the big picture: from the initial offer/answer negotiation down to the encrypted packets on the wire.
WebRTC's power lies in abstracting most of this complexity under a simple API, but understanding the underlying protocols (and their configuration via SDP) is crucial for debugging, optimizing, and building interoperable solutions. With this knowledge, one can appreciate how WebRTC achieves what it does: making a web browser (or any endpoint with a WebRTC stack) a full-fledged real-time communicator, armed with the best of both telecom and internet protocols.
References
| RFC/Resource | Description |
|---|---|
| RFC 8825 | WebRTC Overview |
| RFC 8866 | Session Description Protocol (SDP) |
| RFC 8829 | JavaScript Session Establishment Protocol (JSEP) |
| RFC 8445 | Interactive Connectivity Establishment (ICE) |
| RFC 8489 | Session Traversal Utilities for NAT (STUN) |
| RFC 8656 | Traversal Using Relays around NAT (TURN) |
| RFC 8838 | Trickle ICE |
| RFC 3550 | RTP: Real-time Transport Protocol |
| RFC 3711 | Secure RTP (SRTP) |
| RFC 5764 | DTLS-SRTP |
| RFC 4960 | Stream Control Transmission Protocol (SCTP) |
| RFC 8832 | Data Channel Establishment Protocol (DCEP) |
| RFC 8827 | WebRTC Security Architecture |
| RFC 7874 | WebRTC Audio Codec Requirements |
| RFC 7742 | WebRTC Video Codec Requirements |
| WebRTC for the Curious | Comprehensive WebRTC learning resource |