Web Real-Time Communication (WebRTC) is a suite of protocols and APIs enabling real-time audio, video, and data exchange directly between browsers or other peers without requiring an intermediary server for the media path. It was designed to facilitate peer-to-peer (P2P) communication, tackling challenges of NAT traversal, media transport, encryption, and more via a collection of standards (defined in numerous RFCs and W3C specifications). This guide provides a detailed yet approachable overview of WebRTC's signaling mechanisms, network negotiation (ICE/STUN/TURN), media and data transport protocols, and security, all while distilling key points in cheat sheet recaps for easy reference.

Architecture Overview of WebRTC

At a high level, a WebRTC application consists of two communicating WebRTC agents (e.g. browser peers or other WebRTC endpoints) that establish a direct connection to send media (audio/video) and arbitrary data. Unlike traditional client-server connections, WebRTC employs a peer-to-peer model where each side acts as both client and server, negotiating a connection cooperatively. This P2P approach yields benefits in bandwidth usage and latency, because media travels directly between peers rather than through a central server.

To achieve this, WebRTC relies on several building blocks:

Signaling

A mechanism (left to the application) to coordinate session setup by exchanging metadata between peers (namely Session Description Protocol offers/answers and ICE candidates). Signaling is not part of WebRTC's wire protocol but is an essential first step to get both peers on the same page.

Session Description Protocol (SDP)

The format of the session metadata exchanged via signaling. SDP describes the media formats, transport addresses (candidates), and other negotiation parameters needed to establish the connection.

Interactive Connectivity Establishment (ICE)

The process of finding a viable network path between peers, often through NATs. ICE leverages STUN (Session Traversal Utilities for NAT) servers to discover public reflexive addresses and TURN (Traversal Using Relays around NAT) servers as fallback relays if direct peer-to-peer paths cannot be found.

Media Transport (RTP/RTCP)

Once connected, media is sent using the Real-Time Transport Protocol (RTP) with the RTP Control Protocol (RTCP) for feedback. WebRTC mandates secure transport of RTP using DTLS-SRTP (SRTP keys negotiated through a Datagram Transport Layer Security handshake) and typically multiplexes all media on a single network 5-tuple (using mechanisms like BUNDLE).

Data Channels (SCTP)

WebRTC also supports generic data transfer between peers via data channels, which use SCTP (Stream Control Transmission Protocol) layered over the same ICE+DTLS transport. This enables reliable or partially-reliable messaging akin to TCP/UDP but integrated into the peer connection.

Security

Security is baked in at multiple layers. All WebRTC communications are encrypted (DTLS for handshakes and control messages, SRTP for media, and SCTP over DTLS for data channels). Additionally, session descriptions include cryptographic fingerprints to prevent man-in-the-middle attacks, and browser APIs enforce user consent for camera/microphone access and can mask local IP addresses for privacy.

WebRTC Protocol Stack

On each peer, user media (or data) is captured and sent into a PeerConnection API instance. The application's signaling service exchanges SDP offers/answers between peers, which contain ICE information and media parameters. The WebRTC stack gathers ICE candidates (IP/port endpoints) via STUN/TURN, and the ICE protocol finds a working route between the two peers. The peers perform a DTLS handshake over the chosen route to establish keys, then begin exchanging SRTP packets for media and SCTP packets for data, all securely over UDP (or TCP/TLS as fallback).

Signaling and Session Description (SDP)

WebRTC signaling is the coordination process that allows two peers to agree on the parameters of a communication session. Each WebRTC peer initially knows nothing about the other side's capabilities or network address; signaling is the "bootstrapping" that makes the call possible. In practice, signaling involves exchanging a few messages (typically over a web server, WebSocket, or any arbitrary method chosen by the app) containing session descriptions and network candidate information.

Important: WebRTC does not mandate a specific signaling protocol or transport; it can be done via an existing application server, SIP, or any messaging mechanism. The content of these signaling messages, however, is standardized: they consist of Session Description Protocol (SDP) data and related information that both peers can interpret.

Session Description Protocol (SDP)

SDP is a text-based protocol (defined in RFC 8866) for describing multimedia sessions, widely used in WebRTC to negotiate calls. An SDP message is essentially a series of newline-separated key-value lines. It contains a session section with global attributes (protocol version, session ID, etc.) and one or more media descriptions (each describing a media stream like audio or video).

For WebRTC usage, SDP conveys crucial information including:

SDP Element	Description
Media format capabilities	Codecs and their parameters for each media type (e.g. Opus, VP8) via `m=` lines and `a=rtpmap` attributes
Transport parameters	Type of transport (UDP/TLS/RTP/SAVPF for DTLS-keyed secure RTP in WebRTC), indications of multiplexing (`a=mid`, `a=group:BUNDLE`) and RTCP multiplexing (`a=rtcp-mux`)
Network candidates	ICE candidates (lines beginning with `a=candidate`) listing possible IP/port endpoints where each peer can be reached
ICE credentials	Username fragment and password (`a=ice-ufrag` and `a=ice-pwd`) that authenticate STUN connectivity checks between peers
DTLS fingerprint	Cryptographic hash of the local DTLS certificate (`a=fingerprint`) used by remote side to verify the DTLS handshake
Media directions	`a=sendrecv` (or sendonly/recvonly), `a=mid` to identify media sections, `a=msid` to correlate media with MediaStream tracks
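As a rough illustration of how these elements appear on the wire, the sketch below splits an SDP blob into session-level lines and per-media attribute maps. The sample SDP is hand-written for this example (the ufrag, password, and truncated fingerprint are made-up values), and `parse_sdp` is a hypothetical helper, not part of any WebRTC API.

```python
from collections import defaultdict

# Hand-written sample; all credential values are placeholders.
SAMPLE_SDP = """\
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=mid:0
a=ice-ufrag:EsAw
a=ice-pwd:P2uYro0UCOQ4zxjKXaWCBui1
a=fingerprint:sha-256 AA:BB:CC
a=rtcp-mux
a=sendrecv
a=rtpmap:111 opus/48000/2
"""


def parse_sdp(text):
    """Return (session_lines, media_sections) for a simple SDP blob.

    Only a= lines are kept inside media sections; other media-level
    lines (such as c=) are skipped to keep the sketch short.
    """
    session, media = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "m":
            # Every m= line starts a new media section.
            media.append({"m": value, "attrs": defaultdict(list)})
        elif media:
            if key == "a":
                name, _, val = value.partition(":")
                media[-1]["attrs"][name].append(val)
        else:
            session.append((key, value))
    return session, media


session, media = parse_sdp(SAMPLE_SDP)
audio = media[0]
print(audio["m"].split()[2])        # transport profile of the m= line
print(audio["attrs"]["ice-ufrag"])  # ICE username fragment
print(audio["attrs"]["rtpmap"])     # codec mapping for payload type 111
```

Flag-style attributes such as `a=rtcp-mux` end up with an empty string as their value, which is enough to test for their presence.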

WebRTC's use of SDP is constrained and codified by the JavaScript Session Establishment Protocol (JSEP) (RFC 8829, obsoleted by RFC 9429). JSEP defines how the browser's WebRTC API exposes SDP generation and processing to the application. The application drives the offer/answer exchange:

One peer calls createOffer() to generate an SDP offer based on its local media setup
It sets the offer as its local description
It sends the offer to the other peer via the signaling channel
The other peer sets the offer as its remote description
It calls createAnswer() to generate an SDP answer that matches the offer, and sets that answer as its local description
It sends the answer back

The offerer then sets the received answer as its remote description, completing the offer/answer negotiation as described by RFC 3264.
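The exchange can be modeled as a small state machine. The sketch below is a toy model only: `ToyPeer` and its methods are hypothetical names, the "SDP" is a placeholder string, and just the signaling states `stable`, `have-local-offer`, and `have-remote-offer` are tracked. A real application would call the browser's RTCPeerConnection methods instead.

```python
class ToyPeer:
    """Tracks only the offer/answer signaling state, not real SDP or media."""

    def __init__(self, name):
        self.name = name
        self.state = "stable"
        self.local = None
        self.remote = None

    def create_offer(self):
        return {"type": "offer", "sdp": f"offer-from-{self.name}"}

    def create_answer(self):
        if self.state != "have-remote-offer":
            raise RuntimeError("no remote offer to answer")
        return {"type": "answer", "sdp": f"answer-from-{self.name}"}

    def set_local(self, desc):
        if desc["type"] == "offer" and self.state == "stable":
            self.state = "have-local-offer"
        elif desc["type"] == "answer" and self.state == "have-remote-offer":
            self.state = "stable"
        else:
            raise RuntimeError(f"cannot set local {desc['type']} in {self.state}")
        self.local = desc

    def set_remote(self, desc):
        if desc["type"] == "offer" and self.state == "stable":
            self.state = "have-remote-offer"
        elif desc["type"] == "answer" and self.state == "have-local-offer":
            self.state = "stable"
        else:
            raise RuntimeError(f"cannot set remote {desc['type']} in {self.state}")
        self.remote = desc


def negotiate(offerer, answerer):
    """Run one offer/answer round; the signaling channel is a plain function call here."""
    offer = offerer.create_offer()
    offerer.set_local(offer)
    answerer.set_remote(offer)      # delivered via signaling
    answer = answerer.create_answer()
    answerer.set_local(answer)
    offerer.set_remote(answer)      # delivered via signaling


alice, bob = ToyPeer("alice"), ToyPeer("bob")
negotiate(alice, bob)
print(alice.state, bob.state)
```

Modeling the states explicitly makes ordering mistakes visible, for example calling `create_answer()` before the remote offer has been applied.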

Trickle ICE

WebRTC allows candidates to be sent incrementally (called Trickle ICE, RFC 8838) rather than waiting to gather all candidates before sending an SDP offer/answer. In practice, a peer may send an initial SDP offer with some candidates, then send additional ice-candidate messages via the signaling channel as they are discovered. Trickle ICE accelerates connection setup by not delaying the offer/answer exchange.

If trickle ICE is used, the SDP will often include:

A special marker: a=ice-options:trickle
Possibly an empty candidate at end to signal more candidates will come
Eventually, an "end-of-candidates" indication when gathering is complete

Signaling Channel Security

The integrity of the signaling channel is critical: if an attacker were to tamper with the SDP in transit (e.g., altering ICE candidates or the DTLS fingerprint), they could attempt a man-in-the-middle attack. WebRTC's encryption (DTLS/SRTP) ensures confidentiality of media, but it does not automatically secure the signaling path. Thus, applications must protect signaling (typically by using TLS-encrypted transport like HTTPS/WSS and proper authentication) to prevent session hijacking or eavesdropping.

Signaling & SDP Quick Reference

Topic	Key Points
Signaling	Out-of-band process of exchanging control messages (SDPs and ICE candidates) between peers before media can flow. WebRTC does not define how to transport these messages; use any secure method (WebSocket, SIP over WebSocket, etc.).
SDP Offers/Answers	Per RFC 3264, used to negotiate media and connection info. Describes codec capabilities, media types, bandwidth, and includes ICE (NAT traversal) and DTLS (security) info. WebRTC uses SDP as defined in RFC 8866.
JSEP	Browser API follows JSEP: the app calls `createOffer`/`createAnswer` and exchanges the resulting SDP blobs via signaling. Offers flexibility to integrate with any signaling protocol.
ICE info in SDP	Contains `a=candidate` lines for each network candidate (host, STUN reflexive, TURN relay) and matching `ice-ufrag`/`ice-pwd` pair for authentication.
DTLS fingerprint	`a=fingerprint` attribute provides hash of peer's certificate. Each side verifies DTLS handshake uses expected certificate to prevent MITM. Ensure signaling channel is secure.
Trickle ICE	RFC 8838 - send ICE candidates progressively, speeding up connection setup. Offer/answer sent as soon as possible, new candidates signaled as they arrive.

NAT Traversal and ICE (STUN/TURN)

Establishing a direct P2P connection on the internet is challenging because most users are behind NAT (Network Address Translation) devices or firewalls. NATs hide internal IP addresses, meaning a peer's local network address often isn't directly reachable from outside its network. WebRTC tackles this via the Interactive Connectivity Establishment (ICE) framework (RFC 8445).

ICE is essentially a battle-tested method for two peers to find some way to talk to each other, by collecting all possible network addresses each can use and systematically testing combinations to see what works.

Gathering Candidates

Each WebRTC agent gathers a set of candidate addresses that might reach it. There are several types of ICE candidates:

Candidate Type	SDP Type	Description
Host	host	Direct IP addresses of the host's network interfaces (e.g., LAN IP or public IP). Simplest candidates - if both peers are in same LAN or have public IPs, host candidates can work directly. Modern WebRTC may use mDNS hostnames (UUID.local) instead of revealing private IPs for privacy.
Server-Reflexive	srflx	Public NAT-mapped addresses obtained by querying a STUN server. STUN (RFC 8489) asks "What is my IP and port as you see it?" and the server provides the reflexive address. This is the public IP/port the NAT assigned to the peer's outgoing request.
Relay	relay	Addresses from TURN servers (RFC 8656). If direct UDP is blocked (symmetric NATs, firewalls), TURN relays traffic through a server both peers can contact. Fallback when no direct candidate pair succeeds. Adds latency but ensures connectivity.
Peer-Reflexive	prflx	Not gathered upfront but discovered during ICE. When a peer receives a STUN check from an address not in SDP, and communication works, this new viable address becomes a peer-reflexive candidate.
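Each candidate also carries a priority that later drives the order of connectivity checks. The sketch below parses a candidate line and computes the priority with the formula from RFC 8445, using the type preferences that RFC recommends (host 126, prflx 110, srflx 100, relay 0). The helper names and the sample address are illustrative only.

```python
# Type preferences recommended by RFC 8445; implementations may choose others.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    return (
        (TYPE_PREFERENCE[cand_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


def parse_candidate(line):
    """Parse 'candidate:<foundation> <component> <transport> <priority> <ip> <port> typ <type> ...'."""
    body = line.split(":", 1)[1]          # also accepts a leading 'a='
    fields = body.split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("not a well-formed candidate line")
    return {
        "foundation": fields[0],
        "component": int(fields[1]),
        "transport": fields[2].lower(),
        "priority": int(fields[3]),
        "address": fields[4],
        "port": int(fields[5]),
        "type": fields[7],
    }


cand = parse_candidate("a=candidate:1 1 udp 2130706431 192.0.2.10 54321 typ host")
print(cand["type"], cand["priority"] == candidate_priority("host"))
```

With the default local preference, a host candidate for component 1 gets priority 2130706431, which is why that number shows up so often in real SDP.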

Connectivity Checks

After gathering, each peer sends its list of candidates to the other. The two ICE agents pair up candidates (each local candidate with each remote candidate, forming candidate pairs). For each candidate pair, the agents send STUN ping messages (Binding requests) to each other to see if a packet can get through.

These pings include authentication:

Each contains the sender's ICE username fragment
Message integrity hash using the shared password (from SDP)
Recipient validates before responding

This ensures only the intended remote peer can respond to checks and prevents interference from stray STUN traffic.
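To make the authentication concrete, here is a minimal sketch of building a STUN Binding request with USERNAME and MESSAGE-INTEGRITY attributes, following the STUN message layout (20-byte header, magic cookie 0x2112A442, HMAC-SHA1 keyed with the peer's ICE password). It is simplified on purpose: a real ICE check also carries PRIORITY, ICE-CONTROLLING or ICE-CONTROLLED, and FINGERPRINT attributes, which are omitted here. Function names and credential values are made up for the example.

```python
import hashlib
import hmac
import os
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
ATTR_USERNAME = 0x0006
ATTR_MESSAGE_INTEGRITY = 0x0008
MI_ATTR_LEN = 24  # 4-byte attribute header + 20-byte HMAC-SHA1


def _attr(attr_type, value):
    """Encode one attribute, padded to a 4-byte boundary."""
    pad = (-len(value)) % 4
    return struct.pack("!HH", attr_type, len(value)) + value + b"\x00" * pad


def build_binding_request(local_ufrag, remote_ufrag, remote_pwd, txid=None):
    if txid is None:
        txid = os.urandom(12)
    # ICE username: peer's fragment, a colon, then the sender's fragment.
    username = f"{remote_ufrag}:{local_ufrag}".encode()
    attrs = _attr(ATTR_USERNAME, username)
    # The length field must already include MESSAGE-INTEGRITY when the HMAC is computed.
    header = struct.pack(
        "!HHI", BINDING_REQUEST, len(attrs) + MI_ATTR_LEN, MAGIC_COOKIE
    ) + txid
    mac = hmac.new(remote_pwd.encode(), header + attrs, hashlib.sha1).digest()
    return header + attrs + _attr(ATTR_MESSAGE_INTEGRITY, mac)


def verify_integrity(message, password):
    """Check the HMAC, assuming MESSAGE-INTEGRITY is the last attribute."""
    covered, mac = message[:-MI_ATTR_LEN], message[-20:]
    expected = hmac.new(password.encode(), covered, hashlib.sha1).digest()
    return hmac.compare_digest(expected, mac)


msg = build_binding_request("Lfrag", "Rfrag", "remote-password")
print(len(msg), verify_integrity(msg, "remote-password"))
```

A receiver that does not hold the matching password cannot produce or validate the HMAC, which is what keeps stray or spoofed STUN traffic from influencing the checks.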

Pair Selection Process:

ICE checks happen in priority order (host-to-host tried before reflexive or relay)
As soon as bi-directional check succeeds on a pair, it's considered valid
The controlling agent (usually the offerer) picks one of the valid pairs to nominate
Nomination is done by sending a STUN request carrying the USE-CANDIDATE attribute
When controlled side acknowledges, both parties mark that as the selected candidate pair
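The priority ordering in the first step can be sketched with the pair-priority formula from RFC 8445, where G is the controlling agent's candidate priority and D is the controlled agent's. The example is simplified: it pairs every local candidate with every remote one and ignores components, address families, and pruning. The candidate lists are invented for illustration.

```python
from itertools import product


def pair_priority(g, d):
    """RFC 8445: 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def build_checklist(controlling, controlled):
    """Return candidate pairs sorted from highest to lowest pair priority.

    Each candidate is a (type, priority) tuple; 'controlling' holds the
    controlling agent's candidates and 'controlled' the other agent's.
    """
    pairs = list(product(controlling, controlled))
    return sorted(pairs, key=lambda p: pair_priority(p[0][1], p[1][1]), reverse=True)


# Priorities computed with the RFC 8445 recommended type preferences.
local = [("host", 2130706431), ("srflx", 1694498815), ("relay", 16777215)]
remote = [("host", 2130706431), ("srflx", 1694498815), ("relay", 16777215)]

for (ltype, _), (rtype, _) in build_checklist(local, remote):
    print(f"{ltype:>5} -> {rtype}")
```

Because the formula is dominated by the lower of the two priorities, any pair that involves a relay candidate sorts below every pair that does not.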

All of this happens quickly, typically in a few hundred milliseconds to a couple seconds. ICE is robust: if the initially selected path fails, ICE can retry or restart to re-establish connectivity.

ICE and Networking Requirements

Requirement	Description
Full ICE	WebRTC requires full ICE (not ICE-Lite) - both sides perform checks and both can be behind NATs
IPv4/IPv6	Endpoints must handle both IPv4 and IPv6 candidates, trying IPv6 when possible
Happy Eyeballs	RFC 8421 approach for dual-stack scenarios to favor whatever works fastest
TCP Fallback	If UDP is blocked entirely, ICE can fall back to ICE-TCP or TURN-TCP (on port 443)

NAT Traversal Quick Reference

Topic	Key Points
ICE	RFC 8445 - protocol WebRTC uses to establish P2P connection through NATs and firewalls. Each peer gathers all possible candidates and exchanges them, then systematically tests connectivity.
STUN	RFC 8489 - used to: (1) get server-reflexive candidates by asking "what is my IP?", (2) perform ICE connectivity checks (peers send STUN binding requests to test paths). Simple request/response over UDP with "magic cookie" and transaction ID.
TURN	RFC 8656 - provides relay candidates when direct UDP isn't possible. Peer connects to TURN server (over UDP or TCP/TLS) and obtains relayed address. Ensures connectivity at cost of higher latency. WebRTC endpoints must support TURN.
ICE Checks	Peers authenticate STUN checks using ICE username/password from SDP. Controlling peer nominates candidate pair once good path found.
ICE States	Transitions: checking → connected → completed. If connectivity lost, can go to failed state. Renegotiation can restart ICE with new candidates.
Privacy	mDNS for hostnames instead of local IPs in SDP. Initial checks are authenticated STUN.

Secure Media Transport (RTP/SRTP, RTCP, and Media Parameters)

Once signaling and ICE have done their jobs, the peers have a direct line of communication. For media (audio and video), WebRTC uses the established protocols from the real-time media world: RTP (Real-time Transport Protocol) for media streams and RTCP (RTP Control Protocol) for periodic feedback and control information.

RTP and Media Streams

RTP (RFC 3550) provides the basic framing and metadata for media frames in transit:

RTP Field	Purpose
Timestamp	Timing information for synchronization
Sequence Number	Detect packet loss and reordering
Payload Type	Format indicator (codec identification)
SSRC	Synchronization Source identifier - unique per RTP stream
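These fields all live in the 12-byte fixed RTP header defined by RFC 3550, so they can be read with a few bit operations. The following is a minimal sketch; `parse_rtp_header` is a hypothetical helper, and it does not decode CSRC lists or header extensions.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    return {
        "version": version,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# A hand-built packet: version 2, payload type 111, followed by dummy payload bytes.
sample = struct.pack("!BBHII", 0x80, 111, 1000, 48000, 0xDEADBEEF) + b"\x00" * 4
print(parse_rtp_header(sample))
```

SRTP encrypts the payload but leaves this header in the clear, which is why sequence numbers, timestamps, and SSRCs remain visible to passive monitoring.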

In WebRTC, each media track (audio or video) is sent as one or more RTP streams, identified by SSRC and negotiated via SDP. For example:

Audio track: one RTP stream using Opus codec (payload type 111)
Video track: another RTP stream using VP8 (payload type 96)

For simulcasting or scalable video coding (SVC), multiple RTP streams may correspond to one logical video source, distinguished by different SSRCs and signaled via a=simulcast or RID attributes.

Transport Multiplexing

WebRTC endpoints by default multiplex all media over a single transport. Early RTP implementations used separate UDP ports for each stream and RTCP, but WebRTC assumes:

BUNDLE - multiplexing multiple RTP streams on one transport five-tuple
RTCP-mux - sending RTCP on the same port as RTP

Negotiated in SDP with:

a=group:BUNDLE - grouping all media "m=" lines to use one transport
a=rtcp-mux - no separate RTCP ports

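RFC 8834 requires WebRTC endpoints to support rtcp-mux, and an endpoint that has no non-mux fallback can signal this with a=rtcp-mux-only (RFC 8858). In that case, if the peer doesn't support rtcp-mux, the call won't proceed.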

Because everything is on one UDP flow, demultiplexing is handled internally by inspecting packet data per RFC 7983:

STUN messages: characteristic magic cookie value
DTLS: signature bytes
RTP/RTCP: payload type ranges
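In practice the demultiplexing decision is made on the first byte of each datagram, using the ranges from RFC 7983, with the RTP/RTCP split following RFC 5761. A minimal sketch, with a hypothetical `classify_packet` helper:

```python
def classify_packet(data):
    """Classify a datagram received on the shared transport (RFC 7983 ranges)."""
    if not data:
        return "unknown"
    first = data[0]
    if first <= 3:
        return "stun"
    if 20 <= first <= 63:
        return "dtls"
    if 64 <= first <= 79:
        return "turn-channel"
    if 128 <= first <= 191:
        # RTP and RTCP share this range. RFC 5761 keeps RTP payload types
        # out of 64-95 so the second byte can tell the two apart.
        if len(data) > 1 and 64 <= (data[1] & 0x7F) <= 95:
            return "rtcp"
        return "rtp"
    return "unknown"


print(classify_packet(bytes([0x00, 0x01])))   # STUN Binding request
print(classify_packet(bytes([22, 254, 253])))  # DTLS handshake record
print(classify_packet(bytes([0x80, 111])))     # RTP, payload type 111
print(classify_packet(bytes([0x80, 200])))     # RTCP sender report (type 200)
```

The STUN magic cookie mentioned above is an additional sanity check that implementations can apply after this first-byte test.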

SRTP and Encryption

A critical WebRTC requirement is that all media is secure. WebRTC uses Secure RTP (SRTP, RFC 3711) - RTP with encryption and authentication applied to payload and partially to headers. Keys are established via the DTLS handshake (DTLS-SRTP, RFC 5764).

After the DTLS handshake:

Media packets protected by SRTP (typically AES encryption + HMAC-SHA1/SHA-256 authentication, or AES-GCM)
RTCP packets sent as SRTCP (Secure RTCP) with similar protection
DTLS connection only used for initial handshaking - keys provided to SRTP, DTLS channel kept alive for data channels

Note: While SRTP encryption protects media from eavesdropping, network administrators can still monitor WebRTC quality using tools like VoIPmonitor that decrypt traffic using the server's private TLS key. This enables quality analysis without compromising end-user security.

Media Quality and Codec Considerations

WebRTC audio and video must work under a wide range of network conditions. Several mechanisms are in place:

RTCP Feedback (RFC 4585, 5104):

PLI (Picture Loss Indication) - request keyframe after video losses
FIR (Full Intra Request) - similar to PLI
NACK - signal specific packet losses for retransmission
REMB/Transport-CC - receiver estimated maximum bitrate

Congestion Control (RFC 8836):

Monitor packet loss, delay (RTT), and jitter
Adjust sending rate using algorithms like Google's GCC
Reduce bitrate by dropping quality or frame rate
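One of the inputs listed above, jitter, has a standard definition in RFC 3550: a running average of the change in transit time between consecutive packets, smoothed with a gain of 1/16. The sketch below implements that estimator under simplifying assumptions (no RTP timestamp wraparound, arrival times given in seconds). It is not a congestion controller such as GCC, only the measurement that such algorithms can consume.

```python
def interarrival_jitter(packets, clock_rate):
    """RFC 3550 jitter estimate, in RTP timestamp units.

    packets: iterable of (arrival_time_seconds, rtp_timestamp) in arrival order.
    clock_rate: RTP clock rate of the stream (e.g. 48000 for Opus).
    """
    jitter = 0.0
    prev_transit = None
    for arrival, rtp_ts in packets:
        # Transit time expressed in the same units as the RTP timestamp.
        transit = arrival * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


# 20 ms Opus frames at 48 kHz; the third packet arrives 10 ms late.
trace = [(0.00, 0), (0.02, 960), (0.05, 1920), (0.06, 2880)]
j = interarrival_jitter(trace, 48000)
print(f"{j:.3f} timestamp units = {1000 * j / 48000:.3f} ms")
```

Dividing the result by the clock rate converts it to seconds, which is the form usually shown in monitoring tools.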

Error Resilience:

FEC (Forward Error Correction) - RFC 8854
RTX (Retransmission) - resend lost packets upon NACK
Opus built-in FEC for audio
VP8/VP9 redundancy modes for video

Mandatory and Common Codecs:

Media Type	Codec	Status	Notes
Audio	Opus	Mandatory (RFC 7874)	Primary codec - adaptive bitrate, FEC support, fullband stereo
Audio	G.711 (PCMU/PCMA)	Mandatory (RFC 7874)	Legacy interoperability
Video	VP8	Mandatory (RFC 7742)	Royalty-free, widely supported
Video	H.264	Mandatory (RFC 7742)	Constrained Baseline profile, hardware acceleration common
Video	VP9	Optional	Widely supported, better compression than VP8
Video	AV1	Optional	Newest, best compression, growing support

Multiple Streams and Identification

For complex scenarios with multiple media streams:

Attribute	Purpose
`a=mid`	Labels each media section in SDP; RTP header extension carries MID for demuxing bundled flows
`a=msid`	Ties RTP streams to MediaStream IDs and track IDs from web API
`a=rid`	RTP Stream ID (RFC 8851) - labels individual encoding streams in simulcast scenarios

Quality of Service (QoS)

WebRTC endpoints can mark packets with DSCP (Differentiated Services Code Point) values to indicate priority to routers (RFC 8837). Audio packets are typically marked as higher priority because audio quality is more sensitive to delay than video. However, many networks ignore DSCP markings.

Media Transport Quick Reference

Topic	Key Points
Encryption	All media encrypted and authenticated. RTP sent as SRTP using keys from DTLS handshake (DTLS-SRTP per RFC 5764). No option for unencrypted media.
Single 5-tuple	One transport for multiplexing all media and data (via BUNDLE and RTCP mux). Conserves resources and simplifies NAT traversal.
RTP specifics	Implement RTP/RTCP per RFC 3550/3551. Support RTCP feedback (PLI, FIR, NACK). Audio typically 20ms packets (Opus frame). Lost packets handled via jitter buffers and NACK/RTX.
Mandatory codecs	Opus for audio, VP8 and H.264 for video. Modern browsers also support VP9 and AV1. Codec negotiation via SDP offer/answer.
Adaptive bitrate	Monitor network conditions and dynamically adjust. May reduce video resolution/frame rate or audio bitrate.
RTCP use	Sender Reports (SR) and Receiver Reports (RR) provide statistics (packet counts, loss fraction, jitter). Used for RTT calculation and quality assessment.

Data Channels (SCTP over DTLS)

In addition to media, WebRTC allows peer-to-peer data channels that can carry arbitrary application data (chat messages, file transfers, game state, etc.). Data channels are built on:

SCTP (Stream Control Transmission Protocol, RFC 4960) - for reliability and ordering
DTLS - SCTP runs on top of DTLS
ICE/UDP - DTLS runs over ICE (or ICE/TCP if needed)

Data channels piggyback on the same secure connection used for media, avoiding separate port or ICE negotiation.

Why SCTP?

SCTP was chosen because it supports:

Multiple logical streams within a single association
Configurable ordering (ordered or unordered per stream)
Configurable reliability (reliable or partially reliable)

One SCTP association is established between peers over DTLS, with up to 65,534 streams available, each representing a separate data channel.

Channel Configuration Options

Option	Values	Description
Ordered	true / false	In-order delivery (like TCP) or messages arrive as soon as possible (unordered)
Reliable	true / partial / false	Full retransmission (reliable), limited retries (partial), or no retransmission (unreliable like UDP)
maxRetransmits	number	Limit number of retransmission attempts (partial reliability)
maxPacketLifeTime	milliseconds	Timeout for retries (partial reliability via PR-SCTP extension, RFC 3758)

Default data channel is ordered and reliable (like TCP), but configurable for each channel.

Establishing Data Channels (DCEP)

The Data Channel Establishment Protocol (DCEP) (RFC 8832) defines control messages for opening channels:

Message	Purpose
DATA_CHANNEL_OPEN	Sent in-band on the SCTP stream that the new channel will use. Includes: label, priority, reliability parameters, optional subprotocol.
DATA_CHANNEL_ACK	Response acknowledging channel opening.
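As an illustration of what these messages look like, the sketch below encodes a DATA_CHANNEL_OPEN message using the field layout and channel-type codes from RFC 8832, and maps the ordered/reliability options to the corresponding channel type. The function names are hypothetical, and the default priority of 256 is simply the "normal" value suggested in that RFC.

```python
import struct

DATA_CHANNEL_ACK = 0x02
DATA_CHANNEL_OPEN = 0x03


def channel_type(ordered=True, max_retransmits=None, max_packet_life_time=None):
    """Map channel options to (channel_type, reliability_parameter)."""
    if max_retransmits is not None and max_packet_life_time is not None:
        raise ValueError("set at most one of max_retransmits / max_packet_life_time")
    if max_retransmits is not None:
        base, param = 0x01, max_retransmits        # partial reliable, retransmit count
    elif max_packet_life_time is not None:
        base, param = 0x02, max_packet_life_time   # partial reliable, lifetime in ms
    else:
        base, param = 0x00, 0                      # fully reliable
    # The high bit marks an unordered channel.
    return (base if ordered else base | 0x80), param


def encode_open(label, protocol="", priority=256, ordered=True,
                max_retransmits=None, max_packet_life_time=None):
    """Encode DATA_CHANNEL_OPEN: type, channel type, priority, reliability
    parameter, label length, protocol length, then label and protocol."""
    ctype, param = channel_type(ordered, max_retransmits, max_packet_life_time)
    label_b = label.encode("utf-8")
    proto_b = protocol.encode("utf-8")
    header = struct.pack("!BBHIHH", DATA_CHANNEL_OPEN, ctype, priority,
                         param, len(label_b), len(proto_b))
    return header + label_b + proto_b


reliable = encode_open("chat")
lossy = encode_open("game-state", ordered=False, max_retransmits=0)
print(reliable.hex())
print(hex(lossy[1]))
```

The acknowledgement is much simpler: a DATA_CHANNEL_ACK message is just the single byte 0x02.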

Channels can also be negotiated out-of-band via SDP (RFC 8864), but in-band DCEP negotiation is more common.

Integration with SDP

Data channel support indicated in SDP by:

m= section of type application with protocol DTLS/SCTP or UDP/DTLS/SCTP
a=sctp-port - SCTP port number (often 5000 or 5001)
a=max-message-size - support for large messages

Data Channels Quick Reference

Topic	Key Points
DataChannel API	Simple message-based pipes - `.send()` data and receive 'message' events. All data channels share one SCTP association between peers.
Protocols	SCTP (RFC 4960) provides transport with streams. DCEP (RFC 8832) for opening channels within SCTP. All secured by DTLS (SCTP-over-DTLS-over-ICE).
Stream independence	Each data channel is one SCTP stream. Loss on one ordered stream doesn't block unordered ones. Up to 65k channels possible.
Ordered vs Unordered	Ordered: messages arrive in send order (like TCP). Unordered: messages arrive ASAP, even if earlier ones delayed/lost.
Reliable vs Partial	Reliable: retransmit indefinitely (like TCP). Partial: limit retries or time, trading completeness for timeliness. Configured via `maxRetransmits` or `maxPacketLifeTime`.
Opening handshake	`createDataChannel(label, options)` → SCTP association initiated → DCEP Open sent → Remote `ondatachannel` fires → Ack sent → Messages can flow.
Use cases	Text chat, file transfer, gaming synchronization, tunneling protocols, P2P CDN. Low-latency direct delivery reduces server load.

Security and Privacy in WebRTC

Security is a first-class concern in WebRTC's design. The goal is that users can communicate freely without eavesdropping or tampering. Media and data are encrypted on the wire (end-to-end between peers) using DTLS and SRTP.

For enterprises: While WebRTC encryption protects against external threats, organizations often need visibility into their own WebRTC traffic for quality monitoring, troubleshooting, and compliance. Solutions like VoIPmonitor can decrypt and analyze WebRTC calls when configured with the server's private key, providing full CDR and quality metrics without weakening security against external attackers.

Encryption & Authentication

Every WebRTC connection is encrypted such that no third party can decipher the media or data in transit, nor inject malicious packets without detection. Authentication of the remote party is achieved through DTLS certificate fingerprint verification:

SDP includes fingerprint (SHA-256 hash of DTLS certificate)
During DTLS handshake, each side verifies certificate matches SDP fingerprint
If attacker tries MITM, they would have to alter fingerprint and present their own certificate
Legitimate peer would detect mismatch

This is why secure signaling channel is crucial - if fingerprint and ICE info are delivered accurately, the peer connection is secure.
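The fingerprint check itself is simple to express. The sketch below formats a SHA-256 digest the way it appears in an `a=fingerprint` line and compares it against the certificate presented in the handshake. The byte strings stand in for DER-encoded certificates and are placeholders; a real stack would obtain the certificate from its DTLS library.

```python
import hashlib
import hmac


def sdp_fingerprint(cert_der):
    """Return 'sha-256 AA:BB:...' for the given DER-encoded certificate bytes."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    pairs = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return "sha-256 " + ":".join(pairs)


def fingerprint_matches(sdp_value, presented_cert_der):
    """Compare the fingerprint from SDP with the certificate seen in DTLS."""
    algorithm, _, expected = sdp_value.partition(" ")
    if algorithm.lower() != "sha-256":
        raise ValueError(f"unsupported hash in this sketch: {algorithm}")
    actual = sdp_fingerprint(presented_cert_der).partition(" ")[2]
    return hmac.compare_digest(expected.upper(), actual)


# Placeholder bytes standing in for real certificates.
peer_cert = b"placeholder certificate of the legitimate peer"
attacker_cert = b"placeholder certificate of an attacker"

signaled = sdp_fingerprint(peer_cert)              # what arrives via signaling
print(fingerprint_matches(signaled, peer_cert))      # handshake with the real peer
print(fingerprint_matches(signaled, attacker_cert))  # handshake with a MITM
```

The check is only as trustworthy as the channel that delivered `signaled`: an attacker who can rewrite the SDP can also substitute a fingerprint of their own certificate.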

Browser Security Model

WebRTC only available on secure contexts (HTTPS) in modern browsers
User permission required for microphone/camera access
Data channels don't require special permission but page must be trusted context

Privacy: IP Address Exposure

Early WebRTC implementations exposed local IP addresses via ICE candidates, enabling browser fingerprinting. Mitigations:

Mitigation	Description
mDNS obfuscation	Browsers use .local hostnames (UUID.local) instead of IP addresses in host candidates
Delayed gathering	No candidates gathered until media or data component authorized
Relay-only mode	Enterprise policies can force using only TURN relay candidates
Limited API access	Non-HTTPS pages get restricted or no WebRTC access

Additional Security Features

Feature	Description
Perfect Forward Secrecy	DTLS with modern cipher suites provides PFS - past communications cannot be decrypted even if private keys compromised later
Key Continuity	Renegotiation maintains or refreshes DTLS connection. ICE restart triggers new DTLS handshake with new keys.
Identity Framework	Optional IdP (Identity Provider) integration for cryptographic peer identity verification
DoS Protection	STUN requires ICE credentials, DTLS has handshake backoff, browsers rate-limit operations

Security & Privacy Quick Reference

Topic	Key Points
Encryption mandatory	No plaintext audio/video ever sent. DTLS and SRTP used. If not encrypted, it's not WebRTC.
DTLS Fingerprints	Each peer's certificate SHA-256 fingerprint in SDP ensures talking to right peer (assuming signaling not compromised).
Secure signaling	Use TLS for signaling transport and authenticate server. WebRTC doesn't protect offer/answer itself.
NAT traversal vs Privacy	Host candidates might reveal local IPs. Mitigated with mDNS and relay-only settings.
Browser constraints	HTTPS required. Camera/mic need permission. Data channel doesn't need permission but needs trusted context.
Avoiding pitfalls	Don't log SDP unnecessarily (contains IPs). Clean up unused PeerConnections. Use standard WebRTC libraries.
Future updates	DTLS 1.3, newer cipher suites (ECDSA, Ed25519), Oblivious relay research. Stay updated with browser releases.

Monitoring WebRTC with VoIPmonitor

WebRTC's mandatory encryption presents a unique challenge for network monitoring and troubleshooting. Unlike traditional SIP/RTP where traffic can be captured in plaintext, WebRTC requires specialized tools capable of decrypting both the signaling and media layers.

VoIPmonitor WebRTC Capabilities

VoIPmonitor provides comprehensive WebRTC monitoring by decrypting both encryption layers:

Layer	Protocol	What VoIPmonitor Captures
Signaling	SIP over WSS (Secure WebSocket)	Call setup, Offer/Answer SDP exchange, ICE candidates, call metadata
Media	DTLS-SRTP	RTP streams, audio/video quality metrics, packet loss, jitter, MOS scores

This enables full visibility into WebRTC calls for:

Quality monitoring - Track MOS scores, packet loss, jitter, and latency in real-time
Troubleshooting - Analyze call setup failures, ICE negotiation issues, codec problems
CDR generation - Generate detailed call records for encrypted WebRTC sessions
SLA compliance - Monitor voice quality against service level agreements

How It Works

VoIPmonitor decrypts WebRTC traffic by using the PBX's private TLS key. The decryption process works as follows:

WSS Decryption - VoIPmonitor uses the private key to decrypt TLS-protected WebSocket traffic, revealing the SIP signaling (INVITE, 200 OK, BYE, etc.)
DTLS-SRTP Key Extraction - From the decrypted SDP, VoIPmonitor extracts DTLS parameters and performs the DTLS handshake to obtain SRTP master keys
SRTP Decryption - Using the derived keys, VoIPmonitor decrypts the actual audio/video RTP streams for quality analysis

Configuration

To enable WebRTC monitoring, configure /etc/voipmonitor.conf with the SSL module and your PBX's private key:

ssl = yes
ssl_ipport = 192.168.1.100:8089 /etc/asterisk/keys/asterisk.key

Where:

192.168.1.100:8089 is the IP and port of your PBX's WSS interface
/etc/asterisk/keys/asterisk.key is the path to the private key file

This configuration works with:

Asterisk with PJSIP and WebSocket transport
FreeSWITCH with mod_verto or mod_sofia WebSocket support
Other PBX systems that use standard WSS for WebRTC signaling

WebRTC Monitoring Without SIP Signaling (Pure DTLS-SRTP)

Some WebRTC deployments do not use SIP signaling at all. In these cases, VoIPmonitor cannot create CDRs through normal SIP processing because there are no INVITE, 200 OK, or BYE messages to detect call setup and teardown.

Symptoms of Missing SIP Signaling:

WebRTC RTP streams are visible in packet captures but do not appear as CDRs in the GUI
No calls are recorded despite seeing encrypted traffic on the network
WebRTC server uses a single UDP port for all media (BUNDLE multiplexing) without any SIP SDP exchange

Solution: Use --rtp-no-sig Command-Line Flag

The --rtp-no-sig flag enables VoIPmonitor to create CDRs based entirely on RTP stream detection, bypassing the need for SIP signaling analysis.

Method	When to Use	Configuration
--rtp-no-sig	WebRTC without SIP, pure RTP monitoring, or when SIP signaling is not available for analysis	Add to VoIPmonitor sniffer startup arguments (not in voipmonitor.conf)

Configuring --rtp-no-sig:

This flag is a command-line parameter for the voipmonitor sniffer binary. Add it to your service startup configuration:

For systemd (e.g., /etc/systemd/system/voipmonitor.service or via systemctl edit voipmonitor):

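[Service]
# In a drop-in created with "systemctl edit", an empty ExecStart= clears the inherited command first
ExecStart=
ExecStart=/usr/local/sbin/voipmonitor --rtp-no-sig -d -c /etc/voipmonitor.conf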

For init scripts (/etc/init.d/voipmonitor), add the flag to the ARGS or DAEMON_ARGS variable:

ARGS="--rtp-no-sig -d -c /etc/voipmonitor.conf"

Then restart the service:

systemctl daemon-reload
systemctl restart voipmonitor

Note: The --rtp-no-sig flag is not a configuration file option. It must be passed as a command-line argument to the voipmonitor binary at startup.

Combining with DTLS Decryption:

When using --rtp-no-sig with WebRTC, you can still configure DTLS-SRTP decryption using either:

Method 1: Private Key Decryption - Works for TLS 1.2 without Perfect Forward Secrecy (see Tls). Configuration:

ssl = yes
ssl_ipport = 192.168.1.100:8089 /etc/asterisk/keys/asterisk.key

Method 2: SSL Key Logger (Recommended) - Universal solution for all ciphers including PFS. Install the sslkeylog.so library on the WebRTC server (see Tls for detailed instructions). The keylogger extracts DTLS keys and sends them to VoIPmonitor.

QoS Metrics Without Decryption:

Even without DTLS decryption (when using --rtp-no-sig without SSL configuration), VoIPmonitor can collect certain QoS metrics from encrypted RTP streams:

Packet loss percentage
Jitter (delay variation)
Latency/RTT measurements (from RTCP if not encrypted)
Codec detection (from unencrypted RTP headers)

These metrics provide visibility into network quality even when the media content cannot be decrypted for audio replay.
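Because SRTP leaves the RTP header unencrypted, loss can be estimated from sequence numbers alone. The sketch below follows the expected-versus-received approach described in RFC 3550, including 16-bit sequence wraparound. It is an illustration of the idea only and is not claimed to be how VoIPmonitor computes its statistics; duplicates are counted as received packets for simplicity.

```python
def loss_percentage(seqs):
    """Estimate packet loss (%) from RTP sequence numbers in arrival order."""
    if not seqs:
        return 0.0
    base = seqs[0]
    max_seq = base
    cycles = 0          # counts completed 16-bit wraparounds, in units of 65536
    received = 0
    for s in seqs:
        received += 1
        if s < max_seq and max_seq - s > 0x8000:
            # Large backwards jump: the sequence number wrapped around.
            cycles += 0x10000
            max_seq = s
        elif s > max_seq and s - max_seq < 0x8000:
            max_seq = s
        # Anything else is a late or reordered packet: counted, but not a new maximum.
    expected = cycles + max_seq - base + 1
    lost = max(expected - received, 0)
    return 100.0 * lost / expected


# Sequence wraps from 65535 to 0, and packet 2 never arrives.
print(round(loss_percentage([65533, 65534, 65535, 0, 1, 3]), 2))
```

The same header fields (sequence number, timestamp, SSRC) are what make jitter and codec detection possible without access to the SRTP keys.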

Quality Metrics Available

Once decryption is configured, VoIPmonitor provides the same comprehensive quality metrics for WebRTC as for traditional VoIP:

Metric	Description
MOS (Mean Opinion Score)	Calculated voice quality score (1.0-4.5)
Packet Loss	Percentage of lost RTP packets
Jitter	Variation in packet arrival times
Latency/RTT	Round-trip time measurements from RTCP
Codec Detection	Identifies Opus, VP8, H.264, and other WebRTC codecs
ICE Candidate Analysis	Tracks which ICE candidate pairs were used (host/srflx/relay)
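
The 1.0-4.5 range in the table matches the scale produced by the ITU-T G.107 E-model, which maps a transmission rating factor R to an estimated MOS. The source does not state how VoIPmonitor derives its MOS values, so the sketch below shows only the standard R-to-MOS mapping as an illustration of why the scale tops out at 4.5. The function name is hypothetical.

```python
def mos_from_r(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The polynomial dips slightly below 1.0 for very small R, so clamp it.
    return max(1.0, mos)


if __name__ == "__main__":
    for r in (50, 70, 80, 93.2):
        print(r, round(mos_from_r(r), 2))
```

In practice, R is reduced from its default by impairments such as delay, packet loss, and codec distortion, which is how loss and jitter measurements feed into a MOS estimate.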

For detailed setup instructions, see the VoIPmonitor WebRTC Configuration Guide.

Conclusion

WebRTC brings together a complex set of protocols (SDP for session setup, ICE with STUN/TURN for connectivity, DTLS for security, SRTP for media, SCTP for data) to enable seamless real-time communication between peers. While each component can be intricate, this guide has walked through the big picture: from the initial offer/answer negotiation down to the encrypted packets on the wire.

WebRTC's power lies in abstracting most of this complexity under a simple API, but understanding the underlying protocols (and their configuration via SDP) is crucial for debugging, optimizing, and building interoperable solutions. With this knowledge, one can appreciate how WebRTC achieves what it does: making a web browser (or any endpoint with a WebRTC stack) a full-fledged real-time communicator, armed with the best of both telecom and internet protocols.

References

RFC/Resource	Description
RFC 8825	WebRTC Overview
RFC 8866	Session Description Protocol (SDP)
RFC 8829	JavaScript Session Establishment Protocol (JSEP)
RFC 8445	Interactive Connectivity Establishment (ICE)
RFC 8489	Session Traversal Utilities for NAT (STUN)
RFC 8656	Traversal Using Relays around NAT (TURN)
RFC 8838	Trickle ICE
RFC 3550	RTP: Real-time Transport Protocol
RFC 3711	Secure RTP (SRTP)
RFC 5764	DTLS-SRTP
RFC 4960	Stream Control Transmission Protocol (SCTP)
RFC 8832	Data Channel Establishment Protocol (DCEP)
RFC 8827	WebRTC Security Architecture
RFC 7874	WebRTC Audio Codec Requirements
RFC 7742	WebRTC Video Codec Requirements
WebRTC for the Curious	Comprehensive WebRTC learning resource

AI Summary for RAG

Summary: Comprehensive guide to WebRTC protocol architecture including SDP, ICE (STUN/TURN), DTLS-SRTP encryption, RTP/RTCP media transport, and SCTP data channels. Covers P2P connection establishment, NAT traversal, security with certificate fingerprints, packet multiplexing (BUNDLE, RTCP-mux) per RFC 7983, and protocol interactions. Explains VoIPmonitor integration for WebRTC monitoring: standard SIP-over-WSS deployment with SSL decryption (Method 1: Private Key in /etc/voipmonitor.conf via ssl_ipport=IP:PORT /path/to/key or GUI Settings > Sensors; Method 2: SSL Key Logger via Tls with LD_PRELOAD injecting sslkeylog.so). Critical for WebRTC without SIP signaling: use the --rtp-no-sig command-line flag to create CDRs based on RTP streams only. This flag is not a voipmonitor.conf option; add it to ExecStart in systemd or to ARGS in init scripts. It is required when WebRTC servers use a single UDP port (BUNDLE) without SIP/SDP transport or when monitoring pure DTLS-SRTP media. Combine it with DTLS decryption (private key or SSL Key Logger) for audio replay; without decryption, QoS metrics are still available: packet loss and jitter from the unencrypted RTP headers, and latency from RTCP where it is not encrypted.

Keywords: webrtc, dtls-srtp, ice, stun, turn, sdp, bundle, rtcp-mux, rfc-7983, signaling, jsep, sctp, data channels, websockets, wss, certificate fingerprint, ssl decryption, ssl_ipport, rtp-no-sig, --rtp-no-sig, no signaling, cdr without sip, qos without decryption, pure rtp monitoring

Key Questions:

How does VoIPmonitor monitor WebRTC calls with SIP signaling?
How do I configure SSL/TLS decryption for WebRTC?
VoIPmonitor cannot capture WebRTC RTP streams when the server uses a single UDP port for all streams and relies on STUN packets for separation. What do I do?
How do I monitor WebRTC traffic that does not use SIP signaling at all?
What is the --rtp-no-sig flag and when should I use it?
How does WebRTC multiplex multiple media streams on a single UDP port?
What are the differences between STUN, TURN, and ICE in WebRTC?
How does DTLS-SRTP encryption work in WebRTC?
How do I extract SSL keys from a WebRTC server for monitoring?
Can VoIPmonitor collect quality metrics from encrypted WebRTC traffic without decryption?