Understanding the RTP Protocol
Revision as of 00:29, 12 December 2025
The Real-time Transport Protocol (RTP) is an Internet-standard transport protocol for real-time audio, video, and other time-sensitive data transmission, defined in RFC 3550. This comprehensive guide covers all essential aspects of RTP including packet structure, RTCP, profiles, signaling integration, and security.
Introduction
RTP provides end-to-end delivery services for media streams, including:
- Sequence numbering - detect packet loss and reorder packets
- Timestamping - enable jitter calculation and synchronization
- Payload type identification - identify media encoding
- Monitoring - via companion RTCP protocol
RTP is typically used on top of UDP for its low overhead and latency. Unlike TCP, RTP/UDP does not guarantee delivery or ordering, nor does it provide congestion control or QoS guarantees. Instead, it tolerates some packet loss and reordering as a trade-off for timely delivery - a lost packet is preferable to a delayed one in interactive communications.
| Feature | Description |
|---|---|
| Transport | UDP (typically), can use other transports |
| Reliability | No guarantee - tolerates loss for timeliness |
| Companion Protocol | RTCP for control, feedback, and monitoring |
| Profiles | Extensible via profiles (AVP, AVPF, SAVP, etc.) |
| Scalability | Supports unicast and multicast, one-to-one to large conferences |
| Session Setup | External (SIP/SDP, H.323, WebRTC signaling) |
RTP Packet Structure
Every RTP packet consists of a header followed by a payload (the media data). The fixed RTP header is 12 bytes long and contains fields that enable proper delivery and playback of real-time media.
RTP Header Fields
| Field | Bits | Description |
|---|---|---|
| Version (V) | 2 | RTP version number. Always 2 for RFC 3550. |
| Padding (P) | 1 | If set, packet contains padding bytes at the end. Last byte indicates padding count. |
| Extension (X) | 1 | If set, header extension follows CSRC list. |
| CSRC Count (CC) | 4 | Number of CSRC identifiers (0-15). |
| Marker (M) | 1 | Profile-specific meaning. Video: last packet of frame. Audio: start of talkspurt. |
| Payload Type (PT) | 7 | Media format identifier. Static assignments occupy 0-34 (RFC 3551); 96-127 are dynamic and negotiated in signaling. |
| Sequence Number | 16 | Increments by 1 per packet. Starts random. Wraps at 65535. |
| Timestamp | 32 | Sampling instant of first byte. Clock rate depends on payload format. |
| SSRC | 32 | Synchronization source identifier. Randomly chosen, unique per session. |
| CSRC List | 0-480 | Contributing source IDs (used by mixers). Up to 15 × 32-bit IDs. |
Example RTP Header:
V=2, P=0, X=0, CC=0, M=1, PT=96, Seq=12345, Timestamp=0x30551980, SSRC=0x1A2B3C4D
This indicates: RTP version 2, no padding/extension, no CSRCs, marker set (end of frame), payload type 96 (dynamic - e.g., H.264), sequence 12345, with SSRC 0x1A2B3C4D.
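The field layout above can be decoded with a few bit operations. The following is a minimal sketch in Python, not a complete RTP stack; the names `RtpHeader` and `parse_rtp_header` are illustrative and do not come from any particular library. It rebuilds the example header shown above and parses it back.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed header plus any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # low 4 bits: CSRC count
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = struct.unpack(f"!{cc}I", packet[12:end])
    return RtpHeader(
        version=b0 >> 6,                # top 2 bits
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),         # top bit of the second byte
        payload_type=b1 & 0x7F,         # remaining 7 bits
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )


# The example header from the text: V=2, M=1, PT=96, Seq=12345, ...
example = struct.pack("!BBHII", 0x80, 0x80 | 96, 12345, 0x30551980, 0x1A2B3C4D)
header = parse_rtp_header(example)
```

The parser only validates lengths; a real receiver would also reject packets whose version is not 2 and would handle the padding and extension bits.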
Header Extensions
RTP allows an optional header extension when the X bit is set. RFC 8285 (which obsoletes RFC 5285) defines one-byte and two-byte formats that carry multiple extension elements in a single extension block:
| Extension Type | Description | Negotiation |
|---|---|---|
| One-byte header | Up to 14 extension elements, each 1-16 bytes | SDP a=extmap |
| Two-byte header | Larger extensions, more flexibility | SDP a=extmap |
| Common uses | Audio levels, video orientation, timing info | Signaling agreement |
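To make the one-byte format concrete, here is a hedged sketch of a parser for an extension block as laid out in RFC 8285: a 16-bit 0xBEDE marker, a 16-bit length in 32-bit words, then elements with a 4-bit ID and a 4-bit "length minus one". The function name `parse_one_byte_extensions` and the sample IDs 1 and 3 are assumptions for illustration; real IDs are whatever the a=extmap lines negotiate.

```python
import struct
from typing import Dict


def parse_one_byte_extensions(ext_block: bytes) -> Dict[int, bytes]:
    """Return {extension ID: raw value} for a one-byte-header block."""
    if len(ext_block) < 4:
        raise ValueError("extension block shorter than its 4-byte header")
    profile, words = struct.unpack("!HH", ext_block[:4])
    if profile != 0xBEDE:
        raise ValueError("not a one-byte header extension block")
    data = ext_block[4:4 + 4 * words]
    if len(data) < 4 * words:
        raise ValueError("extension block truncated")

    elements: Dict[int, bytes] = {}
    i = 0
    while i < len(data):
        if data[i] == 0:                # a zero byte is padding
            i += 1
            continue
        ext_id = data[i] >> 4
        length = (data[i] & 0x0F) + 1   # stored as length minus one
        if ext_id == 15:                # reserved ID: stop parsing
            break
        value = data[i + 1:i + 1 + length]
        if len(value) < length:
            raise ValueError("extension element truncated")
        elements[ext_id] = value
        i += 1 + length
    return elements


# Hypothetical block: ID 1 with one byte, ID 3 with two bytes, 3 bytes padding.
block = struct.pack("!HH", 0xBEDE, 2) + bytes([0x10, 0x9E, 0x31, 0x00, 0x5A, 0, 0, 0])
elements = parse_one_byte_extensions(block)
```

The values are returned as raw bytes because their meaning depends on the URI that signaling mapped to each ID.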
RTP Control Protocol (RTCP)
RTCP is defined alongside RTP in RFC 3550 and provides:
- Quality of Service feedback - packet loss, jitter, round-trip time
- Inter-media synchronization - correlate audio/video timestamps via NTP
- Participant identification - CNAME and other source descriptors
- Session control - keep-alive, goodbye notifications
RTCP Packet Types
| Type | Code | Name | Sender | Contents |
|---|---|---|---|---|
| SR | 200 | Sender Report | Active senders | NTP/RTP timestamps, packet/byte counts, reception reports |
| RR | 201 | Receiver Report | Non-senders | Fraction lost, cumulative loss, jitter, LSR, DLSR |
| SDES | 202 | Source Description | All participants | CNAME (required), NAME, EMAIL, PHONE, LOC, TOOL, NOTE |
| BYE | 203 | Goodbye | Leaving participant | SSRC of departing stream, optional reason |
| APP | 204 | Application-specific | Application-defined | Custom data for experimental features |
RTCP Compound Packet Structure
Per RFC 3550, each RTCP compound packet must:
- Start with SR or RR
- Include SDES with at least CNAME
- Optionally include BYE, APP, or other packets
    +--------+--------+--------+--------+
    | SR or RR (required first)         |
    +--------+--------+--------+--------+
    | SDES with CNAME (required)        |
    +--------+--------+--------+--------+
    | BYE (optional)                    |
    +--------+--------+--------+--------+
    | APP (optional)                    |
    +--------+--------+--------+--------+
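The compound packet rules can be checked mechanically, because every RTCP packet starts with the same 4-byte common header (version, count, packet type, length in 32-bit words minus one). The sketch below walks a datagram and applies the first two rules. The names `split_compound` and `is_valid_compound` are illustrative, and the check only confirms that an SDES packet is present; it does not look inside it for the CNAME item.

```python
import struct
from typing import List, Tuple

SR, RR, SDES = 200, 201, 202


def split_compound(datagram: bytes) -> List[Tuple[int, bytes]]:
    """Split a compound RTCP datagram into (packet type, raw packet) pairs."""
    packets = []
    offset = 0
    while offset < len(datagram):
        if len(datagram) - offset < 4:
            raise ValueError("trailing bytes too short for an RTCP header")
        b0, pt, length_words = struct.unpack_from("!BBH", datagram, offset)
        if b0 >> 6 != 2:
            raise ValueError("unexpected RTCP version")
        size = 4 * (length_words + 1)   # length field counts words minus one
        if offset + size > len(datagram):
            raise ValueError("RTCP packet overruns the datagram")
        packets.append((pt, datagram[offset:offset + size]))
        offset += size
    return packets


def is_valid_compound(datagram: bytes) -> bool:
    """First packet must be SR or RR, and an SDES packet must be present."""
    types = [pt for pt, _ in split_compound(datagram)]
    return bool(types) and types[0] in (SR, RR) and SDES in types


# Hypothetical sample: an empty RR followed by an SDES chunk with CNAME "a@b".
ssrc = 0x1A2B3C4D
rr = struct.pack("!BBHI", 0x80, RR, 1, ssrc)
sdes = struct.pack("!BBHI", 0x81, SDES, 3, ssrc) + b"\x01\x03a@b" + b"\x00\x00\x00"
packet_types = [pt for pt, _ in split_compound(rr + sdes)]
```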
RTCP Bandwidth Control
| Parameter | Value | Description |
|---|---|---|
| Total RTCP bandwidth | ~5% of session bandwidth | Prevents control overhead from dominating |
| Senders share | 25% of RTCP (1.25% total) | For SR packets |
| Receivers share | 75% of RTCP (3.75% total) | For RR packets |
| Minimum interval | 5 seconds (AVP profile) | Between RTCP reports |
| Scaling | Randomized, participant-based | Adapts to session size |
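The numbers in this table combine into a reporting interval. The sketch below follows the general shape of the interval computation in RFC 3550 section 6.3.1, simplified: it ignores the halved initial interval and timer reconsideration. The function names and the sample figures (64 kbit/s session, 100-byte average RTCP packet) are assumptions for illustration.

```python
import random


def rtcp_deterministic_interval(members: int, senders: int, we_sent: bool,
                                session_bw_bps: float,
                                avg_rtcp_size_bytes: float,
                                t_min: float = 5.0) -> float:
    """Seconds between reports before randomization."""
    rtcp_bw = 0.05 * session_bw_bps / 8.0   # 5% of the session, in bytes/s
    n = members
    # When senders are at most a quarter of the membership, they get a
    # dedicated 25% share and receivers split the remaining 75%.
    if senders <= 0.25 * members:
        if we_sent:
            rtcp_bw *= 0.25
            n = senders
        else:
            rtcp_bw *= 0.75
            n = members - senders
    return max(t_min, n * avg_rtcp_size_bytes / rtcp_bw)


def rtcp_randomized_interval(td: float) -> float:
    """Randomize over [0.5, 1.5] * td, then divide by e - 1.5 (about 1.21828)."""
    return td * random.uniform(0.5, 1.5) / 1.21828


# 100 participants, 2 of them sending, computed for a receiver.
large = rtcp_deterministic_interval(100, 2, False, 64000, 100)
# Two-party call: the 5-second minimum dominates.
small = rtcp_deterministic_interval(2, 1, True, 64000, 100)
```

With these inputs the receiver in the large session reports roughly every 33 seconds before randomization, which shows how the interval grows with session size while total RTCP traffic stays near 5%.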
Receiver Report Fields
| Field | Size | Description |
|---|---|---|
| SSRC of source | 32 bits | Which stream this report is about |
| Fraction lost | 8 bits | Packets lost / packets expected since the last report, as an 8-bit fixed-point fraction (value / 256) |
| Cumulative lost | 24 bits | Total packets lost since session start |
| Extended highest seq | 32 bits | Highest sequence number received (with rollover) |
| Interarrival jitter | 32 bits | Estimate of the variance of packet inter-arrival time, in RTP timestamp units |
| Last SR (LSR) | 32 bits | Middle 32 bits of NTP timestamp from last SR |
| Delay since last SR (DLSR) | 32 bits | Time between receiving the last SR and sending this RR, in units of 1/65536 s |
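Three of these fields come from short formulas in RFC 3550: the fraction lost is a fixed-point ratio, the jitter is a running estimate with gain 1/16, and the sender derives round-trip time from LSR and DLSR. The sketch below shows each calculation; the function names are illustrative and the inputs are made-up sample values.

```python
def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """8-bit fixed-point loss fraction for the last reporting interval."""
    lost = expected_interval - received_interval
    if expected_interval <= 0 or lost <= 0:
        return 0                        # duplicates can make 'lost' negative
    return (lost << 8) // expected_interval


def update_jitter(jitter: float, prev_transit: int, transit: int) -> float:
    """One step of the estimator: J += (|D| - J) / 16.

    transit = arrival time minus RTP timestamp, both in timestamp units.
    """
    d = abs(transit - prev_transit)
    return jitter + (d - jitter) / 16.0


def round_trip_seconds(arrival_ntp_mid32: int, lsr: int, dlsr: int) -> float:
    """RTT seen by the sender: arrival - LSR - DLSR, in 1/65536 s units."""
    rtt_units = (arrival_ntp_mid32 - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0


loss = fraction_lost(expected_interval=100, received_interval=75)
jitter = update_jitter(0.0, prev_transit=100, transit=116)
# SR sent at t=1 s, receiver held it 0.5 s, RR arrived at t=2 s.
rtt = round_trip_seconds(0x00020000, lsr=0x00010000, dlsr=0x00008000)
```

A loss value of 64 corresponds to 64/256, or 25% of the packets expected in the interval.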
RTP Profiles and Payload Types
RTP profiles define how the protocol is used for specific applications:
| Profile | RFC | Description |
|---|---|---|
| RTP/AVP | 3551 | Audio/Video Profile - standard A/V conferencing |
| RTP/AVPF | 4585 | AVP with Feedback - immediate RTCP feedback (PLI, NACK) |
| RTP/SAVP | 3711 | Secure AVP - SRTP encryption and authentication |
| RTP/SAVPF | 5124 | Secure AVPF - SRTP with feedback |
Static Payload Types (AVP Profile)
| PT | Encoding | Clock Rate | Type |
|---|---|---|---|
| 0 | PCMU (G.711 μ-law) | 8000 Hz | Audio |
| 3 | GSM | 8000 Hz | Audio |
| 4 | G723 | 8000 Hz | Audio |
| 8 | PCMA (G.711 A-law) | 8000 Hz | Audio |
| 9 | G722 | 8000 Hz | Audio |
| 18 | G729 | 8000 Hz | Audio |
| 31 | H261 | 90000 Hz | Video |
| 32 | MPV (MPEG-1/2 Video) | 90000 Hz | Video |
| 34 | H263 | 90000 Hz | Video |
| 96-127 | Dynamic | Negotiated | Any |
Dynamic Payload Type Negotiation
Dynamic payload types (96-127) are negotiated via SDP:
    m=audio 4000 RTP/AVP 96 97
    a=rtpmap:96 opus/48000/2
    a=rtpmap:97 telephone-event/8000
    a=fmtp:97 0-16
    m=video 4002 RTP/AVP 98
    a=rtpmap:98 H264/90000
    a=fmtp:98 profile-level-id=42e01f
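A receiver needs the a=rtpmap lines to know which codec and clock rate a dynamic payload type refers to. The following sketch extracts that mapping from the SDP fragment above. `Codec` and `parse_rtpmap` are illustrative names, and the function flattens all media sections into one dictionary, which is only safe when payload type numbers do not repeat across m= lines.

```python
from typing import Dict, NamedTuple, Optional


class Codec(NamedTuple):
    name: str
    clock_rate: int
    channels: Optional[int]


def parse_rtpmap(sdp: str) -> Dict[int, Codec]:
    """Map payload type number -> codec for every a=rtpmap line."""
    prefix = "a=rtpmap:"
    codecs: Dict[int, Codec] = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        pt_str, _, encoding = line[len(prefix):].partition(" ")
        parts = encoding.split("/")     # name/clock-rate[/channels]
        channels = int(parts[2]) if len(parts) > 2 else None
        codecs[int(pt_str)] = Codec(parts[0], int(parts[1]), channels)
    return codecs


sdp = """\
m=audio 4000 RTP/AVP 96 97
a=rtpmap:96 opus/48000/2
a=rtpmap:97 telephone-event/8000
a=fmtp:97 0-16
m=video 4002 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42e01f
"""
codecs = parse_rtpmap(sdp)
```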
Session Setup and Signaling
RTP does not provide session establishment - this is handled by external signaling protocols.
Transport Ports
| Convention | RTP Port | RTCP Port | Notes |
|---|---|---|---|
| Traditional | Even number (e.g., 4000) | RTP + 1 (e.g., 4001) | Separate ports |
| RTCP Mux | Same as RTP | Same as RTP | SDP: a=rtcp-mux (RFC 5761) |
| Demultiplexing | PT 0-127 | PT >= 200 (RTCP types) | By the packet type in the second byte |
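When RTP and RTCP share a port, the receiver has to tell them apart from the packet itself. The RTCP packet type occupies the whole second byte, while in RTP the same byte holds the marker bit plus a 7-bit payload type, so RTCP types 200-204 look like RTP payload types 72-76 with the marker set. RFC 5761 therefore keeps payload types 64-95 out of use on multiplexed sessions. The sketch below applies that range as a heuristic; `classify` is an illustrative name, and real stacks also separate STUN and DTLS traffic before this step.

```python
def classify(datagram: bytes) -> str:
    """Return 'rtp', 'rtcp', or 'unknown' for a datagram on a muxed port."""
    if len(datagram) < 2 or datagram[0] >> 6 != 2:
        return "unknown"                # not RTP/RTCP version 2
    pt7 = datagram[1] & 0x7F            # second byte without the top bit
    # 64-95 here corresponds to RTCP packet types 192-223.
    return "rtcp" if 64 <= pt7 <= 95 else "rtp"


kinds = [
    classify(bytes([0x80, 200, 0x00, 0x06])),   # Sender Report
    classify(bytes([0x80, 0x80 | 96])),         # dynamic PT 96, marker set
    classify(bytes([0x80, 0])),                 # PCMU
    classify(bytes([0x16, 0xFE])),              # first byte 0x16: not version 2
]
```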
SDP Media Description
    v=0
    o=alice 2890844526 2890844526 IN IP4 10.1.1.5
    s=VoIP Call
    c=IN IP4 10.1.1.5
    t=0 0
    m=audio 4000 RTP/AVP 0 8 96
    a=rtpmap:0 PCMU/8000
    a=rtpmap:8 PCMA/8000
    a=rtpmap:96 opus/48000/2
    a=rtcp-mux
    a=sendrecv
Mixers and Translators
RTP supports intermediate nodes for complex topologies:
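| Translator | Mixer |
|---|---|
| Source A (SSRC=1) and Source B (SSRC=2) -> Translator -> Receiver, which sees SSRC=1 and SSRC=2 | Source A (SSRC=1) and Source B (SSRC=2) -> Mixer (SSRC=M) -> Receiver, which sees SSRC=M with CSRC=[1,2] |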
| Feature | Translator | Mixer |
|---|---|---|
| SSRC handling | Preserves original SSRCs | Creates new SSRC, lists originals in CSRC |
| Stream output | Separate streams forwarded | Single combined stream |
| Use cases | Relay, transcoding, firewall traversal | Audio conferencing, video compositing |
| Timing | Maintains original timing | Becomes timing master |
| Bandwidth | Sum of all streams | Single stream (reduced) |
| RTCP | Forwards or adjusts | Generates own SR, sends RR upstream |
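The SSRC handling row is the key difference on the wire: a mixer sends under its own SSRC and records the original sources in the CSRC list. A minimal sketch of the header a mixer would emit follows. `build_mixer_header` is an illustrative name, and the SSRC values are placeholders; actual audio mixing and timestamp generation are outside its scope.

```python
import struct
from typing import Sequence


def build_mixer_header(mixer_ssrc: int, contributing: Sequence[int],
                       payload_type: int, seq: int, timestamp: int) -> bytes:
    """RTP header with the mixer as SSRC and the sources listed as CSRCs."""
    csrcs = list(contributing)[:15]     # the CC field is 4 bits: at most 15
    b0 = (2 << 6) | len(csrcs)          # V=2, P=0, X=0, CC=len(csrcs)
    b1 = payload_type & 0x7F            # M=0
    return struct.pack(f"!BBHII{len(csrcs)}I", b0, b1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF,
                       mixer_ssrc, *csrcs)


# Sources with SSRC 1 and 2 mixed into one PCMU stream by mixer 0xAAAA.
mixed = build_mixer_header(0xAAAA, [1, 2], payload_type=0, seq=7, timestamp=160)
```

A translator, by contrast, would forward each packet with its original SSRC and an empty or unchanged CSRC list.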
Translator Use Cases
- Multicast-to-unicast relay
- Encryption/decryption gateway
- Codec transcoding
- IPv4/IPv6 bridging
- Firewall traversal
Mixer Use Cases
- Audio conference bridges (mixing multiple speakers)
- Video MCU (compositing multiple feeds)
- Bandwidth reduction for large conferences
Security Considerations (SRTP)
SRTP (Secure RTP, RFC 3711) provides:
- Confidentiality - AES encryption of payload
- Integrity - HMAC-SHA1 authentication
- Replay protection - sequence number tracking
| Component | Encryption | Purpose |
|---|---|---|
| RTP Header | Cleartext | SSRC, seq, timestamp visible for routing |
| Payload | AES Encrypted | Media content protected |
| Auth Tag | HMAC-SHA1 | Integrity verification (typically 10 bytes) |
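The authentication tag in the last row can be illustrated with the standard library. In RFC 3711 the tag is an HMAC-SHA1 over the packet (header plus encrypted payload) followed by the 32-bit rollover counter (ROC), truncated to the tag length. The sketch below covers only that step: the key is a placeholder rather than one produced by SRTP key derivation, the payload is not actually encrypted, and the function names are illustrative.

```python
import hashlib
import hmac
import struct


def srtp_auth_tag(auth_key: bytes, packet: bytes, roc: int,
                  tag_len: int = 10) -> bytes:
    """HMAC-SHA1 over packet || ROC, truncated to tag_len bytes."""
    mac = hmac.new(auth_key, packet + struct.pack("!I", roc), hashlib.sha1)
    return mac.digest()[:tag_len]


def verify_srtp_tag(auth_key: bytes, protected: bytes, roc: int,
                    tag_len: int = 10) -> bool:
    """Split off the trailing tag and compare it in constant time."""
    packet, tag = protected[:-tag_len], protected[-tag_len:]
    expected = srtp_auth_tag(auth_key, packet, roc, tag_len)
    return hmac.compare_digest(tag, expected)


key = bytes(20)                                  # placeholder 160-bit key
rtp = bytes([0x80, 0x00, 0x00, 0x01]) + bytes(8) + b"ciphertext"
protected = rtp + srtp_auth_tag(key, rtp, roc=0)
tampered = bytes([protected[0] ^ 0x01]) + protected[1:]
```

Because the tag covers the cleartext header as well as the payload, changing any header bit makes verification fail even though the header is not encrypted.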
SRTP Key Exchange Methods
| Method | RFC | Description | Usage |
|---|---|---|---|
| SDES | 4568 | Keys in SDP (deprecated) | Legacy SIP |
| DTLS-SRTP | 5764 | In-band DTLS handshake | WebRTC (mandatory) |
| ZRTP | 6189 | In-call DH exchange | Opportunistic encryption |
| MIKEY | 3830 | Multimedia Internet KEYing | Group scenarios |
DTLS-SRTP
DTLS-SRTP (used in WebRTC) provides:
- Perfect forward secrecy (DH key exchange)
- Certificate fingerprint verification (via SDP)
- Multiplexing on same port as RTP/RTCP/STUN
SDP attributes for DTLS-SRTP:
    a=fingerprint:sha-256 AB:CD:EF:...
    a=setup:actpass
    a=rtcp-mux
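The fingerprint value is a hash of the endpoint's DER-encoded certificate, written as uppercase hex bytes separated by colons. The sketch below formats such a line; `sdp_fingerprint` is an illustrative name and the input bytes are a stand-in, not a real certificate.

```python
import hashlib


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format a SHA-256 certificate fingerprint as an SDP attribute line."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    pairs = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return "a=fingerprint:sha-256 " + ":".join(pairs)


# Placeholder bytes; a real endpoint would pass its certificate in DER form.
line = sdp_fingerprint(b"not a real certificate")
```

The peer compares this value against the certificate presented in the DTLS handshake, which ties the media keys to the signaling exchange.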
Troubleshooting RTP
| Issue | Symptoms | RTCP Indicator | Solution |
|---|---|---|---|
| Packet loss | Choppy audio, video artifacts | High fraction lost in RR | Check network path, QoS |
| High jitter | Audio gaps, video stuttering | High jitter value in RR | Increase jitter buffer, check network |
| One-way audio | Only one party hears | No RTP received | Check NAT, firewall, SDP IPs |
| No media | Complete silence | No RR/SR packets | Verify signaling, check ports |
| Codec mismatch | Garbled audio | PT doesn't match expected | Verify SDP negotiation |
| Clock drift | A/V desync over time | Compare SR timestamps | Use RTCP for sync |
Wireshark RTP Analysis
Key filters for RTP analysis:
    rtp                       # All RTP packets
    rtcp                      # All RTCP packets
    rtp.ssrc == 0x1234abcd    # Specific stream
    rtp.marker == 1           # Frame boundaries
    rtcp.pt == 200            # Sender Reports
    rtcp.pt == 201            # Receiver Reports
Telephony > RTP > RTP Streams - shows all streams with statistics
Quick Reference Tables
RTP Header Bit Layout
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V=2|P|X|  CC   |M|     PT      |       Sequence Number         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           Timestamp                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             SSRC                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           CSRC list                           |
    |                             ....                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Clock Rates by Media Type
| Media Type | Typical Clock Rate | Examples |
|---|---|---|
| Narrowband audio | 8000 Hz | G.711, G.729, GSM |
| Wideband audio | 16000 Hz | G.722.1, AMR-WB |
| Full-band audio | 48000 Hz | Opus |
| Video | 90000 Hz | H.264, VP8, H.265 |
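The clock rate determines how far the RTP timestamp advances from one packet or frame to the next. A short sketch of that arithmetic follows; the function names and the 20 ms packet time and 30 fps frame rate are illustrative assumptions, not values fixed by the protocol.

```python
def timestamp_increment(clock_rate_hz: int, packet_ms: float) -> int:
    """Timestamp units covered by one audio packet of packet_ms milliseconds."""
    return round(clock_rate_hz * packet_ms / 1000)


def video_frame_increment(clock_rate_hz: int, fps: float) -> int:
    """Timestamp units between consecutive video frames at a constant rate."""
    return round(clock_rate_hz / fps)


g711_step = timestamp_increment(8000, 20)       # narrowband audio, 20 ms
opus_step = timestamp_increment(48000, 20)      # full-band audio, 20 ms
video_step = video_frame_increment(90000, 30)   # 90 kHz video clock, 30 fps
```

All packets that carry pieces of the same video frame share one timestamp, so the video increment applies per frame rather than per packet.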
Marker Bit Usage
| Media Type | M=1 Meaning | Purpose |
|---|---|---|
| Video | Last packet of frame | Frame boundary detection |
| Audio (silence suppression) | First packet after silence | Talkspurt indication |
| RFC 4733 DTMF | End of DTMF event | Event boundary |
References
Primary Standards
- RFC 3550 - RTP: A Transport Protocol for Real-Time Applications
- RFC 3551 - RTP Profile for Audio and Video Conferences (AVP)
- RFC 3711 - The Secure Real-time Transport Protocol (SRTP)
- RFC 4585 - Extended RTP Profile for RTCP-Based Feedback (AVPF)
Extensions and Updates
- RFC 5761 - Multiplexing RTP Data and Control Packets on a Single Port
- RFC 5764 - DTLS Extension to Establish Keys for SRTP
- RFC 6051 - Rapid Synchronisation of RTP Flows
- RFC 6222 - Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs) (obsoleted by RFC 7022)
- RFC 8285 - A General Mechanism for RTP Header Extensions
Related Protocols
- RFC 4566 - SDP: Session Description Protocol
- RFC 3261 - SIP: Session Initiation Protocol
- RFC 4733 - RTP Payload for DTMF Digits, Telephony Tones, and Telephony Signals