Understanding the RTP Protocol

From VoIPmonitor.org
Revision as of 00:29, 12 December 2025

The Real-time Transport Protocol (RTP) is an Internet-standard protocol, defined in RFC 3550, for transporting real-time audio, video, and other time-sensitive data. This guide covers packet structure, RTCP, profiles, signaling integration, and security.

Introduction

RTP provides end-to-end delivery services for media streams, including:

  • Sequence numbering - detect packet loss and reorder packets
  • Timestamping - enable jitter calculation and synchronization
  • Payload type identification - identify media encoding
  • Monitoring - via companion RTCP protocol
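
The sequence-numbering service listed above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names `seq_delta` and `count_lost` are hypothetical, and the sketch assumes the input is the list of 16-bit sequence numbers in arrival order.

```python
def seq_delta(prev: int, cur: int) -> int:
    """Signed distance from prev to cur in 16-bit sequence space."""
    return ((cur - prev + 0x8000) & 0xFFFF) - 0x8000


def count_lost(seqs):
    """Count packets missing from a list of received sequence numbers.

    Sequence numbers are unwrapped into offsets relative to the first
    packet, so wraparound (65535 -> 0) and reordering are both tolerated.
    """
    if not seqs:
        return 0
    highest_seq, highest_off = seqs[0], 0
    received = {0}
    for s in seqs[1:]:
        off = highest_off + seq_delta(highest_seq, s)
        received.add(off)
        if off > highest_off:
            highest_seq, highest_off = s, off
    expected = highest_off - min(received) + 1
    return expected - len(received)
```

For example, `count_lost([65533, 65534, 0, 1])` reports one missing packet (sequence 65535), while a merely reordered arrival such as `[10, 12, 11]` reports none.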

RTP is typically used on top of UDP for its low overhead and latency. Unlike TCP, RTP/UDP does not guarantee delivery or ordering, nor does it provide congestion control or QoS guarantees. Instead, it tolerates some packet loss and reordering as a trade-off for timely delivery - a lost packet is preferable to a delayed one in interactive communications.

{| class="wikitable"
|+ RTP Key Characteristics
|-
! Feature !! Description
|-
| Transport || UDP (typically), can use other transports
|-
| Reliability || No guarantee - tolerates loss for timeliness
|-
| Companion Protocol || RTCP for control, feedback, and monitoring
|-
| Profiles || Extensible via profiles (AVP, AVPF, SAVP, etc.)
|-
| Scalability || Supports unicast and multicast, one-to-one to large conferences
|-
| Session Setup || External (SIP/SDP, H.323, WebRTC signaling)
|}

RTP Packet Structure

Every RTP packet consists of a header followed by a payload (the media data). The fixed RTP header is 12 bytes long and contains fields that enable proper delivery and playback of real-time media.

RTP Header Fields

{| class="wikitable"
|+ RTP Header Fields Reference
|-
! Field !! Bits !! Description
|-
| Version (V) || 2 || RTP version number. Always 2 for RFC 3550.
|-
| Padding (P) || 1 || If set, the packet contains padding bytes at the end. The last byte gives the padding count.
|-
| Extension (X) || 1 || If set, a header extension follows the CSRC list.
|-
| CSRC Count (CC) || 4 || Number of CSRC identifiers (0-15).
|-
| Marker (M) || 1 || Profile-specific meaning. Video: last packet of frame. Audio: start of talkspurt.
|-
| Payload Type (PT) || 7 || Media format identifier. Static (0-95) or dynamic (96-127).
|-
| Sequence Number || 16 || Increments by 1 per packet. Starts at a random value and wraps to 0 after 65535.
|-
| Timestamp || 32 || Sampling instant of the first byte of the payload. Clock rate depends on payload format.
|-
| SSRC || 32 || Synchronization source identifier. Randomly chosen, unique per session.
|-
| CSRC List || 0-480 || Contributing source IDs (used by mixers). Up to 15 × 32-bit IDs.
|}

Example RTP Header:

V=2, P=0, X=0, CC=0, M=1, PT=96, Seq=12345, Timestamp=0x30551980, SSRC=0x1A2B3C4D

This indicates: RTP version 2, no padding/extension, no CSRCs, marker set (end of frame), payload type 96 (dynamic - e.g., H.264), sequence 12345, with SSRC 0x1A2B3C4D.
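
As a rough sketch of how the fields above map onto bytes, the following Python uses the standard `struct` module to decode the fixed header and any CSRC entries. The function name `parse_rtp_header` and the dictionary keys are illustrative choices, not part of any library; header extensions and padding are not handled here.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header plus the CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for the declared CSRC count")
    csrc = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": cc,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
    }


# Rebuild the example header from the text and decode it again.
example = struct.pack("!BBHII", 0x80, 0x80 | 96, 12345, 0x30551980, 0x1A2B3C4D)
header = parse_rtp_header(example)
```

Decoding `example` yields version 2, marker 1, payload type 96, sequence 12345, and SSRC 0x1A2B3C4D, matching the values listed above.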

Header Extensions

RTP allows an optional header extension when the X bit is set. RFC 5285 (since replaced by RFC 8285) defines one-byte and two-byte element formats so that a single header extension can carry multiple elements:

{| class="wikitable"
|-
! Extension Type !! Description !! Negotiation
|-
| One-byte header || Up to 14 extension elements, each 1-16 bytes || SDP a=extmap
|-
| Two-byte header || Larger extensions, more flexibility || SDP a=extmap
|-
| Common uses || Audio levels, video orientation, timing info || Signaling agreement
|}
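
A minimal sketch of walking one-byte-header elements is shown below. It assumes the caller has already stripped the 4-byte extension header (the 0xBEDE marker and the length word) and passes only the element bytes; the function name and the IDs used in the example are hypothetical, since real IDs are whatever the session negotiated with a=extmap.

```python
def parse_one_byte_extensions(ext_data: bytes) -> dict:
    """Return {element_id: data} for one-byte-header extension elements."""
    elements = {}
    i = 0
    while i < len(ext_data):
        first = ext_data[i]
        if first == 0:
            # A zero byte is padding between or after elements.
            i += 1
            continue
        ext_id = first >> 4
        length = (first & 0x0F) + 1  # the length field stores (bytes - 1)
        if ext_id == 15:
            # ID 15 is reserved; stop processing further elements.
            break
        elements[ext_id] = ext_data[i + 1:i + 1 + length]
        i += 1 + length
    return elements


# Two elements (IDs 1 and 3 are made up for this example), then padding.
sample = bytes([0x10, 0x85, 0x31, 0x12, 0x34, 0x00, 0x00, 0x00])
parsed = parse_one_byte_extensions(sample)
```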

RTP Control Protocol (RTCP)

RTCP is defined alongside RTP in RFC 3550 and provides:

  • Quality of Service feedback - packet loss, jitter, round-trip time
  • Inter-media synchronization - correlate audio/video timestamps via NTP
  • Participant identification - CNAME and other source descriptors
  • Session control - keep-alive, goodbye notifications

RTCP Packet Types

{| class="wikitable"
|+ RTCP Packet Types (RFC 3550)
|-
! Type !! Code !! Name !! Sender !! Contents
|-
| SR || 200 || Sender Report || Active senders || NTP/RTP timestamps, packet/byte counts, reception reports
|-
| RR || 201 || Receiver Report || Non-senders || Fraction lost, cumulative loss, jitter, LSR, DLSR
|-
| SDES || 202 || Source Description || All participants || CNAME (required), NAME, EMAIL, PHONE, LOC, TOOL, NOTE
|-
| BYE || 203 || Goodbye || Leaving participant || SSRC of departing stream, optional reason
|-
| APP || 204 || Application-specific || Application-defined || Custom data for experimental features
|}

RTCP Compound Packet Structure

Per RFC 3550, each RTCP compound packet must:

  1. Start with SR or RR
  2. Include SDES with at least CNAME
  3. Optionally include BYE, APP, or other packets

The resulting layout:
+--------+--------+--------+--------+
|   SR or RR (required first)       |
+--------+--------+--------+--------+
|   SDES with CNAME (required)      |
+--------+--------+--------+--------+
|   BYE (optional)                  |
+--------+--------+--------+--------+
|   APP (optional)                  |
+--------+--------+--------+--------+
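
The ordering rules above can be checked with a short sketch like the one below. It relies only on the RTCP common header (version, packet type, and a length field counted in 32-bit words minus one). The function names are hypothetical, and the check only confirms that an SDES packet is present; it does not look inside it for the CNAME item.

```python
import struct


def split_compound(datagram: bytes) -> list:
    """Split a compound RTCP datagram into (packet_type, raw_bytes) tuples."""
    packets = []
    offset = 0
    while offset + 4 <= len(datagram):
        b0, pt, length_words = struct.unpack("!BBH", datagram[offset:offset + 4])
        if b0 >> 6 != 2:
            raise ValueError("unexpected RTCP version")
        size = (length_words + 1) * 4
        if offset + size > len(datagram):
            raise ValueError("truncated RTCP packet")
        packets.append((pt, datagram[offset:offset + size]))
        offset += size
    return packets


def is_valid_compound(datagram: bytes) -> bool:
    """True if the datagram starts with SR/RR and contains an SDES packet."""
    types = [pt for pt, _ in split_compound(datagram)]
    return bool(types) and types[0] in (200, 201) and 202 in types


# An empty Receiver Report followed by an SDES chunk carrying CNAME "a@b".
rr = struct.pack("!BBHI", 0x80, 201, 1, 0x1A2B3C4D)
sdes = struct.pack("!BBHI", 0x81, 202, 3, 0x1A2B3C4D) + b"\x01\x03a@b" + b"\x00\x00\x00"
```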

RTCP Bandwidth Control

{| class="wikitable"
|-
! Parameter !! Value !! Description
|-
| Total RTCP bandwidth || ~5% of session bandwidth || Prevents control overhead from dominating
|-
| Senders share || 25% of RTCP (1.25% total) || For SR packets
|-
| Receivers share || 75% of RTCP (3.75% total) || For RR packets
|-
| Minimum interval || 5 seconds (AVP profile) || Between RTCP reports
|-
| Scaling || Randomized, participant-based || Adapts to session size
|}
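
To show how these parameters combine, here is a simplified sketch of the report-interval calculation in the spirit of RFC 3550 section 6.3.1. It leaves out details such as the halved minimum for the first report and timer reconsideration, and the function and parameter names are illustrative assumptions.

```python
import random


def deterministic_interval(session_bw_bps, members, senders, we_sent,
                           avg_rtcp_size_bytes, t_min=5.0):
    """Seconds between RTCP reports before randomization is applied."""
    rtcp_bw = 0.05 * session_bw_bps / 8.0  # 5% of the session, in bytes/s
    if 0 < senders <= 0.25 * members:
        # Senders and receivers each get their own share of the RTCP budget.
        if we_sent:
            share, n = 0.25, senders
        else:
            share, n = 0.75, members - senders
    else:
        # Otherwise all participants share the whole RTCP budget.
        share, n = 1.0, members
    return max(t_min, n * avg_rtcp_size_bytes / (share * rtcp_bw))


def randomized_interval(td, rng=random):
    """Randomize over [0.5, 1.5] x td, then divide by the factor
    1.21828 (e - 1.5) that RFC 3550 uses to compensate for reconsideration."""
    return td * rng.uniform(0.5, 1.5) / 1.21828
```

With a 64 kbit/s session and 100-byte average RTCP packets, a two-party call stays at the 5-second minimum, whereas a receiver in a 1000-member session with one sender would wait roughly 333 seconds between reports before randomization.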

Receiver Report Fields

{| class="wikitable"
|+ RTCP Receiver Report Block Fields
|-
! Field !! Size !! Description
|-
| SSRC of source || 32 bits || Which stream this report is about
|-
| Fraction lost || 8 bits || Packets lost / packets expected since the last report, as an 8-bit fixed-point fraction (value / 256)
|-
| Cumulative lost || 24 bits || Total packets lost since session start
|-
| Extended highest seq || 32 bits || Highest sequence number received (with rollover count)
|-
| Interarrival jitter || 32 bits || Estimate of the statistical variance of packet inter-arrival time, in RTP timestamp units
|-
| Last SR (LSR) || 32 bits || Middle 32 bits of the NTP timestamp from the last SR
|-
| Delay since last SR (DLSR) || 32 bits || Time between receiving that SR and sending this RR, in units of 1/65536 second
|}
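
The report fields above are usually turned into human-readable metrics with a few lines of arithmetic. The sketch below follows the formulas in RFC 3550 (round-trip time from LSR and DLSR, and the running jitter estimate with a gain of 1/16); the function names are hypothetical.

```python
def fraction_lost_percent(fraction_lost: int) -> float:
    """Convert the 8-bit fixed-point loss fraction to a percentage."""
    return fraction_lost / 256 * 100


def rtt_seconds(arrival_ntp_mid32: int, lsr: int, dlsr: int):
    """Round-trip time as seen by the SR sender.

    arrival_ntp_mid32 is the middle 32 bits of the NTP time at which the
    report block arrived; all three inputs are in units of 1/65536 second.
    Returns None when LSR is zero, meaning no SR has been received yet.
    """
    if lsr == 0:
        return None
    rtt_units = (arrival_ntp_mid32 - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536


def update_jitter(jitter: float, transit_prev: int, transit_cur: int) -> float:
    """One step of the interarrival jitter estimator, in timestamp units.

    The transit values are (arrival time - RTP timestamp) for two
    consecutive packets, both expressed in RTP timestamp units.
    """
    d = abs(transit_cur - transit_prev)
    return jitter + (d - jitter) / 16
```

Using the worked example from RFC 3550 (arrival 0xB7108000, LSR 0xB7052000, DLSR 0x00054000), `rtt_seconds` gives 6.125 seconds.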

RTP Profiles and Payload Types

RTP profiles define how the protocol is used for specific applications:

{| class="wikitable"
|+ Common RTP Profiles
|-
! Profile !! RFC !! Description
|-
| RTP/AVP || 3551 || Audio/Video Profile - standard A/V conferencing
|-
| RTP/AVPF || 4585 || AVP with Feedback - immediate RTCP feedback (PLI, NACK)
|-
| RTP/SAVP || 3711 || Secure AVP - SRTP encryption and authentication
|-
| RTP/SAVPF || 5124 || Secure AVPF - SRTP with feedback
|}

Static Payload Types (AVP Profile)

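{| class="wikitable"
|+ Common Static Payload Types
|-
! PT !! Encoding !! Clock Rate !! Type
|-
| 0 || PCMU (G.711 μ-law) || 8000 Hz || Audio
|-
| 3 || GSM || 8000 Hz || Audio
|-
| 4 || G723 || 8000 Hz || Audio
|-
| 8 || PCMA (G.711 A-law) || 8000 Hz || Audio
|-
| 9 || G722 || 8000 Hz || Audio
|-
| 18 || G729 || 8000 Hz || Audio
|-
| 31 || H261 || 90000 Hz || Video
|-
| 32 || MPV (MPEG-1/2 Video) || 90000 Hz || Video
|-
| 34 || H263 || 90000 Hz || Video
|-
| 96-127 || Dynamic || Negotiated || Any
|}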

Dynamic Payload Type Negotiation

Dynamic payload types (96-127) are negotiated via SDP:

m=audio 4000 RTP/AVP 96 97
a=rtpmap:96 opus/48000/2
a=rtpmap:97 telephone-event/8000
a=fmtp:97 0-16

m=video 4002 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42e01f
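
A receiver has to turn these a=rtpmap lines into a lookup table before it can interpret incoming payload types. The sketch below does this with plain string handling; `parse_rtpmap` is a hypothetical helper, and it ignores a=fmtp parameters and the m= line grouping for brevity.

```python
def parse_rtpmap(sdp: str) -> dict:
    """Map payload type -> encoding name, clock rate, and channel count."""
    mapping = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=rtpmap:"):
            continue
        pt_str, encoding = line[len("a=rtpmap:"):].split(None, 1)
        parts = encoding.split("/")
        mapping[int(pt_str)] = {
            "encoding": parts[0],
            "clock_rate": int(parts[1]),
            # The channel count is optional; this sketch defaults it to 1.
            "channels": int(parts[2]) if len(parts) > 2 else 1,
        }
    return mapping


sdp_text = """m=audio 4000 RTP/AVP 96 97
a=rtpmap:96 opus/48000/2
a=rtpmap:97 telephone-event/8000
a=fmtp:97 0-16
m=video 4002 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42e01f
"""
payload_map = parse_rtpmap(sdp_text)
```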

Session Setup and Signaling

RTP does not provide session establishment - this is handled by external signaling protocols.

Transport Ports

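{| class="wikitable"
|-
! Convention !! RTP Port !! RTCP Port !! Notes
|-
| Traditional || Even number (e.g., 4000) || RTP + 1 (e.g., 4001) || Separate ports
|-
| RTCP Mux || Same as RTP || Same as RTP || SDP: a=rtcp-mux
|-
| Demultiplexing || PT 0-127 || PT >= 200 (RTCP types) || Distinguished by the second byte (payload type / packet type field)
|}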
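
When RTP and RTCP share a port, a receiver can tell them apart by looking at the second byte of each datagram. The sketch below uses the range 192-223 for RTCP, which is a common heuristic based on the values RFC 5761 keeps clear of RTP payload types; the function name `classify` is hypothetical, and anything that does not carry version 2 in its top two bits (for example STUN or DTLS on the same port) is reported as unknown.

```python
def classify(packet: bytes) -> str:
    """Return 'rtp', 'rtcp', or 'unknown' for a datagram on a muxed port."""
    if len(packet) < 2 or packet[0] >> 6 != 2:
        return "unknown"
    second = packet[1]
    # RTCP packet types such as SR (200) and RR (201) fall in 192-223.
    # An RTP packet lands in that range only if it uses payload type
    # 64-95 with the marker bit set, which RFC 5761 rules out.
    if 192 <= second <= 223:
        return "rtcp"
    return "rtp"
```

For instance, a Sender Report (second byte 200) is classified as RTCP, while an RTP packet with payload type 96 and the marker bit set (second byte 224) is classified as RTP.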

SDP Media Description

v=0
o=alice 2890844526 2890844526 IN IP4 10.1.1.5
s=VoIP Call
c=IN IP4 10.1.1.5
t=0 0
m=audio 4000 RTP/AVP 0 8 96
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:96 opus/48000/2
a=rtcp-mux
a=sendrecv

Mixers and Translators

RTP supports intermediate nodes for complex topologies:

{| class="wikitable"
|+ Mixer vs Translator Comparison
|-
! Feature !! Translator !! Mixer
|-
| SSRC handling || Preserves original SSRCs || Creates new SSRC, lists originals in CSRC
|-
| Stream output || Separate streams forwarded || Single combined stream
|-
| Use cases || Relay, transcoding, firewall traversal || Audio conferencing, video compositing
|-
| Timing || Maintains original timing || Becomes timing master
|-
| Bandwidth || Sum of all streams || Single stream (reduced)
|-
| RTCP || Forwards or adjusts || Generates own SR, sends RR upstream
|}
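
The SSRC handling difference can be made concrete with a small sketch of the header a mixer would emit: its own SSRC in the fixed header, followed by a CSRC list naming the sources it combined. The function name and parameters are illustrative; a translator, by contrast, would forward each packet with its original SSRC untouched.

```python
import struct


def mixer_header(mixer_ssrc, contributing_ssrcs, seq, timestamp, payload_type):
    """Build an RTP header for mixed output: SSRC=mixer, CSRC=originals."""
    if len(contributing_ssrcs) > 15:
        raise ValueError("the 4-bit CC field allows at most 15 CSRC entries")
    b0 = (2 << 6) | len(contributing_ssrcs)  # V=2, P=0, X=0, CC=count
    fixed = struct.pack("!BBHII", b0, payload_type & 0x7F, seq & 0xFFFF,
                        timestamp & 0xFFFFFFFF, mixer_ssrc)
    return fixed + b"".join(struct.pack("!I", s) for s in contributing_ssrcs)


# Mixer 0xAAAA combining sources with SSRC 1 and 2 into a PCMU stream.
mixed = mixer_header(0xAAAA, [1, 2], seq=7, timestamp=160, payload_type=0)
```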

Translator Use Cases

  • Multicast-to-unicast relay
  • Encryption/decryption gateway
  • Codec transcoding
  • IPv4/IPv6 bridging
  • Firewall traversal

Mixer Use Cases

  • Audio conference bridges (mixing multiple speakers)
  • Video MCU (compositing multiple feeds)
  • Bandwidth reduction for large conferences

Security Considerations (SRTP)

SRTP (Secure RTP, RFC 3711) provides:

  • Confidentiality - AES encryption of payload
  • Integrity - HMAC-SHA1 authentication
  • Replay protection - sequence number tracking
{| class="wikitable" style="margin: auto;"
|+ '''SRTP Packet Structure'''
|-
! Component !! Encryption !! Purpose
|-
| RTP Header || Cleartext || SSRC, seq, timestamp visible for routing
|-
| Payload || '''AES Encrypted''' || Media content protected
|-
| Auth Tag || HMAC-SHA1 || Integrity verification (typically 10 bytes)
|}
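
The three regions in the table can be located in a captured packet without any keys, as the sketch below illustrates. It assumes a 10-byte authentication tag and no MKI field, both of which depend on the negotiated crypto suite; `split_srtp` is a hypothetical helper and performs no decryption or verification.

```python
import struct


def split_srtp(packet: bytes, tag_len: int = 10):
    """Split an SRTP packet into (cleartext header, ciphertext, auth tag)."""
    if len(packet) < 12 + tag_len:
        raise ValueError("packet too short for an RTP header plus auth tag")
    header_len = 12 + 4 * (packet[0] & 0x0F)  # fixed header + CSRC list
    if packet[0] & 0x10:
        # X bit set: a 4-byte extension header gives the length in words.
        if len(packet) < header_len + 4:
            raise ValueError("truncated header extension")
        ext_words = struct.unpack("!H", packet[header_len + 2:header_len + 4])[0]
        header_len += 4 + 4 * ext_words
    end = len(packet) - tag_len
    if header_len > end:
        raise ValueError("header runs into the auth tag")
    return packet[:header_len], packet[header_len:end], packet[end:]


# A fake packet: 12-byte header, 20 payload bytes, 10 tag bytes.
fake = bytes([0x80, 0, 0, 1, 0, 0, 0, 160, 0, 0, 0, 1]) + b"P" * 20 + b"T" * 10
hdr_part, payload_part, tag_part = split_srtp(fake)
```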

SRTP Key Exchange Methods

{| class="wikitable"
|-
! Method !! RFC !! Description !! Usage
|-
| SDES || 4568 || Keys in SDP (deprecated) || Legacy SIP
|-
| DTLS-SRTP || 5764 || In-band DTLS handshake || WebRTC (mandatory)
|-
| ZRTP || 6189 || In-call DH exchange || Opportunistic encryption
|-
| MIKEY || 3830 || Multimedia Internet KEYing || Group scenarios
|}

DTLS-SRTP

DTLS-SRTP (used in WebRTC) provides:

  • Perfect forward secrecy (DH key exchange)
  • Certificate fingerprint verification (via SDP)
  • Multiplexing on same port as RTP/RTCP/STUN

SDP attributes for DTLS-SRTP:

a=fingerprint:sha-256 AB:CD:EF:...
a=setup:actpass
a=rtcp-mux
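
The a=fingerprint value is a hash of the endpoint's DER-encoded certificate, written as colon-separated uppercase hex pairs. A minimal sketch using the standard `hashlib` module is shown below; the function name is hypothetical and the caller is assumed to supply the certificate bytes already in DER form.

```python
import hashlib


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format the SHA-256 digest of a DER certificate as an SDP attribute."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    pairs = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return "a=fingerprint:sha-256 " + ":".join(pairs)
```

The peer computes the same digest over the certificate presented in the DTLS handshake and compares it with the value received in signaling.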

Troubleshooting RTP

{| class="wikitable"
|+ Common RTP Issues and Diagnostics
|-
! Issue !! Symptoms !! RTCP Indicator !! Solution
|-
| Packet loss || Choppy audio, video artifacts || High fraction lost in RR || Check network path, QoS
|-
| High jitter || Audio gaps, video stuttering || High jitter value in RR || Increase jitter buffer, check network
|-
| One-way audio || Only one party hears || No RTP received || Check NAT, firewall, SDP IPs
|-
| No media || Complete silence || No RR/SR packets || Verify signaling, check ports
|-
| Codec mismatch || Garbled audio || PT doesn't match expected || Verify SDP negotiation
|-
| Clock drift || A/V desync over time || Compare SR timestamps || Use RTCP for sync
|}

Wireshark RTP Analysis

Key filters for RTP analysis:

rtp                          # All RTP packets
rtcp                         # All RTCP packets
rtp.ssrc == 0x1234abcd       # Specific stream
rtp.marker == 1              # Frame boundaries
rtcp.pt == 200               # Sender Reports
rtcp.pt == 201               # Receiver Reports

In the Wireshark menu, Telephony > RTP > RTP Streams lists every detected stream together with its loss and jitter statistics.

Quick Reference Tables

RTP Header Bit Layout

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       Sequence Number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             SSRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          CSRC list                            |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Clock Rates by Media Type

{| class="wikitable"
|-
! Media Type !! Typical Clock Rate !! Examples
|-
| Narrowband audio || 8000 Hz || G.711, G.729, GSM
|-
| Wideband audio || 16000 Hz || G.722.1, AMR-WB
|-
| Full-band audio || 48000 Hz || Opus
|-
| Video || 90000 Hz || H.264, VP8, H.265
|}
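
The clock rate determines how far the RTP timestamp advances from one packet to the next. As a small worked sketch (the function names are illustrative), the increment is the clock rate multiplied by the media duration carried in each packet:

```python
def audio_timestamp_increment(clock_rate_hz: int, packet_ms: int) -> int:
    """Timestamp units covered by one audio packet of packet_ms milliseconds."""
    return clock_rate_hz * packet_ms // 1000


def video_timestamp_increment(clock_rate_hz: int, frames_per_second: int) -> int:
    """Timestamp units between consecutive video frames."""
    return clock_rate_hz // frames_per_second
```

A 20 ms G.711 packet therefore advances the timestamp by 160, a 20 ms Opus packet by 960, and a 30 fps video stream by 3000 per frame.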

Marker Bit Usage

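{| class="wikitable"
|-
! Media Type !! M=1 Meaning !! Purpose
|-
| Video || Last packet of frame || Frame boundary detection
|-
| Audio (silence suppression) || First packet after silence || Talkspurt indication
|-
| RFC 4733 DTMF || First packet of a DTMF event (the end is signalled by the E bit in the payload) || Event boundary
|}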

References

Primary Standards

  • RFC 3550 - RTP: A Transport Protocol for Real-Time Applications
  • RFC 3551 - RTP Profile for Audio and Video Conferences (AVP)
  • RFC 3711 - The Secure Real-time Transport Protocol (SRTP)
  • RFC 4585 - Extended RTP Profile for RTCP-Based Feedback (AVPF)

Extensions and Updates

  • RFC 5761 - Multiplexing RTP Data and Control Packets on a Single Port
  • RFC 5764 - DTLS Extension to Establish Keys for SRTP
  • RFC 6051 - Rapid Synchronisation of RTP Flows
  • RFC 6222 - Guidelines for Choosing RTCP Canonical Names (CNAMEs), obsoleted by RFC 7022
  • RFC 8285 - A General Mechanism for RTP Header Extensions

Related Protocols

  • RFC 4566 - SDP: Session Description Protocol
  • RFC 3261 - SIP: Session Initiation Protocol
  • RFC 4733 - RTP Payload for DTMF Digits, Telephony Tones

External Resources