Understanding the SIP Protocol
Revision as of 22:54, 11 December 2025
Session Initiation Protocol (SIP) is an application-layer signaling protocol for creating, modifying, and terminating multimedia sessions over IP networks. These sessions can include Internet telephone calls (VoIP), video conferences, or any combination of multimedia streams. SIP itself handles only signaling and control - it establishes the session parameters - while the actual media (audio, video, etc.) is carried by separate protocols (typically RTP). SIP messages are text-based (similar to HTTP) and follow a request/response model. SIP invitations carry session descriptions (usually in SDP, the Session Description Protocol) so that participants can agree on media types and formats. A key design goal of SIP is protocol agility: it is independent of both the underlying transport (UDP, TCP, TLS, etc., by default on port 5060, or 5061 for TLS) and the type of session being established.
SIP was originally defined in RFC 2543 and later refined in RFC 3261 (2002), which became the core SIP standard. Over time, numerous extension RFCs have expanded SIP's capabilities (for reliability, events, IM, security, etc.), making SIP a broad and powerful framework for signaling.
SIP Architecture and Core Components
SIP is a peer-to-peer protocol with a client-server design for message exchange. Its architecture defines several types of network entities, each with specific roles:
User Agent (UA)
A UA is an endpoint in SIP, typically a user's device or software (softphone). It represents an end system and can function as both a client and a server. A UA has two logical sub-roles:
- User Agent Client (UAC) - initiates requests
- User Agent Server (UAS) - responds to requests
For example, if Alice's phone calls Bob's phone, Alice's UA acts as a UAC (sending an INVITE request) and Bob's UA acts as a UAS (receiving the INVITE and sending a response). Once a session is established, the roles can flip for each new transaction (e.g. Bob's phone sending a BYE will be UAC for that request and Alice's phone UAS to respond).
Proxy Server
A SIP proxy is an intermediary that routes SIP requests and responses on behalf of UAs. When a UA sends a request to a SIP address, it typically goes to a proxy server in the domain, which then forwards it towards the destination. Proxies handle tasks like:
- Routing logic (determining the next hop or target for a request)
- Enforcing policies
- Authentication
A request may traverse multiple proxies in sequence. Each proxy may add or modify certain headers (like Via or Record-Route) before forwarding. Responses automatically follow the reverse path of the request through those proxies.
Proxies can be:
- Stateful - maintaining transaction state, allowing forks and smarter handling
- Stateless - simply forwarding messages and forgetting them
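The Via handling described above can be sketched as follows. This is a minimal illustration rather than a proxy implementation: the function names, the pc33 and proxy host names, and the branch values are assumptions made up for the example. The z9hG4bK prefix is the branch magic cookie defined in RFC 3261.

```python
def forward_request(via_stack, proxy_sent_by, branch):
    """Return a new Via stack with this proxy's entry pushed on top."""
    own_via = f"SIP/2.0/UDP {proxy_sent_by};branch={branch}"
    return [own_via] + list(via_stack)


def forward_response(via_stack):
    """Pop this proxy's own Via; the next entry names the hop to send to."""
    remaining = list(via_stack[1:])
    # "SIP/2.0/UDP host;branch=..." -> "host"
    next_hop = remaining[0].split()[1].split(";")[0] if remaining else None
    return remaining, next_hop


# Request path: Alice's UA -> atlanta proxy -> biloxi proxy (hypothetical hosts)
via = ["SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bKnashds8"]
via = forward_request(via, "proxy.atlanta.com", "z9hG4bK77ef4c")
via = forward_request(via, "proxy.biloxi.com", "z9hG4bK4b43c2")

# Response path: each proxy removes its own Via and sends to the next one
via, hop_after_biloxi = forward_response(via)
via, hop_after_atlanta = forward_response(via)
```

Because each hop only ever looks at the top of the stack, the response retraces the request path without any proxy needing to know the full route.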
Registrar
A registrar is a server that handles user registration. UAs use the REGISTER method to sign in with a SIP service, providing their address and current location (IP address or forwarding address). The registrar accepts REGISTER requests and stores the information in a location service (a database of user addresses) for its domain. This allows proxies to later look up where a user is currently reachable. In essence, a registrar binds a user's permanent SIP URI (their Address-of-Record like sip:alice@atlanta.com) to the Contact address of their device.
Redirect Server
A redirect server is a UAS that does not forward requests but instead sends back a special 3xx redirect response informing the UAC of a different route or address to try. For example, if a user has moved to a different domain, a redirect server might respond with "302 Moved Temporarily" and the new contact address. The UAC then sends a new request to that address.
Back-to-Back User Agent (B2BUA)
A B2BUA is an entity that acts as a UA on both sides of a call - effectively terminating the SIP dialog on one side and creating a new one on the other side. Unlike a proxy, which passes messages along, a B2BUA maintains full call state and can inspect or modify any part of the signaling. It behaves as a UAS toward the caller and as a UAC toward the callee, bridging the two call legs. This is used for scenarios such as protocol interworking, media handling, or policy enforcement, where a proxy's limited role isn't enough.
Addressing
SIP addresses are in the form of Uniform Resource Identifiers (URI). A user's public address-of-record looks like an email (e.g. sip:alice@atlanta.com). This URI can be resolved to the user's current Contact address via the registrar's location service.
SIP URIs can also embed telephone numbers (e.g. sip:1234567890@pstn.provider.net), and phone numbers may instead be expressed with the tel: URI scheme. There is also a secure form, sips: (SIP Secure), which requires every hop toward the target domain to use a secure transport (TLS); the final hop inside that domain is secured according to the domain's local policy.
SIP relies on the DNS infrastructure for server location: the procedures in RFC 3263 define that the client will use DNS SRV records, NAPTR records, and A/AAAA records to find the SIP server for the target domain.
Transport and Network
SIP messages can be transported over:
- UDP - most common for telephony
- TCP
- TLS-encrypted TCP - for secure SIP
- SCTP or WebSockets - e.g. SIP over WebSocket in web apps
The protocol includes mechanisms to handle issues like fragmentation (large messages should use TCP) and network failures (via retransmission timers especially on UDP). NAT traversal can be challenging for SIP, because SIP messages and SDP often carry IP addresses and expect end-to-end connectivity. Extensions like rport (RFC 3581) and "outbound" (RFC 5626) address some of these issues.
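As a rough sketch of the UDP retransmission timers mentioned above: in RFC 3261 an INVITE client transaction resends the request at intervals that start at T1 (500 ms by default) and double each time, and gives up when Timer B (64 x T1) fires. The function name below is an assumption for illustration, and the sketch ignores the case where a provisional response stops the retransmissions early.

```python
T1 = 0.5  # RFC 3261 default round-trip estimate, in seconds


def invite_retransmit_times(t1=T1):
    """Times (seconds after the first send) at which an INVITE is resent over UDP.

    Returns (retransmit_times, timeout), where timeout is Timer B.
    """
    timeout = 64 * t1
    times = []
    elapsed = 0.0
    interval = t1
    while elapsed + interval < timeout:
        elapsed += interval
        times.append(elapsed)
        interval *= 2  # the retransmission interval doubles every time
    return times, timeout


times, timeout = invite_retransmit_times()
print(times, timeout)
```

With the default T1 this yields six retransmissions before the 32-second timeout, which is why an unanswered INVITE over UDP appears seven times in a packet capture.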
SIP Messages: Requests and Responses
SIP is a text-based protocol that exchanges messages in a format similar to HTTP. There are two types of SIP messages:
- Requests (methods) - sent by clients to initiate an action
- Responses - sent by servers (or UAs) to convey the result of that request
Each SIP message consists of a start line, one or more header fields, a blank line, and an optional message body.
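The layout just described (start line, header fields, blank line, body) can be split apart with a few lines of Python. This is a simplified sketch: the function name is hypothetical, and it does not handle folded header lines or compact header names, which a real parser would need to.

```python
def parse_sip_message(raw):
    """Split a SIP message into (start_line, headers, body).

    headers is a list of (name, value) pairs in their original order,
    since a header such as Via may legitimately appear several times.
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split at the first colon only
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


raw = (
    "INVITE sip:bob@biloxi.com SIP/2.0\r\n"
    "To: <sip:bob@biloxi.com>\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
start_line, headers, body = parse_sip_message(raw)
```

Splitting at the first colon matters because header values such as SIP URIs contain colons themselves.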
Core SIP Methods
The core SIP specification (RFC 3261) defined six basic methods:
| Method | Description |
|---|---|
| INVITE | Establishes a session (initiate a call). Used to invite participants to a session with SDP for media setup. |
| ACK | Confirms receipt of a final response to an INVITE. Used only with INVITE. |
| BYE | Terminates an established session (hangs up a call). |
| CANCEL | Cancels a pending request (typically used to cancel an unanswered INVITE). |
| REGISTER | Registers the UA's address with a SIP server. |
| OPTIONS | Queries capabilities of a server or UA. Often used for keep-alive or diagnostics. |
Extension Methods
Several extension methods have been introduced by various RFCs:
| Method | RFC | Description |
|---|---|---|
| PRACK | RFC 3262 | Provisional Acknowledgment - acknowledges reliable provisional responses (1xx). |
| SUBSCRIBE | RFC 3265 (obsoleted by RFC 6665) | Subscribes to an event on a server (presence, message waiting, etc.). |
| NOTIFY | RFC 3265 (obsoleted by RFC 6665) | Sends an event notification to a subscriber. |
| PUBLISH | RFC 3903 | Publishes event state (e.g., presence information) to a server. |
| INFO | RFC 2976 (obsoleted by RFC 6086) | Sends mid-session information (often DTMF tones). |
| REFER | RFC 3515 | Asks recipient to issue a new request (call transfer). |
| MESSAGE | RFC 3428 | Conveys instant message within or outside a dialog. |
| UPDATE | RFC 3311 | Modifies session parameters before final INVITE response. |
SIP Response Codes
SIP responses are categorized by their class (hundreds digit):
| Class | Type | Description | Examples |
|---|---|---|---|
| 1xx | Informational | Provisional - request is being processed | 100 Trying, 180 Ringing, 183 Session Progress |
| 2xx | Success | Request succeeded | 200 OK |
| 3xx | Redirection | Try different location | 301 Moved Permanently, 302 Moved Temporarily |
| 4xx | Client Error | Bad request or cannot fulfill | 400 Bad Request, 401 Unauthorized, 404 Not Found, 486 Busy Here |
| 5xx | Server Error | Server failed to fulfill valid request | 500 Server Internal Error, 503 Service Unavailable |
| 6xx | Global Failure | Cannot be fulfilled anywhere | 603 Decline, 604 Does Not Exist Anywhere |
Only final responses (2xx-6xx) terminate a SIP transaction. Provisional (1xx) responses are informative.
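The class rule above reduces to integer division on the status code. The helper names below are assumptions chosen for this sketch.

```python
def response_class(status_code):
    """Return the class label ("1xx" .. "6xx") for a SIP status code."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return f"{status_code // 100}xx"


def is_final(status_code):
    """True for 2xx-6xx responses, which complete the transaction."""
    return response_class(status_code) != "1xx"


for code in (100, 180, 200, 486, 603):
    print(code, response_class(code), is_final(code))
```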
SIP Message Structure and Headers
SIP messages are structured like HTTP messages. After the start line, each message has header fields in "Name: value" format.
Key SIP Headers
| Header | Description |
|---|---|
| Via | Lists network path taken by request. Each proxy adds a Via header. Used to route responses back. Contains branch identifier for duplicate detection. |
| From | Originator's identity (caller). Contains display name, SIP URI, and tag parameter. |
| To | Intended recipient's identity (callee). Tag added by UAS in response. |
| Call-ID | Globally unique identifier for the call/session. |
| CSeq | Sequence number and method name for ordering requests. |
| Contact | Direct URI where UA wants to be reached for future requests. |
| Max-Forwards | Hop count limiter (similar to IP TTL). Default: 70. |
| Content-Type | MIME type of message body (usually application/sdp). |
| Content-Length | Size of message body in bytes. |
Additional Important Headers
- Record-Route / Route - Used by proxies to maintain themselves in the dialog path
- Authorization / WWW-Authenticate - HTTP Digest authentication (challenge-response)
- Supported / Require - Indicate supported or required SIP extensions
Message Body
The body is separated from headers by a blank line. In call setup, the body is usually an SDP (Session Description Protocol) offer or answer describing media streams (audio/video), codecs, IP addresses, and ports.
The offer/answer model (RFC 3264) means one party offers media parameters in INVITE, and the other answers with their selection in 200 OK.
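To show how the headers and the SDP body fit together, here is a minimal sketch that assembles an INVITE carrying an SDP offer. The function name, the host name pc33.atlanta.com, the tag, branch, and Call-ID values, and the SDP contents are all illustrative assumptions. The point of the example is that Content-Length counts the bytes of the body only, after the blank line.

```python
def build_invite(caller, callee, host, call_id, from_tag, branch, sdp):
    """Assemble an INVITE with an SDP offer and return it as bytes."""
    body = sdp.encode("utf-8")
    lines = [
        f"INVITE {callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{callee}>",
        f"From: <{caller}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:alice@{host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body)}",  # size of the body in bytes
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8") + body


# Hypothetical SDP offer: one PCMU audio stream on port 49170
sdp_offer = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.4\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.4\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
)
invite = build_invite(
    "sip:alice@atlanta.com", "sip:bob@biloxi.com", "pc33.atlanta.com",
    "a84b4c76e66710", "1928301774", "z9hG4bKnashds8", sdp_offer,
)
```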
SIP Call Flow Example
This example shows the "SIP trapezoid" - Alice calls Bob through their domain proxy servers.
Call Flow Steps
- INVITE - Alice's softphone sends INVITE to Bob's SIP URI through her proxy (atlanta.com). Includes SDP offer with proposed media parameters.
- Routing - Atlanta proxy forwards INVITE to biloxi.com after DNS lookup (RFC 3263). Adds Via and Record-Route headers.
- 100 Trying - Proxies send this provisional response to stop retransmissions. Indicates "I got the INVITE, working on it."
- Proxy to UAS - Biloxi proxy forwards INVITE to Bob's registered Contact address.
- 180 Ringing - Bob's phone sends this when alerting (ringing). Travels back through proxies to Alice.
- 200 OK - Bob answers. His phone sends 200 OK with SDP answer containing agreed media parameters. Dialog is established.
- ACK - Alice confirms receipt of 200 OK. Sent end-to-end (possibly through Record-Route path).
- Media Session - RTP audio/video streams flow directly between endpoints.
- BYE - Bob hangs up. BYE sent within existing dialog (same Call-ID and tags) through the established path.
- 200 OK (BYE) - Alice acknowledges termination. Call ends.
Canceling a Call
If Alice hangs up before Bob answers, her UA sends a CANCEL request:
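- Uses the same Call-ID and CSeq number as the INVITE, with the CSeq method set to "CANCEL"
- Each stateful proxy answers the CANCEL with 200 OK immediately (hop by hop)
- Bob's UAS sends 487 Request Terminated for the original INVITE
- CANCEL only takes effect before a final response has been sent; once the call is answered, BYE is used instead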
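A CANCEL is derived mechanically from the INVITE it cancels. The sketch below assumes the INVITE's fields are available in a plain dictionary with the hypothetical keys shown. Per RFC 3261, the Request-URI, To, From, Call-ID, and the numeric part of CSeq are copied unchanged, and the single Via must match the top Via of the INVITE so that the CANCEL reaches the same transaction.

```python
def build_cancel(invite):
    """Build a CANCEL request (as text) for a pending INVITE.

    `invite` is a dict with keys: request_uri, Via (top Via value),
    To, From, Call-ID and CSeq (e.g. "314159 INVITE").
    """
    cseq_number = invite["CSeq"].split()[0]
    lines = [
        f"CANCEL {invite['request_uri']} SIP/2.0",
        f"Via: {invite['Via']}",          # same branch as the INVITE
        "Max-Forwards: 70",
        f"To: {invite['To']}",            # no To tag: the dialog is not confirmed yet
        f"From: {invite['From']}",
        f"Call-ID: {invite['Call-ID']}",
        f"CSeq: {cseq_number} CANCEL",    # same number, different method
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


pending_invite = {
    "request_uri": "sip:bob@biloxi.com",
    "Via": "SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bKnashds8",
    "To": "<sip:bob@biloxi.com>",
    "From": "<sip:alice@atlanta.com>;tag=1928301774",
    "Call-ID": "a84b4c76e66710",
    "CSeq": "314159 INVITE",
}
cancel = build_cancel(pending_invite)
```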
Transactions, Dialogs, and Sessions
Transaction
A transaction is a single request and all its responses. It's the fundamental unit of message handling:
- Identified by the branch parameter of the top Via header, together with the Via sent-by value and the request method (which also appears in CSeq)
- Provisional responses (1xx) don't end transactions
- Final responses (2xx-6xx) complete transactions
- Handles retransmissions and timeouts
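One way to picture transaction matching is as a lookup key. The sketch below follows the RFC 3261 rule for server transactions: the branch parameter and sent-by value of the top Via, plus the method, with ACK treated as INVITE so that an ACK for a failure response finds the INVITE transaction. The function name is an assumption, and the Via parsing is deliberately naive.

```python
def transaction_key(top_via, method):
    """Return (branch, sent_by, method) for matching a message to a transaction."""
    first, *params = [part.strip() for part in top_via.split(";")]
    sent_by = first.split()[-1]  # "SIP/2.0/UDP host:port" -> "host:port"
    branch = next(p.split("=", 1)[1] for p in params if p.startswith("branch="))
    if method == "ACK":
        method = "INVITE"  # an ACK for a non-2xx belongs to the INVITE transaction
    return (branch, sent_by, method)


via = "SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bKnashds8"
invite_key = transaction_key(via, "INVITE")
cancel_key = transaction_key(via, "CANCEL")
```

A CANCEL shares the INVITE's branch but has a different method, so it lands in its own transaction, which matches the separate 200 OK it receives.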
Dialog
A dialog is a peer-to-peer relationship between two UAs that persists over time:
- Created by a successful INVITE transaction (a 1xx response carrying a To tag creates an early dialog)
- Identified by: Call-ID + From tag + To tag
- Maintains: route set, remote target (Contact), sequence expectations
- Lasts until BYE or termination
- Allows in-dialog requests: BYE, re-INVITE, UPDATE, INFO, etc.
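The dialog identifier can be modelled as a small tuple. The subtle part is that each UA stores the tags from its own point of view: for a request a UA sent, its local tag is the From tag; for a request it received, its local tag is the To tag. The names below are assumptions for this sketch.

```python
from typing import NamedTuple


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def dialog_id(call_id, from_tag, to_tag, request_from_us):
    """Compute the dialog ID from one UA's point of view.

    request_from_us: True if this UA sent the request the tags come from.
    """
    if request_from_us:
        return DialogId(call_id, from_tag, to_tag)
    return DialogId(call_id, to_tag, from_tag)


# Alice's view of her own INVITE (From tag "a1", Bob answered with To tag "b2")
alice_invite = dialog_id("a84b4c76e66710", "a1", "b2", request_from_us=True)
# Alice's view of Bob's later BYE, where From is Bob ("b2") and To is Alice ("a1")
alice_bye = dialog_id("a84b4c76e66710", "b2", "a1", request_from_us=False)
```

Both computations give the same identifier, which is how Alice's UA recognizes Bob's BYE as belonging to the existing dialog even though From and To are swapped.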
Session
A session is the actual media session negotiated via SDP and carried via RTP. It's what users experience as "the call."
Registration and Location Service
Before calls can be made, UAs typically register with their SIP server:
    REGISTER sip:atlanta.com SIP/2.0
    To: sip:alice@atlanta.com
    From: sip:alice@atlanta.com;tag=xyz
    Contact: <sip:alice@192.0.2.4:5060>
    Expires: 3600
Key points:
- Binds Address-of-Record (AOR) to Contact URI(s)
- Authentication via 401 challenge and Authorization header
- Registrations have limited lifetime (Expires header)
- Must be refreshed periodically
- Multiple devices can register for same AOR
- Location Service database stores these bindings
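The binding behaviour listed above can be sketched as an in-memory location service. The class and method names are hypothetical, and authentication is left out. The injectable clock is only there to make expiry easy to demonstrate; treating Expires: 0 as removal of a binding follows RFC 3261.

```python
import time


class LocationService:
    """Toy AOR -> Contact binding store with expiry."""

    def __init__(self, clock=time.monotonic):
        self._bindings = {}  # aor -> {contact: absolute expiry time}
        self._clock = clock

    def register(self, aor, contact, expires):
        """Add or refresh a binding; expires == 0 removes it."""
        contacts = self._bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = self._clock() + expires

    def lookup(self, aor):
        """Return the contacts for an AOR that have not expired."""
        now = self._clock()
        contacts = self._bindings.get(aor, {})
        return sorted(c for c, deadline in contacts.items() if deadline > now)


service = LocationService()
service.register("sip:alice@atlanta.com", "sip:alice@192.0.2.4:5060", 3600)
print(service.lookup("sip:alice@atlanta.com"))
```

A proxy for the domain would call something like lookup() when an INVITE for the AOR arrives, then forward (or fork) the request to the returned contacts.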
SIP Extensions and Advanced Features
Reliability of Provisional Responses (RFC 3262)
Adds PRACK method to acknowledge provisional responses reliably:
- UAS sends 1xx with Require: 100rel and RSeq header
- UAC must respond with PRACK
- Ensures important 180/183 responses aren't lost
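A reliable provisional response carries Require: 100rel and an RSeq header, and the PRACK that acknowledges it echoes those values in an RAck header made of the RSeq number, the CSeq number, and the method of the response. A minimal sketch, with hypothetical function names and headers held in a plain dictionary:

```python
def requires_prack(headers):
    """True if a provisional response was sent reliably (RFC 3262)."""
    tokens = [t.strip() for t in headers.get("Require", "").split(",")]
    return "100rel" in tokens and "RSeq" in headers


def build_rack(headers):
    """Build the RAck header line for the PRACK acknowledging this response."""
    cseq_number, method = headers["CSeq"].split()
    return f"RAck: {headers['RSeq'].strip()} {cseq_number} {method}"


ringing = {"Require": "100rel", "RSeq": "988789", "CSeq": "1 INVITE"}
if requires_prack(ringing):
    rack_line = build_rack(ringing)
```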
Offer/Answer Model with SDP (RFC 3264)
- One party offers media parameters (codecs, IPs, ports)
- Other party answers with selection
- INVITE contains offer, 200 OK contains answer
- Or: INVITE without SDP, offer in 200 OK, answer in ACK
- re-INVITE or UPDATE can modify session later
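At its simplest, answering an offer means keeping the offered codecs that the answerer also supports. The sketch below works on codec names only and keeps the offerer's preference order, which is one reasonable policy rather than the only permitted one. Real SDP negotiation also deals with payload type numbers, directions, and multiple media lines, none of which is modelled here.

```python
def answer_codecs(offered, supported):
    """Return the offered codecs the answerer supports, in offer order."""
    supported_lower = {codec.lower() for codec in supported}
    return [codec for codec in offered if codec.lower() in supported_lower]


offer = ["PCMU", "PCMA", "opus"]
answer = answer_codecs(offer, supported=["opus", "PCMU"])
print(answer)
```

An empty result means the two sides share no codec, in which case the answerer would typically reject the offer, for example with 488 Not Acceptable Here.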
Session Timer (RFC 4028)
Keep-alive mechanism for dialogs:
- Negotiated interval for session refresh
- Refresher sends periodic re-INVITE/UPDATE
- Cleans up zombie calls if one side crashes
- Uses Session-Expires header
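The timing recommended by RFC 4028 can be written down directly: the refresher sends its refresh once half the session interval has elapsed, and the other side gives up and sends BYE shortly before expiry, with a margin of the smaller of 32 seconds and one third of the interval. The function name is an assumption.

```python
def session_timer_schedule(session_expires):
    """Return (refresh_at, bye_at) in seconds from the last refresh.

    refresh_at: when the refresher should send re-INVITE/UPDATE.
    bye_at: when the non-refresher should send BYE if no refresh arrived.
    """
    refresh_at = session_expires / 2
    bye_margin = min(32, session_expires / 3)
    return refresh_at, session_expires - bye_margin


print(session_timer_schedule(1800))
```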
UPDATE Method (RFC 3311)
Allows changing session parameters before INVITE completes:
- Early media negotiations
- Codec changes during ringing
- Session timer refresh in early stage
REFER Method (RFC 3515)
Used for call transfer:
- Alice sends Bob REFER with Refer-To: Carol's URI
- Bob's UA initiates new INVITE to Carol
- Result reported via NOTIFY
NAT Traversal
rport (RFC 3581)
- UAC includes Via: ...;rport
- Server sends response to source IP/port (not Via address)
- Helps responses get through NAT
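The difference rport makes shows up when choosing where to send a response. The sketch below is a simplified model with a hypothetical function name: Via parameters are a dictionary, a bare rport flag is represented by an empty string, and the addresses are documentation examples.

```python
def response_destination(via_host, via_port, via_params, src_ip, src_port):
    """Return ((ip, port), updated_via_params) for sending a response over UDP."""
    params = dict(via_params)
    if "rport" in params:
        # RFC 3581: record where the request really came from and reply there
        params["received"] = src_ip
        params["rport"] = str(src_port)
        return (src_ip, src_port), params
    # Without rport: reply to the source address but to the port named in Via
    if via_host != src_ip:
        params["received"] = src_ip
    return (params.get("received", via_host), via_port), params


# A client behind NAT: Via says 10.0.0.5:5060, the packet arrived from 203.0.113.7:40000
with_rport, _ = response_destination("10.0.0.5", 5060, {"rport": ""}, "203.0.113.7", 40000)
without_rport, _ = response_destination("10.0.0.5", 5060, {}, "203.0.113.7", 40000)
```

Without rport the response goes to port 5060 on the NAT's public address, where no mapping exists; with rport it goes to the port the NAT actually opened.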
Outbound (RFC 5626)
- Persistent connection for registration and incoming requests
- Keep-alive messages (CRLF or STUN pings)
- Instance-ID for device identification
- Crucial for mobile devices and NAT-heavy environments
Instant Messaging and Presence (SIP SIMPLE)
- MESSAGE method for pager-mode IM
- SUBSCRIBE/NOTIFY for presence (RFC 3856)
- PUBLISH for publishing presence state
Security Mechanisms
- TLS - Transport layer encryption (port 5061, SIPS URI)
- HTTP Digest Authentication - Challenge-response for registration/calls
- STIR/SHAKEN - Caller ID authentication to prevent spoofing (RFC 8224)
- SRTP - Secure media via SDES or DTLS-SRTP key exchange
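The challenge-response step of HTTP Digest authentication can be sketched as follows. This shows only the basic MD5 form without the qop, cnonce, and nonce-count parameters, so it is an illustration of the idea rather than a complete client. The function names and all credential values are made up for the example.

```python
import hashlib


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the Digest 'response' value (basic form, no qop)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # identity and secret
    ha2 = _md5_hex(f"{method}:{uri}")                 # what is being requested
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server's nonce


# Values the UA would take from a 401 challenge (hypothetical)
response = digest_response(
    "alice", "atlanta.com", "secret", "REGISTER", "sip:atlanta.com", "84a4cc6f3082121f",
)
```

The password never crosses the network; the server repeats the same computation with its stored credentials and compares the results.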
Key RFCs
| RFC | Title |
|---|---|
| RFC 3261 | SIP: Session Initiation Protocol (core spec) |
| RFC 3262 | Reliability of Provisional Responses (PRACK) |
| RFC 3263 | Locating SIP Servers |
| RFC 3264 | Offer/Answer Model with SDP |
| RFC 3265 | Event Notification Framework (SUBSCRIBE/NOTIFY), obsoleted by RFC 6665 |
| RFC 3311 | UPDATE Method |
| RFC 3326 | Reason Header |
| RFC 3428 | MESSAGE Method (Instant Messaging) |
| RFC 3515 | REFER Method |
| RFC 3581 | Symmetric Response Routing (rport) |
| RFC 3903 | PUBLISH Method |
| RFC 4028 | Session Timers |
| RFC 5411 | A Hitchhiker's Guide to SIP (RFC summary) |
| RFC 5626 | Managing Client-Initiated Connections (Outbound) |