Understanding the SIP Protocol

From VoIPmonitor.org
__NOTOC__
{| class="wikitable" style="width:100%; background:#f8f9fa; border:2px solid #00A7E3; margin-bottom:20px;"
|-
! colspan="3" style="background:#00A7E3; color:white; font-size:1.2em; padding:10px;" | Quick Navigation
|-
! style="width:33%; background:#e0f4fc; padding:8px; vertical-align:top;" | Architecture & Basics
! style="width:33%; background:#fef3e2; padding:8px; vertical-align:top;" | Messages & Call Flow
! style="width:33%; background:#f1f5f9; padding:8px; vertical-align:top;" | Advanced Topics
|-
| style="vertical-align:top; padding:10px;" |
'''Core Components'''
* [[#User Agent (UA)|User Agent (UAC/UAS)]]
* [[#Proxy Server|Proxy Server]]
* [[#Registrar|Registrar]]
* [[#Redirect Server|Redirect Server]]
* [[#Back-to-Back User Agent (B2BUA)|B2BUA]]
'''Addressing & Transport'''
* [[#Addressing|SIP URIs]]
* [[#Transport and Network|UDP/TCP/TLS/WebSocket]]
| style="vertical-align:top; padding:10px;" |
'''SIP Methods'''
* [[#Core SIP Methods|INVITE, ACK, BYE, CANCEL]]
* [[#Extension Methods|PRACK, REFER, SUBSCRIBE]]
'''Response Codes'''
* [[#SIP Response Codes|1xx-6xx Classes]]
'''Headers & Structure'''
* [[#SIP Message Structure and Headers|Key Headers]]
* [[#Example: Complete SIP INVITE Message|INVITE Example]]
* [[#Example: SDP Body|SDP Example]]
'''Call Flows'''
* [[#SIP Call Flow Example (Basic Call Setup)|Basic Call Setup]]
* [[#Canceling a call|CANCEL Flow]]
| style="vertical-align:top; padding:10px;" |
'''Core Concepts'''
* [[#Transaction|Transactions]]
* [[#Dialog|Dialogs]]
* [[#Session|Sessions]]
'''Registration'''
* [[#Registration and Location Service|REGISTER Flow]]
* [[#Quick Reference: Registration|Registration Cheat Sheet]]
'''Extensions'''
* [[#Reliability of Provisional Responses (RFC 3262)|PRACK (100rel)]]
* [[#Session Timer (RFC 4028)|Session Timers]]
* [[#REFER Method (RFC 3515)|Call Transfer]]
* [[#NAT Traversal|NAT Traversal]]
'''Reference'''
* [[#Key RFCs|RFC Links]]
|}
'''Session Initiation Protocol (SIP)''' is an application-layer signaling protocol designed for creating, modifying, and terminating multimedia sessions over IP networks. These sessions can include Internet telephone calls (VoIP), video conferences, or any combination of multimedia streams. SIP itself handles the signaling and control portion – it establishes the session parameters – while the actual media (audio, video, etc.) is carried over separate protocols (typically RTP). SIP messages are text-based (similar to HTTP) and use a request/response model. SIP invitations to sessions carry session descriptions (usually using the SDP protocol) so that participants can agree on media types and formats. A key design goal of SIP is protocol agility: it is independent of the underlying transport (UDP, TCP, TLS, etc. on port 5060/5061 by default) and of the type of session being established.


SIP was originally defined in RFC 2543 and later refined in [https://datatracker.ietf.org/doc/html/rfc3261 RFC 3261] (2002), which became the core SIP standard. Over time, numerous extension RFCs have expanded SIP's capabilities (for reliability, events, IM, security, etc.), making SIP a broad and powerful framework for signaling. Despite the many formal definitions in the RFCs, this guide aims to explain SIP in an accessible way – serving as a "cheat sheet" to understand SIP signaling without getting lost in the exhaustive RFC language. We will cover SIP's architecture, message format, call flow, and key features/extensions, providing a solid reference for anyone new to SIP or looking to grasp the full picture.


== SIP Architecture and Core Components ==
=== Addressing ===


SIP addresses are Uniform Resource Identifiers (URIs). A user's public address-of-record looks like an email address (e.g. <code>sip:alice@atlanta.com</code>). This URI can be resolved to the user's current Contact address via the registrar's location service. SIP URIs can also embed telephone numbers (e.g. <code>sip:1234567890@pstn.provider.net</code>), and phone numbers may instead be expressed with the <code>tel:</code> URI scheme ([https://datatracker.ietf.org/doc/html/rfc3966 RFC 3966]). There is also a secure form, <code>sips:</code> (SIP Secure), which requires that the request be carried over a secure transport (TLS) on each hop toward the target domain. When a UA wants to reach another user, it sends the request toward the domain part of the SIP URI. SIP relies on DNS for server location: the procedures in [https://datatracker.ietf.org/doc/html/rfc3263 RFC 3263] define how a client uses NAPTR, SRV, and A/AAAA records to find the SIP server for the target domain. For example, to send a request to <code>sip:bob@biloxi.com</code>, Alice's device will DNS-resolve biloxi.com for SIP service, possibly discovering a proxy server address to send the request to. This allows SIP to route messages globally using the DNS naming system.
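To make the URI structure concrete, here is a minimal Python sketch that splits a <code>sip:</code> or <code>sips:</code> URI into scheme, user, host, port, and parameters. The names <code>SipUri</code>, <code>parse_sip_uri</code>, and <code>effective_port</code> are hypothetical, and the parser is deliberately simplified (no URI headers, no IPv6 literals, no escaping), so treat it as an illustration rather than a complete implementation of the URI grammar.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SipUri:
    scheme: str
    user: Optional[str]
    host: str
    port: Optional[int]
    params: Dict[str, str] = field(default_factory=dict)


def parse_sip_uri(uri: str) -> SipUri:
    """Split a sip:/sips: URI into its parts (simplified sketch)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    # Everything after the first ';' is treated as URI parameters.
    rest, *raw_params = rest.split(";")
    user, at, hostport = rest.rpartition("@")
    host, colon, port = hostport.partition(":")
    params = {}
    for raw in raw_params:
        name, _, value = raw.partition("=")
        params[name.lower()] = value
    return SipUri(
        scheme=scheme.lower(),
        user=user if at else None,
        host=host.lower(),
        port=int(port) if colon else None,
        params=params,
    )


def effective_port(uri: SipUri) -> int:
    """Explicit port if present, else the default (5061 for sips, 5060 for sip)."""
    if uri.port is not None:
        return uri.port
    return 5061 if uri.scheme == "sips" else 5060
```

A real client would consult DNS (NAPTR and SRV records) before falling back to the default port; the fallback here only mirrors the defaults mentioned in the introduction.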


=== Transport and Network ===


SIP messages can be transported over UDP (most common for telephony), TCP, or TLS over TCP (for secure SIP). Other transports such as SCTP and WebSocket can also be used (e.g. SIP over WebSocket in web apps). The protocol includes mechanisms to handle issues like fragmentation (large messages should be sent over TCP rather than UDP) and packet loss (via retransmission timers, especially on UDP). NAT traversal can be challenging for SIP, because SIP messages and SDP often carry IP addresses and expect end-to-end connectivity. Extensions like rport ([https://datatracker.ietf.org/doc/html/rfc3581 RFC 3581]) and "outbound" ([https://datatracker.ietf.org/doc/html/rfc5626 RFC 5626]) address some of these issues (see later section on extensions), enabling symmetric response routing and keep-alive mechanisms to handle NATs.
 
=== Quick Reference: SIP Components ===
 
{| class="wikitable"
|-
! Component !! Role !! Key Function
|-
| '''User Agent (UA)''' || Endpoint || Initiates/receives calls. Acts as UAC (client) or UAS (server).
|-
| '''Proxy Server''' || Intermediary || Routes requests/responses. Can be stateful or stateless.
|-
| '''Registrar''' || Registration || Accepts REGISTER, stores user location bindings.
|-
| '''Redirect Server''' || Routing || Returns 3xx responses with alternate contact addresses.
|-
| '''B2BUA''' || Call control || Terminates/originates dialogs on both sides. Full call state.
|-
| '''Location Service''' || Database || Stores AOR-to-Contact mappings for user lookup.
|}
 
{| class="wikitable"
|-
! URI Scheme !! Description !! Example
|-
| <code>sip:</code> || Standard SIP URI || <code>sip:alice@atlanta.com</code>
|-
| <code>sips:</code> || Secure SIP (TLS required) || <code>sips:alice@atlanta.com</code>
|-
| <code>tel:</code> || Telephone number || <code>tel:+1-555-123-4567</code>
|}
 
{| class="wikitable"
|-
! Transport !! Port !! Notes
|-
| UDP || 5060 || Most common, requires retransmission handling
|-
| TCP || 5060 || For large messages, reliable delivery
|-
| TLS || 5061 || Encrypted signaling
|-
| WebSocket || 80/443 || For web applications ([https://datatracker.ietf.org/doc/html/rfc7118 RFC 7118])
|}


== SIP Messages: Requests and Responses ==
SIP is a text-based protocol that exchanges messages in a format similar to HTTP. There are two types of SIP messages: '''Requests''' (also called methods) sent by clients to initiate an action, and '''Responses''' sent by servers (or UAs) to convey the result of that request. Each SIP message consists of a start line, zero or more header fields, a blank line, and an optional message body.
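The message layout described above (start line, header fields, blank line, optional body) can be sketched in a few lines of Python. The function names <code>parse_sip_message</code> and <code>is_request</code> are hypothetical, and the sketch assumes CRLF line endings and ignores header line folding and compact header forms.

```python
from typing import List, Tuple


def parse_sip_message(raw: str) -> Tuple[str, List[Tuple[str, str]], str]:
    """Return (start_line, headers, body) for a SIP message given as text."""
    # The first empty line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def is_request(start_line: str) -> bool:
    """Responses begin with the SIP version; requests begin with a method."""
    return not start_line.startswith("SIP/2.0")
```

Headers are kept as an ordered list of pairs rather than a dictionary because some headers (Via, Route, Record-Route) may appear several times and their order matters.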


A Request start-line includes a method name and a Request-URI (the target address) along with the SIP version. For example: <code>INVITE sip:bob@biloxi.com SIP/2.0</code>. There are a number of standard methods defined. The core SIP specification ([https://datatracker.ietf.org/doc/html/rfc3261 RFC 3261]) defined six basic methods, and subsequent RFCs added additional methods for extended functionality. Below is a list of the common SIP request methods and their purpose:


=== Core SIP Methods ===
In addition to these core methods, several extension methods have been introduced by various RFCs to extend SIP's functionality:


* '''PRACK''' – Provisional Acknowledgment. PRACK (defined in [https://datatracker.ietf.org/doc/html/rfc3262 RFC 3262]) is used to acknowledge provisional responses (1xx) that are sent reliably. It improves reliability of ringing or early media responses (see section on provisional reliability).


* '''SUBSCRIBE''' – Subscribes to an event on a server. Defined in [https://datatracker.ietf.org/doc/html/rfc3265 RFC 3265], SUBSCRIBE allows a client to request notifications of events (such as presence changes, message waiting, etc.) from another entity.


* '''NOTIFY''' – Sends an event notification to a subscriber. When an event a user subscribed to occurs, the notifier (usually a server or UA) sends a NOTIFY to inform the subscriber of the new state.


* '''PUBLISH''' – Publishes an event state to a server. Defined in [https://datatracker.ietf.org/doc/html/rfc3903 RFC 3903], PUBLISH allows a UA to push its current state (e.g., presence information) to a server, which can then distribute it to subscribers.


* '''INFO''' – Sends mid-session information that does not modify the session state. Defined in [https://datatracker.ietf.org/doc/html/rfc2976 RFC 2976], this method is often used to carry DTMF digits or other signals during a call within the signaling path, i.e. out-of-band from the RTP media (though newer mechanisms may replace INFO for that).


* '''REFER''' – Asks the recipient to issue a new request (typically to transfer a call). Defined in [https://datatracker.ietf.org/doc/html/rfc3515 RFC 3515], REFER is used to instruct a UA to contact a third party (e.g., Alice, in a call with Bob, sends Bob a REFER to call Charlie – effectively transferring or adding a party).


* '''MESSAGE''' – Conveys an instant message (IM) within a SIP dialog or as a standalone out-of-dialog message. Defined in [https://datatracker.ietf.org/doc/html/rfc3428 RFC 3428], MESSAGE carries textual chat content in the SIP body, enabling basic instant messaging.


* '''UPDATE''' – Modifies the session parameters of an existing dialog before the final INVITE response. Defined in [https://datatracker.ietf.org/doc/html/rfc3311 RFC 3311], UPDATE can change session settings (like codecs or media streams) or send an offer/answer negotiation in early dialog, without waiting for the initial INVITE to complete.


(Note: There are a few more methods and many SIP header extensions defined in various RFCs and domain-specific SIP profiles (e.g., INFO packages, PING as a keepalive in some systems, etc.), but the above are the primary methods you'll encounter. Together, they make SIP a very flexible protocol.)


Only final responses (2xx–6xx) terminate a SIP transaction. Provisional (1xx) responses are informative and do not terminate the transaction (except they may cease retransmissions of the request in some cases). Some specific response codes have special handling in SIP (for example, 100 is never forwarded by proxies, 407 Proxy Authentication Required triggers proxy auth, 487 is used to indicate a canceled request, etc.), but the above categories suffice for a general understanding.
=== Quick Reference: SIP Methods ===
{| class="wikitable"
|-
! Method !! RFC !! Creates Dialog? !! Description
|-
| '''INVITE''' || [https://datatracker.ietf.org/doc/html/rfc3261 3261] || Yes || Initiate session/call
|-
| '''ACK''' || [https://datatracker.ietf.org/doc/html/rfc3261 3261] || No || Confirm INVITE final response
|-
| '''BYE''' || [https://datatracker.ietf.org/doc/html/rfc3261 3261] || No (ends) || Terminate session
|-
| '''CANCEL''' || [https://datatracker.ietf.org/doc/html/rfc3261 3261] || No || Cancel pending INVITE
|-
| '''REGISTER''' || [https://datatracker.ietf.org/doc/html/rfc3261 3261] || No || Register contact with server
|-
| '''OPTIONS''' || [https://datatracker.ietf.org/doc/html/rfc3261 3261] || No || Query capabilities
|-
| '''PRACK''' || [https://datatracker.ietf.org/doc/html/rfc3262 3262] || No || Acknowledge reliable provisional
|-
| '''SUBSCRIBE''' || [https://datatracker.ietf.org/doc/html/rfc3265 3265] || Yes || Subscribe to events
|-
| '''NOTIFY''' || [https://datatracker.ietf.org/doc/html/rfc3265 3265] || No || Send event notification
|-
| '''PUBLISH''' || [https://datatracker.ietf.org/doc/html/rfc3903 3903] || No || Publish event state
|-
| '''INFO''' || [https://datatracker.ietf.org/doc/html/rfc2976 2976] || No || Mid-session info (DTMF)
|-
| '''REFER''' || [https://datatracker.ietf.org/doc/html/rfc3515 3515] || Yes (when sent outside a dialog) || Request call transfer
|-
| '''MESSAGE''' || [https://datatracker.ietf.org/doc/html/rfc3428 3428] || No || Instant message
|-
| '''UPDATE''' || [https://datatracker.ietf.org/doc/html/rfc3311 3311] || No || Modify session (early dialog)
|}
=== Quick Reference: Response Codes ===
{| class="wikitable"
|-
! Code !! Meaning !! Notes
|-
! colspan="3" | '''1xx – Provisional (Informational)'''
|-
| 100 || Trying || Stops retransmissions, not forwarded by proxies
|-
| 180 || Ringing || Callee alerting
|-
| 181 || Call Being Forwarded || Call is being forwarded
|-
| 182 || Queued || Call queued
|-
| 183 || Session Progress || Early media / progress info
|-
! colspan="3" | '''2xx – Success'''
|-
| 200 || OK || Request succeeded
|-
| 202 || Accepted || Request accepted (async processing)
|-
! colspan="3" | '''3xx – Redirection'''
|-
| 300 || Multiple Choices || Multiple options available
|-
| 301 || Moved Permanently || User permanently at new location
|-
| 302 || Moved Temporarily || User temporarily at new location
|-
| 305 || Use Proxy || Must use specified proxy
|-
! colspan="3" | '''4xx – Client Error'''
|-
| 400 || Bad Request || Malformed syntax
|-
| 401 || Unauthorized || Requires authentication
|-
| 403 || Forbidden || Request refused
|-
| 404 || Not Found || User not found
|-
| 405 || Method Not Allowed || Method not supported
|-
| 407 || Proxy Auth Required || Proxy authentication needed
|-
| 408 || Request Timeout || No response in time
|-
| 415 || Unsupported Media Type || Body format not supported
|-
| 420 || Bad Extension || Required extension not supported
|-
| 480 || Temporarily Unavailable || Callee unavailable
|-
| 481 || Call/Transaction Does Not Exist || Dialog/transaction not found
|-
| 486 || Busy Here || Callee busy
|-
| 487 || Request Terminated || Request was CANCELed
|-
| 488 || Not Acceptable Here || SDP not acceptable
|-
! colspan="3" | '''5xx – Server Error'''
|-
| 500 || Server Internal Error || Server failure
|-
| 501 || Not Implemented || Method not implemented
|-
| 502 || Bad Gateway || Gateway error
|-
| 503 || Service Unavailable || Server overloaded/maintenance
|-
| 504 || Server Timeout || Gateway timeout
|-
! colspan="3" | '''6xx – Global Failure'''
|-
| 600 || Busy Everywhere || All endpoints busy
|-
| 603 || Decline || Call declined by user
|-
| 604 || Does Not Exist Anywhere || User doesn't exist
|-
| 606 || Not Acceptable || No acceptable media
|}
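As a small illustration of the table above, the following Python sketch maps a status code to its class and tells provisional responses apart from final ones. The names <code>response_class</code> and <code>is_final</code> are hypothetical helpers, not part of any SIP library.

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(code: int) -> str:
    """Return the class name for a SIP status code (100-699)."""
    if not 100 <= code <= 699:
        raise ValueError(f"invalid SIP status code: {code}")
    return RESPONSE_CLASSES[code // 100]


def is_final(code: int) -> bool:
    """Final responses (2xx-6xx) complete a transaction; 1xx responses do not."""
    response_class(code)  # validates the range
    return code >= 200
```

For example, <code>is_final(180)</code> is false because 180 Ringing is provisional, while <code>is_final(487)</code> is true because 487 Request Terminated is a final response.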


== SIP Message Structure and Headers ==
=== Authorization / WWW-Authenticate ===


SIP uses an HTTP Digest authentication model. When a UAC sends a request that needs authentication (like REGISTER or an INVITE to a realm requiring auth), the server can respond with 401 Unauthorized (or 407 Proxy Authentication Required for proxy-auth) along with a WWW-Authenticate (or Proxy-Authenticate) header challenging the user. The UAC then resends the request with an Authorization (or Proxy-Authorization) header containing the credentials (username, realm, nonce, response, etc.) as per the HTTP Digest algorithm. This challenge-response handshake ensures the user is who they claim (given a shared password or secret). This is how SIP enforces user authentication for things like registration and calling. The details follow [https://datatracker.ietf.org/doc/html/rfc2617 RFC 2617] (HTTP Digest). From the user perspective: the first call attempt might get a 401, then the phone automatically resends with credentials, then it succeeds with 200 OK.
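The <code>response</code> value in the Authorization header is a hash over the credentials and the challenge. Below is a minimal sketch of that computation following the MD5 formulas of RFC 2617. The function name <code>digest_response</code> is hypothetical, and only the no-qop case and <code>qop=auth</code> are covered (<code>auth-int</code>, which also hashes the body, is left out).

```python
import hashlib
from typing import Optional


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(
    username: str,
    realm: str,
    password: str,
    method: str,
    uri: str,
    nonce: str,
    qop: Optional[str] = None,
    nc: Optional[str] = None,
    cnonce: Optional[str] = None,
) -> str:
    """Compute the Digest 'response' parameter (MD5, no qop or qop=auth)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # credentials hash
    ha2 = _md5_hex(f"{method}:{uri}")                 # request hash
    if qop is None:
        return _md5_hex(f"{ha1}:{nonce}:{ha2}")
    if qop != "auth":
        raise ValueError("only qop=auth is handled in this sketch")
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

For a SIP REGISTER, <code>method</code> would be <code>"REGISTER"</code> and <code>uri</code> the Request-URI (for instance <code>sip:atlanta.com</code>, used here only as an illustrative value); the server performs the same computation with its stored secret and compares the results.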


=== Supported / Require ===
These headers are used to indicate extensions. A UA can include a Supported header to list the optional SIP extensions (option tags) it supports, such as 100rel, timer, or replaces. The Require header insists that the recipient must support a certain extension or the request fails. For example, <code>Require: 100rel</code> might be in an INVITE to demand that provisional reliability (PRACK) is used. If the recipient doesn't support it, it rejects the request with 420 Bad Extension and lists the offending option tags in an Unsupported header. Generally, Require is used sparingly (only when an extension is critical), while Supported and Allow advertise capabilities.
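The Require check can be sketched as follows. The helper <code>check_required_extensions</code> is hypothetical; it takes the raw value of a Require header and the set of option tags the receiving UA implements, and returns the status code and extra headers a UAS might answer with.

```python
from typing import Dict, Optional, Set, Tuple


def check_required_extensions(
    require_header: str, supported: Set[str]
) -> Tuple[Optional[int], Dict[str, str]]:
    """Return (420, {"Unsupported": ...}) if a required option tag is unknown,
    otherwise (None, {}) meaning the request may be processed further."""
    supported_lc = {tag.lower() for tag in supported}
    required = [t.strip().lower() for t in require_header.split(",") if t.strip()]
    unsupported = [t for t in required if t not in supported_lc]
    if unsupported:
        return 420, {"Unsupported": ", ".join(unsupported)}
    return None, {}
```

With <code>Require: 100rel, timer</code> and a UA that only implements <code>100rel</code>, this sketch yields a 420 response carrying <code>Unsupported: timer</code>.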


The SIP headers collectively provide a rich amount of information and control. The complete set of SIP header fields is defined in [https://datatracker.ietf.org/doc/html/rfc3261#section-20 RFC 3261 Section 20] and subsequent RFCs. For everyday use, the ones discussed above are the most crucial to understand the call flows.
 
=== Example: Complete SIP INVITE Message ===
 
Below is a complete example of a SIP INVITE request with all essential headers:
 
<pre>
INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 156
 
v=0
o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.com
s=Session SDP
c=IN IP4 pc33.atlanta.com
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
</pre>
 
Key elements:
* '''Request line''': Method (INVITE), Request-URI (sip:bob@biloxi.com), SIP version
* '''Via''': Shows origin host with branch ID for transaction matching
* '''From/To''': Caller and callee identities (From has tag, To will get tag in response)
* '''Call-ID''': Unique identifier for this call
* '''CSeq''': Sequence number (314159) + method name
* '''Contact''': Where Alice can be directly reached
* '''Content-Type/Length''': Indicates SDP body follows
* '''Body''': SDP session description (see below)
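Content-Length counts the bytes of the body only, that is, everything after the blank line, including the CRLF that ends each SDP line. A minimal sketch of how a sender might compute it for the body in the example above (the names <code>SDP_LINES</code>, <code>build_body</code>, and <code>content_length</code> are hypothetical):

```python
SDP_LINES = [
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.com",
    "s=Session SDP",
    "c=IN IP4 pc33.atlanta.com",
    "t=0 0",
    "m=audio 49172 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
]


def build_body(lines):
    """Join SDP lines, terminating each one with CRLF."""
    return "".join(line + "\r\n" for line in lines)


def content_length(body: str) -> int:
    """Content-Length is the size of the encoded body in bytes, not characters."""
    return len(body.encode("utf-8"))


if __name__ == "__main__":
    print("Content-Length:", content_length(build_body(SDP_LINES)))
```

Computing the value from the encoded body rather than hard-coding it avoids mismatches that some SIP stacks treat as a malformed message.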
 
=== Example: SDP Body ===
 
The Session Description Protocol (SDP) body in SIP messages describes the media session parameters:
 
<pre>
v=0
o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.com
s=Session SDP
c=IN IP4 pc33.atlanta.com
t=0 0
m=audio 49172 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
a=ptime:20
a=sendrecv
</pre>
 
{| class="wikitable"
|-
! Line !! Meaning
|-
| <code>v=0</code> || SDP version (always 0)
|-
| <code>o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.com</code> || Origin: username, session-id, version, network type, address
|-
| <code>s=Session SDP</code> || Session name (required but often ignored)
|-
| <code>c=IN IP4 pc33.atlanta.com</code> || Connection info: where to send media (IP address)
|-
| <code>t=0 0</code> || Timing: start and stop times (0 0 = unbounded)
|-
| <code>m=audio 49172 RTP/AVP 0 8 97</code> || Media: type (audio), port (49172), protocol (RTP/AVP), payload types
|-
| <code>a=rtpmap:0 PCMU/8000</code> || Attribute: payload 0 = G.711 μ-law at 8000 Hz
|-
| <code>a=rtpmap:8 PCMA/8000</code> || Attribute: payload 8 = G.711 A-law at 8000 Hz
|-
| <code>a=rtpmap:97 iLBC/8000</code> || Attribute: payload 97 = iLBC codec
|-
| <code>a=ptime:20</code> || Packetization time: 20ms per packet
|-
| <code>a=sendrecv</code> || Direction: bidirectional media
|}
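The fields in the table above are what a receiver needs in order to know where to send media and in which format. The following sketch pulls them out of an SDP body. The function <code>parse_sdp_audio</code> is hypothetical and simplified: it assumes a single session-level <code>c=</code> line and one audio stream, and it falls back to the static payload types 0 (PCMU) and 8 (PCMA) when no <code>rtpmap</code> line is present.

```python
# Static payload types that do not strictly need an rtpmap line.
STATIC_PAYLOADS = {0: "PCMU/8000", 8: "PCMA/8000"}

EXAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.com
s=Session SDP
c=IN IP4 pc33.atlanta.com
t=0 0
m=audio 49172 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
a=ptime:20
a=sendrecv
"""


def parse_sdp_audio(sdp: str):
    """Return (address, port, {payload_type: codec}) for the first audio stream."""
    address = None
    port = None
    payload_types = []
    rtpmap = {}
    for raw_line in sdp.splitlines():
        kind, _, value = raw_line.strip().partition("=")
        if kind == "c":
            # c=<nettype> <addrtype> <connection-address>
            address = value.split()[2]
        elif kind == "m" and value.startswith("audio ") and port is None:
            # m=<media> <port> <proto> <payload types...>
            fields = value.split()
            port = int(fields[1])
            payload_types = [int(pt) for pt in fields[3:]]
        elif kind == "a" and value.startswith("rtpmap:"):
            pt, _, encoding = value[len("rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = encoding
    codecs = {pt: rtpmap.get(pt) or STATIC_PAYLOADS.get(pt) for pt in payload_types}
    return address, port, codecs
```

Calling <code>parse_sdp_audio(EXAMPLE_SDP)</code> returns the connection address, the RTP port, and a mapping from payload type to codec name for the example shown above.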


=== Message Body and SDP ===
Common SDP direction attributes:
* <code>sendrecv</code> – bidirectional (normal call)
* <code>sendonly</code> – only sending (music on hold from server)
* <code>recvonly</code> – only receiving
* <code>inactive</code> – no media (call on hold)


Finally, if a SIP message contains a body, it is separated from headers by a blank line. In call setup, the body is usually an SDP offer or answer, containing lines that describe media streams (audio/video), codecs, IP addresses, and ports. SIP itself is agnostic to the body content, treating it as opaque data to be passed along. The default and most common body type is '''SDP (Session Description Protocol)''', which SIP MUST support according to RFC 3261, as it's the standard way to negotiate media for calls. The offer/answer model used by SIP (RFC 3264) means one party offers a set of media parameters in an INVITE or in a response, and the other party answers with their selection, allowing both to agree on how to communicate media. For instance, Alice's INVITE might include an SDP offer proposing an audio stream with certain codecs; Bob's 200 OK carries an SDP answer selecting one codec and confirming media parameters. This way, by the time the call is established, both sides know what IP/port to send media to and in which format. (If an INVITE doesn't contain SDP, the offer may be delayed to the 200 OK, and then the ACK carries the answer – but the end result is the same negotiation.) Aside from calls, other uses of SIP can have different body types or none at all (e.g., MESSAGE might carry text in the body, or a SIP NOTIFY for a voicemail could carry an XML message summary, etc., with appropriate Content-Type indicating the format).
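The codec selection step of the offer/answer exchange can be sketched as below. This is only one possible policy, assumed here for illustration: the answerer picks the first codec in the offerer's preference order that it also supports. The function name <code>choose_codec</code> is hypothetical, and a real answer may list several codecs rather than one.

```python
from typing import Iterable, List, Optional


def choose_codec(offered: List[str], supported: Iterable[str]) -> Optional[str]:
    """Return the first offered codec the answerer supports, or None.

    Codec names are compared case-insensitively. None means there is no
    common codec, in which case a UAS would typically reject the offer
    with 488 Not Acceptable Here.
    """
    supported_lc = {name.lower() for name in supported}
    for codec in offered:
        if codec.lower() in supported_lc:
            return codec
    return None
```

For example, if Alice offers PCMU, PCMA, and iLBC and Bob only supports PCMA and iLBC, this policy selects PCMA, which Bob would then place in the SDP answer of his 200 OK.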
=== Quick Reference: SIP Headers ===
 
{| class="wikitable"
|-
! Header !! Purpose !! Example
|-
| '''Via''' || Route responses back; branch ID || <code>Via: SIP/2.0/UDP pc.atlanta.com;branch=z9hG4bK...</code>
|-
| '''From''' || Caller identity + tag || <code>From: "Alice" <sip:alice@atlanta.com>;tag=1234</code>
|-
| '''To''' || Callee identity + tag (in response) || <code>To: "Bob" <sip:bob@biloxi.com>;tag=5678</code>
|-
| '''Call-ID''' || Unique call identifier || <code>Call-ID: a84b4c76e66710@pc.atlanta.com</code>
|-
| '''CSeq''' || Request sequence number + method || <code>CSeq: 314159 INVITE</code>
|-
| '''Contact''' || Direct reachable address || <code>Contact: <sip:alice@192.0.2.4:5060></code>
|-
| '''Max-Forwards''' || Hop limit (TTL) || <code>Max-Forwards: 70</code>
|-
| '''Content-Type''' || Body MIME type || <code>Content-Type: application/sdp</code>
|-
| '''Content-Length''' || Body size in bytes || <code>Content-Length: 142</code>
|-
| '''Record-Route''' || Proxy stays in path || <code>Record-Route: <sip:proxy.atlanta.com;lr></code>
|-
| '''Route''' || Forced routing path || <code>Route: <sip:proxy.atlanta.com;lr></code>
|-
| '''Authorization''' || Auth credentials || <code>Authorization: Digest username="alice"...</code>
|-
| '''WWW-Authenticate''' || Auth challenge || <code>WWW-Authenticate: Digest realm="atlanta.com"...</code>
|-
| '''Supported''' || Optional extensions supported || <code>Supported: 100rel, timer, replaces</code>
|-
| '''Require''' || Mandatory extensions || <code>Require: 100rel</code>
|-
| '''Allow''' || Supported methods || <code>Allow: INVITE, ACK, BYE, CANCEL, OPTIONS</code>
|-
| '''Expires''' || Registration/subscription lifetime || <code>Expires: 3600</code>
|-
| '''Session-Expires''' || Session timer value || <code>Session-Expires: 1800;refresher=uac</code>
|-
| '''Refer-To''' || Transfer target || <code>Refer-To: <sip:carol@chicago.com></code>
|-
| '''Event''' || Event package type || <code>Event: presence</code>
|-
| '''Subscription-State''' || Subscription status || <code>Subscription-State: active;expires=3600</code>
|}
 
=== Dialog Identification ===
 
A SIP dialog is uniquely identified by the combination of three values:
 
{| class="wikitable"
|-
! Component !! Source !! When Set
|-
| '''Call-ID''' || UAC generates || Initial request
|-
| '''From-tag''' || UAC generates || Initial request
|-
| '''To-tag''' || UAS generates || First response that carries a tag (typically 180/183 or 2xx; 100 Trying may have none)
|}
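
As a rough sketch of how the three values combine into a lookup key, the following Python derives a dialog identifier from the Call-ID and the From/To header values. The names <code>DialogId</code>, <code>tag_of</code> and <code>dialog_id</code> are hypothetical, and the tag parsing is deliberately simplified (no quoted strings containing <code>;</code> or <code>></code>).

```python
from typing import NamedTuple, Optional


class DialogId(NamedTuple):
    call_id: str
    local_tag: Optional[str]
    remote_tag: Optional[str]


def tag_of(header_value: str) -> Optional[str]:
    """Return the ;tag= parameter of a From/To header value, if any."""
    # Only look at parameters after the closing '>' so that URI
    # parameters inside the angle brackets are not mistaken for the tag.
    if ">" in header_value:
        params = header_value.rpartition(">")[2]
    else:
        params = header_value
    for param in params.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "tag":
            return value.strip()
    return None


def dialog_id(call_id: str, from_hdr: str, to_hdr: str, role: str) -> DialogId:
    """Build the dialog ID as seen by the UAC ('uac') or the UAS ('uas')."""
    from_tag, to_tag = tag_of(from_hdr), tag_of(to_hdr)
    if role == "uac":
        return DialogId(call_id, from_tag, to_tag)  # local tag = From-tag
    return DialogId(call_id, to_tag, from_tag)      # local tag = To-tag


alice_view = dialog_id(
    "a84b4c76e66710@pc.atlanta.com",
    '"Alice" <sip:alice@atlanta.com>;tag=1234',
    '"Bob" <sip:bob@biloxi.com>;tag=5678',
    "uac",
)
```

Each side stores the same three values but with local and remote tags swapped, which is why the role is passed in.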


== SIP Call Flow Example (Basic Call Setup) ==
=== Routing – Proxy atlanta.com forwards toward biloxi.com ===

The atlanta.com proxy now acts as a UAC on behalf of Alice, forwarding the INVITE to the next hop. It needs to determine how to reach biloxi.com. It likely performs a DNS lookup ([https://datatracker.ietf.org/doc/html/rfc3263 RFC 3263]) to find the SIP server for biloxi.com. Let's say it finds an address for the biloxi.com proxy. Before forwarding, the atlanta proxy adds its own Via header (so that responses come back to it) and typically a Record-Route header if it wants to stay on the path. It then sends the INVITE to the biloxi.com proxy server.


=== 100 Trying – Provisional response by proxies ===
If Alice decides to hang up before Bob answers (for example, she gets tired of waiting), her UA sends a CANCEL request for the INVITE. CANCEL is a separate request that reuses the Request-URI, Call-ID, From, To, and top Via (including the branch) of the INVITE it is canceling, and carries the same CSeq number with the method changed to "CANCEL". The CANCEL travels through the proxies hop-by-hop (each proxy responds to CANCEL with 200 OK immediately). When the CANCEL reaches Bob's UAS and Bob hasn't answered yet, his phone stops ringing and responds to the original INVITE with 487 Request Terminated. Alice's UA, upon seeing the 487, knows the call was canceled. (If Bob had already sent a 200 OK moments before the CANCEL arrived, the CANCEL has no effect on the call: once a final response has been sent, there is nothing left to cancel. CANCEL is only useful before a call is answered, and specifically for INVITE; you don't cancel other methods in practice. Also, per RFC 3261 a UAC must not send CANCEL until it has received at least one provisional response for the INVITE, to avoid a race where the CANCEL arrives before the INVITE has been processed.)
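
To make the header reuse concrete, here is a small Python sketch that derives a CANCEL from a pending INVITE. The dictionary layout (keys such as <code>request_uri</code> and a list under <code>Via</code>) is an assumption made for this illustration and does not correspond to any particular SIP library.

```python
def build_cancel(invite: dict) -> dict:
    """Derive a CANCEL request from the INVITE it should cancel.

    `invite` is a hypothetical dict representation of the request;
    "Via" holds a list of Via values with the topmost first.
    """
    cseq_number = invite["CSeq"].split()[0]
    return {
        "method": "CANCEL",
        "request_uri": invite["request_uri"],   # same target as the INVITE
        "Via": [invite["Via"][0]],              # top Via only, same branch
        "From": invite["From"],                 # includes the From-tag
        "To": invite["To"],
        "Call-ID": invite["Call-ID"],
        "CSeq": f"{cseq_number} CANCEL",        # same number, new method
        "Max-Forwards": "70",
    }


invite = {
    "method": "INVITE",
    "request_uri": "sip:bob@biloxi.com",
    "Via": ["SIP/2.0/UDP pc.atlanta.com;branch=z9hG4bK776asdhds"],
    "From": '"Alice" <sip:alice@atlanta.com>;tag=1234',
    "To": '"Bob" <sip:bob@biloxi.com>',
    "Call-ID": "a84b4c76e66710@pc.atlanta.com",
    "CSeq": "314159 INVITE",
}
cancel = build_cancel(invite)
```

Because the branch in the top Via is unchanged, each hop can match the CANCEL to the INVITE transaction it refers to.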


This basic flow demonstrates how SIP works in a simple call setup scenario. We saw the INVITE request/response handshake to establish a session, the use of provisional responses (100, 180) for call progress, and the use of BYE to terminate the session. We also highlighted the special nature of ACK: for a successful call, ACK is its own transaction with no response; for an unsuccessful call (a non-2xx final response like a busy or decline), the UAC still sends an ACK but in that case the ACK is considered part of the INVITE's transaction and is not a separate handshake (the UAS will stop retransmitting the error after it gets the ACK). These nuances are defined in the SIP spec to handle reliability over UDP. Essentially, an ACK for a 200 OK is sent end-to-end and not acknowledged, whereas an ACK for an error is just to stop the retransmissions from the UAS (which does retransmit errors like 487 or 486 until ACK).

Throughout the call, proxies didn't alter the message bodies – SDP offer/answer passed end-to-end. They did manage routing, though. If this were a multi-target scenario (say Bob is registered at two places), the proxy could have forked the INVITE to multiple locations. Forking means a proxy sends parallel INVITEs to all contacts for Bob (or serially tries one then another). This can lead to multiple 180 Ringing responses (Alice might hear multiple phones ringing) and even potentially multiple 200 OKs if two devices answer nearly simultaneously. SIP handles this by the proxy forwarding the first 200 OK to Alice and then sending CANCELs to the others, typically, so one call is established. If two 200 OKs do reach Alice, she technically has two dialogs and would have to BYE one – but in practice proxies try to avoid that. This is an example of how proxies can coordinate more complex scenarios.

=== Quick Reference: Basic Call Flow ===
{| class="wikitable"
|-
! Step !! Direction !! Message !! Purpose
|-
| 1 || Alice -> Proxy || INVITE || Initiate call (with SDP offer)
|-
| 2 || Proxy -> Alice || 100 Trying || Stop retransmissions
|-
| 3 || Bob -> Proxy || 180 Ringing || Callee alerting
|-
| 4 || Bob -> Proxy || 200 OK || Call answered (with SDP answer)
|-
| 5 || Alice -> Bob || ACK || Confirm receipt of 200 OK
|-
| 6 || Alice <-> Bob || RTP || Media session
|-
| 7 || Bob -> Alice || BYE || End call
|-
| 8 || Alice -> Bob || 200 OK || Confirm termination
|}


== Transactions, Dialogs, and Sessions ==

It's important to understand the layering of SIP's concepts: transactions, dialogs, and sessions. These three concepts operate at different scopes and lifetimes:


<kroki lang="plantuml">
@startuml
title SIP Call: Transactions within a Dialog and Session

skinparam backgroundColor #ffffff
skinparam defaultFontColor #1e293b
skinparam ArrowColor #64748b

skinparam participant {
     BackgroundColor #e0f4fc
     BorderColor #00A7E3
}

participant "Alice" as A
participant "Bob" as B

== Dialog Established (Call-ID + From-tag + To-tag) ==

box "INVITE Transaction" #e0f4fc
A -> B : INVITE (CSeq: 1)
A <-- B : 100 Trying
A <-- B : 180 Ringing
A <-- B : 200 OK
end box

box "ACK Transaction" #fef3e2
A -> B : ACK
end box

|||
note over A, B #f1f5f9: '''SESSION ACTIVE'''\n(RTP Media Flowing between endpoints)
|||

box "BYE Transaction" #e0f4fc
B -> A : BYE (CSeq: 1)
B <-- A : 200 OK
end box

== Dialog Terminated ==

@enduml
</kroki>
One special case: the initial INVITE transaction for a dialog is slightly different from subsequent ones because of the three-way handshake for INVITE (INVITE -> 200 OK -> ACK). As noted, the ACK for a 2xx response is not considered part of the transaction – so the INVITE transaction is completed by the 200 OK itself. The ACK is its own transaction (with no response expected) that confirms the dialog. For non-2xx final responses, however, the ACK is considered part of the INVITE transaction (to conclude it). This distinction exists because SIP needed to solve reliability for the 200 OK over an unreliable transport: the UAS can't rely on the transaction layer's retransmit mechanism for 200 OK (since the INVITE transaction would technically end at the 200 OK), so instead the UAS itself retransmits the 200 OK until an ACK arrives. This is a unique quirk of SIP's INVITE handling.


In summary, the dialog provides the state that links a series of message exchanges, and transactions ensure each request/response exchange is reliable and independent. For most practical purposes, when someone talks about "a SIP call" they mean a dialog established by an INVITE. Within that call, multiple transactions occur (re-INVITE, BYE, etc.). Outside of calls, transactions like an isolated OPTIONS or a REGISTER stand alone (no lasting dialog unless the method itself defines one – REGISTER doesn't, SUBSCRIBE does create a dialog for the subscription, etc.).
=== Quick Reference: Transaction vs Dialog vs Session ===
 
{| class="wikitable"
|-
! Concept !! Scope !! Lifetime !! Identified By !! Example
|-
| '''Transaction''' || Single request/response || Seconds || Branch ID, CSeq, Method || INVITE + 100/180/200
|-
| '''Dialog''' || UA-to-UA relationship || Minutes to hours || Call-ID + From-tag + To-tag || Entire call signaling
|-
| '''Session''' || Media exchange || Duration of call || SDP negotiation || RTP audio/video stream
|}
 
{| class="wikitable"
|-
! ACK Behavior !! For 2xx Response !! For non-2xx Response
|-
| Part of INVITE transaction? || No (separate transaction) || Yes (same transaction)
|-
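| Retransmitted? || Not on a timer; re-sent each time a retransmitted 2xx arrives || Not on a timer; re-sent each time the retransmitted final response arrives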
|-
| What triggers it? || Receiving 200 OK || Receiving 4xx/5xx/6xx
|-
| UAS behavior if no ACK || Retransmits 200 OK || Retransmits error response
|-
| Routing || End-to-end via Route set || Hop-by-hop like original INVITE
|}


== Registration and Location Service ==
For example, when Alice opens her softphone app, it sends: <code>REGISTER sip:atlanta.com SIP/2.0</code> with headers including <code>To: sip:alice@atlanta.com</code>, <code>From: sip:alice@atlanta.com</code> (with a new tag), <code>Contact: <sip:alice@192.0.2.4:5060></code>, and often an Expires header (or Contact parameter) indicating how long this registration should be valid (e.g., 3600 seconds). The REGISTER is sent to the registrar server for atlanta.com (often co-located with the proxy). If authentication is required, the server will respond 401 Unauthorized with a challenge, and Alice's UA will re-send the REGISTER with proper Authorization credentials (a digest response computed from her username and password) – the same Digest scheme that HTTP uses, as mentioned. Once authorized, the registrar responds with 200 OK. At this point, Alice is "registered." The registrar has stored the mapping: AOR=sip:alice@atlanta.com -> Contact=sip:alice@192.0.2.4:5060 (plus an expiration time).
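
The request described above could be assembled roughly as follows. This is an illustrative sketch only: <code>build_register</code> and its parameters are hypothetical names, the branch, tag and Call-ID values are placeholders that a real UA would generate randomly, and authentication is left out.

```python
def build_register(aor, contact, via_host, branch, from_tag, call_id, cseq,
                   expires=3600):
    """Build a minimal REGISTER request as text (no authentication)."""
    # The Request-URI names the registrar's domain, not the user.
    domain = aor.split("@", 1)[1]
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{aor}>",                      # AOR being registered
        f"From: <{aor}>;tag={from_tag}",     # usually the same AOR
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <{contact}>",             # where the device is reachable
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # A blank line terminates the headers; REGISTER carries no body here.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_register(
    aor="sip:alice@atlanta.com",
    contact="sip:alice@192.0.2.4:5060",
    via_host="192.0.2.4:5060",
    branch="z9hG4bKexample1",
    from_tag="reg-1",
    call_id="example-call-id@192.0.2.4",
    cseq=1,
)
```

A refresh would reuse the same Call-ID with an incremented CSeq, and passing <code>expires=0</code> would request removal of the binding.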


Registrations have a limited lifetime (the Expires header or Contact's expires parameter dictates this). Alice must periodically refresh her registration (by sending a new REGISTER before expiry) to keep it active, or she can send a REGISTER with <code>Expires: 0</code> to remove a registration (logout). Multiple devices can register for the same AOR with different Contacts; the registrar will store all of them (often proxies will then fork incoming INVITEs to all Contacts). The Contact header in REGISTER can also contain a q-value for priority or other parameters (defined in [https://datatracker.ietf.org/doc/html/rfc3261 RFC 3261] and extended in [https://datatracker.ietf.org/doc/html/rfc3840 RFC 3840] for capabilities).


The '''Location Service''' is the database that the registrar populates with these contacts. When an incoming INVITE for Alice arrives at atlanta.com, the proxy queries the location service to find where to send the INVITE (e.g., to Alice's registered IP). This is how routing of incoming calls works in SIP, enabling user mobility. If Alice moves networks and sends a new REGISTER, the location service updates to her new Contact. If she unregisters (or it expires), callers will get a 404 Not Found or be sent to voicemail, etc., depending on server policy.
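
Conceptually, the location service is a table of AOR-to-Contact bindings with expiry times. The following in-memory Python sketch shows the idea; the class name <code>LocationService</code> and its methods are hypothetical, and real registrars add persistence, q-values, authentication and much more.

```python
import time


class LocationService:
    """In-memory AOR -> Contact bindings with expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so expiry can be tested
        self._bindings = {}          # AOR -> {contact: expiry timestamp}

    def register(self, aor: str, contact: str, expires: int) -> None:
        contacts = self._bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)      # Expires: 0 removes the binding
        else:
            contacts[contact] = self._clock() + expires

    def lookup(self, aor: str) -> list:
        """Return the contacts that have not expired yet."""
        now = self._clock()
        contacts = self._bindings.get(aor, {})
        return [c for c, expiry in contacts.items() if expiry > now]


now = [0.0]                                   # fake clock for the example
service = LocationService(clock=lambda: now[0])
service.register("sip:alice@atlanta.com", "sip:alice@192.0.2.4:5060", 3600)
service.register("sip:alice@atlanta.com", "sip:alice@198.51.100.9:5060", 60)
active_contacts = service.lookup("sip:alice@atlanta.com")
```

A proxy that finds more than one active contact may fork the INVITE to all of them; an empty result corresponds to the "404 Not Found or voicemail" case above.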


A REGISTER is distinct from an INVITE dialog – it's a standalone transaction (with its own response). It does not create a dialog. However, there is an optional SIP event package (RFC 3680) that allows subscription to registration state if needed (beyond scope here).

Note also that REGISTER only binds the Contact address for future requests; it doesn't affect the current transport connection, except when specific extensions such as outbound (RFC 5626) are used to maintain a persistent flow. In basic SIP, the proxy just uses the Contact when sending a new request.

=== Quick Reference: Registration ===
{| class="wikitable"
|-
! Action !! Expires Value !! Result
|-
| Initial registration || e.g., 3600 || Binding created
|-
| Refresh registration || e.g., 3600 || Binding updated
|-
| Unregister || 0 || Binding removed
|-
| Fetch bindings || (no Contact) || Returns current bindings
|}


Another point: The From and To in REGISTER are usually the same (both are the AOR of the user), because you're effectively saying "I as this user want to register this address." All registrations from one UA to the same server reuse the same Call-ID and increment CSeq (as a best practice) so the server knows it's an update to an existing registration. The server's 200 OK to REGISTER often contains Contact headers (echoing what's registered, possibly with q-values or stating how many contacts were registered).
{| class="wikitable"
|-
! Header !! In REGISTER Request !! Notes
|-
| Request-URI || <code>sip:domain.com</code> || Registrar's domain
|-
| To || <code>sip:user@domain.com</code> || AOR being registered
|-
| From || <code>sip:user@domain.com</code> || Usually same as To
|-
| Contact || <code><sip:user@ip:port></code> || Device's reachable address
|-
| Expires || 3600 || Registration lifetime (seconds)
|-
| Call-ID || (consistent) || Same for all registrations from UA
|-
| CSeq || (incrementing) || Increments each registration
|}


== SIP Extensions and Advanced Features ==

The core SIP specification ([https://datatracker.ietf.org/doc/html/rfc3261 RFC 3261]) is augmented by numerous other RFCs that introduce new features. Understanding every extension is a massive undertaking (the SIP RFC series is extensive), but here we'll summarize some of the key extensions and advanced concepts that build on the basics we've covered:


=== Reliability of Provisional Responses (RFC 3262) ===

In basic SIP, provisional responses (1xx) are not acknowledged at the transaction layer – they are sent unreliably (and 100 Trying is hop-by-hop, so it is never forwarded). [https://datatracker.ietf.org/doc/html/rfc3262 RFC 3262] adds an option to send provisional responses reliably. It introduces the '''PRACK''' method (Provisional ACK), which acts like an ACK for a provisional response. The UAC advertises support with <code>Supported: 100rel</code> (or demands it with <code>Require: 100rel</code>) in the INVITE. The UAS then sends a provisional response (like 180 Ringing or 183 Session Progress) with <code>Require: 100rel</code> and an RSeq header (sequence number), and the UAC must acknowledge it with a PRACK carrying a matching RAck header. This ensures, over UDP, that important provisional responses (especially ones carrying early media in 183 or SIP precondition info) are not lost. PRACK itself is just another request within the dialog (with its own 200 OK response). The use of PRACK is negotiated per call. If used, provisional responses are essentially reliable, similar to final responses.
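
The link between a reliable provisional response and its PRACK is the RAck header, which echoes the RSeq value together with the CSeq of the INVITE. A minimal Python sketch, with hypothetical helper names and headers modeled as a plain dict:

```python
def needs_prack(response_headers: dict) -> bool:
    """True if a 1xx response was sent reliably and must be PRACKed."""
    required = [t.strip().lower()
                for t in response_headers.get("Require", "").split(",")]
    return "100rel" in required and "RSeq" in response_headers


def build_rack(response_headers: dict) -> str:
    """Build the RAck header: <RSeq> <CSeq number> <CSeq method>."""
    rseq = int(response_headers["RSeq"])
    cseq_number, cseq_method = response_headers["CSeq"].split()
    return f"RAck: {rseq} {cseq_number} {cseq_method}"


ringing = {
    "Require": "100rel",
    "RSeq": "813520",
    "CSeq": "314159 INVITE",
}
rack_header = build_rack(ringing) if needs_prack(ringing) else None
```

The UAS uses the three RAck values to match the PRACK to the exact provisional response it is still retransmitting.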


=== Offer/Answer Model with SDP (RFC 3264) ===

This RFC goes hand-in-hand with SIP, detailing how SDP is used in INVITE/200/ACK to negotiate media. It specifies that one party offers a set of media (codecs, IPs, ports) and the answerer picks from those. We touched on this earlier; key points are that all SIP UAs must support SDP, and that an INVITE can contain an SDP offer. If it does, the 200 OK must contain the answer. If the INVITE has no SDP, it's implying the offer comes in the 200 and then the ACK from caller contains the answer. The model also allows for re-INVITEs or UPDATE to change the session later (e.g., hold by setting a=inactive or sendonly, codec change, adding video, etc.). The Offer/Answer RFC ([https://datatracker.ietf.org/doc/html/rfc3264 RFC 3264]) is fundamental for ensuring both ends agree on the media session parameters.
 
=== Locating SIP Servers (RFC 3263) ===
 
This RFC details the DNS procedures we mentioned. SIP clients perform NAPTR and SRV lookups for the domain to find the correct host, port, and transport for sending requests. It also covers fallback behavior, such as trying another server or transport when a request (for example, a large message) gets no response. For example, a lookup might find SRV records at <code>_sip._udp.biloxi.com</code> and <code>_sip._tcp.biloxi.com</code>, and the client tries the targets in order of priority and weight. This all happens under the hood in a UA library or proxy.
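
The "priority, then weight" ordering can be sketched without any DNS library by operating on already-resolved records. The <code>Srv</code> tuple and <code>order_srv</code> function below are hypothetical, the record values are made up, and the weighting is a simplification (each weight is increased by one so that zero-weight records stay selectable), not the exact selection algorithm from the SRV specification.

```python
import random
from typing import NamedTuple


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records, rng=random):
    """Order SRV records: lowest priority first, weighted-random within it."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            # Simplification: weight + 1 keeps zero-weight records selectable.
            weights = [r.weight + 1 for r in group]
            pick = rng.choices(group, weights=weights, k=1)[0]
            group.remove(pick)
            ordered.append(pick)
    return ordered


records = [
    Srv(20, 0, 5060, "backup.biloxi.com"),
    Srv(10, 60, 5060, "sip1.biloxi.com"),
    Srv(10, 40, 5060, "sip2.biloxi.com"),
]
attempt_order = order_srv(records, rng=random.Random(0))
```

The client would try each target in <code>attempt_order</code> in turn, moving on when one does not respond; the backup server is only reached after both priority-10 targets.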
 
=== Event Framework – SUBSCRIBE/NOTIFY (RFC 3265 & RFC 6665) ===
 
This provides a generic framework for subscription to events in SIP. Using SUBSCRIBE, a UA can request to be notified of certain events from another UA or server. The subscription itself is a dialog (with its own Call-ID and tags separate from call dialogs). NOTIFY messages are then sent to the subscriber whenever the subscribed event occurs or changes. For example, SIP presence is built on this: you SUBSCRIBE to a user's presence, and their presence server sends NOTIFY updates when their status changes. Other examples: message waiting indicators (voicemail waiting), call monitoring, or even SLA sharing. RFC 3265 defined the baseline; RFC 6665 later updated it and clarified a lot of behavior. Event notifications require defining specific Event Packages (e.g., an event package for "presence" is defined in RFC 3856, for "message-summary" in RFC 3842, etc.). Each NOTIFY indicates the event type and carries a message body with the state (like presence info in XML, or voicemail count). The subscription has an expiration and can be refreshed. This system is called SIP-SIMPLE when used for IM and presence.


=== Session Timer (RFC 4028) ===

This extension introduced a keep-alive mechanism for SIP dialogs. A session timer is an interval negotiated between UA and proxy (or UAS) such that if no refreshing request (like re-INVITE or UPDATE) is sent within that time, the session is considered dead. This helps in cleaning up zombie calls if one side crashes and stops sending media – without session timer, a call might remain "active" forever if BYE is never sent. With session timers, one side (the "refresher") will send a periodic re-INVITE/UPDATE (with Session-Expires header) to check that the dialog is still active. If the refresh fails (no response), the call is terminated. This is often used in networks to make sure hung calls don't tie up resources. The timer value might be, say, 30 minutes or even 2 minutes in some systems – if both sides support it (<code>Require: timer</code> or <code>Supported: timer</code> is used to negotiate). See [https://datatracker.ietf.org/doc/html/rfc4028 RFC 4028] for details.
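
As a small illustration, the negotiated interval and refresher role can be read from the Session-Expires header, and the refresher can schedule its refresh at half the interval, which is the timing RFC 4028 recommends. The function names here are hypothetical.

```python
def parse_session_expires(value: str):
    """Parse e.g. '1800;refresher=uac' into (interval_seconds, refresher)."""
    parts = [p.strip() for p in value.split(";")]
    interval = int(parts[0])
    refresher = None
    for param in parts[1:]:
        name, _, val = param.partition("=")
        if name.strip().lower() == "refresher":
            refresher = val.strip().lower()
    return interval, refresher


def refresh_delay(interval: int) -> float:
    """Seconds to wait before sending the re-INVITE/UPDATE refresh."""
    # Refreshing at the halfway point leaves room for retries before expiry.
    return interval / 2


interval, refresher = parse_session_expires("1800;refresher=uac")
delay = refresh_delay(interval)
```

With the header from the quick-reference table, the UAC would refresh after 900 seconds, well before the 1800-second session interval runs out.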
 
=== UPDATE Method (RFC 3311) ===
 
As described earlier, UPDATE allows changing session parameters before the initial INVITE has completed. For instance, during early ringing, you might want to send updated SDP (maybe the network prefers to do early media negotiations or change ringback tones). Or for call queues, an UPDATE could be used to alert the caller with new info while the call is still pending. UPDATE can also refresh session timers in early stage. Essentially, it fills a gap by allowing mid-dialog (actually mid-transaction) modifications when INVITE is in progress. If used after call establishment, it's similar to a re-INVITE but doesn't create a new offer/answer if not needed (though re-INVITE could also be used then).


=== REFER Method (RFC 3515) ===

REFER is used for call transfer and similar features. When Alice wants to transfer Bob to Carol, Alice (who is in a call with Bob) sends Bob a REFER with a <code>Refer-To: Carol's URI</code>. Bob's UA, upon receiving REFER, will act as if Bob is calling Carol (essentially it triggers a new INVITE to Carol). The outcome of that referral (whether Carol answered) is reported back to Alice via NOTIFY (REFER defined an event package for the result of the transfer). This mechanism allows one party to ask the other to initiate a new request on their behalf. It's not limited to transferring calls, but that's the common use (attended or unattended transfer scenarios). REFER builds on the subscription model: the REFER request creates an implicit subscription to the "refer" event. A successful REFER usually gets a 202 Accepted response, followed by NOTIFYs (carrying a status line such as "SIP/2.0 200 OK" or an error in the body) that indicate the referred call's status. Later extensions refine this: RFC 4488 allows suppressing the implicit subscription with the norefersub option, and RFC 7647 clarifies how REFER interacts with NOTIFY. See [https://datatracker.ietf.org/doc/html/rfc3515 RFC 3515].


=== NAT Traversal ===
==== Symmetric Response Routing (rport, RFC 3581) ====

NATs pose a big problem for SIP because the SIP headers carry IP addresses/ports that might be local addresses, and the endpoints might be behind firewalls. [https://datatracker.ietf.org/doc/html/rfc3581 RFC 3581] introduced the '''rport''' parameter for Via headers. When a UAC behind a NAT includes <code>Via: ...;rport</code>, it is asking the server to send the response back to the source IP and port the request came from, rather than the address in the Via (which might be the private IP). This helps responses get back through NATs because they go to the NAT's mapped address/port (since the request came from there). Most modern SIP devices use rport by default.
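
The server-side behavior can be sketched as two small steps: stamp the top Via with the observed source address, then pick the response destination from it. The function names and the dict representation of Via parameters (a bare <code>rport</code> flag stored as <code>None</code>) are assumptions made for this illustration.

```python
def stamp_via(via_params: dict, source_ip: str, source_port: int) -> dict:
    """Record where the request really came from when rport was requested."""
    stamped = dict(via_params)               # do not modify the caller's dict
    if "rport" in stamped:
        stamped["received"] = source_ip
        stamped["rport"] = str(source_port)
    return stamped


def response_destination(sent_by_host, sent_by_port, via_params: dict):
    """Choose the (host, port) a response should be sent to."""
    if via_params.get("rport"):
        # Symmetric routing: reply to the NAT's mapped address and port.
        return via_params["received"], int(via_params["rport"])
    # Without rport: 'received' if present, otherwise the sent-by address.
    host = via_params.get("received", sent_by_host)
    return host, sent_by_port or 5060


request_params = {"branch": "z9hG4bKnat1", "rport": None}
stamped = stamp_via(request_params, "203.0.113.7", 40001)
destination = response_destination("192.168.1.10", 5060, stamped)
```

In this example the Via advertises a private address (192.168.1.10), but the response goes to 203.0.113.7:40001, the public mapping the NAT created for the request.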


==== Outbound (RFC 5626) ====

[https://datatracker.ietf.org/doc/html/rfc5626 RFC 5626] (Outbound) goes further: it allows a UA to register via a persistent flow (TCP/TLS or UDP with keep-alives) and maintain that connection for incoming requests as well. The UA would register with an Instance-ID and possibly multiple contacts with different flow-IDs, and the registrar/proxy will use the same connection to forward incoming INVITEs. Outbound also defines keep-alive messages (CRLF or STUN pings) to keep NAT bindings open. This extension is crucial for mobile devices and NAT-heavy environments (e.g., SIP over cellular networks).


Alongside these, techniques like '''Session Border Controllers (SBCs)''' are often used – these are essentially B2BUAs at the network edge that ensure media and signaling traverse NATs (they hide the complexities from endpoints). Additionally, '''ICE (Interactive Connectivity Establishment)''' is used for media NAT traversal (by including multiple candidate addresses in SDP and testing connectivity). While ICE (RFC 5245) is a separate protocol, SIP endpoints often integrate ICE in their SDP to get media flowing.
=== Quick Reference: SIP Extensions ===


=== Instant Messaging and Presence (SIP SIMPLE) ===
{| class="wikitable"
|-
! Extension !! RFC !! Purpose !! Key Headers/Methods
|-
| '''100rel''' || [https://datatracker.ietf.org/doc/html/rfc3262 3262] || Reliable provisional responses || PRACK, RSeq, RAck
|-
| '''timer''' || [https://datatracker.ietf.org/doc/html/rfc4028 4028] || Session keep-alive || Session-Expires, Min-SE
|-
| '''replaces''' || [https://datatracker.ietf.org/doc/html/rfc3891 3891] || Replace existing dialog || Replaces header
|-
| '''join''' || [https://datatracker.ietf.org/doc/html/rfc3911 3911] || Join existing dialog || Join header
|-
| '''norefersub''' || [https://datatracker.ietf.org/doc/html/rfc4488 4488] || Suppress REFER subscription || Refer-Sub header
|-
| '''outbound''' || [https://datatracker.ietf.org/doc/html/rfc5626 5626] || NAT traversal, persistent flows || Instance-ID, reg-id
|-
| '''gruu''' || [https://datatracker.ietf.org/doc/html/rfc5627 5627] || Globally Routable UA URI || Contact with gr parameter
|-
| '''path''' || [https://datatracker.ietf.org/doc/html/rfc3327 3327] || Edge proxy routing || Path header
|}


We mentioned MESSAGE method (RFC 3428) for instant messaging. SIP can be used as a pager-mode IM by simply sending SIP MESSAGE outside a call or within a dialog. Each MESSAGE is like an SMS – no sessions, just one-off messages, usually carrying plain text or maybe CPIM wrappers. For more complex chat (large messages, offline storage), often SIP is used with MSRP (Message Session Relay Protocol) negotiated via SDP in an INVITE (for session-mode IM). Presence, as noted, is done via SUBSCRIBE/NOTIFY (presence event package, RFC 3856) where users publish their status and others subscribe. There's also PUBLISH method (RFC 3903) for a UA to publish its presence state to a server which then NOTIFYs subscribers.
{| class="wikitable"
 
|-
=== Call Forking and Forked Responses ===
! Event Package !! RFC !! Purpose
 
|-
When a proxy forks a request to multiple UAS locations, the UAC might receive multiple 2xx responses (from different branches). SIP handles this by treating each 2xx as establishing a separate dialog. The UAC can choose to accept one and terminate the others. Typically, a proxy will try to avoid multiple final answers (by canceling other branches once one answers), but it's possible for two almost-simultaneous answers to both reach the caller. The caller should then send ACK to both and quickly send BYE to the one it doesn't want, or handle it (in some cases two active dialogs might be merged into a conference, but that's beyond basic SIP). This is a corner case that implementations must be aware of.
| presence || [https://datatracker.ietf.org/doc/html/rfc3856 3856] || User presence status
 
|-
=== Additional SIP Headers and Features ===
| message-summary || [https://datatracker.ietf.org/doc/html/rfc3842 3842] || Voicemail waiting indicator
 
|-
There are many more headers and features defined across RFCs. For example:
| refer || [https://datatracker.ietf.org/doc/html/rfc3515 3515] || REFER result notification
 
|-
* The '''Replaces''' header (RFC 3891) allows a REFER to suggest replacing an existing dialog (used in attended transfer to swap calls).
| dialog || [https://datatracker.ietf.org/doc/html/rfc4235 4235] || Dialog state notification
* The '''Join''' header (RFC 3911 & updated by RFC 7621) allows a UA to join an existing dialog (used in conferencing scenarios).
|-
* The '''Reason''' header (RFC 3326) can be included in BYE or CANCEL to indicate why a call was terminated (busy, declined, etc.).
| conference || [https://datatracker.ietf.org/doc/html/rfc4575 4575] || Conference state
* '''Privacy extensions''' (RFC 3323, etc.) allow hiding user identity by proxies.
|-
* The '''P-Preferred-Identity / P-Asserted-Identity''' headers (RFC 3325) are used in trusted networks (like carrier networks) to convey identity information securely within the network.
| reg || [https://datatracker.ietf.org/doc/html/rfc3680 3680] || Registration state
* '''Session-Expires''' header is used for session timers (we discussed).
|}
* '''Allow''' header indicates what methods a UA supports.
* '''Supported''' indicates other supported extensions.
* '''Event''' header identifies event packages in SUBSCRIBE/NOTIFY.
 
=== Security Mechanisms ===
 
Apart from digest authentication, SIP can be secured at the transport layer using '''TLS''' (SIPS URI or using TLS for a domain – often on port 5061). This provides encryption of SIP signaling, protecting headers and content from eavesdropping or tampering. At the application layer, there was a mechanism for end-to-end body encryption using '''S/MIME''' (carrying certificates and encrypted SDP or messages), but this saw little practical use.
 
More modern is the '''STIR/SHAKEN''' framework (RFC 8224 and others) which uses identity headers with certificates to prevent caller ID spoofing – this is a hot topic in telephony (robocall mitigation). That involves the Identity header where the originating service signs the call information, and the terminating service verifies it.
 
There are also extensions for media security like '''SDES for SRTP''' (using SDP to exchange SRTP keys) or '''DTLS-SRTP''' (keying via DTLS handshake) to secure the media. Those are outside SIP itself, but SIP carries their negotiation via SDP.
 
=== Telephony Interworking ===
 
SIP can interwork with the traditional phone network (PSTN). For instance, '''SIP-T / SIP-I''' (RFC 3372 etc.) define how ISDN/PSTN signaling (like ISUP messages) can be encapsulated in SIP for handoff between IP and circuit networks. This might involve including ISUP message in the body of SIP for seamless transit. Gateways perform translation between SIP messages and ISUP/SCCP, etc. For most SIP users, this is transparent – you dial a phone number, it goes to a SIP trunk provider, and they convert it to PSTN signaling if needed.
 
=== Conferencing and Early Media ===
 
Basic SIP call flows can be extended to multiparty. One way is '''third-party call control (3PCC)''' where a controller (app) uses multiple SIP dialogs to bring parties into a conference by managing INVITEs (RFC 3725 outlines best practices). Another is using a '''conference server''' where users INVITE themselves to a focus (conference URI), and the mixing is done there (RFC 4353). SIP has event packages like conference event package (RFC 4575) that let participants know who's in the call, etc.
 
'''Early media''' (media before call answer, like ringback tones or announcements) can be handled by sending media during 180/183 responses. There's a whole discussion (RFC 3960) on how to manage early media and avoid clipping or confusion with local ringback. Essentially, either the caller or the callee's network may generate tones or media to play to the caller before answer.
 
As you can see, SIP is not just a single protocol in isolation, but an entire family of protocols and extensions – often referred to as the SIP "umbrella". It can be daunting, but knowing the core (RFC 3261) plus the major extensions above covers most scenarios.
 
When implementing or troubleshooting SIP, it's helpful to have a reference (like this guide or concise RFC summaries) because the formal RFCs are very detailed. However, the RFCs also provide exact definitions which can be crucial for edge cases. For example, timer values, how to handle forked responses, tag generation rules, how stateful proxies handle CANCEL, what to do if multiple challenges in a forked response, etc., are all spelled out in RFC 3261 and others.
 
In practice, tools like Wireshark can decode SIP flows and show these headers and messages clearly, and SIP stack libraries (pjsip, reSIProcate, etc.) implement most of these behaviors under the hood. For someone learning SIP, it's useful to capture a simple SIP call and map it to the steps above.
 
== Conclusion ==
 
SIP is a powerful and flexible protocol that has become the foundation of modern IP telephony and multimedia communication. We covered the essentials of SIP: its architecture (user agents, proxies, registrars, etc.), the format of SIP messages (methods, responses, and headers), the typical call setup and teardown process, and many important extensions that enhance SIP's capabilities (from reliability to event notifications and beyond).
 
With this understanding, one should be able to read a SIP call flow and make sense of the messages, or configure/troubleshoot a SIP-based system with a clear mental model of what's supposed to happen. This guide serves as a high-level reference – essentially a cheat sheet – to demystify SIP signaling without drowning in the full RFC jargon. For deeper dives, the relevant RFCs (3261 and others cited) can be consulted for exact details, but often remembering the key concepts and how they fit together is enough to work effectively with SIP.
 
SIP continues to evolve, especially in areas of security and large-scale deployments (for instance, SIP in IMS – the IP Multimedia Subsystem – in mobile networks, or extensions for emergency calling, etc.). However, the core principles remain as discussed. Armed with this knowledge, you should be able to approach those advanced uses with a solid foundation. SIP's beauty lies in its relative simplicity (text messages, clear roles) combined with extensibility – it's like the lingua franca that different voice/video systems speak to interoperate on the Internet.


== Key RFCs ==
== Key RFCs ==
Line 548: Line 969:
{| class="wikitable"
{| class="wikitable"
|-
|-
! RFC !! Title
! RFC !! Title !! Category
|-
|-
| RFC 3261 || SIP: Session Initiation Protocol (core specification)
| [https://datatracker.ietf.org/doc/html/rfc3261 RFC 3261] || SIP: Session Initiation Protocol || Core
|-
|-
| RFC 3262 || Reliability of Provisional Responses in SIP (PRACK)
| [https://datatracker.ietf.org/doc/html/rfc3262 RFC 3262] || Reliability of Provisional Responses (PRACK) || Reliability
|-
|-
| RFC 3263 || Session Initiation Protocol: Locating SIP Servers
| [https://datatracker.ietf.org/doc/html/rfc3263 RFC 3263] || Locating SIP Servers || DNS/Routing
|-
|-
| RFC 3264 || An Offer/Answer Model with SDP
| [https://datatracker.ietf.org/doc/html/rfc3264 RFC 3264] || An Offer/Answer Model with SDP || Media
|-
|-
| RFC 3265 || Session Initiation Protocol - Specific Event Notification
| [https://datatracker.ietf.org/doc/html/rfc3265 RFC 3265] || SIP-Specific Event Notification || Events
|-
|-
| RFC 3311 || The Session Initiation Protocol UPDATE Method
| [https://datatracker.ietf.org/doc/html/rfc3311 RFC 3311] || The UPDATE Method || Session modification
|-
|-
| RFC 3326 || The Reason Header Field for SIP
| [https://datatracker.ietf.org/doc/html/rfc3326 RFC 3326] || The Reason Header Field || Diagnostics
|-
|-
| RFC 3428 || Session Initiation Protocol Extension for Instant Messaging
| [https://datatracker.ietf.org/doc/html/rfc3428 RFC 3428] || Extension for Instant Messaging || IM
|-
|-
| RFC 3515 || The Session Initiation Protocol REFER Method
| [https://datatracker.ietf.org/doc/html/rfc3515 RFC 3515] || The REFER Method || Call transfer
|-
|-
| RFC 3581 || An Extension to SIP for Symmetric Response Routing (rport)
| [https://datatracker.ietf.org/doc/html/rfc3581 RFC 3581] || Symmetric Response Routing (rport) || NAT
|-
|-
| RFC 3856 || A Presence Event Package for SIP
| [https://datatracker.ietf.org/doc/html/rfc3856 RFC 3856] || A Presence Event Package || Presence
|-
|-
| RFC 3903 || Session Initiation Protocol Extension for Event State Publication
| [https://datatracker.ietf.org/doc/html/rfc3903 RFC 3903] || Extension for Event State Publication || Presence
|-
|-
| RFC 4028 || Session Timers in SIP
| [https://datatracker.ietf.org/doc/html/rfc4028 RFC 4028] || Session Timers in SIP || Keep-alive
|-
|-
| RFC 5411 || A Hitchhiker's Guide to the Session Initiation Protocol
| [https://datatracker.ietf.org/doc/html/rfc5411 RFC 5411] || A Hitchhiker's Guide to SIP || Reference
|-
|-
| RFC 5626 || Managing Client-Initiated Connections in SIP (Outbound)
| [https://datatracker.ietf.org/doc/html/rfc5626 RFC 5626] || Managing Client-Initiated Connections || NAT/Outbound
|-
|-
| RFC 6665 || SIP-Specific Event Notification (update to RFC 3265)
| [https://datatracker.ietf.org/doc/html/rfc6665 RFC 6665] || SIP-Specific Event Notification (update) || Events
|}
|}


== See Also ==
== See Also ==


* [[VoIP]]
* [[Comprehensive_Guide_to_VoIP_Voice_Quality|VoIP Voice Quality Guide]]
* [[RTP]] - Real-time Transport Protocol
* [[SDP]] - Session Description Protocol
* [[VoIPmonitor]] - Network packet sniffer for VoIP


[[Category:VoIP]]
[[Category:VoIP]]
[[Category:Protocols]]
[[Category:Protocols]]
[[Category:SIP]]
[[Category:SIP]]

Latest revision as of 23:34, 11 December 2025

Session Initiation Protocol (SIP) is an application-layer signaling protocol designed for creating, modifying, and terminating multimedia sessions over IP networks. These sessions can include Internet telephone calls (VoIP), video conferences, or any combination of multimedia streams. SIP itself handles the signaling and control portion – it establishes the session parameters – while the actual media (audio, video, etc.) is carried over separate protocols (typically RTP). SIP messages are text-based (similar to HTTP) and use a request/response model. SIP invitations to sessions carry session descriptions (usually using the SDP protocol) so that participants can agree on media types and formats. A key design goal of SIP is protocol agility: it is independent of the underlying transport (UDP, TCP, TLS, etc. on port 5060/5061 by default) and of the type of session being established.

SIP was originally defined in RFC 2543 and later refined in RFC 3261 (2002), which became the core SIP standard. Over time, numerous extension RFCs have expanded SIP's capabilities (for reliability, events, IM, security, etc.), making SIP a broad and powerful framework for signaling. Despite the many formal definitions in the RFCs, this guide aims to explain SIP in an accessible way – serving as a "cheat sheet" to understand SIP signaling without getting lost in the exhaustive RFC language. We will cover SIP's architecture, message format, call flow, and key features/extensions, providing a solid reference for anyone new to SIP or looking to grasp the full picture.

SIP Architecture and Core Components

SIP is a peer-to-peer protocol with a client-server design for message exchange. Its architecture defines several types of network entities, each with specific roles:

User Agent (UA)

A UA is an endpoint in SIP, typically a user's device or software (softphone). It represents an end system and can function as both a client and a server. A UA has two logical sub-roles: a User Agent Client (UAC), which initiates requests, and a User Agent Server (UAS), which responds to requests. For example, if Alice's phone calls Bob's phone, Alice's UA acts as a UAC (sending an INVITE request) and Bob's UA acts as a UAS (receiving the INVITE and sending a response). Once a session is established, the roles can flip for each new transaction (e.g. Bob's phone sending a BYE will be UAC for that request and Alice's phone UAS to respond).

Proxy Server

A SIP proxy is an intermediary that routes SIP requests and responses on behalf of UAs. When a UA sends a request to a SIP address, it typically goes to a proxy server in the domain, which then forwards it towards the destination. Proxies handle tasks like routing logic (determining the next hop or target for a request), enforcing policies, and potentially authentication. A request may traverse multiple proxies in sequence. Each proxy may add or modify certain headers (like Via or Record-Route) before forwarding. Responses automatically follow the reverse path of the request through those proxies. Proxies can be stateful (maintaining transaction state, allowing forks and smarter handling) or stateless (simply forwarding messages and forgetting them). For example, a stateful proxy might fork an INVITE to ring multiple devices and manage the responses, whereas a stateless proxy simply forwards each message it receives and keeps no transaction state. Being a proxy is a logical role – in practice, a single server often acts as a proxy for some requests and a UAS/UAC for others depending on context.

Registrar

A registrar is a server that handles user registration. UAs use the REGISTER method to sign in with a SIP service, providing their address and current location (IP address or forwarding address). The registrar accepts REGISTER requests and stores the information in a location service (a database of user addresses) for its domain. This allows proxies to later look up where a user is currently reachable. In essence, a registrar binds a user's permanent SIP URI (their Address-of-Record like sip:alice@atlanta.com) to the Contact address of their device. Registrars often reside on the same server as a proxy for a domain. For example, when Alice's softphone comes online, it sends a REGISTER to atlanta.com containing her AOR (alice@atlanta.com) and her device's network address; the registrar server at atlanta.com will save that binding so that future calls to Alice can be routed to her device.
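The binding between an Address-of-Record and a device Contact can be pictured as a small lookup table. The following is a minimal sketch, assuming an in-memory dictionary: the class name `LocationService` and its methods are hypothetical, and the expiry handling (a binding lapses after its lifetime, and a lifetime of 0 removes it) follows general RFC 3261 registration behavior rather than anything stated in this section.

```python
import time
from typing import Dict, List, Optional


class LocationService:
    """Hypothetical in-memory store of AOR -> {contact: expiry time}."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Dict[str, float]] = {}

    def register(self, aor: str, contact: str, expires: int,
                 now: Optional[float] = None) -> None:
        """Add, refresh, or (with expires <= 0) remove one binding."""
        now = time.time() if now is None else now
        contacts = self._bindings.setdefault(aor, {})
        if expires <= 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = now + expires

    def lookup(self, aor: str, now: Optional[float] = None) -> List[str]:
        """Return the contacts for this AOR that have not yet expired."""
        now = time.time() if now is None else now
        return [c for c, expiry in self._bindings.get(aor, {}).items()
                if expiry > now]


if __name__ == "__main__":
    service = LocationService()
    service.register("sip:alice@atlanta.com", "sip:alice@192.0.2.10:5060", 3600)
    print(service.lookup("sip:alice@atlanta.com"))
```

A proxy for the domain would call something like `lookup()` when an INVITE for the AOR arrives and forward the request to each returned contact.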

Redirect Server

A redirect server is a UAS that does not forward requests but instead sends back a special 3xx redirect response informing the UAC of a different route or address to try. Essentially, it redirects the client to contact an alternate server or URI. For example, if a user has moved to a different domain, a redirect server might respond with "302 Moved Temporarily" and the new contact address. The UAC then sends a new request to that address. Redirect servers offload routing logic from proxies by having the clients handle it.

Back-to-Back User Agent (B2BUA)

A B2BUA is not a defined role in the core SIP spec's transaction model, but it's worth mentioning because of its common use in practice (such as in PBX or SBC systems). A B2BUA is an entity that acts as a UA on both sides of a call – effectively terminating the SIP dialog on one side and creating a new one on the other side. Unlike a proxy, which passes messages along, a B2BUA maintains full state of the call and can inspect or rewrite any part of the signaling. It behaves as a UAS to the caller and as a UAC to the callee, bridging the two call legs. B2BUAs are used for protocol interworking, media handling (since they can also manipulate media), and policy enforcement where a proxy's limited role isn't enough.

These components can be combined in single physical servers or distributed. In a typical VoIP service, a server might act as proxy + registrar for a domain (accepting registrations and routing calls for users of that domain). Location service databases are used by registrars and proxies to map user addresses (AORs) to current device locations.

Addressing

SIP addresses take the form of Uniform Resource Identifiers (URIs). A user's public address-of-record looks like an email address (e.g. sip:alice@atlanta.com). This URI can be resolved to the user's current Contact address via the registrar's location service. SIP URIs can also embed telephone numbers (e.g. sip:1234567890@pstn.provider.net) and may use a tel: URI scheme for phone numbers in certain cases (RFC 3966). There is also a secure form sips: (SIP Secure), which requires that the request be carried over a secure transport (TLS) on every hop to the target domain. When a UA wants to reach another user, it sends a request to the domain part of the SIP URI. SIP relies on the DNS infrastructure for server location: under the procedures in RFC 3263, the client queries NAPTR records (to select a transport), then SRV records (to find the server host and port), and finally A/AAAA records (to obtain IP addresses) for the target domain. For example, to send a request to sip:bob@biloxi.com, Alice's device will DNS-resolve biloxi.com for SIP service, possibly discovering a proxy server address to send the request to. This allows SIP to route messages globally using the DNS naming system.
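To make the URI structure concrete, here is a minimal sketch of splitting a sip: or sips: URI into its parts. The names `SipUri`, `parse_sip_uri`, and `default_port` are hypothetical, and the pattern is deliberately simplified: it ignores URI parameters and headers, and it does not handle IPv6 literals. The fallback ports (5060 for sip:, 5061 for sips:) apply only when neither the URI nor DNS supplies one.

```python
import re
from typing import NamedTuple, Optional


class SipUri(NamedTuple):
    scheme: str
    user: Optional[str]
    host: str
    port: Optional[int]


# Simplified shape: scheme ":" [user "@"] host [":" port]
# Anything after ";" (parameters) or "?" (headers) is ignored here.
_URI_RE = re.compile(
    r"^(?P<scheme>sips?):(?:(?P<user>[^@;?]+)@)?"
    r"(?P<host>[^:;?]+)(?::(?P<port>\d+))?",
    re.IGNORECASE,
)


def parse_sip_uri(text: str) -> SipUri:
    """Parse a sip:/sips: URI; raises ValueError for anything else."""
    match = _URI_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a sip: or sips: URI: {text!r}")
    port = match.group("port")
    return SipUri(
        scheme=match.group("scheme").lower(),
        user=match.group("user"),
        host=match.group("host").lower(),
        port=int(port) if port else None,
    )


def default_port(uri: SipUri) -> int:
    """Port to use when the URI carries none and DNS gave no answer."""
    if uri.port is not None:
        return uri.port
    return 5061 if uri.scheme == "sips" else 5060


if __name__ == "__main__":
    print(parse_sip_uri("sip:alice@atlanta.com"))
```

The `host` field is the part a client would feed into the DNS lookup described above.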

Transport and Network

SIP messages can be transported over UDP (most common for telephony), TCP, or TLS-encrypted TCP (for secure SIP). It's flexible and even SCTP or WebSockets can be used (e.g. SIP over WebSocket in web apps). The protocol includes mechanisms to handle issues like fragmentation (e.g. large messages should use TCP) and network failures (via retransmission timers especially on UDP). NAT traversal can be challenging for SIP, because SIP messages and SDP often carry IP addresses and expect end-to-end connectivity. Extensions like rport (RFC 3581) and "outbound" (RFC 5626) address some of these issues (see later section on extensions), enabling symmetric response routing and keep-alive mechanisms to handle NATs.
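The symmetric response routing mentioned above can be sketched as two small steps on the server side: stamp the top Via with what was actually observed on the wire, then pick the response destination from it. This is an illustrative sketch for UDP only; the function names and the dictionary representation of Via parameters are assumptions, not part of any real SIP stack.

```python
from typing import Dict, Optional, Tuple

ViaParams = Dict[str, Optional[str]]


def stamp_via(params: ViaParams, sent_by_host: str,
              source: Tuple[str, int]) -> ViaParams:
    """Record the packet's real source in the top Via of a received request."""
    stamped = dict(params)
    src_ip, src_port = source
    if "rport" in stamped:
        # RFC 3581: fill in rport and always add received.
        stamped["received"] = src_ip
        stamped["rport"] = str(src_port)
    elif sent_by_host != src_ip:
        # Without rport, only the address mismatch is recorded.
        stamped["received"] = src_ip
    return stamped


def response_destination(params: ViaParams, sent_by_host: str,
                         sent_by_port: Optional[int]) -> Tuple[str, int]:
    """Choose where to send the response for a stamped top Via."""
    host = params.get("received") or sent_by_host
    if params.get("rport"):
        return host, int(params["rport"])
    return host, sent_by_port if sent_by_port is not None else 5060


if __name__ == "__main__":
    via = {"branch": "z9hG4bK776asdhds", "rport": None}
    stamped = stamp_via(via, "192.168.1.10", ("203.0.113.7", 40312))
    print(response_destination(stamped, "192.168.1.10", 5060))
```

With `rport` present the response goes back to the NAT's mapped address and port; without it, the response would target port 5060 on the public address, which the NAT typically has no mapping for.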

Quick Reference: SIP Components

{| class="wikitable"
|-
! Component !! Role !! Key Function
|-
| User Agent (UA) || Endpoint || Initiates/receives calls. Acts as UAC (client) or UAS (server).
|-
| Proxy Server || Intermediary || Routes requests/responses. Can be stateful or stateless.
|-
| Registrar || Registration || Accepts REGISTER, stores user location bindings.
|-
| Redirect Server || Routing || Returns 3xx responses with alternate contact addresses.
|-
| B2BUA || Call control || Terminates/originates dialogs on both sides. Full call state.
|-
| Location Service || Database || Stores AOR-to-Contact mappings for user lookup.
|}

{| class="wikitable"
|-
! URI Scheme !! Description !! Example
|-
| sip: || Standard SIP URI || sip:alice@atlanta.com
|-
| sips: || Secure SIP (TLS required) || sips:alice@atlanta.com
|-
| tel: || Telephone number || tel:+1-555-123-4567
|}

{| class="wikitable"
|-
! Transport !! Port !! Notes
|-
| UDP || 5060 || Most common, requires retransmission handling
|-
| TCP || 5060 || For large messages, reliable delivery
|-
| TLS || 5061 || Encrypted signaling
|-
| WebSocket || 80/443 || For web applications (RFC 7118)
|}

SIP Messages: Requests and Responses

SIP is a text-based protocol that exchanges messages in a format similar to HTTP. There are two types of SIP messages: Requests (also called methods) sent by clients to initiate an action, and Responses sent by servers (or UAs) to convey the result of that request. Each SIP message consists of a start line, zero or more header fields, a blank line, and an optional message body.
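The layout just described (start line, header fields, blank line, optional body) can be illustrated with a short parsing sketch. The function names are hypothetical, and the code makes simplifying assumptions: lines end in CRLF, header values are not folded across lines, and compact header names are not expanded.

```python
from typing import List, Tuple


def split_sip_message(raw: str) -> Tuple[str, List[Tuple[str, str]], str]:
    """Split a raw SIP message into (start_line, headers, body)."""
    # The first blank line separates the head from the optional body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        # Split on the first colon only; values may contain colons.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def is_response(start_line: str) -> bool:
    """Responses begin with the SIP version; requests begin with a method."""
    return start_line.startswith("SIP/")


if __name__ == "__main__":
    raw = (
        "INVITE sip:bob@biloxi.com SIP/2.0\r\n"
        "Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds\r\n"
        "CSeq: 314159 INVITE\r\n"
        "\r\n"
    )
    start, headers, body = split_sip_message(raw)
    print(start, headers, repr(body))
```

Headers are kept as an ordered list of pairs rather than a dictionary because some fields, such as Via, can legitimately appear more than once and their order matters.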

A Request start-line includes a method name and a Request-URI (the target address) along with the SIP version. For example: INVITE sip:bob@biloxi.com SIP/2.0. There are a number of standard methods defined. The core SIP specification (RFC 3261) defined six basic methods, and subsequent RFCs added additional methods for extended functionality. Below is a list of the common SIP request methods and their purpose:

Core SIP Methods

  • INVITE – Establishes a session (initiate a call). This method is used to invite one or more participants to a session. It can carry session description details (SDP) to set up media. A successful INVITE results in a dialog and session between endpoints.
  • ACK – Confirms that the client has received a final response to an INVITE. ACK is used only with INVITE: it acknowledges a 200 OK as well as any failure final response (3xx–6xx). We will discuss its special role in the call flow later.
  • BYE – Terminates an established session (hangs up a call). Either participant in a call sends a BYE to end the call when it's finished.
  • CANCEL – Cancels a pending request (typically used to cancel an INVITE that hasn't been answered yet). If you start a call and want to abort before it's answered, a CANCEL is sent.
  • REGISTER – Registers the UA's address with a SIP server. UAs send REGISTER to a registrar to upload their current contact information (binding their UA's network location to their SIP URI).
  • OPTIONS – Queries the capabilities of a server or another UA. This is like a "ping" that can ask what methods or media types the other side supports. It's often used for keep-alive or diagnostic purposes too.

Extension Methods

In addition to these core methods, several extension methods have been introduced by various RFCs to extend SIP's functionality:

  • PRACK – Provisional Acknowledgment. PRACK (defined in RFC 3262) is used to acknowledge provisional responses (1xx) that are sent reliably. It improves reliability of ringing or early media responses (see section on provisional reliability).
  • SUBSCRIBE – Subscribes to an event on a server. Defined in RFC 3265, SUBSCRIBE allows a client to request notifications of events (such as presence changes, message waiting, etc.) from another entity.
  • NOTIFY – Sends an event notification to a subscriber. When an event a user subscribed to occurs, the notifier (usually a server or UA) sends a NOTIFY to inform the subscriber of the new state.
  • PUBLISH – Publishes an event state to a server. Defined in RFC 3903, PUBLISH allows a UA to push its current state (e.g., presence information) to a server, which can then distribute it to subscribers.
  • INFO – Sends mid-session information that does not modify the session state. Defined in RFC 2976, this method is often used to carry DTMF digits or other signals in the signaling path during a call, rather than in the media stream (though newer mechanisms may replace INFO for that).
  • REFER – Asks the recipient to issue a new request (typically to transfer a call). Defined in RFC 3515, REFER is used to instruct a UA to contact a third party (e.g., Alice, in a call with Bob, sends Bob a REFER to call Charlie – effectively transferring or adding a party).
  • MESSAGE – Conveys an instant message (IM) within a SIP dialog or as a standalone out-of-dialog message. Defined in RFC 3428, MESSAGE carries textual chat content in the SIP body, enabling basic instant messaging.
  • UPDATE – Modifies the session parameters of an existing dialog before the final INVITE response. Defined in RFC 3311, UPDATE can change session settings (like codecs or media streams) or send an offer/answer negotiation in early dialog, without waiting for the initial INVITE to complete.

(Note: There are a few more methods and many SIP header extensions defined in various RFCs and domain-specific SIP profiles (e.g., INFO packages, PING as a keepalive in some systems, etc.), but the above are the primary methods you'll encounter. Together, they make SIP a very flexible protocol.)

SIP Response Codes

A Response start-line begins with a numeric status code (similar to HTTP codes) and a reason phrase, plus the SIP version. For example: SIP/2.0 180 Ringing or SIP/2.0 486 Busy Here. SIP responses are categorized by their class (hundreds digit):

1xx – Informational: provisional responses, used to convey that the request is being processed but not yet completed. These include 100 Trying (an interim response from proxies/UAS to stop retransmissions and indicate progress), 180 Ringing (the callee's phone is ringing), 183 Session Progress (often used to convey early media like ringback tones).

2xx – Success: the request succeeded. 200 OK is the general successful response for most requests (meaning the action is completed). INVITE's 200 OK specifically means the call is answered (and usually contains SDP media details).

3xx – Redirection: the request should be tried at a different location. For example, 301 Moved Permanently or 302 Moved Temporarily provide an alternate contact (these are used by redirect servers or UAs that want to redirect calls).

4xx – Client Error: the request is bad or cannot be fulfilled as is. This includes things like 400 Bad Request (malformed message), 401 Unauthorized (requires authentication), 404 Not Found (user not found), 486 Busy Here (the target UA is busy), 487 Request Terminated (request was canceled).

5xx – Server Error: the server (recipient) failed to fulfill a valid request. E.g., 500 Server Internal Error, 503 Service Unavailable (often means overload or maintenance).

6xx – Global Failure: the request cannot be fulfilled at any server. E.g., 603 Decline (the user rejected the call), 604 Does Not Exist Anywhere. These indicate a failure that should not be retried elsewhere.

Only final responses (2xx–6xx) terminate a SIP transaction. Provisional (1xx) responses are informative and do not terminate the transaction (except they may cease retransmissions of the request in some cases). Some specific response codes have special handling in SIP (for example, 100 is never forwarded by proxies, 407 Proxy Authentication Required triggers proxy auth, 487 is used to indicate a canceled request, etc.), but the above categories suffice for a general understanding.
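The class rules above reduce to simple arithmetic on the status code. A minimal sketch, with hypothetical function names, that parses a response start-line and reports its class and whether it ends the transaction:

```python
from typing import Tuple

RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def parse_status_line(line: str) -> Tuple[int, str]:
    """Split 'SIP/2.0 486 Busy Here' into (486, 'Busy Here')."""
    _version, code, reason = line.split(" ", 2)
    return int(code), reason


def response_class(code: int) -> str:
    """Name the class from the hundreds digit of the status code."""
    if not 100 <= code <= 699:
        raise ValueError(f"status code out of range: {code}")
    return RESPONSE_CLASSES[code // 100]


def is_final(code: int) -> bool:
    """Only 2xx-6xx responses terminate a transaction."""
    return code >= 200


if __name__ == "__main__":
    code, reason = parse_status_line("SIP/2.0 486 Busy Here")
    print(code, reason, response_class(code), is_final(code))
```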

Quick Reference: SIP Methods

{| class="wikitable"
|-
! Method !! RFC !! Creates Dialog? !! Description
|-
| INVITE || 3261 || Yes || Initiate session/call
|-
| ACK || 3261 || No || Confirm INVITE final response
|-
| BYE || 3261 || No (ends) || Terminate session
|-
| CANCEL || 3261 || No || Cancel pending INVITE
|-
| REGISTER || 3261 || No || Register contact with server
|-
| OPTIONS || 3261 || No || Query capabilities
|-
| PRACK || 3262 || No || Acknowledge reliable provisional
|-
| SUBSCRIBE || 3265 || Yes || Subscribe to events
|-
| NOTIFY || 3265 || No || Send event notification
|-
| PUBLISH || 3903 || No || Publish event state
|-
| INFO || 2976 || No || Mid-session info (DTMF)
|-
| REFER || 3515 || No || Request call transfer
|-
| MESSAGE || 3428 || No || Instant message
|-
| UPDATE || 3311 || No || Modify session (early dialog)
|}

Quick Reference: Response Codes

{| class="wikitable"
|-
! Code !! Meaning !! Notes
|-
! colspan="3" | 1xx – Provisional (Informational)
|-
| 100 || Trying || Stops retransmissions, not forwarded by proxies
|-
| 180 || Ringing || Callee alerting
|-
| 181 || Call Being Forwarded || Call is being forwarded
|-
| 182 || Queued || Call queued
|-
| 183 || Session Progress || Early media / progress info
|-
! colspan="3" | 2xx – Success
|-
| 200 || OK || Request succeeded
|-
| 202 || Accepted || Request accepted (async processing)
|-
! colspan="3" | 3xx – Redirection
|-
| 300 || Multiple Choices || Multiple options available
|-
| 301 || Moved Permanently || User permanently at new location
|-
| 302 || Moved Temporarily || User temporarily at new location
|-
| 305 || Use Proxy || Must use specified proxy
|-
! colspan="3" | 4xx – Client Error
|-
| 400 || Bad Request || Malformed syntax
|-
| 401 || Unauthorized || Requires authentication
|-
| 403 || Forbidden || Request refused
|-
| 404 || Not Found || User not found
|-
| 405 || Method Not Allowed || Method not supported
|-
| 407 || Proxy Auth Required || Proxy authentication needed
|-
| 408 || Request Timeout || No response in time
|-
| 415 || Unsupported Media Type || Body format not supported
|-
| 420 || Bad Extension || Required extension not supported
|-
| 480 || Temporarily Unavailable || Callee unavailable
|-
| 481 || Call/Transaction Does Not Exist || Dialog/transaction not found
|-
| 486 || Busy Here || Callee busy
|-
| 487 || Request Terminated || Request was CANCELed
|-
| 488 || Not Acceptable Here || SDP not acceptable
|-
! colspan="3" | 5xx – Server Error
|-
| 500 || Server Internal Error || Server failure
|-
| 501 || Not Implemented || Method not implemented
|-
| 502 || Bad Gateway || Gateway error
|-
| 503 || Service Unavailable || Server overloaded/maintenance
|-
| 504 || Server Timeout || Gateway timeout
|-
! colspan="3" | 6xx – Global Failure
|-
| 600 || Busy Everywhere || All endpoints busy
|-
| 603 || Decline || Call declined by user
|-
| 604 || Does Not Exist Anywhere || User doesn't exist
|-
| 606 || Not Acceptable || No acceptable media
|}

SIP Message Structure and Headers

SIP messages, being text-based, are structured like HTTP messages. After the start line, each message has a series of header fields, each on their own line in a "Name: value" format. These header fields convey routing information, message attributes, and protocol-specific data. Here are some of the most important SIP header fields and what they mean:

Via

Lists the network path taken by the request. Each SIP proxy that forwards a request adds a Via header indicating its address. The Via also includes a branch identifier that uniquely marks this request to detect duplicates. For example, Alice's UA might send with Via: SIP/2.0/UDP alice_pc.atlanta.com;branch=z9hG4bK776asdhds. Proxies add their own Via on top. The Via is used to route responses back the same path: each proxy and UA uses the Via stack to send the response in reverse order. The branch parameter often begins with the magic cookie "z9hG4bK" for RFC3261-compliant systems, which helps identify loops and protocol version (the cookie ensures unique branch IDs and distinguishes them from older RFC2543 branches). In summary, the top Via header in a request indicates where to send the response. In responses, the Via headers are simply echoed back (each proxy removes its own Via as the response passes).
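The Via handling described above behaves like a stack: each hop pushes its own entry on the way out, and pops it when the response comes back. A minimal sketch, assuming the stack is a plain list of header values with the topmost Via first; the helper names and the 16-hex-digit random suffix are illustrative choices, not a requirement of the protocol.

```python
import secrets
from typing import List

MAGIC_COOKIE = "z9hG4bK"  # prefix that marks an RFC 3261 branch ID


def new_branch() -> str:
    """Generate a branch ID: magic cookie plus a random suffix."""
    return MAGIC_COOKIE + secrets.token_hex(8)


def is_rfc3261_branch(branch: str) -> bool:
    """True when the branch carries the RFC 3261 magic cookie."""
    return branch.startswith(MAGIC_COOKIE)


def push_via(via_stack: List[str], transport: str, sent_by: str) -> List[str]:
    """Forwarding a request: add this hop's Via on top of the stack."""
    own = f"SIP/2.0/{transport} {sent_by};branch={new_branch()}"
    return [own] + via_stack


def pop_via(via_stack: List[str]) -> List[str]:
    """Forwarding a response: remove this hop's own (topmost) Via."""
    return via_stack[1:]


if __name__ == "__main__":
    request_vias = ["SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds"]
    at_proxy = push_via(request_vias, "UDP", "proxy.atlanta.com")
    print(at_proxy)
    print(pop_via(at_proxy))
```

After the proxy pops its entry, the remaining top Via names the previous hop, which is where the response is sent next.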

From

Indicates the originator of the request – i.e., the caller's identity. It contains a display name and SIP URI of the caller, and it also has a tag parameter. Example: From: "Alice" <sip:alice@atlanta.com>;tag=1928301774. The tag is a random string added by the UA to uniquely identify this particular dialog leg from the caller's side. The combination of Call-ID and tags is what makes a SIP dialog unique (explained later). The From header is usually not changed by proxies (except some privacy services). It represents the logical identity of who sent the request.

To

Indicates the intended recipient of the request (the callee's identity). It also contains a display name and SIP URI. Example: To: "Bob" <sip:bob@biloxi.com>. In an initial request (like the first INVITE of a call), the To header usually does not have a tag; the tag is added by the UAS in its response. So when Bob's phone answers, it will send a response with To: "Bob" <sip:bob@biloxi.com>;tag=a6c85cf (for instance). This tag in the To header is how the callee's UA marks its leg of the dialog. Subsequent in-dialog requests will include both tags.

Call-ID

A globally unique identifier for a particular call or SIP session. It is usually a random string (often a GUID or a combination of random numbers and a host name) generated by the UA that initiates the call. Both sides use the same Call-ID for all messages in the same dialog. Together with the two tags (From tag and To tag), the Call-ID forms a unique key for the SIP dialog (the peer-to-peer relationship). If any of these differ, it's not the same dialog. Call-ID helps endpoints and proxies differentiate separate calls. (It's possible, albeit unusual, for different calls to pick the same Call-ID by random chance, but the tags will almost certainly differ, so the full tuple remains unique in practice.)

CSeq (Command Sequence)

A sequence number and method name pair that identifies a specific request within a dialog. For example: CSeq: 314159 INVITE. Each UA increments its own CSeq number by one for every new request it sends within a dialog. This helps in ordering requests and in matching responses to requests (responses echo the same CSeq). A UAS uses the number to detect out-of-order requests and retransmissions, and the method name lets a client tell apart responses to requests that share a number. ACK and CANCEL have special CSeq rules: they use the same CSeq number as the INVITE they refer to (since they relate to that INVITE), rather than getting their own new sequence numbers.

Contact

Specifies a direct URI where the user agent wants to be reached for future requests. In an INVITE, the UAC includes its Contact (like Contact: <sip:alice@192.0.2.4:5060> or another address that can be used to reach Alice's device). The callee sends later in-dialog requests, such as BYE or a re-INVITE, to this address; likewise, the caller sends the ACK and its own in-dialog requests to the Contact from the callee's 200 OK. These requests bypass the proxies unless Record-Route is set (see below). The Contact is essentially the current device address of the UA, which makes it crucial for routing subsequent messages in a dialog. It is an end-to-end header: proxies typically do not modify it (except perhaps edge proxies for NAT traversal). In REGISTER requests, the Contact is the binding being registered (i.e., "associate this Contact with my AOR").

Max-Forwards

A hop count limiter, similar in concept to IP TTL. It's an integer value that each proxy decrements by one before forwarding the request. A proxy that receives a request with Max-Forwards: 0 does not forward it and rejects it with 483 Too Many Hops. This prevents infinite loops in routing. For example, an INVITE might start with Max-Forwards: 70 (the recommended initial value), and each proxy will reduce it, so misconfigured routes can't cause messages to circulate forever.
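The hop-limit check can be sketched in a few lines of Python. This is only an illustration of the rule described above: the function name and the plain-dictionary header representation are assumptions for the example.

```python
def forward_max_forwards(headers):
    """Return (headers_to_forward, None), or (None, status) when rejecting."""
    updated = dict(headers)
    raw = headers.get("Max-Forwards")
    if raw is None:
        # RFC 3261: a proxy adds the header (recommended value 70) if absent.
        updated["Max-Forwards"] = "70"
        return updated, None
    if int(raw) == 0:
        return None, "483 Too Many Hops"
    updated["Max-Forwards"] = str(int(raw) - 1)
    return updated, None

forwarded, error = forward_max_forwards({"Max-Forwards": "70"})
```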

Content-Type and Content-Length

Content-Type indicates the MIME type of the message body, if any. In SIP, the body is often an SDP payload (with Content-Type: application/sdp). Other content types are possible as long as both sides understand them (for example, a MESSAGE request may carry text/plain, and an image-sharing application might use an image type). If there's no body, Content-Type may be omitted.

Content-Length is the size of the message body in bytes. This tells the receiver where the message ends, which is especially important for stream transports such as TCP, where message boundaries are not given by packet boundaries as they are with UDP.
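Because Content-Length counts bytes rather than characters, it is worth computing it from the encoded body. The sketch below is a hypothetical helper (not taken from any SIP stack) that serializes a message and fills in Content-Length; it assumes a UTF-8 body and CRLF line endings.

```python
def with_content_length(start_line, headers, body):
    """Serialize a SIP message, computing Content-Length from the body bytes."""
    body_bytes = body.encode("utf-8")
    lines = [start_line] + [f"{name}: {value}" for name, value in headers]
    lines.append(f"Content-Length: {len(body_bytes)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # blank line separates headers from body
    return head.encode("utf-8") + body_bytes

message = with_content_length(
    "MESSAGE sip:bob@biloxi.com SIP/2.0",
    [("Content-Type", "text/plain;charset=utf-8")],
    "café",  # 4 characters, but 5 bytes in UTF-8
)
```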

Record-Route / Route

These headers are used by proxies to maintain themselves in the path of subsequent dialog requests. If a proxy wants to stay involved in the dialog (perhaps for call recording, policy enforcement, or because it's an outbound proxy), it will add a Record-Route header to the initial INVITE as it passes through. The UAs will then include those addresses in a Route header in future requests (e.g., the ACK, BYE) to route them through the same proxies. Record-Route ensures that even though Contact might allow direct UA-to-UA messaging, the proxies can insist that routing remain through them. Each proxy that Record-Routes is listed; the UAS copies Record-Route headers from request to response, and the UAC uses those to build a Route set for subsequent messages. If no Record-Route is used, the default is that the next requests in the dialog go directly to the Contact. (Record-Route is ignored in REGISTER and some other requests.) Most SIP service providers use Record-Route to keep signaling through their proxies (forming what's called a "signaling path" for the dialog).
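To make the route-set construction concrete, here is a minimal Python sketch. It assumes the Record-Route URIs are given in the order they appear in the message (top first); the function name is made up for the example. Per RFC 3261, the UAS keeps that order while the UAC reverses it.

```python
def route_set(record_route, role):
    """Build a dialog's route set from Record-Route URIs (top of message first)."""
    if role == "uac":
        return list(reversed(record_route))  # caller: reverse order
    if role == "uas":
        return list(record_route)            # callee: same order
    raise ValueError("role must be 'uac' or 'uas'")

# Each proxy inserts its Record-Route on top, so the last proxy is listed first.
seen_in_message = ["<sip:proxy.biloxi.com;lr>", "<sip:proxy.atlanta.com;lr>"]
alice_routes = route_set(seen_in_message, "uac")
bob_routes = route_set(seen_in_message, "uas")
```

With these values Alice's requests go first to the atlanta.com proxy and Bob's go first to the biloxi.com proxy, which matches the paths in the trapezoid scenario.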

Authorization / WWW-Authenticate

SIP uses an HTTP Digest authentication model. When a UAC sends a request that needs authentication (like REGISTER or an INVITE to a realm requiring auth), the server can respond with 401 Unauthorized (or 407 Proxy Authentication Required for proxy-auth) along with a WWW-Authenticate (or Proxy-Authenticate) header challenging the user. The UAC then resends the request with an Authorization (or Proxy-Authorization) header containing the credentials (username, realm, nonce, response, etc.) as per the HTTP Digest algorithm. This challenge-response handshake ensures the user is who they claim (given a shared password or secret). This is how SIP enforces user authentication for things like registration and calling. The details follow RFC 2617 (HTTP Digest). From the user perspective: the first call attempt might get a 401, then the phone automatically resends with credentials, then it succeeds with 200 OK.
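The response value in the Authorization header can be computed as in the following sketch, which follows the MD5 formulas of RFC 2617 using Python's standard hashlib. The function name and argument layout are assumptions for illustration; real UAs also handle other algorithms and qop values. The sample values are the well-known example from RFC 2617 (an HTTP GET); for SIP the method and URI would be, for instance, REGISTER and sip:atlanta.com.

```python
import hashlib

def _md5(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the Digest 'response' value (MD5, optional qop=auth)."""
    ha1 = _md5(f"{username}:{realm}:{password}")
    ha2 = _md5(f"{method}:{uri}")
    if qop == "auth":
        return _md5(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return _md5(f"{ha1}:{nonce}:{ha2}")  # legacy form without qop

response = digest_response(
    "Mufasa", "testrealm@host.com", "Circle Of Life",
    "GET", "/dir/index.html", "dcd98b7102dd2f0e8b11d0f600bfb0c093",
    qop="auth", nc="00000001", cnonce="0a4f113b",
)
```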

Supported / Require

These headers are used to indicate extensions. A UAC can include a Supported header to list optional SIP extensions it supports (like 100rel, timer, replaces, etc.). A sender can use Require to insist that the receiver support a certain extension, otherwise the request fails. For example, Require: 100rel might be in an INVITE to demand that provisional reliability (PRACK) is used. If the other side doesn't support it, it rejects the request with 420 Bad Extension and lists the unsupported option tags in an Unsupported header. Generally, Require is used sparingly (only when an extension is critical), while Supported and Allow advertise capabilities.
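A UAS-side check of the Require header might look like the sketch below. The function name and the return convention (status code plus extra headers) are assumptions for this example only.

```python
def check_require(require_header, supported_here):
    """Return (420, headers) if any required option tag is unsupported."""
    required = [tag.strip() for tag in require_header.split(",") if tag.strip()]
    known = {tag.lower() for tag in supported_here}
    unsupported = [tag for tag in required if tag.lower() not in known]
    if unsupported:
        return 420, {"Unsupported": ", ".join(unsupported)}
    return None, {}

status, extra = check_require("100rel, timer", {"timer", "replaces"})
```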

The SIP headers collectively provide a rich amount of information and control. The complete set of SIP header fields is defined in RFC 3261 Section 20 and subsequent RFCs. For everyday use, the ones discussed above are the most crucial to understand the call flows.

Example: Complete SIP INVITE Message

Below is a complete example of a SIP INVITE request with all essential headers:

INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 156

v=0
o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.com
s=Session SDP
c=IN IP4 pc33.atlanta.com
t=0 0
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000

Key elements:

  • Request line: Method (INVITE), Request-URI (sip:bob@biloxi.com), SIP version
  • Via: Shows origin host with branch ID for transaction matching
  • From/To: Caller and callee identities (From has tag, To will get tag in response)
  • Call-ID: Unique identifier for this call
  • CSeq: Sequence number (314159) + method name
  • Contact: Where Alice can be directly reached
  • Content-Type/Length: Indicates SDP body follows
  • Body: SDP session description (see below)
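The structure listed above (start line, headers, blank line, body) can be split apart with a few lines of Python. This is a deliberately simplified sketch with an invented function name: it does not handle header line folding, compact header names, or multiple values per header.

```python
def parse_sip_message(raw):
    """Split a SIP message into start line, (name, value) headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body

raw = (
    "INVITE sip:bob@biloxi.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)
start_line, headers, body = parse_sip_message(raw)
method, request_uri, version = start_line.split()
```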

Example: SDP Body

The Session Description Protocol (SDP) body in SIP messages describes the media session parameters:

v=0
o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.com
s=Session SDP
c=IN IP4 pc33.atlanta.com
t=0 0
m=audio 49172 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
a=ptime:20
a=sendrecv
{| class="wikitable"
|-
! Line !! Meaning
|-
| v=0 || SDP version (always 0)
|-
| o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.com || Origin: username, session-id, version, network type, address type, address
|-
| s=Session SDP || Session name (required but often ignored)
|-
| c=IN IP4 pc33.atlanta.com || Connection info: where to send media (IP address)
|-
| t=0 0 || Timing: start and stop times (0 0 = unbounded)
|-
| m=audio 49172 RTP/AVP 0 8 97 || Media: type (audio), port (49172), protocol (RTP/AVP), payload types
|-
| a=rtpmap:0 PCMU/8000 || Attribute: payload 0 = G.711 μ-law at 8000 Hz
|-
| a=rtpmap:8 PCMA/8000 || Attribute: payload 8 = G.711 A-law at 8000 Hz
|-
| a=rtpmap:97 iLBC/8000 || Attribute: payload 97 = iLBC codec
|-
| a=ptime:20 || Packetization time: 20 ms per packet
|-
| a=sendrecv || Direction: bidirectional media
|}

Common SDP direction attributes:

  • sendrecv – bidirectional (normal call)
  • sendonly – only sending (music on hold from server)
  • recvonly – only receiving
  • inactive – no media (call on hold)
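For readers who want to inspect SDP programmatically, the following Python sketch extracts the connection address, media lines, rtpmap entries, and direction attribute. The function name and the returned dictionary layout are assumptions for this example; the parser ignores most SDP fields, treats only media-level direction attributes, and does not handle port ranges such as 49172/2.

```python
DIRECTIONS = {"sendrecv", "sendonly", "recvonly", "inactive"}

def parse_sdp(sdp):
    """Collect connection address, media lines, rtpmap entries, and direction."""
    session = {"connection": None, "media": []}
    for line in sdp.splitlines():
        kind, sep, value = line.partition("=")
        if not sep:
            continue
        media = session["media"][-1] if session["media"] else None
        if kind == "c" and media is None:
            session["connection"] = value.split()[-1]  # session-level address
        elif kind == "m":
            mtype, port, proto, *payloads = value.split()
            session["media"].append({
                "type": mtype, "port": int(port), "proto": proto,
                "payloads": payloads, "rtpmap": {},
                "direction": "sendrecv",  # default when no attribute is present
            })
        elif kind == "a" and media is not None:
            if value.startswith("rtpmap:"):
                payload, encoding = value[len("rtpmap:"):].split(None, 1)
                media["rtpmap"][payload] = encoding
            elif value in DIRECTIONS:
                media["direction"] = value
    return session

example = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 pc33.atlanta.com",
    "s=Session SDP",
    "c=IN IP4 pc33.atlanta.com",
    "t=0 0",
    "m=audio 49172 RTP/AVP 0 8 97",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:97 iLBC/8000",
    "a=ptime:20",
    "a=sendonly",
]) + "\r\n"
parsed = parse_sdp(example)
audio = parsed["media"][0]
```

In this example the last attribute is changed to sendonly, so `audio["direction"]` reflects a stream placed on hold by the sender.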

Quick Reference: SIP Headers

{| class="wikitable"
|-
! Header !! Purpose !! Example
|-
| Via || Route responses back; branch ID || Via: SIP/2.0/UDP pc.atlanta.com;branch=z9hG4bK...
|-
| From || Caller identity + tag || From: "Alice" <sip:alice@atlanta.com>;tag=1234
|-
| To || Callee identity + tag (in response) || To: "Bob" <sip:bob@biloxi.com>;tag=5678
|-
| Call-ID || Unique call identifier || Call-ID: a84b4c76e66710@pc.atlanta.com
|-
| CSeq || Request sequence number + method || CSeq: 314159 INVITE
|-
| Contact || Direct reachable address || Contact: <sip:alice@192.0.2.4:5060>
|-
| Max-Forwards || Hop limit (TTL) || Max-Forwards: 70
|-
| Content-Type || Body MIME type || Content-Type: application/sdp
|-
| Content-Length || Body size in bytes || Content-Length: 142
|-
| Record-Route || Proxy stays in path || Record-Route: <sip:proxy.atlanta.com;lr>
|-
| Route || Forced routing path || Route: <sip:proxy.atlanta.com;lr>
|-
| Authorization || Auth credentials || Authorization: Digest username="alice"...
|-
| WWW-Authenticate || Auth challenge || WWW-Authenticate: Digest realm="atlanta.com"...
|-
| Supported || Optional extensions supported || Supported: 100rel, timer, replaces
|-
| Require || Mandatory extensions || Require: 100rel
|-
| Allow || Supported methods || Allow: INVITE, ACK, BYE, CANCEL, OPTIONS
|-
| Expires || Registration/subscription lifetime || Expires: 3600
|-
| Session-Expires || Session timer value || Session-Expires: 1800;refresher=uac
|-
| Refer-To || Transfer target || Refer-To: <sip:carol@chicago.com>
|-
| Event || Event package type || Event: presence
|-
| Subscription-State || Subscription status || Subscription-State: active;expires=3600
|}

Dialog Identification

A SIP dialog is uniquely identified by the combination of three values:

{| class="wikitable"
|-
! Component !! Source !! When Set
|-
| Call-ID || UAC generates || Initial request
|-
| From-tag || UAC generates || Initial request
|-
| To-tag || UAS generates || First response (usually 1xx or 2xx)
|}
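The three values above can be combined into a lookup key. The sketch below assumes a hypothetical DialogId tuple; per RFC 3261 each side stores the tags from its own point of view, so the caller's local tag is the From tag of the initial request while the callee's local tag is the To tag.

```python
from collections import namedtuple

DialogId = namedtuple("DialogId", "call_id local_tag remote_tag")

def dialog_id(call_id, from_tag, to_tag, role):
    """Build the dialog ID as seen by the UAC or the UAS of the initial request."""
    if role == "uac":
        return DialogId(call_id, from_tag, to_tag)
    if role == "uas":
        return DialogId(call_id, to_tag, from_tag)
    raise ValueError("role must be 'uac' or 'uas'")

alice_view = dialog_id("a84b4c76e66710@pc33.atlanta.com", "1928301774", "a6c85cf", "uac")
bob_view = dialog_id("a84b4c76e66710@pc33.atlanta.com", "1928301774", "a6c85cf", "uas")
```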

SIP Call Flow Example (Basic Call Setup)

To ground these concepts, let's walk through a typical SIP call flow for a voice call. This example uses the canonical scenario often called the "SIP trapezoid": Alice calls Bob, each of their devices uses a SIP service provider (their domains are atlanta.com and biloxi.com respectively), and the two providers' proxy servers sit in between. The message exchange proceeds step by step as follows:

INVITE – Alice calls Bob

Alice's SIP UA (softphone) sends an INVITE request to Bob's SIP URI (sip:bob@biloxi.com). The INVITE goes to Alice's configured proxy (at atlanta.com) as the outbound proxy for her domain. This INVITE includes Alice's SDP offer (e.g., proposing an audio stream) and has headers: From (Alice's URI, with a tag), To (Bob's URI, no tag yet), Call-ID, CSeq, Contact (Alice's contact address), etc. The atlanta.com proxy receives this INVITE from Alice.

Routing – Proxy atlanta.com forwards toward biloxi.com

The atlanta.com proxy now acts as a UAC on behalf of Alice, forwarding the INVITE to the next hop. It needs to determine how to reach biloxi.com. It likely performs a DNS lookup (RFC 3263) to find the SIP server for biloxi.com. Let's say it finds an address for the biloxi.com proxy. Before forwarding, the atlanta proxy adds its own Via header (so that responses come back to it) and typically a Record-Route header if it wants to stay on the path. It then sends the INVITE to the biloxi.com proxy server.

100 Trying – Provisional response by proxies

As soon as it receives Alice's INVITE, the atlanta.com proxy sends a 100 Trying response back to her UA. This is a provisional (1xx) response that indicates "I got the INVITE, I am working on it." It is hop-by-hop, not end-to-end: each element that receives the INVITE generates its own 100 Trying toward the previous hop to stop that hop's retransmissions, and a proxy never forwards a 100 Trying it receives. In our flow, the biloxi.com proxy likewise sends 100 Trying to the atlanta.com proxy when the forwarded INVITE arrives, and the atlanta.com proxy consumes it. Alice's phone sees only the "100 Trying" from atlanta.com, which tells her that her INVITE is being handled.

Proxy to UAS – Invite reaches Bob's server

The biloxi.com proxy now forwards the INVITE to Bob's SIP phone (UAS). First, it needs to figure out where Bob is registered. It likely consults a location service (populated by Bob's registrations). Suppose Bob is registered at an IP or another proxy; for simplicity, assume Bob's device is directly registered with biloxi.com and reachable. The proxy forwards the INVITE to Bob's Contact address. It will also add itself to Record-Route on the way out if not already.

180 Ringing – Bob's phone rings

When Bob's SIP phone (UAS) receives the INVITE, Bob isn't picking up yet, so the phone sends a 180 Ringing provisional response. This indicates to the caller that the callee's endpoint is alerting (ringing). The 180 Ringing travels back through the proxies: Bob's phone sends it to biloxi.com proxy, which forwards it to atlanta.com proxy, which then sends it to Alice's UA. Alice's softphone receives "180 Ringing" and can now typically play a ringback tone or show "Ringing" on the UI to inform Alice that Bob's phone is ringing. (This 180 may also contain an SDP if early media is to be established, but usually ringing itself doesn't need SDP – a 183 Session Progress would be used if early media like announcements were sent).

200 OK – Bob answers

When Bob picks up the phone, his SIP phone (UAS) sends a 200 OK (Success) response. This is a final response indicating the call is accepted. The 200 OK carries Bob's SDP answer, which contains the agreed media parameters (IP/port for Bob, chosen codec, etc.) that answer the offer from Alice. This 200 OK goes from Bob's phone to the biloxi proxy, then to the atlanta proxy, and then to Alice's UA, following the Via headers added earlier. When Alice's UA receives the 200 OK, the dialog is considered established – at this point, both sides have exchanged SDP and agreed on the session, and a SIP dialog (identified by Call-ID + tags) is in place connecting Alice and Bob for this call.

ACK – Alice confirms receipt of 200 OK

Upon receiving the 200 OK, Alice's UA must send an ACK request to confirm it. The ACK for a successful INVITE is generated end-to-end: it is addressed to the UAS (Bob's phone) rather than to a proxy. How does Alice know where to send it? The Contact header in the 200 OK contains Bob's direct address, and any Record-Route headers from proxies are used to build a Route set. If no proxy had Record-Routed, the ACK would go straight to Bob's Contact. In our example, since the proxies did Record-Route, the ACK traverses atlanta -> biloxi -> Bob (just like the INVITE path). Note that ACK is a peculiar case: for a 200 OK, the ACK is a separate transaction, no response is ever sent to it, and Alice's UA does not retransmit it on a timer. If the ACK gets lost, Bob's phone retransmits the 200 OK periodically until an ACK arrives, and Alice's UA sends the ACK again for each retransmitted 200 OK. Alice's ACK usually contains no SDP. Once Bob's phone receives the ACK, the call is formally established.

Media Session – Audio conversation begins

After the 200 OK/ACK handshake, Alice and Bob begin the media session (shown as a double arrow in the diagram). They send RTP packets for audio (and/or video) directly between their IP addresses as negotiated (or through any media relays if configured, but at SIP level we consider the session established). SIP itself is quiet during the conversation, allowing the media layer to handle the real-time stream.

BYE – Terminating the call

Let's say Bob hangs up after some time. Bob's phone will send a BYE request to end the session. The BYE is a SIP request sent within the existing dialog, so it has the same Call-ID and the same tags as before (with the From and To values swapped relative to the INVITE, because Bob is now the sender), and it is routed along the established path. In our case, Bob's phone sends BYE to the biloxi.com proxy (since the dialog's Route set includes it), which forwards it to the atlanta.com proxy, and then to Alice's UA. Alice's UA receives the BYE from Bob.

200 OK (BYE) – Call terminated

Alice's UA responds with 200 OK to acknowledge the BYE (this 200 OK for the BYE is a final response indicating the session is terminated). That 200 OK travels back to Bob (through atlanta and biloxi proxies). Once Bob's side gets the 200 OK for its BYE, the call is fully terminated on both sides. They will stop the media session. The SIP dialog is torn down at this point.

This completes the basic call flow. Throughout this flow, various headers played their roles: e.g., multiple Via entries were present on the INVITE and thus on responses, the proxies used the branch in Via to match responses to requests, the To tag appeared in the 200 OK establishing the dialog, and so on. Also note some special cases: the 100 Trying is generated by proxies to suppress retransmissions (Alice's UA would retransmit INVITE if no response at all), and the ACK for the final answer is a separate transaction (with no response) that went end-to-end. If Bob had never answered, Alice might have sent a CANCEL (covered next) or Bob's side might send a 408 Request Timeout or some 4xx rejection.

Canceling a call

If Alice decided to hang up before Bob answered (for example, she gets tired of waiting), her UA could send a CANCEL request for the INVITE. CANCEL is a separate request that uses the same Request-URI, Call-ID, From, To, CSeq number (with method "CANCEL"), and top Via (including the branch) as the INVITE it's canceling. The CANCEL travels through the proxies hop-by-hop (each proxy responds to CANCEL with 200 OK immediately), and when the CANCEL reaches Bob's UAS, if Bob hasn't answered yet, his phone stops ringing and responds to the original INVITE with 487 Request Terminated. Alice's UA, upon seeing the 487, acknowledges it with an ACK and knows the call was canceled. If Bob had already sent a 200 OK moments before the CANCEL arrived, the CANCEL has no effect on the call: the UAS still answers the CANCEL itself with 200 OK, but the INVITE's final response stands, and Alice's UA must then send ACK followed by BYE to end the call. CANCEL is only useful before a call is answered, and specifically for INVITE; you don't cancel other methods in practice. Also by spec, a UAC should not send CANCEL until it has received at least one provisional response for the INVITE, to avoid a race where CANCEL arrives before the INVITE is even processed.
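The relationship between a CANCEL and the INVITE it targets can be sketched as follows. The dictionary-based message representation and the function name are assumptions for this example; the copied fields follow the rules summarized above.

```python
def build_cancel(invite):
    """Derive a CANCEL from a pending INVITE (messages as plain dicts)."""
    number, _method = invite["CSeq"].split()
    return {
        "method": "CANCEL",
        "request_uri": invite["request_uri"],
        "Via": [invite["Via"][0]],   # only the top Via, with the same branch
        "From": invite["From"],
        "To": invite["To"],          # no To tag is added by the UAC
        "Call-ID": invite["Call-ID"],
        "CSeq": f"{number} CANCEL",  # same number, different method
        "Max-Forwards": "70",
    }

invite = {
    "method": "INVITE",
    "request_uri": "sip:bob@biloxi.com",
    "Via": ["SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds"],
    "From": "Alice <sip:alice@atlanta.com>;tag=1928301774",
    "To": "Bob <sip:bob@biloxi.com>",
    "Call-ID": "a84b4c76e66710@pc33.atlanta.com",
    "CSeq": "314159 INVITE",
}
cancel = build_cancel(invite)
```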

Quick Reference: Basic Call Flow

{| class="wikitable"
|-
! Step !! Direction !! Message !! Purpose
|-
| 1 || Alice -> Proxy || INVITE || Initiate call (with SDP offer)
|-
| 2 || Proxy -> Alice || 100 Trying || Stop retransmissions
|-
| 3 || Bob -> Proxy || 180 Ringing || Callee alerting
|-
| 4 || Bob -> Proxy || 200 OK || Call answered (with SDP answer)
|-
| 5 || Alice -> Bob || ACK || Confirm receipt of 200 OK
|-
| 6 || Alice <-> Bob || RTP || Media session
|-
| 7 || Bob -> Alice || BYE || End call
|-
| 8 || Alice -> Bob || 200 OK || Confirm termination
|}

Transactions, Dialogs, and Sessions

It's important to understand the layering of SIP's concepts: transactions, dialogs, and sessions. These three concepts operate at different scopes and lifetimes:

Transaction

A Transaction is a single request and all of its responses (except the ACK for a 2xx). It is the fundamental unit of message handling in SIP. For example, the initial INVITE request and the final 200 OK (and all provisional responses in between) constitute a single transaction. The ACK for that 200 OK was a separate transaction (because the ACK to a 2xx is not considered part of the INVITE transaction). The BYE and its 200 OK formed another transaction. SIP's state machines (client transaction, server transaction) handle retransmissions and timeouts at the transaction level. A transaction is identified primarily by the branch parameter of the top Via header together with the request method (which responses echo in CSeq). Provisional responses (1xx) do not end a transaction, while final responses (2xx-6xx) do. Once a final response is sent and its ACK handled (if applicable), the transaction is completed. If using UDP, SIP relies on retransmission timers: the UAC retransmits requests (like INVITE) until a response is received, the UAS retransmits 200 OK until an ACK is received, etc., as defined by the transaction timers in the RFC.
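On the client side, matching a response to its transaction can be sketched as a dictionary lookup keyed on the top Via branch and the CSeq method. The names below are invented for illustration; the method is part of the key because a CANCEL shares the branch of the INVITE it cancels.

```python
def client_transaction_key(top_via_branch, cseq_header):
    """Key used to match a response to the client transaction that sent it."""
    _number, method = cseq_header.split()
    return (top_via_branch, method.upper())

# Hypothetical table of pending client transactions.
pending = {
    client_transaction_key("z9hG4bK776asdhds", "314159 INVITE"): "invite-transaction",
    client_transaction_key("z9hG4bK776asdhds", "314159 CANCEL"): "cancel-transaction",
}

# A 487 response to the INVITE and a 200 response to the CANCEL share the branch.
match_487 = pending.get(client_transaction_key("z9hG4bK776asdhds", "314159 INVITE"))
match_200 = pending.get(client_transaction_key("z9hG4bK776asdhds", "314159 CANCEL"))
```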

Dialog

A Dialog is a peer-to-peer relationship between two UAs that persists for some time, typically created by an INVITE transaction's successful completion. In our flow, the dialog was confirmed once Alice received the 200 OK from Bob (a provisional response carrying a To tag, such as the 180 Ringing, already creates an "early" dialog). The dialog is identified by the combination of Call-ID, Alice's tag (From tag), and Bob's tag (To tag). Within this dialog, either UA can send new requests (called in-dialog requests), such as BYE or re-INVITE, which form new transactions within the context of the existing dialog. The dialog maintains state such as the route set (learned from Record-Route), the remote target (the Contact of the other side), and sequence number expectations. It's basically the SIP "call state." A dialog lasts until it is terminated by a BYE, by an error (for example a 408 or 481 response to an in-dialog request when one side has gone away), or, for other dialog usages such as SUBSCRIBE, by expiration. Dialogs are important because they give the two endpoints a context for further messages: a BYE doesn't make sense without a dialog, because it needs to identify which call to terminate. Dialog state is also used for mid-call requests (UPDATE or re-INVITE to modify media, INFO, etc.). Note: certain methods like REGISTER or OPTIONS are typically sent outside of dialogs (they are standalone), and SUBSCRIBE/NOTIFY can create their own subscription dialogs separate from call dialogs. A 2xx response to an INVITE always establishes a dialog, whereas CANCEL and ACK never create one by themselves. Each side of a dialog also tracks "local" and "remote" CSeq numbers to keep requests in order.

Session

A Session in SIP refers to the actual media session negotiated – e.g., the audio/video session described by SDP and carried via RTP. It's what users think of as "the call" in terms of media. SIP's job is to set up, modify, and tear down sessions. A dialog often corresponds one-to-one with a session (the call between Alice and Bob). However, technically you could have a dialog without an active session (for example, a SUBSCRIBE/NOTIFY dialog has no media session, it's just a subscription state, or an INVITE that completed but no media was sent yet). Also, a single dialog can manage multiple media streams (audio + video, etc., part of one session description) or can be updated with new sessions (hold/resume with new SDP, etc.). Generally, though, when the dialog ends (BYE), the session is gone too. The terms "call" and "session" are often used interchangeably, though session specifically refers to the set of media streams.

SIP's design separates transactions from dialogs: transactions are the individual request/response exchanges (short-lived), whereas dialogs are the long-lived context (which can span multiple transactions). For instance, the INVITE and its 200 OK formed one transaction that established the dialog (with the ACK for the 2xx sent as a separate transaction); the BYE/200 was another transaction within the same dialog. Within a dialog, each new request increments the CSeq and is processed in order.

One special case: the initial INVITE transaction for a dialog is slightly different from subsequent ones because of the three-way handshake for INVITE (INVITE -> 200 OK -> ACK). As noted, the ACK for a 2xx response is not considered part of the transaction, so the INVITE transaction is actually completed by the 200 OK. The ACK is its own transaction (with no response expected) that confirms the dialog. For non-2xx final responses, however, the ACK is considered part of that transaction (to conclude it). This distinction exists because SIP needed to solve reliability for the 200 OK over an unreliable transport: the UAS can't rely on the transaction layer's retransmit mechanism for 200 OK (since the INVITE transaction ends at the 200 OK), so instead the UAS itself retransmits the 200 OK until an ACK arrives. This is a unique quirk of SIP's INVITE handling.
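The 200 OK retransmission behavior can be illustrated by computing the schedule a UAS would follow when no ACK arrives. The sketch below assumes the RFC 3261 defaults (T1 = 0.5 s, T2 = 4 s): the interval starts at T1, doubles up to T2, and the UAS gives up after 64*T1 seconds. The function name is made up for the example.

```python
T1 = 0.5  # seconds, RFC 3261 default round-trip time estimate
T2 = 4.0  # seconds, cap on the retransmission interval

def ok_retransmit_times(t1=T1, t2=T2):
    """Offsets (seconds after the first 200 OK) at which it is sent again."""
    times, elapsed, interval = [], 0.0, t1
    while elapsed + interval <= 64 * t1:  # give up after 64*T1 without an ACK
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * 2, t2)
    return times

schedule = ok_retransmit_times()
```

If the ACK still has not arrived when the schedule runs out, the UAS would normally end the call by sending a BYE.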

Quick Reference: Transaction vs Dialog vs Session

{| class="wikitable"
|-
! Concept !! Scope !! Lifetime !! Identified By !! Example
|-
| Transaction || Single request/response || Seconds || Branch ID, CSeq, Method || INVITE + 100/180/200
|-
| Dialog || UA-to-UA relationship || Minutes to hours || Call-ID + From-tag + To-tag || Entire call signaling
|-
| Session || Media exchange || Duration of call || SDP negotiation || RTP audio/video stream
|}
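{| class="wikitable"
|-
! ACK Behavior !! For 2xx Response !! For non-2xx Response
|-
| Part of INVITE transaction? || No (separate transaction) || Yes (same transaction)
|-
| Retransmitted? || Not on a timer; sent again each time a retransmitted 200 OK arrives || Not on a timer; sent again each time the retransmitted final response arrives
|-
| What triggers it? || Receiving 200 OK || Receiving 3xx/4xx/5xx/6xx
|-
| UAS behavior if no ACK || Retransmits 200 OK || Retransmits error response
|-
| Routing || End-to-end via Route set || Hop-by-hop like original INVITE
|}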

Registration and Location Service

Before calls can be made, user agents typically register with their SIP server. Registration is how a SIP UA announces "Here I am, at this address" to the network. The REGISTER request binds a user's Address-of-Record (AOR) (which is a SIP URI like sip:alice@atlanta.com) to one or more Contact URIs (device addresses).

For example, when Alice opens her softphone app, it sends: REGISTER sip:atlanta.com SIP/2.0 with headers including To: sip:alice@atlanta.com, From: sip:alice@atlanta.com (with a new tag), Contact: <sip:alice@192.0.2.4:5060>, and often an Expires header (or Contact parameter) indicating how long this registration should be valid (e.g., 3600 seconds). The REGISTER is sent to the registrar server for atlanta.com (often co-located with the proxy). If authentication is required, the server will respond 401 Unauthorized with a challenge, and Alice's UA will re-send the REGISTER with proper Authorization credentials (a digest response computed from her username and password, never the password itself), following the HTTP Digest scheme mentioned earlier. Once authorized, the registrar responds with 200 OK. At this point, Alice is "registered." The registrar has stored the mapping: AOR=sip:alice@atlanta.com -> Contact=sip:alice@192.0.2.4:5060 (plus an expiration time).

Registrations have a limited lifetime (the Expires header or Contact's expires parameter dictates this). Alice must periodically refresh her registration (by sending a new REGISTER before expiry) to keep it active, or she can send a REGISTER with Expires: 0 to remove a registration (logout). Multiple devices can register for the same AOR with different Contacts; the registrar will store all of them (often proxies will then fork incoming INVITEs to all Contacts). The Contact header in REGISTER can also contain a q-value for priority or other parameters (defined in RFC 3261 and extended in RFC 3840 for capabilities).

The Location Service is the database that the registrar populates with these contacts. When an incoming INVITE for Alice arrives at atlanta.com, the proxy queries the location service to find where to send the INVITE (e.g., to Alice's registered IP). This is how routing of incoming calls works in SIP, enabling user mobility. If Alice moves networks and sends a new REGISTER, the location service updates to her new Contact. If she unregisters (or the registration expires), callers will typically get a 480 Temporarily Unavailable or 404 Not Found, or be sent to voicemail, depending on server policy.
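A toy in-memory location service makes the binding lifecycle easier to follow. The class and method names are hypothetical, and the current time is passed in explicitly so the behavior is easy to test; a real registrar would also handle authentication, q-values, and persistent storage.

```python
class LocationService:
    """Toy registrar store: AOR -> {contact URI: absolute expiry time}."""

    def __init__(self):
        self._bindings = {}

    def register(self, aor, contact, expires, now):
        contacts = self._bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)       # Expires: 0 removes the binding
        else:
            contacts[contact] = now + expires  # create or refresh the binding

    def lookup(self, aor, now):
        """Return the contacts that have not expired yet."""
        contacts = self._bindings.get(aor, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)

service = LocationService()
service.register("sip:alice@atlanta.com", "sip:alice@192.0.2.4:5060", 3600, now=0)
service.register("sip:alice@atlanta.com", "sip:alice@198.51.100.9:5060", 600, now=0)
both = service.lookup("sip:alice@atlanta.com", now=10)
later = service.lookup("sip:alice@atlanta.com", now=700)
```

A proxy that forks would send an incoming INVITE to every contact returned by `lookup`.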

Quick Reference: Registration

{| class="wikitable"
|-
! Action !! Expires Value !! Result
|-
| Initial registration || e.g., 3600 || Binding created
|-
| Refresh registration || e.g., 3600 || Binding updated
|-
| Unregister || 0 || Binding removed
|-
| Fetch bindings || (no Contact) || Returns current bindings
|}
{| class="wikitable"
|-
! Header !! In REGISTER Request !! Notes
|-
| Request-URI || sip:domain.com || Registrar's domain
|-
| To || sip:user@domain.com || AOR being registered
|-
| From || sip:user@domain.com || Usually same as To
|-
| Contact || <sip:user@ip:port> || Device's reachable address
|-
| Expires || 3600 || Registration lifetime (seconds)
|-
| Call-ID || (consistent) || Same for all registrations from UA
|-
| CSeq || (incrementing) || Increments each registration
|}

SIP Extensions and Advanced Features

The core SIP specification (RFC 3261) is augmented by numerous other RFCs that introduce new features. Understanding every extension is a massive undertaking (the SIP RFC series is extensive), but here we'll summarize some of the key extensions and advanced concepts that build on the basics we've covered:

Reliability of Provisional Responses (RFC 3262)

In basic SIP, provisional responses (1xx) are not acknowledged at the transaction layer, so they are sent unreliably and may be lost over UDP. RFC 3262 adds an option to send provisional responses reliably. It introduces the PRACK method (Provisional ACK), which acts like an ACK for a provisional response. The UAC signals support by including Supported: 100rel in the INVITE (or insists on it with Require: 100rel). The UAS then sends a provisional response (like 180 Ringing or 183 Session Progress, but never 100 Trying) with Require: 100rel and an RSeq header (sequence number), retransmitting it until the UAC acknowledges it with a PRACK whose RAck header echoes the RSeq value along with the CSeq of the INVITE. This ensures, over UDP, that important provisional responses (especially a 183 carrying SDP for early media, or SIP precondition info) are not lost. PRACK itself is just another request within the dialog (with its own 200 OK response). The use of PRACK is negotiated per call. If used, provisional responses are essentially reliable, similar to final responses.
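The link between a reliable provisional response and its PRACK is the RAck header, which per RFC 3262 carries the RSeq value followed by the CSeq number and method of the request being answered. The helper names below are assumptions for this sketch.

```python
def build_rack(rseq, cseq_header):
    """Build the RAck value a UAC puts in PRACK for a reliable 1xx."""
    number, method = cseq_header.split()
    return f"{int(rseq)} {number} {method}"

def prack_matches(rack_header, pending_rseq, invite_cseq):
    """UAS side: does this PRACK acknowledge the outstanding provisional?"""
    return rack_header.split() == build_rack(pending_rseq, invite_cseq).split()

# The 183 arrived with "RSeq: 776656" in response to "CSeq: 314159 INVITE".
rack = build_rack("776656", "314159 INVITE")
```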

Offer/Answer Model with SDP (RFC 3264)

This RFC goes hand-in-hand with SIP, detailing how SDP is used in INVITE/200/ACK to negotiate media. It specifies that one party offers a set of media (codecs, IPs, ports) and the answerer picks from those. We touched on this earlier; key points are that all SIP UAs must support SDP, and that an INVITE can contain an SDP offer. If it does, the 200 OK must contain the answer. If the INVITE has no SDP, the offer comes in the 200 OK instead, and the caller's ACK carries the answer. The model also allows a re-INVITE or UPDATE to change the session later (e.g., hold by setting a=inactive or a=sendonly, codec change, adding video, etc.). The Offer/Answer RFC (RFC 3264) is fundamental for ensuring both ends agree on the media session parameters.
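The answerer's codec choice can be sketched as an ordered intersection. This is a simplification under stated assumptions: codecs are compared by their rtpmap encoding names, the function name is invented, and the answer keeps the offer's order, which RFC 3264 recommends unless there is a reason to do otherwise.

```python
def choose_codecs(offered, supported_names):
    """Keep the offered (payload, encoding) pairs the answerer also supports."""
    supported = {name.lower() for name in supported_names}
    return [(pt, name) for pt, name in offered if name.lower() in supported]

offer = [("0", "PCMU/8000"), ("8", "PCMA/8000"), ("97", "iLBC/8000")]
answer = choose_codecs(offer, {"PCMA/8000", "iLBC/8000"})
no_overlap = choose_codecs(offer, {"opus/48000/2"})
```

An empty result means there is no common codec; an answerer would then typically reject the stream or the request (for example with 488 Not Acceptable Here).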

Session Timer (RFC 4028)

This extension introduced a keep-alive mechanism for SIP dialogs. A session timer is an interval negotiated between the UAs (proxies on the path can also influence it) such that if no refreshing request (a re-INVITE or UPDATE) is sent within that time, the session is considered dead. This helps in cleaning up zombie calls if one side crashes or loses connectivity: without a session timer, a call might remain "active" forever if BYE is never sent. With session timers, one side (the "refresher") sends a periodic re-INVITE or UPDATE carrying a Session-Expires header to confirm that the dialog is still active. If the refresh fails (no response), the call is terminated. This is often used in networks to make sure hung calls don't tie up resources. The interval is commonly 30 minutes (1800 seconds), though some systems use values as short as 2 minutes; RFC 4028 sets an absolute minimum of 90 seconds, and the Min-SE header lets an element state the smallest interval it will accept. Support is negotiated with Supported: timer or Require: timer. See RFC 4028 for details.
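The timing each side derives from a negotiated Session-Expires value can be sketched as below. The function name and return format are assumptions; the numbers follow the RFC 4028 recommendations that the refresher refresh at half the interval and that the other side send BYE shortly before expiry (the smaller of 32 seconds and one third of the interval).

```python
MIN_SESSION_EXPIRES = 90  # seconds, absolute minimum in RFC 4028

def session_timer_action(session_expires, is_refresher):
    """Return what this side should schedule, and after how many seconds."""
    if session_expires < MIN_SESSION_EXPIRES:
        raise ValueError("Session-Expires below the 90 second minimum")
    if is_refresher:
        return {"action": "refresh", "after": session_expires / 2}
    margin = min(32, session_expires / 3)
    return {"action": "bye-if-no-refresh", "after": session_expires - margin}

refresher_plan = session_timer_action(1800, is_refresher=True)
watcher_plan = session_timer_action(1800, is_refresher=False)
```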

REFER Method (RFC 3515)

REFER is used for call transfer and similar features. When Alice, who is in a call with Bob, wants to transfer Bob to Carol, she sends Bob a REFER whose Refer-To header contains Carol's URI. On receiving the REFER, Bob's UA sends a new INVITE to Carol. The outcome of that referral (for example, whether Carol answered) is reported back to Alice in NOTIFY requests; RFC 3515 defines the refer event package for this purpose. The mechanism lets one party ask another to initiate a request on its behalf. It is not limited to call transfer, but that is the common use (attended or unattended transfer). See RFC 3515.
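In RFC 3515 the NOTIFY body has the type message/sipfrag and normally begins with the status line of the response that the referred party received. A minimal sketch of how the transferor might classify that body is shown below; `transfer_state` and its three return labels are hypothetical names chosen for this example.

```python
def transfer_state(sipfrag_body: str) -> str:
    """Classify the status line at the start of a message/sipfrag NOTIFY body."""
    status_line = sipfrag_body.strip().splitlines()[0]
    version, code, *_reason = status_line.split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP status line: {status_line!r}")
    status = int(code)
    if status < 200:
        return "in progress"   # e.g. 100 Trying, 180 Ringing
    if status < 300:
        return "succeeded"     # the transfer target answered
    return "failed"            # e.g. 486 Busy Here, 603 Decline


for body in ("SIP/2.0 100 Trying\r\n", "SIP/2.0 200 OK\r\n", "SIP/2.0 486 Busy Here\r\n"):
    print(body.strip(), "->", transfer_state(body))
```

In the Alice/Bob/Carol example, Alice would typically release her own call leg with Bob only once the state is "succeeded".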

NAT Traversal

Symmetric Response Routing (rport, RFC 3581)

NATs pose a big problem for SIP because SIP headers carry IP addresses and ports that may be private, and endpoints may sit behind firewalls. RFC 3581 introduced the rport parameter for Via headers. When a UAC behind a NAT includes Via: ...;rport, it asks the server to send the response back to the source IP and port the request came from, rather than to the address in the Via (which might be a private IP). The server records those values in the Via's received and rport parameters. Responses then get back through the NAT because they go to its mapped address and port. Most modern SIP devices use rport by default.
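The server-side half of this behaviour can be sketched as a small string transformation on the topmost Via value. The helper name, the naive split on semicolons, and the example addresses are assumptions for illustration; the rule itself (fill in rport with the source port and add received with the source IP) is the one RFC 3581 describes.

```python
from typing import Optional, Tuple


def apply_rport(via: str, src_ip: str, src_port: int) -> Optional[Tuple[str, Tuple[str, int]]]:
    """Rewrite a Via value that carries an empty rport parameter.

    Returns (rewritten Via, (ip, port) to send the response to), or None when
    the client did not request symmetric response routing.
    """
    sent_by, *params = [part.strip() for part in via.split(";")]
    if "rport" not in params:
        return None
    rewritten = [f"rport={src_port}" if p == "rport" else p for p in params]
    rewritten.append(f"received={src_ip}")
    return ";".join([sent_by, *rewritten]), (src_ip, src_port)


# Request sent from private 10.0.0.5:5060, seen by the server as 203.0.113.7:40312
via = "SIP/2.0/UDP 10.0.0.5:5060;branch=z9hG4bK776asdhds;rport"
print(apply_rport(via, "203.0.113.7", 40312))
```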

Outbound (RFC 5626)

RFC 5626 (Outbound) goes further: it lets a UA register over a persistent flow (TCP/TLS, or UDP with keep-alives) and keep that flow open for incoming requests as well. The UA registers with an instance ID (the +sip.instance Contact parameter) and may register several flows, each with its own reg-id; the registrar/proxy then forwards incoming INVITEs over the same flow. Outbound also defines keep-alives (a double CRLF on connection-oriented transports, STUN binding requests on UDP) to keep NAT bindings open. This extension is crucial for mobile devices and NAT-heavy environments (e.g., SIP over cellular networks).
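In RFC 5626 the instance ID travels in the `+sip.instance` Contact parameter and each flow is numbered with `reg-id`. The sketch below formats such a Contact value; the helper name, the TCP transport, and the example user and address are assumptions, and a real UA would keep its instance UUID stable across restarts rather than generate a new one.

```python
import uuid


def outbound_contact(user: str, host: str, instance: uuid.UUID, reg_id: int) -> str:
    """Format a Contact header value for an Outbound registration."""
    return (
        f"<sip:{user}@{host};transport=tcp>"
        f";reg-id={reg_id}"
        f';+sip.instance="<urn:uuid:{instance}>"'
    )


# One device (one instance ID) registering two flows, e.g. via two edge proxies
device_id = uuid.UUID("00000000-0000-1000-8000-000a95a0e128")
for flow in (1, 2):
    print("Contact:", outbound_contact("alice", "192.0.2.2", device_id, flow))
```

Because both registrations share the same instance ID, the registrar can treat them as alternative flows to one device rather than as two separate devices.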

Quick Reference: SIP Extensions

{| class="wikitable"
|-
! Extension !! RFC !! Purpose !! Key Headers/Methods
|-
| 100rel || 3262 || Reliable provisional responses || PRACK, RSeq, RAck
|-
| timer || 4028 || Session keep-alive || Session-Expires, Min-SE
|-
| replaces || 3891 || Replace existing dialog || Replaces header
|-
| join || 3911 || Join existing dialog || Join header
|-
| norefersub || 4488 || Suppress REFER subscription || Refer-Sub header
|-
| outbound || 5626 || NAT traversal, persistent flows || +sip.instance, reg-id (Contact parameters)
|-
| gruu || 5627 || Globally Routable UA URI || Contact with gr parameter
|-
| path || 3327 || Edge proxy routing || Path header
|}
{| class="wikitable"
|-
! Event Package !! RFC !! Purpose
|-
| presence || 3856 || User presence status
|-
| message-summary || 3842 || Voicemail waiting indicator
|-
| refer || 3515 || REFER result notification
|-
| dialog || 4235 || Dialog state notification
|-
| conference || 4575 || Conference state
|-
| reg || 3680 || Registration state
|}

Key RFCs

{| class="wikitable"
|-
! RFC !! Title !! Category
|-
| RFC 3261 || SIP: Session Initiation Protocol || Core
|-
| RFC 3262 || Reliability of Provisional Responses (PRACK) || Reliability
|-
| RFC 3263 || Locating SIP Servers || DNS/Routing
|-
| RFC 3264 || An Offer/Answer Model with SDP || Media
|-
| RFC 3265 || SIP-Specific Event Notification || Events
|-
| RFC 3311 || The UPDATE Method || Session modification
|-
| RFC 3326 || The Reason Header Field || Diagnostics
|-
| RFC 3428 || Extension for Instant Messaging || IM
|-
| RFC 3515 || The REFER Method || Call transfer
|-
| RFC 3581 || Symmetric Response Routing (rport) || NAT
|-
| RFC 3856 || A Presence Event Package || Presence
|-
| RFC 3903 || Extension for Event State Publication || Presence
|-
| RFC 4028 || Session Timers in SIP || Keep-alive
|-
| RFC 5411 || A Hitchhiker's Guide to SIP || Reference
|-
| RFC 5626 || Managing Client-Initiated Connections || NAT/Outbound
|-
| RFC 6665 || SIP-Specific Event Notification (update) || Events
|}

See Also