Understanding the SIP Protocol

Revision as of 23:23, 11 December 2025

Session Initiation Protocol (SIP) is an application-layer signaling protocol designed for creating, modifying, and terminating multimedia sessions over IP networks. These sessions can include Internet telephone calls (VoIP), video conferences, or any combination of multimedia streams. SIP itself handles the signaling and control portion – it establishes the session parameters – while the actual media (audio, video, etc.) is carried over separate protocols (typically RTP). SIP messages are text-based (similar to HTTP) and use a request/response model. SIP invitations to sessions carry session descriptions (usually in SDP, the Session Description Protocol) so that participants can agree on media types and formats. A key design goal of SIP is protocol agility: it is independent of the underlying transport (UDP, TCP, TLS, etc., by default on port 5060, or 5061 for TLS) and of the type of session being established.

SIP was originally defined in RFC 2543 and later refined in RFC 3261 (2002), which became the core SIP standard. Over time, numerous extension RFCs have expanded SIP's capabilities (for reliability, events, IM, security, etc.), making SIP a broad and powerful framework for signaling. Despite the many formal definitions in the RFCs, this guide aims to explain SIP in an accessible way – serving as a "cheat sheet" to understand SIP signaling without getting lost in the exhaustive RFC language. We will cover SIP's architecture, message format, call flow, and key features/extensions, providing a solid reference for anyone new to SIP or looking to grasp the full picture.

SIP Architecture and Core Components

SIP is a peer-to-peer protocol with a client-server design for message exchange. Its architecture defines several types of network entities, each with specific roles:

User Agent (UA)

A UA is an endpoint in SIP, typically a user's device or software (softphone). It represents an end system and can function as both a client and a server. A UA has two logical sub-roles: a User Agent Client (UAC), which initiates requests, and a User Agent Server (UAS), which responds to requests. For example, if Alice's phone calls Bob's phone, Alice's UA acts as a UAC (sending an INVITE request) and Bob's UA acts as a UAS (receiving the INVITE and sending a response). Once a session is established, the roles can flip for each new transaction (e.g. Bob's phone sending a BYE will be UAC for that request and Alice's phone UAS to respond).

Proxy Server

A SIP proxy is an intermediary that routes SIP requests and responses on behalf of UAs. When a UA sends a request to a SIP address, it typically goes to a proxy server in the domain, which then forwards it towards the destination. Proxies handle tasks like routing logic (determining the next hop or target for a request), enforcing policies, and potentially authentication. A request may traverse multiple proxies in sequence. Each proxy may add or modify certain headers (like Via or Record-Route) before forwarding. Responses automatically follow the reverse path of the request through those proxies. Proxies can be stateful (maintaining transaction state, allowing forks and smarter handling) or stateless (simply forwarding messages and forgetting them). For example, a stateful proxy might fork an INVITE to ring multiple devices and manage the responses, whereas a stateless proxy just forwards each message without keeping any record of it. Being a proxy is a logical role: in practice, a single server often acts as a proxy for some requests and as a UAS/UAC for others, depending on context.

Registrar

A registrar is a server that handles user registration. UAs use the REGISTER method to sign in with a SIP service, providing their address and current location (IP address or forwarding address). The registrar accepts REGISTER requests and stores the information in a location service (a database of user addresses) for its domain. This allows proxies to later look up where a user is currently reachable. In essence, a registrar binds a user's permanent SIP URI (their Address-of-Record like sip:alice@atlanta.com) to the Contact address of their device. Registrars often reside on the same server as a proxy for a domain. For example, when Alice's softphone comes online, it sends a REGISTER to atlanta.com containing her AOR (alice@atlanta.com) and her device's network address; the registrar server at atlanta.com will save that binding so that future calls to Alice can be routed to her device.
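The binding idea can be sketched in a few lines. This is only an illustration, assuming an in-memory dictionary stands in for the location service; the class name `LocationService` and its methods are hypothetical and do not correspond to any real SIP server's API.

```python
import time


class LocationService:
    """Toy in-memory location service: maps an AOR to its Contact bindings."""

    def __init__(self):
        # aor -> {contact_uri: absolute expiry time in seconds}
        self._bindings = {}

    def register(self, aor, contact, expires=3600, now=None):
        """Store (or refresh) a binding, as a registrar would on REGISTER."""
        now = time.time() if now is None else now
        self._bindings.setdefault(aor, {})[contact] = now + expires

    def lookup(self, aor, now=None):
        """Return the contacts that have not expired, as a proxy would need."""
        now = time.time() if now is None else now
        contacts = self._bindings.get(aor, {})
        return [c for c, expiry in contacts.items() if expiry > now]
```

A proxy for atlanta.com would call `lookup("sip:alice@atlanta.com")` when a call for Alice arrives and forward the request to whatever contacts come back.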

Redirect Server

A redirect server is a UAS that does not forward requests but instead sends back a special 3xx redirect response informing the UAC of a different route or address to try. Essentially, it redirects the client to contact an alternate server or URI. For example, if a user has moved to a different domain, a redirect server might respond with "302 Moved Temporarily" and the new contact address. The UAC then sends a new request to that address. Redirect servers offload routing logic from proxies by having the clients handle it.

Back-to-Back User Agent (B2BUA)

A B2BUA is not a defined role in the core SIP spec's transaction model, but it's worth mentioning because of its common use in practice (such as in PBX or SBC systems). A B2BUA is an entity that acts as a UA on both sides of a call, effectively terminating the SIP dialog on one side and creating a new one on the other side. Unlike a proxy, which passes messages along, a B2BUA maintains full state of the call and can inspect or modify any part of the signaling. It behaves as a UAS to the caller and as a UAC to the callee, bridging the two call legs. This is used for scenarios like protocol interworking, media handling (since it can also manipulate media), or policy enforcement where a proxy's limited role isn't enough.

These components can be combined in single physical servers or distributed. In a typical VoIP service, a server might act as proxy + registrar for a domain (accepting registrations and routing calls for users of that domain). Location service databases are used by registrars and proxies to map user addresses (AORs) to current device locations.

Addressing

SIP addresses take the form of Uniform Resource Identifiers (URIs). A user's public address-of-record looks like an email address (e.g. sip:alice@atlanta.com). This URI can be resolved to the user's current Contact address via the registrar's location service. SIP URIs can also embed telephone numbers (e.g. sip:1234567890@pstn.provider.net), and phone numbers may instead be written with the tel: URI scheme (RFC 3966). There is also a secure form, sips: (SIP Secure), which requires that the request be carried over a secure transport (TLS) on every hop to the target domain. When a UA wants to reach another user, it sends a request toward the domain part of the SIP URI. SIP relies on the DNS infrastructure for server location: the procedures in RFC 3263 have the client query NAPTR records (to pick a transport), then SRV records (to find the server host and port), and finally A/AAAA records (to get its IP address). For example, to send a request to sip:bob@biloxi.com, Alice's device will DNS-resolve biloxi.com for SIP service, possibly discovering a proxy server address to send the request to. This allows SIP to route messages globally using the DNS naming system.
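To make the URI structure concrete, here is a rough sketch of splitting a sip: or sips: URI into its parts. The function name `parse_sip_uri` is made up for this example, and the pattern is deliberately simplified: it assumes no IPv6 literals, passwords, or URI headers, so it is not a complete RFC 3261 parser.

```python
import re

# Simplified pattern: scheme, optional user part, host, optional port, optional ;params.
_SIP_URI = re.compile(
    r"^(?P<scheme>sips?):"
    r"(?:(?P<user>[^@;]+)@)?"
    r"(?P<host>[^:;]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<params>(?:;[^;]+)*)$"
)


def parse_sip_uri(uri):
    """Split a sip:/sips: URI into scheme, user, host, port, and parameters."""
    m = _SIP_URI.match(uri)
    if m is None:
        raise ValueError(f"not a SIP URI: {uri!r}")
    params = {}
    for item in filter(None, m.group("params").split(";")):
        name, _, value = item.partition("=")
        params[name] = value  # flag parameters such as ;lr get an empty value
    return {
        "scheme": m.group("scheme"),
        "user": m.group("user"),
        "host": m.group("host"),
        "port": int(m.group("port")) if m.group("port") else None,
        "params": params,
    }
```

The `host` field is the part a client would hand to the DNS procedures described above; a missing port means the default for the chosen transport applies.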

Transport and Network

SIP messages can be transported over UDP (most common for telephony), TCP, or TLS-encrypted TCP (for secure SIP). Other transports such as SCTP or WebSocket can be used as well (e.g. SIP over WebSocket in web apps). The protocol includes mechanisms to handle issues like fragmentation (large messages should be sent over TCP rather than UDP) and network failures (via retransmission timers, especially on UDP). NAT traversal can be challenging for SIP, because SIP messages and SDP often carry IP addresses and expect end-to-end connectivity. Extensions like rport (RFC 3581) and "outbound" (RFC 5626) address some of these issues (see later section on extensions), enabling symmetric response routing and keep-alive mechanisms to handle NATs.

Quick Reference: SIP Components

Component	Role	Key Function
User Agent (UA)	Endpoint	Initiates/receives calls. Acts as UAC (client) or UAS (server).
Proxy Server	Intermediary	Routes requests/responses. Can be stateful or stateless.
Registrar	Registration	Accepts REGISTER, stores user location bindings.
Redirect Server	Routing	Returns 3xx responses with alternate contact addresses.
B2BUA	Call control	Terminates/originates dialogs on both sides. Full call state.
Location Service	Database	Stores AOR-to-Contact mappings for user lookup.

URI Scheme	Description	Example
`sip:`	Standard SIP URI	`sip:alice@atlanta.com`
`sips:`	Secure SIP (TLS required)	`sips:alice@atlanta.com`
`tel:`	Telephone number	`tel:+1-555-123-4567`

Transport	Port	Notes
UDP	5060	Most common, requires retransmission handling
TCP	5060	For large messages, reliable delivery
TLS	5061	Encrypted signaling
WebSocket	80/443	For web applications (RFC 7118)

SIP Messages: Requests and Responses

SIP is a text-based protocol that exchanges messages in a format similar to HTTP. There are two types of SIP messages: Requests (also called methods) sent by clients to initiate an action, and Responses sent by servers (or UAs) to convey the result of that request. Each SIP message consists of a start line, zero or more header fields, a blank line, and an optional message body.
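The layout just described (start line, headers, blank line, body) can be illustrated with a small parser. This is a sketch, not a production parser: the function names are invented for the example, and it assumes CRLF line endings and no folded (multi-line) header values.

```python
def parse_sip_message(raw):
    """Split raw SIP text into (start_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # the blank line ends the header section
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []  # list of (name, value); a list keeps repeated headers such as Via
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def is_request(start_line):
    """Responses start with the SIP version; requests start with a method name."""
    return not start_line.startswith("SIP/")
```

Keeping headers as an ordered list rather than a dictionary matters in SIP, because several headers (Via, Route, Record-Route) can legitimately appear more than once and their order is significant.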

A Request start-line includes a method name and a Request-URI (the target address) along with the SIP version. For example: INVITE sip:bob@biloxi.com SIP/2.0. There are a number of standard methods defined. The core SIP specification (RFC 3261) defined six basic methods, and subsequent RFCs added additional methods for extended functionality. Below is a list of the common SIP request methods and their purpose:

Core SIP Methods

INVITE – Establishes a session (initiate a call). This method is used to invite one or more participants to a session. It can carry session description details (SDP) to set up media. A successful INVITE results in a dialog and session between endpoints.

ACK – Confirms that the client has received a final response to an INVITE. ACK is used only with INVITE: the caller sends it for a 200 OK and for any other final response (3xx–6xx). We will discuss its special role in the call flow later.

BYE – Terminates an established session (hangs up a call). Either participant in a call sends a BYE to end the call when it's finished.

CANCEL – Cancels a pending request (typically used to cancel an INVITE that hasn't been answered yet). If you start a call and want to abort before it's answered, a CANCEL is sent.

REGISTER – Registers the UA's address with a SIP server. UAs send REGISTER to a registrar to upload their current contact information (binding their UA's network location to their SIP URI).

OPTIONS – Queries the capabilities of a server or another UA. This is like a "ping" that can ask what methods or media types the other side supports. It's often used for keep-alive or diagnostic purposes too.

Extension Methods

In addition to these core methods, several extension methods have been introduced by various RFCs to extend SIP's functionality:

PRACK – Provisional Acknowledgment. PRACK (defined in RFC 3262) is used to acknowledge provisional responses (1xx) that are sent reliably. It improves reliability of ringing or early media responses (see section on provisional reliability).

SUBSCRIBE – Subscribes to an event on a server. Defined in RFC 3265, SUBSCRIBE allows a client to request notifications of events (such as presence changes, message waiting, etc.) from another entity.

NOTIFY – Sends an event notification to a subscriber. When an event a user subscribed to occurs, the notifier (usually a server or UA) sends a NOTIFY to inform the subscriber of the new state.

PUBLISH – Publishes an event state to a server. Defined in RFC 3903, PUBLISH allows a UA to push its current state (e.g., presence information) to a server, which can then distribute it to subscribers.

INFO – Sends mid-session information that does not modify the session state. Defined in RFC 2976, this method is often used to carry DTMF digits or other application signals during a call in the signaling path rather than in the media stream (though newer mechanisms may replace INFO for that).

REFER – Asks the recipient to issue a new request (typically to transfer a call). Defined in RFC 3515, REFER is used to instruct a UA to contact a third party (e.g., Alice, in a call with Bob, sends Bob a REFER to call Charlie – effectively transferring or adding a party).

MESSAGE – Conveys an instant message (IM) within a SIP dialog or as a standalone out-of-dialog message. Defined in RFC 3428, MESSAGE carries textual chat content in the SIP body, enabling basic instant messaging.

UPDATE – Modifies the session parameters of an existing dialog before the final INVITE response. Defined in RFC 3311, UPDATE can change session settings (like codecs or media streams) or send an offer/answer negotiation in early dialog, without waiting for the initial INVITE to complete.

(Note: There are a few more methods and many SIP header extensions defined in various RFCs and domain-specific SIP profiles (e.g., INFO packages, PING as a keepalive in some systems, etc.), but the above are the primary methods you'll encounter. Together, they make SIP a very flexible protocol.)

SIP Response Codes

A Response start-line (the status line) begins with the SIP version, followed by a numeric status code (similar to HTTP codes) and a reason phrase. For example: SIP/2.0 180 Ringing or SIP/2.0 486 Busy Here. SIP responses are categorized by their class (hundreds digit):

1xx – Informational: provisional responses, used to convey that the request is being processed but not yet completed. These include 100 Trying (an interim response from proxies/UAS to stop retransmissions and indicate progress), 180 Ringing (the callee's phone is ringing), 183 Session Progress (often used to convey early media like ringback tones).

2xx – Success: the request succeeded. 200 OK is the general successful response for most requests (meaning the action is completed). INVITE's 200 OK specifically means the call is answered (and usually contains SDP media details).

3xx – Redirection: the request should be tried at a different location. For example, 301 Moved Permanently or 302 Moved Temporarily provide an alternate contact (these are used by redirect servers or UAs that want to redirect calls).

4xx – Client Error: the request is bad or cannot be fulfilled as is. This includes things like 400 Bad Request (malformed message), 401 Unauthorized (requires authentication), 404 Not Found (user not found), 486 Busy Here (the target UA is busy), 487 Request Terminated (request was canceled).

5xx – Server Error: the server (recipient) failed to fulfill a valid request. E.g., 500 Server Internal Error, 503 Service Unavailable (often means overload or maintenance).

6xx – Global Failure: the request cannot be fulfilled by any server globally. E.g., 603 Decline (the user rejected the call), 604 Does Not Exist Anywhere. These indicate failure that shouldn't be retried elsewhere.

Only final responses (2xx–6xx) terminate a SIP transaction. Provisional (1xx) responses are informative and do not terminate the transaction (except they may cease retransmissions of the request in some cases). Some specific response codes have special handling in SIP (for example, 100 is never forwarded by proxies, 407 Proxy Authentication Required triggers proxy auth, 487 is used to indicate a canceled request, etc.), but the above categories suffice for a general understanding.
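The class rules above reduce to simple arithmetic on the status code. A minimal sketch follows; the helper names are chosen for this example only.

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(status_code):
    """Return the class name for a SIP status code (100-699)."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code):
    """Only 2xx-6xx responses end a transaction; 1xx responses are provisional."""
    return status_code >= 200
```

For instance, `response_class(486)` gives `"Client Error"`, and `is_final(180)` is false because ringing does not complete the INVITE transaction.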

Quick Reference: SIP Methods

Method	RFC	Creates Dialog?	Description
INVITE	3261	Yes	Initiate session/call
ACK	3261	No	Confirm INVITE final response
BYE	3261	No (ends)	Terminate session
CANCEL	3261	No	Cancel pending INVITE
REGISTER	3261	No	Register contact with server
OPTIONS	3261	No	Query capabilities
PRACK	3262	No	Acknowledge reliable provisional
SUBSCRIBE	3265	Yes	Subscribe to events
NOTIFY	3265	No	Send event notification
PUBLISH	3903	No	Publish event state
INFO	2976	No	Mid-session info (DTMF)
REFER	3515	No	Request call transfer
MESSAGE	3428	No	Instant message
UPDATE	3311	No	Modify session (early dialog)

Quick Reference: Response Codes

Code	Meaning	Notes
1xx – Provisional (Informational)
100	Trying	Stops retransmissions, not forwarded by proxies
180	Ringing	Callee alerting
181	Call Being Forwarded	Call is being forwarded
182	Queued	Call queued
183	Session Progress	Early media / progress info
2xx – Success
200	OK	Request succeeded
202	Accepted	Request accepted (async processing)
3xx – Redirection
300	Multiple Choices	Multiple options available
301	Moved Permanently	User permanently at new location
302	Moved Temporarily	User temporarily at new location
305	Use Proxy	Must use specified proxy
4xx – Client Error
400	Bad Request	Malformed syntax
401	Unauthorized	Requires authentication
403	Forbidden	Request refused
404	Not Found	User not found
405	Method Not Allowed	Method not supported
407	Proxy Auth Required	Proxy authentication needed
408	Request Timeout	No response in time
415	Unsupported Media Type	Body format not supported
420	Bad Extension	Required extension not supported
480	Temporarily Unavailable	Callee unavailable
481	Call/Transaction Does Not Exist	Dialog/transaction not found
486	Busy Here	Callee busy
487	Request Terminated	Request was CANCELed
488	Not Acceptable Here	SDP not acceptable
5xx – Server Error
500	Server Internal Error	Server failure
501	Not Implemented	Method not implemented
502	Bad Gateway	Gateway error
503	Service Unavailable	Server overloaded/maintenance
504	Server Timeout	Gateway timeout
6xx – Global Failure
600	Busy Everywhere	All endpoints busy
603	Decline	Call declined by user
604	Does Not Exist Anywhere	User doesn't exist
606	Not Acceptable	No acceptable media

SIP Message Structure and Headers

SIP messages, being text-based, are structured like HTTP messages. After the start line, each message has a series of header fields, each on their own line in a "Name: value" format. These header fields convey routing information, message attributes, and protocol-specific data. Here are some of the most important SIP header fields and what they mean:

Via

Lists the network path taken by the request. The originating UA inserts the first Via, and each SIP proxy that forwards the request adds its own Via on top, indicating its address. The Via also includes a branch parameter that identifies the transaction, which lets receivers match responses and detect retransmissions and loops. For example, Alice's UA might send Via: SIP/2.0/UDP alice_pc.atlanta.com;branch=z9hG4bK776asdhds. The Via stack is used to route responses back along the same path in reverse order. In RFC 3261-compliant systems the branch begins with the magic cookie "z9hG4bK", which signals that the value was generated under RFC 3261's uniqueness rules and distinguishes it from older RFC 2543 branches. In summary, the top Via header indicates where to send the response. A UAS copies the Via headers from the request into its response, and each proxy removes its own top Via before passing the response on.
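As a rough illustration of the Via mechanics, the sketch below generates a branch value with the magic cookie and reads the "sent-by" address from a Via value. The function names are hypothetical, and the random suffix length is an arbitrary choice for the example; real stacks may derive branches differently.

```python
import secrets

MAGIC_COOKIE = "z9hG4bK"  # prefix that marks an RFC 3261-style branch


def new_branch():
    """Create a fresh branch ID: magic cookie plus a random suffix."""
    return MAGIC_COOKIE + secrets.token_hex(8)


def is_rfc3261_branch(branch):
    """True if the branch carries the RFC 3261 magic cookie."""
    return branch.startswith(MAGIC_COOKIE)


def via_sent_by(via_value):
    """Extract the host[:port] from a Via value such as 'SIP/2.0/UDP host;branch=...'."""
    _protocol, _, rest = via_value.strip().partition(" ")
    return rest.split(";")[0].strip()


def response_next_hop(via_values):
    """The top Via (first in the list) names where the response is sent next."""
    return via_sent_by(via_values[0])
```

A proxy handling a response would first drop its own top Via and then use `response_next_hop` on what remains.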

From

Indicates the originator of the request – i.e., the caller's identity. It contains a display name and SIP URI of the caller, and it also has a tag parameter. Example: From: "Alice" <sip:alice@atlanta.com>;tag=1928301774. The tag is a random string added by the UA to uniquely identify this particular dialog leg from the caller's side. The combination of Call-ID and tags is what makes a SIP dialog unique (explained later). The From header is usually not changed by proxies (except some privacy services). It represents the logical identity of who sent the request.

To

Indicates the intended recipient of the request (the callee's identity). It also contains a display name and SIP URI. Example: To: "Bob" <sip:bob@biloxi.com>. In an initial request (like the first INVITE of a call), the To header usually does not have a tag; the tag is added by the UAS in its response. So when Bob's phone answers, it will send a response with To: "Bob" <sip:bob@biloxi.com>;tag=a6c85cf (for instance). This tag in the To header is how the callee's UA marks its leg of the dialog. Subsequent in-dialog requests will include both tags.

Call-ID

A globally unique identifier for a particular call or SIP session. It is usually a random string (often a GUID or combination of random numbers and a host name) generated by the UA that initiates the call. Both sides use the same Call-ID for all messages in the same dialog. Together with the two tags (From tag and To tag), the Call-ID forms a unique key for the SIP dialog (the peer-to-peer relationship). If any of these differ, it's not the same dialog. Call-ID helps endpoints and proxies differentiate separate calls. (It's possible, albeit unusual, for different calls to accidentally pick the same Call-ID by random chance, but the tags will almost certainly differ, so the full tuple remains unique in practice.)

CSeq (Command Sequence)

A sequence number and method name pair that identifies a specific request within a dialog. For example: CSeq: 314159 INVITE. The CSeq number is incremented by one for each new request a UA sends within a dialog. Responses echo the CSeq of the request, so the number orders requests and lets a UAS detect out-of-order or retransmitted requests, while the method name lets a response be matched to the right request when two requests share a number. Note that ACK and CANCEL have special CSeq rules: they use the same CSeq number as the INVITE they refer to (since they are effectively tied to that INVITE), rather than getting their own new sequence numbers.
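The numbering rules can be sketched as a small counter. The class name `DialogCSeq` is invented for this example, and the sketch tracks only one direction of a dialog (the requests one UA sends).

```python
class DialogCSeq:
    """Tracks the CSeq values for requests one UA sends within a dialog."""

    def __init__(self, initial=1):
        self._current = initial - 1   # so the first request uses `initial`
        self._last_invite = None      # number of the most recent INVITE

    def next_request(self, method):
        """Return the CSeq header value for the next request of this method."""
        if method in ("ACK", "CANCEL"):
            # ACK and CANCEL reuse the number of the INVITE they refer to.
            if self._last_invite is None:
                raise ValueError(f"{method} without a preceding INVITE")
            return f"{self._last_invite} {method}"
        self._current += 1
        if method == "INVITE":
            self._last_invite = self._current
        return f"{self._current} {method}"
```

Starting at 314159, an INVITE, its ACK, and a later BYE would carry `314159 INVITE`, `314159 ACK`, and `314160 BYE`.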

Contact

Specifies a direct URI where the user agent wants to be reached for future requests. In an INVITE, the UAC includes its Contact (like Contact: <sip:alice@192.0.2.4:5060> or another address that can be used to reach Alice's device). This is the address to which the callee sends later in-dialog requests such as BYE or re-INVITE; likewise, the Contact in Bob's 200 OK is where Alice sends the ACK and her own later requests. Unless proxies have used Record-Route (see below), these requests go straight to the Contact rather than through the proxies. The Contact is essentially the current device address of the UA and is crucial for direct routing of subsequent messages in a dialog. Proxies typically do not modify Contact (except perhaps edge proxies for NAT traversal); it is an end-to-end header. In REGISTER requests, the Contact is the binding being registered (i.e., "associate this Contact with my AOR").

Max-Forwards

A hop count limiter, similar in concept to IP TTL. It's an integer value that gets decremented by one at each SIP hop (each proxy). If a proxy receives a request whose Max-Forwards is already 0, it does not forward it and instead rejects it with 483 Too Many Hops. This prevents infinite loops in routing. For example, an INVITE might start with Max-Forwards: 70 (the default recommended initial value), and each proxy will reduce it. This ensures that misconfigured routes don't cause messages to circulate forever.
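A proxy's handling of this header might look roughly like the sketch below. The names `TooManyHops` and `next_max_forwards` are made up for the example; the 483 Too Many Hops status and the fallback of 70 when the header is absent follow RFC 3261's proxy rules.

```python
class TooManyHops(Exception):
    """Signals that the proxy should answer 483 instead of forwarding."""
    status_code = 483


def next_max_forwards(received):
    """Return the Max-Forwards value to put in the forwarded request.

    `received` is the value in the incoming request, or None if the
    header was missing.
    """
    if received is None:
        return 70                      # header absent: add it with the default
    if received <= 0:
        raise TooManyHops("Max-Forwards exhausted")
    return received - 1                # normal case: decrement by one
```

A request that starts at 70 can therefore cross at most 70 proxies before one of them rejects it.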

Content-Type and Content-Length

Content-Type indicates the MIME type of the message body, if any. In SIP, the body is often an SDP payload (with Content-Type: application/sdp). Other content types are possible (for example, instant MESSAGE may carry text/plain or a picture share might use some image type, etc., as long as both sides understand it). If there's no body, Content-Type may be omitted.

Content-Length is the size of the message body in bytes. This helps the receiver know where the message ends (especially important for TCP, as UDP has its own length from the packet).

Record-Route / Route

These headers are used by proxies to maintain themselves in the path of subsequent dialog requests. If a proxy wants to stay involved in the dialog (perhaps for call recording, policy enforcement, or because it's an outbound proxy), it will add a Record-Route header to the initial INVITE as it passes through. The UAs will then include those addresses in a Route header in future requests (e.g., the ACK, BYE) to route them through the same proxies. Record-Route ensures that even though Contact might allow direct UA-to-UA messaging, the proxies can insist that routing remain through them. Each proxy that Record-Routes is listed; the UAS copies Record-Route headers from request to response, and the UAC uses those to build a Route set for subsequent messages. If no Record-Route is used, the default is that the next requests in the dialog go directly to the Contact. (Record-Route is ignored in REGISTER and some other requests.) Most SIP service providers use Record-Route to keep signaling through their proxies (forming what's called a "signaling path" for the dialog).
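How each side turns Record-Route into a route set can be sketched as follows. The function names are illustrative, and the sketch assumes every proxy is a loose router (its URI carries `;lr`), which is the common case described in RFC 3261.

```python
def uac_route_set(record_route_in_response):
    """Caller side: take the Record-Route values from the response in reverse order."""
    return list(reversed(record_route_in_response))


def uas_route_set(record_route_in_request):
    """Callee side: take the Record-Route values from the request in the same order."""
    return list(record_route_in_request)


def next_hop(route_set, remote_contact):
    """Where an in-dialog request is sent first.

    With loose routing the Request-URI stays the remote Contact; the request
    goes to the first Route entry, or straight to the Contact if the set is empty.
    """
    return route_set[0] if route_set else remote_contact
```

The reversal on the caller's side is what makes both UAs see the proxies in "nearest first" order: each proxy pushes its Record-Route on top as the request travels toward the callee, so the callee already sees its own nearest proxy first, while the caller has to flip the list.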

Authorization / WWW-Authenticate

SIP uses an HTTP Digest authentication model. When a UAC sends a request that needs authentication (like REGISTER or an INVITE to a realm requiring auth), the server can respond with 401 Unauthorized (or 407 Proxy Authentication Required for proxy-auth) along with a WWW-Authenticate (or Proxy-Authenticate) header challenging the user. The UAC then resends the request with an Authorization (or Proxy-Authorization) header containing the credentials (username, realm, nonce, response, etc.) as per the HTTP Digest algorithm. This challenge-response handshake ensures the user is who they claim (given a shared password or secret). This is how SIP enforces user authentication for things like registration and calling. The details follow RFC 2617 (HTTP Digest). From the user perspective: the first call attempt might get a 401, then the phone automatically resends with credentials, then it succeeds with 200 OK.
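The response value a UAC places in its Authorization header can be computed along these lines. This is a sketch of the RFC 2617 MD5 calculation only, covering the no-qop and `qop=auth` cases; the function name and parameter names are chosen for the example, and `auth-int` and newer hash algorithms are left out.

```python
import hashlib


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the Digest 'response' value per RFC 2617 (MD5)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = _md5_hex(f"{method}:{uri}")                 # request part
    if qop is None:
        # Legacy form without quality-of-protection.
        return _md5_hex(f"{ha1}:{nonce}:{ha2}")
    # qop=auth adds the nonce count and the client nonce.
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

For a SIP REGISTER, `method` would be `"REGISTER"` and `uri` the Request-URI of the request being retried. The password itself never appears on the wire; only the hash does.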

Supported / Require

These headers are used to indicate extensions. A UAC can include a Supported header to list optional SIP extensions it supports (like 100rel, timer, replaces, etc.). A request can carry Require to insist that the UAS understand a certain extension (Proxy-Require does the same for proxies); otherwise the request fails. For example, Require: 100rel might be in an INVITE to demand that provisional reliability (PRACK) is used. If the UAS doesn't support it, it rejects the request with 420 Bad Extension and lists the offending option tags in an Unsupported header. Generally, Require is used sparingly (only when an extension is critical), while Supported/Unsupported/Allow advertise capabilities.
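The Require check on the receiving side amounts to a set comparison. A minimal sketch, with an invented function name and a simple tuple as the return convention:

```python
def check_required_extensions(require, supported_locally):
    """Decide whether a request's Require option tags can be honored.

    Returns None if every tag is supported, otherwise a (status, headers)
    pair describing the 420 Bad Extension rejection.
    """
    unsupported = [tag for tag in require if tag not in supported_locally]
    if unsupported:
        return 420, {"Unsupported": ", ".join(unsupported)}
    return None
```

A UAS that implements only `timer` would reject `Require: 100rel` with 420 and `Unsupported: 100rel`, which tells the caller it may retry without that extension.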

The SIP headers collectively provide a rich amount of information and control. The complete set of SIP header fields is defined in RFC 3261 Section 20 and subsequent RFCs. For everyday use, the ones discussed above are the most crucial to understand the call flows.

Message Body and SDP

Finally, if a SIP message contains a body, it is separated from headers by a blank line. In call setup, the body is usually an SDP offer or answer, containing lines that describe media streams (audio/video), codecs, IP addresses, and ports. SIP itself is agnostic to the body content, treating it as opaque data to be passed along. The default and most common body type is SDP (Session Description Protocol), which SIP MUST support according to RFC 3261, as it's the standard way to negotiate media for calls. The offer/answer model used by SIP (RFC 3264) means one party offers a set of media parameters in an INVITE or in a response, and the other party answers with their selection, allowing both to agree on how to communicate media. For instance, Alice's INVITE might include an SDP offer proposing an audio stream with certain codecs; Bob's 200 OK carries an SDP answer selecting one codec and confirming media parameters. This way, by the time the call is established, both sides know what IP/port to send media to and in which format. (If an INVITE doesn't contain SDP, the offer may be delayed to the 200 OK, and then the ACK carries the answer – but the end result is the same negotiation.) Aside from calls, other uses of SIP can have different body types or none at all (e.g., MESSAGE might carry text in the body, or a SIP NOTIFY for a voicemail could carry an XML message summary, etc., with appropriate Content-Type indicating the format).
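The codec-selection part of offer/answer can be sketched as below. This is a simplification that looks only at codec names; a real answer also carries addresses, ports, and payload type numbers. The function name and the codec names in the usage note are assumptions for illustration.

```python
def build_answer(offered_codecs, local_codecs):
    """Pick the codec the answerer accepts, honoring the offerer's preference order.

    Returns (status_code, codecs): 200 with the chosen codec, or
    488 (Not Acceptable Here) with an empty list when nothing overlaps.
    """
    common = [codec for codec in offered_codecs if codec in local_codecs]
    if not common:
        return 488, []
    return 200, common[:1]  # this sketch answers with a single codec
```

If Alice offers `["opus", "PCMU", "PCMA"]` and Bob's phone supports only PCMU and PCMA, the answer selects PCMU, the first codec in Alice's list that Bob also supports.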

Quick Reference: SIP Headers

Header	Purpose	Example
Via	Route responses back; branch ID	`Via: SIP/2.0/UDP pc.atlanta.com;branch=z9hG4bK...`
From	Caller identity + tag	`From: "Alice" <sip:alice@atlanta.com>;tag=1234`
To	Callee identity + tag (in response)	`To: "Bob" <sip:bob@biloxi.com>;tag=5678`
Call-ID	Unique call identifier	`Call-ID: a84b4c76e66710@pc.atlanta.com`
CSeq	Request sequence number + method	`CSeq: 314159 INVITE`
Contact	Direct reachable address	`Contact: <sip:alice@192.0.2.4:5060>`
Max-Forwards	Hop limit (TTL)	`Max-Forwards: 70`
Content-Type	Body MIME type	`Content-Type: application/sdp`
Content-Length	Body size in bytes	`Content-Length: 142`
Record-Route	Proxy stays in path	`Record-Route: <sip:proxy.atlanta.com;lr>`
Route	Forced routing path	`Route: <sip:proxy.atlanta.com;lr>`
Authorization	Auth credentials	`Authorization: Digest username="alice"...`
WWW-Authenticate	Auth challenge	`WWW-Authenticate: Digest realm="atlanta.com"...`
Supported	Optional extensions supported	`Supported: 100rel, timer, replaces`
Require	Mandatory extensions	`Require: 100rel`
Allow	Supported methods	`Allow: INVITE, ACK, BYE, CANCEL, OPTIONS`
Expires	Registration/subscription lifetime	`Expires: 3600`
Session-Expires	Session timer value	`Session-Expires: 1800;refresher=uac`
Refer-To	Transfer target	`Refer-To: <sip:carol@chicago.com>`
Event	Event package type	`Event: presence`
Subscription-State	Subscription status	`Subscription-State: active;expires=3600`

Dialog Identification

A SIP dialog is uniquely identified by the combination of three values:

Component	Source	When Set
Call-ID	UAC generates	Initial request
From-tag	UAC generates	Initial request
To-tag	UAS generates	First response (usually 1xx or 2xx)
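The three-part key can be expressed as a small data structure. In the sketch below the names `DialogId`, `tag_of`, `dialog_id_for_uac`, and `dialog_id_for_uas` are invented for illustration; the point is that each side stores the same three values but views the tags as "local" and "remote" from its own perspective.

```python
import re
from typing import NamedTuple, Optional


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def tag_of(header_value) -> Optional[str]:
    """Return the tag parameter of a From/To header value, or None if absent."""
    m = re.search(r";\s*tag=([^;\s]+)", header_value)
    return m.group(1) if m else None


def dialog_id_for_uac(call_id, from_value, to_value):
    """Caller's view: its own tag is the From tag, the peer's is the To tag."""
    return DialogId(call_id, tag_of(from_value), tag_of(to_value))


def dialog_id_for_uas(call_id, from_value, to_value):
    """Callee's view: its own tag is the To tag, the peer's is the From tag."""
    return DialogId(call_id, tag_of(to_value), tag_of(from_value))
```

Because `tag_of` returns `None` for a To header without a tag, the same helper also shows that no complete dialog ID exists until the UAS has answered.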

SIP Call Flow Example (Basic Call Setup)

To ground these concepts, let's walk through a typical SIP call flow for a voice call. This example uses the canonical scenario often called the "SIP trapezoid": Alice calls Bob, each of their devices uses a SIP service provider (domains atlanta.com and biloxi.com respectively), and the two providers' proxy servers sit in between. The message exchange proceeds step by step as follows:

INVITE – Alice calls Bob

Alice's SIP UA (softphone) sends an INVITE request to Bob's SIP URI (sip:bob@biloxi.com). The INVITE goes to Alice's configured proxy (at atlanta.com) as the outbound proxy for her domain. This INVITE includes Alice's SDP offer (e.g., proposing an audio stream) and has headers: From (Alice's URI, with a tag), To (Bob's URI, no tag yet), Call-ID, CSeq, Contact (Alice's contact address), etc. The atlanta.com proxy receives this INVITE from Alice.
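An INVITE like the one Alice sends could be assembled roughly as follows. This is a sketch with an invented `build_invite` helper; it hard-codes UDP in the Via and emits only the headers discussed in this guide. The SDP in `EXAMPLE_SDP` uses made-up session values purely for illustration.

```python
# Illustrative SDP offer: one audio stream, PCMU codec (values are made up).
EXAMPLE_SDP = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.4\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.4\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
)


def build_invite(request_uri, from_uri, from_tag, to_uri, call_id, cseq,
                 via_host, branch, contact, sdp):
    """Return the text of an initial INVITE carrying an SDP offer."""
    body = sdp.encode("utf-8")
    lines = [
        f"INVITE {request_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={from_tag}",
        f"To: <{to_uri}>",  # no tag yet: the UAS adds one in its response
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        f"Contact: <{contact}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body)}",  # size of the body in bytes
    ]
    return "\r\n".join(lines) + "\r\n\r\n" + sdp
```

Calling it with the header values used as examples earlier (`sip:bob@biloxi.com`, tag `1928301774`, CSeq `314159`, and so on) produces a message whose To header has no tag, which is how Bob's phone can tell this request starts a new dialog.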

Routing – Proxy atlanta.com forwards toward biloxi.com

The atlanta.com proxy now acts as a UAC on behalf of Alice, forwarding the INVITE to the next hop. It needs to determine how to reach biloxi.com. It likely performs a DNS lookup (RFC 3263) to find the SIP server for biloxi.com. Let's say it finds an address for the biloxi.com proxy. Before forwarding, the atlanta proxy adds its own Via header (so that responses come back to it) and typically a Record-Route header if it wants to stay on the path. It then sends the INVITE to the biloxi.com proxy server.

100 Trying – Provisional response by proxies

Upon receiving the INVITE, each proxy usually generates a quick 100 Trying response toward the previous hop. This is a provisional (1xx) response that indicates "I got the INVITE, I am working on it." It is hop-by-hop, not end-to-end: a proxy sends 100 Trying upstream to stop retransmissions from the previous hop, and a 100 Trying is never forwarded. In our flow, the atlanta.com proxy sends 100 Trying to Alice's UA as soon as it receives her INVITE, and the biloxi.com proxy sends its own 100 Trying to the atlanta.com proxy, which absorbs it. Once Alice has received 100 Trying from atlanta.com, she knows her INVITE is being handled and the call attempt is proceeding.

Proxy to UAS – Invite reaches Bob's server

The biloxi.com proxy now forwards the INVITE to Bob's SIP phone (UAS). First, it needs to figure out where Bob is registered, so it consults a location service (populated by Bob's registrations). Bob's binding could point to an IP address or to another proxy; for simplicity, assume Bob's device is registered directly with biloxi.com and is reachable. The proxy forwards the INVITE to Bob's Contact address, adding its own Via header and, if it wants to stay on the path, a Record-Route header.

180 Ringing – Bob's phone rings

When Bob's SIP phone (UAS) receives the INVITE, Bob isn't picking up yet, so the phone sends a 180 Ringing provisional response. This indicates to the caller that the callee's endpoint is alerting (ringing). The 180 Ringing travels back through the proxies: Bob's phone sends it to biloxi.com proxy, which forwards it to atlanta.com proxy, which then sends it to Alice's UA. Alice's softphone receives "180 Ringing" and can now typically play a ringback tone or show "Ringing" on the UI to inform Alice that Bob's phone is ringing. (This 180 may also contain an SDP if early media is to be established, but usually ringing itself doesn't need SDP – a 183 Session Progress would be used if early media like announcements were sent).

200 OK – Bob answers

When Bob picks up the phone, his SIP phone (UAS) sends a 200 OK (Success) response. This is a final response indicating the call is accepted. The 200 OK carries Bob's SDP answer, which contains the agreed media parameters (IP/port for Bob, chosen codec, etc.) that answer the offer from Alice. This 200 OK goes from Bob's phone to the biloxi proxy, then to the atlanta proxy, and then to Alice's UA, following the Via headers added earlier. When Alice's UA receives the 200 OK, the dialog is considered established – at this point, both sides have exchanged SDP and agreed on the session, and a SIP dialog (identified by Call-ID + tags) is in place connecting Alice and Bob for this call.

ACK – Alice confirms receipt of 200 OK

Upon receiving the 200 OK, Alice's UA must send an ACK request to confirm it. The ACK for a successful INVITE is end-to-end: it is generated by Alice's UA and consumed by the UAS (Bob's phone), rather than being generated hop by hop. How does Alice know where to send it? The Contact header in the 200 OK contains Bob's direct address (the remote target), and any Record-Route headers from proxies are used to build a route set. So, Alice's ACK will go to the atlanta proxy, then the biloxi proxy, then to Bob if the proxies record-routed, or directly to Bob's Contact if none did. In our example, since the proxies did Record-Route, the ACK will traverse atlanta -> biloxi -> Bob (just like the INVITE path). However, note that ACK is a peculiar case: for a 200 OK, the ACK is a separate transaction and Alice's UA does not retransmit it on a timer. If it gets lost, Bob's phone will retransmit the 200 OK periodically until an ACK arrives (for reliability), and Alice's UA re-sends the ACK each time it sees a retransmitted 200 OK. Alice's ACK contains no SDP (usually) and no response is expected to the ACK. Once Bob's phone receives the ACK, the call is formally established.
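
The way the caller derives the ACK's path can be sketched in a few lines. This assumes every proxy on the path is a loose router (its Record-Route URI carries `;lr`); the function name and the sample host names are illustrative only.

```python
def ack_routing(record_routes: list[str], contact_uri: str) -> tuple[str, list[str]]:
    """Return (request_uri, route_headers) for the caller's ACK.

    `record_routes` is the Record-Route list as it appears in the 200 OK,
    top to bottom. The caller (UAC) reverses it to obtain its route set.
    Assumes loose routing, so the Request-URI is simply the remote target.
    """
    route_set = list(reversed(record_routes))
    return contact_uri, route_set
```

With the 200 OK listing biloxi above atlanta (the order in which the proxies stacked their entries on the INVITE), the reversed route set sends the ACK to atlanta first, then biloxi, then Bob's Contact.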

Media Session – Audio conversation begins

After the 200 OK/ACK handshake, Alice and Bob begin the media session (shown as a double arrow in the diagram). They send RTP packets for audio (and/or video) directly between their IP addresses as negotiated (or through any media relays if configured, but at SIP level we consider the session established). SIP itself is quiet during the conversation, allowing the media layer to handle the real-time stream.

BYE – Terminating the call

Let's say Bob hangs up after some time. Bob's phone will send a BYE request to end the session. The BYE is a SIP request that is sent within the existing dialog (so it will have the same Call-ID and the To/From tags from earlier) and it is routed along the established path. In our case, Bob's phone sends BYE to the biloxi.com proxy (since the dialog's Route set includes it), which forwards to atlanta.com proxy, then to Alice's UA. Alice's UA receives the BYE from Bob.

200 OK (BYE) – Call terminated

Alice's UA responds with 200 OK to acknowledge the BYE (this 200 OK for the BYE is a final response indicating the session is terminated). That 200 OK travels back to Bob (through atlanta and biloxi proxies). Once Bob's side gets the 200 OK for its BYE, the call is fully terminated on both sides. They will stop the media session. The SIP dialog is torn down at this point.

This completes the basic call flow. Throughout this flow, various headers played their roles: e.g., multiple Via entries were present on the INVITE and thus on responses, the proxies used the branch in Via to match responses to requests, the To tag appeared in the 200 OK establishing the dialog, and so on. Also note some special cases: the 100 Trying is generated by proxies to suppress retransmissions (Alice's UA would retransmit INVITE if no response at all), and the ACK for the final answer is a separate transaction (with no response) that went end-to-end. If Bob had never answered, Alice might have sent a CANCEL (covered next) or Bob's side might send a 408 Request Timeout or some 4xx rejection.

Canceling a call

If Alice decided to hang up before Bob answered (for example, she gets tired of waiting), her UA could send a CANCEL request for the INVITE. CANCEL is a separate request that uses the same Call-ID, CSeq (same number, method "CANCEL"), and Via as the INVITE it's canceling. The CANCEL travels through the proxies hop-by-hop (each proxy responds to CANCEL with 200 OK immediately), and when the CANCEL reaches Bob's UAS, if Bob hasn't answered yet, his phone will terminate the ringing and respond to the original INVITE with 487 Request Terminated. Alice's UA, upon seeing 487, knows the call was canceled. (If Bob had already sent a 200 OK moments before CANCEL arrived, then the CANCEL has no effect – a UAS ignores CANCEL if a final response has already been sent. CANCEL is only useful before a call is answered, and specifically for INVITE; you don't cancel other methods in practice. Also by spec, a UAC should not send CANCEL until it has received at least one provisional response for the INVITE, to avoid a race where CANCEL arrives before the INVITE is even processed.)
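
The relationship between a CANCEL and the INVITE it cancels can be illustrated with a small sketch that derives one from the other. `build_cancel` is a hypothetical helper that works on raw message text; it skips details a real stack handles, such as copying Route headers.

```python
def build_cancel(invite: str) -> str:
    """Derive a CANCEL from the text of a previously sent INVITE."""
    head = invite.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    _method, uri, version = lines[0].split(" ", 2)
    out = [f"CANCEL {uri} {version}"]
    seen_via = False
    for line in lines[1:]:
        name, _, value = line.partition(":")
        key, value = name.strip().lower(), value.strip()
        if key == "via" and not seen_via:
            out.append(f"Via: {value}")            # top Via only, same branch
            seen_via = True
        elif key in ("from", "to", "call-id", "max-forwards"):
            out.append(f"{name.strip()}: {value}")  # copied unchanged
        elif key == "cseq":
            out.append(f"CSeq: {value.split()[0]} CANCEL")  # same number, new method
    out.append("Content-Length: 0")
    return "\r\n".join(out) + "\r\n\r\n"
```

Because the branch and CSeq number are unchanged, each hop can match the CANCEL to the pending INVITE, while the differing method keeps it a transaction of its own.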

This basic flow demonstrates how SIP works in a simple call setup scenario. We saw the INVITE request/response handshake to establish a session, the use of provisional responses (100, 180) for call progress, and the use of BYE to terminate the session. We also highlighted the special nature of ACK: for a successful call, ACK is its own transaction with no response; for an unsuccessful call (a non-2xx final response like a busy or decline), the UAC still sends an ACK but in that case the ACK is considered part of the INVITE's transaction and is not a separate handshake (the UAS will stop retransmitting the error after it gets the ACK). These nuances are defined in the SIP spec to handle reliability over UDP. Essentially, an ACK for a 200 OK is sent end-to-end and not acknowledged, whereas an ACK for an error is just to stop the retransmissions from the UAS (which does retransmit errors like 487 or 486 until ACK).

Throughout the call, proxies didn't alter the message bodies – SDP offer/answer passed end-to-end. They did manage routing, though. If this were a multi-target scenario (say Bob is registered at two places), the proxy could have forked the INVITE to multiple locations. Forking means a proxy sends parallel INVITEs to all contacts for Bob (or serially tries one then another). This can lead to multiple 180 Ringing responses (Alice might hear multiple phones ringing) and even potentially multiple 200 OKs if two devices answer nearly simultaneously. SIP handles this by the proxy forwarding the first 200 OK to Alice and then sending CANCELs to the others, typically, so one call is established. If two 200 OKs do reach Alice, she technically has two dialogs and would have to BYE one – but in practice proxies try to avoid that. This is an example of how proxies can coordinate more complex scenarios.

Quick Reference: Basic Call Flow

Step	Direction	Message	Purpose
1	Alice -> Proxy	INVITE	Initiate call (with SDP offer)
2	Proxy -> Alice	100 Trying	Stop retransmissions
3	Bob -> Proxy	180 Ringing	Callee alerting
4	Bob -> Proxy	200 OK	Call answered (with SDP answer)
5	Alice -> Bob	ACK	Confirm receipt of 200 OK
6	Alice <-> Bob	RTP	Media session
7	Bob -> Alice	BYE	End call
8	Alice -> Bob	200 OK	Confirm termination

Transactions, Dialogs, and Sessions

It's important to understand the layering of SIP's concepts: transactions, dialogs, and sessions. In the above flow:

Transaction

A Transaction is a single request and all of its responses (excepting the ACKs for 2xx). It is the fundamental unit of message handling in SIP. For example, the initial INVITE request and the final 200 OK (and all provisional responses in between) constitute a single transaction. A separate transaction was the ACK for that 200 OK (because ACK to 2xx is not considered part of the invite transaction). The BYE and its 200 OK was another transaction. SIP's state machines (client transaction, server transaction) handle retransmissions and timeouts at the transaction level. In RFC 3261, a transaction is identified by the branch parameter of the topmost Via header together with the method in CSeq; the method matters because a CANCEL carries the same branch as the INVITE it cancels yet is a transaction of its own. Provisional responses (1xx) do not end a transaction, while final responses (2xx-6xx) do. Once a final response is sent and its ACK handled (if applicable), the transaction is completed. If using UDP, SIP relies on retransmission timers: the UAC retransmits requests (like INVITE) until a response is received, the UAS retransmits 200 OK until an ACK is received, etc., as defined by the transaction timers in the RFC.
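
As a worked example of the UDP retransmission timers, the sketch below computes when a UAC would retransmit an INVITE that gets no response at all. It assumes the RFC 3261 defaults as I understand them: T1 = 500 ms, an interval that doubles after each retransmission, and a timeout (Timer B) at 64*T1. The function name is illustrative.

```python
def invite_retransmit_times(t1: float = 0.5) -> list[float]:
    """Seconds after the first send at which an unanswered INVITE is
    retransmitted over UDP, up to the Timer B timeout of 64*T1."""
    timer_b = 64 * t1
    times: list[float] = []
    elapsed, interval = 0.0, t1
    while elapsed + interval < timer_b:
        elapsed += interval
        times.append(elapsed)
        interval *= 2  # the interval doubles after every retransmission
    return times
```

With the default T1 this yields retransmissions at 0.5, 1.5, 3.5, 7.5, 15.5 and 31.5 seconds, so seven copies of the INVITE are sent in total before the transaction times out at 32 seconds.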

Dialog

A Dialog is a peer-to-peer relationship between two UAs that persists for some time, typically created by an INVITE transaction's successful completion. In our flow, once Alice received the 200 OK from Bob, a dialog was established. The dialog is identified by the combination of Call-ID, Alice's tag (From tag), and Bob's tag (To tag). Within this dialog, either UA can send new requests (called in-dialog requests), such as BYE or re-INVITE, which then form new transactions but are within the context of the existing dialog. The dialog maintains state like the route set (learned from Record-Route), the remote target (the Contact of the other side), and sequence number expectations. It's basically the SIP "call state." A dialog lasts until it is terminated by a BYE, by an error response such as 408 or 481 to an in-dialog request (for example, when one side has gone down), or, for other dialog usages like SUBSCRIBE, by expiration or explicit termination. Dialogs are important because they give the two endpoints a context for further messages: for example, a BYE doesn't make sense without a dialog, since it needs to identify which call to terminate. Dialog state is also used for mid-call requests (UPDATE or re-INVITE to modify media, INFO, etc.). Note: certain methods like REGISTER or OPTIONS are typically sent outside of dialogs (they are standalone), and SUBSCRIBE/NOTIFY can create their own subscription dialogs separate from call dialogs. A 2xx response to an INVITE always establishes a dialog, whereas CANCEL and ACK never create one. Dialogs also track "local" and "remote" CSeq numbers to keep requests in order.
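
One subtlety is that each endpoint stores the same three values from its own point of view. A minimal sketch, with hypothetical names `DialogId` and `dialog_id`:

```python
from typing import NamedTuple

class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str

def dialog_id(call_id: str, from_tag: str, to_tag: str, is_uac: bool) -> DialogId:
    """Dialog ID as seen by one endpoint, built from the Call-ID and tags of
    the dialog-creating INVITE and its 2xx response."""
    if is_uac:
        # Caller: its own tag was in From, the peer's tag arrived in To
        return DialogId(call_id, from_tag, to_tag)
    # Callee: its own tag is the To tag, the peer's is the From tag
    return DialogId(call_id, to_tag, from_tag)
```

Alice (the UAC) and Bob (the UAS) therefore hold mirror-image identifiers for the same dialog, which is why a BYE sent by Bob has the tags swapped between From and To compared with the original INVITE.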

Session

A Session in SIP refers to the actual media session negotiated – e.g., the audio/video session described by SDP and carried via RTP. It's what users think of as "the call" in terms of media. SIP's job is to set up, modify, and tear down sessions. A dialog often corresponds one-to-one with a session (the call between Alice and Bob). However, technically you could have a dialog without an active session (for example, a SUBSCRIBE/NOTIFY dialog has no media session, it's just a subscription state, or an INVITE that completed but no media was sent yet). Also, a single dialog can manage multiple media streams (audio + video, etc., part of one session description) or can be updated with new sessions (hold/resume with new SDP, etc.). Generally, though, when the dialog ends (BYE), the session is gone too. The terms "call" and "session" are often used interchangeably, though session specifically refers to the set of media streams.

SIP's design separates transactions from dialogs: Transactions are the individual request/response exchanges (short-lived), whereas Dialogs are the long-lived context (which can span multiple transactions). For instance, the INVITE/200 exchange was one transaction that, together with its ACK, established the dialog; the BYE/200 was another transaction within the same dialog. Within a dialog, each new request a UA sends increments that UA's CSeq (ACK and CANCEL reuse the number of the INVITE they refer to), and requests are processed in order.

One special case: the initial INVITE transaction for a dialog is slightly different from subsequent ones because of the three-way handshake for INVITE (INVITE -> 200 OK -> ACK). As noted, the ACK for a 2xx response is not considered part of the transaction – so the INVITE transaction is actually only completed by the 200 OK. The ACK is its own (with no response expected) to confirm the dialog. For non-2xx final responses, however, the ACK is considered part of that transaction (to conclude it). This distinction exists because SIP needed to solve reliability for the 200 OK over an unreliable transport: the UAS can't rely on the transaction layer's retransmit mechanism for 200 OK (since the INVITE transaction would technically end at the 200 OK), so instead the UAS itself retransmits the 200 OK until an ACK arrives. This is a unique quirk of SIP's INVITE handling.

In summary, the dialog provides the state that links a series of message exchanges, and transactions ensure each request/response exchange is reliable and independent. For most practical purposes, when someone talks about "a SIP call" they mean a dialog established by an INVITE. Within that call, multiple transactions occur (re-INVITE, BYE, etc.). Outside of calls, transactions like an isolated OPTIONS or a REGISTER stand alone (no lasting dialog unless the method itself defines one – REGISTER doesn't, SUBSCRIBE does create a dialog for the subscription, etc.).

Quick Reference: Transaction vs Dialog vs Session

Concept	Scope	Lifetime	Identified By
Transaction	Single request/response	Seconds	Branch ID, CSeq, Method
Dialog	UA-to-UA relationship	Minutes to hours	Call-ID + From-tag + To-tag
Session	Media exchange	Duration of call	SDP negotiation (via dialog)

ACK Behavior	For 2xx Response	For non-2xx Response
Part of transaction?	No (separate)	Yes
Retransmitted on a timer?	No (re-sent when a retransmitted 200 arrives)	No (re-sent when a retransmitted error arrives)
Triggers retransmit?	UAS retransmits 200 until ACK	Stops UAS retransmit of error
Routing	End-to-end via Route set	Hop-by-hop like INVITE

Registration and Location Service

Before calls can be made, user agents typically register with their SIP server. Registration is how a SIP UA announces "Here I am, at this address" to the network. The REGISTER request binds a user's Address-of-Record (AOR) (which is a SIP URI like sip:alice@atlanta.com) to one or more Contact URIs (device addresses).

For example, when Alice opens her softphone app, it sends: REGISTER sip:atlanta.com SIP/2.0 with headers including To: sip:alice@atlanta.com, From: sip:alice@atlanta.com (with a new tag), Contact: <sip:alice@192.0.2.4:5060>, and often an Expires header (or Contact parameter) indicating how long this registration should be valid (e.g., 3600 seconds). The REGISTER is sent to the registrar server for atlanta.com (often co-located with the proxy). If authentication is required, the server will respond 401 Unauthorized with a challenge, and Alice's UA will re-send the REGISTER with an Authorization header containing a digest response computed from her username, password, and the server's nonce (the password itself is never sent). The scheme is borrowed from HTTP digest authentication, as mentioned earlier. Once authorized, the registrar responds with 200 OK. At this point, Alice is "registered." The registrar has stored the mapping: AOR=sip:alice@atlanta.com -> Contact=sip:alice@192.0.2.4:5060 (plus an expiration time).
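
As a rough illustration of the digest step, the sketch below computes the response value using the original MD5 formula without the qop extension. The function name and the sample credentials in the usage note are made up for illustration; a real UA must also handle qop, cnonce, and nonce counts when the challenge asks for them.

```python
import hashlib

def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = _md5_hex(f"{method}:{uri}")                 # request part
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")
```

For a REGISTER, a call such as `digest_response("alice", "atlanta.com", "secret", "REGISTER", "sip:atlanta.com", nonce)` would produce the value placed in the `response` parameter of the Authorization header; the registrar repeats the calculation with its stored credentials and compares.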

Registrations have a limited lifetime (the Expires header or Contact's expires parameter dictates this). Alice must periodically refresh her registration (by sending a new REGISTER before expiry) to keep it active, or she can send a REGISTER with Expires: 0 to remove a registration (logout). Multiple devices can register for the same AOR with different Contacts; the registrar will store all of them (often proxies will then fork incoming INVITEs to all Contacts). The Contact header in REGISTER can also contain a q-value for priority or other parameters (defined in RFC 3261 and extended in RFC 3840 for capabilities).

The Location Service is the database that the registrar populates with these contacts. When an incoming INVITE for Alice arrives at atlanta.com, the proxy queries the location service to find where to send the INVITE (e.g., to Alice's registered IP). This is how routing of incoming calls works in SIP, enabling user mobility. If Alice moves networks and sends a new REGISTER, the location service updates to her new Contact. If she unregisters (or the registration expires), callers will typically get a 480 Temporarily Unavailable (some servers return 404 Not Found instead) or be sent to voicemail, etc., depending on server policy.
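
The register, refresh, unregister, and lookup behavior described above can be modeled with a small in-memory store. This is only a sketch: the class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and a real registrar would also handle authentication, q-values, and Call-ID/CSeq ordering.

```python
class LocationService:
    """In-memory AOR -> {contact: expiry_time} store (illustrative only)."""

    def __init__(self) -> None:
        self._bindings: dict[str, dict[str, float]] = {}

    def register(self, aor: str, contact: str, expires: int, now: float) -> None:
        contacts = self._bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)        # Expires: 0 removes the binding
        else:
            contacts[contact] = now + expires  # create or refresh the binding

    def lookup(self, aor: str, now: float) -> list[str]:
        """Contacts a proxy could forward (or fork) an INVITE to right now."""
        contacts = self._bindings.get(aor, {})
        return [c for c, expiry in contacts.items() if expiry > now]
```

An empty result from `lookup` corresponds to the case where the user is known but currently has no reachable device.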

A REGISTER is distinct from an INVITE dialog – it's a standalone transaction (with its own response). It does not create a dialog. However, there is an optional SIP event package (RFC 3680) that allows subscription to registration state if needed (beyond scope here).

One should also note that REGISTER binds only the Contact address for future requests, it doesn't affect the current transport connection except when using specific extensions like outbound (RFC 5626) which allow maintaining a persistent flow. In basic SIP, the proxy just uses the Contact when sending a new request.

Another point: The From and To in REGISTER are usually the same (both are the AOR of the user), because you're effectively saying "I as this user want to register this address." All registrations from one UA to the same server reuse the same Call-ID and increment CSeq (as a best practice) so the server knows it's an update to an existing registration. The server's 200 OK to REGISTER often contains Contact headers (echoing what's registered, possibly with q-values or stating how many contacts were registered).

Quick Reference: Registration

Action	Expires Value	Result
Initial registration	e.g., 3600	Binding created
Refresh registration	e.g., 3600	Binding updated
Unregister	0	Binding removed
Fetch bindings	(no Contact)	Returns current bindings

Header	In REGISTER Request	Notes
Request-URI	`sip:domain.com`	Registrar's domain
To	`sip:user@domain.com`	AOR being registered
From	`sip:user@domain.com`	Usually same as To
Contact	`<sip:user@ip:port>`	Device's reachable address
Expires	3600	Registration lifetime (seconds)
Call-ID	(consistent)	Same for all registrations from UA
CSeq	(incrementing)	Increments each registration

SIP Extensions and Advanced Features

The core SIP specification (RFC 3261) is augmented by numerous other RFCs that introduce new features. Understanding every extension is a massive undertaking (the SIP RFC series is extensive), but here we'll summarize some of the key extensions and advanced concepts that build on the basics we've covered:

Reliability of Provisional Responses (RFC 3262)

In basic SIP, provisional responses (1xx) are not acknowledged at the transaction layer, so they are sent unreliably. RFC 3262 adds an option to send provisional responses reliably (all except 100 Trying, which is hop-by-hop and is never sent reliably). It introduces the PRACK method (Provisional ACK), which acts like an ACK for a provisional response. To use this, the UAC advertises Supported: 100rel (or demands it with Require: 100rel) in the INVITE, and the UAS includes Require: 100rel in each reliable provisional response. Such a response (like 180 Ringing or 183 Session Progress) is sent with an RSeq header (sequence number), and the UAC must respond with a PRACK whose RAck header echoes that RSeq together with the INVITE's CSeq. The UAS retransmits the provisional response until the PRACK arrives. This ensures, over UDP, that important provisional responses (especially ones carrying early media in 183 or SIP precondition info) are not lost. PRACK itself is just another request within the dialog (with its own 200 OK response). The use of PRACK is negotiated per call. If used, provisional responses are essentially reliable, similar to final responses.
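
The link between a reliable provisional response and its PRACK is the RAck header (listed in the extensions quick reference), which, per RFC 3262, combines the response's RSeq with the CSeq of the INVITE. A minimal sketch, with a hypothetical helper name:

```python
def rack_header(rseq: int, cseq_value: str) -> str:
    """Build the RAck header for a PRACK from the RSeq and CSeq values of
    the reliable provisional response being acknowledged."""
    number, method = cseq_value.split()
    return f"RAck: {rseq} {number} {method}"
```

For a 180 Ringing carrying `RSeq: 776656` and `CSeq: 1 INVITE`, the PRACK would carry `RAck: 776656 1 INVITE`, letting the UAS stop retransmitting that specific provisional response.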

Offer/Answer Model with SDP (RFC 3264)

This RFC goes hand-in-hand with SIP, detailing how SDP is used in INVITE/200/ACK to negotiate media. It specifies that one party offers a set of media (codecs, IPs, ports) and the answerer picks from those. We touched on this earlier; key points are that all SIP UAs must support SDP, and that an INVITE can contain an SDP offer. If it does, the 200 OK must contain the answer. If the INVITE has no SDP, it's implying the offer comes in the 200 and then the ACK from caller contains the answer. The model also allows for re-INVITEs or UPDATE to change the session later (e.g., hold by setting a=inactive or sendonly, codec change, adding video, etc.). The Offer/Answer RFC is fundamental for ensuring both ends agree on the media session parameters.
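
The answerer's side of the negotiation can be sketched as a pure function. Codec names stand in for full SDP payload descriptions, and the function and table names are illustrative assumptions; real SDP handling involves payload types, ports, and multiple media lines.

```python
# Direction the answerer uses in reply to the offered direction attribute
ANSWER_DIRECTION = {
    "sendrecv": "sendrecv",
    "sendonly": "recvonly",   # offerer only sends (e.g. hold), so we only receive
    "recvonly": "sendonly",
    "inactive": "inactive",
}

def answer_stream(offered_codecs: list[str], supported: set[str],
                  offered_direction: str = "sendrecv"):
    """Pick the offered codecs this endpoint supports, keeping the offerer's
    preference order. Returns None when nothing matches, in which case a
    real answer would reject the stream."""
    common = [c for c in offered_codecs if c in supported]
    if not common:
        return None
    return {"codecs": common, "direction": ANSWER_DIRECTION[offered_direction]}
```

The answer never introduces a codec that was absent from the offer, which is the core constraint of the offer/answer model.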

Locating SIP Servers (RFC 3263)

This details the DNS procedures we mentioned. SIP clients perform NAPTR and SRV lookups for the domain to find the correct host and transport to send requests. It also covers fallback between UDP and TCP if a large message doesn't get response, etc. For example, a lookup might find _sip._udp.biloxi.com and _sip._tcp.biloxi.com records, etc., and the client tries them in order of priority/weight. This all happens under the hood in a UA library or proxy.

Event Framework – SUBSCRIBE/NOTIFY (RFC 3265 & RFC 6665)

This provides a generic framework for subscription to events in SIP. Using SUBSCRIBE, a UA can request to be notified of certain events from another UA or server. The subscription itself is a dialog (with its own Call-ID and tags separate from call dialogs). NOTIFY messages are then sent to the subscriber whenever the subscribed event occurs or changes. For example, SIP presence is built on this: you SUBSCRIBE to a user's presence, and their presence server sends NOTIFY updates when their status changes. Other examples: message waiting indicators (voicemail waiting), call monitoring, or even SLA sharing. RFC 3265 defined the baseline; RFC 6665 later updated it and clarified a lot of behavior. Event notifications require defining specific Event Packages (e.g., an event package for "presence" is defined in RFC 3856, for "message-summary" in RFC 3842, etc.). Each NOTIFY indicates the event type and carries a message body with the state (like presence info in XML, or voicemail count). The subscription has an expiration and can be refreshed. This system is called SIP-SIMPLE when used for IM and presence.

Session Timer (RFC 4028)

This extension introduced a keep-alive mechanism for SIP dialogs. A session timer is an interval negotiated between UA and proxy (or UAS) such that if no refreshing request (like re-INVITE or UPDATE) is sent within that time, the session is considered dead. This helps in cleaning up zombie calls if one side crashes and stops sending media – without session timer, a call might remain "active" forever if BYE is never sent. With session timers, one side (the "refresher") will send a periodic re-INVITE/UPDATE (with Session-Expires header) to check that the dialog is still active. If the refresh fails (no response), the call is terminated. This is often used in networks to make sure hung calls don't tie up resources. The timer value might be, say, 30 minutes or even 2 minutes in some systems – if both sides support it (Require: timer or Supported: timer is used to negotiate).
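
To give a feel for the timing, the sketch below computes when each side would act for a given Session-Expires value. It assumes the values RFC 4028 recommends, as I recall them: the refresher sends its refresh at half the interval, and the other side sends BYE if no refresh has arrived by the interval minus the smaller of 32 seconds and one third of the interval. The function name is illustrative.

```python
def session_timer_deadlines(session_expires: int) -> tuple[float, float]:
    """Return (refresh_at, give_up_at) in seconds after the last refresh.

    refresh_at: when the refresher sends its re-INVITE or UPDATE.
    give_up_at: when a side that has seen no refresh sends BYE.
    """
    refresh_at = session_expires / 2
    give_up_at = session_expires - min(32, session_expires / 3)
    return refresh_at, give_up_at
```

For a 30-minute interval (1800 seconds) this gives a refresh at 900 seconds and a give-up point at 1768 seconds; for a 90-second interval, 45 and 60 seconds.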

UPDATE Method (RFC 3311)

As described earlier, UPDATE allows changing session parameters before the initial INVITE has completed. For instance, during early ringing, you might want to send updated SDP (maybe the network prefers to do early media negotiations or change ringback tones). Or for call queues, an UPDATE could be used to alert the caller with new info while the call is still pending. UPDATE can also refresh session timers in early stage. Essentially, it fills a gap by allowing mid-dialog (actually mid-transaction) modifications when INVITE is in progress. If used after call establishment, it's similar to a re-INVITE but doesn't create a new offer/answer if not needed (though re-INVITE could also be used then).

REFER Method (RFC 3515)

REFER is used for call transfer and similar features. When Alice wants to transfer Bob to Carol, Alice (who is in a call with Bob) sends Bob a REFER with a Refer-To: Carol's URI. Bob's UA, upon receiving REFER, will act as if Bob is calling Carol (essentially it triggers a new INVITE to Carol). The outcome of that referral (whether Carol answered) is reported back to Alice via NOTIFY (REFER defined an event package for the result of the transfer). This mechanism allows one party to ask the other to initiate a new request on their behalf. It's not limited to transferring calls, but that's the common use (attended or unattended transfer scenarios). REFER builds on the subscription model (an implicit subscription to the "refer" event is created by the REFER request). A successful REFER usually gets a 202 Accepted response, followed by NOTIFYs whose message/sipfrag body carries a status line (such as "SIP/2.0 200 OK" or an error) indicating the referred call's status. There are also follow-up extensions: RFC 4488 allows suppressing the implicit subscription (the Refer-Sub header and the norefersub option tag), and RFC 7647 clarifies how REFER interacts with the updated event framework of RFC 6665.

NAT Traversal

Symmetric Response Routing (rport, RFC 3581)

NATs pose a big problem for SIP because the SIP headers carry IP addresses/ports that might be local addresses, and the endpoints might be behind firewalls. RFC 3581 introduced the rport parameter for Via headers. When a UAC behind a NAT includes Via: ...;rport, it is asking the server to send the response back to the source IP and port the request came from, rather than the address in the Via (which might be the private IP). This helps responses get back through NATs because they go to the NAT's mapped address/port (since the request came from there). Most modern SIP devices use rport by default.
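
The server-side handling of rport can be sketched as two small helpers operating on the Via header value. The names `stamp_via` and `response_target` are hypothetical, and the parsing is deliberately naive (it assumes simple `;`-separated parameters with no quoting).

```python
def stamp_via(via_value: str, src_ip: str, src_port: int) -> str:
    """Server side: if the client asked for rport, record the packet's
    actual source address and port in the Via value."""
    parts = [p.strip() for p in via_value.split(";")]
    if "rport" not in parts[1:]:
        return via_value  # client did not request symmetric response routing
    stamped = [f"rport={src_port}" if p == "rport" else p for p in parts]
    stamped.append(f"received={src_ip}")
    return ";".join(stamped)

def response_target(stamped_via: str) -> tuple[str, int]:
    """Where to send the response: the recorded source, not the Via host."""
    params = dict(p.split("=", 1) for p in stamped_via.split(";")[1:] if "=" in p)
    return params["received"], int(params["rport"])
```

A request whose Via names a private address such as 10.0.0.5:5060 but arrives from 203.0.113.7:40312 is thus answered to 203.0.113.7:40312, the NAT's mapped address, so the response finds its way back through the same binding.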

Outbound (RFC 5626)

RFC 5626 (Outbound) goes further: it allows a UA to register via a persistent flow (TCP/TLS or UDP with keep-alives) and maintain that connection for incoming requests as well. The UA registers with an instance ID (the +sip.instance Contact parameter) and, if it maintains several flows, a distinct reg-id for each, and the registrar/proxy will use the same flow to forward incoming INVITEs. Outbound also defines keep-alive messages (CRLF or STUN pings) to keep NAT bindings open. This extension is crucial for mobile devices and NAT-heavy environments (e.g., SIP over cellular networks).

Alongside these, Session Border Controllers (SBCs) are often used: these are essentially B2BUAs at the network edge that ensure media and signaling traverse NATs (they hide the complexities from endpoints). Additionally, ICE (Interactive Connectivity Establishment) is used for media NAT traversal (by including multiple candidate addresses in SDP and testing connectivity). While ICE (RFC 5245, since obsoleted by RFC 8445) is a separate protocol, SIP endpoints often integrate ICE in their SDP to get media flowing.

Instant Messaging and Presence (SIP SIMPLE)

We mentioned MESSAGE method (RFC 3428) for instant messaging. SIP can be used as a pager-mode IM by simply sending SIP MESSAGE outside a call or within a dialog. Each MESSAGE is like an SMS – no sessions, just one-off messages, usually carrying plain text or maybe CPIM wrappers. For more complex chat (large messages, offline storage), often SIP is used with MSRP (Message Session Relay Protocol) negotiated via SDP in an INVITE (for session-mode IM). Presence, as noted, is done via SUBSCRIBE/NOTIFY (presence event package, RFC 3856) where users publish their status and others subscribe. There's also PUBLISH method (RFC 3903) for a UA to publish its presence state to a server which then NOTIFYs subscribers.

Call Forking and Forked Responses

When a proxy forks a request to multiple UAS locations, the UAC might receive multiple 2xx responses (from different branches). SIP handles this by treating each 2xx as establishing a separate dialog. The UAC can choose to accept one and terminate the others. Typically, a proxy will try to avoid multiple final answers (by canceling other branches once one answers), but it's possible for two almost-simultaneous answers to both reach the caller. The caller should then send ACK to both and quickly send BYE to the one it doesn't want, or handle it (in some cases two active dialogs might be merged into a conference, but that's beyond basic SIP). This is a corner case that implementations must be aware of.

Additional SIP Headers and Features

There are many more headers and features defined across RFCs. For example:

The Replaces header (RFC 3891) is carried in an INVITE and asks the recipient to replace one of its existing dialogs with the new one; in attended transfer it is typically embedded in the Refer-To URI of a REFER so that the resulting INVITE swaps the calls.
The Join header (RFC 3911) allows a UA to join an existing dialog (used in conferencing scenarios).
The Reason header (RFC 3326) can be included in BYE or CANCEL to indicate why a call was terminated (busy, declined, etc.).
Privacy extensions (RFC 3323, etc.) let a user ask proxies to hide identifying information from the other party.
The P-Preferred-Identity / P-Asserted-Identity headers (RFC 3325) are used in trusted networks (like carrier networks) to convey identity information securely within the network.
Session-Expires header is used for session timers (we discussed).
Allow header indicates what methods a UA supports.
Supported indicates other supported extensions.
Event header identifies event packages in SUBSCRIBE/NOTIFY.

Security Mechanisms

Apart from digest authentication, SIP can be secured at the transport layer using TLS (SIPS URI or using TLS for a domain – often on port 5061). This provides encryption of SIP signaling, protecting headers and content from eavesdropping or tampering. At the application layer, there was a mechanism for end-to-end body encryption using S/MIME (carrying certificates and encrypted SDP or messages), but this saw little practical use.

More modern is the STIR/SHAKEN framework (RFC 8224 and others) which uses identity headers with certificates to prevent caller ID spoofing – this is a hot topic in telephony (robocall mitigation). That involves the Identity header where the originating service signs the call information, and the terminating service verifies it.

There are also extensions for media security like SDES for SRTP (using SDP to exchange SRTP keys) or DTLS-SRTP (keying via DTLS handshake) to secure the media. Those are outside SIP itself, but SIP carries their negotiation via SDP.

Telephony Interworking

SIP can interwork with the traditional phone network (PSTN). For instance, SIP-T (RFC 3372 etc.) and SIP-I (ITU-T Q.1912.5) define how ISDN/PSTN signaling (like ISUP messages) can be encapsulated in SIP for handoff between IP and circuit networks. This typically involves carrying the ISUP message in the body of the SIP message for seamless transit. Gateways perform translation between SIP messages and ISUP/SCCP, etc. For most SIP users, this is transparent: you dial a phone number, it goes to a SIP trunk provider, and they convert it to PSTN signaling if needed.

Conferencing and Early Media

Basic SIP call flows can be extended to multiparty. One way is third-party call control (3PCC) where a controller (app) uses multiple SIP dialogs to bring parties into a conference by managing INVITEs (RFC 3725 outlines best practices). Another is using a conference server where users INVITE themselves to a focus (conference URI), and the mixing is done there (RFC 4353). SIP has event packages like conference event package (RFC 4575) that let participants know who's in the call, etc.

Early media (media before call answer, like ringback tones or announcements) can be handled by sending media during 180/183 responses. There's a whole discussion (RFC 3960) on how to manage early media and avoid clipping or confusion with local ringback. Essentially, either the caller or the callee's network may generate tones or media to play to the caller before answer.

As you can see, SIP is not just a single protocol in isolation, but an entire family of protocols and extensions – often referred to as the SIP "umbrella". It can be daunting, but knowing the core (RFC 3261) plus the major extensions above covers most scenarios.

When implementing or troubleshooting SIP, it's helpful to have a reference (like this guide or concise RFC summaries) because the formal RFCs are very detailed. However, the RFCs also provide exact definitions which can be crucial for edge cases. For example, timer values, how to handle forked responses, tag generation rules, how stateful proxies handle CANCEL, what to do if multiple challenges in a forked response, etc., are all spelled out in RFC 3261 and others.

In practice, tools like Wireshark can decode SIP flows and show these headers and messages clearly, and SIP stack libraries (pjsip, reSIProcate, etc.) implement most of these behaviors under the hood. For someone learning SIP, it's useful to capture a simple SIP call and map it to the steps above.
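
If you export the raw messages from such a capture, a few lines of code are enough to reduce each one to a line of a call-flow ladder. The sketch below is an assumption-laden illustration rather than a real SIP parser: the function name and sample messages are made up, and it ignores compact header forms, line folding, and repeated headers.

```python
def summarize(raw: str) -> str:
    """Reduce one raw SIP message to '<method or status> [CSeq: ...]'."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    start = lines[0]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    cseq = headers.get("cseq", "?")
    if start.startswith("SIP/2.0 "):
        # Response start line: "SIP/2.0 <code> <reason>"
        label = start[len("SIP/2.0 "):]
    else:
        # Request start line: "<METHOD> <request-uri> SIP/2.0"
        label = start.split(" ", 1)[0]
    return f"{label} [CSeq: {cseq}]"


# Hypothetical capture of a short call (headers trimmed for brevity).
capture = [
    "INVITE sip:bob@example.com SIP/2.0\r\nCSeq: 1 INVITE\r\n\r\n",
    "SIP/2.0 180 Ringing\r\nCSeq: 1 INVITE\r\n\r\n",
    "SIP/2.0 200 OK\r\nCSeq: 1 INVITE\r\n\r\n",
    "ACK sip:bob@example.com SIP/2.0\r\nCSeq: 1 ACK\r\n\r\n",
    "BYE sip:alice@example.com SIP/2.0\r\nCSeq: 1 BYE\r\n\r\n",
]
for message in capture:
    print(summarize(message))
```

The CSeq value ties each response back to the request it answers, which is what makes the ladder readable.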

Quick Reference: SIP Extensions

Extension	RFC	Purpose	Key Headers/Methods
100rel	3262	Reliable provisional responses	PRACK, RSeq, RAck
timer	4028	Session keep-alive	Session-Expires, Min-SE
replaces	3891	Replace existing dialog	Replaces header
join	3911	Join existing dialog	Join header
norefersub	4488	Suppress REFER subscription	Refer-Sub header
outbound	5626	NAT traversal, persistent flows	+sip.instance, reg-id (Contact parameters)
gruu	5627	Globally Routable UA URI	Contact with gr parameter
path	3327	Edge proxy routing	Path header
histinfo	7044	Call history/diversion	History-Info header
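
The names in the first column are option tags, which endpoints advertise in Supported and demand in Require headers. As a small sketch of how a server might use the table, the code below checks a Require header value against the tags it knows; per RFC 3261, a UAS that finds an unsupported tag there answers 420 Bad Extension and lists the tag in an Unsupported header. The dictionary and function name are hypothetical, and lowercasing the tags before comparison is an assumption of this sketch.

```python
# Option tags from the table above, mapped to their defining RFCs.
OPTION_TAGS = {
    "100rel": "RFC 3262",
    "timer": "RFC 4028",
    "replaces": "RFC 3891",
    "join": "RFC 3911",
    "norefersub": "RFC 4488",
    "outbound": "RFC 5626",
    "gruu": "RFC 5627",
    "path": "RFC 3327",
    "histinfo": "RFC 7044",
}


def unsupported_requirements(require_value: str, supported=OPTION_TAGS):
    """Return the option tags in a Require header value that are not supported."""
    tags = [t.strip().lower() for t in require_value.split(",") if t.strip()]
    return [t for t in tags if t not in supported]


missing = unsupported_requirements("100rel, foo")
if missing:
    # A real UAS would reply "420 Bad Extension" with this header.
    print("Unsupported: " + ", ".join(missing))
```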

Event Package	RFC	Purpose
presence	3856	User presence status
message-summary	3842	Voicemail waiting indicator
refer	3515	REFER result notification
dialog	4235	Dialog state notification
conference	4575	Conference state
reg	3680	Registration state

Conclusion

SIP is a powerful and flexible protocol that has become the foundation of modern IP telephony and multimedia communication. We covered the essentials of SIP: its architecture (user agents, proxies, registrars, etc.), the format of SIP messages (methods, responses, and headers), the typical call setup and teardown process, and many important extensions that enhance SIP's capabilities (from reliability to event notifications and beyond).

With this understanding, one should be able to read a SIP call flow and make sense of the messages, or configure/troubleshoot a SIP-based system with a clear mental model of what's supposed to happen. This guide serves as a high-level reference – essentially a cheat sheet – to demystify SIP signaling without drowning in the full RFC jargon. For deeper dives, the relevant RFCs (3261 and others cited) can be consulted for exact details, but often remembering the key concepts and how they fit together is enough to work effectively with SIP.

SIP continues to evolve, especially in areas of security and large-scale deployments (for instance, SIP in IMS, the IP Multimedia Subsystem used in mobile networks, or extensions for emergency calling). However, the core principles remain as discussed. Armed with this knowledge, you should be able to approach those advanced uses with a solid foundation. SIP's beauty lies in its relative simplicity (text messages, clear roles) combined with extensibility: it is the lingua franca that different voice and video systems speak to interoperate on the Internet.

Key RFCs

RFC	Title	Category
RFC 3261	SIP: Session Initiation Protocol	Core
RFC 3262	Reliability of Provisional Responses (PRACK)	Reliability
RFC 3263	Locating SIP Servers	DNS/Routing
RFC 3264	An Offer/Answer Model with SDP	Media
RFC 3265	SIP-Specific Event Notification (obsoleted by RFC 6665)	Events
RFC 3311	The UPDATE Method	Session modification
RFC 3326	The Reason Header Field	Diagnostics
RFC 3428	Extension for Instant Messaging	IM
RFC 3515	The REFER Method	Call transfer
RFC 3581	Symmetric Response Routing (rport)	NAT
RFC 3856	A Presence Event Package	Presence
RFC 3903	Extension for Event State Publication	Presence
RFC 4028	Session Timers in SIP	Keep-alive
RFC 5411	A Hitchhiker's Guide to SIP	Reference
RFC 5626	Managing Client-Initiated Connections	NAT/Outbound
RFC 6665	SIP-Specific Event Notification (obsoletes RFC 3265)	Events