{{DISPLAYTITLE:Sniffer Internal Architecture}}
[[Category:Architecture]]
[[Category:Sniffer]]
'''This document describes the internal architecture of the VoIPmonitor sensor (sniffer). It covers the threading model, buffer architecture, packet processing pipeline, and database write mechanisms. Understanding these internals helps administrators diagnose performance issues and tune the sensor for optimal performance.'''
For deployment topology and configuration, see:
* [[Sniffing_modes|Deployment & Topology Guide]] - Where to deploy sensors
* [[Sniffer_configuration|Configuration Reference]] - All config parameters
* [[Scaling|Scaling Guide]] - Performance tuning
* [[Sniffer_troubleshooting|Troubleshooting Guide]] - Common issues
== Architecture Overview ==
<kroki lang="plantuml">
@startuml
skinparam shadowing false
skinparam defaultFontName Arial
skinparam rectangle {
  BorderColor #4A90E2
  BackgroundColor #FFFFFF
}
title VoIPmonitor Sniffer Internal Architecture
rectangle "Network Interface" as NIC #E8F4FD
rectangle "Ring Buffer\n(kernel space)" as RING #FFE6E6
rectangle "'''t0 Thread'''\nPacket Capture" as T0 #FFE6E6
rectangle "Packet Buffer\n(user space)" as PBUF #FFF3E6
rectangle "Preprocessing\nThreads" as PREPROC #E6FFE6
rectangle "Call Assembly\n& RTP Processing" as CALL #E6FFE6
rectangle "PCAP Write\nThreads" as PCAP #E6E6FF
rectangle "SQL Write\nThreads" as SQL #E6E6FF
database "Spool\n(/var/spool)" as DISK
database "MySQL/MariaDB" as DB
NIC -down-> RING : hardware IRQ
RING -down-> T0 : TPACKET_V3
T0 -down-> PBUF : packet data
PBUF -down-> PREPROC : parallel
PREPROC -down-> CALL : SIP/RTP
CALL -down-> PCAP : call data
CALL -down-> SQL : CDR data
PCAP -down-> DISK
SQL -down-> DB
note right of T0
  '''CRITICAL'''
  Single-threaded
  Monitor: t0CPU
end note
note right of PBUF
  max_buffer_mem
  controls size
end note
note right of SQL
  query_cache=yes
  for reliability
end note
@enduml
</kroki>
The VoIPmonitor sniffer uses a multi-stage pipeline architecture:
# '''Packet Capture (t0)''' - Single thread reads packets from kernel ring buffer
# '''Packet Buffer''' - User-space queue for packet distribution
# '''Preprocessing''' - Multiple threads parse SIP/RTP headers
# '''Call Assembly''' - Correlates packets into calls, calculates metrics
# '''Output''' - Parallel threads write PCAPs to disk and CDRs to database
== Threading Model ==
VoIPmonitor uses a multi-threaded architecture with specialized threads for different tasks. Understanding which thread is bottlenecked helps target optimizations.
=== Thread Types ===
{| class="wikitable"
|-
! Thread !! Function !! Scaling !! Monitor
|-
| '''t0''' || Packet capture from kernel || '''Single thread''' (cannot scale) || <code>t0CPU</code> in logs
|-
| '''Preprocessing''' || SIP/RTP header parsing || Multiple threads || Thread count in logs
|-
| '''RTP Processing''' || Jitterbuffer simulation, MOS calculation || Per-call || CPU usage
|-
| '''PCAP Writers''' || Compress and write PCAP files || <code>pcap_dump_writethreads_max</code> || I/O wait
|-
| '''SQL Writers''' || Insert CDRs into database || <code>mysqlstore_max_threads_cdr</code> || <code>SQLq</code> in logs
|}
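To see these threads on a live system, the manager interface described later on this page can list per-thread statistics, and standard Linux tools show per-thread CPU usage; a quick sketch, assuming the sensor process is named <code>voipmonitor</code>:
<syntaxhighlight lang="bash">
# Ask the sniffer for its per-thread statistics via the manager API (see Manager API section below)
echo 'sniffer_threads' | nc 127.0.0.1 5029

# Per-thread CPU usage of the running sensor process
top -H -p "$(pidof voipmonitor)"
</syntaxhighlight>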
=== The Critical t0 Thread ===
The t0 thread is the '''most critical component''' of the sniffer. It runs on a single CPU core and reads all packets from the network interface. If t0CPU approaches 100%, packets will be dropped.
{{Warning|The t0 thread cannot be parallelized. If it becomes a bottleneck, you must either reduce load (filters, disable features) or use kernel-bypass solutions ([[DPDK]], PF_RING, [[Napatech]]).}}
'''Monitoring t0CPU:'''
<syntaxhighlight lang="bash">
# View current t0CPU in real-time
journalctl -u voipmonitor -f | grep t0CPU
# Example output showing healthy t0CPU (23.4%):
# t0CPU[23.4%] t1CPU[0.7%] t2CPU[0.3%] rss/vsize[2.1G/14.6G]
</syntaxhighlight>
'''Symptoms of t0 overload:'''
* <code>t0CPU > 90%</code> in logs
* Increasing packet drops: <code>ip -s link show eth0</code>
* Missing call legs or incomplete CDRs
'''Solutions:'''
* Use <code>interface_ip_filter</code> instead of BPF <code>filter</code>
* Disable jitterbuffer analysis if not needed
* Upgrade to kernel-bypass: [[DPDK]], [[Napatech]]
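Before tuning, it is worth confirming whether the kernel itself is already dropping packets; a quick check with standard tools (the interface name is an example):
<syntaxhighlight lang="bash">
# RX drop counters on the capture interface
ip -s link show eth0

# Kernel softnet backlog: the second column of each row is that CPU's drop counter
cat /proc/net/softnet_stat
</syntaxhighlight>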
== Buffer Architecture ==
VoIPmonitor uses multiple buffer layers to handle traffic bursts and prevent packet loss.
<kroki lang="mermaid">
%%{init: {'flowchart': {'nodeSpacing': 15, 'rankSpacing': 30}}}%%
flowchart LR
    subgraph Kernel
        RB[Ring Buffer<br/>ringbuffer]
    end
    subgraph UserSpace
        PB[Packet Buffer<br/>max_buffer_mem]
        QC[Query Cache<br/>query_cache]
    end
    subgraph Storage
        DISK[(Spool)]
        DB[(MySQL)]
    end
    RB --> PB
    PB --> QC
    QC --> DB
    PB --> DISK
</kroki>
=== Ring Buffer (Kernel Space) ===
The ring buffer is a circular queue in kernel memory where the NIC driver places incoming packets. VoIPmonitor reads from this buffer using <code>TPACKET_V3</code>.
{| class="wikitable"
|-
! Parameter !! Default !! Description
|-
| <code>ringbuffer</code> || 50 || Size in MB (per interface)
|}
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
# Increase for high-traffic or bursty environments
ringbuffer = 200
</syntaxhighlight>
{{Tip|Increase <code>ringbuffer</code> if you see "ring buffer overflow" messages or during traffic spikes. Typical values: 50-500 MB depending on traffic volume.}}
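To confirm whether past traffic bursts have already overflowed the ring buffer, search the sensor log for the overflow message mentioned above:
<syntaxhighlight lang="bash">
# Any matches indicate the ring buffer was too small during a traffic burst
journalctl -u voipmonitor | grep -i 'ring buffer'
</syntaxhighlight>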
=== Packet Buffer (User Space) ===
After reading from the ring buffer, packets are queued in user-space memory for processing by worker threads.
{| class="wikitable"
|-
! Parameter !! Default !! Description
|-
| <code>max_buffer_mem</code> || 2000 || Maximum memory in MB for packet buffering
|}
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
# Increase for servers with ample RAM
max_buffer_mem = 4000
</syntaxhighlight>
'''Symptoms of buffer exhaustion:'''
* Log message: <code>PACKETBUFFER: MEMORY IS FULL</code>
* Increasing packet drops
'''Solutions:'''
* Increase <code>max_buffer_mem</code>
* Add more preprocessing threads
* Investigate database bottleneck (see [[#Database Write Pipeline|Database Write Pipeline]])
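A quick way to check for past or ongoing buffer exhaustion, using the log message quoted above (the <code>heap[...]</code> token in the periodic status line also reflects buffer usage):
<syntaxhighlight lang="bash">
# Look for packet buffer exhaustion events in the sensor log
journalctl -u voipmonitor | grep -i 'MEMORY IS FULL'

# Watch the heap[...] token in the live status line
journalctl -u voipmonitor -f | grep -o 'heap\[[^]]*\]'
</syntaxhighlight>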
=== Query Cache (Disk-based) ===
When the database cannot keep up with CDR inserts, VoIPmonitor queues SQL statements to disk files (<code>qoq*</code> files in spool directory). This prevents data loss during database outages or slowdowns.
{| class="wikitable"
|-
! Parameter !! Default !! Description
|-
| <code>query_cache</code> || no || Enable disk-based SQL queue
|}
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
# CRITICAL: Enable for production systems
query_cache = yes
</syntaxhighlight>
{{Warning|Without <code>query_cache=yes</code>, CDRs may be lost if the database is temporarily unavailable or slow. Always enable this in production.}}
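When the cache is active and the database falls behind, the queued statements are visible as <code>qoq*</code> files in the spool directory; a minimal check, assuming the default spool path <code>/var/spool/voipmonitor</code>:
<syntaxhighlight lang="bash">
# Pending SQL queue files; these grow while the database is slow or unreachable
ls -lh /var/spool/voipmonitor/qoq* 2>/dev/null
</syntaxhighlight>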
== Packet Processing Pipeline ==
=== Stage 1: Packet Capture ===
The t0 thread reads packets using Linux's high-performance <code>TPACKET_V3</code> interface (or DPDK/Napatech if configured).
'''Capture sources supported:'''
* Standard Linux interfaces (eth0, bond0, etc.)
* VLAN-tagged traffic
* Tunneled traffic: GRE, ERSPAN, VXLAN, TZSP, HEP
* [[Audiocodes_tunneling|AudioCodes Debug Recording]]
* [[DPDK]] or [[Napatech]] for kernel bypass
=== Stage 2: Packet Classification ===
Packets are classified by protocol:
* '''SIP''' - Matched by port (<code>sipport</code> config) and content inspection
* '''RTP/RTCP''' - Matched by correlation with SIP SDP or heuristics
* '''Other''' - Tunneling protocols, management traffic
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
# Define SIP ports (comma-separated or ranges)
sipport = 5060,5061,5080
</syntaxhighlight>
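If calls are missing, verify that SIP traffic is actually reaching the sensor on the configured ports before suspecting classification; a capture spot-check (interface and port are examples):
<syntaxhighlight lang="bash">
# Show a few SIP packets on one of the configured ports
tcpdump -ni eth0 -c 10 port 5060
</syntaxhighlight>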
=== Stage 3: Call Assembly ===
VoIPmonitor correlates packets into calls using multiple methods:
{| class="wikitable"
|-
! Method !! Used For !! Identifier
|-
| '''Call-ID''' || SIP dialog correlation || <code>Call-ID</code> header
|-
| '''SSRC''' || RTP stream correlation || RTP SSRC field
|-
| '''SDP Ports''' || RTP-to-SIP binding || Ports from SDP offer/answer
|-
| '''Custom Headers''' || Multi-leg correlation || <code>matchheader</code> config
|}
For complex scenarios with multiple call legs, see [[Merging_or_correlating_multiple_call_legs|Call Correlation Guide]].
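Before configuring <code>matchheader</code>, it can help to confirm that each leg's SIP dialog is visible on the wire; a rough spot-check with tshark, assuming it is installed (the interface name is an example):
<syntaxhighlight lang="bash">
# Print the Call-ID of a few INVITEs to confirm both legs' dialogs are captured
tshark -i eth0 -c 20 -Y 'sip.Method == "INVITE"' -T fields -e sip.Call-ID
</syntaxhighlight>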
=== Stage 4: Quality Analysis ===
For each RTP stream, VoIPmonitor calculates:
* '''Packet Loss''' - Missing sequence numbers
* '''Jitter''' - Packet delay variation
* '''MOS Score''' - Simulated Mean Opinion Score (three variants: F1, F2, adapt)
{{Note|MOS calculation uses jitterbuffer simulation. Disabling jitterbuffer (<code>jitterbuffer_f1=no</code>, etc.) saves CPU but removes quality metrics.}}
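To check which jitterbuffer variants are currently enabled (and therefore which MOS values will be computed), the sensor configuration can be inspected directly:
<syntaxhighlight lang="bash">
# Show jitterbuffer-related options set in the sensor configuration
grep -E '^jitterbuffer' /etc/voipmonitor.conf
</syntaxhighlight>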
=== Stage 5: Output ===
Completed calls are written to:
* '''PCAP files''' - Raw packet captures (grouped into TAR archives per minute)
* '''Database''' - CDR records with all metadata and quality metrics
== Database Write Pipeline ==
The database write pipeline is often the bottleneck in high-traffic deployments.
<kroki lang="plantuml">
@startuml
skinparam shadowing false
skinparam defaultFontName Arial
title Database Write Pipeline
rectangle "Call\nCompleted" as CALL
rectangle "SQL\nGenerator" as GEN
rectangle "Memory\nQueue" as MEM
rectangle "Disk Queue\n(query_cache)" as DISK #FFF3E6
database "MySQL" as DB
CALL -right-> GEN : CDR data
GEN -right-> MEM : INSERT stmt
MEM -down-> DISK : if queue full
MEM -right-> DB : normal flow
DISK -right-> DB : replay on recovery
note bottom of DISK
  qoq* files in spool
  Prevents data loss
end note
@enduml
</kroki>
=== Key Parameters ===
{| class="wikitable"
|-
! Parameter !! Default !! Description
|-
| <code>mysqlstore_max_threads_cdr</code> || 1 || Parallel CDR insert threads
|-
| <code>quick_save_cdr</code> || no || Faster CDR saving (yes/quick)
|-
| <code>query_cache</code> || no || Disk-based SQL queue
|-
| <code>cdr_partition</code> || yes || Daily table partitioning
|}
=== Monitoring SQL Queue ===
The <code>SQLq</code> metric in logs shows pending SQL statements:
<syntaxhighlight lang="bash">
# Monitor SQL queue in real-time
journalctl -u voipmonitor -f | grep SQLq
# Example output:
# SQLq[cdr: 0] SQLf[cdr: 0]  # Healthy - no backlog
# SQLq[cdr: 5234] SQLf[cdr: 12]  # Backlog - database slow
</syntaxhighlight>
'''When SQLq is growing:'''
* Database cannot keep up with insert rate
* Check MySQL performance: <code>innodb_buffer_pool_size</code>, disk I/O
* Increase <code>mysqlstore_max_threads_cdr</code> (with caution)
* See [[Database_troubleshooting|Database Troubleshooting]] for detailed guidance
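Two quick database-side checks that often explain a growing SQLq (credentials and socket paths are environment-specific):
<syntaxhighlight lang="bash">
# Is the InnoDB buffer pool sized appropriately? (typically a large share of RAM on a dedicated DB host)
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';"

# Are CDR inserts piling up or waiting on I/O?
mysql -e "SHOW FULL PROCESSLIST;" | grep -i insert
</syntaxhighlight>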
== Manager API (Port 5029) ==
The sniffer exposes a management interface on TCP port 5029 (configurable via <code>managerport</code>).
=== Configuration ===
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
managerip = 127.0.0.1     # Bind address (127.0.0.1 for local only)
managerport = 5029        # TCP port
# OR use a Unix socket instead:
# manager_socket = /tmp/vm_manager_socket
</syntaxhighlight>
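A basic reachability check for the manager port, useful when the GUI reports a sensor as unreachable:
<syntaxhighlight lang="bash">
# Confirm the sniffer is listening on the manager port
ss -ltnp | grep 5029
</syntaxhighlight>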
=== Common Commands ===
<syntaxhighlight lang="bash">
# Get sniffer version
echo 'sniffer_version' | nc 127.0.0.1 5029

# List active calls
echo 'listcalls' | nc 127.0.0.1 5029

# List active registrations
echo 'listregisters' | nc 127.0.0.1 5029

# Get thread statistics
echo 'sniffer_threads' | nc 127.0.0.1 5029

# Reload configuration
echo 'reload' | nc 127.0.0.1 5029
</syntaxhighlight>
=== sniffer_stat Output ===
The <code>sniffer_stat</code> command returns JSON with detailed sensor status:
<syntaxhighlight lang="bash">
echo 'sniffer_stat' | nc 127.0.0.1 5029 | jq .
</syntaxhighlight>
The SQL queue metric is located in the <code>pbStatString</code> field:
{| class="wikitable"
|-
! Field !! Format !! Example
|-
| <code>pbStatString</code> || Text string containing the SQLq metric || <code>SQLq[C1:145 / 0.059s / 2q/s]</code>
|}
{{Note|If the SQL queue is empty, the <code>SQLq</code> string will not appear in <code>pbStatString</code>.}}
Example output showing SQLq in pbStatString:
<syntaxhighlight lang="json">
{
  "pbStatString": "calls[315][355] PS[C:4 S:29 R:6354] SQLq[C1:145 / 0.059s / 2q/s] heap[0|0|0] [12.6Mb/s] t0CPU[5.2%]",
  "version": "2025.9.0",
  ...
}
</syntaxhighlight>
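For scripting or external monitoring, the SQLq token can be extracted from <code>pbStatString</code> with standard tools; a minimal sketch, assuming <code>jq</code> is installed and the manager listens on the default local port (it prints nothing when the queue is empty):
<syntaxhighlight lang="bash">
# Pull pbStatString out of sniffer_stat and isolate the SQLq[...] token
echo 'sniffer_stat' | nc 127.0.0.1 5029 | jq -r '.pbStatString' | grep -o 'SQLq\[[^]]*\]'
</syntaxhighlight>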
{{Note|The GUI communicates with sensors via this API. If the GUI cannot connect to a sensor, verify port 5029 is accessible and the service is running.}}
== Memory Management ==
VoIPmonitor's memory usage depends on:
* Number of concurrent calls
* Buffer sizes (ringbuffer, max_buffer_mem)
* Call recording settings
* SQL queue depth
=== Monitoring Memory ===
<syntaxhighlight lang="bash">
# Memory shown in logs (RSS = physical, VSZ = virtual)
journalctl -u voipmonitor -f | grep rss
# Example: rss/vsize[2.1G/14.6G]
</syntaxhighlight>
=== Preventing OOM ===
{| class="wikitable"
|-
! Symptom !! Cause !! Solution
|-
| OOM killer terminates sniffer || Insufficient RAM || Add RAM or reduce <code>max_buffer_mem</code>
|-
| Memory grows continuously || SQL queue backlog || Fix database performance
|-
| High VSZ, normal RSS || Normal behavior || Virtual memory is pre-allocated, not consumed
|}
{{Tip|Set <code>query_cache=yes</code> to move the SQL queue to disk instead of RAM during database slowdowns.}}
== See Also ==
* [[Sniffer_configuration|Configuration Reference]] - All config parameters
* [[Scaling|Scaling Guide]] - Performance tuning
* [[Database_troubleshooting|Database Troubleshooting]] - SQL queue issues
* [[DPDK|DPDK Guide]] - Kernel bypass for high traffic
* [[Napatech|Napatech Integration]] - Hardware acceleration
* [[Sniffer_troubleshooting|Troubleshooting]] - Common issues


== AI Summary for RAG ==

'''Summary:''' This document describes the internal architecture of the VoIPmonitor sniffer. The sniffer uses a multi-stage pipeline: (1) the t0 thread captures packets from the kernel ring buffer using TPACKET_V3, (2) packets are queued in the user-space packet buffer (max_buffer_mem), (3) preprocessing threads parse SIP/RTP, (4) call assembly correlates packets into calls using Call-ID/SSRC/SDP, (5) parallel threads write PCAPs to disk and CDRs to the database. Critical metrics: t0CPU (must stay below 90%), SQLq (database queue depth), rss/vsize (memory usage). Key buffers: ringbuffer (kernel, default 50 MB), max_buffer_mem (user space, default 2000 MB), query_cache (disk-based SQL queue for reliability). The manager API on port 5029 provides the control interface for the GUI and CLI tools.

'''Keywords:''' sniffer architecture, t0 thread, t0CPU, ringbuffer, max_buffer_mem, packet buffer, query_cache, SQLq, threading model, packet capture, TPACKET_V3, call assembly, RTP correlation, manager API, port 5029, memory management, OOM, database pipeline, mysqlstore_max_threads_cdr, quick_save_cdr

'''Key Questions:'''
* What is the t0 thread and why is it critical?
* How do I monitor t0CPU and what does high t0CPU mean?
* What is the ringbuffer and how do I size it?
* What is max_buffer_mem and when should I increase it?
* What does "PACKETBUFFER: MEMORY IS FULL" mean?
* What is query_cache and why should I enable it?
* How do I monitor the SQL queue (SQLq)?
* What is the manager API and what port does it use?
* How does VoIPmonitor correlate packets into calls?
* What causes OOM errors and how do I prevent them?
* How many threads does VoIPmonitor use?
* What is the difference between ringbuffer and max_buffer_mem?