Scaling: Difference between revisions

From VoIPmonitor.org
'''This guide provides a comprehensive overview of performance tuning and scaling for VoIPmonitor. It covers the three primary system bottlenecks and offers practical, expert-level advice for optimizing your deployment for high traffic loads.'''
{{DISPLAYTITLE:Scaling and Performance Tuning}}
[[Category:Administration]]
 
This guide covers performance tuning for high-traffic VoIPmonitor deployments, addressing the three primary system bottlenecks.


== Understanding Performance Bottlenecks ==
A VoIPmonitor deployment's maximum capacity is determined by three potential bottlenecks. Identifying and addressing the correct one is key to achieving high performance.
# '''Packet Capturing (CPU & Network Stack):''' The ability of a single CPU core to read packets from the network interface. This is often the first limit you will encounter.
# '''Disk I/O (Storage):''' The speed at which the sensor can write PCAP files to disk. This is critical when call recording is enabled.
# '''Database Performance (MySQL/MariaDB):''' The rate at which the database can ingest Call Detail Records (CDRs) and serve data to the GUI.


On a modern, well-tuned server (e.g., 24-core Xeon, 10Gbit NIC), a single VoIPmonitor instance can handle up to '''10,000 concurrent calls''' with full RTP analysis and recording, or over '''60,000 concurrent calls''' with SIP-only analysis.
A VoIPmonitor deployment's capacity is limited by three potential bottlenecks:
 
<kroki lang="plantuml">
@startuml
skinparam shadowing false
skinparam defaultFontName Arial
 
title VoIPmonitor Performance Bottlenecks
 
rectangle "Network\nInterface" as NIC #E8F4FD
rectangle "Packet Capture\n(t0 thread)" as T0 #FFE6E6
rectangle "RTP/SIP\nProcessing" as PROC #E6FFE6
rectangle "PCAP Files\nStorage" as DISK #FFF3E6
database "MySQL/MariaDB" as DB #E6E6FF
 
NIC -right-> T0 : "1. CPU"
T0 -right-> PROC
PROC -down-> DISK : "2. I/O"
PROC -right-> DB : "3. Database"
 
note bottom of T0
  Monitor: t0CPU
  Limit: 1 CPU core
end note
 
note bottom of DISK
  Monitor: iostat
  Solution: SSD, TAR
end note


note bottom of DB
  Monitor: SQLq
  Solution: RAM, tuning
end note
@enduml
</kroki>

== 1. Optimizing Packet Capturing (CPU & Network) ==
The most performance-critical task is the initial packet capture, which is handled by a single, highly optimized thread (t0). If this thread's CPU usage (<code>t0CPU</code> in the logs) approaches 100%, you are hitting the capture limit. The primary methods to optimize it follow, from easiest to most advanced.


=== A. Use a Modern Linux Kernel & VoIPmonitor Build ===
Modern Linux kernels (3.2+) and VoIPmonitor builds include '''TPACKET_V3''' support, a high-speed packet capture mechanism. This is the single most important factor for high performance.
* '''Recommendation:''' Always use a recent Linux distribution (such as AlmaLinux, Rocky Linux, or Debian) and the latest VoIPmonitor static binary. With this combination, a standard Intel 10Gbit NIC can often handle up to 2 Gbit/s of VoIP traffic without special drivers.

{| class="wikitable"
|-
! Bottleneck !! Description !! Monitor
|-
| '''1. Packet Capture''' || Single CPU core reading packets from NIC || <code>t0CPU</code> in syslog
|-
| '''2. Disk I/O''' || Writing PCAP files to storage || <code>iostat</code>, <code>ioping</code>
|-
| '''3. Database''' || CDR ingestion and GUI queries || <code>SQLq</code> in syslog
|}


=== B. Network Stack & Driver Tuning ===
'''Capacity:''' A modern server (24-core Xeon, 10Gbit NIC) can handle '''~10,000 concurrent calls''' with full RTP recording, or '''60,000+''' with SIP-only analysis.
For high-traffic environments (>500 Mbit/s), fine-tuning the network driver and kernel parameters is essential.


==== NIC Ring Buffer ====
== Optimizing Packet Capture (CPU & Network) ==
The ring buffer is a queue between the network card driver and the VoIPmonitor application. A larger buffer prevents packet loss during short CPU usage spikes.
# '''Check maximum size:'''
#<syntaxhighlight lang="bash">ethtool -g eth0</syntaxhighlight>
# '''Set to maximum (e.g., 16384):'''
#<syntaxhighlight lang="bash">ethtool -G eth0 rx 16384</syntaxhighlight>


==== Interrupt Coalescing ====
The packet capture thread (t0) runs on a single CPU core. If <code>t0CPU</code> approaches 100%, you've hit the capture limit.
This setting batches multiple hardware interrupts into one, reducing CPU overhead.
 
#<syntaxhighlight lang="bash">ethtool -C eth0 rx-usecs 1022</syntaxhighlight>
With a modern kernel and VoIPmonitor build, a standard Intel 10Gbit NIC handles up to 3 Gbit/s of VoIP traffic without special drivers, and almost the full 10 Gbit/s rate with [[DPDK]].
 
=== Threading (Automatic) ===
 
Since version 2023.11, VoIPmonitor uses <code>threading_expanded=yes</code> by default, which automatically spawns threads based on CPU load. '''No manual threading configuration is needed.'''
 
For very high traffic (≥1500 Mbit/s), set:
<syntaxhighlight lang="ini">
threading_expanded = high_traffic
</syntaxhighlight>
 
See [[Sniffer_configuration#Threading_Model|Threading Model]] for details.
 
=== NIC Tuning (>500 Mbit/s) ===

<syntaxhighlight lang="bash">
# Increase ring buffer (prevents packet loss during CPU spikes)
ethtool -g eth0                  # Check max size
ethtool -G eth0 rx 16384         # Set to max
# Enable interrupt coalescing (reduces CPU overhead)
ethtool -C eth0 rx-usecs 1022
</syntaxhighlight>

'''Persistent settings''' (Debian/Ubuntu <code>/etc/network/interfaces</code>):
<syntaxhighlight lang="ini">
auto eth0
iface eth0 inet manual
    up ip link set $IFACE up
    up ip link set $IFACE promisc on
    up ethtool -G $IFACE rx 16384
    up ethtool -C $IFACE rx-usecs 1022
</syntaxhighlight>
''Note: Modern systems using NetworkManager or systemd-networkd require different configuration methods.''


=== C. Advanced Offloading and Kernel-Bypass Solutions ===
=== Configuration Optimizations ===
If kernel and driver tuning are insufficient, you can offload the capture process entirely by bypassing the kernel's network stack.
 
{| class="wikitable"
|-
! Parameter !! Purpose !! Recommendation
|-
| <code>interface_ip_filter</code> || IP-based filtering || More efficient than BPF <code>filter</code>
|-
| <code>pcap_dump_writethreads_max</code> || Compression threads || Set to CPU core count
|-
| <code>jitterbuffer_f1/f2/adapt</code> || Jitter simulation || Keep <code>f2=yes</code>, disable f1 and adapt to save CPU while keeping MOS
|}
 
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
 
# Efficient IP filtering (replaces BPF filter)
interface_ip_filter = 192.168.0.0/24
interface_ip_filter = 10.0.0.0/8
 
# Compression scaling
pcap_dump_writethreads = 1
pcap_dump_writethreads_max = 32
pcap_dump_asyncwrite = yes
</syntaxhighlight>
 
{{Note|1=Recommended: <code>jitterbuffer_f1=no</code>, <code>jitterbuffer_f2=yes</code>, <code>jitterbuffer_adapt=no</code>. This saves CPU while preserving MOS-F2 metrics. Only disable f2 if you don't need quality monitoring at all.}}


* '''DPDK (Data Plane Development Kit):''' DPDK is a set of libraries and drivers for fast packet processing. VoIPmonitor can leverage DPDK to read packets directly from the network card, completely bypassing the kernel and significantly reducing CPU overhead. This is a powerful, open-source solution for achieving multi-gigabit capture rates on commodity hardware. For detailed installation and configuration instructions, see the [[DPDK|official DPDK guide]].
=== Kernel-Bypass Solutions ===


* '''PF_RING ZC/DNA:''' A commercial software driver from ntop.org that also dramatically reduces CPU load by bypassing the kernel. In tests, it can reduce CPU usage from 90% to as low as 20% for the same traffic load.
For extreme loads, bypass the kernel network stack entirely:


* '''Napatech SmartNICs:''' Specialized hardware acceleration cards that deliver packets to VoIPmonitor with near-zero CPU overhead (<3% CPU for 10 Gbit/s traffic). This is the ultimate solution for extreme performance requirements.
{| class="wikitable"
|-
! Solution !! Type !! CPU Reduction !! Use Case
|-
| '''[[DPDK]]''' || Open-source || ~70% || Multi-gigabit on commodity hardware
|-
| '''PF_RING ZC''' || Commercial || 90% → 20% || High-volume enterprise
|-
| '''[[Napatech|Napatech SmartNICs]]''' || Hardware || <3% at 10Gbit/s || Extreme performance
|}


== 2. Optimizing Disk I/O ==
== Optimizing Disk I/O ==
VoIPmonitor's modern storage engine is highly optimized to minimize random disk access, which is the primary cause of I/O bottlenecks.


=== VoIPmonitor Storage Strategy ===
Instead of writing a separate PCAP file for each call (which causes massive I/O load), VoIPmonitor groups all calls starting within the same minute into a single compressed `.tar` archive. This changes the I/O pattern from thousands of small, random writes to a few large, sequential writes, reducing IOPS (I/O Operations Per Second) by a factor of 10 or more. A standard 7200 RPM SATA drive can typically handle up to 2000 concurrent calls with full recording.
 
VoIPmonitor groups all calls starting within the same minute into a single compressed <code>.tar</code> archive. This turns thousands of small random writes into a few large sequential writes, reducing IOPS by a factor of 10 or more.
 
'''Typical capacity:''' 7200 RPM SATA handles ~2,000 concurrent calls with full recording.


=== Filesystem Tuning (ext4) ===
For the spool directory (<code>/var/spool/voipmonitor</code>), an optimized ext4 filesystem can improve performance.

<syntaxhighlight lang="bash">
# Format without a journal (use with caution; requires a battery-backed RAID controller)
mke2fs -t ext4 -O ^has_journal /dev/sda2
</syntaxhighlight>

<syntaxhighlight lang="ini">
# /etc/fstab
/dev/sda2   /var/spool/voipmonitor  ext4   errors=remount-ro,noatime,data=writeback,barrier=0 0 0
</syntaxhighlight>


=== RAID Controller Cache Policy ===
{{Warning|1=Disabling journal removes crash protection. Only use with battery-backed RAID controller (BBU).}}
A misconfigured RAID controller is a common bottleneck. For database and spool workloads, the cache policy should always be set to '''WriteBack''', not WriteThrough (this applies for RPM disks, not fast SSD). This requires a healthy Battery Backup Unit (BBU). If the BBU is dead or missing, you may need to force this setting.
* ''The specific commands vary by vendor (`megacli`, `ssacli`, `perccli`). Refer to the original, more detailed version of this article or vendor documentation for specific commands for LSI, HP, and Dell controllers.''


== 3. Optimizing Database Performance (MySQL/MariaDB) ==
=== RAID Controller ===
A well-tuned database is critical for both data ingestion from the sensor and responsiveness of the GUI.


=== Key Configuration Parameters ===
Set cache policy to '''WriteBack''' (not WriteThrough). Requires healthy BBU. Commands vary by vendor (<code>megacli</code>, <code>ssacli</code>, <code>perccli</code>).
These settings should be placed in your `my.cnf` or a file in `/etc/mysql/mariadb.conf.d/`.


* '''`innodb_buffer_pool_size`''': '''This is the most important setting.''' It defines the amount of memory InnoDB uses to cache both data and indexes.
== Optimizing Database Performance ==


'''WARNING for shared servers (VoIPmonitor + MySQL on same host):'''
=== Memory Configuration ===
If VoIPmonitor and MySQL are running on the same server, setting `innodb_buffer_pool_size` too high will cause Out of Memory (OOM) killer events. MySQL will consume most of the available RAM, causing the kernel to kill processes (usually MySQL first, then VoIPmonitor), resulting in CDR delays, crashes, and instability. See [[Sniffer_troubleshooting|Step 8: Check for OOM Issues]] for detailed troubleshooting.


The most critical parameter is <code>innodb_buffer_pool_size</code>.

{{Warning|1=Setting it too high causes OOM killer events, CDR delays, and crashes. See [[Sniffer_troubleshooting#Check_for_OOM_.28Out_of_Memory.29_Issues|OOM Troubleshooting]].}}

'''Buffer Pool Sizing:'''

{| class="wikitable"
|-
! Server Type !! Formula !! Example (32GB RAM)
|-
| '''Shared''' (VoIPmonitor + MySQL) || (Total RAM - VoIPmonitor - OS) / 2 || 14GB
|-
| '''Dedicated''' MySQL server || 50-70% of total RAM || 16-22GB
|}

'''RAM Recommendations:'''

{| class="wikitable"
|-
! Deployment Size !! Minimum !! Recommended
|-
| Small (<500 calls) || 8GB || 16GB
|-
| Medium (500-2000) || 16GB || 32GB
|-
| Large (>2000) || 32GB || 64GB+
|}

=== Key MySQL Parameters ===

<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf or mariadb.conf.d/50-server.cnf
[mysqld]
innodb_buffer_pool_size = 14G
innodb_flush_log_at_trx_commit = 2  # Faster, minimal data loss risk
innodb_file_per_table = 1           # Essential for partitioning
innodb_compression_algorithm = lz4  # MariaDB only
</syntaxhighlight>

=== Slow Query Log ===

The slow query log can consume significant memory. Consider disabling it on high-traffic systems:

<syntaxhighlight lang="ini">
[mysqld]
slow_query_log = 0
# Or increase threshold: long_query_time = 600
</syntaxhighlight>

=== Practical Memory Optimization Steps ===

If your server experiences database connection failures, slow CDR queries, or OOM killer events:

;1. Increase Server RAM
The most effective solution for memory pressure is to add more RAM to your server. Database performance scales directly with available memory. As a general rule:
* '''Minimum recommended:''' 16GB RAM for production systems
* '''Recommended for high traffic:''' 32GB or more
* Large database servers should have 64GB+ to maximize InnoDB buffer pool effectiveness

;2. Run Headless (No Graphical Desktop)
A graphical desktop environment (X Window System) can consume 1-2GB of RAM that could otherwise be used by the database. VoIPmonitor is managed entirely through a web interface and ''does not require a desktop environment''.

<syntaxhighlight lang="bash">
# Stop and disable the graphical display manager (method varies by distribution)
# For Ubuntu/Debian with GDM:
systemctl stop gdm
systemctl disable gdm

# For CentOS/RHEL with GNOME:
systemctl stop gdm
systemctl disable gdm

# For X.org generic:
systemctl stop display-manager
systemctl disable display-manager

# Set the default target to multi-user (no GUI)
systemctl set-default multi-user.target
</syntaxhighlight>

To verify memory usage before and after:
<syntaxhighlight lang="bash">
# Check total memory and usage before removal
free -h

# Identify services using the most memory
ps aux --sort=-%mem | head -10

# After disabling the GUI, verify memory freed
free -h
</syntaxhighlight>


;3. Monitor for OOM Killer Activity
After making changes, verify the system is stable by checking kernel logs for OOM killer events:

<syntaxhighlight lang="bash">
# Watch for recent OOM killer messages
dmesg -T | grep -i "killed process"

# If OOM killer is still active, it will show events like:
# Out of memory: Kill process 12345 (mysqld) score 900 or sacrifice child
</syntaxhighlight>

If you continue to see OOM killer events targeting MySQL or VoIPmonitor after adding RAM and removing the GUI, you may need to:
* Further reduce <code>innodb_buffer_pool_size</code> in your MySQL configuration
* Consider moving the database to a dedicated server
* Reduce call recording or traffic monitoring load

'''Calculation for shared servers:'''
<syntaxhighlight lang="bash">
Formula: innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS & services overhead - safety margin) / 2

Example for a 32GB server:
- Total RAM: 32GB
- VoIPmonitor process memory (check with ps aux): 2GB
- OS + other services overhead: 2GB
- Available for buffer pool: 28GB
- Recommended innodb_buffer_pool_size = 14G
</syntaxhighlight>

=== Database Partitioning ===

VoIPmonitor automatically partitions large tables (like <code>cdr</code>) by day. This is enabled by default and '''highly recommended'''.

See [[Data_Cleaning#Database_Cleaning_.28CDR_Retention.29|Database Partitioning]] for details.

=== Troubleshooting: Connection Refused ===

'''Symptoms:''' GUI crashes, "Connection refused" errors, intermittent issues during peak volumes.

'''Cause:''' <code>innodb_buffer_pool_size</code> too low (default 128M is insufficient).

'''Solution:''' Increase to 6G+ based on available RAM:

<syntaxhighlight lang="ini">
[mysqld]
innodb_buffer_pool_size = 6G
</syntaxhighlight>

<syntaxhighlight lang="bash">
systemctl restart mariadb
</syntaxhighlight>

== Component Separation (Multi-Host Architecture) ==

For deployments exceeding 5,000-10,000 concurrent calls, separate VoIPmonitor components onto dedicated hosts.

=== Architecture Overview ===

{| class="wikitable"
|-
! Host !! Component !! Primary Resources !! Scaling Strategy
|-
| '''Host 1''' || MySQL Database || RAM, fast SSD || Add RAM, read replicas
|-
| '''Host 2''' || Sensor(s) || CPU (t0 thread), network || DPDK/PF_RING, more sensors
|-
| '''Host 3''' || GUI || CPU, network || Load balancer, caching
|}

=== Configuration ===

'''MySQL Server:'''
<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf
[mysqld]
bind-address = 0.0.0.0
innodb_buffer_pool_size = 50G  # 50-70% RAM for dedicated server
</syntaxhighlight>

<syntaxhighlight lang="sql">
CREATE USER 'voipmonitor'@'%' IDENTIFIED BY 'strong_password';
GRANT ALL PRIVILEGES ON voipmonitor.* TO 'voipmonitor'@'%';
</syntaxhighlight>

'''Sensor:'''
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
id_sensor = 1
mysqlhost = mysql.server.ip
mysqldb = voipmonitor
mysqlusername = voipmonitor
mysqlpassword = strong_password
</syntaxhighlight>

'''GUI:''' Configure via Settings > System Configuration > Database, or edit <code>config/system_configuration.php</code>.

'''Firewall Rules:'''
{| class="wikitable"
|-
! Source !! Destination !! Port !! Purpose
|-
| Sensor || MySQL || 3306 || CDR writes
|-
| GUI || MySQL || 3306 || Queries
|-
| GUI || Sensor(s) || 5029 || PCAP retrieval
|-
| Users || GUI || 80, 443 || Web access
|}

{{Note|1=Component separation can be combined with [[Sniffer_distributed_architecture|Client-Server mode]] for multi-site deployments.}}


'''Calculation for dedicated database servers:'''
For a server running ONLY MySQL, a good starting point is 50-70% of the server's total available RAM. For a dedicated database server, 8GB is a good minimum, with 32GB or more being optimal for large databases.

* '''<code>innodb_flush_log_at_trx_commit = 2</code>''': The default value of <code>1</code> forces a write to disk for every single transaction, which is very slow without a high-end, battery-backed RAID controller. Setting it to <code>2</code> relaxes this, flushing logs to the OS cache and writing to disk once per second. This dramatically improves write performance with a minimal risk of data loss (max 1-2 seconds) in case of a full OS crash.
* '''<code>innodb_file_per_table = 1</code>''': This instructs InnoDB to store each table and its indexes in its own <code>.ibd</code> file, rather than in one giant, monolithic <code>ibdata1</code> file. This is essential for performance, management, and features like table compression and partitioning.

== Monitoring Performance ==

VoIPmonitor logs performance metrics every 10 seconds to syslog. Key metrics to watch:

{| class="wikitable"
|-
! Metric !! Warning Sign !! Bottleneck Type
|-
| <code>t0CPU</code> || >90% || CPU (packet capture limit)
|-
| <code>heap[A&#124;B&#124;C]</code> || A >50% || I/O or CPU (buffer filling)
|-
| <code>SQLq</code> || Growing || Database
|-
| <code>comp</code> || Maxed out || I/O (compression waiting for disk)
|}
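
The "Growing" warning sign for <code>SQLq</code> can be checked mechanically. The following is a minimal sketch, assuming the status line uses the <code>SQLq[N]</code> and <code>heap[A|B|C]</code> layout of the sample line quoted in this article; other builds may print these fields differently. The function names are hypothetical helpers, not part of VoIPmonitor.

```bash
#!/bin/sh
# Hypothetical helpers; field layout assumed from the sample status line.

# Print the SQLq value from a status line read on stdin.
sqlq_from_line() {
    sed -n 's/.*SQLq\[\([0-9]*\)].*/\1/p'
}

# Print the first heap value (A) from a status line read on stdin.
heap_a_from_line() {
    sed -n 's/.*heap\[\([0-9.]*\)|.*/\1/p'
}

# Succeed when each argument is larger than the one before it,
# i.e. the queue grew across all consecutive samples.
is_growing() {
    prev=""
    for v in "$@"; do
        if [ -n "$prev" ] && [ "$v" -le "$prev" ]; then
            return 1
        fi
        prev=$v
    done
    [ "$#" -ge 2 ]
}

line='voipmonitor[15567]: calls[315][355] PS[C:4 S:29/29 R:6354 A:6484] SQLq[0] heap[0|0|0] comp[54] [12.6Mb/s] t0CPU[5.2%] ... RSS/VSZ[323|752]MB'
echo "SQLq=$(printf '%s\n' "$line" | sqlq_from_line) heapA=$(printf '%s\n' "$line" | heap_a_from_line)"

# Three consecutive SQLq samples (made-up values for illustration).
if is_growing 10 50 200; then
    echo "SQLq is growing: the database is not keeping up"
fi
```

Comparing several consecutive samples rather than a single value avoids flagging a short burst that the database drains on its own.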


<syntaxhighlight lang="bash">
# Monitor in real-time
journalctl -u voipmonitor -f
</syntaxhighlight>

'''Main article: [[Syslog_Status_Line]]''' - Complete reference for all metrics with detailed explanations and troubleshooting guidance.

'''For bottleneck diagnosis:''' See [[Sniffer_troubleshooting#Diagnose:_I.2FO_vs_CPU_Bottleneck|I/O vs CPU Bottleneck Diagnosis]] for a step-by-step diagnostic procedure using syslog metrics and Linux tools.

== See Also ==

* [[Sniffer_troubleshooting]] - Troubleshooting including OOM issues
* [[Data_Cleaning]] - Database and spool retention
* [[Sniffer_configuration]] - Complete configuration reference
* [[DPDK]] - DPDK setup guide
* [[Sniffer_distributed_architecture]] - Client-Server mode

* '''LZ4 Compression:''' For modern MySQL/MariaDB versions, using LZ4 for page compression offers a great balance of reduced storage size and minimal CPU overhead.
<syntaxhighlight lang="bash">
# In my.cnf [mysqld] section
innodb_compression_algorithm=lz4
# For MariaDB, you may also need to set default table formats:
# mysqlcompress_type = PAGE_COMPRESSED=1 in voipmonitor.conf
</syntaxhighlight>

=== Database Partitioning ===
Partitioning is a feature where VoIPmonitor automatically splits large tables (like <code>cdr</code>) into smaller, more manageable pieces, typically one per day. This is enabled by default and is '''highly recommended'''.
* '''Benefits:'''
** Massively improves query performance in the GUI, as it only needs to scan partitions relevant to the selected time range.
** Allows for instant deletion of old data by dropping a partition, which is thousands of times faster than running a <code>DELETE</code> query on millions of rows.

== 4. Monitoring Live Performance ==
VoIPmonitor logs detailed performance metrics every 10 seconds to syslog. You can watch them live:
<syntaxhighlight lang="bash">
tail -f /var/log/syslog  # Debian/Ubuntu
tail -f /var/log/messages # CentOS/RHEL
</syntaxhighlight>
A sample log line:
<code>voipmonitor[15567]: calls[315][355] PS[C:4 S:29/29 R:6354 A:6484] SQLq[0] heap[0|0|0] comp[54] [12.6Mb/s] t0CPU[5.2%] ... RSS/VSZ[323|752]MB</code>

* '''<code>calls[X][Y]</code>''': X = active calls, Y = total calls in memory.
* '''<code>SQLq[C]</code>''': Number of SQL queries waiting to be sent to the database. If this number is consistently growing, your database cannot keep up.
* '''<code>heap[A|B|C]</code>''': Memory usage percentages for internal buffers. If A (main heap) reaches 100%, packets will be dropped.
* '''<code>t0CPU[X%]</code>''': '''The most important CPU metric.''' This is the usage of the main packet capture thread. If it consistently exceeds 90-95%, you are at your server's capture limit.

[[File:kernelstandarddiagram.png]]
[[File:ntop.png]]


== AI Summary for RAG ==
'''Summary:''' This article is an expert guide to scaling and performance tuning VoIPmonitor for high-traffic environments. It identifies the three main system bottlenecks: Packet Capturing (CPU-bound), Disk I/O (for PCAP storage), and Database Performance (MySQL/MariaDB). For packet capture, it details tuning network card drivers with <code>ethtool</code>, the benefits of modern kernels with <code>TPACKET_V3</code>, and advanced kernel-bypass solutions like DPDK, PF_RING, and Napatech SmartNICs. For I/O, it explains VoIPmonitor's efficient TAR-based storage and provides tuning tips for ext4 filesystems and RAID controller cache policies. The largest section focuses on MySQL/MariaDB tuning, with a critical warning about <code>innodb_buffer_pool_size</code> on shared servers: setting this parameter too high causes OOM killer events, CDR delays, crashes, and instability. The guide provides a calculation formula for shared servers: innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS & services overhead - safety margin) / 2. For a 32GB server with VoIPmonitor using 2GB, the recommended value is 14G. For dedicated database servers, 50-70% of total RAM is appropriate. The article also covers <code>innodb_flush_log_at_trx_commit</code>, <code>innodb_file_per_table</code>, and native LZ4 compression. It explains the critical role of database partitioning for query performance and data retention. Finally, it details how to interpret live performance statistics from the syslog to diagnose bottlenecks.

'''Keywords:''' performance tuning, scaling, high throughput, bottleneck, CPU bound, t0CPU, packet capture, TPACKET_V3, DPDK, ethtool, ring buffer, interrupt coalescing, PF_RING, Napatech, I/O bottleneck, IOPS, filesystem tuning, ext4, RAID cache, WriteBack, megacli, ssacli, MySQL performance, MariaDB tuning, innodb_buffer_pool_size, innodb_buffer_pool too high, buffer pool calculation, OOM killer crashes, shared server memory, innodb_flush_log_at_trx_commit, innodb_file_per_table, LZ4 compression, database partitioning, syslog, monitoring, high calls per second, CPS

'''Key Questions:'''
* How do I scale VoIPmonitor for thousands of concurrent calls?
* What are the main performance bottlenecks in VoIPmonitor?
* How can I fix high t0CPU usage?
* What is DPDK and when should I use it?
* What are the best <code>my.cnf</code> settings for a high-performance VoIPmonitor database?
* How does database partitioning work and why is it important?
* My sniffer is dropping packets, how do I fix it?
* How do I interpret the performance metrics in the syslog?
* Should I use a dedicated database server for VoIPmonitor?
* How do I calculate innodb_buffer_pool_size for a shared server?
* What happens if innodb_buffer_pool_size is set too high?
* Does high innodb_buffer_pool_size cause OOM killer crashes?
* What is the formula for innodb_buffer_pool_size on VoIPmonitor server?
* Should innodb_buffer_pool_size be 50-70% of RAM on shared servers?
* Should I add more RAM to improve VoIPmonitor database performance?
* How do I disable the graphical desktop environment to save memory on a VoIPmonitor server?
* How do I check for OOM killer activity with dmesg?
* How much RAM does VoIPmonitor database server need?
* Why should VoIPmonitor server run headless without GUI?
* How to stop X server and graphical desktop on Linux server?
* How much memory does a graphical desktop environment consume?
* How to check if MySQL was killed by OOM killer?
* What is the minimum recommended RAM for VoIPmonitor production server?

== AI Summary for RAG ==

<!-- This section is for AI/RAG systems. Do not edit manually. -->

=== Summary ===
Performance tuning guide for high-traffic VoIPmonitor deployments. Covers three main bottlenecks: CPU (t0 packet capture thread, single-core limit), Disk I/O (PCAP storage), and Database (MySQL/MariaDB). Threading is automatic since 2023.11 via threading_expanded=yes (use high_traffic for ≥1500 Mbit/s). NIC tuning: ethtool ring buffer and interrupt coalescing. CPU optimization: interface_ip_filter instead of BPF, jitterbuffer_f2=yes with f1/adapt disabled. Kernel bypass solutions: DPDK (~70% CPU reduction), PF_RING ZC, Napatech SmartNICs (<3% CPU at 10Gbit). Disk I/O: TAR archives reduce IOPS 10x, ext4 tuning (noatime, writeback), RAID WriteBack cache with BBU. Database: innodb_buffer_pool_size (50-70% RAM for dedicated server), innodb_flush_log_at_trx_commit=2, partitioning. Multi-host architecture for >5000-10000 concurrent calls separating MySQL, sensors, and GUI.

=== Keywords ===
scaling, performance, tuning, optimization, high traffic, bottleneck, CPU, t0CPU, t0 thread, single-core, disk I/O, storage, database, MySQL, MariaDB, threading_expanded, high_traffic, NIC tuning, ethtool, ring buffer, interrupt coalescing, interface_ip_filter, jitterbuffer, DPDK, PF_RING, Napatech, kernel bypass, TAR archive, ext4, noatime, writeback, RAID, WriteBack cache, BBU, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, partitioning, multi-host, component separation, concurrent calls, capacity, 10000 calls, heap, SQLq, compression threads, pcap_dump_writethreads

=== Key Questions ===
* How to tune VoIPmonitor for high traffic?
* How many concurrent calls can VoIPmonitor handle?
* What are the main performance bottlenecks?
* How to optimize CPU usage for packet capture?
* What is threading_expanded and when to use high_traffic?
* How to tune NIC for VoIPmonitor?
* How to reduce CPU with jitterbuffer settings?
* What is DPDK and when to use it?
* How to optimize disk I/O for PCAP storage?
* How to tune ext4 filesystem for VoIPmonitor?
* What is the recommended innodb_buffer_pool_size?
* How to configure MySQL for VoIPmonitor performance?
* When to separate VoIPmonitor components to multiple hosts?
* How to monitor VoIPmonitor performance metrics?
* What do t0CPU, heap, SQLq metrics mean?

Latest revision as of 21:52, 20 January 2026


This guide covers performance tuning for high-traffic VoIPmonitor deployments, addressing the three primary system bottlenecks.

Understanding Performance Bottlenecks

A VoIPmonitor deployment's capacity is limited by three potential bottlenecks:

{| class="wikitable"
|-
! Bottleneck !! Description !! Monitor
|-
| 1. Packet Capture || Single CPU core reading packets from NIC || t0CPU in syslog
|-
| 2. Disk I/O || Writing PCAP files to storage || iostat, ioping
|-
| 3. Database || CDR ingestion and GUI queries || SQLq in syslog
|}

Capacity: A modern server (24-core Xeon, 10Gbit NIC) can handle ~10,000 concurrent calls with full RTP recording, or 60,000+ with SIP-only analysis.

Optimizing Packet Capture (CPU & Network)

The packet capture thread (t0) runs on a single CPU core. If t0CPU approaches 100%, you've hit the capture limit.
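
As a rough illustration of watching for this limit, the sketch below pulls the t0CPU value out of one status line and compares it against a threshold. It assumes the <code>t0CPU[X%]</code> layout of the sample syslog line quoted elsewhere on this page; the function names and the default threshold of 90 are assumptions chosen for the example, not VoIPmonitor features.

```bash
#!/bin/sh
# Hypothetical helpers; the t0CPU[X%] layout is assumed from the sample line.

# Print the t0CPU percentage from a status line read on stdin.
t0cpu_from_line() {
    sed -n 's/.*t0CPU\[\([0-9.]*\)%].*/\1/p'
}

# Succeed when the value ($1) is at or above the threshold ($2, default 90).
t0cpu_is_high() {
    awk -v v="$1" -v t="${2:-90}" 'BEGIN { exit !(v + 0 >= t + 0) }'
}

line='voipmonitor[15567]: calls[315][355] PS[C:4 S:29/29 R:6354 A:6484] SQLq[0] heap[0|0|0] comp[54] [12.6Mb/s] t0CPU[5.2%] ... RSS/VSZ[323|752]MB'
cpu=$(printf '%s\n' "$line" | t0cpu_from_line)

if t0cpu_is_high "$cpu"; then
    echo "t0CPU ${cpu}%: at or near the capture limit"
else
    echo "t0CPU ${cpu}%: headroom available"
fi
```

On a live system the same functions could be fed from the syslog stream instead of the fixed sample line.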

With a modern kernel and VoIPmonitor build, a standard Intel 10Gbit NIC handles up to 3 Gbit/s of VoIP traffic without special drivers, and almost the full 10 Gbit/s rate with DPDK.

Threading (Automatic)

Since version 2023.11, VoIPmonitor uses threading_expanded=yes by default, which automatically spawns threads based on CPU load. No manual threading configuration is needed.

For very high traffic (≥1500 Mbit/s), set:

threading_expanded = high_traffic

See Threading Model for details.
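
The rule above is simple enough to encode directly. This is only a sketch: the function name is hypothetical, the input is assumed to be a whole number of Mbit/s measured by the operator, and the 1500 Mbit/s cut-off is the one stated in this section.

```bash
#!/bin/sh
# Hypothetical helper: print the threading_expanded line for a traffic rate
# given in whole Mbit/s (threshold of 1500 Mbit/s taken from this section).
threading_expanded_for() {
    if [ "$1" -ge 1500 ]; then
        echo "threading_expanded = high_traffic"
    else
        echo "threading_expanded = yes"
    fi
}

threading_expanded_for 800
threading_expanded_for 2000
```

The printed line is meant to be reviewed and copied into the sniffer configuration by hand rather than written automatically.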

NIC Tuning (>500 Mbit/s)

# Increase ring buffer (prevents packet loss during CPU spikes)
ethtool -g eth0                  # Check max size
ethtool -G eth0 rx 16384         # Set to max

# Enable interrupt coalescing (reduces CPU overhead)
ethtool -C eth0 rx-usecs 1022
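
The value 16384 is only an example; the real maximum depends on the NIC. A small sketch for reading the maximum and current RX ring sizes out of `ethtool -g` output follows. It assumes the common layout with "Pre-set maximums:" and "Current hardware settings:" sections, which may differ between ethtool versions; the function name and the sample numbers are made up for illustration.

```bash
#!/bin/sh
# Hypothetical helper: read `ethtool -g <iface>` output on stdin and print
# "<max_rx> <current_rx>". Section names are assumed from typical output.
rx_ring_sizes() {
    awk '
        /^Pre-set maximums:/          { section = "max" }
        /^Current hardware settings:/ { section = "cur" }
        /^RX:/ {
            if (section == "max") max = $2
            if (section == "cur") cur = $2
        }
        END { print max, cur }
    '
}

# Sample output with illustrative numbers; on a live system use:
#   ethtool -g eth0 | rx_ring_sizes
sample='Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512'

printf '%s\n' "$sample" | rx_ring_sizes
```

If the two numbers differ, the first one is the value to pass to `ethtool -G <iface> rx`.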

Persistent settings (Debian/Ubuntu /etc/network/interfaces):

auto eth0
iface eth0 inet manual
    up ip link set $IFACE up
    up ip link set $IFACE promisc on
    up ethtool -G $IFACE rx 16384
    up ethtool -C $IFACE rx-usecs 1022

Configuration Optimizations

{| class="wikitable"
|-
! Parameter !! Purpose !! Recommendation
|-
| interface_ip_filter || IP-based filtering || More efficient than BPF filter
|-
| pcap_dump_writethreads_max || Compression threads || Set to CPU core count
|-
| jitterbuffer_f1/f2/adapt || Jitter simulation || Keep f2=yes, disable f1 and adapt to save CPU while keeping MOS
|}

# /etc/voipmonitor.conf

# Efficient IP filtering (replaces BPF filter)
interface_ip_filter = 192.168.0.0/24
interface_ip_filter = 10.0.0.0/8

# Compression scaling
pcap_dump_writethreads = 1
pcap_dump_writethreads_max = 32
pcap_dump_asyncwrite = yes

ℹ️ Note: Recommended: jitterbuffer_f1=no, jitterbuffer_f2=yes, jitterbuffer_adapt=no. This saves CPU while preserving MOS-F2 metrics. Only disable f2 if you don't need quality monitoring at all.
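
To follow the "set to CPU core count" recommendation for pcap_dump_writethreads_max without hard-coding a number, the line can be generated from the machine itself. A minimal sketch, assuming a Linux host where `nproc` (GNU coreutils) is available; the function name is hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: print a pcap_dump_writethreads_max line sized to the
# number of CPUs available to this process, as reported by nproc.
writethreads_max_line() {
    printf 'pcap_dump_writethreads_max = %s\n' "$(nproc)"
}

writethreads_max_line
```

The output is intended to be pasted into /etc/voipmonitor.conf after review.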

Kernel-Bypass Solutions

For extreme loads, bypass the kernel network stack entirely:

{| class="wikitable"
|-
! Solution !! Type !! CPU Reduction !! Use Case
|-
| DPDK || Open-source || ~70% || Multi-gigabit on commodity hardware
|-
| PF_RING ZC || Commercial || 90% → 20% || High-volume enterprise
|-
| Napatech SmartNICs || Hardware || <3% at 10Gbit/s || Extreme performance
|}

Optimizing Disk I/O

VoIPmonitor Storage Strategy

VoIPmonitor groups all calls starting within the same minute into a single compressed .tar archive. This turns thousands of small random writes into a few large sequential writes, reducing IOPS by a factor of 10 or more.

Typical capacity: 7200 RPM SATA handles ~2,000 concurrent calls with full recording.

Filesystem Tuning (ext4)

# Format without journal (requires battery-backed RAID)
mke2fs -t ext4 -O ^has_journal /dev/sda2

# /etc/fstab
/dev/sda2  /var/spool/voipmonitor  ext4  errors=remount-ro,noatime,data=writeback,barrier=0  0 0

⚠️ Warning: Disabling the journal removes crash protection. Use this only with a battery-backed RAID controller (BBU).

RAID Controller

Set cache policy to WriteBack (not WriteThrough). Requires healthy BBU. Commands vary by vendor (megacli, ssacli, perccli).

Optimizing Database Performance

Memory Configuration

The most critical parameter is innodb_buffer_pool_size.

⚠️ Warning: Setting this value too high causes OOM killer events, CDR delays, and crashes. See OOM Troubleshooting.

Buffer Pool Sizing:

Server Type | Formula | Example (32GB RAM)
Shared (VoIPmonitor + MySQL) | (Total RAM - VoIPmonitor - OS) / 2 | 14GB
Dedicated MySQL server | 50-70% of total RAM | 20-22GB
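
The two sizing rules reduce to simple arithmetic. The sketch below works in whole gigabytes; the function names are hypothetical, and the 2GB figures for the sensor and the OS are assumptions chosen so that the shared-host result matches the 14GB example.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the sizing rules; all sizes in whole GB.

# shared_pool_gb TOTAL SENSOR OS  ->  (TOTAL - SENSOR - OS) / 2
shared_pool_gb() {
    echo $(( ($1 - $2 - $3) / 2 ))
}

# dedicated_pool_gb TOTAL PERCENT  ->  TOTAL * PERCENT / 100, rounded down
dedicated_pool_gb() {
    echo $(( $1 * $2 / 100 ))
}

# 32GB shared host, assuming 2GB for the sensor and 2GB for the OS:
echo "innodb_buffer_pool_size = $(shared_pool_gb 32 2 2)G"
```

For a dedicated 32GB server, dedicated_pool_gb 32 50 and dedicated_pool_gb 32 70 give 16 and 22, so the 20-22GB example sits at the upper end of the 50-70% range.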

RAM Recommendations:

Deployment Size | Minimum | Recommended
Small (<500 calls) | 8GB | 16GB
Medium (500-2000 calls) | 16GB | 32GB
Large (>2000 calls) | 32GB | 64GB+

Key MySQL Parameters

# /etc/mysql/my.cnf or mariadb.conf.d/50-server.cnf
[mysqld]
innodb_buffer_pool_size = 14G
innodb_flush_log_at_trx_commit = 2  # Faster, minimal data loss risk
innodb_file_per_table = 1           # Essential for partitioning
innodb_compression_algorithm = lz4  # MariaDB only

Slow Query Log

The slow query log can consume significant memory. Consider disabling it on high-traffic systems:

[mysqld]
slow_query_log = 0
# Or increase threshold: long_query_time = 600

Database Partitioning

VoIPmonitor automatically partitions large tables (like cdr) by day. This is enabled by default and highly recommended.

See Database Partitioning for details.

Troubleshooting: Connection Refused

Symptoms: GUI crashes, "Connection refused" errors, intermittent issues during peak volumes.

Cause: innodb_buffer_pool_size is too low (the default of 128M is insufficient).

Solution: Increase it to 6G or more, depending on available RAM, then restart the database:

[mysqld]
innodb_buffer_pool_size = 6G

systemctl restart mariadb

Component Separation (Multi-Host Architecture)

For deployments exceeding 5,000-10,000 concurrent calls, separate VoIPmonitor components onto dedicated hosts.

Architecture Overview

Host | Component | Primary Resources | Scaling Strategy
Host 1 | MySQL Database | RAM, fast SSD | Add RAM, read replicas
Host 2 | Sensor(s) | CPU (t0 thread), network | DPDK/PF_RING, more sensors
Host 3 | GUI | CPU, network | Load balancer, caching

Configuration

MySQL Server:

# /etc/mysql/my.cnf
[mysqld]
bind-address = 0.0.0.0
innodb_buffer_pool_size = 50G  # 50-70% RAM for dedicated server

CREATE USER 'voipmonitor'@'%' IDENTIFIED BY 'strong_password';
GRANT ALL PRIVILEGES ON voipmonitor.* TO 'voipmonitor'@'%';

Sensor:

# /etc/voipmonitor.conf
id_sensor = 1
mysqlhost = mysql.server.ip
mysqldb = voipmonitor
mysqlusername = voipmonitor
mysqlpassword = strong_password

GUI: Configure via Settings > System Configuration > Database, or edit config/system_configuration.php.

Firewall Rules:

Source | Destination | Port | Purpose
Sensor | MySQL | 3306 | CDR writes
GUI | MySQL | 3306 | Queries
GUI | Sensor(s) | 5029 | PCAP retrieval
Users | GUI | 80, 443 | Web access

ℹ️ Note: Component separation can be combined with Client-Server mode for multi-site deployments.

Monitoring Performance

VoIPmonitor logs performance metrics every 10 seconds to syslog. Key metrics to watch:

Metric | Warning Sign | Bottleneck Type
t0CPU | >90% | CPU (packet capture limit)
heap[A|B|C] | A >50% | I/O or CPU (buffer filling)
SQLq | Growing | Database
comp | Maxed out | I/O (compression waiting for disk)

# Monitor in real-time
journalctl -u voipmonitor -f
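
The t0CPU and heap thresholds can be checked automatically by extracting the values from a status line. This is a minimal sketch: the sample line is illustrative only and assumes fields of the form t0CPU[N%] and heap[A|B|C] (the authoritative format is described in Syslog_Status_Line), and the helper names t0cpu, heap_a and over are hypothetical.

```shell
#!/bin/sh
# t0cpu: reads status lines on stdin and prints the t0CPU percentage of each.
t0cpu() {
    sed -n 's/.*t0CPU\[\([0-9.]*\)%].*/\1/p'
}

# heap_a: prints the first (A) value of heap[A|B|C].
heap_a() {
    sed -n 's/.*heap\[\([0-9.]*\)|.*/\1/p'
}

# over VALUE LIMIT: succeeds if VALUE is greater than LIMIT (decimals allowed).
over() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 > l + 0) }'
}

# Illustrative line only, not captured from a real sensor.
line='calls[315][355] SQLq[C:0 M:0 Mq:0 Q:0] heap[62|0|0] comp[54] t0CPU[93.5%]'
cpu=$(printf '%s\n' "$line" | t0cpu)
heap=$(printf '%s\n' "$line" | heap_a)
if over "$cpu" 90; then echo "t0CPU ${cpu}% exceeds 90%"; fi
if over "$heap" 50; then echo "heap A ${heap}% exceeds 50%"; fi
```

On a live system the same filters can be fed from the journal, for example journalctl -u voipmonitor -n 1 | t0cpu.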

Main article: Syslog_Status_Line - Complete reference for all metrics with detailed explanations and troubleshooting guidance.

For bottleneck diagnosis: See I/O vs CPU Bottleneck Diagnosis for a step-by-step diagnostic procedure using syslog metrics and Linux tools.

AI Summary for RAG

Summary

Performance tuning guide for high-traffic VoIPmonitor deployments. Covers three main bottlenecks: CPU (t0 packet capture thread, single-core limit), Disk I/O (PCAP storage), and Database (MySQL/MariaDB). Threading is automatic since 2023.11 via threading_expanded=yes (use high_traffic for ≥1500 Mbit/s). NIC tuning: ethtool ring buffer and interrupt coalescing. CPU optimization: interface_ip_filter instead of BPF, jitterbuffer_f2=yes with f1/adapt disabled. Kernel bypass solutions: DPDK (~70% CPU reduction), PF_RING ZC, Napatech SmartNICs (<3% CPU at 10Gbit). Disk I/O: TAR archives reduce IOPS 10x, ext4 tuning (noatime, writeback), RAID WriteBack cache with BBU. Database: innodb_buffer_pool_size (50-70% RAM for dedicated server), innodb_flush_log_at_trx_commit=2, partitioning. Multi-host architecture for >5000-10000 concurrent calls separating MySQL, sensors, and GUI.

Keywords

scaling, performance, tuning, optimization, high traffic, bottleneck, CPU, t0CPU, t0 thread, single-core, disk I/O, storage, database, MySQL, MariaDB, threading_expanded, high_traffic, NIC tuning, ethtool, ring buffer, interrupt coalescing, interface_ip_filter, jitterbuffer, DPDK, PF_RING, Napatech, kernel bypass, TAR archive, ext4, noatime, writeback, RAID, WriteBack cache, BBU, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, partitioning, multi-host, component separation, concurrent calls, capacity, 10000 calls, heap, SQLq, compression threads, pcap_dump_writethreads

Key Questions

  • How to tune VoIPmonitor for high traffic?
  • How many concurrent calls can VoIPmonitor handle?
  • What are the main performance bottlenecks?
  • How to optimize CPU usage for packet capture?
  • What is threading_expanded and when to use high_traffic?
  • How to tune NIC for VoIPmonitor?
  • How to reduce CPU with jitterbuffer settings?
  • What is DPDK and when to use it?
  • How to optimize disk I/O for PCAP storage?
  • How to tune ext4 filesystem for VoIPmonitor?
  • What is the recommended innodb_buffer_pool_size?
  • How to configure MySQL for VoIPmonitor performance?
  • When to separate VoIPmonitor components to multiple hosts?
  • How to monitor VoIPmonitor performance metrics?
  • What do t0CPU, heap, SQLq metrics mean?