Scaling: Difference between revisions

From VoIPmonitor.org
(Add practical memory optimization steps: add RAM, run headless without GUI, monitor OOM killer with dmesg)
(Improve structure: add PlantUML diagram, wikitables, fix heading numbering, add Warning templates, reorganize memory section, add See Also, shorten AI Summary)
Line 1: Line 1:
'''This guide provides a comprehensive overview of performance tuning and scaling for VoIPmonitor. It covers the three primary system bottlenecks and offers practical, expert-level advice for optimizing your deployment for high traffic loads.'''
{{DISPLAYTITLE:Scaling and Performance Tuning}}
Category:Administration
 
This guide provides a comprehensive overview of performance tuning and scaling for VoIPmonitor. It covers the three primary system bottlenecks and offers practical, expert-level advice for optimizing your deployment for high traffic loads.


== Understanding Performance Bottlenecks ==
== Understanding Performance Bottlenecks ==
A VoIPmonitor deployment's maximum capacity is determined by three potential bottlenecks. Identifying and addressing the correct one is key to achieving high performance.
A VoIPmonitor deployment's maximum capacity is determined by three potential bottlenecks. Identifying and addressing the correct one is key to achieving high performance.
# '''Packet Capturing (CPU & Network Stack):''' The ability of a single CPU core to read packets from the network interface. This is often the first limit you will encounter.
 
# '''Disk I/O (Storage):''' The speed at which the sensor can write PCAP files to disk. This is critical when call recording is enabled.
<kroki lang="plantuml">
# '''Database Performance (MySQL/MariaDB):''' The rate at which the database can ingest Call Detail Records (CDRs) and serve data to the GUI.
@startuml
skinparam shadowing false
skinparam defaultFontName Arial
skinparam rectangle {
  BorderColor #4A90E2
  BackgroundColor #FFFFFF
}
 
title VoIPmonitor Performance Bottlenecks
 
rectangle "Network\nInterface" as NIC #E8F4FD
rectangle "Packet Capture\n(t0 thread)" as T0 #FFE6E6
rectangle "RTP/SIP\nProcessing" as PROC #E6FFE6
rectangle "PCAP Files\nStorage" as DISK #FFF3E6
database "MySQL/MariaDB\nDatabase" as DB #E6E6FF
 
NIC -right-> T0 : "1. CPU\nBottleneck"
T0 -right-> PROC
PROC -down-> DISK : "2. I/O\nBottleneck"
PROC -right-> DB : "3. Database\nBottleneck"
 
note bottom of T0
  Monitor: t0CPU in syslog
  Limit: Single CPU core
end note
 
note bottom of DISK
  Monitor: iostat, ioping
  Solution: SSD, TAR archives
end note
 
note bottom of DB
  Monitor: SQLq in syslog
  Solution: Partitioning, tuning
end note
@enduml
</kroki>
 
The three bottlenecks are:
# '''Packet Capturing (CPU & Network Stack):''' The ability of a single CPU core to read packets from the network interface. This is often the first limit encountered.
# '''Disk I/O (Storage):''' The speed at which the sensor can write PCAP files to disk. Critical when call recording is enabled.
# '''Database Performance (MySQL/MariaDB):''' The rate at which the database can ingest CDRs and serve data to the GUI.


On a modern, well-tuned server (e.g., 24-core Xeon, 10Gbit NIC), a single VoIPmonitor instance can handle up to '''10,000 concurrent calls''' with full RTP analysis and recording, or over '''60,000 concurrent calls''' with SIP-only analysis.
On a modern, well-tuned server (e.g., 24-core Xeon, 10Gbit NIC), a single VoIPmonitor instance can handle up to '''10,000 concurrent calls''' with full RTP analysis and recording, or over '''60,000 concurrent calls''' with SIP-only analysis.


== 1. Optimizing Packet Capturing (CPU & Network) ==
== Optimizing Packet Capturing (CPU & Network) ==
The most performance-critical task is the initial packet capture, which is handled by a single, highly optimized thread (t0). If this thread's CPU usage (`t0CPU` in logs) approaches 100%, you are hitting the capture limit. Here are the primary methods to optimize it, from easiest to most advanced.
The most performance-critical task is the initial packet capture, handled by a single, highly optimized thread (t0). If this thread's CPU usage (<code>t0CPU</code> in logs) approaches 100%, you are hitting the capture limit.


=== A. Use a Modern Linux Kernel & VoIPmonitor Build ===
=== Use a Modern Linux Kernel & VoIPmonitor Build ===
Modern Linux kernels (3.2+) and VoIPmonitor builds include '''TPACKET_V3''' support, a high-speed packet capture mechanism. This is the single most important factor for high performance.
Modern Linux kernels (3.2+) and VoIPmonitor builds include '''TPACKET_V3''' support, a high-speed packet capture mechanism. This is the single most important factor for high performance.
* '''Recommendation:''' Always use a recent Linux distribution (like AlmaLinux, Rocky Linux, or Debian) and the latest VoIPmonitor static binary. With this combination, a standard Intel 10Gbit NIC can often handle up to 2 Gbit/s of VoIP traffic without special drivers.


=== B. Network Stack & Driver Tuning ===
'''Recommendation:''' Always use a recent Linux distribution (AlmaLinux, Rocky Linux, or Debian) and the latest VoIPmonitor static binary. With this combination, a standard Intel 10Gbit NIC can often handle up to 2 Gbit/s of VoIP traffic without special drivers.
 
=== Network Stack & Driver Tuning ===
For high-traffic environments (>500 Mbit/s), fine-tuning the network driver and kernel parameters is essential.
For high-traffic environments (>500 Mbit/s), fine-tuning the network driver and kernel parameters is essential.


==== NIC Ring Buffer ====
==== NIC Ring Buffer ====
The ring buffer is a queue between the network card driver and the VoIPmonitor application. A larger buffer prevents packet loss during short CPU usage spikes.
The ring buffer is a queue between the network card driver and VoIPmonitor. A larger buffer prevents packet loss during short CPU usage spikes.
# '''Check maximum size:'''
 
#<syntaxhighlight lang="bash">ethtool -g eth0</syntaxhighlight>
<syntaxhighlight lang="bash">
# '''Set to maximum (e.g., 16384):'''
# Check maximum size
#<syntaxhighlight lang="bash">ethtool -G eth0 rx 16384</syntaxhighlight>
ethtool -g eth0
 
# Set to maximum (e.g., 16384)
ethtool -G eth0 rx 16384
</syntaxhighlight>


==== Interrupt Coalescing ====
==== Interrupt Coalescing ====
This setting batches multiple hardware interrupts into one, reducing CPU overhead.
This setting batches multiple hardware interrupts into one, reducing CPU overhead.
#<syntaxhighlight lang="bash">ethtool -C eth0 rx-usecs 1022</syntaxhighlight>
 
<syntaxhighlight lang="bash">
ethtool -C eth0 rx-usecs 1022
</syntaxhighlight>


==== Applying Settings Persistently ====
==== Applying Settings Persistently ====
To make these settings permanent, add them to your network configuration. For Debian/Ubuntu using `/etc/network/interfaces`:
To make these settings permanent, add them to your network configuration. For Debian/Ubuntu using <code>/etc/network/interfaces</code>:
<syntaxhighlight lang="bash">
 
<syntaxhighlight lang="ini">
auto eth0
auto eth0
iface eth0 inet manual
iface eth0 inet manual
Line 40: Line 93:
     up ethtool -C $IFACE rx-usecs 1022
     up ethtool -C $IFACE rx-usecs 1022
</syntaxhighlight>
</syntaxhighlight>
''Note: Modern systems using NetworkManager or systemd-networkd require different configuration methods.''


=== C. Advanced Offloading and Kernel-Bypass Solutions ===
Note: Modern systems using NetworkManager or systemd-networkd require different configuration methods.
 
=== Advanced Kernel-Bypass Solutions ===
If kernel and driver tuning are insufficient, you can offload the capture process entirely by bypassing the kernel's network stack.
If kernel and driver tuning are insufficient, you can offload the capture process entirely by bypassing the kernel's network stack.


* '''DPDK (Data Plane Development Kit):''' DPDK is a set of libraries and drivers for fast packet processing. VoIPmonitor can leverage DPDK to read packets directly from the network card, completely bypassing the kernel and significantly reducing CPU overhead. This is a powerful, open-source solution for achieving multi-gigabit capture rates on commodity hardware. For detailed installation and configuration instructions, see the [[DPDK|official DPDK guide]].
{| class="wikitable"
|-
! Solution !! Type !! CPU Reduction !! Use Case
|-
| '''[[DPDK]]''' || Open-source || ~70% || Multi-gigabit on commodity hardware
|-
| '''PF_RING ZC/DNA''' || Commercial || 90% → 20% || High-volume enterprise
|-
| '''Napatech SmartNICs''' || Hardware || <3% at 10 Gbit/s || Extreme performance requirements
|}
 
;DPDK (Data Plane Development Kit)
:A set of libraries and drivers for fast packet processing. VoIPmonitor can leverage DPDK to read packets directly from the network card, completely bypassing the kernel. See [[DPDK|DPDK guide]] for details.


* '''PF_RING ZC/DNA:''' A commercial software driver from ntop.org that also dramatically reduces CPU load by bypassing the kernel. In tests, it can reduce CPU usage from 90% to as low as 20% for the same traffic load.
;PF_RING ZC/DNA
:A commercial software driver from ntop.org that dramatically reduces CPU load by bypassing the kernel.


* '''Napatech SmartNICs:''' Specialized hardware acceleration cards that deliver packets to VoIPmonitor with near-zero CPU overhead (<3% CPU for 10 Gbit/s traffic). This is the ultimate solution for extreme performance requirements.
;Napatech SmartNICs
:Specialized hardware acceleration cards that deliver packets with near-zero CPU overhead.


== 2. Optimizing Disk I/O ==
== Optimizing Disk I/O ==
VoIPmonitor's modern storage engine is highly optimized to minimize random disk access, which is the primary cause of I/O bottlenecks.
VoIPmonitor's modern storage engine is highly optimized to minimize random disk access, which is the primary cause of I/O bottlenecks.


=== VoIPmonitor Storage Strategy ===
=== VoIPmonitor Storage Strategy ===
Instead of writing a separate PCAP file for each call (which causes massive I/O load), VoIPmonitor groups all calls starting within the same minute into a single compressed `.tar` archive. This changes the I/O pattern from thousands of small, random writes to a few large, sequential writes, reducing IOPS (I/O Operations Per Second) by a factor of 10 or more. A standard 7200 RPM SATA drive can typically handle up to 2000 concurrent calls with full recording.
Instead of writing a separate PCAP file for each call (which causes massive I/O load), VoIPmonitor groups all calls starting within the same minute into a single compressed <code>.tar</code> archive. This changes the I/O pattern from thousands of small, random writes to a few large, sequential writes, reducing IOPS by a factor of 10 or more.
 
'''Typical capacity:''' A standard 7200 RPM SATA drive can handle up to 2,000 concurrent calls with full recording.


=== Filesystem Tuning (ext4) ===
=== Filesystem Tuning (ext4) ===
For the spool directory (`/var/spool/voipmonitor`), using an optimized ext4 filesystem can improve performance.
For the spool directory (<code>/var/spool/voipmonitor</code>), using an optimized ext4 filesystem can improve performance.
*'''Example setup:'''
 
<syntaxhighlight lang="bash">
<syntaxhighlight lang="bash">
# Format partition without a journal (use with caution, requires battery-backed RAID controller)
# Format partition without a journal (use with caution, requires battery-backed RAID controller)
Line 67: Line 137:
/dev/sda2  /var/spool/voipmonitor  ext4    errors=remount-ro,noatime,data=writeback,barrier=0 0 0
/dev/sda2  /var/spool/voipmonitor  ext4    errors=remount-ro,noatime,data=writeback,barrier=0 0 0
</syntaxhighlight>
</syntaxhighlight>
{{Warning|Disabling the journal removes protection against filesystem corruption after crashes. Only use this with a battery-backed RAID controller.}}


=== RAID Controller Cache Policy ===
=== RAID Controller Cache Policy ===
A misconfigured RAID controller is a common bottleneck. For database and spool workloads, the cache policy should always be set to '''WriteBack''', not WriteThrough (this applies for RPM disks, not fast SSD). This requires a healthy Battery Backup Unit (BBU). If the BBU is dead or missing, you may need to force this setting.
A misconfigured RAID controller is a common bottleneck. For database and spool workloads, the cache policy should be set to '''WriteBack''', not WriteThrough. This applies to rotational (spinning) disks, not fast SSDs.
* ''The specific commands vary by vendor (`megacli`, `ssacli`, `perccli`). Refer to the original, more detailed version of this article or vendor documentation for specific commands for LSI, HP, and Dell controllers.''
 
'''Requirements:'''
* A healthy Battery Backup Unit (BBU) is required
* Specific commands vary by vendor (<code>megacli</code>, <code>ssacli</code>, <code>perccli</code>)
* Refer to vendor documentation for LSI, HP, and Dell controllers
 
== Optimizing Database Performance (MySQL/MariaDB) ==
A well-tuned database is critical for both data ingestion from the sensor and GUI responsiveness.


== 3. Optimizing Database Performance (MySQL/MariaDB) ==
=== Memory Configuration ===
A well-tuned database is critical for both data ingestion from the sensor and responsiveness of the GUI.
The most critical database parameter is <code>innodb_buffer_pool_size</code>, which defines how much memory InnoDB uses to cache data and indexes.


=== Key Configuration Parameters ===
{{Warning|On servers running both VoIPmonitor and MySQL, setting <code>innodb_buffer_pool_size</code> too high causes OOM (Out of Memory) killer events, resulting in CDR delays, crashes, and instability. See [[Sniffer_troubleshooting#Check_for_OOM_.28Out_of_Memory.29_Issues|OOM Troubleshooting]] for details.}}
These settings should be placed in your `my.cnf` or a file in `/etc/mysql/mariadb.conf.d/`.


* '''`innodb_buffer_pool_size`''': '''This is the most important setting.''' It defines the amount of memory InnoDB uses to cache both data and indexes.
==== Buffer Pool Sizing ====


'''WARNING for shared servers (VoIPmonitor + MySQL on same host):'''
{| class="wikitable"
If VoIPmonitor and MySQL are running on the same server, setting `innodb_buffer_pool_size` too high will cause Out of Memory (OOM) killer events. MySQL will consume most of the available RAM, causing the kernel to kill processes (usually MySQL first, then VoIPmonitor), resulting in CDR delays, crashes, and instability. See [[Sniffer_troubleshooting|Step 8: Check for OOM Issues]] for detailed troubleshooting.
|-
! Server Type !! Calculation !! Example (32GB RAM)
|-
| '''Shared''' (VoIPmonitor + MySQL) || (Total RAM - VoIPmonitor - OS overhead) / 2 || 14GB
|-
| '''Dedicated''' MySQL server || 50-70% of total RAM || 16-22GB
|}


=== Practical Memory Optimization Steps ===
For shared servers, use this formula:
<syntaxhighlight lang="text">
innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS overhead - safety margin) / 2


If your server experiences database connection failures, slow CDR queries, or OOM killer events:
Example for a 32GB server:
- Total RAM: 32GB
- VoIPmonitor process memory: ~2GB (check with ps aux)
- OS + other services overhead: ~2GB
- Available for buffer pool: 28GB
- Recommended innodb_buffer_pool_size = 14G
</syntaxhighlight>
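
The shared-server formula can be sketched as a small shell helper. This is a minimal illustration, not part of VoIPmonitor: the function name <code>buffer_pool_gb</code> and its argument order are hypothetical, all inputs are whole gigabytes, and the optional safety margin defaults to 0 so that the result matches the 32GB example.

```shell
# Sketch: estimate innodb_buffer_pool_size (whole GB) for a shared host.
# Arguments (all in GB): total RAM, VoIPmonitor memory, OS overhead,
# and an optional safety margin (assumed default: 0).
buffer_pool_gb() {
    local total_gb="$1"
    local voipmonitor_gb="$2"
    local os_overhead_gb="$3"
    local safety_margin_gb="${4:-0}"
    local available_gb=$(( total_gb - voipmonitor_gb - os_overhead_gb - safety_margin_gb ))

    if [ "$available_gb" -le 0 ]; then
        echo "error: no memory left for the buffer pool" >&2
        return 1
    fi
    # Half of what remains goes to InnoDB (integer division).
    echo $(( available_gb / 2 ))
}

# Example from the text: 32GB total, ~2GB VoIPmonitor, ~2GB OS overhead
#   buffer_pool_gb 32 2 2      # prints 14
```

The printed value would then be written into the MySQL configuration as, for example, <code>innodb_buffer_pool_size = 14G</code>.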


;1. Increase Server RAM
==== RAM Recommendations ====
The most effective solution for memory pressure is to add more RAM to your server. Database performance scales directly with available memory. As a general rule:
{| class="wikitable"
* '''Minimum recommended:''' 16GB RAM for production systems
|-
* '''Recommended for high traffic:''' 32GB or more
! Deployment Size !! Minimum RAM !! Recommended RAM
* Large database servers should have 64GB+ to maximize InnoDB buffer pool effectiveness
|-
| Small (<500 concurrent calls) || 8GB || 16GB
|-
| Medium (500-2000 calls) || 16GB || 32GB
|-
| Large (>2000 calls) || 32GB || 64GB+
|}


;2. Run Headless (No Graphical Desktop)
==== Disable Graphical Desktop ====
A graphical desktop environment (X Window System) can consume 1-2GB of RAM that could otherwise be used by the database. VoIPmonitor is managed entirely through a web interface and ''does not require a desktop environment''.
A graphical desktop environment consumes 1-2GB of RAM unnecessarily. VoIPmonitor is managed through a web interface and does not require a desktop.


<syntaxhighlight lang="bash">
<syntaxhighlight lang="bash">
# Stop and disable the graphical display manager (method varies by distribution)
# Disable display manager
# For Ubuntu/Debian with GDM:
systemctl stop gdm          # Ubuntu/Debian with GDM
systemctl stop gdm
systemctl disable gdm
systemctl disable gdm


# For CentOS/RHEL with GNOME:
# Set default to multi-user (no GUI)
systemctl stop gdm
systemctl set-default multi-user.target
systemctl disable gdm
 
# Verify memory freed
free -h
</syntaxhighlight>
 
=== Other Key Parameters ===


# For X.org generic:
<syntaxhighlight lang="ini">
systemctl stop display-manager
# /etc/mysql/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf
systemctl disable display-manager


# Set the default runlevel to multi-user (no GUI)
[mysqld]
systemctl set-default multi-user.target
# Buffer pool size (calculate per above)
</syntaxhighlight>
innodb_buffer_pool_size = 14G


To verify memory usage before and after:
# Flush logs to OS cache, write to disk once per second (faster, minimal data loss risk)
<syntaxhighlight lang="bash">
innodb_flush_log_at_trx_commit = 2
# Check total memory and usage before removal
free -h


# Identify services using the most memory
# Store each table in its own file (essential for partitioning)
ps aux --sort=-%mem | head -10
innodb_file_per_table = 1


# After disabling the GUI, verify memory freed
# LZ4 compression for modern MariaDB
free -h
innodb_compression_algorithm = lz4
</syntaxhighlight>
</syntaxhighlight>


;3. Monitor for OOM Killer Activity
=== Database Partitioning ===
After making changes, verify the system is stable by checking kernel logs for OOM killer events:
VoIPmonitor automatically splits large tables (like <code>cdr</code>) into daily partitions. This is enabled by default and '''highly recommended'''.


<syntaxhighlight lang="bash">
'''Benefits:'''
# Watch for recent OOM killer messages
* Massively improves GUI query performance (only relevant partitions are scanned)
dmesg -T | grep -i "killed process"
* Allows instant deletion of old data by dropping partitions (thousands of times faster than DELETE)


# If OOM killer is still active, it will show events like:
See [[Data_Cleaning#The_Modern_Method:_Partitioning_.28Recommended.29|Database Partitioning]] for configuration details.
# Out of memory: Kill process 12345 (mysqld) score 900 or sacrifice child
</syntaxhighlight>


If you continue to see OOM killer events targeting MySQL or VoIPmonitor after adding RAM and removing the GUI, you may need to:
== Monitoring Live Performance ==
* Further reduce `innodb_buffer_pool_size` in your MySQL configuration
VoIPmonitor logs detailed performance metrics every 10 seconds to syslog.
* Consider moving the database to a dedicated server
* Reduce call recording or traffic monitoring load


'''Calculation for shared servers:'''
<syntaxhighlight lang="bash">
<syntaxhighlight lang="bash">
Formula: innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS & services overhead - safety margin) / 2
# Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor
 
# CentOS/RHEL
tail -f /var/log/messages | grep voipmonitor
</syntaxhighlight>


Example for a 32GB server:
=== Understanding the Log Output ===
- Total RAM: 32GB
Sample log line:
- VoIPmonitor process memory (check with ps aux): 2GB
<syntaxhighlight lang="text">
- OS + other services overhead: 2GB
voipmonitor[15567]: calls[315][355] PS[C:4 S:29/29 R:6354 A:6484] SQLq[0] heap[0|0|0] comp[54] [12.6Mb/s] t0CPU[5.2%] ... RSS/VSZ[323|752]MB
- Available for buffer pool: 28GB
- Recommended innodb_buffer_pool_size = 14G
</syntaxhighlight>
</syntaxhighlight>


'''Calculation for dedicated database servers:'''
{| class="wikitable"
For a server running ONLY MySQL, a good starting point is 50-70% of the server's total available RAM. For a dedicated database server, 8GB is a good minimum, with 32GB or more being optimal for large databases.
|-
! Metric !! Description !! Warning Threshold
|-
| <code>calls[X][Y]</code> || X = active calls, Y = total calls in memory || -
|-
| <code>SQLq[C]</code> || SQL queries waiting to be sent to database || Growing consistently = DB bottleneck
|-
| <code>heap[A{{!}}B{{!}}C]</code> || Memory usage % for internal buffers || A = 100% → packet drops
|-
| <code>t0CPU[X%]</code> || '''Main packet capture thread CPU usage''' || >90-95% = capture limit reached
|-
| <code>RSS/VSZ[X{{!}}Y]MB</code> || Resident/Virtual memory usage || RSS growing = memory leak
|}
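
As a rough way to spot a consistently growing <code>SQLq</code> value, the following sketch reads log lines on standard input and reports whether the queue grew between every pair of consecutive samples. It assumes the <code>SQLq[N]</code> format shown in the sample line above; the function name <code>sqlq_trend</code> and the three-sample minimum are illustrative choices, not VoIPmonitor features.

```shell
# Sketch: classify the SQLq trend from voipmonitor syslog lines on stdin.
# Prints "growing" if there are at least 3 samples and each one is strictly
# larger than the previous one, otherwise prints "ok".
sqlq_trend() {
    # Keep only the number inside SQLq[...], one value per line.
    sed -n 's/.*SQLq\[\([0-9][0-9]*\)].*/\1/p' |
    awk 'NR > 1 && $1 + 0 <= prev { not_growing = 1 }
         { prev = $1 + 0 }
         END { if (NR >= 3 && !not_growing) print "growing"; else print "ok" }'
}

# Usage (log path depends on the distribution):
#   grep voipmonitor /var/log/syslog | tail -n 30 | sqlq_trend
```

A result of "growing" over several minutes of samples suggests the database is not keeping up; a single burst that later drains would be reported as "ok".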


* '''`innodb_flush_log_at_trx_commit = 2`''': The default value of `1` forces a write to disk for every single transaction, which is very slow without a high-end, battery-backed RAID controller. Setting it to `2` relaxes this, flushing logs to the OS cache and writing to disk once per second. This dramatically improves write performance with a minimal risk of data loss (max 1-2 seconds) in case of a full OS crash.
=== Performance Diagrams ===


* '''`innodb_file_per_table = 1`''': This instructs InnoDB to store each table and its indexes in its own `.ibd` file, rather than in one giant, monolithic `ibdata1` file. This is essential for performance, management, and features like table compression and partitioning.
The following diagrams illustrate the difference between standard kernel packet capture and optimized solutions:


* '''LZ4 Compression:''' For modern MySQL/MariaDB versions, using LZ4 for page compression offers a great balance of reduced storage size and minimal CPU overhead.
[[File:kernelstandarddiagram.png|thumb|center|600px|Standard kernel packet capture path - packets traverse multiple kernel layers before reaching VoIPmonitor]]
<syntaxhighlight lang="bash">
# In my.cnf [mysqld] section
innodb_compression_algorithm=lz4
# For MariaDB, you may also need to set default table formats:
# mysqlcompress_type = PAGE_COMPRESSED=1 in voipmonitor.conf
</syntaxhighlight>


=== Database Partitioning ===
[[File:ntop.png|thumb|center|600px|PF_RING/DPDK bypass mode - packets are delivered directly to VoIPmonitor, bypassing the kernel network stack]]
Partitioning is a feature where VoIPmonitor automatically splits large tables (like `cdr`) into smaller, more manageable pieces, typically one per day. This is enabled by default and is '''highly recommended'''.
* '''Benefits:'''
    * Massively improves query performance in the GUI, as it only needs to scan partitions relevant to the selected time range.
    * Allows for instant deletion of old data by dropping a partition, which is thousands of times faster than running a `DELETE` query on millions of rows.


== 4. Monitoring Live Performance ==
== See Also ==
VoIPmonitor logs detailed performance metrics every 10 seconds to syslog. You can watch them live:
* [[Sniffer_troubleshooting]] - Troubleshooting guide including OOM issues
<syntaxhighlight lang="bash">
* [[Data_Cleaning]] - Database and spool retention configuration
tail -f /var/log/syslog  # Debian/Ubuntu
* [[Sniffer_configuration]] - Complete configuration reference
tail -f /var/log/messages # CentOS/RHEL
* [[DPDK]] - DPDK setup guide
</syntaxhighlight>
* [[IO_Measurement]] - Disk I/O benchmarking tools
A sample log line:
<code>voipmonitor[15567]: calls[315][355] PS[C:4 S:29/29 R:6354 A:6484] SQLq[0] heap[0|0|0] comp[54] [12.6Mb/s] t0CPU[5.2%] ... RSS/VSZ[323|752]MB</code>


* '''`calls[X][Y]`''': X = active calls, Y = total calls in memory.
== AI Summary for RAG ==
* '''`SQLq[C]`''': Number of SQL queries waiting to be sent to the database. If this number is consistently growing, your database cannot keep up.
'''Summary:''' Expert guide to scaling VoIPmonitor for high-traffic environments. Covers three main bottlenecks: (1) Packet Capturing - optimized via TPACKET_V3, NIC tuning with ethtool, and kernel-bypass solutions (DPDK, PF_RING, Napatech); (2) Disk I/O - VoIPmonitor uses TAR-based storage to reduce IOPS, with ext4 tuning and RAID WriteBack cache; (3) Database - critical innodb_buffer_pool_size tuning with formula for shared servers: (Total RAM - VoIPmonitor - OS overhead) / 2. For 32GB shared server, recommend 14GB buffer pool. Dedicated servers can use 50-70% of RAM. Covers partitioning benefits and syslog monitoring (t0CPU, SQLq, heap metrics).
* '''`heap[A|B|C]`''': Memory usage percentages for internal buffers. If A (main heap) reaches 100%, packets will be dropped.
* '''`t0CPU[X%]`''': '''The most important CPU metric.''' This is the usage of the main packet capture thread. If it consistently exceeds 90-95%, you are at your server's capture limit.


[[File:kernelstandarddiagram.png]]
'''Keywords:''' scaling, performance tuning, bottleneck, t0CPU, TPACKET_V3, DPDK, PF_RING, ethtool, ring buffer, innodb_buffer_pool_size, OOM killer, shared server memory, database partitioning, SQLq monitoring
[[File:ntop.png]]


== AI Summary for RAG ==
'''Summary:''' This article is an expert guide to scaling and performance tuning VoIPmonitor for high-traffic environments. It identifies the three main system bottlenecks: Packet Capturing (CPU-bound), Disk I/O (for PCAP storage), and Database Performance (MySQL/MariaDB). For packet capture, it details tuning network card drivers with `ethtool`, the benefits of modern kernels with `TPACKET_V3`, and advanced kernel-bypass solutions like DPDK, PF_RING, and Napatech SmartNICs. For I/O, it explains VoIPmonitor's efficient TAR-based storage and provides tuning tips for ext4 filesystems and RAID controller cache policies. The largest section focuses on MySQL/MariaDB tuning, with a critical warning about `innodb_buffer_pool_size` on shared servers: setting this parameter too high causes OOM killer events, CDR delays, crashes, and instability. The guide provides a calculation formula for shared servers: innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS & services overhead - safety margin) / 2. For a 32GB server with VoIPmonitor using 2GB, the recommended value is 14G. For dedicated database servers, 50-70% of total RAM is appropriate. The article also covers `innodb_flush_log_at_trx_commit`, `innodb_file_per_table`, and native LZ4 compression. It explains the critical role of database partitioning for query performance and data retention. Finally, it details how to interpret live performance statistics from the syslog to diagnose bottlenecks.
'''Keywords:''' performance tuning, scaling, high throughput, bottleneck, CPU bound, t0CPU, packet capture, TPACKET_V3, DPDK, ethtool, ring buffer, interrupt coalescing, PF_RING, Napatech, I/O bottleneck, IOPS, filesystem tuning, ext4, RAID cache, WriteBack, megacli, ssacli, MySQL performance, MariaDB tuning, innodb_buffer_pool_size, innodb_buffer_pool too high, buffer pool calculation, OOM killer crashes, shared server memory, innodb_flush_log_at_trx_commit, innodb_file_per_table, LZ4 compression, database partitioning, syslog, monitoring, high calls per second, CPS
'''Key Questions:'''
'''Key Questions:'''
* How do I scale VoIPmonitor for thousands of concurrent calls?
* How do I scale VoIPmonitor for thousands of concurrent calls?
* What are the main performance bottlenecks in VoIPmonitor?
* What are the main performance bottlenecks in VoIPmonitor?
* How can I fix high t0CPU usage?
* How do I fix high t0CPU usage?
* What is DPDK and when should I use it?
* What is DPDK and when should I use it?
* What are the best `my.cnf` settings for a high-performance VoIPmonitor database?
* How does database partitioning work and why is it important?
* My sniffer is dropping packets, how do I fix it?
* How do I interpret the performance metrics in the syslog?
* Should I use a dedicated database server for VoIPmonitor?
* How do I calculate innodb_buffer_pool_size for a shared server?
* How do I calculate innodb_buffer_pool_size for a shared server?
* What happens if innodb_buffer_pool_size is set too high?
* What happens if innodb_buffer_pool_size is set too high?
* Does high innodb_buffer_pool_size cause OOM killer crashes?
* How do I interpret the performance metrics in syslog?
* What is the formula for innodb_buffer_pool_size on VoIPmonitor server?
* Should I use a dedicated database server for VoIPmonitor?
* Should innodb_buffer_pool_size be 50-70% of RAM on shared servers?
* How much RAM does a VoIPmonitor server need?
* Should I add more RAM to improve VoIPmonitor database performance?
* How do I disable the graphical desktop environment to save memory on a VoIPmonitor server?
* How do I check for OOM killer activity with dmesg?
* How much RAM does VoIPmonitor database server need?
* Why should VoIPmonitor server run headless without GUI?
* How to stop X server and graphical desktop on Linux server?
* How much memory does a graphical desktop environment consume?
* How to check if MySQL was killed by OOM killer?
* What is the minimum recommended RAM for VoIPmonitor production server?

Revision as of 20:30, 4 January 2026

Category:Administration

This guide provides a comprehensive overview of performance tuning and scaling for VoIPmonitor. It covers the three primary system bottlenecks and offers practical, expert-level advice for optimizing your deployment for high traffic loads.

Understanding Performance Bottlenecks

A VoIPmonitor deployment's maximum capacity is determined by three potential bottlenecks. Identifying and addressing the correct one is key to achieving high performance.

The three bottlenecks are:

  1. Packet Capturing (CPU & Network Stack): The ability of a single CPU core to read packets from the network interface. This is often the first limit encountered.
  2. Disk I/O (Storage): The speed at which the sensor can write PCAP files to disk. Critical when call recording is enabled.
  3. Database Performance (MySQL/MariaDB): The rate at which the database can ingest CDRs and serve data to the GUI.

On a modern, well-tuned server (e.g., 24-core Xeon, 10Gbit NIC), a single VoIPmonitor instance can handle up to 10,000 concurrent calls with full RTP analysis and recording, or over 60,000 concurrent calls with SIP-only analysis.

Optimizing Packet Capturing (CPU & Network)

The most performance-critical task is the initial packet capture, handled by a single, highly optimized thread (t0). If this thread's CPU usage (t0CPU in logs) approaches 100%, you are hitting the capture limit.
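
A minimal sketch of how the t0CPU value could be watched from the shell is shown here. It assumes log lines carry a field of the form t0CPU[5.2%], as in the sensor's syslog output; the function name t0cpu_check is hypothetical, and the 90% default threshold is an assumption based on the 90-95% range this article gives for the metric.

```shell
# Sketch: print a warning for every log line on stdin whose t0CPU value
# is at or above the given threshold (percent, assumed default: 90).
t0cpu_check() {
    local limit="${1:-90}"
    # Reduce each matching line to the bare number inside t0CPU[...%].
    sed -n 's/.*t0CPU\[\([0-9.][0-9.]*\)%].*/\1/p' |
    awk -v limit="$limit" '$1 + 0 >= limit + 0 { printf "WARN t0CPU=%s%%\n", $1 }'
}

# Usage (log path depends on the distribution):
#   tail -f /var/log/syslog | grep --line-buffered voipmonitor | t0cpu_check 90
```

Lines below the threshold produce no output, so the command stays silent on a healthy sensor.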

Use a Modern Linux Kernel & VoIPmonitor Build

Modern Linux kernels (3.2+) and VoIPmonitor builds include TPACKET_V3 support, a high-speed packet capture mechanism. This is the single most important factor for high performance.

Recommendation: Always use a recent Linux distribution (AlmaLinux, Rocky Linux, or Debian) and the latest VoIPmonitor static binary. With this combination, a standard Intel 10Gbit NIC can often handle up to 2 Gbit/s of VoIP traffic without special drivers.

Network Stack & Driver Tuning

For high-traffic environments (>500 Mbit/s), fine-tuning the network driver and kernel parameters is essential.

NIC Ring Buffer

The ring buffer is a queue between the network card driver and VoIPmonitor. A larger buffer prevents packet loss during short CPU usage spikes.

# Check maximum size
ethtool -g eth0

# Set to maximum (e.g., 16384)
ethtool -G eth0 rx 16384
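
Because the maximum differs between NICs, the value can be read from the ethtool output instead of being hard-coded. The sketch below assumes the usual "Pre-set maximums:" / "Current hardware settings:" layout printed by ethtool -g; the function name max_rx_ring is hypothetical.

```shell
# Sketch: read `ethtool -g <iface>` output on stdin and print the
# pre-set maximum RX ring size (nothing is printed if it is not found).
max_rx_ring() {
    awk '/^Pre-set maximums:/          { in_max = 1; next }
         /^Current hardware settings:/ { in_max = 0 }
         in_max && $1 == "RX:"         { print $2; exit }'
}

# Usage (setting the ring size requires root):
#   max=$(ethtool -g eth0 | max_rx_ring)
#   [ -n "$max" ] && ethtool -G eth0 rx "$max"
```

Some drivers report the maximum as n/a or do not support resizing at all, in which case the ethtool -G call fails and the current setting is left unchanged.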

Interrupt Coalescing

This setting batches multiple hardware interrupts into one, reducing CPU overhead.

ethtool -C eth0 rx-usecs 1022

Applying Settings Persistently

To make these settings permanent, add them to your network configuration. For Debian/Ubuntu using /etc/network/interfaces:

auto eth0
iface eth0 inet manual
    up ip link set $IFACE up
    up ip link set $IFACE promisc on
    up ethtool -G $IFACE rx 16384
    up ethtool -C $IFACE rx-usecs 1022

Note: Modern systems using NetworkManager or systemd-networkd require different configuration methods.

Advanced Kernel-Bypass Solutions

If kernel and driver tuning are insufficient, you can offload the capture process entirely by bypassing the kernel's network stack.

Solution             Type          CPU Impact               Use Case
DPDK                 Open-source   ~70% reduction           Multi-gigabit on commodity hardware
PF_RING ZC/DNA       Commercial    90% → 20% usage          High-volume enterprise
Napatech SmartNICs   Hardware      <3% usage at 10 Gbit/s   Extreme performance requirements

  • DPDK (Data Plane Development Kit): A set of libraries and drivers for fast packet processing. VoIPmonitor can leverage DPDK to read packets directly from the network card, completely bypassing the kernel. See DPDK guide for details.
  • PF_RING ZC/DNA: A commercial software driver from ntop.org that dramatically reduces CPU load by bypassing the kernel.
  • Napatech SmartNICs: Specialized hardware acceleration cards that deliver packets with near-zero CPU overhead.

Optimizing Disk I/O

VoIPmonitor's modern storage engine is highly optimized to minimize random disk access, which is the primary cause of I/O bottlenecks.

VoIPmonitor Storage Strategy

Instead of writing a separate PCAP file for each call (which causes massive I/O load), VoIPmonitor groups all calls starting within the same minute into a single compressed .tar archive. This changes the I/O pattern from thousands of small, random writes to a few large, sequential writes, reducing IOPS by a factor of 10 or more.

Typical capacity: A standard 7200 RPM SATA drive can handle up to 2,000 concurrent calls with full recording.

Filesystem Tuning (ext4)

For the spool directory (/var/spool/voipmonitor), using an optimized ext4 filesystem can improve performance.

# Format partition without a journal (use with caution, requires battery-backed RAID controller)
mke2fs -t ext4 -O ^has_journal /dev/sda2

# Add to /etc/fstab for optimal performance
/dev/sda2   /var/spool/voipmonitor  ext4    errors=remount-ro,noatime,data=writeback,barrier=0 0 0

⚠️ Warning: Disabling the journal removes protection against filesystem corruption after crashes. Only use this with a battery-backed RAID controller.

RAID Controller Cache Policy

A misconfigured RAID controller is a common bottleneck. For database and spool workloads, the cache policy should be set to WriteBack, not WriteThrough. This applies to rotational (spinning) disks, not to fast SSDs.

Requirements:

  • A healthy Battery Backup Unit (BBU) is required
  • Specific commands vary by vendor (megacli, ssacli, perccli)
  • Refer to vendor documentation for LSI, HP, and Dell controllers

Optimizing Database Performance (MySQL/MariaDB)

A well-tuned database is critical for both data ingestion from the sensor and GUI responsiveness.

Memory Configuration

The most critical database parameter is innodb_buffer_pool_size, which defines how much memory InnoDB uses to cache data and indexes.

⚠️ Warning: On servers running both VoIPmonitor and MySQL, setting innodb_buffer_pool_size too high causes OOM (Out of Memory) killer events, resulting in CDR delays, crashes, and instability. See OOM Troubleshooting for details.

Buffer Pool Sizing

Server Type                    Calculation                                   Example (32GB RAM)
Shared (VoIPmonitor + MySQL)   (Total RAM - VoIPmonitor - OS overhead) / 2   14GB
Dedicated MySQL server         50-70% of total RAM                           20-22GB

For shared servers, use this formula:

innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS overhead - safety margin) / 2

Example for a 32GB server:
- Total RAM: 32GB
- VoIPmonitor process memory: ~2GB (check with ps aux)
- OS + other services overhead: ~2GB
- Remaining memory: 32GB - 2GB - 2GB = 28GB
- Recommended innodb_buffer_pool_size = 28GB / 2 = 14G
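
The shared-server formula can be checked with a few lines of shell arithmetic. This is only a sketch: the function name `buffer_pool_gb` is hypothetical, all inputs are whole gigabytes, and the optional fourth argument is a safety margin that defaults to 0.

```shell
#!/bin/sh
# Compute a shared-server innodb_buffer_pool_size in whole GB.
# $1 = total RAM, $2 = VoIPmonitor memory, $3 = OS overhead,
# $4 = safety margin (optional, defaults to 0)
buffer_pool_gb() {
    total=$1 voip=$2 os=$3 margin=${4:-0}
    echo $(( (total - voip - os - margin) / 2 ))
}

buffer_pool_gb 32 2 2      # prints 14, the 32GB example
buffer_pool_gb 32 2 2 2    # prints 13 with a hypothetical 2GB safety margin
```

Measure the VoIPmonitor figure on the running system (for example with ps aux) rather than reusing the ~2GB from the example.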

RAM Recommendations

Deployment Size                 Minimum RAM   Recommended RAM
Small (<500 concurrent calls)   8GB           16GB
Medium (500-2000 calls)         16GB          32GB
Large (>2000 calls)             32GB          64GB+

Disable Graphical Desktop

A graphical desktop environment consumes 1-2GB of RAM unnecessarily. VoIPmonitor is managed through a web interface and does not require a desktop.

# Disable display manager
systemctl stop gdm          # Ubuntu/Debian with GDM
systemctl disable gdm

# Set default to multi-user (no GUI)
systemctl set-default multi-user.target

# Verify memory freed
free -h

Other Key Parameters

# /etc/mysql/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf

[mysqld]
# Buffer pool size (calculate per above)
innodb_buffer_pool_size = 14G

# Flush logs to OS cache, write to disk once per second (faster, minimal data loss risk)
innodb_flush_log_at_trx_commit = 2

# Store each table in its own file (essential for partitioning)
innodb_file_per_table = 1

# LZ4 compression for modern MariaDB
innodb_compression_algorithm = lz4

Database Partitioning

VoIPmonitor automatically splits large tables (like cdr) into daily partitions. This is enabled by default and highly recommended.

Benefits:

  • Massively improves GUI query performance (only relevant partitions are scanned)
  • Allows instant deletion of old data by dropping partitions (thousands of times faster than DELETE)

See Database Partitioning for configuration details.

Monitoring Live Performance

VoIPmonitor logs detailed performance metrics every 10 seconds to syslog.

# Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor

# CentOS/RHEL (also AlmaLinux, Rocky Linux)
tail -f /var/log/messages | grep voipmonitor

Understanding the Log Output

Sample log line:

voipmonitor[15567]: calls[315][355] PS[C:4 S:29/29 R:6354 A:6484] SQLq[0] heap[0|0|0] comp[54] [12.6Mb/s] t0CPU[5.2%] ... RSS/VSZ[323|752]MB

Metric            Description                                    Warning Threshold
calls[X][Y]       X = active calls, Y = total calls in memory    -
SQLq[C]           SQL queries waiting to be sent to database     Growing consistently = DB bottleneck
heap[A|B|C]       Memory usage % for internal buffers            A = 100% → packet drops
t0CPU[X%]         Main packet capture thread CPU usage           >90-95% = capture limit reached
RSS/VSZ[X|Y]MB    Resident/Virtual memory usage                  RSS growing = memory leak
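
For scripted monitoring, individual metrics can be pulled out of a status line with sed. The helpers below are a sketch only: `extract_metric` and `t0cpu_over` are hypothetical names, and the pattern assumes the `name[value]` layout of the sample log line, which may vary between VoIPmonitor versions.

```shell
#!/bin/sh
# Print the contents of the [...] that follows a metric name.
# $1 = metric name as it appears in the log (e.g. t0CPU, SQLq, heap)
# Log lines are read from stdin; lines without the metric print nothing.
extract_metric() {
    sed -n "s#.*$1\[\([^]]*\)].*#\1#p"
}

# Exit 0 when any t0CPU value on stdin is at or above the limit.
# $1 = limit in percent (90 is the lower bound of the warning range)
t0cpu_over() {
    extract_metric t0CPU |
        awk -v limit="$1" '
            { sub(/%/, ""); if ($0 + 0 >= limit) hit = 1 }
            END { exit (hit ? 0 : 1) }
        '
}

# Usage:
#   tail -n 1 /var/log/syslog | extract_metric SQLq
#   tail -n 1 /var/log/syslog | t0cpu_over 90 && echo "capture limit reached"
```

For metrics followed by more than one bracket, such as calls[X][Y], only the first bracket (X) is returned.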

Performance Diagrams

The following diagrams illustrate the difference between standard kernel packet capture and optimized solutions:

  • Standard kernel packet capture path: packets traverse multiple kernel layers before reaching VoIPmonitor.
  • PF_RING/DPDK bypass mode: packets are delivered directly to VoIPmonitor, bypassing the kernel network stack.

See Also

AI Summary for RAG

Summary: Expert guide to scaling VoIPmonitor for high-traffic environments. Covers three main bottlenecks: (1) Packet Capturing - optimized via TPACKET_V3, NIC tuning with ethtool, and kernel-bypass solutions (DPDK, PF_RING, Napatech); (2) Disk I/O - VoIPmonitor uses TAR-based storage to reduce IOPS, with ext4 tuning and RAID WriteBack cache; (3) Database - critical innodb_buffer_pool_size tuning with formula for shared servers: (Total RAM - VoIPmonitor - OS overhead) / 2. For 32GB shared server, recommend 14GB buffer pool. Dedicated servers can use 50-70% of RAM. Covers partitioning benefits and syslog monitoring (t0CPU, SQLq, heap metrics).

Keywords: scaling, performance tuning, bottleneck, t0CPU, TPACKET_V3, DPDK, PF_RING, ethtool, ring buffer, innodb_buffer_pool_size, OOM killer, shared server memory, database partitioning, SQLq monitoring

Key Questions:

  • How do I scale VoIPmonitor for thousands of concurrent calls?
  • What are the main performance bottlenecks in VoIPmonitor?
  • How do I fix high t0CPU usage?
  • What is DPDK and when should I use it?
  • How do I calculate innodb_buffer_pool_size for a shared server?
  • What happens if innodb_buffer_pool_size is set too high?
  • How do I interpret the performance metrics in syslog?
  • Should I use a dedicated database server for VoIPmonitor?
  • How much RAM does a VoIPmonitor server need?