From VoIPmonitor.org
{{DISPLAYTITLE:Scaling and Performance Tuning}}
[[Category:Administration]]


This guide covers performance tuning for high-traffic VoIPmonitor deployments, addressing the three primary system bottlenecks.


== Understanding Performance Bottlenecks ==


A VoIPmonitor deployment's capacity is limited by three potential bottlenecks:


<kroki lang="plantuml">
@startuml
skinparam shadowing false
skinparam defaultFontName Arial


title VoIPmonitor Performance Bottlenecks


rectangle "Network\nInterface" as NIC #E8F4FD
rectangle "Packet Capture\n(t0 thread)" as T0 #FFE6E6
rectangle "RTP/SIP\nProcessing" as PROC #E6FFE6
rectangle "PCAP Files\nStorage" as DISK #FFF3E6
database "MySQL/MariaDB" as DB #E6E6FF


NIC -right-> T0 : "1. CPU"
T0 -right-> PROC
PROC -down-> DISK : "2. I/O"
PROC -right-> DB : "3. Database"


note bottom of T0
  Monitor: t0CPU
  Limit: 1 CPU core
end note


note bottom of DISK
  Monitor: iostat
  Solution: SSD, TAR
end note


note bottom of DB
  Monitor: SQLq
  Solution: RAM, tuning
end note
@enduml
</kroki>


{| class="wikitable"
|-
! Bottleneck !! Description !! Monitor
|-
| '''1. Packet Capture''' || Single CPU core reading packets from NIC || <code>t0CPU</code> in syslog
|-
| '''2. Disk I/O''' || Writing PCAP files to storage || '''<code>IO[C%]</code> in syslog''' (v2026.01.3+), <code>iostat</code>
|-
| '''3. Database''' || CDR ingestion and GUI queries || <code>SQLq</code> in syslog
|}


'''Capacity:''' A modern server (24-core Xeon, 10Gbit NIC) can handle '''~10,000 concurrent calls''' with full RTP recording, or '''60,000+''' with SIP-only analysis.


== Optimizing Packet Capture (CPU & Network) ==


The packet capture thread (t0) runs on a single CPU core. If <code>t0CPU</code> approaches 100%, you've hit the capture limit.


With a modern kernel and a current VoIPmonitor build, a standard Intel 10Gbit NIC handles up to 3 Gbit/s of VoIP traffic without special drivers, and almost the full 10Gbit rate with [[DPDK]].


=== Threading (Automatic) ===


Since version 2023.11, VoIPmonitor uses <code>threading_expanded=yes</code> by default, which automatically spawns threads based on CPU load. '''No manual threading configuration is needed.'''


For very high traffic (≥1500 Mbit/s), set:
<syntaxhighlight lang="ini">
threading_expanded = high_traffic
</syntaxhighlight>


See [[Sniffer_configuration#Threading_Model|Threading Model]] for details.
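
As a rough illustration of the rule above, the following shell sketch prints a suggested setting for a measured traffic rate. The function name <code>suggest_threading</code> is hypothetical and not part of VoIPmonitor; only the 1500 Mbit/s threshold comes from this guide.
<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical helper (not a VoIPmonitor tool): suggest a threading_expanded
# value from the traffic rate in whole Mbit/s, using the 1500 Mbit/s
# threshold stated in this guide.
suggest_threading() {
    mbit="$1"
    if [ "$mbit" -ge 1500 ]; then
        echo "threading_expanded = high_traffic"
    else
        echo "threading_expanded = yes"
    fi
}

suggest_threading 800     # prints: threading_expanded = yes
suggest_threading 2000    # prints: threading_expanded = high_traffic
</syntaxhighlight>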


=== NIC Tuning (>500 Mbit/s) ===


<syntaxhighlight lang="bash">
# Increase ring buffer (prevents packet loss during CPU spikes)
ethtool -g eth0                  # Check max size
ethtool -G eth0 rx 16384        # Set to max


# Enable interrupt coalescing (reduces CPU overhead)
ethtool -C eth0 rx-usecs 1022
</syntaxhighlight>
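
The value 16384 above is only valid if the NIC supports it. A minimal sketch of how the maximum RX ring size could be read from <code>ethtool -g</code> output instead of being hard-coded is shown below. The helper name <code>max_rx_ring</code> and the sample values are assumptions; real maximums depend on the NIC and driver.
<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical helper: print the pre-set maximum RX ring size from
# `ethtool -g` output supplied on stdin.
max_rx_ring() {
    awk '
        /^Pre-set maximums:/          { in_max = 1; next }
        /^Current hardware settings:/ { in_max = 0 }
        in_max && $1 == "RX:"         { print $2; exit }
    '
}

# Sample text in the usual `ethtool -g` layout (values are illustrative only).
sample='Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512'

printf '%s\n' "$sample" | max_rx_ring    # prints: 4096

# On a live system (the interface name eth0 is an assumption):
#   ethtool -G eth0 rx "$(ethtool -g eth0 | max_rx_ring)"
</syntaxhighlight>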


'''Persistent settings''' (Debian/Ubuntu <code>/etc/network/interfaces</code>):
<syntaxhighlight lang="ini">
auto eth0
iface eth0 inet manual
    up ip link set $IFACE up
    up ip link set $IFACE promisc on
    up ethtool -G $IFACE rx 16384
    up ethtool -C $IFACE rx-usecs 1022
</syntaxhighlight>


=== Configuration Optimizations ===


{| class="wikitable"
|-
! Parameter !! Purpose !! Recommendation
|-
| <code>interface_ip_filter</code> || IP-based filtering || More efficient than BPF <code>filter</code>
|-
| <code>pcap_dump_writethreads_max</code> || Compression threads || Set to CPU core count
|-
| <code>jitterbuffer_f1/f2/adapt</code> || Jitter simulation || Keep <code>f2=yes</code>, disable f1 and adapt to save CPU while keeping MOS
|}


<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf


# Efficient IP filtering (replaces BPF filter)
interface_ip_filter = 192.168.0.0/24
interface_ip_filter = 10.0.0.0/8


# Compression scaling
pcap_dump_writethreads = 1
pcap_dump_writethreads_max = 32
pcap_dump_asyncwrite = yes
</syntaxhighlight>


{{Note|1=Recommended: <code>jitterbuffer_f1=no</code>, <code>jitterbuffer_f2=yes</code>, <code>jitterbuffer_adapt=no</code>. This saves CPU while preserving MOS-F2 metrics. Only disable f2 if you don't need quality monitoring at all.}}
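
The table recommends setting <code>pcap_dump_writethreads_max</code> to the CPU core count. A small sketch of deriving that line on the target host follows. The helper name <code>writethreads_conf</code> is hypothetical, and <code>nproc</code> is assumed to be available (GNU coreutils).
<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical helper: print a pcap_dump_writethreads_max line for a given
# core count. With no argument the count is taken from nproc, falling back
# to 1 if nproc is not installed.
writethreads_conf() {
    cores="${1:-$(nproc 2>/dev/null || echo 1)}"
    echo "pcap_dump_writethreads_max = $cores"
}

writethreads_conf 16    # prints: pcap_dump_writethreads_max = 16
</syntaxhighlight>

Calling <code>writethreads_conf</code> without an argument uses the core count of the machine it runs on.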


=== Kernel-Bypass Solutions ===


For extreme loads, bypass the kernel network stack entirely:


{| class="wikitable"
|-
! Solution !! Type !! CPU Reduction !! Use Case
|-
| '''[[DPDK]]''' || Open-source || ~70% || Multi-gigabit on commodity hardware
|-
| '''PF_RING ZC''' || Commercial || 90% → 20% || High-volume enterprise
|-
| '''[[Napatech|Napatech SmartNICs]]''' || Hardware || <3% at 10Gbit/s || Extreme performance
|}


== Optimizing Disk I/O ==


=== VoIPmonitor Storage Strategy ===


VoIPmonitor groups all calls starting within the same minute into a single compressed <code>.tar</code> archive. This turns thousands of random writes into a few sequential writes, reducing IOPS by 10x or more.


'''Typical capacity:''' 7200 RPM SATA handles ~2,000 concurrent calls with full recording.


=== Filesystem Tuning (ext4) ===


<syntaxhighlight lang="bash">
# Format without journal (requires battery-backed RAID)
mke2fs -t ext4 -O ^has_journal /dev/sda2
</syntaxhighlight>


<syntaxhighlight lang="ini">
# /etc/fstab
/dev/sda2  /var/spool/voipmonitor  ext4  errors=remount-ro,noatime,data=writeback,barrier=0 0 0
</syntaxhighlight>


{{Warning|1=Disabling journal removes crash protection. Only use with battery-backed RAID controller (BBU).}}
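
To check an existing <code>/etc/fstab</code> entry against the mount options shown above, a sketch like the following could be used. The helper name <code>missing_mount_opts</code> is hypothetical; it only compares option strings and does not touch the filesystem.
<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical helper: list which of the mount options used in this guide are
# missing from a comma-separated option string (the fourth fstab field).
missing_mount_opts() {
    opts=",$1,"
    for want in noatime data=writeback barrier=0; do
        case "$opts" in
            *",$want,"*) ;;          # option present, nothing to report
            *) echo "$want" ;;       # option missing
        esac
    done
}

missing_mount_opts "errors=remount-ro,noatime"
# prints:
# data=writeback
# barrier=0
</syntaxhighlight>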


=== RAID Controller ===


Set cache policy to '''WriteBack''' (not WriteThrough). Requires healthy BBU. Commands vary by vendor (<code>megacli</code>, <code>ssacli</code>, <code>perccli</code>).


== Optimizing Database Performance ==


=== Memory Configuration ===


The most critical parameter is <code>innodb_buffer_pool_size</code>.


{{Warning|1=Setting too high causes OOM killer events, CDR delays, and crashes. See [[Sniffer_troubleshooting#Check_for_OOM_.28Out_of_Memory.29_Issues|OOM Troubleshooting]].}}


'''Buffer Pool Sizing:'''


{| class="wikitable"
|-
! Server Type !! Formula !! Example (32GB RAM)
|-
| '''Shared''' (VoIPmonitor + MySQL) || (Total RAM - VoIPmonitor - OS) / 2 || 14GB
|-
| '''Dedicated''' MySQL server || 50-70% of total RAM || 16-22GB
|}
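
The formulas in the table can be worked through with plain shell arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 2 GB reserved for VoIPmonitor and 2 GB for the OS are assumptions chosen so that the shared example comes out at 14 GB.
<syntaxhighlight lang="bash">
#!/bin/sh
# Shared host: (total RAM - VoIPmonitor - OS) / 2, all values in whole GB.
shared_pool_gb() {
    echo $(( ($1 - $2 - $3) / 2 ))
}

# Dedicated host: print the 50% and 70% bounds in whole GB (rounded down).
dedicated_pool_range_gb() {
    echo "$(( $1 * 50 / 100 ))-$(( $1 * 70 / 100 ))"
}

# Assumption: VoIPmonitor and the OS each reserve 2 GB on a 32 GB host.
shared_pool_gb 32 2 2          # prints: 14
dedicated_pool_range_gb 32     # prints: 16-22
</syntaxhighlight>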


'''RAM Recommendations:'''


{| class="wikitable"
|-
! Deployment Size !! Minimum !! Recommended
|-
| Small (<500 calls) || 8GB || 16GB
|-
| Medium (500-2000) || 16GB || 32GB
|-
| Large (>2000) || 32GB || 64GB+
|}
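
The same table expressed as a lookup, as a sketch. The function name <code>ram_recommendation</code> is hypothetical, and the handling of the exact boundary values (500 and 2000 calls fall into the medium tier) is an assumption, since the table does not state it.
<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical helper: map a concurrent-call count to the RAM tiers above.
ram_recommendation() {
    calls="$1"
    if [ "$calls" -lt 500 ]; then
        echo "small: minimum 8GB, recommended 16GB"
    elif [ "$calls" -le 2000 ]; then
        echo "medium: minimum 16GB, recommended 32GB"
    else
        echo "large: minimum 32GB, recommended 64GB+"
    fi
}

ram_recommendation 1200    # prints: medium: minimum 16GB, recommended 32GB
</syntaxhighlight>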


=== Key MySQL Parameters ===


<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf or mariadb.conf.d/50-server.cnf
[mysqld]
innodb_buffer_pool_size = 14G
innodb_flush_log_at_trx_commit = 2  # Faster, minimal data loss risk
innodb_file_per_table = 1          # Essential for partitioning
innodb_compression_algorithm = lz4  # MariaDB only
</syntaxhighlight>


=== Slow Query Log ===


The slow query log can consume significant memory. Consider disabling on high-traffic systems:


<syntaxhighlight lang="ini">
[mysqld]
slow_query_log = 0
# Or increase threshold: long_query_time = 600
</syntaxhighlight>


=== Database Partitioning ===


VoIPmonitor automatically partitions large tables (like <code>cdr</code>) by day. This is enabled by default and '''highly recommended'''.


See [[Data_Cleaning#Database_Cleaning_.28CDR_Retention.29|Database Partitioning]] for details.


=== Troubleshooting: Connection Refused ===


'''Symptoms:''' GUI crashes, "Connection refused" errors, intermittent issues during peak volumes.


'''Cause:''' <code>innodb_buffer_pool_size</code> too low (default 128M is insufficient).


'''Solution:''' Increase to 6G+ based on available RAM:


<syntaxhighlight lang="ini">
[mysqld]
innodb_buffer_pool_size = 6G
</syntaxhighlight>


<syntaxhighlight lang="bash">
systemctl restart mariadb
</syntaxhighlight>
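
Before and after the change, it can help to confirm what the server is actually running with. The sketch below classifies a buffer pool size given in bytes against the 128M default mentioned above. The helper name <code>pool_status</code> is hypothetical; reading the live value with the <code>mysql</code> client is shown only as a comment and assumes working client credentials.
<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical helper: report whether a buffer pool size (in bytes) is still
# at or below the 128M default.
pool_status() {
    bytes="$1"
    default=$(( 128 * 1024 * 1024 ))
    if [ "$bytes" -le "$default" ]; then
        echo "default or lower: $(( bytes / 1024 / 1024 ))M"
    else
        echo "above default: $(( bytes / 1024 / 1024 ))M"
    fi
}

pool_status 134217728     # prints: default or lower: 128M
pool_status 6442450944    # prints: above default: 6144M

# On a live server the value can be read with the mysql client, e.g.:
#   pool_status "$(mysql -N -B -e 'SELECT @@innodb_buffer_pool_size')"
</syntaxhighlight>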


== Component Separation (Multi-Host Architecture) ==


For deployments exceeding 5,000-10,000 concurrent calls, separate VoIPmonitor components onto dedicated hosts.


=== Architecture Overview ===


{| class="wikitable"
|-
! Host !! Component !! Primary Resources !! Scaling Strategy
|-
| '''Host 1''' || MySQL Database || RAM, fast SSD || Add RAM, read replicas
|-
| '''Host 2''' || Sensor(s) || CPU (t0 thread), network || DPDK/PF_RING, more sensors
|-
| '''Host 3''' || GUI || CPU, network || Load balancer, caching
|}


=== Configuration ===


'''MySQL Server:'''
<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf
[mysqld]
bind-address = 0.0.0.0
innodb_buffer_pool_size = 50G  # 50-70% RAM for dedicated server
</syntaxhighlight>


<syntaxhighlight lang="sql">
CREATE USER 'voipmonitor'@'%' IDENTIFIED BY 'strong_password';
GRANT ALL PRIVILEGES ON voipmonitor.* TO 'voipmonitor'@'%';
</syntaxhighlight>


'''Sensor:'''
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
id_sensor = 1
mysqlhost = mysql.server.ip
mysqldb = voipmonitor
mysqlusername = voipmonitor
mysqlpassword = strong_password
</syntaxhighlight>


'''GUI:''' Configure via Settings > System Configuration > Database, or edit <code>config/system_configuration.php</code>.


'''Firewall Rules:'''
{| class="wikitable"
|-
! Source !! Destination !! Port !! Purpose
|-
| Sensor || MySQL || 3306 || CDR writes
|-
| GUI || MySQL || 3306 || Queries
|-
| GUI || Sensor(s) || 5029 || PCAP retrieval
|-
| Users || GUI || 80, 443 || Web access
|}
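
As one possible way to express the table above, the following sketch prints (but does not apply) matching <code>iptables</code> rules. The use of <code>iptables</code> is an assumption, since many hosts use nftables or firewalld instead. The function names are hypothetical and the addresses are documentation placeholders.
<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical helpers: print firewall rules as text so they can be reviewed
# before being applied. Nothing here changes the running firewall.

# Rules for the MySQL host. Arguments: sensor IP, GUI IP.
mysql_host_rules() {
    for src in "$1" "$2"; do
        echo "iptables -A INPUT -p tcp -s $src --dport 3306 -j ACCEPT"
    done
}

# Rule for a sensor host. Argument: GUI IP (PCAP retrieval on port 5029).
sensor_host_rules() {
    echo "iptables -A INPUT -p tcp -s $1 --dport 5029 -j ACCEPT"
}

# Placeholder addresses from the 192.0.2.0/24 documentation range.
mysql_host_rules 192.0.2.10 192.0.2.20
sensor_host_rules 192.0.2.20
</syntaxhighlight>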


{{Note|1=Component separation can be combined with [[Sniffer_distributed_architecture|Client-Server mode]] for multi-site deployments.}}


== Monitoring Performance ==


VoIPmonitor logs performance metrics every 10 seconds to syslog. Key metrics to watch:


{| class="wikitable"
|-
! Metric !! Warning Sign !! Bottleneck Type
|-
| <code>t0CPU</code> || >90% || CPU (packet capture limit)
|-
| '''<code>IO[C%]</code>''' || '''C ≥ 80% (WARN), C ≥ 95% (DISK_SAT)''' || '''Disk I/O (v2026.01.3+)'''
|-
| <code>heap[A&#124;B&#124;C]</code> || A >50% || I/O or CPU (buffer filling)
|-
| <code>SQLq</code> || Growing || Database
|-
| <code>comp</code> || Maxed out || I/O (compression waiting for disk)
|}


<syntaxhighlight lang="bash">
# Monitor in real-time
journalctl -u voipmonitor -f
</syntaxhighlight>
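
A rough sketch of checking the <code>t0CPU</code> threshold from the table automatically is shown below. The helper names <code>metric_value</code> and <code>t0_over_limit</code> are hypothetical, and the sample status line is illustrative: the exact field names and layout differ between VoIPmonitor versions, so the pattern may need adjusting.
<syntaxhighlight lang="bash">
#!/bin/sh
# Hypothetical helper: extract the first numeric value of a bracketed metric
# such as t0CPU[24.8%] from a status line. Arguments: metric name, line.
metric_value() {
    printf '%s\n' "$2" | sed -n "s/.*$1\[\([0-9.]*\)%*].*/\1/p"
}

# Succeeds (exit status 0) when the value exceeds the 90% warning threshold.
t0_over_limit() {
    awk -v v="$1" 'BEGIN { exit !(v > 90) }'
}

# Illustrative status line; real lines vary by version.
line='calls[96][99] SQLqueue[0] heap[0.0% / 0.0%] hoverruns[0] comp[22.5%] [8.7Mb/s] t0CPU[24.8%] t1CPU[1.1%] t2CPU[4.9%]'

t0=$(metric_value t0CPU "$line")
if t0_over_limit "$t0"; then
    echo "t0CPU ${t0}%: capture thread near its limit"
else
    echo "t0CPU ${t0}%: ok"
fi
# prints: t0CPU 24.8%: ok
</syntaxhighlight>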


'''Main article: [[Syslog_Status_Line]]''' - Complete reference for all metrics with detailed explanations and troubleshooting guidance.


'''For bottleneck diagnosis:''' See [[Sniffer_troubleshooting#Diagnose:_I.2FO_vs_CPU_Bottleneck|I/O vs CPU Bottleneck Diagnosis]] for a step-by-step diagnostic procedure using syslog metrics and Linux tools.


== See Also ==


* [[Sniffer_troubleshooting]] - Troubleshooting including OOM issues
* [[Data_Cleaning]] - Database and spool retention
* [[Sniffer_configuration]] - Complete configuration reference
* [[DPDK]] - DPDK setup guide
* [[Sniffer_distributed_architecture]] - Client-Server mode
* [[Blog#January_2026:_Enhanced_Sensor_Monitoring_.26_Disk_I.2FO_Analytics|Blog: January 2026 Release Notes]] - Detailed overview of new Disk I/O monitoring and GUI improvements








== AI Summary for RAG ==


<!-- This section is for AI/RAG systems. Do not edit manually. -->


=== Summary ===
Performance tuning guide for high-traffic VoIPmonitor deployments. Covers three main bottlenecks: CPU (t0 packet capture thread, single-core limit), Disk I/O (PCAP storage), and Database (MySQL/MariaDB). Threading is automatic since 2023.11 via threading_expanded=yes (use high_traffic for ≥1500 Mbit/s). NIC tuning: ethtool ring buffer and interrupt coalescing. CPU optimization: interface_ip_filter instead of BPF, jitterbuffer_f2=yes with f1/adapt disabled. Kernel bypass solutions: DPDK (~70% CPU reduction), PF_RING ZC, Napatech SmartNICs (<3% CPU at 10Gbit). Disk I/O: TAR archives reduce IOPS 10x, ext4 tuning (noatime, writeback), RAID WriteBack cache with BBU. Database: innodb_buffer_pool_size (50-70% RAM for dedicated server), innodb_flush_log_at_trx_commit=2, partitioning. Multi-host architecture for >5000-10000 concurrent calls separating MySQL, sensors, and GUI.


=== Keywords ===
scaling, performance, tuning, optimization, high traffic, bottleneck, CPU, t0CPU, t0 thread, single-core, disk I/O, storage, database, MySQL, MariaDB, threading_expanded, high_traffic, NIC tuning, ethtool, ring buffer, interrupt coalescing, interface_ip_filter, jitterbuffer, DPDK, PF_RING, Napatech, kernel bypass, TAR archive, ext4, noatime, writeback, RAID, WriteBack cache, BBU, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, partitioning, multi-host, component separation, concurrent calls, capacity, 10000 calls, heap, SQLq, compression threads, pcap_dump_writethreads


 
=== Key Questions ===
* How to tune VoIPmonitor for high traffic?
* How many concurrent calls can VoIPmonitor handle?
* What are the main performance bottlenecks?
* How to optimize CPU usage for packet capture?
* What is threading_expanded and when to use high_traffic?
* How to tune NIC for VoIPmonitor?
* How to reduce CPU with jitterbuffer settings?
* What is DPDK and when to use it?
* How to optimize disk I/O for PCAP storage?
* How to tune ext4 filesystem for VoIPmonitor?
* What is the recommended innodb_buffer_pool_size?
* How to configure MySQL for VoIPmonitor performance?
* When to separate VoIPmonitor components to multiple hosts?
* How to monitor VoIPmonitor performance metrics?
* What do t0CPU, heap, SQLq metrics mean?

Latest revision as of 18:35, 23 January 2026


This guide covers performance tuning for high-traffic VoIPmonitor deployments, addressing the three primary system bottlenecks.

Understanding Performance Bottlenecks

A VoIPmonitor deployment's capacity is limited by three potential bottlenecks:

Bottleneck Description Monitor
1. Packet Capture Single CPU core reading packets from NIC t0CPU in syslog
2. Disk I/O Writing PCAP files to storage IO[C%] in syslog (v2026.01.3+), iostat
3. Database CDR ingestion and GUI queries SQLq in syslog

Capacity: A modern server (24-core Xeon, 10Gbit NIC) can handle ~10,000 concurrent calls with full RTP recording, or 60,000+ with SIP-only analysis.

Optimizing Packet Capture (CPU & Network)

The packet capture thread (t0) runs on a single CPU core. If t0CPU approaches 100%, you've hit the capture limit.

With a modern kernel and VoIPmonitor build, a standard Intel 10Gbit NIC handles up to 3 Gbit/s VoIP traffic without special drivers and almost full 10Gbit rate with DPDK

Threading (Automatic)

Since version 2023.11, VoIPmonitor uses threading_expanded=yes by default, which automatically spawns threads based on CPU load. No manual threading configuration is needed.

For very high traffic (≥1500 Mbit/s), set:

threading_expanded = high_traffic

See Threading Model for details.

NIC Tuning (>500 Mbit/s)

# Increase ring buffer (prevents packet loss during CPU spikes)
ethtool -g eth0                  # Check max size
ethtool -G eth0 rx 16384         # Set to max

# Enable interrupt coalescing (reduces CPU overhead)
ethtool -C eth0 rx-usecs 1022

Persistent settings (Debian/Ubuntu /etc/network/interfaces):

auto eth0
iface eth0 inet manual
    up ip link set $IFACE up
    up ip link set $IFACE promisc on
    up ethtool -G $IFACE rx 16384
    up ethtool -C $IFACE rx-usecs 1022

Configuration Optimizations

Parameter Purpose Recommendation
interface_ip_filter IP-based filtering More efficient than BPF filter
pcap_dump_writethreads_max Compression threads Set to CPU core count
jitterbuffer_f1/f2/adapt Jitter simulation Keep f2=yes, disable f1 and adapt to save CPU while keeping MOS
# /etc/voipmonitor.conf

# Efficient IP filtering (replaces BPF filter)
interface_ip_filter = 192.168.0.0/24
interface_ip_filter = 10.0.0.0/8

# Compression scaling
pcap_dump_writethreads = 1
pcap_dump_writethreads_max = 32
pcap_dump_asyncwrite = yes

ℹ️ Note: Recommended: jitterbuffer_f1=no, jitterbuffer_f2=yes, jitterbuffer_adapt=no. This saves CPU while preserving MOS-F2 metrics. Only disable f2 if you don't need quality monitoring at all.

Kernel-Bypass Solutions

For extreme loads, bypass the kernel network stack entirely:

Solution Type CPU Reduction Use Case
DPDK Open-source ~70% Multi-gigabit on commodity hardware
PF_RING ZC Commercial 90% → 20% High-volume enterprise
Napatech SmartNICs Hardware <3% at 10Gbit/s Extreme performance

Optimizing Disk I/O

VoIPmonitor Storage Strategy

VoIPmonitor groups all calls starting within the same minute into a single compressed .tar archive. This changes thousands of random writes into few sequential writes, reducing IOPS by 10x+.

Typical capacity: 7200 RPM SATA handles ~2,000 concurrent calls with full recording.

Filesystem Tuning (ext4)

# Format without journal (requires battery-backed RAID)
mke2fs -t ext4 -O ^has_journal /dev/sda2
# /etc/fstab
/dev/sda2  /var/spool/voipmonitor  ext4  errors=remount-ro,noatime,data=writeback,barrier=0  0 0

⚠️ Warning: Disabling journal removes crash protection. Only use with battery-backed RAID controller (BBU).

RAID Controller

Set cache policy to WriteBack (not WriteThrough). Requires healthy BBU. Commands vary by vendor (megacli, ssacli, perccli).

Optimizing Database Performance

Memory Configuration

The most critical parameter is innodb_buffer_pool_size.

⚠️ Warning: Setting too high causes OOM killer events, CDR delays, and crashes. See OOM Troubleshooting.

Buffer Pool Sizing:

Server Type Formula Example (32GB RAM)
Shared (VoIPmonitor + MySQL) (Total RAM - VoIPmonitor - OS) / 2 14GB
Dedicated MySQL server 50-70% of total RAM 20-22GB

RAM Recommendations:

Deployment Size Minimum Recommended
Small (<500 calls) 8GB 16GB
Medium (500-2000) 16GB 32GB
Large (>2000) 32GB 64GB+

Key MySQL Parameters

# /etc/mysql/my.cnf or mariadb.conf.d/50-server.cnf
[mysqld]
innodb_buffer_pool_size = 14G
innodb_flush_log_at_trx_commit = 2  # Faster, minimal data loss risk
innodb_file_per_table = 1           # Essential for partitioning
innodb_compression_algorithm = lz4  # MariaDB only

Slow Query Log

The slow query log can consume significant memory. Consider disabling on high-traffic systems:

[mysqld]
slow_query_log = 0
# Or increase threshold: long_query_time = 600

Database Partitioning

VoIPmonitor automatically partitions large tables (like cdr) by day. This is enabled by default and highly recommended.

See Database Partitioning for details.

Troubleshooting: Connection Refused

Symptoms: GUI crashes, "Connection refused" errors, intermittent issues during peak volumes.

Cause: innodb_buffer_pool_size too low (default 128M is insufficient).

Solution: Increase to 6G+ based on available RAM:

[mysqld]
innodb_buffer_pool_size = 6G
systemctl restart mariadb

Component Separation (Multi-Host Architecture)

For deployments exceeding 5,000-10,000 concurrent calls, separate VoIPmonitor components onto dedicated hosts.

Architecture Overview

Host | Component | Primary Resources | Scaling Strategy
Host 1 | MySQL Database | RAM, fast SSD | Add RAM, read replicas
Host 2 | Sensor(s) | CPU (t0 thread), network | DPDK/PF_RING, more sensors
Host 3 | GUI | CPU, network | Load balancer, caching

Configuration

MySQL Server:

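# /etc/mysql/my.cnf
[mysqld]
bind-address = 0.0.0.0
innodb_buffer_pool_size = 50G  # 50-70% RAM for dedicated server

-- Run in the MySQL client to create the account used by remote sensors and the GUI.
-- The '%' host allows connections from any address; restrict it to the sensor and GUI hosts where possible.
CREATE USER 'voipmonitor'@'%' IDENTIFIED BY 'strong_password';
GRANT ALL PRIVILEGES ON voipmonitor.* TO 'voipmonitor'@'%';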

Sensor:

# /etc/voipmonitor.conf
id_sensor = 1
mysqlhost = mysql.server.ip
mysqldb = voipmonitor
mysqlusername = voipmonitor
mysqlpassword = strong_password

GUI: Configure via Settings > System Configuration > Database, or edit config/system_configuration.php.

Firewall Rules:

Source | Destination | Port | Purpose
Sensor | MySQL | 3306 | CDR writes
GUI | MySQL | 3306 | Queries
GUI | Sensor(s) | 5029 | PCAP retrieval
Users | GUI | 80, 443 | Web access
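
Before pointing sensors and the GUI at remote hosts, it can help to confirm that the ports listed above are actually reachable. The following is a minimal sketch using only the Python standard library; the function name `port_reachable` and the host names in the usage note are hypothetical placeholders, not VoIPmonitor tooling.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, `port_reachable("mysql.example.internal", 3306)` run from the sensor host checks the CDR write path, and `port_reachable("sensor1.example.internal", 5029)` run from the GUI host checks PCAP retrieval. A successful TCP connection only shows that the firewall and listener are in place; it does not verify database credentials.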

ℹ️ Note: Component separation can be combined with Client-Server mode for multi-site deployments.

Monitoring Performance

VoIPmonitor logs performance metrics every 10 seconds to syslog. Key metrics to watch:

Metric | Warning Sign | Bottleneck Type
t0CPU | >90% | CPU (packet capture limit)
IO[C%] | C ≥ 80% (WARN), C ≥ 95% (DISK_SAT) | Disk I/O (v2026.01.3+)
heap[A|B|C] | A >50% | I/O or CPU (buffer filling)
SQLq | Growing | Database
comp | Maxed out | I/O (compression waiting for disk)

# Monitor in real-time
journalctl -u voipmonitor -f
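
The t0CPU and heap thresholds above can be checked automatically. The sketch below is an illustration only: the function name `check_status_line` is hypothetical, and the sample line is an abridged, assumed rendering of the status line (fields written as `t0CPU[93.5%]` and `heap[62|0|0]`). Verify the exact layout against your own syslog output and the Syslog_Status_Line reference before relying on it.

```python
import re

# Assumed field layout: t0CPU[<percent>%] and heap[A|B|C]
T0CPU_RE = re.compile(r"t0CPU\[(\d+(?:\.\d+)?)%?\]")
HEAP_RE = re.compile(r"heap\[(\d+(?:\.\d+)?)\|(\d+(?:\.\d+)?)\|(\d+(?:\.\d+)?)\]")


def check_status_line(line, t0_limit=90.0, heap_limit=50.0):
    """Return the names of metrics in one status line that exceed their limit."""
    exceeded = []
    match = T0CPU_RE.search(line)
    if match and float(match.group(1)) > t0_limit:
        exceeded.append("t0CPU")
    match = HEAP_RE.search(line)
    # Only the first heap value (A) is compared, as in the table above
    if match and float(match.group(1)) > heap_limit:
        exceeded.append("heap")
    return exceeded


# Illustrative sample, not copied from a real log
sample = "calls[315][355] SQLq[C:0 M:0 Cl:0] heap[62|0|0] comp[54] t0CPU[93.5%]"
print(check_status_line(sample))  # ['t0CPU', 'heap']
```

SQLq is left out on purpose: its warning sign is growth over time, so it needs a comparison across successive lines rather than a single threshold.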

Main article: Syslog_Status_Line - Complete reference for all metrics with detailed explanations and troubleshooting guidance.

For bottleneck diagnosis: See I/O vs CPU Bottleneck Diagnosis for step-by-step diagnostic procedure using syslog metrics and Linux tools.


AI Summary for RAG

Summary

Performance tuning guide for high-traffic VoIPmonitor deployments. Covers three main bottlenecks: CPU (t0 packet capture thread, single-core limit), Disk I/O (PCAP storage), and Database (MySQL/MariaDB). Threading is automatic since 2023.11 via threading_expanded=yes (use high_traffic for ≥1500 Mbit/s). NIC tuning: ethtool ring buffer and interrupt coalescing. CPU optimization: interface_ip_filter instead of BPF, jitterbuffer_f2=yes with f1/adapt disabled. Kernel bypass solutions: DPDK (~70% CPU reduction), PF_RING ZC, Napatech SmartNICs (<3% CPU at 10Gbit). Disk I/O: TAR archives reduce IOPS 10x, ext4 tuning (noatime, writeback), RAID WriteBack cache with BBU. Database: innodb_buffer_pool_size (50-70% RAM for dedicated server), innodb_flush_log_at_trx_commit=2, partitioning. Multi-host architecture for >5000-10000 concurrent calls separating MySQL, sensors, and GUI.

Keywords

scaling, performance, tuning, optimization, high traffic, bottleneck, CPU, t0CPU, t0 thread, single-core, disk I/O, storage, database, MySQL, MariaDB, threading_expanded, high_traffic, NIC tuning, ethtool, ring buffer, interrupt coalescing, interface_ip_filter, jitterbuffer, DPDK, PF_RING, Napatech, kernel bypass, TAR archive, ext4, noatime, writeback, RAID, WriteBack cache, BBU, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, partitioning, multi-host, component separation, concurrent calls, capacity, 10000 calls, heap, SQLq, compression threads, pcap_dump_writethreads

Key Questions

  • How to tune VoIPmonitor for high traffic?
  • How many concurrent calls can VoIPmonitor handle?
  • What are the main performance bottlenecks?
  • How to optimize CPU usage for packet capture?
  • What is threading_expanded and when to use high_traffic?
  • How to tune NIC for VoIPmonitor?
  • How to reduce CPU with jitterbuffer settings?
  • What is DPDK and when to use it?
  • How to optimize disk I/O for PCAP storage?
  • How to tune ext4 filesystem for VoIPmonitor?
  • What is the recommended innodb_buffer_pool_size?
  • How to configure MySQL for VoIPmonitor performance?
  • When to separate VoIPmonitor components to multiple hosts?
  • How to monitor VoIPmonitor performance metrics?
  • What do t0CPU, heap, SQLq metrics mean?