Scaling: Difference between revisions

From VoIPmonitor.org
= Introduction =
{{DISPLAYTITLE:Scaling and Performance Tuning}}
[[Category:Administration]]


Our maximum throughput on a single server (24-core Xeon, 10Gbit NIC) is around 60,000 simultaneous calls without RTP analysis (SIP only) and 10,000 concurrent calls with full RTP analysis and packet capture to disk (around 1.6Gbit of VoIP traffic). VoIPmonitor can work in distributed mode, where remote sniffers write to one central database and a single GUI accesses the data from all sensors.
This guide covers performance tuning for high-traffic VoIPmonitor deployments, addressing the three primary system bottlenecks.


VoIPmonitor is able to use all available CPU cores, but there are several bottlenecks you should consider before deploying and configuring it (do not hesitate to email support@voipmonitor.org if you need more information or help with deployment).
== Understanding Performance Bottlenecks ==


There are three types of bottleneck: CPU, disk I/O throughput (if storing SIP/RTP packets is enabled), and storing CDRs to MySQL (which is both I/O and CPU). Since version 11.0 all CPU-intensive tasks have been split into threads, but one bottleneck remains: sniffing packets from the kernel, which cannot be split across more threads. This is the most CPU-consuming thread, and its load depends on the CPU type, the kernel version, and the number of packets per second. Below 500Mbit of traffic you do not need to worry about CPU on a typical CPU (Xeon, i5).
A VoIPmonitor deployment's capacity is limited by three potential bottlenecks:


Since VoIPmonitor version 11 the most problematic bottleneck, I/O throughput, has been solved by changing the write strategy. Instead of writing a pcap file for each call, calls are grouped into year-mon-day-hour-minute.tar files, where the minute is the start of the call. For example, with 2000 concurrent calls and RTP+SIP+GRAPH enabled, disk IOPS dropped from 250 to 40. In another example, on a server with 60,000 concurrent calls writing only SIP packets, IOPS dropped from 4000 to 10 (this is not a mistake). A single 7.2krpm SATA disk usually delivers only about 300 IOPS.
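The per-minute grouping described above can be illustrated with a short shell sketch. It is only an approximation under stated assumptions: the function name <code>tar_bucket_for_call</code> is hypothetical, the archive name takes the year-mon-day-hour-minute.tar pattern literally (the sniffer's actual spool layout may differ), and <code>date -d @...</code> requires GNU date.

```shell
# Map a call's start time (Unix epoch seconds) to the per-minute
# archive that would hold its packets.
# ASSUMPTION: the name pattern is taken literally from the text above;
# it is not read from the sniffer's source code.
tar_bucket_for_call() {
    local start_epoch=$1
    # -u keeps the result independent of the local timezone (GNU date).
    date -u -d "@${start_epoch}" '+%Y-%m-%d-%H-%M.tar'
}

# Calls that start within the same minute land in the same archive.
tar_bucket_for_call 0     # 1970-01-01-00-00.tar
tar_bucket_for_call 59    # 1970-01-01-00-00.tar
tar_bucket_for_call 60    # 1970-01-01-00-01.tar
```

Because every call that starts in a given minute is appended to the same file, the disk sees a few sequential writes instead of one random write per call.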
<kroki lang="plantuml">
@startuml
skinparam shadowing false
skinparam defaultFontName Arial


= CPU bound =
title VoIPmonitor Performance Bottlenecks


== Reading packets ==
rectangle "Network\nInterface" as NIC #E8F4FD
The main thread (called t0CPU), which reads packets from the kernel, cannot be split into more threads, and this limits the number of concurrent calls for the whole server. You can check how much CPU the t0 thread uses in the syslog, where voipmonitor reports CPU and memory usage every 10 seconds. If t0CPU is >95%, you are at the limit of capturing packets from the interface. Your options are:
rectangle "Packet Capture\n(t0 thread)" as T0 #FFE6E6
rectangle "RTP/SIP\nProcessing" as PROC #E6FFE6
rectangle "PCAP Files\nStorage" as DISK #FFF3E6
database "MySQL/MariaDB" as DB #E6E6FF


NIC -right-> T0 : "1. CPU"
T0 -right-> PROC
PROC -down-> DISK : "2. I/O"
PROC -right-> DB : "3. Database"


*better CPU (faster core)
note bottom of T0
*kernel >= 3.2 (our latest static binaries support the TPACKET_V3 feature, which speeds up capturing and reduces t0CPU)
  Monitor: t0CPU
*special network card (Napatech, for example, reduces t0CPU to 3% for 1.6Gbit of VoIP traffic)
  Limit: 1 CPU core
*commercial ntop.org drivers for Intel cards, which reduce CPU load from 90% to 20-40% for 1.5Gbit (tested)
end note


Recent sniffer versions with kernel >3.2 are able to sniff up to 2Gbit of VoIP traffic on a 10Gbit Intel card with native Intel drivers. The CPU configuration is Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz.
note bottom of DISK
  Monitor: iostat
  Solution: SSD, TAR
end note


There is another important check for high throughput (>500Mbit), especially on kernels < 3.X, which do not balance the NIC's RX interrupts across CPUs by default. If syslog shows CPU0 at 100%, look at /proc/interrupts to see whether your RX queues are bound only to CPU0. If you are not sure, upgrade your kernel to 3.X, where IRQs are spread across more CPUs automatically by default.
note bottom of DB
  Monitor: SQLq
  Solution: RAM, tuning
end note
@enduml
</kroki>
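The /proc/interrupts check described above (RX queues serviced only by CPU0) can be sketched as a small shell filter. This is a minimal sketch, not part of VoIPmonitor: the function name <code>irqs_pinned_to_cpu0</code> is hypothetical, it assumes the usual layout where the first line lists the CPUs and the last column names the device queue, and the sample rows are made up for illustration.

```shell
# List the interrupt lines of one interface that were handled by CPU0 only.
# Reads /proc/interrupts-formatted text on stdin; $1 = interface name.
irqs_pinned_to_cpu0() {
    awk -v iface="$1" '
        NR == 1 { ncpu = NF; next }      # header line: CPU0 CPU1 ...
        index($NF, iface) == 1 {         # last column starts with the interface name
            others = 0
            # $1 is the IRQ number, $2 is CPU0, $3..$(ncpu+1) are the other CPUs
            for (i = 3; i <= ncpu + 1; i++) others += $i
            if ($2 > 0 && others == 0) print $NF
        }
    '
}

# Illustrative sample (values are invented, not measured).
irqs_pinned_to_cpu0 eth3 <<'EOF'
           CPU0       CPU1       CPU2       CPU3
 45:     901234          0          0          0   PCI-MSI-edge      eth3-rx-0
 46:     120000     118000     119500     121000   PCI-MSI-edge      eth3-rx-1
EOF
# prints: eth3-rx-0
```

On a live system the same function would be fed with <code>irqs_pinned_to_cpu0 eth3 < /proc/interrupts</code>; any queue it prints is a candidate for IRQ rebalancing.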


Another consideration is limiting the number of RX/TX queues on your NIC. By default it equals the number of cores in the system, which adds a lot of overhead and costs more CPU cycles. Six queues are sufficient for up to 2Gbit of traffic on a 10Gbit Intel card. This is how you can limit it:
{| class="wikitable"
|-
! Bottleneck !! Description !! Monitor
|-
| '''1. Packet Capture''' || Single CPU core reading packets from NIC || <code>t0CPU</code> in syslog
|-
| '''2. Disk I/O''' || Writing PCAP files to storage || <code>iostat</code>, <code>ioping</code>
|-
| '''3. Database''' || CDR ingestion and GUI queries || <code>SQLq</code> in syslog
|}


modprobe ixgbe DCA=6,6 RSS=6,6
'''Capacity:''' A modern server (24-core Xeon, 10Gbit NIC) can handle '''~10,000 concurrent calls''' with full RTP recording, or '''60,000+''' with SIP-only analysis.


To make this the default, create the file /etc/modprobe.d/ixgbe.conf with:
== Optimizing Packet Capture (CPU & Network) ==
options ixgbe DCA=6,6 RSS=6,6


The next important step is to set the hardware RX ring buffer to its maximum value. You can find the maximum by running:
The packet capture thread (t0) runs on a single CPU core. If <code>t0CPU</code> approaches 100%, you've hit the capture limit.


ethtool -g eth3
With a modern kernel and VoIPmonitor build, a standard Intel 10Gbit NIC handles up to 3 Gbit/s of VoIP traffic without special drivers, and almost the full 10Gbit rate with [[DPDK]].


For example, for an Intel 10Gbit card the default value is 4092, which can be raised to 16384:
=== Threading (Automatic) ===


ethtool -G eth3 rx 16384
Since version 2023.11, VoIPmonitor uses <code>threading_expanded=yes</code> by default, which automatically spawns threads based on CPU load. '''No manual threading configuration is needed.'''


This prevents packets from being missed when an interrupt spike occurs.
For very high traffic (≥1500 Mbit/s), set:
<syntaxhighlight lang="ini">
threading_expanded = high_traffic
</syntaxhighlight>


Tweak coalesce (hw interrupts)
See [[Sniffer_configuration#Threading_Model|Threading Model]] for details.


ethtool -C eth3 rx-usecs 1022
=== NIC Tuning (>500 Mbit/s) ===


(put it all together in /etc/network/interfaces)
<syntaxhighlight lang="bash">
# Increase ring buffer (prevents packet loss during CPU spikes)
ethtool -g eth0                  # Check max size
ethtool -G eth0 rx 16384        # Set to max


auto eth3
# Enable interrupt coalescing (reduces CPU overhead)
iface eth3 inet manual
ethtool -C eth0 rx-usecs 1022
up ip address add 0/0 dev $IFACE
</syntaxhighlight>
up ip link set $IFACE up
up ip link set $IFACE promisc on
up ethtool -A $IFACE autoneg off rx off tx off > /dev/null 2>&1 || true
up ethtool -C $IFACE rx-usecs 1022 > /dev/null 2>&1 || true
up ethtool -G $IFACE rx 16384 > /dev/null 2>&1 || true


'''Persistent settings''' (Debian/Ubuntu <code>/etc/network/interfaces</code>):
<syntaxhighlight lang="ini">
auto eth0
iface eth0 inet manual
    up ip link set $IFACE up
    up ip link set $IFACE promisc on
    up ethtool -G $IFACE rx 16384
    up ethtool -C $IFACE rx-usecs 1022
</syntaxhighlight>
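Rather than hard-coding 16384, the ring-buffer maximum can be read from the <code>ethtool -g</code> output. The following is a minimal sketch: the helper name <code>max_rx_ring</code> is hypothetical, and it assumes the common output layout with a "Pre-set maximums:" section followed by an "RX:" line; the sample values are illustrative.

```shell
# Read the pre-set maximum RX ring size from `ethtool -g` output on stdin.
max_rx_ring() {
    awk '
        /^Pre-set maximums:/          { in_max = 1; next }
        /^Current hardware settings:/ { in_max = 0 }
        in_max && $1 == "RX:"         { print $2; exit }
    '
}

# Illustrative sample of the expected layout (values are invented).
max_rx_ring <<'EOF'
Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             512
RX Mini:        0
RX Jumbo:       0
TX:             512
EOF
# prints: 4096
```

On a real host this could be combined as <code>ethtool -G eth0 rx "$(ethtool -g eth0 | max_rx_ring)"</code>, so the same line works on cards with different maximums.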


The following picture shows how packets travel from the Ethernet card into kernel space, where the Ethernet driver queues them in the ring buffer. The ring buffer (available since kernel 2.6.32 and libpcap > 1.0) is a packet queue waiting to be fetched by the VoIPmonitor sniffer. It prevents packet loss when the VoIPmonitor process does not get enough CPU cycles from the kernel scheduler. Once the ring buffer fills up, a message is logged to syslog that the sniffer lost some packets from the ring buffer. In that case you can increase the ring buffer from the default 50MB to its maximum value of 2000MB.
=== Configuration Optimizations ===


{| class="wikitable"
|-
! Parameter !! Purpose !! Recommendation
|-
| <code>interface_ip_filter</code> || IP-based filtering || More efficient than BPF <code>filter</code>
|-
| <code>pcap_dump_writethreads_max</code> || Compression threads || Set to CPU core count
|-
| <code>jitterbuffer_f1/f2/adapt</code> || Jitter simulation || Keep <code>f2=yes</code>, disable f1 and adapt to save CPU while keeping MOS
|}


From the kernel ring buffer, voipmonitor stores packets in its internal cache memory (heap), whose size you can control with the packetbuffer_total_maxheap parameter. The default value is 200MB. This cache is also compressed with the very fast snappy compression algorithm, which allows more packets to be stored in the cache (about a 50% ratio for G711 calls). Packets from the cache heap are sent to the processing threads, which analyze SIP and RTP. From there packets are either destroyed or passed on to the write queue.
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf


If voipmonitor sniffer is running with at least "-v 1" you can watch several metrics:
# Efficient IP filtering (replaces BPF filter)
interface_ip_filter = 192.168.0.0/24
interface_ip_filter = 10.0.0.0/8


tail -f /var/log/syslog (on debian/ubuntu)
# Compression scaling
tail -f /var/log/messages (on redhat/centos)
pcap_dump_writethreads = 1
pcap_dump_writethreads_max = 32
pcap_dump_asyncwrite = yes
</syntaxhighlight>


voipmonitor[15567]: calls[315][355] PS[C:4 S:29/29 R:6354 A:6484] SQLq[C:0 M:0 Cl:0] heap[0|0|0] comp[54] [12.6Mb/s] tarQ[1865] tarCPU[12.0|9.2|3.4|18.4%] t0CPU[5.2%] t1CPU[1.2%] t2CPU[0.9%] tacCPU[4.6|3.0|3.7|4.5%] RSS/VSZ[323|752]MB
{{Note|1=Recommended: <code>jitterbuffer_f1=no</code>, <code>jitterbuffer_f2=yes</code>, <code>jitterbuffer_adapt=no</code>. This saves CPU while preserving MOS-F2 metrics. Only disable f2 if you don't need quality monitoring at all.}}


*voipmonitor[15567] - 15567 is PID of the process
=== Kernel-Bypass Solutions ===
*calls - [X][Y] - X is the number of active calls in voipmonitor memory. Y is the total number of calls in voipmonitor memory (active + queue buffer), including SIP registrations.
*PS - call/packet counters per second. C: number of calls per second. S: X/Y, where X is the number of valid SIP packets per second on SIP ports and Y is the number of all packets on SIP ports. R: number of RTP packets per second belonging to calls registered by voipmonitor. A: all packets per second.
*SQLq (SQL queue) - the number of SQL statements (INSERTs) waiting to be written to MySQL. If this number keeps growing, MySQL is not able to keep up. See [[Scaling#innodb_flush_log_at_trx_commit]]
*heap[A|B|C] - A: % of heap memory used. If it reaches 100, voipmonitor is not able to process packets in real time due to CPU or I/O. B: % of memory used in the packetbuffer. C: % used by the async write buffers (at 100% I/O is blocking; the heap will grow, then the ring buffer will fill up, and then packet loss will occur).
*hoverruns - if this number grows, the heap buffer was completely filled. In this case the primary thread stops reading packets from the ring buffer, and if the ring buffer is full, packets are lost. This occurrence is logged to syslog.
*comp - compression buffer ratio (if enabled)
*[12.6Mb/s] - total network throughput
*tarQ[1865] - number of open files when tar=yes is enabled, which is the default for sniffer >11
*tarCPU[12.0|9.2|3.4|18.4%] - CPU utilization when compressing tar, which is enabled by default. The maximum number of threads is controlled by the tar_maxthreads option, which is 4 by default.
*t0CPU - %CPU utilization of thread 0. Thread 0 is the process reading from the kernel ring buffer. Once it is over 90%, the current setup is hitting the limit of processing packets from the network card. Please write to support@voipmonitor.org if you hit this limit.
*t1CPU - %CPU utilization of thread 1. Thread 1 is the process that reads packets from thread 0, adds them to the buffer, and compresses them (if enabled).
*t2CPU - at start there is only one thread (pb). If it exceeds 50%, three new threads are spawned: the hash control thread (rm), the hash computation thread (rh), and the thread moving packets to the RTP threads (rd). If pb > 50%, a new thread d is created. If d > 50%, a new SIP preprocess thread (s) is created. If the s thread > 50%, a new extended (e) thread is created (searching for and creating the Call structure).
*tacCPU[N0|N1|N...] - %CPU utilization when compressing pcap files, or when compressing internal memory if tar=yes (which is the default). The number of threads grows automatically.
*RSS/VSZ[323|752]MB - RSS stands for the resident size, which is an accurate representation of how much actual physical memory the sniffer is consuming. VSZ stands for the virtual size of a process, which is the sum of the memory it is actually using, memory it has mapped into itself (for instance the video card's RAM for the X server), files on disk that have been mapped into it (most notably shared libraries), and memory shared with other processes. VIRT represents how much memory the program is able to access at the present moment.
*more about [[Logging]]
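The bracketed fields described in the list above can be pulled out of a status line with a few lines of shell. This is a minimal sketch: the helper name <code>status_field</code> is hypothetical, and it assumes every field has the form <code>name[value]</code> preceded by whitespace, as in the sample line shown earlier.

```shell
# Print the text between the brackets that follow a field name.
# $1 = field name (e.g. t0CPU), $2 = one full status line from syslog.
status_field() {
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1\[\([^]]*\)].*/\1/p"
}

line='voipmonitor[15567]: calls[315][355] PS[C:4 S:29/29 R:6354 A:6484] SQLq[C:0 M:0 Cl:0] heap[0|0|0] comp[54] [12.6Mb/s] tarQ[1865] tarCPU[12.0|9.2|3.4|18.4%] t0CPU[5.2%] t1CPU[1.2%] t2CPU[0.9%] tacCPU[4.6|3.0|3.7|4.5%] RSS/VSZ[323|752]MB'

status_field t0CPU "$line"   # 5.2%
status_field SQLq  "$line"   # C:0 M:0 Cl:0
status_field heap  "$line"   # 0|0|0
```

Keeping the extraction generic means the same helper works for any of the fields above without a separate pattern per metric.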


[[File:kernelstandarddiagram.png]]
For extreme loads, bypass the kernel network stack entirely:


A good tool for measuring CPU usage is htop: http://htop.sourceforge.net/
{| class="wikitable"
|-
! Solution !! Type !! CPU Reduction !! Use Case
|-
| '''[[DPDK]]''' || Open-source || ~70% || Multi-gigabit on commodity hardware
|-
| '''PF_RING ZC''' || Commercial || 90% → 20% || High-volume enterprise
|-
| '''[[Napatech|Napatech SmartNICs]]''' || Hardware || <3% at 10Gbit/s || Extreme performance
|}


[[File:ntop.png]]
== Optimizing Disk I/O ==


=== Software driver alternatives ===
=== VoIPmonitor Storage Strategy ===


*TPACKET_V3 - libpcap 1.5.3 and kernel >= 3.2 support TPACKET_V3, which means libpcap must be compiled against a recent Linux kernel. In our tests we were able to sniff 2Gbit of traffic on a 10Gbit Intel card without special drivers, using just the latest libpcap and kernel. Our latest statically compiled binaries (in the download section) already include TPACKET_V3, so it is used whenever you run kernel >= 3.2.
VoIPmonitor groups all calls starting within the same minute into a single compressed <code>.tar</code> archive. This changes thousands of random writes into few sequential writes, reducing IOPS by 10x+.


*Direct NIC Access http://www.ntop.org/products/pf_ring/dna/ - We have tried the DNA driver for a stock 1Gbit Intel card; it reduces CPU load from 100% to 20%.
'''Typical capacity:''' 7200 RPM SATA handles ~2,000 concurrent calls with full recording.


*[[Pcap_worksheet]]
=== Filesystem Tuning (ext4) ===


=== Hardware NIC cards ===
<syntaxhighlight lang="bash">
# Format without journal (requires battery-backed RAID)
mke2fs -t ext4 -O ^has_journal /dev/sda2
</syntaxhighlight>


We have successfully tested 1Gbit and 10Gbit cards from Napatech, which deliver packets to VoIPmonitor at <3% CPU.
<syntaxhighlight lang="ini">
# /etc/fstab
/dev/sda2  /var/spool/voipmonitor  ext4  errors=remount-ro,noatime,data=writeback,barrier=0  0 0
</syntaxhighlight>


= I/O bottleneck =
{{Warning|1=Disabling journal removes crash protection. Only use with battery-backed RAID controller (BBU).}}


Since sniffer version 11.0 the number of IOPS (overall random writes) has dropped by a factor of 10, which means the I/O bottleneck is no longer a problem. 2000 simultaneous calls take around 40 IOPS, or about 10MB/sec, which almost any storage can handle. The next section is still worth reading:
=== RAID Controller ===


== filesystem ==
Set cache policy to '''WriteBack''' (not WriteThrough). Requires healthy BBU. Commands vary by vendor (<code>megacli</code>, <code>ssacli</code>, <code>perccli</code>).


== Optimizing Database Performance ==


The fastest filesystem for the voipmonitor spool directory is ext4 with the following tweaks. Assuming your partition is /dev/sda2:
=== Memory Configuration ===


export mydisk=/dev/sda2
The most critical parameter is <code>innodb_buffer_pool_size</code>.
mke2fs -t ext4 -O ^has_journal $mydisk
tune2fs -O ^has_journal $mydisk
tune2fs -o journal_data_writeback $mydisk
#add following line to /etc/fstab
/dev/sda2      /var/spool/voipmonitor  ext4    errors=remount-ro,noatime,nodiratime,data=writeback,barrier=0 0 0


== LSI write back cache policy ==
{{Warning|1=Setting this value too high causes OOM killer events, CDR delays, and crashes. See [[Sniffer_troubleshooting#Check_for_OOM_.28Out_of_Memory.29_Issues|OOM Troubleshooting]].}}


On many installations the RAID controller is not optimally configured. To check your cache policy, run:
'''Buffer Pool Sizing:'''


rpm -Uhv http://dl.marmotte.net/rpms/redhat/el6/x86_64/megacli-8.00.46-2/megacli-8.00.46-2.x86_64.rpm
{| class="wikitable"
#Debian/Ubuntu - you can use following repository for package megacli installation. https://hwraid.le-vert.net/wiki/DebianPackages
|-
megacli -LDGetProp -Cache -L0 -a0
! Server Type !! Formula !! Example (32GB RAM)
  Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone, Direct, No Write Cache if bad BBU
|-
The write-through cache policy has very poor random write performance, so you probably want to change it to the write-back cache policy:
| '''Shared''' (VoIPmonitor + MySQL) || (Total RAM - VoIPmonitor - OS) / 2 || 14GB
megacli  -LDSetProp -WB -L0 -a0
|-
  Battery needs replacement
| '''Dedicated''' MySQL server || 50-70% of total RAM || 20-22GB
So policy Change to WB will not come into effect immediately Set Write Policy to WriteBack on Adapter 0, VD 0 (target id: 0) success
|}
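The shared-server formula from the table above can be worked through in shell arithmetic. This is a sketch under stated assumptions: the function name <code>shared_buffer_pool_gb</code> is hypothetical, and the 2GB reserved for the sniffer and 2GB for the OS are illustrative figures chosen so that the result matches the 14GB example; measure the real values on your own server.

```shell
# Shared host: (Total RAM - VoIPmonitor - OS) / 2, all values in whole GB.
shared_buffer_pool_gb() {
    local total_gb=$1 sniffer_gb=$2 os_gb=$3
    echo $(( (total_gb - sniffer_gb - os_gb) / 2 ))
}

# 32GB server with an assumed 2GB for the sniffer and 2GB for the OS.
shared_buffer_pool_gb 32 2 2   # 14
```

The result would then be written to the MySQL configuration as <code>innodb_buffer_pool_size = 14G</code>.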


'''RAM Recommendations:'''


Recheck whether the cache was really set to write back. If not, you need to force the write cache even when the battery is bad or missing with this command:
{| class="wikitable"
|-
! Deployment Size !! Minimum !! Recommended
|-
| Small (<500 calls) || 8GB || 16GB
|-
| Medium (500-2000) || 16GB || 32GB
|-
| Large (>2000) || 32GB || 64GB+
|}


megacli -LDSetProp CachedBadBBU -Lall -aAll
=== Key MySQL Parameters ===
Set Write Cache OK if bad BBU on Adapter 0, VD 0 (target id: 0) success Set Write Cache OK if bad BBU on Adapter 0, VD 1 (target id: 1) success
And then set the write-back cache again:
megacli  -LDSetProp -WB -L0 -a0
Please note that this example assumes you have one logical drive. If you have more, you need to repeat it for all of your virtual disks.


== HP SMART ARRAY ==
<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf or mariadb.conf.d/50-server.cnf
[mysqld]
innodb_buffer_pool_size = 14G
innodb_flush_log_at_trx_commit = 2  # Faster, minimal data loss risk
innodb_file_per_table = 1          # Essential for partitioning
innodb_compression_algorithm = lz4  # MariaDB only
</syntaxhighlight>


=== Centos ===
=== Slow Query Log ===
==== controller class <9 ====


wget ftp://ftp.hp.com/pub/softlib2/software1/pubsw-linux/p1257348637/v71527/hpacucli-9.10-22.0.x86_64.rpm
The slow query log can consume significant memory. Consider disabling on high-traffic systems:
yum install hpacucli-9.10-22.0.x86_64.rpm


See status
<syntaxhighlight lang="ini">
[mysqld]
slow_query_log = 0
# Or increase threshold: long_query_time = 600
</syntaxhighlight>


hpacucli ctrl slot=0 show
=== Database Partitioning ===


Enable cache
VoIPmonitor automatically partitions large tables (like <code>cdr</code>) by day. This is enabled by default and '''highly recommended'''.


hpacucli ctrl slot=0 ld all modify arrayaccelerator=enable
hpacucli ctrl slot=0 modify dwc=enable
See [[Data_Cleaning#Database_Cleaning_.28CDR_Retention.29|Database Partitioning]] for details.


Modify cache ratio between read and write:
=== Troubleshooting: Connection Refused ===
hpacucli ctrl slot=0 modify cacheratio=50/50


==== controller class >=9 ====
'''Symptoms:''' GUI crashes, "Connection refused" errors, intermittent issues during peak volumes.
Find package for download on this site:
https://downloads.linux.hpe.com/sdr/repo/mcp/centos/7/x86_64/current/


Download and install
'''Cause:''' <code>innodb_buffer_pool_size</code> too low (default 128M is insufficient).
wget https://downloads.linux.hpe.com/sdr/repo/mcp/centos/7/x86_64/current/ssacli-5.10-44.0.x86_64.rpm
rpm -i ssacli-5.10-44.0.x86_64.rpm


Check the cache-related configuration
'''Solution:''' Increase to 6G+ based on available RAM:
ssacli ctrl slot=0 show config detail|grep ache


Read state, Enable write cache and set cache ratio for write
<syntaxhighlight lang="ini">
ssacli ctrl slot=0 modify dwc=?
[mysqld]
ssacli ctrl slot=0 modify dwc=enable forced
innodb_buffer_pool_size = 6G
ssacli controller slot=0 modify cacheratio=0/100
</syntaxhighlight>


Make sure that the cache is also enabled when the battery has failed or is not installed
<syntaxhighlight lang="bash">
ssacli ctrl slot=0 modify nobatterywritecache=?
systemctl restart mariadb
ssacli ctrl slot=0 modify nobatterywritecache=enable
</syntaxhighlight>


== Component Separation (Multi-Host Architecture) ==


For deployments exceeding 5,000-10,000 concurrent calls, separate VoIPmonitor components onto dedicated hosts.


=== Ubuntu 18.04 ===
=== Architecture Overview ===
(Controller class 9 and above)


add to sources.list and install ssacli
{| class="wikitable"
deb http://downloads.linux.hpe.com/SDR/downloads/MCP/ubuntu xenial/current non-free
|-
! Host !! Component !! Primary Resources !! Scaling Strategy
|-
| '''Host 1''' || MySQL Database || RAM, fast SSD || Add RAM, read replicas
|-
| '''Host 2''' || Sensor(s) || CPU (t0 thread), network || DPDK/PF_RING, more sensors
|-
| '''Host 3''' || GUI || CPU, network || Load balancer, caching
|}


apt update
=== Configuration ===
apt install ssacli


Check the cache-related configuration
'''MySQL Server:'''
ssacli ctrl slot=0 show config detail|grep ache
<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf
[mysqld]
bind-address = 0.0.0.0
innodb_buffer_pool_size = 50G  # 50-70% RAM for dedicated server
</syntaxhighlight>


Read state, Enable write cache and set cache ratio for write
<syntaxhighlight lang="sql">
ssacli ctrl slot=0 modify dwc=?
CREATE USER 'voipmonitor'@'%' IDENTIFIED BY 'strong_password';
ssacli ctrl slot=0 modify dwc=enable forced
GRANT ALL PRIVILEGES ON voipmonitor.* TO 'voipmonitor'@'%';
ssacli controller slot=0 modify cacheratio=0/100
</syntaxhighlight>


Make sure that the cache is also enabled when the battery has failed or is not installed
'''Sensor:'''
ssacli ctrl slot=0 modify nobatterywritecache=?
<syntaxhighlight lang="ini">
ssacli ctrl slot=0 modify nobatterywritecache=enable
# /etc/voipmonitor.conf
id_sensor = 1
mysqlhost = mysql.server.ip
mysqldb = voipmonitor
mysqlusername = voipmonitor
mysqlpassword = strong_password
</syntaxhighlight>


== DELL PERC class v8 and newer==
'''GUI:''' Configure via Settings > System Configuration > Database, or edit <code>config/system_configuration.php</code>.
Dell's perccli binary is used instead of megacli for perc class 8 and newer


Reading status
'''Firewall Rules:'''
{| class="wikitable"
|-
! Source !! Destination !! Port !! Purpose
|-
| Sensor || MySQL || 3306 || CDR writes
|-
| GUI || MySQL || 3306 || Queries
|-
| GUI || Sensor(s) || 5029 || PCAP retrieval
|-
| Users || GUI || 80, 443 || Web access
|}


./perccli64 /c0 show all
{{Note|1=Component separation can be combined with [[Sniffer_distributed_architecture|Client-Server mode]] for multi-site deployments.}}


Changing mode to writeback
== Monitoring Performance ==


perccli /c0/v0 set wrcache=wb
VoIPmonitor logs performance metrics every 10 seconds to syslog. Key metrics to watch:


Changing mode to writethru (for SSDs)
{| class="wikitable"
|-
! Metric !! Warning Sign !! Bottleneck Type
|-
| <code>t0CPU</code> || >90% || CPU (packet capture limit)
|-
| <code>heap[A&#124;B&#124;C]</code> || A >50% || I/O or CPU (buffer filling)
|-
| <code>SQLq</code> || Growing || Database
|-
| <code>comp</code> || Maxed out || I/O (compression waiting for disk)
|}


perccli /c0/v0 set wrcache=wt
<syntaxhighlight lang="bash">
# Monitor in real-time
journalctl -u voipmonitor -f
</syntaxhighlight>
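The warning thresholds in the table above can be checked automatically for each status line. This is a minimal sketch, not a VoIPmonitor feature: the function name <code>check_status_line</code> and the warning texts are hypothetical, and the second sample line is invented to show a loaded system.

```shell
# Print one warning per threshold from the table that a status line crosses.
check_status_line() {
    local line=$1 t0 heap_a
    # t0CPU[NN.N%] -> NN.N ; heap[A|B|C] -> A
    t0=$(printf '%s\n' "$line" | sed -n 's/.*[[:space:]]t0CPU\[\([0-9.]*\)%*].*/\1/p')
    heap_a=$(printf '%s\n' "$line" | sed -n 's/.*[[:space:]]heap\[\([0-9.]*\)|.*/\1/p')
    # awk does the fractional comparison that shell arithmetic cannot.
    awk -v t0="${t0:-0}" -v heap="${heap_a:-0}" 'BEGIN {
        if (t0 > 90)   print "t0CPU above 90%: packet capture limit"
        if (heap > 50) print "heap above 50%: I/O or CPU bottleneck"
    }'
}

# A quiet system produces no output.
check_status_line 'voipmonitor[15567]: SQLq[C:0 M:0 Cl:0] heap[0|0|0] comp[54] t0CPU[5.2%] t1CPU[1.2%]'
# Invented example of an overloaded system: prints both warnings.
check_status_line 'voipmonitor[15567]: SQLq[C:0 M:0 Cl:0] heap[72|60|10] comp[54] t0CPU[96.3%] t1CPU[1.2%]'

# Live use (not run here):
#   journalctl -u voipmonitor -f | while read -r l; do check_status_line "$l"; done
```

A growing <code>SQLq</code> cannot be detected from a single line, so it is left out here; it needs a comparison across consecutive samples.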


= MySQL performance =
'''Main article: [[Syslog_Status_Line]]''' - Complete reference for all metrics with detailed explanations and troubleshooting guidance.


Before you create the database make sure that you either run
'''For bottleneck diagnosis:''' See [[Sniffer_troubleshooting#Diagnose:_I.2FO_vs_CPU_Bottleneck|I/O vs CPU Bottleneck Diagnosis]] for step-by-step diagnostic procedure using syslog metrics and Linux tools.


MySQL>SET GLOBAL innodb_file_per_table=1;
== See Also ==


or set innodb_file_per_table = 1 in the [mysqld] section of the my.cnf file
* [[Sniffer_troubleshooting]] - Troubleshooting including OOM issues
* [[Data_Cleaning]] - Database and spool retention
* [[Sniffer_configuration]] - Complete configuration reference
* [[DPDK]] - DPDK setup guide
* [[Sniffer_distributed_architecture]] - Client-Server mode


This prevents the /var/lib/mysql/ibdata1 file from growing to a giant size; instead, data is organized in /var/lib/mysql/voipmonitor, which greatly increases I/O performance.
== AI Summary for RAG ==


== Write performance ==
<!-- This section is for AI/RAG systems. Do not edit manually. -->


Write performance depends heavily on whether the storage is also used for storing pcaps (thus sharing I/O with voipmonitor) and on how MySQL handles writes (the innodb_flush_log_at_trx_commit parameter, see below). Since sniffer version 6, MySQL tables use compression, which almost doubles write and read performance at little CPU cost (depending on the CPU type and the amount of traffic).
=== Summary ===
Performance tuning guide for high-traffic VoIPmonitor deployments. Covers three main bottlenecks: CPU (t0 packet capture thread, single-core limit), Disk I/O (PCAP storage), and Database (MySQL/MariaDB). Threading is automatic since 2023.11 via threading_expanded=yes (use high_traffic for ≥1500 Mbit/s). NIC tuning: ethtool ring buffer and interrupt coalescing. CPU optimization: interface_ip_filter instead of BPF, jitterbuffer_f2=yes with f1/adapt disabled. Kernel bypass solutions: DPDK (~70% CPU reduction), PF_RING ZC, Napatech SmartNICs (<3% CPU at 10Gbit). Disk I/O: TAR archives reduce IOPS 10x, ext4 tuning (noatime, writeback), RAID WriteBack cache with BBU. Database: innodb_buffer_pool_size (50-70% RAM for dedicated server), innodb_flush_log_at_trx_commit=2, partitioning. Multi-host architecture for >5000-10000 concurrent calls separating MySQL, sensors, and GUI.


=== innodb_flush_log_at_trx_commit ===
=== Keywords ===
scaling, performance, tuning, optimization, high traffic, bottleneck, CPU, t0CPU, t0 thread, single-core, disk I/O, storage, database, MySQL, MariaDB, threading_expanded, high_traffic, NIC tuning, ethtool, ring buffer, interrupt coalescing, interface_ip_filter, jitterbuffer, DPDK, PF_RING, Napatech, kernel bypass, TAR archive, ext4, noatime, writeback, RAID, WriteBack cache, BBU, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, partitioning, multi-host, component separation, concurrent calls, capacity, 10000 calls, heap, SQLq, compression threads, pcap_dump_writethreads


The default value of 1 means that each update transaction commit (or each statement outside a transaction) must flush the log to disk, which is rather expensive, especially if you do not have a battery-backed cache. Many applications are fine with value 2, which means the log is not flushed to disk on commit but only to the OS cache. The log is still flushed to disk every second, so you would normally not lose more than 1-2 seconds' worth of updates. Value 0 is a bit faster but a bit less safe, as you can lose transactions even if only the MySQL server crashes. Value 2 causes data loss only on a full OS crash.
=== Key Questions ===
If you are importing into or altering the cdr table, it is strongly recommended to temporarily set innodb_flush_log_at_trx_commit = 0, and to turn off the binlog if you are importing CDRs via inserts.
* How to tune VoIPmonitor for high traffic?
 
* How many concurrent calls can VoIPmonitor handle?
innodb_flush_log_at_trx_commit = 2
* What are the main performance bottlenecks?
 
* How to optimize CPU usage for packet capture?
=== compression ===
* What is threading_expanded and when to use high_traffic?
 
* How to tune NIC for VoIPmonitor?
The compression is enabled by default when you use mysql >=5.5
* How to reduce CPU with jitterbuffer settings?
 
* What is DPDK and when to use it?
==== MySQL 5.1 ====
* How to optimize disk I/O for PCAP storage?
 
* How to tune ext4 filesystem for VoIPmonitor?
set this value in the [mysqld] section of my.cnf:
* What is the recommended innodb_buffer_pool_size?
 
* How to configure MySQL for VoIPmonitor performance?
innodb_file_per_table = 1
* When to separate VoIPmonitor components to multiple hosts?
 
* How to monitor VoIPmonitor performance metrics?
==== MySQL 5.5, 5.6, 5.7 ====
* What do t0CPU, heap, SQLq metrics mean?
 
innodb_file_per_table = 1
innodb_file_format = barracuda
 
==== MySQL 8.0 ====
innodb_file_per_table = 1
 
 
==== Tune KEY_BLOCK_SIZE ====
 
If you choose KEY_BLOCK_SIZE=2 instead of the default 8, the compression ratio will be about twice as good, but with a CPU penalty on reads. We tested the difference between no compression, 8kb, and 2kb block size compression on 700,000 CDRs with the following result (on a single-core system; we do not know how it behaves on multi-core systems). The test query was a SELECT with GROUP BY.
No compression – 1.6 seconds
8kb -  1.7 seconds
4kb - 8 seconds
 
== Read performance ==
 
Read performance depends on how big the database is, how fast the disk is, and how much memory is allocated for the InnoDB cache. Since sniffer version 7 all large tables are partitioned by day, which reduces the need to allocate a very large cache to get good GUI performance. Partitioning has been available since MySQL 5.1 and is highly recommended. It also allows old data to be removed instantly by dropping a partition instead of deleting rows with DELETE, which can take hours on very large tables (millions of rows).
 
=== innodb_buffer_pool_size ===
 
This is very important variable to tune if you’re using Innodb tables. Innodb tables are much more sensitive to buffer size compared to MyISAM. MyISAM may work kind of OK with default key_buffer_size even with large data set but it will crawl with default innodb_buffer_pool_size. Also Innodb buffer pool caches both data and index pages so you do not need to leave space for OS cache so values up to 70-80% of memory often make sense for Innodb only installations.
 
We recommend setting this value to 50% of your available RAM: 2GB at least, 8GB is optimal. It all depends on how many CDRs you have per day.
 
put into the [mysqld] section of /etc/mysql/my.cnf (or /etc/my.cnf on redhat/centos):
innodb_buffer_pool_size = 8G
 
== Partitioning ==
 
Partitioning is enabled by default since version 7. To benefit from it (which we strongly recommend) you need to start with a clean database: there is no conversion procedure from an old database to a partitioned one. Just create a new database and start voipmonitor with it, and the partitions will be created. You can turn off partitioning by setting cdr_partition = no in voipmonitor.conf
 
== SSDs ==
 
When SSDs (partitions with fast access times) are used for the database's datadir, make sure that the MySQL options are not the limiting factor. Setting the following options optimally is essential.
(The example is for a MySQL 8.0 server with 256GB of RAM and an SSD filesystem mounted with the options rw,noatime,nodiratime,nobarrier,errors=remount-ro,stripe=1024,data=writeback):
innodb_flush_log_at_trx_commit=0
innodb_flush_log_at_timeout = 1800
innodb_flush_neighbors = 0
innodb_io_capacity = 1000000
innodb_io_capacity_max = 10000000
innodb_doublewrite=0
innodb_flush_method = O_DIRECT
innodb_read_io_threads = 20
innodb_write_io_threads = 20
innodb_purge_threads = 20
innodb_thread_concurrency = 40
transaction-isolation = READ-UNCOMMITTED
open_files_limit = 200000
skip-external-locking
skip-name-resolve
performance_schema=0
sort_buffer_size = 65M
max_heap_table_size = 24G
innodb_log_file_size = 5G
innodb_log_buffer_size = 2G
innodb_buffer_pool_size = 180G
 
 
 
== High calls per second configuration (>= 15000 CPS) ==
 
[mysqld]
default-authentication-plugin=mysql_native_password
skip-log-bin
symbolic-links=0
innodb_flush_log_at_trx_commit=0
innodb_flush_log_at_timeout = 1800
max_heap_table_size = 24G
innodb_log_file_size = 5G
innodb_log_buffer_size = 2G
innodb_file_per_table = 1
open_files_limit = 200000
skip-external-locking
key_buffer_size = 2G
sort_buffer_size = 65M
max_connections = 100000
max_connect_errors = 1000
skip-name-resolve
innodb_read_io_threads = 20
innodb_write_io_threads = 20
innodb_purge_threads = 20
innodb_thread_concurrency = 40
innodb_flush_neighbors = 0
innodb_io_capacity = 1000000
innodb_io_capacity_max = 10000000
innodb_doublewrite = 0
innodb_buffer_pool_size = 150G
innodb_flush_method = O_DIRECT
innodb_page_cleaners = 15
innodb_buffer_pool_instances = 15
log_timestamps = SYSTEM
transaction-isolation = READ-UNCOMMITTED
performance_schema=0
 
 
==== server voipmonitor.conf ====
 
All sensors MUST be configured to connect to one central voipmonitor sniffer, which is the only component that connects to MySQL. Clients do not send SQL to the database; only the central server communicates with the MySQL server.
Client sniffers do not need to mirror packets to the central voipmonitor sniffer: they connect only to send CDRs for central batch database inserts.
 
id_sensor = 1
mysql_enable_new_store = per_query
mysql_enable_set_id = yes
 
# clients will not send their queue while this limit is reached
server_sql_queue_limit = 50000
# maximum is 99 threads
mysqlstore_max_threads_cdr = 9
server_type_compress = lzo
 
# applies only when the CDR tables are being created; drop the cdr* tables when applying this change
disable_cdr_indexes_rtp = yes
# check whether the server supports lz4: SHOW GLOBAL STATUS WHERE Variable_name IN ('Innodb_have_lz4');
# set exactly one mysqlcompress_type, depending on MySQL / MariaDB:
# for MySQL 8:
mysqlcompress_type = compression="lz4"
# for MariaDB (uncomment this line and comment out the one above):
#mysqlcompress_type = PAGE_COMPRESSED=1
 
 
server_bind = 0.0.0.0
server_bind_port = 55551
server_password = yourpassword
query_cache = yes
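
The choice between the two mysqlcompress_type values can be scripted. The following is a minimal sketch, assuming the server version string comes from SELECT VERSION() and that any string without "MariaDB" in it is a MySQL 8 server; the function name compress_type_for is hypothetical and not part of VoIPmonitor.

```shell
# Print the mysqlcompress_type line for a server version string.
# MariaDB builds include "MariaDB" in the output of SELECT VERSION();
# anything else is treated as MySQL 8 here (an assumption of this sketch).
compress_type_for() (
    case "$1" in
        *MariaDB*) echo 'mysqlcompress_type = PAGE_COMPRESSED=1' ;;
        *)         echo 'mysqlcompress_type = compression="lz4"' ;;
    esac
)

# Usage on a live system (credentials omitted):
#   compress_type_for "$(mysql -N -e 'SELECT VERSION()')"
compress_type_for '10.11.6-MariaDB'
```

The printed line can be pasted into the server voipmonitor.conf. It only takes effect when the CDR tables are created.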
 
==== clients voipmonitor.conf ====
 
id_sensor = 2
query_cache = yes
server_destination = 192.168.0.1
server_destination_port = 55551
server_password = yourpassword
# maximum is 99 threads
mysqlstore_max_threads_cdr = 9
 

Latest revision as of 21:52, 20 January 2026


This guide covers performance tuning for high-traffic VoIPmonitor deployments, addressing the three primary system bottlenecks.

Understanding Performance Bottlenecks

A VoIPmonitor deployment's capacity is limited by three potential bottlenecks:

Bottleneck         Description                               Monitor
1. Packet Capture  Single CPU core reading packets from NIC  t0CPU in syslog
2. Disk I/O        Writing PCAP files to storage             iostat, ioping
3. Database        CDR ingestion and GUI queries             SQLq in syslog

Capacity: A modern server (24-core Xeon, 10Gbit NIC) can handle ~10,000 concurrent calls with full RTP recording, or 60,000+ with SIP-only analysis.

Optimizing Packet Capture (CPU & Network)

The packet capture thread (t0) runs on a single CPU core. If t0CPU approaches 100%, you've hit the capture limit.

With a modern kernel and VoIPmonitor build, a standard Intel 10Gbit NIC handles up to 3 Gbit/s of VoIP traffic without special drivers, and close to the full 10 Gbit/s rate with DPDK.

Threading (Automatic)

Since version 2023.11, VoIPmonitor uses threading_expanded=yes by default, which automatically spawns threads based on CPU load. No manual threading configuration is needed.

For very high traffic (≥1500 Mbit/s), set:

threading_expanded = high_traffic

See Threading Model for details.
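
To decide which value applies, the traffic rate can be estimated from the interface byte counters. The following is a rough sketch, assuming two readings of /sys/class/net/<iface>/statistics/rx_bytes taken a known number of seconds apart; the function names are hypothetical, the counter includes all received traffic (not only VoIP), and counter wrap-around is ignored.

```shell
# Convert two readings of an interface byte counter, taken `seconds` apart,
# into whole Mbit/s (1 Mbit = 1,000,000 bits).
mbit_per_s() (
    bytes_start=$1
    bytes_end=$2
    seconds=$3
    echo $(( (bytes_end - bytes_start) * 8 / seconds / 1000000 ))
)

# Map a traffic rate in Mbit/s to a threading_expanded value,
# using the 1500 Mbit/s threshold given above.
threading_mode() (
    if [ "$1" -ge 1500 ]; then echo high_traffic; else echo yes; fi
)

# Usage on a live system (eth0 is an example name):
#   a=$(cat /sys/class/net/eth0/statistics/rx_bytes); sleep 10
#   b=$(cat /sys/class/net/eth0/statistics/rx_bytes)
#   echo "threading_expanded = $(threading_mode "$(mbit_per_s "$a" "$b" 10)")"

# Worked example: 2,000,000,000 bytes in 10 seconds is 1600 Mbit/s.
echo "threading_expanded = $(threading_mode "$(mbit_per_s 0 2000000000 10)")"
```

Sampling during peak hours gives the most useful figure, since the setting should match the highest sustained load.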

NIC Tuning (>500 Mbit/s)

# Increase ring buffer (prevents packet loss during CPU spikes)
ethtool -g eth0                  # Check max size
ethtool -G eth0 rx 16384         # Set to max

# Enable interrupt coalescing (reduces CPU overhead)
ethtool -C eth0 rx-usecs 1022
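
The maximum ring size differs between NICs, so 16384 will not be accepted everywhere. The following is a small sketch that reads the pre-set maximum from the ethtool -g output instead of hard-coding it; the helper name max_rx_ring is hypothetical, and it assumes the "Pre-set maximums" block is printed before the current settings.

```shell
# Read `ethtool -g` output on stdin and print the pre-set maximum RX ring size.
# The first line starting with "RX:" belongs to the "Pre-set maximums" block,
# so the scan stops there and never reaches the current settings.
max_rx_ring() (
    awk '/^RX:/ { print $2; exit }'
)

# Usage on a live system (interface name is an example):
#   max=$(ethtool -g eth0 | max_rx_ring)
#   ethtool -G eth0 rx "$max"
```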

Persistent settings (Debian/Ubuntu /etc/network/interfaces):

auto eth0
iface eth0 inet manual
    up ip link set $IFACE up
    up ip link set $IFACE promisc on
    up ethtool -G $IFACE rx 16384
    up ethtool -C $IFACE rx-usecs 1022

Configuration Optimizations

Parameter                   Purpose              Recommendation
interface_ip_filter         IP-based filtering   More efficient than BPF filter
pcap_dump_writethreads_max  Compression threads  Set to CPU core count
jitterbuffer_f1/f2/adapt    Jitter simulation    Keep f2=yes, disable f1 and adapt to save CPU while keeping MOS

# /etc/voipmonitor.conf

# Efficient IP filtering (replaces BPF filter)
interface_ip_filter = 192.168.0.0/24
interface_ip_filter = 10.0.0.0/8

# Compression scaling
pcap_dump_writethreads = 1
pcap_dump_writethreads_max = 32
pcap_dump_asyncwrite = yes
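
The value 32 above matches a 32-core machine. The following is a minimal sketch for generating the line from the actual core count, assuming nproc (or getconf as a fallback) is available; the function name is hypothetical.

```shell
# Print a pcap_dump_writethreads_max line for the given core count,
# defaulting to the number of online cores on this machine.
writethreads_max_line() (
    cores=${1:-$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)}
    echo "pcap_dump_writethreads_max = $cores"
)

# With no argument, the local core count is used.
writethreads_max_line
```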

ℹ️ Note: Recommended: jitterbuffer_f1=no, jitterbuffer_f2=yes, jitterbuffer_adapt=no. This saves CPU while preserving MOS-F2 metrics. Only disable f2 if you don't need quality monitoring at all.

Kernel-Bypass Solutions

For extreme loads, bypass the kernel network stack entirely:

Solution            Type         CPU Reduction    Use Case
DPDK                Open-source  ~70%             Multi-gigabit on commodity hardware
PF_RING ZC          Commercial   90% → 20%        High-volume enterprise
Napatech SmartNICs  Hardware     <3% at 10Gbit/s  Extreme performance

Optimizing Disk I/O

VoIPmonitor Storage Strategy

VoIPmonitor groups all calls starting within the same minute into a single compressed .tar archive. This turns thousands of random writes into a few sequential writes, reducing IOPS by a factor of 10 or more.

Typical capacity: 7200 RPM SATA handles ~2,000 concurrent calls with full recording.

Filesystem Tuning (ext4)

# Format without journal (requires battery-backed RAID)
mke2fs -t ext4 -O ^has_journal /dev/sda2

# /etc/fstab
/dev/sda2  /var/spool/voipmonitor  ext4  errors=remount-ro,noatime,data=writeback,barrier=0  0 0

⚠️ Warning: Disabling journal removes crash protection. Only use with battery-backed RAID controller (BBU).
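
Before relying on these settings, it helps to confirm whether a filesystem currently has a journal. The following is a small sketch that inspects the "Filesystem features:" line printed by tune2fs -l; the function name journal_enabled is hypothetical.

```shell
# Read `tune2fs -l` output on stdin; exit 0 if has_journal is listed
# on the "Filesystem features:" line, non-zero otherwise.
journal_enabled() (
    awk '/^Filesystem features:/ {
             for (i = 3; i <= NF; i++) if ($i == "has_journal") found = 1
         }
         END { exit !found }'
)

# Usage on a live system (device name is an example):
#   if tune2fs -l /dev/sda2 | journal_enabled; then
#       echo "journal is enabled"
#   else
#       echo "journal is disabled"
#   fi
```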

RAID Controller

Set cache policy to WriteBack (not WriteThrough). Requires healthy BBU. Commands vary by vendor (megacli, ssacli, perccli).

Optimizing Database Performance

Memory Configuration

The most critical parameter is innodb_buffer_pool_size.

⚠️ Warning: Setting too high causes OOM killer events, CDR delays, and crashes. See OOM Troubleshooting.

Buffer Pool Sizing:

Server Type                   Formula                             Example (32GB RAM)
Shared (VoIPmonitor + MySQL)  (Total RAM - VoIPmonitor - OS) / 2  14GB
Dedicated MySQL server        50-70% of total RAM                 16-22GB
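
The shared-host formula can be worked through as follows. This is a sketch under stated assumptions: the 14GB example is reproduced by reserving 2GB for VoIPmonitor and 2GB for the OS, figures chosen here for illustration only, and the function name is hypothetical.

```shell
# Buffer pool size for a shared host, following
# (Total RAM - VoIPmonitor - OS) / 2. All values are whole gigabytes.
shared_buffer_pool_gb() (
    total_gb=$1
    voipmonitor_gb=$2
    os_gb=$3
    echo $(( (total_gb - voipmonitor_gb - os_gb) / 2 ))
)

# 32GB host, 2GB reserved for the sniffer and 2GB for the OS (assumed figures).
echo "innodb_buffer_pool_size = $(shared_buffer_pool_gb 32 2 2)G"
```

Measure the sniffer's real memory use (for example its RSS under peak load) before fixing the reservation, since it grows with traffic.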

RAM Recommendations:

Deployment Size     Minimum  Recommended
Small (<500 calls)  8GB      16GB
Medium (500-2000)   16GB     32GB
Large (>2000)       32GB     64GB+

Key MySQL Parameters

# /etc/mysql/my.cnf or mariadb.conf.d/50-server.cnf
[mysqld]
innodb_buffer_pool_size = 14G
innodb_flush_log_at_trx_commit = 2  # Faster, minimal data loss risk
innodb_file_per_table = 1           # Essential for partitioning
innodb_compression_algorithm = lz4  # MariaDB only

Slow Query Log

The slow query log can consume significant memory. Consider disabling on high-traffic systems:

[mysqld]
slow_query_log = 0
# Or increase threshold: long_query_time = 600

Database Partitioning

VoIPmonitor automatically partitions large tables (like cdr) by day. This is enabled by default and highly recommended.

See Database Partitioning for details.

Troubleshooting: Connection Refused

Symptoms: GUI crashes, "Connection refused" errors, intermittent issues during peak volumes.

Cause: innodb_buffer_pool_size too low (default 128M is insufficient).

Solution: Increase to 6G+ based on available RAM:

[mysqld]
innodb_buffer_pool_size = 6G

systemctl restart mariadb

Component Separation (Multi-Host Architecture)

For deployments exceeding 5,000-10,000 concurrent calls, separate VoIPmonitor components onto dedicated hosts.

Architecture Overview

Host    Component       Primary Resources         Scaling Strategy
Host 1  MySQL Database  RAM, fast SSD             Add RAM, read replicas
Host 2  Sensor(s)       CPU (t0 thread), network  DPDK/PF_RING, more sensors
Host 3  GUI             CPU, network              Load balancer, caching

Configuration

MySQL Server:

# /etc/mysql/my.cnf
[mysqld]
bind-address = 0.0.0.0
innodb_buffer_pool_size = 50G  # 50-70% RAM for dedicated server

CREATE USER 'voipmonitor'@'%' IDENTIFIED BY 'strong_password';
GRANT ALL PRIVILEGES ON voipmonitor.* TO 'voipmonitor'@'%';

Sensor:

# /etc/voipmonitor.conf
id_sensor = 1
mysqlhost = mysql.server.ip
mysqldb = voipmonitor
mysqlusername = voipmonitor
mysqlpassword = strong_password

GUI: Configure via Settings > System Configuration > Database, or edit config/system_configuration.php.

Firewall Rules:

Source  Destination  Port     Purpose
Sensor  MySQL        3306     CDR writes
GUI     MySQL        3306     Queries
GUI     Sensor(s)    5029     PCAP retrieval
Users   GUI          80, 443  Web access

ℹ️ Note: Component separation can be combined with Client-Server mode for multi-site deployments.

Monitoring Performance

VoIPmonitor logs performance metrics every 10 seconds to syslog. Key metrics to watch:

Metric       Warning Sign  Bottleneck Type
t0CPU        >90%          CPU (packet capture limit)
heap[A|B|C]  A >50%        I/O or CPU (buffer filling)
SQLq         Growing       Database
comp         Maxed out     I/O (compression waiting for disk)

# Monitor in real-time
journalctl -u voipmonitor -f

Main article: Syslog_Status_Line - Complete reference for all metrics with detailed explanations and troubleshooting guidance.

For bottleneck diagnosis: See I/O vs CPU Bottleneck Diagnosis for step-by-step diagnostic procedure using syslog metrics and Linux tools.
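
The t0CPU warning level from the table can be checked automatically. The following is a rough sketch, assuming the status line contains the field in the form t0CPU[12.3%]; that format, the sample line, and the function names are assumptions of this example, so compare them with your own syslog output before use.

```shell
# Extract the t0CPU percentage from one status line (assumed form: t0CPU[12.3%]).
t0cpu_of() (
    printf '%s\n' "$1" | sed -n 's/.*t0CPU\[\([0-9.]*\)%].*/\1/p'
)

# Exit 0 when the value is above the 90% warning level, non-zero otherwise
# (including when no t0CPU field is found).
t0cpu_is_high() (
    value=$(t0cpu_of "$1")
    [ -n "$value" ] && awk -v v="$value" 'BEGIN { exit !(v > 90) }'
)

# Hypothetical sample line, trimmed to a few fields.
sample='calls[315] SQLq[0] heap[0|0|0] comp[54] t0CPU[93.5%]'
if t0cpu_is_high "$sample"; then
    echo "t0CPU $(t0cpu_of "$sample")% is above 90%"
fi
```

On a live system, the same check can be applied to each line read from journalctl -u voipmonitor -f.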

AI Summary for RAG

Summary

Performance tuning guide for high-traffic VoIPmonitor deployments. Covers three main bottlenecks: CPU (t0 packet capture thread, single-core limit), Disk I/O (PCAP storage), and Database (MySQL/MariaDB). Threading is automatic since 2023.11 via threading_expanded=yes (use high_traffic for ≥1500 Mbit/s). NIC tuning: ethtool ring buffer and interrupt coalescing. CPU optimization: interface_ip_filter instead of BPF, jitterbuffer_f2=yes with f1/adapt disabled. Kernel bypass solutions: DPDK (~70% CPU reduction), PF_RING ZC, Napatech SmartNICs (<3% CPU at 10Gbit). Disk I/O: TAR archives reduce IOPS 10x, ext4 tuning (noatime, writeback), RAID WriteBack cache with BBU. Database: innodb_buffer_pool_size (50-70% RAM for dedicated server), innodb_flush_log_at_trx_commit=2, partitioning. Multi-host architecture for >5000-10000 concurrent calls separating MySQL, sensors, and GUI.

Keywords

scaling, performance, tuning, optimization, high traffic, bottleneck, CPU, t0CPU, t0 thread, single-core, disk I/O, storage, database, MySQL, MariaDB, threading_expanded, high_traffic, NIC tuning, ethtool, ring buffer, interrupt coalescing, interface_ip_filter, jitterbuffer, DPDK, PF_RING, Napatech, kernel bypass, TAR archive, ext4, noatime, writeback, RAID, WriteBack cache, BBU, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, partitioning, multi-host, component separation, concurrent calls, capacity, 10000 calls, heap, SQLq, compression threads, pcap_dump_writethreads

Key Questions

  • How to tune VoIPmonitor for high traffic?
  • How many concurrent calls can VoIPmonitor handle?
  • What are the main performance bottlenecks?
  • How to optimize CPU usage for packet capture?
  • What is threading_expanded and when to use high_traffic?
  • How to tune NIC for VoIPmonitor?
  • How to reduce CPU with jitterbuffer settings?
  • What is DPDK and when to use it?
  • How to optimize disk I/O for PCAP storage?
  • How to tune ext4 filesystem for VoIPmonitor?
  • What is the recommended innodb_buffer_pool_size?
  • How to configure MySQL for VoIPmonitor performance?
  • When to separate VoIPmonitor components to multiple hosts?
  • How to monitor VoIPmonitor performance metrics?
  • What do t0CPU, heap, SQLq metrics mean?