Scaling

Revision as of 20:30, 4 January 2026

Category:Administration

This guide covers performance tuning and scaling for VoIPmonitor. It describes the three primary system bottlenecks and gives practical advice for optimizing a deployment under high traffic load.

Understanding Performance Bottlenecks

A VoIPmonitor deployment's maximum capacity is determined by three potential bottlenecks. Identifying and addressing the correct one is key to achieving high performance.

The three bottlenecks are:

Packet Capturing (CPU & Network Stack): The ability of a single CPU core to read packets from the network interface. This is often the first limit encountered.
Disk I/O (Storage): The speed at which the sensor can write PCAP files to disk. Critical when call recording is enabled.
Database Performance (MySQL/MariaDB): The rate at which the database can ingest CDRs and serve data to the GUI.

On a modern, well-tuned server (e.g., 24-core Xeon, 10Gbit NIC), a single VoIPmonitor instance can handle up to 10,000 concurrent calls with full RTP analysis and recording, or over 60,000 concurrent calls with SIP-only analysis.

Optimizing Packet Capturing (CPU & Network)

The most performance-critical task is the initial packet capture, handled by a single, highly optimized thread (t0). If this thread's CPU usage (t0CPU in logs) approaches 100%, you are hitting the capture limit.

Use a Modern Linux Kernel & VoIPmonitor Build

Modern Linux kernels (3.2+) and VoIPmonitor builds include TPACKET_V3 support, a high-speed packet capture mechanism. This is the single most important factor for high performance.

Recommendation: Always use a recent Linux distribution (AlmaLinux, Rocky Linux, or Debian) and the latest VoIPmonitor static binary. With this combination, a standard Intel 10Gbit NIC can often handle up to 2 Gbit/s of VoIP traffic without special drivers.

Network Stack & Driver Tuning

For high-traffic environments (>500 Mbit/s), fine-tuning the network driver and kernel parameters is essential.

NIC Ring Buffer

The ring buffer is a queue between the network card driver and VoIPmonitor. A larger buffer prevents packet loss during short CPU usage spikes.

# Check maximum size
ethtool -g eth0

# Set to maximum (e.g., 16384)
ethtool -G eth0 rx 16384
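
The maximum ring size differs between NICs, so it can be read from the `ethtool -g` output instead of being hard-coded. The following is a minimal sketch, assuming the usual `ethtool -g` layout with a "Pre-set maximums:" section followed by "Current hardware settings:". The helper name `max_rx_ring` is hypothetical and `eth0` is only an example interface.

```shell
#!/bin/sh
# Print the pre-set maximum RX ring size from `ethtool -g` output on stdin.
# Assumes the "Pre-set maximums:" section comes before
# "Current hardware settings:" (hypothetical helper, not part of VoIPmonitor).
max_rx_ring() {
    awk '
        /^Pre-set maximums:/          { in_max = 1; next }
        /^Current hardware settings:/ { in_max = 0 }
        in_max && $1 == "RX:"         { print $2 }
    '
}

# Example usage (needs root; eth0 is an example interface name):
#   max=$(ethtool -g eth0 | max_rx_ring)
#   [ -n "$max" ] && ethtool -G eth0 rx "$max"
```

Parsing the reported maximum avoids the error that `ethtool -G` returns when the requested size exceeds what the hardware supports.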

Interrupt Coalescing

This setting batches multiple hardware interrupts into one, reducing CPU overhead.

ethtool -C eth0 rx-usecs 1022
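
Not every driver accepts every coalescing value, so it can help to confirm what the NIC actually applied. This is a small sketch that reads the current value from `ethtool -c` output, assuming the driver reports a line beginning with `rx-usecs:`. The helper name `current_rx_usecs` is hypothetical.

```shell
#!/bin/sh
# Print the current rx-usecs value from `ethtool -c` output on stdin.
# Matches only the exact "rx-usecs:" field, not rx-usecs-irq and similar.
current_rx_usecs() {
    awk '$1 == "rx-usecs:" { print $2 }'
}

# Example usage (eth0 is an example interface name):
#   ethtool -c eth0 | current_rx_usecs
```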

Applying Settings Persistently

To make these settings permanent, add them to your network configuration. For Debian/Ubuntu using /etc/network/interfaces:

auto eth0
iface eth0 inet manual
    up ip link set $IFACE up
    up ip link set $IFACE promisc on
    up ethtool -G $IFACE rx 16384
    up ethtool -C $IFACE rx-usecs 1022

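Note: Systems managed by NetworkManager or systemd-networkd do not read /etc/network/interfaces. On those systems, apply the same ethtool commands through that tool's own mechanism (for example, a NetworkManager dispatcher script or a systemd unit).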

Advanced Kernel-Bypass Solutions

If kernel and driver tuning are insufficient, you can offload the capture process entirely by bypassing the kernel's network stack.

Solution	Type	CPU Impact	Use Case
DPDK	Open-source	~70% reduction	Multi-gigabit on commodity hardware
PF_RING ZC/DNA	Commercial	Load drops from 90% to 20%	High-volume enterprise
Napatech SmartNICs	Hardware	<3% load at 10 Gbit/s	Extreme performance requirements

DPDK (Data Plane Development Kit): A set of libraries and drivers for fast packet processing. VoIPmonitor can leverage DPDK to read packets directly from the network card, completely bypassing the kernel. See DPDK guide for details.

PF_RING ZC/DNA: A commercial software driver from ntop.org that dramatically reduces CPU load by bypassing the kernel.

Napatech SmartNICs: Specialized hardware acceleration cards that deliver packets with near-zero CPU overhead.

Optimizing Disk I/O

VoIPmonitor's modern storage engine is highly optimized to minimize random disk access, which is the primary cause of I/O bottlenecks.

VoIPmonitor Storage Strategy

Instead of writing a separate PCAP file for each call (which causes massive I/O load), VoIPmonitor groups all calls starting within the same minute into a single compressed .tar archive. This changes the I/O pattern from thousands of small, random writes to a few large, sequential writes, reducing IOPS by a factor of 10 or more.

Typical capacity: A standard 7200 RPM SATA drive can handle up to 2,000 concurrent calls with full recording.

Filesystem Tuning (ext4)

For the spool directory (/var/spool/voipmonitor), using an optimized ext4 filesystem can improve performance.

# Format partition without a journal (use with caution, requires battery-backed RAID controller)
mke2fs -t ext4 -O ^has_journal /dev/sda2

# Add to /etc/fstab for optimal performance
/dev/sda2   /var/spool/voipmonitor  ext4    errors=remount-ro,noatime,data=writeback,barrier=0 0 0

⚠️ Warning: Disabling the journal removes protection against filesystem corruption after crashes. Only use this with a battery-backed RAID controller.
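
Before relying on the mount options above, it may be worth confirming whether the partition really has a journal. This is a minimal sketch that inspects the "Filesystem features:" line printed by `tune2fs -l`. The helper name `has_journal` is hypothetical and `/dev/sda2` is the example device used above.

```shell
#!/bin/sh
# Exit 0 if the "Filesystem features:" line on stdin lists has_journal,
# exit non-zero otherwise (hypothetical helper for illustration).
has_journal() {
    grep '^Filesystem features:' | grep -qw 'has_journal'
}

# Example usage (needs root; /dev/sda2 is the example device from above):
#   if tune2fs -l /dev/sda2 | has_journal; then
#       echo "journal enabled"
#   else
#       echo "journal disabled"
#   fi
```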

RAID Controller Cache Policy

A misconfigured RAID controller is a common bottleneck. For database and spool workloads, the cache policy should be set to WriteBack, not WriteThrough. This applies to rotational (spinning) disks, not to fast SSDs.

Requirements:

A healthy Battery Backup Unit (BBU) is required
Specific commands vary by vendor (megacli, ssacli, perccli)
Refer to vendor documentation for LSI, HP, and Dell controllers

Optimizing Database Performance (MySQL/MariaDB)

A well-tuned database is critical for both data ingestion from the sensor and GUI responsiveness.

Memory Configuration

The most critical database parameter is innodb_buffer_pool_size, which defines how much memory InnoDB uses to cache data and indexes.

⚠️ Warning: On servers running both VoIPmonitor and MySQL, setting innodb_buffer_pool_size too high causes OOM (Out of Memory) killer events, resulting in CDR delays, crashes, and instability. See OOM Troubleshooting for details.

Buffer Pool Sizing

Server Type	Calculation	Example (32GB RAM)
Shared (VoIPmonitor + MySQL)	(Total RAM - VoIPmonitor - OS overhead) / 2	14GB
Dedicated MySQL server	50-70% of total RAM	16-22GB

For shared servers, use this formula:

innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS overhead - safety margin) / 2

Example for a 32GB server:
- Total RAM: 32GB
- VoIPmonitor process memory: ~2GB (check with ps aux)
- OS + other services overhead: ~2GB
- Remaining after subtraction: 32 - 2 - 2 = 28GB (subtract any extra safety margin here)
- Recommended innodb_buffer_pool_size = 28GB / 2 = 14G
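
The shared-server formula can be sketched as a small shell function. The function name `buffer_pool_gb` and its argument order are hypothetical. All values are whole gigabytes, and the safety margin defaults to 0 to match the example above.

```shell
#!/bin/sh
# Suggested innodb_buffer_pool_size in GB for a shared server.
# Args: total RAM, VoIPmonitor memory, OS overhead, safety margin (optional).
# Implements (total - voipmonitor - os - margin) / 2 with integer division.
buffer_pool_gb() {
    total=$1 voip=$2 os=$3 margin=${4:-0}
    echo $(( (total - voip - os - margin) / 2 ))
}

buffer_pool_gb 32 2 2      # prints 14
buffer_pool_gb 32 2 2 2    # prints 13
```

Measure the real inputs on the target host (for example, MemTotal in /proc/meminfo and the VoIPmonitor RSS value from the status log) instead of relying on the approximate figures used here.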

RAM Recommendations

Deployment Size	Minimum RAM	Recommended RAM
Small (<500 concurrent calls)	8GB	16GB
Medium (500-2000 calls)	16GB	32GB
Large (>2000 calls)	32GB	64GB+

Disable Graphical Desktop

A graphical desktop environment consumes 1-2GB of RAM unnecessarily. VoIPmonitor is managed through a web interface and does not require a desktop.

# Disable display manager
systemctl stop gdm          # Ubuntu/Debian with GDM
systemctl disable gdm

# Set default to multi-user (no GUI)
systemctl set-default multi-user.target

# Verify memory freed
free -h

Other Key Parameters

# /etc/mysql/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf

[mysqld]
# Buffer pool size (calculate per above)
innodb_buffer_pool_size = 14G

# Write the log to the OS cache at each commit, flush to disk once per second
# (faster; up to about one second of transactions can be lost on an OS crash or power failure)
innodb_flush_log_at_trx_commit = 2

# Store each table in its own file (essential for partitioning)
innodb_file_per_table = 1

# LZ4 page compression (MariaDB only; remove this line on MySQL, which does not have this variable)
innodb_compression_algorithm = lz4

Database Partitioning

VoIPmonitor automatically splits large tables (like cdr) into daily partitions. This is enabled by default and highly recommended.

Benefits:

Massively improves GUI query performance (only relevant partitions are scanned)
Allows instant deletion of old data by dropping partitions (thousands of times faster than DELETE)

See Database Partitioning for configuration details.
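
To see which partitions currently exist, the standard information_schema.PARTITIONS view can be queried. This is a minimal sketch. The helper name `partition_query` is hypothetical, and the schema name `voipmonitor` in the usage line is an assumption about the local database name.

```shell
#!/bin/sh
# Build a query listing partitions and approximate row counts for one table.
# Args: schema name, table name (hypothetical helper for illustration).
partition_query() {
    schema=$1 table=$2
    printf "SELECT PARTITION_NAME, TABLE_ROWS FROM information_schema.PARTITIONS WHERE TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s' ORDER BY PARTITION_ORDINAL_POSITION;\n" "$schema" "$table"
}

# Example usage (schema name "voipmonitor" is an assumption):
#   mysql -e "$(partition_query voipmonitor cdr)"
```

TABLE_ROWS is an estimate for InnoDB tables, so treat the counts as approximate.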

Monitoring Live Performance

VoIPmonitor logs detailed performance metrics every 10 seconds to syslog.

# Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor

# CentOS/RHEL
tail -f /var/log/messages | grep voipmonitor

Understanding the Log Output

Sample log line:

voipmonitor[15567]: calls[315][355] PS[C:4 S:29/29 R:6354 A:6484] SQLq[0] heap[0|0|0] comp[54] [12.6Mb/s] t0CPU[5.2%] ... RSS/VSZ[323|752]MB

Metric	Description	Warning Threshold
`calls[X][Y]`	X = active calls, Y = total calls in memory	-
`SQLq[C]`	SQL queries waiting to be sent to database	Growing consistently = DB bottleneck
`heap[A|B|C]`	Memory usage % for internal buffers	A = 100% → packet drops
`t0CPU[X%]`	Main packet capture thread CPU usage	>90-95% = capture limit reached
`RSS/VSZ[X|Y]MB`	Resident/Virtual memory usage	RSS growing = memory leak
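
The metrics in the table can be pulled out of a status line with standard tools. This is a rough sketch, assuming the `NAME[value]` layout of the sample log line. The helper names `metric` and `t0cpu_over` are hypothetical, and the threshold of 90 follows the warning level in the table.

```shell
#!/bin/sh
# Print the text inside NAME[...] for the metric named in $1,
# reading one status line from stdin. For calls[X][Y] this prints X.
metric() {
    sed -n "s|.*$1\[\([^]]*\)].*|\1|p"
}

# Exit 0 if t0CPU in the status line on stdin exceeds the limit in $1.
t0cpu_over() {
    metric t0CPU | awk -v limit="$1" '
        { sub(/%/, ""); found = ($0 + 0 > limit + 0) }
        END { exit !found }
    '
}

# Example usage against the live log (Debian/Ubuntu path):
#   tail -f /var/log/syslog | grep --line-buffered voipmonitor |
#   while read -r line; do
#       printf '%s\n' "$line" | t0cpu_over 90 && echo "t0CPU above 90%"
#   done
```

The same `metric` helper works for `SQLq`, `heap`, and `RSS/VSZ`, so a steadily growing SQL queue can be tracked in the same loop.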

Performance Diagrams

The following diagrams illustrate the difference between standard kernel packet capture and optimized solutions:

Standard kernel packet capture path - packets traverse multiple kernel layers before reaching VoIPmonitor

PF_RING/DPDK bypass mode - packets are delivered directly to VoIPmonitor, bypassing the kernel network stack

AI Summary for RAG

Summary: Expert guide to scaling VoIPmonitor for high-traffic environments. Covers three main bottlenecks: (1) Packet Capturing - optimized via TPACKET_V3, NIC tuning with ethtool, and kernel-bypass solutions (DPDK, PF_RING, Napatech); (2) Disk I/O - VoIPmonitor uses TAR-based storage to reduce IOPS, with ext4 tuning and RAID WriteBack cache; (3) Database - critical innodb_buffer_pool_size tuning with formula for shared servers: (Total RAM - VoIPmonitor - OS overhead) / 2. For 32GB shared server, recommend 14GB buffer pool. Dedicated servers can use 50-70% of RAM. Covers partitioning benefits and syslog monitoring (t0CPU, SQLq, heap metrics).

Keywords: scaling, performance tuning, bottleneck, t0CPU, TPACKET_V3, DPDK, PF_RING, ethtool, ring buffer, innodb_buffer_pool_size, OOM killer, shared server memory, database partitioning, SQLq monitoring

Key Questions:

How do I scale VoIPmonitor for thousands of concurrent calls?
What are the main performance bottlenecks in VoIPmonitor?
How do I fix high t0CPU usage?
What is DPDK and when should I use it?
How do I calculate innodb_buffer_pool_size for a shared server?
What happens if innodb_buffer_pool_size is set too high?
How do I interpret the performance metrics in syslog?
Should I use a dedicated database server for VoIPmonitor?
How much RAM does a VoIPmonitor server need?