SQL queue is growing in a peaktime
Delay Between Active Calls and CDR View
The Active Calls view in the GUI displays active calls (INVITEs) and their start times as reported by the VoIPmonitor sniffer service, whereas the CDR view shows the stored CDRs (written after each call ends) from the database.
Symptoms of Database Delays
When the database cannot keep up with CDR insertion rates, you may experience:
- Slow CDR appearance in GUI - New calls take minutes to appear after they end
- "Crontab log is too old" warning - The cron job runs slowly (every 5-10 minutes or more) instead of every minute due to database overload
- Lag between call end and reporting - Daily reports and alerts process outdated data
To diagnose cron-related delays specifically, see "Crontab log is too old" troubleshooting.
SQLq/SQLf
In the service status line (expanded status) under GUI → Settings → Sensors → Status, the SQLq/SQLf values represent the size of the queue of CDRs waiting to be pushed to the database. A high SQLq/SQLf usually indicates that the database cannot process requests in a timely manner, causing them to queue. See Logging#SQLq/SQLf for details.
Make Sure the Database Configuration Is Not Causing I/O Overhead
First, verify that the MySQL configuration follows the recommendations from the Scaling section of our documentation (especially innodb_flush_log_at_trx_commit and innodb_buffer_pool_size):
https://www.voipmonitor.org/doc/Scaling#SSDs
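A minimal sketch of the two key settings in my.cnf, assuming a dedicated database server; the values shown are illustrative only and should be sized according to the Scaling page:
# /etc/mysql/my.cnf (or /etc/my.cnf), [mysqld] section - illustrative values
[mysqld]
# Flush the InnoDB log roughly once per second instead of on every commit
innodb_flush_log_at_trx_commit = 2
# Keep the working set in RAM; commonly ~50-70% of RAM on a dedicated DB host
innodb_buffer_pool_size = 16G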
Quick CDR Visibility (Reduce Delay)
If CDRs are taking too long to appear in the GUI after calls end (typically 30-60 seconds delay), you can reduce this delay by enabling quick CDR save mode. Add this parameter to /etc/voipmonitor.conf:
quick_save_cdr = yes
This speeds up the visibility of calls in the GUI by reducing the buffering delay before CDRs are written to the database.
⚠️ Warning:
Performance Impact: Enabling quick_save_cdr increases CPU and I/O load on the database server by forcing more frequent CDR writes. Only use this if near-real-time CDR visibility is absolutely required.
Available options:
- no (default) - 10 second delay, recommended for most deployments
- yes - 3 second delay, moderate performance impact
- quick - 1 second delay, high performance impact
See Sniffer Configuration for more details on this parameter.
Enable Disk-Based Query Queue (Prevent OOM)
If your system is experiencing Out Of Memory (OOM) issues or if the SQL queue keeps growing during peak traffic, ensure that disk-based query queuing is enabled. Add or verify this parameter in /etc/voipmonitor.conf:
query_cache = yes
This is a critical feature. When enabled, SQL queries are first saved to disk-based queue files (qoq* files in the spool directory) before being sent to the database. This prevents OOM and data loss if the database is temporarily unavailable. The default is yes, but if it was previously set to no, changing it to yes will use disk storage instead of RAM for queued queries.
ℹ️ Note:
Do not set query_cache = no to improve performance. While disabling the query cache reduces the CDR delay by holding queries in RAM instead of writing them to qoq files, it creates a severe risk: if the database becomes unreachable and memory fills up, the OOM killer will terminate the VoIPmonitor process and all queued CDRs will be lost. Keep query_cache = yes for data safety, and use quick_save_cdr instead for faster CDR visibility.
See Sniffer Configuration for more details on this parameter.
More Threads/Connections to the Database
You can also increase the number of threads used for database connections for a particular workload, such as CDR inserts. The VoIPmonitor sniffer service reads this option from /etc/voipmonitor.conf:
mysqlstore_max_threads_cdr = 8
Important: mysql_enable_set_id Limitation
If you are using mysql_enable_new_store = per_query or have enabled mysql_enable_set_id = yes, please note that this configuration limits your setup to a single sensor writing to the database. This option allows the central server to generate CDR IDs instead of waiting for MySQL's AUTO_INCREMENT, which can improve batch insert performance. However, this architecture only works when one sensor is responsible for all database writes. If you have multiple sensors writing to the same database, do not enable this option as it will cause ID conflicts and data corruption.
If you are processing a high volume of SIP OPTIONS, SUBSCRIBE, or NOTIFY messages and see the sip_msg queue growing, you can increase threads specifically for those messages:
mysqlstore_max_threads_sip_msg = 8
However, if the database is waiting for storage I/O, increasing the number of threads will not help.
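To check whether the database is I/O-bound rather than CPU-bound, watch the disk that holds the MySQL datadir during peak traffic. A simple check with iostat (from the sysstat package; the device name below is just an example):
# Extended per-device statistics, refreshed every second
iostat -x 1
# Look at the device holding /var/lib/mysql (e.g. nvme0n1 or sda):
# %util near 100% together with high await values indicates a storage bottleneck -
# in that case faster storage (NVMe) will help more than additional DB threads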
Clearing File Queue Backlog (qoq* Files)
When the file queue grows and recent calls do not appear in the CDR view even though active calls are visible, you may have a backlog of qoq* files in the spool directory waiting to be processed.
What are qoq* Files?
The VoIPmonitor sniffer creates qoq* (queue) files in the spool directory (default: /var/spool/voipmonitor) to buffer database operations. These files contain queued SQL commands waiting to be inserted into the database.
When the database cannot keep up with the insertion rate:
- Active calls are visible in the GUI (retrieved from sensor memory in real-time)
- Recent CDRs do not appear (waiting in qoq queue files)
- SQLq metric stays high or continues growing
Solution 1: Wait for Queue Processing
The system will automatically process the qoq* files as the database catches up. Monitor the progress:
# Check the SQLf parameter in logs to monitor queue size
# SQLf should decrease over time as files are processed
# Count the qoq files currently waiting in the spool directory
ls /var/spool/voipmonitor/qoq* 2>/dev/null | wc -l
This is the preferred approach if losing older CDRs is not acceptable.
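For ongoing monitoring, a simple loop can show whether the backlog is shrinking (assumes the default spool directory):
# Re-check the backlog every 30 seconds; both the file count and total size should trend down
watch -n 30 'ls /var/spool/voipmonitor/qoq* 2>/dev/null | wc -l; du -ch /var/spool/voipmonitor/qoq* 2>/dev/null | tail -1'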
Solution 2: Delete qoq* Files (Emergency)
If immediate access to recent CDRs is critical and you can afford to lose older CDRs, you can delete the qoq* files to clear the backlog:
# WARNING: This permanently discards any CDRs still waiting in the queue files
# Stop the VoIPmonitor service first
systemctl stop voipmonitor
# Delete all qoq* files from the spool directory
rm -f /var/spool/voipmonitor/qoq*
# Start the VoIPmonitor service
systemctl start voipmonitor
⚠️ Warning: Data Loss: Deleting qoq* files permanently discards any CDRs that were still waiting in the queue files. Only use this method if:
- Immediate access to recent CDRs is critical
- Losing older CDRs is acceptable
- You have exhausted all other options (configuration tuning, hardware upgrade)
This clears the backlog and allows new CDRs to be written immediately without waiting for the old queue to process.
When Configuration Tuning Is Not Enough: Hardware Upgrade Required
If you have applied all the configuration optimizations above and the SQL queue continues to grow or the database remains significantly behind the processing queue, the underlying issue may be insufficient hardware.
Signs That Hardware Upgrade Is Necessary
- CPU load is consistently at or near 100% on both the database and GUI servers, even during normal traffic patterns
- Old or aging hardware - Servers with CPUs more than several years old may lack the performance of modern processors
- Configuration tuning provides only marginal improvement - After applying MySQL and VoIPmonitor optimizations, the delay between the "Last CDR in processing queue" and "Last CDR in database" remains significant
Diagnosing Hardware Limitations
Monitor CPU usage on both the database and GUI servers:
# Check CPU load during peak traffic
top
# Or use sar for historical data
sar -u 1 10
# Check CPU core usage per process
mpstat -P ALL 1 5
If the CPU load is consistently at or near 100% across multiple cores, the hardware cannot keep up with the traffic load. No amount of configuration tuning will solve this issue - the servers themselves need to be upgraded.
Hardware Upgrade Recommendations
Upgrade to a more modern CPU architecture with significantly better performance:
- CPU - Modern AMD EPYC or Intel Xeon Gold/Silver processors with more cores
- RAM - Ensure sufficient memory for the database buffer pool (see Memory Configuration)
- Storage - Use NVMe SSDs for the database to eliminate I/O bottlenecks
Architecture Consideration: Merge GUI and Database
In some cases, merging the GUI and database roles onto a single, powerful new server can be more efficient than maintaining separate, underpowered servers. A single modern server with sufficient CPU cores and RAM can often handle both workloads more effectively than multiple older servers.
For hardware sizing examples, see the Hardware page, which includes real-world deployments for various call volumes.
Migrating MySQL Data to Faster Storage
When upgrading from HDDs or slow SATA SSDs to NVMe storage, you can migrate the MySQL data directory (`datadir`) without losing CDRs: the sniffer queues CDRs internally while MySQL is stopped and processes them once MySQL is running again.
Prerequisites:
- New SSD/NVMe storage installed and mounted
- Sufficient disk space on the new storage for the existing MySQL data (see the quick check below)
- MySQL/MariaDB service can be stopped briefly for the migration
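A quick way to verify the space requirement in advance; the mount point follows the example used in the procedure below:
# Size of the current MySQL data directory
du -sh /var/lib/mysql
# Free space on the new storage
df -h /mnt/fast_storage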
Migration Procedure:
# 1. Prepare the new storage (if your SSD is hot-pluggable)
# Mount the new filesystem. Example:
mount /dev/nvme0n1p1 /mnt/fast_storage
mkdir /mnt/fast_storage/mysql
# 2. Stop the MySQL service
systemctl stop mysql
# Or for MariaDB:
# systemctl stop mariadb
# 3. Copy the MySQL data directory to the new location
# -a preserves permissions, ownership and timestamps; -v is verbose; -x stays on one filesystem
rsync -avx /var/lib/mysql/ /mnt/fast_storage/mysql/
# 4. Verify the copy
ls -la /mnt/fast_storage/mysql/
# Check that all databases are present
# 5. Update MySQL configuration to point to the new datadir location
# Edit /etc/mysql/my.cnf or /etc/my.cnf and change:
# datadir = /var/lib/mysql
# To:
# datadir = /mnt/fast_storage/mysql
# Also update the socket location if needed:
# socket = /tmp/mysql.sock (or your preferred location)
# 6. Update AppArmor/SELinux (Ubuntu/Debian only)
# Edit /etc/apparmor.d/usr.sbin.mysqld and update the paths:
# /var/lib/mysql/ r,
# /var/lib/mysql/** rwk,
# Change to:
# /mnt/fast_storage/mysql/ r,
# /mnt/fast_storage/mysql/** rwk,
# Reload AppArmor:
systemctl reload apparmor
# 7. Start MySQL with the new datadir
systemctl start mysql
# 8. Verify MySQL is running and databases are accessible
mysql -e "SHOW DATABASES;"
mysql -e "SELECT COUNT(*) FROM cdr;"
# 9. Monitor the SQL queue in the VoIPmonitor GUI
# Navigate to GUI → Settings → Sensors → Status
# The SQLq value should decrease as queued CDRs are processed
Important Notes:
- No CDR Loss: The VoIPmonitor sniffer queues CDRs in memory during MySQL downtime. These will be processed after MySQL restarts.
- Backup First: Always take a backup of `/var/lib/mysql` before migration.
- Service Downtime: Plan for MySQL to be stopped for the duration of the copy operation (depends on database size and storage speed).
- Storage Mount Options: For the new database partition, mount with ext4 optimizations:
/dev/nvme0n1p1 /mnt/fast_storage ext4 defaults,noatime,data=writeback,barrier=0 0 1
- Permissions: Ensure the new datadir and all files are owned by the MySQL user (`mysql:mysql`); a chown example follows this list.
- Symbolic Links Alternative: You can create a symbolic link instead of changing the datadir in my.cnf:
# After stopping MySQL and copying data:
mv /var/lib/mysql /var/lib/mysql.old
ln -s /mnt/fast_storage/mysql /var/lib/mysql
systemctl start mysql
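For the Permissions note above: if ownership was not preserved during the copy, fix it before starting MySQL (the path follows the example used in this procedure):
# Make the new datadir owned by the MySQL service user and verify
chown -R mysql:mysql /mnt/fast_storage/mysql
ls -ld /mnt/fast_storage/mysql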
AI Summary for RAG
Summary: This page explains the delay between Active Calls view and CDR view, which occurs when the database cannot keep up with CDR insertion rates. Symptoms include slow CDR appearance, "Crontab log is too old" warning, lag between call end and reporting, or active calls visible but recent CDRs missing. The SQLq/SQLf metric indicates queue size before CDRs are pushed to the database. When the database is slow, CDRs queue in qoq* files in the spool directory (default: /var/spool/voipmonitor). Two solutions exist: (1) Wait for the system to process the queue - monitor SQLf parameter which should decrease over time; (2) Delete qoq* files from spool directory as emergency measure - this clears backlog and allows new CDRs to be written immediately, but any CDRs still waiting in the queue files will be lost (only use if immediate access is critical and losing older CDRs is acceptable). Configuration fixes include: enabling quick_save_cdr for faster CDR visibility (3s delay with yes, 1s delay with quick, at cost of higher CPU/I/O load), ensuring MySQL config follows recommendations from Scaling page (especially innodb_flush_log_at_trx_commit and innodb_buffer_pool_size), enabling disk-based query cache via query_cache=yes (prevents OOM by storing queued queries in spool files instead of RAM - never set to no as it risks OOM/CDR loss), increasing database connection threads via mysqlstore_max_threads_cdr and mysqlstore_max_threads_sip_msg in voipmonitor.conf, and understanding that mysql_enable_set_id=yes or mysql_enable_new_store=per_query limits setup to single sensor writing to database (for centralized writer architecture). HARDWARE UPGRADE: If CPU load is consistently at or near 100% on database/GUI servers even after configuration tuning, this indicates insufficient hardware and requires hardware upgrade to modern AMD EPYC or Intel Xeon Gold/Silver with more cores, sufficient RAM for buffer pool, and NVMe SSD storage. MYSQL MIGRATION TO SSD: When upgrading storage to SSD/NVMe to fix IOPS bottlenecks, you can migrate MySQL datadir while system is running: stop MySQL, rsync /var/lib/mysql to new SSD, update datadir in my.cnf, update AppArmor paths if using Ubuntu/Debian, start MySQL. No CDR loss expected because sniffer queues CDRs internally during downtime. Monitor SQLq in GUI to verify queued CDRs are processed after restart. Use symbolic link alternative: mv /var/lib/mysql to mysql.old and symlink to new location.
Keywords: SQL queue, SQLq, SQLf, delay, active calls, CDR, database bottleneck, mysqlstore_max_threads_cdr, mysqlstore_max_threads_sip_msg, innodb_flush_log_at_trx_commit, innodb_buffer_pool_size, query_cache, OOM, out of memory, disk-based queue, spool files, mysql_enable_set_id, mysql_enable_new_store, central writer, single sensor, Crontab log, hardware upgrade, CPU 100%, aging hardware, inadequate hardware, server upgrade, modern CPU, AMD EPYC, Intel Xeon, NVMe SSD, SSD upgrade, datadir migration, move MySQL to SSD, migrate MySQL datadir, rsync MySQL, no CDR loss, queued CDRs processed, AppArmor MySQL paths, symlink database, qoq files, queue files, spool directory, file queue backlog, delete qoq files, missing CDRs recent calls, clearing queue, emergency queue clearing, quick_save_cdr, CDR delay, CDR visibility