Database troubleshooting

From VoIPmonitor.org
Revision as of 17:11, 6 January 2026 by Admin (talk | contribs) (Create Database_troubleshooting page: merged SQL queue troubleshooting (from SQL_queue_is_growing rev 8218) + ERROR 1062 lookup table fix + MySQL performance tuning)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


This page provides troubleshooting guidance for VoIPmonitor database-related issues including SQL queue problems, CDR delays, MySQL performance tuning, and database errors.

Quick Navigation - Database Troubleshooting
SQL Queue Issues Database Errors Performance & Migration

CDR Visibility

SQL Queue

Error 1062

Related

Tuning

Hardware

SQL Queue and CDR Delays

Delay between active call and CDR view

The Active Calls view in the GUI displays the timestart of calls (INVITEs) obtained from the VoIPmonitor sniffer service, whereas the CDR view shows the stored CDRs (after the call ends) from the database.

Symptoms of Database Delays

When the database cannot keep up with CDR insertion rates, you may experience:

  • Slow CDR appearance in GUI - New calls take minutes to appear after they end
  • "Crontab log is too old" warning - The cron job runs slowly (every 5-10 minutes or more) instead of every minute due to database overload
  • Lag between call end and reporting - Daily reports and alerts process outdated data

To diagnose cron-related delays specifically, see "Crontab log is too old" troubleshooting.

SQLq/SQLf

In the service status (expanded status, status line) under GUI -> Settings -> Sensors : status, the SQLq/SQLf values represent the size of the queue before the CDRs are pushed to the database. When SQLq/SQLf is high, it usually indicates that the database is unable to process requests in a timely manner, causing them to queue.

See Logging#SQLq/SQLf for more details on these metrics.

Make sure that db config is not causing io overhead

If the MySQL configuration already follows the recommendations from the scaling section of our documentation (especially innodb_flush_log_at_trx_commit and innodb_buffer_pool_size):

https://www.voipmonitor.org/doc/Scaling#SSDs

Quick CDR Visibility (Reduce Delay)

If CDRs are taking too long to appear in the GUI after calls end (typically 30-60 seconds delay), you can reduce this delay by enabling quick CDR save mode. Add this parameter to /etc/voipmonitor.conf:

quick_save_cdr = yes

This speeds up the visibility of calls in the GUI by reducing the buffering delay before CDRs are written to the database.

⚠️ Warning: Performance Impact: Enabling quick_save_cdr increases CPU and I/O load on the database server by forcing more frequent CDR writes. Only use this if near-real-time CDR visibility is absolutely required.

Available options:

  • no (default) - 10 second delay, recommended for most deployments
  • yes - 3 second delay, moderate performance impact
  • quick - 1 second delay, high performance impact

See Sniffer Configuration for more details on this parameter.

Enable Disk-Based Query Queue (Prevent OOM)

If your system is experiencing Out Of Memory (OOM) issues or if the SQL queue keeps growing during peak traffic, ensure that disk-based query queuing is enabled. Add or verify this parameter in /etc/voipmonitor.conf:

query_cache = yes

This is a critical feature. When enabled, SQL queries are first saved to disk-based queue files (qoq* files in the spool directory) before being sent to the database. This prevents OOM and data loss if the database is temporarily unavailable. The default is yes, but if it was previously set to no, changing it to yes will use disk storage instead of RAM for queued queries.

ℹ️ Note:

See Sniffer Configuration for more details on this parameter.

More threads/connections to a db

You can also increase the number of threads used for connection to a db for particular use like CDRs - the VoIPmonitor sniffer service in /etc/voipmonitor.conf uses the option:

mysqlstore_max_threads_cdr = 8

Important: mysql_enable_set_id Limitation

If you are using mysql_enable_new_store = per_query or have enabled mysql_enable_set_id = yes, please note that this configuration limits your setup to a single sensor writing to the database. This option allows the central server to generate CDR IDs instead of waiting for MySQL's AUTO_INCREMENT, which can improve batch insert performance. However, this architecture only works when one sensor is responsible for all database writes. If you have multiple sensors writing to the same database, do not enable this option as it will cause ID conflicts and data corruption.

If you are processing a high volume of SIP OPTIONS, SUBSCRIBE, or NOTIFY messages and see the sip_msg queue growing, you can increase threads specifically for those messages:

mysqlstore_max_threads_sip_msg = 8

However, if the database is waiting for storage I/O, increasing the number of threads will not help.

Clearing File Queue Backlog (qoq* Files)

When the file queue grows and recent calls are not appearing in CDR despite active calls being visible, you may have a backlog of qoq* files in the spool directory waiting to be processed.

What are qoq* Files?

The VoIPmonitor sniffer creates qoq* (queue) files in the spool directory (default: /var/spool/voipmonitor) to buffer database operations. These files contain queued SQL commands waiting to be inserted into the database.

When the database cannot keep up with the insertion rate:

  • Active calls are visible in the GUI (retrieved from sensor memory in real-time)
  • Recent CDRs do not appear (waiting in qoq queue files)
  • SQLq metric stays high or continues growing

Solution 1: Wait for Queue Processing

The system will automatically process the qoq* files as the database catches up. Monitor the progress:

# Check the SQLf parameter in logs to monitor queue size
# SQLf should decrease over time as files are processed

# View current qoq files in spool directory
ls -lh /var/spool/voipmonitor/qoq* 2>/dev/null | wc -l

This is the preferred approach if losing older CDRs is not acceptable.

Solution 2: Delete qoq* Files (Emergency)

If immediate access to recent CDRs is critical and you can afford to lose older CDRs, you can delete the qoq* files to clear the backlog:

# WARNING: This will delete any CDRs still waiting in the files
# Stop the VoIPmonitor service first
systemctl stop voipmonitor

# Delete all qoq* files from the spool directory
rm -f /var/spool/voipmonitor/qoq*

# Start the VoIPmonitor service
systemctl start voipmonitor

⚠️ Warning: Data Loss Warning: Deleting qoq* files will delete any CDRs that were waiting in the queue files. Only use this method if:

  • Immediate access to recent CDRs is critical
  • Losing older CDRs is acceptable
  • You have exhausted all other options (configuration tuning, hardware upgrade)

This clears the backlog and allows new CDRs to be written immediately without waiting for the old queue to process.

MySQL/MariaDB Performance Tuning

Optimize MySQL Performance

Tune the MySQL/MariaDB server for better write performance to handle the high insert rate from VoIPmonitor.

Edit your MySQL configuration file (typically /etc/mysql/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf):

[mysqld]
# InnoDB buffer pool size - set to approximately 50-70% of available RAM on a dedicated database server
# On servers running VoIPmonitor and MySQL together, use approximately 30-50% of RAM
innodb_buffer_pool_size = 8G

# Reduce transaction durability for faster writes (may lose up to 1 second of data on crash)
innodb_flush_log_at_trx_commit = 2

Restart MySQL and VoIPmonitor:

systemctl restart mysql
systemctl restart voipmonitor

⚠️ Warning: Setting innodb_flush_log_at_trx_commit to 2 trades some data safety for performance. In the event of a power loss or crash, up to 1 second of the most recent transactions may be lost.

When Configuration Tuning Is Not Enough: Hardware Upgrade Required

If you have applied all the configuration optimizations above and the SQL queue continues to grow or the database remains significantly behind the processing queue, the underlying issue may be insufficient hardware.

Signs That Hardware Upgrade Is Necessary

  • CPU load is consistently at or near 100% on both the database and GUI servers, even during normal traffic patterns
  • Old or aging hardware - Servers with CPUs more than several years old may lack the performance of modern processors
  • Configuration tuning provides only marginal improvement - After applying MySQL and VoIPmonitor optimizations, the delay between the "Last CDR in processing queue" and "Last CDR in database" remains significant

Diagnosing Hardware Limitations

Monitor CPU usage on both the database and GUI servers:

# Check CPU load during peak traffic
top

# Or use sar for historical data
sar -u 1 10

# Check CPU core usage per process
mpstat -P ALL 1 5

If the CPU load is consistently at or near 100% across multiple cores, the hardware cannot keep up with the traffic load. No amount of configuration tuning will solve this issue - the servers themselves need to be upgraded.

Hardware Upgrade Recommendations

Upgrade to a more modern CPU architecture with significantly better performance:

  • CPU - Modern AMD EPYC or Intel Xeon Gold/Silver processors with more cores
  • RAM - Ensure sufficient memory for the database buffer pool (see Memory Configuration)
  • Storage - Use NVMe SSDs for the database to eliminate I/O bottlenecks

Architecture Consideration: Merge GUI and Database

In some cases, merging the GUI and database roles onto a single, powerful new server can be more efficient than maintaining separate, underpowered servers. A single modern server with sufficient CPU cores and RAM can often handle both workloads more effectively than multiple older servers.

For hardware sizing examples, see the Hardware page, which includes real-world deployments for various call volumes.

Migrating MySQL Data to Faster Storage

When upgrading from HDD or slow SATA SSDs to NVMe storage, you can migrate the MySQL data directory (`datadir`) while the system is running. There is no CDR loss expected with this method because the sniffer queues the CDRs internally and will process them after MySQL restarts.

Prerequisites:

  • New SSD/NVMe storage installed and mounted
  • Sufficient disk space on the new storage for the existing MySQL data
  • MySQL/MariaDB service can be stopped briefly for the migration

Migration Procedure:

# 1. Prepare the new storage (if your SSD is hot-pluggable)
# Mount the new filesystem. Example:
mount /dev/nvme0n1p1 /mnt/fast_storage
mkdir /mnt/fast_storage/mysql

# 2. Stop the MySQL service
systemctl stop mysql
# Or for MariaDB:
# systemctl stop mariadb

# 3. Copy the MySQL data directory to the new location
# The -a flag preserves permissions and ownership, -x skips other filesystems
rsync -avx /var/lib/mysql/ /mnt/fast_storage/mysql/

# 4. Verify the copy
ls -la /mnt/fast_storage/mysql/
# Check that all databases are present

# 5. Update MySQL configuration to point to the new datadir location
# Edit /etc/mysql/my.cnf or /etc/my.cnf and change:
# datadir = /var/lib/mysql
# To:
# datadir = /mnt/fast_storage/mysql

# Also update the socket location if needed:
# socket = /tmp/mysql.sock  (or your preferred location)

# 6. Update AppArmor/SELinux (Ubuntu/Debian only)
# Edit /etc/apparmor.d/usr.sbin.mysqld and update the paths:
# /var/lib/mysql/ r,
# /var/lib/mysql/** rwk,
# Change to:
# /mnt/fast_storage/mysql/ r,
# /mnt/fast_storage/mysql/** rwk,

# Reload AppArmor:
systemctl reload apparmor

# 7. Start MySQL with the new datadir
systemctl start mysql

# 8. Verify MySQL is running and databases are accessible
mysql -e "SHOW DATABASES;"
mysql -e "SELECT COUNT(*) FROM cdr;"

# 9. Monitor the SQL queue in the VoIPmonitor GUI
# Navigate to GUI -> Settings -> Sensors -> Status
# The SQLq value should decrease as queued CDRs are processed

Important Notes:

  • No CDR Loss: The VoIPmonitor sniffer queues CDRs in memory during MySQL downtime. These will be processed after MySQL restarts.
  • Backup First: Always take a backup of `/var/lib/mysql` before migration.
  • Service Downtime: Plan for MySQL to be stopped for the duration of the copy operation (depends on database size and storage speed).
  • Storage Mount Options: For the new database partition, mount with ext4 optimizations:
/dev/nvme0n1p1  /mnt/fast_storage  ext4  defaults,noatime,data=writeback,barrier=0  0  1
  • Permissions: Ensure the new datadir and all files are owned by the MySQL user (`mysql:mysql`).
  • Symbolic Links Alternative: You can create a symbolic link instead of changing the datadir in my.cnf:
# After stopping MySQL and copying data:
mv /var/lib/mysql /var/lib/mysql.old
ln -s /mnt/fast_storage/mysql /var/lib/mysql
systemctl start mysql

Database Errors

Troubleshooting: Database Error 1062 - Lookup Table Auto-Increment Limit

If the sniffer logs show a database error `1062 - Duplicate entry '16777215' for key 'PRIMARY'` and new CDRs stop being stored, this is caused by a lookup table reaching its maximum auto-increment limit.

Symptoms

  • CDRs stop being inserted into the database
  • Sniffer logs show: `query error in [call __insert_10_0S1();]: 1062 - Duplicate entry '16777215' for key 'PRIMARY'`
  • The error affects a lookup table (such as `cdr_sip_response` or `cdr_reason`)
  • The value 16777215 (16,777,215) indicates the table is using `MEDIUMINT UNSIGNED` for the ID column

Root Cause

VoIPmonitor uses lookup tables (like `cdr_sip_response` or `cdr_reason`) to store unique values such as SIP response reason strings or custom response text. These are used to normalize data and reduce storage in the main `cdr` table.

When the system receives many unique SIP response strings or reason messages (e.g., different error messages from various carriers, devices with custom SIP header formats, or PBX-specific responses), the lookup table's auto-increment ID can reach the `MEDIUMINT` limit of 16,777,215. Once this limit is hit, new unique values cannot be inserted, causing all subsequent CDRs to fail with error 1062.

Identifying the Affected Table

Check which lookup table is hitting the limit:

-- Check the current AUTO_INCREMENT value for lookup tables
SELECT
    TABLE_NAME,
    COLUMN_TYPE,
    AUTO_INCREMENT
FROM
    INFORMATION_SCHEMA.TABLES
JOIN
    INFORMATION_SCHEMA.COLUMNS
USING (TABLE_SCHEMA, TABLE_NAME)
WHERE
    TABLE_SCHEMA = 'voipmonitor' AND
    (TABLE_NAME LIKE 'cdr_sip%' OR TABLE_NAME LIKE 'cdr_reason%') AND
    COLUMN_KEY = 'PRI' AND
    EXTRA LIKE '%auto_increment%'
ORDER BY AUTO_INCREMENT DESC;

Look for AUTO_INCREMENT values approaching or exceeding 16,000,000 in tables using `MEDIUMINT`.

Solution: Prevent New Unique Entries

The most effective solution is to configure VoIPmonitor to stop storing or normalize the unique SIP response text that is causing the rapid growth of the lookup table.

Option 1: Disable SIP Response Text Storage

Edit `/etc/voipmonitor.conf` on the sniffer to disable storing SIP response reason text:

# Disable storing SIP response reason strings in lookup tables
cdr_reason_string_enable = no

This prevents the system from creating new unique entries for SIP response reason strings. Restart the sniffer:

systemctl restart voipmonitor

Option 2: Normalize Response Text

If you need to keep some response text but reduce the number of unique entries, enable normalization in `/etc/voipmonitor.conf`:

# Normalize SIP response text to reduce unique entries
cdr_reason_normalisation = yes
cdr_sip_response_normalisation = yes
cdr_ua_normalisation = yes

Normalization transforms similar response strings into a single canonical form, significantly reducing the number of unique rows created. Include all three normalization options for maximum effectiveness.

Option 3: Clear Queued SQL Queries

If error 1062 persists after applying the configuration changes and restarting the service, there may be a large backlog of failed SQL queries queued in the spool directory. Clearing this queue can eliminate the persistent 1062 errors that are caused by previously buffered failed inserts.

⚠️ Warning: This step will DELETE all buffered CDRs in the queue. These CDRs will be permanently lost.

To clear the SQL queue:

# 1. Stop the VoIPmonitor service
systemctl stop voipmonitor

# 2. Remove the queued query files (qoq-* files)
# The default spool directory is /var/spool/voipmonitor
rm -f /var/spool/voipmonitor/qoq-*

# 3. Verify the files are removed
ls /var/spool/voipmonitor/qoq-*

# 4. Restart the service
systemctl start voipmonitor

# 5. Check that service is running
systemctl status voipmonitor

After restarting, the service should no longer attempt to re-insert the previously failed 1062 queries from the queue. Monitor the logs to confirm the error has stopped.

Option 4: Clean Existing Data (Immediate Fix)

The lookup table has reached its MEDIUMINT limit, preventing new CDRs from being stored. Truncating the table clears it and resets the auto-increment counter to 1, allowing CDRs to be written immediately.

⚠️ Warning: TRUNCATE permanently deletes all data. This will remove the exact SIP response text display in the GUI for historical CDRs, but will not affect the main CDR records or call data. Only do this if you are certain you no longer need the original response text.

-- Clear the cdr_reason table (adjust table name as needed based on error message)
TRUNCATE TABLE cdr_reason;

Verification

After applying the fix:

1. Check that CDRs are being stored again by monitoring the sniffer logs 2. Verify the lookup table AUTO_INCREMENT is no longer increasing rapidly:

SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'cdr_sip_response' AND TABLE_SCHEMA = 'voipmonitor';

3. Monitor the error logs to confirm the 1062 error has stopped appearing

Important Note: NOT a Database Schema Issue

This error is typically NOT solved by changing the database schema (e.g., migrating to BIGINT). The root cause is storing too many unique SIP response strings, which will continue to grow regardless of the ID column size. The correct solution is to configure VoIPmonitor to stop creating these unique entries via the `cdr_reason_string_enable` configuration option.

⚠️ Warning: Do NOT confuse this with the unrelated `cdr` table integer overflow problem. The main `cdr` table may encounter limits around 4 billion rows (32-bit INT), which is addressed in the Upgrade_to_bigint guide. Lookup table issues at 16.7 million (MEDIUMINT) are solved by configuration, not schema migration.

See Also

AI Summary for RAG

Summary: This page covers VoIPmonitor database troubleshooting including SQL queue issues, CDR delays, MySQL performance tuning, and database errors. SQL QUEUE DELAYS: The Active Calls view shows real-time data from sniffer while CDR view shows database records - delays occur when database cannot keep up. Symptoms include slow CDR appearance, "Crontab log is too old" warning, lag in reporting. Monitor SQLq/SQLf metrics in GUI Settings -> Sensors -> Status. QUICK CDR VISIBILITY: Use quick_save_cdr=yes (3s delay) or quick_save_cdr=quick (1s delay) in voipmonitor.conf to reduce CDR appearance delay (default is 10s). Warning: increases CPU/I/O load. QOQ FILES: SQL queries are buffered in qoq* files in /var/spool/voipmonitor when database is slow. To clear backlog: either wait for automatic processing (preferred), or emergency delete: stop voipmonitor, rm -f /var/spool/voipmonitor/qoq*, start voipmonitor (loses queued CDRs). PREVENT OOM: Keep query_cache=yes (default) to store queries on disk instead of RAM - never set to no as it risks OOM and CDR loss. DATABASE THREADS: Increase mysqlstore_max_threads_cdr and mysqlstore_max_threads_sip_msg for high traffic (default 4, can increase to 8-16). Note: mysql_enable_set_id=yes limits setup to single sensor writing to database. MYSQL TUNING: Set innodb_buffer_pool_size=50-70% RAM on dedicated DB or 30-50% on shared server. Set innodb_flush_log_at_trx_commit=2 for faster writes (may lose 1 second of data on crash). HARDWARE UPGRADE: If CPU is consistently at 100% after tuning, hardware upgrade is needed - modern AMD EPYC or Intel Xeon with NVMe SSD. MYSQL TO SSD MIGRATION: Stop MySQL, rsync /var/lib/mysql to new SSD, update datadir in my.cnf, update AppArmor paths, start MySQL. No CDR loss expected as sniffer queues internally. ERROR 1062 - LOOKUP TABLE LIMIT: If logs show "1062 - Duplicate entry '16777215' for key 'PRIMARY'" and CDRs stop being stored, this is lookup table (cdr_sip_response, cdr_reason) hitting MEDIUMINT limit (16,777,215). NOT a schema issue. SOLUTION: Set cdr_reason_string_enable=no in voipmonitor.conf to disable storing SIP response strings, OR enable all three normalization options (cdr_reason_normalisation=yes, cdr_sip_response_normalisation=yes, cdr_ua_normalisation=yes). IMMEDIATE FIX: TRUNCATE TABLE cdr_reason to reset auto-increment. CLEAR QUEUE: If error persists after config change, stop service, rm -f /var/spool/voipmonitor/qoq-*, restart. Do NOT confuse with cdr table INT overflow (4B rows) which requires Upgrade_to_bigint guide.

Keywords: SQL queue, SQLq, SQLf, database delay, CDR delay, active calls, CDR view, mysqlstore_max_threads_cdr, mysqlstore_max_threads_sip_msg, quick_save_cdr, query_cache, qoq files, queue files, spool directory, database backlog, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, MySQL tuning, MariaDB tuning, hardware upgrade, CPU 100%, AMD EPYC, Intel Xeon, NVMe SSD, datadir migration, MySQL to SSD, rsync MySQL, AppArmor MySQL, symbolic link database, OOM, out of memory, 1062 duplicate entry, 16777215, lookup table, MEDIUMINT limit, cdr_sip_response, cdr_reason, cdr_reason_string_enable, auto-increment limit, normalization, cdr_reason_normalisation, cdr_sip_response_normalisation, cdr_ua_normalisation, TRUNCATE, database error, mysql_enable_set_id, mysql_enable_new_store, central writer, single sensor

Key Questions:

  • Why are CDRs delayed in the GUI? (Database cannot keep up with insertion rate - check SQLq/SQLf metrics)
  • How do I reduce CDR delay in VoIPmonitor? (Set quick_save_cdr=yes or quick_save_cdr=quick in voipmonitor.conf)
  • What are qoq files in VoIPmonitor? (SQL queue files in spool directory buffering database operations)
  • How do I clear SQL queue backlog? (Stop service, rm -f /var/spool/voipmonitor/qoq*, start service - loses queued CDRs)
  • Why is SQL queue growing? (Database cannot keep up - check MySQL performance, increase threads, upgrade hardware)
  • How do I increase database threads? (Set mysqlstore_max_threads_cdr=8 and mysqlstore_max_threads_sip_msg=8 in voipmonitor.conf)
  • What causes error 1062 - Duplicate entry '16777215' for key 'PRIMARY'? (Lookup table hitting MEDIUMINT limit due to too many unique SIP response strings)
  • How do I fix error 1062 in cdr_sip_response or cdr_reason tables? (Set cdr_reason_string_enable=no in voipmonitor.conf, restart sniffer)
  • How do I stop 1062 errors immediately? (TRUNCATE the affected lookup table to reset auto-increment counter)
  • How do I prevent error 1062 from recurring? (Set cdr_reason_string_enable=no, or enable ALL THREE normalization options)
  • Should I migrate lookup table to BIGINT to fix error 1062? (No, the root cause is storing too many unique strings - use configuration fix)
  • Why does error 1062 persist after configuration change? (Failed queries remain queued in qoq-* files - clear them)
  • How do I migrate MySQL to SSD? (Stop MySQL, rsync data to SSD, update datadir in my.cnf, start MySQL)
  • When should I upgrade database hardware? (When CPU is consistently at 100% after all tuning optimizations)
  • What is mysql_enable_set_id limitation? (Limits setup to single sensor writing to database - do not use with multiple sensors)
  • How do I optimize MySQL for VoIPmonitor? (Set innodb_buffer_pool_size=50-70% RAM, innodb_flush_log_at_trx_commit=2)