{{DISPLAYTITLE:Troubleshooting: No Calls Being Sniffed}}
[[Category:Troubleshooting]]
[[Category:Database]]
[[Category:Performance]]


'''This guide provides a systematic, step-by-step process to diagnose why the VoIPmonitor sensor might not be capturing any calls. Follow these steps in order to quickly identify and resolve the most common issues.'''

== Troubleshooting Flowchart ==

<mermaid>
flowchart TD
    A[No Calls Being Captured] --> B{Step 1: Service Running?}
    B -->|No| B1[systemctl restart voipmonitor]
    B -->|Yes| C{Step 2: Traffic on Interface?<br/>tshark -i eth0 -Y 'sip'}

    C -->|No packets| D[Step 3: Network Issue]
    D --> D1{Interface UP?}
     D1 -->|No| D2[ip link set dev eth0 up]
    D1 -->|Yes| D3{SPAN/RSPAN?}
    D3 -->|Yes| D4[Enable promisc mode]
    D3 -->|ERSPAN/GRE/TZSP| D5[Check tunnel config]
 
     C -->|Packets visible| E[Step 4: VoIPmonitor Config]
    E --> E1{interface correct?}
    E1 -->|No| E2[Fix interface in voipmonitor.conf]
    E1 -->|Yes| E3{sipport correct?}
    E3 -->|No| E4[Add port: sipport = 5060,5080]
    E3 -->|Yes| E5{BPF filter blocking?}
    E5 -->|Maybe| E6[Comment out filter directive]
 
    E5 -->|No| F[Step 5: GUI Capture Rules]
    F --> F1{Rules with Skip: ON?}
    F1 -->|Yes| F2[Remove/modify rules + reload sniffer]
    F1 -->|No| G[Step 6: Check Logs]
 
    G --> H{OOM Events?}
    H -->|Yes| H1[Step 7: Add RAM / tune MySQL]
    H -->|No| I{Large SIP packets?}
    I -->|Yes| I1{External SIP source?<br/>Kamailio/HAProxy mirror}
    I1 -->|No| I2[Increase snaplen in voipmonitor.conf]
    I1 -->|Yes| I3[Fix external source: Kamailio siptrace or HAProxy tee]
    I2 --> I4[If snaplen change fails, recheck with tcpdump -s0]
    I4 --> I1
    I -->|No| J[Contact Support]
</mermaid>
 
== Post-Reboot Verification Checklist ==
After a planned server reboot, verify these critical items to ensure VoIPmonitor operates correctly. This check helps identify issues that may occur when configurations are not persisted across reboots.
 
=== Verify Firewall/Iptables Rules ===
 
After a system restart, verify that firewall rules have been correctly applied and are allowing necessary traffic. Firewall rules may need to be manually re-applied if they were not made persistent.
 
;1. Check current firewall status:
<syntaxhighlight lang="bash">
# For systems using iptables
iptables -L -n -v
 
# For systems using firewalld
firewall-cmd --list-all
 
# For systems using ufw
ufw status verbose
</syntaxhighlight>
 
;2. Verify critical ports are allowed:
Ensure the firewall permits traffic on the following VoIPmonitor ports:
* SIP ports (default: 5060/udp, or your configured sipport values)
* RTP ports (range used by your PBX)
* GUI access (typically: 80/tcp, 443/tcp)
* Sensor management port: 5029/tcp
* Client-Server connection port: 60024/tcp (for distributed setups)
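
If plain iptables is in use and the rules were lost on reboot, re-adding rules for the ports above might look like the following sketch (tighten the source addresses to match your security policy):
<syntaxhighlight lang="bash">
# Illustrative iptables rules for the VoIPmonitor ports listed above
iptables -A INPUT -p udp --dport 5060 -j ACCEPT                     # SIP signaling
iptables -A INPUT -p tcp -m multiport --dports 80,443 -j ACCEPT     # GUI access
iptables -A INPUT -p tcp --dport 5029 -j ACCEPT                     # sensor management
iptables -A INPUT -p tcp --dport 60024 -j ACCEPT                    # client-server connection
</syntaxhighlight>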
 
;3. Make firewall rules persistent:
To prevent firewall rules from being lost after future reboots:
 
'''For iptables (Debian/Ubuntu):'''
<syntaxhighlight lang="bash">
# Save current rules
iptables-save > /etc/iptables/rules.v4
# Install persistent package if not present
apt-get install iptables-persistent
</syntaxhighlight>
 
'''For firewalld (CentOS/RHEL):'''
<syntaxhighlight lang="bash">
# Runtime rules automatically persist with --permanent flag
firewall-cmd --permanent --add-port=5060/udp
firewall-cmd --permanent --add-port=60024/tcp
firewall-cmd --reload
</syntaxhighlight>
 
=== Verify System Time Synchronization ===
 
Correct system time synchronization is '''critical''', especially when using the <code>packetbuffer_sender</code> option in distributed architectures. Time mismatches between hosts and servers can cause call correlation failures and dropped packets.
 
;1. Check current NTP/chrony status:
<syntaxhighlight lang="bash">
# For systems using NTP
ntpstat
 
# For systems using chrony
chronyc tracking
</syntaxhighlight>
 
;2. Verify time synchronization with servers:
<syntaxhighlight lang="bash">
# For NTP
ntpq -p
 
# For chrony
chronyc sources -v
</syntaxhighlight>
 
'''Expected output:''' Time offset should be minimal (ideally under 100 milliseconds). Large offsets (several seconds) indicate synchronization problems.
 
;3. Manual sync if needed (temporary fix):
<syntaxhighlight lang="bash">
# Restart the NTP daemon (it steps the clock on startup if the offset is large)
sudo systemctl restart ntp
 
# For chrony
sudo chronyc makestep
</syntaxhighlight>
 
'''Critical for packetbuffer_sender mode:''' When using <code>packetbuffer_sender=yes</code> to forward raw packets from remote sensors to a central server, the host and server '''must have synchronized times'''. VoIPmonitor requires host and server times to match for proper call correlation and packet processing. Maximum allowed time difference is 2 seconds by default (configurable via <code>client_server_connect_maximum_time_diff_s</code>).
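
If time synchronization cannot be fixed immediately, the allowed clock skew can be widened as a stop-gap. The snippet below is a sketch only; the value is illustrative and the proper fix is correct NTP synchronization:
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf (illustrative; default is 2 seconds)
client_server_connect_maximum_time_diff_s = 5
</syntaxhighlight>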
 
;4. Check distributed architecture time sync:
In Client-Server mode, ensure all sensors and the central server are synchronized to the same NTP servers:
 
<syntaxhighlight lang="bash">
# On each sensor and central server
timedatectl status
</syntaxhighlight>
 
Look for: <code>System clock synchronized: yes</code>
 
If times are not synchronized across distributed components:
* Verify all systems point to the same reliable NTP source
* Check firewall allows UDP port 123 (NTP)
* Ensure timezones are consistent across all systems
 
'''Troubleshooting time sync issues:'''
* Check firewall rules allow NTP (UDP port 123)
* Verify NTP servers are reachable: <code>ping pool.ntp.org</code>
* Review NTP configuration: <code>/etc/ntp.conf</code> or <code>/etc/chrony.conf</code>
* Ensure time service is enabled to start on boot: <code>systemctl enable ntp</code>
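
A minimal chrony configuration that points a sensor at a public pool might look like the following sketch (replace the pool with your own NTP infrastructure where available):
<syntaxhighlight lang="ini">
# /etc/chrony.conf (illustrative)
pool pool.ntp.org iburst
# Step the clock at startup if the initial offset is large
makestep 1.0 3
</syntaxhighlight>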
 
== Step 1: Is the VoIPmonitor Service Running Correctly? ==
First, confirm that the sensor process is active and loaded the correct configuration file.
 
;1. Check the service status (for modern systemd systems):
<syntaxhighlight lang="bash">
systemctl status voipmonitor
</syntaxhighlight>
Look for a line that says <code>Active: active (running)</code>. If it is inactive or failed, try restarting it with <code>systemctl restart voipmonitor</code> and check the status again.
 
;2. Verify the running process:
<syntaxhighlight lang="bash">
ps aux | grep voipmonitor
</syntaxhighlight>
This command will show the running process and the exact command line arguments it was started with. Critically, ensure it is using the correct configuration file, for example: <code>--config-file /etc/voipmonitor.conf</code>. If it is not, there may be an issue with your startup script.
 
=== Troubleshooting: Missing Package or Library Dependencies ===
 
If the sensor service fails to start or crashes immediately with an error about a "missing package" or "missing library," it indicates that a required system dependency is not installed on the server. This is most common on newly installed sensors or fresh operating system installations.
 
;1. Check the system logs for the specific error message:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor
 
# For CentOS/RHEL/AlmaLinux or systemd systems
journalctl -u voipmonitor -f
</syntaxhighlight>
 
;2. Common missing packages for sensors:
Most sensor missing-package issues are resolved by installing the <code>rrdtool</code> package, which is required for RRD (Round-Robin Database) graphing and statistics functionality.
 
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
apt-get update && apt-get install rrdtool
 
# For CentOS/RHEL/AlmaLinux
yum install rrdtool
# OR
dnf install rrdtool
</syntaxhighlight>
 
;3. Other frequently missing dependencies:
If the error references a specific shared library or binary, install it using your package manager. Common examples:
 
* <code>libpcap</code> or <code>libpcap-dev</code>: Packet capture library
* <code>libssl</code> or <code>libssl-dev</code>: SSL/TLS support
* <code>zlib</code> or <code>zlib1g-dev</code>: Compression library
 
;4. Verify shared library dependencies:
If the error mentions a specific shared library (e.g., <code>error while loading shared libraries: libxxx.so</code>), check which libraries the binary is trying to load:
 
<syntaxhighlight lang="bash">
ldd /usr/local/sbin/voipmonitor | grep pcap
</syntaxhighlight>
 
If <code>ldd</code> reports "not found," install the missing library using your package manager.
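
To find which package provides a missing library, the distribution package tools can search by file name; <code>libpcap.so</code> below is only an example:
<syntaxhighlight lang="bash">
# Debian/Ubuntu (requires apt-file: apt-get install apt-file && apt-file update)
apt-file search libpcap.so

# CentOS/RHEL/AlmaLinux
dnf provides '*/libpcap.so*'
</syntaxhighlight>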
 
;5. After installing the missing package, restart the sensor service:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
systemctl status voipmonitor
</syntaxhighlight>
 
Verify the service starts successfully and is now <code>Active: active (running)</code>.
 
=== Troubleshooting: Cron Daemon Not Running ===
 
If the VoIPmonitor sniffer service fails to start or the sensor appears unavailable despite the systemd service being configured correctly, the cron daemon may not be running. Some VoIPmonitor deployment methods and maintenance scripts rely on cron for proper initialization and periodic tasks.
 
;1. Check the cron daemon status:
<syntaxhighlight lang="bash">
systemctl status cron
</syntaxhighlight>
 
Look for <code>Active: active (running)</code>. If the status shows inactive or failed, the cron daemon is not running.
 
;2. Alternative check for systems where the service is named crond (CentOS/RHEL):
<syntaxhighlight lang="bash">
systemctl status crond
</syntaxhighlight>
 
Note: On CentOS/RHEL systems, the service is typically named <code>crond</code>, while on Debian/Ubuntu systems it is named <code>cron</code>.
 
;3. Start the cron daemon if it is inactive:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu systems
systemctl start cron
 
# For CentOS/RHEL/AlmaLinux systems
systemctl start crond
</syntaxhighlight>
 
;4. Enable cron to start automatically on boot:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu systems
systemctl enable cron
 
# For CentOS/RHEL/AlmaLinux systems
systemctl enable crond
</syntaxhighlight>
 
;5. After starting the cron daemon, restart the VoIPmonitor service:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
systemctl status voipmonitor
</syntaxhighlight>
 
Verify the service now shows <code>Active: active (running)</code> and the sensor becomes visible in the GUI.
 
=== Root Cause ===
The cron daemon being inactive can prevent VoIPmonitor from starting properly in scenarios where:
* Installation scripts use cron for post-install configuration
* Maintenance or cleanup jobs are required for proper sensor operation
* System initialization processes depend on cron-based tasks
* The sensor was recently rebooted or upgraded and cron failed to start
 
=== Long-Term Stability ===
If the cron daemon is consistently failing to start after reboots:
* Check system logs for cron startup errors: <code>journalctl -u cron -n 50</code> or <code>journalctl -u crond -n 50</code>
* Verify that the server has sufficient resources (CPU, memory) to run all required system services
* Investigate performance bottlenecks that may be causing system services to fail to start
* Ensure no other system services are conflicting or preventing cron from starting
 
== Step 2: Is Network Traffic Reaching the Server? ==
If the service is running, the next step is to verify if the VoIP packets (SIP/RTP) are actually arriving at the server's network interface. The best tool for this is <code>tshark</code> (the command-line version of Wireshark).
 
;1. Install tshark:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
apt-get update && apt-get install tshark
 
# For CentOS/RHEL/AlmaLinux
yum install wireshark
</syntaxhighlight>
 
;2. Listen for SIP traffic on the correct interface:
Replace <code>eth0</code> with the interface name you have configured in <code>voipmonitor.conf</code>.
<syntaxhighlight lang="bash">
tshark -i eth0 -Y "sip || rtp" -n
</syntaxhighlight>
* '''If you see a continuous stream of SIP and RTP packets''', it means traffic is reaching the server, and the problem is likely in VoIPmonitor's configuration (see Step 4).
* '''If you see NO packets''', the problem lies with your network configuration. Proceed to Step 3.
 
== Step 3: Troubleshoot Network and Interface Configuration ==
If <code>tshark</code> shows no traffic, it means the packets are not being delivered to the operating system correctly.
 
;1. Check if the interface is UP:
Ensure the network interface is active.
<syntaxhighlight lang="bash">
ip link show eth0
</syntaxhighlight>
The output should contain the word <code>UP</code>. If it doesn't, bring it up with:
<syntaxhighlight lang="bash">
ip link set dev eth0 up
</syntaxhighlight>
 
;2. Check for Promiscuous Mode (for SPAN/RSPAN Mirrored Traffic):
'''Important:''' Promiscuous mode requirements depend on your traffic mirroring method:
 
* '''SPAN/RSPAN (Layer 2 mirroring):''' The network interface '''must''' be in promiscuous mode. Mirrored packets retain their original MAC addresses, so the interface would normally ignore them. Promiscuous mode forces the interface to accept all packets regardless of destination MAC.
 
* '''ERSPAN/GRE/TZSP/VXLAN (Layer 3 tunnels):''' Promiscuous mode is '''NOT required'''. These tunneling protocols encapsulate the mirrored traffic inside IP packets that are addressed directly to the sensor's IP address. The operating system receives these packets normally, and VoIPmonitor automatically decapsulates them to extract the inner SIP/RTP traffic.
 
For SPAN/RSPAN deployments, check the current promiscuous mode status:
<syntaxhighlight lang="bash">
ip link show eth0
</syntaxhighlight>
Look for the <code>PROMISC</code> flag.
 
Enable promiscuous mode manually if needed:
<syntaxhighlight lang="bash">
ip link set eth0 promisc on
</syntaxhighlight>
If this solves the problem, you should make the change permanent. The <code>install-script.sh</code> for the sensor usually attempts to do this, but it can fail.
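
One way to persist promiscuous mode is a small systemd unit that sets the flag at boot; the unit name and interface below are examples (an <code>up</code> line in <code>/etc/network/interfaces</code> or a networkd link file works as well):
<syntaxhighlight lang="ini">
# /etc/systemd/system/promisc-eth0.service (illustrative)
[Unit]
Description=Enable promiscuous mode on eth0
After=network.target

[Service]
Type=oneshot
ExecStart=/sbin/ip link set eth0 promisc on

[Install]
WantedBy=multi-user.target
</syntaxhighlight>
Enable it with <code>systemctl enable --now promisc-eth0.service</code>.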
 
;3A. Troubleshooting: Missing Packets for Specific IPs During High-Traffic Periods:
If calls are missing only for certain IP addresses or specific call flows (particularly during high-traffic periods), the issue is typically at the network infrastructure level (SPAN configuration) rather than sensor resource limits. Use this systematic approach:
 
=== Step 1: Use tcpdump to Verify Packet Arrival ===
 
Before tuning any sensor configuration, first verify if the missing packets are actually reaching the sensor's network interface. Use <code>tcpdump</code> for this verification:
 
<syntaxhighlight lang="bash">
# Listen for SIP packets from a specific IP during the next high-traffic window
# Replace eth0 with your interface and 10.1.2.3 with the problematic IP
tcpdump -i eth0 -nn "host 10.1.2.3 and port 5060" -v
 
# Or capture to a file for later analysis
tcpdump -i eth0 -nn "host 10.1.2.3 and port 5060" -w /tmp/trace_10.1.2.3.pcap
</syntaxhighlight>
 
Interpret the results:
* '''If you see SIP packets arriving:''' The traffic reaches the sensor. The issue is likely a sensor resource bottleneck (CPU, memory, or configuration limits). Proceed to [[#Sensor_Resource_Bottlenecks|Step 3: Check for Sensor Resource Bottlenecks]].
* '''If you see NO packets or only intermittent packets:''' The traffic is not reaching the sensor. This indicates a network infrastructure issue. Proceed to [[#SPAN_Configuration_Troubleshooting|Step 2: Check SPAN Configuration]].
 
=== Step 2: Check SPAN Configuration for Bidirectional Capture ===
 
If packets are missing at the interface level, verify your network switch's SPAN (port mirroring) configuration. During high-traffic periods, switches may have insufficient SPAN buffer capacity, causing packets to be dropped in the mirroring process itself.
 
Key verification points:
 
* '''Verify Source Ports:''' Confirm that both source IP addresses (or the switch ports they connect to) are included in the SPAN source list. Missing one direction of the call flow will result in incomplete CDRs.
 
* '''Check for Bidirectional Mirroring:''' Your SPAN configuration must capture '''BOTH inbound and outbound traffic'''. On most Cisco switches, this requires specifying:
  <syntaxhighlight lang="bash">
  monitor session 1 source interface GigabitEthernet1/1 both
  </syntaxhighlight>
 
  Replace <code>both</code> with:
  * <code>rx</code> for incoming traffic only
  * <code>tx</code> for outgoing traffic only
  * <code>both</code> for bidirectional capture (recommended)
 
* '''Verify Destination Port:''' Confirm the SPAN destination points to the switch port where the VoIPmonitor sensor is connected.
 
* '''Check SPAN Buffer Saturation (High-Traffic Issues):''' Some switches have limited SPAN buffer capacity. When monitoring multiple high-traffic ports simultaneously, the SPAN buffer may overflow during peak usage, causing randomized packet drops. Symptoms:
  ** Drops occur only during busy hours
  ** Missing packets are inconsistent across different calls
  ** Sensor CPU usage and t0CPU metrics appear normal (no bottleneck at sensor)
 
  Solutions:
  ** Reduce the number of monitored source ports in the SPAN session
  ** Use multiple SPAN sessions if your switch supports it
  ** Consider upgrading to a switch with higher SPAN buffer capacity
 
* '''Verify Switch Interface Counters for Packet Drops:''' Check the network switch interface counters to determine if the switch itself is dropping packets during the mirroring process. This is critical when investigating false low MOS scores or packet loss reports.
 
  Cisco switches:
  <syntaxhighlight lang="bash">
  # Show general interface statistics for the SPAN source port
  show interface GigabitEthernet1/1 counters
  show interface GigabitEthernet1/1 | include drops|errors|Input queue|Output queue
 
  # Show detailed interface status (look for input errors, CRC, frame)
  show interface GigabitEthernet1/1 detail
 
  # Monitor in real-time during a high-traffic period
  show interface Gi1/1 accounting
  </syntaxhighlight>
 
  Key indicators of switch-level packet loss:
  ** Non-zero input errors or CRC errors on source/destination ports
  ** Input queue drops (indicating switch buffer overflow)
  ** Increasing drop counters during peak traffic hours
  ** Output errors on the SPAN destination port (sensor may not be accepting fast enough)
 
  If switch interface counters show drops, the issue is at the network infrastructure level (overloaded switch), not the VoIPmonitor sensor. Consult your network administrator for switch optimization or consider redistributing SPAN traffic across multiple ports.
 
* '''Verify VLAN Trunking:''' If the monitored traffic spans different VLANs, ensure the SPAN destination port is configured as a trunk to carry all necessary VLAN tags. Without trunk mode, packets from non-native VLANs will be dropped or stripped of their tags.
 
For detailed instructions on configuring SPAN/ERSPAN/GRE for different network environments, see [[Sniffing_modes]].
 
=== Step 3: Check for Sensor Resource Bottlenecks ===
 
If <code>tcpdump</code> confirms that packets are arriving at the interface consistently, but VoIPmonitor is still missing them, the issue may be sensor resource limitations.
 
* '''Check Packet Drops:''' In the GUI, navigate to '''Settings → Sensors''' and look at the "# packet drops" counter. If this counter is non-zero or increasing during high traffic:
  ** Increase the <code>ringbuffer</code> size in <code>voipmonitor.conf</code> (default 50 MB, max 2000 MB)
  ** Check the <code>t0CPU</code> metric in system logs - if consistently above 90%, you may need to upgrade CPU or optimize NIC drivers
 
* '''Monitor Memory Usage:''' Check for OOM (Out of Memory) killer events:
  <syntaxhighlight lang="bash">
  grep -i "out of memory\|killed process" /var/log/syslog | tail -20
  </syntaxhighlight>
 
* '''SIP Packet Limits:''' If only long or chatty calls are affected, check the <code>max_sip_packets_in_call</code> and <code>max_invite_packets_in_call</code> limits in <code>voipmonitor.conf</code>.
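
If long calls are being cut off by these caps, raising them might look like the sketch below; the values are illustrative, not recommendations, so check the sniffer documentation for suitable ranges:
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf (illustrative values only)
max_sip_packets_in_call = 20000
max_invite_packets_in_call = 1000
</syntaxhighlight>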
 
;3. Verify Your SPAN/Mirror/TAP Configuration:
This is the most common cause of no traffic. Double-check your network switch or hardware tap configuration to ensure:
* The correct source ports (where your PBX/SBC is connected) are being monitored.
* The correct destination port (where your VoIPmonitor sensor is connected) is configured.
* If you are monitoring traffic across different VLANs, ensure your mirror port is configured to carry all necessary VLAN tags (often called "trunk" mode).
 
;4. Investigate Packet Encapsulation (If tcpdump shows traffic but VoIPmonitor does not):
If <code>tcpdump</code> or <code>tshark</code> shows packets reaching the interface but VoIPmonitor is not capturing them, the traffic may be encapsulated in a tunnel that VoIPmonitor cannot automatically process without additional configuration. Common encapsulations include VLAN tags, ERSPAN, GRE, VXLAN, and TZSP.
 
First, capture a sample of the traffic for analysis:
<syntaxhighlight lang="bash">
# Capture 100 packets of SIP traffic to a pcap file
tcpdump -i eth0 -c 100 -s0 port 5060 -w /tmp/encapsulation_check.pcap
</syntaxhighlight>
 
Then analyze the capture to identify encapsulation:
<syntaxhighlight lang="bash">
# Check for VLAN-tagged packets (802.1Q)
tshark -r /tmp/encapsulation_check.pcap -Y "vlan"
 
# Check for GRE tunnels
tshark -r /tmp/encapsulation_check.pcap -Y "gre"
 
# Check for ERSPAN (GRE encapsulated with ERSPAN protocol)
tshark -r /tmp/encapsulation_check.pcap -Y "gre && ip.proto == 47"
 
# Check for VXLAN (UDP port 4789)
tshark -r /tmp/encapsulation_check.pcap -Y "udp.port == 4789"
 
# Check for TZSP (UDP ports 37008 or 37009)
tshark -r /tmp/encapsulation_check.pcap -Y "udp.port == 37008 || udp.port == 37009"
 
# Show packet summary to identify any unusual protocol stacks
tshark -r /tmp/encapsulation_check.pcap -V | head -50
</syntaxhighlight>
 
Identifying encapsulation issues:
* '''VLAN tags present:''' Ensure the BPF <code>filter</code> directive in <code>voipmonitor.conf</code> does not use a bare <code>udp</code> expression; a BPF filter that does not account for <code>vlan</code> can silently drop VLAN-tagged packets. Comment out the <code>filter</code> directive entirely to test.
 
* '''ERSPAN/GRE tunnels:''' Promiscuous mode is NOT required for these Layer 3 tunnels. Verify that tunneling is configured correctly on your network device and that the packets are addressed to the sensor's IP. VoIPmonitor automatically decapsulates ERSPAN and GRE.
 
* '''VXLAN/TZSP tunnels:''' These specialized tunneling protocols require proper configuration on the sending device. Consult your network device documentation for VoIPmonitor compatibility requirements.
 
If encapsulation is identified as the issue, review [[Sniffing_modes]] for detailed configuration guidance.
 
;3B. Troubleshooting: RTP Streams Not Displayed for Specific Provider:
If SIP signaling appears correctly in the GUI for calls from a specific provider, but RTP streams (audio quality graphs, waveform visualization) are missing for that provider while working correctly for other call paths, use this systematic approach to identify the cause.
 
=== Step 1: Make a Test Call to Reproduce the Issue ===
 
First, create a controlled test scenario to investigate the specific provider.
 
* Determine if the issue affects ALL calls from this provider or only some (e.g., specific codecs, call duration, time of day)
* Make a test call that reproduces the problem (e.g., from the problematic provider to a test number)
* Allow the call to establish and run for at least 30-60 seconds to capture meaningful RTP data
 
=== Step 2: Capture Packets on the Sniffing Interface During the Test Call ===


During the test call, use <code>tcpdump</code> (or <code>tshark</code>) to directly capture packets on the network interface configured in <code>voipmonitor.conf</code>. This tells you whether RTP packets are being received by the sensor.
<syntaxhighlight lang="bash">
# Capture SIP and RTP packets from the specific provider IP during your test call
# Replace eth0 with your interface and 1.2.3.4 with the provider's IP
sudo tcpdump -i eth0 -nn "host 1.2.3.4 and (udp port 5060 or (udp[8] & 0xc0) = 0x80)" -v

# Capture all UDP traffic from the provider to a file for detailed analysis (recommended);
# RTP can be isolated later with tshark
sudo tcpdump -i eth0 -nn "host 1.2.3.4 and udp" -w /tmp/test_provider_rtp.pcap
</syntaxhighlight>

Note: The capture filter <code>(udp[8] & 0xc0) = 0x80</code> matches UDP payloads whose first byte starts with the bits "10" (RTP version 2), a common heuristic for identifying RTP.
 
=== Step 3: Compare Raw Packet Capture with Sensor Output ===
 
After the test call:
 
* Check what tcpdump captured:
<syntaxhighlight lang="bash">
# Count SIP packets
tshark -r /tmp/test_provider_rtp.pcap -Y "sip" | wc -l
 
# Count RTP packets
tshark -r /tmp/test_provider_rtp.pcap -Y "rtp" | wc -l
 
# View RTP stream details
tshark -r /tmp/test_provider_rtp.pcap -Y "rtp" -T fields -e rtp.ssrc -e rtp.seq -e rtp.ptype -e udp.srcport -e udp.dstport | head -20
</syntaxhighlight>


* Check what VoIPmonitor recorded:
  * Open the CDR for your test call in the GUI
  * Verify if the "Received Packets" column shows non-zero values for the provider leg
  * Check if the "Streams" section shows RTP quality graphs and waveform visualization
 
* Compare the results:
** '''If tcpdump shows NO RTP packets:''' The RTP traffic is not reaching the sensor interface. This indicates a network-level issue (asymmetric routing, SPAN configuration missing the RTP path, or firewall). You need to troubleshoot the network infrastructure, not VoIPmonitor.
 
** '''If tcpdump shows RTP packets but the GUI shows no streams or zero received packets:''' The packets are reaching the sensor but VoIPmonitor is not processing them. Check:
* [[#Check_GUI_Capture_Rules_(Causing_Call_Stops)|Step 5: Check GUI Capture Rules]] - Look for capture rules targeting the provider's IP with RTP set to "DISCARD" or "Header Only"
* [[Tls|TLS/SSL Decryption]] - Verify SRTP decryption is configured correctly if the provider uses encryption
* [[Sniffer_configuration]] - Check for any problematic <code>sipport</code> or <code>filter</code> settings
 
For more information on capture rules that affect RTP storage, see [[Capture_rules]].
 
;5. Check for Non-Call SIP Traffic Only:
If you see SIP traffic but it consists only of OPTIONS, NOTIFY, SUBSCRIBE, or MESSAGE methods (without any INVITE packets), there are no calls to generate CDRs. This can occur in environments that use SIP for non-call purposes like heartbeat checks or instant messaging.
 
You can configure VoIPmonitor to process and store these non-call SIP messages. See [[SIP_OPTIONS/SUBSCRIBE/NOTIFY]] and [[MESSAGES]] for configuration details.
 
Enable non-call SIP message processing in '''/etc/voipmonitor.conf''':
<syntaxhighlight lang="ini">
# Process SIP OPTIONS (qualify pings). Default: no
sip-options = yes
 
# Process SIP MESSAGE (instant messaging). Default: yes
sip-message = yes
 
# Process SIP SUBSCRIBE requests. Default: no
sip-subscribe = yes
 
# Process SIP NOTIFY requests. Default: no
sip-notify = yes
</syntaxhighlight>
 
Note that enabling these for processing and storage can significantly increase database load in high-traffic scenarios. Use with caution and monitor SQL queue growth. See [[SIP_OPTIONS/SUBSCRIBE/NOTIFY#Performance_Tuning|Performance Tuning]] for optimization tips.
 
== Step 4: Check the VoIPmonitor Configuration ==
If <code>tshark</code> sees traffic but VoIPmonitor does not, the problem is almost certainly in <code>voipmonitor.conf</code>.
 
;1. Check the <code>interface</code> directive:
:Make sure the <code>interface</code> parameter in <code>/etc/voipmonitor.conf</code> exactly matches the interface where you see traffic with <code>tshark</code>. For example: <code>interface = eth0</code>.


;2. Check the <code>sipport</code> directive:
:By default, VoIPmonitor only listens on port 5060. If your PBX uses a different port for SIP, you must add it. '''Common causes of missing calls:'''
:* '''Missing ports:''' Some providers use alternate SIP ports (5061, 5080, etc.). If these are not listed, calls on those ports will be ignored.
:* '''Syntax errors:''' List multiple ports comma-separated without extra commas or trailing characters. Correct syntax: <code>sipport = 5060,5061</code> or <code>sipport = 5060,5080</code>
:* '''Ranges:''' You can specify port ranges using dashes: <code>sipport = 5060,5070-5080</code>
:Example:
:<code>sipport = 5060,5080</code>


;3. '''Distributed/Probe Setup Considerations:'''
:If you are using a remote sensor (probe) with Packet Mirroring (<code>packetbuffer_sender=yes</code>), call detection depends on configuration on '''both''' the probe and the central analysis host.

:Common symptom: The probe captures traffic (visible via <code>tcpdump</code>), but the central server records incomplete or missing CDRs for calls on non-default ports.


{| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
{| class="wikitable"
! Storage Type !! Typical IOPS !! CDR Write Capacity !! Recommendation
|-
|-
! colspan="2" style="background:#ffc107;" | Critical: Both Systems Must Have Matching sipport Configuration
| HDD (7200 RPM) || ~150 || 10-50 CDR/s || '''Not suitable''' for production
|-
|-
| style="vertical-align: top;" | '''Probe side:'''
| SATA SSD || ~50,000 || 200-500 CDR/s || Suitable for small deployments
| The probe captures packets from the network interface. Its <code>sipport</code> setting determines which UDP ports it considers as SIP traffic to capture and forward.
|-
|-
| style="vertical-align: top;" | '''Central server side:'''
| NVMe SSD || ~500,000+ || 1000+ CDR/s || '''Recommended''' for production
| When receiving raw packets in Packet Mirroring mode, the central server analyzes the packets locally. Its <code>sipport</code> setting determines which ports it interprets as SIP during analysis. If a port is missing here, packets are captured but not recognized as SIP, resulting in missing CDRs.
|}
|}


:'''Troubleshooting steps for distributed probe setups:'''
 
::1. Verify traffic reachability on the probe:
::Use <code>tcpdump</code> on the probe VM to confirm SIP packets for the missing calls are arriving on the expected ports.
::<pre>
::# On the probe VM
::tcpdump -i eth0 -n port 5061
::</pre>
 
::2. Check the probe's ''voipmonitor.conf'':
::Ensure the <code>sipport</code> directive on the probe includes all necessary SIP ports used in your network.
::<syntaxhighlight lang="ini">
::# /etc/voipmonitor.conf on the PROBE
::sipport = 5060,5061,5080,6060
::</syntaxhighlight>
 
::3. Check the central analysis host's ''voipmonitor.conf'':
::'''This is the most common cause of missing calls in distributed setups.''' The central analysis host (the system receiving packets via <code>server_bind</code> or legacy <code>mirror_bind</code>) must also have the <code>sipport</code> directive configured with the same list of ports used by all probes.
::<syntaxhighlight lang="ini">
::# /etc/voipmonitor.conf on the CENTRAL HOST
::sipport = 5060,5061,5080,6060
::</syntaxhighlight>
 
::4. Restart both services:
::Apply the configuration changes:
::<syntaxhighlight lang="bash">
::# On both probe and central host
::systemctl restart voipmonitor
::</syntaxhighlight>
 
:For more details on distributed architecture configuration and packet mirroring, see [[Sniffer_distributed_architecture|Distributed Architecture: Client-Server Mode]].
 
;4. Check for a restrictive <code>filter</code>:
:If you have a BPF <code>filter</code> configured, ensure it is not accidentally excluding the traffic you want to see. For debugging, try commenting out the <code>filter</code> line entirely and restarting the sensor.
 
== Step 5: Check GUI Capture Rules (Causing Call Stops) ==
If <code>tshark</code> sees SIP traffic and the sniffer configuration appears correct, but the sensor still stops processing calls even though traffic is visible on the network interface, GUI capture rules may be the culprit.
 
Capture rules configured in the GUI can instruct the sniffer to ignore ("skip") all processing for matched calls. This includes calls matching specific IP addresses or telephone number prefixes.
 
;1. Review existing capture rules:
:Navigate to '''GUI -> Capture rules''' and examine all rules for any that might be blocking your traffic.
:Look specifically for rules with the '''Skip''' option set to '''ON''' (displayed as "Skip: ON"). The Skip option instructs the sniffer to completely ignore matching calls (no files, RTP analysis, or CDR creation).
 
;2. Test by temporarily removing all capture rules:
:To isolate the issue, first create a backup of your GUI configuration:
:* Navigate to '''Tools -> Backup & Restore -> Backup GUI -> Configuration tables'''
:* This saves your current settings including capture rules
:* Delete all capture rules from the GUI
:* Click the '''Apply''' button to save changes
:* Reload the sniffer by clicking the green '''"reload sniffer"''' button in the control panel
:* Test if calls are now being processed correctly
:* If resolved, restore the configuration from the backup and systematically investigate the rules to identify the problematic one
 
;3. Identify the problematic rule:
:* After restoring your configuration, remove rules one at a time and reload the sniffer after each removal
:* When calls start being processed again, you have identified the problematic rule
:* Review the rule's match criteria (IP addresses, prefixes, direction) against your actual traffic pattern
:* Adjust the rule's conditions or Skip setting as needed
 
;4. Verify rules are reloaded:
:After making changes to capture rules, remember that changes are '''not automatically applied''' to the running sniffer. You must click the '''"reload sniffer"''' button in the control panel, or the rules will continue using the previous configuration.
 
For more information on capture rules, see [[Capture_rules]].
 
== Troubleshooting: Service Fails to Start with "failed read rsa key" Error ==
 
If the VoIPmonitor sniffer service fails to start and logs the error message "failed read rsa key," this indicates that the manager key cannot be loaded from the database.
 
=== Cause ===
 
The manager_key is stored in the <code>system</code> database table (identified by <code>type='manager_key'</code>) and is required for proper manager/sensor operations in distributed deployments. This error most commonly occurs when the <code>mysqlloadconfig</code> option in <code>voipmonitor.conf</code> is set to <code>no</code>, which prevents VoIPmonitor from loading configuration (including the manager_key) from the database.
 
=== Troubleshooting Steps ===
 
;1. Check for the error in system logs:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor
 
# For CentOS/RHEL/AlmaLinux
tail -f /var/log/messages | grep voipmonitor
 
# For systemd systems
journalctl -u voipmonitor -f
</syntaxhighlight>
 
;2. Verify mysqlloadconfig setting:
<syntaxhighlight lang="bash">
# Check if mysqlloadconfig is set to no in voipmonitor.conf
grep mysqlloadconfig /etc/voipmonitor.conf
</syntaxhighlight>
 
If the output shows <code>mysqlloadconfig = no</code>, this is the cause of the issue.
 
;3. Fix the mysqlloadconfig setting:
<syntaxhighlight lang="bash">
# Edit the configuration file
nano /etc/voipmonitor.conf
 
# Either remove the mysqlloadconfig line entirely (defaults to yes)
# Or uncomment/set to yes:
# mysqlloadconfig = yes
 
# Restart the sniffer service
systemctl restart voipmonitor
 
# Check if it started successfully
systemctl status voipmonitor
</syntaxhighlight>
 
;4. Verify manager_key exists in database:
<syntaxhighlight lang="sql">
-- Query the manager_key from the system table
SELECT * FROM voipmonitor.`system` WHERE type='manager_key'\G
</syntaxhighlight>
 
If no manager_key exists, check your VoIPmonitor installation and consider running the installer or contacting support to regenerate the key.
 
;5. Check database connectivity and permissions:
Verify that the VoIPmonitor sniffer can connect to the database and has read access to the <code>system</code> table.
<syntaxhighlight lang="bash">
# Test database connectivity with the configured credentials
mysql -h <mysqlhost> -u <mysqlusername> -p <mysqldb>
 
# Inside MySQL, verify the user has SELECT on voipmonitor.system
SHOW GRANTS FOR 'voipmonitor_user'@'%';
</syntaxhighlight>
 
;6. Check configuration consistency between probe and server:
In distributed deployments with probe and server components, ensure that both systems have consistent configuration in <code>/etc/voipmonitor.conf</code>. Specifically, both should have the same database connection settings and <code>mysqlloadconfig</code> should be enabled on both systems.
 
=== Summary ===
 
The "failed read rsa key" error is almost always caused by <code>mysqlloadconfig=no</code> in <code>voipmonitor.conf</code>. The solution is to remove or change this setting to <code>yes</code>, then restart the service.
 
== Step 6: Check VoIPmonitor Logs for Errors ==
Finally, VoIPmonitor's own logs are the best source for clues. Check the system log for any error messages generated by the sensor on startup or during operation.
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor
 
# For CentOS/RHEL/AlmaLinux
tail -f /var/log/messages | grep voipmonitor
</syntaxhighlight>
Look for errors like:
* "pcap_open_live(eth0) error: eth0: No such device" (Wrong interface name)
* "Permission denied" (The sensor is not running with sufficient privileges)
* Errors related to database connectivity.
* Messages about dropping packets.
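
To scan the recent log history for these patterns in one pass on a systemd system, something like this can be used (the grep pattern is only an example):
<syntaxhighlight lang="bash">
# Summarize recent sensor errors from the journal
journalctl -u voipmonitor --since "1 hour ago" --no-pager | grep -Ei "error|denied|drop|pcap_open_live"
</syntaxhighlight>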
 
== Step 7: Check for OOM (Out of Memory) Issues ==
If VoIPmonitor suddenly stops processing CDRs and a service restart temporarily restores functionality, the system may be experiencing OOM (Out of Memory) killer events. The Linux OOM killer terminates processes when available RAM is exhausted, and MySQL (<code>mysqld</code>) is a common target due to its memory-intensive nature.
 
;1. Check for OOM killer events in kernel logs:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
grep -i "out of memory\|killed process" /var/log/syslog | tail -20
 
# For CentOS/RHEL/AlmaLinux
grep -i "out of memory\|killed process" /var/log/messages | tail -20
 
# Also check dmesg:
dmesg | grep -i "killed process" | tail -10
</syntaxhighlight>
Typical OOM killer messages look like:
<pre>
Out of memory: Kill process 1234 (mysqld) score 123 or sacrifice child
Killed process 1234 (mysqld) total-vm: 12345678kB, anon-rss: 1234567kB
</pre>
 
;2. Monitor current memory usage:
<syntaxhighlight lang="bash">
# Check available memory (look for low 'available' or 'free' values)
free -h
 
# Check per-process memory usage (sorted by RSS)
ps aux --sort=-%mem | head -15
 
# Check MySQL memory usage in bytes
cat /proc/$(pgrep mysqld)/status | grep -E "VmSize|VmRSS"
</syntaxhighlight>
Warning signs:
* '''Available memory consistently below 500MB during operation'''
* '''MySQL consuming most of the available RAM'''
* '''Swap usage near 100% (if swap is enabled)'''
* '''Frequent process restarts without clear error messages'''
 
;3. Solution: Increase physical memory:
The definitive solution for OOM-related CDR processing issues is to upgrade the server's physical RAM. After upgrading:
* Verify memory improvements with <code>free -h</code>
* Monitor for several days to ensure OOM events stop
* Consider tuning <code>innodb_buffer_pool_size</code> in your MySQL configuration to use the additional memory effectively
 
Additional mitigation strategies (while planning for RAM upgrade):
* Reduce MySQL's memory footprint by lowering <code>innodb_buffer_pool_size</code> (e.g., from 16GB to 8GB)
* Disable or limit non-essential VoIPmonitor features (e.g., packet capture storage, RTP analysis)
* Ensure swap space is properly configured as a safety buffer (though swap is much slower than RAM)
* Use <code>sysctl vm.swappiness=10</code> to favor RAM over swap when some memory is still available
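
As a sketch of the MySQL-side mitigation (the value and file path are illustrative and depend on total RAM and your distribution):
<syntaxhighlight lang="ini">
# /etc/mysql/conf.d/voipmonitor.cnf (illustrative; path varies by distribution)
[mysqld]
# Leave headroom for the sniffer and the OS page cache
innodb_buffer_pool_size = 8G
</syntaxhighlight>
The swappiness preference can be persisted by placing <code>vm.swappiness=10</code> in a file under <code>/etc/sysctl.d/</code> and running <code>sysctl --system</code>.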
 
== Step 8: Missing CDRs for Calls with Large Packets ==
If VoIPmonitor is capturing some calls successfully but missing CDRs for specific calls (especially those that seem to have larger SIP packets like INVITEs with extensive SDP), there are two common causes to investigate.
 
=== Cause 1: snaplen Packet Truncation (VoIPmonitor Configuration) ===
The <code>snaplen</code> parameter in <code>voipmonitor.conf</code> limits how many bytes of each packet are captured. If a SIP packet exceeds <code>snaplen</code>, it is truncated and the sniffer may fail to parse the call correctly.
 
;1. Check your current snaplen setting:
<syntaxhighlight lang="bash">
grep snaplen /etc/voipmonitor.conf
</syntaxhighlight>
Default is 3200 bytes (6000 if SSL/HTTP is enabled).
 
;2. Test if packet truncation is the issue:
Use <code>tcpdump</code> with <code>-s0</code> (snap infinite) to capture full packets:
<syntaxhighlight lang="bash">
# Capture SIP traffic with full packet length
tcpdump -i eth0 -s0 -nn port 5060 -w /tmp/test_capture.pcap
 
# Analyze packet sizes with Wireshark or tshark
tshark -r /tmp/test_capture.pcap -T fields -e frame.len -Y "sip" | sort -n | tail -10
</syntaxhighlight>
If you see SIP packets larger than your <code>snaplen</code> value (e.g., 4000+ bytes), increase <code>snaplen</code> in <code>voipmonitor.conf</code>:
<syntaxhighlight lang="ini">
snaplen = 65535
</syntaxhighlight>
Then restart the sniffer: <code>systemctl restart voipmonitor</code>.
 
=== Cause 2: MTU Mismatch (Network Infrastructure) ===
If packets are being lost or fragmented due to MTU mismatches in the network path, VoIPmonitor may never receive the complete packets, regardless of <code>snaplen</code> settings.
 
;1. Diagnose MTU-related packet loss:
Capture traffic with tcpdump and analyze in Wireshark:
<syntaxhighlight lang="bash">
# Capture traffic on the VoIPmonitor host
tcpdump -i eth0 -s0 host <pbx_ip_address> -w /tmp/mtu_test.pcap
</syntaxhighlight>
Open the pcap in Wireshark and look for:
* Reassembled PDUs marked as incomplete
* TCP retransmissions for the same packet
* ICMP "Fragmentation needed" messages (Type 3, Code 4)
 
;2. Verify packet completeness:
In Wireshark, examine large SIP INVITE packets. If the SIP headers or SDP appear cut off or incomplete, packets are likely being lost in transit due to MTU issues.
 
;3. Identify the MTU bottleneck:
The issue is typically a network device with a lower MTU than the end devices. Common locations:
* VPN concentrators
* Firewalls
* Routers with tunnel interfaces
* Cloud provider gateways (often limited to 1500 bytes, versus 9000-byte jumbo frames used elsewhere in the network)
 
To locate the problematic device, trace the MTU along the network path from the PBX to the VoIPmonitor sensor.
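
A common way to probe the path MTU is to send non-fragmentable pings of increasing size or to use <code>tracepath</code>; 1472 bytes of ICMP payload corresponds to a 1500-byte IP packet (the target below is a placeholder):
<syntaxhighlight lang="bash">
# Ping with "do not fragment" set; reduce -s until the ping succeeds to find the path MTU
ping -M do -s 1472 -c 3 <voipmonitor_sensor_ip>

# tracepath reports the path MTU hop by hop
tracepath <voipmonitor_sensor_ip>
</syntaxhighlight>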
 
;4. Resolution options:
* Increase MTU on the bottleneck device to match the rest of the network (e.g., from 1500 to 9000 for jumbo frame environments)
* Enable Path MTU Discovery (PMTUD) on intermediate devices
* Ensure your switching infrastructure supports jumbo frames end-to-end if you are using them
 
For more information on the <code>snaplen</code> parameter, see [[Sniffer_configuration#Network_Interface_.26_Sniffing|Sniffer Configuration]].
 
=== Cause 3: External Source Packet Truncation (Traffic Mirroring/LBS Modules) ===
If packets are truncated or corrupted BEFORE they reach VoIPmonitor, changing <code>snaplen</code> will NOT fix the issue. This scenario occurs when using external SIP sources that have their own packet size limitations.
 
; Symptoms to identify this scenario:
* Large SIP packets (e.g., WebRTC INVITE with big Authorization headers ~4k) appear truncated
* Packets show as corrupted or malformatted in VoIPmonitor GUI
* Changing <code>snaplen</code> in <code>voipmonitor.conf</code> has no effect
* Using TCP instead of UDP in the external system does not resolve the issue
 
; Common external sources that may truncate packets:
# Kamailio <code>siptrace</code> module
# FreeSWITCH <code>sip_trace</code> module
# OpenSIPS tracing modules
# Custom HEP/HOMER agent implementations
# Load balancers or proxy servers with traffic mirroring
 
; Diagnose external source truncation:
Use <code>tcpdump</code> with <code>-s0</code> (snap infinite) on the VoIPmonitor sensor to compare packet sizes:
<syntaxhighlight lang="bash">
# Capture traffic received by VoIPmonitor
sudo tcpdump -i eth0 -s0 -nn port 5060 -w /tmp/voipmonitor_input.pcap
 
# Analyze actual packet sizes received
tshark -r /tmp/voipmonitor_input.pcap -T fields -e frame.len -Y "sip.Method == INVITE" | sort -n | tail -10
</syntaxhighlight>
 
If:
* You see packets with truncated SIP headers or incomplete SDP
* The packet length is much smaller than expected (e.g., 1500 bytes instead of 4000+ bytes)
* Truncation is consistent across all calls
 
Then the external source is truncating packets before they reach VoIPmonitor.
 
; Solutions for Kamailio siptrace truncation:
If using Kamailio's <code>siptrace</code> module with traffic mirroring:
 
1. Configure Kamailio to use TCP transport for siptrace (may help in some cases):
<pre>
# In kamailio.cfg
modparam("siptrace", "duplicate_uri", "sip:voipmonitor_ip:port;transport=tcp")
</pre>
 
2. If Kamailio reports "Connection refused", VoIPmonitor does not open a TCP listener by default. Manually open one:
<syntaxhighlight lang="bash">
# Open a TCP listener with socat so Kamailio can establish the connection.
# socat requires a destination address; /dev/null simply discards the stream
# (illustrative -- the sniffer still sees the packets on the wire)
socat TCP-LISTEN:5888,fork,reuseaddr /dev/null &
</syntaxhighlight>
Then update kamailio.cfg to use the specified port instead of the standard SIP port.
 
3. Use HAProxy traffic 'tee' function (recommended):
If your architecture includes HAProxy in front of Kamailio, use its traffic mirroring to send a copy of the WebSocket traffic directly to VoIPmonitor's standard SIP listening port. This bypasses the siptrace module entirely and preserves original packets:
<pre>
# In haproxy.cfg, within your frontend/backend configuration
# Send a copy of traffic to VoIPmonitor
option splice-response
tcp-request inspect-delay 5s
tcp-request content accept if { req_ssl_hello_type 1 }
use-server voipmonitor if { req_ssl_hello_type 1 }
listen voipmonitor_mirror
    bind :5888
    mode tcp
    server voipmonitor <voipmonitor_sensor_ip>:5060 send-proxy
</pre>
 
Note: The exact HAProxy configuration depends on your architecture and whether you are mirroring TCP (WebSocket) or UDP traffic.
 
; Solutions for other external sources:
# Check the external system's documentation for packet size limits or truncation settings
# Consider using standard network mirroring (SPAN/ERSPAN/GRE) instead of SIP tracing modules
# Ensure the external system captures full packet lengths (disable any internal packet size caps)
# Verify that the external system does not reassemble or modify SIP packets before forwarding
 
== Troubleshooting: CDR Shows 000 No Response Despite Valid SIP Response ==
 
If the CDR View displays "000 No Response" in the Last Response column for calls that actually have valid final SIP response codes (such as 403 Forbidden, 500 Server Error, etc.), this indicates that the sniffer is receiving response packets but failing to correlate them with their corresponding INVITE transactions before writing the CDR.
 
=== Diagnosis: Verify Response Packets Are Captured ===
 
;1. Locate the affected call in the CDR View:
:* Find a call showing "000 No Response" in the Last Response column.
 
;2. Check the SIP History:
:* Click the [+] icon to expand the call's detail view.
:* Open the "SIP History" tab.
:* Look for the actual SIP response (e.g., 403 Forbidden, 486 Busy Here, 500 Internal Server Error).
 
If the response packet IS present in SIP History, the issue is a correlation timing problem. Proceed to the solution below.
 
If the response packet is NOT present in SIP History, the issue is a network visibility problem (see [[#SPAN_Configuration_Troubleshooting|Step 3: Investigate Packet Encapsulation]] and other network troubleshooting sections).
 
=== Root Cause: libpcap Packet Queue Timeout ===
 
The issue is caused by VoIPmonitor's libpcap packet capture timing out before responses can be matched to their originating INVITEs. This typically occurs in high-traffic environments or when packet processing is temporarily delayed due to system load.
 
The sniffer creates CDR records based on SIP INVITE packets. It attempts to correlate subsequent SIP responses (403, 500, etc.) with the original INVITE. If the packet queue processing is too slow or the time window is too short, responses arrive after the CDR has already been written with "Last Response" set to 0.
 
=== Solution: Configure libpcap Nonblocking Mode ===
 
Edit the "/etc/voipmonitor.conf" file on the sniffer host and add the following parameters:
 
<syntaxhighlight lang="ini">
# Enable libpcap nonblocking mode to prevent packet queue blocking
libpcap_nonblock_mode = yes
 
# Increase packet deque window length (in milliseconds) for response correlation
# Default is often 2000ms, increasing to 5000ms gives more time for responses
pcap_queue_deque_window_length = 5000
</syntaxhighlight>
 
Save the file and restart the voipmonitor service:
 
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>
 
=== Additional Considerations ===
 
;If the issue persists after applying the fix:
:* Try increasing <code>pcap_queue_deque_window_length</code> further (e.g., to 7000 or 10000 milliseconds)
:* Check system load to ensure the server is not under heavy CPU or I/O pressure
:* Verify adequate <code>ringbuffer</code> size is configured for your traffic volume (see [[Scaling|Scaling and Performance Tuning]])
 
;For distributed architectures:
:* Ensure all voipmonitor hosts have synchronized time (see [[#Verify_System_Time_Synchronization]])
:* Time mismatches between components can cause correlation failures
 
{{Note|The <code>pcap_queue_deque_window_length</code> parameter is also used in distributed mirroring scenarios to sort packets from multiple mirrors. Increasing this value improves packet correlation in both single-sensor and distributed setups.}}
 
For more information on packet capture configuration, see [[Sniffer_configuration|Sniffer Configuration]].
 
== Step 9: Probe Timeout Due to Virtualization Timing Issues ==
 
If remote probes are intermittently disconnecting from the central server with timeout errors, even on a high-performance network with low load, the issue may be related to virtualization host timing problems rather than network connectivity.
 
=== Diagnosis: Check System Log Timing Intervals ===
 
The VoIPmonitor sensor generates status log messages approximately every 10 seconds during normal operation. If the timing system on the probe is inconsistent, the interval between these status messages can exceed 30 seconds, triggering a connection timeout.
 
;1. Monitor the system log on the affected probe:
<syntaxhighlight lang="bash">
tail -f /var/log/syslog | grep voipmonitor
</syntaxhighlight>
 
;2. Examine the timestamps of voipmonitor status messages:
Look for repeating log entries that should appear approximately every 10 seconds during normal operations.
 
;3. Identify timing irregularities:
Calculate the time interval between successive status log entries. '''If the interval exceeds 30 seconds''', this indicates a timing system problem that will cause connection timeouts with the central server.
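
To quantify the gaps, the timestamps of the sensor's journal entries can be compared directly; this is a rough sketch that prints the largest intervals in seconds:
<syntaxhighlight lang="bash">
# Show the five largest gaps between consecutive voipmonitor log lines
journalctl -u voipmonitor --since "1 hour ago" -o short-unix --no-pager \
  | awk '{ if (prev) print $1 - prev; prev = $1 }' | sort -n | tail -5
</syntaxhighlight>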
 
=== Root Cause: Virtualization Host RDTSC Issues ===
 
This problem is '''not''' network-related. It is a host-level timing issue that impacts the application's internal timers.
 
The issue typically occurs on virtualized probes where the host's CPU timekeeping is inconsistent. Specifically, problems with the RDTSC (Read Time-Stamp Counter) CPU instruction on the virtualization host can cause:
 
* Irregular system clock behavior on the guest VM
* Application timers that do not fire consistently
* Sporadic timeouts in client-server connections
 
=== Resolution ===
 
;1. Investigate the virtualization host configuration:
Check the host's hypervisor or virtualization platform documentation for known timekeeping issues related to RDTSC.
 
Common virtualization platforms with known timing considerations:
* KVM/QEMU: Check CPU passthrough and TSC mode settings
* VMware: Verify time synchronization between guest and host
* Hyper-V: Review Integration Services time sync configuration
* Xen: Check TSC emulation settings
 
;2. Apply host-level fixes:
These are host-level fixes, not changes to the guest VM configuration. Consult your virtualization platform's documentation for specific steps to address RDTSC timing issues.
 
Typical solutions include:
* Enabling appropriate TSC modes on the host
* Configuring CPU features passthrough correctly
* Adjusting hypervisor timekeeping parameters
 
;3. Verify the fix:
After applying the host-level configuration changes, monitor the probe's status logs again to confirm that the timing intervals are now consistently around 10 seconds (never exceeding 30 seconds).
 
<syntaxhighlight lang="bash">
# Monitor for regular status messages
tail -f /var/log/syslog | grep voipmonitor
</syntaxhighlight>
 
Once the timing is corrected, probe connections to the central server should remain stable without intermittent timeouts.
 
== Troubleshooting: Audio Missing on One Call Leg ==
 
If the sniffer captures full audio on one call leg (e.g., carrier/outside) but only partial or no audio on the other leg (e.g., PBX/inside), use this diagnostic workflow to identify the root cause BEFORE applying any configuration fixes.
 
The key question to answer is: '''Are the RTP packets for the silent leg present on the wire?'''
 
=== Step 1: Use tcpdump to Capture Traffic During a Test Call ===
 
Initiate a new test call that reproduces the issue. During the call, use tcpdump or tshark directly on the sensor's sniffing interface to capture all traffic:
 
<syntaxhighlight lang="bash">
# Capture traffic to a file during the test call
# Replace eth0 with your sniffing interface
tcpdump -i eth0 -s0 -w /tmp/direct_capture.pcap

# OR: Display live traffic for specific IPs (useful for real-time diagnostics)
tcpdump -i eth0 -s0 -nn "host <pbx_ip> or host <carrier_ip>"
</syntaxhighlight>


Let the call run for 10-30 seconds, then stop tcpdump with Ctrl+C.
 
=== Step 2: Retrieve VoIPmonitor GUI's PCAP for the Same Call ===
 
After the call completes:
1. Navigate to the '''CDR View''' in the VoIPmonitor GUI
2. Find the test call you just made
3. Download the PCAP file for that call (click the PCAP icon/button)
4. Save it as: <code>/tmp/gui_capture.pcap</code>
 
=== Step 3: Compare the Two Captures ===
 
Analyze both captures to determine if RTP packets for the silent leg are present on the wire:
 
<syntaxhighlight lang="bash">
# Count RTP packets in the direct capture
tshark -r /tmp/direct_capture.pcap -Y "rtp" | wc -l

# Count RTP packets in the GUI capture
tshark -r /tmp/gui_capture.pcap -Y "rtp" | wc -l

# Check for RTP from specific source IPs in the direct capture
tshark -r /tmp/direct_capture.pcap -Y "rtp" -T fields -e rtp.ssrc -e ip.src -e ip.dst

# Check Call-ID in both captures to verify they're the same call
tshark -r /tmp/direct_capture.pcap -Y "sip" -T fields -e sip.Call-ID | head -1
tshark -r /tmp/gui_capture.pcap -Y "sip" -T fields -e sip.Call-ID | head -1
</syntaxhighlight>
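
If your tshark build includes the RTP statistics module, a per-stream summary is easier to compare than raw packet counts, because it lists each SSRC with its endpoints and packet count:

<syntaxhighlight lang="bash">
# Summarize RTP streams (SSRC, endpoints, packet counts) in each capture
tshark -q -z rtp,streams -r /tmp/direct_capture.pcap
tshark -q -z rtp,streams -r /tmp/gui_capture.pcap
</syntaxhighlight>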
 
=== Step 4: Interpret the Results ===
 
{| class="wikitable" style="background:#e7f3ff; border:1px solid #3366cc;"
|-
|-
! colspan="2" style="background:#3366cc; color: white;" | Diagnostic Decision Matrix
| <code>mysqlstore_max_threads_cdr</code> || 1 || 4-8 || Number of parallel threads writing CDRs to database
|-
|-
! Observation
| <code>mysqlstore_concat_limit</code> || 400 || 1000-2000 || CDRs batched into single INSERT statement
! Root Cause & Next Steps
|-
|-
| '''RTP packets for silent leg are NOT present in direct capture'''
| <code>query_cache</code> || no || yes || Write queries to disk when queue grows (prevents OOM)
| '''Network/PBX Issue:''' The PBX or network is not sending the packets. This is not a VoIPmonitor problem. Troubleshoot the PBX (check NAT, RTP port configuration) or network (SPAN/mirror configuration, firewall rules).
|-
| '''RTP packets for silent leg ARE present in direct capture but missing in GUI capture'''
| '''Sniffer Configuration Issue:''' Packets are on the wire but VoIPmonitor is failing to capture or correlate them. Likely causes: NAT IP mismatch (natalias configuration incorrect), SIP signaling advertises different IP than RTP source, or restrictive filter rules. Proceed with configuration fixes.
|-
| '''RTP packets present in both captures but audio still silent'''
| '''Codec/Transcoding Issue:''' Packets are captured correctly but may not be decoded properly. Check codec compatibility, unsupported codecs, or transcoding issues on the PBX.
|}
|}


=== Step 5: Apply the Correct Fix Based on Diagnosis ===
 
;If RTP is NOT on the wire (Network/PBX issue):
:* Check PBX RTP port configuration and firewall rules
:* Verify network SPAN/mirror is capturing bidirectional traffic (see [[#SPAN_Configuration_Troubleshooting|Section 3]])
:* Check PBX NAT settings - RTP packets may be blocked or routed incorrectly
 
;If RTP is on the wire but not captured (Sniffer configuration issue):
 
==== Check rtp_check_both_sides_by_sdp Setting ''(Primary Cause)'' ====
 
This is the '''most common cause''' of one-way RTP capture when packets are present on the wire. The <code>rtp_check_both_sides_by_sdp</code> parameter controls how strictly RTP streams are correlated with SDP (Session Description Protocol) signaling.
 
:* Check the current setting in <code>/etc/voipmonitor.conf</code>:
:<syntaxhighlight lang="bash">
:grep "^rtp_check_both_sides_by_sdp" /etc/voipmonitor.conf
:</syntaxhighlight>
 
:* If the setting is <code>yes</code>, <code>strict</code>, or <code>very_strict</code>, the sniffer requires '''both sides of the RTP to exactly match the SDP (SIP signaling)''':
:: <code>strict</code>: Only allows verified packets after first match (blocks unverified)
:: <code>very_strict</code>: Blocks all unverified packets (most restrictive)
:: <code>keep_rtp_packets</code>: Same as <code>yes</code> but stores unverified packets for debugging
 
Symptoms of restrictive <code>rtp_check_both_sides_by_sdp</code> settings:
* Only one call leg appears in CDR (caller OR called, not both)
* Received packets column shows 0 or very low on one leg
* tcpdump shows both RTP streams present, but GUI captures only one
* Affects many calls, not just specific ones
 
'''Solution:''' Change to <code>no</code> or comment out the line:
 
<syntaxhighlight lang="ini">
; /etc/voipmonitor.conf
rtp_check_both_sides_by_sdp = no
</syntaxhighlight>
 
Restart the sniffer to apply:
 
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>
 
If you previously set <code>rtp_check_both_sides_by_sdp = yes</code> to solve audio mixing issues in multi-call environments where multiple calls share the same IP:port, consider using alternative approaches like <code>sdp_multiplication</code> instead, as enabling strict checking breaks one-way RTP capture.
 
==== Other Configuration Checks ====
 
If checking <code>rtp_check_both_sides_by_sdp</code> does not resolve the issue, proceed with these additional diagnostic steps:
 
:* Configure '''natalias''' in <code>/etc/voipmonitor.conf</code> to map the IP advertised in SIP signaling to the actual RTP source IP (NAT scenarios only):
:<syntaxhighlight lang="ini">
:; /etc/voipmonitor.conf
:natalias = <Public_IP_Signaled> <Private_IP_Actual>
:</syntaxhighlight>
:: When using <code>natalias</code>, ensure <code>rtp_check_both_sides_by_sdp</code> is set to <code>no</code> (the default).
:* Check for restrictive <code>filter</code> directives in <code>voipmonitor.conf</code>
:* Verify <code>sipport</code> includes all necessary SIP ports
 
;If packets are captured but audio silent (Codec issue):
:* Check CDR view for codec information on both legs
:* Verify VoIPmonitor GUI has the necessary codec decoders installed
:* Check for codec mismatches between call legs (transcoding may be missing)
 
=== Step 6: Verify the Fix After Configuration Changes ===
 
After making changes in <code>/etc/voipmonitor.conf</code>:
 
<syntaxhighlight lang="bash">
# Restart the sniffer
systemctl restart voipmonitor
 
# Make another test call and repeat the diagnostic workflow
# Compare direct vs GUI capture again
</syntaxhighlight>
 
Confirm that RTP packets for the problematic leg now appear in both the direct tcpdump capture AND the GUI's PCAP file.
 
'''Note:''' This diagnostic methodology helps you identify whether the issue is in the network infrastructure (PBX, SPAN, firewall) or in VoIPmonitor configuration (natalias, filters). Applying VoIPmonitor configuration fixes when the root cause is a network issue will not resolve the problem.
 
== Troubleshooting: Server Coredumps and SQL Queue Overload ==
 
If the VoIPmonitor server is experiencing regular coredumps, the cause may be an SQL queue bottleneck that exceeds system limits. The SQL queue grows when the database cannot keep up with the rate of data being inserted from VoIPmonitor.
 
=== Symptoms ===
 
* Server crashes or coredumps regularly, often during peak traffic hours
* Syslog messages showing a growing <code>SQLq</code> counter (SQL queries waiting)
* Crashes occur when OPTIONS, SUBSCRIBE, and NOTIFY messages are being processed at high volume
 
=== Identify the Root Cause ===
 
;1. Check the SQL queue metric in syslog:
<syntaxhighlight lang="bash">
# Debian/Ubuntu
tail -f /var/log/syslog | grep "SQLq"
 
# CentOS/RHEL
tail -f /var/log/messages | grep "SQLq"
</syntaxhighlight>
 
Look for the <code>SQLq[XXX]</code> value where XXX is the number of queued SQL commands. If this number is consistently growing or reaching high values (thousands or more), the database is a bottleneck.
 
;2. Check if SIP message processing is enabled:
<syntaxhighlight lang="bash">
grep -E "sip-options=|sip-subscribe=|sip-notify=" /etc/voipmonitor.conf
</syntaxhighlight>
 
If these are set to <code>yes</code> and you have a high volume of these messages (OPTIONS pings sent frequently by SIP devices), this can overwhelm the database insert thread queue.
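
To get a rough idea of how much of this traffic you actually receive, you can count OPTIONS messages over a short window on the sniffing interface (a sketch; replace eth0 with your interface):

<syntaxhighlight lang="bash">
# Count SIP OPTIONS packets seen during a 60-second sample
tshark -i eth0 -Y "sip.Method == OPTIONS" -a duration:60 2>/dev/null | wc -l
</syntaxhighlight>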
 
=== Solutions ===
 
There are three approaches to resolve SQL queue overload coredumps:
 
==== Solution 1: Increase MySQL Insert Threads ====
 
Increase the number of threads dedicated to inserting SIP messages into the database. This allows more parallel database operations.
 
Edit <code>/etc/voipmonitor.conf</code> and add or modify:
 
<syntaxhighlight lang="ini">
<syntaxhighlight lang="ini">
# Increase insert threads for SIP messages (default is 4, increase to 8 or higher for high traffic)
# /etc/voipmonitor.conf
mysqlstore_max_threads_sip_msg = 8
mysqlstore_max_threads_cdr = 8
</syntaxhighlight>
mysqlstore_concat_limit = 2000
 
query_cache = yes
Restart VoIPmonitor for the change to take effect:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>
</syntaxhighlight>


{{Tip|For very high traffic environments, you may need to increase this value further (e.g., 12 or 16).}}
Restart after changes: <code>systemctl restart voipmonitor</code>


==== Solution 2: Disable High-Volume SIP Message Types ====

Reduce the load on the SQL queue by disabling processing of specific high-volume SIP message types that are not needed for your analysis.

Edit <code>/etc/voipmonitor.conf</code>:

<syntaxhighlight lang="ini">
# Disable processing and database storage for specific message types
sip-options = no
sip-subscribe = no
sip-notify = no
</syntaxhighlight>
 
Restart VoIPmonitor:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>


{{Note|See [[SIP_OPTIONS/SUBSCRIBE/NOTIFY]] for detailed information on these options and when to use <code>nodb</code> mode instead of disabling entirely.}}

==== Solution 3: Optimize MySQL Performance ====

Tune the MySQL/MariaDB server for better write performance to handle the high insert rate from VoIPmonitor.

Edit your MySQL configuration file (typically <code>/etc/mysql/my.cnf</code> or <code>/etc/mysql/mariadb.conf.d/50-server.cnf</code>):

<syntaxhighlight lang="ini">
[mysqld]
# InnoDB buffer pool size - set to approximately 50-70% of available RAM on a dedicated database server
# On servers running VoIPmonitor and MySQL together, use approximately 30-50% of RAM
innodb_buffer_pool_size = 8G

# Reduce transaction durability for faster writes (may lose up to 1 second of data on crash)
innodb_flush_log_at_trx_commit = 2
</syntaxhighlight>
Restart MySQL and VoIPmonitor:
<syntaxhighlight lang="bash">
systemctl restart mysql
systemctl restart voipmonitor
</syntaxhighlight>
{{Warning|Setting <code>innodb_flush_log_at_trx_commit</code> to <code>2</code> trades some data safety for performance. In the event of a power loss or crash, up to 1 second of the most recent transactions may be lost.}}
=== Additional Troubleshooting ===
* If increasing threads and disabling SIP message types do not resolve the issue, check whether the database server itself has performance bottlenecks (CPU, disk I/O, memory); a few quick checks are sketched after this list
* For systems with extremely high call volumes, consider moving the database to a separate dedicated server
* Monitor the <code>SQLq</code> metric after making changes to verify the queue is not growing unchecked
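
A few standard Linux commands are usually enough to see which resource is saturated (a sketch; the MySQL process may be named mysqld or mariadbd depending on your distribution):

<syntaxhighlight lang="bash">
# CPU usage of the database process
top -b -n 1 | grep -E "mysqld|mariadbd"

# Disk utilization and latency (look at %util and await)
iostat -x 1 3

# Available memory and swap usage
free -h
</syntaxhighlight>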
== Appendix: tshark Display Filter Syntax for SIP ==
When using <code>tshark</code> to analyze SIP traffic, it is important to use the '''correct Wireshark display filter syntax'''. Below are common filter examples:


=== Basic SIP Filters ===
<syntaxhighlight lang="bash">
# Show all SIP INVITE messages
tshark -r capture.pcap -Y "sip.Method == INVITE"

# Show all SIP messages (any method)
tshark -r capture.pcap -Y "sip"

# Show SIP and RTP traffic
tshark -r capture.pcap -Y "sip || rtp"
</syntaxhighlight>

=== Search for Specific Phone Number or Text ===
<syntaxhighlight lang="bash">
# Find calls containing a specific phone number (e.g., 5551234567)
tshark -r capture.pcap -Y 'sip contains "5551234567"'

# Find INVITE messages for a specific number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"'
</syntaxhighlight>

=== Extract Call-ID from Matching Calls ===
<syntaxhighlight lang="bash">
# Get Call-ID for calls matching a phone number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"' -T fields -e sip.Call-ID

# Get Call-ID along with From and To headers
tshark -r capture.pcap -Y 'sip.Method == INVITE' -T fields -e sip.Call-ID -e sip.from.user -e sip.to.user
</syntaxhighlight>
 
=== Filter by IP Address ===
<syntaxhighlight lang="bash">
# SIP traffic from a specific source IP
tshark -r capture.pcap -Y "sip && ip.src == 192.168.1.100"
 
# SIP traffic between two hosts
tshark -r capture.pcap -Y "sip && ip.addr == 192.168.1.100 && ip.addr == 10.0.0.50"
</syntaxhighlight>
 
=== Filter by SIP Response Code ===
<syntaxhighlight lang="bash">
# Show all 200 OK responses
tshark -r capture.pcap -Y "sip.Status-Code == 200"
 
# Show all 4xx and 5xx error responses
tshark -r capture.pcap -Y "sip.Status-Code >= 400"
 
# Show 486 Busy Here responses
tshark -r capture.pcap -Y "sip.Status-Code == 486"
</syntaxhighlight>
 
=== Important Syntax Notes ===
* '''Field names are case-sensitive:''' Use <code>sip.Method</code>, <code>sip.Call-ID</code>, <code>sip.Status-Code</code> (not <code>sip.method</code> or <code>sip.call-id</code>)
* '''String matching uses <code>contains</code>:''' Use <code>sip contains "text"</code> (not <code>sip.contains()</code>)
* '''Use double quotes for strings:''' <code>sip contains "number"</code> (not single quotes)
* '''Boolean operators:''' Use <code>&&</code> (and), <code>||</code> (or), <code>!</code> (not)
 
For a complete reference, see the [https://www.wireshark.org/docs/dfref/s/sip.html Wireshark SIP Display Filter Reference].
 
== Troubleshooting: Database Error 1062 - Lookup Table Auto-Increment Limit ==
 
If the sniffer logs show a database error `1062 - Duplicate entry '16777215' for key 'PRIMARY'` and new CDRs stop being stored, this is caused by a lookup table reaching its maximum auto-increment limit.
 
=== Symptoms ===
 
* CDRs stop being inserted into the database
* Sniffer logs show: `query error in [call __insert_10_0S1();]: 1062 - Duplicate entry '16777215' for key 'PRIMARY'`
* The error affects a lookup table (such as `cdr_sip_response` or `cdr_reason`)
* The value 16777215 (16,777,215) indicates the table is using `MEDIUMINT UNSIGNED` for the ID column
 
=== Root Cause ===
 
VoIPmonitor uses lookup tables (like `cdr_sip_response` or `cdr_reason`) to store unique values such as SIP response reason strings or custom response text. These are used to normalize data and reduce storage in the main `cdr` table.
 
When the system receives many unique SIP response strings or reason messages (e.g., different error messages from various carriers, devices with custom SIP header formats, or PBX-specific responses), the lookup table's auto-increment ID can reach the `MEDIUMINT` limit of 16,777,215. Once this limit is hit, new unique values cannot be inserted, causing all subsequent CDRs to fail with error 1062.
 
=== Identifying the Affected Table ===


Check which lookup table is hitting the limit:

<syntaxhighlight lang="sql">
-- Check the current AUTO_INCREMENT value for lookup tables
SELECT
    TABLE_NAME,
    COLUMN_TYPE,
    AUTO_INCREMENT
FROM
    INFORMATION_SCHEMA.TABLES
JOIN
    INFORMATION_SCHEMA.COLUMNS
USING (TABLE_SCHEMA, TABLE_NAME)
WHERE
    TABLE_SCHEMA = 'voipmonitor' AND
    (TABLE_NAME LIKE 'cdr_sip%' OR TABLE_NAME LIKE 'cdr_reason%') AND
    COLUMN_KEY = 'PRI' AND
    EXTRA LIKE '%auto_increment%'
ORDER BY AUTO_INCREMENT DESC;
</syntaxhighlight>

Look for AUTO_INCREMENT values approaching or exceeding 16,000,000 in tables using `MEDIUMINT`.
 
=== Solution: Prevent New Unique Entries ===
 
The most effective solution is to configure VoIPmonitor to stop storing or normalize the unique SIP response text that is causing the rapid growth of the lookup table.
 
==== Option 1: Disable SIP Response Text Storage ====

Edit `/etc/voipmonitor.conf` on the sniffer to disable storing SIP response reason text:

<syntaxhighlight lang="ini">
# Disable storing SIP response reason strings in lookup tables
cdr_reason_string_enable = no
</syntaxhighlight>
 
This prevents the system from creating new unique entries for SIP response reason strings. Restart the sniffer:


<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>

==== Option 2: Normalize Response Text ====

If you need to keep some response text but reduce the number of unique entries, enable normalization in `/etc/voipmonitor.conf`:

<syntaxhighlight lang="ini">
# Normalize SIP response text to reduce unique entries
cdr_reason_normalisation = yes
cdr_sip_response_normalisation = yes
cdr_ua_normalisation = yes
</syntaxhighlight>


Normalization transforms similar response strings into a single canonical form, significantly reducing the number of unique rows created. Include all three normalization options for maximum effectiveness.

==== Option 3: Clear Queued SQL Queries ====

If error 1062 persists after applying the configuration changes and restarting the service, there may be a large backlog of failed SQL queries queued in the spool directory. Clearing this queue can eliminate the persistent 1062 errors that are caused by previously buffered failed inserts.

{{Warning|This step will DELETE all buffered CDRs in the queue. These CDRs will be permanently lost.}}

To clear the SQL queue:

<syntaxhighlight lang="bash">
# 1. Stop the VoIPmonitor service
systemctl stop voipmonitor


# 2. Remove the queued query files (qoq-* files)
# The default spool directory is /var/spool/voipmonitor
rm -f /var/spool/voipmonitor/qoq-*

# 3. Verify the files are removed
ls /var/spool/voipmonitor/qoq-*


# 4. Restart the service
systemctl start voipmonitor


# 5. Check that service is running
systemctl status voipmonitor
</syntaxhighlight>


After restarting, the service should no longer attempt to re-insert the previously failed 1062 queries from the queue. Monitor the logs to confirm the error has stopped.


==== Option 4: Clean Existing Data (Immediate Fix) ====


The lookup table has reached its MEDIUMINT limit, preventing new CDRs from being stored. Truncating the table clears it and resets the auto-increment counter to 1, allowing CDRs to be written immediately.


{{Warning|TRUNCATE permanently deletes all data. This will remove the exact SIP response text display in the GUI for historical CDRs, but will not affect the main CDR records or call data. Only do this if you are certain you no longer need the original response text.}}
<syntaxhighlight lang="sql">
-- Clear the cdr_reason table (adjust table name as needed based on error message)
TRUNCATE TABLE cdr_reason;
</syntaxhighlight>
 
=== Verification ===
 
After applying the fix:
 
1. Check that CDRs are being stored again by monitoring the sniffer logs
2. Verify the lookup table AUTO_INCREMENT is no longer increasing rapidly:
<syntaxhighlight lang="sql">
SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'cdr_sip_response' AND TABLE_SCHEMA = 'voipmonitor';
</syntaxhighlight>
3. Monitor the error logs to confirm the 1062 error has stopped appearing; a quick log filter is sketched below
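
A simple way to watch for recurrence is to filter the sniffer log for the error text (adjust the log file for your distribution):

<syntaxhighlight lang="bash">
# Debian/Ubuntu (use /var/log/messages on CentOS/RHEL)
tail -f /var/log/syslog | grep -E "1062|query error"
</syntaxhighlight>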


=== Important Note: NOT a Database Schema Issue ===

This error is typically NOT solved by changing the database schema (e.g., migrating to BIGINT). The root cause is storing too many unique SIP response strings, which will continue to grow regardless of the ID column size. The correct solution is to configure VoIPmonitor to stop creating these unique entries via the `cdr_reason_string_enable` configuration option.
 
{{Warning|Do NOT confuse this with the unrelated `cdr` table integer overflow problem. The main `cdr` table may encounter limits around 4 billion rows (32-bit INT), which is addressed in the [[Upgrade_to_bigint]] guide. Lookup table issues at 16.7 million (MEDIUMINT) are solved by configuration, not schema migration.}}
 
== See Also ==
* [[Sniffer_configuration]] - Complete configuration reference for voipmonitor.conf
* [[Sniffer_distributed_architecture]] - Client/server deployment and troubleshooting
* [[Capture_rules]] - GUI-based selective recording configuration
* [[Sniffing_modes]] - Traffic forwarding methods (SPAN, ERSPAN, GRE, TZSP)
* [[Scaling]] - Performance tuning and optimization
* [[Upgrade_to_bigint]] - Migrating CDR table to BIGINT (unrelated to lookup table issues)
 
== AI Summary for RAG ==
'''Summary:''' Comprehensive troubleshooting guide for VoIPmonitor sensor issues. POST-REBOOT VERIFICATION: After planned server reboot, verify two critical items: (1) Firewall/Iptables Rules - check with `iptables -L -n -v`, `firewall-cmd --list-all`, or `ufw status verbose`. Verify VoIPmonitor ports are allowed: SIP (5060/udp), RTP range, GUI (80/tcp, 443/tcp), sensor management (5029/tcp), Client-Server (60024/tcp). Make rules persistent: for iptables use `iptables-save > /etc/iptables/rules.v4` and install `iptables-persistent`; for firewalld use `--permanent` flag. (2) System Time Synchronization - CRITICAL especially for packetbuffer_sender mode. Check with `ntpstat` or `chronyc tracking`. Verify with `ntpq -p` or `chronyc sources -v`. Time offset should be under 100ms. For packetbuffer_sender mode, host and server times must match for proper call correlation (max difference: 2 seconds). Ensure all distributed sensors and central server use same NTP source: `timedatectl status`. Troubleshoot time sync: check firewall allows UDP 123, verify NTP servers reachable, review `/etc/ntp.conf` or `/etc/chrony.conf`, enable service on boot. MAIN TROUBLESHOOTING STEPS for no calls: (1) Verify service running with <code>systemctl status</code>. If service fails to start or crashes immediately with "missing package" error: check logs (syslog/journalctl), install missing dependencies - most commonly <code>rrdtool</code> for RRD graphing/statistics (apt-get install rrdtool or yum/dnf install rrdtool), other common missing packages: libpcap, libssl, zlib. Use <code>ldd</code> to check shared library dependencies. Restart service after installing packages. (2) CRITICAL STEP: Use <code>tshark</code> to verify live traffic is reaching the correct network interface: <code>tshark -i eth0 -Y "sip || rtp" -n</code> (replace eth0 with interface from voipmonitor.conf). If command shows NO packets: issue is network - check SPAN/mirror port configuration on switch, firewall rules. If command shows OPTIONS/NOTIFY/SUBSCRIBE/METHOD but NO INVITE packets: environment has no calls (VOIPmonitor requires INVITE for CDRs). Configure to process non-call SIP messages in voipmonitor.conf with sip-options, sip-message, sip-subscribe, sip-notify set to yes. (3) Check network config - promiscuous mode required for SPAN/RSPAN but NOT for Layer 3 tunnels (ERSPAN/GRE/TZSP/VXLAN). (3A) SPECIAL CASE: Missing packets for specific IPs during high-traffic periods. Use tcpdump FIRST: `tcpdump -i eth0 -nn "host 10.1.2.3 and port 5060"`. If NO packets arrive -> check SPAN config for bidirectional capture (source ports, BOTH inbound/outbound, SPAN buffer saturation during peak, VLAN trunking). If packets DO arrive -> check sensor bottlenecks (ringbuffer, t0CPU, OOM, max_sip_packets_in_call). (3a) If tcpdump shows traffic but VoIPmonitor does NOT capture it, investigate packet encapsulation - capture with tcpdump and analyze with tshark for VLAN tags, ERSPAN, GRE (tshark -Y "gre"), VXLAN (udp.port == 4789), TZSP (udp.port 37008/37009). VLAN tags: ensure filter directive does not use "udp" which drops VLAN-tagged packets. ERSPAN/GRE: verify tunnel configured correctly and packets addressed to sensor IP (promiscuous mode NOT required). VXLAN/TZSP: require proper sending device configuration. (3B) SPECIAL CASE: RTP streams not displayed for specific provider. If SIP signaling works in GUI but RTP streams/quality graphs missing for one provider while working for others: Step 1: Make a test call to reproduce issue. 
Step 2: During test call, capture RTP packets with tcpdump: `sudo tcpdump -i eth0 -nn "host 1.2.3.4 and rtp" -w /tmp/test_provider_rtp.pcap`. Step 3: Compare tcpdump output with sensor GUI. If tcpdump shows NO RTP packets: network-level issue (asymmetric routing, SPAN config missing RTP path). If tcpdump shows RTP packets but GUI shows no streams: check capture rules with RTP set to DISCARD/Header Only, SRTP decryption config, or sipport/filter settings. (4) Verify <code>voipmonitor.conf</code> settings: interface, sipport, filter directives. (5) Check GUI capture rules with "Skip" option blocking calls. (6) Review system logs for errors. (7) Diagnose OOM killer events causing CDR processing stops. (8) Investigate missing CDRs due tosnaplen truncation, MTU mismatch, or EXTERNAL SOURCE packet truncation. Cause 3: If packets truncated before reaching VoIPmonitor (e.g., Kamailio siptrace, FreeSWITCH sip_trace, custom HEP/HOMER agents, load balancer mirrors), snaplen changes will NOT help. Diagnose with tcpdump -s0; check if received packets smaller than expected. Solutions: For Kamailio siptrace, use TCP transport in duplicate_uri parameter; if connection refused, open TCP listener with socat; best solution: use HAProxy traffic 'tee' to bypass siptrace entirely and send original packets directly. (9) Diagnose probe timeout due to virtualization timing issues - check syslog for 10-second voipmonitor status intervals, RDTSC problems on hypervisor cause >30 second gaps triggering timeouts. (10) Server coredumps and SQL queue overload: Check syslog for growing `SQLq` counter indicating database bottleneck. Symptoms include regular coredumps during peak hours when processing high-volume OPTIONS/SUBSCRIBE/NOTIFY messages. Solutions: 1) Increase `mysqlstore_max_threads_sip_msg` in voipmonitor.conf from default 4 to 8 or higher, restart service. 2) Disable high-volume SIP message types if not needed: set `sip-options=no`, `sip-subscribe=no`, `sip-notify=no`. 3) Optimize MySQL performance with `innodb_buffer_pool_size=8G` (or 50-70% of RAM on dedicated DB, 30-50% on shared) and `innodb_flush_log_at_trx_commit=2`. Restart MySQL and VoIPmonitor after changes. Monitor SQLq metric to verify queue is stable. (11) DATABASE ERROR 1062 - LOOKUP TABLE LIMIT: If sniffer logs show `1062 - Duplicate entry '16777215' for key 'PRIMARY'` and CDRs stop being stored, this is caused by lookup tables (cdr_sip_response, cdr_reason) hitting MEDIUMINT auto-increment limit (16,777,215) due to too many unique SIP response strings. This is NOT a schema migration issue. SOLUTION: (Immediate) TRUNCATE the affected lookup table to reset auto-increment counter: `TRUNCATE TABLE cdr_reason;`. (Prevent Recurrence) Edit `/etc/voipmonitor.conf` on the machine that analyzes packets and set `cdr_reason_string_enable = no` to disable storing SIP response reason strings, or enable ALL THREE normalization options: `cdr_reason_normalisation=yes`, `cdr_sip_response_normalisation=yes`, `cdr_ua_normalisation=yes` to reduce unique entries. Restart sniffer after changes. (Clear Queued Errors) If error persists after config changes, stop service, remove qoq-* queue files from `/var/spool/voipmonitor/`, then restart - this deletes buffered CDRs. Do NOT confuse with unrelated cdr table INT overflow (4 billion rows) which requires schema migration via Upgrade_to_bigint guide. Includes tshark display filter syntax appendix.
 
'''Keywords:''' troubleshooting, no calls, not sniffing, no CDRs, tshark, missing package, missing library, rrdtool, rrdtools, dependencies, service failed, service crashed, ldd, libpcap, libssl, zlib, systemctl restart, journalctl, syslog, promiscuous mode, SPAN, RSPAN, ERSPAN, GRE, TZSP, VXLAN, voipmonitor.conf, interface, sipport, filter, capture rules, Skip, OOM, out of memory, snaplen, MTU, packet truncation, external source truncation, Kamailio siptrace, FreeSWITCH sip_trace, OpenSIPS, HEP, HOMER, HAProxy tee, traffic mirroring, load balancer, socat, TCP listener, WebRTC INVITE, truncated packets, corrupted packets, Authorization header, 4k packets, display filter, sip.Method, sip.Call-ID, probe timeout, virtualization, RDTSC, timing issues, status logs, 10 second interval, KVM, VMware, Hyper-V, Xen, non-call SIP traffic, OPTIONS, NOTIFY, SUBSCRIBE, MESSAGE, sip-options, sip-message, sip-subscribe, sip-notify, qualify pings, heartbeat, instant messaging, encapsulation, packet encapsulation, VLAN tags, 802.1Q, tcpdump analysis, tshark encapsulation filters, high traffic, specific IP, missing packets, specific IP addresses, call legs missing, INVITE missing, high-traffic periods, peak hours, bidirectional capture, inbound outbound, both directions, SPAN buffer saturation, port mirroring, SPAN buffer capacity, rx tx both, monitor session, SPAN source, SPAN destination, ringbuffer, t0CPU, max_sip_packets_in_call, max_invite_packets_in_call, RTP missing, RTP not displayed, RTP missing specific provider, audio quality graphs missing, SRTP, asymmetric routing, RTP test call, tcpdump RTP capture, RTP stream visualization, audio missing, audio missing on one leg, partial audio, silenced audio, one call leg, carrier, PBX, inside, outside, tcpdump tshark comparison, direct capture vs GUI capture, diagnose audio issues, RTP packets on the wire, NAT IP mismatch, natalias configuration, codec issue, transcoding, RTP port configuration, network issue, PBX issue, sniffer configuration, packet correlation, RTP source IP mismatch, SIP signaling IP, coredump, server crash, SQL queue, SQLq, mysqlstore_max_threads_sip_msg, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, database bottleneck, SQL queue overflow, performance tuning, post-reboot verification, after reboot, server reboot, planned reboot, firewall verification, iptables check, firewalld check, ufw status, firewall persistence, iptables-persistent, firewall persistent, time synchronization, NTP, chrony, ntpstat, chronyc tracking, timedatectl, time sync, time drift, NTP port 123, distributed architecture time sync, client_server_connect_maximum_time_diff_s, packetbuffer_sender time sync, 1062 duplicate entry, 16777215, lookup table, MEDIUMINT limit, cdr_sip_response, cdr_reason, cdr_reason_string_enable, auto-increment limit, SIP response strings, unique entries, normalization, cdr_reason_normalisation, cdr_sip_response_normalisation, cdr_ua_normalisation, TRUNCATE cdr_sip_response, TRUNCATE cdr_reason, database error, lookup table overflow, rtp_check_both_sides_by_sdp, one-way RTP, RTP capturing one stream, RTP strict verification, SDP RTP matching, qoq files, qoq queue files, SQL queue cleanup, buffered CDRs, spooldir cleanup, spool directory cleanup, SQL queue persistent error


'''Key Questions:'''
* What should I verify after a planned server reboot to ensure VoIPmonitor operates correctly? (Verify firewall rules and time synchronization)
* How do I check firewall rules after a server reboot? (Use iptables -L -n -v, firewall-cmd --list-all, or ufw status verbose)
* Which VoIPmonitor ports should be allowed through the firewall? (SIP: 5060/udp, RTP range, GUI: 80/tcp and 443/tcp, sensor management: 5029/tcp, Client-Server: 60024/tcp)
* How do I make firewall rules persistent across reboots? (For iptables: iptables-save > /etc/iptables/rules.v4 and install iptables-persistent; for firewalld: use --permanent flag)
* Why is time synchronization critical for packetbuffer_sender mode? (Host and server times must match for proper call correlation and packet processing; maximum allowed time difference: 2 seconds)
* How do I check NTP time synchronization after a reboot? (Use ntpstat or chronyc tracking; verify with ntpq -p or chronyc sources -v)
* How do I ensure all distributed sensors and central server have synchronized time? (Check timedatectl status on each system; ensure they use same NTP source and allow UDP 123)
* What is the correct tshark command to verify SIP/RTP traffic is reaching the VoIPmonitor sensor? (Use: tshark -i eth0 -Y "sip || rtp" -n)
* How do I diagnose why sniffer captures full audio on one call leg but no audio on the other leg?
* How do I use tcpdump to diagnose missing audio on one call leg?
* How do I compare tcpdump capture with the GUI's PCAP file?
* How do I determine if RTP packets are on the wire when one leg has no audio?
* What is the diagnostic workflow for audio missing on one call leg?
* How do I determine if audio issue is network/PBX problem vs VoIPmonitor configuration?
* How do I check if RTP packets for the silent leg are present on the wire?
* How do I verify if natalias is needed for NAT IP mismatch?
* What is the most common cause of one-way RTP capture when packets are present on the wire? (rtp_check_both_sides_by_sdp set to yes, strict, or very_strict)
* How does rtp_check_both_sides_by_sdp setting affect RTP capture? (Setting to yes requires both RTP sides to exactly match SDP; no allows matching based on single direction)
* What are the symptoms of restrictive rtp_check_both_sides_by_sdp settings? (Only one call leg in CDR, Received packets 0 on one leg, tcpdump shows both streams but GUI captures only one)
* How do I check and change rtp_check_both_sides_by_sdp configuration? (Use grep to check setting in voipmonitor.conf; change to no and restart service)
* Why does rtp_check_both_sides_by_sdp yes cause one-way RTP issues? (Requires both sides to match SIP signaling, too strict for many environments)
* How do I diagnose whether one-way audio is a codec issue or network issue?
* How do I use tcpdump vs GUI PCAP comparison for troubleshooting?
* What should I do first when one call leg has missing or partial audio?
* How do I interpret tcpdump vs GUI capture comparison results?
* How do I check for codec/transcoding issues causing one-way audio?
* How do I configure VoIPmonitor to process non-call SIP messages like OPTIONS/NOTIFY/SUBSCRIBE?
* How do I check for VLAN tags in a pcap file?
* How do I detect ERSPAN or GRE tunnels with tshark?
* How do I check for VXLAN encapsulation in my capture?
* How do I identify TZSP packets in a pcap?
* Why does my BPF filter drop VLAN-tagged packets?
* Do I need promiscuous mode for ERSPAN or GRE tunnels?
* Why is VoIPmonitor not recording any calls?
* How can I check if VoIP traffic is reaching my sensor server?
* How do I enable promiscuous mode on my network card?
* What are the most common reasons for VoIPmonitor not capturing data?
* How do I filter tshark output for SIP INVITE messages?
* What is the correct tshark filter syntax to find a specific phone number?
* Why is my VoIPmonitor probe stopping processing calls?
* What does the "Skip" option in capture rules do?
* How do I check for OOM killer events in Linux?
* Why are CDRs missing for calls with large SIP packets?
* What does the snaplen parameter do in voipmonitor.conf?
* Traffic capture stopped with missing package error, what should I do?
* Which package is commonly missing on newly installed sensors?
* How do I fix a missing library dependency for VoIPmonitor sensor?
* How do I diagnose MTU-related packet loss?
* Why are my large SIP packets truncated even after increasing snaplen?
* How do I tell if packets are truncated by VoIPmonitor or by an external source?
* How do I fix Kamailio siptrace truncating large packets?
* What is HAProxy traffic tee and how can it help with packet truncation?
* Why does Kamailio report "Connection refused" when sending siptrace via TCP?
* How do I open a TCP listener on VoIPmonitor for Kamailio siptrace?
* How do I use socat to open a TCP listening port?
* How do I troubleshoot missing packets for specific IP addresses?
* Why are packets missing only during high-traffic periods?
* How do I use tcpdump to verify if packets reach the VoIPmonitor sensor?
* What should I check if tcpdump shows no traffic but the PBX is sending packets?
* How do I verify SPAN configuration is capturing bidirectional traffic?
* What is SPAN buffer saturation and how does it affect packet capture?
* How do I configure Cisco switch SPAN for bidirectional mirroring?
* Why are packets missing for specific IP addresses during peak hours?
* What is the difference between rx, tx, and both in SPAN configuration?
* How do I know if my SPAN buffer is overloading during high traffic?
* Why do some calls work but others miss packet legs for specific IPs?
* How do I verify SPAN source and destination ports are correct?
* How do I check if SPAN is configured for trunk mode on VLAN traffic?
* Do I need SPAN to capture both ingress and egress traffic?
* When should I check SPAN buffer capacity vs sensor t0CPU for packet drops?
* What should I do if FreeSWITCH sip_trace is truncating packets?
* Why are my probes disconnecting from the server with timeout errors?
* How do I diagnose probe timeout issues on high-performance networks?
* What causes intermittent probe timeout errors in client-server mode?
* How do I check for virtualization timing issues on VoIPmonitor probes?
* Why are there no CDRs even though tshark shows SIP OPTIONS/NOTIFY traffic?
* How do I enable sip-options, sip-message, sip-subscribe, sip-notify in voipmonitor.conf?
* What SIP methods are processed to generate CDRs vs non-call records?
* Why are RTP streams not displayed in the GUI for a specific provider?
* How do I use tcpdump to capture RTP packets during a test call?
* How do I diagnose missing RTP audio quality graphs for one provider?
* If SIP signaling works but RTP is missing for a specific provider, what should I check?
* Why is my VoIPmonitor server experiencing regular coredumps?
* How do I check for SQL queue overload causing server crashes?
* What does the SQLq metric in syslog indicate?
* How do I fix server coredumps caused by high-volume OPTIONS/SUBSCRIBE/NOTIFY processing?
* What is the mysqlstore_max_threads_sip_msg parameter and how do I tune it?
* How much should I set mysqlstore_max_threads_sip_msg to for high traffic?
* How do I disable SIP message types that are causing SQL queue overload?
* How do I optimize MySQL performance to prevent SQL queue-related coredumps?
* What is the recommended innodb_buffer_pool_size for VoIPmonitor servers?
* How do I set innodb_flush_log_at_trx_commit for better database write performance?
* What are the trade-offs when setting innodb_flush_log_at_trx_commit to 2?
* What causes error 1062 - Duplicate entry '16777215' for key 'PRIMARY' in VoIPmonitor?
 
=== KEY QUESTIONS (Lookup Table Error) ===
* What causes error 1062 - Duplicate entry '16777215' for key 'PRIMARY' in VoIPmonitor? (Lookup table hitting MEDIUMINT limit 16,777,215 due to too many unique SIP response strings)
* How do I fix error 1062 in cdr_sip_response or cdr_reason tables? (Set cdr_reason_string_enable=no in voipmonitor.conf to disable storing SIP response reason strings, restart sniffer)
* IMMEDIATE FIX: How do I stop 1062 errors immediately? (TRUNCATE the affected lookup table: TRUNCATE TABLE cdr_reason; - this resets auto-increment counter)
* How do I prevent error 1062 from recurring? (Set cdr_reason_string_enable=no, or enable ALL THREE normalization options: cdr_reason_normalisation=yes, cdr_sip_response_normalisation=yes, cdr_ua_normalisation=yes)
* What does error 16777215 mean in VoIPmonitor database? (Lookup table using MEDIUMINT UNSIGNED has reached its auto-increment limit)
* How do I check which lookup table has hit the auto-increment limit? (Query INFORMATION_SCHEMA for tables with AUTO_INCREMENT approaching 16,777,215)
* Why are CDRs not being stored with error 1062? (Lookup table cdr_sip_response or cdr_reason cannot insert new unique entries)
* Should I migrate cdr_sip_response table to BIGINT to fix error 1062? (No, the root cause is storing too many unique strings; configure cdr_reason_string_enable=no instead)
* What is cdr_reason_string_enable in voipmonitor.conf? (Controls whether SIP response reason strings are stored in lookup tables - set to no to prevent 1062 errors)
* What cdr_ua_normalisation option should I enable? (Include cdr_ua_normalisation=yes along with cdr_reason_normalisation=yes and cdr_sip_response_normalisation=yes for maximum effectiveness)
* How do I prevent cdr_sip_response table overflow? (Disable SIP response text storage with cdr_reason_string_enable=no, or enable ALL THREE normalization options including cdr_ua_normalisation)
* What is the difference between cdr table INT overflow and lookup table MEDIUMINT overflow? (cdr table at 4 billion rows vs lookup tables at 16.7 million - solved differently: cdr needs schema migration, lookup tables need configuration)
* Do I need to ALTER TABLE to fix duplicate entry 16777215 error? (No, configure cdr_reason_string_enable=no to stop creating new unique entries)
* When should I use TRUNCATE on cdr_sip_response table? (IMMEDIATE - TRUNCATE resets auto-increment counter allowing CDRs to be stored again, use before/after configuration changes)
* Does error 1062 happen in main cdr table or lookup tables? (Affects lookup tables like cdr_sip_response using MEDIUMINT, not main cdr table using INT/BIGINT)
* How do I enable cdr_reason_normalisation to reduce unique entries? (Set cdr_reason_normalisation=yes, cdr_sip_response_normalisation=yes, and cdr_ua_normalisation=yes in voipmonitor.conf)
* Why does error 1062 persist after configuration change? (Failed queries remain queued in qoq-* files - stop service, remove /var/spool/voipmonitor/qoq-* files, restart service)
* How do I clear queued SQL queries for error 1062? (Stop VoIPmonitor service, rm -f /var/spool/voipmonitor/qoq-*, then restart service - this deletes buffered CDRs)
* What are qoq-* files in VoIPmonitor? (SQL queue files in spool directory storing buffered queries during database issues - must be cleared to prevent persistent 1062 errors)

Latest revision as of 20:58, 20 January 2026


SQL Queue Growing During Peak Time

When the SQL queue (SQLq/SQLf) grows during peak traffic, the database cannot keep up with the CDR insertion rate. This causes delayed CDR visibility in the GUI and can lead to memory exhaustion.

== Quick Diagnosis ==

Monitor the queue:

<syntaxhighlight lang="bash">
journalctl -u voipmonitor -f | grep SQLf
</syntaxhighlight>

Check RRD graphs: '''Settings > Sensors''' > click graph icon > select peak time period. Look for growing "SQL Cache" or "SQL Cache Files".

{{Note|The SQLf value should normally stay below 10. If it consistently exceeds 10 or grows during peak time, apply mitigations.}}
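
To confirm which resource is the bottleneck from the command line (complementing the RRD graphs), watch the database process and system iowait during peak time. This is a sketch using the sysstat tools; the process may be named mysqld or mariadbd:

<syntaxhighlight lang="bash">
# CPU usage of the MySQL process over 5-second samples (use mariadbd on MariaDB)
pidstat -p $(pidof mysqld) 5 3

# System-wide CPU breakdown including %iowait
mpstat 5 3
</syntaxhighlight>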

== Storage Speed is Critical ==

The most common cause of SQL queue growth is slow storage. MySQL write performance depends heavily on disk I/O speed.

{| class="wikitable"
! Storage Type !! Typical IOPS !! CDR Write Capacity !! Recommendation
|-
| HDD (7200 RPM) || ~150 || 10-50 CDR/s || Not suitable for production
|-
| SATA SSD || ~50,000 || 200-500 CDR/s || Suitable for small deployments
|-
| NVMe SSD || ~500,000+ || 1000+ CDR/s || Recommended for production
|}

Check your current I/O performance:

<syntaxhighlight lang="bash">
# Check disk utilization during peak time
iostat -x 1 5 | grep -E "Device|sda|nvme"

# If %util stays above 80%, storage is the bottleneck
</syntaxhighlight>
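
If iostat points at storage, a crude sequential write test can confirm whether the disk backing the MySQL data directory is the limit. This is only a rough sketch (it assumes the default datadir /var/lib/mysql with free space); a tool such as fio gives far more representative random-I/O numbers:

<syntaxhighlight lang="bash">
# Write 1 GiB directly to disk, bypassing the page cache, then clean up
dd if=/dev/zero of=/var/lib/mysql/ddtest.bin bs=1M count=1024 oflag=direct conv=fsync
rm -f /var/lib/mysql/ddtest.bin
</syntaxhighlight>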

== VoIPmonitor Tuning ==

These parameters in <code>/etc/voipmonitor.conf</code> control how the sniffer writes to the database:

{| class="wikitable"
! Parameter !! Default !! Recommended !! Description
|-
| <code>mysqlstore_max_threads_cdr</code> || 1 || 4-8 || Number of parallel threads writing CDRs to database
|-
| <code>mysqlstore_concat_limit</code> || 400 || 1000-2000 || CDRs batched into single INSERT statement
|-
| <code>query_cache</code> || no || yes || Write queries to disk when queue grows (prevents OOM)
|}

'''Example configuration:'''

<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
mysqlstore_max_threads_cdr = 8
mysqlstore_concat_limit = 2000
query_cache = yes
</syntaxhighlight>

Restart after changes: <code>systemctl restart voipmonitor</code>

== MySQL Tuning ==

=== Essential Settings ===

These settings are safe for most deployments and significantly improve write performance:

<syntaxhighlight lang="ini">
# /etc/mysql/mysql.conf.d/mysqld.cnf

# Buffer pool - use 50-70% of available RAM
innodb_buffer_pool_size = 8G

# Faster writes - flush log once per second instead of every transaction
# Small risk: up to 1 second of data loss on crash
innodb_flush_log_at_trx_commit = 2

# I/O threads - adjust to your CPU core count
innodb_read_io_threads = 4
innodb_write_io_threads = 4
innodb_purge_threads = 2

# Skip DNS lookups for faster connections
skip-name-resolve

# Larger log files = fewer checkpoints = better performance
innodb_log_file_size = 1G
innodb_log_buffer_size = 256M
</syntaxhighlight>

=== SSD/NVMe Specific Settings ===

If using SSD or NVMe storage, add these optimizations:

<syntaxhighlight lang="ini">
# Disable HDD optimization (not needed for SSD)
innodb_flush_neighbors = 0

# Tell InnoDB about fast storage (adjust to your disk specs)
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000

# Bypass OS cache - better for dedicated DB servers
innodb_flush_method = O_DIRECT
</syntaxhighlight>

Restart MySQL after changes: <code>systemctl restart mysql</code>

=== Verify Settings ===

<syntaxhighlight lang="bash">
# Check current buffer pool size
mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"

# Check if buffer pool is fully utilized
mysql -e "SHOW STATUS LIKE 'Innodb_buffer_pool_pages%';"
</syntaxhighlight>

== Prevention ==

* '''Schedule heavy operations off-peak:''' Configure alerts and reports to run at night
* '''Separate components:''' Use a dedicated database server for high-traffic deployments
* '''Monitor trends:''' Set up alerts for SQLq growth before it becomes critical

== Extreme High-Volume Deployments ==

For deployments processing '''hundreds or thousands of CDRs per second''' (10,000+ concurrent calls), standard tuning is not sufficient. These scenarios require:

* Disabling AUTO_INCREMENT and letting the sniffer generate IDs
* Hourly table partitioning instead of daily
* Batch sizes of 10,000+ CDRs
* Aggressive MySQL settings that trade durability for performance

See [[High-Performance_VoIPmonitor_and_MySQL_Setup_Manual]] for detailed configuration.

== See Also ==

* [[Database_troubleshooting]] - Comprehensive database troubleshooting guide
* [[Emergency_procedures]] - RRD graph diagnostics for bottleneck identification
* [[Scaling]] - Performance tuning and scaling guidelines
* [[High-Performance_VoIPmonitor_and_MySQL_Setup_Manual]] - Configuration for 1000+ CDR/s deployments
== AI Summary for RAG ==

'''Summary:''' Troubleshooting guide for SQL queue (SQLq/SQLf) growth during peak traffic periods. The queue grows when the database cannot keep up with CDR insertion rate. Diagnosis involves checking mysqld CPU usage, iowait, and RRD graphs in GUI (Settings > Sensors > graph icon). Immediate mitigations: enable <code>query_cache=yes</code> to prevent OOM, <code>quick_save_cdr=yes</code> to reduce GUI delay, <code>mysqlstore_max_threads_cdr=8</code> for parallel writes. Root causes are typically I/O bottleneck (upgrade HDD to SSD/NVMe), RAM bottleneck (increase <code>innodb_buffer_pool_size</code> to 50-70% of RAM), or CPU bottleneck (hardware upgrade needed). Prevention includes scheduling heavy operations off-peak and using dedicated database servers.

'''Keywords:''' SQL queue, SQLq, SQLf, peak time, database bottleneck, query_cache, quick_save_cdr, mysqlstore_max_threads_cdr, innodb_buffer_pool_size, I/O bottleneck, SSD upgrade, OOM, CDR delay, database performance, peak traffic

'''Key Questions:'''
* Why is the SQL queue growing during peak hours?
* How do I monitor SQL queue size in VoIPmonitor?
* What causes CDR delays in the GUI?
* How do I prevent out-of-memory errors when SQL queue grows?
* Should I upgrade to SSD for VoIPmonitor database?
* What is the recommended innodb_buffer_pool_size setting?
* How do I speed up CDR insertion during high traffic?