{{DISPLAYTITLE:Sniffer Troubleshooting}}
= Sniffer Troubleshooting =


'''This guide provides a systematic, step-by-step process to diagnose why the VoIPmonitor sensor might not be capturing calls, followed by common sniffer/sensor problems organized by symptom.''' For configuration reference, see [[Sniffer_configuration]]. For performance tuning, see [[Scaling]].


== Critical First Step: Is Traffic Reaching the Interface? ==

{{Warning|Before any sensor tuning, verify packets are reaching the network interface. If packets aren't there, no amount of sensor configuration will help.}}

<syntaxhighlight lang="bash">
# Check for SIP traffic on the capture interface
tcpdump -i eth0 -nn "host <PROBLEMATIC_IP> and port 5060" -c 10

# If no packets: Network/SPAN issue - contact network admin
# If packets visible: Proceed with sensor troubleshooting below
</syntaxhighlight>

<kroki lang="mermaid">
graph TD
    A[No Calls Recorded] --> B{Packets on interface?<br/>tcpdump -i eth0 port 5060}
    B -->|No packets| C[Network Issue]
    C --> C1[Check SPAN/mirror config]
    C --> C2[Verify VLAN tagging]
    C --> C3[Check cable/port]
    B -->|Packets visible| D[Sensor Issue]
    D --> D1[Check voipmonitor.conf]
    D --> D2[Check GUI Capture Rules]
    D --> D3[Check logs for errors]
</kroki>

== Quick Diagnostic Checklist ==

{| class="wikitable"
|-
! Check !! Command !! Expected Result
|-
| Service running || <code>systemctl status voipmonitor</code> || Active (running)
|-
| Traffic on interface || <code>tshark -i eth0 -c 5 -Y "sip"</code> || SIP packets displayed
|-
| Interface errors || <code>ip -s link show eth0</code> || No RX errors/drops
|-
| Promiscuous mode || <code>ip link show eth0</code> || PROMISC flag present
|-
| Logs || <code>tail -100 /var/log/syslog \| grep voip</code> || No critical errors
|-
| GUI rules || Settings → Capture Rules || No unexpected "Skip" rules
|}

== Step 1: Is the VoIPmonitor Service Running Correctly? ==
First, confirm the sensor process is active and loaded the correct configuration file.

;1. Check the service status (for modern systemd systems):
<pre>systemctl status voipmonitor</pre>
Look for a line that says <code>Active: active (running)</code>. If it is inactive or failed, try restarting it with `systemctl restart voipmonitor` and check the status again.

;2. Service Fails to Start with "Binary Not Found" After Crash:
If the VoIPmonitor service fails to start after a crash or watchdog restart with an error message indicating the binary cannot be found (e.g., "No such file or directory" for `/usr/local/sbin/voipmonitor`), the binary may have been renamed with an underscore suffix during the crash recovery process.

Check for a renamed binary:
<pre>
# Check if the standard binary path exists
ls -l /usr/local/sbin/voipmonitor

# If not found, look for a renamed version with an underscore suffix
ls -l /usr/local/sbin/voipmonitor_*
</pre>

If you find a renamed binary (e.g., `voipmonitor_`, `voipmonitor_20250104`, etc.), rename it back to the standard name:
<pre>
mv /usr/local/sbin/voipmonitor_ /usr/local/sbin/voipmonitor
</pre>

Then restart the service:
<pre>
systemctl start voipmonitor
</pre>

Verify the service starts correctly:
<pre>
systemctl status voipmonitor
</pre>

;3. Sensor Becomes Unresponsive After GUI Update:
If the sensor service fails to start or becomes unresponsive after updating a sensor through the Web GUI, the update process may have left the service in a stuck state. The solution is to forcefully stop the service and restart it using these commands:
<pre>
# SSH into the sensor host and execute:
killall voipmonitor
systemctl stop voipmonitor
systemctl start voipmonitor
</pre>
After running these commands, verify the sensor status in the GUI to confirm it is responding correctly. This sequence ensures: (1) any zombie or hung processes are terminated with `killall`, (2) systemd is fully stopped, and (3) the service gets a clean start.

;4. Verify the running process:
<pre>ps aux | grep voipmonitor</pre>
This command will show the running process and the exact command line arguments it was started with. Critically, ensure it is using the correct configuration file, for example: <code>--config-file /etc/voipmonitor.conf</code>. If it is not, there may be an issue with your startup script.

== Step 2: Is Network Traffic Reaching the Server? ==
If the service is running, the next step is to verify if the VoIP packets (SIP/RTP) are actually arriving at the server's network interface. The best tool for this is `tshark` (the command-line version of Wireshark).


;1. Install tshark:
<pre>
# For Debian/Ubuntu
apt-get update && apt-get install tshark

# For CentOS/RHEL/AlmaLinux
yum install wireshark
</pre>

;2. Listen for SIP traffic on the correct interface:
Replace `eth0` with the interface name you have configured in `voipmonitor.conf`.
<pre>
tshark -i eth0 -Y "sip || rtp" -n
</pre>
*'''If you see a continuous stream of SIP and RTP packets''', traffic is reaching the server, and the problem is likely in VoIPmonitor's configuration (see Step 4).
*'''If you see NO packets''', the problem lies with your network configuration. Proceed to Step 3.

;3. Advanced: Capture to PCAP File for Definitive Testing
Live monitoring with tshark is useful for observation, but capturing traffic to a .pcap file during a test call provides definitive evidence for troubleshooting intermittent issues or specific call legs.

'''Method 1: Using tcpdump (Recommended)'''
<pre>
# Start capture on the correct interface (replace eth0)
tcpdump -i eth0 -s 0 -w /tmp/test_capture.pcap port 5060

# Or capture both SIP and RTP traffic:
tcpdump -i eth0 -s 0 -w /tmp/test_capture.pcap "(port 5060 or udp)"

# Let it run while you make a test call with the missing call leg
# Press Ctrl+C to stop the capture

# Analyze the capture file:
tshark -r /tmp/test_capture.pcap -Y "sip"
</pre>

'''Method 2: Using tshark to capture to file'''
<pre>
# Start capture:
tshark -i eth0 -w /tmp/test_capture.pcap -f "tcp port 5060 or udp"

# Make your test call, then press Ctrl+C to stop

# Analyze the capture:
tshark -r /tmp/test_capture.pcap -Y "sip" -V
</pre>

'''Decision Tree for PCAP Analysis:'''
After capturing a test call known to have a missing leg:

* '''If SIP packets are missing from the .pcap file:'''
** The problem is with your network mirroring configuration (SPAN/TAP port, AWS Traffic Mirroring, etc.)
** The packets never reached the VoIPmonitor sensor's network interface
** Fix the switch mirroring setup or infrastructure configuration first

* '''If SIP packets ARE present in the .pcap file but missing in the VoIPmonitor GUI:'''
** The problem is with VoIPmonitor's configuration or processing
** Packets reached the NIC but were not processed correctly
** Review Step 4 (VoIPmonitor Configuration) and Step 5 (Capture Rules)

'''Example Test Call Workflow:'''
<pre>
# 1. Start capture (BPF has no "sip" keyword, so filter on port and host)
tcpdump -i eth0 -s 0 -w /tmp/test.pcap "port 5060 and host 10.0.1.100"

# 2. Make a test call from phone at 10.0.1.100 to 10.0.2.200
#    (a call that you know should have recordings but is missing)

# 3. Stop capture (Ctrl+C)

# 4. Check for the specific call's Call-ID
tshark -r /tmp/test.pcap -Y "sip" -T fields -e sip.Call-ID

# 5. Verify if packets for both A-leg and B-leg exist
tshark -r /tmp/test.pcap -Y "sip && ip.addr == 10.0.1.100"

# 6. Compare results with VoIPmonitor GUI
#    - If packets found in .pcap: VoIPmonitor software issue
#    - If packets missing from .pcap: Network mirroring issue
</pre>

== Step 3: Troubleshoot Network and Interface Configuration ==
If `tshark` shows no traffic, it means the packets are not being delivered to the operating system correctly.

;1. Check if the interface is UP:
Ensure the network interface is active.
<pre>ip link show eth0</pre>
The output should contain the word `UP`. If it doesn't, bring it up with:
<pre>ip link set dev eth0 up</pre>

;2. Check for Promiscuous Mode (for SPAN/RSPAN Mirrored Traffic):
'''Important:''' Promiscuous mode requirements depend on your traffic mirroring method:

* '''SPAN/RSPAN (Layer 2 mirroring):''' The network interface '''must''' be in promiscuous mode. Mirrored packets retain their original MAC addresses, so the interface would normally ignore them. Promiscuous mode forces the interface to accept all packets regardless of destination MAC.

* '''ERSPAN/GRE/TZSP/VXLAN (Layer 3 tunnels):''' Promiscuous mode is '''NOT required'''. These tunneling protocols encapsulate the mirrored traffic inside IP packets that are addressed directly to the sensor's IP address. The operating system receives these packets normally, and VoIPmonitor automatically decapsulates them to extract the inner SIP/RTP traffic.

For SPAN/RSPAN deployments, check the current promiscuous mode status:
<pre>ip link show eth0</pre>
Look for the `PROMISC` flag.

Enable promiscuous mode manually if needed:
<pre>ip link set eth0 promisc on</pre>
If this solves the problem, you should make the change permanent. The `install-script.sh` for the sensor usually attempts to do this, but it can fail.

;3. Verify Your SPAN/Mirror/TAP Configuration:
This is the most common cause of no traffic. Double-check your network switch or hardware tap configuration to ensure:
* The correct source ports (where your PBX/SBC is connected) are being monitored.
* The correct destination port (where your VoIPmonitor sensor is connected) is configured.
* If you are monitoring traffic across different VLANs, ensure your mirror port is configured to carry all necessary VLAN tags (often called "trunk" mode). A minimal switch-side sketch is shown below.
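
For orientation only, the following sketch shows what a port-mirroring session can look like on a Cisco IOS-style switch. The session number and interface names are hypothetical examples, not values taken from this guide; consult your switch vendor's documentation for the exact syntax.

<syntaxhighlight lang="text">
! Hypothetical example: mirror both directions of the PBX uplink (Gi1/0/10)
! to the port where the VoIPmonitor sensor is connected (Gi1/0/24)
monitor session 1 source interface GigabitEthernet1/0/10 both
monitor session 1 destination interface GigabitEthernet1/0/24 encapsulation replicate
</syntaxhighlight>

On many Cisco models, <code>encapsulation replicate</code> on the destination line preserves VLAN tags on the mirrored copy, which matters when the sensor must see tagged traffic.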


== Step 4: Check the VoIPmonitor Configuration ==
If `tshark` sees traffic but VoIPmonitor does not, the problem is almost certainly in `voipmonitor.conf`.

;1. Check the `interface` directive:
:Make sure the `interface` parameter in `/etc/voipmonitor.conf` exactly matches the interface where you see traffic with `tshark`. For example: `interface = eth0`.

;2. Check the `sipport` directive:
:By default, VoIPmonitor only listens on port 5060. If your PBX uses a different port for SIP, you must add it. For example:
:<code>sipport = 5060,5080</code>

;3. Check for a restrictive `filter`:
:If you have a BPF `filter` configured, ensure it is not accidentally excluding the traffic you want to see. For debugging, try commenting out the `filter` line entirely and restarting the sensor.

== Step 5: Check GUI Capture Rules (Causing Call Stops) ==
If `tshark` sees SIP traffic and the sniffer configuration appears correct, but the probe stops processing calls or shows traffic only on the network interface, GUI capture rules may be the culprit.

Capture rules configured in the GUI can instruct the sniffer to ignore ("skip") all processing for matched calls. This includes calls matching specific IP addresses or telephone number prefixes.

;1. Review existing capture rules:
:Navigate to '''GUI → Capture rules''' and examine all rules for any that might be blocking your traffic.
:Look specifically for rules with the '''Skip''' option set to '''ON''' (displayed as "Skip: ON"). The Skip option instructs the sniffer to completely ignore matching calls (no files, RTP analysis, or CDR creation).

;2. Test by temporarily removing all capture rules:
:To isolate the issue, first create a backup of your GUI configuration:
:* Navigate to '''Tools → Backup & Restore → Backup GUI → Configuration tables'''
:* This saves your current settings including capture rules
:* Delete all capture rules from the GUI
:* Click the '''Apply''' button to save changes
:* Reload the sniffer by clicking the green '''"reload sniffer"''' button in the control panel
:* Test if calls are now being processed correctly
:* If resolved, restore the configuration from the backup and systematically investigate the rules to identify the problematic one

;3. Identify the problematic rule:
:* After restoring your configuration, remove rules one at a time and reload the sniffer after each removal
:* When calls start being processed again, you have identified the problematic rule
:* Review the rule's match criteria (IP addresses, prefixes, direction) against your actual traffic pattern
:* Adjust the rule's conditions or Skip setting as needed

;4. Verify rules are reloaded:
:After making changes to capture rules, remember that changes are '''not automatically applied''' to the running sniffer. You must click the '''"reload sniffer"''' button in the control panel, or the rules will continue using the previous configuration.

For more information on capture rules, see [[Capture_rules]].

== No Calls Being Recorded ==

=== Service Not Running ===

<syntaxhighlight lang="bash">
# Check status
systemctl status voipmonitor

# View recent logs
journalctl -u voipmonitor --since "10 minutes ago"

# Start/restart
systemctl restart voipmonitor
</syntaxhighlight>

Common startup failures:
* '''Interface not found''': Check <code>interface</code> in voipmonitor.conf matches <code>ip a</code> output
* '''Port already in use''': Another process using the management port
* '''License issue''': Check [[License]] for activation problems

=== Wrong Interface or Port Configuration ===

<syntaxhighlight lang="bash">
# Check current config
grep -E "^interface|^sipport" /etc/voipmonitor.conf

# Example correct config:
# interface = eth0
# sipport = 5060
</syntaxhighlight>

{{Tip|For multiple SIP ports: <code>sipport = 5060,5061,5080</code>}}

=== GUI Capture Rules Blocking ===

Navigate to '''Settings → Capture Rules''' and check for rules with action "Skip" that may be blocking calls. Rules are processed in order - a Skip rule early in the list will block matching calls.

See [[Capture_rules]] for detailed configuration.

=== SPAN/Mirror Not Configured ===

If <code>tcpdump</code> shows no traffic:
# Verify switch SPAN/mirror port configuration
# Check that both directions (ingress + egress) are mirrored
# Confirm VLAN tagging is preserved if needed
# Test physical connectivity (cable, port status)

See [[Sniffing_modes]] for SPAN, RSPAN, and ERSPAN configuration.

=== Filter Parameter Too Restrictive ===

If <code>filter</code> is set in voipmonitor.conf, it may exclude traffic:

<syntaxhighlight lang="bash">
# Check filter
grep "^filter" /etc/voipmonitor.conf

# Temporarily disable to test
# Comment out the filter line and restart
</syntaxhighlight>

==== Missing id_sensor Parameter ====

'''Symptom''': SIP packets visible in Capture/PCAP section but missing from CDR, SIP messages, and Call flow.

'''Cause''': The <code>id_sensor</code> parameter is not configured or is missing. This parameter is required to associate captured packets with the CDR database.

'''Solution''':
<syntaxhighlight lang="bash">
# Check if id_sensor is set
grep "^id_sensor" /etc/voipmonitor.conf

# Add or correct the parameter
echo "id_sensor = 1" >> /etc/voipmonitor.conf

# Restart the service
systemctl restart voipmonitor
</syntaxhighlight>

{{Tip|Use a unique numeric identifier (1-65535) for each sensor. Essential for multi-sensor deployments. See [[Sniffer_configuration#id_sensor|id_sensor documentation]].}}

== Missing Audio / RTP Issues ==

=== One-Way Audio (Asymmetric Mirroring) ===

'''Symptom''': SIP recorded but only one RTP direction captured.

'''Cause''': SPAN port configured for only one direction.

'''Diagnosis''':
<syntaxhighlight lang="bash">
# Count RTP packets per direction
tshark -i eth0 -Y "rtp" -T fields -e ip.src -e ip.dst | sort | uniq -c
</syntaxhighlight>

If one direction shows 0 or very few packets, configure the switch to mirror both ingress and egress traffic.

=== RTP Not Associated with Call ===

'''Symptom''': Audio plays in sniffer but not in GUI, or RTP listed under wrong call.

'''Possible causes''':

'''1. SIP and RTP on different interfaces/VLANs''':
<syntaxhighlight lang="ini">
# voipmonitor.conf - enable automatic RTP association
auto_enable_use_blocks = yes
</syntaxhighlight>

'''2. NAT not configured''':
<syntaxhighlight lang="ini">
# voipmonitor.conf - for NAT scenarios
natalias = <public_ip> <private_ip>

# If not working, try reversed order:
natalias = <private_ip> <public_ip>
</syntaxhighlight>

'''3. External device modifying media ports''':

If SDP advertises one port but RTP arrives on a different port (SBC/media server issue):
<syntaxhighlight lang="bash">
# Compare SDP ports vs actual RTP
tshark -r call.pcap -Y "sip.Method == INVITE" -V | grep "m=audio"
tshark -r call.pcap -Y "rtp" -T fields -e udp.dstport | sort -u
</syntaxhighlight>

If ports don't match, the external device must be configured to preserve SDP ports - VoIPmonitor cannot compensate.

=== RTP Incorrectly Associated with Wrong Call (PBX Port Reuse) ===

'''Symptom''': RTP streams from one call appear associated with a different CDR when your PBX aggressively reuses the same IP:port across multiple calls.

'''Cause''': When PBX reuses media ports, VoIPmonitor may incorrectly correlate RTP packets to the wrong call based on weaker correlation methods.

'''Solution''': Enable <code>rtp_check_both_sides_by_sdp</code> to require verification of both source and destination IP:port against SDP:
<syntaxhighlight lang="ini">
# voipmonitor.conf - require both source and destination to match SDP
rtp_check_both_sides_by_sdp = yes

# Alternative (strict) mode - allows initial unverified packets
rtp_check_both_sides_by_sdp = strict
</syntaxhighlight>

{{Warning|Enabling this may prevent RTP association for calls using NAT, as the source IP:port will not match the SDP. Use <code>natalias</code> mappings or the <code>strict</code> setting to mitigate this.}}

=== Snaplen Truncation ===

'''Symptom''': Large SIP messages truncated, incomplete headers.

'''Solution''':
<syntaxhighlight lang="ini">
# voipmonitor.conf - increase packet capture size
snaplen = 8192
</syntaxhighlight>

For Kamailio siptrace, also check <code>trace_msg_fragment_size</code> in Kamailio config. See [[Sniffer_configuration#snaplen|snaplen documentation]].
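
To confirm truncation in an exported capture before changing <code>snaplen</code>, you can compare each frame's captured length against its on-the-wire length. This is a generic tshark sketch (<code>frame.len</code> and <code>frame.cap_len</code> are standard Wireshark fields; <code>call.pcap</code> is a placeholder file name), not a command taken from this guide:

<syntaxhighlight lang="bash">
# List frames whose captured length is smaller than the original length
# (non-empty output suggests the capture was truncated by a small snaplen)
tshark -r call.pcap -T fields -e frame.number -e frame.len -e frame.cap_len | awk '$3 < $2'
</syntaxhighlight>

If many frames show a captured length capped at the old <code>snaplen</code> value, raising <code>snaplen</code> as shown above should resolve the truncation.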


== Step 6: Troubleshoot MySQL/MariaDB Database Connection Errors ==
If you see "Connection refused (111)" errors or the sensor cannot connect to your database server, the issue is with the MySQL/MariaDB database connection configuration in `/etc/voipmonitor.conf`.

Error 111 (Connection refused) indicates that the database server is reachable on the network, but no MySQL/MariaDB service is listening on the specified port, or the connection is being blocked by a firewall. This commonly happens after migrations when the database server IP address has changed.

;1. Check for database connection errors in sensor logs:
Verify the specific error from the sensor process:
<pre>
# For Debian/Ubuntu (systemd journal)
journalctl -u voipmonitor --since "1 hour ago" | grep -iE "mysql|database|connection|can.t connect"

# For systems using traditional syslog
tail -f /var/log/syslog | grep voipmonitor | grep -iE "mysql|database|connection"

# For CentOS/RHEL/AlmaLinux
tail -f /var/log/messages | grep voipmonitor | grep -iE "mysql|database|connection"
</pre>

Look for errors like:
* <code>Can't connect to MySQL server on '192.168.1.10' (111)</code> - Connection refused (wrong host/port)
* <code>Access denied for user 'root'@'localhost'</code> - Wrong username/password
* <code>Unknown database 'voipmonitor'</code> - Wrong database name

;2. Verify database connection parameters in `voipmonitor.conf`:
Open `/etc/voipmonitor.conf` and check the MySQL connection settings:
<pre>
# Database Connection Parameters
mysqlhost = 192.168.1.10       # IP address or hostname of MySQL/MariaDB server
mysqlport = 3306               # TCP port of the database server (default: 3306)
mysqlusername = root           # Database username
mysqlpassword = your_password  # Database password
mysqldatabase = voipmonitor    # Database name
</pre>

Key points:
* <code>mysqlhost</code>: Should be the IP address or hostname of the database server. After migration, this may have changed.
* <code>mysqlport</code>: Port 3306 is the default, but your database might use a different port.
* <code>mysqlusername</code>: Database user must have proper privileges.
* <code>mysqlpassword</code>: Ensure there are no typos or special character issues (surround with single quotes if needed).
* <code>mysqldatabase</code>: Database must exist on the server.

;3. Test MySQL connectivity from the sensor host:
Test whether the database host and port are reachable from the sensor:
<pre>
# Test basic TCP connectivity (replace IP and port as needed)
nc -zv 192.168.1.10 3306

# Or using telnet
telnet 192.168.1.10 3306
</pre>

If you see "Connection refused", the database service is not running or not listening on that port.

;4. Test MySQL authentication using credentials from `voipmonitor.conf`:
Use the same credentials configured in <code>voipmonitor.conf</code> to verify they work:
<pre>
mysql -h 192.168.1.10 -P 3306 -u root -p'your_password' voipmonitor
</pre>

Commands to run inside the mysql client to verify:
<pre>
# Check if connected correctly
SELECT USER(), CURRENT_USER();

# Check database exists
SHOW DATABASES LIKE 'voipmonitor';

# Check that the tables are accessible
USE voipmonitor;
SHOW TABLES;
EXIT;
</pre>

;5. Compare with a working sensor's configuration:
If you have other sensors that successfully connect to the database, compare their configuration files:
<pre>
# Compare database settings between working and failing sensors
diff <(grep -E "^mysql" /etc/voipmonitor.conf) <(grep -E "^mysql" /path/to/working/sensor/voipmonitor.conf)
</pre>

Common discrepancies after migration:
* Wrong database server IP address (<code>mysqlhost</code>)
* Wrong database port (<code>mysqlport</code>)
* Different password due to migration to new database server
* Using <code>localhost</code> vs actual IP address

;6. Check firewall and network connectivity:
Ensure the sensor can reach the database server and the required port is open:
<pre>
# Test network reachability
ping -c 4 192.168.1.10

# Check if MySQL port is reachable
nc -zv 192.168.1.10 3306

# Check firewall rules (if using firewalld)
firewall-cmd --list-ports

# Check firewall rules (if using iptables)
iptables -L -n | grep 3306
</pre>

If the port is blocked, you may need to:
* Open port 3306 in the firewall on the database server
* Configure network ACLs or security groups (for cloud deployments)
* Check VPN/SSH tunnel configurations

;7. Verify MySQL/MariaDB service is running:
On the database server, check if the service is active:
<pre>
# Check MySQL/MariaDB service status
systemctl status mariadb    # or systemctl status mysql

# Restart service if needed
systemctl restart mariadb

# Check which port MySQL is listening on
ss -tulpn | grep mysql
# or
netstat -tulpn | grep mysql
</pre>

MySQL should be listening on the interface and port specified in your <code>voipmonitor.conf</code> <code>mysqlhost</code> and <code>mysqlport</code> settings.

;8. Apply configuration changes and restart the sensor:
After correcting the database connection settings in <code>/etc/voipmonitor.conf</code>:
<pre>
# Restart the VoIPmonitor service to apply changes
systemctl restart voipmonitor

# Alternatively, reload without full restart (if supported in your version)
echo 'reload' | nc 127.0.0.1 5029

# Verify the service started successfully
systemctl status voipmonitor
</pre>

== PACKETBUFFER Saturation ==

'''Symptom''': Log shows <code>PACKETBUFFER: memory is FULL</code>, truncated RTP recordings.

{{Warning|This alert refers to VoIPmonitor's '''internal packet buffer''' (<code>max_buffer_mem</code>), '''NOT system RAM'''. High system memory availability does not prevent this error. The root cause is always a downstream bottleneck (disk I/O or CPU) preventing packets from being processed fast enough.}}

'''Before testing solutions''', gather diagnostic data:
* Check sensor logs: <code>/var/log/syslog</code> (Debian/Ubuntu) or <code>/var/log/messages</code> (RHEL/CentOS)
* Generate debug log via GUI: '''Tools → Generate debug log'''

=== Diagnose: I/O vs CPU Bottleneck ===

{{Warning|Do not guess the bottleneck source. Use proper diagnostics first to identify whether the issue is disk I/O, CPU, or database-related. Disabling storage as a test is valid but should be used to '''confirm''' findings, not as the primary diagnostic method.}}

==== Step 1: Check IO[] Metrics (v2026.01.3+) ====

'''Starting with version 2026.01.3''', VoIPmonitor includes built-in disk I/O monitoring that directly shows disk saturation status:

<syntaxhighlight lang="text">
[283.4/283.4Mb/s] IO[B1.1|L0.7|U45|C75|W125|R10|WI1.2k|RI0.5k]
</syntaxhighlight>

'''Quick interpretation:'''
{| class="wikitable"
|-
! Metric !! Meaning !! Problem Indicator
|-
| '''C''' (Capacity) || % of disk's sustainable throughput used || '''C ≥ 80% = Warning''', '''C ≥ 95% = Saturated'''
|-
| '''L''' (Latency) || Current write latency in ms || '''L ≥ 3× B''' (baseline) = Saturated
|-
| '''U''' (Utilization) || % time disk is busy || '''U > 90%''' = Disk at limit
|}

'''If you see <code>DISK_SAT</code> or <code>WARN</code> after IO[]:'''
<syntaxhighlight lang="text">
IO[B1.1|L8.5|U98|C97|W890|R5|WI12.5k|RI0.1k] DISK_SAT
</syntaxhighlight>

→ This confirms an I/O bottleneck. Skip to [[#Solution:_I.2FO_Bottleneck|I/O Bottleneck Solutions]].

'''For older versions or additional confirmation''', continue with the steps below.

{{Note|See [[Syslog_Status_Line#IO.5B....5D_-_Disk_I.2FO_Monitoring_.28v2026.01.3.2B.29|Syslog Status Line - IO[] section]] for detailed field descriptions.}}

==== Step 2: Read the Full Syslog Status Line ====

VoIPmonitor outputs a status line every 10 seconds. This is your first diagnostic tool:

<syntaxhighlight lang="bash">
# Monitor in real-time
journalctl -u voipmonitor -f
# or
tail -f /var/log/syslog | grep voipmonitor
</syntaxhighlight>

'''Example status line:'''
<syntaxhighlight lang="text">
calls[424] PS[C:4 S:41 R:13540] SQLq[C:0 M:0] heap[45|30|20] comp[48] [25.6Mb/s] t0CPU[85%] t1CPU[12%] t2CPU[8%] tacCPU[8|8|7|7%] RSS/VSZ[365|1640]MB
</syntaxhighlight>

'''Key metrics for bottleneck identification:'''
 
{| class="wikitable"
|-
! Metric !! What It Indicates !! I/O Bottleneck Sign !! CPU Bottleneck Sign
|-
| <code>heap[A&#124;B&#124;C]</code> || Buffer fill % (primary / secondary / processing) || High A with low t0CPU || High A with high t0CPU
|-
| <code>t0CPU[X%]</code> || Packet capture thread (single-core, cannot parallelize) || Low (<50%) || High (>80%)
|-
| <code>comp[X]</code> || Active compression threads || Very high (maxed out) || Normal
|-
| <code>SQLq[C:X M:Y]</code> || Pending SQL queries || Growing = database bottleneck || Stable
|-
| <code>tacCPU[...]</code> || TAR compression threads || All near 100% = compression bottleneck || Normal
|}
 
'''Interpretation flowchart:'''
 
<kroki lang="mermaid">
graph TD
    A[heap values rising] --> B{Check t0CPU}
    B -->|t0CPU > 80%| C[CPU Bottleneck]
    B -->|t0CPU < 50%| D{Check comp and tacCPU}
    D -->|comp maxed, tacCPU high| E[I/O Bottleneck<br/>Disk cannot keep up with writes]
    D -->|comp normal| F{Check SQLq}
    F -->|SQLq growing| G[Database Bottleneck]
    F -->|SQLq stable| H[Mixed/Other Issue]
 
    C --> C1[Solution: CPU optimization]
    E --> E1[Solution: Faster storage]
    G --> G1[Solution: MySQL tuning]
</kroki>
 
==== Step 3: Linux I/O Diagnostics ====
 
Use these standard Linux tools to confirm I/O bottleneck:
 
'''Install required tools:'''
<syntaxhighlight lang="bash">
# Debian/Ubuntu
apt install sysstat iotop ioping
 
# CentOS/RHEL
yum install sysstat iotop ioping
</syntaxhighlight>
 
'''2a) iostat - Disk utilization and wait times'''
<syntaxhighlight lang="bash">
# Run for 10 intervals of 2 seconds
iostat -xz 2 10
</syntaxhighlight>
 
'''Key output columns:'''
<syntaxhighlight lang="text">
Device  r/s    w/s  rkB/s  wkB/s  await  %util
sda    12.50  245.30  50.00  1962.40  45.23  98.50
</syntaxhighlight>
 
{| class="wikitable"
|-
! Column !! Description !! Problem Indicator
|-
| <code>%util</code> || Device utilization percentage || '''> 90%''' = disk saturated
|-
| <code>await</code> || Average I/O wait time (ms) || '''> 20ms''' for SSD, '''> 50ms''' for HDD = high latency
|-
| <code>w/s</code> || Writes per second || Compare with disk's rated IOPS
|}
 
'''2b) iotop - Per-process I/O usage'''
<syntaxhighlight lang="bash">
# Show I/O by process (run as root)
iotop -o
</syntaxhighlight>
 
Look for <code>voipmonitor</code> or <code>mysqld</code> dominating I/O. If voipmonitor shows high DISK WRITE but system <code>%util</code> is 100%, disk cannot keep up.
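 
If you need to log per-process I/O over time rather than watch it interactively, iotop can also run in batch mode. The sketch below uses standard iotop options (<code>-o</code> only active processes, <code>-b</code> batch, <code>-n</code> iterations, <code>-d</code> delay) and assumes iotop was installed as shown above:

<syntaxhighlight lang="bash">
# Take 5 non-interactive samples, 2 seconds apart, keeping only the relevant lines
iotop -obn 5 -d 2 | grep -E "voipmonitor|mysqld|Total DISK"
</syntaxhighlight>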
 
'''2c) ioping - Quick latency check'''
<syntaxhighlight lang="bash">
# Test latency on VoIPmonitor spool directory
cd /var/spool/voipmonitor
ioping -c 20 .
</syntaxhighlight>
 
'''Expected results:'''
{| class="wikitable"
|-
! Storage Type !! Healthy Latency !! Problem Indicator
|-
| NVMe SSD || < 0.5 ms || > 2 ms
|-
| SATA SSD || < 1 ms || > 5 ms
|-
| HDD (7200 RPM) || < 10 ms || > 30 ms
|}
 
==== Step 4: Linux CPU Diagnostics ====
 
'''3a) top - Overall CPU usage'''
<syntaxhighlight lang="bash">
# Press '1' to show per-core CPU
top
</syntaxhighlight>
 
Look for:
* Individual CPU core at 100% (t0 thread is single-threaded)
* High <code>%wa</code> (I/O wait, points to disk) vs high <code>%us</code>/<code>%sy</code> (CPU-bound); see the mpstat sketch below for a per-core view
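 
To capture per-core utilisation and I/O wait without an interactive session, <code>mpstat</code> from the same <code>sysstat</code> package installed above can be used; a minimal sketch:

<syntaxhighlight lang="bash">
# Per-CPU usage, 5 samples at 2-second intervals
# High %iowait points to an I/O bottleneck; one core pinned near 100% usr/sys points to the single-threaded t0 capture thread
mpstat -P ALL 2 5
</syntaxhighlight>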
 
'''3b) Verify voipmonitor threads'''
<syntaxhighlight lang="bash">
# Show voipmonitor threads with CPU usage
top -H -p $(pgrep voipmonitor)
</syntaxhighlight>
 
If one thread shows ~100% CPU while others are low, you have a CPU bottleneck on the capture thread (t0).
 
==== Step 5: Decision Matrix ====
 
{| class="wikitable"
|-
! Observation !! Likely Cause !! Go To
|-
| <code>heap</code> high, <code>t0CPU</code> > 80%, iostat <code>%util</code> low || '''CPU Bottleneck''' || [[#Solution: CPU Bottleneck|CPU Solution]]
|-
| <code>heap</code> high, <code>t0CPU</code> < 50%, iostat <code>%util</code> > 90% || '''I/O Bottleneck''' || [[#Solution: I/O Bottleneck|I/O Solution]]
|-
| <code>heap</code> high, <code>t0CPU</code> < 50%, iostat <code>%util</code> < 50%, <code>SQLq</code> growing || '''Database Bottleneck''' || [[#SQL Queue Overload|Database Solution]]
|-
| <code>heap</code> normal, <code>comp</code> maxed, <code>tacCPU</code> all ~100% || '''Compression Bottleneck''' (type of I/O) || [[#Solution: I/O Bottleneck|I/O Solution]]
|}
 
==== Step 6: Confirmation Test (Optional) ====


After identifying the likely cause with the tools above, you can confirm it with a storage disable test:

<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf - temporarily disable all storage
savesip = no
savertp = no
savertcp = no
savegraph = no
</syntaxhighlight>

<syntaxhighlight lang="bash">
systemctl restart voipmonitor
# Monitor for 5-10 minutes during peak traffic
journalctl -u voipmonitor -f | grep heap
</syntaxhighlight>
* If <code>heap</code> values drop to near zero → confirms '''I/O bottleneck'''
* If <code>heap</code> values remain high → confirms '''CPU bottleneck'''

{{Warning|Remember to re-enable storage after testing! This test causes call recordings to be lost.}}
=== Solution: I/O Bottleneck ===
{{Note|If you see <code>IO[...] DISK_SAT</code> or <code>WARN</code> in the syslog status line (v2026.01.3+), disk saturation is already confirmed. See [[Syslog_Status_Line#IO.5B....5D_-_Disk_I.2FO_Monitoring_.28v2026.01.3.2B.29|IO[] Metrics]] for details.}}
'''Quick confirmation (for older versions):'''
Temporarily save only RTP headers to reduce disk write load:
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
savertp = header
</syntaxhighlight>
Restart the sniffer and monitor. If heap usage stabilizes and "MEMORY IS FULL" errors stop, the issue is confirmed to be storage I/O.
'''Check storage health before upgrading:'''
<syntaxhighlight lang="bash">
# Check drive health
smartctl -a /dev/sda
# Check for I/O errors in system logs
dmesg | grep -i "i/o error\|sd.*error\|ata.*error"
</syntaxhighlight>
Look for reallocated sectors, pending sectors, or I/O errors. Replace failing drives before considering upgrades.
'''Storage controller cache settings:'''
{| class="wikitable"
|-
! Storage Type !! Recommended Cache Mode
|-
| HDD / NAS || WriteBack (requires battery-backed cache)
|-
| SSD || WriteThrough (or WriteBack with power loss protection)
|}
Use vendor-specific tools to configure cache policy (<code>megacli</code>, <code>ssacli</code>, <code>perccli</code>).
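For simple non-RAID setups, the disk's own volatile write cache can be inspected with <code>hdparm</code>. This is a generic sketch for a plain SATA/SAS disk (device name is a placeholder); RAID controller cache policy itself must still be configured with the vendor CLI mentioned above:

<syntaxhighlight lang="bash">
# Show whether the drive's write cache is currently enabled
hdparm -W /dev/sda

# Enable the drive write cache (only where power-loss protection or a UPS makes this safe)
hdparm -W1 /dev/sda
</syntaxhighlight>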
'''Storage upgrades (in order of effectiveness):'''
{| class="wikitable"
|-
! Solution !! IOPS Improvement !! Notes
|-
| '''NVMe SSD''' || 50-100x vs HDD || Best option, handles 10,000+ concurrent calls
|-
| '''SATA SSD''' || 20-50x vs HDD || Good option, handles 5,000+ concurrent calls
|-
| '''RAID 10 with BBU''' || 5-10x vs single disk || Enable WriteBack cache (requires battery backup)
|-
| '''Separate storage server''' || Variable || Use [[Sniffer_distributed_architecture|client/server mode]]
|}
'''Filesystem tuning (ext4):'''
<syntaxhighlight lang="bash">
# Check current mount options
mount | grep voipmonitor
# Recommended mount options for /var/spool/voipmonitor
# Add to /etc/fstab: noatime,data=writeback,barrier=0
# WARNING: barrier=0 requires battery-backed RAID
</syntaxhighlight>
'''Verify improvement:'''
<syntaxhighlight lang="bash">
# After changes, monitor iostat
iostat -xz 2 10
# %util should drop below 70%, await should decrease
</syntaxhighlight>


=== Solution: CPU Bottleneck ===

==== Identify CPU Bottleneck Using Manager Commands ====
 
VoIPmonitor provides manager commands to monitor thread CPU usage in real-time. This is essential for identifying which thread is saturated.
 
'''Connect to manager interface:'''
<syntaxhighlight lang="bash">
# Via Unix socket (local, recommended)
echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket
 
# Via TCP port 5029 (remote or local)
echo 'sniffer_threads' | nc 127.0.0.1 5029
 
# Monitor continuously (every 2 seconds)
watch -n 2 "echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket"
</syntaxhighlight>
 
{{Note|1=TCP port 5029 is encrypted by default. For unencrypted access, set <code>manager_enable_unencrypted = yes</code> in voipmonitor.conf (security risk on public networks).}}
 
'''Example output:'''
<syntaxhighlight lang="text">
t0 - binlog1 fifo pcap read          (  12345) :  78.5  FIFO  99    1234
t2 - binlog1 pb write                (  12346) :  12.3              456
rtp thread binlog1 binlog1 0        (  12347) :  8.1              234
rtp thread binlog1 binlog1 1        (  12348) :  6.2              198
t1 - binlog1 call processing        (  12349) :  4.5              567
tar binlog1 compression 0            (  12350) :  3.2                89
</syntaxhighlight>
 
'''Column interpretation:'''
{| class="wikitable"
|-
! Column !! Description
|-
| Thread name || Descriptive name (t0=capture, t1=call processing, t2=packet buffer write)
|-
| (TID) || Linux thread ID (useful for <code>top -H -p TID</code>)
|-
| CPU % || Current CPU usage percentage - '''key metric'''
|-
| Sched || Scheduler type (FIFO = real-time, empty = normal)
|-
| Priority || Thread priority
|-
| CS/s || Context switches per second
|}
 
'''Critical threads to watch:'''
{| class="wikitable"
|-
! Thread !! Role !! If at 90-100%
|-
| '''t0''' (pcap read) || Packet capture from NIC || '''Single-core limit reached!''' Cannot parallelize. Need DPDK/Napatech.
|-
| '''t2''' (pb write) || Packet buffer processing || Processing bottleneck. Check t2CPU breakdown.
|-
| '''rtp thread''' || RTP packet processing || Threads auto-scale. If still saturated, consider DPDK/Napatech.
|-
| '''tar compression''' || PCAP archiving || I/O bottleneck (compression waiting for disk)
|-
| '''mysql store''' || Database writes || Database bottleneck. Check SQLq metric.
|}
 
{{Warning|If '''t0 thread is at 90-100%''', you have hit the fundamental single-core capture limit. The t0 thread reads packets from the kernel and '''cannot be parallelized'''. Disabling features like jitterbuffer will NOT help - those run on different threads. The only solutions are:
* '''Reduce captured traffic''' using <code>interface_ip_filter</code> or BPF <code>filter</code>
* '''Use kernel bypass''' ([[DPDK]] or [[Napatech]]) which eliminates kernel overhead entirely}}
 
==== Interpreting t2CPU Detailed Breakdown ====
 
The syslog status line shows <code>t2CPU</code> with detailed sub-metrics:
<syntaxhighlight lang="text">
t2CPU[pb:10/ d:39/ s:24/ e:17/ c:6/ g:6/ r:7/ rm:24/ rh:16/ rd:19/]
</syntaxhighlight>
 
{| class="wikitable"
|-
! Code !! Function !! High Value Indicates
|-
| '''pb''' || Packet buffer output || Buffer management overhead
|-
| '''d''' || Dispatch || Structure creation bottleneck
|-
| '''s''' || SIP parsing || Complex/large SIP messages
|-
| '''e''' || Entity lookup || Call table lookup overhead
|-
| '''c''' || Call processing || Call state machine processing
|-
| '''g''' || Register processing || High REGISTER volume
|-
| '''r, rm, rh, rd''' || RTP processing stages || High RTP volume (threads auto-scale)
|}
 
'''Thread auto-scaling:''' VoIPmonitor automatically spawns additional threads when load increases:
* If '''d''' > 50% → SIP parsing thread ('''s''') starts
* If '''s''' > 50% → Entity lookup thread ('''e''') starts
* If '''e''' > 50% → Call/register/RTP threads start
 
==== Configuration for High Traffic (>10,000 calls/sec) ====
 
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
 
# Increase buffer to handle processing spikes (value in MB)
# 10000 = 10 GB - can go higher (20000, 30000+) if RAM allows
# Larger buffer absorbs I/O and CPU spikes without packet loss
max_buffer_mem = 10000
 
# Use IP filter instead of BPF (more efficient)
interface_ip_filter = 10.0.0.0/8
interface_ip_filter = 192.168.0.0/16
# Comment out any 'filter' parameter
</syntaxhighlight>
 
==== CPU Optimizations ====
 
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
 
# Reduce jitterbuffer calculations to save CPU (keeps MOS-F2 metric)
jitterbuffer_f1 = no
jitterbuffer_f2 = yes
jitterbuffer_adapt = no
 
# If MOS metrics are not needed at all, disable everything:
# jitterbuffer_f1 = no
# jitterbuffer_f2 = no
# jitterbuffer_adapt = no
</syntaxhighlight>
 
==== Kernel Bypass Solutions (Extreme Loads) ====
 
When t0 thread hits 100% on standard NIC, kernel bypass is the only solution:
 
{| class="wikitable"
|-
! Solution !! Type !! CPU Reduction !! Use Case
|-
| '''[[DPDK]]''' || Open-source || ~70% || Multi-gigabit on commodity hardware
|-
| '''[[Napatech]]''' || Hardware SmartNIC || >97% (< 3% at 10Gbit) || Extreme performance requirements
|}
 
==== Verify Improvement ====
 
<syntaxhighlight lang="bash">
# Monitor thread CPU after changes
watch -n 2 "echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket | head -10"
 
# Or monitor syslog
journalctl -u voipmonitor -f
# t0CPU should drop, heap values should stay < 20%
</syntaxhighlight>
 
{{Note|1=After changes, monitor syslog <code>heap[A&#124;B&#124;C]</code> values - should stay below 20% during peak traffic. See [[Syslog_Status_Line]] for detailed metric explanations.}}
 
== Storage Hardware Failure ==
 
'''Symptom''': Sensor shows disconnected (red X) with "DROPPED PACKETS" at low traffic volumes.
 
'''Diagnosis''':
<syntaxhighlight lang="bash">
# Check disk health
smartctl -a /dev/sda
 
# Check RAID status (if applicable)
cat /proc/mdstat
mdadm --detail /dev/md0
</syntaxhighlight>
 
Look for reallocated sectors, pending sectors, or RAID degraded state. Replace failing disk.
 
== OOM (Out of Memory) ==
 
=== Identify OOM Victim ===
 
<syntaxhighlight lang="bash">
# Check for OOM kills
dmesg | grep -i "out of memory\|oom\|killed process"
journalctl --since "1 hour ago" | grep -i oom
</syntaxhighlight>
 
=== MySQL Killed by OOM ===


# Check logs for database connection confirmation
Reduce InnoDB buffer pool:
journalctl -u voipmonitor -n 20
<syntaxhighlight lang="ini">
</pre>
# /etc/mysql/my.cnf
innodb_buffer_pool_size = 2G  # Reduce from default
</syntaxhighlight>


Look for a successful database connection message in the logs, which typically appears within the first few seconds after startup.
=== Voipmonitor Killed by OOM ===


;9. Common Troubleshooting Scenarios:
Reduce buffer sizes in voipmonitor.conf:
<syntaxhighlight lang="ini">
max_buffer_mem = 2000  # Reduce from default
ringbuffer = 50        # Reduce from default
</syntaxhighlight>


<b>Scenario A: Database server IP changed after migration</b>
=== Runaway External Process ===
* Symptom: "Can't connect to MySQL server on '10.1.1.10' (111)"
* Fix: Update <code>mysqlhost</code> in <code>/etc/voipmonitor.conf</code> to the new database server IP


<b>Scenario B: Wrong MySQL username or password</b>
<syntaxhighlight lang="bash">
* Symptom: "Access denied for user 'user'@'host'"
# Find memory-hungry processes
* Fix: Verify credentials match the database server's user permissions, update <code>mysqlusername</code> and <code>mysqlpassword</code>
ps aux --sort=-%mem | head -20


<b>Scenario C: Database service not running</b>
# Kill orphaned/runaway process
* Symptom: "Connection refused (111)" or "Connection timed out"
kill -9 <PID>
* Fix: Start MySQL/MariaDB service on the database server: <code>systemctl start mariadb</code>
</syntaxhighlight>
For servers limited to '''16GB RAM''' or when experiencing repeated MySQL OOM kills:


<b>Scenario D: Firewall blocking port 3306</b>
<syntaxhighlight lang="ini">
* Symptom: "Connection refused" when testing with <code>nc</code>, but MySQL is running
# /etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf
* Fix: Open port 3306 in firewall, or configure MySQL to allow connections from the sensor's IP in <code>user</code> table
[mysqld]
# On 16GB server: 6GB buffer pool + 6GB MySQL overhead = 12GB total
# Leaves 4GB for OS + GUI, preventing OOM
innodb_buffer_pool_size = 6G


<b>Scenario E: Localhost vs remote connection confusion</b>
# Enable write buffering (may lose up to 1s of data on crash but reduces memory pressure)
* Symptom: Connection works locally but fails from sensor
innodb_flush_log_at_trx_commit = 2
* Fix: Ensure <code>mysqlhost</code> uses the actual IP address (not <code>localhost</code> or <code>127.0.0.1</code>) if the sensor is on a different host
</syntaxhighlight>


For more detailed information about all <code>mysql*</code> configuration parameters, see [[Sniffer_configuration#Database_Configuration]].
Restart MySQL after changes:
== Step 8: Check for Storage Hardware Errors (HEAP FULL / packetbuffer Issues) ==
<syntaxhighlight lang="bash">
If the sensor is crashing with "HEAP FULL" errors or showing "packetbuffer: MEMORY IS FULL" messages, you must distinguish between '''actual storage hardware failures''' (requires disk replacement) and '''performance bottlenecks''' (requires tuning).
systemctl restart mysql
# or
systemctl restart mariadb
</syntaxhighlight>
=== SQL Queue Growth from Non-Call Data ===


;1. Check kernel message buffer for storage errors:
If <code>sip-register</code>, <code>sip-options</code>, or <code>sip-subscribe</code> are enabled, non-call SIP-messages (OPTIONS, REGISTER, SUBSCRIBE, NOTIFY) can accumulate in the database and cause the SQL queue to grow unbounded. This increases MySQL memory usage and leads to OOM kills of mysqld.
<pre>dmesg -T | grep -iE "ext4-fs error|i/o error|nvram warning|ata.*failed|sda.*error|disk failure|smart error" | tail -50</pre>


Look for these hardware error indicators:
{{Warning|1=Even with reduced <code>innodb_buffer_pool_size</code>, SQL queue will grow indefinitely without cleanup of non-call data.}}
* <code>ext4-fs error</code> - Filesystem corruption or disk failure
* <code>I/O error</code> or <code>BUG: soft lockup</code> - Disk read/write failures
* <code>NVRAM WARNING: nvram_check: failed</code> - RAID controller battery/capacitor issues
* <code>ata.*: FAILED</code> - Hard drive SMART failure
* <code>Buffer I/O error</code> - Disk unable to complete operations


If you see ANY of these errors:
'''Solution: Enable automatic cleanup of old non-call data'''
* The storage subsystem is failing and likely needs hardware replacement
<syntaxhighlight lang="ini">
* Do not attempt performance tuning - replace the failed disk/RAID first
# /etc/voipmonitor.conf
* Check SMART status: <code>smartctl -a /dev/sda</code>
# cleandatabase=2555 automatically deletes partitions older than 7 years
* Check RAID health: <code>cat /proc/mdstat</code> or RAID controller tools
# Covers: CDR, register_state, register_failed, and sip_msg (OPTIONS/SUBSCRIBE/NOTIFY)
cleandatabase = 2555
</syntaxhighlight>


;2. If dmesg is clean of errors → Performance Bottleneck:
Restart the sniffer after changes:
If the kernel logs show no storage errors, the issue is a performance bottleneck (disk too slow, network latency, etc.).
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>


<b>Check disk I/O performance:</b>
{{Note|See [[Data_Cleaning]] for detailed configuration options and other <code>cleandatabase_*</code> parameters.}}
<pre>
== Service Startup Failures ==
# Current I/O wait (should be < 10% normally)
iostat -x 5


# Detailed disk stats
=== Interface No Longer Exists ===
dstat -d


# Real-time disk latency
After OS upgrade, interface names may change (eth0 → ensXXX):
ioping -c 10 .
</pre>


<b>Check NFS latency (if using NFS storage):</b>
<syntaxhighlight lang="bash">
<pre>
# Find current interface names
# Test NFS read/write latency
ip a
time dd if=/dev/zero of=/var/spool/voipmonitor/testfile bs=1M count=100
time cat /var/spool/voipmonitor/testfile > /dev/null
rm /var/spool/voipmonitor/testfile


# Check NFS mount options
# Update all config locations
mount | grep nfs
grep -r "interface" /etc/voipmonitor.conf /etc/voipmonitor.conf.d/
</pre>


<b>Common performance solutions:</b>
# Also check GUI: Settings → Sensors → Configuration
* Use SSD/NVMe for VoIPmonitor spool directory
</syntaxhighlight>
* Ensure proper NIC queue settings for high-throughput NFS
* Check network switch port configuration for NFS
* Review [[Scaling]] guide for detailed optimization


See also [[IO_Measurement]] for comprehensive disk benchmarking tools.
=== Missing Dependencies ===


== Step 9: Check for OOM (Out of Memory) Issues ==
<syntaxhighlight lang="bash">
If VoIPmonitor suddenly stops processing CDRs and a service restart temporarily restores functionality, the system may be experiencing OOM (Out of Memory) killer events. The Linux OOM killer terminates processes when available RAM is exhausted, and MySQL (`mysqld`) is a common target due to its memory-intensive nature.
# Install common missing package
apt install libpcap0.8  # Debian/Ubuntu
yum install libpcap    # RHEL/CentOS
</syntaxhighlight>


;1. Check for OOM killer events in kernel logs:
== Network Interface Issues ==
<pre>
# For Debian/Ubuntu
grep -i "out of memory\|killed process" /var/log/syslog | tail -20


# For CentOS/RHEL/AlmaLinux
=== Promiscuous Mode ===
grep -i "out of memory\|killed process" /var/log/messages | tail -20


# Also check dmesg:
Required for SPAN port monitoring:
dmesg | grep -i "killed process" | tail -10
<syntaxhighlight lang="bash">
</pre>
# Enable
Typical OOM killer messages look like:
ip link set eth0 promisc on
<pre>
Out of memory: Kill process 1234 (mysqld) score 123 or sacrifice child
Killed process 1234 (mysqld) total-vm: 12345678kB, anon-rss: 1234567kB
</pre>


;2. Monitor current memory usage:
# Verify
<pre>
ip link show eth0 | grep PROMISC
# Check available memory (look for low 'available' or 'free' values)
</syntaxhighlight>
free -h


# Check per-process memory usage (sorted by RSS)
{{Note|Promiscuous mode is NOT required for ERSPAN/GRE tunnels where traffic is addressed to the sensor.}}
ps aux --sort=-%mem | head -15


# Check MySQL memory usage in bytes
=== Interface Drops ===
cat /proc/$(pgrep mysqld)/status | grep -E "VmSize|VmRSS"
</pre>
Warning signs:
* '''Available memory consistently below 500MB during operation'''
* '''MySQL consuming most of the available RAM'''
* '''Swap usage near 100% (if swap is enabled)'''
* '''Frequent process restarts without clear error messages'''


;3. First Fix: Check and correct innodb_buffer_pool_size:
<syntaxhighlight lang="bash">
Before upgrading hardware, verify that <code>innodb_buffer_pool_size</code> is not set too high. This is a common cause of OOM incidents. If MySQL/MariaDB is consuming most of the available RAM, the buffer pool size is likely configured incorrectly for your system.
# Check for drops
ip -s link show eth0 | grep -i drop


'''Calculate the correct buffer pool size:'''
# If drops present, increase ring buffer
For a server running both VoIPmonitor and MySQL on the same host:
ethtool -G eth0 rx 4096
<pre>
</syntaxhighlight>
Formula: innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS & services overhead - safety margin) / 2


Example for a 32GB server:
=== Bonded/EtherChannel Interfaces ===
- Total RAM: 32GB
- VoIPmonitor process memory (check with ps aux): 2GB
- OS + other services overhead: 2GB
- Safety margin: ~25-30% of remaining RAM for other internal buffers


Calculation:
'''Symptom''': False packet loss when monitoring bond0 or br0.
Available for buffer pool = 32GB - 2GB - 2GB = 28GB
Recommended innodb_buffer_pool_size = 14G (approximately 50% of available memory)
</pre>


'''Edit the MariaDB configuration file:'''
'''Solution''': Monitor physical interfaces, not logical:
<pre>
<syntaxhighlight lang="ini">
# Common locations: /etc/mysql/my.cnf, /etc/mysql/mariadb.conf.d/50-server.cnf, /etc/my.cnf.d/
# voipmonitor.conf - use physical interfaces
interface = eth0,eth1
</syntaxhighlight>


innodb_buffer_pool_size = 14G  # Adjust based on your calculation
=== Network Offloading Issues ===
</pre>


'''Restart MariaDB to apply:'''
'''Symptom''': Kernel errors like <code>bad gso: type: 1, size: 1448</code>
<pre>systemctl restart mariadb  # or systemctl restart mysql</pre>


If the OOM events stop after correcting <code>innodb_buffer_pool_size</code>, no hardware upgrade is needed.
<syntaxhighlight lang="bash">
# Disable offloading on capture interface
ethtool -K eth0 gso off tso off gro off lro off
</syntaxhighlight>


;4. Solution: Increase physical memory (if buffer pool tuning is insufficient):
== Packet Ordering Issues ==
If correcting the buffer pool size does not resolve the OOM issues, upgrade the server's physical RAM. After upgrading:
* Verify memory improvements with <code>free -h</code>
* Recalculate and adjust <code>innodb_buffer_pool_size</code> to utilize the additional memory
* Monitor for several days to ensure OOM events stop


Additional mitigation strategies (while planning for RAM upgrade or adjusting configuration):
If SIP messages appear out of sequence:
* Reduce MySQL's memory footprint by lowering <code>innodb_buffer_pool_size</code> (e.g., from 16GB to 8GB)
* Ensure swap space is properly configured as a safety buffer (though swap is much slower than RAM)
* Use <code>sysctl vm.swappiness=10</code> to favor RAM over swap when some memory is still available (see the sketch below for making this persistent)
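
To make the swappiness change survive a reboot, it can be written to a sysctl drop-in file; a minimal sketch (the drop-in file name is arbitrary):

<syntaxhighlight lang="bash">
# Apply immediately
sysctl -w vm.swappiness=10

# Persist across reboots
echo "vm.swappiness = 10" > /etc/sysctl.d/99-voipmonitor.conf
sysctl --system
</syntaxhighlight>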


;4. Second Fix: Reduce VoIPmonitor buffer memory usage:
'''First''': Rule out Wireshark display artifact - disable "Analyze TCP sequence numbers" in Wireshark. See [[FAQ]].
In addition to MySQL memory consumption, VoIPmonitor itself allocates significant memory for packet buffers. The total buffer memory used by VoIPmonitor is calculated based on:


'''VoIPmonitor Buffer Memory Calculation:'''
'''If genuine reordering''': Usually caused by packet bursts in network infrastructure. Use tcpdump to verify packets arrive out of order at the interface. Work with network admin to implement QoS or traffic shaping. For persistent issues, consider dedicated capture card with hardware timestamping (see [[Napatech]]).
* '''<code>ringbuffer</code>''': Ring buffer size in MB per interface (default: 50MB, recommended ≥500MB for >100 Mbit traffic)
{{Note|For out-of-order packets in '''client/server mode''' (multiple sniffers), see [[Sniffer_distributed_architecture]] for <code>pcap_queue_dequeu_window_length</code> configuration.}}
* '''<code>max_buffer_mem</code>''': Maximum buffer memory limit in MB (default: 2000MB)
* '''Number of sniffing interfaces''': Each interface gets its own ringbuffer allocation
* '''Total formula''': Approximate total = (ringbuffer × number of interfaces) + max_buffer_mem


If you are monitoring multiple interfaces (e.g., <code>interface = eth0,eth1,eth2</code>), each interface uses a separate ringbuffer. With the default ringbuffer of 50MB and 3 interfaces, that's 150MB plus max_buffer_mem of 2000MB, totaling approximately 2150MB.
=== Solutions for SPAN/Mirroring Reordering ===


'''To reduce VoIPmonitor memory usage:'''
If packets arrive out of order at the SPAN/mirror port (e.g., 302 responses before INVITE causing "000 no response" errors):


Edit <code>/etc/voipmonitor.conf</code> and decrease buffer settings:
<pre>
# Reduce ringbuffer for each interface (e.g., from 50 to 20)
ringbuffer = 20

# Reduce maximum buffer memory (e.g., from 2000 to 1000)
max_buffer_mem = 1000

# Alternatively, reduce the number of sniffing interfaces if not all are needed
interface = eth0,eth1  # Instead of eth0,eth1,eth2,eth3
</pre>

After making changes, restart the VoIPmonitor service:
<pre>systemctl restart voipmonitor</pre>

'''Important notes:'''
* Reducing <code>ringbuffer</code> may increase packet loss during traffic spikes
* Reducing <code>max_buffer_mem</code> affects how many packets can be buffered before being written to disk
* Monitor packet loss statistics in the GUI after reducing buffers to ensure acceptable performance

;5. Solution: Increase physical memory (if buffer tuning is insufficient):
If correcting both the MySQL and VoIPmonitor buffer settings does not resolve the OOM issues, upgrade the server's physical RAM. After upgrading:
* Verify memory improvements with <code>free -h</code>
* Recalculate and adjust <code>innodb_buffer_pool_size</code> to utilize the additional memory
* Re-tune <code>ringbuffer</code> and <code>max_buffer_mem</code> for the new memory capacity
* Monitor for several days to ensure OOM events stop

== Step 10: Sensor Upgrade Fails with "Permission denied" from /tmp ==
If the sensor upgrade process fails with "Permission denied" errors when executing scripts from the `/tmp` directory, or the service fails to restart after an upgrade, the `/tmp` partition may be mounted with the `noexec` flag.

The `noexec` mount option prevents execution of any script or binary from the `/tmp` directory for security reasons. However, the VoIPmonitor sensor upgrade process uses `/tmp` for temporary script execution.

;1. Check the mount options for /tmp:
<pre>mount | grep /tmp</pre>
Look for the `noexec` flag in the mount options. Output will show something like:
<pre>/dev/sda2 on /tmp type ext4 rw,relatime,noexec,nosuid,nodev</pre>

;2. Remount /tmp without noexec (temporary fix):
<pre>mount -o remount,exec /tmp</pre>
Verify the change:
<pre>mount | grep /tmp</pre>
The output should no longer contain `noexec`.

;3. Make the change permanent (edit /etc/fstab):
Open the `/etc/fstab` file and locate the line corresponding to the `/tmp` mount point. Remove the `noexec` option from that line.
<pre>nano /etc/fstab</pre>
Example:
<pre>
# Before:
/dev/sda2  /tmp  ext4  rw,relatime,noexec,nosuid,nodev  0 0

# After (remove noexec):
/dev/sda2  /tmp  ext4  rw,relatime,nosuid,nodev  0 0
</pre>
If `/tmp` is a separate partition, you may need to remount it for the changes to take effect:
<pre>mount -o remount /tmp</pre>

;4. Re-run the sensor upgrade:
After fixing the mount options, retry the sensor upgrade process.


== Appendix: tshark Display Filter Syntax for SIP ==
When using `tshark` to analyze SIP traffic, it is important to use the '''correct Wireshark display filter syntax'''. Below are common filter examples:

=== Basic SIP Filters ===
<pre>
# Show all SIP INVITE messages
tshark -r capture.pcap -Y "sip.Method == INVITE"

# Show all SIP messages (any method)
tshark -r capture.pcap -Y "sip"

# Show SIP and RTP traffic
tshark -r capture.pcap -Y "sip || rtp"
</pre>

=== Search for Specific Phone Number or Text ===
<pre>
# Find calls containing a specific phone number (e.g., 5551234567)
tshark -r capture.pcap -Y 'sip contains "5551234567"'

# Find INVITE messages for a specific number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"'
</pre>

=== Extract Call-ID from Matching Calls ===
<pre>
# Get Call-ID for calls matching a phone number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"' -T fields -e sip.Call-ID

# Get Call-ID along with From and To headers
tshark -r capture.pcap -Y 'sip.Method == INVITE' -T fields -e sip.Call-ID -e sip.from.user -e sip.to.user
</pre>

=== Filter by IP Address ===
<pre>
# SIP traffic from a specific source IP
tshark -r capture.pcap -Y "sip && ip.src == 192.168.1.100"

# SIP traffic between two hosts
tshark -r capture.pcap -Y "sip && ip.addr == 192.168.1.100 && ip.addr == 10.0.0.50"
</pre>

=== Filter by SIP Response Code ===
<pre>
# Show all 200 OK responses
tshark -r capture.pcap -Y "sip.Status-Code == 200"

# Show all 4xx and 5xx error responses
tshark -r capture.pcap -Y "sip.Status-Code >= 400"

# Show 486 Busy Here responses
tshark -r capture.pcap -Y "sip.Status-Code == 486"
</pre>

=== Important Syntax Notes ===
* '''Field names are case-sensitive:''' Use <code>sip.Method</code>, <code>sip.Call-ID</code>, <code>sip.Status-Code</code> (not <code>sip.method</code> or <code>sip.call-id</code>)
* '''String matching uses <code>contains</code>:''' Use <code>sip contains "text"</code> (not <code>sip.contains()</code>)
* '''Use double quotes for strings:''' <code>sip contains "number"</code> (not single quotes)
* '''Boolean operators:''' Use <code>&&</code> (and), <code>||</code> (or), <code>!</code> (not)

For a complete reference, see the [https://www.wireshark.org/docs/dfref/s/sip.html Wireshark SIP Display Filter Reference].

== Step 11: Check VoIPmonitor Logs for General Errors ==
After addressing the specific issues above, check the system logs for other error messages from the sensor process that may reveal additional problems.

<pre>
# For Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor

# For CentOS/RHEL/AlmaLinux
tail -f /var/log/messages | grep voipmonitor
</pre>

Look for errors like:
* "pcap_open_live(eth0) error: eth0: No such device" (wrong interface name)
* "Permission denied" (the sensor is not running with sufficient privileges)
* Messages about database connection issues (see Step 6 for database connection troubleshooting)
* Messages about dropped packets

Latest revision as of 19:08, 22 January 2026

Sniffer Troubleshooting

This page covers common VoIPmonitor sniffer/sensor problems organized by symptom. For configuration reference, see Sniffer_configuration. For performance tuning, see Scaling.

Critical First Step: Is Traffic Reaching the Interface?

⚠️ Warning: Before any sensor tuning, verify packets are reaching the network interface. If packets aren't there, no amount of sensor configuration will help.

# Check for SIP traffic on the capture interface
tcpdump -i eth0 -nn "host <PROBLEMATIC_IP> and port 5060" -c 10

# If no packets: Network/SPAN issue - contact network admin
# If packets visible: Proceed with sensor troubleshooting below

Quick Diagnostic Checklist

Check Command Expected Result
Service running systemctl status voipmonitor Active (running)
Traffic on interface tshark -i eth0 -c 5 -Y "sip" SIP packets displayed
Interface errors ip -s link show eth0 No RX errors/drops
Promiscuous mode ip link show eth0 PROMISC flag present
Logs grep voip No critical errors
GUI rules Settings → Capture Rules No unexpected "Skip" rules

No Calls Being Recorded

Service Not Running

# Check status
systemctl status voipmonitor

# View recent logs
journalctl -u voipmonitor --since "10 minutes ago"

# Start/restart
systemctl restart voipmonitor

Common startup failures:

  • Interface not found: Check interface in voipmonitor.conf matches ip a output
  • Port already in use: Another process using the management port
  • License issue: Check License for activation problems

Wrong Interface or Port Configuration

# Check current config
grep -E "^interface|^sipport" /etc/voipmonitor.conf

# Example correct config:
# interface = eth0
# sipport = 5060


GUI Capture Rules Blocking

Navigate to Settings → Capture Rules and check for rules with action "Skip" that may be blocking calls. Rules are processed in order - a Skip rule early in the list will block matching calls.

See Capture_rules for detailed configuration.

SPAN/Mirror Not Configured

If tcpdump shows no traffic:

  1. Verify switch SPAN/mirror port configuration
  2. Check that both directions (ingress + egress) are mirrored
  3. Confirm VLAN tagging is preserved if needed
  4. Test physical connectivity (cable, port status)

See Sniffing_modes for SPAN, RSPAN, and ERSPAN configuration.

Filter Parameter Too Restrictive

If filter is set in voipmonitor.conf, it may exclude traffic:

# Check filter
grep "^filter" /etc/voipmonitor.conf

# Temporarily disable to test
# Comment out the filter line and restart
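
As an illustration (the filter values here are hypothetical, not taken from your configuration), a BPF expression limited to UDP port 5060 silently drops SIP over TCP and all RTP, so calls appear without audio or not at all:

# Hypothetical example of an overly restrictive capture filter:
# filter = udp port 5060

# A broader expression that still narrows capture (adjust ports to your environment):
# filter = port 5060 or portrange 10000-20000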


Missing id_sensor Parameter

Symptom: SIP packets visible in Capture/PCAP section but missing from CDR, SIP messages, and Call flow.

Cause: The id_sensor parameter is not configured or is missing. This parameter is required to associate captured packets with the CDR database.

Solution:

# Check if id_sensor is set
grep "^id_sensor" /etc/voipmonitor.conf

# Add or correct the parameter
echo "id_sensor = 1" >> /etc/voipmonitor.conf

# Restart the service
systemctl restart voipmonitor

💡 Tip: Use a unique numeric identifier (1-65535) for each sensor. Essential for multi-sensor deployments. See id_sensor documentation.

Missing Audio / RTP Issues

One-Way Audio (Asymmetric Mirroring)

Symptom: SIP recorded but only one RTP direction captured.

Cause: SPAN port configured for only one direction.

Diagnosis:

# Count RTP packets per direction
tshark -i eth0 -Y "rtp" -T fields -e ip.src -e ip.dst | sort | uniq -c

If one direction shows 0 or very few packets, configure the switch to mirror both ingress and egress traffic.

RTP Not Associated with Call

Symptom: Audio plays in sniffer but not in GUI, or RTP listed under wrong call.

Possible causes:

1. SIP and RTP on different interfaces/VLANs:

# voipmonitor.conf - enable automatic RTP association
auto_enable_use_blocks = yes

2. NAT not configured:

# voipmonitor.conf - for NAT scenarios
natalias = <public_ip> <private_ip>

# If not working, try reversed order:
natalias = <private_ip> <public_ip>

3. External device modifying media ports:

If SDP advertises one port but RTP arrives on different port (SBC/media server issue):

# Compare SDP ports vs actual RTP
tshark -r call.pcap -Y "sip.Method == INVITE" -V | grep "m=audio"
tshark -r call.pcap -Y "rtp" -T fields -e udp.dstport | sort -u

If ports don't match, the external device must be configured to preserve SDP ports - VoIPmonitor cannot compensate.

RTP Incorrectly Associated with Wrong Call (PBX Port Reuse)

Symptom: RTP streams from one call appear associated with a different CDR when your PBX aggressively reuses the same IP:port across multiple calls.

Cause: When PBX reuses media ports, VoIPmonitor may incorrectly correlate RTP packets to the wrong call based on weaker correlation methods.

Solution: Enable rtp_check_both_sides_by_sdp to require verification of both source and destination IP:port against SDP:

# voipmonitor.conf - require both source and destination to match SDP
rtp_check_both_sides_by_sdp = yes

# Alternative (strict) mode - allows initial unverified packets
rtp_check_both_sides_by_sdp = strict

⚠️ Warning: Enabling this may prevent RTP association for calls using NAT, as the source IP:port will not match the SDP. Use natalias mappings or the strict setting to mitigate this.

Snaplen Truncation

Symptom: Large SIP messages truncated, incomplete headers.

Solution:

# voipmonitor.conf - increase packet capture size
snaplen = 8192

For Kamailio siptrace, also check trace_msg_fragment_size in Kamailio config. See snaplen documentation.

PACKETBUFFER Saturation

Symptom: Log shows PACKETBUFFER: memory is FULL, truncated RTP recordings.

⚠️ Warning: This alert refers to VoIPmonitor's internal packet buffer (max_buffer_mem), NOT system RAM. High system memory availability does not prevent this error. The root cause is always a downstream bottleneck (disk I/O or CPU) preventing packets from being processed fast enough.

Before testing solutions, gather diagnostic data:

  • Check sensor logs: /var/log/syslog (Debian/Ubuntu) or /var/log/messages (RHEL/CentOS)
  • Generate debug log via GUI: Tools → Generate debug log

Diagnose: I/O vs CPU Bottleneck

⚠️ Warning: Do not guess the bottleneck source. Use proper diagnostics first to identify whether the issue is disk I/O, CPU, or database-related. Disabling storage as a test is valid but should be used to confirm findings, not as the primary diagnostic method.

Step 1: Check IO[] Metrics (v2026.01.3+)

Starting with version 2026.01.3, VoIPmonitor includes built-in disk I/O monitoring that directly shows disk saturation status:

[283.4/283.4Mb/s] IO[B1.1|L0.7|U45|C75|W125|R10|WI1.2k|RI0.5k]

Quick interpretation:

{| class="wikitable"
|-
! Metric !! Meaning !! Problem Indicator
|-
| C (Capacity) || % of disk's sustainable throughput used || C ≥ 80% = Warning, C ≥ 95% = Saturated
|-
| L (Latency) || Current write latency in ms || L ≥ 3× B (baseline) = Saturated
|-
| U (Utilization) || % time disk is busy || U > 90% = Disk at limit
|}

If you see DISK_SAT or WARN after IO[]:

IO[B1.1|L8.5|U98|C97|W890|R5|WI12.5k|RI0.1k] DISK_SAT

→ This confirms I/O bottleneck. Skip to I/O Bottleneck Solutions.

For older versions or additional confirmation, continue with the steps below.

ℹ️ Note: See Syslog Status Line - IO[] section for detailed field descriptions.
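
To watch only the IO[] block of the status line (a minimal sketch; the log path is /var/log/messages on RHEL-based systems):

# Print just the IO[...] block and anything after it (e.g. WARN or DISK_SAT flags)
tail -f /var/log/syslog | grep -o 'IO\[[^]]*\].*'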

Step 2: Read the Full Syslog Status Line

VoIPmonitor outputs a status line every 10 seconds. This is your first diagnostic tool:

# Monitor in real-time
journalctl -u voipmonitor -f
# or
tail -f /var/log/syslog | grep voipmonitor

Example status line:

calls[424] PS[C:4 S:41 R:13540] SQLq[C:0 M:0] heap[45|30|20] comp[48] [25.6Mb/s] t0CPU[85%] t1CPU[12%] t2CPU[8%] tacCPU[8|8|7|7%] RSS/VSZ[365|1640]MB

Key metrics for bottleneck identification:

{| class="wikitable"
|-
! Metric !! What It Indicates !! I/O Bottleneck Sign !! CPU Bottleneck Sign
|-
| heap[A|B|C] || Buffer fill % (primary / secondary / processing) || High A with low t0CPU || High A with high t0CPU
|-
| t0CPU[X%] || Packet capture thread (single-core, cannot parallelize) || Low (<50%) || High (>80%)
|-
| comp[X] || Active compression threads || Very high (maxed out) || Normal
|-
| SQLq[C:X M:Y] || Pending SQL queries || Growing = database bottleneck || Stable
|-
| tacCPU[...] || TAR compression threads || All near 100% = compression bottleneck || Normal
|}
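
A minimal sketch for pulling just the heap and t0CPU fields out of each status line, assuming the default syslog destination on Debian/Ubuntu:

# Print only the heap[...] and t0CPU[...] fields from each status line
tail -f /var/log/syslog | grep --line-buffered voipmonitor | \
  awk '{for(i=1;i<=NF;i++) if($i ~ /^heap\[/ || $i ~ /^t0CPU\[/) printf "%s ", $i; print ""}'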


Step 3: Linux I/O Diagnostics

Use these standard Linux tools to confirm I/O bottleneck:

Install required tools:

# Debian/Ubuntu
apt install sysstat iotop ioping

# CentOS/RHEL
yum install sysstat iotop ioping

3a) iostat - Disk utilization and wait times

# Run for 10 intervals of 2 seconds
iostat -xz 2 10

Key output columns:

Device   r/s     w/s   rkB/s   wkB/s  await  %util
sda     12.50  245.30  50.00  1962.40  45.23  98.50

{| class="wikitable"
|-
! Column !! Description !! Problem Indicator
|-
| %util || Device utilization percentage || > 90% = disk saturated
|-
| await || Average I/O wait time (ms) || > 20ms for SSD, > 50ms for HDD = high latency
|-
| w/s || Writes per second || Compare with disk's rated IOPS
|}

3b) iotop - Per-process I/O usage

# Show I/O by process (run as root)
iotop -o

Look for voipmonitor or mysqld dominating I/O. If voipmonitor shows high DISK WRITE but system %util is 100%, disk cannot keep up.

3c) ioping - Quick latency check

# Test latency on VoIPmonitor spool directory
cd /var/spool/voipmonitor
ioping -c 20 .

Expected results:

{| class="wikitable"
|-
! Storage Type !! Healthy Latency !! Problem Indicator
|-
| NVMe SSD || < 0.5 ms || > 2 ms
|-
| SATA SSD || < 1 ms || > 5 ms
|-
| HDD (7200 RPM) || < 10 ms || > 30 ms
|}

Step 4: Linux CPU Diagnostics

4a) top - Overall CPU usage

# Press '1' to show per-core CPU
top

Look for:

  • Individual CPU core at 100% (t0 thread is single-threaded)
  • High %wa (I/O wait) vs high %us/%sy (CPU-bound)
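
For a non-interactive view of the same information, mpstat from the sysstat package installed above gives a per-core breakdown including %iowait:

# Per-core CPU usage, 5 samples at 2-second intervals
mpstat -P ALL 2 5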

4b) Verify voipmonitor threads

# Show voipmonitor threads with CPU usage
top -H -p $(pgrep voipmonitor)

If one thread shows ~100% CPU while others are low, you have a CPU bottleneck on the capture thread (t0).

Step 5: Decision Matrix

{| class="wikitable"
|-
! Observation !! Likely Cause !! Go To
|-
| heap high, t0CPU > 80%, iostat %util low || CPU Bottleneck || CPU Solution
|-
| heap high, t0CPU < 50%, iostat %util > 90% || I/O Bottleneck || I/O Solution
|-
| heap high, t0CPU < 50%, iostat %util < 50%, SQLq growing || Database Bottleneck || Database Solution
|-
| heap normal, comp maxed, tacCPU all ~100% || Compression Bottleneck (type of I/O) || I/O Solution
|}

Step 6: Confirmation Test (Optional)

After identifying the likely cause with the tools above, you can confirm with a storage disable test:

# /etc/voipmonitor.conf - temporarily disable all storage
savesip = no
savertp = no
savertcp = no
savegraph = no
systemctl restart voipmonitor
# Monitor for 5-10 minutes during peak traffic
journalctl -u voipmonitor -f | grep heap
  • If heap values drop to near zero → confirms I/O bottleneck
  • If heap values remain high → confirms CPU bottleneck

⚠️ Warning: Remember to re-enable storage after testing! This test causes call recordings to be lost.

Solution: I/O Bottleneck

ℹ️ Note: If you see IO[...] DISK_SAT or WARN in the syslog status line (v2026.01.3+), disk saturation is already confirmed. See IO[] Metrics for details.

Quick confirmation (for older versions):

Temporarily save only RTP headers to reduce disk write load:

# /etc/voipmonitor.conf
savertp = header

Restart the sniffer and monitor. If heap usage stabilizes and "MEMORY IS FULL" errors stop, the issue is confirmed to be storage I/O.

Check storage health before upgrading:

# Check drive health
smartctl -a /dev/sda

# Check for I/O errors in system logs
dmesg | grep -i "i/o error\|sd.*error\|ata.*error"

Look for reallocated sectors, pending sectors, or I/O errors. Replace failing drives before considering upgrades.
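
A quick way to pull out the attributes most often associated with a failing drive (attribute names vary slightly by vendor):

# Show only reallocated/pending/uncorrectable sector counters
smartctl -A /dev/sda | grep -Ei 'reallocated|pending|uncorrect'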

Storage controller cache settings:

{| class="wikitable"
|-
! Storage Type !! Recommended Cache Mode
|-
| HDD / NAS || WriteBack (requires battery-backed cache)
|-
| SSD || WriteThrough (or WriteBack with power loss protection)
|}

Use vendor-specific tools to configure cache policy (megacli, ssacli, perccli).

Storage upgrades (in order of effectiveness):

{| class="wikitable"
|-
! Solution !! IOPS Improvement !! Notes
|-
| NVMe SSD || 50-100x vs HDD || Best option, handles 10,000+ concurrent calls
|-
| SATA SSD || 20-50x vs HDD || Good option, handles 5,000+ concurrent calls
|-
| RAID 10 with BBU || 5-10x vs single disk || Enable WriteBack cache (requires battery backup)
|-
| Separate storage server || Variable || Use client/server mode
|}

Filesystem tuning (ext4):

# Check current mount options
mount | grep voipmonitor

# Recommended mount options for /var/spool/voipmonitor
# Add to /etc/fstab: noatime,data=writeback,barrier=0
# WARNING: barrier=0 requires battery-backed RAID
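
A sketch of what the corresponding /etc/fstab entry could look like; the device, filesystem, and mount point below are placeholders, and barrier=0 must only be used with a battery-backed write cache:

# /etc/fstab - hypothetical example entry for the spool volume
/dev/sdb1  /var/spool/voipmonitor  ext4  defaults,noatime,data=writeback,barrier=0  0 2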

Verify improvement:

# After changes, monitor iostat
iostat -xz 2 10
# %util should drop below 70%, await should decrease

Solution: CPU Bottleneck

Identify CPU Bottleneck Using Manager Commands

VoIPmonitor provides manager commands to monitor thread CPU usage in real-time. This is essential for identifying which thread is saturated.

Connect to manager interface:

# Via Unix socket (local, recommended)
echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket

# Via TCP port 5029 (remote or local)
echo 'sniffer_threads' | nc 127.0.0.1 5029

# Monitor continuously (every 2 seconds)
watch -n 2 "echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket"

ℹ️ Note: TCP port 5029 is encrypted by default. For unencrypted access, set manager_enable_unencrypted = yes in voipmonitor.conf (security risk on public networks).

Example output:

t0 - binlog1 fifo pcap read          (  12345) :  78.5  FIFO  99     1234
t2 - binlog1 pb write                (  12346) :  12.3               456
rtp thread binlog1 binlog1 0         (  12347) :   8.1               234
rtp thread binlog1 binlog1 1         (  12348) :   6.2               198
t1 - binlog1 call processing         (  12349) :   4.5               567
tar binlog1 compression 0            (  12350) :   3.2                89

Column interpretation:

{| class="wikitable"
|-
! Column !! Description
|-
| Thread name || Descriptive name (t0=capture, t1=call processing, t2=packet buffer write)
|-
| (TID) || Linux thread ID (useful for top -H -p TID)
|-
| CPU % || Current CPU usage percentage - key metric
|-
| Sched || Scheduler type (FIFO = real-time, empty = normal)
|-
| Priority || Thread priority
|-
| CS/s || Context switches per second
|}

Critical threads to watch:

{| class="wikitable"
|-
! Thread !! Role !! If at 90-100%
|-
| t0 (pcap read) || Packet capture from NIC || Single-core limit reached! Cannot parallelize. Need DPDK/Napatech.
|-
| t2 (pb write) || Packet buffer processing || Processing bottleneck. Check t2CPU breakdown.
|-
| rtp thread || RTP packet processing || Threads auto-scale. If still saturated, consider DPDK/Napatech.
|-
| tar compression || PCAP archiving || I/O bottleneck (compression waiting for disk)
|-
| mysql store || Database writes || Database bottleneck. Check SQLq metric.
|}

⚠️ Warning: If t0 thread is at 90-100%, you have hit the fundamental single-core capture limit. The t0 thread reads packets from the kernel and cannot be parallelized. Disabling features like jitterbuffer will NOT help - those run on different threads. The only solutions are:

  • Reduce captured traffic using interface_ip_filter or BPF filter
  • Use kernel bypass (DPDK or Napatech) which eliminates kernel overhead entirely

Interpreting t2CPU Detailed Breakdown

The syslog status line shows t2CPU with detailed sub-metrics:

t2CPU[pb:10/ d:39/ s:24/ e:17/ c:6/ g:6/ r:7/ rm:24/ rh:16/ rd:19/]
{| class="wikitable"
|-
! Code !! Function !! High Value Indicates
|-
| pb || Packet buffer output || Buffer management overhead
|-
| d || Dispatch || Structure creation bottleneck
|-
| s || SIP parsing || Complex/large SIP messages
|-
| e || Entity lookup || Call table lookup overhead
|-
| c || Call processing || Call state machine processing
|-
| g || Register processing || High REGISTER volume
|-
| r, rm, rh, rd || RTP processing stages || High RTP volume (threads auto-scale)
|}

Thread auto-scaling: VoIPmonitor automatically spawns additional threads when load increases:

  • If d > 50% → SIP parsing thread (s) starts
  • If s > 50% → Entity lookup thread (e) starts
  • If e > 50% → Call/register/RTP threads start

Configuration for High Traffic (>10,000 calls/sec)

# /etc/voipmonitor.conf

# Increase buffer to handle processing spikes (value in MB)
# 10000 = 10 GB - can go higher (20000, 30000+) if RAM allows
# Larger buffer absorbs I/O and CPU spikes without packet loss
max_buffer_mem = 10000

# Use IP filter instead of BPF (more efficient)
interface_ip_filter = 10.0.0.0/8
interface_ip_filter = 192.168.0.0/16
# Comment out any 'filter' parameter

CPU Optimizations

# /etc/voipmonitor.conf

# Reduce jitterbuffer calculations to save CPU (keeps MOS-F2 metric)
jitterbuffer_f1 = no
jitterbuffer_f2 = yes
jitterbuffer_adapt = no

# If MOS metrics are not needed at all, disable everything:
# jitterbuffer_f1 = no
# jitterbuffer_f2 = no
# jitterbuffer_adapt = no

Kernel Bypass Solutions (Extreme Loads)

When t0 thread hits 100% on standard NIC, kernel bypass is the only solution:

{| class="wikitable"
|-
! Solution !! Type !! CPU Reduction !! Use Case
|-
| DPDK || Open-source || ~70% || Multi-gigabit on commodity hardware
|-
| Napatech || Hardware SmartNIC || >97% (< 3% at 10Gbit) || Extreme performance requirements
|}

Verify Improvement

# Monitor thread CPU after changes
watch -n 2 "echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket | head -10"

# Or monitor syslog
journalctl -u voipmonitor -f
# t0CPU should drop, heap values should stay < 20%

ℹ️ Note: After changes, monitor syslog heap[A|B|C] values - should stay below 20% during peak traffic. See Syslog_Status_Line for detailed metric explanations.

Storage Hardware Failure

Symptom: Sensor shows disconnected (red X) with "DROPPED PACKETS" at low traffic volumes.

Diagnosis:

# Check disk health
smartctl -a /dev/sda

# Check RAID status (if applicable)
cat /proc/mdstat
mdadm --detail /dev/md0

Look for reallocated sectors, pending sectors, or RAID degraded state. Replace failing disk.

OOM (Out of Memory)

Identify OOM Victim

# Check for OOM kills
dmesg | grep -i "out of memory\|oom\|killed process"
journalctl --since "1 hour ago" | grep -i oom
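
The kernel's full OOM report, including the per-process memory table, is usually more informative than the single "Killed process" line. One way to print the most recent report, assuming the event is still in the ring buffer:

# Show the context around the last OOM event, including the process memory table
dmesg | grep -i -B 5 -A 40 'invoked oom-killer' | tail -60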

MySQL Killed by OOM

Reduce InnoDB buffer pool:

# /etc/mysql/my.cnf
innodb_buffer_pool_size = 2G  # Reduce from default

Voipmonitor Killed by OOM

Reduce buffer sizes in voipmonitor.conf:

max_buffer_mem = 2000  # Reduce from default
ringbuffer = 50        # Reduce from default
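
As a rough budget, total capture buffer memory is approximately (ringbuffer × number of capture interfaces) + max_buffer_mem; with the values above and two capture interfaces that is about (50 × 2) + 2000 ≈ 2100 MB.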

Runaway External Process

# Find memory-hungry processes
ps aux --sort=-%mem | head -20

# Kill orphaned/runaway process
kill -9 <PID>

For servers limited to 16GB RAM or when experiencing repeated MySQL OOM kills:

# /etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf
[mysqld]
# On 16GB server: 6GB buffer pool + 6GB MySQL overhead = 12GB total
# Leaves 4GB for OS + GUI, preventing OOM
innodb_buffer_pool_size = 6G

# Enable write buffering (may lose up to 1s of data on crash but reduces memory pressure)
innodb_flush_log_at_trx_commit = 2

Restart MySQL after changes:

systemctl restart mysql
# or
systemctl restart mariadb

SQL Queue Growth from Non-Call Data

If sip-register, sip-options, or sip-subscribe are enabled, non-call SIP messages (OPTIONS, REGISTER, SUBSCRIBE, NOTIFY) can accumulate in the database and cause the SQL queue to grow without bound. This increases MySQL memory usage and leads to OOM kills of mysqld.

⚠️ Warning: Even with reduced innodb_buffer_pool_size, SQL queue will grow indefinitely without cleanup of non-call data.

Solution: Enable automatic cleanup of old non-call data

# /etc/voipmonitor.conf
# cleandatabase=2555 automatically deletes partitions older than 7 years
# Covers: CDR, register_state, register_failed, and sip_msg (OPTIONS/SUBSCRIBE/NOTIFY)
cleandatabase = 2555

Restart the sniffer after changes:

systemctl restart voipmonitor

ℹ️ Note: See Data_Cleaning for detailed configuration options and other cleandatabase_* parameters.

Service Startup Failures

Interface No Longer Exists

After OS upgrade, interface names may change (eth0 → ensXXX):

# Find current interface names
ip a

# Update all config locations
grep -r "interface" /etc/voipmonitor.conf /etc/voipmonitor.conf.d/

# Also check GUI: Settings → Sensors → Configuration
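
If only the interface name changed, a one-line substitution is enough; the names below (eth0, ens192) are hypothetical, so verify them with ip a first:

# Hypothetical example - replace the old name with the one reported by 'ip a'
sed -i 's/^interface = eth0/interface = ens192/' /etc/voipmonitor.conf
systemctl restart voipmonitor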

Missing Dependencies

# Install common missing package
apt install libpcap0.8  # Debian/Ubuntu
yum install libpcap     # RHEL/CentOS

Network Interface Issues

Promiscuous Mode

Required for SPAN port monitoring:

# Enable
ip link set eth0 promisc on

# Verify
ip link show eth0 | grep PROMISC

ℹ️ Note: Promiscuous mode is NOT required for ERSPAN/GRE tunnels where traffic is addressed to the sensor.
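
Note that ip link set ... promisc on does not survive a reboot. One possible way to make it persistent is a small oneshot systemd unit (a sketch; the unit name and interface are placeholders):

# /etc/systemd/system/promisc-eth0.service (hypothetical example)
[Unit]
Description=Enable promiscuous mode on eth0
After=network.target

[Service]
Type=oneshot
ExecStart=/sbin/ip link set eth0 promisc on

[Install]
WantedBy=multi-user.target

# Then enable it:
# systemctl daemon-reload && systemctl enable --now promisc-eth0.service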

Interface Drops

# Check for drops
ip -s link show eth0 | grep -i drop

# If drops present, increase ring buffer
ethtool -G eth0 rx 4096
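
Before changing the ring size, check the current and maximum values supported by the NIC driver:

# Show current and maximum RX/TX ring sizes
ethtool -g eth0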

Bonded/EtherChannel Interfaces

Symptom: False packet loss when monitoring bond0 or br0.

Solution: Monitor physical interfaces, not logical:

# voipmonitor.conf - use physical interfaces
interface = eth0,eth1

Network Offloading Issues

Symptom: Kernel errors like bad gso: type: 1, size: 1448

# Disable offloading on capture interface
ethtool -K eth0 gso off tso off gro off lro off
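
To confirm the offloads are actually disabled after the change:

# Verify offload state on the capture interface
ethtool -k eth0 | grep -E 'generic-segmentation-offload|tcp-segmentation-offload|generic-receive-offload|large-receive-offload'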

Packet Ordering Issues

If SIP messages appear out of sequence:

First: Rule out Wireshark display artifact - disable "Analyze TCP sequence numbers" in Wireshark. See FAQ.

If genuine reordering: Usually caused by packet bursts in network infrastructure. Use tcpdump to verify packets arrive out of order at the interface. Work with network admin to implement QoS or traffic shaping. For persistent issues, consider dedicated capture card with hardware timestamping (see Napatech).
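
A simple sketch for checking arrival order at the interface: capture a short sample and look for a final response that is time-stamped before the INVITE carrying the same CSeq (the file name and packet count below are arbitrary):

# Capture a short SIP sample, then list arrival order with timestamps
tcpdump -i eth0 -w /tmp/order-check.pcap -c 5000 port 5060
tshark -r /tmp/order-check.pcap -Y "sip" -T fields -e frame.time_relative -e ip.src -e sip.CSeq -e sip.Method -e sip.Status-Code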

ℹ️ Note: For out-of-order packets in client/server mode (multiple sniffers), see Sniffer_distributed_architecture for pcap_queue_dequeu_window_length configuration.

Solutions for SPAN/Mirroring Reordering

If packets arrive out of order at the SPAN/mirror port (e.g., 302 responses before INVITE causing "000 no response" errors):

1. Configure switch to preserve packet order: Many switches allow configuring SPAN/mirror ports to maintain packet ordering. Consult your switch documentation for packet ordering guarantees in mirroring configuration.

2. Replace SPAN with TAP or packet broker: Unlike software-based SPAN mirroring, hardware TAPs and packet brokers guarantee packet order. Consider upgrading to a dedicated TAP or packet broker device for mission-critical monitoring.

Database Issues

SQL Queue Overload

Symptom: Growing SQLq metric, potential coredumps.

# voipmonitor.conf - batch CDR inserts and skip the duplicate Call-ID check
mysqlstore_concat_limit_cdr = 1000
cdr_check_exists_callid = 0
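
To watch whether the queue keeps growing after the change (the log path is /var/log/messages on RHEL-based systems):

# Print only the SQLq[...] field from the status line
tail -f /var/log/syslog | grep -o 'SQLq\[[^]]*\]'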

Error 1062 - Lookup Table Limit

Symptom: Duplicate entry '16777215' for key 'PRIMARY'

Quick fix:

# voipmonitor.conf
cdr_reason_string_enable = no

See Database Troubleshooting for complete solution.

Bad Packet Errors

Symptom: bad packet with ether_type 0xFFFF detected on interface

Diagnosis:

# Run diagnostic (let run 30-60 seconds, then kill)
voipmonitor --check_bad_ether_type=eth0

# Find and kill the diagnostic process
ps ax | grep voipmonitor
kill -9 <PID>

Causes: corrupted packets, driver issues, VLAN tagging problems. Check ethtool -S eth0 for interface errors.

Useful Diagnostic Commands

tshark Filters for SIP

# All SIP INVITEs
tshark -r capture.pcap -Y "sip.Method == INVITE"

# Find specific phone number
tshark -r capture.pcap -Y 'sip contains "5551234567"'

# Get Call-IDs
tshark -r capture.pcap -Y "sip.Method == INVITE" -T fields -e sip.Call-ID

# SIP errors (4xx, 5xx)
tshark -r capture.pcap -Y "sip.Status-Code >= 400"

Interface Statistics

# Detailed NIC stats
ethtool -S eth0

# Watch packet rates
watch -n 1 'cat /proc/net/dev | grep eth0'

See Also

  • Sniffer_configuration - Configuration parameter reference
  • Sniffer_distributed_architecture - Client/server deployment
  • Capture_rules - GUI-based recording rules
  • Sniffing_modes - SPAN, ERSPAN, GRE, TZSP setup
  • Scaling - Performance optimization
  • Database_troubleshooting - Database issues
  • FAQ - Common questions and Wireshark display issues

AI Summary for RAG

Summary

Comprehensive troubleshooting guide for VoIPmonitor sniffer/sensor problems. Covers: verifying traffic reaches interface (tcpdump/tshark), diagnosing no calls recorded (service, config, capture rules, SPAN), missing audio/RTP issues (one-way audio, NAT, natalias, rtp_check_both_sides_by_sdp), PACKETBUFFER FULL errors (I/O vs CPU bottleneck diagnosis using syslog metrics heap/t0CPU/SQLq and Linux tools iostat/iotop/ioping), manager commands for thread monitoring (sniffer_threads via socket or port 5029), t0 single-core capture limit and solutions (DPDK/Napatech kernel bypass), I/O solutions (NVMe/SSD, async writes, pcap_dump_writethreads), CPU solutions (max_buffer_mem 10GB+, jitterbuffer tuning), OOM issues (MySQL buffer pool, voipmonitor buffers), network interface problems (promiscuous mode, drops, offloading), packet ordering, database issues (SQL queue, Error 1062).

Keywords

troubleshooting, sniffer, sensor, no calls, missing audio, one-way audio, RTP, PACKETBUFFER FULL, memory is FULL, buffer saturation, I/O bottleneck, CPU bottleneck, heap, t0CPU, t1CPU, t2CPU, SQLq, comp, tacCPU, iostat, iotop, ioping, sniffer_threads, manager socket, port 5029, thread CPU, t0 thread, single-core limit, DPDK, Napatech, kernel bypass, NVMe, SSD, async write, pcap_dump_writethreads, tar_maxthreads, max_buffer_mem, jitterbuffer, interface_ip_filter, OOM, out of memory, innodb_buffer_pool_size, promiscuous mode, interface drops, ethtool, packet ordering, SPAN, mirror, SQL queue, Error 1062, natalias, NAT, id_sensor, snaplen, capture rules, tcpdump, tshark

Key Questions

  • Why are no calls being recorded in VoIPmonitor?
  • How to diagnose PACKETBUFFER FULL or memory is FULL error?
  • How to determine if bottleneck is I/O or CPU?
  • What do heap values in syslog mean?
  • What does t0CPU percentage indicate?
  • How to use sniffer_threads manager command?
  • How to connect to manager socket or port 5029?
  • What to do when t0 thread is at 100%?
  • How to fix one-way audio or missing RTP?
  • How to configure natalias for NAT?
  • How to increase max_buffer_mem for high traffic?
  • How to disable jitterbuffer to save CPU?
  • What causes OOM kills of voipmonitor or MySQL?
  • How to check disk I/O performance with iostat?
  • How to enable promiscuous mode on interface?
  • How to fix packet ordering issues with SPAN?
  • What is Error 1062 duplicate entry?
  • How to verify traffic reaches capture interface?