{{DISPLAYTITLE:Troubleshooting: No Calls Being Sniffed}}
[[Category:Troubleshooting]]
[[Category:Sensor]]


'''This guide provides a systematic process to diagnose why the VoIPmonitor sensor might not be capturing any calls. Use it to quickly identify and resolve the most common issues.'''


__TOC__
== Quick Diagnostic Flowchart ==
 
Use this flowchart to quickly identify where your problem lies:
 
<kroki lang="mermaid">
flowchart TD
    A[No Calls in GUI] --> B{Is voipmonitor<br/>service running?}
    B -->|No| C[Start service:<br/>systemctl start voipmonitor]
    B -->|Yes| D{Does tshark see<br/>SIP traffic?}
    D -->|No| E[Network/Mirror<br/>Configuration Issue]
    D -->|Yes| F{Check voipmonitor.conf:<br/>interface, sipport, filter}
    F -->|Config OK| G{Check GUI<br/>Capture Rules}
    G -->|Rules OK| H{Database<br/>connection errors?}
    H -->|Yes| I[Fix MySQL connection<br/>in voipmonitor.conf]
    H -->|No| J[Check logs for<br/>other errors]
 
    E --> E1[Verify SPAN/TAP config]
    E --> E2[Check promiscuous mode<br/>for Layer 2 mirrors]
 
    C --> B
 
    style A fill:#ff6b6b
    style C fill:#4ecdc4
    style I fill:#4ecdc4
    style E1 fill:#ffe66d
    style E2 fill:#ffe66d
</kroki>
 
== Is the VoIPmonitor Service Running Correctly? ==
First, confirm the sensor process is active and loaded the correct configuration file.


;1. Check the service status (for modern systemd systems):
<syntaxhighlight lang="bash">systemctl status voipmonitor</syntaxhighlight>
Look for a line that says <code>Active: active (running)</code>. If it is inactive or failed, try restarting it with <code>systemctl restart voipmonitor</code> and check the status again.


;2. Service Fails to Start with "Binary Not Found" After Crash:
If the VoIPmonitor service fails to start after a crash or watchdog restart with an error message indicating the binary cannot be found (e.g., "No such file or directory" for <code>/usr/local/sbin/voipmonitor</code>), the binary may have been renamed with an underscore suffix during the crash recovery process.
 
Check for a renamed binary:
<syntaxhighlight lang="bash">
# Check if the standard binary path exists
ls -l /usr/local/sbin/voipmonitor
 
# If not found, look for a renamed version with underscore suffix
ls -l /usr/local/sbin/voipmonitor_*
</syntaxhighlight>
 
If you find a renamed binary (e.g., <code>voipmonitor_</code>, <code>voipmonitor_20250104</code>, etc.), rename it back to the standard name:
<syntaxhighlight lang="bash">
mv /usr/local/sbin/voipmonitor_ /usr/local/sbin/voipmonitor
</syntaxhighlight>
 
Then restart the service:
<syntaxhighlight lang="bash">
systemctl start voipmonitor
</syntaxhighlight>
 
Verify the service starts correctly:
<syntaxhighlight lang="bash">
systemctl status voipmonitor
</syntaxhighlight>
 
;3. Sensor Becomes Unresponsive After GUI Update:
If the sensor service fails to start or becomes unresponsive after updating a sensor through the Web GUI, the update process may have left the service in a stuck state. The solution is to forcefully stop the service and restart it using these commands:
<syntaxhighlight lang="bash">
# SSH into the sensor host and execute:
killall voipmonitor
systemctl stop voipmonitor
systemctl start voipmonitor
</syntaxhighlight>
After running these commands, verify the sensor status in the GUI to confirm it is responding correctly. This sequence ensures: (1) Any zombie or hung processes are terminated with <code>killall</code>, (2) systemd is fully stopped, and (3) a clean start of the service.
 
;4. Verify the running process:
<syntaxhighlight lang="bash">ps aux | grep voipmonitor</syntaxhighlight>
This command will show the running process and the exact command line arguments it was started with. Critically, ensure it is using the correct configuration file, for example: <code>--config-file /etc/voipmonitor.conf</code>. If it is not, there may be an issue with your startup script.


== Is Network Traffic Reaching the Server? ==
If the service is running, verify if the VoIP packets (SIP/RTP) are actually arriving at the server's network interface. The best tool for this is <code>tshark</code> (the command-line version of Wireshark).


;1. Install tshark:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
apt-get update && apt-get install tshark

# For CentOS/RHEL/AlmaLinux
yum install wireshark
</syntaxhighlight>


;2. Listen for SIP traffic on the correct interface:
Replace <code>eth0</code> with the interface name you have configured in <code>voipmonitor.conf</code>.
<syntaxhighlight lang="bash">
tshark -i eth0 -Y "sip || rtp" -n
</syntaxhighlight>
* '''If you see a continuous stream of SIP and RTP packets''', it means traffic is reaching the server, and the problem is likely in VoIPmonitor's configuration (see [[#Check the VoIPmonitor Configuration|Check the VoIPmonitor Configuration]]).
* '''If you see NO packets''', the problem lies with your network configuration. See [[#Troubleshoot Network and Interface Configuration|Troubleshoot Network and Interface Configuration]].
;3. Advanced: Capture to PCAP File for Definitive Testing
Live monitoring with tshark is useful for observation, but capturing traffic to a .pcap file during a test call provides definitive evidence for troubleshooting intermittent issues or specific call legs.
'''Method 1: Using tcpdump (Recommended)'''
<syntaxhighlight lang="bash">
# Start capture on the correct interface (replace eth0)
tcpdump -i eth0 -s 0 -w /tmp/test_capture.pcap port 5060
# Or capture both SIP and RTP traffic:
tcpdump -i eth0 -s 0 -w /tmp/test_capture.pcap "(port 5060 or udp)"
# Let it run while you make a test call with the missing call leg
# Press Ctrl+C to stop the capture
# Analyze the capture file:
tshark -r /tmp/test_capture.pcap -Y "sip"
</syntaxhighlight>
'''Method 2: Using tshark to capture to file'''
<syntaxhighlight lang="bash">
# Start capture:
tshark -i eth0 -w /tmp/test_capture.pcap -f "tcp port 5060 or udp"
# Make your test call, then press Ctrl+C to stop
# Analyze the capture:
tshark -r /tmp/test_capture.pcap -Y "sip" -V
</syntaxhighlight>
'''Decision Tree for PCAP Analysis:'''
After capturing a test call known to have a missing leg:
* '''If SIP packets are missing from the .pcap file:'''
** The problem is with your network mirroring configuration (SPAN/TAP port, AWS Traffic Mirroring, etc.)
** The packets never reached the VoIPmonitor sensor's network interface
** Fix the switch mirroring setup or infrastructure configuration first
* '''If SIP packets ARE present in the .pcap file but missing in the VoIPmonitor GUI:'''
** The problem is with VoIPmonitor's configuration or processing
** Packets reached the NIC but were not processed correctly
** Review [[#Check the VoIPmonitor Configuration|VoIPmonitor Configuration]] and [[#Check GUI Capture Rules (Causing Call Stops)|Capture Rules]]
'''Example Test Call Workflow:'''
<syntaxhighlight lang="bash">
# 1. Start capture
tcpdump -i eth0 -s 0 -w /tmp/test.pcap "port 5060 and host 10.0.1.100"
# 2. Make a test call from phone at 10.0.1.100 to 10.0.2.200
#    (a call that you know should have recordings but is missing)
# 3. Stop capture (Ctrl+C)
# 4. Check for the specific call's Call-ID
tshark -r /tmp/test.pcap -Y "sip" -T fields -e sip.Call-ID
# 5. Verify if packets for both A-leg and B-leg exist
tshark -r /tmp/test.pcap -Y "sip && ip.addr == 10.0.1.100"
# 6. Compare results with VoIPmonitor GUI
#    - If packets found in .pcap: VoIPmonitor software issue
#    - If packets missing from .pcap: Network mirroring issue
</syntaxhighlight>
=== Decision Tree - When SIP Packets Are Present but Not Captured ===
If <code>tshark</code> confirms SIP packets ARE present in the .pcap file but VoIPmonitor GUI shows no calls, follow this systematic troubleshooting approach:
* '''Check VoIPmonitor Configuration:''' Verify <code>interface</code>, <code>sipport</code>, and <code>filter</code> settings in <code>/etc/voipmonitor.conf</code>
* '''Check Capture Rules:''' GUI "Capture Rules" with Skip:ON can silently block traffic
* '''Check for Encapsulation:''' Traffic may be wrapped in ERSPAN, GRE, VXLAN, or VLAN tags (see [[#Check for Network Encapsulation Issues|Check for Encapsulation]] below)
* '''Check for Packet Drops:''' Review GUI → Settings → Sensors for drop counters
* '''Check Database Connection:''' Sensor may be processing calls but unable to write to MySQL
== Check for Network Encapsulation Issues ==
If <code>tshark</code> confirms traffic is reaching the interface but VoIPmonitor is not capturing it, network encapsulation may be the cause. Common encapsulation protocols like ERSPAN, GRE, VXLAN, or TZSP wrap the original VoIP packets in an additional protocol layer.
=== Symptoms of Encapsulation Issues ===
* <code>tcpdump</code> shows SIP/RTP packets arriving at the interface
* Packets appear with unexpected outer headers (tunnel protocol, VLAN tags)
* VoIPmonitor CDRs are empty despite confirmed traffic presence
* Traffic works on the network but is invisible to VoIPmonitor
=== Diagnostic Step 1: Use tshark to Detect Encapsulation ===
Analyze your capture file to identify the protocol layers:
<syntaxhighlight lang="bash">
# Show protocol hierarchy (all protocols present in the capture)
tshark -r /tmp/test_capture.pcap -q -z io,phs
# Look for these encapsulation protocols in the output:
# - vlan (802.1Q VLAN tags)
# - gre (Generic Routing Encapsulation)
# - erspan (Cisco ERSPAN)
# - vxlan (Virtual Extensible LAN)
# - tzsp (TaZmen Sniffer Protocol)
</syntaxhighlight>
=== Diagnostic Step 2: Visual Inspection with tshark ===
Check individual packets to see the full protocol stack:
<syntaxhighlight lang="bash">
# Show the first few packets with protocol details
tshark -r /tmp/test_capture.pcap -V -c 3 | head -50
# Look for nested protocol chains like:
# Ethernet -> VLAN -> IP -> UDP -> SIP
# Ethernet -> GRE -> IP -> UDP -> SIP
# Ethernet -> ERSPAN -> IP -> UDP -> SIP
</syntaxhighlight>
=== Diagnostic Step 3: Check for VLAN Tags Specifically ===
VLAN tags are the most common form of encapsulation:
<syntaxhighlight lang="bash">
# Look for VLAN tagging in packet headers
tshark -r /tmp/test_capture.pcap -Y "vlan" -T fields -e vlan.id | head -20
# Show which VLAN IDs are carrying your SIP traffic
tshark -r /tmp/test_capture.pcap -Y "sip" -T fields -e frame.protocols | head -20
</syntaxhighlight>
=== Encapsulation Types and VoIPmonitor Handling ===
{| class="wikitable"
|-
! Encapsulation Type !! VoIPmonitor Support !! Configuration Notes
|-
| VLAN (802.1Q tags) || '''Automatic''' || Configure mirror port in "trunk" mode to allow tagged traffic. No special VoIPmonitor configuration needed.
|-
| ERSPAN || '''Automatic''' || VoIPmonitor decapsulates ERSPAN packets automatically. No configuration needed.
|-
| GRE || '''Automatic''' || VoIPmonitor decapsulates GRE packets automatically. No configuration needed.
|-
| VXLAN || '''Automatic''' || VoIPmonitor decapsulates VXLAN packets automatically. No configuration needed.
|-
| TZSP || '''Automatic''' || VoIPmonitor decapsulates TZSP packets automatically. No configuration needed.
|}
=== Additional Configuration for Encapsulated Traffic ===
While VoIPmonitor handles most encapsulation automatically, verify these settings:
;1. For VLAN-tagged traffic with BPF filters:
If using a custom <code>filter</code> in <code>voipmonitor.conf</code>, be aware that some BPF expressions may drop VLAN-tagged packets. Test by temporarily disabling the filter:
<pre>
# Temporarily comment out the filter line in voipmonitor.conf
# filter = udp port 5060  <-- comment this out
# Restart the sensor
systemctl restart voipmonitor
# If traffic appears, the filter was the problem
# Update filter to be VLAN-aware or remove it entirely
</pre>
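If the filter turns out to be the cause, a commonly used VLAN-aware pattern duplicates each condition under the <code>vlan</code> qualifier. This is a sketch; verify it against your own traffic before re-enabling:
<syntaxhighlight lang="ini">
# Sketch of a VLAN-aware BPF filter in voipmonitor.conf.
# Plain "udp port 5060" does not match 802.1Q-tagged frames; repeating the
# condition after the "vlan" qualifier covers both tagged and untagged traffic.
filter = udp port 5060 or (vlan and udp port 5060)
</syntaxhighlight>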
;2. Mirror port configuration for VLANs:
Ensure your switch mirror port is configured in trunk mode to allow all VLAN tags:
<pre>
# Check switch configuration (command varies by vendor/switch OS)
# Example (Cisco):
show monitor session 1
# Verify: "Allowed VLANs: 1-4094" or specific VLAN range containing your SIP traffic
</pre>
;3. Verify interface configuration:
If traffic uses encapsulation, confirm your <code>voipmonitor.conf</code> <code>interface</code> setting matches the configured mirror port:
<pre>
interface = eth0  # This should match your switch mirror destination port
</pre>
=== Troubleshooting Decision Tree for Encapsulation ===
If encapsulation is detected in the .pcap file:
* '''If VLAN tags are present:'''
** Check if your mirror port is in trunk mode (not access mode)
** Verify VLAN IDs match expected SIP traffic VLANs
** Check BPF filter - some filters like "udp" drop VLAN-tagged packets
* '''If tunnel protocols (GRE/ERSPAN/VXLAN) are present:'''
** This is expected - VoIPmonitor should handle these automatically
** Verify the sensor is running on an interface where tunnel traffic is delivered
** Check for packet drops in GUI → Settings → Sensors
* '''If packets show NO encapsulation:'''
** The problem is not encapsulation-related
** Review other troubleshooting sections: voipmonitor.conf, filter settings, capture rules, database connection
=== When to Contact Support ===
If encapsulation is detected but VoIPmonitor still does not capture the traffic after verifying that:
# Your VoIPmonitor version is up to date
# Packets are confirmed visible with <code>tshark</code>
# The <code>interface</code> parameter matches your configuration
# No capture rules are blocking the traffic
# Encapsulation is one of the supported types (VLAN, ERSPAN, GRE, VXLAN, TZSP)
Then gather and provide this information to VoIPmonitor support (a collection sketch follows the list):
* The <code>/etc/voipmonitor.conf</code> file
* A sample <code>.pcap</code> capture file created with:
  <code>tcpdump -i eth0 -s 0 -w traffic.pcap port 5060</code>
* Output of protocol analysis:
  <code>tshark -r traffic.pcap -q -z io,phs</code>
* VoIPmonitor version
* Any relevant errors from sensor logs
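A convenience sketch that bundles the items above into a single archive. The interface name, capture duration, and output paths are assumptions; adjust them to your environment:
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: collect the support items listed above into one tarball.
# Assumes interface eth0 and a 60-second capture window.
OUT=/tmp/voipmonitor-support-$(date +%Y%m%d%H%M%S)
mkdir -p "$OUT"

cp /etc/voipmonitor.conf "$OUT/"
timeout 60 tcpdump -i eth0 -s 0 -w "$OUT/traffic.pcap" port 5060
tshark -r "$OUT/traffic.pcap" -q -z io,phs > "$OUT/protocol_hierarchy.txt"
journalctl -u voipmonitor --since "24 hours ago" > "$OUT/sensor_logs.txt"
# The version flag may differ by build; the startup log also prints the version
voipmonitor --version > "$OUT/version.txt" 2>&1 || true

tar -czf "$OUT.tar.gz" -C /tmp "$(basename "$OUT")"
echo "Support bundle ready: $OUT.tar.gz"
</syntaxhighlight>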
For more information on deployment with traffic forwarding, see [[Sniffer_distributed_architecture|Distributed Architecture Guide]].
== Check Sensor Statistics in GUI ==
If <code>tshark</code> confirms traffic is reaching the interface, use the VoIPmonitor GUI to verify the sensor is processing packets without drops.
;1. Navigate to '''Settings → Sensors'''
:Expand the sensor details to view real-time capture statistics.
;2. Check the '''# packet drops''' counter:
:This counter should ideally be '''0'''. If it shows a value other than zero, the sensor is dropping packets due to processing bottlenecks, insufficient buffer memory, or hardware limitations.
;3. Common causes of packet drops:
{| class="wikitable"
|-
! Symptom !! Likely Cause !! Solution
|-
| Drops increase with high traffic | Insufficient buffer memory | Increase <code>ringbuffer</code> or <code>max_buffer_mem</code> in <code>voipmonitor.conf</code><br/>See [[Scaling]] for tuning guidance
|-
| Consistent drops at moderate traffic | CPU bottleneck | Check sensor CPU utilization; consider dedicating cores with <code>cpu_affinity</code>
|-
| Drops only on specific interfaces | Hardware/driver issue | Verify interface driver; check for errors with <code>ethtool -S eth0</code>
|-
| Drops after configuration change | New filter or feature overload | Remove or simplify BPF filters, disable unnecessary features
|}
;4. Other useful sensor statistics:
{| class="wikitable"
|-
! Metric !! Description
|-
| '''Packets/sec''' | Current capture rate
|-
| '''Bytes/sec''' | Current bandwidth utilization
|-
| '''Calls/sec''' | Call processing rate
|-
| '''Graph''' | Real-time graph of capture rates over time
|}
For detailed performance metrics beyond basic statistics, see [[Understanding_the_Sniffer's_Performance_Log]].
== Troubleshoot Random Missing Calls in Distributed NFS Setup ==
If you are experiencing random call recordings missing in a distributed setup with a central instance and remote sensors using NFS for the spool directory, and you see errors like "No such file or directory" for .qoq or other files, follow this diagnostic workflow.
=== Step 1: Collect System Logs ===
First, gather the relevant logs to understand what is happening when files are missing.
<syntaxhighlight lang="bash">
# Collect full syslog or VoIPMonitor service logs from the affected sensor
# For the day when the issue occurred
# For Debian/Ubuntu (systemd journal)
journalctl -u voipmonitor --since "yesterday" --until "today" > /tmp/voipmonitor_logs_yesterday.txt
# For CentOS/RHEL/AlmaLinux
tail -n 10000 /var/log/messages | grep voipmonitor > /tmp/voipmonitor_recent.txt
# Or capture specific time window
journalctl -u voipmonitor --since "2025-01-05 08:00:00" --until "2025-01-05 18:00:00" > /tmp/voipmonitor_issue_window.txt
</syntaxhighlight>
Look for error messages like:
* <code>No such file or directory</code> for .qoq, .pcap, or other files
* NFS timeout or disconnection errors
* File write failure messages
* I/O errors
=== Step 1b: Check for NFS Timeout Errors (Critical for Missing Data) ===
If you are missing call data (CDRs) and PCAP files for a specific time period, and your spooldir is on NFS, the most common cause is an NFS timeout or disconnection. Check system logs specifically for NFS server not responding errors.
<syntaxhighlight lang="bash">
# Check for NFS timeout errors in system logs
journalctl --since "yesterday" --until "today" | grep -i "nfs.*server.*not responding"
# Alternative: Check /var/log/messages or system syslog
grep -i "nfs.*server.*not responding" /var/log/messages
# Look for NFS timeout patterns in kernel messages
dmesg -T | grep -i "nfs.*timeout"
# Search broader NFS errors during the missing data period
journalctl --since "2025-01-05 08:00:00" --until "2025-01-05 18:00:00" | grep -E "nfs.*error|i/o error.*nfs"
</syntaxhighlight>
'''Key NFS Error Messages to Look For:'''
* <code>nfs: server [IP_ADDRESS] not responding, timed out</code> - NFS server unreachable
* <code>nfs: server [IP_ADDRESS] OK</code> (after timeout) - Connection restored
* <code>NFS: server [IP_ADDRESS] not responding, still trying</code> - Intermittent connection
* <code>I/O error accessing /var/spool/voipmonitor...</code> - NFS write failure
If you find these errors, the root cause is the NFS server becoming unreachable during that time period. The voipmonitor sensor was running but could not write files to the NFS share, resulting in missing data.
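To catch future outages as they happen, a minimal watchdog can log spool reachability once a minute. The touch test mirrors the check used elsewhere in this guide; the script path and log file are assumptions:
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: run from cron every minute, e.g.
#   * * * * * /usr/local/bin/nfs-spool-check.sh
# Timestamped OK/FAIL lines make it easy to correlate NFS outages
# with gaps in CDRs and PCAP files.
LOG=/var/log/nfs-spool-check.log
if touch /var/spool/voipmonitor/.nfs_check.$$ 2>/dev/null; then
    rm -f /var/spool/voipmonitor/.nfs_check.$$
    echo "$(date -Is) OK" >> "$LOG"
else
    echo "$(date -Is) FAIL: spool not writable (NFS unreachable?)" >> "$LOG"
fi
</syntaxhighlight>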
=== Step 1c: Verify NFS Server Connectivity During Missing Data Period ===
After identifying NFS timeout errors, verify network connectivity to the NFS server.
<syntaxhighlight lang="bash">
# Check if NFS server is currently reachable
ping -c 5 <NFS_SERVER_IP>
# Verify NFS port is accessible (NFS uses port 2049)
nc -zv <NFS_SERVER_IP> 2049
# Test route to NFS server
traceroute <NFS_SERVER_IP>
# Check if NFS mount is still accessible
touch /var/spool/voipmonitor/test_connectivity.$$ && rm /var/spool/voipmonitor/test_connectivity.$$
# If this fails, NFS mount is likely not functioning
</syntaxhighlight>
;If NFS server connectivity is unstable or unreachable during the missing data period:
* Check network infrastructure (switches, firewalls) between sensor and NFS server
* Verify NFS server is running and not overloaded during that time period
* Consider increasing NFS mount timeout options (see Step 4 below)
* For critical deployments, consider using local storage for spooldir with periodic archive to NFS (see the sketch below)
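A minimal sketch of the local-spool-plus-archive approach. All paths are hypothetical; the age filter avoids moving files the sensor may still be writing:
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: spooldir on fast local disk, hourly cron moves closed files to NFS.
# Only files untouched for 30+ minutes are moved, so open files stay put.
SRC=/var/spool/voipmonitor
DST=/mnt/nfs-archive/voipmonitor
cd "$SRC" || exit 1
find . -type f -mmin +30 -print0 |
    rsync -a --files-from=- --from0 --remove-source-files . "$DST/"
</syntaxhighlight>
Note that files moved out of the spool directory are no longer visible to the GUI's normal PCAP lookup, so this pattern fits archival rather than day-to-day browsing.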
=== Step 2: Check Packet Drops in GUI ===
Verify that the sensor is not experiencing packet drops, which could indicate performance issues that lead to incomplete file writes.
;1. Navigate to '''Settings → Sensors'''
:Expand the sensor details to view real-time capture statistics.
;2. Check the '''# packet drops''' counter:
:If this counter is increasing, the sensor may be dropping packets, which can lead to incomplete call recordings that manifest as "missing calls."
If packet drops are occurring, see the [[#Check Sensor Statistics in GUI|Sensor Statistics section]] above for causes and solutions.
=== Step 3: Verify Spool Directory Configuration and Disk Space ===
Check if the spool directory is correctly configured and has sufficient space. Issues here can cause file write failures that appear as missing calls.
<syntaxhighlight lang="bash">
# Verify spool directory configuration
cat /etc/voipmonitor.conf | grep -E "^spooldir|^maxpool"
# Check available disk space on the spool directory
df -h /var/spool/voipmonitor
# Check actual disk usage
du -hs /var/spool/voipmonitor
# Check available inodes (can be exhausted even with free space)
df -i /var/spool/voipmonitor
</syntaxhighlight>
;Common issues to check:
{| class="wikitable"
|-
! Issue !! Symptom !! Solution
|-
| Disk full or near capacity | Use% shows 90-100% in df -h | Clean up old data, adjust maxpool settings
|-
| No inodes available | IUse% shows 100% in df -i | Clean up many small files, consider enabling TAR archiving
|-
| NFS mount disconnected | df shows stale mount, mount shows transport endpoint not connected | Check NFS server connectivity, remount
|-
| Incorrect spooldir path | grep shows wrong path or no output | Edit voipmonitor.conf and restart sensor
|}
=== Step 4: Check NFS Mount Stability ===
If the spool is on NFS, network issues or NFS server problems can cause file operations to fail.
<syntaxhighlight lang="bash">
# Check if NFS mount is still accessible
touch /var/spool/voipmonitor/testfile.$$ && rm /var/spool/voipmonitor/testfile.$$
# If this fails, NFS mount is likely not functioning
# Check mount options for the NFS spool
mount | grep nfs
# Look for: soft/intr (can cause failures), hard defaults
# Recommended: hard,nfsvers=3,timeo=600
# Test NFS write latency
time dd if=/dev/zero of=/var/spool/voipmonitor/test_write bs=1M count=10
rm /var/spool/voipmonitor/test_write
# Check NFS server logs for errors (on NFS server)
tail -f /var/log/syslog | grep -i nfs
</syntaxhighlight>
;Recommended NFS mount options for VoIPmonitor spooldir:
<pre>
mount -t nfs -o hard,nfsvers=3,timeo=600,retrans=2,rsize=1048576,wsize=1048576 192.168.1.10:/export/voipmonitor /var/spool/voipmonitor
</pre>

These options provide:
* <code>hard</code> - Guarantees that write operations complete (do not silently fail)
* <code>nfsvers=3</code> - Use stable NFSv3 instead of v4 (fewer edge cases)
* <code>timeo=600</code> - Increase timeout to 60 seconds (handles slow storage)
* Large <code>rsize</code>/<code>wsize</code> - Better throughput for large file operations
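To make these options persistent across reboots, the equivalent <code>/etc/fstab</code> entry would look like this (server and export path taken from the example above):
<pre>
# /etc/fstab - one line, fields separated by whitespace
192.168.1.10:/export/voipmonitor  /var/spool/voipmonitor  nfs  hard,nfsvers=3,timeo=600,retrans=2,rsize=1048576,wsize=1048576  0  0
</pre>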
 
=== Step 5: Check for I/O Bottlenecks ===
 
Slow I/O to the NFS spool can cause the sensor's buffers to fill up, leading to incomplete files that later appear as "missing calls."
 
<syntaxhighlight lang="bash">
# Check current disk I/O latency
ioping -c 10 /var/spool/voipmonitor
 
# Monitor I/O wait during heavy call traffic
iostat -x 2 10
 
# Check for NFS-specific latency
time cat /var/spool/voipmonitor/test_10mb.file > /dev/null
</syntaxhighlight>
 
If I/O latency is high (>50ms average for ioping):
* Check network bandwidth to NFS server
* Verify NFS server is not overloaded
* Consider local disk for spooldir instead of NFS
* See [[IO_Measurement]] for detailed benchmarking
 
=== Step 6: Verify TAR Archiving Configuration ===
 
TAR archiving reduces the number of small file operations on NFS, which is critical for reducing IOPS and preventing file write failures.
 
<syntaxhighlight lang="bash">
# Check if TAR archiving is enabled
grep -E "^tar\s*=" /etc/voipmonitor.conf
 
# Recommended configuration:
# tar = yes
# tar_sip_size = 100 (100M per archive, balanced)
# tar_graph_size = 100
</syntaxhighlight>
 
If TAR archiving is not enabled, strongly consider enabling it in distributed NFS setups to reduce I/O load. See [[Sniffer_configuration#PCAP/TAR_Storage_Strategy]] for details.
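A minimal <code>voipmonitor.conf</code> sketch based on the values suggested above:
<syntaxhighlight lang="ini">
# Enable TAR archiving to cut per-call file operations on NFS
tar = yes
tar_sip_size = 100     # ~100 MB per SIP archive (balanced)
tar_graph_size = 100
</syntaxhighlight>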
 
=== Step 7: Check cachedir and Cross-Filesystem Issues ===
 
If <code>cachedir</code> points to a different filesystem (e.g., <code>/dev/shm</code> for RAM) while <code>spooldir</code> is on NFS, the cross-filesystem move operation can fail under load.
 
<syntaxhighlight lang="bash">
# Check cachedir configuration
grep -E "^cachedir\s*=" /etc/voipmonitor.conf
 
# Check if cachedir and spooldir are on different filesystems
df /var/spool/voipmonitor
df /dev/shm  # if cachedir = /dev/shm
</syntaxhighlight>
 
;If cachedir is on a different filesystem than spooldir:
* '''Option A (Recommended):''' Disable cachedir for NFS spooldir - set <code>cachedir =</code> (empty)
* '''Option B:''' Move cachedir to the same NFS mount point - e.g., <code>cachedir = /var/spool/voipmonitor/cache</code>
 
This ensures atomic rename operations instead of cross-filesystem copies.
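A quick way to confirm both directories are on the same filesystem is to compare their device IDs (paths shown match the examples above; adjust if yours differ):
<syntaxhighlight lang="bash">
# Different device numbers = different filesystems, so the final move
# is a copy + delete rather than an atomic rename.
stat -c '%n is on device %d' /var/spool/voipmonitor /dev/shm
</syntaxhighlight>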
 
== Troubleshoot S3 Cloud Storage Mounting Issues ==
 
If you are using S3 (or compatible cloud storage) mounted as a filesystem for the VoIPmonitor spool directory, the choice of mounting tool significantly affects stability and performance. The manager interface may become unresponsive or the sensor may experience performance issues depending on how the storage is mounted.
 
=== Symptoms of S3 Mounting Issues ===
 
When using problematic S3 mounting solutions, you may experience:
* Manager interface becomes unresponsive or slow during tar_move operations
* High latency when reading or writing call recording files
* System hangs or timeouts during file operations
* Incomplete TAR archives or failed tar_move operations
 
=== Recommended Solution: Use rclone Instead of s3fs ===
 
The s3fs FUSE tool has known limitations with high-frequency file operations and can cause the manager interface to become unresponsive when tar_move is used with S3 storage.
 
Use rclone instead for better performance and stability. rclone handles S3 operations more efficiently and is the recommended mounting tool for production deployments.
 
=== rclone Mount Configuration Example ===
 
A working rclone mount command for S3 storage:
 
<syntaxhighlight lang="bash">
/usr/bin/rclone mount spr-prod0-voipmonitor /mnt1/spool-backup \
  --allow-other \
  --dir-cache-time 30s \
  --poll-interval 0 \
  --vfs-cache-mode off \
  --buffer-size 0 \
  --use-server-modtime \
  --no-modtime \
  --s3-no-head \
  --log-level INFO
</syntaxhighlight>
 
=== Key rclone Mount Parameters ===
 
The recommended mount options are optimized for VoIPmonitor workloads:
 
{| class="wikitable"
|-
! Parameter !! Purpose
|-
| <code>--allow-other</code> || Allows other users (non-root) to access the mount
|-
| <code>--dir-cache-time 30s</code> || Directory cache timeout for performance
|-
| <code>--poll-interval 0</code> || Disable polling, rclone uses kernel notifications
|-
| <code>--vfs-cache-mode off</code> || Disable VFS caching for better consistency
|-
| <code>--buffer-size 0</code> || No buffering, direct I/O to S3
|-
| <code>--use-server-modtime</code> || Use S3 server timestamps
|-
| <code>--no-modtime</code> || Don't modify file timestamps
|-
| <code>--s3-no-head</code> || Optimize S3 HEAD requests (no metadata checks)
|-
| <code>--log-level INFO</code> || Set appropriate logging level
|}
 
=== Why s3fs Is Not Recommended ===
 
The s3fs mounting tool has several limitations that make it unsuitable for VoIPmonitor workloads with tar_move:
 
* '''Performance Issues:''' s3fs uses FUSE with inefficient caching, leading to slow file operations
* '''Unresponsive Manager Interface:''' During high-frequency operations (e.g., tar_move), s3fs can cause the system to hang or become unresponsive
* '''Limited Concurrency:''' s3fs cannot handle parallel file operations reliably
* '''Metadata Overhead:''' Excessive stat() and HEAD operations to S3 slow down operations
* '''Inconsistent Behavior:''' File handles may timeout or fail during TAR archiving operations
 
When switching from s3fs to rclone, users typically see immediate improvements in stability and responsiveness.
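For production use, running the mount under systemd keeps it supervised and restarts it on failure. A sketch, assuming the remote and mount point from the example above (the unit name is hypothetical; recent rclone versions notify systemd when the mount is ready):
<syntaxhighlight lang="ini">
# /etc/systemd/system/rclone-spool.service (sketch)
[Unit]
Description=rclone S3 mount for VoIPmonitor spool backup
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount spr-prod0-voipmonitor /mnt1/spool-backup \
  --allow-other --dir-cache-time 30s --poll-interval 0 \
  --vfs-cache-mode off --buffer-size 0 --use-server-modtime \
  --no-modtime --s3-no-head --log-level INFO
# fusermount may live in /usr/bin on some distributions
ExecStop=/bin/fusermount -u /mnt1/spool-backup
Restart=on-failure

[Install]
WantedBy=multi-user.target
</syntaxhighlight>
Enable it with <code>systemctl enable --now rclone-spool</code>.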
 
=== Troubleshooting rclone Mounts ===
 
If you encounter issues with an rclone-mounted S3 storage:
 
;1. Check if mount is active:
<syntaxhighlight lang="bash">
mount | grep rclone
df -h /mnt1/spool-backup  # Check mount point exists and reports storage
</syntaxhighlight>
 
;2. Test write performance to ensure mount is functional:
<syntaxhighlight lang="bash">
time dd if=/dev/zero of=/mnt1/spool-backup/test_write bs=1M count=10
rm /mnt1/spool-backup/test_write
</syntaxhighlight>
 
;3. Check rclone logs for errors:
<syntaxhighlight lang="bash">
journalctl -u rclone -f  # If running as systemd service
# Check terminal output if mounted manually
</syntaxhighlight>
 
;4. Verify tar_move functionality:
After switching to rclone, test that tar_move operations complete without causing manager interface hangs:
* Monitor the manager interface responsiveness during TAR operations
* Check system logs for rclone errors or timeouts
* Verify TAR archives are created successfully
 
=== Alternative: Use tar_rotate for Direct S3 Upload ===
 
Instead of mounting S3 locally, consider using tar_rotate to archive directly to S3 if your VoIPmonitor version supports it. This eliminates filesystem mounting overhead and provides more reliable cloud storage integration.
 
For tar_rotate configuration, see [[Sniffer_configuration#PCAP/TAR_Storage_Strategy]].
 
== Troubleshoot Network and Interface Configuration ==
If <code>tshark</code> shows no traffic, it means the packets are not being delivered to the operating system correctly.


;1. Check if the interface is UP:
Ensure the network interface is active.
<syntaxhighlight lang="bash">ip link show eth0</syntaxhighlight>
The output should contain the word <code>UP</code>. If it doesn't, bring it up with:
<syntaxhighlight lang="bash">ip link set dev eth0 up</syntaxhighlight>


;2. Check for Promiscuous Mode (for SPAN/RSPAN Mirrored Traffic):
'''Important:''' Promiscuous mode requirements depend on your traffic mirroring method:

{| class="wikitable"
|-
! Mirroring Method !! Promiscuous Mode Required? !! Reason
|-
| SPAN/RSPAN (Layer 2) || '''Yes''' || Mirrored packets retain original MAC addresses; interface must accept all packets
|-
| ERSPAN/GRE/TZSP/VXLAN (Layer 3) || '''No''' || Tunneled traffic is addressed directly to sensor's IP; VoIPmonitor decapsulates automatically
|}


For SPAN/RSPAN deployments, check the current promiscuous mode status:
<syntaxhighlight lang="bash">ip link show eth0</syntaxhighlight>
Look for the <code>PROMISC</code> flag.


Enable promiscuous mode manually if needed:
<syntaxhighlight lang="bash">ip link set eth0 promisc on</syntaxhighlight>
If this solves the problem, you should make the change permanent. The <code>install-script.sh</code> for the sensor usually attempts to do this, but it can fail.
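One common way to make promiscuous mode persistent is a small systemd unit (a sketch; adjust the interface name to match yours):
<syntaxhighlight lang="ini">
# /etc/systemd/system/promisc-eth0.service (sketch)
[Unit]
Description=Enable promiscuous mode on eth0
After=network.target

[Service]
Type=oneshot
ExecStart=/sbin/ip link set eth0 promisc on
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
</syntaxhighlight>
Enable it with <code>systemctl enable --now promisc-eth0</code>.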


;3. Verify Your SPAN/Mirror/TAP Configuration:
* If you are monitoring traffic across different VLANs, ensure your mirror port is configured to carry all necessary VLAN tags (often called "trunk" mode).


=== Troubleshoot Missing Call Legs or Specific SIP Packets for Certain IP Addresses ===
If the sensor is capturing most calls successfully but missing specific call legs or SIP packets (like INVITEs) for certain IP addresses, particularly during high-traffic periods, this typically indicates an incomplete SPAN/port mirroring configuration.
 
;1. Identify the specific IP addresses or call flows where packets are missing:
Use the GUI CDR view to identify which IP addresses or call flows show incomplete SIP traces. Note the source and destination IP addresses that are consistently missing from the capture.
 
;2. Verify if packets are reaching the sensor interface:
Use <code>tcpdump</code> to verify if the missing packets are arriving at the server's network interface:
<syntaxhighlight lang="bash">
# Capture packets for the specific IP address during a test call
tcpdump -i eth0 -n "port 5060 and host <TARGET_IP>"
 
# Or capture to a file for detailed analysis
tcpdump -i eth0 -s 0 -w /tmp/missing_packets.pcap "host <SOURCE_IP> and host <DEST_IP>"
</syntaxhighlight>
 
{| class="wikitable"
|-
! tcpdump shows packets? | Diagnosis | Next Steps
|-
| YES | Packets are reaching the NIC | Issue is in VoIPmonitor configuration ([[#Check the VoIPmonitor Configuration|see below]])
|-
| NO | Packets not reaching the interface | Continue to Step 3 (SPAN configuration issue)
|}
 
;3. Investigate the network switch's SPAN configuration:
If <code>tcpdump</code> confirms packets are not present on the interface, the switch's port mirroring configuration is incomplete. Log into your network switch and verify the SPAN/mirror session settings.
 
{| class="wikitable"
|-
! Common SPAN Configuration Issue !! Symptoms !! Solution
|-
| Monitoring only ingress traffic on source ports | See packets from PBX to network, but not vice versa | Configure SPAN to monitor '''both ingress (inbound) and egress (outbound)''' traffic on source ports
|-
| Source port not included in mirror session | Specific IP addresses always missing from capture | Add the switch port where that IP is connected to the source port list
|-
| VLAN trunk not configured correctly | Tags stripped or filtered | Ensure destination mirror port is in trunk mode to carry all VLANs
|-
| Bandwidth limitation during high traffic | Missing packets only during peak hours | Verify switch mirror bandwidth capacity is not saturated
|}
 
;4. Ensure bidirectional capture for critical IP addresses:
For proper call leg reconstruction, the SPAN configuration must capture traffic in both directions. Verify your switch configuration captures:
* '''Inbound (ingress) traffic:''' Packets arriving at monitored ports (e.g., from PBX to switch)
* '''Outbound (egress) traffic:''' Packets leaving monitored ports (e.g., from switch to PBX)
 
Example SPAN session verification (syntax varies by switch vendor):
<syntaxhighlight lang="text">
# Cisco IOS example - verify bidirectional mirroring
show monitor session 1
# Expected: "Direction : Both" or "Ingress: Enabled, Egress: Enabled"


# Verify source ports include all necessary PBX/SBC ports
;1. Check the `interface` directive:
Source Ports:
:Make sure the `interface` parameter in `/etc/voipmonitor.conf` exactly matches the interface where you see traffic with `tshark`. For example: `interface = eth0`.
  Both  : Gi0/1 (PBX), Gi0/5 (SBC)
  ...


# Verify destination port
;2. Check the `sipport` directive:
Destination Ports:
  Gi0/10 (VoIPmonitor sensor)
</syntaxhighlight>
 
If your SPAN session only shows "Ingress" or "Egress" but not "Both", reconfigure the session to monitor both directions. Consult your switch vendor documentation or network administrator to update the SPAN configuration; this must be done on the switch, not within VoIPmonitor.
 
;5. Validate the fix:
After updating the SPAN configuration, repeat the <code>tcpdump</code> verification from Step 2. You should now see packets for the previously missing IP addresses in both directions.
 
<syntaxhighlight lang="bash">
# Verify bidirectional capture
tcpdump -i eth0 -n "port 5060 and host <TARGET_IP>"
 
# Should show:
# - SIP INVITE from SOURCE to DEST (outbound from source port)
# - SIP 200 OK from DEST to SOURCE (inbound to source port)
# - SIP ACK from SOURCE to DEST (outbound)
# - BYE and responses in both directions
</syntaxhighlight>
 
== Check the VoIPmonitor Configuration ==
If <code>tshark</code> sees traffic but VoIPmonitor does not, the problem is almost certainly in <code>voipmonitor.conf</code>.

;1. Check the <code>interface</code> directive:
:Make sure the <code>interface</code> parameter in <code>/etc/voipmonitor.conf</code> exactly matches the interface where you see traffic with <code>tshark</code>. For example: <code>interface = eth0</code>.

;2. Check the <code>sipport</code> directive:
:By default, VoIPmonitor only listens on port 5060. If your PBX uses a different port for SIP, you must add it. For example:
:<code>sipport = 5060,5080</code>


;3. Check for a restrictive <code>filter</code>:
:If you have a BPF <code>filter</code> configured, ensure it is not accidentally excluding the traffic you want to see. For debugging, try commenting out the <code>filter</code> line entirely and restarting the sensor.
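Before re-enabling a filter, you can compile-check the expression with tcpdump and confirm it still matches live traffic (the expression shown is only an example):
<syntaxhighlight lang="bash">
# Compile-check only: a syntax error aborts, otherwise the BPF program prints
tcpdump -i eth0 -d "udp port 5060 or (vlan and udp port 5060)" | head -5

# Live check: the expected SIP packets should appear
tcpdump -i eth0 -c 20 -n "udp port 5060 or (vlan and udp port 5060)"
</syntaxhighlight>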


== Check GUI Capture Rules (Causing Call Stops) ==
If <code>tshark</code> sees SIP traffic and the sniffer configuration appears correct, but the probe stops processing calls or shows traffic only on the network interface, GUI capture rules may be the culprit.


Capture rules configured in the GUI can instruct the sniffer to ignore ("skip") all processing for matched calls. This includes calls matching specific IP addresses or telephone number prefixes.
For more information on capture rules, see [[Capture_rules]].


== Troubleshoot Missing or One-Way Audio ==

If calls are being captured (you see them in the GUI CDR list), but audio is missing or only present in one direction, this indicates an RTP correlation issue. This is different from "no calls being captured" - the problem is specifically that VoIPmonitor cannot link RTP packets to the SIP call.

'''Symptom:''' CDR shows "No audio" or audio in only one column (Caller/Called), but tshark confirms RTP packets are present on the network.
 
=== Step 1: Verify RTP Capture is Enabled ===
 
First, ensure the sensor is configured to save RTP packets.
 
;1. Check <code>savertp</code> in <code>voipmonitor.conf</code>:
<syntaxhighlight lang="ini">
# Verify RTP saving is enabled
savertp = yes
 
# Common mistake: savertp=no or savertp=header (saves only headers, not audio)
# For audio recording, savertp must be 'yes'
</syntaxhighlight>
 
;2. Check <code>savesip</code> is also enabled:
<syntaxhighlight lang="ini">
# SIP data is required to link RTP to calls
savesip = yes
</syntaxhighlight>
 
;3. Restart the sniffer after changes:
<syntaxhighlight lang="bash">systemctl restart voipmonitor</syntaxhighlight>
 
=== Step 2: Verify RTP Traffic is Present ===
 
If <code>savertp = yes</code> but you still have no audio, verify RTP packets are reaching the interface and are actually being processed, not just present in tshark.
 
<syntaxhighlight lang="bash">
# Capture SIP and RTP for a test call
tcpdump -i eth0 -n -w /tmp/test_audio.pcap "sip or udp"
 
# Make a test call, then analyze:
tshark -r /tmp/test_audio.pcap -Y "rtp" -c 10
 
# Check what ports RTP is using
tshark -r /tmp/test_audio.pcap -Y "rtp" -T fields -e rtp.ssrc -e udp.srcport -e udp.dstport | head -20
</syntaxhighlight>
 
* '''If you see NO RTP packets in the capture:''' The network mirroring is not configured to capture UDP/RTP traffic (common in VLAN-based deployments).
* '''If you see RTP packets:''' Proceed to Step 3 for correlation troubleshooting.
 
=== Step 3: Diagnose RTP Correlation Failure ===
 
VoIPmonitor uses SDP (Session Description Protocol) information in SIP messages to associate RTP packets with calls. If this correlation fails, audio will not be displayed even though RTP packets are captured.
 
The most common causes are:
* NAT devices mismatching IP addresses (SDP shows private IP, RTP arrives from public IP)
* SBCs/media servers modifying RTP ports (SDP advertises port X, RTP arrives on port Y)
* Proxies separating SIP signaling from RTP media path
 
=== Step 4: Configure NAT Aliases (natalias) ===
 
If endpoints are behind NAT, the SDP messages may contain private IPs (e.g., <code>10.x.x.x</code>, <code>192.168.x.x</code>) but RTP arrives from the firewall's public IP. You must explicitly map these IPs.
 
;1. Verify correct <code>natalias</code> syntax:
<syntaxhighlight lang="ini">
# Syntax: natalias = <public_ip> <private_ip>
# OR: natalias = <private_ip> <public_ip>
# Use only TWO parameters (IP addresses, not CIDR notation for simple mappings)
 
# Example: Map firewall public IP to internal subnet
natalias = 203.0.113.5 10.0.0.50
</syntaxhighlight>
 
Multiple <code>natalias</code> lines can be used for multiple mappings.
 
;2. Try reversing IP order if initial configuration does not work:
<syntaxhighlight lang="ini">
# If configuration above doesn't fix issue, try the reverse order:
natalias = 10.0.0.50 203.0.113.5
</syntaxhighlight>
 
This is a common troubleshooting step - the correct order depends on whether RTP is being received from the public IP (typical) or the private IP (after NAT traversal).
 
;3. Enable automatic RTP endpoint detection:
<syntaxhighlight lang="ini">
# Helps identify actual RTP endpoints behind proxies/NAT
rtpip_find_endpoints = yes
</syntaxhighlight>
 
;4. Relax strict RTP source checking:
<syntaxhighlight lang="ini">
# Accept RTP even if source IP differs from SIP header IP
rtpfromsdp_onlysip = no
</syntaxhighlight>
 
;5. Restart the sniffer after changes:
<syntaxhighlight lang="bash">systemctl restart voipmonitor</syntaxhighlight>
 
For more configuration options, see [[Sniffer_configuration#NAT_Handling]].
 
=== Step 5: Perform Port Mismatch Analysis ===
 
If <code>natalias</code> fixes do not resolve the issue, an external device (SBC, media server) may be modifying the RTP ports. This is a common scenario in carrier environments.
 
;1. Capture a test call with both SIP and RTP:
<syntaxhighlight lang="bash">
# Start capture
tcpdump -i eth0 -n -w /tmp/port_mismatch.pcap "sip or udp"
 
# Make a test call with missing audio
# Press Ctrl+C to stop
</syntaxhighlight>
 
;2. Extract SDP-advertised RTP ports from SIP:
<syntaxhighlight lang="bash">
# Find INVITE and extract Call-ID
tshark -r /tmp/port_mismatch.pcap -Y "sip.Method == INVITE" -T fields -e sip.Call-ID -e sip.to.user
 
# For a specific Call-ID, show SDP port information
tshark -r /tmp/port_mismatch.pcap -Y "sip and sip.Call-ID == YOUR_CALL_ID" -T fields -e sdp.media.port | head -20
</syntaxhighlight>
 
This shows what ports the SIP SDP says RTP should use.
 
;3. Extract actual RTP ports from captured packets:
<syntaxhighlight lang="bash">
# Show all RTP packets and their ports
tshark -r /tmp/port_mismatch.pcap -Y "rtp" -T fields -e rtp.ssrc -e udp.srcport -e udp.dstport -e ip.src -e ip.dst | sort -u | head -20
</syntaxhighlight>
 
This shows what ports RTP is actually using in the capture.
 
;4. Compare the results:
 
{| class="wikitable"
|-
! Source !! SDP Port (from SIP) !! Actual RTP Port !! Match?
|-
| Example | 50100 | 32456 | '''NO - Port mismatch'''
|}
 
* '''If ports MATCH''': The issue is not port modification. Check NAT configuration again or verify <code>savertp</code> is enabled.
* '''If ports DO NOT MATCH''': An external device is modifying the media ports.
 
=== Step 6: External Device Root Cause ===
 
If a port mismatch is detected (Step 5), the root cause is an external device (SBC, media server, SIP proxy) modifying the RTP ports.
 
'''Important:''' VoIPmonitor requires the SDP-advertised ports to match the actual RTP packet ports for proper correlation. If these do not match, fixing the issue requires changing the configuration of the external device, not VoIPmonitor.
 
Common external devices that modify RTP ports:
* Session Border Controllers (SBC)
* Media servers
* SIP proxies with media handling
* IP-PBX systems with built-in NAT traversal
 
'''Solutions:'''
# Check the external device's configuration for media port handling
# If possible, disable port modification on the external device
# Use media mirroring features (e.g., Ribbon SBC Monitoring Profile) if available
# Consider deploying a dedicated sensor on the internal network side where SDP and RTP ports match
 
For example documentation on SBC mirroring, see [[Ribbon7k_monitoring_profiles]].
 
=== Step 7: Distributed Architecture Check ===
 
VoIPmonitor cannot associate SIP and RTP if they are captured by different sensors.
 
;Scenario:
* Sensor A captures SIP traffic (VLAN 10)
* Sensor B captures RTP traffic (VLAN 20)
* Both sensors send data to the same GUI

;Result: The GUI cannot reconstruct the call because SIP and RTP are not on the same sensor.

;Solution:
* Ensure one single sensor instance sees both SIP signaling and RTP media
* Use a trunk port in the switch mirror configuration
* Or use a combined mode where all traffic goes to one interface
 
For more details, see [[Sniffer_distributed_architecture]].
 
=== Quick Decision Matrix ===
 
{| class="wikitable"
|-
! Symptom !! Likely Cause !! Primary Solution
|-
| tshark shows NO RTP traffic | Network Mirroring/VLANs | Fix SPAN/TAP or allow VLAN trunk
|-
| tshark shows RTP, GUI shows "No audio" | <code>savertp ≠ yes</code> | Set <code>savertp = yes</code>
|-
| tshark shows RTP, GUI shows one-way audio | NAT/Proxy correlation failure | Enable <code>rtpip_find_endpoints</code> and <code>natalias</code>
|-
| SDP and RTP ports do not match | External device modifying ports | Fix SBC/media server configuration
|-
| SIP on sensor A, RTP on sensor B | Distributed architecture issue | Capture both on single sensor
|}
 
== Troubleshoot MySQL/MariaDB Database Connection Errors ==
If you see "Connection refused (111)" errors or the sensor cannot connect to your database server, the issue is with the MySQL/MariaDB database connection configuration in <code>/etc/voipmonitor.conf</code>.
 
Error 111 (Connection refused) indicates that the database server is reachable on the network, but no MySQL/MariaDB service is listening on the specified port, or the connection is being blocked by a firewall. This commonly happens after migrations when the database server IP address has changed.
 
=== Symptoms and Common Errors ===
 
{| class="wikitable"
|-
! Error Message !! Likely Cause
|-
| <code>Can't connect to MySQL server on 'IP' (111)</code> || Wrong host/port or service not running
|-
| <code>Access denied for user 'user'@'host'</code> || Wrong username or password
|-
| <code>Unknown database 'voipmonitor'</code> || Wrong database name
|}
 
=== Diagnostic Steps ===
 
;1. Check for database connection errors in sensor logs:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu (systemd journal)
journalctl -u voipmonitor --since "1 hour ago" | grep -iE "mysql|database|connection|can.t connect"
 
# For systems using traditional syslog
tail -f /var/log/syslog | grep voipmonitor | grep -iE "mysql|database|connection"


# For CentOS/RHEL/AlmaLinux
tail -f /var/log/messages | grep voipmonitor | grep -iE "mysql|database|connection"
</syntaxhighlight>

;2. Verify database connection parameters in <code>voipmonitor.conf</code>:
<syntaxhighlight lang="ini">
# Database Connection Parameters
mysqlhost = 192.168.1.10      # IP address or hostname of MySQL/MariaDB server
mysqlport = 3306              # TCP port of the database server (default: 3306)
mysqlusername = root          # Database username
mysqlpassword = your_password  # Database password
mysqldatabase = voipmonitor    # Database name
</syntaxhighlight>
 
;3. Test MySQL connectivity from the sensor host:
<syntaxhighlight lang="bash">
# Test basic TCP connectivity (replace IP and port as needed)
nc -zv 192.168.1.10 3306
 
# Or using telnet
telnet 192.168.1.10 3306
</syntaxhighlight>
 
If you see "Connection refused", the database service is not running or not listening on that port.
 
;4. Test MySQL authentication using credentials from <code>voipmonitor.conf</code>:
<syntaxhighlight lang="bash">
mysql -h 192.168.1.10 -P 3306 -u root -p'your_password' voipmonitor
</syntaxhighlight>
 
Commands to run inside mysql client to verify:
<syntaxhighlight lang="sql">
-- Check if connected correctly
SELECT USER(), CURRENT_USER();
 
-- Check database exists
SHOW DATABASES LIKE 'voipmonitor';
 
-- Test write access
USE voipmonitor;
SHOW TABLES;
EXIT;
</syntaxhighlight>
 
;5. Compare with a working sensor's configuration:
If you have other sensors that successfully connect to the database, compare their configuration files:
<syntaxhighlight lang="bash">
diff <(grep -E "^mysql" /etc/voipmonitor.conf) <(grep -E "^mysql" /path/to/working/sensor/voipmonitor.conf)
</syntaxhighlight>
 
;6. Check firewall and network connectivity:
<syntaxhighlight lang="bash">
# Test network reachability
ping -c 4 192.168.1.10
 
# Check if MySQL port is reachable
nc -zv 192.168.1.10 3306
 
# Check firewall rules (if using firewalld)
firewall-cmd --list-ports
 
# Check firewall rules (if using iptables)
iptables -L -n | grep 3306
</syntaxhighlight>
 
;7. Verify MySQL/MariaDB service is running:
On the database server, check if the service is active:
<syntaxhighlight lang="bash">
# Check MySQL/MariaDB service status
systemctl status mariadb    # or systemctl status mysql
 
# Restart service if needed
systemctl restart mariadb
 
# Check which port MySQL is listening on
ss -tulpn | grep mysql
</syntaxhighlight>
 
;8. Apply configuration changes and restart the sensor:
<syntaxhighlight lang="bash">
# Restart the VoIPmonitor service to apply changes
systemctl restart voipmonitor
 
# Alternatively, reload without full restart (if supported in your version)
echo 'reload' | nc 127.0.0.1 5029
 
# Verify the service started successfully
systemctl status voipmonitor
 
# Check logs for database connection confirmation
journalctl -u voipmonitor -n 20
</syntaxhighlight>
 
=== Common Troubleshooting Scenarios ===
 
{| class="wikitable"
|-
! Scenario !! Symptom !! Solution
|-
| Database server IP changed || "Can't connect to MySQL server on '10.1.1.10' (111)" || Update <code>mysqlhost</code> in <code>voipmonitor.conf</code>
|-
| Wrong credentials || "Access denied for user" || Verify and update <code>mysqlusername</code> and <code>mysqlpassword</code>
|-
| Database service not running || "Connection refused (111)" || Start service: <code>systemctl start mariadb</code>
|-
| Firewall blocking port || <code>nc</code> shows "refused" but MySQL is running || Open port 3306 in firewall
|-
| Localhost vs remote confusion || Works locally but fails from sensor || Use actual IP address instead of <code>localhost</code>
|}
 
For more detailed information about all <code>mysql*</code> configuration parameters, see [[Sniffer_configuration#Database_Configuration]].
 
== Check for Storage Hardware Errors (HEAP FULL / packetbuffer Issues) ==
If the sensor is crashing with "HEAP FULL" errors or showing "packetbuffer: MEMORY IS FULL" messages, you must distinguish between '''actual storage hardware failures''' (requires disk replacement) and '''performance bottlenecks''' (requires tuning).
 
;1. Check kernel message buffer for storage errors:
<syntaxhighlight lang="bash">
dmesg -T | grep -iE "ext4-fs error|i/o error|nvram warning|ata.*failed|sda.*error|disk failure|smart error" | tail -50
</syntaxhighlight>
 
Look for these hardware error indicators:
* <code>ext4-fs error</code> - Filesystem corruption or disk failure
* <code>I/O error</code> or <code>BUG: soft lockup</code> - Disk read/write failures
* <code>NVRAM WARNING: nvram_check: failed</code> - RAID controller battery/capacitor issues
* <code>ata.*: FAILED</code> - Hard drive SMART failure
* <code>Buffer I/O error</code> - Disk unable to complete operations
 
'''If you see ANY of these errors:'''
* The storage subsystem is failing and likely needs hardware replacement
* Do not attempt performance tuning - replace the failed disk/RAID first
* Check SMART status: <code>smartctl -a /dev/sda</code>
* Check RAID health: <code>cat /proc/mdstat</code> or RAID controller tools


;2. If dmesg is clean of errors → Performance Bottleneck:
If the kernel logs show no storage errors, the issue is a performance bottleneck (disk too slow, network latency, etc.).
 
'''Check disk I/O performance:'''
<syntaxhighlight lang="bash">
# Current I/O wait (should be < 10% normally)
iostat -x 5
 
# Detailed disk stats
dstat -d
 
# Real-time disk latency
ioping -c 10 .
</syntaxhighlight>
 
'''Check NFS latency (if using NFS storage):'''
<syntaxhighlight lang="bash">
# Test NFS read/write latency
time dd if=/dev/zero of=/var/spool/voipmonitor/testfile bs=1M count=100
time cat /var/spool/voipmonitor/testfile > /dev/null
rm /var/spool/voipmonitor/testfile
 
# Check NFS mount options
mount | grep nfs
</syntaxhighlight>
 
'''Common performance solutions:'''
* Use SSD/NVMe for VoIPmonitor spool directory
* Ensure proper NIC queue settings for high-throughput NFS
* Check network switch port configuration for NFS
* Review [[Scaling]] guide for detailed optimization
 
See also [[IO_Measurement]] for comprehensive disk benchmarking tools.
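
For a quick synthetic benchmark of the spool disk, something like the following <code>fio</code> run can be used (a sketch; the directory and sizes are examples to adapt, and the test writes a temporary file to the spool partition):
<syntaxhighlight lang="bash">
# 60-second random-write test against the spool partition
fio --name=spooltest --directory=/var/spool/voipmonitor \
    --rw=randwrite --bs=4k --size=1G --iodepth=32 --direct=1 \
    --runtime=60 --time_based
</syntaxhighlight>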
 
=== Diagnosing Memory Buffer Filling Issues ===
 
If the sensor's working memory buffer is filling up and causing crashes or "packetbuffer: MEMORY IS FULL" messages, you must determine '''what is consuming the buffer memory''' before applying fixes. The root cause is typically an I/O bottleneck (disk writes are too slow) or an insufficient buffer size for high traffic.
 
;1. Performance Isolation Test: Temporarily Disable Packet Saving
 
If the buffer keeps filling despite having sufficient RAM, use this diagnostic technique to identify whether the issue is disk I/O related or CPU related:
 
<syntaxhighlight lang="ini">
# Edit /etc/voipmonitor.conf - temporarily disable packet writing to disk
 
savesip = no
savertp = no
savertcp = no
savegraph = no
</syntaxhighlight>
 
Restart the sensor:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>
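
While this test runs, it can help to watch per-thread CPU usage of the sniffer to see whether individual worker threads are saturated (a generic Linux command, not VoIPmonitor-specific):
<syntaxhighlight lang="bash">
# Live view of the busiest threads inside the voipmonitor process
top -H -p "$(pgrep -o voipmonitor)"
</syntaxhighlight>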
 
;2. Observe the Behavior:
 
'''Scenario A: High CPU usage continues with packet saving disabled'''
* '''Diagnosis:''' The issue is CPU-related (RTP processing thread bottleneck)
* '''Solution:''' Disable audio features or use SIP-only analysis mode:
<syntaxhighlight lang="ini">
silencedetect = no
# saveaudio = wav  # Comment out to disable audio conversion
</syntaxhighlight>
* See [[Scaling|Scaling and Performance Tuning]] for CPU optimization strategies
 
'''Scenario B: CPU usage drops but system still crashes when packet saving is re-enabled'''
* '''Diagnosis:''' The issue is I/O-related (disk write bottleneck is causing buffers to back up)
* '''Solutions:'''
** If the server has sufficient RAM: '''increase''' <code>max_buffer_mem</code> to give the sniffer more headroom to absorb the I/O bottleneck:
<syntaxhighlight lang="ini">
# Increase buffer memory only if RAM is available (e.g., 10GB)
max_buffer_mem = 10000
</syntaxhighlight>
** If server has limited RAM: Upgrade to faster storage (SSD/NVMe) or add physical RAM
** See [[Scaling]] for I/O optimization strategies
 
;3. High Concurrent Call Load (8,000–10,000 or more calls)
 
When experiencing "PACKETBUFFER: memory is FULL" errors under high concurrent call loads (8000+ calls), apply this specific threading configuration in <code>/etc/voipmonitor.conf</code>:
 
<syntaxhighlight lang="ini">
# Increase RTP processing threads - set to 20 or half of total CPU count
preprocess_rtp_threads = 20
 
# Set threading mode to high_traffic for environments exceeding 5 Gbit/s
threading_expanded = high_traffic
 
# Increase buffer memory (ensure you have sufficient physical RAM)
max_buffer_mem = 10000
</syntaxhighlight>
 
After applying these settings:
* Restart the sensor: <code>systemctl restart voipmonitor</code>
* Monitor the packet buffer heap to ensure it remains below 20% during peak traffic:
<syntaxhighlight lang="bash">
tail -f /var/log/syslog | grep "heap\["
</syntaxhighlight>
The first value in the heap metrics should ideally stay under 20%. If buffer usage frequently exceeds 20%, consider:
* Increasing <code>max_buffer_mem</code> further (if RAM is available)
* Adding more CPU cores
* Offloading database to a separate server
* Enabling <code>packetbuffer_compress = yes</code> (uses more CPU but saves memory)
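
To watch just the heap values over time, a small pipeline like this can be used (a sketch assuming the sniffer's status line contains a <code>heap[...]</code> token as shown above; adjust the log path for your distribution):
<syntaxhighlight lang="bash">
# Print only the heap[...] token from each sniffer status line
tail -f /var/log/syslog | grep --line-buffered voipmonitor | grep -o "heap\[[^]]*\]"
</syntaxhighlight>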
 
;4. Normal Kernel Behavior Check:
 
Linux systems use available RAM for caching disk blocks and buffers to improve performance. This expected behavior shows as high "buff/cache" usage in <code>free -m</code> output:
 
<syntaxhighlight lang="bash>
free -m
# Output example:
#              total        used        free      shared  buff/cache  available
# Mem:          32168      24542        783        256        6843        6779
# Swap:          16383          0      16383
#
# "buff/cache" = 6843MB - This is NORMAL and considered free RAM
</syntaxhighlight>
 
This cache memory is automatically reclaimed when applications need it. High cache/buffer usage does NOT indicate a memory leak or problem.
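
If you want to demonstrate that this cache is reclaimable, the kernel can be asked to drop it (harmless but temporarily hurts performance, so avoid doing this routinely on a busy sensor):
<syntaxhighlight lang="bash">
# Flush dirty pages, then drop the page cache and reclaimable slab objects
sync
echo 3 > /proc/sys/vm/drop_caches
free -m   # "buff/cache" shrinks and "free" grows accordingly
</syntaxhighlight>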
 
;5. Verify swappiness Configuration:
 
For real-time services like VoIPmonitor, ensure <code>vm.swappiness</code> is set to a low value to prevent the system from swapping too early:
 
<syntaxhighlight lang="bash>
# Check current swappiness value
cat /proc/sys/vm/swappiness
# Default 60 is dangerous for VoIPmonitor - should be 5 or lower
</syntaxhighlight>
 
Set swappiness to 5 (recommended):
<syntaxhighlight lang="bash>
# Temporary change
echo '5' > /proc/sys/vm/swappiness
 
# Permanent change - add to /etc/sysctl.conf
echo 'vm.swappiness=5' >> /etc/sysctl.conf
sysctl -p
</syntaxhighlight>
 
See [[Swap|Swap Configuration]] for detailed swappiness tuning guidance.
 
== Check for OOM (Out of Memory) Issues ==
If VoIPmonitor suddenly stops processing CDRs and a service restart temporarily restores functionality, the system may be experiencing OOM (Out of Memory) killer events. The Linux OOM killer terminates processes when available RAM is exhausted, and MySQL (<code>mysqld</code>) is a common target due to its memory-intensive nature.

;1. Check for OOM killer events in kernel logs:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
grep -i "out of memory\|killed process" /var/log/syslog | tail -20

# For CentOS/RHEL/AlmaLinux
grep -i "out of memory\|killed process" /var/log/messages | tail -20

# Also check dmesg:
dmesg | grep -i "killed process" | tail -10
</syntaxhighlight>

Typical OOM killer messages look like:
<syntaxhighlight lang="text">
Out of memory: Kill process 1234 (mysqld) score 123 or sacrifice child
Killed process 1234 (mysqld) total-vm: 12345678kB, anon-rss: 1234567kB
</syntaxhighlight>

;2. Monitor current memory usage:
<syntaxhighlight lang="bash">
# Check available memory (look for low 'available' or 'free' values)
free -h

# Check per-process memory usage (sorted by RSS)
ps aux --sort=-%mem | head -15

# Check MySQL memory usage in bytes
cat /proc/$(pgrep mysqld)/status | grep -E "VmSize|VmRSS"
</syntaxhighlight>

'''Warning signs:'''
* Available memory consistently below 500MB during operation
* MySQL consuming most of the available RAM
* Swap usage near 100% (if swap is enabled)
* Frequent process restarts without clear error messages

;3. First Fix: Check and correct innodb_buffer_pool_size:
Before upgrading hardware, verify that <code>innodb_buffer_pool_size</code> is not set too high. This is a common cause of OOM incidents.
 
'''Calculate the correct buffer pool size:'''
For a server running both VoIPmonitor and MySQL on the same host:
<syntaxhighlight lang="text">
Formula: innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS overhead) / 2
 
Example for a 32GB server:
- Total RAM: 32GB
- VoIPmonitor process memory (check with ps aux): ~2GB
- OS + other services overhead: ~2GB
- Available for buffer pool: 28GB
- Recommended innodb_buffer_pool_size = 14G
</syntaxhighlight>
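
The same arithmetic as a shell one-liner, assuming roughly 4 GB combined for VoIPmonitor and OS overhead (adjust the constant to your own measurements):
<syntaxhighlight lang="bash">
# Suggested innodb_buffer_pool_size in GB: (total RAM - ~4 GB overhead) / 2
free -g | awk '/^Mem:/ { printf "innodb_buffer_pool_size = %dG\n", ($2 - 4) / 2 }'
</syntaxhighlight>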
 
'''Edit the MariaDB configuration file:'''
<syntaxhighlight lang="ini">
# Common locations: /etc/mysql/my.cnf, /etc/mysql/mariadb.conf.d/50-server.cnf
 
innodb_buffer_pool_size = 14G  # Adjust based on your calculation
</syntaxhighlight>
 
'''Restart MariaDB to apply:'''
<syntaxhighlight lang="bash">
systemctl restart mariadb  # or systemctl restart mysql
</syntaxhighlight>
 
;4. Second Fix: Reduce VoIPmonitor buffer memory usage:
VoIPmonitor allocates significant memory for packet buffers. The total buffer memory is calculated based on:
 
{| class="wikitable"
|-
! Parameter !! Default !! Description
|-
| <code>ringbuffer</code> || 50MB || Ring buffer size per interface (recommended ≥500MB for >100 Mbit traffic)
|-
| <code>max_buffer_mem</code> || 2000MB || Maximum buffer memory limit
|}


'''Total formula:''' Approximate total = (ringbuffer × number of interfaces) + max_buffer_mem
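
For example, with four sniffed interfaces at <code>ringbuffer = 50</code> and the default <code>max_buffer_mem = 2000</code>, the sensor can consume roughly (50 × 4) + 2000 = 2200 MB for packet buffering alone.
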
'''To reduce VoIPmonitor memory usage:'''
<syntaxhighlight lang="ini">
# Edit /etc/voipmonitor.conf
 
# Reduce ringbuffer for each interface (e.g., from 50 to 20)
ringbuffer = 20
 
# Reduce maximum buffer memory (e.g., from 2000 to 1000)
max_buffer_mem = 1000
 
# Alternatively, reduce the number of sniffing interfaces if not all are needed
interface = eth0,eth1  # Instead of eth0,eth1,eth2,eth3
</syntaxhighlight>
 
'''After making changes:'''
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>
 
'''Important notes:'''
* Reducing <code>ringbuffer</code> may increase packet loss during traffic spikes
* Reducing <code>max_buffer_mem</code> affects how many packets can be buffered before being written to disk
* Monitor packet loss statistics in the GUI after reducing buffers to ensure acceptable performance
 
;5. Solution: Increase physical memory (if buffer tuning is insufficient):
If correcting both MySQL and VoIPmonitor buffer settings does not resolve the OOM issues, upgrade the server's physical RAM. After upgrading:
* Verify memory improvements with <code>free -h</code>
* Recalculate and adjust <code>innodb_buffer_pool_size</code>
* Re-tune <code>ringbuffer</code> and <code>max_buffer_mem</code>
* Monitor for several days to ensure OOM events stop


== Sensor Upgrade Fails with "Permission denied" from /tmp ==
If the sensor upgrade process fails with "Permission denied" errors when executing scripts from the <code>/tmp</code> directory, or the service fails to restart after upgrade, the <code>/tmp</code> partition may be mounted with the <code>noexec</code> flag.

The <code>noexec</code> mount option prevents execution of any script or binary from the <code>/tmp</code> directory for security reasons. However, the VoIPmonitor sensor upgrade process uses <code>/tmp</code> for temporary script execution.

;1. Check the mount options for /tmp:
<syntaxhighlight lang="bash">
mount | grep /tmp
</syntaxhighlight>
Look for the <code>noexec</code> flag in the mount options:
<syntaxhighlight lang="text">
/dev/sda2 on /tmp type ext4 rw,relatime,noexec,nosuid,nodev
</syntaxhighlight>
 
;2. Remount /tmp without noexec (temporary fix):
<syntaxhighlight lang="bash">
mount -o remount,exec /tmp
 
# Verify the change:
mount | grep /tmp
</syntaxhighlight>
The output should no longer contain <code>noexec</code>.
 
;3. Make the change permanent (edit /etc/fstab):
<syntaxhighlight lang="bash">
nano /etc/fstab
</syntaxhighlight>
 
Remove the <code>noexec</code> option from the /tmp line:
<syntaxhighlight lang="text">
# Before:
/dev/sda2  /tmp  ext4  rw,relatime,noexec,nosuid,nodev  0 0
 
# After (remove noexec):
/dev/sda2  /tmp  ext4  rw,relatime,nosuid,nodev  0 0
</syntaxhighlight>
 
If <code>/tmp</code> is a separate partition, remount for changes to take effect:
<syntaxhighlight lang="bash">
mount -o remount /tmp
</syntaxhighlight>
 
;4. Re-run the sensor upgrade:
After fixing the mount options, retry the sensor upgrade process.
 
== "No space left on device" Despite Disks Having Free Space ==
If system services (like php-fpm, voipmonitor, or commands like <code>screen</code>) fail with a "No space left on device" error even though <code>df -h</code> shows sufficient disk space, the issue is likely with '''temporary filesystems''' (<code>/tmp</code>, <code>/run</code>) filling up, not with main disk storage.
 
;1. Check usage of temporary filesystems:
<syntaxhighlight lang="bash">
# Check /tmp usage
df -h /tmp
 
# Check /run usage
df -h /run
</syntaxhighlight>
 
If <code>/tmp</code> or <code>/run</code> show 100% usage despite main filesystems having free space, these temporary filesystems need to be cleaned.
 
;2. Check what is consuming space:
<syntaxhighlight lang="bash">
# Find large files in /tmp
du -sh /tmp/* 2>/dev/null | sort -hr | head -20
 
# Check journal disk usage
journalctl --disk-usage
</syntaxhighlight>
 
;3. Immediate cleanup of journal logs:
System journal logs stored in <code>/run/log/journal/</code> can fill up the <code>/run</code> filesystem.
<syntaxhighlight lang="bash">
# Limit journal to 100MB total size
sudo journalctl --vacuum-size=100M
 
# Or limit by time (keep only last 2 days)
sudo journalctl --vacuum-time=2d
</syntaxhighlight>
 
;4. Permanent solution - Configure journal rotation:
Edit <code>/etc/systemd/journald.conf</code>:
<syntaxhighlight lang="ini">
[Journal]
SystemMaxUse=100M
MaxRetentionSec=1month
</syntaxhighlight>
 
Apply changes:
<syntaxhighlight lang="bash">
sudo systemctl restart systemd-journald
</syntaxhighlight>
 
;5. Quick fix - System reboot:
The quickest way to free space in <code>/tmp</code> and <code>/run</code> is a system reboot, as these filesystems are cleared on each boot.
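
If a reboot is not possible, systemd's own cleanup can often be triggered manually instead (behavior depends on the distribution's <code>tmpfiles.d</code> aging rules):
<syntaxhighlight lang="bash">
# Apply the configured /tmp and /run cleanup rules immediately
systemd-tmpfiles --clean
</syntaxhighlight>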
 
== Check VoIPmonitor Logs for General Errors ==
After addressing the specific issues above, check the system logs for other error messages from the sensor process that may reveal additional problems.
 
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor
 
# For CentOS/RHEL/AlmaLinux
tail -f /var/log/messages | grep voipmonitor
</syntaxhighlight>
 
'''Common errors to look for:'''
* <code>"pcap_open_live(eth0) error: eth0: No such device"</code> - Wrong interface name
* <code>"Permission denied"</code> - Sensor not running with sufficient privileges
* Messages about connection issues - See [[#Troubleshoot MySQL/MariaDB Database Connection Errors|database troubleshooting]]
* Messages about dropping packets - See [[Scaling]] guide
 
== Benign Database Errors When Features Are Disabled ==
Some VoIPmonitor features may generate harmless database errors when those features are not enabled in your configuration. These errors are '''benign''' and can be safely ignored.
 
=== Common Benign Error: Missing Tables ===
If you see MySQL errors stating that a table does not exist (e.g., "Table 'voipmonitor.ss7' doesn't exist") even though the corresponding feature is disabled, this is expected behavior.
 
'''Common examples:'''
* Errors about the <code>ss7</code> table when <code>ss7 = no</code> in <code>voipmonitor.conf</code>
* Errors about the <code>register_failed</code>, <code>register_state</code>, or <code>sip_msg</code> tables when those features are disabled
 
=== Solution: Ignore or Suppress in Monitoring ===
Since these errors indicate that a feature is simply not active, they do not impact system functionality:
 
# '''Do not change the configuration''' to fix these errors
# '''Add monitoring exceptions''' to suppress warnings for table-not-found errors (MySQL error code 1146); see the example below
# Configure alerting systems to exclude these specific SQL errors from notifications
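
For example, a simple log-watch pipeline can hide these benign messages while still surfacing everything else (a sketch; match the exact error text your MySQL version logs):
<syntaxhighlight lang="bash">
# Follow sniffer messages but suppress "table doesn't exist" (MySQL error 1146)
tail -f /var/log/syslog | grep --line-buffered voipmonitor | grep -v "1146"
</syntaxhighlight>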
 
=== When to Take Action ===
You only need to take action if:
* You actually want to use the feature (enable the corresponding configuration option)
* Errors persist about tables for features that '''are''' explicitly enabled in <code>voipmonitor.conf</code>


== Appendix: tshark Display Filter Syntax for SIP ==
When using <code>tshark</code> to analyze SIP traffic, it is important to use the correct Wireshark display filter syntax. Below are common filter examples:

=== Basic SIP Filters ===
<syntaxhighlight lang="bash">
# Show all SIP INVITE messages
tshark -r capture.pcap -Y "sip.Method == INVITE"

# Show all SIP messages (any method)
tshark -r capture.pcap -Y "sip"

# Show SIP and RTP traffic
tshark -r capture.pcap -Y "sip || rtp"
</syntaxhighlight>


=== Search for Specific Phone Number or Text ===
<syntaxhighlight lang="bash">
# Find calls containing a specific phone number (e.g., 5551234567)
tshark -r capture.pcap -Y 'sip contains "5551234567"'

# Find INVITE messages for a specific number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"'
</syntaxhighlight>


=== Extract Call-ID from Matching Calls ===
<syntaxhighlight lang="bash">
# Get Call-ID for calls matching a phone number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"' -T fields -e sip.Call-ID

# Get Call-ID along with From and To headers
tshark -r capture.pcap -Y 'sip.Method == INVITE' -T fields -e sip.Call-ID -e sip.from.user -e sip.to.user
</syntaxhighlight>


=== Filter by IP Address ===
<syntaxhighlight lang="bash">
# SIP traffic from a specific source IP
tshark -r capture.pcap -Y "sip && ip.src == 192.168.1.100"

# SIP traffic between two hosts
tshark -r capture.pcap -Y "sip && ip.addr == 192.168.1.100 && ip.addr == 10.0.0.50"
</syntaxhighlight>


=== Filter by SIP Response Code ===
<syntaxhighlight lang="bash">
# Show all 200 OK responses
tshark -r capture.pcap -Y "sip.Status-Code == 200"

# Show all 4xx and 5xx error responses
tshark -r capture.pcap -Y "sip.Status-Code >= 400"

# Show 486 Busy Here responses
tshark -r capture.pcap -Y "sip.Status-Code == 486"
</syntaxhighlight>


=== Important Syntax Notes ===
{| class="wikitable"
|-
! Syntax Element !! Correct Usage !! Notes
|-
| Field names || <code>sip.Method</code>, <code>sip.Call-ID</code>, <code>sip.Status-Code</code> || Case-sensitive (not <code>sip.method</code> or <code>sip.call-id</code>)
|-
| String matching || <code>sip contains "text"</code> || Use the <code>contains</code> keyword, not <code>sip.contains()</code>
|-
| String quotes || Double quotes <code>"..."</code> || Not single quotes
|-
| Boolean operators || <code>&&</code>, <code>||</code>, <code>!</code> || AND, OR, NOT
|}


For a complete reference, see the [https://www.wireshark.org/docs/dfref/s/sip.html Wireshark SIP Display Filter Reference].


== Missing CDRs for Calls with Large Packets ==
If VoIPmonitor is capturing some calls successfully but missing CDRs for specific calls (especially those with larger SIP packets, such as INVITEs carrying extensive SDP), there are two common causes to investigate.
 
=== Cause 1: snaplen Packet Truncation (VoIPmonitor Configuration) ===
The <code>snaplen</code> parameter in <code>voipmonitor.conf</code> limits how many bytes of each packet are captured. If a SIP packet exceeds <code>snaplen</code>, it is truncated and the sniffer may fail to parse the call correctly.

;1. Check your current snaplen setting:
<syntaxhighlight lang="bash">
grep snaplen /etc/voipmonitor.conf
</syntaxhighlight>
The default is 3200 bytes (6000 if SSL/HTTP is enabled).

;2. Test if packet truncation is the issue:
Use <code>tcpdump</code> with <code>-s0</code> (unlimited snapshot length) to capture full packets:
<syntaxhighlight lang="bash">
# Capture SIP traffic with full packet length
tcpdump -i eth0 -s0 -nn port 5060 -w /tmp/test_capture.pcap

# Analyze packet sizes with Wireshark or tshark
tshark -r /tmp/test_capture.pcap -T fields -e frame.len -Y "sip" | sort -n | tail -10
</syntaxhighlight>
If you see SIP packets larger than your <code>snaplen</code> value (e.g., 4000+ bytes), increase <code>snaplen</code> in <code>voipmonitor.conf</code>:
<syntaxhighlight lang="ini">
snaplen = 65535
</syntaxhighlight>
Then restart the sniffer: <code>systemctl restart voipmonitor</code>.
 
=== Cause 2: MTU Mismatch (Network Infrastructure) ===
If packets are being lost or fragmented due to MTU mismatches in the network path, VoIPmonitor may never receive the complete packets, regardless of the <code>snaplen</code> setting.
 
;1. Diagnose MTU-related packet loss:
Capture traffic with tcpdump and analyze in Wireshark:
<syntaxhighlight lang="bash">
# Capture traffic on the VoIPmonitor host
tcpdump -i eth0 -s0 host <pbx_ip_address> -w /tmp/mtu_test.pcap
</syntaxhighlight>
Open the pcap in Wireshark and look for:
* Reassembled PDUs marked as incomplete
* TCP retransmissions for the same packet
* ICMP "Fragmentation needed" messages (Type 3, Code 4)
 
;2. Verify packet completeness:
In Wireshark, examine large SIP INVITE packets. If the SIP headers or SDP appear cut off or incomplete, packets are likely being lost in transit due to MTU issues.
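
The same tell-tale signs can be checked from the command line using standard Wireshark display-filter fields:
<syntaxhighlight lang="bash">
# ICMP "fragmentation needed" messages (Type 3, Code 4)
tshark -r /tmp/mtu_test.pcap -Y "icmp.type == 3 && icmp.code == 4"

# IP fragments present in the capture
tshark -r /tmp/mtu_test.pcap -Y "ip.flags.mf == 1 || ip.frag_offset > 0"
</syntaxhighlight>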
 
;3. Identify the MTU bottleneck:
The issue is typically a network device with a lower MTU than the end devices. Common locations:
* VPN concentrators
* Firewalls
* Routers with tunnel interfaces
* Cloud provider gateways (often limited to 1500 bytes while the rest of the network uses 9000-byte jumbo frames)
 
To locate the problematic device, trace the MTU along the network path from the PBX to the VoIPmonitor sensor.
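
A common way to probe the path MTU is a ping with the don't-fragment bit set (Linux <code>iputils</code> syntax; 1472 bytes of payload plus 28 bytes of IP/ICMP headers equals a 1500-byte packet):
<syntaxhighlight lang="bash">
# Succeeds only if the whole path carries 1500-byte packets; lower -s until it passes
ping -M do -s 1472 -c 3 <pbx_ip_address>
</syntaxhighlight>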
 
;4. Resolution options:
* Increase MTU on the bottleneck device to match the rest of the network (e.g., from 1500 to 9000 for jumbo frame environments)
* Enable Path MTU Discovery (PMTUD) on intermediate devices
* Ensure your switching infrastructure supports jumbo frames end-to-end if you are using them


For more information on the <code>snaplen</code> parameter, see [[Sniffer_configuration#Network_Interface_.26_Sniffing|Sniffer Configuration]].


== AI Summary for RAG ==
'''Summary:''' Systematic troubleshooting guide for VoIPmonitor sensor issues. Part 1 covers "no calls being captured": service startup problems (binary renamed after crash, unresponsive after GUI update), network traffic verification using tshark (pcap capture and tshark analysis), decision tree when SIP packets present but not captured, GUI sensor statistics (packet drops counter), network encapsulation troubleshooting (detecting VLAN, ERSPAN, GRE, VXLAN, TZSP with tshark protocol analysis -Z io,phs, visual inspection -V, VLAN tag detection), promiscuous mode requirements (needed for SPAN/RSPAN but not for ERSPAN/GRE/TZSP - VoIPmonitor decapsulates automatically), SPAN/port mirroring configuration verification including bidirectional capture (ensure SPAN monitors both inbound/ingress and outbound/egress traffic, verify source ports include all PBX/SBC ports, check VLAN trunk mode on destination port, identify missing call legs using tcpdump for specific IPs during test calls), voipmonitor.conf configuration checks (interface, sipport, filter), GUI capture rules with Skip option, database connection errors (Error 111 after migration), HEAP FULL errors (hardware vs performance issues), memory buffer filling diagnosis (disabling savesip/savertp/savertcp/savegraph to isolate CPU vs I/O bottleneck, increasing max_buffer_mem for I/O issues vs reducing for OOM, normal kernel buff/cache behavior, vm.swappiness = 5), OOM killer problems (innodb_buffer_pool_size and ringbuffer/max_buffer_mem tuning), upgrade failures due to /tmp noexec flag, and "no space left" errors caused by full /tmp or /run filesystems. Part 1 also covers missing CDRs for calls with large packets, distinguishing snaplen truncation (increase snaplen in voipmonitor.conf, e.g., to 65535) from MTU mismatch in the network path (diagnosed with tcpdump/Wireshark and ICMP "Fragmentation needed" messages, fixed by aligning MTU or enabling Path MTU Discovery on VPN concentrators, firewalls, and cloud gateways), plus an appendix with correct tshark display filter syntax for SIP (sip.Method == INVITE, sip contains "number", -T fields -e sip.Call-ID, sip.Status-Code). Part 2 covers "missing or one-way audio": verifying savertp=yes, RTP traffic presence, NAT alias configuration (natalias with correct two-parameter syntax, try reversing IP order if needed), rtpip_find_endpoints, rtpfromsdp_onlysip, SDP vs actual RTP port mismatch analysis using tshark/tcpdump, external device (SBC/media server) port modification issues, and distributed architecture constraints (SIP and RTP must be captured by same sensor). Part 3 covers "NFS distributed setup troubleshooting" (missing call data for specific time periods): check NFS timeout errors in system logs with grep "nfs.*server.*not responding", dmesg -T | grep nfs.*timeout, look for specific error messages "nfs: server [IP_ADDRESS] not responding, timed out", "NFS: server [IP_ADDRESS] not responding, still trying", verify NFS server connectivity with ping, traceroute, nc -zv for port 2049, check NFS mount accessibility with touch test, verify network infrastructure between sensor and NFS server. Part 4 covers "S3 cloud storage mounting issues": s3fs causes manager interface to become unresponsive during tar_move due to inefficient FUSE caching and excessive metadata overhead, switch to rclone mount with optimized parameters (--allow-other, --dir-cache-time 30s, --poll-interval 0, --vfs-cache-mode off, --buffer-size 0, --use-server-modtime, --no-modtime, --s3-no-head, --log-level INFO) for production deployments.
'''Keywords:''' troubleshooting, no calls, tshark, promiscuous mode, SPAN, ERSPAN, GRE, TZSP, VXLAN, VLAN, encapsulation, vlan tags, io,phs protocol analysis, voipmonitor.conf, interface, sipport, capture rules, Skip, packet drops, sensor statistics, Settings → Sensors, ringbuffer, max_buffer_mem, memory buffer filling, HEAP FULL, packetbuffer memory is full, savesip, savertp, savertcp, savegraph, CPU bottleneck, I/O bottleneck, max_buffer_mem increase vs decrease, buff/cache, vm.swappiness, OOM killer, innodb_buffer_pool_size, Connection refused 111, noexec, /tmp, journal logs, no space left on device, missing audio, one-way audio, no audio, RTP correlation, natalias, rtpip_find_endpoints, rtpfromsdp_onlysip, savertp, SDP port mismatch, SBC, media server, distributed architecture, NFS, NFS timeout, NFS server not responding, missing data, missing CDRs, missing PCAP files, spooldir, nfs.*server.*not responding, dmesg nfs timeout, ping, traceroute, nc -zv port 2049, network connectivity, network infrastructure, timeo=600, hard nfs mount, nfsvers=3, S3, cloud storage, s3fs, rclone, tar_move, AWS S3, S3 bucket, FUSE mounting, storage mount, cloud mount, unresponsive manager interface, high latency, missing call legs, missing SIP packets, specific IP addresses, bidirectional capture, SPAN configuration, port mirroring, inbound traffic, outbound traffic, ingress, egress, tcpdump verification, snaplen, packet truncation, MTU mismatch, MTU, Path MTU Discovery, fragmentation, ICMP fragmentation needed, jumbo frames, incomplete packets
'''Key Questions:'''
* Sensor is missing call legs or specific SIP packets (like INVITE) for certain IP addresses, how do I troubleshoot?
* How do I use tcpdump to verify if packets for specific IP addresses are reaching the sensor interface?
* How do I check if SPAN port mirroring is configured correctly for bidirectional capture?
* What should I verify in the switch SPAN configuration when packets are missing for specific IPs?
* Do I need to monitor both ingress and egress traffic for complete call capture?
* Why are only certain IP addresses missing from VoIPmonitor capture during high-traffic periods?
* How do I verify my SPAN session is capturing both inbound and outbound traffic?
* What are common SPAN configuration issues that cause specific call legs to be missing?
* How do I diagnose whether missing packets are due to SPAN configuration vs VoIPmonitor issues?
* Why does the manager interface become unresponsive when using tar_move with S3 storage?
* What is the recommended S3 mounting tool for VoIPmonitor?
* How do I fix S3 mounting issues with s3fs causing unresponsive manager interface?
* What rclone mount parameters are recommended for VoIPmonitor S3 storage?
* Why is s3fs not recommended for tar_move operations?
* How do I switch from s3fs to rclone for S3 storage mounting?
* How do I troubleshoot rclone mount issues?
* Why is VoIPmonitor not recording any calls?
* How do I check if VoIP traffic is reaching my sensor server?
* How do I capture traffic to a .pcap file for analysis?
* What command can I use to see live SIP traffic on the command line?
* How do I use tshark to detect network encapsulation?
* How do I enable promiscuous mode on my network card?
* How do I check if my traffic is encapsulated in ERSPAN, GRE, VXLAN, or VLAN?
* How do I detect VLAN tags in my capture file?
* Does VoIPmonitor handle ERSPAN, GRE, VXLAN, and TZSP automatically?
* Do I need promiscuous mode for ERSPAN or GRE tunnels?
* Why does tcpdump show traffic but VoIPmonitor doesn't capture it?
* Does ERSPAN require promiscuous mode on the receiving interface?
* How do I diagnose packet drops in VoIPmonitor?
* VoIPmonitor is running but I have no new calls in the GUI, what should I check first?
* How do I check for packet drops in the GUI sensor statistics?
* Where can I find the log files for the VoIPmonitor sniffer?
* What is the acceptable value for # packet drops in Settings → Sensors?
* What are the most common reasons for VoIPmonitor not capturing data?
* How do I fix "Connection refused (111)" database errors?
* How do I filter tshark output for SIP INVITE messages?
* VoIPmonitor crashes with HEAP FULL error, what should I check?
* What is the correct tshark filter syntax to find a specific phone number?
* How do I diagnose memory buffer filling issues - CPU vs I/O bottleneck?
* How do I extract Call-ID from a pcap file using tshark?
* How do I use savesip/savertp/savertcp/savegraph to isolate bottlenecks?
* What tshark filter shows all SIP 4xx and 5xx error responses?
* When should I increase vs decrease max_buffer_mem?
* Why is my VoIPmonitor probe stopping processing calls even though network traffic is visible?
* Is high buff/cache usage normal in Linux for VoIPmonitor?
* What should I check if the probe sees SIP packets on the interface but processes no calls?
* How to configure vm.swappiness for VoIPmonitor?
* How do GUI capture rules affect call processing?
* How do I fix OOM killer issues on VoIPmonitor server?
* Why are CDRs missing for calls with large SIP packets?
* Why does sensor upgrade fail with permission denied from /tmp?
* What does the snaplen parameter do in voipmonitor.conf?
* "No space left on device" but disk has free space, what to check?
* How do I fix missing CDRs for calls with large INVITE packets?
* Why is audio missing or one-way in the GUI CDR view?
* What is the difference between snaplen truncation and MTU mismatch?
* How do I configure natalias for NAT scenarios?
* How do I diagnose MTU-related packet loss in VoIPmonitor?
* How do I diagnose SDP vs RTP port mismatches?
* What tcpdump command should I use to capture full packets for debugging?
* What should I do if external SBC is modifying RTP ports?
* What does the "Skip" option in capture rules do?
* Why are CDRs and PCAP files missing for a specific time period in NFS setup?
* How do I troubleshoot capture rules that are blocking calls?
* How do I check for NFS timeout errors in system logs?
* VoIPmonitor server stops processing CDRs and needs restart. What could be wrong?
* What NFS error messages should I look for when data is missing?
* Why does MySQL crash and restart on my VoIPmonitor server?
* How do I verify NFS server connectivity is working?
* How do I check for OOM killer events in Linux?
* What does "nfs: server [IP_ADDRESS] not responding, timed out" mean?
* What does the error "Out of memory: Kill process" mean?
* How do I debug NFS disconnection issues in distributed VoIPmonitor setup?
* How can I monitor memory usage on my VoIPmonitor server?
* How do I check if NFS mount is accessible and writing correctly?
* What command shows available memory in Linux?
* Why is mysqld getting killed on my system?
