Sniffer troubleshooting

From VoIPmonitor.org
Revision as of 13:10, 5 January 2026


This guide provides a systematic, step-by-step process to diagnose why the VoIPmonitor sensor might not be capturing any calls. Follow these steps in order to quickly identify and resolve the most common issues.

Troubleshooting Flowchart

<mermaid>
flowchart TD
    A[No Calls Being Captured] --> B{Step 1: Service Running?}
    B -->|No| B1[systemctl restart voipmonitor]
    B -->|Yes| C{Step 2: Traffic on Interface?<br/>tshark -i eth0 -Y 'sip'}
    C -->|No packets| D[Step 3: Network Issue]
    D --> D1{Interface UP?}
    D1 -->|No| D2[ip link set dev eth0 up]
    D1 -->|Yes| D3{SPAN/RSPAN?}
    D3 -->|Yes| D4[Enable promisc mode]
    D3 -->|ERSPAN/GRE/TZSP| D5[Check tunnel config]
    C -->|Packets visible| E[Step 4: VoIPmonitor Config]
    E --> E1{interface correct?}
    E1 -->|No| E2[Fix interface in voipmonitor.conf]
    E1 -->|Yes| E3{sipport correct?}
    E3 -->|No| E4[Add port: sipport = 5060,5080]
    E3 -->|Yes| E5{BPF filter blocking?}
    E5 -->|Maybe| E6[Comment out filter directive]
    E5 -->|No| F[Step 5: GUI Capture Rules]
    F --> F1{Rules with Skip: ON?}
    F1 -->|Yes| F2[Remove/modify rules + reload sniffer]
    F1 -->|No| G[Step 6: Check Logs]
    G --> H{OOM Events?}
    H -->|Yes| H1[Step 7: Add RAM / tune MySQL]
    H -->|No| I{Large SIP packets?}
    I -->|Yes| I1[Step 8: Increase snaplen / fix MTU]
    I -->|No| J[Contact Support]
</mermaid>

Step 1: Is the VoIPmonitor Service Running Correctly?

First, confirm that the sensor process is active and loaded the correct configuration file.

1. Check the service status (for modern systemd systems)
systemctl status voipmonitor

Look for a line that says Active: active (running). If it is inactive or failed, try restarting it with systemctl restart voipmonitor and check the status again.

2. Verify the running process
ps aux | grep voipmonitor

This command will show the running process and the exact command line arguments it was started with. Critically, ensure it is using the correct configuration file, for example: --config-file /etc/voipmonitor.conf. If it is not, there may be an issue with your startup script.
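
The config-file check can be scripted. The following is a minimal sketch, assuming the option is passed as two separate words (`--config-file PATH`), as in the example above; the helper name `config_file_from_cmdline` is hypothetical and not part of VoIPmonitor.

```shell
# Print the value that follows --config-file in a command line read from stdin.
# Prints nothing if the option is absent (the --config-file=PATH form is not handled).
config_file_from_cmdline() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "--config-file") { print $(i + 1); exit }
    }'
}

# Usage on a live system (assumes the process is named voipmonitor):
#   ps -o args= -C voipmonitor | config_file_from_cmdline
```

If the function prints nothing, or prints an unexpected path, inspect the startup script or systemd unit.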

Step 2: Is Network Traffic Reaching the Server?

If the service is running, the next step is to verify if the VoIP packets (SIP/RTP) are actually arriving at the server's network interface. The best tool for this is tshark (the command-line version of Wireshark).

1. Install tshark
# For Debian/Ubuntu
apt-get update && apt-get install tshark

# For CentOS/RHEL/AlmaLinux
yum install wireshark
2. Listen for SIP traffic on the correct interface

Replace eth0 with the interface name you have configured in voipmonitor.conf.

tshark -i eth0 -Y "sip || rtp" -n
  • If you see a continuous stream of SIP and RTP packets, it means traffic is reaching the server, and the problem is likely in VoIPmonitor's configuration (see Step 4).
  • If you see NO packets, the problem lies with your network configuration. Proceed to Step 3.
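
For a non-interactive version of this check, the capture can be limited to a fixed duration and reduced to a packet count. This is a sketch under assumptions: `count_sip_packets` and `next_step` are hypothetical helper names, the 10-second default is arbitrary, and tshark must be installed with permission to capture on the interface.

```shell
# Capture for a fixed time and print the number of SIP packets seen.
# Uses tshark's -a duration:N autostop condition; errors are discarded.
count_sip_packets() {
    iface=$1
    seconds=${2:-10}
    tshark -i "$iface" -a "duration:$seconds" -Y sip -n 2>/dev/null | wc -l | tr -d ' '
}

# Map a packet count to the next step of this guide.
next_step() {
    if [ "$1" -gt 0 ]; then
        echo "Traffic is arriving: go to Step 4"
    else
        echo "No SIP packets: go to Step 3"
    fi
}

# Usage: next_step "$(count_sip_packets eth0 10)"
```

A zero count during a quiet period does not prove a network fault, so run the check while calls are known to be in progress.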

Step 3: Troubleshoot Network and Interface Configuration

If tshark shows no traffic, it means the packets are not being delivered to the operating system correctly.

1. Check if the interface is UP

Ensure the network interface is active.

ip link show eth0

The output should contain the word UP. If it doesn't, bring it up with:

ip link set dev eth0 up
2. Check for Promiscuous Mode (for SPAN/RSPAN Mirrored Traffic)

Important: Promiscuous mode requirements depend on your traffic mirroring method:

  • SPAN/RSPAN (Layer 2 mirroring): The network interface must be in promiscuous mode. Mirrored packets retain their original MAC addresses, so the interface would normally ignore them. Promiscuous mode forces the interface to accept all packets regardless of destination MAC.
  • ERSPAN/GRE/TZSP/VXLAN (Layer 3 tunnels): Promiscuous mode is NOT required. These tunneling protocols encapsulate the mirrored traffic inside IP packets that are addressed directly to the sensor's IP address. The operating system receives these packets normally, and VoIPmonitor automatically decapsulates them to extract the inner SIP/RTP traffic.

For SPAN/RSPAN deployments, check the current promiscuous mode status:

ip link show eth0

Look for the PROMISC flag.

Enable promiscuous mode manually if needed:

ip link set eth0 promisc on

If this solves the problem, you should make the change permanent. The install-script.sh for the sensor usually attempts to do this, but it can fail.
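
The flag checks from this step (UP and PROMISC) can be automated by parsing the flag list that `ip link show` prints between angle brackets. A minimal sketch; the helper name `link_has_flag` is hypothetical.

```shell
# Succeed if the <...> flag list in `ip link show` output (stdin) contains the
# given flag as a whole entry, so that UP does not match LOWER_UP.
link_has_flag() {
    flag=$1
    grep -Eq "<([A-Z_0-9-]+,)*$flag(,[A-Z_0-9-]+)*>"
}

# Usage:
#   ip link show eth0 | link_has_flag UP      || echo "eth0 is down"
#   ip link show eth0 | link_has_flag PROMISC || echo "eth0 is not in promiscuous mode"
```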

3. Verify Your SPAN/Mirror/TAP Configuration

This is the most common cause of no traffic. Double-check your network switch or hardware tap configuration to ensure:

  • The correct source ports (where your PBX/SBC is connected) are being monitored.
  • The correct destination port (where your VoIPmonitor sensor is connected) is configured.
  • If you are monitoring traffic across different VLANs, ensure your mirror port is configured to carry all necessary VLAN tags (often called "trunk" mode).

Step 4: Check the VoIPmonitor Configuration

If tshark sees traffic but VoIPmonitor does not, the problem is almost certainly in voipmonitor.conf.

1. Check the interface directive
Make sure the interface parameter in /etc/voipmonitor.conf exactly matches the interface where you see traffic with tshark. For example: interface = eth0.
2. Check the sipport directive
By default, VoIPmonitor only listens on port 5060. If your PBX uses a different port for SIP, you must add it. For example:
sipport = 5060,5080
For distributed/probe setups: If you are using a remote sensor (probe) with Packet Mirroring, the sipport configuration must match on BOTH the probe AND the central analysis host. See Distributed Architecture: Troubleshooting for details.
3. Check for a restrictive filter
If you have a BPF filter configured, ensure it is not accidentally excluding the traffic you want to see. For debugging, try commenting out the filter line entirely and restarting the sensor.
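
To review the three directives from this step at a glance, the active (uncommented) lines can be extracted from the configuration file. This is a sketch assuming the `key = value` layout shown above; `conf_get` is a hypothetical helper, not a VoIPmonitor tool.

```shell
# Print the value(s) of an active directive in a voipmonitor.conf-style file.
# Lines commented out with '#' or ';' do not match because the key must come first.
conf_get() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$file"
}

# Usage:
#   for k in interface sipport filter; do
#       echo "$k: $(conf_get "$k" /etc/voipmonitor.conf)"
#   done
```

An empty value for `filter` means no BPF filter is active. An `interface` value that differs from the one used with tshark points to the mismatch described in item 1.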

Step 5: Check GUI Capture Rules (Causing Call Stops)

If tshark sees SIP traffic and the sniffer configuration appears correct, but the probe stops processing calls or traffic is visible only at the network interface level, GUI capture rules may be the culprit.

Capture rules configured in the GUI can instruct the sniffer to ignore ("skip") all processing for matched calls. This includes calls matching specific IP addresses or telephone number prefixes.

1. Review existing capture rules
Navigate to GUI -> Capture rules and examine all rules for any that might be blocking your traffic.
Look specifically for rules with the Skip option set to ON (displayed as "Skip: ON"). The Skip option instructs the sniffer to completely ignore matching calls (no files, RTP analysis, or CDR creation).
2. Test by temporarily removing all capture rules
To isolate the issue, first create a backup of your GUI configuration:
  • Navigate to Tools -> Backup & Restore -> Backup GUI -> Configuration tables
  • This saves your current settings including capture rules
  • Delete all capture rules from the GUI
  • Click the Apply button to save changes
  • Reload the sniffer by clicking the green "reload sniffer" button in the control panel
  • Test if calls are now being processed correctly
  • If resolved, restore the configuration from the backup and systematically investigate the rules to identify the problematic one
3. Identify the problematic rule
  • After restoring your configuration, remove rules one at a time and reload the sniffer after each removal
  • When calls start being processed again, you have identified the problematic rule
  • Review the rule's match criteria (IP addresses, prefixes, direction) against your actual traffic pattern
  • Adjust the rule's conditions or Skip setting as needed
4. Verify rules are reloaded
Changes to capture rules are not applied to the running sniffer automatically. You must click the "reload sniffer" button in the control panel; otherwise the sniffer keeps using the previous rules.

For more information on capture rules, see Capture_rules.

Step 6: Check VoIPmonitor Logs for Errors

VoIPmonitor's own logs are often the best source of clues. Check the system log for any error messages generated by the sensor on startup or during operation.

# For Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor

# For CentOS/RHEL/AlmaLinux
tail -f /var/log/messages | grep voipmonitor

Look for errors like:

  • "pcap_open_live(eth0) error: eth0: No such device" (Wrong interface name)
  • "Permission denied" (The sensor is not running with sufficient privileges)
  • Errors related to database connectivity.
  • Messages about dropping packets.
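
Instead of following the log live, existing log files can be filtered for the error strings listed above. A minimal sketch: `voipmonitor_errors` is a hypothetical helper, and its pattern list covers only the quoted examples, so it is illustrative rather than exhaustive.

```shell
# Keep only log lines (stdin) that mention voipmonitor and match one of the
# error patterns quoted in the list above.
voipmonitor_errors() {
    grep -i 'voipmonitor' | grep -Ei 'no such device|permission denied|error'
}

# Usage:
#   voipmonitor_errors < /var/log/syslog      # Debian/Ubuntu
#   voipmonitor_errors < /var/log/messages    # CentOS/RHEL/AlmaLinux
```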

Step 7: Check for OOM (Out of Memory) Issues

If VoIPmonitor suddenly stops processing CDRs and a service restart temporarily restores functionality, the system may be experiencing OOM (Out of Memory) killer events. The Linux OOM killer terminates processes when available RAM is exhausted, and MySQL (mysqld) is a common target due to its memory-intensive nature.

1. Check for OOM killer events in kernel logs
# For Debian/Ubuntu
grep -i "out of memory\|killed process" /var/log/syslog | tail -20

# For CentOS/RHEL/AlmaLinux
grep -i "out of memory\|killed process" /var/log/messages | tail -20

# Also check dmesg:
dmesg | grep -i "killed process" | tail -10

Typical OOM killer messages look like:

Out of memory: Kill process 1234 (mysqld) score 123 or sacrifice child
Killed process 1234 (mysqld) total-vm: 12345678kB, anon-rss: 1234567kB
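
To see at a glance which processes were killed, the "Killed process" lines can be reduced to PID and process name. A sketch assuming the message format shown above; `oom_victims` is a hypothetical helper name.

```shell
# Print "pid name" for each process the OOM killer reports as killed
# (kernel log lines on stdin).
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage: dmesg | oom_victims | sort | uniq -c
```

Repeated `mysqld` entries in the output indicate that the database is the recurring OOM target.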
2. Monitor current memory usage
# Check available memory (look for a low 'available' value; 'free' alone excludes reclaimable cache)
free -h

# Check per-process memory usage (sorted by RSS)
ps aux --sort=-%mem | head -15

# Check MySQL memory usage (values are reported in kB)
grep -E "VmSize|VmRSS" /proc/$(pgrep -o -x mysqld)/status

Warning signs:

  • Available memory consistently below 500MB during operation
  • MySQL consuming most of the available RAM
  • Swap usage near 100% (if swap is enabled)
  • Frequent process restarts without clear error messages
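
The first warning sign can be checked from a script or cron job by reading `MemAvailable` from `/proc/meminfo`. A minimal sketch: the helper names are hypothetical, and the 500 MB default simply mirrors the figure in the list above.

```shell
# Print MemAvailable in MB from /proc/meminfo-format text on stdin.
mem_available_mb() {
    awk '$1 == "MemAvailable:" { print int($2 / 1024) }'
}

# Succeed if available memory is below the threshold in MB (default 500).
low_memory() {
    threshold=${1:-500}
    avail=$(mem_available_mb)
    [ -n "$avail" ] && [ "$avail" -lt "$threshold" ]
}

# Usage: low_memory < /proc/meminfo && echo "warning: low available memory"
```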
3. Solution
Increase physical memory:

The definitive solution for OOM-related CDR processing issues is to upgrade the server's physical RAM. After upgrading:

  • Verify memory improvements with free -h
  • Monitor for several days to ensure OOM events stop
  • Consider tuning innodb_buffer_pool_size in your MySQL configuration to use the additional memory effectively

Additional mitigation strategies (while planning for RAM upgrade):

  • Reduce MySQL's memory footprint by lowering innodb_buffer_pool_size (e.g., from 16GB to 8GB)
  • Disable or limit non-essential VoIPmonitor features (e.g., packet capture storage, RTP analysis)
  • Ensure swap space is properly configured as a safety buffer (though swap is much slower than RAM)
  • Use sysctl vm.swappiness=10 to favor RAM over swap when some memory is still available

Step 8: Missing CDRs for Calls with Large Packets

If VoIPmonitor is capturing some calls successfully but missing CDRs for specific calls (especially those that seem to have larger SIP packets like INVITEs with extensive SDP), there are two common causes to investigate.

Cause 1: snaplen Packet Truncation (VoIPmonitor Configuration)

The snaplen parameter in voipmonitor.conf limits how many bytes of each packet are captured. If a SIP packet exceeds snaplen, it is truncated and the sniffer may fail to parse the call correctly.

1. Check your current snaplen setting
grep snaplen /etc/voipmonitor.conf

Default is 3200 bytes (6000 if SSL/HTTP is enabled).

2. Test if packet truncation is the issue

Use tcpdump with -s0 (no practical truncation: tcpdump falls back to its default maximum snapshot length of 262144 bytes) to capture full packets:

# Capture SIP traffic with full packet length
tcpdump -i eth0 -s0 -nn port 5060 -w /tmp/test_capture.pcap

# Analyze packet sizes with Wireshark or tshark
tshark -r /tmp/test_capture.pcap -T fields -e frame.len -Y "sip" | sort -n | tail -10

If you see SIP packets larger than your snaplen value (e.g., 4000+ bytes), increase snaplen in voipmonitor.conf:

snaplen = 65535

Then restart the sniffer: systemctl restart voipmonitor.
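
To quantify the problem rather than eyeball the largest frames, the frame lengths can be compared against the configured snaplen. A sketch; `count_over_snaplen` is a hypothetical helper, and 3200 in the usage line is the default quoted above.

```shell
# Count frame lengths (one per line on stdin) that exceed the given snaplen.
count_over_snaplen() {
    awk -v limit="$1" '$1 + 0 > limit + 0 { n++ } END { print n + 0 }'
}

# Usage:
#   tshark -r /tmp/test_capture.pcap -Y sip -T fields -e frame.len | count_over_snaplen 3200
```

A non-zero result means at least some SIP frames would be truncated at that snaplen.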

Cause 2: MTU Mismatch (Network Infrastructure)

If packets are being lost or fragmented due to MTU mismatches in the network path, VoIPmonitor may never receive the complete packets, regardless of snaplen settings.

1. Diagnose MTU-related packet loss

Capture traffic with tcpdump and analyze in Wireshark:

# Capture traffic on the VoIPmonitor host
tcpdump -i eth0 -s0 host <pbx_ip_address> -w /tmp/mtu_test.pcap

Open the pcap in Wireshark and look for:

  • Reassembled PDUs marked as incomplete
  • TCP retransmissions for the same packet
  • ICMP "Fragmentation needed" messages (Type 3, Code 4)
2. Verify packet completeness

In Wireshark, examine large SIP INVITE packets. If the SIP headers or SDP appear cut off or incomplete, packets are likely being lost in transit due to MTU issues.

3. Identify the MTU bottleneck

The issue is typically a network device with a lower MTU than the end devices. Common locations:

  • VPN concentrators
  • Firewalls
  • Routers with tunnel interfaces
  • Cloud provider gateways (often limited to 1500 bytes even when the hosts use 9000-byte jumbo frames)

To locate the problematic device, trace the MTU along the network path from the PBX to the VoIPmonitor sensor.
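
One common way to trace the path MTU is to send pings with the Don't Fragment bit set and a payload sized for the MTU under test. The sketch below assumes IPv4 and the Linux iputils `ping` (`-M do` prohibits fragmentation); the helper names are hypothetical. The payload is the MTU minus 28 bytes (20-byte IPv4 header without options plus 8-byte ICMP header).

```shell
# ICMP echo payload size that yields an IPv4 packet of exactly the given MTU.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three non-fragmentable pings sized for the given MTU.
probe_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

# Usage: probe_mtu <pbx_ip_address> 1500
```

If the probe fails at one size but succeeds at a smaller one, some device on the path has an MTU between the two values.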

4. Resolution options
  • Increase MTU on the bottleneck device to match the rest of the network (e.g., from 1500 to 9000 for jumbo frame environments)
  • Make sure Path MTU Discovery (PMTUD) can work: intermediate devices must not filter ICMP "Fragmentation needed" messages
  • Ensure your switching infrastructure supports jumbo frames end-to-end if you are using them

For more information on the snaplen parameter, see Sniffer Configuration.

Appendix: tshark Display Filter Syntax for SIP

When using tshark to analyze SIP traffic, it is important to use the correct Wireshark display filter syntax. Below are common filter examples:

Basic SIP Filters

# Show all SIP INVITE messages
tshark -r capture.pcap -Y "sip.Method == INVITE"

# Show all SIP messages (any method)
tshark -r capture.pcap -Y "sip"

# Show SIP and RTP traffic
tshark -r capture.pcap -Y "sip || rtp"

Search for Specific Phone Number or Text

# Find calls containing a specific phone number (e.g., 5551234567)
tshark -r capture.pcap -Y 'sip contains "5551234567"'

# Find INVITE messages for a specific number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"'

Extract Call-ID from Matching Calls

# Get Call-ID for calls matching a phone number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"' -T fields -e sip.Call-ID

# Get Call-ID along with From and To headers
tshark -r capture.pcap -Y 'sip.Method == INVITE' -T fields -e sip.Call-ID -e sip.from.user -e sip.to.user

Filter by IP Address

# SIP traffic from a specific source IP
tshark -r capture.pcap -Y "sip && ip.src == 192.168.1.100"

# SIP traffic between two hosts
tshark -r capture.pcap -Y "sip && ip.addr == 192.168.1.100 && ip.addr == 10.0.0.50"

Filter by SIP Response Code

# Show all 200 OK responses
tshark -r capture.pcap -Y "sip.Status-Code == 200"

# Show all error responses (4xx, 5xx, and 6xx)
tshark -r capture.pcap -Y "sip.Status-Code >= 400"

# Show 486 Busy Here responses
tshark -r capture.pcap -Y "sip.Status-Code == 486"

Important Syntax Notes

  • Field names are case-sensitive: Use sip.Method, sip.Call-ID, sip.Status-Code (not sip.method or sip.call-id)
  • String matching uses contains: Use sip contains "text" (not sip.contains())
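  • Use double quotes for strings inside the filter: sip contains "number" (not single quotes). When the filter contains double quotes, wrap the whole filter in single quotes for the shell, as in the examples above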
  • Boolean operators: Use && (and), || (or), ! (not)
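
The quoting rules above are easy to get wrong when the search text comes from a variable. The following sketch builds the filter string separately and passes it to tshark inside double quotes; `invite_filter_for` is a hypothetical helper and assumes the search text contains no double quotes or backslashes.

```shell
# Build a display filter matching INVITE messages that contain the given text.
invite_filter_for() {
    printf 'sip.Method == INVITE && sip contains "%s"' "$1"
}

# Usage:
#   tshark -r capture.pcap -Y "$(invite_filter_for 5551234567)" -T fields -e sip.Call-ID
```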

For a complete reference, see the Wireshark SIP Display Filter Reference (https://www.wireshark.org/docs/dfref/s/sip.html).

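See Also

  • Sniffer_configuration - Complete configuration reference for voipmonitor.conf
  • Sniffer_distributed_architecture - Client/server deployment and troubleshooting
  • Capture_rules - GUI-based selective recording configuration
  • Sniffing_modes - Traffic forwarding methods (SPAN, ERSPAN, GRE, TZSP)
  • Scaling - Performance tuning and optimization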

AI Summary for RAG

Summary: Step-by-step troubleshooting guide for VoIPmonitor sensor not capturing calls. Steps: (1) Verify service running with systemctl status. (2) Use tshark to confirm SIP/RTP traffic reaching interface. (3) Check network config - promiscuous mode required for SPAN/RSPAN but NOT for Layer 3 tunnels (ERSPAN/GRE/TZSP/VXLAN). (4) Verify voipmonitor.conf settings: interface, sipport, filter directives. (5) Check GUI capture rules with "Skip" option blocking calls. (6) Review system logs for errors. (7) Diagnose OOM killer events causing CDR processing stops. (8) Investigate missing CDRs due to snaplen truncation or MTU mismatch. Includes tshark display filter syntax appendix.

Keywords: troubleshooting, no calls, not sniffing, no CDRs, tshark, promiscuous mode, SPAN, RSPAN, ERSPAN, GRE, TZSP, VXLAN, voipmonitor.conf, interface, sipport, filter, capture rules, Skip, syslog, OOM, out of memory, snaplen, MTU, packet truncation, display filter, sip.Method, sip.Call-ID

Key Questions:

  • Why is VoIPmonitor not recording any calls?
  • How can I check if VoIP traffic is reaching my sensor server?
  • How do I enable promiscuous mode on my network card?
  • Do I need promiscuous mode for ERSPAN or GRE tunnels?
  • What are the most common reasons for VoIPmonitor not capturing data?
  • How do I filter tshark output for SIP INVITE messages?
  • What is the correct tshark filter syntax to find a specific phone number?
  • Why is my VoIPmonitor probe stopping processing calls?
  • What does the "Skip" option in capture rules do?
  • How do I check for OOM killer events in Linux?
  • Why are CDRs missing for calls with large SIP packets?
  • What does the snaplen parameter do in voipmonitor.conf?
  • How do I diagnose MTU-related packet loss?