Sniffer troubleshooting: Difference between revisions

From VoIPmonitor.org
Latest revision as of 23:42, 9 January 2026

Sniffer Troubleshooting

This page covers common VoIPmonitor sniffer/sensor problems organized by symptom. For configuration reference, see Sniffer_configuration. For performance tuning, see Scaling.

Critical First Step: Is Traffic Reaching the Interface?

⚠️ Warning: Before any sensor tuning, verify packets are reaching the network interface. If packets aren't there, no amount of sensor configuration will help.

# Check for SIP traffic on the capture interface
tcpdump -i eth0 -nn "host <PROBLEMATIC_IP> and port 5060" -c 10

# If no packets: Network/SPAN issue - contact network admin
# If packets visible: Proceed with sensor troubleshooting below

Quick Diagnostic Checklist

Check | Command | Expected Result
Service running | systemctl status voipmonitor | Active (running)
Traffic on interface | tshark -i eth0 -c 5 -Y "sip" | SIP packets displayed
Interface errors | ip -s link show eth0 | No RX errors/drops
Promiscuous mode | ip link show eth0 | PROMISC flag present
Logs | grep voip | No critical errors
GUI rules | Settings → Capture Rules | No unexpected "Skip" rules

No Calls Being Recorded

Service Not Running

# Check status
systemctl status voipmonitor

# View recent logs
journalctl -u voipmonitor --since "10 minutes ago"

# Start/restart
systemctl restart voipmonitor

Common startup failures:

  • Interface not found: Check interface in voipmonitor.conf matches ip a output
  • Port already in use: Another process using the management port
  • License issue: Check License for activation problems

Wrong Interface or Port Configuration

# Check current config
grep -E "^interface|^sipport" /etc/voipmonitor.conf

# Example correct config:
# interface = eth0
# sipport = 5060

GUI Capture Rules Blocking

Navigate to Settings → Capture Rules and check for rules with action "Skip" that may be blocking calls. Rules are processed in order - a Skip rule early in the list will block matching calls.

See Capture_rules for detailed configuration.

SPAN/Mirror Not Configured

If tcpdump shows no traffic:

  1. Verify switch SPAN/mirror port configuration
  2. Check that both directions (ingress + egress) are mirrored
  3. Confirm VLAN tagging is preserved if needed
  4. Test physical connectivity (cable, port status)

See Sniffing_modes for SPAN, RSPAN, and ERSPAN configuration.

Filter Parameter Too Restrictive

If filter is set in voipmonitor.conf, it may exclude traffic:

# Check filter
grep "^filter" /etc/voipmonitor.conf

# Temporarily disable to test
# Comment out the filter line and restart


Missing id_sensor Parameter

Symptom: SIP packets visible in Capture/PCAP section but missing from CDR, SIP messages, and Call flow.

Cause: The id_sensor parameter is not configured or is missing. This parameter is required to associate captured packets with the CDR database.

Solution:

# Check if id_sensor is set
grep "^id_sensor" /etc/voipmonitor.conf

# Add the parameter if it is missing (if it is present but wrong, edit that line instead of appending)
echo "id_sensor = 1" >> /etc/voipmonitor.conf

# Restart the service
systemctl restart voipmonitor

💡 Tip: Use a unique numeric identifier (1-65535) for each sensor. Essential for multi-sensor deployments. See id_sensor documentation.
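
Appending with echo adds a second id_sensor line if one already exists. A minimal sketch of an idempotent alternative follows; set_conf_param is a hypothetical helper name, not part of VoIPmonitor, and it assumes the simple "key = value" layout shown above.

```shell
# Set "KEY = VALUE" in a config file: rewrite an existing uncommented line,
# or append a new one if the key is absent.
# set_conf_param is a hypothetical helper, not a VoIPmonitor tool.
set_conf_param() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        # Key present: rewrite the line in a temp copy, then write it back
        # (cat keeps the original file's owner and permissions).
        tmp=$(mktemp) || return 1
        sed -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return $status
    else
        # Key absent: append it.
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

Usage would look like: set_conf_param /etc/voipmonitor.conf id_sensor 1, followed by the service restart shown above.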

Missing Audio / RTP Issues

One-Way Audio (Asymmetric Mirroring)

Symptom: SIP recorded but only one RTP direction captured.

Cause: SPAN port configured for only one direction.

Diagnosis:

# Count RTP packets per direction (capture for 30 seconds, then summarize)
tshark -i eth0 -a duration:30 -Y "rtp" -T fields -e ip.src -e ip.dst | sort | uniq -c

If one direction shows 0 or very few packets, configure the switch to mirror both ingress and egress traffic.
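
To make a missing direction easier to spot, the per-direction counts can be paired up. The following is a rough sketch only: rtp_direction_report is a hypothetical helper that reads the two-column "source destination" output of the tshark command above on stdin, and the ONE-WAY label is its own convention.

```shell
# Reads "src dst" lines (tshark -T fields output) on stdin and prints, for each
# IP pair, the packet count in each direction. Pairs with no return traffic
# are flagged ONE-WAY. rtp_direction_report is a hypothetical helper.
rtp_direction_report() {
    awk '
        NF >= 2 { count[$1 " " $2]++ }
        END {
            for (k in count) {
                split(k, ip, " ")
                rev = ip[2] " " ip[1]
                # Report each pair once: skip the reverse key of a two-way pair.
                if ((rev in count) && ip[1] > ip[2]) continue
                r = (rev in count) ? count[rev] : 0
                status = (r == 0) ? "ONE-WAY" : "ok"
                printf "%s <-> %s: %d / %d %s\n", ip[1], ip[2], count[k], r, status
            }
        }' | sort
}
```

Usage: pipe the tshark output into rtp_direction_report instead of sort | uniq -c. Every line marked ONE-WAY is a candidate for asymmetric mirroring.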

RTP Not Associated with Call

Symptom: Audio plays in sniffer but not in GUI, or RTP listed under wrong call.

Possible causes:

1. SIP and RTP on different interfaces/VLANs:

# voipmonitor.conf - enable automatic RTP association
auto_enable_use_blocks = yes

2. NAT not configured:

# voipmonitor.conf - for NAT scenarios
natalias = <public_ip> <private_ip>

# If not working, try reversed order:
natalias = <private_ip> <public_ip>

3. External device modifying media ports:

If SDP advertises one port but RTP arrives on different port (SBC/media server issue):

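# Compare SDP ports vs actual RTP
# Media ports advertised in the INVITE SDP (m= lines)
tshark -r call.pcap -Y "sip.Method == INVITE" -T fields -e sdp.media.port
# Destination ports actually used by RTP
tshark -r call.pcap -Y "rtp" -T fields -e udp.dstport | sort -u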

If ports don't match, the external device must be configured to preserve SDP ports - VoIPmonitor cannot compensate.
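
The comparison itself can be automated once both port lists are saved to files. This is a sketch under assumptions: unadvertised_rtp_ports is a hypothetical helper, and both input files are assumed to hold one port number per line.

```shell
# Print RTP destination ports that were never advertised in SDP.
# $1: file with one SDP media port per line
# $2: file with one RTP destination port per line
# unadvertised_rtp_ports is a hypothetical helper.
unadvertised_rtp_ports() {
    # -x: whole-line match, -F: fixed strings, -v: keep lines NOT listed in $1
    sort -u "$2" | grep -vxFf "$1"
}
```

Any port it prints is being used for RTP without having appeared in SDP, which points to the external device rewriting media ports.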

RTP Incorrectly Associated with Wrong Call (PBX Port Reuse)

Symptom: RTP streams from one call appear associated with a different CDR when your PBX aggressively reuses the same IP:port across multiple calls.

Cause: When PBX reuses media ports, VoIPmonitor may incorrectly correlate RTP packets to the wrong call based on weaker correlation methods.

Solution: Enable rtp_check_both_sides_by_sdp to require verification of both source and destination IP:port against SDP:

# voipmonitor.conf - require both source and destination to match SDP
rtp_check_both_sides_by_sdp = yes

# Alternative (strict) mode - allows initial unverified packets.
# Set only one of the two values; to use strict, replace the line above with:
# rtp_check_both_sides_by_sdp = strict

⚠️ Warning: Enabling this may prevent RTP association for calls using NAT, as the source IP:port will not match the SDP. Use natalias mappings or the strict setting to mitigate this.

Snaplen Truncation

Symptom: Large SIP messages truncated, incomplete headers.

Solution:

# voipmonitor.conf - increase packet capture size
snaplen = 8192

For Kamailio siptrace, also check trace_msg_fragment_size in Kamailio config. See snaplen documentation.

PACKETBUFFER Saturation

Symptom: Log shows PACKETBUFFER: memory is FULL, truncated RTP recordings.

Diagnose: I/O vs CPU Bottleneck

# voipmonitor.conf - disable storage temporarily to test
savesip = no
savertp = no
savertcp = no
savegraph = no

Restart and monitor. If problem disappears → disk I/O bottleneck. If problem persists → CPU bottleneck.

Solution: I/O Bottleneck

  • Upgrade to faster storage (SSD, NVMe)
  • Use RAID with write cache
  • Move to client/server mode with dedicated storage server

Solution: CPU Bottleneck

For 8,000-10,000 concurrent calls:

# voipmonitor.conf
rtpthreads_start = 20
threading_expanded = high_traffic
max_buffer_mem = 10000

ℹ️ Note: After changes, monitor syslog heap[A

Storage Hardware Failure

Symptom: Sensor shows disconnected (red X) with "DROPPED PACKETS" at low traffic volumes.

Diagnosis:

# Check disk health
smartctl -a /dev/sda

# Check RAID status (if applicable)
cat /proc/mdstat
mdadm --detail /dev/md0

Look for reallocated sectors, pending sectors, or RAID degraded state. Replace failing disk.
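
The attributes worth looking at can be filtered out of the SMART table. A minimal sketch, assuming the classic ATA attribute table printed by smartctl -A (raw value in the tenth column); NVMe devices report a different format and are not handled. smart_failing_attrs is a hypothetical helper name.

```shell
# Reads `smartctl -A` output on stdin and prints the sector-health attributes
# whose raw value is non-zero. Assumes the ATA attribute table layout:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# smart_failing_attrs is a hypothetical helper.
smart_failing_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
        print $2 "=" $10
    }'
}
```

Usage: smartctl -A /dev/sda | smart_failing_attrs. No output means none of the three counters is above zero; it does not prove the disk is healthy.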

OOM (Out of Memory)

Identify OOM Victim

# Check for OOM kills
dmesg | grep -i "out of memory\|oom\|killed process"
journalctl --since "1 hour ago" | grep -i oom
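
To see at a glance which process is the usual victim, the kernel messages can be tallied. This sketch assumes the common kernel log wording "Killed process <pid> (<name>)"; the exact text varies between kernel versions, and oom_victims is a hypothetical helper name.

```shell
# Reads kernel log text on stdin and prints "count name" for each OOM-killed
# process, most frequent first. Assumes lines containing
# "Killed process <pid> (<name>)". oom_victims is a hypothetical helper.
oom_victims() {
    sed -n 's/.*[Kk]illed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Usage: dmesg | oom_victims. If mysqld or voipmonitor tops the list, continue with the matching subsection below.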

MySQL Killed by OOM

Reduce InnoDB buffer pool:

# /etc/mysql/my.cnf
innodb_buffer_pool_size = 2G  # Lower than the currently configured value

Voipmonitor Killed by OOM

Reduce buffer sizes in voipmonitor.conf:

max_buffer_mem = 2000  # Reduce from default
ringbuffer = 50        # Reduce from default

Runaway External Process

# Find memory-hungry processes
ps aux --sort=-%mem | head -20

# Kill orphaned/runaway process
kill -9 <PID>

For servers limited to 16GB RAM, or when mysqld is repeatedly killed by the OOM killer, cap the InnoDB buffer pool and relax redo log flushing in the MySQL configuration:

# /etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf
[mysqld]
# On 16GB server: 6GB buffer pool + 6GB MySQL overhead = 12GB total
# Leaves 4GB for OS + GUI, preventing OOM
innodb_buffer_pool_size = 6G

# Flush the redo log to disk about once per second instead of at every commit
# (up to ~1s of transactions can be lost on an OS crash or power failure; reduces write load)
innodb_flush_log_at_trx_commit = 2
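
The arithmetic in the 16GB comment can be reproduced for other RAM sizes. This is only a sketch: it assumes, by extrapolating from that single example, that MySQL overhead is about as large as the buffer pool and that 4GB is reserved for the OS and GUI. suggest_buffer_pool_gb is a hypothetical helper, and the result is a starting point, not a tuned value.

```shell
# Suggest innodb_buffer_pool_size in GB.
# $1: total RAM in GB
# $2: GB reserved for OS + GUI (default 4, as in the 16GB example)
# Assumption: MySQL overhead is roughly equal to the buffer pool size.
# suggest_buffer_pool_gb is a hypothetical helper.
suggest_buffer_pool_gb() {
    total_gb=$1
    reserve_gb=${2:-4}
    echo $(( (total_gb - reserve_gb) / 2 ))
}
```

For a 16GB server this gives (16 - 4) / 2 = 6, matching the value in the configuration above.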

Restart MySQL after changes:

systemctl restart mysql
# or
systemctl restart mariadb

SQL Queue Growth from Non-Call Data

If sip-register, sip-options, or sip-subscribe are enabled, non-call SIP messages (OPTIONS, REGISTER, SUBSCRIBE, NOTIFY) accumulate in the database and the SQL queue can grow without bound. This increases MySQL memory usage and can lead to the OOM killer terminating mysqld.

⚠️ Warning: Even with reduced innodb_buffer_pool_size, SQL queue will grow indefinitely without cleanup of non-call data.

Solution: Enable automatic cleanup of old non-call data

# /etc/voipmonitor.conf
# cleandatabase = 2555 automatically deletes partitions older than 2555 days (about 7 years)
# Covers: CDR, register_state, register_failed, and sip_msg (OPTIONS/SUBSCRIBE/NOTIFY)
cleandatabase = 2555

Restart the sniffer after changes:

systemctl restart voipmonitor

ℹ️ Note: See Data_Cleaning for detailed configuration options and other cleandatabase_* parameters.

Service Startup Failures

Interface No Longer Exists

After OS upgrade, interface names may change (eth0 → ensXXX):

# Find current interface names
ip a

# Update all config locations
grep -r "interface" /etc/voipmonitor.conf /etc/voipmonitor.conf.d/

# Also check GUI: Settings → Sensors → Configuration
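
Checking the configured names against what the kernel currently exposes can be scripted. A minimal sketch, assuming a Linux host (interfaces listed under /sys/class/net) and a plain "interface = a,b" line in the config; both function names are hypothetical helpers.

```shell
# Print each interface named on the "interface =" line of a
# voipmonitor.conf-style file, one per line.
configured_interfaces() {
    sed -n 's/^[[:space:]]*interface[[:space:]]*=[[:space:]]*//p' "$1" |
        tr ',' '\n' | sed 's/[[:space:]]//g' | grep -v '^$'
}

# Print configured interfaces that do not exist on this host
# (Linux only: relies on /sys/class/net).
missing_interfaces() {
    for ifc in $(configured_interfaces "$1"); do
        [ -e "/sys/class/net/$ifc" ] || echo "$ifc"
    done
}
```

Usage: missing_interfaces /etc/voipmonitor.conf. Any name it prints needs to be replaced with one of the names shown by ip a.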

Missing Dependencies

# Install common missing package
apt install libpcap0.8  # Debian/Ubuntu
yum install libpcap     # RHEL/CentOS

Network Interface Issues

Promiscuous Mode

Required for SPAN port monitoring:

# Enable
ip link set eth0 promisc on

# Verify
ip link show eth0 | grep PROMISC

ℹ️ Note: Promiscuous mode is NOT required for ERSPAN/GRE tunnels where traffic is addressed to the sensor.

Interface Drops

# Check for drops (the counters are on the line after each header, hence -A1)
ip -s link show eth0 | grep -A1 -i dropped

# If drops present, increase ring buffer
ethtool -G eth0 rx 4096
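
For monitoring scripts, the RX drop counter can be extracted as a single number. This sketch assumes the usual ip -s link layout, where the RX header line is followed by a line of counters with "dropped" in the fourth column; rx_dropped is a hypothetical helper name.

```shell
# Reads `ip -s link show <if>` output on stdin and prints the RX "dropped"
# counter. Assumes the counters line directly follows the "RX:" header and
# that "dropped" is the fourth column. rx_dropped is a hypothetical helper.
rx_dropped() {
    awk '/^[ \t]*RX:/ { getline; print $4; exit }'
}
```

Usage: ip -s link show eth0 | rx_dropped. Running it twice a few seconds apart shows whether the counter is still increasing after the ring buffer change.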

Bonded/EtherChannel Interfaces

Symptom: False packet loss when monitoring bond0 or br0.

Solution: Monitor physical interfaces, not logical:

# voipmonitor.conf - use physical interfaces
interface = eth0,eth1

Network Offloading Issues

Symptom: Kernel errors like bad gso: type: 1, size: 1448

# Disable offloading on capture interface
ethtool -K eth0 gso off tso off gro off lro off

Packet Ordering Issues

If SIP messages appear out of sequence:

First: Rule out Wireshark display artifact - disable "Analyze TCP sequence numbers" in Wireshark. See FAQ.

If genuine reordering: Usually caused by packet bursts in the network infrastructure. Use tcpdump to verify that packets arrive out of order at the interface. Work with the network admin to implement QoS or traffic shaping. For persistent issues, consider a dedicated capture card with hardware timestamping (see Napatech).

ℹ️ Note: For out-of-order packets in client/server mode (multiple sniffers), see Sniffer_distributed_architecture for pcap_queue_dequeu_window_length configuration.

Solutions for SPAN/Mirroring Reordering

If packets arrive out of order at the SPAN/mirror port (e.g., 302 responses before INVITE causing "000 no response" errors):

1. Configure switch to preserve packet order: Many switches allow configuring SPAN/mirror ports to maintain packet ordering. Consult your switch documentation for packet ordering guarantees in mirroring configuration.

2. Replace SPAN with TAP or packet broker: Unlike software-based SPAN mirroring, hardware TAPs and packet brokers guarantee packet order. Consider upgrading to a dedicated TAP or packet broker device for mission-critical monitoring.

Database Issues

SQL Queue Overload

Symptom: Growing SQLq metric, potential coredumps.

# voipmonitor.conf - batch more CDR inserts per query and skip the Call-ID existence check
mysqlstore_concat_limit_cdr = 1000
cdr_check_exists_callid = 0

Error 1062 - Lookup Table Limit

Symptom: Duplicate entry '16777215' for key 'PRIMARY'

Quick fix:

# voipmonitor.conf
cdr_reason_string_enable = no

See Database Troubleshooting for complete solution.

Bad Packet Errors

Symptom: bad packet with ether_type 0xFFFF detected on interface

Diagnosis:

# Run diagnostic (let run 30-60 seconds, then kill)
voipmonitor --check_bad_ether_type=eth0

# Find and kill the diagnostic process
ps ax | grep voipmonitor
kill -9 <PID>

Causes: corrupted packets, driver issues, VLAN tagging problems. Check ethtool -S eth0 for interface errors.

Useful Diagnostic Commands

tshark Filters for SIP

# All SIP INVITEs
tshark -r capture.pcap -Y "sip.Method == INVITE"

# Find specific phone number
tshark -r capture.pcap -Y 'sip contains "5551234567"'

# Get Call-IDs
tshark -r capture.pcap -Y "sip.Method == INVITE" -T fields -e sip.Call-ID

# SIP errors (4xx, 5xx)
tshark -r capture.pcap -Y "sip.Status-Code >= 400"

Interface Statistics

# Detailed NIC stats
ethtool -S eth0

# Watch packet rates
watch -n 1 'cat /proc/net/dev | grep eth0'

See Also

  • Database_troubleshooting - Database issues
  • FAQ - Common questions and Wireshark display issues

AI Summary for RAG

Summary: Troubleshooting guide for VoIPmonitor sniffer/sensor issues organized by symptom. CRITICAL FIRST STEP: Run tcpdump -i eth0 -nn "host <IP> and port 5060" before any sensor tuning - if no packets visible, it's a network/SPAN issue, not sensor. Main problem categories: (1) No calls recorded - check service status, interface config, sipport, GUI capture rules, SPAN configuration; (2) Missing audio - check for asymmetric mirroring (one-way SPAN), NAT config with natalias, auto_enable_use_blocks for SIP/RTP on different NICs; (3) PACKETBUFFER saturation - diagnose I/O vs CPU by temporarily disabling savesip/savertp, for CPU bottleneck use rtpthreads_start=20, max_buffer_mem=10000; (4) Storage failure - smartctl diagnostics, RAID status; (5) OOM - identify victim in dmesg, reduce innodb_buffer_pool_size or max_buffer_mem; (6) Service startup - interface name changes after OS upgrade, missing libpcap; (7) Network issues - promiscuous mode, interface drops, bonded interfaces (use physical not logical), offloading (disable gso/tso/gro); (8) Database - SQL queue overload, Error 1062 lookup table limit. For packet ordering issues, first rule out Wireshark display artifact, then investigate network packet bursts. Bad ether_type errors: diagnose with voipmonitor --check_bad_ether_type=eth0.

Keywords: troubleshooting, no calls, PACKETBUFFER, OOM, tcpdump, tshark, SPAN, RSPAN, ERSPAN, interface, sipport, filter, capture rules, snaplen, asymmetric mirroring, one-way audio, natalias, NAT, auto_enable_use_blocks, rtpthreads_start, max_buffer_mem, storage failure, smartctl, promiscuous mode, bonded interface, EtherChannel, network offloading, gso, tso, packet ordering, SQL queue, Error 1062, bad ether_type, service startup, interface name change

Key Questions:

  • Why is VoIPmonitor not recording any calls?
  • How do I verify packets are reaching the capture interface?
  • What causes PACKETBUFFER saturation?
  • How do I diagnose if PACKETBUFFER issue is I/O or CPU bottleneck?
  • Why is only one direction of audio being recorded?
  • How do I configure natalias for NAT scenarios?
  • What causes RTP to not be associated with the correct call?
  • Why does the sensor show disconnected with dropped packets at low traffic?
  • How do I check for OOM kills?
  • Why does the service fail to start after OS upgrade?
  • Do I need promiscuous mode for ERSPAN?
  • Why does VoIPmonitor report false packet loss on bonded interfaces?
  • How do I diagnose bad ether_type packet errors?
  • What tshark filters are useful for SIP troubleshooting?