{{DISPLAYTITLE:Troubleshooting: No Calls Being Sniffed}}
'''This guide provides a systematic, step-by-step process to diagnose why the VoIPmonitor sensor might not be capturing any calls. Follow the steps in order to quickly identify and resolve the most common issues.''' For configuration reference, see [[Sniffer_configuration]]; for performance tuning, see [[Scaling]].
== Troubleshooting Flowchart ==
{{Warning|Before any sensor tuning, verify packets are reaching the network interface. If packets aren't there, no amount of sensor configuration will help.}}

<mermaid>
flowchart TD
    A[No Calls Being Captured] --> B{Step 1: Service Running?}
    B -->|No| B1[systemctl restart voipmonitor]
    B -->|Yes| C{Step 2: Traffic on Interface?<br/>tshark -i eth0 -Y 'sip'}
    C -->|No packets| D[Step 3: Network Issue]
    D --> D1{Interface UP?}
    D1 -->|No| D2[ip link set dev eth0 up]
    D1 -->|Yes| D3{SPAN/RSPAN?}
    D3 -->|Yes| D4[Enable promisc mode]
    D3 -->|ERSPAN/GRE/TZSP| D5[Check tunnel config]
    C -->|Packets visible| E[Step 4: VoIPmonitor Config]
    E --> E1{interface correct?}
    E1 -->|No| E2[Fix interface in voipmonitor.conf]
    E1 -->|Yes| E3{sipport correct?}
    E3 -->|No| E4[Add port: sipport = 5060,5080]
    E3 -->|Yes| E5{BPF filter blocking?}
    E5 -->|Maybe| E6[Comment out filter directive]
    E5 -->|No| F[Step 5: GUI Capture Rules]
    F --> F1{Rules with Skip: ON?}
    F1 -->|Yes| F2[Remove/modify rules + reload sniffer]
    F1 -->|No| G[Step 6: Check Logs]
    G --> H{OOM Events?}
    H -->|Yes| H1[Step 7: Add RAM / tune MySQL]
    H -->|No| I{Large SIP packets?}
    I -->|Yes| I1{External SIP source?<br/>Kamailio/HAProxy mirror}
    I1 -->|No| I2[Increase snaplen in voipmonitor.conf]
    I1 -->|Yes| I3[Fix external source: Kamailio siptrace or HAProxy tee]
    I2 --> I4[If snaplen change fails, recheck with tcpdump -s0]
    I4 --> I1
    I -->|No| J[Contact Support]
</mermaid>
== Post-Reboot Verification Checklist ==

After a planned server reboot, verify these critical items to ensure VoIPmonitor operates correctly. This check helps identify issues caused by configurations that were not persisted across reboots.

=== Verify Firewall/Iptables Rules ===

After a system restart, verify that firewall rules have been correctly applied and are allowing the necessary traffic. Firewall rules may need to be manually re-applied if they were not made persistent.

;1. Check current firewall status:
<syntaxhighlight lang="bash">
# For systems using iptables
iptables -L -n -v

# For systems using firewalld
firewall-cmd --list-all

# For systems using ufw
ufw status verbose
</syntaxhighlight>
;2. Verify critical ports are allowed:
Ensure the firewall permits traffic on the following VoIPmonitor ports:
* SIP ports (default: 5060/udp, or your configured <code>sipport</code> values)
* RTP ports (the range used by your PBX)
* GUI access (typically 80/tcp and 443/tcp)
* Sensor management port: 5029/tcp
* Client-Server connection port: 60024/tcp (for distributed setups)
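The list above can be turned into commands in one pass. A minimal sketch, assuming firewalld and the default ports listed (the port list is an assumption; edit it to match your actual <code>sipport</code> values and RTP range):

<syntaxhighlight lang="bash">
# Build the firewall-cmd invocations for the VoIPmonitor ports listed above.
# The port list mirrors the defaults in this checklist; adjust as needed.
cmds=""
for port in 5060/udp 80/tcp 443/tcp 5029/tcp 60024/tcp; do
    cmds="${cmds}firewall-cmd --permanent --add-port=${port}
"
done
cmds="${cmds}firewall-cmd --reload"
printf '%s\n' "$cmds"   # review the commands, then run them as root
</syntaxhighlight>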
;3. Make firewall rules persistent:
To prevent firewall rules from being lost after future reboots:

'''For iptables (Debian/Ubuntu):'''
<syntaxhighlight lang="bash">
# Install the persistence package first if not present
# (it creates /etc/iptables and restores the rules at boot)
apt-get install iptables-persistent

# Save the current rules
iptables-save > /etc/iptables/rules.v4
</syntaxhighlight>
| | |- |
| | | Interface errors || <code>ip -s link show eth0</code> || No RX errors/drops |
| | |- |
| | | Promiscuous mode || <code>ip link show eth0</code> || PROMISC flag present |
| | |- |
| | | Logs || <code>tail -100 /var/log/syslog \| grep voip</code> || No critical errors |
| | |- |
| | | GUI rules || Settings → Capture Rules || No unexpected "Skip" rules |
| | |} |
'''For firewalld (CentOS/RHEL):'''
<syntaxhighlight lang="bash">
# Rules added with the --permanent flag persist across reboots
firewall-cmd --permanent --add-port=5060/udp
firewall-cmd --permanent --add-port=60024/tcp
firewall-cmd --reload
</syntaxhighlight>
=== Verify System Time Synchronization ===

Correct system time synchronization is '''critical''', especially when using the <code>packetbuffer_sender</code> option in distributed architectures. Time mismatches between hosts and servers can cause call correlation failures and dropped packets.
;1. Check current NTP/chrony status:
<syntaxhighlight lang="bash">
# For systems using NTP
ntpstat

# For systems using chrony
chronyc tracking
</syntaxhighlight>

;2. Verify time synchronization with servers:
<syntaxhighlight lang="bash">
# For NTP
ntpq -p

# For chrony
chronyc sources -v
</syntaxhighlight>

'''Expected output:''' The time offset should be minimal (ideally under 100 milliseconds). Large offsets (several seconds) indicate synchronization problems.
;3. Manual sync if needed (temporary fix):
<syntaxhighlight lang="bash">
# Force an immediate NTP sync
sudo systemctl restart ntp

# For chrony
sudo chronyc makestep
</syntaxhighlight>
'''Critical for packetbuffer_sender mode:''' When using <code>packetbuffer_sender = yes</code> to forward raw packets from remote sensors to a central server, the host and server '''must have synchronized clocks'''. VoIPmonitor requires matching host and server times for proper call correlation and packet processing. The maximum allowed time difference is 2 seconds by default (configurable via <code>client_server_connect_maximum_time_diff_s</code>).
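For reference, a sketch of the relevant <code>voipmonitor.conf</code> directives on a remote sensor (the values shown are the documented defaults, not tuning advice):

<syntaxhighlight lang="ini">
; forward raw packets from this sensor to the central server
packetbuffer_sender = yes

; maximum tolerated host/server clock difference, in seconds (default: 2)
client_server_connect_maximum_time_diff_s = 2
</syntaxhighlight>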
;4. Check distributed architecture time sync:
In Client-Server mode, ensure all sensors and the central server are synchronized to the same NTP servers:

<syntaxhighlight lang="bash">
# On each sensor and on the central server
timedatectl status
</syntaxhighlight>

Look for: <code>System clock synchronized: yes</code>
If times are not synchronized across distributed components:
* Verify all systems point to the same reliable NTP source
* Check that the firewall allows UDP port 123 (NTP)
* Ensure timezones are consistent across all systems
'''Troubleshooting time sync issues:'''
* Verify NTP servers are reachable: <code>ping pool.ntp.org</code>
* Review the NTP configuration: <code>/etc/ntp.conf</code> or <code>/etc/chrony.conf</code>
* Ensure the time service is enabled to start on boot: <code>systemctl enable ntp</code>
== Step 1: Is the VoIPmonitor Service Running Correctly? ==
First, confirm that the sensor process is active and has loaded the correct configuration file.
;1. Check the service status (for modern systemd systems):
<syntaxhighlight lang="bash">
systemctl status voipmonitor
</syntaxhighlight>
Look for a line that says <code>Active: active (running)</code>. If the service is inactive or failed, restart it with <code>systemctl restart voipmonitor</code> and check the status again.
;2. Verify the running process:
<syntaxhighlight lang="bash">
ps aux | grep voipmonitor
</syntaxhighlight>
This command shows the running process and the exact command-line arguments it was started with. Critically, ensure it is using the correct configuration file, for example: <code>--config-file /etc/voipmonitor.conf</code>. If it is not, there may be an issue with your startup script.
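The configuration path can also be pulled straight out of the process table. A small sketch (the fallback sample line is only there so the extraction can be demonstrated on a machine without a running sensor):

<syntaxhighlight lang="bash">
# Print only the voipmonitor command line, then extract the --config-file value.
cmdline=$(ps -o args= -C voipmonitor 2>/dev/null)
# Fallback sample so the sed extraction can be demonstrated anywhere:
[ -n "$cmdline" ] || cmdline="/usr/local/sbin/voipmonitor --config-file /etc/voipmonitor.conf"
cfg=$(printf '%s\n' "$cmdline" | sed -n 's/.*--config-file[= ]\([^ ]*\).*/\1/p' | head -n 1)
echo "config file in use: ${cfg:-<none specified>}"
</syntaxhighlight>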
=== Troubleshooting: Missing Package or Library Dependencies ===

If the sensor service fails to start or crashes immediately with an error about a "missing package" or "missing library," a required system dependency is not installed on the server. This is most common on newly installed sensors or fresh operating system installations.
;1. Check the system logs for the specific error message:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor

# For CentOS/RHEL/AlmaLinux or other systemd systems
journalctl -u voipmonitor -f
</syntaxhighlight>
;2. Common missing packages for sensors:
Most missing-package issues on sensors are resolved by installing the <code>rrdtool</code> package, which is required for RRD (Round-Robin Database) graphing and statistics functionality.

<syntaxhighlight lang="bash">
# For Debian/Ubuntu
apt-get update && apt-get install rrdtool

# For CentOS/RHEL/AlmaLinux
yum install rrdtool
# OR
dnf install rrdtool
</syntaxhighlight>
;3. Other frequently missing dependencies:
If the error references a specific shared library or binary, install it using your package manager. Common examples:

* <code>libpcap</code> or <code>libpcap-dev</code>: packet capture library
* <code>libssl</code> or <code>libssl-dev</code>: SSL/TLS support
* <code>zlib</code> or <code>zlib1g-dev</code>: compression library
;4. Verify shared library dependencies:
If the error mentions a specific shared library (e.g., <code>error while loading shared libraries: libxxx.so</code>), check which libraries the binary is trying to load:

<syntaxhighlight lang="bash">
ldd /usr/local/sbin/voipmonitor | grep pcap
</syntaxhighlight>

If <code>ldd</code> reports "not found," install the missing library using your package manager.
;5. After installing the missing package, restart the sensor service:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
systemctl status voipmonitor
</syntaxhighlight>

Verify the service starts successfully and now shows <code>Active: active (running)</code>.
=== Troubleshooting: Cron Daemon Not Running ===

If the VoIPmonitor sniffer service fails to start or the sensor appears unavailable despite the systemd service being configured correctly, the cron daemon may not be running. Some VoIPmonitor deployment methods and maintenance scripts rely on cron for proper initialization and periodic tasks.
;1. Check the cron daemon status:
<syntaxhighlight lang="bash">
systemctl status cron
</syntaxhighlight>

Look for <code>Active: active (running)</code>. If the status shows inactive or failed, the cron daemon is not running.
;2. Alternative check for systems using crond:
<syntaxhighlight lang="bash">
systemctl status crond
</syntaxhighlight>

Note: On CentOS/RHEL systems the service is typically named <code>crond</code>, while on Debian/Ubuntu systems it is named <code>cron</code>.
;3. Start the cron daemon if it is inactive:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu systems
systemctl start cron

# For CentOS/RHEL/AlmaLinux systems
systemctl start crond
</syntaxhighlight>
;4. Enable cron to start automatically on boot:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu systems
systemctl enable cron

# For CentOS/RHEL/AlmaLinux systems
systemctl enable crond
</syntaxhighlight>
;5. After starting the cron daemon, restart the VoIPmonitor service:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
systemctl status voipmonitor
</syntaxhighlight>

Verify the service now shows <code>Active: active (running)</code> and the sensor becomes visible in the GUI.
=== Root Cause ===
An inactive cron daemon can prevent VoIPmonitor from starting properly when:
* Installation scripts use cron for post-install configuration
* Maintenance or cleanup jobs are required for proper sensor operation
* System initialization processes depend on cron-based tasks
* The sensor was recently rebooted or upgraded and cron failed to start
=== Long-Term Stability ===
If the cron daemon consistently fails to start after reboots:
* Check system logs for cron startup errors: <code>journalctl -u cron -n 50</code> or <code>journalctl -u crond -n 50</code>
* Verify that the server has sufficient resources (CPU, memory) to run all required system services
* Investigate performance bottlenecks that may prevent system services from starting
* Ensure no other system services conflict with or prevent cron from starting
== Step 2: Is Network Traffic Reaching the Server? ==
If the service is running, the next step is to verify whether the VoIP packets (SIP/RTP) are actually arriving at the server's network interface. The best tool for this is <code>tshark</code> (the command-line version of Wireshark).
;1. Install tshark:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
apt-get update && apt-get install tshark

# For CentOS/RHEL/AlmaLinux
yum install wireshark
</syntaxhighlight>
;2. Listen for SIP traffic on the correct interface:
Replace <code>eth0</code> with the interface name you have configured in <code>voipmonitor.conf</code>.
<syntaxhighlight lang="bash">
tshark -i eth0 -Y "sip || rtp" -n
</syntaxhighlight>
* '''If you see a continuous stream of SIP and RTP packets''', traffic is reaching the server and the problem is likely in VoIPmonitor's configuration (see Step 4).
* '''If you see NO packets''', the problem lies with your network configuration. Proceed to Step 3.
== Step 3: Troubleshoot Network and Interface Configuration ==
If <code>tshark</code> shows no traffic, the packets are not being delivered to the operating system correctly.
;1. Check if the interface is UP:
Ensure the network interface is active.
<syntaxhighlight lang="bash">
ip link show eth0
</syntaxhighlight>
The output should contain the word <code>UP</code>. If it doesn't, bring the interface up with:
<syntaxhighlight lang="bash">
ip link set dev eth0 up
</syntaxhighlight>
;2. Check for Interface Packet Drops:
If calls are missing, show "000" as the last response, or have silent audio, the root cause may be packets being dropped at the '''network interface level''' before they ever reach VoIPmonitor. This is different from sensor resource limitations.

Check the interface statistics for packet drops on the sniffing interface:

<syntaxhighlight lang="bash">
# Show detailed interface statistics, packet errors, and drops
# (ip -s -s l l eth0 is the short form of the same command)
ip -s -s link show eth0
</syntaxhighlight>

The output shows RX (receive) and TX (transmit) statistics. Look specifically at the <code>dropped</code> counter in the <code>RX</code> section:
<pre>
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    12345123   45678    12      5432    0       0
    TX: bytes  packets  errors  dropped carrier collsns
    9876543    23456    5       0       0       0
</pre>
{| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
|-
! colspan="2" style="background:#ffc107;" | Critical: Interface Packet Drops vs Sensor Drops
|-
| style="vertical-align: top;" | '''Interface drops (kernel level):'''
| Packets dropped by the network card driver BEFORE reaching VoIPmonitor. Use <code>ip -s -s link show</code> to check. The root cause is network infrastructure: switch overload, duplex mismatch, or a faulty NIC/switch port.
|-
| style="vertical-align: top;" | '''Sensor drops (VoIPmonitor level):'''
| Packets received by the OS but dropped by VoIPmonitor due to high CPU load, an insufficient ringbuffer, or configuration limits. Check the "# packet drops" counter in GUI Settings → Sensors and the <code>t0CPU</code> metric in the logs.
|}
|
;Diagnosing Interface Packet Drops:

'''Step 1: Check whether the dropped counter is increasing:'''

Run the interface statistics command several times while making test calls:

<syntaxhighlight lang="bash">
# First measurement
ip -s -s link show eth0 | grep -A 1 "RX:"

# Make a test call during the window in which you expect drops

# Second measurement, 10-30 seconds later
ip -s -s link show eth0 | grep -A 1 "RX:"
</syntaxhighlight>

If the <code>dropped</code> value increases between measurements during test calls, the interface is losing packets due to infrastructure issues.
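The two measurements can also be compared programmatically. A small sketch that extracts the RX <code>dropped</code> counter with awk, demonstrated here against the sample output shown earlier (in live use, feed it <code>ip -s -s link show eth0</code> instead; the second reading is hypothetical):

<syntaxhighlight lang="bash">
# Print the 4th field (dropped) of the line following the "RX: bytes" header.
rx_dropped() {
    awk '/RX: bytes/ { getline; print $4; exit }'
}

sample='RX: bytes  packets  errors  dropped overrun mcast
12345123   45678    12      5432    0       0'

before=$(printf '%s\n' "$sample" | rx_dropped)
# Live usage: before=$(ip -s -s link show eth0 | rx_dropped); sleep 30; repeat
after=5490   # hypothetical second reading, for illustration only
echo "RX drops increased by $((after - before)) during the test window"
</syntaxhighlight>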
'''Step 2: Identify the root cause of interface packet drops:'''

Common causes of network interface packet drops:

* '''Network switch port overload:''' The switch port connected to the VoIPmonitor sensor is receiving more traffic than it can forward to the sensor. This is common during peak traffic hours.
* '''Duplex/speed mismatch:''' The server NIC and switch port are configured with mismatched speed or duplex settings (e.g., NIC at 100 Mbps half-duplex while the switch is at 1 Gbps full-duplex).
* '''Faulty hardware:''' A defective network interface card (NIC) or a damaged switch port.
;Specific diagnostic actions:

* '''Check duplex/speed negotiation:'''
<syntaxhighlight lang="bash">
# Check the current speed and duplex setting
ethtool eth0
</syntaxhighlight>
Look at the <code>Speed</code> and <code>Duplex</code> values and ensure they match the switch port configuration (e.g., Speed: 1000Mb/s, Duplex: Full).
|
| * '''Check for driver errors:'''
| | For Kamailio siptrace, also check <code>trace_msg_fragment_size</code> in Kamailio config. See [[Sniffer_configuration#snaplen|snaplen documentation]]. |
| <syntaxhighlight lang="bash"> | |
| # Look for NIC driver messages that indicate hardware issues
| |
| dmesg | grep -i eth0 | tail -50
| |
| </syntaxhighlight> | |
|
| |
|
* '''Test with a different network interface or cable:'''
If the issue persists after verifying duplex/speed, connect the sensor to a different switch port or use a different cable to rule out hardware faults.
;Step 3: Verify bidirectional SIP traffic in SPAN/mirroring configurations:

Even if the interface shows no packet drops, verify that your network switch's SPAN/mirror configuration is sending '''both directions''' of SIP traffic to the sniffing interface. Missing one direction causes incomplete CDRs and incorrect Last Response tracking.

Use <code>tshark</code> during a test call to verify bidirectional SIP flow:
<syntaxhighlight lang="bash">
# Monitor INVITE requests and their responses to confirm bidirectional flow
# Replace eth0 with your sniffing interface
tshark -i eth0 -Y 'sip.CSeq.method == "INVITE" || sip.Status-Code' -n
</syntaxhighlight>

If you see INVITE requests but NO corresponding responses (such as 200 OK, 404, or 500), your SPAN/mirror configuration is capturing only one direction of traffic. This requires network switch configuration changes:
* '''Cisco switches:''' Verify the SPAN source includes the <code>both</code> direction:
<syntaxhighlight lang="bash">
show running-config | include monitor session
# Should include: monitor session 1 source interface GigabitEthernet1/1 both
</syntaxhighlight>

* '''Other switch vendors:''' Refer to your switch documentation for SPAN mirroring direction configuration.
;Summary of the workflow when interface packet drops are detected:

# Use <code>ip -s -s link show</code> to check for an increasing <code>dropped</code> counter
# Confirm drops occur during test calls with repeated measurements
# Check duplex/speed with <code>ethtool</code>
# Verify the switch port configuration matches the NIC settings
# Check for hardware faults (different port, different cable)
# Verify the SPAN/mirror sends both directions of SIP traffic with <code>tshark -Y 'sip.CSeq.method == "INVITE"'</code>
# Resolve the underlying network infrastructure issue before tuning the VoIPmonitor configuration

{{Warning|Interface packet drops cannot be fixed with VoIPmonitor configuration changes. Increasing the ringbuffer, adjusting <code>pcap_queue_deque_window_length</code>, or other sniffer tuning will NOT resolve packet drops at the kernel/interface level. You must fix the network infrastructure first.}}
;3. Check for Promiscuous Mode (for SPAN/RSPAN Mirrored Traffic):
'''Important:''' Promiscuous mode requirements depend on your traffic mirroring method:

* '''SPAN/RSPAN (Layer 2 mirroring):''' The network interface '''must''' be in promiscuous mode. Mirrored packets retain their original MAC addresses, so the interface would normally ignore them. Promiscuous mode forces the interface to accept all packets regardless of destination MAC.

* '''ERSPAN/GRE/TZSP/VXLAN (Layer 3 tunnels):''' Promiscuous mode is '''NOT required'''. These tunneling protocols encapsulate the mirrored traffic inside IP packets addressed directly to the sensor's IP address. The operating system receives these packets normally, and VoIPmonitor automatically decapsulates them to extract the inner SIP/RTP traffic.

For SPAN/RSPAN deployments, check the current promiscuous mode status:
<syntaxhighlight lang="bash">
ip link show eth0
</syntaxhighlight>
Look for the <code>PROMISC</code> flag.

Enable promiscuous mode manually if needed:
<syntaxhighlight lang="bash">
ip link set eth0 promisc on
</syntaxhighlight>
If this solves the problem, make the change permanent. The sensor's <code>install-script.sh</code> usually attempts to do this, but it can fail.
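One common way to make promiscuous mode survive reboots is a small systemd unit. This is an illustrative sketch (the unit name and file path are assumptions, not something shipped by VoIPmonitor); adjust <code>eth0</code> to your sniffing interface:

<syntaxhighlight lang="ini">
# /etc/systemd/system/promisc-eth0.service (hypothetical unit name)
[Unit]
Description=Enable promiscuous mode on eth0 for SPAN capture
After=network.target

[Service]
Type=oneshot
ExecStart=/sbin/ip link set eth0 promisc on
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
</syntaxhighlight>

Enable it with <code>systemctl enable --now promisc-eth0.service</code>.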
|
;3A. Troubleshooting: Missing Packets for Specific IPs During High-Traffic Periods:
If calls are missing only for certain IP addresses or specific call flows (particularly during high-traffic periods), the issue is typically at the network infrastructure level (SPAN configuration) rather than sensor resource limits. Use this systematic approach:
=== Step 1: Use tcpdump to Verify Packet Arrival ===

Before tuning any sensor configuration, first verify whether the missing packets are actually reaching the sensor's network interface. Use <code>tcpdump</code> for this verification:
<syntaxhighlight lang="bash">
# Listen for SIP packets from a specific IP during the next high-traffic window
# Replace eth0 with your interface and 10.1.2.3 with the problematic IP
tcpdump -i eth0 -nn "host 10.1.2.3 and port 5060" -v

# Or capture to a file for later analysis
tcpdump -i eth0 -nn "host 10.1.2.3 and port 5060" -w /tmp/trace_10.1.2.3.pcap
</syntaxhighlight>
| Interpret the results:
| | → This confirms I/O bottleneck. Skip to [[#Solution:_I.2FO_Bottleneck|I/O Bottleneck Solutions]]. |
| * '''If you see SIP packets arriving:''' The traffic reaches the sensor. The issue is likely a sensor resource bottleneck (CPU, memory, or configuration limits). Proceed to [[#Sensor_Resource_Bottlenecks|Step 4: Check Sensor Statistics]].
| |
| * '''If you see NO packets or only intermittent packets:''' The traffic is not reaching the sensor. This indicates a network infrastructure issue. Proceed to [[#SPAN_Configuration_Troubleshooting|Step 2: Check SPAN Configuration]].
| |
| | |
=== Step 2: Check SPAN Configuration for Bidirectional Capture ===

If packets are missing at the interface level, verify your network switch's SPAN (port mirroring) configuration. During high-traffic periods, switches may have insufficient SPAN buffer capacity, causing packets to be dropped in the mirroring process itself.

Key verification points:

* '''Verify Source Ports:''' Confirm that both source IP addresses (or the switch ports they connect to) are included in the SPAN source list. Missing one direction of the call flow will result in incomplete CDRs.

* '''Check for Bidirectional Mirroring:''' Your SPAN configuration must capture '''BOTH inbound and outbound traffic'''. On most Cisco switches, this requires specifying:
<syntaxhighlight lang="bash">
monitor session 1 source interface GigabitEthernet1/1 both
</syntaxhighlight>
The direction keyword can be:
* <code>rx</code> for incoming traffic only
* <code>tx</code> for outgoing traffic only
* <code>both</code> for bidirectional capture (recommended)

* '''Verify Destination Port:''' Confirm the SPAN destination points to the switch port where the VoIPmonitor sensor is connected.

* '''Check SPAN Buffer Saturation (High-Traffic Issues):''' Some switches have limited SPAN buffer capacity. When monitoring multiple high-traffic ports simultaneously, the SPAN buffer may overflow during peak usage, causing randomized packet drops. Symptoms:
** Drops occur only during busy hours
** Missing packets are inconsistent across different calls
** Sensor CPU usage and t0CPU metrics appear normal (no bottleneck at sensor)

Solutions:
** Reduce the number of monitored source ports in the SPAN session
** Use multiple SPAN sessions if your switch supports it
** Consider upgrading to a switch with higher SPAN buffer capacity

* '''Verify Switch Interface Counters for Packet Drops:''' Check the network switch interface counters to determine if the switch itself is dropping packets during the mirroring process. This is critical when investigating false low MOS scores or packet loss reports.

Cisco switches:
<syntaxhighlight lang="bash">
# Show general interface statistics for the SPAN source port
show interface GigabitEthernet1/1 counters
show interface GigabitEthernet1/1 | include drops|errors|Input queue|Output queue

# Show detailed interface status (look for input errors, CRC, frame)
show interface GigabitEthernet1/1 detail

# Monitor in real-time during a high-traffic period
show interface Gi1/1 accounting
</syntaxhighlight>

Key indicators of switch-level packet loss:
** Non-zero input errors or CRC errors on source/destination ports
** Input queue drops (indicating switch buffer overflow)
** Increasing drop counters during peak traffic hours
** Output errors on the SPAN destination port (sensor may not be accepting fast enough)

If switch interface counters show drops, the issue is at the network infrastructure level (overloaded switch), not the VoIPmonitor sensor. Consult your network administrator for switch optimization or consider redistributing SPAN traffic across multiple ports.

* '''Verify VLAN Trunking:''' If the monitored traffic spans different VLANs, ensure the SPAN destination port is configured as a trunk to carry all necessary VLAN tags. Without trunk mode, packets from non-native VLANs will be dropped or stripped of their tags.

For detailed instructions on configuring SPAN/ERSPAN/GRE for different network environments, see [[Sniffing_modes]].

=== Step 3: Check for Sensor Resource Bottlenecks ===

If <code>tcpdump</code> confirms that packets are arriving at the interface consistently, but VoIPmonitor is still missing them, the issue may be sensor resource limitations.

* '''Check Packet Drops:''' In the GUI, navigate to '''Settings → Sensors''' and look at the "# packet drops" counter. If this counter is non-zero or increasing during high traffic:
** Increase the <code>ringbuffer</code> size in <code>voipmonitor.conf</code> (default 50 MB, max 2000 MB)
** Check the <code>t0CPU</code> metric in system logs - if it is consistently above 90%, you may need to upgrade the CPU or optimize NIC drivers

* '''Monitor Memory Usage:''' Check for OOM (Out of Memory) killer events:
<syntaxhighlight lang="bash">
grep -i "out of memory\|killed process" /var/log/syslog | tail -20
</syntaxhighlight>

* '''SIP Packet Limits:''' If only long or chatty calls are affected, check the <code>max_sip_packets_in_call</code> and <code>max_invite_packets_in_call</code> limits in <code>voipmonitor.conf</code>.
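The tuning knobs above all live in <code>/etc/voipmonitor.conf</code>; as a sketch, with purely illustrative values (not recommendations):

<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf - illustrative values, size to your traffic
ringbuffer = 500                 # capture buffer in MB (default 50, max 2000)
max_sip_packets_in_call = 20000  # raise only if long/chatty calls are truncated
max_invite_packets_in_call = 1000
</syntaxhighlight>

Restart the sensor after editing (<code>systemctl restart voipmonitor</code>) and re-check the "# packet drops" counter during the next busy period.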
;3. Verify Your SPAN/Mirror/TAP Configuration:
This is the most common cause of no traffic. Double-check your network switch or hardware tap configuration to ensure:
* The correct source ports (where your PBX/SBC is connected) are being monitored.
* The correct destination port (where your VoIPmonitor sensor is connected) is configured.
* If you are monitoring traffic across different VLANs, ensure your mirror port is configured to carry all necessary VLAN tags (often called "trunk" mode).

;4. Investigate Packet Encapsulation (If tcpdump shows traffic but VoIPmonitor does not):
If <code>tcpdump</code> or <code>tshark</code> shows packets reaching the interface but VoIPmonitor is not capturing them, the traffic may be encapsulated in a tunnel that VoIPmonitor cannot automatically process without additional configuration. Common encapsulations include VLAN tags, ERSPAN, GRE, VXLAN, and TZSP.

First, capture a sample of the traffic for analysis:
<syntaxhighlight lang="bash">
# Capture 100 packets of SIP traffic to a pcap file
tcpdump -i eth0 -c 100 -s0 port 5060 -w /tmp/encapsulation_check.pcap
# If that captures nothing, the traffic may be tunneled; capture without the port filter:
tcpdump -i eth0 -c 100 -s0 -w /tmp/encapsulation_check.pcap
</syntaxhighlight>

Then analyze the capture to identify encapsulation:
<syntaxhighlight lang="bash">
# Check for VLAN-tagged packets (802.1Q)
tshark -r /tmp/encapsulation_check.pcap -Y "vlan"

# Check for GRE tunnels (IP protocol 47)
tshark -r /tmp/encapsulation_check.pcap -Y "gre"

# Check for ERSPAN (carried inside GRE)
tshark -r /tmp/encapsulation_check.pcap -Y "erspan"

# Check for VXLAN (UDP port 4789)
tshark -r /tmp/encapsulation_check.pcap -Y "udp.port == 4789"

# Check for TZSP (UDP ports 37008 or 37009)
tshark -r /tmp/encapsulation_check.pcap -Y "udp.port == 37008 || udp.port == 37009"

# Show a full decode of the first packets to identify any unusual protocol stacks
tshark -r /tmp/encapsulation_check.pcap -V | head -50
</syntaxhighlight>

Identifying encapsulation issues:
* '''VLAN tags present:''' A BPF <code>filter</code> such as <code>udp</code> will not match 802.1Q-tagged frames unless the expression accounts for the <code>vlan</code> keyword. Comment out the <code>filter</code> directive in <code>voipmonitor.conf</code> to test.

* '''ERSPAN/GRE tunnels:''' Promiscuous mode is NOT required for these Layer 3 tunnels. Verify that tunneling is configured correctly on your network device and that the packets are addressed to the sensor's IP. VoIPmonitor automatically decapsulates ERSPAN and GRE.

* '''VXLAN/TZSP tunnels:''' These specialized tunneling protocols require proper configuration on the sending device. Consult your network device documentation for VoIPmonitor compatibility requirements.

If encapsulation is identified as the issue, review [[Sniffing_modes]] for detailed configuration guidance.
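If a <code>filter</code> must remain in place on a VLAN-tagged link, the usual BPF idiom is to match both the untagged and the tagged case. This is a sketch only; the <code>vlan</code> keyword shifts packet offsets for everything after it, which is why the port test is repeated:

<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf - example only; adjust ports to your environment
filter = port 5060 or (vlan and port 5060)
</syntaxhighlight>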
;3B. Troubleshooting: RTP Streams Not Displayed for Specific Provider:
If SIP signaling appears correctly in the GUI for calls from a specific provider, but RTP streams (audio quality graphs, waveform visualization) are missing for that provider while working correctly for other call paths, use this systematic approach to identify the cause.

=== Step 1: Make a Test Call to Reproduce the Issue ===

First, create a controlled test scenario to investigate the specific provider.

* Determine if the issue affects ALL calls from this provider or only some (e.g., specific codecs, call duration, time of day)
* Make a test call that reproduces the problem (e.g., from the problematic provider to a test number)
* Allow the call to establish and run for at least 30-60 seconds to capture meaningful RTP data

=== Step 2: Capture Packets on the Sniffing Interface During the Test Call ===

During the test call, use <code>tcpdump</code> (or <code>tshark</code>) to capture packets directly on the network interface configured in <code>voipmonitor.conf</code>. This tells you whether RTP packets are being received by the sensor.

<syntaxhighlight lang="bash">
# Capture SIP and RTP packets from the specific provider IP during your test call
# Replace eth0 with your interface and 1.2.3.4 with the provider's IP
sudo tcpdump -i eth0 -nn "host 1.2.3.4 and (udp port 5060 or (udp[8] & 0xc0) == 0x80)" -v

# Capture RTP to a file for detailed analysis (recommended)
sudo tcpdump -i eth0 -nn "host 1.2.3.4 and udp and (udp[8] & 0xc0) == 0x80" -w /tmp/test_provider_rtp.pcap
</syntaxhighlight>

Note: The RTP filter <code>(udp[8] & 0xc0) == 0x80</code> matches UDP payloads whose first byte begins with the bits "10" (the RTP version 2 field). It is a useful heuristic, but it can occasionally match non-RTP UDP traffic.

=== Step 3: Compare Raw Packet Capture with Sensor Output ===

After the test call:

* Check what tcpdump captured:
<syntaxhighlight lang="bash">
# Count SIP packets
tshark -r /tmp/test_provider_rtp.pcap -Y "sip" | wc -l

# Count RTP packets (enable heuristic RTP decoding, since the capture may lack SDP)
tshark -r /tmp/test_provider_rtp.pcap -o rtp.heuristic_rtp:TRUE -Y "rtp" | wc -l

# View RTP stream details
tshark -r /tmp/test_provider_rtp.pcap -o rtp.heuristic_rtp:TRUE -Y "rtp" -T fields -e rtp.ssrc -e rtp.seq -e rtp.p_type -e udp.srcport -e udp.dstport | head -20
</syntaxhighlight>

* Check what VoIPmonitor recorded:
** Open the CDR for your test call in the GUI
** Verify if the "Received Packets" column shows non-zero values for the provider leg
** Check if the "Streams" section shows RTP quality graphs and waveform visualization

* Compare the results:
** '''If tcpdump shows NO RTP packets:''' The RTP traffic is not reaching the sensor interface. This indicates a network-level issue (asymmetric routing, SPAN configuration missing the RTP path, or a firewall). Troubleshoot the network infrastructure, not VoIPmonitor.

** '''If tcpdump shows RTP packets but the GUI shows no streams or zero received packets:''' The packets are reaching the sensor but VoIPmonitor is not processing them. Check:
*** [[#Check_GUI_Capture_Rules_(Causing_Call_Stops)|Step 5: Check GUI Capture Rules]] - look for capture rules targeting the provider's IP with RTP set to "DISCARD" or "Header Only"
*** [[Tls|TLS/SSL Decryption]] - verify SRTP decryption is configured correctly if the provider uses encryption
*** [[Sniffer_configuration]] - check for any problematic <code>sipport</code> or <code>filter</code> settings

For more information on capture rules that affect RTP storage, see [[Capture_rules]].

;5. Check for Non-Call SIP Traffic Only:
If you see SIP traffic but it consists only of OPTIONS, NOTIFY, SUBSCRIBE, or MESSAGE methods (without any INVITE packets), there are no calls to generate CDRs. This can occur in environments that use SIP for non-call purposes like heartbeat checks or instant messaging.

You can configure VoIPmonitor to process and store these non-call SIP messages. See [[SIP_OPTIONS/SUBSCRIBE/NOTIFY]] and [[MESSAGES]] for configuration details.

Enable non-call SIP message processing in '''/etc/voipmonitor.conf''':
<syntaxhighlight lang="ini">
# Process SIP OPTIONS (qualify pings). Default: no
sip-options = yes

# Process SIP MESSAGE (instant messaging). Default: yes
sip-message = yes

# Process SIP SUBSCRIBE requests. Default: no
sip-subscribe = yes

# Process SIP NOTIFY requests. Default: no
sip-notify = yes
</syntaxhighlight>

Note that enabling these for processing and storage can significantly increase database load in high-traffic scenarios. Use with caution and monitor SQL queue growth. See [[SIP_OPTIONS/SUBSCRIBE/NOTIFY#Performance_Tuning|Performance Tuning]] for optimization tips.

== Step 4: Check the VoIPmonitor Configuration ==
If <code>tshark</code> sees traffic but VoIPmonitor does not, the problem is almost certainly in <code>voipmonitor.conf</code>.

;1. Check the <code>interface</code> directive:
:Make sure the <code>interface</code> parameter in <code>/etc/voipmonitor.conf</code> exactly matches the interface where you see traffic with <code>tshark</code>. For example: <code>interface = eth0</code>.

:'''Troubleshooting: Wrong Interface Name'''

:If the <code>interface</code> directive is set to an interface name that does not exist on the system, the sensor will fail to capture traffic completely. This is a common issue when network interface names change after system updates or hardware reconfiguration.

:;Step 1: Identify the correct interface name:
:Use either of these commands to list all available network interfaces:
<syntaxhighlight lang="bash">
# Option 1: Modern Linux systems
ip a

# Option 2: Older systems
ifconfig
</syntaxhighlight>
:Look for the interface that is receiving traffic. Common interface names include:
:* <code>eth0, eth1, eth2...</code> (classic Ethernet naming)
:* <code>ens33, ens34, enp0s3...</code> (predictable naming on modern systems)
:* <code>enp2s0f0, enp2s0f1...</code> (multi-port NICs)

:;Step 2: Verify the interface exists and is UP:
<syntaxhighlight lang="bash">
# Check specific interface status (replace eth0 with your interface name)
ip link show eth0

# The output should show "UP" and "LOWER_UP" to indicate the interface is active
</syntaxhighlight>

:;Step 3: Update <code>/etc/voipmonitor.conf</code>:
:Edit the <code>interface</code> directive to use the correct interface name:
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
interface = ens33
</syntaxhighlight>

:;Step 4: Restart the VoIPmonitor service:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
systemctl status voipmonitor
</syntaxhighlight>

:Verify the service shows <code>Active: active (running)</code> after the restart.

;2. Check the <code>sipport</code> directive:
:By default, VoIPmonitor only listens on port 5060. If your PBX uses a different port for SIP, you must add it. '''Common causes of missing calls:'''
:* '''Missing ports:''' Some providers use alternate SIP ports (5061, 5080, etc.). If these are not listed, calls on those ports will be ignored.
:* '''Syntax errors:''' List multiple ports comma-separated without extra commas or trailing characters. Correct syntax: <code>sipport = 5060,5061</code> or <code>sipport = 5060,5080</code>
:* '''Ranges:''' You can specify port ranges using dashes: <code>sipport = 5060,5070-5080</code>
:Example:
:<code>sipport = 5060,5080</code>

;3. '''Distributed/Probe Setup Considerations:'''
:If you are using a remote sensor (probe) with Packet Mirroring (<code>packetbuffer_sender=yes</code>), call detection depends on configuration on '''both''' the probe and the central analysis host.

:Common symptom: The probe captures traffic (visible via <code>tcpdump</code>), but the central server records incomplete or missing CDRs for calls on non-default ports.

{| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
|-
! colspan="2" style="background:#ffc107;" | Critical: Both Systems Must Have Matching sipport Configuration
|-
| style="vertical-align: top;" | '''Probe side:'''
| The probe captures packets from the network interface. Its <code>sipport</code> setting determines which UDP ports it considers SIP traffic to capture and forward.
|-
| style="vertical-align: top;" | '''Central server side:'''
| When receiving raw packets in Packet Mirroring mode, the central server analyzes the packets locally. Its <code>sipport</code> setting determines which ports it interprets as SIP during analysis. If a port is missing here, packets are captured but not recognized as SIP, resulting in missing CDRs.
|}

:'''Troubleshooting steps for distributed probe setups:'''

::1. Verify traffic reachability on the probe:
::Use <code>tcpdump</code> on the probe VM to confirm SIP packets for the missing calls are arriving on the expected ports.
<syntaxhighlight lang="bash">
# On the probe VM
tcpdump -i eth0 -n port 5061
</syntaxhighlight>

::2. Check the probe's ''voipmonitor.conf'':
::Ensure the <code>sipport</code> directive on the probe includes all necessary SIP ports used in your network.
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf on the PROBE
sipport = 5060,5061,5080,6060
</syntaxhighlight>

::3. Check the central analysis host's ''voipmonitor.conf'':
::'''This is the most common cause of missing calls in distributed setups.''' The central analysis host (the system receiving packets via <code>server_bind</code> or legacy <code>mirror_bind</code>) must also have the <code>sipport</code> directive configured with the same list of ports used by all probes.
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf on the CENTRAL HOST
sipport = 5060,5061,5080,6060
</syntaxhighlight>

::4. Restart both services:
::Apply the configuration changes:
<syntaxhighlight lang="bash">
# On both probe and central host
systemctl restart voipmonitor
</syntaxhighlight>

:For more details on distributed architecture configuration and packet mirroring, see [[Sniffer_distributed_architecture|Distributed Architecture: Client-Server Mode]].
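Putting steps 2 and 3 together, a minimal matching pair of configurations might look like the sketch below. The mirroring directives (<code>packetbuffer_sender</code>, <code>server_destination</code>, <code>server_bind</code>) follow the client-server setup described in [[Sniffer_distributed_architecture]]; the addresses and port are placeholders, not defaults to copy:

<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf on the PROBE
packetbuffer_sender = yes
server_destination = 192.0.2.10      # placeholder: central host IP
server_destination_port = 60024      # placeholder: must match server_bind_port
sipport = 5060,5061,5080,6060

# /etc/voipmonitor.conf on the CENTRAL HOST
server_bind = 0.0.0.0
server_bind_port = 60024
sipport = 5060,5061,5080,6060
</syntaxhighlight>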
;4. Check for a restrictive <code>filter</code>:
:If you have a BPF <code>filter</code> configured, ensure it is not accidentally excluding the traffic you want to see. For debugging, try commenting out the <code>filter</code> line entirely and restarting the sensor.

== Step 5: Check GUI Capture Rules (Causing Call Stops) ==
If <code>tshark</code> sees SIP traffic and the sniffer configuration appears correct, but the probe stops processing calls or shows traffic only on the network interface, GUI capture rules may be the culprit.

Capture rules configured in the GUI can instruct the sniffer to ignore ("skip") all processing for matched calls, including calls matching specific IP addresses or telephone number prefixes.

;1. Review existing capture rules:
:Navigate to '''GUI -> Capture rules''' and examine all rules for any that might be blocking your traffic.
:Look specifically for rules with the '''Skip''' option set to '''ON''' (displayed as "Skip: ON"). The Skip option instructs the sniffer to completely ignore matching calls (no files, RTP analysis, or CDR creation).

;2. Test by temporarily removing all capture rules:
:To isolate the issue, first create a backup of your GUI configuration:
:* Navigate to '''Tools -> Backup & Restore -> Backup GUI -> Configuration tables'''
:* This saves your current settings, including capture rules
:* Delete all capture rules from the GUI
:* Click the '''Apply''' button to save changes
:* Reload the sniffer by clicking the green '''"reload sniffer"''' button in the control panel
:* Test if calls are now being processed correctly
:* If resolved, restore the configuration from the backup and systematically investigate the rules to identify the problematic one

;3. Identify the problematic rule:
:* After restoring your configuration, remove rules one at a time and reload the sniffer after each removal
:* When calls start being processed again, you have identified the problematic rule
:* Review the rule's match criteria (IP addresses, prefixes, direction) against your actual traffic pattern
:* Adjust the rule's conditions or Skip setting as needed

;4. Verify rules are reloaded:
:After making changes to capture rules, remember that changes are '''not automatically applied''' to the running sniffer. You must click the '''"reload sniffer"''' button in the control panel, or the sniffer will continue using the previous rules.

For more information on capture rules, see [[Capture_rules]].

== Troubleshooting: Service Fails to Start with "failed read rsa key" Error ==

If the VoIPmonitor sniffer service fails to start and logs the error message "failed read rsa key," this indicates that the manager key cannot be loaded from the database.

=== Cause ===

The manager_key is stored in the <code>system</code> database table (identified by <code>type='manager_key'</code>) and is required for proper manager/sensor operations in distributed deployments. This error most commonly occurs when the <code>mysqlloadconfig</code> option in <code>voipmonitor.conf</code> is set to <code>no</code>, which prevents VoIPmonitor from loading configuration (including the manager_key) from the database.

=== Troubleshooting Steps ===

;1. Check for the error in system logs:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor

# For CentOS/RHEL/AlmaLinux
tail -f /var/log/messages | grep voipmonitor

# For systemd systems
journalctl -u voipmonitor -f
</syntaxhighlight>

;2. Verify the mysqlloadconfig setting:
<syntaxhighlight lang="bash">
# Check if mysqlloadconfig is set to no in voipmonitor.conf
grep mysqlloadconfig /etc/voipmonitor.conf
</syntaxhighlight>

If the output shows <code>mysqlloadconfig = no</code>, this is the cause of the issue.

;3. Fix the mysqlloadconfig setting:
<syntaxhighlight lang="bash">
# Edit the configuration file
nano /etc/voipmonitor.conf

# Either remove the mysqlloadconfig line entirely (it defaults to yes)
# or uncomment/set it to yes:
# mysqlloadconfig = yes

# Restart the sniffer service
systemctl restart voipmonitor

# Check if it started successfully
systemctl status voipmonitor
</syntaxhighlight>

;4. Verify the manager_key exists in the database:
<syntaxhighlight lang="sql">
-- Query the manager_key from the system table
SELECT * FROM voipmonitor.`system` WHERE type='manager_key'\G
</syntaxhighlight>

If no manager_key exists, check your VoIPmonitor installation and consider running the installer or contacting support to regenerate the key.

;5. Check database connectivity and permissions:
Verify that the VoIPmonitor sniffer can connect to the database and has read access to the <code>system</code> table.
<syntaxhighlight lang="bash">
# Test database connectivity with the configured credentials
mysql -h <mysqlhost> -u <mysqlusername> -p <mysqldb>
</syntaxhighlight>
<syntaxhighlight lang="sql">
-- Inside MySQL, verify the user has SELECT on voipmonitor.system
SHOW GRANTS FOR 'voipmonitor_user'@'%';
</syntaxhighlight>

;6. Check configuration consistency between probe and server:
In distributed deployments with probe and server components, ensure that both systems have consistent configuration in <code>/etc/voipmonitor.conf</code>. Specifically, both should have the same database connection settings, and <code>mysqlloadconfig</code> should be enabled on both systems.

=== Summary ===

The "failed read rsa key" error is almost always caused by <code>mysqlloadconfig = no</code> in <code>voipmonitor.conf</code>. The solution is to remove this setting or change it to <code>yes</code>, then restart the service.
== Step 6: Check VoIPmonitor Logs for Errors ==
Finally, VoIPmonitor's own logs are the best source of clues. Check the system log for any error messages generated by the sensor on startup or during operation.
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
tail -f /var/log/syslog | grep voipmonitor

# For CentOS/RHEL/AlmaLinux
tail -f /var/log/messages | grep voipmonitor
</syntaxhighlight>
Look for errors like:
* "pcap_open_live(eth0) error: eth0: No such device" (wrong interface name)
* "Permission denied" (the sensor is not running with sufficient privileges)
* Errors related to database connectivity
* Messages about dropping packets

== Step 7: Check for OOM (Out of Memory) Issues ==
If VoIPmonitor suddenly stops processing CDRs and a service restart temporarily restores functionality, the system may be experiencing OOM (Out of Memory) killer events. The Linux OOM killer terminates processes when available RAM is exhausted, and MySQL (<code>mysqld</code>) is a common target due to its memory-intensive nature.

;1. Check for OOM killer events in kernel logs:
<syntaxhighlight lang="bash">
# For Debian/Ubuntu
grep -i "out of memory\|killed process" /var/log/syslog | tail -20

# For CentOS/RHEL/AlmaLinux
grep -i "out of memory\|killed process" /var/log/messages | tail -20

# Also check dmesg:
dmesg | grep -i "killed process" | tail -10
</syntaxhighlight>
Typical OOM killer messages look like:
<pre>
Out of memory: Kill process 1234 (mysqld) score 123 or sacrifice child
Killed process 1234 (mysqld) total-vm: 12345678kB, anon-rss: 1234567kB
</pre>

;2. Monitor current memory usage:
<syntaxhighlight lang="bash">
# Check available memory (look for low 'available' or 'free' values)
free -h

# Check per-process memory usage (sorted by RSS)
ps aux --sort=-%mem | head -15

# Check MySQL memory usage (VmSize/VmRSS are reported in kB)
cat /proc/$(pgrep mysqld)/status | grep -E "VmSize|VmRSS"
</syntaxhighlight>
Warning signs:
* '''Available memory consistently below 500 MB during operation'''
* '''MySQL consuming most of the available RAM'''
* '''Swap usage near 100% (if swap is enabled)'''
* '''Frequent process restarts without clear error messages'''
| | |- |
| | | <code>heap</code> normal, <code>comp</code> maxed, <code>tacCPU</code> all ~100% || '''Compression Bottleneck''' (type of I/O) || [[#Solution: I/O Bottleneck|I/O Solution]] |
| | |} |
|
| |
|
| ;3. Solution: Increase physical memory:
| | ==== Step 6: Confirmation Test (Optional) ==== |
| The definitive solution for OOM-related CDR processing issues is to upgrade the server's physical RAM. After upgrading:
| |
| * Verify memory improvements with <code>free -h</code>
| |
| * Monitor for several days to ensure OOM events stop
| |
| * Consider tuning <code>innodb_buffer_pool_size</code> in your MySQL configuration to use the additional memory effectively
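As a starting point for that tuning, a rule-of-thumb sketch (the 50% figure is a common general MySQL guideline for a host that runs both the sniffer and the database, not a VoIPmonitor-specific value):

<syntaxhighlight lang="bash">
# Suggest a buffer pool of roughly 50% of total RAM on a combined sniffer+DB host
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
echo "Suggested innodb_buffer_pool_size: $((total_kb / 2 / 1024)) MB"
</syntaxhighlight>

On a dedicated database server the usual guideline is higher (70-80% of RAM).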
Additional mitigation strategies (while planning for a RAM upgrade):
* Reduce MySQL's memory footprint by lowering <code>innodb_buffer_pool_size</code> (e.g., from 16GB to 8GB)
* Disable or limit non-essential VoIPmonitor features (e.g., packet capture storage, RTP analysis)
* Ensure swap space is properly configured as a safety buffer (though swap is much slower than RAM)
* Use <code>sysctl vm.swappiness=10</code> to favor RAM over swap when some memory is still available
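To make the swappiness change survive a reboot, a minimal sketch (the drop-in file name <code>99-voipmonitor.conf</code> is an arbitrary choice):

<syntaxhighlight lang="bash">
# Apply immediately
sysctl -w vm.swappiness=10

# Persist across reboots via a sysctl drop-in file, then reload
echo 'vm.swappiness = 10' > /etc/sysctl.d/99-voipmonitor.conf
sysctl --system
</syntaxhighlight>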
== Step 8: Missing CDRs for Calls with Large Packets ==

If VoIPmonitor is capturing some calls successfully but missing CDRs for specific calls (especially those with larger SIP packets, such as INVITEs with extensive SDP), there are three common causes to investigate.

=== Cause 1: snaplen Packet Truncation (VoIPmonitor Configuration) ===

The <code>snaplen</code> parameter in <code>voipmonitor.conf</code> limits how many bytes of each packet are captured. If a SIP packet exceeds <code>snaplen</code>, it is truncated and the sniffer may fail to parse the call correctly.

;1. Check your current snaplen setting:
<syntaxhighlight lang="bash">
grep snaplen /etc/voipmonitor.conf
</syntaxhighlight>
The default is 3200 bytes (6000 if SSL/HTTP is enabled).

;2. Test if packet truncation is the issue:
Use <code>tcpdump</code> with <code>-s0</code> (unlimited snapshot length) to capture full packets:
<syntaxhighlight lang="bash">
# Capture SIP traffic with full packet length
tcpdump -i eth0 -s0 -nn port 5060 -w /tmp/test_capture.pcap

# Analyze packet sizes with Wireshark or tshark
tshark -r /tmp/test_capture.pcap -T fields -e frame.len -Y "sip" | sort -n | tail -10
</syntaxhighlight>
If you see SIP packets larger than your <code>snaplen</code> value (e.g., 4000+ bytes), increase <code>snaplen</code> in <code>voipmonitor.conf</code>:
<syntaxhighlight lang="ini">
snaplen = 65535
</syntaxhighlight>
Then restart the sniffer: <code>systemctl restart voipmonitor</code>.

=== Cause 2: MTU Mismatch (Network Infrastructure) ===

If packets are being lost or fragmented due to MTU mismatches in the network path, VoIPmonitor may never receive the complete packets, regardless of the <code>snaplen</code> setting.

;1. Diagnose MTU-related packet loss:
Capture traffic with tcpdump and analyze it in Wireshark:
<syntaxhighlight lang="bash">
# Capture traffic on the VoIPmonitor host
tcpdump -i eth0 -s0 host <pbx_ip_address> -w /tmp/mtu_test.pcap
</syntaxhighlight>
Open the pcap in Wireshark and look for:
* Reassembled PDUs marked as incomplete
* TCP retransmissions of the same packet
* ICMP "Fragmentation needed" messages (Type 3, Code 4)

;2. Verify packet completeness:
In Wireshark, examine large SIP INVITE packets. If the SIP headers or SDP appear cut off or incomplete, packets are likely being lost in transit due to MTU issues.

;3. Identify the MTU bottleneck:
The issue is typically a network device with a lower MTU than the end devices. Common locations:
* VPN concentrators
* Firewalls
* Routers with tunnel interfaces
* Cloud provider gateways (typically 1500 bytes vs. 9000-byte jumbo frames)

To locate the problematic device, trace the MTU along the network path from the PBX to the VoIPmonitor sensor.
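One way to trace it is a do-not-fragment ping sweep from the PBX side toward the sensor; a minimal sketch (1472 = a 1500-byte MTU minus 28 bytes of IP and ICMP headers; <code>&lt;pbx_ip_address&gt;</code> is a placeholder):

<syntaxhighlight lang="bash">
# Succeeds only if every hop passes a full 1500-byte frame without fragmenting
ping -c 3 -M do -s 1472 <pbx_ip_address>

# If it fails, lower the payload until it succeeds; payload + 28 = path MTU
ping -c 3 -M do -s 1372 <pbx_ip_address>
</syntaxhighlight>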
;4. Resolution options:
* Increase the MTU on the bottleneck device to match the rest of the network (e.g., from 1500 to 9000 in jumbo-frame environments)
* Enable Path MTU Discovery (PMTUD) on intermediate devices
* Ensure your switching infrastructure supports jumbo frames end-to-end if you are using them

For more information on the <code>snaplen</code> parameter, see [[Sniffer_configuration#Network_Interface_.26_Sniffing|Sniffer Configuration]].

=== Cause 3: External Source Packet Truncation (Traffic Mirroring/LBS Modules) ===
If packets are truncated or corrupted BEFORE they reach VoIPmonitor, changing <code>snaplen</code> will NOT fix the issue. This scenario occurs when using external SIP sources that have their own packet size limitations.

; Symptoms to identify this scenario:
* Large SIP packets (e.g., a WebRTC INVITE with a big Authorization header, ~4 kB) appear truncated
* Packets show as corrupted or malformed in the VoIPmonitor GUI
* Changing <code>snaplen</code> in <code>voipmonitor.conf</code> has no effect
* Using TCP instead of UDP in the external system does not resolve the issue

; Common external sources that may truncate packets:
# Kamailio <code>siptrace</code> module
# FreeSWITCH <code>sip_trace</code> module
# OpenSIPS tracing modules
# Custom HEP/HOMER agent implementations
# Load balancers or proxy servers with traffic mirroring

; Diagnose external source truncation:
Use <code>tcpdump</code> with <code>-s0</code> (unlimited snapshot length) on the VoIPmonitor sensor to compare packet sizes:
<syntaxhighlight lang="bash">
# Capture traffic received by VoIPmonitor
sudo tcpdump -i eth0 -s0 -nn port 5060 -w /tmp/voipmonitor_input.pcap

# Analyze actual packet sizes received
tshark -r /tmp/voipmonitor_input.pcap -T fields -e frame.len -Y "sip.Method == INVITE" | sort -n | tail -10
</syntaxhighlight>

If:
* you see packets with truncated SIP headers or incomplete SDP,
* the packet length is much smaller than expected (e.g., 1500 bytes instead of 4000+ bytes), or
* truncation is consistent across all calls,

then the external source is truncating packets before they reach VoIPmonitor.
; Solutions for Kamailio siptrace truncation:
If using Kamailio's <code>siptrace</code> module with traffic mirroring:

1. Configure Kamailio to use TCP transport for siptrace (may help in some cases):
<pre>
# In kamailio.cfg
modparam("siptrace", "duplicate_uri", "sip:voipmonitor_ip:port;transport=tcp")
</pre>

2. If Kamailio reports "Connection refused", VoIPmonitor does not open a TCP listener by default. Open one manually, e.g. with socat:
<syntaxhighlight lang="bash">
# Open a TCP listener on port 5888 that discards the payload;
# VoIPmonitor still sees the packets on the wire via libpcap
socat TCP-LISTEN:5888,fork,reuseaddr /dev/null &
</syntaxhighlight>
Then update kamailio.cfg to use this port instead of the standard SIP port.

3. Use the HAProxy traffic 'tee' function (recommended):
If your architecture includes HAProxy in front of Kamailio, use its traffic mirroring to send a copy of the WebSocket traffic directly to VoIPmonitor's standard SIP listening port. This bypasses the siptrace module entirely and preserves the original packets:
<pre>
# In haproxy.cfg, within your frontend/backend configuration
# Send a copy of traffic to VoIPmonitor
option splice-response
tcp-request inspect-delay 5s
tcp-request content accept if { req_ssl_hello_type 1 }
use-server voipmonitor if { req_ssl_hello_type 1 }

listen voipmonitor_mirror
    bind :5888
    mode tcp
    server voipmonitor <voipmonitor_sensor_ip>:5060 send-proxy
</pre>

Note: The exact HAProxy configuration depends on your architecture and whether you are mirroring TCP (WebSocket) or UDP traffic.

; Solutions for other external sources:
# Check the external system's documentation for packet size limits or truncation settings
# Consider using standard network mirroring (SPAN/ERSPAN/GRE) instead of SIP tracing modules
# Ensure the external system captures full packet lengths (disable any internal packet size caps)
# Verify that the external system does not reassemble or modify SIP packets before forwarding
== Troubleshooting: CDR Shows 000 No Response Despite Valid SIP Response ==

If the CDR View displays "000 No Response" in the Last Response column for calls that actually have valid final SIP response codes (such as 403 Forbidden or 500 Server Error), the sniffer is receiving the response packets but failing to correlate them with their corresponding INVITE transactions before writing the CDR.

=== Diagnosis: Verify Response Packets Are Captured ===

;1. Locate the affected call in the CDR View:
:* Find a call showing "000 No Response" in the Last Response column.

;2. Check the SIP History:
:* Click the [+] icon to expand the call's detail view.
:* Open the "SIP History" tab.
:* Look for the actual SIP response (e.g., 403 Forbidden, 486 Busy Here, 500 Internal Server Error).

If the response packet IS present in SIP History, the issue is a correlation timing problem. Proceed to the solution below.

If the response packet is NOT present in SIP History, the issue is a network visibility problem (see [[#SPAN_Configuration_Troubleshooting|Step 3: Investigate Packet Encapsulation]] and the other network troubleshooting sections).

=== Root Cause: libpcap Packet Queue Timeout ===

The issue is caused by VoIPmonitor's libpcap packet capture timing out before responses can be matched to their originating INVITEs. This typically occurs in high-traffic environments or when packet processing is temporarily delayed by system load.

The sniffer creates CDR records based on SIP INVITE packets and attempts to correlate subsequent SIP responses (403, 500, etc.) with the original INVITE. If packet queue processing is too slow or the time window is too short, responses arrive after the CDR has already been written with "Last Response" set to 0.

=== Solution: Configure libpcap Nonblocking Mode ===

Edit <code>/etc/voipmonitor.conf</code> on the sniffer host and add the following parameters:

<syntaxhighlight lang="ini">
# Enable libpcap nonblocking mode to prevent packet queue blocking
libpcap_nonblock_mode = yes

# Increase the packet deque window length (in milliseconds) for response correlation
# The default is often 2000 ms; increasing to 5000 ms gives more time for responses
pcap_queue_deque_window_length = 5000
</syntaxhighlight>

Save the file and restart the voipmonitor service:

<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>

=== Additional Considerations ===

;If the issue persists after applying the fix:
:* Try increasing <code>pcap_queue_deque_window_length</code> further (e.g., to 7000 or 10000 milliseconds)
:* Check system load to ensure the server is not under heavy CPU or I/O pressure
:* Verify that an adequate <code>ringbuffer</code> size is configured for your traffic volume (see [[Scaling|Scaling and Performance Tuning]])

;For distributed architectures:
:* Ensure all voipmonitor hosts have synchronized time (see [[#Verify_System_Time_Synchronization]])
:* Time mismatches between components can cause correlation failures
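A quick way to spot-check time sync on each host is via systemd or chrony, a minimal sketch (assumes systemd-timesyncd or chrony is in use on the hosts):

<syntaxhighlight lang="bash">
# Should report "System clock synchronized: yes" on every sensor and the server
timedatectl | grep -i synchronized

# With chrony, inspect the current offset from the NTP source
chronyc tracking 2>/dev/null | grep -i "system time"
</syntaxhighlight>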
{{Note|The <code>pcap_queue_deque_window_length</code> parameter is also used in distributed mirroring scenarios to sort packets from multiple mirrors. Increasing this value improves packet correlation in both single-sensor and distributed setups.}}

For more information on packet capture configuration, see [[Sniffer_configuration|Sniffer Configuration]].

== Step 9: Probe Timeout Due to Virtualization Timing Issues ==

If remote probes are intermittently disconnecting from the central server with timeout errors, even on a high-performance network with low load, the issue may be related to timing problems on the virtualization host rather than network connectivity.

=== Diagnosis: Check System Log Timing Intervals ===

The VoIPmonitor sensor generates status log messages approximately every 10 seconds during normal operation. If the probe's timing system is inconsistent, the interval between these status messages can exceed 30 seconds, triggering a connection timeout.

;1. Monitor the system log on the affected probe:
<syntaxhighlight lang="bash">
tail -f /var/log/syslog | grep voipmonitor
</syntaxhighlight>

;2. Examine the timestamps of voipmonitor status messages:
Look for repeating log entries that should appear approximately every 10 seconds during normal operation.

;3. Identify timing irregularities:
Calculate the time interval between successive status log entries. '''If the interval exceeds 30 seconds''', this indicates a timing problem that will cause connection timeouts with the central server.
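The intervals can be computed rather than eyeballed; a minimal sketch using the journal's unix-timestamp output (assumes the sensor logs via the <code>voipmonitor</code> systemd unit):

<syntaxhighlight lang="bash">
# Print the gap in seconds between consecutive voipmonitor journal entries;
# any value over 30 indicates the timing problem described above
journalctl -u voipmonitor -o short-unix --no-pager \
  | awk '{ if (prev) printf "%.0f\n", $1 - prev; prev = $1 }'
</syntaxhighlight>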
=== Root Cause: Virtualization Host RDTSC Issues ===

This problem is '''not''' network-related. It is a host-level timing issue that impacts the application's internal timers.

The issue typically occurs on virtualized probes where the host's CPU timekeeping is inconsistent. Specifically, problems with the RDTSC (Read Time-Stamp Counter) CPU instruction on the virtualization host can cause:

* Irregular system clock behavior on the guest VM
* Application timers that do not fire consistently
* Sporadic timeouts in client-server connections

=== Resolution ===

;1. Investigate the virtualization host configuration:
Check the hypervisor or virtualization platform documentation for known timekeeping issues related to RDTSC.

Common virtualization platforms with known timing considerations:
* KVM/QEMU: check CPU passthrough and TSC mode settings
* VMware: verify time synchronization between guest and host
* Hyper-V: review Integration Services time sync configuration
* Xen: check TSC emulation settings

;2. Apply host-level fixes:
These are host-level fixes, not changes to the guest VM configuration. Consult your virtualization platform's documentation for specific steps to address RDTSC timing issues.

Typical solutions include:
* Enabling appropriate TSC modes on the host
* Configuring CPU feature passthrough correctly
* Adjusting hypervisor timekeeping parameters

;3. Verify the fix:
After applying the host-level configuration changes, monitor the probe's status logs again to confirm that the timing intervals are consistently around 10 seconds (never exceeding 30 seconds).

<syntaxhighlight lang="bash">
# Monitor for regular status messages
tail -f /var/log/syslog | grep voipmonitor
</syntaxhighlight>

Once the timing is corrected, probe connections to the central server should remain stable without intermittent timeouts.

== Troubleshooting: Audio Missing on One Call Leg ==

If the sniffer captures full audio on one call leg (e.g., the carrier/outside leg) but only partial or no audio on the other (e.g., the PBX/inside leg), use this diagnostic workflow to identify the root cause BEFORE applying any configuration fixes.

The key question to answer is: '''Are the RTP packets for the silent leg present on the wire?'''

=== Step 1: Use tcpdump to Capture Traffic During a Test Call ===

Initiate a new test call that reproduces the issue. During the call, use tcpdump or tshark directly on the sensor's sniffing interface to capture all traffic:

<syntaxhighlight lang="bash">
# Capture traffic to a file during the test call
# Replace eth0 with your sniffing interface
tcpdump -i eth0 -s0 -w /tmp/direct_capture.pcap

# OR: display live traffic for specific IPs (useful for real-time diagnostics)
tcpdump -i eth0 -s0 -nn "host <pbx_ip> or host <carrier_ip>"
</syntaxhighlight>

Let the call run for 10-30 seconds, then stop tcpdump with Ctrl+C.
=== Step 2: Retrieve the VoIPmonitor GUI's PCAP for the Same Call ===

After the call completes:
# Navigate to the '''CDR View''' in the VoIPmonitor GUI
# Find the test call you just made
# Download the PCAP file for that call (click the PCAP icon/button)
# Save it as <code>/tmp/gui_capture.pcap</code>

=== Step 3: Compare the Two Captures ===

Analyze both captures to determine whether RTP packets for the silent leg are present on the wire:

<syntaxhighlight lang="bash">
# Count RTP packets in the direct capture
tshark -r /tmp/direct_capture.pcap -Y "rtp" | wc -l

# Count RTP packets in the GUI capture
tshark -r /tmp/gui_capture.pcap -Y "rtp" | wc -l

# Check for RTP from specific source IPs in the direct capture
tshark -r /tmp/direct_capture.pcap -Y "rtp" -T fields -e rtp.ssrc -e ip.src -e ip.dst

# Check the Call-ID in both captures to verify they are the same call
tshark -r /tmp/direct_capture.pcap -Y "sip" -T fields -e sip.Call-ID | head -1
tshark -r /tmp/gui_capture.pcap -Y "sip" -T fields -e sip.Call-ID | head -1
</syntaxhighlight>
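To see at a glance which leg is missing, the direct capture can also be broken down per source IP; a small sketch (the file name matches the capture saved in Step 1):

<syntaxhighlight lang="bash">
# Per-source-IP RTP packet counts: both legs' media IPs should appear here.
# If the silent leg's IP is absent, the packets never reached the sensor.
tshark -r /tmp/direct_capture.pcap -Y "rtp" -T fields -e ip.src | sort | uniq -c | sort -rn
</syntaxhighlight>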
=== Step 4: Interpret the Results ===

{| class="wikitable" style="background:#e7f3ff; border:1px solid #3366cc;"
|-
! colspan="2" style="background:#3366cc; color: white;" | Diagnostic Decision Matrix
|-
! Observation
! Root Cause & Next Steps
|-
| '''RTP packets for the silent leg are NOT present in the direct capture'''
| '''Network/PBX Issue:''' The PBX or network is not sending the packets. This is not a VoIPmonitor problem. Troubleshoot the PBX (check NAT and RTP port configuration) or the network (SPAN/mirror configuration, firewall rules).
|-
| '''RTP packets for the silent leg ARE present in the direct capture but missing from the GUI capture'''
| '''Sniffer Configuration Issue:''' Packets are on the wire but VoIPmonitor is failing to capture or correlate them. Likely causes: a NAT IP mismatch (incorrect natalias configuration), SIP signaling advertising a different IP than the RTP source, or restrictive filter rules. Proceed with configuration fixes.
|-
| '''RTP packets present in both captures but audio still silent'''
| '''Codec/Transcoding Issue:''' Packets are captured correctly but may not be decoded properly. Check codec compatibility, unsupported codecs, or transcoding issues on the PBX.
|}
| === Step 5: Apply the Correct Fix Based on Diagnosis ===
| | '''Thread auto-scaling:''' VoIPmonitor automatically spawns additional threads when load increases: |
| | * If '''d''' > 50% → SIP parsing thread ('''s''') starts |
| | * If '''s''' > 50% → Entity lookup thread ('''e''') starts |
| | * If '''e''' > 50% → Call/register/RTP threads start |
|
| |
|
| ;If RTP is NOT on the wire (Network/PBX issue):
| | ==== Configuration for High Traffic (>10,000 calls/sec) ==== |
| :* Check PBX RTP port configuration and firewall rules
| |
| :* Verify network SPAN/mirror is capturing bidirectional traffic (see [[#SPAN_Configuration_Troubleshooting|Section 3]])
| |
| :* Check PBX NAT settings - RTP packets may be blocked or routed incorrectly
| |
|
| |
|
| ;If RTP is on the wire but not captured (Sniffer configuration issue):
| | <syntaxhighlight lang="ini"> |
| | # /etc/voipmonitor.conf |
|
| |
|
| ==== Check rtp_check_both_sides_by_sdp Setting ''(Primary Cause)'' | | # Increase buffer to handle processing spikes (value in MB) |
| | # 10000 = 10 GB - can go higher (20000, 30000+) if RAM allows |
| | # Larger buffer absorbs I/O and CPU spikes without packet loss |
| | max_buffer_mem = 10000 |
|
| |
|
| This is the '''most common cause''' of one-way RTP capture when packets are present on the wire. The <code>rtp_check_both_sides_by_sdp</code> parameter controls how strictly RTP streams are correlated with SDP (Session Description Protocol) signaling.
| | # Use IP filter instead of BPF (more efficient) |
| | interface_ip_filter = 10.0.0.0/8 |
| | interface_ip_filter = 192.168.0.0/16 |
| | # Comment out any 'filter' parameter |
| | </syntaxhighlight> |
|
| |
|
| :* Check the current setting in <code>/etc/voipmonitor.conf</code>:
| | ==== CPU Optimizations ==== |
| :<syntaxhighlight lang="bash">
| |
| :grep "^rtp_check_both_sides_by_sdp" /etc/voipmonitor.conf
| |
| :</syntaxhighlight>
| |
|
| |
|
| :* If the setting is <code>yes</code> or <code>strict</code> or <code>very_strict</code>, this requires '''BOTH sides of RTP to exactly match SDP (SIP signaling)'''
:: <code>strict</code>: Only allows verified packets after the first match (blocks unverified packets)
:: <code>very_strict</code>: Blocks all unverified packets (the most restrictive mode)
:: <code>keep_rtp_packets</code>: Same as <code>yes</code>, but stores unverified packets for debugging

Symptoms of restrictive <code>rtp_check_both_sides_by_sdp</code> settings:

* Only one call leg appears in the CDR (caller OR called, not both)
* The received-packets column shows 0 or a very low value on one leg
* tcpdump shows both RTP streams present, but the GUI captures only one
* Many calls are affected, not just specific ones

'''Solution:''' Change the setting to <code>no</code> or comment out the line:

<syntaxhighlight lang="ini">
; /etc/voipmonitor.conf
rtp_check_both_sides_by_sdp = no
</syntaxhighlight>

Restart the sniffer to apply:

<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>

If you previously set <code>rtp_check_both_sides_by_sdp = yes</code> to solve audio-mixing issues in environments where multiple calls share the same IP:port, consider an alternative approach such as <code>sdp_multiplication</code> instead, because strict checking breaks one-way RTP capture.

==== Other Configuration Checks ====

If checking <code>rtp_check_both_sides_by_sdp</code> does not resolve the issue, proceed with these additional diagnostic steps:

:* Configure '''natalias''' in <code>/etc/voipmonitor.conf</code> to map the IP advertised in SIP signaling to the actual RTP source IP (NAT scenarios only):
:<syntaxhighlight lang="ini">
:; /etc/voipmonitor.conf
:natalias = <Public_IP_Signaled> <Private_IP_Actual>
:</syntaxhighlight>
:: When using <code>natalias</code>, ensure <code>rtp_check_both_sides_by_sdp</code> is set to <code>no</code> (the default).
:* Check for restrictive <code>filter</code> directives in <code>voipmonitor.conf</code>
:* Verify that <code>sipport</code> includes all necessary SIP ports

;If packets are captured but the audio is silent (codec issue):
:* Check the CDR view for codec information on both legs
:* Verify that the VoIPmonitor GUI has the necessary codec decoders installed
:* Check for codec mismatches between call legs (transcoding may be missing)

==== Reducing CPU Load: Jitterbuffer Tuning ====

<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
# Reduce jitterbuffer calculations to save CPU (keeps the MOS-F2 metric)
jitterbuffer_f1 = no
jitterbuffer_f2 = yes
jitterbuffer_adapt = no

# If MOS metrics are not needed at all, disable everything:
# jitterbuffer_f1 = no
# jitterbuffer_f2 = no
# jitterbuffer_adapt = no
</syntaxhighlight>

==== Kernel Bypass Solutions (Extreme Loads) ====

When the t0 thread hits 100% on a standard NIC, kernel bypass is the only solution:

{| class="wikitable"
|-
! Solution !! Type !! CPU Reduction !! Use Case
|-
| '''[[DPDK]]''' || Open-source || ~70% || Multi-gigabit on commodity hardware
|-
| '''[[Napatech]]''' || Hardware SmartNIC || >97% (< 3% at 10Gbit) || Extreme performance requirements
|}

==== Verify Improvement ====

<syntaxhighlight lang="bash">
# Monitor thread CPU after changes
watch -n 2 "echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket | head -10"

# Or monitor syslog
journalctl -u voipmonitor -f
# t0CPU should drop, heap values should stay < 20%
</syntaxhighlight>

{{Note|1=After changes, monitor the syslog <code>heap[A|B|C]</code> values - they should stay below 20% during peak traffic. See [[Syslog_Status_Line]] for detailed metric explanations.}}
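The heap value can be pulled out of a status line with standard text tools. The status line below is a hypothetical sample for illustration only; real lines vary by sniffer version:

```shell
# Hypothetical syslog status line (field layout varies by version)
line='voipmonitor[981]: calls[120][5] SQLq[12] heap[12|3|0] t0CPU[45.1%]'

# Extract the first heap value (packetbuffer fill in %)
echo "$line" | grep -oE 'heap\[[0-9]+' | cut -d'[' -f2
```

If the printed value climbs toward 20% and beyond during peak traffic, revisit the CPU and buffer tuning above.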
=== Step 6: Verify the Fix After Configuration Changes ===

After making changes in <code>/etc/voipmonitor.conf</code>:

<syntaxhighlight lang="bash">
# Restart the sniffer
systemctl restart voipmonitor

# Make another test call and repeat the diagnostic workflow
# Compare direct vs GUI capture again
</syntaxhighlight>

Confirm that RTP packets for the problematic leg now appear in both the direct tcpdump capture AND the GUI's PCAP file.

'''Note:''' This diagnostic methodology helps you identify whether the issue lies in the network infrastructure (PBX, SPAN, firewall) or in the VoIPmonitor configuration (natalias, filters). Applying VoIPmonitor configuration fixes when the root cause is a network issue will not resolve the problem.

== Storage Hardware Failure ==

'''Symptom''': The sensor shows disconnected (red X) with "DROPPED PACKETS" at low traffic volumes.

'''Diagnosis''':
<syntaxhighlight lang="bash">
# Check disk health
smartctl -a /dev/sda

# Check RAID status (if applicable)
cat /proc/mdstat
mdadm --detail /dev/md0
</syntaxhighlight>

Look for reallocated sectors, pending sectors, or a degraded RAID state. Replace the failing disk.
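The relevant SMART attributes can be filtered out of the <code>smartctl</code> output with <code>awk</code>. The two rows below are a fabricated sample for illustration; on a live system, pipe <code>smartctl -a /dev/sda</code> into the same filter:

```shell
# Fabricated smartctl attribute rows (real output varies by drive model)
smart_sample='  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3'

# Print the attribute name and raw value whenever the raw value is non-zero
printf '%s\n' "$smart_sample" | awk '/Reallocated_Sector_Ct|Current_Pending_Sector/ && $NF > 0 {print $2, $NF}'
```

Any non-zero raw value for these two attributes is a strong signal the disk should be replaced.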
== OOM (Out of Memory) ==

=== Identify the OOM Victim ===

<syntaxhighlight lang="bash">
# Check for OOM kills
dmesg | grep -i "out of memory\|oom\|killed process"
journalctl --since "1 hour ago" | grep -i oom
</syntaxhighlight>

=== MySQL Killed by OOM ===

Reduce the InnoDB buffer pool:
<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf
innodb_buffer_pool_size = 2G  # Reduce from default
</syntaxhighlight>

=== VoIPmonitor Killed by OOM ===

Reduce buffer sizes in voipmonitor.conf:
<syntaxhighlight lang="ini">
max_buffer_mem = 2000  # Reduce from default
ringbuffer = 50        # Reduce from default
</syntaxhighlight>

=== Runaway External Process ===

<syntaxhighlight lang="bash">
# Find memory-hungry processes
ps aux --sort=-%mem | head -20

# Kill an orphaned/runaway process
kill -9 <PID>
</syntaxhighlight>

For servers limited to '''16GB RAM''', or when experiencing repeated MySQL OOM kills:

<syntaxhighlight lang="ini">
# /etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf
[mysqld]
# On a 16GB server: 6GB buffer pool + 6GB MySQL overhead = 12GB total
# Leaves 4GB for the OS + GUI, preventing OOM
innodb_buffer_pool_size = 6G

# Enable write buffering (may lose up to 1s of data on crash but reduces memory pressure)
innodb_flush_log_at_trx_commit = 2
</syntaxhighlight>

Restart MySQL after changes:
<syntaxhighlight lang="bash">
systemctl restart mysql
# or
systemctl restart mariadb
</syntaxhighlight>

=== SQL Queue Growth from Non-Call Data ===

If <code>sip-register</code>, <code>sip-options</code>, or <code>sip-subscribe</code> are enabled, non-call SIP messages (OPTIONS, REGISTER, SUBSCRIBE, NOTIFY) can accumulate in the database and cause the SQL queue to grow unbounded. This increases MySQL memory usage and leads to OOM kills of mysqld.

{{Warning|1=Even with a reduced <code>innodb_buffer_pool_size</code>, the SQL queue will grow indefinitely without cleanup of non-call data.}}

'''Solution: Enable automatic cleanup of old non-call data'''
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf
# cleandatabase=2555 automatically deletes partitions older than 7 years
# Covers: CDR, register_state, register_failed, and sip_msg (OPTIONS/SUBSCRIBE/NOTIFY)
cleandatabase = 2555
</syntaxhighlight>

Restart the sniffer after changes:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>

{{Note|See [[Data_Cleaning]] for detailed configuration options and other <code>cleandatabase_*</code> parameters.}}

== Troubleshooting: Server Coredumps and SQL Queue Overload ==

If the VoIPmonitor server is experiencing regular coredumps, the cause may be an SQL queue bottleneck that exceeds system limits. The SQL queue grows when the database cannot keep up with the rate of data being inserted by VoIPmonitor.

=== Symptoms ===

* The server crashes or coredumps regularly, often during peak traffic hours
* Syslog messages show a growing <code>SQLq</code> counter (SQL queries waiting)
* Crashes occur while OPTIONS, SUBSCRIBE, and NOTIFY messages are being processed at high volume

=== Identify the Root Cause ===

;1. Check the SQL queue metric in syslog:
<syntaxhighlight lang="bash">
# Debian/Ubuntu
tail -f /var/log/syslog | grep "SQLq"

# CentOS/RHEL
tail -f /var/log/messages | grep "SQLq"
</syntaxhighlight>

Look for the <code>SQLq[XXX]</code> value, where XXX is the number of queued SQL commands. If this number is consistently growing or reaching high values (thousands or more), the database is a bottleneck.

;2. Check whether SIP message processing is enabled:
<syntaxhighlight lang="bash">
grep -E "sip-options=|sip-subscribe=|sip-notify=" /etc/voipmonitor.conf
</syntaxhighlight>

If these are set to <code>yes</code> and you have a high volume of these messages (OPTIONS pings are sent frequently by SIP devices), they can overwhelm the database insert thread queue.

=== Solutions ===

There are three approaches to resolving SQL queue overload coredumps:

==== Solution 1: Increase MySQL Insert Threads ====

Increase the number of threads dedicated to inserting SIP messages into the database. This allows more parallel database operations.

Edit <code>/etc/voipmonitor.conf</code> and add or modify:

<syntaxhighlight lang="ini">
# Increase insert threads for SIP messages (default is 4; increase to 8 or higher for high traffic)
mysqlstore_max_threads_sip_msg = 8
</syntaxhighlight>

Restart VoIPmonitor for the change to take effect:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>

{{Tip|For very high traffic environments, you may need to increase this value further (e.g., 12 or 16).}}

==== Solution 2: Disable High-Volume SIP Message Types ====

Reduce the load on the SQL queue by disabling processing of specific high-volume SIP message types that are not needed for your analysis.

Edit <code>/etc/voipmonitor.conf</code>:

<syntaxhighlight lang="ini">
# Disable processing and database storage for specific message types
sip-options = no
sip-subscribe = no
sip-notify = no
</syntaxhighlight>

Restart VoIPmonitor:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>

{{Note|See [[SIP_OPTIONS/SUBSCRIBE/NOTIFY]] for detailed information on these options and when to use <code>nodb</code> mode instead of disabling them entirely.}}
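Whichever solution you apply, verify afterwards that the queue is actually draining. The <code>SQLq</code> value can be extracted from a status line with standard tools; the line below is a hypothetical sample for illustration:

```shell
# Hypothetical syslog status line (sample for illustration)
line='voipmonitor[1234]: calls[310][12] SQLq[4523] heap[10|5|0]'

# Extract the number of queued SQL commands
echo "$line" | grep -oE 'SQLq\[[0-9]+\]' | tr -dc '0-9'; echo
```

Sampled repeatedly (e.g., once a minute), a steadily falling value confirms the bottleneck is resolved; a steadily rising value means the database still cannot keep up.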
==== Solution 3: Optimize MySQL Performance ====

Tune the MySQL/MariaDB server for better write performance to handle the high insert rate from VoIPmonitor.

Edit your MySQL configuration file (typically <code>/etc/mysql/my.cnf</code> or <code>/etc/mysql/mariadb.conf.d/50-server.cnf</code>):

<syntaxhighlight lang="ini">
[mysqld]
# InnoDB buffer pool size - set to approximately 50-70% of available RAM on a dedicated database server
# On servers running VoIPmonitor and MySQL together, use approximately 30-50% of RAM
innodb_buffer_pool_size = 8G

# Reduce transaction durability for faster writes (may lose up to 1 second of data on crash)
innodb_flush_log_at_trx_commit = 2
</syntaxhighlight>

Restart MySQL and VoIPmonitor:
<syntaxhighlight lang="bash">
systemctl restart mysql
systemctl restart voipmonitor
</syntaxhighlight>

{{Warning|Setting <code>innodb_flush_log_at_trx_commit</code> to <code>2</code> trades some data safety for performance. In the event of a power loss or crash, up to 1 second of the most recent transactions may be lost.}}

=== Additional Troubleshooting ===

* If increasing threads and disabling SIP message types do not resolve the issue, check whether the database server itself has performance bottlenecks (CPU, disk I/O, memory)
* For systems with extremely high call volumes, consider moving the database to a separate dedicated server
* Monitor the <code>SQLq</code> metric after making changes to verify that the queue is not growing unchecked

== Service Startup Failures ==

=== Interface No Longer Exists ===

After an OS upgrade, interface names may change (eth0 → ensXXX):

<syntaxhighlight lang="bash">
# Find current interface names
ip a

# Update all config locations
grep -r "interface" /etc/voipmonitor.conf /etc/voipmonitor.conf.d/

# Also check GUI: Settings → Sensors → Configuration
</syntaxhighlight>

=== Missing Dependencies ===

<syntaxhighlight lang="bash">
# Install the most commonly missing package
apt install libpcap0.8   # Debian/Ubuntu
yum install libpcap      # RHEL/CentOS
</syntaxhighlight>

== Network Interface Issues ==
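Before working through the subsections below, it helps to confirm which interface names actually exist on the box. This sketch assumes a Linux system with sysfs, and the iproute2 tools if installed:

```shell
# List all network interfaces known to the kernel (Linux sysfs)
ls /sys/class/net

# One-line state summary per interface, if iproute2 is available
if command -v ip >/dev/null; then ip -br link; fi
```

Any interface referenced in <code>voipmonitor.conf</code> must appear in this listing.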
=== Promiscuous Mode ===

Required for SPAN port monitoring:
<syntaxhighlight lang="bash">
# Enable
ip link set eth0 promisc on

# Verify
ip link show eth0 | grep PROMISC
</syntaxhighlight>

{{Note|Promiscuous mode is NOT required for ERSPAN/GRE tunnels where the traffic is addressed to the sensor.}}

=== Interface Drops ===

<syntaxhighlight lang="bash">
# Check for drops
ip -s link show eth0 | grep -i drop

# If drops are present, increase the ring buffer
ethtool -G eth0 rx 4096
</syntaxhighlight>

=== Bonded/EtherChannel Interfaces ===

'''Symptom''': False packet loss when monitoring bond0 or br0.

'''Solution''': Monitor the physical interfaces, not the logical one:
<syntaxhighlight lang="ini">
# voipmonitor.conf - use physical interfaces
interface = eth0,eth1
</syntaxhighlight>

=== Network Offloading Issues ===

'''Symptom''': Kernel errors like <code>bad gso: type: 1, size: 1448</code>

<syntaxhighlight lang="bash">
# Disable offloading on the capture interface
ethtool -K eth0 gso off tso off gro off lro off
</syntaxhighlight>

== Packet Ordering Issues ==

If SIP messages appear out of sequence:

'''First''': Rule out a Wireshark display artifact - disable "Analyze TCP sequence numbers" in Wireshark. See [[FAQ]].

'''If genuine reordering''': Usually caused by packet bursts in the network infrastructure. Use tcpdump to verify that packets arrive out of order at the interface. Work with the network administrator to implement QoS or traffic shaping. For persistent issues, consider a dedicated capture card with hardware timestamping (see [[Napatech]]).

{{Note|For out-of-order packets in '''client/server mode''' (multiple sniffers), see [[Sniffer_distributed_architecture]] for <code>pcap_queue_dequeu_window_length</code> configuration.}}

=== Solutions for SPAN/Mirroring Reordering ===

If packets arrive out of order at the SPAN/mirror port (e.g., 302 responses before the INVITE, causing "000 no response" errors):

1. '''Configure the switch to preserve packet order''': Many switches allow configuring SPAN/mirror ports to maintain packet ordering. Consult your switch documentation for packet-ordering guarantees in the mirroring configuration.

2. '''Replace SPAN with a TAP or packet broker''': Unlike software-based SPAN mirroring, hardware TAPs and packet brokers guarantee packet order. Consider upgrading to a dedicated TAP or packet broker device for mission-critical monitoring.

== Appendix: tshark Display Filter Syntax for SIP ==

When using <code>tshark</code> to analyze SIP traffic, it is important to use the '''correct Wireshark display filter syntax'''. Below are common filter examples.

=== Basic SIP Filters ===
<syntaxhighlight lang="bash">
# Show all SIP INVITE messages
tshark -r capture.pcap -Y "sip.Method == INVITE"

# Show all SIP messages (any method)
tshark -r capture.pcap -Y "sip"

# Show SIP and RTP traffic
tshark -r capture.pcap -Y "sip || rtp"
</syntaxhighlight>

=== Search for a Specific Phone Number or Text ===
<syntaxhighlight lang="bash">
# Find calls containing a specific phone number (e.g., 5551234567)
tshark -r capture.pcap -Y 'sip contains "5551234567"'

# Find INVITE messages for a specific number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"'
</syntaxhighlight>

=== Extract the Call-ID from Matching Calls ===
<syntaxhighlight lang="bash">
# Get the Call-ID for calls matching a phone number
tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"' -T fields -e sip.Call-ID

# Get the Call-ID along with the From and To headers
tshark -r capture.pcap -Y 'sip.Method == INVITE' -T fields -e sip.Call-ID -e sip.from.user -e sip.to.user
</syntaxhighlight>

=== Filter by IP Address ===
<syntaxhighlight lang="bash">
# SIP traffic from a specific source IP
tshark -r capture.pcap -Y "sip && ip.src == 192.168.1.100"

# SIP traffic between two hosts
tshark -r capture.pcap -Y "sip && ip.addr == 192.168.1.100 && ip.addr == 10.0.0.50"
</syntaxhighlight>

=== Filter by SIP Response Code ===
<syntaxhighlight lang="bash">
# Show all 200 OK responses
tshark -r capture.pcap -Y "sip.Status-Code == 200"

# Show all 4xx and 5xx error responses
tshark -r capture.pcap -Y "sip.Status-Code >= 400"

# Show 486 Busy Here responses
tshark -r capture.pcap -Y "sip.Status-Code == 486"
</syntaxhighlight>

=== Important Syntax Notes ===
* '''Field names are case-sensitive:''' Use <code>sip.Method</code>, <code>sip.Call-ID</code>, <code>sip.Status-Code</code> (not <code>sip.method</code> or <code>sip.call-id</code>)
* '''String matching uses <code>contains</code>:''' Use <code>sip contains "text"</code> (not <code>sip.contains()</code>)
* '''Use double quotes for strings:''' <code>sip contains "number"</code> (not single quotes)
* '''Boolean operators:''' Use <code>&&</code> (and), <code>||</code> (or), <code>!</code> (not)

For a complete reference, see the [https://www.wireshark.org/docs/dfref/s/sip.html Wireshark SIP Display Filter Reference].

== Troubleshooting: Database Error 1062 - Lookup Table Auto-Increment Limit ==

If the sniffer logs show a database error <code>1062 - Duplicate entry '16777215' for key 'PRIMARY'</code> and new CDRs stop being stored, the cause is a lookup table reaching its maximum auto-increment limit.

=== Symptoms ===

* CDRs stop being inserted into the database
* The sniffer logs show: <code>query error in [call __insert_10_0S1();]: 1062 - Duplicate entry '16777215' for key 'PRIMARY'</code>
* The error affects a lookup table (such as <code>cdr_sip_response</code> or <code>cdr_reason</code>)
* The value 16777215 (16,777,215) indicates the table is using <code>MEDIUMINT UNSIGNED</code> for the ID column

=== Root Cause ===

VoIPmonitor uses lookup tables (like <code>cdr_sip_response</code> or <code>cdr_reason</code>) to store unique values such as SIP response reason strings or custom response text. These are used to normalize data and reduce storage in the main <code>cdr</code> table.
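The 16,777,215 ceiling seen in the error message is not arbitrary: it is the maximum value of a 3-byte (24-bit) <code>MEDIUMINT UNSIGNED</code> column:

```shell
# MEDIUMINT UNSIGNED is a 3-byte (24-bit) column, hence the ceiling in the 1062 error
echo $(( (1 << 24) - 1 ))   # 16777215
```

Once the auto-increment counter reaches this value, no further unique rows can be created in that table.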
When the system receives many unique SIP response strings or reason messages (e.g., different error messages from various carriers, devices with custom SIP header formats, or PBX-specific responses), the lookup table's auto-increment ID can reach the <code>MEDIUMINT</code> limit of 16,777,215. Once this limit is hit, new unique values cannot be inserted, causing all subsequent CDRs to fail with error 1062.

=== Identifying the Affected Table ===

Check which lookup table is hitting the limit:

<syntaxhighlight lang="sql">
-- Check the current AUTO_INCREMENT value for lookup tables
SELECT
    TABLE_NAME,
    COLUMN_TYPE,
    AUTO_INCREMENT
FROM
    INFORMATION_SCHEMA.TABLES
JOIN
    INFORMATION_SCHEMA.COLUMNS
    USING (TABLE_SCHEMA, TABLE_NAME)
WHERE
    TABLE_SCHEMA = 'voipmonitor' AND
    (TABLE_NAME LIKE 'cdr_sip%' OR TABLE_NAME LIKE 'cdr_reason%') AND
    COLUMN_KEY = 'PRI' AND
    EXTRA LIKE '%auto_increment%'
ORDER BY AUTO_INCREMENT DESC;
</syntaxhighlight>

Look for AUTO_INCREMENT values approaching or exceeding 16,000,000 in tables using <code>MEDIUMINT</code>.

=== Solution: Prevent New Unique Entries ===

The most effective solution is to configure VoIPmonitor to stop storing, or to normalize, the unique SIP response text that is causing the rapid growth of the lookup table.

==== Option 1: Disable SIP Response Text Storage ====

Edit <code>/etc/voipmonitor.conf</code> on the sniffer to disable storing SIP response reason text:

<syntaxhighlight lang="ini">
# Disable storing SIP response reason strings in lookup tables
cdr_reason_string_enable = no
</syntaxhighlight>

This prevents the system from creating new unique entries for SIP response reason strings. Restart the sniffer:

<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>

==== Option 2: Normalize Response Text ====

If you need to keep some response text but reduce the number of unique entries, enable normalization in <code>/etc/voipmonitor.conf</code>:

<syntaxhighlight lang="ini">
# Normalize SIP response text to reduce unique entries
cdr_reason_normalisation = yes
cdr_sip_response_normalisation = yes
</syntaxhighlight>

Normalization transforms similar response strings into a single canonical form, significantly reducing the number of unique rows created.

==== Option 3: Clean Existing Data (Optional) ====

After disabling or normalizing new entries, you may want to clear the lookup table to free space. The data in lookup tables is only used for display purposes and is not critical for historical analysis.

<syntaxhighlight lang="sql">
-- Clear the cdr_sip_response table (adjust the table name as needed)
TRUNCATE TABLE cdr_sip_response;
</syntaxhighlight>

{{Warning|TRUNCATE permanently deletes all data. This removes the exact SIP response text shown in the GUI for historical CDRs, but does not affect the main CDR records or call data. Only do this if you are certain you no longer need the original response text.}}

=== Verification ===

After applying the fix:

1. Check that CDRs are being stored again by monitoring the sniffer logs
2. Verify that the lookup table AUTO_INCREMENT is no longer increasing rapidly:
<syntaxhighlight lang="sql">
SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'cdr_sip_response' AND TABLE_SCHEMA = 'voipmonitor';
</syntaxhighlight>
3. Monitor the error logs to confirm that the 1062 error has stopped appearing

=== Important Note: NOT a Database Schema Issue ===

This error is typically NOT solved by changing the database schema (e.g., migrating to BIGINT). The root cause is storing too many unique SIP response strings, which will continue to grow regardless of the ID column size. The correct solution is to configure VoIPmonitor to stop creating these unique entries via the <code>cdr_reason_string_enable</code> configuration option.

{{Warning|Do NOT confuse this with the unrelated <code>cdr</code> table integer overflow problem. The main <code>cdr</code> table may hit limits around 4 billion rows (32-bit INT), which is addressed in the [[Upgrade_to_bigint]] guide. Lookup table issues at 16.7 million (MEDIUMINT) are solved by configuration, not schema migration.}}

== Database Issues ==

=== SQL Queue Overload ===

'''Symptom''': A growing <code>SQLq</code> metric, potential coredumps.

<syntaxhighlight lang="ini">
# voipmonitor.conf - reduce per-CDR SQL overhead
mysqlstore_concat_limit_cdr = 1000
cdr_check_exists_callid = 0
</syntaxhighlight>

=== Error 1062 - Lookup Table Limit ===

'''Symptom''': <code>Duplicate entry '16777215' for key 'PRIMARY'</code>

'''Quick fix''':
<syntaxhighlight lang="ini">
# voipmonitor.conf
cdr_reason_string_enable = no
</syntaxhighlight>

See [[Database_troubleshooting#Database_Error_1062_-_Lookup_Table_Auto-Increment_Limit|Database Troubleshooting]] for the complete solution.

== Bad Packet Errors ==

'''Symptom''': <code>bad packet with ether_type 0xFFFF detected on interface</code>

'''Diagnosis''':
<syntaxhighlight lang="bash">
# Run the diagnostic (let it run 30-60 seconds, then kill it)
voipmonitor --check_bad_ether_type=eth0

# Find and kill the diagnostic process
ps ax | grep voipmonitor
kill -9 <PID>
</syntaxhighlight>

Causes: corrupted packets, driver issues, VLAN tagging problems. Check <code>ethtool -S eth0</code> for interface errors.

== Useful Diagnostic Commands ==

=== tshark Filters for SIP ===

<syntaxhighlight lang="bash">
# All SIP INVITEs
tshark -r capture.pcap -Y "sip.Method == INVITE"

# Find a specific phone number
tshark -r capture.pcap -Y 'sip contains "5551234567"'

# Get Call-IDs
tshark -r capture.pcap -Y "sip.Method == INVITE" -T fields -e sip.Call-ID

# SIP errors (4xx, 5xx)
tshark -r capture.pcap -Y "sip.Status-Code >= 400"
</syntaxhighlight>

=== Interface Statistics ===

<syntaxhighlight lang="bash">
# Detailed NIC stats
ethtool -S eth0

# Watch packet rates
watch -n 1 'cat /proc/net/dev | grep eth0'
</syntaxhighlight>
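The per-interface drop counters in <code>/proc/net/dev</code> can also be extracted with <code>awk</code>. The row below is a fabricated sample for illustration; on a live system, feed <code>cat /proc/net/dev</code> into the same filter. After the interface name, the receive columns are bytes, packets, errs, drop, so the fourth data column is RX drops:

```shell
# Fabricated /proc/net/dev data row (illustrative)
netdev_sample='  eth0: 9876543    8765    0   17    0     0          0         0  1234567    4321    0    0    0     0       0          0'

# Fields after the colon: rx bytes, packets, errs, drop, ...
printf '%s\n' "$netdev_sample" | awk -F'[: ]+' '{print $2, $6}'
```

A non-zero drop count that keeps rising between samples points at ring-buffer exhaustion; see the Interface Drops section above.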
| === Routing Loops === | | == See Also == |
|
| |
|
| Routing loops occur when SIP INVITE requests continuously circulate between SIP servers without completing, causing excessive traffic and call failures. Common symptoms include:
| | * [[Sniffer_configuration]] - Configuration parameter reference |
| | * [[Sniffer_distributed_architecture]] - Client/server deployment |
| | * [[Capture_rules]] - GUI-based recording rules |
| | * [[Sniffing_modes]] - SPAN, ERSPAN, GRE, TZSP setup |
| | * [[Scaling]] - Performance optimization |
| | * [[Database_troubleshooting]] - Database issues |
| | * [[FAQ]] - Common questions and Wireshark display issues |
|
| |
|
| * High volume of calls to a single destination number in a short time period
| |
| * Many INVITE requests with no SIP response (response code 0)
| |
| * Very long Post Dial Delay (PDD) values
| |
| * Rapid retransmission of INVITE to the same called number
| |
|
| |
|
| {{Note|Routing loops can be caused by misconfigured dial plans, incorrect SIP URI formats, or circular forwarding rules.}}
| |
|
| |
|
| ==== Detection Methods
| |
|
| |
|
| Use alerts to detect routing loops:
| |
|
| |
|
| * '''SIP Response Alert (Response code 0)''': Configure an alert to detect unreplied INVITE requests. [[Alerts|Configure this in GUI > Alerts]] by setting Response code to 0. This catches calls in a loop that never receive any SIP response.
| |
|
| |
|
| * '''PDD (Post Dial Delay) Alert''': Configure a PDD alert with a threshold (e.g., <code>PDD > 30</code> seconds) to detect calls taking excessively long to complete. Routing loops often have very high PDD values as INVITEs continue retransmitting. [[Alerts|See Alerts documentation for PDD configuration]].
| |
|
| |
|
| * '''Fraud: Sequential Alert''': Monitor for excessive calls to any single destination number within a short time window. Configure [[Anti-fraud|Fraud: Sequential]] with an appropriate Interval and Limit (e.g., 50 calls in 1 hour to the same number). Leave the called number field empty to monitor all destinations.
| |
|
| |
|
| ==== Troubleshooting Steps | | == AI Summary for RAG == |
|
| |
|
| 1. Identify the looping destination number from alert logs or CDR search
| | <!-- This section is for AI/RAG systems. Do not edit manually. --> |
| 2. Check the SIP dialog to trace the call path (use PCAP analysis in GUI)
| |
| 3. Verify dial plan configuration on all involved SIP servers
| |
| 4. Look for forwarding rules or translation patterns that may create circular routing
| |
| 5. Fix the misconfiguration and verify the loop no longer occurs
| |
|
| |
|
| == See Also == | | === Summary === |
| * [[Sniffer_configuration]] - Complete configuration reference for voipmonitor.conf
| | Comprehensive troubleshooting guide for VoIPmonitor sniffer/sensor problems. Covers: verifying traffic reaches interface (tcpdump/tshark), diagnosing no calls recorded (service, config, capture rules, SPAN), missing audio/RTP issues (one-way audio, NAT, natalias, rtp_check_both_sides_by_sdp), PACKETBUFFER FULL errors (I/O vs CPU bottleneck diagnosis using syslog metrics heap/t0CPU/SQLq and Linux tools iostat/iotop/ioping), manager commands for thread monitoring (sniffer_threads via socket or port 5029), t0 single-core capture limit and solutions (DPDK/Napatech kernel bypass), I/O solutions (NVMe/SSD, async writes, pcap_dump_writethreads), CPU solutions (max_buffer_mem 10GB+, jitterbuffer tuning), OOM issues (MySQL buffer pool, voipmonitor buffers), network interface problems (promiscuous mode, drops, offloading), packet ordering, database issues (SQL queue, Error 1062). |
| * [[Sniffer_distributed_architecture]] - Client/server deployment and troubleshooting
| |
| * [[Capture_rules]] - GUI-based selective recording configuration
| |
| * [[Sniffing_modes]] - Traffic forwarding methods (SPAN, ERSPAN, GRE, TZSP)
| |
| * [[Scaling]] - Performance tuning and optimization
| |
| * [[Upgrade_to_bigint]] - Migrating CDR table to BIGINT (unrelated to lookup table issues)
| |
| | |
| == AI Summary for RAG == | |
| '''Summary:''' Comprehensive troubleshooting guide for VoIPmonitor sensor issues. POST-REBOOT VERIFICATION: After planned server reboot, verify two critical items: (1) Firewall/Iptables Rules - check with `iptables -L -n -v`, `firewall-cmd --list-all`, or `ufw status verbose`. Verify VoIPmonitor ports are allowed: SIP (5060/udp), RTP range, GUI (80/tcp, 443/tcp), sensor management (5029/tcp), Client-Server (60024/tcp). Make rules persistent: for iptables use `iptables-save > /etc/iptables/rules.v4` and install `iptables-persistent`; for firewalld use `--permanent` flag. (2) System Time Synchronization - CRITICAL especially for packetbuffer_sender mode. Check with `ntpstat` or `chronyc tracking`. Verify with `ntpq -p` or `chronyc sources -v`. Time offset should be under 100ms. For packetbuffer_sender mode, host and server times must match for proper call correlation (max difference: 2 seconds). Ensure all distributed sensors and central server use same NTP source: `timedatectl status`. Troubleshoot time sync: check firewall allows UDP 123, verify NTP servers reachable, review `/etc/ntp.conf` or `/etc/chrony.conf`, enable service on boot. MAIN TROUBLESHOOTING STEPS for no calls: (1) Verify service running with <code>systemctl status</code>. If service fails to start or crashes immediately with "missing package" error: check logs (syslog/journalctl), install missing dependencies - most commonly <code>rrdtool</code> for RRD graphing/statistics (apt-get install rrdtool or yum/dnf install rrdtool), other common missing packages: libpcap, libssl, zlib. Use <code>ldd</code> to check shared library dependencies. Restart service after installing packages. (2) CRITICAL STEP: Use <code>tshark</code> to verify live traffic is reaching the correct network interface: <code>tshark -i eth0 -Y "sip || rtp" -n</code> (replace eth0 with interface from voipmonitor.conf). If command shows NO packets: issue is network - check SPAN/mirror port configuration on switch, firewall rules. 
If command shows OPTIONS/NOTIFY/SUBSCRIBE/METHOD but NO INVITE packets: environment has no calls (VOIPmonitor requires INVITE for CDRs). Configure to process non-call SIP messages in voipmonitor.conf with sip-options, sip-message, sip-subscribe, sip-notify set to yes. (3) Check network config - promiscuous mode required for SPAN/RSPAN but NOT for Layer 3 tunnels (ERSPAN/GRE/TZSP/VXLAN). (3A) SPECIAL CASE: Missing packets for specific IPs during high-traffic periods. Use tcpdump FIRST: `tcpdump -i eth0 -nn "host 10.1.2.3 and port 5060"`. If NO packets arrive -> check SPAN config for bidirectional capture (source ports, BOTH inbound/outbound, SPAN buffer saturation during peak, VLAN trunking). If packets DO arrive -> check sensor bottlenecks (ringbuffer, t0CPU, OOM, max_sip_packets_in_call). (3a) If tcpdump shows traffic but VoIPmonitor does NOT capture it, investigate packet encapsulation - capture with tcpdump and analyze with tshark for VLAN tags, ERSPAN, GRE (tshark -Y "gre"), VXLAN (udp.port == 4789), TZSP (udp.port 37008/37009). VLAN tags: ensure filter directive does not use "udp" which drops VLAN-tagged packets. ERSPAN/GRE: verify tunnel configured correctly and packets addressed to sensor IP (promiscuous mode NOT required). VXLAN/TZSP: require proper sending device configuration. (3B) SPECIAL CASE: RTP streams not displayed for specific provider. If SIP signaling works in GUI but RTP streams/quality graphs missing for one provider while working for others: Step 1: Make a test call to reproduce issue. Step 2: During test call, capture RTP packets with tcpdump: `sudo tcpdump -i eth0 -nn "host 1.2.3.4 and rtp" -w /tmp/test_provider_rtp.pcap`. Step 3: Compare tcpdump output with sensor GUI. If tcpdump shows NO RTP packets: network-level issue (asymmetric routing, SPAN config missing RTP path). If tcpdump shows RTP packets but GUI shows no streams: check capture rules with RTP set to DISCARD/Header Only, SRTP decryption config, or sipport/filter settings. 
(4) Verify <code>voipmonitor.conf</code> settings: interface, sipport, filter directives. (5) Check GUI capture rules with "Skip" option blocking calls. (6) Review system logs for errors. (7) Diagnose OOM killer events causing CDR processing stops. (8) Investigate missing CDRs due tosnaplen truncation, MTU mismatch, or EXTERNAL SOURCE packet truncation. Cause 3: If packets truncated before reaching VoIPmonitor (e.g., Kamailio siptrace, FreeSWITCH sip_trace, custom HEP/HOMER agents, load balancer mirrors), snaplen changes will NOT help. Diagnose with tcpdump -s0; check if received packets smaller than expected. Solutions: For Kamailio siptrace, use TCP transport in duplicate_uri parameter; if connection refused, open TCP listener with socat; best solution: use HAProxy traffic 'tee' to bypass siptrace entirely and send original packets directly. (9) Diagnose probe timeout due to virtualization timing issues - check syslog for 10-second voipmonitor status intervals, RDTSC problems on hypervisor cause >30 second gaps triggering timeouts. (10) Server coredumps and SQL queue overload: Check syslog for growing `SQLq` counter indicating database bottleneck. Symptoms include regular coredumps during peak hours when processing high-volume OPTIONS/SUBSCRIBE/NOTIFY messages. Solutions: 1) Increase `mysqlstore_max_threads_sip_msg` in voipmonitor.conf from default 4 to 8 or higher, restart service. 2) Disable high-volume SIP message types if not needed: set `sip-options=no`, `sip-subscribe=no`, `sip-notify=no`. 3) Optimize MySQL performance with `innodb_buffer_pool_size=8G` (or 50-70% of RAM on dedicated DB, 30-50% on shared) and `innodb_flush_log_at_trx_commit=2`. Restart MySQL and VoIPmonitor after changes. Monitor SQLq metric to verify queue is stable. 
(11) DATABASE ERROR 1062 - LOOKUP TABLE LIMIT: If the sniffer logs show `1062 - Duplicate entry '16777215' for key 'PRIMARY'` and CDRs stop being stored, the cause is lookup tables (cdr_sip_response, cdr_reason) hitting the MEDIUMINT auto-increment limit (16,777,215) because too many unique SIP response strings were stored. This is NOT a schema migration issue. SOLUTION: Edit `/etc/voipmonitor.conf` and set `cdr_reason_string_enable = no` to disable storing SIP response reason strings, or enable normalization with `cdr_reason_normalisation=yes` and `cdr_sip_response_normalisation=yes` to reduce the number of unique entries. Restart the sniffer after the changes. Optionally TRUNCATE the table to clean existing data. Do NOT confuse this with the unrelated cdr table INT overflow (4 billion rows), which requires schema migration via the Upgrade_to_bigint guide. The article also includes a tshark display-filter syntax appendix.
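The two remedies for error 1062 described above map to a short config fragment (restart the sniffer after changing either option):

```ini
# /etc/voipmonitor.conf (fragment)
# Stop storing SIP response reason strings entirely, so no new unique
# rows are inserted into the overflowing lookup tables:
cdr_reason_string_enable = no

# ...or keep the strings but collapse variants into fewer unique entries:
# cdr_reason_normalisation = yes
# cdr_sip_response_normalisation = yes
```

Optionally TRUNCATE cdr_sip_response afterwards to reclaim the existing rows; do not run the BIGINT schema migration, which addresses a different overflow.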
| |
| | |
| '''Keywords:''' troubleshooting, no calls, not sniffing, no CDRs, tshark, missing package, missing library, rrdtool, rrdtools, dependencies, service failed, service crashed, ldd, libpcap, libssl, zlib, systemctl restart, journalctl, syslog, promiscuous mode, SPAN, RSPAN, ERSPAN, GRE, TZSP, VXLAN, voipmonitor.conf, interface, sipport, filter, capture rules, Skip, OOM, out of memory, snaplen, MTU, packet truncation, external source truncation, Kamailio siptrace, FreeSWITCH sip_trace, OpenSIPS, HEP, HOMER, HAProxy tee, traffic mirroring, load balancer, socat, TCP listener, WebRTC INVITE, truncated packets, corrupted packets, Authorization header, 4k packets, display filter, sip.Method, sip.Call-ID, probe timeout, virtualization, RDTSC, timing issues, status logs, 10 second interval, KVM, VMware, Hyper-V, Xen, non-call SIP traffic, OPTIONS, NOTIFY, SUBSCRIBE, MESSAGE, sip-options, sip-message, sip-subscribe, sip-notify, qualify pings, heartbeat, instant messaging, encapsulation, packet encapsulation, VLAN tags, 802.1Q, tcpdump analysis, tshark encapsulation filters, high traffic, specific IP, missing packets, specific IP addresses, call legs missing, INVITE missing, high-traffic periods, peak hours, bidirectional capture, inbound outbound, both directions, SPAN buffer saturation, port mirroring, SPAN buffer capacity, rx tx both, monitor session, SPAN source, SPAN destination, ringbuffer, t0CPU, max_sip_packets_in_call, max_invite_packets_in_call, RTP missing, RTP not displayed, RTP missing specific provider, audio quality graphs missing, SRTP, asymmetric routing, RTP test call, tcpdump RTP capture, RTP stream visualization, audio missing, audio missing on one leg, partial audio, silenced audio, one call leg, carrier, PBX, inside, outside, tcpdump tshark comparison, direct capture vs GUI capture, diagnose audio issues, RTP packets on the wire, NAT IP mismatch, natalias configuration, codec issue, transcoding, RTP port configuration, network issue, PBX issue, sniffer configuration, packet correlation, RTP source IP mismatch, SIP signaling IP, coredump, server crash, SQL queue, SQLq, mysqlstore_max_threads_sip_msg, innodb_buffer_pool_size, innodb_flush_log_at_trx_commit, database bottleneck, SQL queue overflow, performance tuning, post-reboot verification, after reboot, server reboot, planned reboot, firewall verification, iptables check, firewalld check, ufw status, firewall persistence, iptables-persistent, firewall persistent, time synchronization, NTP, chrony, ntpstat, chronyc tracking, timedatectl, time sync, time drift, NTP port 123, distributed architecture time sync, client_server_connect_maximum_time_diff_s, packetbuffer_sender time sync, 1062 duplicate entry, 16777215, lookup table, MEDIUMINT limit, cdr_sip_response, cdr_reason, cdr_reason_string_enable, auto-increment limit, SIP response strings, unique entries, normalization, cdr_reason_normalisation, cdr_sip_response_normalisation, TRUNCATE cdr_sip_response, database error, lookup table overflow, rtp_check_both_sides_by_sdp, one-way RTP, RTP capturing one stream, RTP strict verification, SDP RTP matching
| |
|
| |
|
| '''Key Questions:'''
| | === Keywords === |
| * What should I verify after a planned server reboot to ensure VoIPmonitor operates correctly? (Verify firewall rules and time synchronization)
| | troubleshooting, sniffer, sensor, no calls, missing audio, one-way audio, RTP, PACKETBUFFER FULL, memory is FULL, buffer saturation, I/O bottleneck, CPU bottleneck, heap, t0CPU, t1CPU, t2CPU, SQLq, comp, tacCPU, iostat, iotop, ioping, sniffer_threads, manager socket, port 5029, thread CPU, t0 thread, single-core limit, DPDK, Napatech, kernel bypass, NVMe, SSD, async write, pcap_dump_writethreads, tar_maxthreads, max_buffer_mem, jitterbuffer, interface_ip_filter, OOM, out of memory, innodb_buffer_pool_size, promiscuous mode, interface drops, ethtool, packet ordering, SPAN, mirror, SQL queue, Error 1062, natalias, NAT, id_sensor, snaplen, capture rules, tcpdump, tshark |
| * How do I check firewall rules after a server reboot? (Use iptables -L -n -v, firewall-cmd --list-all, or ufw status verbose)
| |
| * Which VoIPmonitor ports should be allowed through the firewall? (SIP: 5060/udp, RTP range, GUI: 80/tcp and 443/tcp, sensor management: 5029/tcp, Client-Server: 60024/tcp)
| |
| * How do I make firewall rules persistent across reboots? (For iptables: iptables-save > /etc/iptables/rules.v4 and install iptables-persistent; for firewalld: use --permanent flag)
| |
| * Why is time synchronization critical for packetbuffer_sender mode? (Host and server times must match for proper call correlation and packet processing; maximum allowed time difference: 2 seconds)
| |
| * How do I check NTP time synchronization after a reboot? (Use ntpstat or chronyc tracking; verify with ntpq -p or chronyc sources -v)
| |
| * How do I ensure all distributed sensors and central server have synchronized time? (Check timedatectl status on each system; ensure they use same NTP source and allow UDP 123)
| |
| * What is the correct tshark command to verify SIP/RTP traffic is reaching the VoIPmonitor sensor? (Use: tshark -i eth0 -Y "sip || rtp" -n)
| |
| * How do I diagnose why sniffer captures full audio on one call leg but no audio on the other leg?
| |
| * How do I use tcpdump to diagnose missing audio on one call leg?
| |
| * How do I compare tcpdump capture with the GUI's PCAP file?
| |
| * How do I determine if RTP packets are on the wire when one leg has no audio?
| |
| * What is the diagnostic workflow for audio missing on one call leg?
| |
| * How do I determine if audio issue is network/PBX problem vs VoIPmonitor configuration?
| |
| * How do I check if RTP packets for the silent leg are present on the wire?
| |
| * How do I verify if natalias is needed for NAT IP mismatch?
| |
| * What is the most common cause of one-way RTP capture when packets are present on the wire? (rtp_check_both_sides_by_sdp set to yes, strict, or very_strict)
| |
| * How does rtp_check_both_sides_by_sdp setting affect RTP capture? (Setting to yes requires both RTP sides to exactly match SDP; no allows matching based on single direction)
| |
| * What are the symptoms of restrictive rtp_check_both_sides_by_sdp settings? (Only one call leg in CDR, RECEIVED packets 0 on one leg, tcpdump shows both streams but GUI captures only one)
| |
| * How do I check and change rtp_check_both_sides_by_sdp configuration? (Use grep to check setting in voipmonitor.conf; change to no and restart service)
| |
| * Why does rtp_check_both_sides_by_sdp yes cause one-way RTP issues? (Requires both sides to match SIP signaling, too strict for many environments)
| |
| * How do I diagnose whether one-way audio is a codec issue or network issue?
| |
| * How do I use tcpdump vs GUI PCAP comparison for troubleshooting?
| |
| * What should I do first when one call leg has missing or partial audio?
| |
| * How do I interpret tcpdump vs GUI capture comparison results?
| |
| * How do I check for codec/transcoding issues causing one-way audio?
| |
| * How do I configure VoIPmonitor to process non-call SIP messages like OPTIONS/NOTIFY/SUBSCRIBE?
| |
| * How do I check for VLAN tags in a pcap file?
| |
| * How do I detect ERSPAN or GRE tunnels with tshark?
| |
| * How do I check for VXLAN encapsulation in my capture?
| |
| * How do I identify TZSP packets in a pcap?
| |
| * Why does my BPF filter drop VLAN-tagged packets?
| |
| * Do I need promiscuous mode for ERSPAN or GRE tunnels?
| |
| * Why is VoIPmonitor not recording any calls?
| |
| * How can I check if VoIP traffic is reaching my sensor server?
| |
| * How do I enable promiscuous mode on my network card?
| |
| * What are the most common reasons for VoIPmonitor not capturing data?
| |
| * How do I filter tshark output for SIP INVITE messages?
| |
| * What is the correct tshark filter syntax to find a specific phone number?
| |
| * Why is my VoIPmonitor probe stopping processing calls?
| |
| * What does the "Skip" option in capture rules do?
| |
| * How do I check for OOM killer events in Linux?
| |
| * Why are CDRs missing for calls with large SIP packets?
| |
| * What does the snaplen parameter do in voipmonitor.conf?
| |
| * Traffic capture stopped with missing package error, what should I do?
| |
| * Which package is commonly missing on newly installed sensors?
| |
| * How do I fix a missing library dependency for VoIPmonitor sensor?
| |
| * How do I diagnose MTU-related packet loss?
| |
| * Why are my large SIP packets truncated even after increasing snaplen?
| |
| * How do I tell if packets are truncated by VoIPmonitor or by an external source?
| |
| * How do I fix Kamailio siptrace truncating large packets?
| |
| * What is HAProxy traffic tee and how can it help with packet truncation?
| |
| * Why does Kamailio report "Connection refused" when sending siptrace via TCP?
| |
| * How do I open a TCP listener on VoIPmonitor for Kamailio siptrace?
| |
| * How do I use socat to open a TCP listening port?
| |
| * How do I troubleshoot missing packets for specific IP addresses?
| |
| * Why are packets missing only during high-traffic periods?
| |
| * How do I use tcpdump to verify if packets reach the VoIPmonitor sensor?
| |
| * What should I check if tcpdump shows no traffic but the PBX is sending packets?
| |
| * How do I verify SPAN configuration is capturing bidirectional traffic?
| |
| * What is SPAN buffer saturation and how does it affect packet capture?
| |
| * How do I configure Cisco switch SPAN for bidirectional mirroring?
| |
| * Why are packets missing for specific IP addresses during peak hours?
| |
| * What is the difference between rx, tx, and both in SPAN configuration?
| |
| * How do I know if my SPAN buffer is overloading during high traffic?
| |
| * Why do some calls work but others miss packet legs for specific IPs?
| |
| * How do I verify SPAN source and destination ports are correct?
| |
| * How do I check if SPAN is configured for trunk mode on VLAN traffic?
| |
| * Do I need SPAN to capture both ingress and egress traffic?
| |
| * When should I check SPAN buffer capacity vs sensor t0CPU for packet drops?
| |
| * What should I do if FreeSWITCH sip_trace is truncating packets?
| |
| * Why are my probes disconnecting from the server with timeout errors?
| |
| * How do I diagnose probe timeout issues on high-performance networks?
| |
| * What causes intermittent probe timeout errors in client-server mode?
| |
| * How do I check for virtualization timing issues on VoIPmonitor probes?
| |
| * Why are there no CDRs even though tshark shows SIP OPTIONS/NOTIFY traffic?
| |
| * How do I enable sip-options, sip-message, sip-subscribe, sip-notify in voipmonitor.conf?
| |
| * What SIP methods are processed to generate CDRs vs non-call records?
| |
| * Why are RTP streams not displayed in the GUI for a specific provider?
| |
| * How do I use tcpdump to capture RTP packets during a test call?
| |
| * How do I diagnose missing RTP audio quality graphs for one provider?
| |
| * If SIP signaling works but RTP is missing for a specific provider, what should I check?
| |
| * Why is my VoIPmonitor server experiencing regular coredumps?
| |
| * How do I check for SQL queue overload causing server crashes?
| |
| * What does the SQLq metric in syslog indicate?
| |
| * How do I fix server coredumps caused by high-volume OPTIONS/SUBSCRIBE/NOTIFY processing?
| |
| * What is the mysqlstore_max_threads_sip_msg parameter and how do I tune it?
| |
| * How much should I set mysqlstore_max_threads_sip_msg to for high traffic?
| |
| * How do I disable SIP message types that are causing SQL queue overload?
| |
| * How do I optimize MySQL performance to prevent SQL queue-related coredumps?
| |
| * What is the recommended innodb_buffer_pool_size for VoIPmonitor servers?
| |
| * How do I set innodb_flush_log_at_trx_commit for better database write performance?
| |
| * What are the trade-offs when setting innodb_flush_log_at_trx_commit to 2?
| |
| * What causes error 1062 - Duplicate entry '16777215' for key 'PRIMARY' in VoIPmonitor?
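The post-reboot NTP verification covered in the questions above can be scripted. A minimal sketch that parses the synchronization flag from `timedatectl` output — a captured sample is embedded so the snippet is self-contained; on a live system pipe `timedatectl` itself through the same filter:

```shell
# Sample `timedatectl` output (two representative lines)
sample='System clock synchronized: yes
NTP service: active'

# Extract the value after the colon on the "synchronized" line
sync=$(printf '%s\n' "$sample" | awk -F': *' '/synchronized/ {print $2}')
echo "ntp_synchronized=$sync"
```

Anything other than "yes" should be fixed (check chrony/ntpd and UDP 123) before trusting CDR timestamps or packetbuffer_sender correlation, which tolerates at most a 2-second difference.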
| |
|
| |
|
| === Key Questions (Lookup Table Error) === | === Key Questions ===
| * What causes error 1062 - Duplicate entry '16777215' for key 'PRIMARY' in VoIPmonitor? (Lookup table hitting MEDIUMINT limit 16,777,215 due to too many unique SIP response strings) | | * Why are no calls being recorded in VoIPmonitor? |
| * How do I fix error 1062 in cdr_sip_response or cdr_reason tables? (Set cdr_reason_string_enable=no in voipmonitor.conf to disable storing SIP response reason strings, restart sniffer) | | * How to diagnose PACKETBUFFER FULL or memory is FULL error? |
| * What does error 16777215 mean in VoIPmonitor database? (Lookup table using MEDIUMINT UNSIGNED has reached its auto-increment limit) | | * How to determine if bottleneck is I/O or CPU? |
| * How do I check which lookup table has hit the auto-increment limit? (Query INFORMATION_SCHEMA for tables with AUTO_INCREMENT approaching 16,777,215) | | * What do heap values in syslog mean? |
| * Why are CDRs not being stored with error 1062? (Lookup table cdr_sip_response or cdr_reason cannot insert new unique entries) | | * What does t0CPU percentage indicate? |
| * Should I migrate cdr_sip_response table to BIGINT to fix error 1062? (No, the root cause is storing too many unique strings; configure cdr_reason_string_enable=no instead) | | * How to use sniffer_threads manager command? |
| * What is cdr_reason_string_enable in voipmonitor.conf? (Controls whether SIP response reason strings are stored in lookup tables - set to no to prevent 1062 errors) | | * How to connect to manager socket or port 5029? |
| * How do I prevent cdr_sip_response table overflow? (Disable SIP response text storage with cdr_reason_string_enable=no, or enable normalization) | | * What to do when t0 thread is at 100%? |
| * What is the difference between cdr table INT overflow and lookup table MEDIUMINT overflow? (cdr table at 4 billion rows vs lookup tables at 16.7 million - solved differently: cdr needs schema migration, lookup tables need configuration) | | * How to fix one-way audio or missing RTP? |
| * Do I need to ALTER TABLE to fix duplicate entry 16777215 error? (No, configure cdr_reason_string_enable=no to stop creating new unique entries) | | * How to configure natalias for NAT? |
| * When should I use TRUNCATE on cdr_sip_response table? (After disabling new entries, optionally clear existing data) | | * How to increase max_buffer_mem for high traffic? |
| * Does error 1062 happen in main cdr table or lookup tables? (Affects lookup tables like cdr_sip_response using MEDIUMINT, not main cdr table using INT/BIGINT) | | * How to disable jitterbuffer to save CPU? |
| * How do I enable cdr_reason_normalisation to reduce unique entries? (Set cdr_reason_normalisation=yes and cdr_sip_response_normalisation=yes in voipmonitor.conf) | | * What causes OOM kills of voipmonitor or MySQL? |
| | * How to check disk I/O performance with iostat? |
| | * How to enable promiscuous mode on interface? |
| | * How to fix packet ordering issues with SPAN? |
| | * What is Error 1062 duplicate entry? |
| | * How to verify traffic reaches capture interface? |