[[Category:Troubleshooting]]
[[Category:Sniffer]]
{{DISPLAYTITLE:Troubleshooting: No Calls Being Sniffed}}

'''This guide provides a systematic, step-by-step process to diagnose why the VoIPmonitor sensor might not be capturing any calls. Follow these steps in order to quickly identify and resolve the most common issues.''' For configuration reference, see [[Sniffer_configuration]]; for performance tuning, see [[Scaling]].
|
| |
|
== CRITICAL: First Diagnostic Decision -- Are Packets Reaching the Interface? ==
|
| |
|
{{Warning|BEFORE making ANY configuration changes to ringbuffer, max_sip_packets, or other sensor tuning parameters, you MUST determine whether the missing packets are reaching the network interface.

Sensor tuning CANNOT fix network infrastructure issues (SPAN/mirroring problems, switch configuration, asymmetric routing).}}

=== Diagnostic Workflow ===
|
| |
|
| {| class="wikitable" style="background:#e7f3ff; border:1px solid #3366cc;" | | <kroki lang="mermaid"> |
| | graph TD |
| | A[No Calls Recorded] --> B{Packets on interface?<br/>tcpdump -i eth0 port 5060} |
| | B -->|No packets| C[Network Issue] |
| | C --> C1[Check SPAN/mirror config] |
| | C --> C2[Verify VLAN tagging] |
| | C --> C3[Check cable/port] |
| | B -->|Packets visible| D[Sensor Issue] |
| | D --> D1[Check voipmonitor.conf] |
| | D --> D2[Check GUI Capture Rules] |
| | D --> D3[Check logs for errors] |
| | </kroki> |
| | |
| | == Quick Diagnostic Checklist == |
| | |
| | {| class="wikitable" |
| | |- |
| | ! Check !! Command !! Expected Result |
| | |- |
| | | Service running || <code>systemctl status voipmonitor</code> || Active (running) |
| |- | | |- |
| ! colspan="2" style="background:#3366cc; color: white;" | Critical Diagnostic Decision Matrix
| | | Traffic on interface || <code>tshark -i eth0 -c 5 -Y "sip"</code> || SIP packets displayed |
| |- | | |- |
| ! Step
| | | Interface errors || <code>ip -s link show eth0</code> || No RX errors/drops |
| ! Action
| |
| |- | | |- |
| | '''Step 1: Verify Packet Arrival''' | | | Promiscuous mode || <code>ip link show eth0</code> || PROMISC flag present |
| | Use tcpdump or tshark directly on the sensor's interface during the high-traffic period when packets are missing:<br/><br/><syntaxhighlight lang="bash">tcpdump -i eth0 -nn "host <PROBLEMATIC_IP> and port 5060" -v</syntaxhighlight><br/>Replace <code>eth0</code> with your interface and <code><PROBLEMATIC_IP></code> with the IP address where packets are missing. | |
| |- | | |- |
| | '''Step 2A: NO packets on interface''' | | | Logs || <code>tail -100 /var/log/syslog \| grep voip</code> || No critical errors |
| | '''Network Infrastructure Issue -- STOP tuning VoIPmonitor!'''<br/><br/>Proceed to [[#Step_3:_Troubleshoot_Network_and_Interface_Configuration|Step 3: Troubleshoot Network and Interface Configuration]].<br/><br/>Common causes: | |
| * SPAN/mirroring not configured for the source IP or subnet
| |
| * SPAN configured for only one direction (rx or tx, not both)
| |
| * Switch SPAN buffer saturation during high traffic
| |
| * Interface packet drops at kernel level (check with <code>ip -s -s l l</code>)
| |
| |- | | |- |
| | '''Step 2B: Packets ARE on interface''' | | | GUI rules || Settings → Capture Rules || No unexpected "Skip" rules |
| | '''Sensor Resource Bottleneck -- Proceed with tuning!'''<br/><br/>Proceed to [[#Step_4:_Check_the_VoIPmonitor_Configuration|Step 4: Check VoIPmonitor Configuration]] and sensor tuning sections.<br/><br/>If packets are consistently arriving but missing in the GUI, the issue is likely: | |
| * Insufficient ringbuffer size
| |
| * CPU bottleneck (t0CPU > 90%)
| |
| * Packet processing limits (max_sip_packets_in_call)
| |
| |} | | |} |
|
| |
|
=== Why This Matters ===
| | |
| When experiencing missing call legs or SIP packets during high-traffic periods for certain IP addresses:
| |
| | |
| * '''If packets are NOT on the interface:''' The issue is with your network switch's SPAN/mirroring configuration. Increasing VoIPmonitor's ringbuffer, max_sip_packets, or any other sensor setting will NOT help. You must fix the SPAN configuration to:
| |
| ** Include the problematic source IP/port
| |
| ** Capture BOTH inbound and outbound traffic (use <code>both</code> direction)
| |
| ** Verify switch port counters show no drops
| |
| | |
| * '''If packets ARE on the interface:''' The sensor is receiving traffic but cannot keep up. Now tuning parameters like <code>ringbuffer</code>, <code>max_sip_packets_in_call</code>, and increasing CPU/resources will help.
| |
| | |
| {{Note|This applies SPECIFICALLY to missing packets during high-traffic periods for certain IPs. If ALL traffic is missing, follow the standard troubleshooting flow from Step 1.}}
| |
| | |
| == Troubleshooting Flowchart ==
| |
| | |
| <kroki lang="mermaid">
| |
| %%{init: {'theme': 'base', 'flowchart': {'nodeSpacing': 10, 'rankSpacing': 25, 'curve': 'basis'}, 'themeVariables': {'fontSize': '11px'}}}%%
| |
| flowchart TD
| |
| A[Sensor Not Working Properly] --> B{Service Running?}
| |
| B -->|No| B1[systemctl restart voipmonitor]
| |
| B -->|Yes| C{Calls Missing?}
| |
| | |
| C -->|Yes| D{Step 2: Traffic on Interface?<br/>tshark -i eth0 -Y 'sip'}
| |
| D -->|No packets| E[Step 3: Network Issue]
| |
| E --> E1{Interface UP?}
| |
| E1 -->|No| E2[ip link set dev eth0 up]
| |
| E1 -->|Yes| E3{SPAN/RSPAN?}
| |
| E3 -->|Yes| E4[Enable promisc mode]
| |
| E3 -->|ERSPAN/GRE/TZSP| E5[Check tunnel config]
| |
| | |
| D -->|Packets visible| F[Step 4: VoIPmonitor Config]
| |
| F --> F1{interface correct?}
| |
| F1 -->|No| F2[Fix interface in voipmonitor.conf]
| |
| F1 -->|Yes| F3{sipport correct?}
| |
| F3 -->|No| F4[Add port: sipport = 5060,5080]
| |
| F3 -->|Yes| F5{BPF filter blocking?}
| |
| F5 -->|Maybe| F6[Comment out filter directive]
| |
| | |
| F5 -->|No| G[Step 5: GUI Capture Rules]
| |
| G --> G1{Rules with Skip: ON?}
| |
| G1 -->|Yes| G2[Remove/modify rules + reload sniffer]
| |
| G1 -->|No| H[Step 6: Check Logs]
| |
| | |
| H --> I{OOM or Packetbuffer?}
| |
| I -->|OOM Events| I1[Step 7: Add RAM / tune MySQL]
| |
| I -->|PACKETBUFFER FULL| I2[Step 8: Increase threading and max_buffer_mem]
| |
| I -->|Neither| J{Large SIP packets?}
| |
| | |
| J -->|Yes| J1{External SIP source?<br/>Kamailio/HAProxy mirror}
| |
| J1 -->|No| J2[Step 9: Increase snaplen]
| |
| J1 -->|Yes| J3[Fix external source: Kamailio siptrace or HAProxy tee]
| |
| J -->|No| K[Contact Support]
| |
| </kroki>
| |
| | |
| == Post-Reboot Verification Checklist ==
| |
| After a planned server reboot, verify these critical items to ensure VoIPmonitor operates correctly. This check helps identify issues that may occur when configurations are not persisted across reboots.
| |
| | |
| === Verify Firewall/Iptables Rules ===
| |
| | |
| After a system restart, verify that firewall rules have been correctly applied and are allowing necessary traffic. Firewall rules may need to be manually re-applied if they were not made persistent.
| |
| | |
| ;1. Check current firewall status:
| |
| <syntaxhighlight lang="bash">
| |
| # For systems using iptables
| |
| iptables -L -n -v
| |
| | |
| # For systems using firewalld
| |
| firewall-cmd --list-all
| |
| | |
| # For systems using ufw
| |
| ufw status verbose
| |
| </syntaxhighlight>
| |
| | |
| ;2. Verify critical ports are allowed:
| |
| Ensure the firewall permits traffic on the following VoIPmonitor ports:
| |
| * SIP ports (default: 5060/udp, or your configured sipport values)
| |
| * RTP ports (range used by your PBX)
| |
| * GUI access (typically: 80/tcp, 443/tcp)
| |
| * Sensor management port: 5029/tcp
| |
| * Client-Server connection port: 60024/tcp (for distributed setups)
| |
| | |
| ;3. Make firewall rules persistent:
| |
| To prevent firewall rules from being lost after future reboots:
| |
| | |
| '''For iptables (Debian/Ubuntu):'''
| |
| <syntaxhighlight lang="bash">
| |
| # Save current rules
| |
| iptables-save > /etc/iptables/rules.v4
| |
| # Install persistent package if not present
| |
| apt-get install iptables-persistent
| |
| </syntaxhighlight>
| |
| | |
| '''For firewalld (CentOS/RHEL):'''
| |
| <syntaxhighlight lang="bash">
| |
| # Runtime rules automatically persist with --permanent flag
| |
| firewall-cmd --permanent --add-port=5060/udp
| |
| firewall-cmd --permanent --add-port=60024/tcp
| |
| firewall-cmd --reload
| |
| </syntaxhighlight>
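As a quick sanity check (a minimal sketch; adjust the port list to your deployment), you can confirm that the saved rule set actually contains the VoIPmonitor ports before the next reboot:

<syntaxhighlight lang="bash">
# Confirm the iptables rule set mentions the VoIPmonitor ports
# (5060, 5029 and 60024 are the defaults discussed above - adjust as needed)
iptables-save | grep -E '5060|5029|60024'

# On firewalld systems, list the permanently allowed ports
firewall-cmd --permanent --list-ports
</syntaxhighlight>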
| |
| | |
| === Verify System Time Synchronization ===
| |
| | |
| Correct system time synchronization is '''critical''', especially when using the <code>packetbuffer_sender</code> option in distributed architectures. Time mismatches between hosts and servers can cause call correlation failures and dropped packets.
| |
| | |
| ;1. Check current NTP/chrony status:
| |
| <syntaxhighlight lang="bash">
| |
| # For systems using NTP
| |
| ntpstat
| |
| | |
| # For systems using chrony
| |
| chronyc tracking
| |
| </syntaxhighlight>
| |
| | |
| ;2. Verify time synchronization with servers:
| |
| <syntaxhighlight lang="bash">
| |
| # For NTP
| |
| ntpq -p
| |
| | |
| # For chrony
| |
| chronyc sources -v
| |
| </syntaxhighlight>
| |
| | |
| '''Expected output:''' Time offset should be minimal (ideally under 100 milliseconds). Large offsets (several seconds) indicate synchronization problems.
| |
| | |
| ;3. Manual sync if needed (temporary fix):
| |
| <syntaxhighlight lang="bash">
| |
| # Force immediate NTP sync
| |
| sudo systemctl restart ntp
| |
| | |
| # For chrony
| |
| sudo chronyc makestep
| |
| </syntaxhighlight>
| |
| | |
| '''Critical for packetbuffer_sender mode:''' When using <code>packetbuffer_sender=yes</code> to forward raw packets from remote sensors to a central server, the host and server '''must have synchronized times'''. VoIPmonitor requires host and server times to match for proper call correlation and packet processing. Maximum allowed time difference is 2 seconds by default (configurable via <code>client_server_connect_maximum_time_diff_s</code>).
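A rough way to spot-check the offset between a remote sensor and the central server is to compare their epoch clocks over SSH. This is only an illustrative sketch; it assumes SSH access and uses <code>central-server.example.com</code> as a placeholder hostname:

<syntaxhighlight lang="bash">
# Compare local epoch time with the central server's epoch time (placeholder hostname)
LOCAL=$(date +%s)
REMOTE=$(ssh central-server.example.com 'date +%s')
echo "offset: $((LOCAL - REMOTE)) seconds"
# The difference should stay well below the 2-second default
# (client_server_connect_maximum_time_diff_s); SSH latency adds a little noise.
</syntaxhighlight>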
| |
| | |
| ;4. Check distributed architecture time sync:
| |
| In Client-Server mode, ensure all sensors and the central server are synchronized to the same NTP servers:
| |
| | |
| <syntaxhighlight lang="bash">
| |
| # On each sensor and central server
| |
| timedatectl status
| |
| </syntaxhighlight>
| |
| | |
| Look for: <code>System clock synchronized: yes</code>
| |
|
| |
|
| If times are not synchronized across distributed components:
| * Verify all systems point to the same reliable NTP source
| |
| * Check firewall allows UDP port 123 (NTP)
| |
| * Ensure timezones are consistent across all systems
| |
|
| |
|
| '''Troubleshooting time sync issues:'''
| |
| * Check firewall rules allow NTP (UDP port 123)
| |
| * Verify NTP servers are reachable: <code>ping pool.ntp.org</code>
| |
| * Review NTP configuration: <code>/etc/ntp.conf</code> or <code>/etc/chrony.conf</code>
| |
| * Ensure time service is enabled to start on boot: <code>systemctl enable ntp</code>
| |
|
| |
| == Step 1: Is the VoIPmonitor Service Running Correctly? ==
| |
| First, confirm that the sensor process is active and loaded the correct configuration file.
| |
|
| |
| ;1. Check the service status (for modern systemd systems):
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| | # Check status |
| systemctl status voipmonitor | | systemctl status voipmonitor |
| </syntaxhighlight>
| |
| Look for a line that says <code>Active: active (running)</code>. If it is inactive or failed, try restarting it with <code>systemctl restart voipmonitor</code> and check the status again.
| |
|
| |
| ;2. Verify the running process:
| |
| <syntaxhighlight lang="bash">
| |
| ps aux | grep voipmonitor
| |
| </syntaxhighlight>
| |
| This command will show the running process and the exact command line arguments it was started with. Critically, ensure it is using the correct configuration file, for example: <code>--config-file /etc/voipmonitor.conf</code>. If it is not, there may be an issue with your startup script.
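If the <code>ps</code> output is hard to read, the following sketch prints the full command line of the running process directly from <code>/proc</code>, making it easy to confirm which configuration file was loaded (paths assume a standard sensor install):

<syntaxhighlight lang="bash">
# Print the exact command line of the running sensor process
PID=$(pidof voipmonitor | awk '{print $1}')
tr '\0' ' ' < /proc/$PID/cmdline; echo

# Show only the configuration file argument, if present
tr '\0' '\n' < /proc/$PID/cmdline | grep -A1 -- '--config-file'
</syntaxhighlight>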
| |
|
| |
| === Troubleshooting: Missing Package or Library Dependencies ===
| |
|
| |
| If the sensor service fails to start or crashes immediately with an error about a "missing package" or "missing library," it indicates that a required system dependency is not installed on the server. This is most common on newly installed sensors or fresh operating system installations.
| |
|
| |
| ;1. Check the system logs for the specific error message:
| |
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu
| |
| tail -f /var/log/syslog | grep voipmonitor
| |
|
| |
| # For CentOS/RHEL/AlmaLinux or systemd systems
| |
| journalctl -u voipmonitor -f
| |
| </syntaxhighlight>
| |
|
| |
| ;2. Common missing packages for sensors:
| |
Most sensor missing-package issues are resolved by installing the <code>rrdtool</code> package, which is required for RRD (Round-Robin Database) graphing and statistics functionality.
| |
|
| |
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu
| |
| apt-get update && apt-get install rrdtool
| |
|
| |
| # For CentOS/RHEL/AlmaLinux
| |
| yum install rrdtool
| |
| # OR
| |
| dnf install rrdtool
| |
| </syntaxhighlight>
| |
|
| |
| ;3. Other frequently missing dependencies:
| |
| If the error references a specific shared library or binary, install it using your package manager. Common examples:
| |
|
| |
|
| * <code>libpcap</code> or <code>libpcap-dev</code>: Packet capture library
| * <code>libssl</code> or <code>libssl-dev</code>: SSL/TLS support
| * <code>zlib</code> or <code>zlib1g-dev</code>: Compression library
| |
|
| |
|
| ;4. Verify shared library dependencies:
| If the error mentions a specific shared library (e.g., <code>error while loading shared libraries: libxxx.so</code>), check which libraries the binary is trying to load:
| |
| | |
| <syntaxhighlight lang="bash">
| |
| ldd /usr/local/sbin/voipmonitor | grep pcap
| |
| </syntaxhighlight>
| |
| | |
| If <code>ldd</code> reports "not found," install the missing library using your package manager.
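To list every unresolved dependency at once rather than grepping for a single library, a minimal sketch:

<syntaxhighlight lang="bash">
# List all shared libraries the sensor binary cannot resolve
ldd /usr/local/sbin/voipmonitor | grep "not found"
</syntaxhighlight>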
| |
| | |
| ;5. After installing the missing package, restart the sensor service:
| |
| <syntaxhighlight lang="bash">
| |
systemctl restart voipmonitor
| systemctl status voipmonitor
| |
</syntaxhighlight>
|
| |
|
| Verify the service starts successfully and is now <code>Active: active (running)</code>.
Common startup failures:
* '''Interface not found''': check that <code>interface</code> in voipmonitor.conf matches the <code>ip a</code> output
* '''Port already in use''': another process is using the management port
* '''License issue''': see [[License]] for activation problems
|
| |
|
=== Troubleshooting: Cron Daemon Not Running ===
|
| |
|
| If the VoIPmonitor sniffer service fails to start or the sensor appears unavailable despite the systemd service being configured correctly, the cron daemon may not be running. Some VoIPmonitor deployment methods and maintenance scripts rely on cron for proper initialization and periodic tasks.
| |
|
| |
| ;1. Check the cron daemon status:
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| systemctl status cron
| | # Check current config |
| </syntaxhighlight>
| | grep -E "^interface|^sipport" /etc/voipmonitor.conf |
|
| |
|
| Look for <code>Active: active (running)</code>. If the status shows inactive or failed, the cron daemon is not running.
;2. Alternative check for systems using cron (not crond):
| <syntaxhighlight lang="bash">
| |
| systemctl status crond
| |
</syntaxhighlight>
|
| |
|
| Note: On CentOS/RHEL systems, the service is typically named <code>crond</code>, while on Debian/Ubuntu systems it is named <code>cron</code>.
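Because the unit name differs between distributions, a small sketch that works on either family can save a step:

<syntaxhighlight lang="bash">
# Report whichever cron unit exists on this system
for svc in cron crond; do
    if systemctl list-unit-files "${svc}.service" --no-legend 2>/dev/null | grep -q "${svc}.service"; then
        echo "${svc}: $(systemctl is-active ${svc})"
    fi
done
</syntaxhighlight>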
|
| |
|
| ;3. Start the cron daemon if it is inactive:
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu systems
| |
| systemctl start cron
| |
|
| |
|
| # For CentOS/RHEL/AlmaLinux systems
| systemctl start crond
| |
| </syntaxhighlight>
| |
|
| |
|
| ;4. Enable cron to start automatically on boot:
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu systems
| |
| systemctl enable cron
| |
|
| |
|
| # For CentOS/RHEL/AlmaLinux systems
| systemctl enable crond
| |
| </syntaxhighlight>
| |
|
| |
|
| ;5. After starting the cron daemon, restart the VoIPmonitor service:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
systemctl status voipmonitor
</syntaxhighlight>
| | |
| Verify the service now shows <code>Active: active (running)</code> and the sensor becomes visible in the GUI.
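If the sensor still does not appear in the GUI, it can also help to confirm that the manager port is actually listening locally (5029/tcp is the default mentioned in the firewall checklist above); this is a generic socket check, not a VoIPmonitor-specific command:

<syntaxhighlight lang="bash">
# Confirm the sensor's manager port is listening (default 5029/tcp)
ss -lntp | grep 5029
</syntaxhighlight>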
| |
|
| |
|
| === Root Cause ===
| The cron daemon being inactive can prevent VoIPmonitor from starting properly in scenarios where:
| |
| * Installation scripts use cron for post-install configuration
| |
| * Maintenance or cleanup jobs are required for proper sensor operation
| |
| * System initialization processes depend on cron-based tasks
| |
| * The sensor was recently rebooted or upgraded and cron failed to start
| |
|
| |
|
=== Long-Term Stability ===
| If the cron daemon is consistently failing to start after reboots:
| |
| * Check system logs for cron startup errors: <code>journalctl -u cron -n 50</code> or <code>journalctl -u crond -n 50</code>
| |
| * Verify that the server has sufficient resources (CPU, memory) to run all required system services
| |
| * Investigate performance bottlenecks that may be causing system services to fail to start
| |
| * Ensure no other system services are conflicting or preventing cron from starting
| |
|
| |
|
| == Step 2: Is Network Traffic Reaching the Server? ==
If the service is running, the next step is to verify whether the VoIP packets (SIP/RTP) are actually arriving at the server's network interface. The best tool for this is <code>tshark</code> (the command-line version of Wireshark).
|
| |
|
| ;1. Install tshark:
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # For Debian/Ubuntu | | # Check filter |
| apt-get update && apt-get install tshark
| | grep "^filter" /etc/voipmonitor.conf |
|
| |
|
| # For CentOS/RHEL/AlmaLinux | | # Temporarily disable to test |
| yum install wireshark
| | # Comment out the filter line and restart |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| ;2. Listen for SIP traffic on the correct interface:
| |
| Replace <code>eth0</code> with the interface name you have configured in <code>voipmonitor.conf</code>.
| |
| <syntaxhighlight lang="bash">
| |
| tshark -i eth0 -Y "sip || rtp" -n
| |
| </syntaxhighlight>
| |
| * '''If you see a continuous stream of SIP and RTP packets''', it means traffic is reaching the server, and the problem is likely in VoIPmonitor's configuration (see Step 4).
| |
| * '''If you see NO packets''', the problem lies with your network configuration. Proceed to Step 3.
| |
|
| |
|
| == Step 3: Troubleshoot Network and Interface Configuration ==
| |
| If <code>tshark</code> shows no traffic, it means the packets are not being delivered to the operating system correctly.
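Before digging into the interface settings, it can be worth checking whether ''any'' packets are arriving at all, and whether they arrive VLAN-tagged; a couple of generic captures (the interface name is an example):

<syntaxhighlight lang="bash">
# Is anything at all arriving on the capture interface?
tcpdump -i eth0 -nn -c 20

# Are the mirrored packets VLAN-tagged? (tags must be preserved end-to-end)
tcpdump -i eth0 -nn -e -c 20 vlan
</syntaxhighlight>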
| |
|
| |
|
| ;1. Check if the interface is UP:
| Ensure the network interface is active.
| |
| <syntaxhighlight lang="bash">
| |
| ip link show eth0
| |
| </syntaxhighlight>
| |
| The output should contain the word <code>UP</code>. If it doesn't, bring it up with:
| |
| <syntaxhighlight lang="bash">
| |
| ip link set dev eth0 up
| |
| </syntaxhighlight>
| |
|
| |
|
| ;2. Check for Interface Packet Drops:
| If calls are missing, showing "000" as the last response, or have silent audio, the root cause may be packets being dropped at the '''network interface level''' BEFORE they reach VoIPmonitor. This is different from sensor resource limitations.
| |
|
| |
|
| Check the interface statistics for packet drops on the sniffing interface:
|
| |
|
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Check detailed interface statistics, packet errors, and drops | | # Check if id_sensor is set |
| ip -s -s l l eth0
| | grep "^id_sensor" /etc/voipmonitor.conf |
| </syntaxhighlight>
| |
|
| |
|
| The output shows RX (receive) and TX (transmit) statistics. Look specifically at the <code>dropped</code> counter under the <code>RX</code> section:
|
| |
|
| <syntaxhighlight lang="text">
| 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
| link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
| |
| RX: bytes packets errors dropped overrun mcast
| |
| 12345123 45678 12 5432 0 0
| |
| TX: bytes packets errors dropped carrier collsns
| |
| 9876543 23456 5 0 0 0
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| {| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;" | | {{Tip|Use a unique numeric identifier (1-65535) for each sensor. Essential for multi-sensor deployments. See [[Sniffer_configuration#id_sensor|id_sensor documentation]].}} |
| |-
| ! colspan="2" style="background:#ffc107;" | Critical: Interface Packet Drops vs Sensor Drops
| |
| |-
| |
| | style="vertical-align: top;" | '''Interface drops (kernel level):'''
| |
| | Packets dropped by the network card driver BEFORE reaching VoIPmonitor. Use <code>ip -s -s l l</code> to check. Root cause is network infrastructure: switch overload, duplex mismatch, faulty NIC/switch port.
| |
| |-
| |
| | style="vertical-align: top;" | '''Sensor drops (VoIPmonitor level):'''
| |
| | Packets received by the OS but dropped by VoIPmonitor due to high CPU load, insufficient ringbuffer, or configuration limits. Check the "# packet drops" counter in GUI Settings → Sensors and <code>t0CPU</code> metric in logs.
| |
| |}
| |
|
| |
|
| ;Diagnosing Interface Packet Drops:
|
| |
|
'''Step 1: Check if the dropped counter is increasing:'''
|
| |
|
| Run the interface statistics command multiple times while making test calls:
| | '''Cause''': SPAN port configured for only one direction. |
|
| |
|
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # First measurement | | # Count RTP packets per direction |
| ip -s -s l l eth0 | grep -A 1 "RX:"
| | tshark -i eth0 -Y "rtp" -T fields -e ip.src -e ip.dst | sort | uniq -c |
| | |
| # Make a test call during which you expect to see dropped packets
| |
| | |
| # Second measurement 10-30 seconds later
| |
| ip -s -s l l eth0 | grep -A 1 "RX:" | |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
If the <code>dropped</code> value increases between measurements during test calls, the interface is losing packets due to infrastructure issues.
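To avoid eyeballing two snapshots, a small sketch that polls the kernel's RX drop counter every few seconds (interface name is an example); if the number keeps climbing while test calls run, the interface is dropping packets:

<syntaxhighlight lang="bash">
# Poll the kernel RX drop counter every 5 seconds (Ctrl+C to stop)
IFACE=eth0
while true; do
    echo "$(date '+%H:%M:%S') rx_dropped=$(cat /sys/class/net/$IFACE/statistics/rx_dropped)"
    sleep 5
done
</syntaxhighlight>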
|
| |
|
| '''Step 2: Identify the root cause of interface packet drops:'''
|
| |
|
| Common causes of network interface packet drops:
|
| |
|
| * '''Network switch port overload:''' The switch port connected to the VoIPmonitor sensor is receiving more traffic than it can forward to the sensor. This is common during peak traffic hours.
| * '''Duplex/speed mismatch:''' The server NIC and switch port are configured with mismatched speed or duplex settings (e.g., NIC set to 100Mbps/half-duplex while switch is 1Gbps/full-duplex).
| |
| * '''Faulty hardware:''' Defective network interface card (NIC) or a damaged switch port.
| |
|
| |
|
| ;Specific diagnostic actions:
| * '''Check duplex/speed negotiation:'''
| <syntaxhighlight lang="bash"> | | auto_enable_use_blocks = yes |
| # Check the current speed and duplex setting | |
| ethtool eth0
| |
| </syntaxhighlight> | | </syntaxhighlight> |
| Look for the <code>Speed</code> and <code>Duplex</code> values. Ensure they match the switch port configuration (e.g., Speed: 1000Mb/s, Duplex: Full).
| |
|
| |
|
| * '''Check for driver errors:'''
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="ini"> |
| # Look for NIC driver messages that indicate hardware issues | | # voipmonitor.conf - for NAT scenarios |
| dmesg | grep -i eth0 | tail -50
| | natalias = <public_ip> <private_ip> |
| </syntaxhighlight> | |
| | |
| * '''Test with a different network interface or cable:'''
| |
| If the issue persists after verifying duplex/speed, try connecting the sensor to a different switch port or using a different cable to rule out hardware faults.
| |
| | |
| ;Step 3: Verify bidirectional SIP traffic in SPAN/mirroring configuration:
| |
| | |
| Even if the interface shows no packet drops, verify that your network switch SPAN/mirror configuration is sending '''BOTH directions''' of SIP traffic to the sniffing interface. Missing one direction causes incomplete CDRs and incorrect Last Response tracking.
| |
| | |
| Use <code>tshark</code> during a test call to verify bidirectional SIP flow:
| |
|
| |
|
| <syntaxhighlight lang="bash">
# Monitor INVITE requests and their responses to confirm bidirectional flow
| # Replace eth0 with your sniffing interface
| |
| tshark -i eth0 -Y 'sip.CSeq.method == "INVITE" || sip.Status-Code' -n
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| If you see INVITE requests but NO corresponding responses (like 200 OK, 404, 500), your SPAN/mirror configuration is only capturing one direction of traffic. This requires network switch configuration changes:
| | |
| * '''Cisco switches:''' Verify SPAN source includes <code>both</code> direction:
| |
| <syntaxhighlight lang="bash">
| |
| show running-config | include monitor session
| |
| # Should include: monitor session 1 source interface GigabitEthernet1/1 both
| |
| </syntaxhighlight>
| |
| | |
| * '''Other switch vendors:''' Refer to switch documentation for SPAN mirroring direction configuration.
| |
| | |
| ;Summary of workflow when interface packet drops are detected:
| |
| | |
| # Use <code>ip -s -s l l</code> to check for increasing <code>dropped</code> counter
| |
| # Confirm drops occur during test calls with repeated measurements
| |
| # Check duplex/speed with <code>ethtool</code>
| |
| # Verify switch port configuration matches NIC settings
| |
| # Check for hardware faults (different port, different cable)
| |
| # Verify SPAN/mirror sends both directions of SIP traffic with <code>tshark -Y sip.CSeq.method == "INVITE"</code>
| |
| # Resolve the underlying network infrastructure issue before tuning VoIPmonitor configuration
| |
| | |
| {{Warning|Interface packet drops cannot be fixed with VoIPmonitor configuration changes. Increasing ringbuffer, adjusting pcap_queue_deque_window_length, or other sniffer tuning will NOT resolve packet drops at the kernel/interface level. You must fix the network infrastructure first.}}
| |
| | |
| ;3B. Troubleshooting: Asymmetric Traffic Mirroring Across Multiple Interfaces or Hosts
| |
| | |
| If SIP packets are visible on the network but CDRs are not appearing in the GUI, the issue may be that SIP requests and responses are being mirrored to '''different interfaces''' or '''different sniffer hosts''', rather than a single interface monitored by a single voipmonitor instance.
| |
| | |
| === Diagnosis: Check Traffic on Each Interface and Host ===
| |
| | |
| When investigating mirroring issues, you must verify packet flow on '''every network interface''' of '''each sniffer host'''. A common misconfiguration occurs when:
| |
| | |
| * SIP requests (INVITE) are mirrored to interface A on sensor host 1
| |
| * SIP responses (200 OK) are mirrored to interface B on sensor host 1
| |
| * Or requests go to sensor host 1 and responses go to sensor host 2
| |
| | |
| This asymmetric mirroring prevents voipmonitor from correlating requests with responses, resulting in missing or incomplete CDRs.
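As a quick live test before setting up full captures, the sketch below counts INVITEs and SIP responses separately on each candidate interface for 15 seconds; an interface that sees only one of the two is receiving an asymmetric mirror (interface names are examples):

<syntaxhighlight lang="bash">
# Count SIP requests vs responses per interface during a short window
for IF in eth0 eth1; do
    REQ=$(timeout 15 tshark -i "$IF" -Y 'sip.CSeq.method == "INVITE"' 2>/dev/null | wc -l)
    RSP=$(timeout 15 tshark -i "$IF" -Y 'sip.Status-Code' 2>/dev/null | wc -l)
    echo "$IF: INVITEs=$REQ responses=$RSP"
done
</syntaxhighlight>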
| |
| | |
| === Step 1: Identify All Sniffing Interfaces and Hosts ===
| |
| | |
| First, identify all interfaces configured for mirroring and all sensor hosts in your deployment:
| |
|
| |
|
| | If SDP advertises one port but RTP arrives on different port (SBC/media server issue): |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # On each sniffer host, list all network interfaces | | # Compare SDP ports vs actual RTP |
| ip a | grep -E "^[0-9]+:|state UP"
| | tshark -r call.pcap -Y "sip.Method == INVITE" -V | grep "m=audio" |
| | | tshark -r call.pcap -Y "rtp" -T fields -e udp.dstport | sort -u |
| # Check which interfaces voipmonitor is configured to use
| |
| grep "^interface" /etc/voipmonitor.conf
| |
| | |
| # Check for multiple voipmonitor instances running on the same host
| |
| ps aux | grep voipmonitor
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| === Step 2: Capture Traffic on Each Interface Simultaneously === | | If ports don't match, the external device must be configured to preserve SDP ports - VoIPmonitor cannot compensate. |
|
| |
|
| For each interface on each sniffer host, run a separate tcpdump capture during a test call:
|
| |
|
| <syntaxhighlight lang="bash"> | | '''Solution''': Enable <code>rtp_check_both_sides_by_sdp</code> to require verification of both source and destination IP:port against SDP: |
| # On sensor host 1, interface eth0: | | <syntaxhighlight lang="ini"> |
| tcpdump -i eth0 -nn "sip" -w /tmp/sensor1_eth0.pcap &
|
| |
|
| # On sensor host 1, interface eth1 (if exists): | | # Alternative (strict) mode - allows initial unverified packets |
| tcpdump -i eth1 -nn "sip" -w /tmp/sensor1_eth1.pcap &
| | |
| # On sensor host 2, interface eth0:
| |
| tcpdump -i eth0 -nn "sip" -w /tmp/sensor2_eth0.pcap &
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Make a test call and stop the captures after 10-30 seconds.
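The captures above were started in the background with <code>&</code>; one way to stop them from the same shell is sketched below (it assumes no other background jobs are running in that shell):

<syntaxhighlight lang="bash">
# Stop the background tcpdump captures started from this shell
kill $(jobs -p)
</syntaxhighlight>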
|
| |
|
| === Step 3: Analyze Captures to Detect Asymmetric Mirroring ===
|
| |
|
| Using tshark, check which interface received which part of the SIP dialog:
| | '''Solution''': |
| | | <syntaxhighlight lang="ini"> |
| <syntaxhighlight lang="bash"> | | # voipmonitor.conf - increase packet capture size |
| # Check capture 1 for INVITE requests | | snaplen = 8192 |
| tshark -r /tmp/sensor1_eth0.pcap -Y "sip.CSeq.method == INVITE" -T fields -e sip.Call-ID
| |
| | |
| # Check capture 1 for SIP responses
| |
| tshark -r /tmp/sensor1_eth0.pcap -Y "sip.Status-Code" -T fields -e sip.Call-ID
| |
| | |
| # Check capture 2 for INVITE requests
| |
| tshark -r /tmp/sensor1_eth1.pcap -Y "sip.CSeq.method == INVITE" -T fields -e sip.Call-ID
| |
| | |
| # Check capture 2 for SIP responses
| |
| tshark -r /tmp/sensor1_eth1.pcap -Y "sip.Status-Code" -T fields -e sip.Call-ID
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Compare the Call-IDs across captures. If you observe:
|
| |
|
| * INVITEs appear in capture 1, but responses for the same Call-ID appear only in capture 2
| * Or INVITEs and responses are split across different hosts
| |
| * Or any combination where the complete SIP dialog is not on a single interface
| |
|
| |
|
| Then your mirroring configuration is asymmetric.
|
| |
|
| === Step 4: Count Packets Per Interface ===
|
| |
|
| To quantify the asymmetry, count the SIP packets on each interface:
|
| |
|
| <syntaxhighlight lang="bash">
| | === Diagnose: I/O vs CPU Bottleneck === |
| # Count total SIP packets on each interface
| |
| tshark -r /tmp/sensor1_eth0.pcap -Y "sip" | wc -l
| |
| tshark -r /tmp/sensor1_eth1.pcap -Y "sip" | wc -l
| |
| tshark -r /tmp/sensor2_eth0.pcap -Y "sip" | wc -l
| |
| </syntaxhighlight>
| |
|
| |
|
| If voipmonitor is running on only one interface but traffic is split across two, you will see partial SIP dialogs. The interface receiving only requests or only responses will not generate complete CDRs.
| | {{Warning|Do not guess the bottleneck source. Use proper diagnostics first to identify whether the issue is disk I/O, CPU, or database-related. Disabling storage as a test is valid but should be used to '''confirm''' findings, not as the primary diagnostic method.}} |
|
| |
|
| === Solution: Correct Mirroring Configuration for Symmetric Traffic Flow === | | ==== Step 1: Check IO[] Metrics (v2026.01.3+) ==== |
|
| |
|
| VoIPmonitor requires the '''complete SIP session (both requests and responses)''' to be mirrored to a '''SINGLE network interface''' monitored by a '''SINGLE voipmonitor sniffer instance'''.
| | '''Starting with version 2026.01.3''', VoIPmonitor includes built-in disk I/O monitoring that directly shows disk saturation status: |
|
| |
|
| ==== Identify the Correct Source for Complete Mirroring ====
| | <syntaxhighlight lang="text"> |
| | | [283.4/283.4Mb/s] IO[B1.1|L0.7|U45|C75|W125|R10|WI1.2k|RI0.5k] |
| 1. Determine which switch port(s), interface(s), or VLAN(s) carry the traffic you want to monitor
| |
| 2. Trace the network path to understand where SIP requests and responses converge
| |
| 3. Find a single point in your network where you can capture the bidirectional traffic
| |
| | |
| ==== Configure SPAN/Mirror to Single Destination ====
| |
| | |
| Update your network switch's port mirroring (SPAN) configuration:
| |
| | |
| * Cisco switches example:
| |
| <syntaxhighlight lang="bash"> | |
| # Monitor both inbound and outbound traffic from source port
| |
| # Send complete session to a SINGLE destination port
| |
| monitor session 1 source interface GigabitEthernet1/1 both
| |
| monitor session 1 destination interface GigabitEthernet1/2
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| * Ensure the SPAN source uses the <code>both</code> keyword to capture bidirectional traffic
| | '''Quick interpretation:''' |
| * Ensure the SPAN destination is a single port connected to one voipmonitor sensor interface
| | {| class="wikitable" |
| | |
| ==== Multi-Host Deployments ====
| |
| | |
| If you must monitor traffic that is split across multiple network segments or hosts:
| |
| | |
| 1. Use VoIPmonitor's '''Client-Server''' mode (distributed architecture) instead of multiple independent sniffers
| |
| 2. Configure remote probes to forward all packets to one central analysis server
| |
| 3. See [[Sniffer_distributed_architecture|Distributed Architecture: Client-Server Mode]] for setup details
| |
| | |
| {| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;" | |
| |-
| |
| ! colspan="2" style="background:#ffc107;" | Asymmetric Mirroring Prevention Best Practices
| |
| |- | | |- |
| | style="vertical-align: top;" | '''Single Interface Rule:'''
| | ! Metric !! Meaning !! Problem Indicator |
| | Always ensure complete SIP sessions (requests + responses) go to one monitored interface on one sensor instance.
| |
| |- | | |- |
| | style="vertical-align: top;" | '''Verification Method:''' | | | '''C''' (Capacity) || % of disk's sustainable throughput used || '''C ≥ 80% = Warning''', '''C ≥ 95% = Saturated''' |
| | During installation, use tshark on the target interface to verify INVITEs AND responses have matching Call-IDs.
| |
| |- | | |- |
| | style="vertical-align: top;" | '''Switch SPAN Configuration:'''
| | | '''L''' (Latency) || Current write latency in ms || '''L ≥ 3× B''' (baseline) = Saturated |
| | Use <code>both</code> direction in SPAN/mirror commands. Verify no duplicate or split destination ports. | |
| |- | | |- |
| | style="vertical-align: top;" | '''Avoid Multiple Sniffers:''' | | | '''U''' (Utilization) || % time disk is busy || '''U > 90%''' = Disk at limit |
| | Do not run multiple independent voipmonitor instances on different interfaces unless using Client-Server mode with packet forwarding.
| |
| |} | | |} |
|
| |
|
| === Step 5: Verify Fix After Mirroring Changes === | | '''If you see <code>DISK_SAT</code> or <code>WARN</code> after IO[]:''' |
| | <syntaxhighlight lang="text"> |
| | IO[B1.1|L8.5|U98|C97|W890|R5|WI12.5k|RI0.1k] DISK_SAT |
| | </syntaxhighlight> |
|
| |
|
| After correcting the network mirroring configuration:
| | → This confirms I/O bottleneck. Skip to [[#Solution:_I.2FO_Bottleneck|I/O Bottleneck Solutions]]. |
|
| |
|
| 1. Restart the voipmonitor service: <code>systemctl restart voipmonitor</code>
| | '''For older versions or additional confirmation''', continue with the steps below. |
| 2. Run a test call
| |
| 3. Verify CDRs appear in the GUI within 30-60 seconds
| |
| 4. Open the CDR and confirm it shows complete SIP history (INVITE + responses)
| |
| 5. If using tcpdump for verification, confirm the complete dialog is now on one interface:
| |
|
| |
|
| <syntaxhighlight lang="bash">
| | {{Note|See [[Syslog_Status_Line#IO.5B....5D_-_Disk_I.2FO_Monitoring_.28v2026.01.3.2B.29|Syslog Status Line - IO[] section]] for detailed field descriptions.}} |
| # Should now show both INVITEs and responses with same Call-IDs | |
| tshark -i eth0 -Y "sip" -T fields -e sip.Call-ID -e sip.Method -e sip.Status-Code | head -20
| |
| </syntaxhighlight>
| |
| | |
| ;3. Check for Promiscuous Mode (for SPAN/RSPAN Mirrored Traffic):
| |
| '''Important:''' Promiscuous mode requirements depend on your traffic mirroring method:
| |
|
| |
|
| * '''SPAN/RSPAN (Layer 2 mirroring):''' The network interface '''must''' be in promiscuous mode. Mirrored packets retain their original MAC addresses, so the interface would normally ignore them. Promiscuous mode forces the interface to accept all packets regardless of destination MAC.
| | ==== Step 2: Read the Full Syslog Status Line ==== |
|
| |
|
| * '''ERSPAN/GRE/TZSP/VXLAN (Layer 3 tunnels):''' Promiscuous mode is '''NOT required'''. These tunneling protocols encapsulate the mirrored traffic inside IP packets that are addressed directly to the sensor's IP address. The operating system receives these packets normally, and VoIPmonitor automatically decapsulates them to extract the inner SIP/RTP traffic.
| | VoIPmonitor outputs a status line every 10 seconds. This is your first diagnostic tool: |
|
| |
|
| For SPAN/RSPAN deployments, check the current promiscuous mode status:
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| ip link show eth0
| | # Monitor in real-time |
| | journalctl -u voipmonitor -f |
| | # or |
| | tail -f /var/log/syslog | grep voipmonitor |
| </syntaxhighlight> | | </syntaxhighlight> |
| Look for the <code>PROMISC</code> flag.
| |
|
| |
|
| Enable promiscuous mode manually if needed:
| | '''Example status line:''' |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="text"> |
| ip link set eth0 promisc on
| | calls[424] PS[C:4 S:41 R:13540] SQLq[C:0 M:0] heap[45|30|20] comp[48] [25.6Mb/s] t0CPU[85%] t1CPU[12%] t2CPU[8%] tacCPU[8|8|7|7%] RSS/VSZ[365|1640]MB |
| </syntaxhighlight> | | </syntaxhighlight> |
| If this solves the problem, you should make the change permanent. The <code>install-script.sh</code> for the sensor usually attempts to do this, but it can fail.
| |
|
| |
|
| ;3A. Troubleshooting: Missing Packets for Specific IPs During High-Traffic Periods:
| | '''Key metrics for bottleneck identification:''' |
| If calls are missing only for certain IP addresses or specific call flows (particularly during high-traffic periods), the issue is typically at the network infrastructure level (SPAN configuration) rather than sensor resource limits. Use this systematic approach:
| |
|
| |
|
| === Step 1: Use tcpdump to Verify Packet Arrival ===
| | {| class="wikitable" |
| | | |- |
| Before tuning any sensor configuration, first verify if the missing packets are actually reaching the sensor's network interface. Use <code>tcpdump</code> for this verification:
| | ! Metric !! What It Indicates !! I/O Bottleneck Sign !! CPU Bottleneck Sign |
| | | |- |
| <syntaxhighlight lang="bash">
| | | <code>heap[A|B|C]</code> || Buffer fill % (primary / secondary / processing) || High A with low t0CPU || High A with high t0CPU |
| # Listen for SIP packets from a specific IP during the next high-traffic window
| |
| # Replace eth0 with your interface and 10.1.2.3 with the problematic IP
| |
| tcpdump -i eth0 -nn "host 10.1.2.3 and port 5060" -v
| |
| | |
| # Or capture to a file for later analysis
| |
| tcpdump -i eth0 -nn "host 10.1.2.3 and port 5060" -w /tmp/trace_10.1.2.3.pcap
| |
| </syntaxhighlight> | |
| | |
| Interpret the results:
| |
| * '''If you see SIP packets arriving:''' The traffic reaches the sensor. The issue is likely a sensor resource bottleneck (CPU, memory, or configuration limits). Proceed to [[#Sensor_Resource_Bottlenecks|Step 4: Check Sensor Statistics]].
| |
| * '''If you see NO packets or only intermittent packets:''' The traffic is not reaching the sensor. This indicates a network infrastructure issue. Proceed to [[#SPAN_Configuration_Troubleshooting|Step 2: Check SPAN Configuration]].
| |
| | |
| === Step 2: Check SPAN Configuration for Bidirectional Capture ===
| |
| | |
| If packets are missing at the interface level, verify your network switch's SPAN (port mirroring) configuration. During high-traffic periods, switches may have insufficient SPAN buffer capacity, causing packets to be dropped in the mirroring process itself.
| |
| | |
| Key verification points:
| |
| | |
| * '''Verify Source Ports:''' Confirm that both source IP addresses (or the switch ports they connect to) are included in the SPAN source list. Missing one direction of the call flow will result in incomplete CDRs.
| |
| | |
| * '''Check for Bidirectional Mirroring:''' Your SPAN configuration must capture '''BOTH inbound and outbound traffic'''. On most Cisco switches, this requires specifying:
| |
| <syntaxhighlight lang="bash">
| |
| monitor session 1 source interface GigabitEthernet1/1 both
| |
| </syntaxhighlight>
| |
| | |
| Replace <code>both</code> with:
| |
| * <code>rx</code> for incoming traffic only
| |
| * <code>tx</code> for outgoing traffic only
| |
| * <code>both</code> for bidirectional capture (recommended)
| |
| | |
| * '''Verify Destination Port:''' Confirm the SPAN destination points to the switch port where the VoIPmonitor sensor is connected.
| |
| | |
| * '''Check SPAN Buffer Saturation (High-Traffic Issues):''' Some switches have limited SPAN buffer capacity. When monitoring multiple high-traffic ports simultaneously, the SPAN buffer may overflow during peak usage, causing randomized packet drops. Symptoms:
| |
| ** Drops occur only during busy hours
| |
| ** Missing packets are inconsistent across different calls
| |
| ** Sensor CPU usage and t0CPU metrics appear normal (no bottleneck at sensor)
| |
| | |
| Solutions:
| |
| ** Reduce the number of monitored source ports in the SPAN session
| |
| ** Use multiple SPAN sessions if your switch supports it
| |
| ** Consider upgrading to a switch with higher SPAN buffer capacity
| |
| | |
| * '''Verify Switch Interface Counters for Packet Drops:''' Check the network switch interface counters to determine if the switch itself is dropping packets during the mirroring process. This is critical when investigating false low MOS scores or packet loss reports.
| |
| | |
| Cisco switches:
| |
| <syntaxhighlight lang="bash">
| |
| # Show general interface statistics for the SPAN source port
| |
| show interface GigabitEthernet1/1 counters
| |
| show interface GigabitEthernet1/1 | include drops|errors|Input queue|Output queue
| |
| | |
| # Show detailed interface status (look for input errors, CRC, frame)
| |
| show interface GigabitEthernet1/1 detail
| |
| | |
| # Monitor in real-time during a high-traffic period
| |
| show interface Gi1/1 accounting
| |
| </syntaxhighlight>
| |
| | |
| Key indicators of switch-level packet loss:
| |
| ** Non-zero input errors or CRC errors on source/destination ports
| |
| ** Input queue drops (indicating switch buffer overflow)
| |
| ** Increasing drop counters during peak traffic hours
| |
| ** Output errors on the SPAN destination port (sensor may not be accepting fast enough)
| |
| | |
| If switch interface counters show drops, the issue is at the network infrastructure level (overloaded switch), not the VoIPmonitor sensor. Consult your network administrator for switch optimization or consider redistributing SPAN traffic across multiple ports.
| |
| | |
| * '''Verify VLAN Trunking:''' If the monitored traffic spans different VLANs, ensure the SPAN destination port is configured as a trunk to carry all necessary VLAN tags. Without trunk mode, packets from non-native VLANs will be dropped or stripped of their tags.
| |
| | |
| For detailed instructions on configuring SPAN/ERSPAN/GRE for different network environments, see [[Sniffing_modes]].
| |
| | |
| === Step 2A: EtherChannel/LACP Bonded Interface Configuration ===
| |
| | |
| If your network uses Ethernet channel bonding (EtherChannel, LACP, port-channel, bonding, teaming), VoIPmonitor may report false positive packet loss and lower MOS scores even when actual loss is minimal. This occurs when VoIPmonitor is configured to monitor a bridged/bonded interface (e.g., `br0`, `bond0`, `team0`) instead of monitoring the individual physical interfaces that carry the actual traffic.
| |
| | |
| {| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
| |
| |- | | |- |
| ! colspan="2" style="background:#ffc107;" | Issue: False Packet Loss on Bonded Interfaces
| | | <code>t0CPU[X%]</code> || Packet capture thread (single-core, cannot parallelize) || Low (<50%) || High (>80%) |
| |- | | |- |
| | style="vertical-align: top;" | '''Symptoms:''' | | | <code>comp[X]</code> || Active compression threads || Very high (maxed out) || Normal |
| | VoIPmonitor reports packet loss and low MOS scores, but network devices show minimal or no actual loss. RTP processing threads appear overloaded. Switch SPAN configuration is correct but packets still show as missing. | |
| |- | | |- |
| | style="vertical-align: top;" | '''Root Cause:''' | | | <code>SQLq[C:X M:Y]</code> || Pending SQL queries || Growing = database bottleneck || Stable |
| | When monitoring a bonded interface (e.g., `br0` for an EtherChannel bundle), VoIPmonitor cannot correctly track RTP streams because packets from the same SIP call may be distributed across multiple physical interfaces by the bonding driver. This causes packet association failures and "false" loss detection. | |
| |- | | |- |
| | style="vertical-align: top;" | '''Solution:''' | | | <code>tacCPU[...]</code> || TAR compression threads || All near 100% = compression bottleneck || Normal |
| | Configure <code>interface</code> to use a comma-separated list of the individual physical interfaces that make up the EtherChannel bundle, not the bonded interface itself.
| |
| |} | | |} |
|
| |
|
| ;Identify Bonded Interface Misconfiguration:
| | '''Interpretation flowchart:''' |
|
| |
|
| Check if VoIPmonitor is configured to monitor a bonded/bundled interface:
| | <kroki lang="mermaid"> |
| <syntaxhighlight lang="bash"> | | graph TD |
| # Check current interface setting
| | A[heap values rising] --> B{Check t0CPU} |
| grep "^interface" /etc/voipmonitor.conf
| | B -->|t0CPU > 80%| C[CPU Bottleneck] |
| | B -->|t0CPU < 50%| D{Check comp and tacCPU} |
| | D -->|comp maxed, tacCPU high| E[I/O Bottleneck<br/>Disk cannot keep up with writes] |
| | D -->|comp normal| F{Check SQLq} |
| | F -->|SQLq growing| G[Database Bottleneck] |
| | F -->|SQLq stable| H[Mixed/Other Issue] |
|
| |
|
| # Common problematic values that indicate bonded interfaces:
| | C --> C1[Solution: CPU optimization] |
| # interface = br0 (bridge interface)
| | E --> E1[Solution: Faster storage] |
| # interface = bond0 (Linux bonding)
| | G --> G1[Solution: MySQL tuning] |
| # interface = team0 (NetworkManager teaming)
| | </kroki> |
| # interface = port-channel1 (Cisco/Nexus EtherChannel)
| |
| </syntaxhighlight> | |
|
| |
|
| ;Solution: Use Individual Physical Interfaces:
| | ==== Step 3: Linux I/O Diagnostics ==== |
|
| |
|
| Edit <code>/etc/voipmonitor.conf</code> and change the <code>interface</code> setting to use the individual physical interfaces that comprise the EtherChannel bundle:
| | Use these standard Linux tools to confirm I/O bottleneck: |
|
| |
|
| | '''Install required tools:''' |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Example: EtherChannel with ports p3p1 and p3p2 bundled into br0 | | # Debian/Ubuntu |
| # BEFORE (INCORRECT):
| | apt install sysstat iotop ioping |
| # interface = br0
| |
|
| |
|
| # AFTER (CORRECT): | | # CentOS/RHEL |
| interface = p3p1,p3p2
| | yum install sysstat iotop ioping |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| The comma-separated list allows VoIPmonitor to create a dedicated capture thread for each physical interface, ensuring proper RTP stream association across the EtherChannel bundle.
| | '''2a) iostat - Disk utilization and wait times''' |
| | |
| ;Verify Physical Interface Names:
| |
| | |
| To identify the correct physical interface names for your EtherChannel setup:
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # List all network interfaces | | # Run for 10 intervals of 2 seconds |
| ip link show
| | iostat -xz 2 10 |
| | </syntaxhighlight> |
|
| |
|
| # Check which interfaces are part of the bond/team
| | '''Key output columns:''' |
| cat /proc/net/bonding/bond0 # for Linux bonding
| | <syntaxhighlight lang="text"> |
| teamdctl team0 state dump # for NetworkManager teaming
| | Device r/s w/s rkB/s wkB/s await %util |
| | | sda 12.50 245.30 50.00 1962.40 45.23 98.50 |
| # Or check switch configuration for port-channel members
| |
| # Example: Cisco switch output
| |
| # show etherchannel summary
| |
| # shows: Port-channel1 (Po1) = eth0, eth1, eth2, eth3
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| ;After Making Configuration Changes:
| | {| class="wikitable" |
| | | |- |
| Restart the VoIPmonitor service for changes to take effect:
| | ! Column !! Description !! Problem Indicator |
| <syntaxhighlight lang="bash">systemctl restart voipmonitor</syntaxhighlight>
| |
| | |
| Then verify that packet loss reporting has improved and RTP processing threads are no longer overloaded.
| |
| | |
| ;Related Issues:
| |
| | |
| * '''Packet Reordering in EtherChannel:''' Even with correct physical interface monitoring, EtherChannel distributes packets across multiple links using a hashing algorithm. This can cause packets in the same RTP stream to arrive out of order (packet delay variation). High jitter/variation may still cause reduced MOS F1 scores. Compare MOS F1 (50ms buffer) vs MOS F2 (200ms buffer) - if F2 is significantly higher, the issue is jitter/reordering, not actual loss.
| |
| | |
| * '''SPAN Bandwidth for Aggregated Links:''' When monitoring EtherChannel bundles, ensure the SPAN destination port has sufficient bandwidth. For example, if you have 4x1Gbps EtherChannel (4Gbps total) but monitor to a 1Gbps SPAN port, the switch will drop packets at the SPAN destination. Always use a SPAN port speed equal to or greater than the sum of the EtherChannel bandwidth.
| |
| | |
| === Step 3: Check for UDP Fragmentation Issues ===
| |
| | |
| When investigating missing packets, especially for SIP over UDP, a common issue is packet fragmentation. If the MTU path between systems causes SIP packets to be fragmented (typically >1480 bytes), the IP fragments will not contain UDP port information, making port-based tcpdump filters miss these fragments.
| |
| | |
| {| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;" | |
| |- | | |- |
| ! colspan="2" style="background:#ffc107;" | Critical: Check Defragmentation Settings FIRST
| | | <code>%util</code> || Device utilization percentage || '''> 90%''' = disk saturated |
| |- | | |- |
| | style="vertical-align: top;" | '''Before using tcpdump:''' | | | <code>await</code> || Average I/O wait time (ms) || '''> 20ms''' for SSD, '''> 50ms''' for HDD = high latency |
| | Verify VoIPmonitor's defragmentation is enabled in <code>/etc/voipmonitor.conf</code>. If these are disabled, VoIPmonitor cannot reassemble IP fragments before parsing SIP, causing missing packets even when traffic reaches the interface correctly.
| |
| |- | | |- |
| | style="vertical-align: top;" | '''Required settings:''' | | | <code>w/s</code> || Writes per second || Compare with disk's rated IOPS |
| | <code>udpfrag = yes</code> (default) for UDP fragment reassembly<br/><code>sip_tcp_reassembly_ext = yes</code> (default) for TCP reassembly
| |
| |} | | |} |
| ;Step 1: Verify defragmentation is enabled:
| |
|
| |
|
| Check your VoIPmonitor configuration:
| | '''2b) iotop - Per-process I/O usage''' |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Check if UDP fragment reassembly is enabled | | # Show I/O by process (run as root) |
| grep "^udpfrag" /etc/voipmonitor.conf
| | iotop -o |
| | |
| # Check if TCP reassembly is enabled
| |
| grep "^sip_tcp_reassembly_ext" /etc/voipmonitor.conf
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Both should be set to <code>yes</code> (these are the defaults). If either is set to <code>no</code>, VoIPmonitor will not reassemble fragmented packets, causing packets larger than the MTU to be ignored or parsed incorrectly.
| | Look for <code>voipmonitor</code> or <code>mysqld</code> dominating I/O. If voipmonitor shows high DISK WRITE but system <code>%util</code> is 100%, disk cannot keep up. |
| | |
| If the settings are correct and you are still experiencing issues, proceed to the following diagnostic steps using tcpdump.
| |
| | |
| ;Step 2: Identify if fragmentation may be causing packet loss:
| |
| If you are investigating missing INVITE packets (especially those with large SDP bodies), check whether packet fragmentation is occurring in the network path. Large SIP packets are fragmented at the IP layer, and only the first fragment carries the UDP header; subsequent fragments contain just the IP header (no port information).
| | |
| ;Step 3: Do NOT use port-based filters when investigating fragmentation:
| |
| <code>tcpdump</code> or <code>tshark</code> filters based on UDP port will miss IP fragments, because fragments only contain the IP header (no transport layer port information). Instead, filter by IP address. | |
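To check directly whether fragments are present on the wire, you can also filter on the IPv4 fragmentation fields themselves; a small sketch (the interface name is an example):
<syntaxhighlight lang="bash">
# Match any fragmented IPv4 packet: More Fragments flag set or a non-zero fragment offset
tcpdump -i eth0 -nn '((ip[6:2] & 0x3fff) != 0)' -c 20
</syntaxhighlight>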
|
| |
|
| ;Step 4: Use IP-based tcpdump filters for fragmented packets:
| | '''2c) ioping - Quick latency check''' |
| When capturing to investigate missing packets that may be fragmented, filter by IP addresses instead of ports:
| |
<syntaxhighlight lang="bash">
# CORRECT: Filter by IP addresses to capture fragments
# Replace opensips_ip and carrier_ip with the actual IP addresses
tcpdump -i eth0 -nn "host opensips_ip and host carrier_ip" -w /tmp/fragmentation_test.pcap

# INCORRECT: a port filter will miss the IP fragments
tcpdump -i eth0 -nn "host opensips_ip and port 5060" -w /tmp/fragmentation_test.pcap
</syntaxhighlight>
|
| |
|
| ;Step 5: Test with a specific call that is missing from the GUI:
| | '''Expected results:''' |
| Make a test call that exhibits the missing packet issue, and capture the entire traffic flow between the two endpoints:
| | {| class="wikitable" |
| <syntaxhighlight lang="bash">
| | |- |
| # Capture all traffic between two IPs during a test call
| | ! Storage Type !! Healthy Latency !! Problem Indicator |
| # Replace with your actual IP addresses
| |
| tcpdump -i eth0 -nn "host 192.168.1.100 and 203.0.113.50" -w /tmp/test_call_capture.pcap
| |
| </syntaxhighlight>
| |
| | |
| ;Step 6: Analyze the capture for fragmentation:
| |
| Open the pcap file in <code>tshark</code> or Wireshark to check for fragmented IP packets:
| |
| <syntaxhighlight lang="bash">
| |
| # Look for fragmented IP packets in the capture
| |
| tshark -r /tmp/test_call_capture.pcap -Y "ip.fragments"
| |
| | |
| # Count fragments
| |
| tshark -r /tmp/test_call_capture.pcap -Y "ip.fragments" | wc -l
| |
| </syntaxhighlight>
| |
| | |
| If you see fragmented packets, especially for large SIP INVITE messages, this confirms that port-based filters would miss fragments, which explains why the call appears incomplete in VoIPmonitor.
| |
| | |
| ;{{Note|If tcpdump with IP-based filters captures the missing packets while VoIPmonitor does not see them, share the pcap file and your <code>voipmonitor.conf</code> with VoIPmonitor support for further analysis.}}
| |
| | |
| === Step 3A: Check for Sensor Resource Bottlenecks ===
| |
| | |
| If <code>tcpdump</code> confirms that packets are arriving at the interface consistently, but VoIPmonitor is still missing them, the issue may be sensor resource limitations.
| |
| | |
| * '''Check Packet Drops:''' In the GUI, navigate to '''Settings → Sensors''' and look at the "# packet drops" counter. If this counter is non-zero or increasing during high traffic:
| |
| ** Increase the <code>ringbuffer</code> size in <code>voipmonitor.conf</code> (default 50 MB, max 2000 MB)
| |
| ** Check the <code>t0CPU</code> metric in system logs - if consistently above 90%, you may need to upgrade CPU or optimize NIC drivers
| |
| | |
| * '''Monitor Memory Usage:''' Check for OOM (Out of Memory) killer events:
| |
| <syntaxhighlight lang="bash">
| |
| grep -i "out of memory\|killed process" /var/log/syslog | tail -20
| |
| </syntaxhighlight>
| |
| | |
| * '''SIP Packet Limits:''' If only long or chatty calls are affected, check the <code>max_sip_packets_in_call</code> and <code>max_invite_packets_in_call</code> limits in <code>voipmonitor.conf</code>.
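If any of these limits are implicated, they can be raised in <code>/etc/voipmonitor.conf</code> together with the ring buffer mentioned above; the values below are illustrative only and should be sized to your traffic and available RAM:
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf - example values only, adjust to your environment
# Kernel ring buffer for packet capture, in MB (documented maximum is 2000)
ringbuffer = 500

# Per-call packet limits; raise these if only very long or chatty calls are truncated
max_sip_packets_in_call = 20000
max_invite_packets_in_call = 10000
</syntaxhighlight>
Restart the sensor after changing these values.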
| |
| | |
| ;3. Verify Your SPAN/Mirror/TAP Configuration:
| |
| This is the most common cause of no traffic. Double-check your network switch or hardware tap configuration to ensure:
| |
| * The correct source ports (where your PBX/SBC is connected) are being monitored.
| |
| * The correct destination port (where your VoIPmonitor sensor is connected) is configured.
| |
| * If you are monitoring traffic across different VLANs, ensure your mirror port is configured to carry all necessary VLAN tags (often called "trunk" mode).
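To verify that the mirror actually delivers tagged frames to the sensor, a short capture that prints link-level headers is usually enough; a sketch (replace <code>eth0</code> with your capture interface):
<syntaxhighlight lang="bash">
# Print link-level headers; 802.1Q tags appear as "vlan <id>" in the output
tcpdump -i eth0 -nn -e -c 20

# Show only VLAN-tagged frames
tcpdump -i eth0 -nn -e vlan -c 20
</syntaxhighlight>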
| |
| | |
| ;4. Investigate Packet Encapsulation (If tcpdump shows traffic but VoIPmonitor does not):
| |
| If <code>tcpdump</code> or <code>tshark</code> shows packets reaching the interface but VoIPmonitor is not capturing them, the traffic may be encapsulated in a tunnel that VoIPmonitor cannot automatically process without additional configuration. Common encapsulations include VLAN tags, ERSPAN, GRE, VXLAN, and TZSP.
| |
| | |
| First, capture a sample of the traffic for analysis:
| |
| <syntaxhighlight lang="bash">
| |
| # Capture 100 packets of SIP traffic to a pcap file
| |
| tcpdump -i eth0 -c 100 -s0 port 5060 -w /tmp/encapsulation_check.pcap
| |
| </syntaxhighlight>
| |
| | |
| Then analyze the capture to identify encapsulation:
| |
| <syntaxhighlight lang="bash">
| |
| # Check for VLAN-tagged packets (802.1Q)
| |
| tshark -r /tmp/encapsulation_check.pcap -Y "vlan"
| |
| | |
| # Check for GRE tunnels
| |
| tshark -r /tmp/encapsulation_check.pcap -Y "gre"
| |
| | |
| # Check for ERSPAN (ERSPAN traffic is carried inside GRE)
|
| tshark -r /tmp/encapsulation_check.pcap -Y "erspan"
| |
| | |
| # Check for VXLAN (UDP port 4789)
| |
| tshark -r /tmp/encapsulation_check.pcap -Y "udp.port == 4789"
| |
| | |
| # Check for TZSP (UDP ports 37008 or 37009)
| |
| tshark -r /tmp/encapsulation_check.pcap -Y "udp.port == 37008 || udp.port == 37009"
| |
| | |
| # Show packet summary to identify any unusual protocol stacks
| |
| tshark -r /tmp/encapsulation_check.pcap -V | head -50
| |
| </syntaxhighlight>
| |
| | |
| Identifying encapsulation issues:
| |
| * '''VLAN tags present:''' Ensure the BPF <code>filter</code> directive in <code>voipmonitor.conf</code> does not use a bare <code>udp</code> expression, which can exclude VLAN-tagged packets. For a quick test, comment out the <code>filter</code> directive entirely and restart the sensor; a tag-aware filter example follows after this list.
| |
| | |
| * '''ERSPAN/GRE tunnels:''' Promiscuous mode is NOT required for these Layer 3 tunnels. Verify that tunneling is configured correctly on your network device and that the packets are addressed to the sensor's IP. VoIPmonitor automatically decapsulates ERSPAN and GRE.
| |
| | |
| * '''VXLAN/TZSP tunnels:''' These specialized tunneling protocols require proper configuration on the sending device. Consult your network device documentation for VoIPmonitor compatibility requirements.
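If VLAN tags are present and you still need a BPF <code>filter</code>, one common pattern is to repeat the expression behind the <code>vlan</code> keyword so that both untagged and tagged SIP packets match; a sketch for <code>/etc/voipmonitor.conf</code> (the port is an example):
<syntaxhighlight lang="ini">
# Match SIP on port 5060 with and without an 802.1Q tag
filter = udp port 5060 or (vlan and udp port 5060)
</syntaxhighlight>
In BPF, everything that follows the <code>vlan</code> keyword is evaluated inside the tagged frame, which is why the expression has to be written twice.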
| |
| | |
| If encapsulation is identified as the issue, review [[Sniffing_modes]] for detailed configuration guidance.
| |
| | |
| ;3B. Troubleshooting: RTP Streams Not Displayed for Specific Provider:
| |
| If SIP signaling appears correctly in the GUI for calls from a specific provider, but RTP streams (audio quality graphs, waveform visualization) are missing for that provider while working correctly for other call paths, use this systematic approach to identify the cause.
| |
| | |
| === Step 1: Make a Test Call to Reproduce the Issue===
| |
| | |
| First, create a controlled test scenario to investigate the specific provider.
| |
| | |
| * Determine if the issue affects ALL calls from this provider or only some (e.g., specific codecs, call duration, time of day)
| |
| * Make a test call that reproduces the problem (e.g., from the problematic provider to a test number)
| |
| * Allow the call to establish and run for at least 30-60 seconds to capture meaningful RTP data
| |
| | |
| === Step 2: Capture Packets on the Sniffing Interface During the Test Call ===
| |
| | |
| During the test call, use <code>tcpdump</code> (or <code>tshark</code>) to directly capture packets on the network interface configured in <code>voipmonitor.conf</code>. This tells you whether RTP packets are being received by the sensor.
| |
| | |
<syntaxhighlight lang="bash">
# Capture SIP and likely-RTP packets from the specific provider IP during your test call
# Replace eth0 with your interface and 1.2.3.4 with the provider's IP
# (udp[8] & 0xc0) == 0x80 matches UDP payloads whose first two bits are "10" (RTP version 2)
sudo tcpdump -i eth0 -nn "host 1.2.3.4 and (udp port 5060 or (udp[8] & 0xc0) == 0x80)" -v

# Capture all UDP traffic to/from the provider to a file for detailed analysis (recommended);
# "rtp" is not a valid tcpdump filter keyword, so capture UDP and isolate RTP later in tshark/Wireshark
sudo tcpdump -i eth0 -nn "host 1.2.3.4 and udp" -w /tmp/test_provider_rtp.pcap
</syntaxhighlight>
| |
| | |
| Note: The RTP heuristic <code>(udp[8] & 0xc0) == 0x80</code> checks the first byte of the UDP payload; its two most significant bits set to "10" indicate RTP version 2.
| |
| | |
| === Step 3: Compare Raw Packet Capture with Sensor Output ===
| |
| | |
| After the test call:
| |
| | |
| * Check what tcpdump captured:
| |
| <syntaxhighlight lang="bash">
| |
| # Count SIP packets
| |
| tshark -r /tmp/test_provider_rtp.pcap -Y "sip" | wc -l
| |
| | |
| # Count RTP packets
| |
| tshark -r /tmp/test_provider_rtp.pcap -Y "rtp" | wc -l
| |
| | |
| # View RTP stream details
| |
| tshark -r /tmp/test_provider_rtp.pcap -Y "rtp" -T fields -e rtp.ssrc -e rtp.seq -e rtp.ptype -e udp.srcport -e udp.dstport | head -20
| |
| </syntaxhighlight>
| |
| | |
| * Check what VoIPmonitor recorded:
| |
| * Open the CDR for your test call in the GUI
| |
| * Verify if the "Received Packets" column shows non-zero values for the provider leg
| |
| * Check if the "Streams" section shows RTP quality graphs and waveform visualization
| |
| | |
| * Compare the results:
| |
| ** '''If tcpdump shows NO RTP packets:''' The RTP traffic is not reaching the sensor interface. This indicates a network-level issue (asymmetric routing, SPAN configuration missing the RTP path, or firewall). You need to troubleshoot the network infrastructure, not VoIPmonitor.
| |
| | |
| ** '''If tcpdump shows RTP packets but the GUI shows no streams or zero received packets:''' The packets are reaching the sensor but VoIPmonitor is not processing them. Check:
| |
| * [[#Check_GUI_Capture_Rules_(Causing_Call_Stops)|Step 5: Check GUI Capture Rules]] - Look for capture rules targeting the provider's IP with RTP set to "DISCARD" or "Header Only"
| |
| * [[Tls|TLS/SSL Decryption]] - Verify SRTP decryption is configured correctly if the provider uses encryption
| |
| * [[Sniffer_configuration]] - Check for any problematic <code>sipport</code> or <code>filter</code> settings
| |
| | |
| For more information on capture rules that affect RTP storage, see [[Capture_rules]].
| |
| | |
| ;5. Check for Non-Call SIP Traffic Only:
| |
| If you see SIP traffic but it consists only of OPTIONS, NOTIFY, SUBSCRIBE, or MESSAGE methods (without any INVITE packets), there are no calls to generate CDRs. This can occur in environments that use SIP for non-call purposes like heartbeat checks or instant messaging.
| |
| | |
| You can configure VoIPmonitor to process and store these non-call SIP messages. See [[SIP_OPTIONS/SUBSCRIBE/NOTIFY]] and [[MESSAGES]] for configuration details.
| |
| | |
| Enable non-call SIP message processing in '''/etc/voipmonitor.conf''':
| |
| <syntaxhighlight lang="ini">
| |
| # Process SIP OPTIONS (qualify pings). Default: no
| |
| sip-options = yes
| |
| | |
| # Process SIP MESSAGE (instant messaging). Default: yes
| |
| sip-message = yes
| |
| | |
| # Process SIP SUBSCRIBE requests. Default: no
| |
| sip-subscribe = yes
| |
| | |
| # Process SIP NOTIFY requests. Default: no
| |
| sip-notify = yes
| |
| </syntaxhighlight>
| |
| | |
| Note that enabling these for processing and storage can significantly increase database load in high-traffic scenarios. Use with caution and monitor SQL queue growth. See [[SIP_OPTIONS/SUBSCRIBE/NOTIFY#Performance_Tuning|Performance Tuning]] for optimization tips.
| |
| | |
| == Step 4: Check the VoIPmonitor Configuration ==
| |
| If <code>tshark</code> sees traffic but VoIPmonitor does not, the problem is almost certainly in <code>voipmonitor.conf</code>.
| |
| | |
| ;1. Check the <code>interface</code> directive:
| |
| Make sure the <code>interface</code> parameter in <code>/etc/voipmonitor.conf</code> exactly matches the interface where you see traffic with <code>tshark</code>. For example: <code>interface = eth0</code>.
| |
| | |
| === Troubleshooting: Wrong Interface Name ===
| |
| | |
| If the <code>interface</code> directive is set to an interface name that does not exist on the system, the sensor will fail to capture traffic completely. This is a common issue when network interface names change after system updates or hardware reconfiguration.
| |
| | |
| ;Step 1: Identify the correct interface name:
| |
| Use either of these commands to list all available network interfaces:
| |
| <syntaxhighlight lang="bash">
| |
| # Option 1: Modern Linux systems
| |
| ip a
| |
| | |
| # Option 2: Older systems
| |
| ifconfig
| |
| </syntaxhighlight>
| |
| | |
| Look for the interface that is receiving traffic. Common interface names include:
| |
| * <code>eth0, eth1, eth2...</code> (classic Ethernet naming)
| |
| * <code>ens33, ens34, enp0s3...</code> (predictable naming on modern systems)
| |
| * <code>enp2s0f0, enp2s0f1...</code> (multi-port NICs)
| |
| | |
| ;Step 2: Verify the interface exists and is UP:
| |
| <syntaxhighlight lang="bash">
| |
| # Check specific interface status (replace eth0 with your interface name)
| |
| ip link show eth0
| |
| | |
| # The output should show "UP" and "LOWER_UP" to indicate the interface is active
| |
| </syntaxhighlight>
| |
| | |
| ;Step 3: Update <code>/etc/voipmonitor.conf</code>:
| |
| Edit the <code>interface</code> directive to use the correct interface name:
| |
| <syntaxhighlight lang="ini">
| |
| # /etc/voipmonitor.conf
| |
| interface = ens33
| |
| </syntaxhighlight>
| |
| | |
| ;Step 4: Restart the VoIPmonitor service:
| |
| <syntaxhighlight lang="bash">
| |
| systemctl restart voipmonitor
| |
| systemctl status voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| Verify the service shows <code>Active: active (running)</code> after the restart.
| |
| | |
| ;2. Check the <code>sipport</code> directive:
| |
| By default, VoIPmonitor only listens on port 5060. If your PBX uses a different port for SIP, you must add it. '''Common causes of missing calls:'''
| |
| * '''Missing ports:''' Some providers use alternate SIP ports (5061, 5080, etc.). If these are not listed, calls on those ports will be ignored.
| |
| * '''Syntax errors:''' List multiple ports comma-separated without extra commas or trailing characters. Correct syntax: <code>sipport = 5060,5061</code> or <code>sipport = 5060,5080</code>
| |
| * '''Ranges:''' You can specify port ranges using dashes: <code>sipport = 5060,5070-5080</code>
| |
| Example:
| |
| <code>sipport = 5060,5080</code>
| |
| | |
| ;3. '''Distributed/Probe Setup Considerations:'''
| |
| If you are using a remote sensor (probe) with Packet Mirroring (<code>packetbuffer_sender=yes</code>), call detection depends on configuration on '''both''' the probe and the central analysis host.
| |
| | |
| Common symptom: The probe captures traffic (visible via <code>tcpdump</code>), but the central server records incomplete or missing CDRs for calls on non-default ports.
| |
| | |
| {| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
| |
| |- | | |- |
| ! colspan="2" style="background:#ffc107;" | Critical: Both Systems Must Have Matching sipport Configuration
| | | NVMe SSD || < 0.5 ms || > 2 ms |
| |- | | |- |
| | style="vertical-align: top;" | '''Probe side:''' | | | SATA SSD || < 1 ms || > 5 ms |
| | The probe captures packets from the network interface. Its <code>sipport</code> setting determines which UDP ports it considers as SIP traffic to capture and forward. | |
| |- | | |- |
| | style="vertical-align: top;" | '''Central server side:''' | | | HDD (7200 RPM) || < 10 ms || > 30 ms |
| | When receiving raw packets in Packet Mirroring mode, the central server analyzes the packets locally. Its <code>sipport</code> setting determines which ports it interprets as SIP during analysis. If a port is missing here, packets are captured but not recognized as SIP, resulting in missing CDRs. | |
| |} | | |} |
|
| |
|
| :'''Troubleshooting steps for distributed probe setups:'''
| | ==== Step 4: Linux CPU Diagnostics ==== |
| | |
| ::1. Verify traffic reachability on the probe:
| |
| ::Use <code>tcpdump</code> on the probe VM to confirm SIP packets for the missing calls are arriving on the expected ports.
| |
| ::<syntaxhighlight lang="bash">
| |
| # On the probe VM
| |
| tcpdump -i eth0 -n port 5061
| |
| </syntaxhighlight>
| |
| | |
| ::2. Check the probe's ''voipmonitor.conf'':
| |
| ::Ensure the <code>sipport</code> directive on the probe includes all necessary SIP ports used in your network.
| |
| ::<syntaxhighlight lang="ini">
| |
| # /etc/voipmonitor.conf on the PROBE
| |
| sipport = 5060,5061,5080,6060
| |
| </syntaxhighlight>
| |
| | |
| ::3. Check the central analysis host's ''voipmonitor.conf'':
| |
| ::'''This is the most common cause of missing calls in distributed setups.''' The central analysis host (the system receiving packets via <code>server_bind</code> or legacy <code>mirror_bind</code>) must also have the <code>sipport</code> directive configured with the same list of ports used by all probes.
| |
| ::<syntaxhighlight lang="ini">
| |
| # /etc/voipmonitor.conf on the CENTRAL HOST
| |
| sipport = 5060,5061,5080,6060
| |
| </syntaxhighlight>
| |
| | |
| ::4. Restart both services:
| |
| ::Apply the configuration changes:
| |
| ::<syntaxhighlight lang="bash">
| |
| # On both probe and central host
| |
| systemctl restart voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| :For more details on distributed architecture configuration and packet mirroring, see [[Sniffer_distributed_architecture|Distributed Architecture: Client-Server Mode]].
| |
| | |
| ;4. Check for a restrictive <code>filter</code>:
| |
| :If you have a BPF <code>filter</code> configured, ensure it is not accidentally excluding the traffic you want to see. For debugging, try commenting out the <code>filter</code> line entirely and restarting the sensor.
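A minimal way to check for and temporarily disable the directive (the <code>sed -i.bak</code> call keeps a backup of the original file):
<syntaxhighlight lang="bash">
# Show any active filter line
grep -n "^filter" /etc/voipmonitor.conf

# Comment it out for the test, then restart the sensor
sed -i.bak 's/^filter/#filter/' /etc/voipmonitor.conf
systemctl restart voipmonitor
</syntaxhighlight>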
| |
| | |
| == Step 5: Check GUI Capture Rules (Causing Call Stops) == | |
| If <code>tshark</code> sees SIP traffic and the sniffer configuration appears correct, but calls are still not processed even though traffic is clearly visible on the network interface, GUI capture rules may be the culprit.
| |
| | |
| Capture rules configured in the GUI can instruct the sniffer to ignore ("skip") all processing for matched calls. This includes calls matching specific IP addresses or telephone number prefixes.
| |
| | |
| ;1. Review existing capture rules:
| |
| :Navigate to '''GUI -> Capture rules''' and examine all rules for any that might be blocking your traffic.
| |
| :Look specifically for rules with the '''Skip''' option set to '''ON''' (displayed as "Skip: ON"). The Skip option instructs the sniffer to completely ignore matching calls (no files, RTP analysis, or CDR creation).
| |
| | |
| ;2. Test by temporarily removing all capture rules:
| |
| :To isolate the issue, first create a backup of your GUI configuration:
| |
| :* Navigate to '''Tools -> Backup & Restore -> Backup GUI -> Configuration tables'''
| |
| :* This saves your current settings including capture rules
| |
| :* Delete all capture rules from the GUI
| |
| :* Click the '''Apply''' button to save changes
| |
| :* Reload the sniffer by clicking the green '''"reload sniffer"''' button in the control panel
| |
| :* Test if calls are now being processed correctly
| |
| :* If resolved, restore the configuration from the backup and systematically investigate the rules to identify the problematic one
| |
| | |
| ;3. Identify the problematic rule:
| |
| :* After restoring your configuration, remove rules one at a time and reload the sniffer after each removal
| |
| :* When calls start being processed again, you have identified the problematic rule
| |
| :* Review the rule's match criteria (IP addresses, prefixes, direction) against your actual traffic pattern
| |
| :* Adjust the rule's conditions or Skip setting as needed
| |
| | |
| ;4. Verify rules are reloaded:
| |
| :After making changes to capture rules, remember that changes are '''not automatically applied''' to the running sniffer. You must click the '''"reload sniffer"''' button in the control panel, or the rules will continue using the previous configuration.
| |
| | |
| For more information on capture rules, see [[Capture_rules]].
| |
| | |
| == Troubleshooting: Service Fails to Start with "failed read rsa key" Error ==
| |
| | |
| If the VoIPmonitor sniffer service fails to start and logs the error message "failed read rsa key," this indicates that the manager key cannot be loaded from the database.
| |
| | |
| === Cause ===
| |
| | |
| The manager_key is stored in the <code>system</code> database table (identified by <code>type='manager_key'</code>) and is required for proper manager/sensor operations in distributed deployments. This error most commonly occurs when the <code>mysqlloadconfig</code> option in <code>voipmonitor.conf</code> is set to <code>no</code>, which prevents VoIPmonitor from loading configuration (including the manager_key) from the database.
| |
| | |
| === Troubleshooting Steps === | |
| | |
| ;1. Check for the error in system logs:
| |
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu
| |
| tail -f /var/log/syslog | grep voipmonitor
| |
| | |
| # For CentOS/RHEL/AlmaLinux
| |
| tail -f /var/log/messages | grep voipmonitor
| |
| | |
| # For systemd systems
| |
| journalctl -u voipmonitor -f
| |
| </syntaxhighlight>
| |
|
| |
|
| ;2. Verify mysqlloadconfig setting:
| | '''3a) top - Overall CPU usage''' |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Check if mysqlloadconfig is set to no in voipmonitor.conf | | # Press '1' to show per-core CPU |
| grep mysqlloadconfig /etc/voipmonitor.conf
| | top |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| If the output shows <code>mysqlloadconfig = no</code>, this is the cause of the issue.
| | Look for: |
| | * Individual CPU core at 100% (t0 thread is single-threaded) |
| | * High <code>%wa</code> (I/O wait) vs high <code>%us/%sy</code> (CPU-bound) |
|
| |
|
| ;3. Fix the mysqlloadconfig setting:
| | '''3b) Verify voipmonitor threads''' |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Edit the configuration file | | # Show voipmonitor threads with CPU usage |
| nano /etc/voipmonitor.conf
| | top -H -p $(pgrep voipmonitor) |
| | |
| # Either remove the mysqlloadconfig line entirely (defaults to yes)
| |
| # Or uncomment/set to yes:
| |
| # mysqlloadconfig = yes
| |
| | |
| # Restart the sniffer service
| |
| systemctl restart voipmonitor
| |
| | |
| # Check if it started successfully
| |
| systemctl status voipmonitor
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| ;4. Verify manager_key exists in database:
| | If one thread shows ~100% CPU while others are low, you have a CPU bottleneck on the capture thread (t0). |
| <syntaxhighlight lang="sql">
| |
| -- Query the manager_key from the system table
| |
| SELECT * FROM voipmonitor.`system` WHERE type='manager_key'\G
| |
| </syntaxhighlight>
| |
|
| |
|
| If no manager_key exists, check your VoIPmonitor installation and consider running the installer or contacting support to regenerate the key.
| | ==== Step 5: Decision Matrix ==== |
|
| |
|
| ;5. Check database connectivity and permissions:
| | {| class="wikitable" |
| Verify that the VoIPmonitor sniffer can connect to the database and has read access to the <code>system</code> table.
| | |- |
| <syntaxhighlight lang="bash">
| | ! Observation !! Likely Cause !! Go To |
| # Test database connectivity with the configured credentials
| |
| mysql -h <mysqlhost> -u <mysqlusername> -p <mysqldb>
| |
| | |
| # Inside MySQL, verify the user has SELECT on voipmonitor.system
| |
| SHOW GRANTS FOR 'voipmonitor_user'@'%';
| |
| </syntaxhighlight>
| |
| | |
| ;6. Check configuration consistency between probe and server:
| |
| In distributed deployments with probe and server components, ensure that both systems have consistent configuration in <code>/etc/voipmonitor.conf</code>. Specifically, both should have the same database connection settings and <code>mysqlloadconfig</code> should be enabled on both systems.
| |
| | |
| === Summary ===
| |
| | |
| The "failed read rsa key" error is almost always caused by <code>mysqlloadconfig=no</code> in <code>voipmonitor.conf</code>. The solution is to remove or change this setting to <code>yes</code>, then restart the service.
| |
| | |
| == Step 6: Check VoIPmonitor Logs for Errors ==
| |
| Finally, VoIPmonitor's own logs are the best source for clues. Check the system log for any error messages generated by the sensor on startup or during operation.
| |
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu
| |
| tail -f /var/log/syslog | grep voipmonitor
| |
| | |
| # For CentOS/RHEL/AlmaLinux
| |
| tail -f /var/log/messages | grep voipmonitor
| |
| </syntaxhighlight>
| |
| Look for errors like:
| |
| * "pcap_open_live(eth0) error: eth0: No such device" (Wrong interface name)
| |
| * "Permission denied" (The sensor is not running with sufficient privileges)
| |
| * Errors related to database connectivity.
| |
| * Messages about dropping packets.
| |
| | |
| == Step 7: Check for OOM (Out of Memory) Issues ==
| |
| If VoIPmonitor suddenly stops processing CDRs and a service restart temporarily restores functionality, the system may be experiencing OOM (Out of Memory) killer events. The Linux OOM killer terminates processes when available RAM is exhausted, and MySQL (<code>mysqld</code>) is a common target due to its memory-intensive nature.
| |
| | |
| ;1. Check for OOM killer events in kernel logs:
| |
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu
| |
| grep -i "out of memory\|killed process" /var/log/syslog | tail -20
| |
| | |
| # For CentOS/RHEL/AlmaLinux
| |
| grep -i "out of memory\|killed process" /var/log/messages | tail -20
| |
| | |
| # Also check dmesg:
| |
| dmesg | grep -i "killed process" | tail -10
| |
| </syntaxhighlight>
| |
| Typical OOM killer messages look like:
| |
| <syntaxhighlight lang="text">
| |
| Out of memory: Kill process 1234 (mysqld) score 123 or sacrifice child
| |
| Killed process 1234 (mysqld) total-vm: 12345678kB, anon-rss: 1234567kB
| |
| </syntaxhighlight>
| |
| | |
| ;2. Monitor current memory usage:
| |
| <syntaxhighlight lang="bash">
| |
| # Check available memory (look for low 'available' or 'free' values)
| |
| free -h
| |
| | |
| # Check per-process memory usage (sorted by RSS)
| |
| ps aux --sort=-%mem | head -15
| |
| | |
| # Check MySQL memory usage in bytes
| |
| cat /proc/$(pgrep mysqld)/status | grep -E "VmSize|VmRSS"
| |
| </syntaxhighlight>
| |
| Warning signs:
| |
| * '''Available memory consistently below 500MB during operation'''
| |
| * '''MySQL consuming most of the available RAM'''
| |
| * '''Swap usage near 100% (if swap is enabled)'''
| |
| * '''Frequent process restarts without clear error messages'''
| |
| | |
| ;3. Solution: Increase physical memory:
| |
| The definitive solution for OOM-related CDR processing issues is to upgrade the server's physical RAM. After upgrading:
| |
| * Verify memory improvements with <code>free -h</code>
| |
| * Monitor for several days to ensure OOM events stop
| |
| * Consider tuning <code>innodb_buffer_pool_size</code> in your MySQL configuration to use the additional memory effectively
| |
| | |
| Additional mitigation strategies (while planning for RAM upgrade):
| |
| * Reduce MySQL's memory footprint by lowering <code>innodb_buffer_pool_size</code> (e.g., from 16GB to 8GB)
| |
| * Disable or limit non-essential VoIPmonitor features (e.g., packet capture storage, RTP analysis)
| |
| * Ensure swap space is properly configured as a safety buffer (though swap is much slower than RAM)
| |
| * Use <code>sysctl vm.swappiness=10</code> to favor RAM over swap when some memory is still available
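To apply the swappiness change immediately and keep it across reboots (the file name is just an example):
<syntaxhighlight lang="bash">
# Apply now
sysctl -w vm.swappiness=10

# Persist across reboots
echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf
sysctl --system
</syntaxhighlight>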
| |
| | |
| === Troubleshooting: Runaway Processes Causing OOM ===
| |
| | |
| If the sniffer is being killed by the OOM killer but increasing RAM or tuning MySQL configuration does not resolve the issue, the root cause may be '''external runaway processes''' spawned by scripts or other applications on the sensor host.
| |
| | |
| {| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
| |
| |- | | |- |
| ! colspan="2" style="background:#ffc107;" | Differentiating OOM Scenarios
| | | <code>heap</code> high, <code>t0CPU</code> > 80%, iostat <code>%util</code> low || '''CPU Bottleneck''' || [[#Solution: CPU Bottleneck|CPU Solution]] |
| |- | | |- |
| | style="vertical-align: top;" | '''MySQL killed by OOM''' | | | <code>heap</code> high, <code>t0CPU</code> < 50%, iostat <code>%util</code> > 90% || '''I/O Bottleneck''' || [[#Solution: I/O Bottleneck|I/O Solution]] |
| | Database consuming RAM. Solution: Tune innodb_buffer_pool_size, add RAM, enable query_cache=yes in voipmonitor.conf. | |
| |- | | |- |
| | style="vertical-align: top;" | '''voipmonitor killed by OOM''' | | | <code>heap</code> high, <code>t0CPU</code> < 50%, iostat <code>%util</code> < 50%, <code>SQLq</code> growing || '''Database Bottleneck''' || [[#SQL Queue Overload|Database Solution]] |
| | Sniffer internal buffers. Solution: Reduce max_buffer_mem, lower ringbuffer, check query_cache setting. | |
| |- | | |- |
| | style="vertical-align: top;" | '''Runaway external processes causing OOM''' | | | <code>heap</code> normal, <code>comp</code> maxed, <code>tacCPU</code> all ~100% || '''Compression Bottleneck''' (type of I/O) || [[#Solution: I/O Bottleneck|I/O Solution]] |
| | Other processes spawned by scripts consume RAM. Solution: identify and disable the problematic script, then kill the orphaned processes (this section).
| |} | | |} |
|
| |
|
| === Diagnosis: Identify Runaway Processes ===
| | |
| ;Step 1: Check which process was killed by OOM:
| |
| Examine OOM killer messages to identify which process was sacrificed:
| |
| <syntaxhighlight lang="bash">
| |
| # Check for OOM events and identify killed processes
| |
| grep -i "out of memory\|killed process" /var/log/syslog | tail -10
| |
| | |
| # Alternative on some systems
| |
| dmesg | grep -i "killed process"
| |
| </syntaxhighlight>
| |
| | |
| If <code>voipmonitor</code> or another critical service is being killed, but MySQL/voipmonitor configuration is already optimized and RAM limits are reasonable, suspect runaway external processes.
| |
| | |
| ;Step 2: Identify resource-intensive processes:
| |
| Look for processes with unusually high counts or memory consumption:
| |
| <syntaxhighlight lang="bash">
| |
| # Check all processes sorted by memory usage (high RSS)
| |
| ps aux --sort=-%mem | head -20
| |
| | |
| # Count processes by name (look for suspiciously high counts)
| |
| ps aux | awk '{print $11}' | sort | uniq -c | sort -rn | head -15
| |
| | |
| # Check for specific process types (replace with process name)
| |
| ps ax | grep <process_name> | wc -l
| |
| </syntaxhighlight>
| |
| | |
| Warning signs for runaway processes:
| |
| * Hundreds or thousands of instances of the same process
| |
| * Processes consuming excessive memory (RSS column)
| |
| * Processes created in rapid succession
| |
| * Parent-child process chains that do not terminate
| |
| | |
| ;Step 3: Find the parent spawning script:
| |
| Once you identify the runaway process, find its parent and the script responsible:
| |
| <syntaxhighlight lang="bash">
| |
| # Find the parent process of the runaway process
| |
| ps -ef | grep <runaway_process_name>
| |
| | |
| # Process tree to see the parent-child relationship
| |
| pstree -p | grep <runaway_process_name>
| |
| | |
| # Locate the parent script (check path from output above)
| |
| ls -la <path_to_parent_script>
| |
| </syntaxhighlight>
| |
| | |
| Common locations for problematic scripts:
| |
| * Cron jobs (<code>/etc/cron.*</code> directories, user crontabs)
| |
| * Systemd services (<code>/etc/systemd/system/</code>)
| |
| * Custom monitoring or management scripts
| |
| * Initialization scripts in <code>/etc/init.d/</code>
| |
| | |
| === Step 4: Disable the Problematic Script ===
| |
| | |
| After identifying the parent script causing the runaway processes:
| |
| | |
| ;Option A: Comment out or remove the script entry:
| |
| <syntaxhighlight lang="bash">
| |
| # If it's a cron job, edit the crontab
| |
| crontab -e
| |
| # Comment out the problematic line with #
| |
| | |
| # If it's a system cron script
| |
| chmod -x /etc/cron.daily/<script_name>
| |
| | |
| # If it's in a cron directory
| |
| rm /etc/cron.d/<script_name>
| |
| </syntaxhighlight>
| |
| | |
| ;Option B: Add exit at the beginning of the script:
| |
| If you cannot remove the script entirely, disable it by preventing it from running:
| |
| <syntaxhighlight lang="bash">
| |
| # Edit the script
| |
| nano /path/to/problematic_script.sh
| |
| | |
| # Add 'exit' as the first line
| |
| exit
| |
| # (rest of script content below)
| |
| </syntaxhighlight>
| |
| | |
| The <code>exit</code> command at the top ensures the script terminates immediately without spawning any processes.
| |
| | |
| ;Option C: For systemd services, disable the service:
| |
| <syntaxhighlight lang="bash">
| |
| # Stop and disable the problematic service
| |
| systemctl stop <service_name>
| |
| systemctl disable <service_name>
| |
| | |
| # Verify it will not start on boot
| |
| systemctl is-enabled <service_name> # Should return "disabled"
| |
| </syntaxhighlight>
| |
| | |
| === Step 5: Kill All Orphaned Processes ===
| |
| | |
| After disabling the parent script, terminate all existing runaway processes:
| |
| <syntaxhighlight lang="bash">
| |
| # Kill all instances of a specific process
| |
| killall <process_name>
| |
| | |
| # If kill does not work, force kill
| |
| killall -9 <process_name>
| |
| | |
| # Verify no processes remain
| |
| ps ax | grep <process_name>
| |
| </syntaxhighlight>
| |
| | |
| {{Warning|Using <code>killall -9</code> will force-terminate processes. Use only for runaway processes that you have positively identified as problematic.}}
| |
| | |
| === Step 6: Verify Stability ===
| |
| | |
| After disabling the script and cleaning up processes:
| |
| | |
| ;1. Monitor memory usage:
| |
| <syntaxhighlight lang="bash">
| |
| # Check current free memory
| |
| free -h
| |
| | |
| # Watch memory usage in real-time
| |
| watch -n 2 free -h
| |
| </syntaxhighlight>
| |
| | |
| ;2. Monitor for new OOM events:
| |
| <syntaxhighlight lang="bash">
| |
| # Watch syslog for new OOM killer events
| |
| tail -f /var/log/syslog | grep -i "out of memory"
| |
| </syntaxhighlight>
| |
| | |
| ;3. Verify voipmonitor service stability:
| |
| <syntaxhighlight lang="bash">
| |
| # Check service status
| |
| systemctl status voipmonitor
| |
| | |
| # Monitor service logs
| |
| tail -f /var/log/syslog | grep voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| ;4. Monitor process counts:
| |
| <syntaxhighlight lang="bash">
| |
| # Check that runaway processes are not respawning
| |
| watch -n 5 'ps ax | grep <process_name> | wc -l'
| |
| </syntaxhighlight>
| |
| | |
| If the process count remains at 0 or a low, stable number and no new OOM events occur over several hours, the issue is resolved.
| |
| | |
| === Prevention and Best Practices === | |
| | |
| To prevent future runaway process issues:
| |
| | |
| '''Review cron jobs regularly:'''
| |
| * Audit all cron entries on the sensor host: <code>crontab -l</code>, <code>/etc/cron.*</code>
| |
| * Remove unnecessary or outdated scripts
| |
| * Check run frequency to avoid overlapping script executions
| |
| | |
| '''Monitor process counts:'''
| |
| * Implement monitoring to alert when process counts for specific services exceed thresholds
| |
| * Use tools like Nagios, Zabbix, or VoIPmonitor's own alerting to track system load
| |
| | |
| '''Use resource limits:'''
| |
| * Consider using <code>ulimit</code> or systemd's <code>MemoryMax</code> to cap memory usage for non-critical services
| |
| * Docker or other containerization solutions can provide process isolation and resource limits
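As an illustration of the systemd approach, a drop-in override can cap a non-critical service; <code>MemoryMax</code> applies to cgroup v2 systems (older cgroup v1 hosts use <code>MemoryLimit</code>), and the 1 GiB value is only an example:
<syntaxhighlight lang="bash">
# Create a drop-in override for the service (replace <service_name>)
systemctl edit <service_name>

# Add the following two lines in the editor that opens:
#   [Service]
#   MemoryMax=1G

# Apply the new limit
systemctl daemon-reload
systemctl restart <service_name>
</syntaxhighlight>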
| |
| | |
| == Step 8: PACKETBUFFER Saturation Under High Load ==
| |
| If a newly deployed sensor is experiencing memory filling up, crashes, and truncated RTP streams even with minimal traffic, the root cause is likely PACKETBUFFER saturation. This occurs when the sensor's internal packet buffer cannot keep up with the incoming packet rate, causing dropped packets and incomplete RTP stream recording.
| |
|
| |
|
| === Symptoms ===
| | After identifying the likely cause with the tools above, you can confirm with a storage disable test: |
| * VoIPmonitor process or sensor crashes under load
| |
| * Syslog shows "PACKETBUFFER: memory is FULL" errors
| |
| * RTP streams are truncated or incomplete in PCAP files
| |
| * Memory usage increases steadily during high traffic periods
| |
| * Truncated PCAP files indicate forced disk writes or process termination
| |
| {{Note|This is different from OOM (Out of Memory) issues where the OOM killer terminates processes. PACKETBUFFER saturation is an internal VoIPmonitor buffer that fills when packet processing cannot keep up with the packet rate.}}
| |
|
| |
|
| === Diagnosis: Check for PACKETBUFFER Errors ===
| |
|
| |
| ;IMPORTANT: Diagnose the Bottleneck FIRST
| |
| Before increasing buffer memory or threads, determine whether PACKETBUFFER saturation is caused by disk I/O bottlenecks or CPU/threading limitations. This diagnostic step can save significant troubleshooting time.
| |
|
| |
| ;Diagnostic: Disable Packet Saving
| |
| Temporarily disable packet saving to test if disk I/O throughput is the bottleneck causing PACKETBUFFER saturation:
| |
|
| |
| Edit <code>/etc/voipmonitor.conf</code> and set all saving options to <code>no</code>:
| |
| <syntaxhighlight lang="ini"> | | <syntaxhighlight lang="ini"> |
| # Temporarily disable all packet saving to test if disk I/O is the bottleneck | | # /etc/voipmonitor.conf - temporarily disable all storage |
| savesip = no | | savesip = no |
| savertp = no | | savertp = no |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Restart the VoIPmonitor service:
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| systemctl restart voipmonitor | | systemctl restart voipmonitor |
| | # Monitor for 5-10 minutes during peak traffic |
| | journalctl -u voipmonitor -f | grep heap |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Monitor the system under the same traffic load:
| | * If <code>heap</code> values drop to near zero → confirms '''I/O bottleneck''' |
| <syntaxhighlight lang="bash"> | | * If <code>heap</code> values remain high → confirms '''CPU bottleneck''' |
| # Watch syslog for PACKETBUFFER errors during traffic
| |
| tail -f /var/log/syslog | grep -i packetbuffer
| |
| </syntaxhighlight> | |
| | |
| ;Interpret Results:
| |
|
| |
|
| * '''PACKETBUFFER saturation continues even with saving disabled:''' The bottleneck is CPU or threading, not disk I/O. Proceed to increase threading and buffer memory in the [[#Solution:_Increase_Threading_and_Buffer_Memory|Solution]] section below.
| | {{Warning|Remember to re-enable storage after testing! This test causes call recordings to be lost.}} |
|
| |
|
| * '''PACKETBUFFER saturation stops with saving disabled:''' Disk I/O throughput is the bottleneck. The storage subsystem cannot write packets fast enough to keep up with the incoming packet rate. Consider:
| | === Solution: I/O Bottleneck === |
| ** Upgrade to faster storage (SSD/NVMe instead of HDD)
| |
| ** Use dedicated storage server with packetbuffer_sender mode ([[Sniffer_distributed_architecture|Client-Server]])
| |
| ** Increase compression to reduce write load (<code>packetbuffer_compress = yes</code>)
| |
| ** Reduce PCAP file rotation frequency or use tar compression
| |
|
| |
|
| {{Tip|After diagnosis, re-enable the saving options as needed. If disk I/O is the limiting factor, you may need to accept some PACKETBUFFER saturation or upgrade the storage subsystem.}} | | {{Note|If you see <code>IO[...] DISK_SAT</code> or <code>WARN</code> in the syslog status line (v2026.01.3+), disk saturation is already confirmed. See [[Syslog_Status_Line#IO.5B....5D_-_Disk_I.2FO_Monitoring_.28v2026.01.3.2B.29|IO[] Metrics]] for details.}} |
|
| |
|
| ;1. Monitor syslog for PACKETBUFFER saturation messages:
| | '''Quick confirmation (for older versions):''' |
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu
| |
| tail -f /var/log/syslog | grep -i packetbuffer
| |
|
| |
|
| # For CentOS/RHEL
| | Temporarily save only RTP headers to reduce disk write load: |
| tail -f /var/log/messages | grep -i packetbuffer
| |
| </syntaxhighlight>
| |
| | |
| Look for error messages such as:
| |
| <code>PACKETBUFFER: memory is FULL</code>
| |
| <code>dropping packet because packetbuffer is full</code>
| |
| | |
| ;2. Check sensor statistics in the GUI:
| |
| :* Navigate to '''Settings > Sensors'''
| |
| :* Expand the status for the affected sensor
| |
| :* Look for "# packet drops" counter or PACKETBUFFER-related errors
| |
| | |
| ;3. Check CPU thread distribution:
| |
| <syntaxhighlight lang="bash">
| |
| # View real-time CPU usage per thread
| |
| top -H -p $(pgrep voipmonitor)
| |
| </syntaxhighlight>
| |
| If one thread is at 100% CPU while others are idle, this indicates a threading bottleneck causing PACKETBUFFER saturation.
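If you prefer a non-interactive snapshot, <code>pidstat</code> from the sysstat package shows the same per-thread CPU breakdown; a sketch:
<syntaxhighlight lang="bash">
# Per-thread CPU usage of the voipmonitor process: 5 samples at 2-second intervals
pidstat -t -p $(pgrep -o voipmonitor) 2 5
</syntaxhighlight>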
| |
| | |
| === Solution: Increase Threading and Buffer Memory ===
| |
| {{Tip|For complete configuration parameter reference with specific values for high-traffic scenarios (including <code>rtpthreads_start = 20</code>, <code>threading_expanded = high_traffic</code>, <code>max_buffer_mem = 10000</code> for 8,000-10,000 concurrent calls), see [[Sniffer_configuration#Core Threading Model|Core Threading Model]].}}
| |
| | |
| ;1. Increase RTP processing threads:
| |
| Edit <code>/etc/voipmonitor.conf</code> and add or modify:
| |
| <syntaxhighlight lang="ini"> | | <syntaxhighlight lang="ini"> |
| # Increase starting number of RTP threads (default is CPU count or rtpthreads value) | | # /etc/voipmonitor.conf |
| # For high concurrent call loads (8,000-10,000+ calls), set to approximately half of CPU count
| | savertp = header |
| # Recommended value for 8k-10k calls: rtpthreads_start = 20
| |
| rtpthreads_start = 8
| |
| | |
| # Alternatively, set specific thread count higher if needed
| |
| rtpthreads_start = 16
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| ;2. Increase packet buffer memory:
| | Restart the sniffer and monitor. If heap usage stabilizes and "MEMORY IS FULL" errors stop, the issue is confirmed to be storage I/O. |
| <syntaxhighlight lang="ini">
| |
| # Increase maximum buffer memory from default 2000 MB to prevent PACKETBUFFER saturation
| |
| # Ensure sufficient system RAM is available (this memory is used for packet buffering)
| |
| max_buffer_mem = 13000
| |
| </syntaxhighlight>
| |
| {{Warning|Setting <code>max_buffer_mem</code> too high on systems with limited RAM can cause OOM issues. Monitor memory usage after increasing this value.}}
| |
|
| |
|
| ;3. Enable or verify threading_expanded mode:
| | '''Check storage health before upgrading:''' |
| <syntaxhighlight lang="ini">
| |
| # Ensure threading_expanded is enabled (should be yes by default)
| |
| # This enables the modern multi-threaded processing engine that auto-scales threads
| |
| # For 8,000-10,000+ concurrent calls, use high_traffic mode:
| |
| threading_expanded = high_traffic
| |
| </syntaxhighlight>
| |
| {{Warning|Do not enable <code>threading_mod = 4</code> as this is deprecated. Only consider the legacy <code>threading_mod = 1</code> for single-threaded processing in extremely limited CPU scenarios as a last resort.}}
| |
| | |
| === Verification After Configuration Changes ===
| |
| ;1. Restart the VoIPmonitor service:
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| systemctl restart voipmonitor
| | # Check drive health |
| </syntaxhighlight>
| | smartctl -a /dev/sda |
|
| |
|
| ;2. Monitor syslog during a high-traffic window:
| | # Check for I/O errors in system logs |
| <syntaxhighlight lang="bash">
| | dmesg | grep -i "i/o error\|sd.*error\|ata.*error" |
| tail -f /var/log/syslog | grep -i packetbuffer
| |
| </syntaxhighlight> | | </syntaxhighlight> |
| The PACKETBUFFER saturation errors should no longer appear.
| |
|
| |
|
| {{Tip|Monitor the PACKETBUFFER heap usage during peak traffic. The heap[A|B|C] values in syslog should remain below 20% for stable operation. If heap values consistently exceed 20-30%, further tuning is needed (increase threads or buffer memory, or upgrade hardware).}}
| | Look for reallocated sectors, pending sectors, or I/O errors. Replace failing drives before considering upgrades. |
|
| |
|
| ;3. Verify RTP streams are complete:
| | '''Storage controller cache settings:''' |
| Make several test calls and verify:
| | {| class="wikitable" |
| * Download the PCAP file for each call in the GUI
| | |- |
| * Open the PCAP in Wireshark or tshark
| | ! Storage Type !! Recommended Cache Mode |
| * Check that RTP streams are complete and not truncated
| | |- |
| <syntaxhighlight lang="bash">
| | | HDD / NAS || WriteBack (requires battery-backed cache) |
| # Count RTP packets in a captured PCAP
| | |- |
| tshark -r /path/to/capture.pcap -Y "rtp" | wc -l
| | | SSD || WriteThrough (or WriteBack with power loss protection) |
| </syntaxhighlight>
| | |} |
| | |
| ;4. Monitor memory stability:
| |
| <syntaxhighlight lang="bash">
| |
| # Watch memory usage
| |
| watch -n 5 free -h
| |
| | |
| # Check if OOM killer events have stopped
| |
| dmesg | grep -i "killed process" | tail -10
| |
| </syntaxhighlight>
| |
| | |
| Memory usage should stabilize and not grow indefinitely after applying these changes.
| |
| | |
| === Additional Configuration for Distributed Client/Server Mode ===
| |
| If using <code>packetbuffer_sender = yes</code> to forward packets from remote probes to a central server, additional tuning is required:
| |
| | |
| ;1. Enable packet buffer compression on the probe:
| |
| <syntaxhighlight lang="ini">
| |
| # Enable compression to reduce network transfer rate to central server
| |
| # This prevents PACKETBUFFER saturation when network bandwidth is a bottleneck
| |
| packetbuffer_compress = yes
| |
| </syntaxhighlight>
| |
| | |
| ;2. Increase max_buffer_mem on the central server:
| |
| The central server processing the mirrored traffic needs higher buffer memory:
| |
| <syntaxhighlight lang="ini">
| |
| # On the central server receiving packets
| |
| max_buffer_mem = 10000
| |
| </syntaxhighlight>
| |
| | |
| {{Note|When using <code>packetbuffer_sender = yes</code>, these settings must be applied on the '''receiving central server''', not the probe itself. The probe forwards all packets including RTP to the central server for processing.}}
| |
| | |
| === Related Documentation ===
| |
| For complete details on these parameters, see [[Sniffer_configuration#Core_Threading_Model|Core Threading Model]] and [[Scaling|Scaling and Performance Tuning]].
| |
| | |
| == Step 9: Troubleshooting System Instability Due to Storage Hardware Failure ==
| |
| | |
| If your VoIPmonitor sensor is showing as disconnected (red X in GUI) and the overall system is unstable (frequent crashes, sluggish performance, or unexpected restarts), the root cause may be '''storage hardware failure'''.
| |
| | |
| When the disk subsystem fails, VoIPmonitor cannot write packets or persist data, leading to critical log messages like:
| |
| * <code>packetbuffer: MEMORY IS FULL</code>
| |
| * <code>DROPPED PACKETS</code>
| |
| | |
| {{Warning|1=Do NOT confuse storage hardware failure with PACKETBUFFER saturation (Step 8). PACKETBUFFER saturation is due to high traffic load and is resolved by increasing threads and buffer memory. Storage hardware failure is due to physical disk problems and requires hardware replacement.}}
| |
| | |
| === Symptoms of Storage Hardware Failure ===
| |
| | |
| * Sensors showing as disconnected (red X) in the GUI
| |
| * System becoming unstable during normal operation
| |
| * Log messages indicating packetbuffer memory is full or packets being dropped
| |
| * Slow I/O operations, increased latency for database queries
| |
| * Intermittent service restarts or crashes
| |
| * Filesystem corruption errors in system logs
| |
| * Symptoms occur even with low or moderate traffic (unlike PACKETBUFFER saturation)
| |
| | |
| === Step 1: Check VoIPmonitor Service Logs for Storage Issues ===
| |
| | |
| Examine the VoIPmonitor logs for evidence of storage problems:
| |
| | |
| <syntaxhighlight lang="bash">
| |
| # Check voipmonitor service logs for packetbuffer or DROPPED messages
| |
| grep -E "packetbuffer.*MEMORY IS FULL|DROPPED PACKET" /var/log/syslog | tail -50
| |
| | |
| # For systemd systems
| |
| journalctl -u voipmonitor | grep -E "packetbuffer|DROPPED" | tail -50
| |
| </syntaxhighlight>
| |
| | |
| === Step 2: Use smartctl to Check Disk Health ===
| |
| | |
| The <code>smartctl</code> utility allows you to check the health status and error counters of physical disks in the storage array.
| |
|
| |
|
| <syntaxhighlight lang="bash"> | | Use vendor-specific tools to configure cache policy (<code>megacli</code>, <code>ssacli</code>, <code>perccli</code>). |
| # Identify all block devices (disks)
| |
| lsblk
| |
| | |
| # Check SMART health status for each disk
| |
| smartctl -H /dev/sda
| |
| smartctl -H /dev/sdb
| |
| | |
| # Get detailed information including error counters
| |
| smartctl -a /dev/sda | grep -E "SMART overall-health|Errors|Reallocated|Pending|Runtime"
| |
| </syntaxhighlight> | |
|
| |
|
| | '''Storage upgrades (in order of effectiveness):''' |
| {| class="wikitable" | | {| class="wikitable" |
| |- | | |- |
| ! SMART Metric !! What It Indicates !! Action Required | | ! Solution !! IOPS Improvement !! Notes |
| |- | | |- |
| ! ID 5: Reallocated Sector Count
| | | '''NVMe SSD''' || 50-100x vs HDD || Best option, handles 10,000+ concurrent calls |
| | Bad sectors have been remapped. Non-zero value indicates disk degradation | |
| | Monitor closely; replace if growing | |
| |- | | |- |
| ! ID 10: Spin Retry Count
| | | '''SATA SSD''' || 20-50x vs HDD || Good option, handles 5,000+ concurrent calls |
| | Drive had to retry spinning up platters. Indicates mechanical problems | |
| | Replace disk immediately | |
| |- | | |- |
| ! ID 197: Current Pending Sector Count
| | | '''RAID 10 with BBU''' || 5-10x vs single disk || Enable WriteBack cache (requires battery backup) |
| | Bad sectors waiting to be remapped. Data may be corrupted | |
| | Backup immediately; replace disk | |
| |- | | |- |
| ! ID 198: Uncorrectable Sector Count
| | | '''Separate storage server''' || Variable || Use [[Sniffer_distributed_architecture|client/server mode]] |
| | Drive encountered read/write errors it could not correct | |
| | Replace disk immediately | |
| |- | |
| ! SMART overall-health
| |
| | "FAILED" means disk is critically damaged and should be replaced | |
| | Replace immediately | |
| |} | | |} |
|
| |
|
| === Step 3: Check System Logs for Filesystem Corruption ===
| | '''Filesystem tuning (ext4):''' |
| | |
| Kernel logs often reveal filesystem or journal corruption errors indicating storage problems.
| |
| | |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Check dmesg for filesystem errors, I/O errors, or corruption | | # Check current mount options |
| dmesg | grep -E "I/O error|filesystem.*corrupt|journal.*error|EXT4-fs error" | tail -50
| | mount | grep voipmonitor |
| </syntaxhighlight>
| |
|
| |
|
| Error patterns indicating storage failure: <code>Buffer I/O error on device sdX</code>, <code>EXT4-fs error</code>, <code>journal commit I/O error</code>, <code>RAID array mdX is degraded</code>, <code>Hardware Error</code>.
| | # Recommended mount options for /var/spool/voipmonitor |
| | | # Add to /etc/fstab: noatime,data=writeback,barrier=0 |
| === Step 4: Check RAID Array Status (if applicable) ===
| | # WARNING: barrier=0 requires battery-backed RAID |
| | |
| <syntaxhighlight lang="bash">
| |
| # Check mdadm software RAID status
| |
| cat /proc/mdstat
| |
| mdadm --detail /dev/md0
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| If RAID state is "Degraded", replace the failed disk and rebuild the array. If "Failed", multiple disks have failed - risk of total data loss.
| | '''Verify improvement:''' |
| | |
| === Step 5: Replace Failing Hardware ===
| |
| | |
| 1. Identify failing disk with <code>lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,SERIAL</code>
| |
| 2. Physically replace the failing drive with a new one
| |
| 3. Rebuild array with <code>mdadm --add /dev/md0 /dev/sdX</code> and monitor with <code>cat /proc/mdstat</code>
| |
| 4. Verify replacement with <code>smartctl -H /dev/sdX</code>
| |
| 5. Restart VoIPmonitor: <code>systemctl restart voipmonitor</code>
| |
| | |
| === Prevention: Monitor Storage Health Proactively ===
| |
| | |
| * Enable SMART monitoring: Install smartmontools, enable smartd service, configure <code>/etc/smartd.conf</code> with <code>DEVICESCAN -H -m email@example.com</code>
| |
| * Set up RAID monitoring with cron script: <code>grep degraded /proc/mdstat</code>
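An illustrative cron entry for such a check is shown below; it looks for the underscore that appears inside the status brackets of <code>/proc/mdstat</code> (e.g. <code>[U_]</code>) when an array member is missing, and assumes a working local <code>mail</code> command and an example alert address:
<syntaxhighlight lang="bash">
# /etc/cron.d/raid-check (example): alert every 15 minutes if an md array is degraded
*/15 * * * * root grep -q '\[.*_.*\]' /proc/mdstat && echo "RAID degraded on $(hostname)" | mail -s "RAID alert" admin@example.com
</syntaxhighlight>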
| |
| | |
| === Related Documentation ===
| |
| | |
| * [[Emergency_procedures|Diagnosing Database Bottlenecks]] - General database tuning
| |
| * [[IO_Measurement|IO Measurement]] - Benchmarking storage performance
| |
| | |
| == Step 10: Missing CDRs for Calls with Large Packets ==
| |
| If VoIPmonitor is capturing some calls successfully but missing CDRs for specific calls (especially those that seem to have larger SIP packets like INVITEs with extensive SDP), there are two common causes to investigate.
| |
| | |
| === Cause 1: snaplen Packet Truncation (VoIPmonitor Configuration) ===
| |
| The <code>snaplen</code> parameter in <code>voipmonitor.conf</code> limits how many bytes of each packet are captured. If a SIP packet exceeds <code>snaplen</code>, it is truncated and the sniffer may fail to parse the call correctly.
| |
| | |
| ;1. Check your current snaplen setting:
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| grep snaplen /etc/voipmonitor.conf
| | # After changes, monitor iostat |
| | iostat -xz 2 10 |
| | # %util should drop below 70%, await should decrease |
| </syntaxhighlight> | | </syntaxhighlight> |
| Default is 3200 bytes (6000 if SSL/HTTP is enabled).
| |
|
| |
|
| ;2. Test if packet truncation is the issue:
| | === Solution: CPU Bottleneck === |
| Use <code>tcpdump</code> with <code>-s0</code> (snap infinite) to capture full packets:
| |
| <syntaxhighlight lang="bash">
| |
| # Capture SIP traffic with full packet length
| |
| tcpdump -i eth0 -s0 -nn port 5060 -w /tmp/test_capture.pcap
| |
|
| |
|
| # Analyze packet sizes with Wireshark or tshark
| | ==== Identify CPU Bottleneck Using Manager Commands ==== |
| tshark -r /tmp/test_capture.pcap -T fields -e frame.len -Y "sip" | sort -n | tail -10
| |
| </syntaxhighlight>
| |
| If you see SIP packets larger than your <code>snaplen</code> value (e.g., 4000+ bytes), increase <code>snaplen</code> in <code>voipmonitor.conf</code>:
| |
| <syntaxhighlight lang="ini">
| |
| snaplen = 65535
| |
| </syntaxhighlight>
| |
| Then restart the sniffer: <code>systemctl restart voipmonitor</code>.
| |
| | |
| === Cause 2: MTU Mismatch (Network Infrastructure) === | |
| | |
| {{Note|This cause is different from snaplen truncation. MTU mismatch is a network infrastructure issue where packets never reach VoIPmonitor, while snaplen is a VoIPmonitor configuration setting.}}
| |
| | |
| If packets are being lost or fragmented due to MTU mismatches in the network path, VoIPmonitor may never receive the complete packets, regardless of <code>snaplen</code> settings. This occurs when the network path between servers contains a device with a lower MTU (Maximum Transmission Unit), causing packets to be fragmented or dropped.
| |
|
| |
|
| ;1. Diagnose MTU-related packet loss:
| | VoIPmonitor provides manager commands to monitor thread CPU usage in real-time. This is essential for identifying which thread is saturated. |
| Capture network traffic directly on the VoIPmonitor host using tcpdump with <code>-s0</code> to ensure full packet length:
| |
|
| |
|
| | '''Connect to manager interface:''' |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Capture traffic on the VoIPmonitor host with full packet length | | # Via Unix socket (local, recommended) |
| # Replace <ip_address> with the IP address of the device you're monitoring (e.g., PBX)
| | echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket |
| tcpdump -i any -s0 host <ip_address> -w /tmp/mtu_test.pcap
| |
| </syntaxhighlight>
| |
|
| |
|
| Analyze the resulting pcap file with a tool like Wireshark to verify if large packets are being captured completely or are missing fragments.
| | # Via TCP port 5029 (remote or local) |
| | echo 'sniffer_threads' | nc 127.0.0.1 5029 |
|
| |
|
| ;2. Verify packet completeness in Wireshark:
| | # Monitor continuously (every 2 seconds) |
| Open the pcap in Wireshark and look for these indicators of MTU issues:
| | watch -n 2 "echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket" |
| * Large SIP packets (INVITE with extensive SDP) appearing truncated or incomplete
| |
| * Missing IP fragments (e.g., only the first fragment arrives, subsequent fragments are missing)
| |
| * ICMP "Fragmentation needed" messages (Type 3, Code 4) visible in the capture
| |
| * TCP retransmissions for the same large packet
| |
| * Reassembled PDUs marked as incomplete or "[Malformed] Packet"
| |
| | |
| Examine large SIP INVITE packets specifically. If the SIP headers or SDP appear cut off or incomplete, the issue is confirmed to be MTU mismatch in the network path.
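
To check for these indicators from the command line instead of the Wireshark UI, the following tshark filters can be used (shown against the capture file from the previous step):

<syntaxhighlight lang="bash">
# ICMP "fragmentation needed" messages (Type 3, Code 4)
tshark -r /tmp/mtu_test.pcap -Y "icmp.type == 3 && icmp.code == 4"

# IP fragments: more-fragments flag set or a non-zero fragment offset
tshark -r /tmp/mtu_test.pcap -Y "ip.flags.mf == 1 || ip.frag_offset > 0" | head -20
</syntaxhighlight>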
| |
| | |
| ;3. Identify the MTU bottleneck:
| |
| The issue is typically a network device with a lower MTU than the end devices, often 1500 bytes in networks that otherwise use 9000-byte jumbo frames. Common locations for MTU bottlenecks:
| |
| * VPN concentrators
| |
| * Firewalls with lower MTU settings
| |
| * Routers with tunnel interfaces
| |
| * Cloud provider gateways (typically 1500 bytes vs. standard 9000 jumbo frames)
| |
| * Switches or routers with MTU mismatches between physical ports
| |
| | |
| To locate the problematic device, trace the MTU along the network path from the PBX to the VoIPmonitor sensor.
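
One practical way to do this is to probe the path MTU from the PBX (or another host on the same segment) toward the sensor with ping using the don't-fragment flag, or with tracepath. This is a sketch using Linux iputils syntax; replace <code><sensor_ip></code> with the VoIPmonitor sensor address:

<syntaxhighlight lang="bash">
# 1472 bytes of ICMP payload + 28 bytes of headers = 1500-byte packet
# If this fails while smaller sizes succeed, a 1500-byte MTU device is in the path
ping -M do -s 1472 -c 3 <sensor_ip>

# For jumbo-frame environments, test the full frame (8972 + 28 = 9000 bytes)
ping -M do -s 8972 -c 3 <sensor_ip>

# tracepath reports the discovered path MTU hop by hop
tracepath <sensor_ip>
</syntaxhighlight>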
| |
| | |
| ;4. Resolution options:
| |
| Once the capture shows incomplete large packets, the solution is to investigate the network infrastructure:
| |
| | |
| * Increase MTU on the bottleneck device to match the rest of the network (e.g., from 1500 to 9000 for jumbo frame environments)
| |
| * Enable Path MTU Discovery (PMTUD) on intermediate devices to allow proper fragmentation negotiation
| |
| * Ensure your switching infrastructure supports jumbo frames end-to-end if you are using them
| |
| * Check all devices in the path (routers, firewalls, switches) for consistent MTU configuration
| |
| * For VPN/tunnel environments, verify MTU settings on tunnel interfaces
| |
| | |
| {{Warning|MTU mismatch cannot be fixed with VoIPmonitor configuration changes. Increasing snaplen will not help because the complete packet never reaches the sensor due to network-level fragmentation/dropping.}}
| |
| | |
| For more information on the <code>snaplen</code> parameter (which is different from MTU issues), see [[Sniffer_configuration#Network_Interface_.26_Sniffing|Sniffer Configuration]].
| |
| | |
| === Cause 3: External Source Packet Truncation (Traffic Mirroring/LBS Modules) ===
| |
| If packets are truncated or corrupted BEFORE they reach VoIPmonitor, changing <code>snaplen</code> will NOT fix the issue. This scenario occurs when using external SIP sources that have their own packet size limitations.
| |
| | |
| ; Symptoms to identify this scenario:
| |
| * Large SIP packets (e.g., WebRTC INVITE with big Authorization headers ~4k) appear truncated
| |
| * Packets show as corrupted or malformed in the VoIPmonitor GUI
| |
| * Changing <code>snaplen</code> in <code>voipmonitor.conf</code> has no effect
| |
| * Using TCP instead of UDP in the external system does not resolve the issue
| |
| | |
| ; Common external sources that may truncate packets:
| |
| # Kamailio <code>siptrace</code> module
| |
| # FreeSWITCH <code>sip_trace</code> module
| |
| # OpenSIPS tracing modules
| |
| # Custom HEP/HOMER agent implementations
| |
| # Load balancers or proxy servers with traffic mirroring
| |
| | |
| ; Diagnose external source truncation:
| |
| Use <code>tcpdump</code> with <code>-s0</code> (no snapshot length limit) on the VoIPmonitor sensor to compare packet sizes:
| |
| <syntaxhighlight lang="bash">
| |
| # Capture traffic received by VoIPmonitor
| |
| sudo tcpdump -i eth0 -s0 -nn port 5060 -w /tmp/voipmonitor_input.pcap
| |
| | |
| # Analyze actual packet sizes received
| |
| tshark -r /tmp/voipmonitor_input.pcap -T fields -e frame.len -Y "sip.Method == INVITE" | sort -n | tail -10
| |
| </syntaxhighlight>
| |
| | |
| If:
| |
| * You see packets with truncated SIP headers or incomplete SDP
| |
| * The packet length is much smaller than expected (e.g., 1500 bytes instead of 4000+ bytes)
| |
| * Truncation is consistent across all calls
| |
| | |
| Then the external source is truncating packets before they reach VoIPmonitor.
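
If you also have shell access to the exporting host (for example the Kamailio or FreeSWITCH server), comparing the largest SIP frame seen on each side pins down where the truncation happens. This is a sketch; adjust the interface and the port your tracing/mirroring module actually sends to:

<syntaxhighlight lang="bash">
# On the exporting host: capture the mirrored/traced SIP stream with full length
tcpdump -i any -s0 -nn port 5060 -w /tmp/source_side.pcap

# Largest SIP frame on the exporting host vs. on the VoIPmonitor sensor
tshark -r /tmp/source_side.pcap       -T fields -e frame.len -Y "sip" | sort -n | tail -1
tshark -r /tmp/voipmonitor_input.pcap -T fields -e frame.len -Y "sip" | sort -n | tail -1
</syntaxhighlight>

If the maximum frame size is already small on the exporting host, the truncation happens before the packets ever leave that system.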
| |
| | |
| ; Solutions for Kamailio siptrace truncation:
| |
| If using Kamailio's <code>siptrace</code> module with traffic mirroring:
| |
| | |
| 1. Configure Kamailio to use TCP transport for siptrace (may help in some cases):
| |
| <syntaxhighlight lang="ini">
| |
| # In kamailio.cfg
| |
| modparam("siptrace", "duplicate_uri", "sip:voipmonitor_ip:port;transport=tcp")
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| 2. If Kamailio reports "Connection refused", VoIPmonitor does not open a TCP listener by default. Manually open one:
| | {{Note|1=TCP port 5029 is encrypted by default. For unencrypted access, set <code>manager_enable_unencrypted = yes</code> in voipmonitor.conf (security risk on public networks).}} |
| <syntaxhighlight lang="bash"> | |
| # Open TCP listener using socat
| |
| # socat needs a second address; /dev/null simply accepts and discards the stream
| socat TCP-LISTEN:5888,fork,reuseaddr /dev/null &
| |
| </syntaxhighlight> | |
| Then update kamailio.cfg to use the specified port instead of the standard SIP port.
| |
|
| |
|
| 3. Use HAProxy traffic 'tee' function (recommended):
| | '''Example output:''' |
| If your architecture includes HAProxy in front of Kamailio, use its traffic mirroring to send a copy of the WebSocket traffic directly to VoIPmonitor's standard SIP listening port. This bypasses the siptrace module entirely and preserves original packets:
| |
| <syntaxhighlight lang="text"> | | <syntaxhighlight lang="text"> |
| # In haproxy.cfg, within your frontend/backend configuration
| | t0 - binlog1 fifo pcap read ( 12345) : 78.5 FIFO 99 1234 |
| # Send a copy of traffic to VoIPmonitor
| | t2 - binlog1 pb write ( 12346) : 12.3 456 |
| option splice-response
| | rtp thread binlog1 binlog1 0 ( 12347) : 8.1 234 |
| tcp-request inspect-delay 5s
| | rtp thread binlog1 binlog1 1 ( 12348) : 6.2 198 |
| tcp-request content accept if { req_ssl_hello_type 1 }
| | t1 - binlog1 call processing ( 12349) : 4.5 567 |
| use-server voipmonitor if { req_ssl_hello_type 1 }
| | tar binlog1 compression 0 ( 12350) : 3.2 89 |
| listen voipmonitor_mirror
| |
| bind :5888
| |
| mode tcp
| |
| server voipmonitor <voipmonitor_sensor_ip>:5060 send-proxy
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Note: The exact HAProxy configuration depends on your architecture and whether you are mirroring TCP (WebSocket) or UDP traffic.
| | '''Column interpretation:''' |
| | {| class="wikitable" |
| | |- |
| | ! Column !! Description |
| | |- |
| | | Thread name || Descriptive name (t0=capture, t1=call processing, t2=packet buffer write) |
| | |- |
| | | (TID) || Linux thread ID (useful for <code>top -H -p TID</code>) |
| | |- |
| | | CPU % || Current CPU usage percentage - '''key metric''' |
| | |- |
| | | Sched || Scheduler type (FIFO = real-time, empty = normal) |
| | |- |
| | | Priority || Thread priority |
| | |- |
| | | CS/s || Context switches per second |
| | |} |
|
| |
|
| ; Solutions for other external sources:
| | '''Critical threads to watch:''' |
| # Check the external system's documentation for packet size limits or truncation settings
| | {| class="wikitable" |
| # Consider using standard network mirroring (SPAN/ERSPAN/GRE) instead of SIP tracing modules
| | |- |
| # Ensure the external system captures full packet lengths (disable any internal packet size caps)
| | ! Thread !! Role !! If at 90-100% |
| # Verify that the external system does not reassemble or modify SIP packets before forwarding
| | |- |
| | | '''t0''' (pcap read) || Packet capture from NIC || '''Single-core limit reached!''' Cannot parallelize. Need DPDK/Napatech. |
| | |- |
| | | '''t2''' (pb write) || Packet buffer processing || Processing bottleneck. Check t2CPU breakdown. |
| | |- |
| | | '''rtp thread''' || RTP packet processing || Threads auto-scale. If still saturated, consider DPDK/Napatech. |
| | |- |
| | | '''tar compression''' || PCAP archiving || I/O bottleneck (compression waiting for disk) |
| | |- |
| | | '''mysql store''' || Database writes || Database bottleneck. Check SQLq metric. |
| | |} |
|
| |
|
| == Troubleshooting: CDR Shows 000 No Response Despite Valid SIP Response ==
| | {{Warning|If '''t0 thread is at 90-100%''', you have hit the fundamental single-core capture limit. The t0 thread reads packets from the kernel and '''cannot be parallelized'''. Disabling features like jitterbuffer will NOT help - those run on different threads. The only solutions are: |
| | * '''Reduce captured traffic''' using <code>interface_ip_filter</code> or BPF <code>filter</code> |
| | * '''Use kernel bypass''' ([[DPDK]] or [[Napatech]]) which eliminates kernel overhead entirely}} |
|
| |
|
| If the CDR View displays "000 No Response" in the Last Response column for calls that actually have valid final SIP response codes (such as 403 Forbidden, 500 Server Error, etc.), this indicates that the sniffer is receiving response packets but failing to correlate them with their corresponding INVITE transactions before writing the CDR.
| | ==== Interpreting t2CPU Detailed Breakdown ==== |
|
| |
|
| === Diagnosis: Verify Response Packets Are Captured ===
| | The syslog status line shows <code>t2CPU</code> with detailed sub-metrics: |
| | | <syntaxhighlight lang="text"> |
| ;1. Locate the affected call in the CDR View:
| | t2CPU[pb:10/ d:39/ s:24/ e:17/ c:6/ g:6/ r:7/ rm:24/ rh:16/ rd:19/] |
| :* Find a call showing "000 No Response" in the Last Response column.
| |
| | |
| ;2. Check the SIP History:
| |
| :* Click the [+] icon to expand the call's detail view. | |
| :* Open the "SIP History" tab. | |
| :* Look for the actual SIP response (e.g., 403 Forbidden, 486 Busy Here, 500 Internal Server Error). | |
| | |
| If the response packet IS present in SIP History, the issue is a correlation timing problem. Proceed to the solution below.
| |
| | |
| If the response packet is NOT present in SIP History, the issue is a network visibility problem (see [[#SPAN_Configuration_Troubleshooting|Step 3: Investigate Packet Encapsulation]] and other network troubleshooting sections).
| |
| | |
| === Root Cause: libpcap Packet Queue Timeout ===
| |
| | |
| The issue is caused by VoIPmonitor's libpcap packet capture timing out before responses can be matched to their originating INVITEs. This typically occurs in high-traffic environments or when packet processing is temporarily delayed due to system load.
| |
| | |
| The sniffer creates CDR records based on SIP INVITE packets. It attempts to correlate subsequent SIP responses (403, 500, etc.) with the original INVITE. If the packet queue processing is too slow or the time window is too short, responses arrive after the CDR has already been written with "Last Response" set to 0.
| |
| | |
| === Solution: Configure libpcap Nonblocking Mode ===
| |
| | |
| Edit the "/etc/voipmonitor.conf" file on the sniffer host and add the following parameters:
| |
| | |
| <syntaxhighlight lang="ini">
| |
| # Enable libpcap nonblocking mode to prevent packet queue blocking
| |
| libpcap_nonblock_mode = yes
| |
| | |
| # Increase packet deque window length (in milliseconds) for response correlation
| |
| # Default is often 2000ms, increasing to 5000ms gives more time for responses
| |
| pcap_queue_deque_window_length = 5000
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Save the file and restart the voipmonitor service:
| | {| class="wikitable" |
| | | |- |
| <syntaxhighlight lang="bash">
| | ! Code !! Function !! High Value Indicates |
| systemctl restart voipmonitor
| | |- |
| </syntaxhighlight>
| | | '''pb''' || Packet buffer output || Buffer management overhead |
| | | |- |
| === Additional Considerations ===
| | | '''d''' || Dispatch || Structure creation bottleneck |
| | |
| ;If the issue persists after applying the fix:
| |
| :* Try increasing <code>pcap_queue_deque_window_length</code> further (e.g., to 7000 or 10000 milliseconds)
| |
| :* Check system load to ensure the server is not under heavy CPU or I/O pressure
| |
| :* Verify adequate <code>ringbuffer</code> size is configured for your traffic volume (see [[Scaling|Scaling and Performance Tuning]])
| |
| | |
| ;For distributed architectures:
| |
| :* Ensure all voipmonitor hosts have synchronized time (see [[#Verify_System_Time_Synchronization]])
| |
| :* Time mismatches between components can cause correlation failures
| |
| | |
| {{Note|The <code>pcap_queue_deque_window_length</code> parameter is also used in distributed mirroring scenarios to sort packets from multiple mirrors. Increasing this value improves packet correlation in both single-sensor and distributed setups.}}
| |
| | |
| For more information on packet capture configuration, see [[Sniffer_configuration|Sniffer Configuration]].
| |
| | |
| == Step 9: Probe Timeout Due to Virtualization Timing Issues ==
| |
| | |
| If remote probes are intermittently disconnecting from the central server with timeout errors, even on a high-performance network with low load, the issue may be related to virtualization host timing problems rather than network connectivity.
| |
| | |
| === Diagnosis: Check System Log Timing Intervals ===
| |
| | |
| The VoIPmonitor sensor generates status log messages approximately every 10 seconds during normal operation. If the timing system on the probe is inconsistent, the interval between these status messages can exceed 30 seconds, triggering a connection timeout.
| |
| | |
| ;1. Monitor the system log on the affected probe:
| |
| <syntaxhighlight lang="bash">
| |
| tail -f /var/log/syslog | grep voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| ;2. Examine the timestamps of voipmonitor status messages:
| |
| Look for repeating log entries that should appear approximately every 10 seconds during normal operations.
| |
| | |
| ;3. Identify timing irregularities:
| |
| Calculate the time interval between successive status log entries. '''If the interval exceeds 30 seconds''', this indicates a timing system problem that will cause connection timeouts with the central server.
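
A quick way to compute these intervals is to read the journal with epoch timestamps and print the gap between consecutive messages (a sketch assuming systemd/journald; the service unit is named <code>voipmonitor</code> as used elsewhere in this guide):

<syntaxhighlight lang="bash">
# Print the gap in seconds between consecutive voipmonitor log messages
journalctl -u voipmonitor --since "15 min ago" -o short-unix \
  | awk '{ if (prev) printf "gap: %.1f s\n", $1 - prev; prev = $1 }'
</syntaxhighlight>

Gaps consistently close to 10 seconds are normal; gaps above 30 seconds confirm the timing problem described below.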
| |
| | |
| === Root Cause: Virtualization Host RDTSC Issues ===
| |
| | |
| This problem is '''not''' network-related. It is a host-level timing issue that impacts the application's internal timers.
| |
| | |
| The issue typically occurs on virtualized probes where the host's CPU timekeeping is inconsistent. Specifically, problems with the RDTSC (Read Time-Stamp Counter) CPU instruction on the virtualization host can cause:
| |
| | |
| * Irregular system clock behavior on the guest VM
| |
| * Application timers that do not fire consistently
| |
| * Sporadic timeouts in client-server connections
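
A quick check on the guest is which clocksource the kernel is actually using; an unstable TSC often shows up as the kernel falling back to another clocksource or logging TSC warnings:

<syntaxhighlight lang="bash">
# Clocksource currently in use and the alternatives available on the guest
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
cat /sys/devices/system/clocksource/clocksource0/available_clocksource

# Kernel messages about TSC or clocksource instability
dmesg | grep -iE "tsc|clocksource"
</syntaxhighlight>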
| |
| | |
| === Resolution ===
| |
| | |
| ;1. Investigate the virtualization host configuration:
| |
| Check the host's hypervisor or virtualization platform documentation for known timekeeping issues related to RDTSC.
| |
| | |
| Common virtualization platforms with known timing considerations:
| |
| * KVM/QEMU: Check CPU passthrough and TSC mode settings
| |
| * VMware: Verify time synchronization between guest and host
| |
| * Hyper-V: Review Integration Services time sync configuration
| |
| * Xen: Check TSC emulation settings
| |
| | |
| ;2. Apply host-level fixes:
| |
| These are host-level fixes, not changes to the guest VM configuration. Consult your virtualization platform's documentation for specific steps to address RDTSC timing issues.
| |
| | |
| Typical solutions include:
| |
| * Enabling appropriate TSC modes on the host
| |
| * Configuring CPU features passthrough correctly
| |
| * Adjusting hypervisor timekeeping parameters
| |
| | |
| ;3. Verify the fix:
| |
| After applying the host-level configuration changes, monitor the probe's status logs again to confirm that the timing intervals are now consistently around 10 seconds (never exceeding 30 seconds).
| |
| | |
| <syntaxhighlight lang="bash">
| |
| # Monitor for regular status messages
| |
| tail -f /var/log/syslog | grep voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| Once the timing is corrected, probe connections to the central server should remain stable without intermittent timeouts.
| |
| | |
| == Troubleshooting: Audio Missing on One Call Leg ==
| |
| | |
| If the sniffer captures full audio on one call leg (e.g., carrier/outside) but only partial or no audio on the other leg (e.g., PBX/inside), use this diagnostic workflow to identify the root cause BEFORE applying any configuration fixes.
| |
| | |
| The key question to answer is: '''Are the RTP packets for the silent leg present on the wire?'''
| |
| | |
| === Step 1: Use tcpdump to Capture Traffic During a Test Call ===
| |
| | |
| Initiate a new test call that reproduces the issue. During the call, use tcpdump or tshark directly on the sensor's sniffing interface to capture all traffic:
| |
| | |
| <syntaxhighlight lang="bash">
| |
| # Capture traffic to a file during the test call
| |
| # Replace eth0 with your sniffing interface
| |
| tcpdump -i eth0 -s0 -w /tmp/direct_capture.pcap
| |
| | |
| # OR: Display live traffic for specific IPs (useful for real-time diagnostics)
| |
| tcpdump -i eth0 -s0 -nn "host <pbx_ip> or host <carrier_ip>"
| |
| </syntaxhighlight>
| |
| | |
| Let the call run for 10-30 seconds, then stop tcpdump with Ctrl+C.
| |
| | |
| === Step 2: Retrieve VoIPmonitor GUI's PCAP for the Same Call ===
| |
| | |
| After the call completes:
| |
| 1. Navigate to the '''CDR View''' in the VoIPmonitor GUI
| |
| 2. Find the test call you just made
| |
| 3. Download the PCAP file for that call (click the PCAP icon/button)
| |
| 4. Save it as: <code>/tmp/gui_capture.pcap</code>
| |
| | |
| === Step 3: Compare the Two Captures ===
| |
| | |
| Analyze both captures to determine if RTP packets for the silent leg are present on the wire:
| |
| | |
| <syntaxhighlight lang="bash">
| |
| # Count RTP packets in the direct capture
| |
| tshark -r /tmp/direct_capture.pcap -Y "rtp" | wc -l
| |
| | |
| # Count RTP packets in the GUI capture
| |
| tshark -r /tmp/gui_capture.pcap -Y "rtp" | wc -l
| |
| | |
| # Check for RTP from specific source IPs in the direct capture
| |
| tshark -r /tmp/direct_capture.pcap -Y "rtp" -T fields -e rtp.ssrc -e ip.src -e ip.dst
| |
| | |
| # Check Call-ID in both captures to verify they're the same call
| |
| tshark -r /tmp/direct_capture.pcap -Y "sip" -T fields -e sip.Call-ID | head -1
| |
| tshark -r /tmp/gui_capture.pcap -Y "sip" -T fields -e sip.Call-ID | head -1
| |
| </syntaxhighlight>
| |
| | |
| === Step 4: Interpret the Results ===
| |
| | |
| {| class="wikitable" style="background:#e7f3ff; border:1px solid #3366cc;"
| |
| |- | | |- |
| ! colspan="2" style="background:#3366cc; color: white;" | Diagnostic Decision Matrix
| | | '''s''' || SIP parsing || Complex/large SIP messages |
| |- | | |- |
| ! Observation
| | | '''e''' || Entity lookup || Call table lookup overhead |
| ! Root Cause & Next Steps
| |
| |- | | |- |
| | '''RTP packets for silent leg are NOT present in direct capture''' | | | '''c''' || Call processing || Call state machine processing |
| | '''Network/PBX Issue:''' The PBX or network is not sending the packets. This is not a VoIPmonitor problem. Troubleshoot the PBX (check NAT, RTP port configuration) or network (SPAN/mirror configuration, firewall rules). | |
| |- | | |- |
| | '''RTP packets for silent leg ARE present in direct capture but missing in GUI capture''' | | | '''g''' || Register processing || High REGISTER volume |
| | '''Sniffer Configuration Issue:''' Packets are on the wire but VoIPmonitor is failing to capture or correlate them. Likely causes: NAT IP mismatch (natalias configuration incorrect), SIP signaling advertises different IP than RTP source, or restrictive filter rules. Proceed with configuration fixes. | |
| |- | | |- |
| | '''RTP packets present in both captures but audio still silent''' | | | '''r, rm, rh, rd''' || RTP processing stages || High RTP volume (threads auto-scale) |
| | '''Codec/Transcoding Issue:''' Packets are captured correctly but may not be decoded properly. Check codec compatibility, unsupported codecs, or transcoding issues on the PBX. | |
| |} | | |} |
|
| |
|
| === Step 5: Apply the Correct Fix Based on Diagnosis ===
| | '''Thread auto-scaling:''' VoIPmonitor automatically spawns additional threads when load increases: |
| | | * If '''d''' > 50% → SIP parsing thread ('''s''') starts |
| ;If RTP is NOT on the wire (Network/PBX issue):
| | * If '''s''' > 50% → Entity lookup thread ('''e''') starts |
| * Check PBX RTP port configuration and firewall rules | | * If '''e''' > 50% → Call/register/RTP threads start |
| * Verify network SPAN/mirror is capturing bidirectional traffic (see [[#SPAN_Configuration_Troubleshooting|Section 3]])
| |
| * Check PBX NAT settings - RTP packets may be blocked or routed incorrectly
| |
| | |
| ;If RTP is on the wire but not captured (Sniffer configuration issue):
| |
| | |
| ==== Check rtp_check_both_sides_by_sdp Setting (Primary Cause) ====
| |
| | |
| This is the '''most common cause''' of one-way RTP capture when packets are present on the wire. The <code>rtp_check_both_sides_by_sdp</code> parameter controls how strictly RTP streams are correlated with SDP (Session Description Protocol) signaling.
| |
| | |
| Check the current setting in <code>/etc/voipmonitor.conf</code>:
| |
| <syntaxhighlight lang="bash">
| |
| grep "^rtp_check_both_sides_by_sdp" /etc/voipmonitor.conf
| |
| </syntaxhighlight>
| |
| | |
| If the setting is <code>yes</code>, <code>strict</code>, or <code>very_strict</code>, RTP streams are only accepted when '''both sides of the RTP exactly match the SDP (SIP signaling)''':
| * <code>strict</code>: Only allows verified packets after first match (blocks unverified)
| |
| * <code>very_strict</code>: Blocks all unverified packets (most restrictive) | |
| * <code>keep_rtp_packets</code>: Same as <code>yes</code> but stores unverified packets for debugging
| |
| | |
| Symptoms of restrictive <code>rtp_check_both_sides_by_sdp</code> settings:
| |
| * Only one call leg appears in CDR (caller OR called, not both)
| |
| * Received packets column shows 0 or very low on one leg
| |
| * tcpdump shows both RTP streams present, but GUI captures only one
| |
| * Affects many calls, not just specific ones
| |
|
| |
|
| '''Solution:''' Change to <code>no</code> or comment out the line:
| | ==== Configuration for High Traffic (>10,000 calls/sec) ==== |
|
| |
|
| <syntaxhighlight lang="ini"> | | <syntaxhighlight lang="ini"> |
| ; /etc/voipmonitor.conf
| | # /etc/voipmonitor.conf |
| rtp_check_both_sides_by_sdp = no
| |
| </syntaxhighlight>
| |
|
| |
|
| Restart the sniffer to apply:
| | # Increase buffer to handle processing spikes (value in MB) |
| | # 10000 = 10 GB - can go higher (20000, 30000+) if RAM allows |
| | # Larger buffer absorbs I/O and CPU spikes without packet loss |
| | max_buffer_mem = 10000 |
|
| |
|
| <syntaxhighlight lang="bash">
| | # Use IP filter instead of BPF (more efficient) |
| systemctl restart voipmonitor
| | interface_ip_filter = 10.0.0.0/8 |
| | interface_ip_filter = 192.168.0.0/16 |
| | # Comment out any 'filter' parameter |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| If you previously set <code>rtp_check_both_sides_by_sdp = yes</code> to solve audio mixing issues in multi-call environments where multiple calls share the same IP:port, consider alternative approaches such as <code>sdp_multiplication</code> instead, because strict checking can prevent one leg of the RTP from being captured at all.
| | ==== CPU Optimizations ==== |
| | |
| ==== Advanced natalias Troubleshooting ==== | |
| | |
| If the basic natalias configuration does not resolve missing audio issues, try these advanced diagnostic steps.
| |
| | |
| === Try Reversing IP Order ===
| |
| | |
| The <code>natalias</code> parameter accepts two IP addresses, but the correct order depends on your NAT topology. If your initial configuration does not work, try the REVERSE order:
| |
|
| |
|
| <syntaxhighlight lang="ini"> | | <syntaxhighlight lang="ini"> |
| ; Try this first (most common):
| | # /etc/voipmonitor.conf |
| natalias = 44.198.136.90 10.100.220.90
| |
| ; ^ Public IP ^ Private IP
| |
| | |
| ; If that doesn't work, try the reverse:
| |
| natalias = 10.100.220.90 44.198.136.90
| |
| ; ^ Private IP ^ Public IP
| |
| </syntaxhighlight>
| |
| | |
| {{Tip|The correct order depends on which address appears in the SDP and which address the RTP packets actually arrive from. If the first order does not work, test the reverse order to determine which matches your environment.}}
| |
| | |
| === Verify Root Cause: SDP Port vs Actual RTP Port Mismatch ===
| |
|
| |
|
| If natalias configuration (with both IP orders tested) still does not resolve missing audio, perform a packet capture to compare the RTP ports.
| | # Reduce jitterbuffer calculations to save CPU (keeps MOS-F2 metric) |
| | jitterbuffer_f1 = no |
| | jitterbuffer_f2 = yes |
| | jitterbuffer_adapt = no |
|
| |
|
| '''Step 1: Capture a problematic call:'''
| | # If MOS metrics are not needed at all, disable everything: |
| <syntaxhighlight lang="bash">
| | # jitterbuffer_f1 = no |
| # Capture SIP and RTP during a test call | | # jitterbuffer_f2 = no |
| tcpdump -i eth0 -nn "host <pbx_ip> or host <carrier_ip>" -w /tmp/missing_rtp.pcap
| | # jitterbuffer_adapt = no |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| '''Step 2: Compare SDP-advertised ports with actual RTP ports in the capture:'''
| | ==== Kernel Bypass Solutions (Extreme Loads) ==== |
|
| |
|
| Open the pcap in Wireshark or tshark:
| | When t0 thread hits 100% on standard NIC, kernel bypass is the only solution: |
|
| |
|
| <syntaxhighlight lang="bash">
| | {| class="wikitable" |
| # Find SIP INVITE and extract SDP RTP ports
| |
| tshark -r /tmp/missing_rtp.pcap -Y "sip.Method == INVITE" -T fields -e sip.Call-ID -e sdp.media.port
| |
| | |
| # Check actual RTP ports in packet stream
| |
| tshark -r /tmp/missing_rtp.pcap -Y "rtp" -T fields -e udp.srcport -e udp.dstport | head -20
| |
| </syntaxhighlight>
| |
| | |
| '''Diagnostic Decision:'''
| |
| | |
| {| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;" | |
| |- | | |- |
| ! colspan="2" style="background:#ffc107;" | SDP vs Actual RTP Port Analysis | | ! Solution !! Type !! CPU Reduction !! Use Case |
| |- | | |- |
| | style="vertical-align: top;" | '''Ports match:'''
| | | '''[[DPDK]]''' || Open-source || ~70% || Multi-gigabit on commodity hardware |
| | The SDP-advertised RTP ports match the actual RTP packet ports. This is NOT a port mismatch issue. Try other troubleshooting steps (check natalias order, rtpip_find_endpoints, rtpfromsdp_onlysip).
| |- | | |- |
| | style="vertical-align: top;" | '''Ports DO NOT match:'''
| | | '''[[Napatech]]''' || Hardware SmartNIC || >97% (< 3% at 10Gbit) || Extreme performance requirements |
| | An external device (SBC, media server, firewall, NAT) is modifying the media ports. This is NOT a VoIPmonitor bug - the external device must be corrected in its configuration. VoIPmonitor requires SDP RTP ports to match actual RTP packet ports for correlation. | |
| |} | | |} |
|
| |
|
| '''How to fix external device port mismatch:'''
| | ==== Verify Improvement ==== |
|
| |
|
| * Identify which device is modifying the RTP ports (typically an SBC or media server)
| | <syntaxhighlight lang="bash"> |
| * Consult the device documentation for media port mapping configuration
| | # Monitor thread CPU after changes |
| * Configure the device to preserve original SDP RTP ports in signaling
| | watch -n 2 "echo 'sniffer_threads' | nc -U /tmp/vm_manager_socket | head -10" |
|
| |
|
| {{Warning|Do NOT try to compensate for external port modification with VoIPmonitor configuration. The external device must be corrected to ensure reliable RTP correlation.}}
| | # Or monitor syslog |
| | | journalctl -u voipmonitor -f |
| ==== Other Configuration Checks ====
| | # t0CPU should drop, heap values should stay < 20% |
| | |
| If checking <code>rtp_check_both_sides_by_sdp</code> does not resolve the issue, proceed with these additional diagnostic steps:
| |
| | |
| * Configure '''natalias''' in <code>/etc/voipmonitor.conf</code> to map the IP advertised in SIP signaling to the actual RTP source IP (NAT scenarios only):
| |
| <syntaxhighlight lang="ini">
| |
| ; /etc/voipmonitor.conf
| |
| natalias = <Public_IP_Signaled> <Private_IP_Actual>
| |
| </syntaxhighlight> | | </syntaxhighlight> |
| : When using <code>natalias</code>, ensure <code>rtp_check_both_sides_by_sdp</code> is set to <code>no</code> (the default).
| |
| * Check for restrictive <code>filter</code> directives in <code>voipmonitor.conf</code>
| |
| * Verify <code>sipport</code> includes all necessary SIP ports
| |
|
| |
|
| ;If packets are captured but audio silent (Codec issue): | | {{Note|1=After changes, monitor syslog <code>heap[A|B|C]</code> values - should stay below 20% during peak traffic. See [[Syslog_Status_Line]] for detailed metric explanations.}} |
| * Check CDR view for codec information on both legs
| |
| * Verify VoIPmonitor GUI has the necessary codec decoders installed
| |
| * Check for codec mismatches between call legs (transcoding may be missing)
| |
|
| |
|
| === Step 6: Verify the Fix After Configuration Changes === | | == Storage Hardware Failure == |
|
| |
|
| After making changes in <code>/etc/voipmonitor.conf</code>:
| | '''Symptom''': Sensor shows disconnected (red X) with "DROPPED PACKETS" at low traffic volumes. |
|
| |
|
| | '''Diagnosis''': |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Restart the sniffer | | # Check disk health |
| systemctl restart voipmonitor
| | smartctl -a /dev/sda |
|
| |
|
| # Make another test call and repeat the diagnostic workflow | | # Check RAID status (if applicable) |
| # Compare direct vs GUI capture again
| | cat /proc/mdstat |
| | mdadm --detail /dev/md0 |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Confirm that RTP packets for the problematic leg now appear in both the direct tcpdump capture AND the GUI's PCAP file.
| | Look for reallocated sectors, pending sectors, or RAID degraded state. Replace failing disk. |
|
| |
|
| '''Note:''' This diagnostic methodology helps you identify whether the issue is in the network infrastructure (PBX, SPAN, firewall) or in VoIPmonitor configuration (natalias, filters). Applying VoIPmonitor configuration fixes when the root cause is a network issue will not resolve the problem.
| | == OOM (Out of Memory) == |
|
| |
|
| == Troubleshooting: Server Coredumps and SQL Queue Overload == | | === Identify OOM Victim === |
|
| |
|
| If the VoIPmonitor server is experiencing regular coredumps, the cause may be an SQL queue bottleneck that exceeds system limits. The SQL queue grows when the database cannot keep up with the rate of data being inserted from VoIPmonitor.
| |
|
| |
| === Symptoms ===
| |
|
| |
| * Server crashes or coredumps regularly, often during peak traffic hours
| |
| * Syslog messages showing a growing <code>SQLq</code> counter (SQL queries waiting)
| |
| * Crashes occur when OPTIONS, SUBSCRIBE, and NOTIFY messages are being processed at high volume
| |
|
| |
| === Identify the Root Cause ===
| |
|
| |
| ;1. Check the SQL queue metric in syslog:
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Debian/Ubuntu | | # Check for OOM kills |
| tail -f /var/log/syslog | grep "SQLq"
| | dmesg | grep -i "out of memory\|oom\|killed process" |
| | | journalctl --since "1 hour ago" | grep -i oom |
| # CentOS/RHEL
| |
| tail -f /var/log/messages | grep "SQLq"
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Look for the <code>SQLq[XXX]</code> value where XXX is the number of queued SQL commands. If this number is consistently growing or reaching high values (thousands or more), the database is a bottleneck.
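
To watch the queue length over time, the value can be extracted from the running log (Debian/Ubuntu path shown; use <code>/var/log/messages</code> on CentOS/RHEL):

<syntaxhighlight lang="bash">
# Print only the SQLq[...] counter from each new status line
tail -f /var/log/syslog | grep --line-buffered -o "SQLq\[[0-9]*\]"
</syntaxhighlight>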
| | === MySQL Killed by OOM === |
|
| |
|
| ;2. Check if SIP message processing is enabled:
| | Reduce InnoDB buffer pool: |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="ini"> |
| grep -E "sip-options=|sip-subscribe=|sip-notify=" /etc/voipmonitor.conf
| | # /etc/mysql/my.cnf |
| | innodb_buffer_pool_size = 2G # Reduce from default |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| If these are set to <code>yes</code> and you have a high volume of these messages (OPTIONS pings sent frequently by SIP devices), this can overwhelm the database insert thread queue.
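
To get a rough idea of how much of this traffic is arriving, take a short live sample with tshark on the sniffing interface (a sketch; adjust the interface name and sample duration):

<syntaxhighlight lang="bash">
# Count OPTIONS/SUBSCRIBE/NOTIFY messages seen in a 10-second sample
timeout 10 tshark -i eth0 \
  -Y "sip.Method == OPTIONS || sip.Method == SUBSCRIBE || sip.Method == NOTIFY" 2>/dev/null | wc -l
</syntaxhighlight>

Divide the count by the sample length to estimate how many of these messages arrive per second.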
| | === Voipmonitor Killed by OOM === |
| | |
| === Solutions === | |
| | |
| There are three approaches to resolve SQL queue overload coredumps:
| |
| | |
| ==== Solution 1: Increase MySQL Insert Threads ====
| |
| | |
| Increase the number of threads dedicated to inserting SIP messages into the database. This allows more parallel database operations.
| |
| | |
| Edit <code>/etc/voipmonitor.conf</code> and add or modify:
| |
|
| |
|
| | Reduce buffer sizes in voipmonitor.conf: |
| <syntaxhighlight lang="ini"> | | <syntaxhighlight lang="ini"> |
| # Increase insert threads for SIP messages (default is 4, increase to 8 or higher for high traffic) | | max_buffer_mem = 2000 # Reduce from default |
| mysqlstore_max_threads_sip_msg = 8
| | ringbuffer = 50 # Reduce from default |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Restart VoIPmonitor for the change to take effect:
| | === Runaway External Process === |
| | |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| systemctl restart voipmonitor
| | # Find memory-hungry processes |
| </syntaxhighlight>
| | ps aux --sort=-%mem | head -20 |
| | |
| {{Tip|For very high traffic environments, you may need to increase this value further (e.g., 12 or 16).}}
| |
| | |
| ==== Solution 2: Disable High-Volume SIP Message Types ====
| |
| | |
| Reduce the load on the SQL queue by disabling processing of specific high-volume SIP message types that are not needed for your analysis.
| |
| | |
| Edit <code>/etc/voipmonitor.conf</code>:
| |
|
| |
|
| <syntaxhighlight lang="ini">
| | # Kill orphaned/runaway process |
| # Disable processing and database storage for specific message types | | kill -9 <PID> |
| sip-options = no
| |
| sip-subscribe = no
| |
| sip-notify = no
| |
| </syntaxhighlight> | | </syntaxhighlight> |
| | | For servers limited to '''16GB RAM''' or when experiencing repeated MySQL OOM kills: |
| Restart VoIPmonitor:
| |
| <syntaxhighlight lang="bash">
| |
| systemctl restart voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| {{Note|See [[SIP_OPTIONS/SUBSCRIBE/NOTIFY]] for detailed information on these options and when to use <code>nodb</code> mode instead of disabling entirely.}}
| |
| | |
| ==== Solution 3: Optimize MySQL Performance ====
| |
| | |
| Tune the MySQL/MariaDB server for better write performance to handle the high insert rate from VoIPmonitor.
| |
| | |
| Edit your MySQL configuration file (typically <code>/etc/mysql/my.cnf</code> or <code>/etc/mysql/mariadb.conf.d/50-server.cnf</code>):
| |
|
| |
|
| <syntaxhighlight lang="ini"> | | <syntaxhighlight lang="ini"> |
| | # /etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf |
| [mysqld] | | [mysqld] |
| # InnoDB buffer pool size - set to approximately 50-70% of available RAM on a dedicated database server | | # On 16GB server: 6GB buffer pool + 6GB MySQL overhead = 12GB total |
| # On servers running VoIPmonitor and MySQL together, use approximately 30-50% of RAM | | # Leaves 4GB for OS + GUI, preventing OOM |
| innodb_buffer_pool_size = 8G | | innodb_buffer_pool_size = 6G |
|
| |
|
| # Reduce transaction durability for faster writes (may lose up to 1 second of data on crash) | | # Enable write buffering (may lose up to 1s of data on crash but reduces memory pressure) |
| innodb_flush_log_at_trx_commit = 2 | | innodb_flush_log_at_trx_commit = 2 |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| Restart MySQL and VoIPmonitor: | | Restart MySQL after changes: |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| systemctl restart mysql | | systemctl restart mysql |
| systemctl restart voipmonitor | | # or |
| | systemctl restart mariadb |
| </syntaxhighlight> | | </syntaxhighlight> |
| | === SQL Queue Growth from Non-Call Data === |
|
| |
|
| {{Warning|Setting <code>innodb_flush_log_at_trx_commit</code> to <code>2</code> trades some data safety for performance. In the event of a power loss or crash, up to 1 second of the most recent transactions may be lost.}}
| | If <code>sip-register</code>, <code>sip-options</code>, or <code>sip-subscribe</code> are enabled, non-call SIP-messages (OPTIONS, REGISTER, SUBSCRIBE, NOTIFY) can accumulate in the database and cause the SQL queue to grow unbounded. This increases MySQL memory usage and leads to OOM kills of mysqld. |
|
| |
|
| ==== Solution 4: Immediate SQL Queue Clearing ==== | | {{Warning|1=Even with reduced <code>innodb_buffer_pool_size</code>, SQL queue will grow indefinitely without cleanup of non-call data.}} |
|
| |
|
| If CDRs have stopped appearing in the GUI for several days and the SQL queue (SQLq) is backed up, you can immediately clear the backlog by deleting queue files and restarting services.
| | '''Solution: Enable automatic cleanup of old non-call data''' |
| | | <syntaxhighlight lang="ini"> |
| === Symptoms for SQL Queue Backup ===
| | # /etc/voipmonitor.conf |
| | | # cleandatabase=2555 automatically deletes partitions older than 7 years |
| * No new CDRs appearing in the GUI for an extended period (days)
| | # Covers: CDR, register_state, register_failed, and sip_msg (OPTIONS/SUBSCRIBE/NOTIFY) |
| * Sensor status shows a non-zero <code>SQLq</code> (SQL queue) count in '''Settings > Sensors'''
| | cleandatabase = 2555 |
| * The delay in CDRs equals the age of the oldest query file in the queue
| |
| | |
| === Diagnose SQL Queue Status ===
| |
| | |
| ;1. Check sensor status in the GUI:
| |
| :* Navigate to '''Settings > Sensors'''
| |
| :* Expand the status for the affected sensor
| |
| :* Look for a non-zero <code>SQLq</code> value
| |
| | |
| ;2. Alternatively check via command line:
| |
| <syntaxhighlight lang="bash"> | |
| # Check for SQL queue messages in syslog | |
| grep "SQLq" /var/log/syslog | tail -20
| |
| | |
| # Check for qoq* queue files in voipmonitor directory | |
| # The default directory is typically /voipmonitor or /var/lib/voipmonitor | |
| ls -la /voipmonitor/qoq*
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| === Clear SQL Queue and Restart Processing ===
| | Restart the sniffer after changes: |
| | |
| If the SQL queue is backed up and you want to resume processing immediately:
| |
| | |
| ;1. Restart the voipmonitor and rsyslog services to begin processing the queue naturally:
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Restart both services
| |
| systemctl restart voipmonitor | | systemctl restart voipmonitor |
| systemctl restart rsyslog
| |
| </syntaxhighlight>
| |
|
| |
| ;2. Monitor the SQLq count until it reaches zero. The delay in CDRs equals the age of the oldest query file in the queue.
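
The age of the backlog can be read directly from the oldest queue file (assuming the default <code>/voipmonitor</code> directory used above):

<syntaxhighlight lang="bash">
# The oldest qoq* file shows how far behind CDR processing currently is
ls -ltr /voipmonitor/qoq* | head -1
</syntaxhighlight>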
| |
|
| |
| ;3. If the queue is too large or you want to skip old data and resume immediately, clear the queue files:
| |
| <syntaxhighlight lang="bash">
| |
| # Stop the voipmonitor service first to prevent new data writes
| |
| systemctl stop voipmonitor
| |
|
| |
| # Delete all qoq* queue files from the voipmonitor directory
| |
| # Replace /voipmonitor with your actual directory path
| |
| rm -f /voipmonitor/qoq*
| |
|
| |
| # Restart the voipmonitor service
| |
| systemctl start voipmonitor
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| ;4. After clearing the queue, monitor the <code>SQLf</code> (failed queries) stat in the sensor status:
| | {{Note|See [[Data_Cleaning]] for detailed configuration options and other <code>cleandatabase_*</code> parameters.}} |
| :* Navigate to '''Settings > Sensors'''
| | == Service Startup Failures == |
| :* Expand the status for the affected sensor
| |
| :* Watch for the <code>SQLf</code> counter
| |
| :* Ensure it remains low (e.g., below 10) and does not grow, indicating new queries are processing successfully
| |
| | |
| {{Warning|Deleting queue files permanently discards all pending CDR data. Only do this if you are willing to lose the queued data or if the backlog is so old that the data is no longer useful.}} | |
| | |
| {{Tip|The voipmonitor directory location varies by installation. Common locations include <code>/voipmonitor</code>, <code>/var/lib/voipmonitor</code>, or the directory specified by the <code>pcap_dir</code> configuration option in <code>voipmonitor.conf</code>.}}
| |
| | |
| === Additional Troubleshooting ===
| |
|
| |
|
| * If increasing threads and disabling SIP message types do not resolve the issue, check whether the database server itself has performance bottlenecks (CPU, disk I/O, memory); a quick check is sketched below this list
| | === Interface No Longer Exists === |
| * For systems with extremely high call volumes, consider moving the database to a separate dedicated server
| |
| * Monitor the <code>SQLq</code> metric after making changes to verify the queue is not growing unchecked
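
For the database-host check mentioned above, a quick look at CPU, memory and disk pressure is usually enough to tell whether the bottleneck is the server itself (<code>iostat</code> requires the sysstat package):

<syntaxhighlight lang="bash">
# CPU, memory and swap activity (5 samples, 2 seconds apart)
vmstat 2 5

# Per-device disk utilization and latency
iostat -xz 2 5
</syntaxhighlight>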
| |
|
| |
|
| == Appendix: tshark Display Filter Syntax for SIP ==
| | After OS upgrade, interface names may change (eth0 → ensXXX): |
| When using <code>tshark</code> to analyze SIP traffic, it is important to use the '''correct Wireshark display filter syntax'''. Below are common filter examples:
| |
|
| |
|
| === Basic SIP Filters ===
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Show all SIP INVITE messages | | # Find current interface names |
| tshark -r capture.pcap -Y "sip.Method == INVITE"
| | ip a |
|
| |
|
| # Show all SIP messages (any method) | | # Update all config locations |
| tshark -r capture.pcap -Y "sip"
| | grep -r "interface" /etc/voipmonitor.conf /etc/voipmonitor.conf.d/ |
|
| |
|
| # Show SIP and RTP traffic | | # Also check GUI: Settings → Sensors → Configuration |
| tshark -r capture.pcap -Y "sip || rtp"
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| === Search for Specific Phone Number or Text === | | === Missing Dependencies === |
| <syntaxhighlight lang="bash">
| |
| # Find calls containing a specific phone number (e.g., 5551234567)
| |
| tshark -r capture.pcap -Y 'sip contains "5551234567"'
| |
|
| |
|
| # Find INVITE messages for a specific number
| |
| tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"'
| |
| </syntaxhighlight>
| |
|
| |
| === Extract Call-ID from Matching Calls ===
| |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Get Call-ID for calls matching a phone number | | # Install common missing package |
| tshark -r capture.pcap -Y 'sip.Method == INVITE && sip contains "5551234567"' -T fields -e sip.Call-ID
| | apt install libpcap0.8 # Debian/Ubuntu |
| | | yum install libpcap # RHEL/CentOS |
| # Get Call-ID along with From and To headers | |
| tshark -r capture.pcap -Y 'sip.Method == INVITE' -T fields -e sip.Call-ID -e sip.from.user -e sip.to.user
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| === Filter by IP Address === | | == Network Interface Issues == |
| <syntaxhighlight lang="bash">
| |
| # SIP traffic from a specific source IP
| |
| tshark -r capture.pcap -Y "sip && ip.src == 192.168.1.100"
| |
|
| |
|
| # SIP traffic between two hosts
| | === Promiscuous Mode === |
| tshark -r capture.pcap -Y "sip && ip.addr == 192.168.1.100 && ip.addr == 10.0.0.50"
| |
| </syntaxhighlight>
| |
|
| |
|
| === Filter by SIP Response Code ===
| | Required for SPAN port monitoring: |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Show all 200 OK responses | | # Enable |
| tshark -r capture.pcap -Y "sip.Status-Code == 200"
| | ip link set eth0 promisc on |
|
| |
|
| # Show all 4xx and 5xx error responses | | # Verify |
| tshark -r capture.pcap -Y "sip.Status-Code >= 400"
| | ip link show eth0 | grep PROMISC |
| | |
| # Show 486 Busy Here responses
| |
| tshark -r capture.pcap -Y "sip.Status-Code == 486"
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| === Important Syntax Notes ===
| | {{Note|Promiscuous mode is NOT required for ERSPAN/GRE tunnels where traffic is addressed to the sensor.}} |
| * '''Field names are case-sensitive:''' Use <code>sip.Method</code>, <code>sip.Call-ID</code>, <code>sip.Status-Code</code> (not <code>sip.method</code> or <code>sip.call-id</code>)
| |
| * '''String matching uses <code>contains</code>:''' Use <code>sip contains "text"</code> (not <code>sip.contains()</code>)
| |
| * '''Use double quotes for strings:''' <code>sip contains "number"</code> (not single quotes)
| |
| * '''Boolean operators:''' Use <code>&&</code> (and), <code>||</code> (or), <code>!</code> (not)
| |
| | |
| For a complete reference, see the [https://www.wireshark.org/docs/dfref/s/sip.html Wireshark SIP Display Filter Reference].
| |
| | |
| == Database Error 1062 - Lookup Table Limit ==
| |
| | |
| {{Main|Database_troubleshooting#Database_Error_1062_-_Lookup_Table_Auto-Increment_Limit}} | |
| | |
| If sniffer logs show <code>1062 - Duplicate entry '16777215' for key 'PRIMARY'</code> and CDRs stop being stored, this is caused by lookup tables (cdr_sip_response, cdr_reason) hitting the MEDIUMINT auto-increment limit.
| |
| | |
| '''Quick fix:''' Set <code>cdr_reason_string_enable = no</code> in voipmonitor.conf and restart the service. For complete troubleshooting steps including TRUNCATE and queue cleanup, see [[Database_troubleshooting#Database_Error_1062_-_Lookup_Table_Auto-Increment_Limit|Database Troubleshooting]].
| |
| | |
| == Routing Loops ==
| |
| | |
| Routing loops occur when SIP INVITE requests continuously circulate between SIP servers without completing, causing excessive traffic and call failures. Common symptoms include:
| |
| | |
| * High volume of calls to a single destination number in a short time period
| |
| * Many INVITE requests with no SIP response (response code 0)
| |
| * Very long Post Dial Delay (PDD) values
| |
| * Rapid retransmission of INVITE to the same called number
| |
| | |
| {{Note|Routing loops can be caused by misconfigured dial plans, incorrect SIP URI formats, or circular forwarding rules.}}
| |
| | |
| ==== Detection Methods ====
| |
| | |
| Use alerts to detect routing loops:
| |
| | |
| * '''SIP Response Alert (Response code 0)''': Configure an alert to detect unreplied INVITE requests. [[Alerts|Configure this in GUI > Alerts]] by setting Response code to 0. This catches calls in a loop that never receive any SIP response.
| |
| | |
| * '''PDD (Post Dial Delay) Alert''': Configure a PDD alert with a threshold (e.g., <code>PDD > 30</code> seconds) to detect calls taking excessively long to complete. Routing loops often have very high PDD values as INVITEs continue retransmitting. [[Alerts|See Alerts documentation for PDD configuration]].
| |
| | |
| * '''Fraud: Sequential Alert''': Monitor for excessive calls to any single destination number within a short time window. Configure [[Anti-fraud|Fraud: Sequential]] with an appropriate Interval and Limit (e.g., 50 calls in 1 hour to the same number). Leave the called number field empty to monitor all destinations.
| |
| | |
| ==== Troubleshooting Steps ====
| |
| | |
| 1. Identify the looping destination number from alert logs, a CDR search, or a packet capture (see the sketch after this list)
| |
| 2. Check the SIP dialog to trace the call path (use PCAP analysis in GUI)
| |
| 3. Verify dial plan configuration on all involved SIP servers
| |
| 4. Look for forwarding rules or translation patterns that may create circular routing
| |
| 5. Fix the misconfiguration and verify the loop no longer occurs
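
When a packet capture of the affected period is available, a quick way to spot a looping destination (step 1 above) is to count INVITEs per called number; an unusually high count for a single number is a strong hint of a loop. A sketch, assuming the capture is saved as <code>/tmp/loop_trace.pcap</code>:

<syntaxhighlight lang="bash">
# INVITE count per called number, highest first
tshark -r /tmp/loop_trace.pcap -Y "sip.Method == INVITE" -T fields -e sip.to.user \
  | sort | uniq -c | sort -rn | head
</syntaxhighlight>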
| |
| | |
| | |
| == Service Fails to Start: Network Interface No Longer Exists ==
| |
| | |
| The voipmonitor service fails to start and reports an error for a network interface that no longer exists, even though the interface has already been removed from the main configuration file.
| |
| | |
| === Symptoms ===
| |
| | |
| * Service fails to start with interface-related error
| |
| * Error message references an interface name that was removed or renamed
| |
| * The interface is not listed in <code>ip a</code> output
| |
| | |
| === Root Cause ===
| |
|
| |
|
| VoIPmonitor may have cached interface configurations in additional configuration files beyond the main <code>voipmonitor.conf</code>.
| | === Interface Drops === |
| | |
| === Solution === | |
| | |
| Check for interface references in all configuration locations:
| |
|
| |
|
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # 1. Check main config | | # Check for drops |
| grep -i "interface" /etc/voipmonitor.conf
| | ip -s link show eth0 | grep -i drop |
| | |
| # 2. Check for additional config files
| |
| ls -la /etc/voipmonitor.conf.d/
| |
| grep -r "interface" /etc/voipmonitor.conf.d/ | |
| | |
| # 3. Check sensor configuration stored in database (via GUI)
| |
| # Navigate to: Settings > Sensors > [Your Sensor] > Configuration
| |
| | |
| # 4. Remove or update references to the old interface
| |
| # Edit the relevant config file and change to the correct interface name:
| |
| nano /etc/voipmonitor.conf
| |
| # Change: interface = eth0 (old)
| |
| # To: interface = ens192 (new)
| |
|
| |
|
| # 5. Restart service | | # If drops present, increase ring buffer |
| systemctl restart voipmonitor
| | ethtool -G eth0 rx 4096 |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| === Prevention === | | === Bonded/EtherChannel Interfaces === |
|
| |
|
| When changing network interface names (e.g., during OS upgrade from eth0 to ensXXX naming):
| | '''Symptom''': False packet loss when monitoring bond0 or br0. |
| 1. Update <code>/etc/voipmonitor.conf</code> before reboot
| |
| 2. Check for any additional config files in <code>/etc/voipmonitor.conf.d/</code>
| |
| 3. Update sensor configuration in GUI if applicable
| |
|
| |
|
| == Kernel Errors: "bad gso" or Network Offloading Issues ==
| | '''Solution''': Monitor physical interfaces, not logical: |
| | | <syntaxhighlight lang="ini"> |
| If you encounter kernel errors like <code>bad gso: type: 1, size: 1448</code> or other network offloading-related errors, this may affect packet capture.
| | # voipmonitor.conf - use physical interfaces |
| | | interface = eth0,eth1 |
| === Symptoms ===
| |
| | |
| * Kernel log shows: <code>bad gso: type: 1, size: 1448, max: 1454</code>
| |
| * Network interface errors in dmesg
| |
| * Packet capture issues or dropped packets
| |
| * High CPU usage during packet processing
| |
| | |
| === Root Cause ===
| |
| | |
| Network offloading features (GSO, TSO, GRO) can cause issues with packet capture software. These features optimize network performance by handling packet segmentation in hardware/driver, but can interfere with raw packet capture.
| |
| | |
| === Solution ===
| |
| | |
| Disable network offloading on the capture interface:
| |
| | |
| <syntaxhighlight lang="bash"> | |
| # Check current offloading status | |
| ethtool -k eth0 | grep -E "generic-segmentation|tcp-segmentation|generic-receive"
| |
| | |
| # Disable GSO (Generic Segmentation Offload)
| |
| ethtool -K eth0 gso off
| |
| | |
| # Disable TSO (TCP Segmentation Offload)
| |
| ethtool -K eth0 tso off
| |
| | |
| # Disable GRO (Generic Receive Offload)
| |
| ethtool -K eth0 gro off
| |
| | |
| # Disable LRO (Large Receive Offload) if available
| |
| ethtool -K eth0 lro off
| |
| | |
| # Verify changes
| |
| ethtool -k eth0 | grep -E "segmentation|offload"
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| === Making Changes Persistent === | | === Network Offloading Issues === |
|
| |
|
| To make these changes survive reboots, add them to a network configuration script:
| | '''Symptom''': Kernel errors like <code>bad gso: type: 1, size: 1448</code> |
|
| |
|
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Option 1: Add to /etc/rc.local (if enabled) | | # Disable offloading on capture interface |
| echo "ethtool -K eth0 gso off tso off gro off" >> /etc/rc.local
| | ethtool -K eth0 gso off tso off gro off lro off |
| | |
| # Option 2: Create a systemd service
| |
| cat > /etc/systemd/system/disable-offload.service << EOF
| |
| [Unit]
| |
| Description=Disable network offloading for packet capture
| |
| After=network.target
| |
| | |
| [Service]
| |
| Type=oneshot
| |
| ExecStart=/sbin/ethtool -K eth0 gso off tso off gro off
| |
| RemainAfterExit=yes
| |
| | |
| [Install]
| |
| WantedBy=multi-user.target
| |
| EOF
| |
| | |
| systemctl enable disable-offload.service
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| === Related Information === | | == Packet Ordering Issues == |
|
| |
|
| * Replace <code>eth0</code> with your actual capture interface name
| | If SIP messages appear out of sequence: |
| * These settings may slightly increase CPU usage but improve packet capture reliability
| |
| * Virtual environments (VMware, KVM) may require additional VM-level settings
| |
|
| |
|
| == Packet Ordering Issues: Messages Appearing Out of Sequence ==
| | '''First''': Rule out Wireshark display artifact - disable "Analyze TCP sequence numbers" in Wireshark. See [[FAQ]]. |
| If SIP messages are displayed out of order in the VoIPmonitor GUI or in downloaded PCAP files, even after syncing the host machine time with NTP, this indicates packets are arriving at the network interface in an incorrect sequence. This is distinct from Wireshark display artifacts covered in the [[FAQ]] section.
| |
|
| |
|
| === Distinction from Wireshark Display Issues===
| | '''If genuine reordering''': Usually caused by packet bursts in network infrastructure. Use tcpdump to verify packets arrive out of order at the interface. Work with network admin to implement QoS or traffic shaping. For persistent issues, consider dedicated capture card with hardware timestamping (see [[Napatech]]). |
| Before continuing, verify the issue is NOT a Wireshark display artifact:
| | {{Note|For out-of-order packets in '''client/server mode''' (multiple sniffers), see [[Sniffer_distributed_architecture]] for <code>pcap_queue_dequeu_window_length</code> configuration.}} |
|
| |
|
| *'''Wireshark display artifact:''' PCAP files show correct order when TCP sequence analysis is disabled. Solution: Disable "Analyze TCP sequence numbers" in Wireshark Preferences (see [[FAQ]] for details).
| | === Solutions for SPAN/Mirroring Reordering === |
|
| |
|
| *'''Actual packet reordering:''' tcpdump on the VoIPmonitor host shows packets arriving out of order at the network interface. Continue with this section.
| | If packets arrive out of order at the SPAN/mirror port (e.g., 302 responses before INVITE causing "000 no response" errors): |
|
| |
|
| === Diagnosis: Use tcpdump to Identify Where Reordering Occurs ===
| | 1. '''Configure switch to preserve packet order''': Many switches allow configuring SPAN/mirror ports to maintain packet ordering. Consult your switch documentation for packet ordering guarantees in mirroring configuration. |
| The key diagnostic question is: Are packets arriving out of order at the network interface, or is the reordering occurring in VoIPmonitor's processing?
| |
|
| |
|
| '''Step 1: Capture traffic directly on the network interface:''' | | 2. '''Replace SPAN with TAP or packet broker''': Unlike software-based SPAN mirroring, hardware TAPs and packet brokers guarantee packet order. Consider upgrading to a dedicated TAP or packet broker device for mission-critical monitoring. |
| | == Database Issues == |
|
| |
|
| Use tcpdump on the VoIPmonitor host to capture packets at the lowest level:
| | === SQL Queue Overload === |
|
| |
|
| <syntaxhighlight lang="bash"> | | '''Symptom''': Growing <code>SQLq</code> metric, potential coredumps. |
| # Capture SIP traffic for 30 seconds to a file
| |
| # Replace eth0 with your sniffing interface
| |
| tcpdump -i eth0 -nn "sip" -w /tmp/direct_interface_capture.pcap
| |
|
| |
|
| # During the capture, make a test call that shows the out-of-order problem | | <syntaxhighlight lang="ini"> |
| | # voipmonitor.conf - batch more CDR inserts per query and skip duplicate Call-ID checks |
| | mysqlstore_concat_limit_cdr = 1000 |
| | cdr_check_exists_callid = 0 |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| '''Step 2: Analyze the tcpdump capture for packet ordering:'''
| | === Error 1062 - Lookup Table Limit === |
|
| |
|
| Open the captured file in Wireshark or use tshark to check if timestamps are sequential:
| | '''Symptom''': <code>Duplicate entry '16777215' for key 'PRIMARY'</code> |
|
| |
|
| <syntaxhighlight lang="bash"> | | '''Quick fix''': |
| # Display SIP messages with timestamps sorted | | <syntaxhighlight lang="ini"> |
| tshark -r /tmp/direct_interface_capture.pcap -Y "sip" -T fields -e frame.time_epoch -e sip.Method -e sip.Status-Code | head -50
| | # voipmonitor.conf |
| | cdr_reason_string_enable = no |
| | </syntaxhighlight> |
|
| |
|
| # Analyze sequence numbers for the same call | | See [[Database_troubleshooting#Database_Error_1062_-_Lookup_Table_Auto-Increment_Limit|Database Troubleshooting]] for complete solution. |
| tshark -r /tmp/direct_interface_capture.pcap -Y "sip && sip.Call-ID == 'YOUR_CALL_ID'" -T fields -e frame.number -e frame.time_epoch -e sip.CSeq -e sip.Method -e sip.Status-Code
| |
| </syntaxhighlight>
| |
|
| |
|
| Replace <code>YOUR_CALL_ID</code> with the actual Call-ID from a problematic call (view it in the GUI's CDR → SIP History tab).
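| 
| If the Call-ID is not at hand, it can also be listed straight from the direct capture (a quick helper using the same tshark fields approach):
| 
| <syntaxhighlight lang="bash">
| # List the unique Call-IDs of all INVITEs seen in the capture
| tshark -r /tmp/direct_interface_capture.pcap -Y "sip.Method == INVITE" -T fields -e sip.Call-ID | sort -u
| </syntaxhighlight>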
| | == Bad Packet Errors == |
|
| |
|
| '''Step 3: Compare tcpdump with downloaded PCAP from GUI:''' | | '''Symptom''': <code>bad packet with ether_type 0xFFFF detected on interface</code> |
|
| |
|
| | '''Diagnosis''': |
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Download the PCAP for the same problematic call from the GUI | | # Run diagnostic (let run 30-60 seconds, then kill) |
| # Save it as: /tmp/gui_download.pcap
| | voipmonitor --check_bad_ether_type=eth0 |
|
| |
|
| # Compare timestamps between direct capture and GUI download | | # Find and kill the diagnostic process |
| tshark -r /tmp/direct_interface_capture.pcap -Y "sip && sip.Call-ID == 'YOUR_CALL_ID'" -T fields -e frame.time_epoch | sort -n > /tmp/direct_timestamps.txt
| | ps ax | grep voipmonitor |
| tshark -r /tmp/gui_download.pcap -Y "sip && sip.Call-ID == 'YOUR_CALL_ID'" -T fields -e frame.time_epoch | sort -n > /tmp/gui_timestamps.txt
| | kill -9 <PID> |
| | |
| # Check if both show similar ordering issues
| |
| diff /tmp/direct_timestamps.txt /tmp/gui_timestamps.txt
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| === Diagnostic Decision Matrix ===
| | Causes: corrupted packets, driver issues, VLAN tagging problems. Check <code>ethtool -S eth0</code> for interface errors. |
|
| |
|
| {| class="wikitable" style="background:#e7f3ff; border:1px solid #3366cc;"
| | == Useful Diagnostic Commands == |
| |-
| |
| ! colspan="2" style="background:#3366cc; color: white;" | Packet Ordering Diagnosis
| |
| |-
| |
| ! Observation
| |
| ! Root Cause & Solution
| |
| |-
| |
| | '''tcpdump shows packets OUT of order at interface'''
| |
| | '''Network-level issue:''' Packets are arriving in the wrong order at the network interface. This is usually caused by '''packet bursts'''. Investigate and eliminate packet bursts on the network path (see solution below).
| |
| |-
| |
| | '''tcpdump shows packets IN order, but GUI shows OUT of order'''
| |
| | '''Potential packet burst issue:''' Large bursts may overwhelm the sensor's queue, causing internal buffering and reordering. Consider using a dedicated capture card with internal clock (see solution below).
| |
| |-
| |
| | '''Both tcpdump and GUI show same out-of-order pattern'''
| |
| | '''Network infrastructure issue:''' The reordering is external to VoIPmonitor. The issue lies in the network path between the monitored devices and the sensor. Work with your network administrator.
| |
| |}
| |
| | |
| == Troubleshooting: Bad Packet with Invalid Ether Type Error ==
| |
| | |
| If sensor logs show error messages like:
| |
| <code>A bad packet with ether_type 0xFFFF was detected on interface [interface_name]. Contact support!</code>
| |
| | |
| This indicates that VoIPmonitor is detecting packets with an invalid Ethernet type value on the specified interface. The value <code>0xFFFF</code> is not a valid Ethernet type.
| |
| | |
| === Understanding the Error ===
| |
| | |
| The <code>ether_type</code> field in an Ethernet frame identifies the protocol being carried. Valid values include:
| |
| * <code>0x0800</code>: IPv4
| |
| * <code>0x86DD</code>: IPv6
| |
| * <code>0x8100</code>: 802.1Q VLAN tag
| |
| * <code>0x0806</code>: ARP
| |
| | |
| The value <code>0xFFFF</code> indicates one of these issues:
| |
| * '''Corrupted packet data''' - Packet was damaged during transmission
| |
| * '''Malformed packet''' - A device is sending incorrectly formatted packets
| |
| * '''Driver/hardware issue''' - Network interface or driver problems
| |
| * '''VLAN tagging problem''' - Issues with 802.1Q VLAN tag processing
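| 
| Before running the dedicated diagnostic described below, a quick field count with tshark can show which ether_type values are actually arriving on the interface (a sketch; adjust the interface name and sample size):
| 
| <syntaxhighlight lang="bash">
| # Count ether_type values in a 1000-packet sample from the capture interface
| tshark -i eth0 -c 1000 -T fields -e eth.type | sort | uniq -c | sort -rn
| </syntaxhighlight>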
| |
| | |
| === Diagnosis: Use Built-in Diagnostic Tool ===
| |
| | |
| VoIPmonitor includes a diagnostic command specifically for analyzing bad ether_type packets.
| |
| | |
| ;1. Log in to the sensor reporting the error:
| |
|
| |
|
| Connect to the sensor host via SSH or console.
| | === tshark Filters for SIP === |
| | |
| ;2. Run the diagnostic command on the specific interface:
| |
|
| |
|
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Run the diagnostic for the interface mentioned in the error log | | # All SIP INVITEs |
| # Replace [interface_name] with the actual interface (e.g., eth0, napa12)
| | tshark -r capture.pcap -Y "sip.Method == INVITE" |
| voipmonitor --check_bad_ether_type=[interface_name]
| |
| </syntaxhighlight>
| |
|
| |
|
| This command will capture and analyze packets with invalid ether_type values on the specified interface.
| | # Find specific phone number |
| | tshark -r capture.pcap -Y 'sip contains "5551234567"' |
|
| |
|
| ;3. Allow the command to run and capture output:
| | # Get Call-IDs |
| | tshark -r capture.pcap -Y "sip.Method == INVITE" -T fields -e sip.Call-ID |
|
| |
|
| Let the diagnostic run for 30-60 seconds to capture any bad packets. The output will provide details about the malformed packets.
| | # SIP errors (4xx, 5xx) |
| | | tshark -r capture.pcap -Y "sip.Status-Code >= 400" |
| ;4. Terminate the diagnostic process:
| |
| | |
| The command runs continuously until manually stopped. Find its PID and kill it:
| |
| | |
| <syntaxhighlight lang="bash">
| |
| # Find the process ID
| |
| ps ax | grep voipmonitor
| |
| | |
| # Kill the process using its PID
| |
| kill -9 [PID]
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| ;5. Provide the captured output to support team:
| | === Interface Statistics === |
| | |
| Send the diagnostic output to VoIPmonitor support for analysis. This information helps identify the source of the bad packets.
| |
| | |
| === Additional Troubleshooting Steps === | |
| | |
| If the diagnostic output indicates a persistent issue, check these items:
| |
| | |
| ;Check interface statistics for errors:
| |
|
| |
|
| <syntaxhighlight lang="bash"> | | <syntaxhighlight lang="bash"> |
| # Check for interface errors | | # Detailed NIC stats |
| ethtool -S [interface_name] | | ethtool -S eth0 |
| | |
| # Check kernel log for driver errors
| |
| dmesg | grep -i "error\|fail\|corrupt"
| |
| | |
| # Check interface-specific error counters
| |
| ip -s link show [interface_name]
| |
| </syntaxhighlight>
| |
| | |
| ;Verify network infrastructure:
| |
| | |
| * If using SPAN/mirroring ports: Ensure the mirror port is configured to pass all VLAN tags
| |
| * Check for duplex mismatches between the sensor interface and switch port (a quick sensor-side check is shown after this list)
| |
| * Verify switch port is not over-subscribed
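| 
| From the sensor side, the negotiated speed and duplex can be read with ethtool; the switch port settings still need to be compared against this output:
| 
| <syntaxhighlight lang="bash">
| # Show negotiated link speed and duplex on the capture interface
| ethtool eth0 | grep -Ei "speed|duplex"
| </syntaxhighlight>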
| |
| | |
| ;Test with a different interface:
| |
| | |
| If possible, temporarily configure VoIPmonitor to use a different network interface to determine if the issue is interface-specific.
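| 
| The capture interface is set in <code>voipmonitor.conf</code>; a temporary test could look like the snippet below (a sketch, assuming <code>eth1</code> also sees the mirrored traffic), followed by a service restart with <code>systemctl restart voipmonitor</code>:
| 
| <syntaxhighlight lang="ini">
| # voipmonitor.conf - temporarily capture from a different interface for testing
| interface = eth1
| </syntaxhighlight>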
| |
| | |
| {{Tip|If the error rate is low and does not impact call recording, this may be acceptable. However, if errors are frequent or correlate with call quality issues, investigate further.}}
| |
| | |
| === When to Contact Support ===
| |
| | |
| Contact VoIPmonitor support if:
| |
| * The error persists after using the diagnostic tool
| |
| * The error rate is high and impacts call recording
| |
| * You have identified a pattern (specific sources, times, or conditions)
| |
| | |
| Provide the output from <code>voipmonitor --check_bad_ether_type=[interface]</code> along with your VoIPmonitor configuration and network topology information.
| |
| | |
| === Solution: Investigate and Eliminate Packet Bursts ===
| |
|
| |
|
| Packet bursts are the most common cause of genuine out-of-order packet issues. During periods of high network load, multiple packets may arrive nearly simultaneously or with irregular timing, causing the sensor's capture queue to buffer and reorder packets.
| | # Watch packet rates |
| | |
| '''Diagnosing packet bursts:'''
| |
| | |
| <syntaxhighlight lang="bash">
| |
| # Monitor interface packet rates in real-time | |
| watch -n 1 'cat /proc/net/dev | grep eth0' | | watch -n 1 'cat /proc/net/dev | grep eth0' |
|
| |
| # Check for bursty packet arrival pattern
| |
| tcpdump -i eth0 -nn "sip" -l | while read line; do
| |
| echo "$(date +%s.%N) $line"
| |
| done | tee /tmp/burst_analysis.log
| |
|
| |
| # Analyze the log for irregular timing patterns
| |
| # Look for multiple packets with the same timestamp or very close timestamps
| |
| </syntaxhighlight> | | </syntaxhighlight> |
|
| |
|
| If you see multiple SIP packets with identical or nearly identical timestamps in the burst analysis log, this confirms packet burst issues.
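| 
| As a rough burst indicator, the log produced above can be scanned for consecutive lines whose timestamps are less than 1 ms apart (a sketch; the shell-generated timestamp is the first field of each line, so the result is approximate):
| 
| <syntaxhighlight lang="bash">
| # Count consecutive log lines that arrived less than 1 ms apart
| awk 'NR > 1 && ($1 - prev) < 0.001 { gaps++ } { prev = $1 } END { print gaps + 0, "inter-packet gaps under 1 ms" }' /tmp/burst_analysis.log
| </syntaxhighlight>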
| | == See Also == |
|
| |
|
| '''Eliminating packet bursts:'''
| | * [[Sniffer_configuration]] - Configuration parameter reference |
| | * [[Sniffer_distributed_architecture]] - Client/server deployment |
| | * [[Capture_rules]] - GUI-based recording rules |
| | * [[Sniffing_modes]] - SPAN, ERSPAN, GRE, TZSP setup |
| | * [[Scaling]] - Performance optimization |
| | * [[Database_troubleshooting]] - Database issues |
| | * [[FAQ]] - Common questions and Wireshark display issues |
|
| |
|
| Packet bursts are typically caused by network infrastructure factors:
| |
|
| |
|
| *'''Switch buffer overflow:''' Network switches have limited buffers. When traffic exceeds capacity, packets are burst-transmitted when buffer space becomes available.
| |
|
| |
|
| *'''Asymmetric routing:''' Different network paths for request and response packets can cause varying arrival times.
| |
|
| |
|
| *'''Layer 2 congestion:''' High traffic on VLANs or shared network segments.
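| 
| On the sensor side, rising drop or FIFO counters on the NIC are a common symptom of bursts exceeding what the interface can absorb (counter names vary by driver):
| 
| <syntaxhighlight lang="bash">
| # Look for drop/overrun counters that grow during busy periods
| ethtool -S eth0 | grep -Ei "drop|discard|fifo|miss"
| </syntaxhighlight>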
| |
|
| |
|
| Work with your network administrator to:
| |
| * Implement traffic shaping/policing on the monitored network segments
| |
| * Increase switch buffer sizes or use switches with larger buffers
| |
| * Use quality of service (QoS) to prioritize SIP traffic
| |
| * Check for spanning tree topology changes or link flapping on switch ports
| |
|
| |
|
| === Solution: Use Dedicated Capture Card with Internal Clock ===
| |
| If packet bursts cannot be eliminated from the network or continue despite optimization, using a dedicated network interface card (NIC) with hardware timestamping capabilities can provide accurate packet ordering independent of host system timing issues.
| |
|
| |
|
| Dedicated capture cards (such as Napatech SmartNICs or similar hardware) provide:
| |
|
| |
|
| *'''Internal clock:''' The capture card maintains its own precise timing source, independent of the host system clock.
| | == AI Summary for RAG == |
|
| |
|
| *'''Hardware timestamping:''' Packets are timestamped as they enter the NIC hardware, before any operating system processing or queuing.
| | <!-- This section is for AI/RAG systems. Do not edit manually. --> |
|
| |
|
| *'''Large hardware buffers:''' Specialized cards have dedicated packet buffers that can absorb bursts without affecting timestamp accuracy.
| | === Summary === |
| | | Comprehensive troubleshooting guide for VoIPmonitor sniffer/sensor problems. Covers: verifying traffic reaches interface (tcpdump/tshark), diagnosing no calls recorded (service, config, capture rules, SPAN), missing audio/RTP issues (one-way audio, NAT, natalias, rtp_check_both_sides_by_sdp), PACKETBUFFER FULL errors (I/O vs CPU bottleneck diagnosis using syslog metrics heap/t0CPU/SQLq and Linux tools iostat/iotop/ioping), manager commands for thread monitoring (sniffer_threads via socket or port 5029), t0 single-core capture limit and solutions (DPDK/Napatech kernel bypass), I/O solutions (NVMe/SSD, async writes, pcap_dump_writethreads), CPU solutions (max_buffer_mem 10GB+, jitterbuffer tuning), OOM issues (MySQL buffer pool, voipmonitor buffers), network interface problems (promiscuous mode, drops, offloading), packet ordering, database issues (SQL queue, Error 1062). |
| For VoIPmonitor integration with capture cards, see [[Napatech]] for Napatech-specific configuration and [[Sniffing_modes]] for other hardware-accelerated capture options.
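| 
| Before investing in dedicated hardware, it may be worth checking what timestamping support the existing NIC already advertises (a standard ethtool query; output depends on driver and kernel):
| 
| <syntaxhighlight lang="bash">
| # Show hardware/software timestamping capabilities of the current NIC
| ethtool -T eth0
| </syntaxhighlight>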
| |
| | |
| === Workflow Summary === | |
| | |
| 1. Ensure host time is synced with NTP (if not already done)
| |
| 2. Use tcpdump to capture directly on the interface during a problematic call
| |
| 3. Analyze the capture to determine if packets are arriving out of order at the network level
| |
| 4. If tcpdump confirms out-of-order packets:
| |
| * Investigate and eliminate packet bursts in the network infrastructure
| |
| * Consider using dedicated capture cards with internal clock for timestamp accuracy
| |
| 5. If tcpdump shows packets in order but VoIPmonitor shows out of order:
| |
| * Review PACKETBUFFER settings ([[#Step_8:_PACKETBUFFER_Saturation_Under_High_Load|PACKETBUFFER Saturation]])
| |
| * Consider hardware capture cards for improved timing accuracy
| |
| | |
| {{Note|Do not confuse genuine packet reordering with Wireshark display artifacts. If Wireshark correctly shows packets after disabling TCP sequence analysis, the issue is a display setting, not actual packet ordering.}}
| |
| | |
| == See Also ==
| |
| * [[Sniffer_configuration]] - Complete configuration reference for voipmonitor.conf
| |
| * [[Sniffer_distributed_architecture]] - Client/server deployment and troubleshooting
| |
| * [[Capture_rules]] - GUI-based selective recording configuration
| |
| * [[Sniffing_modes]] - Traffic forwarding methods (SPAN, ERSPAN, GRE, TZSP)
| |
| * [[Scaling]] - Performance tuning and optimization
| |
| * [[Upgrade_to_bigint]] - Migrating CDR table to BIGINT (unrelated to lookup table issues)
| |
| * [[FAQ]] - Wireshark display artifacts and out-of-order packet display issues
| |
| | |
| == AI Summary for RAG ==
| |
| | |
| '''Summary:''' Comprehensive troubleshooting guide for VoIPmonitor sensor issues. CRITICAL FIRST RULE: For missing packets during high-traffic periods for specific IPs, run tcpdump -i eth0 -nn "host <PROBLEMATIC_IP> and port 5060" BEFORE changing any sensor configuration. If packets are NOT on the interface → network infrastructure issue (SPAN/mirroring) - sensor tuning will NOT help. If packets ARE on the interface → sensor resource bottleneck - proceed with ringbuffer, max_sip_packets, CPU tuning. Key diagnostic flow: (1) Check service status with systemctl, (2) Verify traffic with tshark -i eth0 -Y "sip || rtp", (3) Check SPAN/network configuration including interface drops and asymmetric mirroring, (4) Verify voipmonitor.conf (interface, sipport, filter), (5) Check GUI capture rules for Skip, (6) Review logs for OOM/PACKETBUFFER errors. PACKETBUFFER saturation: "PACKETBUFFER: memory is FULL" errors, truncated RTP - FIRST diagnose by disabling savesip/savertp/savertcp/savegraph=no to test if disk I/O is bottleneck vs CPU. If I/O is bottleneck, upgrade storage or use client-server mode. If CPU bottleneck, for 8,000-10,000 concurrent calls: rtpthreads_start=20, threading_expanded=high_traffic, max_buffer_mem=10000. After applying changes, monitor syslog heap[A|B|C] values should stay below 20%. Storage failure: sensor disconnected (red X) with DROPPED PACKETS at low traffic - use smartctl to diagnose, replace disk. EtherChannel/bonded interfaces: false packet loss when monitoring br0/bond0 - use comma-separated physical interfaces instead. OOM scenarios: MySQL killed (tune innodb), voipmonitor killed (reduce buffers), runaway processes (identify with ps aux, kill orphaned). Bad packet ether_type error: "A bad packet with ether_type 0xFFFF was detected on interface" - diagnose with voipmonitor --check_bad_ether_type=[interface], then kill with ps ax | grep voipmonitor and kill -9 [PID]. Advanced natalias troubleshooting: if natalias configuration does not work, try reversing IP order (natalias = public_ip private_ip vs private_ip public_ip). For persistent missing audio, perform packet capture to compare SDP RTP ports with actual RTP packet ports using tshark -Y "sip.Method == INVITE" and tshark -Y "rtp". If ports do not match, external device (SBC/media server) is modifying media ports and must be configured to preserve SDP ports - VoIPmonitor cannot compensate for external port modification. Other issues covered: snaplen truncation, Kamailio siptrace truncation, asymmetric mirroring, rtp_check_both_sides_by_sdp one-way RTP, SQL queue overload, packet ordering issues.
| |
|
| |
|
| '''Keywords:''' troubleshooting, no calls, PACKETBUFFER, PACKETBUFFER saturation, truncated RTP, savesip, savertp, savertcp, savegraph, disk I/O bottleneck, rtpthreads_start, max_buffer_mem, storage failure, smartctl, RAID, OOM, tshark, tcpdump, SPAN, RSPAN, ERSPAN, interface, sipport, filter, capture rules, snaplen, Kamailio siptrace, interface drops, asymmetric mirroring, rtp_check_both_sides_by_sdp, one-way RTP, SQL queue, EtherChannel, bonded interface, packet ordering, packet bursts, NTP, time synchronization, promiscuous mode, bad packet, ether_type, 0xFFFF, check_bad_ether_type, natalias, NAT alias, IP order reversal, SDP port mismatch, external device port modification, SBC, media server, RTP ports, tcpdump packet capture
| | === Keywords === |
| | troubleshooting, sniffer, sensor, no calls, missing audio, one-way audio, RTP, PACKETBUFFER FULL, memory is FULL, buffer saturation, I/O bottleneck, CPU bottleneck, heap, t0CPU, t1CPU, t2CPU, SQLq, comp, tacCPU, iostat, iotop, ioping, sniffer_threads, manager socket, port 5029, thread CPU, t0 thread, single-core limit, DPDK, Napatech, kernel bypass, NVMe, SSD, async write, pcap_dump_writethreads, tar_maxthreads, max_buffer_mem, jitterbuffer, interface_ip_filter, OOM, out of memory, innodb_buffer_pool_size, promiscuous mode, interface drops, ethtool, packet ordering, SPAN, mirror, SQL queue, Error 1062, natalias, NAT, id_sensor, snaplen, capture rules, tcpdump, tshark |
|
| |
|
| '''Key Questions:'''
| | === Key Questions === |
| * CRITICAL: When missing packets during high-traffic periods for specific IPs, should I run tcpdump before tuning sensor configuration? | | * Why are no calls being recorded in VoIPmonitor? |
| * How do I determine if packets are reaching the network interface?
| | * How to diagnose PACKETBUFFER FULL or memory is FULL error? |
| * Why is VoIPmonitor not recording any calls?
| | * How to determine if bottleneck is I/O or CPU? |
| * How do I verify SIP/RTP traffic is reaching the sensor? | | * What do heap values in syslog mean? |
| * What causes PACKETBUFFER saturation and how do I fix it?
| | * What does t0CPU percentage indicate? |
| * How do I diagnose if PACKETBUFFER saturation is caused by disk I/O or CPU? | | * How to use sniffer_threads manager command? |
| * What values should I use for rtpthreads_start, threading_expanded, and max_buffer_mem for 8,000-10,000 concurrent calls? | | * How to connect to manager socket or port 5029? |
| * What should the PACKETBUFFER heap percentage stay below during peak traffic? | | * What to do when t0 thread is at 100%? |
| * Should I disable savesip/savertp/savertcp/savegraph to test for disk I/O bottleneck? | | * How to fix one-way audio or missing RTP? |
| * How do I distinguish PACKETBUFFER saturation from storage hardware failure? | | * How to configure natalias for NAT? |
| * How do I use smartctl to check disk health?
| | * How to increase max_buffer_mem for high traffic? |
| * What are symptoms of interface packet drops? | | * How to disable jitterbuffer to save CPU? |
| * How do I check for asymmetric traffic mirroring? | | * What causes OOM kills of voipmonitor or MySQL? |
| * What causes one-way RTP capture?
| | * How to check disk I/O performance with iostat? |
| * How do I fix SPAN configured for one direction only? | | * How to enable promiscuous mode on interface? |
| * What should I verify after server reboot? | | * How to fix packet ordering issues with SPAN? |
| * Why is time sync critical for packetbuffer_sender?
| | * What is Error 1062 duplicate entry? |
| * How do I fix SQL queue overload causing coredumps? | | * How to verify traffic reaches capture interface? |
| * Which package is commonly missing on new sensors? | |
| * Do I need promiscuous mode for ERSPAN/GRE tunnels? | |
| * Why does port-based tcpdump filter miss fragmented SIP packets?
| |
| * How do I fix OOM caused by runaway external processes? | |
| * Why does VoIPmonitor report false packet loss on EtherChannel/bonded interfaces?
| |
| * How do I diagnose and fix out-of-order packet issues? | |
| * What does "bad packet with ether_type 0xFFFF" error mean?
| |
| * How do I use check_bad_ether_type diagnostic command?
| |
| * How do I configure natalias for NAT scenarios?
| |
| * What do I do if natalias configuration does not work? | |
| * When should I try reversing IP order in natalias?
| |
| * How do I diagnose SDP port vs actual RTP port mismatch? | |
| * What if SDP RTP ports do not match actual RTP packets in the capture?
| |
| * Which external device modifies media ports and how do I fix it?
| |
| * How do I use tshark to compare SDP and RTP ports?
| |