| {{DISPLAYTITLE:Troubleshooting: No Calls Being Sniffed}}
| | |
| '''This guide provides a systematic process to diagnose why the VoIPmonitor sensor might not be capturing any calls. Use it to quickly identify and resolve the most common issues.'''
| |
| | |
| == Is the VoIPmonitor Service Running Correctly? ==
| |
| First, confirm the sensor process is active and loaded the correct configuration file.
| |
| | |
| ;1. Check the service status (for modern systemd systems):
| |
| <syntaxhighlight lang="bash">systemctl status voipmonitor</syntaxhighlight>
| |
| Look for a line that says <code>Active: active (running)</code>. If it is inactive or failed, try restarting it with <code>systemctl restart voipmonitor</code> and check the status again.
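To script this check (for example from cron or a monitoring agent), a minimal sketch can wrap <code>systemctl is-active</code> and restart the unit once if it is down. The function name <code>ensure_running</code> and its optional wait argument are hypothetical and not part of VoIPmonitor.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make sure a systemd unit is running, restarting it once.
#   $1 = unit name (default voipmonitor), $2 = seconds to wait after restart
ensure_running() {
    local unit=${1:-voipmonitor} delay=${2:-2}
    if systemctl is-active --quiet "$unit"; then
        echo "$unit: running"
        return 0
    fi
    echo "$unit: not running, restarting" >&2
    systemctl restart "$unit" || return 1
    sleep "$delay"                      # give the sniffer time to initialise
    if systemctl is-active --quiet "$unit"; then
        echo "$unit: running after restart"
        return 0
    fi
    echo "$unit: still not running, see: journalctl -u $unit" >&2
    return 1
}

# Usage: ensure_running voipmonitor 5
```

If the unit is still down after the restart, the journal usually names the reason (missing binary, bad configuration path, database error).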
| |
| | |
| ;2. Service Fails to Start with "Binary Not Found" After Crash:
| |
| If the VoIPmonitor service fails to start after a crash or watchdog restart with an error message indicating the binary cannot be found (e.g., "No such file or directory" for <code>/usr/local/sbin/voipmonitor</code>), the binary may have been renamed with an underscore suffix during the crash recovery process.
| |
| | |
| Check for a renamed binary:
| |
| <syntaxhighlight lang="bash">
| |
| # Check if the standard binary path exists | |
| ls -l /usr/local/sbin/voipmonitor
| |
| | |
| # If not found, look for a renamed version with underscore suffix
| |
| ls -l /usr/local/sbin/voipmonitor_*
| |
| </syntaxhighlight>
| |
| | |
| If you find a renamed binary (e.g., <code>voipmonitor_</code>, <code>voipmonitor_20250104</code>, etc.), rename it back to the standard name:
| |
| <syntaxhighlight lang="bash">
| |
| # Use the exact file name you found above, e.g. voipmonitor_20250104
|
| mv /usr/local/sbin/voipmonitor_ /usr/local/sbin/voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| Then restart the service:
| |
| <syntaxhighlight lang="bash">
| |
| systemctl start voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| Verify the service starts correctly:
| |
| <syntaxhighlight lang="bash">
| |
| systemctl status voipmonitor
| |
| </syntaxhighlight>
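The check-and-rename steps above can be combined into one helper. This is a sketch that assumes the newest <code>voipmonitor_*</code> file is the one to restore; <code>restore_renamed_binary</code> is a hypothetical name. Review which file it picks before relying on it, and start the service afterwards.

```bash
#!/usr/bin/env bash
# Hypothetical helper: if DIR/voipmonitor is missing, rename the newest
# DIR/voipmonitor_* file back to DIR/voipmonitor.
restore_renamed_binary() {
    local dir=${1:-/usr/local/sbin}
    local target="$dir/voipmonitor" f newest=""
    if [ -e "$target" ]; then
        echo "ok: $target exists"
        return 0
    fi
    # Pick the most recently modified renamed copy (-nt = newer than).
    for f in "$dir"/voipmonitor_*; do
        [ -e "$f" ] || continue          # glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    if [ -z "$newest" ]; then
        echo "no renamed binary found in $dir" >&2
        return 1
    fi
    mv -- "$newest" "$target" && echo "restored: $newest -> $target"
}

# Usage (as root): restore_renamed_binary /usr/local/sbin && systemctl start voipmonitor
```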
| |
| | |
| ;3. Sensor Becomes Unresponsive After GUI Update:
| |
| If the sensor service fails to start or becomes unresponsive after updating a sensor through the Web GUI, the update process may have left the service in a stuck state. The solution is to forcefully stop the service and restart it using these commands:
| |
| <syntaxhighlight lang="bash">
| |
| # SSH into the sensor host and execute:
| |
| killall voipmonitor
| |
| systemctl stop voipmonitor
| |
| systemctl start voipmonitor
| |
| </syntaxhighlight>
| |
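| After running these commands, verify in the GUI that the sensor is responding again. The sequence works as follows: (1) <code>killall</code> terminates any hung <code>voipmonitor</code> processes (if one ignores the signal, use <code>killall -9 voipmonitor</code>), (2) <code>systemctl stop</code> resets the systemd unit to a clean stopped state, and (3) <code>systemctl start</code> launches a fresh instance.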
| |
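The same sequence can be wrapped so that it waits for the old process to exit and escalates to <code>SIGKILL</code> only when it does not. A sketch; <code>force_restart</code> and its timeout argument are hypothetical, and it assumes the process name and the systemd unit are both <code>voipmonitor</code>.

```bash
#!/usr/bin/env bash
# Hypothetical helper: kill a hung sniffer, then let systemd start it cleanly.
#   $1 = process/unit name (default voipmonitor), $2 = seconds to wait for exit
force_restart() {
    local name=${1:-voipmonitor} timeout=${2:-10} i
    killall "$name" 2>/dev/null || true          # SIGTERM; "no process" is fine
    for ((i = 0; i < timeout; i++)); do
        pgrep -x "$name" >/dev/null || break     # all instances gone
        sleep 1
    done
    if pgrep -x "$name" >/dev/null; then
        echo "$name still running after ${timeout}s, sending SIGKILL" >&2
        killall -9 "$name" 2>/dev/null || true
    fi
    systemctl stop "$name"                       # reset the unit state
    systemctl start "$name"
}

# Usage (as root): force_restart voipmonitor 15
```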
| | |
| ;4. Verify the running process:
| |
| <syntaxhighlight lang="bash">ps aux | grep '[v]oipmonitor'</syntaxhighlight>
| |
| This command will show the running process and the exact command line arguments it was started with. Critically, ensure it is using the correct configuration file, for example: <code>--config-file /etc/voipmonitor.conf</code>. If it is not, there may be an issue with your startup script.
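When several instances run on one host, it can help to extract the configuration path from the command line programmatically. A minimal sketch, assuming the path is passed as <code>--config-file PATH</code> and contains no spaces; the function also accepts the <code>--config-file=PATH</code> spelling. <code>config_file_from_cmdline</code> is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of --config-file from a command line.
config_file_from_cmdline() {
    local -a args
    local i
    read -r -a args <<< "$1"        # naive split on whitespace
    for ((i = 0; i < ${#args[@]}; i++)); do
        case ${args[i]} in
            --config-file=*)
                echo "${args[i]#--config-file=}"
                return 0 ;;
            --config-file)
                [ -n "${args[i+1]:-}" ] || return 1
                echo "${args[i+1]}"
                return 0 ;;
        esac
    done
    return 1                        # option not present
}

# Usage: config_file_from_cmdline "$(ps -o args= -C voipmonitor | head -n 1)"
```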
| |
| | |
| == Is Network Traffic Reaching the Server? ==
| |
| If the service is running, verify that VoIP packets (SIP/RTP) are actually arriving at the server's network interface. A convenient tool for this is <code>tshark</code>, the command-line version of Wireshark.
| |
| | |
| ;1. Install tshark:
| |
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu
| |
| apt-get update && apt-get install tshark
| |
| | |
| # For CentOS/RHEL/AlmaLinux
| |
| yum install wireshark-cli    # RHEL/AlmaLinux 8+; on CentOS 7 the package is called wireshark
| |
| </syntaxhighlight>
| |
| | |
| ;2. Listen for SIP traffic on the correct interface:
| |
| Replace <code>eth0</code> with the interface name you have configured in <code>voipmonitor.conf</code>.
| |
| <syntaxhighlight lang="bash">
| |
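| # RTP is recognized only when tshark also sees the call's SDP; add
|
| # --enable-heuristic rtp_udp to detect RTP of calls already in progress
|
| tshark -i eth0 -Y "sip || rtp" -n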
| |
| </syntaxhighlight>
| |
| *'''If you see a continuous stream of SIP and RTP packets''', it means traffic is reaching the server, and the problem is likely in VoIPmonitor's configuration (see [[#Check the VoIPmonitor Configuration|Check the VoIPmonitor Configuration]]).
| |
| *'''If you see NO packets''', the problem lies with your network configuration. See [[#Troubleshoot Network and Interface Configuration|Troubleshoot Network and Interface Configuration]].
| |
| | |
| ;3. Advanced: Capture to PCAP File for Definitive Testing
| |
| Live monitoring with tshark is useful for observation, but capturing traffic to a .pcap file during a test call provides definitive evidence for troubleshooting intermittent issues or specific call legs.
| |
| | |
| '''Method 1: Using tcpdump (Recommended)'''
| |
| <syntaxhighlight lang="bash">
| |
| # Start capture on the correct interface (replace eth0)
| |
| tcpdump -i eth0 -s 0 -w /tmp/test_capture.pcap port 5060
| |
| | |
| # Or capture both SIP and RTP traffic:
| |
| tcpdump -i eth0 -s 0 -w /tmp/test_capture.pcap "(port 5060 or udp)"
| |
| | |
| # Let it run while you make a test call with the missing call leg
| |
| # Press Ctrl+C to stop the capture
| |
| | |
| # Analyze the capture file:
| |
| tshark -r /tmp/test_capture.pcap -Y "sip"
| |
| </syntaxhighlight>
| |
| | |
| '''Method 2: Using tshark to capture to file'''
| |
| <syntaxhighlight lang="bash">
| |
| # Start capture:
| |
| tshark -i eth0 -w /tmp/test_capture.pcap -f "tcp port 5060 or udp"
| |
| | |
| # Make your test call, then press Ctrl+C to stop
| |
| | |
| # Analyze the capture:
| |
| tshark -r /tmp/test_capture.pcap -Y "sip" -V
| |
| </syntaxhighlight>
| |
| | |
| '''Decision Tree for PCAP Analysis:'''
| |
| After capturing a test call known to have a missing leg:
| |
| | |
| * '''If SIP packets are missing from the .pcap file:'''
| |
| ** The problem is with your network mirroring configuration (SPAN/TAP port, AWS Traffic Mirroring, etc.)
| |
| ** The packets never reached the VoIPmonitor sensor's network interface
| |
| ** Fix the switch mirroring setup or infrastructure configuration first
| |
| | |
| * '''If SIP packets ARE present in the .pcap file but missing in the VoIPmonitor GUI:'''
| |
| ** The problem is with VoIPmonitor's configuration or processing
| |
| ** Packets reached the NIC but were not processed correctly
| |
| ** Review [[#Check the VoIPmonitor Configuration|VoIPmonitor Configuration]] and [[#Check GUI Capture Rules (Causing Call Stops)|Capture Rules]]
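The decision tree can be written down as a small function, which keeps the verdict consistent when several people run the same test. A sketch only: <code>diagnose_missing_call</code> is hypothetical, and both counts are inputs you collect yourself (the SIP packets of the test call in the <code>.pcap</code>, and the matching calls shown in the GUI).

```bash
#!/usr/bin/env bash
# Hypothetical helper encoding the decision tree above.
#   $1 = SIP packets of the test call found in the .pcap file
#   $2 = matching calls shown in the VoIPmonitor GUI
diagnose_missing_call() {
    local pcap_sip=$1 gui_calls=$2
    if (( pcap_sip == 0 )); then
        echo "network: packets never reached the sensor NIC; fix SPAN/TAP/mirroring first"
    elif (( gui_calls == 0 )); then
        echo "voipmonitor: packets reached the NIC; review voipmonitor.conf and capture rules"
    else
        echo "ok: call is captured"
    fi
}

# Usage:
#   sip=$(tshark -r /tmp/test_capture.pcap -Y sip 2>/dev/null | wc -l)
#   diagnose_missing_call "$sip" 0
```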
| |
| | |
| '''Example Test Call Workflow:'''
| |
| <syntaxhighlight lang="bash">
| |
| # 1. Start capture
| |
| # Filter on the SIP port: tcpdump uses BPF syntax, which has no "sip" keyword
|
| tcpdump -i eth0 -s 0 -w /tmp/test.pcap "port 5060 and (host 10.0.1.100 or host 10.0.2.200)"
|
| | |
| # 2. Make a test call from phone at 10.0.1.100 to 10.0.2.200
|
| # (a call that you know should have recordings but is missing)
|
| | |
| # 3. Stop capture (Ctrl+C)
|
| | |
| # 4. List the Call-IDs present in the capture
|
| tshark -r /tmp/test.pcap -Y "sip" -T fields -e sip.Call-ID | sort -u
|
| | |
| # 5. Verify that packets for both the A-leg and the B-leg exist
|
| tshark -r /tmp/test.pcap -Y "sip && ip.addr == 10.0.1.100"
|
| tshark -r /tmp/test.pcap -Y "sip && ip.addr == 10.0.2.200"
| |
| | |
| # 6. Compare results with VoIPmonitor GUI
| |
| # - If packets found in .pcap: VoIPmonitor software issue
| |
| # - If packets missing from .pcap: Network mirroring issue
| |
| </syntaxhighlight>
| |
| | |
| == Troubleshoot Network and Interface Configuration ==
| |
| If <code>tshark</code> shows no traffic, it means the packets are not being delivered to the operating system correctly.
| |
| | |
| ;1. Check if the interface is UP:
| |
| Ensure the network interface is active.
| |
| <syntaxhighlight lang="bash">ip link show eth0</syntaxhighlight>
| |
| The output should contain the word <code>UP</code>. If it doesn't, bring it up with:
| |
| <syntaxhighlight lang="bash">ip link set dev eth0 up</syntaxhighlight>
| |
| | |
| ;2. Check for Promiscuous Mode (for SPAN/RSPAN Mirrored Traffic):
| |
| '''Important:''' Promiscuous mode requirements depend on your traffic mirroring method:
| |
| | |
| * '''SPAN/RSPAN (Layer 2 mirroring):''' The network interface '''must''' be in promiscuous mode. Mirrored packets retain their original MAC addresses, so the interface would normally ignore them. Promiscuous mode forces the interface to accept all packets regardless of destination MAC.
| |
| | |
| * '''ERSPAN/GRE/TZSP/VXLAN (Layer 3 tunnels):''' Promiscuous mode is '''NOT required'''. These tunneling protocols encapsulate the mirrored traffic inside IP packets that are addressed directly to the sensor's IP address. The operating system receives these packets normally, and VoIPmonitor automatically decapsulates them to extract the inner SIP/RTP traffic.
| |
| | |
| For SPAN/RSPAN deployments, check the current promiscuous mode status:
| |
| <syntaxhighlight lang="bash">ip link show eth0</syntaxhighlight>
| |
| Look for the <code>PROMISC</code> flag.
| |
| | |
| Enable promiscuous mode manually if needed:
| |
| <syntaxhighlight lang="bash">ip link set eth0 promisc on</syntaxhighlight>
| |
| If this solves the problem, you should make the change permanent. The <code>install-script.sh</code> for the sensor usually attempts to do this, but it can fail.
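Both checks in this section (the <code>UP</code> flag and the <code>PROMISC</code> flag) read the same flag list between the angle brackets of the <code>ip link show</code> output. A minimal sketch for scripting them; <code>has_link_flag</code> is a hypothetical name, and it matches flags exactly so that <code>LOWER_UP</code> is not mistaken for <code>UP</code>.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ip link show IFACE` output on stdin and succeed
# if FLAG (e.g. UP, PROMISC) is in the <...> flag list of the first line.
has_link_flag() {
    awk -F'[<>]' 'NR == 1 && NF >= 3 { print $2 }' |  # "BROADCAST,MULTICAST,UP,..."
        tr ',' '\n' |                                 # one flag per line
        grep -x -- "$1" >/dev/null                    # exact match only
}

# Usage:
#   ip link show eth0 | has_link_flag UP      || ip link set dev eth0 up
#   ip link show eth0 | has_link_flag PROMISC || ip link set eth0 promisc on
```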
| |
| | |
| ;3. Verify Your SPAN/Mirror/TAP Configuration:
| |
| This is the most common cause of no traffic. Double-check your network switch or hardware tap configuration to ensure:
| |
| * The correct source ports (where your PBX/SBC is connected) are being monitored.
| |
| * The correct destination port (where your VoIPmonitor sensor is connected) is configured.
| |
| * If you are monitoring traffic across different VLANs, ensure your mirror port is configured to carry all necessary VLAN tags (often called "trunk" mode).
| |
| | |
| == Check the VoIPmonitor Configuration ==
| |
| If <code>tshark</code> sees traffic but VoIPmonitor does not, the problem is almost certainly in <code>voipmonitor.conf</code>.
| |
| | |
| ;1. Check the <code>interface</code> directive:
| |
| :Make sure the <code>interface</code> parameter in <code>/etc/voipmonitor.conf</code> exactly matches the interface where you see traffic with <code>tshark</code>. For example: <code>interface = eth0</code>.
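To compare the configured value with the interface where <code>tshark</code> saw traffic, a small reader for <code>key = value</code> lines is enough. This sketch assumes the simple format used in this guide (one setting per line, comments on their own lines starting with <code>#</code> or <code>;</code>); <code>conf_get</code> is a hypothetical name. It prints every occurrence of the key, so duplicated settings become visible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every value of KEY in a key = value config file.
# Returns 1 if the key is not set at all.
conf_get() {
    local file=$1 key=$2
    awk -v k="$key" '
        /^[ \t]*[#;]/ { next }                 # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                  # e.g. the [general] header
            name = substr($0, 1, eq - 1)
            val  = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)  # trim whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (name == k) { print val; hit = 1 }
        }
        END { exit (hit ? 0 : 1) }
    ' "$file"
}

# Usage: ip link show "$(conf_get /etc/voipmonitor.conf interface | tail -n 1)"
```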
| |
| | |
| ;2. Check the <code>sipport</code> directive:
| |
| :By default, VoIPmonitor only treats traffic on port 5060 as SIP. If your PBX uses a different port for SIP, you must add it. For example:
| |
| :<code>sipport = 5060,5080</code>
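To check whether a given port is already covered, the comma-separated list can be split in the shell. A minimal sketch; <code>sipport_contains</code> is hypothetical and treats every entry as a single port number (it does not interpret any other syntax your version may accept).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if PORT is one of the comma-separated entries.
sipport_contains() {
    local list=$1 port=$2 item
    local IFS=,                      # split the list on commas only
    for item in $list; do
        item=${item//[[:space:]]/}   # tolerate "5060, 5080"
        if [ "$item" = "$port" ]; then
            return 0
        fi
    done
    return 1
}

# Usage:
#   sipport_contains "$(grep -E '^[[:space:]]*sipport' /etc/voipmonitor.conf | cut -d= -f2)" 5080 \
#       || echo "5080 is not in sipport"
```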
| |
| | |
| ;3. Check for a restrictive <code>filter</code>:
| |
| :If you have a BPF <code>filter</code> configured, ensure it is not accidentally excluding the traffic you want to see. For debugging, try commenting out the <code>filter</code> line entirely and restarting the sensor.
| |
| | |
| == Check GUI Capture Rules (Causing Call Stops) ==
| |
| If <code>tshark</code> sees SIP traffic and the sniffer configuration appears correct, but the sensor stops processing calls (packets still arrive on the network interface, yet no new CDRs appear), GUI capture rules may be the culprit.
| |
| | |
| Capture rules configured in the GUI can instruct the sniffer to ignore ("skip") all processing for matched calls. This includes calls matching specific IP addresses or telephone number prefixes.
| |
| | |
| ;1. Review existing capture rules:
| |
| :Navigate to '''GUI → Capture rules''' and examine all rules for any that might be blocking your traffic.
| |
| :Look specifically for rules with the '''Skip''' option set to '''ON''' (displayed as "Skip: ON"). The Skip option instructs the sniffer to completely ignore matching calls (no files, RTP analysis, or CDR creation).
| |
| | |
| ;2. Test by temporarily removing all capture rules:
| |
| :To isolate the issue, first create a backup of your GUI configuration:
| |
| :* Navigate to '''Tools → Backup & Restore → Backup GUI → Configuration tables'''
| |
| :* This saves your current settings including capture rules
| |
| :* Delete all capture rules from the GUI
| |
| :* Click the '''Apply''' button to save changes
| |
| :* Reload the sniffer by clicking the green '''"reload sniffer"''' button in the control panel
| |
| :* Test if calls are now being processed correctly
| |
| :* If resolved, restore the configuration from the backup and systematically investigate the rules to identify the problematic one
| |
| | |
| ;3. Identify the problematic rule:
| |
| :* After restoring your configuration, remove rules one at a time and reload the sniffer after each removal
| |
| :* When calls start being processed again, you have identified the problematic rule
| |
| :* Review the rule's match criteria (IP addresses, prefixes, direction) against your actual traffic pattern
| |
| :* Adjust the rule's conditions or Skip setting as needed
| |
| | |
| ;4. Verify rules are reloaded:
| |
| :After making changes to capture rules, remember that changes are '''not automatically applied''' to the running sniffer. You must click the '''"reload sniffer"''' button in the control panel, or the sniffer will keep applying the previous rule set.
| |
| | |
| For more information on capture rules, see [[Capture_rules]].
| |
| | |
| == Troubleshoot MySQL/MariaDB Database Connection Errors ==
| |
| If you see "Connection refused (111)" errors or the sensor cannot connect to your database server, the issue is with the MySQL/MariaDB database connection configuration in <code>/etc/voipmonitor.conf</code>.
| |
| | |
| Error 111 (Connection refused) means the database host answered but refused the TCP connection: either no MySQL/MariaDB service is listening on that address and port, or a firewall rule is actively rejecting the connection (a firewall that silently drops packets produces a timeout, error 110, instead). This commonly happens after a migration when the database server's IP address has changed.
| |
| | |
| ;1. Check for database connection errors in sensor logs:
| |
| Verify the specific error from the sensor process:
| |
| <syntaxhighlight lang="bash">
| |
| # Any systemd-based distribution (journal)
|
| journalctl -u voipmonitor --since "1 hour ago" | grep -iE "mysql|database|connection|can.t connect"
|
| | |
| # Debian/Ubuntu with a traditional syslog file (--line-buffered keeps tail -f output flowing)
|
| tail -f /var/log/syslog | grep --line-buffered voipmonitor | grep -iE "mysql|database|connection"
|
| | |
| # CentOS/RHEL/AlmaLinux
|
| tail -f /var/log/messages | grep --line-buffered voipmonitor | grep -iE "mysql|database|connection"
| |
| </syntaxhighlight>
| |
| | |
| Look for errors like:
| |
| * <code>Can't connect to MySQL server on '192.168.1.10' (111)</code> - Connection refused (wrong host/port)
| |
| * <code>Access denied for user 'root'@'localhost'</code> - Wrong username/password
| |
| * <code>Unknown database 'voipmonitor'</code> - Wrong database name
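When the log contains many lines, a small classifier can map each message to its likely cause. A sketch based only on the three messages above plus the timeout case (error 110, <code>ETIMEDOUT</code> on Linux); <code>classify_db_error</code> is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a database error message to its likely cause.
classify_db_error() {
    case $1 in
        *"(111)"*|*"Connection refused"*)
            echo "refused: nothing listening on mysqlhost:mysqlport, or a firewall REJECT rule" ;;
        *"(110)"*|*"timed out"*)
            echo "timeout: host unreachable or a firewall silently dropping packets" ;;
        *"Access denied"*)
            echo "auth: wrong mysqlusername/mysqlpassword or missing privileges" ;;
        *"Unknown database"*)
            echo "schema: mysqldatabase does not exist on the server" ;;
        *)
            echo "unknown"
            return 1 ;;
    esac
}

# Usage:
#   journalctl -u voipmonitor --since "1 hour ago" | grep -iE "mysql|database" |
#       while IFS= read -r line; do classify_db_error "$line"; done | sort | uniq -c
```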
| |
| | |
| ;2. Verify database connection parameters in <code>voipmonitor.conf</code>:
| |
| Open <code>/etc/voipmonitor.conf</code> and check the MySQL connection settings:
| |
| <syntaxhighlight lang="ini">
| |
| # Database Connection Parameters
| |
| # IP address or hostname of the MySQL/MariaDB server
|
| mysqlhost = 192.168.1.10
|
| # TCP port of the database server (default: 3306)
|
| mysqlport = 3306
|
| # Database username
|
| mysqlusername = root
|
| # Database password
|
| mysqlpassword = your_password
|
| # Database name
|
| mysqldatabase = voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| Key points:
| |
| * <code>mysqlhost</code>: Should be the IP address or hostname of the database server. After migration, this may have changed.
| |
| * <code>mysqlport</code>: Port 3306 is the default, but your database might use a different port.
| |
| * <code>mysqlusername</code>: Database user must have proper privileges.
| |
| * <code>mysqlpassword</code>: Ensure there are no typos or special character issues (surround with single quotes if needed).
| |
| * <code>mysqldatabase</code>: Database must exist on the server.
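To make sure a manual test uses exactly the values the sensor reads, the <code>mysql</code> client command can be generated from the file. A sketch; <code>mysql_cmd_from_conf</code> is hypothetical, it assumes one <code>key = value</code> per line, and it falls back to <code>localhost</code> and <code>3306</code> when <code>mysqlhost</code> or <code>mysqlport</code> is absent (an assumption for this sketch, not a documented default). The password is left out so that <code>-p</code> prompts for it instead of exposing it in the process list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a mysql client command from the mysql* settings
# of a voipmonitor.conf-style file. "-p" without a value makes the client prompt.
mysql_cmd_from_conf() {
    awk '
        /^[ \t]*[#;]/ { next }                 # comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1); v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            cfg[k] = v                         # last occurrence wins in this sketch
        }
        END {
            # Fallbacks are assumptions for this sketch, not documented defaults.
            host = ("mysqlhost" in cfg) ? cfg["mysqlhost"] : "localhost"
            port = ("mysqlport" in cfg) ? cfg["mysqlport"] : "3306"
            printf "mysql -h %s -P %s -u %s -p %s\n", host, port, cfg["mysqlusername"], cfg["mysqldatabase"]
        }
    ' "$1"
}

# Usage: mysql_cmd_from_conf /etc/voipmonitor.conf   # prints the command to run
```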
| |
| | |
| ;3. Test MySQL connectivity from the sensor host:
| |
| Use <code>nc</code> or <code>telnet</code> to test whether the database port is reachable from the sensor:
| |
| <syntaxhighlight lang="bash">
| |
| # Test basic TCP connectivity (replace IP and port as needed)
| |
| nc -zv 192.168.1.10 3306
| |
| | |
| # Or using telnet
| |
| telnet 192.168.1.10 3306
| |
| </syntaxhighlight>
| |
| | |
| If you see "Connection refused", the database service is not running or not listening on that port.
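If neither <code>nc</code> nor <code>telnet</code> is installed, bash itself can open a TCP connection through its <code>/dev/tcp/HOST/PORT</code> redirection. A sketch; <code>tcp_port_open</code> is a hypothetical name and the 3-second default timeout is arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if HOST:PORT accepts a TCP connection.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
tcp_port_open() {
    local host=$1 port=$2 secs=${3:-3}
    # Inside the inner script, $0 and $1 are the host and port passed below.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage:
#   if tcp_port_open 192.168.1.10 3306; then echo open; else echo "refused or filtered"; fi
```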
| |
| | |
| ;4. Test MySQL authentication using credentials from <code>voipmonitor.conf</code>:
| |
| Use the same credentials configured in <code>voipmonitor.conf</code> to verify they work:
| |
| <syntaxhighlight lang="bash">
| |
| mysql -h 192.168.1.10 -P 3306 -u root -p'your_password' voipmonitor
| |
| </syntaxhighlight>
| |
| | |
| Commands to run inside mysql client to verify:
| |
| <syntaxhighlight lang="sql">
| |
| -- Check if connected correctly
| |
| SELECT USER(), CURRENT_USER();
| |
| | |
| -- Check database exists
| |
| SHOW DATABASES LIKE 'voipmonitor';
| |
| | |
| -- List tables (confirms the user can access the schema)
| |
| USE voipmonitor;
| |
| SHOW TABLES;
| |
| EXIT;
| |
| </syntaxhighlight>
| |
| | |
| ;5. Compare with a working sensor's configuration:
| |
| If you have other sensors that successfully connect to the database, compare their configuration files:
| |
| <syntaxhighlight lang="bash">
| |
| # Compare database settings between working and failing sensors
| |
| diff <(grep -E "^mysql" /etc/voipmonitor.conf) <(grep -E "^mysql" /path/to/working/sensor/voipmonitor.conf)
| |
| </syntaxhighlight>
| |
| | |
| Common discrepancies after migration:
| |
| * Wrong database server IP address (<code>mysqlhost</code>)
| |
| * Wrong database port (<code>mysqlport</code>)
| |
| * Different password due to migration to new database server
| |
| * Using <code>localhost</code> vs actual IP address
| |
| | |
| ;6. Check firewall and network connectivity:
| |
| Ensure the sensor can reach the database server and the required port is open:
| |
| <syntaxhighlight lang="bash">
| |
| # Test network reachability
| |
| ping -c 4 192.168.1.10
| |
| | |
| # Check if MySQL port is reachable
| |
| nc -zv 192.168.1.10 3306
| |
| | |
| # Check firewall rules (if using firewalld)
| |
| firewall-cmd --list-ports
| |
| | |
| # Check firewall rules (if using iptables)
| |
| iptables -L -n | grep 3306
| |
| </syntaxhighlight>
| |
| | |
| If the port is blocked, you may need to:
| |
| * Open port 3306 in the firewall on the database server
| |
| * Configure network ACLs or security groups (for cloud deployments)
| |
| * Check VPN/SSH tunnel configurations
| |
| | |
| ;7. Verify MySQL/MariaDB service is running:
| |
| On the database server, check if the service is active:
| |
| <syntaxhighlight lang="bash">
| |
| # Check MySQL/MariaDB service status
| |
| systemctl status mariadb # or systemctl status mysql
| |
| | |
| # Restart service if needed
| |
| systemctl restart mariadb
| |
| | |
| # Check which port MySQL is listening on (MariaDB 10.5+ runs as mariadbd)
|
| ss -tulpn | grep -E "mysqld|mariadbd"
|
| # or
|
| netstat -tulpn | grep -E "mysqld|mariadbd"
| |
| </syntaxhighlight>
| |
| | |
| MySQL should be listening on the interface and port specified in your <code>voipmonitor.conf</code> <code>mysqlhost</code> and <code>mysqlport</code> settings.
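A frequent cause of error 111 from a remote sensor is a database server that listens only on the loopback address. A sketch that reads <code>ss -tln</code> output and reports whether every listener on the port is loopback-only; <code>listen_addrs</code> and <code>loopback_only</code> are hypothetical names. If it reports loopback-only, the <code>bind-address</code> setting in the MySQL/MariaDB server configuration is the usual place to look.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: read `ss -tln` output on stdin.
# listen_addrs PORT -> prints the local address of every listener on PORT
listen_addrs() {
    local port=${1:-3306}
    awk -v p="$port" 'NR > 1 && $1 == "LISTEN" {
        n = split($4, parts, ":")            # last part is the port
        if (parts[n] == p) print substr($4, 1, length($4) - length(p) - 1)
    }'
}

# loopback_only PORT -> 0 if all listeners are loopback, 1 if any is not,
#                       2 if nothing listens on PORT at all
loopback_only() {
    local addrs a
    addrs=$(listen_addrs "${1:-3306}")
    [ -n "$addrs" ] || return 2
    while read -r a; do
        case $a in
            127.*|"[::1]") ;;               # loopback address
            *) return 1 ;;                  # reachable from other hosts
        esac
    done <<< "$addrs"
    return 0
}

# Usage (on the database server): ss -tln | loopback_only 3306 && echo "check bind-address"
```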
| |
| | |
| ;8. Apply configuration changes and restart the sensor:
| |
| After correcting the database connection settings in <code>/etc/voipmonitor.conf</code>:
| |
| <syntaxhighlight lang="bash">
| |
| # Restart the VoIPmonitor service to apply changes
| |
| systemctl restart voipmonitor
| |
| | |
| # Alternatively, reload without full restart (if supported in your version)
| |
| echo 'reload' | nc 127.0.0.1 5029
| |
| | |
| # Verify the service started successfully
| |
| systemctl status voipmonitor
| |
| | |
| # Check logs for database connection confirmation
| |
| journalctl -u voipmonitor -n 20
| |
| </syntaxhighlight>
| |
| | |
| Look for a successful database connection message in the logs, which typically appears within the first few seconds after startup.
| |
| | |
| ;9. Common Troubleshooting Scenarios:
| |
| | |
| <b>Scenario A: Database server IP changed after migration</b>
| |
| * Symptom: "Can't connect to MySQL server on '10.1.1.10' (111)"
| |
| * Fix: Update <code>mysqlhost</code> in <code>/etc/voipmonitor.conf</code> to the new database server IP
| |
| | |
| <b>Scenario B: Wrong MySQL username or password</b>
| |
| * Symptom: "Access denied for user 'user'@'host'"
| |
| * Fix: Verify credentials match the database server's user permissions, update <code>mysqlusername</code> and <code>mysqlpassword</code>
| |
| | |
| <b>Scenario C: Database service not running</b>
| |
| * Symptom: "Connection refused (111)" or "Connection timed out"
| |
| * Fix: Start MySQL/MariaDB service on the database server: <code>systemctl start mariadb</code>
| |
| | |
| <b>Scenario D: Firewall blocking port 3306</b>
| |
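| * Symptom: <code>nc</code> times out (packets silently dropped) or reports "Connection refused" (REJECT rule) although MySQL is running. If <code>ss</code> on the database server shows MySQL listening only on <code>127.0.0.1</code>, remote connections are refused as well.
|
| * Fix: Open port 3306 in the firewall on the database server (and in any cloud security group), or change <code>bind-address</code> in the server configuration if MySQL listens only on loopback. If the port is reachable but the server answers with a "Host ... is not allowed to connect" error, grant the database user access from the sensor's IP.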
| |
| | |
| <b>Scenario E: Localhost vs remote connection confusion</b>
| |
| * Symptom: Connection works locally but fails from sensor
| |
| * Fix: Ensure <code>mysqlhost</code> uses the actual IP address (not <code>localhost</code> or <code>127.0.0.1</code>) if the sensor is on a different host
| |
| | |
| For more detailed information about all <code>mysql*</code> configuration parameters, see [[Sniffer_configuration#Database_Configuration]].
| |
| | |
| == Check for Storage Hardware Errors (HEAP FULL / packetbuffer Issues) ==
| |
| If the sensor is crashing with "HEAP FULL" errors or showing "packetbuffer: MEMORY IS FULL" messages, you must distinguish between '''actual storage hardware failures''' (requires disk replacement) and '''performance bottlenecks''' (requires tuning).
| |
| | |
| ;1. Check kernel message buffer for storage errors:
| |
| <syntaxhighlight lang="bash">dmesg -T | grep -iE "ext4-fs error|i/o error|soft lockup|nvram warning|ata.*failed|sda.*error|disk failure|smart error" | tail -50</syntaxhighlight>
| |
| | |
| Look for these hardware error indicators:
| |
| * <code>ext4-fs error</code> - Filesystem corruption or disk failure
| |
| * <code>I/O error</code> or <code>BUG: soft lockup</code> - Disk read/write failures
| |
| * <code>NVRAM WARNING: nvram_check: failed</code> - RAID controller battery/capacitor issues
| |
| * <code>ata.*: FAILED</code> - Hard drive SMART failure
| |
| * <code>Buffer I/O error</code> - Disk unable to complete operations
| |
| | |
| If you see ANY of these errors:
| |
| * The storage subsystem is failing and likely needs hardware replacement
| |
| * Do not attempt performance tuning - replace the failed disk/RAID first
| |
| * Check SMART status: <code>smartctl -a /dev/sda</code>
| |
| * Check RAID health: <code>cat /proc/mdstat</code> or RAID controller tools
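The triage above (hardware errors first, tuning second) can be scripted against saved kernel messages. A sketch using the error patterns listed above; <code>check_storage_errors</code> is hypothetical, and a match is a reason to investigate, not proof of a failed disk.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read kernel log lines on stdin (e.g. `dmesg -T`),
# print a verdict, and return 1 if any line looks like a storage error.
check_storage_errors() {
    local re='ext4-fs error|i/o error|soft lockup|nvram warning|ata.*failed|sda.*error|disk failure|smart error'
    local hits
    hits=$(grep -ciE "$re" || true)     # grep -c prints 0 (and exits 1) on no match
    if [ "$hits" -gt 0 ]; then
        echo "hardware: $hits suspicious line(s); check SMART and RAID before tuning"
        return 1
    fi
    echo "clean: treat HEAP FULL as a performance bottleneck"
}

# Usage: dmesg -T | check_storage_errors || smartctl -a /dev/sda
```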
| |
| | |
| ;2. If dmesg is clean of errors → Performance Bottleneck:
| |
| If the kernel logs show no storage errors, the issue is a performance bottleneck (disk too slow, network latency, etc.).
| |
| | |
| <b>Check disk I/O performance:</b>
| |
| <syntaxhighlight lang="bash">
| |
| # Current I/O wait (should be < 10% normally)
| |
| iostat -x 5
| |
| | |
| # Detailed disk stats
| |
| dstat -d
| |
| | |
| # Disk latency of the spool directory (adjust the path if your spooldir differs)
|
| ioping -c 10 /var/spool/voipmonitor
| |
| </syntaxhighlight>
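If <code>iostat</code> (package <code>sysstat</code>) is not installed, the I/O wait share can be computed from two samples of the aggregate <code>cpu</code> line in <code>/proc/stat</code>. A sketch; <code>iowait_pct</code> is hypothetical, and it sums the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal) as the total.

```bash
#!/usr/bin/env bash
# Hypothetical helper: iowait percentage between two "cpu " lines of /proc/stat.
# Columns after "cpu": user nice system idle iowait irq softirq steal guest ...
# Guest time is already counted in user, so only the first eight are summed.
iowait_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, " "); split(b, y, " ")
        total = 0
        for (i = 2; i <= 9; i++) total += y[i] - x[i]
        io = y[6] - x[6]                     # field 6 = iowait
        printf "%.1f\n", (total > 0) ? 100 * io / total : 0
    }'
}

# Usage:
#   s1=$(grep '^cpu ' /proc/stat); sleep 5; s2=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$s1" "$s2"
```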
| |
| | |
| <b>Check NFS latency (if using NFS storage):</b>
| |
| <syntaxhighlight lang="bash">
| |
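| # Test NFS write speed (conv=fdatasync waits until the data is on the server, not just in the page cache)
|
| time dd if=/dev/zero of=/var/spool/voipmonitor/testfile bs=1M count=100 conv=fdatasync
|
| # Drop the page cache so the read below actually goes over NFS (requires root)
|
| sync && echo 3 > /proc/sys/vm/drop_caches
|
| time cat /var/spool/voipmonitor/testfile > /dev/null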
| |
| rm /var/spool/voipmonitor/testfile
| |
| | |
| # Check NFS mount options
| |
| mount | grep nfs
| |
| </syntaxhighlight>
| |
| | |
| <b>Common performance solutions:</b>
| |
| * Use SSD/NVMe for VoIPmonitor spool directory
| |
| * Ensure proper NIC queue settings for high-throughput NFS
| |
| * Check network switch port configuration for NFS
| |
| * Review [[Scaling]] guide for detailed optimization
| |
| | |
| See also [[IO_Measurement]] for comprehensive disk benchmarking tools.
| |
| | |
| == Check for OOM (Out of Memory) Issues ==
| |
| If VoIPmonitor suddenly stops processing CDRs and a service restart temporarily restores functionality, the system may be experiencing OOM (Out of Memory) killer events. The Linux OOM killer terminates processes when available RAM is exhausted, and MySQL (<code>mysqld</code>) is a common target due to its memory-intensive nature.
| |
| | |
| ;1. Check for OOM killer events in kernel logs:
| |
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu
| |
| grep -i "out of memory\|killed process" /var/log/syslog | tail -20
| |
| | |
| # For CentOS/RHEL/AlmaLinux
| |
| grep -i "out of memory\|killed process" /var/log/messages | tail -20
| |
| | |
| # Also check dmesg:
| |
| dmesg | grep -i "killed process" | tail -10
| |
| </syntaxhighlight>
| |
| Typical OOM killer messages look like:
| |
| <syntaxhighlight lang="text">
| |
| Out of memory: Kill process 1234 (mysqld) score 123 or sacrifice child
| |
| Killed process 1234 (mysqld) total-vm: 12345678kB, anon-rss: 1234567kB
| |
| </syntaxhighlight>
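To see at a glance which processes the OOM killer has chosen most often, the process names can be extracted from those lines and counted. A sketch; <code>oom_victims</code> is hypothetical and matches the <code>Killed process PID (name)</code> form shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count OOM-killer victims by process name.
# Reads kernel log lines on stdin; prints "COUNT NAME", most frequent first.
oom_victims() {
    sed -nE 's/.*Killed process [0-9]+ \(([^)]+)\).*/\1/p' | sort | uniq -c | sort -rn
}

# Usage: dmesg | oom_victims
```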
| |
| | |
| ;2. Monitor current memory usage:
| |
| <syntaxhighlight lang="bash">
| |
| # Check available memory (look for low 'available' or 'free' values)
| |
| free -h
| |
| | |
| # Check per-process memory usage (sorted by RSS)
| |
| ps aux --sort=-%mem | head -15
| |
| | |
| # Check MySQL/MariaDB memory usage (values are reported in kB)
|
| grep -E "VmSize|VmRSS" /proc/$(pgrep -o -x "mysqld|mariadbd")/status
| |
| </syntaxhighlight>
| |
| Warning signs:
| |
| * '''Available memory consistently below 500MB during operation'''
| |
| * '''MySQL consuming most of the available RAM'''
| |
| * '''Swap usage near 100% (if swap is enabled)'''
| |
| * '''Frequent process restarts without clear error messages'''
| |
| | |
| ;3. First Fix: Check and correct innodb_buffer_pool_size:
| |
| Before upgrading hardware, verify that <code>innodb_buffer_pool_size</code> is not set too high. This is a common cause of OOM incidents. If MySQL/MariaDB is consuming most of the available RAM, the buffer pool size is likely configured incorrectly for your system.
| |
| | |
| '''Calculate the correct buffer pool size:'''
| |
| For a server running both VoIPmonitor and MySQL on the same host:
| |
| <syntaxhighlight lang="text">
| |
| Formula: innodb_buffer_pool_size = (Total RAM - VoIPmonitor memory - OS & services overhead) / 2
|
| | |
| Example for a 32GB server:
|
| - Total RAM: 32GB
|
| - VoIPmonitor process memory (check with ps aux): 2GB
|
| - OS + other services overhead: 2GB
|
| - Safety margin: the other half of the remaining RAM, left for MySQL's other buffers, connections and the OS page cache
|
| | |
| Calculation:
|
| Remaining RAM = 32GB - 2GB - 2GB = 28GB
|
| Recommended innodb_buffer_pool_size = 28GB / 2 = 14G
| |
| </syntaxhighlight>
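The same arithmetic as a small shell function can help when sizing several servers. A sketch of the rule of thumb above, not an official sizing tool; <code>buffer_pool_gb</code> is hypothetical, works in whole GB and rounds down.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the rule of thumb above (all values in whole GB):
#   innodb_buffer_pool_size = (total RAM - VoIPmonitor - OS overhead) / 2
buffer_pool_gb() {
    local total=$1 voipmonitor=$2 os=$3
    local remaining=$(( total - voipmonitor - os ))
    if (( remaining <= 0 )); then
        echo "not enough RAM left for MySQL/MariaDB" >&2
        return 1
    fi
    echo $(( remaining / 2 ))           # integer division rounds down
}

# Usage: echo "innodb_buffer_pool_size = $(buffer_pool_gb 32 2 2)G"
# (VoIPmonitor's resident memory in kB: ps -o rss= -C voipmonitor)
```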
| |
| | |
| '''Edit the MariaDB configuration file:'''
| |
| <syntaxhighlight lang="ini">
| |
| # Common locations: /etc/mysql/my.cnf, /etc/mysql/mariadb.conf.d/50-server.cnf, /etc/my.cnf.d/
|
| # The setting belongs in the [mysqld] section (MariaDB also reads [mariadb]); adjust the value to your calculation
|
| [mysqld]
|
| innodb_buffer_pool_size = 14G
| |
| </syntaxhighlight>
| |
| | |
| '''Restart MariaDB to apply:'''
| |
| <syntaxhighlight lang="bash">systemctl restart mariadb # or systemctl restart mysql</syntaxhighlight>
| |
| | |
| If the OOM events stop after correcting <code>innodb_buffer_pool_size</code>, no hardware upgrade is needed.
| |
| | |
| ;4. Second Fix: Reduce VoIPmonitor buffer memory usage:
| |
| In addition to MySQL memory consumption, VoIPmonitor itself allocates significant memory for packet buffers. The total buffer memory used by VoIPmonitor is calculated based on:
| |
| | |
| '''VoIPmonitor Buffer Memory Calculation:'''
| |
| * '''<code>ringbuffer</code>''': Ring buffer size in MB per interface (default: 50MB, recommended ≥500MB for >100 Mbit traffic)
| |
| * '''<code>max_buffer_mem</code>''': Maximum buffer memory limit in MB (default: 2000MB)
| |
| * '''Number of sniffing interfaces''': Each interface gets its own ringbuffer allocation
| |
| * '''Total formula''': Approximate total = (ringbuffer × number of interfaces) + max_buffer_mem
| |
| | |
| If you are monitoring multiple interfaces (e.g., <code>interface = eth0,eth1,eth2</code>), each interface uses a separate ringbuffer. With the default ringbuffer of 50MB and 3 interfaces, that's 150MB plus max_buffer_mem of 2000MB, totaling approximately 2150MB.
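The estimate can be computed directly from the three settings. A sketch of the approximate formula above (it ignores any other memory the sniffer uses); <code>voipmonitor_buffer_mb</code> is hypothetical and expects the <code>interface</code> value as written in the configuration, e.g. <code>eth0,eth1,eth2</code>.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate VoIPmonitor buffer memory in MB,
# (ringbuffer x number of interfaces) + max_buffer_mem.
voipmonitor_buffer_mb() {
    local ringbuffer=$1 iface_list=$2 max_buffer_mem=$3
    local -a ifaces
    IFS=, read -r -a ifaces <<< "$iface_list"   # "eth0,eth1,eth2" -> 3 entries
    echo $(( ringbuffer * ${#ifaces[@]} + max_buffer_mem ))
}

# Usage: voipmonitor_buffer_mb 50 eth0,eth1,eth2 2000
```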
| |
| | |
| '''To reduce VoIPmonitor memory usage:'''
| |
| | |
| Edit <code>/etc/voipmonitor.conf</code> and decrease buffer settings:
| |
| <syntaxhighlight lang="ini">
| |
| # Reduce ringbuffer for each interface (e.g., from 50 to 20)
| |
| ringbuffer = 20
| |
| | |
| # Reduce maximum buffer memory (e.g., from 2000 to 1000)
| |
| max_buffer_mem = 1000
| |
| | |
| # Alternatively, reduce the number of sniffing interfaces if not all are needed
|
| # (e.g. eth0,eth1 instead of eth0,eth1,eth2,eth3)
|
| interface = eth0,eth1
| |
| </syntaxhighlight>
| |
| | |
| After making changes, restart the VoIPmonitor service:
| |
| <syntaxhighlight lang="bash">systemctl restart voipmonitor</syntaxhighlight>
| |
| | |
| '''Important notes:'''
| |
| * Reducing <code>ringbuffer</code> may increase packet loss during traffic spikes
| |
| * Reducing <code>max_buffer_mem</code> affects how many packets can be buffered before being written to disk
| |
| * Monitor packet loss statistics in the GUI after reducing buffers to ensure acceptable performance
| |
| | |
| ;5. Solution: Increase physical memory (if buffer tuning is insufficient):
| |
| If correcting both MySQL and VoIPmonitor buffer settings does not resolve the OOM issues, upgrade the server's physical RAM. After upgrading:
| |
| * Verify memory improvements with <code>free -h</code>
| |
| * Recalculate and adjust <code>innodb_buffer_pool_size</code> to utilize the additional memory
| |
| * Re-tune <code>ringbuffer</code> and <code>max_buffer_mem</code> for the new memory capacity
| |
| * Monitor for several days to ensure OOM events stop
| |
| | |
| == Sensor Upgrade Fails with "Permission denied" from /tmp ==
| |
| If the sensor upgrade process fails with "Permission denied" errors when executing scripts from the <code>/tmp</code> directory, or the service fails to restart after upgrade, the <code>/tmp</code> partition may be mounted with the <code>noexec</code> flag.
| |
| | |
| The <code>noexec</code> mount option prevents execution of any script or binary from the <code>/tmp</code> directory for security reasons. However, the VoIPmonitor sensor upgrade process uses <code>/tmp</code> for temporary script execution.
| |
| | |
| ;1. Check the mount options for /tmp:
| |
| <syntaxhighlight lang="bash">mount | grep " on /tmp "</syntaxhighlight>
| |
| Look for the <code>noexec</code> flag in the mount options. Output will show something like:
| |
| <syntaxhighlight lang="text">/dev/sda2 on /tmp type ext4 rw,relatime,noexec,nosuid,nodev</syntaxhighlight>
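For scripted pre-upgrade checks, matching <code>noexec</code> as a whole option avoids false positives from a plain substring search. A sketch; <code>has_noexec</code> is hypothetical, and the usage line relies on <code>findmnt -T</code>, which reports the filesystem that contains <code>/tmp</code> even when <code>/tmp</code> is not a separate mount.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a comma-separated mount option string
# contains the noexec option as a whole word.
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Usage: has_noexec "$(findmnt -no OPTIONS -T /tmp)" && echo "/tmp is mounted noexec"
```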
| |
| | |
| ;2. Remount /tmp without noexec (temporary fix):
| |
| <syntaxhighlight lang="bash">mount -o remount,exec /tmp</syntaxhighlight>
| |
| Verify the change:
| |
| <syntaxhighlight lang="bash">mount | grep " on /tmp "</syntaxhighlight>
| |
| The output should no longer contain <code>noexec</code>.
| |
| | |
| ;3. Make the change permanent (edit /etc/fstab):
| |
| Open the <code>/etc/fstab</code> file and locate the line corresponding to the <code>/tmp</code> mount point. Remove the <code>noexec</code> option from that line.
| |
| <syntaxhighlight lang="bash">nano /etc/fstab</syntaxhighlight>
| |
| Example:
| |
| <syntaxhighlight lang="text">
| |
| # Before:
| |
| /dev/sda2 /tmp ext4 rw,relatime,noexec,nosuid,nodev 0 0
| |
| | |
| # After (remove noexec):
| |
| /dev/sda2 /tmp ext4 rw,relatime,nosuid,nodev 0 0
| |
| </syntaxhighlight>
| |
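| If <code>/tmp</code> is a separate partition, remount it so that the edited options take effect. On systemd systems, run <code>systemctl daemon-reload</code> first so the generated mount unit matches <code>/etc/fstab</code>: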
| |
| <syntaxhighlight lang="bash">mount -o remount /tmp</syntaxhighlight>
| |
| | |
| ;4. Re-run the sensor upgrade:
| |
| After fixing the mount options, retry the sensor upgrade process.
| |
| | |
| == "No space left on device" Despite Disks Having Free Space ==
| |
| If system services (such as php-fpm or voipmonitor) or commands such as <code>screen</code> fail with a "No space left on device" error even though <code>df -h</code> shows sufficient free space on the main disks, the cause is often a full '''temporary filesystem''' (<code>/tmp</code> or <code>/run</code>) rather than the main storage.
| |
| | |
| ;1. Check usage of temporary filesystems:
| |
| <syntaxhighlight lang="bash">
| |
| # Check /tmp usage
| |
| df -h /tmp
| |
| | |
| # Check /run usage
| |
| df -h /run
| |
| </syntaxhighlight>
| |
| | |
| If <code>/tmp</code> or <code>/run</code> show 100% usage despite main filesystems having free space, these temporary filesystems need to be cleaned.
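To check several filesystems at once, the sketch below flags any filesystem whose usage is at or above a threshold. The same error is also returned when a filesystem runs out of inodes, which <code>df -h</code> does not show. For that reason, the helper accepts both <code>df -P</code> output (block usage) and <code>df -Pi</code> output (inode usage). <code>over_threshold</code> is a hypothetical helper name, and it assumes mount points without spaces.

<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: reads POSIX-format df output (df -P or df -Pi) on
# stdin and prints "MOUNTPOINT USE%" for every filesystem whose use
# percentage is at or above THRESHOLD. The header line is skipped.
over_threshold() {
  local threshold="$1"
  awk -v t="$threshold" 'NR > 1 {
    pct = $5
    sub(/%$/, "", pct)    # "100%" -> "100"; a "-" value counts as 0
    if (pct + 0 >= t + 0) print $6, $5
  }'
}
</syntaxhighlight>

For example, <code>df -P /tmp /run | over_threshold 90</code> lists filesystems that are at least 90% full, and <code>df -Pi /tmp /run | over_threshold 90</code> does the same for inodes.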
| |
| | |
| ;2. Check what is consuming space:
| |
| <syntaxhighlight lang="bash">
| |
| # Find the largest entries in /tmp and /run
| |
| du -sh /tmp/* /run/* 2>/dev/null | sort -hr | head -20
| |
| | |
| # Check journal disk usage
| |
| journalctl --disk-usage
| |
| </syntaxhighlight>
| |
| | |
| ;3. Immediate cleanup of journal logs:
| |
| System journal logs stored in <code>/run/log/journal/</code> can fill up the <code>/run</code> filesystem.
| |
| <syntaxhighlight lang="bash">
| |
| # Limit journal to 100MB total size
| |
| sudo journalctl --vacuum-size=100M
| |
| | |
| # Or limit by time (keep only last 2 days)
| |
| sudo journalctl --vacuum-time=2d
| |
| </syntaxhighlight>
| |
| | |
| ;4. Permanent solution - Configure journal rotation:
| |
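| Edit <code>/etc/systemd/journald.conf</code> and set the size limits in the <code>[Journal]</code> section. <code>RuntimeMaxUse</code> caps the volatile journal in <code>/run/log/journal</code> (the case described above), while <code>SystemMaxUse</code> caps the persistent journal in <code>/var/log/journal</code>:
| |
| <syntaxhighlight lang="ini">
| |
| [Journal]
| |
| SystemMaxUse=100M
| |
| RuntimeMaxUse=100M
| |
| MaxRetentionSec=1month
| |
| </syntaxhighlight>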
| |
| | |
| Apply changes:
| |
| <syntaxhighlight lang="bash">sudo systemctl restart systemd-journald</syntaxhighlight>
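journald also reads drop-in files from <code>/etc/systemd/journald.conf.d/*.conf</code>, which keeps local limits separate from the distribution's main file. The following minimal sketch writes such a drop-in. The function <code>write_journald_dropin</code> and the file name <code>60-voipmonitor-journal.conf</code> are hypothetical. The directory is a parameter, so you can try the function against a scratch directory first. Restart <code>systemd-journald</code> afterwards, as shown above.

<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: write a journald drop-in that caps both the
# persistent (/var/log/journal) and volatile (/run/log/journal) journals.
# Usage: write_journald_dropin [DIR] [SIZE]
write_journald_dropin() {
  local dir="${1:-/etc/systemd/journald.conf.d}"
  local size="${2:-100M}"
  mkdir -p "$dir" || return 1
  cat > "$dir/60-voipmonitor-journal.conf" <<EOF
[Journal]
SystemMaxUse=$size
RuntimeMaxUse=$size
MaxRetentionSec=1month
EOF
}
</syntaxhighlight>

Run it as root, for example <code>write_journald_dropin /etc/systemd/journald.conf.d 100M</code>.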
| |
| | |
| ;5. Quick fix - System reboot:
| |
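| The quickest way to free space in <code>/run</code>, and usually in <code>/tmp</code>, is a system reboot. <code>/run</code> is always a tmpfs that is emptied at boot. <code>/tmp</code> is emptied at boot when it is mounted as a tmpfs or cleaned by the distribution's boot-time cleanup.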
| |
| | |
| == Check VoIPmonitor Logs for General Errors ==
| |
| After addressing the specific issues above, check the system logs for other error messages from the sensor process that may reveal additional problems.
| |
| | |
| <syntaxhighlight lang="bash">
| |
| # For Debian/Ubuntu
| |
| tail -f /var/log/syslog | grep voipmonitor
| |
| | |
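| # For CentOS/RHEL/AlmaLinux
| |
| tail -f /var/log/messages | grep voipmonitor
| |
| | |
| # On any systemd-based distribution (reads the service's journal directly)
| |
| journalctl -u voipmonitor -f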
| |
| </syntaxhighlight>
| |
| | |
| Look for errors like:
| |
| * "pcap_open_live(eth0) error: eth0: No such device" (Wrong interface name)
| |
| * "Permission denied" (The sensor is not running with sufficient privileges)
| |
| * Messages about connection issues (see [[#Troubleshoot MySQL/MariaDB Database Connection Errors|Troubleshoot MySQL/MariaDB Database Connection Errors]])
| |
| * Messages about dropping packets
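When the log is long, a small filter can pick out the lines that match the error types above. This is a minimal sketch. <code>classify_sensor_log</code> is a hypothetical helper, and the match strings are assumptions taken from the messages quoted in this guide. Adjust them to the wording in your own logs; the broad <code>drop</code> pattern in particular may also match unrelated lines.

<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: reads log lines on stdin and prints only the lines
# that match one of the error types listed above, prefixed with a tag.
# The first matching pattern wins; non-matching lines are dropped.
classify_sensor_log() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"No such device"*)     echo "INTERFACE: $line" ;;
      *"Permission denied"*)  echo "PRIVILEGES: $line" ;;
      *"Connection refused"*) echo "DATABASE: $line" ;;
      *[Dd]rop*)              echo "PACKET-DROP: $line" ;;
    esac
  done
}
</syntaxhighlight>

For example: <code>grep voipmonitor /var/log/syslog | classify_sensor_log</code>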
| |
| | |
| == Benign Database Errors When Features Are Disabled ==
| |
| Some VoIPmonitor features may generate harmless database errors when those features are not enabled in your configuration. These errors are '''benign''' (cause no harm to the system) and can be safely ignored.
| |
| | |
| === Common Benign Error: Missing Tables ===
| |
| If you see MySQL errors stating that a table does not exist (e.g., "Table 'voipmonitor.ss7' doesn't exist") even though the corresponding feature is disabled in your configuration, this is expected behavior.
| |
| | |
| ;Common examples:
| |
| * Errors about the <code>ss7</code> table when <code>ss7 = no</code> in <code>voipmonitor.conf</code>
| |
| * Errors about the <code>register_failed</code>, <code>register_state</code>, or <code>sip_msg</code> tables when those features are disabled
| |
| * Other similar errors for optional features that are not enabled
| |
| | |
| === Solution: Ignore or Suppress in Monitoring ===
| |
| Since these errors indicate that a feature is simply not active, they do not impact system functionality. The recommended approach is:
| |
| | |
| # '''Do not change the configuration''' to fix these errors
| |
| # '''Add monitoring exceptions''' to suppress warnings for table-not-found errors (MySQL error code 1146)
| |
| # Configure alerting systems to exclude these specific SQL errors from notifications
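As an example of such a monitoring exception, the sketch below removes "table doesn't exist" lines for the optional tables named above before log lines reach an alerting system. It assumes the log contains MySQL's standard message text, as in the <code>ss7</code> example above. <code>filter_benign_sql_errors</code> is a hypothetical helper, and you should adjust the table list to match the features you have disabled. Missing tables for any other feature still pass through, so real problems continue to raise alerts.

<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper for a monitoring pipeline: reads log lines on stdin
# and drops MySQL "Table 'db.table' doesn't exist" errors (code 1146) only
# for the optional tables listed in the pattern. "|| true" keeps the exit
# status 0 even when every input line was filtered out.
filter_benign_sql_errors() {
  grep -v -E "Table '[^'.]+\.(ss7|register_failed|register_state|sip_msg)' doesn't exist" || true
}
</syntaxhighlight>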
| |
| | |
| This prevents alert noise while keeping your logs intact for real issues that require attention.
| |
| | |
| === When to Take Action ===
| |
| You only need to take action in the following situations:
| |
| | |
| * If you actually want to use the feature (enable the corresponding configuration option)
| |
| * If errors persist about tables for features that '''are''' explicitly enabled in <code>voipmonitor.conf</code>
| |
| | |
| Otherwise, these database errors are simply informational and confirm that optional features remain inactive as configured.
| |
| | |
| == Appendix: tshark Display Filter Syntax for SIP ==
| |
| When using <code>tshark</code> to analyze SIP traffic, it is important to use the '''correct Wireshark display filter syntax'''. Below are common filter examples:
| |
| | |
| === Basic SIP Filters ===
| |
| <syntaxhighlight lang="bash">
| |
| # Show all SIP INVITE messages
| |
| tshark -r capture.pcap -Y 'sip.Method == "INVITE"'
| |
| | |
| # Show all SIP messages (any method)
| |
| tshark -r capture.pcap -Y "sip"
| |
| | |
| # Show SIP and RTP traffic
| |
| tshark -r capture.pcap -Y "sip || rtp"
| |
| </syntaxhighlight>
| |
| | |
| === Search for Specific Phone Number or Text ===
| |
| <syntaxhighlight lang="bash">
| |
| # Find calls containing a specific phone number (e.g., 5551234567)
| |
| tshark -r capture.pcap -Y 'sip contains "5551234567"'
| |
| | |
| # Find INVITE messages for a specific number
| |
| tshark -r capture.pcap -Y 'sip.Method == "INVITE" && sip contains "5551234567"'
| |
| </syntaxhighlight>
| |
| | |
| === Extract Call-ID from Matching Calls ===
| |
| <syntaxhighlight lang="bash">
| |
| # Get Call-ID for calls matching a phone number
| |
| tshark -r capture.pcap -Y 'sip.Method == "INVITE" && sip contains "5551234567"' -T fields -e sip.Call-ID
| |
| | |
| # Get Call-ID along with From and To headers
| |
| tshark -r capture.pcap -Y 'sip.Method == "INVITE"' -T fields -e sip.Call-ID -e sip.from.user -e sip.to.user
| |
| </syntaxhighlight>
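If you run these lookups often, a small wrapper avoids quoting mistakes. This minimal sketch assumes <code>tshark</code> is in <code>PATH</code>; <code>sip_invite_filter</code> and <code>call_ids_for_number</code> are hypothetical helper names. The filter builder rejects empty input and input containing a double quote or backslash, which would otherwise break out of the quoted string in the display filter.

<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: print a display filter matching INVITEs that
# contain NUMBER anywhere in the SIP message.
sip_invite_filter() {
  local number="$1"
  # Refuse empty input or characters that would break the quoted string.
  case "$number" in
    *'"'*|*'\'*|'') echo "invalid number: $number" >&2; return 1 ;;
  esac
  printf 'sip.Method == "INVITE" && sip contains "%s"\n' "$number"
}

# Hypothetical helper: call_ids_for_number PCAP NUMBER
# Lists each matching Call-ID once (requires tshark).
call_ids_for_number() {
  local filter
  filter=$(sip_invite_filter "$2") || return 1
  tshark -r "$1" -Y "$filter" -T fields -e sip.Call-ID | sort -u
}
</syntaxhighlight>

For example, <code>call_ids_for_number capture.pcap 5551234567</code> prints each matching Call-ID once.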
| |
| | |
| === Filter by IP Address ===
| |
| <syntaxhighlight lang="bash">
| |
| # SIP traffic from a specific source IP
| |
| tshark -r capture.pcap -Y "sip && ip.src == 192.168.1.100"
| |
| | |
| # SIP traffic between two hosts
| |
| tshark -r capture.pcap -Y "sip && ip.addr == 192.168.1.100 && ip.addr == 10.0.0.50"
| |
| </syntaxhighlight>
| |
| | |
| === Filter by SIP Response Code ===
| |
| <syntaxhighlight lang="bash">
| |
| # Show all 200 OK responses
| |
| tshark -r capture.pcap -Y "sip.Status-Code == 200"
| |
| | |
| # Show all error responses (4xx, 5xx and 6xx)
| |
| tshark -r capture.pcap -Y "sip.Status-Code >= 400"
| |
| | |
| # Show 486 Busy Here responses
| |
| tshark -r capture.pcap -Y "sip.Status-Code == 486"
| |
| </syntaxhighlight>
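To see which responses occur most often, you can count the status codes that <code>tshark</code> extracts. <code>count_status_codes</code> is a hypothetical helper in this minimal sketch. It reads one code per line and prints each code with its count, most frequent first.

<syntaxhighlight lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: reads one SIP status code per line on stdin and
# prints "CODE COUNT", most frequent first (ties ordered by code).
# Empty lines are ignored.
count_status_codes() {
  awk 'NF { c[$1]++ } END { for (k in c) print k, c[k] }' |
    sort -k2,2nr -k1,1n
}
</syntaxhighlight>

For example: <code>tshark -r capture.pcap -Y "sip.Status-Code" -T fields -e sip.Status-Code | count_status_codes</code>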
| |
| | |
| === Important Syntax Notes ===
| |
| * '''Field names are case-sensitive:''' Use <code>sip.Method</code>, <code>sip.Call-ID</code>, <code>sip.Status-Code</code> (not <code>sip.method</code> or <code>sip.call-id</code>)
| |
| * '''String matching uses <code>contains</code>:''' Use <code>sip contains "text"</code> (not <code>sip.contains()</code>)
| |
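| * '''Use double quotes for string literals:''' Inside the filter, write <code>sip contains "number"</code>, not single quotes. On the shell command line, wrap the whole filter in single quotes (as in the examples above) so the inner double quotes reach tshark unchanged.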
| |
| * '''Boolean operators:''' Use <code>&&</code> (and), <code>||</code> (or), <code>!</code> (not)
| |
| | |
| For a complete reference, see the [https://www.wireshark.org/docs/dfref/s/sip.html Wireshark SIP Display Filter Reference].
| |
| | |
| == AI Summary for RAG ==
| |
| '''Summary:''' Comprehensive troubleshooting guide for VoIPmonitor sensor issues. Covers: (1) Service not running - check <code>systemctl status</code>, binary renamed after crash (<code>voipmonitor_</code>), unresponsive after GUI update (use <code>killall</code>). (2) No traffic - use <code>tshark</code> to verify packets, check promiscuous mode for SPAN/RSPAN (not needed for ERSPAN/GRE/TZSP). (3) Config issues - verify <code>interface</code>, <code>sipport</code>, <code>filter</code> in voipmonitor.conf. (4) Capture rules - GUI "Skip" option blocks calls. (5) Database errors - "Connection refused (111)" after migration, check <code>mysqlhost</code>. (6) HEAP FULL - check <code>dmesg</code> for hardware errors vs performance bottleneck. (7) OOM killer - reduce <code>innodb_buffer_pool_size</code> and <code>ringbuffer</code>/<code>max_buffer_mem</code>. (8) Upgrade fails - <code>/tmp</code> mounted with <code>noexec</code>. (9) "No space left" despite free disk - check <code>/tmp</code> and <code>/run</code> filesystems, vacuum journal logs.
| |
| | |
| '''Keywords:''' troubleshooting, no calls, tshark, promiscuous mode, SPAN, ERSPAN, voipmonitor.conf, interface, sipport, capture rules, Skip, OOM killer, innodb_buffer_pool_size, ringbuffer, max_buffer_mem, HEAP FULL, Connection refused, noexec, /tmp, journal logs, no space left on device
| |
| | |
| '''Key Questions:'''
| |
| * Why is VoIPmonitor not recording any calls?
| |
| * How do I check if VoIP traffic is reaching my sensor?
| |
| * Do I need promiscuous mode for ERSPAN or GRE tunnels?
| |
| * How do I fix "Connection refused (111)" database errors?
| |
| * VoIPmonitor crashes with HEAP FULL error, what should I check?
| |
| * How do I fix OOM killer issues on VoIPmonitor server?
| |
| * How do I calculate the correct innodb_buffer_pool_size?
| |
| * Why does sensor upgrade fail with permission denied from /tmp?
| |
| * "No space left on device" error but disk has free space, what to check?
| |
| * How do I clean up journal logs filling /run filesystem?
| |