{{DISPLAYTITLE:Emergency Procedures & System Recovery}}

'''This guide covers emergency procedures for recovering your VoIPmonitor system from critical failures, including runaway processes, high CPU usage, and system unresponsiveness.'''

== Emergency: VoIPmonitor Process Consuming Excessive CPU or System Unresponsive ==

When a VoIPmonitor process consumes excessive CPU (e.g., ~3000% or more) or causes the entire system to become unresponsive, follow these immediate steps:

=== Immediate Action: Force-Terminate Runaway Process ===

If the system is still minimally responsive via SSH or requires out-of-band management (iDRAC, IPMI, console):

;1. Identify the Process ID (PID):
<syntaxhighlight lang="bash">
# Using htop (if available)
htop

# Or using ps
ps aux | grep voipmonitor
</syntaxhighlight>

Look for the voipmonitor process consuming the most CPU resources. Note down the PID (process ID number).

;2. Forcefully terminate the process:
<syntaxhighlight lang="bash">
kill -9 <PID>
</syntaxhighlight>

Replace <PID> with the actual process ID number identified in step 1.

;3. Verify system recovery:
<syntaxhighlight lang="bash">
# Check CPU usage has returned to normal
top

# Check if the process was terminated
ps aux | grep voipmonitor
</syntaxhighlight>

The system should become responsive again immediately after the process is killed. CPU utilization should drop significantly.

=== Optional: Stop and Restart the Service (for persistent issues) ===

If the problem persists or the service needs to be cleanly restarted:

<syntaxhighlight lang="bash">
# Stop the voipmonitor service
systemctl stop voipmonitor

# Verify no zombie processes remaining
killall voipmonitor

# Restart the service
systemctl start voipmonitor

# Verify service status
systemctl status voipmonitor
</syntaxhighlight>

'''Caution:''' When using <code>systemd</code> service management, avoid the legacy <code>service</code> command, as it can cause systemd to lose track of the daemon. Always use <code>systemctl</code> commands or direct process commands such as <code>killall</code>.

=== Root Cause Analysis: Why Did the CPU Spike? ===

After recovering the system, investigate the root cause to prevent recurrence. Common causes include:

;SIP REGISTER Flood / Spamming Attack
Massive volumes of SIP REGISTER messages from malicious IPs can overwhelm the VoIPmonitor process.

* '''Detection:''' Check recent alert triggers in the VoIPmonitor GUI > Alerts > Sent Alerts for SIP REGISTER flood alerts
* '''Immediate mitigation:''' Block attacker IPs at the network edge (SBC, firewall, iptables); a minimal example follows this list
* '''Long-term prevention:''' Configure anti-fraud rules with custom scripts to auto-block, see [[Anti-fraud#SIP REGISTER Flood/Attack|SIP REGISTER Flood Mitigation]]
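
A minimal iptables/ipset sketch for blocking attacker IPs (203.0.113.50 is a placeholder documentation address; adapt the rules to your own firewall policy, and prefer blocking at the SBC/edge where possible):

<syntaxhighlight lang="bash">
# Drop all traffic from a single attacking IP (placeholder address)
iptables -I INPUT -s 203.0.113.50 -j DROP

# For many attackers, an ipset keeps the rule set small
ipset create sipflood hash:ip
ipset add sipflood 203.0.113.50
iptables -I INPUT -m set --match-set sipflood src -j DROP
</syntaxhighlight>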

;Packet Capture Overload (pcapcommand)
The <code>pcapcommand</code> feature forks a program for ''every'' call, which can generate up to 500,000 interrupts per second.

* '''Detection:''' Check <code>/etc/voipmonitor.conf</code> for a <code>pcapcommand</code> line
* '''Immediate fix:''' Comment out or remove the <code>pcapcommand</code> directive and restart the service
* '''Alternative:''' Use the built-in spool cleaning functionality (<code>maxpoolsize</code>, <code>cleanspool</code>) instead

;Excessive RTP Processing Threads
High concurrent call volumes can overload RTP processing threads.

* '''Detection:''' Check performance logs for high <code>tRTP_CPU</code> values (sum of all RTP threads)
* '''Mitigation:'''
<pre>callslimit = 2000 # Limit max concurrent calls</pre>

;Audio Feature Overhead
Silence detection and audio conversion are CPU-intensive operations.

* '''Detection:''' Check if <code>silencedetect</code> or <code>saveaudio</code> are enabled
* '''Mitigation:'''
<pre>
silencedetect = no
# saveaudio = wav # Comment out if not needed
</pre>

See [[Scaling|Scaling and Performance Tuning]] for detailed performance optimization strategies.

=== Preventive Measures ===

Once the root cause is identified, implement these preventive configurations:

;Monitor CPU Trends:
Use [[Collectd_installation|collectd]] or your existing monitoring system to track CPU usage over time and receive alerts before critical thresholds are reached.
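
To get started quickly (a sketch assuming a Debian/Ubuntu host; package and service names may differ on other distributions):

<syntaxhighlight lang="bash">
# Install collectd and start it at boot to record CPU/load trends
apt-get install collectd
systemctl enable --now collectd
</syntaxhighlight>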

;Anti-Fraud Auto-Blocking:
Configure [[Anti-fraud|Anti-Fraud rules]] with custom scripts to automatically block attacker IPs when a flood is detected. See the [[Anti-fraud|Anti-Fraud documentation]] for PHP script examples using iptables or ipset.

;Network Edge Protection:
Block SIP REGISTER spam and floods at your network edge (SBC, firewall) before traffic reaches VoIPmonitor. This provides better performance and reduces CPU load on the monitoring system.

== Emergency: GUI and CLI Frequently Inaccessible Due to Memory Exhaustion ==

When the VoIPmonitor GUI and CLI become frequently inaccessible or the server becomes unresponsive due to Out of Memory (OOM) conditions, follow these steps to identify and resolve the issue.

=== Diagnose OOM Events ===

The Linux kernel out-of-memory (OOM) killer terminates processes when RAM is exhausted.

;Check the kernel ring buffer for OOM events:
<syntaxhighlight lang="bash">
dmesg -T | grep -i killed
</syntaxhighlight>

If you see messages like "Out of memory: Killed process" or "invoked oom-killer", your system is running out of physical RAM.
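
On systemd-based distributions, the persistent journal provides the same evidence even after a reboot (a sketch; adjust the time window as needed):

<syntaxhighlight lang="bash">
# Search kernel messages in the journal for OOM killer activity
journalctl -k --since "1 day ago" | grep -iE "out of memory|oom-killer"
</syntaxhighlight>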

=== Immediate Relief: Reduce Memory Allocation ===

Reduce memory consumption by tuning both MySQL and VoIPmonitor parameters.

;1. Reduce MySQL Buffer Pool Size:

Edit the MySQL configuration file (typically <code>/etc/my.cnf.d/mysql-server.cnf</code>, or <code>/etc/mysql/my.cnf</code> on Debian/Ubuntu):

<syntaxhighlight lang="ini">
[mysqld]
# Reduce from 8GB to 6GB (adjust based on available RAM)
innodb_buffer_pool_size = 6G
</syntaxhighlight>

A good starting point is to leave headroom for the sniffer and the OS: total RAM minus <code>max_buffer_mem</code> minus roughly 2 GB for the OS and other processes. For example, on a 16GB server with 8GB allocated to max_buffer_mem, set innodb_buffer_pool_size to approximately 6GB.
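
To confirm the value the running server actually uses (a quick check; works on both MySQL and MariaDB):

<syntaxhighlight lang="bash">
# Show the active buffer pool size in gigabytes
mysql -e "SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;"
</syntaxhighlight>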

;2. Reduce VoIPmonitor Buffer Memory:

Edit <code>/etc/voipmonitor.conf</code> and decrease the <code>max_buffer_mem</code> value:

<syntaxhighlight lang="ini">
[general]
# Reduce from 8000 to 6000 (adjust based on available RAM)
max_buffer_mem = 6000
</syntaxhighlight>

The <code>max_buffer_mem</code> parameter limits the maximum RAM allocation for the packet buffer. Typical values range from 2000-8000 MB depending on traffic volume and call rates.

;3. Restart the affected services:

<syntaxhighlight lang="bash">
systemctl restart mysqld
systemctl restart voipmonitor
</syntaxhighlight>

Monitor the system to confirm stability.

=== Long-term Solution: Increase RAM ===

For sustained production operation, increase the server's physical RAM:

* '''Minimum''': Add at least 16 GB of additional RAM to eliminate OOM conditions
* '''Performance benefit''': After the RAM upgrade, you can safely increase <code>innodb_buffer_pool_size</code> to improve MySQL performance
* '''Recommended settings''': Set <code>innodb_buffer_pool_size</code> to 50-70% of total RAM and <code>max_buffer_mem</code> based on your traffic requirements

See [[Sniffer_configuration#max_buffer_mem|Sniffer Configuration]] for details on VoIPmonitor memory settings.

== Emergency: Diagnosing System Hangs and Collecting Core Dump Evidence ==

When the VoIPmonitor system hangs, the packet buffer (heap) spikes to 100%, and a single CPU core is pegged at 100%, you need to diagnose the issue and collect evidence for developer analysis before restarting.

=== Identify the Problematic Thread ===

Use the Manager API to identify which sniffer thread is consuming excessive CPU resources.

<syntaxhighlight lang="bash">
# Query thread statistics from the sensor
echo 'sniffer_threads' | nc <sensor_ip> 5029
</syntaxhighlight>

Replace <sensor_ip> with the actual IP address of your VoIPmonitor sensor. Look for a thread showing approximately 100% CPU usage. This indicates the specific processing thread that is causing the hang.

=== Generate Core Dump for Developer Analysis ===

If a thread is pegged at 100% and the system needs to be analyzed by VoIPmonitor developers, generate a core dump before restarting:

;1. Find the VoIPmonitor process ID (PID):
<syntaxhighlight lang="bash">
ps aux | grep voipmonitor | grep -v grep
</syntaxhighlight>

;2. Attach to the process with gdb and generate a core dump:
<syntaxhighlight lang="bash">
gdb -p <PID_of_voipmonitor>
# Within gdb, generate the core dump
gcore <output_file>

# Example:
gdb -p 12345
(gdb) gcore /tmp/voipmonitor_hang.core
</syntaxhighlight>

The core dump file provides developers with a complete snapshot of the process state at the moment of the hang, including memory, registers, and stack traces.

;3. Detach from gdb and quit:
<syntaxhighlight lang="bash">
(gdb) detach
(gdb) quit
</syntaxhighlight>

=== Restore Service and Collect Evidence ===

After collecting the diagnostic evidence, restart the service to restore operation:

<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>

Provide the following files to VoIPmonitor support for analysis:

* Core dump file (from the <code>gcore</code> command)
* Thread statistics output (from the <code>sniffer_threads</code> command)
* Performance logs (<code>/var/log/syslog</code> showing the hang period)
* Configuration file (<code>/etc/voipmonitor.conf</code>)

'''Important:''' Core dump files can be very large (several GB depending on <code>max_buffer_mem</code>). Ensure you have sufficient disk space and consider compressing the file before transferring it to support.
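
For example, to compress the dump before upload (assuming <code>xz</code> is installed; core dumps usually compress well):

<syntaxhighlight lang="bash">
# Compress using all CPU cores; produces /tmp/voipmonitor_hang.core.xz
xz -T0 /tmp/voipmonitor_hang.core
</syntaxhighlight>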

== Emergency: System Freezes on Every Update Attempt ==

If the VoIPmonitor sensor becomes unresponsive or hangs each time you attempt to update it through the Web GUI:

;1. SSH into the sensor host
;2. Execute the following commands to forcefully stop and restart:
<syntaxhighlight lang="bash">
killall voipmonitor
systemctl stop voipmonitor
systemctl start voipmonitor
</syntaxhighlight>

This sequence ensures zombie processes are terminated, the systemd unit is fully stopped, and the service restarts cleanly. Verify the sensor status in the GUI to confirm it is responding correctly.

== Emergency: Binary Not Found After Crash ==

If the VoIPmonitor service fails to start after a crash with the error "Binary not found" for <code>/usr/local/sbin/voipmonitor</code>:

;1. Check for a renamed binary:
<syntaxhighlight lang="bash">
ls -l /usr/local/sbin/voipmonitor*
</syntaxhighlight>

The crash recovery process may have renamed the binary with an underscore suffix.

;2. If found, rename it back:
<syntaxhighlight lang="bash">
mv /usr/local/sbin/voipmonitor_ /usr/local/sbin/voipmonitor
</syntaxhighlight>

;3. Restart the service:
<syntaxhighlight lang="bash">
systemctl start voipmonitor
systemctl status voipmonitor
</syntaxhighlight>

Verify the service starts correctly.

== Out-of-Band Management Scenarios ==

When the system is completely unresponsive and cannot be accessed via SSH:

* '''Use your server's out-of-band management system:'''
** Dell iDRAC
** HP iLO
** Supermicro IPMI
** Other vendor-specific BMC/management tools

* '''Actions available via OBM:'''
** Access virtual console (KVM-over-IP)
** Send NMI (Non-Maskable Interrupt) for system dump
** Force power cycle
** Monitor hardware health

See [[Sniffer_troubleshooting|Sniffer Troubleshooting]] for more diagnostic procedures.

== Emergency: Service Restart Loop with "packetbuffer: MEMORY IS FULL" and "Cannot bind to port" ==

If the VoIPmonitor service enters a restart loop, logging <code>packetbuffer: MEMORY IS FULL</code> and displaying <code>Cannot bind to port [5029]</code> errors, the issue can have '''multiple root causes'''. The "MEMORY IS FULL" error message is ambiguous and can indicate either RAM exhaustion or a disk I/O bottleneck.

=== Critical: Distinguish Between RAM and Disk I/O Issues ===

The symptoms appear identical, but the root causes and solutions are different:

{| class="wikitable"
|-
! style="background:#ffc107;" | RAM-Based Memory Issue
! style="background:#ffc107;" | Disk I/O Performance Issue
|-
| Memory buffer fills due to excessive concurrent calls or traffic floods
| Memory buffer fills because disk cannot write fast enough to drain it
|-
| Solution: Increase <code>max_buffer_mem</code>, enable <code>packetbuffer_compress</code>, or limit concurrent calls
| Solution: Upgrade storage, move spool to faster disk, or resolve I/O bottleneck
|}

=== Step 1: Check for Disk I/O Bottleneck (Important!) ===

If the service restarts repeatedly with "MEMORY IS FULL" but increasing <code>max_buffer_mem</code> does not help, check for disk I/O problems on the spool directory (typically <code>/var/spool/voipmonitor</code>).
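
Before running deeper tests, confirm which block device actually hosts the spool directory (a quick check):

<syntaxhighlight lang="bash">
# Show the filesystem and device backing the spool directory
df -h /var/spool/voipmonitor

# Map filesystems to physical block devices
lsblk
</syntaxhighlight>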

;1. Monitor disk utilization with iostat:
<syntaxhighlight lang="bash">
# Monitor disk I/O in real-time (1-second intervals)
iostat -x 1
</syntaxhighlight>
* '''What to look for:''' A value near 100% in the <code>%util</code> column indicates the disk is operating at maximum capacity
* '''Symptoms:''' High %util, high await (average wait time), or high queue depth

;2. Perform a write speed test to the spool directory:
<syntaxhighlight lang="bash">
# Test sequential write speed (adjust count based on available disk space)
# Note: dd test uses O_DIRECT to bypass cache for accurate measurement
dd if=/dev/zero of=/var/spool/voipmonitor/testfile bs=1M count=1024 oflag=direct conv=fdatasync

# Clean up test file
rm /var/spool/voipmonitor/testfile
</syntaxhighlight>
* '''Interpretation:''' A very slow write speed (e.g., less than 50 MB/s on HDDs or significantly lower than expected SSD speed) confirms a storage bottleneck
* For SSD/NVMe, expect 400+ MB/s sequential writes
* For HDDs, expect 80-150 MB/s sequential writes (7200 RPM)

;3. Check for I/O wait (Linux monitoring):
<syntaxhighlight lang="bash">
# Check if the system is spending significant time waiting for I/O
# High 'wa' (wait) percentage indicates disk bottleneck
top
# or
vmstat 1
</syntaxhighlight>
* Look for high <code>%wa</code> (I/O wait) in the CPU section

=== Step 2: Resolve Disk I/O Bottleneck ===

If disk I/O tests confirm the issue:

* '''Option 1: Upgrade storage hardware'''
** Move <code>/var/spool/voipmonitor</code> to a faster local SSD or NVMe drive
** Consider RAID 10 for better performance and redundancy

* '''Option 2: Tune storage configuration'''
** Check if the disk is operating in degraded mode (RAID rebuild in progress)
** Verify the storage controller firmware is up to date
** Disable unnecessary monitoring or indexing (e.g., updatedb, antivirus scanning) on the spool directory

* '''Option 3: Move spool directory to faster volume'''
<syntaxhighlight lang="bash">
# Stop service
systemctl stop voipmonitor

# Mount faster disk to /var/spool/voipmonitor
# Or create symlink:
mv /var/spool/voipmonitor /var/spool/voipmonitor.backup
ln -s /path/to/fast/disk/voipmonitor /var/spool/voipmonitor

# Restart service
systemctl start voipmonitor
</syntaxhighlight>

For detailed disk performance benchmarking, see [[IO_Measurement|I/O Performance Measurement]] for advanced testing with <code>fio</code> and <code>ioping</code>.

=== Step 3: Check for RAM-Based Memory Issue ===

If disk I/O is healthy but the error persists, the issue is RAM-based memory exhaustion.

;1. Check RAM allocation:
<syntaxhighlight lang="bash">
# Check current memory usage
free -h
</syntaxhighlight>

;2. Increase memory buffer limits:
Edit <code>/etc/voipmonitor.conf</code>:

{| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
|-
! colspan="2" style="background:#ffc107;" | Recommended Values for "MEMORY IS FULL" Errors
|-
| '''ringbuffer''' || For very high traffic (>200Mbps) or severe packet loss scenarios, increase to 2000 MB (the maximum allowed). Default is 50 MB; 500 MB is recommended for >100Mbit traffic.
|-
| '''max_buffer_mem''' || For high concurrent call loads (5000+ calls) or persistent buffer issues, increase to 8000 MB. Default is 2000 MB; 4000 MB is a typical tuning for moderate loads.
|-
| '''packetbuffer_compress''' || Enable if RAM is constrained (increases CPU usage to reduce memory footprint).
|}

<syntaxhighlight lang="ini">
[general]
# HIGH TRAFFIC CONFIGURATION - Prevent "MEMORY IS FULL" errors
# Max ringbuffer for very high traffic / serious packet loss
ringbuffer = 2000

# Increase buffer memory for high concurrent call loads
max_buffer_mem = 8000

# Enable compression to save RAM at CPU cost
packetbuffer_compress = yes

# Optional: Limit concurrent calls to prevent overload
callslimit = 2000
</syntaxhighlight>

'''Alternative: Moderate Traffic Configuration'''
<syntaxhighlight lang="ini">
[general]
# For moderate traffic (100-200 Mbit, 2000-5000 concurrent calls)
ringbuffer = 500
max_buffer_mem = 4000
packetbuffer_compress = yes
</syntaxhighlight>

;3. Restart and monitor:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
journalctl -u voipmonitor -f
</syntaxhighlight>

See [[Sniffer_configuration#max_buffer_mem|Sniffer Configuration]] for more memory tuning options.

=== Step 4: Alternative Root Cause - Adaptive Jitterbuffer Overload ===

If the "packetbuffer: MEMORY IS FULL" and "HEAP FULL" errors occur even after adjusting <code>max_buffer_mem</code>, the issue may be caused by the adaptive jitterbuffer feature consuming excessive memory during processing. The adaptive jitterbuffer (which simulates jitter up to 500ms) is CPU- and memory-intensive and can trigger heap exhaustion on high-traffic systems.

;1. Check if jitterbuffer_adapt is enabled:
<syntaxhighlight lang="bash">
# Check voipmonitor.conf for jitter buffer settings
grep jitterbuffer /etc/voipmonitor.conf
</syntaxhighlight>

If <code>jitterbuffer_adapt = yes</code> is set, this feature may be causing the memory exhaustion.

;2. Disable adaptive jitterbuffer:
Edit <code>/etc/voipmonitor.conf</code> and set:
<syntaxhighlight lang="ini">
[general]
# Disable adaptive jitterbuffer to prevent memory/CPU exhaustion
jitterbuffer_adapt = no
</syntaxhighlight>

;3. Restart the service:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>

;4. Verify the error is resolved:
<syntaxhighlight lang="bash">
# Monitor for MEMORY IS FULL errors
journalctl -u voipmonitor -f
</syntaxhighlight>

'''Important Trade-offs:'''

* Disabling <code>jitterbuffer_adapt</code> removes the CPU/memory overhead but also disables <code>MOS_adaptive</code> score calculation
* Fixed jitterbuffer modes (<code>jitterbuffer_f1</code> for 50ms, <code>jitterbuffer_f2</code> for 200ms) remain available and consume significantly fewer resources
* If MOS quality scoring is required, consider using <code>jitterbuffer_f2 = yes</code> instead; a minimal configuration follows this list
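
A minimal sketch of the fixed-mode alternative (these parameters are the ones named above; verify the exact behavior against your sensor version's documentation):

<syntaxhighlight lang="ini">
[general]
# Keep MOS scoring via the fixed 200ms jitterbuffer instead of the adaptive one
jitterbuffer_adapt = no
jitterbuffer_f2 = yes
</syntaxhighlight>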

This solution is particularly effective when the system crashes with both "MEMORY IS FULL" and "HEAP FULL" errors simultaneously, indicating the adaptive jitterbuffer heap is overflowing during real-time packet processing.

=== Step 5: Clear Stale Port 5029 Bindings ===

The "Cannot bind to port [5029]" error occurs when a zombie process still holds the Manager API port. This prevents clean restarts.
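
To confirm which process is actually holding the port before killing anything (a quick check using <code>ss</code>):

<syntaxhighlight lang="bash">
# List the listener on the Manager API port together with its PID
ss -tlnp | grep 5029
</syntaxhighlight>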

<syntaxhighlight lang="bash">
# Force kill all VoIPmonitor processes
killall -9 voipmonitor

# Ensure service is stopped
systemctl stop voipmonitor

# Verify no processes are running
ps aux | grep voipmonitor

# Restart service
systemctl start voipmonitor
</syntaxhighlight>

After clearing zombie processes and addressing the root cause (I/O or RAM), the service should start successfully without the bind error.

=== Related Issues ===

For performance tuning and scaling guidance, see:
* [[Scaling|Scaling and Performance Tuning Guide]]
* [[IO_Measurement|I/O Performance Measurement]]
* [[High-Performance_VoIPmonitor_and_MySQL_Setup_Manual|High-Performance Setup]]

== Related Documentation ==

* [[Scaling|Scaling and Performance Tuning Guide]] - For performance optimization
* [[Anti-fraud|Anti-Fraud Rules]] - For attack detection and mitigation
* [[Sniffer_troubleshooting|Sniffer Troubleshooting]] - For systematic diagnostic procedures
* [[High-Performance_VoIPmonitor_and_MySQL_Setup_Manual|High-Performance Setup]] - For optimizing high-traffic deployments
* [[Systemd_for_voipmonitor_service_management|Systemd Service Management]] - For service management best practices

== AI Summary for RAG ==

'''Summary:''' This article provides emergency procedures for recovering VoIPmonitor from critical failures. It covers steps to force-terminate runaway processes consuming excessive CPU (including kill -9 and systemctl commands), root cause analysis for CPU spikes (SIP REGISTER floods, pcapcommand, RTP threads, audio features), OOM memory exhaustion troubleshooting (checking dmesg for killed processes, reducing innodb_buffer_pool_size and max_buffer_mem), preventive measures (monitoring, anti-fraud auto-blocking, network edge protection), recovery procedures for system freezes during updates and binary issues after crashes, out-of-band management scenarios, and CRITICAL troubleshooting for a service restart loop with "packetbuffer: MEMORY IS FULL" and "Cannot bind to port [5029]" errors. The MEMORY IS FULL error has multiple root causes: (1) RAM-based memory exhaustion (solution: increase max_buffer_mem, enable packetbuffer_compress); (2) disk I/O performance bottleneck (solution: check iostat -x 1 for 100% utilization, test write speed with dd to /var/spool/voipmonitor with oflag=direct, upgrade storage or move the spool directory); or (3) adaptive jitterbuffer overload (solution: check jitterbuffer settings with grep jitterbuffer /etc/voipmonitor.conf, set jitterbuffer_adapt = no if enabled, which also disables MOS_adaptive scoring but keeps jitterbuffer_f1/f2 available). The "Cannot bind to port [5029]" error requires clearing zombie processes (killall -9 voipmonitor, systemctl stop voipmonitor).

'''Keywords:''' emergency recovery, high CPU, system unresponsive, runaway process, kill process, kill -9, systemctl, SIP REGISTER flood, pcapcommand, performance optimization, out-of-band management, iDRAC, iLO, IPMI, crash recovery, OOM, out of memory, memory exhaustion, dmesg, innodb_buffer_pool_size, max_buffer_mem, MEMORY IS FULL, HEAP FULL, packetbuffer, disk I/O, I/O bottleneck, iostat, disk utilization, %util, write speed test, dd oflag=direct, spool directory, SSD, NVMe, RAID, Cannot bind to port 5029, zombie process, Manager API port, port 5029, restart loop, storage performance, I/O wait, %wa, jitterbuffer, jitterbuffer_adapt, adaptive jitterbuffer, jitterbuffer_f1, jitterbuffer_f2, MOS_adaptive, CPU intensive

'''Key Questions:'''
* What to do when VoIPmonitor consumes 3000% CPU or the system becomes unresponsive?
* How to forcefully terminate a runaway VoIPmonitor process?
* What are common causes of CPU spikes in VoIPmonitor?
* How to mitigate SIP REGISTER flood attacks causing high CPU?
* How to diagnose OOM (Out of Memory) events?
* How to fix GUI and CLI frequently inaccessible due to memory exhaustion?
* How to reduce memory usage of MySQL and VoIPmonitor?
* What is max_buffer_mem and how to configure it?
* How to restart the VoIPmonitor service after a crash?
* What to do if the service binary is not found after a crash?
* How to prevent VoIPmonitor from freezing during GUI updates?
* What tools can help diagnose VoIPmonitor performance issues?
* What causes the "packetbuffer: MEMORY IS FULL" error message?
* How to distinguish between RAM exhaustion and a disk I/O bottleneck?
* How to check for disk I/O performance issues causing restart loops?
* How to use iostat to diagnose disk utilization?
* How to perform a write speed test to the /var/spool/voipmonitor directory?
* What does the "Cannot bind to port [5029]" error mean?
* How to clear zombie processes holding port 5029?
* How to resolve a disk I/O bottleneck for VoIPmonitor?
* How to move the spool directory to faster storage?
* What is the correct dd command to test disk write speed?
* What causes "HEAP FULL" errors in VoIPmonitor?
* How is jitterbuffer_adapt related to MEMORY IS FULL errors?
* What is the solution for MEMORY IS FULL + HEAP FULL crashes caused by jitterbuffer_adapt?
* Why should I disable jitterbuffer_adapt?
* What happens when I set jitterbuffer_adapt = no?
* What is the trade-off when disabling jitterbuffer_adapt?
* Can I still use jitterbuffer_f1 and jitterbuffer_f2 with jitterbuffer_adapt disabled?