{{DISPLAYTITLE:Emergency Procedures & System Recovery}}
[[Category:Troubleshooting]]
[[Category:Database]]


'''This guide covers emergency procedures for recovering your VoIPmonitor system from critical failures, including runaway processes, high CPU usage, database bottlenecks, and system unresponsiveness.'''


== Emergency: VoIPmonitor Process Consuming Excessive CPU or System Unresponsive ==


When a VoIPmonitor process consumes excessive CPU (e.g., ~3000% or more) or causes the entire system to become unresponsive, follow these immediate steps:


=== Immediate Action: Force-Terminate Runaway Process ===


If the system is still minimally responsive via SSH or requires out-of-band management (iDRAC, IPMI, console):


;1. Identify the Process ID (PID):
<syntaxhighlight lang="bash">
# Using htop (if available)
htop

# Or using ps
ps aux | grep voipmonitor
</syntaxhighlight>

Look for the voipmonitor process consuming the most CPU resources. Note down the PID (process ID number).


;2. Forcefully terminate the process:
<syntaxhighlight lang="bash">
kill -9 <PID>
</syntaxhighlight>


Replace <PID> with the actual process ID number identified in step 1.


;3. Verify system recovery:
<syntaxhighlight lang="bash">
# Check CPU usage has returned to normal
top

# Check if the process was terminated
ps aux | grep voipmonitor
</syntaxhighlight>


The system should become responsive again immediately after the process is killed. CPU utilization should drop significantly.

=== Optional: Stop and Restart the Service (for persistent issues) ===

If the problem persists or the service needs to be cleanly restarted:

<syntaxhighlight lang="bash">
# Stop the voipmonitor service
systemctl stop voipmonitor

# Verify no zombie processes remain
killall voipmonitor

# Restart the service
systemctl start voipmonitor

# Verify service status
systemctl status voipmonitor
</syntaxhighlight>

'''Caution:''' When using <code>systemd</code> service management, avoid the deprecated <code>service</code> command, as it can cause systemd to lose track of the daemon. Always use <code>systemctl</code> commands or direct process commands such as <code>killall</code>.


=== Root Cause Analysis: Why Did the CPU Spike? ===


After recovering the system, investigate the root cause to prevent recurrence. Common causes include:


;SIP REGISTER Flood / Spamming Attack
Massive volumes of SIP REGISTER messages from malicious IPs can overwhelm the VoIPmonitor process.


* '''Detection:''' Check recent alert triggers in the VoIPmonitor GUI under '''Alerts → Sent Alerts''' for SIP REGISTER flood alerts
* '''Immediate mitigation:''' Block attacker IPs at the network edge (SBC, firewall, iptables)
* '''Long-term prevention:''' Configure anti-fraud rules with custom scripts to auto-block; see [[Anti-fraud#SIP REGISTER Flood/Attack|SIP REGISTER Flood Mitigation]]


;Packet Capture Overload (pcapcommand)
The <code>pcapcommand</code> feature forks an external program for ''every'' call, which can generate up to 500,000 interrupts per second.

* '''Detection:''' Check <code>/etc/voipmonitor.conf</code> for a <code>pcapcommand</code> line
* '''Immediate fix:''' Comment out or remove the <code>pcapcommand</code> directive and restart the service
* '''Alternative:''' Use the built-in spool cleaning functionality (<code>maxpoolsize</code>, <code>cleanspool</code>) instead
 
;Excessive RTP Processing Threads
High concurrent call volumes can overload RTP processing threads.
 
* '''Detection:''' Check performance logs for high <code>tRTP_CPU</code> values (sum of all RTP threads)
* '''Mitigation:'''
  <pre>callslimit = 2000  # Limit max concurrent calls</pre>
 
;Audio Feature Overhead
Silence detection and audio conversion are CPU-intensive operations.
 
* '''Detection:''' Check if <code>silencedetect</code> or <code>saveaudio</code> are enabled
* '''Mitigation:'''
  <pre>
  silencedetect = no
  # saveaudio = wav  # Comment out if not needed
  </pre>
 
See [[Scaling|Scaling and Performance Tuning]] for detailed performance optimization strategies.
 
=== Preventive Measures ===
 
Once the root cause is identified, implement these preventive configurations:
 
;Monitor CPU Trends:
Use [[Collectd_installation|collectd]] or your existing monitoring system to track CPU usage over time and receive alerts before critical thresholds are reached.
 
;Anti-Fraud Auto-Blocking:
Configure [[Anti-fraud|Anti-Fraud rules]] with custom scripts to automatically block attacker IPs when a flood is detected; a minimal shell helper is sketched after this list. See the [[Anti-fraud|Anti-Fraud documentation]] for PHP script examples using iptables or ipset.
 
;Network Edge Protection:
Block SIP REGISTER spam and floods at your network edge (SBC, firewall) before traffic reaches VoIPmonitor. This provides better performance and reduces CPU load on the monitoring system.
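
As referenced above, a minimal sketch of such an auto-blocking helper is shown below. The ipset name, the script path, and the way the anti-fraud rule passes the attacker IP are assumptions to adapt to your environment.

<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical helper invoked by an anti-fraud custom script with the attacker IP as $1.
# One-time setup (run manually once):
#   ipset create voipmonitor_block hash:ip
#   iptables -I INPUT -m set --match-set voipmonitor_block src -j DROP
ATTACKER_IP="$1"
[ -n "$ATTACKER_IP" ] || { echo "usage: $0 <attacker_ip>" >&2; exit 1; }
ipset add voipmonitor_block "$ATTACKER_IP" -exist
logger -t voipmonitor-block "blocked $ATTACKER_IP"
</syntaxhighlight>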
 
== Emergency: GUI and CLI Frequently Inaccessible Due to Memory Exhaustion ==
 
When the VoIPmonitor GUI and CLI become frequently inaccessible or the server becomes unresponsive due to Out of Memory (OOM) conditions, follow these steps to identify and resolve the issue.
 
=== Diagnose OOM Events ===
 
The Linux kernel out-of-memory (OOM) killer terminates processes when RAM is exhausted.
 
;Check the kernel ring buffer for OOM events:
<syntaxhighlight lang="bash">
dmesg -T | grep -i killed
</syntaxhighlight>
 
If you see messages like "Out of memory: Kill process" or "invoked oom-killer", your system is running out of physical RAM.
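
On systemd-based hosts the kernel log can also be searched via journalctl; a quick supplementary check (assuming journald keeps kernel messages) might look like this:

<syntaxhighlight lang="bash">
# List OOM killer activity from the kernel ring buffer, with readable timestamps
dmesg -T | grep -iE "out of memory|oom-killer"

# Same check against the journal, limited to the last two days
journalctl -k --since "2 days ago" | grep -i oom
</syntaxhighlight>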
 
=== Immediate Relief: Reduce Memory Allocation ===
 
Reduce memory consumption by tuning both MySQL and VoIPmonitor parameters.
 
;1. Reduce MySQL Buffer Pool Size:
 
Edit the MySQL configuration file (typically <code>/etc/my.cnf.d/mysql-server.cnf</code> or <code>/etc/mysql/my.cnf</code> for Debian/Ubuntu):


<syntaxhighlight lang="ini">
[mysqld]
# Reduce from 8GB to 6GB (adjust based on available RAM)
innodb_buffer_pool_size = 6G
</syntaxhighlight>


A rough starting point: subtract the sniffer's <code>max_buffer_mem</code> from total RAM and give about 80% of the remainder to <code>innodb_buffer_pool_size</code>. For example, on a 16GB server with 8GB allocated to max_buffer_mem, set innodb_buffer_pool_size to approximately 6GB.
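
The same rule can be computed directly on the host; this is a small illustrative sketch (the 8000 MB value is an assumption taken from the example above, so substitute your real <code>max_buffer_mem</code>):

<syntaxhighlight lang="bash">
# Rough innodb_buffer_pool_size suggestion, in MB
TOTAL_RAM_MB=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
MAX_BUFFER_MEM_MB=8000   # value of max_buffer_mem from /etc/voipmonitor.conf
echo "suggested innodb_buffer_pool_size: $(( (TOTAL_RAM_MB - MAX_BUFFER_MEM_MB) * 8 / 10 )) MB"
</syntaxhighlight>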


;2. Reduce VoIPmonitor Buffer Memory:


Edit <code>/etc/voipmonitor.conf</code> and decrease the <code>max_buffer_mem</code> value:
<syntaxhighlight lang="ini">
[general]
# Reduce from 8000 to 6000 (adjust based on available RAM)
max_buffer_mem = 6000
</syntaxhighlight>
 
The <code>max_buffer_mem</code> parameter limits the maximum RAM allocation for the packet buffer. Typical values range from 2000-8000 MB depending on traffic volume and call rates.
 
;3. Restart the affected services:
 
<syntaxhighlight lang="bash">
systemctl restart mysqld
systemctl restart voipmonitor
</syntaxhighlight>
 
Monitor the system to confirm stability.
 
=== Long-term Solution: Increase RAM ===
 
For sustained production operation, increase the server's physical RAM:
 
* '''Minimum''': Add at least 16 GB of additional RAM to eliminate OOM conditions
* '''Performance benefit''': After the RAM upgrade, you can safely increase <code>innodb_buffer_pool_size</code> to improve MySQL performance
* '''Recommended settings''': Set <code>innodb_buffer_pool_size</code> to 50-70% of total RAM and <code>max_buffer_mem</code> based on your traffic requirements
 
See [[Sniffer_configuration#max_buffer_mem|Sniffer Configuration]] for details on VoIPmonitor memory settings.
 
== Emergency: Diagnosing System Hangs and Collecting Core Dump Evidence ==
 
When the VoIPmonitor system hangs, packet buffer (heap) spikes to 100%, and a single CPU core is pegged at 100%, you need to diagnose the issue and collect evidence for developer analysis before restarting.
 
=== Identify the Problematic Thread ===
 
Use the Manager API to identify which sniffer thread is consuming excessive CPU resources.
 
<syntaxhighlight lang="bash">
# Query thread statistics from the sensor
echo 'sniffer_threads' | nc <sensor_ip> 5029
</syntaxhighlight>
 
Replace <sensor_ip> with the actual IP address of your VoIPmonitor sensor. Look for a thread showing approximately 100% CPU usage. This indicates the specific processing thread that is causing the hang.
 
=== Generate Core Dump for Developer Analysis ===
 
If a thread is pegged at 100% and the system needs to be analyzed by VoIPmonitor developers, generate a core dump before restarting:
 
;1. Find the VoIPmonitor process ID (PID):
<syntaxhighlight lang="bash">
ps aux | grep voipmonitor | grep -v grep
</syntaxhighlight>
 
;2. Attach to the process with gdb and generate a core dump:
<syntaxhighlight lang="bash">
gdb -p <PID_of_voipmonitor>
# Within gdb, generate the core dump
gcore <output_file>
 
# Example:
gdb -p 12345
(gdb) gcore /tmp/voipmonitor_hang.core
</syntaxhighlight>
 
The core dump file provides developers with a complete snapshot of the process state at the moment of the hang, including memory, registers, and stack traces.
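
If attaching interactively is not practical, thread backtraces can also be captured non-interactively. This is an optional sketch using standard gdb batch options (adjust the PID and output path):

<syntaxhighlight lang="bash">
# Dump backtraces of all threads without opening an interactive gdb session
gdb -p 12345 -batch -ex "thread apply all bt" > /tmp/voipmonitor_threads.txt 2>&1
</syntaxhighlight>

Such a text backtrace is much smaller than a full core dump and is often enough to point developers at the looping thread.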
 
;3. Detach from gdb and quit:
<syntaxhighlight lang="bash">
detach
quit
</syntaxhighlight>
 
=== Restore Service and Collect Evidence ===
 
After collecting the diagnostic evidence, restart the service to restore operation:
 
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>
 
Provide the following files to VoIPmonitor support for analysis:
 
* Core dump file (from gcore command)
* Thread statistics output (from sniffer_threads command)
* Performance logs (/var/log/syslog showing the hang period)
* Configuration file (/etc/voipmonitor.conf)
 
'''Important:''' Core dump files can be very large (several GB depending on max_buffer_mem). Ensure you have sufficient disk space and consider compressing the file before transferring it to support.
 
== Emergency: System Freezes on Every Update Attempt ==
 
If the VoIPmonitor sensor becomes unresponsive or hangs each time you attempt to update it through the Web GUI:
 
;1. SSH into the sensor host
;2. Execute the following commands to forcefully stop and restart:
<syntaxhighlight lang="bash">
killall voipmonitor
systemctl stop voipmonitor
systemctl start voipmonitor
</syntaxhighlight>
 
This sequence ensures any zombie processes are terminated, the systemd unit is fully stopped, and the service restarts cleanly. Verify the sensor status in the GUI to confirm it is responding correctly.
 
== Emergency: Binary Not Found After Crash ==
 
If the VoIPmonitor service fails to start after a crash with error "Binary not found" for <code>/usr/local/sbin/voipmonitor</code>:
 
;1. Check for a renamed binary:
<syntaxhighlight lang="bash">
ls -l /usr/local/sbin/voipmonitor_*
</syntaxhighlight>
 
The crash recovery process may have renamed the binary with an underscore suffix.
 
;2. If found, rename it back:
<syntaxhighlight lang="bash">
mv /usr/local/sbin/voipmonitor_ /usr/local/sbin/voipmonitor
</syntaxhighlight>
 
;3. Restart the service:
<syntaxhighlight lang="bash">
systemctl start voipmonitor
systemctl status voipmonitor
</syntaxhighlight>
 
Verify the service starts correctly.
 
== Out-of-Band Management Scenarios ==
 
When the system is completely unresponsive and cannot be accessed via SSH:
 
* '''Use your server's out-of-band management system:'''
** Dell iDRAC
** HP iLO
** Supermicro IPMI
** Other vendor-specific BMC/management tools

* '''Actions available via OBM:'''
** Access virtual console (KVM-over-IP)
** Send NMI (Non-Maskable Interrupt) for system dump
** Force power cycle
** Monitor hardware health
 
See [[Sniffer_troubleshooting|Sniffer Troubleshooting]] for more diagnostic procedures.
 
== Emergency: Service Restart Loop with "packetbuffer: MEMORY IS FULL" and "Cannot bind to port" ==
 
If the VoIPmonitor service enters a restart loop, logging <code>packetbuffer: MEMORY IS FULL</code> and displaying <code>Cannot bind to port [5029]</code> errors, the issue can have '''multiple root causes'''. The "MEMORY IS FULL" error message is ambiguous and can indicate either RAM exhaustion or disk I/O bottleneck.
 
=== Critical: Distinguish Between RAM and Disk I/O Issues ===
 
The symptoms appear identical, but the root causes and solutions are different:
 
{| class="wikitable"
|-
! style="background:#ffc107;" | RAM-Based Memory Issue
! style="background:#ffc107;" | Disk I/O Performance Issue
|-
| Memory buffer fills due to excessive concurrent calls or traffic floods
| Memory buffer fills because disk cannot write fast enough to drain it
|-
| Solution: Increase <code>max_buffer_mem</code>, enable <code>packetbuffer_compress</code>, or limit concurrent calls
| Solution: Upgrade storage, move spool to faster disk, or resolve I/O bottleneck
|}


=== Step 1: Check for Disk I/O Bottleneck (Important!) ===

If the service restarts repeatedly with "MEMORY IS FULL" but increasing <code>max_buffer_mem</code> does not help, check for disk I/O problems on the spool directory (typically <code>/var/spool/voipmonitor</code>).


;1. Monitor disk utilization with iostat:
<syntaxhighlight lang="bash">
# Monitor disk I/O in real-time (1-second intervals)
iostat -x 1
</syntaxhighlight>
* '''What to look for:''' A value near 100% in the <code>%util</code> column indicates the disk is operating at maximum capacity
* '''Symptoms:''' High %util, high await (average wait time), or high queue depth


;2. Perform a write speed test to the spool directory:
<syntaxhighlight lang="bash">
# Test sequential write speed (adjust count based on available disk space)
# Note: the dd test uses O_DIRECT (oflag=direct) to bypass the cache for an accurate measurement
dd if=/dev/zero of=/var/spool/voipmonitor/testfile bs=1M count=1024 oflag=direct conv=fdatasync

# Clean up test file
rm /var/spool/voipmonitor/testfile
</syntaxhighlight>
* '''Interpretation:''' A very slow write speed (e.g., less than 50 MB/s on HDDs or significantly lower than expected SSD speed) confirms a storage bottleneck
* For SSD/NVMe, expect 400+ MB/s sequential writes
* For HDDs, expect 80-150 MB/s sequential writes (7200 RPM)
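
Sequential dd throughput does not tell the whole story for a database-style workload; as a quick supplementary latency check, ioping (if installed) can be run against the same directory:

<syntaxhighlight lang="bash">
# Measure request latency on the spool directory (10 requests)
ioping -c 10 /var/spool/voipmonitor
</syntaxhighlight>

Latencies in the tens of milliseconds on every request typically point to a saturated or slow disk; see [[IO_Measurement|I/O Performance Measurement]] for deeper testing.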


;3. Check for I/O wait (Linux monitoring):
<syntaxhighlight lang="bash">
# Check if the system is spending significant time waiting for I/O
# High 'wa' (wait) percentage indicates disk bottleneck
top
# or
vmstat 1
</syntaxhighlight>
* Look for high <code>%wa</code> (I/O wait) in the CPU section


=== Step 2: Resolve Disk I/O Bottleneck ===

If disk I/O tests confirm the issue:

* '''Option 1: Upgrade storage hardware'''
** Move <code>/var/spool/voipmonitor</code> to a faster local SSD or NVMe drive
** Consider RAID 10 for better performance and redundancy

* '''Option 2: Tune storage configuration'''
** Check if the disk is operating in degraded mode (RAID rebuild in progress)
** Verify the storage controller firmware is up to date
** Disable unnecessary monitoring or indexing (e.g., updatedb, antivirus scanning) on the spool directory
 
* '''Option 3: Move spool directory to faster volume'''
<syntaxhighlight lang="bash">
# Stop service
systemctl stop voipmonitor

# Mount faster disk to /var/spool/voipmonitor
# Or create a symlink:
mv /var/spool/voipmonitor /var/spool/voipmonitor.backup
ln -s /path/to/fast/disk/voipmonitor /var/spool/voipmonitor

# Restart service
systemctl start voipmonitor
</syntaxhighlight>
 
For detailed disk performance benchmarking, see [[IO_Measurement|I/O Performance Measurement]] for advanced testing with <code>fio</code> and <code>ioping</code>.
 
=== Step 3: Check for RAM-Based Memory Issue ===
 
If disk I/O is healthy but the error persists, the issue is RAM-based memory exhaustion.
 
;1. Check RAM allocation:
<syntaxhighlight lang="bash">
# Check current memory usage
free -h
</syntaxhighlight>
 
;2. Increase memory buffer limits:
Edit <code>/etc/voipmonitor.conf</code>:
 
{| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
|-
! colspan="2" style="background:#ffc107;" | Recommended Values for "MEMORY IS FULL" Errors
|-
| '''ringbuffer''' || For very high traffic (>200 Mbit/s) or severe packet loss, increase to 2000 MB (the maximum allowed). The default is 50 MB; 500 MB is recommended for traffic above 100 Mbit/s.
|-
| '''max_buffer_mem''' || For high concurrent call loads (5000+ calls) or persistent buffer issues, increase to 8000 MB. The default is 2000 MB; 4000 MB is a typical value for moderate loads.
|-
| '''packetbuffer_compress''' || Enable if RAM is constrained (increases CPU usage to reduce memory footprint).
|}


<syntaxhighlight lang="ini">
[general]
# HIGH TRAFFIC CONFIGURATION - Prevent "MEMORY IS FULL" errors
# Maximum ringbuffer for very high traffic / serious packet loss
ringbuffer = 2000
 
# Increase buffer memory for high concurrent call loads
max_buffer_mem = 8000
 
# Enable compression to save RAM at CPU cost
packetbuffer_compress = yes
 
# Optional: Limit concurrent calls to prevent overload
callslimit = 2000
</syntaxhighlight>
 
'''Alternative: Moderate Traffic Configuration'''
<syntaxhighlight lang="ini">
[general]
# For moderate traffic (100-200 Mbit, 2000-5000 concurrent calls)
ringbuffer = 500
max_buffer_mem = 4000
packetbuffer_compress = yes
</syntaxhighlight>
 
;3. Restart and monitor:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
journalctl -u voipmonitor -f
</syntaxhighlight>
 
See [[Sniffer_configuration#max_buffer_mem|Sniffer Configuration]] for more memory tuning options.
 
=== Step 4: Alternative Root Cause - Adaptive Jitterbuffer Overload ===
 
If the "packetbuffer: MEMORY IS FULL" and "HEAP FULL" errors occur even after adjusting <code>max_buffer_mem</code>, the issue may be caused by the adaptive jitterbuffer feature consuming excessive memory during processing. The adaptive jitterbuffer (which simulates jitter up to 500ms) is CPU and memory-intensive and can trigger heap exhaustion on high-traffic systems.
 
;1. Check if jitterbuffer_adapt is enabled:
<syntaxhighlight lang="bash">
# Check voipmonitor.conf for jitter buffer settings
grep jitterbuffer /etc/voipmonitor.conf
</syntaxhighlight>
 
If <code>jitterbuffer_adapt = yes</code> is set, this feature may be causing the memory exhaustion.
 
;2. Disable adaptive jitterbuffer:
Edit <code>/etc/voipmonitor.conf</code> and set:
<syntaxhighlight lang="ini">
[general]
# Disable adaptive jitterbuffer to prevent memory/CPU exhaustion
jitterbuffer_adapt = no
</syntaxhighlight>
 
;3. Restart the service:
<syntaxhighlight lang="bash">
systemctl restart voipmonitor
</syntaxhighlight>
 
;4. Verify the error is resolved:
<syntaxhighlight lang="bash">
# Monitor for MEMORY IS FULL errors
journalctl -u voipmonitor -f
</syntaxhighlight>
 
'''Important Trade-offs:'''
 
* Disabling <code>jitterbuffer_adapt</code> removes the CPU/memory overhead but also disables <code>MOS_adaptive</code> score calculation
* Fixed jitterbuffer modes (<code>jitterbuffer_f1</code> for 50ms, <code>jitterbuffer_f2</code> for 200ms) remain available and consume significantly fewer resources
* If MOS quality scoring is required, consider using <code>jitterbuffer_f2 = yes</code> instead
 
This solution is particularly effective when the system crashes with both "MEMORY IS FULL" and "HEAP FULL" errors simultaneously, indicating the adaptive jitterbuffer heap is overflowing during real-time packet processing.
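
If MOS scoring is still required, a minimal configuration sketch that keeps only the fixed jitterbuffer simulations (verify the exact parameter names against your sniffer version and the [[Sniffer_configuration]] reference) could look like this:

<syntaxhighlight lang="ini">
[general]
# Drop the expensive adaptive simulation, keep the fixed 50ms/200ms simulations for MOS
jitterbuffer_adapt = no
jitterbuffer_f1 = yes
jitterbuffer_f2 = yes
</syntaxhighlight>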
 
=== Step 5: Clear Stale Port 5029 Bindings ===
 
The "Cannot bind to port [5029]" error occurs when a zombie process still holds the Manager API port. This prevents clean restarts.
 
<syntaxhighlight lang="bash">
# Force kill all VoIPmonitor processes
killall -9 voipmonitor
 
# Ensure service is stopped
systemctl stop voipmonitor
 
# Verify no processes are running
ps aux | grep voipmonitor
 
# Restart service
systemctl start voipmonitor
</syntaxhighlight>
 
After clearing zombie processes and addressing the root cause (I/O or RAM), the service should start successfully without the bind error.
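
Before starting the service again, it may help to confirm that nothing is still listening on the manager port; a quick check with ss (part of iproute2) is shown below:

<syntaxhighlight lang="bash">
# Verify that no leftover process holds the manager port 5029
ss -ltnp | grep ':5029'
</syntaxhighlight>

If the command prints a listener, note its PID and terminate it before restarting.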
 
=== Related Issues ===


For performance tuning and scaling guidance, see:
* [[Scaling|Scaling and Performance Tuning Guide]]
* [[IO_Measurement|I/O Performance Measurement]]
* [[High-Performance_VoIPmonitor_and_MySQL_Setup_Manual|High-Performance Setup]]


== Related Documentation ==

* [[Scaling|Scaling and Performance Tuning Guide]] - For performance optimization
* [[Anti-fraud|Anti-Fraud Rules]] - For attack detection and mitigation
* [[Sniffer_troubleshooting|Sniffer Troubleshooting]] - For systematic diagnostic procedures
* [[High-Performance_VoIPmonitor_and_MySQL_Setup_Manual|High-Performance Setup]] - For optimizing high-traffic deployments
* [[Systemd_for_voipmonitor_service_management|Systemd Service Management]] - For service management best practices

= GUI Performance Crisis: Database Bottleneck Diagnosis =

When the VoIPmonitor GUI becomes unresponsive or PHP processes are killed by the OOM killer, the root cause is often a '''database bottleneck''', not PHP configuration. This section shows how to diagnose it using the sensor RRD charts.

{{Note|For general troubleshooting, see [[Database_troubleshooting]], [[GUI_troubleshooting]], or [[Sniffer_troubleshooting]].}}

== Symptoms ==

* GUI extremely slow or unresponsive during peak hours
* PHP processes killed by the OOM killer
* Dashboard and CDR views take a long time to load
* Alerts/reports fail during high traffic
* System fine during off-peak hours, degrades during peak

== Diagnostic Flowchart ==

<kroki lang="mermaid">
%%{init: {'flowchart': {'nodeSpacing': 15, 'rankSpacing': 35}}}%%
flowchart TD
    A[GUI Slow / OOM Errors] --> B{Check RRD Charts<br/>Settings → Sensors → 📊}
    B --> C{SQL Cache growing<br/>during peak?}
    C -->|No| D[NOT database bottleneck<br/>Check PHP/Apache config]
    C -->|Yes| E[Database Bottleneck<br/>Confirmed]
    E --> F{mysqld CPU ~100%?}
    F -->|Yes| G[CPU Bottleneck<br/>→ Upgrade CPU]
    F -->|No| H{High iowait?<br/>HDD storage?}
    H -->|Yes| I[I/O Bottleneck<br/>→ Upgrade to SSD/NVMe]
    H -->|No| J[Memory Bottleneck<br/>→ Add RAM + tune buffer_pool]

    style A fill:#f9f,stroke:#333
    style E fill:#ff9,stroke:#333
</kroki>

== Step 1: Access RRD Charts ==

# Navigate to '''Settings → Sensors'''
# Click the '''graph icon''' (📊) next to the sensor
# Select a time range covering the problematic peak hours

== Step 2: Identify Growing SQL Cache ==

The key indicator is '''SQL Cache''' or '''SQL Cache Files''' growing during peak hours:

{| class="wikitable"
|-
! Metric !! What to Look For !! Indicates
|-
| '''SQL Cache''' || Consistently increasing, never decreasing || DB cannot keep up with inserts
|-
| '''SQL Cache Files''' || Growing over time || Buffer pool too small or storage too slow
|-
| '''mysqld CPU''' || Near 100% || CPU bottleneck
|-
| '''Disk I/O (mysql)''' || High/saturated || Storage bottleneck (HDD vs SSD)
|}

{{Warning|1=If SQL Cache is NOT growing, the problem is likely NOT the database. Check PHP/Apache configuration instead.}}
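
As a shell-level cross-check of the same signals, the database host can be sampled with the sysstat tools; this is an optional sketch (assumes the sysstat package is installed and the database process is named mysqld — adjust for mariadbd):

<syntaxhighlight lang="bash">
# Sample mysqld CPU usage and disk saturation every 10 seconds for ~5 minutes
pidstat -u -p "$(pidof mysqld)" 10 30 > /tmp/mysqld_cpu.log &
iostat -x 10 30 > /tmp/disk_util.log &
wait
grep -i average /tmp/mysqld_cpu.log   # average %CPU of mysqld
tail -n 40 /tmp/disk_util.log         # inspect the %util and await columns
</syntaxhighlight>

Sustained mysqld CPU near 100% points to a CPU bottleneck, while high %util with modest CPU points to storage.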

== Step 3: Identify Bottleneck Type ==

=== CPU Bottleneck ===
* <code>mysqld</code> at or near 100% CPU
* '''Solution:''' More CPU cores or faster CPU

=== Memory Bottleneck ===
* SQL cache fills up and stays full
* Buffer pool too small for dataset
* '''Solution:''' Add RAM, tune <code>innodb_buffer_pool_size</code>

=== Storage I/O Bottleneck (Most Common) ===
* High <code>iowait</code> during peak hours
* Database on magnetic disks (HDD)
* '''Solution:''' Upgrade to SSD/NVMe (10-50x improvement)

== Solutions ==

=== Add RAM to Database Server ===

<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf
[mysqld]
# Set to 50-70% of total RAM on a dedicated DB server
innodb_buffer_pool_size = 64G
</syntaxhighlight>

{| class="wikitable"
|-
! Current RAM !! Recommended RAM !! <code>innodb_buffer_pool_size</code>
|-
| 32GB || 64-128GB || 32-64G
|-
| 64GB || 128-256GB || 64-128G
|}
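
To see how the current pool is sized and used, the running server can be queried directly (assuming you can run the mysql client with sufficient privileges):

<syntaxhighlight lang="bash">
# Current buffer pool size (bytes) and read statistics
mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';"
</syntaxhighlight>

Comparing <code>Innodb_buffer_pool_reads</code> (requests that had to go to disk) with <code>Innodb_buffer_pool_read_requests</code> gives a rough hit rate; a poor ratio suggests the pool is undersized for the working set.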

=== Upgrade to SSD/NVMe ===

{| class="wikitable"
|-
! Current !! Upgrade To !! Expected Speedup
|-
| 10K RPM HDD || NVMe SSD || 10-50x
|-
| SAS HDD || Enterprise SSD || 5-20x
|-
| Older SSD || NVMe (PCIe 4.0+) || 2-5x
|}
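
Before and after a storage change it is worth benchmarking random I/O on the database volume, since that is what InnoDB mostly does. A minimal fio sketch is shown below (assumes fio is installed and that <code>/var/lib/mysql</code> is the datadir — adjust the path, and prefer a maintenance window on production systems):

<syntaxhighlight lang="bash">
# 30-second random-write test with 16 KB blocks, roughly matching InnoDB page I/O
fio --name=dbwrite --directory=/var/lib/mysql --rw=randwrite --bs=16k \
    --size=1G --numjobs=1 --time_based --runtime=30 --ioengine=libaio --direct=1

# Remove the test file created by fio
rm -f /var/lib/mysql/dbwrite.0.0
</syntaxhighlight>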

=== Temporary Mitigation ===

If an immediate hardware upgrade is not possible:
* '''Alerts:''' Reduce frequency or schedule during off-peak hours (e.g., 2am-4am)
* '''Reports:''' Schedule for off-peak hours
* '''Dashboards:''' Simplify queries, avoid "All time" ranges

=== Component Separation ===

For persistent issues, consider a dedicated database server:
* '''Host 1:''' Database (max RAM + SSD/NVMe)
* '''Host 2:''' GUI web server
* '''Host 3:''' Sensor(s)

See [[Scaling#Scaling_Through_Component_Separation|Scaling - Component Separation]].

== Common Mistakes ==

{{Warning|1=These do NOT fix database bottlenecks:}}

{| class="wikitable"
|-
! Wrong Action !! Why It Fails
|-
| Reducing PHP <code>memory_limit</code> || PHP waits for DB; less memory = earlier crashes
|-
| Adding more PHP-FPM workers || More workers pile up waiting for slow DB
|-
| Reducing <code>innodb_buffer_pool_size</code> || Makes DB slower, increases disk I/O
|-
| Adding RAM to GUI server || Bottleneck is DB server, not GUI
|}

== Verification ==

After implementing a fix:
# Monitor the SQL cache during the next peak period
# Verify the SQL cache does NOT grow uncontrollably
# Confirm GUI responsiveness
# Check for OOM killer events in the system logs
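
A quick log-side check can complement the chart review; the following sketch (assuming a systemd-based host) looks for OOM killer activity since the change:

<syntaxhighlight lang="bash">
# Any OOM killer events in the kernel log since the fix was applied?
journalctl -k --since "7 days ago" | grep -i "oom-killer"

# Fallback if the journal is not persistent
dmesg -T | grep -iE "out of memory|oom-killer"
</syntaxhighlight>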

== See Also ==

* [[Database_troubleshooting]] - SQL queue issues, CDR delays
* [[Scaling]] - Performance tuning and database optimization
* [[GUI_troubleshooting]] - HTTP 500, login issues, debug mode

== AI Summary for RAG ==

'''Summary:''' This article provides emergency procedures for recovering VoIPmonitor from critical failures. It covers steps to force-terminate runaway processes consuming excessive CPU (including kill -9 and systemctl commands), root cause analysis for CPU spikes (SIP REGISTER floods, pcapcommand, RTP threads, audio features), OOM memory exhaustion troubleshooting (checking dmesg for killed processes, reducing innodb_buffer_pool_size and max_buffer_mem), preventive measures (monitoring, anti-fraud auto-blocking, network edge protection), recovery procedures for system freezes during updates and binary issues after crashes, out-of-band management scenarios, and CRITICAL troubleshooting for service restart loop with "packetbuffer: MEMORY IS FULL" and "Cannot bind to port [5029]" errors. The MEMORY IS FULL error has multiple root causes: (1) RAM-based memory exhaustion (solution: increase max_buffer_mem, enable packetbuffer_compress) or (2) Disk I/O performance bottleneck (solution: check iostat -x 1 for 100% utilization, test write speed with dd to /var/spool/voipmonitor with oflag=direct, upgrade storage or move spool directory) or (3) Adaptive jitterbuffer overload (solution: check jitterbuffer settings with grep jitterbuffer /etc/voipmonitor.conf, disable jitterbuffer_adapt=no if enabled, which also disables MOS_adaptive scoring but keeps jitterbuffer_f1/f2 available). The "Cannot bind to port [5029]" error requires clearing zombie processes (killall -9 voipmonitor, systemctl stop voipmonitor). The article also includes a guide for diagnosing database bottlenecks affecting the VoIPmonitor GUI using sensor RRD charts. Symptoms: GUI unresponsive during peak hours, OOM killer terminating PHP, slow dashboards. KEY DIAGNOSTIC: Access RRD charts (Settings → Sensors → graph icon), look for growing SQL cache during peak hours - primary indicator of DB bottleneck. Bottleneck types: CPU (mysqld at 100%), Memory (buffer pool full), Storage I/O (most common - high iowait, HDD storage). Solutions: (1) Add RAM and set innodb_buffer_pool_size to 50-70% of RAM; (2) Upgrade HDD to SSD/NVMe (10-50x speedup); (3) Schedule alerts/reports off-peak; (4) Component separation with dedicated DB server. WRONG solutions: Do NOT reduce PHP memory_limit, do NOT add PHP-FPM workers, do NOT reduce innodb_buffer_pool_size, do NOT add RAM to GUI server.

'''Keywords:''' emergency recovery, high CPU, system unresponsive, runaway process, kill process, kill -9, systemctl, SIP REGISTER flood, pcapcommand, performance optimization, out-of-band management, iDRAC, iLO, IPMI, crash recovery, OOM, out of memory, memory exhaustion, dmesg, innodb_buffer_pool_size, max_buffer_mem, MEMORY IS FULL, HEAP FULL, packetbuffer, disk I/O, I/O bottleneck, iostat, disk utilization, %util, write speed test, dd oflag=direct, spool directory, SSD, NVMe, RAID, Cannot bind to port 5029, zombie process, Manager API port, port 5029, restart loop, storage performance, I/O wait, %wa, jitterbuffer, jitterbuffer_adapt, adaptive jitterbuffer, jitterbuffer_f1, jitterbuffer_f2, MOS_adaptive, database bottleneck, RRD charts, SQL cache, peak hours, OOM killer, GUI slow, GUI unresponsive, SSD upgrade, iowait, component separation, emergency procedures, performance crisis

'''Key Questions:'''
* What to do when VoIPmonitor consumes 3000% CPU or system becomes unresponsive?
* How to forcefully terminate a runaway VoIPmonitor process?
* What are common causes of CPU spikes in VoIPmonitor?
* How to mitigate SIP REGISTER flood attacks causing high CPU?
* How to diagnose OOM (Out of Memory) events?
* How to fix GUI and CLI frequently inaccessible due to memory exhaustion?
* How to reduce memory usage of MySQL and VoIPmonitor?
* What is max_buffer_mem and how to configure it?
* How to restart VoIPmonitor service after a crash?
* What to do if service binary is not found after crash?
* How to prevent VoIPmonitor from freezing during GUI updates?
* What tools can help diagnose VoIPmonitor performance issues?
* What causes "packetbuffer: MEMORY IS FULL" error message?
* How to distinguish between RAM exhaustion and disk I/O bottleneck?
* How to check for disk I/O performance issues causing restart loops?
* How to use iostat to diagnose disk utilization?
* How to perform write speed test to /var/spool/voipmonitor directory?
* What does "Cannot bind to port [5029]" error mean?
* How to clear zombie processes holding port 5029?
* How to resolve disk I/O bottleneck for VoIPmonitor?
* How to move spool directory to faster storage?
* What is the correct dd command to test disk write speed?
* What causes "HEAP FULL" errors in VoIPmonitor?
* How is jitterbuffer_adapt related to MEMORY IS FULL errors?
* What is the solution for MEMORY IS FULL + HEAP FULL crashes caused by jitterbuffer_adapt?
* Why should I disable jitterbuffer_adapt?
* What happens when I set jitterbuffer_adapt = no?
* What is the trade-off when disabling jitterbuffer_adapt?
* Can I still use jitterbuffer_f1 and jitterbuffer_f2 with jitterbuffer_adapt disabled?
* How to diagnose database bottlenecks in VoIPmonitor?
* What does growing SQL cache in RRD charts indicate?
* Why is VoIPmonitor GUI slow during peak hours?
* How to fix OOM killer terminating PHP processes?
* Should I upgrade RAM on GUI or database server?
* What storage is recommended for VoIPmonitor database?
* How to access sensor RRD charts?
* What are wrong solutions for database bottlenecks?