{{DISPLAYTITLE:Emergency Procedures & System Recovery}}


'''This guide covers emergency procedures for recovering your VoIPmonitor system from critical failures, including runaway processes, high CPU usage, and system unresponsiveness.'''


== Emergency: VoIPmonitor Process Consuming Excessive CPU or System Unresponsive ==


When a VoIPmonitor process consumes excessive CPU (e.g., 3000% or more in <code>top</code>, i.e., roughly 30 CPU cores fully utilized) or causes the entire system to become unresponsive, follow these immediate steps:


=== Immediate Action: Force-Terminate Runaway Process ===


If the system is still minimally responsive, connect via SSH; otherwise use out-of-band management (iDRAC, IPMI, console):


;1. Identify the Process ID (PID):
<syntaxhighlight lang="bash">
# Using htop (if available)
htop

# Or using ps
ps aux | grep voipmonitor
</syntaxhighlight>
 
Look for the voipmonitor process consuming the most CPU resources. Note down the PID (process ID number).
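When many processes are listed, sorting by CPU usage makes the culprit easier to spot; with standard <code>procps</code> tools, for example:

<syntaxhighlight lang="bash">
# List the five most CPU-hungry processes, highest first
ps aux --sort=-%cpu | head -n 6
</syntaxhighlight>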
 
;2. Forcefully terminate the process:
<syntaxhighlight lang="bash">
kill -9 <PID>
</syntaxhighlight>
 
Replace <PID> with the actual process ID number identified in step 1.
 
;3. Verify system recovery:
<syntaxhighlight lang="bash">
# Check CPU usage has returned to normal
top
 
# Check if the process was terminated
ps aux | grep voipmonitor
</syntaxhighlight>
 
The system should become responsive again immediately after the process is killed. CPU utilization should drop significantly.
 
=== Optional: Stop and Restart the Service (for persistent issues) ===
 
If the problem persists or the service needs to be cleanly restarted:
 
<syntaxhighlight lang="bash">
# Stop the voipmonitor service
systemctl stop voipmonitor
 
# Verify no zombie processes remaining
killall voipmonitor
 
# Restart the service
systemctl start voipmonitor
 
# Verify service status
systemctl status voipmonitor
</syntaxhighlight>
 
{{Warning|1=When using <code>systemd</code> service management, avoid the deprecated <code>service</code> command, as it can cause systemd to lose track of the daemon. Always use <code>systemctl</code> commands or direct process commands like <code>killall</code>.}}
 
=== Root Cause Analysis: Why Did the CPU Spike? ===
 
After recovering the system, investigate the root cause to prevent recurrence. Common causes include:
 
;SIP REGISTER Flood / Spamming Attack
Massive volumes of SIP REGISTER messages from malicious IPs can overwhelm the VoIPmonitor process.
 
* '''Detection:''' Check recent alert triggers in the VoIPmonitor GUI > Alerts > Sent Alerts for SIP REGISTER flood alerts
* '''Immediate mitigation:''' Block attacker IPs at the network edge (SBC, firewall, iptables)
* '''Long-term prevention:''' Configure anti-fraud rules with custom scripts to auto-block, see [[Anti-fraud#SIP REGISTER Flood/Attack|SIP REGISTER Flood Mitigation]]
 
;Packet Capture Overload (pcapcommand)
The <code>pcapcommand</code> feature forks a program for ''every'' call, which can generate up to 500,000 interrupts per second.
 
* '''Detection:''' Check <code>/etc/voipmonitor.conf</code> for a <code>pcapcommand</code> line
* '''Immediate fix:''' Comment out or remove the <code>pcapcommand</code> directive and restart the service
* '''Alternative:''' Use the built-in cleaning spool functionality (<code>maxpoolsize</code>, <code>cleanspool</code>) instead
 
;Excessive RTP Processing Threads
High concurrent call volumes can overload RTP processing threads.
 
* '''Detection:''' Check performance logs for high <code>tRTP_CPU</code> values (sum of all RTP threads)
* '''Mitigation:'''
  <pre>callslimit = 2000  # Limit max concurrent calls</pre>
 
;Audio Feature Overhead
Silence detection and audio conversion are CPU-intensive operations.


* '''Detection:''' Check if <code>silencedetect</code> or <code>saveaudio</code> are enabled
* '''Mitigation:'''
  <pre>
  silencedetect = no
  # saveaudio = wav  # Comment out if not needed
  </pre>


See [[Scaling|Scaling and Performance Tuning]] for detailed performance optimization strategies.


=== Preventive Measures ===
Once the root cause is identified, implement these preventive configurations:


;Monitor CPU Trends:
Use [[Collectd_installation|collectd]] or your existing monitoring system to track CPU usage over time and receive alerts before critical thresholds are reached; even a simple scheduled check like the sketch below can help.
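A minimal illustration only (real monitoring is preferable); the threshold and recipient are placeholders, and a working <code>mail</code> command is assumed:

<syntaxhighlight lang="bash">
#!/bin/sh
# Sum %CPU across all voipmonitor processes and alert above a threshold
CPU=$(ps -C voipmonitor -o %cpu= | awk '{s+=$1} END {print int(s)}')
THRESHOLD=400   # illustrative value; tune to your baseline
if [ "${CPU:-0}" -gt "$THRESHOLD" ]; then
    echo "voipmonitor CPU at ${CPU}%" | mail -s "VoIPmonitor CPU alert" admin@example.com
fi
</syntaxhighlight>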


;Anti-Fraud Auto-Blocking:
Configure [[Anti-fraud|Anti-Fraud rules]] with custom scripts to automatically block attacker IPs when a flood is detected. See the [[Anti-fraud|Anti-Fraud documentation]] for PHP script examples using iptables or ipset.


;Network Edge Protection:
Block SIP REGISTER spam and floods at your network edge (SBC, firewall) before traffic reaches VoIPmonitor. This provides better performance and reduces CPU load on the monitoring system; a host-level fallback is sketched below.
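If no edge device is available, a temporary rate-limit on the monitoring host itself can blunt a flood. A sketch, assuming <code>iptables</code> with the <code>string</code> and <code>hashlimit</code> match extensions and SIP on UDP/5060 (the 20/second threshold is illustrative):

<syntaxhighlight lang="bash">
# Drop sources sending more than 20 SIP REGISTERs per second (per source IP)
iptables -A INPUT -p udp --dport 5060 \
  -m string --string "REGISTER sip" --algo bm \
  -m hashlimit --hashlimit-above 20/second --hashlimit-mode srcip \
  --hashlimit-name sipregflood -j DROP
</syntaxhighlight>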


== Emergency: GUI and CLI Frequently Inaccessible Due to Memory Exhaustion ==


When the VoIPmonitor GUI and CLI become frequently inaccessible or the server becomes unresponsive due to Out of Memory (OOM) conditions, follow these steps to identify and resolve the issue.
 
=== Diagnose OOM Events ===
 
The Linux kernel out-of-memory (OOM) killer terminates processes when RAM is exhausted.
 
;Check the kernel ring buffer for OOM events:
<syntaxhighlight lang="bash">
dmesg -T | grep -i killed
</syntaxhighlight>


If you see messages like "Out of memory: Kill process" or "invoked oom-killer", your system is running out of physical RAM.
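The surrounding kernel log lines show which process was killed and how much memory it held. Either of the following (standard <code>dmesg</code> and <code>journalctl</code> tools) will surface them:

<syntaxhighlight lang="bash">
# Timestamped OOM events from the kernel ring buffer
dmesg -T | grep -iE 'out of memory|oom-killer'

# Same information from the systemd journal (kernel messages only)
journalctl -k | grep -i oom
</syntaxhighlight>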


=== Immediate Relief: Reduce Memory Allocation ===


Reduce memory consumption by tuning both MySQL and VoIPmonitor parameters.


;1. Reduce MySQL Buffer Pool Size:


Edit the MySQL configuration file (typically <code>/etc/my.cnf.d/mysql-server.cnf</code> or <code>/etc/mysql/my.cnf</code> for Debian/Ubuntu):


<syntaxhighlight lang="ini">
[mysqld]
# Reduce from 8GB to 6GB (adjust based on available RAM)
innodb_buffer_pool_size = 6G
</syntaxhighlight>


A good starting point is <code>innodb_buffer_pool_size = (RAM - max_buffer_mem) * 0.8</code>. For example, on a 16GB server with 8GB allocated to <code>max_buffer_mem</code>, set <code>innodb_buffer_pool_size</code> to approximately 6GB; a worked calculation is shown below.
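The same rule of thumb in shell arithmetic (values in GB, purely illustrative):

<syntaxhighlight lang="bash">
RAM=16          # total RAM in GB
MAXBUF=8        # max_buffer_mem in GB
# (16 - 8) * 0.8 = 6.4, truncated to 6
echo "innodb_buffer_pool_size = $(( (RAM - MAXBUF) * 8 / 10 ))G"
</syntaxhighlight>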


;2. Reduce VoIPmonitor Buffer Memory:


Edit <code>/etc/voipmonitor.conf</code> and decrease the <code>max_buffer_mem</code> value:


<syntaxhighlight lang="ini">
[general]
# Reduce from 8000 to 6000 (adjust based on available RAM)
max_buffer_mem = 6000
</syntaxhighlight>
 
The <code>max_buffer_mem</code> parameter limits the maximum RAM allocation for the packet buffer. Typical values range from 2000-8000 MB depending on traffic volume and call rates.
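After lowering these values, it is worth confirming the remaining headroom with standard tools:

<syntaxhighlight lang="bash">
# Overall memory and swap headroom
free -h

# Resident memory of the sniffer process
ps -C voipmonitor -o pid,rss,vsz,cmd
</syntaxhighlight>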
 
;3. Restart the affected services:
 
<syntaxhighlight lang="bash">
systemctl restart mysqld
systemctl restart voipmonitor
</syntaxhighlight>


Monitor the system to confirm stability.


=== Long-term Solution: Increase RAM ===


For sustained production operation, increase the server's physical RAM:


* '''Minimum''': Add at least 16 GB of additional RAM to eliminate OOM conditions
* '''Performance benefit''': After the RAM upgrade, you can safely increase <code>innodb_buffer_pool_size</code> to improve MySQL performance
* '''Recommended settings''': Set <code>innodb_buffer_pool_size</code> to 50-70% of total RAM and <code>max_buffer_mem</code> based on your traffic requirements


See [[Sniffer_configuration#max_buffer_mem|Sniffer Configuration]] for details on VoIPmonitor memory settings.


== Emergency: System Freezes on Every Update Attempt ==


If the VoIPmonitor sensor becomes unresponsive or hangs each time you attempt to update it through the Web GUI:


;1. SSH into the sensor host
;2. Execute the following commands to forcefully stop and restart:
<syntaxhighlight lang="bash">
killall voipmonitor
systemctl stop voipmonitor
systemctl start voipmonitor
</syntaxhighlight>


This sequence ensures leftover processes are terminated, the systemd unit is fully stopped, and the service restarts cleanly. Verify the sensor status in the GUI to confirm it is responding correctly.


== Emergency: Binary Not Found After Crash ==


If the VoIPmonitor service fails to start after a crash with error "Binary not found" for <code>/usr/local/sbin/voipmonitor</code>:


;1. Check for a renamed binary:
<syntaxhighlight lang="bash">
ls -l /usr/local/sbin/voipmonitor_*
</syntaxhighlight>


The crash recovery process may have renamed the binary with an underscore suffix.


;2. If found, rename it back:
<syntaxhighlight lang="bash">
mv /usr/local/sbin/voipmonitor_ /usr/local/sbin/voipmonitor
</syntaxhighlight>


;3. Restart the service:
<syntaxhighlight lang="bash">
systemctl start voipmonitor
systemctl status voipmonitor
</syntaxhighlight>


Verify the service starts correctly.


== Out-of-Band Management Scenarios ==


When the system is completely unresponsive and cannot be accessed via SSH:


* '''Use your server's out-of-band management system:'''
** Dell iDRAC
** HP iLO
** Supermicro IPMI
** Other vendor-specific BMC/management tools


* '''Actions available via OBM''' (see the <code>ipmitool</code> sketch below):
** Access virtual console (KVM-over-IP)
** Send NMI (Non-Maskable Interrupt) for system dump
** Force power cycle
** Monitor hardware health
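If the BMC is reachable over the network, these actions can often be scripted. A hedged example with <code>ipmitool</code> (an IPMI-capable BMC is assumed; the address and credentials are placeholders):

<syntaxhighlight lang="bash">
# Query chassis power state via the BMC
ipmitool -I lanplus -H 192.0.2.10 -U admin -P 'secret' chassis status

# Force a power cycle when the OS is completely unresponsive
ipmitool -I lanplus -H 192.0.2.10 -U admin -P 'secret' chassis power cycle
</syntaxhighlight>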


See [[Sniffer_troubleshooting|Sniffer Troubleshooting]] for more diagnostic procedures.






== Diagnosing Database Bottlenecks Using Sensor RRD Charts ==

If your VoIPmonitor GUI becomes unresponsive or PHP processes are being terminated by the OOM (Out of Memory) killer, the root cause may be a '''database performance bottleneck''', not a PHP configuration issue.

This guide explains how to use the sensor's RRD (Round-Robin Database) charts to identify whether the database server is the limiting factor.

=== Symptoms of Database Bottlenecks affecting the GUI ===

* GUI becomes extremely slow or unresponsive during peak hours
* PHP processes are killed by the OOM killer on the GUI server
* Dashboard and CDR views take a long time to load
* Alerts and reports fail during high traffic periods
* System appears fine during off-peak hours but degrades during peak usage

{{Note|These symptoms often occur when the GUI server is waiting for database queries to complete, causing PHP processes to pile up and consume excessive memory.}}

=== Understanding Sensor RRD Charts ===

The VoIPmonitor sensor generates performance charts (RRD files) that track system metrics over time. These charts are accessible through the GUI and provide visual indicators of where bottlenecks are occurring.

To access sensor RRD charts:

# Navigate to '''Settings > Sensors''' in the GUI
# Click the graph icon next to the sensor
# Select the time range covering the problematic peak hours

=== Diagnostic Flowchart ===

<kroki lang="mermaid">
flowchart TD
    A[GUI Unresponsive / OOM Errors] --> B{Check Sensor RRD Charts}
    B --> C{SQL Cache Growing<br/>During Peak Hours?}
    C -->|No| D[Issue is NOT<br/>database bottleneck]
    C -->|Yes| E[Database Bottleneck<br/>Confirmed]
    E --> F{Identify<br/>Bottleneck Type}
    F --> G{mysqld CPU<br/>near 100%?}
    G -->|Yes| H[CPU Bottleneck]
    H --> I[Add CPU cores<br/>or upgrade CPU]
    G -->|No| J{Buffer pool full?<br/>Swap usage?}
    J -->|Yes| K[Memory Bottleneck]
    K --> L[Add RAM<br/>Tune innodb_buffer_pool_size]
    J -->|No| M{High iowait?<br/>Magnetic disks?}
    M -->|Yes| N[Storage I/O Bottleneck]
    N --> O[Upgrade to SSD/NVMe]

    style A fill:#f9f,stroke:#333
    style E fill:#ff9,stroke:#333
    style H fill:#f96,stroke:#333
    style K fill:#f96,stroke:#333
    style N fill:#f96,stroke:#333
</kroki>

=== Diagnostic Step 1: Look for Growing SQL Cache ===

The most critical indicator of a database bottleneck is '''growing SQL cache''' or '''SQL cache files''' during peak hours.

{| class="wikitable"
|-
! Metric !! What to Look For !! Indicates
|-
| '''SQL Cache''' || Consistently increasing during peak hours, never decreasing || Database cannot keep up with insert rate
|-
| '''SQL Cache Files''' || Growing over time during peak usage || Database buffer pool too small or storage too slow
|-
| '''CPU Load (mysqld)''' || Near 100% during peak hours || CPU bottleneck on database server
|-
| '''Disk I/O (mysql)''' || High or saturated during peak hours || Storage bottleneck (magnetic disks instead of SSDs)
|}

If you see SQL cache or SQL cache files growing consistently during peak traffic periods, the database server is the bottleneck.

=== Diagnostic Step 2: Determine the Bottleneck Type ===

After identifying that the database is the issue, determine which resource is the limiting factor:

==== CPU Bottleneck ====

* Check database CPU usage during peak hours
* If mysqld is at or near 100% CPU, you need more CPU cores or faster CPUs

==== Memory Bottleneck ====

* Check if SQL cache grows because the buffer pool is too small
* Database runs out of RAM for caching, forcing disk reads
* The SQL cache chart shows a pattern of filling up and staying full

==== Storage I/O Bottleneck (Most Common) ====

* High disk I/O wait times for the mysqld process
* Disk latency (iowait) increases during peak hours
* Database storage on magnetic disks (e.g., 10K SAS) instead of SSD/NVMe
* SQL cache grows because data cannot be written/read fast enough
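To confirm which resource is saturated while the problem is happening, a few standard utilities (the <code>sysstat</code> package provides <code>iostat</code> and <code>pidstat</code>) are usually enough:

<syntaxhighlight lang="bash">
# Per-device utilization and latency; watch %util and await
iostat -x 1 5

# CPU and disk I/O attributable to mysqld
pidstat -u -d -C mysqld 1 5

# The 'wa' column shows time spent waiting on I/O
vmstat 1 5
</syntaxhighlight>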

=== Solutions for Database Performance Bottlenecks ===

==== Solution 1: Add More RAM to the Database Server ====

This is often the most effective fix for memory-related bottlenecks.

{| class="wikitable"
|-
! Current RAM !! Recommended Upgrade !! Expected Impact
|-
| 32GB || 64GB or 128GB || Significantly reduces cache growth
|-
| 64GB || 128GB or 256GB || Handles much higher peak loads
|-
| 128GB || 256GB || Suitable for large deployments
|}

After adding RAM, tune <code>innodb_buffer_pool_size</code> in your MySQL configuration:

<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf
# Set to 50-70% of total RAM on a dedicated database server
innodb_buffer_pool_size = 128G
</syntaxhighlight>
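Whether the buffer pool is now large enough can be sanity-checked from MySQL status counters; a read hit ratio well below ~99% during peak hours suggests the pool is still too small (a rough heuristic, not a hard rule):

<syntaxhighlight lang="bash">
# Disk reads vs. total read requests served by the buffer pool
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';"
</syntaxhighlight>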

For more tuning guidance, see [[Scaling#Optimizing_Database_Performance_.28MySQL.2FMariaDB.29|Scaling - Database Performance]].

{{Warning|1=Do NOT reduce <code>innodb_buffer_pool_size</code> on the GUI server when the database is the bottleneck. This will make the problem worse.}}

==== Solution 2: Upgrade Database Storage to SSD/NVMe ====

If your database storage is on magnetic disks (e.g., 10K SATA or SAS), upgrading to SSDs is often the single most effective improvement.

{| class="wikitable"
|-
! Current Storage !! Recommended Upgrade !! Expected Speedup
|-
| 10K RPM SATA HDD || NVMe SSD array || 10-50x faster I/O
|-
| 10K RPM SAS HDD || Enterprise SSD (SAS/SATA) || 5-20x faster I/O
|-
| Older SSD || Modern NVMe (PCIe 4.0+) || 2-5x faster I/O
|}

For high-traffic deployments, '''NVMe storage is recommended for the database host'''.

See [[Hardware#Database_Storage|Hardware - Storage Selection]] for detailed recommendations.

==== Solution 3: Temporary Mitigation - Schedule Alerts/Reports Outside Peak Hours ====

If you cannot immediately upgrade the database server hardware, temporarily reduce the load by scheduling intensive tasks during off-peak hours.

'''1. Disable or reduce alert frequency''' during peak hours:
* Navigate to '''GUI > Alerts'''
* Temporarily disable high-frequency alerts
* Set alerts to run during off-peak periods (e.g., 2am-4am)

'''2. Schedule reports outside peak usage:'''
* Navigate to '''GUI > Reports'''
* Configure scheduled reports for off-peak hours
* Avoid generating reports during the busiest part of the day

'''3. Reduce dashboard complexity''' during peak hours:
* Simplify dashboards that query large ranges of data
* Avoid "All time" statistics during peak loads
* Use cached dashboards or static displays when possible

==== Solution 4: Consider Component Separation ====

If the database server is a bottleneck and upgrading is not feasible, consider moving to a dedicated database architecture.

In a component separation deployment (see [[Scaling#Scaling_Through_Component_Separation|Scaling - Component Separation]]):

* '''Host 1:''' Dedicated database server with maximum RAM and SSD/NVMe storage
* '''Host 2:''' GUI web server
* '''Host 3:''' Sensor(s)

This allows you to independently scale the database with more powerful hardware without affecting the GUI.

=== Common Pitfalls to Avoid ===

{| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
|-
! colspan="2" style="background:#ffc107;" | Incorrect Solutions When Database is the Bottleneck
|-
| style="vertical-align: top;" | '''Reducing PHP memory_limit'''
| This does NOT fix the root cause. PHP waits for the database; less memory means processes crash sooner.
|-
| style="vertical-align: top;" | '''Tuning PHP-FPM worker counts'''
| More workers will pile up waiting for slow database queries, consuming even more memory.
|-
| style="vertical-align: top;" | '''Reducing innodb_buffer_pool_size'''
| This makes the database slower, not faster. It causes more disk I/O and longer query times.
|-
| style="vertical-align: top;" | '''Adding RAM to the GUI server'''
| If the bottleneck is the database, adding RAM to the GUI won't help. The database is the limiting factor.
|}

=== Verification Checklist ===

After implementing a database upgrade to fix the bottleneck:

1. Monitor SQL cache charts during the next peak traffic period
2. Check that SQL cache does not grow uncontrollably
3. Verify GUI responsiveness during peak hours
4. Confirm no OOM killer events
5. Check database query latency (slow queries should be minimal)
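A quick command-line spot check after the next peak period (assumes the <code>mysql</code> client can connect with stored credentials):

<syntaxhighlight lang="bash">
# No new OOM killer activity expected
dmesg -T | grep -i oom

# Slow-query counter should stay low between checks
mysql -e "SHOW GLOBAL STATUS LIKE 'Slow_queries';"
</syntaxhighlight>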

== Related Documentation ==

* [[Scaling|Scaling and Performance Tuning Guide]] - General performance tuning and scaling
* [[Scaling#Optimizing_Database_Performance_.28MySQL.2FMariaDB.29|Optimizing Database Performance]] - MySQL tuning parameters
* [[Scaling#Scaling_Through_Component_Separation|Component Separation]] - Dedicated database architecture
* [[Anti-fraud|Anti-Fraud Rules]] - Attack detection and mitigation
* [[Sniffer_troubleshooting|Sniffer Troubleshooting]] - Systematic diagnostic procedures
* [[Hardware]] - Hardware sizing recommendations for different deployment sizes
* [[High-Performance_VoIPmonitor_and_MySQL_Setup_Manual|High-Performance Setup]] - Optimizing high-traffic deployments
* [[Systemd_for_voipmonitor_service_management|Systemd Service Management]] - Service management best practices

== AI Summary for RAG ==

'''Summary:''' This article provides emergency procedures for recovering VoIPmonitor from critical failures and a guide to diagnosing the database bottlenecks that often underlie them. Emergency topics: force-terminating runaway processes consuming excessive CPU (kill -9, systemctl), root cause analysis for CPU spikes (SIP REGISTER floods, pcapcommand, RTP threads, audio features), OOM memory exhaustion troubleshooting (checking dmesg for killed processes, reducing innodb_buffer_pool_size and max_buffer_mem), preventive measures (monitoring, anti-fraud auto-blocking, network edge protection), recovery from system freezes during updates and missing binaries after crashes, and out-of-band management. Database diagnosis: symptoms are GUI unresponsiveness during peak hours, OOM killer terminating PHP processes, slow dashboard/CDR views. KEY DIAGNOSTIC: check sensor RRD charts (Settings > Sensors > graph icon) for growing SQL cache during peak hours - the primary indicator of a database bottleneck. Bottleneck types: CPU (mysqld at 100%), Memory (buffer pool too small), Storage I/O (most common - high iowait, magnetic disks). Solutions: (1) add RAM to the database server and tune innodb_buffer_pool_size to 50-70% of RAM; (2) upgrade storage from HDD to SSD/NVMe (10-50x speedup); (3) schedule alerts/reports outside peak hours; (4) component separation with a dedicated database server. INCORRECT solutions: do NOT reduce PHP memory_limit, do NOT tune PHP-FPM workers, do NOT reduce innodb_buffer_pool_size, do NOT add RAM to the GUI server instead of the database.

'''Keywords:''' emergency recovery, high CPU, system unresponsive, runaway process, kill -9, systemctl, SIP REGISTER flood, pcapcommand, out-of-band management, iDRAC, iLO, IPMI, crash recovery, OOM, out of memory, memory exhaustion, dmesg, max_buffer_mem, database bottleneck, RRD charts, sensor performance, SQL cache, SQL cache files, peak hours, OOM killer, GUI unresponsive, dashboard slow, RAM upgrade, SSD upgrade, NVMe, iowait, innodb_buffer_pool_size, component separation, dedicated database

'''Key Questions:'''
* What to do when VoIPmonitor consumes 3000% CPU or the system becomes unresponsive?
* How to forcefully terminate a runaway VoIPmonitor process?
* What are common causes of CPU spikes in VoIPmonitor?
* How to mitigate SIP REGISTER flood attacks causing high CPU?
* How to diagnose OOM (Out of Memory) events?
* How to fix GUI and CLI frequently inaccessible due to memory exhaustion?
* How to reduce memory usage of MySQL and VoIPmonitor?
* What is max_buffer_mem and how to configure it?
* How to restart the VoIPmonitor service after a crash?
* What to do if the service binary is not found after a crash?
* How to prevent VoIPmonitor from freezing during GUI updates?
* How do I diagnose database bottlenecks in VoIPmonitor?
* What do growing SQL cache files in RRD charts indicate?
* Why is my VoIPmonitor GUI slow during peak hours?
* How to fix the OOM killer terminating PHP processes?
* Should I upgrade RAM on the GUI server or the database server?
* What storage is recommended for the VoIPmonitor database?
* How to access sensor RRD charts in the VoIPmonitor GUI?
* What are incorrect solutions for database bottlenecks?
* How much RAM should innodb_buffer_pool_size be set to?
* When should I consider component separation for VoIPmonitor?
* What tools can help diagnose VoIPmonitor performance issues?