Emergency procedures: Difference between revisions

From VoIPmonitor.org
(Review: použití šablon {{Note}} a {{Warning}} místo prostého textu)
(Rewrite: streamlined structure, improved flowchart, clearer diagnostic steps)
 
Line 1: Line 1:
= Diagnosing Database Bottlenecks Using Sensor RRD Charts =
{{DISPLAYTITLE:Emergency Procedures: GUI Performance Crisis}}
[[Category:Troubleshooting]]
[[Category:Database]]


If your VoIPmonitor GUI becomes unresponsive or PHP processes are being terminated by the OOM (Out of Memory) killer, the root cause may be a '''database performance bottleneck''', not a PHP configuration issue.
= GUI Performance Crisis: Database Bottleneck Diagnosis =


This guide explains how to use the sensor's RRD (Round-Robin Database) charts to identify whether the database server is the limiting factor.
When VoIPmonitor GUI becomes unresponsive or PHP processes are killed by OOM, the root cause is often a '''database bottleneck''', not PHP configuration. This guide shows how to diagnose using sensor RRD charts.


== Symptoms of Database Bottlenecks affecting the GUI ==
{{Note|For general troubleshooting, see [[Database_troubleshooting]], [[GUI_troubleshooting]], or [[Sniffer_troubleshooting]].}}


* GUI becomes extremely slow or unresponsive during peak hours
== Symptoms ==
* PHP processes are killed by the OOM killer on the GUI server
* Dashboard and CDR views take a long time to load
* Alerts and reports fail during high traffic periods
* System appears fine during off-peak hours but degrades during peak usage


{{Note|These symptoms often occur when the GUI server is waiting for database queries to complete, causing PHP processes to pile up and consume excessive memory.}}
* GUI extremely slow or unresponsive during peak hours
 
* PHP processes killed by OOM killer
== Understanding Sensor RRD Charts ==
* Dashboard and CDR views take long to load
 
* Alerts/reports fail during high traffic
The VoIPmonitor sensor generates performance charts (RRD files) that track system metrics over time. These charts are accessible through the GUI and provide visual indicators of where bottlenecks are occurring.
* System fine during off-peak, degrades during peak
 
To access sensor RRD charts:
# Navigate to '''Settings > Sensors''' in the GUI
# Click the graph icon next to the sensor
# Select the time range covering the problematic peak hours


== Diagnostic Flowchart ==
== Diagnostic Flowchart ==


<kroki lang="mermaid">
<kroki lang="mermaid">
%%{init: {'flowchart': {'nodeSpacing': 15, 'rankSpacing': 35}}}%%
flowchart TD
flowchart TD
     A[GUI Unresponsive / OOM Errors] --> B{Check Sensor RRD Charts}
     A[GUI Slow / OOM Errors] --> B{Check RRD Charts<br/>Settings → Sensors → 📊}
     B --> C{SQL Cache Growing<br/>During Peak Hours?}
     B --> C{SQL Cache growing<br/>during peak?}
     C -->|No| D[Issue is NOT<br/>database bottleneck]
     C -->|No| D[NOT database bottleneck<br/>Check PHP/Apache config]
     C -->|Yes| E[Database Bottleneck<br/>Confirmed]
     C -->|Yes| E[Database Bottleneck<br/>Confirmed]
     E --> F{Identify<br/>Bottleneck Type}
     E --> F{mysqld CPU ~100%?}
    F --> G{mysqld CPU<br/>near 100%?}
     F -->|Yes| G[CPU Bottleneck<br/>→ Upgrade CPU]
     G -->|Yes| H[CPU Bottleneck]
     F -->|No| H{High iowait?<br/>HDD storage?}
    H --> I[Add CPU cores<br/>or upgrade CPU]
     H -->|Yes| I[I/O Bottleneck<br/>→ Upgrade to SSD/NVMe]
     G -->|No| J{Buffer pool full?<br/>Swap usage?}
     H -->|No| J[Memory Bottleneck<br/>→ Add RAM + tune buffer_pool]
     J -->|Yes| K[Memory Bottleneck]
    K --> L[Add RAM<br/>Tune innodb_buffer_pool_size]
     J -->|No| M{High iowait?<br/>Magnetic disks?}
    M -->|Yes| N[Storage I/O Bottleneck]
    N --> O[Upgrade to SSD/NVMe]


     style A fill:#f9f,stroke:#333
     style A fill:#f9f,stroke:#333
     style E fill:#ff9,stroke:#333
     style E fill:#ff9,stroke:#333
    style H fill:#f96,stroke:#333
    style K fill:#f96,stroke:#333
    style N fill:#f96,stroke:#333
</kroki>
</kroki>


== Diagnostic Step 1: Look for Growing SQL Cache ==
== Step 1: Access RRD Charts ==


The most critical indicator of a database bottleneck is '''growing SQL cache''' or '''SQL cache files''' during peak hours.
# Navigate to '''Settings → Sensors'''
# Click the '''graph icon''' (📊) next to the sensor
# Select time range covering problematic peak hours
 
== Step 2: Identify Growing SQL Cache ==
 
The key indicator is '''SQL Cache''' or '''SQL Cache Files''' growing during peak hours:


{| class="wikitable"
{| class="wikitable"
Line 58: Line 50:
! Metric !! What to Look For !! Indicates
! Metric !! What to Look For !! Indicates
|-
|-
| '''SQL Cache''' || Consistently increasing during peak hours, never decreasing || Database cannot keep up with insert rate
| '''SQL Cache''' || Consistently increasing, never decreasing || DB cannot keep up with inserts
|-
|-
| '''SQL Cache Files''' || Growing over time during peak usage || Database buffer pool too small or storage too slow
| '''SQL Cache Files''' || Growing over time || Buffer pool too small or storage too slow
|-
|-
| '''CPU Load (mysqld)''' || Near 100% during peak hours || CPU bottleneck on database server
| '''mysqld CPU''' || Near 100% || CPU bottleneck
|-
|-
| '''Disk I/O (mysql)''' || High or saturated during peak hours || Storage bottleneck (magnetic disks instead of SSDs)
| '''Disk I/O (mysql)''' || High/saturated || Storage bottleneck (HDD vs SSD)
|}
|}


If you see SQL cache or SQL cache files growing consistently during peak traffic periods, the database server is the bottleneck.
{{Warning|1=If SQL Cache is NOT growing, the problem is likely NOT the database. Check PHP/Apache configuration instead.}}


== Diagnostic Step 2: Determine the Bottleneck Type ==
== Step 3: Identify Bottleneck Type ==
 
After identifying that the database is the issue, determine which resource is the limiting factor:


=== CPU Bottleneck ===
=== CPU Bottleneck ===
* Check database CPU usage during peak hours
* <code>mysqld</code> at or near 100% CPU
* If mysqld is at or near 100% CPU, you need more CPU cores or faster CPUs
* '''Solution:''' More CPU cores or faster CPU


=== Memory Bottleneck ===
=== Memory Bottleneck ===
* Check if SQL cache grows because buffer pool is too small
* SQL cache fills up and stays full
* Database runs out of RAM for caching, forcing disk reads
* Buffer pool too small for dataset
* The SQL cache chart shows a pattern of filling up and staying full
* '''Solution:''' Add RAM, tune <code>innodb_buffer_pool_size</code>


=== Storage I/O Bottleneck (Most Common) ===
=== Storage I/O Bottleneck (Most Common) ===
* High disk I/O wait times for mysqld process
* High <code>iowait</code> during peak hours
* Disk latency (iowait) increases during peak hours
* Database on magnetic disks (HDD)
* Database storage on magnetic disks (e.g., 10K SAS) instead of SSD/NVMe
* '''Solution:''' Upgrade to SSD/NVMe (10-50x improvement)
* SQL cache grows because data cannot be written/read fast enough


== Solutions for Database Performance Bottlenecks ==
== Solutions ==


=== Solution 1: Add More RAM to the Database Server ===
=== Add RAM to Database Server ===


This is often the most effective fix for memory-related bottlenecks.
<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf
# Set to 50-70% of total RAM on dedicated DB server
innodb_buffer_pool_size = 64G
</syntaxhighlight>


{| class="wikitable"
{| class="wikitable"
|-
|-
! Current RAM !! Recommended Upgrade !! Expected Impact
! Current RAM !! Recommended !! <code>innodb_buffer_pool_size</code>
|-
| 32GB || 64GB or 128GB || Significantly reduces cache growth
|-
|-
| 64GB || 128GB or 256GB || Handles much higher peak loads
| 32GB || 64-128GB || 32-64G
|-
|-
| 128GB || 256GB || Suitable for large deployments
| 64GB || 128-256GB || 64-128G
|}
|}


After adding RAM, tune <code>innodb_buffer_pool_size</code> in your MySQL configuration:
=== Upgrade to SSD/NVMe ===
 
<syntaxhighlight lang="ini">
# /etc/mysql/my.cnf
# Set to 50-70% of total RAM on dedicated database server
innodb_buffer_pool_size = 128G
</syntaxhighlight>
 
For more tuning guidance, see [[Scaling#Optimizing_Database_Performance_.28MySQL.2FMariaDB.29|Scaling - Database Performance]].
 
{{Warning|1=Do NOT reduce <code>innodb_buffer_pool_size</code> on the GUI server when the database is the bottleneck. This will make the problem worse.}}
 
=== Solution 2: Upgrade Database Storage to SSD/NVMe ===
 
If your database storage is on magnetic disks (e.g., 10K SATA or SAS), upgrading to SSDs is often the single most effective improvement.


{| class="wikitable"
{| class="wikitable"
|-
|-
! Current Storage !! Recommended Upgrade !! Expected Speedup
! Current !! Upgrade To !! Expected Speedup
|-
|-
| 10K RPM SATA HDD || NVMe SSD array || 10-50x faster I/O
| 10K RPM HDD || NVMe SSD || 10-50x
|-
|-
| 10K RPM SAS HDD || Enterprise SSD (SAS/SATA) || 5-20x faster I/O
| SAS HDD || Enterprise SSD || 5-20x
|-
|-
| Older SSD || Modern NVMe (PCIe 4.0+) || 2-5x faster I/O
| Older SSD || NVMe (PCIe 4.0+) || 2-5x
|}
|}


For high-traffic deployments, '''NVMe storage is recommended for the database host'''.
=== Temporary Mitigation ===


See [[Hardware#Database_Storage|Hardware - Storage Selection]] for detailed recommendations.
If immediate hardware upgrade not possible:
* '''Alerts:''' Reduce frequency or schedule during off-peak (2am-4am)
* '''Reports:''' Schedule for off-peak hours
* '''Dashboards:''' Simplify queries, avoid "All time" ranges


=== Solution 3: Temporary Mitigation - Schedule Alerts/Reports Outside Peak Hours ===
=== Component Separation ===


If you cannot immediately upgrade the database server hardware, temporarily reduce the load by scheduling intensive tasks during off-peak hours.
For persistent issues, consider dedicated database server:
 
* '''Host 1:''' Database (max RAM + SSD/NVMe)
'''1. Disable or reduce alert frequency''' during peak hours:
* Navigate to '''GUI > Alerts'''
* Temporarily disable high-frequency alerts
* Set alerts to run during off-peak periods (e.g., 2am-4am)
 
'''2. Schedule reports outside peak usage:'''
* Navigate to '''GUI > Reports'''
* Configure scheduled reports for off-peak hours
* Avoid generating reports during the busiest part of the day
 
'''3. Reduce dashboard complexity''' during peak hours:
* Simplify dashboards that query large ranges of data
* Avoid "All time" statistics during peak loads
* Use cached dashboards or static displays when possible
 
=== Solution 4: Consider Component Separation ===
 
If the database server is a bottleneck and upgrading is not feasible, consider moving to a dedicated database architecture.
 
In a component separation deployment (see [[Scaling#Scaling_Through_Component_Separation|Scaling - Component Separation]]):
* '''Host 1:''' Dedicated database server with maximum RAM and SSD/NVMe storage
* '''Host 2:''' GUI web server
* '''Host 2:''' GUI web server
* '''Host 3:''' Sensor(s)
* '''Host 3:''' Sensor(s)


This allows you to independently scale the database with more powerful hardware without affecting the GUI.
See [[Scaling#Scaling_Through_Component_Separation|Scaling - Component Separation]].
 
== Common Mistakes ==


== Common Pitfalls to Avoid ==
{{Warning|1=These do NOT fix database bottlenecks:}}


{| class="wikitable" style="background:#fff3cd; border:1px solid #ffc107;"
{| class="wikitable"
|-
|-
! colspan="2" style="background:#ffc107;" | Incorrect Solutions When Database is the Bottleneck
! Wrong Action !! Why It Fails
|-
|-
| style="vertical-align: top;" | '''Reducing PHP memory_limit'''
| Reducing PHP <code>memory_limit</code> || PHP waits for DB; less memory = earlier crashes
| This does NOT fix the root cause. PHP waits for the database; less memory means processes crash sooner.
|-
|-
| style="vertical-align: top;" | '''Tuning PHP-FPM worker counts'''
| Adding more PHP-FPM workers || More workers pile up waiting for slow DB
| More workers will pile up waiting for slow database queries, consuming even more memory.
|-
|-
| style="vertical-align: top;" | '''Reducing innodb_buffer_pool_size'''
| Reducing <code>innodb_buffer_pool_size</code> || Makes DB slower, increases disk I/O
| This makes the database slower, not faster. It causes more disk I/O and longer query times.
|-
|-
| style="vertical-align: top;" | '''Adding RAM to the GUI server'''
| Adding RAM to GUI server || Bottleneck is DB server, not GUI
| If the bottleneck is the database, adding RAM to the GUI won't help. The database is the limiting factor.
|}
|}


== Verification Checklist ==
== Verification ==
 
After implementing a database upgrade to fix the bottleneck:


1. Monitor SQL cache charts during the next peak traffic period
After implementing fix:
2. Check that SQL cache does not grow uncontrollably
# Monitor SQL cache during next peak period
3. Verify GUI responsiveness during peak hours
# Verify SQL cache does NOT grow uncontrollably
4. Confirm no OOM killer events
# Confirm GUI responsiveness
5. Check database query latency (slow queries should be minimal)
# Check for OOM killer events in system logs


== Related Documentation ==
== See Also ==


* [[Scaling]] - General performance tuning and scaling guide
* [[Database_troubleshooting]] - SQL queue issues, CDR delays
* [[Scaling#Optimizing_Database_Performance_.28MySQL.2FMariaDB.29|Optimizing Database Performance]] - MySQL tuning parameters
* [[Scaling]] - Performance tuning and database optimization
* [[Hardware]] - Hardware sizing recommendations for different deployment sizes
* [[GUI_troubleshooting]] - HTTP 500, login issues, debug mode
* [[Scaling#Scaling_Through_Component_Separation|Component Separation]] - Dedicated database architecture


== AI Summary for RAG ==
== AI Summary for RAG ==


'''Summary:''' Guide for diagnosing database bottlenecks affecting VoIPmonitor GUI using sensor RRD charts. Symptoms: GUI unresponsive during peak hours, OOM killer terminating PHP processes, slow dashboard/CDR views. KEY DIAGNOSTIC: Check sensor RRD charts (Settings > Sensors > graph icon) for growing SQL cache during peak hours - primary indicator of database bottleneck. Bottleneck types: CPU (mysqld at 100%), Memory (buffer pool too small), Storage I/O (most common - high iowait, magnetic disks). Solutions: (1) Add RAM to database server and tune innodb_buffer_pool_size to 50-70% of RAM; (2) Upgrade storage from HDD to SSD/NVMe (10-50x speedup); (3) Schedule alerts/reports outside peak hours; (4) Component separation with dedicated database server. INCORRECT solutions: Do NOT reduce PHP memory_limit, do NOT tune PHP-FPM workers, do NOT reduce innodb_buffer_pool_size, do NOT add RAM to GUI server instead of database.
'''Summary:''' Emergency guide for diagnosing database bottlenecks affecting VoIPmonitor GUI using sensor RRD charts. Symptoms: GUI unresponsive during peak hours, OOM killer terminating PHP, slow dashboards. KEY DIAGNOSTIC: Access RRD charts (Settings Sensors graph icon), look for growing SQL cache during peak hours - primary indicator of DB bottleneck. Bottleneck types: CPU (mysqld at 100%), Memory (buffer pool full), Storage I/O (most common - high iowait, HDD storage). Solutions: (1) Add RAM and set innodb_buffer_pool_size to 50-70% of RAM; (2) Upgrade HDD to SSD/NVMe (10-50x speedup); (3) Schedule alerts/reports off-peak; (4) Component separation with dedicated DB server. WRONG solutions: Do NOT reduce PHP memory_limit, do NOT add PHP-FPM workers, do NOT reduce innodb_buffer_pool_size, do NOT add RAM to GUI server.


'''Keywords:''' database bottleneck, RRD charts, sensor performance, SQL cache, SQL cache files, peak hours, OOM killer, GUI unresponsive, dashboard slow, RAM upgrade, SSD upgrade, NVMe, iowait, innodb_buffer_pool_size, component separation, dedicated database
'''Keywords:''' database bottleneck, RRD charts, SQL cache, peak hours, OOM killer, GUI slow, GUI unresponsive, innodb_buffer_pool_size, SSD upgrade, NVMe, iowait, component separation, emergency procedures, performance crisis


'''Key Questions:'''
'''Key Questions:'''
* How do I diagnose database bottlenecks in VoIPmonitor?
* How to diagnose database bottlenecks in VoIPmonitor?
* What do growing SQL cache files in RRD charts indicate?
* What does growing SQL cache in RRD charts indicate?
* Why is my VoIPmonitor GUI slow during peak hours?
* Why is VoIPmonitor GUI slow during peak hours?
* How to fix OOM killer terminating PHP processes?
* How to fix OOM killer terminating PHP processes?
* Should I upgrade RAM on GUI server or database server?
* Should I upgrade RAM on GUI or database server?
* What storage is recommended for VoIPmonitor database?
* What storage is recommended for VoIPmonitor database?
* How to access sensor RRD charts in VoIPmonitor GUI?
* How to access sensor RRD charts?
* What are incorrect solutions for database bottlenecks?
* What are wrong solutions for database bottlenecks?
* How much RAM should innodb_buffer_pool_size be set to?
* When should I consider component separation for VoIPmonitor?

Latest revision as of 16:48, 8 January 2026


GUI Performance Crisis: Database Bottleneck Diagnosis

When VoIPmonitor GUI becomes unresponsive or PHP processes are killed by OOM, the root cause is often a database bottleneck, not PHP configuration. This guide shows how to diagnose using sensor RRD charts.

ℹ️ Note: For general troubleshooting, see Database_troubleshooting, GUI_troubleshooting, or Sniffer_troubleshooting.

Symptoms

  • GUI extremely slow or unresponsive during peak hours
  • PHP processes killed by OOM killer
  • Dashboard and CDR views take long to load
  • Alerts/reports fail during high traffic
  • System fine during off-peak, degrades during peak

Diagnostic Flowchart

Step 1: Access RRD Charts

  1. Navigate to Settings → Sensors
  2. Click the graph icon (📊) next to the sensor
  3. Select time range covering problematic peak hours

Step 2: Identify Growing SQL Cache

The key indicator is SQL Cache or SQL Cache Files growing during peak hours:

Metric What to Look For Indicates
SQL Cache Consistently increasing, never decreasing DB cannot keep up with inserts
SQL Cache Files Growing over time Buffer pool too small or storage too slow
mysqld CPU Near 100% CPU bottleneck
Disk I/O (mysql) High/saturated Storage bottleneck (HDD vs SSD)

⚠️ Warning: If SQL Cache is NOT growing, the problem is likely NOT the database. Check PHP/Apache configuration instead.

Step 3: Identify Bottleneck Type

CPU Bottleneck

  • mysqld at or near 100% CPU
  • Solution: More CPU cores or faster CPU

Memory Bottleneck

  • SQL cache fills up and stays full
  • Buffer pool too small for dataset
  • Solution: Add RAM, tune innodb_buffer_pool_size

Storage I/O Bottleneck (Most Common)

  • High iowait during peak hours
  • Database on magnetic disks (HDD)
  • Solution: Upgrade to SSD/NVMe (10-50x improvement)

Solutions

Add RAM to Database Server

# /etc/mysql/my.cnf
# Set to 50-70% of total RAM on dedicated DB server
innodb_buffer_pool_size = 64G
Current RAM Recommended innodb_buffer_pool_size
32GB 64-128GB 32-64G
64GB 128-256GB 64-128G

Upgrade to SSD/NVMe

Current Upgrade To Expected Speedup
10K RPM HDD NVMe SSD 10-50x
SAS HDD Enterprise SSD 5-20x
Older SSD NVMe (PCIe 4.0+) 2-5x

Temporary Mitigation

If immediate hardware upgrade not possible:

  • Alerts: Reduce frequency or schedule during off-peak (2am-4am)
  • Reports: Schedule for off-peak hours
  • Dashboards: Simplify queries, avoid "All time" ranges

Component Separation

For persistent issues, consider dedicated database server:

  • Host 1: Database (max RAM + SSD/NVMe)
  • Host 2: GUI web server
  • Host 3: Sensor(s)

See Scaling - Component Separation.

Common Mistakes

⚠️ Warning: These do NOT fix database bottlenecks:

Wrong Action Why It Fails
Reducing PHP memory_limit PHP waits for DB; less memory = earlier crashes
Adding more PHP-FPM workers More workers pile up waiting for slow DB
Reducing innodb_buffer_pool_size Makes DB slower, increases disk I/O
Adding RAM to GUI server Bottleneck is DB server, not GUI

Verification

After implementing fix:

  1. Monitor SQL cache during next peak period
  2. Verify SQL cache does NOT grow uncontrollably
  3. Confirm GUI responsiveness
  4. Check for OOM killer events in system logs

See Also

AI Summary for RAG

Summary: Emergency guide for diagnosing database bottlenecks affecting VoIPmonitor GUI using sensor RRD charts. Symptoms: GUI unresponsive during peak hours, OOM killer terminating PHP, slow dashboards. KEY DIAGNOSTIC: Access RRD charts (Settings → Sensors → graph icon), look for growing SQL cache during peak hours - primary indicator of DB bottleneck. Bottleneck types: CPU (mysqld at 100%), Memory (buffer pool full), Storage I/O (most common - high iowait, HDD storage). Solutions: (1) Add RAM and set innodb_buffer_pool_size to 50-70% of RAM; (2) Upgrade HDD to SSD/NVMe (10-50x speedup); (3) Schedule alerts/reports off-peak; (4) Component separation with dedicated DB server. WRONG solutions: Do NOT reduce PHP memory_limit, do NOT add PHP-FPM workers, do NOT reduce innodb_buffer_pool_size, do NOT add RAM to GUI server.

Keywords: database bottleneck, RRD charts, SQL cache, peak hours, OOM killer, GUI slow, GUI unresponsive, innodb_buffer_pool_size, SSD upgrade, NVMe, iowait, component separation, emergency procedures, performance crisis

Key Questions:

  • How to diagnose database bottlenecks in VoIPmonitor?
  • What does growing SQL cache in RRD charts indicate?
  • Why is VoIPmonitor GUI slow during peak hours?
  • How to fix OOM killer terminating PHP processes?
  • Should I upgrade RAM on GUI or database server?
  • What storage is recommended for VoIPmonitor database?
  • How to access sensor RRD charts?
  • What are wrong solutions for database bottlenecks?