Emergency procedures: Difference between revisions
(Create new page for emergency recovery procedures) |
(Rewrite: streamlined structure, improved flowchart, clearer diagnostic steps) |
||
| (12 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
{{DISPLAYTITLE:Emergency Procedures | {{DISPLAYTITLE:Emergency Procedures: GUI Performance Crisis}} | ||
[[Category:Troubleshooting]] | |||
[[Category:Database]] | |||
= GUI Performance Crisis: Database Bottleneck Diagnosis = | |||
When VoIPmonitor GUI becomes unresponsive or PHP processes are killed by OOM, the root cause is often a '''database bottleneck''', not PHP configuration. This guide shows how to diagnose using sensor RRD charts. | |||
{{Note|For general troubleshooting, see [[Database_troubleshooting]], [[GUI_troubleshooting]], or [[Sniffer_troubleshooting]].}} | |||
== | == Symptoms == | ||
* GUI extremely slow or unresponsive during peak hours | |||
* PHP processes killed by OOM killer | |||
* Dashboard and CDR views take long to load | |||
* Alerts/reports fail during high traffic | |||
* System fine during off-peak, degrades during peak | |||
== Diagnostic Flowchart == | |||
<kroki lang="mermaid"> | |||
%%{init: {'flowchart': {'nodeSpacing': 15, 'rankSpacing': 35}}}%% | |||
</ | flowchart TD | ||
A[GUI Slow / OOM Errors] --> B{Check RRD Charts<br/>Settings → Sensors → 📊} | |||
B --> C{SQL Cache growing<br/>during peak?} | |||
C -->|No| D[NOT database bottleneck<br/>Check PHP/Apache config] | |||
C -->|Yes| E[Database Bottleneck<br/>Confirmed] | |||
E --> F{mysqld CPU ~100%?} | |||
F -->|Yes| G[CPU Bottleneck<br/>→ Upgrade CPU] | |||
F -->|No| H{High iowait?<br/>HDD storage?} | |||
H -->|Yes| I[I/O Bottleneck<br/>→ Upgrade to SSD/NVMe] | |||
H -->|No| J[Memory Bottleneck<br/>→ Add RAM + tune buffer_pool] | |||
style A fill:#f9f,stroke:#333 | |||
style E fill:#ff9,stroke:#333 | |||
</kroki> | |||
== Step 1: Access RRD Charts == | |||
# Navigate to '''Settings → Sensors''' | |||
# Click the '''graph icon''' (📊) next to the sensor | |||
# Select time range covering problematic peak hours | |||
== Step 2: Identify Growing SQL Cache == | |||
The key indicator is '''SQL Cache''' or '''SQL Cache Files''' growing during peak hours: | |||
{| class="wikitable" | |||
|- | |||
! Metric !! What to Look For !! Indicates | |||
|- | |||
| '''SQL Cache''' || Consistently increasing, never decreasing || DB cannot keep up with inserts | |||
|- | |||
| '''SQL Cache Files''' || Growing over time || Buffer pool too small or storage too slow | |||
|- | |||
| '''mysqld CPU''' || Near 100% || CPU bottleneck | |||
|- | |||
| '''Disk I/O (mysql)''' || High/saturated || Storage bottleneck (HDD vs SSD) | |||
|} | |||
= | {{Warning|1=If SQL Cache is NOT growing, the problem is likely NOT the database. Check PHP/Apache configuration instead.}} | ||
== Step 3: Identify Bottleneck Type == | |||
< | === CPU Bottleneck === | ||
* <code>mysqld</code> at or near 100% CPU | |||
* '''Solution:''' More CPU cores or faster CPU | |||
=== Memory Bottleneck === | |||
* SQL cache fills up and stays full | |||
* Buffer pool too small for dataset | |||
* '''Solution:''' Add RAM, tune <code>innodb_buffer_pool_size</code> | |||
=== Storage I/O Bottleneck (Most Common) === | |||
* High <code>iowait</code> during peak hours | |||
* Database on magnetic disks (HDD) | |||
* '''Solution:''' Upgrade to SSD/NVMe (10-50x improvement) | |||
== Solutions == | |||
=== Add RAM to Database Server === | |||
= | <syntaxhighlight lang="ini"> | ||
# /etc/mysql/my.cnf | |||
# Set to 50-70% of total RAM on dedicated DB server | |||
innodb_buffer_pool_size = 64G | |||
</syntaxhighlight> | </syntaxhighlight> | ||
{| class="wikitable" | |||
|- | |||
! Current RAM !! Recommended !! <code>innodb_buffer_pool_size</code> | |||
|- | |||
| 32GB || 64-128GB || 32-64G | |||
|- | |||
| 64GB || 128-256GB || 64-128G | |||
|} | |||
== | === Upgrade to SSD/NVMe === | ||
{| class="wikitable" | |||
|- | |||
! Current !! Upgrade To !! Expected Speedup | |||
|- | |||
| 10K RPM HDD || NVMe SSD || 10-50x | |||
|- | |||
| SAS HDD || Enterprise SSD || 5-20x | |||
|- | |||
| Older SSD || NVMe (PCIe 4.0+) || 2-5x | |||
|} | |||
=== Temporary Mitigation === | |||
If immediate hardware upgrade not possible: | |||
* '''Alerts:''' Reduce frequency or schedule during off-peak (2am-4am) | |||
* '''Reports:''' Schedule for off-peak hours | |||
* '''Dashboards:''' Simplify queries, avoid "All time" ranges | |||
=== Component Separation === | |||
For persistent issues, consider dedicated database server: | |||
* '''Host 1:''' Database (max RAM + SSD/NVMe) | |||
* '''Host 2:''' GUI web server | |||
* '''Host 3:''' Sensor(s) | |||
See [[Scaling#Scaling_Through_Component_Separation|Scaling - Component Separation]]. | |||
== | == Common Mistakes == | ||
{{Warning|1=These do NOT fix database bottlenecks:}} | |||
{| class="wikitable" | |||
|- | |||
! Wrong Action !! Why It Fails | |||
|- | |||
| Reducing PHP <code>memory_limit</code> || PHP waits for DB; less memory = earlier crashes | |||
|- | |||
| Adding more PHP-FPM workers || More workers pile up waiting for slow DB | |||
|- | |||
| Reducing <code>innodb_buffer_pool_size</code> || Makes DB slower, increases disk I/O | |||
|- | |||
| Adding RAM to GUI server || Bottleneck is DB server, not GUI | |||
|} | |||
== Verification == | |||
After implementing fix: | |||
# Monitor SQL cache during next peak period | |||
# Verify SQL cache does NOT grow uncontrollably | |||
# Confirm GUI responsiveness | |||
# Check for OOM killer events in system logs | |||
== | == See Also == | ||
* [[ | * [[Database_troubleshooting]] - SQL queue issues, CDR delays | ||
* [[ | * [[Scaling]] - Performance tuning and database optimization | ||
* [[GUI_troubleshooting]] - HTTP 500, login issues, debug mode | |||
* [[ | |||
== AI Summary for RAG == | == AI Summary for RAG == | ||
'''Summary:''' | '''Summary:''' Emergency guide for diagnosing database bottlenecks affecting VoIPmonitor GUI using sensor RRD charts. Symptoms: GUI unresponsive during peak hours, OOM killer terminating PHP, slow dashboards. KEY DIAGNOSTIC: Access RRD charts (Settings → Sensors → graph icon), look for growing SQL cache during peak hours - primary indicator of DB bottleneck. Bottleneck types: CPU (mysqld at 100%), Memory (buffer pool full), Storage I/O (most common - high iowait, HDD storage). Solutions: (1) Add RAM and set innodb_buffer_pool_size to 50-70% of RAM; (2) Upgrade HDD to SSD/NVMe (10-50x speedup); (3) Schedule alerts/reports off-peak; (4) Component separation with dedicated DB server. WRONG solutions: Do NOT reduce PHP memory_limit, do NOT add PHP-FPM workers, do NOT reduce innodb_buffer_pool_size, do NOT add RAM to GUI server. | ||
'''Keywords:''' | '''Keywords:''' database bottleneck, RRD charts, SQL cache, peak hours, OOM killer, GUI slow, GUI unresponsive, innodb_buffer_pool_size, SSD upgrade, NVMe, iowait, component separation, emergency procedures, performance crisis | ||
'''Key Questions:''' | '''Key Questions:''' | ||
* | * How to diagnose database bottlenecks in VoIPmonitor? | ||
* | * What does growing SQL cache in RRD charts indicate? | ||
* | * Why is VoIPmonitor GUI slow during peak hours? | ||
* How to | * How to fix OOM killer terminating PHP processes? | ||
* | * Should I upgrade RAM on GUI or database server? | ||
* What | * What storage is recommended for VoIPmonitor database? | ||
* How to | * How to access sensor RRD charts? | ||
* What | * What are wrong solutions for database bottlenecks? | ||
Latest revision as of 16:48, 8 January 2026
GUI Performance Crisis: Database Bottleneck Diagnosis
When VoIPmonitor GUI becomes unresponsive or PHP processes are killed by OOM, the root cause is often a database bottleneck, not PHP configuration. This guide shows how to diagnose using sensor RRD charts.
ℹ️ Note: For general troubleshooting, see Database_troubleshooting, GUI_troubleshooting, or Sniffer_troubleshooting.
Symptoms
- GUI extremely slow or unresponsive during peak hours
- PHP processes killed by OOM killer
- Dashboard and CDR views take long to load
- Alerts/reports fail during high traffic
- System fine during off-peak, degrades during peak
Diagnostic Flowchart
Step 1: Access RRD Charts
- Navigate to Settings → Sensors
- Click the graph icon (📊) next to the sensor
- Select time range covering problematic peak hours
Step 2: Identify Growing SQL Cache
The key indicator is SQL Cache or SQL Cache Files growing during peak hours:
| Metric | What to Look For | Indicates |
|---|---|---|
| SQL Cache | Consistently increasing, never decreasing | DB cannot keep up with inserts |
| SQL Cache Files | Growing over time | Buffer pool too small or storage too slow |
| mysqld CPU | Near 100% | CPU bottleneck |
| Disk I/O (mysql) | High/saturated | Storage bottleneck (HDD vs SSD) |
⚠️ Warning: If SQL Cache is NOT growing, the problem is likely NOT the database. Check PHP/Apache configuration instead.
Step 3: Identify Bottleneck Type
CPU Bottleneck
mysqldat or near 100% CPU- Solution: More CPU cores or faster CPU
Memory Bottleneck
- SQL cache fills up and stays full
- Buffer pool too small for dataset
- Solution: Add RAM, tune
innodb_buffer_pool_size
Storage I/O Bottleneck (Most Common)
- High
iowaitduring peak hours - Database on magnetic disks (HDD)
- Solution: Upgrade to SSD/NVMe (10-50x improvement)
Solutions
Add RAM to Database Server
# /etc/mysql/my.cnf
# Set to 50-70% of total RAM on dedicated DB server
innodb_buffer_pool_size = 64G
| Current RAM | Recommended | innodb_buffer_pool_size
|
|---|---|---|
| 32GB | 64-128GB | 32-64G |
| 64GB | 128-256GB | 64-128G |
Upgrade to SSD/NVMe
| Current | Upgrade To | Expected Speedup |
|---|---|---|
| 10K RPM HDD | NVMe SSD | 10-50x |
| SAS HDD | Enterprise SSD | 5-20x |
| Older SSD | NVMe (PCIe 4.0+) | 2-5x |
Temporary Mitigation
If immediate hardware upgrade not possible:
- Alerts: Reduce frequency or schedule during off-peak (2am-4am)
- Reports: Schedule for off-peak hours
- Dashboards: Simplify queries, avoid "All time" ranges
Component Separation
For persistent issues, consider dedicated database server:
- Host 1: Database (max RAM + SSD/NVMe)
- Host 2: GUI web server
- Host 3: Sensor(s)
See Scaling - Component Separation.
Common Mistakes
⚠️ Warning: These do NOT fix database bottlenecks:
| Wrong Action | Why It Fails |
|---|---|
Reducing PHP memory_limit |
PHP waits for DB; less memory = earlier crashes |
| Adding more PHP-FPM workers | More workers pile up waiting for slow DB |
Reducing innodb_buffer_pool_size |
Makes DB slower, increases disk I/O |
| Adding RAM to GUI server | Bottleneck is DB server, not GUI |
Verification
After implementing fix:
- Monitor SQL cache during next peak period
- Verify SQL cache does NOT grow uncontrollably
- Confirm GUI responsiveness
- Check for OOM killer events in system logs
See Also
- Database_troubleshooting - SQL queue issues, CDR delays
- Scaling - Performance tuning and database optimization
- GUI_troubleshooting - HTTP 500, login issues, debug mode
AI Summary for RAG
Summary: Emergency guide for diagnosing database bottlenecks affecting VoIPmonitor GUI using sensor RRD charts. Symptoms: GUI unresponsive during peak hours, OOM killer terminating PHP, slow dashboards. KEY DIAGNOSTIC: Access RRD charts (Settings → Sensors → graph icon), look for growing SQL cache during peak hours - primary indicator of DB bottleneck. Bottleneck types: CPU (mysqld at 100%), Memory (buffer pool full), Storage I/O (most common - high iowait, HDD storage). Solutions: (1) Add RAM and set innodb_buffer_pool_size to 50-70% of RAM; (2) Upgrade HDD to SSD/NVMe (10-50x speedup); (3) Schedule alerts/reports off-peak; (4) Component separation with dedicated DB server. WRONG solutions: Do NOT reduce PHP memory_limit, do NOT add PHP-FPM workers, do NOT reduce innodb_buffer_pool_size, do NOT add RAM to GUI server.
Keywords: database bottleneck, RRD charts, SQL cache, peak hours, OOM killer, GUI slow, GUI unresponsive, innodb_buffer_pool_size, SSD upgrade, NVMe, iowait, component separation, emergency procedures, performance crisis
Key Questions:
- How to diagnose database bottlenecks in VoIPmonitor?
- What does growing SQL cache in RRD charts indicate?
- Why is VoIPmonitor GUI slow during peak hours?
- How to fix OOM killer terminating PHP processes?
- Should I upgrade RAM on GUI or database server?
- What storage is recommended for VoIPmonitor database?
- How to access sensor RRD charts?
- What are wrong solutions for database bottlenecks?