Emergency procedures

From VoIPmonitor.org
Revision as of 15:51, 4 January 2026 by Admin (Create new page for emergency recovery procedures)


This guide covers emergency procedures for recovering your VoIPmonitor system from critical failures, including runaway processes, high CPU usage, and system unresponsiveness.

Emergency: VoIPmonitor Process Consuming Excessive CPU or System Unresponsive

When a VoIPmonitor process consumes excessive CPU (e.g., ~3000% or more) or causes the entire system to become unresponsive, follow these immediate steps:

Immediate Action: Force-Terminate Runaway Process

If the system still responds over SSH, work from there; otherwise connect through out-of-band management (iDRAC, IPMI, console):

1. Identify the Process ID (PID)
# Using htop (if available)
htop

# Or using ps
ps aux | grep voipmonitor

Look for the voipmonitor process consuming the most CPU resources. Note down the PID (process ID number).
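
The lookup in step 1 can be reduced to a single PID. The following is a minimal sketch, assuming a procps-style ps (standard on Linux) and that the process name is exactly voipmonitor; the function name busiest_pid is made up for this example.

```shell
#!/bin/sh
# busiest_pid: read "PID %CPU COMMAND" rows on stdin and print the PID of the
# voipmonitor row with the highest %CPU (prints nothing if there is none).
busiest_pid() {
    awk '$3 == "voipmonitor" && (!seen || $2 + 0 > max) { max = $2 + 0; pid = $1; seen = 1 }
         END { if (seen) print pid }'
}

# Typical use on a live system (the trailing "=" suppresses column headers):
#   ps -eo pid=,pcpu=,comm= | busiest_pid
```

Reading from stdin keeps the selection logic separate from ps, so it can be checked against saved output before it is trusted in an emergency.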

2. Forcefully terminate the process
kill -9 <PID>

Replace <PID> with the actual process ID number identified in step 1.
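
If the host is responsive enough to allow it, a gentler variant of step 2 sends SIGTERM first and falls back to SIGKILL only when the process is still alive after a short grace period. This is a sketch and not part of the documented procedure; the function name stop_pid and the 5-second default are arbitrary choices for the example.

```shell
#!/bin/sh
# stop_pid PID [GRACE_SECONDS]
# Send SIGTERM, wait up to GRACE_SECONDS (default 5) for the process to exit,
# then send SIGKILL if it is still running.
stop_pid() {
    pid=$1
    grace=${2:-5}
    # Fails if there is no such process or we lack permission to signal it.
    kill -TERM "$pid" 2>/dev/null || return 1
    while [ "$grace" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        grace=$((grace - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid"
    fi
}

# Example: stop_pid 12345 10
```

On a host that is already close to unresponsive, going straight to kill -9 as described above is the faster choice.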

3. Verify system recovery
# Check CPU usage has returned to normal
top

# Check if the process was terminated
ps aux | grep voipmonitor

The system should become responsive again immediately after the process is killed. CPU utilization should drop significantly.

Optional: Stop and Restart the Service (for persistent issues)

If the problem persists or the service needs to be cleanly restarted:

# Stop the voipmonitor service
systemctl stop voipmonitor

# Kill any voipmonitor processes that survived the stop
killall voipmonitor

# Restart the service
systemctl start voipmonitor

# Verify service status
systemctl status voipmonitor

Caution: When using systemd service management, avoid using the deprecated service command as it can cause systemd to lose track of the daemon. Always use systemctl commands or direct process commands like killall.

Root Cause Analysis: Why Did the CPU Spike?

After recovering the system, investigate the root cause to prevent recurrence. Common causes include:

SIP REGISTER Flood / Spam Attack

Massive volumes of SIP REGISTER messages from malicious IPs can overwhelm the VoIPmonitor process.

  • Detection: In the VoIPmonitor GUI, open Alerts > Sent Alerts and look for recent SIP REGISTER flood alerts
  • Immediate mitigation: Block attacker IPs at the network edge (SBC, firewall, iptables)
  • Long-term prevention: Configure anti-fraud rules with custom scripts that block attackers automatically; see SIP REGISTER Flood Mitigation
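
To decide which addresses to block, it helps to rank source IPs by the number of REGISTER requests they send. The sketch below assumes a capture file is available and that tshark is installed; the capture filename and the function name top_sources are placeholders, not VoIPmonitor features.

```shell
#!/bin/sh
# top_sources [N]: read one source IP per line on stdin and print the N most
# frequent addresses (default 10) as "COUNT IP", busiest first.
top_sources() {
    sort | uniq -c | sort -k1,1nr -k2,2 | head -n "${1:-10}" | awk '{ print $1, $2 }'
}

# Example with tshark (capture.pcap is a placeholder path):
#   tshark -r capture.pcap -Y 'sip.Method == "REGISTER"' -T fields -e ip.src | top_sources 5
```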

Packet Capture Overload (pcapcommand)

The pcapcommand feature forks a program for every call, which can generate up to 500,000 interrupts per second.

  • Detection: Check /etc/voipmonitor.conf for a pcapcommand line
  • Immediate fix: Comment out or remove the pcapcommand directive and restart the service
  • Alternative: Use the built-in spool cleaning options (maxpoolsize, cleanspool) instead
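
The detection and immediate fix above can be scripted. This is a minimal sketch, assuming the directive appears as an ordinary "pcapcommand = ..." line and that # starts a comment, as in the configuration snippets on this page; the function name disable_pcapcommand is made up here, and the service still needs a restart afterwards.

```shell
#!/bin/sh
# disable_pcapcommand FILE
# Comment out every active "pcapcommand = ..." line in FILE, keeping a backup
# as FILE.bak. Returns 1 (and changes nothing) if no active line is found.
disable_pcapcommand() {
    conf=$1
    grep -Eq '^[[:space:]]*pcapcommand[[:space:]]*=' "$conf" || return 1
    sed -E -i.bak 's/^([[:space:]]*)(pcapcommand[[:space:]]*=)/\1# \2/' "$conf"
}

# Example:
#   disable_pcapcommand /etc/voipmonitor.conf && systemctl restart voipmonitor
```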

Excessive RTP Processing Threads

High concurrent call volumes can overload RTP processing threads.

  • Detection: Check performance logs for high tRTP_CPU values (sum of all RTP threads)
  • Mitigation:
callslimit = 2000  # Limit max concurrent calls

Audio Feature Overhead

Silence detection and audio conversion are CPU-intensive operations.

  • Detection: Check if silencedetect or saveaudio are enabled
  • Mitigation:
  silencedetect = no
  # saveaudio = wav  # Comment out if not needed
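
Before editing the configuration, it can be useful to see which of these options are currently set. The helper below is an illustrative sketch: conf_value is a hypothetical name, and it only understands plain "key = value" lines, treating anything that starts with a comment character as inactive.

```shell
#!/bin/sh
# conf_value KEY FILE
# Print the value of the last active (uncommented) "KEY = value" line in FILE.
# Prints nothing if KEY is not set. Trailing inline comments are not stripped.
conf_value() {
    awk -v key="$1" '
        index($0, "=") {
            k = $0; sub(/=.*/, "", k); gsub(/[[:space:]]/, "", k)
            if (k == key) {
                v = $0
                sub(/^[^=]*=[[:space:]]*/, "", v)
                sub(/[[:space:]]+$/, "", v)
                found = 1
            }
        }
        END { if (found) print v }' "$2"
}

# Example:
#   for k in silencedetect saveaudio; do
#       printf '%s = %s\n' "$k" "$(conf_value "$k" /etc/voipmonitor.conf)"
#   done
```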
  

See Scaling and Performance Tuning for detailed performance optimization strategies.

Preventive Measures

Once the root cause is identified, implement these preventive configurations:

Monitor CPU Trends

Use collectd or your existing monitoring system to track CPU usage over time and receive alerts before critical thresholds are reached.
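
Where no monitoring stack is in place yet, a cron-driven check can serve as a stopgap. The sketch below only illustrates the threshold logic; the function name cpu_over, the 800% limit, and the use of logger are assumptions for the example, not VoIPmonitor or collectd features.

```shell
#!/bin/sh
# cpu_over VALUE LIMIT: succeed if VALUE (a %CPU figure, may be fractional)
# is strictly greater than LIMIT.
cpu_over() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 > l + 0) }'
}

# Example for cron: sum %CPU over all voipmonitor processes and log a warning
# above 800% (eight fully busy cores).
#   total=$(ps -C voipmonitor -o pcpu= | awk '{ s += $1 } END { print s + 0 }')
#   cpu_over "$total" 800 && logger -t voipmonitor-watch "CPU at ${total}%"
```

Note that ps reports %CPU averaged over the lifetime of the process, so a sudden spike shows up more slowly than it does in top.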

Anti-Fraud Auto-Blocking

Configure Anti-Fraud rules with custom scripts to automatically block attacker IPs when a flood is detected. See the Anti-Fraud documentation for PHP script examples using iptables or ipset.
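
The referenced examples are PHP; the same idea in shell might look like the sketch below. It only prints the iptables commands so they can be reviewed before being run, and it accepts IPv4 addresses only. The function names is_ipv4 and block_rules are hypothetical, and how the anti-fraud alert passes attacker IPs to a script is not shown here.

```shell
#!/bin/sh
# is_ipv4 ADDR: succeed if ADDR is a dotted quad with each octet in 0-255.
is_ipv4() {
    printf '%s\n' "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i + 0 > 255) exit 1 }'
}

# block_rules: read one address per line on stdin and print an iptables DROP
# rule for each distinct, well-formed IPv4 address. Nothing is executed.
block_rules() {
    sort -u | while IFS= read -r ip; do
        if is_ipv4 "$ip"; then
            printf 'iptables -I INPUT -s %s -j DROP\n' "$ip"
        fi
    done
}

# Review the output first; to apply it, pipe it to "sh" as root:
#   block_rules < attackers.txt
```

Printing instead of executing is a deliberate choice: a malformed or spoofed address in the input cannot lock out legitimate traffic without someone having looked at the rule first.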

Network Edge Protection

Block SIP REGISTER spam and floods at your network edge (SBC, firewall) before traffic reaches VoIPmonitor. This provides better performance and reduces CPU load on the monitoring system.

Emergency: System Freezes on Every Update Attempt

If the VoIPmonitor sensor becomes unresponsive or hangs each time you attempt to update it through the Web GUI:

1. SSH into the sensor host
2. Run the following commands to forcefully stop and restart the service
killall voipmonitor
systemctl stop voipmonitor
systemctl start voipmonitor

This sequence kills any lingering voipmonitor processes, makes sure systemd registers the service as stopped, and then performs a clean start. Check the sensor status in the GUI to confirm it is responding correctly.

Emergency: Binary Not Found After Crash

If the VoIPmonitor service fails to start after a crash with the error "Binary not found" for /usr/local/sbin/voipmonitor:

1. Check for a renamed binary
ls -l /usr/local/sbin/voipmonitor_*

The crash recovery process may have renamed the binary with an underscore suffix.

2. If found, rename it back
mv /usr/local/sbin/voipmonitor_ /usr/local/sbin/voipmonitor

3. Restart the service
systemctl start voipmonitor
systemctl status voipmonitor

Verify the service starts correctly.
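
Steps 1 and 2 can be combined into one guarded operation that never overwrites an existing binary. This is a sketch under the assumption that the renamed file is called exactly voipmonitor_, as in the mv command above; the function name restore_binary and its optional directory argument are illustrative.

```shell
#!/bin/sh
# restore_binary [DIR]
# If DIR/voipmonitor is missing but DIR/voipmonitor_ exists, rename it back.
# DIR defaults to /usr/local/sbin. Returns 0 if the binary is in place
# afterwards, non-zero otherwise.
restore_binary() {
    dir=${1:-/usr/local/sbin}
    if [ -e "$dir/voipmonitor" ]; then
        return 0                      # binary already present, nothing to do
    fi
    if [ -f "$dir/voipmonitor_" ]; then
        mv "$dir/voipmonitor_" "$dir/voipmonitor"
        return                        # pass on the exit status of mv
    fi
    echo "no voipmonitor or voipmonitor_ in $dir" >&2
    return 1
}

# Example: restore_binary && systemctl start voipmonitor
```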

Out-of-Band Management Scenarios

When the system is completely unresponsive and cannot be accessed via SSH:

  • Use your server's out-of-band management system:
      * Dell iDRAC
      * HP iLO
      * Supermicro IPMI
      * Other vendor-specific BMC/management tools
  • Actions available via out-of-band management:
      * Access the virtual console (KVM-over-IP)
      * Send an NMI (Non-Maskable Interrupt) to trigger a system dump
      * Force a power cycle
      * Monitor hardware health
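
For BMCs that expose IPMI over LAN, several of these actions can also be issued from another machine with ipmitool. The helper below is a sketch that only prints the command line for review; bmc_cmd and its action names are invented for this example, and it assumes IPMI-over-LAN is enabled on the BMC and that the password is supplied through the IPMI_PASSWORD environment variable.

```shell
#!/bin/sh
# bmc_cmd HOST USER ACTION: print the ipmitool command line for ACTION.
# The -E option makes ipmitool read the password from IPMI_PASSWORD.
bmc_cmd() {
    case $3 in
        console) sub="sol activate" ;;        # serial-over-LAN text console, not graphical KVM
        nmi)     sub="chassis power diag" ;;  # pulse a diagnostic interrupt (NMI)
        cycle)   sub="chassis power cycle" ;; # forced power cycle
        health)  sub="sdr list" ;;            # hardware sensor readings
        *) echo "unknown action: $3" >&2; return 1 ;;
    esac
    printf 'ipmitool -I lanplus -H %s -U %s -E %s\n' "$1" "$2" "$sub"
}

# Example (192.0.2.10 and admin are placeholders):
#   bmc_cmd 192.0.2.10 admin nmi
```

The graphical virtual console itself is normally reached through the vendor's web interface instead.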

See Sniffer Troubleshooting for more diagnostic procedures.

AI Summary for RAG

Summary: This article provides emergency procedures for recovering VoIPmonitor from critical failures. It covers steps to force-terminate runaway processes consuming excessive CPU (including kill -9 and systemctl commands), root cause analysis for CPU spikes (SIP REGISTER floods, pcapcommand, RTP threads, audio features), preventive measures (monitoring, anti-fraud auto-blocking, network edge protection), recovery procedures for system freezes during updates and binary issues after crashes, and out-of-band management scenarios.

Keywords: emergency recovery, high CPU, system unresponsive, runaway process, kill process, kill -9, systemctl, SIP REGISTER flood, pcapcommand, performance optimization, out-of-band management, iDRAC, iLO, IPMI, crash recovery

Key Questions:

  • What to do when VoIPmonitor consumes 3000% CPU or system becomes unresponsive?
  • How to forcefully terminate a runaway VoIPmonitor process?
  • What are common causes of CPU spikes in VoIPmonitor?
  • How to mitigate SIP REGISTER flood attacks causing high CPU?
  • How to restart VoIPmonitor service after a crash?
  • What to do if service binary is not found after crash?
  • How to prevent VoIPmonitor from freezing during GUI updates?
  • What tools can help diagnose VoIPmonitor performance issues?