High availability redundancy failover

From VoIPmonitor.org
Revision as of 21:39, 30 June 2025 by Festr (talk | contribs)


This is an expert-level guide for creating a high-availability (HA) VoIPmonitor cluster using an Active-Passive failover model. In this architecture, if your primary sensor server fails, a secondary server automatically takes over its responsibilities within a few seconds.

Overview: How Active-Passive Failover Works

This setup consists of two identical VoIPmonitor servers that share a single "virtual" IP address.

  • Two Identical Nodes: Both servers (Node 1 and Node 2) receive the exact same mirrored network traffic from your switch (or other source). Both run the VoIPmonitor sensor.
  • One Active Node: At any given time, only one server is the Active (or Master) node. This node holds the Virtual IP (VIP) and is the only one actively writing Call Detail Records (CDRs) to the database.
  • One Passive Node: The other server is Passive (or Backup). It is sniffing traffic and writing PCAP files to its local disk, but its CDR writing is disabled to prevent database conflicts.
  • Keepalived & The Heartbeat: A small, efficient service called `keepalived` runs on both nodes. The Active node constantly sends "I'm alive" heartbeat messages to the Passive node.
  • Automatic Failover: If the Passive node stops receiving heartbeats, it assumes the Active node has failed. It immediately takes over the Virtual IP and runs a script to enable CDR writing, thus seamlessly becoming the new Active node.

This architecture requires a Master-Master MySQL/MariaDB replication setup, ensuring both nodes can write to the database without causing conflicts when a failover occurs.

Advantages vs. Disadvantages

  • Pros: Simpler to set up and manage than a full database cluster like Galera. It has a very clear and predictable failover path.
  • Cons: Only one node is "active" at a time (no load balancing). Failover time, while fast (a few seconds), is not instantaneous.

Architectural Diagram

Node 1 (Primary)
  • IP: 10.0.0.1
  • State: Initially ACTIVE

Node 2 (Secondary)
  • IP: 10.0.0.2
  • State: Initially PASSIVE

Shared Virtual IP
  • VIP: 10.0.0.128 (This is the IP your GUI and other services will use)

Network
  • Both nodes are connected to the same mirrored traffic source. They are also connected to each other, preferably via a direct, dedicated crossover cable for the heartbeat signal to ensure reliability.

Step 1: System Preparation (Both Nodes)

Before configuring `keepalived`, prepare both servers.

1. Install Keepalived
# For Debian/Ubuntu
sudo apt-get update && sudo apt-get install keepalived

# For CentOS/RHEL/AlmaLinux
sudo yum install keepalived
2. Allow Binding to a Non-Local IP

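This kernel parameter lets processes bind to an IP address that is not currently assigned to the node, so services that listen on the Virtual IP can start even while the node is passive. Edit `/etc/sysctl.conf` and add this line: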

net.ipv4.ip_nonlocal_bind=1

Apply the change immediately without rebooting:

sudo sysctl -p
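
To confirm that the setting took effect, the current value can be read back from `/proc`. The following is a minimal sketch: the function name `nonlocal_bind_enabled` is a hypothetical helper (not part of `keepalived`), and its optional file argument exists only so the check can be pointed at a different path.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the kernel reports ip_nonlocal_bind = 1.
# The optional first argument overrides the /proc path (an assumption made for testability).
nonlocal_bind_enabled() {
    local proc_file="${1:-/proc/sys/net/ipv4/ip_nonlocal_bind}"
    [ -r "$proc_file" ] && [ "$(cat "$proc_file")" = "1" ]
}

if nonlocal_bind_enabled; then
    echo "ip_nonlocal_bind is enabled"
else
    echo "ip_nonlocal_bind is NOT enabled"
fi
```

Run it on both nodes; each should report that the parameter is enabled before you continue.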

Step 2: Create the Failover Script (Both Nodes)

`keepalived` will execute this script whenever a node's state changes (e.g., from Backup to Master). This script uses VoIPmonitor's manager API to enable or disable CDR writing.

Create the script file
sudo nano /etc/keepalived/voipmonitor_failover.sh
Copy and paste the following content
#!/bin/bash
#
# This script is managed by keepalived to control the VoIPmonitor CDR writing state.
#

TYPE=$1  # "GROUP" or "INSTANCE"
NAME=$2  # The name of the group or instance
STATE=$3 # "MASTER", "BACKUP", or "FAULT"

LOG_FILE="/var/log/keepalived-voipmonitor.log"
MANAGER_PORT=5029 # Ensure this matches your voipmonitor.conf

log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}

case $STATE in
    "MASTER")
        log "State changed to MASTER. Enabling CDR writing."
        echo "enablecdr" | nc localhost "$MANAGER_PORT"
        exit 0
        ;;
    "BACKUP")
        log "State changed to BACKUP. Disabling CDR writing."
        echo "disablecdr" | nc localhost "$MANAGER_PORT"
        exit 0
        ;;
    "FAULT")
        log "State changed to FAULT. Disabling CDR writing as a precaution."
        echo "disablecdr" | nc localhost "$MANAGER_PORT"
        exit 0
        ;;
    *)
        log "Unknown state '$STATE' received. Exiting."
        exit 1
        ;;
esac
Make the script executable
sudo chmod +x /etc/keepalived/voipmonitor_failover.sh
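
Before wiring the script into `keepalived`, it may help to run it by hand with the same three arguments `keepalived` passes, then check what it logged. The sketch below assumes the log format produced by the `log()` function above; `last_logged_state` is a hypothetical helper name.

```shell
#!/bin/bash
# Manual dry run (as root, on a node where voipmonitor listens on the manager port):
#   /etc/keepalived/voipmonitor_failover.sh INSTANCE VI_1 BACKUP
#   /etc/keepalived/voipmonitor_failover.sh INSTANCE VI_1 MASTER

# Hypothetical helper: print the most recent state recorded by the failover script.
# Log lines look like: "YYYY-MM-DD HH:MM:SS - State changed to MASTER. Enabling CDR writing."
last_logged_state() {
    local log_file="${1:-/var/log/keepalived-voipmonitor.log}"
    sed -n 's/.*State changed to \([A-Z]*\)\..*/\1/p' "$log_file" | tail -n 1
}
```

After the two manual runs shown in the comments, `last_logged_state` should print `MASTER`.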

Step 3: Configure Keepalived

The configuration is nearly identical for both nodes; only the `state` and `priority` values differ.

Configuration for Node 1 (Primary/Master)

Edit `/etc/keepalived/keepalived.conf`
# /etc/keepalived/keepalived.conf on NODE 1

global_defs {
   router_id voipmonitor_ha
}

vrrp_script check_voipmonitor {
    script "/usr/bin/pgrep voipmonitor" # Check if the voipmonitor process is running
    interval 2                         # Check every 2 seconds
    weight -60                         # If this script fails, reduce priority by 60 (150 - 60 = 90, below Node 2's 100)
}

vrrp_instance VI_1 {
    state MASTER            # Start as the MASTER node
    interface eth0          # Network interface for heartbeats
    virtual_router_id 51    # Must be the same on both nodes
    priority 150            # Higher priority becomes master
    advert_int 1            # Send heartbeat every 1 second
    
    authentication {
        auth_type PASS
        auth_pass your_secret_password
    }
    
    virtual_ipaddress {
        10.0.0.128/24 dev eth0
    }

    track_script {
        check_voipmonitor
    }

    notify /etc/keepalived/voipmonitor_failover.sh
}
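
The sign of a `vrrp_script` weight matters: `keepalived` adds a positive weight to the priority while the check succeeds, and subtracts the magnitude of a negative weight while the check fails. The sketch below models that rule so a candidate weight can be tried against this guide's priorities (150 on Node 1, 100 on Node 2). `effective_priority` is a hypothetical helper written for illustration, not a `keepalived` command.

```shell
#!/bin/bash
# Hypothetical model of how a tracked script's weight changes VRRP priority.
# Arguments: base priority, script weight, script_ok (1 = check passes, 0 = check fails).
effective_priority() {
    local base=$1 weight=$2 script_ok=$3
    if [ "$weight" -gt 0 ] && [ "$script_ok" -eq 1 ]; then
        echo $((base + weight))      # positive weight: bonus while the check succeeds
    elif [ "$weight" -lt 0 ] && [ "$script_ok" -eq 0 ]; then
        echo $((base + weight))      # negative weight: penalty while the check fails
    else
        echo "$base"
    fi
}

# Scenario: voipmonitor has died on Node 1 but is still running on Node 2.
echo "weight 20:  node1=$(effective_priority 150 20 0) node2=$(effective_priority 100 20 1)"
echo "weight -60: node1=$(effective_priority 150 -60 0) node2=$(effective_priority 100 -60 1)"
```

With a weight of 20, Node 1 stays ahead (150 vs. 120) and the VIP does not move. With a weight of -60, Node 1 drops to 90 and Node 2 takes over at 100. Failover on a process failure therefore needs a negative weight whose magnitude exceeds the 50-point gap between the two priorities.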

Configuration for Node 2 (Secondary/Backup)

Edit `/etc/keepalived/keepalived.conf` on Node 2. It is identical to Node 1's file except for two lines, `state` and `priority`:
# /etc/keepalived/keepalived.conf on NODE 2

vrrp_instance VI_1 {
    state BACKUP            # Start as the BACKUP node
    interface eth0
    virtual_router_id 51
    priority 100            # LOWER priority than the master
    # ... rest of the file is identical ...
}
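
Because only two lines differ, Node 2's file can be derived from Node 1's rather than retyped. The following is a minimal sketch assuming the exact values used in this guide (`state MASTER`, `priority 150`); `node2_conf_from_node1` is a hypothetical helper name, and trailing comments on the two rewritten lines are dropped so they cannot go stale.

```shell
#!/bin/bash
# Hypothetical helper: read Node 1's keepalived.conf on stdin, write Node 2's on stdout.
# Only the header comment, the `state` line, and the `priority` line are changed.
node2_conf_from_node1() {
    sed -e 's/on NODE 1/on NODE 2/' \
        -e 's/^\([[:space:]]*state[[:space:]]\{1,\}\)MASTER.*/\1BACKUP/' \
        -e 's/^\([[:space:]]*priority[[:space:]]\{1,\}\)150.*/\1100/'
}

# Usage (on Node 1, then copy the result to Node 2):
#   node2_conf_from_node1 < /etc/keepalived/keepalived.conf > /tmp/keepalived.node2.conf
```

Review the generated file before installing it, since the substitutions only match the literal values shown above.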

Step 4: Configure VoIPmonitor

Finally, on both nodes, edit `/etc/voipmonitor.conf` to disable CDR writing by default. `keepalived` will be responsible for enabling it on the active node.

# /etc/voipmonitor.conf on BOTH nodes

# Start with CDR writing disabled. The failover script will enable it on the active node.
nocdr = yes
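
Since failover correctness depends on both nodes starting with CDR writing disabled, a quick check of the configuration file on each node can catch a missed edit. This is a sketch only: `nocdr_enabled` is a hypothetical helper, and it assumes the option is written on its own line as `nocdr = yes` with optional spaces around the `=`.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the config contains an uncommented "nocdr = yes" line.
nocdr_enabled() {
    local conf="${1:-/etc/voipmonitor.conf}"
    grep -Eq '^[[:space:]]*nocdr[[:space:]]*=[[:space:]]*yes[[:space:]]*$' "$conf"
}

# Usage:
#   nocdr_enabled && echo "nocdr is set" || echo "nocdr is missing or disabled"
```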

Step 5: Start and Test

Start and enable the `keepalived` service on both nodes
sudo systemctl start keepalived
sudo systemctl enable keepalived

You can test the failover by rebooting the primary node (`reboot` on Node 1) or by stopping its `keepalived` service. You should see the Virtual IP `10.0.0.128` automatically appear on Node 2, and the log file `/var/log/keepalived-voipmonitor.log` should show that it has transitioned to the MASTER state and enabled CDRs.
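
One way to see which node currently holds the Virtual IP is to filter the one-line output of `ip -o -4 addr show`. The sketch below reads that output on stdin, which keeps the check independent of the interface name; `has_vip` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and succeeds
# if the given IPv4 address is assigned to any interface.
has_vip() {
    local vip="$1"
    # Fixed-string match on " inet <address>/" so 10.0.0.12 does not match 10.0.0.128.
    grep -qF " inet ${vip}/"
}

# Usage on either node:
#   ip -o -4 addr show | has_vip 10.0.0.128 && echo "this node holds the VIP"
```

Before the test, the check should succeed only on Node 1; after Node 1 is rebooted or its `keepalived` service is stopped, it should succeed on Node 2.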

AI Summary for RAG

Summary: This guide provides a detailed tutorial for setting up a high-availability (HA) Active-Passive VoIPmonitor cluster using `keepalived`. It explains the architecture, where two identical sensors receive the same mirrored traffic, but only one "Active" node holds a shared Virtual IP (VIP) and writes CDRs to a master-master replicated database. The guide replaces the outdated `heartbeat` software with the modern `keepalived` service. The process is broken down into clear steps: 1) Preparing the system by installing `keepalived` and setting the `net.ipv4.ip_nonlocal_bind` sysctl parameter. 2) Creating a `voipmonitor_failover.sh` script that `keepalived` uses to enable or disable CDR writing via the manager API. 3) Providing complete `keepalived.conf` examples for both the primary (MASTER) and secondary (BACKUP) nodes, highlighting the use of different priorities. 4) Configuring VoIPmonitor itself with `nocdr=yes` to ensure CDR writing is disabled by default and only activated by the failover script. Finally, it explains how to start the services and test the failover mechanism.

Keywords: high availability, ha, failover, active-passive, keepalived, heartbeat, cluster, redundancy, virtual ip, vip, floating ip, master-master replication, `vrrp_instance`, `ip_nonlocal_bind`, `nocdr`, manager api

Key Questions:

  • How can I set up a high-availability failover for VoIPmonitor?
  • What is an Active-Passive cluster and how does it work?
  • How to configure `keepalived` for VoIPmonitor?
  • What is a Virtual IP (VIP) and how does it provide failover?
  • What is the modern alternative to the `heartbeat` service on Linux?
  • How do I test my `keepalived` failover setup?