High Availability with Keepalived (Active-Passive Failover)
Latest revision as of 21:39, 30 June 2025
This is an expert-level guide for creating a high-availability (HA) VoIPmonitor cluster using an Active-Passive failover model. This architecture ensures that if your primary sensor server fails, a secondary server will automatically and seamlessly take over its responsibilities.
Overview: How Active-Passive Failover Works
This setup consists of two identical VoIPmonitor servers that share a single "virtual" IP address.
- Two Identical Nodes: Both servers (Node 1 and Node 2) receive the exact same mirrored network traffic from your switch (or other source). Both run the VoIPmonitor sensor.
- One Active Node: At any given time, only one server is the Active (or Master) node. This node holds the Virtual IP (VIP) and is the only one actively writing Call Detail Records (CDRs) to the database.
- One Passive Node: The other server is Passive (or Backup). It is sniffing traffic and writing PCAP files to its local disk, but its CDR writing is disabled to prevent database conflicts.
- Keepalived & The Heartbeat: A small, efficient service called `keepalived` runs on both nodes. The Active node constantly sends "I'm alive" heartbeat messages to the Passive node.
- Automatic Failover: If the Passive node stops receiving heartbeats, it assumes the Active node has failed. It immediately takes over the Virtual IP and runs a script to enable CDR writing, thus seamlessly becoming the new Active node.
This architecture requires a Master-Master MySQL/MariaDB replication setup, ensuring both nodes can write to the database without causing conflicts when a failover occurs.
Advantages vs. Disadvantages
- Pros: Simpler to set up and manage than a full database cluster like Galera. It has a very clear and predictable failover path.
- Cons: Only one node is "active" at a time (no load balancing). Failover time, while fast (a few seconds), is not instantaneous.
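The "a few seconds" figure can be estimated from the VRRP timers. The following is a minimal sketch, assuming keepalived speaks VRRP version 2, where a backup declares the master down after three advertisement intervals plus a skew of (256 - priority)/256 seconds. The function name `master_down_interval` is hypothetical, and the values 1 and 100 are the advertisement interval and backup priority used later in this guide.

```shell
#!/bin/bash
# Estimate how long a backup waits before taking over (VRRPv2 timers).
# Usage: master_down_interval <advert_int_seconds> <backup_priority>
master_down_interval() {
    local advert_int=$1 priority=$2
    # Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
    # Skew_Time            = (256 - Priority) / 256 seconds
    LC_ALL=C awk -v a="$advert_int" -v p="$priority" \
        'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

master_down_interval 1 100   # prints 3.609
```

This covers only the detection delay. The time the notify script and the sensor need to resume CDR writing comes on top of it.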
Architecture Overview
- Node 1 (Primary)
- IP: 10.0.0.1
- State: Initially ACTIVE
- Node 2 (Secondary)
- IP: 10.0.0.2
- State: Initially PASSIVE
- Shared Virtual IP
- VIP: 10.0.0.128 (This is the IP your GUI and other services will use)
- Network
- Both nodes are connected to the same mirrored traffic source. They are also connected to each other, preferably via a direct, dedicated crossover cable for the heartbeat signal to ensure reliability.
Step 1: System Preparation (Both Nodes)
Before configuring `keepalived`, prepare both servers.
- 1. Install Keepalived
    # For Debian/Ubuntu
    sudo apt-get update && sudo apt-get install keepalived

    # For CentOS/RHEL/AlmaLinux
    sudo yum install keepalived
- 2. Allow Binding to a Non-Local IP
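This kernel parameter lets processes bind to an IP address that is not currently assigned to the node, so services configured to listen on the Virtual IP can start on a node that does not hold it yet. Edit `/etc/sysctl.conf` and add this line: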
net.ipv4.ip_nonlocal_bind=1
Apply the change immediately without rebooting:
sudo sysctl -p
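To confirm the setting was persisted, a small check along these lines may help. It is only a sketch: the helper name `nonlocal_bind_enabled` is hypothetical, and it inspects the configuration file rather than the running kernel.

```shell
#!/bin/bash
# Return 0 if the given sysctl config file enables ip_nonlocal_bind.
# Usage: nonlocal_bind_enabled [file]   (defaults to /etc/sysctl.conf)
nonlocal_bind_enabled() {
    local file=${1:-/etc/sysctl.conf}
    # Commented-out lines do not match; spaces around "=" are tolerated.
    grep -Eq '^[[:space:]]*net\.ipv4\.ip_nonlocal_bind[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$file"
}

# Runtime value as seen by the kernel (expected to be 1 after "sysctl -p"):
#   sysctl -n net.ipv4.ip_nonlocal_bind
```

Run it on both nodes, since the passive node needs the setting just as much as the active one.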
Step 2: Create the Failover Script (Both Nodes)
`keepalived` will execute this script whenever a node's state changes (e.g., from Backup to Master). This script uses VoIPmonitor's manager API to enable or disable CDR writing.
- Create the script file
sudo nano /etc/keepalived/voipmonitor_failover.sh
- Copy and paste the following content
    #!/bin/bash
    #
    # This script is managed by keepalived to control the VoIPmonitor CDR writing state.
    #
    TYPE=$1   # "GROUP" or "INSTANCE"
    NAME=$2   # The name of the group or instance
    STATE=$3  # "MASTER", "BACKUP", or "FAULT"

    LOG_FILE="/var/log/keepalived-voipmonitor.log"
    MANAGER_PORT=5029  # Ensure this matches your voipmonitor.conf

    log() {
        echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
    }

    case $STATE in
        "MASTER")
            log "State changed to MASTER. Enabling CDR writing."
            echo "enablecdr" | nc localhost "$MANAGER_PORT"
            exit 0
            ;;
        "BACKUP")
            log "State changed to BACKUP. Disabling CDR writing."
            echo "disablecdr" | nc localhost "$MANAGER_PORT"
            exit 0
            ;;
        "FAULT")
            log "State changed to FAULT. Disabling CDR writing as a precaution."
            echo "disablecdr" | nc localhost "$MANAGER_PORT"
            exit 0
            ;;
        *)
            log "Unknown state '$STATE' received. Exiting."
            exit 1
            ;;
    esac
- Make the script executable
sudo chmod +x /etc/keepalived/voipmonitor_failover.sh
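Before wiring the script into `keepalived`, it can be useful to check its state handling in isolation. The sketch below pulls the state-to-command mapping into a pure function so it can be exercised without a running sensor. The function name `manager_command_for_state` is hypothetical; the command strings are the ones the script above sends.

```shell
#!/bin/bash
# Map a keepalived state to the manager command the failover script sends.
# Prints the command and returns 0, or returns 1 for an unknown state.
manager_command_for_state() {
    case "$1" in
        MASTER)        echo "enablecdr" ;;
        BACKUP|FAULT)  echo "disablecdr" ;;
        *)             return 1 ;;
    esac
}

# Manual dry run of the real script, using the argument order keepalived passes
# (type, name, state), then a look at the log it writes:
#   sudo /etc/keepalived/voipmonitor_failover.sh INSTANCE VI_1 BACKUP
#   tail -n 1 /var/log/keepalived-voipmonitor.log
```

A manual run with `BACKUP` is the safer choice on a production pair, because it cannot enable CDR writing on the wrong node.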
Step 3: Configure Keepalived
The configuration is nearly identical on both nodes; only the `state` and `priority` values differ.
Configuration for Node 1 (Primary/Master)
- Edit `/etc/keepalived/keepalived.conf`
    # /etc/keepalived/keepalived.conf on NODE 1
    global_defs {
        router_id voipmonitor_ha
    }

    vrrp_script check_voipmonitor {
        script "/usr/bin/pgrep voipmonitor"  # Check if the voipmonitor process is running
        interval 2                           # Check every 2 seconds
        weight -60                           # If the check fails, lower priority by 60
                                             # (150 -> 90, below Node 2's 100, so Node 2 takes over)
    }

    vrrp_instance VI_1 {
        state MASTER              # Start as the MASTER node
        interface eth0            # Network interface for heartbeats
        virtual_router_id 51      # Must be the same on both nodes
        priority 150              # Higher priority becomes master
        advert_int 1              # Send heartbeat every 1 second

        authentication {
            auth_type PASS
            auth_pass your_secret_password
        }

        virtual_ipaddress {
            10.0.0.128/24 dev eth0
        }

        track_script {
            check_voipmonitor
        }

        notify /etc/keepalived/voipmonitor_failover.sh
    }
Configuration for Node 2 (Secondary/Backup)
- Edit `/etc/keepalived/keepalived.conf` on Node 2. It is identical except for two lines
    # /etc/keepalived/keepalived.conf on NODE 2
    vrrp_instance VI_1 {
        state BACKUP              # Start as the BACKUP node
        interface eth0
        virtual_router_id 51
        priority 100              # LOWER priority than the master
        # ... rest of the file is identical ...
    }
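Whether a failed `vrrp_script` check actually triggers a failover depends on how the script's `weight` shifts the two priorities relative to each other. The sketch below models the rule as keepalived documents it, to the best of my understanding: a positive weight is added while the check succeeds, a negative weight is subtracted while it fails. The function names are hypothetical, and the base priorities 150 and 100 are the ones from the configurations above.

```shell
#!/bin/bash
# effective_priority <base> <weight> <check_ok: 1|0>
# Positive weight: added while the check succeeds.
# Negative weight: applied (lowering the priority) while the check fails.
effective_priority() {
    local base=$1 weight=$2 ok=$3
    if (( weight > 0 && ok == 1 )); then
        echo $(( base + weight ))
    elif (( weight < 0 && ok == 0 )); then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

# will_fail_over <weight>
# Scenario: the check fails on Node 1 (base 150) and passes on Node 2 (base 100).
# Succeeds if Node 2 ends up with the higher effective priority.
will_fail_over() {
    local n1 n2
    n1=$(effective_priority 150 "$1" 0)
    n2=$(effective_priority 100 "$1" 1)
    (( n2 > n1 ))
}

for w in 20 -20 -60; do
    if will_fail_over "$w"; then
        echo "weight $w: Node 2 takes over"
    else
        echo "weight $w: Node 1 stays master"
    fi
done
```

With a 50-point gap between the base priorities, only a weight whose magnitude exceeds 50 moves the nodes past each other in this model. It is worth checking the chosen weight against the priorities before relying on process-level failover.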
Step 4: Configure VoIPmonitor
Finally, on both nodes, edit `/etc/voipmonitor.conf` to disable CDR writing by default. `keepalived` will be responsible for enabling it on the active node.
    # /etc/voipmonitor.conf on BOTH nodes
    # Start with CDR writing disabled. The failover script will enable it on the active node.
    nocdr = yes
Step 5: Start and Test
- Start and enable the `keepalived` service on both nodes
    sudo systemctl start keepalived
    sudo systemctl enable keepalived
You can test the failover by rebooting the primary node (`reboot` on Node 1) or by stopping its `keepalived` service. You should see the Virtual IP `10.0.0.128` automatically appear on Node 2, and the log file `/var/log/keepalived-voipmonitor.log` should show that it has transitioned to the MASTER state and enabled CDRs.
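During such a test, a quick way to see which node currently holds the Virtual IP is to filter the output of `ip`. The helper below is a sketch with a hypothetical name, `holds_vip`; it reads `ip -4 -o addr show` output on standard input so it can also be tried against captured output.

```shell
#!/bin/bash
# holds_vip <vip>
# Reads `ip -4 -o addr show` output on stdin; succeeds if the VIP is listed.
holds_vip() {
    local vip=$1
    # Match "inet <vip>/" literally so 10.0.0.12 does not match 10.0.0.128.
    grep -Fq "inet ${vip}/"
}

# On a live node (interface and address taken from the configuration above):
#   ip -4 -o addr show dev eth0 | holds_vip 10.0.0.128 && echo "this node holds the VIP"
#   tail -n 5 /var/log/keepalived-voipmonitor.log
```

Running the check on both nodes before and after the test should show the address on exactly one of them at a time.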
AI Summary for RAG
Summary: This guide provides a detailed tutorial for setting up a high-availability (HA) Active-Passive VoIPmonitor cluster using `keepalived`. It explains the architecture, where two identical sensors receive the same mirrored traffic, but only one "Active" node holds a shared Virtual IP (VIP) and writes CDRs to a master-master replicated database. The guide replaces the outdated `heartbeat` software with the modern `keepalived` service. The process is broken down into clear steps: 1) Preparing the system by installing `keepalived` and setting the `net.ipv4.ip_nonlocal_bind` sysctl parameter. 2) Creating a `voipmonitor_failover.sh` script that `keepalived` uses to enable or disable CDR writing via the manager API. 3) Providing complete `keepalived.conf` examples for both the primary (MASTER) and secondary (BACKUP) nodes, highlighting the use of different priorities. 4) Configuring VoIPmonitor itself with `nocdr=yes` to ensure CDR writing is disabled by default and only activated by the failover script. Finally, it explains how to start the services and test the failover mechanism.

Keywords: high availability, ha, failover, active-passive, keepalived, heartbeat, cluster, redundancy, virtual ip, vip, floating ip, master-master replication, `vrrp_instance`, `ip_nonlocal_bind`, `nocdr`, manager api

Key Questions:
- How can I set up a high-availability failover for VoIPmonitor?
- What is an Active-Passive cluster and how does it work?
- How to configure `keepalived` for VoIPmonitor?
- What is a Virtual IP (VIP) and how does it provide failover?
- What is the modern alternative to the `heartbeat` service on Linux?
- How do I test my `keepalived` failover setup?