Data Cleaning: Difference between revisions

Revision as of 04:45, 5 January 2026

This guide explains how VoIPmonitor manages data retention for both captured packets (PCAP files) and Call Detail Records (CDRs) in the database. Proper configuration is essential for managing disk space and maintaining long-term database performance.

Overview

VoIPmonitor generates two primary types of data that require periodic cleaning:

PCAP Files: Raw packet captures of SIP/RTP/GRAPH data stored on the filesystem in the spool directory. These can consume significant disk space.
CDR Data: Call metadata stored in the MySQL database. Large tables can slow down GUI performance if not managed properly.

The system uses two separate, independent mechanisms to manage the retention of this data: filesystem cleaning for the PCAP spool directory and database cleaning for CDRs.

Filesystem Cleaning (PCAP Spool Directory)

The sensor stores captured call data in a structured directory tree on the local filesystem.

Reducing Data Collection at Source

Before configuring cleanup policies, consider reducing the amount of data captured. This is often the most effective long-term solution for storage management.

Save Only RTP Headers (Major Space Saver)

RTP packets typically contain the full audio payload, which consumes the majority of disk space. If you only need call quality statistics (MOS, jitter, packet loss) and not actual audio playback, switch to saving RTP headers only.

Edit /etc/voipmonitor.conf:

# Change from full RTP to headers only
savertp = header

Setting	Storage Impact	Use Case
`savertp = yes`	High (~10x more)	Requires ability to play back audio from PCAPs
`savertp = header`	Low	Only CDR statistics needed, no audio playback required

With savertp = header, VoIPmonitor still captures all necessary metadata for MOS scoring, jitter analysis, packet loss statistics, and quality graphs, but does not store the actual audio payload. This can reduce storage consumption by up to 90%.
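As a rough illustration of the figures above, the following shell sketch estimates how much space RTP data might occupy after switching to header-only capture. It assumes the best-case 90% reduction quoted above; the function name `estimate_header_only_mb` is hypothetical, and real savings depend on codecs and call mix.

```shell
#!/bin/sh
# Hypothetical helper: estimate RTP storage (in MB) after switching to
# savertp = header, assuming the best-case ~90% reduction (an assumption).
estimate_header_only_mb() {
    current_mb=$1
    # Keep roughly 10% of the current RTP footprint (integer arithmetic).
    echo $(( current_mb * 10 / 100 ))
}

estimate_header_only_mb 102400   # 100 GB of full RTP captures; prints 10240
```

SIP and GRAPH data are not affected by this setting, so only the RTP portion of the spool should be passed to the estimate.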

Important: After changing from savertp = yes to savertp = header, existing PCAP files will remain playable. New calls will only contain RTP headers.

For more configuration options, see Sniffer Configuration - Saving Options.

Spool Directory Location

By default, all data is stored in /var/spool/voipmonitor. This location can be changed by setting the spooldir option in voipmonitor.conf.

Retention Configuration

The cleaning process runs automatically every 5 minutes and removes the oldest data based on the rules you define in voipmonitor.conf. You can set limits based on total size (in megabytes) or age (in days). If both a size limit and an age limit are set for the same data type, whichever limit is reached first triggers the cleaning.

Parameter	Default Value	Description
`maxpoolsize`	102400 (100 GB)	The total maximum disk space for all captured data (SIP, RTP, GRAPH, AUDIO).
`maxpooldays`	(unset)	The maximum number of days to keep all captured data.
`maxpoolsipsize`	(unset)	A specific size limit for SIP PCAP files only.
`maxpoolsipdays`	(unset)	A specific age limit for SIP PCAP files only.
`maxpoolrtpsize`	(unset)	A specific size limit for RTP PCAP files only.
`maxpoolrtpdays`	(unset)	A specific age limit for RTP PCAP files only.
`maxpoolgraphsize`	(unset)	A specific size limit for GRAPH files only.
`maxpoolgraphdays`	(unset)	A specific age limit for GRAPH files only.
`maxpoolaudiosize`	(unset)	A specific size limit for converted audio files (WAV/OGG) only.
`maxpoolaudiodays`	(unset)	An age limit for converted audio files (WAV/OGG) only.
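The "first limit reached" rule can be sketched in shell as follows. This is only an illustration of the decision, not the sensor's actual implementation; the function name `limit_reached` is hypothetical, and treating a limit of 0 as "unset" is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: report which retention limit (if any) is exceeded.
# Arguments: used_mb max_mb oldest_age_days max_days
# A limit of 0 is treated here as "unset" (an assumption of this sketch).
limit_reached() {
    used_mb=$1; max_mb=$2; age_days=$3; max_days=$4
    result=""
    if [ "$max_mb" -gt 0 ] && [ "$used_mb" -gt "$max_mb" ]; then
        result="size"
    fi
    if [ "$max_days" -gt 0 ] && [ "$age_days" -gt "$max_days" ]; then
        # Append "days", joined with "+" if the size limit also matched.
        result="${result:+$result+}days"
    fi
    echo "${result:-none}"
}

limit_reached 110000 102400 20 90   # size limit exceeded: prints "size"
limit_reached 50000 102400 95 90    # age limit exceeded: prints "days"
```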

Understanding Directory Size Differences (SIP vs RTP Retention)

If you configure different retention periods for SIP and RTP data (for example, maxpoolsipdays = 90 and maxpoolrtpdays = 30), you will notice a significant size difference between directories within different time windows.

Age of Directory	Contents	Size
0-30 days	Both SIP and RTP PCAP files	Large
30-90 days	SIP PCAP files only (RTP deleted)	Smaller
90+ days	None (both SIP and RTP deleted)	Empty or absent

This behavior is expected and by design:

Directories within the maxpoolrtpdays retention window contain both SIP and RTP data, making them significantly larger (often 5-10x larger).
Directories older than maxpoolrtpdays but within maxpoolsipdays contain only SIP data, so they are much smaller.
The automatic cleanup process removes RTP files after maxpoolrtpdays and SIP files after maxpoolsipdays.

This is not an error or configuration issue. It is the expected result of having different retention periods for different data types.
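The table above can be expressed as a small shell sketch that predicts what a directory of a given age should contain. The function name `expected_contents` is hypothetical, and treating the boundary day as still retained is an assumption; the sensor's exact cut-off may differ.

```shell
#!/bin/sh
# Hypothetical helper: what a spool directory of a given age should contain.
# Arguments: age_days maxpoolsipdays maxpoolrtpdays
# The boundary day is treated as still retained (an assumption).
expected_contents() {
    age=$1; sip_days=$2; rtp_days=$3
    has_sip=0; has_rtp=0
    [ "$age" -le "$sip_days" ] && has_sip=1
    [ "$age" -le "$rtp_days" ] && has_rtp=1
    case "$has_sip$has_rtp" in
        11) echo "SIP+RTP" ;;
        10) echo "SIP only" ;;
        01) echo "RTP only" ;;
        *)  echo "none" ;;
    esac
}

expected_contents 10 90 30    # prints "SIP+RTP"
expected_contents 45 90 30    # prints "SIP only"
expected_contents 120 90 30   # prints "none"
```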

Diagnosis: Compare Directory Sizes

To verify this behavior, check your spool directory:

cd /var/spool/voipmonitor

# Show directories sorted by size
du -h --max-depth=1 ./ | sort -rh | head -20

# Example output (largest first, so the total comes first):
# 120G   .
# 80G    ./2025-01          # Current month (has SIP+RTP)
# 15G    ./2024-12          # Previous month (SIP only, RTP deleted)

Compare with your configuration:

grep -E "maxpoolsip|maxpoolrtp" /etc/voipmonitor.conf

# Example configuration:
# maxpoolsipdays = 90     # SIP kept for 90 days
# maxpoolrtpdays = 30     # RTP kept for 30 days

If the size difference matches your retention configuration, this is expected behavior and no action is needed.
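To see how much of the spool each data type occupies, a sketch like the following can sum file sizes per type. It assumes that files are stored beneath subdirectories named after the data type (for example `SIP` and `RTP`), which this guide does not state explicitly, and it requires GNU find for `-printf`. The function name `bytes_by_type` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: total bytes of files stored under directories with a
# given name (e.g. SIP or RTP) anywhere below the spool directory.
# Assumes per-type subdirectories exist; requires GNU find (-printf).
bytes_by_type() {
    spool=$1; type_dir=$2
    find "$spool" -type f -path "*/$type_dir/*" -printf '%s\n' 2>/dev/null |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Example usage (default spool path from this guide):
# bytes_by_type /var/spool/voipmonitor SIP
# bytes_by_type /var/spool/voipmonitor RTP
```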

Troubleshooting: Disk Full / Files Disappearing

If you see errors when attempting to extract older calls from the GUI, or if call files are disappearing too quickly, your spool directory may have reached its size limit.

Diagnosis: Check Disk Usage

Identify the sensor/probe responsible for the missing data.
SSH into the sensor/probe and navigate to the spooldir.
Check the disk usage:

cd /var/spool/voipmonitor
du -h --max-depth=1 ./

# Example output:
# 150G    ./2025-01
# 120G    ./2024-12
# 90G     ./2024-11
# 360G    .

Compare with the configured limit:

grep maxpoolsize /etc/voipmonitor.conf
# Example output: maxpoolsize = 102400  (100 GB in MB)
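The two checks above can be combined into one comparison. The following is a minimal sketch, assuming the configuration uses plain `key = value` lines; the helper names `conf_value` and `usage_percent` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of "key = value" from a config file.
# Skips commented-out lines and strips trailing comments; the last match wins.
conf_value() {
    file=$1; key=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\([^#;[:space:]]*\).*/\1/p" "$file" |
        tail -n 1
}

# Hypothetical helper: percentage of the configured limit currently used.
usage_percent() {
    used_mb=$1; limit_mb=$2
    echo $(( used_mb * 100 / limit_mb ))
}

# Example usage (default paths from this guide):
# limit=$(conf_value /etc/voipmonitor.conf maxpoolsize)
# used=$(du -sm /var/spool/voipmonitor | cut -f1)
# echo "spool is at $(usage_percent "$used" "$limit")% of maxpoolsize"
```

A value at or close to 100% suggests that the cleaner is actively removing the oldest files to stay within the limit.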

Resolution: Increase Spooldir Size

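If usage has reached the configured limit, the cleaner deletes the oldest files to stay within it, which is why older calls disappear. If the disk has spare capacity, increase maxpoolsize: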

# Edit /etc/voipmonitor.conf
[general]
maxpoolsize = 716800   # 700 GB in MB
maxpooldays = 90       # Optional: Keep data for last 90 days

Apply changes:

systemctl restart voipmonitor

Maintenance: Re-indexing the Spool Directory

VoIPmonitor maintains an index of all created PCAP files to perform cleaning efficiently without scanning the entire directory tree. If this index becomes corrupt, or if you manually move files into the spool, old data may not be deleted correctly.

To trigger a manual re-index via the manager API:

# Open a manager API session
echo 'manager_file start /tmp/vmsck' | nc 127.0.0.1 5029

# Send the re-index command
echo reindexfiles | nc -U /tmp/vmsck

Note: This command requires netcat with support for UNIX sockets (-U). For alternative methods, see the Manager API documentation.

Database Cleaning (CDR Retention)

Managing the size of the cdr table and other large tables is critical for GUI performance.

Partitioning Method (Recommended)

Since version 7, VoIPmonitor utilizes database partitioning, which splits large tables into smaller, daily segments. This is the recommended method for managing database retention.

Aspect	Description
How it works	Set `cleandatabase = 30` in `voipmonitor.conf` to keep the last 30 days of data.
Why it's better	Dropping old partitions is nearly instantaneous (typically milliseconds), regardless of row count, and puts negligible load on the database.
Requirement	Partitioning is enabled by default on new installations.

Quick Start: Global Retention

For most deployments, configure one parameter in voipmonitor.conf:

# Keep all records for 30 days
cleandatabase = 30

The cleandatabase parameter acts as a global default for all cleandatabase_* options and applies to:

cdr - Call Detail Records
message - SIP MESSAGE texts
sip_msg - SIP OPTIONS/SUBSCRIBE/NOTIFY messages
register_state - SIP registration states
register_failed - Failed registration attempts
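To get a feel for what a given retention value means in calendar terms, the sketch below computes the approximate oldest date that would still be kept. It requires GNU date; the function name `retention_cutoff` is hypothetical, and the exact partition boundary used by the sensor may differ by a day.

```shell
#!/bin/sh
# Hypothetical helper: approximate oldest calendar date still retained for a
# given cleandatabase value. Requires GNU date (-d).
# Arguments: retention_days [reference_date as YYYY-MM-DD, default: today]
retention_cutoff() {
    days=$1
    ref=${2:-$(date -u +%F)}
    date -u -d "$ref -$days days" +%F
}

retention_cutoff 30 2025-01-31   # prints 2025-01-01
```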

Retention Parameters

Parameter	Default	Description
`cleandatabase`	0 (disabled)	Master retention setting in days.
`cleandatabase_cdr`	0	Specific retention for `cdr` and `message` tables.
`cleandatabase_rtp_stat`	2	Retention for detailed RTP statistics.
`cleandatabase_sip_msg`	0	Retention for OPTIONS/SUBSCRIBE/NOTIFY.
`cleandatabase_size`	(unset)	Alternative: size-based limit in MB (requires version 2024.05.1+).
`partition_operations_enable_fromto`	1-5	Time window for partition operations (e.g., 1-5 AM).

More details: Sniffer Configuration - Database Cleaning.

Legacy Method: Manual Deletion (Not Recommended)

For very old, non-partitioned databases, you would need custom scripts with DELETE FROM cdr WHERE calldate < ... queries.

Warning: Manual DELETE on large tables is extremely slow and resource-intensive. A single operation on millions of rows can take hours and impact GUI performance.

Troubleshooting Disk Space Issues

Disk Space Not Reclaimed After Cleanup

If automatic cleanup runs but disk space is not freed from the MySQL data directory, check the innodb_file_per_table setting:

SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';

Value	Behavior
ON	Each table/partition has its own `.ibd` file. Dropping partitions reclaims space immediately.
OFF	All data in shared `ibdata1` file. Dropping partitions does not reduce file size.
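The check can also be scripted. The following sketch maps the reported value to the behavior described in the table above; the function name `reclaim_behavior` is hypothetical, and the commented usage assumes the mysql client can authenticate without prompting.

```shell
#!/bin/sh
# Hypothetical helper: explain what an innodb_file_per_table value means
# for space reclamation after partitions are dropped.
reclaim_behavior() {
    case "$1" in
        ON|on|1)   echo "per-table files: dropped partitions free disk space" ;;
        OFF|off|0) echo "shared ibdata1: dropped partitions do not shrink the file" ;;
        *)         echo "unknown value: $1" ;;
    esac
}

# Example usage (assumes non-interactive mysql client credentials):
# value=$(mysql -N -B -e "SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table'" | cut -f2)
# reclaim_behavior "$value"
```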

Solutions

Option 1: Enable for Future Tables

Add to /etc/my.cnf or /etc/mysql/my.cnf:

[mysqld]
innodb_file_per_table = 1

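# Restart MySQL to apply the change (the service may be named mysqld or mariadb on some distributions)
systemctl restart mysql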

Note: This only affects NEW tables/partitions. Existing data in ibdata1 remains.

Option 2: Reclaim Space from Existing Tables

OPTIMIZE TABLE cdr;

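Warning: Requires significant free disk space to duplicate table data and may fail if the disk is nearly full. With innodb_file_per_table enabled, the rebuild moves the table into its own .ibd file, but the shared ibdata1 file itself does not shrink.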

Option 3: Export and Re-import

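# Dump the database and verify the dump completed before dropping anything
mysqldump -u root -p voipmonitor > voipmonitor_backup.sql
# Recreate the database empty, then re-import the dump
mysql -u root -p -e "DROP DATABASE voipmonitor; CREATE DATABASE voipmonitor;"
mysql -u root -p voipmonitor < voipmonitor_backup.sql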

Monitoring Database Health

SQL Queue Metrics

The sensor tracks queue metrics visible in GUI → Settings → Sensors → Status:

Metric	Description	Healthy Range
SQLq	CDRs waiting to be written to database	Near 0, sporadic spikes OK
SQLf	Failed database write attempts	Zero (not growing)

Consistently high/growing SQLq → database cannot keep up
Non-zero/growing SQLf → database errors or connectivity issues

See SQL Queue Troubleshooting for details.

System Monitoring

# Check system load
cat /proc/loadavg

# Monitor disk I/O (shows only active processes)
iotop -o

High I/O from mysqld processes may indicate slow storage or poorly tuned MySQL settings.

MySQL Performance Settings

For high-performance operation with partitioning:

[mysqld]
# Use 50-70% of available RAM for caching
innodb_buffer_pool_size = 4G

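# Write the log at each commit but flush it to disk only about once per second
# (faster; up to ~1 second of transactions can be lost on an OS crash or power failure)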
innodb_flush_log_at_trx_commit = 2

# Enable per-table tablespaces for easy space reclamation
innodb_file_per_table = 1
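The 50-70% guideline for innodb_buffer_pool_size can be turned into concrete numbers with a short shell sketch. It interprets "available RAM" as total system memory, which is an assumption, and the function name `buffer_pool_range_mb` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: 50% and 70% of total RAM, in MB (integer arithmetic).
buffer_pool_range_mb() {
    total_mb=$1
    echo "$(( total_mb * 50 / 100 )) $(( total_mb * 70 / 100 ))"
}

buffer_pool_range_mb 16384   # 16 GB host; prints "8192 11468"

# Example usage on Linux (reads total memory from /proc/meminfo):
# total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
# buffer_pool_range_mb "$total_mb"
```

On a host shared with the sensor or GUI, the lower end of the range leaves more memory for those processes.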

For comprehensive tuning, see Scaling and Performance Guide.

AI Summary for RAG

Summary: VoIPmonitor has two independent data retention mechanisms: (1) Filesystem cleaning for PCAP files using maxpoolsize/maxpooldays parameters, running every 5 minutes to delete oldest data first; (2) Database cleaning using cleandatabase parameter with daily partitioning for instant partition drops. Key behavior: directories within maxpoolrtpdays window contain both SIP and RTP (large), while directories within maxpoolsipdays but beyond maxpoolrtpdays contain only SIP (smaller) - this is expected. Troubleshooting covers disk full scenarios (check with du -h --max-depth=1, increase maxpoolsize), space not reclaimed issues (innodb_file_per_table setting), and database health monitoring (SQLq/SQLf metrics).

Keywords: data retention, cleaning, delete old calls, disk space, spooldir, maxpoolsize, maxpooldays, maxpoolsipdays, maxpoolrtpdays, cleandatabase, partitioning, reindexfiles, innodb_file_per_table, SQLq, SQLf, directory size difference, SIP vs RTP retention

Key Questions:

How do I automatically delete old PCAP files?
What is the difference between maxpoolsize and maxpooldays?
My spool directory is full but old files are not deleted. How do I fix this?
How do I configure database retention with cleandatabase?
Why is disk space not reclaimed after MySQL cleanup?
What do SQLq and SQLf metrics mean?
Why are recent directories much larger than old ones in my spool?
Why do directory sizes differ when maxpoolsipdays and maxpoolrtpdays are set to different values?
