Data Cleaning

From VoIPmonitor.org
'''This guide explains how VoIPmonitor manages data retention for both captured packets (PCAP files) and Call Detail Records (CDRs) in the database. Proper configuration is essential for managing disk space and maintaining long-term database performance.'''


== Overview of Data Cleaning ==
VoIPmonitor generates two primary types of data that require periodic cleaning:
*'''PCAP Files:''' Raw packet captures of SIP/RTP/GRAPH data stored on the filesystem in the spool directory. These can consume significant disk space.
*'''CDR Data:''' Call metadata stored in the MySQL database. Large tables can slow down GUI performance if not managed properly.


The system uses two separate, independent mechanisms to manage the retention of this data.


== 1. Filesystem Cleaning (PCAP Spool Directory) ==
The sensor stores captured call data in a structured directory tree on the local filesystem.


=== Spool Directory Location ===
By default, all data is stored in <code>/var/spool/voipmonitor</code>. This location can be changed by setting the <code>spooldir</code> option in <code>voipmonitor.conf</code>.
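To see how much space the spool currently occupies, a small helper can read <code>spooldir</code> from the configuration and fall back to the default. This is a minimal sketch, not part of VoIPmonitor: it assumes the configuration file is <code>/etc/voipmonitor.conf</code> (another path can be passed as the first argument) and that the option is written as a plain <code>spooldir = /path</code> line. The function names <code>vm_spooldir</code> and <code>vm_spool_usage</code> are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with VoIPmonitor).

# Print the spool directory configured in voipmonitor.conf.
# Assumption: default config path /etc/voipmonitor.conf; option written as "spooldir = /path".
vm_spooldir() {
  local conf=${1:-/etc/voipmonitor.conf}
  local dir=""
  if [ -r "$conf" ]; then
    # Use the last uncommented "spooldir = ..." line; strip spaces, tabs and CR.
    dir=$(awk -F= '/^[ \t]*spooldir[ \t]*=/ { v = $2; gsub(/[ \t\r]/, "", v); last = v } END { print last }' "$conf")
  fi
  # Fall back to the documented default when the option is not set.
  echo "${dir:-/var/spool/voipmonitor}"
}

# Print the total disk usage of the spool directory.
vm_spool_usage() {
  du -sh -- "$(vm_spooldir "$@")"
}
```

For example, <code>vm_spool_usage</code> prints the size of the configured spool, and <code>vm_spool_usage /path/to/voipmonitor.conf</code> does the same for a different configuration file.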


=== Retention Configuration ===
The cleaning process runs automatically every 5 minutes and removes the oldest data according to the rules you define in <code>voipmonitor.conf</code>. Limits can be based on total size (in megabytes, MB) or on age (in days), and all options can be combined; the per-type limits let you, for example, keep SIP PCAPs longer than RTP PCAPs. If both a size limit and an age limit are set for the same data type, whichever limit is exceeded first triggers the cleaning.
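The rule above (remove the oldest data first, and keep removing while either limit is exceeded) can be illustrated with a small simulation. This is a sketch, not VoIPmonitor's implementation: it assumes the spool is described as a list of buckets, one per line as <code>label age_days size_mb</code>, ordered oldest first, and that a limit of 0 means the option is unset. Whether a bucket exactly <code>maxdays</code> old is kept is also an assumption here. The function name <code>plan_cleanup</code> is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical simulation of size/age-based spool cleaning (not VoIPmonitor code).
# Usage: plan_cleanup MAXSIZE_MB MAXDAYS < buckets
#   Each input line: "<label> <age_days> <size_mb>", oldest bucket first.
#   A limit of 0 means "not set". Prints the labels of buckets that would be removed.
plan_cleanup() {
  awk -v maxsize="$1" -v maxdays="$2" '
    BEGIN { maxsize += 0; maxdays += 0 }
    { label[NR] = $1; age[NR] = $2 + 0; size[NR] = $3 + 0; total += $3 }
    END {
      for (i = 1; i <= NR; i++) {
        too_old = (maxdays > 0 && age[i] > maxdays)   # age limit exceeded
        too_big = (maxsize > 0 && total > maxsize)    # size limit exceeded
        if (!too_old && !too_big) break               # both limits satisfied: stop
        print label[i]                                # remove this (oldest) bucket
        total -= size[i]
      }
    }'
}
```

For example, <code>printf 'h1 40 100\nh2 10 100\n' | plan_cleanup 0 30</code> prints only <code>h1</code>: it is the only bucket older than 30 days, and no size limit is set.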


The following options are available:
{| class="wikitable"
|-
! Parameter !! Default Value !! Description
|-
| <code>maxpoolsize</code> || <code>102400</code> (100 GB) || The total maximum disk space for '''all''' captured data (SIP, RTP, GRAPH, AUDIO).
|-
| <code>maxpooldays</code> || (unset) || The maximum number of days to keep '''all''' captured data.
|-
| <code>maxpoolsipsize</code> || (unset) || A specific size limit for SIP PCAP files only.
|-
| <code>maxpoolsipdays</code> || (unset) || A specific age limit for SIP PCAP files only.
|-
| <code>maxpoolrtpsize</code> || (unset) || A specific size limit for RTP PCAP files only.
|-
| <code>maxpoolrtpdays</code> || (unset) || A specific age limit for RTP PCAP files only.
|-
| <code>maxpoolgraphsize</code> || (unset) || A specific size limit for GRAPH files only.
|-
| <code>maxpoolgraphdays</code> || (unset) || A specific age limit for GRAPH files only.
|-
| <code>maxpoolaudiosize</code> || (unset) || A specific size limit for converted audio files (WAV/OGG) only.
|-
| <code>maxpoolaudiodays</code> || (unset) || A specific age limit for converted audio files (WAV/OGG) only.
|}
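Because the limits can be combined, you can, for example, cap the whole spool by size while keeping SIP PCAPs much longer than RTP PCAPs, which are typically the largest files. The fragment below is only an illustration; the values are arbitrary and should be sized to your own disk and retention needs.

```ini
# Illustrative values only. Sizes are in MB, ages in days.
# Cap the whole spool at 500 GB (500 * 1024 MB).
maxpoolsize    = 512000
# Keep SIP PCAPs for up to 90 days...
maxpoolsipdays = 90
# ...but RTP PCAPs for only 14 days.
maxpoolrtpdays = 14
```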


=== Maintenance: Re-indexing the Spool Directory ===
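VoIPmonitor maintains an index of all created PCAP files so that cleaning can delete them without scanning the entire directory tree. Files are indexed in hourly intervals (by default in <code>/var/spool/voipmonitor/filesindex/</code>), and their sizes are aggregated in the <code>files</code> table in the MySQL database. If this index becomes corrupt or is deleted (including the <code>files</code> table), or if you manually move files into the spool, old data will not be deleted correctly, because the cleaning procedure only removes files that appear in the index.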
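The following toy script illustrates why an index makes cleaning cheap and why files missing from it are never removed. It is a sketch of the idea only: it assumes a hypothetical index format with one file path per line, which is not necessarily how VoIPmonitor's real index files are laid out, and <code>delete_indexed_hour</code> is a hypothetical name.

```shell
#!/usr/bin/env bash
# Toy illustration of index-driven cleaning (not VoIPmonitor code).
# Assumption: an index file lists one absolute file path per line.
delete_indexed_hour() {
  local index=$1 path
  # Remove exactly the files named in the index; no directory scan is needed.
  while IFS= read -r path; do
    if [ -n "$path" ]; then
      rm -f -- "$path"
    fi
  done < "$index"
  # The index entry has been consumed, so drop it too.
  rm -f -- "$index"
}
```

A file copied into the spool by hand never appears in any index, so a loop like this never touches it; that is the situation the <code>reindexfiles</code> command repairs.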


If old files are not being deleted, trigger a manual re-index through the sniffer's manager API:
# '''Open a manager API session:'''
#:<pre>echo 'manager_file start /tmp/vmsck' | nc 127.0.0.1 5029</pre>
# '''Send the re-index command:'''
#:<pre>echo reindexfiles | nc -U /tmp/vmsck</pre>
''Note: The second command requires <code>netcat</code> with support for UNIX sockets (<code>-U</code>). For alternative methods, see the [[Encryption_in_manager_api_customer|Manager API documentation]].''
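If you re-index regularly or from a script, the two steps above can be wrapped in a shell function. This is a sketch that simply runs the same two commands in sequence with the same defaults (port <code>5029</code>, socket <code>/tmp/vmsck</code>); it does not wait or retry if the socket is not ready yet, and <code>vm_reindex</code> is a hypothetical name. It has to run on the sensor host, because the UNIX socket is created on the local filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two manager API steps.
# Must run on the sensor host: the UNIX socket is created on the local filesystem.
# Usage: vm_reindex [PORT] [SOCKET]
vm_reindex() {
  local port=${1:-5029} sock=${2:-/tmp/vmsck}
  # Step 1: ask the sniffer to open a manager session on a UNIX socket.
  echo "manager_file start $sock" | nc 127.0.0.1 "$port" || return 1
  # Step 2: send the re-index command through that socket (needs nc with -U).
  echo reindexfiles | nc -U "$sock"
}
```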


== 2. Database Cleaning (CDR Retention) ==
Managing the size of the <code>cdr</code> table and other large tables is critical for GUI performance.


=== The Modern Method: Partitioning (Recommended) ===
Since version 7, VoIPmonitor uses '''database partitioning''', which splits large tables into smaller daily segments. This is the recommended method for managing database retention.
* '''How it works:''' You set a single parameter, <code>cleandatabase</code>, in <code>voipmonitor.conf</code>. It defines the number of days of CDRs to keep; for example, <code>cleandatabase = 30</code> keeps the last 30 days of data.
* '''Why it's better:''' The sniffer automatically drops old daily partitions. Dropping a partition typically completes in milliseconds regardless of how many millions of rows it contains, so it puts negligible load on the database.
* '''Requirement:''' Partitioning is enabled by default on all new installations. If you are upgrading from a very old version, you may need to start with a fresh database.


More details can be found in the [[Sniffer_configuration#cleandatabase|Sniffer Configuration guide]].
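To find out which of the two methods applies to you, you can check whether the <code>cdr</code> table is partitioned. In MySQL and MariaDB, <code>information_schema.PARTITIONS</code> has one row per partition, and a non-partitioned table shows a single row whose <code>PARTITION_NAME</code> is NULL. The sketch assumes the <code>mysql</code> client can log in without extra options (for example via <code>~/.my.cnf</code>) and that the database is named <code>voipmonitor</code>; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the database name "voipmonitor" is an assumption.

# Print the number of partitions of the cdr table (0 = not partitioned).
cdr_partition_count() {
  local db=${1:-voipmonitor}
  mysql -N -B -e "SELECT COUNT(*) FROM information_schema.PARTITIONS
    WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'cdr'
      AND PARTITION_NAME IS NOT NULL" "$db"
}

# Exit status: 0 if partitioned, 1 if not, 2 if the query failed.
cdr_is_partitioned() {
  local n
  n=$(cdr_partition_count "$@") || return 2
  [ "${n:-0}" -gt 0 ] && return 0
  return 1
}
```

For example, <code>if cdr_is_partitioned; then echo 'use cleandatabase'; else echo 'legacy cleanup needed'; fi</code>.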
 
=== The Legacy Method: Manual Deletion (Not Recommended) ===
If you are running a very old, non-partitioned database, you cannot use the <code>cleandatabase</code> option. Instead, you need a custom script that runs a <code>DELETE FROM cdr WHERE calldate < ...</code> query.
* '''Warning:''' This method is extremely slow and resource-intensive on large tables, because every deleted row must also be removed from the table's indexes. A single <code>DELETE</code> over millions of rows can take hours and generate significant I/O load on the database server, which can degrade GUI performance.
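If you are stuck with the legacy method, the custom script can at least avoid one huge <code>DELETE</code> by removing old rows in small batches with MySQL's <code>DELETE ... LIMIT</code> and pausing between batches. Batching is a common mitigation rather than a VoIPmonitor feature: it keeps each statement short, but the total work stays the same. This sketch assumes the database is named <code>voipmonitor</code>, that the <code>mysql</code> client logs in via <code>~/.my.cnf</code>, and a batch size of 10000; it touches only the <code>cdr</code> table, and <code>purge_old_cdr</code> is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical legacy cleanup for a NON-partitioned cdr table.
# Usage: purge_old_cdr DAYS [DB] [BATCH]
purge_old_cdr() {
  local days=${1:-0} db=${2:-voipmonitor} batch=${3:-10000} deleted
  [ "$days" -gt 0 ] || return 1     # refuse to run without a positive retention
  while :; do
    # Delete one batch; ROW_COUNT() returns the rows removed by that DELETE
    # (both statements run in the same client session).
    deleted=$(mysql -N -B -e "DELETE FROM cdr WHERE calldate < NOW() - INTERVAL $days DAY LIMIT $batch; SELECT ROW_COUNT();" "$db") || return 2
    [ "${deleted:-0}" -gt 0 ] || break   # nothing left to delete
    sleep 1                              # brief pause to let other queries through
  done
}
```

Run it off-peak, for example <code>purge_old_cdr 30</code> to remove CDRs older than 30 days in batches of 10000 rows.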
 
== AI Summary for RAG ==
'''Summary:''' This article explains the two distinct data retention mechanisms in VoIPmonitor: filesystem cleaning for PCAP files and database cleaning for CDRs. For filesystem storage in the <code>spooldir</code>, it details the various <code>maxpoolsize</code> and <code>maxpooldays</code> configuration options that control the retention of SIP, RTP, and other files based on size or age. It also describes the <code>reindexfiles</code> manager command for troubleshooting cases where old files are not being deleted. For database retention, it strongly recommends the modern, default method of using MySQL/MariaDB partitioning combined with the <code>cleandatabase</code> setting in <code>voipmonitor.conf</code>, explaining that this allows for instantaneous deletion of old data. It contrasts this with the slow and inefficient legacy method of using <code>DELETE</code> queries on non-partitioned tables.

'''Keywords:''' data retention, cleaning, delete old calls, purge data, disk space, spooldir, maxpoolsize, maxpooldays, pcap, filesystem, database, cdr, cleandatabase, partitioning, reindexfiles, manager api

'''Key Questions:'''
* How do I automatically delete old PCAP files to free up disk space?
* What is the difference between <code>maxpoolsize</code> and <code>maxpooldays</code>?
* My spool directory is full, but old files are not being deleted. How do I fix it?
* How do I automatically delete old CDRs from the database?
* What is the <code>cleandatabase</code> option and how does it work?
* Why is database partitioning important for VoIPmonitor?
* What is the best way to manage data retention in VoIPmonitor?

Latest revision as of 09:56, 30 June 2025
