Syslog Status Line
Every 10 seconds, VoIPmonitor outputs a status line to syslog containing real-time metrics about calls, CPU usage, memory, and queue sizes. This page explains each metric and what to do when values indicate problems.
Overview
The status line is logged to:
- Debian/Ubuntu: /var/log/syslog
- CentOS/RHEL: /var/log/messages
Example status line:
Nov 26 07:22:06 voipmonitor voipmonitor[2518]: calls[424][424] PS[C:4 S:41/41 R:13540 A:13583] SQLq[C:0 M:0 Cl:0] heap[0|0|0] comp[48] [25.6Mb/s] t0CPU[7.0%] t1CPU[2.5%] t2CPU[1.6%] tacCPU[8.0|8.0|6.8|6.9%] RSS/VSZ[365|1640]MB
Monitor in real-time:
journalctl -u voipmonitor -f
# or
tail -f /var/log/syslog | grep voipmonitor
Metric Reference
calls[X][Y] - Active Calls
calls[424][424]
│ └── Total calls in memory (including finishing)
└─────── Active calls (in progress)
| Value | Meaning | Action |
|---|---|---|
| Both equal | Normal operation | None |
| Second > First | Calls finishing, cleanup pending | Normal |
| Very high numbers | High traffic or stuck calls | Check for call leaks |
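To trend the active-call count over time (for example for capacity planning), the first bracket can be pulled out of syslog with standard tools. A minimal sketch, assuming the default Debian/Ubuntu log path:
# Print active and total in-memory call counts as each status line arrives
tail -F /var/log/syslog | grep --line-buffered 'voipmonitor.*calls\[' | \
  sed -u 's/.*calls\[\([0-9]*\)\]\[\([0-9]*\)\].*/active=\1 total=\2/'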
PS[C:x S:y/z R:a A:b] - Packet Statistics
PS[C:4 S:41/41 R:13540 A:13583]
│ │ │ │ └── A: All packets processed
│ │ │ └────────── R: RTP packets
│ │ └─────────────── S: SIP packets (current/total)
│ └─────────────────── S: SIP packets per interval
└─────────────────────── C: Control packets
💡 Tip: If R (RTP) is 0 but S (SIP) shows traffic, check that RTP ports are included in capture or NAT aliases are configured.
SQLq[C:x M:y Cl:z] - SQL Queue
Critical metric for database health.
SQLq[C:0 M:0 Cl:0]
│ │ └── Cl: Cleanup queries pending
│ └─────── M: Message/register queries pending
└─────────── C: CDR queries pending
| Value | Status | Action |
|---|---|---|
| All zeros | Healthy - DB keeping up | None |
| Growing slowly | Warning - DB slightly behind | Monitor trend |
| Growing continuously | Critical - DB bottleneck | See SQL Queue Growing |
| Stuck at high value | Critical - DB connection issue | Check MySQL connectivity |
Alternative format (older versions):
SQLq[cdr: 5234] SQLf[cdr: 12]
└── In-memory queue └── On-disk queue (query_cache files)
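A slowly growing C value is easier to spot as a trend than from a single sample. A minimal sketch that timestamps the pending CDR queue depth from each new status line (default Debian/Ubuntu log path assumed):
# A steadily rising number here means the database is falling behind
tail -F /var/log/syslog | grep --line-buffered -o 'SQLq\[C:[0-9]*' | \
  while read -r q; do echo "$(date '+%H:%M:%S') pending CDR queries: ${q##*:}"; done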
heap[A|B|C] - Memory Buffers
heap[0|0|0]
│ │ └── C: Processing heap usage %
│ └──── B: Secondary buffer usage %
└────── A: Primary packet buffer usage %
| Value | Status | Action |
|---|---|---|
| All < 20% | Healthy | None |
| Any > 50% | Warning - Buffer filling | Investigate bottleneck |
| Any approaching 100% | Critical | Increase max_buffer_mem or fix bottleneck |
⚠️ Warning: If heap reaches 100%, you'll see PACKETBUFFER: MEMORY IS FULL and packets will be dropped.
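To get an early warning before the buffer fills completely, the three heap percentages can be checked against the 50% threshold from the table above. A minimal sketch, assuming integer heap values and the default Debian/Ubuntu log path:
# Print a warning whenever any heap[] percentage exceeds 50
tail -F /var/log/syslog | grep --line-buffered -o 'heap\[[0-9|]*\]' | \
  while IFS='[]|' read -r _ a b c _; do
    for v in "$a" "$b" "$c"; do
      [ "$v" -gt 50 ] && echo "$(date '+%H:%M:%S') heap buffer at ${v}% - investigate the bottleneck"
    done
  done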
comp[x] - Compression Threads
comp[48]
└── Number of active compression threads for PCAP/audio files
High values indicate high CPU usage from PCAP/audio compression.
[X.X Mb/s] - Traffic Rate
[25.6Mb/s]
└── Current network traffic rate being processed
Useful for capacity planning and detecting traffic spikes.
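For capacity planning it helps to keep a simple history of this rate. A minimal sketch that appends timestamped samples to a CSV (the output path /var/log/vm_traffic.csv is only an example, choose your own):
# Append one "epoch,Mb/s" sample per status line for later graphing
tail -F /var/log/syslog | grep --line-buffered -o '[0-9.]*Mb/s' | \
  while read -r rate; do echo "$(date +%s),${rate%Mb/s}" >> /var/log/vm_traffic.csv; done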
IO[...] - Disk I/O Monitoring (v2026.01.3+)
[283.4/283.4Mb/s] IO[B1.1|L0.7|Q3.2|U45|C75|W125|R10|WI1.2k|RI0.5k]
│ │ │ │ │ │ │ │ └── RI: Read IOPS (k=thousands)
│ │ │ │ │ │ │ └───────── WI: Write IOPS (k=thousands)
│ │ │ │ │ │ └───────────── R: Read throughput MB/s
│ │ │ │ │ └────────────────── W: Write throughput MB/s
│ │ │ │ └────────────────────── C: Capacity % (current/knee throughput)
│ │ │ └────────────────────────── U: Disk utilization %
│ │ └─────────────────────────────── Q: I/O queue depth
│ └──────────────────────────────────── L: Current write latency (ms)
└───────────────────────────────────────── B: Baseline latency (calibrated, ms)
Key concept: The Capacity (C) metric shows what percentage of the disk's sustainable throughput ("knee point") is currently being used. This is the primary indicator for disk saturation risk.
| Field | Description | Source |
|---|---|---|
| B | Baseline latency from calibration profile (ms) | Measured at minimal I/O load |
| L | Current write latency (ms) | Calculated from /proc/diskstats |
| Q | I/O queue depth (pending operations) | /proc/diskstats weighted_io_time |
| U | Disk utilization % (time spent doing I/O) | /proc/diskstats io_time |
| C | Capacity % = current throughput / knee throughput × 100 | Compared to calibration profile |
| W | Write throughput (MB/s) | /proc/diskstats sectors_written |
| R | Read throughput (MB/s) | /proc/diskstats sectors_read |
| WI | Write IOPS (always shown with 'k' suffix) | /proc/diskstats writes_completed |
| RI | Read IOPS (always shown with 'k' suffix) | /proc/diskstats reads_completed |
Status Indicators
| Status | Condition | Meaning | Action |
|---|---|---|---|
| (none) | C < 80% | Normal operation | None required |
| WARN | C ≥ 80% | Approaching disk capacity limit | Monitor closely, plan storage upgrade |
| DISK_SAT | C ≥ 95% OR L ≥ 3×B | Disk cannot keep up with writes | Immediate action required - see I/O Bottleneck Solutions |
Example with saturation warning:
[450.2/450.2Mb/s] IO[B1.1|L8.5|Q45.2|U98|C97|W890|R5|WI12.5k|RI0.1k] DISK_SAT
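These status flags are convenient hooks for alerting. A minimal sketch that surfaces every WARN or DISK_SAT status line (default Debian/Ubuntu log path; wire the output into your own alerting as needed):
# Each matching line is already timestamped by syslog
tail -F /var/log/syslog | grep --line-buffered -E 'IO\[[^]]*\] +(WARN|DISK_SAT)'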
Calibration
On first startup, VoIPmonitor performs automatic disk calibration (~2-3 minutes, runs in background):
- Measures baseline latency with minimal I/O
- Progressively increases write load to find the "knee point" (where latency starts increasing)
- Stores profile in .disk_io_calibration.conf in the spool directory
During calibration:
IO[calibrating 45%]
After calibration:
IO[B1.1|L0.7|Q1.2|U12|C8|W65|R2|WI0.8k|RI0.1k]
Recalibration
Calibration is automatically triggered when:
- No calibration profile exists
- Filesystem UUID changes (disk replaced)
To force recalibration, delete the profile:
rm /var/spool/voipmonitor/.disk_io_calibration.conf
systemctl restart voipmonitor
💡 Tip: The IO[] metrics are written to RRD for graphing. The chart 2db-diskio contains 8 metrics: io-latency, io-qdepth, io-util, io-capacity, io-write-throughput, io-read-throughput, io-write-iops, io-read-iops.
t0CPU[X%] - Packet Capture Thread
Most critical CPU metric.
t0CPU[7.0%]
└── CPU usage of the main packet capture thread
| Value | Status | Action |
|---|---|---|
| < 50% | Healthy | None |
| 50-80% | Warning | Plan capacity upgrade |
| > 90% | Critical | Packets will be dropped! |
⚠️ Warning: The t0 thread cannot be parallelized. If it hits 100%, you must reduce load (filters, disable features) or use kernel bypass (DPDK, Napatech).
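Because a saturated capture thread means silent packet loss, it is worth flagging high t0CPU values automatically. A minimal sketch using GNU grep and awk on the default Debian/Ubuntu log path:
# Print a warning whenever the capture thread exceeds 90% CPU
tail -F /var/log/syslog | grep --line-buffered -o 't0CPU\[[0-9.]*%\]' | \
  awk -F'[][%]' '$2 > 90 { print "t0CPU at " $2 "% - capture thread near saturation, packets may be dropped" }'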
t1CPU[X%], t2CPU[X%] - Processing Threads
t1CPU[2.5%] t2CPU[1.6%]
Secondary processing threads. These can scale with traffic automatically.
t2CPU Detailed Breakdown
When t2CPU shows high usage, the detailed breakdown helps identify the bottleneck:
t2CPU[pb:10.5/ d:39.2/ s:24.6/ e:17.3/ c:6.8/ g:6.4/ r:7.3/ rm:24.6/ rh:16.7/ rd:19.3/]
| Code | Function | Description |
|---|---|---|
| pb | Packet buffer | Output from packet buffer |
| d | Dispatch | Creating structures for processing |
| s | SIP parsing | Parsing SIP headers |
| e | Entity lookup | Finding/creating calls |
| c | Call processing | Processing call packets |
| g | Register processing | Processing REGISTER packets |
| r | RTP processing | Processing RTP packets |
| rm | RTP move | Moving RTP packets for processing |
| rh | RTP hash | RTP hash table lookups |
| rd | RTP dispatch | Dispatching to RTP read queue |
Thread auto-scaling:
- If d > 50% → s thread starts (SIP parsing)
- If s > 50% → e thread starts (entity lookup)
- If e > 50% → c, g, r threads start
tacCPU[A|B|C|D%] - TAR Compression
tacCPU[8.0|8.0|6.8|6.9%]
└── CPU usage of TAR archive compression threads
High values indicate heavy PCAP archiving. Controlled by tar_maxthreads (default: 8).
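If archiving regularly keeps all TAR threads busy and spare CPU cores are available, the thread count can be raised. A hedged sketch for /etc/voipmonitor.conf (the value 16 is only an example; keep the default unless archiving is a confirmed bottleneck):
# /etc/voipmonitor.conf
tar_maxthreads = 16    # more parallel TAR compression threads (default: 8)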
RSS/VSZ[X|Y]MB - Memory Usage
RSS/VSZ[365|1640]MB
│ └── VSZ: Virtual memory (pre-allocated, not all used)
└─────── RSS: Resident Set Size (actual physical memory used)
| Observation | Meaning | Action |
|---|---|---|
| RSS growing continuously | Possible memory leak | Investigate |
| High VSZ, normal RSS | Normal - virtual pre-allocation | None |
| RSS approaching server RAM | OOM risk | Reduce buffers or add RAM |
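To compare the sniffer's resident memory against what the server actually has, standard procps tools are enough. A minimal sketch (assumes a single voipmonitor process; adjust if you run multiple sensors):
# Sniffer memory vs. total system RAM, both in MB
ps -o rss=,vsz= -p "$(pgrep -o -x voipmonitor)" | \
  awk '{printf "RSS: %d MB  VSZ: %d MB\n", $1/1024, $2/1024}'
free -m | awk '/^Mem:/ {print "RAM: " $2 " MB total, " $7 " MB available"}'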
Common Issues
SQL Queue Growing
Symptoms: SQLq[C:5234...] increasing over time
Diagnosis:
# Check MySQL CPU
top -p $(pgrep mysqld)
# Check disk I/O
iostat -x 1 5
Solutions:
| Bottleneck | Solution |
|---|---|
| I/O (high iowait) | Upgrade to SSD/NVMe storage |
| CPU (mysqld at 100%) | Upgrade CPU or optimize queries |
| RAM (swapping) | Increase innodb_buffer_pool_size |
Immediate mitigations:
# /etc/voipmonitor.conf
query_cache = yes # Prevent OOM by using disk queue
mysqlstore_max_threads_cdr = 8 # Parallel DB writes
See SQL_queue_is_growing_in_a_peaktime and Database_troubleshooting for detailed guidance.
High t0CPU
Symptoms: t0CPU[>90%]
Solutions (in order of preference):
- Use interface_ip_filter instead of a BPF filter (see the sketch after this list)
- Disable unused features (jitterbuffer if not needed)
- Upgrade to kernel bypass: DPDK or Napatech
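A sketch of the first option. The interface_ip_filter value syntax below is an assumption for illustration only - check the sniffer configuration reference for the exact format supported by your version:
# /etc/voipmonitor.conf
# Assumed syntax - restrict capture to the VoIP subnet instead of a BPF expression
interface_ip_filter = 10.0.0.0/8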
PACKETBUFFER FULL
Symptoms: heap[90|80|70] approaching 100%, or log message "PACKETBUFFER: MEMORY IS FULL"
⚠️ Warning: This refers to VoIPmonitor's internal packet buffer, NOT system RAM. The root cause is always a downstream bottleneck (disk I/O, CPU, or database) - simply increasing buffer sizes will only delay the problem.
Diagnosis and Solutions: See PACKETBUFFER Saturation Troubleshooting for comprehensive diagnostic steps including:
- How to identify I/O vs CPU bottleneck using syslog metrics
- Linux diagnostic tools (iostat, iotop, ioping)
- Decision matrix for identifying root cause
- Solutions for I/O, CPU, and database bottlenecks
Accessing Status Programmatically
Via Manager API
The sniffer_stat command returns JSON including pbStatString:
echo 'sniffer_stat' | nc -U /tmp/vm_manager_socket | jq '.pbStatString'
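To poll the same information on demand instead of tailing syslog, the call can simply be repeated in a loop, for example:
# Print the current status string every 10 seconds via the manager socket
while true; do
  echo 'sniffer_stat' | nc -U /tmp/vm_manager_socket | jq -r '.pbStatString'
  sleep 10
done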
See Manager_API for connection details.
Via GUI
Navigate to Settings → Sensors → Status to see real-time metrics for all connected sensors.
See Also
- Manager_API - Query status programmatically
- Sniffer_detailed_architecture - Internal architecture overview
- SQL_queue_is_growing_in_a_peaktime - Database bottleneck troubleshooting
- Scaling - Performance tuning guide
- Database_troubleshooting - MySQL optimization