Whisper

From VoIPmonitor.org
Revision as of 16:48, 8 January 2026 by Admin (Rewrite: consolidated structure, added tables for parameters and engines, streamlined content)


Integrate OpenAI's Whisper ASR with VoIPmonitor for on-demand or automatic call transcription.

Overview

VoIPmonitor supports Whisper, a speech recognition system trained on 680,000 hours of multilingual data. Two integration modes are available:

Mode        Location    Use Case
On-Demand   GUI server  User clicks "Transcribe" on individual calls
Automatic   Sensor      All calls transcribed automatically after ending

Whisper Engines

Engine                   Pros                                                  Cons                     Recommended For
whisper.cpp (C++)        Fast, low resource usage, CUDA support (30x speedup)  Requires compilation     Server-side processing
OpenAI Whisper (Python)  Easy install (pip install)                            Slower, requires ffmpeg  Quick testing

💡 Tip: Use whisper.cpp for production deployments. It's significantly faster and supports GPU acceleration.

Quick Start: GUI On-Demand (No Compilation)

The simplest setup: download a pre-built model and start transcribing immediately.

# Download model to GUI bin directory
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/html/bin/ggml-base.bin

# Set ownership (Debian/Ubuntu)
chown www-data:www-data /var/www/html/bin/ggml-base.bin

# For RedHat/CentOS, use: chown apache:apache

The "Transcribe" button now appears on call detail pages. No configuration changes needed.
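Before relying on the button, you can sanity-check that the download produced a real model file rather than a truncated file or an HTML error page. whisper.cpp model files begin with a four-byte GGML magic, stored on disk as the characters lmgg (this magic value is an assumption based on the GGML file format; verify against your whisper.cpp version). A quick check, simulated here on a scratch file:

```shell
# On a real system, point MODEL at the downloaded file instead:
#   MODEL=/var/www/html/bin/ggml-base.bin
MODEL=/tmp/ggml-demo.bin
printf 'lmgg' > "$MODEL"   # simulate the GGML magic bytes for this demo

# Read the first four bytes and compare against the expected magic
if [ "$(head -c 4 "$MODEL")" = "lmgg" ]; then
  echo "header looks like a GGML model"
else
  echo "not a GGML model: download may be truncated or an error page"
fi
```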

GUI On-Demand: Advanced Setup

Use this route if you need a custom model path or want the Python engine.

Option 1: whisper.cpp with Custom Model

# Compile whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp && make -j

# Download model
./models/download-ggml-model.sh base.en

Configure /var/www/html/config/configuration.php:

define('WHISPER_NATIVE', true);
define('WHISPER_MODEL', '/path/to/whisper.cpp/models/ggml-base.en.bin');
define('WHISPER_THREADS', 4);  // Optional

Option 2: OpenAI Whisper (Python)

pip install openai-whisper
apt install ffmpeg  # or dnf install ffmpeg

Configure /var/www/html/config/configuration.php:

define('WHISPER_MODEL', '/opt/whisper_models/small.pt');
define('WHISPER_THREADS', 4);

Automatic Transcription (Sniffer)

Transcribe all calls automatically on the sensor after they end.

Basic Configuration

Edit /etc/voipmonitor.conf:

# Enable transcription
audio_transcribe = yes

# Using whisper.cpp (recommended)
whisper_native = yes
whisper_model = /path/to/whisper.cpp/models/ggml-small.bin

# OR using Python (slower)
# whisper_native = no
# whisper_model = small

Restart: systemctl restart voipmonitor
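After the restart you can confirm the service came back up and watch for transcription activity (the log location and wording are assumptions; exact messages vary by version and syslog setup):

```
systemctl status voipmonitor
grep -i whisper /var/log/syslog
```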

Configuration Parameters

Parameter                               Default  Description
audio_transcribe                        no       Enable/disable transcription
audio_transcribe_connect_duration_min   10       Minimum call duration (seconds) to transcribe
audio_transcribe_threads                2        Concurrent transcription jobs
audio_transcribe_queue_length_max       100      Max queue size
whisper_native                          no       Use whisper.cpp (yes) or Python (no)
whisper_model                           small    Model name (Python) or absolute path to .bin file (whisper.cpp)
whisper_language                        auto     Language code (en, de), auto, or by_number
whisper_threads                         2        CPU threads per transcription job
whisper_timeout                         300      Timeout in seconds (Python only)
whisper_deterministic_mode              yes      Consistent results (Python only)
whisper_python                          -        Custom Python binary path (Python only)
whisper_native_lib                      -        Path to libwhisper.so (advanced)
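Putting the table together, a worked /etc/voipmonitor.conf fragment for a multi-core sensor might look like this (the thread counts and model path are illustrative; size them to your hardware):

```
audio_transcribe = yes
# skip calls shorter than 30 seconds
audio_transcribe_connect_duration_min = 30
# up to 4 calls transcribed concurrently, 4 CPU threads each
audio_transcribe_threads = 4
whisper_threads = 4
audio_transcribe_queue_length_max = 200
# whisper.cpp engine with an absolute model path
whisper_native = yes
whisper_model = /opt/whisper.cpp/models/ggml-small.bin
whisper_language = auto
```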

Advanced: CUDA GPU Acceleration

Compile whisper.cpp with NVIDIA CUDA for up to 30x speedup.

# Install CUDA toolkit (see nvidia.com/cuda-downloads)
# Add to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Compile with CUDA
cd /path/to/whisper.cpp
make clean
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
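To confirm the resulting library was actually linked against CUDA, inspect its shared-library dependencies (the exact dependency names, e.g. libcublas, depend on your CUDA version):

```
ldd libwhisper.so | grep -i -E 'cuda|cublas'
```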

Advanced: Loadable Module

Build whisper.cpp as a shared library that the sniffer loads at runtime, so you can update Whisper without recompiling the sniffer:

# Build libraries
cd /path/to/whisper.cpp
make libwhisper.so -j
make libwhisper.a -j

# Optional: Install system-wide
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so

Configure in voipmonitor.conf:

whisper_native_lib = /path/to/whisper.cpp/libwhisper.so

Troubleshooting

Model Download Fails

Test connectivity:

curl -I https://download.voipmonitor.org/whisper/ggml-base.bin

If blocked:

  • Check firewall: iptables -L -v -n, ufw status
  • Check proxy: Set HTTP_PROXY / HTTPS_PROXY environment variables
  • Check DNS: nslookup download.voipmonitor.org

Workaround: Download manually on another machine and copy via SCP.
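For example (hostname and user are placeholders):

```
# on a machine with internet access
wget https://download.voipmonitor.org/whisper/ggml-base.bin

# copy to the GUI server and fix ownership (www-data on Debian/Ubuntu)
scp ggml-base.bin root@gui-server:/var/www/html/bin/ggml-base.bin
ssh root@gui-server chown www-data:www-data /var/www/html/bin/ggml-base.bin
```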

Testing from CLI

/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' \
  --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/ggml-small.bin"}]' \
  -v1,whisper

AI Summary for RAG

Summary: VoIPmonitor integrates Whisper ASR for call transcription via two modes: on-demand (GUI button) and automatic (sniffer background processing). Two engines available: whisper.cpp (C++, recommended, fast, CUDA support) and OpenAI Whisper (Python, easier install). Quick start: download pre-built model from https://download.voipmonitor.org/whisper/ggml-base.bin to /var/www/html/bin/, set ownership to www-data. Sniffer config: enable audio_transcribe=yes and whisper_native=yes with absolute path to model in whisper_model. Key parameters: audio_transcribe_connect_duration_min (min call length), whisper_threads (CPU threads), whisper_language (auto/code/by_number). CUDA acceleration available for whisper.cpp (30x speedup).

Keywords: whisper, transcription, asr, speech to text, openai, whisper.cpp, audio_transcribe, whisper_native, whisper_model, cuda, gpu, ggml-base.bin, libwhisper.so, automatic transcription, on-demand

Key Questions:

  • How do I enable call transcription in VoIPmonitor?
  • What is the quickest way to enable Whisper transcription?
  • How do I download the Whisper model for the GUI?
  • What is the difference between whisper.cpp and OpenAI Whisper?
  • How do I configure automatic transcription on the sniffer?
  • What parameters control Whisper transcription behavior?
  • How do I enable GPU acceleration for Whisper?
  • Why is the model download failing and how do I fix it?
  • How do I test Whisper transcription from the command line?