Whisper

From VoIPmonitor.org
{{DISPLAYTITLE:Call Transcription with Whisper AI}}


'''Integrate OpenAI's Whisper ASR with VoIPmonitor for on-demand or automatic call transcription.'''


== Overview ==
VoIPmonitor supports [https://openai.com/index/whisper/ Whisper], a speech recognition system trained on 680,000 hours of multilingual data. Two integration modes are available:
{| class="wikitable"
! Mode !! Location !! Use Case
|-
| '''On-Demand''' || GUI server || User clicks "Transcribe" on individual calls
|-
| '''Automatic''' || Sensor || All calls transcribed automatically after ending
|}


<kroki lang="mermaid">
%%{init: {'flowchart': {'nodeSpacing': 15, 'rankSpacing': 30}}}%%
flowchart LR
    subgraph "On-Demand (GUI)"
        A1[User clicks Transcribe] --> A2[GUI Server] --> A3[Result displayed]
    end
    subgraph "Automatic (Sniffer)"
        B1[Call ends] --> B2[Queued] --> B3[Transcribed] --> B4[Stored in DB]
    end
</kroki>


=== Whisper Engines ===
{| class="wikitable"
! Engine !! Pros !! Cons !! Recommended For
|-
| '''whisper.cpp''' (C++) || Fast, low resource usage, CUDA support (30x speedup) || Requires compilation || Server-side processing
|-
| '''OpenAI Whisper''' (Python) || Easy install (<code>pip install</code>) || Slower, requires ffmpeg || Quick testing
|}


{{Tip|Use '''whisper.cpp''' for production deployments. It is significantly faster and supports GPU acceleration.}}
 
== Quick Start: GUI On-Demand (No Compilation) ==

The simplest setup: download a pre-built model and start transcribing immediately.

<syntaxhighlight lang="bash">
# Download model to GUI bin directory
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/html/bin/ggml-base.bin

# Set ownership (Debian/Ubuntu)
chown www-data:www-data /var/www/html/bin/ggml-base.bin

# For RedHat/CentOS, use: chown apache:apache
</syntaxhighlight>

The "Transcribe" button now appears on call detail pages. No configuration changes are needed.

== GUI On-Demand: Advanced Setup ==

For custom model paths or using the Python engine.

=== Option 1: whisper.cpp with Custom Model ===

<syntaxhighlight lang="bash">
# Compile whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp && make -j

# Download model
./models/download-ggml-model.sh base.en
</syntaxhighlight>


Configure <code>/var/www/html/config/configuration.php</code>:

<syntaxhighlight lang="php">
define('WHISPER_NATIVE', true);
define('WHISPER_MODEL', '/path/to/whisper.cpp/models/ggml-base.en.bin');
define('WHISPER_THREADS', 4);  // Optional
</syntaxhighlight>

=== Option 2: OpenAI Whisper (Python) ===

<syntaxhighlight lang="bash">
pip install openai-whisper
apt install ffmpeg  # or dnf install ffmpeg
</syntaxhighlight>

Configure <code>/var/www/html/config/configuration.php</code>:

<syntaxhighlight lang="php">
define('WHISPER_MODEL', '/opt/whisper_models/small.pt');
define('WHISPER_THREADS', 4);
</syntaxhighlight>
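After the <code>pip install</code> above, you can sanity-check the Python engine's prerequisites from the shell. This is a quick check, not part of the official setup; it assumes <code>python3</code> is the interpreter the GUI will invoke.

<syntaxhighlight lang="bash">
# Both checks should succeed before the GUI can use the Python engine
python3 -c "import whisper" && echo "whisper module OK"
ffmpeg -version | head -n 1
</syntaxhighlight>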
 
== Automatic Transcription (Sniffer) ==
 
Transcribe all calls automatically on the sensor after they end.
 
=== Basic Configuration ===
 
Edit <code>/etc/voipmonitor.conf</code>:
 
<syntaxhighlight lang="ini">
# Enable transcription
audio_transcribe = yes


# Using whisper.cpp (recommended)
whisper_native = yes
whisper_model = /path/to/whisper.cpp/models/ggml-small.bin


# OR using Python (slower)
# whisper_native = no
# whisper_model = small
</syntaxhighlight>

Restart the sensor to apply: <code>systemctl restart voipmonitor</code>
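Not every call is transcribed: by default the sniffer only queues calls that were connected for at least 10 seconds (<code>audio_transcribe_connect_duration_min</code>). The gate amounts to this hypothetical helper, a sketch of the logic rather than actual VoIPmonitor code:

<syntaxhighlight lang="bash">
# Hypothetical helper mirroring the sniffer's duration gate:
# a call is queued only if its connected time reaches the configured minimum.
should_transcribe() {
  local connect_duration=$1 min=${2:-10}
  [ "$connect_duration" -ge "$min" ]
}

should_transcribe 15 10 && echo "queued"   # 15 s call: queued
should_transcribe 5 10  || echo "skipped"  # 5 s call: below the minimum
</syntaxhighlight>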


=== Configuration Parameters ===
{| class="wikitable"
! Parameter !! Default !! Description
|-
| <code>audio_transcribe</code> || no || Enable/disable transcription
|-
| <code>audio_transcribe_connect_duration_min</code> || 10 || Minimum call duration (seconds) to transcribe
|-
| <code>audio_transcribe_threads</code> || 2 || Concurrent transcription jobs
|-
| <code>audio_transcribe_queue_length_max</code> || 100 || Max queue size
|-
| <code>whisper_native</code> || no || Use whisper.cpp (<code>yes</code>) or Python (<code>no</code>)
|-
| <code>whisper_model</code> || small || Model name (Python) or '''absolute path''' to .bin file (whisper.cpp)
|-
| <code>whisper_language</code> || auto || Language code (<code>en</code>, <code>de</code>), <code>auto</code>, or <code>by_number</code>
|-
| <code>whisper_threads</code> || 2 || CPU threads per transcription job
|-
| <code>whisper_timeout</code> || 300 || Timeout in seconds (Python only)
|-
| <code>whisper_deterministic_mode</code> || yes || Consistent results (Python only)
|-
| <code>whisper_python</code> || - || Custom Python binary path (Python only)
|-
| <code>whisper_native_lib</code> || - || Path to libwhisper.so (advanced)
|}
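As a worked example, a sniffer tuned to skip short calls and parallelize heavier jobs might combine these parameters as follows. The values are illustrative, not recommendations; size them to your call volume and CPU count.

<syntaxhighlight lang="ini">
audio_transcribe = yes
whisper_native = yes
whisper_model = /opt/whisper.cpp/models/ggml-small.bin

# Skip calls connected for less than 30 seconds
audio_transcribe_connect_duration_min = 30

# Up to 4 calls transcribed concurrently, 4 CPU threads each (16 threads peak)
audio_transcribe_threads = 4
whisper_threads = 4

# Force English instead of auto-detection
whisper_language = en
</syntaxhighlight>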


== Advanced: CUDA GPU Acceleration ==

Compile whisper.cpp with NVIDIA CUDA support for up to a 30x speedup on a compatible GPU.


<syntaxhighlight lang="bash">
# Install CUDA toolkit (see nvidia.com/cuda-downloads)
# Add to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH


# Compile with CUDA
cd /path/to/whisper.cpp
make clean
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
</syntaxhighlight>

== Advanced: Loadable Module ==

Use whisper.cpp as a separate shared library, which lets you update the engine without recompiling the sniffer:

<syntaxhighlight lang="bash">
# Build libraries
cd /path/to/whisper.cpp
make libwhisper.so -j
make libwhisper.a -j


# Optional: Install system-wide
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so
</syntaxhighlight>

Configure in <code>voipmonitor.conf</code>:

<syntaxhighlight lang="ini">
whisper_native_lib = /path/to/whisper.cpp/libwhisper.so
</syntaxhighlight>

== Troubleshooting ==


=== Model Download Fails ===


Test connectivity:
<syntaxhighlight lang="bash">
curl -I https://download.voipmonitor.org/whisper/ggml-base.bin
</syntaxhighlight>


'''If blocked:'''
* Check firewall: <code>iptables -L -v -n</code>, <code>ufw status</code>
* Check proxy: Set <code>HTTP_PROXY</code> / <code>HTTPS_PROXY</code> environment variables
* Check DNS: <code>nslookup download.voipmonitor.org</code>


'''Workaround:''' Download the model manually on another machine and copy it over via SCP.
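For example (the hostname <code>gui-server</code> and the login user are placeholders; adjust to your environment):

<syntaxhighlight lang="bash">
# On a machine with internet access:
wget https://download.voipmonitor.org/whisper/ggml-base.bin

# Copy to the GUI server and fix ownership (Debian/Ubuntu web user shown):
scp ggml-base.bin root@gui-server:/var/www/html/bin/ggml-base.bin
ssh root@gui-server chown www-data:www-data /var/www/html/bin/ggml-base.bin
</syntaxhighlight>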


=== Testing from CLI ===


<syntaxhighlight lang="bash">
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' \
  --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/ggml-small.bin"}]' \
  -v1,whisper
</syntaxhighlight>
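The same CLI invocation can be scripted to batch-transcribe recordings. A sketch: the spool directory is an assumption (adjust to where your WAV files live), and the <code>{}</code> placeholder is kept exactly as in the command above.

<syntaxhighlight lang="bash">
# Batch-transcribe every WAV in a directory (illustrative paths)
for f in /var/spool/voipmonitor/audio/*.wav; do
  /var/www/html/bin/vm --audio-transcribe="$f {}" \
    --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/ggml-small.bin"}]'
done
</syntaxhighlight>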


== AI Summary for RAG ==
'''Summary:''' VoIPmonitor integrates Whisper ASR for call transcription via two modes: on-demand (GUI button) and automatic (background processing on the sniffer). Two engines are available: whisper.cpp (C++, recommended, fast, CUDA support) and OpenAI Whisper (Python, easier install). Quick start: download the pre-built model from <code>https://download.voipmonitor.org/whisper/ggml-base.bin</code> to <code>/var/www/html/bin/</code> and set ownership to www-data. Sniffer config: enable <code>audio_transcribe=yes</code> and <code>whisper_native=yes</code> with an absolute path to the model in <code>whisper_model</code>. Key parameters: <code>audio_transcribe_connect_duration_min</code> (minimum call length), <code>whisper_threads</code> (CPU threads), <code>whisper_language</code> (auto/code/by_number). CUDA acceleration is available for whisper.cpp (up to 30x speedup).

'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, audio_transcribe, whisper_native, whisper_model, cuda, gpu, ggml-base.bin, libwhisper.so, automatic transcription, on-demand
 
'''Key Questions:'''
* How do I enable call transcription in VoIPmonitor?
* What is the quickest way to enable Whisper transcription?
* How do I download the Whisper model for the GUI?
* What is the difference between whisper.cpp and OpenAI Whisper?
* How do I configure automatic transcription on the sniffer?
* What parameters control Whisper transcription behavior?
* How do I enable GPU acceleration for Whisper?
* Why is the model download failing and how do I fix it?
* How do I test Whisper transcription from the command line?

Latest revision as of 16:48, 8 January 2026

