{{DISPLAYTITLE:Call Transcription with Whisper AI}}
'''Integrate OpenAI's Whisper ASR with VoIPmonitor for on-demand or automatic call transcription.'''

== Overview ==
VoIPmonitor supports [https://openai.com/index/whisper/ Whisper], a speech recognition system trained on 680,000 hours of multilingual data. Two integration modes are available:

{| class="wikitable"
! Mode !! Location !! Use Case
|-
| '''On-Demand''' || GUI server || User clicks "Transcribe" on individual calls
|-
| '''Automatic''' || Sensor || All calls transcribed automatically after ending
|}

<kroki lang="mermaid">
%%{init: {'flowchart': {'nodeSpacing': 15, 'rankSpacing': 30}}}%%
flowchart LR
    subgraph "On-Demand (GUI)"
        A1[User clicks Transcribe] --> A2[GUI Server] --> A3[Result displayed]
    end
    subgraph "Automatic (Sniffer)"
        B1[Call ends] --> B2[Queued] --> B3[Transcribed] --> B4[Stored in DB]
    end
</kroki>
=== Whisper Engines ===
{| class="wikitable"
! Engine !! Pros !! Cons !! Recommended For
|-
| '''whisper.cpp''' (C++) || Fast, low resource usage, CUDA support (30x speedup) || Requires compilation || Server-side processing
|-
| '''OpenAI Whisper''' (Python) || Easy install (<code>pip install</code>) || Slower, requires ffmpeg || Quick testing
|}

{{Tip|Use '''whisper.cpp''' for production deployments. It's significantly faster and supports GPU acceleration.}}
== Quick Start: GUI On-Demand (No Compilation) ==
The simplest setup: download a pre-built model and start transcribing immediately.

<syntaxhighlight lang="bash">
# Download model to GUI bin directory
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/html/bin/ggml-base.bin

# Set ownership (Debian/Ubuntu)
chown www-data:www-data /var/www/html/bin/ggml-base.bin
# For RedHat/CentOS, use: chown apache:apache
</syntaxhighlight>

The "Transcribe" button now appears on call detail pages. No configuration changes are needed.
== GUI On-Demand: Advanced Setup ==
Use these options if you need a custom model path or prefer the Python engine.

=== Option 1: whisper.cpp with Custom Model ===
<syntaxhighlight lang="bash">
# Compile whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp && make -j

# Download model
./models/download-ggml-model.sh base.en
</syntaxhighlight>

Configure <code>/var/www/html/config/configuration.php</code>:
<syntaxhighlight lang="php">
define('WHISPER_NATIVE', true);
define('WHISPER_MODEL', '/path/to/whisper.cpp/models/ggml-base.en.bin');
define('WHISPER_THREADS', 4); // Optional
</syntaxhighlight>

=== Option 2: OpenAI Whisper (Python) ===
<syntaxhighlight lang="bash">
pip install openai-whisper
apt install ffmpeg  # or dnf install ffmpeg
</syntaxhighlight>

Configure <code>/var/www/html/config/configuration.php</code>:
<syntaxhighlight lang="php">
define('WHISPER_MODEL', '/opt/whisper_models/small.pt');
define('WHISPER_THREADS', 4);
</syntaxhighlight>
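
The Python engine downloads models on first use into <code>~/.cache/whisper</code> by default; to place the <code>.pt</code> file at the path referenced above, you can pre-download it. A minimal sketch, assuming the <code>openai-whisper</code> package is installed and <code>/opt/whisper_models</code> is the target directory you chose (<code>download_root</code> is part of the openai-whisper API):

<syntaxhighlight lang="bash">
# Pre-download the "small" model so small.pt lands in /opt/whisper_models
mkdir -p /opt/whisper_models
python3 -c 'import whisper; whisper.load_model("small", download_root="/opt/whisper_models")'
</syntaxhighlight>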
== Automatic Transcription (Sniffer) ==
Transcribe all calls automatically on the sensor after they end.

=== Basic Configuration ===
Edit <code>/etc/voipmonitor.conf</code>:

<syntaxhighlight lang="ini">
# Enable transcription
audio_transcribe = yes

# Using whisper.cpp (recommended)
whisper_native = yes
whisper_model = /path/to/whisper.cpp/models/ggml-small.bin

# OR using Python (slower)
# whisper_native = no
# whisper_model = small
</syntaxhighlight>

Restart the sensor: <code>systemctl restart voipmonitor</code>
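
After the restart, you can check the sensor's log output for whisper-related messages to confirm the options were picked up (VoIPmonitor logs through syslog; the exact messages depend on version and verbosity, and log file locations vary by distribution):

<syntaxhighlight lang="bash">
# Look for whisper-related startup messages
grep -i whisper /var/log/syslog      # Debian/Ubuntu
grep -i whisper /var/log/messages    # RedHat/CentOS
</syntaxhighlight>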
=== Configuration Parameters ===
{| class="wikitable"
! Parameter !! Default !! Description
|-
| <code>audio_transcribe</code> || no || Enable/disable transcription
|-
| <code>audio_transcribe_connect_duration_min</code> || 10 || Minimum call duration (seconds) to transcribe
|-
| <code>audio_transcribe_threads</code> || 2 || Concurrent transcription jobs
|-
| <code>audio_transcribe_queue_length_max</code> || 100 || Max queue size
|-
| <code>whisper_native</code> || no || Use whisper.cpp (<code>yes</code>) or Python (<code>no</code>)
|-
| <code>whisper_model</code> || small || Model name (Python) or '''absolute path''' to .bin file (whisper.cpp)
|-
| <code>whisper_language</code> || auto || Language code (<code>en</code>, <code>de</code>), <code>auto</code>, or <code>by_number</code>
|-
| <code>whisper_threads</code> || 2 || CPU threads per transcription job
|-
| <code>whisper_timeout</code> || 300 || Timeout in seconds (Python only)
|-
| <code>whisper_deterministic_mode</code> || yes || Consistent results (Python only)
|-
| <code>whisper_python</code> || - || Custom Python binary path (Python only)
|-
| <code>whisper_native_lib</code> || - || Path to libwhisper.so (advanced)
|}
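
For illustration, a tuned configuration for a busier sensor might combine these parameters as follows (the values and the model path are illustrative examples to adapt, not recommendations):

<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf -- illustrative tuning example
audio_transcribe = yes
# skip calls connected for less than 30 seconds
audio_transcribe_connect_duration_min = 30
# transcribe up to 4 calls in parallel, 4 CPU threads each
audio_transcribe_threads = 4
whisper_threads = 4
whisper_native = yes
whisper_model = /opt/whisper.cpp/models/ggml-small.bin
# derive the language from the called number instead of auto-detection
whisper_language = by_number
</syntaxhighlight>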
== Advanced: CUDA GPU Acceleration ==
Compile whisper.cpp with NVIDIA CUDA support for up to a 30x speedup.

<syntaxhighlight lang="bash">
# Install CUDA toolkit (see nvidia.com/cuda-downloads)

# Add to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Compile with CUDA
cd /path/to/whisper.cpp
make clean
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
</syntaxhighlight>
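
Before wiring the CUDA build into VoIPmonitor, you can verify it standalone against the sample audio that ships with whisper.cpp (a sketch; the CLI binary name varies between whisper.cpp versions — older Makefile builds produce <code>main</code>, recent releases <code>whisper-cli</code>):

<syntaxhighlight lang="bash">
# Is the GPU visible to the driver?
nvidia-smi

# Standalone transcription test using the bundled sample WAV
cd /path/to/whisper.cpp
./main -m models/ggml-base.en.bin -f samples/jfk.wav
</syntaxhighlight>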
== Advanced: Loadable Module ==
Use whisper.cpp as a separately loaded library, so you can update it without recompiling the sniffer:

<syntaxhighlight lang="bash">
# Build libraries
cd /path/to/whisper.cpp
make libwhisper.so -j
make libwhisper.a -j

# Optional: Install system-wide
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so
</syntaxhighlight>

Configure in <code>voipmonitor.conf</code>:
<syntaxhighlight lang="ini">
whisper_native_lib = /path/to/whisper.cpp/libwhisper.so
</syntaxhighlight>
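
As a sanity check that the sniffer will be able to load the library (and, for CUDA builds, that the CUDA runtime resolves), inspect it with <code>ldd</code>:

<syntaxhighlight lang="bash">
# All entries should resolve; "not found" lines indicate a missing dependency
ldd /path/to/whisper.cpp/libwhisper.so
</syntaxhighlight>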
== Troubleshooting ==

=== Model Download Fails ===
Test connectivity:
<syntaxhighlight lang="bash">
curl -I https://download.voipmonitor.org/whisper/ggml-base.bin
</syntaxhighlight>

'''If blocked:'''
* Check the firewall: <code>iptables -L -v -n</code>, <code>ufw status</code>
* Check the proxy: set the <code>HTTP_PROXY</code> / <code>HTTPS_PROXY</code> environment variables
* Check DNS: <code>nslookup download.voipmonitor.org</code>

'''Workaround:''' Download the model on another machine and copy it to the server via SCP, for example:
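<syntaxhighlight lang="bash">
# On a machine with internet access
wget https://download.voipmonitor.org/whisper/ggml-base.bin

# Copy to the GUI server (hostname is a placeholder)
scp ggml-base.bin root@voipmonitor-server:/var/www/html/bin/ggml-base.bin
</syntaxhighlight>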
=== Testing from CLI ===
Run a transcription directly against a WAV file to verify the Whisper setup outside the GUI:
<syntaxhighlight lang="bash">
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' \
  --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/ggml-small.bin"}]' \
  -v1,whisper
</syntaxhighlight>
== AI Summary for RAG ==
'''Summary:''' VoIPmonitor integrates Whisper ASR for call transcription via two modes: on-demand (GUI button) and automatic (sniffer background processing). Two engines are available: whisper.cpp (C++, recommended, fast, CUDA support) and OpenAI Whisper (Python, easier install). Quick start: download the pre-built model from <code>https://download.voipmonitor.org/whisper/ggml-base.bin</code> to <code>/var/www/html/bin/</code> and set ownership to www-data. Sniffer config: enable <code>audio_transcribe=yes</code> and <code>whisper_native=yes</code> with an absolute path to the model in <code>whisper_model</code>. Key parameters: <code>audio_transcribe_connect_duration_min</code> (min call length), <code>whisper_threads</code> (CPU threads), <code>whisper_language</code> (auto/code/by_number). CUDA acceleration is available for whisper.cpp (30x speedup).

'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, audio_transcribe, whisper_native, whisper_model, cuda, gpu, ggml-base.bin, libwhisper.so, automatic transcription, on-demand

'''Key Questions:'''
* How do I enable call transcription in VoIPmonitor?
* What is the quickest way to enable Whisper transcription?
* How do I download the Whisper model for the GUI?
* What is the difference between whisper.cpp and OpenAI Whisper?
* How do I configure automatic transcription on the sniffer?
* What parameters control Whisper transcription behavior?
* How do I enable GPU acceleration for Whisper?
* Why is the model download failing and how do I fix it?
* How do I test Whisper transcription from the command line?