Whisper

From VoIPmonitor.org
{{DISPLAYTITLE:Call Transcription with Whisper AI}}


'''Integrate OpenAI's Whisper ASR with VoIPmonitor for on-demand or automatic call transcription.'''


== Overview ==
VoIPmonitor supports [https://openai.com/index/whisper/ Whisper], a speech recognition system trained on 680,000 hours of multilingual data. Two integration modes are available:
{| class="wikitable"
! Mode !! Location !! Use Case
|-
| '''On-Demand''' || GUI server || User clicks "Transcribe" on individual calls
|-
| '''Automatic''' || Sensor || All calls transcribed automatically after ending
|}


<kroki lang="mermaid">
%%{init: {'flowchart': {'nodeSpacing': 15, 'rankSpacing': 30}}}%%
flowchart LR
    subgraph "On-Demand (GUI)"
        A1[User clicks Transcribe] --> A2[GUI Server] --> A3[Result displayed]
    end
    subgraph "Automatic (Sniffer)"
        B1[Call ends] --> B2[Queued] --> B3[Transcribed] --> B4[Stored in DB]
    end
</kroki>


=== Whisper Engines ===
{| class="wikitable"
! Engine !! Pros !! Cons !! Recommended For
|-
| '''whisper.cpp''' (C++) || Fast, low resource usage, CUDA support (30x speedup) || Requires compilation || Server-side processing
|-
| '''OpenAI Whisper''' (Python) || Easy install (<code>pip install</code>) || Slower, requires ffmpeg || Quick testing
|}


{{Tip|Use '''whisper.cpp''' for production deployments. It is significantly faster and supports GPU acceleration.}}
 
== Quick Start: GUI On-Demand (No Compilation) ==

The simplest setup: download a pre-built model and start transcribing immediately.

<syntaxhighlight lang="bash">
# Download model to GUI bin directory
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/html/bin/ggml-base.bin

# Set ownership (Debian/Ubuntu)
chown www-data:www-data /var/www/html/bin/ggml-base.bin

# For RedHat/CentOS, use: chown apache:apache
</syntaxhighlight>

The "Transcribe" button now appears on call detail pages. No configuration changes are needed.

== GUI On-Demand: Advanced Setup ==

For custom model paths or using the Python engine.

=== Option 1: whisper.cpp with Custom Model ===

<syntaxhighlight lang="bash">
# Compile whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp && make -j

# Download model
./models/download-ggml-model.sh base.en
</syntaxhighlight>


Configure <code>/var/www/html/config/configuration.php</code>:

<syntaxhighlight lang="php">
define('WHISPER_NATIVE', true);
define('WHISPER_MODEL', '/path/to/whisper.cpp/models/ggml-base.en.bin');
define('WHISPER_THREADS', 4);  // Optional
</syntaxhighlight>

=== Option 2: OpenAI Whisper (Python) ===

<syntaxhighlight lang="bash">
pip install openai-whisper
apt install ffmpeg  # or dnf install ffmpeg
</syntaxhighlight>

Configure <code>/var/www/html/config/configuration.php</code>:

<syntaxhighlight lang="php">
define('WHISPER_MODEL', '/opt/whisper_models/small.pt');
define('WHISPER_THREADS', 4);
</syntaxhighlight>
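After the <code>pip install</code> above, you can sanity-check the Python engine's prerequisites from the shell. This is a quick check, not part of the official setup; it assumes <code>python3</code> is the interpreter the GUI will invoke.

<syntaxhighlight lang="bash">
# Both checks should succeed before the GUI can use the Python engine
python3 -c "import whisper" && echo "whisper module OK"
ffmpeg -version | head -n 1
</syntaxhighlight>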
 
== Automatic Transcription (Sniffer) ==
 
Transcribe all calls automatically on the sensor after they end.
 
=== Basic Configuration ===
 
Edit <code>/etc/voipmonitor.conf</code>:
 
<syntaxhighlight lang="ini">
# Enable transcription
audio_transcribe = yes


# Using whisper.cpp (recommended)
whisper_native = yes
whisper_model = /path/to/whisper.cpp/models/ggml-small.bin


# OR using Python (slower)
# whisper_native = no
# whisper_model = small
</syntaxhighlight>

Restart the sensor to apply: <code>systemctl restart voipmonitor</code>
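Not every call is transcribed: by default the sniffer only queues calls that were connected for at least 10 seconds (<code>audio_transcribe_connect_duration_min</code>). The gate amounts to this hypothetical helper, a sketch of the logic rather than actual VoIPmonitor code:

<syntaxhighlight lang="bash">
# Hypothetical helper mirroring the sniffer's duration gate:
# a call is queued only if its connected time reaches the configured minimum.
should_transcribe() {
  local connect_duration=$1 min=${2:-10}
  [ "$connect_duration" -ge "$min" ]
}

should_transcribe 15 10 && echo "queued"   # 15 s call: queued
should_transcribe 5 10  || echo "skipped"  # 5 s call: below the minimum
</syntaxhighlight>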


=== Configuration Parameters ===
{| class="wikitable"
! Parameter !! Default !! Description
|-
| <code>audio_transcribe</code> || no || Enable/disable transcription
|-
| <code>audio_transcribe_connect_duration_min</code> || 10 || Minimum call duration (seconds) to transcribe
|-
| <code>audio_transcribe_threads</code> || 2 || Concurrent transcription jobs
|-
| <code>audio_transcribe_queue_length_max</code> || 100 || Max queue size
|-
| <code>whisper_native</code> || no || Use whisper.cpp (<code>yes</code>) or Python (<code>no</code>)
|-
| <code>whisper_model</code> || small || Model name (Python) or '''absolute path''' to .bin file (whisper.cpp)
|-
| <code>whisper_language</code> || auto || Language code (<code>en</code>, <code>de</code>), <code>auto</code>, or <code>by_number</code>
|-
| <code>whisper_threads</code> || 2 || CPU threads per transcription job
|-
| <code>whisper_timeout</code> || 300 || Timeout in seconds (Python only)
|-
| <code>whisper_deterministic_mode</code> || yes || Consistent results (Python only)
|-
| <code>whisper_python</code> || - || Custom Python binary path (Python only)
|-
| <code>whisper_native_lib</code> || - || Path to libwhisper.so (advanced)
|}
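As a worked example, a sniffer tuned to skip short calls and parallelize heavier jobs might combine these parameters as follows. The values are illustrative, not recommendations; size them to your call volume and CPU count.

<syntaxhighlight lang="ini">
audio_transcribe = yes
whisper_native = yes
whisper_model = /opt/whisper.cpp/models/ggml-small.bin

# Skip calls connected for less than 30 seconds
audio_transcribe_connect_duration_min = 30

# Up to 4 calls transcribed concurrently, 4 CPU threads each (16 threads peak)
audio_transcribe_threads = 4
whisper_threads = 4

# Force English instead of auto-detection
whisper_language = en
</syntaxhighlight>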


== Advanced: CUDA GPU Acceleration ==

Compile whisper.cpp with NVIDIA CUDA support for up to a 30x speedup on a compatible GPU.


<syntaxhighlight lang="bash">
# Install CUDA toolkit (see nvidia.com/cuda-downloads)
# Add to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH


# Compile with CUDA
cd /path/to/whisper.cpp
make clean
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
</syntaxhighlight>

== Advanced: Loadable Module ==

Use whisper.cpp as a separate shared library, which lets you update the engine without recompiling the sniffer:

<syntaxhighlight lang="bash">
# Build libraries
cd /path/to/whisper.cpp
make libwhisper.so -j
make libwhisper.a -j


# Optional: Install system-wide
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so
</syntaxhighlight>

Configure in <code>voipmonitor.conf</code>:

<syntaxhighlight lang="ini">
whisper_native_lib = /path/to/whisper.cpp/libwhisper.so
</syntaxhighlight>

== Troubleshooting ==


=== Model Download Fails ===


Test connectivity:
<syntaxhighlight lang="bash">
curl -I https://download.voipmonitor.org/whisper/ggml-base.bin
</syntaxhighlight>


'''If blocked:'''
* Check firewall: <code>iptables -L -v -n</code>, <code>ufw status</code>
* Check proxy: Set <code>HTTP_PROXY</code> / <code>HTTPS_PROXY</code> environment variables
* Check DNS: <code>nslookup download.voipmonitor.org</code>


'''Workaround:''' Download the model manually on another machine and copy it over via SCP.
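For example (the hostname <code>gui-server</code> and the login user are placeholders; adjust to your environment):

<syntaxhighlight lang="bash">
# On a machine with internet access:
wget https://download.voipmonitor.org/whisper/ggml-base.bin

# Copy to the GUI server and fix ownership (Debian/Ubuntu web user shown):
scp ggml-base.bin root@gui-server:/var/www/html/bin/ggml-base.bin
ssh root@gui-server chown www-data:www-data /var/www/html/bin/ggml-base.bin
</syntaxhighlight>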


=== Testing from CLI ===


<syntaxhighlight lang="bash">
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' \
  --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/ggml-small.bin"}]' \
  -v1,whisper
</syntaxhighlight>
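The same CLI invocation can be scripted to batch-transcribe recordings. A sketch: the spool directory is an assumption (adjust to where your WAV files live), and the <code>{}</code> placeholder is kept exactly as in the command above.

<syntaxhighlight lang="bash">
# Batch-transcribe every WAV in a directory (illustrative paths)
for f in /var/spool/voipmonitor/audio/*.wav; do
  /var/www/html/bin/vm --audio-transcribe="$f {}" \
    --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/ggml-small.bin"}]'
done
</syntaxhighlight>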


== AI Summary for RAG ==
'''Summary:''' VoIPmonitor integrates Whisper ASR for call transcription via two modes: on-demand (GUI button) and automatic (background processing on the sniffer). Two engines are available: whisper.cpp (C++, recommended, fast, CUDA support) and OpenAI Whisper (Python, easier install). Quick start: download the pre-built model from <code>https://download.voipmonitor.org/whisper/ggml-base.bin</code> to <code>/var/www/html/bin/</code> and set ownership to www-data. Sniffer config: enable <code>audio_transcribe=yes</code> and <code>whisper_native=yes</code> with an absolute path to the model in <code>whisper_model</code>. Key parameters: <code>audio_transcribe_connect_duration_min</code> (minimum call length), <code>whisper_threads</code> (CPU threads), <code>whisper_language</code> (auto/code/by_number). CUDA acceleration is available for whisper.cpp (up to 30x speedup).

'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, audio_transcribe, whisper_native, whisper_model, cuda, gpu, ggml-base.bin, libwhisper.so, automatic transcription, on-demand
 
'''Key Questions:'''
* How do I enable call transcription in VoIPmonitor?
* What is the quickest way to enable Whisper transcription?
* How do I download the Whisper model for the GUI?
* What is the difference between whisper.cpp and OpenAI Whisper?
* How do I configure automatic transcription on the sniffer?
* What parameters control Whisper transcription behavior?
* How do I enable GPU acceleration for Whisper?
* Why is the model download failing and how do I fix it?
* How do I test Whisper transcription from the command line?

Latest revision as of 16:48, 8 January 2026

