{{DISPLAYTITLE:Call Transcription with Whisper AI}}


'''Integrate OpenAI's Whisper ASR with VoIPmonitor for on-demand or automatic call transcription.'''


== Overview ==
VoIPmonitor supports [https://openai.com/index/whisper/ Whisper], a speech recognition system trained on 680,000 hours of multilingual data. Two integration modes are available:

{| class="wikitable"
! Mode !! Location !! Use Case
|-
| '''On-Demand''' || GUI server || User clicks "Transcribe" on individual calls
|-
| '''Automatic''' || Sensor || All calls transcribed automatically after ending
|}


<kroki lang="mermaid">
flowchart LR
    subgraph "On-Demand (GUI)"
        A1[User clicks Transcribe] --> A2[GUI Server] --> A3[Result displayed]
    end
    subgraph "Automatic (Sniffer)"
        B1[Call ends] --> B2[Queued] --> B3[Transcribed] --> B4[Stored in DB]
    end
</kroki>


=== Whisper Engines ===

{| class="wikitable"
! Engine !! Pros !! Cons !! Recommended For
|-
| '''whisper.cpp''' (C++) || Fast CPU transcription, low resource usage, optional CUDA support (up to 30x speedup) || Requires compilation; expects 16 kHz mono audio input || Server-side processing
|-
| '''OpenAI Whisper''' (Python) || Easy install (<code>pip install openai-whisper</code>) || Slower on CPU; requires PyTorch and <code>ffmpeg</code>; non-deterministic by default || Quick testing
|}


{{Tip|Use '''whisper.cpp''' for production deployments. It's significantly faster and supports GPU acceleration.}}


== Quick Start: GUI On-Demand (No Compilation) ==

The simplest setup: download a pre-built model and start transcribing immediately.


<syntaxhighlight lang="bash">
# Download model to GUI bin directory
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/html/bin/ggml-base.bin

# If the GUI is installed in /var/www/voipmonitor/, adjust the path accordingly

# Set ownership (Debian/Ubuntu)
chown www-data:www-data /var/www/html/bin/ggml-base.bin

# For RedHat/CentOS, use: chown apache:apache
</syntaxhighlight>

The "Transcribe" button now appears on call detail pages. No configuration changes needed.
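Before testing in the browser, you can confirm the model is in place and readable by the web server. A minimal check, assuming the default paths above and the Debian/Ubuntu <code>www-data</code> user:

<syntaxhighlight lang="bash">
# The base model is roughly 150 MB; a tiny file indicates a failed download
ls -lh /var/www/html/bin/ggml-base.bin

# Verify the web server user can actually read the file
sudo -u www-data test -r /var/www/html/bin/ggml-base.bin && echo "readable" || echo "NOT readable"
</syntaxhighlight>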


== GUI On-Demand: Advanced Setup ==

Use this section for custom model paths or for the Python engine.


=== Option 1: whisper.cpp with Custom Model ===


<syntaxhighlight lang="bash">
# Compile whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp && make -j

# Download model ('base.en' is English-only; use 'small' for multilingual)
./models/download-ggml-model.sh base.en
</syntaxhighlight>

Note that whisper.cpp uses GGML-format models (<code>.bin</code>), which are not binary compatible with OpenAI's <code>.pt</code> models; the project provides a conversion script.
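To sanity-check the build before wiring it into the GUI, you can run the compiled binary against the sample audio bundled with the whisper.cpp repository:

<syntaxhighlight lang="bash">
# Transcribe the bundled sample; prints timestamps and text to stdout
./main -m models/ggml-base.en.bin -f samples/jfk.wav
</syntaxhighlight>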


Configure <code>/var/www/html/config/configuration.php</code>:

<syntaxhighlight lang="php">
define('WHISPER_NATIVE', true);
define('WHISPER_MODEL', '/path/to/whisper.cpp/models/ggml-base.en.bin');
define('WHISPER_THREADS', 4);  // Optional
</syntaxhighlight>


=== Option 2: OpenAI Whisper (Python) ===


<syntaxhighlight lang="bash">
pip install openai-whisper
apt install ffmpeg  # or dnf install ffmpeg
</syntaxhighlight>

The Python library can download models automatically, but pinning an absolute path is more reliable. Trigger a download to a known location first (the audio file's content does not matter; it only triggers the download):

<syntaxhighlight lang="bash">
whisper audio.wav --model=small --model_dir=/opt/whisper_models
</syntaxhighlight>

Configure <code>/var/www/html/config/configuration.php</code>:

<syntaxhighlight lang="php">
define('WHISPER_MODEL', '/opt/whisper_models/small.pt');
define('WHISPER_THREADS', 4);
</syntaxhighlight>
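You can verify the Python engine and model path from the command line before testing in the GUI; any short WAV file works (<code>test.wav</code> is a placeholder):

<syntaxhighlight lang="bash">
# Should print the transcription using the pre-downloaded model
whisper test.wav --model small --model_dir /opt/whisper_models --language en
</syntaxhighlight>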


== Automatic Transcription (Sniffer) ==


Transcribe all calls automatically on the sensor after they end.


=== Basic Configuration ===


Edit <code>/etc/voipmonitor.conf</code>:


<syntaxhighlight lang="ini">
# Enable transcription
audio_transcribe = yes

# Using whisper.cpp (recommended)
whisper_native = yes
whisper_model = /path/to/whisper.cpp/models/ggml-small.bin

# OR using Python (slower)
# whisper_native = no
# whisper_model = small
</syntaxhighlight>

Restart: <code>systemctl restart voipmonitor</code>
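After the restart, you can confirm the sniffer picked up the settings by watching the system log while a test call completes. Exact log wording varies by sensor version; this assumes syslog-style logging:

<syntaxhighlight lang="bash">
# Debian/Ubuntu syslog
tail -f /var/log/syslog | grep -i whisper

# Alternative on systemd-based systems: follow the service journal
journalctl -u voipmonitor -f
</syntaxhighlight>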
 
=== Configuration Parameters ===
{| class="wikitable"
! Parameter !! Default !! Description
|-
| <code>audio_transcribe</code> || no || Enable/disable transcription
|-
| <code>audio_transcribe_connect_duration_min</code> || 10 || Minimum call duration (seconds) to transcribe
|-
| <code>audio_transcribe_threads</code> || 2 || Concurrent transcription jobs
|-
| <code>audio_transcribe_queue_length_max</code> || 100 || Max queue size
|-
| <code>whisper_native</code> || no || Use whisper.cpp (<code>yes</code>) or Python (<code>no</code>)
|-
| <code>whisper_model</code> || small || Model name (Python) or '''absolute path''' to .bin file (whisper.cpp)
|-
| <code>whisper_language</code> || auto || Language code (<code>en</code>, <code>de</code>), <code>auto</code>, or <code>by_number</code>
|-
| <code>whisper_threads</code> || 2 || CPU threads per transcription job
|-
| <code>whisper_timeout</code> || 300 || Timeout in seconds (Python only)
|-
| <code>whisper_deterministic_mode</code> || yes || Consistent results (Python only)
|-
| <code>whisper_python</code> || - || Custom Python binary path (Python only)
|-
| <code>whisper_native_lib</code> || - || Path to libwhisper.so (advanced)
|}
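As an example of how these parameters combine, a sensor that skips calls under 30 seconds and runs four parallel whisper.cpp jobs in English might look like this (illustrative values; tune to your hardware):

<syntaxhighlight lang="ini">
audio_transcribe = yes
audio_transcribe_connect_duration_min = 30
audio_transcribe_threads = 4
whisper_native = yes
whisper_model = /path/to/whisper.cpp/models/ggml-small.bin
whisper_language = en
whisper_threads = 4
</syntaxhighlight>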


== Advanced: CUDA GPU Acceleration ==

Compile whisper.cpp with NVIDIA CUDA for up to 30x speedup.


<syntaxhighlight lang="bash">
# Install CUDA toolkit (see nvidia.com/cuda-downloads)
# Add to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Compile with CUDA
cd /path/to/whisper.cpp
make clean
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
</syntaxhighlight>
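Two quick checks confirm the toolkit is visible before compiling and that the GPU is actually used afterward (assumes the NVIDIA driver's <code>nvidia-smi</code> is installed):

<syntaxhighlight lang="bash">
# Toolkit on PATH?
nvcc --version

# Watch GPU utilization while a transcription runs
nvidia-smi
</syntaxhighlight>

VoIPmonitor automatically detects and uses the CUDA-enabled whisper.cpp binary or library.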


== Advanced: Loadable Module ==


Use whisper.cpp as a separate shared library so it can be updated without recompiling the sniffer:

<syntaxhighlight lang="bash">
# Build the shared and static libraries
cd /path/to/whisper.cpp
make libwhisper.so -j
make libwhisper.a -j

# Optional: install the header and library system-wide
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so
</syntaxhighlight>
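If the sniffer later fails to load the module, refreshing the linker cache and checking that the library's own dependencies resolve is a quick diagnostic (not required for normal setup):

<syntaxhighlight lang="bash">
ldconfig
ldd /usr/local/lib64/libwhisper.so
</syntaxhighlight>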
 
Configure in <code>voipmonitor.conf</code>, alongside the basic settings above (<code>audio_transcribe</code>, <code>whisper_native</code>, <code>whisper_model</code>):

<syntaxhighlight lang="ini">
whisper_native_lib = /path/to/whisper.cpp/libwhisper.so
</syntaxhighlight>


== Troubleshooting ==


=== Model Download Fails ===

Test connectivity from the affected server:

<syntaxhighlight lang="bash">
curl -I https://download.voipmonitor.org/whisper/ggml-base.bin
</syntaxhighlight>

A reachable server returns <code>HTTP 200 OK</code> or a similar success response.


'''If blocked:'''
* Check firewall: <code>iptables -L -v -n</code>, <code>ufw status</code>
* Check proxy: Set <code>HTTP_PROXY</code> / <code>HTTPS_PROXY</code> environment variables
* Check DNS: <code>nslookup download.voipmonitor.org</code>


'''Workaround:''' Download the model manually on a machine with internet access and copy it over via SCP, as sketched below.
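A minimal sketch of the manual transfer (<code>gui-server</code> and the paths are placeholders; adjust to your environment):

<syntaxhighlight lang="bash">
# On a machine with internet access
wget https://download.voipmonitor.org/whisper/ggml-base.bin

# Copy to the GUI server and fix ownership
scp ggml-base.bin root@gui-server:/var/www/html/bin/
ssh root@gui-server "chown www-data:www-data /var/www/html/bin/ggml-base.bin"
</syntaxhighlight>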


=== Testing from CLI ===


<syntaxhighlight lang="bash">
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' \
  --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/ggml-small.bin"}]' \
  -v1,whisper
</syntaxhighlight>


== AI Summary for RAG ==
'''Summary:''' VoIPmonitor integrates Whisper ASR for call transcription via two modes: on-demand (GUI button) and automatic (sniffer background processing). Two engines are available: whisper.cpp (C++, recommended, fast, CUDA support) and OpenAI Whisper (Python, easier install). Quick start: download the pre-built model from <code>https://download.voipmonitor.org/whisper/ggml-base.bin</code> to <code>/var/www/html/bin/</code> and set ownership to www-data. Sniffer config: enable <code>audio_transcribe=yes</code> and <code>whisper_native=yes</code> with an absolute path to the model in <code>whisper_model</code>. Key parameters: <code>audio_transcribe_connect_duration_min</code> (minimum call length), <code>whisper_threads</code> (CPU threads), <code>whisper_language</code> (auto/code/by_number). CUDA acceleration is available for whisper.cpp (30x speedup).

'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, audio_transcribe, whisper_native, whisper_model, cuda, gpu, ggml-base.bin, libwhisper.so, automatic transcription, on-demand


'''Key Questions:'''
* How do I enable call transcription in VoIPmonitor?
* What is the quickest way to enable Whisper transcription?
* How do I download the Whisper model for the GUI?
* What is the difference between whisper.cpp and OpenAI Whisper?
* How do I configure automatic transcription on the sniffer?
* What parameters control Whisper transcription behavior?
* How do I enable GPU acceleration for Whisper?
* Why is the model download failing and how do I fix it?
* How do I test Whisper transcription from the command line?
