Whisper: Difference between revisions

From VoIPmonitor.org
(Add quick-start section for pre-built model download without compilation)
(Review: formatting fixes (pre→syntaxhighlight), added architecture diagram, changed backticks to code tags, fixed VoIPmirror typo)
Line 9: Line 9:
#'''On-Demand Transcription (in the GUI):''' The simplest method. A user can click a button on any call in the GUI to transcribe it. The processing happens on the GUI server.
#'''Automatic Transcription (in the Sniffer):''' A more advanced setup where the sensor automatically transcribes all calls (or a subset) in the background immediately after they finish.
<kroki lang="mermaid">
flowchart TB
    subgraph "Path A: On-Demand (GUI)"
        A1[User clicks Transcribe] --> A2[GUI Server]
        A2 --> A3[whisper.cpp or OpenAI Whisper]
        A3 --> A4[Transcription displayed in GUI]
    end
    subgraph "Path B: Automatic (Sniffer)"
        B1[Call ends] --> B2[Sniffer detects call completion]
        B2 --> B3[Audio queued for transcription]
        B3 --> B4[whisper.cpp processes audio]
        B4 --> B5[Transcription stored in DB]
    end
</kroki>


For both methods, you must choose one of two underlying Whisper engines to install and configure.


;Choosing Your Whisper Engine
=== Choosing Your Whisper Engine ===
*'''OpenAI Whisper''' (Python): The official implementation from OpenAI. It is easier to install (<code>pip install openai-whisper</code>) but can be slower for CPU-based transcription. It uses PyTorch and requires `ffmpeg` for audio pre-processing. The official implementation does not behave deterministically (the same run can have different results), but this can be addressed with a custom script.
*'''OpenAI Whisper''' (Python): The official implementation from OpenAI. It is easier to install (<code>pip install openai-whisper</code>) but can be slower for CPU-based transcription. It uses PyTorch and requires <code>ffmpeg</code> for audio pre-processing. Its output is not deterministic (repeated runs on the same audio can produce different results), but this can be addressed with a custom script.
*'''whisper.cpp''' (C++): A high-performance C/C++ port of OpenAI's Whisper ([https://github.com/ggerganov/whisper.cpp whisper.cpp]). It is significantly faster for CPU transcription and is the '''recommended engine for server-side processing'''. It requires manual compilation but supports optimizations such as NVIDIA CUDA GPU acceleration (up to 30x faster). Note that it requires audio input to be 16 kHz, 1-channel (mono).


Line 21: Line 37:
=== Quick Start: Using Pre-built Model (No Compilation Required) ===


If you want to enable on-demand transcription without compiling or installing packages, you can download a pre-built model from the VoIPmirror server and use it directly.
If you want to enable on-demand transcription without compiling or installing packages, you can download a pre-built model from the VoIPmonitor server and use it directly.


==== Step 1: Download the Pre-built Model ====
Download the Whisper model file directly to the GUI's <code>bin/</code> directory:
<pre>
<syntaxhighlight lang="bash">
# Download the base model to the default GUI directory
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/html/bin/ggml-base.bin
Line 31: Line 47:
# For Debian-based systems where the GUI is in /var/www/voipmonitor/:
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/voipmonitor/bin/ggml-base.bin
</pre>
</syntaxhighlight>


==== Step 2: Set File Ownership ====
Ensure the web server user owns the model file:
<pre>
<syntaxhighlight lang="bash">
# For Apache on Debian/Ubuntu (www-data user)
chown www-data:www-data /var/www/html/bin/ggml-base.bin
Line 44: Line 60:
# For GUI in /var/www/voipmonitor:
chown www-data:www-data /var/www/voipmonitor/bin/ggml-base.bin
</pre>
</syntaxhighlight>


==== Step 3: Verify ====
Line 51: Line 67:
'''Note:''' This method uses the <code>whisper.cpp</code> engine which is bundled with the GUI. The model file format (.bin) is compatible with the bundled engine.


=== Option 1: Using the `whisper.cpp` Engine (Recommended) ===
=== Option 1: Using the whisper.cpp Engine (Recommended) ===


==== Step 1: Install `whisper.cpp` and Download a Model ====
==== Step 1: Install whisper.cpp and Download a Model ====
First, you need to compile the `whisper.cpp` project and download a pre-trained model on your GUI server.
First, you need to compile the <code>whisper.cpp</code> project and download a pre-trained model on your GUI server.
<pre>
<syntaxhighlight lang="bash">
# Clone the repository
git clone https://github.com/ggerganov/whisper.cpp.git
Line 65: Line 81:
# Download a model (e.g., 'base.en' for English-only, or 'small' for multilingual)
./models/download-ggml-model.sh base.en
</pre>
</syntaxhighlight>
This will create the main executable at <code>./main</code> and download the model to the <code>./models/</code> directory. Note that `whisper.cpp` models (GGML format) are not binary compatible with the official OpenAI models (.pt format), but a conversion script is provided in the project.
This will create the main executable at <code>./main</code> and download the model to the <code>./models/</code> directory. Note that <code>whisper.cpp</code> models (GGML format) are not binary compatible with the official OpenAI models (.pt format), but a conversion script is provided in the project.


==== Step 2: Configure the VoIPmonitor GUI ====
Edit your GUI's configuration file at <code>/var/www/html/config/configuration.php</code> and add the following definitions:
<pre>
<syntaxhighlight lang="php">
<?php
// /var/www/html/config/configuration.php
Line 82: Line 98:
// Optional: Specify the number of threads for transcription
define('WHISPER_THREADS', 4);
</pre>
</syntaxhighlight>
No further setup is required. The GUI will now show a "Transcribe" button on call detail pages.


=== Option 2: Using the `OpenAI Whisper` Engine ===
=== Option 2: Using the OpenAI Whisper Engine ===


==== Step 1: Install the Python Package and Dependencies ====
<pre>
<syntaxhighlight lang="bash">
# Install the whisper library via pip
pip install openai-whisper
Line 97: Line 113:
# For CentOS/RHEL/Fedora
sudo dnf install ffmpeg
</pre>
</syntaxhighlight>


==== Step 2: Prepare the Model and Configure the GUI ====
The Python library can download models automatically, but it's best practice to specify an absolute path in the configuration. First, trigger a download to a known location.
<pre>
<syntaxhighlight lang="bash">
# This command will download the 'small' model to /opt/whisper_models/
# You can use any audio file; its content doesn't matter for the download.
whisper audio.wav --model=small --model_dir=/opt/whisper_models
</pre>
</syntaxhighlight>
Now, edit <code>/var/www/html/config/configuration.php</code> and provide the full path to the downloaded model file.
<pre>
<syntaxhighlight lang="php">
<?php
// /var/www/html/config/configuration.php
Line 116: Line 132:
// Optional: Specify the number of threads
define('WHISPER_THREADS', 4);
</pre>
</syntaxhighlight>


=== Testing the GUI Integration ===
You can test the transcription process from the command line as the GUI would run it. This is useful for debugging paths and performance.
<pre>
<syntaxhighlight lang="bash">
# Example test for a whisper.cpp setup
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/your/whisper.cpp/models/ggml-small.bin"},{"whisper_threads":"2"}]' -v1,whisper
</pre>
</syntaxhighlight>


== Path B: Automatic Transcription in the Sniffer ==
This setup automatically transcribes calls in the background on the sensor itself. This is a headless operation and requires configuration in <code>voipmonitor.conf</code>. Using '''<code>whisper.cpp</code> is strongly recommended''' for this server-side task due to its superior performance.
This setup automatically transcribes calls in the background on the sensor itself. This is a headless operation and requires configuration in <code>voipmonitor.conf</code>. Using '''whisper.cpp is strongly recommended''' for this server-side task due to its superior performance.


=== Step 1: Prepare Your Engine on the Sensor ===
You must have one of the Whisper engines installed '''on the sensor machine'''.


;For `whisper.cpp`:
;For whisper.cpp:
Follow the installation steps from "Path A" to compile `whisper.cpp`. For advanced integration, you may need to build the shared libraries and install them system-wide (see Advanced section below).
Follow the installation steps from "Path A" to compile <code>whisper.cpp</code>. For advanced integration, you may need to build the shared libraries and install them system-wide (see Advanced section below).


;For `OpenAI Whisper`:
;For OpenAI Whisper:
Follow the Python package installation steps from "Path A".

Line 140: Line 156:
Edit <code>/etc/voipmonitor.conf</code> on your sensor to enable and control automatic transcription. You have three main ways to integrate it.


==== Option 1: Using `whisper.cpp` (Recommended) ====
==== Option 1: Using whisper.cpp (Recommended) ====
This uses the compiled `main` executable.
This uses the compiled <code>main</code> executable.
<pre>
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf

Line 154: Line 170:
# You MUST provide the absolute path to the downloaded whisper.cpp model file
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin
</pre>
</syntaxhighlight>


==== Option 2: Using `OpenAI Whisper` ====
==== Option 2: Using OpenAI Whisper ====
This uses the Python library.
<pre>
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf

Line 170: Line 186:
# The library will download it to ~/.cache/whisper/ if not found.
whisper_model = small
</pre>
</syntaxhighlight>


==== Option 3: Using `whisper.cpp` as a Loadable Module (Advanced) ====
==== Option 3: Using whisper.cpp as a Loadable Module (Advanced) ====
This method allows you to update the `whisper.cpp` library without recompiling the entire sniffer. It requires a modified `whisper.cpp` build (see Advanced section).
This method allows you to update the <code>whisper.cpp</code> library without recompiling the entire sniffer. It requires a modified <code>whisper.cpp</code> build (see Advanced section).
<pre>
<syntaxhighlight lang="ini">
# /etc/voipmonitor.conf

Line 183: Line 199:
# Specify the path to the compiled shared library
whisper_native_lib = /path/to/your/whisper.cpp/libwhisper.so
</pre>
</syntaxhighlight>


=== Step 3: Fine-Tuning Transcription Parameters ===
Line 197: Line 213:
: (Default: 100) The maximum number of calls waiting in the transcription queue.
;<code>whisper_native = no</code>
: (Default: no) Set to `yes` to force the use of the `whisper.cpp` engine.
: (Default: no) Set to <code>yes</code> to force the use of the <code>whisper.cpp</code> engine.
;<code>whisper_model = small</code>
: For `OpenAI Whisper`, this is the model name (tiny, base, small, etc.). For `whisper.cpp`, this '''must''' be the full, absolute path to the `.bin` model file.
: For OpenAI Whisper, this is the model name (tiny, base, small, etc.). For <code>whisper.cpp</code>, this '''must''' be the full, absolute path to the <code>.bin</code> model file.
;<code>whisper_language = auto</code>
: (Default: auto) Can be a specific language code (e.g., <code>en</code>, <code>de</code>), `auto` for detection, or <code>by_number</code> to guess based on the phone number's country code.
: (Default: auto) Can be a specific language code (e.g., <code>en</code>, <code>de</code>), <code>auto</code> for detection, or <code>by_number</code> to guess based on the phone number's country code.
;<code>whisper_threads = 2</code>
: (Default: 2) The number of CPU threads to use for a ''single'' transcription job.
;<code>whisper_timeout = 300</code>
: (Default: 300) For `OpenAI Whisper` only. Maximum time in seconds for a single transcription.
: (Default: 300) For OpenAI Whisper only. Maximum time in seconds for a single transcription.
;<code>whisper_deterministic_mode = yes</code>
: (Default: yes) For `OpenAI Whisper` only. Aims for more consistent, repeatable transcription results.
: (Default: yes) For OpenAI Whisper only. Aims for more consistent, repeatable transcription results.
;<code>whisper_python = /usr/bin/python3</code>
: (Default: not set) For `OpenAI Whisper` only. Specifies the path to the Python binary if it's not in the system's `PATH`.
: (Default: not set) For OpenAI Whisper only. Specifies the path to the Python binary if it's not in the system's <code>PATH</code>.
;<code>whisper_native_lib = /path/to/libwhisper.so</code>
: (Default: not set) For `whisper.cpp` only. Specifies the path to the shared library when using the loadable module method.
: (Default: not set) For <code>whisper.cpp</code> only. Specifies the path to the shared library when using the loadable module method.
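The absolute-path rule for <code>whisper_model</code> is easy to miss when switching engines. A minimal sketch of a pre-flight check follows. The helper names <code>conf_get</code> and <code>check_whisper_conf</code> are hypothetical and not part of VoIPmonitor, and the parser assumes plain <code>key = value</code> lines.

```shell
#!/bin/sh
# Sketch only: lint the Whisper settings in a voipmonitor.conf-style file.
# conf_get and check_whisper_conf are illustrative names, not VoIPmonitor tools.

# Print the last value assigned to key $2 in file $1 ("key = value" lines).
conf_get() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" \
        | sed 's/[[:space:]]*$//' | tail -n 1
}

# With whisper_native = yes, whisper_model must be an absolute path to a file.
check_whisper_conf() {
    conf=$1
    native=$(conf_get "$conf" whisper_native)
    model=$(conf_get "$conf" whisper_model)
    if [ "$native" = "yes" ]; then
        case $model in
            /*)
                if [ ! -f "$model" ]; then
                    echo "model file not found: $model"
                    return 1
                fi
                ;;
            *)
                echo "whisper_model must be an absolute path when whisper_native = yes"
                return 1
                ;;
        esac
    fi
    echo "ok"
}
```

Running <code>check_whisper_conf /etc/voipmonitor.conf</code> before restarting the sensor should catch a model name left over from an OpenAI Whisper setup.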


== Advanced Topics ==


=== Compiling `whisper.cpp` with Libraries for Sniffer Integration ===
=== Compiling whisper.cpp with Libraries for Sniffer Integration ===
To compile the VoIPmonitor sniffer with built-in `whisper.cpp` support or to use it as a loadable library, you must build its shared and static libraries.
To compile the VoIPmonitor sniffer with built-in <code>whisper.cpp</code> support or to use it as a loadable library, you must build its shared and static libraries.


;1. Build the libraries:
<pre>
<syntaxhighlight lang="bash">
cd /path/to/your/whisper.cpp
# Build the main executable, shared lib, and static lib
Line 225: Line 241:
make libwhisper.so -j
make libwhisper.a -j
</pre>
</syntaxhighlight>


;2. (Optional) Apply patch for loadable module:
For the advanced "loadable module" integration (<code>whisper_native_lib</code>), a patch is required.
<pre>
<syntaxhighlight lang="bash">
# Inside the whisper.cpp directory
patch < whisper.diff
Line 235: Line 251:
make -j
make libwhisper.so -j
</pre>
</syntaxhighlight>


;3. Install libraries and headers:
For the sniffer's build process to find the `whisper.cpp` components, place them in standard system locations or create symbolic links.
For the sniffer's build process to find the <code>whisper.cpp</code> components, place them in standard system locations or create symbolic links.
<pre>
<syntaxhighlight lang="bash">
# Create symbolic links to the compiled files in your whisper.cpp directory
ln -s "$(pwd)/whisper.h" /usr/local/include/whisper.h
Line 245: Line 261:
ln -s "$(pwd)/libwhisper.so" /usr/local/lib64/libwhisper.so
ln -s "$(pwd)/libwhisper.a" /usr/local/lib64/libwhisper.a
</pre>
</syntaxhighlight>
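A dangling symbolic link is a common reason for the sniffer build not finding <code>whisper.cpp</code>. The following sketch checks that the three files linked above resolve. The function name <code>check_whisper_install</code> is hypothetical, and the file list covers only the links shown here, so extend it if your build needs further headers.

```shell
#!/bin/sh
# Sketch only: confirm the whisper.cpp header and libraries are reachable.
# check_whisper_install is an illustrative name; the prefix defaults to /usr/local.
check_whisper_install() {
    prefix=${1:-/usr/local}
    missing=0
    for f in include/whisper.h lib64/libwhisper.so lib64/libwhisper.a; do
        # -e follows symbolic links, so a dangling link is reported as missing.
        if [ ! -e "$prefix/$f" ]; then
            echo "missing: $prefix/$f"
            missing=1
        fi
    done
    return $missing
}
```

Calling <code>check_whisper_install</code> with no argument inspects <code>/usr/local</code>; it prints nothing and returns 0 when all three files are present.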


=== CUDA Acceleration for `whisper.cpp` ===
=== CUDA Acceleration for whisper.cpp ===
To achieve a massive speed increase (up to 30x), you can compile <code>whisper.cpp</code> with NVIDIA CUDA support. This is highly recommended if you have a compatible NVIDIA GPU on your sensor or GUI server.

Line 255: Line 271:
;2. Set environment variables:
Ensure the CUDA toolkit is in your system's path. You can add these lines to your <code>~/.bashrc</code> file.
<pre>
<syntaxhighlight lang="bash">
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
</pre>
</syntaxhighlight>
Verify with <code>nvcc --version</code>.


;3. Re-compile `whisper.cpp` with the CUDA flag:
;3. Re-compile whisper.cpp with the CUDA flag:
<pre>
<syntaxhighlight lang="bash">
cd /path/to/your/whisper.cpp
make clean
Line 269: Line 285:
WHISPER_CUDA=1 make libwhisper.so -j
WHISPER_CUDA=1 make libwhisper.a -j
</pre>
</syntaxhighlight>
VoIPmonitor will automatically detect and use the CUDA-enabled `whisper.cpp` binary or library.
VoIPmonitor will automatically detect and use the CUDA-enabled <code>whisper.cpp</code> binary or library.
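If the same build script runs on hosts with and without a GPU, the CUDA flag can be chosen at build time. The sketch below assumes that finding <code>nvcc</code> on the <code>PATH</code> is a good enough signal that the CUDA toolkit is installed; the function name <code>whisper_make_flags</code> is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the make variable for a CUDA build when nvcc is available.
# whisper_make_flags is an illustrative name; it prints nothing without nvcc.
whisper_make_flags() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "WHISPER_CUDA=1"
    fi
}
```

A build step could then run <code>env $(whisper_make_flags) make libwhisper.so -j</code>, which falls back to a plain CPU build when the function prints nothing.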


== AI Summary for RAG ==
'''Summary:''' This guide explains how to integrate OpenAI's Whisper ASR for call transcription in VoIPmonitor. It details two primary methods: on-demand transcription from the GUI (user-triggered) and automatic background transcription on the sniffer (server-side). For GUI on-demand transcription without installing packages or compiling, users can download a pre-built Whisper model directly from `https://download.voipmonitor.org/whisper/ggml-base.bin` to `/var/www/html/bin/` and set ownership to the web server user (e.g., `www-data:www-data`). For more advanced setups, the guide compares the two available engines: the official Python `OpenAI Whisper` library and the high-performance C++ port, `whisper.cpp`, recommending `whisper.cpp` for server-side processing. It provides step-by-step instructions for compiling `whisper.cpp` from source, building its libraries, and installing the Python package via `pip`. It details the necessary configuration in both the GUI's `configuration.php` (e.g., `WHISPER_NATIVE`, `WHISPER_MODEL`) and the sniffer's `voipmonitor.conf` (e.g., `audio_transcribe`, `whisper_native`, `whisper_native_lib`). It also covers optional parameters for fine-tuning, such as setting language, thread counts, and minimum call duration. Finally, it includes a detailed section on enabling NVIDIA CUDA acceleration for `whisper.cpp` to achieve significant performance gains and explains advanced integration methods like using `whisper.cpp` as a loadable library.
'''Summary:''' This guide explains how to integrate OpenAI's Whisper ASR for call transcription in VoIPmonitor. It details two primary methods: on-demand transcription from the GUI (user-triggered) and automatic background transcription on the sniffer (server-side). For GUI on-demand transcription without installing packages or compiling, users can download a pre-built Whisper model directly from <code>https://download.voipmonitor.org/whisper/ggml-base.bin</code> to <code>/var/www/html/bin/</code> and set ownership to the web server user (e.g., <code>www-data:www-data</code>). For more advanced setups, the guide compares the two available engines: the official Python OpenAI Whisper library and the high-performance C++ port, whisper.cpp, recommending whisper.cpp for server-side processing. It provides step-by-step instructions for compiling whisper.cpp from source, building its libraries, and installing the Python package via pip. It details the necessary configuration in both the GUI's <code>configuration.php</code> (e.g., <code>WHISPER_NATIVE</code>, <code>WHISPER_MODEL</code>) and the sniffer's <code>voipmonitor.conf</code> (e.g., <code>audio_transcribe</code>, <code>whisper_native</code>, <code>whisper_native_lib</code>). It also covers optional parameters for fine-tuning, such as setting language, thread counts, and minimum call duration. Finally, it includes a detailed section on enabling NVIDIA CUDA acceleration for whisper.cpp to achieve significant performance gains and explains advanced integration methods like using whisper.cpp as a loadable library.
'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, `audio_transcribe`, `whisper_native`, `whisper_model`, `whisper_native_lib`, cuda, nvidia, gpu, acceleration, gui, sniffer, automatic transcription, on-demand, libwhisper.so, loadable module
 
'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, audio_transcribe, whisper_native, whisper_model, whisper_native_lib, cuda, nvidia, gpu, acceleration, gui, sniffer, automatic transcription, on-demand, libwhisper.so, loadable module
 
'''Key Questions:'''
* How can I transcribe phone calls in VoIPmonitor?
Line 283: Line 301:
* How do I configure on-demand call transcription in the GUI?
* How do I set up the sniffer for automatic, server-side transcription of all calls?
* What are the required parameters in `voipmonitor.conf` for Whisper?
* What are the required parameters in voipmonitor.conf for Whisper?
* How can I speed up Whisper transcription using an NVIDIA GPU (CUDA)?
* How do I install and compile `whisper.cpp`, including its libraries (`libwhisper.so`)?
* How do I install and compile whisper.cpp, including its libraries (libwhisper.so)?
* What do the `audio_transcribe`, `whisper_native`, and `whisper_native_lib` options do?
* What do the audio_transcribe, whisper_native, and whisper_native_lib options do?
* How do I use `whisper.cpp` as a loadable module in the sniffer?
* How do I use whisper.cpp as a loadable module in the sniffer?

Revision as of 11:25, 6 January 2026


This guide explains how to integrate OpenAI's Whisper, a powerful automatic speech recognition (ASR) system, with VoIPmonitor for both on-demand and automatic call transcription.

Introduction to Whisper Integration

VoIPmonitor integrates Whisper, an ASR system from OpenAI trained on 680,000 hours of multilingual and multitask supervised data. This large and diverse dataset leads to improved robustness to accents, background noise, and technical language.

There are two primary ways to use Whisper with VoIPmonitor:

  1. On-Demand Transcription (in the GUI): The simplest method. A user can click a button on any call in the GUI to transcribe it. The processing happens on the GUI server.
  2. Automatic Transcription (in the Sniffer): A more advanced setup where the sensor automatically transcribes all calls (or a subset) in the background immediately after they finish.

For both methods, you must choose one of two underlying Whisper engines to install and configure.

Choosing Your Whisper Engine

  • OpenAI Whisper (Python): The official implementation from OpenAI. It is easier to install (pip install openai-whisper) but can be slower for CPU-based transcription. It uses PyTorch and requires ffmpeg for audio pre-processing. Its output is not deterministic (repeated runs on the same audio can produce different results), but this can be addressed with a custom script.
  • whisper.cpp (C++): A high-performance C/C++ port of OpenAI's Whisper. It is significantly faster for CPU transcription and is the recommended engine for server-side processing. It requires manual compilation but supports optimizations such as NVIDIA CUDA GPU acceleration (up to 30x faster). Note that it requires audio input to be 16 kHz, 1-channel (mono).
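
When testing whisper.cpp by hand, the 16 kHz mono requirement usually means converting the recording first. The following is a minimal sketch, assuming ffmpeg is installed. The function name to_whisper_wav and the DRY_RUN switch are illustrative only, and this guide does not say whether VoIPmonitor performs the conversion itself.

```shell
#!/bin/sh
# Sketch only: convert an audio file to 16 kHz, mono, 16-bit PCM WAV.
# to_whisper_wav and DRY_RUN are illustrative names, not VoIPmonitor features.
to_whisper_wav() {
    in_file=$1
    out_file=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "ffmpeg -y -i $in_file -ar 16000 -ac 1 -c:a pcm_s16le $out_file"
        return 0
    fi
    # -ar sets the sample rate, -ac the channel count, -c:a the audio codec.
    ffmpeg -y -i "$in_file" -ar 16000 -ac 1 -c:a pcm_s16le "$out_file"
}
```

For example, to_whisper_wav call.mp3 /tmp/audio.wav writes a file that whisper.cpp can read directly.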

Path A: On-Demand Transcription in the GUI

This setup allows users to manually trigger transcription from the call detail page. The processing occurs on the web server where the GUI is hosted.

Quick Start: Using Pre-built Model (No Compilation Required)

If you want to enable on-demand transcription without compiling or installing packages, you can download a pre-built model from the VoIPmonitor server and use it directly.

Step 1: Download the Pre-built Model

Download the Whisper model file directly to the GUI's bin/ directory:

# Download the base model to the default GUI directory
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/html/bin/ggml-base.bin

# For Debian-based systems where the GUI is in /var/www/voipmonitor/:
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/voipmonitor/bin/ggml-base.bin

Step 2: Set File Ownership

Ensure the web server user owns the model file:

# For Apache on Debian/Ubuntu (www-data user)
chown www-data:www-data /var/www/html/bin/ggml-base.bin

# For Apache on RedHat/CentOS (apache user)
chown apache:apache /var/www/html/bin/ggml-base.bin

# For GUI in /var/www/voipmonitor:
chown www-data:www-data /var/www/voipmonitor/bin/ggml-base.bin

Step 3: Verify

The "Transcribe" button should now appear on call detail pages in the GUI. No configuration changes are required when using the default /var/www/html/bin/ location.

Note: This method uses the whisper.cpp engine which is bundled with the GUI. The model file format (.bin) is compatible with the bundled engine.
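
If the button does not appear, one thing worth ruling out is a failed download that saved an error page instead of the model. The sketch below assumes that whisper.cpp model files begin with the GGML magic number 0x67676d6c, stored little-endian as the bytes 6c 6d 67 67. The function name looks_like_ggml is hypothetical.

```shell
#!/bin/sh
# Sketch only: check that a file is non-empty and starts with the GGML magic.
# Assumption: the first four bytes of a whisper.cpp model are 6c 6d 67 67.
looks_like_ggml() {
    model=$1
    # Reject missing or empty files.
    [ -s "$model" ] || return 1
    # Read the first four bytes as a hex string without spaces.
    magic=$(head -c 4 "$model" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "6c6d6767" ]
}
```

Running looks_like_ggml /var/www/html/bin/ggml-base.bin returns a non-zero exit status if the file is empty or does not look like a model.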

Option 1: Using the whisper.cpp Engine (Recommended)

Step 1: Install whisper.cpp and Download a Model

First, you need to compile the whisper.cpp project and download a pre-trained model on your GUI server.

# Clone the repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

# Compile the main application
make -j

# Download a model (e.g., 'base.en' for English-only, or 'small' for multilingual)
./models/download-ggml-model.sh base.en

This will create the main executable at ./main and download the model to the ./models/ directory. Note that whisper.cpp models (GGML format) are not binary compatible with the official OpenAI models (.pt format), but a conversion script is provided in the project.

Step 2: Configure the VoIPmonitor GUI

Edit your GUI's configuration file at /var/www/html/config/configuration.php and add the following definitions:

<?php
// /var/www/html/config/configuration.php

// Tell the GUI to use the whisper.cpp engine
define('WHISPER_NATIVE', true);

// Provide the absolute path to the model file you downloaded
define('WHISPER_MODEL', '/path/to/your/whisper.cpp/models/ggml-base.en.bin');

// Optional: Specify the number of threads for transcription
define('WHISPER_THREADS', 4);

No further setup is required. The GUI will now show a "Transcribe" button on call detail pages.
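
A wrong model path in configuration.php is hard to spot from the GUI alone. The sketch below extracts the WHISPER_MODEL value and checks that the file is readable. It assumes the define() line uses single quotes as in the example above, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: confirm that the model path defined in configuration.php exists.
# Both function names are illustrative and not part of VoIPmonitor.

# Print the value of the last define('WHISPER_MODEL', '...') in file $1.
whisper_model_from_config() {
    sed -n "s/^[[:space:]]*define('WHISPER_MODEL',[[:space:]]*'\([^']*\)');.*/\1/p" "$1" \
        | tail -n 1
}

# Report whether that model file is readable by the current user.
check_gui_whisper_model() {
    model=$(whisper_model_from_config "$1")
    if [ -z "$model" ]; then
        echo "WHISPER_MODEL is not defined in $1"
        return 1
    fi
    if [ ! -r "$model" ]; then
        echo "model not readable: $model"
        return 1
    fi
    echo "ok: $model"
}
```

The readability test applies to whoever runs the script, so running it as the web server user (for example through sudo -u www-data) gives the more meaningful answer.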

Option 2: Using the OpenAI Whisper Engine

Step 1: Install the Python Package and Dependencies

# Install the whisper library via pip
pip install openai-whisper

# Install ffmpeg, which is required for audio conversion
# For Debian/Ubuntu
sudo apt-get install ffmpeg
# For CentOS/RHEL/Fedora (ffmpeg may require an additional repository such as RPM Fusion)
sudo dnf install ffmpeg

Step 2: Prepare the Model and Configure the GUI

The Python library can download models automatically, but it's best practice to specify an absolute path in the configuration. First, trigger a download to a known location.

# This command will download the 'small' model to /opt/whisper_models/
# You can use any audio file; its content doesn't matter for the download.
whisper audio.wav --model=small --model_dir=/opt/whisper_models
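If no recording is at hand, a short silent file is enough to trigger the download. The sketch below generates one with ffmpeg's built-in anullsrc source; make_silent_wav is a hypothetical helper name, and the 16 kHz mono format is simply a convenient choice for the example.

```shell
#!/bin/sh
# Hypothetical helper: write N seconds of 16 kHz mono silence to a WAV file.
make_silent_wav() {
    out="$1"
    seconds="${2:-1}"
    command -v ffmpeg >/dev/null 2>&1 || { echo "ffmpeg not found"; return 127; }
    # -f lavfi reads from a filter graph; anullsrc produces silence
    ffmpeg -y -loglevel error -f lavfi -i anullsrc=r=16000:cl=mono \
        -t "$seconds" "$out"
}

# Example:
# make_silent_wav audio.wav && whisper audio.wav --model=small --model_dir=/opt/whisper_models
```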

Now, edit /var/www/html/config/configuration.php and provide the full path to the downloaded model file.

<?php
// /var/www/html/config/configuration.php

// Provide the absolute path to the downloaded .pt model file.
define('WHISPER_MODEL', '/opt/whisper_models/small.pt');

// Optional: Specify the number of threads
define('WHISPER_THREADS', 4);

Testing the GUI Integration

You can test the transcription process from the command line as the GUI would run it. This is useful for debugging paths and performance.

# Example test for a whisper.cpp setup
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/your/whisper.cpp/models/ggml-small.bin"},{"whisper_threads":"2"}]' -v1,whisper
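The --json_config value is easy to mistype by hand. As a small sketch, the helper below builds it in the same shape as the example above (a JSON array of single-key objects). vm_json_config is a hypothetical name, and only the three keys that appear in the example are used.

```shell
#!/bin/sh
# Hypothetical helper: build the --json_config value for a whisper.cpp test run.
# Values are inserted verbatim, so they must not contain double quotes.
vm_json_config() {
    native="$1"
    model="$2"
    threads="$3"
    printf '[{"whisper_native":"%s"},{"whisper_model":"%s"},{"whisper_threads":"%s"}]\n' \
        "$native" "$model" "$threads"
}

# Example:
# /var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' \
#     --json_config="$(vm_json_config yes /path/to/your/whisper.cpp/models/ggml-small.bin 2)" \
#     -v1,whisper
```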

Path B: Automatic Transcription in the Sniffer

This setup automatically transcribes calls in the background on the sensor itself. This is a headless operation and requires configuration in voipmonitor.conf. Using whisper.cpp is strongly recommended for this server-side task due to its superior performance.

Step 1: Prepare Your Engine on the Sensor

You must have one of the Whisper engines installed on the sensor machine.

For whisper.cpp

Follow the installation steps from "Path A" to compile whisper.cpp. For advanced integration, you may need to build the shared libraries and install them system-wide (see Advanced section below).

For OpenAI Whisper

Follow the Python package installation steps from "Path A".

Step 2: Configure the Sniffer

Edit /etc/voipmonitor.conf on your sensor to enable and control automatic transcription. You have three main ways to integrate it.

Option 1: Using whisper.cpp (Recommended)

This uses the compiled main executable.

# /etc/voipmonitor.conf

# Enable the transcription feature
audio_transcribe = yes

# Tell the sniffer to use the high-performance C++ engine
whisper_native = yes

# --- CRITICAL ---
# You MUST provide the absolute path to the downloaded whisper.cpp model file
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin
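Because a wrong model path is the most likely mistake here, a quick pre-flight check before restarting the sniffer can save time. This is a sketch under stated assumptions: conf_get and check_sniffer_whisper are hypothetical helpers, and the parser only handles plain "key = value" lines, not every syntax the sniffer may accept.

```shell
#!/bin/sh
# Hypothetical helper: print the value of "key = value" from a
# voipmonitor.conf-style file, ignoring commented-out lines.
# If the key appears several times, the last occurrence is printed.
conf_get() {
    key="$1"
    file="$2"
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p" "$file" | tail -n 1
}

# Hypothetical helper: check the settings described in this section.
check_sniffer_whisper() {
    conf="$1"
    [ "$(conf_get audio_transcribe "$conf")" = "yes" ] || { echo "audio_transcribe is not enabled"; return 1; }
    if [ "$(conf_get whisper_native "$conf")" = "yes" ]; then
        model="$(conf_get whisper_model "$conf")"
        [ -f "$model" ] || { echo "model file not found: $model"; return 1; }
    fi
    echo "ok"
}

# Example:
# check_sniffer_whisper /etc/voipmonitor.conf
```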

Option 2: Using OpenAI Whisper

This uses the Python library.

# /etc/voipmonitor.conf

# Enable the transcription feature
audio_transcribe = yes

# Use the Python engine (this is the default, but explicit is better)
whisper_native = no

# Specify the model name to use ('small' is a good default).
# The library will download it to ~/.cache/whisper/ if not found.
whisper_model = small
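The ~ in the cache path refers to the home directory of the account that runs the sniffer, which depends on your setup. The sketch below prints the location the Python library uses by default ($XDG_CACHE_HOME/whisper/ if that variable is set, otherwise ~/.cache/whisper/) so you can check whether the model is already there. whisper_cache_path is a hypothetical helper name.

```shell
#!/bin/sh
# Print where openai-whisper caches a model by default.
# Run it as the same user that runs the sniffer.
whisper_cache_path() {
    name="$1"
    # No colon in the expansion: only an unset variable triggers the fallback
    echo "${XDG_CACHE_HOME-$HOME/.cache}/whisper/$name.pt"
}

# Example:
# ls -l "$(whisper_cache_path small)"
```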

Option 3: Using whisper.cpp as a Loadable Module (Advanced)

This method allows you to update the whisper.cpp library without recompiling the entire sniffer. It requires a modified whisper.cpp build (see Advanced section).

# /etc/voipmonitor.conf

audio_transcribe = yes
whisper_native = yes
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin

# Specify the path to the compiled shared library
whisper_native_lib = /path/to/your/whisper.cpp/libwhisper.so
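Before pointing the sniffer at the library, you may want to confirm that the file is a shared object that actually exports the whisper.cpp API. The sketch below uses nm from binutils; exports_symbol is a hypothetical helper name, and whisper_full is used only as a representative function from whisper.h. It does not tell you whether the loadable-module patch was applied.

```shell
#!/bin/sh
# Hypothetical helper: check that a shared library exports a given symbol.
exports_symbol() {
    lib="$1"
    sym="$2"
    [ -f "$lib" ] || { echo "not found: $lib"; return 1; }
    # nm -D lists the dynamic symbol table; the symbol name is the last field
    nm -D --defined-only "$lib" 2>/dev/null |
        awk -v s="$sym" '$NF == s { found = 1 } END { exit !found }'
}

# Example:
# exports_symbol /path/to/your/whisper.cpp/libwhisper.so whisper_full && echo "looks fine"
```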

Step 3: Fine-Tuning Transcription Parameters

The following parameters in voipmonitor.conf allow you to control the transcription process:

audio_transcribe = yes
    (Default: no) Enables the audio transcription feature.

audio_transcribe_connect_duration_min = 10
    (Default: 10) Only transcribes calls that were connected for at least this many seconds.

audio_transcribe_threads = 2
    (Default: 2) The number of calls to transcribe concurrently.

audio_transcribe_queue_length_max = 100
    (Default: 100) The maximum number of calls waiting in the transcription queue.

whisper_native = no
    (Default: no) Set to yes to force the use of the whisper.cpp engine.

whisper_model = small
    For OpenAI Whisper, this is the model name (tiny, base, small, etc.). For whisper.cpp, this must be the full, absolute path to the .bin model file.

whisper_language = auto
    (Default: auto) Can be a specific language code (e.g., en, de), auto for detection, or by_number to guess based on the phone number's country code.

whisper_threads = 2
    (Default: 2) The number of CPU threads to use for a single transcription job.

whisper_timeout = 300
    (Default: 300) For OpenAI Whisper only. Maximum time in seconds for a single transcription.

whisper_deterministic_mode = yes
    (Default: yes) For OpenAI Whisper only. Aims for more consistent, repeatable transcription results.

whisper_python = /usr/bin/python3
    (Default: not set) For OpenAI Whisper only. Specifies the path to the Python binary if it's not in the system's PATH.

whisper_native_lib = /path/to/libwhisper.so
    (Default: not set) For whisper.cpp only. Specifies the path to the shared library when using the loadable module method.
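Since audio_transcribe_threads sets how many calls are processed at once and whisper_threads sets the threads per call, the two multiply. The sketch below works through that arithmetic as a rough sizing aid. It assumes every job uses its full thread allowance at the same time, which is a worst case rather than a measured figure, and both function names are hypothetical.

```shell
#!/bin/sh
# Worst-case number of busy CPU threads:
# audio_transcribe_threads * whisper_threads.
peak_transcribe_threads() {
    n_jobs="$1"
    per_job="$2"
    echo $(( n_jobs * per_job ))
}

# Warn when the worst case exceeds the number of available cores.
check_thread_budget() {
    peak="$(peak_transcribe_threads "$1" "$2")"
    cores="$3"
    if [ "$peak" -gt "$cores" ]; then
        echo "oversubscribed: $peak threads on $cores cores"
        return 1
    fi
    echo "ok: $peak threads on $cores cores"
}

# Example with the default values:
# check_thread_budget 2 2 "$(nproc)"
```

With the defaults (2 and 2), up to 4 threads can be busy with transcription while the sniffer is also capturing traffic.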

Advanced Topics

Compiling whisper.cpp with Libraries for Sniffer Integration

To compile the VoIPmonitor sniffer with built-in whisper.cpp support or to use it as a loadable library, you must build its shared and static libraries.

1. Build the libraries

cd /path/to/your/whisper.cpp
# Build the main executable, shared lib, and static lib
make -j
make libwhisper.so -j
make libwhisper.a -j

2. (Optional) Apply patch for loadable module

For the advanced "loadable module" integration (whisper_native_lib), a patch is required.

# Inside the whisper.cpp directory
patch < whisper.diff
make clean
make -j
make libwhisper.so -j

3. Install libraries and headers

For the sniffer's build process to find the whisper.cpp components, place them in standard system locations or create symbolic links.

# Create symbolic links to the compiled files in your whisper.cpp directory
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
ln -s $(pwd)/ggml.h /usr/local/include/ggml.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so
ln -s $(pwd)/libwhisper.a /usr/local/lib64/libwhisper.a
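Symbolic links created from the wrong directory end up dangling, which only shows up later as a confusing build error. As a sketch, the hypothetical helper below checks that the four files linked above resolve to real files; the directory arguments are whatever locations you chose.

```shell
#!/bin/sh
# Hypothetical helper: verify that the whisper.cpp headers and libraries
# are reachable in the given include and library directories.
check_whisper_install() {
    inc="$1"
    lib="$2"
    status=0
    for f in "$inc/whisper.h" "$inc/ggml.h" "$lib/libwhisper.so" "$lib/libwhisper.a"; do
        # -e follows symlinks, so a dangling link is reported as missing
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}

# Example:
# check_whisper_install /usr/local/include /usr/local/lib64
```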

CUDA Acceleration for whisper.cpp

Compiling whisper.cpp with NVIDIA CUDA support can speed up transcription substantially (up to 30x). This is highly recommended if you have a compatible NVIDIA GPU on your sensor or GUI server.

1. Install the NVIDIA CUDA Toolkit

Follow the official guide for your Linux distribution.

2. Set environment variables

Ensure the CUDA toolkit is in your system's path. You can add these lines to your ~/.bashrc file.

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Verify with nvcc --version.

3. Re-compile whisper.cpp with the CUDA flag

cd /path/to/your/whisper.cpp
make clean
# Rebuild the executable and libraries with CUDA enabled
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
WHISPER_CUDA=1 make libwhisper.a -j

VoIPmonitor will automatically detect and use the CUDA-enabled whisper.cpp binary or library.
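To confirm that the rebuild actually picked up CUDA, one option is to look at what the resulting binary or library links against. The sketch below is a heuristic only: links_cuda is a hypothetical helper, and it assumes the CUDA runtime is linked dynamically, so a negative result is not conclusive.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a binary or shared library is dynamically
# linked against the CUDA runtime or cuBLAS.
links_cuda() {
    bin="$1"
    [ -e "$bin" ] || { echo "not found: $bin"; return 1; }
    ldd "$bin" 2>/dev/null | grep -E -q 'libcudart|libcublas'
}

# Example:
# links_cuda /path/to/your/whisper.cpp/libwhisper.so && echo "CUDA build"
```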

AI Summary for RAG

Summary: This guide explains how to integrate OpenAI's Whisper ASR for call transcription in VoIPmonitor. It details two primary methods: on-demand transcription from the GUI (user-triggered) and automatic background transcription on the sniffer (server-side). For GUI on-demand transcription without installing packages or compiling, users can download a pre-built Whisper model directly from https://download.voipmonitor.org/whisper/ggml-base.bin to /var/www/html/bin/ and set ownership to the web server user (e.g., www-data:www-data). For more advanced setups, the guide compares the two available engines: the official Python OpenAI Whisper library and the high-performance C++ port, whisper.cpp, recommending whisper.cpp for server-side processing. It provides step-by-step instructions for compiling whisper.cpp from source, building its libraries, and installing the Python package via pip. It details the necessary configuration in both the GUI's configuration.php (e.g., WHISPER_NATIVE, WHISPER_MODEL) and the sniffer's voipmonitor.conf (e.g., audio_transcribe, whisper_native, whisper_native_lib). It also covers optional parameters for fine-tuning, such as setting language, thread counts, and minimum call duration. Finally, it includes a detailed section on enabling NVIDIA CUDA acceleration for whisper.cpp to achieve significant performance gains and explains advanced integration methods like using whisper.cpp as a loadable library.

Keywords: whisper, transcription, asr, speech to text, openai, whisper.cpp, audio_transcribe, whisper_native, whisper_model, whisper_native_lib, cuda, nvidia, gpu, acceleration, gui, sniffer, automatic transcription, on-demand, libwhisper.so, loadable module

Key Questions:

  • How can I transcribe phone calls in VoIPmonitor?
  • Can I enable on-demand transcription in the GUI without compiling or installing packages?
  • How do I fix the "required Whisper model is missing" error?
  • What is the simplest way to download the Whisper model for the GUI?
  • What is the difference between OpenAI Whisper and whisper.cpp? Which one should I use?
  • How do I configure on-demand call transcription in the GUI?
  • How do I set up the sniffer for automatic, server-side transcription of all calls?
  • What are the required parameters in voipmonitor.conf for Whisper?
  • How can I speed up Whisper transcription using an NVIDIA GPU (CUDA)?
  • How do I install and compile whisper.cpp, including its libraries (libwhisper.so)?
  • What do the audio_transcribe, whisper_native, and whisper_native_lib options do?
  • How do I use whisper.cpp as a loadable module in the sniffer?