From VoIPmonitor.org
{{DISPLAYTITLE:Call Transcription with Whisper AI}}


'''This guide explains how to integrate OpenAI's Whisper, a powerful automatic speech recognition (ASR) system, with VoIPmonitor for both on-demand and automatic call transcription.'''


== Introduction to Whisper Integration ==
VoIPmonitor integrates [https://openai.com/index/whisper/ Whisper], an ASR system from OpenAI trained on 680,000 hours of multilingual and multitask supervised data. This large and diverse dataset leads to improved robustness to accents, background noise, and technical language.


There are two primary ways to use Whisper with VoIPmonitor:
#'''On-Demand Transcription (in the GUI):''' The simplest method. A user can click a button on any call in the GUI to transcribe it. The processing happens on the GUI server.
#'''Automatic Transcription (in the Sniffer):''' A more advanced setup where the sensor automatically transcribes all calls (or a subset) in the background immediately after they finish.


For both methods, you must choose one of two underlying Whisper engines to install and configure.


;Choosing Your Whisper Engine
*'''OpenAI Whisper''' (Python): The official implementation from OpenAI. It is easier to install (<code>pip install openai-whisper</code>) but can be slower for CPU-based transcription. It uses PyTorch and requires `ffmpeg` for audio pre-processing. The official implementation does not behave deterministically (the same run can have different results), but this can be addressed with a custom script.
*'''whisper.cpp''' (C++): A high-performance C++ port of Whisper ([https://github.com/ggerganov/whisper.cpp whisper.cpp]). It is significantly faster for CPU transcription and is the '''recommended engine for server-side processing'''. It requires manual compilation but offers superior performance and optimizations such as NVIDIA CUDA GPU acceleration (up to 30x faster than CPU). Note that it requires audio input to be 16 kHz, single-channel (mono).
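If you need to prepare audio for `whisper.cpp` manually, a standard <code>ffmpeg</code> invocation can resample and downmix a recording to the required format (the file names below are placeholders):
<pre>
# Resample to 16 kHz and downmix to one (mono) channel, as required by whisper.cpp.
# "input.wav" and "output-16k-mono.wav" are example names.
ffmpeg -i input.wav -ar 16000 -ac 1 output-16k-mono.wav
</pre>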


== Path A: On-Demand Transcription in the GUI ==
This setup allows users to manually trigger transcription from the call detail page. The processing occurs on the web server where the GUI is hosted.


=== Option 1: Using the `whisper.cpp` Engine (Recommended) ===


==== Step 1: Install `whisper.cpp` and Download a Model ====
First, you need to compile the `whisper.cpp` project and download a pre-trained model on your GUI server.
<pre>
# Clone the repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp


# Compile the main application
make -j


# Download a model (e.g., 'base.en' for English-only, or 'small' for multilingual)
./models/download-ggml-model.sh base.en
</pre>
This will create the main executable at <code>./main</code> and download the model to the <code>./models/</code> directory. Note that `whisper.cpp` models (GGML format) are not binary compatible with the official OpenAI models (.pt format), but a conversion script is provided in the project.
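Before wiring `whisper.cpp` into the GUI, you can verify the build from the command line. The project ships a sample clip (<code>samples/jfk.wav</code>); the model path below assumes the <code>base.en</code> download from the previous step:
<pre>
# Quick sanity check: transcribe the bundled sample clip
# (run from inside the whisper.cpp directory)
./main -m models/ggml-base.en.bin -f samples/jfk.wav
</pre>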


==== Step 2: Configure the VoIPmonitor GUI ====
Edit your GUI's configuration file at <code>/var/www/html/config/configuration.php</code> and add the following definitions:
<pre>
<?php
// /var/www/html/config/configuration.php


// Tell the GUI to use the whisper.cpp engine
define('WHISPER_NATIVE', true);


// Provide the absolute path to the model file you downloaded
define('WHISPER_MODEL', '/path/to/your/whisper.cpp/models/ggml-base.en.bin');


// Optional: Specify the number of threads for transcription
define('WHISPER_THREADS', 4);
</pre>
No further setup is required. The GUI will now show a "Transcribe" button on call detail pages.


=== Option 2: Using the `OpenAI Whisper` Engine ===


==== Step 1: Install the Python Package and Dependencies ====
<pre>
# Install the whisper library via pip
pip install openai-whisper


# Install ffmpeg, which is required for audio conversion
# For Debian/Ubuntu
sudo apt-get install ffmpeg
# For CentOS/RHEL/Fedora
sudo dnf install ffmpeg
</pre>
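To confirm the Python engine is usable before configuring the GUI, a quick sanity check of the package and CLI is enough:
<pre>
# Verify the library imports and the whisper CLI is on PATH
python3 -c "import whisper; print('whisper import OK')"
whisper --help
</pre>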


==== Step 2: Prepare the Model and Configure the GUI ====
The Python library can download models automatically, but it's best practice to specify an absolute path in the configuration. First, trigger a download to a known location.
<pre>
# This command will download the 'small' model to /opt/whisper_models/
# You can use any audio file; its content doesn't matter for the download.
whisper audio.wav --model=small --model_dir=/opt/whisper_models
</pre>
Now, edit <code>/var/www/html/config/configuration.php</code> and provide the full path to the downloaded model file.
<pre>
<?php
// /var/www/html/config/configuration.php


// Provide the absolute path to the downloaded .pt model file.
define('WHISPER_MODEL', '/opt/whisper_models/small.pt');


// Optional: Specify the number of threads
define('WHISPER_THREADS', 4);
</pre>


=== Testing the GUI Integration ===
You can test the transcription process from the command line as the GUI would run it. This is useful for debugging paths and performance.
<pre>
# Example test for a whisper.cpp setup
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/your/whisper.cpp/models/ggml-small.bin"},{"whisper_threads":"2"}]' -v1,whisper
</pre>


== Path B: Automatic Transcription in the Sniffer ==
This setup automatically transcribes calls in the background on the sensor itself. This is a headless operation and requires configuration in <code>voipmonitor.conf</code>. Using '''<code>whisper.cpp</code> is strongly recommended''' for this server-side task due to its superior performance.


=== Step 1: Prepare Your Engine on the Sensor ===
You must have one of the Whisper engines installed '''on the sensor machine'''.


;For `whisper.cpp`:
Follow the installation steps from "Path A" to compile `whisper.cpp`. For advanced integration, you may need to build the shared libraries and install them system-wide (see Advanced section below).


;For `OpenAI Whisper`:
Follow the Python package installation steps from "Path A".


=== Step 2: Configure the Sniffer ===
Edit <code>/etc/voipmonitor.conf</code> on your sensor to enable and control automatic transcription. You have three main ways to integrate it.


==== Option 1: Using `whisper.cpp` (Recommended) ====
This uses the compiled `main` executable.
<pre>
# /etc/voipmonitor.conf


# Enable the transcription feature
audio_transcribe = yes


# Tell the sniffer to use the high-performance C++ engine
whisper_native = yes


# --- CRITICAL ---
# You MUST provide the absolute path to the downloaded whisper.cpp model file
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin
</pre>
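After editing <code>voipmonitor.conf</code>, restart the sensor service so the new settings take effect (the service name <code>voipmonitor</code> is the usual default; it may differ on your installation):
<pre>
# Apply the new configuration
systemctl restart voipmonitor
</pre>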


==== Option 2: Using `OpenAI Whisper` ====
This uses the Python library.
<pre>
# /etc/voipmonitor.conf


# Enable the transcription feature
audio_transcribe = yes


# Use the Python engine (this is the default, but explicit is better)
whisper_native = no


# Specify the model name to use ('small' is a good default).
# The library will download it to ~/.cache/whisper/ if not found.
whisper_model = small
</pre>


==== Option 3: Using `whisper.cpp` as a Loadable Module (Advanced) ====
This method allows you to update the `whisper.cpp` library without recompiling the entire sniffer. It requires a `whisper.cpp` build patched with `whisper.diff` and a sniffer binary compiled ''without'' integrated whisper.cpp support (see Advanced section).
<pre>
# /etc/voipmonitor.conf


audio_transcribe = yes
whisper_native = yes
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin


# Specify the path to the compiled shared library
whisper_native_lib = /path/to/your/whisper.cpp/libwhisper.so
</pre>


=== Step 3: Fine-Tuning Transcription Parameters ===
The following parameters in <code>voipmonitor.conf</code> allow you to control the transcription process:


;<code>audio_transcribe = yes</code>
: (Default: no) Enables the audio transcription feature. Transcription runs after the call is saved to the database, and the result is stored in the <code>cdr_audio_transcribe</code> table.
;<code>audio_transcribe_connect_duration_min = 10</code>
: (Default: 10) Only transcribes calls that were connected for at least this many seconds.
;<code>audio_transcribe_threads = 2</code>
: (Default: 2) The number of calls to transcribe concurrently.
;<code>audio_transcribe_queue_length_max = 100</code>
: (Default: 100) The maximum number of calls waiting in the transcription queue.
;<code>whisper_native = no</code>
: (Default: no) Set to `yes` to force the use of the `whisper.cpp` engine.
;<code>whisper_model = small</code>
: For `OpenAI Whisper`, this is the model name (tiny, base, small, etc.). For `whisper.cpp`, this '''must''' be the full, absolute path to the `.bin` model file.
;<code>whisper_language = auto</code>
: (Default: auto) Can be a specific language code (e.g., <code>en</code>, <code>de</code>), `auto` for detection, or <code>by_number</code> to guess based on the phone number's country code.
;<code>whisper_threads = 2</code>
: (Default: 2) The number of CPU threads to use for a ''single'' transcription job.
;<code>whisper_timeout = 300</code>
: (Default: 300) For `OpenAI Whisper` only. Maximum time in seconds for a single transcription.
;<code>whisper_deterministic_mode = yes</code>
: (Default: yes) For `OpenAI Whisper` only. Aims for more consistent, repeatable transcription results.
;<code>whisper_python = /usr/bin/python3</code>
: (Default: not set) For `OpenAI Whisper` only. Specifies the path to the Python binary if it's not in the system's `PATH`.
;<code>whisper_native_lib = /path/to/libwhisper.so</code>
: (Default: not set) For `whisper.cpp` only. Specifies the path to the shared library when using the loadable module method.


== Advanced Topics ==


=== Compiling `whisper.cpp` with Libraries for Sniffer Integration ===
To compile the VoIPmonitor sniffer with built-in `whisper.cpp` support or to use it as a loadable library, you must build its shared and static libraries.


;1. Build the libraries:
<pre>
cd /path/to/your/whisper.cpp
# Build the main executable, shared lib, and static lib
make -j
make libwhisper.so -j
make libwhisper.a -j
</pre>


;2. (Optional) Apply patch for loadable module:
For the advanced "loadable module" integration (<code>whisper_native_lib</code>), a patch is required.
<pre>
# Inside the whisper.cpp directory
patch < whisper.diff
make clean
make -j
make libwhisper.so -j
</pre>


;3. Install libraries and headers:
 
For the sniffer's build process to find the `whisper.cpp` components, place them in standard system locations or create symbolic links.
<pre>
 
# Create symbolic links to the compiled files in your whisper.cpp directory
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
 
ln -s $(pwd)/ggml.h /usr/local/include/ggml.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so
ln -s $(pwd)/libwhisper.a /usr/local/lib64/libwhisper.a
 
</pre>
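After placing the libraries in <code>/usr/local/lib64</code>, refresh the dynamic linker cache so the sniffer can locate them at runtime (run as root):
<pre>
# Rebuild the shared-library cache
ldconfig
# Confirm the library is now visible to the linker
ldconfig -p | grep libwhisper
</pre>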


=== CUDA Acceleration for `whisper.cpp` ===
To achieve a massive speed increase (up to 30x), you can compile <code>whisper.cpp</code> with NVIDIA CUDA support. This is highly recommended if you have a compatible NVIDIA GPU on your sensor or GUI server.


;1. Install the NVIDIA CUDA Toolkit:
Follow the [https://developer.nvidia.com/cuda-downloads official guide] for your Linux distribution.


;2. Set environment variables:
Ensure the CUDA toolkit is in your system's path. You can add these lines to your <code>~/.bashrc</code> file.
<pre>
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
</pre>
Verify with <code>nvcc --version</code>.


;3. Re-compile `whisper.cpp` with the CUDA flag:
<pre>
cd /path/to/your/whisper.cpp
make clean
# Rebuild the executable and libraries with CUDA enabled
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
WHISPER_CUDA=1 make libwhisper.a -j
 
</pre>
VoIPmonitor will automatically detect and use the CUDA-enabled `whisper.cpp` binary or library.
== AI Summary for RAG ==
'''Summary:''' This guide explains how to integrate OpenAI's Whisper ASR for call transcription in VoIPmonitor. It details two primary methods: on-demand transcription from the GUI and automatic background transcription on the sniffer. For both methods, it compares the two available engines: the official Python `OpenAI Whisper` library and the high-performance C++ port, `whisper.cpp`, recommending `whisper.cpp` for server-side processing. The article provides step-by-step instructions for installing each engine, including compiling `whisper.cpp` from source, building its libraries, and installing the Python package via `pip`. It details the necessary configuration in both the GUI's `configuration.php` (e.g., `WHISPER_NATIVE`, `WHISPER_MODEL`) and the sniffer's `voipmonitor.conf` (e.g., `audio_transcribe`, `whisper_native`, `whisper_native_lib`). It also covers optional parameters for fine-tuning, such as setting language, thread counts, and minimum call duration. Finally, it includes a detailed section on enabling NVIDIA CUDA acceleration for `whisper.cpp` to achieve significant performance gains and explains advanced integration methods like using `whisper.cpp` as a loadable library.
'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, `audio_transcribe`, `whisper_native`, `whisper_model`, `whisper_native_lib`, cuda, nvidia, gpu, acceleration, gui, sniffer, automatic transcription, on-demand, libwhisper.so, loadable module
'''Key Questions:'''
* How can I transcribe phone calls in VoIPmonitor?
* What is the difference between OpenAI Whisper and whisper.cpp? Which one should I use?
* How do I configure on-demand call transcription in the GUI?
* How do I set up the sniffer for automatic, server-side transcription of all calls?
* What are the required parameters in `voipmonitor.conf` for Whisper?
* How can I speed up Whisper transcription using an NVIDIA GPU (CUDA)?
* How do I install and compile `whisper.cpp`, including its libraries (`libwhisper.so`)?
* What do the `audio_transcribe`, `whisper_native`, and `whisper_native_lib` options do?
* How do I use `whisper.cpp` as a loadable module in the sniffer?

Latest revision as of 02:09, 5 July 2025


This guide explains how to integrate OpenAI's Whisper, a powerful automatic speech recognition (ASR) system, with VoIPmonitor for both on-demand and automatic call transcription.

Introduction to Whisper Integration

VoIPmonitor integrates Whisper, an ASR system from OpenAI trained on 680,000 hours of multilingual and multitask supervised data. This large and diverse dataset leads to improved robustness to accents, background noise, and technical language.

There are two primary ways to use Whisper with VoIPmonitor:

  1. On-Demand Transcription (in the GUI): The simplest method. A user can click a button on any call in the GUI to transcribe it. The processing happens on the GUI server.
  2. Automatic Transcription (in the Sniffer): A more advanced setup where the sensor automatically transcribes all calls (or a subset) in the background immediately after they finish.

For both methods, you must choose one of two underlying Whisper engines to install and configure.

Choosing Your Whisper Engine
  • OpenAI Whisper (Python): The official implementation from OpenAI. It is easier to install (pip install openai-whisper) but can be slower for CPU-based transcription. It uses PyTorch and requires `ffmpeg` for audio pre-processing. The official implementation does not behave deterministically (the same run can have different results), but this can be addressed with a custom script.
  • whisper.cpp (C++): A high-performance C++ port of whisper.cpp. It is significantly faster for CPU transcription and is the recommended engine for server-side processing. It requires manual compilation but offers superior performance and optimizations like NVIDIA CUDA for GPU acceleration (up to 30x faster). Note that it requires audio input to be 16kHz, 1-channel (mono).

Path A: On-Demand Transcription in the GUI

This setup allows users to manually trigger transcription from the call detail page. The processing occurs on the web server where the GUI is hosted.

Option 1: Using the `whisper.cpp` Engine (Recommended)

Step 1: Install `whisper.cpp` and Download a Model

First, you need to compile the `whisper.cpp` project and download a pre-trained model on your GUI server.

# Clone the repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

# Compile the main application
make -j

# Download a model (e.g., 'base.en' for English-only, or 'small' for multilingual)
./models/download-ggml-model.sh base.en

This will create the main executable at ./main and download the model to the ./models/ directory. Note that `whisper.cpp` models (GGML format) are not binary compatible with the official OpenAI models (.pt format), but a conversion script is provided in the project.
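Before wiring up the GUI, you can sanity-check the build against the sample recording bundled with the repository (paths are relative to the `whisper.cpp` checkout; `-m` selects the model, `-f` the input file, `-t` the thread count):

```shell
# Transcribe the bundled JFK sample with the model downloaded above;
# a short English transcript printed to stdout confirms the build works.
./main -m models/ggml-base.en.bin -f samples/jfk.wav -t 4
```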

Step 2: Configure the VoIPmonitor GUI

Edit your GUI's configuration file at /var/www/html/config/configuration.php and add the following definitions:

<?php
// /var/www/html/config/configuration.php

// Tell the GUI to use the whisper.cpp engine
define('WHISPER_NATIVE', true);

// Provide the absolute path to the model file you downloaded
define('WHISPER_MODEL', '/path/to/your/whisper.cpp/models/ggml-base.en.bin');

// Optional: Specify the number of threads for transcription
define('WHISPER_THREADS', 4);

No further setup is required. The GUI will now show a "Transcribe" button on call detail pages.

Option 2: Using the `OpenAI Whisper` Engine

Step 1: Install the Python Package and Dependencies

# Install the whisper library via pip
pip install openai-whisper

# Install ffmpeg, which is required for audio conversion
# For Debian/Ubuntu
sudo apt-get install ffmpeg
# For CentOS/RHEL/Fedora
sudo dnf install ffmpeg

Step 2: Prepare the Model and Configure the GUI

The Python library can download models automatically, but it's best practice to specify an absolute path in the configuration. First, trigger a download to a known location.

# This command will download the 'small' model to /opt/whisper_models/
# You can use any audio file; its content doesn't matter for the download.
whisper audio.wav --model=small --model_dir=/opt/whisper_models
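After the command finishes, verify that the `.pt` file exists at the path you will put into configuration.php (the directory is the example used above):

```shell
# The GUI needs the absolute path to this file in WHISPER_MODEL
ls -lh /opt/whisper_models/small.pt
```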

Now, edit /var/www/html/config/configuration.php and provide the full path to the downloaded model file.

<?php
// /var/www/html/config/configuration.php

// Provide the absolute path to the downloaded .pt model file.
define('WHISPER_MODEL', '/opt/whisper_models/small.pt');

// Optional: Specify the number of threads
define('WHISPER_THREADS', 4);

Testing the GUI Integration

You can test the transcription process from the command line as the GUI would run it. This is useful for debugging paths and performance.

# Example test for a whisper.cpp setup
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/your/whisper.cpp/models/ggml-small.bin"},{"whisper_threads":"2"}]' -v1,whisper

Path B: Automatic Transcription in the Sniffer

This setup automatically transcribes calls in the background on the sensor itself. This is a headless operation and requires configuration in voipmonitor.conf. Using `whisper.cpp` is strongly recommended for this server-side task due to its superior performance.

Step 1: Prepare Your Engine on the Sensor

You must have one of the Whisper engines installed on the sensor machine.

For `whisper.cpp`

Follow the installation steps from "Path A" to compile `whisper.cpp`. For advanced integration, you may need to build the shared libraries and install them system-wide (see Advanced section below).

For `OpenAI Whisper`

Follow the Python package installation steps from "Path A".

Step 2: Configure the Sniffer

Edit /etc/voipmonitor.conf on your sensor to enable and control automatic transcription. You have three main ways to integrate it.

Option 1: Using `whisper.cpp` (Recommended)

This uses the compiled `main` executable.

# /etc/voipmonitor.conf

# Enable the transcription feature
audio_transcribe = yes

# Tell the sniffer to use the high-performance C++ engine
whisper_native = yes

# --- CRITICAL ---
# You MUST provide the absolute path to the downloaded whisper.cpp model file
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin

Option 2: Using `OpenAI Whisper`

This uses the Python library.

# /etc/voipmonitor.conf

# Enable the transcription feature
audio_transcribe = yes

# Use the Python engine (this is the default, but explicit is better)
whisper_native = no

# Specify the model name to use ('small' is a good default).
# The library will download it to ~/.cache/whisper/ if not found.
whisper_model = small

Option 3: Using `whisper.cpp` as a Loadable Module (Advanced)

This method allows you to update the `whisper.cpp` library without recompiling the entire sniffer. It requires a modified `whisper.cpp` build (see Advanced section).

# /etc/voipmonitor.conf

audio_transcribe = yes
whisper_native = yes
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin

# Specify the path to the compiled shared library
whisper_native_lib = /path/to/your/whisper.cpp/libwhisper.so

Step 3: Fine-Tuning Transcription Parameters

The following parameters in voipmonitor.conf allow you to control the transcription process:

audio_transcribe = yes
(Default: no) Enables the audio transcription feature.
audio_transcribe_connect_duration_min = 10
(Default: 10) Only transcribes calls that were connected for at least this many seconds.
audio_transcribe_threads = 2
(Default: 2) The number of calls to transcribe concurrently.
audio_transcribe_queue_length_max = 100
(Default: 100) The maximum number of calls waiting in the transcription queue.
whisper_native = no
(Default: no) Set to `yes` to force the use of the `whisper.cpp` engine.
whisper_model = small
For `OpenAI Whisper`, this is the model name (tiny, base, small, etc.). For `whisper.cpp`, this must be the full, absolute path to the `.bin` model file.
whisper_language = auto
(Default: auto) Can be a specific language code (e.g., `en`, `de`), `auto` for automatic detection, or `by_number` to guess the language from the phone number's country code.
whisper_threads = 2
(Default: 2) The number of CPU threads to use for a single transcription job.
whisper_timeout = 300
(Default: 300) For `OpenAI Whisper` only. Maximum time in seconds for a single transcription.
whisper_deterministic_mode = yes
(Default: yes) For `OpenAI Whisper` only. Aims for more consistent, repeatable transcription results.
whisper_python = /usr/bin/python3
(Default: not set) For `OpenAI Whisper` only. Specifies the path to the Python binary if it's not in the system's `PATH`.
whisper_native_lib = /path/to/libwhisper.so
(Default: not set) For `whisper.cpp` only. Specifies the path to the shared library when using the loadable module method.
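Putting the tuning parameters together, a typical voipmonitor.conf fragment for automatic `whisper.cpp` transcription of longer calls might look like this (paths and values are illustrative):

```ini
# /etc/voipmonitor.conf -- illustrative example
audio_transcribe = yes
audio_transcribe_connect_duration_min = 30
audio_transcribe_threads = 2
audio_transcribe_queue_length_max = 100
whisper_native = yes
whisper_model = /opt/whisper.cpp/models/ggml-small.bin
whisper_language = auto
whisper_threads = 4
```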

Advanced Topics

Compiling `whisper.cpp` with Libraries for Sniffer Integration

To compile the VoIPmonitor sniffer with built-in `whisper.cpp` support or to use it as a loadable library, you must build its shared and static libraries.

1. Build the libraries
cd /path/to/your/whisper.cpp
# Build the main executable, shared lib, and static lib
make -j
make libwhisper.so -j
make libwhisper.a -j
2. (Optional) Apply patch for loadable module

For the advanced "loadable module" integration (whisper_native_lib), a patch is required.

# Inside the whisper.cpp directory
patch < whisper.diff
make clean
make -j
make libwhisper.so -j
3. Install libraries and headers

For the sniffer's build process to find the `whisper.cpp` components, place them in standard system locations or create symbolic links.

# Create symbolic links to the compiled files in your whisper.cpp directory
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
ln -s $(pwd)/ggml.h /usr/local/include/ggml.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so
ln -s $(pwd)/libwhisper.a /usr/local/lib64/libwhisper.a
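If you place the libraries in /usr/local/lib64, refresh the dynamic linker cache afterwards so the sniffer can resolve libwhisper.so at runtime:

```shell
# Rebuild the linker cache after adding the symlinks (requires root)
ldconfig
```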

CUDA Acceleration for `whisper.cpp`

To achieve a massive speed increase (up to 30x), you can compile `whisper.cpp` with NVIDIA CUDA support. This is highly recommended if you have a compatible NVIDIA GPU on your sensor or GUI server.

1. Install the NVIDIA CUDA Toolkit

Follow the official guide for your Linux distribution.

2. Set environment variables

Ensure the CUDA toolkit is in your system's path. You can add these lines to your ~/.bashrc file.

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Verify with nvcc --version.

3. Re-compile `whisper.cpp` with the CUDA flag
cd /path/to/your/whisper.cpp
make clean
# Rebuild the executable and libraries with CUDA enabled
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
WHISPER_CUDA=1 make libwhisper.a -j

VoIPmonitor will automatically detect and use the CUDA-enabled `whisper.cpp` binary or library.

AI Summary for RAG

Summary: This guide explains how to integrate OpenAI's Whisper ASR for call transcription in VoIPmonitor. It details two primary methods: on-demand transcription from the GUI and automatic background transcription on the sniffer. For both methods, it compares the two available engines: the official Python `OpenAI Whisper` library and the high-performance C++ port, `whisper.cpp`, recommending `whisper.cpp` for server-side processing. The article provides step-by-step instructions for installing each engine, including compiling `whisper.cpp` from source, building its libraries, and installing the Python package via `pip`. It details the necessary configuration in both the GUI's `configuration.php` (e.g., `WHISPER_NATIVE`, `WHISPER_MODEL`) and the sniffer's `voipmonitor.conf` (e.g., `audio_transcribe`, `whisper_native`, `whisper_native_lib`). It also covers optional parameters for fine-tuning, such as setting language, thread counts, and minimum call duration. Finally, it includes a detailed section on enabling NVIDIA CUDA acceleration for `whisper.cpp` to achieve significant performance gains and explains advanced integration methods like using `whisper.cpp` as a loadable library.

Keywords: whisper, transcription, asr, speech to text, openai, whisper.cpp, `audio_transcribe`, `whisper_native`, `whisper_model`, `whisper_native_lib`, cuda, nvidia, gpu, acceleration, gui, sniffer, automatic transcription, on-demand, libwhisper.so, loadable module

Key Questions:

  • How can I transcribe phone calls in VoIPmonitor?
  • What is the difference between OpenAI Whisper and whisper.cpp? Which one should I use?
  • How do I configure on-demand call transcription in the GUI?
  • How do I set up the sniffer for automatic, server-side transcription of all calls?
  • What are the required parameters in `voipmonitor.conf` for Whisper?
  • How can I speed up Whisper transcription using an NVIDIA GPU (CUDA)?
  • How do I install and compile `whisper.cpp`, including its libraries (`libwhisper.so`)?
  • What do the `audio_transcribe`, `whisper_native`, and `whisper_native_lib` options do?
  • How do I use `whisper.cpp` as a loadable module in the sniffer?