{{DISPLAYTITLE:Call Transcription with Whisper AI}}


This guide explains how to integrate OpenAI's Whisper, a powerful automatic speech recognition (ASR) system, with VoIPmonitor for both on-demand and automatic call transcription.


== Introduction to Whisper Integration ==
VoIPmonitor integrates [https://openai.com/index/whisper/ Whisper], an ASR system from OpenAI trained on 680,000 hours of multilingual and multitask supervised data. This large and diverse dataset leads to improved robustness to accents, background noise, and technical language.


There are two primary ways to use Whisper with VoIPmonitor:
# '''On-Demand Transcription (in the GUI):''' The simplest method. A user can click a button on any call in the GUI to transcribe it. The processing happens on the GUI server.
# '''Automatic Transcription (in the Sniffer):''' A more advanced setup where the sensor automatically transcribes all calls (or a subset) in the background immediately after they finish.

For both methods, you must choose one of two underlying Whisper engines to install and configure.


;Choosing Your Whisper Engine
*'''OpenAI Whisper''' (Python): The official implementation from OpenAI. It is easier to install (<code>pip install openai-whisper</code>) but can be slower for CPU-based transcription. It uses PyTorch and requires <code>ffmpeg</code> for audio pre-processing. The official implementation is not deterministic (identical runs can produce different results), but this can be addressed with a custom script.
*'''whisper.cpp''' (C++): A high-performance C++ port of Whisper ([https://github.com/ggerganov/whisper.cpp whisper.cpp]). It is significantly faster for CPU transcription and is the '''recommended engine for server-side processing'''. It requires manual compilation but offers superior performance and optimizations such as NVIDIA CUDA for GPU acceleration (up to 30x faster). Note that it requires audio input to be 16kHz, 1-channel (mono); see the conversion sketch below.
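Since the input must already be 16kHz mono, recordings may need converting before manual tests. A minimal sketch using <code>ffmpeg</code> (the file names are placeholders):
<pre>
# Resample to 16kHz, downmix to mono, 16-bit PCM WAV
ffmpeg -i input.wav -ar 16000 -ac 1 -c:a pcm_s16le output.wav
</pre>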


== Path A: On-Demand Transcription in the GUI ==
This setup allows users to manually trigger transcription from the call detail page. The processing occurs on the web server where the GUI is hosted.

=== Option 1: Using the <code>whisper.cpp</code> Engine (Recommended) ===


==== Step 1: Install <code>whisper.cpp</code> and Download a Model ====
First, you need to compile the <code>whisper.cpp</code> project and download a pre-trained model on your GUI server.
<pre>
# Clone the repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

# Compile the main application
make -j

# Download a model (e.g., 'base.en' for English-only, or 'small' for multilingual)
./models/download-ggml-model.sh base.en
</pre>
This will create the main executable at <code>./main</code> and download the model to the <code>./models/</code> directory. Note that <code>whisper.cpp</code> models (GGML format) are not binary compatible with the official OpenAI models (.pt format), but a conversion script is provided in the project.
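Before wiring the engine into the GUI, you can sanity-check the build with a standalone transcription; <code>samples/jfk.wav</code> is a sample bundled with the <code>whisper.cpp</code> repository (a sketch):
<pre>
# Transcribe a bundled sample to verify the binary and model work
./main -m models/ggml-base.en.bin -f samples/jfk.wav
</pre>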


==== Step 2: Configure the VoIPmonitor GUI ====
Edit your GUI's configuration file at <code>/var/www/html/config/configuration.php</code> and add the following definitions:
<pre>
<?php
// /var/www/html/config/configuration.php

// Tell the GUI to use the whisper.cpp engine
define('WHISPER_NATIVE', true);

// Provide the absolute path to the model file you downloaded
define('WHISPER_MODEL', '/path/to/your/whisper.cpp/models/ggml-base.en.bin');

// Optional: Specify the number of threads for transcription
define('WHISPER_THREADS', 4);
</pre>
No further setup is required. The GUI will now show a "Transcribe" button on call detail pages.

=== Option 2: Using the <code>OpenAI Whisper</code> Engine ===

==== Step 1: Install the Python Package and Dependencies ====
<pre>
# Install the whisper library via pip
pip install openai-whisper

# Install ffmpeg, which is required for audio conversion
# For Debian/Ubuntu
sudo apt-get install ffmpeg
# For CentOS/RHEL/Fedora
sudo dnf install ffmpeg
</pre>


==== Step 2: Prepare the Model and Configure the GUI ====
The Python library can download models automatically, but it's best practice to specify an absolute path in the configuration. First, trigger a download to a known location.
<pre>
# This command will download the 'small' model to /opt/whisper_models/
# You can use any audio file; its content doesn't matter for the download.
whisper audio.wav --model=small --model_dir=/opt/whisper_models
</pre>
Now, edit <code>/var/www/html/config/configuration.php</code> and provide the full path to the downloaded model file.
<pre>
<?php
// /var/www/html/config/configuration.php

// Provide the absolute path to the downloaded .pt model file.
define('WHISPER_MODEL', '/opt/whisper_models/small.pt');
 
// Optional: Specify the number of threads
define('WHISPER_THREADS', 4);
</pre>
 
=== Testing the GUI Integration ===
You can test the transcription process from the command line as the GUI would run it. This is useful for debugging paths and performance.
<pre>
# Example test for a whisper.cpp setup
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/your/whisper.cpp/models/ggml-small.bin"},{"whisper_threads":"2"}]' -v1,whisper
</pre>


== Path B: Automatic Transcription in the Sniffer ==
This setup automatically transcribes calls in the background on the sensor itself. This is a headless operation and requires configuration in <code>voipmonitor.conf</code>. Using '''<code>whisper.cpp</code> is strongly recommended''' for this server-side task due to its superior performance.


=== Step 1: Prepare Your Engine on the Sensor ===
You must have one of the Whisper engines installed '''on the sensor machine'''.
 
;For <code>whisper.cpp</code>:
Follow the installation steps from "Path A" to compile <code>whisper.cpp</code>. For advanced integration, you may need to build the shared libraries and install them system-wide (see the Advanced section below).
 
;For <code>OpenAI Whisper</code>:
Follow the Python package installation steps from "Path A"; a quick import check is shown below.
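As a sanity check (a minimal sketch; adjust the interpreter path if the sniffer uses a different one, see <code>whisper_python</code>), confirm the package imports and list the available model names:
<pre>
# Prints the model names (tiny, base, small, ...) if openai-whisper is installed
python3 -c "import whisper; print(whisper.available_models())"
</pre>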


=== Step 2: Configure the Sniffer ===
Edit <code>/etc/voipmonitor.conf</code> on your sensor to enable and control automatic transcription. You have three main ways to integrate it.


==== Option 1: Using <code>whisper.cpp</code> (Recommended) ====
This uses the compiled <code>main</code> executable.
<pre>
# /etc/voipmonitor.conf

# Enable the transcription feature
audio_transcribe = yes

# Tell the sniffer to use the high-performance C++ engine
whisper_native = yes

# --- CRITICAL ---
# You MUST provide the absolute path to the downloaded whisper.cpp model file
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin
</pre>
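After editing the configuration, restart the sniffer so the settings take effect, then watch the log for transcription activity. A sketch, assuming the usual <code>voipmonitor</code> service name and a syslog-based setup; the log location varies by distribution:
<pre>
systemctl restart voipmonitor
# Watch for transcription-related messages
tail -f /var/log/syslog | grep -i whisper
</pre>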


==== Option 2: Using <code>OpenAI Whisper</code> ====
This uses the Python library.
<pre>
# /etc/voipmonitor.conf

# Enable the transcription feature
audio_transcribe = yes

# Use the Python engine (this is the default, but explicit is better)
whisper_native = no

# Specify the model name to use ('small' is a good default).
# The library will download it to ~/.cache/whisper/ if not found.
whisper_model = small
</pre>


==== Option 3: Using <code>whisper.cpp</code> as a Loadable Module (Advanced) ====
This method allows you to update the <code>whisper.cpp</code> library without recompiling the entire sniffer. It requires a modified <code>whisper.cpp</code> build (see the Advanced section below).
<pre>
# /etc/voipmonitor.conf
 
audio_transcribe = yes
whisper_native = yes
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin
 
# Specify the path to the compiled shared library
whisper_native_lib = /path/to/your/whisper.cpp/libwhisper.so
</pre>
 
=== Step 3: Fine-Tuning Transcription Parameters ===
The following parameters in <code>voipmonitor.conf</code> control the transcription process; a combined example follows the list.


;<code>audio_transcribe = yes</code>
: (Default: no) Enables the audio transcription feature.
;<code>audio_transcribe_connect_duration_min = 10</code>
: (Default: 10) Only transcribes calls that were connected for at least this many seconds.
;<code>audio_transcribe_threads = 2</code>
: (Default: 2) The number of calls to transcribe concurrently.
;<code>audio_transcribe_queue_length_max = 100</code>
: (Default: 100) The maximum number of calls waiting in the transcription queue.
;<code>whisper_native = no</code>
: (Default: no) Set to <code>yes</code> to force the use of the <code>whisper.cpp</code> engine.
;<code>whisper_model = small</code>
: For <code>OpenAI Whisper</code>, this is the model name (tiny, base, small, etc.). For <code>whisper.cpp</code>, this '''must''' be the full, absolute path to the <code>.bin</code> model file.
;<code>whisper_language = auto</code>
: (Default: auto) Can be a specific language code (e.g., <code>en</code>, <code>de</code>), <code>auto</code> for detection, or <code>by_number</code> to guess based on the phone number's country code.
;<code>whisper_threads = 2</code>
: (Default: 2) The number of CPU threads to use for a ''single'' transcription job.
;<code>whisper_timeout = 300</code>
: (Default: 300) For <code>OpenAI Whisper</code> only. Maximum time in seconds for a single transcription.
;<code>whisper_deterministic_mode = yes</code>
: (Default: yes) For the <code>OpenAI Whisper</code> engine only. Aims for more consistent, repeatable transcription results.
;<code>whisper_python = /usr/bin/python3</code>
: (Default: not set) For <code>OpenAI Whisper</code> only. Specifies the path to the Python binary if it's not in the system's <code>PATH</code>.
;<code>whisper_native_lib = /path/to/libwhisper.so</code>
: (Default: not set) For <code>whisper.cpp</code> only. Specifies the path to the shared library when using the loadable module method.
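Putting these together, a sketch of a tuned <code>whisper.cpp</code> setup combining the parameters above (the model path is a placeholder):
<pre>
# /etc/voipmonitor.conf - example combining the tuning parameters above

audio_transcribe = yes
# Skip calls connected for less than 30 seconds
audio_transcribe_connect_duration_min = 30
# Transcribe two calls concurrently, four CPU threads each
audio_transcribe_threads = 2
whisper_threads = 4

whisper_native = yes
whisper_model = /path/to/your/whisper.cpp/models/ggml-small.bin
# Guess the language from the phone number's country code
whisper_language = by_number
</pre>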


== Advanced Topics ==
 
=== Compiling <code>whisper.cpp</code> with Libraries for Sniffer Integration ===
To compile the VoIPmonitor sniffer with built-in <code>whisper.cpp</code> support or to use it as a loadable library, you must build its shared and static libraries.
 
;1. Build the libraries:
<pre>
cd /path/to/your/whisper.cpp
# Build the main executable, shared lib, and static lib
make -j
make libwhisper.so -j
make libwhisper.a -j
</pre>
 
;2. (Optional) Apply patch for loadable module:
For the advanced "loadable module" integration (<code>whisper_native_lib</code>), a patch is required.
<pre>
# Inside the whisper.cpp directory
patch < whisper.diff
make clean
make -j
make libwhisper.so -j
</pre>
 
;3. Install libraries and headers:
For the sniffer's build process to find the <code>whisper.cpp</code> components, place them in standard system locations or create symbolic links.
<pre>
# Create symbolic links to the compiled files in your whisper.cpp directory
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
ln -s $(pwd)/ggml.h /usr/local/include/ggml.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so
ln -s $(pwd)/libwhisper.a /usr/local/lib64/libwhisper.a
</pre>
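Depending on your distribution, <code>/usr/local/lib64</code> may not be on the dynamic linker's default search path. If the sniffer cannot load <code>libwhisper.so</code> at runtime, registering the directory is a common fix (a sketch):
<pre>
# Add /usr/local/lib64 to the linker search path and refresh the cache
echo "/usr/local/lib64" > /etc/ld.so.conf.d/local-lib64.conf
ldconfig
</pre>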
 
=== CUDA Acceleration for <code>whisper.cpp</code> ===
To achieve a massive speed increase (up to 30x), you can compile <code>whisper.cpp</code> with NVIDIA CUDA support. This is highly recommended if you have a compatible NVIDIA GPU on your sensor or GUI server.


;1. Install the NVIDIA CUDA Toolkit:
Follow the [https://developer.nvidia.com/cuda-downloads official guide] for your Linux distribution.


;2. Set environment variables:
Ensure the CUDA toolkit is in your system's path. You can add these lines to your <code>~/.bashrc</code> file.
<pre>
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
</pre>
Verify with <code>nvcc --version</code>.


;3. Re-compile <code>whisper.cpp</code> with the CUDA flag:
<pre>
cd /path/to/your/whisper.cpp
make clean
# Rebuild the executable and libraries with CUDA enabled
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
WHISPER_CUDA=1 make libwhisper.a -j
</pre>
VoIPmonitor will automatically detect and use the CUDA-enabled <code>whisper.cpp</code> binary or library if available.


== AI Summary for RAG ==
'''Summary:''' This guide explains how to integrate OpenAI's Whisper ASR for call transcription in VoIPmonitor. It details two primary methods: on-demand transcription from the GUI and automatic background transcription on the sniffer. For both methods, it compares the two available engines: the official Python `OpenAI Whisper` library and the high-performance C++ port, `whisper.cpp`, recommending `whisper.cpp` for server-side processing. The article provides step-by-step instructions for installing each engine, including compiling `whisper.cpp` from source, building its libraries, and installing the Python package via `pip`. It details the necessary configuration in both the GUI's `configuration.php` (e.g., `WHISPER_NATIVE`, `WHISPER_MODEL`) and the sniffer's `voipmonitor.conf` (e.g., `audio_transcribe`, `whisper_native`, `whisper_native_lib`). It also covers optional parameters for fine-tuning, such as setting language, thread counts, and minimum call duration. Finally, it includes a detailed section on enabling NVIDIA CUDA acceleration for `whisper.cpp` to achieve significant performance gains and explains advanced integration methods like using `whisper.cpp` as a loadable library.
'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, `audio_transcribe`, `whisper_native`, `whisper_model`, `whisper_native_lib`, cuda, nvidia, gpu, acceleration, gui, sniffer, automatic transcription, on-demand, libwhisper.so, loadable module
'''Key Questions:'''
* How can I transcribe phone calls in VoIPmonitor?
* What is the difference between OpenAI Whisper and whisper.cpp? Which one should I use?
* How do I configure on-demand call transcription in the GUI?
* How do I set up the sniffer for automatic, server-side transcription of all calls?
* What are the required parameters in `voipmonitor.conf` for Whisper?
* How can I speed up Whisper transcription using an NVIDIA GPU (CUDA)?
* How do I install and compile `whisper.cpp`, including its libraries (`libwhisper.so`)?
* What do the `audio_transcribe`, `whisper_native`, and `whisper_native_lib` options do?
* How do I use `whisper.cpp` as a loadable module in the sniffer?
