{{DISPLAYTITLE:Call Transcription with Whisper AI}}
'''This guide explains how to integrate OpenAI's Whisper, a powerful automatic speech recognition (ASR) system, with VoIPmonitor for both on-demand and automatic call transcription.'''
== Introduction to Whisper Integration == | |||
VoIPmonitor integrates [https://openai.com/index/whisper/ Whisper], an ASR system trained on a massive dataset, enabling robust transcription of calls with various languages, accents, and background noise. | |||
There are two primary ways to use Whisper with VoIPmonitor: | |||
#'''On-Demand Transcription (in the GUI):''' The simplest method. A user can click a button on any call in the GUI to transcribe it. The processing happens on the GUI server. | |||
#'''Automatic Transcription (in the Sniffer):''' A more advanced setup where the sensor automatically transcribes all calls (or a subset) in the background immediately after they finish. | |||
For both methods, you must choose one of two underlying Whisper engines to install and configure. | |||
;Choosing Your Whisper Engine | |||
*'''OpenAI Whisper''' (Python): The official implementation from OpenAI. It is easier to install (<code>pip install openai-whisper</code>) but can be slower for CPU-based transcription. | |||
*'''whisper.cpp''' (C++): A high-performance C++ port of Whisper. It is significantly faster for CPU transcription and is the '''recommended engine for server-side processing'''. It requires manual compilation but offers superior performance and optimizations like CUDA. | |||
== Path A: On-Demand Transcription in the GUI == | |||
This setup allows users to manually trigger transcription from the call detail page. The processing occurs on the web server where the GUI is hosted. | |||
=== Option 1: Using the `whisper.cpp` Engine (Recommended) === | |||
==== Step 1: Install `whisper.cpp` and Download a Model ====
First, you need to compile the <code>whisper.cpp</code> project and download a pre-trained model on your GUI server. | |||
<pre> | |||
# Clone the repository | |||
git clone https://github.com/ggerganov/whisper.cpp.git | |||
cd whisper.cpp | |||
# Compile the main application | |||
make -j
# Download a model (e.g., 'base.en' for English-only, or 'small' for multilingual) | |||
./models/download-ggml-model.sh base.en | |||
</pre> | |||
This will create the main executable at <code>./main</code> and download the model to the <code>./models/</code> directory. | |||
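Before wiring the GUI to the binary, you can sanity-check the build from the shell. This sketch assumes the make-based build above (which produces <code>./main</code>) and uses the sample recording bundled with the repository:
<pre>
# Quick smoke test from inside the whisper.cpp checkout:
# transcribe the bundled sample with the model downloaded above
./main -m models/ggml-base.en.bin -f samples/jfk.wav
</pre>
If the transcript of the sample is printed to the terminal, the engine and model are working.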
==== Step 2: Configure the VoIPmonitor GUI ====
Edit your GUI's configuration file at <code>/var/www/html/config/configuration.php</code> and add the following definitions: | |||
<pre> | |||
<?php | |||
// /var/www/html/config/configuration.php | |||
// Tell the GUI to use the whisper.cpp engine | |||
define('WHISPER_NATIVE', true); | |||
// Provide the absolute path to the model file you downloaded | |||
define('WHISPER_MODEL', '/path/to/your/whisper.cpp/models/ggml-base.en.bin'); | |||
// Optional: Specify the number of threads for transcription | |||
define('WHISPER_THREADS', 4); | |||
</pre> | |||
No further setup is required. The GUI will now show a "Transcribe" button on call detail pages. | |||
=== Option 2: Using the `OpenAI Whisper` Engine === | |||
==== Step 1: Install the Python Package and Dependencies ==== | |||
<pre> | |||
# Install the whisper library via pip | |||
pip install openai-whisper | |||
# Install ffmpeg, which is required for audio conversion | |||
# For Debian/Ubuntu | |||
sudo apt-get install ffmpeg | |||
# For CentOS/RHEL | |||
sudo yum install ffmpeg | |||
</pre> | |||
==== Step 2: Configure the VoIPmonitor GUI ==== | |||
Edit <code>/var/www/html/config/configuration.php</code> and define the model you want to use. The Whisper library will download it automatically on the first run. | |||
<pre> | |||
<?php | |||
// /var/www/html/config/configuration.php | |||
// Specify the model name. Options: tiny, base, small, medium, large | |||
// 'small' is a good balance of speed and accuracy. | |||
define('WHISPER_MODEL', 'small'); | |||
</pre> | |||
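To confirm the Python engine works independently of the GUI, you can run the <code>whisper</code> command-line tool that is installed alongside the library. In this sketch, <code>sample.wav</code> is a placeholder for any short recording; the model is downloaded automatically on the first run:
<pre>
# Transcribe a test recording with the openai-whisper CLI
# (downloads the 'small' model on first use)
whisper sample.wav --model small
</pre>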
== Path B: Automatic Transcription in the Sniffer == | |||
This setup automatically transcribes calls in the background on the sensor itself. This is a headless operation and requires configuration in <code>voipmonitor.conf</code>. | |||
=== Step 1: Choose and Prepare Your Engine ===
You must have one of the Whisper engines installed '''on the sensor machine'''. Using '''<code>whisper.cpp</code> is strongly recommended''' for this server-side task due to its superior performance. Follow the installation steps from "Path A" to compile <code>whisper.cpp</code> or install the <code>openai-whisper</code> Python package. | |||
=== Step 2: Configure the Sniffer ===
Edit <code>/etc/voipmonitor.conf</code> on your sensor to enable and control automatic transcription. | |||
==== Minimal `whisper.cpp` Configuration: ==== | |||
<pre> | |||
# /etc/voipmonitor.conf | |||
# Enable the transcription feature
audio_transcribe = yes | |||
# Tell the sniffer to use the high-performance C++ engine | |||
whisper_native = yes
# --- CRITICAL --- | |||
# You MUST provide the absolute path to the downloaded whisper.cpp model file | |||
whisper_model = /path/to/your/whisper.cpp/models/ggml-base.en.bin | |||
</pre> | |||
==== Minimal `OpenAI Whisper` Configuration: ====
<pre> | |||
# /etc/voipmonitor.conf | |||
# Enable the transcription feature | |||
audio_transcribe = yes | |||
# Use the Python engine (this is the default) | |||
whisper_native = no | |||
# Specify the model name to use. 'small' is the default. | |||
whisper_model = small | |||
</pre> | |||
=== Step 3: Fine-Tuning Transcription (Optional) === | |||
The following parameters in <code>voipmonitor.conf</code> allow you to control the transcription process: | |||
;<code>audio_transcribe_connect_duration_min = 10</code>
: (Default: 10) Only transcribes calls that were connected for at least this many seconds. | |||
;<code>audio_transcribe_threads = 2</code> | |||
: (Default: 2) The number of calls to transcribe concurrently. | |||
;<code>audio_transcribe_queue_length_max = 100</code> | |||
: (Default: 100) The maximum number of calls waiting in the transcription queue. | |||
;<code>whisper_language = auto</code> | |||
: (Default: auto) Can be set to a specific language code (e.g., <code>en</code>, <code>de</code>) or <code>by_number</code> to guess based on the phone number's country code. | |||
;<code>whisper_threads = 2</code> | |||
: (Default: 2) The number of CPU threads to use for a ''single'' transcription job. | |||
;<code>whisper_deterministic_mode = yes</code> | |||
: (Default: yes) For the <code>OpenAI Whisper</code> engine only. Aims for more consistent, repeatable transcription results. | |||
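Putting the options together, a fuller <code>voipmonitor.conf</code> for the <code>whisper.cpp</code> engine might look like the sketch below. The model path and the numeric values are illustrative, not recommendations:
<pre>
# /etc/voipmonitor.conf -- illustrative example combining the options above
audio_transcribe = yes
whisper_native = yes
whisper_model = /opt/whisper.cpp/models/ggml-small.bin

audio_transcribe_connect_duration_min = 30   # skip calls connected under 30 seconds
audio_transcribe_threads = 2                 # transcribe two calls concurrently
whisper_threads = 4                          # CPU threads per transcription job
whisper_language = by_number                 # guess language from the number's country code
</pre>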
== Advanced: CUDA Acceleration for `whisper.cpp` ==
To achieve a massive speed increase (up to 30x), you can compile <code>whisper.cpp</code> with NVIDIA CUDA support. This is highly recommended if you have a compatible NVIDIA GPU. | |||
;1. Install the NVIDIA CUDA Toolkit: | |||
:Follow the [https://developer.nvidia.com/cuda-downloads official guide] for your Linux distribution. | |||
;2. Verify the installation: | |||
:<pre>nvcc --version</pre> | |||
;3. Re-compile <code>whisper.cpp</code> with the CUDA flag: | |||
<pre> | |||
cd /path/to/your/whisper.cpp | |||
make clean | |||
WHISPER_CUDA=1 make -j | |||
</pre> | |||
VoIPmonitor will automatically detect and use the CUDA-enabled <code>whisper.cpp</code> binary or library if available. | |||
== AI Summary for RAG ==
'''Summary:''' This guide explains how to integrate OpenAI's Whisper ASR for call transcription in VoIPmonitor. It details two primary methods: on-demand transcription from the GUI and automatic background transcription on the sniffer. For both methods, it compares the two available engines: the official Python `OpenAI Whisper` library and the high-performance C++ port, `whisper.cpp`, recommending `whisper.cpp` for server-side processing. The article provides step-by-step instructions for installing each engine, including compiling `whisper.cpp` from source and installing the Python package via `pip`. It details the necessary configuration in both the GUI's `configuration.php` (e.g., `WHISPER_NATIVE`, `WHISPER_MODEL`) and the sniffer's `voipmonitor.conf` (e.g., `audio_transcribe`, `whisper_native`). It also covers optional parameters for fine-tuning, such as setting language, thread counts, and minimum call duration. Finally, it includes a section on enabling NVIDIA CUDA acceleration for `whisper.cpp` to achieve significant performance gains.
'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, `audio_transcribe`, `whisper_native`, `whisper_model`, cuda, nvidia, gpu, acceleration, gui, sniffer, automatic transcription, on-demand | |||
'''Key Questions:''' | |||
* How can I transcribe phone calls in VoIPmonitor? | |||
* What is the difference between OpenAI Whisper and whisper.cpp? Which one should I use? | |||
* How do I configure on-demand call transcription in the GUI? | |||
* How do I set up the sniffer for automatic, server-side transcription of all calls? | |||
* What are the required parameters in `voipmonitor.conf` for Whisper? | |||
* How can I speed up Whisper transcription using an NVIDIA GPU (CUDA)? | |||
* How do I install and compile `whisper.cpp`? | |||
* What do the `audio_transcribe` and `whisper_native` options do?
Latest revision as of 17:18, 30 June 2025