{{DISPLAYTITLE:Call Transcription with Whisper AI}}

'''Integrate OpenAI's Whisper ASR with VoIPmonitor for on-demand or automatic call transcription.'''

== Overview ==

VoIPmonitor supports [https://openai.com/index/whisper/ Whisper], a speech recognition system trained on 680,000 hours of multilingual data, which makes it robust to accents, background noise, and technical language. Transcription speed depends on the chosen model and the available hardware; an NVIDIA CUDA accelerator is roughly 30x faster than CPU-only processing. Two integration modes are available:

{| class="wikitable"
! Mode !! Location !! Use Case
|-
| '''On-Demand''' || GUI server || User clicks "Transcribe" on individual calls
|-
| '''Automatic''' || Sensor || All calls transcribed automatically after ending
|}

<kroki lang="mermaid">
%%{init: {'flowchart': {'nodeSpacing': 15, 'rankSpacing': 30}}}%%
flowchart LR
subgraph "On-Demand (GUI)"
A1[User clicks Transcribe] --> A2[GUI Server] --> A3[Result displayed]
end
subgraph "Automatic (Sniffer)"
B1[Call ends] --> B2[Queued] --> B3[Transcribed] --> B4[Stored in DB]
end
</kroki>

=== Whisper Engines ===

Two Whisper engines can be used, both for on-demand transcription in the GUI and for automatic transcription in the sniffer:

{| class="wikitable"
! Engine !! Pros !! Cons !! Recommended For
|-
| '''whisper.cpp''' (C++) || Fast, low resource usage, CUDA support (30x speedup) || Requires compilation || Server-side processing
|-
| '''OpenAI Whisper''' (Python) || Easy install (<code>pip install</code>) || Slower, requires ffmpeg || Quick testing
|}

Both engines require a model - a trained data set that is loaded by the machine learning library.

{{Tip|Use '''whisper.cpp''' for production deployments. It's significantly faster and supports GPU acceleration.}}

== Quick Start: GUI On-Demand (No Compilation) ==

The simplest setup - download a pre-built model and start transcribing immediately.

<syntaxhighlight lang="bash">
# Download model to GUI bin directory
wget https://download.voipmonitor.org/whisper/ggml-base.bin -O /var/www/html/bin/ggml-base.bin

# Set ownership (Debian/Ubuntu)
chown www-data:www-data /var/www/html/bin/ggml-base.bin

# For RedHat/CentOS, use: chown apache:apache
</syntaxhighlight>

The "Transcribe" button now appears on call detail pages. No configuration changes needed.
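
If the button does not appear, a quick sanity check of the model file (assuming a Debian/Ubuntu layout where the web server runs as <code>www-data</code>):

<syntaxhighlight lang="bash">
# Verify the model is in place and readable by the web server user
ls -lh /var/www/html/bin/ggml-base.bin
sudo -u www-data test -r /var/www/html/bin/ggml-base.bin && echo "readable" || echo "NOT readable"
</syntaxhighlight>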

== GUI On-Demand: Advanced Setup ==

For custom model paths or using the Python engine. Whisper models come in several sizes - <code>tiny</code>, <code>base</code>, <code>small</code>, <code>medium</code>, and <code>large</code> (plus English-only variants for whisper.cpp, e.g. <code>base.en</code>):

* tiny and base are smaller and faster, but less accurate,
* small and medium are larger and offer better accuracy,
* large is the most accurate, but also the slowest and most computationally intensive.

=== Option 1: whisper.cpp with Custom Model ===

<syntaxhighlight lang="bash">
# Compile whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp && make -j

# Download model
./models/download-ggml-model.sh base.en
</syntaxhighlight>
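
You can sanity-check the whisper.cpp build and the downloaded model from the command line before wiring it into the GUI. Note that whisper.cpp's CLI expects 16 kHz mono WAV input (OpenAI Whisper converts audio itself via ffmpeg); the sketch below assumes a test recording <code>call.wav</code> and the classic <code>main</code> example binary:

<syntaxhighlight lang="bash">
# Convert a test recording to 16 kHz mono, then transcribe it with whisper.cpp
ffmpeg -i call.wav -ar 16000 -ac 1 sample16k.wav
./main sample16k.wav -m models/ggml-base.en.bin
</syntaxhighlight>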

Configure <code>/var/www/html/config/configuration.php</code>:

<syntaxhighlight lang="php">
define('WHISPER_NATIVE', true);
define('WHISPER_MODEL', '/path/to/whisper.cpp/models/ggml-base.en.bin');
define('WHISPER_THREADS', 4); // Optional
</syntaxhighlight>

If you built the sniffer binary yourself with whisper.cpp support (for example with CUDA acceleration, see below), create a symbolic link to it named <code>vm</code> in the GUI's <code>bin</code> folder so the GUI uses your build for transcription.

=== Option 2: OpenAI Whisper (Python) ===

<syntaxhighlight lang="bash">
pip install openai-whisper   # also pulls in PyTorch
apt install ffmpeg           # or: dnf install ffmpeg
</syntaxhighlight>

Configure <code>/var/www/html/config/configuration.php</code>:

<syntaxhighlight lang="php">
define('WHISPER_MODEL', '/opt/whisper_models/small.pt');
define('WHISPER_THREADS', 4);
</syntaxhighlight>
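
OpenAI Whisper downloads models on first use into <code>~/.cache/whisper/</code>. To place the model at the path configured above, you can trigger the download once from the CLI with <code>--model_dir</code> - a sketch assuming any sample file <code>audio.wav</code>:

<syntaxhighlight lang="bash">
# Download the 'small' model into /opt/whisper_models by running one transcription
whisper audio.wav --model=small --model_dir=/opt/whisper_models
</syntaxhighlight>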

== Automatic Transcription (Sniffer) ==

Transcribe all calls automatically on the sensor after they end. Transcription runs after the call is written to the database and the result is stored in the <code>cdr_audio_transcribe</code> table.

=== Basic Configuration ===

Edit <code>/etc/voipmonitor.conf</code>:

<syntaxhighlight lang="ini">
# Enable transcription
audio_transcribe = yes

# Using whisper.cpp (recommended)
whisper_native = yes
whisper_model = /path/to/whisper.cpp/models/ggml-small.bin

# OR using Python (slower)
# whisper_native = no
# whisper_model = small
</syntaxhighlight>

Restart: <code>systemctl restart voipmonitor</code>

=== Configuration Parameters ===

{| class="wikitable"
! Parameter !! Default !! Description
|-
| <code>audio_transcribe</code> || no || Enable/disable transcription
|-
| <code>audio_transcribe_connect_duration_min</code> || 10 || Minimum connected call duration (seconds) to transcribe
|-
| <code>audio_transcribe_threads</code> || 2 || Concurrent transcription jobs
|-
| <code>audio_transcribe_queue_length_max</code> || 100 || Max queue size; calls beyond it are not transcribed
|-
| <code>whisper_native</code> || no || Use whisper.cpp (<code>yes</code>) or Python (<code>no</code>)
|-
| <code>whisper_model</code> || small || Model name (Python) or '''absolute path''' to .bin file (whisper.cpp)
|-
| <code>whisper_language</code> || auto || ISO 639-1 language code (<code>en</code>, <code>de</code>), <code>auto</code>, or <code>by_number</code> (language derived from the call's phone numbers)
|-
| <code>whisper_threads</code> || 2 || CPU threads per transcription job
|-
| <code>whisper_timeout</code> || 300 || Timeout in seconds (Python only)
|-
| <code>whisper_deterministic_mode</code> || yes || Force consistent results across runs (Python only; OpenAI Whisper is otherwise non-deterministic)
|-
| <code>whisper_python</code> || - || Custom Python binary path (Python only)
|-
| <code>whisper_native_lib</code> || - || Path to libwhisper.so when whisper.cpp is loaded as a module (advanced)
|}

Note that whisper.cpp requires GGML-format models (<code>ggml-*.bin</code>); OpenAI Whisper's <code>.pt</code> models are not binary compatible, but can be converted with whisper.cpp's <code>models/convert-pt-to-ggml.py</code> script.

== Advanced: CUDA GPU Acceleration ==

Compile whisper.cpp with NVIDIA CUDA for up to 30x speedup. The pre-built static sniffer binary bundles whisper.cpp without CUDA support (and without optimizations for newer CPUs), so GPU acceleration requires building whisper.cpp yourself and either compiling the sniffer against it or loading it as a module (see below).

<syntaxhighlight lang="bash">
# Install CUDA toolkit (see nvidia.com/cuda-downloads)
# Add to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Compile with CUDA
cd /path/to/whisper.cpp
make clean
WHISPER_CUDA=1 make -j
WHISPER_CUDA=1 make libwhisper.so -j
</syntaxhighlight>
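
Before rebuilding, verify that the CUDA installation is visible to the shell:

<syntaxhighlight lang="bash">
# The CUDA compiler must be on PATH for the build to pick it up
nvcc --version
# Optional: confirm the driver can see the GPU
nvidia-smi
</syntaxhighlight>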

== Advanced: Loadable Module ==

Use whisper.cpp as a separate library that the sniffer loads at runtime, so whisper.cpp can be rebuilt or updated without recompiling the sniffer. Prerequisites:

* a sniffer binary built '''without''' integrated whisper.cpp support (no <code>#define HAVE_LIBWHISPER 1</code> in <code>config.h</code> and no <code>-lwhisper</code> in the Makefile's <code>SHARED_LIBS</code>),
* whisper.cpp patched with <code>whisper.diff</code> and built as <code>libwhisper.so</code>:

<syntaxhighlight lang="bash">
# Apply the required modification and build the libraries
cd /path/to/whisper.cpp
patch < whisper.diff
make libwhisper.so -j
make libwhisper.a -j

# Optional: Install system-wide
ln -s $(pwd)/whisper.h /usr/local/include/whisper.h
ln -s $(pwd)/ggml.h /usr/local/include/ggml.h
ln -s $(pwd)/libwhisper.so /usr/local/lib64/libwhisper.so
ln -s $(pwd)/libwhisper.a /usr/local/lib64/libwhisper.a
</syntaxhighlight>
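
Before pointing the sniffer at the library, it is worth checking that it was built and that its runtime dependencies resolve (for example the CUDA libraries, if built with <code>WHISPER_CUDA=1</code>) - a small sketch with an example path:

<syntaxhighlight lang="bash">
# Confirm the library exists and that all shared dependencies resolve
ls -l /path/to/whisper.cpp/libwhisper.so
ldd /path/to/whisper.cpp/libwhisper.so | grep "not found" || echo "all shared dependencies resolved"
</syntaxhighlight>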

Configure in <code>voipmonitor.conf</code>, in addition to the basic transcription parameters above:

<syntaxhighlight lang="ini">
whisper_native_lib = /path/to/whisper.cpp/libwhisper.so
</syntaxhighlight>

For the GUI, the equivalent setting in <code>config/configuration.php</code> is <code>define('WHISPER_NATIVE_LIB', '/path/to/whisper.cpp/libwhisper.so');</code>.

== Troubleshooting ==

=== Model Download Fails ===

Test connectivity:
<syntaxhighlight lang="bash">
curl -I https://download.voipmonitor.org/whisper/ggml-base.bin
</syntaxhighlight>

'''If blocked:'''
* Check firewall: <code>iptables -L -v -n</code>, <code>ufw status</code>
* Check proxy: Set <code>HTTP_PROXY</code> / <code>HTTPS_PROXY</code> environment variables
* Check DNS: <code>nslookup download.voipmonitor.org</code>

'''Workaround:''' Download manually on another machine and copy via SCP.
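
A sketch of that workaround, assuming a workstation with internet access and a GUI server reachable as the hypothetical host <code>gui-server</code>:

<syntaxhighlight lang="bash">
# On the machine with internet access
wget https://download.voipmonitor.org/whisper/ggml-base.bin
# Copy to the GUI server and fix ownership there (Debian/Ubuntu: www-data)
scp ggml-base.bin root@gui-server:/var/www/html/bin/ggml-base.bin
ssh root@gui-server 'chown www-data:www-data /var/www/html/bin/ggml-base.bin'
</syntaxhighlight>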

=== Testing from CLI ===

Run a transcription exactly as the GUI would, using a test recording:

<syntaxhighlight lang="bash">
/var/www/html/bin/vm --audio-transcribe='/tmp/audio.wav {}' \
  --json_config='[{"whisper_native":"yes"},{"whisper_model":"/path/to/ggml-small.bin"},{"whisper_threads":"2"}]' \
  -v1,whisper
</syntaxhighlight>

This run is also a convenient way to find the right <code>whisper_threads</code> value for your hardware.

== AI Summary for RAG ==

'''Summary:''' VoIPmonitor integrates Whisper ASR for call transcription via two modes: on-demand (GUI button) and automatic (sniffer background processing). Two engines are available: whisper.cpp (C++, recommended, fast, CUDA support) and OpenAI Whisper (Python, easier install). Quick start: download a pre-built model from <code>https://download.voipmonitor.org/whisper/ggml-base.bin</code> to <code>/var/www/html/bin/</code> and set ownership to www-data. Sniffer config: enable <code>audio_transcribe=yes</code> and <code>whisper_native=yes</code> with an absolute path to the model in <code>whisper_model</code>. Key parameters: <code>audio_transcribe_connect_duration_min</code> (min call length), <code>whisper_threads</code> (CPU threads), <code>whisper_language</code> (auto/code/by_number). CUDA acceleration is available for whisper.cpp (30x speedup).

'''Keywords:''' whisper, transcription, asr, speech to text, openai, whisper.cpp, audio_transcribe, whisper_native, whisper_model, cuda, gpu, ggml-base.bin, libwhisper.so, automatic transcription, on-demand

'''Key Questions:'''
* How do I enable call transcription in VoIPmonitor?
* What is the quickest way to enable Whisper transcription?
* How do I download the Whisper model for the GUI?
* What is the difference between whisper.cpp and OpenAI Whisper?
* How do I configure automatic transcription on the sniffer?
* What parameters control Whisper transcription behavior?
* How do I enable GPU acceleration for Whisper?
* Why is the model download failing and how do I fix it?
* How do I test Whisper transcription from the command line?