# whisper_flutter

**Repository Path**: jimonik/whisper_flutter

## Basic Information

- **Project Name**: whisper_flutter
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-14
- **Last Updated**: 2025-12-16

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Whisper Flutter

Flutter plugin for real-time speech-to-text transcription using Whisper.cpp with intelligent Voice Activity Detection (VAD).

## Features

- ✅ **Real-time streaming transcription** - Continuous audio capture and processing
- ✅ **Intelligent VAD** - Automatic speech segmentation based on energy detection
- ✅ **Multi-platform support** - Android and iOS native implementation
- ✅ **Offline processing** - All transcription runs on-device using Whisper.cpp
- ✅ **English language support** - Optimized for English speech recognition
- ✅ **Non-blocking architecture** - Audio recording never blocks during transcription

## Project Structure

```
whisper_flutter/
├── whisper_lib/          # C++ Whisper.cpp wrapper library
│   ├── src/              # Source code (whisper_lib.cpp, whisper_lib.h)
│   ├── scripts/          # Build scripts for Android and iOS
│   └── libs/             # Compiled libraries (.so for Android, .a for iOS)
│
└── flutter_plugin/       # Flutter plugin implementation
    ├── android/          # Android native implementation (JNI)
    ├── ios/              # iOS native implementation (Objective-C/Swift)
    ├── lib/              # Dart API
    └── example/          # Example Flutter app with streaming demo
```

## Building

### Prerequisites

- **Android**: Android NDK 27.0.12077973 or later
- **iOS**: Xcode 15.0 or later
- **Flutter**: Flutter 3.0 or later

### Build Native Libraries

#### Android

```bash
cd whisper_lib
bash scripts/build_android_simple.sh arm64-v8a
```

Output: `libs/android/libwhisper_lib_arm64-v8a.so`

#### iOS

```bash
cd whisper_lib
bash scripts/build_ios_simple.sh
```
Output: `libs/ios/libwhisper_lib_arm64_x86_64.a` (universal library)

### Install Libraries to Plugin

```bash
# Android
cp whisper_lib/libs/android/libwhisper_lib_arm64-v8a.so \
   flutter_plugin/android/src/main/jniLibs/arm64-v8a/libwhisper_lib.so

# iOS
cp whisper_lib/libs/ios/libwhisper_lib_arm64_x86_64.a \
   flutter_plugin/ios/Frameworks/
```

## Running the Example

1. **Download a Whisper model** (the tiny model is recommended for mobile):

   ```bash
   # Download the ggml-tiny.bin model
   curl -L "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin" \
     -o flutter_plugin/example/assets/whisperkit-coreml/openai_whisper-tiny/ggml-tiny.bin
   ```

2. **Run on Android**:

   ```bash
   cd flutter_plugin/example
   flutter run -d <android-device-id>
   ```

3. **Run on iOS**:

   ```bash
   cd flutter_plugin/example
   flutter run -d <ios-device-id>
   ```

## Usage

```dart
import 'package:flutter_plugin/flutter_plugin.dart';

// Initialize the model
final modelPath = await ModelManager.copyModelFromAssets();
await WhisperPlugin.createStreamingContext(
  modelPath: modelPath,
  contextId: 'main',
);

// Start recording and streaming transcription
final audioStream = WhisperPlugin.startStreamingRecord(contextId: 'main');
audioStream.listen((audioData) {
  // Audio data is processed automatically by the native VAD;
  // transcription results are retrieved via getStreamingResult.
});

// Get transcription results
final result = await WhisperPlugin.getStreamingResult(contextId: 'main');
final segments = result['segments'] as List;
for (var segment in segments) {
  print('${segment['from_ts']}-${segment['to_ts']}: ${segment['text']}');
}

// Stop recording
await WhisperPlugin.stopStreamingRecord(contextId: 'main');

// Clean up
await WhisperPlugin.freeStreamingContext(contextId: 'main');
```

## How It Works

### Voice Activity Detection (VAD)

The plugin uses an energy-based VAD algorithm:

1. **Energy Monitoring**: Continuously monitors audio energy at 100 ms intervals
2. **Speech Detection**: When energy exceeds the threshold (0.03), the frame is marked as speech
3. **Pause Detection**: When energy stays low for 1 second, segmentation is triggered
4. **Automatic Transcription**: Each detected segment is transcribed asynchronously

### Architecture

```
Flutter App (Dart)
        ↓ MethodChannel / EventChannel
Native Layer (Kotlin/Swift)
        ↓ JNI / Objective-C
Whisper Lib (C++)
        ↓
Whisper.cpp (C)
```

### Key Components

- **`whisper_lib.cpp`**: C++ wrapper for Whisper.cpp with streaming support
- **`AudioStreamHandler` (Android/iOS)**: Native audio capture (16 kHz, mono, float32)
- **`streaming_whisper_page.dart`**: Flutter UI with VAD logic and state management
- **JNI/Objective-C bridge**: Platform-specific FFI bindings

## Configuration

### VAD Parameters (in `streaming_whisper_page.dart`)

```dart
double voiceEnergyThreshold = 0.03; // Voice detection threshold
int pauseDetectionFrames = 10;      // Frames needed to detect a pause (1 second)
```

### Whisper Parameters

```dart
params: {
  'language': 'en',      // Language (en, zh, etc.)
  'threads': 4,          // CPU threads
  'is_translate': false, // Translation mode
  'split_on_word': true, // Word-level segmentation
}
```

## Performance

- **Tiny model**: ~1-2 seconds per segment on modern mobile devices
- **Memory usage**: ~200-300 MB (model + runtime)
- **Battery impact**: Moderate (continuous microphone and CPU usage)

## Troubleshooting

### Android Issues

1. **"UnsatisfiedLinkError"**: Ensure the libraries are in the correct path
   - Check: `android/src/main/jniLibs/arm64-v8a/libwhisper_lib.so`
   - Verify JNI function naming (package underscores → `_1`)
2. **"No implementation found"**: Call `WhisperNative.loadLibraries()` during plugin initialization
3. **Audio recording fails**: Check the `RECORD_AUDIO` permission in `AndroidManifest.xml`

### iOS Issues

1. **Build errors**: Ensure the library is universal (arm64 + x86_64)
   - Check: `ios/Frameworks/libwhisper_lib_arm64_x86_64.a`
2. **Microphone permission**: Add `NSMicrophoneUsageDescription` to `Info.plist`
3. **"Symbol not found"**: Verify the library is linked in `flutter_plugin.podspec`

## License

MIT License

## Credits

- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - High-performance Whisper inference
- [OpenAI Whisper](https://github.com/openai/whisper) - Original Whisper model

## Contributing

Contributions are welcome! Please open an issue or PR.
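## Appendix: VAD Sketch

For reference, the energy-based segmentation described in *How It Works* can be sketched in C++. This is an illustrative sketch only, not the plugin's actual `whisper_lib.cpp` implementation: the `Vad` class, `frame_energy` function, and frame size (1600 samples = 100 ms at 16 kHz) are invented for the example, while the threshold (0.03) and pause length (10 frames) come from the VAD Parameters above.

```cpp
#include <cmath>
#include <vector>

// Parameters matching the Configuration section (illustrative names).
constexpr float kEnergyThreshold = 0.03f; // voiceEnergyThreshold
constexpr int   kPauseFrames     = 10;    // pauseDetectionFrames (~1 s at 100 ms/frame)

// RMS energy of one 100 ms frame (1600 samples at 16 kHz mono float32).
float frame_energy(const std::vector<float>& frame) {
    if (frame.empty()) return 0.0f;
    double sum = 0.0;
    for (float s : frame) sum += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum / frame.size()));
}

// Minimal speech/pause state machine: feed() returns true when a segment
// boundary is detected (speech followed by ~1 s of sustained silence).
class Vad {
public:
    bool feed(const std::vector<float>& frame) {
        if (frame_energy(frame) > kEnergyThreshold) {
            in_speech_  = true;   // step 2: energy above threshold marks speech
            low_frames_ = 0;
        } else if (in_speech_ && ++low_frames_ >= kPauseFrames) {
            in_speech_  = false;  // step 3: sustained low energy closes the segment
            low_frames_ = 0;
            return true;          // step 4: hand the finished segment to Whisper
        }
        return false;
    }

private:
    bool in_speech_  = false;
    int  low_frames_ = 0;
};
```

A caller would feed 100 ms frames in a loop and, whenever `feed()` returns `true`, submit the buffered segment for asynchronous transcription.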