# whisper_flutter

**Repository Path**: jimonik/whisper_flutter

## Basic Information

- **Project Name**: whisper_flutter
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-14
- **Last Updated**: 2025-12-16

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Whisper Flutter

Flutter plugin for real-time speech-to-text transcription using Whisper.cpp with intelligent Voice Activity Detection (VAD).

## Features

- ✅ **Real-time streaming transcription** - Continuous audio capture and processing
- ✅ **Intelligent VAD** - Automatic speech segmentation based on energy detection
- ✅ **Multi-platform support** - Android and iOS native implementation
- ✅ **Offline processing** - All transcription runs on-device using Whisper.cpp
- ✅ **English language support** - Optimized for English speech recognition
- ✅ **Non-blocking architecture** - Audio recording never blocks during transcription

## Project Structure

```
whisper_flutter/
├── whisper_lib/          # C++ Whisper.cpp wrapper library
│   ├── src/              # Source code (whisper_lib.cpp, whisper_lib.h)
│   ├── scripts/          # Build scripts for Android and iOS
│   └── libs/             # Compiled libraries (.so for Android, .a for iOS)
│
└── flutter_plugin/       # Flutter plugin implementation
    ├── android/          # Android native implementation (JNI)
    ├── ios/              # iOS native implementation (Objective-C/Swift)
    ├── lib/              # Dart API
    └── example/          # Example Flutter app with streaming demo
```

## Building

### Prerequisites

- **Android**: Android NDK 27.0.12077973 or later
- **iOS**: Xcode 15.0 or later
- **Flutter**: Flutter 3.0 or later

### Build Native Libraries

#### Android

```bash
cd whisper_lib
bash scripts/build_android_simple.sh arm64-v8a
```

Output: `libs/android/libwhisper_lib_arm64-v8a.so`

#### iOS

```bash
cd whisper_lib
bash scripts/build_ios_simple.sh
```
Output: `libs/ios/libwhisper_lib_arm64_x86_64.a` (universal library)

### Install Libraries to Plugin

```bash
# Android
cp whisper_lib/libs/android/libwhisper_lib_arm64-v8a.so \
   flutter_plugin/android/src/main/jniLibs/arm64-v8a/libwhisper_lib.so

# iOS
cp whisper_lib/libs/ios/libwhisper_lib_arm64_x86_64.a \
   flutter_plugin/ios/Frameworks/
```

## Running the Example

1. **Download a Whisper model** (the tiny model is recommended for mobile):

   ```bash
   # Download the ggml-tiny.bin model
   curl -L "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin" \
     -o flutter_plugin/example/assets/whisperkit-coreml/openai_whisper-tiny/ggml-tiny.bin
   ```

2. **Run on Android**:

   ```bash
   cd flutter_plugin/example
   flutter run -d <android-device-id>
   ```

3. **Run on iOS**:

   ```bash
   cd flutter_plugin/example
   flutter run -d <ios-device-id>
   ```

## Usage

```dart
import 'package:flutter_plugin/flutter_plugin.dart';

// Initialize the model
final modelPath = await ModelManager.copyModelFromAssets();
await WhisperPlugin.createStreamingContext(
  modelPath: modelPath,
  contextId: 'main',
);

// Start recording and streaming transcription
final audioStream = WhisperPlugin.startStreamingRecord(contextId: 'main');
audioStream.listen((audioData) {
  // Audio data is processed automatically by the native VAD;
  // transcription results are retrieved via getStreamingResult.
});

// Get transcription results
final result = await WhisperPlugin.getStreamingResult(contextId: 'main');
final segments = result['segments'] as List;
for (var segment in segments) {
  print('${segment['from_ts']}-${segment['to_ts']}: ${segment['text']}');
}

// Stop recording
await WhisperPlugin.stopStreamingRecord(contextId: 'main');

// Clean up
await WhisperPlugin.freeStreamingContext(contextId: 'main');
```

## How It Works

### Voice Activity Detection (VAD)

The plugin uses an energy-based VAD algorithm:

1. **Energy Monitoring**: Continuously monitors audio energy at 100 ms intervals
2. **Speech Detection**: When energy exceeds the threshold (0.03), the frame is marked as speech
3. **Pause Detection**: When energy stays low for 1 second, segmentation is triggered
4. **Automatic Transcription**: Each detected segment is transcribed asynchronously

### Architecture

```
Flutter App (Dart)
        ↓ MethodChannel / EventChannel
Native Layer (Kotlin/Swift)
        ↓ JNI / Objective-C
Whisper Lib (C++)
        ↓
Whisper.cpp (C)
```

### Key Components

- **`whisper_lib.cpp`**: C++ wrapper for Whisper.cpp with streaming support
- **`AudioStreamHandler` (Android/iOS)**: Native audio capture (16 kHz, mono, float32)
- **`streaming_whisper_page.dart`**: Flutter UI with VAD logic and state management
- **JNI/Objective-C bridge**: Platform-specific FFI bindings

## Configuration

### VAD Parameters (in `streaming_whisper_page.dart`)

```dart
double voiceEnergyThreshold = 0.03; // Voice detection threshold
int pauseDetectionFrames = 10;      // Frames needed to detect a pause (1 second)
```

### Whisper Parameters

```dart
params: {
  'language': 'en',      // Language (en, zh, etc.)
  'threads': 4,          // CPU threads
  'is_translate': false, // Translation mode
  'split_on_word': true, // Word-level segmentation
}
```

## Performance

- **Tiny model**: ~1-2 seconds per segment on modern mobile devices
- **Memory usage**: ~200-300 MB (model + runtime)
- **Battery impact**: Moderate (continuous microphone and CPU usage)

## Troubleshooting

### Android Issues

1. **"UnsatisfiedLinkError"**: Ensure the libraries are in the correct path
   - Check: `android/src/main/jniLibs/arm64-v8a/libwhisper_lib.so`
   - Verify JNI function naming (package underscores → `_1`)
2. **"No implementation found"**: Call `WhisperNative.loadLibraries()` during plugin initialization
3. **Audio recording fails**: Check the `RECORD_AUDIO` permission in `AndroidManifest.xml`

### iOS Issues

1. **Build errors**: Ensure the library is universal (arm64 + x86_64)
   - Check: `ios/Frameworks/libwhisper_lib_arm64_x86_64.a`
2. **Microphone permission**: Add `NSMicrophoneUsageDescription` to `Info.plist`
3. **"Symbol not found"**: Verify the library is linked in `flutter_plugin.podspec`

## License

MIT License

## Credits

- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) - High-performance Whisper inference
- [OpenAI Whisper](https://github.com/openai/whisper) - Original Whisper model

## Contributing

Contributions are welcome! Please open an issue or PR.
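## Appendix: VAD Sketch

For reference, the energy-based segmentation described in *How It Works* can be sketched in C++. This is an illustrative sketch only, not the plugin's actual `whisper_lib.cpp` implementation: the `Vad` class, `frame_energy` function, and frame size (1600 samples = 100 ms at 16 kHz) are invented for the example, while the threshold (0.03) and pause length (10 frames) come from the VAD Parameters above.

```cpp
#include <cmath>
#include <vector>

// Parameters matching the Configuration section (illustrative names).
constexpr float kEnergyThreshold = 0.03f; // voiceEnergyThreshold
constexpr int   kPauseFrames     = 10;    // pauseDetectionFrames (~1 s at 100 ms/frame)

// RMS energy of one 100 ms frame (1600 samples at 16 kHz mono float32).
float frame_energy(const std::vector<float>& frame) {
    if (frame.empty()) return 0.0f;
    double sum = 0.0;
    for (float s : frame) sum += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum / frame.size()));
}

// Minimal speech/pause state machine: feed() returns true when a segment
// boundary is detected (speech followed by ~1 s of sustained silence).
class Vad {
public:
    bool feed(const std::vector<float>& frame) {
        if (frame_energy(frame) > kEnergyThreshold) {
            in_speech_  = true;   // step 2: energy above threshold marks speech
            low_frames_ = 0;
        } else if (in_speech_ && ++low_frames_ >= kPauseFrames) {
            in_speech_  = false;  // step 3: sustained low energy closes the segment
            low_frames_ = 0;
            return true;          // step 4: hand the finished segment to Whisper
        }
        return false;
    }

private:
    bool in_speech_  = false;
    int  low_frames_ = 0;
};
```

A caller would feed 100 ms frames in a loop and, whenever `feed()` returns `true`, submit the buffered segment for asynchronous transcription.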