Written by Anika Ali Nitu
Voice input has quietly become one of the most expected features in modern mobile apps, from note-taking tools to customer support chatbots. If you’re planning to add speech-to-text to a Flutter mobile app, you’re in the right place. This guide walks through the entire process, from package setup and platform permissions to writing the Dart logic that converts spoken words into text in real time.
We’ll build a working voice input screen step by step, explain supporting concepts such as Flutter’s speech recognition packages and microphone permission handling, and point out the pitfalls that trip up most developers the first time they try this.
Voice interfaces aren’t a novelty anymore. Hands-free note entry, accessibility support, voice search, and conversational AI assistants all depend on reliable speech-to-text conversion. A well-built Flutter voice app gives you a single codebase that taps into the native speech recognition engines on both Android and iOS, which means you don’t need separate native modules to get a working voice-to-text app.
The core tool for this job is the speech_to_text package, a Flutter plugin that exposes device-specific speech recognition capability so your Dart code can request a transcription without writing any platform-specific Swift or Kotlin. It’s free, actively maintained, and works on Android, iOS, macOS, and web, which makes it the default choice for most voice input app development work in Flutter.
Before writing any UI code, you need to add the right dependencies and configure platform permissions. Skipping this step is the most common reason voice input fails silently on a real device, even though it appeared to work in an emulator.
Add speech_to_text and permission_handler to your pubspec.yaml, or install them directly from the terminal:
flutter pub add speech_to_text
flutter pub add permission_handler
Your pubspec.yaml dependencies block should look something like this:
dependencies:
  flutter:
    sdk: flutter
  speech_to_text: ^7.0.0
  permission_handler: ^11.3.1
permission_handler isn’t strictly required by the plugin itself, but it gives you finer control over checking and requesting the microphone permission your app needs before the recognizer can start listening.
Speech recognition needs explicit access to the device microphone. Open android/app/src/main/AndroidManifest.xml and add the following inside the <manifest> tag, above the <application> block:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
<queries>
    <intent>
        <action android:name="android.speech.RecognitionService" />
    </intent>
</queries>
The <queries> block matters more than developers expect. Starting with Android 11 (API level 30), package visibility rules require apps to declare the other apps and services they intend to query, and omitting this declaration causes the recognizer to silently report itself as unavailable on newer devices.
For iOS, open ios/Runner/Info.plist and add two usage description keys. Without them, App Store review rejects the build, and iOS terminates the app as soon as it tries to request microphone or speech recognition access:
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition to convert your voice into text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access to listen to your voice.</string>
With the Android manifest and the iOS Info.plist configured, the speech recognition setup is ready for the actual Dart implementation.
Now for the part that actually matters to your users: capturing speech and converting it into readable text. We’ll build a self-contained stateful widget so you can see the full lifecycle of a Flutter speech-to-text app in one place.
Create a SpeechToText object, track whether it’s currently listening, and store the recognized text in a controller you can bind to a TextField.
import 'package:flutter/material.dart';
import 'package:speech_to_text/speech_to_text.dart' as stt;
import 'package:permission_handler/permission_handler.dart';

class VoiceInputScreen extends StatefulWidget {
  const VoiceInputScreen({super.key});

  @override
  State<VoiceInputScreen> createState() => _VoiceInputScreenState();
}

class _VoiceInputScreenState extends State<VoiceInputScreen> {
  final stt.SpeechToText _speechToText = stt.SpeechToText();
  final TextEditingController _textController = TextEditingController();

  bool _speechEnabled = false; // true once initialize() succeeds
  bool _isListening = false;   // drives the mic button and status label
  String _recognizedText = '';

  @override
  void initState() {
    super.initState();
    _initSpeech();
  }

  @override
  void dispose() {
    // End any active session before the widget goes away.
    _speechToText.stop();
    _textController.dispose();
    super.dispose();
  }

  // The permission, listening, and build() methods from the following
  // steps all belong inside this class.
}
Before initializing the recognizer, confirm that microphone permission has actually been granted. This avoids a confusing failure where initialize() returns false with no clear explanation.
Future<bool> _checkMicPermission() async {
  var status = await Permission.microphone.status;
  if (!status.isGranted) {
    status = await Permission.microphone.request();
  }
  return status.isGranted;
}
_initSpeech() prepares the plugin once when the screen loads. _startListening() kicks off a live session, and _onSpeechResult() writes the current session’s transcription into the text field as it arrives. Note that it overwrites the field on each result; preserving text across sessions requires appending instead.
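Future<void> _initSpeech() async {
  final hasPermission = await _checkMicPermission();
  if (hasPermission) {
    _speechEnabled = await _speechToText.initialize(
      onError: (error) => debugPrint('Speech error: $error'),
      onStatus: (status) {
        debugPrint('Speech status: $status');
        // The recognizer can stop on its own (silence, timeout, error),
        // so keep the UI flag in sync with the plugin's actual state.
        if (mounted) {
          setState(() => _isListening = _speechToText.isListening);
        }
      },
    );
  }
  // The widget may have been disposed while we were awaiting.
  if (mounted) setState(() {});
}

Future<void> _startListening() async {
  await _speechToText.listen(
    onResult: _onSpeechResult,
    listenFor: const Duration(minutes: 2),
    pauseFor: const Duration(seconds: 5),
    localeId: 'en_US',
    listenOptions: stt.SpeechListenOptions(
      partialResults: true,
      cancelOnError: true,
    ),
  );
  if (mounted) setState(() => _isListening = true);
}

Future<void> _stopListening() async {
  await _speechToText.stop();
  if (mounted) setState(() => _isListening = false);
}

// `result` is the plugin's SpeechRecognitionResult.
void _onSpeechResult(result) {
  setState(() {
    // Overwrites the field with the current session's transcription.
    _recognizedText = result.recognizedWords;
    _textController.text = _recognizedText;
  });
}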
Setting partialResults: true lets the text field update live as the user talks, which gives a far better experience than waiting silently for a final result. The pauseFor duration controls how long the recognizer waits during silence before deciding the user has finished a phrase.
A minimal voice-to-text app interface just needs a text field to show the transcription and a button to toggle listening:
@override
Widget build(BuildContext context) {
  return Scaffold(
    appBar: AppBar(title: const Text('Flutter Speech-to-Text Demo')),
    body: Padding(
      padding: const EdgeInsets.all(16.0),
      child: Column(
        children: [
          TextField(
            controller: _textController,
            maxLines: 5,
            decoration: const InputDecoration(
              border: OutlineInputBorder(),
              hintText: 'Your speech will appear here...',
            ),
          ),
          const SizedBox(height: 20),
          Text(_isListening ? 'Listening...' : 'Tap mic to speak'),
        ],
      ),
    ),
    floatingActionButton: FloatingActionButton(
      // Disabled (null) until the recognizer has initialized.
      onPressed: _speechEnabled
          ? (_isListening ? _stopListening : _startListening)
          : null,
      backgroundColor: _isListening ? Colors.red : Colors.blue,
      child: Icon(_isListening ? Icons.mic : Icons.mic_none),
    ),
  );
}
That’s a complete Flutter speech-to-text example: permissions are requested up front, the recognizer initializes once, and the UI reflects listening state in real time.
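A frequent complaint with Flutter speech recognition is that the plugin stops listening after a short pause, or that previously recognized text is cleared every time the user presses the microphone button again. You can solve both issues with small adjustments:
Append each final result to _recognizedText instead of overwriting it, so earlier dictation survives a new session.
Raise listenFor and pauseFor so the recognizer tolerates longer sessions and longer pauses.
Pick the locale from _speechToText.locales() rather than hardcoding en_US.
Call _speechToText.stop() explicitly when the user is finished instead of waiting for the session to time out.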
Many production apps built around speech recognition also need the reverse capability: reading text back aloud. The flutter_tts package handles this and pairs naturally with speech_to_text for building conversational interfaces.
flutter pub add flutter_tts
import 'package:flutter_tts/flutter_tts.dart';

final FlutterTts _flutterTts = FlutterTts();

Future<void> _speak(String text) async {
  await _flutterTts.setLanguage('en-US');
  await _flutterTts.setSpeechRate(0.5);
  await _flutterTts.setVolume(1.0);
  await _flutterTts.speak(text);
}
Together, these two packages let you build a full voice loop: the user speaks, your app transcribes it with speech-to-text, processes the result, and responds out loud with text-to-speech. That loop is the foundation of most voice assistant and accessibility-focused features in Flutter apps today.
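One way to picture that loop is as a single "turn" with three injected steps. The sketch below is plain Dart with no plugin imports; runVoiceTurn and its three callback parameters are hypothetical names introduced here for illustration, not part of speech_to_text or flutter_tts.

```dart
// Hypothetical callback types for the three stages of a voice turn.
typedef Transcribe = Future<String> Function();
typedef BuildReply = Future<String> Function(String heard);
typedef Speak = Future<void> Function(String text);

/// Runs one listen -> process -> speak turn and returns what was spoken,
/// or null when nothing was recognized.
Future<String?> runVoiceTurn({
  required Transcribe transcribe,
  required BuildReply buildReply,
  required Speak speak,
}) async {
  final heard = (await transcribe()).trim();
  if (heard.isEmpty) return null; // nothing recognized, so stay silent

  final reply = await buildReply(heard);
  await speak(reply);
  return reply;
}
```

In an app, transcribe would presumably wrap a listening session that completes on the final result, and speak would wrap a helper like _speak. Keeping the stages injectable also makes the turn logic testable without a microphone.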
Speech recognition behaves differently across simulators and hardware, so test on a real Android phone and a real iPhone before shipping. Among the checks worth running, confirm that the localeId you pass is actually available on each test device.
Speech-to-text in Flutter comes down to three things done correctly: requesting microphone permission up front, initializing the speech_to_text plugin once per session, and managing listening state so partial and final results update your UI smoothly. Once that foundation is in place, extending it with text-to-speech, multilingual locale support, or AI-powered post-processing of the transcribed text is straightforward.
Whether you’re building a note-taking app, a voice-controlled form, or a full conversational assistant, this pattern scales well and keeps your codebase shared across Android and iOS, which is, after all, the whole point of building a cross-platform voice-to-text app instead of writing native voice modules twice.
Yes. The speech_to_text package is open source and free, and it relies on the speech recognition engine already built into Android and iOS rather than a paid third-party API. The only cost consideration is that on some Android devices, recognition is processed through Google’s servers, which requires an active internet connection.
Not by default. The standard speech_to_text plugin depends on the platform’s native recognizer, and on most Android devices that means a network connection unless the device has on-device language models downloaded. For true offline voice input, look at packages built specifically for local inference, such as Whisper-based plugins or Picovoice’s offline SDKs, which run the model entirely on-device instead of sending audio to a cloud service.
This trips up almost everyone the first time. The package’s own documentation states plainly that it’s designed for commands and short phrases, not continuous, always-on transcription. Android and iOS recognizers automatically end a session after a few seconds of silence. You can extend this somewhat with the pauseFor and listenFor parameters, but for true continuous dictation you generally need to detect when listening stops and immediately restart a new session in the background.
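The restart approach can be sketched in plain Dart. DictationKeeper, startSession, and maxRestarts are hypothetical names introduced for this example. The sketch also assumes the plugin reports the end of a session with the status string 'done' through the onStatus callback, which is worth confirming against the plugin version you use.

```dart
/// Restarts recognition whenever a session ends while the user still wants
/// dictation. `startSession` is a hypothetical callback that would wrap
/// the app's own call to listen().
class DictationKeeper {
  DictationKeeper(this.startSession, {this.maxRestarts = 20});

  final Future<void> Function() startSession;

  /// Safety cap so a recognizer that fails immediately cannot loop forever.
  final int maxRestarts;

  bool _wanted = false;
  int restarts = 0;

  /// Call when the user taps the mic to begin dictating.
  Future<void> begin() async {
    _wanted = true;
    restarts = 0;
    await startSession();
  }

  /// Call when the user taps the mic again to finish.
  void end() => _wanted = false;

  /// Wire this to the status callback; assumes 'done' marks a finished session.
  Future<void> handleStatus(String status) async {
    if (!_wanted || status != 'done' || restarts >= maxRestarts) return;
    restarts++;
    await startSession();
  }
}
```

The cap is a design choice rather than a requirement: without some limit, a device whose recognizer errors out instantly would restart in a tight loop.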
By default, each new listening session replaces recognizedWords with only the latest transcription. To preserve earlier text, append the new result to your existing string instead of assigning over it, something like _recognizedText = '$_recognizedText ${result.recognizedWords}'.trim(). With partialResults enabled, do this only when result.finalResult is true; otherwise every partial update gets appended and the text repeats itself. This is one of the most common fixes requested in developer forums and GitHub issue threads for this plugin.
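A minimal sketch of that append logic, written as plain Dart so it can be tested without a device. TranscriptBuffer is a hypothetical helper, not part of the plugin; its words and isFinal arguments stand in for result.recognizedWords and result.finalResult.

```dart
/// Keeps text from finished phrases separate from the in-progress phrase,
/// so partial results can update the display without being appended twice.
class TranscriptBuffer {
  String _committed = ''; // text from results already marked final
  String _pending = '';   // latest partial result of the current phrase

  /// Text to show in the TextField: committed text plus the live phrase.
  String get text => '$_committed $_pending'.trim();

  /// Feed every recognition result here.
  void add(String words, {required bool isFinal}) {
    if (isFinal) {
      // Lock the finished phrase in and clear the live preview.
      _committed = '$_committed $words'.trim();
      _pending = '';
    } else {
      // Partial results replace each other; they are never accumulated.
      _pending = words;
    }
  }
}
```

Inside the result callback, this would amount to calling add(result.recognizedWords, isFinal: result.finalResult) and then assigning text to the controller.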
Web support for speech_to_text has historically lagged behind mobile, with browser-specific bugs reported around partial results and result formatting. Browser support for the underlying Web Speech API also varies: it works reasonably well in Chrome but is unreliable or unsupported in other browsers. If your app’s primary target is web, test thoroughly in each target browser before relying on this feature.
Yes. Call _speechToText.locales() after initialization to get a list of locales supported on the user’s device, then pass the selected locale’s identifier into the localeId parameter of listen(). Supported languages depend on what the device’s underlying speech engine offers, so always populate your language picker dynamically rather than hardcoding a fixed list.
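Because the available locales differ per device, it helps to fall back gracefully when the preferred one is missing. The function below is a plain-Dart sketch; pickLocaleId is a hypothetical helper, and it assumes locale identifiers may use either an underscore or a hyphen as the separator depending on the platform.

```dart
/// Returns the best match for `preferred` among the device's locale ids,
/// or null when no available locale shares its language.
String? pickLocaleId(List<String> available, String preferred) {
  // Compare ids without regard to case or separator style.
  String norm(String id) => id.replaceAll('-', '_').toLowerCase();

  final want = norm(preferred);
  final language = want.split('_').first;

  // 1. Exact match, ignoring case and separator.
  for (final id in available) {
    if (norm(id) == want) return id;
  }
  // 2. Same language, different region (for example en_GB for en_US).
  for (final id in available) {
    if (norm(id).split('_').first == language) return id;
  }
  return null;
}
```

The list of ids would come from mapping the entries returned by _speechToText.locales() to their localeId values. A null result is the signal to let the plugin use its default locale or to ask the user to choose.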
This page was last edited on 17 June 2026, at 1:09 pm