Written by Anika Ali Nitu
To add a voice assistant in React Native, start by designing how the app will listen, understand, and respond to users. Then set up microphone permissions, add speech-to-text for capturing voice, use NLU or an AI backend to process the command, return a response through text-to-speech, and test offline support, errors, accessibility, and performance before release.
Voice is no longer a novelty — it’s an expectation. Over 27% of mobile users now rely on voice interaction daily, and apps that lack it are quietly losing ground. If you’re a mobile developer looking to implement React Native voice assistant integration, this guide covers everything: architecture decisions, the best libraries, real code patterns, offline handling, and the pitfalls that kill production apps before launch.
Whether you’re adding a simple voice search or building a fully conversational AI assistant, this is the only guide you need.
The mobile app landscape has fundamentally shifted. Users expect ambient, hands-free experiences — especially in fitness apps, navigation, productivity tools, and accessibility-first products.
Key industry signals:
React Native sits in a uniquely powerful position here: a single codebase that accesses native iOS and Android speech APIs, bridged through JavaScript, with a thriving ecosystem of voice-specific libraries. Done right, React Native voice assistant integration feels completely native.
Before writing a single line of code, understand the three-layer architecture that all production voice apps share:
Each layer has distinct responsibilities. Conflating them — especially mixing UI state with NLU logic — is the #1 source of messy, unmaintainable voice app codebases.
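One way to keep the layers apart is to treat each one as a plain function and compose them, so UI code only ever sees the final result. The sketch below assumes the three layers are speech input, language understanding, and response output. The name `createVoicePipeline` and the stage names are hypothetical, not part of any library.

```javascript
// Hypothetical composition of three voice layers. Each stage is injected,
// so the UI never touches NLU logic and each stage can be tested alone.
function createVoicePipeline({ transcribe, understand, respond }) {
  return async function handleUtterance(audio) {
    const transcript = await transcribe(audio);  // input layer: audio -> text
    const intent = await understand(transcript); // understanding layer: text -> intent
    const reply = await respond(intent);         // output layer: intent -> reply text
    return { transcript, intent, reply };
  };
}

// Usage with stub stages standing in for real STT, NLU, and TTS.
const pipeline = createVoicePipeline({
  transcribe: async (audio) => audio.text,
  understand: async (text) => (text.includes('open') ? 'navigate' : 'unknown'),
  respond: async (intent) => (intent === 'navigate' ? 'Opening it now.' : 'Sorry?'),
});
```

Because every stage is passed in, swapping a cloud STT service for an on-device one only changes the `transcribe` argument.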
@react-native-voice/voice
The de facto standard for speech-to-text in React Native. It wraps native speech recognition on both iOS (SFSpeechRecognizer) and Android (SpeechRecognizer API).
npm install @react-native-voice/voice
npx pod-install   # iOS only
Strengths: Mature, battle-tested, supports continuous recognition and locale switching.
Weaknesses: Requires internet on most devices; permission setup is manual.
react-native-tts
Clean, well-maintained text-to-speech library that surfaces native TTS engines on iOS and Android.
npm install react-native-tts
Strengths: Supports pitch/rate control, multiple voices, and event callbacks.
Weaknesses: Voice quality is limited by the OS TTS engine unless you integrate a cloud provider.
react-native-openai
For apps powered by LLMs. Neither OpenAI nor Anthropic ship official React Native SDKs, but both REST APIs work perfectly via fetch or Axios in React Native.
@picovoice/react-native-voice-processor
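Captures audio frames for Picovoice's on-device engines, such as Porcupine for wake word detection and Rhino for speech-to-intent NLU (the engines ship as separate packages). Privacy-first, no cloud dependency.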
npm install @picovoice/react-native-voice-processor
whisper.rn
On-device speech recognition using OpenAI’s Whisper model. Excellent accuracy, works fully offline.
npm install whisper.rn
Best for: Apps requiring offline voice input or privacy-sensitive use cases.
iOS — Add to Info.plist:
<key>NSSpeechRecognitionUsageDescription</key>
<string>We use speech recognition to process your voice commands.</string>
<key>NSMicrophoneUsageDescription</key>
<string>We need microphone access to listen to your voice.</string>
Android — Add to AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
import Voice from '@react-native-voice/voice';
import { useEffect, useState, useCallback } from 'react';

export function useVoiceRecognition() {
  const [isListening, setIsListening] = useState(false);
  const [transcript, setTranscript] = useState('');
  const [error, setError] = useState(null);

  useEffect(() => {
    // Final results: keep the top-ranked hypothesis, if there is one.
    Voice.onSpeechResults = (e) => {
      setTranscript(e.value?.[0] ?? '');
    };
    // e.error is an object ({ code, message }); store a readable string.
    Voice.onSpeechError = (e) => {
      setError(e.error?.message ?? 'Speech recognition failed');
      setIsListening(false);
    };
    Voice.onSpeechEnd = () => {
      setIsListening(false);
    };
    // Release the native recognizer and its listeners on unmount.
    return () => {
      Voice.destroy().then(Voice.removeAllListeners);
    };
  }, []);

  const startListening = useCallback(async (locale = 'en-US') => {
    try {
      setError(null);
      setTranscript('');
      await Voice.start(locale);
      setIsListening(true);
    } catch (e) {
      setError(e.message);
    }
  }, []);

  const stopListening = useCallback(async () => {
    try {
      await Voice.stop();
    } catch (e) {
      setError(e.message);
    } finally {
      setIsListening(false);
    }
  }, []);

  return { isListening, transcript, error, startListening, stopListening };
}
This hook gives you a clean, reusable interface for any component in your app.
import React from 'react';
import { View, Text, TouchableOpacity, StyleSheet } from 'react-native';
import { useVoiceRecognition } from './useVoiceRecognition';

export default function VoiceButton() {
  const { isListening, transcript, startListening, stopListening } =
    useVoiceRecognition();

  // Wrap the handlers so the press event is not passed in as the locale argument.
  const handlePress = () => (isListening ? stopListening() : startListening());

  return (
    <View style={styles.container}>
      <TouchableOpacity
        style={[styles.button, isListening && styles.active]}
        onPress={handlePress}
        accessibilityRole="button"
        accessibilityLabel={isListening ? 'Stop listening' : 'Start voice input'}
      >
        <Text style={styles.label}>
          {isListening ? '🎙️ Listening...' : '🎤 Tap to Speak'}
        </Text>
      </TouchableOpacity>
      {transcript ? (
        <Text style={styles.transcript}>{transcript}</Text>
      ) : null}
    </View>
  );
}

const styles = StyleSheet.create({
  container: { alignItems: 'center', padding: 24 },
  button: {
    backgroundColor: '#4F46E5',
    paddingHorizontal: 32,
    paddingVertical: 16,
    borderRadius: 50,
  },
  active: { backgroundColor: '#EF4444' },
  label: { color: '#fff', fontSize: 16, fontWeight: '600' },
  transcript: { marginTop: 16, fontSize: 14, color: '#374151' },
});
Send the transcript to your own backend, which proxies to an LLM API. This keeps API keys off the device and gives you full control over the system prompt, conversation history, and safety filters.
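async function getAssistantResponse(userMessage, conversationHistory) {
  const response = await fetch('https://your-api.com/voice-chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      message: userMessage,
      history: conversationHistory,
    }),
  });
  // fetch only rejects on network failure, so check the HTTP status explicitly.
  if (!response.ok) {
    throw new Error(`Voice chat request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.reply;
}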
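A voice UI that waits indefinitely on a slow backend feels broken, so it can help to bound the request time. The helper below is a minimal sketch. `withTimeout` is a hypothetical name, and the approach only stops waiting; it does not cancel the underlying request (that would need an `AbortController` passed to `fetch`).

```javascript
// Rejects if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms, message = 'Request timed out') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A call such as `withTimeout(getAssistantResponse(text, history), 8000)` would then reject after eight seconds, letting the app speak or show a retry message instead of hanging.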
Use the Alexa Skills Kit (ASK) with AWS Lambda. React Native communicates via Alexa Voice Service (AVS) SDK or via deep links.
import { Linking } from 'react-native';

// Launch the Alexa app with a specific intent.
// The URL below is a placeholder; confirm the deep-link format your skill supports.
Linking.openURL('alexa://actions?command=your-skill-invocation');
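Dialogflow integrates cleanly with React Native via REST API. It handles NLU, intent mapping, and entity extraction server-side. Note that the request below targets the Dialogflow ES (v2) detectIntent endpoint; Dialogflow CX uses a v3 endpoint with a different path (including location and agent IDs) and a slightly different request body. Obtain the access token on your backend rather than on the device.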
// Dialogflow ES (v2) detectIntent request.
// PROJECT_ID and accessToken are supplied by your own configuration and backend.
const detectIntent = async (text, sessionId) => {
  const response = await fetch(
    `https://dialogflow.googleapis.com/v2/projects/${PROJECT_ID}/agent/sessions/${sessionId}:detectIntent`,
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        queryInput: { text: { text, languageCode: 'en-US' } },
      }),
    }
  );
  return response.json();
};
For apps that don’t need a full LLM, lightweight NLU handles most use cases efficiently.
const INTENTS = {
  navigate: ['go to', 'open', 'show me', 'navigate to'],
  search: ['find', 'search for', 'look up', 'where is'],
  play: ['play', 'start', 'listen to', 'put on'],
  stop: ['stop', 'pause', 'cancel', 'quit'],
};

function classifyIntent(transcript) {
  const lower = transcript.toLowerCase();
  // Return the first intent whose trigger phrase appears in the transcript.
  for (const [intent, triggers] of Object.entries(INTENTS)) {
    if (triggers.some((trigger) => lower.includes(trigger))) {
      return intent;
    }
  }
  return 'unknown';
}
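Plain substring matching has two limits: "display" contains "play", and the classifier drops the part of the sentence that says what to open or play. The variant below is a sketch of one way to address both with word-boundary matching. `parseCommand` and `INTENT_TRIGGERS` are hypothetical names; the trigger lists mirror the INTENTS map above.

```javascript
// Trigger phrases per intent (same shape as the INTENTS map above).
const INTENT_TRIGGERS = {
  navigate: ['go to', 'open', 'show me', 'navigate to'],
  search: ['find', 'search for', 'look up', 'where is'],
  play: ['play', 'start', 'listen to', 'put on'],
  stop: ['stop', 'pause', 'cancel', 'quit'],
};

// Escape regex metacharacters so trigger text is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

function parseCommand(transcript) {
  const lower = transcript.toLowerCase().trim();
  for (const [intent, triggers] of Object.entries(INTENT_TRIGGERS)) {
    for (const trigger of triggers) {
      // \b keeps "play" from matching inside "display".
      const match = lower.match(new RegExp(`\\b${escapeRegExp(trigger)}\\b(.*)`));
      if (match) {
        // Everything after the trigger phrase is treated as the target.
        return { intent, target: match[1].trim() };
      }
    }
  }
  return { intent: 'unknown', target: '' };
}
```

With this version, "Open my profile" yields the intent `navigate` with the target `my profile`, while "display settings" falls through to `unknown`.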
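For more robust NLU without an LLM, consider compromise (compromise.cool), a lightweight JavaScript NLP library that can run in React Native with some Metro bundler configuration. ml5.js is another option for text classification, but it is built for browsers, so verify that it runs in your React Native setup before relying on it.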
import Tts from 'react-native-tts';
import { useEffect } from 'react';

export function useTTS() {
  useEffect(() => {
    Tts.setDefaultLanguage('en-US');
    Tts.setDefaultRate(0.5);  // 0.0 – 1.0 (0.5 = natural pace)
    Tts.setDefaultPitch(1.0); // 0.5 – 2.0
  }, []);

  const speak = (text) => {
    Tts.stop(); // Cancel any in-progress speech
    Tts.speak(text);
  };

  const stop = () => Tts.stop();

  return { speak, stop };
}
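The native TTS engines can sound robotic. For production apps, consider streaming audio from a cloud provider. Make the provider call from your backend so the API key never ships inside the app bundle: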
// Run this on your backend: the xi-api-key header must not be exposed in the app.
const speakWithElevenLabs = async (text, voiceId) => {
  const response = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream`,
    {
      method: 'POST',
      headers: {
        'xi-api-key': YOUR_API_KEY,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text, model_id: 'eleven_turbo_v2' }),
    }
  );
  // Pipe audio stream to react-native-sound or expo-av
};
Offline support separates good voice apps from great ones. Here’s the tiered approach used by production apps:
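import { initWhisper } from 'whisper.rn';

// Wrapped in an async function because top-level await is not available
// in a typical React Native bundle.
async function transcribeOffline(audioFilePath) {
  const whisperContext = await initWhisper({
    filePath: require('./assets/whisper.en.bin'), // ~75MB model
  });
  const { stop, promise } = whisperContext.transcribe(audioFilePath, {
    language: 'en',
  });
  const { result } = await promise;
  console.log('Transcript:', result);
  return result;
}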
Store the most common intents and responses in SQLite or MMKV. When offline, match against local intent rules before failing.
import { MMKV } from 'react-native-mmkv';

const storage = new MMKV();

function handleOfflineIntent(transcript) {
  const intent = classifyIntent(transcript);
  // Look up a response cached earlier for this intent.
  const cachedResponse = storage.getString(`offline_response_${intent}`);
  return cachedResponse ?? "I'm offline right now, but I'll remember that.";
}
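The cloud, on-device, and cached paths can be chained so the app always tries the best available source first. The function below is a minimal sketch of that idea. `answerWithFallback` is a hypothetical name, and the handlers passed to it stand in for whatever backend call and local lookup your app uses.

```javascript
// Tries each handler in order and returns the first non-null answer.
async function answerWithFallback(transcript, handlers) {
  for (const handler of handlers) {
    try {
      const reply = await handler(transcript);
      if (reply != null) return reply;
    } catch (e) {
      // A failed tier (for example, no network) counts as "no answer"; move on.
    }
  }
  return "Sorry, I can't help with that right now.";
}
```

Passing a backend call first and a local lookup such as `handleOfflineIntent` second gives online behaviour when the network is up and a cached answer when it is not.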
For voice commands that require a server action (booking, ordering, etc.), queue them locally and sync when connectivity returns.
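import NetInfo from '@react-native-community/netinfo';

const commandQueue = [];

NetInfo.addEventListener((state) => {
  if (state.isConnected && commandQueue.length > 0) {
    // Hand off a copy and empty the queue in one step, so an async
    // processQueue never reads an array that was cleared underneath it.
    const pending = commandQueue.splice(0);
    processQueue(pending);
  }
});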
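If a send fails partway through a sync, the unsent commands should stay queued rather than be lost. The sketch below shows one way to do that. `createCommandQueue` is a hypothetical helper, and `send` is assumed to be a function that posts one command to your backend and rejects on failure.

```javascript
// Hypothetical offline command queue. `send` posts one command to the
// backend and must reject when the request fails.
function createCommandQueue(send) {
  const pending = [];
  return {
    enqueue(command) {
      pending.push(command);
    },
    size() {
      return pending.length;
    },
    // Sends commands in order and stops at the first failure, so unsent
    // commands remain queued for the next connectivity change.
    async flush() {
      while (pending.length > 0) {
        try {
          await send(pending[0]);
          pending.shift(); // remove only after a confirmed send
        } catch (e) {
          return false;
        }
      }
      return true;
    },
  };
}
```

This version lives in memory only. To survive an app restart, the pending list would also need to be written to storage such as MMKV or SQLite, and `flush` should not be called concurrently.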
Voice features are CPU-intensive. These optimizations prevent the jank that kills user trust.
1. Debounce transcript processing
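import { useEffect, useMemo } from 'react';
import debounce from 'lodash/debounce';

// Inside your component:
const processTranscript = useMemo(
  () => debounce((text) => sendToNLU(text), 400),
  []
);

// Cancel any pending call when the component unmounts.
useEffect(() => () => processTranscript.cancel(), [processTranscript]);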
2. Run heavy processing off the JS thread
Use react-native-worklets-core or move NLU logic to a native module so heavy work does not block the JS thread during animations.
3. Lazy-load voice modules
const VoiceModule = React.lazy(() => import('./VoiceAssistant'));
Only load voice components when the user activates the feature — not on app startup.
4. Compress audio before upload
Use react-native-audio-recorder-player with AAC encoding at 16kHz mono — the minimum quality most STT APIs need. This reduces payload size by ~70% vs uncompressed WAV.
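The size difference can be estimated with simple arithmetic. The sketch below assumes 16-bit PCM for the uncompressed WAV and a constant AAC bitrate of 64 kbps. Both are illustrative assumptions, and container overhead is ignored.

```javascript
// Size of raw PCM audio (the payload of an uncompressed WAV file), in bytes.
function pcmBytes(seconds, sampleRate = 16000, bitDepth = 16, channels = 1) {
  return seconds * sampleRate * (bitDepth / 8) * channels;
}

// Size of a constant-bitrate compressed stream, in bytes.
function encodedBytes(seconds, bitrateKbps) {
  return (seconds * bitrateKbps * 1000) / 8;
}

// Fractional saving of the compressed stream relative to raw PCM.
function savings(seconds, bitrateKbps) {
  return 1 - encodedBytes(seconds, bitrateKbps) / pcmBytes(seconds);
}

const wav = pcmBytes(10);         // 320000 bytes for 10 s of 16 kHz mono 16-bit audio
const aac = encodedBytes(10, 64); // 80000 bytes at the assumed 64 kbps
const saved = savings(10, 64);    // 0.75
```

Raw 16 kHz mono 16-bit audio runs at 256 kbps, so a reduction of about 70% corresponds to an AAC bitrate of roughly 77 kbps.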
Voice UIs require a layered testing strategy.
jest.mock('@react-native-voice/voice', () => ({
  start: jest.fn(),
  stop: jest.fn(),
  destroy: jest.fn(() => Promise.resolve()),
  removeAllListeners: jest.fn(),
  onSpeechResults: null,
  onSpeechError: null,
  onSpeechEnd: null,
}));
Don’t try to pipe real audio in CI. Instead, test the NLU and response pipeline by injecting transcript strings:
// mockVoiceTranscript and mockNavigation are test helpers defined in your test setup.
it('navigates to profile when user says "open my profile"', async () => {
  const { getByText } = render(<VoiceAssistant />);

  act(() => {
    mockVoiceTranscript('open my profile');
  });

  await waitFor(() => {
    expect(mockNavigation.navigate).toHaveBeenCalledWith('Profile');
  });
});
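For end-to-end tests, Detox (by Wix) can drive the voice UI through accessibility identifiers (testID) while a test-only hook supplies the transcript, since simulators cannot reliably be fed real speech. For Android, use adb shell input text for text-based fallback testing.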
Pitfall 1: Not handling interim results
Most STT APIs return partial transcripts before the final result. If you only handle onSpeechResults, you miss the UX opportunity to show live feedback.
Voice.onSpeechPartialResults = (e) => {
  setInterimTranscript(e.value[0]); // Show live as user speaks
};
Pitfall 2: Forgetting to destroy the Voice instance
Leaked listeners cause ghost microphone activity and crashes on re-render.
useEffect(() => {
  return () => {
    Voice.destroy().then(Voice.removeAllListeners);
  };
}, []);
Pitfall 3: Hardcoding English only
At minimum, detect the device locale and pass it to Voice.start(). It takes two minutes and makes your app globally usable.
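Device locales arrive in several shapes ("fr_FR", "pt-br", "es"), and not every locale is supported by every recognizer. The helper below is a sketch of normalizing the value and choosing a fallback. `pickLocale` and the `SUPPORTED` list are hypothetical, and the device locale itself is assumed to come from elsewhere, for example a library such as react-native-localize.

```javascript
// Hypothetical list of recognition locales the app has been tested with.
const SUPPORTED = ['en-US', 'en-GB', 'fr-FR', 'es-ES', 'pt-BR'];

// Normalizes a device locale such as "fr_FR" or "pt-br" and picks the
// closest supported recognition locale.
function pickLocale(deviceLocale, supported = SUPPORTED, fallback = 'en-US') {
  if (!deviceLocale) return fallback;
  const [lang, region] = deviceLocale.replace('_', '-').split('-');
  const language = lang.toLowerCase();
  const wanted = region ? `${language}-${region.toUpperCase()}` : language;
  // Exact match first, then any supported locale with the same language.
  const exact = supported.find((l) => l === wanted);
  if (exact) return exact;
  const sameLanguage = supported.find((l) => l.toLowerCase().startsWith(`${language}-`));
  return sameLanguage ?? fallback;
}
```

The result can be passed straight to the recognizer, for example `startListening(pickLocale(deviceLocale))`.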
Pitfall 4: No fallback for denied permissions
Always handle the case where the user denies microphone access. Render a text input fallback, not a broken UI.
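The decision about what to render can be isolated in a small pure function that is easy to unit test. This is a sketch only. `resolveInputMode` is a hypothetical name, and the status strings mirror the values of React Native's PermissionsAndroid.RESULTS; an iOS permission result would need to be mapped onto the same strings.

```javascript
// Maps a permission result to the input mode the UI should render.
// The status strings mirror React Native's PermissionsAndroid.RESULTS values.
function resolveInputMode(status) {
  switch (status) {
    case 'granted':
      return { mode: 'voice', showSettingsHint: false };
    case 'never_ask_again':
      // The OS will not prompt again, so point the user at system settings.
      return { mode: 'text', showSettingsHint: true };
    case 'denied':
    default:
      // Unknown or denied status: fall back to typing.
      return { mode: 'text', showSettingsHint: false };
  }
}
```

A component can then render the microphone button when `mode` is `voice` and a text field otherwise, with an optional link to system settings.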
Pitfall 5: Ignoring background audio sessions on iOS
If your app plays audio (music, podcasts) and adds voice, configure AVAudioSession correctly, or your voice feature will interrupt background audio or be interrupted by it.
Before shipping your React Native voice assistant integration, confirm:
☐ Declare microphone and speech recognition permissions for both iOS and Android
☐ Add runtime permission requests with a graceful fallback for denied access
☐ Call Voice.destroy() when the component unmounts
☐ Show interim transcripts for real-time user feedback
☐ Detect offline status and support queue-and-sync or local fallback
☐ Call TTS stop() before playing a new response to avoid audio overlap
☐ Store API keys in environment variables, never inside the app bundle
☐ Configure audio compression to reduce STT API payload size
☐ Add locale and language detection
☐ Write unit tests for NLU intent classification
☐ Add integration tests with a mocked Voice module
☐ Add accessibility labels to all voice UI controls
☐ Handle key error states, including no network, STT failure, and timeout
☐ Track analytics events for voice session start, matched intent, and errors
Yes. Using whisper.rn for STT and react-native-tts for TTS, plus local NLU rules or a bundled ONNX model, you can build a fully offline voice assistant. The tradeoff is bundle size (~75–150MB for Whisper models).
For cloud-dependent apps: @react-native-voice/voice. For offline or privacy-first apps: whisper.rn. For highest accuracy regardless of cost: route transcription through OpenAI Whisper API via your backend.
Use @picovoice/react-native-voice-processor with Picovoice’s Porcupine wake word engine. It runs on-device, consumes minimal battery, and supports custom wake words.
Partially. @react-native-voice/voice requires bare workflow (or Expo Dev Client with custom native modules). Pure Expo Go does not support it. expo-speech works in managed workflow but only for TTS, not STT.
Pass the device locale to Voice.start(locale), e.g. 'fr-FR' or 'es-ES'. For multi-language apps, let users select their preferred language in settings and store it in AsyncStorage or MMKV. Some cloud STT APIs can also auto-detect the spoken language; check your provider's documentation for the exact parameter.
No. Always proxy through your backend. Embedding API keys in app bundles is a serious security risk — they can be extracted from the APK/IPA. Your backend also gives you rate limiting, logging, and the ability to rotate keys without an app update.
This page was last edited on 29 June 2026, at 5:38 pm