An iPhone can transcribe a video in three taps in 2026, as long as you know which built-in feature to reach for. iOS 18 added a transcription layer to Voice Memos and kept Live Captions available system-wide during playback, which together cover most casual needs without a download. This guide covers three paths: the native 3-tap route, third-party apps for cases native can’t handle, and the AI pipeline if your goal is quote graphics rather than plain text. It is the iPhone-specific slice of the complete video transcription guide, which covers the broader source-to-method matrix if your workflow mixes phone with desktop or web sources. Transcription is often stage one, and our AI quote generator guide covers the rest of the pipeline.
Method 1 — the 3-tap iOS 18 native workflow
The fastest route uses Live Captions, an accessibility feature Apple made system-wide in iOS 16 and upgraded in iOS 18. It reads any audio playing through the phone, video included, and prints a rolling caption overlay you can copy.
1. Turn on Live Captions once. Settings → Accessibility → Live Captions → toggle on. The on-device language model downloads and runs offline.
2. Play the video. Open it in Photos, Safari, or any app. Live Captions floats a draggable caption box on top of whatever is playing.
3. Tap the caption box and hit Save. In iOS 18 the caption window has a save action that copies the running transcript to Notes.
Accuracy sits around 85-90% on clear English and drops on accented speech or noise. For short clips it’s enough. Voice Memos is the alternative for audio already on device: import it and the iOS 18 transcription panel renders a searchable transcript while the recording plays.
When to use: short videos, single English speaker, offline use. Skip it for: long-form content, non-English audio, multi-speaker, or anything you’ll publish without proofreading.
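The accuracy percentages quoted in this guide are easier to judge if you can measure them on your own clips. The sketch below assumes "accuracy" means one minus the word error rate (WER), which is a common convention but not something any of the apps here confirm. The function names are illustrative. To use it, hand-correct a short passage of transcript and compare it with the raw output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Words are compared case-insensitively and split on whitespace only,
    so punctuation attached to a word counts as part of that word.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: accuracy = 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

One dropped word in a ten-word sentence gives a WER of 0.1, which is 90% accuracy under this definition. A minute or two of hand-checked transcript is usually enough to see whether a clip sits closer to the native range or the app range.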
Method 2 — third-party iPhone apps (when native isn’t enough)
Native hits a wall fast. For anything over five minutes, multi-speaker, or non-English, an app is worth the install. The three that matter in 2026:
- Otter.ai — strongest for meetings and interviews, free tier around 300 minutes per month, speaker labels built in.
- Rev Voice Recorder — better on accented English, optional human review, exports SRT and DOCX.
- Descript Mobile — slower, but the best pick if you’ll edit the transcript alongside the video later.
Accuracy sits in the 94-98% range on clean audio, a few points lower on heavy accents.
1. Share the video into the app. From Files or Photos, most apps accept MP4 or M4A directly.
2. Set language and speaker count. Apps default to English. Flag multi-speaker and change the language if needed.
3. Run the job. A 10-minute video transcribes in 30-90 seconds on cloud services.
4. Clean and export. Fix homophones and brand names, then export plain text, SRT, DOCX, or JSON.
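If you end up with a timed transcript outside the app, the clean-and-export step can be scripted. This is a minimal sketch, assuming you have segments as `(start_seconds, end_seconds, text)` tuples. That shape, the `CORRECTIONS` table, and the function names are all hypothetical and not taken from any of the apps above.

```python
import re

# Hypothetical corrections; replace with the names your own videos get wrong.
CORRECTIONS = {
    "otter ai": "Otter.ai",
    "i phone": "iPhone",
}


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each mis-transcribed phrase, matching whole words, any case."""
    for wrong, right in corrections.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement keeps `right` literal (no backslash escapes).
        text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}"
        )
    return "\n\n".join(blocks) + "\n"
```

Run `apply_corrections` on each segment's text before calling `to_srt`, so brand names are fixed once rather than in every export format.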
If you’re weighing a dedicated transcription app against a broader tool, the TurboScribe head-to-head breakdown covers the tradeoff between pure transcription and repurposing pipelines.
Method 3 — AI pipeline for content creators
Transcription is rarely the destination. If your end goal is quote graphics, short clips, or social-ready text, the standalone-app route adds a design step most creators half-finish. End-to-end tools handle transcription, quote extraction, and graphic rendering in one pass from your iPhone.
The same framework shows up across video sources — the sibling piece on how to transcribe Facebook videos to text walks through the native / third-party / AI split and the logic carries over to iPhone uploads.
When to use: creators, coaches, and podcasters who treat video as a source for Instagram or LinkedIn output.
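To make the quote-extraction stage concrete, here is a deliberately naive sketch: it splits a transcript into sentences and keeps the ones short enough to fit on a graphic. The length thresholds and the function name are assumptions for illustration. End-to-end tools almost certainly use something smarter than a word count, so treat this as a stand-in, not a description of how any product works.

```python
import re


def candidate_quotes(transcript: str, min_words: int = 6, max_words: int = 25) -> list[str]:
    """Return sentences whose word count falls within [min_words, max_words]."""
    # Split after '.', '!' or '?' followed by whitespace. This simple rule
    # will mis-split abbreviations such as "e.g. this".
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    quotes = []
    for sentence in sentences:
        word_count = len(sentence.split())
        if min_words <= word_count <= max_words:
            quotes.append(sentence)
    return quotes
```

Filler like "Yes." or "Okay." falls below the minimum and is dropped, while long rambling sentences exceed the maximum. What survives is a shortlist to pick from by hand.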
Which method should you actually use?
| Scenario | Recommended method |
|---|---|
| Short clip, single English speaker | Method 1 (Live Captions) |
| Long interview, multi-speaker | Method 2 (Otter or Rev) |
| Non-English or accented audio | Method 2 (Rev, or Whisper via API) |
| End goal is quote graphics / social | Method 3 (AI pipeline) |
| Voice memo already on device | Method 1 (Voice Memos transcription) |
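
The table can also be read as a small decision function. The sketch below is one possible encoding: the `Job` fields and the order in which rules are checked are assumptions (the table itself does not rank scenarios). The 5-minute and 30-minute thresholds come from the limits this guide quotes for Live Captions and Voice Memos.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical description of one transcription job."""
    minutes: float
    speakers: int = 1
    english: bool = True
    heavy_accent: bool = False
    wants_quote_graphics: bool = False
    is_voice_memo: bool = False


def recommend_method(job: Job) -> str:
    # Assumed precedence: end goal first, then audio difficulty, then length.
    if job.wants_quote_graphics:
        return "Method 3 (AI pipeline)"
    if not job.english or job.heavy_accent:
        return "Method 2 (Rev, or Whisper via API)"
    if job.speakers > 1:
        return "Method 2 (Otter or Rev)"
    if job.is_voice_memo and job.minutes <= 30:
        return "Method 1 (Voice Memos transcription)"
    if job.minutes <= 5:
        return "Method 1 (Live Captions)"
    return "Method 2 (Otter or Rev)"
```

For example, `recommend_method(Job(minutes=45, speakers=2))` returns the Otter-or-Rev row. Reorder the checks if, say, language matters more to you than the final output format.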
Common iPhone transcription mistakes
Trusting Live Captions on long content. Accuracy drifts past the 5-minute mark as the on-device model loses context. Fine for quick reference, risky for anything you’ll publish.
Underestimating battery drain. Live Captions pushes the Neural Engine hard — a 30-minute session drains 15-20% on an iPhone 14 or older. Keep the phone plugged in for long jobs.
Ignoring free-tier upload caps. Otter caps each free recording at roughly 30-40 minutes, and Rev caps exports on the free tier. Split long videos or expect a paywall mid-job.
Not re-granting permissions after iOS updates. iOS 18 reset several permission states. If an app fails silently, check Settings → Privacy & Security → Microphone before reinstalling.
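For the upload-cap problem above, it helps to plan the split before opening an editor. This sketch computes evenly sized parts that each stay under a cap. The function name is illustrative, and the 30-minute cap in the usage note is simply the low end of the range quoted in this guide, so check your app's current limit.

```python
import math


def split_points(total_minutes: float, cap_minutes: float) -> list[tuple[float, float]]:
    """Return (start, end) minute marks for equal parts no longer than the cap."""
    if total_minutes <= 0 or cap_minutes <= 0:
        raise ValueError("durations must be positive")
    parts = math.ceil(total_minutes / cap_minutes)
    part_length = total_minutes / parts
    return [
        (i * part_length, min((i + 1) * part_length, total_minutes))
        for i in range(parts)
    ]
```

A 75-minute video with a 30-minute cap comes out as three 25-minute parts. Equal parts avoid a tiny leftover chunk at the end, though the cut points may land mid-sentence, so nudge them to a natural pause when you trim.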
Frequently asked questions
Can iPhone transcribe videos without internet? Yes. Live Captions (Settings → Accessibility → Live Captions) runs on-device in iOS 16 and later, and iOS 18’s Voice Memos transcription also works offline once the language model is downloaded. Expect 85-90% accuracy on clear English. Cloud apps like Otter and Rev need a connection, so offline reliability is the main advantage of the native route.
Does iOS 18 include a built-in video transcription feature? iOS 18 has no dedicated video transcription button, but it layers two features that cover the use case: system-wide Live Captions that read any playing audio including video, and Voice Memos transcription for imported audio. Together they replace most standalone apps for short English content — not for accented or multi-speaker audio.
Which free iPhone app is most accurate for video transcription? Otter.ai is the strongest free option in 2026 for clean English meetings and interviews, around 95-97% accurate with speaker labels. Rev Voice Recorder edges it on accented English but caps free exports harder. For multilingual audio, a Whisper-based app beats both but usually needs a paid tier on iOS.
How long can a video be for iPhone transcription to work well? Live Captions works reliably up to about 5 minutes before on-device accuracy drifts. Voice Memos handles around 30 minutes cleanly on recent iPhones. Otter and similar apps transcribe multi-hour recordings, though free tiers cap individual recordings at 30-40 minutes. For videos over an hour, split them.
Can I transcribe videos on iPad using the same methods? Yes. Live Captions and Voice Memos transcription are both available on iPad, and every third-party app mentioned here ships an equivalent iPad version. iPadOS 18 mirrors iOS 18 on transcription features. The iPad advantage is the larger screen, which makes it easier to clean the transcript inline and to keep the source video open side by side.
Where to go from here
iPhone handles casual transcription well enough that standalone apps are overkill for short content. For longer or critical jobs, Method 2 still wins. The iPhone workflow is one row in the source-to-method decision matrix: if you also work with YouTube, Zoom, or screen recordings, the complete video transcription guide covers the method choice for each. If the transcript is a stepping stone to social content, our complete AI quote generator guide covers the downstream pipeline. A dedicated transcription tool like TurboScribe is worth a look only if raw text is the final deliverable.