OpenAI Whisper changed speech recognition when it launched in 2022. For the first time, a transcription model that rivaled commercial products was free, open-source, and able to run on your own hardware. Four years later, Whisper remains the foundation of most serious transcription tools on the Mac. But running raw Whisper from the command line is not practical for everyday use. You want an app.
This guide compares the best Mac apps built on Whisper: what they add on top of the base model, how fast they run on Apple Silicon, and which one makes sense for different use cases. We tested Voice Keyboard Pro, MacWhisper, Superwhisper, and Whisper.cpp CLI on an M2 MacBook Air and an M4 Pro MacBook Pro.
What Is OpenAI Whisper and Why It Matters
Whisper is a neural network trained on 680,000 hours of multilingual audio from the internet. It can transcribe speech in over 90 languages, translate between languages, and handle noisy audio far better than previous open-source models. OpenAI released it under the MIT license, which means anyone can use it, modify it, and build products on top of it.
Before Whisper, accurate speech recognition meant Apple's Dictation (decent but limited), Google's cloud API (accurate but sends audio to Google), or Dragon NaturallySpeaking (expensive and Windows-focused). Whisper gave developers an accurate model that runs locally, for free, on any hardware.
The model comes in five sizes:
| Model | Parameters | Size on Disk | Relative Speed | Relative Accuracy |
|---|---|---|---|---|
| Tiny | 39M | ~75 MB | Fastest | Good for simple speech |
| Base | 74M | ~140 MB | Fast | Good |
| Small | 244M | ~460 MB | Moderate | Very good |
| Medium | 769M | ~1.5 GB | Slower | Excellent |
| Large-v3 | 1.5B | ~3 GB | Slowest | Best |
Larger models are more accurate but take longer to process. On modern Apple Silicon Macs, even the large model runs comfortably. On older Intel Macs, you will want to stick with small or medium.
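The trade-off in the table reduces to a rule of thumb, sketched below. The figures mirror the table above; the picker logic is this article's guidance ("small is the sweet spot, medium for hard audio, stick to small on Intel"), not any app's actual selection code.

```python
# Approximate figures from the model size table above.
MODELS = {
    "tiny":     {"params_m": 39,   "disk_mb": 75},
    "base":     {"params_m": 74,   "disk_mb": 140},
    "small":    {"params_m": 244,  "disk_mb": 460},
    "medium":   {"params_m": 769,  "disk_mb": 1500},
    "large-v3": {"params_m": 1500, "disk_mb": 3000},
}

def pick_model(apple_silicon: bool, difficult_audio: bool) -> str:
    """Rule-of-thumb model picker following this article's guidance.

    difficult_audio covers accents, background noise, and fast speech.
    """
    if not apple_silicon:
        return "small"  # Intel Macs: stick with small (or medium at most)
    return "medium" if difficult_audio else "small"
```

For batch transcription where speed does not matter, large-v3 remains the accuracy ceiling regardless of what this picker returns.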
Why You Need an App (Not Just the Model)
You can run Whisper directly from the command line using Python or Whisper.cpp. But raw Whisper is a batch transcription tool: you give it an audio file, wait, and get text back. That is useful for transcribing recordings but useless for real-time dictation.
A good Whisper app adds:
- Real-time dictation: Press a button, speak, get text instantly in whatever app you are using.
- System-wide integration: Works in every text field on your Mac, not just the app itself.
- Apple Silicon optimization: Uses Core ML, Metal, or the Neural Engine to run models faster than generic Python Whisper.
- AI post-processing: Cleans up filler words, fixes grammar, adjusts tone after transcription.
- Custom vocabulary: Adds domain-specific terms the base Whisper model does not know.
- Audio recording management: Saves and organizes your recordings for later reference.
The app is where the user experience happens. The Whisper model is the engine, but the app is the car.
App Comparison
| Feature | Voice Keyboard Pro | MacWhisper | Superwhisper | Whisper.cpp CLI |
|---|---|---|---|---|
| Price | Free / $4.99/mo | Free / $29 once | $8/mo | Free |
| Real-time dictation | Yes | No (batch only) | Yes | No |
| System-wide | Yes | No (own app) | Yes | No |
| Apple Silicon optimized | Yes (Core ML) | Yes (Core ML) | Yes | Yes (Metal) |
| AI cleanup | 7 actions | GPT integration | AI rewrite modes | No |
| Custom vocabulary | Yes | No | No | Via prompt |
| iPhone app | Yes | No | No | No |
| Offline | Yes | Yes | Yes | Yes |
| Best for | Daily dictation | Batch transcription | Quick dictation | Developers/scripts |
Detailed Reviews
Voice Keyboard Pro
Voice Keyboard Pro uses Whisper as its transcription engine but adds several layers on top. The most significant is profession-aware vocabulary. Whisper's base model handles everyday English well but struggles with specialized terminology. Voice Keyboard Pro detects your profession and tunes the transcription pipeline to handle domain-specific terms: medical terminology for doctors, legal terms for lawyers, technical jargon for developers. This happens through a combination of custom Whisper prompting and post-processing.
Voice Keyboard Pro runs Whisper via Core ML on Apple Silicon, which takes advantage of the Neural Engine for inference. On an M2 MacBook Air, transcription latency for a 10-second dictation is under 1 second. On M4 Pro, it is nearly instantaneous. The app works system-wide: press a keyboard shortcut, speak, and text appears in whatever app has focus.
Voice Keyboard Pro also offers AI actions after transcription: clean up filler words, fix grammar, change tone, shorten, or translate. These use a language model, not Whisper, but the integration is seamless. Dictate a rough draft, tap a button, get polished text.
The free tier includes basic dictation. The $4.99/month plan adds AI actions, custom vocabulary, and longer dictation sessions.
Best for: Daily dictation across all apps. Professionals who need specialized vocabulary. People who want one tool for both Mac and iPhone. For a broader comparison including non-Whisper tools, see our best dictation app for Mac guide.
Honest limitation: The profession-aware features require the paid plan. If you just want basic Whisper transcription without the extras, MacWhisper or Whisper.cpp are cheaper.
MacWhisper
MacWhisper is a straightforward Mac app for batch transcription using Whisper. You drop in an audio file, or record directly in the app, and it transcribes using your choice of Whisper model. The free version uses the tiny and base models. The Pro version ($29 one-time) unlocks all model sizes and adds features like GPT-powered summaries, translation, and export to SRT subtitles.
MacWhisper is not a dictation tool. It does not work system-wide. You cannot press a shortcut and dictate into Gmail. Instead, you record or import audio into MacWhisper, wait for transcription, and copy the result. This makes it excellent for transcribing podcasts, interviews, lectures, and recorded meetings. It is not suitable for real-time typing replacement.
The app is well-optimized for Apple Silicon. Transcribing a 30-minute podcast with the large model takes about 3-4 minutes on an M2 chip. The interface is clean, the export options are comprehensive, and the one-time price is reasonable.
Best for: Batch transcription of recordings. Podcast producers, journalists, researchers, and anyone who transcribes existing audio files. Content creators who need subtitles.
Honest limitation: Not a dictation tool. No system-wide integration. No real-time transcription. If you want to speak and have text appear in your current app, MacWhisper is the wrong tool.
Superwhisper
Superwhisper is a dictation app that runs Whisper locally on your Mac. It works system-wide via a keyboard shortcut. Press the shortcut, speak, release, and text appears in your current app. The differentiator is its AI modes: you can switch between "Dictation" (faithful transcription), "Writing" (AI rewrites your speech into polished text), and "Translation" (speak in one language, get text in another).
Superwhisper lets you choose which Whisper model to use and downloads models locally. Smaller models transcribe faster but are less accurate. Larger models are slower but handle complex vocabulary and accents better. You can switch models depending on your needs.
At $8/month, Superwhisper is the most expensive option for individual users. The price reflects the AI rewriting features, which go beyond what basic Whisper transcription offers.
Best for: Users who want AI-enhanced dictation with flexible models. People who frequently switch between languages. Those who want their spoken thoughts reshaped into polished prose automatically.
Honest limitation: More expensive than alternatives. No iPhone app. No profession-specific vocabulary. The AI rewrite modes can be unpredictable if you want your exact words preserved rather than a polished version.
Whisper.cpp CLI
Whisper.cpp is Georgi Gerganov's C/C++ port of Whisper. It runs on the command line, uses Metal for GPU acceleration on Apple Silicon, and is completely free. You install it via Homebrew (`brew install whisper-cpp`), download a model, and run it on audio files.
For developers and power users, Whisper.cpp is the most flexible option. You can script it, pipe it into other tools, run it on batches of files, and customize every parameter. It supports real-time streaming transcription via the stream example, though the user experience is bare-bones compared to a native app.
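A minimal batch pipeline looks something like the sketch below. The binary name and flags (`whisper-cli`, `-m`, `-f`, `-otxt`, `-of`) are assumptions based on common whisper.cpp builds; check `whisper-cli --help` on your install, as older builds shipped the binary as `main`.

```python
import subprocess
from pathlib import Path

def build_whisper_cmd(audio: Path, model: Path, out_dir: Path) -> list[str]:
    """Build a whisper.cpp command line for one audio file."""
    return [
        "whisper-cli",
        "-m", str(model),                  # path to the ggml model file
        "-f", str(audio),                  # input audio (16 kHz WAV works best)
        "-otxt",                           # write a plain-text transcript
        "-of", str(out_dir / audio.stem),  # output path, extension added by tool
    ]

def transcribe_folder(folder: Path, model: Path, out_dir: Path) -> None:
    """Transcribe every WAV file in a folder, one subprocess per file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for audio in sorted(folder.glob("*.wav")):
        subprocess.run(build_whisper_cmd(audio, model, out_dir), check=True)
```

This is exactly the kind of scripting the GUI apps cannot do: point it at a folder of interview recordings overnight and collect the transcripts in the morning.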
Whisper.cpp is also the fastest way to run Whisper on Mac. It uses Metal compute shaders to run inference on the GPU, which is faster than Core ML for some model sizes. On an M4 Pro, the large-v3 model processes 30 seconds of audio in about 1.5 seconds.
Best for: Developers who want maximum control. Batch transcription pipelines. People who are comfortable with the command line and want free, no-strings-attached Whisper transcription.
Honest limitation: No GUI. No system-wide integration. No AI cleanup. Not practical for daily dictation unless you build your own tooling around it. The learning curve for non-developers is steep.
On-Device vs Cloud Whisper
OpenAI also offers Whisper as a cloud API. You send audio to their servers and get a transcript back. This raises an important choice: run Whisper locally on your Mac, or use the cloud version?
On-device advantages
- Privacy: Audio never leaves your Mac. No third party hears or stores your voice recordings.
- Offline: Works without internet. Useful on planes, in remote locations, or when your connection is unreliable.
- No per-minute costs: Once you have the app, transcription is free. The cloud API charges $0.006 per minute. That adds up for heavy users.
- Lower latency for short dictation: For 5-15 second utterances, on-device processing is faster than the round trip to the cloud.
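The per-minute cost is easy to project. Using the $0.006/minute API rate cited above, a heavy dictation user quickly passes the price of any of the apps reviewed here:

```python
API_RATE_PER_MIN = 0.006  # OpenAI's per-minute Whisper API price cited above

def monthly_api_cost(minutes_per_day: float, days: int = 30) -> float:
    """Cloud transcription cost in dollars for a given daily usage."""
    return minutes_per_day * days * API_RATE_PER_MIN

# An hour of dictation per day:
# monthly_api_cost(60) -> $10.80/month, versus zero marginal cost on-device.
```

At 60 minutes a day, the cloud API costs more per month than Superwhisper's subscription and, within three months, more than MacWhisper's one-time price.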
Cloud API advantages
- Faster for large files: Transcribing a 2-hour recording is faster via the API because OpenAI runs it on powerful GPUs.
- Always uses the latest model: OpenAI updates their hosted model. Your local model stays at whatever version you downloaded.
- No local resources used: If your Mac is already under heavy load (rendering video, compiling code), the cloud API does not compete for CPU/GPU.
For most users, on-device Whisper is the right choice. The privacy benefit alone justifies it. Cloud Whisper makes sense for specific workflows like transcribing large backlogs of recorded audio. For more on offline transcription, see our offline voice-to-text guide.
Apple Silicon Performance
Whisper performance on Mac varies dramatically by chip. Here is what to expect for transcribing 30 seconds of audio:
| Chip | Tiny Model | Small Model | Medium Model | Large-v3 Model |
|---|---|---|---|---|
| M1 | 0.3s | 1.2s | 3.5s | 8s |
| M2 | 0.2s | 0.9s | 2.8s | 6s |
| M3 | 0.15s | 0.7s | 2.2s | 4.5s |
| M4 Pro | 0.1s | 0.4s | 1.3s | 2.5s |
| Intel (i7) | 0.8s | 4s | 12s | 30s+ |
These are approximate times using optimized implementations (Core ML or Metal). Python Whisper without optimization is 3-5x slower. The key takeaway: any Apple Silicon Mac handles Whisper comfortably. Even the base M1 runs the small model fast enough for real-time dictation. Intel Macs are usable with smaller models but struggle with medium and large.
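A useful way to read the table is as a real-time factor: seconds of audio processed per second of compute. Anything above 1.0 keeps up with live speech; comfortable dictation wants a healthy margin above that.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of compute.

    Values above 1.0 mean faster than real time.
    """
    return audio_seconds / processing_seconds

# From the table: an M1 runs the small model on 30 s of audio in 1.2 s.
# real_time_factor(30, 1.2) -> 25.0, far beyond what live dictation needs.
```

By the same arithmetic, an Intel i7 on the large model (30 s of audio in 30+ s) sits at or below 1.0, which is why large is impractical there.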
The Neural Engine on Apple Silicon is particularly important for Whisper performance. Apps that use Core ML (like Voice Keyboard Pro and MacWhisper) offload inference to the Neural Engine, which is specifically designed for machine learning workloads. This keeps the CPU and GPU free for other tasks and improves battery life compared to running Whisper on the GPU alone.
Which Whisper Model Should You Use?
Model choice matters more than app choice for transcription accuracy. Here is a practical guide:
- Tiny: Fast but prone to errors. Use it only for quick, casual dictation where accuracy does not matter much. Good for capturing rough ideas you will edit heavily.
- Base: Noticeably better than tiny. Handles clear, simple speech well. Struggles with accents, fast speech, and specialized vocabulary. A reasonable choice for everyday dictation on older hardware.
- Small: The sweet spot for most users. Good accuracy on standard vocabulary, fast enough for real-time dictation on any Apple Silicon Mac. This is what most dictation apps default to.
- Medium: Significantly better accuracy on difficult audio: accents, background noise, rapid speech, and unusual vocabulary. Slightly slower than small but still real-time on M1 and later. Good choice if you dictate in noisy environments or have a non-standard accent.
- Large-v3: Best accuracy, especially for non-English languages and heavily accented speech. Slower and uses more memory. Best for batch transcription where speed is not critical. Overkill for simple English dictation.
Beyond Whisper: What Apps Add on Top
Whisper gives you raw transcription. The apps that build on it add value in several ways:
Vocabulary customization
Whisper does not know your company's product names, your industry's jargon, or your colleagues' names. Apps like Voice Keyboard Pro let you add custom vocabulary so these terms transcribe correctly. You add "Vercel" to your vocabulary once, and it stops being transcribed as "herself" or "for cell." This is the single biggest accuracy improvement you can make beyond choosing a larger model.
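One way apps implement this is through Whisper's initial prompt: the model conditions on the prompt text, so listing your terms there nudges it toward those spellings. A minimal sketch, with the caveat that the "Glossary:" framing and the character cap are assumptions (Whisper only keeps roughly the last 224 tokens of a prompt), not how any specific app does it:

```python
def vocabulary_prompt(terms: list[str], max_chars: int = 600) -> str:
    """Pack custom vocabulary into an initial prompt for Whisper.

    Deduplicates while preserving order, then trims from the front so the
    most recently added terms survive the prompt-length limit.
    """
    prompt = "Glossary: " + ", ".join(dict.fromkeys(terms)) + "."
    return prompt[-max_chars:]
```

With whisper.cpp, a string like this can be passed via its `--prompt` flag; other frontends expose the same hook under names like "initial prompt".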
AI post-processing
Raw dictation includes filler words (um, uh, you know), false starts, and grammatical errors that come naturally when speaking. AI post-processing cleans these up. Voice Keyboard Pro offers actions like "clean up," "professional tone," "shorten," and "fix grammar." Superwhisper has writing modes that rewrite your dictation into polished prose. MacWhisper integrates with GPT for summaries and editing.
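In spirit, the simplest version of this cleanup is a pattern pass over the transcript. This is a toy sketch: the products above use a language model rather than a regex, so this only catches the obvious cases and will miss false starts entirely.

```python
import re

# Common fillers; word boundaries keep "um" from eating "umbrella".
FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b[,.]?\s*", re.IGNORECASE)

def strip_fillers(text: str) -> str:
    """Remove common filler words from a raw transcript."""
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

An LLM-based cleanup also recapitalizes, repairs grammar, and drops abandoned half-sentences, which no pattern list can do reliably.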
Context-aware formatting
When you dictate "three hundred dollars," should the app output "three hundred dollars" or "$300"? When you say "new line," should it type those words or insert a line break? Good apps handle this formatting intelligently based on context. For a deeper look at how Voice Keyboard Pro handles these technical details, see our under the hood article.
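A stripped-down version of that logic might look like the following. The trigger phrases and the handful of number rules are illustrative assumptions, not the actual rule set of any app reviewed here; real implementations also have to decide from context whether "new line" was a command or content.

```python
import re

def apply_spoken_commands(text: str) -> str:
    """Turn a few spoken commands and money phrases into formatting."""
    # Layout commands: swallow surrounding spaces so the break is clean.
    text = re.sub(r"\s*\bnew paragraph\b\s*", "\n\n", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*\bnew line\b\s*", "\n", text, flags=re.IGNORECASE)
    # A few spoken amounts -> currency symbols.
    amounts = {"one hundred": "100", "two hundred": "200", "three hundred": "300"}
    for words, digits in amounts.items():
        text = re.sub(rf"\b{words} dollars\b", f"${digits}",
                      text, flags=re.IGNORECASE)
    return text
```

Even this toy version shows why the problem is hard: every rule added is a phrase users can no longer dictate literally without an escape mechanism.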
Frequently Asked Questions
What is OpenAI Whisper?
Whisper is an open-source speech recognition model created by OpenAI, trained on 680,000 hours of multilingual audio. It can transcribe speech in over 90 languages and runs locally on your device. Because it is open-source, developers have built it into dozens of apps for Mac, Windows, and mobile.
Does Whisper run well on Apple Silicon Macs?
Yes. Apple Silicon Macs are one of the best platforms for running Whisper locally. The Neural Engine and unified memory architecture handle Whisper inference efficiently. Even the base M1 chip runs the small model faster than real-time. The M3 and M4 chips are even faster, making the large model practical for real-time use.
What is the difference between Whisper model sizes?
Whisper comes in five sizes from tiny (39M parameters) to large (1.5B parameters). Larger models are more accurate, especially on accents, background noise, and unusual vocabulary, but they are slower and use more memory. For most Mac users, the small or medium model offers the best balance of accuracy and speed.
Is Whisper transcription free?
The Whisper model itself is free and open-source. You can run it for free via Whisper.cpp on the command line. Mac apps that use Whisper charge for the app experience: user interface, AI features, and convenience. Prices range from $29 one-time (MacWhisper) to $4.99-8/month (Voice Keyboard Pro, Superwhisper).
Can Whisper transcribe audio in languages other than English?
Yes. Whisper supports over 90 languages and can auto-detect the spoken language. English, Spanish, French, German, and Mandarin have the best accuracy. Less common languages work but with lower accuracy. All Whisper-based Mac apps inherit this multilingual capability, though the experience varies by app.
Should I use on-device Whisper or OpenAI's cloud Whisper API?
On-device is better for most people. It keeps your audio private, works offline, and has no per-minute costs. The cloud API is faster for transcribing large batches of recorded audio and always uses the latest model version. For real-time dictation, on-device is the clear winner because there is no network latency.
Whisper is the engine. The app is the experience. Choose based on how you want to use transcription, not which one claims the best Whisper implementation.
Try Voice Keyboard Pro, a Whisper-powered dictation app for Mac and iPhone: voicekeyboardpro.com. On-device transcription with profession-aware vocabulary built on top of Whisper.