All posts

OpenAI Whisper changed speech recognition when it launched in 2022. For the first time, a transcription model that rivaled commercial products was free, open source, and able to run on your own hardware. Four years later, Whisper remains the foundation of most serious transcription tools on the Mac. But running raw Whisper from the command line is not practical for everyday use. You want an app.

This guide compares the best Mac apps built on Whisper: what they add on top of the base model, how fast they run on Apple Silicon, and which one makes sense for different use cases. We tested Voice Keyboard Pro, MacWhisper, Superwhisper, and Whisper.cpp CLI on an M2 MacBook Air and an M4 Pro MacBook Pro.

What Is OpenAI Whisper and Why It Matters

Whisper is a neural network trained on 680,000 hours of multilingual audio from the internet. It can transcribe speech in over 90 languages, translate between languages, and handle noisy audio far better than previous open-source models. OpenAI released it under the MIT license, which means anyone can use it, modify it, and build products on top of it.

Before Whisper, accurate speech recognition required either Apple's Dictation (decent but limited), Google's cloud API (accurate but sends audio to Google), or Dragon NaturallySpeaking (expensive and Windows-focused). Whisper gave developers an accurate model that runs locally, for free, on any hardware.

The model comes in five sizes:

| Model | Parameters | Size on Disk | Relative Speed | Relative Accuracy |
|---|---|---|---|---|
| Tiny | 39M | ~75 MB | Fastest | Good for simple speech |
| Base | 74M | ~140 MB | Fast | Good |
| Small | 244M | ~460 MB | Moderate | Very good |
| Medium | 769M | ~1.5 GB | Slower | Excellent |
| Large-v3 | 1.5B | ~3 GB | Slowest | Best |

Larger models are more accurate but take longer to process. On modern Apple Silicon Macs, even the large model runs comfortably. On older Intel Macs, you will want to stick with small or medium.

Why You Need an App (Not Just the Model)

You can run Whisper directly from the command line using Python or Whisper.cpp. But raw Whisper is a batch transcription tool: you give it an audio file, wait, and get text back. That is useful for transcribing recordings but useless for real-time dictation.

A good Whisper app adds:

- Real-time dictation, so text appears as you speak instead of after a batch job
- System-wide integration via a keyboard shortcut, so you can dictate into any app
- Model management: downloading, updating, and switching between Whisper model sizes
- Custom vocabulary and prompting for names and jargon Whisper does not know
- AI post-processing to clean up filler words, grammar, and tone

The app is where the user experience happens. The Whisper model is the engine, but the app is the car.

App Comparison

| Feature | Voice Keyboard Pro | MacWhisper | Superwhisper | Whisper.cpp CLI |
|---|---|---|---|---|
| Price | Free / $4.99/mo | Free / $29 once | $8/mo | Free |
| Real-time dictation | Yes | No (batch only) | Yes | No |
| System-wide | Yes | No (own app) | Yes | No |
| Apple Silicon optimized | Yes (Core ML) | Yes (Core ML) | Yes | Yes (Metal) |
| AI cleanup | 7 actions | GPT integration | AI rewrite modes | No |
| Custom vocabulary | Yes | No | No | Via prompt |
| iPhone app | Yes | No | No | No |
| Offline | Yes | Yes | Yes | Yes |
| Best for | Daily dictation | Batch transcription | Quick dictation | Developers/scripts |

Detailed Reviews

Voice Keyboard Pro

Voice Keyboard Pro uses Whisper as its transcription engine but adds several layers on top. The most significant is profession-aware vocabulary. Whisper's base model handles everyday English well but struggles with specialized terminology. Voice Keyboard Pro detects your profession and tunes the transcription pipeline to handle domain-specific terms: medical terminology for doctors, legal terms for lawyers, technical jargon for developers. This happens through a combination of custom Whisper prompting and post-processing.

Voice Keyboard Pro runs Whisper via Core ML on Apple Silicon, which takes advantage of the Neural Engine for inference. On an M2 MacBook Air, transcription latency for a 10-second dictation is under 1 second. On M4 Pro, it is nearly instantaneous. The app works system-wide: press a keyboard shortcut, speak, and text appears in whatever app has focus.

Voice Keyboard Pro also offers AI actions after transcription: clean up filler words, fix grammar, change tone, shorten, or translate. These use a language model, not Whisper, but the integration is seamless. Dictate a rough draft, tap a button, get polished text.

The free tier includes basic dictation. The $4.99/month plan adds AI actions, custom vocabulary, and longer dictation sessions.

Best for: Daily dictation across all apps. Professionals who need specialized vocabulary. People who want one tool for both Mac and iPhone. For a broader comparison including non-Whisper tools, see our best dictation app for Mac guide.

Honest limitation: The profession-aware features require the paid plan. If you just want basic Whisper transcription without the extras, MacWhisper or Whisper.cpp are cheaper.

MacWhisper

MacWhisper is a straightforward Mac app for batch transcription using Whisper. You drop in an audio file, or record directly in the app, and it transcribes using your choice of Whisper model. The free version uses the tiny and base models. The Pro version ($29 one-time) unlocks all model sizes and adds features like GPT-powered summaries, translation, and export to SRT subtitles.

MacWhisper is not a dictation tool. It does not work system-wide. You cannot press a shortcut and dictate into Gmail. Instead, you record or import audio into MacWhisper, wait for transcription, and copy the result. This makes it excellent for transcribing podcasts, interviews, lectures, and recorded meetings. It is not suitable for real-time typing replacement.

The app is well-optimized for Apple Silicon. Transcribing a 30-minute podcast with the large model takes about 3-4 minutes on an M2 chip. The interface is clean, the export options are comprehensive, and the one-time price is reasonable.

Best for: Batch transcription of recordings. Podcast producers, journalists, researchers, and anyone who transcribes existing audio files. Content creators who need subtitles.

Honest limitation: Not a dictation tool. No system-wide integration. No real-time transcription. If you want to speak and have text appear in your current app, MacWhisper is the wrong tool.

Superwhisper

Superwhisper is a dictation app that runs Whisper locally on your Mac. It works system-wide via a keyboard shortcut. Press the shortcut, speak, release, and text appears in your current app. The differentiator is its AI modes: you can switch between "Dictation" (faithful transcription), "Writing" (AI rewrites your speech into polished text), and "Translation" (speak in one language, get text in another).

Superwhisper lets you choose which Whisper model to use and downloads models locally. Smaller models transcribe faster but are less accurate. Larger models are slower but handle complex vocabulary and accents better. You can switch models depending on your needs.

At $8/month, Superwhisper is the most expensive option for individual users. The price reflects the AI rewriting features, which go beyond what basic Whisper transcription offers.

Best for: Users who want AI-enhanced dictation with flexible models. People who frequently switch between languages. Those who want their spoken thoughts reshaped into polished prose automatically.

Honest limitation: More expensive than alternatives. No iPhone app. No profession-specific vocabulary. The AI rewrite modes can be unpredictable if you want your exact words preserved rather than a polished version.

Whisper.cpp CLI

Whisper.cpp is Georgi Gerganov's C/C++ port of Whisper. It runs on the command line, uses Metal for GPU acceleration on Apple Silicon, and is completely free. You install it via Homebrew (brew install whisper-cpp), download a model, and run it on audio files.

For developers and power users, Whisper.cpp is the most flexible option. You can script it, pipe it into other tools, run it on batches of files, and customize every parameter. It supports real-time streaming transcription via the stream example, though the user experience is bare-bones compared to a native app.
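Because Whisper.cpp is just a binary, a batch pipeline is a few lines of scripting. Here is a minimal sketch in Python; the binary name (`whisper-cli` in recent Homebrew builds, `main` in older checkouts) and the model path are assumptions you should adjust for your install.

```python
import subprocess
from pathlib import Path

def build_whisper_commands(audio_dir, model_path, binary="whisper-cli"):
    """Build one whisper.cpp invocation per WAV file in audio_dir.

    -m selects the ggml model file, -f the input audio, and -otxt
    writes a .txt transcript next to each audio file.
    """
    commands = []
    for audio in sorted(Path(audio_dir).glob("*.wav")):
        commands.append(
            [binary, "-m", str(model_path), "-f", str(audio), "-otxt"]
        )
    return commands

def transcribe_all(audio_dir, model_path):
    # Run whisper.cpp on each file in turn; fail loudly on errors.
    for cmd in build_whisper_commands(audio_dir, model_path):
        subprocess.run(cmd, check=True)
```

From here you can parallelize across files, pipe the `.txt` output into other tools, or swap models per file.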

Whisper.cpp is also the fastest way to run Whisper on Mac. It uses Metal compute shaders to run inference on the GPU, which is faster than Core ML for some model sizes. On an M4 Pro, the large-v3 model processes 30 seconds of audio in about 1.5 seconds.

Best for: Developers who want maximum control. Batch transcription pipelines. People who are comfortable with the command line and want free, no-strings-attached Whisper transcription.

Honest limitation: No GUI. No system-wide integration. No AI cleanup. Not practical for daily dictation unless you build your own tooling around it. The learning curve for non-developers is steep.

On-Device vs Cloud Whisper

OpenAI also offers Whisper as a cloud API. You send audio to their servers and get a transcript back. This raises an important choice: run Whisper locally on your Mac, or use the cloud version?

On-device advantages

- Privacy: your audio never leaves your Mac
- Works offline, with no network latency
- No per-minute costs, no matter how much you transcribe

Cloud API advantages

- No local compute or model downloads required
- Faster for transcribing large batches of recorded audio
- Always runs the latest model version

For most users, on-device Whisper is the right choice. The privacy benefit alone justifies it. Cloud Whisper makes sense for specific workflows like transcribing large backlogs of recorded audio. For more on offline transcription, see our offline voice-to-text guide.
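The cost tradeoff is easy to estimate. A rough sketch, assuming a per-minute API rate of $0.006 purely as an example (check the provider's current pricing page):

```python
def cloud_cost_dollars(audio_minutes, rate_per_minute=0.006):
    """Estimated cloud transcription cost; the rate is an assumed example."""
    return audio_minutes * rate_per_minute

def break_even_hours(one_time_app_price, rate_per_minute=0.006):
    """Hours of audio at which a one-time app purchase beats the cloud API."""
    return one_time_app_price / rate_per_minute / 60
```

At the example rate, a 10-hour interview backlog costs a few dollars in the cloud, and a $29 one-time app pays for itself after roughly 80 hours of transcribed audio.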

Apple Silicon Performance

Whisper performance on Mac varies dramatically by chip. Here is what to expect for transcribing 30 seconds of audio:

| Chip | Tiny Model | Small Model | Medium Model | Large-v3 Model |
|---|---|---|---|---|
| M1 | 0.3s | 1.2s | 3.5s | 8s |
| M2 | 0.2s | 0.9s | 2.8s | 6s |
| M3 | 0.15s | 0.7s | 2.2s | 4.5s |
| M4 Pro | 0.1s | 0.4s | 1.3s | 2.5s |
| Intel (i7) | 0.8s | 4s | 12s | 30s+ |

These are approximate times using optimized implementations (Core ML or Metal). Python Whisper without optimization is 3-5x slower. The key takeaway: any Apple Silicon Mac handles Whisper comfortably. Even the base M1 runs the small model fast enough for real-time dictation. Intel Macs are usable with smaller models but struggle with medium and large.
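A useful way to read those numbers is the real-time factor: processing time divided by audio duration. Anything below 1.0 is faster than real time, and live dictation wants comfortable headroom below that. A minimal sketch:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1.0 means the model transcribes faster than the audio plays."""
    return processing_seconds / audio_seconds

# From the table: an M1 takes 1.2s on the small model for 30s of audio,
# an RTF of about 0.04 (25x faster than real time). An Intel i7 on
# large-v3 takes 30s+ for the same clip: RTF >= 1.0, too slow for
# live dictation.
```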

The Neural Engine on Apple Silicon is particularly important for Whisper performance. Apps that use Core ML (like Voice Keyboard Pro and MacWhisper) offload inference to the Neural Engine, which is specifically designed for machine learning workloads. This keeps the CPU and GPU free for other tasks and improves battery life compared to running Whisper on the GPU alone.

Which Whisper Model Should You Use?

Model choice matters more than app choice for transcription accuracy. Here is a practical guide:

- Tiny or Base: quick notes and simple speech, or older Intel hardware
- Small: the best balance of speed and accuracy for real-time dictation on most Macs
- Medium: when accuracy matters more than speed, such as heavy accents or dense technical vocabulary
- Large-v3: batch transcription of difficult audio (noise, accents, specialized terms) on Apple Silicon

Beyond Whisper: What Apps Add on Top

Whisper gives you raw transcription. The apps that build on it add value in several ways:

Vocabulary customization

Whisper does not know your company's product names, your industry's jargon, or your colleagues' names. Apps like Voice Keyboard Pro let you add custom vocabulary so these terms transcribe correctly. You add "Vercel" to your vocabulary once, and it stops being transcribed as "herself" or "for cell." This is the single biggest accuracy improvement you can make beyond choosing a larger model.
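Under the hood, this kind of biasing is often done through Whisper's prompt: the model conditions on text passed as `initial_prompt`, so terms that appear there are far more likely to be transcribed verbatim. A minimal sketch of a vocabulary-to-prompt helper (the character cap is a rough stand-in for the reference implementation's 224-token prompt window):

```python
def vocabulary_prompt(terms, max_chars=800):
    """Join custom terms into a biasing prompt for Whisper's initial_prompt."""
    prompt = "Glossary: " + ", ".join(terms) + "."
    return prompt[:max_chars]

# Example with the openai-whisper Python package:
# result = model.transcribe(
#     "meeting.wav",
#     initial_prompt=vocabulary_prompt(["Vercel", "kubectl", "Dr. Okafor"]),
# )
```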

AI post-processing

Raw dictation includes filler words (um, uh, you know), false starts, and grammatical errors that come naturally when speaking. AI post-processing cleans these up. Voice Keyboard Pro offers actions like "clean up," "professional tone," "shorten," and "fix grammar." Superwhisper has writing modes that rewrite your dictation into polished prose. MacWhisper integrates with GPT for summaries and editing.
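The simplest layer of this cleanup is deterministic: common fillers follow predictable patterns and can be stripped before any language model gets involved. A toy sketch (real apps also handle false starts and repeated words):

```python
import re

# Common English fillers, matched as whole words; a trailing comma and
# whitespace are swallowed so the sentence closes up cleanly.
FILLER_RE = re.compile(r"\b(?:um+|uh+|er+|you know)\b,?\s*", re.IGNORECASE)

def remove_fillers(text: str) -> str:
    cleaned = FILLER_RE.sub("", text)
    # Collapse any double spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```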

Context-aware formatting

When you dictate "three hundred dollars," should the app output "three hundred dollars" or "$300"? When you say "new line," should it type those words or insert a line break? Good apps handle this formatting intelligently based on context. For a deeper look at how Voice Keyboard Pro handles these technical details, see our under the hood article.
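Both cases boil down to inverse text normalization: mapping spoken forms to written ones. A toy sketch covering just the two examples above (production implementations use large rule sets or a dedicated model):

```python
import re

SPOKEN_AMOUNTS = {  # tiny illustrative table; real ITN parses arbitrary numbers
    "one hundred": 100,
    "two hundred": 200,
    "three hundred": 300,
}

def apply_spoken_formatting(text: str) -> str:
    # Spoken command -> structure: "new line" becomes an actual line break.
    text = re.sub(r"\bnew line\b[,.]?\s*", "\n", text, flags=re.IGNORECASE)
    # Spoken amount -> symbol: "three hundred dollars" becomes "$300".
    for words, value in SPOKEN_AMOUNTS.items():
        text = re.sub(rf"\b{words} dollars\b", f"${value}", text,
                      flags=re.IGNORECASE)
    return text
```

A real dictation app also has to decide when not to apply these rules, for example when you are quoting someone who literally said "new line".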

Frequently Asked Questions

What is OpenAI Whisper?

Whisper is an open-source speech recognition model created by OpenAI, trained on 680,000 hours of multilingual audio. It can transcribe speech in over 90 languages and runs locally on your device. Because it is open-source, developers have built it into dozens of apps for Mac, Windows, and mobile.

Does Whisper run well on Apple Silicon Macs?

Yes. Apple Silicon Macs are one of the best platforms for running Whisper locally. The Neural Engine and unified memory architecture handle Whisper inference efficiently. Even the base M1 chip runs the small model faster than real-time. The M3 and M4 chips are even faster, making the large model practical for real-time use.

What is the difference between Whisper model sizes?

Whisper comes in five sizes from tiny (39M parameters) to large (1.5B parameters). Larger models are more accurate, especially on accents, background noise, and unusual vocabulary, but they are slower and use more memory. For most Mac users, the small or medium model offers the best balance of accuracy and speed.

Is Whisper transcription free?

The Whisper model itself is free and open-source. You can run it for free via Whisper.cpp on the command line. Mac apps that use Whisper charge for the app experience: user interface, AI features, and convenience. Prices range from $29 one-time (MacWhisper) to $4.99-8/month (Voice Keyboard Pro, Superwhisper).

Can Whisper transcribe audio in languages other than English?

Yes. Whisper supports over 90 languages and can auto-detect the spoken language. English, Spanish, French, German, and Mandarin have the best accuracy. Less common languages work but with lower accuracy. All Whisper-based Mac apps inherit this multilingual capability, though the experience varies by app.

Should I use on-device Whisper or OpenAI's cloud Whisper API?

On-device is better for most people. It keeps your audio private, works offline, and has no per-minute costs. The cloud API is faster for transcribing large batches of recorded audio and always uses the latest model version. For real-time dictation, on-device is the clear winner because there is no network latency.

Whisper is the engine. The app is the experience. Choose based on how you want to use transcription, not which one claims the best Whisper implementation.

Try Voice Keyboard Pro, a Whisper-powered dictation app for Mac and iPhone: voicekeyboardpro.com. On-device transcription with profession-aware vocabulary built on top of Whisper.