When we started building Voice Keyboard Pro, the default choice would have been Electron. It is what most desktop apps choose today. Write once in JavaScript, ship to macOS, Windows, and Linux from a single codebase. The tooling is mature, the ecosystem is massive, and the time-to-market is fast. We chose the opposite path: native Swift, one platform, no cross-platform layer. This post explains why that decision was not a constraint we accepted reluctantly but the enabling choice that made Voice Keyboard Pro possible.
The Electron Problem
Electron is a good framework for many applications. If you are building a chat client, a note-taking app, or a project management tool, Electron gives you a reasonable UI with broad platform support. But Electron ships a full Chromium browser runtime with every application. That runtime has a cost: 150 to 300 MB of baseline RAM, a multi-second launch time, a JavaScript event loop that adds latency to every system interaction, and a security sandbox that blocks direct access to most operating system APIs.
For a text editor or a chat app, these costs are tolerable. For a dictation app, they are disqualifying. A dictation app needs to capture audio in real time, run speech recognition models on dedicated hardware accelerators, detect global hotkeys with sub-millisecond precision, and insert text into any application on the system through deep OS integration. Each of these requirements hits a wall in Electron that native development walks through.
Why Native Matters for Dictation
Latency is the product
When you release the hotkey after speaking, you expect text to appear immediately. Not in a second. Not after a loading spinner. Immediately. The perceived quality of a dictation app is almost entirely determined by the gap between "I stopped talking" and "I see text." Every millisecond in that gap erodes trust.
In Voice Keyboard Pro's native pipeline, the time from hotkey release to audio engine stop is under 2 milliseconds. Audio-to-transcription on the Neural Engine takes 300 to 450 milliseconds for a typical sentence. Text insertion via the Accessibility API takes under 5 milliseconds. Total end-to-end: under 500 milliseconds. You cannot achieve this through a JavaScript event loop, IPC bridges, and Web Audio API abstractions. The overhead of those layers alone would exceed our entire latency budget.
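To make a latency budget like this concrete, here is a minimal sketch of how stage timings can be instrumented in Swift. The stage names, targets, and structure are illustrative assumptions, not Voice Keyboard Pro's actual code.

```swift
import Foundation

// Illustrative only: a tiny stage timer for an end-to-end latency budget.
struct LatencyTrace {
    private var marks: [(label: String, time: DispatchTime)] = []

    mutating func mark(_ label: String) {
        marks.append((label, DispatchTime.now()))
    }

    func report() {
        guard let start = marks.first else { return }
        for m in marks.dropFirst() {
            let ms = Double(m.time.uptimeNanoseconds - start.time.uptimeNanoseconds) / 1_000_000
            print("\(m.label): \(String(format: "%.1f", ms)) ms since \(start.label)")
        }
    }
}

// Hypothetical usage at each stage boundary:
// var trace = LatencyTrace()
// trace.mark("hotkeyReleased")
// ... stop audio engine ...      trace.mark("audioStopped")   // target: < 2 ms
// ... run transcription ...      trace.mark("transcribed")    // target: 300-450 ms
// ... insert text ...            trace.mark("textInserted")   // total: < 500 ms
```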
System integration is not optional
Voice Keyboard Pro uses four macOS system APIs that are either unavailable or severely limited in Electron:
- CGEvent taps for global hotkey detection. This low-level API intercepts keyboard events system-wide. It is what makes hold-to-speak work regardless of which application is in the foreground. Electron apps can use global shortcuts, but they cannot implement the precise press-and-release detection that hold-to-speak requires.
- The Accessibility API for text insertion. Voice Keyboard Pro injects text directly at the cursor position in any application without touching the clipboard. This requires querying the focused UI element hierarchy and programmatically setting its value. The Accessibility API is a native C/Objective-C API with no JavaScript binding.
- AVAudioEngine for real-time audio capture. Native audio provides direct access to the hardware input with controlled buffer sizes and sample rates. The Web Audio API available in Electron adds abstraction layers designed for browser sandboxes, not real-time dictation.
- Core ML and the Neural Engine for on-device speech recognition. Voice Keyboard Pro runs Whisper models locally on the Neural Engine, Apple's dedicated ML accelerator. Core ML is a native framework with no web equivalent. An Electron app would need to shell out to a separate process for inference, adding IPC latency and architectural complexity.
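The first of these, a CGEvent tap with distinct press and release handling, can be sketched as follows. The keycode and handler bodies are illustrative assumptions, not the app's actual implementation; an active tap also requires the user to grant Accessibility permission.

```swift
import CoreGraphics
import Foundation

// Listen for key-down and key-up so press and release are separate events.
let mask = (1 << CGEventType.keyDown.rawValue) | (1 << CGEventType.keyUp.rawValue)

let tap = CGEvent.tapCreate(
    tap: .cgSessionEventTap,      // session-wide: sees keystrokes for every app
    place: .headInsertEventTap,
    options: .defaultTap,         // active tap: may modify or swallow events
    eventsOfInterest: CGEventMask(mask),
    callback: { _, type, event, _ in
        // 96 is F5 on many keyboards; an illustrative hotkey, not the app's.
        if event.getIntegerValueField(.keyboardEventKeycode) == 96 {
            if type == .keyDown { /* press: start audio capture */ }
            if type == .keyUp   { /* release: stop capture, transcribe */ }
            return nil            // swallow the hotkey so other apps never see it
        }
        return Unmanaged.passUnretained(event)  // pass all other keys through
    },
    userInfo: nil
)

if let tap {
    let source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tap, 0)
    CFRunLoopAddSource(CFRunLoopGetCurrent(), source, .commonModes)
    CGEvent.tapEnable(tap: tap, enable: true)
    CFRunLoopRun()  // blocks; delivery requires Accessibility permission
}
```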
You could theoretically bridge each of these from Electron through native Node modules. Some apps do. But each bridge is a fragile layer that adds latency, increases crash surface, and requires maintenance as Apple updates its APIs. For a dictation app where all four capabilities are exercised on every single use, the compound fragility of four native bridges is not a reasonable engineering choice.
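For comparison, the Accessibility insertion path looks roughly like this in native Swift. This is a hedged sketch under simplifying assumptions: the helper is hypothetical, and production code must handle applications that reject the attribute write.

```swift
import ApplicationServices

// Hypothetical helper; not Voice Keyboard Pro's actual insertion path.
func insertAtCursor(_ text: String) {
    // Ask the system-wide accessibility object which element has focus.
    let systemWide = AXUIElementCreateSystemWide()
    var focused: CFTypeRef?
    guard AXUIElementCopyAttributeValue(
        systemWide, kAXFocusedUIElementAttribute as CFString, &focused
    ) == .success, let focused else { return }
    let element = focused as! AXUIElement

    // Writing kAXSelectedTextAttribute replaces the current selection, or
    // inserts at the caret when nothing is selected. Some apps reject this
    // write, so real code needs fallbacks (e.g. synthesized key events).
    AXUIElementSetAttributeValue(
        element, kAXSelectedTextAttribute as CFString, text as CFString)
}
```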
Swift and SwiftUI
Voice Keyboard Pro is written entirely in Swift. The UI layer uses a combination of SwiftUI for settings panels, history views, and configuration interfaces, and AppKit for the menu bar item, global event monitoring, and window management. SwiftUI's declarative approach keeps the UI code lean. AppKit provides the low-level system access that a menu bar utility requires.
Swift itself matters for performance-sensitive code paths. The audio pipeline, the VAD calculation, the Accessibility API interaction, and the Core ML inference scheduling are all written in Swift with no garbage collection pauses, no JIT compilation variability, and deterministic memory management through ARC. For code that runs on a real-time audio thread, deterministic execution is not a luxury. It is a requirement.
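As a rough illustration of that audio path, here is a minimal AVAudioEngine capture tap with a naive energy-based VAD. The buffer size and threshold are illustrative assumptions; the actual pipeline is more involved than a single RMS check.

```swift
import AVFoundation

let engine = AVAudioEngine()
let input = engine.inputNode
let format = input.inputFormat(forBus: 0)

input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
    // This closure runs on a real-time audio thread: in production code,
    // no heap allocation, no locks, no blocking work belongs here.
    guard let samples = buffer.floatChannelData?[0] else { return }
    let n = Int(buffer.frameLength)
    var sum: Float = 0
    for i in 0..<n { sum += samples[i] * samples[i] }
    let rms = (sum / Float(max(n, 1))).squareRoot()
    if rms > 0.01 {  // illustrative threshold, not a tuned value
        // speech detected: hand the buffer to the recognition front end
    }
}

do { try engine.start() } catch { print("audio engine failed: \(error)") }
// A real app keeps the process alive (run loop) while the tap is installed.
```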
The language also matters for long-term maintenance. Swift's type system catches entire categories of bugs at compile time that would be runtime errors in JavaScript. For a background app that runs all day and handles audio, this reduces the frequency of subtle crashes and state corruption that are difficult to debug in production.
Privacy by Architecture
Privacy in Voice Keyboard Pro is not a policy promise. It is an architectural fact. Because the speech recognition model runs locally on the Neural Engine, audio never leaves your Mac. There is no server to receive it, no API endpoint to send it to, no intermediate storage to compromise. If you wiretapped Voice Keyboard Pro's network traffic, you would see app update checks and nothing else during basic dictation.
This architecture is only possible because of native development. Running a Whisper model efficiently on-device requires Core ML and Neural Engine access. A web-based or Electron-based app would need to send audio to a cloud API for transcription, which means your spoken words travel across the internet, get processed on someone else's server, and are stored (at least transiently) in someone else's infrastructure. The entire privacy model of Voice Keyboard Pro depends on capabilities that only native development provides.
Performance: 30 MB vs 200 MB
Voice Keyboard Pro idles at roughly 28 to 32 MB of RAM. A comparable Electron app starts at 150 to 200 MB before doing anything useful, because it must load Chromium's renderer, V8 JavaScript engine, and Node.js runtime. For a menu bar utility that runs from login to shutdown, the difference between 30 MB and 200 MB is significant. It is RAM that your browser, your IDE, and your other tools cannot use.
CPU usage at idle is effectively zero. There is no JavaScript event loop running idle timers, no garbage collector waking periodically, no Chromium render loop painting a UI that nobody is looking at. macOS can fully deprioritize the process. It contributes nothing to energy impact. Users consistently tell us they forget Voice Keyboard Pro is running, which is exactly what a background tool should achieve.
During active dictation, CPU usage is minimal because the heavy computation (speech recognition) runs on the Neural Engine, not the CPU. The audio pipeline runs on a high-priority Core Audio thread with less than 2% CPU utilization. The combination means you can dictate while running a full build, a video call, and a dozen browser tabs without any of them competing for resources.
The Menu Bar Philosophy
Voice Keyboard Pro has no dock icon and no main window. It lives as a small icon in the menu bar. When you click it, a popover drops down showing your recent dictations, statistics, and settings. When you close the popover, the app disappears from view entirely. The only visual evidence of its existence is the menu bar icon and the text that appears at your cursor when you dictate.
This design philosophy is only achievable with native development. NSStatusItem is a native AppKit class that creates a properly behaving menu bar item. It follows system conventions for positioning, appearance, and interaction. It respects the user's menu bar density settings. It responds correctly to dark mode, accent color changes, and accessibility settings like increased contrast and reduced transparency. An Electron menu bar app can approximate this, but the small deviations from native behavior (a slightly wrong hover state, a popover that does not dismiss correctly, a click target that is a pixel off) accumulate into the feeling that the app does not quite belong.
For a tool that runs all day, "does not quite belong" is a slow-acting poison. Users eventually replace it with something that feels right. Native development ensures Voice Keyboard Pro feels right from the first interaction.
Apple Silicon Optimization
Apple began its transition to Apple Silicon in late 2020, and every Mac in the current lineup now runs on it. The M1, M2, M3, and M4 chips share an architecture that includes unified memory, a high-performance GPU, and the Neural Engine. Voice Keyboard Pro is built to exploit this architecture specifically.
The Whisper model runs on the Neural Engine through Core ML. The audio pipeline runs on the efficiency cores through Core Audio. The UI renders through SwiftUI's native Metal-backed compositor. There is no abstraction mismatch between what the silicon provides and what the app uses. Every frame of audio, every inference pass, and every pixel of UI follows the path that Apple designed for maximum efficiency on their hardware.
This optimization is not possible through a compatibility layer. Electron apps run on V8, which runs on the CPU cores. They access audio through Web APIs, which run through additional kernel abstraction. They render through Skia (Chromium's rendering engine), which does not use the same compositor pipeline as native SwiftUI views. At every level of the stack, a cross-platform layer adds translation overhead between what the app wants to do and what the hardware can do. Native development eliminates that translation entirely.
Why Mac First
We chose to build for Mac first because Mac users are disproportionately knowledge workers who produce large volumes of text daily: developers, writers, designers, researchers, and business professionals who rely on their Mac as their primary work machine. These are the people who benefit most from voice-to-text and who are most willing to adopt a tool that saves them meaningful time.
The Mac also provides the best technical foundation for on-device dictation. Apple Silicon's Neural Engine is the most efficient consumer ML accelerator available. macOS's Accessibility API is the most capable system for application-level text insertion. Core Audio is a mature, battle-tested real-time audio framework. The platform provides everything a dictation app needs, and it provides it at a quality level that other platforms do not yet match.
The iPhone Extension
Voice Keyboard Pro extends to iPhone with a keyboard extension that brings the same voice-to-text experience to iOS. The keyboard runs Whisper models on the iPhone's Neural Engine, uses the same profession-aware vocabulary, and maintains the same privacy model: all audio is processed on-device.
Building the iPhone keyboard was possible precisely because we went native. The same Core ML models that run on the Mac's Neural Engine run on the iPhone's A-series or M-series chip. The same Swift code that handles audio processing, VAD, and post-processing compiles for both platforms with minimal modification. A shared foundation of native code gave us an iPhone keyboard without building a second app from scratch.
If we had built Voice Keyboard Pro in Electron, there would be no path to an iPhone keyboard. Electron does not run on iOS. We would have needed an entirely separate technology stack for mobile, with no code sharing, no model sharing, and no architectural consistency. Going native on Apple's platforms gave us both Mac and iPhone from a unified engineering investment.
The Cost of Going Native
We are not pretending there is no tradeoff. Going native means Voice Keyboard Pro runs on macOS and only macOS. There is no Windows version. There is no Linux version. A significant number of potential users are on platforms we do not serve. We accept this deliberately.
A dictation app that runs all day, captures audio from the hardware microphone, runs ML models on a dedicated accelerator, and inserts text into other applications through system-level APIs is deeply coupled to the operating system. Building it natively for one platform produces an experience that feels like it belongs. Building it cross-platform would mean compromises in latency, memory, integration, and privacy that directly undermine the core value proposition.
If demand warrants expansion, we will consider native builds for other platforms. Each would be built natively for that platform, using its native audio frameworks, its native accessibility APIs, and its native ML acceleration. Not ported through a compatibility layer. The lesson of Voice Keyboard Pro is that the quality of a background tool depends entirely on the quality of its integration with the host system. You cannot fake that through abstraction.
Frequently Asked Questions
Why doesn't Voice Keyboard Pro use Electron like most desktop apps?
Electron bundles a full Chromium browser, which consumes 150 to 300 MB of RAM at baseline. For a menu bar dictation app that runs all day, this overhead is unacceptable. Voice Keyboard Pro uses 30 MB. More critically, Electron cannot access macOS system APIs directly. The Accessibility API for text insertion, CGEvent taps for global hotkeys, and the Neural Engine for on-device AI inference all require native frameworks that Electron cannot reach without fragile bridging layers.
Will Voice Keyboard Pro come to Windows or Linux?
Voice Keyboard Pro is currently macOS-only. If we build for other platforms, each version will be native to that platform, not a cross-platform port. A dictation app is deeply integrated with the operating system's audio, accessibility, and input systems, and these differ fundamentally across platforms. A single codebase cannot serve all three well.
Does being native mean Voice Keyboard Pro is faster than cloud-based dictation?
For the parts that happen on your machine, yes. Audio capture starts in under 15 milliseconds. Speech recognition runs on the Neural Engine without network latency. Text insertion via the Accessibility API takes under 5 milliseconds. Cloud dictation tools add 200 to 500+ milliseconds of network round-trip on top of their processing time. Voice Keyboard Pro's total end-to-end latency for a typical sentence is under 500 milliseconds.
What is the Apple Neural Engine and how does Voice Keyboard Pro use it?
The Neural Engine is a dedicated machine learning accelerator built into every Apple Silicon chip (M1, M2, M3, M4, and A-series). It is designed specifically for the matrix operations that AI models require, and it runs them at a fraction of the power draw of GPU or CPU inference. Voice Keyboard Pro runs Whisper speech recognition models on the Neural Engine via Core ML, enabling fast, efficient, on-device transcription without draining your battery.
Try Voice Keyboard Pro
Voice Keyboard Pro is what happens when you build a dictation app the way Apple would build it: native Swift, minimal memory, maximum integration, privacy by default. If you have been waiting for a voice-to-text tool that feels like it belongs on your Mac rather than running inside a browser pretending to be an app, download Voice Keyboard Pro and see the difference native makes. It is free to start, and you will feel it in the first dictation.