Short answer: the dictation software that works in Citrix, VDI, and other remote desktops is the kind that runs locally and types text as keystrokes, not the kind that needs your microphone inside the session. Dictate on your Mac, and the finished text flows into the remote window exactly like normal typing.
If you have ever tried to dictate inside a Citrix Workspace session, a VMware Horizon desktop, or a Windows 365 Cloud PC and watched it fail silently, you are not doing anything wrong. You have hit one of the oldest, least-documented limitations in remote work: audio almost never survives the trip into a virtual desktop, and the dictation tools that live inside that desktop have nothing to listen to.
This guide explains exactly why that happens, why the usual fixes do not hold up, and the one architecture that sidesteps the entire problem. The short version is that you should stop trying to make the remote session hear you, and instead make your local machine do the listening. Once you understand that distinction, dictating into Citrix and VDI becomes reliable rather than a daily gamble.
Why dictation breaks inside Citrix and VDI
A virtual desktop is not your computer. When you connect through Citrix, VMware Horizon, Microsoft Remote Desktop, the Windows App, Parallels, or a Cloud PC, what you see is a stream of pixels rendered on a server somewhere else. Your keyboard and mouse events are captured locally and forwarded over the connection protocol (ICA/HDX for Citrix, Blast or PCoIP for Horizon, RDP for Microsoft). The remote machine treats those forwarded events as if a keyboard and mouse were plugged directly into it.
Audio is a completely different story. For your microphone to reach a dictation program running inside the session, the remote client has to capture the mic locally, encode it, redirect it across the network, and hand it to a virtual audio device on the server. That chain is fragile for three reasons:
- It is frequently disabled. Microphone redirection is a security and bandwidth setting that IT administrators often turn off by default. In regulated industries, audio paths into the data center are locked down deliberately.
- It is latency-sensitive. Even when redirection is enabled, the round trip adds delay and jitter. Dictation engines are unforgiving about gaps and timing in the audio stream, so redirected microphone input loses accuracy in ways local audio does not.
- It is quality-limited. Redirected audio is compressed to save bandwidth. The clean, full-fidelity signal a dictation engine wants rarely makes it through intact.
So when you install a dictation tool inside the virtual desktop and it either hears nothing or produces garbled text, the software is not broken. The audio simply is not arriving in usable form. Reinstalling, switching tools, or buying a more expensive package inside the session will not change the underlying plumbing.
The fixes people try first (and why they disappoint)
Before landing on the approach that works, most people cycle through a predictable set of dead ends. It is worth naming them so you do not lose a week to each.
Asking IT to enable microphone redirection
Sometimes this is possible, and if your administrator allows it and your dictation tool tolerates the latency, it can work. But it is the exception, not the rule. In many corporate, healthcare, finance, and government environments, redirecting a live audio path into the secure session is a non-starter for policy reasons, and no amount of asking will change that. Even when granted, you are now dependent on a setting that can be revoked in the next group policy update.
Installing dictation software inside the virtual desktop
This is the intuitive move, because that is where the application you want to type into lives. But the software inside the session can only use the audio the session receives, which brings you right back to the redirection problem. You also inherit the licensing, performance, and provisioning headaches of running heavy speech software on a shared VDI image.
Using a separate phone app and copy-pasting
Dictating into a notes app on your phone and then pasting into the remote session technically works, but it is slow, breaks your flow, and clipboard sharing into a locked-down VDI is often disabled too. It is a workaround, not a workflow.
The architecture that actually works: dictate local, type remote
Here is the key insight. The remote session cannot reliably hear your microphone, but it can always receive keystrokes, because keystroke forwarding is the entire reason remote desktops exist. So the winning design inverts the usual setup:
- Your microphone stays connected to your local Mac, where the audio is clean and full quality.
- Dictation runs locally on the Mac, never inside the session.
- The finished text is inserted at your cursor as ordinary keyboard input.
- Because your cursor is inside the focused remote window, those keystrokes are forwarded into the Citrix or VDI session exactly like anything you would type by hand.
To the remote server, there is no difference between text that arrives this way and text typed on a physical keyboard. It never needed to hear you, because the listening already happened on your side of the connection. Microphone redirection, audio compression, and admin policy simply stop being relevant.
This is precisely how Voice Keyboard Pro is built. It is a native macOS menu bar app: you hold a hotkey, speak, and release, and the transcribed text appears at your cursor in whatever window is focused, system-wide. When that focused window happens to be your Citrix Workspace, VMware Horizon, Microsoft Remote Desktop, or Parallels window, the text lands inside the remote app the same way it lands in a local one. The same approach that lets it type at the cursor in any Mac app is exactly what lets it reach into a remote session.
Stop trying to make the remote desktop hear you. Make your Mac listen, and let the remote desktop do what it already does well: receive keystrokes.
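To make the dictate-local, type-remote flow concrete, here is a minimal Python sketch of the data path. It is an illustration under stated assumptions, not Voice Keyboard Pro's actual code: `transcribe` stands in for whatever local speech engine you use, and `post_key_event` stands in for OS-level keystroke injection (on macOS, one common route is the Quartz event API, such as `CGEventKeyboardSetUnicodeString` followed by `CGEventPost`, which requires the accessibility permission). Both names, and the `to_key_events` helper, are hypothetical. The point the sketch shows is that audio goes in, only text comes out, and the text is expanded into the same key-down/key-up pairs a physical keyboard would produce.

```python
from typing import Callable, List, Tuple

# (character, "down" or "up"); a simplified model of a keyboard event.
KeyEvent = Tuple[str, str]


def to_key_events(text: str) -> List[KeyEvent]:
    """Expand text into the key-down/key-up pairs a keyboard would produce."""
    events: List[KeyEvent] = []
    for ch in text:
        events.append((ch, "down"))
        events.append((ch, "up"))
    return events


def dictate_local_type_remote(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    post_key_event: Callable[[KeyEvent], None],
) -> str:
    """Hypothetical pipeline: transcribe locally, then type into the focused window."""
    # 1. Transcription happens on this machine; the audio is never sent anywhere.
    text = transcribe(audio).strip()
    # 2. The result is delivered as ordinary keyboard input. If the focused
    #    window is a Citrix, Horizon, or RDP client, that client forwards these
    #    events into the session exactly as it forwards hand typing.
    for event in to_key_events(text):
        post_key_event(event)
    return text


if __name__ == "__main__":
    posted: List[KeyEvent] = []
    fake_transcriber = lambda _audio: " Hi "  # stand-in for a local speech engine
    result = dictate_local_type_remote(b"\x00" * 320, fake_transcriber, posted.append)
    print(result, posted)
```

Notice that nothing in the remote-facing half of the pipeline knows audio was ever involved, which is why redirection policy never comes into play.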
How to set it up step by step
The setup is short because there is nothing to install inside the virtual desktop. Everything happens on your Mac.
- Install the dictation app locally. Download Voice Keyboard Pro to your Mac, not into the VDI image. Grant it the standard macOS microphone and accessibility permissions when prompted; the accessibility permission is what allows text to be inserted at the cursor.
- Open your remote desktop client. Launch Citrix Workspace, VMware Horizon Client, Microsoft Remote Desktop, the Windows App, or Parallels, and connect to your session as usual.
- Click into the field you want to fill. Put your cursor inside the document, the email, the EMR field, the ticketing system, or whatever remote application you are working in. Keyboard focus must be on the remote window.
- Hold the hotkey and speak. Dictate a sentence, release, and watch the text appear inside the remote app. Because you are speaking into your local mic, accuracy is the same as it would be in any local application.
- Adjust insertion if needed. Most remote clients accept inserted keystrokes instantly. If a particular legacy app inside the session is picky about how fast characters arrive, dictate in shorter bursts. In practice this is rarely necessary.
That is the entire configuration. No group policy changes, no audio redirection request, no software pushed into the shared image, no help desk ticket. The work all happens on the machine you actually control.
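If you do hit one of those picky legacy apps, "dictate in shorter bursts" amounts to pacing the text. The sketch below shows one way a tool could do that automatically: split the transcript into small chunks and pause briefly between them. The function names, the `send` callback, and the default chunk size and pause are all hypothetical values chosen for illustration, not settings from Voice Keyboard Pro.

```python
import time
from typing import Callable, Iterator


def chunk_text(text: str, chunk_size: int) -> Iterator[str]:
    """Yield consecutive slices of at most chunk_size characters."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def type_in_bursts(
    text: str,
    send: Callable[[str], None],
    chunk_size: int = 16,      # hypothetical default
    pause_s: float = 0.05,     # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send text in short bursts with a pause between them; return the burst count.

    `send` stands in for whatever delivers keystrokes to the focused window.
    """
    bursts = 0
    for chunk in chunk_text(text, chunk_size):
        if bursts:
            sleep(pause_s)  # give a slow app inside the session time to catch up
        send(chunk)
        bursts += 1
    return bursts


if __name__ == "__main__":
    type_in_bursts("Ticket resolved after cache reset.", print, chunk_size=10)
```

Passing `sleep` as a parameter keeps the pacing testable without real delays; a larger pause trades speed for compatibility with the slowest apps.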
What works, and the honest limits
It would be a disservice to promise that every remote scenario behaves identically, so here is the realistic picture.
What reliably works: Any remote client that runs as a normal macOS window and accepts standard keyboard input will receive dictated text. That covers the mainstream stack: Citrix Workspace app for Mac, VMware Horizon Client, Microsoft Remote Desktop, the Windows App, Windows 365 Cloud PC sessions, Amazon WorkSpaces, and Parallels Desktop. If you can type into the remote app by hand, you can dictate into it.
Where to test first: A small number of older or heavily locked-down enterprise apps inside the session validate input character by character or block programmatic keystrokes. These are uncommon, but if your environment uses one, dictate a short test sentence before you rely on it for a long document. You will know within ten seconds whether your specific session accepts inserted text.
What this does not do: This approach types text; it does not move your mouse or click buttons inside the remote session. That is exactly what you want for dictation. If you also need full hands-free control of the remote machine, that is a different category of accessibility tooling, best paired with a dictation app rather than expected from one.
Why this matters for remote and hybrid workers
The dictate-local, type-remote model is not just a Citrix trick. It is what makes voice input practical for the way people actually work now. A large share of knowledge workers spend their day inside at least one remote or virtualized environment, whether that is a Cloud PC, a banking terminal, a hospital EMR, or a developer's jump box. If dictation only worked in local apps, it would be useless for the exact people who do the most repetitive typing into locked-down systems.
Consider the real beneficiaries. A clinician charting in a browser-based or virtualized EMR can speak notes instead of pecking them out between patients. A financial analyst inside a hardened VDI can dictate commentary into a remote spreadsheet. A support agent in a virtualized ticketing system can talk through a resolution instead of typing it. In each case the constraint was never the worker's voice; it was the assumption that the remote session had to do the listening. Once you move the listening to the local Mac, the constraint disappears. We cover the broader pattern in our guide to voice dictation for remote workers.
Citrix and VDI dictation: frequently asked questions
Do I need my IT department to approve anything?
For the local-dictation approach, generally no. You are installing software on your own Mac and granting it standard macOS permissions, and the remote session only ever receives ordinary keystrokes. There is no audio path into the data center to approve, because the audio never leaves your machine. That said, if your Mac itself is corporate-managed, follow your organization's normal software install policy.
Will the remote app know the text was dictated?
No. The remote application receives keyboard input and has no way to distinguish dictated text from typed text. From its perspective, someone typed.
Does it work over a slow or high-latency connection?
Yes, better than the alternative. Because the audio is processed locally and only the resulting text is sent as keystrokes, connection latency affects your dictation about as much as it affects your typing. You are not streaming a live audio feed across the link, so jitter and bandwidth limits that would wreck redirected microphone input are not a factor.
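A back-of-the-envelope comparison shows why. The sketch below assumes uncompressed 16 kHz, 16-bit mono audio, a common format for speech, and counts one key-down plus one key-up per character while ignoring modifier keys. Real remote clients compress redirected audio (which is part of why quality suffers) and each display protocol adds its own framing to keystrokes, so treat these as illustrative orders of magnitude rather than measurements. The bigger difference is timing: audio has to stream continuously while you speak, whereas the typed result crosses the link as one short burst after you finish.

```python
def pcm_bytes(seconds: float, sample_rate_hz: int = 16_000,
              bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size of uncompressed PCM audio for the given duration (assumed format)."""
    return int(seconds * sample_rate_hz * bits_per_sample * channels / 8)


def key_events_for(text: str) -> int:
    """One key-down plus one key-up per character (ignores Shift and other modifiers)."""
    return 2 * len(text)


utterance_seconds = 10
sentence = "Patient denies chest pain; follow up in two weeks."

# 10 s x 16,000 samples/s x 2 bytes/sample = 320,000 bytes of raw audio
print(f"Raw audio for {utterance_seconds} s: {pcm_bytes(utterance_seconds):,} bytes")
print(f"Key events to type the result: {key_events_for(sentence)}")
```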
What about accuracy with technical or industry terms?
Because dictation runs locally, you get the full benefit of the app's accuracy features. Voice Keyboard Pro includes Smart Vocabulary, a personal dictionary with replacement rules, so the names, abbreviations, and jargon specific to the systems you work in are transcribed the way you actually use them. If you frequently dictate into a specialized remote application, building out that vocabulary is the single biggest accuracy upgrade you can make. For a broader look at choosing a Mac dictation tool, see our roundup of the best dictation software for Mac in 2026.
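Replacement rules of this kind can be pictured as a lookup applied to the transcript before it is typed. The sketch below is a generic illustration, not Smart Vocabulary's implementation: `apply_replacements` and the example rules are hypothetical. It matches whole words case-insensitively, prefers the longest matching phrase, and does everything in a single pass so one rule's output is never rewritten by another rule.

```python
import re
from typing import Dict


def apply_replacements(text: str, rules: Dict[str, str]) -> str:
    """Replace spoken phrases with their preferred written form (hypothetical rules)."""
    if not rules:
        return text
    lookup = {spoken.lower(): written for spoken, written in rules.items()}
    # Longest phrases first so "dr smith" wins over a shorter rule like "dr".
    alternation = "|".join(re.escape(k) for k in sorted(lookup, key=len, reverse=True))
    # \b keeps matches on word boundaries (assumes phrases start and end with word characters).
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    # A function replacement avoids treating backslashes in `written` as escapes.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


if __name__ == "__main__":
    rules = {"e m r": "EMR", "dr smith": "Dr. Smith"}
    print(apply_replacements("Update the e m r for Dr smith", rules))
```

Running the rules locally, before any keystrokes are sent, means the remote application only ever sees the corrected text.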
Can I use this on iPhone too?
The same principle applies anywhere you have a cursor in a text field. On the Mac, the menu bar app covers the most common remote-desktop scenarios. On iPhone, Voice Keyboard Pro's custom keyboard puts a mic button into any app, so if you reach a remote system through a mobile client, you can dictate into the field directly.
The bottom line
The reason dictation seems impossible inside Citrix and VDI is that almost everyone attacks it from the wrong side, trying to force a microphone into a session that was never designed to carry one well. Flip the model. Dictate locally where the audio is pristine, and let the remote desktop receive the finished text as keystrokes, which is the one thing it has always done reliably.
That is the difference between dictation software that fights your remote environment and dictation software that works in it. Voice Keyboard Pro is built around the second approach, with a free tier so you can confirm it works in your exact Citrix, Horizon, or Cloud PC setup before committing. Open your remote session, click into a field, hold the hotkey, and speak. The text will be there.