Short answer: The best app to transcribe a meeting with speaker names and a summary on Mac is Voice Keyboard Pro's Meeting Mode. It captures the conversation, separates and labels each speaker, and generates AI notes with a summary and action items, all without storing your meeting audio or transcript on our servers.
A raw wall of meeting text is almost useless. If you have ever scrolled through a one-hour transcript that reads as a single unbroken paragraph with no idea who said what, you already know the two things that turn a transcript into something you can actually use: speaker names and a summary. Knowing that "Priya raised the budget concern" and that the meeting produced three action items is worth far more than 9,000 words of undifferentiated speech.
This guide explains exactly what to look for in a Mac meeting app that does both, how the speaker-labeling and summarizing actually work, and how to set it up so your meetings turn into clean, attributed notes you can paste into a doc the moment the call ends.
What you actually need from a meeting app
"Transcribe a meeting" sounds like one feature, but the useful version is really three jobs stacked together. A tool that does only the first leaves you doing the rest by hand.
- Accurate transcription. The words people said, captured reliably even with cross-talk, accents, and the occasional bad connection.
- Speaker separation and labels. The transcript broken into turns, with each turn attributed to a distinct speaker, so you can follow the conversation and quote people correctly.
- A summary and action items. A short readout of what was decided, what is open, and who owns what, so nobody has to re-read the whole thing.
Most generic dictation tools handle only the first. They give you text, and you are left to figure out who said which line and to write the summary yourself. The whole point of a dedicated meeting app is that it does all three in one pass.
A transcript tells you what was said. Speaker names and a summary tell you what it meant.
Meeting Mode on Voice Keyboard Pro
On Mac, Voice Keyboard Pro handles this with a dedicated Meeting Mode. The same menu bar app you use for everyday hold-to-talk dictation has a mode built specifically for multi-person conversations. It transcribes the meeting and layers the other two jobs, plus a scheduling convenience, on top:
- Speaker detection separates the conversation into turns and labels who is speaking, so the transcript reads as a back-and-forth instead of a monologue.
- AI notes produce a structured summary with the key decisions and action items pulled out of the discussion.
- Calendar meeting detection notices when a scheduled meeting is starting, so capturing it is one click instead of a scramble to set up after everyone has already started talking.
Because it lives in the menu bar, there is no separate heavyweight app to launch and no bot that joins the call as a visible participant. You start Meeting Mode, hold your meeting, and end up with an attributed transcript and a summary on your own machine. If you want the broader picture of running meetings this way, we covered it in meeting transcription on Mac and in our guide to dictation for meeting notes.
How speaker names actually work
This is the part people are most curious about, so it is worth being clear. Speaker detection works by analyzing the distinct vocal characteristics in the audio and grouping the speech into turns that belong to different voices. Out of the box, the app can tell that the conversation involves several different people and split the transcript accordingly, labeling them as separate speakers.
To turn "Speaker 1" and "Speaker 2" into real names, you assign the labels. Because the speakers are already separated into consistent groups, naming them is a quick mapping step: you match each detected voice to a person once, and that name carries through the transcript. The result is a document where every line of dialogue is attributed, which is what makes it quotable in your notes and searchable later.
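As a rough illustration of that mapping step, here is a minimal Python sketch. It is not Voice Keyboard Pro's actual code or data format: the `Turn` type, the `apply_names` helper, and the sample names are all hypothetical, chosen only to show why naming each detected voice once is enough to relabel the whole transcript.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # generic label from speaker detection, e.g. "Speaker 1"
    text: str     # what was said during this turn


def apply_names(turns, names):
    """Return new turns with generic labels replaced by real names.

    Labels that have no entry in `names` are left unchanged, so an
    unnamed voice still shows up as "Speaker N" in the output.
    """
    return [Turn(names.get(t.speaker, t.speaker), t.text) for t in turns]


# Hypothetical transcript as it might look before naming.
turns = [
    Turn("Speaker 1", "Can we revisit the budget?"),
    Turn("Speaker 2", "Yes, I will send the numbers."),
    Turn("Speaker 1", "Thanks."),
]

# One mapping per detected voice; it applies to every turn by that voice.
names = {"Speaker 1": "Priya", "Speaker 2": "Sam"}

named = apply_names(turns, names)
for t in named:
    print(f"{t.speaker}: {t.text}")
```

Because the lookup is keyed on the detected label, fixing a mislabel means changing one entry in the mapping rather than editing every line.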
A few practical factors improve speaker separation:
- Let people finish their sentences. Constant cross-talk is hard for any system to untangle, just as it is hard for a human note-taker. Conversations with reasonable turn-taking separate cleanly.
- Decent audio helps. A clear capture of the room or call gives the detection more to work with than a muffled feed.
- Distinct voices are easier. Two people with very similar voices are the hardest case for any speaker-detection tool, and may occasionally need a manual correction.
Getting a summary and action items
The transcript is the raw material; the summary is what you actually share. Voice Keyboard Pro's AI notes read the full attributed conversation and condense it into the parts a busy reader needs:
- A short overview of what the meeting was about and what was decided.
- Action items with the owner where the conversation made it clear who is responsible.
- Open questions that were raised but not resolved, so they do not get lost.
Because the summary is generated from the attributed transcript, the action items can be tied back to who raised or accepted them. That is the difference between "someone will follow up on the contract" and "Priya will follow up on the contract by Friday." When you paste the notes into your team doc or send them around after the call, that attribution is what makes them trustworthy.
Setting it up before your next call
The most common mistake is trying to set up transcription after the meeting has already started. A minute of preparation fixes that.
- Install Voice Keyboard Pro and grant the microphone permission macOS asks for. This is the same permission any meeting or recording tool needs.
- Check your audio source. Make sure the app is capturing the audio you intend, whether that is your room microphone for an in-person meeting or your system audio for a video call.
- Let calendar detection do the reminding. With calendar meeting detection on, the app prompts you when a scheduled meeting begins, so you are not relying on memory.
- Start Meeting Mode as the call opens. Beginning at the top of the meeting gives the cleanest speaker separation, since the detection has the full conversation to work from.
- Name your speakers once. Map the detected voices to real names, and let the labels carry through.
When the meeting ends, you have an attributed transcript and a summary ready to copy. For getting those notes into whatever app you live in afterward, our guide on how to dictate in any Mac app covers the same system-wide cursor approach that lets you drop text directly into your doc, email, or task manager.
Zoom, Teams, Meet, and in-person meetings
A frequent question is whether this is tied to a specific video platform. It is not. Because the app captures audio on your Mac rather than plugging into one meeting service's API, it works the same across the tools you already use:
- Video calls on Zoom, Microsoft Teams, Google Meet, or anything else, by capturing the call audio on your machine.
- In-person meetings in a room, by capturing through your Mac's microphone.
- Hybrid meetings with some people in the room and some on a call, since the audio reaches your Mac either way.
That platform independence matters because most teams do not live in a single tool. A meeting app that only works inside one video service leaves you stuck the moment a client sends a link for a different platform.
Privacy: where your meeting goes
Meetings are often the most sensitive content a person handles all week: deal terms, personnel discussions, unreleased plans. So it is worth stating plainly what happens to the audio. With Voice Keyboard Pro, our servers store only operational pings needed to keep the app running. We do not store your meeting audio or the transcript content. The notes you generate are yours, on your device.
If you are evaluating tools for a regulated or confidential environment, that distinction is the one to scrutinize first. Many meeting tools keep full recordings and transcripts on their servers indefinitely. The safer default for sensitive conversations is a tool that processes what it needs and does not retain your content.
Why not just record and upload later?
You could record the meeting, then upload the file to a separate transcription service afterward. People do this, and it works, but it adds friction at every step: you have to manage recordings, remember to upload them, wait for processing, and then still do the speaker labeling and summarizing as separate tasks. By the time the notes exist, the meeting is a day old and the momentum is gone.
Doing it live, in one flow, means the attributed transcript and the summary are ready when the call ends, while everything is fresh and you can still correct a misheard name from memory. For most people the live workflow is not just faster; it is the difference between notes that actually get written and notes that stay on the someday list. For broader options, our roundup of the best ways to capture meeting notes compares the trade-offs in more detail.
Turning the notes into something your team uses
An attributed transcript and a summary are only valuable if they reach the people who need them. The advantage of a menu bar app is that the text is already on your Mac, ready to go wherever you work, with no export-and-download dance.
A simple post-meeting routine keeps the notes alive instead of letting them rot in a folder:
- Skim the summary first. Confirm the decisions and action items match what you remember. A 30-second read while the call is fresh catches the occasional misheard figure or name.
- Fix any speaker mislabels. If two similar voices got crossed, correct them now while you still recall who said what.
- Paste the summary where work happens. Drop the action items into your project tracker, the decisions into your team doc, and the full transcript into your archive for searchable reference.
- Send it out the same hour. Notes that arrive while people still remember the meeting get read. Notes that arrive two days later get ignored.
Because the summary already separates decisions, owners, and open questions, this routine is mostly copy and paste rather than rewriting. The work of turning a conversation into shareable notes, which used to eat the half hour after every meeting, collapses into a couple of minutes.
Frequently asked questions
Does it work for a meeting with more than two or three people?
Yes. Speaker detection separates the conversation into distinct voices regardless of headcount. Larger groups with more cross-talk are inherently harder to untangle than a calm two-person call, so reasonable turn-taking and clear audio matter more as the group grows. You can still name each detected voice once and have the label carry through.
Can it label speakers in a recording I already have?
Meeting Mode is designed for live capture, where it produces the cleanest separation and lets you correct names from memory while the meeting is fresh. The strongest results come from running it during the call rather than processing an old file afterward.
What about meetings in other languages?
Voice Keyboard Pro's transcription engine handles a wide range of languages, so a meeting held in a language other than English can still be transcribed. Speaker separation is based on voices rather than words, so it does not depend on the meeting being in English.
Do remote participants have to install anything?
No. Because the app captures audio on your Mac, the other people on the call do not need the app, an account, or a plugin. You run it on your machine and capture the conversation from your side.
Will people see a bot join the meeting?
No. There is no participant bot that appears in the attendee list. The capture happens on your own machine through the menu bar app, so the meeting looks completely normal to everyone else.
The bottom line
If you want a Mac app that transcribes a meeting with speaker names and a summary, what you are looking for is a tool that does all three jobs together: accurate transcription, speaker separation with labels, and AI notes that surface the summary and action items. Voice Keyboard Pro's Meeting Mode is built for exactly that, works across every meeting platform because it captures audio on your machine, and keeps your content on your device rather than on a server.
There is a free tier with daily limits so you can run it on your next call and see the attributed notes for yourself. Pro is $4.99 a month or $34.99 a year. Try it on one meeting this week and compare the result to whatever you are doing now. A transcript with names and a summary, ready the moment the call ends, tends to make the old way feel like a lot of unnecessary work.