Audio & Voice

Whisper

OpenAI's speech recognition model that converts any recording into text.


Model free · Level intermediate · 8 min read

By Thiago Lourenço Martins

* Independent and free analysis. Educational content, not sponsored or paid in any way. We show both the strengths and the limitations — an honest review so you can decide for yourself.

What it is

A digital stenographer that understands dozens of languages

Whisper is a speech recognition model created by OpenAI and released for free as open-source software. Think of it as a stenographer who never gets tired: you hand it a recording — a hearing, a meeting, a consultation — and it returns the complete text, without you typing a single word. Audio goes in, text comes out. That's it.

Why it matters

Hours of recordings that should have been text yesterday

A two-hour court hearing can produce more than 30 pages of text, all of which someone has to type by hand. An expert witness records findings by voice and then rewrites the same content. A manager leaves a meeting without knowing what was decided because no one could write it all down.

~4 min

That is roughly how long Whisper takes to transcribe 90 minutes of good-quality audio when it runs on a GPU; the exact time depends on the model size and the hardware. The model processes speech faster than real time: what would take hours of typing is ready before you finish your coffee.

5 ways to use it

Who uses it and what for

01 Lawyer

Records the hearing on a phone, uploads the file to Whisper, and gets the full transcript. Instead of spending hours reconstructing what was said, they start drafting legal documents immediately — with the actual testimony in hand.

02 HR / Recruiting

Transcribes recorded job interviews and generates a summary per candidate. Eliminates note-taking during the conversation and keeps the focus on the interviewee, not the notepad.

03 Doctor / Therapist

Dictates medical notes by voice during or after the appointment. Whisper converts them to plain text, ready to review, structure, and save. No dictaphone, no secretary, no rework.

04 Journalist / Communicator

Records field interviews on a phone. Returns to the office with an audio file and, in minutes, has the full transcript ready to turn into a story — no time wasted listening and retyping.

05 Manager / Executive

Records meetings and transcribes afterward. Uses the text with ChatGPT to extract decisions, owners and deadlines. The meeting summary that took an hour to write is ready in 5 minutes.

Step by step

From zero to text in under 10 minutes

The method below uses Google Colab — a free, browser-based code notebook. No need to install anything. You don't need to know how to code: just copy and paste.

Note: interface verified in June 2026. If something looks different, look for the button name in the Google Colab help.

1. Go to colab.research.google.com
   Sign in with any Google account. It is free.
2. Click "+ New notebook"
   An empty code cell appears on screen.
   Tip: think of a cell as a single instruction you send to the computer.
3. Paste the install command and click "Run" (the triangle)
   In the cell, paste exactly: !pip install openai-whisper
   Then click the triangle on the left and wait for the installation to finish (it may take 1 to 2 minutes).
4. Upload your audio file
   In the left panel, click the folder icon. Then drag your audio file (.mp3, .mp4, .wav, .m4a) into the area that appears. Wait for the upload to finish.
   Tip: the uploaded file disappears when the Colab session ends. This is normal; the transcript shown in the notebook is kept separately.
5. Create a new cell and paste the transcription command
   Click "+ Code" to create another cell. Paste the block below, replacing hearing.mp3 with the exact name of your file and "pt" (Portuguese) with the language code of your recording, for example "en":

       import whisper

       model = whisper.load_model("medium")
       result = model.transcribe("hearing.mp3", language="pt")
       print(result["text"])

6. Click "Run" and wait
   The model loads and transcribes. For a 90-minute audio file, the process takes about 3 to 6 minutes on a GPU runtime. The full text appears just below the cell.
   Tip: use "base" for faster results; use "large-v3" for maximum accuracy on technical terms. If the transcription runs much slower than expected, check that the notebook is using a GPU (in the "Runtime" menu, under "Change runtime type").
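
The transcription command prints one continuous block of text. The result it returns also contains a "segments" list, where each entry has a "start" time, an "end" time and a "text". The sketch below is one possible way to turn those segments into a timestamped transcript saved to a file. The helper names (format_timestamp, format_segments, transcribe_to_file) and the output file name are illustrative assumptions, not part of Whisper.

```python
def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Build one line per segment, prefixed with the segment's start time."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        lines.append(f"[{start}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_to_file(audio_path: str, out_path: str, model_name: str = "medium") -> None:
    """Transcribe audio_path and write a timestamped transcript to out_path."""
    import whisper  # imported here so the helpers above work without Whisper installed

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)  # no language given: Whisper detects it
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(format_segments(result["segments"]))
```

In Colab, calling transcribe_to_file("hearing.mp3", "hearing.txt") in a new cell should leave hearing.txt in the same folder panel used for the upload, where it can be downloaded before the session ends.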

Copy and use now

From raw transcript to a formal legal summary

After getting the text from Whisper, paste this request into ChatGPT — along with the transcript:

prompt

Here is the transcript of a court hearing: [paste the text].

Create a formal summary with the following sections:
1. Participants (identify by role: judge, defense attorney, opposing counsel, witness)
2. Undisputed facts — points accepted by both parties
3. Points in dispute — key disagreements
4. Decisions made — with deadline and responsible party when mentioned

Use formal legal language and organize in numbered bullet points.

You get a structured summary ready to review, adapt, and sign — without rewriting everything from scratch.

What few people know

Tips & real limitations

Do this

Record in a quiet space with the microphone close to the speaker — Whisper does not filter background noise; audio quality determines transcript quality.
Use the "medium" or "large-v3" model for audio with technical terminology (legal, medical) — they make fewer errors on domain-specific terms.
Always review the final text before using it in official documents — proper names, ID numbers, and acronyms are the most error-prone parts.

Real limitations

Whisper does not identify who is speaking: the transcript comes out as continuous text without separating participants. Speaker diarization requires an additional tool.
Via the paid OpenAI API, the limit is 25 MB per file. Long recordings must be split beforehand — otherwise the request will fail.
The open-source model does not transcribe in real time — it needs the complete audio file to begin. For live captioning, dedicated real-time tools exist.
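
For the 25 MB API limit, it helps to decide where to cut before splitting anything. The sketch below only plans the cut points. It assumes the file size grows roughly in proportion to the duration (constant bitrate) and keeps a 10% safety margin; both are simplifications. The function name plan_chunks and its parameters are hypothetical, and the cutting itself would still be done with an audio tool of your choice.

```python
import math

# 25 MB limit from the text; treating 1 MB as 1024 * 1024 bytes is an assumption.
API_LIMIT_BYTES = 25 * 1024 * 1024


def plan_chunks(duration_s: float, size_bytes: int,
                limit_bytes: int = API_LIMIT_BYTES, safety: float = 0.9):
    """Return (start, end) times in seconds for equal pieces that should fit the limit.

    Assumes size is proportional to duration, so each piece gets an equal share.
    """
    budget = limit_bytes * safety  # leave headroom below the hard limit
    n = max(1, math.ceil(size_bytes / budget))
    step = duration_s / n
    return [
        (round(i * step, 3), round(min((i + 1) * step, duration_s), 3))
        for i in range(n)
    ]


# Example: a 90-minute recording (5400 s) that weighs 60 MB.
for start, end in plan_chunks(5400, 60 * 1024 * 1024):
    print(f"cut from {start:.0f}s to {end:.0f}s")
```

Cutting at fixed times can split a word in half, so in practice it is worth nudging each cut point to a nearby pause and reviewing the text around the joins.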

Want to see it in action?

A complete tutorial in Portuguese

Otávio Miranda

Whisper OpenAI: Guia Completo de Transcrição com Inteligência Artificial (vídeo e áudio)

Watch on YouTube →

* Independent suggestion, chosen for content quality. We have no relationship or sponsorship with this channel.

Challenge · 5 minutes

Transcribe 1 minute of audio right now

Record a 1-minute voice note on your phone — read any paragraph from a document in your field. Follow the steps above with the "base" model (fastest) and compare the generated text to the original.

It worked if more than 90% of the words are correct — including technical terms from your field. If it falls short, switch to "medium" and compare again.
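
Counting correct words by eye is tedious, so the 90% check can be approximated with a few lines of Python. The sketch below uses the standard difflib module to align the two texts word by word, ignoring case and punctuation. It is a simple in-order match rate, not a formal word error rate, and the function names are illustrative.

```python
import difflib
import re


def words(text: str):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def word_accuracy(original: str, transcript: str) -> float:
    """Share of the original's words that appear, in order, in the transcript."""
    ref, hyp = words(original), words(transcript)
    if not ref:
        return 0.0
    matcher = difflib.SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


original = "The witness confirmed the contract was signed in March"
transcript = "the witness confirmed the contact was signed in March."
print(f"{word_accuracy(original, transcript):.0%} of the words match")
```

Paste the paragraph you read aloud as original and Whisper's output as transcript. A result above 0.90 corresponds to the target in the challenge.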
