Dictating on a Mac without a Dragon

Today, I want to talk about using dictation as a way of writing on a Mac. The best-known tool for this is Dragon NaturallySpeaking. However, it is becoming increasingly outdated for Mac users, as Nuance, the company behind it, has stopped actively maintaining the Mac version.

If you would rather not invest in an outdated Mac version of Dragon NaturallySpeaking, or find it too expensive, there are now alternative ways to start dictating. They are particularly useful if you want to dictate away from your computer and have the text transcribed automatically afterwards.

The process is straightforward and will feel familiar if you have used Dragon NaturallySpeaking before: find a voice recorder and start speaking. When you are finished, connect the recorder to your computer and import the MP3 file of your recording. Then transcribe the file and edit the text as needed.

The tools I am going to introduce today make this possible without relying on an online service or buying expensive software.

If you're entirely new to dictation, you might be wondering what the process looks like in detail.

Firstly, it's important to have a clear idea of what you want to talk about. How you prepare for dictation is similar to how you would prepare for any writing session. Personally, I like to make an outline and follow it as I speak into the microphone.

For example, when I write a blog post, I can dictate it in one go and capture exactly what I want to see on the screen. Fiction usually takes me longer: I start with a rough version of the scene, dictating for about three to five minutes, and then use that as an outline for the extended version that goes into my draft.

Once you've recorded your text, you'll need a tool to transcribe the audio for you.

One excellent solution is based on Whisper, a free speech recognition model developed by OpenAI, the company behind ChatGPT. OpenAI has released the model as open source, so developers can build their own applications on it free of charge and adapt it to different use cases. Whisper can run entirely on your own machine, which means that no data is sent to a service that might use your text to train its AI models. The field is evolving rapidly, and developers are using these models to create useful tools for mainstream users.

I am currently using a tool called MacWhisper. It is an easy-to-use app that transcribes audio from various sources, such as the audio of other applications, voice recordings, the soundtrack of videos and live dictation through a microphone. The free version of MacWhisper gives you the smaller Whisper models, which are moderately accurate, while the paid version adds the large model, the most accurate one. The free version is perfectly adequate for most users, so it's worth starting there to see if it suits your needs. In my case, I opted for the paid version, which costs around $29, for the sake of the large model's accuracy.

The biggest difference I had to adjust to when switching from Dragon NaturallySpeaking to Whisper is that Whisper doesn't support punctuation commands. With Dragon, for example, I used to say “Hello COMMA this is Benjamin”, but this doesn't work with the basic version of Whisper.

However, Whisper is very good at inferring the correct punctuation from what you dictate, which actually makes it easier to start dictating. In most cases, this is good enough for a first draft, since we usually need to revise it anyway.
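If you miss Dragon-style commands, one possible workaround is to say the command words anyway and replace them after transcription. The Python sketch below is a hypothetical post-processing step, not a feature of Whisper or MacWhisper. It assumes Whisper writes the commands out as ordinary words ("comma", "period" and so on), and the command list in `SPOKEN_PUNCTUATION` is an assumption you would adapt to your own habits.

```python
import re

# Assumed spoken commands and the symbols they stand for.
# This is NOT a Whisper feature; it is a post-processing sketch.
SPOKEN_PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "full stop": ".",
    "question mark": "?",
    "exclamation mark": "!",
    "colon": ":",
}


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken punctuation commands with the matching symbol."""
    for phrase, symbol in SPOKEN_PUNCTUATION.items():
        # Match the command as a whole word, ignoring case. Spaces, commas
        # or full stops that Whisper put around the command are removed too.
        pattern = re.compile(
            r"[\s,.]*\b" + re.escape(phrase) + r"\b[,.]?", re.IGNORECASE
        )
        text = pattern.sub(symbol, text)
    return text


print(apply_spoken_punctuation("Hello comma this is Benjamin"))  # Hello, this is Benjamin
```

Keep in mind that genuine uses of these words (say, "a comma splice") would be converted as well, so the result still needs a read-through.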

Once you have recorded your text, simply drag the MP3 file from your voice recorder into MacWhisper. Provided you have already downloaded the model you want, MacWhisper starts processing the audio and works its magic. In my case, I use the large model for English and German, as these are the languages I write in most. The large model supports about 100 languages, but its accuracy drops for less widely spoken languages, for which there is typically less training data.
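If your voice recorder shows up as a drive when you connect it, you could also locate the latest recording with a short script instead of browsing for it in Finder. This is only a sketch and not part of MacWhisper: the mount point `/Volumes/RECORDER` is hypothetical, and the script assumes the recorder saves plain `.mp3` files.

```python
from pathlib import Path
from typing import Optional


def newest_recording(folder: str, suffix: str = ".mp3") -> Optional[Path]:
    """Return the most recently modified file with the given suffix, or None."""
    base = Path(folder)
    if not base.is_dir():
        # Recorder not connected, or it uses a different volume name.
        return None
    recordings = [
        p
        for p in base.rglob("*")
        # Skip hidden files such as the "._" metadata files macOS writes to USB drives.
        if p.is_file()
        and not p.name.startswith(".")
        and p.suffix.lower() == suffix.lower()
    ]
    if not recordings:
        return None
    # The newest recording is the one with the latest modification time.
    return max(recordings, key=lambda p: p.stat().st_mtime)


# Hypothetical volume name; check Finder for the name your recorder uses.
latest = newest_recording("/Volumes/RECORDER")
print(latest if latest else "No recording found")
```

The resulting path can then be opened in MacWhisper as usual.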

Transcription takes some time: on my M2 MacBook Air, it often takes about 60% of the length of the recording. Once it is done, I copy the text and paste it into the tool I use for the next step.
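To plan a session, that ratio can be turned into a rough estimate. A minimal sketch: the 0.6 factor is simply the figure quoted above for an M2 MacBook Air, and it will differ on other machines and with other models.

```python
def estimate_transcription_minutes(recording_minutes: float, ratio: float = 0.6) -> float:
    """Estimate how long transcription will take.

    ratio is transcription time divided by recording time; the default of
    0.6 is the figure quoted for an M2 MacBook Air, not a general benchmark.
    """
    if recording_minutes < 0 or ratio <= 0:
        raise ValueError("recording_minutes must be >= 0 and ratio > 0")
    return recording_minutes * ratio


# A five-minute dictated scene:
print(estimate_transcription_minutes(5))  # 3.0
```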

However, I can't yet treat the transcript as a first draft, because there are usually issues to fix: punctuation errors, filler sounds like 'um' and 'mm', or moments where I paused to think about the next sentence. Fixing these can take a few minutes, depending on the length of the text. For short texts, I usually edit by hand, but for longer texts I recommend using Mistral, an open-source language model.
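For the filler words specifically, a regular expression can handle the most mechanical part before a language model gets involved. This is a hypothetical sketch rather than part of the workflow described here: the fillers beyond 'um' and 'mm' are assumptions, and it does not fix punctuation errors or half-finished sentences.

```python
import re

# "um" and "mm" are the fillers mentioned above; "uh" and "hmm" are assumed extras.
# A comma directly after a filler is removed together with it.
FILLER_PATTERN = re.compile(r"\b(?:um|uh|mm|hmm)\b,?", re.IGNORECASE)


def remove_fillers(text: str) -> str:
    """Delete filler words and tidy up the spacing they leave behind."""
    cleaned = FILLER_PATTERN.sub("", text)
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)         # collapse doubled spaces
    cleaned = re.sub(r"[ \t]+([,.!?])", r"\1", cleaned)  # no space before punctuation
    cleaned = re.sub(r",(?=[.!?])", "", cleaned)         # drop a comma left before end punctuation
    return cleaned.strip()


print(remove_fillers("So, um, I think mm we should start."))
```

One caveat: "mm" is also the abbreviation for millimetres, so a sentence like "a 5 mm screw" would lose its unit. Treat the output as a suggestion and skim it before moving on.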

Mistral, developed by the French company Mistral AI, currently works best in English. It has some understanding of other languages, but not to the same extent.

With Mistral, I can ask the AI to correct typos, add missing commas and clean up the text without changing the content. This saves me from going through the whole text line by line, which is rarely an exciting process.
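For longer transcripts, it can help to send the text to the model in chunks, each with the same instruction, since a local model can only take in a limited amount of text at once. The sketch below only prepares the prompts. The instruction wording and the 600-word chunk size are assumptions, not Mistral requirements, and how you pass each prompt to the model depends on the app you use to run it.

```python
from __future__ import annotations

import re

# Assumed instruction wording; adjust it to your own needs.
CLEANUP_INSTRUCTION = (
    "Correct typos, add missing commas and clean up the following dictated text. "
    "Do not change the content. Return only the corrected text."
)

# Split after ., ! or ? followed by whitespace (a rough sentence boundary).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_sentences(text: str, max_words: int = 600) -> list[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    Line and paragraph breaks inside a chunk are replaced by spaces.
    """
    chunks, current, count = [], [], 0
    for sentence in SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks


def build_prompts(text: str, max_words: int = 600) -> list[str]:
    """Prefix every chunk with the same clean-up instruction."""
    return [f"{CLEANUP_INSTRUCTION}\n\n{chunk}" for chunk in chunk_sentences(text, max_words)]


transcript = "So this is the first sentence. And here is the second one."
for prompt in build_prompts(transcript):
    print(prompt)
```

Splitting at sentence boundaries keeps each chunk readable on its own, so the model never has to clean up a sentence that was cut in half.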

So once I have the first draft, I can edit it further using my favourite text editor.

In the next post, I'll explain how to choose and use an appropriate language model for writing tasks that can run on your local machine.