Inside OS X 10.8 Mountain Lion GM: Dictation and speech

“In Mountain Lion, Macs are getting system-wide speech recognition, the same ‘Dictation’ feature Apple gave the new iPad at the beginning of the year. While it works well, it does require a network connection,” Daniel Eran Dilger reports for AppleInsider.

“Apple’s cloud-based Dictation feature, currently supported on the new iPad and as part of the broader Siri voice assistant feature of iPhone 4S, converts speech to text virtually anywhere,” Dilger reports. “It works by sending audio recordings of captured speech to Apple’s servers, which respond with plain text. While it doesn’t go as far as the more intelligent Siri, Dictation does intelligently cross reference the names and assigned nicknames of your contacts in order to better understand what you are saying.”


Dilger reports, “Similar to Siri or Dictation on the new iPad, Dictation on Macs running OS X Mountain Lion pops up a simple mic icon when activated, which listens until you click or type the key to finish. Just as with Siri or dictation on the new iPad, Dictation under Mountain Lion is quite fast and highly accurate, but does require a network connection to function. If you don’t have a network connection, the Dictation input icon will simply shake, indicating that it is not available.”

Much more, including screenshots, in the full article here.

MacDailyNews Take: We’ve dictated commands to our Macs since the early 1990s (PlainTalk, Speakable Items). We didn’t need a network connection then. Why do we need it in 2012? Dragon Dictate for Mac and MacSpeech Scribe don’t require network connections, either – and they work wonderfully! We understand how Siri works, but mere dictation isn’t Siri. Why can’t Mountain Lion’s dictation feature work locally and free users from network dependency?

The network requirement seems to be an artificial limitation, part of a deal between Apple and Nuance intended to preserve Nuance’s revenue stream from standalone dictation software rather than any actual technical necessity. If so, bad form, Apple!


  1. It would require that the resources and libraries used for dictation be stored locally. Since this is Nuance’s technology, it is unlikely that Apple has the licensing needed to bundle it with the OS.

  2. Dictation programs that reside on the device usually have a much more limited set of sound samples to work with and cannot improve as fast as a centralized system that can correlate sounds/phonemes sampled from thousands (millions) of effective translations. Over time, the network-based approach can provide better results without any “training” of the device and accommodate wide variations in dialect, tone of voice and accent, not to mention larger vocabularies and more idiomatic intelligence.
    I believe government security agencies were using such centralized voice-recognition technologies to monitor large numbers of phone conversations well before the shrunk-down (device-based) versions came to market.

    1. Sitting in front of a fully functional Mac with a keyboard, how many people are going to use dictation as their primary text input? And of those, how many will use it regularly rather than as a novelty? 1%? A tiny sliver of Mac owners.

      It would be more efficient to include the software as a downloadable option for OS X and then train it by saying a few key words into the microphone. Not only would the transcription process be a lot faster, it wouldn’t depend on a reliable Internet connection during speech-to-text.

      1. 1% of all Mac owners is still a lot of people – and, of course, you don’t have to use it. But most people I know are very poor typists. I think a lot may prefer good voice dictation, if it’s just there, ready to go.
        I think, actually, as Siri and voice dictation get better and better, they will both be used more and more.

      2. An automatic transcript of a teleconference that can recognise the accents of non-native English-speaking participants would be an excellent use of the technology.

    2. Altos is correct. The back-end system that does this is based on quantum neural network technology developed in the 1990s, which has gradually worked its way into non-spy use. This technology is NOT computer-based and, instead, is an independent physical system of anyons interacting within the 2DEG environment of HEMTs. This system has been transcribing phone calls to text, suitable for data mining analysis, for over a decade. This function is part of ECHELON, and is operated by the Five Eyes (AUS CAN NZ UK US). The same AI also handles a bunch of other tasks, such as Face Recognition, Cognitive Signature Recognition and Emulation, et cetera. Apologies for the use of so much jargon, but no simpler terms to describe this stuff exist yet. Look it up in Wikipedia if you are interested, or seek out a Post Quantum Historical Retrospective.

  3. If so, Apple needs to take some of the mountain of cash at its disposal and buy Nuance outright. And while they’re at it, discontinue Nuance’s support for Windows and Android. (heh-heh)

  4. I use Nuance Dragon NaturallySpeaking at work on the PC… 98% accuracy with no training. Bought it on sale for $49.00. I’d love to have it on my Mac w/o a network connection, but I’ll take this for now. Would I use it? Pretty much constantly, in combination with my mouse, trackpad, etc.
    Perhaps Apple has its reasons for not trying to acquire Nuance outright, or perhaps Nuance won’t sell. They seem to be in a pretty good position these days.

  5. “Dictation under Mountain Lion is quite fast and highly accurate”
    I hope so, because it sure isn’t on my new iPad.

    “Why can’t Mountain Lion’s dictation feature work locally and free users from network dependency?”
    Agreed, 100%!!

    1. Dictation on my iPad is incredibly accurate and fast compared to my iPhone 4S. Funny how that works, but it definitely works better than the 4S.
      And I agree: I wish it would work without Internet connectivity.

  6. I’ve been using dictation software on the Mac since ViaVoice for Mac was released. Moved to iListen a few years later, then to Dictate. With very little training, Dictate is easily 98% accurate with most applications. The only application I experienced issues with initially was FileMaker Pro. Those problems have since been remedied through experimentation with the command set or a solution found on the Nuance forum.

    As a long time user of dictation, I was pleased to hear Apple was incorporating system wide dictation in ML. For me, it’s really the only feature I considered worthwhile. Having learned Dictation requires an internet connection to function, I’ll be staying with Dictate.

    1. I agree with you, Mark. I followed the same progression of voice apps as you did. In the past, I had also configured Speakable Items to do remarkable things (though it couldn’t handle much dictation). I, too, don’t understand why network connectivity is required now. Even if the accuracy might be slightly reduced when handled locally, I believe that is what we should have. Agreeing with MDN on this, too.

  7. The fuck are people complaining about the network connectivity for? Dictation works great, and how often is your mac off the network?

    Also, dictating commands via Speakable Items is nowhere near the same league as dictating free-form speech. And Speakable Items always kinda sucked anyway.

    1. Dear all, dear Scrapdroid. Indeed, we are probably connected to the net anyway, so that part is fine. Checking online for better accuracy, welcome! Checking online for “references” to places/faces to hyperlink them to a map or Facebook or other, brilliant. BUT…… surrendering your (business?) text to Apple? NO WAY! Not even Apple should have the rights this gives them. Be very aware. I do not think Apple means evil; Google does when it translates “for free”. It should be that you can connect while training, and then when you start using it for real it no longer connects. Then you tell it (or it can be a default setting) to look for places/faces and it will do just that, without saving your texts for “future reference” or whatever.

    2. Scrapdroid: it’s not a matter of whether it works great or not. For me, it’s a matter of security. For most people, the need for an internet connection hardly matters. For those who deal with Classified, Secret, or Top Secret material, anything that threatens operational security is a no-no. Dictation’s reliance on the internet and remote servers to learn is an unacceptable security risk.

  8. Hey Daniel! You agreed to the same NDA (Non-Disclosure Agreement) that I did as a 10.8 beta tester.

    So why are you BREAKING the NDA by talking about 10.8 BEFORE it’s released? Can’t you hold it in while we wait for 10.8 to be ready for prime time? I certainly can! It’s not nice to fool with mother Apple.

  9. How dictation works best:

    1) You have a noise-cancellation microphone system (like the iPhone 4S has).

    2) The dictation software has been trained to understand your particular voice within one particular audio environment.

    If you don’t have both of the above, your dictation transcription is going to be compromised. That’s not going to change.
