Deep learning for Siri’s Voice: On-device deep mixture density networks for hybrid unit selection synthesis

“Starting in iOS 10 and continuing with new features in iOS 11, we base Siri voices on deep learning,” Siri Team writes for Apple’s Machine Learning Journal. “The resulting voices are more natural, smoother, and allow Siri’s personality to shine through.”

“Recently, deep learning has gained momentum in the field of speech technology, largely surpassing conventional techniques, such as hidden Markov models (HMMs). Parametric synthesis has benefited greatly from deep learning technology,” Siri Team writes. “Deep learning has also enabled a completely new approach for speech synthesis called direct waveform modeling (for example using WaveNet), which has the potential to provide both the high quality of unit selection synthesis and flexibility of parametric synthesis. However, given its extremely high computational cost, it is not yet feasible for a production system.”

“In order to provide the best possible quality for Siri’s voices across all platforms,” Siri Team writes, “Apple is now taking a step forward to utilize deep learning in an on-device hybrid unit selection system.”

Read more in the full article here.

MacDailyNews Take: The new US English Siri voice certainly does sound better than ever!


  1. Why don’t they just make the voice mimic the voice of the owner? That should certainly be creepy enough. Or, and there’s no way this can’t be coming, have the voice be that of your favorite Hollywood star or starlet. Copyrighted, of course.

  2. This is great and all, but for the last week or so, for lots of people, Siri has completely forgotten what the word ‘today’ means, presumably due to some sort of server-side bug. So it doesn’t understand what you mean when you ask it to “set a reminder today at…”, which is utterly pathetic.

    So maybe Apple should just focus on getting the damn thing to work properly first, and then worry about the voice.

  3. Just tried it and she understands the concept of “today” perfectly and has set an alarm for me accordingly.

    As it’s working for others, it doesn’t appear to be a server issue. Maybe you’re not speaking clearly enough?

    Try it again and tell us if it works or not.

