By Ben Coxworth
For some time now, speech-recognition programs have existed that attempt to reproduce the user’s spoken words in another language. Such “speech-to-speech” apps, however, provide their translations using a very flat, synthetic voice. Now, experimental new software developed by Microsoft not only translates between 26 different languages, but also plays the translated speech back in the user’s own voice, complete with the inflections they used when speaking in their own language. It looks like a real-life version of Star Trek’s universal translator could soon be here.
The system was demonstrated this Tuesday at Microsoft’s Redmond, Washington, campus, by its inventor, Microsoft research scientist Frank Soong. He started by using the software to read out Spanish text using the voice of his boss, Rick Rashid, and then proceeded to use it to allow the company’s chief research and strategy officer, Craig Mundie, to converse in Mandarin.
For now, the program can’t be used as soon as it’s been installed. Users must first spend about an hour with it, training it to recognize and reproduce their voice. Once that’s been accomplished, the software applies that user-specific speech model to a generic text-to-speech model for the desired output language. Individual sounds of the user’s voice are selected from the training session, then strung together and appropriately altered, in order to create a natural-sounding translation.
It’s been suggested that such a system would make users more confident that their speech was being translated accurately, and that fewer misunderstandings would occur due to a lack of context – in other words, it would be more obvious if the speaker was being sarcastic, or exaggerating. It could also help facilitate the learning of foreign languages, as students may find it easier to imitate phrases spoken in their own voice.
Examples of a phrase spoken in different languages by the system can be heard via the link below.