Microsoft has upped the ante in voice technology by leveraging artificial intelligence to mimic real individuals with Custom Neural Voice. The advancement represents a breakthrough in the natural language generation space, which includes competitors such as Google, AWS, IBM and others. While the innovation is commendable, it raises ethical concerns related to governance and potential use cases.
For decades, the ability to converse with computers was the stuff of science fiction movies. But technology that uses artificial intelligence to generate speech is widely available today. While computer-generated speech has often sounded robotic, the Microsoft Custom Neural Voice solution is changing that. Custom Neural Voice can be trained to generate natural-sounding speech that mimics a person. And not just a fictional person, but a specific individual.
The technology is already in use. At an AT&T retail store in Dallas, the voice of Bugs Bunny converses with customers, offering personalized greetings. In the future, the technology could be trained using specific actors’ voices, leading to applications in filmmaking or television. Should an actor stumble over words or forget lines in a script, the synthesized voice could be used to recreate the appropriate dialogue, thereby reducing the need to repeat and rerecord scenes.
Microsoft Custom Neural Voice could be just the beginning
The ability to imitate a specific individual’s speech takes natural language generation to another level, because each person has a unique prosody (the tone and duration of phonemes, or units of sound). It also underscores the urgency of discussions about responsible AI.
Imagine that a computer, with enough sample data, could be trained to sound like any individual, and to say anything. Similarly, deepfake videos, which use machine learning to generate visual content, can be made to depict individuals doing or saying just about anything, and their quality keeps improving. It doesn’t take a highly creative individual to see how such technology could be used for fraudulent or malicious purposes.
Microsoft says it has considered the implications of Custom Neural Voice and prioritizes responsible use of AI. The company must approve all applications of the technology and has built in safeguards to ensure speakers consent to the use of their voices. But if Microsoft can develop the technology, others are likely not far behind, including companies and individuals that may not have equally strong responsible AI principles or processes in place.
Responsible use of technology is an issue
The new solution highlights an issue that has cast a shadow over AI: responsible and acceptable use. It’s a difficult topic because what is considered acceptable use by one culture may not be acceptable to another – it can even vary from individual to individual. For example, facial recognition has been in the public eye over the past year, with a good number of people in the US and Europe eager to limit its use by law enforcement.
Elsewhere, adoption by law enforcement is seen as an improvement to public safety and creates a greater sense of security. What happens if analysis of facial movements is used to judge whether a person is lying, paying attention in class, or engaged in a meeting?
Technology moves quickly, often faster than society or the regulatory environment. But advances in AI are going to continue, and current events underscore the need for collective conversations on responsible AI that include policy makers, technology vendors, academics, and civil liberties organizations.
Yesterday it was facial recognition, today it’s natural language generation, and tomorrow it may be something entirely different. Now is the time to start having these difficult discussions and to address the need for regulations that guide the use of AI, domestically and internationally.