Alexa, Siri and Cortana may sound like the top three hipster baby names in 2021, but they are actually Amazon, Apple and Microsoft’s virtual assistants. In recent years, we have experienced a boom in speech recognition tools that understand what we are saying. And soon they could also understand how we are feeling.
The list of companies developing emotion recognition technology is growing fast, and investors appear keen on emotionally intelligent tech.
The industry is undoubtedly booming, with estimates predicting that the global emotional intelligence market will grow to $64m by 2027.
The most common form of emotion detection software uses cameras to record and analyse facial expressions, body movements and gestures to detect how people are feeling.
But in recent years, interest has grown in speech emotion recognition technology: software that claims to divine a person’s emotional state from physiological changes in their speech patterns.
And the tech is increasingly being used in the wild. Many call centres already use speech emotion detection technology. If a computer notices that the person on the other end of the phone is angry, it may automatically redirect the customer to an operator who is specialised in calming people down.
The makers of smart assistants such as Amazon and Google are also experimenting with artificial intelligence (AI) that can scan people’s voices for signs of emotions. New technology suggests that smart home assistants such as Alexa and Google Home may soon be able to understand humanity’s wants and needs better than ever before – and, realistically, use that to sell us things more effectively.
Mood playlists made just for you
Music streaming service Spotify is one of the latest prominent tech names to aim for an emotional upgrade. The company recently secured a patent covering technology that analyses users’ voices and background noise to make inferences about their emotional state, gender, age and accent.
According to the patent filing, the technology can extract “intonation, stress, rhythm, and the likes of units of speech” that would allow the “emotional state of a speaker to be detected and categorised.” Combined with other data from the user’s listening history and past requests, Spotify could then recommend more personalised, mood-appropriate music.
Say the platform notices that you are stressed: it might suggest some calming meditation music. Similarly, it might play upbeat music to keep you awake if it detects that you are dozing off while driving. Convenient, right? But what are the trade-offs, and are they worth it?
Privacy activists think not. In May, Access Now, Fight for the Future, Union of Musicians and Allied Workers and a coalition of over 180 musicians and human rights organisations sent a letter to Spotify calling on the company to make a public commitment never to use, license, sell or monetise this new speech emotion recognition technology, citing privacy and human rights violations.
Emotion recognition software rests on the assumption that a person’s emotional state can be readily inferred from their facial movements or speech patterns. This assumption alone is highly contested. Moreover, many activists argue that, even if emotions could be deduced from a person’s face or voice, the current technology does not actually work.
This is (probably) how it works
First, any approach to automatic emotion recognition requires an appropriate model for representing emotion. This raises two main questions: how to represent emotions per se, and how best to fit them into a quantifiable framework.
Most studies concerning emotion detection technology are based on Paul Ekman’s seminal theory. In the 1960s, Ekman travelled to Papua New Guinea to conduct a series of experiments to prove that all humans – regardless of culture, gender, geography or circumstances – exhibit the same set of six universal emotions: fear, anger, joy, sadness, disgust and surprise.
However, more recent studies have challenged Ekman’s theory, showing that people express emotions differently across cultures and situations, and even from person to person within a single situation.
Opponents of emotion recognition technology have therefore pointed out that its scientific basis is inconclusive at best and outright wrong at worst.
“Technology that assumes that it can determine people’s emotional state is going to be inherently biased,” Caitlin Seeley George, campaign director and project manager at digital rights group Fight for the Future, tells Verdict.
“It’s assuming that there is one normal, quote-unquote, version of what people’s emotions sound like and that’s just not realistic. Emotions obviously vary from individual to individual. They vary within cultures. It’s problematic to have this idea of a baseline of what and how emotions are determined.”
Moreover, Ekman’s theory focuses on facial expressions, so speech emotion recognition ventures into even more uncharted territory. There is no widely accepted scientific study establishing a correlation between a person’s voice and their emotional state. In other words, very little is known about the scientific grounding of the technology currently on the market.
“It’s a general issue that we don’t know what theories are used and what theories it’s based on. We don’t have transparency,” Daniel Leufer, Europe policy analyst at global human and digital rights organisation Access Now, tells Verdict.
Most speech emotion theories assume that emotional states trigger changes in the autonomic nervous system that can be detected in a person’s speech, and that affective technologies can leverage this information to recognise emotion.
For example, speech produced in a state of fear, anger or joy tends to be fast, loud and precisely enunciated, with a higher pitch and a wider pitch range, whereas tiredness, boredom or sadness tend to produce slow, low-pitched and slurred speech. Some emotions, such as anger or approval, have proved easier to identify computationally than others.
Then, of course, there is the issue of quantification. Even if we could establish that all humans express emotions in the same way, those expressions would still need to be encoded in a model that computers can process. Experts have repeatedly pointed out that AI leaves room for bias when the training data is biased.
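To make the acoustic cues above more concrete, here is a deliberately simplistic Python sketch. It is not how Spotify, Nemesysco or any other vendor’s system works. It measures only two of the cues mentioned, loudness (as root-mean-square amplitude) and pitch (via a crude autocorrelation estimate), and applies a hypothetical threshold rule: loud, high-pitched audio is labelled “high arousal” (the fear/anger/joy group), and everything else “low arousal”. The function names, the synthetic sine-wave “voices” and the 180 Hz and 0.3 thresholds are all invented for illustration; speaking rate and pitch range are left out entirely.

```python
import math


def rms_energy(samples):
    """Root-mean-square amplitude: a rough stand-in for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Crude pitch estimate in Hz via autocorrelation.

    Tries every lag between the periods of fmax and fmin and returns the
    frequency of the lag at which the signal best matches a shifted copy
    of itself.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


def arousal_guess(samples, sample_rate, pitch_hz=180.0, loudness=0.3):
    """Hypothetical rule: loud AND high-pitched -> 'high arousal', else 'low arousal'.

    Both thresholds are invented for illustration, not taken from any real system.
    """
    if estimate_pitch(samples, sample_rate) > pitch_hz and rms_energy(samples) > loudness:
        return "high arousal"
    return "low arousal"


def sine(freq, amplitude, sample_rate=8000, duration=0.1):
    """Synthetic tone standing in for a short voiced speech frame."""
    n = int(sample_rate * duration)
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    excited = sine(220, 0.8)  # higher pitch, louder
    subdued = sine(120, 0.2)  # lower pitch, quieter
    print(arousal_guess(excited, 8000))  # high arousal
    print(arousal_guess(subdued, 8000))  # low arousal
```

Even this toy exposes the assumption critics object to: a speaker with a naturally high or loud voice would be labelled “high arousal” regardless of how they actually feel.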
“AI is a trained model, and the trained model is biased by the person who trained it,” Amir Liberman, CEO of the Israel-based speech emotion recognition company Nemesysco, tells Verdict.
Another issue is the high degree of subjectivity involved in human emotions, which cannot simply be captured in data.
In relation to Spotify’s patent filing, George points out: “The patent also says that it can determine gender, which is also a discriminatory issue because it’s impossible to infer gender without making assumptions or discriminating against transgender and non-binary people, who just don’t fit into specific gender stereotypes around what their voice sounds like.”
In the case of Spotify, a biased or flawed algorithm may end up playing the wrong songs because it is making incorrect inferences about listeners’ supposed emotions, gender, race, etc. Annoying, no doubt, but arguably not the end of the world.
Yet the stakes are much higher in areas such as education, policing, hiring and border control, where emotion recognition software is increasingly being deployed.
Big Brother is listening
Bias, however, is not the only problem. Speech emotion recognition is arguably a natural progression of speech recognition technology, which is itself hotly debated.
Activists and scholars point out that having devices that are listening to and recording all your conversations is a serious infringement of people’s freedom and privacy.
These devices don’t respond until you say the wake word; saying “Alexa”, for instance, wakes Amazon’s smart assistant. However, that doesn’t mean they aren’t listening. In fact, smart assistants constantly monitor the sound around them so that they can catch that cue, a fact that many people find creepy.
When it comes to emotion detection, the problem arguably goes even deeper. It touches upon not only issues of privacy but also the fundamental human right of freedom of expression.
“The thing with privacy is that it’s not an absolute right. There are sanctioned cases where there can be interference with someone’s privacy. […] freedom, of course, is an absolute right and it admits no interference whatsoever,” Leufer explains.
The knowledge that a device in the room is not only constantly listening in on you but also making inferences about your emotional state, gender, race, etc., may change how we behave on a daily basis and thus fundamentally alter the notion of freedom of expression.
“It’s an argument that hasn’t been used much,” Leufer adds. “Most of the opposition to AI systems have focused on data protection and privacy, but this is something that people are now considering, that, actually, this might be a really serious violation of human rights.”
Proponents of speech emotion recognition technology argue that it can help machines produce more appropriate responses in human-machine interactions.
Voice assistants such as Amazon’s Alexa and Apple’s Siri rely on natural language understanding to interpret spoken words more accurately. Emotion recognition aims to improve on that accuracy by inferring and interpreting the speaker’s emotional state.
A commonly cited positive use case for this kind of technology is in the health tech industry, notably to help people with disabilities.
Israeli company Voiceitt, for instance, has developed an AI-powered speech recognition app for people with speech impairments. CEO and co-founder Danny Weissberg explains that the app “translates atypical speech to allow users to communicate in their own voice with loved ones, caretakers and others.”
“A person with atypical speech can utilise Alexa to turn on and off lights, play music, turn on the television, etc.,” says Weissberg, adding that the aim is to give Voiceitt users “newfound independence, helping improve their quality of life.”
Meanwhile, for Liberman, it is all about getting to “the truth”.
“Humans are very bad at detecting other people’s emotions,” he says, arguing that technology can pick up on the slightest physiological changes in speech patterns, changes that are imperceptible to the human ear.
Whether or not that is true, it is worth asking whether machines should ever be in charge of making inferences about people’s emotions. According to Access Now and Fight for the Future, they should not.
“When it comes to emotional recognition technology, we don’t think this should be used, and we think it should not be developed,” George argues.
Similarly, Leufer maintains that “certain uses of AI systems” can never be permitted.
“There’s no amount of de-biasing or improvement of accuracy or legal safeguards. Some applications of AI are just such a deep violation of human rights that they have to be prohibited,” Leufer says.
I hear your concerns
So, where do we go from here? Organisations such as Access Now and Fight for the Future agree that laws need to be implemented to safeguard consumers’ privacy and freedom.
Earlier this year, the European Data Protection Board called for a ban on the use of AI for automated recognition of human features, including emotion detection, citing concerns over discrimination and “risks to fundamental rights”.
Faster policy implementation is also important. “We know that our legislators often work very slowly, and so that’s one piece of the problem,” George emphasises.
In addition, Leufer points out that it is crucial to look at the political climate in each specific region. For instance, in Europe, there has been a strong push from activists to implement stricter legal foundations. In Latin America and Southeast Asia, the focus is largely on litigation. In North America, policy is important, but so is talking to the actual companies that use emotion recognition tools.
George also emphasises that companies should be held responsible for the technology they invent and implement.
As for Spotify, it has secured its hotly contested patent but has not said whether it will use it. After all, it is quite common for companies to acquire patents that end up just sitting on a shelf.
However, that is not enough for Access Now, Fight for the Future and the coalition of activists. They want the company to pledge publicly that it will not use the technology, which it has not done. The organisations set Spotify a deadline of 18 May to make this promise, and it failed to meet it.
“After the deadline, we responded and put out more public pressure, and they still have not responded to it, so we are kind of at a holding point right now in talking with some of the artists we’ve been working with and some of the other organisations that we’ve been partnering with to figure out what our next move is here,” George says.
The organisations are also exploring whether legislation could stop Spotify from using this technology. Spotify did not respond to Verdict’s requests for comment.
George also points out that this is a risk for many smaller artists who depend on Spotify for income. The idea of artists boycotting Spotify has been floated, but for most, that is not an option.
“There are a lot of smaller musicians that have signed on to the letter that are honestly just worried about how this will impact their career, and the fact that they signed on to this letter already shows that they’re going out on a limb,” she adds.
“It’s frustrating because more than 180 artists signed on to this letter showing that this is a concern for the people who are in partnership with Spotify to share their music, but we also know that Spotify is historically not a great partner for musicians and how they pay them and how they treat them. So I guess in some ways, it shouldn’t be totally surprising that they aren’t listening to them on this issue. It’s definitely an insight into how Spotify prioritises their profits over not only users but also the artists that they claim to be promoting.”