Concept: Meta has introduced Audio-Visual Hidden Unit BERT (AV-HuBERT), a speech recognition framework that understands speech by analyzing both sound and the movement of the speaker’s lips. Meta claims that AV-HuBERT is 75% more accurate than other audiovisual speech recognition systems trained on the same number of transcriptions.

Nature of Disruption: AV-HuBERT leverages unsupervised, or self-supervised, machine learning (ML). This multimodal framework learns to recognize language from a combination of audio and lip-movement inputs. Supervised learning algorithms, such as those built by DeepMind, are trained on labeled example data until they can determine the underlying correlations between the examples and certain outputs. AV-HuBERT, by contrast, classifies unlabeled data by analyzing it and learning from its inherent structure. Meta claims that the framework can also capture complex correlations between the two data types by merging visual cues, like the movement of the lips and teeth during speech, with audio information. According to Meta, AV-HuBERT recognizes a person’s speech 50% better than audio-only models when loud music or noise is playing in the background. When speech and background noise are equally loud, AV-HuBERT achieves a word error rate (WER) of 3.2%, compared to 25.5% for the previous best multimodal model. Meta also says that AV-HuBERT uses only a tenth of the labeled data, making it potentially useful for languages with limited audio data.
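
For readers unfamiliar with the metric, WER is conventionally computed as the number of word-level substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only; it is not Meta's evaluation code. The function name `word_error_rate` and the simple whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); assumes words are
    separated by whitespace and compared case-sensitively.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"{wer:.1%}")  # prints 16.7%
```

Under this definition, a WER of 3.2% corresponds to roughly three word errors per hundred reference words, against about one in four for a WER of 25.5%.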

Outlook: According to Meta, AV-HuBERT could open new opportunities for building conversational models for low-resource languages, like Susu in the Niger-Congo family, because it requires less labeled data for training. It could also be used to develop speech recognition systems for people with speech impairments, to detect deepfakes, and to generate realistic lip motions for virtual reality avatars. In the future, AV-HuBERT could improve the performance of speech recognition technologies in noisy everyday situations, such as at a party or in a crowded street market. The technique could also benefit smartphone assistants, AR glasses, and smart speakers with cameras.
