November 2, 2020 (updated 3 November 2020, 12:48pm)

Facebook’s machine learning translation software raises the stakes

By GlobalData Thematic Research

Facebook has launched a multilingual machine learning translation model. Previous models tended to rely on English data as an intermediary. However, Facebook’s many-to-many software, called M2M-100, can translate directly between any pair of 100 languages. The software is open-source with the model, raw data, training, and evaluation setup available on GitHub.
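Because the model and evaluation setup are open-sourced, developers can experiment with direct, non-English-pivoted translation themselves. The sketch below assumes the Hugging Face `transformers` port of M2M-100 and the `facebook/m2m100_418M` checkpoint hosted on the Hugging Face Hub (the official release on GitHub is fairseq-based, so this is an alternative route, not Facebook's own tooling):

```python
# Minimal sketch: direct French-to-Chinese translation with M2M-100,
# with no English intermediary, via the transformers port.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Set the source language, then force the decoder to start in the target
# language -- any of the 100 supported languages can fill either role.
tokenizer.src_lang = "fr"
encoded = tokenizer("La vie est belle.", return_tensors="pt")
generated = model.generate(
    **encoded, forced_bos_token_id=tokenizer.get_lang_id("zh")
)
result = tokenizer.batch_decode(generated, skip_special_tokens=True)
print(result[0])
```

Swapping `src_lang` and the forced target token is all that is needed to change the language pair, which is the practical upshot of the many-to-many design.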

M2M-100, if it works correctly, provides a functional product with real-world applications, which can be built on by other developers. In a globalized world, accurate translation of a wide variety of languages is vital. It enables accurate communication between different communities, which is essential for multinational businesses. It also allows news articles and social media posts to be rendered faithfully in other languages, reducing instances of misinformation.

Overhyping AI distracts from necessary developments

GlobalData’s recent thematic report on AI suggests that years of bold proclamations by tech companies eager for publicity have resulted in AI becoming overhyped. The reality has often fallen short of the rhetoric. Principal Microsoft researcher Katja Hofmann argues that AI is transitioning to a new phase, in which breakthroughs occur but at a slower rate than previously suggested. The next few years will require practical uses of AI with tangible benefits, applying AI to specific use cases.

M2M-100 provides 2,200 translation combinations across 100 languages without relying on English data as a mediator. Among its main competitors, Amazon Translate and Microsoft Translator both support significantly fewer languages than Facebook. However, Google Translate supports 108 languages, including both living and dead ones, having added five new languages in February 2020.

Google’s and Facebook’s products differ in notable ways. Google uses BookCorpus and English Wikipedia as training data, whereas Facebook analyzes the language of its users. Facebook is, therefore, more suitable for conversational translation, while Google excels at academic-style web page translation. Google performs best when English is the target language, which correlates with the training data used. Facebook’s multi-directional model claims there is no English bias, with translations functioning between 2,200 language pairs. Accurate conversational translations based on real-time data and multiple language pairs can fulfill global business needs, making Facebook a market leader.

Open-source machine learning allows continued innovation

Facebook’s strength in this aspect of AI is unsurprising. GlobalData has given the company a thematic score of 5 out of 5 for machine learning, suggesting that this theme will significantly improve Facebook’s future performance.

However, natural language processing (NLP) can be problematic, with language semantics making it hard for algorithms to provide accurate translations. In 2017, Facebook translated the phrase “good morning” in Arabic, posted on its platform by a Palestinian man, as “attack them” in Hebrew, resulting in the sender’s arrest by Israeli police. The open-source nature of the software will help developers recognize pain points. It also enables innovation, allowing developers to advance multilingual models in the future.

Language translation is a high-profile use case for AI due to its applications in conversational platforms like Amazon’s Alexa, Google’s Assistant, and Apple’s Siri. The tech giants are racing to improve the performance of their virtual assistants. Facebook’s M2M-100 announcement will raise the stakes in AI translation software, pushing the company’s main competitors to respond.

In an interconnected, globalized world, accurate translation is essential. Facebook has used its global community and access to large datasets to progress machine learning and AI, creating a practical, real-world use case. Allowing access to the training data and models propels future developments, moving linguistic machine learning away from a traditionally Anglo-centric model.