A language is an art form, a mechanism by which culture and tradition are passed on through generations. There are more than 7,000 languages spoken globally, but over 40% of these are endangered, meaning fewer than 1,000 speakers remain. Even more concerning is how they are being lost. As our society increasingly shifts online, how many of these languages will make the same transition? Currently, 60.4% of the top 10 million websites are in English, but English speakers represent only 16% of the global population, according to Ethnologue. If there is a bias toward dominant languages online, does society risk cultural homogenization?
Languages can range from incredibly dominant, for example, English with 1.35 billion speakers globally, to extremely rare, such as the indigenous Nepalese language Dumi with only seven speakers remaining. The loss of a language can represent the loss of cultural identity – if those languages are not present online, then neither is the associated culture. This only increases the already-present societal digital divide.
In 2016, the UN declared that internet access is a human right. However, this means little if only certain languages are found online. For example, a recent study of Google Search results for certain countries found that the US supplied over half of all first-page content. US content is defining what is read by others, creating a digital hegemony.
Language defines the online experience
Since English dominates the internet, large information gaps exist for non-dominant languages, reflected by limited locally generated content. UNESCO has stated that of the 7,000 languages, a mere 5% are online. Limited local content (such as local services) hinders the ability of people to use the internet, produce content, or interact with others. According to a CSA Research study, 76% of consumers prefer purchasing products in their native language. When people are excluded from content on the internet based on their language choice, the digital divide is widened.
Language preservation on social media
Social media should be used to connect speakers of endangered languages. Many rare languages are not documented, and the internet could also be used as a place for cultural preservation. In schools, children are often taught the more dominant languages and learn them through the media they consume. However, social media allows users to determine how and what language they use. And when people use social media, they often write more colloquially or phonetically.
Social media has become a powerful tool to help languages stay alive and increase the connectivity between remaining speakers. In fact, social media is described as a vital tool in the Welsh Government’s plans to help the country reach one million Welsh speakers. The government aims to place the Welsh language at the heart of all technological innovation so that it is available in all digital contexts.
Closing the divide
The internet originated in Silicon Valley, where the first language was English. Big Tech companies are now starting to recognize the digital language divide and are working to increase the number of languages found on their platforms. However, the algorithms used on these platforms require vast amounts of training data and, if this data does not exist, it can be difficult to achieve high levels of translation quality.
Many indigenous languages are spoken by only a few people and often only exist in oral form. However, Meta is improving the accuracy with which its AI algorithms decode and translate 55 marginalized African languages, aiming to boost technological inclusion in a project called No Language Left Behind. Meta’s open-sourced AI model is also being informed by native speakers, helping rare languages get an online presence so marginalized groups can fully participate online. Although the UN has recognized the importance of internet availability, the web presence of all languages is now increasingly being seen as vital to bridge the digital divide.