Facebook Translator: the 2021 Update

In 2020, Facebook developed an AI model capable of accurately translating between any pair of 100 languages directly, without first translating to English as many existing systems do.

The AI outperforms such English-centric systems. It was also assessed by human evaluators, who scored it as around 90 per cent accurate.
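To make the contrast between direct and English-pivot translation concrete, here is a minimal sketch. The function names and the language codes are illustrative assumptions, not part of Facebook's system; the code only counts translation directions and shows how many hops each approach needs.

```python
def translation_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs among num_languages languages."""
    return num_languages * (num_languages - 1)


def pivot_path(source: str, target: str, pivot: str = "en") -> list[tuple[str, str]]:
    """Hops taken by a hypothetical pivot-based system (illustrative only)."""
    if source == target:
        raise ValueError("source and target must differ")
    # If either side is already the pivot language, a single hop is enough.
    if pivot in (source, target):
        return [(source, target)]
    # Otherwise translate into the pivot first, then out of it.
    return [(source, pivot), (pivot, target)]


def direct_path(source: str, target: str) -> list[tuple[str, str]]:
    """Hops taken by a system that translates between the pair directly."""
    if source == target:
        raise ValueError("source and target must differ")
    return [(source, target)]


if __name__ == "__main__":
    print(translation_directions(100))   # ordered pairs among 100 languages
    print(pivot_path("fr", "zh"))        # two hops through English
    print(direct_path("fr", "zh"))       # one hop
```

With 100 languages there are 9,900 ordered language pairs. A pivot system covers most of them with two hops, so errors from the first hop can carry into the second, which is one reason a direct model may do better.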

Facebook’s system was trained on a data set of 7.5 billion sentence pairs gathered from the web across 100 languages. Training focused on languages that are commonly translated to and from each other, with the languages grouped into 14 separate collections based on geography and cultural similarities. This was done to ensure high-quality translation for the most commonly used language pairs, and to train the model more accurately.

In January 2021, Facebook AI released Multilingual LibriSpeech (MLS), a new large-scale, open source dataset to advance research in automatic speech recognition (ASR).

According to recent blog posts, the English-language portion of MLS is about 47 times larger than the original LibriSpeech, a corpus that contains 1,000 hours of read English speech.

MLS has already shown results: when Facebook AI researchers trained a model on an MLS English subset, they produced a “20 percent improvement in word error rate compared with the same model trained using LibriSpeech data.”
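For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a minimal illustration, not Facebook's evaluation code. The `relative_improvement` helper assumes the quoted 20 percent is a relative reduction, which the quote does not specify, and the sample sentences and WER values are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fractional WER reduction relative to the baseline (assumed reading)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Made-up example: one reference word is dropped by the recognizer.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Made-up WER values chosen only to illustrate a 20 percent relative drop.
    print(relative_improvement(0.10, 0.08))
```

Lower WER is better. Under the relative reading, a model that moved from 10 percent to 8 percent WER would count as a 20 percent improvement.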

Like LibriSpeech, MLS draws its content from public domain audiobooks, which allows Facebook AI to release the data under a non-restrictive license.

The social media giant believes that “MLS will promote open and collaborative research in multilingual ASR and improve speech recognition systems in more languages around the world.”