Meta’s new speech-to-text and text-to-speech models for 1,100+ languages

Massively Multilingual Speech: Expanding Speech Technology to Over 1,100 Languages

The Massively Multilingual Speech (MMS) project from Meta expands speech technology support from roughly 100 languages to more than 1,100. Its goal is to make information accessible to more people, including those who rely on voice to access it, by enabling machines to recognize and produce speech in many more languages.

Key Features

Supports speech-to-text and text-to-speech for 1,107 languages.
Offers language identification for over 4,000 languages.
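Uses self-supervised learning (building on wav2vec 2.0) and a newly assembled dataset for model training.
According to Meta, achieves lower word error rates than existing multilingual speech recognition models, including OpenAI's Whisper, while covering many more languages.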

Main Use Cases

Enhancing accessibility for individuals who rely on voice to access information.
Preserving endangered languages by making them usable in technology.
Enabling more inclusive communication in various applications, from messaging services to VR/AR technology.

How to Use

Access the models and code on GitHub for research and development purposes.
Use the dataset for training new speech recognition and synthesis models.
Integrate the models into applications that need multilingual speech recognition or speech synthesis.
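
As a rough illustration of the first and third steps above, here is a minimal sketch of transcribing audio with an MMS speech recognition checkpoint through the Hugging Face transformers library. The checkpoint name "facebook/mms-1b-all", the use of transformers rather than the GitHub code, and the helper normalize_lang_code are assumptions made for this example and do not come from the original announcement. It also assumes torch and transformers are installed and that the input is mono audio sampled at 16 kHz.

```python
# Sketch only: assumes the "facebook/mms-1b-all" checkpoint on the Hugging Face Hub
# and a recent transformers release that supports MMS language adapters.

MMS_ASR_CHECKPOINT = "facebook/mms-1b-all"  # assumed checkpoint name
MMS_SAMPLE_RATE = 16_000  # assumed input rate: 16 kHz mono audio


def normalize_lang_code(code: str) -> str:
    """Hypothetical helper: lowercase the three-letter prefix of a language code.

    Accepts plain codes such as 'fra' and codes with a script suffix such as
    'uzb-script_cyrillic'. Raises ValueError for anything else.
    """
    base, sep, suffix = code.strip().partition("-")
    base = base.lower()
    if len(base) != 3 or not (base.isascii() and base.isalpha()):
        raise ValueError(f"expected a three-letter language code, got {code!r}")
    return base + sep + suffix


def transcribe(waveform, lang: str = "eng") -> str:
    """Transcribe a 1-D float waveform (16 kHz) in the given language."""
    # Imported lazily so the helper above works without torch installed.
    import torch
    from transformers import AutoProcessor, Wav2Vec2ForCTC

    lang = normalize_lang_code(lang)
    processor = AutoProcessor.from_pretrained(MMS_ASR_CHECKPOINT)
    model = Wav2Vec2ForCTC.from_pretrained(MMS_ASR_CHECKPOINT)

    # Switch the tokenizer vocabulary and the adapter weights to the target language.
    processor.tokenizer.set_target_lang(lang)
    model.load_adapter(lang)

    inputs = processor(waveform, sampling_rate=MMS_SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy CTC decoding: pick the most likely token at each frame.
    predicted_ids = torch.argmax(logits, dim=-1)[0]
    return processor.decode(predicted_ids)
```

Calling transcribe(waveform, lang="fra") on a French recording would download the checkpoint on first use, so in a real application the processor and model would normally be loaded once and reused across calls.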

User Experience

In Meta's evaluations on benchmark datasets, the MMS models show much broader language coverage and better recognition accuracy than existing models. Meta also analyzed the models for gender and domain bias and reports that they perform comparably for male and female speakers and are not strongly skewed toward the religious language of the training data.
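
Benchmark comparisons of speech recognition models, like the evaluations described above, are commonly reported as word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the model's transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that metric in plain Python. The function name and the whitespace tokenization are choices made for this example, not part of the MMS code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref_words)
```

For example, comparing the reference "the cat sat on the mat" with the transcript "the cat sat on mat" involves one deleted word out of six reference words. Lower values are better, and the value can exceed 1.0 when the transcript contains many inserted words.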

Potential Limitations

The training data consists largely of recorded readings of religious texts such as the New Testament, which may limit the diversity of content the models are exposed to.
The models may still have limitations in handling dialects and specific accents.
There is a risk of mistranscription, which could lead to offensive or inaccurate language output.

The MMS project reflects Meta's stated commitment to advancing speech technology for a more inclusive and linguistically diverse world, and invites the research community to contribute to this ongoing effort.