This post was originally published by Carl Franzen on VentureBeat.
Meta has just released a new multilingual automatic speech recognition (ASR) system supporting 1,600+ languages — dwarfing OpenAI’s open source Whisper model, which supports just 99.
Its architecture also allows developers to extend that support to thousands more. Through a feature called zero-shot in-context learning, users can provide a few paired examples of audio and text in a new language at inference time, enabling the model to transcribe additional utterances in that language without any retraining.
In practice, this expands potential coverage to more than 5,400 languages — roughly every spoken language with a known script.
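To make the in-context learning flow concrete, here is a minimal sketch of what such an inference call could look like. The `Example` class, `transcribe_in_context` function, and the `context=` parameter are illustrative assumptions, not Meta's published API; consult the released code for the actual interface.

```python
# Hypothetical sketch of zero-shot in-context ASR (not Meta's actual API).
# The idea: pass a handful of (audio, transcript) pairs for an unseen
# language alongside the target audio, and the model conditions on them
# at inference time -- no fine-tuning or retraining involved.

from dataclasses import dataclass

@dataclass
class Example:
    audio_path: str   # short utterance in the new language
    transcript: str   # its ground-truth transcription in the language's script

def transcribe_in_context(model, target_audio: str, examples: list[Example]) -> str:
    """Build an in-context prompt from paired examples, then decode."""
    # Each (audio, text) pair is prepended to the target utterance, so the
    # decoder picks up the language's sound-to-script mapping from context
    # alone, with no weight updates.
    context = [(ex.audio_path, ex.transcript) for ex in examples]
    return model.transcribe(target_audio, context=context)  # assumed method

# Usage: a few paired utterances in a previously unsupported language.
examples = [
    Example("utt1.wav", "first transcribed sentence"),
    Example("utt2.wav", "second transcribed sentence"),
]
# text = transcribe_in_context(model, "new_utterance.wav", examples)
```

The key design point the article describes is that the examples are consumed at inference time, the same way a text LLM uses few-shot prompts, which is what lets coverage grow without shipping a new model.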
It’s a shift from static models with fixed language lists toward flexible systems that can be extended to new languages at inference time.