Today we released Meta Spirit LM, our first open source multimodal language model that freely mixes text and speech. Many existing AI voice experiences today use ASR techniques to process speech before synthesizing with an LLM to generate text, but these approaches compromise the expressive aspects...
10 comments

More details, including links to the research paper, model weights and code 👇

Sounds ass ngl

Non commercial and poor quality speech? Sad 😔

europe

The demo was a bit ...

The study introduces SpiRit-LM, a model that can generate both speech and text. It is based on continuously pre-training a text language model (Llama 2) with a combination of text-only, speech-only, and aligned speech-text datasets. SpiRit-LM performs well on speech and text comprehension tasks, matching or exceeding the performance of previous speech-only and text-only models. It can also learn new tasks in a few-shot setting, both within and across modalities (speech-to-text and text-to-speech). The SpiRit-LM-Expressive version is the first language model that can preserve the sentiment of text and speech prompts both within and across modalities. full paper:

Everything happening at once

this preview seems to lack somewhat

The quality isn't that good.

I love you, but this demo was... well, let's just say it had a rough start! At first, I thought my speakers were broken because there was no sound for a few seconds. Then, when the sound finally kicked in, I was like, "Yep, my speakers are definitely broken!"



