Today we released Meta Spirit LM, our first open source multimodal language model that freely mixes text and speech. Many existing AI voice experiences today use automatic speech recognition (ASR) techniques to process speech before synthesizing with an LLM to generate text, but these approaches compromise the expressive aspects...

351,674 views • 1 year ago • via X (Twitter)

10 comments

AI at Meta • 1 year ago

More details, including links to the research paper, model weights and code 👇

$@®+#@|= • 1 year ago

Sounds ass ngl

floating point • 1 year ago

Non commercial and poor quality speech? Sad 😔

Leocifer • 1 year ago

europe

Tech Dev Notes • 1 year ago

The demo was a bit ...

BensenHsu • 1 year ago

The study introduces SpiRit-LM, a model that can generate both speech and text. It is based on continuously pre-training a text language model (Llama 2) with a combination of text-only, speech-only, and aligned speech-text datasets. SpiRit-LM performs well on speech and text comprehension tasks, matching or exceeding the performance of previous speech-only and text-only models. It can also learn new tasks in a few-shot setting, both within and across modalities (speech-to-text and text-to-speech). The SpiRit-LM-Expressive version is the first language model that can preserve the sentiment of text and speech prompts both within and across modalities. full paper:

$Q*🍓on Ethereum • 1 year ago

Everything happening at once

Hamza • 1 year ago

this preview seems to lack somewhat

Risphere • 1 year ago

The quality isn't that good.

Qual • 1 year ago

I love you, but this demo was... well, let's just say it had a rough start! At first, I thought my speakers were broken because there was no sound for a few seconds. Then, when the sound finally kicked in, I was like, "Yep, my speakers are definitely broken!"