Загрузка видео...

Не удалось загрузить видео

На главную

LegoGPT, an LLM-based system that generates physically stable LEGO structures from text prompts, backed by a new 47,000+ sample dataset and physics-aware filtering during inference. → LegoGPT is trained on a custom dataset, StableText2Lego, which includes 47,000+ 3D LEGO models mapped to text, spanning 28,000+ unique objects. → The...

75,260 просмотров • 1 год назад •via X (Twitter)

Комментарии: 10

Фото профиля RTTS
RTTS1 год назад

Testing Salesforce presents unique challenges due to its complexity, scalability and customizability. ​ RTTS can plan, design & automate a successful testing process for you.

Фото профиля Justin Obney
Justin Obney1 год назад

This is dope. Check out this exploration I did with my kids.

Фото профиля Rohan Paul
Rohan Paul1 год назад

cool.. 👍

Фото профиля Sanskar Pandey
Sanskar Pandey1 год назад

the logical conclusion to NLP is its intersection with robotics @ruhzi57

Фото профиля Jacek (Jomsborg.eth)
Jacek (Jomsborg.eth)1 год назад

L(L)M. We should care more about what cannot be expressed through language. All rest is like LEGO.

Фото профиля ✨
1 год назад

@_mcbench irl

Фото профиля Jack Lau
Jack Lau1 год назад

Can't wait to see what others come up with using it.

Фото профиля Varun K | AI Insights
Varun K | AI Insights1 год назад

LegoGPT actually sounds like the future of playtime! 47k+ models, physics checks, AND it’s tested with real humans and robots building the stuff?? according to Tom's Hardware, it nails physical stability 98% of the time. gonna try this out for my next LEGO binge lol

Фото профиля Vinayak
Vinayak1 год назад

Damn it's sooo cool, I wanna work on this amazing stuff one day!

Фото профиля cryptobiot
cryptobiot1 год назад

don't tell me chatgpt is now taking my childhood lego master builder dream job, too

Похожие видео

We’re excited to announce the release and open-source of HunyuanImage 3.0 — the largest and most powerful open-source text-to-image model to date, with over 80 billion total parameters, of which 13 billion are activated per token during inference.The effect is completely comparable to the industry’s flagship closed-source model.🚀🚀🚀 HunyuanImage 3.0 originates from our internally developed native multimodal large language model, with fine-tuning and post-training focused on text-to-image generation. This unique foundation gives the model a powerful set of capabilities: ✅Reason with world knowledge ✅Understand complex, thousand-word prompts ✅Generate precise text within images Different from traditional DiT architecture image generation models, HunyuanImage 3.0’s MoE architecture uses a Transfusion-based approach to deeply couple Diffusion and LLM training for a single, powerful system. Built on Hunyuan-A13B, HunyuanImage 3.0 was trained on a massive dataset: 5 billion image-text pairs, video frames, interleaved image-text data, and 6 trillion tokens of text corpora. This hybrid training across multimodal generation, understanding, and LLM capabilities allows the model to seamlessly integrate multiple tasks. Whether you're an illustrator, designer, or creator, this is built to slash your workflow from hours to minutes. HunyuanImage 3.0 can generate intricate text, detailed comics, expressive emojis, and lively, engaging illustrations for educational content. The current release focuses solely on text-to-image generation and future updates will include image-to-image, image editing, multi-turn interaction, and more. 👉🏻Try it now: 🔗GitHub: 🤗Hugging Face:

Tencent Hy

412,616 просмотров • 9 месяцев назад

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model.🎨 ✨ New in 2.1: 🔹Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image. 🔹Precise Chinese and English Text Rendering with seamless image–text integration: The model naturally integrates text into images, making it suitable for a wide range of applications such as product covers, illustrations, and poster design to meet the needs of various fields. 🔹Rich Styles and High Aesthetic: Capable of generating images in various styles—including photorealistic portraits, comics, and vinyl figures—it delivers outstanding visual appeal and artistic quality. 🔹High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image. HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve the model's image and text alignment capabilities, and a multi-language character-aware encoder to improve text rendering capabilities. The model is a single- and double-stream diffusion transformer with 17B parameters. We've also open-sourced the weights of the the accelerated version with meanflow which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation. Now, creators turn complex ideas—like posters with slogans or multi-panel comics—into visuals faster than ever. We’re just getting started. Stay tuned for our native multimodal image generation model coming soon. 🌐Website: 🔗Github: 🤗Hugging Face: ✨Hugging Face Demo:

Tencent Hy

89,257 просмотров • 9 месяцев назад

🇨🇳 Another great Chinese Model, OmniHuman-1.5 from ByteDance Turns 1 image plus a voice track into expressive avatar video by pairing a System 1 and System 2 inspired planner with a Diffusion Transformer, Produces coherent motion for over 1 minute with moving camera and multi character scenes. Most avatar models move to the beat of the audio but miss meaning, so gestures feel generic and emotions feel shallow. The fix here is a Multimodal LLM planner that listens to the speech and drafts a structured plan describing intent, emotions, beats, and high level actions, which gives the motion engine clear semantic targets instead of only rhythm. The motion engine is a Multimodal Diffusion Transformer that fuses the plan with audio, the single reference image, and optional text prompts, then synthesizes continuous body, face, and head motion that matches both words and tone. A key trick is a Pseudo Last Frame, a synthetic target that summarizes the next expected state, which stabilizes fusion across modalities and keeps motion consistent over long spans. From just 1 image and speech, the system outputs speaking avatars with synchronized lips, context aware gestures, and continuous camera movement, and it also supports multi character interactions without manual choreography. Reported results show strong lip sync accuracy, high video quality, natural motion, and close match to text prompts, and the same setup works on nonhuman characters too.

Rohan Paul

63,859 просмотров • 10 месяцев назад

Today is a good day for open science. As part of our continued commitment to the growth and development of an open ecosystem, today at Meta FAIR we’re announcing four new publicly available AI models and additional research artifacts to inspire innovation in the community and help advance AI in a responsible way. More in the video from Joelle Pineau. What we’re releasing: 🦎 Meta Chameleon 7B & 34B language models that support mixed-modal input and text-only outputs. 🪙 Meta Multi-Token Prediction Pretrained Language Models for code completion using Multi-Token Prediction. 🎼 Meta JASCO Generative text-to-music models capable of accepting various conditioning inputs for greater controllability. Paper available today with a pretrained model coming soon. 🗣️ Meta AudioSeal An audio watermarking model that we believe is the first designed specifically for the localized detection of AI-generated speech, available under a commercial license. 📝 Additional RAI artifacts Including research, data and code to measure and improve the representation of geographical and cultural preferences and diversity in AI systems. We believe that access to state-of-the-art AI creates opportunities for everyone – not just a small handful of Big Tech companies. We’re excited to share this work and to see how the community learns, iterates and builds using this technology. Details and access to everything released by FAIR today ➡️

AI at Meta

380,714 просмотров • 2 лет назад