MVDiffusion: How to take a pre-trained text2image model for a perspective view (e.g., Stable Diffusion) and retrain it to generate multiple consistent views (e.g., a panorama). Project site: Hugging Face demo: Code out in a month.

Yasutaka Furukawa

2,421 subscribers

33,418 views • 3 years ago • via X (Twitter)

Education, Science & Technology

8 comments

Yasutaka Furukawa • 3 years ago

I am very sorry. Typo fixing. Had to delete the old one and retweet.

Yasutaka Furukawa • 3 years ago

I had to delete and repost. Highly appreciate it if you could reshare/retweet this post

Ziyu Wan • 3 years ago

Awesome work!!!!! I wonder where we could find the paper😊

Yasutaka Furukawa • 3 years ago

Thank you Ziyu. Did not realize that we haven't uploaded to arxiv yet... Asking students to upload in a day.

Jake Harrison • 3 years ago

I would add to my newsletter

Philipp Tsipman • 3 years ago

🔥

Asriel H • 3 years ago

does it support image prompt as input or only text prompt?

Yasutaka Furukawa • 3 years ago

Only text for this work. But it seems trivial to add image-prompt capability.

Related videos

Stable Diffusion generates beautiful images, but can it be used for open-world recognition? Try Demo! Our #CVPR2023 paper shows that the pre-trained diffusion model indeed is a good image parser, allows for open-vocabulary segmentation and detection.

Xiaolong Wang

241,225 views • 3 years ago

VideoScene announced on Hugging Face Distilling Video Diffusion Model to Generate 3D Scenes in One Step

AK

21,301 views • 1 year ago

DiffPortrait360 just dropped on Hugging Face Consistent Portrait Diffusion for 360 View Synthesis

AK

57,593 views • 1 year ago

MusicHiFi Fast High-Fidelity Stereo Vocoding Diffusion-based audio and music generation models commonly generate music by constructing an image representation of audio (e.g., a mel-spectrogram) and then converting it to audio using a phase reconstruction model or vocoder.

AK

27,285 views • 2 years ago

New work with Alec Radford and David Duvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:

Nick Levine

1,195,022 views • 1 month ago

MVDream: Multi-view Diffusion for 3D Generation paper page: propose MVDream, a multi-view diffusion model that is able to generate geometrically consistent multi-view images from a given text prompt. By leveraging image diffusion models pre-trained on large-scale web datasets and a multi-view dataset rendered from 3D assets, the resulting multi-view diffusion model can achieve both the generalizability of 2D diffusion and the consistency of 3D data. Such a model can thus be applied as a multi-view prior for 3D generation via Score Distillation Sampling, where it greatly improves the stability of existing 2D-lifting methods by solving the 3D consistency problem. Finally, we show that the multi-view diffusion model can also be fine-tuned under a few shot setting for personalized 3D generation, i.e. DreamBooth3D application, where the consistency can be maintained after learning the subject identity.

AK

294,442 views • 2 years ago

Introducing darkspark, a gui for your neural network. It traces your pytorch code and brings up a visual representation for you to interact with. We have a hosted gallery of popular model architectures pre-traced and ready to explore. Here’s stable-diffusion-v1.5

Rudy Gilman

16,740 views • 1 year ago

Stability AI just dropped Stable Virtual Camera on Hugging Face a generalist diffusion model designed to address the exciting challenge of Novel View Synthesis (NVS). With just one or a few images, it allows you to create a smooth trajectory video from any viewpoint you desire.

AK

133,583 views • 1 year ago

try out the Gradio Demo for AudioLDM: Text-to-Audio Generation with Latent Diffusion Models on Hugging Face demo:

AK

82,137 views • 3 years ago

Introducing Texel Splatting: Perspective-Stable 3D Pixel Art open source paper+code Most 3D pixel art techniques (e.g. t3ssel8r, ProPixelizer) snap pixels to a screen grid, which only works with an orthographic camera Texel splatting solves this for perspective cameras: first,

dylan

111,285 views • 3 months ago

Nvidia presents ConsiStory Training-Free Consistent Text-to-Image Generation paper page: enable Stable Diffusion XL (SDXL) to generate consistent subjects across a series of images, without additional training.

AK

161,685 views • 2 years ago

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents.

Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next. Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results.

You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production.

In detail, you’ll learn:
- How agentic systems have evolved, gaining greater levels of agency over time—and why code agents are a next step.
- How code agents write their actions in code.
- When code agents outperform function-calling agents.
- How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B.
- To trace, debug, and assess the code agent to optimize its behaviours for complex requests.
- How to build a research multi-agent system that can find information online and organize it into an interactive report.

By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

Andrew Ng

124,382 views • 1 year ago

We created a series of simplified notebooks that cover essential aspects of Stable Diffusion, using the vanilla Stable Diffusion 2.1 base to utilise it as a face-editing model for building your own face app 🧵(1/3) Github :

OutofAi

45,856 views • 2 years ago

Check out CAT3D! Image(s)-to-3D in 1 minute! Given any number of real or generated images, CAT3D uses a multi-view diffusion prior to create consistent novel views. These views are used to reconstruct a 3D scene using NeRF/3DGS.

Philipp Henzler

12,737 views • 2 years ago

Excited to share our #CVPR2023 on synthesizing new views along a camera trajectory from a **single image**! How? 💡 The good old epipolar constraints in a pose-guided diffusion model! Paper: Project:

Jia-Bin Huang

94,196 views • 3 years ago

We’re excited to release ACE-Step / ACE-Step-v1-3.5B, a fast, versatile DiT-based foundation model for music generation that runs on consumer-grade GPUs. With its simple architecture and low hardware requirements, it’s easy to fine-tune for various music tasks, empowering, not replacing, artists and creators. Think of it as a step toward music’s Stable Diffusion moment. ※ Trained on authorized, purchased data. Demo Page: Hugging Face: Git repo:

ACE Studio

112,342 views • 1 year ago

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models Gradio demo is out on Hugging Face Spaces demo:

AK

87,679 views • 3 years ago

[1/N] We present a plug-and-play mechanism to controllably steer inference of any diffusion/flow model towards a sharper or flatter sampling distribution, resulting in improvements across domains e.g. text-to-image (10% FID reduction), protein generation (improved designability).

Shubham Tulsiani

60,777 views • 8 months ago

Speed and quality can finally coexist in diffusion-based language generation. Introducing DiDi-Instruct, a Discrete Diffusion Divergence Instruct method that distills a pre-trained discrete diffusion language model (dLLM) into a few-step student for ultra-fast generation. Built on integral KL-divergence minimization, DiDi-Instruct achieves up to 64× faster decoding, surpasses both its teacher and GPT-2, and cuts training time by 20×. Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct Paper: Code: Project: Our report: 📬 #PapersAccepted by Jiqizhixin

机器之心 JIQIZHIXIN

18,126 views • 7 months ago