merve

@mervenoyann • 88,913 subscribers

(mer-veh) open-sourceress at @huggingface 🧙🏻‍♀️ DM me for any feedback about HF 🤗 https://t.co/MhrMkGTm7p






Shorts

RF-DETR just landed in Hugging Face transformers 🥵🔥 sota real-time detection & segmentation models by Roboflow 💜
> play with our real-time demo
> fine-tune the models on your use case with our tutorials (takes a toaster's VRAM)
> or just hand them to your agents 😄

this is the BEST vision language model I have ever tried! Aria is a new model by Rhymes.AI: a 25.3B multimodal model that can take image/video inputs 🤩 They release the model with Apache-2.0 license and fine-tuning scripts as well 👏 I tested it extensively, keep reading to learn more 🧶

Real-time DEtection Transformer (RT-DETR) landed in Hugging Face transformers 🤩 with Apache 2.0 license 😍 do DETRs Beat YOLOs on Real-time Object Detection? keep reading 👀

ViTPose, the best open-source pose estimation model, just landed in Hugging Face transformers 🕺🏻💃🏻 See how to use it in the next one ⤵️

Meta released LongVU: a new video LM that can handle long videos (great performance, battle-tested by me ⚔) TLDR;
1️⃣ downsample using DINOv2 to eliminate redundant scenes 🦖
2️⃣ fuse rest of the features using DINOv2 and SigLIP
3️⃣ select some tokens, pass to Qwen2/Llama-3.2-3B
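a rough sketch of what step 1️⃣ could look like: drop frames whose features are nearly identical to the last frame that was kept. this is not LongVU's actual code. the toy vectors stand in for DINOv2 embeddings, and `drop_redundant_frames` and its `threshold` value are hypothetical names and numbers picked for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drop_redundant_frames(features, threshold=0.95):
    """Return indices of frames to keep (hypothetical helper).

    A frame is skipped when its similarity to the most recently kept
    frame is at or above `threshold`, i.e. the scene barely changed.
    """
    kept = []
    for i, feat in enumerate(features):
        if not kept or cosine(features[kept[-1]], feat) < threshold:
            kept.append(i)
    return kept


# Toy per-frame features standing in for DINOv2 embeddings (made-up values).
frames = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],  # near-duplicate of frame 0
    [0.0, 1.0, 0.0],    # scene change
    [0.0, 0.98, 0.1],   # near-duplicate of frame 2
    [0.0, 0.0, 1.0],    # scene change
]

print(drop_redundant_frames(frames))  # [0, 2, 4]
```

comparing against the last kept frame (instead of the previous frame) keeps slow drifts from slipping through one tiny step at a time.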

OlmOCR is a new drop by Ai2 to parse any PDF 📝🤝 I have fed one of my old master's notes and it did a great job 💗 It is based on Qwen2VL-7B and works out of the box with transformers, has Apache 2.0 license 🔥

New InternVL drop with a sota 78B model with MIT license 🔥 The release comes with seven new vision LMs based on InternViT 300M/6B and Qwen2.5 and InternLM2 in different sizes ✨ The 78B model combines InternViT 6B and Qwen2.5-72B Instruct and can accomplish a variety of tasks 👏

Spaces at Hugging Face is the app store of AI 📱 it's also the MCP store now 🤠 filter thousands of MCPs you can attach to your LLM 🤗

many parts of Hugging Face Hub are actually powered by open machine learning models 🥹 translation feature is one of them, it uses a very tiny (600M) multilingual translation model by AI at Meta 💗

Aya by Cohere For AI can now see! 👀 C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B 🌱 works on 8 languages! 🗣️ The authors extend Llava dataset using Aya's translation capabilities with 558k examples! works very well ⬇️

Videos


Inkling Thinking Machines 1-bit quant by Unsloth AI running at 30-40 TPS 🤯 video is sped up, feel free to jump to the end to check outputs. also the llama.cpp webui is 😍 it has a reasoning slider, html preview, mcp and multimodal support

DiffusionGemma is out 🔥 it's compute-bound so 4x faster compared to other Gemma-4 models (1k tok/s on H100) 💨 also great on coding, generate and iterate on any code from 3D generation to front-end ⤵️

new open-source Bonsai models are out 🔥
> ternary weights in 8B (1.75 GB), 4B (0.86 GB), and 1.7B (0.37 GB)
> comes in MLX, ONNX weights and WebGPU browser demo 😍
> Apache 2.0 licensed 👏
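quick sanity check on those file sizes: a ternary weight needs at least log2(3) ≈ 1.58 bits, so the quoted sizes should land a bit above that per weight. the sketch below assumes the parameter counts are exactly the nominal 8B / 4B / 1.7B and that 1 GB means 10**9 bytes. neither is stated in the post, so treat the result as a ballpark.

```python
import math

# Sizes quoted in the post. Parameter counts are taken from the model
# names (assumption) and 1 GB is treated as 10**9 bytes (assumption).
MODELS = {
    "8B": (8e9, 1.75),
    "4B": (4e9, 0.86),
    "1.7B": (1.7e9, 0.37),
}


def bits_per_weight(num_params, size_gb):
    """Average storage cost per weight, in bits."""
    return size_gb * 1e9 * 8 / num_params


# Information-theoretic floor for a three-valued (ternary) weight.
TERNARY_FLOOR = math.log2(3)

for name, (num_params, size_gb) in MODELS.items():
    bpw = bits_per_weight(num_params, size_gb)
    print(f"{name}: {bpw:.2f} bits/weight (floor {TERNARY_FLOOR:.2f})")
```

under these assumptions all three come out at roughly 1.7 bits per weight, which is in line with ternary weights plus some overhead for things like scales or non-quantized layers.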

Microsoft released a groundbreaking model that can be used for web automation, with MIT license 🔥👏 OmniParser is a state-of-the-art UI parsing/understanding model that outperforms GPT4V in parsing. 👏

SAM3 is so good even cat playing didgeridoo is within distribution 🤯😂

real-time vision in your browser 🔥 try out YOLO26 for pose estimation and detection built on WebGPU ⚡️

Google released MedGemma at I/O'25 👏
> 4B and 27B instruction fine-tuned vision LMs and a 4B pre-trained vision LM for medicine
> available with transformers from the get-go 🤗
they also released a cool demo for scan reading ⤵️

New 🤗 transformers release includes a very powerful Multimodal Large Language Model (MLLM) by Microsoft called KOSMOS-2! 🤩 The highlight of KOSMOS-2 is grounding, the model is *incredibly* accurate! 🌎 Play with the demo here 👉 But how does this model work? Let's take a look! 👀🧶

Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥
> Three new models: 3B, 10B, 28B with res 224, 448 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯

MatchAnything is an insane framework: the authors have tried to get every view that they can and feed them to modern keypoint matching models. for instance, you can match an iPhone map view to a Google aerial view, or thermal camera views to day views, even if the images are warped!

agents are now running Hugging Face dataset viewers 🤯

Lyria by Google DeepMind is next level. it did an average Turkish pop song very well, all my Turkish friends in the conference hall were summoned lol

Chameleon 🦎 by Meta is now available in Hugging Face transformers 😍 A multimodal model that comes in 7B and 34B sizes 🤩 But what makes this model so special? keep reading ⇣

JUST IN: Inference Providers at Hugging Face Hub. Use fal, SambaNova, Together AI and Replicate on Hugging Face to run inference on gigantic models like DeepSeek R1, from the model page or through our client SDKs (Python, JS) or HTTP calls

run AI agents with a single CLI command 🤯 you can run tool-calling AI agents or web automation agents through the CLI in smolagents. get started as easily as $ webagent {prompt} in the CLI. tool-calling agent and more details in the next one ⬇️

icymi Hugging Face dropped a computer use agent last week 🔥 built on various Qwen3-VL models & E2B sandbox, ask the app to do anything 🙌🏻 it exposes each thinking step to you, try different models with a neat UI🤩

y'all know that Hugging Face Spaces is the app store of AI. what you don't know is that all these apps are MCP servers, thanks to Gradio MCP server 😮 plug them into your favorite provider 🤠 insanely powerful!

Idefics3-Llama is out! 💥 It's a multimodal model based on Llama 3.1 that accepts an arbitrary number of interleaved images with text, with a huge context window (10k tokens!) 😍 Link to demo and model in the next one 😏

start agents with giants like Llama 4, with only one line of code 🔥 Hugging Face Inference Providers 🤝 smolagents
