🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

Lossfunk

17,288 subscribers

1,261,523 views • 3 months ago • via X (Twitter)

Education · Science and Technology
