We’re open sourcing the first document OCR benchmark for the agentic era, ParseBench.

Document parsing is the foundation of every AI agent that works with real-world files. ParseBench is a benchmark that measures parsing quality specifically for agent knowledge work:

✅ It optimizes for semantic correctness (instead of exact... similarity)
✅ It has the most comprehensive distribution of real-world enterprise documents

It contains ~2,000 human-verified enterprise document pages with 167,000+ test rules across five dimensions that matter most: tables, charts, content faithfulness, semantic formatting, and visual grounding.

We benchmarked 14 known document parsers on ParseBench, from frontier/OSS VLMs to specialized parsers to LlamaParse. Here are some of our findings:

💡 Increasing compute budget yields diminishing returns - Gemini/gpt-5-mini/haiku gain 3-5 points from minimal to high thinking, at 4x the cost.
💡 Charts are the most polarizing dimension for evaluation. Most specialized parsers score below 6%, while some VLM-based parsers do a bit better.
💡 VLMs are great at visual understanding but terrible at layout extraction. GPT-5-mini/haiku score below 10% on our visual grounding task, while all specialized parsers do much better.
💡 No method crushes all 5 dimensions at once, but LlamaParse achieves the highest overall score at 84.9%, and is the leader in 4 out of the 5 dimensions.

This is by far the deepest technical work that we’ve published as a company. I would encourage you to start with our blog and explore our links to Hugging Face and GitHub. All the details are in our full 35-page (!!) ArXiv whitepaper.

🌐: Blog: 📄 Paper: 💻 Code: 📊 Dataset: 🎥 YouTube:
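To make the "semantic correctness instead of exact similarity" idea concrete, here is a minimal Python sketch of a rule-style check on a parsed table. It is an illustration only, not ParseBench's actual rule format or scoring code: the helper names (`normalize`, `parse_markdown_table`, `cell_rule_passes`) and the sample tables are hypothetical, and it assumes the parser emits simple Markdown pipe tables.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting noise is ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def parse_markdown_table(md: str) -> list[list[str]]:
    """Split a simple pipe table into rows of normalized cells."""
    rows = []
    for line in md.strip().splitlines():
        cells = [normalize(c) for c in line.strip().strip("|").split("|")]
        # Skip the header separator row (e.g. |---|:--|--:|).
        if all(set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows


def cell_rule_passes(md: str, row_label: str, column: str, expected: str) -> bool:
    """Hypothetical rule: the cell at (row_label, column) equals expected."""
    rows = parse_markdown_table(md)
    if not rows:
        return False
    header, body = rows[0], rows[1:]
    if normalize(column) not in header:
        return False
    col = header.index(normalize(column))
    for row in body:
        if row and row[0] == normalize(row_label) and col < len(row):
            return row[col] == normalize(expected)
    return False


# Two outputs that differ character-by-character but carry the same content.
output_a = "| Quarter | Revenue |\n|---|---|\n| Q1 | $1.2M |"
output_b = "|Quarter|Revenue|\n|:--|--:|\n|  Q1  |  $1.2M |"

print(output_a == output_b)
print(cell_rule_passes(output_a, "Q1", "Revenue", "$1.2M"))
print(cell_rule_passes(output_b, "Q1", "Revenue", "$1.2M"))
```

An exact string comparison treats the two outputs as different, while the rule passes for both because it only asks whether the value an agent would need is present in the right cell.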

Jerry Liu

79,461 subscribers

108,011 views • 3 months ago • via X (Twitter)

Education · News & Politics · Science & Technology

Related videos

Document OCR benchmarks are still an open problem

Existing document OCR benchmarks are either too narrowly focused on a specific type (e.g. FinTabNet, ChartQA), or on documents that aren’t reflective of real-world tasks (e.g. OmniDocBench, OlmOCR-bench over academic papers).

ParseBench is a step towards solving this problem.
* It tries to comprehensively cover real-world document distributions within the enterprise.
* It contains comprehensive evaluations across 5 different dimensions (tables, charts, content faithfulness, formatting, grounding).
* It tries to use metrics that optimize for agent semantic understanding rather than structural similarity.

We released this yesterday, and there’s a TON of content:
1. Whitepaper
2. HF dataset
3. Github repo
4. Blog
5. Video

And today, we’re excited to feature our home page website for ParseBench 💫 come check it out!

Take a look at some of our other materials if you’re interested: Blog: Paper:

Jerry Liu

21,657 views • 3 months ago

We’re excited to officially launch LlamaParse, the first genAI-native document parsing solution. Not only is it better at parsing out images/tables/charts 📊📈 than virtually every other parser, it is now steerable through natural language instructions - output the document in whatever format you desire! It is also the only parsing solution that seamlessly allows you to build accurate RAG over complex documents, free of hallucinations 🔥 We launched it in private preview a few weeks ago and hit 2k users, 1M total PDF pages parsed. And now it’s better than ever. LlamaParse contains the following killer features: ✅ SOTA table/chart extraction ✅ Seamless integration with LlamaIndex 🦙 advanced RAG/agents ✅✨ Natural language Parsing Instructions ✅✨JSON mode and image extraction ✅✨Support for ~10 document types (.pdf, .pptx, .docx, .xml) and more Our pricing is simple: 1k free per day, and additional pages at 0.3c a page, or $3 for 1k pages. If you want advanced document RAG and/or private deployments, come get in touch with us to chat about LlamaCloud. Check out our full blog post here: LlamaParse client repo: Signup at 🦙☁️: Come talk to us:
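The pricing above can be worked through as a small calculation. This is a sketch under stated assumptions, not an official billing formula: it assumes the 1k free pages apply per day, that only pages beyond that allowance are billed, and that billing is per page at $0.003 (0.3c). The function name `daily_cost_usd` and the constants are hypothetical.

```python
from decimal import Decimal

# Assumed from the post: 1k free pages per day, then 0.3c ($0.003) per page.
FREE_PAGES_PER_DAY = 1_000
PRICE_PER_PAGE_USD = Decimal("0.003")


def daily_cost_usd(pages_parsed: int) -> Decimal:
    """Estimated cost for one day's parsing under the assumptions above."""
    if pages_parsed < 0:
        raise ValueError("pages_parsed must be non-negative")
    # Only pages beyond the free daily allowance are billed.
    billable_pages = max(0, pages_parsed - FREE_PAGES_PER_DAY)
    return billable_pages * PRICE_PER_PAGE_USD


for pages in (500, 1_000, 2_000, 11_000):
    print(pages, daily_cost_usd(pages))
```

Under these assumptions, 2,000 pages in a day leaves 1,000 billable pages, which matches the quoted $3 for 1k pages. `Decimal` is used so that per-page amounts do not pick up floating-point rounding error.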

LlamaIndex 🦙

143,136 views • 2 years ago

We built an AI agent that lets you vibe-code document extraction - high accuracy and citations over the most complex documents. Our latest release lets you upload documents as context. All you then have to do is describe what you want extracted in natural language. 💡 Our agent will then read the document with file tools to infer the right schema, validation rules, and other pre/postprocessing logic. ✅ It will give you back a workflow that can extract over thousands/millions of documents at scale. You can still of course review and edit every output before approving. Stop handling paperwork manually; just upload files, describe your task, and let our agent handle the rest. Our vision for LlamaAgents is to provide the most advanced and easy-to-use way for you to orchestrate document work. Walkthrough: Check it out: If you’re interested in reducing the operational burden of document extraction (invoices, claims, onboarding forms), come talk to us!

Jerry Liu

20,857 views • 5 months ago

Introducing LiteParse - the best model-free document parsing tool for AI agents 💫 ✅ It’s completely open-source and free. ✅ No GPU required, will process ~500 pages in 2 seconds on commodity hardware ✅ More accurate than PyPDF, PyMuPDF, Markdown. Also way more readable - see below for how we parse tables!! ✅ Supports 50+ file formats, from PDFs to Office docs to images ✅ Is designed to plug and play with Claude Code, OpenClaw, and any other AI agent with a one-line skills install. Supports native screenshotting capabilities. We spent years building up LlamaParse by orchestrating state-of-the-art VLMs over the most complex documents. Along the way we realized that you could get quite far on most docs through fast and cheap text parsing. Take a look at the video below. For really complex tables within PDFs, we output them in a spatial grid that’s both AI and human-interpretable. Any other free/light parser like PyPDF will destroy the representation of this table and output a sequential list. This is not a replacement for a VLM-based OCR tool (it requires 0 GPUs and doesn’t use models), but it is shocking how good it is at parsing most documents. Huge shoutout to Logan Markewich and Clelia Bertelli (🦙/acc) for all the work here. Come check it out: Repo:

Jerry Liu

256,659 views • 4 months ago

Over 1 billion PDFs are created every day, but your agents still can’t read them reliably. Today we’re releasing Parse 2.0, the most accurate document parsing API in the world. Extend already processes millions of pages daily for leading AI teams like Brex, Mercury, Opendoor, Flatiron Health, and hundreds of others. Now, it’s even better. Parse 2.0 is SOTA quality on RealDoc-Bench, our open source benchmark that measures agent success rate on real world docs that agents actually encounter in production. We trained Parse 2.0 on 1M+ pages of the hardest documents seen in production. Here’s how it stacks up: - #1 in healthcare, real estate, logistics, and financial services - 95.7% agent Q&A accuracy on 581 docs (next best: 92%) - 0.847 F1 on layout (next best: 0.759) Give it a try today and build production-ready document agents with Extend.

Kushal Byatnal

586,167 views • 2 months ago

Today, Box is announcing major new AI agent capabilities to let customers tap into the full value of their unstructured data. First, we’re announcing all new updates to the Box AI Studio to make it even easier to build AI agents that tap into your enterprise content for any job function, business process, or industry specific use case. We are also expanding our set of foundational agents that customers will be able to use to work with their enterprise content, including new features like search and research on unstructured data. Next, we’re announcing Box Extract to enable customers to use AI agents seamlessly for complex data extraction from any type of document or content. This makes it easier than ever to pull out data from contracts, invoices, research data, marketing assets, medical charts, and more. Finally, we’re introducing Box Automate, a new workflow automation solution within Box that lets you deploy AI agents across enterprise content-centric workflows. With Box Automate, you can design your business process in a simple drag and drop builder and then drop in AI agents at any step in the process. This ensures agents execute tasks at the right steps in a workflow every time. Best of all, our AI agents and workflow tools are designed to work across any system our customers work within, whether it’s leveraging pre-built integrations, Box APIs, or the new Box MCP Server. Ultimately, all of these capabilities come together to transform how companies can work with their enterprise content. Software has historically only been good at automating work that deals with structured data, which is why ERP, CRM, and HR systems have been mainstays of enterprise software for so long. The data in these systems fits neatly into a database, and the workflows are very ripe for automation. But it turns out most of the work in the world deals with unstructured data. 
It’s ideating through research documents, working with a client on contracts, reviewing details for a new product launch, looking at a patient’s healthcare record to make a diagnosis, working through due diligence documents for an M&A deal, and so on. For the first time ever, we can begin to bring all new insights and automation to this work with AI agents. At Box, we’re incredibly excited to be on this journey to help customers transform how they work with their most important data.

Aaron Levie

91,863 views • 10 months ago

The more I learn about DNA and the inner workings of our cells, the more I’m convinced that there is a Creator/Force/God. You can call it what you want but to think it’s all just “evolution”, the result of natural selection, seems absolutely absurd to me. I am not saying evolution isn’t real but I am saying that our understanding about the nature of our reality and where we come from is sorely lacking. We like to tell ourselves that “we know the truth”, but the correct answer is that we really don’t know at all. Human origins is the mystery of all mysteries and our arrogance instead of reverence for that mystery is a big reason why evil has gained such a foothold in our world in my opinion. What do you think? Evolution explains it or something else is at play? I don’t claim to know the truth but instinctually feel our explanations are fundamentally flawed. Watch this stunning animation that explains how our DNA is tightly packed up and tell me you are not left with a sense of wonder for the mysterious nature of it all.

Champagne Joshi

149,527 views • 2 years ago

The same kinds of productivity gains we've seen in coding with AI agents are heading to the rest of knowledge work. This is the jump when you go from having a chatbot to being able to actually have an agent go off and do work for minutes or even hours and come back with a complete work output that you then review. Here's an example of the new Box Agent filling out an RFP response from an existing knowledge base. This process would normally take hours to fill out, and requires the full attention of the user doing the work. Now, you provide the Box Agent with the RFP questions, and it will go off, make a plan, extract all the relevant questions, read through existing source material to come up with an answer, and then generate a new word document as the final output. All while you're doing something else. The key to this architecture is that the agent is able to use all of the same tools in the background that a user uses to get work done. The agent can search for documents, read entire files, run scripts and tools in the background, and even be able to write code on the fly to automate tasks it hasn't seen before. And best of all, the Box Agent will (soon) work from the Box MCP and CLI so you can invoke it in any agentic system as a step in a process. This kind of agent complexity would have been impossible even 6 months ago. Models consistently failed at tracking long running tasks or using the right tools at the right moment for the task. But this is all now possible because of models like GPT-5.4, Opus 4.6, and Gemini 3, and is only getting better by the month. Just as we moved from engineers writing code and using AI as an assistant to answer questions, in many areas of knowledge work -like legal, finance, consulting, sales, marketing, and more- when we have a problem we'll just kick off the AI agent to just go work on it for us in the background.

Aaron Levie

24,618 views • 3 months ago

As a historian, I can tell you that societies that allow Jews to thrive are societies in history that are flourishing themselves. Look at America. It is the center of the most influential, the wealthiest, the most powerful Jewish community that has ever existed in the world, and it is no surprise it is also the most powerful, the most influential, and the wealthiest force for good the world has ever had. We are privileged to live here. But on the other hand, societies that allow themselves to be taken over by Jew hatred are societies that are sick and dying. Look at the Russia of Kishinev in 1903, the worst pogrom of the 20th century before the Holocaust. It was the biggest country in the world at that time. It had existed for hundreds of years. 14 years later, it was gone. Look at the Germany of Kristallnacht in 1938, the most powerful army, the most powerful air force. It was supposed to be the thousand-year Reich. Just seven years later, it was dead. So not because I'm a Jew, but because I am an American who came here from Venezuela with nothing, knowing no one, and who was embraced by this community and this country with open arms, which has given me and my family every blessing and privilege under the sun, I understand that we, each of us, Jew and not Jew alike, have a moral and practical obligation to root out anti-Semitism in our society because it is the moral rot in the wooden framework of our house. If we are not careful, it will bring the entire edifice tumbling down on all of us, not just the Jews.

Roy K. Altman

155,318 views • 2 months ago

Agentic AI will transform every enterprise–but only if agents are trusted experts. The key: Evaluation & tuning on specialized, expert data. I’m excited to announce two new products to support this–Snorkel AI Evaluate & Expert Data-as-a-Service–along w/ our $100M Series D! --- Snorkel Evaluate is our new data-centric agentic AI evaluation platform for specialized, mission-critical enterprise settings where vibe checks and out-of-the-box metrics driven by simple LLM prompts are not enough. Snorkel Expert Data-as-a-Service is our white glove service for expert-level AI datasets, powering frontier LLM developers in areas like expert knowledge, reasoning, agentic action and tool use, and more! Both built on top of Snorkel AI’s Data Development Platform, using our programmatic technology to drive higher-quality expert data, faster– for getting specialized AI to real production value. If you’re building enterprise AI and want to partner around the key ingredient in AI today–the data–book a demo and let's talk! Finally, see thread for details on 🧵👇 - 📽️ A walkthrough of Snorkel Evaluate and Expert Data-as-a-Service on an agentic AI enterprise task - 📅 An upcoming event on Enterprise Agentic AI with innovators from Accenture @BNY Comcast Stanford University QBE & others - 📊 An upcoming series of benchmark datasets and model artifact releases 👀 Want early access to the full agentic AI dataset? Retweet this post and we'll send you the link!

Alex Ratner

49,964 views • 1 year ago

Satya Nadella was asked what Microsoft does when its AI revenue 10x's from $13B to $130B. He refused to answer in revenue - and named the only AGI benchmark he says counts "The first thing we have to observe is GDP growth. There's only one governor in all of this" "This is where a little bit of, we get ahead of ourselves with all this AGI hype" "The developed world is what, 2% growth? And if you adjust for inflation, it's zero. We have a real growth challenge" "When we say this is like the Industrial Revolution - that type of growth means to me 10%, 7%, developed world inflation adjusted, growing at 5%. That's the real marker" "The big winners here are not going to be tech companies. The winners are going to be the broader industry that uses this commodity" "Us self-claiming some AGI milestone - that's just nonsensical benchmark hacking to me. The real benchmark is the world growing at 10%" The CEO with one of the largest AI infrastructure bets on earth just told you to ignore every model benchmark and watch a single number: world GDP. By that test, nothing has happened yet. Developed-world growth is flat, and no lab's press release moves it. Real AGI shows up as the economy compounding at Industrial Revolution rates - or it isn't real.

Karl Mehta

82,403 views • 27 days ago

GPT-5.6 vs GPT-5.5 on my custom spaceship prompt. I gave both models the exact same custom prompt. This is also the same prompt I previously gave to Fable 5. For context, GPT-5.6 Pro worked for 87 minutes, while GPT-5.5 Extra High worked for 34 minutes and 42 seconds. As I’ve said before, based on great authority GPT-5.6 will be an incremental/solid improvement over GPT-5.5, not a “Fable killer.” My rough expectation has been that it would trade blows with Fable 5 on some benchmarks, maybe win around half depending on the category, but not clearly surpass it overall. And again, Fable 5 will have bigger model smell, but this was expected. After testing this coding output, that view feels pretty accurate. GPT-5.6 is clearly better than GPT-5.5 in several visual areas. The lighting, shading, chairs, object details, and exterior of the spaceship looked noticeably stronger. The scene was also easier to test. I do want to give GPT-5.5 credit though. It built out the rooms much much better and the planets looked better than GPT-5.6’s. It was also interesting that both GPT-5.5 and GPT-5.6 produced better-looking planets than Fable 5 in this specific test. The downside with GPT-5.5 was stability. The game was much glitchier and harder to test compared to GPT-5.6. But when it comes to the core of the demo, which is the spaceship itself, Fable 5 still beat both models pretty comfortably. GPT-5.6 is impressive, but from this test, it looks exactly like what I expected, which was a meaningful incremental improvement over GPT-5.5, at least for indie game demos, but not something that replaces Fable 5. In collaboration with Chetaslua

Chris

250,150 views • 1 month ago

Our world is vast, but we are all in it together. And this is our shared cause – peace. 🇺🇦 proposed its Peace Formula to the world. As long as invaders remain on our land, no one will sit down at the negotiating table with 🇷🇺. The colonizer must get out. And the world has enough power to force 🇷🇺 to restore peace step by step. We have developed the Peace Formula in a way that ensures each of its points is backed by United Nations 🇺🇳 resolutions. And in a way that everyone in the world can choose the track they can contribute to. From Japan to the Arab countries, from Europe to Latin America, we find support for our Formula. And we continue this work.

Volodymyr Zelenskyy / Володимир Зеленський

2,286,783 views • 3 years ago

🚨🚨🚨 When I was growing up, I always felt like there was something wrong in the world - After the #COVID #Vaccine, I knew for a fact that something was terribly wrong, so I started doing relentless research every single day. I dedicate every minute of my spare time to understand what’s going on to share it with you guys - Here are some things that made me realize that we are being LIED to on a massive scale 🧐 : (1) Vaccines are NOT safe for humans or animals - My research shows they’re being used to injure and kill beings indiscriminately. I’m not sure if it’s for profit or to keep the population down or BOTH. Do you know why they have to keep the population as low as possible? Because if there are too many of us, we would figure out the game that’s being played on us (2) Most of our food products have things that are not safe for human consumption = RED 40 is an example (3) They put stuff in our water supply = Atrazine and Fluoride are examples (4) Corporations and Central banks are really the ones that control our world - Our politicians are nothing but scripted actors that work for the best interest of these bankers and corporations (5) The mainstream media work for these same corporations and central bankers. The media destroys anyone that gets too close to the truth The media elevates their chosen people into desired positions of POWER The media has the ability to amplify any message, just like they have the ability to make any story disappear by not talking about it at all (6) Central banks and government artificially induce crisis throughout history. 
Government spending and printing of money causes inflation, then banks raise interest rates, which increases the cost of borrowing and VOILA, here is your cost of living crisis (7) The earth is not a spinning ball that’s flying through space - We are stationary and NOT spinning around in circles - There is a firmament (dome) that surrounds our planet (8) The North Pole is the Centre of the World - Do you know that the North Pole was mapped before, BUT they removed the maps and claim there is nothing but ice there😆 Do you know that the tallest mountain in the world is at the North Pole? Do you know that your compass points north towards the magnetic mountain that’s at the North Pole? The Rupes Nigra, also known as BlackRock is at the North Pole Use the search bar on the top of my page and look for some info I posted about the North Pole (9) Human beings are not allowed to freely explore Antarctica or go past it, even though Admiral Byrd admitted that there is LAND as big as the United States that remains unexplored there - Have you heard about the Antarctic treaty? (10) Human beings never went to the moon - How did they get past the Van Allen Radiation belt? The spaceship has to refuel about 9X before reaching the moon, how did they do that? How is it possible that we lost the technology for going back to the moon? Technology never goes backwards. (11) Only 5% of our oceans have been explored, but we want to go to space? How ridiculous is that? We haven’t fully explored our own house, but we are going to explore someone else’s house? (12) The Sun is NOT a ball of fire that is going to explode at some point😆 Who is deceiving and hurting us on such a massive scale and WHY? How is it possible that so many people are in on it? Humanity deserves to know the TRUTH about our reality and I will NEVER stop fighting for it - I hope you feel the same way, because without your help, there’s not much people like me can do. We have to do this TOGETHER❤️

Kevin - WE THE PEOPLE🦁

199,410 views • 2 years ago

Introducing /visual-plan - a skill to generate rich, visual plans for Claude Code and Codex. Plan mode in Claude Code is incredible. But I always find my eyes glazing over when it gives me this huge markdown essay in my terminal. I found I can make much better visual plans with reusable components. So I made a skill called `/visual-plan`. It generates plans as MDX with visual, interactive components. Diagrams, interactive API specs, schema design changes, annotated code, and even pan and zoomable wireframes. So for any UI work, you can look at a wireframe first, comment on it, iterate, and then have the agent work. I’ve found this to be a much more intuitive interface for reasoning about what the agent is doing. It’s somewhat inspired by that popular post about how HTML is better than Markdown. But HTML can be slow and verbose to write. And it doesn’t look good checked into a repo. This has really made me feel like humans and engineering are entering a new abstraction phase, where we reason about things at the plan level. As long as the plan is good, agents are getting more and more reliable at executing on it. Almost to the degree that we trust the C compiler to compile to assembly reliably. Plans are the new intermediate representation. I also made a skill for the reverse of this, called `/visual-recap`. After the agent works, it gives you a recap of everything it did. Same idea: wireframes, interactive API specs and diffs, schemas, annotated code, etc. So now when you’re reviewing what the agent did for you, or looking at a pull request of somebody else’s code, you can see a visual recap instead of just reading a wall of text. It’s all free and open source. You can find it on my GitHub. Will link to it in the reply because we all know how dumb these algorithms are with links.

Steve (Builder.io)

124,409 views • 1 month ago

Zack Polanski, "The problem is chaining yourself to an economic system that is so fundamentally broken" "You don't get marks or points for just being good, a gold star for doing the basics, or even just under the necessary" "We are at this urgent moment in our history where the only thing that is acceptable is what is required" "What is required is to reduce our emissions, to invest in climate adaptation, to protect nature in one of the most nature depleted countries in the world" "And to make sure we're reinvigorating communities with the truth and putting them in charge" "What is the biggest tool we have?" "It is our shared humanity" "Tackling the cost of living crisis is the same solution as tackling the climate crisis" "It's so vital at this budget that we're cutting bills and taxing billionaires" "There's an implication that all of this is transactional, and the only thing people care about is the money in their pockets" "If you're struggling, that probably is the only thing you care about" "But for so many people, they care about the future of this planet. About clean air. Green spaces" "That when people go to work they have a shared sense of purpose, that it is meaningful and that they are contributing something to society" "We are in a crisis. Climate. Equality. Air pollution. Identity. All of these problems are all interlinked" "And it all comes down to our communities and our humanity"

Farrukh

205,108 views • 8 months ago

Elon Musk: The probability of AI going bad is not zero. “I saw GPT-1, GPT-2, GPT-3, GPT-4, the whole lead-up to that. So it was easy for me to kind of see where it's going. If you just extrapolate the points on a curve and assume that trend will continue, then we will have profound artificial intelligence, and obviously at a level that far exceeds human intelligence. So I'm glad to see at this point that people are taking safety seriously. And I do think overall that the potential is there for artificial intelligence, AI, to have most likely a positive effect and to create a future of abundance where there is no scarcity of goods and services. But it is somewhat of the magic genie problem, where if you have a magic genie that can grant all the wishes, usually those stories don't end well. Be careful what you wish for.” In Conversation with Rishi Sunak, November 2, 2023.

ELON CLIPS

22,095 views • 1 month ago

Introducing EdgeBench, a benchmark designed to study how agents learn from environments over runs lasting 12 to 72 hours. We find that performance follows a log-sigmoid function of environment interaction time with high precision. EdgeBench is built with three ingredients: - 🌍 Real & Diverse: 134 real-world tasks across 6 task categories, spanning scientific problems, professional knowledge work, software engineering, optimization, formal math, and games. - ⏳ Ultra-Long-Horizon: Each task supports 12–72 hours of agent work. Recorded human effort averages 57.2 hours. - 🔁 Informative Feedback: Agents receive real-world feedback for continuous improvement. After 38,000 hours of agent runs on EdgeBench, a scaling law for learning from environments emerges: - 📈 As agents interact with task environments over time, their aggregate performance is precisely fit by a log-sigmoid function. - 🧠 This phenomenon can be explained by an elegant theory of graph exploration. We are releasing an initial 51 of the 134 tasks, together with the full evaluation framework, to help advance long-horizon agent research. Check our blog & paper for more findings! Blog Paper GitHub Dataset Details below 👇🧵

Deyao Zhu

357,755 views • 28 days ago

Bash is all you need! Which is why I'm introducing my holiday project: just-bash. just-bash is a pretty complete implementation of bash in TypeScript designed to be used as a bash tool by AI agents. Because it turns out agents love exploring data via shell scripts, even beyond coding. It comes with grep, sed, awk and the 99th percentile features that an agent like Claude Code or Cursor would use. In fact, Claude Code can use it for secure bash execution. In the package - A bash-tool for AI SDK - A binary for use by yourself or your coding agents - An overlay filesystem to feed files to your agent securely - A Vercel Sandbox compatible API, so you can quickly upgrade to a real VM if you need to run binaries - An example AI agent that explores the just-bash code base using just-bash - I imported the Oils shell bash compatibility suite and just-bash passes a very good chunk What is interesting about this codebase: It was essentially entirely written by Opus 4.5. Coding agents love bash and they are good at reproducing it. They are also great at text-book recursive descent parsers and AST tree-walk interpreters. That said, it is, like, a lot of code and I didn't read it all 😅. This is very much a hack, but it also seems to be _really_ useful. I haven't really found anything agents want to use that it doesn't support and it's fast and secure (caveats apply). It doesn't have write access to your computer and the filesystem is given a root that the agent cannot escape from. Find it at Related: Our recent blog post on how we migrated our data analysis agent to bash tools and achieved incredible quality improvements The video shows the example agent investigating the just-bash code base

Malte Ubl

124,713 views • 7 months ago

Welcome to HANDL. Our new name. 🧡 A name that resonates with our vision to turn all social media usernames into global cross-chain payment handles. In 2024, we launched an MVP and experimented with different user groups. As we are now gearing up for our TGE and go-to-market campaign, it is time for a brand facelift. Handl is young, sharp, and resonates with the Gen Z and Gen Alpha audience, who were the highest volume of users to engage with our product and most likely to adopt it. The brand celebrates and will leverage internet culture as a key part of its growth strategy. The Gen Z audience is largely aware of and curious about Bitcoin, stablecoins, and memecoins, but many struggle to jump over the technical barriers to get started. In our aim to onboard new users to Web3, we effectively expose our target to crypto by bringing payments to the platforms they spend most of their time on: social media. 💡 No crypto knowledge required. We are a cryptocurrency-native company: the orange, our most prominent brand color, echoes Bitcoin and crypto as a whole, while the green in our brand refers to traditional & tangible cash, touching grass, the “real world”. Join us and become part of one of the fundamental payment layers acting as the glue between both worlds.

HANDL

28,256 views • 1 year ago