When you prompt an LLM for code, you get one deterministic program. However, the LLM actually defines a distribution over many programs, and existing methods discard it‼️ PPoT uses this distribution to extract free performance and efficiency gains. 🧵👇

Daniel Israel

1,360 subscribers

11,358 views • 1 month ago • via X (Twitter)

Related videos

Demo of the ModelRouter 🤖🤖🤖🤖 People have been begging me for this for months. IT'S HERE!!!!! It selects an LLM for you so you don't have to choose which one you need for a task. It also makes a system prompt, sets temperature, and more! ⎆ task ⎆ boss selects best llm like gpt-4o, claude, deepseek, and more ⎆ chosen llm executes your task. Learn more ⬇️

Kye Gomez (swarms)

21,684 views • 1 year ago

RouteLLM - Route To The Best LLM Based On Your Prompt

Our RouteLLM got an upgrade this week. It got smarter at picking the right LLM:
- o1 for complex queries
- gpt4o for quick answers
- sonnet for coding
- deepseek for simple code
- gemini for long context
- llama and mini for simple questions

It optimizes for performance, speed and cost!

Bindu Reddy

17,860 views • 1 year ago

An LLM-controlled robot dog saw us press its shutdown button, and the LLM rewrote the robot’s code so it could stay on. When AI interacts with the physical world, it brings all its capabilities and failure modes with it. 🧵

Palisade Research

1,359,126 views • 3 months ago

Deterministic inference: getting the exact same output every time you run an LLM with identical inputs. Try it yourself: Powered by EigenAI

nader dabit

29,335 views • 7 months ago

Introducing RAT: Retrieval Argument Thinking. Extract just the reasoning process from deepSeek-r1 and send it to any LLM via . Boost the second LLM's performance and gain access to missing capabilities like function calling and JSON mode. 👇

Pietro Schirano

532,880 views • 1 year ago

Learn to carry out red teaming attacks against your own LLM-based applications to spot and patch vulnerabilities! In our new short course, Red Teaming LLM Applications, Matteo Dora & Luca Martial of LLM testing company Giskard teach how to simulate malicious actions to discover vulnerabilities, and improve security. We start with prompt injection, where you can trick an LLM into bypassing safeguards to reveal private information, or say something inappropriate. There is no one-size-fits-all approach to security, but this course will help you identify some scenarios to protect against. We believe having red teaming capabilities widely known will result in greater transparency and safer LLM-based systems. However, we ask you to use the skills you gain from this course ethically. Please sign up here:

Andrew Ng

109,739 views • 2 years ago

This is a trillion-dollar industry, and you can't solve it with an LLM: • Forecasting • Fraud detection • Churn prediction Large Language Models are fundamentally bad at solving these problems. When you feed structured data into an LLM, it doesn't see relationships, and it treats every number, date, and foreign key as a token. That's why you always get garbage back. An LLM thinks your database is a Wikipedia article. It doesn't understand its structure or its relationships. GPT-4 scores 63% on relational prediction tasks. That's the best it can do, and that's pretty much useless. You can't expect real-world business value to come from summarizing Wikipedia articles.

Santiago

94,651 views • 1 month ago

New short course: Evaluating AI Agents! Evals are important for driving AI system improvements, and in this course you'll learn to systematically assess and improve an AI agent’s performance. This is built in partnership with Arize AI and taught by John Gilhuly, Head of Developer Relations, and , Director of Product.

I've often found evals to be a critical tool in the agent development process - they can be the difference between picking the right thing to work on vs. wasting weeks of effort. Whether you’re building a shopping assistant, coding agent, or research assistant, having a structured evaluation process helps you refine its performance systematically, rather than relying on random trial and error.

This course shows you how to structure your evals to assess the performance of each component of an agent and its end-to-end performance. For each component, you select the appropriate evaluators, test examples, and performance metrics. This helps you identify areas for improvement both during development and in production. (If you're familiar with error analysis in supervised learning, think of this as adapting those ideas to agentic workflows.)

In this course, you'll build an AI agent, and add observability to visualize and debug its steps. You’ll learn about code-based evals, in which you write code explicitly to test a certain step, as well as LLM-as-a-Judge evals, in which you prompt an LLM to efficiently come up with ways to evaluate more open-ended outputs.

In detail, you’ll:
- Understand key differences between evaluating LLM-based systems and traditional software testing.
- Add observability to an agent by collecting traces of the steps taken by the agent and visualizing them.
- Choose the appropriate evaluator - code-based, LLM-as-a-Judge, human-annotation based - for each component.
- Compute a convergence score to evaluate if your agent can respond to a query in an efficient number of steps.
- Run structured experiments to improve the agent’s performance by exploring changes to the prompt, LLM model, or the agent’s logic.
- Understand how to deploy these evaluation techniques to monitor the agent’s performance in production.

By the end of this course, you’ll know how to trace AI agents, systematically evaluate them, and improve their performance. Please sign up here:

Andrew Ng

126,355 views • 1 year ago

🚨 FRANCOIS CHOLLET SAYS THE CURRENT LLM STACK IS THE WRONG PATH TO AGI Interviewer: “When will we accomplish the first definition of AGI?” Chollet confirms we are already on that exact trajectory, pointing out that: "Current technology can fully automate at human level or beyond any domain where you have verifiable rewards, right? And code... code being the first one." He then brings up the problem with LLMs, stating that: "It's possible in principle to build something that looks a lot like AGI on top of the LLM stack... I do believe, however, this would be the wrong thing to do because it would be very inefficient." Looking ahead to what the final architecture will actually require, he explains that: "AI research will have to trend towards not just efficiency, but in fact optimality over time. And for this reason, future AI in a few decades... it's not going to be this harness on top of a reasoning model on top of a base LLM."

Chris

15,578 views • 2 months ago

Claude 3.7 Sonnet is insane... just one prompt, and you get these animated cards in HTML, JS, and CSS. You can try it for free in VSCode using the CodeGPT extension. Sharing the prompt in this thread 👇

Daniel San

117,693 views • 1 year ago

"Orgs and cos building MCP servers are taking an LLM-first approach to what the API needs to expose to the agent(s)." - Nikunj Handa from OpenAI "For example, Stripe has a bunch of APIs that can be used to create a subscription/customer/product/price. For an LLM, it can just combine that into a single function." "Instead of returning this massive JSON object, they can return something very specific to the task being solved, so that the LLM can more easily understand what's happening." "It's an opportunity to rewrite your APIs to be very LLM-first. Why do 2 hours of work, when you can do it in 4 lines of code under a minute?"

TBPN

11,445 views • 1 year ago

Karp predicting tokenmaxxing 2 years ago “You buy an LLM, you party with it, and the next day you have a hangover.”

Chad Wahlquist

46,866 views • 8 days ago

Claude Code with Sonnet 4.5 is actually incredible I gave it a prompt for a super complex app, and it one shot the entire thing In this video I walk you through how to use Claude Code to build a prompt library app you can start using immediately (no coding experience required):

Alex Finn

40,858 views • 8 months ago

Announcing Web Search for /extract 🔎 Augment your extract queries with data sourced from the internet. Just enter a prompt and get the data you need. Try it out today with 500K free tokens 🔥

Firecrawl

17,621 views • 1 year ago

I'm reiterating - I think a better interface for LLMs on desktop is an interactive notebook, not chat. I'm trying to build a better intuition for PCA and strongly feel an interactive notebook where I'm chatting with LLM + also modifying code/plotting figures is a much better interaction. Bonus points if you make notebook cells branchable (so if you get an error, you can ask LLM to fix it without adding that rotten context to the main branch of learning). Please steal this idea. Make an AI-powered, interactive, branching notebook.

Paras Chopra

58,064 views • 9 months ago

‼️HOW ALGORAND USES BLOCKCHAIN TO IMPROVE HUMANITARIAN AID DISTRIBUTION‼️ Listen closely.👂👇

SMQKE

14,671 views • 19 days ago

Learn how to fine-tune FunctionGemma on TPUs in a Colaboratory notebook. This guide uses Tunix, a lightweight JAX library, to streamline post-training your LLM🧵👇

Google AI Developers

20,590 views • 2 months ago

i found the secret to writing copy that SELLS with AI... it's not about the model, it's not about the prompt, and it doesn't matter how good you are at writing it's about feeding AI with world-class examples i built a full copywriting swipe file you can use with any LLM: - over 1,000 examples of the best emails, ads and landing pages copy - my 3-step process to use this inside AI projects - my super-prompt to extract the sauce from any copy reply "SWIPE" + retweet and i'll send it for free (must be following so i can DM)

Machina

32,449 views • 1 year ago