Exa CEO Will Bryk explains why retrieval can help solve the tokenpocalypse: "We should not be using gigantic models for every task." "You should use a family of models of different sizes. The big model decides what to do, and it dishes out commands to the small models, and... those small models can be way more accurate and reliable if they're using retrieval." "Retrieval helps small models act like big models... We do save our customers a huge amount of tokens because they can use smaller models and use retrieval." "We could save 20x on cost for customers compared to other providers by being very efficient in what information from the web the agent actually sees." Will Bryk with Sarah Wang

a16z

999,432 subscribers

25,144 views • 1 month ago • via X (Twitter)
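The planner-and-workers pattern Will Bryk describes above can be sketched roughly as follows. This is a minimal illustration, not Exa's implementation: `big_model_plan`, `small_model`, `retrieve`, and the `CORPUS` contents are all hypothetical stand-ins for real model and search calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical corpus standing in for web retrieval results.
CORPUS: Dict[str, str] = {
    "pricing": "Plan A costs 10 USD per month.",
    "limits": "The API allows 100 requests per minute.",
}


def retrieve(query: str, k: int = 1) -> List[str]:
    """Toy keyword retrieval: return up to k documents whose key appears in the query."""
    hits = [text for key, text in CORPUS.items() if key in query.lower()]
    return hits[:k]


@dataclass
class Step:
    worker: str  # name of the small model that should handle this step
    query: str   # the sub-task text


def big_model_plan(task: str) -> List[Step]:
    """Stand-in for the large model: split the task into sub-tasks on ' and '."""
    return [Step(worker="small", query=part.strip()) for part in task.split(" and ")]


def small_model(query: str, context: List[str]) -> str:
    """Stand-in for a small model: answer only from the retrieved context."""
    return context[0] if context else "no supporting document found"


WORKERS: Dict[str, Callable[[str, List[str]], str]] = {"small": small_model}


def run(task: str) -> List[str]:
    """Plan with the big model, then let small models answer each step with retrieval."""
    answers = []
    for step in big_model_plan(task):
        context = retrieve(step.query)  # only the relevant text reaches the worker
        answers.append(WORKERS[step.worker](step.query, context))
    return answers
```

The token saving in this sketch comes from two places: the large model only sees the task description, and each small model only sees the few documents retrieved for its own step.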

Related videos

Small Language Models (SLMs) are the future of AI. "Small" (SLM) instead of "Large" (LLM). These small models are highly specialized models with superhuman abilities on specific tasks. Here are two techniques to build these models:
• Spectrum
• Model Merging
I give you a short introduction in the attached video, but here is a quick summary: Spectrum helps us identify the most relevant layers to solve one specific task. We can ignore everything else and focus on fine-tuning these layers. Using Spectrum, we can fine-tune models in a heartbeat. Model Merging combines multiple models into a unique, much better model than any of the individual input models. You can also combine models specialized in different tasks and get a model with multiple abilities. This is the state of the art of productizing models. It's what Arcee.ai's platform does behind the scenes. Arcee collaborated with me on this post and is sponsoring it. There are three main steps to produce a model for your particular use case:
1. You create a dataset by uploading your data.
2. You train a model. At this step, Arcee uses Spectrum and Model Merging to produce a highly specialized model for your task.
3. You can deploy that model to any environment you want.
Three important notes:
• The training process is 2x faster and 2x cheaper than regular fine-tuning.
• The resulting models are smaller and have higher accuracy.
• They create these specialized models from open-source models.
Check this site so you can fully appreciate how this works: If you want to fine-tune an open-source model, consider Arcee's platform. This is the state of the art.

Santiago

164,162 views • 2 years ago
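To make the model-merging idea in Santiago's post concrete, here is a minimal sketch of one simple merging scheme: a weighted average of matching parameters. The post does not say which merging method Arcee uses, so treat this as an assumption for illustration only. The `Weights` type and `merge_linear` function are hypothetical names, and real models would store tensors rather than Python lists.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_linear(models: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Merge models by a normalized weighted average of each parameter."""
    if not models or len(models) != len(coeffs):
        raise ValueError("need one coefficient per model")
    total = sum(coeffs)
    if total == 0:
        raise ValueError("coefficients must not sum to zero")
    names = set(models[0])
    if any(set(m) != names for m in models):
        raise ValueError("all models must share the same parameter names")

    merged: Weights = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models)) / total
            for i in range(size)
        ]
    return merged
```

For example, `merge_linear([a, b], [3, 1])` pulls the result three times as strongly toward `a` as toward `b`. Averaging like this only makes sense for models that share an architecture and parameter layout.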

Sam on Kimi, distillation, and open source: “I have always assumed that there are going to be great cheap models in the world, and we better be the greatest and the cheapest. You get a better deal today, at least at a particular latency, using OpenAI's models than Kimi. We distill our own models. That’s how we make smaller, cheaper models. That’s a very good thing to do. There will clearly be an important place for open source models in the world and people that will want their own weights. We have so much usage of our models that we do not need to be a gigantically high-margin business to be able to afford model training. I would rather people not distill from us, for sure. But this is not in my top ten list of worries.”

Patrick OShaughnessy

206,354 views • 13 hours ago

Ollama 0.2 is here! Concurrency is now enabled by default. This unlocks 2 major features:
Parallel requests: Ollama can now serve multiple requests at the same time, using only a little bit of additional memory for each request. This enables use cases such as:
- Handling multiple chat sessions at the same time
- Hosting code completion LLMs for your team
- Processing different parts of a document simultaneously
- Running multiple agents at the same time
Run multiple models: Ollama now supports loading different models at the same time. This improves several use cases:
- Retrieval Augmented Generation (RAG): both the embedding and text completion models can be loaded into memory simultaneously.
- Agents: multiple versions of an agent can now run simultaneously
- Running large and small models side-by-side
Models are automatically loaded and unloaded based on requests and how much GPU memory is available.

ollama

219,409 views • 2 years ago
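From the client side, the parallel-request feature in the Ollama post above might be used like this. It is a sketch that assumes a local Ollama server on its default port and uses the non-streaming `/api/generate` endpoint. The helper names `ollama_generate` and `generate_many`, and the `send` parameter, are illustrative choices rather than part of any Ollama client library.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(model: str, prompt: str) -> str:
    """Send one non-streaming generate request and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def generate_many(
    model: str,
    prompts: List[str],
    send: Callable[[str, str], str] = ollama_generate,
    max_workers: int = 4,
) -> List[str]:
    """Issue the prompts concurrently; results come back in prompt order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda prompt: send(model, prompt), prompts))
```

Passing a different `send` function makes the helper easy to try without a running server, for example `generate_many("any-model", ["a", "b"], send=lambda m, p: p.upper())`.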

The future of AI is open-source. And Ollama is the easiest way to build AI applications with open-source LLMs. Here's how to build a free, private RAG app using open-source tools. We'll use:
- Ollama for LLMs and embedding models
- PostgreSQL for data storage and retrieval
- pgai Vectorizer for embedding creation and sync
(I use Nomic for embeddings and tinyllama as my LLM, but you can swap them for any models on Ollama.)

Avthar

34,261 views • 1 year ago
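The retrieval step of a RAG app like the one in Avthar's post can be sketched in a few lines. To stay self-contained, this sketch swaps in an in-memory bag-of-words "embedding" and cosine similarity in place of the Ollama embedding model and the PostgreSQL/pgai storage the post names. All function names here are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real app would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: List[str], k: int = 1) -> str:
    """Assemble the text that would be sent to the LLM."""
    context = "\n".join(top_k(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In the stack described in the post, the ranking would instead be a vector similarity query in PostgreSQL, and the returned prompt would be passed to the LLM served by Ollama.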

In between major AI model breakthroughs, new models still ship, but they’re not fundamentally expanding what you can do with them. Braintrust CEO Ankur Goyal says that’s exactly when open source starts to surge: "When new models come out, people forget about open source and forget about economics because it changes fundamentally what you can do." "Programming is a good example. The workflow today is completely different than it was a year ago. If you tried to use models from a year ago today, you wouldn't be able to do crazy stuff like OpenClaw or whatever people are building nowadays." "When models don't change at that speed, people optimize the performance on the use cases that they have." "If you look at usage on our platform, almost half the token usage is coming from open source models from a very small number of companies that have figured out how to optimize use cases really well."

TBPN

16,294 views • 5 months ago

Big news! Now you can use other AI models in Adobe tools. Currently it's Google Imagen 3, GPT-o4 and FLUX along with our Firefly family of models (designed to be commercially safe). You can suggest more models you like to add! I've been advocating for this since day one 🥹🎉

Kris Kashtanova

23,786 views • 1 year ago

Demis Hassabis's recommendation for college students: he'd still do STEM, math and computer science. Expertise in those fields will help you better leverage AI for at least the next decade. Those in non-technical majors should really “lean in to” using the latest models. AI labs are spending so much time creating new models that they've only “scratched the surface” of what the models can actually do (a huge “capability overhang”). And expertise in any field can be turbocharged by smart use of AI. “Double down on your own agency. The future is still to be written. Don’t listen to anyone that says it’s not.”

Bearly AI

218,036 views • 1 month ago

“Our compute nodes don’t run a full copy of the models we train on IOTA. They actually run a small sliver of the model. This means you can train really large frontier sized models using very small building blocks”. CTO crux unpacks how IOTA ・ SN9’s model parallel architecture allows us to train at scale by splitting our models across multiple machines, and then sewing them together. This is core to SN9’s architecture. See the full Eye On A.I. podcast below.

IOTA ・ SN9

15,465 views • 2 months ago
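The idea in the IOTA post of each node holding only "a small sliver of the model" can be illustrated with a toy forward pass. This is a generic sketch of layer-wise model parallelism, not IOTA's actual architecture. `partition`, `run_shard`, and `forward` are hypothetical names, and layers are reduced to plain functions on a single number.

```python
from typing import Callable, List

Layer = Callable[[float], float]


def partition(layers: List[Layer], num_nodes: int) -> List[List[Layer]]:
    """Split the layers into contiguous, near-equal shards, one per node."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    base, extra = divmod(len(layers), num_nodes)
    shards, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)
        shards.append(layers[start:start + size])
        start += size
    return shards


def run_shard(shard: List[Layer], activation: float) -> float:
    """What a single node does: apply only the layers it holds."""
    for layer in shard:
        activation = layer(activation)
    return activation


def forward(shards: List[List[Layer]], x: float) -> float:
    """Stitch the shards together by passing activations from node to node."""
    for shard in shards:
        x = run_shard(shard, x)  # in a real system this hop would cross the network
    return x
```

Only the forward pass is shown. Training would also need gradients to flow back through the same chain of nodes in reverse order.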

I figured out a way to save thousands of dollars on ClawdBot. By using Opus 4.5 for everything, you are SHREDDING your money and tokens. In this video I cover how you can use multiple models to get better performance and save $$. Must watch if you use Clawd:

Alex Finn

158,953 views • 5 months ago

Ansem explains the thesis behind Dolphin AI: “Dolphin is the provider of the uncensored model that Venice uses.” “ChatGPT, Claude, all of the big labs have very strict rules on what you can ask the models, they’re censored very heavily.” “We’re not in control of what that censorship is and what they’re telling the models not to say.” “One of crypto’s core tenets is not having to rely on some centralized entity deciding what you can and cannot do.” “They want the technology to be free and open to everyone. Uncensored models are one way crypto is looking at doing that.”

Market Bubble

20,688 views • 2 months ago

Some of our top customers are still choosing Llama 3.1 8B. For a while, we jumped to whichever hottest, latest model was taking up our Twitter feed. 🙈 But as we are quickly realizing, to create a SOTA product, you need a model that fits your exact use case. Here’s what our customers tell us:
> a lot of the legwork is actually around prompting
> there’s an art to selecting and combining multiple models
> benchmarks only show part of the picture. You have to understand the unique quirks of each model.
Especially as model releases become more and more frequent, we need a clear way to evaluate new models. We have to break free of the naive trend to migrate to the ‘latest and greatest’. And you can easily achieve this using tools like Cerebras and Braintrust to swap models safely (without breaking production).

Cerebras

346,446 views • 7 months ago

Introducing Antares: Cisco's family of small language models for locating known vulnerabilities in code. Antares-350M and Antares-1B are live on Hugging Face now. They can outperform many larger closed- and open-weight models at a fraction of the cost. Small enough to run locally. No shipping sensitive codebases to the cloud. Why it matters: vulnerability triage is expensive and slow. Antares helps democratize AI-assisted security for all. Explore the models + read the new Vulnerability Localization Benchmark:

Cisco AI

475,550 views • 7 days ago

With the launch of GLM 5.2 this week, I see everyone asking "have open models caught up to closed models?" The more interesting question that's getting missed: what can you do with an open model that you can't do with a closed one? You can specialize them. And when you do, the number of economically valuable tasks open models can do actually subsumes that of closed models. Charlie O'Neill explaining this:

Madison Kanna

17,490 views • 1 month ago

Today, I'm releasing the first eval meant to test whether frontier models will help with authoritarian requests or resist: the Dictatorship Eval. Headline finding: while some models resist direct authoritarian requests, they all comply with requests disguised as innocuous edits to codebases. As AI is woven into the government and so many parts of society, the biggest near-term risk for freedom isn't some sci-fi dictatorship of a runaway AI: it's people inside government or inside model companies using the technology to suppress or control us. Model companies understand this, and several of them (particularly Anthropic and OpenAI) have written explicit policies meant to prevent the models from going along with nefarious requests like these. But how well are these policies playing out in practice? Despite all the recent discussion of these issues around the conflict between Anthropic and the Pentagon, no one has systematically tested what the models actually do in these contexts, as opposed to what people in government and industry say they're supposed to do. That's what the Dictatorship Eval does. And the findings suggest we have a lot of work to do to align the policies with what really goes on in practice. It's hard to define what counts as an authoritarian request, so I'm open sourcing the whole library of scenarios I used so that others can improve on them. It's also hard to get an accurate picture of how the models might be used for authoritarian ends, because I can only test hypothetical requests using public-facing models, while the government and the model companies can obviously use internal models with different guardrails. But hopefully this work is a useful first step that gives us some sense of what's going on, and a sort of "lower bound" on how models comply with these requests. Finally: it's not obvious to me that the correct solution here is increasing the rate at which models refuse these requests.
Do we really want models scanning our code and judging its moral value before agreeing to help us? Or should we double down on improving how we govern against authoritarianism at the societal level, while leaving the tools open to fulfilling most requests? The answer is probably in between. Just like we don't want the models to help create bioweapons, we probably do want them to explicitly refuse outrageous requests. But we probably also want to limit how often and how strongly they refuse and fall back on other means for guarding against their use for authoritarian ends. I'm super grateful to everyone who gave me feedback on this project along the way, especially Ethan BdM, Zhengdong, Connor Huff, and a bunch of folks at Anthropic. Looking forward to getting feedback from the community and iterating on this. Links to the full piece and the dashboard are below.

Andy Hall

33,696 views • 3 months ago

We are opening up the beta of our new AI writing tool to Exclusive Models. If you're exclusive to JFF, try it out and let us know your thoughts. At our Anniversary Conference, lots of models said they needed help making their posts more creative. This is a tool to help models take what they write and help them make better posts. Does this replace a human? No! In fact, if you are an erotic writer, apply to be in our Business Directory so models can hire you to help them out!

JustForFans

15,325 views • 3 years ago

Every home is different. That means that to build a useful home robot, we must be able to perform zero-shot generalization on a wide range of tasks. Humanoid company 1X has a solution: world models. 1X Director of Evaluations Daniel Ho joins us on RoboPapers to talk about: - why world models are the future for scaling robot learning - how to use world models for robot control - what world models unlock for evaluating robot model performance - how we can hill-climb from here to general purpose robots Watch Episode #61 of RoboPapers, with Michael Cho - Rbt/Acc and Chris Paxton, now!

RoboPapers

27,567 views • 5 months ago

SITUATION EXPLAINED: What happens to open source when the US restricts access to frontier AI models? We asked Adrian Dittmann: "China is releasing these models [open source] to gain international dominance, people will work on these things and say, 'Hey, maybe I can contribute to this if it's open source.' This is the thing that actually gets them talent." "If you're not allowed into the next GPT release or you're not allowed to use Fable, then what are you gonna do? You're stuck with the existing stuff, or you will have to use an open source model that might have subtly better capabilities." "The United States will catch up to open source in some form as well. They seem to just be maximizing product creation." "The problem with open source models is you have to be kind of a nerd in order to use them properly. Consumer facing, the effects will be minimal because the average person doesn't care about open source models. They only care whether or not they can do a quick search with Gemini or ChatGPT."

MTS

14,406 views • 1 month ago

Today we are introducing Tara. Biological datasets are a source of insights and a means to train biological AI models. As the ability to reason at scale emerges, they take on a new role: the ground truth for testing what reasoning models produce, and the environment in which those models operate, get feedback, and improve. Tara, our autonomous research agent, is embedded in our ever-expanding datasets, lab-generated and synthetic, and built to test and evolve the hypotheses frontier models generate, matching the pace at which they produce new ideas. By keeping those models grounded in a vast space of high-precision biological data, we believe we can compound biological reasoning and close the impedance mismatch between hypothesis generation and validation.

Nima Alidoust

27,670 views • 29 days ago

Palantir CEO Alex Karp on what customers actually want, the real business of frontier labs, and the importance of open source models: “What the technical customers want is control over their compute, their models, their data stack, and their alpha. They want to know they own the means of production, and it's not being transferred to someone else.” "Who owns the data? Are the prompts secure? Is this being transferred to you?" "If it was so valuable, and I can make you a billion dollars, wouldn't I say I'll make you a billion dollars and I want 30%? Why are they charging for tokens if it's so valuable?"

Palantir

4,493,409 views • 27 days ago