Today, AWS CEO Matt Garman announced Nova Forge, a model builder which lets companies inject their own data during the pre-training phase. "You [tell Forge]: 'Here's my corpus of corporate data, here's everything I need to know about my industry.' We then mix that in and finish pre-training the...

TBPN

1,129,486 subscribers

97,010 views • 7 months ago • via X (Twitter)

Education • News and Politics • Science and Technology

Related videos

Meet Amazon Nova Forge, the easiest and most cost-effective path to your own frontier models.
* Early Nova checkpoints across pre-training, mid-training, and post-training phases
* Blend proprietary data with Amazon Nova-curated training data
* Reinforcement Fine Tuning (RFT) with reward functions in your environment
* Custom content moderation settings

Amazon Web Services

2,344,052 views • 7 months ago

Still following your human intuition to mix corpora from different sources for language model pre-training 🧠? Everyone says that data mixture has a big impact on model performance, but how - and why🕵️? Did you know that web corpora are actually highly impactful for downstream tasks 🏆? Let's check out our preprint "RegMix: Data Mixture as Regression for Language Model Pre-training" 📄 🔬In this paper, we've proposed an automatic data mixture method RegMix that achieves a 6.3% improvement over human selection on the widely used HellaSwag benchmark - and it needs only 2% extra training FLOPs! 📈 Details in the thread 🧵

Qian Liu

54,961 views • 2 years ago

New short course on Fine-tuning LLMs! Many developers are moving beyond only prompting, to also fine-tuning LLMs - that is, taking a pre-trained model and training it further on your own data, which can deliver superior results inexpensively. In this course, Sharon Zhou, CEO of Lamini (disclosure: I’m a minor shareholder) shows you how to recognize when fine-tuning can help, and how to train an open-source LLM on your own data. I hope you enjoy the course!

Andrew Ng

502,821 views • 2 years ago

We asked Sholto Douglas from Anthropic about the costs of RL (Reinforcement Learning) runs. "In Dario Amodei's essay, he said that RL runs cost only $1M back in December." "RL is more naively parallelizable and scalable than pre-training." "With pre-training, you need everything in one big data center ideally. For RL, in theory, you could scale all over the world."

TBPN

76,696 views • 1 year ago

Tether Data, AI model training platform preview. This PaaS will be available to any company interested in (pre-)training its own models. Bonus: at the core of this platform we're leveraging Holepunch's tech for all data structures to make training and models highly resilient and unstoppable. Soon available via Northern Data Group, leveraging 24k+ H100 GPUs.

Paolo Ardoino 🤖

28,092 views • 1 year ago

Building a truly private AI model isn’t as simple as the tech world wants you to believe. Despite all the marketing around one-click deployments, RAG systems, fine-tuning services, and memory features, we still don’t have models that are genuinely trained only on your personal data with complete privacy from third parties. The reality is that most AI solutions today require you to trust someone else with your books, notes, journals, and training data. Even when companies promise privacy, there’s usually a way for the provider or other parties to access your information during training or inference. Phala is taking a different approach by building the infrastructure needed for real data ownership. They’re developing confidential runtime environments where your AI training happens inside secure enclaves with attestation guarantees. The key innovation is their in-enclave keying system through dstack, which means that once your model is training, even Phala themselves cannot see your data or model weights.

soulman 🎮

11,702 views • 10 months ago

“don’t train your own model” is common ai advice. it's wrong. your token bill's the proof. today, we’re excited to launch castform into open preview. castform is the easiest way for you to train your own model, on your own data. open-weights models are performant and much cheaper. when trained on your task & proprietary data, they beat closed models. the thing standing between you and that was weeks of plumbing & years of ml expertise. with castform, model training is as simple as prompt engineering. bring your agent traces or raw corpora. castform turns them into training data, picks the right algorithmic recipes, manages gpus, and gives you an ide to watch and chat with your model as it learns. see what you can build with castform👇

girish

456,024 views • 1 month ago

We just launched @ai_browser's pre-beta – it's a browser that contains your very own team of AI interns that you can teach to do your grunt work. Here's me spinning up hundreds of parallel research agents to augment sheet data, right in the browser.

Charles Maddock

52,477 views • 1 year ago

Real-world robot data is expensive and slow to collect, creating a major challenge for humanoid development. 🤖 The NVIDIA GR00T N1.6 open vision language action model is pre-trained on a diverse mix of data, including thousands of hours of Stanford Vision and Learning Lab’s BEHAVIOR simulation data, which covers long-horizon everyday manipulation tasks. This diverse training is the key to robust cross-embodiment performance and real-world adaptability. 🌍 Read the blog 🔗

NVIDIA Robotics

13,456 views • 6 months ago

We asked Angus about the future of robotics and AI training. “I built an industrial-grade kinematic solver for 6 degrees of freedom motion, with full path planning and joint control." "You can stream AI model outputs into it, or run it like a traditional industrial robot." "The benefit of that is all of that information you can stream and record from those joints… you can use that as the training data." "Frankly, moving a robot with a lever is really crap data for training robots. But this gives you smooth, high-quality motion, without needing two robots.”

TBPN

18,811 views • 1 year ago

In case you missed it, we recently launched "Post-training of LLMs," a short course where you'll:
✅ Understand when and why to use post-training methods like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning.
✅ Learn the concepts underlying the three post-training methods of SFT, DPO, and Online RL, their common use-cases, and how to curate high-quality data to effectively train a model using each method.
✅ Download a pre-trained model and implement post-training pipelines to turn a base model into an instruct model, change the identity of a chat assistant, and improve a model’s math capabilities.
Learn more and enroll for free:

DeepLearning.AI

16,779 views • 1 year ago

Google's Jeff Dean says current pre-training is passive: initialize a model, stream the internet past it, let it observe. But models need to learn not just from data, but by acting, predicting, and choosing what to learn from next. "we have this artificial distinction now between pre and post-training, and it shouldn't exist long term"

Haider.

55,694 views • 3 months ago

🚨 Jensen Huang says everyone panicked about the AI data when MOST training data was never REAL to begin with. Ilya Sutskever told the industry pre-training was over. "Ilya said, 'We're out of data,' or something like that. 'Pre-training is over,' or something like that," Huang says. "The industry panicked, you know, that this is the end of AI." "And of course, of course that's obviously not true. We're gonna keep on scaling the amount of data that we have to train with." "A lot of that data is probably gonna be synthetic." That's where the panic came from — synthetic data sounds like cheating. "Most of the data that we are training, that we teach each other with, inform each other with, is synthetic." "It's synthetic because it didn't come out of nature." "You created it. I'm consuming it. I modify it, augment it, I regenerate it, somebody else consumes it." The textbook in your hand is synthetic. The post you're reading is synthetic. The lecture you took is synthetic. Nature didn't make any of it. Humans did. AI just learned to do the same thing — faster. "Training is now limited by compute," Huang says. "Data is now limited by compute." The data wall wasn't a wall. It was a mirror. If you're new here, follow @AiEvolutio for the latest on ChatGPT, Claude, and the AI tools shaping how we work and create. — Jensen Huang, NVIDIA CEO, on Lex Fridman's podcast

AI Evolution

15,565 views • 1 month ago

Lightspeed's Bucky Moore says the real opportunity in the AI app layer is in large industries far enough afield from where the model providers are today — and where the context engineering to get customer data into the model is extremely nuanced and messy. "I think this is kind of the elephant in the room right now — whether post-training open-source models combined with the unique user feedback you get from being an application provider is defensible enough." "That is going to be an inevitable challenge for any of these industries that hit a maturation point of AI adoption, like legal and software engineering have." "But on the other hand, there are some industries where they're very large, they're far enough afield from where the model providers are today — and probably will continue to be — and the context engineering to actually get the customer data into the model is just so messy. It requires going across different business functions, it requires a lot of hands-on forward-deployed engineering." "Those are the kind of companies that we get really excited about. Because I think being really good at that is not only defensible, but it also allows you to generate a feedback loop with your customers, where you hear a lot of their secrets. And those secrets allow you to feed that back into how you make your product better at the expense of anyone else playing in the space. Because if you're serving the customer, they're only serving you those secrets." "I think Palantir is a good example of this in the pre-AI era, and I think we're going to see many companies ascend in that same way."

TBPN

46,746 views • 4 months ago

Engram cofounder Jack Morris just raised $98M to build a new type of AI. He says models don't need to get smarter over time. Instead, they just need to know you better and better over time. Jack describes what he's building: "Our product is a new type of AI. We have a pretty different vision from a lot of the frontier labs, which are working on one model per lab, and trying to make that model smarter every month." "There's another way to think about it, which is that the model doesn't need to get smarter every month. It needs to know you better." "So we're working on a whole different stack, which is a way to train models that train themselves to know your world better and adjust to the things that you say." "So: new ways of training, new ways of running the models."

TBPN

69,035 views • 29 days ago

David Friedberg says Anthropic asked big pharma for their data and nearly everyone said no. "There's been an effort by Anthropic to sign up life sciences companies to contribute to a new life sciences focused model. They're approaching these large companies with large proprietary data sets and saying, if you share your data, we will give you early access, some sort of proprietary value. Sign this NDA and you can participate with us." "I think nearly everyone I've spoken with has woken up to the fact that they are trying to commoditize everyone's business. If all of the tens of billions of dollars you have invested in experiments and product development, and you've generated all of this proprietary data along the way, that data is a true asset of your organization. It's an asset that you've spent billions of dollars developing." "And by handing it over to a model company to then combine with other people's data, you are commoditizing the one core differentiation that you have. And so everyone is largely saying no." "I think what everyone's realizing is they're better off developing their own weights and their own models using either an open source basis or there might be some intermediary business model that evolves."

dnap

286,036 views • 23 days ago

Perplexity CEO Aravind Srinivas on the biggest threat to the data center industry: It's not competition. It's not regulation. It's decentralisation. "The biggest threat to a data center is if the intelligence can be packed locally on a chip that's running on the device and then there's no need to inference all of it on like one centralized data center." He outlines how this could work in practice. Personalisation doesn't necessarily require on-device model training. Retrieval augmented generation, tool calls, and local data can already tailor AI to individual users. But the real unlock? Test time training. Aravind Srinivas describes a future where AI lives on your device, watches how you work and gradually automates your repetitive tasks. "Imagine we crack test time training where the AI watches tasks you repeatedly do on your local system, adapts to you over time and starts automating a lot of the things you do." The key insight: in this model, the intelligence belongs to you. It's your data, your device, your personalised AI brain. And if that future arrives, the economics of centralised infrastructure start to collapse. "That really disrupts the whole data center industry. It doesn't make sense to spend all this money, 500 billion, 5 trillion, whatever on building all the centralized data centers across the world that do a lot of the intelligence workloads for people." The companies spending trillions on centralised infrastructure may want to rethink where intelligence actually needs to live.

Big Brain AI

90,102 views • 5 months ago

oh my.. this shouldn't be possible. a 1B model that runs inside your browser, beats every model its size, and comes with its own desktop pet. MiniCPM-5 1B just changed the game for on-device AI. here's everything you need to know 🧵

Muhammad Ayan

62,050 views • 2 months ago

Tony Blair and Oracle co-founder Larry Ellison plan to use digital ID to "unify" all data on each country's citizens "so it can be consumed and used by" their AI models. "We have to take all of this data... and move it into a single... unified data platform." "When we want to ask a question, we've provided that AI model with all the data they need to understand our country." "We need to unify all of the national data, put it into a database where it's easily consumable by the AI model, and then ask whatever question you like."

Wide Awake Media

54,084 views • 8 months ago