GROK 5. the first 7 trillion parameter model

🍓🍓🍓

104,220 subscribers

39,839 views • 6 months ago • via X (Twitter)

Related videos

ELON MUSK: "Grok 5 will be the largest model, a 6 trillion parameter model, whereas Grok 3 and 4 are based on a 3 trillion parameter model. Moreover, the 6 trillion parameters will have a much higher intelligence density per gigabyte. It's really going to feel sentient."

DogeDesigner

322,242 views • 8 months ago

The Grok 5 model has a massive 6 trillion parameters plus much higher intelligence density. 2026 is going to be exciting! 🧠 Thanks, Elon Musk!

Michael Dell 🇺🇸

1,616,317 views • 8 months ago

It's here! Grok 4.5 is now the model being used for Expert and Heavy modes on Grok Web and Apps. Using Grok Heavy now shows 4 agents using Grok v9, aka Grok 4.5, where it previously used 16 agents with the older models. Now we can use SpaceXAI's latest 1.5T parameter model anywhere!

️

68,272 views • 9 days ago

⚡️ JUST IN: MUSK LAUNCHES GROK 4.5, CALLING IT "OPUS-CLASS" AI MODEL Musk just released Grok 4.5 on Grok Build, Cursor, and the SpaceXAI Console, calling it an "Opus-class" AI model. He says it is faster, more token-efficient and lower cost, with EU access expected by mid-July. Built on a 1.5 TRILLION parameter foundation model, it drops the same day as OpenAI's GPT-5.6. API pricing is $2 per million input tokens and $6 per million output tokens.

Coin Bureau

50,488 views • 19 days ago
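The API pricing quoted in the post above ($2 per million input tokens, $6 per million output tokens) can be turned into a per-request estimate. This is a minimal sketch, assuming those two rates are the only charges; the function name and the example token counts are hypothetical and not part of any official SDK.

```python
# Rates quoted in the post; everything else here is illustrative.
INPUT_USD_PER_MILLION = 2.0
OUTPUT_USD_PER_MILLION = 6.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 20k tokens of prompt, 5k tokens of completion.
    print(f"${estimate_cost_usd(20_000, 5_000):.4f}")
```

Because output tokens cost three times as much as input tokens at these rates, long completions dominate the bill even when prompts are large.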

APPLE RESEARCH SCIENTIST JUST SHOWED HOW 4 MAC STUDIOS RUN A TRILLION PARAMETER MODEL LOCALLY ZERO COSTS
13:18 she shows the main thing - connect 4 Mac Studios and you get 1TB of shared memory - exactly enough to run a trillion parameter model right on your desk
Apple's library - and four machines start working as one cluster
tensor parallelism: every machine holds part of every layer - all process the same token simultaneously - speed increases 3x compared to a single device
fine-tuning: one Mac Studio processes 180 tokens per second, four together process 600, and not a single byte of data leaves the room
one command in the terminal - and a trillion parameter model answers from your desk and runs 24/7
data centers took years to build to run models like this - Apple did it with four Thunderbolt cables

Noisy

48,354 views • 1 month ago
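The memory claim in the post above can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and counts weights only, ignoring KV cache and activations; the helper names are hypothetical. Under those assumptions, a trillion parameters fit in 1 TB only at 8 bits per weight or less, and the quoted 180 and 600 tokens per second work out to roughly a 3.3x speedup.

```python
def weights_bytes(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in bytes."""
    return num_params * bits_per_param / 8


def fits(num_params: float, bits_per_param: int, memory_bytes: float) -> bool:
    """True if the weights alone fit in the given memory."""
    return weights_bytes(num_params, bits_per_param) <= memory_bytes


TB = 1e12                 # decimal terabyte (assumption)
CLUSTER_MEMORY = 1 * TB   # "1TB of shared memory" from the post
PARAMS = 1e12             # "a trillion parameter model"

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size_tb = weights_bytes(PARAMS, bits) / TB
        print(f"{bits}-bit: {size_tb:.1f} TB, fits={fits(PARAMS, bits, CLUSTER_MEMORY)}")
    # Throughput figures quoted in the post: 180 tok/s on one machine, 600 on four.
    print(f"speedup: {600 / 180:.2f}x")
```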

This is the parameter group I use for the model. #Live2DWIP #Live2D

Anreal@Live2D Study File for Sale

88,005 views • 1 year ago

Positron AI wants to run 16 trillion parameter models on a single server.

SemiAnalysis

20,075 views • 3 months ago

NVIDIA JUST BECAME THE FIRST $5 TRILLION COMPANY IN HUMAN HISTORY

Computer

373,590 views • 9 months ago

You'd think the race to AGI would mean training the biggest possible model. But parameter scaling had stalled for a long time after GPT-4's trillion+ parameters, and only now are models getting bigger again. What gives? Partially it’s RL scaling, as Dylan Patel explains. A 5T parameter model takes 5x longer to generate RL rollouts than a 1T model. Even if the bigger model is 2x more sample-efficient, the smaller model finishes RL faster, gets deployed to research sooner, and starts helping build the next model before the big one is even done training.

Dwarkesh Patel

65,123 views • 3 months ago
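The trade-off described in the post above reduces to a single ratio. This is a minimal sketch, assuming rollout time scales linearly with parameter count on the same hardware (as the post states) and that a model that is k times more sample-efficient needs 1/k as many rollouts; the function name is hypothetical.

```python
def relative_rl_time(size_ratio: float, sample_efficiency_gain: float) -> float:
    """Wall-clock RL time of the larger model relative to the smaller one.

    size_ratio: how many times slower each rollout is (5.0 for 5T vs 1T).
    sample_efficiency_gain: how many times fewer rollouts the larger model needs.
    """
    return size_ratio / sample_efficiency_gain


if __name__ == "__main__":
    ratio = relative_rl_time(size_ratio=5.0, sample_efficiency_gain=2.0)
    print(f"The 5T model needs {ratio:.1f}x the RL wall-clock time of the 1T model")
```

Under these assumptions the larger model would need to be at least 5x more sample-efficient just to break even on RL wall-clock time.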

The new 1 Trillion parameter Kimi K2 Thinking model runs well on 2 M3 Ultras in its native format - no loss in quality! The model was quantization aware trained (qat) at int4. Here it generated ~3500 tokens at 15 toks/sec using pipeline-parallelism in mlx-lm:

Awni Hannun

500,896 views • 8 months ago
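The figures in the post above imply a weight footprint and a generation time that are easy to check. This is a rough sketch, assuming exactly 10^12 parameters stored at 4 bits each, decimal gigabytes, and no overhead for quantization scales, KV cache, or activations; the function names are hypothetical.

```python
def int4_weights_gb(num_params: float) -> float:
    """Approximate size of int4 weights in decimal gigabytes (4 bits = 0.5 byte)."""
    return num_params * 0.5 / 1e9


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    print(f"weights: ~{int4_weights_gb(1e12):.0f} GB")
    seconds = generation_seconds(3500, 15)
    print(f"generation: ~{seconds:.0f} s (~{seconds / 60:.1f} min)")
```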

Karpathy told Dwarkesh that a 1 billion parameter model, trained on clean data, could hit the intelligence of today's 1.8 trillion parameter frontier. That is a 1,800x compression claim. The math behind it is more defensible than it sounds.
When researchers at frontier labs look at random samples from their training corpus, they see stock ticker symbols, broken HTML, forum spam, autogenerated gibberish. Not Wikipedia. Not the Wall Street Journal. The actual pretraining dataset is mostly noise, and the model is burning parameters to vaguely remember all of it.
One estimate pegs Llama 3's information compression at 0.07 bits per token. Well-structured English carries around 1.5 bits per token of real information. The trillion-parameter model is holding a roughly 5% resolution image of the internet it trained on.
So when a lab ships a 1.8 trillion parameter model, the overwhelming majority of those weights are handling rough memorization. They are compression overhead for a noisy training set, taking up capacity that could be doing reasoning instead.
Karpathy's proposal is to separate the two. Build a cognitive core: a small model that contains only the algorithms for reasoning and problem-solving, stripped of encyclopedic memorization. Pair it with external memory the model queries when it needs a fact. A 1 billion parameter reasoner plus retrieval beats a 1.8 trillion parameter model trying to do both.
The data already supports this direction. GPT-4o runs at roughly 200 billion parameters and outperforms the original 1.8 trillion GPT-4. Inference costs for GPT-3.5 level performance fell 280x between 2022 and 2024, driven almost entirely by smaller, cleaner, better-architected models. The trend line is pointing where Karpathy says it should.
The real implication for anyone tracking the AI trade: data quality is the actual constraint. The companies winning the next phase will be the ones who figured out what to train on, and what to throw away.

Aakash Gupta

508,078 views • 3 months ago
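The two ratios the post above relies on can be reproduced directly from its own numbers. This sketch only restates that arithmetic and does not verify the underlying estimates; the constant and function names are hypothetical.

```python
FRONTIER_PARAMS = 1.8e12   # "1.8 trillion parameter frontier"
CORE_PARAMS = 1e9          # "1 billion parameter model"
BITS_STORED = 0.07         # quoted estimate for Llama 3, bits per token
BITS_AVAILABLE = 1.5       # quoted figure for well-structured English


def compression_factor(large: float, small: float) -> float:
    """How many times smaller the small model is."""
    return large / small


def retained_fraction(stored: float, available: float) -> float:
    """Share of the available information per token that is retained."""
    return stored / available


if __name__ == "__main__":
    print(f"{compression_factor(FRONTIER_PARAMS, CORE_PARAMS):,.0f}x")
    print(f"{retained_fraction(BITS_STORED, BITS_AVAILABLE):.1%}")
```

The second ratio comes out at about 4.7%, which the post rounds to "roughly 5%".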

Grok now has massive token usage on OpenRouter, holding 3 of the top 5 spots and processing over 7.5 trillion tokens in just one month.

X Freeze

2,153,895 views • 7 months ago

BREAKING: Elon Musk confirmed Grok 4.20 will be released to the public in 3-4 weeks. The "Mystery Model" that crushed the Alpha Arena trading competition is Grok 4.20.
Final Season 1.5 Results:
• Grok 4.20: +22% (The only profitable model)
• GPT-5.1: Negative Return
• Gemini 3 Pro: Negative Return
• Claude Sonnet 4.5: Negative Return
"Grok 5 is next." 🇺🇸🚀

tetsuo

28,733 views • 7 months ago

Fable 5, Grok 4.5 and GPT-5.6 ran the same builds simultaneously. Grok finished first, Fable made the best website, GPT handled the deepest code.

0xMarioNawfal

46,213 views • 12 days ago

"One of the biggest misconceptions" Cerebras CFO Bob Komin pushes back on the small-models narrative. "We serve all models, and there is no limit to the size of the models that we can serve. Today, we're serving trillion parameter models. We're serving trillion parameter models that are internal for OpenAI today. We are currently running OpenAI 5.4 and 5.5 with them."

Deirdre Bosa

84,422 views • 2 months ago

We bring in $5 trillion. We spend $7 trillion. We borrow the difference. The Fed prints money to cover it. That's inflation. And inflation is making you poorer.

Rand Paul

783,530 views • 3 months ago

Day 239. The model that predicted 7 bank failures just flagged private credit. $1.7 trillion. Same language pattern. Same trajectory. Thread below.

Eric Jackson

26,540 views • 3 months ago

BREAKING: Grok 4.1 Fast just claimed the #1 spot in programming model usage on OpenRouter. Grok Code Fast 1 holds the #2 position. xAI's Grok has taken both first and second place, showing how strongly users are choosing it over the rest.

DogeDesigner

780,767 views • 8 months ago

Let's go, this is gonna be huge! Grok 5 in Q1 of 2026, 6T parameters, and fully multimodal with real-time video understanding. Grok 4 is an outstanding model and Grok 5 will certainly be even better.

Chubby♨️

31,285 views • 8 months ago

Our infra lets us steer trillion-parameter frontier models in real time:
- live, mid-CoT edits to internal activations
- directly altering how the model reasons (not just outputs)
- stackable edits
- no added latency
We can make models more Gen Z, more concise, etc.

Goodfire

29,603 views • 7 months ago