ๆญฃๅœจๅŠ ่ฝฝ่ง†้ข‘...

่ง†้ข‘ๅŠ ่ฝฝๅคฑ่ดฅ

GROK 5. the first 7 trillion parameter model

39,839 ๆฌก่ง‚็œ‹ โ€ข 5 ไธชๆœˆๅ‰ โ€ขvia X (Twitter)

0 ๆก่ฏ„่ฎบ

ๆš‚ๆ— ่ฏ„่ฎบ

ๅŽŸๅง‹ๅธ–ๅญ็š„่ฏ„่ฎบๅฐ†ๆ˜พ็คบๅœจ่ฟ™้‡Œ

็›ธๅ…ณ่ง†้ข‘

Karpathy told Dwarkesh that a 1 billion parameter model, trained on clean data, could hit the intelligence of today's 1.8 trillion parameter frontier. That is a 1,800x compression claim. The math behind it is more defensible than it sounds. When researchers at frontier labs look at random samples from their training corpus, they see stock ticker symbols, broken HTML, forum spam, autogenerated gibberish. Not Wikipedia. Not the Wall Street Journal. The actual pretraining dataset is mostly noise, and the model is burning parameters to vaguely remember all of it. One estimate pegs Llama 3's information compression at 0.07 bits per token. Well-structured English carries around 1.5 bits per token of real information. The trillion-parameter model is holding a roughly 5% resolution image of the internet it trained on. So when a lab ships a 1.8 trillion parameter model, the overwhelming majority of those weights are handling rough memorization. They are compression overhead for a noisy training set, taking up capacity that could be doing reasoning instead. Karpathy's proposal is to separate the two. Build a cognitive core: a small model that contains only the algorithms for reasoning and problem-solving, stripped of encyclopedic memorization. Pair it with external memory the model queries when it needs a fact. A 1 billion parameter reasoner plus retrieval beats a 1.8 trillion parameter model trying to do both. The data already supports this direction. GPT-4o runs at roughly 200 billion parameters and outperforms the original 1.8 trillion GPT-4. Inference costs for GPT-3.5 level performance fell 280x between 2022 and 2024, driven almost entirely by smaller, cleaner, better-architected models. The trend line is pointing where Karpathy says it should. The real implication for anyone tracking the AI trade: data quality is the actual constraint. The companies winning the next phase will be the ones who figured out what to train on, and what to throw away.

Aakash Gupta

507,508 ๆฌก่ง‚็œ‹ โ€ข 1 ไธชๆœˆๅ‰