
⿻ Andrew Trask
@iamtrask • 81,706 subscribers
i build & teach AI with attribution-based control @openminedorg @GoogleDeepMind @OxfordUni
Videos

Excited to release a new repo: abcGPT!

It can be hard to "dial in" the voice you want from an LLM, because an LLM is a tangled superposition of millions of voices from millions of different authors around the world. Instead, frontier LLMs tend to give that slop-ish / generic / corporate tone that's hard to avoid, even with aggressive prompting and an informative context window.

Lately I've been experimenting with some ideas on the fringes of attribution/unlearning, trying to make it so an AI user can "dial in" the specific voice/style/sources they want to use in a way that's more rigorous than prompting/context-engineering, and I'm starting to get pretty good results.

The model below uses the following technique:
- Take nanoGPT as written by Andrej Karpathy.
- Assign each neuron a random "specialty score" m between 0 and 1, sampled from a U-shape so most neurons land near 0 or near 1 with some in the middle.
- Freeze this "m" for the lifetime of the network (it's the neuron's permanent corpus assignment).
- Extend the forward() code with an α parameter, a kind of vibe-fader from 0 to 1. Think of each neuron's m as its position on that same slider. The slider acts like a spotlight: it lights up neurons whose m is near its current position, and silences those far away. Slide all the way to 0, and only TinyStories specialists fire. Slide all the way to 1, and only Shakespeare specialists fire.
- Train this new nanoGPT on two datasets (in this case, TinyStories and Shakespeare).
- During training, sample α from Beta(0.5, 0.5) AND draw the corpus from Bernoulli(α), so a Shakespeare batch tends to come with a high-α (Shakespeare-favoring) gate, and a TinyStories batch tends to come with a low-α gate.
- Train until golden brown 🧑‍🍳

Perhaps surprisingly... it works! ¯\_(ツ)_/¯

The neurons we pre-assigned to Shakespeare learn to behave as Shakespeare specialists. The neurons we pre-assigned to TinyStories become children's-story specialists. The halfsies learn to bridge between them.
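The recipe above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the abcGPT code: the post does not give the exact gate formula or the exact U-shaped distribution for m, so the Gaussian "spotlight" with a `width` parameter, the use of Beta(0.5, 0.5) for m, and every function name here are hypothetical. Only the Beta(0.5, 0.5) draw for α and the Bernoulli(α) corpus draw come from the post.

```python
import math
import random


def sample_specialty_scores(n_neurons, rng):
    """Draw one frozen specialty score m per neuron from a U-shaped distribution.

    ASSUMPTION: Beta(0.5, 0.5) is used here because it is U-shaped on [0, 1];
    the post only says "a U-shape".
    """
    return [rng.betavariate(0.5, 0.5) for _ in range(n_neurons)]


def spotlight_gate(m, alpha, width=0.25):
    """Gate value in (0, 1]: close to 1 when m is near alpha, close to 0 far away.

    ASSUMPTION: a Gaussian bump; the real gate shape and width may differ.
    """
    return math.exp(-((m - alpha) ** 2) / (2 * width ** 2))


def gated_activations(activations, scores, alpha, width=0.25):
    """Scale each neuron's activation by its gate, as an extended forward() might."""
    return [a * spotlight_gate(m, alpha, width) for a, m in zip(activations, scores)]


def sample_training_pair(rng):
    """Sample alpha ~ Beta(0.5, 0.5), then pick the corpus with P(Shakespeare) = alpha."""
    alpha = rng.betavariate(0.5, 0.5)
    corpus = "shakespeare" if rng.random() < alpha else "tinystories"
    return alpha, corpus


rng = random.Random(0)
scores = sample_specialty_scores(8, rng)  # frozen for the lifetime of the network
print("specialty scores:", [round(m, 2) for m in scores])
print("gated at alpha=0:", [round(a, 3) for a in gated_activations([1.0] * 8, scores, 0.0)])
print("gated at alpha=1:", [round(a, 3) for a in gated_activations([1.0] * 8, scores, 1.0)])
print("training pairs:", [sample_training_pair(rng) for _ in range(3)])
```

Because the corpus is drawn from Bernoulli(α), high-α gates mostly see Shakespeare batches and low-α gates mostly see TinyStories batches, which is what pushes the pre-assigned neurons toward their specialties.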
After training, you can play with the kind of... vibe dial... you can "dial in" the voice you want during inference by choosing whether to lean on Shakespeare or TinyStories neurons more or less. 📀💿

When you fully dial in Shakespeare neurons, the model only outputs tokens which look like Shakespeare; when you fully dial in TinyStories, the model only outputs tokens which look like children's stories; and... (honestly this was the hard part)... everywhere in between!

In a way, it's partitioning statistical signal into fuzzy segments, and then the end user can choose which pre-training data sources they want to lean upon for generation... and how much.

My goal was to get a version of this working at scale, with clear intuition for why it works, and I'd like to explore ways to scale up this effect to large numbers of sources and larger models, and study the interplay between individuality/generality as scale increases.

Link to repo and a detailed walkthrough of the abcGPT methodology in the reply.
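To get a feel for the inference-time dial, here is a toy sweep over α for three example neurons: a TinyStories specialist (m = 0), a halfsie (m = 0.5), and a Shakespeare specialist (m = 1). It is a sketch only: the Gaussian gate, its `width`, and the names `spotlight_gate` and `dial_mix` are assumptions for illustration, since the post does not specify the gate formula.

```python
import math


def spotlight_gate(m, alpha, width=0.25):
    # ASSUMPTION: Gaussian bump centered on alpha; the real gate may differ.
    return math.exp(-((m - alpha) ** 2) / (2 * width ** 2))


def dial_mix(alpha, width=0.25):
    """Return each example neuron's share of the total gate weight at this alpha."""
    neurons = {"tinystories": 0.0, "halfsie": 0.5, "shakespeare": 1.0}  # hypothetical m values
    weights = {name: spotlight_gate(m, alpha, width) for name, m in neurons.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Sweep the dial from pure TinyStories (0.0) to pure Shakespeare (1.0).
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    shares = dial_mix(alpha)
    print(alpha, {name: round(share, 3) for name, share in shares.items()})
```

In this toy setting, a narrower `width` makes the endpoints more exclusive, while a wider one lets more neurons contribute at intermediate dial positions.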
⿻ Andrew Trask • 131,466 views • 11 days ago

A few days ago I asked if anyone was still interested in decentralized AI. Turns out... yeah! So here's lecture 1: Decentralized AI From Scratch.

We build a peer-to-peer AI from scratch in about 50 lines of Python. It runs on your laptop, answers your friends' WhatsApp messages using your local data, and begins to address the privacy / prompt-injection problem through user-specific context management.
⿻ Andrew Trask • 25,291 views • 2 months ago

IMO, decentralized AI is more than:
- an AI model in the sky, with good external auditing
- an AI model in the sky, which people vote on how to use
- an AI model in the sky, which is free for anyone to use
- open source AI
- federated training

None of these are truly an interface to the world's collective intelligence. Each is actually... *mostly* centralized AI... but with the right ambitions!!!

In this podcast, I lay out what I think a true decentralized AI ecosystem looks like, and my guesses on how to get there. The key use-case is broad listening (the video below describes broad listening).

(Link to full podcast in reply.)
⿻ Andrew Trask • 65,168 views • 9 months ago