Prompting is cool and all, but isn't it a waste of compute to encode a prompt over and over again? We learn to compress prompts up to 26x by using "gist tokens", saving memory+storage and speeding up LM inference: (w/ Xiang Lisa Li and noahdgoodman) 🧵