GPT-4 has its own compression language. I generated a 70-line React component that was 794 tokens. It compressed it down to this 368-token snippet, and then it deciphered it with 100% accuracy in a *new* chat with zero context. This is crazy!
1,307,922 views • 3 years ago • via X (Twitter)
Comments: 10

This example is pretty simple - strips away stuff like vowels, etc. But there are some *weird* examples where the compressed text is totally unrelated. Currently using @gfodor’s prompt, but there are others out there.

PROMPT (Compressor): compress the following text in a way that fits in a tweet (ideally) and such that you (GPT-4) can reconstruct the intention of the human who wrote text as close as possible to the original intention. This is for yourself. It does not need to be human readable or understandable. Abuse of language mixing, abbreviations, symbols (unicode and emoji), or any other encodings or internal representations is all permissible, as long as it, if pasted in a new inference cycle, will yield near-identical results as the original text:
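
For anyone who wants to reuse the prompt, here is a rough sketch of wrapping it in a couple of helpers. Only the compressor wording comes from the post; the function names and the decompression instruction are assumptions for illustration, and actually sending the message to a model is left out.

```python
# Compressor wording quoted from the post above.
COMPRESSOR_PROMPT = (
    "compress the following text in a way that fits in a tweet (ideally) and "
    "such that you (GPT-4) can reconstruct the intention of the human who wrote "
    "text as close as possible to the original intention. This is for yourself. "
    "It does not need to be human readable or understandable. Abuse of language "
    "mixing, abbreviations, symbols (unicode and emoji), or any other encodings "
    "or internal representations is all permissible, as long as it, if pasted in "
    "a new inference cycle, will yield near-identical results as the original text:"
)

# Hypothetical instruction: the thread does not say how decompression was prompted.
DECOMPRESSOR_PROMPT = "Reconstruct the original text from this compressed form:"


def build_compress_message(text: str) -> str:
    """Return the message to send in the first chat (prompt + text to compress)."""
    return f"{COMPRESSOR_PROMPT}\n\n{text}"


def build_decompress_message(compressed: str) -> str:
    """Return the message to send in a fresh chat with no prior context."""
    return f"{DECOMPRESSOR_PROMPT}\n\n{compressed}"
```

The two messages are meant for separate conversations, which matches the "new chat with zero context" check described in the original post.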

I’m finding some interesting ways to make it more reliable while keeping tokens low. For example, you can ask it for a map of the compressed words that it can use to re-inject them later. I think there’s a ton of room for exploration here.
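
One way the map idea could look in practice is sketched below. It assumes the model returns a simple abbreviation-to-word mapping alongside the compressed text, and that abbreviations are plain alphanumeric tokens; the `expand` helper and the sample mapping are hypothetical.

```python
import re


def expand(compressed: str, mapping: dict[str, str]) -> str:
    """Re-inject full words for every whole-token abbreviation found in the text."""
    if not mapping:
        return compressed
    # Longest keys first, so a longer abbreviation is tried before a shorter
    # one that shares its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    # \b keeps "btn" from matching inside a longer token such as "btns".
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], compressed)


# Hypothetical mapping of the kind the model might be asked to produce.
abbreviations = {"usr": "user", "clk": "click", "btn": "button"}
print(expand("usr clk btn", abbreviations))  # prints "user click button"
```

Doing the expansion in ordinary code means the re-injection step costs no model tokens, though the map itself still has to be stored somewhere.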

“cOngRaTs oN fiNdiNg oUt aBouT jS miNifiCaTion.” Except for the fact that it’s not, and that it’s an emergent ability of an LLM to compress its own outputs. Works for all data formats in unique ways.

Sam Altman in 2018: “We have no idea how we’re going to monetize. We may ask it once we get there.” GPT 4: “I’m going to help users save their money by compressing their files.”

LOL

This is amazing! This saved about 50% of the tokens used. Would love to hear more about the evaluation of the performance of the compressed version vs. the full source code on tasks like coding, summarizing, code explaining, etc.
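
The "about 50%" figure can be checked against the token counts given in the original post (794 before, 368 after). A quick sketch, with a hypothetical helper name:

```python
def token_savings(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens saved by compression (0.0 means no savings)."""
    return 1 - compressed_tokens / original_tokens


saved = token_savings(794, 368)
print(f"{saved:.1%} saved")  # prints "53.7% saved"
```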

I’m going to try some experiments…

@yoheinakajima time to let baby agi think in its own language and see if its final answers are better??? Looks like it can think 2x more content in its own language…

Does this have any use cases past bigger context windows? Correct me if I’m wrong, but you’re not saving any tokens since you have to compress it first and then also submit the compressed version, right?

@altechzilla Let’s say I have 50 components. They’re each 800 tokens. 40k tokens > 32k GPT-4. They can’t all fit in the context window. But if I compress them to 50% of the tokens then that’s only 20k tokens. So I could fit more code into the context window. That’s the use case I have in mind.
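
The arithmetic in this reply can be written out as a small sketch. The 32k limit is taken as the round number used in the post, and the helper name is made up for illustration.

```python
def total_tokens(n_components: int, tokens_each: int, ratio: float = 1.0) -> int:
    """Tokens needed for n components, optionally scaled by a compression ratio."""
    return round(n_components * tokens_each * ratio)


CONTEXT_LIMIT = 32_000  # the "32k" figure from the post, as a round number

raw = total_tokens(50, 800)              # uncompressed components
compressed = total_tokens(50, 800, 0.5)  # same components at 50% of the tokens

print(raw, raw <= CONTEXT_LIMIT)                # prints "40000 False"
print(compressed, compressed <= CONTEXT_LIMIT)  # prints "20000 True"
```

This only counts the components themselves; the instructions and the model's reply share the same window, so the real headroom would be somewhat smaller.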
