We have a new, embarrassingly simple method for generating sprites and levels from text. It's also fast. We call it the five-dollar model. Read the paper: And see it in action:
55,028 views • 2 years ago • via X (Twitter)

By training on a few hundred human-created 2D levels and their annotations, we built a model that reliably generates levels matching simple text prompts. In milliseconds.

We also got good results generating sprites by training on PICO-8 sprites with simple descriptions, and decent results from training on emojis.

So what's the neural network architecture like? Well, here's where it gets embarrassing: it's just, like, a feedforward network. No adversarial training, no diffusion. Basically vintage 1990s neural networks, plus a transformer for the text embedding, of course.
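To make "just a feedforward network" concrete, here is a minimal plain-Python sketch of what a generator of this kind could look like. It is a two-layer MLP that maps a precomputed text embedding to per-cell tile logits, then picks the highest-scoring tile in each cell. Everything here is an illustrative assumption, not the paper's configuration: the layer sizes, the tile vocabulary, the ReLU activation and the random, untrained weights. In a real system, the embedding would come from the pretrained text-embedding transformer.

```python
import math
import random

# Hypothetical sizes; the paper's embedding size, level size and tile set may differ.
EMBED_DIM = 16
HIDDEN = 32
WIDTH, HEIGHT = 10, 10
NUM_TILES = 4  # illustrative tile vocabulary, e.g. floor/wall/water/tree


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Compute W x + b, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    return out


def generate_level(embedding, layers):
    """Map a text embedding to a HEIGHT x WIDTH grid of tile indices."""
    hidden = dense(embedding, layers[0], activation="relu")
    logits = dense(hidden, layers[1])  # WIDTH * HEIGHT * NUM_TILES scores
    grid = []
    for y in range(HEIGHT):
        row = []
        for x in range(WIDTH):
            start = (y * WIDTH + x) * NUM_TILES
            cell = logits[start:start + NUM_TILES]
            # Pick the highest-scoring tile for this cell.
            row.append(max(range(NUM_TILES), key=cell.__getitem__))
        grid.append(row)
    return grid


rng = random.Random(0)
layers = [init_layer(EMBED_DIM, HIDDEN, rng),
          init_layer(HIDDEN, WIDTH * HEIGHT * NUM_TILES, rng)]
# Stand-in for a real sentence embedding of a prompt.
fake_embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
level = generate_level(fake_embedding, layers)
```

A single forward pass produces the whole level at once. There is no iterative denoising or sampling loop, which is why generation takes milliseconds.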

One thing we do that might feel a bit like cheating: in the training phase, we augment the data by using GPT-4 to reformulate the level captions, which gives us a much larger training set. We also apply standard augmentations to the levels and sprites.
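The thread doesn't say which "standard augmentations" were used. The sketch below is therefore an assumption: geometric transforms such as flips and rotations, crossed with caption paraphrases, to show how the training set multiplies. Here the paraphrases are just a hypothetical dictionary standing in for rewrites produced offline by an LLM.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]


def identity(grid: Grid) -> Grid:
    return [list(row) for row in grid]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror the level left-to-right."""
    return [list(reversed(row)) for row in grid]


def rotate_90(grid: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def augment(
    dataset: List[Tuple[Grid, str]],
    paraphrases: Dict[str, List[str]],
    transforms: List[Callable[[Grid], Grid]],
) -> List[Tuple[Grid, str]]:
    """Pair every geometric variant of a level with every caption variant."""
    out = []
    for grid, caption in dataset:
        captions = [caption] + paraphrases.get(caption, [])
        for transform in transforms:
            variant = transform(grid)
            for text in captions:
                out.append((variant, text))
    return out


level = [[1, 2], [3, 4]]
data = augment([(level, "a lake in a field")],
               {"a lake in a field": ["a field with a small lake"]},
               [identity, flip_horizontal])
print(len(data))  # 4
```

Which transforms are safe depends on the captions. A flipped level no longer matches a caption like "water on the left", so spatially specific captions limit which geometric augmentations can be applied.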

However, at inference time we don't do anything that wouldn't run comfortably on an old cellphone or a Nintendo Switch.
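A rough way to see why the generator is cheap is to count its parameters and multiply-accumulate operations. The layer sizes below are hypothetical: a 384-dimensional embedding, one hidden layer of 256, and a 10x10 level with 16 tile types. The count covers only the feedforward generator, not the text-embedding transformer, which is likely the larger cost at inference.

```python
def mlp_cost(layer_sizes):
    """Return (parameters, multiply-accumulates) for a dense MLP.

    layer_sizes lists the width of each layer, input first.
    """
    params = 0
    macs = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        params += n_in * n_out + n_out  # weights + biases
        macs += n_in * n_out            # one multiply-add per weight
    return params, macs


# Hypothetical sizes, not taken from the paper.
params, macs = mlp_cost([384, 256, 10 * 10 * 16])
print(params, macs)  # 509760 507904
```

About half a million multiply-adds per level is a trivial workload for any modern CPU, including low-power mobile ones.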

As a bonus, the generator shows some nice compositional properties in embedding space.
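The thread doesn't describe how compositionality was tested. One common probe, which is an assumption here and not necessarily the paper's procedure, is to blend the embeddings of two prompts. The blended vector is then fed to the trained generator to check whether the output mixes features of both. The toy 3-dimensional vectors below stand in for real prompt embeddings.

```python
from typing import List

Vector = List[float]


def lerp(a: Vector, b: Vector, t: float) -> Vector:
    """Linear interpolation: t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def combine(vectors: List[Vector]) -> Vector:
    """Element-wise mean of several prompt embeddings."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


# Toy stand-ins for embeddings of two prompts.
lake = [1.0, 0.0, 0.0]
forest = [0.0, 1.0, 0.0]
print(lerp(lake, forest, 0.5))   # [0.5, 0.5, 0.0]
print(combine([lake, forest]))   # [0.5, 0.5, 0.0]
```

Sweeping `t` from 0 to 1 and rendering each result is a simple way to see whether the generator moves smoothly between the two prompts.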

This paper is led by @BigTimeMothy with contributions from Roman Negri, @dipikarajesh18, @MasterMilkX, and yours truly. It will be presented at @AIIDEconference in October. Read the full paper here:

...the name? You want to know why we gave a neural network such a peculiar, pecuniary name? Well, Tim and Roman made a five-dollar bet on whether this architecture would work. It just seemed too simple.

Roman, one of the authors, just got himself a Twitter account: @romanoneg

Code and model here: The repository may still be somewhat in progress; if something is missing, ping @BigTimeMothy and he'll fix it.
