They trained an agent program to play Doom (surely demos or existing bots could do ok at this?)
They probably had that lying around anyway. Traditional bots follow pre-programmed paths on a navigation mesh (navmesh). I don’t think you’d get the more organic, exploring gameplay out of those that’s necessary here. They say there’s not enough recorded gameplay from humans.
They use video of the bot playing, along with a list of actions performed, to train an LLM
No, they trained Stable Diffusion 1.4, a so-called latent diffusion model. It does text-to-image; you’re probably aware of such image generators.
They trained it further for their task: to take recent frames and player input as the prompt instead of text. On Google hardware, they get 20 fps out of the AI, which is just enough for humans to play the game, in a sense.
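The shape of that loop can be sketched in a few lines: instead of a text prompt, the model is conditioned on a sliding window of past frames plus the player’s actions, and each generated frame is fed back in as context. This is only an illustrative Python sketch, assuming a hypothetical `predict_next_frame` stand-in for the fine-tuned diffusion model; the names, context length, and frame size are made up, not from the paper.

```python
from collections import deque

import numpy as np

CONTEXT_LEN = 4          # illustrative; the real system uses a longer frame history
FRAME_SHAPE = (8, 8, 3)  # tiny stand-in for a real game frame

def predict_next_frame(frames, actions, rng):
    """Stand-in for the fine-tuned diffusion model. In the real system this
    would denoise a latent conditioned on the past frames and actions; here
    we just average the context and add noise so the loop actually runs."""
    return np.clip(np.mean(frames, axis=0) + rng.normal(0, 0.01, FRAME_SHAPE), 0, 1)

def play(num_steps, rng):
    # Sliding windows of conditioning context: past frames and past actions.
    frames = deque([np.zeros(FRAME_SHAPE)] * CONTEXT_LEN, maxlen=CONTEXT_LEN)
    actions = deque([0] * CONTEXT_LEN, maxlen=CONTEXT_LEN)
    out = []
    for _ in range(num_steps):
        action = rng.integers(0, 8)  # the player's input for this step
        frame = predict_next_frame(np.stack(list(frames)), np.array(actions), rng)
        frames.append(frame)         # autoregressive: the output feeds back in
        actions.append(action)
        out.append(frame)
    return out

generated = play(20, np.random.default_rng(0))
```

At 20 fps, a real-time version of this loop has about 50 ms per iteration to produce each frame, which is why the inference speed matters so much.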