December 8, 2025 · MI Történik? · 3 min read
DeepMind has published details on SIMA 2, the second version of its ‘Scalable Instructable Multiworld Agent’. SIMA 2 is a game-playing agent built by taking a Gemini-class frontier model and fine-tuning it on rich interaction-prompt pair data generated from a variety of videogames and education software. The result is a general-purpose AI agent that can carry out a very large range of actions inside 3D worlds. It is also something of a triumph for DeepMind, whose original research agenda was all about building general intelligence by developing generally capable AI agents with reinforcement learning.

What SIMA 2 is: “The SIMA 2 agent architecture is a Gemini Flash-Lite model that is trained using a mixture of gameplay and Gemini pretraining (non-gameplay) data. We found this mixture crucial to maintain the original capabilities of the base model, such as vision understanding, dialogue, reasoning, and promptability,” DeepMind writes. “By training across a growing portfolio of 3D games, the agent shows a remarkable capacity to generalize to previously unseen environments, including photorealistic worlds generated on-the-fly by Genie 3”. Some of the games SIMA 2 was trained on include Goat Simulator 3, No Man’s Sky, and Space Engineers.

Held-out evaluations: SIMA 2 displays strong generalization, best evidenced by its performance on ASKA, an early access crafting and survival game about building a viking settlement. SIMA 2 wasn’t directly trained on ASKA, yet it performs well on it out of the box. Most impressively, it also displays the ability to self-improve on it: ASKA has a crafting menu which is “quite distinct” from the ones SIMA 2 encountered during training, but DeepMind was able to overcome this via the use of a self-improving scaffold.

Self-improvement: The funny thing about modern AI systems is that they’re sufficiently smart that you can use them to improve other AI systems. That’s the case here, where a Gemini model sets tasks for the SIMA 2 agent that involve manipulating the crafting menu. The Gemini model scores how well the agent does and saves the trajectories in which the agent completes the task it was set without getting distracted. This data is then fed back into the agent for fine-tuning, letting it automatically bootstrap its way to better performance. “Through focused effort by the task setter, the agent was eventually able to acquire this skill,” the authors write. As a consequence, the SIMA 2 agent using the self-improving scaffold does far, far better at ASKA than it does without the ability to self-improve. “Despite purely training on self-generated experience, the resulting agent is capable of progressing much further than SIMA 2, ultimately building a shelter within a one hour time window”.
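
The self-improvement loop described above (set tasks, attempt them, score the attempts, keep only the successful trajectories, fine-tune on them) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DeepMind's implementation: every name here (`ToyAgent`, `propose_tasks`, `score`, `self_improve`) is hypothetical, the task setter and scorer are simple stand-ins for the Gemini model, and "fine-tuning" is reduced to nudging a single success probability.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str
    actions: list[str]
    score: float = 0.0


class ToyAgent:
    """Hypothetical stand-in for the agent; `skill` is its chance of success."""

    def __init__(self, skill: float = 0.2, seed: int = 0):
        self.skill = skill
        self.rng = random.Random(seed)

    def attempt(self, task: str) -> Trajectory:
        # A successful attempt ends in "craft"; a failed one gets distracted.
        if self.rng.random() < self.skill:
            actions = ["open_menu", "select_recipe", "craft"]
        else:
            actions = ["open_menu", "wander_off"]
        return Trajectory(task, actions)

    def fine_tune(self, trajectories: list[Trajectory]) -> None:
        # Toy update (assumption): each kept trajectory nudges skill upward.
        self.skill = min(1.0, self.skill + 0.01 * len(trajectories))


def propose_tasks(n: int) -> list[str]:
    """Stand-in for the task setter (a Gemini model in the article)."""
    return [f"craft item {i} from the crafting menu" for i in range(n)]


def score(traj: Trajectory) -> float:
    """Stand-in for the scorer: 1.0 if the task was completed, else 0.0."""
    return 1.0 if traj.actions[-1] == "craft" else 0.0


def self_improve(agent: ToyAgent, rounds: int = 5,
                 tasks_per_round: int = 50, threshold: float = 1.0) -> list[float]:
    """Run the loop and return the agent's skill after each round."""
    history = []
    for _ in range(rounds):
        kept = []
        for task in propose_tasks(tasks_per_round):
            traj = agent.attempt(task)
            traj.score = score(traj)
            if traj.score >= threshold:  # keep only successful trajectories
                kept.append(traj)
        agent.fine_tune(kept)  # train on self-generated experience only
        history.append(agent.skill)
    return history
```

Calling `self_improve(ToyAgent())` returns the skill value after each round. The point of the sketch is the filter step: the agent only ever trains on trajectories the scorer accepted, which is what lets the loop bootstrap without any human-labelled data.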
Why does it matter?
SIMA 2 follows the same paradigm I expect people will use to teach robots to do useful, open-ended things in our world: fine-tune a powerful frontier model on a bunch of data gathered from agents taking actions in the world. And just as SIMA 2 displays strong generalization, I expect robots trained this way will too. Problems remain, but this is a simple, scalable idea that naturally leverages the underlying boom in frontier model capabilities, so it’s likely to work. “SIMA 2 still faces challenges with very long-horizon, complex tasks that require extensive, multi-step reasoning and goal verification. The agent also has a relatively short memory of its interactions - it must use a limited context window to achieve low-latency interaction,” the authors write. But nonetheless: “these results suggest a promising path toward using self-improvement to eventually bridge the virtual and physical worlds, enabling more capable physically-embodied agents in applications like robotics”.