Force-Conditioned Video Models as Physical World Models for Agents

Presenter
May 9, 2026
Abstract
A central challenge in agentic scientific computing is equipping agents with world models that can accurately simulate the physical consequences of actions. Recent advances in video generation models have sparked interest in using video to simulate such physical causes and effects. In this talk, we investigate using physical forces as a conditioning signal for video generation, which enables users and agents to interact with images and videos through both localized point forces, such as poking a plant, and global force fields, such as wind blowing on fabric. We demonstrate that these "force prompts" enable videos to respond realistically to physical control signals by leveraging the visual and motion priors in the original pretrained model, without using any 3D assets or physics simulators at inference time. The primary challenge of force prompting is obtaining high-quality paired force-video training data: force signals are difficult to obtain for real-world videos, and synthetic data is limited by the visual quality and domain diversity of physics simulators. Our key finding is that video generation models can generalize remarkably well when adapted to follow physical force conditioning using only synthetic videos. Building on this, we explore the role of such force-conditioned world models in robot planning and decision-making.
Supplementary Materials