BehaviorGaussian: Diverse Future Scene Synthesis via Agent Behavior Modeling and 3D Gaussian Splatting

Kaizhao Zhang1, Tian Niu1, Ke Wu1, Xiangyun Ren2, Zhongxue Gan1, Wenchao Ding1
1Fudan University, 2Chongqing Changan Automobile Co., Ltd.
[Cover image]

We propose BehaviorGaussian, a novel framework that integrates 3D Gaussian reconstruction with behavior-aware trajectory prediction for autonomous driving simulation. Our approach first reconstructs a photorealistic environment through Gaussian splatting, then generates diverse future scenarios by predicting multiple plausible agent trajectories and simulating them within the reconstructed 3D space. This unified pipeline enables the parallel synthesis of diverse future scenes, facilitating data generation for autonomous driving systems under varied behavioral conditions.
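At a high level, the pipeline reconstructs the scene once and then renders one video per sampled future. The sketch below illustrates this composition with stub functions; `reconstruct_scene`, `predict_futures`, and `render_video` are hypothetical placeholder names, not the framework's actual API.

```python
# Hypothetical orchestration of the two-stage pipeline; all helper
# names are illustrative stand-ins, not the authors' implementation.

def reconstruct_scene(lidar, images):
    # Decomposed 3DGS reconstruction: static background, dynamic
    # agents, and sky are kept as separate Gaussian sets.
    return {"static": ..., "dynamic": ..., "sky": ...}

def predict_futures(agent_history, hd_map, k):
    # K plausible joint futures for the agents in the scene.
    return [f"future_{i}" for i in range(k)]

def render_video(scene, future):
    # Temporally coherent frame sequence for one sampled future.
    return f"video({future})"

def simulate(lidar, images, agent_history, hd_map, k=6):
    scene = reconstruct_scene(lidar, images)          # reconstruct once
    futures = predict_futures(agent_history, hd_map, k)
    return [render_video(scene, f) for f in futures]  # one video per future
```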

Abstract

Simulation is critical for the closed-loop validation of autonomous driving systems. In recent years, Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have enabled high-fidelity reconstruction and novel view synthesis, making it possible to narrow the sim-to-real gap in image realism. This study presents a compact framework integrating scene-separated 3DGS reconstruction with trajectory prediction models. BehaviorGaussian explicitly models dynamic object interactions to forecast agent trajectories, which are then unified with the reconstructed environment for multi-scenario 3D world simulation. The 3DGS rendering pipeline renders these parallel scenarios concurrently as temporally coherent video sequences. Experimental validation on the Waymo dataset confirms that our framework simulates parallel 3D environments effectively, enhancing both simulation authenticity and functional versatility for autonomous driving system development.

Framework

[Figure: system architecture pipeline]

Overview of the proposed framework for dynamic scene reconstruction and parallel scenario simulation. The pipeline consists of three main components:

(1) Scene Reconstruction Module, which takes multimodal inputs, including LiDAR point clouds and camera images, and produces a decomposed 3D Gaussian representation of the environment that separates the static background, dynamic agents, and sky regions (see the decomposition sketch after this list);

(2) Trajectory Prediction Module, which uses historical agent trajectories and HD map priors to generate multiple plausible future trajectories for each agent (see the decoder sketch after this list);

(3) Parallel Scenario Simulation, where the predicted trajectories are integrated into the reconstructed 3D scene to synthesize photorealistic video sequences of diverse future scenarios (see the re-posing sketch after this list).
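The decomposition in component (1) can be pictured as partitioning the Gaussian primitives by a per-Gaussian semantic label. A minimal sketch, assuming such labels are available from LiDAR/image segmentation (the random labels below are stand-ins, not the paper's actual assignment):

```python
import numpy as np

# Stand-in Gaussian centers and per-Gaussian semantic labels
# (0 = static background, 1 = dynamic agent, 2 = sky). In practice the
# labels would come from segmentation / 3D boxes, not random draws.
centers = np.random.randn(10_000, 3).astype(np.float32)
labels = np.random.randint(0, 3, size=10_000)

static_bg = centers[labels == 0]  # reconstructed once, shared by all scenarios
dynamic   = centers[labels == 1]  # re-posed per predicted trajectory
sky       = centers[labels == 2]  # rendered as a distant background shell
```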
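For component (2), a common way to obtain multiple plausible futures is a multi-modal decoder that regresses K trajectories plus per-mode probabilities from an agent embedding. The PyTorch module below is a toy stand-in with illustrative dimensions, not the paper's prediction model:

```python
import torch
import torch.nn as nn

class MultiModalDecoder(nn.Module):
    """Toy multi-modal head: one agent embedding in, K candidate future
    trajectories and their mode probabilities out. All sizes here are
    illustrative assumptions."""

    def __init__(self, embed_dim=128, horizon=80, num_modes=6):
        super().__init__()
        self.horizon, self.num_modes = horizon, num_modes
        self.traj_head = nn.Linear(embed_dim, num_modes * horizon * 2)
        self.score_head = nn.Linear(embed_dim, num_modes)

    def forward(self, agent_embed):  # agent_embed: (B, embed_dim)
        B = agent_embed.shape[0]
        trajs = self.traj_head(agent_embed).view(
            B, self.num_modes, self.horizon, 2)               # (B, K, T, xy)
        probs = self.score_head(agent_embed).softmax(dim=-1)  # (B, K)
        return trajs, probs

trajs, probs = MultiModalDecoder()(torch.randn(4, 128))
```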
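Component (3) then needs only a rigid transform per agent per frame: each predicted waypoint (x, y, yaw) re-poses that agent's Gaussians before the composed scene is rendered. A minimal sketch of the re-posing step, assuming planar motion and omitting the matching rotation of the Gaussian covariances:

```python
import numpy as np

def pose_agent_gaussians(centers, xy, yaw):
    """Rigidly place one agent's Gaussian centers (N, 3), defined in the
    agent's canonical frame, at a predicted waypoint (x, y, yaw).
    Rotation of the per-Gaussian covariances is omitted for brevity."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return centers @ R.T + np.array([xy[0], xy[1], 0.0])

# One frame of one scenario: place every agent at its mode-k waypoint,
# then concatenate with the shared static background before rendering.
agent = np.random.randn(500, 3)
posed = pose_agent_gaussians(agent, xy=(12.0, -3.5), yaw=0.3)
```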

Video