Abstract
State-of-the-art novel view synthesis methods achieve impressive results for multi-view captures of static 3D scenes. However, the reconstructed scenes still lack "liveliness," a key component for creating engaging 3D experiences. Recent video diffusion models generate realistic videos with complex motion and enable animation of 2D images; however, they cannot naively be used to animate 3D scenes, as they lack multi-view consistency. To breathe life into the static world, we propose Gaussians2Life, a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation. Our key idea is to leverage powerful video diffusion models as the generative component of our model and to combine them with a robust technique for lifting 2D videos into meaningful 3D motion. We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes and supports a wide variety of object classes, whereas related work focuses mostly on prior-based character animation or single 3D objects. Our model enables the creation of consistent, immersive 3D experiences for arbitrary scenes.
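For illustration, the following is a minimal, hypothetical sketch of one way 2D video motion could be lifted to 3D: given per-frame depth maps and camera intrinsics, optical flow between consecutive frames is unprojected into per-pixel 3D displacement vectors, which could then drive the means of nearby Gaussians. All names here (lift_flow_to_3d, the array layout) are assumptions for this sketch; the abstract does not specify the paper's actual lifting technique.

import numpy as np

def lift_flow_to_3d(flow, depth, K):
    """Lift 2D optical flow to per-pixel 3D displacements by
    unprojecting matched pixels with depth and camera intrinsics.

    flow:  (H, W, 2) pixel displacements from frame t to frame t+1
    depth: (H, W, 2) depth maps for frames t and t+1
    K:     (3, 3) camera intrinsics
    Returns (H, W, 3) camera-space 3D motion vectors.
    """
    H, W = depth.shape[:2]
    K_inv = np.linalg.inv(K)

    # Homogeneous pixel coordinates for frame t and their flowed
    # positions in frame t+1.
    ys, xs = np.mgrid[0:H, 0:W]
    pix_t = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    pix_t1 = pix_t.copy()
    pix_t1[..., :2] += flow

    # Unproject frame-t pixels: X = depth * K^{-1} [u, v, 1]^T.
    pts_t = (pix_t @ K_inv.T) * depth[..., 0:1]

    # Sample frame-(t+1) depth at the flowed locations
    # (nearest-neighbor lookup, for brevity).
    xs1 = np.clip(np.round(pix_t1[..., 0]).astype(int), 0, W - 1)
    ys1 = np.clip(np.round(pix_t1[..., 1]).astype(int), 0, H - 1)
    pts_t1 = (pix_t1 @ K_inv.T) * depth[ys1, xs1, 1:2]

    return pts_t1 - pts_t

In such a scheme, the resulting 3D vectors could be splatted onto the Gaussians whose projections fall near each pixel; in practice one would also need occlusion handling and multi-view aggregation to keep the motion consistent, which this sketch omits.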
BibTeX
@inproceedings{Wimmer3DV25,
  title     = {Gaussians-to-Life: {T}ext-Driven Animation of {3D Gaussian} Splatting Scenes},
  author    = {Wimmer, Thomas and Oechsle, Michael and Niemeyer, Michael and Tombari, Federico},
  booktitle = {3DV 2025, International Conference on 3D Vision},
  address   = {Singapore},
  publisher = {IEEE},
  year      = {2025},
}
Endnote
%0 Conference Proceedings
%A Wimmer, Thomas
%A Oechsle, Michael
%A Niemeyer, Michael
%A Tombari, Federico
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-DAEF-C
%D 2025
%B 3DV 2025, International Conference on 3D Vision
%Z date of event: 2025-03-25 - 2025-03-28
%C Singapore
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV
%I IEEE