Abstract
State-of-the-art novel view synthesis methods achieve impressive results for multi-view captures of static 3D scenes. However, the reconstructed scenes still lack "liveliness," a key component for creating engaging 3D experiences. Recent video diffusion models generate realistic videos with complex motion and enable animation of 2D images; however, they cannot naively be used to animate 3D scenes, as they lack multi-view consistency. To breathe life into the static world, we propose Gaussians2Life, a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation. Our key idea is to leverage powerful video diffusion models as the generative component of our model and to combine them with a robust technique for lifting 2D videos into meaningful 3D motion. We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes and supports a wide variety of object classes, whereas related work focuses mostly on prior-based character animation or single 3D objects. Our model enables the creation of consistent, immersive 3D experiences for arbitrary scenes.
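For illustration, the following is a minimal, hypothetical sketch of one way 2D video motion could be lifted to 3D: given per-frame depth maps and camera intrinsics, optical flow between consecutive frames is unprojected into per-pixel 3D displacement vectors, which could then drive the means of nearby Gaussians. All names here (lift_flow_to_3d, the array layout) are assumptions for this sketch; the abstract does not specify the paper's actual lifting technique.

import numpy as np

def lift_flow_to_3d(flow, depth, K):
    """Lift 2D optical flow to per-pixel 3D displacements by
    unprojecting matched pixels with depth and camera intrinsics.

    flow:  (H, W, 2) pixel displacements from frame t to frame t+1
    depth: (H, W, 2) depth maps for frames t and t+1
    K:     (3, 3) camera intrinsics
    Returns (H, W, 3) camera-space 3D motion vectors.
    """
    H, W = depth.shape[:2]
    K_inv = np.linalg.inv(K)

    # Homogeneous pixel coordinates for frame t and their flowed
    # positions in frame t+1.
    ys, xs = np.mgrid[0:H, 0:W]
    pix_t = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    pix_t1 = pix_t.copy()
    pix_t1[..., :2] += flow

    # Unproject frame-t pixels: X = depth * K^{-1} [u, v, 1]^T.
    pts_t = (pix_t @ K_inv.T) * depth[..., 0:1]

    # Sample frame-(t+1) depth at the flowed locations
    # (nearest-neighbor lookup, for brevity).
    xs1 = np.clip(np.round(pix_t1[..., 0]).astype(int), 0, W - 1)
    ys1 = np.clip(np.round(pix_t1[..., 1]).astype(int), 0, H - 1)
    pts_t1 = (pix_t1 @ K_inv.T) * depth[ys1, xs1, 1:2]

    return pts_t1 - pts_t

In such a scheme, the resulting 3D vectors could be splatted onto the Gaussians whose projections fall near each pixel; in practice one would also need occlusion handling and multi-view aggregation to keep the motion consistent, which this sketch omits.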
BibTeX
@inproceedings{Wimmer3DV25,
  title     = {Gaussians-to-Life: {T}ext-Driven Animation of {3D Gaussian} Splatting Scenes},
  author    = {Wimmer, Thomas and Oechsle, Michael and Niemeyer, Michael and Tombari, Federico},
  booktitle = {3DV 2025, International Conference on 3D Vision},
  address   = {Singapore},
  publisher = {IEEE},
  year      = {2025},
}
Endnote
%0 Conference Proceedings
%A Wimmer, Thomas
%A Oechsle, Michael
%A Niemeyer, Michael
%A Tombari, Federico
%+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society
%T Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes
%G eng
%U http://hdl.handle.net/21.11116/0000-0010-DAEF-C
%D 2025
%B 3DV 2025, International Conference on 3D Vision
%Z date of event: 2025-03-25 - 2025-03-28
%C Singapore
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV
%I IEEE