Current Year

PhD
[1]
A. Abbas, “Efficient and Differentiable Combinatorial Optimization for Visual Computing,” Universität des Saarlandes, Saarbrücken, 2024.
Export
BibTeX
@phdthesis{ThesisPhDAbbas24, TITLE = {Efficient and Differentiable Combinatorial Optimization for Visual Computing}, AUTHOR = {Abbas, Ahmed}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-426550}, DOI = {10.22028/D291-42655}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2024}, DATE = {2024}, }
Endnote
%0 Thesis %A Abbas, Ahmed %Y Swoboda, Paul %A referee: Schiele, Bernt %A referee: Kumar, Pawan %+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society External Organizations %T Efficient and Differentiable Combinatorial Optimization for Visual Computing : %G eng %U http://hdl.handle.net/21.11116/0000-000F-D701-D %R 10.22028/D291-42655 %U urn:nbn:de:bsz:291--ds-426550 %F OTHER: hdl:20.500.11880/38409 %I Universität des Saarlandes %C Saarbrücken %D 2024 %P 140 p. %V phd %9 phd %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/38409
[2]
M. Böhle, “Towards Designing Inherently Interpretable Deep Neural Networks for Image Classification,” Universität des Saarlandes, Saarbrücken, 2024.
Export
BibTeX
@phdthesis{BoehlePhD24, TITLE = {Towards Designing Inherently Interpretable Deep Neural Networks for Image Classification}, AUTHOR = {B{\"o}hle, Moritz}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-421904}, DOI = {10.22028/D291-42190}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2024}, DATE = {2024}, }
Endnote
%0 Thesis %A Böhle, Moritz %Y Schiele, Bernt %Y Fritz, Mario %A referee: Akata, Zeynep %A referee: Brendel, Wieland %+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Towards Designing Inherently Interpretable Deep Neural Networks for Image Classification : %G eng %U http://hdl.handle.net/21.11116/0000-000F-76AA-D %U urn:nbn:de:bsz:291--ds-421904 %R 10.22028/D291-42190 %F OTHER: hdl:20.500.11880/37907 %I Universität des Saarlandes %C Saarbrücken %D 2024 %P XIII, 265 p. %V phd %9 phd %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/37907
[3]
A. Kukleva, “Advancing Image and Video Recognition with Less Supervision,” Universität des Saarlandes, Saarbrücken, 2024.
Abstract
Deep learning is increasingly relevant in our daily lives, as it simplifies tedious tasks and enhances quality of life across various domains such as entertainment, learning, automatic assistance, and autonomous driving. However, the demand for more data to train models for emerging tasks is increasing dramatically. Deep learning models heavily depend on the quality and quantity of data, necessitating high-quality labeled datasets. Yet, each task requires different types of annotations for training and evaluation, posing challenges in obtaining comprehensive supervision. The acquisition of annotations is not only resource-intensive in terms of time and cost but also introduces biases, such as granularity in classification, where distinctions like specific breeds versus generic categories may arise. Furthermore, the dynamic nature of the world causes the challenge that previously annotated data becomes potentially irrelevant, and new categories and rare occurrences continually emerge, making it impossible to label every aspect of the world.
Therefore, this thesis aims to explore various supervision scenarios to mitigate the need for full supervision and reduce data acquisition costs. Specifically, we investigate learning without labels, referred to as self-supervised and unsupervised methods, to better understand video and image representations. To learn from data without labels, we leverage injected priors such as motion speed, direction, action order in videos, or semantic information granularity to obtain powerful data representations. Further, we study scenarios involving reduced supervision levels. To reduce annotation costs, first, we propose to omit precise annotations for one modality in multimodal learning, namely in text-video and image-video settings, and transfer available knowledge to large corpora of video data. Second, we study semi-supervised learning scenarios, where only a subset of annotated data alongside unlabeled data is available, and propose to revisit regularization constraints and improve generalization to unlabeled data. Additionally, we address scenarios where parts of the available data are inherently limited due to privacy and security reasons or naturally rare events, which not only restrict annotations but also limit the overall data volume. For these scenarios, we propose methods that carefully balance between previously obtained knowledge and incoming limited data by introducing a calibration method or combining a space reservation technique with orthogonality constraints. Finally, we explore multimodal and unimodal open-world scenarios where the model is asked to generalize beyond the given set of object or action classes. Specifically, we propose a new challenging setting on multimodal egocentric videos and propose an adaptation method for vision-language models to generalize to the egocentric domain. Moreover, we study unimodal image recognition in an open-set setting and propose to disentangle open-set detection and image classification tasks that effectively improve generalization in different settings.
In summary, this thesis investigates challenges arising when full supervision for training models is not available. We develop methods to understand learning dynamics and the role of biases in data, while also proposing novel setups to advance training with less supervision.
Export
BibTeX
@phdthesis{ThesisPhDKukleva24, TITLE = {Advancing Image and Video Recognition with Less Supervision}, AUTHOR = {Kukleva, Anna}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-426798}, DOI = {10.22028/D291-42679}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2024}, DATE = {2024}, ABSTRACT = {Deep learning is increasingly relevant in our daily lives, as it simplifies tedious tasks and enhances quality of life across various domains such as entertainment, learning, automatic assistance, and autonomous driving. However, the demand for more data to train models for emerging tasks is increasing dramatically. Deep learning models heavily depend on the quality and quantity of data, necessitating high-quality labeled datasets. Yet, each task requires different types of annotations for training and evaluation, posing challenges in obtaining comprehensive supervision. The acquisition of annotations is not only resource-intensive in terms of time and cost but also introduces biases, such as granularity in classification, where distinctions like specific breeds versus generic categories may arise. Furthermore, the dynamic nature of the world causes the challenge that previously annotated data becomes potentially irrelevant, and new categories and rare occurrences continually emerge, making it impossible to label every aspect of the world. Therefore, this thesis aims to explore various supervision scenarios to mitigate the need for full supervision and reduce data acquisition costs. Specifically, we investigate learning without labels, referred to as self-supervised and unsupervised methods, to better understand video and image representations. To learn from data without labels, we leverage injected priors such as motion speed, direction, action order in videos, or semantic information granularity to obtain powerful data representations. Further, we study scenarios involving reduced supervision levels.
To reduce annotation costs, first, we propose to omit precise annotations for one modality in multimodal learning, namely in text-video and image-video settings, and transfer available knowledge to large corpora of video data. Second, we study semi-supervised learning scenarios, where only a subset of annotated data alongside unlabeled data is available, and propose to revisit regularization constraints and improve generalization to unlabeled data. Additionally, we address scenarios where parts of the available data are inherently limited due to privacy and security reasons or naturally rare events, which not only restrict annotations but also limit the overall data volume. For these scenarios, we propose methods that carefully balance between previously obtained knowledge and incoming limited data by introducing a calibration method or combining a space reservation technique with orthogonality constraints. Finally, we explore multimodal and unimodal open-world scenarios where the model is asked to generalize beyond the given set of object or action classes. Specifically, we propose a new challenging setting on multimodal egocentric videos and propose an adaptation method for vision-language models to generalize to the egocentric domain. Moreover, we study unimodal image recognition in an open-set setting and propose to disentangle open-set detection and image classification tasks that effectively improve generalization in different settings. In summary, this thesis investigates challenges arising when full supervision for training models is not available. We develop methods to understand learning dynamics and the role of biases in data, while also proposing novel setups to advance training with less supervision.}, }
Endnote
%0 Thesis %A Kukleva, Anna %Y Schiele, Bernt %A referee: Kuehne, Hilde %A referee: Damen, Dima %A referee: Saenko, Kate %+ Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society External Organizations External Organizations External Organizations %T Advancing Image and Video Recognition with Less Supervision : %G eng %U http://hdl.handle.net/21.11116/0000-000F-C61E-1 %R 10.22028/D291-42679 %U urn:nbn:de:bsz:291--ds-426798 %F OTHER: hdl:20.500.11880/38297 %I Universität des Saarlandes %C Saarbrücken %D 2024 %P XIII, 267 p. %V phd %9 phd %X Deep learning is increasingly relevant in our daily lives, as it simplifies tedious tasks and enhances quality of life across various domains such as entertainment, learning, automatic assistance, and autonomous driving. However, the demand for more data to train models for emerging tasks is increasing dramatically. Deep learning models heavily depend on the quality and quantity of data, necessitating high-quality labeled datasets. Yet, each task requires different types of annotations for training and evaluation, posing challenges in obtaining comprehensive supervision. The acquisition of annotations is not only resource-intensive in terms of time and cost but also introduces biases, such as granularity in classification, where distinctions like specific breeds versus generic categories may arise. Furthermore, the dynamic nature of the world causes the challenge that previously annotated data becomes potentially irrelevant, and new categories and rare occurrences continually emerge, making it impossible to label every aspect of the world. Therefore, this thesis aims to explore various supervision scenarios to mitigate the need for full supervision and reduce data acquisition costs.
Specifically, we investigate learning without labels, referred to as self-supervised and unsupervised methods, to better understand video and image representations. To learn from data without labels, we leverage injected priors such as motion speed, direction, action order in videos, or semantic information granularity to obtain powerful data representations. Further, we study scenarios involving reduced supervision levels. To reduce annotation costs, first, we propose to omit precise annotations for one modality in multimodal learning, namely in text-video and image-video settings, and transfer available knowledge to large corpora of video data. Second, we study semi-supervised learning scenarios, where only a subset of annotated data alongside unlabeled data is available, and propose to revisit regularization constraints and improve generalization to unlabeled data. Additionally, we address scenarios where parts of the available data are inherently limited due to privacy and security reasons or naturally rare events, which not only restrict annotations but also limit the overall data volume. For these scenarios, we propose methods that carefully balance between previously obtained knowledge and incoming limited data by introducing a calibration method or combining a space reservation technique with orthogonality constraints. Finally, we explore multimodal and unimodal open-world scenarios where the model is asked to generalize beyond the given set of object or action classes. Specifically, we propose a new challenging setting on multimodal egocentric videos and propose an adaptation method for vision-language models to generalize to the egocentric domain. Moreover, we study unimodal image recognition in an open-set setting and propose to disentangle open-set detection and image classification tasks that effectively improve generalization in different settings.
In summary, this thesis investigates challenges arising when full supervision for training models is not available. We develop methods to understand learning dynamics and the role of biases in data, while also proposing novel setups to advance training with less supervision. %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/38297
[4]
S. Shimada, “Physically Plausible 3D Human Motion Capture and Synthesis with Interactions,” Universität des Saarlandes, Saarbrücken, 2024.
Export
BibTeX
@phdthesis{ThesisPhDShimada2024, TITLE = {Physically Plausible {3D} Human Motion Capture and Synthesis with Interactions}, AUTHOR = {Shimada, Soshi}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-418503}, DOI = {10.22028/D291-41850}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2024}, DATE = {2024}, }
Endnote
%0 Thesis %A Shimada, Soshi %Y Theobalt, Christian %A referee: Pérez, Patrick %A referee: Komura, Taku %+ Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society External Organizations External Organizations %T Physically Plausible 3D Human Motion Capture and Synthesis with Interactions : %G eng %U http://hdl.handle.net/21.11116/0000-000F-5555-2 %R 10.22028/D291-41850 %U urn:nbn:de:bsz:291--ds-418503 %F OTHER: hdl:20.500.11880/37475 %I Universität des Saarlandes %C Saarbrücken %D 2024 %V phd %9 phd %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/37475
[5]
E. Tretschk, “Representing and Reconstructing General Non-rigid Objects with Neural Models,” Universität des Saarlandes, Saarbrücken, 2024.
Export
BibTeX
@phdthesis{Tretschk_PhD2024, TITLE = {Representing and Reconstructing General Non-rigid Objects with Neural Models}, AUTHOR = {Tretschk, Edith}, LANGUAGE = {eng}, URL = {urn:nbn:de:bsz:291--ds-416509}, DOI = {10.22028/D291-41650}, SCHOOL = {Universit{\"a}t des Saarlandes}, ADDRESS = {Saarbr{\"u}cken}, YEAR = {2024}, }
Endnote
%0 Thesis %A Tretschk, Edith %Y Theobalt, Christian %A referee: Seidel, Hans-Peter %A referee: Agapito, Lourdes %+ Computer Graphics, MPI for Informatics, Max Planck Society International Max Planck Research School, MPI for Informatics, Max Planck Society Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society Computer Graphics, MPI for Informatics, Max Planck Society %T Representing and Reconstructing General Non-rigid Objects with Neural Models : %G eng %U http://hdl.handle.net/21.11116/0000-000F-2406-2 %R 10.22028/D291-41650 %U urn:nbn:de:bsz:291--ds-416509 %F OTHER: hdl:20.500.11880/37392 %I Universität des Saarlandes %C Saarbrücken %D 2024 %V phd %9 phd %U https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/37392