Publications
2023
- “Divide & Bind Your Attention for Improved Generative Semantic Nursing,” in 34th British Machine Vision Conference (BMVC 2023), Aberdeen, UK, 2023.
- “Class-Incremental Exemplar Compression for Class-Incremental Learning,” in 36th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships,” in Advances in Neural Information Processing Systems 36 (NeurIPS 2023), New Orleans, LA, USA, 2023.
- “LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching,” in Advances in Neural Information Processing Systems 36 (NeurIPS 2023), New Orleans, LA, USA, 2023.
- “Differentiable Architecture Search: a One-Shot Method?,” in AutoML Conference 2023, Potsdam/Berlin, Germany, 2023.
- “A Polyhedral Study of Lifted Multicuts,” Discrete Optimization, vol. 47, 2023.
- “SoftMatch: Addressing the Quantity-Quality Tradeoff in Semi-supervised Learning,” in Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 2023.
- “Towards Robust Object Detection Invariant to Real-World Domain Shifts,” in Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 2023.
- “Neural Architecture Design and Robustness: A Dataset,” in Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 2023.
Abstract
Deep learning models have proven to be successful in a wide range of machine learning tasks. Yet, they are often highly sensitive to perturbations of the input data, which can lead to incorrect decisions with high confidence, hampering their deployment for practical use-cases. Thus, finding architectures that are (more) robust against perturbations has received much attention in recent years. Just like the search for well-performing architectures in terms of clean accuracy, this usually involves a tedious trial-and-error process, with one additional challenge: the evaluation of a network's robustness is significantly more expensive than its evaluation for clean accuracy. Thus, the aim of this paper is to facilitate better streamlined research on architectural design choices with respect to their impact on robustness, as well as, for example, the evaluation of surrogate measures for robustness. We therefore borrow one of the most commonly considered search spaces for neural architecture search for image classification, NAS-Bench-201, which contains a manageable 6,466 non-isomorphic network designs. We evaluate all of these networks on a range of common adversarial attacks and corruption types and introduce a database of neural architecture design and robustness evaluations. We further present three exemplary use cases of this dataset, in which we (i) benchmark robustness measurements based on Jacobian and Hessian matrices for their robustness predictability, (ii) perform neural architecture search on robust accuracies, and (iii) provide an initial analysis of how architectural design choices affect robustness. We find that carefully crafting the topology of a network can have a substantial impact on its robustness: networks with the same parameter count range in mean adversarial robust accuracy from 20% to 41%.
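The Jacobian-based robustness measures benchmarked in use case (i) above can be sketched cheaply. A minimal illustration, assuming a PyTorch classifier; the toy model below is a stand-in, not a NAS-Bench-201 architecture, and the Frobenius norm of the input Jacobian is only one of several proxies one might evaluate:

```python
import torch
import torch.nn as nn

# Toy stand-in for a cell-based image classifier (illustrative only).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

x = torch.randn(1, 3, 32, 32)  # one CIFAR-sized input

# Input Jacobian of the logits; its Frobenius norm is a common, inexpensive
# proxy for sensitivity to input perturbations.
jac = torch.autograd.functional.jacobian(model, x)
proxy = jac.flatten().norm(p=2).item()
print(f"Jacobian-norm robustness proxy: {proxy:.3f}")  # lower ~ smoother, often more robust
```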
- “Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data,” in Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 2023.
- “FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning,” in Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 2023.
- “Visual Coherence Loss for Coherent and Visually Grounded Story Generation,” in Findings of the Association for Computational Linguistics (ACL 2023), Toronto, Canada, 2023.
Abstract
Local coherence is essential for long-form text generation models. We identify two important aspects of local coherence within the visual storytelling task: (1) the model needs to represent re-occurrences of characters within the image sequence in order to mention them correctly in the story; (2) character representations should enable us to find instances of the same characters and distinguish different characters. In this paper, we propose a loss function inspired by a linguistic theory of coherence for self-supervised learning for image sequence representations. We further propose combining features from an object and a face detector to construct stronger character features. To evaluate input-output relevance that current reference-based metrics don't measure, we propose a character matching metric to check whether the models generate referring expressions correctly for characters in input image sequences. Experiments on a visual story generation dataset show that our proposed features and loss function are effective for generating more coherent and visually grounded stories.
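One way to make the proposed character-matching idea concrete is a similarity check over detector features. A hedged sketch; the embeddings, dimensionality, and threshold below are illustrative assumptions, not the paper's actual metric:

```python
import torch
import torch.nn.functional as F

def match_characters(chars_a, chars_b, threshold=0.8):
    """Link character detections across two images by cosine similarity.

    chars_a: [n, d] embeddings (e.g., concatenated object + face features).
    chars_b: [m, d] embeddings from another image in the sequence.
    Returns (i, j) index pairs judged to be the same character.
    """
    sim = F.cosine_similarity(chars_a.unsqueeze(1), chars_b.unsqueeze(0), dim=-1)  # [n, m]
    pairs = (sim > threshold).nonzero(as_tuple=False)
    return [(int(i), int(j)) for i, j in pairs]

# Toy usage: two images, 512-d character features.
a, b = torch.randn(3, 512), torch.randn(2, 512)
print(match_characters(a, b))
```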
- “Weakly-Supervised Domain Adaptive Semantic Segmentation With Prototypical Contrastive Learning,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “Federated Incremental Semantic Segmentation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “Continuous Pseudo-Label Rectified Domain Adaptive Semantic Segmentation With Implicit Neural Representations,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch Corruptions,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “A Meta-Learning Approach to Predicting Performance and Data Requirements,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “Self-Supervised Pre-Training With Masked Shape Prediction for 3D Scene Understanding,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “Continual Detection Transformer for Incremental Object Detection,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “Object Pop-Up: Can We Infer 3D Objects and their Poses from Human Interactions Alone?,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “Virtual Sparse Convolution for Multimodal 3D Object Detection,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “Visibility Aware Human-Object Interaction Tracking from Single RGB Camera,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “ConQueR: Query Contrast Voxel-DETR for 3D Object Detection,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, Canada, 2023.
- “TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “Robustifying Token Attention for Vision Transformers,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “Studying How to Efficiently and Effectively Guide Models with Explanations,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “DARTH: Holistic Test-time Adaptation for Multiple Object Tracking,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “Learning by Sorting: Self-supervised Learning with Group Ordering Constraints,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “SimNP: Learning Self-Similarity Priors Between Neural Points,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “NSF: Neural Surface Fields for Human Modeling from Monocular Depth,” in IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 2023.
- “On the Unreasonable Vulnerability of Transformers for Image Restoration – and an Easy Fix,” in IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2023), Paris, France, 2023.
- “Classification Robustness to Common Optical Aberrations,” in IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2023), Paris, France, 2023.
- “HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection,” in IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 2023.
- “Test-time Domain Adaptation for Monocular Depth Estimation,” in IEEE International Conference on Robotics and Automation (ICRA 2023), London, UK, 2023.
- “TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction,” in IEEE International Conference on Robotics and Automation (ICRA 2023), London, UK, 2023.
- “Optimising for Interpretability: Convolutional Dynamic Alignment Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 6, 2023.
- “LayerNet: High-Resolution Semantic 3D Reconstruction of Clothed People,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 2, 2023.
- “Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, 2023.
- “A Deeper Look into DeepCap,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, 2023.
Abstract
Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision, completely removing the need for training data with 3D ground truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness. This work is an extended version of DeepCap, where we provide more detailed explanations, comparisons, and results, as well as applications.
- “Higher-Order Multicuts for Geometric Model Fitting and Motion Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, 2023.
Abstract
The minimum cost lifted multicut problem is a generalization of the multicut problem and a means of optimizing a decomposition of a graph with respect to both positive and negative edge costs. Its main advantage is that multicut-based formulations do not require the number of components to be given a priori; instead, it is deduced from the solution. However, the standard multicut cost function is limited to pairwise relationships between nodes, while several important applications either require or can benefit from a higher-order cost function, i.e. hyper-edges. In this paper, we propose a pseudo-boolean formulation for a multiple model fitting problem. It is based on a formulation of any-order minimum cost lifted multicuts, which allows partitioning an undirected graph with pairwise connectivity so as to minimize costs defined over any set of hyper-edges. As the proposed formulation is NP-hard and the branch-and-bound algorithm is too slow in practice, we propose an efficient local search algorithm for inference in the resulting problems. We demonstrate the versatility and effectiveness of our approach in several applications: geometric multiple model fitting, homography and motion estimation, and motion segmentation.
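For reference, the pairwise problem that the paper generalizes can be stated as an integer linear program; the higher-order extension replaces the linear edge costs with costs over hyper-edges. A standard form (notation follows the lifted multicut literature; the paper's exact higher-order objective may differ):

```latex
% G=(V,E), lifted edges E' \supseteq E, costs c : E' \to \mathbb{R}; y_e = 1 iff edge e is cut.
\min_{y \in \{0,1\}^{E'}} \; \sum_{e \in E'} c_e \, y_e
\quad \text{s.t.} \quad
y_{uv} \le \sum_{e \in P} y_e \quad \forall\, uv \in E',\ \forall\, uv\text{-paths } P \text{ in } G,
\qquad
1 - y_{uv} \le \sum_{e \in C} (1 - y_e) \quad \forall\, uv \in E',\ \forall\, uv\text{-cuts } C \text{ in } G.
```

The path inequalities force an edge to be joined whenever some path in G joins its endpoints; the cut inequalities force it to be cut whenever every path is severed, so components are read off the solution rather than fixed in advance.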
- “Random and Adversarial Bit Error Robustness: Energy-Efficient and Secure DNN Accelerators,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, 2023.
- “SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 6, 2023.
- “Urban Scene Semantic Segmentation With Low-Cost Coarse Annotation,” in 2023 IEEE Winter Conference on Applications of Computer Vision (WACV 2023), Waikoloa Village, HI, USA, 2023.
- “Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation,” in 2023 IEEE Winter Conference on Applications of Computer Vision (WACV 2023), Waikoloa Village, HI, USA, 2023.
- “Jointly Learning Band Selection and Filter Array Design for Hyperspectral Imaging,” in 2023 IEEE Winter Conference on Applications of Computer Vision (WACV 2023), Waikoloa Village, HI, USA, 2023.
- “Intra-Source Style Augmentation for Improved Domain Generalization,” in 2023 IEEE Winter Conference on Applications of Computer Vision (WACV 2023), Waikoloa Village, HI, USA, 2023.
- “Revisiting Consistency Regularization for Semi-supervised Learning,” International Journal of Computer Vision, vol. 131, 2023.
- “Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation,” International Journal of Computer Vision, vol. 131, 2023.
- “3D Object Detection for Autonomous Driving: A Comprehensive Survey,” International Journal of Computer Vision, vol. 131, 2023.
- “Improving Primary-Vertex Reconstruction with a Minimum-Cost Lifted Multicut Graph Partitioning Algorithm,” Journal of Instrumentation, vol. 18, 2023.
- “Towards Understanding Climate Change Perceptions: A Social Media Dataset,” in NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning, New Orleans, LA, USA, 2023.
- “Learning Comprehensive Global Features in Person Re-identification: Ensuring Discriminativeness of more Local Regions,” Pattern Recognition, vol. 134, 2023.
- “Certified Robust Models with Slack Control and Large Lipschitz Constants,” in Pattern Recognition (DAGM GCPR 2023), Heidelberg, Germany, 2023.
- “An Evaluation of Zero-Cost Proxies - From Neural Architecture Performance Prediction to Model Robustness,” in Pattern Recognition (DAGM GCPR 2023), Heidelberg, Germany, 2023.
- “FullFormer: Generating Shapes Inside Shapes,” in Pattern Recognition (DAGM GCPR 2023), Heidelberg, Germany, 2023.
- “Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection,” in Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. - Vol. 5, VISAPP (VISIGRAPP 2023), Lisbon, Portugal, 2023.
- “Online Hyperparameter Optimization for Class-Incremental Learning,” in Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, DC, USA, 2023.
- “Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering,” in Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, DC, USA, 2023.
- “Learning Context-Aware Classifier for Semantic Segmentation,” in Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, DC, USA, 2023.
- “ClusterFuG: Clustering Fully connected Graphs by Multicut,” in Proceedings of the 40th International Conference on Machine Learning (ICML 2023), Honolulu, Hawaii, USA, 2023.
- “Discovering Class-Specific GAN Controls for Semantic Image Synthesis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2023), Vancouver, Canada, 2023.
- “Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences,” Transactions of the Association for Computational Linguistics, vol. 11, 2023.
- “Improving Native CNN Robustness with Filter Frequency Regularization,” Transactions on Machine Learning Research, vol. 2023, 2023.
- “Happy People - Image Synthesis as Black-Box Optimization Problem in the Discrete Latent Space of Deep Generative Models,” in Workshop Generative Models for Computer Vision, Vancouver, Canada, 2023.
- “Modelling 3D Humans: Pose, Shape, Clothing and Interactions,” Universität des Saarlandes, Saarbrücken, 2023.
- “Holistically Explainable Vision Transformers,” 2023. [Online]. Available: https://arxiv.org/abs/2301.08669.
Abstract
Transformers increasingly dominate the machine learning landscape across many tasks and domains, which increases the importance of understanding their outputs. While their attention modules provide partial insight into their inner workings, the attention scores have been shown to be insufficient for explaining the models as a whole. To address this, we propose B-cos transformers, which inherently provide holistic explanations for their decisions. Specifically, we formulate each model component - such as the multi-layer perceptrons, attention layers, and the tokenisation module - to be dynamic linear, which allows us to faithfully summarise the entire transformer via a single linear transform. We apply our proposed design to Vision Transformers (ViTs) and show that the resulting models, dubbed B-cos ViTs, are highly interpretable and perform competitively to baseline ViTs on ImageNet. Code will be made available soon.
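The "dynamic linear" property above means the whole network coincides with a single input-dependent linear map, i.e. f(x) = W(x)·x. A minimal sketch of that property on a toy bias-free ReLU MLP, which is dynamic linear for the same reason; this is not the actual B-cos ViT:

```python
import torch
import torch.nn as nn

# Bias-free ReLU MLP: piecewise (dynamic) linear, so f(x) = J(x) @ x holds exactly.
model = nn.Sequential(
    nn.Linear(8, 16, bias=False), nn.ReLU(),
    nn.Linear(16, 4, bias=False),
)

x = torch.randn(8)
J = torch.autograd.functional.jacobian(model, x)  # [4, 8]: the effective linear map at x
print(torch.allclose(model(x), J @ x, atol=1e-6))  # True: one linear transform summarises f
# Rows of J then act like holistic explanations: per-output linear contributions of the input.
```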
- “Relational Deep Learning: Graph Representation Learning on Relational Databases,” 2023. [Online]. Available: https://arxiv.org/abs/2312.04615.
Abstract
Much of the world's most valued data is stored in relational databases and data warehouses, where the data is organized into many tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time consuming. The core problem is that no machine learning method is capable of learning on multiple tables interconnected by primary-foreign key relations. Current methods can only learn from a single table, so the data must first be manually joined and aggregated into a single training table, a process known as feature engineering. Feature engineering is slow, error prone, and leads to suboptimal models. Here we introduce an end-to-end deep representation learning approach to directly learn on data laid out across multiple tables. We name our approach Relational Deep Learning (RDL). The core idea is to view relational databases as a temporal, heterogeneous graph, with a node for each row in each table, and edges specified by primary-foreign key links. Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all input data, without any manual feature engineering. Relational Deep Learning leads to more accurate models that can be built much faster. To facilitate research in this area, we develop RelBench, a set of benchmark datasets and an implementation of Relational Deep Learning. The data covers a wide spectrum, from discussions on Stack Exchange to book reviews on the Amazon Product Catalog. Overall, we define a new research area that generalizes graph machine learning and broadens its applicability to a wide set of AI use cases.
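A hedged sketch of the row-to-node construction described above, using pandas; the tables and columns are invented for illustration, and RelBench's real API is not used here:

```python
import pandas as pd
import numpy as np

# Two toy tables linked by a primary-foreign key (users.user_id <- reviews.user_id).
users = pd.DataFrame({"user_id": [0, 1, 2], "age": [31, 25, 47]})
reviews = pd.DataFrame({"review_id": [10, 11, 12, 13],
                        "user_id": [0, 0, 2, 1],
                        "rating": [5, 3, 4, 1]})

# One node per row in each table; node features come straight from the columns.
user_feats = users[["age"]].to_numpy(dtype=np.float32)
review_feats = reviews[["rating"]].to_numpy(dtype=np.float32)

# Edges follow the foreign key: review row -> referenced user row.
src = reviews.index.to_numpy()                                           # review node ids
dst = users.set_index("user_id").index.get_indexer(reviews["user_id"])   # user node ids
edge_index = np.stack([src, dst])  # [2, num_edges], per-type ids as in a heterogeneous GNN
print(edge_index)
```

A message-passing GNN over this graph sees all tables at once, which is what removes the manual join-and-aggregate step.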
- “An Extended Study of Human-like Behavior under Adversarial Training,” 2023. [Online]. Available: https://arxiv.org/abs/2303.12669.
Abstract
Neural networks have a number of shortcomings. Amongst the severest ones is the sensitivity to distribution shifts, which allows models to be easily fooled into wrong predictions by small perturbations to inputs that are often imperceptible to humans and do not have to carry semantic meaning. Adversarial training poses a partial solution to address this issue by training models on worst-case perturbations. Yet, recent work has also pointed out that the reasoning in neural networks is different from that of humans: humans identify objects by shape, while neural nets mainly employ texture cues. For example, a model trained on photographs will likely fail to generalize to datasets containing sketches. Interestingly, it was also shown that adversarial training seems to favorably increase the shift toward shape bias. In this work, we revisit this observation and provide an extensive analysis of this effect on various architectures, the common $\ell_2$- and $\ell_\infty$-training, and Transformer-based models. Further, we provide a possible explanation for this phenomenon from a frequency perspective.
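For context, the ℓ∞-adversarial training analysed above trains on worst-case perturbations, typically found with projected gradient descent (PGD). A minimal sketch; epsilon, step size, and step count are illustrative defaults, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Projected gradient descent: worst-case l_inf perturbation within eps."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascend the loss, then project back onto the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).detach()  # (pixel-range clamping omitted for brevity)

# Adversarial training step: train on the perturbed batch instead of the clean one, e.g.
# loss = F.cross_entropy(model(pgd_linf(model, x, y)), y); loss.backward()
```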
- “Fix your downsampling ASAP! Be natively more robust via Aliasing and Spectral Artifact free Pooling,” 2023. [Online]. Available: https://arxiv.org/abs/2307.09804.
Abstract
Convolutional neural networks encode images through a sequence of convolutions, normalizations, and non-linearities, as well as downsampling operations, into potentially strong semantic embeddings. Yet, previous work showed that even slight mistakes during sampling, leading to aliasing, can be directly attributed to the networks' lack of robustness. To address such issues and facilitate simpler and faster adversarial training, [12] recently proposed FLC pooling, a method for provably alias-free downsampling - in theory. In this work, we conduct a further analysis through the lens of signal processing and find that such current pooling methods, which address aliasing in the frequency domain, are still prone to spectral leakage artifacts. Hence, we propose aliasing and spectral artifact-free pooling, ASAP for short. While introducing only a few modifications to FLC pooling, networks using ASAP as their downsampling method exhibit higher native robustness against common corruptions, a property that FLC pooling was missing. ASAP also increases native robustness against adversarial attacks on high- and low-resolution data while maintaining similar clean accuracy or even outperforming the baseline.
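The frequency-domain downsampling underlying FLC pooling (which ASAP refines) can be sketched as: transform, keep only the low-frequency block, transform back. An illustrative sketch only; ASAP's additional anti-leakage modifications, which the abstract does not spell out, are not reproduced here:

```python
import torch

def flc_style_pool(x: torch.Tensor) -> torch.Tensor:
    """Alias-free 2x downsampling by cropping the centered spectrum.

    x: [B, C, H, W] feature map with even H and W.
    """
    B, C, H, W = x.shape
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    # Keep the central H/2 x W/2 low-frequency block: a hard low-pass filter,
    # so no frequencies above the new Nyquist limit can alias.
    h0, w0 = H // 4, W // 4
    low = spec[..., h0:h0 + H // 2, w0:w0 + W // 2]
    out = torch.fft.ifft2(torch.fft.ifftshift(low, dim=(-2, -1)))
    return out.real / 4.0  # rescale for the reduced number of frequency bins

x = torch.randn(1, 3, 32, 32)
print(flc_style_pool(x).shape)  # torch.Size([1, 3, 16, 16])
```

The hard spectral crop is exactly where spectral leakage enters; softening its edges (e.g., with a window) is the kind of modification the ASAP abstract alludes to.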
- “Divide & Bind Your Attention for Improved Generative Semantic Nursing,” 2023. [Online]. Available: https://arxiv.org/abs/2307.10864.more
Abstract
Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., "a cat and a dog". However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: an attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts, and it exhibits superior performance across multiple evaluation benchmarks.
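The GSN mechanism referenced above updates the diffusion latent at inference time so that cross-attention covers every prompt entity. A toy sketch of one such update; the attention function and the max-based loss below are stand-ins (closer to Attend & Excite's objective than to Divide & Bind's actual attendance and binding losses):

```python
import torch

def gsn_step(latent, attn_fn, token_ids, lr=0.1):
    """One Generative Semantic Nursing update on the diffusion latent.

    attn_fn(latent) -> [num_tokens, H, W] cross-attention maps (stand-in here).
    Loss: encourage each target token to attain high peak attention somewhere.
    """
    latent = latent.detach().requires_grad_(True)
    attn = attn_fn(latent)
    peaks = attn[token_ids].flatten(1).max(dim=1).values  # peak attention per entity token
    loss = (1.0 - peaks).max()                            # focus on the most neglected token
    grad, = torch.autograd.grad(loss, latent)
    return (latent - lr * grad).detach()

# Toy usage with a fake attention head (a real pipeline would expose UNet cross-attention).
fake_attn = lambda z: torch.softmax(z.flatten(), 0).reshape(4, 8, 8)
z = torch.randn(4, 8, 8)
z = gsn_step(z, fake_attn, token_ids=[1, 3])
```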
- “Local Spherical Harmonics Improve Skeleton-Based Hand Action Recognition,” 2023. [Online]. Available: https://arxiv.org/abs/2308.10557.
Abstract
Hand action recognition is essential: communication, human-robot interaction, and gesture control depend on it. Skeleton-based action recognition traditionally includes hands, which belong to the classes that remain challenging to recognize correctly to date. We propose a method specifically designed for hand action recognition which uses relative angular embeddings and local Spherical Harmonics to create novel hand representations. The use of Spherical Harmonics creates rotation-invariant representations which make hand action recognition even more robust against inter-subject differences and viewpoint changes. We conduct extensive experiments on the hand joints in the First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations, and on the NTU RGB+D 120 dataset, demonstrating the benefit of using Local Spherical Harmonics Representations. Our code is available at github.com/KathPra/LSHR_LSHT.
- “Improving Quality and Controllability in GAN-based Image Synthesis,” Universität des Saarlandes, Saarbrücken, 2023.
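To make the rotation-invariance claim in the hand-action abstract above concrete: expanding the joint directions in spherical harmonics and keeping only the per-degree power spectrum yields features that are unchanged under global rotations. A hedged sketch with scipy; the actual LSHR/LSHT construction in the paper may differ:

```python
import numpy as np
from scipy.special import sph_harm

def sh_power_features(joints: np.ndarray, max_degree: int = 4) -> np.ndarray:
    """Rotation-invariant per-degree power spectrum of hand-joint directions.

    joints: [J, 3] positions relative to the wrist.
    """
    r = np.linalg.norm(joints, axis=1)
    theta = np.arctan2(joints[:, 1], joints[:, 0]) % (2 * np.pi)              # azimuth
    phi = np.arccos(np.clip(joints[:, 2] / np.maximum(r, 1e-8), -1.0, 1.0))   # polar angle
    feats = []
    for l in range(max_degree + 1):
        # Expansion coefficients c_lm of the radius-weighted point set on the sphere.
        c = np.array([np.sum(r * np.conj(sph_harm(m, l, theta, phi)))
                      for m in range(-l, l + 1)])
        # Rotations mix coefficients within a degree unitarily, so the power is invariant.
        feats.append(np.sum(np.abs(c) ** 2))
    return np.asarray(feats)

hand = np.random.randn(21, 3)  # 21 joints, wrist-centred
print(sh_power_features(hand))
```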