Abstract
We introduce PersonaHOI, a training- and tuning-free framework that fuses a
general StableDiffusion (SD) model with a personalized face diffusion (PFD) model to
generate identity-consistent human-object interaction (HOI) images. While
existing PFD models have advanced significantly, they often overemphasize
facial features at the expense of full-body coherence. PersonaHOI therefore introduces an
additional SD branch guided by HOI-oriented text inputs. By
incorporating cross-attention constraints in the PFD branch and spatial merging
at both latent and residual levels, PersonaHOI preserves personalized facial
details while ensuring realistic interactions in non-facial regions. Experiments, evaluated
with a novel interaction alignment metric, demonstrate the superior realism and
scalability of PersonaHOI, establishing a new standard for practical,
identity-preserving personalized HOI generation. Our code will be available at
github.com/JoyHuYY1412/PersonaHOI