RIGID

Abstract

GAN inversion is indispensable for applying the powerful editability of GANs to real images. However, existing methods that invert video frames individually often produce undesired, temporally inconsistent results. In this paper, we propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID), to explicitly and simultaneously enforce temporally coherent GAN inversion and facial editing of real videos. Our approach models the temporal relations between the current and previous frames from three aspects. First, to enable faithful real video reconstruction, we maximize inversion fidelity and consistency by learning a temporally compensated latent code. Second, we observe that incoherent noise lies in the high-frequency domain and can be disentangled from the latent space. Third, to remove inconsistency after attribute manipulation, we propose an in-between frame composition constraint, such that an arbitrary frame must be a direct composite of its neighboring frames. Our unified framework learns the inherent coherence between input frames in an end-to-end manner. It is therefore attribute-agnostic and can be applied to arbitrary edits of the same video without re-training. Extensive experiments demonstrate that RIGID outperforms state-of-the-art methods both qualitatively and quantitatively in inversion and editing tasks.
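A small sketch can illustrate the idea behind the in-between frame composition constraint. The page does not say how the composite is formed or which penalty is used. For illustration only, the code below assumes a per-pixel alpha blend of frames t-1 and t+1, with an L1 penalty measured against frame t. The function names and the `alpha` weight map are hypothetical and are not taken from the RIGID implementation.

```python
import numpy as np


def composite_neighbors(prev_frame, next_frame, alpha):
    """Blend frames t-1 and t+1 with a per-pixel weight map (hypothetical).

    `alpha` must broadcast against the frames, e.g. shape (H, W, 1) for
    (H, W, 3) frames, with values in [0, 1]: 1 keeps frame t-1, 0 keeps t+1.
    """
    return alpha * prev_frame + (1.0 - alpha) * next_frame


def in_between_composition_loss(prev_frame, cur_frame, next_frame, alpha):
    """Mean absolute error between frame t and the composite of its neighbors.

    Assumed form of the constraint: a low value means frame t is close to a
    direct composite of frames t-1 and t+1.
    """
    composite = composite_neighbors(prev_frame, next_frame, alpha)
    return float(np.mean(np.abs(cur_frame - composite)))


if __name__ == "__main__":
    h, w = 4, 4
    prev_frame = np.zeros((h, w, 3))
    next_frame = np.ones((h, w, 3))
    alpha = np.full((h, w, 1), 0.5)  # equal blend of both neighbors

    smooth = np.full((h, w, 3), 0.5)   # lies exactly between its neighbors
    flicker = np.full((h, w, 3), 0.9)  # jumps away from its neighbors
    print(in_between_composition_loss(prev_frame, smooth, next_frame, alpha))
    print(in_between_composition_loss(prev_frame, flicker, next_frame, alpha))
```

With an equal blend, the temporally smooth middle frame gives a loss of 0. The flickering frame is penalized with a loss of about 0.4. A trained system would more plausibly predict the weight map than fix it. A constant 0.5 is used here only to keep the example small.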

RIGID: Recurrent GAN Inversion and Editing of Real Face Videos

Comparison on Video Inversion

Comparison on Video Editing

Edited attributes: + Chubby, + Smiling, + Wearing Lipstick, + EyeGlasses, - Age, + Narrow Eyes