Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

Cong Wang¹,², Di Kang², He-Yi Sun¹, Shen-Han Qian³, Zi-Xuan Wang⁴, Linchao Bao², Song-Hai Zhang¹

¹Tsinghua University, ²Tencent, ³Technical University of Munich, ⁴Carnegie Mellon University

Comparisons on Novel View Synthesis

MeGA captures more accurate target expressions and detailed skin textures (e.g., forehead wrinkles), while avoiding artifacts inside the mouth.

The Gaussian Head Avatar results are not actually consistent with the ground truth, even where they look similar. This is most visible at the beginning of the dynamic expression videos (i.e., Rows 2 and 4), where there is a noticeable location shift.

Comparisons on Novel Expression Synthesis

When rendering novel expressions, MeGA produces more detailed facial textures and matches the target expressions more closely.

As mentioned in our main text, we use the same train/test split as GaussianAvatars. Compared to the split used by the original Gaussian Head Avatar (GHA), our split uses far fewer expression sequences for training, which amplifies the weakness of GHA in rendering novel expressions. Training details and possible reasons are given in our supplementary document.

Comparisons on Cross-identity Reenactment

Driven by Normal Expressions

Driven by Extreme Expressions

Driven by Talking Head (with audio)

Data from VOCA (Capture, learning, and synthesis of 3D speaking styles, CVPR 2019).

Gaussian Head Avatar uses BFM as its 3DMM and provides a repository for estimating BFM parameters from multi-view videos. However, it does not provide a script for estimating BFM parameters from a given mesh scan, so we do not include its talking head demo here.

Driving Edited Head Avatars

Driven by a normal expression of his own:

Driven by an extreme expression of another person: