Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

Cong Wang1,2, Di Kang2, He-Yi Sun1, Shen-Han Qian3, Zi-Xuan Wang4, Linchao Bao2, Song-Hai Zhang1
1Tsinghua University, 2Tencent, 3Technical University of Munich, 4Carnegie Mellon University
Drag the slider to see the magic!

Comparisons on Novel View Synthesis

MeGA captures more accurate target expressions and detailed skin textures (e.g., forehead wrinkles), while avoiding artifacts inside the mouth.


The Gaussian Head Avatar is actually not consistent with the ground truth (even though they look similar), especially at the beginning of the dynamic expression videos (i.e., Rows 2 and 4), where there is a noticeable location shift.

Comparisons on Novel Expression Synthesis

When rendering novel expressions, MeGA produces more detailed facial textures and matches the target expressions more closely.


As mentioned in our main text, we use the same train/test split as GaussianAvatars. Compared to the split used by the original Gaussian Head Avatar (GHA), ours uses far fewer expression sequences for training, amplifying the weakness of GHA in rendering novel expressions. Training details and possible reasons are given in our supplementary document.
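For concreteness, a minimal sketch of such a sequence-level split is shown below. The sequence names are placeholders for illustration only; the actual held-out sequences follow the GaussianAvatars protocol.

```python
# Minimal sketch of a sequence-level train/test split.
# Sequence names are hypothetical; the real split follows GaussianAvatars.
ALL_SEQUENCES = [f"EXP-{i}" for i in range(1, 11)]   # placeholder expression sequences
TEST_SEQUENCES = {"EXP-9", "EXP-10"}                 # held out for novel expression synthesis

train_seqs = [s for s in ALL_SEQUENCES if s not in TEST_SEQUENCES]
test_seqs = [s for s in ALL_SEQUENCES if s in TEST_SEQUENCES]
print(f"{len(train_seqs)} training / {len(test_seqs)} test sequences")
```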

Comparisons on Cross-identity Reenactment


Driven by Normal Expressions


Driven by Extreme Expressions



Driven by Talking Head (with audio)

Data from VOCA (Capture, Learning, and Synthesis of 3D Speaking Styles, CVPR 2019).


The Gaussian Head Avatar employs BFM as its 3DMM and provides a repository for estimating BFM parameters from multi-view videos. However, it does not provide a script for estimating BFM parameters from a given mesh scan, so we do not include its talking-head demo here.
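As a rough illustration of what such a missing script would need to do (this is not part of the GHA codebase or our pipeline), linear 3DMM coefficients can be fitted to a scan by least squares, assuming the scan is already in vertex correspondence with the model topology. The function name, array names, and the ridge term below are assumptions.

```python
import numpy as np

def fit_3dmm_coeffs(scan_vertices, mean_shape, basis, reg=1e-4):
    """Hedged sketch: ridge-regularized least-squares fit of linear 3DMM
    (e.g., BFM) coefficients to a registered mesh scan.
    scan_vertices, mean_shape: (N, 3); basis: (3N, K) PCA basis."""
    residual = (scan_vertices - mean_shape).reshape(-1)        # (3N,)
    # argmin_c ||basis @ c - residual||^2 + reg * ||c||^2
    A = basis.T @ basis + reg * np.eye(basis.shape[1])
    b = basis.T @ residual
    return np.linalg.solve(A, b)                               # (K,) coefficients

# Usage (shapes only; real BFM data must be loaded separately):
# coeffs = fit_3dmm_coeffs(scan_V, bfm_mean, bfm_basis)
# fitted = bfm_mean + (bfm_basis @ coeffs).reshape(-1, 3)
```

In practice the scan would also have to be registered to the BFM topology first, which is the step the available repository does not cover for mesh inputs.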



Driving Edited Head Avatars

Driven by a normal expression of his own:



Driven by an extreme expression of another person: