MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

Cong Wang1,2, Di Kang2, He-Yi Sun1, Shen-Han Qian3, Zi-Xuan Wang4, Linchao Bao5, Song-Hai Zhang1
1Tsinghua University, 2Tencent AI Lab, 3Technical University of Munich,
4Carnegie Mellon University, 5University of Birmingham

Abstract

Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gaussian Head Avatar (MeGA) that models different head components with more suitable representations. Specifically, we select an enhanced FLAME mesh as our facial representation and predict a UV displacement map to provide per-vertex offsets for improved personalized geometric details. To achieve photorealistic renderings, we obtain facial colors using deferred neural rendering and disentangle neural textures into three meaningful parts. For hair modeling, we first build a static canonical hair using 3D Gaussian Splatting. A rigid transformation and an MLP-based deformation field are further applied to handle complex dynamic expressions. Combined with our occlusion-aware blending, MeGA generates higher-fidelity renderings for the whole head and naturally supports more downstream tasks. Experiments on the NeRSemble dataset demonstrate the effectiveness of our designs, outperforming previous state-of-the-art methods and supporting various editing functionalities, including hairstyle alteration and texture editing.
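As a rough illustration of the "UV displacement map to per-vertex offsets" step mentioned in the abstract, the sketch below samples a displacement map at each vertex's UV coordinate and adds the result to the vertex position. It is only a sketch under stated assumptions, not MeGA's implementation: the 3-channel map layout, the bilinear sampling, and the names `sample_bilinear` and `displace_vertices` are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, ...]


def _lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vec3:
    """Linearly interpolate between two vectors of equal length."""
    return tuple(ai + (bi - ai) * t for ai, bi in zip(a, b))


def sample_bilinear(disp_map: Sequence[Sequence[Vec3]], u: float, v: float) -> Vec3:
    """Bilinearly sample an H x W grid of 3D offsets at UV coordinates in [0, 1].

    Assumption: u indexes columns and v indexes rows; real pipelines may
    flip the v axis or use a different texel-center convention.
    """
    h, w = len(disp_map), len(disp_map[0])
    x = min(max(u, 0.0), 1.0) * (w - 1)
    y = min(max(v, 0.0), 1.0) * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = _lerp(disp_map[y0][x0], disp_map[y0][x1], fx)
    bottom = _lerp(disp_map[y1][x0], disp_map[y1][x1], fx)
    return _lerp(top, bottom, fy)


def displace_vertices(
    vertices: Sequence[Vec3],
    uvs: Sequence[Tuple[float, float]],
    disp_map: Sequence[Sequence[Vec3]],
) -> List[Vec3]:
    """Add the sampled UV-space offset to each base-mesh vertex."""
    return [
        tuple(p + d for p, d in zip(vert, sample_bilinear(disp_map, u, v)))
        for vert, (u, v) in zip(vertices, uvs)
    ]
```

In a learned setting the displacement map would be predicted by a network and the sampling would be done with a differentiable texture lookup; the plain-Python version here only shows the data flow from UV space to vertex space.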


  Table of Contents

    Comparisons on Novel View Synthesis
      The qualitative results
      The quantitative results
    Comparisons on Cross-identity Reenactment
      Driving with Extreme Expressions
      Driving with Talking Head
      Driving Edited Head Avatars
    More Head Editing Results

Comparisons on Novel View Synthesis



The qualitative comparisons on novel view synthesis

MeGA captures more accurate target expressions and detailed skin textures (e.g., forehead wrinkles), while avoiding artifacts inside the mouth.


The quantitative comparisons on novel view synthesis


NVS Table

Comparisons on Cross-identity Reenactment


Driving with Extreme Expressions



Driving with Talking Head (with audio)

Data from VOCA (Capture, learning, and synthesis of 3D speaking styles, CVPR 2019).



Driving Edited Head Avatars

More Head Editing Results

Cite

@article{wang2024mega,
  title={MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing},
  author={Wang, Cong and Kang, Di and Sun, He-Yi and Qian, Shen-Han and Wang, Zi-Xuan and Bao, Linchao and Zhang, Song-Hai},
  journal={arXiv preprint arXiv:2404.19026},
  year={2024}
}