Neural Point-based Volumetric Avatar

Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar

(SIGGRAPH Asia 2023)

Tsinghua University, Tencent AI Lab

NPVA achieves high-fidelity facial animations (images and depth maps) while maintaining efficiency comparable to mesh-based methods.

Abstract

In this paper, we propose Neural Point-based Volumetric Avatar (NPVA), a method that adopts a neural point representation and neural volume rendering, discarding the predefined connectivity and hard correspondence imposed by mesh-based approaches.

Specifically, the neural points are strategically constrained around the surface of the target expression via a high-resolution UV displacement map, achieving increased modeling capacity and more accurate control. We introduce three technical innovations to improve the rendering and training efficiency: a patch-wise depth-guided (shading point) sampling strategy, a lightweight radiance decoding process, and a Grid-Error-Patch (GEP) ray sampling strategy during training.
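As an illustration of the first of these innovations, the minimal PyTorch sketch below samples shading points only within a narrow band around a coarse per-ray depth estimate, instead of densely sampling the full near-to-far range of each ray. The function name, band width delta, and sample count are illustrative assumptions, not the exact implementation from the paper.

import torch

def depth_guided_samples(ray_o, ray_d, depth, n_samples=8, delta=0.02):
    # ray_o, ray_d: (N, 3) origins and unit directions of the rays in one patch
    # depth:        (N,)   coarse depth guess per ray (e.g., rasterized from
    #                      the neural points); assumed to be given here
    # Returns shading points of shape (N, n_samples, 3), confined to the band
    # [depth - delta, depth + delta] rather than the full near-to-far range.
    t = torch.linspace(-delta, delta, n_samples, device=ray_o.device)
    z = depth[:, None] + t[None, :]                        # (N, n_samples)
    return ray_o[:, None, :] + z[..., None] * ray_d[:, None, :]

Because only a handful of samples per ray then need to be decoded, this kind of sampling is one ingredient in keeping rendering cost close to that of mesh-based pipelines.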

Experiments on the Multiface dataset demonstrate the effectiveness of our designs: our method outperforms previous state-of-the-art methods, especially in challenging facial regions.

Overview

The core of our approach is a neural point-based volumetric representation (middle), with points distributed around the surface of the target expression. This surface is defined by the low-resolution position map 𝑮𝑜 with intermediate supervision. A high-resolution displacement map 𝑮𝑑 allows the points to adaptively move within a certain range, as needed to provide increased capacity in more challenging regions (e.g., mouth, hair/beard). The attached point features are obtained from the feature map 𝑭. 𝑮𝑜, 𝑮𝑑, and 𝑭 are decoded from the latent code 𝒛 (left), which is trained in a variational auto-encoding style (encoder omitted). In addition, we propose three technical innovations with the aim of achieving rendering efficiency on par with mesh-based methods (right).
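To make this concrete, the minimal sketch below (PyTorch) shows one plausible way the point set could be assembled from the decoded maps: the coarse surface 𝑮𝑜 is upsampled to the resolution of 𝑮𝑑, each point is displaced within a bounded range around that surface, and the features from 𝑭 are attached per point. The tanh bounding, map shapes, and function name are our illustrative assumptions, not the exact parameterization used in the paper.

import torch
import torch.nn.functional as F

def build_neural_points(G_o, G_d, feat, max_disp=1.0):
    # G_o:  (3, H, W)   low-res position map defining the coarse surface
    # G_d:  (3, h, w)   high-res displacement map, h > H (assumed to store
    #                   bounded 3D offsets)
    # feat: (C, h, w)   per-point feature map
    # Returns point positions (h*w, 3) and attached features (h*w, C).
    h, w = G_d.shape[-2:]
    base = F.interpolate(G_o[None], size=(h, w), mode='bilinear',
                         align_corners=False)[0]      # upsampled coarse surface
    pts = base + max_disp * torch.tanh(G_d)           # keep points near surface
    return pts.reshape(3, -1).T, feat.reshape(feat.shape[0], -1).T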

Comparisons with state-of-the-art methods

Side-by-side comparisons of our NPVA renderings against MVP, DAM, and PiCA renderings, against the ground truth, and alongside the corresponding NPVA geometry.

Results (talking head with audio)

A talking head rendered with our NPVA model. Left: the input (driving) mesh; middle: our rendering result; right: the detailed depth map produced by volume rendering.

The first sentence: "take charge of choosing her bridesmaids gowns"

The second sentence: "approach your interview with statuesque composure"

More Results (expressions)

A neutral expression rendered by our approach.

A neutral expression rendered by NPVA (ours), DAM, PiCA, and MVP.

A normal expression rendered by our approach.

A normal expression rendered by NPVA (ours), DAM, PiCA, and MVP.

An extreme expression rendered by our approach.

An extreme expression rendered by NPVA (ours), DAM, PiCA, and MVP.

More People (TBA)

Cite

@inproceedings{DBLP:conf/siggrapha/WangKCBSZ23,
  author    = {Cong Wang and
               Di Kang and
               Yan{-}Pei Cao and
               Linchao Bao and
               Ying Shan and
               Song{-}Hai Zhang},
  editor    = {June Kim and
               Ming C. Lin and
               Bernd Bickel},
  title     = {Neural Point-based Volumetric Avatar: Surface-guided Neural Points
               for Efficient and Photorealistic Volumetric Head Avatar},
  booktitle = {{SIGGRAPH} Asia 2023 Conference Papers, {SA} 2023, Sydney, NSW,
               Australia, December 12-15, 2023},
  pages     = {50:1--50:12},
  publisher = {{ACM}},
  year      = {2023},
}