MeshArt: Generating Articulated Meshes with Structure-guided Transformers
- ¹Technical University of Munich
- ²Meta
MeshArt is a hierarchical transformer-based approach to generate articulated 3D meshes with clean, compact geometry.
Abstract
Articulated 3D object generation is fundamental to creating realistic, functional, and interactive virtual assets rather than purely static ones. We introduce MeshArt, a hierarchical transformer-based approach to generate articulated 3D meshes with clean, compact geometry, reminiscent of human-crafted 3D models. We approach articulated mesh generation in a part-by-part fashion across two stages.
First, we generate a high-level articulation-aware object structure; then, based on this structural information, we synthesize each part's mesh faces. Key to our approach is modeling both articulation structures and part meshes as sequences of quantized triangle embeddings, leading to a unified hierarchical framework with transformers for autoregressive generation. Object part structures are first generated as their bounding primitives and articulation modes; a second transformer, guided by these articulation structures, then generates each part's mesh triangles. To ensure coherency among generated parts, we introduce structure-guided conditioning that also incorporates local part mesh connectivity.
MeshArt shows significant improvements over the state of the art, with a 57.1% improvement in structure coverage and a 209-point improvement in mesh generation FID.
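To make the idea of treating a mesh as a token sequence concrete, here is a minimal sketch. It is not the method described above: MeshArt uses learned quantized triangle embeddings, whereas this example swaps in plain uniform coordinate binning as a simpler stand-in. The function names, the `num_bins` parameter, and the assumption that coordinates are normalized to [-1, 1] are all hypothetical.

```python
def quantize_triangles(triangles, num_bins=128):
    """Flatten triangles into integer tokens via uniform coordinate binning.

    Assumption: `triangles` is a list of triangles, each a list of three
    (x, y, z) vertices with coordinates already normalized to [-1, 1].
    This is a stand-in for a learned quantizer, not MeshArt's tokenizer.
    """
    tokens = []
    for tri in triangles:
        for vertex in tri:
            for coord in vertex:
                # Map [-1, 1] to [0, num_bins - 1] and snap to the nearest bin.
                b = round((coord + 1.0) / 2.0 * (num_bins - 1))
                tokens.append(min(max(b, 0), num_bins - 1))
    return tokens


def dequantize_tokens(tokens, num_bins=128):
    """Invert quantize_triangles up to half a bin width of error."""
    coords = [t / (num_bins - 1) * 2.0 - 1.0 for t in tokens]
    # Regroup: 3 coordinates per vertex, 3 vertices per triangle.
    verts = [coords[i:i + 3] for i in range(0, len(coords), 3)]
    return [verts[i:i + 3] for i in range(0, len(verts), 3)]


if __name__ == "__main__":
    tri = [[(-1.0, -1.0, -1.0), (1.0, -1.0, -1.0), (1.0, 1.0, -1.0)]]
    toks = quantize_triangles(tri)
    print(toks)
    print(dequantize_tokens(toks))
```

Each triangle becomes nine integers, so a sequence model can predict a mesh one token at a time. A bounding primitive for a part could be serialized the same way, which is one reading of how structures and meshes can share a single sequence format.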
Video
Pipeline
Our pipeline consists of two main stages. First, we learn to generate an articulation-aware structure using a transformer that predicts part bounding boxes and articulation information (left). Then, guided by this structural information and the local geometry of neighboring parts (junction faces), a second transformer generates the detailed geometry for each part while maintaining coherent connectivity between parts (right).
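The control flow of the two stages can be sketched as follows. This is an illustrative outline only: both transformers are replaced by trivial stand-in samplers, and every name here (`PartStructure`, `PartMesh`, `sample_structure`, `sample_part_faces`, `boxes_touch`, the articulation labels) is hypothetical. In particular, junction faces are approximated as all faces of previously generated parts whose bounding boxes touch the current one, which is an assumption rather than the selection rule used by MeshArt.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PartStructure:
    box_min: tuple
    box_max: tuple
    articulation: str  # hypothetical labels, e.g. "fixed" or "revolute"


@dataclass
class PartMesh:
    structure: PartStructure
    faces: list = field(default_factory=list)


def boxes_touch(a, b, tol=1e-6):
    """True if two axis-aligned boxes overlap or share a boundary."""
    return all(
        a.box_min[i] <= b.box_max[i] + tol and b.box_min[i] <= a.box_max[i] + tol
        for i in range(3)
    )


def sample_structure(rng):
    """Stand-in for the stage-one transformer: a fixed two-part layout."""
    return [
        PartStructure((-1.0, -1.0, -1.0), (1.0, 1.0, 0.0), "fixed"),
        PartStructure((-1.0, -1.0, 0.0), (1.0, 1.0, 1.0), "revolute"),
    ]


def sample_part_faces(structure, junction_faces, rng):
    """Stand-in for the stage-two transformer: one random triangle in the box.

    A real model would condition on `structure` and `junction_faces`;
    this placeholder ignores the junction faces.
    """
    lo, hi = structure.box_min, structure.box_max

    def point():
        return tuple(rng.uniform(lo[i], hi[i]) for i in range(3))

    return [(point(), point(), point())]


def generate_object(seed=0):
    rng = random.Random(seed)
    structures = sample_structure(rng)  # stage one: boxes + articulation
    parts = []
    for s in structures:  # stage two: part-by-part mesh generation
        # Collect faces from already-generated parts adjacent to this one.
        junction = [f for p in parts if boxes_touch(s, p.structure) for f in p.faces]
        parts.append(PartMesh(s, sample_part_faces(s, junction, rng)))
    return parts


if __name__ == "__main__":
    for part in generate_object():
        print(part.structure.articulation, len(part.faces))
```

Generating parts in sequence means each new part can see the geometry already produced next to it, which is what lets the second stage keep connectivity coherent across part boundaries.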