DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image

Technical University of Munich
ACM Transactions on Graphics (SIGGRAPH 2024)

Abstract

Perceiving 3D structures from RGB images in terms of CAD model primitives can enable an effective, efficient, object-based 3D representation of scenes. However, current approaches rely on supervision from expensive annotations that associate CAD models with real images, and they struggle with the ambiguities inherent in the task: depth-scale ambiguity in monocular perception, and inexact matches between CAD database models and real observations.

We thus propose DiffCAD, the first weakly-supervised probabilistic approach to CAD retrieval and alignment from an RGB image. We formulate this as a conditional generative task, leveraging diffusion to learn implicit probabilistic models capturing the shape, pose, and scale of CAD objects in an image. This enables multi-hypothesis generation of different plausible CAD reconstructions, requiring only a few hypotheses to characterize ambiguities in depth/scale and inexact shape matches.
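To make the multi-hypothesis idea concrete, here is a minimal sketch (not the DiffCAD implementation) of DDPM-style ancestral sampling in NumPy. Each hypothesis starts from independent Gaussian noise and is denoised step by step. A trained, image-conditioned network is assumed away: the hypothetical `mixture_eps` uses the closed-form optimal noise prediction for a toy distribution whose plausible values are listed in `modes`, standing in for, e.g., two plausible object scales for a single image. The schedule values and all function names are illustrative assumptions.

```python
import numpy as np

def linear_schedule(num_steps, beta_start=1e-4, beta_end=0.1):
    """Linear noise schedule; returns per-step betas and cumulative alpha_bar."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def mixture_eps(x_t, alpha_bar, modes):
    """Hypothetical stand-in for a learned, condition-dependent noise predictor.

    For data drawn uniformly from point masses at `modes` (M, D), the optimal
    noise prediction has a closed form, so no trained network is needed.
    x_t has shape (K, D): one row per hypothesis.
    """
    mean_scale = np.sqrt(alpha_bar)
    var = 1.0 - alpha_bar
    # Squared distance of each noisy sample to each scaled mode: (K, M)
    d2 = ((x_t[:, None, :] - mean_scale * modes[None, :, :]) ** 2).sum(-1)
    logits = -d2 / (2.0 * var)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)            # posterior over modes
    x0_hat = w @ modes                           # E[x0 | x_t], shape (K, D)
    return (x_t - mean_scale * x0_hat) / np.sqrt(var)

def sample_hypotheses(modes, num_hypotheses=8, num_steps=200, seed=0):
    """DDPM ancestral sampling; independent noise gives independent hypotheses."""
    rng = np.random.default_rng(seed)
    betas, alpha_bars = linear_schedule(num_steps)
    x = rng.standard_normal((num_hypotheses, modes.shape[1]))
    for t in reversed(range(num_steps)):
        eps = mixture_eps(x, alpha_bars[t], modes)
        # Mean of p(x_{t-1} | x_t) in the DDPM parameterization
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x
```

With `modes = np.array([[1.0], [3.0]])`, each of the 8 returned hypotheses lands near 1.0 or 3.0 rather than at their average, which is the property that lets a handful of samples cover an ambiguous observation.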

Our approach is trained only on synthetic data and leverages monocular depth and mask estimates to enable robust zero-shot adaptation to various real target domains. With 8 hypotheses, our multi-hypothesis approach even surpasses the supervised state of the art on the Scan2CAD dataset by 5.9%.
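This page does not detail how the multi-hypothesis result is scored. A common protocol, assumed here, counts an object as correctly aligned if any of its K hypotheses passes Scan2CAD-style thresholds (translation error ≤ 20 cm, rotation error ≤ 20°, scale error ≤ 20%). The sketch below uses hypothetical helper names, an illustrative dict layout, and a mean relative per-axis scale error (also an assumption); it ignores object symmetries and category matching, which a full benchmark evaluation would need to handle.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def is_aligned(pred, gt, t_thresh=0.2, r_thresh=20.0, s_thresh=0.2):
    """Assumed Scan2CAD-style check: 20 cm, 20 degrees, 20% scale error."""
    t_err = np.linalg.norm(pred["t"] - gt["t"])
    r_err = rotation_error_deg(pred["R"], gt["R"])
    s_err = np.mean(np.abs(pred["s"] / gt["s"] - 1.0))  # assumed scale metric
    return t_err <= t_thresh and r_err <= r_thresh and s_err <= s_thresh

def best_of_k_accuracy(hypotheses_per_object, gts):
    """An object counts as correct if any one of its hypotheses is aligned."""
    hits = [any(is_aligned(h, gt) for h in hyps)
            for hyps, gt in zip(hypotheses_per_object, gts)]
    return float(np.mean(hits))
```

Under this protocol, accuracy can only stay the same or grow as K increases, since adding hypotheses never removes a hit.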


Pipeline

We introduce DiffCAD, the first probabilistic approach to CAD retrieval and alignment from an RGB image that requires no real-world supervision. To mitigate the multiple ambiguities inherent in monocular perception, we model the distributions of plausible scene scales, object poses, and object shapes as separate, disentangled conditional generative tasks.
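Monocular depth estimates are often only defined up to scale, so the geometry they give for an object is correct in shape but not in metric size; this is one reason to treat scene scale as a distribution of its own. A minimal sketch, assuming a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy` and a boolean object mask: masked pixels are lifted to camera-space points, and each sampled scene-scale hypothesis yields a differently sized metric point cloud. How DiffCAD actually encodes depth and masks for its conditioning is not specified on this page, and the function names are illustrative.

```python
import numpy as np

def backproject_masked_depth(depth, mask, fx, fy, cx, cy):
    """Lift the masked pixels of a (relative) depth map to camera-space points."""
    v, u = np.nonzero(mask)        # row (y) and column (x) indices of object pixels
    z = depth[v, u]
    x = (u - cx) * z / fx          # pinhole model: u = fx * x / z + cx
    y = (v - cy) * z / fy          # pinhole model: v = fy * y / z + cy
    return np.stack([x, y, z], axis=1)  # (N, 3)

def apply_scale_hypotheses(points, scales):
    """One point cloud per sampled scene-scale hypothesis: shape (K, N, 3)."""
    return np.asarray(scales)[:, None, None] * points[None]
```

Because a uniform scale multiplies every coordinate, all K clouds project to the same image pixels; the image alone cannot tell them apart, which is exactly the ambiguity multiple scale hypotheses are meant to cover.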


Qualitative Results

Despite being trained only on synthetic data, our approach achieves robust retrieval and alignment on various real-world datasets, reconstructing each scene with multiple feasible sets of object shape and pose pairs that reflect the ambiguities of monocular perception.


ScanNet

ARKit

BibTeX

@article{gao2023diffcad,
title={DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image},
author={Gao, Daoyi and Rozenberszki, David and Leutenegger, Stefan and Dai, Angela},
journal={arXiv preprint},
year={2023}
}