We proposed a weakly-supervised approach for CAD model retrieval and alignment from a single RGB image.
Our approach utilizes diffusion models to tackle the ambiguities inherent in monocular perception, and
achieves robust cross-domain performance while being trained only on synthetic data.
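As an illustrative sketch of how a conditional diffusion model can represent monocular ambiguity, the snippet below samples several alignment hypotheses by running a standard DDPM reverse process over a pose vector; the denoiser eps_model, the 9-D pose parameterization, the noise schedule, and the image conditioning are placeholders assumed for illustration, not the actual architecture.

```python
# Illustrative sketch: sampling multiple alignment hypotheses from a
# conditional diffusion model so that the spread across hypotheses reflects
# the ambiguity of a single RGB view. The denoiser, schedule, and pose
# parameterization are hypothetical placeholders, not the actual method.
import torch

T = 50                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def eps_model(x_t, t, cond):
    """Placeholder denoiser: predicts the noise added to the pose at step t,
    conditioned on image features `cond`. Stands in for a trained network."""
    return torch.zeros_like(x_t)

@torch.no_grad()
def sample_pose_hypotheses(cond, n_hyp=8, pose_dim=9):
    """Draw several pose/alignment hypotheses via DDPM ancestral sampling."""
    x = torch.randn(n_hyp, pose_dim)                     # start from noise
    for t in reversed(range(T)):
        eps = eps_model(x, t, cond)
        a, ab = alphas[t], alpha_bars[t]
        mean = (x - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                                             # (n_hyp, pose_dim)

hypotheses = sample_pose_hypotheses(cond=None)
```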
We proposed a hybrid model that combines polarization information with physical priors in
a data-driven learning strategy to improve the accuracy of pose predictions for photometrically
challenging objects.
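As a concrete example of the physical cues polarization offers, the sketch below computes the degree and angle of linear polarization from four polarizer-angle captures via the Stokes parameters; these are standard textbook quantities, and the exact priors consumed by the model may differ.

```python
# Standard polarization cues from four captures at polarizer angles
# 0°, 45°, 90°, 135°. These are textbook quantities; whether and how the
# actual model consumes them is an assumption here.
import numpy as np

def polarization_cues(i0, i45, i90, i135, eps=1e-8):
    """Return degree (DoLP) and angle (AoLP) of linear polarization."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)    # total intensity
    s1 = i0 - i90                         # horizontal vs. vertical component
    s2 = i45 - i135                       # diagonal component
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)
    aolp = 0.5 * np.arctan2(s2, s1)
    return dolp, aolp

# Toy usage with random images
imgs = [np.random.rand(4, 4) for _ in range(4)]
dolp, aolp = polarization_cues(*imgs)
```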
We proposed a self-supervised monocular depth estimation pipeline that aims to improve temporal
consistency while preserving accuracy. We also proposed a new Temporal Consistency Metric (TCM) to quantify
depth consistency across frames.
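The exact TCM definition is not reproduced here; the sketch below assumes a warp-and-compare formulation, where depth from frame t is reprojected into frame t+1 with known intrinsics K and relative pose (R, t) and compared against the prediction for frame t+1.

```python
# Hedged sketch of a warp-and-compare temporal consistency measure.
# Assumes known intrinsics K and relative pose (R, t_vec) between frames,
# and strictly positive predicted depths.
import numpy as np

def temporal_consistency(depth_t, depth_t1, K, R, t_vec):
    h, w = depth_t.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)

    # Back-project frame-t pixels to 3D, then move them into frame t+1.
    pts = np.linalg.inv(K) @ pix * depth_t.reshape(1, -1)
    pts = R @ pts + t_vec.reshape(3, 1)
    proj = K @ pts
    z = proj[2]
    uu = np.round(proj[0] / z).astype(int)
    vv = np.round(proj[1] / z).astype(int)

    valid = (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h) & (z > 0)
    # Relative discrepancy between warped depth and the frame-(t+1) prediction.
    rel = np.abs(z[valid] - depth_t1[vv[valid], uu[valid]]) / depth_t1[vv[valid], uu[valid]]
    return rel.mean()
```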
We proposed a spatial-temporal attention mechanism guided by geometric constraints to aggregate
geometrically meaningful features across frames, improving temporal depth
stability and accuracy compared to previous methods.
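One plausible way to realize geometry-guided attention, shown purely as an assumption-laden sketch, is to bias cross-frame attention logits by the 3D distance between back-projected points, so that geometrically consistent locations attend to each other more strongly; the bias form and the weight lam below are illustrative, not the actual mechanism.

```python
# Hedged sketch: cross-frame attention whose logits are penalized by the
# pairwise 3D distance between back-projected points. The additive bias and
# the temperature `lam` are assumptions for illustration.
import torch
import torch.nn.functional as F

def geometry_guided_attention(q, k, v, pts_q, pts_k, lam=5.0):
    """
    q, k, v:       (N, C) query/key/value features from two frames
    pts_q, pts_k:  (N, 3) back-projected 3D points at each feature location
    """
    d = q.shape[-1]
    sim = q @ k.t() / d**0.5                  # feature similarity
    geo = torch.cdist(pts_q, pts_k)           # pairwise 3D distances
    logits = sim - lam * geo                  # down-weight far-apart points
    attn = F.softmax(logits, dim=-1)
    return attn @ v

# Toy usage
q = torch.randn(16, 64); k = torch.randn(16, 64); v = torch.randn(16, 64)
pq = torch.randn(16, 3); pk = torch.randn(16, 3)
out = geometry_guided_attention(q, k, v, pq, pk)
```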