Depth-Aware Unseen Mask Generation

Three-dimensional scene inpainting is crucial for applications ranging from virtual reality to architectural visualization, yet existing methods struggle with view consistency and geometric accuracy in 360° unbounded scenes. We present AuraFusion360, a novel reference-based method that enables high-quality object removal and hole filling in 3D scenes represented by Gaussian Splatting. Our approach introduces (1) depth-aware unseen mask generation for accurate occlusion identification, (2) Adaptive Guided Depth Diffusion (AGDD), a zero-shot method for accurate initial point placement without requiring additional training, and (3) SDEdit-based detail enhancement for multi-view coherence. We also introduce 360-USID, the first comprehensive dataset for 360° unbounded scene inpainting with ground truth. Extensive experiments demonstrate that AuraFusion360 significantly outperforms existing methods, achieving superior perceptual quality while maintaining geometric accuracy across dramatic viewpoint changes.
Our approach takes multi-view RGB images and corresponding object masks as input and outputs a Gaussian representation with the masked objects removed. The pipeline consists of three main stages: (a) Depth-Aware Unseen Mask Generation to identify truly occluded areas, referred to as the “unseen region”, (b) Depth-Aligned Gaussian Initialization on Reference View to fill unseen regions with initialized Gaussians that carry reference RGB information after object removal, and (c) SDEdit-Based RGB Guidance for Detail Enhancement, which enhances fine details using an inpainting model while preserving reference-view information. Instead of applying SDEdit with random noise, we use DDIM inversion on the rendered initial Gaussians to generate noise that retains the structure of the reference view, ensuring multi-view consistency across all RGB guidance images.
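As a concrete illustration of the last point, the sketch below shows the standard deterministic DDIM inversion update that maps a clean latent (e.g., the encoded rendering of the initial Gaussians) to a noisy latent, which can then replace random noise when running SDEdit. This is a minimal sketch, not the authors' released code; `eps_model` and `alpha_bar` are placeholder names for the diffusion model's noise predictor and cumulative noise schedule.

```python
# Minimal sketch of DDIM inversion for obtaining structured noise from a
# rendered reference image instead of sampling random noise for SDEdit.
# Assumptions (illustrative, not the paper's implementation):
#   eps_model(x, t) -> predicted noise at timestep t
#   alpha_bar       -> 1-D tensor of cumulative alphas, ~1 at t=0 (clean)
#                      and decreasing toward 0 at the last timestep (noisy)
import torch

@torch.no_grad()
def ddim_invert(x0, eps_model, alpha_bar):
    """Map a clean latent x0 to a noisy latent along the deterministic DDIM path."""
    x = x0
    T = len(alpha_bar)
    for t in range(T - 1):
        a_t, a_next = alpha_bar[t], alpha_bar[t + 1]
        eps = eps_model(x, t)
        # Predict the clean latent implied by the current noisy latent.
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # Step "forward" in noise level, reusing the predicted noise direction.
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # starting SDEdit from this latent preserves reference-view structure
```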
Depth-aware unseen mask generation creates depth-aware unseen contours via depth warping to identify occluded regions, then uses these contours as bounding box prompts for SAM2 to segment unseen areas.
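The sketch below illustrates one way such an occlusion test can be implemented with depth warping: reference-view pixels are back-projected using the depth rendered after removal, reprojected into a neighboring view, and flagged as occluded when that view observes something closer or no valid correspondence exists. All names, the tolerance, and the specific test are illustrative assumptions rather than the paper's exact formulation; intersecting the per-view masks yields a candidate unseen region whose bounding box can then be used as a box prompt for SAM2's image predictor.

```python
# Minimal sketch: which reference-view pixels are occluded in one source view?
# K_ref, K_src are 3x3 intrinsics; T_ref2src is a 4x4 reference-to-source pose;
# depth_ref is the depth rendered after object removal. Names are illustrative.
import numpy as np

def occluded_in_src(depth_ref, depth_src, K_ref, K_src, T_ref2src, tol=0.05):
    H, W = depth_ref.shape
    # Back-project every reference pixel to a 3D point in the reference frame.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1).astype(np.float64)
    pts_ref = np.linalg.inv(K_ref) @ pix * depth_ref.reshape(1, -1)
    # Transform into the source frame and project.
    pts_h = np.concatenate([pts_ref, np.ones((1, pts_ref.shape[1]))], axis=0)
    pts_src = (T_ref2src @ pts_h)[:3]
    z = np.clip(pts_src[2], 1e-6, None)
    proj = K_src @ pts_src
    x = np.round(proj[0] / z).astype(int)
    y = np.round(proj[1] / z).astype(int)
    # Points behind the camera or outside the source image count as unseen there.
    valid = (pts_src[2] > 0) & (x >= 0) & (x < W) & (y >= 0) & (y < H)
    occluded = ~valid
    # Where the projection is valid, the point is occluded if the source view
    # observes something noticeably closer at that pixel.
    occluded[valid] = pts_src[2][valid] > depth_src[y[valid], x[valid]] * (1 + tol)
    return occluded.reshape(H, W)

# Intersecting these masks over all training views gives the unseen region;
# its bounding box can serve as a box prompt for SAM2 segmentation.
```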
Adaptive Guided Depth Diffusion (AGDD) aligns estimated depth with the scene's existing geometric scale in a zero-shot manner through latent optimization with an adaptive loss term, achieving accurate alignment in regions adjacent to unseen areas, which makes it better suited to depth inpainting scenarios.
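A simplified, hypothetical sketch of the latent-optimization idea follows: the latent feeding a frozen depth decoder is nudged so the decoded depth matches the incomplete rendered depth inside a guidance band around the unseen region. Here `decode_depth` stands in for a differentiable depth decoder, and the thresholded weighting is an assumed stand-in for the adaptive loss, not the paper's exact objective.

```python
# Hypothetical sketch of aligning a depth latent to rendered (incomplete) depth.
# decode_depth: latent -> (H, W) depth map, differentiable w.r.t. the latent.
# guide_mask:   boolean (H, W) band adjacent to the unseen region.
import torch

def align_depth_latent(latent, decode_depth, rendered_depth, guide_mask,
                       steps=100, lr=1e-2):
    latent = latent.clone().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        pred = decode_depth(latent)
        # Only penalize errors in the guidance band around the unseen region.
        resid = (pred - rendered_depth).abs()[guide_mask]
        # Adaptive weighting (assumption): down-weight residuals already below
        # the current mean error, focusing updates on poorly aligned pixels.
        thresh = resid.mean().detach()
        loss = torch.where(resid > thresh, resid, 0.1 * resid).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latent.detach()
```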
Quantitative comparison of 360° inpainting methods on our 360-USID dataset. Red text indicates the best, and Blue text indicates the second-best performing method.
This work was supported by NVIDIA Taiwan AI Research & Development Center (TRDC).