
SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation

Jiehong Lin, Lihua Liu, Dekun Lu et al. (4 total)

2023-11-27

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

10.1109/cvpr52733.2024.02636

137 citations

Abstract

Zero-shot 6D object pose estimation involves the detection of novel objects with their 6D poses in cluttered scenes, presenting significant challenges for model generalizability. Fortunately, the recent Segment Anything Model (SAM) has showcased remarkable zero-shot transfer performance, which provides a promising solution to tackle this task. Motivated by this, we introduce SAM-6D, a novel framework designed to realize the task through two steps, including instance segmentation and pose estimat...


Problem

The research addresses the challenge of zero-shot 6D object pose estimation, which involves detecting novel objects and estimating their 6D poses in cluttered scenes. This is difficult due to the need for model generalizability to unseen objects.
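
As a brief illustration of what a "6D pose" means here: it is a rigid transform with three rotational and three translational degrees of freedom that maps points from the object's model frame into the camera frame. The sketch below is not taken from the paper; the helper names `rotation_z` and `apply_pose` and the numeric values are hypothetical, chosen only to show the transform x_cam = R x_obj + t.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z-axis (illustrative helper)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply_pose(rotation, translation, point):
    """Map a model-frame point into the camera frame: R @ point + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


if __name__ == "__main__":
    # Hypothetical pose: 90 degree turn about z, then a shift of 0.5 along z.
    R = rotation_z(math.pi / 2)
    t = (0.1, 0.0, 0.5)
    print(apply_pose(R, t, (1.0, 0.0, 0.0)))
```

A pose estimator such as the one summarized here outputs `R` and `t` for each detected object instance.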

Method

The study introduces SAM-6D, a two-step framework: an Instance Segmentation Model (ISM) built on the Segment Anything Model (SAM), followed by a Pose Estimation Model (PEM). The ISM uses an object matching score to identify novel objects, and the PEM estimates pose through a two-stage point matching process.

Key Findings

SAM-6D outperforms existing methods on the seven core datasets of the BOP Benchmark for both instance segmentation and pose estimation of novel objects, demonstrating robust generalization capabilities without bells and whistles.

3 Key Points

SAM-6D realizes joint instance segmentation and pose estimation of novel objects from RGB-D images, outperforming existing methods.
The framework leverages SAM's zero-shot capacities and devises a novel object matching score for identifying novel objects.
SAM-6D approaches pose estimation as a partial-to-partial point matching problem with a two-stage point matching model.

Academic Details

Intervention: SAM-6D employs an Instance Segmentation Model (ISM) and a Pose Estimation Model (PEM) to segment instances and predict their 6D poses.

Outcome measures: Mean Average Precision (mAP) for instance segmentation, and mean Average Recall (AR) for pose estimation, computed over the VSD, MSSD, and MSPD error functions.
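
For readers unfamiliar with the BOP convention, the pose score is the arithmetic mean of the three per-error-function recalls, AR = (AR_VSD + AR_MSSD + AR_MSPD) / 3. A minimal sketch follows; the function name `bop_average_recall` and the input values are hypothetical and are not results from the paper.

```python
def bop_average_recall(ar_vsd, ar_mssd, ar_mspd):
    """Mean of the three BOP recall scores, each expected in [0, 1]."""
    scores = {"AR_VSD": ar_vsd, "AR_MSSD": ar_mssd, "AR_MSPD": ar_mspd}
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not reported numbers.
    print(f"AR = {bop_average_recall(0.60, 0.70, 0.80):.3f}")
```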

Statistical methods: PEM is trained with the Adam optimizer for 600,000 iterations, using an initial learning rate of 0.0001 with a cosine annealing schedule and a batch size of 28.
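
To make the schedule concrete, the sketch below evaluates a single-cycle cosine annealing curve with the stated initial rate and iteration count. The summary does not give a minimum learning rate, a warmup phase, or restarts, so a floor of 0, no warmup, and no restarts are assumptions here; the function name `cosine_annealed_lr` is likewise hypothetical.

```python
import math

BASE_LR = 1e-4          # initial learning rate stated in the summary
TOTAL_ITERS = 600_000   # total training iterations stated in the summary
BATCH_SIZE = 28         # batch size stated in the summary (not used by the schedule)
MIN_LR = 0.0            # assumption: the schedule's floor is not stated


def cosine_annealed_lr(step, base_lr=BASE_LR, total=TOTAL_ITERS, min_lr=MIN_LR):
    """Learning rate at `step` under single-cycle cosine annealing."""
    if not 0 <= step <= total:
        raise ValueError("step must lie in [0, total]")
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    for step in (0, 150_000, 300_000, 450_000, 600_000):
        print(f"iter {step:>7}: lr = {cosine_annealed_lr(step):.3e}")
```

Under these assumptions the rate starts at 0.0001, reaches half that value at iteration 300,000, and decays to the floor at iteration 600,000.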

Key findings: SAM-6D, built on SAM, delivers superior results without the need for network re-training or finetuning. The mask predictions from ISM significantly enhance the performance of PEM.

Generated on 3/22/2026