Autolabel is a tool that produces dense segmentation maps for RGB-D video through volumetric segmentation. Behind the scenes, it uses a semantic NeRF model to infer a 3D segmentation of the scene from very sparse user labels. The learned representation is then used to render dense segmentation maps for given viewpoints.
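To make the idea concrete, here is a minimal sketch of what a semantic NeRF training objective of this kind might look like: a photometric term on every ray, plus a cross-entropy term on the few rays that carry a user label. All names, shapes, and the loss weighting are illustrative assumptions, not the actual Autolabel implementation.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over class logits."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_nerf_loss(pred_rgb, true_rgb, pred_logits, sparse_labels):
    """Combined per-ray loss (hypothetical):
    - photometric MSE on all rays,
    - cross-entropy only on rays with a user label.

    sparse_labels: int array with -1 wherever no user label exists.
    """
    photometric = np.mean((pred_rgb - true_rgb) ** 2)

    labeled = sparse_labels >= 0
    if labeled.any():
        probs = softmax(pred_logits[labeled])
        # Pick the predicted probability of the labeled class per ray.
        picked = probs[np.arange(labeled.sum()), sparse_labels[labeled]]
        cross_entropy = -np.mean(np.log(picked + 1e-9))
    else:
        cross_entropy = 0.0
    return photometric + cross_entropy
```

Because the semantic term only touches labeled rays, even a handful of scribbled pixels can supervise the 3D segmentation field, while the photometric term constrains geometry everywhere.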

To improve segmentation performance, we explored using pre-learned feature maps as an additional supervision signal. We wrote about the work in our research paper.
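One way such feature supervision can be wired in, sketched under the assumption that the NeRF carries an extra per-ray feature head that is regressed toward feature vectors sampled from a pre-trained 2D extractor; the function names, shapes, and the `feature_weight` value are hypothetical, not taken from the paper.

```python
import numpy as np

def feature_loss(rendered_feats, teacher_feats):
    """MSE between per-ray features rendered by an extra NeRF output head
    and feature vectors sampled from a pre-trained 2D feature extractor.
    Hypothetical shapes: (num_rays, feat_dim)."""
    return np.mean((rendered_feats - teacher_feats) ** 2)

def total_loss(photometric, semantic, rendered_feats, teacher_feats,
               feature_weight=0.5):
    """Add the feature term to the photometric + semantic objective.
    feature_weight is an illustrative hyperparameter."""
    return photometric + semantic + feature_weight * feature_loss(
        rendered_feats, teacher_feats)
```

The appeal of this extra term is that the pre-learned features are dense, so they provide a supervision signal on every ray, unlike the sparse user labels.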