Haoran Chen extended vision-language feature fields to also segment object instances, creating an open-vocabulary panoptic segmentation system. This means we can not only determine which parts of the scene are relevant to a query but also count how many relevant objects there are. The work was published in IEEE Robotics and Automation Letters.

See the project website for the paper, video, and more details.