Kenneth Blomqvist

I'm a robotics researcher at the Autonomous Systems Lab and an engineer.

Lately, my main interest has been making learning-based 3D perception algorithms applicable to real-world robotics scenarios. The main challenge is that these methods require large amounts of data for supervision. I've investigated different approaches, including automated data acquisition and annotation, generating synthetic ground truth data, and using large-scale image and language models as a source of supervision.

Previously, I was a software engineer at Wolt, Webflow and Flowdock. I founded the Junction hackathon at the Aalto Entrepreneurship Society.

Writing

Neural Stereo Depth Matching Onboard the Luxonis OAK-D

14 Nov 2022

Collecting RGB-D Datasets on LiDAR Enabled iOS devices

10 Mar 2021

We Need Organic Software

29 Nov 2020

Projects

Panoptic Vision-Language Feature Fields

Haoran Chen extended vision-language feature fields to also segment instances of objects, creating an open-vocabulary panoptic segmentation system. This means we can not only figure out which parts of the scene are relevant for a query, but also tell how many of those relevant objects there are. The work was published in IEEE Robotics and Automation Letters.

See the project website for the paper, video and more details.

ISAR: A Benchmark for Single- and Few-Shot Object Instance Segmentation and Re-Identification

Nicolas Gorlo presented his work on few-shot instance segmentation and re-identification at WACV 2024. I had the great pleasure of advising Nicolas on this project.

To build robots that can quickly be taught about new objects and concepts, we first need to solve the problem of segmenting and re-identifying objects in images from few examples. Here, we present a benchmark and a baseline algorithm for segmenting and re-identifying objects from sparse examples in the form of a few clicks from a user. The algorithm then has to segment the objects in future frames and re-identify the same object in different contexts.

See the project website for all the details.

Neural Implicit Vision-Language Feature Fields

Neural Implicit Vision-Language Feature Fields are an approach to dense scene understanding: a neural implicit scene representation with additional vision-language feature outputs, which can be correlated with text prompts, enabling zero-shot open-vocabulary scene segmentation. You can basically type in what you are looking for, and it will highlight the relevant parts of the scene.

The scene representation can be built up incrementally, runs in real time on real hardware, handles millions of semantic point queries per second and works for arbitrary text queries. See here for the research paper.
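To give a rough idea of how the querying works (this is a hedged sketch, not the actual implementation; the array names and shapes are assumptions), the features rendered from the field can be compared to embedded text prompts with cosine similarity, and each point takes the label of the best-matching prompt:

```python
import numpy as np

def segment_by_text(point_features, text_embeddings, labels):
    """Assign each 3D point the label of its most similar text prompt.

    point_features:  (N, D) vision-language features rendered from the field.
    text_embeddings: (K, D) embeddings of the text prompts.
    labels:          list of K prompt strings.
    """
    # Normalize so that the dot product equals cosine similarity.
    f = point_features / np.linalg.norm(point_features, axis=-1, keepdims=True)
    t = text_embeddings / np.linalg.norm(text_embeddings, axis=-1, keepdims=True)
    similarity = f @ t.T                  # (N, K) score for each prompt
    best = similarity.argmax(axis=-1)     # index of the best prompt per point
    return [labels[i] for i in best], similarity
```

Because the text embeddings come from a vision-language model, the set of prompts can be changed at query time without retraining the scene representation.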

Autolabel

Autolabel is a tool to produce dense segmentation maps for RGB-D video through volumetric segmentation. Behind the scenes, it uses a semantic NeRF model to infer a 3D segmentation of the scene from very sparse user labels. The learned representation is then used to render dense segmentation maps for given viewpoints.

To improve segmentation performance, we explored using pre-learned feature maps as an additional supervision signal. We wrote about the work in our research paper.
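As a rough sketch of how such a model can be supervised (the actual training code differs; the tensor names and weights below are assumptions), the loss combines a photometric term over all rays, a cross-entropy term over the few user-labeled rays, and optionally a regression term toward the pre-learned feature maps:

```python
import torch
import torch.nn.functional as F

def autolabel_style_loss(rendered_rgb, target_rgb,
                         rendered_logits, sparse_labels,
                         rendered_features=None, target_features=None,
                         w_sem=1.0, w_feat=0.5):
    """Combined loss for a semantic NeRF trained from sparse user labels.

    sparse_labels: (N,) class index per ray, -1 where the pixel is unlabeled.
    """
    # Standard photometric reconstruction loss on all rays.
    loss = F.mse_loss(rendered_rgb, target_rgb)

    # Semantic cross-entropy only on the few rays the user labeled.
    labeled = sparse_labels >= 0
    if labeled.any():
        loss = loss + w_sem * F.cross_entropy(
            rendered_logits[labeled], sparse_labels[labeled])

    # Optional: regress rendered features toward pre-learned feature maps.
    if rendered_features is not None and target_features is not None:
        loss = loss + w_feat * F.mse_loss(rendered_features, target_features)

    return loss
```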

Stray Studio and Command Line Toolkit

The Stray CLI and Studio form a software toolkit for solving computer vision problems. It allows you to reconstruct 3D scenes from RGB-D video, annotate the scenes with high-level semantic information and fit computer vision algorithms to the annotated scenes.

Object Keypoints

Object Keypoints is a simple method for building a model that can track pre-defined points on objects in 3D. It includes a way to scan the objects using a robot arm, a tool to annotate the points, a script to fit the model and a pipeline to detect the 3D points at runtime using the trained model. It can handle multiple objects in a frame simultaneously and does not require object instance segmentation.
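The runtime detection step can be pictured roughly like this (a sketch under assumptions, not the project's actual code): the model predicts one heatmap per keypoint, peaks are extracted, and the depth image is used to back-project each peak into a 3D point in the camera frame:

```python
import numpy as np

def keypoints_3d_from_heatmaps(heatmaps, depth, K, threshold=0.5):
    """Back-project heatmap peaks into 3D camera coordinates.

    heatmaps: (num_keypoints, H, W) predicted keypoint heatmaps.
    depth:    (H, W) depth image in meters, aligned with the heatmaps.
    K:        (3, 3) camera intrinsic matrix.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    points = []
    for heatmap in heatmaps:
        if heatmap.max() < threshold:
            points.append(None)  # keypoint not confidently detected
            continue
        v, u = np.unravel_index(heatmap.argmax(), heatmap.shape)
        z = depth[v, u]
        # Pinhole back-projection of pixel (u, v) at depth z.
        points.append(np.array([(u - cx) * z / fx, (v - cy) * z / fy, z]))
    return points
```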

Stray Scanner App

Stray Scanner is an app for collecting RGB-D datasets with an iPhone or iPad with a LiDAR sensor.

Mobile Manipulation Demo

We created a simple demo to investigate the limitations of state-of-the-art methods in mobile manipulation. The robot is tasked with navigating through an office to find an object, picking it up and returning with it. It uses SLAM, motion planning, grasp planning and some perception algorithms to get the job done.

We wrote about the work here. A video of the demo can be found here.

Particle Simulation

We implemented a particle-based material simulation to simulate deformable and grainy materials as well as fluids. I later extended it to make use of CUDA.

Check out the demo here. Code is available here.

Autonomous Race Car

We built an autonomous race car from an old 1/10th scale radio-controlled touring car. It uses an Nvidia Jetson TX1 which communicates with the RC electronics through an Arduino board. It has an RGB camera at the front and an IMU sensor. Velocity is measured through a sensor inside the brushless motor.

The car can be driven either with the regular RC remote or autonomously by reading commands from the Jetson. The commands from the remote can be recorded and used for learning. The picture is from an early version which used a Raspberry Pi instead of the Jetson.

We used the platform to do some research into driving the car using reinforcement learning. Most of the code is available here.

Deep Convolutional Gaussian Processes

In this project, we basically tried to build a modern convolutional neural network using Gaussian processes. We ran the model on some image classification benchmarks. At the time, we were able to get better results than any other GP-based method, but the results still lag behind advanced neural-network-based techniques. Scaling these large GP models remains a challenge.

The results can be found in our paper and code is available here.
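To illustrate the underlying idea (a toy sketch of a patch-based convolutional GP kernel, not the deep model from the paper; the function names are made up for the example), a convolutional kernel between two images can be built by averaging a base kernel over all pairs of image patches:

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """RBF base kernel between two flattened patches."""
    return np.exp(-0.5 * np.sum((a - b) ** 2) / lengthscale ** 2)

def extract_patches(image, size=3):
    """All size x size patches of a 2D image, flattened to vectors."""
    h, w = image.shape
    return np.array([image[i:i + size, j:j + size].ravel()
                     for i in range(h - size + 1)
                     for j in range(w - size + 1)])

def conv_gp_kernel(x, y, size=3, lengthscale=1.0):
    """Convolutional GP kernel: average base-kernel response over patch pairs."""
    px, py = extract_patches(x, size), extract_patches(y, size)
    total = sum(rbf(p, q, lengthscale) for p in px for q in py)
    return total / (len(px) * len(py))
```

Averaging over patches is what gives the kernel some translation invariance, which is the property the convolutional structure is after.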

Get in touch

You can find me on GitHub, Twitter, LinkedIn and Goodreads. Feel free to get in touch via email.