Computer Vision - Structure From Motion

This project used state-of-the-art computer vision techniques to build a 3D model of an environment from a sequence of 2D images taken from a car driving around a city. The motivation was to explore how a self-driving car could perceive the 3D world from its 2D images, which could inform decisions such as where to drive or what objects are in the scene.

What I Learnt:

How to use the OpenCV Python library to apply image processing and computer vision techniques to images and videos
How to implement a state-of-the-art Structure-From-Motion algorithm

Algorithm:

Assuming two images taken in sequence, the basic algorithm works as follows:

Features are extracted from each image using a feature detector such as SURF
Features are matched between the images to form feature correspondences
The essential matrix is estimated from these matched features (Note: the essential matrix encodes the relative rotation and translation between the two camera views, see more here)
The RT matrix, which describes the rotation and translation between the two positions from which the camera took the images, is determined by decomposing the essential matrix
At this point a sparse cloud of 3D points can be created by triangulating the matched features using the rotation and translation found previously. The following two steps are optional
The RT matrix can be used to create an accurate disparity map, which gives a depth estimate for every point in the image
A dense point cloud can be created using this disparity map
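
The essential-matrix decomposition and sparse triangulation steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the project's code: it assumes the matched points are already in normalized camera coordinates (the camera intrinsics have been removed), it uses the convention that a point X in the first camera's frame maps to R X + t in the second, and the helper names `skew`, `triangulate`, `decompose_essential` and `recover_pose` are hypothetical. In OpenCV the corresponding calls are `cv2.findEssentialMat`, `cv2.recoverPose` and `cv2.triangulatePoints`.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]], dtype=float)

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]               # null vector of A (homogeneous 3D point)
    return X[:3] / X[3]

def decompose_essential(E):
    """Return the four (R, t) candidates encoded by an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so flip factors to get proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]              # unit-length translation direction
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def recover_pose(E, x1, x2):
    """Pick the candidate that puts the most points in front of both cameras."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    best, best_count = None, -1
    for R, t in decompose_essential(E):
        P2 = np.hstack([R, t.reshape(3, 1)])
        pts = np.array([triangulate(P1, P2, a, b) for a, b in zip(x1, x2)])
        depth1 = pts[:, 2]                  # depth in the first camera
        depth2 = (pts @ R.T + t)[:, 2]      # depth in the second camera
        count = np.sum((depth1 > 0) & (depth2 > 0))
        if count > best_count:
            best, best_count = (R, t, pts), count
    return best

# Synthetic check: project known 3D points into two views, then recover the pose.
rng = np.random.default_rng(0)
X_true = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))
angle = 0.1
R_true = np.array([[np.cos(angle), 0, np.sin(angle)],
                   [0, 1, 0],
                   [-np.sin(angle), 0, np.cos(angle)]])
t_true = np.array([1.0, 0.0, 0.0])
X_cam2 = X_true @ R_true.T + t_true
x1 = X_true[:, :2] / X_true[:, 2:]
x2 = X_cam2[:, :2] / X_cam2[:, 2:]
R_est, t_est, sparse_cloud = recover_pose(skew(t_true) @ R_true, x1, x2)
```

The translation is only recovered as a unit direction, so the sparse cloud from two views is correct up to an unknown overall scale.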

Results:

The following video shows the algorithm running in real time on the first few images in the sequence. Intermediate steps are shown, including the feature matches, sparse point cloud and disparity map:

The following images show the intermediate steps of the algorithm:

First Image in Sequence

Second Image in Sequence

Feature Matches

SURF is used to extract features from each image; a brute-force matcher with a ratio test then keeps the best-matching features between the images.
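
As a rough illustration of brute-force matching with a ratio test, here is a NumPy sketch. It is not the project's code: the function name `ratio_match` and the 0.75 threshold are assumptions, and descriptors are taken to be plain float arrays with at least two rows in the second set. In OpenCV the same idea is usually written with `cv2.BFMatcher` and `knnMatch(..., k=2)`.

```python
import numpy as np

def ratio_match(desc1, desc2, ratio=0.75):
    """Brute-force matching with a nearest/second-nearest ratio test.

    Returns a list of (i, j) pairs meaning desc1[i] matches desc2[j].
    The ratio threshold is an assumed value, not taken from the project.
    """
    # Pairwise Euclidean distances, shape (len(desc1), len(desc2)).
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]
        # Keep the match only if it is clearly better than the runner-up.
        if row[nearest] < ratio * row[second]:
            matches.append((i, int(nearest)))
    return matches

# Toy descriptors: the first query is close to one candidate, the second is
# equally far from two candidates and is therefore rejected as ambiguous.
candidates = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
queries = np.array([[9.9, 0.1], [0.0, 5.0]])
matches = ratio_match(queries, candidates)
```

Rejecting ambiguous matches in this way trades a smaller set of correspondences for fewer outliers going into the essential matrix estimate.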

Disparity Map

The method to generate a smooth disparity map is taken from here.
Lighter areas indicate a larger disparity, that is, a larger shift of the same point between the two images. Disparity is inversely related to depth, so lighter areas are closer to the camera.
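
The relationship between disparity and depth can be made concrete with the standard rectified-stereo formula, depth = focal length × baseline / disparity. The sketch below is an assumption-laden illustration rather than the project's code: `disparity_to_depth`, `focal_px` and `baseline` are hypothetical names, and the image pair is assumed to be rectified. With a single moving camera the baseline is only known up to scale unless extra information is available, so the resulting depths are relative.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline):
    """Convert a disparity map (pixels) to depth using Z = f * B / d.

    Pixels with zero or negative disparity have no valid match and are
    assigned infinite depth.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline / disparity[valid]
    return depth

# Illustrative values only: an 800-pixel focal length and a 0.5-unit baseline.
depth = disparity_to_depth([[64, 32], [0, 16]], focal_px=800, baseline=0.5)
```

Halving the disparity doubles the depth, which is why distant parts of the scene appear darker in the disparity map.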

Dense Point Cloud (Left Side)

Using the algorithm described above, a sparse point cloud is generated. This is the view from the left side.

Dense Point Cloud (Right Side)

Using the algorithm described above, a sparse point cloud is generated. This is the view from the right side.

Dense Point Cloud (Top-Down)

Using the algorithm described above, a sparse point cloud is generated. This is the view from above, where the 3D structure recovered from the images is clearly visible.
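
For the dense variant of the point cloud, each pixel with a known depth can be back-projected through the pinhole camera model. The following is a minimal sketch under assumptions: `backproject` is a hypothetical helper, and the intrinsics `fx`, `fy`, `cx`, `cy` are placeholder values rather than the project's calibration. OpenCV offers `cv2.reprojectImageTo3D` for the same purpose when a disparity-to-depth matrix is available.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Turn an HxW depth map into an Nx3 array of camera-frame points.

    Pixels with non-finite or non-positive depth are skipped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.column_stack([x, y, z])

# Tiny 2x2 depth map with two valid pixels (placeholder intrinsics).
depth_map = np.array([[2.0, np.inf],
                      [0.0, 4.0]])
cloud = backproject(depth_map, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

The resulting array can be viewed from any direction, which is how left, right and top-down views of the same cloud are obtained.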
