
Publication

Incorporating prior knowledge in 3D scene understanding

Book - Dissertation

Over the past decades, substantial progress has been made in the field of computer vision. This can be explained at least partially by the advances in machine learning and artificial intelligence, driven by reduced hardware costs and increased computing power. Despite this progress, a gap between human vision and computer vision remains. One remarkable aspect of human vision that contributes to this gap is the human brain's ability to employ prior knowledge about the world we live in. Therefore, in this thesis, we try to bridge the gap between human vision and computer vision by incorporating prior knowledge. We classify knowledge into the following four categories: permanent theoretical knowledge, circumstantial knowledge, data knowledge, and subjective experiential knowledge. We investigate which prior knowledge is available and how it can be incorporated into computer vision pipelines. To this end, we examine three case studies in the field of 3D scene understanding.

In our first case study, we exploit target domain knowledge and circumstantial knowledge for the task of pallet detection and localization in a multi-camera Time-of-Flight setup. Already in the early years of computer vision, it became clear that prior knowledge about the target, such as shape and other geometrical properties, can help in image understanding tasks such as object detection. In this case study, we show that the extra dimension provided by the depth data allows target domain knowledge to be exploited more efficiently. For the extrinsic calibration of our multi-camera Time-of-Flight setup, we exploit knowledge of the 3D model of our calibration target. This allows us to extrinsically calibrate both cameras using only a single image of the calibration target taken by each camera. For the detection of pallets from Time-of-Flight images, we propose a template matching-based approach. The knowledge of the shape and geometry of the pallets, together with the fact that the forklift always moves parallel to the racks, allows us to define a single, robust template. Our experiments show that the pallets can be detected reliably from the fused depth image as long as the forks of the automated guided vehicle do not occlude the pallet.

In our second case study, we propose to incorporate the available circumstantial knowledge and data knowledge to create a textured 3D model of outdoor building scenes from drone images in near real-time. 3D reconstruction from 2D images is usually achieved using a structure-from-motion pipeline. However, building a 3D model from 2D images is a very time-consuming task. Therefore, we propose to exploit the available circumstantial knowledge and data knowledge to build an a priori 3D model. As we have circumstantial knowledge about the location of the drone from GPS, additional data can be retrieved based on this location. With the advances in 3D acquisition techniques, more and more 3D data is becoming available, including LIDAR data. We propose a novel pipeline to construct an a priori 3D model from this additional LIDAR data. To texture the model, we have to register the drone images with the a priori 3D model. This is a very challenging task due to the dimensionality gap between the 3D model and the 2D images. We bridge this dimensionality gap by reprojecting the LIDAR data onto a virtual image plane and iteratively refining the pose of the virtual image plane until the drone image matches the reprojected image.
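To make this registration idea concrete, the sketch below illustrates it in Python, assuming a LIDAR point cloud with per-point intensities, a pinhole camera matrix K, and a GPS-based initial pose. The function names, the simple sum-of-squared-differences cost, and the choice of optimizer are illustrative assumptions, not the implementation used in the thesis.

import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def reproject_lidar(points, pose, K, image_shape):
    # Render a sparse intensity image by projecting LIDAR points (x, y, z, intensity)
    # into a virtual camera with pose = [rotation vector, translation].
    rvec, tvec = pose[:3], pose[3:]
    R = Rotation.from_rotvec(rvec).as_matrix()
    cam = (R @ points[:, :3].T).T + tvec                  # world -> camera frame
    in_front = cam[:, 2] > 0.1                            # keep points in front of the camera
    cam = cam[in_front]
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                           # perspective division
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (0 <= u) & (u < image_shape[1]) & (0 <= v) & (v < image_shape[0])
    img = np.zeros(image_shape, dtype=np.float32)
    img[v[valid], u[valid]] = points[in_front][valid, 3]  # splat LIDAR intensity
    return img

def registration_cost(pose, points, drone_image, K):
    # Photometric discrepancy between the drone image and the reprojected LIDAR data,
    # evaluated only where LIDAR points actually project.
    rendered = reproject_lidar(points, pose, K, drone_image.shape)
    mask = rendered > 0
    if not mask.any():
        return np.inf
    return np.mean((rendered[mask] - drone_image[mask]) ** 2)

def refine_pose(initial_pose, points, drone_image, K):
    # Iteratively refine the virtual camera pose starting from the GPS-based guess.
    result = minimize(registration_cost, initial_pose,
                      args=(points, drone_image, K), method="Powell")
    return result.x

A derivative-free optimizer is used in this sketch because the sparse reprojection makes the cost non-smooth; in practice, a coarse-to-fine scheme and a more robust similarity measure would likely be needed.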
In our experiments, we examined the robustness of the registration algorithm by applying random offsets to the ground-truth pose and checking the convergence of the algorithm. This showed promising results for perturbations of up to 5 meters in translation and up to 3 degrees in rotation.

Finally, in our third case study, the goal is to construct a geometric 3D model of the room layout from a scanned point cloud of the interior. This task can be split into two subtasks. First, the permanent structures, such as the walls, floor, and ceiling of the room, must be extracted from the scanned point cloud. We assume that these permanent structures are mostly planar. However, the scanned point cloud also contains other objects such as furniture and cabinets, so not all extracted planar surfaces belong to permanent structures. To resolve this ambiguity, we propose to exploit data knowledge by first segmenting the point cloud using deep learning. For this, we started from an existing deep learning framework. However, this framework assumes that prior knowledge about the scanning trajectory is available, which is not always the case for our point clouds. Therefore, we adapted the network to use a different input data representation. Furthermore, the original dataset on which the network was trained did not represent our data very well, so we had to construct our own dataset. To construct this dataset, we had to rely on subjective experiential knowledge from 3D modeling artists. Second, the room layout is extracted based on the permanent structures. To do this, we developed a fully automatic approach consisting of three steps. First, a 3D cell complex is constructed by successively slicing the bounding box around the scanned point cloud with each of the planes supporting the extracted permanent structures. Then, the cells representing the interior of the room are selected by solving the corresponding binary optimization problem. Finally, the room layout is defined by extracting all boundary faces from the selected cells. Our experiments show that the majority of the room layouts can be extracted properly.
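As an illustration of the planar-structure extraction on which the layout step builds, below is a minimal sketch using Open3D's RANSAC plane segmentation, assuming the point cloud has already been cleaned of clutter (in the thesis this is handled by the learned segmentation). The thresholds, iteration counts, and file name are assumptions for illustration only.

import open3d as o3d

def extract_planar_structures(pcd, max_planes=10, distance_threshold=0.02, min_inliers=5000):
    # Peel off dominant planes (candidate walls, floor, ceiling) one by one
    # until too few supporting points remain.
    planes, remaining = [], pcd
    for _ in range(max_planes):
        if len(remaining.points) < min_inliers:
            break
        model, inliers = remaining.segment_plane(distance_threshold=distance_threshold,
                                                 ransac_n=3, num_iterations=1000)
        if len(inliers) < min_inliers:
            break
        planes.append((model, remaining.select_by_index(inliers)))   # plane equation + supporting points
        remaining = remaining.select_by_index(inliers, invert=True)
    return planes, remaining

# Example usage (hypothetical file name):
# pcd = o3d.io.read_point_cloud("scanned_room.ply")
# planes, leftover = extract_planar_structures(pcd)
# for (a, b, c, d), support in planes:
#     print(f"{a:.2f}x + {b:.2f}y + {c:.2f}z + {d:.2f} = 0 with {len(support.points)} points")

The supporting planes returned by such a step are the inputs to the cell-complex construction described above: each plane slices the bounding box, and the resulting cells are then labelled as interior or exterior.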
Publication year: 2021
Accessibility: Open