Project

Performance Improvement Strategies for Semantic Segmentation

This thesis presents four methods for improving semantic segmentation: context-aware padding, error correction, model selection, and the use of temporal information.
Standard convolutional neural networks pad feature maps so that their spatial shape stays consistent across convolutional layers. Zero padding is simple and efficient, but the inserted zeros create discontinuous values at the borders of the feature maps. This study therefore proposes context-aware padding, in which a model is trained on data to extrapolate the input image beyond its borders. To accelerate the computation, only a local region is used as the input to this model.
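The border discontinuity caused by zero padding can be seen on a toy feature map. The sketch below is illustrative only: the helper names `pad2d` and `border_jump` are hypothetical, and the "edge" mode (copying the nearest border value) is a hand-written stand-in for a context-dependent fill, not the learned extrapolation proposed in the thesis.

```python
def pad2d(fmap, pad, mode="zero"):
    """Pad a 2D feature map (list of rows) by `pad` cells on every side.

    mode="zero" fills the new cells with 0.0.
    mode="edge" copies the nearest border value (an assumed stand-in for
    a context-dependent fill, NOT the learned method from the thesis).
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for r in range(-pad, h + pad):
        row = []
        for c in range(-pad, w + pad):
            if 0 <= r < h and 0 <= c < w:
                row.append(fmap[r][c])          # original value
            elif mode == "zero":
                row.append(0.0)                 # zero padding
            else:
                rr = min(max(r, 0), h - 1)      # clamp to nearest row
                cc = min(max(c, 0), w - 1)      # clamp to nearest column
                row.append(fmap[rr][cc])
        out.append(row)
    return out


def border_jump(padded, pad):
    """Largest absolute step between an original border cell and the
    padded cell directly outside it (requires pad >= 1)."""
    h, w = len(padded), len(padded[0])
    top, bottom, left, right = pad, h - pad - 1, pad, w - pad - 1
    jumps = []
    for c in range(left, right + 1):
        jumps.append(abs(padded[top][c] - padded[top - 1][c]))
        jumps.append(abs(padded[bottom][c] - padded[bottom + 1][c]))
    for r in range(top, bottom + 1):
        jumps.append(abs(padded[r][left] - padded[r][left - 1]))
        jumps.append(abs(padded[r][right] - padded[r][right + 1]))
    return max(jumps)


fmap = [[5.0, 5.0, 6.0],
        [5.0, 6.0, 7.0],
        [6.0, 7.0, 8.0]]

print(border_jump(pad2d(fmap, 1, "zero"), 1))  # 8.0
print(border_jump(pad2d(fmap, 1, "edge"), 1))  # 0.0
```

With zero padding the step at the border is as large as the border activations themselves, whereas a fill that depends on the neighbouring values removes it. A learned extrapolation would replace the clamping rule in the "edge" branch.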
To increase accuracy, the results of semantic segmentation models are compared. These models make two types of errors: errors along object boundaries and errors in the interior of large objects. A proposed post-processing method uses a two-branch error correction network to reduce both. The network learns directly from the joint space of images and labels and predicts corrected segmentation maps.
Studies of semantic segmentation focus on either fast inference or accuracy. Some applications need a combination of the two to make full use of the available computational budget, so this study proposes a model selection method that combines the advantages of both kinds of model. A fast model is first applied to the full image, and regions where the preliminary results have low confidence are then forwarded to an accurate model. The confidence threshold determines the balance between accuracy and inference time, so the method can be tuned to deliver the best performance for a given computational budget.
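The selection step described above can be sketched as a simple confidence-gated cascade. This is a minimal illustration under stated assumptions, not the thesis implementation: the function `cascade_segment`, the per-region `(label, confidence)` summary of the fast model's output, and the toy `accurate_model` lookup are all hypothetical.

```python
def cascade_segment(regions, fast_results, accurate_model, threshold):
    """Confidence-gated model selection (illustrative sketch).

    regions        -- list of region identifiers or crops
    fast_results   -- one (label, confidence) pair per region, assumed to
                      come from a single fast-model pass over the full image
    accurate_model -- callable applied only to low-confidence regions
    threshold      -- regions with confidence below this are forwarded

    Returns (labels, number_of_regions_forwarded).
    """
    labels = []
    forwarded = 0
    for region, (label, confidence) in zip(regions, fast_results):
        if confidence < threshold:
            label = accurate_model(region)   # slower, more accurate pass
            forwarded += 1
        labels.append(label)
    return labels, forwarded


# Toy data: four regions with the fast model's preliminary predictions.
regions = ["r0", "r1", "r2", "r3"]
fast_results = [("road", 0.95), ("car", 0.55), ("sky", 0.99), ("person", 0.40)]

# Hypothetical accurate model, represented here by a lookup table.
ACCURATE_LABELS = {"r0": "road", "r1": "truck", "r2": "sky", "r3": "rider"}


def accurate_model(region):
    return ACCURATE_LABELS[region]


labels, forwarded = cascade_segment(regions, fast_results, accurate_model, 0.6)
print(labels, forwarded)
```

Raising `threshold` forwards more regions to the accurate model and so trades inference time for accuracy; a threshold of 0 keeps every fast prediction, while a threshold above 1 forwards every region.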
In many applications the input is a video sequence, but the additional frames are rarely annotated. This study leverages the temporal information in these frames to increase the accuracy of semantic segmentation. Motion boundaries are used to incorporate temporal information when training the segmentation network, so additional frames can be used without ground truth labels.

Date: 15 Oct 2014 → 7 Dec 2021
Keywords: computer vision
Disciplines: Nanotechnology, Design theories and methods
Project type: PhD project