Project

Deep Visual Recognition for the Real World

The recognition of people and objects in a scene is a fundamental problem in the field of computer vision. Most of the proposed solutions, however, focus on general applicability. These one-size-fits-all approaches ignore the subtleties inherent to more specific use cases. Indeed, many applications present challenges and opportunities that are neither tackled nor exploited when general recognition solutions are applied as-is. This dissertation focuses on three industrially relevant use cases. For each, we first explore the feasibility of general approaches and then investigate how the challenges and opportunities intrinsic to the use case can be addressed and exploited to improve recognition quality.

The first use case is face recognition for automatic individualized photo album creation. In this application, an important challenge is that pictures often contain people who are unknown to the computer vision system. In many real-world cases, these unknown faces occur multiple times throughout a collection of pictures. We therefore propose to use clustering to find such recurring unknown faces. From the resulting clusters, we create pseudo-references that prevent unknown faces from being wrongly matched to people who do have a reference image. We show that, when only three identities are known, this improves the mAP from 44.6% to 59.1% on a real-world dataset and from 94.7% to 99.7% on a publicly available dataset.
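The pseudo-reference idea can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the greedy threshold clustering, the cosine-similarity threshold of 0.8, the minimum cluster size, and all function names (`greedy_cluster`, `build_gallery`, `identify`) are assumptions made for the example, and face embeddings are assumed to be plain vectors produced by some face recognition network.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length so that a dot product is a cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def greedy_cluster(embeddings, threshold=0.8):
    """Put each unit-length embedding into the most similar existing cluster,
    or start a new cluster when no centroid reaches the similarity threshold."""
    centroids, members = [], []
    for i, e in enumerate(embeddings):
        if centroids:
            sims = np.array([c @ e for c in centroids])
            j = int(np.argmax(sims))
            if sims[j] >= threshold:
                members[j].append(i)
                centroids[j] = l2_normalize(embeddings[members[j]].mean(axis=0))
                continue
        centroids.append(e)
        members.append([i])
    return centroids, members


def build_gallery(references, faces, threshold=0.8, min_size=2):
    """Combine known references with pseudo-references for recurring unknown faces.

    references: dict mapping a name to one embedding (hypothetical input format).
    faces: (N, D) array of embeddings found in the photo collection.
    Returns (labels, vectors); a label of None marks a pseudo-reference.
    """
    faces = l2_normalize(np.asarray(faces, dtype=float))
    names = list(references)
    ref_matrix = l2_normalize(np.array([references[n] for n in names], dtype=float))
    centroids, members = greedy_cluster(faces, threshold)
    labels, vectors = list(names), list(ref_matrix)
    for c, m in zip(centroids, members):
        # Skip one-off faces and clusters that already resemble a known identity.
        if len(m) < min_size or np.max(ref_matrix @ c) >= threshold:
            continue
        labels.append(None)
        vectors.append(c)
    return labels, np.array(vectors)


def identify(face, labels, vectors):
    """Return the name of the closest reference, or None if a pseudo-reference wins."""
    sims = vectors @ l2_normalize(np.asarray(face, dtype=float))
    return labels[int(np.argmax(sims))]
```

With only the known references in the gallery, every face is forced onto its nearest known identity. Adding pseudo-references gives recurring strangers a closer match of their own, which is how this sketch avoids such wrong matches.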

Secondly, we address grocery product detection and recognition for automatic planogram compliance verification. We train a product detector on the SKU-110K dataset and demonstrate how a product recognition network can be trained efficiently on the very large, yet noisy and imbalanced, AliProducts dataset. We then elaborate on training a single CNN to perform both tasks jointly. We demonstrate that this is feasible when a dataset with both detection and recognition annotations is available, and we examine the potential of using two separate datasets, i.e., one for detection and one for recognition. After a thorough analysis, we formulate a COCO AP vs. inference time characteristic that makes it possible to choose the best network design and training procedure.
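One way to read an accuracy vs. inference time characteristic is as a Pareto front: a design is worth considering only if no other design is both at least as accurate and at least as fast. The sketch below illustrates that reading under assumptions of our own. The tuple layout `(name, ap, ms)` and the function names are hypothetical, and the abstract does not state how the characteristic is actually constructed.

```python
def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates.

    candidates: list of (name, ap, ms) tuples, where ap is the COCO AP
    (higher is better) and ms is the inference time (lower is better).
    The result is sorted by inference time.
    """
    front = []
    for name, ap, ms in candidates:
        dominated = any(
            o_ap >= ap and o_ms <= ms and (o_ap > ap or o_ms < ms)
            for _, o_ap, o_ms in candidates
        )
        if not dominated:
            front.append((name, ap, ms))
    return sorted(front, key=lambda c: c[2])


def best_within_budget(candidates, max_ms):
    """Return the most accurate Pareto-optimal candidate within the time budget,
    or None when no candidate is fast enough."""
    feasible = [c for c in pareto_front(candidates) if c[2] <= max_ms]
    return max(feasible, key=lambda c: c[1]) if feasible else None
```

Given measured `(name, ap, ms)` values for each network design and training procedure, `best_within_budget` picks the design to deploy for a given latency requirement.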

Finally, we investigate diamond recognition for secure diamond trading. We show that generic recognition approaches are already very well suited to diamond recognition, achieving an mAP of 99.970%. To further improve the recognition result, we propose to make the recognition model rotation-equivariant by means of a polar transformation. This yields models that achieve an mAP of 99.989%. We implement our own polar warping algorithm that can run on a GPU, which allows for speed-ups of more than a factor of 750 compared with the commonly used OpenCV implementation.
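To illustrate why a polar transformation helps with rotation, the sketch below resamples an image onto a (radius, angle) grid with NumPy. It is a simplified CPU version using nearest-neighbour lookup, not the GPU algorithm described above. The function name `polar_warp` and its parameters are assumptions made for this example.

```python
import numpy as np


def polar_warp(image, n_radii=64, n_angles=128):
    """Resample an image onto a (radius, angle) grid around its centre.

    Row i of the output holds the samples at radius i and column k the
    samples at angle 2*pi*k/n_angles. Sampling is nearest-neighbour.
    """
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_radius = min(cy, cx)  # stay inside the image for every angle
    radii = np.linspace(0.0, max_radius, n_radii)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    r, theta = np.meshgrid(radii, angles, indexing="ij")
    ys = np.clip(np.rint(cy + r * np.sin(theta)).astype(int), 0, h - 1)
    xs = np.clip(np.rint(cx + r * np.cos(theta)).astype(int), 0, w - 1)
    return image[ys, xs]
```

In polar coordinates, rotating the input about its centre becomes a circular shift along the angle axis. A model that treats that axis as periodic, for instance through circular padding (an assumption here, not a detail given in the abstract), can therefore respond to rotations in a predictable way.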

After investigating these applications, we present a Python library that we have developed to help researchers kickstart future recognition projects.

Date: 25 Sep 2017 → 7 Sep 2023
Keywords: Computer vision, Person re-identification, Facial recognition
Disciplines: Applied mathematics in specific fields
Project type: PhD project