
Project

Deep Learning image processing for crop management

Deep learning (DL) is a subfield of machine learning in which algorithms are modeled to imitate human reasoning. Among DL's most important application areas are computer vision and natural language processing, which impact our everyday life when we unlock our phone through facial recognition, interact with a company's chatbot, or get support from a virtual assistant like Alexa. Two turning points in the technology era explain DL's current success: never-before-seen computational capability and never-before-seen data availability. Unfortunately, these are simultaneously the most severe limitations in applied DL, since training requires large amounts of data and extensive computational capacity to optimize the massive parameter architectures at the core of DL algorithms. Consequently, when large, labeled datasets and vast computing infrastructure are out of reach, training DL models from scratch becomes infeasible for myriad applications. To overcome this, transfer learning has become the default strategy to share the knowledge acquired in a resource-rich source domain with a resource-constrained target domain, without requiring the source and target data to be independent and identically distributed. Still, in fields of application that use proximal and remote sensing, the similarity between the source and target data can influence the extent of knowledge transfer and, in consequence, the required size and quality of the training data in the target domain. This can ultimately make a significant difference, as the curated annotation of training data is labor-intensive and time-consuming.
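As a minimal illustration of the transfer learning pattern described above (a sketch of common practice, not code from the dissertation), one can reuse a backbone pre-trained on a large source dataset such as ImageNet and train only a new task head on a small target dataset; the model choice, class count, and learning rate below are assumptions:

```python
import torch
import torchvision

# Sketch only: reuse source-domain knowledge (ImageNet weights) for a
# small target task, as in typical transfer learning workflows.
model = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V2
)

# Freeze the pre-trained backbone so its source-domain knowledge is kept.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for a hypothetical 2-class target task
# (e.g., object of interest vs. background); only this layer is trained.
model.fc = torch.nn.Linear(model.fc.in_features, 2)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```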

With a focus on tree and fruit detection applications that use proximal and remote sensing data, the objective of the dissertation was to demonstrate that DL-based image processing techniques can be leveraged despite their data dependency. More specifically, we aimed to create tree and fruit detection models using manually labeled data together with other data that is limited because of (i) noisiness, (ii) spatial coarseness, and (iii) the absence of labels. To reduce the effort of generating manually labeled data while still exploiting the other data, we implemented strategies that demonstrate to DL researchers and practitioners alternatives for overcoming data dependency restrictions, particularly in agricultural applications.

First, we addressed the noisy data through a transfer learning approach called two-stage training. Under this approach, we proposed pre-training detection models with the noisy data and fine-tuning them with the manually labeled data. For our subject of study, a region-wide inventory of Phoenix palms in the Spanish province of Alicante, we had 5,104 manual annotations in addition to 116,330 noisy annotations. The latter were created using a point-based inventory of Phoenix palms and aerial high-resolution RGB images from the Canary Islands, another Spanish region. These data are considered noisy because (i) the images had a different spatial resolution and palms are distributed differently across the landscapes of the two regions, (ii) fewer than 70% of the points overlap with a palm crown, and (iii) the annotation bounding boxes had a standard size instead of fitting each individual crown extent. Object detection models trained under this approach achieved gains of 10% to 14.7% in average precision compared to those trained only with the manually labeled data. This is because the pre-training stage helps the fine-tuning stage better adapt to the appearance of palms from a top view. In other words, the similarity between the source data (Canary Islands annotations) and the target data (Alicante annotations) positively impacted the models' detection capacity. This is a primary lesson, especially in remote sensing, as publicly available pre-trained DL models are usually trained on large datasets of natural images in which objects appear from a side-looking perspective, not a top view.
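A minimal sketch of the two-stage training loop follows; the detector choice (torchvision's Faster R-CNN), the hyperparameters, and the `noisy_loader`/`manual_loader` dataloaders are illustrative assumptions, not the dissertation's exact setup:

```python
import torch
import torchvision

def train_stage(model, loader, epochs, lr):
    """One training stage: minimize the detector's summed losses."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:  # loader yields images and box targets
            loss_dict = model(images, targets)  # detection losses as a dict
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Two classes: palm and background.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)

# Stage 1: pre-train on the large, noisy dataset (the point-derived
# Canary Islands annotations). `noisy_loader` is an assumed DataLoader.
train_stage(model, noisy_loader, epochs=10, lr=1e-3)

# Stage 2: fine-tune on the small, manually labeled Alicante dataset,
# typically with a lower learning rate. `manual_loader` is also assumed.
train_stage(model, manual_loader, epochs=10, lr=1e-4)
```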

Next, we handled the coarse data by shifting the paradigm from individual object detection to object density estimation and by integrating multimodal data (radar and optical imagery). To achieve this, we implemented a novel DL architecture that estimates the density of sub-pixel-size objects from medium-spatial-resolution imagery and modified it for multimodal data integration. Continuing with the previous subject of study, the aim was to enable a region-wide palm inventory model that did not depend on the availability of aerial high-resolution RGB images but instead relied on space-borne imagery, which is publicly available at no cost and densely covers the globe at a high revisit rate. However, this change in imagery meant that palm crowns would go from appearing as recognizable objects at a 25-cm spatial resolution to coarse objects at a 10-m spatial resolution. This challenge is managed by the DL architecture, which recasts the object detection problem as joint density estimation and semantic segmentation problems, thus yielding a map containing the estimated number of objects per pixel and the predicted class label per pixel. Supported by the region-wide palm tree map from the first part, which is considered noisy because other tree species were mislabeled as Phoenix palms, we created a large, labeled dataset with 462,500 annotations. We also created a small, labeled dataset of 19,650 manual annotations. These two datasets were used in the two-stage training approach described above. Density estimation models trained under this approach achieved gains of 12.9% to 27.4% in balanced accuracy over those restricted to the manually labeled data. Thus, we hypothesize that, in general, the two-stage training approach can benefit other DL remote sensing applications beyond object detection.
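The toy network below illustrates the general idea of pairing a density head with a segmentation head on fused optical and radar inputs; the layer sizes, channel counts (e.g., Sentinel-2-like optical bands and Sentinel-1-like radar bands), and early-fusion scheme are our assumptions, not the dissertation's actual architecture:

```python
import torch
import torch.nn as nn

class DensitySegNet(nn.Module):
    """Toy sketch: fuse optical and radar inputs, then predict per-pixel
    object density and per-pixel class labels."""

    def __init__(self, optical_ch=4, radar_ch=2, n_classes=2):
        super().__init__()
        # Early fusion: stack both modalities along the channel axis.
        self.encoder = nn.Sequential(
            nn.Conv2d(optical_ch + radar_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Density head: non-negative number of objects per pixel.
        self.density_head = nn.Sequential(nn.Conv2d(64, 1, 1), nn.ReLU())
        # Segmentation head: class logits per pixel.
        self.seg_head = nn.Conv2d(64, n_classes, 1)

    def forward(self, optical, radar):
        feats = self.encoder(torch.cat([optical, radar], dim=1))
        return self.density_head(feats), self.seg_head(feats)

net = DensitySegNet()
density, seg = net(torch.randn(1, 4, 128, 128), torch.randn(1, 2, 128, 128))
count_estimate = density.sum()  # summing the density map estimates the count
```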

Finally, we dealt with unlabeled data by creating an automatic labeling approach that annotates it with minimal human intervention. For this approach, we combined an unsupervised object discovery algorithm with a supervised object classification algorithm. Here, the subject of study was detecting pears in orchard environments. Even though our target dataset (on-tree pear fruits viewed from the side) bears a closer similarity to large datasets of natural images, we could prove the existence of an applicability gap that restricts the impact of that transfer learning. Given the limited training data attainable by manual annotation and the large unlabeled dataset of images available, we intended to generate more target data automatically. This was achieved thanks to discovery and classification algorithms based on the latest advances in self-supervised learning and transformers. However, the automatically labeled data is regarded as noisy because its quality is compromised by mistakes in the discovery step, when separating overlapping objects, and in the classification step, when mislabeling other objects as the target. Relying on two-stage training, we pre-trained a model with the automatically labeled data (121,038 annotations) and fine-tuned it with the manually labeled data (500 and 1,000 annotations). Despite the noisiness, object detection models trained following this approach showed superior performance, obtaining average precision gains between 17.6% and 35.9% compared to those trained exclusively with the limited manual data. Moreover, these superior models also acquired a better generalization capacity on unseen images taken with a different camera setup, with average precision gains between 25.1% and 38.5%. We concluded that automatic labeling and two-stage training are suitable procedures even for proximal sensing applications, where source-target data similarity does not seem critical.
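The automatic labeling pipeline can be summarized as the sketch below; `discover_objects` and `classify_crop` are hypothetical placeholders standing in for the self-supervised discovery and transformer-based classification steps, and the confidence threshold is an assumption:

```python
def auto_label(images, discover_objects, classify_crop, threshold=0.9):
    """Sketch of the automatic labeling pipeline: discover candidate
    boxes without supervision, then keep only those a supervised
    classifier accepts as the target class (here, pears)."""
    pseudo_labels = []
    for image in images:
        # Step 1: unsupervised object discovery, e.g., from the attention
        # maps of a self-supervised vision transformer. Overlapping fruits
        # that are not separated here become label noise.
        candidate_boxes = discover_objects(image)
        kept = []
        for box in candidate_boxes:
            # Step 2: supervised classification of each discovered crop.
            # Non-pear objects mislabeled here also add noise.
            if classify_crop(image, box) >= threshold:
                kept.append(box)
        pseudo_labels.append(kept)
    return pseudo_labels

# The resulting pseudo-labels feed the pre-training stage of the
# two-stage training, before fine-tuning on the manual annotations.
```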

With this dissertation, we contribute to accelerating the adoption of DL-based solutions in agriculture. By addressing some of the barriers that DL researchers and practitioners can encounter when working with proximal and remote sensing data, especially noisy, coarse, and unlabeled data, we demonstrate strategies to reduce the dependency on large-scale, high-quality datasets.

Date: 21 Aug 2020 → 15 Mar 2023
Keywords: Deep Learning, Agriculture, Remote Sensing, Artificial Intelligence
Disciplines: Machine learning and decision making, Computer vision, Remote sensing, Photogrammetry and remote sensing, Agricultural plant protection, Agriculture, land and farm management not elsewhere classified
Project type: PhD project