Project

Automated Analysis of Histological Images Using Machine Learning and Image Processing Techniques

SUMMARY

Histology studies microscopic tissue appearance and properties.

Histology analysis is used to study disease at the cellular level. This analysis happens through microscopic examination of tissue sections: thin slices of tissue that were obtained from biopsies, mounted on a microscope slide, and made visible using a specific dye. Digital imaging allows individual microscopic structures, such as cells, nuclei or connective tissue, to be accurately quantified. Medical researchers are interested in answering questions about physical properties of these structures. For example, they might want to determine the presence of a biological compound in a certain tissue, or the average muscle cell cross-sectional area for a certain population under certain conditions.

To answer such questions they often rely on manual methods of analysis.

Exact, quantitative answers can be obtained in various ways, such as tracing cells or regions of interest by hand or with general-purpose image editing tools. This process requires a large investment of both time and effort.

Because of this, researchers often opt to answer their questions in a semi-quantitative way, meaning that they label image structures according to a rating scale based on their visual interpretation. A researcher might for example express the intensity of the staining produced by a dye on the tissue by categorizing the tissue’s appearance as lightly stained, normally stained or heavily stained. The use of these semi-quantitative scales allows researchers to work many times faster than they would using exact manual quantification. The drawback is that such a scoring system is limited in what it can express, as only a few cases can be distinguished. Often, as in the example above, it requires scorers to split up a continuous variable along a number of boundaries, based on their interpretation of terms such as lightly stained and heavily stained. These boundaries are inherently subjective, leading to a sometimes significant degree of variance between observers, or between an individual observer’s scores over time.

These limitations of manual approaches to histological quantification have spurred interest in methods to automate this work. Computer algorithms can process certain kinds of information orders of magnitude faster than humans, and by definition they do this while following a precise set of rules. By formulating the questions in an appropriate format, this computational power could be leveraged to provide precise and reproducible quantitative answers in a fraction of the time required by manual (semi-)quantitative evaluation.

There are several ways automated solutions can assist researchers. They can segment the cells, that is, detect their exact location and boundaries in the image. They can also help discover new relations between symptoms and disease through knowledge discovery. However, their application to various histological problems has been limited.

Some software analysis tools are available for microscopy image processing, but they have found limited use in the processing of histological images, due to some critical shortcomings. These include an inability to deal with artifacts, irrelevant tissue, close cell groups, and disintegrating cells. Another drawback is the common requirement to manually set a number of parameters for the segmentation process.

Furthermore, these tools only generate rudimentary statistics, such as total and individual cell area. To properly address clinical questions, additional statistics should be computed, such as cell shape, texture, and integrity.

Recent advances in machine learning methods have led to the availability of a wide array of techniques for training computer algorithms to detect certain patterns in data. By implementing these techniques in a histological setting, our goal is to create flexible methods that adapt to the specific tissue they are processing. We train the algorithm to classify (distinguish) segments into a set of predefined classes, based on a user-labeled training dataset. In the context of images, these datasets consist of image properties (features) obtained with various image processing techniques. If the images contain cells or other histological structures, a useful first step is to extract these from the general image and treat them as separate segments (segmentation). The algorithm can then study examples from a training set to learn how to use these segments’ properties to assign them to a certain class, e.g. “nucleus” or “cell”.
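As a rough illustration of this segment-then-classify idea, the following Python sketch groups foreground pixels of a tiny binary mask into segments, computes two toy features per segment, and assigns each segment to the nearest class mean. Everything in it is an assumption made for illustration: the mask, the two features (area and bounding-box fill ratio), the training values and the nearest-centroid classifier are hypothetical and are not the methods developed in this project.

```python
from collections import deque


def label_segments(mask):
    """Group foreground pixels (truthy values) into 4-connected segments."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    segments = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            segments.append(pixels)
    return segments


def segment_features(pixels):
    """Toy feature vector: (area in pixels, bounding-box fill ratio)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    area = len(pixels)
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return (area, area / box_area)


def train_centroids(labeled_examples):
    """Average the feature vectors of each class in the labeled training set."""
    grouped = {}
    for vector, label in labeled_examples:
        grouped.setdefault(label, []).append(vector)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(vector, centroids):
    """Assign the class whose mean is closest (features are left unscaled)."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(vector, centroids[label]))
    return min(centroids, key=squared_distance)


# Hypothetical 4x6 mask: one small blob and one larger 3x3 blob.
MASK = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# Hypothetical user-labeled training examples: (features, class).
TRAINING = [
    ((2, 1.0), "nucleus"), ((3, 1.0), "nucleus"),
    ((9, 1.0), "cell"), ((12, 0.9), "cell"),
]

centroids = train_centroids(TRAINING)
segments = label_segments(MASK)
labels = [classify(segment_features(s), centroids) for s in segments]
print(labels)
```

In practice the features would be far richer (shape, texture, stain intensity) and would need scaling before distances are compared; the sketch only shows how segmentation output feeds a classifier trained on labeled examples.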

The general objective of this work is the use of machine learning together with more general image processing techniques for automatic segmentation and quantification of different tissue types in histological images.

For some histological settings, a general pixel-level image processing algorithm, possibly combined with machine learning on simple features, is sufficient to generate a good segmentation. We examined this for several types of imaged tissue, namely the adrenal cortex, trabecular bone and the cornea.

Certain histological structures prove too challenging to segment using only generally known algorithms. Hematoxylin- and eosin-stained skeletal muscle cells are an example of this: the dense structures prove too complex to segment by traditional approaches. To accurately detect and quantify these cells, we combined known and novel algorithms. That method was also extended to another challenging problem, the quantification of adipocyte cell size.

Histological images of tissue affected by a condition might contain some information relevant to its understanding. We hypothesize that some patterns might be too subtle for the human eye, and require algorithmic methods to discover. This new information can be useful in hypothesis generation regarding that condition. We investigated whether automatic image processing algorithms could pick up this type of information in a study investigating Intensive Care Unit-Acquired Weakness, or ICU-AW, a debilitating condition that affects a large fraction of intensive care unit (ICU) patients, but remains poorly understood. Using a tissue section dataset containing muscle from ICU patients with known ICU-AW status, we investigated how predictive manually collected semi-quantitative measures of muscle deterioration were for ICU-AW status. Their performance was compared against the best predictive image-feature model, which was trained by letting it analyze example images of weak and non-weak patients. We further compared these results against a set of biochemical markers that represent the best non-clinical way to detect ICU-AW. To evaluate their ICU-AW detection effectiveness, all information sources (biochemical markers, semi-quantitative labels, and image features) were used as inputs to machine learning algorithms trained for ICU-AW status classification. While the semi-quantitative features provided almost no information for ICU-AW detection, the image-feature model was as predictive as the biochemical marker data. This means the algorithm detected features in the images predictive for ICU-AW that experts were not aware of or were not able to see. Further investigation of what these features are could lead to novel hypotheses and insights into the pathophysiology of ICU-AW.

Though image analysis models designed for a specific tissue type can be time-saving and useful, designing them requires an image analysis expert to invest a sometimes considerable amount of time in model creation.

This motivated us to create a framework for segmentation and quantification of arbitrary stain and tissue types. This is performed through active learning, a feedback-driven methodology where a classification model is constantly refined by a learning algorithm that repeatedly queries the user for labels of unknown data points it has determined to be the most informative. This technique saves the user the work of labeling many examples in the dataset before it is even clear if they make the training set more informative. Using these techniques in a simulated active learning experiment, we achieved good performance on two previously examined datasets, the trabecular bone and the adrenal tissue.
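The query loop behind active learning can be sketched in a few lines of Python. The version below is a minimal illustration under stated assumptions, not the framework described here: it uses a single hypothetical 1-D feature, a nearest-class-mean model, margin-based uncertainty as the measure of informativeness, and an `oracle` function that stands in for the user in a simulated experiment. The class names and all numeric values are invented.

```python
def fit(labeled):
    """Nearest-centroid model on one feature: the mean value per class."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}


def predict(model, x):
    """Return the class whose mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


def margin(model, x):
    """Gap between the two closest class means; a small gap means uncertain."""
    distances = sorted(abs(x - mean) for mean in model.values())
    return distances[1] - distances[0]


def active_learning(pool, seed, oracle, n_queries):
    """Repeatedly refit, then ask the oracle about the least certain point."""
    labeled, pool = list(seed), list(pool)
    for _ in range(n_queries):
        if not pool:
            break
        model = fit(labeled)
        query = min(pool, key=lambda x: margin(model, x))
        pool.remove(query)
        labeled.append((query, oracle(query)))  # user feedback step
    return fit(labeled), labeled


# Simulated user (assumption): feature values of 0.5 and above are "tissue".
def oracle(x):
    return "tissue" if x >= 0.5 else "background"


SEED = [(0.1, "background"), (0.9, "tissue")]  # one labeled example per class
POOL = [0.05, 0.2, 0.48, 0.6, 0.8, 0.95]       # unlabeled feature values

model, labeled = active_learning(POOL, SEED, oracle, n_queries=2)
queried = [x for x, _ in labeled[len(SEED):]]
print(queried)  # the two points nearest the decision boundary: [0.48, 0.6]
```

With only two queries the loop picks the points nearest the current decision boundary and never asks about the clear-cut values at the extremes, which is the labeling effort the technique is meant to save.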

To summarize, in this PhD project we have laid out a variety of histology segmentation methods developed and validated by medical experts. The methods were used to automatically segment various histological tissue images across many settings, in datasets that were diverse in terms of image and stain properties as well as the required results. We showed how these algorithms generate results faster and more objectively than semi-quantitative evaluation, while also offering exact reproducibility and precise quantitative results that cannot be obtained otherwise. We also showed the potential for automated image analysis as an information source in its own right for the study of illness.

Finally, we described and tested a framework designed for user feedback-informed learning of arbitrary histology data.

Date: 1 Oct 2011 → 20 Nov 2015

Keywords: Machine learning, Image processing techniques, Histology

Disciplines: Anaesthesiology, Intensive care and emergency medicine

Project type: PhD project
