Project

Development of a microbial network inference and analysis platform

Microbes are organisms too small to observe with the naked eye. Despite their small size, they have an outsized impact on global health and are important participants in most biochemical processes that take place on the planet. Historically, microbiological studies focused on individual species and their ability to carry out certain functions, such as nitrogen fixation in the soil, or their capacity to become human pathogens. However, it has become apparent that microbiomes, the complete collections of microbes in a system, are more than the sum of their parts. Their behaviour cannot be predicted from the behaviour of microbes in pure cultures alone; rather, interactions between microbes have a major impact on ecosystem functioning. In this context, microbiome research first enabled detailed quantitative studies of the role of microbes in their communities. Specifically, a range of computational methods has been developed to predict microbial interactions from sequencing data. These methods need to tackle a number of statistical issues, including compositionality (microbial counts are fractions) and sparsity (many microbes are rare). Because of these issues, the accuracy of microbial network inference methods has been found to be low. Yet there is a more fundamental issue with this approach: the associations found by microbial association network inference methods do not necessarily reflect interactions. Predatory interactions can lead to microbes occurring in the same ecosystems, but they may also lead to oscillatory patterns of abundance in which one microbe is only abundant when the other is not. The same goes for other biotic interactions such as parasitism and amensalism. Another issue is that abiotic factors can lead to associations that are then misinterpreted as possible biotic interactions. When species respond to abiotic drivers of community structure, such as pH, this can drastically affect the structure of any inferred association network.

Given these issues, there is a distinct difference between microbial association networks and most types of networks analyzed in other fields of science (Chapter 2). It remains an open question whether methods developed to analyze those networks are applicable to microbial association networks. In this thesis, I therefore describe several computational methods that I developed to improve our understanding of microbial networks (Chapters 3-5). These include a clustering algorithm, a toolbox to find core association networks and a toolbox for working with Neo4j databases. Additionally, I present a meta-analysis of a large number of networks to investigate possible drivers of network structure (Chapter 6). Altogether, these chapters describe a range of advanced strategies for the analysis of microbial networks in which relationships cannot be linked directly to biotic interactions.

The clustering algorithm I developed addresses the fundamental issue of signed, weighted associations (Chapter 3). Most interactions are directed, meaning that it is possible to observe whether species A affects species B or species B affects species A. In association networks, this information is usually lacking, but we do know whether the relationship between A and B has a positive or negative weight, meaning that the two species are more or less likely to occur together. Not all clustering algorithms are able to take the sign into account, as many can only use nonnegative edge weights. In contrast, the manta clustering algorithm I developed handles negative edge weights through a unique matrix normalization step. I demonstrate on simulated data sets that manta is able to recover simulated environmental structure. Most other clustering algorithms are similarly able to do this, but require appropriate preprocessing or careful adjustment of their parameter settings. On real-world data sets, manta also appears to recover meaningful biological structure, as the clusters it recovered could be linked to abiotic drivers of community structure. With the presented algorithm, it is therefore possible to cluster microbial association networks without additional prior information.

The second presented software method, anuran, is a toolbox for the analysis of core association networks (CANs) (Chapter 4). Core association networks contain associations that were found across multiple networks and may therefore be conserved. In a meta-study of microbial association networks, these networks may overlap by chance because they are derived from similar data sets. Microbiomes collected from the same biome are likely to share their most abundant species, as these are present in the core microbiome. A random sampling of microbial species can therefore be used to generate networks that are quite similar. The anuran toolbox tackles this issue by generating networks based on null models. Observed CANs can then be compared to networks that are derived from the null hypothesis that all networks are random. They can also be compared to networks that contain a synthetic CAN. These comparisons make meta-analyses of microbial networks more informative because they make it feasible to assess whether the observed CAN is larger or smaller than would be expected from chance alone. In case studies of sponge networks and human gut networks, I found CANs that were significantly larger than those observed for fully randomized networks. These CANs appeared to be linked to drivers of community structure. For the sponge networks, the driver was a property of the host species, while for the human gut networks, it seemed to be the enterotypes.
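The comparison of an observed CAN against a null expectation can be illustrated with a minimal sketch. It is not the anuran implementation or its API: the function names, the toy taxa A to F and the uniform edge-sampling null model are all assumptions made for illustration, and a uniform null ignores the shared abundant species discussed above.

```python
import random
from itertools import combinations


def core_edges(networks):
    """Edges present in every network (a strict core association network)."""
    return set.intersection(*networks)


def random_network(taxa, n_edges, rng):
    """Hypothetical null model: n_edges drawn uniformly from all taxon pairs."""
    possible = [frozenset(pair) for pair in combinations(sorted(taxa), 2)]
    return set(rng.sample(possible, n_edges))


def null_core_sizes(networks, n_iter=200, seed=0):
    """Core sizes when each network is replaced by a random one of equal size."""
    rng = random.Random(seed)
    taxa = {taxon for net in networks for edge in net for taxon in edge}
    sizes = []
    for _ in range(n_iter):
        randomized = [random_network(taxa, len(net), rng) for net in networks]
        sizes.append(len(core_edges(randomized)))
    return sizes


def edge(a, b):
    """Undirected association between two taxa."""
    return frozenset((a, b))


# Three toy networks that share two associations (taxa are made up).
networks = [
    {edge("A", "B"), edge("B", "C"), edge("C", "D")},
    {edge("A", "B"), edge("B", "C"), edge("D", "E")},
    {edge("A", "B"), edge("B", "C"), edge("E", "F")},
]
observed = len(core_edges(networks))
null_sizes = null_core_sizes(networks)
# Fraction of randomized runs whose core is at least as large as the observed one.
p_value = sum(size >= observed for size in null_sizes) / len(null_sizes)
```

A small `p_value` in this sketch would indicate that the observed core is larger than expected when edges are placed at random; a more realistic null model would preserve properties such as taxon degree.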

The mako toolbox specifically targets multi-omics and other integrated approaches (Chapter 5). This toolbox defines a database schema for storing data in a Neo4j database and provides functions for writing standard biological formats to a Neo4j database according to this schema. Given the size of multi-omics data sets, it can be helpful to work with a database to avoid memory limitations. In contrast to relational databases, graph databases like Neo4j do not store data as tables but as nodes and relationships between those nodes. As a result, the Cypher queries needed to access Neo4j databases are pattern-based and therefore more closely resemble an intuitive human understanding of structured information. I used a collection of 60 networks to investigate the presence of 3- and 4-node cliques and found that animal biomes contained larger numbers of cliques than other biomes. Moreover, I demonstrated that this database model could be used to easily integrate literature-validated metabolic interactions, allowing us to query microbial associations that could be realisations of these metabolic interactions.
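To make the idea of a pattern over nodes and relationships concrete, the sketch below finds 3-node cliques in a small association network held in memory. It is plain Python rather than a Cypher query, it does not use mako or Neo4j, and the edge list and function name are hypothetical.

```python
def three_node_cliques(edges):
    """Return all sets of three taxa that are pairwise associated."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    cliques = set()
    for a, b in edges:
        # Any taxon linked to both ends of an edge closes a triangle.
        for c in neighbours[a] & neighbours[b]:
            cliques.add(frozenset((a, b, c)))
    return cliques


# Toy network: A, B and C are pairwise associated; D is only linked to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
cliques = three_node_cliques(edges)
```

In a graph database, the same question is phrased as a pattern of three nodes joined by three relationships, so the neighbour bookkeeping done by hand here is left to the query engine.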

The networks presented with mako were also used to study the impact of phylogeny and summary statistics of relative abundance on degree (Chapter 6). The degree of a taxon is its number of associations and therefore represents how well-connected the taxon is in a network. Taxa with a high degree or high values of other centrality measures are often presumed to be crucial for ecosystem structure and are therefore labelled keystone species, an untested assumption that was criticized in the correspondence that inspired this chapter. Hence, I used multiple statistical models and a machine learning approach to study the relationship between taxon abundance and degree. Although this approach could not establish whether taxa with a high degree were keystones, I found a significant relationship between prevalence and degree. However, this relationship was insufficient to predict high degree, as the machine learning approach barely outperformed a null model that used mean degree as the only predictor. Consequently, I concluded that there must be other aspects of microbial taxa that contribute to high degree, with a likely candidate being their relationship to factors affecting community structure.
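As a rough illustration of the quantities involved, the sketch below computes taxon degree from an edge list and the error of a baseline that predicts the mean degree for every taxon. The toy edge list, the function names and the choice of mean absolute error are assumptions for illustration; they are not the models or metrics used in the chapter.

```python
from collections import Counter
from statistics import mean


def degrees(edges):
    """Number of associations per taxon in an undirected network."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


def mean_degree_baseline_error(degree_by_taxon):
    """Mean absolute error when every taxon is predicted to have the mean degree."""
    baseline = mean(degree_by_taxon.values())
    return mean(abs(d - baseline) for d in degree_by_taxon.values())


# Toy network with made-up taxa: A is a hub, D is peripheral.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degree_by_taxon = degrees(edges)
baseline_error = mean_degree_baseline_error(degree_by_taxon)
```

Any predictive model of degree would need an error clearly below `baseline_error` on held-out taxa before its predictors could be said to explain degree.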

In conclusion, the presented software methods provide novel avenues for the analysis of microbial association networks that do not presume that these networks are representations of interaction networks. Moreover, they are able to integrate alternative sources of data with the associations, supporting a more systems-based approach for microbial analysis that will be able to contribute to an improved understanding of microbial community dynamics.

Date: 9 Oct 2017 → 26 Oct 2021

Keywords: microbiome, microbial ecology, metagenomics, networks

Disciplines: Biomaterials engineering, Biological system engineering, Biomechanical engineering, Other (bio)medical engineering, Environmental engineering and biotechnology, Industrial biotechnology, Other biotechnology, bio-engineering and biosystem engineering, Microbiology, Systems biology, Laboratory medicine, Immunology

Project type: PhD project
