Computational and Omics Analyses Providing New Leads for Improving Streptomyces lividans for Heterologous Protein Production
As the industrial and medical use of enzymes intensifies, so does the search for appropriate host organisms to produce these enzymes heterologously. In this respect, the bacterium Streptomyces lividans is increasingly attracting attention, mainly because of its favorable traits concerning protein secretion. This dissertation encompasses a contribution to the scientific community’s efforts to understand and modify S. lividans in order to prepare it for a successful career as an industrial host organism.
The problem of improving and understanding S. lividans is tackled in various ways throughout this work. Besides immediate insights into S. lividans and the implications for its use in heterologous protein production, various methods are developed that may prove useful in computational biology in general. Furthermore, many of the insights obtained for S. lividans are expected to hold for streptomycetes as a whole. A thermostable cellulase A (CelA), which is poorly produced by the currently most-used host organism, Escherichia coli, is used as a representative protein throughout this work.
An updated, modern genome-scale model of S. lividans metabolism is developed, which is then used to identify knockout targets expected to lead to an increase in CelA production. Besides the immediate use in target prediction, such a metabolic model is an invaluable tool for strain improvement and research purposes. Database and literature information, growth phenotyping and mathematical modeling are used in model development. The resulting model contains 1970 metabolic reactions, 1494 metabolites, and 1200 reaction-linked genes.
Prediction of metabolic strain improvement targets using the newly developed genome-scale model is based on existing methods that first estimate the metabolic flux of the wild-type strain, and then seek to minimize the change from this wild-type flux in the knockout mutant. These existing methods are adjusted to increase the flexibility of the resulting optimization problem. This additional flexibility is required to make methods developed for metabolite overproduction suitable for finding protein overproduction targets. Furthermore, a new method is developed to incorporate gene expression data into flux predictions, with the objective of obtaining a more accurate wild-type flux distribution. In total, 118 potentially beneficial reaction knockouts are identified.
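The core idea of minimizing the change from the wild-type flux can be illustrated with a minimal sketch. It is not the formulation used in this work: it assumes a hypothetical three-reaction toy network, enforces only the steady-state condition and the knockout itself, and ignores the flux bounds and irreversibility constraints that a full quadratic-programming formulation would include. The function name and all numbers are illustrative assumptions.

```python
import numpy as np

def closest_flux_after_knockout(S, wild_type_flux, knocked_out):
    """Return the steady-state flux vector closest (Euclidean) to the wild type.

    Simplifying assumption: only S v = 0 and v[k] = 0 for knocked-out
    reactions are enforced; flux bounds are deliberately left out.
    """
    S = np.asarray(S, dtype=float)
    w = np.asarray(wild_type_flux, dtype=float)
    # One extra equality row per knocked-out reaction forces its flux to zero.
    knockout_rows = np.zeros((len(knocked_out), S.shape[1]))
    for row, reaction in enumerate(knocked_out):
        knockout_rows[row, reaction] = 1.0
    A = np.vstack([S, knockout_rows])
    # Orthogonal projection of w onto the null space of A.
    return w - np.linalg.pinv(A) @ (A @ w)

# Hypothetical toy network with one metabolite A and three reactions:
#   r0: uptake -> A,   r1: A -> biomass,   r2: A -> product
S = [[1.0, -1.0, -1.0]]
wild_type = [10.0, 8.0, 2.0]  # made-up wild-type fluxes satisfying S v = 0
mutant = closest_flux_after_knockout(S, wild_type, knocked_out=[1])
print(mutant)
```

In this toy case, knocking out the biomass reaction r1 yields a mutant flux of approximately (6, 0, 6): uptake drops and product formation rises so that the overall distance to the wild-type flux is as small as the mass balance allows.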
Effects of heterologous CelA production and secretion on S. lividans are assessed on a transcriptomic and fluxomic level. RNA-sequencing and differential gene expression analysis relative to the wild-type strain show a transcriptomic response related to secretory stress and DNA damage, as well as activation of the OsdR regulon, a regulon associated with secondary metabolism, hypoxia, oxidative stress, intracellular signaling and morphological development. For grouping of differentially expressed genes, a clustering algorithm based on co-expression and genomic co-localization is developed. Flux-wise, the use of 13C metabolic flux analysis notably shows increased flux through the pentose phosphate pathway and tricarboxylic acid cycle, leading to higher NADPH production. The documented transcriptomic and fluxomic changes provide valuable new leads for targeted strain improvement strategies that decrease the metabolic burden associated with S. lividans heterologous protein synthesis and secretion.
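How co-expression and genomic co-localization might be combined in a grouping step can be sketched as follows. This is a hypothetical greedy illustration, not the clustering algorithm developed in this work: neighbouring genes are merged when the gap between them is small and their expression profiles correlate strongly. The thresholds, gene names, coordinates and expression values are all assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cluster_genes(genes, max_gap=500, min_corr=0.8):
    """Greedily group genes given as (name, start, end, expression_profile).

    A gene joins the current cluster only if it lies within max_gap bases of
    the previous gene AND correlates with it at least min_corr.
    """
    clusters = []
    for gene in sorted(genes, key=lambda g: g[1]):
        if clusters:
            previous = clusters[-1][-1]
            close = gene[1] - previous[2] <= max_gap
            coexpressed = pearson(gene[3], previous[3]) >= min_corr
            if close and coexpressed:
                clusters[-1].append(gene)
                continue
        clusters.append([gene])
    return [[g[0] for g in cluster] for cluster in clusters]

# Hypothetical genes: name, start, end, expression across four samples
genes = [
    ("geneA", 100, 900, [1.0, 2.0, 3.0, 4.0]),
    ("geneB", 950, 1800, [2.1, 3.9, 6.2, 8.0]),   # adjacent to and correlated with geneA
    ("geneC", 1900, 2500, [4.0, 3.0, 2.0, 1.0]),  # adjacent but anti-correlated
    ("geneD", 9000, 9800, [4.1, 3.0, 2.2, 0.9]),  # correlated with geneC but far away
]
clusters = cluster_genes(genes)
print(clusters)
```

Requiring both criteria keeps distant genes apart even when their profiles match, and keeps neighbours apart when their expression diverges, which is the behaviour one would expect from operon-like groupings.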
To assess the effects of the growth medium on CelA production, the S. lividans transcriptome and exometabolome were analyzed during culturing in three different liquid growth media: nutrient broth, nutrient broth with glucose, and casamino acids with glucose. S. lividans exhibits a very low growth rate but surprisingly high CelA production when growing in plain nutrient broth. The media with glucose, on the other hand, give rise to rapid growth but low CelA production, as well as overflow of acetic acid, pyruvic acid, alanine, and glutamic acid. The low CelA production in these media suggests that strong exponential growth and the resulting activation of metabolic overflow are to be avoided to attain high heterologous production. Both principal component analysis (PCA) on gene expression and differential expression analysis suggest similar transcriptomic states in the two glucose-containing media. Although PCA shows that expression in plain nutrient broth does differ from that in the media containing glucose, differential expression analysis provides no clear link between the high CelA production rate and expression of specific (groups of) genes.
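The PCA comparison of media can be illustrated with a small sketch based on a singular value decomposition. The expression matrix below is entirely made up (six samples, four genes, values on an arbitrary log-like scale) and serves only to show how samples from similar transcriptomic states end up close together on the first principal component.

```python
import numpy as np

def pca_scores(expression, n_components=2):
    """Return sample scores and explained-variance fractions via SVD.

    expression: samples in rows, genes in columns.
    """
    X = np.asarray(expression, dtype=float)
    X = X - X.mean(axis=0)  # center each gene across samples
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical data: two replicates per medium, four genes
expression = [
    [2.0, 9.0, 1.0, 5.0],  # nutrient broth, replicate 1
    [2.2, 8.8, 1.1, 5.1],  # nutrient broth, replicate 2
    [8.0, 3.0, 6.0, 5.0],  # nutrient broth + glucose, replicate 1
    [8.1, 3.2, 5.9, 4.9],  # nutrient broth + glucose, replicate 2
    [7.8, 3.1, 6.2, 5.2],  # casamino acids + glucose, replicate 1
    [8.2, 2.9, 6.1, 5.0],  # casamino acids + glucose, replicate 2
]
scores, explained = pca_scores(expression)
pc1 = scores[:, 0]
print(pc1)
print(explained)
```

With these invented values, the two glucose-containing media cluster together on the first component while plain nutrient broth lies apart. The sign of a principal component is arbitrary, so only distances between samples should be interpreted.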
As removing genes that do not serve the industrial purpose can be expected to both improve (heterologous) production and simplify downstream processing, potential targets for genome reduction are predicted in silico, using metabolic modeling and gene expression data. The effects of gene knockouts are assessed using flux balance analysis on a genome-scale model of S. lividans metabolism. The results are given both in terms of single genes and groups of adjacent genes that can likely be removed while retaining strain viability.
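Going from single dispensable genes to groups of adjacent genes can be sketched as a simple scan along the chromosome. The example below is a hypothetical illustration with invented gene names, and it assumes the per-gene dispensability calls are already available. Note that genes that are individually dispensable need not be jointly dispensable, so in practice each candidate block would have to be re-checked, for instance by simulating the combined knockout.

```python
def removable_blocks(gene_order, dispensable, min_size=2):
    """Group consecutive dispensable genes (in chromosomal order) into blocks.

    gene_order:  gene names sorted by chromosomal position
    dispensable: set of genes predicted to be individually removable
    min_size:    smallest run length reported as a block
    """
    blocks, current = [], []
    for gene in gene_order:
        if gene in dispensable:
            current.append(gene)
            continue
        # A required gene ends the current run.
        if len(current) >= min_size:
            blocks.append(current)
        current = []
    if len(current) >= min_size:
        blocks.append(current)
    return blocks

# Hypothetical chromosome segment with eight genes
gene_order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
dispensable = {"g2", "g3", "g4", "g6", "g8"}
blocks = removable_blocks(gene_order, dispensable)
print(blocks)
```

Here only g2 to g4 form a reportable block; g6 and g8 are dispensable on their own but are each flanked by genes that must be retained.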
In a final study, in vivo transposon mutagenesis was employed to generate a mutant library from which the impact of gene disruption on strain fitness could be assessed in a high-throughput fashion. Through the development of new methods for the statistical analysis of such transposon-sequencing results, a likelihood of being essential could be assigned to every S. lividans gene. Furthermore, the effects of gene disruption on growth could be estimated for the non-essential genes. These data can prove valuable when undertaking a genome reduction of S. lividans, but also for cross-checking potential specific gene knockout targets and creating strains with increased or decreased fitness.
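The reasoning behind an essentiality likelihood can be illustrated with a deliberately simple Bayesian sketch. It is not the statistical method developed in this work. It assumes that each potential insertion site in a non-essential gene is hit independently with a fixed probability, that essential genes tolerate no insertions at all, and that a prior fraction of essential genes is known. The function name and every parameter value are hypothetical.

```python
def essentiality_posterior(n_sites, n_hit, hit_prob, prior_essential=0.1):
    """Posterior probability that a gene is essential under a toy model.

    n_sites:         number of potential insertion sites in the gene
    n_hit:           number of those sites with an observed insertion
    hit_prob:        assumed per-site insertion probability in non-essential genes
    prior_essential: assumed prior fraction of essential genes
    """
    if n_hit > 0:
        # Toy assumption: any observed insertion rules out essentiality.
        return 0.0
    # Chance that a non-essential gene shows no insertions purely by luck.
    p_empty_if_nonessential = (1.0 - hit_prob) ** n_sites
    evidence = prior_essential + (1.0 - prior_essential) * p_empty_if_nonessential
    return prior_essential / evidence

# Hypothetical genes without insertions: a short one and a long one
short_gene = essentiality_posterior(n_sites=3, n_hit=0, hit_prob=0.2)
long_gene = essentiality_posterior(n_sites=50, n_hit=0, hit_prob=0.2)
print(short_gene, long_gene)
```

The sketch shows why gene length matters: an insertion-free short gene remains ambiguous because it could have been missed by chance, whereas an insertion-free long gene is far more likely to be essential.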
All in all, this dissertation and the accompanying additional files provide a wealth of information for future S. lividans research, with a focus on heterologous protein production, as well as a number of concrete leads for improving heterologous protein production through (i) reducing the burden of heterologous protein production, (ii) genome reduction, (iii) redirecting metabolic fluxes, and (iv) manipulating strain fitness.