Project

EXPLORING THE CODE OF LIFE: FROM DECODING TO DESIGNING CELL TYPE-SPECIFIC ENHANCERS WITH DEEP LEARNING

Cellular identity, defined by the activity of specific sets of genes, is established by the transcriptional enhancer code in combination with the differential and combinatorial expression of transcription factors. This code plays a central role in the regulation of gene expression. Decoding it is essential for understanding the functional impact of noncoding genome variation and for developing cell type-specific drivers. In this thesis, we aim to decode and design cell type-specific enhancers by combining deep learning models and their interpretation with integrative genomics.

In the first part of the thesis, we focused on deciphering the enhancer code in melanoma, a cancer characterized by distinct cell states. First, we trained a deep learning model, DeepMEL, on DNA sequences using chromatin accessibility data obtained from 16 different human patient-derived cell lines. DeepMEL precisely predicts enhancer function, and we used the model to understand the code and architecture of enhancers and to identify transcription factor binding sites for core regulatory complexes. Furthermore, using chromatin accessibility data from 5 other species, we studied the code of orthologous enhancers and their conservation with the help of the DeepMEL model, and highlighted nucleotide substitutions underlying enhancer turnover.
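As a rough illustration of what "training on DNA sequences" usually involves, the sketch below one-hot encodes a sequence, a common input representation for sequence-based deep learning models. The abstract does not state how DeepMEL encodes its input, so the channel order (A, C, G, T), the handling of unknown bases, and the function name `one_hot_encode` are all assumptions made for illustration.

```python
# Illustrative only: channel order A, C, G, T is an assumption,
# not a documented property of the DeepMEL model.
ALPHABET = "ACGT"

def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows

encoded = one_hot_encode("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

A matrix of this shape (sequence length by 4) is what a convolutional model would typically scan for motif-like patterns.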

In the second part of the thesis, we aimed to identify and interpret mutations in functional enhancers obtained from 10 different personal cancer genomes. First, we improved our initial DeepMEL model with additional training data and better training strategies, which led to the creation of the DeepMEL2 model. Using this model, we then scored and interpreted allele-specific chromatin accessibility variants (ASCAVs) in melanoma genomes and observed that a considerable fraction of ASCAVs are caused by changes in AP-1 binding sites. In this task, our model outperformed both motif-based approaches and more generic deep learning models.
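The variant scoring described above can be pictured as comparing a model's prediction for the reference allele with its prediction for the alternative allele. The following is a minimal sketch of that idea, not the DeepMEL2 procedure itself: `toy_score` is a hypothetical stand-in for a trained model that simply counts exact matches to one AP-1 consensus site (TGAGTCA), and `variant_delta` and the example sequence are made up for illustration.

```python
AP1_SITE = "TGAGTCA"  # one form of the AP-1 consensus site, used here as a toy motif

def toy_score(sequence):
    """Hypothetical stand-in for a trained model: counts exact AP-1 site matches."""
    width = len(AP1_SITE)
    return sum(
        1
        for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == AP1_SITE
    )

def variant_delta(reference, position, alt_base, score_fn=toy_score):
    """Score of the alternative allele minus score of the reference allele."""
    alternative = reference[:position] + alt_base + reference[position + 1:]
    return score_fn(alternative) - score_fn(reference)

reference = "CCATGAGTCAGG"  # made-up sequence containing one AP-1 site at index 3
delta = variant_delta(reference, 5, "T")  # A>T inside the site destroys it
print(delta)
```

With a real model, `score_fn` would return a predicted accessibility value, and a strongly negative delta at a variant inside an AP-1 site would point to loss of that site as a plausible cause of the allele-specific signal.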

In the third part of the thesis, we moved to a more complex, multicellular system, namely the Drosophila brain. Using single-cell chromatin accessibility data, we trained our DeepFlyBrain model to understand the code of cell type-specific neuronal and glial enhancers. The enhancer architectures revealed by the model led to a better understanding of neuronal regulatory diversity and how it is established. Moreover, we showed that the model can be used to prune genetic driver lines for different cell types at specific timepoints, facilitating their characterization and manipulation.

In the fourth and final part of the thesis, we explored the synthetic design of cell type-specific enhancers, building on the deep learning models trained in the previous parts and the insights obtained from them. We implemented and compared three enhancer design strategies guided by our deep learning models: directed sequence evolution, iterative motif implantation, and generative design. With these strategies, we created functional synthetic enhancers targeting Kenyon cells in the fruit fly brain as well as human melanoma cells. Directed sequence evolution showed that a random sequence can be converted into a functional enhancer through only 10 serial mutations that destroy repressor binding sites and create activator sites. We also used in silico evolution to modify existing genomic sequences: (1) to prune enhancers that are active in two cell types, making them specific to only one cell type; (2) to augment enhancers that are active in one cell type by incorporating the code for a second cell type into the same enhancer; and (3) to exploit near-enhancer sequences, or enhancers “lost” during evolution, that carry only a partial enhancer code and convert them into functional enhancers. Investigating sequence evolution nucleotide by nucleotide showed that almost all selected mutations were associated with the creation or destruction of a transcription factor binding site, rather than affecting the contextual sequence between motif instances. This suggested that a combination of appropriately positioned activator motifs, in the absence of repressor motifs, would be sufficient to create a cell type-specific enhancer. In the second strategy, we embedded recognition motifs for transcription factors that cooperate in our target cell types. In particular, we embedded weak and strong activator TF binding sites into random sequences at the optimal positions dictated by the deep learning model. This led to the identification of critical motif distances and allowed us to create minimal enhancers shorter than 50 bp. For each strategy, we selected several dozen enhancers and evaluated them in vivo using transgenic flies and in vitro in human cell culture. The successful application of enhancer design strategies guided by deep learning models in both the fly brain and human cancer cells suggests that these strategies can be adapted to other organisms and systems. Enhancer design guided by deep learning leads to a better understanding of how enhancers work and shows that their code can be exploited to generate improved cell type-specific drivers for gene therapy and to manipulate cell states.
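The first two design strategies can be sketched in a few lines. This is a toy illustration under stated assumptions, not the thesis implementation: `toy_score` stands in for a trained deep learning model and rewards partial matches to a placeholder activator motif while penalizing partial matches to a placeholder repressor motif. Both motif strings, the greedy search, and all function names are hypothetical.

```python
import random

ACTIVATOR = "TGAGTCA"  # placeholder activator motif (assumption)
REPRESSOR = "CACCTG"   # placeholder repressor motif (assumption)
BASES = "ACGT"

def best_match(sequence, motif):
    """Highest number of bases matching the motif in any window of the sequence."""
    width = len(motif)
    return max(
        sum(a == b for a, b in zip(sequence[i:i + width], motif))
        for i in range(len(sequence) - width + 1)
    )

def toy_score(sequence):
    """Hypothetical stand-in for a model prediction: activator minus repressor match."""
    return best_match(sequence, ACTIVATOR) - best_match(sequence, REPRESSOR)

def evolve(sequence, score_fn=toy_score, max_mutations=10):
    """Greedy in silico evolution: apply the best single substitution per step."""
    history = []
    current, current_score = sequence, score_fn(sequence)
    for _ in range(max_mutations):
        best = None
        for pos, old in enumerate(current):
            for new in BASES:
                if new == old:
                    continue
                candidate = current[:pos] + new + current[pos + 1:]
                score = score_fn(candidate)
                if score > current_score and (best is None or score > best[0]):
                    best = (score, pos, old, new, candidate)
        if best is None:  # no single substitution improves the score
            break
        current_score, pos, old, new, current = best
        history.append((pos, old, new, current_score))
    return current, history

def implant_motif(sequence, motif, score_fn=toy_score):
    """Motif implantation: overwrite each window with the motif, keep the best scorer."""
    candidates = [
        sequence[:i] + motif + sequence[i + len(motif):]
        for i in range(len(sequence) - len(motif) + 1)
    ]
    return max(candidates, key=score_fn)

rng = random.Random(0)  # fixed seed so the random start sequence is reproducible
start = "".join(rng.choice(BASES) for _ in range(40))
evolved, history = evolve(start)
implanted = implant_motif(start, ACTIVATOR)
print(toy_score(start), toy_score(evolved), len(history))
```

In this sketch every accepted mutation either strengthens an activator match or weakens a repressor match, which mirrors the observation that selected mutations tend to create or destroy binding sites. With a real model, the score would be the predicted activity for the target cell type, and the mutation budget of 10 corresponds to the number of serial mutations reported above.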

Overall, this thesis demonstrated the power of deep learning models combined with integrative genomics approaches to decode and design cell type-specific enhancers. This provides valuable insights into enhancer function and enables manipulation of cell states for therapeutic purposes.

Date: 1 May 2018 → 19 Jul 2023
Keywords: Deep Learning, Epigenomics
Disciplines: Other biological sciences
Project type: PhD project