Project

The first genome-wide maps of the micropeptidome validated through large-scale reprocessing of public proteomics data

Complete annotation of the genome is imperative for understanding development, health, and disease. Nevertheless, the annotation of protein-coding genes is far from complete. Micropeptides in particular, small proteins of fewer than 100 amino acids, have historically been underrepresented in gene annotation databases. In this project, I will develop a machine learning-based algorithm to discover novel micropeptides encoded in annotated long non-coding RNAs and circular RNAs. I will then apply this algorithm to large RNA-sequencing transcriptomes of human and to the reference annotations of mouse, Arabidopsis, and yeast to generate an in silico predicted micropeptidome. Subsequently, I will validate the existence of large numbers of these micropeptides using massive volumes of public tandem mass spectrometry data. To perform these analyses, I will rely on Ionbot, a state-of-the-art sequence database search algorithm developed in-house that is capable of performing open modification and open mutation searches. In parallel, I will create proteome-wide in silico spectral libraries and use them for spectral library searching on the same data. Finally, I will report all findings in a custom public micropeptide database.
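To make the size criterion concrete, the following is a minimal illustrative sketch of how candidate small open reading frames could be enumerated in a transcript before any scoring step. It is not the machine learning algorithm proposed in this project. The function name `find_small_orfs`, the lower bound `min_aa`, and the restriction to forward-strand ATG-to-stop ORFs are assumptions made for illustration only; just the upper bound of 100 amino acids comes from the description above.

```python
# Hypothetical helper, not part of the project: scans the three forward
# reading frames of one transcript for ATG..stop open reading frames.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_AA = 100  # micropeptides: fewer than 100 amino acids


def find_small_orfs(seq, min_aa=10, max_aa=MAX_AA):
    """Return sorted (start, end, n_aa) tuples for small ORFs.

    start/end are 0-based, end-exclusive nucleotide coordinates that
    include the stop codon; n_aa counts residues without the stop.
    min_aa is an assumed lower bound used only to suppress trivial hits.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or cDNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa < max_aa:
                    orfs.append((start, i + 3, n_aa))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    # Toy transcript: a 12-residue ORF flanked by untranslated bases.
    toy = "CC" + "AUG" + "GCU" * 11 + "UAA" + "GG"
    print(find_small_orfs(toy))
```

In a real setting, candidates produced this way would only be a starting point; distinguishing translated micropeptides from spurious ORFs is the task the proposed algorithm and the mass spectrometry validation are meant to address.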

Date: 1 Oct 2020 → 13 Feb 2022
Keywords: micropeptides, bioinformatics, non-coding RNA
Disciplines: Genetics, Development of bioinformatics software, tools and databases, Structural bioinformatics and computational proteomics, Computational transcriptomics and epigenomics, Transcription and translation