Project

STATISTICAL ANALYSIS OF NEXT-GENERATION SEQUENCING DATA (R-7699)

Next-generation sequencing (NGS) technology produces millions of short reads. One NGS-based application is RNA sequencing (RNA-seq), which is widely used to study gene (transcript or exon) expression. To quantify the expression level of a gene, the short sequenced reads must first be identified; expression summaries, i.e., read counts, are then generated. Mapping the short reads is therefore a key step in RNA-seq processing. Read mapping makes it possible to find a region where a short read is identical or similar to a genomic or transcriptomic location. However, such matching may not be accurate: a sequenced read may match multiple locations. In practice, ambiguously mapped reads make it difficult to determine the region from which they truly originate and, as a result, to estimate its abundance. Existing methods for assigning ambiguous reads produce biased abundance estimates. In this project, we will develop two novel approaches, a theoretical framework and a weighted approach, for allocating multi-mapped reads in order to alleviate this bias. Moreover, the ambiguity problem also arises in gene-isoform quantification: different transcripts can share the same exon. To estimate gene-isoform expression levels, shared exons should be incorporated into the statistical model.
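To make the multi-mapping problem concrete, the sketch below shows one simple, commonly described heuristic: each ambiguous read is split across its candidate genes in proportion to the number of uniquely mapped reads those genes already have. This is an illustration only and is not the theoretical framework or the weighted approach proposed in this project; the function name `allocate_reads` and the input layout (a mapping from read ID to a list of distinct candidate gene names) are assumptions made for the example.

```python
from collections import defaultdict


def allocate_reads(alignments):
    """Hypothetical helper: alignments maps read_id -> list of distinct candidate genes."""
    # Pass 1: count reads that map to exactly one gene.
    unique_counts = defaultdict(float)
    for genes in alignments.values():
        if len(genes) == 1:
            unique_counts[genes[0]] += 1.0

    # Pass 2: split each ambiguous read across its candidates,
    # weighting by the unique counts gathered in pass 1.
    counts = defaultdict(float, unique_counts)
    for genes in alignments.values():
        if len(genes) > 1:
            total = sum(unique_counts[g] for g in genes)
            for g in genes:
                # Fall back to an equal split when no candidate has unique support.
                if total > 0:
                    weight = unique_counts[g] / total
                else:
                    weight = 1.0 / len(genes)
                counts[g] += weight
    return dict(counts)


# Toy data (invented): three reads unique to gene A, one unique to gene B,
# and one read that maps to both.
toy_alignments = {
    "r1": ["A"],
    "r2": ["A"],
    "r3": ["A"],
    "r4": ["B"],
    "r5": ["A", "B"],
}
toy_counts = allocate_reads(toy_alignments)
```

In the toy data, read `r5` is divided 3:1 between genes A and B because A has three uniquely mapped reads and B has one. Because the weights come from the unique counts themselves, a heuristic of this kind inherits whatever bias those counts carry, which is the type of bias the abstract refers to.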
Date: 1 Jan 2017 → 31 Dec 2017
Keywords: CLINICAL TRIALS
Disciplines: Applied mathematics in specific fields, Statistics and numerical methods
Project type: Collaboration project