
Publication

Topic modeling of biomedical text: from words and topics to disease and gene links

Book contribution - Book chapter, conference contribution

The massive growth of biomedical text makes it very challenging for researchers to review all relevant work and generate all possible hypotheses in a reasonable amount of time. Many text mining methods have been developed to simplify this process and quickly present the researcher with a learned set of biomedical hypotheses that can then be validated. Previously, we focused on the task of identifying genes that are linked with a given disease by text mining PubMed abstracts. We applied a word-based concept profile similarity to learn patterns between disease and gene entities and hence identify links between them. In this work, we study an alternative approach based on topic modelling to learn different patterns between the disease and gene entities, and we measure how this affects the identified links. We investigated multiple input corpora, word representations, topic parameters, and similarity measures. On the one hand, our results show that when we (1) learn the topics from a gene-clustered set of abstracts and (2) apply the dot-product similarity measure, we improve on our original methods and identify more correct disease-gene links. On the other hand, the results also show that the learned topics remain limited to the diseases existing in our vocabulary, so that scaling the methodology to new disease queries becomes nontrivial.
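As a rough illustration of the dot-product similarity step mentioned in the abstract, the sketch below ranks genes against a disease using topic vectors. It is not the authors' implementation: the function name, the gene labels, and the topic distributions are all hypothetical, and the vectors are assumed to come from some topic model fitted beforehand.

```python
def rank_genes_by_dot_product(disease_topics, gene_topics):
    """Rank genes by the dot product of their topic vector with a disease topic vector.

    disease_topics: list of floats, one weight per topic (hypothetical input).
    gene_topics: dict mapping a gene name to its list of topic weights.
    Returns a list of (gene, score) pairs, highest score first.
    """
    scores = {}
    for gene, topics in gene_topics.items():
        if len(topics) != len(disease_topics):
            raise ValueError(f"topic vector length mismatch for {gene}")
        # Dot product: sum of element-wise products over all topics.
        scores[gene] = sum(d * g for d, g in zip(disease_topics, topics))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up topic distributions over 3 topics (not data from the paper).
gene_topics = {
    "GENE_A": [0.7, 0.2, 0.1],
    "GENE_B": [0.1, 0.8, 0.1],
    "GENE_C": [0.2, 0.3, 0.5],
}
disease_topics = [0.6, 0.3, 0.1]

ranking = rank_genes_by_dot_product(disease_topics, gene_topics)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

Genes whose topic weights overlap most with the disease's topics score highest. Unlike cosine similarity, the dot product does not normalise vector length, so it only behaves like a pure overlap measure when all vectors are on a comparable scale (for example, probability distributions over topics).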
Book: Proc. 2016 IEEE International Conference on Bioinformatics and Biomedicine
Pages: 712 - 716
ISBN: 9781509016105
Year of publication: 2016
BOF-keylabel: yes
IOF-keylabel: yes
Authors from: Higher Education
Accessibility: Open