< Terug naar vorige pagina

Publicatie

A comprehensive comparison of two MEDLINE annotators for disease and gene linkage: sometimes less is more

Boekbijdrage - Boekhoofdstuk Conferentiebijdrage

© Springer International Publishing Switzerland 2016. Text mining is popular in biomedical applications because it allows retrieving highly relevant information. Particularly for us, it is quite practical in linking diseases to the genes involved in them. However text mining involves multiple challenges, such as (1) recognizing named entities (e.g., diseases and genes) inside the text, (2) constructing specific vocabularies that efficiently represent the available text, and (3) applying the correct statistical criteria to link biomedical entities with each other. We have previously developed Beegle, a tool that allows prioritizing genes for any search query of interest. The method starts with a search phase, where relevant genes are identified via the literature. Once known genes are identified, a second phase allows prioritizing novel candidate genes through a data fusion strategy. Many aspects of our method could be potentially improved. Here we evaluate two MEDLINE annotators that recognize biomedical entities inside a given abstract using different dictionaries and annotation strategies. We compare the contribution of each of the two annotators in associating genes with diseases under different vocabulary settings. Somewhat surprisingly, with fewer recognized entities and a more compact vocabulary, we obtain better associations between genes and diseases. We also propose a novel but simple association criterion to link genes with diseases, which relies on recognizing only gene entities inside the biomedical text. These refinements significantly improve the performance of our method.
Boek: Lecture Notes in Computer Science
Pagina's: 765 - 778
ISBN:978-3-319-31743-4
Jaar van publicatie:2016
BOF-keylabel:ja
IOF-keylabel:ja
Authors from:Higher Education
Toegankelijkheid:Open