< Back to previous page


Machine reading of biological texts: bacteria-biotope extraction

Book Contribution - Book Chapter Conference Contribution

The tremendous amount of scientific literature available about bacteria and their biotopes underlines the need for efficient mechanisms to automatically extract this information. This paper presents a system to extract the bacteria and their habitats, as well as the relations between them. We investigate to what extent current techniques are suited for this task and test a variety of models in this regard. To detect entities in a biological text we use a linear chain Conditional Random Field (CRF). For the prediction of relations between the entities, a model based on logistic regression is built. Designing a system upon these techniques, we explore several improvements for both the generation and selection of good candidates. One contribution to this lies in the extended flexibility of our ontology mapper, allowing for a more advanced boundary detection. Furthermore, we discover value in the combination of several distinct candidate generation rules. Using these techniques, we show results that are significantly improving upon the state of art for the BioNLP Bacteria Biotopes task.
Book: Proceedings of the 6th international conference on bioinformatics models, methods and algorithms
Pages: 55 - 64