Publication

Irregularity Detection in Categorized Document Corpora

Book contribution - Book chapter, Conference contribution

The paper presents an approach to detecting irregularities in document corpora whose documents originate from different sources, where the analyst's interest is in finding documents that are atypical for their source. The main contribution of the paper is a voting-based approach to irregularity detection and its evaluation on a collection of newspaper articles from two sources: Western (UK and US) and local (Kenyan) media. An evaluation by a domain expert shows that the method is very effective in uncovering interesting irregularities in categorized document corpora.
Book: Proceedings of the Eighth International Conference on Language Resources and Evaluation
Pages: 1598-1603
Number of pages: 6
ISBN: 978-2-9517408-7-7
Year of publication: 2012
Keywords: Text mining, Text categorization, Irregularity detection
  • Scopus Id: 85037378394